.. _reader:

Reading Tables
==============

Use the :func:`pandas_gbq.read_gbq` function to run a BigQuery query and
download the results as a :class:`pandas.DataFrame` object.

.. literalinclude:: samples/snippets/read_gbq_simple.py
   :language: python
   :dedent: 4
   :start-after: [START bigquery_pandas_gbq_read_gbq_simple]
   :end-before: [END bigquery_pandas_gbq_read_gbq_simple]

.. note::

    A project ID is optional if it can be inferred during authentication,
    but it is required when authenticating with user credentials. You can
    find your project ID in the `Google Cloud console `__.

You can define which column from BigQuery to use as an index in the
destination DataFrame as well as a preferred column order as follows:

.. code-block:: python

   data_frame = pandas_gbq.read_gbq(
       'SELECT * FROM `test_dataset.test_table`',
       project_id=projectid,
       index_col='index_column_name',
       columns=['col1', 'col2'])


Querying with legacy SQL syntax
-------------------------------

The ``dialect`` argument can be used to indicate whether to use BigQuery's
``'legacy'`` SQL or BigQuery's ``'standard'`` SQL. The default value is
``'standard'``.

.. literalinclude:: samples/snippets/read_gbq_legacy.py
   :language: python
   :dedent: 4
   :start-after: [START bigquery_pandas_gbq_read_gbq_legacy]
   :end-before: [END bigquery_pandas_gbq_read_gbq_legacy]

* `Standard SQL reference `__
* `Legacy SQL reference `__

.. _reading-dtypes:

Inferring the DataFrame's dtypes
--------------------------------

The :func:`~pandas_gbq.read_gbq` method infers the pandas dtype for each
column, based on the BigQuery table schema.

================== =============================
BigQuery Data Type dtype
================== =============================
BOOL               boolean
INT64              Int64
FLOAT64            float64
TIME               dbtime
DATE               dbdate or object
DATETIME           datetime64[ns] or object
TIMESTAMP          datetime64[ns, UTC] or object
================== =============================

If any DATE/DATETIME/TIMESTAMP value is outside of the range of
`pandas.Timestamp.min `__ (1677-09-22) and `pandas.Timestamp.max `__
(2262-04-11), the data type maps to the pandas ``object`` dtype.

.. _reading-bqstorage-api:

Improving download performance
------------------------------

Use the BigQuery Storage API to download large (>125 MB) query results more
quickly (but at an `increased cost `__) by setting ``use_bqstorage_api`` to
``True``.

#. Enable the BigQuery Storage API on the project you are using to run
   queries. `Enable the API `__.

#. Ensure you have the `bigquery.readsessions.create permission `__ to
   create BigQuery Storage API read sessions. This permission is provided
   by the `bigquery.user role `__.

#. Install the ``google-cloud-bigquery-storage`` and ``pyarrow`` packages.

   With pip:

   .. code-block:: sh

      pip install --upgrade google-cloud-bigquery-storage pyarrow

   With conda:

   .. code-block:: sh

      conda install -c conda-forge google-cloud-bigquery-storage

#. Set ``use_bqstorage_api`` to ``True`` when calling the
   :func:`~pandas_gbq.read_gbq` function, as in the sketch after this list.
   As of version 1.11.1 of the ``google-cloud-bigquery`` package, the
   function will fall back to the BigQuery API if the BigQuery Storage API
   cannot be used, such as with small query results.
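For example, the following sketch requests the BigQuery Storage API for the
download; the project ID and table name are placeholders, as in the earlier
examples:

.. code-block:: python

   import pandas_gbq

   # Download the result set with the BigQuery Storage API; pandas-gbq
   # falls back to the BigQuery API when the Storage API cannot be used.
   data_frame = pandas_gbq.read_gbq(
       'SELECT * FROM `test_dataset.test_table`',
       project_id=projectid,
       use_bqstorage_api=True)
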
Advanced configuration
----------------------

You can specify the query configuration as a parameter to use additional
options for your job. Refer to the `JobConfiguration REST resource
reference `__ for details.

.. code-block:: python

   configuration = {
       'query': {
           "useQueryCache": False
       }
   }
   data_frame = pandas_gbq.read_gbq(
       'SELECT * FROM `test_dataset.test_table`',
       project_id=projectid,
       configuration=configuration)
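
As a further sketch, the same mechanism can cap how much data a query is
allowed to bill for via the ``maximumBytesBilled`` field of the query job
configuration; the limit below is an arbitrary example value:

.. code-block:: python

   # Fail the query instead of billing for more than ~1 GB of scanned data.
   configuration = {
       'query': {
           'maximumBytesBilled': '1000000000'
       }
   }
   data_frame = pandas_gbq.read_gbq(
       'SELECT * FROM `test_dataset.test_table`',
       project_id=projectid,
       configuration=configuration)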