As of January 1, 2020 this library no longer supports Python 2 on the latest released version. Library versions released prior to that date will continue to be available. For more information please visit Python 2 support on Google Cloud.

Authentication

Before you begin, you must create a Google Cloud Platform project. Use the BigQuery sandbox to try the service for free.

pandas-gbq authenticates with the Google BigQuery service via OAuth 2.0. Use the credentials argument to explicitly pass in Google Credentials.

Default Authentication Methods

If the credentials parameter is not set, pandas-gbq tries the following authentication methods:

  1. In-memory, cached credentials at pandas_gbq.context.credentials. See pandas_gbq.Context.credentials for details.

    import pandas_gbq
    
    credentials = ...  # From google-auth or pydata-google-auth library.
    
    # Update the in-memory credentials cache (added in pandas-gbq 0.7.0).
    pandas_gbq.context.credentials = credentials
    pandas_gbq.context.project = "your-project-id"
    
    # The credentials and project_id arguments can be omitted.
    df = pandas_gbq.read_gbq("SELECT my_col FROM `my_dataset.my_table`")
    
  2. If running on Google Colab, pandas-gbq attempts to authenticate with the google.colab.auth.authenticate_user() method. See the Getting started with BigQuery on Colab notebook for an example of using this authentication method with other libraries that use Google BigQuery.

    Note

    To use Colab authentication, install version 1.8.0 or later of the pydata-google-auth package.

  3. Application Default Credentials via the google.auth.default() function (see the sketch after this list).

    Note

    If pandas-gbq can obtain default credentials but those credentials cannot be used to query BigQuery, pandas-gbq will also try obtaining user account credentials.

    A common problem with default credentials when running on Google Compute Engine is that the VM does not have sufficient access scopes to query BigQuery.

  4. User account credentials.

    pandas-gbq loads cached credentials from a hidden user folder on the operating system.

    Windows

    %APPDATA%\pandas_gbq\bigquery_credentials.dat

    Linux/Mac/Unix

    ~/.config/pandas_gbq/bigquery_credentials.dat

    If pandas-gbq does not find cached credentials, it prompts you to open a web browser, where you can grant pandas-gbq permissions to access your cloud resources. These credentials are only used locally. See the privacy policy for details.
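
As a concrete illustration of method 3, the following sketch fetches Application Default Credentials with google.auth.default() and passes them to pandas-gbq explicitly. The table name is a placeholder, and requesting the BigQuery scope is shown only to make the scope requirement from the note above visible:

import google.auth
import pandas_gbq

# Obtain Application Default Credentials. Requesting the BigQuery scope
# explicitly surfaces the insufficient-scope problem described above.
credentials, project_id = google.auth.default(
    scopes=["https://www.googleapis.com/auth/bigquery"],
)

# project_id may be None outside Google Cloud; supply it explicitly in that case.
df = pandas_gbq.read_gbq(
    "SELECT my_col FROM `my_dataset.my_table`",
    project_id=project_id,
    credentials=credentials,
)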

Authenticating with a Service Account

Using service account credentials is particularly useful when working on remote servers without access to user input.

Create a service account key via the service account key creation page in the Google Cloud Platform Console. Select the JSON key type and download the key file.

To use service account credentials, set the credentials parameter to the result of a call to:

  • google.oauth2.service_account.Credentials.from_service_account_file(),

    which accepts a file path to the JSON file.

    from google.oauth2 import service_account
    import pandas_gbq
    
    credentials = service_account.Credentials.from_service_account_file(
        'path/to/key.json',
    )
    df = pandas_gbq.read_gbq(sql, project_id="YOUR-PROJECT-ID", credentials=credentials)
    
  • google.oauth2.service_account.Credentials.from_service_account_info(),

    which accepts a dictionary corresponding to the JSON file contents.

    from google.oauth2 import service_account
    import pandas_gbq
    
    credentials = service_account.Credentials.from_service_account_info(
        {
            "type": "service_account",
            "project_id": "YOUR-PROJECT-ID",
            "private_key_id": "6747200734a1f2b9d8d62fc0b9414c5f2461db0e",
            "private_key": "-----BEGIN PRIVATE KEY-----\nM...I==\n-----END PRIVATE KEY-----\n",
            "client_email": "service-account@YOUR-PROJECT-ID.iam.gserviceaccount.com",
            "client_id": "12345678900001",
            "auth_uri": "https://accounts.google.com/o/oauth2/auth",
            "token_uri": "https://accounts.google.com/o/oauth2/token",
            "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
            "client_x509_cert_url": "https://www.googleapis.com/...iam.gserviceaccount.com"
        },
    )
    df = pandas_gbq.read_gbq(sql, project_id="YOUR-PROJECT-ID", credentials=credentials)
    

Alternatively, you can set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the full path of the JSON file.

$ export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
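
With the environment variable set, the key is picked up through Application Default Credentials and the credentials argument can be omitted. A minimal sketch, where the project ID and query are placeholders:

import pandas_gbq

# GOOGLE_APPLICATION_CREDENTIALS points at the service account key file,
# so no explicit credentials object is needed here.
df = pandas_gbq.read_gbq(
    "SELECT my_col FROM `my_dataset.my_table`",
    project_id="YOUR-PROJECT-ID",
)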

Use the with_scopes() method to authorize with specific OAuth2 scopes, which may be required for queries against federated data sources such as Google Sheets.

credentials = ...  # For example, service account credentials from the steps above.
credentials = credentials.with_scopes(
    [
        'https://www.googleapis.com/auth/drive',
        'https://www.googleapis.com/auth/cloud-platform',
    ],
)
df = pandas_gbq.read_gbq(..., credentials=credentials)

See the Getting started with authentication on Google Cloud Platform guide and Google Auth Library User Guide for more information on service accounts.

Authenticating with a User Account

Use the pydata-google-auth library to authenticate with a user account (i.e. a G Suite or Gmail account). The pydata_google_auth.get_user_credentials() function loads credentials from a cache on disk or initiates an OAuth 2.0 flow if cached credentials are not found.

import pandas_gbq
import pydata_google_auth

SCOPES = [
    'https://www.googleapis.com/auth/cloud-platform',
    'https://www.googleapis.com/auth/drive',
]

credentials = pydata_google_auth.get_user_credentials(
    SCOPES,
    # Note, this doesn't work if you're running from a notebook on a
    # remote server, such as over SSH or with Google Colab. In those cases,
    # install the gcloud command line interface and authenticate with the
    # `gcloud auth application-default login` command and the `--no-browser`
    # option.
    auth_local_webserver=True,
)

df = pandas_gbq.read_gbq(
    "SELECT my_col FROM `my_dataset.my_table`",
    project_id='YOUR-PROJECT-ID',
    credentials=credentials,
)

Warning

Do not store credentials on disk when using shared computing resources such as a GCE VM or Colab notebook. Use the pydata_google_auth.cache.NOOP cache to avoid writing credentials to disk.

import pydata_google_auth.cache

credentials = pydata_google_auth.get_user_credentials(
    SCOPES,  # As defined in the previous example.
    # Use the NOOP cache to avoid writing credentials to disk.
    cache=pydata_google_auth.cache.NOOP,
)

Additional information on the user credentials authentication mechanism can be found in the Google Cloud authentication guide.

Authenticating from Highly Constrained Development Environments

The instructions above may not be adequate for users working in a highly constrained development environment.

Highly constrained development environments typically prevent users from using the Default Authentication Methods and are generally characterized by one or more of the following circumstances:

  • There are limitations on what you can install in the development environment (e.g., you can’t install gcloud).

  • You don’t have access to a graphical user interface (e.g., you are remotely SSH’ed into a headless server and can’t open a browser to complete the authentication process used in the default login workflow).

  • The code is being executed in a typical data science context: using a Jupyter (or similar) notebook.

If the conditions above apply to you, your needs may be better served by the content in the Authentication (Highly Constrained Development Environment) section.