Authentication
Before you begin, you must create a Google Cloud Platform project. Use the BigQuery sandbox to try the service for free.
pandas-gbq authenticates with the Google BigQuery service via OAuth 2.0. Use the credentials argument to explicitly pass in Google Credentials.
Default Authentication Methods
If the credentials parameter is not set, pandas-gbq tries the following authentication methods, in order:
In-memory, cached credentials at pandas_gbq.context.credentials. See pandas_gbq.Context.credentials for details.
import pandas_gbq

credentials = ...  # From google-auth or pydata-google-auth library.

# Update the in-memory credentials cache (added in pandas-gbq 0.7.0).
pandas_gbq.context.credentials = credentials
pandas_gbq.context.project = "your-project-id"

# The credentials and project_id arguments can be omitted.
df = pandas_gbq.read_gbq("SELECT my_col FROM `my_dataset.my_table`")
Application Default Credentials via the google.auth.default() function.
Note
If pandas-gbq can obtain default credentials but those credentials cannot be used to query BigQuery, pandas-gbq will also try obtaining user account credentials.
A common problem with default credentials when running on Google Compute Engine is that the VM does not have sufficient scopes to query BigQuery.
User account credentials.
pandas-gbq loads cached credentials from a hidden user folder on the operating system.
- Windows
%APPDATA%\pandas_gbq\bigquery_credentials.dat
- Linux/Mac/Unix
~/.config/pandas_gbq/bigquery_credentials.dat
If pandas-gbq does not find cached credentials, it prompts you to open a web browser, where you can grant pandas-gbq permissions to access your cloud resources. These credentials are only used locally. See the privacy policy for details.
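The lookup order above can be read as a simple fallback chain. The following is a minimal sketch of that idea, not pandas-gbq's actual implementation: `resolve_credentials`, the loader names, and the `is_usable` check are hypothetical stand-ins, and the loaders return plain strings so the example runs without any Google libraries.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

# A loader returns credentials, or None when that source has nothing to offer.
Loader = Callable[[], Optional[Any]]


def resolve_credentials(
    explicit: Optional[Any],
    loaders: Sequence[Tuple[str, Loader]],
    is_usable: Callable[[Any], bool] = lambda creds: True,
) -> Tuple[str, Any]:
    """Return (source_name, credentials) from the first source that works."""
    # An explicitly passed credentials object always wins.
    if explicit is not None:
        return "explicit", explicit
    for name, load in loaders:
        creds = load()
        # Skip sources that are empty or whose credentials fail the check,
        # e.g. default credentials without sufficient scopes.
        if creds is not None and is_usable(creds):
            return name, creds
    raise RuntimeError("No usable credentials found.")


# Stand-in loaders (hypothetical) mirroring the order described above.
loaders = [
    ("context", lambda: None),                  # nothing cached in memory
    ("default", lambda: "adc-without-scopes"),  # found, but not usable
    ("user", lambda: "user-credentials"),       # cached file or browser flow
]

source, creds = resolve_credentials(
    None, loaders, is_usable=lambda c: c != "adc-without-scopes"
)
print(source, creds)
```

In this run the in-memory source is empty and the default credentials fail the usability check, so the chain falls through to the user account source, which matches the behaviour described in the note on default credentials.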
Authenticating with a Service Account
Using service account credentials is particularly useful when working on remote servers without access to user input.
Create a service account key via the service account key creation page in the Google Cloud Platform Console. Select the JSON key type and download the key file.
To use service account credentials, set the credentials parameter to the result of a call to:
google.oauth2.service_account.Credentials.from_service_account_file(), which accepts a file path to the JSON file.
from google.oauth2 import service_account
import pandas_gbq

credentials = service_account.Credentials.from_service_account_file(
    'path/to/key.json',
)
df = pandas_gbq.read_gbq(sql, project_id="YOUR-PROJECT-ID", credentials=credentials)
google.oauth2.service_account.Credentials.from_service_account_info(), which accepts a dictionary corresponding to the JSON file contents.
from google.oauth2 import service_account
import pandas_gbq

credentials = service_account.Credentials.from_service_account_info(
    {
        "type": "service_account",
        "project_id": "YOUR-PROJECT-ID",
        "private_key_id": "6747200734a1f2b9d8d62fc0b9414c5f2461db0e",
        "private_key": "-----BEGIN PRIVATE KEY-----\nM...I==\n-----END PRIVATE KEY-----\n",
        "client_email": "service-account@YOUR-PROJECT-ID.iam.gserviceaccount.com",
        "client_id": "12345678900001",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "token_uri": "https://accounts.google.com/o/oauth2/token",
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        "client_x509_cert_url": "https://www.googleapis.com/...iam.gserviceaccount.com"
    },
)
df = pandas_gbq.read_gbq(sql, project_id="YOUR-PROJECT-ID", credentials=credentials)
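Because from_service_account_info() takes a dictionary, the key contents do not have to come from a file on disk. The sketch below assumes the JSON key has been placed in an environment variable; the variable name `GBQ_SERVICE_ACCOUNT_JSON` and both helper functions are hypothetical and not part of pandas-gbq or google-auth.

```python
import json
import os


def load_service_account_info(env_var: str = "GBQ_SERVICE_ACCOUNT_JSON") -> dict:
    """Parse service account key JSON stored in an environment variable."""
    raw = os.environ.get(env_var)
    if raw is None:
        raise KeyError(f"Environment variable {env_var} is not set.")
    info = json.loads(raw)
    # Service account keys carry "type": "service_account", as in the
    # dictionary shown above; reject anything else early.
    if not isinstance(info, dict) or info.get("type") != "service_account":
        raise ValueError("Expected a key with type 'service_account'.")
    return info


def credentials_from_env(env_var: str = "GBQ_SERVICE_ACCOUNT_JSON"):
    """Build credentials from the parsed key (requires google-auth)."""
    # Imported here so the parsing helper above works without google-auth.
    from google.oauth2 import service_account

    return service_account.Credentials.from_service_account_info(
        load_service_account_info(env_var)
    )
```

The object returned by `credentials_from_env()` can then be passed as the credentials argument to pandas_gbq.read_gbq(), in the same way as in the examples above.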
Use the with_scopes() method to authorize with specific OAuth2 scopes, which may be required in queries to federated data sources such as Google Sheets.
credentials = ...
credentials = credentials.with_scopes(
    [
        'https://www.googleapis.com/auth/drive',
        'https://www.googleapis.com/auth/cloud-platform',
    ],
)
df = pandas_gbq.read_gbq(..., credentials=credentials)
See the Getting started with authentication on Google Cloud Platform guide for more information on service accounts.
Authenticating with a User Account
Use the pydata-google-auth library to authenticate with a user account (i.e. a G Suite or Gmail account). The pydata_google_auth.get_user_credentials() function loads credentials from a cache on disk or initiates an OAuth 2.0 flow if cached credentials are not found.
import pandas_gbq
import pydata_google_auth

SCOPES = [
    'https://www.googleapis.com/auth/cloud-platform',
    'https://www.googleapis.com/auth/drive',
]

credentials = pydata_google_auth.get_user_credentials(
    SCOPES,
    # Set auth_local_webserver to True to have a slightly more convenient
    # authorization flow. Note, this doesn't work if you're running from a
    # notebook on a remote server, such as over SSH or with Google Colab.
    auth_local_webserver=True,
)

df = pandas_gbq.read_gbq(
    "SELECT my_col FROM `my_dataset.my_table`",
    project_id='YOUR-PROJECT-ID',
    credentials=credentials,
)
Warning
Do not store credentials on disk when using shared computing resources such as a GCE VM or Colab notebook. Use the pydata_google_auth.cache.NOOP cache to avoid writing credentials to disk.
import pydata_google_auth.cache

# SCOPES is the list of OAuth 2.0 scopes defined in the previous example.
credentials = pydata_google_auth.get_user_credentials(
    SCOPES,
    # Use the NOOP cache to avoid writing credentials to disk.
    cache=pydata_google_auth.cache.NOOP,
)
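On a shared machine it can also help to check whether an earlier run already left a credentials file behind. The sketch below builds the cache locations listed under Default Authentication Methods; `cached_credentials_path` is a hypothetical helper, not part of pandas-gbq, and it assumes those documented paths still apply to your installed version.

```python
import os
import sys
from pathlib import Path


def cached_credentials_path() -> Path:
    """Return the documented location of the pandas-gbq credentials cache."""
    if sys.platform.startswith("win"):
        # Windows: %APPDATA%\pandas_gbq\bigquery_credentials.dat
        base = Path(os.environ["APPDATA"])
    else:
        # Linux/Mac/Unix: ~/.config/pandas_gbq/bigquery_credentials.dat
        base = Path.home() / ".config"
    return base / "pandas_gbq" / "bigquery_credentials.dat"


path = cached_credentials_path()
if path.exists():
    print(f"Cached credentials found at {path}")
else:
    print(f"No cached credentials at {path}")
```

The helper only reports whether the file exists; whether to remove it is left to you, since deleting it means the next default-flow run will prompt for authorization again.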
Additional information on the user credentials authentication mechanism can be found in the Google Cloud authentication guide.