Python - Get Google Analytics Data
Upasana | January 09, 2020
Google Analytics is a web analytics service offered by Google that tracks and reports website traffic. It currently sits within the Google Marketing Platform brand.
Google Analytics Reporting API v4
The Google Analytics Reporting API v4 gives you access to the power of the Google Analytics platform. The API provides these key features:
- Metric expressions: The API allows you to request not only built-in metrics but also combinations of metrics expressed as mathematical operations. For example, you can use the expression ga:goal1completions/ga:sessions to request goal completions per session.
- Multiple date ranges: The API allows you to get data for two date ranges in a single request.
- Cohorts and Lifetime value: The API has a rich vocabulary for requesting Cohort and Lifetime value reports.
- Multiple segments: The API enables you to get multiple segments in a single request.

Google publishes a reference list of the available dimensions and metrics; choose the ones that fit your business use case when defining a query.
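To make the first two features concrete, here is a minimal sketch of a request body that combines a metric expression with two date ranges. It assumes the Reporting API v4 request format used later in this article; the function name, the alias and the dates are illustrative only.

```python
def build_comparison_body(view_id):
    """Hypothetical helper: a v4 request body that combines a metric
    expression with two date ranges (e.g. to compare two months)."""
    return {
        "reportRequests": [
            {
                "viewId": view_id,
                # Two date ranges in one request; each returned row carries
                # one set of metric values per range.
                "dateRanges": [
                    {"startDate": "2019-01-01", "endDate": "2019-01-31"},
                    {"startDate": "2019-02-01", "endDate": "2019-02-28"},
                ],
                # A metric expression: goal 1 completions per session.
                "metrics": [
                    {
                        "expression": "ga:goal1completions/ga:sessions",
                        "alias": "goal1CompletionsPerSession",
                        "formattingType": "FLOAT",
                    }
                ],
            }
        ]
    }


body = build_comparison_body("66666666")
```

The dictionary is sent like any other body, e.g. analytics.reports().batchGet(body=body).execute().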
Get all pageviews of a blog site from Google Analytics using Python
Before querying data, we need to set up a few prerequisites.
Environment prerequisites
Create a virtual environment named venv and activate it.
virtualenv -p python3.6 venv
source venv/bin/activate
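Install the Google client libraries (oauth2client provides the service account credentials used below and is not installed by google-api-python-client itself)
pip install --upgrade google-api-python-client
pip install --upgrade oauth2client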
Import the libraries
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials
Parameters for accessing analytics data
Next, define the required parameters:
SCOPES = ['https://www.googleapis.com/auth/analytics.readonly']
KEY_FILE_LOCATION = './config/xxxxx-yyyy-vvvvjj68733.json'
VIEW_ID = '66666666'
- Move the previously downloaded service account key file (a JSON file) to the location your code expects; in this example it lives under ./config/.
- Replace the value of KEY_FILE_LOCATION with the path to that key file from the Developer Console.
- Replace the value of VIEW_ID. You can use the Account Explorer to find a View ID, or find it in the Admin section of the Google Analytics web app.
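If you would rather not hard-code these values, a small loader can read them from environment variables and sanity-check them before any API call. This is a sketch: the variable names GA_KEY_FILE and GA_VIEW_ID and the default path are hypothetical, and the key check relies only on a service account key file being JSON with "type": "service_account".

```python
import json
import os


def load_ga_settings(default_key_file="./config/service-account.json",
                     default_view_id=""):
    """Return (key_file, view_id), preferring environment variables.

    GA_KEY_FILE and GA_VIEW_ID are hypothetical names chosen for this sketch.
    """
    key_file = os.environ.get("GA_KEY_FILE", default_key_file)
    view_id = os.environ.get("GA_VIEW_ID", default_view_id)

    if not os.path.isfile(key_file):
        raise FileNotFoundError(f"Service account key not found: {key_file}")

    # A service account key file is JSON containing "type": "service_account".
    with open(key_file) as fh:
        key_data = json.load(fh)
    if key_data.get("type") != "service_account":
        raise ValueError(f"{key_file} does not look like a service account key")

    # View IDs are numeric strings such as '66666666'.
    if not view_id.isdigit():
        raise ValueError(f"VIEW_ID should be numeric, got {view_id!r}")

    return key_file, view_id
```

With the variables exported, KEY_FILE_LOCATION, VIEW_ID = load_ga_settings() replaces the two hard-coded lines.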
Initialize the Analytics Reporting service
def initialize_analyticsreporting():
    """Initializes an Analytics Reporting API V4 service object.

    Returns:
        An authorized Analytics Reporting API V4 service object.
    """
    credentials = ServiceAccountCredentials.from_json_keyfile_name(
        KEY_FILE_LOCATION, SCOPES)
    # Build the service object.
    analytics = build('analyticsreporting', 'v4', credentials=credentials)
    return analytics
Next, define a base function that queries Google Analytics data. Only the request body changes from one use case to the next.
def get_report_v2(analytics, body):
    # Send the request body to the Reporting API v4 batchGet method
    return analytics.reports().batchGet(body=body).execute()
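get_report_v2 returns a single page of results. The v4 API returns 1,000 rows per report by default and sets nextPageToken when more rows remain, so a site with many page paths may need paging. A minimal sketch, assuming a body with a single reportRequest; get_all_rows is a hypothetical helper, not part of the client library.

```python
def get_all_rows(analytics, body):
    """Fetch every row of the first reportRequest, following nextPageToken."""
    # Shallow copy so that setting pageSize/pageToken leaves `body` untouched.
    request = dict(body["reportRequests"][0])
    request.setdefault("pageSize", 10000)
    rows = []
    while True:
        response = analytics.reports().batchGet(
            body={"reportRequests": [request]}
        ).execute()
        report = response["reports"][0]
        rows.extend(report.get("data", {}).get("rows", []))
        token = report.get("nextPageToken")
        if not token:
            return rows
        # Ask for the next page on the following iteration.
        request["pageToken"] = token
```

Usage: rows = get_all_rows(analytics, body). The documentation caps a single request at 100,000 rows regardless of pageSize.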
Query page views for all pages on the blog site
In the JSON body of reportRequests, we need to define several parameters:
body = {
    "reportRequests": [
        {
            "viewId": VIEW_ID,
            "dateRanges": [{"startDate": start_date, "endDate": end_date}],
            "metrics": [{"expression": "ga:pageviews"}],
            "dimensions": [{"name": "ga:pagePath"}]
        }
    ]
}
- viewId uses the VIEW_ID defined above.
- start_date and end_date define the date range to query; define both before building body. Dates use the YYYY-MM-DD format, and the API also accepts relative values such as today, yesterday and NdaysAgo, which is why end_date can be 'today':
    start_date = '2019-04-01'
    end_date = 'today'
- The metric is ga:pageviews, since we want the page views of our blog site.
- The dimension is ga:pagePath, since we want page views broken down by page path. To group by page title instead, use ga:pageTitle. ga:pagePath is the better choice for blogs: if an article's title is edited, its page views would otherwise be split across the old and new titles.
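The parameters above can be wrapped in a small builder that also checks the date strings before sending the request. A sketch under the date rules stated above; check_date and build_pageviews_body are hypothetical names, and the checks mirror the documented format rather than calling the API.

```python
import re
from datetime import datetime

_ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
_RELATIVE = re.compile(r"today|yesterday|\d+daysAgo")


def check_date(value):
    """Return value unchanged if it is a valid v4 date string, else raise ValueError."""
    if _RELATIVE.fullmatch(value):
        return value
    if not _ISO_DATE.fullmatch(value):
        raise ValueError(f"Expected YYYY-MM-DD or a relative date, got {value!r}")
    # strptime rejects impossible dates such as 2019-02-30.
    datetime.strptime(value, "%Y-%m-%d")
    return value


def build_pageviews_body(view_id, start_date, end_date, dimension="ga:pagePath"):
    """Page views per dimension value (ga:pagePath by default, or ga:pageTitle)."""
    return {
        "reportRequests": [
            {
                "viewId": view_id,
                "dateRanges": [{"startDate": check_date(start_date),
                                "endDate": check_date(end_date)}],
                "metrics": [{"expression": "ga:pageviews"}],
                "dimensions": [{"name": dimension}],
            }
        ]
    }
```

For example, build_pageviews_body(VIEW_ID, '2019-04-01', 'today') produces the same body as above.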
analytics = initialize_analyticsreporting()
# body is the request dictionary defined above
response = get_report_v2(analytics, body)
print(response)
This gives output similar to the following:
{'reports': [{'columnHeader': {'dimensions': ['ga:pagePath'], 'metricHeader': {'metricHeaderEntries': [{'name': 'ga:pageviews', 'type': 'INTEGER'}]}}, 'data': {'rows': [{'dimensions': ['/'], 'metrics': [{'values': ['2550']}]}, {'dimensions': ['/?page=1'], 'metrics': [{'values': ['13']}]}, {'dimensions': ['/?page=10'], 'metrics': [{'values': ['5']}]}, {'dimensions': ['/?page=12'], 'metrics': [{'values': ['5']}]}, {'dimensions': ['/?page=13'], 'metrics': [{'values': ['1']}]}, {'dimensions': ['/?page=14'], 'metrics': [{'values': ['2']}]}, {'dimensions': ['/?page=15'], 'metrics': [{'values': ['2']}]}, {'dimensions': ['/?page=17'], 'metrics': [{'values': ['2']}]}, ......}
Hard to read? Don’t worry; we can flatten it into a dataframe.
dataframe = normalize_response(response)
print(dataframe)
This prints a dataframe with two columns: one for the dimension (the page path) and one for the metric (the page views).
Congratulations, we have successfully retrieved the data we wanted.
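The article does not show normalize_response. One possible implementation, assuming pandas is installed (pip install pandas) and that only the first report and first date range matter, flattens each row into a dictionary keyed by the dimension and metric names from columnHeader. flatten_report is a hypothetical helper.

```python
def flatten_report(response):
    """Turn the first report of a batchGet response into a list of dicts."""
    report = response["reports"][0]
    header = report["columnHeader"]
    dim_names = header.get("dimensions", [])
    entries = header["metricHeader"]["metricHeaderEntries"]
    metric_names = [m["name"] for m in entries]
    metric_types = {m["name"]: m.get("type") for m in entries}

    records = []
    for row in report.get("data", {}).get("rows", []):
        record = dict(zip(dim_names, row.get("dimensions", [])))
        values = row["metrics"][0]["values"]  # first date range only
        for name, value in zip(metric_names, values):
            # The API returns metric values as strings; convert by declared type.
            record[name] = int(value) if metric_types[name] == "INTEGER" else float(value)
        records.append(record)
    return records


def normalize_response(response):
    """Hypothetical helper: wrap flatten_report in a pandas DataFrame."""
    import pandas as pd  # imported lazily so flatten_report works without pandas
    return pd.DataFrame(flatten_report(response))
```

Converting values by the declared metric type keeps ga:pageviews as integers, so the dataframe can be sorted numerically.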
Query page views for specific pages on blog site
Earlier, we got data for all pages. Now suppose we want data only for pages whose titles contain specific terms.
Page titles are dimensions, so we use dimensionFilterClauses to filter them in the query. The request body takes this shape:
{
    "reportRequests": [
        {
            "viewId": "",
            "dateRanges": [
                {
                    "startDate": "",
                    "endDate": ""
                }
            ],
            "metrics": [
                {
                    "expression": ""
                }
            ],
            "dimensions": [
                {
                    "name": ""
                }
            ],
            "dimensionFilterClauses": [
                {
                    "filters": [
                        {
                            "expressions": [],
                            "dimensionName": ""
                        }
                    ]
                }
            ]
        }
    ]
}
Now fill in the parameters in the body and get the report from Google Analytics using batchGet, as before.
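As an example of filling in that template, the sketch below requests page views for pages whose title contains any of several terms. It assumes the v4 DimensionFilter fields (dimensionName, operator, expressions); build_title_filter_body and the sample terms are hypothetical. Each term becomes a PARTIAL (substring) filter, and the clause operator OR keeps rows that match at least one of them. Matching is case-insensitive unless caseSensitive is set.

```python
def build_title_filter_body(view_id, start_date, end_date, terms):
    """Page views for pages whose title contains any of `terms`."""
    # One substring filter per term on the ga:pageTitle dimension.
    filters = [
        {
            "dimensionName": "ga:pageTitle",
            "operator": "PARTIAL",
            "expressions": [term],
        }
        for term in terms
    ]
    return {
        "reportRequests": [
            {
                "viewId": view_id,
                "dateRanges": [{"startDate": start_date, "endDate": end_date}],
                "metrics": [{"expression": "ga:pageviews"}],
                "dimensions": [{"name": "ga:pageTitle"}],
                # OR: a row is kept if any filter in the clause matches.
                "dimensionFilterClauses": [{"operator": "OR", "filters": filters}],
            }
        ]
    }


body = build_title_filter_body("66666666", "2019-04-01", "today", ["python", "pandas"])
# response = get_report_v2(analytics, body)
```

Switching the clause operator to AND would instead keep only titles that contain every term.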
Great! We have successfully queried the data with filters too.