Python - Get Google Analytics Data

Upasana | January 09, 2020


Google Analytics is a web analytics service offered by Google that tracks and reports website traffic, currently as a platform inside the Google Marketing Platform brand.

Google Analytics Reporting API v4

The Google Analytics Reporting API v4 gives you access to the power of the Google Analytics platform. The API provides these key features:

  • Metric expressions

The API allows you to request not only built-in metrics but also combinations of metrics expressed as mathematical operations. For example, you can use the expression ga:goal1completions/ga:sessions to request the goal completions per session.

  • Multiple date ranges

The API allows you to get data for two date ranges in a single request.

  • Cohorts and Lifetime value

The API has a rich vocabulary to request Cohort and Lifetime value reports.

  • Multiple segments

The API enables you to get multiple segments in a single request.
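As a rough illustration of the first two features, the sketch below builds a request body that combines a metric expression with two date ranges. The helper name build_feature_demo_body, the alias goalsPerSession, the view ID and the dates are made up for illustration; the field names (reportRequests, dateRanges, metrics, expression, alias, formattingType) are the ones the Reporting API v4 uses.

```python
def build_feature_demo_body(view_id):
    """Build a batchGet body with one metric expression and two date ranges."""
    return {
        "reportRequests": [
            {
                "viewId": view_id,
                # Two date ranges in one request; these dates are placeholders.
                "dateRanges": [
                    {"startDate": "2019-04-01", "endDate": "2019-04-30"},
                    {"startDate": "2019-03-01", "endDate": "2019-03-31"},
                ],
                "metrics": [
                    {
                        # Goal completions per session, as in the example above.
                        "expression": "ga:goal1completions/ga:sessions",
                        "alias": "goalsPerSession",  # hypothetical label
                        "formattingType": "FLOAT",
                    }
                ],
            }
        ]
    }


body = build_feature_demo_body("66666666")  # placeholder view ID
```

Each row of the resulting report then carries one set of metric values per date range.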

Google Analytics provides a list of dimensions and metrics. Pick the ones to query according to your business use case.

Get all Pageviews of blog site from google analytics using python

Before we start querying data, we need to take care of a few prerequisites.

Enable the API

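Enable the Analytics Reporting API v4 for your project in the Developer Console, create a service account, and download its JSON key file.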

Environment pre-requisites

Create a virtual environment named venv and activate it.

virtualenv -p python3.6 venv

source venv/bin/activate

Download google libraries

pip install --upgrade google-api-python-client oauth2client

Import Libraries

from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

Parameters for accessing analytics data

Now, we will define the required parameters

SCOPES = ['https://www.googleapis.com/auth/analytics.readonly']
KEY_FILE_LOCATION = './config/xxxxx-yyyy-vvvvjj68733.json'  # (2)
VIEW_ID = '66666666'  # (3)
  1. Move the previously downloaded JSON key file to a location the sample code can read, for example the ./config directory.

  2. Replace the value of KEY_FILE_LOCATION with the path to the key file downloaded from the Developer Console.

  3. Replace the value of VIEW_ID. You can use the Account Explorer to find a View ID, or you can find it in the Admin section of the Google Analytics web app.

Initialize Reporting analytics

def initialize_analyticsreporting():
  """Initializes an Analytics Reporting API V4 service object.

  Returns:
    An authorized Analytics Reporting API V4 service object.
  """
  credentials = ServiceAccountCredentials.from_json_keyfile_name(
      KEY_FILE_LOCATION, SCOPES)

  # Build the service object.
  analytics = build('analyticsreporting', 'v4', credentials=credentials)

  return analytics

First, we will define a base method to query Google Analytics data. We will change the body passed to it depending on the use case.

def get_report_v2(analytics, body):
  # Send the report request(s) in body and return the parsed response.
  return analytics.reports().batchGet(body=body).execute()
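A single batchGet call returns one page of rows, so a site with many pages may need several calls. The sketch below is one possible way to follow nextPageToken until all rows are collected. The helper name fetch_all_rows and the default page size are assumptions; pageSize, pageToken and nextPageToken are fields of the Reporting API v4. It only reads the first report in the response.

```python
import copy


def fetch_all_rows(analytics, body, page_size=1000):
    """Collect the rows of every page for the first report request in body."""
    request_body = copy.deepcopy(body)  # leave the caller's body untouched
    request = request_body["reportRequests"][0]
    request["pageSize"] = page_size
    rows = []
    while True:
        response = analytics.reports().batchGet(body=request_body).execute()
        report = response["reports"][0]
        # "rows" is absent when a page has no data.
        rows.extend(report.get("data", {}).get("rows", []))
        token = report.get("nextPageToken")
        if not token:
            return rows
        # Ask for the next page on the following iteration.
        request["pageToken"] = token
```

It takes the same analytics object and body as get_report_v2, but returns a flat list of rows instead of the raw response.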

Query page views for all pages on blog site

In the JSON body of reportRequests, we need to define the following parameters.

{
  "reportRequests": [
    {
      "viewId": VIEW_ID,  # (1)
      "dateRanges": [{"startDate": start_date, "endDate": end_date}],  # (2)
      "metrics": [{"expression": "ga:pageviews"}],  # (3)
      "dimensions": [{"name": "ga:pagePath"}]  # (4)
    }
  ]
}
  1. The View ID defined above is used here.

  2. Define the date range to query. Dates should be in the format YYYY-MM-DD; relative values such as today are also accepted.

start_date = '2019-04-01'
end_date = 'today'
  3. The metric is ga:pageviews because we want the page views of our blog site/website.

  4. The dimension is ga:pagePath because we want views broken down by page path on the site. If you want views by page title instead, use ga:pageTitle. ga:pagePath is the recommended choice: on blog sites a page title may be edited, and then the page views of the same article are split across the different titles it had over time.
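The body described above can also be produced by a small helper that checks the date values before building the request. This is only a sketch: the name build_pageviews_body and the regular expression are assumptions, and the pattern covers YYYY-MM-DD plus the relative forms today, yesterday and NdaysAgo.

```python
import re

# YYYY-MM-DD, or one of the relative forms today / yesterday / NdaysAgo.
_DATE_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2}|today|yesterday|\d+daysAgo)$")


def build_pageviews_body(view_id, start_date, end_date):
    """Build the request body for page views per page path."""
    for value in (start_date, end_date):
        if not _DATE_PATTERN.match(value):
            raise ValueError("Unsupported date value: %r" % value)
    return {
        "reportRequests": [
            {
                "viewId": view_id,
                "dateRanges": [{"startDate": start_date, "endDate": end_date}],
                "metrics": [{"expression": "ga:pageviews"}],
                "dimensions": [{"name": "ga:pagePath"}],
            }
        ]
    }


body = build_pageviews_body("66666666", "2019-04-01", "today")
```

The pattern only checks the shape of the string, so an impossible date such as 2019-13-40 would still pass and be rejected by the API instead.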

analytics = initialize_analyticsreporting()
# body is the reportRequests dictionary defined above
response = get_report_v2(analytics, body)

print(response)

This prints output similar to the following.

Output
{'reports': [{'columnHeader': {'dimensions': ['ga:pagePath'], 'metricHeader': {'metricHeaderEntries': [{'name': 'ga:pageviews', 'type': 'INTEGER'}]}}, 'data': {'rows': [{'dimensions': ['/'], 'metrics': [{'values': ['2550']}]}, {'dimensions': ['/?page=1'], 'metrics': [{'values': ['13']}]}, {'dimensions': ['/?page=10'], 'metrics': [{'values': ['5']}]}, {'dimensions': ['/?page=12'], 'metrics': [{'values': ['5']}]}, {'dimensions': ['/?page=13'], 'metrics': [{'values': ['1']}]}, {'dimensions': ['/?page=14'], 'metrics': [{'values': ['2']}]}, {'dimensions': ['/?page=15'], 'metrics': [{'values': ['2']}]}, {'dimensions': ['/?page=17'], 'metrics': [{'values': ['2']}]}, ......}

Hard to read? Don't worry, the response can be normalized into a dataframe.

dataframe = normalize_response(response)
print(dataframe)

This prints a dataframe with two columns: dimensions and metrics.
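normalize_response is not defined above, so here is a minimal sketch of what such a helper might look like, assuming pandas is installed. The helper response_to_rows is a hypothetical name. In this sketch the columns are named after the report headers (ga:pagePath, ga:pageviews) rather than dimensions and metrics, and only the values of the first date range are read.

```python
def response_to_rows(response):
    """Flatten a batchGet response into a list of {header name: value} dicts."""
    records = []
    for report in response.get("reports", []):
        header = report.get("columnHeader", {})
        dimension_names = header.get("dimensions", [])
        metric_entries = header.get("metricHeader", {}).get("metricHeaderEntries", [])
        for row in report.get("data", {}).get("rows", []):
            record = dict(zip(dimension_names, row.get("dimensions", [])))
            # Metric values arrive as strings; only the first date range is read.
            values = row["metrics"][0]["values"]
            for entry, value in zip(metric_entries, values):
                is_integer = entry.get("type") == "INTEGER"
                record[entry["name"]] = int(value) if is_integer else float(value)
            records.append(record)
    return records


def normalize_response(response):
    """Return the flattened rows as a pandas DataFrame."""
    import pandas as pd  # imported here so response_to_rows works without pandas

    return pd.DataFrame(response_to_rows(response))
```

Keeping the flattening separate from the DataFrame construction makes it easy to inspect the rows without pandas.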

Congratulations, we have successfully got the data we wanted.

Query page views for specific pages on blog site

Earlier, we got data for all pages. Now let's say we want data only for pages whose titles contain specific terms.

Page titles are dimensions, so to filter on them we use dimensionFilterClauses in the query.

Define the body
{
  "reportRequests": [
    {
      "viewId": "",
      "dateRanges": [
        {
          "startDate": "",
          "endDate": ""
        }
      ],
      "metrics": [
        {
          "expression": ""
        }
      ],
      "dimensions": [
        {
          "name": ""
        }
      ],
      "dimensionFilterClauses": [
        {
          "filters": [
            {
              "expressions": [],
              "dimensionName": ""
            }
          ]
        }
      ]
    }
  ]
}

Now we need to fill in the parameters in the body and get the report from Google Analytics using batchGet, as before.
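A filled-in version of this body might look like the sketch below. The helper name build_title_filter_body and the search terms are made up for illustration. It assumes the PARTIAL operator for substring matching and creates one filter per term, combined with OR, so that a page title matching any of the terms is returned.

```python
def build_title_filter_body(view_id, start_date, end_date, terms):
    """Build a body for page views of pages whose title contains any term."""
    # One filter per term; PARTIAL matches the term as a substring.
    filters = [
        {
            "dimensionName": "ga:pageTitle",
            "operator": "PARTIAL",
            "expressions": [term],
        }
        for term in terms
    ]
    return {
        "reportRequests": [
            {
                "viewId": view_id,
                "dateRanges": [{"startDate": start_date, "endDate": end_date}],
                "metrics": [{"expression": "ga:pageviews"}],
                "dimensions": [{"name": "ga:pageTitle"}],
                # OR: a row is kept when at least one filter matches.
                "dimensionFilterClauses": [{"operator": "OR", "filters": filters}],
            }
        ]
    }


# Hypothetical terms and placeholder view ID.
body = build_title_filter_body("66666666", "2019-04-01", "today", ["Python", "Pandas"])
```

The resulting body can be passed to get_report_v2 in the same way as the unfiltered one.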

Great! We have successfully queried the data with filters too.

