Hands-on experience with BigQuery: how to query with Google Analytics Data

Query and Analyse Your Google Analytics Data with BigQuery

What you’ll expect in this article:

  • Learn how Google Analytics data is structured
  • See how to work with partitioned tables and nested fields
  • Practice querying your Google Analytics data

Let’s start by discussing why you need to put your Google Analytics data in Google BigQuery?

The primary and foremost reason is that you want to have complete control of your GA data if Google ever decides to remove your GA property for a reason. And there, you will have your data stored in BigQeury for backup.

GA 360 and GA 4 allow you to store your data in BigQuery. Still, if you are using Universal Analytics, you may need to use the external connector .i.e Supermetrics, Fivetran, Bluemetrix, Stitch, etc., to assist you with the data migration to BigQuery.

Example of Google Analytics data in BigQuery

Partitioned Table

Partitioned Table

According to Google Cloud reference, a partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance, and you can control costs by reducing the number of bytes read by a query.

Partitioned table is not used only for Google Analytics, but for every data. And you would always need to store your data in it.

Nested Fields

Nested Fields

BigQuery experts always recommend denormalizing your data in a single place. So simply put, nested fields are the repeated fields nested in one record. For example, your GA visit can have several hits, page views, events, etc. And the details will be stored as nested fields at this particular session.

Query and analyse your GA Data in BigQuery

In this guide, we’ll use GA sample in the public data sets to do some queries in BigQuery. To find the GA public data, simply click on +ADD Data > explore public sets > search Google Analytics > click on GA sample and verify it.

You can see the google_analytics_sample datasets listed under bigquery-public-data. By checking if table is segmented by the date of ingestion, you can recognise this dataset is built with partition table.

How to check if the dataset is built with partitioned table.

For the nested field, go to your preview, and you can find plenty of repeated fields, e.g. one session with all information about the hits stored in nested fields.

Nested fields in GA public dataset.

To work with dates and partition tables, click Query, and the last date available for this data will always show at the end.

Work with dates and partition table in BigQuery.

So says you want to retrieve the data for more than the last day, you will need to follow the instructions below by inserting the * key to select the specific timeframe and apply condition with where_TABLE_SUFFIX between ‘specific date’ and ‘ specific date’. Using _TABLE_SUFFIX will give Google attention to the date in GA data, and you have to follow the date format in the columns date.

applying a condition to filter the specific date with your partitioned table

Note that when you run Queryquery, you will retrieve each page view per date, but if you want to know the sum of total pages per date?

Result ofQueryquery

This is the next step you will need to do — adding the sum (totals.pageviews) and renaming the field title as total_pageviews and group by date.

The sum of the total page views.

There is another way where you can enjoy more functions with table suffix. By adding the following queries:

FORMAT_DATE (‘%Y%m%d’, DATE_SUB(CURRENT_DATE(), INTERVAL 1095 DAY))

AND FORMAT_DATE (‘%Y%m%d’, DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))

GROUP BY date

ORDER BY date

Play more with table suffix

This means you subtract the total page view from the last 1095 days to the day before, with the corresponding format to display data accordingly. You can, of course, change the DAY to YEAR/MONTH to give you the exact data. This is the typical function that you can use for your BigQuery date data, and learn more about it here. So whenever you are working with the partitioned table, you will always need to add TABLE SUFFIX for your filter to function.

Changing the interval day to year.

Working with Nested Fields

Before going ahead, this big query cookbook link is something you would need to check in order to proceed. There are many standard SQL you can use for practice, and we are going to the sequence of hits, for example:

#standardSQL
SELECT
fullVisitorId,
visitId,
visitNumber,
hits.hitNumber AS hitNumber,
hits.page.pagePath AS pagePath
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST(hits) as hits
WHERE
_TABLE_SUFFIX BETWEEN ‘20170701’ AND ‘20170731’
AND
hits.type=”PAGE”
ORDER BY
fullVisitorId,
visitId,
visitNumber,
hitNumber

— — — — — — — — — — — — — — — — — — — — — — — — — — — — —

How you can interpretQueryquery is that you need to select your normal field, which is

#standardSQL
SELECT
fullVisitorId,
visitId,
visitNumber,

and these fields are nested.

hits.hitNumber AS hitNumber,
hits.page.pagePath AS pagePath

To retrieve the nested fields, you must select from your table and choose the sub-table containing the nested fields.

FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST(hits) as hits

So you might question how to identify nested fields? You will see from the table listed with the record and repeated, which is the nested field you are looking for.

And here, you specify the filter and order by, and then you hit RUN.

WHERE
_TABLE_SUFFIX BETWEEN ‘20170701’ AND ‘20170731’
AND
hits.type=”PAGE”
ORDER BY
fullVisitorId,
visitId,
visitNumber,
hitNumber

From the result, you can see the real sequence of the page path. It is always good to note that you need to use the UNNEST ( field ) as NAME if you want to retrieve the nested field.

Imagine creating a query to retrieve the totals of page views, hits and transactions for the date. What you can do is to add the sum () and run it

#standardSQL
SELECT
date,
SUM ( totals.pageviews) AS pageviews,
SUM ( totals.hits) AS hits,
SUM ( totals.transactions) AS transactions,

FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST(hits) as hits

GROUP BY
date

ORDER BY
date

You will notice that the null number from your result, so what you can do is to apply the following function:

By using WITH function, we are basically telling Query to give a name to the subquery ( SELECT to ORDER BY) and select all (*) of the data from table_total with the condition applied (where).

Resources

“Everything is theoretically impossible, until it is done.” Robert A. Heinlein.

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

Receive or Pull? Part 2 — Take the Disc!

Data Mining VS Data Extraction: What’s the Difference?

PREDICTIVE WEB ANALYTICS: A CASE STUDY

Diving into Global Information Design: Cosmology in the Large

Classical classifier combination techniques: Voting approaches, Borda Counts and…

Future-proof your Analytics Efforts: Hire Diverse Teams

10 Simple Excel Functions for Data Analysis

Big Data and Big Analytics: Latest Trends

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
janet wong

janet wong

“Everything is theoretically impossible, until it is done.” Robert A. Heinlein.

More from Medium

WITH, Create Permanent Unnested Tables from Temporary Tables in a GA4 E-commerce Project in…

Cover image. A BigQuery and WITH CLAUSE logo appears.

Google Analytics sequence segments with BigQuery

Making Interactive Report Using Google Data Studio and BigQuery Sandbox (Public Dataset…

Google Tag Manager Custom Dimensions & Metrics