Working with SQL in BigQuery
Discover how to work with your data in BigQuery. You’ll learn to use SQL language to create, register and schedule your queries, views and new tables.
What you can expect from the article:
- Discover exactly how you can work with your data in BigQuery
- Learn basic SQL language by creating and testing your queries
- Prepare data for your reports and analysis using queries and views
Before diving into the use case of working with data in BigQuery, here is some terminology you will encounter in this article.
SQL is a language that allows you to communicate with your database. A critical language in SQL is JOIN, which enables you to join your table by retrieving some data from several tables. Below are various join you can see in SQL language.
BigQuery queries are similar to Google Analytics custom segments. It allows you to filter the data you need. In SQL, you will SELECT these columns from the table.
View is like view in GA. When users create a query, you’ll save it as a view. It can be seen as a virtual table that you query afterwards.
Now, let’s train some SQL queries and learn how to work with data in BigQuery.
First thing first, you have to create a project ID, dataset and table. If you are not sure what to do, you can review the previous article on importing datasets.
Great tip: Always bear in mind to learn what your table contains in the fields.
Once you have table created, click on QUERY TABLE. You will see the basic Query as follow. Make sure to click on FORMAT QUERY so that it can be more readable.
Assuming you need only Region and Country, you can select these two columns from the Table schema and limit the number to five ( it means the BigQuery will show you only five rows of the selected columns).
Hit RUN, and you’ll see the five-row of selected data in Query Results. If you remove the LIMIT, you will get 100 rows with Region and Country. Similarly, if you change the limit to 10, it will give you ten rows of results.
Let’s say you want to aggregate the data by understanding distinct regionsRegion under your dataset. You can add DISTINCT after SELECT. It will show you the name of unique regions.
You can also ORDER BY Region so that it can be presented in alphabetical order.
Or, you can ORDER BY Region DESC.
If you want to exclude some field, you can add a keyword after SELECT * EXCEPT ( field name, field name) FROM….And your selected field data will be excluded from the results.
SELECT * means select all the fields.
SQL language also allows you to do the FILTER function on your data, i.e. isolate online sales. To filter data, you will need to select some of the fields and add WHERE your field name = “Online.”
BigQuery lets you go further with the WHERE operator by having several conditions applied. You can have WHERE Region = ‘Asia’ Or/AND Region= ‘Europe’ Order By Region.
Not only can you apply the Query above, but you can also add conditions with different fields, for example, WHERE Region = ‘Asia’ AND Sales_Channel = ‘Online’ Order By Region.
Of course, this is where your creativity comes in and how you will run the data.
Note that it is not possible to work when you want to perform operations on the aggregated data. WHERE operator can only be used on existing tables but not with the created new fields.
Alternatively, using the HAVING operator allows you to work with the calculations in your data. Let’s say you want to filter the data with revenue divided by two that are more than one million. To do that, select Region, Country and Total_Revenue. Add the GROUP BY before performing the filter.
This is how you are going to create:
Hit RUN, and you will only see the revenues that are more than one million in your table.
So, always remember — WHERE filters your data before it is calculated. HAVING, it is done after it’s calculated, along with GROUP BY.
Furthermore, joining two tables are possible in Query — all you need is using UNION ALL. SELECT ALL fields FROM table one, and UNION ALL SELECT all the fields from table two. Click Run, and here you have 200 rows after combining two tables.
After all, you can also name your joined table on the Query and select a specific field directly from the join. All you need to do is as follow:
Using JOINs in BigQuery
To work with the JOINs, you must create a table called client_status and upload this data to your table.
Assume you have two tables created here ( sales_data & client_status), and you want to know the Total_Revenue in Client_status. You can join the two tables in order to get your data shown in Client_status. When following the Query below, you will need to take note to specify the two Order_ID from because BigQuery does not understand which Order_ID you are referring to. So, what you can do here is to name these tables Table A and Table B.
Since you also want to understand the Total_Revenue per Client_status and sum of the Total Revenue, you would need to follow the Query below.
Here’s a quick explanation in the slide deck about the SQL JOINs. And you can learn more about SQL joins and BigQuery in the links below.
Apart from JOINS, you can use different functions to calculate the desired results. For example, if you want to calculate how many different statuses you have, apply a built-in function called COUNT (Distinct Client_status).
Again, you can calculate sum/avg/max/min (Total_Revenue) using many other functions.
Whenever you create a group by, make sure you name it explicitly and save it on a personal or project level to be editable by other project members. You will find them in Saved Queries, but if you forget to keep them, you can always go to your Query history and lookout for it. Not to mention, you can save your view as well.
Schedule Query in BigQuery
Assume you want to retrieve every database of Total_Revenue per Region from your recent sales_data to another table. You will need to create an empty table that will receive the data and then schedule a query to retrieve it daily or weekly.
Schedule a query, and you will get three revenue for the last three days.
Then, create a Schedule query after activating an API. Name the scheduled Query as revenue_per_region and also put the destination as revenue_per_region so that the empty table can retrieve the data. Once confirmed, click SCHEDULE and you are done.
Since you learn how to create Query and View, you can easily create an insightful marketing report. Have fun and enjoy! Below are some good resources that you can check out.