5 Ways to Import Marketing Data into BigQuery
Learn how to start working with your data in BigQuery. Discover different data import options and additional Google Cloud tools such as Cloud Functions, Dataflow, Dataprep and the Data Transfer Service.
BigQuery is Google's fully managed, scalable analytics data warehouse. With BigQuery, you can query terabytes of marketing data at lightning speed and explore it through powerful visualisations. This blog post walks you through 5 ways to import your marketing data into BigQuery, plus an actual use case showing how to analyse that data easily.
Google Services for Importing Data into BigQuery
1. BigQuery Data Transfer Service
To find the Data Transfers option, go to BigQuery on the left panel and select it. You may need to enable API access before using it. After enabling it, click CREATE A TRANSFER to import various data source types into BigQuery.
Do note that some data transfers are not free; you may need to pay for specific data sources, for example Google Ads ($2.50 per month per customer ID) or YouTube ($5 per month per channel).
Once you fill in the data source and destination information, you can connect your Google account and save the transfer.
So this is a quick overview of where you can create a transfer within the interface. We'll talk about an actual use case and show you the walkthrough below.
2. Google Analytics 360 / Google Analytics 4
Google Analytics 360 offers a sturdy, built-in export of data to BigQuery that is easy to set up. This export used to be available only in GA 360, but it has now been extended to GA4, where it is entirely free of charge.
3. Cloud Storage
BigQuery also supports querying Cloud Storage data in several formats, e.g. CSV, JSON, Avro, ORC and Parquet, from different storage classes. To get data into Cloud Storage, you might need a hand from your developers to send the files automatically to the right destination.
4. Google Cloud Functions
Cloud Functions can be triggered by HTTP requests, Cloud Storage events or Cloud Pub/Sub messages, letting you transfer data to BigQuery automatically. It is a cost-effective component to include in your solution since you only pay when a function is triggered.
5. Google Cloud Dataflow
Often, if you want to go further with streaming data, you can look at tools like Pub/Sub or Dataflow from Google Cloud. Pub/Sub is a fully managed service that receives and sends messages between independent applications.
Google Cloud Dataflow, in turn, is a cloud-based data processing service for both batch and real-time streaming applications. It enables developers to set up data pipelines and deliver streaming data to BigQuery.
BigQuery Data Import from a Local Machine
There are two ways to load data from a readable data source:
- Load data manually (our goal is to automate reporting, so we won't cover this today)
- Load data automatically using the API or client libraries. *For those unsure about APIs: an API is a set of clearly defined methods of communication between components such as applications and software. For example, once your developers connect to the BigQuery API, they can send files to BigQuery automatically at any time. A minimal sketch of such a load follows below.
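If your developers use the official Python client library, a batch load of a local CSV file might look like the following minimal sketch. The project, dataset, table and file names are placeholders, not values from the original post.

```python
# pip install google-cloud-bigquery
from google.cloud import bigquery

# Placeholder identifiers -- replace with your own project, dataset and table.
client = bigquery.Client(project="your-project-id")
table_id = "your-project-id.crm_data.crm2019"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # skip the header row
    autodetect=True,       # let BigQuery infer the schema
)

# Load a local CSV file into the table as a batch job.
with open("crm_export.csv", "rb") as source_file:
    load_job = client.load_table_from_file(source_file, table_id, job_config=job_config)

load_job.result()  # wait for the batch job to finish
print(f"Loaded {client.get_table(table_id).num_rows} rows into {table_id}")
```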
There are also two options for sending the data to BigQuery or Cloud Storage:
- Batch (load data once per week/day/hour)
- Streaming (load data in real time, without delay)
Generally speaking, the batch option is sufficient for most marketers and digital analysts, so we won't cover streaming data in depth due to its complexity.
For streaming data into BigQuery, there are three options for data ingestion:
- Develop your own custom solution (a minimal sketch follows this list)
- Use Google solutions (Dataflow / Pub/Sub). *You will still need to develop something on your side to stream the data.
- Or use paid solutions such as OWOX, Bluemetrix, Stitch Data, Fivetran and others.
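As a rough idea of what a custom solution involves, the sketch below streams individual events into a table using the client library's streaming inserts. Table and field names are illustrative assumptions, not part of the original article.

```python
# pip install google-cloud-bigquery
from google.cloud import bigquery

# Placeholder identifiers -- adjust to your own project and dataset.
client = bigquery.Client(project="your-project-id")
table_id = "your-project-id.app_events.events_stream"

# One event per row; field names must match the destination table schema.
rows_to_insert = [
    {"event_name": "purchase", "user_id": "u-123", "value": 250000},
    {"event_name": "page_view", "user_id": "u-456", "value": None},
]

# Streaming insert: rows become queryable within seconds, no load job needed.
errors = client.insert_rows_json(table_id, rows_to_insert)
if errors:
    print(f"Streaming insert failed: {errors}")
else:
    print("Events streamed to BigQuery")
```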
A common use case for streaming is mobile applications. Because app events are generated continuously, the data needs to be loaded in real time. Mobile applications offer this option, and Google Analytics can also stream the data to BigQuery.
Use Case: Consolidating Real Estate Offline and Online Data
Here we want to show you an actual use case of how you can use your BigQuery data in your marketing reports. Assume you have two data sources, CRM and back-office, and your developers send their CSV files to two Cloud Storage buckets once per day (roughly €1,500 paid once to the developers for this set-up). Using a Python Cloud Function, the CRM data is then pulled from the bucket and loaded automatically into a BigQuery table, leaving you with two datasets in BigQuery, each containing a 2019 table. For website data you use OWOX to stream Google Analytics data to BigQuery (€200 per month per 200k users; with standard GA you need to rely on an external tool like this). With all the data combined, you can write powerful queries and surface insights, for example whether a user who came from your ads actually bought a house. The money spent on BigQuery itself so far: €0.
To achieve the use case above, you first need to create datasets in your project. Name the first one "crm_data" and choose a location in Europe (your data will be stored on EU servers). Optionally enable a table expiration in days, then hit CREATE DATASET.
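If you prefer to script this step instead of clicking through the console, creating the same dataset with the Python client looks roughly like this. The project ID and the 60-day expiration are placeholder assumptions.

```python
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")  # placeholder project ID

dataset = bigquery.Dataset("your-project-id.crm_data")
dataset.location = "EU"  # data is stored on EU servers
# Optional: expire tables automatically, e.g. after 60 days (adjust as needed).
dataset.default_table_expiration_ms = 60 * 24 * 60 * 60 * 1000

dataset = client.create_dataset(dataset, exists_ok=True)
print(f"Created dataset {dataset.full_dataset_id}")
```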
Now you need to create tables to load the data into your empty datasets. There are several ways to create a table:
Empty Table
Select Empty Table, name your table and insert a JSON schema (an illustrative JSON schema is shown below). Click Create Table and then head to your crm_data dataset to check the schema. You'll see the table you've just created.
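The exact JSON from the original set-up isn't reproduced here, but a schema for a simple CRM table might look like the following. Field names and types are purely illustrative.

```json
[
  {"name": "client_id",  "type": "STRING",    "mode": "REQUIRED"},
  {"name": "email",      "type": "STRING",    "mode": "NULLABLE"},
  {"name": "deal_value", "type": "FLOAT",     "mode": "NULLABLE"},
  {"name": "created_at", "type": "TIMESTAMP", "mode": "NULLABLE"}
]
```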
Google Cloud Storage
Select Create Table from Google Cloud Storage and click Browse to select a file from a GCS bucket. If you click Browse and see no files, you first need to create a bucket: open Cloud Storage from the left panel.
Next, click CREATE BUCKET at the top. Give your bucket a unique name, select where to store your data, choose Standard as the default storage class and click CREATE.
Upload your CSV files to the bucket, and that's it.
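If you'd rather script the bucket creation and upload, the Cloud Storage Python client can do the same thing. The bucket name and file path below are placeholders (bucket names must be globally unique).

```python
# pip install google-cloud-storage
from google.cloud import storage

client = storage.Client(project="your-project-id")  # placeholder project ID

# Create the bucket in the EU, matching the dataset location above.
bucket = client.create_bucket("your-unique-crm-bucket", location="EU")

# Upload the CRM export so BigQuery can read it from GCS.
blob = bucket.blob("crm_export.csv")
blob.upload_from_filename("crm_export.csv")
print(f"Uploaded to gs://{bucket.name}/{blob.name}")
```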
Now, go back to BigQuery, select the crm_data dataset and create a table. You can then browse the CSV files in the bucket and pick the exact file you want.
Double-check the File Format and enable Auto Detect so that your data schema is correctly defined. Select No Partitioning, then hit Create Table.
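The same console steps can also be run as a load job from the bucket, with schema auto-detection enabled and no partitioning configured. The URI and table ID below are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")  # placeholder project ID

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,       # equivalent to enabling Auto Detect in the console
    skip_leading_rows=1,   # the CSV header row
    # No partitioning options are set, matching "No Partitioning" in the console.
)

load_job = client.load_table_from_uri(
    "gs://your-unique-crm-bucket/crm_export.csv",   # placeholder GCS URI
    "your-project-id.crm_data.crm2019",             # placeholder table ID
    job_config=job_config,
)
load_job.result()  # wait for the load to complete
```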
Desktop
Another way to create a table is by uploading a file from your desktop computer. Again, same steps: click Browse and choose the file you want to load into BigQuery. Make sure Auto Detect is enabled and No Partitioning is selected. Then click Create Table, and that's how you load data into BigQuery manually.
Google Drive
Next, loading data from Google Drive. Open the Google Sheet in your Drive, copy the URL and paste it into the Drive URL field. Double-check your file format and go through the same steps as before. In the Advanced options, it is essential to set Header rows to skip to 1 (the first row of the sheet). Hit Create Table, and you are done.
Note that there is no preview mode for this table because the data remains stored in Google Drive, so it is generally recommended to use one of the other table options. If you do need a table connected to a Google Drive file, you can click QUERY TABLE and run a query; you can then see the table's fields under Schema.
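If you query such a Drive-backed table programmatically, the client credentials must carry a Google Drive scope in addition to the BigQuery scope. A minimal sketch, with a placeholder table name:

```python
import google.auth
from google.cloud import bigquery

# Drive-backed external tables require credentials with a Drive scope.
credentials, project = google.auth.default(
    scopes=[
        "https://www.googleapis.com/auth/bigquery",
        "https://www.googleapis.com/auth/drive.readonly",
    ]
)
client = bigquery.Client(credentials=credentials, project=project)

# Placeholder table name -- replace with your Drive-backed table.
query = "SELECT * FROM `your-project-id.crm_data.drive_sheet` LIMIT 10"
for row in client.query(query).result():
    print(row)
```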
*Since Cloud Bigtable is another Google Cloud service and requires specific knowledge, we'll skip it in this article.
Upload Data from Storage to BigQuery Using a Google Cloud Function
To continue our use case, create a dataset called sales_data and a table populated via Upload. Select the exact file (make sure it has a header row so that BigQuery detects the schema and understands every field's data type), upload it here and name the table sales2021.
Imagine your developers need to send you new sales data files via the API every day, and you want to load all these files from Storage into BigQuery. This is where Cloud Functions come in.
Next, head to Cloud Functions and click CREATE FUNCTION. Name your function, set the allocated memory to 1 GB, choose Cloud Storage as your Trigger and Finalise/Create as the Event Type. This means that every time a file is created and added to Cloud Storage, the function will be triggered.
Choose your bucket and insert the Python code and the requirements text, as shown below.
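The original snippet isn't reproduced in this excerpt, but a storage-triggered background function of this kind typically looks like the sketch below: it fires on every new object in the bucket and loads that file into a BigQuery table. The project, dataset and table names are "Your" placeholders to replace, as described in the next step, and requirements.txt would list google-cloud-bigquery.

```python
# main.py -- Cloud Function triggered by a Cloud Storage "Finalise/Create" event.
# requirements.txt should contain: google-cloud-bigquery
from google.cloud import bigquery


def bigQueryImport(event, context):
    """Loads the file that triggered the event into a BigQuery table."""
    client = bigquery.Client()
    # "Your" placeholders -- replace with your own project, dataset and table.
    table_id = "your-project-id.sales_data.sales2021"

    # The event payload carries the bucket and the name of the uploaded file.
    uri = f"gs://{event['bucket']}/{event['name']}"

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # the header row defines the field names
        autodetect=True,       # infer data types from the file
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # wait for the load to finish
    print(f"Loaded {uri} into {table_id}")
```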
After inserting the code, you will have to change some elements, i.e. the project name, bucket name, dataset and table. A good tip is to use CTRL+F and search for "Your": whatever contains "Your" needs its name updated, and you can leave the rest as it is.
Once you have specified all the names, set the function to execute (the entry point) to bigQueryImport. Hit CREATE, and it will take some time for the function to deploy.
To check that your sales.csv data syncs to the BigQuery table (the file name you referenced in your Python code), upload the sales.csv file to the bucket in Storage as well. You'll then see your sales.csv data loaded into the table's preview, and that's it.