Scheduling Cloud Functions to Post to BigQuery

Matthew Hoss
Dec 27, 2020

This is a bare-bones tutorial focused on quickly setting up Cloud Scheduler -> Cloud Function -> BigQuery. Along the way, we’ll also set up a service account and a local dev environment. If you’re new to GCP, this is not a bad place to start.

Why? BigQuery makes a nice storage database, but you need an easy way to ingest data. In this first part, we’ll cover setting up a Cloud Function that can write to BigQuery. In Part II, we’ll use the Cloud Function to regularly query different data sources and pull the results into an ever-growing database. At that point, we might want to use it to train a machine learning algorithm. Who knows. But we can’t do any of this until we establish a database to store everything.

But why write this? Over the past few days, I’ve spent countless hours reading Google docs, Medium posts, and Git repositories, all to achieve a simple goal — schedule a Cloud Function written in Python to post to BigQuery. Easy enough. However, numerous changes to GCP, minor documentation errors, and my own newness to GCP have made this simple task quite frustrating. Google — if you’re listening — fix your tutorials. My goal with this article is to provide an easy-to-follow guide that gives you a few basic tools to build from. Here we go!

As a quick aside — BigQuery is not the best tool for transactional data because response times can be a bit slow. Stay tuned for a post on Cloud SQL and Firestore if that’s your need. However, when it comes to storing enormous amounts of data, BigQuery is a great place to go. This is also a good starting place if you’re new to GCP and want to see how different components work together.

Before we launch into the fun, let’s make sure we have a few things setup first:

  1. Sign up for a GCP account if you have not already (https://cloud.google.com/)
  2. Check your Python Version (python -V). I’m deploying in 3.7 and writing in 3.8. If you run into any issues, leave a note in the comments and I can update the code.
  3. (optional) Install GCP SDK (https://cloud.google.com/sdk/docs/quickstart)

OK, with that out of the way, let’s look at what we’re about to do —

  1. Create a new project
  2. Set up a GCP BigQuery Dataset and Table to house our data
  3. Create a Service Account to run your project
  4. Locally, clone and run a Python script to read and post data into BigQuery
  5. Create and deploy a Cloud Function to run our Python code
  6. Schedule your Cloud Function
  7. Sit back and drink coffee

Let’s do it.

  1. Create a new project

Navigate to GCP (https://console.cloud.google.com/) and login.

Let’s create a new project to house this in (click on My First Project -> New Project). I’m calling mine CloudFunctionTest.

Click My First Project again and click into your project. On to BigQuery.

2. Set up a GCP BigQuery Dataset and Table to house our data

Type BigQuery into the search bar at the top and select BigQuery to open it.

Since we only have one project, BigQuery opens to the cloudfunctiontest-###### project we created above.

First we need to create a new dataset to house our data. Click Create Dataset. I called mine CloudFunctionDataset.

Click on your CloudFunctionDataset on the left, then click the + to create a new table. I called mine CloudFunctionTable. Click + Add Field and add fields for Name (String), TimestampValue (Timestamp), and ID (Integer), then click Create. A couple of general notes — you will usually want to use Timestamp rather than Datetime because it carries a timezone, and avoid field names that match function names (i.e., we call the field TimestampValue and not Timestamp).

Before we leave BigQuery, click the CloudFunctionTable you just created, click Details, and copy the Table ID. It should look like this — ProjectID:Dataset.Table
cloudfunctiontest-299816:CloudFunctionDataset.CloudFunctionTable
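If you’d rather create the table from code than from the console, here’s a minimal sketch using the google-cloud-bigquery client (it assumes you already have credentials configured — we set those up in step 3). Note that the Python client expects dots, project.dataset.table, rather than the colon form the console displays:

from google.cloud import bigquery

client = bigquery.Client()
table_id = "cloudfunctiontest-299816.CloudFunctionDataset.CloudFunctionTable"
schema = [
    bigquery.SchemaField("Name", "STRING"),
    bigquery.SchemaField("TimestampValue", "TIMESTAMP"),
    bigquery.SchemaField("ID", "INTEGER"),
]
client.create_table(bigquery.Table(table_id, schema=schema))  # one-time setup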

3. Create a Service Account to run your project

We will need a service account to run our Python code locally as well as to launch our Cloud Function (further reading here https://cloud.google.com/docs/authentication/getting-started).

In the GCP global search, find Service Accounts and click Create Service Account. Give it a name (I called mine cloudfunctionserviceaccount) and click Create.

Click Done without granting any roles yet (we’re going to add access from IAM instead, so you’ll know where to find it next time).

Click the three dots under Actions, next to your service account, and click Create key. Save it as JSON. We will use it in the next step.

Finally, navigate to IAM under IAM & Admin (or just search IAM), find your service account, and click the pencil (edit member).

Click add another role, and add the following —
BigQuery -> BigQuery Data Editor
Project -> Owner
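As a quick sanity check that the key and roles work, here’s a short sketch that authenticates with the JSON key directly and lists your datasets. The key filename below is a placeholder — use whatever name your downloaded file has:

from google.cloud import bigquery
from google.oauth2 import service_account

# Placeholder filename -- point this at the JSON key you just downloaded.
credentials = service_account.Credentials.from_service_account_file("cloudfunctiontest-299816-key.json")
client = bigquery.Client(credentials=credentials, project=credentials.project_id)
print([dataset.dataset_id for dataset in client.list_datasets()])  # should include CloudFunctionDataset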

We are ready to dive into the code.

4. Locally, clone and run a Python script to read and post data into BigQuery

Now that we have a table set up in BigQuery and our service account, let’s try writing to and reading from our database.

For my local environment (Mac), I create a new directory, cd into it, and run the following —
> python3 -m venv env
> git clone https://github.com/mhoss2008/CloudFunctionBigQuery
> source env/bin/activate
> cd CloudFunctionBigQuery
> pip install -r requirements.txt
Finally, copy the JSON key you downloaded for your service account into the same directory (you can move it later; this is just for testing).

Open main.py and update table_id and GOOGLE_APPLICATION_CREDENTIALS.
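Roughly speaking, the two values you’re updating look something like this (the exact variable layout in the repo may differ — treat this as a sketch, and the key filename is a placeholder):

import os

# Point the client library at your downloaded key...
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "cloudfunctiontest-299816-key.json"
# ...and tell it which table to write to (the Table ID from step 2, in dot format).
table_id = "cloudfunctiontest-299816.CloudFunctionDataset.CloudFunctionTable"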

You can now cycle through the different functions and test loading from JSON, CSV, and a Pandas DataFrame. Try running each of the following —

test_load_BigQuery_JSON(table_id)
test_load_BigQuery_csv(table_id)
test_load_BigQuery_Pandas(table_id)
BigQueryQuery(table_id)
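To give you a feel for what these do, here is a rough sketch of the Pandas loader and the query helper — not the repo’s exact code, but the same client calls:

from datetime import datetime, timezone
import pandas as pd
from google.cloud import bigquery

def test_load_BigQuery_Pandas(table_id):
    # Load a small DataFrame into the table via a load job and wait for it to finish.
    client = bigquery.Client()
    df = pd.DataFrame({
        "Name": ["Alice", "Bob"],
        "TimestampValue": [datetime.now(timezone.utc)] * 2,
        "ID": [1, 2],
    })
    client.load_table_from_dataframe(df, table_id).result()

def BigQueryQuery(table_id):
    # Read everything back so you can confirm the rows landed.
    client = bigquery.Client()
    for row in client.query(f"SELECT Name, TimestampValue, ID FROM `{table_id}`").result():
        print(row.Name, row.TimestampValue, row.ID)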

5. Create and deploy a Cloud Function to run our Python code

Now that we have tested our script locally, it’s time to push it out to a Cloud Function. There are plenty of posts about setting up a full CI/CD pipeline, and we can address that later. For now, let’s just get it up and running. Ditto for using the CLI to push files up to GCP.

For now, copy the function Cloud_Function_load_BigQuery from main.py.
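For reference, an HTTP-triggered entry point like this generally looks along these lines — a sketch of the idea, not necessarily the repo’s exact implementation:

from datetime import datetime, timezone
from google.cloud import bigquery

TABLE_ID = "cloudfunctiontest-299816.CloudFunctionDataset.CloudFunctionTable"

def Cloud_Function_load_BigQuery(request):
    # Cloud Functions passes in a Flask request object for HTTP triggers.
    # On GCP we don't set GOOGLE_APPLICATION_CREDENTIALS -- the function's
    # service account is picked up automatically.
    client = bigquery.Client()
    now = datetime.now(timezone.utc).isoformat()
    rows = [
        {"Name": "scheduled-run", "TimestampValue": now, "ID": 1},
        {"Name": "scheduled-run", "TimestampValue": now, "ID": 2},
    ]
    errors = client.insert_rows_json(TABLE_ID, rows)
    return "Insert errors: {}".format(errors) if errors else "OK"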

Now into GCP.

Go to the Cloud Build API page and enable it (this will let us build our function).

Navigate to Cloud Functions and click Create Function. I named my function UpdateBigQuery.

Change the Runtime to Python 3.7 and set the Entry point to Cloud_Function_load_BigQuery.

Copy and paste the contents of main_GCP.py into main.py.

Copy and paste the contents of requirements_GCP.py into requirements.txt.

Finally, click deploy.

Assuming all the permissions are setup correctly, it should deploy without issue.

Once it has deployed, open your function, click the Testing tab, and click Test the function.

If everything works, 2 new entries should be added to BigQuery. You can open up BigQuery or fire up the BigQueryQuery definition in your Python script to verify.

If you run into any errors at this point, please post in the comments section of this article (and if you have a solution, please post that too!).

6. Schedule your Cloud Function

Home stretch — let’s get that function scheduled.

Go to the GCP global search and open Cloud Functions (if you’re not already there). Go to the Trigger tab and copy the trigger URL (example — https://us-west2-cloudfunctiontest-299816.cloudfunctions.net/UpdateBigQuery)

Back to GCP global search and open Service Accounts. Copy the email for the service account (example — cloudfunctionserviceaccount@cloudfunctiontest-299816.iam.gserviceaccount.com)

Back to GCP global search and open Cloud Scheduler

Click Create Job. Select your region and click Next. Enter a name for the job, set a frequency in cron format (0 * * * * will run once an hour), set the target as HTTP, and for the URL, paste in the Cloud Function trigger URL. Click on advanced, set the Auth header to Add OIDC token, and paste in the email address of the service account.
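If you want to see roughly what Cloud Scheduler is doing under the hood (or test the authenticated call yourself), here’s a sketch that fetches an OIDC token for the function’s URL and posts to it. It assumes GOOGLE_APPLICATION_CREDENTIALS still points at the service account key from step 3:

import requests
import google.auth.transport.requests
from google.oauth2 import id_token

# The trigger URL you copied above; the OIDC token's audience must match it.
url = "https://us-west2-cloudfunctiontest-299816.cloudfunctions.net/UpdateBigQuery"

auth_request = google.auth.transport.requests.Request()
token = id_token.fetch_id_token(auth_request, url)  # mints an identity token for this audience
response = requests.post(url, headers={"Authorization": f"Bearer {token}"})
print(response.status_code, response.text)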

That’s it! We made it! Now sit back and drink coffee as Cloud Functions do all your work. Thanks for joining and leave any questions in the comments.
