How to create a BigQuery sync job

Get started by naming your job and selecting Load Target -> BigQuery.

(Currently, Google Sheets and BigQuery are supported as load targets. Learn how to set up a Google Sheets sync job.)

Screen_Shot_2018-08-27_at_2.34.12_PM.png

About BigQuery

A BigQuery job in Local Hero uploads data from a source CSV file into a destination table in BigQuery, the paid, petabyte-scale data warehousing and analytics service within the Google Cloud Platform. BigQuery is ideal for data sets that are likely to be too large for Google Sheets. However, it is not a transactional database, so it works best for data sets that are transacted upon elsewhere (such as in an SIS) but that require extremely high-performance querying, analysis, and custom dashboard visualizations.
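
Once a sync job has populated a table, the data can be queried from the BigQuery console or client libraries. A minimal sketch with the Python client, where the project, dataset, table, and column names are placeholders:

    # Minimal sketch: querying a table that a sync job keeps populated.
    # The project, dataset, table, and column names are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-school-project")

    sql = """
        SELECT grade_level, COUNT(*) AS students
        FROM `my-school-project.sis_data.enrollments`
        GROUP BY grade_level
        ORDER BY grade_level
    """

    for row in client.query(sql).result():
        print(row.grade_level, row.students)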

By our estimates, using BigQuery for most SIS-scale use cases will end up being free or extremely inexpensive; however, Google does require that a billing credit card be set up in order for you to use it. Learn how Amplified IT can help schools get set up with GCP billing via purchase order instead.

Setting source data and target

From Step 1, clicking Next will show the Source Data & Target step. 

On the left (see below), select one or more source files that you wish to push to a destination table in BigQuery. (Additional files can be joined to the first file by selecting CHOOSE FILE, provided each additional file has a unique primary key that matches a value in the first file.)
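
Local Hero performs this matching for you; purely as an illustration of the concept, the join is equivalent to the following, where the file and column names are hypothetical:

    # Illustration only: how an additional file is matched to the first file on
    # a unique primary key. File and column names are hypothetical.
    import pandas as pd

    students = pd.read_csv("students.csv")   # first source file
    contacts = pd.read_csv("contacts.csv")   # additional file keyed by student_id

    # Each contacts row is matched to the students row with the same student_id.
    combined = students.merge(contacts, on="student_id", how="left")
    print(combined.head())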

On the right (see below), you'll be asked to identify a target table in BigQuery.  Clicking Load Projects will load all GCP projects where BigQuery privileges have been granted to the user who has authorized Local Hero (click the gear icon to see who this user is, if you've forgotten). 

Screen_Shot_2018-08-27_at_3.13.56_PM.png
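
If you want to double-check which projects the authorizing account can see, here is a short sketch with the BigQuery Python client (run it with that account's credentials; the project ID is a placeholder):

    # Sketch: listing projects visible to the account that authorized Local Hero.
    # "my-school-project" is a placeholder; use any project that account can access.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-school-project")
    for project in client.list_projects():
        print(project.project_id)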

If you have an existing GCP project with BigQuery enabled and a target dataset and table, but don't see the project listed...

...you may need to add the Local Hero user to the GCP project. You can do so through the IAM & admin section of the Google Cloud Platform dashboard. The minimum privilege required is BigQuery Data Editor.

iam-admin.png
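
After adding the user, you can sanity-check the grant with the BigQuery Python client; a sketch in which the project, dataset, and table names are placeholders:

    # Sketch: verifying that the authorizing account can reach the target table
    # after BigQuery Data Editor has been granted. Names are placeholders.
    from google.cloud import bigquery
    from google.api_core.exceptions import Forbidden, NotFound

    client = bigquery.Client(project="my-school-project")

    try:
        table = client.get_table("my-school-project.sis_data.enrollments")
        print("Access OK:", table.full_table_id)
    except (Forbidden, NotFound) as exc:
        print("Access problem:", exc)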

If you DON'T have an existing GCP project, you'll need to set one up with a target BigQuery dataset and table

In order for a BigQuery sync to function properly, a target dataset and table need to exist in BigQuery and be accessible (with a minimum of BigQuery Data Editor permission) to the user who authorized Local Hero.

To create a BigQuery dataset, visit https://bigquery.cloud.google.com/. If you've never created a Google Cloud Platform project with BigQuery, you'll be prompted to create one.

Screen_Shot_2018-08-27_at_2.49.51_PM.png

Clicking Create a project should take you to a page where you can choose your organization (typically your domain) and associate a billing account.

Screen_Shot_2018-08-27_at_2.54.35_PM.png

If you've never enabled billing on the Google user account, you'll need to do so before you can enable the BigQuery service.  Typically, Google will offer a $300 introductory "free credit" valid for 12 months.  Accepting these free credits will require a credit card. Learn how Amplified IT can help schools get set up with these same credits and GCP billing via purchase order instead.

Once you have a project with BigQuery enabled, create a dataset by clicking Create dataset, give it a name, and configure your location and data expiration preferences.

create-dataset.png

Screen_Shot_2018-08-27_at_2.58.10_PM.png
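
If you'd rather script this step, the same dataset can be created with the BigQuery Python client; a sketch where the project ID, dataset ID, and location are placeholders:

    # Sketch: creating the dataset programmatically instead of in the web console.
    # The project ID, dataset ID, and location are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-school-project")

    dataset = bigquery.Dataset("my-school-project.sis_data")
    dataset.location = "US"                      # data location preference
    dataset.default_table_expiration_ms = None   # None = tables never expire

    dataset = client.create_dataset(dataset, exists_ok=True)
    print("Created dataset", dataset.full_dataset_id)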

Once you've created a dataset, you'll need at least one table within it.  Select the dataset in the left panel and click Create table to the right.

create-tavke.png

In the fly-out, you'll be able to specify a table name and field structure (via the + Add field button). The field structure of this table should match the structure and column data types found in your source CSV data.

create-table2.png
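
The table can also be created with the Python client; in this sketch the field names and types are hypothetical and should be replaced to mirror the columns in your source CSV:

    # Sketch: creating a destination table whose schema mirrors the source CSV.
    # The dataset, table, and field names/types are hypothetical examples.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-school-project")

    schema = [
        bigquery.SchemaField("student_id", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("last_name", "STRING"),
        bigquery.SchemaField("first_name", "STRING"),
        bigquery.SchemaField("grade_level", "INTEGER"),
        bigquery.SchemaField("enrollment_date", "DATE"),
    ]

    table = bigquery.Table("my-school-project.sis_data.enrollments", schema=schema)
    table = client.create_table(table, exists_ok=True)
    print("Created table", table.full_table_id)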

Once you've set up your target BigQuery dataset and table, click NEXT.

Screen_Shot_2018-08-27_at_4.58.51_PM.png

In the Target Mappings step, click the gear icon next to each field to map it to the correct source data field in your CSV file.

gear.png

Within the mapping dialog for a given field, you can concatenate fields, insert text, and use a small set of date, string, and anonymization functions. Also note the validation section, which provides guidance on BigQuery's data type requirements. The sync job will fail if validation fails within the preview.

validation2.png

validation.png
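
To give a sense of what these mapping functions do, here is a plain-Python illustration of concatenation, anonymization, and a date-type check. These are not Local Hero's actual functions, and the column names are hypothetical:

    # Illustration only: the kinds of transformations a field mapping can apply.
    # Not Local Hero's actual functions; column names are hypothetical.
    import hashlib
    import pandas as pd

    df = pd.read_csv("students.csv")

    # Concatenate fields with inserted text, e.g. "Doe, Jane"
    df["display_name"] = df["last_name"] + ", " + df["first_name"]

    # Anonymize an identifier by hashing it
    df["student_hash"] = df["student_id"].astype(str).map(
        lambda s: hashlib.sha256(s.encode()).hexdigest()
    )

    # A BigQuery DATE column expects values like 2018-08-27; rows that cannot be
    # parsed as dates are the sort of thing that fails validation in the preview.
    df["enrollment_date"] = pd.to_datetime(df["enrollment_date"], errors="raise").dt.date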

Click Save after each field mapping, then click NEXT to reach the Choose Action step. Because BigQuery is a non-transactional data store, append and clear / replace are the only options available. (For record-level update functionality, see Sheets sync jobs.)

Screen_Shot_2018-08-27_at_5.40.22_PM.png
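
These two actions correspond to BigQuery's standard write dispositions for load jobs. A sketch of an equivalent CSV load with the Python client, where the table ID and file name are placeholders:

    # Sketch: the two write modes expressed as BigQuery load-job write dispositions.
    # The table ID and file name are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-school-project")
    table_id = "my-school-project.sis_data.enrollments"

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # skip the CSV header row
        autodetect=True,
        # "Append" keeps existing rows and adds the new ones:
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        # "Clear / replace" would instead use:
        # write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )

    with open("students.csv", "rb") as f:
        load_job = client.load_table_from_file(f, table_id, job_config=job_config)
    load_job.result()  # wait for the load to finish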

Click NEXT and you'll be prompted for Automation Settings.

An email notification can optionally be sent to a designated list of email addresses, and sync jobs can be run:

  • Manually
  • Automatically at a specific time of day
  • Every hour

Screen_Shot_2018-08-27_at_5.43.08_PM.png