The Google BigQuery Sink Connector allows you to continuously store data from your Kafka topics in Google BigQuery. In this guide, we will walk you through creating a Google BigQuery Sink Connector.

Get Started

Create a Kafka Cluster

If you do not already have a Kafka cluster and topic, follow these steps to create one.

Prepare the Google BigQuery Environment

If you already have a Google BigQuery environment with the following information, skip this step and continue from the Create the Connector section.

  • a project name
  • a dataset
  • an associated Google service account with permission to modify the BigQuery dataset

Go to Google Cloud BigQuery. Create or select a project. Note that this project name will be used later to configure the connector.

Create a dataset for the project. Note that this dataset name will be used later to configure the connector.

Default configurations should be fine for this guide.
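If you prefer the command line, the same dataset can be created with the bq tool from the Google Cloud SDK. The project and dataset names below are only examples; substitute your own:

# create a dataset in the example project "bigquerysinkproject"
bq mk --dataset --location=US bigquerysinkproject:kafka_sink_dataset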

Next, we will create a service account that we will later connect to this project. Go to the Google Cloud Console, then “IAM & Admin” and “Service accounts”.

Click on “Create Service Account”.

Give a name to your service account.

Configure permissions for the service account. To keep it simple, we will grant this service account the “Owner” role, which allows everything. You may want to be more specific.

The rest of the config can be left empty. After creating the service account, we will go to its settings to attach a key to it. Open the “Actions” menu and select “Manage keys”.

Then create a new key, if you don’t have one already. We will select the “JSON” key type as recommended.
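Alternatively, the service account, its role, and its key can all be created from the command line with gcloud. This is only a sketch; the names match the example key shown below, so adjust them to your own project:

# create the service account
gcloud iam service-accounts create serviceforbigquerysink \
  --display-name="serviceforbigquerysink"

# grant it the broad Owner role on the project (be more specific in production)
gcloud projects add-iam-policy-binding bigquerysinkproject \
  --member="serviceAccount:serviceforbigquerysink@bigquerysinkproject.iam.gserviceaccount.com" \
  --role="roles/owner"

# download a JSON key for the service account
gcloud iam service-accounts keys create key.json \
  --iam-account="serviceforbigquerysink@bigquerysinkproject.iam.gserviceaccount.com"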

We will use the content of this JSON file when creating the connector. For reference, it should look something like this:

{
  "type": "service_account",
  "project_id": "bigquerysinkproject",
  "private_key_id": "b5e8b29ed62171aaaa2b5045f04826635bcf78c4",
  "private_key": "-----BEGIN PRIVATE A_LONG_PRIVATE_KEY_WILL_BE_HERE PRIVATE KEY-----\n",
  "client_email": "serviceforbigquerysink@bigquerysinkproject.iam.gserviceaccount.com",
  "client_id": "109444138898162952667",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/serviceforbigquerysink%40bigquerysinkproject.iam.gserviceaccount.com"
}

Then we need to grant this service account access to the dataset we created. From the BigQuery Console, go to your dataset settings and click “Share”.

The “Dataset Permissions” view will open. Click “Add Principal” and add the service account we created as a principal. To keep this example simple, we will assign it the “Owner” role; you may want to be more specific here.
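If you prefer the command line, dataset-level access can also be granted by editing the dataset’s access list with bq. This is a sketch using the example names from earlier in this guide:

# dump the current dataset definition
bq show --format=prettyjson bigquerysinkproject:kafka_sink_dataset > dataset.json

# edit dataset.json and add an entry like this to the "access" array:
#   {"role": "OWNER", "userByEmail": "serviceforbigquerysink@bigquerysinkproject.iam.gserviceaccount.com"}

# apply the updated access list
bq update --source dataset.json bigquerysinkproject:kafka_sink_dataset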

With this step, the BigQuery dataset should be ready to use with the connector.

Create the Connector

Go to the Connectors tab, and create your first connector by clicking the New Connector button.

Choose Google BigQuery Sink Connector as the connector type.

Enter the required properties.
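Under the hood, these properties map to a standard Kafka Connect configuration. The sketch below assumes the open-source BigQuery Sink Connector class (com.wepay.kafka.connect.bigquery.BigQuerySinkConnector); the exact property names and the topic name mytopic are examples and may differ depending on your connector version and platform:

{
  "name": "my-bigquery-sink",
  "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
  "topics": "mytopic",
  "project": "bigquerysinkproject",
  "defaultDataset": "kafka_sink_dataset",
  "keyfile": "<path to or content of the service account JSON key>",
  "autoCreateTables": "true",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "value.converter.schemas.enable": "true"
}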

Note that the Google BigQuery Sink Connector expects the data to have a schema. That is why we choose JsonConverter with the schema included. Alternatively, AvroConverter with Schema Registry can be used as well.
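For example, with JsonConverter and schemas enabled, each message you produce is expected to carry both the schema and the payload, roughly like this (the field names are just an illustration):

{
  "schema": {
    "type": "struct",
    "fields": [
      { "field": "id", "type": "int32", "optional": false },
      { "field": "name", "type": "string", "optional": true }
    ],
    "optional": false,
    "name": "example_record"
  },
  "payload": { "id": 1, "name": "hello" }
}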

The advanced screen is for any other configuration that the selected connector supports. At the top of this screen, you can find a link to related documentation. We can proceed with what we have and click the Connect button directly.

Congratulations! You have created a Google BigQuery Sink Connector.

As you put data into your selected topics, the data will be written into Google BigQuery. You can view it from the Google BigQuery Console.
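For a quick check, you can run a query against the destination table from the BigQuery Console. The connector typically creates a table named after the topic; the names below are the example ones used throughout this guide:

SELECT *
FROM `bigquerysinkproject.kafka_sink_dataset.mytopic`
LIMIT 10;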