StarTree
This tutorial shows how to integrate Upstash Kafka with StarTree
StarTree provides a fully managed, Apache Pinot based real-time analytics database on its cloud environment.
Upstash Kafka Setup
Create a Kafka cluster using Upstash Console or Upstash CLI by following Getting Started.
Create one topic by following the creating topic steps. This topic will be the source for the Apache Pinot table running on StarTree. Let’s name it “transcript” for this example tutorial.
StarTree Setup
To be able to use StarTree cloud, you first need to create an account.
There are two steps to initialize the cloud environment on StarTree. First, you need to create an organization. Next, you need to create a workspace under this new organization.
For these setup steps, you can also follow StarTree quickstart.
Connect StarTree Cloud to Upstash Kafka
Once you created your workspace, open Data Manager under the Services
section
in your workspace. Data Manager is where we will connect Upstash Kafka and work
on the Pinot table.
To connect Upstash Kafka with StarTree, create a new connection in Data Manager.
As the connection type, select Kafka.
In Kafka connection settings, fill the following options:
-
Connection Name: It can be anything. It is up to you.
-
Broker Url: This should be the endpoint of your Upstash Kafka cluster. You can find it in the details section in your Upstash Kafka cluster.
-
Authentication Type:
SASL
-
Security Protocol:
SASL_SSL
-
SASL Mechanism:
SCRAM-SHA-256
-
Username: This should be the username given in the details section in your Upstash Kafka cluster.
-
Password: This should be the password given in the details section in your Upstash Kafka cluster.
To proceed, you need to test the connection first. Once the test connection is successful, then you can create the connection.
Now you have a connection between Upstash Kafka and StarTree Cloud! The next step is to create a dataset to store data streamed from Upstash Kafka.
Let’s return to the Data Manager overview page and create a new dataset.
As the connection type, select Kafka again.
Now you can select the Kafka connection you created for connecting Upstash Kafka.
In the next step, you need to name your dataset, provide the Kafka topic to be the source of this new dataset and define the data format. We can give “transcript” as the topic and select JSON as the data format.
To proceed to the next step, we must first produce a message in our Kafka topic. StarTree doesn’t allow us to go to the next step before it validates the connection is working, and data is being streamed correctly.
To make StarTree validate our connection, let’s turn back to the Upstash console and create some events for our Kafka topic. To do this, click on your Kafka cluster on Upstash console and go to the “Topics” section. Open the source topic, which is “transcript” in this case. Select the Messages tab, then click Produce a new message. Send a message in JSON format like the one below:
{
"studentID": 205,
"firstName": "Natalie",
"lastName": "Jones",
"gender": "Female",
"subject": "Maths",
"score": 3.8,
"timestampInEpoch": 1571900400000
}
Now go back to the dataset details steps on StarTree Data Manager.
After you click next, StarTree will consume the message in the source Kafka topic to verify the connection. Once it consumes the message, the message will be displayed.
In the next step, StarTree extracts the data model from the message you sent.
If there is any additional configuration about the model of the data coming from the source topic, you can add it here.
To keep things simple, we will click next without changing anything.
The last step is for more configuration of your dataset. We will click next again and proceed to review. Click “Create Dataset” to finalize the dataset.
Query Data
Open the dataset you created on StarTree Data Manager and navigate to the query console.
You will be redirected to Pinot Query Console running on StarTree cloud.
When you run the following SQL Query, you will see the data that came from Upstash Kafka into your dataset.
select * from <Dataset-name> limit 10
Was this page helpful?