This tutorial shows how to integrate Upstash Kafka with Apache Spark.
Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.
Create a Kafka cluster using Upstash Console or Upstash CLI by following Getting Started.
Create a topic by following the topic creation steps. Let's name the topic "sentence".
If you already have a project and want to implement Upstash Kafka and Apache Spark integration into it, you can skip this section and continue with Add Spark and Kafka into the Project.
Install Maven to your machine by following Maven Installation Guide.
Run `mvn --version` in a terminal or command prompt to make sure Maven is installed.
It should print the version of Maven you have installed.
To create the Maven project:
In your terminal or command prompt, go to the folder where you want to create the project by running `cd <folder path>`.
Run the following command:
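A minimal example, assuming the standard quickstart archetype; the group and artifact IDs below are placeholders, so rename them to fit your project:

```shell
mvn archetype:generate \
  -DgroupId=com.example.kafkasparkinteg \
  -DartifactId=kafkasparkinteg-app \
  -DarchetypeArtifactId=maven-archetype-quickstart \
  -DarchetypeVersion=1.4 \
  -DinteractiveMode=false
```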
Open the project folder in an IDE with Maven support, such as IntelliJ IDEA, Visual Studio Code, or Eclipse. Add the following Spark dependencies inside the `dependencies` tag of the pom.xml file:
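A sketch of the dependency entries, assuming a Spark 3.x release built for Scala 2.13; adjust the versions so they match the Spark version you use:

```xml
<!-- Spark core engine -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.13</artifactId>
  <version>3.5.1</version>
</dependency>
<!-- Spark SQL / Dataset API -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.13</artifactId>
  <version>3.5.1</version>
</dependency>
<!-- Kafka source and sink for Spark SQL -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-kafka-0-10_2.13</artifactId>
  <version>3.5.1</version>
</dependency>
```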
Import the following packages first:
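These are the imports the producer sketch below relies on; your own code may need a slightly different set:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import java.util.Collections;
```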
To send messages to Kafka from Spark, use the following code after replacing the
UPSTASH-KAFKA-*
placeholders with your cluster information:
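A minimal sketch of the producer side, assuming a local Spark session inside your application's main method, a topic named "sentence", and SCRAM-SHA-256 credentials taken from the Upstash console; the endpoint, username, and password placeholders are the values to replace:

```java
SparkSession spark = SparkSession.builder()
    .appName("upstash-kafka-producer")
    .master("local[*]")
    .getOrCreate();

// A single-row Dataset; the "value" column becomes the Kafka message payload.
Dataset<Row> sentence = spark
    .createDataset(
        Collections.singletonList("This is a sample sentence sent from Spark to Upstash Kafka"),
        Encoders.STRING())
    .toDF("value");

// Write the row to the "sentence" topic over SASL_SSL.
sentence.write()
    .format("kafka")
    .option("kafka.bootstrap.servers", "UPSTASH-KAFKA-ENDPOINT:9092")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "SCRAM-SHA-256")
    .option("kafka.sasl.jaas.config",
        "org.apache.kafka.common.security.scram.ScramLoginModule required "
            + "username=\"UPSTASH-KAFKA-USERNAME\" password=\"UPSTASH-KAFKA-PASSWORD\";")
    .option("topic", "sentence")
    .save();

spark.stop();
```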
Before running the project, open the topic's messages view in the Upstash console. When you run your project, you can observe the new message arriving at the topic on the Upstash console.
If you have not already imported the following packages, import them first:
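Assuming the consumer sketch below, which uses Spark SQL functions to split the messages into words and count them, the imports would be:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;
import static org.apache.spark.sql.functions.split;
```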
To receive and process the messages from the Kafka topic with Apache Spark, use the following code after replacing the UPSTASH-KAFKA-* placeholders with your cluster information:
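A minimal sketch of the consumer side, again assuming a local Spark session and the "sentence" topic; it reads the topic from the earliest offset as a batch, splits each message into words, and prints the count of each word:

```java
SparkSession spark = SparkSession.builder()
    .appName("upstash-kafka-consumer")
    .master("local[*]")
    .getOrCreate();

// Read all messages currently in the topic as a batch Dataset.
Dataset<Row> messages = spark.read()
    .format("kafka")
    .option("kafka.bootstrap.servers", "UPSTASH-KAFKA-ENDPOINT:9092")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "SCRAM-SHA-256")
    .option("kafka.sasl.jaas.config",
        "org.apache.kafka.common.security.scram.ScramLoginModule required "
            + "username=\"UPSTASH-KAFKA-USERNAME\" password=\"UPSTASH-KAFKA-PASSWORD\";")
    .option("subscribe", "sentence")
    .option("startingOffsets", "earliest")
    .load();

// Decode the message payload, split it into words, and count occurrences per word.
Dataset<Row> wordCounts = messages
    .selectExpr("CAST(value AS STRING) AS sentence")
    .select(explode(split(col("sentence"), " ")).alias("word"))
    .groupBy("word")
    .count();

wordCounts.show();

spark.stop();
```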
You can verify that the sentence you sent appears on your console together with the number of occurrences of each word.