Proton is a unified streaming SQL engine that combines streaming and historical data processing in a single binary. It helps data engineers and platform engineers solve complex real-time analytics use cases, and it powers the Timeplus streaming analytics platform.

Both Timeplus and Proton can be integrated with Upstash Kafka. Timeplus provides an intuitive web UI that minimizes SQL typing and clicks. Proton provides a SQL interface to read data from and write data to Upstash Kafka.

Upstash Kafka Setup

Create a Kafka cluster using the Upstash Console or the Upstash CLI by following the Getting Started guide.

Create two topics by following the steps for creating a topic. Name the first topic input; Proton will read the stream from this topic. Name the second topic output; it will receive the stream written by Proton.

Set Up Proton

Proton is a single binary for Linux and Mac, and is also available as a Docker image. You can install it in several ways; the steps below use Docker:

With Docker engine installed on your local machine, pull and run the latest version of the Proton Docker image.

docker run -d --pull always --name proton ghcr.io/timeplus-io/proton:latest

Connect to your proton container and run the proton-client tool to connect to the local Proton server:

docker exec -it proton proton-client -n

Create an External Stream to read Kafka data

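An external stream is how Proton connects to a Kafka cluster to read and write data. The statement below defines a stream over the input topic; replace the brokers, username, and password values with those of your own Upstash cluster.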

CREATE EXTERNAL STREAM input(
    requestedUrl string,
    method string,
    ipAddress string,
    requestDuration int)
SETTINGS type='kafka', 
         brokers='grizzly-1234-us1-kafka.upstash.io:9092',
         topic='input',
         data_format='JSONEachRow',
         security_protocol='SASL_SSL', 
         sasl_mechanism='SCRAM-SHA-256',
         username='..', 
         password='..';

Run Streaming SQL

Then you can run the following streaming SQL:

SELECT * FROM input WHERE _tp_time > earliest_ts();

Let’s go to the Upstash UI and post a JSON message to the input topic:

{
  "requestedUrl": "http://www.internationalinteractive.name/end-to-end",
  "method": "PUT",
  "ipAddress": "186.58.241.7",
  "requestDuration": 678
}

Right after the message is posted, you should be able to see it in the Proton query result.
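Because the stream is declared with data_format='JSONEachRow', each Kafka message is expected to be a single JSON object whose keys match the column names. The Python sketch below checks a message locally before you post it. The COLUMNS mapping and the check_message helper are illustrative names, not part of Proton, and the type mapping (string to str, int to int) is a simplifying assumption; Proton does its own parsing.

```python
import json

# Column names and Python-side types mirroring the `input` stream definition.
# This mapping is an assumption made for illustration only.
COLUMNS = {
    "requestedUrl": str,
    "method": str,
    "ipAddress": str,
    "requestDuration": int,
}


def check_message(raw: str) -> list:
    """Return a list of problems found in one JSON message (empty if none)."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(record, dict):
        return ["message must be a single JSON object"]

    problems = []
    for name, expected in COLUMNS.items():
        if name not in record:
            problems.append(f"missing key: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(record[name], bool) or not isinstance(record[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    return problems


message = """{
  "requestedUrl": "http://www.internationalinteractive.name/end-to-end",
  "method": "PUT",
  "ipAddress": "186.58.241.7",
  "requestDuration": 678
}"""
print(check_message(message))  # prints []
```

An empty list means the message has every key the stream declares, with values of the expected kind.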

Apply Streaming ETL and Write Data to Upstash Kafka

Cancel the previous streaming SQL and run the following one, which masks each IP address by replacing it with its MD5 hash.

SELECT now64() AS time, requestedUrl, method, lower(hex(md5(ipAddress))) AS ip
FROM input WHERE _tp_time > earliest_ts();

You will see results like the following:

┌────────────────────time─┬─requestedUrl────────────────────────────────────────┬─method─┬─ip───────────────────────────────┐
│ 2024-01-10 03:43:16.997 │ http://www.internationalinteractive.name/end-to-end │ PUT    │ d5b267be9018abbe87c1357723f2520c │
│ 2024-01-10 03:43:16.997 │ http://www.internationalinteractive.name/end-to-end │ PUT    │ d5b267be9018abbe87c1357723f2520c │
└─────────────────────────┴─────────────────────────────────────────────────────┴────────┴──────────────────────────────────┘
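The expression lower(hex(md5(ipAddress))) hashes each address with MD5 and renders the digest as 32 lowercase hexadecimal characters. If you want to reproduce the masking outside Proton, a rough Python equivalent is sketched below. mask_ip is a hypothetical helper name, and the sketch assumes the address is hashed as UTF-8 bytes.

```python
import hashlib


def mask_ip(ip_address: str) -> str:
    """Return the MD5 digest of the address as lowercase hex."""
    # hexdigest() already yields lowercase hex, mirroring lower(hex(md5(...))).
    return hashlib.md5(ip_address.encode("utf-8")).hexdigest()


masked = mask_ip("186.58.241.7")
print(masked)       # a 32-character hex string
print(len(masked))  # 32
```

The same input always produces the same digest, which is why both rows in the result above carry an identical ip value.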

To write the data back to Kafka, you need to create a new external stream (with output as the topic name) and use a materialized view as the background job that continuously writes data to the output stream.

CREATE EXTERNAL STREAM target(
    _tp_time datetime64(3),
    time datetime64(3),
    requestedUrl string,
    method string,
    ip string)
SETTINGS type='kafka',
         brokers='grizzly-1234-us1-kafka.upstash.io:9092',
         topic='output',
         data_format='JSONEachRow',
         security_protocol='SASL_SSL', 
         sasl_mechanism='SCRAM-SHA-256',
         username='..', 
         password='..';

-- setup the ETL pipeline via a materialized view
CREATE MATERIALIZED VIEW mv INTO target AS 
    SELECT now64() AS _tp_time, now64() AS time, requestedUrl, method, lower(hex(md5(ipAddress))) AS ip FROM input;

Go back to the Upstash UI. Post a few more messages to the input topic; they should appear in the output topic with the raw IP addresses masked.
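To produce a handful of test messages for this step, a short script can generate JSON payloads with the same keys as the input stream. This is only a sketch: sample_messages is a hypothetical helper, and the URLs, methods, and addresses it emits are made-up test values (the addresses come from the 203.0.113.0/24 range reserved for documentation).

```python
import json
import random


def sample_messages(count: int, seed: int = 0) -> list:
    """Build `count` JSON strings shaped like rows of the `input` stream."""
    rng = random.Random(seed)  # seeded so repeated runs give the same payloads
    messages = []
    for i in range(count):
        record = {
            "requestedUrl": f"http://example.com/page/{i}",
            "method": rng.choice(["GET", "POST", "PUT"]),
            # 203.0.113.0/24 is reserved for documentation examples.
            "ipAddress": f"203.0.113.{rng.randint(1, 254)}",
            "requestDuration": rng.randint(1, 1000),
        }
        messages.append(json.dumps(record))
    return messages


for line in sample_messages(3):
    print(line)
```

Each printed line is one JSON object that can be pasted into the Upstash UI as a message on the input topic.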

Congratulations! You just set up a streaming ETL pipeline with Proton, without any JVM components. Check out https://github.com/timeplus-io/proton for more details, or join the Slack workspace at https://timeplus.com/slack