How ksqlDB works

Who am I?

Michael Drogalis

Principal Product Manager

ksqlDB & Stream Processing @ Confluent

How do I use my data in real-time?

It's not exactly simple.

What if it looked more like Postgres?

How you use it

The basics

Streams

CREATE STREAM readings (sensor VARCHAR KEY,
                        reading DOUBLE,
                        location VARCHAR)
  WITH (kafka_topic='readings',
        value_format='json',
        partitions=3);
              

Rows

INSERT INTO readings (sensor, reading, location)
    VALUES ('sensor-1', 45, 'wheel');

INSERT INTO readings (sensor, reading, location)
    VALUES ('sensor-2', 41, 'motor');
  
INSERT INTO readings (sensor, reading, location)
    VALUES ('sensor-1', 42, 'wheel');
  
INSERT INTO readings (sensor, reading, location)
    VALUES ('sensor-3', 42, 'muffler');

...

                

Transforming a stream

                        -- pq1
CREATE STREAM clean AS
  SELECT sensor,
         reading,
         UCASE(location) AS location
  FROM readings
  EMIT CHANGES;
    

Filtering rows out of a stream

                    -- pq1
CREATE STREAM clean AS
  SELECT 
    sensor,
    reading,
    UCASE(location) AS location
  FROM readings
  EMIT CHANGES;
                    -- pq2
CREATE STREAM high_readings AS
    SELECT sensor, reading, location
    FROM clean
    WHERE reading > 41
    EMIT CHANGES;

Combining many operations into one

-- pq1
CREATE STREAM high_pri AS
    SELECT sensor,
           reading,
           UCASE(location) AS location
    FROM readings
    WHERE reading > 41
    EMIT CHANGES;
                

Processing with multiple consumers

-- pq1
CREATE STREAM high_pri AS
    SELECT sensor,
           reading,
           UCASE(location) AS location
    FROM readings
    WHERE reading > 41
    EMIT CHANGES;
-- pq2
CREATE STREAM by_location AS
    SELECT *
    FROM high_pri
    PARTITION BY location
    EMIT CHANGES;
-- pq3
CREATE STREAM s1_by_location AS
  SELECT sensor,
         reading,
         UCASE(location) AS location
  FROM s2
  EMIT CHANGES;

Stateful functionality

Materializing a view from a stream

                    -- pq1
CREATE TABLE avg_readings AS
    SELECT sensor,
           AVG(reading) AS avg
    FROM readings
    GROUP BY sensor
    EMIT CHANGES;

Automatic repartitioning

                    -- pq1
[[ internal ]]
                    -- pq2
CREATE TABLE part_avg AS
    SELECT area,
           AVG(reading) AS avg
    FROM readings
    GROUP BY area
    EMIT CHANGES;

Replaying from changelogs

                    -- pq1
CREATE TABLE part_avg AS
    SELECT area,
           AVG(reading) AS avg
    FROM readings
    GROUP BY area
    EMIT CHANGES;

Replaying from a compacted topic

                    -- pq1
CREATE TABLE part_avg AS
    SELECT area,
           AVG(reading) AS avg
    FROM readings
    GROUP BY area
    EMIT CHANGES;

Scaling and fault tolerance

Scaling: 1x

Scaling: 2x

Scaling: 8x

Scaling with state

High availability

Thanks for watching!

Learn more and start using ksqlDB

at

ksqldb.io