
Apache Kafka – Scaling SQL with KStream and Kafka Connect - Antwnis
https://www.youtube.com/watch?v=denwxORF3pU
======
barrkel
I had an entertaining few days working with Confluent's Kafka Connect stuff. I
was trying to connect a MySQL table to Kafka and then on out to Hadoop.
Amusingly, Kafka Connect wanted to use a queue with the same name as my table
(MySQL or Hive / Hadoop, I don't recall which end); but of course since Kafka
doesn't have namespaces, I had better hope that my table name is unique across
the whole cluster!

It was around about then that I figured out that Confluent was a bunch of kids
playing at building stuff. I have zero doubt that it's a good base if you have
an enormous firehose of data, but look for features beyond raw performance and
basic correctness, and it's underdeveloped. Basic stuff like back-pressure:
don't expect it. Either overallocate your storage or make sure your consumers
are always faster than your producers.

~~~
EdwardDiego
> Amusingly, Kafka Connect wanted to use a queue with the same name as my
> table (MySQL or Hive / Hadoop, I don't recall which end)

It'll be the MySQL end if it's a Connect source as opposed to a sink.

Two options: in your Connect config you can specify a topic prefix, which is
prepended to the table name, or, if you use a custom query, the topic prefix
is used as the entire topic name.
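A rough sketch of that naming rule in Java. It is a simplified model of the documented behaviour of the JDBC source connector's `topic.prefix` setting, not the connector's actual code. `connector.class`, `topic.prefix` and `query` are real config keys; the prefix values, the `orders` table and the query are made up for illustration.

```java
import java.util.Properties;

// Simplified model of how the Confluent JDBC source connector names its output
// topics. This is NOT the connector's code; it only mirrors the documented
// behaviour of the `topic.prefix` setting.
public class JdbcTopicNaming {

    // Table mode: topic = prefix + table name. Without a prefix, two
    // databases that both have an "orders" table would collide on one topic.
    public static String topicForTable(Properties cfg, String table) {
        return cfg.getProperty("topic.prefix", "") + table;
    }

    // Custom-query mode: there is no table name to append, so the prefix
    // is used as the entire topic name.
    public static String topicForQuery(Properties cfg) {
        return cfg.getProperty("topic.prefix", "");
    }

    public static void main(String[] args) {
        Properties tableMode = new Properties();
        tableMode.setProperty("connector.class", "io.confluent.connect.jdbc.JdbcSourceConnector");
        tableMode.setProperty("topic.prefix", "shopdb-");              // hypothetical prefix

        Properties queryMode = new Properties();
        queryMode.setProperty("connector.class", "io.confluent.connect.jdbc.JdbcSourceConnector");
        queryMode.setProperty("query", "SELECT id, total FROM orders"); // hypothetical query
        queryMode.setProperty("topic.prefix", "shopdb-order-totals");   // full topic name here

        System.out.println(topicForTable(tableMode, "orders")); // shopdb-orders
        System.out.println(topicForQuery(queryMode));           // shopdb-order-totals
    }
}
```

Picking a distinct prefix per source database (or per connector) is probably the simplest workaround for the lack of topic namespaces.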

> It was around about then that I figured out that Confluent was a bunch of
> kids playing at building stuff.

Kafka Connect saved me writing a load of boilerplate to monitor a PG database
and propagate model updates in a medium suitable for streaming jobs. Kafka
Connect + Kafka Streams' GlobalKTables is a nice fit, even if the JDBC end of
Connect is somewhat beta at this point: KTables rely on the Kafka message key
for identity, and the JDBC source doesn't populate it by default, so you have
to use Single Message Transforms (SMTs) to set it.
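To make the key problem concrete, here is a toy Java model of what the `ValueToKey` and `ExtractField$Key` SMTs do, and why a table built from the topic stays empty until they run. Records here are plain maps, not Connect `Struct`/`SourceRecord` objects, and null-key records are simply skipped, roughly as Kafka Streams does when building a table. `smtConfig()` shows the corresponding connector settings, following the pattern in Confluent's JDBC connector docs; the `id` column, the `createKey`/`extractId` alias names and the sample rows are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Toy model of why a JDBC source topic needs SMTs before it can back a KTable.
// Records are plain Java maps here, not Kafka Connect Struct/SourceRecord objects.
public class KeyFromValueSketch {

    // Simplified record: key may be null, value is a row as column -> value.
    public record Row(Object key, Map<String, Object> value) {}

    // Mimics ValueToKey: build a key holding the listed fields copied from the value.
    public static Row valueToKey(Row r, List<String> fields) {
        Map<String, Object> key = new LinkedHashMap<>();
        for (String f : fields) key.put(f, r.value().get(f));
        return new Row(key, r.value());
    }

    // Mimics ExtractField$Key: replace the key struct with a single field of it.
    @SuppressWarnings("unchecked")
    public static Row extractKeyField(Row r, String field) {
        return new Row(((Map<String, Object>) r.key()).get(field), r.value());
    }

    // Table semantics: the latest value per key wins; null-key records are skipped.
    public static Map<Object, Map<String, Object>> toTable(List<Row> rows) {
        Map<Object, Map<String, Object>> table = new HashMap<>();
        for (Row r : rows) {
            if (r.key() == null) continue;
            table.put(r.key(), r.value());
        }
        return table;
    }

    // Connector settings for the same two steps ("id" is a hypothetical column).
    public static Properties smtConfig() {
        Properties p = new Properties();
        p.setProperty("transforms", "createKey,extractId");
        p.setProperty("transforms.createKey.type", "org.apache.kafka.connect.transforms.ValueToKey");
        p.setProperty("transforms.createKey.fields", "id");
        p.setProperty("transforms.extractId.type", "org.apache.kafka.connect.transforms.ExtractField$Key");
        p.setProperty("transforms.extractId.field", "id");
        return p;
    }

    public static void main(String[] args) {
        // What the JDBC source emits by default: rows with no message key.
        List<Row> raw = List.of(
            new Row(null, Map.of("id", 1, "name", "alpha")),
            new Row(null, Map.of("id", 1, "name", "alpha-v2")),
            new Row(null, Map.of("id", 2, "name", "beta")));
        System.out.println(toTable(raw).size()); // 0

        // After the two transforms, the primary key becomes the message key.
        List<Row> keyed = raw.stream()
            .map(r -> extractKeyField(valueToKey(r, List.of("id")), "id"))
            .toList();
        Map<Object, Map<String, Object>> table = toTable(keyed);
        System.out.println(table.size());                // 2
        System.out.println(table.get(1).get("name"));    // alpha-v2
    }
}
```

The second step matters: `ValueToKey` alone leaves a struct as the key, while extracting the single field gives a plain key that lines up with the row's primary key.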

I'd say beta, not kids.

