
Bottled Water – Real-Time Integration of PostgreSQL and Kafka - olalonde
http://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/
======
SEJeff
FYI, I've used bottledwater and it can cause segfaults that take down the
Postgres database hard.

I ran into both of these:

[https://github.com/confluentinc/bottledwater-pg/issues/53](https://github.com/confluentinc/bottledwater-pg/issues/53)

[https://github.com/confluentinc/bottledwater-pg/issues/61](https://github.com/confluentinc/bottledwater-pg/issues/61)

They're fixed now, but this is very much beta stuff. It wasn't as magical as
we'd hoped, so we ended up ripping it out and switching the application to use
SSE[1], which streamed into a Kafka producer.

[1] [https://en.wikipedia.org/wiki/Server-sent_events](https://en.wikipedia.org/wiki/Server-sent_events)

------
threeseed
Not really sure why you would do it like this.

Most systems do cache/index invalidation or HDFS archiving from the web app, or
from a set of microservices that get notified by the web app. So instead of
pushing raw DB rows into Kafka, they push generic events, e.g. "user
deleted account", and let the various services work out how to respond to them.

In this model you have tightly coupled both the choice of DB and the DB
schema to the events system.

Sounds like a bit of a nightmare if you iterate a lot on your architecture.
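A minimal sketch of the event-first design described above, in Python. The names (`UserDeletedAccount`, `delete_account`, `search_indexer`) are hypothetical, an in-memory dict stands in for the database, and a plain callback stands in for a Kafka producer. What it illustrates: the consumer depends only on the event contract, whereas a raw row change exposes table and column names to every consumer.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List


# Hypothetical domain event: named after what happened, not after a table.
@dataclass
class UserDeletedAccount:
    user_id: int
    type: str = "user_deleted_account"


def delete_account(users: Dict[int, dict], user_id: int,
                   publish: Callable[[str], None]) -> None:
    """Application code: change the store, then announce the domain event.

    `users` is an in-memory stand-in for the database and `publish` a
    stand-in for a Kafka (or other) producer; both are assumptions.
    """
    users.pop(user_id, None)
    publish(json.dumps(asdict(UserDeletedAccount(user_id))))


def search_indexer(message: str) -> str:
    """A consumer that knows only the event contract, not the DB schema."""
    event = json.loads(message)
    if event["type"] == "user_deleted_account":
        return f"remove user {event['user_id']} from index"
    return "ignore"


# For contrast: a raw row-level change (roughly what a CDC pipeline emits)
# carries table and column names, so renaming a column affects consumers.
raw_row_change = {"table": "users", "op": "delete", "before": {"id": 42}}

if __name__ == "__main__":
    sent: List[str] = []
    db = {42: {"id": 42, "email": "a@example.com"}}
    delete_account(db, 42, sent.append)
    print(search_indexer(sent[0]))  # remove user 42 from index
```

One caveat with this shape: the database write and the publish are two separate steps, so a crash between them can lose the event. Capturing changes from the database log, as Bottled Water does, closes that gap at the cost of the schema coupling discussed here.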

~~~
bladecatcher
This makes sense, e.g., when you want to make your database content
available for search (via Elasticsearch); for that you may want to push the
raw DB rows.

~~~
threeseed
Again it doesn't really make sense because you are tying the physical DB
schema to the search engine schema.

Your application domain model should be at the centre of your architecture, not
the physical database model. For example, store a User object rather than a
row from a User table.

I understand why a database company would see a database as the centre of the
world. But it really should be your application, especially if you want to use
PostgreSQL and InfluxDB for different domain types and yet have both indexed
in ElasticSearch.

~~~
shawn-butler
No.

The application domain model can be buggy and full of holes. The data store is
the source of truth.

~~~
zepolen
Couldn't agree more.

It's also pointless to try to convince someone who thinks otherwise. I've
found it better to let those developers shoot themselves in the foot and
learn the hard way. It's the only way they'll learn.

------
ZenoArrow
Previous discussion:
[https://news.ycombinator.com/item?id=9427441](https://news.ycombinator.com/item?id=9427441)

IIRC this works closely with Samza. There's a decent video by the author
knocking around somewhere.

EDIT: This is the video I was thinking of:

[https://m.youtube.com/watch?v=fU9hR3kiOK0](https://m.youtube.com/watch?v=fU9hR3kiOK0)

------
agentgt
I thought about doing this approach, but it looked like too much work (I was
going to use the LISTEN/NOTIFY event support in Postgres).

Instead we just write directly to durable queues on RabbitMQ (there are some
exceptions where extreme consistency and transactions are needed).

That is, we do web -> queue -> databases.
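A rough sketch of the web -> queue -> databases flow, assuming Python's in-memory `queue.Queue` as a stand-in for a durable RabbitMQ queue and plain dicts as stand-ins for the downstream databases. `web_handler` and `drain` are hypothetical names. Nothing here is durable; it only shows the shape of the pipeline.

```python
import queue
from typing import Dict, List

# Stand-in for a durable RabbitMQ queue (in-memory, NOT durable).
bus: "queue.Queue[dict]" = queue.Queue()


def web_handler(request: dict) -> str:
    """The web tier only validates and enqueues; it never touches a database."""
    if "user_id" not in request:
        return "400 Bad Request"
    bus.put(request)
    return "202 Accepted"  # accepted for processing, not yet stored


def drain(stores: List[Dict[int, dict]]) -> int:
    """Consumer: write each queued message into every downstream store."""
    written = 0
    while True:
        try:
            msg = bus.get_nowait()
        except queue.Empty:
            return written
        for store in stores:
            store[msg["user_id"]] = msg
        bus.task_done()
        written += 1


if __name__ == "__main__":
    primary: Dict[int, dict] = {}
    search: Dict[int, dict] = {}
    print(web_handler({"user_id": 7, "name": "Ada"}))  # 202 Accepted
    print(drain([primary, search]))                     # 1
    print(primary[7] == search[7])                      # True
```

Note that the handler answers `202 Accepted` before anything is stored, so a read straight after the write may not see it. That is the kind of case where you would keep a direct, transactional write, as with the exceptions mentioned above.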

There are serious cons to this bus approach, as well as pros like improved
latency.

The Bottled Water approach is nice because you can write old-school CRUD code.

Maybe I can use parts of Bottled Water to write to our RabbitMQ bus for the
parts that stay CRUD for consistency reasons.

------
pram
Confluent has a newish program for taking streaming data from Kafka and
sending it to multiple different types of stores called Kafka Connect:
[http://docs.confluent.io/3.0.0/connect/](http://docs.confluent.io/3.0.0/connect/)
[https://github.com/confluentinc/kafka-connect-jdbc](https://github.com/confluentinc/kafka-connect-jdbc)

It supports HDFS and ElasticSearch as well.

------
pimeys
It's funny. We're in need of something like this. I was thinking through my
options this morning, went to Hacker News, and saw this link posted here. We
already have the latest PostgreSQL and Kafka running. What a coincidence!

------
eddd
MySQL has its own solution as well.
[https://github.com/zendesk/maxwell](https://github.com/zendesk/maxwell)

------
zodPod
Really? It's called Bottled Water? Are we really getting that ridiculous with
our naming?

~~~
raarts
Relevant username.

~~~
zodPod
Haha well it's not "City Building" or some ridiculously common English word.
At least, not as far as I know.. lol it was a random series of noises that
popped out of my head which, I'd argue, is better than naming something
"Bottled Water" or "Hydrogen" or "Rust" lol

