
Show HN: Skor – Drop-in microservice to get Postgres changes as JSON webhooks - mesid
https://github.com/hasura/skor
======
odammit
Looks cool. Seems like a good alternative if you can’t use listen/notify for
some reason.

I’ve used listen/notify with triggers for a while with good success. I like
the flexibility of my apps controlling what they want to listen for, instead
of having that config live elsewhere.
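
For the curious, the shape of that is roughly the following (a minimal
sketch; the table, channel, and function names here are made up):

    -- trigger function that publishes the changed row as JSON
    CREATE OR REPLACE FUNCTION notify_change() RETURNS trigger AS $$
    BEGIN
      -- payload is capped at ~8000 bytes (see the notify limit discussed below)
      PERFORM pg_notify('app_events', row_to_json(NEW)::text);
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER users_notify
      AFTER INSERT OR UPDATE ON users
      FOR EACH ROW EXECUTE PROCEDURE notify_change();

Each app then just runs LISTEN app_events; on its own connection and picks
out the events it cares about.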

The downside to listen/notify is that if you aren’t listening, you aren’t
getting changes. The same seems true here if the webhook fails, since there
isn’t a retry.

If you want to check out another cool alternative, see Debezium[1]. It uses
PG's logical decoding feature to stream changes, and supports both protobuf
and JSON output.

[http://debezium.io/docs/connectors/postgresql/](http://debezium.io/docs/connectors/postgresql/)

~~~
octernion
Unfortunately, most of these logical replication (LR) programs don't work in
AWS (RDS) land, since they require PG plugins, and RDS LR only supports the
built-in test decoder.

I wrote a similar program that uses LR to stream changes to AWS's Kinesis[1].
As a result, it relies on a mini library that parses the output of the
default test decoder[2], in case anyone else is on RDS and has the same
limitation.

[1]
[https://github.com/nickelser/pg_kinesis](https://github.com/nickelser/pg_kinesis)
[2]
[https://github.com/nickelser/parselogical](https://github.com/nickelser/parselogical)
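
For reference, the test decoder's output is a plain-text format, which is
what parselogical has to deal with. A quick way to see it (a sketch; the slot
name is arbitrary, and it requires wal_level = logical):

    SELECT pg_create_logical_replication_slot('demo_slot', 'test_decoding');

    -- peek at pending changes without consuming them
    SELECT data FROM pg_logical_slot_peek_changes('demo_slot', NULL, NULL);

    -- rows come back looking roughly like:
    --   table public.users: UPDATE: id[integer]:1 name[text]:'alice'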

~~~
nrmitchi
IIRC Debezium has been RDS-compatible since version 0.7.0

~~~
octernion
That's awesome -- I wasn't aware that was the case (it wasn't, many moons ago
:). Thanks for the heads up! But we don't use Kafka internally, so it's sadly
still a non-option...

------
pfooti
Interesting. A quick look at the code indicates you're just doing a notify on
the whole row. Have you tested with large rows? The postgres notify function
fails with a payload larger than 8000 bytes, and that limit cannot be changed
at runtime, last I looked.

I wrote a similar thing to notify on changes, but had to fire notifies
containing only what changed (id, column names), with a second step in the
listening code to fetch the full details.
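
Concretely, the slim version of the trigger looks something like this (a
sketch; it assumes an id primary key column and a made-up channel name):

    CREATE OR REPLACE FUNCTION notify_change_slim() RETURNS trigger AS $$
    BEGIN
      -- send only the table, operation, and key; the listener re-fetches
      -- the row, so the payload stays well under the 8000-byte cap
      PERFORM pg_notify('app_events', json_build_object(
        'table', TG_TABLE_NAME,
        'op',    TG_OP,
        'id',    NEW.id)::text);
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

(A DELETE variant would read OLD.id instead.)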

~~~
vsurabhi
Yes, that is correct: it fails with large rows. I'll add this to the
limitations. The trigger currently sends the entire row, but it can be
modified to send just the unique identifier for a row. I'll add documentation
for this. Tracking it here:
[https://github.com/hasura/skor/issues/6](https://github.com/hasura/skor/issues/6)

(I work at Hasura)

------
netcraft
This problem is one of my biggest needs with Postgres, though it's certainly
not limited to PG. listen/notify is fragile: you either have a single point
of failure, or you run multiple workers and have to deal with de-duping.
Logical replication might one day be an answer, but it's not likely to ever
work completely and without issue on cloud-hosted instances. (Maybe I'm wrong
about this... see the Debezium comments elsewhere in the thread.)

What I've been thinking of lately is using triggers to write to a separate
table with a timestamp, as a rolling log. The worker can listen/notify on
that, but it can also keep track of the last message it saw and go back and
catch up when it first starts, so if it fails you don't lose anything. All I
really need is a type (table) and the relevant keys. I think I would prefer a
way to configure which tables I want to listen on and which columns (keys) I
care about.
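
As a sketch, that could look like the following (names invented; the log
write and the notify are combined in one trigger for brevity):

    -- rolling change log: workers tail this by id to catch up after a restart
    CREATE TABLE change_log (
      id         bigserial   PRIMARY KEY,
      tbl        text        NOT NULL,
      row_key    jsonb       NOT NULL,
      changed_at timestamptz NOT NULL DEFAULT now()
    );

    CREATE OR REPLACE FUNCTION log_change() RETURNS trigger AS $$
    BEGIN
      INSERT INTO change_log (tbl, row_key)
      VALUES (TG_TABLE_NAME, jsonb_build_object('id', NEW.id));
      -- the notify is just a wake-up; the log row is the durable record
      PERFORM pg_notify('change_log', '');
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

A worker that comes back up replays with SELECT * FROM change_log WHERE id >
$last_seen ORDER BY id; before resuming its LISTEN.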

I really wish there were a standard, Postgres-provided way to notify an
external system of changes, though. It's certainly not an easy issue.

~~~
fritzy
`table -> trigger to log table -> trigger to notify` is exactly the right
solution here. You can even use the same stored proc to replay old logs.

------
xdanger
I did something similar, but directly from PostgreSQL triggers:
[https://github.com/pramsey/pgsql-http](https://github.com/pramsey/pgsql-http)

You just have to be careful with failing queries, and keep the timeout low.
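
Roughly, the shape of that is the following (a sketch; the endpoint URL is
made up, and the exact timeout knob may differ by extension version, so check
the README):

    -- keep the timeout short so a slow endpoint can't stall every write
    SELECT http_set_curlopt('CURLOPT_TIMEOUT', '1');

    CREATE OR REPLACE FUNCTION webhook_change() RETURNS trigger AS $$
    BEGIN
      -- NB: this runs inside the transaction; a failing call aborts the write
      PERFORM http_post('http://example.com/hook',
                        row_to_json(NEW)::text,
                        'application/json');
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;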

------
politips
This looks cool. Another approach is pg_amqp_bridge, which sends events like
this into RabbitMQ. A nice feature is that the triggers don't block:

[https://github.com/subzerocloud/pg-amqp-bridge](https://github.com/subzerocloud/pg-amqp-bridge)

You can then push to websockets from rabbit with STOMP

[https://www.rabbitmq.com/web-stomp.html](https://www.rabbitmq.com/web-stomp.html)
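
From memory of the bridge's README (so double-check there), you map a NOTIFY
channel to an exchange at startup and then publish with a routing_key|payload
convention; the channel and exchange names here are just examples:

    -- bridge started with e.g. BRIDGE_CHANNELS=pgchannel1:amq.topic;
    -- publishing from a trigger (or anywhere) is then just:
    SELECT pg_notify('pgchannel1', 'users.updated|{"id": 42}');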

~~~
oneweekwonder
> pg_amqp_bridge

Thanks for posting this. I can already see a couple of weekends burned trying
to drive Celery from PG.

Do you know if it's used in production anywhere? How battle-tested is it?

~~~
politips
I've used it in a couple of production deployments and it's solid. It's only
~300 lines of Rust code, so I'm confident it's simple, and even if I hit a
problem it would be easy to understand.

------
bearjaws
Very cool. At my current job we have a realtime ETL process using Maxwell &
Apache Kafka to stream replication events to a set of Node servers for a
variety of transformations.

Seeing that this isn't hooked into replication is a bit worrying, but it
seems they are going to implement it as a replication client, which should
make it much more robust.

------
fizx
Seems like there are a lot of people here who aren't aware of wal2json, or of
RDS's support for it.
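
For those who haven't run into it: wal2json is a logical decoding output
plugin that emits each transaction's changes as JSON. A quick sketch of
poking at it (the slot name is arbitrary):

    SELECT pg_create_logical_replication_slot('json_slot', 'wal2json');

    -- each "data" value is a JSON document describing one transaction's changes
    SELECT data FROM pg_logical_slot_peek_changes('json_slot', NULL, NULL);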

------
crescentfresh
Does an update of 100 rows result in 100 webhook calls or 1 webhook with a
payload of 100 rows?

~~~
vsurabhi
100 webhook calls, as the trigger is executed once per modified row.

Postgres's per-statement triggers don't let you capture the rows that are
modified, and skor doesn't currently do any batching.

~~~
anarazel
> Postgres's per statement triggers don't let you capture the rows that are
> modified

They do these days. Since Postgres 10 you can specify:

    [ REFERENCING { { OLD | NEW } TABLE [ AS ] transition_relation_name } [ ... ] ]

in a FOR EACH STATEMENT ... AFTER trigger.

~~~
vsurabhi
That's great news! So the trigger can use json_agg to compose the
notification with all the modified rows. We'll document it
([https://github.com/hasura/skor/issues/8](https://github.com/hasura/skor/issues/8)).
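
A sketch of what that could look like (names invented; note that the
~8000-byte notify payload cap mentioned upthread applies to the aggregated
JSON too):

    CREATE OR REPLACE FUNCTION notify_batch() RETURNS trigger AS $$
    BEGIN
      -- new_rows is the transition table holding every row the statement touched
      PERFORM pg_notify('app_events',
        coalesce((SELECT json_agg(r)::text FROM new_rows r), '[]'));
      RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER users_batch_notify
      AFTER UPDATE ON users
      REFERENCING NEW TABLE AS new_rows
      FOR EACH STATEMENT EXECUTE PROCEDURE notify_batch();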

------
anc84
Very cool; I can see myself using this in the future.

Is an approach like this best practice, or is it rather meant as an
intermediate hack?

~~~
Lord_Zero
In general, I have not seen the DBMS emitting changes treated as a best
practice. The applications modifying the data should be emitting events or
messages themselves. Someone can correct me if I am wrong.

~~~
ryanworl
The most reliable implementation of this pattern is to use the replication
binary logs of the database. Your application cannot atomically publish a
write to both a database and a messaging system (without introducing a massive
headache that you will implement incorrectly).

~~~
hderms
Yeah, but I think it's important to consider whether any microservice should
be able to listen to any other microservice's inserts or updates. I think
using the replication log as the underlying mechanism is a good idea, but you
should really be choosing which messages to expose, because they form a
public API.

~~~
chatmasta
The receivers choose which messages to expose by selecting which events to
listen for. The broker then only forwards those events. The question is, how
do the receivers specify the event type? Do they use the schema of the
publisher, or some intermediate interface at the broker?

------
codezero
Cool! Does this handle rate limiting or retries?

~~~
vsurabhi
Currently there is no support for rate limiting or retries of webhook calls.
I've opened an issue to track this:
([https://github.com/hasura/skor/issues/7](https://github.com/hasura/skor/issues/7))

~~~
codezero
Cool - I didn't expect this, but was curious.

------
theo31
Hi! It's very similar to what we built at [https://base.run](https://base.run)
:) Take a look!

