
Dropping Webhooks - sundip
http://blog.runnable.com/post/142092495506/dropping-webhooks
======
dyeje
Title is really misleading. Read the whole thing expecting it to conclude "So
we decided to drop webhook support".

~~~
ykumar6
The blog post talks about how the API server was dropping webhooks, and how
there is no retry mechanism built into most webhook senders.

Scaling out an API server is not resilient enough on its own; there needs to
be a queue-based system fronting your webhooks so they aren't dropped.

~~~
misthop
Yes, but the title implies that they will stop using or supporting webhooks. A
more accurate title would be "Handling Dropped Webhooks"

------
zrail
The tl;dr is basically to handle webhooks asynchronously. This is easy to
accomplish with any background processing system you may have lying around in
your infrastructure, and the author happens to have RabbitMQ already running.
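As a rough illustration of the pattern, here is a minimal sketch using Python's stdlib queue and a worker thread in place of RabbitMQ (all names are made up):

```python
# "Ack fast, process later": the HTTP handler only enqueues and returns,
# so slow processing can't make the endpoint time out and drop the webhook.
import queue
import threading

events = queue.Queue()
processed = []

def worker():
    # Background worker drains the queue at its own pace.
    while True:
        payload = events.get()
        if payload is None:  # sentinel to stop the worker
            break
        processed.append(payload)  # stand-in for real processing
        events.task_done()

def handle_webhook(payload):
    # The webhook endpoint enqueues and immediately returns 200.
    events.put(payload)
    return 200

t = threading.Thread(target=worker)
t.start()
assert handle_webhook({"event": "push"}) == 200
events.join()          # wait until the worker has drained the queue
events.put(None)
t.join()
print(processed)       # [{'event': 'push'}]
```

Swapping the stdlib queue for RabbitMQ (or any broker) gives you the same decoupling plus durability across process restarts.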

------
smetj
Shameless self-promotion here. The Wishbone framework
([http://wishbone.readthedocs.org](http://wishbone.readthedocs.org)) enables
you to construct event pipeline servers which can do exactly this sort of
thing.

The parent's use case could be handled by connecting the Wishbone modules in
the following way:

Accept webhook data over http(s) ([https://github.com/smetj/wishbone-input-
httpserver](https://github.com/smetj/wishbone-input-httpserver)), parse the
JSON data
([http://wishbone.readthedocs.org/en/latest/modules/builtin%20...](http://wishbone.readthedocs.org/en/latest/modules/builtin%20modules.html#wishbone-
decode-json)), validate the event data against JSONschema
([https://github.com/smetj/wishbone-flow-
jsonvalidate](https://github.com/smetj/wishbone-flow-jsonvalidate)) and
eventually submit the event data to RabbitMQ
([https://github.com/smetj/wishbone-output-
amqp](https://github.com/smetj/wishbone-output-amqp)).

Super easy to start a server:

    $ wishbone start --config bootstrap.yaml

Depending on the modules you hook together you can build the event processing
pipeline suiting your needs.

For example:
[http://smetj.net/processing_webhooks_using_wishbone_part_1.h...](http://smetj.net/processing_webhooks_using_wishbone_part_1.html)

------
jclulow
It seems that GitHub should eventually retry delivery on transient faults,
like connection timeouts or 500 errors.

~~~
xur17
I'm surprised to see that they don't - it seems standard to retry on failure,
waiting longer and longer between each attempt.
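A sender-side retry loop with exponential backoff, as suggested here, might look roughly like this (`deliver()` is a hypothetical stand-in for the actual HTTP POST):

```python
# Hypothetical sender-side retry with exponential backoff: wait
# base_delay, then 2x, 4x, ... between attempts, then give up.
import time

def deliver_with_backoff(deliver, payload, attempts=5, base_delay=1.0):
    for attempt in range(attempts):
        try:
            return deliver(payload)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            time.sleep(base_delay * (2 ** attempt))

# Example: an endpoint that fails twice with a transient fault, then succeeds.
calls = []
def flaky(payload):
    calls.append(payload)
    if len(calls) < 3:
        raise ConnectionError("transient fault")
    return "delivered"

print(deliver_with_backoff(flaky, {"id": 1}, base_delay=0.01))  # delivered
```

A real sender would also cap the total number of attempts and persist undelivered events somewhere durable rather than retrying purely in memory.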

Some sort of pubsub queue would be awesome - pull (or have them pushed), and
process them on demand. If something breaks, retry when you want.

~~~
devbug
I'd love an open-source appliance that lets me turn webhooks (and polling
APIs) into a kind of pub-sub. Being able to inspect, modify, and run events
would be an "immediately use this" feature set.

I'd pay for a hosted version to avoid running it myself, but only if there's a
clear migration path, i.e. it's open-source.

~~~
smetj
Wishbone could do that
[http://wishbone.readthedocs.org/en/latest/index.html](http://wishbone.readthedocs.org/en/latest/index.html)

Some example articles:

\-
[http://smetj.net/processing_webhooks_using_wishbone_part_1.h...](http://smetj.net/processing_webhooks_using_wishbone_part_1.html)

\-
[http://smetj.net/processing_webhooks_using_wishbone_part_2.h...](http://smetj.net/processing_webhooks_using_wishbone_part_2.html)

------
encoderer
Due to the nature of our task-monitoring service at Cronitor, missing a ping
from a user could result in a false-positive alert. Needless to say, that's
bad news. As a failsafe against exactly the situation described here, we run a
daemon on every webserver that tails the nginx error log and injects failed
messages into SQS. These at-least-once semantics work for us because message
handling is idempotent: a message that somehow gets delivered twice has no
extra effect. Some days I think this is a kludge; other days I'm pleased that
it saves our butt (including when SQS is unavailable or slow, which thankfully
rarely happens, though rarely is not never).
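The idempotency half of that setup could be sketched roughly like this (the message ids and fields are made up; the point is that a duplicate delivery is a no-op):

```python
# At-least-once delivery + idempotent handling: each message id is
# applied exactly once, so redelivered duplicates are harmless.
seen_ids = set()
state = {}

def apply_once(message):
    # A queue like SQS may deliver the same message more than once.
    if message["id"] in seen_ids:
        return False  # duplicate: ignore
    seen_ids.add(message["id"])
    state[message["job"]] = message["status"]
    return True

assert apply_once({"id": "m1", "job": "backup", "status": "ok"}) is True
assert apply_once({"id": "m1", "job": "backup", "status": "ok"}) is False
print(state)  # {'backup': 'ok'}
```

In production the `seen_ids` set would live in durable shared storage (or be implicit in a unique-key constraint) rather than in process memory.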

------
tobych
One solution might be for GitHub to store the details of each call and give
every call, not just the failed ones, an incrementing sequence number. Then,
if the server notices that it has missed a call, it can ask GitHub for the
missing calls. Furthermore, if it hasn't received a call for a while, it could
call in to check.
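The gap-detection side of that scheme could look something like this sketch (`fetch_missing()` is a hypothetical call back into the sender's API):

```python
# Receiver tracks the last sequence number seen; a jump in the sequence
# reveals dropped deliveries, which are backfilled from the sender.
def make_receiver(fetch_missing):
    last_seen = [0]
    handled = []

    def handle(event):
        handled.append(event["seq"])  # stand-in for real processing

    def receive(event):
        n = event["seq"]
        if n > last_seen[0] + 1:
            # We missed sequence numbers last_seen+1 .. n-1; fetch them.
            for missed in fetch_missing(last_seen[0] + 1, n - 1):
                handle(missed)
        handle(event)
        last_seen[0] = max(last_seen[0], n)

    return receive, handled

# Example: delivery of seq 3 reveals that seq 2 was dropped.
receive, handled = make_receiver(
    lambda lo, hi: [{"seq": s} for s in range(lo, hi + 1)])
receive({"seq": 1})
receive({"seq": 3})
print(handled)  # [1, 2, 3]
```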

------
fasteo
Not trying to be cynical, but decoupling calls from external systems is the
bare minimum you need to put in place, not only for the problem at hand but
to protect your system from systems you don't own.

For the record, we use beanstalkd[1] with both php and node workers. Getting
millions of external notifications every day without any major issue.
Internally, we use the same design, this time to completely decouple our
workers from the persistence layer. Our major benefit in this case was the
ability to perform database maintenance without taking the service down.

Anyway, a good read and a simple yet effective solution.

[1] [https://github.com/kr/beanstalkd](https://github.com/kr/beanstalkd)

------
ngrilly
GitHub doesn't retry when a webhook delivery fails? Stripe does, for example.

------
whatnotests
I don't understand why Runnable didn't _start_ by dumping requests straight
into RabbitMQ from the beginning.

It's the obvious right solution for this kind of scenario.

------
jasdeepsingh
Misleading title.

