
Webhooks do’s and dont’s: what we learned after integrating APIs - giuliano84
https://restful.io/webhooks-dos-and-dont-s-what-we-learned-after-integrating-100-apis-d567405a3671#.9rl15pi3z
======
bazizbaziz
How do people in production handle the possibility that your service might
miss a webhook notification? If you miss a notification you'll end up with
stale data and you won't know it.

Slack retries for a while but will then just give up. Another
webhook provider I've looked at says nothing at all about this sort of thing.
How do folks deal with this in production systems?

Seems to me like the best way to address this issue is to use the webhook as a
hint that you need to run some other process that guarantees you've got all
updates.
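
A minimal sketch of that "webhook as a hint" idea, assuming the provider exposes some way to list changes after a cursor. `fetch_since`, the record shape, and the integer version cursor are all hypothetical stand-ins, not any particular provider's API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical record shape: (record_id, version, data).
Record = Tuple[str, int, dict]


class Reconciler:
    """Keeps a local copy in sync by pulling everything newer than a cursor."""

    def __init__(self, fetch_since: Callable[[int], List[Record]]) -> None:
        self.fetch_since = fetch_since  # assumed provider call: changes after cursor
        self.cursor = 0
        self.store: Dict[str, dict] = {}

    def sync(self) -> int:
        """Pull all changes after the cursor; safe to call at any time."""
        changes = self.fetch_since(self.cursor)
        for record_id, version, data in changes:
            self.store[record_id] = data
            self.cursor = max(self.cursor, version)
        return len(changes)

    def on_webhook(self, payload: dict) -> int:
        """Treat the webhook purely as a hint: ignore the payload and sync."""
        return self.sync()
```

Running `sync()` on a timer as well as from `on_webhook` means a missed notification only delays an update until the next scheduled run.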

~~~
developer2
I would prefer to implement the sending of webhooks in bulk - if the consumer
falls behind, they receive up to 100-1000 webhooks per request (depending on
the size and complexity of each individual webhook - ids only is 1000, complex
documents 100). This drastically cuts down on the number of concurrent
requests to a single client when load is high, or the consumer broke down for
a period of time.

Unfortunately, developers writing code to receive batch requests are often...
inadequate, to say the least. They'll write basic looping code without any
error/exception handling; so if the 3rd item in a bulk request of 100 items
causes a server-side error for them, they throw a 500 Internal Server Error or
similar and fail to continue processing items 4 through 100. You simply cannot
batch webhooks as a producer, unless you detect a single failure from the
client to process a batch as a cue to drop to performing "batches" of size 1
until you receive an error for a single request, at which point you return to
bulk. Rinse and repeat.
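
The fallback described above might look roughly like this on the producer side. This is a sketch under assumptions: `send` is a hypothetical callable that POSTs a list of events and returns True on a 2xx response, and the batch size is arbitrary.

```python
from typing import Callable, List, Sequence


def deliver(
    events: Sequence[dict],
    send: Callable[[List[dict]], bool],
    batch_size: int = 100,
) -> List[dict]:
    """Send events in batches; after a failed batch, go one by one until the
    failing event is found, then return to batches.

    Returns the events that failed on their own, for a later retry.
    """
    failed: List[dict] = []
    position = 0
    single_mode = False
    while position < len(events):
        size = 1 if single_mode else batch_size
        chunk = list(events[position:position + size])
        if send(chunk):
            position += len(chunk)
        elif single_mode:
            # A request of size 1 failed: this is the culprit. Park it and
            # go back to bulk for the remaining events.
            failed.append(chunk[0])
            position += 1
            single_mode = False
        else:
            # A whole batch failed: replay the same range one event at a time.
            single_mode = True
    return failed
```

Note that events the consumer did process before a batch failed are sent again in single mode, so this only works if the consumer handles duplicates.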

Honestly, being the producer sending webhooks to consumers which are written
by random developers is a nightmare. You have to understand that your
customers will not write proper code to accept your webhook requests, even if
each request is for a single webhook. You also must understand that your
customers will not look to blame themselves for shitty code. You can retry
1,000 times over a 48 hour period, and if their code still fails to process
the webhook, it will be YOUR fault, not theirs. Truthfully, it is horrible to
be on the sending end of webhooks to random developers/customers.

~~~
_pmf_
Transactions are obviously too enterprisey for fast moving unicorns; better
spend 3 weeks to badly hack together a ridiculous farce.

------
shizcakes
I think the "securing webhooks" section is missing some critical tips that
we've learned in production.

1) Resolve the DNS of the webhook URL, and compare all returned addresses from
that resolution against an IP blacklist, which includes all RFC1918 addresses,
EC2 instance metadata, and any other concerning addresses.

2) Even though it seems like you'd want to, do NOT blindly return an
unexpected response to the person configuring the webhook. Say there was an
error, what the code was, etc, but returning the response body means you
basically just gave someone curl with a starting point on your network (see 1
as well)

3) Find ways to perform other validations of those webhooks. Are the URLs
garbage? Are they against someone else's system? Create validation workflows
that require initial pushes to the URL with a validation token to be entered
back into your system, like validating an email address by clicking a link.
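
As a rough illustration of point 1, the check could be sketched with the standard `socket` and `ipaddress` modules. The network list here is illustrative and certainly incomplete, and the function names are made up for this example.

```python
import ipaddress
import socket
from typing import Callable, List

# Illustrative deny list only: RFC 1918 ranges, loopback, link-local
# (which covers the EC2 metadata address 169.254.169.254), and the
# IPv6 equivalents.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(n)
    for n in (
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
        "127.0.0.0/8", "169.254.0.0/16",
        "::1/128", "fc00::/7", "fe80::/10",
    )
]


def is_blocked(address: str) -> bool:
    """True if the address falls inside any blocked network."""
    ip = ipaddress.ip_address(address.split("%")[0])  # drop any IPv6 scope id
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # treat ::ffff:10.0.0.1 like 10.0.0.1
    return any(ip in net for net in BLOCKED_NETWORKS)


def resolve_all(host: str, port: int = 443) -> List[str]:
    """Return every address the host name resolves to."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def host_is_allowed(host: str, resolver: Callable[[str], List[str]] = resolve_all) -> bool:
    """Reject the host if ANY resolved address is blocked."""
    addresses = resolver(host)
    return bool(addresses) and not any(is_blocked(a) for a in addresses)
```

The check only helps if the request is then made to one of the addresses that was checked; resolving again at send time can return something different.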

~~~
pdkl95
> IP blacklist

Please stop trying to enumerate badness. This is and always will be
incomplete.

[http://www.ranum.com/security/computer_security/editorials/d...](http://www.ranum.com/security/computer_security/editorials/dumb/)

~~~
pfranz
Ehh. I disagree with both Default Permit and Enumerating Badness--I think they
have their place. If I run a club do I background check and whitelist every
customer? Or do I blacklist the troublemakers? The problems cited in the
article were reasonable decisions at the time, but years later grew into
headaches when the use-cases changed.

Does their no Default Permit policy apply to network egress? Do I have to
approve each and every application that wants to connect to the Internet? I
think leaving port 80 open because it was whitelisted is why so many
things tunnel through port 80 instead of using other protocols and ports. Now
how do you filter and whitelist traffic?

His example of antivirus products using Enumerating Badness is a market
failing more than anything else. I'm not sure I see the alternative for a
naive user. Call a specialist to investigate their use-cases and "open the
system" to accommodate? Any time you want to update your tool or workflow or
try something new have that specialist come out and reevaluate your system?

------
sly010
Aside: Webhooks are always a pain.

Implementing polling is easier for both sides.

I routinely have to integrate with random 3rd party systems, some with no or
broken webhooks, some with no API at all. It turns out for my customers (this
may not be always the case) eventual consistency is more important than
timeliness.

What I do now every time I need to sync data from a third party is I always
implement some sort of pull first with idempotent logic on my side. It's
easier, and it allows me to just re-run things if something fails (e.g.
network error, unexpected data in production, etc).

Only when that works reliably, and only if the customer requires it, do I implement
a webhook, but I usually throw away most of the message and just wake up my
polling worker that is otherwise polling relatively slowly.
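
A small sketch of that arrangement using `threading.Event`: the worker polls slowly on its own, and the webhook handler does nothing but wake it early. `pull` is a placeholder for whatever idempotent sync routine you already have; the class and method names are invented for this example.

```python
import threading
from typing import Callable


class PollingWorker:
    """Polls on a slow interval; a webhook can wake it up early."""

    def __init__(self, pull: Callable[[], None], slow_interval: float = 300.0) -> None:
        self.pull = pull  # assumed to be idempotent and safe to re-run
        self.slow_interval = slow_interval
        self._wake = threading.Event()
        self._stop = threading.Event()

    def notify(self) -> None:
        """Call this from the webhook handler; the payload is thrown away."""
        self._wake.set()

    def stop(self) -> None:
        self._stop.set()
        self._wake.set()  # interrupt the current sleep

    def run(self) -> None:
        while not self._stop.is_set():
            self.pull()
            # Sleep until the slow interval elapses or a webhook arrives.
            self._wake.wait(timeout=self.slow_interval)
            self._wake.clear()
```

If the webhook never arrives, the only cost is that the next pull happens after `slow_interval` instead of immediately.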

~~~
abraae
Long polling works brilliantly (where your API call blocks until there are
some results or until timeout occurs - then you loop and call again).

Long polling gives you the best of both worlds - easy programming model with
instant alerting rather than the delay of normal polling.

The only downside really is the need for a more or less permanently open
connection per client. As long as the server does not use a naive "thread per
connection" model this can scale up to many hundreds of thousands of clients
or more.
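
To make the blocking behaviour concrete, here is an in-process sketch with `threading.Condition` standing in for the server. In a real setup `poll` would be an HTTP call that the server holds open; `UpdateFeed` and the cursor scheme are assumptions made for this example.

```python
import threading
from typing import Callable, List, Tuple


class UpdateFeed:
    """Stand-in for the server side of a long-polling API."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._items: List[str] = []

    def publish(self, item: str) -> None:
        with self._cond:
            self._items.append(item)
            self._cond.notify_all()  # release every blocked poll

    def poll(self, cursor: int, timeout: float) -> Tuple[List[str], int]:
        """Block until there is something after `cursor` or the timeout hits."""
        with self._cond:
            self._cond.wait_for(lambda: len(self._items) > cursor, timeout=timeout)
            new_items = self._items[cursor:]
            return new_items, cursor + len(new_items)


def consume(feed: UpdateFeed, handle: Callable[[str], None],
            rounds: int, timeout: float = 30.0) -> int:
    """Client loop: call, handle whatever came back, call again."""
    cursor = 0
    for _ in range(rounds):
        items, cursor = feed.poll(cursor, timeout)
        for item in items:
            handle(item)
    return cursor
```

An empty result just means the timeout fired, so the client loops and calls again.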

~~~
Animats
The good thing about long polling is that if the connection breaks, the keep-
alive will time out and you'll know you're not getting updates. Assuming
there's some keep-alive feature.

------
jarcoal
Sort of disagree with the send-everything-in-the-payload approach. It opens
your system up to all sorts of weird edge case bugs like receiving hooks out
of order which could mean stale data is considered fresh. It also means you
have to care a lot more about verifying the authenticity of the request.
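
To illustrate the out-of-order case: if you do accept full payloads, one possible guard is to keep a version per record and refuse anything older. The `version` field is an assumption here; many providers send no such thing, in which case this guard is not available.

```python
from typing import Dict, Optional


class VersionedStore:
    """Applies an event only if it is newer than what is already stored."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def apply(self, event: dict) -> bool:
        """Return True if the event was stored, False if it was stale."""
        current = self._rows.get(event["id"])
        if current is not None and current["version"] >= event["version"]:
            return False  # late or duplicate delivery: keep the fresher copy
        self._rows[event["id"]] = event
        return True

    def get(self, record_id: str) -> Optional[dict]:
        return self._rows.get(record_id)
```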

~~~
abraae
Agree. It's better to use webhooks as a pure signal that something has changed,
and then in the case of update or insert, have the client pull whatever they
want using the normal API.

Otherwise, you end up in a descending vortex of madness trying to specify some
protocol whereby the client can specify in advance which properties they care
about.

------
madamelic
Also, in your documentation, please show what the webhook events will look
like since developers actually want to write code and not guess at what we
will get.

 _cough_ Stripe.
([https://stripe.com/docs/api#events](https://stripe.com/docs/api#events))

~~~
brandur
The implication was meant to be that the information under `data/object` is
simply a full representation of another API resource of the type on which the
event occurred, and that you can look elsewhere in the documentation to see
exactly what each type will look like (you can see a subscription embedded in
the sample response for example).

Fair enough that we could rewrite this to be more explicit about that though!
We'll see what we can do to make that section more clear.

(I work for Stripe.)

~~~
madamelic
This is what I love. I purposefully talk about Stripe on here just because I
know Stripe people browse HN.

Stripe is great though. :)

------
misterbowfinger
This is going to sound bizarre, but why do webhooks and not just an AMQP
queue? I get that receiving HTTP POSTs is easier, but it just seems better to
set up a publisher/subscriber relationship. That way, if a subscriber goes
down, they can always catch up. And publishers can allow messages to sit in
the queue with a TTL and max_size. It seems like a win-win for everyone.

~~~
jon-wood
It's not AMQP (sadly) but something I've done previously is to have the actual
webhook endpoint be as dumb as possible, doing nothing but accepting the
payload (maybe with some very high level validation that the request was
expected) and pushing it into a real queueing system.

This means you can handle all sorts of failure modes, not just the backend
going down, but also bugs in the consumer that would otherwise result in
losing the request. I've not tried it, but I imagine this is a pretty good
usecase for AWS Lambda as it's a small bit of glue code.
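
On the consuming side of such a queue, a buggy handler no longer loses the request, because the job can be retried or set aside. A sketch with the standard `queue` module as a stand-in for a real queueing system; the job shape and `max_attempts` are made up for illustration.

```python
import queue
from typing import Callable, List


def drain(
    jobs: "queue.Queue[dict]",
    process: Callable[[dict], None],
    max_attempts: int = 3,
) -> List[dict]:
    """Process queued webhook payloads; retry failures, then park them.

    Returns the jobs that kept failing, so they can be inspected and
    replayed once the consumer bug is fixed.
    """
    dead_letters: List[dict] = []
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return dead_letters
        try:
            process(job["payload"])
        except Exception:
            job["attempts"] = job.get("attempts", 0) + 1
            if job["attempts"] < max_attempts:
                jobs.put(job)  # back of the queue for another try
            else:
                dead_letters.append(job)
```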

~~~
fiatjaf
Shameless plug: [https://requesthub.xyz](https://requesthub.xyz) is ideal for
these cases.

~~~
jon-wood
That's pretty awesome, thanks for the tip.

------
boubiyeah
Webhooks only make sense if you don't care a single bit about missing updates.
Otherwise, they're deeply flawed.

A pull model (polling, long-polling, SSE, etc) is strictly superior for
synchronisation. You just can't "miss" updates, can restart from the beginning
again and reinterpret past events in a different light, the client goes at its
own pace, etc.

~~~
paulddraper
To expand on icebraining's comment, you can use webhooks as notifications to
poll.

------
nimblegorilla
It's also really handy when API providers give a nice webhook UI that lets you
view and resend webhooks during development.

~~~
onion2k
BitBucket is a very good example of web hook integrations done right.
Relatable, logged, and well documented. I learned from their UI when I
implemented my own version.

------
johns
I did a talk a little while back on providing a good developer experience
around webhooks that covers a lot of the same topics. I wish I had a recording
of it, but the slides are here:
[https://speakerdeck.com/johnsheehan/crafting-a-great-webhook...](https://speakerdeck.com/johnsheehan/crafting-a-great-webhooks-experience-1)

Edit: found the video
[https://www.youtube.com/watch?v=xc5ezyJjz1k&feature=youtu.be...](https://www.youtube.com/watch?v=xc5ezyJjz1k&feature=youtu.be&t=1266)

------
simo9000
I'd like to follow up on the statement that the OpenAPI tools do not support
webhooks. This is slated to change in an upcoming version of the
OpenAPI-Specification. Check out
[https://github.com/OAI/OpenAPI-Specification/pull/763](https://github.com/OAI/OpenAPI-Specification/pull/763)
to see details. As soon as this is released, it will only be a matter of time
before Swagger and the rest support webhooks.

------
z3t4
To be fully client based (serverless) you need a middle-man for web-hooks.
Websockets are a better alternative for stand-alone web clients. There are
also "push notifications" via web workers but they are vendor dependent.

------
arxpoetica
Honest question. Why webhooks over something like push/pull handshaked socket?

~~~
detaro
For the receiver: Everything that can run a dynamic website can run a webhook
receiver; opening an arbitrary socket connection isn't possible in all
environments (e.g. software running on shared hosting or PaaS). You'd also
need to define and implement a protocol on top of said socket, whereas more or
less every web developer knows what to do with HTTP POST with a JSON payload.

And for the sender, keeping many concurrent connections open can be quite a
challenge. Sending Webhooks also takes resources, but at least you can easily
distribute it over many machines/processes if necessary.

------
intellent
What I am most interested in is how to test/debug webhooks during development.

How do I tell webhook providers to send test notifications to my local
development instance without tampering with the production setup on both
sides?

~~~
jeffnappi
This is another thing that Stripe nails. Out of the box it comes with a Test
mode and you can easily use this to test your webhook implementation.

Another way to handle this is to create and maintain a mocking tool that will
generate requests.

------
Animats
So "push technology" is called "webhooks" now?

How does this all integrate with HTTP 2? Can you get your notifications over a
channel you already have open for other reasons?

~~~
supernintendo
Not quite. "Push technology" is sort of an all-encompassing term for server-
to-client updates whereas webhooks pertain specifically to HTTP callbacks. An
example of a webhook would be GitHub making a POST request to some URL (set by
the user) whenever new commits are made to a repo. Push technology might take
the form of webhooks, long polling, WebSockets etc.

Webhooks are traditional HTTP requests so I don't believe HTTP/2 changes
anything. The ability to differentiate notifications depends on the service /
API you're integrating with.

------
mosselman
It would have been interesting to read about tests for web-hooks. How do you
do integration testing for example?

~~~
stevekemp
In the past when I used to support webhooks what I did was very simple:

* Receive the HTTP POST submission to my hook end-point.

* Save this data in a queue.

* Return to the hook-caller "200 OK - $ID".

This was better than trying to initiate a long-running job as a result of the
hook, and meant that I could trigger "fake webhooks" just by adding data to
the queue manually.

I'm sure there are other approaches, but this is a flexible one that also gave
the benefit of being simple. (For the queue I just used Redis.)
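
The three steps above can be sketched without committing to a web framework. Here an in-process `queue.Queue` stands in for Redis, and the function name and job shape are invented for this example.

```python
import json
import queue
import uuid
from typing import Tuple


def receive_webhook(raw_body: bytes, jobs: "queue.Queue[dict]") -> Tuple[int, str]:
    """Accept the POST body, enqueue it, and answer right away.

    Returns an (HTTP status, response body) pair for the caller's
    framework to send back.
    """
    try:
        payload = json.loads(raw_body)
    except ValueError:  # covers bad JSON and bad encodings
        return 400, "invalid JSON"
    job_id = uuid.uuid4().hex
    jobs.put({"id": job_id, "payload": payload})  # no long-running work here
    return 200, f"OK - {job_id}"
```

A "fake webhook" for testing is then just a dict put on the same queue by hand, with no HTTP request involved.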

------
g105b
Apostroph'ed.

