* Timeouts: the user can set up a webhook receiver that takes very long to generate a response. Your service must be able to deal with that.
* Timeouts (slowloris): the webhook target could be sending back one byte at a time, with 1 second pauses in between. If you are using, say, the "requests" Python library for making HTTP requests, the "timeout" parameter will not help here, because it limits the wait for each read rather than the total duration of the request.
* Private IPs and reserved IPs: you probably don't want users defining webhooks to http://127.0.0.1:<some-port> and probing your internal network. Remember the private IPv6 ranges too.
* Domains that resolve to private IPs: an attacker could set up foo.com which resolves to a private IP. It is not enough to just validate webhook URLs when users set them up.
* HTTP redirects to private IPs. If your HTTP client library follows HTTP redirects, the attacker can set up a webhook endpoint that redirects to a private IP. Again, it is not enough to validate the user-supplied URL.
* Excessive HTTP redirects. The attacker can set up a redirect loop - make sure this does not circumvent your timeout setting.
My current solution for all of the above is to use libcurl via pycurl. I wrote a wrapper that mimics the requests API: https://github.com/healthchecks/healthchecks/blob/master/hc/... (may contain bugs, use at your own risk :-)
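For the private and reserved IP bullets, here is a minimal sketch using only Python's standard `ipaddress` module. It is not the code from the wrapper linked above; the name `is_public_ip` is made up for illustration, and the multicast and IPv4-mapped handling are my own assumptions about what a webhook sender would want to reject.

```python
import ipaddress


def is_public_ip(text: str) -> bool:
    """Return True only for addresses that look safe to send a webhook to.

    Hypothetical helper: rejects loopback, private, link-local, reserved,
    shared (100.64.0.0/10) and multicast addresses, for IPv4 and IPv6.
    """
    try:
        ip = ipaddress.ip_address(text)
    except ValueError:
        # Not an IP literal at all.
        return False

    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so the IPv4 rules apply.
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped

    # is_global is False for private, loopback, link-local and reserved
    # ranges; multicast is excluded explicitly.
    return ip.is_global and not ip.is_multicast
```

This check has to run against the address you actually connect to, on every request and every redirect hop, not just against the URL when the user saves it.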
There's a clever extension to this attack; a naive way to mitigate it is to do a DNS resolution first to verify it's not a private IP and then do the actual request. An attacker can simply return a public IP on the first DNS resolution (with a 0 TTL) and then return a private IP on the second. This is called a "TOCTOU" (time-of-check time-of-use) vulnerability. I've written about this and other security best practices on my blog here - https://www.ameyalokare.com/technology/webhooks/2021/05/03/s...
I've also built an egress proxy that prevents such attacks here - https://github.com/juggernaut/webhook-sentry
Same caveat applies, use at your own risk :-)
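To make the TOCTOU point concrete: the usual fix is to resolve once, validate every returned address, and then connect to the validated IP instead of letting the HTTP client resolve the hostname again. A rough sketch of that idea follows. It is not how webhook-sentry is implemented; `resolve_and_pin` and the injectable `resolver` parameter are hypothetical names used for illustration.

```python
import ipaddress
import socket


def resolve_and_pin(host, port, resolver=socket.getaddrinfo):
    """Resolve host once and return a single validated IP to connect to.

    Hypothetical helper. Raises ValueError if any returned address is
    not globally routable, so a mixed public/private answer is rejected.
    """
    infos = resolver(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    addrs = [info[4][0] for info in infos]
    if not addrs:
        raise ValueError(f"no addresses for {host}")

    for addr in addrs:
        if not ipaddress.ip_address(addr).is_global:
            raise ValueError(f"{host} resolves to non-public address {addr}")

    # The caller must connect to this exact IP; resolving the name again
    # would reopen the time-of-check/time-of-use gap.
    return addrs[0]
```

The caller would then open the socket to the returned IP (for example with `socket.create_connection((ip, port))`) while still sending the original hostname in the `Host` header and as the TLS server name.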
Does anyone have suggestions for such tools/services that they might have used in production?
Timeout attacks: instead of failing, you'll now just be paying through the nose. And with enough scale you can also hit Lambda limits and fail.
SSRF: you are still just as vulnerable unless the Lambda is outside your VPC (but then it's the VPC that solved it, not the Lambda).
Resolution and timeouts: the aiohttp library for Python is slightly better in terms of letting you configure these things, though it's better to just use a sending proxy that does it all for you and is also located in an isolated VPC to make sure that you're protected.
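Whichever client you use, the slowloris case needs an overall deadline on top of the per-read timeout. Here's a library-agnostic sketch of that idea: it reads a response body from any iterator of byte chunks and gives up once a total time budget or a size cap is exceeded. `read_with_deadline`, `DeadlineExceeded` and the injectable `clock` are made-up names, not part of requests or aiohttp.

```python
import time


class DeadlineExceeded(Exception):
    """Raised when the whole response takes longer than the budget."""


def read_with_deadline(chunks, total_timeout, max_bytes=1_000_000,
                       clock=time.monotonic):
    """Collect chunks into a body, enforcing a total deadline and size cap.

    The deadline is only checked when a chunk arrives, so the underlying
    client still needs its own per-read timeout to bound each wait.
    """
    deadline = clock() + total_timeout
    body = bytearray()
    for chunk in chunks:
        if clock() > deadline:
            raise DeadlineExceeded(f"response exceeded {total_timeout}s")
        body.extend(chunk)
        if len(body) > max_bytes:
            raise ValueError("response body too large")
    return bytes(body)
```

With requests, for example, `chunks` could be `response.iter_content(1024)` on a request made with `stream=True` and a per-read `timeout`. The budget here starts when reading begins, so connection time would have to be counted separately.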
Not saying that there aren't valid use-cases (e.g. maybe some sort of dynamic webhook receiving), I'm just saying that we made this choice given the above, and we would be willing to change it if it's ever a barrier for someone.
Fair enough, but often consumers that use multiple vendors receive webhooks in the same service; think about going from /v1/webhooks -> /v2/webhooks, they'd have to change the URL for every vendor. It's easier to redirect first and then update the URLs later. I think it's a reasonable expectation that an HTTP client would honor redirects as long as the usage isn't malicious (like loops etc.).
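If a sender does want to honor redirects, one way to keep it safe is to turn off the client's automatic redirect handling and follow hops manually, re-validating each target and capping the hop count. A minimal sketch, where `follow_redirects`, `fetch` and `is_allowed` are hypothetical names: `fetch(url)` is assumed to perform one request without following redirects and return `(status, headers)`, and `is_allowed(url)` is assumed to do the private-IP check.

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_redirects(url, fetch, is_allowed, max_hops=3):
    """Follow at most max_hops redirects, validating every hop.

    Returns (final_url, status). Raises ValueError on a blocked target
    or when the hop limit is exceeded (which also stops loops).
    """
    for _ in range(max_hops + 1):
        if not is_allowed(url):
            raise ValueError(f"blocked target: {url}")
        status, headers = fetch(url)
        location = headers.get("Location")
        if status not in REDIRECT_CODES or not location:
            return url, status
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise ValueError("too many redirects")
```

This covers the /v1/webhooks -> /v2/webhooks case while still rejecting loops and redirects into the internal network. A total time budget across all hops would still be needed on top of it.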
https://youtu.be/TWohosldLP4 (video specifically for a Lambda function)
Some also don't follow 301s or 302s and will process only specific HTTP responses.