
Ask HN: Securely exposing host just to webhooks? - andbberger
Background: I wish to listen for GitHub webhooks on a Jenkins instance sitting in a private network. There are no public endpoints on the network; it's for internal services.

Easy enough to add an endpoint that forwards to Jenkins, but I'm not a web dev person and have no experience securing public endpoints, so this is a terrifying prospect to me. I could easily introduce a huge backdoor without realizing.

What's the best practice to accomplish this? Is there a tool that is user-friendly enough to prevent me from doing stupid things?

Or should I just forget about it and poll from inside more frequently?

I guess what I'm really asking is: is security still a full-time job? Because the gain relative to just polling more frequently is very small here, and the risk is enormous. So unless things are absolutely rock solid and foolproof, I'm better off just not.
======
jlgaddis
Refer to the guidance suggested by GitHub themselves:
https://developer.github.com/v3/guides/best-practices-for-integrators/#secure-payloads-delivered-from-github

That is, 1) put an SSL certificate on your endpoint and 2) only permit
connections to 443/TCP from GitHub's IP address ranges.
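
A minimal sketch of the second point, in case the filtering has to happen in application code rather than in a firewall. The CIDR blocks below are illustrative assumptions and may be outdated; GitHub publishes the current webhook source ranges under the "hooks" key of https://api.github.com/meta. The function names are made up for this example.

```python
import ipaddress

# Illustrative CIDR blocks only (an assumption for this sketch). Take the
# current list from the "hooks" key of https://api.github.com/meta and
# refresh it periodically, since the ranges can change.
EXAMPLE_HOOK_RANGES = ["192.30.252.0/22", "185.199.108.0/22"]


def parse_ranges(cidrs):
    """Turn CIDR strings into network objects once, at startup."""
    return [ipaddress.ip_network(c) for c in cidrs]


def is_allowed(remote_addr, networks):
    """Return True only if remote_addr parses and lies inside an allowed range."""
    try:
        ip = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # malformed address: reject
    return any(ip in net for net in networks)


NETWORKS = parse_ranges(EXAMPLE_HOOK_RANGES)
print(is_allowed("192.30.252.10", NETWORKS))  # True
print(is_allowed("203.0.113.7", NETWORKS))    # False
```

If a reverse proxy sits in front of the service, the address to check is the one the proxy saw, not the proxy's own; doing the filtering at the firewall avoids that ambiguity.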

~~~
avtar
Suggested practice #3 is equally important:

> Provide a secret token to ensure payloads are definitely coming from GitHub.
> By enforcing a secret token, you're ensuring that any data received by your
> server is absolutely coming from GitHub
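
A rough sketch of what checking that secret token looks like. GitHub signs the raw request body with HMAC and sends the result in the X-Hub-Signature header as "sha1=<hex>" (and in X-Hub-Signature-256 as "sha256=<hex>"). The function names and the secret below are placeholders for illustration.

```python
import hashlib
import hmac

# Digests this sketch accepts; anything else is a programming error.
DIGESTS = {"sha1": hashlib.sha1, "sha256": hashlib.sha256}


def sign(secret: bytes, body: bytes, algo: str = "sha1") -> str:
    """Build the header value a sender would attach: '<algo>=<hex digest>'."""
    digest = hmac.new(secret, body, DIGESTS[algo]).hexdigest()
    return f"{algo}={digest}"


def verify(secret: bytes, body: bytes, header_value: str, algo: str = "sha1") -> bool:
    """Check a signature header against the raw, unparsed request body."""
    if not header_value:
        return False
    expected = sign(secret, body, algo)
    # compare_digest takes time independent of where the values differ.
    return hmac.compare_digest(expected.encode(), header_value.encode())


secret = b"placeholder-secret"           # hypothetical; load from config in practice
body = b'{"zen": "Keep it logically awesome."}'
header = sign(secret, body)              # what the sender would have computed
print(verify(secret, body, header))      # True
print(verify(secret, body + b" ", header))  # False
```

The important detail is that the signature is computed over the bytes exactly as received, before any JSON parsing or re-serialization.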

------
alexwilliamsca
A few things you can do:

1) Place Nginx in front of the webhooks service. It will let you filter
   traffic by IP, request headers, etc. (a bit less efficient than a proper
   firewall IP filter).
2) Listen for webhooks on a port other than 443 (GitHub allows this).
3) Use a unique URL for webhooks, with a unique/random/long token as part of
   the URL, so that only someone who knows the exact URL can reach it.
4) Of course, use a valid TLS certificate.
5) Validate _all_ headers sent by GitHub (user-agent, x-github-delivery,
   etc.).
6) Provide a unique/random/long shared "secret", different from the URL
   token, for validating the sha1 signature of the request.
7) Only accept a valid JSON payload and the application/json content-type.
8) Only accept specific events from the x-github-event header (e.g. push,
   ping).
9) Reject EVERYTHING ELSE with a 404.
10) Validate the actual content of the JSON payload: does it contain the
    key/value pairs you need? Discard the rest.
11) Enable audit logging of requests, so you can see any attempts to "hack"
    your webhooks service.
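
The header, event and payload checks in the list above can be sketched as one framework-independent function. This is only an illustration: the required keys are an assumption about what a particular setup needs, and `gate` is a made-up name. Signature checking and IP filtering would sit in front of it.

```python
import json

ALLOWED_EVENTS = {"push", "ping"}           # the example events named in the list
REQUIRED_PUSH_KEYS = {"ref", "repository"}  # assumption: the keys this setup needs


def gate(method, headers, body):
    """Return (status, event_dict). Anything unexpected gets a bare 404."""
    h = {k.lower(): v for k, v in headers.items()}
    if method != "POST":
        return 404, None
    content_type = h.get("content-type", "").split(";")[0].strip().lower()
    if content_type != "application/json":
        return 404, None
    event = h.get("x-github-event")
    if event not in ALLOWED_EVENTS or not h.get("x-github-delivery"):
        return 404, None
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 404, None
    if not isinstance(payload, dict):
        return 404, None
    if event == "ping":
        return 200, {"event": "ping", "payload": {}}
    if not REQUIRED_PUSH_KEYS.issubset(payload):
        return 404, None
    # Keep only the keys that are needed; discard the rest.
    kept = {k: payload[k] for k in REQUIRED_PUSH_KEYS}
    return 200, {"event": event, "payload": kept}
```

Returning the same 404 for every failure keeps the endpoint from telling a scanner which check it tripped; the detail belongs in the audit log instead.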

I recommend running the webhooks (external service) as an entirely different
application from your internal services. If it's a nodejs app, and your main
internal app is nodejs, then you'll need to run 2 nodejs processes (and not as
root).

Also if you can, try running the webhooks service on an entirely different
machine (vm?) - and have it talk to Jenkins through the network (ex: as others
have suggested with a message queue or API call).

If you're filtering by IP (might be troublesome if GitHub's IP range changes),
most of the above will be overkill.

Edit: to answer your last question: security is a process; whether it's
full-time or not depends on how much you care. Edit 2: fix typo

------
niftich
One way, like you say, is to set up a listener on a public network and have it
spawn a different event, which gets handed down to your private server through
a specifically punched hole. But there are a lot of things to pay attention
to: the public listener is fully public and will probably attract all manner
of hostile traffic -- DDoS attempts, scans, canned exploit attempts -- which
you have to be resilient to, and filter out. And even if you receive a
properly authenticated payload from your webhook, you're going to want proper
input validation (e.g. whitelisting expected values and discarding anything
else, so that your lower program isn't directly executing unvetted output
from the outside).
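
To make the whitelisting idea concrete, here is a small sketch that turns an already-authenticated push payload into a fixed-shape internal event and drops everything else. The repository name, the job name and the `to_internal_event` helper are all hypothetical.

```python
import re

# Hypothetical allowlist for this sketch: repository name -> internal job id.
JOBS = {"example-org/example-repo": "build-example"}

# Accept only plain branch refs made of a conservative character set.
BRANCH_REF = re.compile(r"refs/heads/([A-Za-z0-9._/-]{1,100})")


def to_internal_event(payload):
    """Map a push payload (a dict) to {'job', 'branch'}, or None if not allowed."""
    repo = payload.get("repository")
    name = repo.get("full_name") if isinstance(repo, dict) else None
    ref = payload.get("ref")
    if not isinstance(name, str) or name not in JOBS:
        return None
    if not isinstance(ref, str):
        return None
    match = BRANCH_REF.fullmatch(ref)
    if match is None:
        return None
    # Only values chosen from the allowlist or matched by the pattern go inward.
    return {"job": JOBS[name], "branch": match.group(1)}


push = {"repository": {"full_name": "example-org/example-repo"},
        "ref": "refs/heads/main"}
print(to_internal_event(push))  # {'job': 'build-example', 'branch': 'main'}
```

The private side then only ever sees a job id it already knows and a branch name that passed the pattern, never a raw string from the outside.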

There is tooling that makes some facets easier (e.g. API proxies, which handle
in-software rate limiting, authn/z), but the inherent complexity of the
problem remains.

An alternative that comes with much less exposure and much less runaway
complexity is to poll from the inside.
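
For comparison, a sketch of what one polling step could look like. GitHub's REST API supports conditional requests (ETag with If-None-Match, answered with 304 when nothing changed), which keeps frequent polling cheap. The `fetch` callable is an assumption of this example: it stands for whatever outbound HTTPS request is made, and is faked here so the logic can be run on its own.

```python
def poll_once(fetch, state):
    """Run one polling step and return new data, or None if nothing changed.

    fetch(etag) is supplied by the caller (hypothetical here). It should send
    an outbound HTTPS GET with If-None-Match set to etag and return a tuple
    (status, new_etag, body). state is a dict remembering the last ETag.
    """
    status, new_etag, body = fetch(state.get("etag"))
    if status == 304:
        return None  # unchanged since the last poll
    if status != 200:
        raise RuntimeError(f"unexpected status {status}")
    state["etag"] = new_etag
    return body


def fake_fetch(etag):
    """Stand-in for the real request: one version of the resource, tagged 'v1'."""
    if etag == "v1":
        return 304, etag, None
    return 200, "v1", "new commits"


state = {}
print(poll_once(fake_fetch, state))  # new commits
print(poll_once(fake_fetch, state))  # None
```

The response still has to be treated as untrusted input and the connection still has to be authenticated with TLS; polling only removes the need for a listening socket.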

~~~
zAy0LfpBZLC8mAC
> An alternative that comes with much less exposure and much less runaway
> complexity is to poll from the inside.

How exactly is it much less exposure when you are presumably handling the
exact same information?

~~~
markkanof
I would guess the OP's concern is inadvertently providing access to entities
other than GitHub. If you polled from the inside then you could be sure that
the only thing you would need to worry about is how the data that is pulled
down from GitHub is parsed, because you are not providing access to anyone at
all, including GitHub.

~~~
zAy0LfpBZLC8mAC
> I would guess the OP's concern is inadvertently providing access to entities
> other than GitHub.

Well, yeah, sure.

> If you polled from the inside then you could be sure that the only thing you
> would need to worry about is how the data that is pulled down from GitHub is
> parsed, because you are not providing access to anyone at all, including
> GitHub.

That's a highly confused way of looking at things.

Whether you consider a process "pulling data" or "receiving pushed data" is
ultimately entirely arbitrary. You can view any "push" as a "pull" and any
"pull" as a "push" if you shift perspective slightly.

For some reason you decide to use a push configuration. How do you do that?
You create some address where data can be delivered to. And then you inform
the potential source of the data about that address, so it can _push_ the data
to you. So, really, you are just sending a request to send you the data, i.e.,
you are obviously _pulling_, right?

OK, say you don't like a push setup, so, let's say you send an HTTP request to
a server in order to _pull_ some data? Well, yeah, sure you do. But then,
really, you are just creating a multiplexing identifier that allows you to
receive data and you inform the HTTP server about that address, namely a TCP
four-tuple plus sequence numbers, which the server can use to subsequently
_push_ the object to you.

Mind you, I am not just playing with semantics here: Anyone who knows the
address that you submit to the "pushing" side can send you data, in both
cases. Well, not quite as easily with the TCP connection state, as you have to
fake IP addresses for that, but that's not really a huge barrier. The
underlying layers give you no guarantee at all that the data you receive on an
outgoing connection comes from the entity that you intended to connect to;
that's just an illusion.

So, what do you do? You authenticate. You use TLS, you use SSH, you use keys
and certificates, to cryptographically authenticate that the data that you
receive is indeed coming from the entity that you want to accept it from, and
anything that isn't authenticated successfully, you drop on the floor. Whether
you do that on an inbound or on an outbound connection is irrelevant. And if
you fail to do it in either case, you have a security risk.
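
On the outbound side, that authentication is mostly a matter of not switching it off. A small sketch using Python's standard `ssl` module, whose default client context already requires a valid certificate chain and a matching hostname; the function name is made up for this example.

```python
import ssl


def strict_client_context():
    """Client-side TLS context that authenticates the server it connects to."""
    ctx = ssl.create_default_context()  # loads the system's trusted CAs
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return ctx


ctx = strict_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True
print(ctx.check_hostname)                    # True
```

A context like this can be passed to `urllib.request.urlopen(url, context=ctx)`; the risk described above appears when code sets `check_hostname = False` or `verify_mode = ssl.CERT_NONE` to make an error go away.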

Essentially, the idea that you aren't "providing access to anyone" is an
illusion. When you can receive data from someone, you are "providing access"
on some level, and when you pull data via HTTP from GitHub, they obviously can
send you data in response--and, as we have seen, others can as well, so you
can't even be sure you are getting the response from GitHub. That is, unless
you authenticate cryptographically that it indeed does come from GitHub--but
then, there is no reason you couldn't do the same on an inbound connection.

Pull and push only have a useful meaning in terms of scheduling, but that has
nothing to do with security: A pull is when the requested data is pushed as an
immediate response to the request, while a push is when the requested data is
pushed in response to an event external to the established link. See also HTTP
long polling (outbound connection used for "push" from the server) or SMTP
TURN (inbound connection used for "pull" from server).

------
moltar
You can use a message queue, to which you add the webhook payload. Then your
Jenkins server can listen on this queue.
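
A toy sketch of that shape, using an in-process `queue.Queue` purely as a stand-in for a real message broker (an assumption; no particular broker or Jenkins API is implied, and the function names are made up). The public side only enqueues small, already-validated events, and the private side fetches them over a connection it opens itself.

```python
import json
import queue

# In-process stand-in for a real message broker; bounded so a flood of
# requests cannot grow it without limit.
broker = queue.Queue(maxsize=100)


def receive_webhook(event):
    """Public side: enqueue a small, already-validated event. Never calls Jenkins."""
    try:
        broker.put_nowait(json.dumps(event))
        return True
    except queue.Full:
        return False  # shed load instead of blocking


def drain(trigger_build):
    """Private side: pull pending events and hand each one to trigger_build."""
    handled = 0
    while True:
        try:
            message = broker.get_nowait()
        except queue.Empty:
            return handled
        trigger_build(json.loads(message))
        handled += 1
```

With this split the private network needs no inbound hole at all: the Jenkins side calls something like `drain(start_job)`, where `start_job` is whatever hypothetical function triggers the build locally.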

~~~
markkanof
Can you explain more about how this would address security concerns?

------
johns
ngrok

------
zAy0LfpBZLC8mAC
> Because the gain relative to just polling more frequently is very small
> here, and the risk is enormous.

What is that enormous risk that you see?!

There is nothing inherently more risky in handling "inbound" connections vs.
"outbound" connections. Either you are doing stupid things with untrusted data
(i.e., anything not from your own trusted systems) or you aren't. How the data
comes into your system is completely irrelevant. If you have too little clue
to know how to build a secure listener, chances are you shouldn't be building
a poller either.

