DNS Infrastructure at GitHub (githubengineering.com)
191 points by logicalstack on May 31, 2017 | 21 comments



I'm always interested to know how long these infrastructure changes took from project initiation to launch, and whether it was a dedicated project team or something completed alongside BAU tasks.

These details are almost never included in these write-ups. Anyone have any guesses?


This was roughly a six-month project for a single engineer working on it around 75% of the time, with help from other folks along the way for code reviews and the like. The first three months were research, planning, implementation, and so on; the latter three were a very careful rollout and migration from the old system to the new one, and finally decommissioning the old system.


Thanks very much for the reply. Very useful. I think these kinds of details really help people in other organizations who might want to undertake similar projects.


Do queries to github.net stay internal or do you also sync github.net zones to Route53/Dynect ... just in case?

We have a similar setup with Unbound and NSD (no need for PowerDNS in our case). Even then it took a while to get right, because JVM apps in particular love to hang for no reason doing DNS lookups. You also need to specify -Dnetworkaddress.cache.ttl= and friends, since they don't respect TTLs.
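For anyone hitting the same thing: networkaddress.cache.ttl and networkaddress.cache.negative.ttl are Java security properties, so they can also be set in the JVM's java.security file. A minimal sketch, with TTL values that are purely illustrative rather than what the parent uses:

    # $JAVA_HOME/conf/security/java.security (jre/lib/security/java.security on older JDKs)
    # Cache successful lookups for 60 seconds
    networkaddress.cache.ttl=60
    # Cache failed lookups for 10 seconds
    networkaddress.cache.negative.ttl=10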

Running unbound on every single machine has saved us a lot of downtime.


Nearly all of our internal zones stay internal and are not synced to an external provider. In a few cases we need to look up internal zones from outside our network, and those zones live both internally and externally.


I noticed PowerDNS in the mix. Can you say what backend you are using with PowerDNS and how that has worked out?


We use the MySQL backend and the HTTP API; a few small nits, but for our purposes it has worked very well thus far. Note that our authorities never see production traffic outside of AXFRs from our "edge" hosts, so I can't say how well it works for other use cases.


What's the reason you've chosen MySQL over the bind backend when you're using the API anyway? I have to make a similar decision soon and I'm not really sure yet; any insight would be appreciated.


Full access (read and write) to the PowerDNS HTTP API requires one of their generic SQL backends (see https://docs.powerdns.com/md/httpapi/README/), such as MySQL. The bind backend only supports reading from the API; changes to zones would need to be made on the file system and/or using pdns_control. Beyond that, having all our records queryable via SQL has been nice for debugging and researching our own DNS records, record types, and so on. Lastly, backends like the MySQL one allow for things like auto-generating serials and adding comments to the DNS data.
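For anyone weighing the same choice, a minimal pdns.conf along those lines might look like the following; the hostnames, credentials, and API key are placeholders, not GitHub's actual configuration:

    # /etc/powerdns/pdns.conf -- generic MySQL backend plus the HTTP API
    launch=gmysql
    gmysql-host=127.0.0.1
    gmysql-dbname=pdns
    gmysql-user=pdns
    gmysql-password=changeme

    # The REST API is served by the built-in webserver
    api=yes
    api-key=changeme
    webserver=yes
    webserver-address=127.0.0.1
    webserver-port=8081

With something like that in place, zone changes can go through the API (or straight into the SQL tables) rather than through zone files and pdns_control as with the bind backend.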


PowerDNS developer here - any nits we should know about?


Thanks!


Do you still run a local caching DNS daemon on every server? If not, why the change?


Yes, we still use local caches on each host.


I'm curious if they're using DNSSEC at all. I notice they're using Dynect for this, and in my experience DNSSEC and Dyn do not get along (unless you're not using any of their special features like geotargeting), so I'm interested in hearing how they've managed to get all that working.


I'm curious why people ask about DNSSEC support. None of the major browsers support validating it.

Even to validate DNSSEC records yourself, there is only a single website available [1] (which doesn't even have TLS). I want DNSSEC to catch on, but the adoption level is a joke.

[1]: http://dnsviz.net


You're not limited to just browsers; a perfect use case for DNSSEC would be in combination with SSHFP records for SSH, incidentally something GitHub relies on heavily, and where support is much better.
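Roughly, the moving parts look like this (the hostname is hypothetical, and the published records need to live in a signed zone):

    # On the SSH server: print SSHFP records for its host keys, ready to publish in DNS
    $ ssh-keygen -r git.example.com

    # On the client (~/.ssh/config): trust host keys delivered via validated SSHFP records
    Host git.example.com
        VerifyHostKeyDNS yes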

Adoption is slow, nobody argues that, but once you've set it up and have routines for rolling keys it's more or less self-maintaining.

Google Public DNS will return SERVFAIL if validation fails, which is a step in the right direction.

There are plenty of tools to validate DNSSEC, even over TLS [0]. But I'm not sure why you would need a web page to do it; you can easily grab the root keys and validate the whole chain using dig on your own computer.

[0] https://dnssec-debugger.verisignlabs.com/
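A quick sketch of that dig approach, using a public validating resolver and a signed zone purely as examples:

    # Ask a validating resolver; the "ad" flag in the reply means it validated the answer
    $ dig +dnssec @8.8.8.8 example.com A

    # Or walk the chain yourself from the root down
    $ dig +dnssec . DNSKEY            # root zone keys
    $ dig +dnssec com DS              # DS record delegating to .com
    $ dig +dnssec example.com DNSKEY  # the zone's own keys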


$ dig +dnssec github.com will give the answer, and the answer is NO.


I don't see any mention of HTTPS support for custom domains. I wonder if this helps move the needle on that. I had moved a lot of project hosting to my paid GitHub account, but SSL has become a necessity (SEO and privacy), so I'm launching sites on DigitalOcean again. I'd love to have less server config to do, though.


Not everyone loves this approach, but putting Cloudflare in front of GitHub Pages works great for getting easy SSL.
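In DNS terms that's just a proxied record at Cloudflare pointing at Pages, something like the following (names are hypothetical); Cloudflare then terminates TLS in front of GitHub:

    ; Zone data at Cloudflare, with the record proxied ("orange cloud")
    www.example.com.   300   IN   CNAME   username.github.io.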


>We configured zone stubs in the caching daemon to direct queries locally rather than recurse on the internet.

What does this mean?


It means that for those zones, they explicitly put the IPs of the edge servers in their resolver (Unbound) configuration, so lookups of names in those zones don't have to go to the root servers and then the TLD (like .com) servers, only to find out that the authority (the edge servers in their design) is in the next rack. Instead they go directly to those edges. This is what gives them the property described in the post: "Additionally, public zones are completely resolvable within our network without needing to communicate with our external providers. This means any service that needs to look up api.github.com can do so without needing to rely on external network connectivity."
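In Unbound that's a stub-zone block per zone; an illustrative sketch, with placeholder addresses rather than GitHub's actual edge hosts:

    # unbound.conf
    stub-zone:
        name: "github.com"
        stub-addr: 192.0.2.10   # internal authoritative "edge" host
        stub-addr: 192.0.2.11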




