The Decline and Fall of BIND 10 [pdf] (ripe.net)
66 points by ook on May 14, 2014 | 59 comments

Wow, I feel that pain. I designed a replacement for Sun's Yellow Pages service called NIS+, and with a couple of awesome engineers got it built and into production. It changed everything about the old YP. And if there is one thing system administrators really hate, it's change that isn't compatible with simple mods to their shell scripts. That led to an interesting effort to make a "NIS+ light" that was more like YP.

The evolution of BIND seems to have faced very similar challenges. The earlier version worked; the new version was all different, so different that its customers (the system administrators) seem to have revolted. Ouch.

When you are building a part of the infrastructure that lots of people have to manage to keep it running, it's a special kind of challenge, in deployment, change management, and service evolution alike. Even after going through the process with NIS+, I'm not sure I could chart a path for a BIND replacement.

The secret is to make the upgrade path seamless - if I have to navigate an obstacle course of library dependencies, config file format incompatibilities, system conflicts or complicated data migration issues, I'll just skip the whole affair.

All too often devs get blinders on and don't realize that most people don't have their domain knowledge and therefore it's not quite so easy for them to see the forest for the giant stacked trees that are lying across the path.

If you can't avoid that morass, the new version better offer something so compelling you can't possibly skip it.

I remember that Sun's internal IT guys refused to run NIS+ for years and years, stable or otherwise. I converted a site with approximately 25,000 users from NIS to NIS+ in early 1994. I had guys from Sun external tech support calling me for a few years asking for tips on how to make it run faster and how to recover it when it went belly up (which it did from time to time). Having said that, if you ignored the manuals and reached into its innards, you could get it to restart in the space of a few seconds, even on a lowly sun4c workstation.

In my experience it was far more flexible than NIS which was both a good thing and a bad thing. The one big advantage it had for me was its support for SecureRPC. In time it was killed by LDAP.

Just want to say thanks, the automounter, nfs, and nis were great in concert.

Until they aren't.

Too many hours spent troubleshooting automounter / NFS failures through NIS and LDAP.

I had heard through the rumor grapevine that NIS+ was designed/built due to certain requirements from a Large Customer, and that Sun itself still used plain NIS internally. Interesting to find out that was wrong.

Nope, it was a bit weirder than that. I started work on it after Sun had signed the agreement to merge System V and SunOS with AT&T. We were beavering away on it as the Zeus Name Service (ZNS) (Zeus was the code name for Solaris 2.0, aka the first merged product) and after a while it became obvious to AT&T there was this thing in the Sun product that wasn't in the 'merged' product. We explained to them what it was and they said "Well we want that in our product too." and Ed Zander said, "Well we couldn't possibly agree to that without more money from you guys." And they said "How much?" and Ed quoted them some multi-million dollar sum (I wasn't privy to the actual conversation and have heard different numbers for it), and they said "Ok fine."

To say that it surprised Ed would be an understatement, and it set off a series of events that left me feeling pretty unsatisfied. The first thing it did was bring attention to the next version of the ONC product, which was going to include NFS V3 and NIS+ and nominally just ship to customers. Except that now that NIS+ was "valuable", SunSoft (the sub-company in Sun responsible for Solaris) didn't want to just _give_ it away (the original plan) when they could charge big bucks for it. So they rechristened it ONC+ and wrapped some egregious license terms around this "new" product. That led to a ferocious debate inside the company, fractured relationships with long-term ONC licensees (like SGI and HP), and an edict not to ship it on SunOS 4.1, to "encourage" adoption of Solaris 2.0. I would have left Sun at that point if James Gosling hadn't recruited me to come work on Project Green, aka Java.

Wasn't NIS+ also a Sun-only product for a while? Or at least, if it was available for other Unix systems, was it an additional cost? I seem to remember that we couldn't use it back in the late 90's because we didn't have an AIX version or something like that.

Interesting view into the human elements of a software project.

BIND is one of those programs that scares the living crap out of me, both from a security perspective and from a complexity perspective. (Gee, could those be related?)

Just take a look at the list of BIND vulnerabilities. We're still finding em, and I'm sure we'll continue to for years to come.


For a refreshingly simple and secure DNS serving experience, I highly recommend djbdns / tinydns / dnscache:


>BIND is one of those programs that scares the living crap out of me, both from a security perspective and from a complexity perspective. (Gee, could those be related?)

I couldn't agree more. I'm currently TAing a bachelor's-level security course, for which I had to design challenges where students break into systems, and one of them is DNS cache poisoning. I've spent the last few weeks digging into BIND's code, its DNS implementation and old vulnerable versions, and it's amazing how much went ignored in the past and how many subtle vulnerabilities can hide for years in very complex software and protocols.

Kaminsky's DNS cache poisoning attack is fun.

Note that djbdns wasn't vulnerable to the attack, because it was designed from the outset to exploit other sources of randomness (notably source port randomization). I'm a little rusty here, but the countermeasures djbdns employed against the attack were known all the way back when djbdns was started, and had been proposed on mailing lists before that. BIND didn't employ them because (I think, but could be wrong) the BIND architects believed DNSSEC would be the operationally "correct" solution.
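To put rough numbers on why source port randomization matters: a forged reply must match the 16-bit query ID, and with a randomized source port it must match the port as well. The sketch below is simple arithmetic, not djbdns code; it assumes a single race with uniformly random values and ignores the legitimate answer arriving first, and the 64,000-port figure in the usage note is an illustrative assumption (the real usable range depends on the OS).

```python
# Rough odds that forged replies win one poisoning race.
# Assumption: the attacker must guess the 16-bit DNS query ID and,
# if the resolver randomizes it, the UDP source port as well.
QUERY_IDS = 2 ** 16  # the DNS header ID field is 16 bits wide


def guess_space(source_ports: int) -> int:
    """Number of equally likely (ID, port) combinations the attacker must cover."""
    return QUERY_IDS * source_ports


def success_odds(forged_replies: int, source_ports: int) -> float:
    """Probability that at least one of `forged_replies` distinct guesses is right."""
    space = guess_space(source_ports)
    return min(forged_replies, space) / space
```

With a fixed port, success_odds(100, 1) is about 0.15% per race; spreading queries over roughly 64,000 ports shrinks that by the same factor, which is why an attacker then needs many races (see duplicate queries, below) rather than one lucky burst.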

I trust djbdns more than perhaps any other piece of C code I run, and recommend it.

The last released djbdns is vulnerable to simple poisoning attacks, however. It does not merge duplicate queries, permitting unlimited spoofing attempts (just like Kaminsky's child label-NS replacement technique).

This failure undoes much of the benefits of source port randomization. A patch is available from a third party, but it is not part of the standard source tree.
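Duplicate query control means that while a question is already outstanding upstream, further identical client questions wait for that one answer instead of triggering new upstream queries (each of which is another race a forger can try to win). A minimal sketch of the idea, not the code of BIND, Unbound, djbdns or the third-party patch; `send_upstream` is a hypothetical transport hook and answers are simplified to strings.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # (lower-cased query name, record type)


class QueryCoalescer:
    """Merge identical outstanding upstream queries into a single request."""

    def __init__(self, send_upstream: Callable[[str, str], None]) -> None:
        self._send = send_upstream  # hypothetical hook that puts a query on the wire
        self._waiting: Dict[Key, List[Callable[[str], None]]] = defaultdict(list)

    def request(self, qname: str, qtype: str, reply: Callable[[str], None]) -> None:
        key = (qname.lower(), qtype)  # DNS names compare case-insensitively
        first = key not in self._waiting
        self._waiting[key].append(reply)
        if first:
            self._send(qname, qtype)  # only one query per (name, type) in flight

    def answer(self, qname: str, qtype: str, rdata: str) -> None:
        # Deliver the single upstream answer to every client that asked.
        for reply in self._waiting.pop((qname.lower(), qtype), []):
            reply(rdata)
```

Three client requests for Example.com/A produce one upstream query, and the one answer fans out to all three callers.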

There are also attack scripts for djbdns.

BIND, Unbound and other recursive resolvers have both SPR (source port randomization) and duplicate query control. This fact explains things like this:

From a poisoning point of view, BIND and Unbound are more secure than djbdns. (To be fair, djbdns is no longer maintained, so this poisoning fix has not been integrated.)

Security is relative. From a software coding point of view, djbdns is smaller and has fewer features (e.g., IPv6 and DNSSEC support are lacking), and larger code bases are more likely to contain coding errors.

But for me, an unmaintained code base with known, unpatched attacks is not a good choice.

If BIND is fairly vulnerable, why does everyone say it's the best, the de facto standard DNS server on Linux? It's as if you use BIND and only BIND. Is it really that bad underneath?

I recommend djbdns.

You recommend something that hasn't been maintained in over a decade. Something that needs all sorts of random patches to still be useful? Random patches that may or may not introduce random vulnerabilities?

BIND may have had a (really) bad run in the BIND 8 days, but BIND 9 was surely an improvement.

And if you really want to stay away from BIND, you can go for something like PowerDNS. Or a combination like PowerDNS+Unbound.

I'm not sure what patches you're saying are required. Most startups don't need much more than what comes in the box. What's crazy to me is the idea that they'd opt into BIND 9 preemptively, not knowing what their extended needs are.

This is, for what it's worth, the original objection to qmail as well --- "it's not even maintained". It's not maintained because for the core job it does, it's _correct_. There's not a lot of mail software (at least, not software written in C) that can make a competing claim.

On rare occasions, someone actually finishes a program. Just because there isn't a mad rush of patches doesn't mean that a program is not up to snuff.

Having been through djbware patch hell, I tend to agree, but did he not finally actually open source his software (both qmail and djbdns)?

Has that addressed the patchiness blechery?

Best and de facto do not necessarily coincide. BIND is just one of those classics that has been around forever and has huge momentum behind it. BIND also powers many of the root servers, AFAIK, which gives it some extra credibility. But what is best for root servers might not be best for your average personal, small-business or startup use.

No, it isn't /that/ bad underneath.

But BIND is a fairly big and complex code base, which is probably also tied to why people use it: it's got the features people need (and a _huge_ community and install base).

  People Fear & Hate Change

  BIND 10 is quite different for administrators

  – Lots of dependencies, slow build
  – Lots of processes
  – Tool to configure, not configuration files

  People hate change.
Do they? Or do they hate things being worse?

Having lots of dependencies can actually be a huge pain. Yes, reusing existing software is wise. And most of the time, these days, on a modern Unix, you're covered by the package manager. But not always. Someone is going to be building or installing on some freaky system, or on some old version of something, and every additional dependency is going to be like a red-hot poker up the fundament. More dependencies is worse.

Having lots of processes generally does make things harder to manage. Yes, there are advantages in terms of fault isolation and privilege separation. But it means that you can't just throw off a quick pgrep to see if the service is up; you have to worry about only some parts of it being up. More processes is worse.
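With a multi-process daemon, "is it up?" becomes "are all of its components up?". A minimal Linux-only sketch of that check; the component names are a hypothetical list modelled on BIND 10's b10-* processes, so substitute whatever your installation actually runs.

```python
import os
from typing import Iterable, List, Set

# Hypothetical component list modelled on BIND 10's b10-* processes.
EXPECTED = ["b10-init", "b10-msgq", "b10-auth", "b10-resolver"]


def running_process_names(proc: str = "/proc") -> Set[str]:
    """Collect process names from Linux /proc/<pid>/comm.

    Note: comm holds at most 15 characters, so longer names appear truncated.
    """
    names = set()
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc, pid, "comm")) as f:
                names.add(f.read().strip())
        except OSError:
            continue  # process exited while we were scanning
    return names


def missing_components(expected: Iterable[str], running: Set[str]) -> List[str]:
    """Required processes that are absent; an empty list means fully up."""
    return [name for name in expected if name not in running]
```

Usage: missing_components(EXPECTED, running_process_names()) returns, say, ["b10-resolver"] when the authoritative side is up but the resolver has died, which is exactly the partial state a single pgrep would hide.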

Using a tool to configure something instead of using a file ... nah, I've got nothing. There's nothing good about configuration tools. I have never come across a situation where I used a specific tool to configure something and didn't wish I was just editing a file. Tools for configuration is worse.

I'm sure there are lots of really awesome things about BIND 10. It clearly had a huge amount of care and attention given to it. But it does sound a bit like its developers were blind to the needs of the system administrators who they hoped would ultimately install it.

I'm curious about this quote on page 21: "Administrators really hate Python. Really. HATE."

I hadn't heard anything like this before, what is the reasoning here?

One reason is that versioning is really hard. Now that Python comes standard with Linux distros, and in fact some of the built in tools are written in Python, it becomes really hard to upgrade. So you can never touch the version that your OS ships with or you'll break some part of the OS. But those are usually years behind and developers generally want the new hotness so you end up installing newer versions elsewhere. There are tools built around this like virtualenv but you end up with a bunch of different versions of Python installed in different places, and then installing modules becomes something that happens outside of your system's package management software so you can't track dependencies etc. And python itself has really weak package management, or rather several competing and incompatible systems but you'll download modules that use each of them so you end up with all of them. At least Ruby and Perl were able to standardise on one each.
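One way to keep an application's Python packages from touching the interpreter the OS depends on is a per-application environment created with the standard library's venv module (the modern successor to virtualenv). A minimal sketch, assuming Python 3.3+; the environment location is up to you.

```python
import os
import tempfile
import venv


def make_app_env(env_dir: str) -> str:
    """Create an isolated environment for one application; return its interpreter path."""
    # with_pip=False keeps the example offline; pass True to bootstrap pip via ensurepip.
    # symlinks mirrors the `python -m venv` default: symlinks on POSIX, copies on Windows.
    venv.create(env_dir, clear=True, with_pip=False, symlinks=(os.name != "nt"))
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        python = make_app_env(os.path.join(root, "myapp-env"))
        print(python, os.path.exists(python))
```

This isolates the application from the system interpreter, but it does not answer the package-management complaint: whatever gets installed into the environment is still invisible to the OS package manager.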

So very much this. As a developer, I just love "pip install whatever", but as a sysadmin, it makes me cringe every time - how do I manage this? What happens if I have to reinstall? What do you mean, you want to put files in /usr that aren't known by the packaging system that came with the OS? And other langs seem to be following the same painful path (cf Ruby gems, which I don't believe solve these problems any better than Python). I first encountered this in ROS (http://ros.org), which while really awesome, always felt like a big pile of Python spitballed together.

And lo, Python became Perl, and the cycle was complete.

I completely agree. I've worked at large companies where we were stuck with Python 2.5 because of LTS distro versions in production (around 2010).

I work at Continuum using our Python distribution (Anaconda[1] and Miniconda[2]). We sell it as the data scientist's answer to Python, but as a sysadmin, I really think it is the answer administrators have been looking for. It installs into one directory that can be blown away if needed. It provides virtualenv-like environments, except that you can isolate different versions of Python or any other binary. You can "activate" an environment just by setting your PATH to find the env's bin directory first.
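The PATH-ordering idea behind "activation" can be shown in a few lines. This is a sketch of only that part: conda's own activate scripts also set other variables (such as CONDA_PREFIX), and the POSIX `<prefix>/bin` layout and the environment path in the usage note are assumptions.

```python
import os
from typing import Dict, Mapping


def activated_env(env_prefix: str, environ: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `environ` with the environment's bin directory first on PATH."""
    env = dict(environ)  # never mutate the caller's environment
    bin_dir = os.path.join(env_prefix, "bin")
    rest = [p for p in env.get("PATH", "").split(os.pathsep) if p and p != bin_dir]
    env["PATH"] = os.pathsep.join([bin_dir] + rest)
    return env
```

For example, subprocess.run(["python", "--version"], env=activated_env("/opt/envs/web", os.environ)) would pick up the env's python (and node, nginx, etc.) before the system's, with the hypothetical path /opt/envs/web standing in for a real environment.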

The conda package manager installs binary packages, so you don't need to compile c-extensions on each machine. Conda allows you to build all of your packages for each arch. Binstar is our alpha package hosting service that lets you host packages and it has a multi-platform build service (super-alpha). And you aren't limited to Binstar if you want to host your own repos -- the commands are built into conda.

If you want to distribute non-python binaries, you can do that as well. I'm using conda to install node and node packages. I really like using conda for node binaries, because I can build them using npm and then I have a way to push versioned node binaries to production. (Internally, we've built most of the R packages, postgres, mysql, mongo, nginx as conda packages to provide non-root users a way to install normally root-only software).

[1] Anaconda: http://continuum.io/downloads [2] Just python: http://conda.pydata.org/miniconda.html#miniconda

What are the "several competing and incompatible systems"? I haven't used anything other than pip, ever. I know of easy_install, but as far as I know pip is an outright total replacement for it.

The tools installed by pip won't usually work with the old versions of Python found by default on some systems. They break because they want a new Python feature, and that makes you want to upgrade and that... you can't do that because it will break dependencies at system level.

And then you want to install a new Python side by side and you sort of do that but /usr/bin/python still points to the old version and you have to be explicit with /usr/bin/python2.6 in your scripts. But then some scripts will work and some still don't and you make /usr/bin/python point to /usr/bin/python2.6 and now you really broke yum.
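A script that depends on a newer Python can at least fail with a clear message when /usr/bin/python turns out to be an older release. A minimal sketch; the (2, 6) floor is an assumption standing in for whatever version the script was tested on. The guard only helps if the file still parses on the old interpreter, so it avoids newer syntax (no f-strings, no annotations).

```python
import sys

MINIMUM = (2, 6)  # assumption: the oldest interpreter this script was tested on


def check_interpreter(minimum=MINIMUM, current=None):
    """Return an error message if the running Python is too old, else None."""
    current = tuple(current or sys.version_info[:2])
    if current < tuple(minimum):
        return "needs Python %d.%d+, found %d.%d (%s)" % (
            minimum[0], minimum[1], current[0], current[1], sys.executable)
    return None


if __name__ == "__main__":
    problem = check_interpreter()
    if problem:
        sys.exit(problem)  # exit status 1 with the message on stderr
```

Failing early like this beats an ImportError or AttributeError halfway through a run.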

And all you can do now is hate python because it gave you all these problems.

And I can't imagine what it is like to deploy Java apps if you have no actual familiarity with the JVM ecosystem, either.

As a sysadmin, I hated java way more than python :-)

At least, with java, no one even attempts to use distro package tools to manage libraries, packages and dependencies. Can you see a sysadmin using a syslog replacement implemented in java, requiring java8, along with a webserver only running on java5?

With python, some tools (those big enough to be packaged) are no problem. Some random tool from pip -- not so much -- now you have to patch that python binary, those python dependencies etc... and not just track security.your-distro.org.

It is certainly possible to deploy your system in a similar way you deploy your apps (take on 100% of the burden of maintaining the whole thing) -- but normally you want the "system" to be a stable foundation on which to erect your rickety duct-taped mono-jvm-python-ruby app-stack with the help of puppet/chef/cfengine/salt/ansible... you don't want pip/hackage/gem/npm to manage your system. You might trust it to manage a single application.

It should be very easy. Things are never installed system-wide -- no cross-app conflicts. When starting an application, either you manually specify each file or directory to include, or you just package every dependency into your executable jar file.

What is your point? That administrators really hate Java too?

This system administrator hates Java almost as much as Windows. I'm still enraged at the old lie "compile once, run everywhere". Just look at what Java does to the Red Hat "alternatives" system, although I can't imagine any prettier body cast around that catastrophe.

Reading your comment the irony occurs to me that the best solution to "compile once" has been real VMs. Not Java VMs, but full god damned fucking virtual machines running the OS and environment you determine should run in them.

You've still got to provision those, but, well, they're individual and isolated and tend not to go hammering into one another. Other than consuming all available system resources.

Oddly: IBM got this right with VM ... 40 years ago?


That's a challenge with _most_ scripting languages.

It used to be a major PITA with Perl. I've run into it with Ruby (and the proposed solutions are even more horrible: RPM).

And yes, Devs absolutely want teh neu hawtness.

I thought there was something going on with Python v2 & v3 that would solve all of this? Then v3 was shelved or something? I saw an older post here on YC about it, but it was above me since I'm mostly a Windows admin :(

As an experienced administrator, I can tell you most hate change. So, a lot of admins are still using Perl and Bash scripts.

Python doesn't really bring anything to an admin's life that isn't covered by those other two workhorses from the 90's toolbelt, except irritating change.

I would refine that to say administrators hate change when they have to bear the brunt of adverse effects of change.

When a lot of organizations demand administrators keep a production net stable, with a staging net that bears only a faint resemblance to the production net, and a development net that is a caricature, is it any wonder the administrators despise change? I've been in some shops where the poor bastard administrators only have production to work with, and they are the ones yelled at and the quarterly reviews dinged when a change forced upon them, surprise, surprise, breaks a critical LOB process.

Perl is fine. Your distro comes with it, you download some script off the internet, maybe you have to install something from CPAN, but it generally works. What version of bash are you running? Does anyone care? It's just a default that comes with your system. It's not that sysadmins are objecting to writing python, it's the pain involved in running other people's python that is the source of their complaints. And it is handy to grab a script from your home directory that you haven't run in years but you wrote for that one situation that has cropped up again, and have it still run. That is far from guaranteed with Python unfortunately, but probably works with Perl or Bash.

Nailed it, 100%. I'm a believer in Python, and even Ruby. The problem is that most of my colleagues have no interest in learning a new language, so deploying stuff with it when there's not enough benefit is impractical at best and unnecessarily annoying at worst.

I get where you're coming from, but I will still claim that maintaining other admin's python is preferable to maintaining their perl (or even bash) scripts. So I wouldn't say it doesn't bring anything but change.

When you can't 'yum upgrade python' because it breaks half of your system, the threshold for hating drops really fast.

What kind of idiots tied their system to Python 2.4?

The same "idiots" who guarantee API (version) stability for many, many years.

I won't debate this more than necessary.

Nobody stops you from shipping system tools with Python 2.2 if that is what you want. But don't step on my toes and tell me that I can't install 2.6 and set it as the default one.

If you do that you give me a license to call you names.

Python 2.6 is available in EPEL5 [1]. It runs in parallel, so the default remains the same, but it's better than nothing.

[1]: http://fedoraproject.org/wiki/Python26#Available_Packages_fo...

The people that developed/administered stuff when Python 2.4 was brand new.

You missed my point. The stupidity is not in using a particular version of a software you have at a time. But in the limitations you get when you can't upgrade or install another default version without breaking the system.

Of course, you know how easy it is to make your scripts use python2.4: just make the shebang #!/usr/bin/python2.4 instead of #!/usr/bin/python. Then you can install another default version, and as long as you keep the old version around, you don't break anything.
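Pinning scripts to one interpreter path can be automated across a directory of tools. A minimal sketch of rewriting a Python shebang while leaving other scripts alone; the interpreter paths in the usage note are examples only, and the pattern deliberately covers just the common `python`, `pythonX.Y` and `/usr/bin/env python` forms.

```python
import re

# Matches "#!/usr/bin/python", "#!/usr/bin/env python" and versioned forms like "python2.6".
_PYTHON_SHEBANG = re.compile(r"^#!\s*(?:/usr/bin/env\s+)?\S*python[\d.]*(?=\s|$)")


def pin_shebang(source: str, interpreter: str) -> str:
    """Point a Python script's shebang at one specific interpreter path."""
    first, sep, rest = source.partition("\n")
    m = _PYTHON_SHEBANG.match(first)
    if m is None:
        return source  # not a Python shebang (or none at all): leave it alone
    # Keep anything after the interpreter on the line, e.g. a "-u" flag.
    return "#!" + interpreter + first[m.end():] + sep + rest
```

For example, pin_shebang(text, "/usr/bin/python2.4") turns "#!/usr/bin/python -u" into "#!/usr/bin/python2.4 -u" and leaves "#!/bin/sh" untouched.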

Since other distributions are doing a pretty decent job at it (see gentoo), they don't have any excuse not to do it right.

Imagine me explaining this with my eyes closed and my hands hidden in each sleeve. This is not an excuse, only a subjective observation:

Python has properties that are not congruent with the expectations of an inexperienced administrator. Its scent is unlike the other creatures that exist in our fold and it makes us feel inadequate that we cannot control it with the skills we use to control the rest.

Off the top of my head, the only way I can find to hate python is if an application was made to not be able to run under virtualenv or any of the really great environmental separation tools that a modern python setup uses.

If "using python" means intertwining the app into the system python / requiring sudo, etc then I would hate it too.
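An application that must never be intertwined with the system Python can check for that at startup. A minimal sketch: stdlib venv (and virtualenv 20+) makes sys.prefix differ from sys.base_prefix, while legacy virtualenv releases set sys.real_prefix instead; the error message is illustrative.

```python
import sys


def in_virtual_env() -> bool:
    """True when running inside a venv/virtualenv rather than the system interpreter."""
    if hasattr(sys, "real_prefix"):  # legacy virtualenv (< 20)
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix  # stdlib venv, virtualenv >= 20


def require_virtual_env() -> None:
    """Refuse to run (and so refuse to install anything) against the system Python."""
    if not in_virtual_env():
        sys.exit("refusing to run against the system Python; use the app's environment")
```

Calling require_virtual_env() at the top of an install or deploy script keeps it from ever needing sudo to drop packages into the OS interpreter.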

virtualenv being described as 'really great' is... interesting. On a production system, it is a workaround to prevent a truly horrible implementation from stopping work.

I'm sure it's really great on a developer's laptop... but the standards are and should be different. virtualenv is, to sysadmins, the symptom of a disease.

As a onetime sysadmin, the problem with deploying into virtualenvs is that suddenly I need to maintain loads of security patches again, rather than rely on my distribution's patch management.

At a large enough place, that might be alright, since we might need to front-run or lag the distro anyway. But now I have to worry about n different Python versions and security patch versions, and it's a headache.

Give me a package of known-goodish stuff, not something which relies on lots of other fiddly stuff which I might need to build by hand, or maintain by hand.

The biggest thing is that adding Python doesn't mean you get to remove anything else. If I now need python, I still need a bourne shell, I still need perl, I still need c/c++.

And frankly, significant whitespace still pisses me off. As a sysadmin, I have to make a lot of very small tweaks to random code in various languages, and most of them have the decency to let me see where the blocks are with my eyes, instead of counting tabs and spaces with my cursor. I presume people who spend a lot of time with Python use editors that show whitespace, but since everything else doesn't care (other than Makefiles, and Makefile whitespace is very simple), I don't know how to turn those options on, if they even exist in my editor. And I say this as someone who writes Erlang, with its shitty line endings, when I get to take off my sysadmin hat.
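For the quick-tweak case, a tiny checker can point out the lines whose indentation you can't trust your eyes on. A minimal sketch; the standard library's tabnanny module (python -m tabnanny file.py) performs a stricter, tab-width-aware version of this check.

```python
from typing import List, Tuple


def indentation_report(source: str) -> List[Tuple[int, str]]:
    """List (line number, problem) for lines indented with tabs or a tab/space mix.

    These look identical in many editors but can change block structure in
    Python; Python 3 raises TabError when a mix makes meaning depend on tab width.
    """
    problems = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # blank or whitespace-only lines don't affect blocks
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent and " " in indent:
            problems.append((lineno, "mixed tabs and spaces"))
        elif "\t" in indent:
            problems.append((lineno, "tab indentation"))
    return problems
```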

About Erlang, what if those ;,. weren't needed after all?

I'm working on this: https://github.com/fenollp/kju/blob/master/examples/snippets...

djb is to be commended for his foresight. I always found it depressing that people would slate him for doing things his own way. I'm reminded of this quote:

  "Don't worry about people stealing your ideas.  If your ideas
   are any good, you'll have to ram them down people's throats."

        -- Howard Aiken

I still can't get my head around the fact that writing a DNS server is so hard.

It looks so easy on the surface. It made me want to write a toy server just to prove it to myself.

You know, I did actually start writing one recently [1]. It's both more and less complex than I imagined it. A basic caching resolver, especially in a reasonably high-level language is not a bad time. I think the complexity starts coming in when you start keeping a database of authoritative zones, etc.
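The heart of a basic caching resolver is a cache whose entries expire when their TTL runs out. A minimal sketch, not pulpdns's code; answers are simplified to strings, and the injectable clock exists only to make expiry testable. A real resolver would also hand clients the remaining TTL rather than the original one.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny answer cache for a caching resolver: entries expire after their TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[float, str]] = {}

    def put(self, qname: str, qtype: str, rdata: str, ttl: int) -> None:
        self._entries[(qname.lower(), qtype)] = (self._clock() + ttl, rdata)

    def get(self, qname: str, qtype: str) -> Optional[str]:
        key = (qname.lower(), qtype)
        hit = self._entries.get(key)
        if hit is None:
            return None
        expires, rdata = hit
        if self._clock() >= expires:
            del self._entries[key]  # stale: force a fresh upstream lookup
            return None
        return rdata
```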

If you are writing it in C... things are more complex. All DNS records are variable length, so you have to be damn sure you don't have off-by-one errors, etc. Also, proper DNS works over both TCP and UDP, so you have to architect it to handle both. Not knowing anything about the code of BIND, I can say this would take quite a bit of careful planning to not get wrong. I can see a whole slew of buffer overflows happening in all these places.
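Here is where those variable-length pitfalls show up concretely: a domain name on the wire is a sequence of length-prefixed labels, possibly ending in a compression pointer (RFC 1035, section 4.1.4), and over TCP each message carries a 2-byte length prefix (RFC 1035, section 4.2.2). A minimal Python sketch, not BIND's code; every length is checked against the buffer before use, which is precisely where a C implementation's off-by-one turns into an overflow. The jump cap is an assumed safety limit.

```python
import struct
from typing import List, Tuple

MAX_POINTER_JUMPS = 32  # assumption: generous cap that stops compression-pointer loops


def read_name(msg: bytes, offset: int) -> Tuple[str, int]:
    """Decode a (possibly compressed) domain name at `offset`.

    Returns the dotted name and the offset just past the name's original position.
    """
    labels: List[str] = []
    pos, end, jumps = offset, None, 0
    while True:
        if pos >= len(msg):
            raise ValueError("name runs past end of message")
        length = msg[pos]
        if length & 0xC0 == 0xC0:  # compression pointer: 14-bit offset
            if pos + 1 >= len(msg):
                raise ValueError("truncated compression pointer")
            jumps += 1
            if jumps > MAX_POINTER_JUMPS:
                raise ValueError("compression pointer loop")
            if end is None:
                end = pos + 2  # parsing resumes after the first pointer
            pos = ((length & 0x3F) << 8) | msg[pos + 1]
            continue
        if length & 0xC0:
            raise ValueError("reserved label type")
        if length == 0:  # root label terminates the name
            return ".".join(labels) + ".", (pos + 1 if end is None else end)
        if pos + 1 + length > len(msg):
            raise ValueError("label runs past end of message")
        labels.append(msg[pos + 1 : pos + 1 + length].decode("ascii", "replace"))
        pos += 1 + length


def frame_for_tcp(message: bytes) -> bytes:
    """DNS over TCP prefixes each message with its length as a 2-byte big-endian integer."""
    if len(message) > 0xFFFF:
        raise ValueError("DNS message too large for TCP framing")
    return struct.pack("!H", len(message)) + message
```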

[1] https://github.com/ipartola/pulpdns/blob/master/main.py

My idea for this code was to have a DNS server that keeps track of which domains are the most requested and in the background keeps a thread running to always keep them in the cache. I noticed that even with a pretty fast connection my DNS resolver on my LAN can take 100-300 ms to return a name not in a cache.
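The prefetch idea can be sketched as a request counter plus a background loop that re-resolves the most popular names. This is a sketch under assumptions, not pulpdns's implementation: `resolve` is a hypothetical callable that performs the upstream lookup and stores the answer in the cache, and the refresh interval should sit below the typical TTL of the names you want to keep warm.

```python
import threading
from collections import Counter
from typing import Callable, List


class PrefetchTracker:
    """Count lookups and periodically re-resolve the most popular names."""

    def __init__(self, resolve: Callable[[str], None], top_n: int = 50) -> None:
        self._resolve = resolve  # hypothetical: does the lookup and fills the cache
        self._top_n = top_n
        self._counts: Counter = Counter()
        self._lock = threading.Lock()  # record() is called from the serving threads

    def record(self, qname: str) -> None:
        with self._lock:
            self._counts[qname.lower()] += 1

    def hot_names(self) -> List[str]:
        with self._lock:
            return [name for name, _ in self._counts.most_common(self._top_n)]

    def refresh_once(self) -> None:
        for name in self.hot_names():
            self._resolve(name)

    def run_forever(self, interval: float, stop: threading.Event) -> None:
        # Event.wait returns True once stop is set, ending the loop.
        while not stop.wait(interval):
            self.refresh_once()
```

Usage: threading.Thread(target=tracker.run_forever, args=(60.0, stop), daemon=True).start(), with tracker.record(qname) called on every incoming query.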

In part, this is due to the fact that features were added to the protocol as needed - or as it seemed necessary at the moment. I don't think anyone is to blame, the whole Internet grew like that.

Using specified-length strings like in nginx should guard pretty well against overflows.

Please note that RIPE archives everything and, as a result, the video is available: https://ripe68.ripe.net/archives/video/153/
