
Why is BIND 10 written in C++ and Python? - AndrewDucker
https://www.isc.org/wordpress/programming-languages-for-bind-10/
======
EwanToo
There's a comment in the post:

"As of right now, it ends up that about 75% of our code is C++ and 17% is
Python (link) since it turns out that a lot of BIND 10 is performance-
critical."

Which could easily be taken the wrong way. I believe the right way to think
about it is "How much _more_ C++ code would there be, if there wasn't that 17%
in Python?".

~~~
bnegreve
Hum, I don't think interpreting this comment as _Python is not adequate for
performance critical applications_ is a misinterpretation. The author makes it
very clear in some other part of the text:

> [Python] has all of the features that we were looking for… except
> performance.

Now your interpretation is also valid since they could have written everything
in C++ but they didn't.

~~~
illumen
There is a python DNS server running on pypy that seems to get a pretty big
speedup over CPython.

Here is the benchmark result:
[http://speed.pypy.org/timeline/#/?exe=3,6,1,5&base=2+472...](http://speed.pypy.org/timeline/#/?exe=3,6,1,5&base=2+472&ben=twisted_names&env=1&revs=200&equid=off)

0.002 for pypy VS 0.015 for CPython, which is 6.66 times faster.

So, for the last couple of years they have been wrong - since PyPy has shown
pretty good performance in a DNS benchmark, so Python can have pretty good
performance ;)

~~~
jerf
As much as I respect and appreciate Pypy, BIND is one of the core foundational
pieces of the internet. It needs to be bulletproof and robust. I don't think
Pypy itself qualifies as a mainstream enough language for them to build BIND
on. This isn't a slight on Pypy, because the core problem is simply that it
hasn't been around for long enough and subjected to enough stress to earn
entry into the highest of the high tiers of reliable software. That's not a
bad thing per se, and it can be fixed, but in the meantime, ISC has to take
the situation as it is now, not how it might be in five years.

~~~
hcarvalhoalves
> I don't think Pypy itself qualifies as a mainstream enough language for them
> to build BIND on.

Minor pedantry: PyPy is not a language. It's an alternative compiler/runtime
for Python.

~~~
jerf
I was echoing the terms used in the post. I thought it was clearer than going
into a long comment about how Python is mature, at least in its CPython
implementation, but the PyPy implementation is not, and when the blog post
said "language" they really meant "implementation" because language is not
implementation, yada yada yada.

------
belorn
C/C++ and python is a nice combination, and from what I hear, even a best
practice when dealing with python optimization. I strongly recollect hearing
python developers promoting the idea of first analyzing performance, and then
rewriting the biggest resource hogs in C code and thus getting the best of both
worlds. I assume that c++ will have the same usefulness here.
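
A minimal sketch of that profile-first workflow, using the standard library's `cProfile`. The names `slow_checksum`, `handle_queries` and `profile_report` are made up for illustration and have nothing to do with the BIND 10 code; the point is only that the report tells you which function is worth rewriting in C or C++.

```python
import cProfile
import io
import pstats


def slow_checksum(data: bytes) -> int:
    """Deliberately naive hot spot: a pure-Python loop over every byte."""
    total = 0
    for byte in data:
        total = (total + byte) % 65521
    return total


def handle_queries(count: int) -> int:
    """Cheap glue code that calls the hot spot repeatedly."""
    payload = bytes(range(256)) * 16
    return sum(slow_checksum(payload) for _ in range(count))


def profile_report(count: int = 50) -> str:
    """Run handle_queries under cProfile and return the stats as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    handle_queries(count)
    profiler.disable()
    out = io.StringIO()
    # Sort by cumulative time so the expensive call chain is listed first.
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
    return out.getvalue()
```

Printing `profile_report()` should show nearly all the time inside `slow_checksum`, which would make it the candidate for a C rewrite while `handle_queries` stays in Python.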

It would be very interesting to see the code paths being run in python vs C. I
suspect that the 17% python code is actually around 80% of all possible
code paths, but that the 75% C code is just like 5-15% of possible code paths.
A possible way to check that could be by looking at the test suites and see
which one is bigger, python tree or the c++ tree.

------
nodata
A good overview of BIND 10's architecture:
<http://jpmens.net/2012/12/21/completely-different-bind-10/>

It seems that all apart from the performance critical parts are written in
Python 3.1

~~~
pixl97
Wow! Someone at the ISC has finally learned what Unix means. I've not used
Bind for years after being burned by its security issues back in the 90's.
The 'one big monolithic' program was a terrible idea. I had used DJB dns for
years, even with its shortcomings in areas, the modular design caused a bug or
security issue in one area not to kill the entire system. Hopefully with bind
10 we'll be able to use user separation (or at least enough selinux) to keep
each program in its own security 'domain'.

~~~
agwa
BIND 10 is not the Unix way. In BIND 10, the use of separate processes is just
an implementation detail. The Unix way is about presenting small programs to
the user that can be composed. To the user, BIND 10 is actually more
monolithic than BIND 9 because it needlessly includes a DHCP server. The Unix
way is also about human-editable text files - SQLite zone files and icky JSON
config files are the antithesis of that.

I'm not saying the separate-process design of BIND 10 is bad (to the contrary
it's good for security), but using multiple processes internally is only
Unix-y in a superficial way.

------
nicholassmith
I think it's interesting that they did consider C and then decided it was too
much of a risk. There's a constant theme of 'why use C++? C is faster and
doesn't suck as much!' especially due to Linus' statements on the language,
but often C is a technical risk as there's so much more that can go wrong. I'm
not saying C++ is a better language than C, but it's definitely different and
given the language is structured to be less of a headache I'm surprised it's
often shot down over C anyway.

~~~
justin66
It's a tangent, but: has Linus ever given a presentation that involved
actually discussing code? Perhaps code projected on a screen, even part of a
slideshow?

It's odd that he's such an accomplished engineer but every presentation I've
seen him give involves saying outrageous things while starry-eyed geeks stare
at him adoringly. If it weren't for the fact that he's arguing from a position
of (very great) authority, he would persuade very few people of anything.

~~~
beagle3
Ah, but he has earned this position of authority by actually delivering. He
did not get that position of authority randomly, nor by an act of nepotism nor
by making promises during an election.

First, by delivering working code that ended by running the majority of phones
and smart devices out there. Also, by revolutionizing source code control. No,
he didn't invent (almost) any of the concepts behind git, and by now the
majority of the code was not written by Linus. However, he was able to strike
a balance between features, usability, speed and working model that DID
revolutionize version control. Monotone pioneered a lot, but was too slow and
cumbersome; so was Bazaar without pioneering much. BitKeeper had a lot of
things going for it, but freedom and price working against it. is more or less
on par with git, but it's git that brought the revolution.

Second, by being able to successfully manage more than one huge project with
hundreds of contributors, all of whom he can fire at, but which he didn't
actually hire (nor can he, if he needs more work).

------
przemoc
For core services like BIND I don't see any good reason to write them in more
than one language (actually anything other than C/C++) and introduce
cumbersome dependencies because of that. I wouldn't complain if it was just
another DIY name server toy created for whatever reason (how much time it
would take..., I can do it!, etc.), but it's software that will become widely
deployed in upcoming years (almost) without doubt.

How much more work would it take to (re)write the 17% done in Python in
C++? Would it make the code that much worse in terms of quality and
manageability? Having a coherent, one-language codebase is a good feature on its
own too, often improving the above-mentioned factors.

~~~
mattmanser
You can look at the source yourself.

I don't understand either. I randomly opened parts of both the Python and the
C++, and it's certainly not complicated, and they write the Python in a C style
anyway.

I'm not a C++ coder and I only play with Python now and then, but can't say
the code impressed me much. It's actually pretty hard to skim the code because
it's massively over-commented and there are far too many tiny 2-line private
functions that are called by exactly one other 2-line private function that is
called by exactly one other 2-line function. Or Holographic code as John D.
Cook called it[1].

And the copyright notice in every source file is extremely irritating.

Far too high noise to code ratio for my tastes, so I got bored before I could
really 'see' how the Python had helped.

But I can't judge that well, as I'm not sure if that's all just a by-product of
using C++ and being an open source project that comes out of committee.

[1] [http://www.johndcook.com/blog/2012/01/09/holographic-source-...](http://www.johndcook.com/blog/2012/01/09/holographic-source-code/)

~~~
eru
In general, abstracting code out into functions can be useful, even when the
new function is only called once.

That is because functions are a well understood abstraction. And having code
separated into functions makes it easier for the reader to deduce the coupling
points: two blocks of code after another can have all kinds of weird
dependencies, e.g. the first block might set some local variables that the
second one relies on. But functions just have arguments and return-values.
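
A small sketch of what that looks like in practice, with invented names (`report_inline`, `total_and_count`, `mean`, `report_split`). In the first version the second block quietly depends on locals left behind by the first; in the second version the only coupling is what is passed in and returned.

```python
def report_inline(samples):
    # Block 1: computes a total and leaves `count` behind as a local.
    total = 0
    count = 0
    for value in samples:
        total += value
        count += 1

    # Block 2: relies on both locals set by block 1, with nothing
    # in the code marking that dependency.
    return total / count if count else 0.0


def total_and_count(samples):
    """Everything the next step needs is in the return value."""
    values = list(samples)
    return sum(values), len(values)


def mean(total, count):
    """Depends only on its arguments."""
    return total / count if count else 0.0


def report_split(samples):
    # The coupling points are visible: two values out, two values in.
    total, count = total_and_count(samples)
    return mean(total, count)
```

Both versions compute the same result; the difference is only in how easy it is to see what each piece depends on.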

~~~
mattmanser
Obviously.

I've found in my experience working with other people's code and maintaining
large code bases you find that overly nesting functions causes a lot of
problems when you're trying to read or debug code.

You often also see problems where the essentially dependent functions start to
separate in the code as people accidentally add new functions between them.

The article I linked is good, John describes it well. It's a nightmare to work
with when you get triple or quadruple nesting of tiny functions, like in the
code of this program. It's totally unnecessary.

~~~
eru
I guess it also depends a bit on the language you are working in. Some
languages have an easier time dealing with functions. In, say Haskell, having
quadruple nesting of tiny functions isn't too much of a problem---and if you
are pedantic, is the only way to create a function with four arguments.
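
To unpack the pedantic part: in Haskell every function takes exactly one argument, so a "four-argument" function is really four nested one-argument functions (currying). A rough imitation in Python, which does not work this way natively; the name `add4` is just for illustration:

```python
def add4(a):
    """Curried addition: each call takes one argument and returns a function."""
    def take_b(b):
        def take_c(c):
            def take_d(d):
                return a + b + c + d
            return take_d
        return take_c
    return take_b


# Partial application falls out for free: fix the first three arguments
# and keep the resulting one-argument function around.
add_ten = add4(1)(2)(7)
```

In Haskell the nesting is invisible because the syntax hides it, which is part of why tiny nested functions cost less to read there.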

------
StefanKarpinski
Go seems like it would have been perfect here, but of course, the language
wasn't even a glimmer in Robert Griesemer, Rob Pike, and Ken Thompson's
collective eye yet.

~~~
eropple
Obviously I can't speak for these guys, but if I were designing it the fact
that Go uses a garbage collector is a pretty strong minus. I like GC'd
languages for a lot of tasks, but if I care about performance to the extent
that they need to with BIND, I'm not going near it.

~~~
rlpb
Interestingly, they actually wanted a garbage collector:

"The language had to address most of the problems with C. Ideally this meant
something with good string handling, garbage collection, exceptions, and that
was object oriented."

~~~
eropple
Yeah, I saw that, and my WTF sensor lit up. DNS isn't trivial, but it seems
like a sufficiently well-explored problem that memory lifecycles should be
pretty well-understood.

------
antirez
Quoting from the article, with my replies:

> _String manipulation in C is a tedious chore._

Use a dynamic strings library, as Postfix does, for instance, and everything else.

> _C lacks good memory management._

So strange that you went for C++ for most of your code that is not immune to
problems from this point of view. I could understand that point if you were
opting for a language with GC support. With C you can easily get better (that
is, _safer_ ) than C++ native MM just building a reference counting system on
top of your C "objects". This is trivial and it is what Redis, Tcl, C-Python,
and many others are doing.
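
The idea is roughly the following, sketched here in Python rather than C just to keep it short. This is not how Redis, Tcl or CPython actually implement it; the class `RefCounted` and its `incr`/`decr` methods are hypothetical stand-ins for a C struct with a count field and two helper functions.

```python
class RefCounted:
    """Hypothetical stand-in for a C struct carrying a refcount field."""

    def __init__(self, payload, on_free):
        self.payload = payload
        self.refcount = 1          # the creator holds the first reference
        self._on_free = on_free    # plays the role of free() in C

    def incr(self):
        """Called by every new owner before it stores the pointer."""
        self.refcount += 1
        return self

    def decr(self):
        """Called by every owner when done; the last one triggers the free."""
        if self.refcount <= 0:
            raise RuntimeError("decr on an already-freed object")
        self.refcount -= 1
        if self.refcount == 0:
            self._on_free(self.payload)
```

Usage: create the object, call `incr()` whenever a second owner keeps it, and have each owner call `decr()` once. The `on_free` callback runs exactly when the last owner lets go.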

With Redis memory leaks or memory management never was a big issue.

> _Error handling is optional and cumbersome._

Exceptions mostly suck, and in system software the only sane way to deal with
errors is C-alike IMHO, that is, check the return value / error returned by
every function and act accordingly.

> _Encapsulation and other object-oriented features must be emulated._

This is not an objective point since many think that these features are
actually a problem.

Weak points IMHO, and C++ and Python with the minority of Python looks like a
design error.

~~~
alexgartrell
So I used to agree with you, but I've done extensive performance critical C
development at Facebook (memcached, a new thing we are about to talk about)
and I have also done extensive performance critical c++11 development (our
layer 7 load balancer for http) and I'd have to say that c++11 is the far
superior option if you really, truly understand what the compiler is doing to
your code (a huge caveat).

Unique_ptrs are a total game changer, and the ability to use closures and
lambdas when you want to set a callback function instead of a function pointer
with a context pointer you have to cast and decode is absolutely huge for
readability. _Maybe_ we aren't getting every last bit of performance out of it
that we could with C, but it works at our pretty ridiculous scale, so I think
C might have been a premature optimization for us.
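
For the callback half of that, the difference in shape looks roughly like this. It is a Python sketch, not C++11 (so nothing here covers `unique_ptr`), and every name in it is invented for illustration: the first style passes state through an opaque context argument the way a C function pointer plus `void *` would, the second captures it in a closure.

```python
def on_response_with_context(context, status):
    """C-style callback: state arrives through an opaque context argument."""
    context["log"].append((context["request_id"], status))


def fire_c_style(handlers, status):
    # Each registration is a (function, context) pair, like a function
    # pointer stored next to a void* in C.
    for func, context in handlers:
        func(context, status)


def make_handler(request_id, log):
    """Closure-style callback: the state is captured, not passed around."""
    def handler(status):
        log.append((request_id, status))
    return handler


def fire_closures(handlers, status):
    for handler in handlers:
        handler(status)
```

With the closure version the caller only ever sees a one-argument callable, so there is no context to cast or decode at the call site.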

~~~
Meai
First you are saying c++11 is the far superior option (with that one caveat)
and in the last sentence you imply that C would still be faster. Can you
clarify? I'll add my opinion too: I think all the readability you can get out
of c++, will be wasted in layers upon layers of object oriented design and C
compiles much faster, so there is that.

~~~
alexgartrell
C++11 is the superior option because it is easier to write correct code with
it, plain and simple. And we aren't talking about orders of magnitude
difference in throughput or latency, we're talking about a slight increase in
CPU idle.

And layers upon layers of object oriented crap is also a problem in C (I've
seen it). At the end of the day I've just been burned more by the complexities
of building large things in C (particularly when people do reference counting
in C) than I have been by the complexity of c++ in general.

------
Scramblejams
I'm finding that Cython[1] can be a very convenient way to speed up the
critical parts of your Python. Anyone who considers porting parts of their app
to C++ should give it a look.

[1] <http://www.cython.org>

------
fool
Mirror: [http://www.isc.org.nyud.net/wordpress/programming-languages-...](http://www.isc.org.nyud.net/wordpress/programming-languages-for-bind-10/)

------
rwmj
Did they consider C with a pool allocator library like talloc? It's a good fit
for servers (cf. Samba using talloc, and Apache which uses another pool
allocator).

------
pjmlp
Nice to see more projects moving into more expressive and safer languages.

~~~
army
I'm not sure that I'd call dynamic languages like Python safer, it depends on
context. Compared to C/C++ you lose static typechecking but gain a safer
memory model.

~~~
pjmlp
Anything is better than C.

~~~
eru
You haven't used PHP much? Or MUMPS?

~~~
pjmlp
PHP is not a compiled language for systems programming.

As for MUMPS, I know it is something used only in the US, it seems.

~~~
eru
Oh, I wasn't aware that you only talked about those. MUMPS is also not for
systems programming, as far as I know.

To nitpick a bit, compiled or not is more a property of the implementation
than of the language itself. Of course, some languages are more commonly
compiled than others. But, don't Facebook have a PHP compiler?

~~~
pjmlp
> But, don't Facebook have a PHP compiler?

Yes, but I don't see PHP as a possible systems programming language, even with
a compiled implementation.

Uhm, dreams of device drivers written in PHP...

------
btipling
> "C lacks good memory management"

C lets you manage memory without possibly inefficient or even broken magical
black box automated processes or garbage collectors. Maybe what the author
meant to say was "C lacks easy memory management."

~~~
raverbashing
Funny you mention "black box automated process" because, what do you think
malloc is?

There are two ways of allocating memory directly from the kernel (IIRC), brk
and mmap.

There's some management malloc does, and if it's good or not depends on your
application.

For example, size, number and behaviour of your allocations. Depending on your
situation you may want to do your own memory management.

~~~
qdog
When I worked on embedded systems, we didn't use malloc() at all, just some
heap functions. Even after that, there have been times where I worked on
projects where there was a single malloc() and that was turned into a memory
heap.

However, a call to malloc() is pretty straightforward, you expect it to return
a pointer to the allotted memory, or not. Using a GC or the Boost libraries is
not quite as straightforward, and the black box is a lot bigger.

------
pschastain
Obj-C anyone? Seems to meet all their requirements - OO, exception-handling,
memory management via either ref counting or GC, and it's a C-superset so
C-based optimizations would be easy. Maybe the run-time is too large and/or
complicated? Although how it could be more complicated than a C++ runtime I
don't know. Would Apple being the primary driving force behind its
development be a problem?

------
dottrap
Sad. They didn't do their homework. They basically had the exact same
requirements of the commercial video game industry which uses C/C++ and Lua to
solve this problem. Lua had already long asserted its dominance by 2006 so it
wasn't a secret. And they should have done a lot better than only 17% of the
code in Python.

------
markwong
based on the given criteria, wondering why Java wasn't considered.

~~~
buster
Probably because of memory usage and garbage collection and the fact that java
needs to be installed on most systems whereas C++ needs no dependencies and
python comes preinstalled on the usual linux server (probably not python3 but
i assume it will be some years before bind10 sees adoption for such critical
infrastructure). Also he mentions specialized data structures and memory
management.

At least i wouldn't sleep well if i know that the heart of the internet was
running Java :P

~~~
danieldk
_Probably because of memory usage and garbage collection and the fact that
java needs to be installed on most systems whereas C++ needs no dependencies_

Well, GCC (as in the compiler collection) has had an ahead-of-time compiler for
ages:

<http://gcc.gnu.org/java/>

There is also work in the LLVM camp on AOT Java compilation:

<http://vmkit.llvm.org/>

_At least i wouldn't sleep well if i know that the heart of the internet was
running Java :P_

Well lucky you, only many other vital body parts are running on the JVM via
Java, JRuby, and increasingly Scala ;).

~~~
fdr_cs
Those implementations have quite weak garbage collection implementations
(boehm conservative, but I'm not sure), which would just kill the performance.
Hotspot JVM has a very sophisticated Incremental Generational garbage
collector, which does have a _very_ good performance. I'm pretty sure they had
their reasons for not using Java (I actually do have mine too), but, garbage
collection is not one of those.

~~~
danieldk
Indeed. I was arguing against the 'you need the JVM as a dependency'. But then
Go is all the hype these days, and people also use Go for network
applications, which also has a weak GC ;).

~~~
fdr_cs
Go does generate a _LOT_ less garbage than Java, and you can control the layout
of your structures. That's why its GC has less impact on performance than the
JVM's.

------
jejones3141
Here's what I take from this: "C++ is by no means an easy language to work
with, so the idea is that we will avoid its complexity when possible."

~~~
alpatters
Avoiding its complexity does not mean avoiding it. E.g. you could avoid
template metaprogramming due to its complexity whilst keeping to simpler areas
of the language.

C++ is a large, multi-paradigm language. It gives you choices and one is free
to abuse those choices. But having more options gives you more power to
express.

I think with modern C++ style, boost and C++11 its "difficult-to-work-with"
reputation is massively overstated. It is possible to write succinct code with
good design and get massive performance benefits.

------
wildchild
I hope that OpenBSD team will fork it (BIND 9).

~~~
tobiasu
I think the plan is to drop it entirely and replace it with nsd.

------
stefantalpalaru
This doesn't look good for Python:

"Whenever possible, we use Python"

"When necessary, we use C++"

"As of right now, it ends up that about 75% of our code is C++ and 17% is
Python (link) since it turns out that a lot of BIND 10 is performance-
critical."

~~~
justin66
I can see what you're saying but it's worth contemplating what percentage of
the total codebase that Python code would represent if it were replaced by C++
code.

~~~
alpatters
I often convert python code to C++ and you'd be surprised how often the
difference in loc is not that much. Certainly no hassle to do so. Not that I'm
suggesting that one always should.

Several boost libraries are inspired by python and C++11 features all help to
write C++ code that is surprisingly similar to python, with a bit of extra
type specification. If you think otherwise I'd suggest you're probably
thinking of the C++ of the 90s - a more C with classes, and not modern C++.

I often find it useful to develop and prototype in python and convert to C++.
Most often I'm doing this because my python simulations can take days and the
C++ versions hours. Often I find it is not just the performance critical
areas, but it is easy enough just to wholesale convert the lot.

~~~
danieldk
_I often convert python code to C++ and you'd be surprised how often the
difference in loc is not that much._

I agree, especially with C++11 and (besides Boost) Qt. It's often the
header/code separation that makes things a bit tedious, having to keep
function and method signatures in-sync. Of course, if you are in template-land
that is not that much of a problem.

------
denysonique
Why isn't it written in Node.js?

~~~
eksith
After careful consideration, as the article mentions, ISC decided a
combination of C++ and Python as being more appropriate to the goals of the
project. That being safety, stability, established familiarity, speed and of
course a level of guaranteed future-proof platform availability.

Read: "This is one of the cornerstones of the internet that we didn't want to
piss away on novelty".

That sounded rude, and I'm sorry for that, but I don't know how else to make
that point cogent.

~~~
afandian
I think it was a joke.

