
Two years of production Go at CloudFlare - jgrahamc
http://blog.cloudflare.com/what-weve-been-doing-with-go
======
tptacek
Golang is _perfect_ for DNS: its concurrency model matches that of DNS
extremely well, its standard library is heavily informed both by Unix and by
Internet protocols, and it's fast and simple. That's unsurprising, since
Golang's charter was to modernize C/C++, and DNS is essentially an expression
of 1980s C code style.

~~~
staunch
Go is _perfect_ for almost every kind of server!

Though handling DNS requests over UDP is pretty easy and fast in almost any
language. TCP connections are much more pleasant to handle by spawning a
goroutine per connection. I do assume using goroutines is slower than
multiplexing though, but still plenty fast.

~~~
pmjordan
_I do assume using goroutines is slower than multiplexing though, but still
plenty fast._

Surely, the goroutine model is just an abstraction _built on_ multiplexing? As
in, it uses the exact same syscalls that a multiplexing server written in C
would use, except instead of storing state explicitly in a connection
structure, it's stored implicitly in the goroutine's stack. The main (only?)
real difference is code structure and thus the dispatch mechanism to where in
the connection handling code processing should resume: In the goroutine,
you're literally storing the instruction pointer and jumping straight to it.
In the hand-written case, you'll probably go through some buffering code and
an explicit state machine.

Basically, Go and its goroutines save you reinventing the connection state
capture and resuming code for every single protocol you implement - it comes
for free as part of the language. (you can make C do similar things with
makecontext/swapcontext plus some extra plumbing code of course) Performance
wise it should be a pretty close call: resuming a goroutine will presumably
cause a branch prediction miss (due to the change in function return pointer),
whereas the multiplexed code will need to wind its way through the state
machine logic - almost certainly also a branch prediction miss. CPU cache
behaviour should likewise be similar (you're storing the same state either
way).
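
A minimal sketch of the goroutine-per-connection structure described above, assuming a trivial line-echo protocol in place of a real one. The names `handle`, `serve` and `echoOnce` are made up for illustration. The per-connection state (the buffered reader and the current line) is just local variables on the goroutine's stack, and the blocking-looking `ReadString` parks the goroutine while the runtime's network poller does the multiplexing underneath.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

// handle serves one connection. Its protocol state (the buffered reader
// and the current line) lives in local variables on this goroutine's stack.
func handle(conn net.Conn) {
	defer conn.Close()
	r := bufio.NewReader(conn)
	for {
		line, err := r.ReadString('\n') // parks this goroutine until data arrives
		if err != nil {
			return
		}
		if _, err := conn.Write([]byte(line)); err != nil {
			return
		}
	}
}

// serve accepts connections and starts one goroutine per connection.
func serve(ln net.Listener) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return // listener closed
		}
		go handle(conn)
	}
}

// echoOnce starts a server on a loopback port, sends msg (which must end
// in a newline) and returns the echoed line.
func echoOnce(msg string) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()
	go serve(ln)

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := fmt.Fprint(conn, msg); err != nil {
		return "", err
	}
	return bufio.NewReader(conn).ReadString('\n')
}

func main() {
	reply, err := echoOnce("hello\n")
	fmt.Printf("%q %v\n", reply, err)
}
```

A hand-written epoll server would keep the same reader state in an explicit per-connection struct and re-enter a state machine on each readiness event.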

~~~
staunch
This is true, but I'd be very surprised if you could beat the memory usage and
performance of a while loop with epoll using goroutines. At least in the role
of a DNS server. Now I want to test this...

~~~
pmjordan
Yeah, stateless datagram protocols are pretty much the optimal problem for
unadorned async I/O.

------
staunch
CloudFlare is awesome. They're living out one of my fantasies of productizing
really well done front-end web serving infrastructure, which I love to see.
They appear to have gathered a solid team. And they're doing it with Go!

I'd love to see those libraries (sans proprietary modules) show up on GitHub!

~~~
cloudflare
[http://blog.cloudflare.com/open-source-two-way-street](http://blog.cloudflare.com/open-source-two-way-street)

~~~
chimeracoder
Man, there are so many things I love about that blog post (the Go being the
least of them). Wish you guys had a presence here in NYC!

~~~
eastdakota
I'm in NYC this week if you want to grab coffee. Always interested in talking
to smart engineers interested in GoLang. Email: matthewatcloudflaredotcom cc:
jennatcloudflaredotcom and we'll see if we can set something up.

------
alberth
I wonder how Lua fits into the picture at CloudFlare with this Go
infrastructure, since CloudFlare is also a heavy user of Lua [1] as well and
key financial sponsor of LuaJIT [2].

[1] [http://blog.cloudflare.com/pushing-nginx-to-its-limit-
with-l...](http://blog.cloudflare.com/pushing-nginx-to-its-limit-with-lua)

[2]
[http://luajit.org/sponsors.html#sponsorship_perf](http://luajit.org/sponsors.html#sponsorship_perf)

~~~
dknecht
Every request that passes through CloudFlare goes through Lua. It is how we
look up and apply a customer's rules to a request. We just recently released a
non-blocking logger option for Nginx [2] using Lua and are in the process of
developing a new KT library [3] in Lua.

Here are the most recent Lua libraries we have open sourced:

[1] [https://github.com/agentzh/lua-resty-lock](https://github.com/agentzh/lua-resty-lock)

[2] [http://github.com/cloudflare/lua-resty-logger-socket](http://github.com/cloudflare/lua-resty-logger-socket)

[3] [http://github.com/cloudflare/lua-resty-kyototycoon](http://github.com/cloudflare/lua-resty-kyototycoon)

~~~
minikomi
I love that you brought on agentzh to the team.

------
jbarham
Nice writeup. Especially interesting to see that CloudFlare is using Go for
its DNS service. I run a DNS hosting service
([https://www.SlickDNS.com](https://www.SlickDNS.com)) and currently use
tinydns from the djbdns suite in my name servers, but I'm writing a drop-in
replacement in Go. The goal is to make it much higher performance by using
multiple goroutines (tinydns is a single process).
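
One possible shape for the "multiple goroutines" idea, as a sketch only: several workers block in `ReadFrom` on the same UDP socket and each answers whatever datagram it receives. The worker here simply echoes the payload; a real replacement for tinydns would parse the query and build a response at that point. `worker` and `roundTrip` are hypothetical names.

```go
package main

import (
	"fmt"
	"net"
	"sync"
	"time"
)

// worker reads datagrams from the shared socket and echoes each one back.
// A real name server would parse the query and build a DNS response here.
func worker(pc net.PacketConn, wg *sync.WaitGroup) {
	defer wg.Done()
	buf := make([]byte, 512) // classic size limit for DNS messages over UDP
	for {
		n, addr, err := pc.ReadFrom(buf)
		if err != nil {
			return // socket closed
		}
		pc.WriteTo(buf[:n], addr)
	}
}

// roundTrip starts the given number of workers on one loopback UDP socket,
// sends a single datagram and returns the echoed payload.
func roundTrip(workers int, payload string) (string, error) {
	pc, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go worker(pc, &wg)
	}
	defer wg.Wait()  // runs last: wait for the workers to exit
	defer pc.Close() // runs before wg.Wait: unblocks the workers

	c, err := net.Dial("udp", pc.LocalAddr().String())
	if err != nil {
		return "", err
	}
	defer c.Close()
	if _, err := c.Write([]byte(payload)); err != nil {
		return "", err
	}
	c.SetReadDeadline(time.Now().Add(2 * time.Second))
	reply := make([]byte, 512)
	n, err := c.Read(reply)
	return string(reply[:n]), err
}

func main() {
	reply, err := roundTrip(4, "query")
	fmt.Println(reply, err)
}
```

Whether this beats a single read loop depends on how much work each query needs; for tiny in-memory lookups the socket itself may well be the bottleneck.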

~~~
halayli
You can also write it in C and use lthread. It will be faster than what Go has
to offer.

[https://github.com/halayli/lthread](https://github.com/halayli/lthread)

~~~
tptacek
It will probably _not_ be significantly faster than Golang for this workload.

~~~
halayli
It can be. You have total control of memory layout. You can gain a lot of
performance by considering locality of reference, slab allocators, and parsing
DNS packets in general will most probably be much faster.

~~~
tptacek
Golang also allows control over memory layout.

~~~
halayli
Of course it does, but it does it in a way that aims to provide good overall
performance for all apps. That doesn't mean it's the best performance you can
get.

~~~
halayli
C doesn't control memory. brk() does. There are various malloc libraries out
there to improve the way malloc manages memory.

But ideally you don't want to allocate memory at all. You want to have memory
preallocated and preferably accessing a page that's still in cache. You can
control memory pages in C because it doesn't manage that for you.
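
For comparison, preallocation of this kind can also be sketched in Go. The example below is an illustration under assumptions, not a benchmark: it carves a fixed number of packet buffers out of one contiguous slab allocated up front and recycles them through a buffered channel, so the steady state does no heap allocation. The `pool` type and the sizes are made up.

```go
package main

import "fmt"

const (
	bufSize  = 512 // bytes per buffer
	poolSize = 4   // number of buffers in the slab
)

// pool hands out fixed-size buffers carved from one contiguous slab
// that is allocated once, up front.
type pool struct {
	free chan []byte
}

func newPool() *pool {
	slab := make([]byte, bufSize*poolSize) // the only buffer allocation
	p := &pool{free: make(chan []byte, poolSize)}
	for i := 0; i < poolSize; i++ {
		// The full slice expression caps each buffer at bufSize so that
		// append cannot spill into the neighbouring buffer.
		p.free <- slab[i*bufSize : (i+1)*bufSize : (i+1)*bufSize]
	}
	return p
}

// get blocks until a buffer is free.
func (p *pool) get() []byte { return <-p.free }

// put returns a buffer obtained from get, restoring its full length.
func (p *pool) put(b []byte) { p.free <- b[:bufSize] }

func main() {
	p := newPool()
	b := p.get()
	n := copy(b, "packet bytes")
	fmt.Println(n, len(b), cap(b), len(p.free))
	p.put(b)
}
```

This gives control over layout and reuse, though not over which physical pages back the slab; that part of the argument still favours C.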

~~~
tonyplee
+1

------
fizx
I want an independent write-up of Red October! Sounds really cool.

~~~
pjscott
I'm not an expert, so take this speculation with a grain of salt, but I'd
guess this is how it encrypts something in such a way as to require two
people:

1\. Generate a key, and use it to encrypt and sign your payload. Nothing fancy
here; just plain old symmetric encryption and authentication.

2\. Use Shamir's Secret Sharing to split the key into pieces. You need any two
pieces to reconstruct the key. This is where the magic happens:

[http://en.wikipedia.org/wiki/Shamir's_Secret_Sharing](http://en.wikipedia.org/wiki/Shamir's_Secret_Sharing)

3\. Encrypt-and-sign each piece with a secret key derived from its owner's
password/passphrase using a secure KDF like scrypt.

4\. Throw away the keys, and put those encrypted pieces on disk.

Now you need passwords from any two people to decrypt the secret payload.
Cool, right?
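
A minimal sketch of step 2 only, under the same speculative assumptions as the list above: a 2-of-n Shamir split over a prime field, where the key is the y-intercept of a random line and any two points recover it. The prime (the Mersenne prime 2^521 - 1) and the names `split`, `combine` and `demo` are choices made for this example; encrypting the payload and wrapping each share with a password-derived key are not shown.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// p is the Mersenne prime 2^521 - 1, large enough to hold a 256-bit key.
var p = new(big.Int).Sub(new(big.Int).Lsh(big.NewInt(1), 521), big.NewInt(1))

// share is one point (x, y) on the secret line.
type share struct {
	x, y *big.Int
}

// split hides secret as f(0) of a random line f(x) = secret + a*x (mod p)
// and returns the points f(1)..f(n). Any two points determine the line.
func split(secret *big.Int, n int) ([]share, error) {
	a, err := rand.Int(rand.Reader, p) // random slope in [0, p)
	if err != nil {
		return nil, err
	}
	shares := make([]share, n)
	for i := range shares {
		x := big.NewInt(int64(i + 1))
		y := new(big.Int).Mul(a, x)
		y.Add(y, secret).Mod(y, p)
		shares[i] = share{x, y}
	}
	return shares, nil
}

// combine recovers f(0) from two distinct shares by Lagrange interpolation:
// f(0) = (y1*x2 - y2*x1) / (x2 - x1) (mod p).
func combine(s1, s2 share) *big.Int {
	num := new(big.Int).Mul(s1.y, s2.x)
	num.Sub(num, new(big.Int).Mul(s2.y, s1.x))
	den := new(big.Int).Sub(s2.x, s1.x)
	den.Mod(den, p)
	den.ModInverse(den, p)
	num.Mul(num, den)
	return num.Mod(num, p)
}

// demo splits a hex-encoded key into n shares and recombines shares i and j.
func demo(hexKey string, n, i, j int) (string, error) {
	secret, ok := new(big.Int).SetString(hexKey, 16)
	if !ok {
		return "", fmt.Errorf("bad hex key %q", hexKey)
	}
	shares, err := split(secret, n)
	if err != nil {
		return "", err
	}
	return combine(shares[i], shares[j]).Text(16), nil
}

func main() {
	fmt.Println(demo("c0ffee0123456789abcdef0123456789", 5, 1, 3))
}
```

A single share on its own reveals nothing about the key, because every candidate key is consistent with some slope.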

~~~
KMag
For millions of secrets and tens of people holding those secrets, there's a
much more space-efficient solution. There's an added benefit that hiding each
secret is plain vanilla RSA encryption.

Victor Shoup discovered a relatively straightforward way to take an RSA
private key and apply Shamir secret sharing in a way that commutes with RSA
public key operations. Just like Shamir secret sharing, each of the N shares
of the secret is a point on a random polynomial on a finite ring (in this
case, the set of quadratic residues modulo the key's public exponent). Each of
the N secret holders, using their secret share, can generate a public share.
Any T of the N public shares can be used to invert the RSA public key
operation.

Shoup's threshold RSA signature scheme
([http://www.shoup.net/papers/thsig.pdf](http://www.shoup.net/papers/thsig.pdf))
when used for RSA encryption carries all of the caveats that using the raw RSA
algorithm entails (use OAEP, etc.).

I'm really surprised Shoup-Cramer encryption is more famous than Shoup's
threshold RSA algorithm.

------
camus2
Very good article. Though I don't use Go (yet), that's the kind of writeup I'd
like to see more of on the HN home page.

------
pstuart
The "seamless binary upgrade" part is intriguing (unless it's just a reference
to the fact that binaries are self-contained (which is indeed wonderful)).

~~~
free652
I have a feeling they just change symlinks between self-contained binaries.
Easy to upgrade or revert binaries.

~~~
dknecht
We use a supervising process [1] that has bound the sockets and then passes
the file descriptors to the Go process. When the new process is ready to take
connections, it sends a signal to the original process to shut down.

[1]
[https://github.com/cloudflare/circus/commits/cloudflare2](https://github.com/cloudflare/circus/commits/cloudflare2)
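
A rough sketch of how descriptor passing can look from the Go side, using only the standard library. circus itself is a Python tool, so the supervisor half below is written in Go purely to keep the example in one language; `spawnWorker`, `inherit` and `handoff` are hypothetical names, and the signal sent to the old process is not shown. `handoff` exercises the worker half inside a single process to show that a listener rebuilt from a duplicated descriptor keeps accepting after the original listener is closed.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"os/exec"
)

// spawnWorker is the supervisor side. It starts the program at path
// (a hypothetical worker binary) with a duplicate of the listening
// socket as an extra file, which the child sees as descriptor 3.
func spawnWorker(path string, ln *net.TCPListener) (*exec.Cmd, error) {
	f, err := ln.File()
	if err != nil {
		return nil, err
	}
	defer f.Close() // the child holds its own copy once Start returns
	cmd := exec.Command(path)
	cmd.ExtraFiles = []*os.File{f}
	return cmd, cmd.Start()
}

// inherit is the worker side. In a real worker, f would be
// os.NewFile(3, "listener").
func inherit(f *os.File) (net.Listener, error) {
	defer f.Close() // FileListener keeps its own duplicate
	return net.FileListener(f)
}

// handoff rebuilds a listener from a duplicated descriptor, closes the
// original listener, and checks that a connection is still accepted.
func handoff() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	f, err := ln.(*net.TCPListener).File()
	ln.Close() // the socket stays open through f
	if err != nil {
		return err
	}
	ln2, err := inherit(f)
	if err != nil {
		return err
	}
	defer ln2.Close()

	c, err := net.Dial("tcp", ln2.Addr().String())
	if err != nil {
		return err
	}
	defer c.Close()
	s, err := ln2.Accept()
	if err != nil {
		return err
	}
	return s.Close()
}

func main() {
	fmt.Println("handoff error:", handoff())
}
```

Because the bound socket never closes during the swap, clients connecting mid-upgrade wait in the kernel's accept queue instead of being refused.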

------
jorgem
Pretty cool stuff. How big is the team working on these Go projects?

------
alec
"The guarantees needed to avoid leaving the server in a bad state when
handling panics would be impossible without the defer mechanism Go provides."

I'm only passingly familiar with defer, but I understand it to be equivalent
to RAII in C++, Python's with statement, Common Lisp's unwind-protect, and
others - does it actually provide something more, and if so, what?

~~~
pcwalton
Go's "defer" is not equivalent to RAII. It is function-scoped rather than
block-scoped and has semantics based on mutating hidden per-function mutable
state at runtime. For example:

    
    
        func Foo() {
            for i := 0; i < 5; i++ {
                if Something() {
                    defer Whatever()
                }
            }
    
            // ... the compiler can't tell how many
            // Whatever()s run here ...
        }
    

Compared to RAII as implemented in for example D with its "scope" statement,
"defer" has much more complex semantics, inhibits refactoring since moving
things to function bodies or inlining function bodies silently changes
semantics, and cannot be optimized as easily, because of the dynamic aspects.
IMHO, it has essentially no advantages over RAII and many disadvantages.
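
A small runnable illustration of the function-scoping point, with made-up function names. In `deferInLoop` the deferred calls pile up during the loop and only run, last-in first-out, when the function returns. In `deferPerIteration` the same body is wrapped in a function literal, and each deferred call now runs at the end of its own iteration, which is the kind of semantic change from moving code into a function body that is described above.

```go
package main

import "fmt"

// deferInLoop registers one deferred call for each even i. None of them
// runs when the loop ends; all run, last-in first-out, at function return.
func deferInLoop() (log []string) {
	for i := 0; i < 4; i++ {
		if i%2 == 0 {
			defer func(n int) {
				log = append(log, fmt.Sprintf("deferred %d", n))
			}(i)
		}
	}
	log = append(log, "loop done")
	return log
}

// deferPerIteration wraps the loop body in a function literal, so each
// deferred call runs when that literal returns, once per iteration.
func deferPerIteration() (log []string) {
	for i := 0; i < 4; i++ {
		func() {
			if i%2 == 0 {
				defer func() {
					log = append(log, fmt.Sprintf("deferred %d", i))
				}()
			}
			log = append(log, fmt.Sprintf("body %d", i))
		}()
	}
	return log
}

func main() {
	fmt.Println(deferInLoop())       // [loop done deferred 2 deferred 0]
	fmt.Println(deferPerIteration()) // [body 0 deferred 0 body 1 body 2 deferred 2 body 3]
}
```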

~~~
enneff
> defer inhibits refactoring since moving things to function bodies or
> inlining function bodies silently changes semantics, and cannot be optimized
> as easily, because of the dynamic aspects.

As someone who has written and reviewed hundreds of thousands of lines of Go
code, I haven't observed this to be the case in practice.

RAII doesn't fit into Go, philosophically, as it lets you trigger hidden
functionality on the creation or destruction of data structures, whereas a
deferred function can only be run if there's a defer statement there in the
code (where you can see it).

In Go, the only way to execute a block of code is to make a function call.
There are no constructors, destructors, or any other kind of side effect to
allocating or deallocating data structures. This brings a huge benefit in
terms of readability and transparency.

Anyway, I'm not sure why we're comparing defer and RAII, because they're
generally used for different purposes.

~~~
pcwalton
> As someone who has written and reviewed hundreds of thousands of lines of Go
> code, I haven't observed this to be the case in practice.

Sure, a lot of suboptimal design decisions don't cause problems in practice.
That doesn't change the fact that they're suboptimal, and in this case lead to
worse performance.

> RAII doesn't fit into Go, philosophically, as it lets you trigger hidden
> functionality on the creation or destruction of data structures, whereas a
> deferred function can only be run if there's a defer statement there in the
> code (where you can see it).

I'm focusing more on RAII as implemented with "scope" in D; whether stuff runs
explicitly or implicitly is an orthogonal design choice (although I prefer
implicitly running code since you need finalizers anyway in any GC'd language,
including Go—so you might as well embrace it). With the "scope" statement you
also always have to explicitly call the destructor, but in a lexically scoped
way.

The main thing I find suboptimal with "defer" is the choice of dynamic mutable
state as compared to lexical scoping.

> In Go, the only way to execute a block of code is to make a function call.
> There are no constructors, destructors, or any other kind of side effect to
> allocating or deallocating data structures. This brings a huge benefit in
> terms of readability and transparency.

[http://golang.org/pkg/runtime/#SetFinalizer](http://golang.org/pkg/runtime/#SetFinalizer)

~~~
tptacek
This appears to be a helper function used exclusively by the standard library
to handle file descriptor closing (incidentally, the one issue I've had with
Golang's concurrency model).

~~~
pcwalton
> This appears to be a helper function used exclusively by the standard
> library to handle file descriptor closing (incidentally, the one issue I've
> had with Golang's concurrency model).

But it's part of the public API. You can add a finalizer to any object. The
semantics of Go say that finalizers are run automatically when the GC reclaims
an object. So this statement is wrong: "In Go, the only way to execute a block
of code is to make a function call. There are no constructors, destructors, or
any other kind of side effect to allocating or deallocating data structures."
It would be more correct to say "idiomatically, in Go people tend to prefer
calling functions explicitly, and 'defer' encourages this."

I think the fact that it's used by the standard library to close file
descriptors is actually really illustrative: you need finalizers in a GC'd
language, otherwise you'll leak resources. Not all resources are stack-scoped.
So implicitly running functions on deallocation is a necessary evil. You might
as well embrace it in your language design.
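
To make the finalizer point concrete, here is a minimal sketch using `runtime.SetFinalizer`. The `resource` type and the helper names are hypothetical. The Go runtime does not promise when, or even whether, a finalizer runs, so the helper forces collections in a loop and gives up after a timeout.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// resource stands in for something that wraps a non-memory handle.
type resource struct {
	name string
	fd   int
}

// open returns a resource whose finalizer reports its name on released
// once the garbage collector finds the object unreachable.
func open(name string, released chan<- string) *resource {
	r := &resource{name: name, fd: 42}
	runtime.SetFinalizer(r, func(r *resource) {
		released <- r.name // a real finalizer would close r.fd here
	})
	return r
}

// leakAndCollect drops the only reference to a resource, then forces
// collections until the finalizer has run or five seconds have passed.
func leakAndCollect() (string, bool) {
	released := make(chan string, 1)
	open("texture-1", released) // result discarded: nothing references it now
	deadline := time.After(5 * time.Second)
	for {
		runtime.GC()
		select {
		case name := <-released:
			return name, true
		case <-deadline:
			return "", false
		case <-time.After(10 * time.Millisecond):
			// not finalized yet; collect again
		}
	}
}

func main() {
	name, ok := leakAndCollect()
	fmt.Println(name, ok)
}
```

No function call in the program text corresponds to the finalizer running; it is triggered by the collector, which is the implicit behaviour under discussion.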

~~~
tptacek
It may be part of the "public API" solely because it needs to be made
available to several different components of the standard library, which is
itself at pains to implement itself primarily in Golang.

SetFinalizer feels like a low blow, here.

~~~
pcwalton
I don't see why it's relevant that the standard library as opposed to user
code needs it. The standard library is a library like any other. It needs
finalization functionality because you always need that functionality in a
GC'd language.

File descriptors are just one case of resources that need finalization
functionality to not leak: the same applies to GPU textures, X server
resources, etc. etc.

~~~
SamReidHughes
You don't need finalization functionality in a GC'd language. If you imagine a
language that lacks finalization functionality but has automatic memory
reclamation, things turn out okay. Finalization functionality isn't something
that sane programs depend on -- garbage collection of memory makes sense
because if you run out of memory or allocate and remove pointers to a lot of
stuff, the garbage collector can naturally kick in and find you some more
memory to use. If you allocate a bunch of file handles, does the garbage
collector kick in when your OS tells you that you've run out of file
descriptors?

~~~
pcwalton
> You don't need finalization functionality in a GC'd language. If you imagine
> a language that lacks finalization functionality but has automatic memory
> reclamation, things turn out okay.

Not in fault-tolerant message passing systems, to name just one obvious
example. Suppose that you put a bunch of file objects into a buffered channel,
and then the goroutine that was supposed to receive those objects panics. Your
program wants to recover from panics with recover(). Who closes the file
descriptors in those channels? Nobody owns them yet: they were in a channel
and the goroutine that was supposed to receive them died.

You might be able to solve this by handing out references to the channel to
another goroutine that is supposed to clean up the file descriptors, but this
gets really complicated. This sort of thing is why Go is GC'd in the first
place. It's much easier to just have the GC clean up the file descriptors in
channels in which one endpoint has gone dead, and that's the sort of thing
finalizers are for and I assume it's the reason that finalizers were built
into Go.

~~~
SamReidHughes
Have the things in the channel be the equivalent of C#'s IDisposable or
something, then the goroutine can clean them up itself.

~~~
pjmlp
Have you read pcwalton's example?

How can the goroutine clean them if it is dead?

~~~
SamReidHughes
Sorry, the words coming out of my fingers didn't match the thought in my mind.
The channel can dispose of them.
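
One way this could be sketched without finalizers, as an illustration only: whoever recovers from the panic drains the channel and closes what is still queued. Channels in Go have no dispose hook of their own, so the draining is done by code that still holds a reference to the channel. `fakeFile`, `drain`, `consume` and `demo` are made-up names.

```go
package main

import (
	"fmt"
	"io"
)

// fakeFile stands in for a file object that must be closed exactly once.
type fakeFile struct {
	name   string
	closed bool
}

func (f *fakeFile) Close() error { f.closed = true; return nil }

// drain closes everything still queued in ch without blocking.
func drain(ch <-chan io.Closer) int {
	n := 0
	for {
		select {
		case c, ok := <-ch:
			if !ok {
				return n
			}
			c.Close()
			n++
		default:
			return n
		}
	}
}

// consume handles items until handle panics. Its deferred function
// recovers and disposes of whatever is still waiting in the channel.
func consume(ch <-chan io.Closer, handle func(io.Closer)) (drained int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("consumer panicked: %v", r)
			drained = drain(ch)
		}
	}()
	for c := range ch {
		func() {
			defer c.Close() // runs even if handle panics
			handle(c)
		}()
	}
	return 0, nil
}

// demo queues three files and panics on the first one.
func demo() (drained int, allClosed bool, err error) {
	files := []*fakeFile{{name: "a"}, {name: "b"}, {name: "c"}}
	ch := make(chan io.Closer, len(files))
	for _, f := range files {
		ch <- f
	}
	drained, err = consume(ch, func(io.Closer) { panic("boom") })
	allClosed = true
	for _, f := range files {
		allClosed = allClosed && f.closed
	}
	return drained, allClosed, err
}

func main() {
	fmt.Println(demo())
}
```

This only covers items that are already queued when the consumer dies. If a producer keeps sending afterwards, something still has to own the channel, which is the complication raised earlier in the thread.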

------
craigyk
I do part time IT and I have repeatedly wanted a more programmatically friendly
DNS and DHCP solution for our small network. dnsmasq doesn't have a great
high-availability story, and BIND+ISCDHCP can be quite complicated to
configure for high-availability (plus BIND doesn't seem to have convenient
options such as expand-hosts from dnsmasq).

Does this mean Go is ready to use to make a 'simple' forwarding/caching DNS
server with the ability to have DHCP update local hostnames? How about DNSSEC
validation?

~~~
tptacek
There's no official support for DNSSEC in Golang, but then, there's very very
little support for DNSSEC in the real world. Which is just as well.

~~~
axaxs
True, but then again, the de facto dns library
([https://github.com/miekg/dns](https://github.com/miekg/dns)) is simply
amazing for all things DNS, including DNSSEC.

------
sard420
CloudFlare is great stuff. I just wish they offered SSL on their free tier;
should anyone really be running without it?

