Google Open Source Load Balancer in Go (github.com)
323 points by paukiatwee on Jan 30, 2016 | 69 comments

Official announcement: http://google-opensource.blogspot.com/2016/01/seesaw-scalabl....

Note that this is a network-level load balancer that is tightly coupled with LVS, not a Layer 7 load balancer like HAProxy.

HAproxy can do Layer 4.

It can, but not (I believe) without actually reading the packets, using sockets to connect to the destination, and sending them onwards.

Seesaw is just a traffic cop. The kernel does all the work.

I know HAproxy will aggressively use kernel features to reduce the amount of actual packet handling it does. For example, it does zero-copy forwarding using the splice system call. How this compares to what seesaw does is over my head, though. I just wanted to point out that HAproxy isn't strictly Layer 7.
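To make the contrast concrete, here is a rough sketch of what user-space Layer 4 forwarding looks like: accept the client socket, open a second socket to the destination, and copy bytes between them. This is not HAProxy's or Seesaw's code; the function names (`serveProxy`, `relay`, `serveEcho`, `demo`) and the loopback echo backend are made up for illustration. With LVS none of this copying happens in a user process.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// serveEcho is a stand-in backend that echoes whatever it receives.
func serveEcho(ln net.Listener) {
	for {
		c, err := ln.Accept()
		if err != nil {
			return // listener closed
		}
		go func() {
			defer c.Close()
			io.Copy(c, c)
		}()
	}
}

// serveProxy accepts clients on ln and relays each one to backendAddr.
func serveProxy(ln net.Listener, backendAddr string) {
	for {
		client, err := ln.Accept()
		if err != nil {
			return // listener closed
		}
		go relay(client, backendAddr)
	}
}

// relay opens a second socket to the backend and copies bytes both ways.
// Every byte of the connection passes through this process.
func relay(client net.Conn, backendAddr string) {
	defer client.Close()
	backend, err := net.Dial("tcp", backendAddr)
	if err != nil {
		return
	}
	defer backend.Close()
	go func() {
		io.Copy(backend, client) // client -> backend
		backend.Close()          // unblocks the copy below once the client is done
	}()
	io.Copy(client, backend) // backend -> client
}

// demo wires an echo backend behind the proxy and sends one message through.
func demo(msg string) (string, error) {
	backendLn, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer backendLn.Close()
	go serveEcho(backendLn)

	proxyLn, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer proxyLn.Close()
	go serveProxy(proxyLn, backendLn.Addr().String())

	conn, err := net.Dial("tcp", proxyLn.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte(msg)); err != nil {
		return "", err
	}
	reply := make([]byte, len(msg))
	if _, err := io.ReadFull(conn, reply); err != nil {
		return "", err
	}
	return string(reply), nil
}

func main() {
	reply, err := demo("ping")
	fmt.Println(reply, err)
}
```

Kernel tricks such as splice can make the copy cheaper, but the proxy still owns two sockets per connection, which is the difference being pointed out above.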

Anycast VIPs and Direct Server Return were a couple key parts of VDN "secret sauce" in early to mid-2000s.

Cool to see this engine out in open source.

To build a VPN edge you also need cooperative tiered caches with some very counterintuitive cache admission and eviction algorithms, unicast front end with p2p (for vod) or multicast (for live) back end, multi-datacenter event aggregation and correlation, cookieless/db-less sessions, and a few other goodies, some of which you can now even find as Nginx plugins.

Assembling what you need is much easier now with HLS, DASH, etc., than it was in the MMS/RTSP/RTMP days.

As of the last couple of years, you could almost assemble a viable global VDN with off-the-shelf open source parts. If the Open Compute Project keeps up its good work, the physical kit gets affordable too.

I've worked for CDNs and configured many VPNs but I have never heard of a VDN. What is a VDN? I don't recall hearing this term in the early 2000's either. Is that a Google internal name? Could you elaborate? Thanks.

Based on the rest of the comment, VDN = video distribution network. I think the 'VPN' is a typo and should be another 'VDN'.

Sorry, yes, autocorrect ... VDN and video delivery network

LVS load-balancing is amazing. You can use a tiny instance to load balance huge amounts of traffic since the instance itself doesn't need to process and pass the traffic.

The part that makes it difficult to implement is that the hosts have to be on the same network. I didn't dive too deeply into the code, but it looks like you also assign an additional interface to use as a VIP. I believe you'll need to be on your own network, not in the cloud, to implement this.

Still cool nonetheless!

Would it be possible to set this up with DigitalOcean's Floating IP [1] and LVS? I know they use HAProxy [2,3] as a standard setup here.

[1]: https://www.digitalocean.com/community/tutorials/how-to-use-...

[2]: https://www.digitalocean.com/community/tutorials/how-to-crea...

[3]: https://www.digitalocean.com/community/tutorials/how-to-set-...

The reason I ask: I worked with a hosting provider that used LVS to serve billions of impressions per month, and it was rock solid for their entire client base.

I'm pretty interested in this, but not yet in a position to roll out a dedicated infrastructure.

The third link that you posted is for keepalived. keepalived uses LVS as the actual load balancer (which is what seesaw uses as well). So as long as you can set up seesaw to run a few scripts to inform DigitalOcean which droplet currently has the IP, it should work fine. There is an example for keepalived in the link that you posted.

Awesome! Many thanks for the reply!

In AWS, you can design subnets within a VPC, which would put the hosts in the same network, and presumably work for LVS load balancing?

I wonder why this requires a second interface, as opposed to just an IP alias for the VIP.

Wikimedia has an LVS manager that does what seesaw does, although it was never "advertised" to the public:


It actually is a bit different in that it advertises VIPs via BGP, so no floating IPs are needed, but you need to be able to configure your upstream routers, and its configuration is much more flexible.

Is this just some tooling around Quagga and LVS then? I mean, if you had Quagga/BIRD and LVS on your load balancers (not an uncommon configuration), isn't that pretty much all the heavy lifting? What else does Seesaw provide: SSL termination, config management?

seesaw is just a management layer on top of ipvs (or lvs).

The code actually talks to IPVS via netlink.

Given that it's built on top of LVS, this presumably supports UDP load balancing. If so, that's great, as the available solutions for UDP load balancing are, to my knowledge, quite limited.

Yes, there's support for UDP, and for UDP-based protocols such as DNS.


OT: I'm a Go noob and I was wondering if there's something like requirements.txt for Go projects? Or is this usually the way to go?

    go get -u golang.org/x/crypto/ssh
    go get -u github.com/dlintw/goconf
    go get -u github.com/golang/glog
    go get -u github.com/golang/protobuf/{proto,protoc-gen-go}
    go get -u github.com/miekg/dns

This looks incredibly brittle. What if one of those dependencies releases a new major version on its master branch that breaks compatibility? It was really surprising to see it done this way. I know Go's package management is pretty weak, but a serious project like this needs reproducible builds. Since all the dependencies are on git, they could just use git submodules in a vendor directory or something.

> This looks incredibly brittle. What if one of those dependencies releases a new major version on its master branch that breaks compatibility?

You are not supposed to do this in Go. If you make changes that are API-incompatible, you need to create a new import path. From the Go FAQ:

Packages intended for public use should try to maintain backwards compatibility as they evolve. The Go 1 compatibility guidelines are a good reference here: don't remove exported names, encourage tagged composite literals, and so on. If different functionality is required, add a new name instead of changing an old one. If a complete break is required, create a new package with a new import path.


Whether this is a good approach is the question, but pushing API incompatibilities to an existing import path is considered bad practice.

(Since creating a new repository for each API-breaking version is annoying, there are sites such as http://gopkg.in/ that allow you to expose version branches as different import paths.)
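As a small illustration of the "tagged composite literals" advice in the FAQ quote: callers that name their fields keep compiling when a package later adds a field. The `Backend` type and `describe` function here are hypothetical, not taken from any real package.

```go
package main

import "fmt"

// Backend stands in for an exported struct in a published package.
// Weight is imagined as a field that was added in a later release.
type Backend struct {
	Host   string
	Port   int
	Weight int // hypothetical later addition
}

// describe formats a backend for display.
func describe(b Backend) string {
	return fmt.Sprintf("%s:%d weight=%d", b.Host, b.Port, b.Weight)
}

func main() {
	// Keyed literal: written before Weight existed, still compiles afterwards.
	// The new field simply takes its zero value.
	b := Backend{Host: "10.0.0.1", Port: 80}
	fmt.Println(describe(b))

	// An unkeyed literal such as Backend{"10.0.0.1", 80} would stop compiling
	// as soon as the third field is added, which is why keyed literals are encouraged.
}
```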

This is (sadly) why gopkg.in [1] exists. It proxies Git repositories, mapping:

  gopkg.in/mypackage.v3     -> github.com/go-mypackage/mypackage
  gopkg.in/bob/mypackage.v3 -> github.com/bob/mypackage

...where v3 is a branch or tag named v3, v3.N or v3.N.M.
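The mapping can be sketched as a small function. This is only an illustration of the two patterns listed above, not gopkg.in's actual implementation; `resolve` is a made-up name, and it returns the version selector (such as v3) without choosing a concrete branch or tag.

```go
package main

import (
	"fmt"
	"strings"
)

// resolve maps a gopkg.in import path to the GitHub repository and the
// version selector it refers to, following the two patterns above.
func resolve(importPath string) (string, string, error) {
	const host = "gopkg.in/"
	if !strings.HasPrefix(importPath, host) {
		return "", "", fmt.Errorf("not a gopkg.in path: %q", importPath)
	}
	rest := strings.TrimPrefix(importPath, host)

	// The version selector is whatever follows the last ".v".
	dot := strings.LastIndex(rest, ".v")
	if dot <= 0 || dot+2 >= len(rest) {
		return "", "", fmt.Errorf("missing version suffix: %q", importPath)
	}
	name, version := rest[:dot], rest[dot+1:]

	parts := strings.Split(name, "/")
	for _, p := range parts {
		if p == "" {
			return "", "", fmt.Errorf("empty path element: %q", importPath)
		}
	}
	switch len(parts) {
	case 1: // gopkg.in/pkg.vN -> github.com/go-pkg/pkg
		return "github.com/go-" + parts[0] + "/" + parts[0], version, nil
	case 2: // gopkg.in/user/pkg.vN -> github.com/user/pkg
		return "github.com/" + parts[0] + "/" + parts[1], version, nil
	}
	return "", "", fmt.Errorf("too many path elements: %q", importPath)
}

func main() {
	for _, p := range []string{"gopkg.in/mypackage.v3", "gopkg.in/bob/mypackage.v3"} {
		repo, version, err := resolve(p)
		fmt.Println(p, "->", repo, version, err)
	}
}
```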

What Go needs is a real package manager that uses the new 1.5 vendor directory to manage dependencies. There have been some attempts (godep, gb), but they aren't very good, especially if you look at how existing languages have solved this (Ruby's Bundler, Rust's Cargo).

[1] https://gopkg.in

> I know go's package management is pretty simple


I like cargo's system for this.

This is the biggest problem in the Go ecosystem. Yes, there are many third-party solutions, but no official one. Different repos use different solutions, so none of them works everywhere.

Well one official solution would be to use the vendor path.

Unless there's something weird about that repo, go usually figures out what it needs to pull from the imports, you don't need to manually specify them.

Doing "go install github.com/google/seesaw" should do everything for you.

I don't know why they listed those in the README, but "go get" can fetch a package together with its dependencies like this:

    go get -u github.com/google/seesaw/...

How does HN's duplicate link work? I'm asking this because I saw this thread earlier: https://news.ycombinator.com/item?id=10998103

Wouldn't it be fair for that submitter to get karma points?

They won't get the karma, but the equivalent cash value has been wired to their bank account.

This is not an official Google product. So why did it say Google in the title? This should be clarified.

> It’s worth noting that while this project comes out of Google, the open-source version is not an official Google product. So don’t expect the company to provide any official support.

The distinction is that this project is written and used by Google, not supported by Google.

It's made by Google, used by Google, blogged about by Google, and hosted on Google's GitHub. It's officially Google; it's just not officially a product.

It is a load balancer Google is using in production: http://techcrunch.com/2016/01/29/google-open-sources-its-see...

Edit: Here's the official Google announcement: http://google-opensource.blogspot.com/2016/01/seesaw-scalabl...

It's not used in production, it's used in "corp" which is totally disjoint from production at Google.

How is that not "using it in production"? Internal usage is still production usage.

When Google, Facebook, etc. refer to "production" they typically mean google.com, facebook.com, etc.

EDIT: If you're going to down-vote this comment, at least provide detaro with the correct answer to his/her question!

I didn't downvote, but your statement in no way refutes parent argument. Typically =/= always. It's similar to someone declaring "HN was down", and you reply by saying "HN is typically up"

It was more like someone saying, "Foo is not bar," someone else saying, "How is foo not bar? Bar is bar," and then me saying, "Because bar has a different meaning, colloquially, at certain companies."

At least 2 years ago it was already not always the case and things were moving towards unification rather than the other way around. Unless the general direction has drastically changed since I left I can only assume your statement is even less accurate now than it used to be.

You're right but this particular piece of software is an artifact of the old way, not the new unified way.

I believe it means they want to share the code but not support it like an official product launch, but I don't know for sure.

And then withdraw the code and cancel it once it's in wide use, a la MyTracks?

MyTracks was never in wide use, and its source code certainly wasn't.

It's distributed under Apache License 2.0. So it doesn't really matter, you'll still have the source.

It's open source; if they lose interest in it, anybody can pick up the code and continue supporting it.

It's on the google github organization...

I don't want to digress too much from the main discussion, but what are the main reasons people use Go? What are the main use cases for Go? Just looking for a new programming language to learn for some of the next web apps I'll be building.

Excellent concurrency, blazingly fast and memory-safe.

The first is difficult for a lot of the commonly used languages, the second is hard for Python, Ruby, etc, the third is hard for C, C++, etc.

One downside (although some consider it an upside by design) is that it can be a bit more verbose than other languages, since there is rarely any "magic". You can usually read any Go code and be pretty confident about what it does, because it doesn't allow you to hide complexity. This does tend to make the code more verbose.

I highly recommend taking the interactive Go Tour at https://tour.golang.org/welcome/1 because it will show you everything you need to get a high-level overview of Go's power.
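For a taste of the concurrency part, here is a minimal sketch (the function name `sumSquares` is just for illustration): each input is handled in its own goroutine, results come back over a channel, and a `sync.WaitGroup` waits for everything to finish. Note that nothing is hidden: the goroutine launch, the wait, and the channel close are all spelled out, which is the verbosity trade-off mentioned above.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares squares each input in its own goroutine and adds up the results.
func sumSquares(nums []int) int {
	// Buffered so that every goroutine can send without blocking.
	results := make(chan int, len(nums))
	var wg sync.WaitGroup

	for _, n := range nums {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			results <- n * n
		}(n)
	}

	wg.Wait()      // all goroutines have sent their result
	close(results) // lets the range loop below terminate

	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4})) // prints 30
}
```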

I would say the downside that you mentioned can also be a strength!

Thanks for the response. I've been trying to decide between Go and Elixir as my next new language. Both languages sound like they have great features.

AKA: We are not going to use this nor maintain it anymore.


That analysis is pretty bad, the code is clean and straightforward, yet that tool reports a bunch of random non-issues.

Well, there is `EngineCon` interface [1]. And two types (`engineIPC` [2] and `engineRPC` [3]) that implement the interface. That's why there is some repetition. So, what?

[1]: https://github.com/google/seesaw/blob/master/common/conn/con...

[2]: https://github.com/google/seesaw/blob/master/common/conn/ipc...

[3]: https://github.com/google/seesaw/blob/master/common/conn/rpc...

Yes, horrible to duplicate standard boilerplate...

Are you adding something to the discussion or are you getting a commission for selling that crappy code quality tool?

EDIT: ok, apparently it is your product. Congratulations on successfully leaving a bad impression.

Guilty - my apologies. Except rather than sell it I'm simply looking for feedback. While the culture of using similar tools (reek, rubocop, flay, flog etc.) exists in the Ruby ecosystem and it's easier to calibrate algorithms and expectations Go is an uncharted territory. This is something I'm really after and again sincere apologies for my previous misdemeanor.

Finding a popular project where your tool finds good issues and writing a blog post analyzing them might be a good way to get attention and feedback.

(the team making PVS-Studio (a static code analyzer for C/C++) does this)

It's good etiquette to include a disclaimer, since this code analysis is a tool/company you are affiliated with.

Personally I've never seen any of these code analysis tools produce anything remotely useful. De-duplicating code might be nice in some cases but it certainly has nearly nothing to do with the over-all software quality.

As much as I love Go and use it extensively for various production-grade projects, I'm wondering how suitable it really is for something like a load balancer.

My understanding is that Go's TLS performance is dramatically worse than that of C-based implementations like OpenSSL. I remember reading something about licensing conflicts between the Go project and some TLS code, such that optimized encryption code couldn't be incorporated into the Go project directly. Cloudflare got around it and has much better performance for TLS.

Apart from that, GC is still a big deal and the current net/http standard library produces a lot of garbage.

Go's TLS support should be greatly improved in Go 1.6 thanks to the performance work by Cloudflare that you referenced. The licensing issue was resolved and the work was merged on 11/10/2015 [1].

While GC will always carry a cost, great improvements have been made over the last year, and it continues to be an area of focus for each Go release. Rick Hudson gave a nice update on Go's GC improvements at GopherCon 2015 [2][3].

[1] https://go-review.googlesource.com/#/c/8968

[2] https://talks.golang.org/2015/go-gc.pdf

[3] https://www.youtube.com/watch?v=aiv1JOfMjm0

I have not looked deeply at the implementation, at all, but given that the actual site refers to this as an "algorithm" for LVS, I kinda suspect all the heavy lifting is happening at the kernel level within LVS, and the go code is merely acting as a control program that tells the backend how to behave. Many high performance systems have lower performance scripting languages or similar built in or bolted on, in order to provide easier means of development without impeding performance significantly. LVS is no exception.

The LVS-based systems I've built and used in the past often had a number of non-C components (mostly Perl, as it was 10+ years ago), for health checks, data gathering, making balancing decisions, etc. It is entirely possible this is the kind of work Go is doing (again, I haven't looked deeply into the code...but, I don't immediately see anything indicating Go is doing the actual load balancing, but it is obviously doing health checks and providing management access).

Finally, Erlang is a garbage collected language, and yet it has been used for a couple of decades for this kind of workload. So, evidence strongly suggests it is possible for a GC language to do things like this (though, again, I don't think Go is doing the network layer work here, since LVS is in the picture).
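A rough sketch of the kind of control-plane work described above, assuming a simple TCP connect check. This is not Seesaw's code; `tcpCheck`, `healthy`, and the backend addresses are hypothetical. The point is that the program only decides which backends belong in rotation, while the kernel forwards the packets.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// tcpCheck reports whether a TCP connection to addr succeeds within timeout.
func tcpCheck(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// healthy returns the backends that pass check, preserving their order.
func healthy(backends []string, check func(addr string) bool) []string {
	var up []string
	for _, b := range backends {
		if check(b) {
			up = append(up, b)
		}
	}
	return up
}

func main() {
	// Hypothetical backend addresses, used only for illustration.
	backends := []string{"127.0.0.1:8081", "127.0.0.1:8082"}

	up := healthy(backends, func(addr string) bool {
		return tcpCheck(addr, 500*time.Millisecond)
	})

	// A real control plane would now update the kernel's IPVS table so that
	// only these backends receive traffic; that step is omitted here.
	fmt.Println("backends in rotation:", up)
}
```

Passing the check as a function keeps the selection logic testable without a network, and makes it easy to swap in HTTP or DNS checks.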

That's because Erlang's GC is managed on a per-lightweight-process basis, which is possible because of the share-nothing nature of the concurrency model Erlang uses (actor/message-passing).

Garbage collection in Go can't be implemented the same way, because Go allows for shared mutable state.

That said, the work done to the GC in Go v1.5 seems quite phenomenal, but just because the two runtimes share a piece of terminology called "garbage collector", they're very, very different in terms of the semantics and effects exposed.
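To illustrate the shared-mutable-state point: in Go, several goroutines can hold a pointer to the same heap object, so the collector cannot treat each goroutine's memory as an isolated island the way a per-process collector can. A minimal sketch, with hypothetical names (`counter`, `countConcurrently`):

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a single heap object reachable from many goroutines at once.
type counter struct {
	mu   sync.Mutex
	hits map[string]int
}

// record increments the count for key; the mutex guards the shared map.
func (c *counter) record(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.hits[key]++
}

// countConcurrently tallies keys using one goroutine per key, all of which
// mutate the same counter through a shared pointer.
func countConcurrently(keys []string) map[string]int {
	c := &counter{hits: make(map[string]int)}
	var wg sync.WaitGroup
	for _, k := range keys {
		wg.Add(1)
		go func(k string) {
			defer wg.Done()
			c.record(k)
		}(k)
	}
	wg.Wait()
	return c.hits
}

func main() {
	fmt.Println(countConcurrently([]string{"a", "b", "a"}))
}
```

In an actor-style runtime the equivalent would be messages sent to a process that owns the map, and nothing outside that process could reference it.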

Apparently in 2016 we still need to make people aware that not all GCs are born the same way.

The 21x performance improvement to Go TLS was merged [1] and is available in go1.6rc1 [2], IIRC.

[1] https://groups.google.com/forum/#!msg/golang-codereviews/m5Q... [2] https://tip.golang.org/doc/go1.6

This is a Layer 4 load balancer. It doesn't look at the contents of packets; it only decides where they should go. So Go's TLS performance isn't relevant here.

Premature worry (aka FUD). We run Go services that terminate TLS and handle thousands of req/sec per server and it's never been even a slight concern. CPU usage for TLS has not been measurable. GC has also not been an issue, at least wrt net/http.

This component basically acts as a control plane for LVS which is implemented in the kernel. The data plane is not Go. Basically this load balancer is really the Linux kernel (and it's blazing fast).
