
The GitHub Load Balancer - logicalstack
http://githubengineering.com/introducing-glb/
======
NicoJuicy
I notice a lot of negativity around here. I don't know why that is... but I'll
throw in my five cents on it.

NIH - Not invented here and redoing an opensource project.

- GitHub said they used HAProxy before, and I think GitHub's use case could
very well be unique, so they created something that works best for them. They
don't have to re-engineer an entire code base. When you work on small
projects, you can send a merge request to make changes; I think this is
something bigger than just a small bugfix ;). I totally understand them
creating something new here.

- They built on a number of open source projects, including haproxy,
iptables, FoU and pf_ring. That is what open source is: using open source to
create what suits you best. Every company has some edge cases, and I have no
doubt that GitHub has a lot of them ;)

Now,

Thanks for sharing, GitHub. I'll follow up on your posts and hope to learn a
couple of new things ;)

------
otoburb
Given this is based on HAProxy and seems to improve the director tier of a
typical L4/L7 split design, I'm led to believe GLB is an improved TCP-only
load balancer.

But they also talk about DNS queries, which are still mainly UDP53, so I'm
hoping GLB will have UDP load-balancing capability as gravy on top. I'm
excluding zone transfers, DNSSEC traffic and (growing) IPv6 DNS requests on
TCP53 because, at least in carrier networks, we're still seeing a tonne of
DNS traffic that fits within plain old 512-byte UDP packets.

Looking forward to seeing how this develops.

EDIT: Terrible wording on my part to imply that GLB is based on HAProxy
code. I meant to convey that GLB seems to have been designed with deep
experience running HAProxy, as evidenced by the quote: "Traditionally we
scaled this vertically, running a small set of very large machines running
haproxy [...]".

~~~
dcgudeman
Where did it say that it is based on HAProxy?

~~~
bogomipz
If you look under "stay tuned" it says:

"Now that you have a taste of the system that processed and routed the request
to this blog post we hope you stay tuned for future posts describing our
director design in depth, improving haproxy hot configuration reloads and how
we managed to migrate to the new system without anyone noticing."

That leads me to believe it involves HAProxy.

------
jimjag
I am increasingly bothered by the "not invented here" syndrome where, instead
of taking existing projects and enhancing them in true open source fashion,
people re-create things from scratch.

The creation is then justified as necessary because "no one else has these
kinds of problems", but then it is open sourced as if lots of other people
could benefit from it. Why open source something if it has an expected user
base of 1?

Again, I am not surprised by this. The whole push of GitHub is not to create
a community that works together on a single project in a collaborative,
consensus-based way, but rather lots of people doing their own thing and only
occasionally sharing code. It is no wonder that they follow this meme
internally.

~~~
ckdarby
Have you ever contributed to HAProxy? Have you ever tried committing massive
alterations to major open source projects?

It isn't as simple as "here's my massive rewrite", click the accept button,
and everything works out for the open source community.

Let me be the first to say that the level of politics, circle jerking and
knowing people is ridiculous.

~~~
polpo
Given the good reaction to an out-of-the-blue patch from me on the HAProxy
mailing list, I'd imagine that contributing even major changes to HAProxy
probably would go rather well. It's one of the best open source development
communities I've experienced. Welcoming, but still highly focused on quality
contributions. The quality and performance of HAProxy reflects this approach.

~~~
paulddraper
HAProxy is exceptionally good in this regard.

------
Scaevolus
Related presentations/papers about large scale load balancing:

Facebook:
[https://www.usenix.org/conference/srecon15europe/program/pre...](https://www.usenix.org/conference/srecon15europe/program/presentation/shuff)

Google:
[http://static.googleusercontent.com/media/research.google.co...](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44824.pdf)

------
gwright
While I understand that NIH syndrome is a real thing, it is very
disappointing to read many of the comments here.

I think very few HN readers are really in a position to have an informed
opinion on GitHub's decision to build a new piece of software rather than use
an existing system.

Personally I find this area quite interesting to read about because it is very
difficult to build highly available, scalable, and resilient network service
endpoints. Plain old TCP/IP isn't really up to the job. Dealing with this
without any cooperation from the client side of the connection adds to the
difficulty.

I look forward to hearing more about GLB.

------
Ianvdl
Given the title and the length of the post I was expecting a lot more detail.

> Over the last year we’ve developed our new load balancer, called GLB (GitHub
> Load Balancer). Today, and _over the next few weeks_ , we will be sharing
> the design and releasing its components as open source software.

Is it common practice to do this? Most recent software/framework/service
announcements I've read were just a single, longer post with all the details
and (where applicable) source code. The only exception I can think of is the
Windows Subsystem for Linux (WSL), which was discussed over multiple posts.

~~~
logicalstack
Joe from GitHub here. Frankly, there is a lot we want to talk about and
release, and it was simply too much for one post. We'd like to give it a
proper treatment, and a single very long post won't do that. It also lets us
get folks interested in the project and gives us time to prepare our code for
release. It's a surprisingly big job.

~~~
otterley
Personally, I would have preferred that you waited until you could release
all the documents at once. I admit I was interested, but I've seen too many
people and organizations start a conversation and never finish it or show the
goods. It's misleading and unfair to dangle a solution when all you really
have is a problem.

Take, for example, this post from CoreOS back in March 2016 that suggested
that they might know a way to improve systemd-journald performance:
[https://coreos.com/blog/eliminating-journald-delays-part-1.h...](https://coreos.com/blog/eliminating-journald-delays-part-1.html)

It smelled suspicious, but its release generated a bunch of noise on HN
anyway. And they never followed up with subsequent parts, which suggests to me
that they never found a solution in the first place.

I'm not suggesting that GitHub is blowing smoke -- if you truly have a
solution, that's great! But there's no harm in gathering the documentation
and source code, cleaning it up, and waiting until it's good and ready to go.
Otherwise, I frankly mistrust the motives and abilities of those involved.
Call me cynical if you must.

To paraphrase from another industry, "sell no wine before its time." There's a
lot of wisdom there that is equally applicable to products in our industry
too.

~~~
JdeBP
As if by magic, part 2 has just appeared. (-: See
[https://news.ycombinator.com/item?id=12603322](https://news.ycombinator.com/item?id=12603322).

------
wtarreau
Did people really read the article? To me it was pretty clear. Maybe it
involves some regular load-balancing terms that people are not familiar with,
because I'm seeing a lot of bullshit written in the comments, but here is
what is described there:

- in a traditional L4/L7 load balancing setup (typically what is described in
my very old white paper "making applications scalable with load balancing"),
the first layer (L3-4 only, stateless or stateful) is often called the
"director".

- the second layer (L7) is necessarily based on a proxy.

For the director part, LVS used to be used a lot over the last decade, but
over the last 3-4 years we've seen ECMP implemented in almost every router
and L3 switch, offering approximately the same benefits without adding
machines.

ECMP has some drawbacks (breaks all connections during maintenance due to
stateless hashing).

LVS has other drawbacks (requires synchronization, cannot learn previous
sessions upon restart, sensitivity to SYN floods).

Basically, for the director they did something between the two, using
consistent hashing to avoid dealing with connection synchronization while
still not breaking connections during maintenance periods.
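
To make the difference concrete, here is a quick sketch (my own illustration,
not their code) comparing plain mod-N hashing, which is roughly what naive
stateless ECMP does, with rendezvous hashing for picking a proxy per flow:

    import hashlib

    def score(node, flow):
        # Deterministic pseudo-random weight for this (node, flow) pair.
        key = ("%s|%s" % (node, flow)).encode()
        return int(hashlib.md5(key).hexdigest(), 16)

    def pick_mod_n(nodes, flow):
        # Naive stateless hashing: hash(flow) mod N.
        h = int(hashlib.md5(flow.encode()).hexdigest(), 16)
        return nodes[h % len(nodes)]

    def pick_rendezvous(nodes, flow):
        # Rendezvous (highest random weight) hashing: the node with the
        # highest score for this flow wins.
        return max(nodes, key=lambda n: score(n, flow))

    nodes = ["proxy1", "proxy2", "proxy3", "proxy4"]
    flows = ["flow-%d" % i for i in range(1000)]

    for pick in (pick_mod_n, pick_rendezvous):
        before = {f: pick(nodes, f) for f in flows}
        after = {f: pick(nodes[:-1], f) for f in flows}  # drain proxy4
        moved = sum(1 for f in flows if before[f] != after[f])
        print(pick.__name__, "re-mapped", moved, "of", len(flows), "flows")

With mod-N, draining one node out of four re-maps roughly three quarters of
the flows (killing those connections); with rendezvous hashing only the
quarter that lived on the drained node moves, which is what lets them take
proxies out for maintenance without anyone else noticing.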

This way they can hack on their L7 layer (HAProxy) without anyone ever
noticing, because the L4 layer redistributes the traffic targeting stopped
nodes, and only those.

Thus the new setup is now user->GLB->HAProxy->servers.

And I'm very glad to see that people finally attacked the limitations everyone
has been suffering from at the director layer, so good job guys!

------
gumby
They talk about running on "bare metal" but when I followed that link it
looked like they were simply running under Ubuntu. Is it so much a given that
everything is going to be virtualized?

When I think of "bare metal" I think of a single image with disk management,
network stack, and what few services they want all running in supervisory
mode. Basically the architecture of an embedded system.

~~~
wmf
Yes, it is assumed that all startups are running in EC2 us-east-1 and "bare
metal" is the accepted term for non-virtualized systems.

~~~
colemickens
I don't get it. What else is bare metal meant to mean? "bare metal" =
"embedded system"? What does "embedded system" mean then? I guess my age /
cloud-nativeness is showing?

~~~
unwind
In the embedded space, there often isn't any type of OS or kernel between the
application code and the hardware resources ("the metal").

If I want to send out a character through the board's serial debugging port, I
don't do an open()/write()/close(), I poke the UART's transmit register.

When they said "bare metal", I too thought they ran without an OS, which
would have been kind of cool.

------
yladiz
I'm of two minds about this. Part of me agrees with many of the commenters
here that Not Invented Here syndrome was probably in effect during the
development of this. I don't really know GitHub's specific use case, and I
don't know the various open source load balancers outside of HAProxy and
Nginx, but I would be surprised if their use case hasn't been seen before and
couldn't be handled with existing software (with some modification, pull
requests, etc.). On the other hand, I would guess GitHub researched all of
this, contacted knowledgeable people in the business, and explored their
options before spending resources on making an entirely new load balancer.
Maybe it really is difficult to horizontally scale load balancing, or to load
balance on "commodity hardware".

That being said, if you're planning to release a new piece of technology, why
introduce it without actually releasing it, and without giving a firm
deadline? This isn't a press release; it's a blog post describing the
technical details of a load balancer that is apparently already in production
and working, so why not release the source when the technology is introduced?

------
p1mrx
GitHub only speaks IPv4, so I would be extra-skeptical about using any of
their networking code to support a modern service.

------
NatW
I'm curious whether they looked into pf / CARP as part of their research into
allowing horizontal scalability for an IP. See:
[https://www.openbsd.org/faq/pf/carp.html](https://www.openbsd.org/faq/pf/carp.html)

~~~
logicalstack
CARP and similar systems require an active/passive configuration, which we
did not want since it needs at least twice as many hosts, half of which are
not doing any work. We had similar issues with our former Git storage system
based on DRBD
([http://githubengineering.com/introducing-dgit/](http://githubengineering.com/introducing-dgit/)).

pfsync, LVS and the like use multicast to share connection state, which we
also wanted to avoid.

~~~
voltagex_
Why do you want to avoid multicast?

~~~
ctrlrsf
You would need a large L2 network to support it, or you'd have to route it at
L3, which is not trivial.

------
treve
I half expect a comment here explaining why Gitlab does it better ;)

~~~
sytse
:) We're not doing this better.

We're struggling with our load balancers right now. We're using Azure load
balancers and then HAProxy, but the Azure ones sometimes don't work. Luckily
the new network type on Azure supports floating IPs, so we can set something
up ourselves:
[https://gitlab.com/gitlab-com/infrastructure/issues/466](https://gitlab.com/gitlab-com/infrastructure/issues/466)

~~~
dogismycopilot
I would love to see the solution. We're also looking to run HAProxy in Azure
with keepalived (even in unicast mode). The black-box "Windows-based" load
balancer that Azure offers is quite limited.

~~~
sytse
Cool, we're trying to do all the infrastructure work in the open under
[https://gitlab.com/groups/gitlab-com](https://gitlab.com/groups/gitlab-com)

------
jedberg
Awesome. The whole time I was reading I was thinking "they need rendezvous
hashing". And then, bam, the last paragraph mentions that is in fact what
they are using.

------
lifeisstillgood
I love using GitHub and appreciate the impact it has had and is having. But
this post is what is wrong with the web today. They have taken a technology
that is distributed in its very plumbing and centralised it so much that we
now need to invent new load balancing mechanisms.

Years ago I worked at Demon Internet and we tried to give every dial-up user
a piece of webspace - just a disk, always connected. Almost no one ever used
them. But that is what the web is _for_: storing your Facebook posts and your
git pushes and everything else.

No load balancing needed because almost no one reads each repo.

The problem is it is easier to drain each of my different things into globally
centralised locations, easier for me to just load it up on GitHub than keep my
own repo on my cloud server. Easier to post on Facebook than publish myself.

But it is beginning to creak. GitHub faces scaling challenges, I am frustrated
that some people are on whatsapp and some slack and some telegram, and I
cannot track who is talking to me.

The web is not meant to be used like this. And it is beginning to show.

~~~
audleman
Are you saying instead of having Github, we should all be hosting our own Git
repos?

> But it is beginning to creak. GitHub faces scaling challenges,

I don't agree that Github facing scaling issues means the web is creaking.
More like old wooden boats are being replaced by big, sturdy battleships. I
think the web is getting stronger thanks to engineers facing the challenges
coming their way.

> I am frustrated that some people are on whatsapp and some slack and some
> telegram, and I cannot track who is talking to me.

If you're annoyed by people messaging you through multiple platforms, it
seems the solution would be to have only one provider. But earlier you called
that "what is wrong with the web today" and said that we should have
distributed systems.

~~~
lifeisstillgood
>>> Are you saying instead of having Github, we should all be hosting our own
Git repos?

Well, yes. That's the point. It was designed as an entirely distributed
system. It's crazy that in order to post a message to my neighbours I have to
send data to Facebook in SV, and just as crazy that two devs on the same team
need to write their code commits to a load balanced mega-server in ... err
... Washington? Wherever.

And I don't mind having lots of clients, but I object to the lack of open
standards, the incompatible and frequently unavailable APIs, and the lack of
control over my messages and how they are dealt with. I want procmail for
messaging platforms! And I want a pony!

~~~
josegonzalez
No one is forcing you to use github or facebook. I host my own gitlab
installation for certain private repositories and I still send emails to some
people when coordinating outings.

~~~
lifeisstillgood
I am not feeling forced to use it. I use it because it is easier for me as an
individual developer. I use readthedocs because they have better uptime than
my own servers. All the reasons I use GitHub are good choices for me.

I get the economics of centralised vs decentralised service provision - it's
just ironic that GitHub is facing load balancing problems precisely because
they have taken a distributed technology and made it, de facto, a centralised
technology.

We can imagine a perfect storm of GitHub going down just as someone pulls a
vital package from npm and Google losing the jQuery CDN; all of a sudden the
web will stop working.

It's amazing how fragile we can make a system designed to be resilient - I
presume there is a real cost to keeping things distributed that a good
economist could explain to me.

------
contingencies
I am intrigued by their opening statement of multiple POPs, but the lack of
multi-POP discussion further in the system description.

My understanding is that the likes of, for example, Cloudflare or EC2 have a
pretty solid system in place for issuing geoDNS responses (DNS answers based
on historical latency/bandwidth, ASN or geolocation) to direct random
internet clients to a nearby POP. Building such a system is not that
difficult; I am fairly confident many of us could do so given some time and
hardware funding.

Observation #1: No geoDNS strategy.

Observation #2: Limited global POPs.

Given that the inherently distributed nature of _git_ probably makes providing
a multi-pop experience _easier_ than for other companies, I wonder why
Github's architecture does not appear to have this licked. Is this a case of
missing the forest for the trees?

------
lamontcg
Why not just use DNS load balancing over VIPs served by HA pairs of load
balancers?

Back in the day we did this with Netscalers doing L7 load balancing in
clusters, and then Cisco Distributed Directors doing DNS load balancing across
those clusters.

It can take days/weeks to bleed off connections from a VIP that is in the DNS
load balancing, but since you've got an H/A pair of load balancers on every
VIP you can fail over and fail back across each pair to do routine
maintenance.

That worked acceptably for a company with a $10B stock valuation at the time.

~~~
manigandham
Company stock value has nothing to do with their scaling, performance and
customized processing requirements.

------
madmulita
We are in the process of moving all of our infrastructure to OpenStack,
OpenShift, Ansible, DevOps, Microservices, Docker, Agile, SDN and what not.

There are some brainiacs pushing these magic solutions on us, and one of the
promises is that load balancing is not an issue - even better, it's not even
being talked about.

Please, please, tell me there's something I'm missing.

------
squiguy7
I know they mentioned their SYN flood tool, but I recently saw a similar
project from a hosting provider and thought it was neat [1]. It seems like
everyone wants their own solution to this, even though it is a very common
and non-trivial problem.

[1]: [https://github.com/LTD-Beget/syncookied](https://github.com/LTD-Beget/syncookied)

------
bogomipz
Do the Directors use Anycast then? That wasn't clear to me.

~~~
jssjr
Anycast usually implies traffic will be directed to the nearest node
advertising that prefix. The GLB directors leverage ECMP which provides the
ability to balance flows across many available paths.

~~~
bogomipz
Anycast and ECMP work together in the context of load balancing. ECMP
without anycasted destination IPs would be pointless for horizontally scaling
your LB tier.

What anycast means is just that multiple hosts share the same IP address - as
opposed to unicast. When all the nodes sharing the same IP are on the same
subnet, "nearest" is kind of irrelevant, so the implication is different.

~~~
jssjr
Sure. Feel free to call it anycast then. I usually hear anycast routing used
in the context of achieving failover or routing flows to the closest
server/POP, but there is probably a more formal definition in an RFC that I'll
be pointed to shortly. =)

We are using BGP to advertise prefixes for GLB inside the data center to route
flows to the directors. In our case all of the nodes are not on the same
subnet (or at least not guaranteed to be) which is one of the reasons why we
chose to avoid solutions requiring multicast. I expect Joe and Theo will get
into more details about that in a future post though.

~~~
bogomipz
Are you running Quagga or Bird on the director instances then? I'm looking
forward to reading more about it.

~~~
logicalstack
We use Quagga.
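
For the curious, a director announcing a VIP prefix over BGP needs very
little bgpd configuration. A minimal sketch along these lines (addresses and
AS numbers invented for illustration, not our actual config):

    hostname director1
    !
    router bgp 65010
     bgp router-id 192.0.2.11
     ! eBGP session to the top-of-rack router
     neighbor 192.0.2.1 remote-as 65000
     ! every director announces the same VIP prefix, so the routers
     ! see equal-cost paths and spread flows across them via ECMP
     network 203.0.113.0/24
    !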

~~~
jerkstate
This is really cool work. In a previous lifetime I worked with a team that
implemented an ECMP hashing scheme using a set of IPs kept alive by VRRP, so
I have a bit of familiarity with the space and a few questions.

The article says the L4 layer uses ECMP with consistent/rendezvous hashing.
Is this vendor-implemented, or implemented by you using OpenFlow or something
similar? How does graceful removal at the director layer work? I know you
would have to start directing incoming SYNs to another group, but how do you
differentiate non-SYN packets that started on the draining group from ones
that started on the new group?

If you are using L4 fields in the hash, how do you handle ICMP? This
approach could break PMTU discovery, because an ICMP fragmentation-needed
packet sent in response to a message from one of your DSR boxes might hash to
a different box, unless considerations have been made.
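
One such consideration I've seen elsewhere (a sketch of the general idea, not
necessarily what GLB does) is to spot ICMP errors and hash on the original
flow's tuple embedded in their payload, so the error follows the flow it
refers to:

    import struct

    def ecmp_flow_key(ip_pkt):
        # Returns the tuple to feed the consistent hash for an inbound
        # IPv4 packet. Sketch only: no IP options or bounds checking.
        ihl = (ip_pkt[0] & 0x0F) * 4
        proto = ip_pkt[9]
        src, dst = ip_pkt[12:16], ip_pkt[16:20]
        if proto == 6:  # TCP: hash the packet's own 4-tuple
            sport, dport = struct.unpack("!HH", ip_pkt[ihl:ihl + 4])
            return (src, dst, sport, dport)
        if proto == 1 and ip_pkt[ihl] == 3:
            # ICMP destination-unreachable, which includes code 4
            # (fragmentation needed). Its payload is the IP header of
            # the packet *our* server sent, plus 8 bytes, so hash the
            # embedded tuple swapped back into the inbound direction.
            inner = ip_pkt[ihl + 8:]
            inner_ihl = (inner[0] & 0x0F) * 4
            isrc, idst = inner[12:16], inner[16:20]
            sport, dport = struct.unpack(
                "!HH", inner[inner_ihl:inner_ihl + 4])
            return (idst, isrc, dport, sport)
        return (src, dst, 0, 0)  # fall back to L3-only hashing

That way the frag-needed packet hashes to the same box that owns the TCP flow
it refers to, instead of a random one.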

------
alsadi
I've never liked GitHub's approach; they always use larger hammers.

