
Evolution of GitHub's data centers - doener
https://githubengineering.com/evolution-of-our-data-centers/
======
rdl
This is pretty generic -- isn't this how virtually any site with a decent
physical-infrastructure footprint has operated (at least from ~2000 onward)?

I'd be more interested in blog posts about what they were doing before that
was different/worse and got upgraded, or about how they decided on specific
things.

Their in-house tool gPanel is pretty cool, though
([https://githubengineering.com/githubs-metal-cloud/](https://githubengineering.com/githubs-metal-cloud/))
-- I've seen a lot of people build their own version of that in ways which
are far less sophisticated, or use one of the ~10 other tools for it. How
much effort an org puts into making sure firmware is all the same is probably
a good indicator of how competent they are, IMO.
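
For illustration, a minimal sketch of that kind of fleet-wide firmware audit,
shelling out to ipmitool (hosts.txt and the admin/secret credentials are
placeholders, and this is not how gPanel works -- just the general idea):

    # Group BMCs by the firmware revision they report so drift stands out.
    # Assumes ipmitool is installed and each line of hosts.txt names a BMC.
    versions = Hash.new { |h, k| h[k] = [] }

    File.readlines('hosts.txt', chomp: true).each do |host|
      out = `ipmitool -H #{host} -U admin -P secret mc info 2>/dev/null`
      rev = out[/Firmware Revision\s*:\s*(\S+)/, 1] || 'unreachable'
      versions[rev] << host
    end

    versions.sort_by { |_, hosts| -hosts.size }.each do |rev, hosts|
      puts "#{rev}: #{hosts.size} host(s)"
    end

Anything outside the expected revision bucket (or 'unreachable') is a queue
of machines to reflash or investigate.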

------
sandGorgon
> _In memtesting we boot a custom MemTest86 image and monitor it while it
> completes a full pass. Our custom version of MemTest86 changes the color of
> the failure message to red which allows us to detect trouble. We’ve hacked
> together a Ruby script that retrieves a console screenshot via IPMI and
> checks the color in the image to determine if we’ve hit a failure or not._

Wow. Why is this done this way? Is there no other way to receive hardware
memtest info? I would have thought there was some way to retrieve a machine's
serial output status in a datacenter.

~~~
hueving
Yeah, this seems pretty terrible compared to a modified memtest that reports
results.

~~~
dom0
Over network? Needs a networking stack in memtest.

Over serial? Physical serial is rare. SOL would work, but since it is
connection-based you can't poll it.

This seems like a simple and reliable method to me.
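
The check itself is only a few lines. A rough sketch of the idea (not
GitHub's actual script) using the chunky_png gem, assuming the console
screenshot has already been pulled from the BMC:

    require 'chunky_png'

    # Their MemTest86 build paints the failure message red, so any
    # strongly red region in the screenshot means the pass failed.
    # The RGB thresholds here are arbitrary placeholders.
    image = ChunkyPNG::Image.from_file('screenshot.png')

    red_pixels = image.pixels.count do |px|
      ChunkyPNG::Color.r(px) > 200 &&
        ChunkyPNG::Color.g(px) < 80 &&
        ChunkyPNG::Color.b(px) < 80
    end

    puts(red_pixels > 0 ? 'FAILURE detected' : 'pass looks clean')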

~~~
sschueller
Since they take a screenshot of the console, they could output a QR code and
read that.
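
Sketching the payload side is easy, e.g. with the rqrcode gem (illustration
only -- MemTest86 is bare-metal code and would need its own QR renderer, and
the result fields here are hypothetical):

    require 'json'
    require 'rqrcode'

    # Encode structured memtest results as a QR image that the existing
    # screenshot pipeline could decode instead of checking pixel color.
    payload = { host: 'node42', pass: 1, errors: 0 }.to_json

    # as_png returns a ChunkyPNG::Image, which can be saved directly.
    RQRCode::QRCode.new(payload).as_png(size: 240).save('memtest_qr.png')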

~~~
fapjacks
That's sort of what they're doing. It's just a two-bit QR code: Red, and Not
Red.

~~~
gumoro
So, one bit ;)

~~~
fapjacks
Hah! Yes, totally. I wonder why I said that?

~~~
BraveNewCurency
There is also a "not decided yet" state. Hence, you were right: it needs two
bits. :)

------
rcarmo
Two things that stood out to me:

- No mention of a European data center (data sovereignty and regulation might
be something they don't care about, but I deal with it every day).

- No mention whatsoever of virtualization (although there was an older post
mentioning a move to Kubernetes for part of their stack).

I wonder what the economics of taking this to a public cloud provider would
look like (full disclosure - I work on Azure), and how much they could do
architecturally to benefit from that.

~~~
SOLAR_FIELDS
At GitHub’s scale, it’s highly unlikely that this would be a cost-effective
maneuver - see GitLab’s effort to move off of cloud providers as an example.
There comes a point where you get so big that it no longer makes sense to use
third-party hosting. Dropbox is another example of this.

~~~
manigandham
GitLab isn’t moving:
[https://about.gitlab.com/2017/03/02/why-we-are-not-leaving-the-cloud/](https://about.gitlab.com/2017/03/02/why-we-are-not-leaving-the-cloud/)

Dropbox was primarily about moving storage off S3, which makes sense given
their business.

~~~
lwf
Dropbox has run compute* in its own datacenters since 2008 or so.

*: That is, work that does not interact heavily with content

------
gm-conspiracy
Are they using Dell equipment?

~~~
rdl
Based on the reference to iDRAC, it certainly seems that way. Presumably they
1) don't have a _huge_ amount of hardware (in the sense that a commodity
hosting provider or cloud provider would) and 2) get a decent deal from Dell.

------
jameskegel
I was let down that this wasn’t more of a visual experience.

~~~
svdr
Here are 256 photos of Stack Overflow's hardware:

[https://m.imgur.com/r/cableporn/X1HoY](https://m.imgur.com/r/cableporn/X1HoY)

