
Stack Overflow: The Hardware – 2016 Edition - Nick-Craver
https://nickcraver.com/blog/2016/03/29/stack-overflow-the-hardware-2016-edition/
======
evook
You know someone is a real hardware guy when he calls backup, monitoring, and
logging non-essential. I'm also a bit confused about the 2960Ss and the choice
of Cisco in general. I'd never have thought Stack Overflow ran on network
equipment I'd describe as consumer hardware in networking.

~~~
Nick-Craver
To clarify a bit: I meant they are non-essential _to serving the sites_. If
they all went offline, visitors to Stack Overflow would not be aware.

~~~
evook
I thought so, but I liked your phrasing.

What I sadly don't get from the pictures is the hierarchy of your routers and
switches, the lack of QSFP+ utilization, and the amount of copper. What's the
reason behind the copper SFP transceivers in the ASR-1001 [0]?

[0] https://nickcraver.com/blog/content/SO-Hardware-Network-NewYork-Rack.jpg

~~~
Sanddancer
The 2960s seem to be in the background for the management side of the network,
doing iKVM and the like, so they don't need 10 gigabit. As for the copper in
the core routers: it seems more a case of the routers still having enough
capacity for the site but needing more bandwidth, thus more aggregated ports.
10-gig copper is a good amount cheaper than 10 gigs of fiber and works just as
well for the short runs they're doing.
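One subtlety worth noting about aggregated ports: LACP-style bonding hashes each flow onto one member link, so a single flow never exceeds one link's rate, while many flows share the total. A minimal sketch (link count, speeds, and addresses here are illustrative, not Stack Overflow's actual config):

```python
# Sketch of why aggregated links add capacity but not single-flow speed.
# LACP-style bonding picks a member link per flow from a hash of the flow
# tuple, so one flow stays on one link while many flows spread out.
import hashlib

LINK_COUNT = 4   # e.g. four 10-gig ports bonded into one aggregate
LINK_GBPS = 10

def member_link(src_ip: str, dst_ip: str, dst_port: int) -> int:
    """Pick a member link from a hash of the flow tuple (stable per flow)."""
    key = f"{src_ip}-{dst_ip}-{dst_port}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % LINK_COUNT

# One flow always hashes to the same link -> capped at one link's 10 Gbps.
flow_link = member_link("10.0.0.1", "10.0.0.2", 443)
assert member_link("10.0.0.1", "10.0.0.2", 443) == flow_link

# Many distinct flows spread across links -> aggregate approaches 40 Gbps.
links_used = {member_link("10.0.0.1", f"10.0.1.{i}", 443) for i in range(100)}
print(f"single flow capped at {LINK_GBPS} Gbps on link {flow_link}")
print(f"100 flows spread over {len(links_used)} of {LINK_COUNT} links")
```

Real switches hash on various tuple combinations (and not with MD5); the point is only that aggregation scales total throughput, not per-flow throughput.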

------
diegorbaquero
This is serious server HW porn. Thanks for sharing! I'd love to know more
about your bandwidth usage, limits, etc. Also, do you have any rough idea how
much this would cost in "cloud" services, in percentage terms? (I'm doing
general research comparing bare metal & colocation vs. cloud pricing.) Thank
you!

~~~
Nick-Craver
I commented a bit here with the architecture post:
[https://www.reddit.com/r/programming/comments/468p2m/stack_o...](https://www.reddit.com/r/programming/comments/468p2m/stack_overflow_the_architecture_2016_edition/d038yln)

But a better answer is: that price keeps changing. We now have a lot of AWS
and on-prem experience in house to do a great post. We'll be doing a lot of
research and proper comparisons as a huge part of that upcoming post:
https://trello.com/c/4e6TOnA7/87-on-prem-vs-aws-azure-etc-why-the-cloud-isn-t-for-us

------
nxzero
Awesome pics; it'd be cool to see some people in them too. Thank you for all
the work you're doing for Stack Overflow/Exchange, it's an amazing resource!!

------
kogus
Quite a leap upward from the original:
https://blog.stackoverflow.com/2008/04/our-dedicated-server/

------
tkinom
How big is the Stack Overflow database on disk? I assume you cache it all in
RAM, hence the 700+GB of RAM?

~~~
Nick-Craver
It's at 1.5TB of data and 200GB of T-Logs these days.

~~~
tkinom
It would be cool to see a graph of SQL Server RAM size (64GB, 128GB, 256GB,
512GB) vs. client access latency.

~~~
sfilipov
64GB is an interesting one because it is ~6% of the total data size.
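For what it's worth, here's the back-of-envelope using the 1.5TB + 200GB figures from upthread. The exact percentage shifts depending on what you count as "total data," and real SQL Server buffer-pool hit rates depend on the hot working set rather than a flat fraction, so this only sketches the cold-cache bound:

```python
# Back-of-envelope: what fraction of the total data each RAM size could
# hold. Figures from upthread: 1.5TB of data + 200GB of T-Logs. Actual
# cache hit rates depend on working-set skew, not a flat fraction.
DATA_GB = 1536 + 200   # 1.5TB data + 200GB T-Logs

for ram_gb in (64, 128, 256, 512, 768):
    pct = 100 * ram_gb / DATA_GB
    print(f"{ram_gb:>4}GB RAM ~ {pct:4.1f}% of the {DATA_GB}GB total")
```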

------
pbz
Do you have any plans to run on .NET Core? If so, how soon after it's out as
RTM? Any issues with this?

~~~
Nick-Craver
These plans are in flux. At a core level (heh), we need to have libraries
ported first _anyway_. So we're doing that now. Jil, StackExchange.Redis,
Dapper, and Sigil are up on NuGet and core-compatible today (some as pre-
release). Others like Exceptional are just pending some RC2 goodness to
release, and I'll be starting on MiniProfiler after that. We'll likely port
other internal tools to .NET Core and start heavy testing in the RC2 time-
frame.

Will we run Stack Overflow on core? Probably not for a while. I was asked
exactly this in a recent On.Net interview at the 24:22 mark; you can listen
here for reasoning:
[https://youtu.be/DJn8-Psznsw?t=24m22s](https://youtu.be/DJn8-Psznsw?t=24m22s)

------
saosebastiao
How are the drives used on the SQL Server boxes? A massive NVMe array in RAID
0 alongside 20 SATA drives in RAID 10? Which drives hold prod data? Do you run
backups on the same box?

~~~
Nick-Craver
In the Stack Overflow cluster, the RAID 0 NVMe array contains the Stack
Overflow database and the SATA SSD RAID 10 holds the others (Mobile,
Translations, PRIZM, and Sites).

In the Stack Exchange cluster, the RAID 0 NVMe array contains all databases
except for a large log database (which I called Careers.BigStuff, because it
seemed like a good idea at the time). That log database is accessed far more
rarely and sits on the 10K HDD RAID 10.

We run backups on the primary for several reasons, but they are all sent off-
box. We have 2 primary on-site backup servers, then those backups go offsite
and to tape. T-Log backups run every 15 minutes and full backups nightly. We
also run copy-only backups in the DR data center nightly as an additional
backup measure.
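A quick sketch of the recovery math that schedule implies: nightly fulls plus T-log backups every 15 minutes bound worst-case data loss at one T-log interval, and a point-in-time restore replays the last full plus every T-log backup taken since it. (The restore-point timing below is illustrative; this ignores the DR copy-only backups, which by design don't break the log chain.)

```python
# Recovery math for a full-nightly + 15-minute-T-log backup schedule.
TLOG_INTERVAL_MIN = 15

# Worst case: failure just before the next T-log backup completes, so the
# recovery point objective (RPO) is one T-log interval.
worst_case_rpo_min = TLOG_INTERVAL_MIN

# Restoring to a point 10 hours after the nightly full means replaying the
# full backup plus every T-log backup taken in those 10 hours.
hours_since_full = 10
tlogs_to_replay = hours_since_full * 60 // TLOG_INTERVAL_MIN

print(f"worst-case RPO: {worst_case_rpo_min} minutes")
print(f"restore chain: 1 full + {tlogs_to_replay} T-log backups")
```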

------
daxfohl
Would love to see details on energy use / cost. Have you done any profiling
there? Is energy cost a big deal? Is that the main reason Providence doesn't
refresh continuously?

------
yuhong
Interestingly, the price of 32GB DDR3 LR-DIMMs has been falling as well. I
wonder if this is due to oversupply or something else since 8Gbit DDR3 will
never happen on servers.

------
bluedino
I'd love to hear more about their backups/deployment.

~~~
Nick-Craver
That's coming up! Deployment is the next post in queue:
[https://trello.com/c/bh4GZ30c/25-deployment](https://trello.com/c/bh4GZ30c/25-deployment)

------
mrkmcknz
This might be somewhat unrelated: is there any reason Stack Overflow decided
against deploying on something like the Open Compute Project?

~~~
dsr_
Very likely because at just 6 racks of commodity (i.e. orderable without a
salescritter) hardware, they don't need that density or scale.

~~~
toomuchtodo
"Don't fix what ain't broken"

~~~
Nick-Craver
This, more than anything.

There's no compelling reason to invest in such a move for our situation. We're
very interested in how Open Compute progresses, though.

