Hacker News new | comments | show | ask | jobs | submit login
StackOverflow Update: 560M Pageviews a Month, 25 Servers (highscalability.com)
479 points by quicksilver03 953 days ago | hide | past | web | 275 comments | favorite

StackOverflow shows just how powerful using a fast language can be. Compare the TechEmpower benchmarks for, say, Java vs Rails on a very simple JSON serialization benchmark (we can assume .Net would achieve comparable performance to the JVM):


The Java servers get around 900K requests/s on beefy hardware while Rails squeezes out 6K. That's a 150x difference! Any real application will be slower than this, and on cloud hardware you can expect to be a order of magnitude slower as well. You just don't have much headroom if you use languages like Ruby before you have to scale out. And once you scale out you have to worry about sys. admin. and all the problems of a distributed system.

It's a one off cost to learn an efficient language, but it pays returns forever.

> StackOverflow shows just how powerful using a fast language can be. Compare the TechEmpower benchmarks [...] (we can assume .Net would achieve comparable performance to the JVM)

Except those TechEmpower benchmarks show .NET is not nearly as fast as Java. I think StackExchange prove that the platform is NOT the most important: it's much more important to make performance a priority in all engineering decisions, to benchmark everything beforehand and develop new technology where there's no good standard solution available. (think Dapper, protobuf-net, StackExchange.Redis)

I'd like to add one caveat with respect to our C# tests: we did not run the tests on Windows in Round 9. With the help of contributors, we've recently revamped our benchmark toolset, and have yet to pull the Windows implementation of the toolset up to date. The C# data in Round 9 is exclusively from Mono on Linux.

To more accurately judge the request routing performance of C# tests on Windows, see Round 8 data. For example, see Round 8 limited to Java, C#, and Ruby tests on i7 hardware [1].

Other important notes on that data:

* The http-listener test is a low-level test implementation.

* The rack-jruby test is running on Torqbox (TorqueBox 4), which is based on Undertow and has very little Ruby code [2]. This test is mostly about Undertow/Torqbox, with a bit of JRuby.

Another interesting test is Fortunes on i7 in Round 8 [3]. Fortunes involves request routing, a database query, ORM translation of multiple rows, dynamic addition of rows, in-memory sorting, server-side templating, XSS countermeasures, and UTF-8 encoding. Here you will see Java frameworks at about 45K rps, ASP.NET on Windows at about 15K rps, and Ruby at about 2.5K rps.

[1] http://www.techempower.com/benchmarks/#section=data-r8&hw=i7...

[2] https://github.com/TechEmpower/FrameworkBenchmarks/blob/mast...

[3] http://www.techempower.com/benchmarks/#section=data-r8&hw=i7...

I agree with your general point, but I believe you need to choose components that have a chance of being fast if you ever want the overall system to be fast.

I didn't notice .Net in the benchmarks -- no .Net languages are listed but .Net is there as a framework. It's still x20 faster than Ruby.

Those test there are much more web server + platform tests, and can be hardly taken as java vs .net vs something

> That's a 150x difference! Any real application will be slower than this

How many applications are rate limited by the speed of the front-end language? Not that many; the speed of the backing store will usually be the bottleneck.

Unless of course, the front-end processing takes so much time because it's complex and/or the language is slow.

One example, Django template processing is slow

    Java servers get around 900K requests/s [...] while Rails squeezes out 6K
Yes, but that's for the /json test, which serves a static object that the framework converts to json.

I'm not sure it is really representative of the overall speed you'll get with whatever framework you'll be using: how often do you have a static page that's best served by your framework instead of leaving nginx serve it directly?

I prefer watching the test results that actually involve the database (/db, /queries, /fortunes and /updates), as it shows less raw speed for serving static things and more overall speed for your dynamic pages.

With /queries for example, Java does ~11.3K, while php (on hhvm) does ~10.7K, Python is at ~7.7K and Dart ~12.8K (For the fastest framework of each language. Rails still does bad though).

The reduced variance of the /queries benchmark suggests to me that the database is the bottleneck in this setup. This would be the first thing you'd attack if you were optimising for speed. For instance checkout the section on StackOverflow's caching -- they have 5 levels of cache, including in-memory caches on the web servers. They are doing a lot of work to take the DB out of the equation.

I prefer the /json test because it gives me a performance ceiling. I know that's the max performance I can expect, and it's up to me to design my system to get as close to that as I can.

On the other hand the db tests don't really tell me much. Does my data and data access patterns match theirs? Probably not. So it is difficult to generalise from their results. If the systems you build are always stateless web servers talking to a relational db then I can see it might be more useful.

This is a good point. It's true that for the high-performance frameworks and platforms, the database is the bottleneck. Or, more accurately—considering the trivial query and small payload—the overhead of the database driver and the wire protocol are significant factors. That said, there remains a fairly broad distribution over the Single-query test and Fortunes tests. For the low and medium-performance frameworks, the overhead of their ORM and other factors are more significant than the database. I find it can be illuminating that in many cases, the ORM code necessary to marshal result-sets into usable objects is more costly than the underlying queries.

Meanwhile, the 20-query test is a bit pathological as it runs into a brick wall with the database wire protocol and efficiency of the database driver. Many otherwise high-performance frameworks and platforms become bottle-necked waiting on those 20 queries per request. But you and I agree, a 20-query-per-request scenario should be the exception and not the rule. When developing a smooth-running web-application, it's common to aim for zero queries per page load. (For those who find this to be crazy talk, note that I'm saying we aim for that ideal, and may not necessarily achieve it.)

I too particularly enjoy knowing the high-water mark set by the JSON and Plaintext tests. When we add the next test type (caching-enabled multiple queries), we should see some interesting results.

Conversely, you might be able to iterate faster with a 'web' language and create a product people really use in a shorter time frame and worry about scaling later!

5 years ago it might have been true that RoR was that much more productive it was worth using, but that doesn't match my experience today using RoR and Scala.

Update: To try to avoid turning this into a language war, there are other good reasons to use something like RoR. Ease of hiring is one. Using known tools is another, if you're not building anything you expect to get load.

Strange, to me Java and ASP.NET (as used by SE) are web languages.

No, they are general purpose languages that can be used for the web.

Languages like ColdFusion and PHP are web languages that don't make much sense outside of the web.

> It's a one off cost to learn an efficient language, but it pays returns forever.

It also imposes costs forever.

Software is never finished.

Java 8 + an IDE like NetBeans (which isn't even the best Java IDE) is actually a very pleasant development experience. I am a total convert from Python in vim to Java in NetBeans+jvi.

Refactoring in Python or Ruby or Javascript or whatever dynamic language is extremely painful compared to doing it in a static one. The "Software is never finished," argument can also work against dynamic languages.

Have you tried IntelliJ?

Yeah, I have. It seemed objectively better, but I started using NetBeans first, and it's what we all use at work. The workflows and keyboard shortcuts are ingrained enough at this point that switching doesn't really seem worth it right now.

Your point is that more efficient languages are slower to develop in? This hasn't been the case for a while, in my experience.

> The cost of inefficient code can be higher than you think. Efficient code stretches hardware further, reduces power usage, makes code easier for programmers to understand.

I'm curious what the reasoning is for "Efficient code ... makes code easier for programmers to understand". To my mind, efficient code (in this case, I assume coding to the hardware, as they mention elsewhere), has many benefits, but making it easier to understand is not one of them. A useful comment by the code may help, but that's not a result of efficient coding, that's a result of good commenting practice.

In complex systems low-level efficiency is far less beneficial than high-level efficiency. To achieve high-level efficiency you need to have a clear understanding of how the system works as a whole. That is impossible is the code is unintelligible bag of tricks.

I've seen this a lot when working in PHP. I wrote some websites from ground-up (using something similar to my own framework). I optimized int multiple times to great results. The biggest benefits didn't come from making a particular function faster, they came from realizations that large chunks of complexity in templating, routing and permissions checking subsystems simply weren't necessary. It doesn't matter how clever those chunks were written, because I got rid of them completely.

I think this is what the author speaks about.

They may just be trying to express the thought that simpler code is both easier to understand and often more efficient. Don't forget Joel Spolsky's contributions to the effort; he's the one who coined the term "architecture astronaut".

Definitely. Some of the most efficient C code I have seen is incomprehensible to anyone but a C expert and even then it takes several minutes to understand all of the nuances of what might only be a 4-line function.

I experienced exactly this when I cracked open the source to a GNU tool to see if I could figure it out.

Sometimes "efficient" and "easy to understand and be sure is correct" don't have to be mutually exclusive; see this example [1] of Java and Go.

Note that the code is autogenerated, so it should be equally efficient. The Go version also happens to be very simple and no different than most humans would write by hand (without trying very hard to optimize).

[1] https://gist.github.com/shurcooL/9f94bbd021b4693cf584

The code samples in your gist aren't doing the same thing. The Java version decodes the UTF-8 stored in the protobuf into Java's native UTF-16 Strings on first access, while Go strings are native UTF-8 and so only a nil check is necessary.

Or are you saying that languages should always use UTF-8 natively? I would agree with you on that, but disagree that this proves your point that "efficient" and "easy to understand and verify correctness" aren't mutually exclusive. pb.getSomeId().charAt(1000) runs in constant time in Java (albeit failing to give correct data if getSomeId() contains codepoints higher than \uFFFF), but pb.GetSomeId()[1000] will give you garbage data in Go if your field contains non-ASCII text. To get a valid codepoint in Go, you'd need to do string([]rune(pb.GetSomeId())[1000]), which runs in linear time and omits the check for valid UTF-8 bytes.

That's a good point, it seems the Java code is checking if the entire string is valid UTF-8 while the Go version isn't. I wonder why the behavior of generated protobufs is different between the two.

I'm not disputing that efficient code can be easier to understand in some instances, just that the assertion that efficient code is easier to understand. Logically, disputing that just requires a single instance of efficient code that is hard to understand. I think that's a trivial enough example to be self evident. In reality, I'm just wondering what they were trying to convey with that statement. I doubt they would have written it without reason, so I'm curious to the reason.

Fair point.

It's possible when they said "efficient", they did not mean "hardware efficient, runs fast, doesn't use a lot of resources" but "developer efficient, faster to write and easier to understand". But that entire phrase as is doesn't make sense, I agree with that.

It's also possible to be a simple mistake. Maybe they went overboard with marketing phrases and claimed something that isn't quite true.

We never said that. The closest I can think of is that simple code (short stack, no DI, layering, etc.) is simpler AND faster at the same time, and in my talk I give specific examples.

To me, that sounds like you are saying simple code is often efficient (or that efficient code can by simple), not that efficient code is simpler (which is how I interpret the original text).

To be very clear, I don't doubt the author (you?) had a good point to make, just that it was unclear what that was from the way it was presented, and to such a degree that it took me out of the flow of reading the post, and perhaps a less ambiguous way of expressing that concept (or omitting it, if it's expressed succinctly elsewhere) would be clearer and more effective.

Sorry, I'm not the author, I'm the guy at Stack Overflow that's being cited by the article (Marco C.).

All, true, which is why I outlined my assumptions and left it open for others to help try to figure out. :)

Unfortunately, if it is just poor communication on their part, we'll likely get no answers that can be considered more likely than others without original author or someone related (work-wise) piping up.

The generated Java code is just bad. You can simplify it by adding imports and removing the redundant variable: https://gist.github.com/electrum/d2f7668f2a7eb95420a9

Of course, as others have mentioned, it is validating the UTF-8 but the Go version is not.

It's not clear to me what argument the example is supporting.

The Go code is very concise and the Java code is very verbose.

The Go code does look more efficient but what is ImpressionData? Where is it checking the string is valid utf8?

Why would you send a string which is not valid utf8 over the wire in your system? That's the kind of validation that should probably be done prior to that. Or if it hasn't, you can do the validation manually on the receiver. There's no reason to incur the cost on each transmission.

    users=db.get("SELECT name FROM users WHERE group='admin'")
is more efficient and understandable then:

    tmp=db.get("SELECT name,group FROM users")
    for (i=0;i<tmp.count();i++)
      if (tmp[i]['group']=='admin') users[]=tmp[i][name];

But who in their right mind would do the second one?

Obviously you can make code both less efficient and less readable. But starting with code that a competent programmer has written, I find it seldom makes it more readable when you make it more efficient.

  But who in their right mind would do the second one?
Oh abritishguy, I can assure you that the world is awash in such magical code snippets.

> competent programmer

It's amazing the sorts of things otherwise sharp programmers will write when they're tired, hurried, and devoting cycles to more interesting/complicated parts of an implementation.

> true scotsman

>>But who in their right mind would do the second one?

Surely you jest. I don't know who, but their handiwork is everywhere.

People using ORM can end up writing the second one without it being obvious.

i tend to agree. take dijkstra's algorith implemented with a fibonacci heap or something very simple like

for i = 1 to 100 if x = this do that and something with i else something else with i

easy to read, less efficient than: if x = this for i = 1 to 100 do that and something with i else for i = 1 to 100 do something else with i

from my experience, compilers usually dont hoist very well

edit: sorry, spaces didnt come out well

This is where a good abstraction layer can pay dividends. Of course, a bad abstraction layer (as evidenced by Spolsky's "leaky abstraction" post (http://www.joelonsoftware.com/articles/LeakyAbstractions.htm...) on the opposite hand can add immeasurable cost.

Not to mention, you'd never write code like the second one in a .NET ecosystem, unless you were purposefully trying to write bad, unmaintainable code.

Really hate the separate sites thing. I use a half dozen of them and they are computer related, so it is a pain in the ass always having to register and not being allowed to comment/answer at first, etc.. They have a few improvements now like importing your profile from other sites, but they shouldn't even have so many separate sites in the first place, just a tag or a category or something.

There is a deliberate reason the sites are split. Spolsky talks about in depth here: http://vimeo.com/37309773 Since it's largely sociological, it's also the reason my money is on SE as opposed to Quora in this space.

Thanks - that's a great watch.

Essentially they're trying to keep SE from becoming a "chat" community, and keep it focused on being a Q&A community.

With Yahoo Answers, Usenets, even Subreddits, you have people having conversations within posts, which generates noise for people who were looking for answers to questions.

SE has chat forums now, integrated with the main Q&A forums.

You get +100 rep bonus for linking your accounts, so you shouldn't have a problem being allowed to comment.

You do need to have more than 200 rep on one site though.

Which one? I'm a full member on all the StackExchange sites I care about for now with the +100 cross-membership bonus.

In order to get the +100 bonus on all sites, you must have one profile on the network with >200 reputation.

How? I can't seem to find this option in account admin panels.

It happens automatically when you use the same account on multiple SE sites.

It happens automatically, if I remember correctly.

I agree. Diffuses the quality of the related sites. Means you have to deal with twice as many moderators. I dont see why they couldnt have done a "subreddit' approach where you subscribe to the SE category you want and it shows on your homepage.

Hum, each subreddit has its own moderators as well. They also have a front page with questions from the different sites[1], though it's not filtered by the ones you have an account on.

[1] http://stackexchange.com/

But you don't need to create new accounts with different passwords, and login/logout separately whenever you navigate between subreddits.

You don't need to create accounts with different passwords on SE sites either. In fact, SE has pushed for single sign-on like probably no other large site did.

But you still have to go to a different site to have any hope of getting the chance to ask your question.

  It'd be like going to Home Depot and needing information that is related to building supplies but the employees tell you you must go to a different store down the street to even ask your question.

On Reddit you have to post separately on different subreddits, there's no way to simply "post to Reddit" either.

I don't agree with your analogy; I think it's more akin to going to Home Depot to ask for vegetables. I think it makes perfect sense to have a separation between Cooking.SE, Photography.SE, Christianity.SE, etc.

There are a few edge cases, primarily in the tech sites, but I don't think those disprove the model.

Yeah Im speaking primarily about the technical SE's sites. I agree with the completely isolated SE (Photography, Puzzles, religion).

But when you get the related topics servers, programming, security, tools, web development, web apps... All these items are so closely related it's often disadvantageous to try and ask a vertically silo'd question.

I get they are trying to create a detail 'manual' for all questions, but you get to a point where it's just better to read the manual.

"One problem is not many tests. Tests aren’t needed because there’s a great community... If users find any problems with it they report the bugs that they’ve found."

I'm often surprised at the paucity of test-coverage in relatively large companies.


I think we're going trough a thesis/anti-thesis cycle on tests - in the beginning, there was militant testing, 100% coverage, testing getters and setters etc (as well as more complex stuff, obviously). Then some people started coming around to the idea that there are actually large swathes of code that is simple enough that testing doesn't actually add much value especially compared to the effort of writing them, then that probably got a bit out of hand (to what you're referring to). Maybe the pendulum will swing back and we'll find a good heuristic for just how much testing is the right amount that isn't all or nothing?

It really also does depend on the language, or rather compiler. Specifically, tests are far more useful when static analysis doesn't exist.

Actually, in the beginning there was no testing and even justifying having automated test infrastructure or spending developer resources on unit tests was a hard sell to management (what are they paying that QA dept. for after all).

That bullet point surprised me. It comes right after "110K lines of code. A small number given what it does"; to me this reads like they didn't write tests because that would add complexity.

Can any SO devs give us more details on "not many tests"? Or how many bug reports get filed vs rate of change of software?

The things that obviously should have tests have tests. That means most of the things that touch money on our Careers product, and easily unit-testable features on the Core end (things with known inputs, e.g. flagging, our new top bar, etc), for most other things we just do a functionality test by hand and push it to our incubating site (formerly meta.stackoverflow, now meta.stackexchange).

You can look at reported bugs here: http://meta.stackexchange.com/questions/tagged/bug -- it's pretty frequent, at least one every couple of hours or so. The last commit in our "Tests" project was 14 days ago, where as the last commit in the code base was 4 minutes ago.

You should come down our way and do a tech talk about it.

If only for the wailing, gnashing of teeth and rending of garments.

Your people would literally murder me. Are you back from west coast / toronto?

I am, yeah. We do tech talks every Tuesday. I can promise free food that's not as good as yours.

Indeed, they seem to have achieved what many have said is literally not possible.

Also, some might remember when they had Uncle Bob on the podcast, the topic being unit testing, and iirc Bob preaching the gospel (~ you must write plentiful unit tests, or else) and Jeff and Joel (especially) more or less saying they don't quite get why.

(Am I remembering this story correctly? I swear that's how I remember it.)

It's been my experience that outside-in TDD seems like impossibly strict nonsense until you do it with someone who is good at it.

Then after a while it seems like an obvious way to work.

> With their SQL Servers loaded with 384 GB of RAM and 2TB of SSD, AWS would cost a fortune.

I have next to zero experience with server administration, but 384GB seems like a lot to me. Is that common for production servers for popular web services? Do you need a customized OS to address that much memory? Seems like you'd really need to beef up the cache hierarchy make 0.38TB of RAM fast.

There are a few erroneous replies to your comment, re: Windows support for that much memory.

Memory support for versions of Windows:

                           	  User-mode process	   Kernel-mode
                           	    address space  	  address space
                           	  (process bit-ness)
                           	32-bit        64-bit	        all
  Server Operating Systems:	                    	
    Windows Server 2008 R2 	  4 GB*      2048 GB	    2048 GB
    Windows Server 2012    	  4 GB*      4096 GB	    4096 GB
    Windows Server 2012 R2 	  4 GB*    131072 GB	  131072 GB
  Client Operating Systems:	                    	
    Windows 7 (x86)        	  3 GB*          n/a	       2 GB
    Windows 7 (64-bit)     	  4 GB*       192 GB	     192 GB
    Windows 8 (64-bit)     	  4 GB*       512 GB	     512 GB
    Windows 8.1 (64-bit)   	  4 GB*    131072 GB	  131072 GB
  * requires IMAGE_FILE_LARGE_ADDRESS_AWARE flag, otherwise 2GB
  ** All units use binary exponents (GB = 2^30 bytes)

process address space != physical memory

Linux (SuSE enterprise something, possibly with a custom kernel, I’m just a user, not admin) runs just fine on 5.2TB machine, though I can easily imagine you’d have problems if you tried it with, say, DOS :)

The system obviously behaves a bit different from a standard desktop machines, e.g. different areas of RAM are differently fast, depending on the core on which your current process runs, you need to disable individual CPU lines displayed in top etc., but apart from these, mostly everything seems “normal” to me.

> e.g. different areas of RAM are differently fast, depending on the core on which your current process runs, you need to disable individual CPU lines displayed in top etc.

this is NUMA, in case anyone 'new to sysadmin' or to architecture is trying to Google this

I don't believe that much RAM is uncommon for large scale database servers. 384GB RAM is only about $5000 from Dell. They also have a new server model coming out that supports up to 6TB of RAM [0].

[0] http://www.dell.com/us/business/p/poweredge-r920/pd

Most people over estimate hardware cost. Buying a 300-500GB ram server maybe very expensive. But buying a distributed data base expert likely costs at least an order of magnitude higher per year.

Same server in AWS, it'll cost $50k+/month for the same thing. SO just bought it one time fee for probably around $30k or so, I'll say they are doing something right!

The don't have servers with that much RAM, but a 244GB RAM server with 8x800GB SSDs and 32 cores costs <$5k a month ad-hoc and about $1300/month with a 3 year commitment.

$1300/month w/ a 3 year commitment is $46.8k over the lifespan. Which is more then 4x more then the actual hardware cost. Your ROI on a physical server with the same stats at that price tag would be 8 months.

Not saying its totally worth it, but the upside is that they take care of power costs, physical space costs, networking, and if the hardware fails they'll put your application on a new server. Basically you take all of the risk out the equation.

Sure, but add power and cooling for these things plus physical space? (I remember when these big db's needed the floors in the data centre reinforcing since they were a > metric ton )

Still, I agree re. the difference in price. I'd buy too

Worth noting: AWS also doesn't have Enterprise SQL available, so plan on buying that yourself. That's a not-so-insignificant piece of the overall cost to many people.

How common is it to buy one extremely powerful database server? Wouldn't you need the high availability properties of something like a Galera cluster? Or are most people accepting the single point of failure at huge database server?

(I honestly don't know, it's just surprising.)

While you do want HA you also want to keep things simple. You get great HA with just one backup machine. The times you lose two machines are so few that you might crash the site with software failures far more often anyway.

Databases have inherent locking problems that takes more resources to resolve with more machines, more so in some database systems than others. When scaling you often hit a point where you get down to macro logistics, so imagine a car highway: You can get more throughput by adding more lanes. But what that all cars are going to merge to one lane at some point in the middle of the trip (because of a tunnel, bridge or something: it's impossible to have more than one lane at that point)? Now you won't get any throughput benefits of the multiple lanes after all! Just more latency because you get queues up until the single lane and because of the queues all cars are going slow and need to accelerate on the single lane, so the average speed is low too. You are also having too much resources after the single lane because you can never fill all those lanes. It might be better to have one beefed-up single-lane road all the way that people can go fast on. Basically: Remove locks and you get better overall performance.

Yes, this is on the expense of HA. Yes, the costs of scaling up grows asymptotically faster than the costs of scaling out. So this is definitely a trade-off in some sense.

They do have a cluster: http://www.brentozar.com/archive/2012/09/microsoft-sql-serve...

They just run everything through one live database all the time under normal conditions. Personally, I think this is much simpler to manage than to be regularly spreading out queries over many queries.

Windows Server handles that much RAM fine, and, so long as it fits, SQL Server will end up caching the entire database in RAM.

If I remember correctly they're using Dell R720's for their SQL Server machines. At current Dell retail pricing 384GB of RAM would be about $5,900.00.

At that price I'd vote for throwing money at raw hardware and reaping the benefits of caching the database in RAM, versus trying to pay programmers to come up with clever techniques to deal with having less RAM.

It's not a server, but I've seen machines used in physics simulations have something on the order of 256 GB RAM in them. As far as I know they cost maybe $20k-$30k, so not absurdly expensive.

I don't think addressing that memory is a problem on any x64 architecture. Windows Server 2012 has a 4TB upper limit on RAM, apparently [1], and as far as I can tell there's no reason for this other than product differentiation between different "levels" of the operating system.

1. http://msdn.microsoft.com/en-us/library/windows/desktop/aa36...

$4k is all you need to build a whitebox 256GB machine: dual-socket G34 motherboard with 16 DDR3 slots lke the ASUS KGPE-D16 ($500) + two Opteron 6320 (2 * $300) + sixteen 16GB registered ECC DDR3-1600 (16 * $160) + chassis/PSU ($500) = $4160.

Doing this on Amazon r3.8xlarge ($2.8/hr) would cost you $8100+ over 4 months.

A DIY server can make sense in some cases. Eg. you are a very early-stage startup with almost no funding, and you are good at assembling and troubleshooting computers, and are ok with mere warranty on parts as opposed to full vendor support from HP/Oracle/Dell, go for it. That's how many startups started (see Google and their makeshift half-donated half-loaned boxes in 1998).

Don't forget to factor in the costs of running, cooling, and maintaining that much hardware (also bandwidth costs). I'm not saying it's $2k/month, I'm just saying don't compare the physical box and AWS without factoring everything in. At the end of the day, if you don't need the support, you're probably right that a dedicated box makes sense there.

AWS has among the highest bandwidth costs you're going to run into, if we're talking co-location, dedicated, or self-hosting. Their bandwidth costs are the sole reason I won't go anywhere near AWS yet, I consider their prices outrageously high.

10tb = roughly $1,000 per month

They're between 5 and 50 times more expensive on bandwidth than the options in dedicated / colo / self hosting.

You can negotiate with AWS.

I don't believe that startups in their garage days have anywhere near the necessary leverage to negotiate anything with Amazon...

Agree, not many people actually realize just how slow and expensive AWS actually is. Kudos to SO for going with their own hardware.

they negotiate, still. I know several startups which went to amazon, said, "hey, your competitor is cheaper, what is your value prop" and suddenly had 50% discounts, etc. It's amazing how much you get simply asking!

Even before OS'es supported that much people were sticking that much RAM on expansion bus cards and plugging them in and treating them like fast disks. So it is going to happen either way. The performance boost for the DB is just too much compared to it thrashing disks constantly and there are quite a few benefits to having fewer machines compared to many DB servers as well.

384 GB is fairly easy to achieve today with 16 GB DIMMs x 24 slots (many general purpose servers have up to 24 DIMM slots). Beyond that you need to jump to 24 GB DIMMs which are fairly expensive.

SQL Server only reads data from memory. When something is requested it is first looked for in RAM (or virtual) and if not found is copied from disk to RAM.

The more RAM you have the better without exception here.

I believe the idea is to hold as much of the DB in memory as possible, for speed, so huge RAM totals on DB servers are not unusual. 384 GB should be well within modern OS reach. Once you're computing at 64 bits, the next theoretical limit on addresses would be somewhere around 16 million terabytes.

edit: wrong number

Not that much, we have 128 gb ram on our hardware. That is pretty standard for upper midle class rackmounted servers.

64 bit computers has a memmory limit so large that it is not understandeble.

The ram is still lightning fast compared to other stuff.

384 gigs is pretty common these days. With Windows, you need at least the server edition to address that much, but if you're running a server, you're not gonna be using the desktop versions of the OS anyways. Pretty much everything else, it's just plug and play. The RAM is quite a bit if you're doing a typical cloud deployment, but SO seems to be more in the model of making sure all your eggs are in a really strong basket. Besides, at the end of the day, RAM tends to be a lot cheaper than lag.

Not true. Windows 8.1 64 bit would be happy with that much.

*everything else is plug and pay

The simplest rule of computing is, if you have a problem that can be solved economically by just throwing more RAM into a single machine at it, you should do it (instead of any other dimension, and assuming you don't have other constraints).

No, that's no common for production servers for popular web services, though- they tend to shard across other dimensions because the web services are often CPU or network-bound.

You can get servers with 384GB of RAM now from a vendor like Dell, without needing to call them for just under $14k.

Their SQL Server is Microsoft SQL Server running on (shock horror) Windows.

Says in the article:

Windows 2012 is used in New York but are upgrading to 2012 R2 (Oregon is already on it).

To be clear, most servers are moving to 2012 R2 (most already have) but the machines our SQL clusters run on, specifically, will not.

This is because you have to effectively (or literally) rebuild the Windows cluster from scratch and we just don't get that level of benefit from the 2012 to 2012 R2 upgrade. There are quite a few improvements we care about: native NVMe, better dynamic quorum, better DSC support, better SMB, and such...but not enough to make the upgrade worth it.

Nick Craver - Stack Exchange Sysadmin & Developer

Do I get this right? You can not upgrade a server cluster to the next version of the operating system without complete new installation??? Could you please write something about these kind of things, this is very interesting!

I'm not sure a full writeup would have much more detail, but yes. With Windows clustering (on top of which SQL 2012/2014 Availability Groups are based), you can only have a homogenous OS version across the board. Since they all must have the same version, you can't have something like a single 2012 R2 instance in an otherwise 2012 cluster...and since you can't do that, you can't upgrade one at a time. Yay! New cluster time instead!

Trust me, we're bitching about this as are most people and I think changes must be coming there. It doesn't matter how many fancy features you add to the OS if we can't upgrade to it, so they'll have to stop and address that problem.

I've recently started using their micro-ORM Dapper, and I like it a lot. I get the performance of hand-coded SQL, but without the tedious mapping from SqlDataReader to my entity.

I've been using Dapper for around three years, "it just works" and is very fast. I've been around the block a few times with ORM's - LINQ to SQL, NHibernate, Entity Framework, Massive, SubSonic. For small to medium size projects there's no contest, provided you don't mind hand crafting your SQL. Hat's off to Marc, Sam and team for building this.

Dapper is great - I promptly dumped NHibernate and Entity Framework for it. It's the only ORM I've found that makes it trivially easy (as it should be) to use stored procedures and views in queries.

+1 for Dapper - I'm a big fan. It is very fast and we use it to serve our live (mostly read-only) traffic, while using something like Entity Framework for the backend Administrative UI, which has a lot of inserts and updates.

There's a couple of user-contrib projects that do similar things for inserts & updates. Dapper Extensions is the one I'm using.


I wrote a Dapper extension for working with SQL Server's geospatial queries and types a couple of years ago - have they added anything like that yet? Otherwise I'd be happy to add it.

Marc Gravell added in the ability to put in pretty much any custom type without adding any dependency weight about a month ago. You can see the commit here: https://github.com/StackExchange/dapper-dot-net/commit/e26ee...

Look towards the end at the tests for example of how to hook up a custom type (it's pretty simple).

Cool - the one I wrote was years ago. Looks like it's not necessary any more.

Another library worth mentioning in this space is Massive, which uses .NET dynamics a lot (focusing slightly less on performance than ease of use). https://github.com/robconery/massive

Dapper is nice, but if you want to have the compiler check your queries, and still run fast, I recommend LINQ to DB: https://github.com/linq2db/linq2db

(not to be confused with LINQ to SQL from Microsoft)

To be fair, the compiler is checking against what you told it the database looks like. This can help with lots of things, but still has the fundamental disconnect problem you'll get at runtime with either approach.

A huge problem with the tradeoff we had (well, still have in some areas) is the generated SQL is nasty, and finding the original code it came from is often non-trivial. Lack of ability to hint queries, control parameterization, etc. is also a big issue when trying to optimize queries. For example we (and by "we" I mean, "I made Marc Gravell do it") added literal replacement to Dapper to help with query parameterization which allows you to use things like filtered indexes. In dapper we also intercept the SQL calls to dapper and add add exactly where it came from. Here's what that looks like, replicate for the other methods:

    public static IEnumerable<T> Query<T>(this DataContext db, SqlBuilder.Template template, bool buffered = true, int? commandTimeout = null, IDbTransaction transaction = null, [CallerFilePath]string fromFile = null, [CallerLineNumber]int onLine = 0, string comment = null)
        using (db.Connection.EnsureOpen())
            return SqlMapper.Query<T>(db.Connection, MarkSqlString(template.RawSql, fromFile, onLine, comment), template.Parameters as object, transaction ?? db.Transaction, buffered, commandTimeout);
And here's the our marking method tailored for our code, but you get the idea:

    private static string MarkSqlString(string sql, string path, int lineNumber, string comment)
        if (path.IsNullOrEmpty() || lineNumber == 0)
            return sql;

        var commentWrap = " ";
        var i = sql.IndexOf(Environment.NewLine);

        // if we didn't find \n, or it was the very end, go to the first space method
        if (i < 0 || i == sql.Length - 1)
            i = sql.IndexOf(' ');
            commentWrap = Environment.NewLine;

        if (i < 0) return sql;

        // Grab one directory and the file name worth of the path
        //   this dodges problems with the build server using temp dirs
        //   but also gives us enough info to uniquely identify a queries location
        var split = path.LastIndexOf('\\') - 1;
        if (split < 0) return sql;
        split = path.LastIndexOf('\\', split);

        if (split < 0) return sql;

        split++; // just for Craver

        var sqlComment = " /* " + path.Substring(split) + "@" + lineNumber + (comment.HasValue() ? " - " + comment : "") + " */" + commentWrap;

        return sql.Substring(0, i) +
               sqlComment +
This results in a comment at the top of the query like this:

    /* Models\Post.LinkedQuestions.cs@105 */ 
    select top (@top) 
         , p.Title
         , p.Score
It's such a simple approach using the caller member attributes, but it saves so much time tracking things down.

I'm curious if the comments above exist in the source code, or have been added here for the benefit of readers on HN?

That's a straight copy/paste from the Stack Overflow codebase (Helpers\SqlMapper.cs:332 at time of writing, if you care). Now the typos and snark are out there for everyone to see.

Another awesome one which I've used on a previous project is SqlFu, https://github.com/sapiens/SqlFu

So this is what, a 2000 request/sec peak? Over 11 servers, that's like 200 requests/sec peak per frontend?

The problem with scale-up is if you actually have to get a few times larger, it becomes super expensive. But fortunately hardware is increasing so much that you can probably just get away with it now. There's probably a crossover point we're rapidly approaching where even global-scale sites can just do all their transactions in RAM and keep it there (replicated). I know that's what VoltDB is counting on.

Peak is more like 2600-3000 requests/sec on most weekdays. Remember that programming, being a profession, means our weekdays are significantly busier than weekends (as you can see here: https://www.quantcast.com/p-c1rF4kxgLUzNc).

It's almost all over 9 servers, because 10 and 11 are only for meta.stackexchange.com, meta.stackoverflow.com, and the development tier. Those servers also run around 10-20% CPU which means we have quite a bit of headroom available. Here's a screenshot of our dashboard taken just now: http://i.stack.imgur.com/HPdtl.png We can currently handle the full load of all sites (including Stack Overflow) on 2 servers...not 1 though, that ends badly with thread exhaustion.

We could add web servers pretty cheaply; these servers are approaching 4 years old and weren't even close to top-of-the-line back them. Even current generation replacements would be several times more powerful, if we needed to go that route.

Honestly the only scale-up problem we have is SSD space on the SQL boxes due to the growth pattern of reliability vs. space in the non-consumer space. By that I mean drives that have capacitors for power loss and such. I actually just wrote a lengthy email about what we're planning for storage on one of our SQL clusters...perhaps I should echo it verbatim as a blog post? I'm not sure how many people care about that sort of stuff outside our teams.

Nick Craver - Stack Exchange Sysadmin & Developer

>I actually just wrote a lengthy email about what we're planning for storage on one of our SQL clusters...perhaps I should echo it verbatim as a blog post? I'm not sure how many people care about that sort of stuff outside our teams.

I'm sure some DBAs and devs here would find it interesting.

This is definitely the case. Every writeup from those in the trenches I read and share with coworkers.

  I actually just wrote a lengthy email about what we're planning for storage on one of our SQL clusters...perhaps I should echo it verbatim as a blog post? I'm not sure how many people care about that sort of stuff outside our teams.
Yes! please yes. The occasional real world gem like that is what makes scrolling past a thousand javascript-framework/obscure-language/alternative-datastore posts here worth it (for me).

I'm most impressed that all your traffic is passing through a single active HAproxy which is also terminating SSL?

TLS Session IDs and tickets of course are absolutely essential, but I'd be curious how many peak TPS (number of full handshakes / sec) you are seeing on HAproxy.

The alternative, fanning out your SSL termination to your IIS endpoints, unfortunately means running HAproxy at L2, so you lose all your nice logging.

I'm not sure how many full handshakes/sec we're running - the logging isn't as readily available there because we're explicitly not logging the listeners involved on separate proc binds. We are logging the traffic on the port 80 front end they feed. We have tuned the performance there via best practices and taking into account very recent HAProxy changes for various SSL windows added by Willy, Emeric and help from Ilya Grigorik. They're all awesome and tremendously helpful. We are on a version slightly newer than 1.5.2 (we build straight from the repo).

Even with all that SSL we're only running around 15% CPU at peak so it's not having any trouble. Most of that CPU usage does come from the SSL termination though - it ran around 3-5% CPU before. We're also working on much larger infrastructure changes that mean SSL termination much more local to our users, which means the HAProxy load will drop to just about nothing again. I'm working on a followup to my previous SSL post: http://nickcraver.com/blog/2013/04/23/stackoverflow-com-the-... that will explain why it's not already on for all hits...I think hacker news will have fun with hat one.

All that being said, there's no reason we can't forward that syslog traffic from the listeners to our logging infrastructure to get at least a counter out of it. If you're curious how we log, I'll explain a bit.

We feed it into a logging display called Realog built by Kyle Brandt and Matt Jibson in Go. Here are a few snapshots of the dashboard: http://i.stack.imgur.com/OIqhm.png http://i.stack.imgur.com/iQOP9.png http://i.stack.imgur.com/bfpUb.png http://i.stack.imgur.com/JZNy6.png

This lets us forward on custom headers to logging that the web servers are sending back. It tells us how much time we spent in SQL, redis, etc. We can graph it (bottom of the first link) and identify the root cause of any issues faster. It also handles parsing of that traffic from the syslog entry already, so we use it as a single logging point from HAProxy and it handles the JSON generation to forward that traffic data into logstash (a 300 TB cluster we're working on setting up right now).

As soon as we get the logstash piece working well and dogfooded thoroughly, I'll poke Kyle to open source Realog so hopefully some others can get use from it.

Nick Craver - Stack Exchange Sysadmin & Developer

My team would definitely be interested to read that. Please do!

yes please

Yes, the idea is to scale up as much as economically makes sense and only then scale out, but due to Moore's law we're still scaling up.

Not everyone has google-like problems that are betters solved by a battery of cheap boxes.

Actually most of the websites today are better solved by a battery of cheap boxes.

You simply put a load balancer (Nginx/ELB/HAProxy etc) in front of a fleet of smaller web/application servers that dynamically scale depending on traffic. That way it is cost effective, far more reliable, easier to scale and you can tolerate DC outages better.

As with most systems, the data store is the hard thing to scale out. That requires intelligence at the application level, and that might include CAP issues.

So for us, the SQL data store is the real “up” part of the equation. We have a fair amount of headroom there, so if we can keep sharding (and other data strategies) out of the codebase, so much the better.

(Load-balancing HTTP requests “out” is not a big deal and we are doing that.)

Marco/Stack Overflow here.

Simply put, this is not true at all. Most of the websites today run happily on a single box and the choice of architecture is frankly irrelevant because of the abysmal amount of traffic they get.

Large, and by large I mean top 100 sites, are large and typically complex systems. Each one is different and bespoke. Making broad generalizations doesn't really work, because typically they have parts that need to be scaled up, parts that need to be scaled out, etc.

Finally, if you read the article you would know that we already use a load balancer with multiple web servers. What we are discussing here is whether it's better to have 100s of cheap/cloud boxes versus a few big ones.

In our use case (we want fast response times) the second solution is demonstrably better.

Not quite sure how you derive a peak figure from a monthly average; but on average it's more like 560000000/(30 * 24 * 3600) = 216 req/s

I took 4 hours at 20 days to get the peak, figuring it probably won't be worse than that. The average over the month isn't a number you'll see in practice. Most perf engineering work seems to go to keep things working nicely while at peak. (And making sure you have good 95th percentile times.)

Makes me wonder why smaller website teams need dozens of engineers to keep their infinitesimally smaller app running.

From my limited experience, it's a lack of the kind of systems thinking present in companies like Stack Exchange, and present on the High Scalability blog, that results in a poor architecture.

And this is compounded by people who have little ability to troubleshoot performance issues. It's quite easy to hunt down the cruddy SQL queries in a DB, or realize that you spinning rust is too slow. But when it starts to come down to things like a blocking network fabric, which has some big fat buffer between two servers, that is killing your transaction speed - many will just start to blame the devs.

Of course, the second point is compounded by the first - the less systems thinking that went into the design in the first place, the harder it is to produce accurate hypotheses about the system in order to troubleshoot.

Whilst there is a lot to be said for the "right tool for the job", if you watch an artisan crafting something, you'll realize that despite having a huge number of tools, they actually get by with relatively few. This is a generalization, but they only use the full range when doing something new, solving a particularly tricky problem, etc. "All the gear and no idea" is certainly applicable in many start ups.

I personally think it's due to overly complex applications and systems with many moving (and breaking) parts.

Also StackOverflow often practices "scale-up" instead of "scale-out".

More specifically, I think it's due to long feature lists and multiple problem domains. I'm also thinking about in-house line-of-business software.

Also due to political managers that want to increase their head-count to give the impression they have more power.

There are hundreds of reasons at nearly every level of a company that can contribute to this situation. Its best summed up by Sturgeons Law, "90% of everything is crap."

Rigid adherence to certain technologies could be a culprit. "We're a Ruby shop!" is fine for problems where Ruby is great, but the limitations of the technology are bound to creep up somewhere. Not using a polyglot approach would be bad.

It can also be caused by too much embrace of polyglot tools. "We use the best tools for the job" can easily mean an architecture with Redis, Memcached, Mongo, MySQL, Angular, JQuery, Node, and several dozen third-party libraries, all glued together with Thrift or JSON-RPC. If all you're doing is a messaging app, you might be much better off with a single in-process Java app.

Usually you want your solution to be just right for the problem; solve the essential complexity but don't introduce any accidental complexity.

I wish more people considered this alternative when recommending the usage of queues etc. I always remember of this post: http://williamedwardscoder.tumblr.com/post/18065079081/cogs-...

Hacker News also runs on a single box using only in-language hashtables and lists. ViaWeb did too, and PlentyOfFish.

Guess they don't have Jeff Atwood and Joel Spolsky onboard.

As of about 2 years ago, Stack Exchange doesn't have Jeff onboard either.


Joel is very much still around and active. Jeff went off to build Discourse [0] although I think he still has a place at the co-founder table as invisible-benevolent-dictator.

[0]: http://www.discourse.org/

Probably because the engineers are a lot worse.

Extremely interesting read, especially because it goes against many fashionable engineering practices. They see (correctly) that abstractions, DI have their own compromises. Also, they're not religious (about technology), that's also quite rare.

"SO goes to great lengths to reduce garbage collection costs, skipping practices like TDD, avoiding layers of abstraction, and using static methods."

I don't understand this at all. What does TDD have to do with reducing garbage collection?

My guess is that they feel that the layers of indirection and abstraction often needed to make TDD work result in an object creation pattern that results in heavy GC load during normal operation. The references to "using static methods" is probably related to this.

ps. That's my guess, but I'd encourage you to post your question to the meta site for SO.

Marco from Stack Overflow here. We try to measure everything, including GC operation. We know that layers of indirection increase GC pressure to the point of noticeable slowness :-)

It's not a theoretical risk we're trying to avoid here, it's a real problem that needed solving.

IME that always happens when you try to perform tdd in combination with Javaesque encapsulation. The good solution to the problem is to not be so afraid of classes seeing each others internals. The bad solution is to add, factory patterns, dependency injectors and other useless layers just to try and keep your design both well encapsulated and testable.

You do realize that most efficient DI framework only inject the dependency once.

Separating Controller, Repository, and Services are good practices as well and let's be honest, we're looking at 3 methods layer at most.

Here's what happened in Java:

1. When you deploy your WAR, the DI will inject necessary component _once_ (and these components only instantiated _once_ for the whole web-app so there you go, Singleton without hardcoding).

2. A request comes in, gets processed by the pumped-up Servlet (already DI-ed). If the servlet has not been instantiated, it will be instantiated _once_ and the instance of the Servlet is kept in memory.

3. Another request comes in, gets processed by the same pumped-up Servlet that is already instantiated and has already been injected with the component (no more injection, no more instantiation, no more Object to create...)

So I've got to ask this question: what GC problem we're trying to solve here?

Some of the static methods are understandable but if Component A requires Component B and both of them have already been instantiated _once_ and have been properly wired up together, we have 2 Objects for the lifetime of the application.

I'd pay for a wee bit extra hardware for the cost of maintainable code ;)

Discipline (and knowledge of object graph) definitely help to reach to that point.

You probably meant to respond Marco's comment? And afaik Stackoverflow is written in ASP.NET, not Java.

C# or Java, the whole request pipeline processing should be more or less the same. Unless one platform does things less efficient than the other.

Rails and Django do things differently as to my knowledge they do it by spawning processes instead of threads. There are app-server for Rails or Django that may use threads for efficiency/performance reason but I am under the impression the whole LAMP stack is still 1 request 1 process (even though they re-use those processes from a pool of already allocated processes).

> The bad solution is to add, factory patterns, dependency injectors and other useless layers just to try and keep your design both well encapsulated and testable.

Not doing this will cause much self-righteous snickering from some.

Insofar as this is your guess, it's down to the language. In Ruby or Javascript it's very easy to TDD. In Java, much less so. (I'm not sure where C# lies on the spectrum).

Somewhere in the middle, but leans towards Java.

Sounds about right.

At the risk of exposing my slow transmutation into a hipster programmer, I and a colleague found that mocking in Go was much easier than we anticipated, thanks to the way interfaces work.

What I'd like to see is how much cache hit they add on each level of cache. I remember some presentation of the facebook images stack where less than 10% (if I remember well) of the requests actually hit the disks; it would be interesting to see the patterns for the whole SE galaxy.

As someone mentioned above, SQL Server hits memory first, and as they have 384GB of memory, most of the requests would sit there. That's just on the DB server. In the linked article it all authenticated requests hit the DB, with anon users getting a cached copy.

It totally depends on what we are caching (e.g. the community bullettins or hot meta questions get a gazillion hits...)

Relatively new stack dev here. I came in on the other side of the fence of a lot of these technologies (bread and butter is Python in Flask with Mongo on Heroku on a mac) but since I started here, I've been constantly and pleasantly surprised by how performant everything here has been despite my biases. It's mighty fun.

Stack Overflow is the example I'm forever using when arguing against premature scale-out. For non-trivial applications scale-out has substantial complexity costs attached to it, and the overwhelming majority of applications will never truly need it.

It's frustrating to see time wasted obsessing over trying to maintain eventual consistency (or chasing bugs when you failed to do so) on systems that could quite happily run on a single, not that beefy machine.

>For non-trivial applications scale-out has substantial complexity costs attached to it

Forgie me if I am misunderstanding you - but non-trivial applications can actually require scale out.

From my perspective, StackExchange is not techinically that complex. They have built a very efficient, cost-effective and performant stack for their singular application and that works very well for them, but the complexity of their forum is not an extraordinarily complex problem.

By nontrivial I mean 'application which, in your document database, requires multi-document updates to perform some individual logical operations'. This is a rather low bar :-).

The fact is, the vast majority of projects that programmers are working on are less computationally complex than stack overflow. That's not to say that forum software is all that complex, more that most problems are pretty simple. Of course there are real reasons to use scale out - I simply advocate thinking hard about whether your problem will ever truly need it before taking the substantial complexity hit of coding for it.

Er, I don't seem to be able to edit, but I guess I should specify that I'm really talking about scale-out of writes here. Read scalability is an obviously easier problem..

> AWS would cost a fortune.

Isn't that comparing apples to oranges? Taking any application built for large servers and putting the same application onto a cloud-based architecture will be more expensive.

Cloud-based architecture requires ground up differences in how the application is built. Now whether or not it is better to use a cloud-based approach or traditional bare metal is highly subjective and isn't my point.

>'Isn't that comparing apples to oranges?'

On another site, in another context, it probably would be, but here it's really presented as a contrast of scale-up versus scale-out - something the regular audience of highscalability will certainly grok.

In context...

'Stack Overflow still uses a scale-up strategy. No clouds in site. With their SQL Servers loaded with 384 GB of RAM and 2TB of SSD, AWS would cost a fortune. The cloud would also slow them down, making it harder to optimize and troubleshoot system issues. Plus, SO doesn’t need a horizontal scaling strategy. Large peak loads, where scaling out makes sense, hasn’t been a problem because they’ve been quite successful at sizing their system correctly.'

It's an acknowledgement of their relatively unique strategy and the short list of caveats that make it possible.

I love that all the traffic is running through a single HAProxy box - a software load balancer (vs a hardware LB like F5).

And they've moved SSL termination to it.

That's a great quality product, and easy to setup.

Edit: I work I'm software that supports MySQL,PG and SqlServer. SqlServerr seems to be the most stable and consistent in performance - they're hard to kill! One of my few liked MS products :D

This is the fascinating part to me, their SSD have not failed:

Failures have not been a problem, even with hundreds of intel 2.5" SSDs in production, a single one hasn’t failed yet. One or more spare parts are kept for each model, but multiple drive failure hasn't been a concern.

Yep, still true. We lost one Intel 910 drive (PCIe SSD), and that was very abnormal - died so soon it was almost DOA. We hooked up directly with Intel for them to The replacement is still going strong as is another 910 we have.

All of those 2.5" Intels though, still trucking along! We're looking at some P3700s PCIe NVMe drives now, blog post coming about that.

There's a real difference between consumer and enterprise grade SSDs.

For comparison, for a while we nearly had a 1 to 1 dev to killed (consumer) SSD in work machines (see: http://blog.codinghorror.com/the-hot-crazy-solid-state-drive...). Granted, we did adopt them (see: http://blog.codinghorror.com/the-state-of-solid-state-hard-d...) for local dev use much earlier than we started putting them in servers.

I've killed 2 in my time here, though I do horrible big-ish data things that are either not SSD friendly or actively hostile.

I would agree that hundreds of drives is a large enough sample that you would expect to see some failures.

That said, all the SSDs I've had have taken repeated pounding with no complaints. I accidentally swapped a few terabytes to swap on an SSD, and was simply surprised that the job I was doing finished faster.

One smart thing they are doing is putting different sites on different databases. It effectively as a kind of partitioning to allow for horizontal scaling if needed.

If nothing else, it keeps less data in each table, so queries should be faster due to smaller datasets.

Sounds like they know what they are doing.

beware that you can over-optimized this.

for example, do not put every single customer of an saas solution into a separate database.

I'm curious, why not? If your saas was small enough and had few enough customers with large data needs, separating them each out into a separate database seems like a viable solution.

yeah, if you have an experienced team, you can easily refactor your one-customer-per-db model into something that will handle 1,000 new customers signing up every month. given a good enough team, you can even do this while doing other sort-of important things like customer support, bug fixes, and new development.

however, inexperienced developers and teams are not good enough, or fast enough, to do that, and end up painting themselves into a corner when they have thousands of live customers to support and need to change their entire application architecture + database schema when their backup, replication, and housekeeping tasks choke on 1,000+ databases.

and oftentimes, one-customer-per-db also means one-instance-of-application-per-customer, another anti pattern to avoid.

we see this kind of stuff ALL the time. it happens a lot - people make terrible decisions and then are stuck with them 5 years down the line and are looking at a monumental cost to redo. not everyone is experience enough to work their way out of an awful situation like that.

It is fascinating to see how well its server it holding out. With Plenty of headroom to spare!.

With Higher Capacity DDR4, and Haswell or even Broadwell Xeon, and PCIe SSD getting cheaper and faster. It is not hard to imagine they could handle 3 - 4 times the load when they upgrade their server two years down the line.

But i would love to see Joel's take on it though. Since he is a Ruby guy now, and I dont think you could even achieve 20% of SO performance with RoR.

It would definitely be several orders of magnitude slower. That's the price you pay when you are using a language like ruby: it's very productive and fast to iterate in, but way less performant. At the end of the day, what you use depends on your needs.

Arh... I actually meant Jeff, not Joel.

Hardware is cheap, Programmers are expensive! But i am sure there is a line where this crosses over though.

The point of this article is that it costs more programmers to maintain a cloud than to maintain a single master database

Can anyone provide context on "Heavy usage of static classes and methods, for simplicity and better performance."

Is performance _really_ an issue here?

I find the opposition to static classes odd. Unless you need the data encapsulation of an object, why force people to create an object to call functions? I find too much C# code loves creating objects for no real purpose other than that's how it works.

For SO, if we guess they max out around 2000 req/sec, then if a bunch of objects are being allocated for each request, there could be added GC pressure. From my own experience developing much higher-scale managed code, allocating objects is a real pain and people will take a lot of effort to avoid it.

I think the real issue (although it doesn't seem to be an issue for SO) is testability. AFAIK, static classes cannot be mocked, and you are pretty much boxing yourself into only doing integration tests for everything. Not necessarily a bad way to go, just lots more setup per test.

It completely depends if you do this:

    public static decimal DiscountValue(Order order)
Or this:

    public static decimal DiscountValue(decimal orderValue, decimal orderDiscount)
Or even this, which would farm the work out to SQL:

    public static decimal DiscountValue(int orderId)
The first would require a mock, the 2nd wouldn't and the third isn't unit testable.

It's a bit of a contrived example, as you're more likely to do the third method if you're trying to get some sort of summary data about an order out which would mean you don't really need/want to load the whole order object.

Yep, that used to be one of the main reasons to have DI in Java. But now that everything (even static methods) can be mocked, that reason is no longer valid. Is this not possible in .NET?

A singleton object doesn't need to be re-allocated.

Static classes are an easy way of implementing the Singleton pattern in C# ...

A singleton can implement interfaces, static classes cant.

I could be mistaken, but I think this is in regards to that post Sam Saffron wrote a few years back about getting SO to do Gen.2 garbage collection less often, and using static classes and structs were one of the ways that they fixed it.

Can someone help me understand what they mean by: "Garbage collection driven programming. SO goes to great lengths to reduce garbage collection costs, skipping practices like TDD, avoiding layers of abstraction, and using static methods. While extreme, the result is highly performing code."

I assume it means using C# more as a functional language than an OO language, so that most garbage collection is of short-lifetime objects and therefore cheaper to collect (generation 0).

I take that statement to mean C# as C. Functional languages are terrible for garbage collection. Static methods are great since they are static, loaded once and probably in-lineable by the compiler/JIT.

The usual wisdom in clr-land is that the generational garbage collector can cope just fine with a lot of small, short lived objects. It absolutely doesn't like big, long lived objects that reach G1, G2 or the large object heap and must be traversed for a full gc.

This means that normally, you want to avoid data structures that introduce a lot of objects trees, linked lists and stuff like that. In some circumstances you might lose some speed, but win big time when the full GC comes. Additionally, It may be worthwhile to consider avoiding heap allocations by organizing your data as value types instead of ordinary objects.

It depends on your definitions of "a lot" is. It can scale pretty high by default, but when you're doing hundreds of millions of objects in a short window, you can actually measure pauses in the app domain while GC runs. These have a pretty decent impact on requests that hit up against that block.

The vast majority of programmers need not worry about this, but if you're putting millions of objects through a single app domain in a short time, then it's a concern you should be aware of. This applies to pretty much any managed language.

Are you sure that this is really because of Gen0 collections all by themselves, and not because the short cadence of collections leads to a lot of premature Gen1 promotions? I could imagine something like

   var a = InitGiantArray<Foo>();
   var b = InitGiantArray<Bar>();
could leave you with a lot of foo garbage in Gen1 or even Gen2, right were it really hurts. On the other hand I'd be surprised if something akin to

   do {
       Tuple.Create(new object(), new object());
   } while (true);
but with something useful thrown in would suffer much under GC pauses. Looks like I have to test it :I)

That sounds like a roundabout way of saying procedural.

So funny because the most important application for programmers uses the complete opposite of almost every web development trend out there.

Also the SSDs are funny to me because I recently had someone who was supposed to be an expert say that SSDs were a fatal mistake with a high likelihood of massive failure.

Soooo... a average of 200 pageviews a second - I suppose that doesn't sound as cool.

Or, to put it another way; 1 server per 8 pageviews per second

Does not seem that efficient when you put it like that. Even a resource hog like a wordpress install can eclipse that level.

*made correction- still not impressed

Well yeah, but these aren't just plain old text file of course. Database queries, assets, async clients, the whole lot.

I think you're missing a 0 somewhere : 560000000 / (60 * 60 * 24 * 31) = 209

You're off by a factor of 10. I have it at just over 200 page views per second. Also, that's the average load over a month. The same hardware needs to be able to handle peak load.

and they also mention that they could possibly run on a lot less hardware... 7-8 machines + network gear (7 if the load balancer is an appliance, 8 if its a server).

> Redis has 2 slaves, SQL has 2 replicas, tag engine has 3 nodes, elastic has 3 nodes - any other service has high availability as well (and exists in both data centers)

Does this mean the 2 Redis slaves, 2 SQL replicates etc. are in a secondary data center?

We have 2 SQL Clusters. In the primary data center (New York) there is usually 1 master and 1 read-only replica in each cluster. There's also 1 read-only replica (async) in the DR data center (Oregon). If we're running in Oregon then the primary is there and both of the New York replicas are read-only and async.

For redis both data centers have 2 servers, the chain would look like this typically:

   - ny-redis02
     - or-redis01
       - or-redis02
Not everything is slaved between data centers (very temporary cache data we don't need to eat bandwidth by syncing, etc.) but the big items are so we don't come up without any shared cache in case of a hard down in the active data center. We can start without a cache if we need to, but it isn't very graceful.

Here's a screenshot of our redis dashboard just a moment ago (counts are low because we upgraded to 2.8.12 last week, which needed a restart): http://i.stack.imgur.com/IgaBU.png

Nick Craver - Stack Exchange Sysadmin & Developer

Great info, Nick. Thanks for sharing.

I would love to see comparing costs of their solution against cloud solutions as Azure.

They say that hardware is cheaper than programmers. Wouldn't that speak for the cloud solution (software based scaling instead of admins)?

Hardware is cheaper than developers and efficient code. But you're only as fast as your slowest bottleneck and all the current cloud solutions have fundamental performance or capacity limits we run into.

Could you do it well if building for the cloud from day one? Yeah sure, I think so. Could you consistency render all your pages performing several up to date queries and cache fetches across that cloud network you don't control and getting sub 50ms render times? Well that's another matter, and unless you're talking about substantially higher cost (at least 3-4x), the answer is no - it's still more economical for us to host our own servers.

And yes I realize that's a fun claim to make without detail. As soon as I get a free day (I think there's one scheduled in 2017 somewhere), I'll get with George to do another estimate of what AWS and Azure would cost and what bottlenecks they each have.

Nick Craver - Stack Exchange Sysadmin & Developer

So doing it right works. Nice :) I like these articles as it gives good examples/help supporting decisions when you dont want the crappy hack, but the do it right instead.

I'm grateful to them them for being so transparent about how their systems work. It's rare to get this level of detail on how a large, successful site operates.

Anyone else not quite expecting StackExchange to be using a Microsoft stack?

Joel worked at MS on office, Excel if I recall correctly. Jeff's blog is named for a feature from a book published by MS. I would be more surprised if it was MS free.

"I started my career at Microsoft, a little software company outside of Seattle, where I was a program manager on the Excel team. My area was “programmability” and most of what I worked on in those days was replacing the Excel macro language (XLMs) with Excel Basic, and providing an object-oriented interface to Excel. Excel Basic became Visual Basic for Applications and the OO interface is what you know as OLE Automation, a.k.a. IDispatch."

-- http://www.joelonsoftware.com/AboutMe.html

Trello (by Joel's company) is OS agnostic (running on MongoDB and node.js). But from memory they did this to learn new tools.

You mean to tell me Joel is partially responsible for the abomination known as Excel 2007 where I can't pull-drag the damn window from screen to screen (but all of the other Office 2007 products do!)

He seems to have left Microsoft in 1995, so no, not really.


It's a Microsoft/Linux hybrid. SQL Server on the backend, Linux for monitoring, Redis, HAProxy, etc.

They discussed it quite a bit on the podcasts. I found those a really great resource since Jeff and Joel were discussing the website as they were building it and "aired" the reasons for doing specific things.

This has nothing to do with Joel. The decision was from Jeff Attwood. Jeff was a c# veteran and MS stack was the easiest decision to come up with an MVP and the journey from there.

They simply decided to choose the best.

Anyone who knows Joel Spolsky would expect a MS stack :)

It's difficult to take pragmatic advice from someone who took the least pragmatic (and most expensive) route for a startup web company.

Programmer time is a factor that comes into play when you build something and re-training your people to use a stack they are not familiar with + the associated mistakes can get expensive in a hurry.

If you happen to be familiar with the MS stack and are able to factor the license fees into your business plan there is no strong reason against and many for using that particular stack.

MS programmers tend to be a bit more expensive than people working with one of the many Linux offerings but that's not such a huge difference that it would become the deciding factor and if anything the MS stack is more performant than the Linux one on the same hardware (and I write that as an anti-MS guy, I strongly believe that politics and tech should be kept separate when it comes to discussing systems relative to each other).

So even if there are lots of reasons why I would not have made that particular choice I can completely understand why the SO people made their choice the other way.

> MS stack is more performant than the Linux one on the same hardware

Could you please provide some evidence to support that?

Without breaking any promises to others here is an article with conclusions quite similar to mine:


Keep in mind that it hardly ever is the webserver that is the bottle neck.

My theory on why this is the case is very simple: MS can afford to throw vast amounts of money at optimizations that are next to impossible in Linux simply because the coupling between the layers in Linux is looser. And that's a good thing, it translates into better security and fewer bugs.

As always, optimization alone is not a reason enough to go down a certain path. But for raw speed on requests it's fairly hard to beat IIS, if that's what you're after (I'm usually not, and even when it matters I can comfortably saturate most outbound links from a single server doing light processing, and as soon as the processing becomes the bottle-neck the CPU cache size, RAM and so on matter more than your OS, but given identical hardware a 'dirty' approach should yield better results at a cost of unreliability/complexity).

Tight coupling and money are a good arguments, thanks. However, i believe we are comparing apples to oranges here. IIS uses it's own kernel module, HTTP.SYS - http://www.microsoft.com/technet/prodtechnol/WindowsServer20... So, looks like having specific kernel api optimized for your particular usecase is an advantage, which IIS has and others don't.

It turns out there are kernel-mode webservers (or there were, at least), which seriously outperform user-mode ones. https://www.usenix.org/legacy/events/usenix01/full_papers/jo... (page 11; article is old, but anyway)

> the MS stack is more performant than the Linux one on the same hardware

This is provably not true.




There is a good reason majority of the World-Wide-Web is run on 'Nix stacks -- and there is a good reason majority of servers in general are 'Nix stacks. Also, an overwhelming percentage of super-computers (I know we are talking about webservers, but it illustrates performance capabilities) are running a 'Nix stack.

There is also cost involved, as well as flexibility of the stack. With a 'Nix stack, it's infinitely flexible, not so much the case with a Microsoft stack. Also the default MS Stack comes with a lot of additional OS overhead that is not present in the 'Nix stack, which reduces any said box's scalability.

Without trying to turn this into some sort of flame war - I was merely trying to suggest that the MS stack was not/is not the best choice for a highly scalable website. Take the top 10 websites -- they all run on 'Nix. The top cloud providers (except MS Azure), all 'Nix. These are companies that can easily afford MS licensing, so that's not part of the equation.

A MS stack at this scale is unusual to say the least.

That's not to say it won't scale (as evidenced by the SE team), but it doesn't mean there isn't a better alternative that saves more money and scales better with less hardware, etc.

It will be very hard to get into a flamewar with me supporting the Microsoft side of things. That said, none of the links you posted prove anything regarding performance, they do prove something about total cost of ownership, which once you factor everything in leans towards Linux for most installations, however, just looking at the situation for SO seems to me to suggest that they were more comfortable doing the initial development on the stack they were most familiar with and when the license costs started to count against them they used Linux machines to scale out.

Which is a pretty smart decision. Whether an MS stack at this scale is unusual does not say anything at all about whether or not it performs well.

I think we agree for the SE team, given the founders were MS stack familiar, it makes sense.

However, I have to disagree on your assertion the 'Nix's are less performant than the MS stack. The top tier web companies are not running 'Nix because TCO is lower; for most of these companies licensing costs are negligible and if it helped to scale better, it may even save them money going with MS stack... but they don't go with a MS stack...

Sometimes a RHEL license can even cost more than a MS license. Couple that with an Oracle DB back-end, and you easily have a much more costly setup than the MS equivalent. It's not about the money.

These companies are choosing the 'Nix stack because it is performing in an entirely different level than the MS stack. Everything from tiny embedded systems with 64k ram, up to monster systems with TB's of ram.

> "I Contribute to the Windows Kernel. We Are Slower Than Other Operating Systems. Here Is Why." http://blog.zorinaq.com/?e=74

I think you're forgetting BizSpark here, which we were in: http://www.microsoft.com/bizspark/

IMO, it has one of the best tooling sets of any platform out there (Visual Studio) which saves lots of programmer time and increases productivity. With BizSpark the initial cost argument is out...so how's that not pragmatic?

I'm biased of course, but I think it worked out pretty well.

Nick Craver - Stack Exchange Sysadmin & Developer

If you have extensive Microsoft experience and take advantage of something like BizSpark* (although one suspects with their connections they wouldn't have needed that) it's hard to see how it wouldn't be plenty pragmatic and inexpensive.

* Currently three years of free licenses, all software you download during that time is free forever, and discounted MSDN subs thereafter.

I'm not sure I agree. Pragmatic choices based on using what you know and not just choosing a toolset because everyone else is using it.

How in this scenario is the MS stack the "least pragmatic" route?

prag·mat·ic adjective \prag-ˈma-tik\

: dealing with the problems that exist in a specific situation in a reasonable and logical way instead of depending on ideas and theories

Why does everyone in Silicon Valley run with expensive Apple hardware when Linux can run on any CPU?

I might be a bit late to the party, but if anyone is interested in copying the Stack Overflow setup, we're offering startups free software, 60k in Azure credits for accelerator-backed companies and engineering mentoring, if required. Find me at felixrieseberg.com or felix.rieseberg@microsoft.com and I can set you up.

Jeff Atwood hacked prototype and next several versions, and since he's a .net guy (and ruby guy now), that's why .net was picked. Lot about tech stack planning can be heard in first few SO podcasts, don't even know if there're still available online somewhere

I haven't listened to these yet, but I think this is what you are referring to:


No. Why not?

One must wonder, how much better this setup would of scaled had it been a more appropriate webserver stack such as Linux/BSD + Apache/Nginx or similar. Perhaps less boxes would be needed, or with the same number of boxes, more concurrent users.

> Microsoft infrastructure works and is cheap enough

not as cheap as just paying your team to maintain the boxes. I wonder how many times in the past few years SE has needed to call Microsoft for support? Would a RHEL license still be cheaper (probably)? If something like CentOS then there would be no support cost unless you need to bring in an outside contractor (for a particularly nasty issue).

> One must wonder, how much better this setup would of scaled had it been a more appropriate webserver stack such as Linux/BSD + Apache/Nginx or similar

I guess you're gonna have to define "more appropriate" for me, I have a feeling we'll have a fundamental disagreement there. We could (and have) run the entirety of our peak load with 1 SQL server and 2 web servers (and not pegging them). Don't forget we run with a crazy amount of headroom at all times, by design. I'm not sure how much better you picture that scaling on a linux environment.

> not as cheap as just paying your team to maintain the boxes

That makes a lot of assumptions about the team, their expertise, and what support issues would arise.

> I wonder how many times in the past few years SE has needed to call Microsoft for support

In the past 4 years I've been here? Twice. Both to report bugs in CTP versions of SQL server. At no cost, and we improved SQL server as a result of being testers. We have a very good relationship with Microsoft and talk to the developers that make the tools and platform we use in order to make life better for both of us. We do the same thing for redis, Elasticsearch, etc. It's the same reason we open source almost all of the tools we make.

> If something like CentOS then there would be no support cost unless you need to bring in an outside contractor (for a particularly nasty issue)

We use CentOS for all our linux systems and are deploying new servers on CentOS 7 now. We'll me migrating the others in the coming months. That doesn't mean it's free. Developer or sysadmin time to control Puppet deployments and such still eat some factor of time.

Nick Craver - Stack Exchange Sysadmin & Developer

Where I come from, it is considered bad engineering to build a product that can only run on a single platform, can only run on one particular OS, and the product is at the mercy of future decisions by a 3rd party. We go out of our way to ensure we are not dependent on any single thing.

Even if it was the best choice at the time to go the Microsoft route, it may not be in the future... however -- SE has zero choice now otherwise they'd have to re-write their entire product... that sucks as a business because you have little choice over your own product now.

>Where I come from, it is considered bad engineering to build a product that can only run on a single platform, can only run on one particular OS, and the product is at the mercy of future decisions by a 3rd party.

This seems like a bit of a waste of resources to try and run an internal bespoke application on every platform imaginable. Now, if you are developing a product that you are then going to sell to other people to run on their own hardware this make more sense.

There are some very big benefits to writing to a specific platform, especially in the performance space.

In the end of the day it is a trade off between trying to eliminate every 3rd party dependency (next to impossible) or picking a solution or company you think will be around for a long time, and forging strong relationships with them.

We regularly talk to people at Microsoft (and all of the people who create and build to tools we use), give them feedback, and get bugs fixed. There is very little that comes down the pipe from them that we are not away of ahead of time and in some cases have helped shape through early access programs.

> SE has zero choice now otherwise they'd have to re-write their entire product...

This is not true at all, we have choices if MS decided to blow everything up. Not great choices, but we have them. Choosing between two or three crap options does not mean that options do not exist.

-George (SE Sysadmin)

> One must wonder, how much better this setup would of scaled had it been a more appropriate webserver stack such as Linux/BSD + Apache/Nginx or similar.

Why is the Microsoft stack not appropriate? I am not aware of any spectacular performance difference between Apache and IIS, or Windows Server 2012 and Linux, and clearly the SO team is not religiously Microsoft, so it's hard to see why they wouldn't have switched if it was a clear win. MSSQL is actually, in my experience coming from MySQL, amazing - I'm back on MySQL and Postgres now and I often wish I wasn't. I'd say it's the most underrated of MS's products, probably because of the expense of the Enterprise/Datacenter editions.

When you encounter people who don't respect your judgement as much as they "should", I'd suggest it may be because "how well it scales" doesn't guarantee "appropriateness". In most real world situations, the specifics of the chosen stack are rather inconsequential - both MS and *nix stacks perform more than adequately. In my opinion, it is unwise to take the advice of zealots who refuse to acknowledge this reality.

I never ever ever never use SO's search feature: I land on the result through Google. So I guess they should give Google some credit for the claimed performance?

Google isn't serving Stack Exchange's pages to you. They published stats on pages served by their own servers. Google has nothing to do with it.

You miss the point: I never use the search feature of the site, I go to the answer/subject directly from Google's results. I do not think Google is running my keywords against SO's server in 'real-time' and returning the answers as results, Google returns indexed/cached copies. I am sure that many users behave like I do, no? They might as well keep a static copy of all pages in RAM and only update when new comments/answers are posted.

I haven't been able to use SO since they shut off myopenid last year. I can't log in, so can't change my credentials ;-( Any ideas HN?

You can email us (as George said) or you can just punch your email address into the account recovery tool and get an automated email telling you what to click within the space of a few minutes: http://stackoverflow.com/users/account-recovery

We... probably need to make that link a bit more obvious.

Thanks, that was easy. So now I just have a password instead of linking to another account @facebook or google+ I guess that works ;-)

Email team@stackexchange.com they should be able to help you out.

(I'm an SE SysAdmin )

Thanks, I followed the recovery link provided by another SE person.

"Stack Overflow still uses Microsoft products. Microsoft infrastructure works and is cheap enough, so there’s no compelling reason to change." <- there's your problem :-)

54th ranked site in the world with a small number of servers and apparently very maintainable and stable systems.

I would love that problem.

"with just 25 servers"

This really needs to stop. Go to stackexchange.com and you'll find that more than half of the HTTP requests are to cdn.sstatic.net.

Looking up the IP addresses for cdn.sstatic.net returned five entries for me, all owned by CloudFlare. None of the CloudFlare servers that they are using seem to be in that 25 count.

Sure, these are all for static assets, that isn't the point. There are way more than 25 servers being used to serve the StackOverflow sites.

Stack Exchange dont own their CDN. If you really start couting external servers, you would be adding DNS, possible client side proxy servers, client side network infrastructure, etc... these are things that OTHER people do, not SO.

I would say though, CDN doesn't fall in the same category as DNS. Serving static assets off other networks does take a huge load off the SE infrastructure and should probably be mentioned in the article.

Only in terms of bandwidth, though. The CPU load by processing is the interesting figure, not side noise by images/js/css.

It's how many servers you would need to provision to do the same, with the same code base. It's basically the number of directly active servers our code runs on.

Sure, we use a CDN, but not for CPU or I/O load, but to make you find your answers faster.

Sort of true, but static assets are also trivial to serve. What they get by using CloudFlare is getting their assets as close to end users as possible to reduce latency. This isn't a complex problem to solve (for a single site), just a really expensive one that very few single app companies could ever justify solving themselves. I believe Google and Amazon are the only ones with their own significant CDN infrastructure, everyone else outsources it.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact