

Stack Overflow running short on space - gavreh
http://nickcraver.com/blog/2012/02/07/stack-overflow-short-on-space/

======
ghshephard
The key takeaway for me was this:

"...the server goes to 288GB (16GB x 18 DIMMs)…so why not? For less than
$3,000 we can take this server from 6x16GB to 18x16GB and just not worry about
memory for the life of the server. This also has the advantage of balancing
all 3 memory channels on both processors, but that’s secondary. Do we feel
silly putting that much memory in a single server? Yes, we do…but it’s so
cheap compared to say a single SQL Server license that it seems silly not to
do it."

Clearly - if you are paying tens of thousands of dollars for a database server
license, it makes sense to fully utilize the ones you've purchased.

Also, in my experience, databases these days pretty much have to stay in
memory to perform well at all. The rule I heard from a Facebook DevOps manager
who was interviewing with us was "If you touch it once a day, it goes into
SSDs; if you touch it once an hour, it goes into memory." Of course, at a
certain size, you _have_ to scale horizontally with those in-memory databases
as well.
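
Back-of-envelope, using the article's numbers (the per-DIMM street price here
is my assumption, backed out of the quoted "less than $3,000"):

    # Rough check on the quoted upgrade. The ~$250 per 16GB DIMM price is an
    # assumption implied by the article's "less than $3,000" figure.
    dimm_gb = 16
    added_dimms = 18 - 6                 # from 6x16GB to 18x16GB
    price_per_dimm = 250                 # assumed
    print(added_dimms * dimm_gb, "GB added")           # -> 192 GB added
    print("~$%d" % (added_dimms * price_per_dimm))     # -> ~$3000
    # One SQL Server license runs tens of thousands of dollars, so the
    # memory upgrade really is noise by comparison.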

~~~
StavrosK
> Clearly - if you are paying tens of thousands of dollars for a database
> server license, it makes sense to fully utilize the ones you've purchased.

I suspect this is _the_ reason they're using a single server. It would cost a
few more tens of thousands of dollars to install another one, versus the zero
overhead of just upgrading the existing one.

As an OSS user, I often take for granted how easily I can go "Oh, the database
reads are too slow? Let's throw another Postgres slave at it and see how that
goes." After all, that expense is usually minimal.

~~~
ColdAsIce
Why dont they just change to an open source rdbms?

~~~
joshu
For what it's worth, the word "just", as used above, almost always indicates
that the speaker doesn't understand the problem.

~~~
iopuy
I can't count the number of times I've heard, "I just need a programmer!"

------
bsg75
The scale of the problem does not seem all that difficult to overcome (as the
author indicates). What I am interested and pleased to read is that a fairly
popular, high-traffic site is backed by a "plain old" RDBMS. MSSQL, even.

~~~
jacques_chester
SQL Server is actually fairly well regarded as a platform these days. It's a
"grown up" database; not quite as trusted for very large sites as Oracle, DB2,
Sybase, Teradata, etc., but definitely capable of solid "midrange" performance
with the usual Microsoft abundance of features and tool support.

I think that what's happening here, really, is the emergence of SSDs as a
server platform standard. SSDs drastically revise the algo-economics of data
storage.

From the mid-90s until now, demand for structured data storage has surged,
apparently exponentially (though realistically, such a function would
eventually be sigmoidal). So dramatic has this surge been that it overwhelmed
the somewhat linear rate of I/O performance improvement for disk drives.
Meanwhile, Moore's Law has more or less held and so in-memory architectures
have become much more popular, starting with memcache.

But SSDs change the equation, because they too can track Moore's Law. So they
can take mature disk-bound technologies like RDBMSes and breathe new life into
them. And that's exactly what's happening.

At the turn of the millennium, if you were given StackOverflow levels of
traffic _and_ everything from 2011 technology except for SSDs, you'd have had
to spend dozens of times as much to approach the same level of performance.

------
reuser
After all of their evangelism about Windows servers, it is rich to see this:
"but it’s so cheap compared to say a single SQL Server license that it seems
silly not to do it."

Yes, that is the problem, isn't it?

~~~
shingen
I don't see what's "rich" about that at all.

The Win stack is a trivial cost in their business compared to the revenue
potential of StackExchange.

The biggest cost in their business is labor, not licenses. That will always be
the case. License costs will continue to shrink toward irrelevance as a
percentage of their sales over time.

Given their wild success, there's no great argument to be made against what
they've done: quite simply, it has worked, and worked very well. That is all
that will ever matter, regardless of what argument is thrown around.

~~~
reuser
If license costs are so big that they become this important a factor in
deciding how you will scale, they have become an operational problem of some
size. Probably not a size which is imminently threatening to Stack Exchange.
But it is not nothing, or it would not figure so prominently in this shop talk
about how to upgrade.

It's certainly a factor worth weighing when you consider what technology
you'll build on top of.

In no place have I "argued against what they've done" or implied that the
company is a failure. Yet money IS a part of scaling, and managing expenses IS
a part of business. I'm happy to take your word that Stack Exchange makes so
much money that it cannot ever matter how much their licenses cost. But in the
context of evangelism for others to make the same decision, I have to observe
that the kind of reasoning casually mentioned in the article implies something
that would be negative for many other businesses.

If you are offended by that sort of discussion of reality, then you have too
thin a skin (or a conflict of interest).

------
cmer
I was surprised at how small their database is!

We (Defensio) store about 300GB of data, and most of it is purged after 60
days. We're also quite far from being the 274th largest website in the world,
I assume.

It's just very interesting to see how such a huge website can use so little
storage.

~~~
jacques_chester
How are you storing your data? And is it compressed?

... I suppose for spam, the problem is that as a side-effect of trying to fool
Bayesian filters, there's a lot of incompressible gibberish.

~~~
cmer
It's mostly metadata and hashes of stuff. Uncompressed. Most of it is in
MongoDB, which is probably one of the worst technical decisions we made, for
all the reasons that have been discussed at length.

~~~
yourapostasy
If you had a chance to do it over again, which store would you have picked
instead of Mongo?

~~~
cmer
I haven't researched this much, but my gut tells me Cassandra and/or Postgres.

One of the main reasons we escaped MySQL was that we were stuck with our
schema. Adding an index or a column basically copies the table to a temporary
table, applies the change, and copies it back in place. That's my
understanding, which might be a little bit wrong. All I know is that making
one very simple change took hours and hours because our tables were so big. It
was quite ridiculous. We couldn't afford the downtime, so we were stuck with
what we had, which was no longer sufficient for our needs.
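
For the curious, here's a rough sketch of the shadow-table workaround that
tools like Percona's pt-online-schema-change automate (the table and column
names are hypothetical, and the triggers a real migration needs to keep the
two copies in sync are elided):

    # Hypothetical sketch: change a big MySQL table's schema without one
    # giant, blocking ALTER. `execute` just prints, so the steps run as a demo.
    def execute(sql):
        print(sql)

    # 1. Create an empty shadow table with the new schema.
    execute("CREATE TABLE comments_new LIKE comments")
    execute("ALTER TABLE comments_new ADD COLUMN flagged TINYINT NOT NULL DEFAULT 0")

    # 2. Backfill in small batches so writers are never blocked for long.
    batch, last_id, max_id = 10000, 0, 30000   # max_id: stand-in for SELECT MAX(id)
    while last_id < max_id:
        execute("INSERT INTO comments_new SELECT *, 0 FROM comments "
                "WHERE id > %d AND id <= %d" % (last_id, last_id + batch))
        last_id += batch

    # 3. Atomically swap the tables.
    execute("RENAME TABLE comments TO comments_old, comments_new TO comments")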

I understand that Postgres doesn't have that limitation, but since our schema
used to change so much and we had so many joined tables, the MongoDB data
structure seemed like a great fit. Mongo was also amazingly fast in our tests,
but those tests didn't properly take the global write lock into account.

The global write lock problem is a very well known issue now, but we started
using MongoDB before 1.0, way back when nobody really knew anything about it.
At least people are now more informed, although that doesn't seem to prevent
many from making the bad decision of using it.

------
serverascode
I have 1.7TB of FusionIO PCI SSD at work and it's not even doing that much
right now.

These guys should open their wallets and get more than a little bit of SSD,
prob PCIe, plus max out on RAM. 96GB is what most would _start_ at now in a
larger HP server.

~~~
gbeech
Those drives still cost about $20k or so to purchase, for only about a 25%
increase for reads and 10% for writes
(<http://blog.serverfault.com/2011/02/09/our-storage-decision/>). Why would we
pay that extra for so little gain?

Also, with FusionIO you are putting all of your eggs in one basket, drive-wise.
If that card dies you are done. In short, we don't _need_ the FusionIO, so why
would we put out that kind of money for it?

Also, we went to 96GB about a year or so ago when we set up those boxes; it
was a reasonable amount of memory to put in at the time. We are maxing the
thing out at 288GB now.

~~~
mrkurt
Did you guys do better than about $13/usable GB of storage? When I was
shopping, 6x Intel 710s cost about that, and that's about what Fusion IO
storage seems to go for.

~~~
nick_craver
We'll end up paying about that for the 710s, but you have to look at what
we're getting for the same price. Fusion IO card dies: we're dead in the
water. It takes 2-3 of the Intel 710s (in a RAID 10, like they'll be) dying to
do the same. Given that the IO of our storage isn't close to being a limiting
factor, we'll always stick with the much more robust route for the same price.
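
Rough numbers behind this (the per-drive price is my assumption, backed out of
the ~$13/usable-GB figure above; the 200GB drive size is mentioned elsewhere
in the thread):

    # Hypothetical cost/robustness comparison: 6x Intel 710 in RAID 10.
    drives, drive_gb, drive_price = 6, 200, 1300   # price assumed
    usable_gb = drives * drive_gb // 2             # mirroring halves capacity
    print("%d GB usable at $%.0f/GB" % (usable_gb, drives * drive_price / usable_gb))
    # -> 600 GB usable at $13/GB
    # The array survives any single drive failure, and even a second one
    # unless it lands on the same mirrored pair; a single Fusion IO card is
    # one failure domain at a similar price per GB.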

~~~
mrkurt
Ah. Well if your RAID controller dies, you're dead in the water no? My
understanding of the Fusion cards is that they're roughly as resilient as a
RAIDed SSD setup.

~~~
nick_craver
That doesn't really happen; it's an extraordinary event for a RAID controller
to die. But _can_ it happen? Sure, anything _can_ happen...that's why we have
an identical hot backup server only minutes behind on the database, for just
such an occasion.

------
salsakran
It seems truly bizarre to me that in 2012, there is a major operation that is
still trying to vertically scale such a trivially shardable DB.

Scaling a static Q/A site with a few widgets that require user info/counters
should be table stakes.

~~~
staunch
The situation is the opposite of what you seem to think. It's _far_ easier
and cheaper than it has ever been to scale up rather than out. Just 5-6 years
ago it would have been nearly unthinkable to have half a TB of memory, 16 CPU
cores, and a TB of SSD for a few thousand dollars.

Most people's traffic/data hasn't grown nearly as fast as Moore's law. Scaling
up makes more sense than ever. It may not be hip but it's the right choice for
99% of cases.

~~~
marshray
Just be ready for the day when you max out your IO bandwidth, or hit some
other hard-to-foresee bottleneck.

~~~
InclinedPlane
It depends on how fast you're growing, since technology keeps chugging along
too.

Case in point, here's a 16TB SSD with 560 MB/s performance:
[http://www.engadget.com/2012/01/09/ocz-goes-ssd-crazy-at-
ces...](http://www.engadget.com/2012/01/09/ocz-goes-ssd-crazy-at-ces-leaves-
no-port-unplugged/)

~~~
learc83
You might be able to keep cramming in more RAM until memristor storage makes
RAM obsolete.

~~~
jacques_chester
I know it's a cliche, but memristors are going to cause an earthquake in the
algo-economics landscape.

~~~
jules
I have a hard time seeing whether memristors are really going to be the future
of all storage or whether they're just a fad. Can somebody who knows something
about the topic explain?

~~~
InclinedPlane
Here's a useful, but lengthy, talk on the subject:
<http://www.youtube.com/watch?v=bKGhvKyjgLY>

Memristors look extremely likely to be able to provide non-volatile storage
with better latency, access speeds, and density than existing DRAM and flash.
They also look like they can be used to make FPGA-like devices that approach
the speed, power efficiency, and logic density of custom-designed ASICs but
with extremely fast reconfiguration. And those seem to be the less fascinating
aspects of memristor technology.

------
nchuhoai
As a front-end guy, it is very intriguing to see what server admins have to
do. While I have to think about performance, I never have to worry about
running out of space or random disk I/O. What do front-end founders do when
they unexpectedly run into traffic spikes? I'm so glad services like Heroku
take that off my plate.

~~~
xxpor
And as a backend guy, I'm glad I don't have to worry about things like why
this doesn't look right in IE, or what the sales impact will be if I make this
button one size vs. another ;)

------
ck2
Wait, SO is all on one single node? Or are there reverse proxies?

I guess the static content is on a CDN, but all dynamic content is coming from
one machine?

Oh wait, never mind (10 Dell R610 IIS web servers):

[http://highscalability.com/blog/2011/3/3/stack-overflow-
arch...](http://highscalability.com/blog/2011/3/3/stack-overflow-architecture-
update-now-at-95-million-page-vi.html)

~~~
JasonPunyon
Our web tier has 10 servers that all run at 10-15% CPU. And we have two
database servers: one for StackOverflow and one for everything else (SE
Network, Careers and Area51)

------
kondro
Also, why not just store everything in memory? 512GB of RAM is also not overly
expensive… even with the extra MS licensing required for it.

------
StavrosK
Does anyone know if there's a specific reason why they have multiple databases
on the same box? From what I can see, one could trivially install the
full-text search engine on another server and reduce much of the space
requirement and contention.

~~~
jacques_chester
If you read closely, you'll see that StackOverflow's database has been moved
onto its own hardware away from other StackExchange sites.

I think they're using SQL Server's inbuilt text search engine (edit: no, see
below).

~~~
JeremyBanks
Actually, they switched to Lucene.NET a year ago:
[http://blog.stackoverflow.com/2011/01/stack-overflow-
search-...](http://blog.stackoverflow.com/2011/01/stack-overflow-search-
now-81-less-crappy/) It sounds as though the search indexing isn't done by the
database server anymore.

~~~
jacques_chester
Thanks for the correction.

------
th5
Is it not worrisome that they have only one DB server, with no hot-backup or
failover DB machine? I suppose many components on that one machine are
redundant (disks, CPUs...), but there have got to be many points of failure in
there as well, right?

~~~
nick_craver
Both NY-DB01 (runs everything but Stack Overflow) and NY-DB03 (runs Stack
Overflow) have identical backup counterparts: NY-DB02 and NY-DB04. NY-DB04 is
on a mirrored config and is always a few minutes behind, while NY-DB02 is
restoring scheduled backups. With SQL Server 2012, the backup/mirror is
greatly improved and both boxes will have fully-hot spares in a replica
configuration. In 2008 R2, SQL Server just doesn't handle mirroring 100+
databases well.

~~~
griffordson
And those are all in the same data center, right? How much downtime do you
guys have a year?

~~~
nick_craver
The hot backup is in the same New York datacenter, yes. We also have daily
backups across the country in our original Oregon datacenter. The whole OR
setup is getting love to be a much more resilient failover location as we
speak (that'll be the topic of my next post). As for downtimes, pingdom says
we were down 7h 6m 23s last year, so 99.92% uptime (note: not nearly all of
that was DB related).
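
Checking that arithmetic:

    # Verifying the uptime math above.
    down = 7 * 3600 + 6 * 60 + 23    # 7h 6m 23s -> 25,583 seconds down
    year = 365 * 24 * 3600           # 31,536,000 seconds in 2011
    print("%.2f%% uptime" % (100 * (1 - down / year)))   # -> 99.92% uptime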

------
amitutk
noob q: any reason not to use RAID5 or 6?

~~~
Devilboy
If you run RAID 10 on a busy system and a disk fails, everything keeps running
at full speed until you replace the missing disk and rebuild your redundancy.

If you run RAID 5 or 6 and a disk fails, suddenly every read operation
becomes several times slower, because the missing data must be computed from
the parity and the remaining data disks. If your normal day-to-day load on the
storage is too high, you are screwed until you rebuild the missing disk.
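
A minimal sketch of why those degraded reads are so expensive - recovering a
block from the failed disk means reading every surviving disk in the stripe
and XORing (the disk contents here are made up):

    # Toy RAID 5 stripe: 3 data blocks + 1 XOR parity block.
    from functools import reduce

    def xor_blocks(blocks):
        """XOR equal-length byte blocks together, byte by byte."""
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

    data = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_blocks(data)             # written alongside the data

    # Disk 1 fails: reading its block now costs 3 I/Os instead of 1.
    recovered = xor_blocks([data[0], data[2], parity])
    assert recovered == data[1]           # the missing block is reconstructed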

~~~
nbm
Write throughput/latency on RAID 10 remains consistent with previous numbers
(latency perhaps milliseconds faster), but read latency and overall throughput
(as opposed to individual requests) will suffer on the mirror with the bad
disk.

Recovery will drop read/write performance, depending on how fast you do it,
but nowhere near as badly as RAID 5/6.

------
rorrr
If you're 27th largest site in the world, you shouldn't have any trouble
getting large fast SSDs. Or just get a bunch of

    Crucial M4 256 GB (4KB Random Write: Up to 50,000 IOPS)

    or

    Plextor M3 Series PX-256M3 256GB (4KB Random Write: Up to 65,000 IOPS)

Plus your whole site is perfect for sharding. Questions are pretty much
independent.
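
A toy sketch of that kind of sharding - route each question to a shard by its
id (the shard hosts are made up):

    # Hypothetical question-id sharding: each question lives on one shard.
    SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

    def shard_for_question(question_id):
        return SHARDS[question_id % len(SHARDS)]

    print(shard_for_question(4273009))   # -> db-shard-1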

~~~
martincmartin
The article says they're the 274th largest site, not 27th. And that's in the
U.S., not the world. And as the article says, they are getting 200GB SSDs.

~~~
petercooper
I did like that they're getting 200GB vs 300GB though. $18m in funding and
still being frugal! :-)

