
Three things you should never put in your database - webology
http://www.revsys.com/blog/2012/may/01/three-things-you-should-never-put-your-database/
======
kogir
While generally good advice, sometimes it does make sense to put binary images
in the database:

At Loopt, long before maps were included on mobile platforms, we served our
own map tiles. These 64x64 pixel PNGs were often less than 100 bytes. Even
though we only covered the US at the time, there were upwards of 20 million of
these tiny files.

They caused file system destruction. We tried ext3, reiserfs, ufs, ntfs - they
all choked. In the best of cases each file took as little as 512 bytes on
disk, 5x larger than the data within. Performance was awful, no matter how we
bucketed the files in directories or tuned the file systems. Reiser 4 trunk
was looking promising, but then Hans was arrested[1].

The solution was to store the images in a database. They fit in-row without
needing blob storage, and the DB indexes were much faster than any of the file
systems we tried. Backups took less space and less time.

[1] He was eventually convicted of killing his wife:
<http://en.wikipedia.org/wiki/Hans_Reiser>

~~~
getsat
XFS rules at handling lots of little files. It also has a ridiculously high
"files per directory" limit.

Something like Redis might've been sweet, too, but would require you to hold
all that data in ram.

~~~
kogir
This was in 2005. Should have tried XFS if we didn't. I think we might have.

------
mmastrac
One issue I have with this article:

"Images, files, and binary data: Great now you can't use nginx or another
lightweight web server to serve them up."

That's not true - you can create a private HTTP endpoint in your web app that
your lightweight caching frontend can serve and cache from.

(I'm not saying it's not a better idea to store things in S3 or another big-
blob service, but the reasoning in this article doesn't hold water)

~~~
duskwuff
Even so, serving them up will still be less efficient (and more complex!) than
if they were files.

~~~
mmastrac
(note: I'm playing devil's advocate here for storing images/files in the
database - this is not something I'd recommend doing without careful thought)

In terms of efficiency, you're only paying a small price on the first hit for
an image before the cache comes into play. It won't be as efficient for you as
directly serving from S3, but with a properly configured cache, the effect
would be negligible.

In terms of complexity, you're looking at a single nginx configuration
endpoint of a handful of lines to cache the images on-disk or in memcache for
fast serving. In the case where you've got CloudFlare in front of your web
app, it's free. Overall it can be _less_ complex, because you can re-use your
existing code for talking to the database, rather than adding code to talk to
another service like S3.

The _real_ argument against storing big binary data in the database should be
based on the argument that databases aren't necessarily always tuned properly
for storing things that aren't relational (which the article briefly
addresses), and that storing your binary data alongside your relational data
could have deleterious effects on the performance of your relational queries.

~~~
d503
Incidentally, SQL Server 2008 has a FILESTREAM type which integrates with NTFS
to give you streaming access to BLOBs (you can even work with them via Windows
file handles). MSDN has a pretty thorough (and I thought interesting)
technical article on them. [http://msdn.microsoft.com/en-
us/library/cc949109(v=sql.100)....](http://msdn.microsoft.com/en-
us/library/cc949109\(v=sql.100\).aspx)

~~~
junto
Interesting. Thanks for linking to this. Notably the article states a
recommendation / constraint:

The recommendation for using FILESTREAM is that the individual units of BLOB
data be 1 MB in size or higher.

Though, the benefits when that is the case, look mighty impressive.

------
pathdependent
What about plain text or uniform-key, unsalted passwords? Granted, it's not a
scalability or reliability concern, but I would put that at number one. It
seems like every six months, some impressively large company gets in trouble
for doing this.

~~~
frankwiles
Hahaha I should have added that! It's one of those "I would never think to do
it" items so it didn't cross my mind. Will reply later to everyone else when I
have a free moment.

------
dools
Storing images in your database is a fantastic idea. You then add in a
mod_rewrite rule which checks for the existence of an image in the requested
size (with some accepted size format like original_filename500x200.jpg) and if
it doesn't, produces and caches that size. Best of both worlds.

The "size of database backups" point has been countered several times on this
page effectively so I won't bother but in terms of scaling your app, restoring
backups, managing versioned backups, moving your app around and a whole host
of other management and maintenance related tasks, being able to completely
blow away your cache of image files and have them rebuild on demand is
tremendously useful.

Also on the point of "storing thumbnails" this assumes that your thumbnail
size requirements never change over time. Using the approach of storing the
source image in a database and caching the files on demand means you can
change your design and all you need to do is change the size parameters in
your URLs, bam, new image thumbnail sizes.

~~~
dchest
I'm not sure what databases have to do with your ideas about caching images
and serving them via URLs with size parameters.

~~~
dools
One of the key points in the article had to do with the speed advantage of
serving images from the file system rather than from a database.

------
sparkie
>read/write to a DB is always slower than a filesystem

One shouldn't forget to mention that a filesystem is a DB, albeit one
specifically engineered for the purpose of storing files. There's some
overhead in doing so, because the filesystem stores a fair bit of metadata
about the file. I don't think this argument always holds true.

There was a paper (a little dated now) from MS research which looked into this
a little: <http://research.microsoft.com/pubs/64525/tr-2006-45.doc>. The
research is not really conclusive, but it suggests files smaller than 256kb
might be better stored in the DB, where anything larger the filesystem
performs better. Would be nice to see some more research on the topic.

~~~
bunderbunder
I suspect that the issue is too complicated for there to be an official
threshold which everyone can rely on. It's not just a question of whether the
filesystem or database can serve up the data faster. It's also the impact that
those large chunks of binary data have on the database engine's cache, and
what that's doing to performance globally. Presumably the specifics of that
impact will depend heavily on the data access patterns of the application in
question.

------
mistercow
>Images, files, and binary data

It really depends on your situation. Point by point:

>read/write to a DB is always slower than a filesystem

Obviously this is true, but depending on the context, the difference may not
be that important.

>your DB backups grow to be huge and more time consuming

This just doesn't hold up to scrutiny. You still have to back the files up,
regardless of where you put them. If you keep them in the DB, then backing up
the DB automatically backs up the files. If you keep them separately, you're
adding complexity to the backup/restore process.

I guess there are times when you keep references to files in your DB but can
live with having the files and DB get out of sync if you have to restore from
a backup, but that's a design decision, not a hard and fast rule.

>access to the files now requires going through your app and DB layers

Again, there are times when this simply doesn't matter.

Let's not forget that one of the main reasons people use databases is that
they simplify concepts like locking. If you start tying your database to some
external structure within your file system, you not only have to implement
those features yourself, but you also have to coordinate them with the DB's
own implementation.

~~~
ars
> You still have to back the files up, regardless of where you put them.

If they are actual files you can do an incremental backup. If they are in the
database the backup is just one huge file that has to be backed up every time.
Just dealing with the IO takes much longer, not to mention disk storage and
possibly network bandwidth.

If you need to restore a single image you are going to need to create a whole
database just to hold the backup before pulling out the one image - depending
on how much data you have this can take hours. But with a file it's trivial.

~~~
jrs235
Okay, the incremental argument carries no weight. You could do a differential
backups of the database.

If you are also doing transaction log backups restoring a file that was
deleted is fairly trivial too.

------
asuth
The logging thing is really really dumb. Having logs in a real database lets
you do incredible things (e.g. correlating user feedbacks to random problems
you noticed). We use our db-driven logs many times a day to debug issues and
find patterns.

Don't be afraid of putting logs in databases and then figuring out how to
scale it. It's not that hard and it's very worth it.

~~~
rplnt
Well, he advised on using tools like Splunk which can do all that for you. But
I'm pretty sure they use database internally. So conflict advices I'd say.

~~~
frankwiles
Splunk, last I checked, used PostgreSQL for it's backend. While maybe badly
worded, my point was more toward the users who have a two column log table
with a timestamp and a syslog style line of text. The problem with "advice"
posts like these is you can't cover every possible scenario without confusing
the less experienced readers. Do I think loggly or splunk cloud should store
their logs as flat files? Hell no, their app IS logging. But in the general
case of application architecture I don't think it is appropriate or terribly
useful for most. I know they say "never say never", but I felt it justified
when I believe it's a bad idea in 99% of apps.

It's really more about using the right tool for the job, Splunk, Loggly, NoSQL
solutions, etc. are a MUCH better fit than an RDBMS for this sort of work.

~~~
nl
Splunk certainly does NOT use Postgres for it's backend(!)

It uses custom index files stored on a filesystem.

It can also connect to MySQL[1] and to HBase[2], but its own data is kept in
its own index.

[1] <http://blogs.splunk.com/2011/12/20/swimming-with-dolphins/>

[2] <http://blogs.splunk.com/2011/12/05/introducing-shep/>

~~~
frankwiles
Makes sense, it's been a long time since I installed or had shell access to a
Splunk box, I just remember it needing PG installed. Likely that's changed
over the years.

------
elchief
Premature optimization is the way to go right?

Screw referential integrity.

Screw db security.

Screw simple administration (single backup).

Raw performance of one part of the system is the only concern.

~~~
frankwiles
Not sure how "db security" factors in, but yes it's likely a premature
optimization for many users, but these three are some of the most frequently
abused RDBMS items I see on a regular basis. And often the major limiting
factor in the client's performance.

While it is "raw performance of one part of the system", it's also the hardest
aspect of a system to scale properly. Not doing these three things can often
get a user months or years further down the line without having to worry about
it at all.

------
jclulow
My time as a system administrator makes me rather fond of logging as text into
files. They're simple, easy to scale, to rotate, to compress and to throw
away. One variation on this theme that I read about recently is to produce log
lines as JSON objects rather than more "traditional" printf-formatted text.

This gives machine readability but also makes for really simple tools for
extracting whatever "traditional" log lines you like. (see:
<https://github.com/trentm/json>) You can also easily add new fields without
breaking existing log parsing code.

I believe I first read about logging in JSON here:
[http://journal.paul.querna.org/articles/2011/12/26/log-
for-m...](http://journal.paul.querna.org/articles/2011/12/26/log-for-machines-
in-json/)

------
derickson
"But wait, you're a unique snowflake and your problem is SO different that
it's ok for you to do one of these three. No you aren't and no it really
isn't. Trust me."

I feel like this article is making a large number of assumptions about the
problem I am trying to solve.

------
maximus2022
You can still store files (images, documents, etc...) in the database as a
backup.

Let's imagine the following scenario. If your database crash and you lose
everything there is good chance you can't run your application any more (if
you don't have backups), but you still have your images! great! what do you do
with them. If you lose your files but not the database, cool you still have
the paths of all your files that have been lost in the database, so now what?!

The idea is to keep files (such as images) in the database to be able to pull
them down on the file system and backup your database. Plus it's not always
stored in the database, for example with MSSQL you can use file stream to
store document.

------
sbov
> Images, files, and binary data.

We did this for one project because we wanted transactional semantics when it
came to image files and other related information, and this made it easy. We
didn't serve images from the db - instead we created image files on the web
servers from what was in the database.

------
gardarh
Article says that logging to a non-production db "isn't a HORRIBLE idea"
indicating that it isn't really a good idea. I used to work with a system
where we routinely made complex queries against the log database. Parsing the
log line before inserting to the db, putting relevant values into indexed
columns seemed to me like a pretty good idea and provided us with a really
easy-to-use and a rather fast diagnostics tool that no one had to learn how to
use.

EDIT: Of course you have to make sure you don't do any writes that block the
main operation, e.g. by using message queues.

------
lucb1e
> read/write to a DB is always slower than a filesystem

I always thought it was the other way around. Databases can cache data and
keep files opened (keep the file locked or something, I don't know).

Also my own benchmarks on an old laptop (MySQL versus FAT32) turned out that
MySQL was actually faster, even if you still had to connect (no persistent
connection). I should try this again on a decent server and use NTFS or Ext3
(see if that makes much difference), but in general I can't really see how a
database can be slower--given that you use it for what a database should be
used for. Of course if you read a movie from it once a day, it's not gonna
cache that and the file handle is not going to help much either. Then MySQL is
only an obstruction to the data which it has to read from the disk anyway. Hmm
or is that what he meant here?

------
zacharyvoase
There's a very strong use case for binary data: bitmaps. You can implement a
bloom filter or a set as a bitmap field on a relation, and then perform quick
searching and joining using the bitwise comparison operators. This is my main
reason for wanting binary field support in Django—at the moment Django
attempts to decode every string as UTF-8.

------
kalleboo
I thought the issues with blobs in databases is that when you pull a file out
of a DB to serve it on a website, you get something like this:

1\. Database pulls file off disk, or out of cache, reads into memory buffer

2\. Memory buffer is copied to a new buffer in SQL client library (over a
socket or whatever)

3\. Memory buffer is copied to a new buffer in your scripting language

4\. Which is then written to the web server's memory buffer, where it gets
served to the client.

Which ends up being a whole lot of copying (and a whole lot of work even when
the data is cached in the DB) compared to:

1\. Web server mmap's file

2\. Web server streams file out to client

Is this situation no longer true? Are there newer tools that handle "serve a
file from a BLOB" more efficiently? Do they also handle things like seeking
(resumes) in a non-wasteful manner? Or is memory performance so good and the
blobs we're talking about so small (bytes, not kilobytes) that it doesn't
matter?

~~~
mistercow
I'm by no means a DB expert, but that was basically my understanding of the
disadvantages as well. As other people have pointed out, you can use caching
to fix a lot of those performance issues.

Of course, the author of the article also believes that every database is
hooked to a web server that will be constantly pushing its data to hundreds of
thousands/millions of users, that every 10 milliseconds lost means another 1%
of your users abandon ship, and that if you claim to have some other
circumstances, you are pleading "unique snowflake" status and should just get
shut up and follow his advice anyway.

------
appleaintbad
Ugh. This is so wrong, even though some of the arguments are valid criticisms.

Why is it wrong? Transactions and referential integrity. Without being able to
guarantee transactions that update data in the database along with any other
data, non-DB file-based or not, then you chance issues with referential
integrity. Even if you have a transactional infrastructure that you would bet
your life on, something could happen like a partial restore that restores the
file system but not the database.

It isn't that it is impossible to have a working transactional infrastructure
without storing these things in the database, it is that it is much easier to
store it in the database, and sometimes making things easy can save your ass.

------
madao
I have to disagree with the logfiles, I work for a really large domain hosting
company so our logs are quite vast, dealing with spam issues can amount to a
large amount of time being used just looking for the information you need.

we have greylog in place to help query these logs from our syslog server, it
has helped us identify alot of different issues since we have put it in place,
not to be rude to the poster but I dont think you have the experience with
really big log files and how painful it can be (or how long it can take) when
you have to grep through this information by hand, zcating an old compressed
log file can take up to ten minutes before you even start seeing data let
alone relevant data.

~~~
BlackAura
Your experience with logs doesn't invalidate the point the author was trying
to make. Storing your logs in your production database is a stupid idea.

You're already doing what he was suggesting anyway - storing your logs
somewhere other than your application database.

~~~
chris_wot
I'm trying to understand why storing your logs in the production database is
such a bad idea. Is it because of potential disk contention? Lock issues?
Could you be more specific about why it is a stupid idea?

~~~
AlisdairO
Probably just disk contention - locking is unlikely to be a problem since
you're just adding rows.

The disk contention issue could be a problem because writes are hard to scale,
but depending on the regularity of logging and the indexing used (I'm assuming
it's done by timestamp here) it's probably not actually the biggest of deals -
you should end up mostly doing append-only writes. In oracle, for example, all
that'll happen is that for each log transaction some stuff will get altered in
memory (i.e. a block of the table will get appended to, and an append will
occur on the index) and the roll forward log will get written to on disk.
Every so often the table and index blocks will get written to disk, but that
will be _much_ less than once per transaction in an appending case.

That said, overall I'd still recommend having the logging done on a separate
system, or at least having profiling set up on your log table/user/whatever
such that logging can't eat your production server if someone decides to
excessively ramp up the log levels.

~~~
chris_wot
Why not just move the logging table into a different tablespace/file group and
place this on a different disk? That would fix write performance problems.

~~~
AlisdairO
This is absolutely a good thing to do, but it doesn't make you safe - the
disks running the database's redo logs are the ones that get hammered in a
logging scenario (since the writes to the table and indexes are largely
appending, and don't have to be written to disk to commit a transaction).
These logs are usually global to the DB instance, so you can't easily (to my
knowledge) separate them out. This being the case, they can significantly
affect the performance of the rest of your system if too much logging starts
happening.

------
gregbayer
Database is a pretty broad term these days. Doesn't this depend on the
database we're talking about?

Something like Big Table is pretty good at storing small blobs. It's also
pretty good at scaling, so you should never really have to delete "usage
statistics, metrics, GPS locations, session data" or anything really.

Regarding logs, I completely agree. Something like tornado + scribe + s3 works
quite well. (check out <https://github.com/gregbayer/logcatcher>) But then EMR
+ Scribe can be used to access it like a db.

------
mildweed

      * Credit card numbers
      * Social security numbers
      * Plain text passwords
    

Can't tell you how often I find them in databases. Its mindboggling.

------
pselbert
While I fully agree that storing log data in the database isn't usually a good
idea, I would never recommend Logly to somebody. It may just be my experience,
but the search facilities and stability left a lot to be desired - it is one
of very few web apps that consistently face me the Chrome "oh snap" js crash.

IMO Papertrail is a much better service with the exact same MO.

------
officialchicken
This is a load of BS from somebody who only works with one kind of relational
data (tabular).

Never let this author near your GIS/spatial system.

~~~
Deestan
Instead of saying "load of BS" and leave it at that you could explain which
parts of the article you disagree about and why.

------
otterley
There's nothing wrong with logging to _a_ database, provided you adequately
buffer the insertions to keep the underlying sync() rate reasonable.

I wouldn't recommend logging to the same database instance that's used to
store customer data for both security and performance reasons, but that's no
reason not to use a database for logging.

~~~
raverbashing
True

I agree, what it should say is: never log to the _main_ database, create a DB
especially for it (preferably in an entire different file system/disk/machine)

And don't forget to set it to 'append only'

------
Kyoushu
I'm ashamed to say I used to use MySQL to store serialized PHP arrays. It
seemed like a good idea until I needed to run a query using a string within
the array. I "solved" the problem by writing a MySQL function similar to
ExtractValue().

I was young and stupid, and I needed the money.

------
AsylumWarden
I used to do custom SharePoint development using SharePoint and MOSS 2007 from
Microsoft. Fun developing experience. I was doing document management for some
rather large companies. Guess where Microsoft put all of the images, word,
excel,powerpoint presentations, email and whatever else! In a blob in an SQL
2005 server database. It was amazing how fast those hard drives could fill up
with each document and all of it's revisions being stored there. There was
talk that MS might eventually add external blob support but I quit long before
that if it ever happened at all. If anyone asks me now I simply refer them to
other document management servers.

------
RickHull
I have to think we've all been tempted at some point. This is a succinct and
persuasive prophylactic. Bookmarked.

------
Mordor
As with any performance tuning advice, "it depends".

Bear in mind not storing files in a database does create restore issues, since
database and file backups will have been created at different times, so
there's always a risk of orphaned/missing files.

------
jsprinkles
_Strongly_ disagree on logging to database being inferior to flat files or
Loggly/Splunk. The first point is good: you shouldn't put them in the same
place as your production data, agreed. Then he goes on to recommend not
logging into a database at all, which is just silly.

Map/reduce on log data is absolutely wonderful and opens new worlds of answers
that you can only dream of. A Hive workflow is fantastic for parsing log data.
You can answer any question, and even GRAPH any answer from your logs. The
insight gained from capturing and processing logs can do wonders for your
performance problems, ideas for new uses of your code, and so on.

I hate any approach to engineering that says "you'll never need it, so don't
do it"; do it, and discover the great things that some things unlock for you.
Burying your head in the sand on this point just limits your ability to unlock
unique insight and gain an edge.

I particularly disagree with this because of the flourish at the end of
article, where he says there are no exceptions to his advice. Flatly wrong,
and if you tell me "don't do it, trust me," I lose a bit of respect for you as
a communicator.

~~~
koko775
Altering the logging table scheme once it reaches several million rows? Not
fun. I think flat files are much easier to scale.

~~~
sukuriant
You could create a new table called Logs2 and use your database's union
functionality when querying it. Perhaps even a view to make things a whole lot
easier.

~~~
Yeroc
If you're talking about Postgres most people would likely implement this using
date/time-based partitioning which provides a more seamless solution to this
problem.

~~~
chris_wot
Not just Postgres folks - SQL Server and Oracle also :-)

------
fleitz
"read/write to a DB is always slower than a filesystem"

This is patently false, reads/writes on a DB are often much faster which is
why we use DB systems.

Even image data especially on write can be faster when written to a DB vs.
flat file because of sequential writes. Small files are slow to write on
filesystems which is why we have DBs.

In most cases with a high read / low write load it's certainly a better idea
to put thumbnails on a CDN for a variety of reasons but to think that writing
small chunks of data to a file system is always faster than a DB is to
misunderstand what DBs do and when and why they are useful.

~~~
gnaritas
> reads/writes on a DB are often much faster which is why we use DB systems.

Not really. We use db's because file systems aren't ACID and we like to not
lose data, and we like to be able to query data, and because we like multiple
machines having access to that data. But the file system is generally much
faster, especially when appending something like a log (yea, db's do that too,
that's beside the point).

> but to think that writing small chunks of data to a file system is always
> faster than a DB is to misunderstand what DBs do and when and why they are
> useful.

I don't think so. Perhaps you can expand on that.

~~~
fleitz
Write a program that creates 100,000 1k-8k byte files.

Write a program that creates 100,000 1k-8k byte files in a zip file with no
compression.

Run multiple copies of these programs, see which writes files faster.

Keep running the program(s) until the drive is 90% full. (Put the files in
different directories if you feel this is fair.)

Run the programs again.

You'll notice that on a decent 7200 RPM drive the ZIP code is writing about
40-50MB/sec on a good drive. You'll probably be lucky to get 4-5MB/sec from
the file code (unless you're on an SSD, even then it will probably be half)

If you feel like it also create a database and do the same with a file table
like "CREATE TABLE data (id int NOT NULL PRIMARY KEY, data varbinary(8000))"

Inject 1000 to 2000 rows between commits.

The DB should also outperform the file code, but probably not the ZIP because
it has to write the data twice.

The reason is that writing file to disk quickly produces a random IO pattern
as the FS tries to allocate files all over the disk. Also, everytime you
create a file you have to update a whole bunch of file descriptors, a modern
file system is a B-tree and journal. Sounds awfully like a database right?
Except this database is optimized for large files, at least 32-64kbytes and
preferably in the hundreds of MB.

~~~
gnaritas
You're biassing the test towards small files and particular file systems that
don't deal with them as well, and edge cases where you think the db is highly
optimized; I'd imagine that's because you know outside those bounds the raw
file system will kick its butt, especially given the tools that let you avoid
the application entirely by streaming the file from disk.

~~~
AlisdairO
In fairness, while his test is biased towards dbs, it's also testing the
claims that he made in his top level comment, which specified smaller files.

~~~
gnaritas
Which again, presumes I have to use multiple files. I can append small data to
a log file faster than I can insert small rows of data to a database, and with
much less effort.

And my main point was, speed is not the reason we use databases, transactional
semantics and query capabilities are.

~~~
AlisdairO
Right, but he was talking about image files.

edit: It's not that your point is wrong by any means, it's just that you're
arguing against a point that he wasn't really emphasising. I don't think
anyone, including him, would try to tell you that constant appending to a log
file would be faster through a db than just a direct write to the file.

------
iRobot
Ouch, I do all 3!

------
nl
I think most of the objections people are having to this article can be
overcome if you say the author is talking about traditional, SQL based
Databases.

~~~
nl
Umm.. well obviously some disagreed with that.

Here's how I see it:

Logging isn't ideal in traditional databases, because then you have to scale
writes. Scaling reads is fairly easy in a traditional database, but scaling
writes is much harder compared to something like HBase or Cassandra

Image storage in a traditional database will always suffer compared to other
options. Filesystems are faster and easier to use, and things like S3 match
requirements better in some cases.

I'm not really seeing anyone argue that ephemeral data is better in a Database
than something like redis.

~~~
chris_wot
If its a problem with writes, then why not put the logging tables onto a
different disk? Surely that would fix write contention?

~~~
nl
Cassandra can do over a million writes per second[1] because of a combination
of its horizontal scaling and eventual consistency. I expect HBase could scale
similarly.

By comparison, people write about scaling MySQL to 25000 writes per second[2]

It's very hard to get a single server to scale multiple orders of magnitude
higher (even with FusionIO or something). Getting around that single master
server for writes is possible, but either complicated or expensive. Even then
I doubt you will get to 1 million writes per second without sharding, and if
you are going to shard you might as well go to a NoSQL solution built for the
problem.

[1] [http://techblog.netflix.com/2011/11/benchmarking-
cassandra-s...](http://techblog.netflix.com/2011/11/benchmarking-cassandra-
scalability-on.html)

[2] [http://www.spoutingshite.com/2011/01/27/scaling-mysql-
to-250...](http://www.spoutingshite.com/2011/01/27/scaling-mysql-
to-25000-inserts-per-second/)

~~~
chris_wot
That is indeed extremely impressive! I hope I didn't come across as meaning
that NoSQL solutions aren't appropriate - it's very hard for me to say this
anything about these solutions, as I've got next to no experience with this
technology.

I am very interested in the first article though. It mentions number of
instances in an EC2 availability zone - what does this mean? Is this similar
to a clustering/load balanced solutions in traditional RDBMSes?

 _The automated tooling that Netflix has developed lets us quickly deploy
large scale Cassandra clusters, in this case a few clicks on a web page and
about an hour to go from nothing to a very large Cassandra cluster consisting
of 288 medium sized instances, with 96 instances in each of three EC2
availability zones in the US-East region. Using an additional 60 instances as
clients running the stress program we ran a workload of 1.1 million client
writes per second._

 _Edit:_ Argh! Sorry, I just didn't read far enough before I posted. They seem
to be being powered by some fairly heavy duty hardware. Is there a comparitive
study using Oracle RAC or something like this?

