
Strategy: Drop Memcached, Add More MySQL Servers - iamelgringo
http://highscalability.com/strategy-drop-memcached-add-more-mysql-servers
======
mdasen
I would tend to disagree with that strategy.

First, memcached is fast. Allocations are made in O(1) time (think: the cost
stays flat no matter how much you store) and it is non-blocking. Anytime you
want to look up an object, you just call for it.

With sharding, the process is more difficult. First, you query for the shard
of the object, then you query that shard. OK, not too bad. What happens when
you want to get all the friends of Adam? Well, you query for Adam's shard, get
the list of his friends, then query for each of those friends' shards, then
query each of the friends' shards for their data. Ugly, slow, gross! OK, so
you denormalize! Only problem with that is that you then have to do a lot more
queries on writes (which are the hard part to begin with). Plus, what if you
want to search for users named 'Bob'? Well, typically sharding involves a
setup like table, pk, shard - to relate an item to which shard it's on. If
you're looking up by the primary key, you're gold. Not so much on the other
fields. Yeah, you can query each shard, have it send back results and combine
them yourself - in fact you could even automate that process in a proxy - but
it isn't the most wonderful solution.
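
To make the lookup cost concrete, here is a rough Python sketch of the (table, pk, shard) directory approach. Everything in it is made up for illustration: the shards are plain dicts standing in for MySQL servers, and the names SHARDS, DIRECTORY, get_user, get_friends and find_by_name are hypothetical.

```python
# Hypothetical in-memory stand-ins for two shards; a real setup would hold
# MySQL connections here instead of dicts.
SHARDS = {
    0: {1: {"name": "Adam", "friends": [2, 3]}, 3: {"name": "Bob", "friends": [1]}},
    1: {2: {"name": "Carol", "friends": [1]}, 4: {"name": "Bob", "friends": []}},
}

# Directory relating (table, pk) to the shard that holds the row.
DIRECTORY = {("users", 1): 0, ("users", 2): 1, ("users", 3): 0, ("users", 4): 1}


def get_user(pk):
    shard_id = DIRECTORY[("users", pk)]  # lookup 1: which shard has this row?
    return SHARDS[shard_id][pk]          # lookup 2: fetch the row from that shard


def get_friends(pk):
    # One directory lookup plus one shard query per friend, on top of the
    # two lookups needed to load the user in the first place.
    user = get_user(pk)
    return [get_user(friend_id) for friend_id in user["friends"]]


def find_by_name(name):
    # No directory entry helps with a non-key field, so every shard is
    # queried and the partial results are merged by hand.
    matches = []
    for shard_id in sorted(SHARDS):
        for pk, row in SHARDS[shard_id].items():
            if row["name"] == name:
                matches.append(pk)
    return sorted(matches)
```

Looking up by primary key costs two hops, fetching friends costs two hops per friend, and find_by_name has to touch every shard.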

That scenario in memcached works much nicer - mostly because memcached is
basically a sharded database to begin with. You don't have to look up which
memcached server an object is on. Just pass it the ids you want ('person:5',
'person:22', 'person:900') and it will grab them from the appropriate shards
and send them back! The problem is that memcached doesn't keep multiple copies
so it can't be used for persistence. No problem, MySQL will handle that!
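
A toy Python sketch of why no directory is needed: the server is picked by hashing the key, which in practice is usually done inside the client library. The ShardedCache class, its method names, and the choice of CRC32 are assumptions for illustration, not any particular client's API.

```python
import zlib
from collections import defaultdict


class ShardedCache:
    """Toy model of a memcached pool; each 'server' is just a dict."""

    def __init__(self, n_servers):
        self.servers = [dict() for _ in range(n_servers)]

    def _server_index(self, key):
        # The key alone decides the server, so there is nothing to look up.
        return zlib.crc32(key.encode("utf-8")) % len(self.servers)

    def set(self, key, value):
        self.servers[self._server_index(key)][key] = value

    def get_multi(self, keys):
        # Group keys by server so each server is asked only once.
        by_server = defaultdict(list)
        for key in keys:
            by_server[self._server_index(key)].append(key)

        found = {}
        for index, server_keys in by_server.items():
            server = self.servers[index]
            for key in server_keys:
                if key in server:  # misses are simply left out
                    found[key] = server[key]
        return found
```

The caller passes 'person:5', 'person:22', 'person:900' in one get_multi call and never learns which server held which key.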

What's really missing from the whole debate is that when you get to a certain
point in your application, you just don't get to query like you used to. With
memcached or sharding, you're going toward limiting yourself to keys -
somewhat similar to Google's BigTable (a dumb store).

With AppEngine, Google took a huge step to making this more usable to
programmers. By adding an index on each field (with the option for indices on
multi-field lookups), AppEngine allows you to have a sharded database without
sacrificing all querying capability. So, the big question is, why isn't
someone developing a system that works like that - spews your data all around,
but keeps an index available for some query capability? In fact, AppEngine's
datastore queries work shockingly similarly to views in CouchDB - look at how
scanning a range is done in each and you'll get an idea.
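
As a loose illustration of the "index plus range scan" idea (a sketch of the general technique, not how AppEngine or CouchDB actually implement it), here is a sorted index over one field kept next to a key-value store. The names store, name_index and scan_range are hypothetical.

```python
import bisect

# The "dumb store": rows are only reachable by key.
store = {
    "u1": {"name": "Adam"},
    "u2": {"name": "Bob"},
    "u3": {"name": "Bob"},
    "u4": {"name": "Carol"},
}

# A sorted index of (field value, key) pairs kept alongside the store.
name_index = sorted((row["name"], key) for key, row in store.items())


def scan_range(index, low, high):
    """Return keys whose indexed value v satisfies low <= v < high."""
    # A 1-tuple sorts before any 2-tuple that starts with the same value,
    # so (low,) lands just ahead of the first entry with value >= low.
    start = bisect.bisect_left(index, (low,))
    end = bisect.bisect_left(index, (high,))
    return [key for _, key in index[start:end]]
```

An equality query is then just a narrow range, e.g. scan_range(name_index, "Bob", "Bob\0"), and the store itself is still only hit by key.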

In the meantime, MySQL/memcached in combination allow me to take advantage of
both sharding and querying with relative ease. So, I save Bob as a friend of
Adam in MySQL, then I update the entries in memcached for each of them by
querying MySQL for their records and friend_ids and put {'id': 12, 'name':
'Adam', 'friends': [1, 212, 78, 51]} and {'id': 212, 'name': 'Bob', 'friends':
[12, 491, 51, 999]} into memcached. When I want Adam and his friends, I run
two memcached calls - give me 12, give me (1, 212, 78, 51).
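
Roughly, in Python, with sqlite3 standing in for MySQL and a plain dict standing in for memcached so the sketch runs on its own. The table layout and the names refresh, add_friend and user_with_friends are made up for illustration.

```python
import sqlite3

# sqlite3 stands in for MySQL here; the schema is an assumption.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE friendships (user_id INTEGER, friend_id INTEGER);
INSERT INTO users VALUES (12, 'Adam'), (212, 'Bob');
""")

cache = {}  # stand-in for memcached


def refresh(user_id):
    # Rebuild the cached entry for one user from the database.
    name = db.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()[0]
    friends = [
        row[0]
        for row in db.execute(
            "SELECT friend_id FROM friendships WHERE user_id = ? "
            "ORDER BY friend_id",
            (user_id,),
        )
    ]
    cache[user_id] = {"id": user_id, "name": name, "friends": friends}


def add_friend(a, b):
    # Write to the database first, then update both cached entries.
    with db:
        db.executemany(
            "INSERT INTO friendships VALUES (?, ?)", [(a, b), (b, a)]
        )
    refresh(a)
    refresh(b)


def user_with_friends(user_id):
    user = cache[user_id]  # call 1: the user
    # Call 2: all friends at once (a multi-get against real memcached).
    friends = [cache[f] for f in user["friends"] if f in cache]
    return user, friends
```

After add_friend(12, 212), user_with_friends(12) never touches the database: one lookup for Adam, one batch lookup for his friends.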

That isn't perfect, but it can be a lot easier than sharding. At some point,
it becomes requisite to shard, but it's not fun so why not use memcached along
the way? "Don't prematurely optimize" comes to mind.

------
newt0311
Better strategy: drop MySQL and move to postgres.

~~~
djist
This seems to be the mantra of the PostgreSQL fanboys.

I use PostgreSQL. I have found, in my experience, that it has superior join
performance when compared with MySQL and that's what is important to me in my
line of work. That said, I'm not silly enough to think that performance =
scalability. Neither MySQL nor PostgreSQL scales naturally. They require
strategies to be implemented whether that be sharding, caching, replication,
etc.

This just isn't a useful suggestion for scaling.

~~~
kingkongrevenge
There is so much ink spilled about going through conniptions to scale mysql
and postgres. I really wonder if anyone has actually bothered to confirm that
it's not better to just buy some big iron and run db2 or oracle once you hit
scalability problems. There are a ton of corporate data centers running
oracle/db2 handling WAY more throughput than your web app ever will, and
they're not using memcached or sharding.

~~~
DanielBMarkham
It's a build-buy decision, right?

When you're seven guys sitting in a garage eating pizza, you build it because
it makes the most sense.

If you've got a million records going through an hour and you're making the
big bucks, get the commodity solution as close to wholesale as you can.

The problem is that if you're giving away your app for free, you're spending
dough -- perhaps lots of it. Then you're stuck on the back-end of the curve,
scrambling to tweak every little bit you can.

~~~
kingkongrevenge
> It's a build-buy decision, right?

Exactly, but the entry level DB2 and Oracle solutions are not super cost
prohibitive and time is money. We're talking about a pretty low man-hour
investment in futzing with MySQL before you'd have been better off calling an
IBM rep. If you're doing some popular free web app maybe you can stick a
"powered by IBM" icon in the corner and get the stuff for free. Remember
those?

