

MySQL and Memcached: End of an Era? - alrex021
http://highscalability.com/blog/2010/2/26/mysql-and-memcached-end-of-an-era.html

======
patio11
I think there is a huge, huge gap between TwiDiBook's needs and the needs of
the overwhelming majority of sites on the Internet. There are plenty of
applications which do real things for real people which will _never_ have
scaling issues associated with them. The vast majority of the remainder of
applications will have issues which are tractable by fairly simple solutions.
And then we get to the pathbreakers who have to invent new methods of
engineering to keep up with their growth curves.

The party is just getting started for "Hey, check it: we can put a cache in
front of our database" for that broad middle of the curve. (This includes
applications where relational databases are a great fit but for certain
performance requirements in small bits of the program. For example, you
_really_ don't want to have to implement a university backend system in a
key/value store. Just trust me on this one. It is virtually made for SQL, but
while 99% of the use of the system is less than performance intensive, having
resources in place to support that last 1% costs the university hundreds of
thousands of dollars. I think we could slash that with creative caching.)
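The "cache in front of our database" idea patio11 describes is usually the cache-aside pattern: check the cache, fall back to the database on a miss, and store the result with an expiry. A minimal sketch (names are hypothetical, and a plain dict stands in for memcached):

```python
import time

class CacheAside:
    """Minimal cache-aside wrapper: check the cache first, fall back
    to the (slow) database query on a miss, store the result with a TTL."""

    def __init__(self, db_query, ttl_seconds=300):
        self.db_query = db_query          # function: key -> value (the "slow" path)
        self.ttl = ttl_seconds
        self._store = {}                  # stand-in for a memcached client

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None:
            value, expires = entry
            if time.time() < expires:
                return value              # cache hit
        value = self.db_query(key)        # cache miss: hit the database
        self._store[key] = (value, time.time() + self.ttl)
        return value

    def invalidate(self, key):
        self._store.pop(key, None)        # call this after writes to the DB
```

The point of the pattern is exactly the 99%/1% split above: the relational schema stays intact, and only the hot 1% of reads ever need to care that the cache exists.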

~~~
petewarden
I'd agree with all that, but add a disclaimer. The rise of cheap and easy
solutions for handling Big Data makes a lot of new applications feasible. All
those cool correlations that Amazon does to suggest new books? You can do that
at home now. Got a problem that would benefit from running an analysis of
hundreds of millions of pages? You don't have to be Google to implement that.

It is a bit maddening to see people applying the NoSQL hammer to problems that
don't need scalability, but it's partly driven by the realization that they
open up opportunities to build entirely new applications.

~~~
mcav
I know there's Hadoop, etc for map/reduce; what is available for correlations
like Amazon?

~~~
bad_user
Amazon's correlations aren't done in real time, and I don't get why it would
be an issue using MySql for it. Maybe it is for Amazon, which has plenty of
data to analyze, but not for a startup.

Personally I don't trust NoSql solutions because they aren't mature enough.
I've had few problems using MySql/Postgres and never had loss of data ... and
I can always come up with a non-relational schema later, built on top ... when
I really need it.

The other problem I've had with Nosql is that capabilities vary from
implementation to implementation, and my software really evolves a lot ... I
don't know what my needs are from the start.

I've also used MySql instead of Memcached successfully for a website with tens
of millions of hits per day, preferring it because Memcached crashes on me a
lot and there are big differences between versions that I don't want to worry
about ... and yeah, I know async reads are problematic with MySql/Postgres,
but I just threw a web service in front of it, and when using the Memory
storage engine it really kicks ass.
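The trick bad_user describes is MySQL's Memory engine (`CREATE TABLE ... ENGINE=MEMORY`): a table held entirely in RAM, used as a cache for hot, regenerable data. A rough sketch of the idea, with sqlite's `:memory:` database standing in for the MySQL engine (table and function names are made up for illustration):

```python
import sqlite3

# Stand-in for a MySQL MEMORY table: an in-memory SQL table holding
# hot, regenerable data. With real MySQL this would be
# CREATE TABLE page_cache (...) ENGINE=MEMORY behind your usual client.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_cache (url TEXT PRIMARY KEY, body TEXT)")

def cache_put(url, body):
    # Upsert the cached value; in MySQL this would be INSERT ... ON
    # DUPLICATE KEY UPDATE (or REPLACE INTO).
    conn.execute("INSERT OR REPLACE INTO page_cache VALUES (?, ?)", (url, body))

def cache_get(url):
    row = conn.execute(
        "SELECT body FROM page_cache WHERE url = ?", (url,)).fetchone()
    return row[0] if row else None
```

Compared to Memcached you keep SQL semantics (indexes, range queries) on the cached data, at the cost of it living inside the database server's memory.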

~~~
strlen
> Amazon's correlations aren't done in real time, and I don't get why it would
> be an issue using MySql for it.

Many of the algorithms for recommender systems are especially known for being
parallelizable and would be extremely difficult to implement reliably and
quickly on top of MySQL (or Postgres or Oracle). I can't even think of a way
to do so without building a fairly complex ETL cycle with a separate DWH and a
main serving DB (with points at which the site becomes degraded and possibly
unavailable). As I've mentioned earlier, with a system like Hadoop you could
start off by implementing these algorithms in Perl or Python (making use of
the extensive libraries already available in those languages).

Recommendations may not be real time, but there's a _huge_ time difference
between doing even a fairly simple "market basket"-type analysis in MySQL on a
data set orders of magnitude smaller than Amazon's, and doing it on top of
Hadoop + HBase/Voldemort/Cassandra (i.e., compute in Hadoop, serve from a
NoSQL db which offers Hadoop integration). You could even start by doing the
computation in Hadoop and loading the data into MySQL (while, perhaps, serving
the data from a slave db).
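The core of the "market basket" analysis mentioned above is just pairwise co-occurrence counting, which is what makes it so parallelizable: emitting item pairs is a natural map step and summing counts a natural reduce step. A toy single-machine sketch (function names are my own, not from any of the systems named):

```python
from collections import Counter
from itertools import combinations

def cooccurrence(baskets):
    """Count how often each pair of items appears in the same basket --
    the raw material for a 'customers who bought X also bought Y' table.
    In a Hadoop job, emitting the pairs is the map step and summing
    the counts is the reduce step."""
    counts = Counter()
    for basket in baskets:
        # sorted() canonicalizes the pair so (a, b) == (b, a)
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
    return counts

def also_bought(item, counts, top_n=3):
    """Top items most frequently co-purchased with `item`."""
    related = Counter()
    for (a, b), n in counts.items():
        if a == item:
            related[b] = n
        elif b == item:
            related[a] = n
    return [x for x, _ in related.most_common(top_n)]
```

At Amazon scale the `baskets` iterable won't fit on one machine, which is exactly where the Hadoop version of the same two steps takes over.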

> Maybe it is for Amazon, which has plenty of data to analyze, but not for a
> startup.

Your start-up doesn't need to have tons of users to have tons of data. This is
especially true if you're building a start-up to make sense of data that's
already out there (some YC examples are Flightcaster and Directed Edge). For a
consumer start-up, the ability to have cheap and scalable storage means you
can record data _much more frequently_.

> The other problem I've had with Nosql is that capabilities vary from
> implementation to implementation, and my software really evolves a lot ... I
> don't know what my needs are from the start.

Isn't this the point of separating your data model from your presentation
layer? Begin with MySQL (or even simpler, with BerkeleyDB) -- change it to
whatever else suits your needs.
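The separation strlen is pointing at is usually done by coding the application against a narrow storage interface, so the backend can change without touching callers. A minimal sketch of that shape (class and method names are illustrative, not from any particular framework):

```python
from abc import ABC, abstractmethod

class UserStore(ABC):
    """Narrow storage interface the rest of the app codes against.
    Moving from MySQL to BerkeleyDB (or a NoSQL store) later means
    writing one new subclass, not rewriting the callers."""

    @abstractmethod
    def save(self, user_id, record): ...

    @abstractmethod
    def load(self, user_id): ...

class DictStore(UserStore):
    """In-memory stand-in; a MySQLStore or BDBStore would implement
    the same two methods against the real backend."""

    def __init__(self):
        self._data = {}

    def save(self, user_id, record):
        self._data[user_id] = record

    def load(self, user_id):
        return self._data.get(user_id)
```

The design choice being made is to keep the interface small enough that even the least capable candidate backend can implement it, which is exactly the hedge against "I don't know what my needs are from the start."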

------
JulianMorrison
The Digg link said it best:

 _"Since it was already necessary to abandon data normalization and
consistency to make these approaches work, we felt comfortable looking at more
exotic, non-relational data stores"_

In other words, to get MySQL to super-duper scale, you have to turn it into
NoSQL anyway. At which point, the only reason to be using MySQL at all is
inertia, because it's pretty clunky compared to the alternatives. Back in the
day before Cassandra et al., there wasn't a concept of NoSQL, and sharding
MySQL looked like an exotic way of using a relational DB. Actually, no. It's a
way of crudely emulating something that isn't a relational DB.
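The "crude emulation" JulianMorrison describes is hash-based sharding: route each key to one of N independent backends, at which point cross-shard joins and transactions are gone and each shard is effectively a key/value table. A toy sketch (dicts stand in for separate MySQL instances; names are mine):

```python
import hashlib

class ShardedKV:
    """Key-based sharding sketch: route each key to one of N backends
    by hashing it. Each 'shard' here is a dict standing in for a
    separate MySQL instance holding a plain key/value table. Note
    there is no way to join or transact across shards -- the
    normalization/consistency loss the Digg quote describes."""

    def __init__(self, n_shards=4):
        self.shards = [{} for _ in range(n_shards)]

    def _shard_for(self, key):
        # Stable hash so the same key always lands on the same shard;
        # md5 here only for determinism, not security.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.shards[h % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        return self._shard_for(key).get(key)
```

Once the application is written against `put`/`get`, the argument goes, whether the shards underneath are MySQL instances or a purpose-built store like Cassandra is an implementation detail.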

------
tszming
... I’m just suggesting do not just grab advice from the Internet or friends
tip and do not complicate beyond the need. ...(Peter)
[http://www.mysqlperformanceblog.com/2009/03/01/kiss-kiss-
kis...](http://www.mysqlperformanceblog.com/2009/03/01/kiss-kiss-kiss/)

