

Schema-Free MySQL vs NoSQL - bdotdub
http://www.igvita.com/2010/03/01/schema-free-mysql-vs-nosql/

======
jasonkester

      Instead of defining columns on a table, each attribute has its own table 
      (new tables are created on the fly), which means that we can add and 
      remove attributes at will. In turn, performing a select simply means 
      joining all of the tables on that individual key. 
    

... and now you know why nobody is doing this.

Back before NoSQL had a name (and fast implementations) and we would just
build key-value stores inside of relational databases and describe them to
people as "tall, skinny tables", this was how you would construct a query.

Of course, it was a lot easier in a skinny table than the method you describe
because you'd simply join the table back on itself n times rather than having
n tables sitting around in your database.
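For anyone who hasn't seen the "tall, skinny table" pattern, here's a minimal sketch, using SQLite as a stand-in for the relational database (table and column names are made up for illustration):

```python
import sqlite3

# One "tall, skinny" table holds every attribute as an (id, attr, value) row,
# instead of one column per attribute.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (id INTEGER, attr TEXT, value TEXT)")
conn.executemany("INSERT INTO kv VALUES (?, ?, ?)", [
    (1, "name", "alice"),
    (1, "city", "nyc"),
    (2, "name", "bob"),
    (2, "city", "sf"),
])

# Selecting n attributes means joining the table back on itself n times --
# here n = 2, one alias per attribute.
row = conn.execute("""
    SELECT a.value, b.value
    FROM kv a JOIN kv b ON a.id = b.id
    WHERE a.attr = 'name' AND b.attr = 'city' AND a.id = 1
""").fetchone()
print(row)  # ('alice', 'nyc')
```

Each additional attribute in the select list is another self-join, which is exactly where the query-time performance hit comes from.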

Regardless, doing things that way was generally regarded as a bad idea even
back in the 90's, and was only something to consider when you had highly-
configurable things to describe that didn't fit well into a relational
structure. Everybody realized that you suffered huge performance hits at
query-time, and you were essentially eliminating the possibility of reporting.

So no, given that NoSQL is all about using key/value stores to boost
performance by trading off reportability, this solution (which actually trades
both sides of that equation away) is not a replacement.

~~~
timtadh
The issue is that whether you are using a column store or simulating a column
store in a row store, you will have to do tuple reconstruction. Unless all of
your queries really only need one attribute, this will require at least one
join. Some column stores use clever techniques, like cracked columns, to make
these joins faster. This is why a column store may perform better than a row
store for specific kinds of queries. But if your queries are constantly asking
the dbms to reconstruct a relation with many attributes, a row store may
actually work better for you.

My point is that the decision to choose your dbms should be driven by the
kind of data you have and the kind of queries you are going to run. If
performance really matters to you, you should do some empirical testing
before choosing the technology. But there is some give and take here; one is
not definitively better than the other.

That said, I agree with you that this is a bad solution. Don't simulate a
column store with a row store; it won't perform as well as a good column
store, since a row store is not optimized for tuple reconstruction.

------
mjw
Interesting article.

I've recently been doing some load-testing of key-value stores for caching,
and also found that MySQL compares pretty favourably with some of the new kids
on the block.

Just talking a "create table kv (`key` char(64) not null, value blob not
null, index (`key`))" schema. (Plus some MySQL performance tweaks, which I
won't go into here.)

Some advantages:

* When it comes to redundancy and failover, you can re-use the existing master/slave replication functionality in MySQL, rather than worry about whether your k/v store of choice supports anything like this, how stable it is and how to configure and manage it etc. One less thing to worry about especially if you're already doing mysql replication for other data.

* MySQL can keep hot data in RAM and cold data on disk, and is pretty tuneable in this regard; some of the trendy k/v stores aren't so hot on this front (eg Redis, although they're addressing it in version 2)

* With InnoDB tables (which are pretty good for k/v functionality) you get the transaction support which can save your ass from some nasty race conditions with concurrent processes writing back to a cache. Especially handy if your main relational dataset is also in MySQL since you can update the normalised data and the cached data in one transaction

* You can join 'relational-style' tables directly to the 'k/v-style' tables, rather than having to fetch a bunch of IDs and then look them up (for some stores, one at a time) in a separate store. Which is a great when you have some infrequently-changed data that can be denormalised as cached blobs, and some frequently-changing data which refers to it.

* Possibly more...
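That last join advantage is easy to demonstrate. A rough sketch, with SQLite standing in for MySQL and made-up table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'k/v-style' cache table: opaque blobs keyed by a string.
conn.execute("CREATE TABLE cache (k TEXT NOT NULL, v BLOB NOT NULL)")
conn.execute("CREATE INDEX cache_k ON cache (k)")
# 'relational-style' table that refers to cached blobs by key.
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, cache_key TEXT)")

conn.execute("INSERT INTO cache VALUES ('user:1', ?)", (b'{"name": "alice"}',))
conn.execute("INSERT INTO events (cache_key) VALUES ('user:1')")

# One query joins the frequently-changing rows directly to their denormalised
# blobs, instead of fetching a bunch of IDs and looking them up one at a time
# in a separate store.
rows = conn.execute("""
    SELECT e.id, c.v FROM events e JOIN cache c ON c.k = e.cache_key
""").fetchall()
print(rows)  # [(1, b'{"name": "alice"}')]
```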

~~~
jayair
Could you go into a bit of detail about the performance tweaks that you are
referring to? If not, can you point me to some articles that do? Thanks.

I'm assuming you mean tweaking MySQL just for k/v store usage.

~~~
mjw
I have some stuff bookmarked at work which I can dig out tomorrow.

One thing was forcing InnoDB to cluster by insertion order by adding an
otherwise-redundant autoincrement primary key alongside the unique index on
the actual key column. Also tweaking various innodb-specific config settings.
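In DDL terms, that clustering trick looks something like this (a sketch only; table and column names are illustrative):

```sql
-- InnoDB clusters rows by primary key, so an otherwise-redundant
-- auto-increment PK forces insertion-order clustering; the actual
-- lookup key gets a secondary UNIQUE index instead.
CREATE TABLE kv (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  `key` CHAR(64) NOT NULL,
  value BLOB NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY (`key`)
) ENGINE=InnoDB;
```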

~~~
jayair
Thanks. We are doing something similar and are running into some constraints
with this setup.

~~~
mjw
Ah - would be interested to hear about those too!

------
m0th87
The OP's suggestions are pretty bad, as jasonkester points out. But for a
specific class of problems (ones that are both performance or JOIN intensive
and require the awesome indexing/sorting capabilities in SQL solutions),
schema-less MySQL, _when done "correctly"_, is awesome. I designed an
application that needed to do graph traversal in real-time, so normalized SQL
wasn't an option for me. I tried various NoSQL solutions, but they all either
lacked important features or ran insanely slow. I ended up settling with a
design similar to how FriendFeed uses MySQL:
<http://bret.appspot.com/entry/how-friendfeed-uses-mysql>

Not only was the solution faster, but it made the code so much more concise,
because you can just serialize Python objects to the database rather than
write a huge database abstraction layer. For me this is even more valuable
than the performance ramifications.
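For reference, the FriendFeed pattern boils down to something like this: a toy sketch, with SQLite standing in for sharded MySQL, JSON standing in for whatever serializer you prefer, and all names made up for illustration:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Entities are opaque serialized blobs; the database never sees their fields.
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body BLOB NOT NULL)")
# Attributes you need to query on get their own index table, maintained by
# application code rather than by a schema migration.
conn.execute("CREATE TABLE index_user (user_id TEXT, entity_id TEXT)")

def save(entity):
    body = json.dumps(entity).encode()  # serialize the whole object
    conn.execute("INSERT INTO entities VALUES (?, ?)", (entity["id"], body))
    conn.execute("INSERT INTO index_user VALUES (?, ?)",
                 (entity["user_id"], entity["id"]))

def find_by_user(user_id):
    rows = conn.execute("""
        SELECT e.body FROM index_user i
        JOIN entities e ON e.id = i.entity_id
        WHERE i.user_id = ?
    """, (user_id,)).fetchall()
    return [json.loads(r[0]) for r in rows]

save({"id": "e1", "user_id": "u1", "title": "hello"})
print(find_by_user("u1"))  # [{'id': 'e1', 'user_id': 'u1', 'title': 'hello'}]
```

Adding a new field to an entity needs no ALTER TABLE; only fields you want to query on need a new index table.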

If you're using Python+MySQL, be sure to check out Tornado's database wrapper
(<http://github.com/facebook/tornado/blob/master/tornado/database.py>)
because it handles all sorts of sharp corners that straight up MySQLdb won't
(like working with binary data, reconnecting when necessary, etc.)

~~~
reynolds
I'd be interested in comparing notes if you are. I implemented something
similar to what Bret Taylor described. Mine is pretty hacked and alpha but
I've used it in production. <http://github.com/davidreynolds/simplestore>

------
timtadh
If you want to look at a paper on the performance of such a system, check out
<http://cs-www.cs.yale.edu/homes/dna/papers/abadi-sigmod08.pdf>

