
He's only experienced with MySQL? How can he judge the SQL vs NoSQL battle when he's never used a proper SQL system? NoSQL does not 'save development time' in general; it's just a different tool, and a much younger and less refined one at that. Real RDBMSs do a whole lot more than execute your SQL queries for you.



I don't think there's a SQL versus NoSQL battle, whatever that means.

SQL refers to relational databases, i.e. databases built on the "relational model" of representing data: http://en.wikipedia.org/wiki/Relational_model

This means that any SQL database is very flexible with regard to what you can store in it, not to mention that it is based on proven theory and battle-tested implementations of various features, like ACID.

But the relational model also breaks down when you work with data structures that don't map well onto it -- like graph data. It also breaks down when you want to spread your data across many servers, and it is not well suited to storing and querying billions of records -- sooner or later, your indexes are going to outgrow whatever storage / RAM capacity your servers have.
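
To make the graph point concrete, here is a minimal sketch (the table and column names are hypothetical) of how graph-shaped data typically gets shoehorned into the relational model: an edge table plus a recursive query, where every extra hop of traversal is effectively another self-join.

  -- Hypothetical edge table for a "who follows whom" graph
  CREATE TABLE follows (
    follower_id INTEGER NOT NULL,
    followee_id INTEGER NOT NULL,
    PRIMARY KEY (follower_id, followee_id)
  );

  -- Everyone reachable from user 42 within 3 hops. Each level of the
  -- recursion is another self-join on the edge table, which is why deep
  -- or unbounded traversals get painful fast.
  WITH RECURSIVE reachable(id, depth) AS (
    SELECT followee_id, 1 FROM follows WHERE follower_id = 42
    UNION
    SELECT f.followee_id, r.depth + 1
      FROM follows f
      JOIN reachable r ON f.follower_id = r.id
     WHERE r.depth < 3
  )
  SELECT DISTINCT id FROM reachable;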

Btw, MySQL is a real RDBMS. Even if it lacks some features, it doesn't lack anything essential to calling it "relational" or to talking about the advantages and disadvantages of RDBMSs versus key-value stores or other NoSQL types.

-----


Facebook seems to be doing just fine with their graph data on SQL.

-----


There has been talk for some time that their MySQL issues are crippling progress and development due to the complexity of management and upkeep.

http://gigaom.com/cloud/facebook-trapped-in-mysql-fate-worse...

-----


Stonebraker has something to sell them, though.

-----


Very true.

-----


He spearheaded the adoption of MongoDB and probably Redis at Craigslist. That's more action than most commenters will see in their entire careers.

-----


He was also a major contributor to one of the best books on MySQL. The one that Monty worked on.

The guy knows databases, period.

-----


Proper SQL? Nice No True Scotsman.

Everybody has a pet feature in their preferred SQL DB that they think makes it a "real" SQL database, Postgres and Oracle people in particular. I agree that MySQL is a bit janky, but get real.

-----


Well, the 'pet feature' that MySQL lacks is 'the ability to efficiently join between tables'. I think that is pretty high up on the list of important features for a relational database.

Saying this is a 'no true Scotsman' is a straw man on your part. Lots of people only have experience with MySQL and think it is somehow representative of the quirks and performance characteristics of all relational databases, which is simply not true. MySQL is popular, and that is the entirety of the list of positive things I can say about it.

It's buggy: known segfault bugs shipped in final release versions. It's slow: when we were moving away from MySQL, I saw an identical, fairly simple query run side by side on Postgresql and MySQL against identical schemas, identical indexing, and identical data; in Postgresql the query in question ran in about 4 seconds, on MySQL it never finished, even after hours. It's bizarre: many table engines parse and then silently ignore all FK constraints. It's non-transactional, unless you use one of the transactional table engines, and then it is slow. It's poor with concurrency: several table engines have table-level locks, the MVCC ones have silly concurrency limitations, and until recently Innodb could only use 4 cores concurrently, due to some weird implementation detail. And it will fuck you in sneaky ways: a VARCHAR(N) can hold N bytes, so if you store unicode data you can't necessarily store N characters. That's not the only absurd and horrifying wart; there are literally dozens.
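
As a concrete illustration of the 'bizarre' and 'sneaky' items, a rough sketch (exact behaviour depends on MySQL version and sql_mode): MyISAM parses FOREIGN KEY clauses and silently throws them away, and under the historical non-strict sql_mode oversized values are silently truncated rather than rejected.

  -- MyISAM parses this FOREIGN KEY clause and silently discards it.
  CREATE TABLE parent (id INT PRIMARY KEY) ENGINE=MyISAM;
  CREATE TABLE child (
    id        INT PRIMARY KEY,
    parent_id INT,
    FOREIGN KEY (parent_id) REFERENCES parent(id)
  ) ENGINE=MyISAM;

  -- Succeeds even though parent 999 does not exist: no error, no constraint.
  INSERT INTO child VALUES (1, 999);

  -- Under the historical non-strict sql_mode, oversized data is silently
  -- truncated to fit (a warning, not an error).
  CREATE TABLE t (s VARCHAR(3));
  INSERT INTO t VALUES ('hello');  -- stores 'hel'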

-----


> In Postgresql the query in question ran in about 4 seconds,

For someone accustomed to the response times and behaviors of things like MongoDB and Riak, the fact that you consider this a high mark of your comparison is borderline comical and really only proves my point.

JOINs, used as prolifically as they tend to be with an RDBMS, are an abstraction that doesn't work in a distributed environment without aggressive data locality and eventual consistency. The model is broken for a range of needs, regardless of whatever quibbling the SQL fans want to do over their pet database.

-----


I agree that JOIN starts to fall apart in a distributed environment. However, with replication and a proper understanding of the level of consistency, or lack thereof, your application can tolerate, a single database can scale quite far.

In fact, you will only run into three sorts of major deal-breakers with the big-DB + replication model. The first is if your data is simply too large to fit in a single DB to begin with; that is quite a lot of data, but big datasets are certainly all over the place. The second is when the write load is so high that the master database cannot keep up, or the slaves spend most of their resources just keeping up with replication. The last is when you need very fast access to the data, faster than a general-purpose SQL engine can provide (think Kayak, search engines, etc.).

However, NoSQL isn't a solution to any of these problems. NoSQL datastores provide the storage component you can use to build a system that handles large datasets with huge churn. I don't mean this in the trivial 'the database isn't the application' sense; I mean it in the 'you are going to end up implementing a JOIN-like construct in application code with NoSQL' sense.

The query in question, the one that took 4 seconds, simply could not be implemented in a NoSQL system, except by implementing all the logic of the query in application code, and even then there is no assurance it would actually be as fast or faster unless your implementation is smarter than the query planner of whatever relational DB you are comparing against. Relational databases are a general-purpose tool, so in theory you can always implement a custom solution that will outperform any given query. However, your solution is at that point fast but inflexible: if you need to slice the data a different way, or produce reports that join or group the data differently than you had anticipated, a NoSQL solution will require you to write far more code, code that is harder to understand and to verify the correctness of than a SQL query.

NoSQL solutions also push a lot of data consistency tasks into the application code. In a 'good' relational database, you can put lots of controls in place to ensure that the data stays completely consistent, and you can even add some 'sniff test' constraints to ensure that the data at least conforms to the shape you expect. There is nothing that scares me more as an application developer than data quietly being corrupted; such a scenario can cost months of developer hours to correct. Really robust constraints don't completely eliminate the possibility, but they do catch a reasonable percentage of the bugs that would otherwise silently corrupt data, and they catch them early, without any change to the underlying data (the transaction rolls back). Your application code is therefore much simpler, easier to test, and faster to implement.
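
For example, a minimal sketch of what those 'sniff test' controls look like in Postgresql-flavoured SQL (the table and column names are hypothetical, and it assumes a users table already exists):

  CREATE TABLE orders (
    id         BIGSERIAL PRIMARY KEY,
    user_id    BIGINT  NOT NULL REFERENCES users(id),
    quantity   INTEGER NOT NULL CHECK (quantity > 0),
    unit_price NUMERIC(10,2) NOT NULL CHECK (unit_price >= 0),
    status     TEXT NOT NULL CHECK (status IN ('pending', 'paid', 'shipped'))
  );

  -- A buggy code path that tries to write quantity = -3 fails loudly here,
  -- the transaction rolls back, and the existing data is untouched.
  INSERT INTO orders (user_id, quantity, unit_price, status)
  VALUES (42, -3, 9.99, 'pending');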

-----


>The query in question, the one that took 4 seconds, simply could not be implemented in a NoSQL system, except for implementing all the logic of the query in application code

I sincerely doubt that.

>NoSQL solutions also push a lot of data consistency tasks into the application code.

Sometimes, not necessarily. Depends on the consistency model and how much you care about consistency in each case. SQL also forces you to implement eventual/less-than-immediate consistency in various forms just so your application doesn't fall apart.

Let's readdress the first thing I responded to by probing how aware you are of what is out there.

If one were to implement, and I mean this in the simplest possible terms, a join and a group by in Riak, what would be the most straightforward means of doing so?

-----


I'm certainly not a Riak expert; however, I did find a blog post which claims that this:

  SELECT addresses.state, COUNT(*)
    FROM people
    JOIN addresses 
         ON people.id = addresses.person_id
   WHERE people.age < 18
  GROUP BY addresses.state

translates into this Riak abortion:

  { "input":"people",
  "query":[
    {"map":{"language":"javascript", "name":"Riak.mapValuesJson"}},
    {"map":{
       "language":"javascript",
       "source":"function(value, keyData, arg){ 
         var data = Riak.mapValuesJson(value);
         if(data.age && data.age < 18) 
           return [[value.bucket, value.key]]; 
         else 
           return []; 
        }"
      }
    },
    {"map":{
       "language":"javascript",
       "source":"function(value, keyData, arg){ 
         var data = Riak.mapValuesJson(value);
         if(data.address && data.address.state) { 
           // a JS object literal cannot use a dotted expression as a key,
           // so build the {state: 1} pair dynamically
           var out = {}; 
           out[data.address.state] = 1; 
           return [out]; 
         } else 
           return []; 
        }"
      }
    },
    {"reduce":{
       "language":"javascript",
       "source":"function(values, arg){ 
         return values.reduce(function(acc, item){
           for(state in item){
             if(acc[state])
              acc[state] += item[state];
             else
              acc[state] = item[state];
           }
           return acc;
         });
        }"
      }
    }
  ]
  }

Can you do that same SQL query in Riak in a simpler way? Will the Riak query ever benefit from indexing, or does it map against the entire dataset every time?

Also, in the Riak query (and the example SQL) there is a hardcoded 18. In SQL you can use bind parameters to ensure that there is no chance of an injection attack. How would you do this in Riak? It seems like you would have to escape the value into the JS code you are sending to the server? Code generation is a huge pain, and string concatenation isn't going to cut it, but that sad state of affairs is what I see here: https://github.com/basho/riaktant/blob/master/examples/log-h...

That code is linked from the Riak documentation, and it is ugly as sin. Do you have to write queries against indexes based on what was indexed? In Postgresql, the query planner will make use of whatever indexes are available if they will help performance, but the SQL will execute with or without those indexes. In Riak it appears that it will simply map-reduce over all the data unless you specify a 'search' something-or-other to narrow it down, but that 'search' functionality sits outside the map-reduce, so the developer has to know which fields are indexed and which are not, and there isn't much hope of more complex index-enabled joins.
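
For contrast, the bind-parameter point above is trivial on the SQL side. A minimal sketch using Postgresql's PREPARE/EXECUTE (any driver-level placeholder would do the same job): the 18 travels as data, not as code, so there is nothing to escape and no string concatenation.

  PREPARE minors_by_state (integer) AS
    SELECT addresses.state, COUNT(*)
      FROM people
      JOIN addresses
           ON people.id = addresses.person_id
     WHERE people.age < $1
     GROUP BY addresses.state;

  EXECUTE minors_by_state(18);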

-----


Just LOOK at all the dev time you save not creating DB tables! /sarcasm

-----


Meaningless copypasta; I was asking for a simple description of what it provides that would enable this.

-----


Sorry, Professor!

I feel like I went out of my way to satisfy your arbitrary demand, and throughout this entire 'debate' you are just insisting I am wrong, but not backing it up with any facts, evidence, or even arguments.

-----



