The thing with the NoSQL guys is that many of them seem not to be in a position to make an educated comparison. For example, an, uhh, enthusiastic MongoDB advocate recently informed me that MongoDB was superior to Oracle because in Oracle you had to poll a table to see if it changed. Except, no, that isn't actually true: http://docs.oracle.com/cd/B19306_01/appdev.102/b14251/adfns_... - and that document is from 2005. And you could do a trigger and an AQ message/callback 5 years before that (at least). You haven't needed to poll an Oracle database for changes in a loooong time.
Basically, every evangelism point, you have to double-check and cross-reference, because as you say, the NoSQL guys are encountering issues the RDBMS community addressed years ago (7 in my example, but the sharding stuff, 20+ years) - except they think they are discovering it for the first time!
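To make the trigger-plus-callback idea above concrete, here is a minimal sketch of the general pattern. It uses SQLite through Python's standard sqlite3 module as a stand-in, not Oracle's change notification or AQ API. The table, the trigger, and the names notify_change and on_change are all made up for illustration.

```python
import sqlite3

# Collected change events; a real system would enqueue or dispatch these.
events = []

def on_change(row_id, new_value):
    """Hypothetical callback, invoked by the trigger for each updated row."""
    events.append((row_id, new_value))
    return None

conn = sqlite3.connect(":memory:")
# Expose the Python callback to SQL under the made-up name notify_change.
conn.create_function("notify_change", 2, on_change)

conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TRIGGER accounts_changed AFTER UPDATE ON accounts
    BEGIN
        SELECT notify_change(NEW.id, NEW.balance);
    END;
""")

conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
# The update fires the trigger, which calls back into application code.
conn.execute("UPDATE accounts SET balance = 250 WHERE id = 1")
conn.commit()
```

The application never re-reads the table to find out what changed; the database pushes the change to it, which is the point being made about not needing to poll.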
Programming is more like fashion than science in this regard. Every decade or so something truly new happens in the software world. All the rest is mostly sound and fury, signifying nothing. If you're young or new to programming it's easy to mistake the buzz around things like NoSQL for innovation when they are usually re-discoveries of old (and often discarded or obsolete) ideas dressed up in new clothes.
There's also the tendency to favor new shiny things and reject old crufty (but proven) things, to want to be part of what seems like the leading edge, to be that guy in the cube farm who is playing with the cool new stuff.
I have been programming longer than RDBMSs have been available, so I know from experience what it's like to manage large databases in application code, and how hard it can be to maintain consistency or do accurate queries and aggregation with half-baked tools. It's frustrating to see a new generation of programmers go through this, but it's human nature to ignore the past.
My fourteen-year-old son wears his pants pulled down below his waist, Vans shoes, hoodies, lots of hair. He looks pretty much like I did when I was fourteen back in the 1970s. The underlying technologies are the same: pants, shirt, shoes, sweater, hair. The only differences are superficial. To him that style is edgy and contemporary and something his parents don't get. NoSQL is the gangster fashion of programming right now.
I cannot remember a more aggravating discussion on Hacker News.
A lot of people who are working with / building and using 'NoSQL' databases are the very same people building the RDBMS tools that you are so eager to defend.
You have stated elsewhere that you do not know anything about 'NoSQL' stores, so instead of insulting a huge number of people far more experienced than yourself, why don't you attempt to learn? (Or at the very least, avoid disregarding the thoughts of a huge sector of the industry.)
> I cannot remember a more aggravating discussion on Hacker News.
I suggest taking exchanges of opinion less personally.
> You have stated elsewhere that you do not know anything about 'NoSQL' stores
No, I didn't. I have read about them, gone to conferences, and gone through tutorials for a few NoSQL products, but haven't had a use for them in my own work. Since I have 35+ years of programming and database experience I am not viewing these things through the eyes of a newb. And I have lots of experience with database management pre-RDBMS. I did say I haven't found any use for the current batch of NoSQL tools in my own projects, but obviously lots of other people have. And I have clearly indicated that everything I write is my opinion based on my experience, not word from on high. Again, get a grip.
> instead of insulting a huge number of people far more experienced than yourself
No insult intended. My opinions, your mileage may vary. Ad hominem attacks are, on the other hand, deliberate insults.
> why don't you attempt to learn? (or at the very least, avoid disregarding the thoughts of a huge sector of the industry)
Thanks for the career advice. My advice to you and some of the other commenters frothing at the mouth is to disconnect your ego from your preferred tools.
In fact I was watching a presentation somewhere (I forget where), where folks were using NoSQL databases as a sort of pre-processing layer for the RDBMS. In this regard it's very useful on the high-end.
Where I am less convinced is outside the high-end, and outside the idea of a transition tier between input and RDBMS (or even between RDBMS and output). As a persistence layer, NoSQL is applicable largely to the same subset of cases that Object Oriented Databases were. Here the GP is right that the industry is keen on relearning the same lessons every couple of decades. For pre-processing and post-processing, however, there seems to me to be a much larger set of use cases, but again only where the RDBMS is no longer really able to handle everything you want it to do.
MongoDB is disproportionately liked by the inexperienced. There are things to like about it, and being able to have sparse secondary b-tree indexes on arbitrary data is, overall, pretty rad. For prototyping, being able to just toss some shit in there, especially when your data comes from an external service and you're not in control of your incoming data (very common these days), it works great for simple use cases and CRUD apps (which is a lion's share of new projects in the industry of the Internet). Being able to predict all the incoming data ahead of time isn't always doable, because a lot of services have underdocumented APIs, and every now and then you'll get data that's just ever-so-slightly different than what you expected. With Mongo, you can at least guarantee you're holding onto that, so that you can use this existing data in the future. That's way less contrived than it sounds. E.g., a third-party changes one of the data formats, adds a field or changes a datatype on one, for instance. If you have a strict schema, you might fail to write some of that data until you update everything. With Mongo, you can at least capture that data and get it working later.
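A rough sketch of the capture-now, interpret-later argument above. It uses plain Python with SQLite and a JSON text column as a stand-in for a document store, so none of it is MongoDB's API. The field names, the raw_events table, and both helper functions are hypothetical.

```python
import json
import sqlite3

KNOWN_FIELDS = ("id", "name")  # hypothetical fields the app knows about today

def strict_row(payload):
    """Strict-schema view: keep only known fields, losing anything else."""
    return {k: payload.get(k) for k in KNOWN_FIELDS}

def store_raw(conn, payload):
    """Document-style capture: persist the whole payload as JSON text."""
    conn.execute("INSERT INTO raw_events (body) VALUES (?)", (json.dumps(payload),))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (body TEXT NOT NULL)")

# The third party starts sending a field nobody planned for.
payload = {"id": 7, "name": "widget", "colour": "red"}
store_raw(conn, payload)

# Later, once the code knows about "colour", the stored data is still there.
(body,) = conn.execute("SELECT body FROM raw_events").fetchone()
recovered = json.loads(body)
```

The strict mapping drops the unexpected field at write time, while the raw capture keeps it around to be dealt with later. That is the tradeoff described above, not a claim about how any particular product stores documents.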
I'm using it on one of my projects, and as I said, there are things to like about it, but graph traversals remain problematic. Overall, I'd say I'm pretty unhappy with MongoDB, and I wish I hadn't chosen it; it doesn't work well for my project. A lot of things I'm doing boil down to performing some kind of graph traversal, which is painful to do at runtime in Mongo (the potential solution space is too large to be precomputed). From what I've seen, MongoDB hasn't been working for me very well with a highly-connected data set.
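For anyone wondering why runtime traversal hurts, here is a toy sketch under simple assumptions: each document only references its neighbours by id, so the application has to fetch one document per node visited. The in-memory dict, the "links" field, and find_one are stand-ins made up for illustration, not a real driver call.

```python
from collections import deque

# Hypothetical collection: each document references its neighbours by id.
docs = {
    "a": {"_id": "a", "links": ["b", "c"]},
    "b": {"_id": "b", "links": ["d"]},
    "c": {"_id": "c", "links": ["d"]},
    "d": {"_id": "d", "links": []},
}

fetches = 0

def find_one(doc_id):
    """Stand-in for a single-document lookup; counts round trips."""
    global fetches
    fetches += 1
    return docs[doc_id]

def reachable(start):
    """Breadth-first traversal in application code, one lookup per node."""
    seen = {start}
    queue = deque([start])
    while queue:
        doc = find_one(queue.popleft())
        for nxt in doc["links"]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

result = reachable("a")
```

Every node visited costs a lookup, so on a highly connected data set the number of round trips grows with the size of the neighbourhood being explored.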
This is ABSOLUTELY the tradeoff. I am not using Mongo for long-term data storage. I am using it for hacking systems together which need a place to store data. Possibly this data will be manually queried and examined later on, possibly this data lives only a day or two. The entire conversation changes when you are wanting structured data output that is stable and scalable for 10 years.
You deployed a database without doing a spike, and clearly without reading the documentation. It is well known that MongoDB is, by design, ill-suited to doing lots of joins between tables. It is much better thought of as a document store.
This kind of rubbish really needs to stop. Just because you don't agree with or understand their choices does not mean that the majority of "NoSQL guys" are ignorant or uneducated.
Some of the biggest companies (e.g. Twitter, Foursquare, Google, Amazon) all rely on NoSQL.
The real issue I see is that by dismissing NoSQL as only for fools, RDBMS developers are failing to see why these databases are popular to begin with. Take PostgreSQL, for example, and how difficult it is to shard/replicate compared to CouchDB or MongoDB. This is an area PostgreSQL should see as an opportunity for improvement.
But dismissing huge groups of people as uneducated just makes you seem uneducated.
The majority of NoSQL guys are not Twitter, Foursquare, Google, or Amazon, and Twitter, Foursquare, Google, and Amazon all have plenty of people experienced enough in traditional RDBMS to tell them what things would probably be better in them.
Most NoSQL guys I meet have little to no experience in SQL. Plenty have lots.
The plural of insulting everyone's intelligence who disagrees with you is not authority.
> But do you have to ask why so-called inexperienced users are choosing NoSQL in the first place? Hint: it's because most RDBMS are ridiculously complex and inflexible.
If we're going to have a cliche fight, this one is called having your cake and eating it too. Either inexperienced users are gravitating to NoSQL or they aren't.
Operating under the assumption that they are, I'd say it's partly because they can interact with them without an impedance mismatch, and partly because, being flat, they are easy to visualize. Another reason might be that they don't have to put a lot of thought into their schema, which would involve new concepts that would take a little time to learn. The biggest reason in my eyes would be that they don't know how big a performance hit they're taking in a write-heavy environment.
In very read-heavy environments with wide heterogeneous queries that you would end up denormalizing in 18 different ways anyway? They could be doing it because they're smart and have done their research. I love NoSQL.
I never meant to imply that inexperienced users were definitely moving to NoSQL. I would imagine that orders of magnitude more of them are still using MySQL due to its pervasiveness.
Only that, if they were, the current complexity of RDBMSs would be a big factor. Accessing a database as a REST service, as with CouchDB, or having fluid schemas, as with MongoDB, is much easier to handle than ER models.
I think we did "sharding" with relational databases... Back in the 80s. Then we got fast hash joins and partitioning and it turns out that the disadvantages of sharding just aren't worth it. The NoSQL crew will figure this out too around 2030 :-p
Sharding may have been available in the 1980s, but what it led to in some products is quite amazing. Consider Teradata's clustering ability, which is sort of like sharding your database but without the disadvantages typically associated with it. Postgres-XC now offers something similar as a semi-fork of PostgreSQL.
Basically, what we are talking about here is a two-tier database layer where storage and coordination are separated, and two-phase commit is used between the two tiers. Thus the coordination tier can enforce referential integrity between storage nodes if necessary, and thus allow write-extensibility.
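As a rough illustration of the coordinator/storage split described above, here is a toy two-phase commit in Python. It is only a sketch of the textbook protocol and says nothing about how Postgres-XC or Teradata actually implement it. The StorageNode class, the accept flag, and the seat keys are hypothetical, and it ignores crashes, timeouts, and durable logging.

```python
class StorageNode:
    """Toy storage node: stages a write on prepare, applies it on commit."""

    def __init__(self, name, accept=True):
        self.name = name
        self.accept = accept  # whether this node votes yes in phase one
        self.staged = None
        self.data = {}

    def prepare(self, key, value):
        if not self.accept:
            return False
        self.staged = (key, value)
        return True

    def commit(self):
        key, value = self.staged
        self.data[key] = value
        self.staged = None

    def rollback(self):
        self.staged = None


def two_phase_commit(nodes, key, value):
    """Coordinator: commit everywhere only if every node votes yes."""
    prepared = []
    for node in nodes:  # phase one: collect votes
        if node.prepare(key, value):
            prepared.append(node)
        else:
            for p in prepared:
                p.rollback()
            return False
    for node in nodes:  # phase two: everyone voted yes, so apply
        node.commit()
    return True


ok_nodes = [StorageNode("n1"), StorageNode("n2")]
committed = two_phase_commit(ok_nodes, "seat-12A", "booked")

mixed_nodes = [StorageNode("n3"), StorageNode("n4", accept=False)]
aborted = two_phase_commit(mixed_nodes, "seat-12B", "booked")
```

Either every storage node applies the write or none does, which is what lets the coordination tier keep the nodes consistent with one another.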
This isn't something without uses. For high-end, high-write-load databases, serving very large amounts of traffic (think airline reservations), this has been a typical approach for quite some time.
The fundamental problem though is that once you give up on local consistency over a given domain, you cannot have any guarantee of global consistency. The current relational approaches (Postgres-XC and Teradata) both enforce ACID compliance. BASE doesn't offer any consistency guarantee and therefore it is only good for throw-away data.
Oh absolutely, but what you're talking about there, people do with CICS today, and that's even older than the 80s. CICS is a technology I have a lot of respect for.
But my point is - when I need to use something like that, I know that's what I'm using. I don't imagine that it's some new invention. Hell, a lot of what the NoSQL guys think they're inventing, IBM did back then too - IMS.
Why? The guys at FB or Google are just guys like you and me, don't believe the hype, they are not superheroes or gurus despite their much talked about interview process (the Google interviews I did were a walk in the park compared to GS, btw). Some will know more than me sure, but some'll know less.
At my last job, we averaged 60k pages/sec, on what the NoSQL guys would call a "legacy" database, and there was plenty of headroom. It's not rocket science, just engineering. Companies like Google love to weave a mystique around what they do, it's in their interests after all, to convince their investors that what they do is magic. But I'm from back in the day when we were the same way about millions of pages per month, then per week, then per day...
On good hardware there were PostgreSQL instances running a billion queries a day back in the 9.0 days. Often these are actual accounting apps where reporting matters, and so NoSQL would be a very poor fit. For example, the French government uses it to distribute welfare program disbursements. The Wisconsin Courts also use PostgreSQL at loads in the billion-per-day range. I know there are larger instances out there.
Nowadays we are talking about thousands of concurrent users and up to 350k reads/sec on high-end machines.
Yes there is a lurking iceberg of things that aren't in the public eye like Facebook or Google, and that's where the really intense and interesting stuff is happening. I wonder what the guys at Visa make of all the hype around these websites, when they were doing these volumes all along...
At said 60k pages/sec job, you know who we looked up to for databases? Starbucks. Walmart. McDonald's. 'Cos we'd seen what they do, and anyone who thinks Twitter is impressive, well, it'd blow their minds.
This is a valid point. I dismissed NoSQL for years, and still (largely) do (I prefer to have a schema, documented data structures, deal with relational data etc.)... but I finally gave Mongo a go the other day, and am actually quite impressed. For a simple endpoint I can throw arbitrary persistent data in (rather than the filesystem), it's let me get a couple of proof-of-concept projects out the door much quicker.
Note that I'm not intending to comment on its scalability or how appropriate it is for other projects, just that I agree with you: dismissing it out of hand is hubris.