There is an awful lot of 70s tech that is still in constant operation, and is hi...

wpietri · on May 15, 2012

Sure. But I think the problem space for databases has changed quite a bit in ways that aren't true for TCP, and are only partly true for C.

My dad was writing code at the time, and he saw the big benefit as allowing developers to manage larger amounts of data on disk (10s of megabytes!) without a lot of the manual shenanigans and heavy wizardry in laying out the data on disk and finding it again. Plus, the industry thought the obvious future was languages like COBOL, what with their friendly English-like syntax and ability for non-programmers to get things done directly.

So little of that is true anymore. For a lot of things that use databases, you're expected to have enough RAM for your data. We don't distribute and shard because we can't fit enough spinning rust in a box; we do it because we're limited on RAM or CPU. A lot more people have CS degrees, the field is much better understood, and developers get a lot more practice because hardware is now approximately free. And nobody really thinks the world needs another English-like language so that managers can build their own queries.

TCP, on the other hand, is solving pretty similar problems: the pipes have gotten faster and fatter, but surprisingly little has changed in the essentials.

C is somewhere in between. A small fraction of developers working these days spend much time coding in plain C, and many of its notions are irrelevant to most development work.

But unlike SQL databases, you could ignore C if you wanted to; there were other mainstream choices. That wasn't true for SQL until recently; the only question was Oracle or something else. I'm very glad that has changed.

etrain · on May 15, 2012

My apologies for the rant here - it's not directed specifically at you, but toward a general attitude I see on HN and in the tech community.

I was more commenting on your phrase "if half of that 1 MLOC is still relevant to new ways of building systems (and given that SQL databases are a 70s tech, I doubt it's that much)".

There has been a TON of academic and industrial research on SQL databases since they were invented in the 70s. Calling them 70s tech is akin to calling LISP 50s tech. The basic ingredients haven't changed much (sets in SQL and lists in LISP), but the techniques on top have evolved by leaps and bounds.

To your point here - there are plenty of companies that have way more data in their databases than RAM available. The early users of Hadoop, etc. were primarily constrained by disk I/O on one machine, rather than constrained by RAM or CPU on one machine. It is certainly convenient that a distributed architecture can solve both sets of problems if the problem space lends itself to distributed computation.

I'm not a huge defender of SQL - I think it has some serious problems - one fundamental problem is lack of strong support for ordered data, and it can be a huge pain to distribute. I agree that having some options with distributed K/V stores is really nice, but you have to admit that much of it hasn't yet been proven in the field.

I, for one, DO think that the world needs something like an english-like language so that "managers" can write their own queries. Honestly, the roles of programmer and data analyst are often wholly different. I think it's a huge kludge that the only people capable of manipulating massive datasets are people with CS degrees or the equivalent. Programmers suck at statistics, and while they're generally smart folks, they often don't have the business/domain expertise to ask the right questions of their data. Software is about enabling people to solve problems faster - why should we be only allowing people with the right kind of academic background to solve a whole class of problems.

Finally, saying that you doubt SQL is relevant to building modern systems is borderline irresponsible. Experimentation with new tools is good - but you have to also keep in mind that people were smart way back in the 70s, too, and that their work may be perfectly relevant to the problems you're trying to solve today.

wpietri · on May 15, 2012

No need to apologize! You make excellent points.

There has indeed been a ton of research on SQL databases. But still, Stonebraker, one of their pioneers, said that they should all be thrown out:

"We conclude that the current RDBMS code lines, while attempting to be a “one size fits all” solution, in fact, excel at nothing. Hence, they are 25 year old legacy code lines that should be retired in favor of a collection of “from scratch” specialized engines. The DBMS vendors (and the research community) should start with a clean sheet of paper and design systems for tomorrow’s requirements, not continue to push code lines and architectures designed for yesterday’s needs." -- http://nms.csail.mit.edu/~stavros/pubs/hstore.pdf

I'm sure a lot of their intellectual work is indeed something I could learn from. But SQL databases are an artifact of a particular moment in technological and cultural history. The thing I really want to learn about isn't the residue of their thoughts as interpreted through 30 years of old code, it's their original insights and how to transfer those to today's world.

Hadoop is a fine example of Stonebraker's point. The original sweet spot of relational databases was easy queries across volumes of data too large to fit in RAM. But Google realized early on that they could do orders of magnitude better with a different approach.

I agree that these new approaches haven't been fully proven in the field, but I'm certainly glad that people are trying.

As a side note, I think the right way to solve the pseudo-English query language problem is by making the query language actual English. If you have a programmer and an analyst or other business person sit next to one another, you get much better results than either one working alone.

etrain · on May 15, 2012

Stonebraker's research and commercial ventures the last several years have been focused on building specialized variants of existing database systems. Vertica (C-Store), VoltDB (H-Store), Streambase (Aurora), and SciDB are all specialized DBMS systems designed to overcome the one size fits all nature of things.

Further, he's been critical of NoSQL/MapReduce recently: http://dl.acm.org/citation.cfm?doid=1721654.1721659

Regardless, there's always going to be a balance between specialized systems and platforms, but my point is that we should be willing to trust the platforms that have proven themselves, avoid reinventing the wheel (poorly), and not be too quick to throw them out in favor of the new shiny.

I agree that programmer/analyst working together is a terrific pair, but the beauty of software is that we live in a world where we can have 1 programmer write a platform on which 10 programmers build 10 systems that 1000 users use to get their jobs done and make society that much more efficient.

wpietri · on May 15, 2012

Oh, I trust the current stable platforms to be the current stable platforms. My beef isn't with people who use them. It's with people who don't know how to do anything else, which was a plague on our industry for at least a decade. At least the people who get burnt on the latest fad will make new and interesting mistakes.

I agree that when we can find ways to let users serve themselves, that's best for everybody. I just don't think universal pseudo-English query languages are the way to do that, because the hard part isn't learning a little syntax, it's learning what the syntax represents in terms of machine operations.

Once the programmer and the analyst have found something stable enough to automate, by all means automate it. Reports, report builders, DSLs, data dumps, specialized analytic tools: all great in the right conditions. But people have been trying to build pseudo-English PHB-friendly tools for decades with near zero uptake from that audience. I think there's a reason for that.