I was more commenting on your phrase "if half of that 1 MLOC is still relevant to new ways of building systems (and given that SQL databases are a 70s tech, I doubt it's that much)".
There has been a TON of academic and industrial research on SQL databases since they were invented in the 70s. Calling them 70s tech is akin to calling LISP 50s tech. The basic ingredients haven't changed much (sets in SQL and lists in LISP), but the techniques on top have evolved by leaps and bounds.
To your point here - there are plenty of companies that have way more data in their databases than RAM available. The early users of Hadoop, etc. were primarily constrained by disk I/O on one machine, rather than constrained by RAM or CPU on one machine. It is certainly convenient that a distributed architecture can solve both sets of problems if the problem space lends itself to distributed computation.
I'm not a huge defender of SQL - I think it has some serious problems - one fundamental problem is lack of strong support for ordered data, and it can be a huge pain to distribute. I agree that having some options with distributed K/V stores is really nice, but you have to admit that much of it hasn't yet been proven in the field.
I, for one, DO think that the world needs something like an english-like language so that "managers" can write their own queries. Honestly, the roles of programmer and data analyst are often wholly different. I think it's a huge kludge that the only people capable of manipulating massive datasets are people with CS degrees or the equivalent. Programmers suck at statistics, and while they're generally smart folks, they often don't have the business/domain expertise to ask the right questions of their data. Software is about enabling people to solve problems faster - why should we be only allowing people with the right kind of academic background to solve a whole class of problems.
Finally, saying that you doubt SQL is relevant to building modern systems is borderline irresponsible. Experimentation with new tools is good - but you have to also keep in mind that people were smart way back in the 70s, too, and that their work may be perfectly relevant to the problems you're trying to solve today.
There has indeed been a ton of research on SQL databases. But still, Stonebraker, one of their pioneers, said that they should all be thrown out:
"We conclude that the current RDBMS code lines, while
attempting to be a “one size fits all” solution, in fact, excel at nothing. Hence, they are 25 year old legacy code lines that should be retired in favor of a collection of “from scratch” specialized engines. The DBMS vendors (and the research community) should start with a clean sheet of paper and design systems for tomorrow’s requirements, not continue to push code lines and architectures designed for yesterday’s needs." -- http://nms.csail.mit.edu/~stavros/pubs/hstore.pdf
I'm sure a lot of their intellectual work is indeed something I could learn from. But SQL databases are an artifact of a particular moment in technological and cultural history. The thing I really want to learn about isn't the residue of their thoughts as interpreted through 30 years of old code, it's their original insights and how to transfer those to today's world.
Hadoop is a fine example of Stonebraker's point. The original sweet spot of relational databases was easy queries across volumes of data too large to fit in RAM. But Google realized early on that they could do orders of magnitude better with a different approach.
I agree that these new approaches haven't been fully proven in the field, but I'm certainly glad that people are trying.
As a side note, I think the right way to solve the pseudo-English query language problem is by making the query language actual English. If you have a programmer and an analyst or other business person sit next to one another, you get much better results than either one working alone.
Further, he's been critical of NoSQL/MapReduce recently: http://dl.acm.org/citation.cfm?doid=1721654.1721659
Regardless, there's always going to be a balance between specialized systems and platforms, but my point is that we should be willing to trust the platforms that have proven themselves, avoid reinventing the wheel (poorly), and not be too quick to throw them out in favor of the new shiny.
I agree that programmer/analyst working together is a terrific pair, but the beauty of software is that we live in a world where we can have 1 programmer write a platform on which 10 programmers build 10 systems that 1000 users use to get their jobs done and make society that much more efficient.
I agree that when we can find ways to let users serve themselves, that's best for everybody. I just don't think universal pseudo-English query languages are the way to do that, because the hard part isn't learning a little syntax, it's learning what the syntax represents in terms of machine operations.
Once the programmer and the analyst have found something stable enough to automate, by all means automate it. Reports, report builders, DSLs, data dumps, specialized analytic tools: all great in the right conditions. But people have been trying to build pseudo-English PHB-friendly tools for decades with near zero uptake from that audience. I think there's a reason for that.