If you have a SQL query that takes 50,000 core-seconds, it's probably more useful to execute that query on 10,000 cores in 5 seconds than on 10 cores in 5,000 seconds, especially if the cost is the same. Even better if you never have to spin up a VM or worry about scale. This benefit is tangible and applicable to anyone who runs SQL. The reason this isn't prevalent is that it's economically and technologically prohibitive. BigQuery tips that scale in the other direction.
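To make the arithmetic above concrete, here is a minimal sketch. It assumes a perfectly parallel workload and flat per-core-second billing; the price and the function names are made up for illustration and are not taken from any provider's rate card.

```python
# Sketch only: assumes the query parallelizes perfectly and that billing is
# a flat rate per core-second. PRICE_PER_CORE_SECOND is a hypothetical value.

WORK_CORE_SECONDS = 50_000
PRICE_PER_CORE_SECOND = 0.00001  # hypothetical, for illustration only


def wall_clock_seconds(core_seconds, cores):
    """Elapsed time if the work splits evenly across `cores`."""
    return core_seconds / cores


def total_cost(core_seconds, price_per_core_second):
    """Cost depends only on total core-seconds, not on how many cores run."""
    return core_seconds * price_per_core_second


for cores in (10, 10_000):
    elapsed = wall_clock_seconds(WORK_CORE_SECONDS, cores)
    cost = total_cost(WORK_CORE_SECONDS, PRICE_PER_CORE_SECOND)
    print(f"{cores} cores: {elapsed} s elapsed, cost {cost}")
```

Under those assumptions the bill is identical in both cases and only the wait changes. Real queries rarely parallelize this cleanly, so treat it as an upper bound on the benefit.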
Point is, higher-level cloud-native services unlock very interesting use cases that apply to both small-scale startups and large companies, use cases that are impossible with just VMs.
I'm not really disagreeing (much), but very few things fit those criteria. More common are simple problems so overengineered that they sprawl across two Amazon availability zones when a straightforward implementation could serve the whole customer base off a $20-a-month VPS. This is more depressingly common than you think. Also depressingly common is a 50,000 CPU-second operation that could be a 1 CPU-second operation with a few indexes and a smarter algorithm. AWS adds a lot of carbon to the atmosphere cranking through crap code. Trust me, I've seen it.
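As a small illustration of the "few indexes" point, here is a sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the index name are hypothetical; the idea is only that the same query goes from scanning every row to a targeted index lookup.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(100_000)],
)

QUERY = "SELECT SUM(total) FROM orders WHERE customer_id = ?"


def query_plan(connection, sql, params):
    """Return SQLite's plan description; the detail text is the last column."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Without an index, SQLite has to scan the whole table.
plan_before = query_plan(conn, QUERY, (42,))
result_before = conn.execute(QUERY, (42,)).fetchone()[0]

# With an index on the filter column, it can search instead of scan.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, QUERY, (42,))
result_after = conn.execute(QUERY, (42,)).fetchone()[0]

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies between SQLite versions, but the answer is the same both times and only the amount of work differs.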
What Amazon and kin have done is offer developers a sexy new way of overengineering. The AWS stack is the new Java OOP design patterns book. Yes, there is occasionally a time when an AbstractSingletonFactory is a good thing, but I guarantee you most of those you see in the wild are not those times.
The real genius was to build a jungle gym for sophomore programmers to indulge their need to develop carpal tunnel syndrome where everything bills by the instance, hour, and transaction. If Sun had found a way to bill for every interface implemented and every use of the singleton pattern they would have been the ones buying Oracle.
Likewise, but I think you're getting into the philosophical, not the practical. You may choose to live in a single-CPU world for your database, but you're simply disqualifying yourself from a whole lot of interesting use cases. Index+algo only solves a sliver of analytic use cases. And, ultimately, I'm afraid you're creating a world where you cannot effectively understand the shape of your data and you cannot effectively test your hypotheses, so you go with gut feel. And, perhaps more importantly, you cannot create software that learns from its data.
Your argument can be summarized thus: do not give people incredible computing capacity at never-before-seen economic efficiency, because they will use it inefficiently. I'm afraid this argument gets made every time the world gets disrupted technologically (horse vs car, anyone?).
Edit: I may argue that if "carbon footprint" is your priority, then economies of scale + power efficiency should tilt the scale towards cloud, no? AWS is certainly on the dirtier side, but there are other, greener clouds.
I'm not saying what you think I am saying. The thread was about how the cloud is immensely profitable, and I'm saying that a good chunk of that is built on waste and monetization of programmers' naive tendencies to overcomplicate problems.
I am not arguing that there are no great use cases for these systems. But I would be willing to bet that those are less than half the total load.
It's like big trucks. How many people who drive big trucks actually need big trucks? Personally I like my company's Prius of an infrastructure. :) And of course we've architected it so it can be a fleet or an armada of Priuses if need be, with maybe just a bit of work. But if we get there, I will be happy to have that problem.
If availability and scale are not important, and you can tolerate having to engage a human in the event of a hardware failure, then sure, a $20 VPS might suffice. You could also run a single virtual machine in one zone in the cloud.
But I think you might underestimate the number of use-cases that do legitimately benefit from and desire a greater degree of reliability and automation. When one of my machines dies, I don't want to be notified, and I don't want to have to do anything about it. I want a new virtual machine to come online with the same software and pick up the slack. Similarly, as my system's traffic grows over time, I want to be able to gradually add machines to a fleet to handle my scaling problem, or even instruct the system to do that for me.
Plenty of use-cases may not require this, but I'm not convinced that the majority of systems in the cloud do not. Every system benefits from reliability, and it's great to get it cheaply and in a hands-off way. In the cloud, I can build a system where my virtual machine runs on a virtual disk, and if there's a hardware failure, my VM gets restarted on another physical machine and keeps on trucking without my involvement. As an engineer and scientist, I can accomplish a lot more with a foundation like this. I can build systems that require nearly zero maintenance and management to keep running, even over long time scales.
I don't think I disagree with you that some people overengineer systems, but I think I disagree with you about how much effort it requires to achieve solid availability and a high level of automation. It's not a lot of effort or cost, and it's a huge advantage. Once I build a system I never want to touch it again.
A certain segment of users is adopting these technologies because they want to be prepared to scale. One of the advantages of "big data" products even for small use-cases is that all successful use-cases grow over time. If you plan for success and growth, then you may exceed the capabilities of a traditional technology. If you use a "big" technology from the beginning, then you can be confident that you'll be able to solve increases in demand by scaling up, rather than by rearchitecting. As these platforms mature and become easier to use, the scales begin to tip, and they no longer require more engineering time than the alternatives; a strong hosted platform actually requires less time in total, especially when you consider setup and maintenance. Many of these technologies do an excellent job "scaling down" for simple use-cases too. While they have been difficult to use, they're getting easier. For example, MapReduce-paradigm technologies are becoming fairly easy with Apache Hive, and fast with Spark. They're becoming easier to set up due to hosted variants like AWS's Elastic MapReduce or Google Cloud Dataproc, etc.
I'm not saying the capability shouldn't be available, but I wish more people would stop to ask, "do I need this?"
Since I do data analysis and machine learning (sometimes), a common one I see is people using "big-data analytics" stacks when they don't have anything remotely in the range of a big-data problem. Everyone really seems to want to have a big-data problem, but then it turns out they have like, single-digit gigabytes of data (or less). And they want Hadoop on top of some infrastructure to scale a fleet of AWS VMs, so they can plot some basic analytics charts on a few gigs of data? They would be better served by the revolutionary new big-data solution "R on a laptop". But somehow many people have convinced themselves they really need Hadoop on AWS.
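To give a sense of how little machinery a few gigabytes needs, here is a minimal single-machine sketch. It is in Python rather than R, and the `day` and `amount` columns and the sample rows are made up for illustration. It streams rows one at a time, so memory use stays flat however large the file gets.

```python
import csv
import io
from collections import defaultdict


def daily_totals(rows):
    """Sum the hypothetical 'amount' column per 'day' in a single pass."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["day"]] += float(row["amount"])
    return dict(totals)


# Stand-in for a real file; in practice you would pass open("data.csv", newline="").
sample = io.StringIO(
    "day,amount\n"
    "2016-01-01,10.5\n"
    "2016-01-01,4.5\n"
    "2016-01-02,7.0\n"
)

totals = daily_totals(csv.DictReader(sample))
print(totals)
```

The resulting dictionary can go straight into whatever plotting tool you already use, with no cluster involved.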
Though I haven't used it yet, BigQuery does seem interesting in comparison, because it at least seems like it doesn't hurt you much. The Hadoop-on-VMs thing is objectionable rather than merely unnecessary, because you get this complex, over-architected system for what is not a complex problem. With BigQuery, it seems that at worst you end up with basically a cloud-hosted RDBMS with scaling features you don't need, which isn't the end of the world as long as the pricing works for you.
edit: Just to clarify, I'm not the person you were replying to, just someone who also has opinions on this. :)