From what I've seen, the problem of inefficiency most often manifests when the entire system of software is taken as a whole.
On a developer's machine with local SSD, a small database and a really small working set, even large inefficiencies can become invisible because the computer is just that fast. However, add in some network latency between the application and the database, some unpredictability about the timing of requests, and a much larger data set so that you won't hit data in memory 100% of the time, and suddenly code that was working fine grinds to a halt as it crosses a threshold where it can no longer process requests faster than they come in.
Unfortunately, with modern libraries helping "abstract away" underlying architecture like databases and the network, code like this is very easy to write without even realising that there might be a problem until it hits you in production.
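The threshold effect described above is really just arithmetic. Here's a back-of-the-envelope sketch; all the latency numbers (0.05 ms for a local query, 1 ms once a network round-trip is involved, 50 queries per request) are illustrative assumptions, not measurements:

```python
# Same code, 50 queries per request, on a dev machine vs. with
# network latency between the application and the database.
QUERIES_PER_REQUEST = 50

local_query_ms = 0.05   # assumed: localhost, SSD, warm cache
remote_query_ms = 1.0   # assumed: ~1 ms network round-trip, colder cache

local_request_ms = QUERIES_PER_REQUEST * local_query_ms    # 2.5 ms
remote_request_ms = QUERIES_PER_REQUEST * remote_query_ms  # 50 ms

# Maximum sequential throughput for a single worker:
local_rps = 1000 / local_request_ms    # 400 req/s
remote_rps = 1000 / remote_request_ms  # 20 req/s

print(f"dev machine: {local_request_ms} ms/request, {local_rps:.0f} req/s")
print(f"production:  {remote_request_ms} ms/request, {remote_rps:.0f} req/s")
```

A 20x latency multiplier per query becomes a 20x drop in per-worker throughput, which is exactly the kind of cliff that only shows up once real traffic arrives.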
>Unfortunately, with modern libraries helping "abstract away" underlying architecture like databases and the network, code like this is very easy to write without even realising that there might be a problem until it hits you in production.
I think that modern library abstractions are the tip of the iceberg. There is an entire copy-and-paste class of developer who doesn't understand what's going on under the hood with most anything they're doing. We're operating on a stack of abstractions a dozen layers deep, and people rarely take the time to look more than one layer down.
These folks are competent developers who can get things done. Even big things. But they are either incapable of envisioning how the entire system will interact, or they lack the appropriate functional details of how the abstractions work "behind the curtain" to be able to form an accurate mental model. As a result, they just wire things up without understanding how everything will interact.
The Mythical Man Month talked about the "surgeon" role on a team. The programmer that's now known as the "10x" developer. That's what's needed for any project that will eventually need to scale, or you're almost certainly going to have major inefficiencies that make everyone's lives difficult for years. I'm actually considering marketing my consulting services as being that person for startups -- the architect you really need to design a solid foundation, so that you don't accidentally create a house of cards that will fall over the first time you hit the front page of HN.
Not to say that anyone can foresee all interactions in a complex system. You still have to profile and test carefully. But I do consulting and have seen systems that had, to me, obvious bottlenecks that were poorly handled. One recommendation I made brought hosting costs for a small company down from $10k/month (and spiraling up out of control) to less than $500/month. And their product wasn't even that popular; it was not nearly at a Reddit, HN, or Quora level of usage. They peaked at about 2000 requests per second. I was handling more than that on a $10/month VM for my own backend (different load, and different requirements, but theirs could have been handled by a pair of app servers and a pair of database servers, two of each entirely for redundancy).
I wish more software provided appropriate metrics for performance analysis in live systems. I've seen people blaming the CPU or the disk or the database or the network for issues, but when you actually look at the graphs, the CPU is mostly idle, IO is almost nonexistent and the database sees a few dozen selects per second at worst, but somehow the problem is not in the software... And it's really difficult to get people to believe otherwise when all you have is data from around the software itself, not from inside it.
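One cheap way to get data from inside the software, rather than just around it, is to wrap suspect code paths with timing instrumentation. A minimal sketch (the `timed` decorator and `timings` registry are made up for illustration, not from any particular metrics library):

```python
import time
from collections import defaultdict

# Hypothetical in-process metrics registry: name -> list of durations (seconds).
timings = defaultdict(list)

def timed(name):
    """Decorator that records the wall-clock time of each call under `name`."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                timings[name].append(time.perf_counter() - start)
        return inner
    return wrap

@timed("fetch_user")
def fetch_user(user_id):
    time.sleep(0.001)  # stand-in for a real database query
    return {"id": user_id}

for i in range(5):
    fetch_user(i)

count = len(timings["fetch_user"])
avg_ms = 1000 * sum(timings["fetch_user"]) / count
print(f"fetch_user: {count} calls, avg {avg_ms:.2f} ms")
```

In a real system you'd ship these numbers to whatever metrics backend you already have; the point is that call counts and in-app latencies settle arguments that CPU and disk graphs can't.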
I never see the network blamed; it's usually ignored. I've seen way too many instances of slow database queries being replaced by slow cache reads, when the problem wasn't slow queries but thousands of fast ones.
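The "thousands of fast queries" pattern (the classic N+1 problem) is easy to demonstrate. A sketch using an in-memory SQLite database with a hypothetical query counter standing in for network round-trips:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, owner INTEGER)")
conn.executemany("INSERT INTO items (owner) VALUES (?)",
                 [(i % 10,) for i in range(1000)])

query_count = 0
def run(sql, args=()):
    """Execute a query, counting it as one round-trip to the database."""
    global query_count
    query_count += 1
    return conn.execute(sql, args).fetchall()

# N+1 pattern: one fast query per owner -- 10 round-trips here, but in
# real ORM-driven code this is often thousands per request.
owners = list(range(10))
per_owner = [run("SELECT id FROM items WHERE owner = ?", (o,)) for o in owners]
n_plus_one = query_count

# Batched alternative: one query returns the same data.
query_count = 0
all_rows = run("SELECT id, owner FROM items WHERE owner IN (%s)"
               % ",".join("?" * len(owners)), tuple(owners))
print(n_plus_one, "round-trips vs.", query_count)
```

Each individual query is fast, so per-query metrics look healthy; it's the round-trip count, multiplied by network latency, that kills you. Moving the same loop onto a cache doesn't fix that.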
This is actually my biggest complaint about the commonly cited benchmarks at TechEmpower. [1]
Their "multiple queries" and "data updates" benchmarks seem to be more of a test of the database and the specific database bindings than anything like production differences in database access. If you look at the data, the database is clearly being pushed to its limits, and is the constraining factor for all of those tests.
It's fine to test the bindings for speed, but they should be clear that's what they're doing. Instead they claim to be testing the speed of the frameworks ("Web Framework Benchmarks" is the topmost title on the page). I think the binding speed is rarely the constraining factor in production, and that if you had a properly scaled database backend you'd find the same languages on top of that comparison that you find on top of the "PlainText" and "Single Query" benchmarks.
In a production environment you typically have a database that is running on different hardware, sometimes in shards, and that is scaled to handle your access patterns. In that environment all queries have higher latency than a database on localhost, and you can push a single application server (where the "framework" runs) a lot farther when you're using, say, Go or Node.js, which can more efficiently pipeline requests, than you can when you're using a thread-per-connection language like Ruby or PHP.
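The difference between blocking on each query and keeping many in flight can be sketched with simulated latency (here via asyncio and a 10 ms `asyncio.sleep` standing in for a networked query; the numbers are assumptions):

```python
import asyncio
import time

async def query(_):
    await asyncio.sleep(0.01)  # stand-in for ~10 ms of database latency
    return 1

async def sequential(n):
    # Thread-per-connection style: the worker blocks on each query in turn.
    return [await query(i) for i in range(n)]

async def pipelined(n):
    # Event-loop style: all n queries are in flight concurrently.
    return await asyncio.gather(*(query(i) for i in range(n)))

def timed_run(coro):
    start = time.perf_counter()
    asyncio.run(coro)
    return time.perf_counter() - start

seq = timed_run(sequential(20))  # ~20 x 10 ms
pipe = timed_run(pipelined(20))  # ~10 ms total
print(f"sequential: {seq*1000:.0f} ms, pipelined: {pipe*1000:.0f} ms")
```

With local, sub-millisecond queries the two styles look nearly identical, which is exactly why a localhost benchmark hides the advantage of runtimes that overlap requests.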
Finally, the TechEmpower "Fortunes" benchmark is nearly 100% a benchmark of the default templating approach in each framework; Go has template libraries that are literally 10x faster than the one in the standard library and could be among the top performers on that page as well.
I guess the moral of the story is that I should contribute to the discussions around TechEmpower benchmarking... :)
Abstractions and libraries are surely not the problem. The higher the abstraction, the more opportunity for high-level optimizations, and those almost always trump low-level optimizations.
The problems are that people don't know how to optimize things, or are blocked by an imposed architecture.