The longer I work on performance teams, the more I agree with the "performance always matters" point of view[1]. In large binaries (or at least, in the large binaries I've worked on), we don't see a few tight loops taking the bulk of the time. Instead, there's a death by a thousand cuts in which many small inefficiencies add up, and have to be clawed back slowly and painfully, often by people with less understanding of the semantics of the code in question than the original authors. Most performance work I see isn't stuff like "write the tight loop in assembly; save 30% on execution time", it's stuff like "reuse the locale object to avoid construction penalties; save 0.4%".
For reasons like this I'm skeptical of e.g. Python advocates who say the speed difference doesn't matter since you can always rewrite "the hot code" in C. That works when you're truly using Python as a scripting language; the glue that ties your matrix multiplication routines or whatever together. But when you're going to have a large, flat performance profile, you're better off just writing your program in C++ (or its friends) to begin with.
[1] So I suppose, you can take this entire comment as "person in specialty thinks everyone else should change to make his life easier". Maybe I just have a warped view of priorities.
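To make the "reuse the object to avoid construction penalties" kind of fix concrete, here is a minimal sketch. The example in the comment above is a C++ locale object; this is only a loose Python analogue using the standard `json` module, where calling `json.dumps` with non-default options constructs a new `JSONEncoder` on each call. The function names and the chosen options are hypothetical, and the size of any saving is something you would have to measure (e.g. with `timeit`) in your own program.

```python
import json

# Built once and reused. The options here are illustrative assumptions;
# any non-default options behave the same way.
_ENCODER = json.JSONEncoder(sort_keys=True, separators=(",", ":"))


def encode_constructing(record):
    # With non-default options, json.dumps builds a fresh JSONEncoder
    # on every call before encoding.
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


def encode_reusing(record):
    # Reuses the module-level encoder, so construction is paid only once.
    return _ENCODER.encode(record)
```

Both functions return the same string; the second just skips the repeated setup. No single change like this shows up as a hot spot in a profile, which is why this sort of win tends to be found one small call site at a time.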
The benefit of using Python to start out with is that you can get data on actual usage patterns before committing to an overall system architecture. If you just rewrite individual hot segments of the code without considering the overall system, you're missing a lot of the point.
To pick on Twitter since their evolution of the product is fairly public - when they started, the product concept was "Oh, let's post a status through an SMS and it'll be visible on a webpage." A database-backed Rails architecture is perfectly reasonable for this. Except that then they were like "...and you can follow people", and then people started using it as a broadcast medium, and they wanted to search for trending topics, and then they opened it up to developers who all wanted an API, and then they closed it to developers, and now it's a big brand-advertising platform designed to engage directly with your customers.
They ended up switching to the JVM, and got a huge amount of flak in the meantime around "Why the hell would you build a messaging architecture in Ruby on Rails?" But the point is that they didn't set out to build a messaging architecture; they set out to build a site where you could post your status update on a website. If they had built a site to do that in J2EE, they would've been equally fucked, probably even more so. The reason they can build an efficient system now is that they have a lot of data about exactly how it's going to be used, which operations need to be fast, which operations will be performed frequently, and how much data total will be flowing through the system.
What's the alternative? Go? I've used both Python and Go for prototypes and in terms of quickly and flexibly building your software, Go still falls very short.
(If you were about to say Common Lisp or OCaml, you might have a point, but these often have library issues that make them much slower than a more mainstream language for getting a prototype out.)
Where I'm from we call that a "flat profile" which usually is the death-knell of optimization.
Most of the time I've seen a flat profile it's because architecturally whoever built the system didn't care about their data layout and access patterns (see Mike Acton's data-oriented design talks). Fixing these problems usually requires a complete re-architecture.
It's intrinsic to the shape of the profile. If it was a hot loop or two you could refactor but flat profiles by their nature can't just be refactored or optimized in isolation.
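As a rough illustration of what "data layout and access patterns" means here, the sketch below contrasts an array-of-structures layout with a structure-of-arrays layout. The `Particle` type and all function names are hypothetical. In CPython, interpreter overhead swamps most of the cache effect, so read this as a picture of the shape of the change rather than a benchmark; the payoff is much larger in a language like C++.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Particle:  # hypothetical record type, for illustration only
    x: float
    y: float
    mass: float


def total_mass_records(particles):
    # Array-of-structures: each mass lives inside a separate object,
    # so the loop visits objects scattered across the heap.
    return sum(p.mass for p in particles)


def to_columns(particles):
    # Structure-of-arrays: one contiguous array of doubles per field.
    return {
        "x": array("d", (p.x for p in particles)),
        "y": array("d", (p.y for p in particles)),
        "mass": array("d", (p.mass for p in particles)),
    }


def total_mass_columns(columns):
    # Only the contiguous "mass" column is read.
    return sum(columns["mass"])
```

Note that switching layouts changes every piece of code that touches a `Particle`, not one function. That is why this kind of fix looks like a re-architecture rather than a local optimization.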
With a flat profile you could sometimes still look to reduce cache misses or similar. Performance problems that present globally sometimes have a relatively local cause. Hard to get wins as big as when there's a hot loop, though, to be sure.
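The arithmetic behind "hard to get wins as big" is Amdahl's law: if a piece of code takes a fraction p of total runtime and you make it s times faster, the whole program speeds up by 1 / ((1 - p) + p / s). A small sketch, with made-up fractions chosen only to contrast the two kinds of profile:

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: whole-program speedup when a part taking `fraction`
    of total runtime is made `local_speedup` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Hypothetical hot-loop profile: one loop takes 60% of runtime, made 2x faster.
hot = overall_speedup(0.60, 2.0)

# Hypothetical flat profile: one function takes 0.4% of runtime, made 2x faster.
flat = overall_speedup(0.004, 2.0)
```

With these assumed numbers, the hot-loop fix gives roughly a 1.43x overall speedup, while the same 2x local improvement in the flat profile gives roughly 1.002x. Getting a comparable total win from a flat profile means repeating that small fix across a great many call sites.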