Keep in mind that this includes Instagram and WhatsApp too, as far as I know. As for "wasteful", well...
1.88 billion DAU (Q1 earnings report)
= 21759 "users per second" (note: I made this up)
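For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes usage is spread evenly over a 24-hour day, which real traffic is not, so treat the result as a rough average rather than a peak load. The function name is just illustrative.

```python
# Back-of-the-envelope: average "users per second" from a DAU figure.
# Assumption: users are spread evenly across the day (real traffic has peaks).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def users_per_second(daily_active_users: float) -> float:
    """Average number of distinct daily users per second of the day."""
    return daily_active_users / SECONDS_PER_DAY


rate = users_per_second(1.88e9)  # DAU figure quoted above
print(f"{rate:,.0f} users per second")  # prints "21,759 users per second"
```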
It's hard to fathom just how big traffic to FAANG services can be, until you see what it takes to serve it. Is there some waste? Sure, probably, but not as much as you'd think.
Facebook makes its money from advertisers, so that's likely where most of the compute resources are going - users just see the ads at the end of all that computation. Combined with the mandatory overprovisioning, the overhead of a massive distributed system, tracing, etc., I'm not surprised those are the numbers.
Assuming each server costs an average of $20k, that's $40 billion, which is two quarters' worth of revenue but amortized over 5+ years. It's really not all that much.
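A rough sketch of that estimate, using only the numbers in this comment. The server count is what $40 billion at $20k per server implies, the quarterly revenue is taken from the "two quarters' worth" approximation, and the 5-year straight-line amortization is the low end of "5+ years". None of these are reported figures.

```python
# Rough amortization math. Every input is an assumption from the comment above.
SERVER_COST_USD = 20_000   # assumed average cost per server
TOTAL_COST_USD = 40e9      # total hardware spend in the estimate
AMORTIZATION_YEARS = 5     # low end of "5+ years", straight-line


def implied_server_count(total_cost: float, unit_cost: float) -> float:
    """Number of servers implied by a total spend and a per-server cost."""
    return total_cost / unit_cost


def yearly_cost(total_cost: float, years: int) -> float:
    """Straight-line amortized cost per year."""
    return total_cost / years


servers = implied_server_count(TOTAL_COST_USD, SERVER_COST_USD)
per_year = yearly_cost(TOTAL_COST_USD, AMORTIZATION_YEARS)

# "Two quarters' worth of revenue" implies roughly $20B per quarter.
quarterly_revenue = TOTAL_COST_USD / 2
share_of_annual_revenue = per_year / (4 * quarterly_revenue)

print(f"{servers:,.0f} servers")
print(f"${per_year / 1e9:.0f}B per year")
print(f"{share_of_annual_revenue:.0%} of annual revenue")
```

Under those assumptions the hardware works out to about a tenth of yearly revenue, and a longer amortization period only makes that smaller.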
As for an interpreter, I have not thought about it too hard. It might be harder than I was originally considering, because I was thinking in the context of a full data trace, which would just let you re-run the program + interpreter. With just an instruction trace you might need a lot more support from the interpreter. Alternatively, you might be able to do it if the interpreter internals cleanly separate out the handling for each interpreted instruction, so you could use that to reverse engineer what the interpreted program executed. Though that would probably require a fair bit of language/interpreter-specific work. Also, given that interpreted code probably runs ~10x slower, it would probably not be so great, since you get so much less program execution per unit of storage.
As for the JIT, it's not clear to me that modern Java VMs actually maintain a complete machine-code-to-application-bytecode mapping. It would be good for Pernosco if they did, but I think it's more likely they keep around just enough metadata to generate stack traces, and otherwise rely on tier-down with on-stack replacement to handle debugging with breakpoints, at least for the highest JIT tiers.
That's an understandable tradeoff for car driving.
It's an understandable tradeoff for Facebook debugging.
Ergo, we do it.
Looking at their status page (https://developers.facebook.com/status/dashboard/), they seem to have problems every week. And these are only the public ones!
I guess they figured they are losing more money due to these problems than the additional 5% they have to spend on infrastructure.