Seriously. Instead of a single, clean restaurant reservation HTTP POST API, the future is two neural nets modulating and demodulating the request to and from inexact and potentially ambiguous English audio.
Silly. The future is a steganographic handshake in the initial greeting that negotiates an upgrade to a proprietary gRPC8 protocol when the caller and recipient are both Google. Google then uses this to get a monopoly on telephone-mediated social interactions, which it can monetize by building a social graph to more efficiently target advertising to captive audiences riding in Waymo cars.
To be fair, we could make the same complaint about the web - it's largely plain text over the wire, as opposed to some form of compiled bytecode (I know, it's coming).
What we lose in precision by using human speech, we make up for in its being pretty much universal. Talk about an adaptable interface. You can phone the restaurant and do anything from reserving a table to ordering takeout to informing them that their cat is on fire.
(I mean that as both a joke and a real comment - you could never force every restaurant in the world to learn REST, but you sure can call a bunch of them.)
I think Neal Stephenson would make the case that we started down that inefficient road when we replaced telegraph signals with voice in the first place. ;)