Multiplayer games run on a synchronization/prediction model; most do not use a purely server-driven world state. The game simulation and logic run locally, and the client sends its inputs to the server. The server then collects all players' inputs, calculates the resulting changes to the world state, and sends back diffs with predictions based on latency. These diffs may differ from the local state, so corrections are applied. In the majority of cases the differences are minor enough that the player doesn't notice. When they are severe, the player notices hitbox inaccuracy or, worse, rubberbanding as their actions are "snapped back".
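To make the predict-then-correct loop concrete, here is a minimal sketch in Python. It is not how any particular engine does it: the `Client` class, `apply_input`, the sequence numbers, and the 1-D position are all assumptions made up for illustration, and real games add interpolation, lag compensation, and much more. It only shows the core idea: apply inputs locally right away, then, when the server's authoritative state arrives, adopt it and replay whatever inputs the server has not yet acknowledged.

```python
from dataclasses import dataclass, field

SPEED = 1.0  # units moved per input (illustrative value, not from any real game)


def apply_input(position: float, move: int) -> float:
    """Shared simulation step: client and server are assumed to run the same logic."""
    return position + move * SPEED


@dataclass
class Client:
    position: float = 0.0
    next_seq: int = 0
    pending: list = field(default_factory=list)  # inputs the server has not acked yet

    def press(self, move: int) -> tuple:
        """Predict locally right away and remember the input for later replay."""
        msg = (self.next_seq, move)
        self.next_seq += 1
        self.pending.append(msg)
        self.position = apply_input(self.position, move)
        return msg  # this is what would be sent to the server

    def on_server_state(self, acked_seq: int, server_position: float) -> None:
        """Adopt the authoritative state, then replay unacknowledged inputs."""
        self.pending = [(s, m) for (s, m) in self.pending if s > acked_seq]
        self.position = server_position
        for _, move in self.pending:
            self.position = apply_input(self.position, move)


client = Client()
sent = [client.press(+1) for _ in range(3)]  # client has predicted position 3.0

# Pretend the server has processed only the first input, and that something the
# client did not know about (say, a collision) pushed the player back by 0.5.
server_position = apply_input(0.0, sent[0][1]) - 0.5
client.on_server_state(acked_seq=sent[0][0], server_position=server_position)

print(client.position)  # 2.5: server's 0.5 plus the two replayed inputs
```

The jump from the predicted 3.0 to the corrected 2.5 is the "snap". When the gap is small it goes unnoticed; when it is large, it shows up as rubberbanding.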
So when you're playing an FPS and you press a button to shoot, the local game logic and physics compute and display the result immediately (firing animation, sound starts playing). A short delay later, the server confirms the kill.
Even with this fairly sophisticated model, you still want <50ms pings. Video streaming and running the entire game in the cloud just won't work for games that require rapid hand-eye coordination.