Most vintage systems use something way more complex than a pure CPU-controlled framebuffer for graphics. They generally have some sort of pre-defined "tiles" used to implement a fixed-width character mode, plus a limited number of "sprites" overlaid in hardware. These video modes could be implemented efficiently by modern GPUs.
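As a rough illustration of what the video decode for such a tile/character mode looks like on the CPU side, here's a minimal sketch (the layout is a made-up simplification: 40x25 character cells, 8x8 1-bpp glyphs, a single foreground/background color, not any specific machine's hardware):

    #include <stdint.h>

    #define COLS 40
    #define ROWS 25

    void decode_char_mode(const uint8_t* char_ram,   /* COLS*ROWS character codes */
                          const uint8_t* font_rom,   /* 8 bytes per glyph, 1 bpp */
                          uint32_t fg, uint32_t bg,  /* RGBA colors */
                          uint32_t* fb)              /* (COLS*8) x (ROWS*8) pixels */
    {
        for (int row = 0; row < ROWS; row++) {
            for (int col = 0; col < COLS; col++) {
                uint8_t code = char_ram[row * COLS + col];
                for (int y = 0; y < 8; y++) {
                    uint8_t bits = font_rom[code * 8 + y];
                    uint32_t* dst = &fb[(row * 8 + y) * (COLS * 8) + col * 8];
                    for (int x = 0; x < 8; x++) {
                        dst[x] = (bits & (0x80 >> x)) ? fg : bg;
                    }
                }
            }
        }
    }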
Only if you don't have a cycle-correct emulator in between. Those old-school systems relied on hard real-time timing down to the clock cycle to let the CPU control the color palette, sprite properties, etc. at the right raster position within a frame. Modern GPUs don't allow such tight synchronization between the CPU and the GPU, so the best approach is to run the entire emulation single-threaded on the CPU, including the video decoding.
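In code, a cycle-stepped single-threaded emulation loop looks roughly like this (cpu_t/video_t, the tick functions and the frame constants are placeholders for the sake of the example, not any real emulator's API):

    #include <stdint.h>

    #define FB_WIDTH  320
    #define FB_HEIGHT 200
    #define CYCLES_PER_FRAME 19656   /* e.g. a ~1 MHz CPU at ~50 Hz */

    typedef struct { int pc; /* ...rest of the CPU state... */ } cpu_t;
    typedef struct { int h, v; /* ...raster counters, registers, palette... */ } video_t;

    typedef struct {
        cpu_t cpu;
        video_t vid;
        uint32_t fb[FB_WIDTH * FB_HEIGHT];
    } system_t;

    /* Stubs: a real emulator executes one clock cycle of each chip here. */
    static void cpu_tick(cpu_t* cpu) { (void)cpu; }
    static void video_tick(video_t* vid, uint32_t* fb) { (void)vid; (void)fb; }

    void system_run_frame(system_t* sys) {
        for (int cycle = 0; cycle < CYCLES_PER_FRAME; cycle++) {
            cpu_tick(&sys->cpu);             /* may write a video register on this exact cycle */
            video_tick(&sys->vid, sys->fb);  /* decodes the pixel(s) for this cycle */
        }
        /* sys->fb now holds a complete frame, ready to be uploaded as a texture */
    }

Because both chips are ticked in lockstep on one thread, a register write lands on exactly the cycle (and thus the raster position) where the video decode observes it.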
(the resulting framebuffer can then of course be dumped into a GPU texture for rendering, but that just offers a bit more flexibility, e.g. embedding the emulated system into a 3D-rendered world)
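For the texture upload itself, plain OpenGL is enough; something along these lines, assuming a valid GL context and a 32-bit RGBA framebuffer (fb_width/fb_height/fb_pixels are placeholder names):

    #include <GL/gl.h>

    GLuint create_fb_texture(int fb_width, int fb_height) {
        GLuint tex;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        /* nearest filtering keeps the chunky low-res pixels sharp */
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, fb_width, fb_height, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, 0);
        return tex;
    }

    /* called once per emulated frame, after the CPU-side decode has finished */
    void update_fb_texture(GLuint tex, int fb_width, int fb_height,
                           const void* fb_pixels) {
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, fb_width, fb_height,
                        GL_RGBA, GL_UNSIGNED_BYTE, fb_pixels);
        /* ...then draw a textured quad, or map it onto 3D geometry */
    }

Everything else (scaling, CRT shaders, placing the screen in a 3D scene) happens in the normal render pass on top of that texture.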
It depends on what you mean by "hard real time". In theory, user input received while scanning out pixel x might change pixel x + 1, which leaves you no choice but to render single pixels in a strictly serial way. In practice, no existing emulator cares about that.
It's not about user input, but about the CPU writing video hardware registers at just the right raster position mid-frame (to recycle sprites, change the color palette, or even the resolution). Home computer emulators for systems like the C64 or the Amstrad CPC need to get this right down to the exact clock cycle, otherwise modern (demo scene) demos wouldn't render correctly.
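To make that timing dependency concrete, here's a toy version of the per-cycle video decode: the decoder samples the emulated registers for every pixel it emits, so a CPU write that lands one cycle earlier or later shows up at a different raster position (regs_t and the 8-pixels-per-cycle ratio are simplified assumptions):

    #include <stdint.h>

    #define H_PIXELS 320
    #define V_LINES  200

    /* registers the emulated CPU can write at any clock cycle (simplified) */
    typedef struct {
        uint32_t border_color;
    } regs_t;

    /* decode the pixels belonging to one video-chip clock cycle */
    void video_decode_cycle(const regs_t* regs, int line, int cycle_in_line,
                            uint32_t* fb)
    {
        int x0 = cycle_in_line * 8;           /* assume 8 pixels per clock cycle */
        for (int i = 0; i < 8; i++) {
            int x = x0 + i;
            if ((line < V_LINES) && (x < H_PIXELS)) {
                /* the register value *right now* decides this pixel's color */
                fb[line * H_PIXELS + x] = regs->border_color;
            }
        }
    }

If the emulated CPU writes border_color during cycle N of a scanline, the new color becomes visible exactly at pixel N*8 of that line, which is precisely the kind of behavior raster effects in demos depend on.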
PS: of course one could build a GPU command list to render such a video frame somehow, but I bet just building that command list would be more expensive than simply doing the video decode on the CPU. In the worst case it would basically come down to one draw command per (emulated system) pixel.