A context switch to another goroutine doesn't need to "save the entire stack". The stack is already there. The runtime just needs to keep a pointer to the stack of the suspended goroutine. And the stack is usually just a few kilobytes.
Moreover, keeping the stack is useful for debugging because you can print a nice stack trace.
For these reasons and others, nginx could never be as fast as it is with a goroutine model.
I haven't read nginx source code, but I guess they are able to store continuations in a fixed structure because they don't use closures and they know in advance what information must be kept. I don't see how this approach can be used as a general purpose concurrency mechanism for a programming language. But I'd like to learn something :-)