For instance, when a goroutine makes a blocking syscall, it will continue to use its current M (which is blocked in the kernel), but will release its P, allowing another goroutine to execute.
This means that GOMAXPROCS goroutines can execute in user space in parallel, but more goroutines can be blocked in the kernel on different OS threads.
The Go runtime will create more M's as necessary to run all of the P's.
(Note that the Go runtime does try to avoid needing one M per goroutine. For instance, goroutines blocked on a channel are descheduled entirely (they give up their P and M), and are scheduled again only once they need to be woken.)
What you say is almost true, but the terminology is not quite right. Goroutines (Gs) are multiplexed (scheduled) onto threads (Ms), not Ps. Goroutines acquire Ps; they are not scheduled on Ps. However, they are scheduled onto Ms through Ps.
P really stands for processor, and it's just an abstract resource.
> The Go runtime will create more M's as necessary to run all of the P's.
Ps are not runnable entities. The runtime will create Ms in order to run all the runnable Gs, of which there are at most as many as there are Ps (since runnable Gs acquire Ps). But there can be fewer, and then the runtime will use fewer threads, while the number of Ps is always exactly equal to GOMAXPROCS.
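Since the number of Ps is just GOMAXPROCS, you can read or change it with `runtime.GOMAXPROCS`. A small sketch (the helper `procsAfterSet` is a made-up name): passing a value below 1 queries the current setting without changing it, and any call returns the previous setting.

```go
package main

import (
	"fmt"
	"runtime"
)

// procsAfterSet (hypothetical helper) sets GOMAXPROCS to n, reads the value
// back, and restores the previous setting before returning.
func procsAfterSet(n int) int {
	prev := runtime.GOMAXPROCS(n)  // returns the previous setting
	defer runtime.GOMAXPROCS(prev) // restore on the way out
	return runtime.GOMAXPROCS(0)   // n < 1 only queries, it does not change anything
}

func main() {
	fmt.Println("default GOMAXPROCS:", runtime.GOMAXPROCS(0))
	fmt.Println("after setting to 2:", procsAfterSet(2))
	fmt.Println("restored:", runtime.GOMAXPROCS(0))
}
```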
> when a goroutine makes a blocking syscall, it will continue to use its current M (which is blocked in the kernel), but will release its P
Only for blocking syscalls issued outside the runtime. Non-blocking syscalls do not release the P, and neither do syscalls that the runtime itself (as opposed to the syscall package) has to make.
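A rough way to observe the P being released, assuming a Unix-like system where `syscall.Pipe` and `syscall.Read` are available: with GOMAXPROCS set to 1, one goroutine blocks in a raw `read` on an empty pipe, and another goroutine still gets to run. The helper name `progressDuringSyscall` is made up, and the scheduling order is not guaranteed, so treat this as an illustration, not a proof.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// progressDuringSyscall (hypothetical helper) runs with a single P. One
// goroutine blocks in a raw read(2) on an empty pipe; a second goroutine
// then does some work. It reports whether the second goroutine finished.
func progressDuringSyscall() (bool, error) {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var fds [2]int
	if err := syscall.Pipe(fds[:]); err != nil {
		return false, err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	entering := make(chan struct{}, 1)
	readDone := make(chan error, 1)
	go func() {
		buf := make([]byte, 1)
		entering <- struct{}{}
		// Raw blocking syscall via the syscall package (not os.File, which
		// may go through the runtime's poller instead of blocking a thread).
		_, err := syscall.Read(fds[0], buf)
		readDone <- err
	}()
	<-entering // the reader is about to enter (or is already in) the syscall

	worked := make(chan bool, 1)
	go func() {
		sum := 0
		for i := 0; i < 1000; i++ {
			sum += i
		}
		worked <- sum == 499500
	}()
	ok := <-worked // this goroutine ran even though there is only one P

	// Unblock the reader so it can exit cleanly.
	if _, err := syscall.Write(fds[1], []byte{1}); err != nil {
		return ok, err
	}
	return ok, <-readDone
}

func main() {
	ok, err := progressDuringSyscall()
	fmt.Println("other goroutine made progress:", ok, "err:", err)
}
```

If the blocked M kept its P, a program with a single P could not run anything else while the read was pending.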
Of course, you don't need Ps at all to implement Go with user-space scheduling. Go only added them in Go 1.1. However, this design avoids a global scheduler lock, and uses less memory. Plus some things just fall out naturally from the design, e.g. GOMAXPROCS accounting comes for free simply by the fact that you have GOMAXPROCS Ps.