They're mixing parallelism and concurrency. (nb: I might be abusing these terms too)
Parallelism (i.e. CPU-bound tasks) is limited by the number of cores you have. Concurrency (i.e. IO-bound tasks) is not, because the tasks are usually not all runnable at once. It can be faster to go concurrent even on a single core because you can overlap IOs, but it will use more memory and other resources.
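A minimal sketch of that single-core overlap, in Go. The URLs and the GOMAXPROCS(1) pinning are just illustrative assumptions, not anything from the original discussion:

    // Overlapping several slow IO waits even when only one core is in use.
    package main

    import (
    	"fmt"
    	"net/http"
    	"runtime"
    	"sync"
    	"time"
    )

    func main() {
    	runtime.GOMAXPROCS(1) // force a single core: concurrency still helps, parallelism can't

    	urls := []string{
    		"https://example.com/a",
    		"https://example.com/b",
    		"https://example.com/c",
    	}

    	start := time.Now()
    	var wg sync.WaitGroup
    	for _, u := range urls {
    		wg.Add(1)
    		go func(u string) { // each goroutine spends most of its time waiting on the network
    			defer wg.Done()
    			resp, err := http.Get(u)
    			if err != nil {
    				fmt.Println(u, "error:", err)
    				return
    			}
    			resp.Body.Close()
    			fmt.Println(u, resp.Status)
    		}(u)
    	}
    	wg.Wait()
    	// Total time is roughly the slowest request, not the sum of all of them,
    	// because the IO waits overlap even though only one goroutine runs at a time.
    	fmt.Println("elapsed:", time.Since(start))
    }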
Also, "going faster" isn't always a good thing. If you're a low priority system task, you don't want to consume all the system resources because the user's apps might need them. Or the the user doesn't want the fans to turn on, or it's a passive cooled system that shouldn't get too hot, etc.
And both of them not only make it easier to write bugs in unsafe languages; even in safe languages you can easily, accidentally make things slower instead of faster simply because it's complicated.
Using his distinction, concurrency isn't about IO-boundedness (though that's a common use-case for it), but instead is about composing multiple processes (generic sense). They may or may not be running in parallel (truly running at the same time).
On a unix shell this would be an example of concurrency, which may or may not be parallel:
$ cat a-file | sort | uniq | wc
Each process may run at the literal same time (parallelism), but they don't have to, and on a single core machine would not be executing simultaneously.
A succinct way to distinguish both is to focus on what problem they solve:
> Concurrency is concerned about correctness, parallelism concerned about performance.
Concurrency is concerned with keeping things correct[1] when multiple things are happening at once and sharing resources. Those problems often arise for performance reasons in the first place, e.g. multiplexing IO over different threads, so performance is still a concern. But your solution space is the threads and IO resources themselves, and how they interleave.
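To make the correctness concern concrete, here's a small Go sketch (the counter and loop counts are made up for illustration): two goroutines share a counter, and a mutex keeps their interleaving from corrupting it.

    package main

    import (
    	"fmt"
    	"sync"
    )

    func main() {
    	var (
    		mu      sync.Mutex
    		counter int
    		wg      sync.WaitGroup
    	)

    	for i := 0; i < 2; i++ {
    		wg.Add(1)
    		go func() {
    			defer wg.Done()
    			for j := 0; j < 1000; j++ {
    				mu.Lock() // without this lock, counter++ is a data race
    				counter++ // the read-modify-write must not interleave with the other goroutine
    				mu.Unlock()
    			}
    		}()
    	}
    	wg.Wait()
    	fmt.Println(counter) // always 2000 with the lock; unpredictable without it
    }

Notice that the example says nothing about speed: the lock exists purely so that the interleaving stays correct.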
Parallelism is in a different solution space: you are looking at the work space (e.g. iteration space) of the problem domain and designing your algorithm to be logically sub-dividable to get the maximum parallel speedup (T_1 / T_inf).
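For contrast, a sketch of that other solution space in Go, again with invented details (the slice contents and chunking scheme are illustrative, not a tuned implementation): the iteration space is split into disjoint sub-ranges, one per core, so each worker owns its slice of the work and no locking is needed.

    package main

    import (
    	"fmt"
    	"runtime"
    	"sync"
    )

    func main() {
    	data := make([]int, 1_000_000)
    	for i := range data {
    		data[i] = i
    	}

    	workers := runtime.NumCPU()
    	chunk := (len(data) + workers - 1) / workers
    	partial := make([]int, workers) // one slot per worker: no sharing, no locks

    	var wg sync.WaitGroup
    	for w := 0; w < workers; w++ {
    		lo := w * chunk
    		hi := lo + chunk
    		if hi > len(data) {
    			hi = len(data)
    		}
    		if lo >= hi {
    			break
    		}
    		wg.Add(1)
    		go func(w, lo, hi int) {
    			defer wg.Done()
    			for _, v := range data[lo:hi] { // each worker sums a disjoint sub-range
    				partial[w] += v
    			}
    		}(w, lo, hi)
    	}
    	wg.Wait()

    	total := 0
    	for _, p := range partial {
    		total += p
    	}
    	fmt.Println(total)
    }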
Now, a runtime or scheduler will have to do the dirty work of mapping the logical subdivisions to hardware execution units, and that scheduler program is of course full of concurrency concerns.
[1] For the sake of pedantry: yes, parallelism is sometimes also used to deal with correctness concerns: e.g. do the calculation on three systems and see if the results agree.