My preference would be to have a well-defined, even if of limited use, load rather than a poorly defined, but maybe-more-useful?, load. From the article, it sounds as if kernel authors tried to implement the former, found it difficult to do as machines grew, and then everyone headed off in different directions only to implement lots of the latter...
If your IO actions are performed by the CPU rather than by DMA, having them in there makes sense (they will slow down anything else), but if the IO is performed via DMA (or otherwise offloaded from the CPU) it makes less sense.
Anyone waiting on the process is feeling pain, but it isn't eating any CPU, so it comes down to what you want the metric to mean. Should it reflect user discomfort or consumption of CPU cycles?
When a machine runs out of RAM and starts thrashing, it most definitely does not want to have more stuff run on it. You want the load average to be high in this case to discourage people or programs from starting more stuff. Luckily, when a system starts thrashing, it typically has lots of processes in IOwait, so load average goes through the roof.
Over time you know what is a "normal" value, so checking the load average is a nice quick health check. There are many other better defined metrics to look at when there's actually a problem. If there is a problem, even a well-defined load average is too coarse a metric to give particularly meaningful insights.
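As a rough illustration of that kind of quick check, here's a minimal sketch in Python using the standard os.getloadavg(); the per-CPU threshold is an arbitrary assumption, since "normal" depends entirely on knowing your own baseline:

    import os

    def load_health_check(threshold_per_cpu=1.0):
        # Rough health check: compare the 1-minute load average to the CPU count.
        # The threshold is arbitrary; what counts as "normal" depends on the
        # workload, which is why knowing your baseline over time matters.
        one_min, five_min, fifteen_min = os.getloadavg()
        cpus = os.cpu_count() or 1
        if one_min > cpus * threshold_per_cpu:
            print(f"load {one_min:.2f} is high for {cpus} CPUs -- check better-defined metrics")
        else:
            print(f"load {one_min:.2f} looks normal for {cpus} CPUs")

    if __name__ == "__main__":
        load_health_check()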
When the NFS server decides to take a rest, I've seen 1000+ load averages on production Linux (3.13+) systems.
Or kernel panic
Or cats and dogs living together
This one is from Linux Journal, however:
 - http://www.linuxjournal.com/article/9001