If there are no free blocks, or if for whatever reason the flash device elects to reuse a block you've already written to, it has to erase the entire block first, even if you're only changing one bit. Erasing is relatively slow.
Most high-end flash devices, as I understand it, try to keep a pool of pre-erased blocks so that they can stay on the fast path most of the time.
As far as I can tell, what they're saying in this article is that sometimes you hit the slow path anyway, and that causes high latency. On average it's not a big deal, but for a small, unpredictable fraction of writes it can be an issue.
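To put rough numbers on the "fine on average, bad in the tail" point, here's a toy simulation. Everything in it is an assumption for illustration: the latencies (`PROGRAM_US`, `ERASE_US`) and the share of writes that miss the pre-erased pool (`SLOW_FRACTION`) are made-up values, not measurements from any real device.

```python
import random
import statistics

# All three constants are illustrative assumptions, not real device figures.
PROGRAM_US = 200      # fast path: program a page into a pre-erased block
ERASE_US = 2000       # extra cost when a block must be erased first
SLOW_FRACTION = 0.02  # share of writes that find no pre-erased block


def simulate_writes(n, seed=0):
    """Return n write latencies (in microseconds) under the toy model."""
    rng = random.Random(seed)
    latencies = []
    for _ in range(n):
        latency = PROGRAM_US
        if rng.random() < SLOW_FRACTION:
            latency += ERASE_US  # slow path: erase, then program
        latencies.append(latency)
    return latencies


def percentile(values, pct):
    """Simple percentile: the value at position pct% of the sorted list."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(len(ordered) * pct / 100))
    return ordered[index]


latencies = simulate_writes(100_000)
mean_us = statistics.mean(latencies)
median_us = percentile(latencies, 50)
p99_us = percentile(latencies, 99)

print(f"mean={mean_us:.0f}us median={median_us}us p99={p99_us}us")
```

With these assumed numbers the mean moves only a little above the fast-path latency, while the 99th percentile is a full erase-plus-program. That's the shape of the problem, whatever the real figures are.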
Keeping pre-erased blocks is useful, but it can only reduce the write latency to the chip's raw program latency. And the chip's program latency is longer than its read latency, especially for MLC at smaller process sizes.
see: http://cmrr-star.ucsd.edu/starpapers/309-Grupp-1.pdf (PDF)
If the device is kept at or near full utilization, eventually a fast operation (a read) will get queued behind a slower one (a program or erase), and your tail latencies will suffer.
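A minimal sketch of that queuing effect, assuming a single chip that serves operations strictly in arrival order and can't suspend one in progress. The service times are invented for illustration, and `completion_latencies` is a hypothetical helper, not anything from the paper above.

```python
# Assumed per-operation service times in microseconds (illustrative only).
SERVICE_US = {"read": 50, "program": 1000, "erase": 3000}


def completion_latencies(ops):
    """ops: list of (arrival_us, kind), sorted by arrival time.

    Models one chip, FIFO order, no suspend/resume. Returns a list of
    (kind, latency_us), where latency is completion time minus arrival.
    """
    chip_free_at = 0
    result = []
    for arrival, kind in ops:
        start = max(arrival, chip_free_at)  # wait if the chip is busy
        chip_free_at = start + SERVICE_US[kind]
        result.append((kind, chip_free_at - arrival))
    return result


# A read on an idle chip versus a read that arrives just after an erase starts.
idle = completion_latencies([(0, "read")])
busy = completion_latencies([(0, "erase"), (10, "read")])

print(idle)  # [('read', 50)]
print(busy)  # [('erase', 3000), ('read', 3040)]
```

The read itself still takes 50us of chip time in both cases. In the second case nearly all of its latency is time spent waiting for the erase to finish.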