I don't agree - availability is a totally meaningless property if you are allowed to occasionally return "no, I won't process your request". Such a response communicates nothing about the state of the atomic object you are writing to or reading from, so you can always return it and trivially satisfy 'availability' if we define it this way.
To your other point - be aware that I didn't write the article, so I'm not speaking for the author. However, I think you're right that the article makes it sound a bit like a single failure or message loss will cause any protocol to immediately sacrifice availability or consistency. This isn't the case - all CAP does is establish that for every protocol, there exists a failure pattern that will force it to abandon one of the two.
For quorum systems, this means that a permanent partition leaves at least one side without a majority, and that side can no longer respond to any request without giving up consistency. Paxos is another example.
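To make the quorum case concrete, here's a toy sketch (my illustration, not from the thread): with five replicas and majority quorums, a partition that isolates two nodes leaves the three-node side able to make progress, while the two-node side must refuse requests rather than risk inconsistency.

```python
# Majority-quorum rule: an operation needs acks from a strict majority.
N_REPLICAS = 5
MAJORITY = N_REPLICAS // 2 + 1  # 3 of 5

def can_serve(reachable_replicas: int) -> bool:
    """A side of a partition may respond only if it still holds a majority."""
    return reachable_replicas >= MAJORITY

# A 3/2 split: one side keeps working, the other must stop.
assert can_serve(3) is True
assert can_serve(2) is False
```

If the split were instead a "permanent" loss of three nodes, neither side would hold a majority and the whole system would stall, which is the forced sacrifice CAP guarantees exists.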
So you're right, there are particular patterns of partition that stop a protocol from functioning correctly. And many that don't - hence the term 'fault tolerant' has some meaning.
Avoiding these patterns, in practice, can turn out to be surprisingly tricky. High-performance systems can't afford to have too many participants, which means that the probability of a problematic failure is higher than we might like (five participants in a consensus protocol is already a lot for high throughput, yet just three failures are enough to destroy the majority). Failures are also often correlated, so independence assumptions don't hold as well as we would like. Machines crash. Networking gear fails.
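A toy calculation makes the point (the failure probabilities here are assumptions of mine, not numbers from the thread): if each of five replicas fails independently with probability p, the chance that three or more are down at once, stalling a majority-quorum protocol, is a binomial tail. Correlated failures behave like a much larger effective p, and that tail grows fast.

```python
from math import comb

def p_majority_down(n: int = 5, p: float = 0.01) -> float:
    """Probability that at least a majority of n replicas are down,
    assuming each fails independently with probability p."""
    majority = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(majority, n + 1))

# Independent failures at p=0.01: the bad pattern is very rare.
rare = p_majority_down(p=0.01)
# Correlated failures (shared rack, switch, power) look more like p=0.2,
# and the same pattern becomes orders of magnitude more likely.
common = p_majority_down(p=0.2)
assert rare < common
```

This is the "engineered low, but never zero" trade-off the next paragraph describes: at sufficient cluster scale, even tiny per-interval probabilities get hit routinely.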
There's no abuse of statistics here. The probability of a particular failure pattern can be engineered low, and at that point you must weigh the cost of losing availability or consistency against the effort spent minimising the chance of occurrence. We are talking about edge cases here, and the implicit assumption is that the cost of hitting one of them is huge (and it often is). However, if you run a large enough cluster, you hit edge cases all the time.
(Although you mention partial synchrony, note that most of these results are mainly applicable to asynchronous networks in the first instance).
Availability of the individual node, or of the service as a whole? So long as the nodes always answer quickly, can't I just ask a few different ones until I get a successful response (or conclude that the entire system is down)?
Let's imagine a system where you want to be 100% available for reads, for any number of failures less than N. Then you need to be able to submit every single write to every single node in the system, otherwise the failure of all but the up-to-date node will result in stale reads.
But then if a single node is partitioned from the network, we can't (correctly) be available for writes, because the system can no longer propagate updates to every node as required. It doesn't matter which node you ask.
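Here's a minimal sketch of that trade-off (my illustration, with hypothetical names): a "read-anywhere" store applies every write to every replica so any single node can serve a fresh read, and the moment one replica is unreachable, writes can no longer complete correctly.

```python
class ReadAnywhereStore:
    """Toy store: reads hit any one replica, so writes must hit all of them."""

    def __init__(self, n_nodes: int):
        self.replicas = [{} for _ in range(n_nodes)]
        self.reachable = [True] * n_nodes

    def write(self, key, value):
        # Must reach *every* replica; otherwise a later read of a
        # surviving stale replica would return old data.
        if not all(self.reachable):
            raise RuntimeError("a replica is partitioned; cannot write consistently")
        for replica in self.replicas:
            replica[key] = value

    def read(self, key, node: int):
        # Any single reachable replica can serve the read.
        return self.replicas[node][key]

store = ReadAnywhereStore(3)
store.write("x", 1)
store.reachable[2] = False       # one node partitioned away
assert store.read("x", 0) == 1   # reads remain available...
try:
    store.write("x", 2)          # ...but writes are not
except RuntimeError:
    pass
```

Flip the design (write to one node, read from all) and you get the mirror-image failure: writes stay available, reads don't. Either way some partition forces the choice.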
The point is that every system has a failure mode like this. I take your point that it's not always just a single node failure that precipitates the abandonment of C or A, but that was never the point of the CAP theorem.