I'm honestly surprised that anybody could mistake the Lyrebird generated voices for something real. It's got this weird buzzy noise about it which sticks right out to my ear, and only a little bit of noticeable influence from the author's voice.
Either some people are much worse at perceiving this than I'd expect, or the article is hyping up something which just doesn't deserve that level of hype - yet.
I think the author had a slightly over-the-top reaction with "it’s time to unplug everything, chuck my phone, don a tinfoil hat, and move to the woods." I noticed the buzzing and the computer-generated quality of the audio immediately. I also consider myself very good at noticing sound differences, but I think that's beside the point in this case.
People with phonagnosia are incapable of recognizing anyone's voice. If it is like face-blindness, then probably there is a spectrum of ability to recognize voices, and so what sounds obviously computerish to you might be indistinguishable from the real thing to someone else.
The thing is - I find it almost impossible to recognise people's voices on the phone, unless they have a particularly distinctive voice - usually the accent is the distinguishing factor. But I think I'd get this straight away, even over the phone. Maybe something to do with being a (very amateur) music producer, though.
You can definitely tell which one's the AI, but if you weren't particularly paying attention on, say, a phone call (where you're used to bad audio quality anyway), you might not immediately notice.
I can't speak for anyone else but I would recognize that as software-generated instantly. It's unmistakable imo, and actually sounds significantly worse than, for example, these samples from Google's Tacotron 2 system: http://www.androidpolice.com/2017/12/28/googles-new-text-spe...
It's a little off topic because they do a different thing than in the article, but check this out for some (actually) scary good computer generated examples:
It's a touch robotic, but I've heard real people sound like that (and worse) in Google hangouts with poor connections. I also imagine it can only get better with more knowledge and tech.
The headline is overhyped, but the result is surprisingly good given that I had very low expectations. It won't fool anyone, but it should be usable for some scenarios.