Putting on my tinfoil hat here: if all it takes to build a speech model that impersonates someone's voice is an hour's worth of them talking... what happens when the wrong person gets that? For example, the government or a corporation (an internet phone service, maybe) could use it to fabricate evidence of conversations that never really happened. It could also be used to aid in identity theft.
To indulge you just a little bit, I think it would most likely result in a rapid expansion of the forensics industry. While I have no legitimate experience in signal processing, I imagine there would be ways to determine, at least to some degree, whether a recording is genuine or an impersonation. Whether or not that would stop tech-savvy marketers and con artists from scamming grandma, I don't know. We'll have to wait and see what the 21st century holds for future firewalls. Of course, if someone with any knowledge on the subject would like to step in and point out how stupid my response sounds to them, I'd be glad to become more informed!