Scammers deepfake CEO’s voice to talk underling into $243k transfer? (sophos.com)
322 points by wslh 23 days ago | 129 comments



Boss, executive, manager, supervisor. Person in charge. Many of these people want their subordinates to enact their will in whatever way requires the least effort on their own part.

They want to give ambiguous direction and receive exactly what they imagined but still be delightfully surprised. They don't want to hear about the unintended consequences of getting exactly what they asked for; they want it to "just work and don't trouble me with the details".

Pedantry is unacceptable. Everything must be interpreted exactly how it was meant to be. The rules are meant for others. If a subordinate fails while breaking a rule, then they are fired. If a subordinate fails because they didn't break a rule, then they are fired.

Ultimately, as we come closer to perfect impersonation at the press of a button, these 'people in charge' will have a tough time adapting. They want absolute obedience and unquestioning loyalty. Unless their subordinate was talking to a social engineer using a deep fake.

On the other hand. The people in charge who trust their subordinates, who let them fail and encourage them to do better, who let them question their approach and decisions. These people in charge will find themselves and their organizations more resilient to these attacks. "Yeah, Jeff might be a real jerk sometimes and he's always second guessing decisions, but he doesn't transfer $12 million just because he gets a phone call where someone who sounds like me is upset that the money hasn't been transferred."


This development is a blessing in disguise for people tired of dealing with "decision-makers" giving orders continuously. A tedious authentication step is a good speedbump against impulsive CEOs.


We should be able to trust what's displayed as a caller (stolen unlocked smartphone is a different problem). It doesn't seem that hard. It's not rocket science. I don't understand why we don't have that yet.


That only helps if you're able to recognize that number. Otherwise it's just an equally plausible number to be called from. In this case it's the voice that tricked the person and I'm pretty sure that if your boss calls you and demands something the likelihood to comply is pretty high. Your suspicions about not recognizing the phone number will not even get time to properly form.

We need bosses to understand that this is not the way to request things; requests should always go through authorized channels, which implement better "authentication" processes. This may also dampen their willingness to make "out of band" requests.


What millennium do you live in that when one of your contacts calls you, the thing that shows up is a number and not the caller's name?

That's my biggest pet peeve about Android. (Totally unrelated discussion, sorry) I can't tell it to send all calls that aren't from my contact list directly to voicemail. They're universally spam.


The millennium in which my CEO can call me from any phone in the company and I'd be hard pressed to tell if it's legit or not? Or from a phone in another parent or subsidiary in any number of countries that I have basically no chance of recognizing? The simple fact that it's the CEO calling discourages anyone from questioning it because C level executives and senior management have a way of imposing this kind of "absolute authority" in many companies.


Excellent point. Authoritarian workplaces worked adequately in the industrial age. But they've had a hard time keeping up with the modern world, where you get better economic results by empowering workers and investing in social systems. It's especially obvious to companies like Google: https://www.nytimes.com/2016/02/28/magazine/what-google-lear...

This deepfakes stuff can only help accelerate the move toward bottom-up power and process-oriented thinking.


The alternative, collaborative decision making, has its own downsides: it is slower, sometimes misses goals, and sometimes produces conflicting outputs.

In reality, a "directive" can be generalized to any "process request", such as a software enhancement request.

A "goal level" statement must be described, and then someone between the implementer and the manager needs to flesh out and escalate what the functionality looks like and any potential conflicts. ...then the person implementing them needs to translate that spec into an implementation, and then escalate any potential unintended consequences found in testing phase. ...and all of this in an iterative cycle.

Unfortunately, 99% of office requests do not carry enough _value_ for this process to be worthwhile. With infinite resources, we could achieve infinite perfection in every management request.

...but in the real world, where business demands are transient, the expectation should be that management doesn't know what it wants all the time and asks for things with incorrect parameters and conditions. ...and so the implementer (since there's no one in between) needs to bring their industry experience and communication skills to _pointedly_ explain the caveats in the request and explain what the alternatives are.

These are difficult conversations to have. They inherently contain conflict and the emotions of frustration associated with the manager perceiving that they aren't getting what they want, and the implementer being asked to break protocols and safeguards.

There is no perfect solution. This comes down to professionalism and having employees and managers that are able to offer compromise, have dispassionate discourse, and be willing to thoughtfully approach the problem in succinct and timely fashion.

...which is why hiring experienced intelligent emotionally stable people is the most productive choice for companies.


This. Technical verification solutions help, but none of them defend very well against your CEO apparently ordering you to bypass the verification step.


So the idealized authoritarian CEO must first have ordered you to never bypass it, and then must regularly test you to ensure that it cannot be bypassed.

There's no reason this sort of authorization process couldn't be trivially handled with any one of a number of simple technologies. If there isn't already some sort of Personal Verification Service to do this, well, there's a niche to be had. Come up with a better name.


I wish I could "give you gold" or at least upvote your comment a thousand times!


No organization can work if normal communications are hijacked and spoofed by bad actors. This is a pretty severe security issue that no amount of "social engineering awareness" training is going to fix. Most businesses can't operate if every decision of consequence needs a face-to-face meeting to verify authenticity.


> a face to face meeting to verify authenticity

There are other ways to do 2-factor/3-factor verification (physical or passcode-based tokens, e-mail plus voice, or even a video chat).

There are other safety measures, like requiring two-person authorisation for large transactions; many organisations, and especially charities, already do that.
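A minimal sketch of that two-person ("four-eyes") rule in Python; the threshold and role names here are made up for illustration:

```python
# Two-person authorisation: transfers above a threshold require two
# *distinct* approvers; smaller ones need at least one. The $10k
# threshold and the role names below are illustrative assumptions.
THRESHOLD = 10_000

def authorised(amount, approvers):
    """Return True if the transfer has enough distinct approvers."""
    unique = set(approvers)
    if amount > THRESHOLD:
        return len(unique) >= 2
    return len(unique) >= 1

# A single approver (even listed twice) cannot release a large transfer.
assert not authorised(243_000, ["cfo", "cfo"])
assert authorised(243_000, ["cfo", "controller"])
assert authorised(500, ["clerk"])
```

The point is that a deepfaked phone call convinces at most one person; the rule forces the scammer to fool two independently.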


This doesn't solve the actual problem that you can't trust normal communications if they can be easily spoofed.


Although the general class of problem is real, this story didn't hold up under examination. Here's a follow-up article in Spiegel: https://www.spiegel.de/netzwelt/web/deepfakes-werden-erkennu...

It's in German, but Google Translate to the rescue. Key paragraphs:

'But when asked how they know that such software, and not a human voice imitator, was used, a spokeswoman replied to SPIEGEL by email: "We do not know with 100% certainty. Theoretically, it could have been a human voice imitator, but we do not assume so. There are some clues (but no evidence)."

'In turn, the clues she mentions have no technical relevance; they can by no means be interpreted as evidence of a deepfake. The supposed "first case" can therefore at best be termed a "possible case". Which is quite symptomatic of the debate about deepfakes and the resulting risks.'


Possible case, further tempered by an "infosec industry source". :) It is their job to sell fear.


The entire "infosec industry" is fraudulent, in my estimation. A bunch of assholes selling powerpoints and not a whole lot else. I've seen it a lot.

None of them know what they're talking about. Source: my direct experience


I wouldn't say that, but they do suffer from something the IT industry as a whole suffers from: there is/was a need, and by fulfilling that need, they find themselves in a quandary where they're committed to something that must continue to exist in order to avoid inflicting self-harm.

You can't go into business selling software on the assumption you'll sell a single copy and your work is finished. Infosec is no different. There is always this paradoxical need to ensure you never become entirely redundant, regardless of the function you specialize in.

I think a lot of the great "evolutionary difficulties" in our industry can be phrased this way. It's why things like Microsoft Excel are laughed at even as hundreds of thousands of people are trained in the art of constructing effectively bespoke spreadsheet apps, and Django apps are churned out by the million every year. I hope for a correction some day, just as much as I hope I'm on the right side of it when it comes.


You misunderstand. "Infosec" is a shopping list item on govt requirements. "We need this, this, this, and it needs to be secure!" They assume security is just this additional ingredient that can be purchased, mixed in, and boom; project is secure, like mixing in chili to anything will make it hot. The infosec companies do everything they can to encourage this assumption.

It's all bullshit, and these people are pure scum.


Speaking as a developer, I've had my ass handed to me by competent security folk on several occasions. Unfortunately, in the software industry, security must regularly be an additional ingredient that can be purchased and mixed in, because 98% of us are completely clueless and absolutely not thinking about this stuff as we're trying to clear a sprint board.


Serious questions:

> because 98% of us are completely clueless

Do universities/bootcamps teach OWASP-style classes of programming vulnerabilities these days? Some developers are curious enough to learn them on their own, but many are oblivious.

I suspect there could be a startup idea here somewhere.

> absolutely not thinking about this stuff as we're trying to clear a sprint board

Does anyone have a decent tool/process for remembering all of the detailed tasks for every type of software deliverable? I find myself in a state of cognitive coma after sprint planning when I need to divide tasks into subtasks.


> Do universities/bootcamps teach OWASP-style classes of programming vulnerabilities these days? Some developers are curious enough to learn them on their own, but many are oblivious.

I had exactly one course that touched on security.

A course in web programming.

The instruction we received consisted of: "If your project is not secure, it will lose points."

I'm not sure a single person in that class had one point taken off for getting security wrong.

Why would a university care about educating people in something ephemeral and domain-specific, like security, when it could instead be teaching them about complexity theory, Dijkstra, and third normal form?


I refuse to learn about infosec, for the same reason that I refuse to become an amateur surgeon: I don't want to know enough to be dangerous. This is a field that will take 100% of your time if you want to be good at it, and being "meh" is more dangerous than realizing you don't know anything and paying an expert.

(Crypto and security in general were a hobby of mine until I realized how difficult the field really is. "Programming Satan's computer" was one paper that contributed to that.)


You are utterly wrong.

The real problem is that there is a vast need for professionals, and the lack of them attracts a lot of smoke-sellers. And for a lot of people it is hard to tell whether they are legit.

Just look at the Machine Learning / Artificial Intelligence industry. Also "fraudulent" if you apply the same logic.


Oh, I don't think I'm wrong. And neither do you, since your second sentence basically agrees with me. Hit a nerve?

And yeah, most of the ML industry is in precisely the same category. A bunch of frauds.


I can see your point, but babblers != industry.


Sounds like your idea of infosec is colored by a bunch of salespeople.

You'd be surprised just how many places run public, unpatched stuff with admin/root holes here and there. I've seen passwords like "123456" and "password" and plenty of other badness that nobody really bats an eye at.

And even simple things like "use WPA2 and a password manager", the low bar of infosec, are routinely ignored. Companies can barely even manage that... and they have the funds.

And it's not like the bad stuff is just scare quotes. Ransomware is a thing. I've seen a hospital network up north get hit by it. The City of Madison, IN (1h away from me) ended up paying a large sum because they thought backups were pointless. I even know of a story where a state government's machine ended up being a warez server. The lead's plan to clean up a trojaned Linux box was to rsync from a clean one, which left most of the trojan kmods intact. I caught it down the line.

Sure, if what you're talking about is complaints about the likes of IBM with QScan or similar, with grandiose claims that their software will save everything, that's obvious bullshit. Security is definitely processes and procedures, ALONG WITH technical means to facilitate them. Even automated scanning of "front doors", or doing routine searches on Shodan, is a magnitude better than nothing.


Nice anecdote. There's a shitload of salesmen selling snake oil in security, but they're mostly selling it to idiots and people who think security is a box you tick and not a continual process.


In your direct experience, have you never come across any non-trivial security vulnerabilities?


For the newspaper, a 'deep fake' story sells better.

For the executive who messed up, by using the 'deep fake' story, they become blameless. If they admit to a voice actor, they can be questioned for incompetence.

Everyone wins, except those who care about the truth.

The whole thing reminds me of: "The majority of men prefer delusion to truth. It soothes. It is easy to grasp. Above all, it fits more snugly than the truth into a universe of false appearances—of complex and irrational phenomena,” ― Friedrich Nietzsche, The Antichrist


>For the executive who messed up, by using the 'deep fake' story, they become blameless. If they admit to a voice actor, they can be questioned for incompetence

I don't really see why the two should be considered any different.


Because of one of Clarke's laws:

> Any sufficiently advanced technology is indistinguishable from magic.

The mostly tech-illiterate bosses and the unthinking masses can't really parse the deepfake story. For most, there's simply no basis of comparison, even if they wanted to expend the energy on system 2 thinking.

But talk about voice actors, and everyone knows someone who does imitations, so they will judge the possibility of what happened against their own anecdotal experience. This also mostly comes from system 1 thinking, which is pleasant to do, since it removes uncertainty from the world.

Edit: when thinking of deep fakes, it applies to me too. I don't know how advanced they CAN be. I'm sure some military has some amazing ones that aren't open to the public. It's a point that goes beyond my imagination and into a world of uncertainty.


So a fake deepfake - not sure where this is going to end.


Crud, I would have totally blindly trusted Sophos on this.


Don't trust obviously incentivized players like Sophos, karma-chasing security wannabes, or anyone else in this category around this kind of topic. German media is very good. Spiegel is an excellent paper.

Trust your instincts, look for second sources, and wait to follow up before you decide one way or the other.


Sounds like what I always considered "crazy talk" but you're right. Don't always assume and trust. Thanks


Sure, this is a troubling new variation, but it's still the kind of social engineering that can be deflected with reasonable protocols, such as a second channel of authentication (e.g. "Sounds good boss, can you email me the transfer/order id to confirm things and I'll finish it from here"). The underling CEO was just as vulnerable to someone imitating his boss's voice the old-fashioned way.

Regardless of the advances in AI, I think text-based social engineering will still be prevalent and efficient. IIRC, the Anonymous hackers who targeted HBGary got SSH access by fooling the company's chief security officer. Sure, they did it by emailing from the company owner's account (thanks to a weak admin password surfaced through other vulnerabilities). But the hackers didn't even know the account name when asking for a password reset. The CSO could've stopped things by asking for a call, e.g. "Hey, let me call you, I need to walk you through this part," or even just texting the phone for confirmation.

https://arstechnica.com/tech-policy/2011/02/anonymous-speaks...


Does this trend bring us into the promised "post-truth" world?

...or will it force humanity to finally acknowledge the very basic security concept of public/private keys for signing important shit?


I recently chatted about the potential use of DeepFake in future propaganda with other HN readers, and apparently it's going to be here soon. You may want to follow this thread, https://news.ycombinator.com/item?id=21269150

But in short,

Several observations...

* A major threat is to end-to-end VoIP/telephony encryption. Currently, the most common way to verify one's public key and ensure that no MITM wiretapping is going on is reading a hash (encoded as words) aloud in a phone conversation. It's used in many protocols, such as Signal, or ZRTP by Phil Zimmermann et al. It isn't a full security proof, but it's considered a shortcut with reasonable security for most people; the assumption is that voice synthesis can't be done in real time yet. With deepfakes, that assumption collapses, and this shortcut is going to be blocked soon. Full verification, like signing your VoIP key with your long-term public key, or asking security questions (e.g. OTR's SMP protocol), will be needed (perhaps not by everyone, but now by a lot more people, to a greater extent).
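For the curious, here's a toy sketch of that spoken-hash check (a ZRTP-style short authentication string): both ends hash the negotiated session key material and read a few words from a shared list aloud. The word list and truncation length below are illustrative, not the actual PGP word list ZRTP uses:

```python
import hashlib

# Illustrative 16-entry word list: one 4-bit nibble selects one word.
WORDS = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot",
         "golf", "hotel", "india", "juliet", "kilo", "lima",
         "mike", "november", "oscar", "papa"]

def sas(session_key: bytes, n_words: int = 4) -> str:
    """Derive a short authentication string from shared key material."""
    digest = hashlib.sha256(session_key).digest()
    # Split the leading bytes of the digest into 4-bit nibbles.
    nibbles = []
    for byte in digest[: (n_words + 1) // 2]:
        nibbles.extend([byte >> 4, byte & 0x0F])
    return " ".join(WORDS[n] for n in nibbles[:n_words])

# Both parties with the same key derive the same words; a MITM who
# holds a different key on each leg produces mismatching words, which
# the humans would (in theory) catch when reading them aloud.
assert sas(b"shared-secret") == sas(b"shared-secret")
```

The commenter's point is exactly that this last step relies on recognizing the other party's *voice*, which real-time synthesis breaks.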

* KirinDave suggested that, in addition to signing, timestamping will also be important. If automatic synthesis of video and audio becomes widespread, one way to prove the authenticity of material is to timestamp it as soon as it's recorded, or even timestamp it in real time if real-time forgery is a serious threat (I hope not). This is one of the few use cases where a blockchain actually makes sense, if you want minimal trust in 3rd parties.

The final thought is that it's time to rewatch Ghost in the Shell (the TV animation series, not the movie). Released in the early 2000s, it portrayed and predicted our world remarkably well, and it'll give you a lot of inspiration about what a future society might look like. In one episode, the protagonists discover a government conspiracy aimed at intensifying a military conflict involving nuclear weapons, and they have the following dialogue.

> "How do we stop it? Can we post the video footage online?"

> "No. Video footage is seen as completely untrustworthy today."


Your second observation regarding time stamps is something I’m really interested in. The broader topic is provenance: the ability to say “this AV stream came from this device at this time.” I just wrote a post echoing KirinDave’s idea but perhaps expanding the scope in this thread: https://news.ycombinator.com/item?id=21526278

The more verifiable pieces of data that you can associate with a recording, the more you can trust that it came from when/who it claims to. If I send a recording that can be tied to:

- a timestamp service with my request for one stored in a blockchain,

- a tamper-evident device that signs the data with its own private key,

- my own private key

then you can have a high degree of certainty that I am the one who recorded and sent the content and that it has not been altered. I could still be tied to a chair while a voice actor impersonates me and forces me to send it. This is after all basically the modern equivalent of a proof-of-life photo with the kidnapped victim holding today’s newspaper. But it’s a lot more effort for someone to go through compared to having none of those other guarantees.
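A dependency-free sketch of such a provenance record: hash the raw bytes, attach a timestamp, and chain each record to the previous one so that back-dating an entry invalidates everything after it. A real system would additionally sign each record with a device or owner private key (e.g. Ed25519); that part is omitted here to keep the sketch stdlib-only:

```python
import hashlib
import json

def make_record(data: bytes, prev_hash: str, ts: float) -> dict:
    """Build a provenance record linking this recording to the chain."""
    body = {
        "content_sha256": hashlib.sha256(data).hexdigest(),
        "timestamp": ts,
        "prev": prev_hash,
    }
    serialized = json.dumps(body, sort_keys=True).encode()
    body["record_hash"] = hashlib.sha256(serialized).hexdigest()
    return body

def verify_chain(records) -> bool:
    """Recompute every hash and check each record points at the last."""
    prev = "0" * 64
    for rec in records:
        body = {k: rec[k] for k in ("content_sha256", "timestamp", "prev")}
        serialized = json.dumps(body, sort_keys=True).encode()
        if rec["prev"] != prev:
            return False
        if rec["record_hash"] != hashlib.sha256(serialized).hexdigest():
            return False
        prev = rec["record_hash"]
    return True

r1 = make_record(b"recording-1", "0" * 64, 1_700_000_000.0)
r2 = make_record(b"recording-2", r1["record_hash"], 1_700_000_060.0)
assert verify_chain([r1, r2])
r1["timestamp"] = 1_600_000_000.0   # tampering breaks the chain
assert not verify_chain([r1, r2])
```

Anchoring the chain head in a public timestamping service (or a blockchain, as above) is what stops the recorder themselves from rewriting history.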

It’s a fascinating topic that will only become more relevant as deepfaking gets easier. Whoever makes the first device/system to do this, if it’s not a flawed premise, will make a pretty penny.


I think most countries can't solve the authenticity/signing problem until they admit that a central trust/PKI database is basically a requirement. Some countries vehemently oppose such systems and will do so for a long, long while. Whether that's for good reason, given all the real issues that arise without one, we don't know yet.

There is at least some hope for EU citizens that they won't have to worry about these authorization/identity issues, because the European Union just recently created the legislative framework required for solving this problem across the entire EU. With national trust/PKI services there isn't any need to resort to insecure, ancient methods (phone calls, fax) that can be spoofed or intercepted with increasing simplicity. It is somewhat sad how long it has taken; nearly 20 years later, the EU is following Estonia's practice and example. As an Estonian, I might be biased about how good such systems are, but it isn't bias speaking that the system seems to work: issues like identity theft and account takeovers generally don't exist here. The fact that we have to specifically teach people how to use mostly foreign services safely, because none of them can provide really secure authentication and identification together, says quite a lot about the differences.


I think it's sometimes difficult to pinpoint how far we can go with not telling the truth.

For example: I create photorealistic product pictures in 3D (Blender). This saves my customers a lot of money. But the thing is: we present fake pictures to the world that look very real. The products look perfect as renders, but in the real world this can be a little different.

Another example is IKEA. Almost everything you see in their catalogs is 3D rendered. In those renders everything fits together perfectly. But when you build it at home you might see millimeter offsets.

Maybe those are extreme examples, but truth is not black and white.


At least with advertisements, people are aware that they're looking at curated content, even though it still might have a negative effect, because you can't escape it altogether even if you know.

But these deepfake scams, or fake news, or what have you, go a step further, because they intrude into what is supposed to be authentic and genuine.


I hope it does. I have for years thought about the two possible post-fake worlds we can end up with, either everyone learning gradually on their own how to tell truth from fiction and learning how to efficiently do their own research, or a mass-surveillance system designed to 'protect' those that 'cannot determine truth from fiction themselves'. I guess we'll see faster than I thought.


A simple "only act on official mail" policy can defeat this post-apocalypse deepfake terminator.


Sure, so long as the official mail is set up to check public/private keys!


My point is, even with quantum computing™, all you need is a simple email server with a lock after 3 failed attempts. If you can't manage this, public keys are not the answer. I have seen fewer than 10 scenarios where a private key in the user's hands solves the problem. None were scam related.


> simple email server

These days, DKIM is necessary for even the simplest of email servers that wants to interact with any of the major email platforms.


I think it probably will; it's just a matter of time before political actors use this kind of thing to invent dirt on someone. We are probably heading into an era where everyone deepfakes every politician's voice saying all sorts of wacky stuff and passes it off as real.

"Oh, why is the quality bad? I was recording it secretly on my phone in my pocket, that's why it's muffled and sounds off" will be the explanation for all of these. It will come to such a head that politicians will be able to say anything they want to people (racist, sexist, genocidal, whatever) and continuously claim that it was merely a deepfake, and thus not their opinion.

People will invent world events out of thin air with deepfake footage and audio. If the CIA cut off the internet access of a small island nation and created a bunch of deepfakes of a politician stepping down, while covertly conducting operations against it, how would anyone in the West ever know it wasn't real?


Were we in a "pre-truth" world a century ago?


Definitely.

Remember when people used to let their kids fuckin play outside? =\ Sometimes for 12 hours!

We're less violent as a planet/society than ever.

But back then, the "gore" of all of humanity's everyday life wasn't smashed into our faces as a routine - only occasionally, as life dished it out to us personally.


Those "Grandma, send money to bail your grandson out" scams are about to get a lot more interesting. It's one thing to convince your grandparents never to send money to random people calling on your behalf, but what if the conversation happens in your voice? How do you harden against such an attack when dealing with people who are technologically illiterate?

Perhaps this opens up a market for secure identity verification that is accessible to the layman...


To be fair, these scammers already claim to be the grandson. My grandma got a call from me, from jail, in Mexico. I guess I had a little too much to drink at a wedding, and I had a cold, so my voice was a little off. And I desperately needed to post bail in a foreign country.

It wasn't the voice that tipped my grandma off. Just that I do not drink. So that was close.


This isn't really that hard to fix, and it's basically horse-battery-staple a la Hogwarts dorms. Families need to establish a verification phrase or word and ask "what's the password?" if there is any doubt about identity. Given deepfakery of images, even a video call won't be safe in the future and personal info about people can be datamined, so the "3 security questions" approach is also already useless.
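A minimal sketch of storing such a family phrase as a salted hash, so a written note or leaked backup doesn't reveal it. The phrase and iteration count below are illustrative:

```python
import hashlib
import hmac
import os

def enroll(phrase: str):
    """Store only a salted PBKDF2 hash of the family phrase."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", phrase.encode(), salt, 100_000)
    return salt, digest

def check(phrase: str, salt: bytes, digest: bytes) -> bool:
    """Verify a spoken phrase against the stored hash."""
    candidate = hashlib.pbkdf2_hmac("sha256", phrase.encode(), salt, 100_000)
    # Constant-time comparison avoids leaking how close the guess was.
    return hmac.compare_digest(candidate, digest)

salt, digest = enroll("correct horse battery staple")
assert check("correct horse battery staple", salt, digest)
assert not check("password123", salt, digest)
```

In practice the hard part isn't the crypto, it's getting Grandma to actually ask "what's the password?" under pressure; but unlike security questions, the phrase can't be datamined.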


What I don't get about these scams is what prevents the victims from working with the bank to reverse the fraudulent transfers. Sure, I get that in these instances money is usually moving between the banking systems of two different countries, but just as we have extradition treaties, how is this not a thing?


Well, the issue is, I think, that wire transfers specifically are designed NOT to be reversible. I'm not saying they can't be reversed in cases of fraud (like this), but it's probably much more difficult because the money would have been IMMEDIATELY available in the fraudster's account, and they probably knew to move it somewhere else or withdraw it immediately (though I'm not sure how you withdraw $200k in cash).

Anyway, that's my guess as to why this is difficult to reverse: if it were easy to reverse, both sides would be very suspicious of each other during a large transfer.


This and other discussions on HN make me think about how much of your voice has been recorded by companies when you call their support lines. Are these troves of voice recordings stored safely, with appropriate levels of access control?

I mean most people can usually pick up on the difference between a voice recording and a generated voice sample based on the recording. Have there been studies where generated voices are then subjected to telephone compression and layered with background noise to appear to be a phone call from a car?

If DeepFake voices only need 5 second clips to produce good enough versions of anyone's voice and any discrepancy in quality could be attributed as a bad telephone connection or masked with background noise, is anyone really safe?


> This and other discussions on HN make me think on the fact that so much of your voice has been recorded by companies when you call their support lines. Are these troves of voice recordings stored safely with appropriate levels of access control?

"When we said 'quality and training purposes', we were referring to training a neural network."


One way to defeat this (kind of) is to institute a callback on an internally shared number.

So it would have gone like this:

    1. scammer deepfakes CEO voice
    2. demands X dollars transferred elsewhere
    3. Employee hangs up and calls back using pre-shared phone#
    4. Gets confirmation
Of course this can be somewhat thwarted by SIM-stealing attacks. But this is just another layer of defense that stops another 99.xxx% of attacks. Enough of those and these attacks become infeasible.
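The heart of step 3, sketched in Python: the callback number comes from a pre-shared internal directory and deliberately ignores the incoming caller ID. Directory contents here are made up:

```python
# Pre-shared directory of known numbers, distributed out-of-band
# (e.g. on the intranet). Entries are illustrative.
DIRECTORY = {"ceo": "+44 20 7946 0000"}

def callback_number(claimed_role: str, caller_id: str) -> str:
    """Return the number to call back on, ignoring caller ID entirely."""
    known = DIRECTORY.get(claimed_role)
    if known is None:
        raise ValueError(f"no pre-shared number for {claimed_role!r}")
    return known  # deliberately NOT caller_id

# Even if the incoming caller ID is spoofed to look plausible, the
# callback goes to the directory entry, not to the spoofed number.
assert callback_number("ceo", "+44 20 7946 9999") == "+44 20 7946 0000"
```

The deepfaked voice only controls the inbound call; it can't answer the phone sitting on the real CEO's desk.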


Or you just don't authorize multi-million dollar irreversible transfers with a phone call? Processes for this kind of thing are in place for a reason.


Well, the obvious counterexample would be fintech. Calls and messages over approved SEC communication methods can be, and are, normal operating procedure.

But in this article's case, we're talking about $0.243 million, which flies counter to "don't authorize multi-million dollar irreversible transfers".

Not sure what your point was here.


My bet is on a voice actor, or an underling invoking deepfakes to cover his arse.


Transferring $230,000 to your cousin's account, and hoping to stay out of jail, by telling everyone that a deep-fake CEO told you to do that is an incredibly dumb way to steal.

I mean, people do incredibly dumb things, so it is certainly possible. I doubt it, though.


How do they know it was AI and not a voice actor?


It seems like they don't know.

"The target of the scam was convinced that he was speaking with his boss due to a “subtle German accent” and specific “melody” to the man’s voice and wired the money as requested.

According to a representative of Euler Hermes Group SA, the firm's insurance company, the CEO was targeted by a new kind of scam that used AI-enhanced technology to create an audio deepfake of his employer's voice." - https://cyberscout.com/en/blog/voice-deepfake-scams-ceo-out-...


It's a random assumption/hypothesis. Maybe it was even the real CEO... :D That'd be too funny.


It'd be an effective way to get some money.


Alan Turing would like to know as well...


> Bafflingly, though, Dessa said that its team created the Rogan replica voice with a text-to-speech deep learning system they developed called RealTalk, which generates life-like speech using only text inputs.

I'm missing something here. How could they replicate the sound of his voice using only text inputs[emphasis theirs]?


They probably trained the model with audio, then told it what to say using text inputs. That made me do a double-take at first too; it's pretty misleading wording.


I had the same question, and came to the conclusion that that's not what they do. In the first post they claim that they used only text inputs [1], but in the post which actually explains what they do [2], they clearly use audio clips to train their model.

[1] https://medium.com/dessa-news/real-talk-speech-synthesis-5dd...

[2] https://medium.com/dessa-news/realtalk-how-it-works-94c1afda...


The voice model is trained on voice inputs. The text-to-speech synthesis is based on text only, rather than audio of a different voice. So, together you have voice and text as inputs.


Do you know if anyone has done this without using text? I mean training a model using only voice samples from different people, and using only voice input during inference.


I think it's pure ignorance on the part of the author, thinking that text-to-speech is somehow harder than speech-to-speech with a voice change.


Sounds pretty stupid, honestly. There is so much more to voice than the words that are said, and punctuation just can't convey everything. Can their replica sing? I think not.


At some point, we are all just going to blame deepfakes for all of our foibles.

"Honey, I sent $20k to that camgirl because I thought she was you! Deepfakes!"


Related discussion (separate article) from yesterday: https://news.ycombinator.com/item?id=21525878


Odds are there was no deepfaking involved here, nonsense clickbait title.


Almost everything digital in audio seems to have been a decade or two ahead of video (non-linear editing, lossless/uncompressed quality, digital synthesis of elements, distribution, etc), primarily due to the bandwidth and processing power requirements being much greater for video.

So I'm curious how, with regard to deepfake technology, video seems to be well ahead of audio. Is audio deep fake technology simply less interesting to people? Are listeners far more sensitive to voice not being perfect? Is the human input just a lot less helpful in modeling the output with voice vs video (where the initial human input is only slightly more helpful than just synthesizing from text-to-speech)?


There is zero information to back up the claim that this was a deepfake and not just impersonation.


I haven't even seen information backing up the claim of the calls. It could be all made up.


If this worked, this scheme is going to be widespread next year, making phones even more useless.

It's tolerable to get spam and scams in email when you're not paying anything for it, but there's no reason to put up with the same on a phone line you're paying top dollar for.


Wouldn’t have taken much effort to verify this. Doesn’t even sound like he was that sure it was his CEO, just says he had a German accent and a similar melody. You’d think for a transfer this large he would want to be a bit more certain than that!


My dystopian mind thinks it's only a matter of time when a deepfake launches fire and fury across the globe. Imagine the Cuban Missile Crisis with deepfakes. Yes, I know there are layers and layers of safeguards, but....


Institutions can adapt, and I'm fairly confident they can handle this okay. I'm more worried about the damage that will be done to democracy from a political perspective and the criminal justice system. If you think large chunks of the US lack a shared epistemology now, just imagine when people have legitimate reasons to mistrust voice or video recordings.


Sounds like a good story for some embezzlement.


Reminds me of when some scammer bilked Google and Facebook out of $100MM through fake invoices. I'm not sure a deepfake is really likely, or a differentiator, here.


To fight these tricks today, when "your bank" or a contact you know calls you about fraud: just hang up and dial their phone number directly yourself. Call your bank and ask to be transferred to the fraud department, dial your boss's number directly, etc.

Until scammers are able to take control of your phone number, which I hope never happens, I think the above is a good way to fight this junk.


That latter part happens all the time, it’s called SIM swapping.


How could there be enough samples of his voice? Seems like quite a jump to “it was a deep fake” unless there’s more info they haven’t revealed



From yesterday: https://news.ycombinator.com/item?id=21525878

Only requires 5 seconds of voice audio to synthesize believable speech.


The cadence of the synthesized voices is still noticeably artificial, even for the short demo phrases. This is not to say this isn't impressive. But how much does this method improve when it isn't constrained by a 5-second sample? If we feed it several hours of public speeches from Martin Luther King Jr, or hell, tens of hours of audio from President Obama or Trump – will it have the same artificial cadence, even if the tone and pitch of the imitated voice is accurate?


"Deepfakes voices!" makes a cool headline but there's a 90% chance either the employee or the CEO is in on it. Right now the only real fact is "the employee claims someone who sounded like the CEO called"... it's far more probable that either the employee is lying or the CEO actually made the call.


Not sure what makes you skeptical. The technology to do this sort of scam was ready at least a year ago. If anything I’m surprised it hasn’t happened sooner (it probably did, we just haven’t heard about it). If making money transfers is part of your job and your boss calls you to make an urgent transfer for a reasonable amount, why wouldn’t you do it?


Because the simplest explanation is often the right one. Do you think it's more likely that some scamming group somehow got thousands of hours of the CEO's voice, trained a world-class voice model, got internal info about how the CEO usually calls and requests transfers, found the correct employee to target, spoofed the incoming number... or was the employee's drinking buddy like "hey, just transfer X to this offshore account and say the CEO called you, we can pin it on that deepfakes stuff we saw in the news"? Or even more likely, CEO wants to embezzle some money so he makes a call for the transfer then feigns ignorance?

If there was any sort of proof that this happened, like a recording, then sure. But there's no facts in the article other than "analysts suspect..." and an attention-grabbing headline.

Maybe I'm just jaded, but after everything that's happened over the last few years, I take everything the media publishes with a grain of salt these days.


My understanding is you don’t need 1000s of hours of CEO’s voice. You train the model on many voices, then fine tune it on a few hours of the target voice.

Regardless of what happened in this particular case, this type of scam is now possible, feasible, and will get easier to perform with time. We should be concerned.


Maybe it was a genuine screw-up and the deep fake story was invented to cover someone's ass.


Trust is under full on assault by the nefarious among us.

I ask you, how will we be able to trust moving forward? Will all transacting be in person? We can fake IDs too.

How will we be able to verify reality in this age of deception?


Previous discussion from 2 months ago: https://news.ycombinator.com/item?id=20864659


A simple predefined verbal password that updates once in a while would be sufficient. Especially if you're transferring hundreds of thousands based on a single phone call.
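The rotation itself can even be automated from a pre-shared secret, so nobody has to distribute a new password by hand. A minimal sketch (the function name and the tiny wordlist are made up for illustration; a real deployment would use a much larger wordlist):

```python
import datetime
import hashlib
import hmac

# Hypothetical wordlist for illustration; use a large list (e.g. diceware-style) in practice.
WORDS = ["apple", "brick", "cloud", "delta", "ember", "flint", "grove", "harbor"]

def daily_passphrase(shared_secret, day=None, n_words=3):
    """Derive today's verbal password from a secret both parties already hold."""
    day = day or datetime.date.today().isoformat()
    # HMAC the date with the shared secret; both sides compute the same digest.
    digest = hmac.new(shared_secret, day.encode(), hashlib.sha256).digest()
    # Map the first few digest bytes onto spoken words.
    return " ".join(WORDS[b % len(WORDS)] for b in digest[:n_words])
```

Both sides compute the phrase offline, so nothing secret crosses the phone line except the words themselves, and yesterday's phrase is useless today.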


So the CEO was caught embezzling 243k and he blamed it on a deepfake?

Genius.


I have a slightly hard time believing all possible responses could be regenerated or done on the fly with a true deep fake. Maybe possible with great planning.


Occam's razor: dude transferred the funds to himself after reading about the "deepfake for audio" research circulating recently, and faked the scam, figuring the worst-case scenario is him being fired and rich.


I assumed a human was listening in and typing responses for a text-to-voice program to speak out


It'd be pretty tough: the operator would have to wait for the other party to finish their thought, comprehend said thought, formulate an appropriate response, and then type it out on a keyboard, all while maintaining the expected rhythm of a real-time phone conversation.


Faster than writing an AI that could pass the Turing test ;)

But yeah, a voice actor seems like an even easier (if less press-worthy) way to run this grift.


Poor internal controls on the part of the CEO's team.


Yes, in this case. On the other hand, it's not hard to see that we're probably increasingly headed towards a future with more rigorous identity and authorization verification for at least some actions and purposes. And those more rigorous processes will inevitably lead to more edge cases that force people to go to a lot of trouble to verify who they are. (Yes, of course we can get your access to your Google account back. Just bring a notarized birth certificate and two other forms of ID to our Mountain View office between 10-11am on a Friday.)


I think we will do better, but I'm an optimist.


The Motel 6 Saga! (Sequel)

https://youtu.be/6ePFuTLYTaE


I tell banks and other people that I will call them back to verify the authenticity of the incoming call.


Are there ML models that can detect a faked / synthesized voice, even if we can't?


The models that detect whether it's fake are used to reinforce its training: https://en.wikipedia.org/wiki/Generative_adversarial_network


The best question is: is there something computationally easy to check but computationally difficult to generate?
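That property is essentially what MACs and digital signatures give you: cheap to verify for a key holder, computationally infeasible to forge without the key. A minimal challenge-response sketch using Python's stdlib hmac (the pre-shared key and the 8-hex-digit response length are illustrative choices, not a vetted protocol):

```python
import hashlib
import hmac
import secrets

def make_challenge():
    # Callee picks a fresh random nonce and reads it out over the phone.
    return secrets.token_hex(8)

def respond(shared_key, challenge):
    # Only someone holding the key can compute this response.
    return hmac.new(shared_key, challenge.encode(), hashlib.sha256).hexdigest()[:8]

def verify(shared_key, challenge, response):
    # Constant-time comparison to avoid leaking information via timing.
    return hmac.compare_digest(respond(shared_key, challenge), response)
```

Note this authenticates the key, not the voice: a perfect deepfake still fails the check unless the attacker also stole the secret.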


Probably, but whatever gets used will just become the next generation's training input; it's going to keep escalating indefinitely.


Anybody who will give away 243K based on a phone call alone is a moron.


Working with much larger sums of money quickly distorts your perceptions of what is "a lot". 243k is under the signing authority of a random Director level manager, and to a CEO would likely feel like a rounding error.


Combine this with caller ID spoofing and it'd probably fool me.


Every company should pre-emptively attempt all known forms of CEO fraud every couple of weeks as a kind of vaccination against the real thing.


This is a thing already. Large banks spend up to multiple billions per year on security (for example) and even a 3,000 person enterprise can easily drop millions a year. Red-teaming is an absolutely necessary aspect of security these days.


And so it begins.


Artificial voices must have improved a lot in the past 2-3 years. None of the ones I've ever heard would pass muster. Something weird in their intonation, a kind of sameness or monotony in their speech i.e. lack of excitability, a bit too perfect and precise.

I just went to cereproc, one of these companies advertising a very realistic synthetic voice, and none of their voices convinced me at all, though I have to admit it was pretty good.

When they get so good that they are indistinguishable except through Turing tests, and maybe not even then, we'll all be in trouble. I somehow expect that we haven't yet reached that point, though it can't be long in coming.


You should really listen to the modern WaveNet samples.

I can't distinguish most of them from human, even knowing ahead of time which one is which:

English: https://r9y9.github.io/wavenet_vocoder/

English: https://google.github.io/tacotron/publications/speaker_adapt...

Japanese: https://r9y9.github.io/demos/projects/icassp2020/


You can probably avoid the uncanny valley with a "bad phone connection" filter.
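A landline passes roughly 300–3400 Hz, so a band-pass filter gets you most of the way there. A rough sketch with scipy (assuming numpy/scipy are available; the cutoffs and filter order are just plausible choices):

```python
import numpy as np
from scipy.signal import butter, lfilter

def phone_line(samples, fs=16000):
    """Band-limit audio to the ~300-3400 Hz range of a telephone call."""
    # 4th-order Butterworth band-pass designed for the given sample rate.
    b, a = butter(4, [300.0, 3400.0], btype="band", fs=fs)
    return lfilter(b, a, samples)
```

Anything outside that band, like low-frequency synthesis rumble or overly crisp sibilants, simply disappears, which is exactly what a listener expects from a bad connection.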


The sound quality on cell phones is really poor for me, so I could see this being an easy trick to pull there. Have you ever gotten a phone call from someone on a bad connection?


You read the article right? It literally happened. It’s already good enough to fool people who aren’t paying attention.


> "Analysts told the WSJ that they believe that artificial intelligence- (AI)-based software was used to create a convincing imitation of the German CEO’s voice…"

> "If this turns out to indeed be a deepfake audio scam…"

It didn't literally happen, they _think_ that's what happened. I'm sceptical personally, I didn't realise deep fake audio could be done in realtime now? And who is this CEO that must have hundreds of hours of publicly available audio that the voice could be trained on?


Take a look at this article that hit HN yesterday: https://news.ycombinator.com/item?id=21525878



