I recently watched this webinar[0] about it, presented by the company ParaSoft, which develops a static analysis tool and compliance reporting solution for businesses required to use MISRA C++[1].
I must admit I found that element of the story... surprising. Has real-time faking gotten that good yet? Presumably there was back and forth in this call, so this person was either disguising their voice or typing responses to be generated on the fly.
I know all this can be done, I'm just surprised it's reached the maturity where an attacker would choose to impersonate someone the call recipient presumably knew vs just being a vague "Bob from IT".
Although to be fair the article does say the employee was suspicious so maybe there was a delay which (if you were looking for it) you would spot.
You could probably reduce the "delay" by using a soundboard of pre-generated filler material and playing that while you type the real response. "Let me find that bookmark", "So the thing about that is...", "ummm yeah. so...", "hmmm no not really"
You can also use text macros to type the response faster. Here they were trying to get MFA access, so you could map longer phrases that will come up often like "Okta multi factor authentication" to numpad 1. Company name to numpad 2. IT supervisor name to numpad 3.
If you know the target of the conversation you can tailor what you pre-generate. I like to mess with scam callers when I get one, and I've noticed some are using some kind of soundboard with a woman's voice (I'm pretty positive it is real and not AI) and they have a planned flow / script. If you try to deviate from the script they have some options to bring you back into it. If you ask them to repeat something you can notice it's the exact same audio snippet as before. If you accuse them of being a bot they have a few samples of the woman sounding shocked and mildly embarrassed. "Oh my goodness, do I really sound like a bot? No, it's just been a long work day for me. I'm sorry about that."
Why type or use a soundboard? You aren't thinking Mission: Impossible enough.
Live transcription in real time has been a thing forever, so there's no reason to think this couldn't all be glued together into a "voice changer" like the typical super-deep "I have your son, give me a million dollars" boxes, except instead of doing frequency modulation it pipes the audio into a model trained on someone's voice and applies it. Transcribing to text probably isn't even needed, because why would it be for machine-to-machine modification? It only needs to go to text for human consumption.
Raw PCM bits from audio in -> AI model trained on the victim's voice -> line out to phone or VoIP app.
We totally have the compute to do that. Probably with our phones.
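For contrast, the "frequency modulation" box mentioned above is almost trivially simple. Here is a minimal, purely illustrative sketch of that classic style of voice changer: crude pitch shifting of a PCM buffer by resampling. In the pipeline described above, a learned voice-conversion model would replace `pitch_shift()`; nothing here is a real product's implementation.

```python
# Naive "deep voice" changer: pitch shifting by resampling a PCM buffer.
# A voice-cloning pipeline would swap pitch_shift() for a trained model;
# this sketch only shows the signal-in -> transform -> signal-out shape.
import numpy as np

def pitch_shift(pcm: np.ndarray, factor: float) -> np.ndarray:
    """Crudely scale pitch by `factor` via resampling (factor < 1 deepens the voice)."""
    # Sample the original signal at fractional positions spaced by `factor`.
    idx = np.arange(0, len(pcm), factor)
    # Linear interpolation between neighbouring samples.
    return np.interp(idx, np.arange(len(pcm)), pcm)

# Fake "audio in": one second of a 440 Hz sine tone at 16 kHz.
sr = 16_000
t = np.arange(sr) / sr
audio_in = np.sin(2 * np.pi * 440 * t)

# "Super deep" output: factor 0.7 drops the pitch to roughly 308 Hz
# (and stretches the clip, which is why these boxes sound slowed down).
audio_out = pitch_shift(audio_in, 0.7)
```

The point of the sketch is the architecture, not the DSP: once audio flows through a transform stage in real time, that stage can be anything with enough compute behind it.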
I can't remember which election it was, but the 3D animated character was pushing the limits of real-time rendering for its day when he appeared on a morning talk show and answered questions live. So the live thing has been around for quite some time. The deep fake just allows for the models to look believable. Once you have a model, you can make it do anything.
Faking a famous person would seem to me to be easier (for various reasons) than faking my colleague. It's not enough to fake the sound of their voice, it's also the manner in which they speak - word choice, attitude, responses, knowledge, sense of humour etc. But I'm guessing the target of this attack only knew the fake person they were speaking to marginally.
My point is that the approach seems unnecessarily risky compared to just phoning up and pretending to be someone they didn't know.
Whoever is generating and distributing these images should be charged with distributing child pornography. Doesn't matter if the images are AI generated.
That's my initial response, anyway. I'll concede that it's not well thought out. For one, it isn't comprehensive enough to solve the problem of the perpetrator being halfway around the world.
I think the problem is that the most likely perpetrators are the teenage boys the girls are in school with. It seems like exactly the sort of thing teenage boys would do without fully understanding the damage, never mind the legality. In that sense it feels very tricky to deal with - that said, it needs a solution. Being a kid these days seems pretty awful. I thought it was bad when your mistakes could be caught on camera and put on Facebook. How much worse can it get?
I think the endgame is having a different understanding of privacy, modesty, etc. There's no way this is going to go away or be regulated away somehow. Heavy handed punishment of young kids who generate images just creates more problems (though I imagine we'll go through that phase). Eventually (in a generation or two) it will equilibrate and nobody will take the pictures seriously or be interested in making them. There's novelty now, it will go away.
I can't see any other realistic direction this will go.
I can’t see this happening. People have been saying this for a long time. On top of that, a lot of young girls are going to go through a lot of pain in the meantime - hoping for societal change seems negligent.
Yes. After all the alcohol limit of 21 is a massive success and leads to both people under 21 never drinking alcohol, and people over 21 being very responsible drinkers, enjoying one or two drinks in the evening instead of getting blackout drunk.
In light of these successes we should ban smartphones for under-16-year-olds. Computers too, after all those can also be used to access AI tools. Anyone who says that adults taking the bare minimum of responsibility has anything to do with parenting, instead of taking agency and responsibility from teenagers, is just a small-government naysayer.
I realize that users of this site might be inclined to think that non-perfect solutions are not acceptable. However, in the real world, all solutions are non-perfect. Like for instance alcohol and tobacco limits: they are a success, even if they don't totally prevent children from consuming those drugs.
A smart device ban would be similar to the ban of those substances. Not terrible, not great, but much better than status quo.
That opens the whole "why is child pornography illegal" question we already have with the legal status of drawn depictions of child pornography.
The most obvious answer is that it's about the harm to the subject of the pictures. Most child pornography is made through exploitation of minors, so we just forbid the whole category. Fictional child pornography, like a drawing (or an AI generated image) doesn't suffer from that, so doesn't have to be outlawed. That's largely the position of the US justice system for example.
Some countries go further, arguing about the impact of child pornography on society, especially on pedophiles. Pedophilic urges seem to get worse through consuming child pornography, not better, so that gives reason to outlaw it altogether, no matter how clearly fictional it is. That also gives room for lots of subtlety, like when a Swedish court ruled that a manga expert could keep a drawing that would in other cases be illegal child pornography. Similarly, the fact that the case in this article is child pornography made by minors, of minors, could factor in.
In Spain specifically, the line is drawn at a certain level of realism. Real porn of real children is illegal, so are things that look exactly like it, manga levels of unrealism are legal, but somewhere between there's the line. Where these AI images fall on that line would be interesting, but impossible to judge without seeing them and having good knowledge of the Spanish legal system.
Child pornography is one of the worst forms of human depravity, because in order to satisfy one person's messed up desires, a child is subjected to unspeakable sexual violence. We're conditioned to protect our young for lots of obvious reasons, so this triggers an understandable and entirely justified visceral response.
This particular situation is ... different. It's clearly still causing pain to children. It's using their likeness without their consent and in a sexually violent way.
But ... I can't get behind the idea of equating it to child pornography.
It should absolutely be considered a crime, and come with its own set of punishments for those found guilty.
Again, making it absolutely clear that I personally find this act to be vile, unacceptable and highly antisocial, I also think that it should be punished much less severely than producing/distributing... err... "actual" child pornography...?
We treat manslaughter and murder as different things, perhaps that's a suitable analogy here?
This also seems similar to the whole issue of deepfaked porn involving celebrities. When folks said "AI is gonna usher in societal problems we have no idea how to deal with", I never imagined it would get this bad, this quickly.
It depends on whether the AI model was trained on CSAM or not, right?
If it was, then crime. If it wasn’t then no child was harmed and in a free thinking liberal society we don’t punish thought crimes.
And if AI models prevent people from committing actual harm to children, then isn’t this actually a win?
Humans and machines must be free to imagine. And as a society we must tolerate all art, even if it depicts something most people find gross. Consider, we have books, movies, and video games depicting killing, even though it’s illegal.
I'm a fan of not yucking other people's yum, especially in the privacy of their own homes and not infringing on the liberty, safety or wellbeing of others.
That said, if we've got folks who just straight up like child pornography (which we do, and always will for as long as we remain a race of homo sapiens, sadly), would the ability for them to consume this kind of generated content actually help? Or would it encourage these people to then go further and prey on real human children?
I simply have no idea. I grew up with violent video games. I've had violent thoughts. But blowing someone's brains out in a game has never motivated me to do it in real life. I think that whole era of moral panic was silly. But human psychology is complicated, this could be very different.
Well... it's certainly interesting to ponder. As a personal anecdote I had zero parental supervision growing up and I spent an absurd amount of my formative years on the dark corners of the early Internet. During that time I got hooked on pornography the effects of which I deal with to this day. If I replay the scenario but insert the possibility of stumbling across what amounts to a pedophilia creation machine I... really don't want to think about it...
Creating pornography featuring the likeness of anyone, child or adult, should automatically be classified as a crime similar to revenge pornography laws.
Creating child pornography that does not feature the likeness of someone living or dead should be prosecuted under obscenity laws, but not as child abuse, since by definition no children were abused.
It doesn’t take into account whether there’s a victim or not. In a free thinking liberal society we don’t punish thought crimes because the concept is absurd. It’s what allows us to have diversity of art, literature, thought, etc. Fantasy != reality.
Today we allow all manner of “unspeakable” acts to be portrayed and imagined: war, murder, sexual abuse, speeding, gambling, fraud, you name it we can write, draw, think, and talk about it. There’s nothing fundamentally special about portraying a minor in a lewd way in that sense.
So I think any call to heavily punish people for a new crime should be framed in the context of: who’s the victim and what harm are we preventing? If there is no victim then it’s much harder to build a case that there’s harm.
Listening to an audiobook narrated by an AI voice can get old after a few hours. There is absolutely still hope for traditional voice narration, especially if different voice actors want to voice different characters in the story.
I imagine this will sit where the LibriVox recordings did before. "Good enough" to cover the vast amount of works that never received studio treatment, but not anything that will ever replace a human at the high end. Even if the likeness is absolutely perfect, people will still prefer to know they are actually listening to their favorite actor.
People can listen to every book they want in decent quality instead of having to wait for it to be adapted as an audiobook. Of course blind people would especially welcome this. Audiobooks would get cheaper, which seems like a desirable trait to me. In the end, all of these AI "conversions" are menial in a way anyway, so why not automate them. Having access to these automations, hopefully subscription-free, would also be a kind of freedom for everyone: the freedom to consume information in any form you want. A universal translator isn't all that different from this in that regard.
As for actual creative tasks, that is another matter altogether. I can only hope that real creativity doesn't die out and that TV series and other things don't get any more homogenous than they already have become. At least there are still enjoyable outliers in this plethora of entertainment output. And in some way, audio books can also be creative if they really add emotions that aren't written down exactly. Similar to how the same sheet music can be adapted by different musicians in slightly different ways or instruments and such. Even if all of that could be synthesized, what it means is that selection of the results would become the creative task. Just like stable diffusion and others output hundreds of images for a prompt but only a few are actually interesting. By definition of this automation, you will always end up with more than you can consume and finding the right result for the right people then becomes the next hurdle.
I still wonder though, whether new entertainment really is necessary until the end of times. There probably are already more books and music in the world to last you a lifetime. And after 100 to 200 years of continuous TV and movie output, you probably could just reshow the ones from a decade ago for the next generation, no need to rehash everything. This assumes that there are no new media though, such as maybe walkable VR movies. But even film from decades ago can be remastered in 4k and I begin to doubt that anything above 4k makes much sense. Even 2k is still perfectly fine to me.
We have millions of digitized books by now. Only a very very small percentage of those are worth spending the money to turn into an audiobook narrated by a human.
A lot of people do not read books (for various reasons: they are blind, they are driving, they are jogging, they are doing chores, they just don't enjoy reading), leaving millions of pieces of art inaccessible to them.
Getting TTS quality up to near human narration level means absolutely everything will become available to them overnight.
To eliminate work of course. To advance the state of technology where everyone can have the base essentials for free, or as low a cost as physically possible.
You should listen to these Project Gutenberg audiotexts…they are really pretty good. I would have no problem listening to them for an extended period. The real challenge Project Gutenberg faced was automating the removal of the random metadata noise from the original plain text versions of the content. And it will continually improve from acceptably good to excellent in short order with perhaps a tiny amount of hand tuning.
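The cleanup step mentioned here is mechanical in principle. A hedged sketch of the idea, assuming the standard `*** START ... ***` / `*** END ... ***` marker lines that Project Gutenberg plain-text files use (older files vary, so real tooling needs more edge-case handling):

```python
# Strip the Project Gutenberg header/footer boilerplate from a plain-text
# ebook before feeding it to TTS. Marker patterns match the common modern
# PG format; this is an illustrative sketch, not PG's actual tooling.
import re

START = re.compile(r"\*\*\*\s*START OF (THE|THIS) PROJECT GUTENBERG EBOOK", re.I)
END = re.compile(r"\*\*\*\s*END OF (THE|THIS) PROJECT GUTENBERG EBOOK", re.I)

def strip_gutenberg_boilerplate(text: str) -> str:
    lines = text.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        if START.search(line):
            start = i + 1  # body begins after the START marker
        elif END.search(line):
            end = i        # body ends before the END marker
            break
    return "\n".join(lines[start:end]).strip()

sample = (
    "The Project Gutenberg eBook of Example\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "Chapter I.\n"
    "It was a dark and stormy night.\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "License terms follow...\n"
)
body = strip_gutenberg_boilerplate(sample)
```

The hard part in practice is not the markers but the "random metadata noise" inside the body (transcriber's notes, page numbers, illustration tags), which is where hand tuning comes in.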
It isn’t just the blind who benefit. As you age, eyesight degrades and the mind dulls. At some point, audiobooks are a real boon, especially with PD texts where you can read until you get tired, then listen. Also the growing demographic of live-alone elderly like the background sounds of people talking in their residence, explaining part of the popularity of talk radio. They feel a little less lonely. These audiotexts provide an intellectually varied and overall healthier option. It seems like an Absolute Good to me.
AI audiobooks also seem they’re ripe for “perfection over time” where parts are indicated where it’s not pronounced right or needs additional inflection, etc, and that can be rerendered.
Do you have any thoughts on CVSSv4[0]? It appears to incorporate finer-grained and organization-specific scoring to address issues many have with the one size fits all approach currently used for CVEs.
This already exists today: you can do custom scoring, and some companies (e.g. Red Hat) already do so.
CVSSv4 fixes some things, yes, but not the underlying issue which isn't so much a technical challenge (partially, sure) but a shift in policies and thinking.
The current model of "we need to get to 0 vulnerabilities in our scans" will lead to malicious compliance[1] and worse results compared to being able to focus on the few vulnerabilities that are really important.
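To make the contrast concrete, here is a toy triage policy, purely illustrative: instead of failing on any non-zero scan result, it surfaces only findings that are severe *and* relevant to the deployment. The `reachable` and `exploited_in_wild` fields are hypothetical scanner metadata (the latter is the kind of signal a known-exploited list like CISA KEV provides), not part of any real CVSS feed.

```python
# Toy triage policy: "focus on what matters" vs. "everything must be zero".
# Field names and thresholds are illustrative assumptions, not a standard.
from dataclasses import dataclass

@dataclass
class Finding:
    cve: str
    cvss: float
    reachable: bool          # is the vulnerable code path actually used here?
    exploited_in_wild: bool  # e.g. listed in a known-exploited catalog

def triage(findings):
    # Surface findings that are actively exploited, or both high-severity
    # and reachable in this deployment; everything else is backlog, not a
    # build failure.
    return [f for f in findings
            if f.exploited_in_wild or (f.cvss >= 7.0 and f.reachable)]

findings = [
    Finding("CVE-2024-0001", 9.8, reachable=False, exploited_in_wild=False),
    Finding("CVE-2024-0002", 6.5, reachable=True,  exploited_in_wild=True),
    Finding("CVE-2024-0003", 8.1, reachable=True,  exploited_in_wild=False),
]
urgent = triage(findings)
```

Note how the 9.8 finding drops out: a "get to zero" policy would treat it as the top priority, while a context-aware policy spends that effort on the two findings that actually matter here.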
At least that's my very strong opinion.
A reference to their video "IETF Celebrates The Standards [LIVE at Demuxed '22]":
https://www.youtube.com/watch?v=NAkAMDeo_NM