They stole my voice with AI (jeffgeerling.com)
232 points by sounds 3 hours ago | 94 comments





If LLMs are the ultimate remix machine, then is anyone with a RAG a digital DJ?

One can't help but wonder what theft even means any more, when it comes to digital information. With the (lack of) legal precedent, it feels like the wild wild west of intellectual property and copyright law.

Like, if even a superstar like Scarlett Johansson can only write a pained letter about OpenAI's hustle to mimic her "Her" persona, what can the comparatively garden-variety niche nerd do?

Like Geerling, they can feel equally sad / angry / frustrated, but merely say "Please for the love of all that is good, be nice and follow an honour code."


what theft even means any more

They dragged the term through different phases, but that’s just projection of will. Theft is undefined for objects with .copy() interface. It’s still there when you look at it.

People have to adjust expectations, not laws. Computers replaced computers, now voice acting replaces voice actors. Your popularity doesn’t really mean anything, and wouldn’t it be unfair if only the popular could save their jobs?


> Theft is undefined for objects with .copy() interface.

> Computers replaced computers, now voice acting replaces voice actors.

It's incredible what web development does to someone's ability to communicate ideas.


I appreciate his pointer to precedent, but the truth is that while precedent is a start, we're going to need to work from principles beyond precedent. When tech introduces unprecedented capabilities, we will either figure out how to draw boundaries within which it (among other features of society) works for people, not against them, or we'll let it lead us closer to a world in which the strong do what they will and the weak (or those just trying to keep a Camry running) suffer what they must.

California recently signed some legislation into effect. It’s a start. Congress is working on “No Artificial Intelligence Fake Replicas And Unauthorized Duplications Act.” Still in dev in the House, but has bipartisan support.

Call your congressperson, ask them to co-sponsor and/or vote for it.

https://www.cbsnews.com/losangeles/news/california-bills-pro...

https://salazar.house.gov/media/press-releases/salazar-intro...

https://files.constantcontact.com/1849eea4801/695cfd71-1d24-...


Seems silly. What if I train my model on somebody who sounds like a somebody?

And the flip side. What if somebody who sounds like the person you trained on accuses you of stealing their voice? Assuming malice from similarity is going to sometimes lead to wrong results.

I imagine it would then depend on the intent and how that voice was presented.

If you got Bob, who sounds awfully like Mr Very Famous Guy, to record vocals that you then use to train your AI, and then used that vocal clone to sell your nutritional yeast extract as though it’s Mr Very Famous Guy selling it, that would likely be a problem.

If you used the vocal clone to sell it, but said something like “oh hey it’s Bob here lots of people tell me how much I sound like Mr Very Famous Guy but I’m not him” then Mr Famous might have a case for his name being used without permission, but probably not the vocal clone.

But it’s all so new and there’s no precedent.

Given the lawyers are all busy working out whether using copyright protected books and music to train generative AI is legal or not - and have good arguments on both sides - it’s all a bit unclear how stuff like this will work out in the end.


This Ain't Very Famous Guy: An AI Parody

That's what happened with Scarlett Johansson's voice in the OpenAI thing, right? Or at least I think that was the claim at the time.

You go to court and see what happens.

Oh, so it's like patents.

Then you'd be able to prove that you actually did that, for one.

If your model was trained on data from somebody who sounds like somebody else, but users prompt it with "somebody else" rather than "somebody" to generate derivatives, then you know your answer :)

I mean, it’s the same as trademark disputes; legal standards will slowly be cobbled together from statutes, regulations, and random judges setting precedent. “Confusion in the marketplace” seems like a potentially relevant term — accidentally producing a product similar to an existing person’s voice is one thing, but publishing it in a manner and/or context that makes it seem like that person recorded the lines is something else entirely.

Anyway, given how the election is shaking out on Twitter, I have a feeling political usage will spark legislation and precedent far before commercial usage does. But that’s just a plain guess.


The copyright hell carries on it looks like.

It's going to be an interesting First Amendment question.

Might as well make photoshopping and manual audio tweaking / impersonation illegal, since it's the same ballpark, just less effort.

A bit like breaking a door is "the same ballpark" as unlocking it with a key. Or paying with legit currency instead of counterfeit.

Sometimes all you want is the effect, other times it's important that you're accurately representing effort or accounting for other human considerations.


Might as well make forging signatures and identity theft legal. Who is the government to say which squiggles I may and may not write?

Society is about compromises and balancing different needs against each other. Sometimes we go one way, other times we go the other way; there is no one principle that always solves every situation.


We need better libel and slander enforcement. Treat realistic media as a truth claim that is subject to libel laws.

They’re stifling creativity with these anti-AI bills! “No AI unauthorized duplications”… these regulations are going to hold this country back while others advance. Marc Andreessen is very much against this government overreach.

Yeah but I don’t think being either party in a precedent-setting litigation is fun or easy. You’d have to find some sort of political non-profit (ACLU?) to foot the bill as you go from appeal to appeal, all the while enduring negative media coverage and general attention.

The Camry class needs its defenders, I wholeheartedly agree, but it’s also a core principle of contemporary praxis that you gotta let people choose their comfort level/ability to contribute. Encourage, promote, embolden — but try not to shame :)

Anyway, something tells me this blog post is gonna be more than enough. I don’t think basically anyone is on the side of stealing people’s voices, it’s just intuitively icky in a way that scraping the NYT and DeviantArt archives for training data isn’t. Public shaming isn’t gonna win him a big sack of damages, but it doesn’t seem like that’s what he’s after!


I don't see why using AI would get around Midler vs. Ford. If anything, there is even less of an argument to be made in your defense when you use AI to replicate a voice, instead of using another voice actor to replicate the voice.

The court explicitly limited their decision to the voices of professional singers in that case:

> ...these observations hold true of singing, especially singing by a singer of renown. The singer manifests herself in the song. To impersonate her voice is to pirate her identity...

> We need not and do not go so far as to hold that every imitation of a voice to advertise merchandise is actionable. We hold only that when a distinctive voice of a professional singer is widely known and is deliberately imitated in order to sell a product, the sellers have appropriated what is not theirs...


Doesn't this have an obvious edge case for every singer from now on though? If your voice is cloned before you become a singer of renown you have no protection.

Real solution is to never use the voice actors again, and cut them out from the very beginning.

Maybe I am crazy but I don't really think it sounds that much like him. It's a little similar but different. It's slightly higher pitch, more nasal, and the intonation is a little different.

As someone who hasn't heard of him before, from the first few seconds of this video, I'd say it sounds similar enough to be an imperfect AI clone. https://www.youtube.com/watch?v=UMofZIT9FcQ

As someone who has watched all his videos and livestreams, I think that it very much sounds like him.

It is clearly trained on his voice. The intonation and pitch differences you describe are just because it’s AI generated and not human speech.

With the tools I'm aware of, you just add clips of as many types of voices as you want blended in, and it blends everything in them to an unknowable, uncontrollable degree, plus the entropy of the system. I suspect their story is that they added more pleasant-sounding voices to the mix, which provides enough differentiation.

Question is: who is to say how much is needed before it escapes likeness theft? The king of generic nerd voices is going to claim excessive likeness and the accused lifter isn't going to reveal his whole process. Also tuning AI voices by ear is surely possible soon so category kings are not saved by demanding to be left out of training. A ministry of voice authority sounds bleak.


I’ve watched hundreds of his videos and it sounds very much like him.

We have 100s of tools that are about voice cloning - of course we’ll get content with cloned voices.

Same as it happens with unauthorized use of someone’s images. And platforms and their moderation teams have processes in place to report and remove that. Looks like we need something similar for voice.


It’s the Wild West and will be for some time but I agree, they should have the decency to use only voices of the dear departed. The library should be open source and hosted on GitHub. the talking dead seems like a good name for it. Obviously we will have to put it to a vote among the living.

That's even worse imo. There's a reason nearly every major world belief system includes a proscription against necromancy and this is exactly what they mean. The living should not speak with the mouths of the dead.

Initially I'd say, well, if you're a public figure and upload your own voice online, of course this will happen. So it's something to expect; however, this shouldn't be a problem for Jeff to solve. Instead it should be YouTube's problem, as they profit from the video monetization. Eventually they'll have to have some kind of detection for all uploaded content.

I strongly disagree. I don't know the rights around one's own voice, but the idea that you suddenly lose ownership of something because you shared it online is the exact thing that many people take issue with when it is written in the terms of service for social networks, creator tools (adobe), etc.

I didn't mention ownership, and I don't think you should lose it (nor does one really lose it in this case, legally). But I do think that in cases like these, where money and YouTube are involved, YouTube should have the means to prevent it.

It's all fun and games until someone produces a recording of somebody else saying something incriminating and it will be used in court. This is the part of AI I hate.

It'll be bad for a few years, but surely at some point it'll become inadmissible in court because it's too easy to fake, right? But then what do we do, if video and audio footage is inadmissible?

I don't know where you live, but where I live I believe this sort of thing meets all the required elements for fraud.

>I remember when OpenAI practically cloned Scarlett Johanssen's voice...

There is absolutely zero evidence for this. I find it infuriating that this keeps being stated as a fact. So they go and hire a voice actor and clearly use her voice to train, but then they also scrape Scarlett Johansson from youtube and splice it into the training data to make the voice a bit more like hers? Really does that sound realistic?


>I haven't decided what to do.

Make a video, say what you think, get views, and probably put more pressure on Elecrow to respond.


They did make a video, https://www.youtube.com/watch?v=UMofZIT9FcQ

It was linked in the article.


What I want to know is...

Does this controversy all become free publicity for Elecrow?


More and more I am starting to wish I had gone ahead with the novel I had sketched out in the 1990s. The backdrop was a kind of post-imitative-AI collapse of trust in society, because it had become effortless to fake up, say, your least favorite political candidate talking about the merits of eating babies, so the various echo chambers bore a kind of ghastly fruit, each stance finding its own "evidence" for its beliefs, right down to the flat earth types. Paranoia runs rampant, and so on.

It looks like we're heading in that direction.


The idea that stolen voice tones are going to matter at all is one of the most short-sighted bits of AI investment - powered by Hollywood "never make anything new" thinking.

In about 5 years AI voices will be bespoke and more pleasant to listen to than any real human: they're not limited by vocal cord stress, can be altered at will, and can easily be calibrated by surveying user engagement.

Subtly tweaking voice output and monitoring engagement is going to be the way forward.


I am not convinced that it will be even 5 years. Have you tested ElevenLabs[1]?

They offer different voice cloning techniques today, starting from 30 seconds of audio input (sounds somewhat like the cloned voice but definitely not exactly the same) to multiple hours of voice input (sounds like the actual person). In addition, you can adjust the voices with a few parameters or simply create one by defining parameters.

The voice from the video could be an 'instantly cloned' voice based on a few seconds of voice input (judging from the quality). If you want a more advanced clone, you have to prove that it is your own voice.

[1] https://elevenlabs.io


"This call may be monitored or recorded for quality assurance and training purposes"

> training <


I'm long on humans and suspect that many people will begin to prefer imperfection in reaction to the overproliferation of ai generated content.

AI will be able to generate imperfection too.

That's convenient, right? AI will be more than perfect and also be able to be imperfect if you want it, it'll just do anything you want. Here's a better prediction: Almost no one wants to listen to whatever low effort trash gets pumped into an advanced TTS.

In my country there's a lot of dubbing; there are some dubbing actors whom millions of people grew up listening to in anime and the like. I could see companies buying their voices, because in that situation it's not only about being pleasant but a lot about familiarity. ElevenLabs, for instance, bought the rights to some deceased people's voices from their estates.

But aside from this nostalgic-ish specific context, I don't see why they wouldn't just create a synthetic voice to begin with.


Stolen voices matter because what's being stolen here is the author's likeness: his reputation that he's built in the YouTube tech space and used for commercial products he had already reviewed. They chose exactly his voice for that reason.

While AI voices will aesthetically be indistinguishable or even preferable they aren't going to carry any reputation or authenticity, which by definition is scarce and therefore valuable. In fact they're likely going to matter more because in a sea of generic commodified slop demand for people who command unique brand value goes up, not down. That's why influencers make the big bucks in advertising these days.


Exactly. If it was a brand of pen ink cartridges or dishwasher detergent, that's one thing (still would be wrong, but not as egregious, and I might never have known it happened).

The fact is, Elecrow's a company I've worked with in the past (never signed any contracts, but reviewed a product of theirs 4 years ago that they provided). They're active in the exact same space my YouTube audience is (Pi, microcontrollers, hobby electronics, homelab).

There are a number of potential Elecrow customers who also subscribe to my YouTube channel (one of them alerted me to the tutorial series, in fact), and I would rather not have people be confused thinking I've sold my likeness or voice to be used for corporate product tutorials.

Especially any competitors of Elecrow that I may have a relationship with; those relationships could be soured if they think I'm suddenly selling my voice/online persona for Elecrow's use.


> Stolen voices matter because what's being stolen here is the author's likeness

There is not enough voice space to accommodate everyone. Authors would like to fence off and own their little voice island. For every voice there are thousands of similar ones.


This was fine 10 years ago because, if I wanted an impersonator of the voice of someone who carries a lot of weight, I'd have to go through the work of hunting them down, finding out if they were willing, paying them, or I guess having tons of children and waiting decades and hoping their voice is right for what I need.

Now I can sit at a computer and spend a few hours doing it.

You are completely blowing past the facts of what a person's voice is, and the value it has if that person is someone special to a lot of people. Are there people out there who can do spot on Obama impersonations? Of course. Is the former President not wanting someone who sounds like him to pretend to be him to endorse a local candidate trying to unfairly "fence off" too much of the "voice space"? Give me a break.


Like slapped-together particle board furniture vs. hand-crafted beautiful designs, I expect the price difference to be so significant that, like with the artistic wood carvers of old Japan, the market will dry up and fewer and fewer will hold the skill until it is practically lost.

Again: this literally only matters currently in people trying to steal a voice.

There are already VTubers whose whole visual identity is synthetic. Why wouldn't the same happen in any other space where performance can affect the perception of content, but you can now simply engineer the performance?

Like I said: give it 5 years and you'll have influencers who no one has ever heard the voice of, because they don't make content with their own.


On mobile, so not looking for the link, but a couple of years back a motorcycle enthusiast vlogger who appeared to be a cute girl had her filter drop during a livestream, and it was a middle-aged guy the whole time. In that case, I recall the viewership accepted him.

It's not even close to his voice lmao, just has similar cadence.

[flagged]


That is easy: go to his YouTube channel and listen to any of his vids.

Jeff has been around in the tech community for a while. I used his Ansible roles almost a decade ago.


If you read the article you would have seen multiple links to his channel and videos where you could compare.

He provided a whole YouTube video on his popular channel, with more than 750,000 subscribers: https://www.youtube.com/watch?v=UMofZIT9FcQ

RTFA

> His case would be much more compelling

If he convinces you, what will you be compelled to do?


Believe him? Which ostensibly is the entire point of writing a blog post about it?

It is like becoming meme famous: it's up to you how to monetize it, and nobody owes you anything.

But they should owe you for stealing your likeness without your knowledge to promote their products. This isn’t for satire purposes.

How much? This IP/copyright mentality is so 90s/2000s. Brings back memories of Napster. This Jeff Geerling which I have never heard of will cry until some Spotify of AI appears and pays him some pennies. Maybe he wants to be the James Hetfield/Metallica of this generation. This guy doesn't have 1/10000 of the relevance Metallica had at the time tho.

> This IP/copyright mentality is so 90s/2000s. Brings back memories of Napster.

The trend has been going against copyright ever since the internet was invented. We used to go for passive consumption - radio, TV, books, print magazines. But now that age has passed. We have changed. We prefer interactivity - games, social networks, searching the billions of pieces of content online, YouTube clips commented on and shared around. In this age copying is like speaking.

Now comes AI and pushes this trend even deeper - more interactive, more remixing and adapted to the user. We should take a hint - the idea of owning content doesn't make sense anymore. Any content can be recreated in seconds now, or similar content was already posted online years ago. Protecting expression is useless and protecting styles would destroy creativity. Quite a catch-22.

We should take a look instead at new models that popped up in the last decades - open source, creative commons, Wikipedia, open scientific publication. Maybe we need to decouple attribution from payment, like scientific papers, they cite each other without monetary obligations. In social networks comments respond to each other, it would not work if we had to pay to read. Even in real life, people don't pay to hear others speak, and are reusing ideas without issue.

I am aware this sounds bad for copyright, but I am trying to state the reality, not propose my own views. There are deep reasons we prefer interactivity to passive consumption. Copyright was fit for a different age.


> This Jeff Geerling which I have never heard of

> This guy doesn't have 1/10000 of the relevance Metallica had at the time

It's weird that you seem to think a person's level of fame is relevant to a discussion of their legal rights.


In terms of personality rights, it is relevant. Although I would think Jeff is enough of a public figure to make a case here.

What legal rights? TIL voice imitators are criminals. /s

Funny, I first heard of Jeff Geerling over ten years ago in my dev circles.

Your inability to look even few decades into the future to see the impact of this is depressing. You only seem to care about yourself and your current grift.

Welcome to Hacker News/YCombinator. It's what they do here. Ever since joining the community, it's been a speedrun of things explicitly covered as unethical during my CS Ethics course. Mainly stay here to keep a finger on how far things have slid down the "man made horrors beyond comprehension".

How are you on Hacker News and have never heard of Jeff Geerling? He's a GOAT in the Ansible and Raspberry Pi world.

You would be surprised how niche these two things are. I don't care about either, and I am sure 6 seconds of a tech guy's voice isn't the trademark of his content.

You think someone stealing the dude’s voice is the same as people downloading Metallica songs?

Are you trying not to be taken seriously?


The danger of this shit cannot be overstated. Four years ago there was already a video where a deepfake of a president of the USA read a speech: In Event of Moon Disaster (https://www.youtube.com/watch?v=LWLadJFI8Pk). We of course know Nixon never gave this speech.

What happens when this "AI" is used to sway an election?

Last month a family got hospitalized after eating mushrooms they found and identified from an AI-generated book. They didn't know it was AI-generated. What happens when (and alas, this is not if but when) people die from this?

This shit is a danger to democracy and human lives. Napster was not.


Hi chx, long time no see! And I agree; deepfakes and voice cloning are already past the point of fooling relatives. Some are good enough that I have to spend time double-checking whether they're real. There are very real implications to all that, and being able to verify true from false is going to get more challenging in many circumstances.

>Last month a family got hospitalized after eating mushrooms they found and identified from an AI-generated book.

Yeah, mushrooms are very dangerous, but the argument here is that the book could be written by a human. So what's the best way forward? Ban A.I. books, and human books?

A.I. is also, IMHO, the best way to identify mushrooms. Mushrooms look drastically different from one another when their spores are magnified under a microscope. That's not the case when looking at the fruit. Mushrooms totally unrelated to one another may turn out to look very similar, depending on season, humidity, rain, elevation, tree hosts, temperature and soil.

However, when a human tries to examine the spores, he has to compare them to thousands of mushrooms to be sure. That's an amount of information that could only be tackled effectively by machine. A.I. may have a good chance to solve that problem.


I feel like this comment helps confirm dead internet theory. Are we there yet HN?

Boneheaded take.

This whole argument rests on the absurd assumption that you can "own" a voice as if it's property. Does this mean people can own the patterns of vibration in air? It's completely nonsensical.

More like owning an identity or likeness, which is a fundamental basis of many fraud cases. It seems that Jeff Geerling is a resident of Missouri which also has Supreme Court precedent to test for "Right of Publicity" [1]:

"If a product is being sold that predominantly exploits the commercial value of an individual's identity, that product should be held to violate the right of publicity and not be protected by the First Amendment, even if there is some "expressive" content in it that might qualify as "speech" in other circumstances."

Whether or not the voice is determined to be predominant would be for courts to decide, of course, but there's clearly an argument.

1: https://law.justia.com/cases/missouri/court-of-appeals/2006/...


Intellectual property is, in fact, recognized by many legal jurisdictions. And audio works are typically included in that.

However, in this situation, the right of publicity is probably more applicable.


With enough of a "vibration pattern", it becomes a fingerprint.

"Why should child porn be illegal? It's just a pattern of bits on a computer!"

Describing a reasonable legal principle in terms of physics phenomena does not make it unreasonable.


You can own your likeness. Does that mean you can own the photos that represent your face? Yes, yes you can. Why should your voice be different?

Easy: The air is free but the electromagnetic spectrum is regulated.

/s


Human voices carry shared accent data too, not just biometrics; your siblings/twins/sorority sisters likely have extremely similar voices, which can cause the same confusion as AI clones. Does the most famous sibling get the copyright first, are siblings the only ones with the power to supersede it, or must they alter their voices? A voice alone was never really guaranteed to be a unique part of identity, since it changes so easily over time and with age.


