
Adobe Voco 'Photoshop-for-voice' causes concern - ThomPete
http://www.bbc.com/news/technology-37899902
======
tekacs
> Dr Eddy Borges Rey - a lecturer in media and technology at the University of
> Stirling - was horrified by the development.

> "It seems that Adobe's programmers were swept along with the excitement of
> creating something as innovative as a voice manipulator, and ignored the
> ethical dilemmas brought up by its potential misuse," he told the BBC.

> "Inadvertently, in its quest to create software to manipulate digital media,
> Adobe has [already] drastically changed the way we engage with evidential
> material such as photographs.

> "This makes it hard for lawyers, journalists, and other professionals who
> use digital media as evidence.

I find this /reaction/ horrifying - Adobe isn't invalidating evidence so much
as they are making the public infinitely more aware (as a sibling commenter
points out) that evidence may already be compromised.

If Adobe doesn't do this, actors who care enough to forge evidence surely
will, so better that the public's trust in voice evidence collapse now than
that people mistakenly continue to believe in it. :@

~~~
Waterluvian
It's such a tired argument. Should we be horrified that humans decided to
invent and evolve technologies that could lead to anything possibly harmful?

~~~
TeMPOraL
Especially since _any_ technology that's even remotely useful for something
can be (and is) used for harmful things.

------
stevenh
[https://www.youtube.com/watch?v=I3l4XLZ59iw&t=2m34s](https://www.youtube.com/watch?v=I3l4XLZ59iw&t=2m34s)

"Wife" sounds exactly the same in both places, so all this did was copy the
exact waveform from one point to another. Nothing is being synthesized. If
this is all this app can do, it would be quicker and easier to do this
manually.

That little "guh" noise at the beginning of the first "wife" could also be
manually cleaned up and pitch/formant shifted to sound more natural with
respect to its position in the sentence.

[https://www.youtube.com/watch?v=I3l4XLZ59iw&t=3m54s](https://www.youtube.com/watch?v=I3l4XLZ59iw&t=3m54s)

The word "Jordan" is not being synthesized. He was recorded saying "Jordan"
beforehand for this insertion demo and they're trying to play it off as though
this was synthesized on the fly. This is all a scripted performance. Jordan is
phoning in his task of feigning surprise.

[https://www.youtube.com/watch?v=I3l4XLZ59iw&t=4m40s](https://www.youtube.com/watch?v=I3l4XLZ59iw&t=4m40s)

Here they double down on their lie. The phrase "three times" was clearly
prerecorded.

If Adobe wanted to put the bare minimum effort into trying to convince anyone
this was a real product that exists and is capable of synthesizing speech on
the fly, then they'd toss a beachball around the audience and have them shout
out words to type.

This is a fraudulent demonstration of a nonexistent product and an audacious
insult to everyone's intelligence. Adobe is falsely taking credit and getting
endless free publicity for a breakthrough they had no hand in. They are
stealing the hype recently generated by Google WaveNet and praying they'll
have a real product ready by whatever deadline they've set for themselves.

~~~
zump
Is this common in the software industry... ?

It is getting more tempting to join an industry where there are no customers
(prop trading).

~~~
TeMPOraL
Releasing half-assed trivial tech demos with a shit-ton of hype-generating
marketing? I'd say it's the bread and butter of the industry - startups and
big companies alike. It's a reason I'm really starting to hate programming as
a job.

People say that a programmer should focus on meeting business needs, not on
typing out code. But if business needs are this level of bullshit, it's really
hard to focus on them and retain even a modicum of self-respect as an
engineer.

------
Hyperborian
If this technology is being made available to consumers now, that means it's
existed before now, probably for years, outside of the public's awareness.

Makes you think. What's been faked in the last year or two that we didn't know
could be faked?

~~~
Gustomaximus
What makes you think it existed? I don't buy the assumption that the military
is 10+ years ahead on everything. They just don't have the budget, nor can
they sequester everything of interest that companies produce. Consider that
Apple spends ~$10bn on R&D yearly and has averaged one distinctly new product
every ~7 years, plus refinements. And while I don't doubt the military has
some amazing/terrifying tech hidden away, it seems unrealistic to assume it
has commercial-grade goods developed that far ahead of the private sector.

~~~
nitrogen
Growing up, the first thing my friends did when they got audio software on a
PC was edit recordings of people to make them sound stupid. It was basic
stuff like swapping answers between different questions.

Years later, I edited dialog from a movie (for my own use) to shorten or
rearrange sentences in a way very few people can detect - removing profanity
from clips I wanted to show to people who can't handle it.

Also, AIUI Hollywood has been splicing dialog for decades in ADR. Star Trek
TNG had an episode in which a vocal resynthesis device played a role, so the
idea is old.

------
nitrogen
Maybe people concerned about having their voice manipulated can carry a device
that generates an acoustic timecode whenever they speak, or at least some kind
of non-repeating tone generator. Supposedly the 60Hz or 50Hz hum in recordings
can be correlated with variations in the electrical grid frequency. It would
make sense to go further and deliberately tag live audio with a timestamp to
ensure continuity.

I can think of a few ways of faking a timecode like that, but they can be
counteracted to an extent.

~~~
Retr0spectrum
It would be trivial to filter out any kind of watermarking. Watermarks get
photoshopped out of images all the time.

On the other hand, if there were some way to add a watermark that was also a
cryptographic signature, you could at least detect editing.

I have no idea how you would implement such a signature system though.

~~~
bradbeattie
Cryptographically sign the image sans the least significant bits, then embed
the signature in said position:

[http://www.lia.deis.unibo.it/Courses/RetiDiCalcolatori/Proge...](http://www.lia.deis.unibo.it/Courses/RetiDiCalcolatori/Progetti98/Fortini/lsb.html)

Shouldn't be too noticeable.

A similar approach should work for audio too.
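A minimal sketch of that idea in Python. This is a toy: the key, function names, and one-tag-bit-per-byte layout are all illustrative assumptions, and an HMAC stands in for the public-key signature a real scheme would need (so that verifiers don't hold the signing secret):

```python
import hashlib
import hmac

# Hypothetical shared key; a real scheme would use a public-key signature.
SECRET_KEY = b"example-key"

def sign_and_embed(pixels: bytes) -> bytes:
    """Sign the image with its LSBs cleared, then store the 256-bit tag in those LSBs."""
    carrier = bytes(b & 0xFE for b in pixels)  # zero out every least significant bit
    tag = hmac.new(SECRET_KEY, carrier, hashlib.sha256).digest()  # 32 bytes
    bits = [(byte >> (7 - i)) & 1 for byte in tag for i in range(8)]
    if len(bits) > len(carrier):
        raise ValueError("image too small to hold the signature")
    out = bytearray(carrier)
    for i, bit in enumerate(bits):
        out[i] |= bit  # write each tag bit into an LSB
    return bytes(out)

def verify(pixels: bytes) -> bool:
    """Recompute the tag over the LSB-cleared image and compare with the embedded one."""
    carrier = bytes(b & 0xFE for b in pixels)
    expected = hmac.new(SECRET_KEY, carrier, hashlib.sha256).digest()
    embedded = bytes(
        sum((pixels[j * 8 + i] & 1) << (7 - i) for i in range(8))
        for j in range(32)
    )
    return hmac.compare_digest(embedded, expected)
```

Flipping any non-LSB bit changes the carrier and breaks verification. Of course, anyone holding the key can strip the LSBs and re-sign an edited image, which is exactly why the signature would need to come from a key the editor doesn't have (e.g. inside the recording device).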

------
nohuck13
"It seems that Adobe's programmers were swept along with the excitement of
creating something as innovative as a voice manipulator, and ignored the
ethical dilemmas brought up by its potential misuse," he told the BBC."

Is there a name for the fallacy of giving the first person to market all the
credit for an idea whose time has, essentially, come, with or without them?

Surely this is an idea obvious enough that, had Adobe not done (and
publicized!) this, someone else would?

Edit: sense

------
LiveOverflow
I don't really understand the concern. Digital media has always been edited
and manipulated. I was actually surprised to see so much news about it -
professional video/audio productions edit speech all the time. I'm sure with
a lot of dedication any amateur can already collect enough unique sounds and
snippets from prominent people to fake sentences. It's just easier now.

~~~
aliakhtar
> I'm sure with a lot of dedication any amateur can already collect enough
> unique sounds and snippets from prominent people to fake sentences

[https://www.youtube.com/watch?v=hX1YVzdnpEc](https://www.youtube.com/watch?v=hX1YVzdnpEc)

~~~
shermanyo
[https://www.youtube.com/watch?v=ZiEjs7deHL4](https://www.youtube.com/watch?v=ZiEjs7deHL4)
and
[https://www.youtube.com/watch?v=RtS2Ikk7A9I](https://www.youtube.com/watch?v=RtS2Ikk7A9I)

~~~
aliakhtar
[https://www.youtube.com/watch?v=ZIesCd4I4hU](https://www.youtube.com/watch?v=ZIesCd4I4hU)

------
kevin_thibedeau
About 17 years ago I saw a short piece on the old Headline News featuring a
professor who had developed something like this. They played a recording of
Whoopi Goldberg saying something made up. I never heard about it again and
assumed it was sucked into some black program.

------
TeMPOraL
:D :D :D :D. That's about the only thing I can say about this.

I mean, seriously. It's been obvious for like 20 years now that stuff like
this was going to be possible pretty soon. That someone finally packaged this
capability in a nice form factor doesn't change much. But then again, I guess
people still haven't internalized the fact that _photographs_ are like 15
years past being a reliable source, and videos are probably 10 years past it
too.

It's interesting how society will have to adapt to function with such
technological capabilities, but - like others here have pointed out - it's
nothing really new, so I don't get the surprised concern.

------
sixstringtheory
When I was in grad school an Adobe representative presented at a colloquium,
showing us their new-at-the-time content aware photo editing (think removing a
person from a photo and filling in the trees/buildings behind them).

He mentioned a project based on predicting marketing/political campaign
reactions in things like social media. My hand immediately went up and I asked
him what they thought about the ethical implications, and how it could be
protected from abuse. "We aren't really thinking about those kinds of things."
I'm not surprised to see this line up with the quote in tekacs' reply.

I don't trust Adobe, and while this is certainly really neat-o tech, I just
don't see its benefits outweighing the huge impact its abuse could
precipitate.

~~~
nohuck13
First-hand anecdotes like that are some of my favorite HN things - thanks!
But do you think we as technologists should be thinking about things like
that? That feels icky to me. Maybe I'm wrong.

~~~
grzm
Technologists are people, too!

I think we should definitely be thinking about those things. We have some
pretty sharp technologists to point to who thought about the ethics of what
they were doing. Those working on the Manhattan Project and the German
equivalent[1] come easily to mind.

[1]: [http://germanhistorydocs.ghi-dc.org/pdf/eng/English101.pdf](http://germanhistorydocs.ghi-dc.org/pdf/eng/English101.pdf)

------
rdtsc
Intelligence agencies would be interested in this but they probably have
something like it already.

Specifically, this is relevant for the ZRTP protocol. It has a voice-based
verification step which relies on knowing and recognizing the other party's
voice while verbally confirming a short authentication string both parties
see on their screens. Being able to mimic a voice could make a
man-in-the-middle attack feasible.

To counteract this, the strategy is to use dictionary words instead of just
numbers for verification - say "pink salad elephant" instead of "1934" - so
the parties might joke or say something referring to the ridiculous word
combination. That would be harder to mimic.

~~~
jameshart
This is _literally_ the plot of Sneakers, the 1992 hacking movie. I would hope
voice signature systems are beyond "Hi, my name is Werner Brandes. My voice is
my passport. Verify Me."

~~~
rdtsc
Ha, good point. I forgot about Sneakers.

This is a bit different though. The idea is that both parties should see the
same authentication string and then verbally confirm what it is.

Say they see "123", so Alice says "I have 123 as my SAS code, what do you
have?" and Bob says "Yep, I have 123 as well". If Eve is in the middle, she
would show "123" to Alice but maybe "456" to Bob. She would have to fake
Alice's voice telling Bob the code is "456" and it's OK, then tell Alice in
Bob's voice that the code is "123" and it's OK - all in such a way that Alice
and Bob don't suspect anything (and it has to happen in realtime too).

So this kind of software might make that easier.

The way to bypass that is to use dictionary words instead of numbers, and
then also refer to them later in the conversation (see the silly example in
a sister comment).
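A toy sketch of that word-based SAS rendering in Python. The word list and derivation here are illustrative assumptions - real ZRTP renders its SAS using the 256-entry PGP word lists, not an 8-word stand-in:

```python
import hashlib

# Hypothetical tiny word list for illustration; ZRTP implementations use
# the 256-entry PGP word lists so every SAS byte maps to a distinct word.
WORDS = ["pink", "salad", "elephant", "trombone",
         "glacier", "waffle", "nebula", "cactus"]

def derive_sas_words(shared_secret: bytes, n_words: int = 3) -> str:
    """Hash the negotiated secret and render a few of its bytes as words."""
    digest = hashlib.sha256(shared_secret).digest()
    return " ".join(WORDS[b % len(WORDS)] for b in digest[:n_words])
```

If Eve sits in the middle she ends up negotiating two different secrets, so Alice and Bob derive different word strings; to pass the verbal check she'd have to fake each party's voice reading the other's words in real time, and words woven back into the later conversation make that harder still.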

------
rosstex
I assume that the words "three times" must have appeared at some point during
the long speech, correct? So it's not quite generating new sounds, but
intelligently rearranging them?

------
dreamcompiler
I know nothing about this product, but based on its name and what it's doing
it's probably a fine-grained vocoder tuned with machine learning. It's not
quite trivial, but it's an obvious idea. We've been building analog and
digital vocoders for decades (e.g. the Cylons' voice in the original
Battlestar Galactica from the 70s). It takes some hefty processing to do it
with as many bands as they're probably using here, but you could likely do it
on a desktop machine with a good GPU.

------
JunkDNA
So does this mean Disney will be able to keep making movies with Darth Vader's
"real" voice until the heat death of the universe?

~~~
grzm
I hear Disney's lawyers are currently working on extending copyright beyond
that. Steamboat Willie is expected to initiate the next Big Bang.

------
arca_vorago
Welcome to the Bin Laden tapes of years ago. Remember, the NSA/CIA have tech
10 years beyond the public (they used to say 20).

Also, I'm pretty sure similar tech is already being used in TV production.
For example, I can't be the only one who sees all the CGI in the latest BBC
Planet Earth, can I?

------
dingo_bat
This is amazing! Audio stuff has always scared me. I want to know more details
though. Does it transcribe automatically? Is there some learning involved?
Will it fill in background noises?

------
snsr
Can't wait to noodle around with this, amazing creative potential.

------
amyjess
I'm curious about the applicability this has for fandubs.

