
Ask HN: My wife might lose the ability to speak in 3 weeks – how to prepare? - tech4all
My wife will be undergoing significant oral surgery in a few weeks and there is a SMALL chance she may lose the ability to speak. I'd like to prepare, just in case, to have technology to reproduce her voice from keyboard or other input.

My ideal would be an open source "deepfake toolkit" that allows me to provide pre-recorded samples of her speech and then TTS in her voice. Unfortunately most articles and tools I'm finding are anti-deepfake. Any recommendations?

Fallback would be recording her speaking "phonetic pangrams" and then using her pre-recorded phonemes to recreate speech that sounds like her. I feel like the deepfake toolkit is the way to go. Appreciate any recommendations... There must be open source tools for this??
======
audiohermit
Hey, speech ML researcher here. Make sure you have different recordings of
different contexts. fifteen.ai's best TTS voices use ~90 min of utterances,
some separated by emotion. If you're having her read a text, make sure it's
engaging--we do a lot of unconscious voicing when reading aloud. Tbh, if she
has a non-Anglophone accent, you're going to need more because the training
data is biased towards UK/US speakers.

If you want to read up on the basics, check out the SV2TTS paper:
[https://arxiv.org/pdf/1806.04558.pdf](https://arxiv.org/pdf/1806.04558.pdf)
Basically you use a speaker encoding to condition the TTS output. This
paper/idea is used all over, even for speech-to-speech translation, with small
changes.

There are a few open-source implementations, but most are outdated--the
better ones are kept private for either business or privacy reasons.

There's a lot of work on non-parallel transfer learning (aka subjects are
saying different things) so TTS has progressed rapidly and most public
implementations lag a bit behind the research. If you're willing to grok
speech processing, I'd start with NeMo for overall simplicity--don't get
distracted by Kaldi.

Edit: Important note! Utterances are usually clipped of silence before/after
so take that into account when analyzing corpus lengths. The quality of each
utterance is much much more important than the length--fifteen.ai's TTS is so
good primarily because they got fans of each character to collect the data.
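
To make the silence point concrete, here is a minimal sketch of measuring a corpus after trimming. It assumes each utterance is already loaded as a list of signed 16-bit PCM samples; the function names, the amplitude threshold of 500, and the 22050 Hz default are illustrative assumptions, not taken from any particular toolkit.

```python
def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples quieter than `threshold`.

    `samples` is a sequence of signed 16-bit PCM values. The threshold is an
    assumed cutoff; tune it to the noise floor of the actual recordings.
    """
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []  # the whole utterance is below the threshold
    return list(samples[loud[0]:loud[-1] + 1])


def speech_seconds(utterances, sample_rate=22050, threshold=500):
    """Total corpus duration in seconds, counted after trimming each utterance."""
    total = sum(len(trim_silence(u, threshold)) for u in utterances)
    return total / sample_rate
```

Pauses inside an utterance are kept; only the padding at either end is removed, so the number this reports is closer to what a training pipeline would actually see than the raw file lengths.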

~~~
grogenaut
I came here to say this. My brother has a PhD in chemistry and no coding
experience. He was able to create a voice model of himself using basic nvidia
example generators in a week. My dad lost his voice and it would have been
very nice to have a TTS that sounded much closer to him. I personally would
think it would be worth it to have that database.

But obviously also attend to the human matters as well, eg spend time.

~~~
audiohermit
I work in pathological speech processing/synthesis so I'm unfortunately
familiar with your father's position. It really sucks that these people didn't
know that archiving their voice would've been useful. I hear snippets that
people manage to glean from family videos right after listening to their
current voices and it makes me really sad.

On the upside, your father can choose any celebrity he wants to voice him!
Tons of celeb data is publicly available (VoxCeleb 1 & 2).

~~~
vervez
Is Morgan Freeman the most used celebrity?

~~~
core-questions
I'd go for Stephen Hawking, myself.

(Not using his voice synth, reconstructed using ML, because it should sound
more natural that way ;-)

~~~
shagie
I recall that the "say" program on the SGI from the mid 90's was approximately
Hawking's voice. Hawking gave his speech for the Whitehouse Millennium Lecture
at SGI also, and while I wasn't able to attend I found the transcript of it
and fed it in there... there were some jokes that he had that only really came
through with the intonation and pacing of a voice synth -- it's the ultimate
deadpan voice.

[https://clinton.presidentiallibraries.us/items/show/16112](https://clinton.presidentiallibraries.us/items/show/16112)
[https://youtu.be/orPUQm1ZRSI](https://youtu.be/orPUQm1ZRSI)

And his voice was his - even with the American accent.

[https://www.news.com.au/technology/innovation/why-stephen-
ha...](https://www.news.com.au/technology/innovation/why-stephen-hawkings-
voice-computer-spoke-with-an-american-accent/news-
story/d4529ffb6341278d8c1b33e06cd3099c)

> “It is the best I have heard, although it gives me an accent that has been
> described variously as Scandinavian, American or Scottish.”

> ...

> “It has become my trademark and I wouldn’t change it for a more natural
> voice with a British accent.

> “I am told that children who need a computer voice want one like mine.”

Somewhere, I recall a NOVA(?) program from the mid 80s where it showed him
using the speech synthesizer and the thing that he said with it that still
sticks in my mind is the "please excuse my American accent". In later years he
was given the opportunity to upgrade it to a more natural sounding voice - but
that voice was his.

~~~
egypturnash
Near the end of his life, his original voice computer started to fall apart.
He managed to get in touch with the people who wrote the software, who started
a mad scramble to find source, and ultimately ended up emulating the whole
setup on a Pi.

[https://theweek.com/articles/769768/saving-stephen-
hawkings-...](https://theweek.com/articles/769768/saving-stephen-hawkings-
voice)

------
kemiller2002
My mom lost her ability to speak, and what you are going to find is that your
life and how you interact with everyone will have to change. Human verbal
communication is very fast. She will find it difficult to be part of normal
conversations. Without lots of help, she will start to fade into the
background of conversations, because she can't keep up. You will have to help
her be a part of things. It will be a depressing experience for her, and you
will have to help her. People will look at her differently like she is
mentally handicapped. (I know she won't be, but people will assume that she is
even unconsciously). I recommend finding her a therapist if she has to go
through this transition.

~~~
bergerjac
Seems like a great application for Elon's Neuralink.

~~~
mhh__
Who needs therapy when you have technology that doesn't exist yet!

------
fxtentacle
Record her reading the texts of a standardized text training corpus.

That way, you can retrain an existing AI to do text to speech with her own
voice.

Edit: here's a link to the corpus that I believe Mozilla uses
[http://www.openslr.org/12/](http://www.openslr.org/12/)

~~~
asveikau
Is she on board with this? I can imagine a lot of people being severely put
off by being asked to record "a corpus of approximately 1000 hours" in advance
of what sounds like a stressful surgery.

~~~
joshribakoff
Seconding this, also, reproducing her voice with an AI may not be something
she is on board with, it could make her feel like you don't accept her with or
without a voice. It may also be unhealthy for you, similar to how spending too
long on social media can become a dangerous source of dopamine.

It might make sense to consider making a recording that is more meaningful,
and focus on giving her emotional support rather than building an AI that
could be perceived as a replacement.

~~~
netsharc
It's not like OP is replacing her entirety with Alexa, if I were the wife I'd
think "sure, let's 'backup' my voice, having it available in case I lose mine
would be useful, so that people can still hear my thoughts in my voice instead
of a robot's."...

~~~
badRNG
> if I were the wife I'd think "sure, let's 'backup' my voice"

That very well seems to be the OP's position as well. That's a far more
generous reading of the situation. It makes sense that someone here would have
the mindset of "let's keep a backup in case we want access to it later."

------
Rotten194
I would also suggest looking into learning American Sign Language (of course
alongside this project). While communicating via keyboard is workable and good
for communicating with the wider world, ASL would be much more convenient for
communicating between you two -- and a very interesting language to boot. It
is a foreign language that's not related to English besides a few loan words,
but there are tons of online resources and most universities have classes as
well. Plus, you also can experience beautiful Deaf culture, with a rich
storytelling and poetic tradition that blends language, gesture, acting, and
pantomime in a way that's just impossible to translate to a spoken language.

The downvoted commenter was being a jerk, but I do think learning ASL is an
option worth looking into.

~~~
krisoft
I think your answer misses the point of the question. Learning ASL can be done
after the surgery if she lost her voice. The question was what can be done now
before the surgery. The kind of things which, if it comes to the worst and she
loses her voice, cannot be done after.

~~~
saltcured
I wouldn't discount the value of having some rudimentary signs to communicate
immediately after surgery. It seems odd to me to focus on some dream of a
perfect TTS synthesis if these more basic needs are not addressed first.

If you've ever had a mouth injury that inhibits talking, or been in a foreign
environment where your speech is totally useless, it can be very stressful to
be unable to communicate. I think the couple should consider learning some of
the basics ahead of time, so that communication is possible without typing or
any other apparatus.

Considering the post-surgery recovery window, I'd want to be able to express
very basic things like:

I am comfortable

I am in pain

I am hungry

I am nauseated

I need to urinate/defecate

I want to rest

I love you

When will you return

etc. I might suggest trying to boil down one or two inside-joke kinds of
phrases as well, to be able to lift each other's spirits in a private or
intimate way.

~~~
whatusername
a pen and paper would suffice for immediate communication needs.

~~~
bluGill
If it must, but it isn't as smooth as conversation can be. Sign language is a
real language, and you can have real conversation, with all the pros and cons
of real conversation.

------
quiet_hacker
I have a progressive neurodegenerative disease and lost most of my ability to
speak about 3 years ago. What you are proposing is super cool, but you might
be overthinking this. These things (text to speech, etc) are more awkward than
practical in real life. Also, make sure your wife is completely on board.
Seeing old clips and hearing my voice is actually kind of depressing to me.
Here is my actual advice:

Outside of social situations, it honestly hasn't been that big of a deal for me.
As a remote developer, my job has remained the same. My managers and
coworkers have been super supportive. I send messages during meetings to one
person who will read it aloud for me.

With text and social media, I still keep up with friends and family. Most
medical appointments, etc, can be made online. SprintIP relay is free for
deaf/speech impaired, and it allows the caller to type what they want to say
and a representative will relay this to the other party. It works via the web
or a mobile app.
[https://www.sprintrelay.com/sprintiprelay](https://www.sprintrelay.com/sprintiprelay)

Banks, brokers, or anything involving personal info (like SS#) usually
requires a voice phone call. I have my wife call and explain the situation. I
can whisper yes, as they occasionally require me to give permission. Some call
center representatives have no idea how to handle this situation, and will
just stick to the script saying they have to speak to me the entire time. My
wife just thanks them, calls back, and hopes for someone more understanding.

There are awkward encounters where people don't know you can't speak, and will
respond by speaking louder and slower. These people will also assume you are
not intelligent and be dismissive. This is just one of the things you have to
deal with.

I sincerely hope the procedure goes well and your wife doesn't have to deal
with this. Just know that even if the worst happens, she can have a normal and
productive life!

~~~
aspaceman
> There are awkward encounters where people don't know you can't speak, and
> will respond by speaking louder and slower. These people will also assume
> you are not intelligent and be dismissive. _This is just one of the things
> you have to deal with._

It sucks you have to just deal with it.

------
happycry
We get quite a few requests for this at Resemble
([https://resemble.ai](https://resemble.ai)). We can get her to record right
on our website or you can upload an existing file (along with a video of her
consent) on the platform. Feel free to shoot me a message and I'd be happy to
help build a voice for her.

~~~
cdolan
I don't know how to send messages but I researched this space a few years ago.
Unfortunately a family member of mine had a surgery result in loss of his
speech.

We have a lot of tapes around of his voice, from voice mails to family videos
to some things from his work. If you are open to reaching out that would be
awesome, I’ll check out the site as well.

Edit: I’ve wanted to make some sort of soundboard + “text to talk” setup for
this family member. He often can’t participate in conversations because he
writes on a whiteboard, and the speed of chatter moves faster than his writing.

~~~
happycry
Feel free to shoot me an email: zohaib[at]resemble.ai

We also have an API that you might find useful for the soundboard project:
[https://app.resemble.ai/docs](https://app.resemble.ai/docs)

------
mattlondon
I don't know if you have kids/grandkids/nieces or nephews (or plan to have
those) but it might be nice to record your wife reading some books out loud.

Not only will you have your own personal "audio books" of Harry Potter/The
Hobbit/Chronicles of Narnia/Oi Frog/Alice in Wonderland/Roald Dahls etc etc
for any kids/grandkids/relatives etc that will hopefully be something
treasured in its own right, but you'll also have a large corpus of training
data from well-known texts that you can retrain over and over as the tech
improves in the future. Might be worth chucking in some other well-known texts
to avoid over-fitting on a "kids' story voice" \- maybe something plain like
inauguration speeches/declaration of independence/magna carta/etc.

Obviously I'd focus on gathering raw material now, and focus on the
reconstruction later when you've all recovered mentally and physically to
whatever happens. The more data the better when it comes to this sort of
thing. There might not be something "simple" right now (e.g. you could
probably implement the WaveNet or similar paper yourself today, and training
it up on some GPUs in your spare room etc, but in a few years there might be a
nice WYSIWYG/SaaS thing for it), but with the recordings safely stored you'll
obviously be able to use it in the future.

Best of luck to you both.

~~~
Zenbit_UX
I like this idea but the specific examples you give would almost certainly be
a terrible idea. A voice trained on Tolkien or old American legalese like the
Magna Carta would train a model with a lot of thee, thus, therefore and though
art and undertrain it with modern English. His wife would sound like the
second coming of Jesus or Shakespeare and less like a normal human being.

~~~
mattlondon
From what I understand, it is not the words themselves (thee etc) but the
sounds that make the words - so the "th" and the "ee" are still legit sounds
in modern English words. The network would just be synthesising the words you
tell it to - it won't be picking the words for you.

I might be wrong though.

------
kerkeslager
I don't have any answers to give you, but I want to say that this is a really
loving and beautiful thing you're trying to do.

~~~
Someone
Is it? My first thought was _“is your ideal also her ideal?”_.

We cannot rule out she wants to spend quality time with her partner instead of
spending time in a recording studio, so that, if the worst outcome comes, her
husband can remind her of what she lost.

~~~
kerkeslager
Presumably the guy is better at guessing what his wife wants than you are, and
his wife is an adult who can tell him if he guesses wrong.

~~~
thaumasiotes
> his wife is an adult who can tell him if he guesses wrong

She can, but she might not. A lot of that depends on how he presents the idea
to her -- it might seem like something that's important to him.

~~~
pugworthy
It's sad that people trying to discuss the emotional side of this are being
down voted.

Honestly there is no doubt a very large emotional/personal side of this,
irrespective of whose idea it is and who supports it.

Technology isn't the solution for all problems and challenges in life.

~~~
at_a_remove
No, it isn't.

But good lord, sometimes trying to get technical help on the Internet turns
into this rabbithole of people who are specifically looking for ways _not_ to
be helpful. "Did you really want that?" "Did you consider alternatives?" "What
you _really_ have is an XY problem."

~~~
pugworthy
_"Truly identifying a problem means looking deeper at the symptoms, the
customer, the impact, the alternatives, the opportunity, and the relationships
between them, while avoiding the “solution bias” (often known as “The issue is
that the customer does not use my solution”)."_

#1 item from [https://www.molfar.io/blog/yc-
questions](https://www.molfar.io/blog/yc-questions)

~~~
at_a_remove
Or not. Not everything has to be this super-deep, six whys exploration of how
craving and attachment is the cause of all suffering and if you would only
stop wanting a solution you would no longer be in pain.

Sometimes a cigar is just a cigar.

~~~
pugworthy
If it was a cigar he’d just ask the technical question of how to capture and
simulate someone’s voice.

~~~
at_a_remove
"I'd like to prepare, just in case, to have technology to reproduce her voice
from keyboard or other input."

He then goes on to say "My ideal would be an open source 'deepfake toolkit'
that allows me to provide pre-recorded samples of her speech and then TTS in
her voice."

That sounds like wanting to capture and simulate someone's voice.

------
covercash
Other resources you may want to explore are r/mute and r/deaf subreddits. Both
also have Discord servers listed in the sidebars.

Having spent a good deal of time in hospitals, a few things I recommend... 10’
phone cable since outlets can sometimes be far from the bed, cheap slippers
she can wear to walk around (stepping in a hospital hallway mystery puddle
wearing just socks is very unpleasant), comfy clothes that you don’t mind
having ruined (T-shirts, underwear, shirts, pajama pants - they can
temporarily unhook the IV so she can put a T-shirt on), earplugs, eye mask. If
she’s going to be on liquid-only diet, bring your own since hospital food is
not great, not terrible. Soylent/Orgain/Ensure if she’s permitted that,
otherwise good quality Italian ices are such a nice treat and most hospitals
have a patient fridge/freezer you can store them in. Broth, but go to a
restaurant or grocery store/farmers market with hot soup bar and fill a
container with just the broth from the chicken noodle soup. It’s INFINITELY
better than boxed broth.

Hopefully all of your research and preparation will be for nothing, I wish you
and your wife a successful surgery!

------
dawg-
Speech-language Pathology student here. I would recommend going to see a
speech therapist. It will likely be covered by your health insurance. Find an
SLP who specializes in AAC (Augmentative and Alternative Communication) who can
help your wife communicate if she loses her speech. Your DIY approach could
work, but having support from an SLP to help her learn the system, and come up
with other options if it doesn't cover all of her communication needs, will go
a long way.

~~~
stevenbedrick
Upvoted and agreed 100%, from an AAC researcher. Your best bet is definitely
going to be to reach out to an SLP with AAC expertise.

------
coronadisaster
Just have her carry a good microphone at all times to record everything she
says until that point, to have a maximum amount of samples. If you can't
"deepfake" it today, maybe you will be able to do it tomorrow, but at least
you will have the data.

~~~
lostlogin
“This conversation is being recorded for training and quality assurance
purposes.” Should be stated before each new interaction. The legal requirement
will vary by jurisdiction but a lawyer can advise on that. And yes, I’m
joking.

~~~
coronadisaster
While this can be true, it depends on which state you live in:
[https://recordinglaw.com/party-two-party-consent-
states/](https://recordinglaw.com/party-two-party-consent-states/) . In
Illinois, it is apparently legal for the police to record you without consent
but it is illegal for you to record the police...

------
korethr
Others here are addressing technical solutions, but I don't see anyone here
covering non-verbal communication. IMO, that's going to be just as important.

I am going to assume that your wife and you have a healthy relationship with
strong communication, in part because you've developed an intuition for her
body language and other non-verbal communication methods. In the scenario
where she loses her ability to speak, even if she happily and completely takes
to whatever technical solution(s) you offer to replace that, I think it's
likely she will reflexively lean more heavily on those non-verbal channels,
and you're going to need to get better at reading them than you are now.

------
uberman
This might get you started:

[https://speech.microsoft.com/customvoice](https://speech.microsoft.com/customvoice)

I imagine if MS offers custom voices then the other text to speech providers
do as well.

Good luck

~~~
tech4all
Thank you - great lead.

------
thaumasiotes
Some (decades old) research on this involved a research team creating a video
of JFK saying "I never met Forrest Gump". I found a writeup in Google Books:
[https://books.google.com/books?id=mQtGVQeQplcC&pg=PA208&lpg=...](https://books.google.com/books?id=mQtGVQeQplcC&pg=PA208&lpg=PA208&dq=%22I+never+met+forrest+gump%22&source=bl&ots=k3PobhFWaY&sig=ACfU3U3VlGf4aIdU1Q_JRllhb8AwVNzeLA&hl=en&sa=X&ved=2ahUKEwiT7dyxkvrpAhUrHDQIHVUfDwQQ6AEwAHoECAoQAQ#v=onepage&q=%22I%20never%20met%20forrest%20gump%22&f=false)

> We evaluated our Kennedy results qualitatively along the following
> dimensions: ... naturalness of the composited articulation; ...

Obviously the state of the art will have advanced, but maybe this can point
the way toward more current research.

While I tend to agree with everyone else that this _can be_ a great idea, my
instinct is to float the idea to your wife first and see how she responds. I
can imagine someone taking this negatively.

~~~
foepys
There is a YouTube channel called "Speaking of AI" that makes short fake
speeches of some US public figures. The quality is quite good and a bit
frightening.

[https://www.youtube.com/channel/UCID5qusrF32kSj-
oSGq3rJg/vid...](https://www.youtube.com/channel/UCID5qusrF32kSj-
oSGq3rJg/videos)

------
watertom
If she loses her ability to speak there are many ways to help her out, but
nothing can replace the sound of her voice, especially for those important
moments.

Just in case, record specific messages for various people in her life that
can be used repeatedly (children, Mom, Dad, siblings, in-laws, friends):
messages like "X, I love you", "X, I miss you", "Mommy loves you!", "Give me
a hug", a holiday greeting, "Happy Birthday", "I'm so proud of you!", a
favorite happy saying, a frustration saying.

You get the idea.

------
arethuza
What about recording messages to other people for future events (e.g.
graduation of a child, birth of grandchild etc.)?

Recording a message to a yet unborn grandchild is maybe something we could all
do!

------
jasonhn9999
When my dad lost his speech, we had Boogie Board Jot devices all over the
house. It made writing short notes and simple dialogs much less tedious.

We also used the Verbally premium iPad app to help give him a voice and make
transactions easier.

Wishing you all the best.

------
fxtentacle
The paper "Generalization Of Audio Deepfake Detection" gives an overview.

The paper [https://arxiv.org/abs/1904.05441](https://arxiv.org/abs/1904.05441)
has a list of spoofing methods.

Here's one method as paper
[https://arxiv.org/pdf/1806.04558.pdf](https://arxiv.org/pdf/1806.04558.pdf)

And here on GitHub [https://github.com/CorentinJ/Real-Time-Voice-
Cloning](https://github.com/CorentinJ/Real-Time-Voice-Cloning)

------
probably_wrong
For an open-source approach, the MaryTTS project has a guide on how to add new
voices to their tool:
[https://github.com/marytts/marytts/wiki/VoiceImportToolsTuto...](https://github.com/marytts/marytts/wiki/VoiceImportToolsTutorial)

------
mbreese
You may want to look up what was done for Roger Ebert. He lost his voice
due to surgery, but because of the vast corpus of audio recordings of him, a
viable text-to-speech engine could be created.

It’s a bit dated at this point, but I imagine the research has vastly improved
since then.

It’s a very good question though. A decade ago this could be done for
one man. Can it now be done for anyone? Like others, I’d guess the
first step is to record everything while you can.

------
echelon
I wrote [https://trumped.com](https://trumped.com)

You ideally want five hours of clean speech (good microphone, no background
noise, high sample rate). It should be spoken clearly, in a single tone or
mood. My model sounds awful because the data isn't consistent, and the room
tone and microphones are terrible.
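
As a rough way to catch format problems before recording hours of audio, here is a minimal sketch that reads a WAV file's properties with Python's standard `wave` module. The thresholds (22050 Hz, 16-bit, mono) are assumptions that mirror the LJSpeech format, not requirements of any specific model, and the function names are made up for illustration. It cannot judge background noise or room tone; that still takes listening.

```python
import wave


def describe_wav(path):
    """Read basic format properties from a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "bits": wav.getsampwidth() * 8,
            "seconds": wav.getnframes() / rate,
        }


def recording_problems(info, min_rate=22050, min_bits=16):
    """List ways a recording falls short of the assumed target format."""
    problems = []
    if info["sample_rate"] < min_rate:
        problems.append(f"sample rate {info['sample_rate']} Hz is below {min_rate} Hz")
    if info["bits"] < min_bits:
        problems.append(f"bit depth {info['bits']} is below {min_bits}")
    if info["channels"] != 1:
        problems.append("not mono")
    return problems
```

Running `recording_problems(describe_wav("take01.wav"))` on a test clip (the file name is hypothetical) returns an empty list when the format meets the assumed targets.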

If you want different prosody or moods, don't mix them in the same data set.

You can experiment with transfer learning LJSpeech with Nvidia Tacotron2 right
now. Glow-tts is also promising.

You'll start to get results with fifteen minutes of sample data, but for high
quality you want a lot of audio.

Have your wife read a book and record it. The training chunks will be ~10
seconds apiece, so keep that in mind for how to segment the audio.
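
One way to think about the segmentation, as a minimal sketch: assume you already have the start and end time of each sentence or phrase in the recording (from a silence detector or by hand), then greedily merge neighbours until a chunk would exceed about 10 seconds. The function name and the `(start, end)` tuple layout are assumptions for illustration.

```python
def group_into_chunks(spans, max_seconds=10.0):
    """Greedily merge consecutive (start, end) spans into training chunks.

    Each chunk runs from the start of its first span to the end of its last
    span and is at most `max_seconds` long. A single span that is already
    longer than `max_seconds` becomes a chunk on its own and would need to
    be split or dropped by hand.
    """
    chunks = []
    for start, end in spans:
        if chunks and end - chunks[-1][0] <= max_seconds:
            # Extending the current chunk keeps it within the limit.
            chunks[-1] = (chunks[-1][0], end)
        else:
            chunks.append((start, end))
    return chunks
```

Cutting only at span boundaries keeps every chunk starting and ending on a pause, so no word is split across two training examples.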

Focus on getting lots of good sounding data. Hours. The models will improve,
but this may be your only shot of acquiring the data.

Download the LJSpeech dataset and listen to it. See how it sounds, how it's
separated. That is a fantastic dataset that has yielded tremendous results,
and you can use it for inspiration.

------
nutanc
At a minimum get the following list of sentences recorded in her voice,
[http://www.festvox.org/cmu_arctic/cmuarctic.data](http://www.festvox.org/cmu_arctic/cmuarctic.data)

Make sure the recordings are of a good quality. This will ensure that you will
have a baseline TTS of her voice at the minimum.
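
To turn that prompt file into a recording checklist, here is a minimal sketch of a parser. It assumes each line has the form `( utterance_id "sentence text" )`, which is how I understand the file to be laid out; check a few lines of the real file before relying on it. The function name and the sample lines in the usage note are made up.

```python
import re

# Assumed line layout: ( utterance_id "sentence text" )
PROMPT_LINE = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)$')


def parse_prompts(text):
    """Return a list of (utterance_id, sentence) pairs from prompt-file text.

    Lines that do not match the assumed layout are skipped rather than
    raising, so blank lines and comments do not break the run.
    """
    prompts = []
    for line in text.splitlines():
        match = PROMPT_LINE.match(line.strip())
        if match:
            prompts.append((match.group(1), match.group(2)))
    return prompts
```

For example, `parse_prompts('( demo_0001 "This is a made-up prompt." )')` gives `[("demo_0001", "This is a made-up prompt.")]`. Saving each take under its utterance ID keeps the audio and the transcript paired, which is what most voice-building tools expect.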

------
asdfman123
Here's a simple and practical solution:

Get a decent audio headset, have it record the audio to her phone, and spend
hours talking to her about whatever. Preferably in a reasonably quiet
environment.

Just spend a lot of time talking. You don't have to talk to her through a
headset. Just make sure hers is recording her voice.

It would be easy, painless, and probably good for the relationship too.

------
arslnjmn
(off topic) Record a few things for her future self. E.g. favourite quotes,
frequently used phrases.

~~~
zxter
Good advice! Maybe a few shoutouts to your future children.

------
bcatanzaro
Make sure to record with the best microphone you can find and in the quietest
room you can find. Makes a huge difference in the resulting TTS.

------
adrianmonk
You might look at resources for ALS patients.

Since ALS (aka Lou Gehrig's disease) is a degenerative motor neuron disease,
people with ALS can pretty much count on eventually losing the ability to
speak. So "voice banking" is apparently pretty common.

------
anaisbetts
Not exactly what you're asking for, but I wrote an app for this scenario:

[https://play.google.com/store/apps/details?id=org.anaisbetts...](https://play.google.com/store/apps/details?id=org.anaisbetts.sirene)

This is a text-to-speech app with a very keen emphasis on _Day To Day_ usage -
the UX will put the focus at the right places, help you reply faster, etc. I
used it for a full month when I was unable to speak after voice surgery and it
made a big difference, other folx have reported the same

------
da39a3ee
This is probably a really stupid suggestion but just in case.

Do you and your wife drink alcohol a bit? If so might it be worth having a
couple of drinks in a quiet setting with her one evening with microphones
running? I'm not suggesting getting wasted! I'm just wondering whether it
might help to catch her getting more animated or "natural" in conversation. I
was thinking this might help make the resulting synthesized speech capture
even more of her personality than reading children's books or subsets of AI
corpora etc.

------
shockron22
I have had good results with this.
[https://www.resemble.ai/](https://www.resemble.ai/) It is based on this open
source work. If you want to run it yourself.
[https://github.com/CorentinJ/Real-Time-Voice-
Cloning](https://github.com/CorentinJ/Real-Time-Voice-Cloning)

The voice cloning can be done in a matter of minutes. (< an hour) It's also
very easy to use the website.

Best of luck!

------
kw9
Strongly suggest reaching out to Dr. Rupal Patel
([https://www.linkedin.com/in/rupalvocalid](https://www.linkedin.com/in/rupalvocalid))
of Northeastern University ([https://coe.northeastern.edu/people/patel-
rupal/](https://coe.northeastern.edu/people/patel-rupal/)) and VocaliD
([https://vocalid.ai/about-us/](https://vocalid.ai/about-us/)). She's a
licensed Speech-Language Pathologist
([https://web.northeastern.edu/cadlab/publications/RupalPatel_...](https://web.northeastern.edu/cadlab/publications/RupalPatel_CV_WEB.pdf))
and she and her husband, Dr. Deb Roy, did the Human Speechome project
([https://en.wikipedia.org/wiki/Human_Speechome_Project](https://en.wikipedia.org/wiki/Human_Speechome_Project)).
She was also my doctoral advisor and I feel confident saying she would be very
interested in talking with you.

------
benjohnson
Do you have children? Perhaps - record her reading a few favorite children's
books.

------
DoreenMichele
Not to discourage you from making voice recordings and all that, but as
someone who is handicapped and sometimes has trouble speaking because of it:

1\. I spend a lot of time online. It doesn't matter so much there. I do a lot
of typing.

2\. My oldest son, who had serious output difficulties as a child, is talented
at inferring what I need from a gesture and a grunt. This has proven
enormously helpful.

3\. Consider using her phone as a communication device. It's small and people
tend to take their phone everywhere and she can type out what she wants to
say.

4\. Writing tweets can help a person learn to say things more succinctly. I do
freelance writing and figuring out how to say things succinctly is a talent
you can develop. (It's something I have to work at -- I'm a "would have
written you a shorter letter if I had more time" type of person.) This can
help enormously when you face communication barriers.

5\. Take some time to deal with the emotional stuff. It matters.

I'm sorry you are facing this. Best of luck.

------
jitendrac
ML will require a lot of samples to get the result you want. I would say, let
your wife carry an attached microphone and meet all the people she wishes to
talk to at least once. Collect all the audio data, and you can use it later.
Make all the available moments memorable for her; for example, if you have a
child, record a message from your wife for each of the child's next 10 birthdays.

------
underdeserver
Consider investing in a good microphone for recording. A Blue Yeti is ~$200.

------
seesawtron
Here's a recent work [0] where you can train the model with 10s audio and
convert any "text to speech" (all doable in the browser). I tried with Google
Colab demo [1] and its performance fluctuates with the training audio sample
that you give it so might need some trial and error to get the sweet spot.

Also the model is not saved in the browser with Colab so you might also want
to do it locally to save it eventually (if it comes to that).

All the best mate!

[0] Main repo: [https://github.com/CorentinJ/Real-Time-Voice-
Cloning](https://github.com/CorentinJ/Real-Time-Voice-Cloning) [1] Google
colab repo to try it out: [https://github.com/CorentinJ/Real-Time-Voice-
Cloning/blob/ma...](https://github.com/CorentinJ/Real-Time-Voice-
Cloning/blob/master/demo_toolbox_collab.ipynb)

------
ardenwood
Hi, I like your idea for your wife. Hope the surgery will succeed without
damage to her speaking. I'm from Nvidia and know well the team behind NeMo
toolkit. Happy to connect you to the team if that helps. You may send me an
email to ardenwood.bruin_at_gmail.com. -- Michael

~~~
maps7
That's really good of you. It's amazing to see this community be so helpful.

------
jameswestgate
This may also be useful. Free and open source.

[https://www.tobiidynavox.com/en-gb/software/web-
applications...](https://www.tobiidynavox.com/en-gb/software/web-
applications/message-banking-2/)

------
totetsu
The mycroft voice assistant has some tooling they used to create voices.

[https://mycroft.ai/blog/mimic-2-is-live/](https://mycroft.ai/blog/mimic-2-is-
live/)
[https://github.com/MycroftAI/mimic2](https://github.com/MycroftAI/mimic2)

Festival Speech Synthesis has a tool for recording speech databases, and some
tutorials for training festival voices.
[http://www.cstr.ed.ac.uk/research/projects/speechrecorder/](http://www.cstr.ed.ac.uk/research/projects/speechrecorder/)

------
disabled
You need to do voice banking. It is imperative that you do so, so that your
wife keeps her identity no matter what.

What you need to do is spend the entire next 3 weeks doing voice banking. This
will give your wife a text-to-speech voice (SAPI 5 voice, or others, for
example). You record phrases that the voice banking service wants you to
speak, with a _high quality headset (best if wired) in a quiet setting_.

The more sentences (samples) you have, the better the voice will be,
obviously. But, there are services out there that will update the recordings,
as the technology gets better, and that is the way to go, in terms of choosing
the "best service".

The voice banking services that people typically use are here:
[https://www.mndassociation.org/professionals/management-
of-m...](https://www.mndassociation.org/professionals/management-of-mnd/aac-
for-mnd/voice-banking/equipment-and-services/)

I would say that Acapela my-own-voice is currently the best technology.
Obviously there are open source technologies, but you do not have the luxury
of time to figure all of that out. However, you should do your own voice
banking for later post-processing on your own with open source stuff.

There is also a free version of voice banking available, but I would only
recommend it as a secondary tool:
[https://www.modeltalker.org/](https://www.modeltalker.org/)

This app (iOS and Android) for example, allows you to use your personal voice
banked text-to-speech voice, to talk: [https://therapy-
box.co.uk/predictable](https://therapy-box.co.uk/predictable)

This is another great app that allows you to use your personal voice banked
text-to-speech voice:
[https://www.assistiveware.com/products/proloquo4text](https://www.assistiveware.com/products/proloquo4text)

Source: Disabled engineering student, who is extremely interested in assistive
technology. I would love to be a rehabilitation engineer.

------
stevewillows
It might also be worth recording normal conversations you have around the
house as a fallback. You can always cut it up later and feed it into these
systems.

Best of luck to the two of you. I really hope you don't ever need this
technology.

------
KhoomeiK
You might want to try DIY'ing something like this [1] depending on the
extensiveness of her surgery. It basically records electrical signals (EMG)
emitted by the vocal cords (subvocalizations) and can convert them to text with
ML/other signal processing algorithms. Basically a rudimentary version of the
transhumanist Brain-Computer Interfaces that would enable telepathy.

[1] [https://dam-
prod.media.mit.edu/x/2018/03/23/p43-kapur_BRjFwE...](https://dam-
prod.media.mit.edu/x/2018/03/23/p43-kapur_BRjFwE6.pdf)

------
nighthawk454
This can be trained using only 5 Seconds of reference audio:
[https://google.github.io/tacotron/publications/speaker_adapt...](https://google.github.io/tacotron/publications/speaker_adaptation/)
[https://arxiv.org/pdf/1806.04558.pdf](https://arxiv.org/pdf/1806.04558.pdf)

It's been mentioned a bit already, but thought it was worth calling out. This
may be one of the lowest-overhead ways to start experimenting, at least in
terms of data collection.

------
abjecton
Your approach to the situation might determine the quality of life for you
and your wife. I can't imagine what it's like to think logically while
you're in the middle of such an emotional event.

------
The_rationalist
[https://dathudeptrai.github.io/TensorflowTTS/](https://dathudeptrai.github.io/TensorflowTTS/)
is the state of the art and feels natural enough

------
ooopsnevermind
First off I'm sorry you're going through that, it sounds really tough. We
sometimes have families use us for this
([https://trysaga.com](https://trysaga.com)) as a way to collect voice
recordings of loved ones, to record and share a large number of memories and
stories in their voice and have them saved forever. You can download all the
recordings to keep. It's free right now and I'd be happy to help out and make
sure it got you what you needed, let me know.

------
cl0rkster
Probably not what you were seeking, but I have to imagine it would be similar
to long periods I have spent in a non-verbal state. Being allowed to exist and
just smile or laugh as a "part" of the conversation around me was like
sunlight on a dark day. The range of human emotion and expression often
overlaps enormously between people. Sometimes pretending your voice is
really the good you hear around you and not the throat mumblings that cause so
much conflict is the most beautiful dream.

~~~
cl0rkster
Also... Learn sign language. Some of the most beautiful and overlooked people
are non-verbal. I've met several truly speechless people who had families that
never learned to sign. It's sad for them.

------
redsh
Sorry about this. Record as much voice as you can now (stereo too?), then
you’ll have time to find the right solution and improve it as the technology
gets better in time

------
erogol
Hope I am not repeating any comments here. My suggestion is that you start
recording as soon as possible and as much as possible without worrying about
technicalities. You can also use any old voice recordings or videos you have
with relatively good voice quality. For now maybe she can read a book aloud
in a silent room. After you have the data I can also help if you like to
create a TTS model.

------
YAFZ
You might contact the following company: [https://www.acapela-
group.com/solutions/acapela-voice-factor...](https://www.acapela-
group.com/solutions/acapela-voice-factory/)

There's also open source TTS from Mozilla:
[https://github.com/mozilla/TTS](https://github.com/mozilla/TTS)

------
m463
I went through something similar with a parent years and years ago. I wanted
to be able to do things to help with what would eventually be lost.

I have to say I didn't help as much as I thought I could and afterwards I was
always wondering if I could have used this technology or that and done more.

So - I think you should recognize that you can only do so much, we're doing
the best we can, and in the end we are all winging it.

------
hvaoc
This is not open source but this was very good from their demo in terms of
your own voice reproduction.

[https://www.descript.com/lyrebird-ai](https://www.descript.com/lyrebird-ai)

I hope good folks in there will help you, try reaching them.

[https://m.youtube.com/watch?v=VnFC-s2nOtI](https://m.youtube.com/watch?v=VnFC-s2nOtI)

~~~
unstatusthequo
Love Descript and think it’s a great way to both record and get transcripts.

------
TriNetra
I've recently seen these two pieces of software on HN that may be of some help:

deepfake for voice: [https://github.com/CorentinJ/Real-Time-Voice-
Cloning](https://github.com/CorentinJ/Real-Time-Voice-Cloning)

Reproducing emotional voices:
[https://www.sonantic.io/](https://www.sonantic.io/)

------
rajacombinator
Is this a time sensitive procedure? I think I’m stating the obvious - (maybe
not) - but this is not something you should just wing a few weeks before, nor
is it something you should try to figure out on your own without _thoroughly_
discussing with your wife. “Surprise honey, I deepfaked your voice!” is not
something most people would appreciate.

------
abinaya_rl
You are trying to do a beautiful thing. I don't have any knowledge of this
subject, but I really wish you good luck on this project.

------
inspectorG4dget
Nobody has mentioned VocalID and voice surrogacy [1] yet. This organization
might be able to recreate her voice from historic samples for text-to-speech

[1]
[https://www.ted.com/talks/rupal_patel_synthetic_voices_as_un...](https://www.ted.com/talks/rupal_patel_synthetic_voices_as_unique_as_fingerprints)

------
meristem
All sorts of feels here. I had a positive outcome from exploratory throat
surgery that had a chance of obliterating my voice. Prepping the way you are
doing is amazing. Please balance it with time well-spent with your wife, being
present in the moment. Sounds trite and yet takes focus to not just
concentrate on the possible negative future outcome.

------
peterwwillis
Here's a story from the San Francisco Chronicle on saving Stephen Hawking's
voice: [https://www.sfchronicle.com/bayarea/article/The-Silicon-
Vall...](https://www.sfchronicle.com/bayarea/article/The-Silicon-Valley-quest-
to-preserve-Stephen-12759775.php)

------
loph
You might look at what Jamie Dupree has done.

[https://www.cnn.com/2018/06/15/health/dystonia-jamie-
dupree-...](https://www.cnn.com/2018/06/15/health/dystonia-jamie-dupree-radio-
no-voice/index.html)

He uses a text-to-speech system that sounds more-or-less like him.

------
jimlikeslimes
This is very much a short term solution if they are unable to talk immediately
after surgery, for up to a few days. My wife used a small portable whiteboard
and magic marker to write messages on in the same situation. It worked really
well. Even with our 2 year old, it helped her to understand something unusual
was going on.

------
offsky
I’m sorry that the both of you have to deal with this. I’ve read many of the
replies here and I’m surprised there isn’t already a self-service website that
does this. Pay some money, record some text, and boom here’s your voice.
Something like this should exist. Someone should build this.

------
moooo99
I don't really have anything to add to all the helpful comments under your
thread. Do the preparation as much as you can, as long as your wife also wants
this.

You said there is a small chance, so I really wish you and your wife the best
of luck that she and her voice will be fine after the surgery.

------
eschaton2023
If she has time, get her to read the most common English words. Then parse the
text and play the audio for the known words and use traditional speech
synthesis for the outliers. It will not be perfect but you can then possibly
train an AI to pronounce the outliers.
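
A minimal sketch of the lookup-with-fallback idea above, in Python. The clip
layout (one `<word>.wav` file per recorded word) and the names `index_clips`
and `plan_utterance` are assumptions for illustration only; actually playing
the clips and running the fallback synthesizer are left to whatever audio
tools you end up using.

```python
import re
from pathlib import Path


def index_clips(clip_dir):
    """Map each recorded word (lower-cased file stem) to its .wav path.

    Assumes a hypothetical layout of one file per word, e.g. clips/hello.wav.
    """
    return {p.stem.lower(): str(p) for p in Path(clip_dir).glob("*.wav")}


def plan_utterance(text, recorded):
    """Return one (kind, value) step per word in `text`.

    kind is "clip" when the word has a recording in `recorded`,
    otherwise "tts" so the caller can hand that word to a generic
    speech synthesizer (the "outliers" in the comment above).
    """
    plan = []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in recorded:
            plan.append(("clip", recorded[word]))
        else:
            plan.append(("tts", word))
    return plan
```

Calling `plan_utterance("I am hungry", index_clips("clips"))` would yield clip
steps for any words that were recorded and `("tts", word)` steps for the rest.
Word-by-word playback will sound choppy, which is why this is only a fallback.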

------
egwor
I would also think of various phrases that need a lot of emotion applied, e.g.
for sensitive situations like someone's death, or for positive feedback like a
wedding or a birthday or a thank you

Maybe also if she has a favourite book or a favourite quote, get those
recorded too.

Back it all up!

------
mathnode
If you don’t have any children (yet) you should get her to record herself
reading some of her favourite children’s books. At the very least she will be
able to read along with them. Children’s books are quite sparse, so a page
per-track is easy to do.

------
jll29
Just let her read a couple of pieces of text and record in high quality (44.1
kHz).
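
One way to confirm that each recording really was saved at that quality is to
read the WAV header with Python's standard `wave` module. This is only a
sketch: the 44.1 kHz threshold follows the suggestion above, while the 16-bit
minimum and the function names are my own assumptions.

```python
import wave


def describe_wav(path):
    """Read basic format information from an uncompressed WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "seconds": wav.getnframes() / rate,
        }


def meets_minimum(info, min_rate_hz=44100, min_bits=16):
    """True if the recording is at least min_rate_hz and min_bits deep.

    The defaults are assumptions, not requirements of any particular tool.
    """
    return (
        info["sample_rate_hz"] >= min_rate_hz
        and info["bits_per_sample"] >= min_bits
    )
```

Running `meets_minimum(describe_wav("session01.wav"))` over every file after a
recording session (the file name here is just a placeholder) catches a
misconfigured recorder early, while there is still time to re-record.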

Beyond the technical answer, you may want her to record some nice personal
words addressed to your family that you can listen to later.

You don't need to do anything until the worst case materialises.

------
voicevoice50
For recording training audio:

[https://github.com/daanzu/speech-training-
recorder](https://github.com/daanzu/speech-training-recorder)

The recorder works with Python 3.6.10. Need to pip install webrtcvad also.

------
bb123
There is [https://www.descript.com/lyrebird-
ai](https://www.descript.com/lyrebird-ai) which is in private beta right now,
but looks to serve your needs exactly. Maybe reach out to them?

------
techbio
Confident as I may be that OP's intentions are good and pure, a quick CTRL-F on
the comment threads finds no references to “abuse” or “ethics”, and I propose
that synthesis of voice raises issues for which society has few natural
defenses.

------
mproud
Roger Ebert has some articles about the troubles he encountered that may be
worth a read.

------
diggum
[https://www.modeltalker.org/vrec/](https://www.modeltalker.org/vrec/) is a
project for "voice banking" that might be able to help. It's not perfect yet.

------
bigmasterofnone
Good luck with what you are doing and more importantly, I wish your wife good
health.

------
PopeDotNinja
My first thought was to spend some time together not speaking. See how it
goes, so there’s less fear going into it. Maybe take a couples mime class or
something! Just making it real and not living in fear is the point.

------
josinalvo
IDK about the tech, but I would not worry about it right now. You don't need to
play with the tech unless the unlikely bad outcome comes to pass.

The only tip I have is from a bit of amateur sound editing I did: collect many
samples, and beware of big phrases: Like, ask her to say the same thing many
times. And ... sometimes ... to ... stop ... at ... each ... word. And ... so
... me ... ti ... mes at each syllable.

Otherwise, if you ever need to create a sample that contains a single
word/syllable, you can't. It is weird how much sound that contains clearly
distinguishable syllables for the human ears still is not separable when you
go to edit it.

Also, you might want to check wordlists by frequency to get a menu of common
words, and ipa notation, to ensure you cover a good range of sounds
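
A rough sketch of that coverage check, in Python. It assumes you already have
a frequency-ranked word list and a pronunciation table mapping words to IPA
symbols; both inputs, and the function names, are hypothetical and would come
from whatever word list and dictionary you choose.

```python
def missing_common_words(frequency_ranked, recorded, top_n=100):
    """Return the top_n most frequent words that have not been recorded yet.

    frequency_ranked: words ordered from most to least common.
    recorded: iterable of words already captured as audio.
    """
    recorded_lower = {w.lower() for w in recorded}
    return [w for w in frequency_ranked[:top_n] if w.lower() not in recorded_lower]


def uncovered_phonemes(recorded, pronunciations, inventory):
    """Return the phonemes in `inventory` that no recorded word contains.

    pronunciations: dict mapping a lower-case word to its list of IPA symbols.
    Words missing from the dict contribute nothing to coverage.
    """
    covered = set()
    for word in recorded:
        covered.update(pronunciations.get(word.lower(), ()))
    return sorted(set(inventory) - covered)
```

The first function gives the "menu" of words still to record; the second
points at sounds that are missing entirely, so you can pick extra words that
contain them.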

~~~
JDEW
> Otherwise, if you ever need to create a sample that contains a single
> word/syllable, you can't. It is weird how much sound that contains clearly
> distinguishable syllables for the human ears still is not separable when you
> go to edit it.

Don’t know why you’re being downvoted. Thought it was insightful.

------
suchoudh
Please do keep us posted on the final outcome. We all pray for the surgery to
be successful. (Really appreciate your efforts in preparing for the worst-case
scenario.)

------
techwraith
I recently learned about a startup that is working on this kind of tech:
[https://phonetic.ai/](https://phonetic.ai/)

------
fenesiistvan
These are the things I always come back to ycombinator.com for. There are
always valuable, intelligent replies here for all kinds of issues you might
have.

------
vinniejames
Take a look at Lyrebird

[https://www.descript.com/lyrebird-
ai?source=lyrebird](https://www.descript.com/lyrebird-ai?source=lyrebird)

------
csisnett
Vocalid.ai has a vocal bank where you can record yourself, and use other
people's voices as well. It could be a good choice for her to use her own
voice

------
lowercased
what dangers are there of someone 'stealing' your voice to impersonate you
later? it seems mostly theoretical right now, but perhaps the more high-
profile you are, the bigger the dangers might be, even today? if you had a
large body of your voice already recorded (prepped for voice processing
systems), is that data high-risk?

------
ponker
Make sure to not have her read too much. The vocal cords can get inflamed and
increase the chance of complications/damage.

------
smolPotat
There's an app for that! It's called Vocable, it's open source and iOS and
Android!!!

------
diegoperini
Please let us know the good news if it arrives, preferably with Tell HN or
something similar.

Good luck and best wishes! <3

------
pkinnaird
get a great microphone and have her read her favorite books. Go for books with
lots of dialog and emotional content.

Later, you can extract all the phonemes you want from it and you will retain
the emotional expressiveness of her voice.

She should probably sing some songs -- lullabies, rock, etc. Go for emotional
diversity.

------
glonq
> I'd like to prepare, just in case, to have technology to reproduce her voice
> from keyboard or other input.

Is this something that she wants? She's got a lot on her plate (emotionally
and logistically) to prepare for this surgery, and maybe doesn't need a big
geek project inflicted upon her just because there's a small chance of a bad
outcome.

------
werdnapk
How small of a chance of her losing her ability to speak are you talking about
here?

------
dragoon7
Learn sign language.

~~~
klyrs
These suggestions are getting downvoted, but my girlfriend needed surgery
wherein she wouldn't be able to speak for about a month. I know sign language,
and tutored her for about a month leading up to the surgery. It was
empowering, and she was able to teach friends, family and coworkers a few
basic signs which made a lot of interactions go smoother. This low-tech
solution doesn't need batteries or internet connectivity, and can provide a
much smoother flow of conversation than typing things out.

------
chubs
Acapela.com has a voice banking service

------
ghoshbishakh
Please. There is a small chance you said. Everything will be fine. But still
carry on your research on the problem since it might help others.

~~~
swyx
even if there is a small chance, the preparation may help lessen the blow of
what would still be a tremendous loss.

also it might just help pass the time since OP has 3 weeks.

------
kangaroozach
Descript.com has the tech.

Reach out to Andrew Mason.

------
dazuaz
Not bad as a niche product idea

------
evmolesworth
Does your wife want you to do this?

------
kangaroozach
Descript.com Andrew Mason

------
pezo1919
Did you ask her about that? Make sure she is not freaking out about that.

