
Mozilla's Speech-to-Text Engine Is at Risk Following Layoffs - lisper
https://www.phoronix.com/scan.php?page=news_item&px=DeepSpeech-At-Risk
======
macspoofing
I don't understand what Mozilla is. They re-upped the Google deal to the tune
of $400-450 million/year for the next 3 years, so their revenue is secure.
That's great. I can also understand that it makes sense to cut some side
projects that are probably a distraction (and this one may very well fit that
category). But if you cut funding to things like MDN and Servo/Rust, and move
away from Firefox as a core focus (the thing that just brought you another $1.2
billion)... what the heck is Mozilla now?

~~~
BiteCode_dev
Wait, move away from Firefox as a core focus?

~~~
macspoofing
Yeah. From [https://blog.mozilla.org/blog/2020/08/11/changing-world-chan...](https://blog.mozilla.org/blog/2020/08/11/changing-world-changing-mozilla/)

They list their five areas of focus... and Firefox isn't mentioned by name
anywhere. For example:

"New focus on product. Mozilla must be a world-class, modern, _multi-product_
internet organization. That means diverse, representative, focused on people
outside of our walls, solving problems, _building new products_, engaging with
users and doing the magic of mixing tech with our values." (emphasis mine)

~~~
Krasnol
The second sentence of the article you linked is:

> Mozilla exists so the internet can help the world collectively meet the
> range of challenges a moment like this presents. Firefox is a part of this.

They follow up on this by saying that they need to go BEYOND the browser, but
that doesn't mean they want to drop it.

But well... you never know with that hollow tech-marketing talk...

~~~
macspoofing
>but it doesn't mean that they want to drop it.

I wasn't trying to suggest there was no mention of Firefox in their roadmap
blog post, though it is telling that the Firefox mention you pointed to is in
the context of an argument for developing non-Firefox products. Nor was I
arguing that they will drop Firefox. Even if they wanted to, they wouldn't be
able to, because Firefox is their sole revenue driver and the only thing that
gives them a seat at the table for developing open standards.

To me the telling part was when they listed their five areas of focus, and
Firefox was not one of them.

------
jjcon
I’m glad to see the healthy skepticism towards Mozilla lately, they have long
been a sacred cow in tech.

Just because a company's mission is positive or aligns with one's values does
not mean the company itself is good or is actually carrying that mission out.

~~~
porjo
Mozilla's senior leadership has also been caught up in the politically correct
mantra that pervades much of Silicon Valley, which IMHO is a huge distraction
from making great products. Case in point:
[https://mobile.twitter.com/newstephen/status/106048929137627...](https://mobile.twitter.com/newstephen/status/1060489291376275457)

~~~
franga2000
This has to take the cake as one of the stupidest "we won't use this word
because it's offensive or whatever" arguments out there.

Just because people call things that aren't a meritocracy a meritocracy does
not mean that's what it means. It's like saying we don't want to say
"democracy" anymore because China (PRC) calls itself a democracy and they
actually aren't. Or if we turn the whole thing around, saying that "feminism"
is offensive because TERFs use the term too.

Like, are there people who actually think like that? How do they
communicate??

~~~
mantap
The word was actually coined for the purposes of mocking the concept of a
'meritocracy' in a satirical essay. Then people picked it up and started using
it straight. Now it's come full circle again.

[https://en.m.wikipedia.org/wiki/Meritocracy#Etymology](https://en.m.wikipedia.org/wiki/Meritocracy#Etymology)

~~~
Khaine
Utopia also was created to mock people seeking to create a perfect world.
Nowadays people are trying to create a socialist utopia.

The meaning of words is constantly perverted.

------
jacquesm
Don't put a lawyer in charge of a tech company.

~~~
fxtentacle
"But I think most would also agree that if all corporate lawyers, bank
lobbyists, or marketing gurus were to similarly vanish in a puff of smoke, the
world would be at least a little bit more bearable."

I remember just yesterday there was a great article on HN that discussed
bullshit jobs and counted corporate lawyers among the "goons". The rationale
was that if nobody had them, nobody else would need them.

[https://www.businessinsider.com.au/bullshit-jobs-australia-2...](https://www.businessinsider.com.au/bullshit-jobs-australia-2018-7)

~~~
AlphaSite
Who enforces the rules? Contracts? Etc. I feel like this sounds great on the
surface, but is quite flawed.

~~~
zaptrem
Agreed. Lawyers (and by extension, the legal system) must exist to ensure
order and keep the system fair for as many people as possible. They don’t
always achieve that goal, but removing them entirely would make things 100x
worse. My favorite example is the ACLU.

------
suby
Semi-related, but does anyone know if Firefox Send is on the chopping block?
They took it down before the layoffs due to security concerns, and now I'm
worried it won't be coming back online.

Every alternative I've been able to find has been either poor quality or too
shady for me to ever actually want to use.

~~~
Mandatum
Not a priority right now; it's getting abused by malware authors. Confirmed
via Mozilla IRC.

You can set it up yourself really easily (got it running locally on a Mac in 5
mins); throw an authenticated gateway in front of it to keep bad actors out:
[https://github.com/mozilla/send](https://github.com/mozilla/send)

~~~
wavesquid
I thought Mozilla IRC was shut down?

~~~
Mandatum
Oops, meant Matrix.

------
HackedBunny
My quick test of DeepSpeech, as posted on Facebook back in April:

--

I just ran a speech-to-text converter on a very clear clip of former Doctor
Who actor Tom Baker talking in an interview.

The DeepSpeech converter uses the very latest AI deep-learning advancements to
'listen' to the audio and output the spoken words as text.

After 3 long minutes of running it on a 30-second clip, it printed out its
interpretation:

"hooloomooloo how booboorowie i have a honeymoon"

~~~
twoslide
It's designed for sentence-length audio (4-5 seconds); they suggest chunking
your audio first: [https://discourse.mozilla.org/t/longer-audio-files-with-deep...](https://discourse.mozilla.org/t/longer-audio-files-with-deep-speech/22784)
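A minimal sketch of that chunking step using only Python's standard `wave`
module. It assumes an uncompressed PCM WAV input (DeepSpeech's released models
expected 16 kHz mono 16-bit audio, so resample first if yours differs), and the
function name `split_wav` is hypothetical. It cuts at fixed intervals, which
can split a word in half; splitting on silence (voice activity detection) is
the more robust approach but is beyond this sketch.

```python
from __future__ import annotations

import io
import wave


def split_wav(data: bytes, chunk_seconds: float = 5.0) -> list[bytes]:
    """Split WAV bytes into consecutive chunks of at most chunk_seconds each.

    Fixed-length cutting (no silence detection). Each returned chunk is a
    complete WAV file with the same channels, sample width and rate as the input.
    """
    chunks = []
    with wave.open(io.BytesIO(data), "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(params.framerate * chunk_seconds)
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:  # end of audio
                break
            out = io.BytesIO()
            with wave.open(out, "wb") as dst:
                # Copy the audio format so each chunk is a valid standalone WAV.
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            chunks.append(out.getvalue())
    return chunks
```

Each chunk can then be transcribed separately and the resulting text
concatenated in order.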

~~~
magicalhippo
Also, there are essentially two parts to this: the neural net does
speech-to-characters, and then a language model converts the character
stream into words.

I found that the language model they supplied was trained on data that did not
contain the words I needed, and I got significantly better results after
building my own language model with the kenlm[1] tools.

[1]: [https://kheafield.com/code/kenlm/](https://kheafield.com/code/kenlm/)
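To illustrate why a domain-specific language model helps, here is a toy
word-bigram model with add-one smoothing used to rank candidate transcripts.
This is a deliberately simplified stand-in: KenLM builds n-gram models with
modified Kneser-Ney smoothing, and DeepSpeech combines the LM score with the
acoustic score inside its CTC beam search rather than ranking finished
sentences. The class `BigramLM`, the helper `best_candidate`, and the example
sentences are all hypothetical.

```python
import math
from collections import Counter


class BigramLM:
    """Toy word-bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.context_counts = Counter()  # how often each word precedes another
        self.bigrams = Counter()
        for s in sentences:
            words = ["<s>"] + s.lower().split() + ["</s>"]
            self.context_counts.update(words[:-1])
            self.bigrams.update(zip(words, words[1:]))
        vocab = {w for s in sentences for w in s.lower().split()} | {"</s>"}
        # Possible next tokens: known words, end-of-sentence, one unknown slot.
        self.num_next = len(vocab) + 1

    def log_prob(self, sentence):
        """Natural-log probability of the sentence under the smoothed model."""
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        total = 0.0
        for prev, cur in zip(words, words[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.context_counts[prev] + self.num_next
            total += math.log(num / den)
        return total


def best_candidate(lm, candidates):
    """Pick the candidate transcript the language model finds most likely."""
    return max(candidates, key=lm.log_prob)
```

A model trained on lecture text that contains "eigenvalue" assigns a sentence
using that word a higher probability than a model that has never seen it, which
is the toy analogue of building your own KenLM model from in-domain text.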

~~~
zaptrem
Would it be possible to swap this out for GPT2/BERT? Or is that a different
type of language model? Can the pre-trained language model be fine-tuned? I'm
using DeepSpeech to transcribe long-form lecture audio, and have just assumed
there would be a massive improvement once they noise-harden the models with
1.0.

~~~
nshm
GPT2 is not a good language model, but there are things like XLM. Mozilla
DeepSpeech doesn't support XLM rescoring; other toolkits do, and it gives a
great improvement in accuracy. If you care about accurate transcription, you'd
better consider alternatives.
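For readers unfamiliar with the term, here is a minimal sketch of generic
n-best rescoring: a first-pass decoder emits several candidate transcripts with
acoustic scores, and a stronger external language model re-ranks them. The
function `rescore_nbest`, the `lm_log_prob` callable (a stand-in for whatever
LM you plug in, such as a neural one) and the default weights `alpha`/`beta`
are hypothetical; this is not the API of DeepSpeech or any particular toolkit.

```python
from typing import Callable, List, Tuple


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    lm_log_prob: Callable[[str], float],
    alpha: float = 0.5,
    beta: float = 1.0,
) -> List[Tuple[str, float]]:
    """Re-rank (hypothesis, acoustic_log_score) pairs using an external LM.

    Combined score = acoustic + alpha * LM log-prob + beta * word count.
    The word-count bonus offsets the LM's preference for shorter hypotheses.
    Returns the pairs with combined scores, best first.
    """
    rescored = [
        (text, acoustic + alpha * lm_log_prob(text) + beta * len(text.split()))
        for text, acoustic in nbest
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```

In practice `alpha` and `beta` are tuned on a held-out set; with `alpha=0` the
LM is ignored and the acoustic ranking (plus the length bonus) wins.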

~~~
zaptrem
I didn't know any other ML-based open source transcription engines existed. I
can't seem to find them on Google.

------
dgellow
It's always sad to see people at risk of losing their jobs or the project
they are working on. That being said, this speech-to-text engine always seemed
to be more of a distraction than anything else. There is of course a lot of
pessimism toward Mozilla at the moment, and I share it for the most part, but
if the recent and upcoming changes result in more focused projects and
efforts, that wouldn't necessarily be a bad thing.

------
nshm
Not a big problem, given how many alternatives there are.

Some very active projects, for example:

* Kaldi ([https://github.com/kaldi-asr/kaldi/](https://github.com/kaldi-asr/kaldi/)), obviously: probably the most famous and most mature one. It covers standard hybrid NN-HMM models as well as their more recent lattice-free MMI (LF-MMI) models and training procedure. It is also heavily used in industry (not just research).

* ESPnet ([https://github.com/espnet/espnet](https://github.com/espnet/espnet)), for all kinds of end-to-end models, such as CTC, attention-based encoder-decoder (including Transformer), and transducer models.

* Espresso ([https://github.com/freewym/espresso](https://github.com/freewym/espresso)).

* Google Lingvo ([https://github.com/tensorflow/lingvo](https://github.com/tensorflow/lingvo)). This is the open-source release of Google's internal ASR system, and Google uses it in production (their internal version is not much different).

* NVIDIA OpenSeq2Seq ([https://github.com/NVIDIA/OpenSeq2Seq](https://github.com/NVIDIA/OpenSeq2Seq)).

* Facebook Fairseq ([https://github.com/pytorch/fairseq](https://github.com/pytorch/fairseq)). Attention-based encoder-decoder models mostly.

* Facebook wav2letter ([https://github.com/facebookresearch/wav2letter](https://github.com/facebookresearch/wav2letter)). ASG model/training.

* Vosk ([https://github.com/alphacep/vosk-api](https://github.com/alphacep/vosk-api)). Offline lightweight speech recognition API with support for 10 languages.

And there are many more.

------
zo1
Am I the only one trying to figure out why Mozilla even has a "speech-to-text"
team?

~~~
suby
The world needs a state-of-the-art, easy-to-integrate library for speech to
text so that we don't have to rely on sending all of our voice data to the big
tech giants. I imagine this was the motivation behind it.

That being said, I think the current open source options are actually okay and
constantly getting better. They're certainly a lot better today than they were
3 years ago.

~~~
hutzlibu
The thing is, good speech-to-text AI needs good training data. And lots of it.
With as few errors as possible.

They were doing that too, and built an easy-to-use website where you could
contribute, and people did. I can't imagine it was so expensive that they now
have to throw it under the bus.

~~~
suby
I agree with you, their Common Voice project is extremely important. From the
outside it seems like the engineering effort behind it is done; they just
need to keep the site running to keep collecting data. Actually, now that I
think about it, validating the data might be a lot of man-hours.

I very much hope they don't abandon it. The Common Voice project is arguably
more important than their speech-to-text engine. There are competing open
source speech-to-text engines, but there is no other project like Common Voice
that I am aware of.

~~~
hutzlibu
"validating the data might be a lot of man hours."

It totally is, which also makes it a very good fit for crowdsourcing. And they
have already set up a nice website with badges and other things that motivate
people to contribute. 15 minutes a day from a lot of people would mean lots of
validated data. And with user accounts you can also track the quality of each
user's contributions, etc.

------
joecool1029
Hmm, wasn't Firefox Voice supposed to eventually use the speech-to-text engine
that's now under threat, but currently uses Google's engine instead? I can't
imagine priorities like this came up in their recent negotiations.

------
est31
[https://news.ycombinator.com/item?id=24245683](https://news.ycombinator.com/item?id=24245683)

------
callamdelaney
This (text to speech) breaks PulseAudio on my machine every time I enable
reader view. Even if I'm not using text to speech or listening to anything
else, I get a whiny noise until I restart PulseAudio.

~~~
throwme45353464
The browser does have this effect on Linux systems with PulseAudio. You will
hear lots of static in your speakers, and after that the audio will be
distorted until you restart PulseAudio.

What I cannot confirm is whether this is related to the speech-to-text engine.
Reader mode most likely uses text to speech.

------
Tanath
Seems unlikely. The layoffs happened before they got a new deal:
[https://www.theverge.com/2020/8/15/21370020/mozilla-google-f...](https://www.theverge.com/2020/8/15/21370020/mozilla-google-firefox-search-engine-browser)

Now they can bring people back.

~~~
fooker
>Now they can bring people back.

Or they can increase executive compensation.

How far in advance do you think these deals are decided before public
announcement?

