Coqui, a startup providing open speech tech for everyone (github.com/coqui-ai)
174 points by doener on April 14, 2021 | 59 comments



In case it helps clarify what this is, I think the reason this is getting posted is because it was discussed earlier as a continuation of Mozilla's TTS work: https://news.ycombinator.com/item?id=26790281


Nothing to do with the tech, but I love your name! I'm just a tourist in Puerto Rico but I've been a number of times and it always warms my heart to hear the Coqui frogs at night. I even play the recordings at home sometimes if I'm having trouble falling asleep.

For anyone that hasn't heard of Coqui frogs before, they are pretty cool animals. Little guys, but a single one can be surprisingly loud and throw its voice pretty effectively. AFAIK they're only really found in Puerto Rico - apparently they can survive in other warm climates but will not sing? Maybe that's an urban legend though.

Anyway I know the sound is a little contentious (some hotels get cats to cut down on guest complaints about the Coqui) but I'd recommend checking it out: https://musicofnature.com/coqui-magic-nightscapes/


Oh, god. I took a vacation to Hawaii a few years ago. The damn Coqui frogs would never shut up. You lost sleep, because you couldn’t shut them out. People have died because they ran off the road while driving, due to lack of sleep caused by Coqui frogs.

Please kill all the Coqui frogs in the world. Or, at least imprison them all on an island that has no human habitation.

Sure, it’s cute in small quantities, but if that’s your soundtrack 24x7 at 70-80 decibels or more, non-stop, you’d probably want to commit suicide just to get out of there. That’s the kind of place that Coqui frogs drive you to.


I live in Adjuntas and no amount of cats would eat this many frogs - fortunately!

They tend to quiet down during dry spells, so our rain last week has made things sound much nicer.


Unfortunately, it is a bad name because, right or wrong, it will be easily mispronounced. Cocky AI? Who wants that?


Kind of an Americentric perspective and also doesn't give people the benefit of the doubt. I'm an American who only speaks English but I think it's an awesome and interesting name compared to most of the lame product names out there.


I agree with you, but also Puerto Rico is technically in the US. I think anyone who has vacationed there will recognize the word (especially with the frog logo), since the frogs are pretty hard to miss and there is a ton of Coqui memorabilia sold at the touristy shops. Puerto Rico is a super common vacation spot for those on the East coast, so unless I'm totally overrating the memorability of these frogs, I'd guess plenty of non-Spanish-speaking Americans will know how to pronounce it.


The computing industry could stand to be less sensitive and accept that languages other than English exist.


I agree, but also it's an onomatopoeia, the sound the frogs make actually is "Co-Kee". IIRC the indigenous people in Puerto Rico are the ones that named them a long time ago, but the spelling might have been influenced by the Spanish. Point being that you definitely do not need to speak a non-English language to know how to say the word, you could also just have visited Puerto Rico.


It will definitely be mispronounced if they misspell it without the accent (Coquí).

I understand they don't want non-ASCII in the GitHub project name, which turns into a directory name, but the one-line description contains an emoji, so I would have thought they could have allowed themselves a Latin-1 character there, for the benefit of people who know some Spanish but hadn't heard of the frogs.


I'm glad to see this. I hope they can get the TTS to sound more conversational and less like a newscaster...that being said...free is nice.


The quality is also miles better than the last open source TTS I've heard that wasn't just an Amazon SDK. I'll take newscaster-voice over robot-from-the-early-00's.


The sample links are impressive to me. I don't follow the space closely but they sound conversational.

https://soundcloud.com/user-565970875/pocket-article-wavernn...


Usually conversation is less crisp. That sounds great, but more like an audiobook or NPR. It's probably going to be hard to sound conversational with only one voice speaking though.


Yes, the woman's voice definitely has that NPR way of finishing sentences.


Some samples of the TTS voice are here[1]

[1] https://erogol.github.io/ddc-samples/


Is there any information about the training process? Which data was used, which license was that data under and which tools, drivers and hardware was used for the training?

Basically I'm wondering if these projects count as libre machine learning projects according to the Debian Deep Learning Team's Machine Learning Policy.

https://salsa.debian.org/deeplearning-team/ml-policy


They do; the issue, IIRC, is with Tensorflow support and with NVIDIA drivers.

So for the English model they use mostly free/open-source data, but some non-free data:

- train_files: Fisher, LibriSpeech, Switchboard, Common Voice English, and approximately 1700 hours of transcribed WAMU (NPR) radio shows explicitly licensed to use as training corpora. (from https://github.com/coqui-ai/STT/releases/tag/v0.9.3)

As for the hardware for training... it's basically NVIDIA (you need Tensorflow / CUDA and all that guff). For inference it works realtime on a CPU.

I'm preparing pretrained models for their STT based on Common Voice data here: https://tepozcatl.omnilingo.cc/manifest.html

There is plenty of free/open-source voice data out there; it's just a question of reaching for a sick bag and installing NVIDIA's stuff.

I don't know if they'd count as free/open-source according to Debian (I'm a Debian user myself), but the team has definitely talked about getting into Debian and would be very open to discussions about it.


Check out the ML policy, it sounds like it would currently be classified as a "toxic candy" model due to the non-free data, but it sounds like you could re-train to avoid that.

Using CUDA also means it wouldn't be considered legitimately free enough, although folks are working on getting AMD ROCm into Debian.

Tensorflow isn't yet in Debian, but there may be folks working on it.

Another problem is that Debian doesn't have the hardware for doing training.

I'd encourage you to talk to the folks on the debian-ai mailing list and IRC channel to discuss these and other issues.

https://lists.debian.org/debian-ai/ ircs://irc.oftc.net/debian-ai


AFAIK, using copyrighted data to train does not necessarily make the trained model "toxic". "Authors Guild, Inc. v. Google, Inc." case [1] is viewed as a key precedent for this view.

[1] https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....


The phrase is "toxic candy" not "toxic", see the policy for what it means.

Most data is protected by copyright, but I assume you meant proprietary rather than copyrighted. Using proprietary data might not matter under copyright law, but it does matter in terms of the Debian machine learning policy and DFSG, because the non-free data cannot be shipped in Debian main and thus cannot be used to train a model shipped in main.


Hmm, that case doesn't appear to be about ML though, could you explain how it is considered a precedent for ML?



Thanks. It's interesting that this only applies to countries with the concept of fair use, which unfortunately isn't widespread.


Yeah, ROCm is a bit of a mess. I actually have an AMD GPU in a server, but the drivers in the mainline kernel don't work properly, so I have never been able to use it.

If I were into conspiracy theories, I'd say that AMD's failure to compete in the GPU/DL space has to do with the relationship between the AMD CEO and the NVIDIA one.

Tensorflow is just awful, as is anything that touches bazel :)


> Tensorflow is just awful, as is anything that touches bazel :)

Mind if I ask why?


I wasted a week trying to replace the scorer component with a NN-based language model. Every time I made a change, the whole codebase, including Tensorflow, recompiled, so the turnaround time was about an hour per change. It was awful. I mean, I get reproducible builds etc., and if you're running stuff at Google scale it probably has all kinds of useful features. But for development on a personal laptop it was torture. Eventually I gave up.


Got it, thank you.

Fwiw, that sounds like a bug or a misconfiguration; it's absolutely supposed to have better caching behavior than that (and does in the few projects I've used it on, even on a personal laptop). If you're interested in pursuing it further (I'd understand if you aren't; that sounds frustrating), I bet the bazel team would be interested in your report.
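For what it's worth, a persistent disk cache is the usual first thing to try for rebuild pain like that; a hedged `.bazelrc` sketch (the cache paths are arbitrary placeholders):

```
# Hypothetical ~/.bazelrc tweaks for faster incremental rebuilds on a laptop
build --disk_cache=~/.cache/bazel-disk   # reuse action outputs across builds
build --repository_cache=~/.cache/bazel-repo  # avoid re-fetching external deps
```

With a warm disk cache, touching one source file shouldn't force Tensorflow itself to rebuild.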


"There is plenty of free/open-source voice data out there"

No doubt about that, but you need validated, transcribed voice data (no errors), and this is harder to get.


You don't need _no_ errors, you just need low errors.

Aside from Common Voice, there are also a lot of resources at OpenSLR. Also, the amount of data you need is often vastly overestimated, given advances in pretraining and transfer learning and the fact that most languages don't have as terrible an orthography as English.


Just a note: the colorful frog pictured on the website (https://coqui.ai/) is not a coqui. Coquis are usually brown and tiny. https://upload.wikimedia.org/wikipedia/commons/6/62/Coqui_Fr...


I think it’s like https://www.descript.com/


Descript doesn't look like it's open source though.


Exactly, that's why this is exciting!


Also like https://www.resemble.ai. I remember using that one a while ago and thinking this should all be open source.


It would be awesome if this could be integrated into GNOME, KDE and the other open desktops as an accessibility feature.


Looks like a fork of Mozilla DeepSpeech by former DeepSpeech developers. What is the relation to the original project?


tl;dr:

- Mozilla fired the developers and mothballed the project

- But wants to keep it around as a museum piece

All ongoing development is happening in the fork.


I was very sad when that happened. There were a lot of language communities organizing their efforts around that project too.


Yeah, they really dumped them in it, fortunately the devs are keeping it all going at coqui.ai and are really supportive of any community that got abandoned by Moz.


any idea how they are financed?


Looks like after NVIDIA's $1.5M grant, the devs came back ;)


How do I actually use this to turn my speech into text? It seems some docs are 404ing.

edit: I found some transcription code:

https://github.com/WebThingsIO/voice-addon

It's DeepSpeech-based, but close enough to be workable.


There is a lot of example code here: https://github.com/coqui-ai/STT-examples

If you have any more specific requirements then we can point you in the right direction. Or just join us on Matrix: https://app.element.io/#/room/#coqui-ai_STT:gitter.im :)
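A hedged sketch of the basic offline usage with the Python package (`pip install stt`): the model and scorer paths below are placeholders for files from the release page, and the exact API may vary by version.

```python
# Hedged sketch: transcribe one WAV with Coqui STT's Python bindings.
# The audio-loading helper is pure stdlib; the model/scorer paths in
# transcribe() are placeholders, not real files.
import array
import wave

def read_wav_as_int16(path):
    """Read a 16-bit mono WAV; return (samples, sample_rate)."""
    with wave.open(path, "rb") as w:
        assert w.getsampwidth() == 2 and w.getnchannels() == 1
        samples = array.array("h")
        samples.frombytes(w.readframes(w.getnframes()))
        return samples, w.getframerate()

def transcribe(wav_path, model_path="model.tflite", scorer_path=None):
    """Load a Coqui STT model and transcribe one file (paths are placeholders)."""
    from stt import Model  # imported lazily; requires `pip install stt`
    model = Model(model_path)
    if scorer_path:
        model.enableExternalScorer(scorer_path)
    audio, rate = read_wav_as_int16(wav_path)
    # The released models expect 16 kHz mono; resample first if rate differs.
    return model.stt(audio)
```

Note that some versions of the bindings want a numpy int16 array rather than an `array.array`; converting with `np.frombuffer(samples, dtype=np.int16)` should be a safe fallback.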


I want to yell at some webpage that is reading my mobile's mic and have it become text in a <textarea>.

And I don't want the company that must not be named to know what I said.


You might want something like LocalSTT if it's on mobile: https://github.com/ccoreilly/LocalSTT

Otherwise this code does streaming on a websocket: https://github.com/coqui-ai/STT-examples/tree/r0.9/web_micro...


yep, just found those

I'm looking forward to bothering you on the Matrix ;)


This seems neat, but I wish they had more examples of how to use it as a library. Most of their tutorials seem focused on training new models


There is a repository with STT examples[1]. Is what you're looking for there?

[1] https://github.com/coqui-ai/STT-examples


If I'm parsing the acronyms right, that's speech-to-text and I was hoping to try out their Text-to-Speech


There is another article about the TTS: https://news.ycombinator.com/item?id=26790951
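If you want to poke at the TTS side locally, here is a hedged sketch using the `Synthesizer` class from that repo (`pip install TTS`); the checkpoint and config paths are placeholders for files from their released models, and the constructor signature may differ between versions.

```python
# Hedged sketch: synthesize text to a WAV with Coqui TTS's Synthesizer.
# write_wav() is pure stdlib; the paths in speak() are placeholders.
import array
import wave

def write_wav(samples, path, rate=22050):
    """Write float samples in [-1, 1] to a 16-bit mono WAV file."""
    ints = array.array(
        "h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples)
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(ints.tobytes())

def speak(text, out_path, tts_ckpt="tts_model.pth.tar", tts_cfg="config.json"):
    """Synthesize `text` and save it as a WAV (paths are placeholders)."""
    from TTS.utils.synthesizer import Synthesizer  # heavy import, done lazily
    synth = Synthesizer(tts_ckpt, tts_cfg)
    samples = synth.tts(text)  # typically a list/array of float samples
    write_wav(samples, out_path, rate=getattr(synth, "output_sample_rate", 22050))
```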


How does this startup plan to make money? Services?


Wow, that soundcloud sample is amazing. I wonder how long it took to produce it or if it could be produced in real time?


IIRC it is realtime, but check out the Matrix channel: https://app.element.io/#/room/#coqui-ai_TTS:gitter.im


What is open speech tech? Clicked twice (GitHub, website's homepage) and didn't really get much out of it.


They have text-to-speech software and speech-to-text software, both of which are open source.


Basically, they're building the tools to collect data, and models to let anyone implement voice interfaces in their system without needing to use a closed API.


I'm not sure but in any case they have a great collection of papers and talks in that repository.


Cocky? Might want to consider a rename before you run into the Coq situation.



