In case it helps clarify what this is, I think the reason this is getting posted is that it was discussed earlier as a continuation of Mozilla's TTS work: https://news.ycombinator.com/item?id=26790281
Nothing to do with the tech, but I love your name! I'm just a tourist in Puerto Rico but I've been a number of times and it always warms my heart to hear the Coqui frogs at night. I even play the recordings at home sometimes if I'm having trouble falling asleep.
For anyone that hasn't heard of Coqui frogs before, they are pretty cool animals. Little guys, but a single one can be surprisingly loud and throw its voice pretty effectively. AFAIK they're only really found in Puerto Rico - apparently they can survive in other warm climates but will not sing? Maybe that's an urban legend though.
Anyway I know the sound is a little contentious (some hotels get cats to cut down on guest complaints about the Coqui) but I'd recommend checking it out: https://musicofnature.com/coqui-magic-nightscapes/
Oh, god. I took a vacation to Hawaii a few years ago. The damn Coqui frogs would never shut up. You lost sleep because you couldn’t shut them out. People have died running off the road while driving, from lack of sleep caused by Coqui frogs.
Please kill all the Coqui frogs in the world. Or, at least imprison them all on an island that has no human habitation.
Sure, it’s cute in small quantities, but if that’s your soundtrack 24x7 at 70-80 decibels or more, non-stop, you’d probably want to commit suicide just to get out of there. That’s the kind of place that Coqui frogs drive you to.
Kind of an Americentric perspective and also doesn't give people the benefit of the doubt. I'm an American who only speaks English but I think it's an awesome and interesting name compared to most of the lame product names out there.
I agree with you, but also Puerto Rico is technically in the US. I think anyone who has vacationed there will recognize the word (especially with the frog logo), since the frogs are pretty hard to miss and there is a ton of Coqui memorabilia sold at the touristy shops. Puerto Rico is a super common vacation spot for those on the East Coast, so unless I'm totally overrating the memorability of these frogs, I'd guess plenty of non-Spanish-speaking Americans will know how to pronounce it.
I agree, but also it's an onomatopoeia; the sound the frogs make actually is "Co-Kee". IIRC the indigenous people of Puerto Rico are the ones who named them a long time ago, though the spelling might have been influenced by the Spanish. Point being that you definitely do not need to speak a non-English language to know how to say the word; you could also just have visited Puerto Rico.
It will definitely be mispronounced if they misspell it without the accent (Coquí).
I understand they don't want non-ASCII in the GitHub project name that turns into a directory name, but the one-line description contains an emoji, so I would have thought they could have allowed themselves a Latin-1 character there, for the benefit of people who know some Spanish but haven't heard of the frogs.
The quality is also miles better than the last open source TTS I've heard that wasn't just an Amazon SDK. I'll take newscaster-voice over robot-from-the-early-00's.
Usually conversation is less crisp. That sounds great, but more like an audiobook or NPR. It's probably going to be hard to sound conversational with only one voice speaking though.
Is there any information about the training process? Which data was used, which license was that data under, and which tools, drivers, and hardware were used for the training?
Basically I'm wondering if these projects count as libre machine learning projects according to the Debian Deep Learning Team's Machine Learning Policy.
They do; the issue is with Tensorflow support, IIRC, and with NVIDIA drivers.
So for the English model they use mostly free/open-source data, but some non-free data:
- train_files: Fisher, LibriSpeech, Switchboard, Common Voice English, and approximately 1700 hours of transcribed WAMU (NPR) radio shows explicitly licensed to use as training corpora. (from https://github.com/coqui-ai/STT/releases/tag/v0.9.3)
As for the hardware for training... it's basically NVIDIA (you need Tensorflow / CUDA and all that guff). For inference it runs in realtime on a CPU.
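To make the CPU-inference part concrete, here's a minimal sketch assuming the Coqui STT Python bindings keep the DeepSpeech-era API (pip install stt); the model, scorer, and WAV file names are placeholders, not actual release artifacts:

    # Minimal sketch: transcribe a 16 kHz mono 16-bit WAV on the CPU.
    # Assumes `pip install stt numpy`; file names are placeholders.
    import wave
    import numpy as np
    from stt import Model

    model = Model("english-model.tflite")          # acoustic model
    model.enableExternalScorer("english.scorer")   # optional language-model scorer

    with wave.open("recording.wav", "rb") as wav:
        audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

    print(model.stt(audio))

That path is what the "realtime on a CPU" claim refers to; the GPU stack is only needed for training.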
There is plenty of free/open-source voice data out there; it's just a question of reaching for a sick bag and installing NVIDIA's stuff.
I don't know if they'd count as free/open-source according to Debian (I'm a Debian user myself), but the team has definitely talked about getting into Debian and would be very open to discussions about it.
Check out the ML policy; it sounds like it would currently be classified as a "toxic candy" model due to the non-free data, though you could re-train on only free data to avoid that.
Using CUDA also means it wouldn't be considered legitimately free enough, although folks are working on getting AMD ROCm into Debian.
Tensorflow isn't yet in Debian, but there may be folks working on it.
Another problem is that Debian doesn't have the hardware for doing training.
I'd encourage you to talk to the folks on the debian-ai mailing list and IRC channel to discuss these and other issues.
AFAIK, using copyrighted data for training does not necessarily make the trained model "toxic". The "Authors Guild, Inc. v. Google, Inc." case [1] is viewed as a key precedent here.
The phrase is "toxic candy", not "toxic"; see the policy for what it means.
Most data is protected by copyright, but I assume you meant proprietary rather than copyrighted. Using proprietary data might not matter under copyright law, but it does matter in terms of the Debian machine learning policy and DFSG, because the non-free data cannot be shipped in Debian main and thus cannot be used to train a model shipped in main.
Yeah, ROCm is a bit of a mess. I actually have an AMD GPU in a server, but the drivers in the mainline kernel don't work properly, so I've never been able to use it.
If I were into conspiracy theories, I'd say that AMD's failure to compete in the GPU/DL space has to do with the relationship between the AMD CEO and the NVIDIA one.
Tensorflow is just awful, as is anything that touches bazel :)
I wasted a week trying to replace the scorer component with an NN-based language model. Every time I made a change, the whole codebase, including Tensorflow, recompiled, so the turnaround time was about an hour per change. It was awful. I mean, I get reproducible builds etc., and if you're running stuff at Google scale it probably has all kinds of useful features. But for development on a personal laptop it was torture. Eventually I gave up.
Fwiw, that sounds like a bug or a misconfiguration; it's absolutely supposed to have better caching behavior than that (and does in the few projects I've used it on, even on a personal laptop). If you're interested in pursuing it further (I'd understand if you aren't; that sounds frustrating), I bet the bazel team would be interested in your report.
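For anyone who hits the same thing, one workaround worth trying (a sketch, not a confirmed fix for that particular repo) is pointing bazel at a persistent disk cache in ~/.bazelrc, so build outputs survive across invocations and a hiccup doesn't force a full Tensorflow rebuild:

    # ~/.bazelrc - reuse build outputs across invocations
    # (the cache path is a placeholder; pick any directory with a few GB free)
    build --disk_cache=/path/to/bazel-disk-cache

Note that changing --config options, compiler versions, or copts between builds still invalidates most of the cache, so it's also worth checking that nothing in the build wrapper is flipping flags on every run.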
You don't need _no_ errors; you just need low errors.
Aside from Common Voice, there are a lot of resources at OpenSLR. Also, the amount of data you need is often vastly overestimated, given advances in pretraining and transfer learning and the fact that most languages don't have as terrible an orthography as English.
Yeah, they really dumped them in it. Fortunately the devs are keeping it all going at coqui.ai and are really supportive of any community that got abandoned by Moz.