[flagged] Kokoro TTS – A lightweight (82M params) text-to-speech model (kokorotts.online)
29 points by jallenjia 12 hours ago | 19 comments





From the FAQ:

> Can I use Kokoro TTS offline?

> Kokoro TTS is a cloud-based service that requires an internet connection to access our advanced text to speech technology. This ensures you always have access to the latest improvements and don't need to worry about local hardware requirements or model installations.

I would happily take on the worry of running it offline myself rather than have them worry on my behalf.


What's the point of promoting a model as "lightweight", or even mentioning the parameter count, if I can't run it locally? I don't give a toss how much pressure your remote hardware is under, and promoting a cloud service as small and lightweight only makes me think it's going to be cheap and crappy.

This looks like a fake website. The creator of the website is claiming credit for the model, which does not appear to be created by him. The original model can be found here, along with the source code: https://huggingface.co/hexgrad/Kokoro-82M

Every popular machine learning paper has a fake website associated with it, for some reason. Can anyone figure out why? Another example, someone created this website https://imagen3.org, which is NOT Imagen3 by Google. However, it currently ranks #2 for the model name.


This seems to be a general pattern emerging. Cynical opportunists are wrapping hf endpoints/embeds in dodgy SaaS offerings. A similar one is BetterDictation, which tbf I do use. But I still hate that people are profiting off open-spirited ML engineers and HF's goodwill.

Notice in this case that each testimonial avatar links to an image asset with a different name than the purported person's name. Notice additionally the user in the thread who's pushing this 'product'; their post history makes it obvious they're an LLM slopBot...
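
For anyone who wants to repeat the avatar check on a similar page, here is a minimal sketch of the idea, assuming you have already collected (display name, avatar URL) pairs by hand. The function names and the sample data are hypothetical and made up for illustration; nothing here is taken from the site itself, and a mismatch is only a hint, not proof.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse


def name_tokens(text):
    """Return the lowercase alphabetic tokens in text, e.g. 'Sarah Chen' -> {'sarah', 'chen'}."""
    return set(re.findall(r"[a-z]+", text.lower()))


def avatar_matches_name(display_name, avatar_url):
    """True if the avatar's file name shares at least one token with the display name."""
    stem = PurePosixPath(urlparse(avatar_url).path).stem
    return bool(name_tokens(display_name) & name_tokens(stem))


# Hypothetical testimonials, invented for this example.
testimonials = [
    ("Sarah Chen", "https://example.com/avatars/sarah-chen.png"),
    ("David Miller", "https://example.com/avatars/emma_wilson.jpg"),
]

# Collect the names whose avatar file name has nothing in common with them.
mismatched = [name for name, url in testimonials if not avatar_matches_name(name, url)]
print(mismatched)
```

Generic asset names such as avatar1.png would also be reported as mismatches, so this only narrows down which testimonials are worth looking at by eye.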


You can buy SaaS kits that include a frontend with pricing pages, backend and all code necessary to wrap any API and resell at a profit.

Why? Some people are so convinced they won't make it if they follow the rules and ethical principles that they try to do without them.

> You can find a hosted demo at hf.co/spaces/hexgrad/Kokoro-TTS.

And in the FAQ:

> What's included in the Kokoro TTS free trial?

> New users can try Kokoro TTS's full capabilities with our free trial. This allows you to experience our professional-grade text to speech technology firsthand, including access to all voices and both American and British English options.

So this is the "free trial"? And with the site describing itself as a cloud-based service, I don't understand what is actually being offered.


The company is apparently based in Singapore.

From the privacy policy:

> We collect certain personal data, including but not limited to your name, email address, and payment information (if applicable) to enhance the Service and improve user experience.

It's the first time I've seen collecting payment info to improve user experience.


https://kokorotts.org/ is the proper site.

No, that one also appears to be fake.

I just used it with https://github.com/santinic/audiblez/pull/14/files (linking the PR because it has GPU acceleration).

It is very fast and very passable.


I'm excited to share Kokoro TTS, an open-source text-to-speech model we've been working on. Despite its relatively small size (82M parameters), it achieves impressive results in natural speech synthesis, ranking first in the TTS Spaces Arena benchmark.

The model is Apache 2.0 licensed and trained on less than 100 hours of audio data. It supports both American and British English, offering multiple voice options with natural emotional expression and 24kHz audio output.

We've deployed a demo at kokorotts.online where you can try it out. I'd really appreciate any feedback from the HN community on both the model's performance and potential applications.

Tech stack: StyleTTS 2 architecture, ONNX runtime, Next.js for the web interface.


It's NOT Open Source.

Confusing messaging. A previous version is at https://huggingface.co/hexgrad/Kokoro-82M (it matches the demo if you use the "TTS v0.19" tab; it has some artefacts in the voice[1] and definitely doesn't sound as good as the latest version).

"There currently isn't a release date scheduled for the other voices"

[1]: https://huggingface.co/blog/hexgrad/kokoro-short-burst-upgra...


And it's not offline.

In which sense? https://huggingface.co/hexgrad/Kokoro-82M

- Apache 2.0 weights in this repository

- MIT inference code in spaces/hexgrad/Kokoro-TTS adapted from yl4579/StyleTTS2

- GPLv3 dependency in espeak-ng


That's not the model repository advertised in the post.

The website is not from the authors. It seems fraudulent.

HF: https://huggingface.co/spaces/hexgrad/Kokoro-TTS


Where is the code?


