More

BrunoJo · 2025-08-13T19:22:52 1755112972

Subtitle Edit is great if you have the hardware to run it. If you don't have GPUs available or don't want to manage the servers I built a simple to use and affordable API that you can use: https://lemonfox.ai/

BrunoJo · 2025-06-26T05:47:55 1750916875

Let me know if you are interested in a more reliable transcription API. I'm building Lemonfox.ai and we've optimized our transcription API to be highly available and very fast for large files. Happy to give you a discount (email: bruno at lemonfox.ai)

satvikpendem · 2025-06-27T02:22:58 1750990978

Can it do real-time transcription with diarization?

BrunoJo · 2025-06-26T05:38:37 1750916317

If you look for a cheaper transcription API you could als use https://Lemonfox.ai. We've optimized the API for long audio files and are much faster and cheaper than OpenAI.

BrunoJo · 2025-04-08T04:10:46 1744085446

Another great source for trades made by US politicians is https://stockcircle.com/congress-stock-trades. You can see their whole portfolio and also follow other investors like Warren Buffett to see what they are investing in.

BrunoJo · 2025-04-05T06:24:13 1743834253

yes, or if you are looking for a cheap Kokoro TTS API I would recommend https://www.lemonfox.ai/text-to-speech-api

BrunoJo · on Jan 23, 2025

I wouldn't use Edge TTS for commercial projects since it's using an internal Microsoft API that was reverse engineered.

If you are looking for a commercial API, I just launched a TTS API powered by the the best performing open source model Kokoro: https://www.lemonfox.ai/text-to-speech-api. The API is compatible with OpenAI and ElevenLabs and up to 25x cheaper.

rany_ · on Jan 23, 2025

It's worth noting that there have been occasions where the library was blocked and it took a few weeks to workaround said block. For example, when a valid Sec-MS-Token became required, it took a while to implement it in the library: https://github.com/rany2/edge-tts/blob/08b10b931db3f788a506c...

Basically, it's a very bad idea to use this library for anything serious/mission critical. It also is really limited to only taking in text (i.e., no custom SSML, emotion elements, etc) as Microsoft restricts the API to only the features Microsoft Edge itself already supports. Generally commercial users would want these more advanced features and so they'd want to use Azure Cognitive Services.

At any rate this library was never really marketed, I'm not sure how it blew up. It was really only intended so that I can have audio files I can play back for my Home Assistant instance. Later, I started using it to generate e-books. In general, these are the two main uses of the library AFAIK.

ghxst · on Jan 23, 2025

> no custom SSML

I believe this used to be available for edge tts, very sad to see they removed it.

If anyone knows of comparable projects that implement something like SSML please do share.

rany_ · on Jan 23, 2025

While technically the library could continue supporting custom SSML, I ended up removing it because keeping the support was pointless. The API stopped allowing anything other than the basic tags used by Microsoft Edge itself (i.e., prosody for rate/volume/pitch, voice, etc).

As for comparable projects, you can use Azure's offerings instead. They have a free tier that's really generous.

ghxst · on Jan 24, 2025

Fair point! I should have clarified I'm disappointed that Microsoft removed the capability from the api.

qqqult · on Jan 23, 2025

or run a kokoro tts docker container on your own hardware, the hw requirements aren't crazy: https://github.com/remsky/Kokoro-FastAPI

ipsum2 · on Jan 23, 2025

Better yet, you can run it on your browser, in Javascript and not pay any fees! https://huggingface.co/spaces/webml-community/kokoro-web

bilater · on Jan 23, 2025

Nice I was thinking about launching an API because providers like Replicate have long queues. I think if you can nail down latency and concurrency you may get a lot of users who need reliable fast TTS.

dqv · on Jan 23, 2025

Ah, I'm always looking for new ones, but it doesn't look like it supports SSML. Most engines have trouble with things like postal codes, names, and other implicit linguistic rules. Take the example

> Melania Trump's zip code is 20001.

It says "Melaynia Trump's zip code is twenty-thousand one". With SSML, you can tell the engine the correct pronunciation and to say a string of numbers digit-by-digit. Spelling proper nouns differently to trick it into pronouncing it correctly works until it doesn't.

Being able to tell it to pronounce "Melania" like [ˌməˈlɑːn.jə] or [%m@"lA:n.j@] and tweak other aspects of the synthesis with SSML is, in my opinion, an important part of a commercial speech synthesis offering.

I wonder how much effort is needed to make these engines work with SSML. Kokoro+SSML would be awesome.

bsenftner · on Jan 23, 2025

Hey BrunoJo, I'd like to learn more about lemonfox.ai, but there does not seem to be information such as "about us" links. Your service looks worth investigating.

laurentlb · on Jan 23, 2025

Interesting, I'm interested in something like this, but the page doesn't have much information. - What languages are supported? - How many voices are available? - Is it possible to use without a monthly subscription? I'd rather pay only based on my usage (I don't use it every month).

For my use case, I'd need access to a wide variety of languages, and ideally 5+ voices per language. I'm currently using Amazon Polly, but I wonder if there's something better now.

hobo_mark · on Jan 23, 2025

I wish Kokoro supported SSML... Is there a way to explicitly emphasize parts of the text?

BrunoJo · on Oct 25, 2024

https://Lemonfox.ai is another alternative to OpenAI's Whisper API if you need support for word-level timestamps and diarization.

BrunoJo · on Aug 9, 2024

You can try https://www.transcripo.com/ for free

BrunoJo · on July 3, 2024

Good tips, especially the point about narrowing the scope. At https://Lemonfox.ai we started with a LLM, image and speech-to-text API. Now we are only focusing on the speech-to-text API as the other areas are already very crowded and there's a lack of innovation in the speech-to-text space.

cootsnuck · on July 3, 2024

> Now we are only focusing on the speech-to-text API as the other areas are already very crowded and there's a lack of innovation in the speech-to-text space.

I'm legitimately wondering how your hosted Whisper API for $0.17/hr is supposed to compete with groq's exact same API that costs $0.03/hr.

You may be about to find out how crowded all of the AI infra spaces are.

I strongly recommend narrowing your scope far beyond modality. If you've been working with this tech and getting familiar with it then you already have valuable expertise. Pivot now or panic later. If you want to stay in the speech space find what markets are being underserved with speech AI related solutions. Are there pain points there that can be solved by a STT API? If so, build those solutions. You can't compete at the infra layer and I'm not sure why you would want to try if you don't already have something unique about your offering beyond hosting open source models. It's never good if your competition is potentially just a single developer in a company standing up your entire service internally in a week.

If you are determined to stay in the AI infra space then you'll need to be tackling a hard problem that companies want solved. Maybe take a look at fine-tuning models. Hard problem and maybe there's a hunger for it. (It's a risky one to tackle too though since it's very possible general/foundational models will maintain a grip on "good enough".)

anonylizard · on July 3, 2024

Like how do you plan on competing against multimodals, which keep getting cheaper and clearly can do audio->text? Or existing incumbents like Deepgram? Or just the generic APIs provided by the big clouds.

BrunoJo · on Nov 28, 2023

You may want to try https://lemonfox.ai/ as a OpenAI API alternative. I think relying on open-source models is a great alternative.

Filligree · on Nov 28, 2023

The solution to 'GPT-4 sometimes breaks' isn't to use something that never works...

jjcon · on Nov 28, 2023

Have you never used the open source models? They are getting really good - better than 3.5 for sure not as good as 4 except when domain trained in my opinion

nuz · on Nov 28, 2023

Finetuned local models can work just as well or better than gpt-4 in many use cases