Show HN: Bolna – build and ship enterprise grade voice AI in minutes (github.com/bolna-ai)
5 points by xan_ps007 29 days ago | hide | past | favorite | 2 comments
Hi Hacker News! This is Maitreya, Marmik and Prateek, co-founders of Bolna (https://github.com/bolna-ai/bolna). With Bolna, developers can create end-to-end conversational voice agents, connecting their own custom LLMs, their own telephony, their own models, etc., to build application features that require voice AI.

Here’s a small video: https://github.com/bolna-ai/bolna/assets/1313096/2237f64f-1c....

Our product originated from building an AI interviewer bot for practising coding interviews like Leetcode. While building it, we realized that the orchestration layer is a real pain point, and hence something everyone might want to use and build upon for any voice AI feature. The heavy usage we saw quickly validated that hypothesis.

We started off as a completely open source solution, but quickly realized that to expedite adoption and usage we would need to create APIs on top of it, so developers find it very easy to use. Hence, our package is wrapped in an API layer which can be extended. Our primary focus has been reliability: low latency and successful conversations.

Our GitHub repo (https://github.com/bolna-ai/bolna) has maintained a roughly 30% fork-to-star ratio, which is pretty massive from our point of view. We've seen developers fork us to build their own layers, their own telephony components, their own custom LLMs and models, etc.

Later on, we also released hosted APIs for developers who want a managed solution (https://docs.bolna.dev/introduction) and built a no-code/low-code playground for trying it out (https://playground.bolna.dev), including a "chat" option to tune and test prompts, since we realized the prompt is a critical part of making conversations succeed.

We recently released a complete open source end-to-end stack combining Whisper + Llama + MeloTTS (https://github.com/bolna-ai/bolna/tree/master/examples/whisp...) and have published Dockerized versions of it (still a WIP) on Docker Hub.
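To make the orchestration idea concrete, here's a minimal sketch (not Bolna's actual code; all function names are hypothetical stand-ins for the Whisper/Llama/MeloTTS components) of wiring ASR → LLM → TTS as concurrent stages connected by queues, so transcription, generation, and synthesis can overlap instead of running strictly in sequence:

```python
import asyncio

# Each stage reads from an input queue and writes to an output queue.
# A None sentinel signals end-of-stream to the next stage.

async def asr_stage(audio_chunks, text_q):
    for chunk in audio_chunks:            # stand-in for Whisper transcription
        await text_q.put(f"transcript({chunk})")
    await text_q.put(None)

async def llm_stage(text_q, reply_q):
    while (utterance := await text_q.get()) is not None:
        await reply_q.put(f"reply({utterance})")  # stand-in for Llama generation
    await reply_q.put(None)

async def tts_stage(reply_q, spoken):
    while (reply := await reply_q.get()) is not None:
        spoken.append(f"audio({reply})")  # stand-in for MeloTTS synthesis

async def run_pipeline(audio_chunks):
    text_q, reply_q, spoken = asyncio.Queue(), asyncio.Queue(), []
    # All three stages run concurrently; the queues decouple their pacing.
    await asyncio.gather(
        asr_stage(audio_chunks, text_q),
        llm_stage(text_q, reply_q),
        tts_stage(reply_q, spoken),
    )
    return spoken

print(asyncio.run(run_pipeline(["hello"])))
# → ['audio(reply(transcript(hello)))']
```

The real components are streaming model calls rather than string formatting, but the queue-per-stage shape is what lets a voice pipeline start synthesizing the beginning of a reply while the rest is still being generated.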

We think Bolna is quite unique in that it encapsulates the underlying orchestration into an API-friendly structure, enabling developers to get started quickly via Docker, etc. I've been a great lover (and user) of the Elastic stack since the 1.3.x versions (circa 2014), and we're drawing a lot of inspiration from their open source process and philosophy for improving adoption and usage.

If this sounds interesting to you, please check us out! You can find our open source repositories on GitHub at https://github.com/bolna-ai or try our hosted no-code/low-code dashboard at https://playground.bolna.dev.

We’d really love to hear your feedback and look forward to all of the comments!




how do you handle latency issues in real-time voice conversations? what specific optimizations have you implemented in the orchestration layer to minimize delays between speech recognition, llm processing, and text-to-speech output?


hey! that's a great question. Initially, we ran separate processes for every component (ASR, LLM, TTS) and handled latency with a configurable pair of settings, endpointing and token_size, since latency is critical in some cases but matters less in others (e.g. where responses are longer). Later on, we also integrated caching and routing to minimize unnecessary calls.
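One common pattern behind a token_size-style setting (sketched here as an illustration, not Bolna's actual implementation) is to buffer streamed LLM tokens and flush a chunk to TTS either when the buffer fills up or when a token ends an utterance, so short replies start speaking quickly while long ones don't fragment into too many TTS calls:

```python
def chunk_tokens(tokens, token_size=5, endpoint_chars=".?!"):
    """Group streamed LLM tokens into TTS-sized chunks.

    Flush the buffer when it reaches token_size tokens, or early when a
    token ends an utterance (endpointing), whichever comes first.
    """
    buffer, chunks = [], []
    for tok in tokens:
        buffer.append(tok)
        if len(buffer) >= token_size or tok.rstrip().endswith(tuple(endpoint_chars)):
            chunks.append(" ".join(buffer))
            buffer = []
    if buffer:  # flush any trailing partial chunk at end of stream
        chunks.append(" ".join(buffer))
    return chunks

# A sentence-ending token flushes immediately, regardless of token_size:
print(chunk_tokens(["Hi", "there", "."], token_size=10))
# → ['Hi there .']
```

Tuning token_size then becomes a latency/quality trade-off: smaller chunks reach the TTS engine sooner, larger ones give it more context per synthesis call.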



