Hacker News new | past | comments | ask | show | jobs | submit login
Show HN: Built a Real Time Visual Intelligence (twitter.com/withsenapp)
2 points by Aeroi 11 days ago | hide | past | favorite | 1 comment
I built a realtime visual intelligence that connects a users phone camera to a multimodal llm. I use the pipecat open source framework, webrtc, and a few other services to connect it all together.

It's similar to chatgpt advanced voice and grounded with google_search for asynch internet searches based on transcripts or frames from the video that run at 1fps to the LLM.

Let me know what you think and if you want to work on some fun scaling problems with me on this project.

www.withsen.com






One interesting note with voice AI is that you can shove static datasets into the long context windows of these newer models like 2.0-flash-lite. It creates a Model Assisted Generation(MAG) and returns super low latency and 99% relevant information to the bot. Theres a good example in the foundational example of the pipecat github.



Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: