Show HN: I built vector search for COSS podcasts & livestreams (algora.io)
2 points by zcesur 10 months ago
Hey HN! I built COSSgpt using videos from the Open Source Founder Podcast [1] and livestreams from COSS Office Hours [2][3].

I transcribed the VODs using Whisper, split the transcripts into fixed-size segments, and vectorized those segments with MPNet on Replicate GPUs. I made the segments overlap a little so that meaning isn't lost at the boundaries between segments.
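
For readers who want to see the overlap idea concretely, here is a minimal Python sketch. It is not the code from the repo (which is Elixir); the function name `chunk_with_overlap`, the word-level tokens, and the sizes are all assumptions made for illustration.

```python
from typing import List


def chunk_with_overlap(tokens: List[str], size: int, overlap: int) -> List[List[str]]:
    """Split tokens into fixed-size windows; each window repeats the
    last `overlap` tokens of the previous one."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the transcript
    return chunks


# Hypothetical toy transcript; a real segment size would be much larger.
words = "the quick brown fox jumps over the lazy dog".split()
segments = chunk_with_overlap(words, size=4, overlap=1)
for segment in segments:
    print(" ".join(segment))
```

Each segment starts with the last word of the previous one, so a phrase that straddles a boundary still appears whole in at least one segment when the overlap is long enough to cover it.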

Then I indexed the vectors in an in-memory HNSWLib vector store [4] and persisted the entire store to Tigris object storage [5], so that the multimedia and vectors are cached across all Fly.io regions.
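
To illustrate what querying such an index means, here is a small Python sketch of nearest-neighbor search by cosine similarity. Note the swap: this is exact brute-force search over a toy dictionary, not HNSW, which answers the same kind of query approximately with a graph structure so that it scales to many vectors. The names `cosine_similarity`, `top_k`, and `toy_index`, and the 3-dimensional vectors, are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], index: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and return the k best.
    This is exact O(n) search; HNSW avoids scanning every vector."""
    scored = [(seg_id, cosine_similarity(query, vec)) for seg_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings keyed by transcript segment id.
toy_index = {
    "segment-a": [1.0, 0.0, 0.0],
    "segment-b": [0.0, 1.0, 0.0],
    "segment-c": [0.7, 0.7, 0.0],
}
results = top_k([1.0, 0.1, 0.0], toy_index, k=2)
for seg_id, score in results:
    print(seg_id, round(score, 3))
```

In the real pipeline, the query vector would come from embedding the search text with the same model used for the segments.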

I built the app in Elixir. It's almost entirely server-side rendered, with minimal diffs sent to the client over WebSockets using Phoenix LiveView. I also used Livebook [6] a ton when I was building the multimedia processing & ML pipeline. I'm super bullish on Elixir for building web apps and/or MLOps!

Let me know what you think :) If you're curious, you can find the code at https://github.com/algora-io/tv

[1]: https://algora.io/podcast
[2]: https://tv.algora.io/peerrich
[3]: https://tv.algora.io/rfc
[4]: https://github.com/nmslib/hnswlib
[5]: https://tigrisdata.com
[6]: https://github.com/algora-io/tv/blob/2586950/scripts/cossgpt...



