ashu_trv's Hacker News comments

Yeah, we tried to solve 1 and 3; the 2nd is still an open problem. Can you share more about MECE?


Keeping documentation and SDK updates aligned with evolving "LLM contexts" can quickly overwhelm dev teams. At VideoDB, we've built an open-source solution—Agent Toolkit—that automates syncing your docs, SDK versions, and examples, making your dev content effortlessly consumable by Cursor, Claude AI, and other agents. Ready-to-use template available.
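The core idea can be sketched in a few lines: collect versioned docs into a single context file that agents like Cursor can ingest. This is a minimal stdlib-only illustration of the concept, not the Agent Toolkit's actual implementation (the function name and layout are hypothetical):

```python
from pathlib import Path

def build_llm_context(docs_dir, out_file, sdk_version):
    """Concatenate all markdown docs under docs_dir into one
    agent-consumable context file, stamped with the SDK version
    so stale contexts are easy to detect."""
    parts = [f"# SDK version: {sdk_version}\n"]
    for md in sorted(Path(docs_dir).rglob("*.md")):
        # Use the relative path as a section header for each doc.
        parts.append(f"\n## {md.relative_to(docs_dir)}\n")
        parts.append(md.read_text(encoding="utf-8"))
    Path(out_file).write_text("".join(parts), encoding="utf-8")
```

Rebuilding this file on every docs or SDK release is what keeps the agent-facing context in sync.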


A new benchmark study evaluates Vision-Language Models (Claude-3, Gemini-1.5, GPT-4o) against traditional OCR tools (EasyOCR, RapidOCR) for extracting text from videos. The findings show VLMs outperforming OCR in many cases but also highlight challenges like hallucinated text and handling occluded/stylized fonts.

The dataset (1,477 manually annotated frames) and benchmarking framework are publicly available to encourage further research.
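Benchmarks of this kind typically score extracted text against the manual annotations with character error rate, i.e. edit distance normalized by reference length. A minimal sketch of that metric (not the paper's actual evaluation code):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance between predicted
    text and ground-truth annotation, normalized by reference length."""
    r, h = reference, hypothesis
    # Classic dynamic-programming edit distance, row by row.
    prev = list(range(len(h) + 1))
    for i, rc in enumerate(r, 1):
        cur = [i]
        for j, hc in enumerate(h, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (rc != hc)))    # substitution
        prev = cur
    return prev[-1] / max(len(r), 1)
```

Lower is better; hallucinated text shows up as insertions, while occluded or stylized fonts show up as substitutions and deletions.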

Paper: https://arxiv.org/abs/2502.06445

Dataset & Repo: https://github.com/video-db/ocr-benchmark

Would love to hear thoughts from the community on the future of VLMs in OCR.


This open-source agent framework is like ChatGPT, but for videos. It simplifies complex video tasks like search, editing, compilation, and—best of all—generation. The results stream instantly. You can even extend the agents to suit your needs and build custom automated workflows.

The framework is fully open source and uses a VideoDB key for cloud-based video storage, processing, and streaming. It seamlessly integrates with tools like Stable Diffusion, Eleven Labs, Kling, Replicate, and more.

Looking for collaboration with GenAI audio/video teams and feedback from the amazing devs out here.


If this is your own work, you can edit the title to prefix it with 'Show HN:', which tells everyone you're showing your work for feedback. The post then ends up in the Show HN section of the site, which gives it more exposure.


Thanks! Added now.


LLMs are great with text, but they don't help you consume or create video clips. Check out PromptClip: use natural language to describe what you want, and instantly get video clips with the help of LLMs like OpenAI or Claude.

Here are a few interesting prompts we tried while building it; we loved the results. There's no limit to creativity with this.

*Shark Tank Videos:* [Find every moment where a deal was offered](https://console.videodb.io/player?url=https://stream.videodb...)

*Useful Gadgets:* [Show me where the host discusses or reveals the pricing of the gadget](https://console.videodb.io/player?url=https://stream.videodb...)

*Huberman Podcast:* [Find details about every sponsor](https://console.videodb.io/player?url=https://stream.videodb...)

*Masterchef:* [Show me the feedback from every judge](https://console.videodb.io/player?url=https://stream.videodb...)

Say goodbye to manual editing, skimming, and seeking through videos, and hello to instant, AI-driven video consumption and creation.
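The pipeline behind prompts like the ones above is roughly: take a timestamped transcript, ask an LLM which segments match the prompt, then stitch the matching time ranges into a clip. A minimal sketch, with a keyword matcher standing in for the LLM relevance call (`matches_prompt` is a hypothetical placeholder; the real tool calls OpenAI or Claude):

```python
def matches_prompt(segment_text: str, prompt_keywords: list) -> bool:
    # Stand-in for the LLM relevance judgment; PromptClip would send
    # the segment and the user's prompt to OpenAI/Claude instead.
    text = segment_text.lower()
    return any(k.lower() in text for k in prompt_keywords)

def find_clip_ranges(transcript, prompt_keywords):
    """transcript: list of (start_sec, end_sec, text) segments.
    Returns merged time ranges whose text was judged relevant."""
    ranges = [(s, e) for s, e, t in transcript
              if matches_prompt(t, prompt_keywords)]
    merged = []
    for s, e in ranges:
        if merged and s <= merged[-1][1]:
            # Adjacent or overlapping hits become one clip.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

The returned ranges are what a video backend then turns into an actual playable clip.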


Yeah, VideoDB is next-gen infrastructure for video, and it's actually less costly than current video infrastructure.

But you can use any LLM for analysing the transcript.


So it's still just analysing the transcript, not using GPT-4V or OCR in any way?

Can you confirm if I could skip using VideoDB by using Whisper to transcribe the video, and then use that transcript with LLaMa to extract the important parts?


It analyses the transcript, but there's no way to get back the video clip without building your own video infra. We at VideoDB are solving exactly that problem.
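For the DIY route described above: once Whisper plus an LLM has produced the interesting timestamps, the missing piece is cutting the clip back out of the source file. A minimal sketch using ffmpeg (assumes ffmpeg is on PATH; `-c copy` avoids re-encoding but snaps cuts to keyframes):

```python
def clip_command(src: str, start: float, end: float, out: str) -> list:
    """Build an ffmpeg invocation that extracts the [start, end]
    window (in seconds) from src without re-encoding."""
    return ["ffmpeg", "-ss", str(start), "-to", str(end),
            "-i", src, "-c", "copy", out]

# To actually cut the clip:
# import subprocess
# subprocess.run(clip_command("talk.mp4", 63.0, 95.5, "clip.mp4"), check=True)
```

This works for one-off clips; the part a hosted video infra adds is doing this at scale with streaming-ready output.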


Build a custom GPT on your video data with StreamRAG in 2 minutes. This search agent finds relevant moments across hundreds of hours of content and returns a video clip instantly.

RAG applications are great with text, but with video they can't support simple requests like "show me where sleep improvement is discussed".
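Under the hood, video RAG like this can be sketched as ordinary retrieval over timestamped transcript chunks, where matching chunks map back to playable time ranges. A toy bag-of-words version for illustration (StreamRAG itself would use proper embeddings; the function names here are hypothetical):

```python
import math
from collections import Counter

def _vec(text):
    # Bag-of-words term counts as a stand-in for an embedding.
    return Counter(text.lower().split())

def _cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search_moments(chunks, query, top_k=3):
    """chunks: list of (start_sec, end_sec, text). Returns the top-k
    time ranges ranked by similarity to the query."""
    q = _vec(query)
    scored = [(s, e, _cosine(_vec(t), q)) for s, e, t in chunks]
    scored.sort(key=lambda x: x[2], reverse=True)
    return [(s, e) for s, e, score in scored[:top_k] if score > 0]
```

The returned time ranges are then handed to the video layer, which is the part that streams back an actual clip.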


Made a Twitter bot that lets you generate an interactive transcript (Spext Docs) of any audio/video in a tweet.

Here is an example - https://twitter.com/spext_it/status/1286130139290632192

Interactive transcripts are published at https://publish.spext.co


Very cool! How is it going to be beneficial over the native solutions provided by the cloud providers themselves?


The cloud providers more or less just give you a Kubernetes cluster. You can then connect this cluster (or any other cluster in a private cloud) to DevSpace Cloud, which allows developers to create their own namespaces on demand in the cluster. DevSpace CLI is supposed to enable developers to work with Kubernetes regardless of their experience in working with k8s. Overall, DevSpace does not replace the cloud providers but makes their Kubernetes offer accessible (DevSpace Cloud) and easy to use (DevSpace CLI) for developers.



