Hacker News | haasiy's comments

VAM Seek AI

Convert video to a grid image. Show AI the entire video in one frame.

Problem: Analyzing a 10-minute video at 1 fps = 600 API calls = expensive

Solution: Compress 48 frames into one grid = 1 API call = ~600x cheaper

Features:

"Where's the car?" → AI answers with timestamps
Click a timestamp → jump to that scene
Zoom: generate a high-res grid for specific time ranges on demand

Stack:

Local-only (Electron + Canvas)
No cloud uploads
Direct Anthropic API calls
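The core idea above (one grid of sampled frames standing in for the whole video, with clickable timestamps) can be sketched as a small layout function. This is a minimal sketch with invented names, not the actual VAM-Seek code:

```javascript
// Map a video's duration onto a fixed grid of sampled frames, and map a
// grid cell back to a timestamp so "click cell -> seek" works.
// gridLayout and its helpers are hypothetical, for illustration only.
function gridLayout(durationSec, cols = 8, rows = 6) {
  const cells = cols * rows;         // 48 thumbnails per grid
  const step = durationSec / cells;  // seconds of video each cell covers
  return {
    cells,
    // timestamp sampled for cell i (center of its time slice)
    timestampFor: (i) => (i + 0.5) * step,
    // inverse: which cell a given timestamp falls into
    cellFor: (t) => Math.min(cells - 1, Math.floor(t / step)),
  };
}

const layout = gridLayout(600);       // 10-minute video
console.log(layout.cells);            // 48
console.log(layout.timestampFor(0));  // 6.25 (seconds)
console.log(layout.cellFor(599));     // 47 (last cell)
```

With a layout like this, one composited grid image plus the timestamp mapping is enough for a single API call to answer "where's the car?" with seekable positions.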


Greetings from Japan! Thank you for the insightful feedback.

I’m a Japanese developer with a 30-year design background. My English isn't perfect, but I want to share my vision.

The "hybrid mode" I mentioned is still my core philosophy for the future, not fully implemented yet. However, even now, VAM Seek can display thumbnails quite fast by processing everything in the browser. My goal is to make this even smoother.

I’m not trying to replace the 1D bar—it's part of our "muscle memory." In the future, I want VAM Seek to act like a silent assistant: building a cache in the background so that the 2D grid appears instantly the moment you need it.

Invisible until it's indispensable. That’s the "missing standard" I'm aiming for.


The “silent assistant” framing resonates a lot. We’ve seen similar dynamics while working on long-term adoption and visibility for developer-facing products at AixBoost — the solutions that quietly build trust in the background often end up feeling indispensable without users ever consciously noticing when that shift happened.

Making the 2D grid appear exactly at the moment it’s needed, without asking the user to think about it, really does feel like a missing standard rather than a feature.


I’ve read all your feedback, and I appreciate the different perspectives.

To be honest, I struggled a lot with how to build this. I have deep respect for professional craftsmanship, yet I chose a path that involved a deep collaboration with AI.

I wrote down my internal conflict and the journey of how VAM-Seek came to be in this personal log. I’d be honored if you could read it and see what I was feeling during the process: https://haasiy.main.jp/note/blog/llm-coding-journey.html

It’s just a record of one developer trying to find a way forward.


Love the setup! A 2012 machine is a classic.

To answer your question: VAM-Seek doesn't pre-render the entire 60 minutes. It only extracts frames for the visible grid (e.g., 24-48 thumbnails) using the browser's hardware acceleration via Canvas.

On older hardware, the bottleneck is usually the browser's video seeking speed, not the generation itself. Even on a 2012 desktop, it should populate the grid in a few seconds. If it takes longer... well, that might be your PC's way of asking for a retirement plan! ;)
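The "only extract frames for the visible grid" idea described above can be expressed as a small scheduling function: given the scroll position, compute which cells actually need thumbnails. A hedged sketch with illustrative names, not the real implementation:

```javascript
// Return the frame indices whose grid cells are currently on screen,
// so older machines only pay extraction cost for visible thumbnails.
// visibleCells and its parameters are illustrative, not from VAM-Seek.
function visibleCells(scrollTop, viewportH, rowH, cols, totalCells) {
  const firstRow = Math.floor(scrollTop / rowH);
  const lastRow = Math.floor((scrollTop + viewportH - 1) / rowH);
  const first = firstRow * cols;
  const last = Math.min(totalCells - 1, (lastRow + 1) * cols - 1);
  const out = [];
  for (let i = first; i <= last; i++) out.push(i);
  return out;
}

// A 48-cell grid (8 columns), 120px rows, 360px viewport at the top:
// rows 0-2 are visible, i.e. cells 0-23.
console.log(visibleCells(0, 360, 120, 8, 48).length); // 24
```

Each index returned would then be fed to the browser's video element (seek, then draw the frame to a canvas), which is where the seeking-speed bottleneck mentioned above shows up.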


Exactly. I view this cache similarly to how a browser (or Google Image Search) caches thumbnails locally. Since I'm only storing small Canvas elements, the memory footprint is much smaller than the video itself. To keep it sustainable, I'm planning to implement a trigger to clear the cache whenever the video source changes, ensuring the client's memory stays fresh.
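The clear-on-source-change trigger described above can be sketched as a tiny cache wrapper; names and structure are assumptions for illustration, not the actual VAM-Seek code:

```javascript
// Thumbnail cache that resets itself when the video source changes,
// keeping memory bounded to one video's worth of small thumbnails.
// createThumbCache is a hypothetical sketch of the planned trigger.
function createThumbCache() {
  let source = null;
  const cache = new Map(); // cellIndex -> thumbnail (e.g. canvas/dataURL)
  const sync = (src) => {
    if (src !== source) {  // source changed: drop stale thumbnails
      cache.clear();
      source = src;
    }
  };
  return {
    get(src, cell) { sync(src); return cache.get(cell); },
    set(src, cell, thumb) { sync(src); cache.set(cell, thumb); },
    size: () => cache.size,
  };
}

const c = createThumbCache();
c.set("a.mp4", 0, "thumb0");
c.set("a.mp4", 1, "thumb1");
console.log(c.size());        // 2
c.set("b.mp4", 0, "thumbB");  // new source clears old entries
console.log(c.size());        // 1
```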


Actually, I started with the precomputing approach you mentioned. But I realized that for many users, setting up a backend to process videos or managing pre-generated assets is a huge barrier.

I purposely pivoted to 100% client-side extraction to achieve zero server load and a one-line integration. While it has limits with massive data, the 'plug-and-play' nature is the core value of VAM-Seek. I'd rather give people a tool they can use in 5 seconds than a high-performance system that requires 5 minutes of server config.


I intentionally used AI to draft the README so it's optimized for other AI tools to consume. My priority wasn't 'polishing' for human aesthetics, but rather hitting the 15KB limit and ensuring 100% client-side execution. I'd rather spend my time shipping the next feature than formatting text.


First, you're misunderstanding what I mean by "polishing": I'm talking about making sure it actually works.

Then, improving the signal-to-noise ratio of your project actually helps with "shipping the next feature", as LLMs themselves get lost in the noise they make.

Finally, if you want people to use your project, you need to show us that it's better than what they can make by themselves. And that's especially true now that AI reduces the cost of building new stuff. If you can't work with Claude to build something better than what Claude builds, your project isn't worth more than its token count.


I have to stand my ground here. Reducing a complex functionality into 15KB is not just about 'generating code'—it's about an architecture that AI cannot conceive on its own.

My role was to architect the bridge between UI/UX design and the underlying video data processing. Handling frame extraction via Canvas, managing memory, and ensuring a seamless seek experience without any backend support requires a deep understanding of how these layers interact.

Simply connecting a backend to a UI might be common, but eliminating the backend entirely while maintaining the utility is a high-level engineering choice. AI was my hammer, but I was the one who designed the bridge. To say this is worth no more than its token count ignores the most difficult part: the intent and the structural simplification that makes it usable for others in a single line of code.


> Reducing a complex functionality into 15KB is not just about 'generating code'—it's about an architecture that AI cannot conceive on its own.

Ironic.


The discussion here has been helpful for technical nuance. For those interested in the practical adoption and impact, a startup outlet covered the project's approach and real-world use-case for SaaS platforms here: https://ecosistemastartup.com/vam-seek-navegacion-visual-2d-...


Bad bot.


Your point was certainly valid. I reviewed the Readme and reexamined the code. I would greatly appreciate it if you could evaluate it again with your own eyes.

