Hacker News | vlmrunadmin007's comments

It's impressive how the MCP example in https://docs.vlm.run/mcp/examples/template-search retains visual context across multiple images and tool calls. Unlike most chat interfaces, it enables seamless multi-step reasoning—like finding a logo in one image and tracking it in another—without losing state. This makes it ideal for building stateful, iterative visual workflows.


We have successfully tested the model with vLLM and plan to release it across multiple inference server frameworks, including vLLM and Ollama.


Basically, there is no model–schema combination out of the box. If you prompt an open-source model with the schema directly, it doesn't produce results in the expected format. The main contribution is making these models conform to your specific needs and emit output in a structured format.


Wait, but we're doing that already, and it works well (Qwen 2.5 VL)? If need be, you can always resort to structured generation to enforce schema conformity?
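To make "resort to structured generation" concrete, here is a minimal sketch of schema enforcement around a model call. Everything here is illustrative: `call_model` is a hypothetical stub standing in for a real VLM endpoint, and the validate-and-retry loop is the naive fallback — engines with constrained decoding (e.g. vLLM's guided JSON output) make the retries unnecessary by construction.

```python
import json

# Hypothetical target schema: field name -> expected Python type.
SCHEMA = {"logo": str, "bbox": list}

def conforms(obj, schema):
    """Check that obj is a dict with exactly the schema's keys and types."""
    return (isinstance(obj, dict)
            and set(obj) == set(schema)
            and all(isinstance(obj[k], t) for k, t in schema.items()))

def call_model(prompt):
    """Stand-in for a real model call; a real setup would hit an
    inference server (e.g. an OpenAI-compatible vLLM endpoint)."""
    return '{"logo": "acme", "bbox": [10, 20, 110, 80]}'

def extract(prompt, retries=3):
    """Ask for JSON matching SCHEMA; retry when the reply doesn't
    parse or doesn't conform to the schema."""
    for _ in range(retries):
        raw = call_model(prompt)
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON: try again
        if conforms(obj, SCHEMA):
            return obj
    raise ValueError("model never produced schema-conformant output")

result = extract("Find the logo and its bounding box.")
```

The same validate-or-retry shape works with any client; constrained decoding just moves the guarantee from the caller into the sampler.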

