Hacker News | yatesdr's comments

Happy to report that v0.3 of go-llm-proxy is released!

Great for connecting your local LLM coding and vision models to Claude Code and Codex.

General improvements:

> Vision pipeline - images described by your vision model, transparent to the client

> Dual OCR pipeline - smart routing for PDFs and tool output (text extraction first, vision fallback for scanned docs). Dedicated OCR models like PaddleOCR-VL are ~17x faster than general vision models on document pages

> Brave & Tavily search integration - native behavior for Claude Code and Codex when configured on the proxy

> Per-model processor routing - override vision, OCR, and search settings per model

> Context window auto-detection from backends

> SSE keepalive improvements during pipeline processing

> Full MCP SSE endpoint for web search on OpenCode, Qwen Code, Claw, and other MCP-compatible agents

> Docker update for easier deployment (limited testing so far)
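The dual OCR routing above boils down to a cheap check before invoking any model. A minimal sketch in Go — the function name and the character threshold are my own illustration, not the proxy's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// routeOCR sketches the "text extraction first, vision fallback" idea:
// if a PDF page has a usable embedded text layer, skip the model entirely;
// only a near-empty extraction (e.g. a scanned page) goes to the slower
// vision/OCR model. The 50-char threshold is an illustrative assumption.
func routeOCR(extracted string) string {
	if len(strings.TrimSpace(extracted)) < 50 {
		return "vision-fallback" // scanned page: hand the image to the OCR model
	}
	return "text-extraction" // embedded text is good enough; no model call
}

func main() {
	fmt.Println(routeOCR("Quarterly report: revenue grew 12% year over year across all segments."))
	fmt.Println(routeOCR("")) // no text layer at all
}
```

Cheap-path-first routing like this is what makes the ~17x speedup claim matter: most digital-born PDFs never touch the vision model at all.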

Codex-specific:

> Full Responses API translation - translated to Chat Completions under the hood, so your local backend doesn't need to support /v1/responses

> Reasoning token display - reasoning_summary_text.delta events so Codex shows thinking natively

> Native search UI - emits web_search_call output items so Codex renders "Searched N results" in its interface

> Structured tool output - Codex's view_image returns arrays and objects, not just strings; the proxy handles all three formats

> mcp_tool_call_output and mcp_list_tools input types handled (Codex sends these; other backends choke on them)

> Config generator produces config.toml with provider, reasoning effort, context window, and optional Tavily MCP
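The reasoning-token display above amounts to re-emitting backend reasoning chunks as SSE events Codex recognizes. A rough Go illustration — the event name comes from the notes above, but the payload shape and helper are simplified assumptions, not the proxy's real wire format:

```go
package main

import "fmt"

// sseEvent frames one chunk as a Server-Sent Event. Wrapping backend
// reasoning text in an event Codex knows (reasoning_summary_text.delta)
// is what lets it render "thinking" natively while streaming.
func sseEvent(name, data string) string {
	return fmt.Sprintf("event: %s\ndata: %s\n\n", name, data)
}

func main() {
	// Illustrative payload; the real Responses API delta event carries
	// additional fields (item id, output index, etc.).
	fmt.Print(sseEvent("response.reasoning_summary_text.delta",
		`{"delta":"Considering edge cases..."}`))
}
```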

Claude Code-specific:

> Full Messages API translation - Anthropic protocol to Chat Completions, so Claude Code works with vLLM/llama-server

> Thinking blocks - backend reasoning tokens wrapped as thinking/signature_delta content blocks so Claude Code renders them

> web_search_20250305 server tool intercepted and executed proxy-side

> PDF type: "document" blocks extracted to text before forwarding

> Streaming search with server_tool_use + web_search_tool_result blocks so Claude Code shows "Did N searches"

> /anthropic/v1/messages explicit route for clients that use the Anthropic base URL convention

> Config generator produces settings.json with Sonnet/Opus/Haiku tier selectors, thinking toggles, and start scripts
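The core of the Messages API translation above can be sketched like this in Go. The struct names are illustrative, and real Anthropic content blocks are richer than plain strings — the point is just the shape change: Anthropic clients send a top-level system field, while Chat Completions backends expect the system prompt as the first entry in messages:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// anthropicReq models the relevant slice of an Anthropic Messages request
// (content simplified to a string for this sketch).
type anthropicReq struct {
	System   string    `json:"system"`
	Messages []message `json:"messages"`
}

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// openAIReq models the Chat Completions side of the translation.
type openAIReq struct {
	Messages []message `json:"messages"`
}

// translate folds the top-level system prompt into the messages array,
// which is the main structural difference between the two protocols.
func translate(a anthropicReq) openAIReq {
	out := openAIReq{}
	if a.System != "" {
		out.Messages = append(out.Messages, message{Role: "system", Content: a.System})
	}
	out.Messages = append(out.Messages, a.Messages...)
	return out
}

func main() {
	in := anthropicReq{
		System:   "You are a coding assistant.",
		Messages: []message{{Role: "user", Content: "hello"}},
	}
	b, _ := json.Marshal(translate(in))
	fmt.Println(string(b))
}
```

The full proxy obviously has to handle tool use, streaming, and thinking blocks on top of this, but every request passes through a reshaping step of this kind.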


Pretty similar to litellm[proxy], but it supports the Responses API and also does some request/response rewriting. It's mostly targeted at coding TUIs, but I also use it a lot for text embeddings and streaming inference in applications.

I run a few different models on my compute nodes and was constantly editing JSON files to manage configs for which model was where. I built this to aggregate them all in one place behind a public nginx reverse proxy. My goal was hooking it up to claude-code or qwen when I run out of tokens so I could use minimax or glm-5; it works great for that, and also for sharing those models with other people.

MIT licensed, reasonably secure, maybe useful.


