sanchitmonga22's comments | Hacker News

Please check our main repo: https://github.com/RunanywhereAI/runanywhere-sdks/

We aim to run anywhere, hence RunAnywhere. MetalRT is the fastest inference engine we've built for Apple silicon, and we'll be covering other edge devices as well. All edge is about to hit warp speed!


Yes, that's the plan. MetalRT will ship as part of the RunAnywhere SDK so other developers can integrate it into their own apps. We're working on making that available. If you want to be in the early access group, drop me a line at founder@runanywhere.ai or open an issue on the RCLI repo. Happy to look at your project.

Fair criticism. Our benchmarks are on small models because MetalRT was built for the voice pipeline use case, where decode latency on 0.6B-4B models is the bottleneck.

You're right that the bigger opportunity on Apple Silicon is large models that don't fit on consumer GPUs. Expanding MetalRT to 7B, 14B, and 32B+ is on the roadmap. MetalRT's architectural advantages should matter even more at that scale, where everything becomes memory-bandwidth-bound.

We'll publish benchmarks on larger models as we add support. If you have a specific model/size you'd want to see first, that helps us prioritize.


Yes, mobile is our primary focus and it's on the roadmap. The same Metal GPU pipeline that powers MetalRT on macOS maps directly to iOS (same Apple Silicon, same Metal API).

Agreed for a lot of use cases. RCLI supports text-only mode (--no-speak flag, or just type in the TUI instead of using push-to-talk). TTS makes sense for hands-free / eyes-free scenarios, but we don't force it.

We use AI tools in our workflow, same as a lot of teams at this point. The pipeline architecture, Metal integration, and engine design are ours. The code is MIT and open for anyone to read and judge the quality directly.

RCLI includes local RAG out of the box. You can ingest PDFs, DOCX, and plain text, then query by voice or text:

  rcli rag ingest ~/Documents/notes
  rcli ask --rag ~/Library/RCLI/index "summarize the project plan"

It uses hybrid retrieval (vector + BM25 with Reciprocal Rank Fusion) and runs at ~4ms over 5K+ chunks. Embeddings are computed locally with Snowflake Arctic, so nothing leaves your machine.
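For anyone curious how the fusion step works: Reciprocal Rank Fusion scores each document as the sum of 1/(k + rank) across the ranked lists, then re-sorts by total score. A minimal sketch of the standard formula (not RCLI's actual code; the function name and the conventional k=60 default are illustrative):

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each ranking is a list of doc IDs, best first. A document's score
    is sum(1 / (k + rank)) over every list it appears in; documents
    ranked highly by multiple retrievers float to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a vector-search ranking with a BM25 ranking.
vector_hits = ["doc3", "doc1", "doc7"]
bm25_hits = ["doc1", "doc9", "doc3"]
print(rrf_fuse([vector_hits, bm25_hits]))
# → ['doc1', 'doc3', 'doc9', 'doc7']
```

doc1 wins here because it places well in both lists, which is exactly the behavior you want when vector and keyword retrieval disagree.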


Fair point. The install script shouldn't silently install Homebrew without explicit consent. We'll update it to detect when Homebrew is missing and prompt the user before installing anything beyond RCLI itself.
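Roughly the check we have in mind (a hypothetical sketch, not the shipped script; the prompt wording and fallback behavior are placeholders):

```shell
#!/bin/sh
# Hypothetical sketch of an explicit-consent check for a missing Homebrew.
confirm_brew_install() {
  # Returns 0 (proceed) only on an explicit yes answer.
  case "$1" in
    [Yy]*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Only prompt when Homebrew is missing AND stdin is a terminal;
# non-interactive runs should never silently install anything.
if ! command -v brew >/dev/null 2>&1 && [ -t 0 ]; then
  printf 'RCLI needs Homebrew, which is not installed. Install it now? [y/N] '
  read -r answer
  if confirm_brew_install "$answer"; then
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  else
    echo 'Skipping Homebrew. See the build-from-source instructions instead.'
    exit 1
  fi
fi
```

The `[ -t 0 ]` guard matters: it keeps the script from hanging or defaulting to an install when run from CI or a piped `curl | sh`.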

In the meantime, if you already have Homebrew, you can install directly:

  brew tap RunanywhereAI/rcli https://github.com/RunanywhereAI/RCLI.git
  brew install rcli
  rcli setup

Or build from source if you prefer not to use either method: https://github.com/RunanywhereAI/RCLI#build-from-source


Cool, just checked out dlgo. Looks like you're targeting Go bindings for on-device inference? Different approach but same conviction that this should run locally. Happy to compare notes if you want to chat about Metal optimization or pipeline architecture.

Apple has the silicon, the frameworks (MLX, CoreML), and the models. The gap is putting it all together into a fast, unified on-device pipeline. That's what we're focused on, and honestly, we think Apple will eventually ship something similar natively. Until then, we're trying to show what's possible today on their hardware.
