We aim to run anywhere, hence the name RunAnywhere. MetalRT is the fastest inference engine we've built for Apple silicon, and we'll be covering other edge devices as well. All edge is about to hit warp speed!
Yes, that's the plan. MetalRT will ship as part of the RunAnywhere
SDK so other developers can integrate it into their own apps. We're
working on making that available. If you want to be in the early
access group, drop me a line at founder@runanywhere.ai or open an
issue on the RCLI repo. Happy to look at your project.
Fair criticism. Our benchmarks are on small models because MetalRT
was built for the voice pipeline use case, where decode latency
on 0.6B-4B models is the bottleneck.
You're right that the bigger opportunity on Apple Silicon is large
models that don't fit on consumer GPUs. Expanding MetalRT to 7B,
14B, 32B+ is on the roadmap. MetalRT's architectural advantages should matter
even more at that scale, where everything becomes memory-bandwidth-bound.
We'll publish benchmarks on larger models as we add support. If you
have a specific model/size you'd want to see first, that helps us
prioritize.
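For intuition on why large models become bandwidth-bound: during decode, every generated token streams the full weight set through memory, so tokens/sec is capped by bandwidth divided by model size. A back-of-envelope sketch (the bandwidth and quantization numbers below are illustrative assumptions, not MetalRT measurements):

```python
def max_decode_tps(bandwidth_gb_s: float, params_b: float, bytes_per_param: float) -> float:
    """Upper bound on decode tokens/sec for a memory-bandwidth-bound model.

    Each token requires reading all weights once, so throughput is
    bounded by memory bandwidth divided by the weight footprint.
    """
    model_gb = params_b * bytes_per_param  # weight footprint in GB
    return bandwidth_gb_s / model_gb

# e.g. ~400 GB/s unified memory (assumed, M-series Max class) and a
# 32B-parameter model at 4-bit quantization (~0.5 bytes/param):
print(round(max_decode_tps(400, 32, 0.5)))  # → 25 tok/s ceiling
```

The same arithmetic shows why small models in the voice pipeline are latency-bound rather than throughput-starved: a 0.6B model at 4-bit is only ~0.3 GB, so the bandwidth ceiling is in the thousands of tokens/sec.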
Yes, mobile is our primary target and it's on the roadmap. The same Metal GPU pipeline that powers MetalRT on macOS maps directly to iOS (same Apple silicon, same Metal API).
Agreed for a lot of use cases. RCLI supports text-only mode (the --no-speak flag, or just type in the TUI instead of using push-to-talk). TTS makes sense for hands-free / eyes-free scenarios, but we don't force it.
We use AI tools in our workflow, same as a lot of teams at this point. The pipeline architecture, Metal integration, and engine design are ours. The code is MIT and open for anyone to read and judge the quality directly.
It uses hybrid retrieval (vector + BM25 with Reciprocal Rank Fusion) and runs in ~4 ms over 5K+ chunks. Embeddings are computed locally with Snowflake Arctic, so nothing leaves your machine.
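For anyone unfamiliar with Reciprocal Rank Fusion: it merges the vector and BM25 result lists using only ranks, no score normalization. A minimal sketch of the idea (not the actual RCLI implementation; the k=60 constant is the value commonly used in the literature):

```python
def rrf(rankings, k=60):
    """Fuse ranked doc-id lists: each list contributes 1/(k + rank) per doc.

    Documents that rank well in several lists accumulate the highest
    fused score, so agreement between retrievers is rewarded.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["a", "b", "c"]  # from embedding search (illustrative ids)
bm25_hits   = ["b", "d", "a"]  # from keyword search
print(rrf([vector_hits, bm25_hits]))  # → ['b', 'a', 'd', 'c']
```

Because it only uses ranks, RRF sidesteps the problem that cosine similarities and BM25 scores live on incompatible scales.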
Fair point. The install script shouldn't silently install Homebrew without explicit consent. We'll update it to detect when Homebrew is missing and prompt the user before installing anything beyond RCLI itself.
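The flow we have in mind is roughly this shape (a sketch of the planned behavior, not the actual install script; the prompt wording and helper name are illustrative):

```shell
#!/bin/sh
# Sketch: never install Homebrew silently; ask first.
confirm() {
  # Ask a yes/no question on stderr; default to "no".
  printf '%s [y/N] ' "$1" >&2
  read -r ans
  case "$ans" in
    [Yy]*) return 0 ;;
    *) return 1 ;;
  esac
}

if ! command -v brew >/dev/null 2>&1; then
  if confirm "Homebrew is not installed. Install it for RCLI's dependencies?"; then
    echo "would install Homebrew here"  # placeholder, not the real install command
  else
    echo "Skipping Homebrew; installing RCLI only."
  fi
fi
```

Defaulting to "no" keeps the script safe when run non-interactively (e.g. piped from curl with no TTY).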
In the meantime, if you already have Homebrew, you can install directly:
Cool, just checked out dlgo. Looks like you're targeting Go bindings for on-device inference? Different approach but same conviction that this should run locally. Happy to compare notes if you want to chat about Metal optimization or pipeline architecture.
Apple has the silicon, the frameworks (MLX, CoreML), and the models. The gap is putting it all together into a fast, unified on-device pipeline. That's what we're focused on, and honestly, we think Apple will eventually ship something similar natively. Until then, we're trying to show what's possible today on their hardware.