Show HN: Talk to your Mac offline – sub-second Voice AI (Apple Silicon and MLX) (github.com/shubhdotai)
1 point by mshubham 47 days ago | 1 comment
I wanted a voice assistant that feels realtime but runs completely offline. This prototype uses MLX + FastAPI on Apple Silicon to hit sub-second latency for speech-to-speech conversations.

Repo: https://github.com/shubhdotai/offline-voice-ai

It’s fast, minimal, and hackable. I’d love feedback on latency tricks, model swaps, or use cases you’d like to see next.



A few details for anyone curious:

• Hardware: MacBook Pro M3 (16 GB)
• STT: mlx-community/whisper-small.en-mlx-q4
• LLM: mlx-community/LFM2-1.2B-4bit
• TTS: hexgrad/Kokoro-82M
• Backend: FastAPI + WebSocket streaming
• Interruption: VAD with a configurable “quiet probability”
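For anyone wondering what a “quiet probability” turn-end check might look like: here’s a minimal sketch, assuming the VAD emits one speech probability per audio frame. The class and parameter names are illustrative, not necessarily what the repo uses.

```python
from collections import deque

class QuietDetector:
    """End a turn once the rolling share of quiet frames in a fixed
    window exceeds `quiet_probability` (the configurable knob)."""

    def __init__(self, quiet_probability: float = 0.8,
                 window_frames: int = 30, speech_threshold: float = 0.5):
        self.quiet_probability = quiet_probability
        self.speech_threshold = speech_threshold
        self.window = deque(maxlen=window_frames)  # True = quiet frame

    def feed(self, speech_prob: float) -> bool:
        """Feed one per-frame VAD probability; return True at end-of-turn."""
        self.window.append(speech_prob < self.speech_threshold)
        if len(self.window) < self.window.maxlen:
            return False  # not enough frames yet to decide
        return sum(self.window) / len(self.window) >= self.quiet_probability
```

Raising `quiet_probability` makes barge-in less trigger-happy at the cost of a slower hand-off.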

Current avg latency: ~850 ms end-to-end (speech → LLM → speech).
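To see where that ~850 ms goes, a per-stage timing wrapper is useful. This is a generic sketch with stand-in stage functions, not the repo’s actual STT/LLM/TTS calls:

```python
import time

def timed(stage_fn, *args):
    """Run one pipeline stage, returning (output, elapsed_ms)."""
    t0 = time.perf_counter()
    out = stage_fn(*args)
    return out, (time.perf_counter() - t0) * 1000.0

def pipeline_latency(audio, stt, llm, tts):
    """Run speech -> text -> reply -> speech, collecting a latency breakdown."""
    text, stt_ms = timed(stt, audio)
    reply, llm_ms = timed(llm, text)
    speech, tts_ms = timed(tts, reply)
    return speech, {"stt_ms": stt_ms, "llm_ms": llm_ms, "tts_ms": tts_ms,
                    "total_ms": stt_ms + llm_ms + tts_ms}
```

Note this measures serial end-to-end time; streaming overlaps the stages, so the perceived latency can be lower than the sum.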

Goal: keep it fast, under ~1K LOC, and clean so anyone can swap models or adapt it to their use case.
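One way to keep model swaps a one-liner is a small immutable config of Hugging Face repo IDs. The field names and the swapped-in LLM repo here are my own illustration, not the repo’s actual config:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ModelConfig:
    """Hugging Face repo IDs for each stage of the pipeline."""
    stt: str = "mlx-community/whisper-small.en-mlx-q4"
    llm: str = "mlx-community/LFM2-1.2B-4bit"
    tts: str = "hexgrad/Kokoro-82M"

default = ModelConfig()
# Swap in a different LLM without touching STT/TTS (example repo ID):
bigger = replace(default, llm="mlx-community/Llama-3.2-3B-Instruct-4bit")
```

A frozen dataclass keeps configs hashable and accident-proof, and `dataclasses.replace` makes each experiment an explicit diff against the default.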

Feedback welcome on model choices, latency wins, and better UX for barge-in/turn-taking.



