"Day 7" would be amazing - all that I see YouTube recommending is "I tried it for 24 hours"
I was listening to an "expert" on a podcast earlier today, right up until the interviewer asked how long his amazing new vibe-coded tooling had been in production, and the self-proclaimed expert replied, "actually, we have an all-hands meeting later today so I can brief the team, and then we'll start using the output..."
I'm pro-OVH, granted I've not used Hetzner before and only used DO briefly. I too remember the burn, but they gave credit. I like the emails saying "your service is under DDoS attack".
I've been trying to get speech-to-text working with a reasonable vocabulary on Pis for a while. It's tough. All the modern models just need more GPU than is available.
I haven't tried on a Raspberry Pi, but on an Intel CPU it uses a little less than 1s of CPU time per second of audio. Using https://github.com/NVIDIA-NeMo/NeMo/blob/main/examples/asr/a... for chunked streaming inference, it takes 6 cores to process audio ~5x faster than realtime. I expect that with all cores on a Pi 4 or 5, you'd probably be able to at least keep up with realtime.
(Batch inference, where you give it the whole audio file up front, is slightly more efficient, since chunked streaming inference is basically running batch inference on overlapping windows of audio.)
EDIT: there are also the multitalker-parakeet-streaming-0.6b-v1 and nemotron-speech-streaming-en-0.6b models, which have similar resource requirements but are built for true streaming inference instead of chunked inference. In my tests, these are slightly less accurate. In particular, they seem to completely omit any sentence at the beginning or end of a stream that was partially cut off.
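To make the "overlapping windows" point concrete: chunked streaming is basically just slicing the audio so each batch-inference call sees some extra context on either side, then keeping only the middle of each transcript. A minimal sketch (the window and context sizes here are made-up illustrations, not NeMo's actual defaults):

```python
def chunk_windows(total_secs, chunk_secs=8.0, context_secs=2.0):
    """Yield (start, end) windows covering the audio.

    Each window overlaps its neighbors by `context_secs` on both
    sides; only the middle `step` seconds of each window's transcript
    would be kept, so the overlap just stabilizes the edges.
    (Hypothetical sizes, not NeMo's real parameters.)
    """
    step = chunk_secs - 2 * context_secs  # fresh audio consumed per window
    start = 0.0
    while start < total_secs:
        yield (max(0.0, start - context_secs),
               min(total_secs, start + step + context_secs))
        start += step

# 20 seconds of audio -> five 8s-max windows, each advancing by 4s.
windows = list(chunk_windows(20.0))
```

Each window is then run through ordinary batch inference, which is why chunked streaming is slightly less efficient than transcribing the whole file at once: the context regions get decoded twice.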
I used to live on $20K/yr working a restaurant job; now I'm in tech making six figs and still living check to check. It's a lifestyle/personal-choice thing in my case; I'm dumb / waste money.