Hacker News | bytesandbits's comments

Apple


4x faster PREFILL, not decode. Decode is bandwidth-bound; prefill is FLOPs-bound.
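A rough sketch of why the two phases hit different limits, using illustrative numbers (not Apple's specs): decode emits one token per step, so every weight is fetched once per token, while prefill reuses each fetched weight across all prompt tokens.

```python
def arithmetic_intensity(tokens_per_pass: int, bytes_per_param: float = 2.0) -> float:
    """FLOPs per byte of weight traffic for a matmul-dominated LLM pass.

    Each parameter contributes ~2 FLOPs (multiply + add) per token,
    and its bytes are fetched from memory once per forward pass.
    """
    return 2.0 * tokens_per_pass / bytes_per_param

# Decode: 1 new token per pass -> ~1 FLOP per byte -> memory-bandwidth-bound.
decode = arithmetic_intensity(tokens_per_pass=1)

# Prefill: whole prompt in one pass -> huge FLOPs per byte -> compute-bound.
prefill = arithmetic_intensity(tokens_per_pass=1024)

print(f"decode intensity:  {decode:.0f} FLOPs/byte")
print(f"prefill intensity: {prefill:.0f} FLOPs/byte")
```

So extra compute (more FLOPs) speeds up prefill, while decode only gets faster with more memory bandwidth.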

Do you run two eSIMs when traveling, and if so, how is stability / battery life?

I always run two SIMs/eSIMs simultaneously. Compared to the previous non-Apple modem it's night and day battery-wise.

Didn't notice any issues with connection speed/stability.


incredible work

sensei karpathy has done it again

Parakeet v3 has a much better RTFx (real-time factor) than Moonshine; it's not just about parameter counts. It runs faster.

https://huggingface.co/spaces/hf-audio/open_asr_leaderboard


That was my experience when I tried Moonshine against Parakeet v3 via Handy. Moonshine was noticeably slower on my 2018-era Intel i7 PC, and didn't seem as accurate either. I'm glad it exists, and I like the smaller size on disk (and presumably RAM too). But for my purposes with Handy I think I need the extra speed and accuracy Parakeet v3 is giving me.

It is about the parameter count if what you care about is edge devices with limited RAM. Beyond a certain size your model simply doesn't fit; it doesn't matter how good it is, you still can't run it.

I am not sure what "edge" device you want to run this on, but you can compress Parakeet to under 500MB of RAM / disk with dynamic quants and on-the-fly dequantization (GGUF or CoreML centroid-palettization style), and retain essentially all accuracy.
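The 500MB figure follows from simple size arithmetic. A quick sketch, assuming Parakeet v3 is roughly 0.6B parameters and an average of ~4.5 bits per weight for a dynamic quant scheme (both figures are illustrative, not a measured config):

```python
def quantized_size_mb(params: float, bits_per_weight: float) -> float:
    """Approximate on-disk/in-RAM size of a weight-only quantized model."""
    return params * bits_per_weight / 8 / 1e6  # bits -> bytes -> MB

fp16 = quantized_size_mb(0.6e9, 16)    # unquantized FP16 baseline
q45  = quantized_size_mb(0.6e9, 4.5)   # ~4.5-bit dynamic quants

print(f"FP16:    {fp16:.0f} MB")   # ~1200 MB
print(f"4.5-bit: {q45:.0f} MB")    # ~340 MB, comfortably under 500MB
```

The quant metadata (scales, centroids) adds a little on top, but the order of magnitude holds.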

And just to be clear, 500MB even fits on a Raspberry Pi. Then your problem isn't memory, it's FLOPS. It might run in real time on an RPi 5, which has around 50 GFLOPS of FP32, i.e. ~100 GFLOPS of FP16, so about 20-50 times less than a modern iPhone. I don't think it will quite keep real time, TBF, but close.
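A hypothetical back-of-envelope check of that claim. The per-second-of-audio model cost below (~60 GFLOPs) and the iPhone figure are assumed placeholders, not measured numbers; only the RPi 5 FLOPS figures come from the comment above:

```python
def realtime_factor(device_gflops: float, model_gflops_per_audio_sec: float) -> float:
    """Seconds of audio processed per wall-clock second (RTF > 1 = real time)."""
    return device_gflops / model_gflops_per_audio_sec

MODEL_COST = 60.0  # assumed GFLOPs per second of audio, encoder + decoder

rpi5_fp16 = realtime_factor(device_gflops=100.0, model_gflops_per_audio_sec=MODEL_COST)
iphone    = realtime_factor(device_gflops=2000.0, model_gflops_per_audio_sec=MODEL_COST)

print(f"RPi 5 (FP16):  {rpi5_fp16:.1f}x real time")
print(f"iPhone-class:  {iphone:.1f}x real time")
```

Real devices rarely hit peak FLOPS, so the RPi 5 landing near but below 1x in practice is consistent with "not quite real time, but close."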

Regardless, with such a quantization strategy this model runs at a 10x+ real-time factor even on 6-year-old iPhones (which you can acquire for under $200), offline and at a reasonable speed, essentially anywhere.

You get the best of both worlds: the accuracy of a Whisper-class transformer at the speed and footprint of a small model.


maybe a deepseek v4 distill. give it a few days


It's because of a chain of events:

Next week Chinese New year -> Chinese labs release all the models at once before it starts -> US labs respond with what they have already prepared

Also note that even in US labs a large proportion of researchers and engineers are Chinese, and many celebrate Chinese New Year too.

TLDR: Chinese New Year. Happy Year of the Horse, everybody!


Not trained on Ascend; that is BS. It was a Hopper GPU cluster. Please remove that claim.


wow Mistral really cooked

