
Do you know what the tricks are?


1. Don’t use LSTMs (4 vector-matrix multiplies) or GRUs (3 multiplies). Use a fixed HiPPO matrix to update the state. That's just 1 multiply, and since the matrix is fixed you can unroll the recurrence during training, which is much faster than backprop through time.

2. Write SIMD intrinsics by hand. None of the libraries are as fast.

3. Don’t use sigmoid or tanh functions as your nonlinear activation. Instead, approximate them with the softsign function, which is much cheaper.
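
To make point 3 concrete, here is a minimal sketch of softsign next to tanh. It's in Python purely for readability (a real-time audio path would do this in C/C++), and the sample points are arbitrary. Softsign has the same overall shape as tanh (odd, monotonic, bounded in (-1, 1)) but the values differ noticeably, so I'd assume you train the network with softsign rather than swap it into a tanh-trained model.

```python
import math

def softsign(x: float) -> float:
    """Softsign: x / (1 + |x|). No exp() call, just an abs, an add and a divide."""
    return x / (1.0 + abs(x))

# Compare against tanh at a few arbitrary points. Both saturate toward +/-1,
# but softsign approaches its limits much more slowly than tanh does.
for x in (-4.0, -1.0, 0.0, 0.5, 1.0, 4.0):
    print(f"x={x:+.1f}  tanh={math.tanh(x):+.4f}  softsign={softsign(x):+.4f}")
```

The sigmoid counterpart under the same assumption would be `0.5 * softsign(x) + 0.5`, which maps the output into (0, 1).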

It depends on the exact architecture, but these optimizations have yielded a 10-30x improvement for single-threaded, real-time audio applications on the CPU.
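
A rough sketch of why the fixed matrix in point 1 lets you unroll: if the update is linear, h_t = A h_{t-1} + B x_t, and A and B never change, then the final state is a weighted sum of the inputs with precomputable weights A^k B, so there is no sequential dependency to backprop through. The `A` and `B` below are small placeholder values, not an actual HiPPO matrix, and the sketch assumes a scalar input and no nonlinearity inside the recurrence.

```python
def matvec(A, v):
    """Plain matrix-vector product for lists of floats."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def step(A, B, h, x):
    """One recurrent update: h_new = A @ h + B * x (a single matrix-vector multiply)."""
    Ah = matvec(A, h)
    return [ah + b * x for ah, b in zip(Ah, B)]

def run_recurrent(A, B, xs):
    """Sequential form, as used at inference time: one step per input sample."""
    h = [0.0] * len(B)
    for x in xs:
        h = step(A, B, h, x)
    return h

def unrolled_kernel(A, B, length):
    """Precompute K[k] = A^k @ B. Only valid because A and B are fixed."""
    K, v = [], list(B)
    for _ in range(length):
        K.append(v)
        v = matvec(A, v)
    return K

def run_unrolled(K, xs):
    """Final state as a weighted sum over inputs: h_T = sum_k K[T-1-k] * x_k."""
    T, n = len(xs), len(K[0])
    h = [0.0] * n
    for k, x in enumerate(xs):
        for i in range(n):
            h[i] += K[T - 1 - k][i] * x
    return h

# Placeholder values for illustration only (not a real HiPPO matrix).
A = [[0.9, 0.0, 0.0],
     [0.1, 0.8, 0.0],
     [0.05, 0.1, 0.7]]
B = [1.0, 0.5, 0.25]
xs = [0.3, -0.1, 0.7, 0.2]

h_rec = run_recurrent(A, B, xs)
h_unr = run_unrolled(unrolled_kernel(A, B, len(xs)), xs)
print(h_rec)
print(h_unr)
```

Both forms give the same state up to floating-point rounding. In training, the unrolled form is effectively a convolution of the input with a fixed kernel, which parallelizes across time steps; at inference you'd still run the cheap one-multiply `step` per sample.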

When GPU audio matures all this may be unnecessary.



