Hacker News new | past | comments | ask | show | jobs | submit login

I'm wondering if it would make sense to use an H.264/5/6/AV1 encoder as the tokenizer, and then find some set of embeddings that correspond to the data in the resulting bitstream. The tokenization they're doing is morally equivalent to what video codecs already do.



This was already done in JPEG-LM [0] and it did work.

[0] https://arxiv.org/abs/2408.08459


Cryptomnesia!

Interestingly, they managed to train and inference on JPEG bitstream directly. I thought they'd need to at least build embeddings for those bitstream features or something.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: