I'm wondering if it would make sense to use an H.264/H.265/H.266/AV1 encoder as the tokenizer, and then learn a set of embeddings that correspond to the symbols in the resulting bitstream. The tokenization they're doing is morally equivalent to what video codecs already do.
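Something like this sketch of the idea (assuming ffmpeg and PyTorch; the file name, the embedding width, and the choice of raw bytes as the symbol alphabet are all my own placeholders, not anything from the paper):

```python
# Sketch: treat a codec's output bitstream as the token stream, and learn
# an embedding table over its symbols instead of tokenizing pixels.
import subprocess
import torch
import torch.nn as nn

# Re-encode a clip ("clip.mp4" is hypothetical) to a raw Annex B H.264
# bitstream on stdout; -an drops audio so only video bits come through.
raw = subprocess.run(
    ["ffmpeg", "-v", "error", "-i", "clip.mp4",
     "-an", "-c:v", "libx264", "-f", "h264", "-"],
    capture_output=True, check=True,
).stdout

# Crudest possible symbol alphabet: the 256 byte values of the bitstream.
tokens = torch.tensor(list(raw), dtype=torch.long)
embed = nn.Embedding(num_embeddings=256, embedding_dim=512)
x = embed(tokens)  # (bitstream_len, 512), ready to feed a sequence model
```

In practice you'd probably want to tokenize at the level of NAL units or entropy-coded syntax elements rather than raw bytes, but bytes make the point.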
Interestingly, they managed to train and run inference on the JPEG bitstream directly. I thought they'd at least need to build embeddings for those bitstream features or something.
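A rough sketch of what "directly" seems to mean (Pillow and PyTorch assumed; the file name and dimensions are made up, and this is my reading, not the paper's code): the compressed bytes themselves are the token IDs, and whatever structure the bitstream has is left for the learned embedding table and the model to pick up.

```python
# Sketch: no hand-built bitstream features; the JPEG bytes ARE the tokens.
import io
import torch
import torch.nn as nn
from PIL import Image

# Compress an image ("photo.png" is hypothetical) to JPEG in memory.
buf = io.BytesIO()
Image.open("photo.png").convert("RGB").save(buf, format="JPEG", quality=75)
jpeg_bytes = buf.getvalue()  # headers, Huffman tables, entropy-coded data, all of it

# One learned vector per byte value; nothing codec-specific is engineered in.
tokens = torch.tensor(list(jpeg_bytes), dtype=torch.long)
embed = nn.Embedding(256, 512)
x = embed(tokens)  # (num_bytes, 512), same interface as any token sequence
```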