I really hope Dynamic Byte Latent Transformers work out. Death to tokenizers!
Interesting that it's a hierarchical structure but only two levels of hierarchy. Stacking more levels seems like an obvious direction for further research.
Author here :). I do think it's a good direction to look into! That said, aside from it being a bit too much to do at once, you'd also have to be careful about how you distribute your FLOP budget across the hierarchy. With two levels, you can make one level (bytes/local encoder) FLOP efficient and the other (patches/global encoder) FLOP intensive. You'd also need to find a way to group patches into larger units. But ya, there are many directions to go from here!
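A rough back-of-the-envelope sketch of the FLOP split the author describes, using the usual ~2 FLOPs per parameter per position approximation. The parameter counts, sequence length, and average patch size below are made-up illustrative values, not numbers from the paper:

```python
# Sketch of how a two-level hierarchy splits compute: a small model runs at
# byte granularity, a large one at patch granularity. All numbers are
# hypothetical, chosen only to illustrate the asymmetry.

def forward_flops(params: float, positions: int) -> float:
    """Rough transformer forward cost: ~2 FLOPs per parameter per position."""
    return 2.0 * params * positions

num_bytes = 8_192            # bytes in one sequence
avg_patch_size = 8           # bytes grouped into each patch (assumed)
num_patches = num_bytes // avg_patch_size

local_params = 100e6         # lightweight local (byte-level) encoder
global_params = 7e9          # heavyweight global (patch-level) model

local_cost = forward_flops(local_params, num_bytes)
global_cost = forward_flops(global_params, num_patches)

print(f"local encoder: {local_cost:.2e} FLOPs over {num_bytes} byte positions")
print(f"global model:  {global_cost:.2e} FLOPs over {num_patches} patch positions")
print(f"global share of total: {global_cost / (local_cost + global_cost):.1%}")
```

Even with the cheap model touching every byte and the expensive model touching only one position per patch, the global level dominates the budget, which is why adding more levels forces you to rethink how compute is allocated at each one.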
In a way I'm kinda sad if tokenizers go the way of the dinosaurs, as asking someone to give me a Unicode character from the private use area was one of the last ways you could actually distinguish a co-operative human from an LLM online.
Those characters simply aren't in the tokenizer's vocabulary, so the model can't output them. (But this is technically moot if the LLM has a Python interpreter handy.)
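For illustration, the workaround is trivial once code execution is available: a few lines of Python can emit a Private Use Area character by code point, sidestepping the tokenizer entirely.

```python
# A code interpreter can emit a Private Use Area character directly by code
# point, even if the model's tokenizer never produces one as a token.
pua_char = chr(0xE000)  # first code point of the BMP Private Use Area (U+E000-U+F8FF)
print(pua_char)
print(f"U+{ord(pua_char):04X}")  # -> U+E000
```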