If you rely on context to disambiguate, then you have to use attention, which is even more resource-intensive.
You want to be as state-free as possible. Your tokenizer should match your vocab and be unambiguous. I think your goal is sound, but you're golfing for the wrong metric.
Most C++ programs written before P0593R6 depended on implementation behaviour, and were graciously allowed to not be undefined behaviour just 5 years ago. C++ as a language standard is mostly irrelevant; what one should care about is what the compiler authors consider valid code.
You have to rely on the implementation for anything to do with what happens to memory after it is freed, or really almost anything to do with actual bytes in RAM.
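For anyone who hasn't run into it, here is a minimal sketch of the kind of code P0593R6 is about. The names `Point` and `sum_via_malloc` are made up for illustration. The pattern is the classic C-style one: `malloc` some storage and start using it as a struct without ever constructing an object there.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical example type: a trivial aggregate, i.e. an
// "implicit-lifetime type" in P0593R6 terms.
struct Point {
    int x;
    int y;
};

// Hypothetical helper showing the malloc-then-use pattern.
int sum_via_malloc(int x, int y) {
    // No constructor call and no placement new: under the pre-P0593R6
    // wording, no Point object formally existed in this storage, so the
    // member accesses below were technically undefined behaviour.
    // With P0593R6, malloc implicitly creates the object.
    Point* p = static_cast<Point*>(std::malloc(sizeof(Point)));
    if (p == nullptr) {
        throw std::bad_alloc{};
    }
    p->x = x;
    p->y = y;
    int sum = p->x + p->y;
    assert(sum == x + y);
    std::free(p);
    return sum;
}
```

Every mainstream compiler has always accepted this and done the obvious thing, which is the point: the code worked because implementers treated it as valid, not because the standard text said so.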
The default case should be the safe correct one, even if it “breaks” backward compatibility. Without it, we will forever be saddled with the design mistakes of the past.