A sequence of tokens can be converted back to the exact sequence of characters that was tokenized, without any loss of information. E.g., how do you think the text an LLM generates gets rendered for the user? Different tokenization schemes arrange that information differently, though, and (hand-waving here) may make it harder or easier for the model to reason about details that tokenization obscures, like raw character counts. If the training set included enough examples of character-counting Q/A pairs, an LLM would have no trouble learning that task.
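For concreteness, here's a minimal sketch of that lossless round trip in Python, using OpenAI's tiktoken library as one example tokenizer. The encoding name and the sample string are just illustrative choices, not anything the point depends on:

```python
# pip install tiktoken
import tiktoken

# Grab one concrete tokenizer; any scheme with matched
# encode/decode behaves the same way for this purpose.
enc = tiktoken.get_encoding("cl100k_base")

text = "strawberry"
tokens = enc.encode(text)  # a short list of integer token IDs, not characters

# Lossless round trip: decoding recovers the original string exactly.
assert enc.decode(tokens) == text

# But the character-level facts live in the decoded string, not in the IDs:
# nothing about the integers themselves says how many 'r's each token spans.
print(len(text), text.count("r"))  # 10 3
```

The decode side is exactly what happens when generated tokens are rendered as text, which is why no information is lost. The catch for the model is that it sees the IDs, not the decoded characters, so answering "how many r's?" means it has to have learned the spelling of each token.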