Representing all languages is OK as a goal -- adding Klingon and BS emojis not so much (from a sanity perspective, if adding them meddled with having a logical and simple representation of characters).
So, it comes down to "because some visible characters are made up of many graphemes, the number of single code points would be huge" and "while for some languages it's feasible to normalize them to single code points, for other languages it would not be".
Wouldn't 32 bits be enough for all possible valid combinations? I see e.g. that: "The largest corpus of modern Chinese words is as listed in the Chinese Hanyucidian (汉语辞典), with 370,000 words derived from 23,000 characters".
And how many combinations are there of stuff like Hangul? I see that's 11,172. Accents in languages like Russian, Hungarian, Greek should be even easier.
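For what it's worth, the 11,172 figure can be reproduced with simple arithmetic, since the precomposed Hangul syllable block is laid out algorithmically. A rough sketch, assuming the usual 19 leading consonants, 21 vowels and 28 trailing slots (27 finals plus "none"); the helper name `compose_hangul` is made up for illustration:

```python
# Counts of jamo that combine into a modern Hangul syllable.
LEADS, VOWELS, TAILS = 19, 21, 28   # 28 = 27 final consonants + "no final"
SYLLABLE_BASE = 0xAC00              # first precomposed syllable, U+AC00


def compose_hangul(lead: int, vowel: int, tail: int = 0) -> str:
    """Map zero-based jamo indices to the precomposed syllable (hypothetical helper)."""
    return chr(SYLLABLE_BASE + (lead * VOWELS + vowel) * TAILS + tail)


total = LEADS * VOWELS * TAILS        # number of precomposed syllables
first = compose_hangul(0, 0)          # first syllable in the block
last = compose_hangul(18, 20, 27)     # last syllable in the block

print(total, hex(ord(first)), hex(ord(last)))
```

So the whole block is 19 x 21 x 28 slots, and no lookup table is needed to compose or decompose a syllable, just integer division.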
Now, having each accented character as a separate code point might take some lookup tables -- but we already require tons of complicated lookup tables for string manipulation in UTF-8 implementations IIRC.
I'm curious why you think that UTF-8 requires complicated lookup tables.
Because in the end it's still a Unicode encoding, and still has to deal with BS like "equivalence", right?
Which is not mechanically encoded in the, err, encoding (i.e. it's not the case that all equivalent characters have the same bit pattern) but needs external tables for that.
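To make the equivalence point concrete, here is a small sketch using Python's standard `unicodedata` module (the variable names are just for illustration). The same visible "é" can be one code point or two, the UTF-8 bytes differ, and only a table-driven normalization step makes them compare equal:

```python
import unicodedata

composed = "\u00e9"      # "é" as a single precomposed code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

# The raw UTF-8 byte sequences are different...
same_bytes = composed_bytes == decomposed_bytes

# ...and equality only holds after normalization, which relies on the
# Unicode character database (the "external tables").
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed

print(composed_bytes, decomposed_bytes, same_bytes, same_after_nfc)
```

Note that this is a property of Unicode itself and not of UTF-8 specifically: the same two code point sequences exist in UTF-16 and UTF-32 as well.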
And I added that while this might need some lookup tables, we already have those in UTF-8 too anyway (a non-fixed-width encoding).
So the reason I didn't mention UTF-16 and UTF-32 is that UTF-32 is already fixed-width to begin with, and UTF-16 is mostly treated that way even though it needs surrogate pairs outside the BMP (and both are increasingly less used nowadays except on platforms stuck with them for legacy reasons) -- so the "competitor" encoding would be UTF-8, not them.
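As a quick sanity check on the widths being compared here, a sketch that counts code units per code point in each encoding (the `units` helper and the sample characters are my own picks, not anything from the thread):

```python
def units(ch: str, encoding: str, unit_size: int) -> int:
    """Number of code units a single code point takes in the given encoding."""
    return len(ch.encode(encoding)) // unit_size


# ASCII letter, accented Latin letter, CJK ideograph, emoji outside the BMP.
samples = ["A", "\u00e9", "\u6c49", "\U0001F600"]

# For each sample: (UTF-8 bytes, UTF-16 16-bit units, UTF-32 32-bit units).
widths = {
    ch: (
        units(ch, "utf-8", 1),
        units(ch, "utf-16-le", 2),
        units(ch, "utf-32-le", 4),
    )
    for ch in samples
}

for ch, w in widths.items():
    print(f"U+{ord(ch):04X}", w)
```

Only UTF-32 comes out as one unit per code point across the board; UTF-16 needs two units (a surrogate pair) for anything outside the BMP.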