Encoding a 'u' into variable names for "unsafe" strings and omitting it for ones that have been encoded safely for HTML bodies is nice. If this is the only problem to solve and you can universally apply this technique then it solves the "NUL terminated string of the 2010s", and I may use it but…
What about encoded safely for URLs?
What about encoded safely for use as a Unix file name?
What about encoded safely for SQL? (yes, don't do that, I know).
How about encoded safely for 7-bit-ASCII-only applications?
How about the output of base64? It isn't unsafe for HTML, but it is for URLs.
What about that bit of HTML that the user entered using their WYSIWYG field that needs to be sanitized but not encoded? 'u' seems right, but Encode(uVar) isn't the proper handling.
How about that library that doesn't know the convention, so its function calls all look like they are encoded safely for HTML since they don't start with 'Us'?
What about the library that used 'u' for UTF-8?
(Oh, and…
dosomething()
cleanup()
… is broken. Requiring a caller of dosomething() to remember to cleanup() is about as polite as leaving set bear traps in your living room furniture and reminding guests to check under the cushions before they sit. Programmers are used to that sort of abuse (remember to fclose() what you successfully fopen()), but the future should really provide us with better constructs, and it not being 1980 anymore, we are the future.)
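For what it's worth, here is a minimal sketch of one such construct, assuming Python. dosomething() and cleanup() are the names from above; the `doing_something` wrapper and the `events` list are made up purely for illustration. The point is that the caller can no longer forget the cleanup, even if the body raises.

```python
from contextlib import contextmanager

events = []  # hypothetical log, only here so the ordering is visible


def cleanup():
    events.append("cleanup")


@contextmanager
def doing_something():
    """Pair the setup step with cleanup() so callers cannot forget it."""
    events.append("dosomething")  # stands in for the real dosomething()
    try:
        yield
    finally:
        # Runs on normal exit and when the body raises.
        cleanup()


with doing_something():
    events.append("work")
```

Other languages get the same effect with destructors (RAII), `defer`, or try-with-resources.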
Yes, exactly, the meaning of 'safe string' depends on its context.
I personally believe that all data that is sensitive to escaping issues should be escaped on output (or on storage, on sending to the database, whatever) by default, unless you explicitly opt it out.
In our framework, we maintain 'already escaped' strings as a separate class, forcing the developer to acknowledge this fact. The HTML output layer escapes all strings unless they are instances of this safe class. Similarly for the SQL layer, all values get quoted unless they were explicitly marked as 'safe'.
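A rough sketch of the escape-by-default idea, assuming Python. This is not our framework's actual code; `SafeHtml` and `render` are hypothetical names, and only `html.escape` is a real standard-library call.

```python
import html


class SafeHtml(str):
    """Hypothetical marker type: the content is already escaped for HTML bodies."""


def render(value):
    """Output layer: escape everything unless it was explicitly marked safe."""
    if isinstance(value, SafeHtml):
        return str(value)
    return html.escape(str(value))


user_input = "<script>alert(1)</script>"
escaped = render(user_input)           # plain str, so it gets escaped
trusted = render(SafeHtml("<b>hi</b>"))  # explicit opt-out, passed through
```

The opt-out is visible at the call site, which is the whole point: the unsafe path is the one you have to type extra characters for.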
We build queries using a syntax-builder framework, but it is possible to pass in raw SQL strings in exceptional circumstances. Obviously, the developer needs to be fully aware of the risks of this. Not ideal, but needed to solve a couple of problems.
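Something along these lines, sketched in Python with the standard sqlite3 module. Note that this sketch uses bound parameters rather than quoting, which is a swapped-in technique and not necessarily what our SQL layer does; `RawSql` and `where_equals` are hypothetical names.

```python
import sqlite3


class RawSql(str):
    """Hypothetical marker: the caller vouches that this fragment is safe SQL."""


def where_equals(column, value):
    """Tiny builder step: the column must be marked RawSql, the value is always bound."""
    if not isinstance(column, RawSql):
        raise TypeError("column names must be explicitly marked as RawSql")
    return f"{column} = ?", (value,)


nasty = "Robert'); DROP TABLE users;--"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES (?)", (nasty,))

clause, params = where_equals(RawSql("name"), nasty)
rows = conn.execute(f"SELECT name FROM users WHERE {clause}", params).fetchall()
```

Passing a plain string where `RawSql` is expected fails loudly instead of being spliced into the query.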
Except now the combinatorial explosion gets even worse. How do you prefix a b64 blob which has been SQL-escaped, urlencoded and htmlencoded? `urlhtmlsqlb64String`?
It seems like 'combinatorial explosion' might not be the right phrase here. There is a combinatorial explosion of possible prefixes, but, since you're not going to search the possible-prefix space, it doesn't really matter. (For example, there is a combinatorial explosion of unprefixed variable names, and nobody's bothered by that!) The question is whether you can decode a given prefix, and, even in your example, that's easy. (Prefix ambiguities could easily be resolved with underscores, like url_html_sql_b64_string.)
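To make the "easy to decode" claim concrete, here is a small sketch in Python. The set of known encodings and the `decode_prefix` helper are assumptions for illustration, not part of any real convention.

```python
KNOWN_ENCODINGS = {"url", "html", "sql", "b64"}  # assumed vocabulary


def decode_prefix(name):
    """Return the leading underscore-separated encoding tags of a variable name."""
    encodings = []
    for part in name.split("_"):
        if part not in KNOWN_ENCODINGS:
            break  # first non-tag part starts the actual name
        encodings.append(part)
    return encodings


tags = decode_prefix("url_html_sql_b64_string")
```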
I'm not arguing for this convention (I don't much like it myself); but I don't think your argument is a very strong point against it.
EDIT: Oops, sorry, I just noticed that it wasn't you who brought up the explosion in the first place.