Great list. A few questions:

* How could this be used to test 'corrupt' characters? Doesn't the process of saving the file as UTF-8 itself un-corrupt the file?

* Is there a recommended way to group these into "strings that should pass validation" versus "strings that should fail", or is that too application-specific?

