Many filesystems don't support Unicode, or support only a subset of it.
> Also domains and URLs can be unicode too.
Domains: it depends on the level at which you are dealing with them. See https://en.wikipedia.org/wiki/Internationalized_domain_name
Internationalized domain names are stored in the Domain
Name System as ASCII strings using Punycode transcription.
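To make the Punycode point concrete, here is a minimal sketch using Python's built-in "idna" codec (which, as far as I know, implements the older IDNA 2003 rules). The function names and the hostname "bücher.example" are made up for illustration.

```python
# Sketch: round-trip an internationalized hostname through its ASCII form.
# Uses only the standard library's "idna" codec (IDNA 2003 rules).

def to_ascii_hostname(hostname: str) -> str:
    """Return the ASCII (Punycode) form that would be stored in DNS."""
    return hostname.encode("idna").decode("ascii")


def to_unicode_hostname(hostname: str) -> str:
    """Return the human-readable Unicode form of an ASCII hostname."""
    return hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    ascii_name = to_ascii_hostname("bücher.example")  # hypothetical hostname
    print(ascii_name)
    print(to_unicode_hostname(ascii_name))
```

So the application layer can show Unicode while everything underneath stays plain ASCII.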
URLs: RFC 1738 states that only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.
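In practice that means non-ASCII characters in a URL path get percent-encoded as UTF-8 bytes. A rough sketch with Python's urllib.parse follows; the helper names and the sample path are mine, and note that quote() uses its own set of always-safe characters (letters, digits and "_.-~"), which is not identical to the list quoted above.

```python
# Sketch: percent-encode a URL path so that only ASCII goes over the wire.
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode a path as UTF-8, leaving '/' separators alone."""
    return quote(path, safe="/")


def decode_path(path: str) -> str:
    """Reverse encode_path, decoding the percent-escapes as UTF-8."""
    return unquote(path)


if __name__ == "__main__":
    encoded = encode_path("/wiki/Zürich")  # hypothetical path
    print(encoded)
    print(decode_path(encoded))
```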
For example, a few years ago I grabbed a source tarball from somewhere; I forget what or where. It had the author's name in a comment, which included an O with dots over it. That was the only non-ASCII character in the source code. No matter what I did, both Eclipse and command-line javac refused to compile the source.
Finally I wrote a script to delete his name from every source file. After that it compiled flawlessly.
Then there's the time I found some text files with two characters of binary junk at the beginning, followed by completely normal text. Again, I forget what I was doing, but some program was refusing to process them correctly. The junk turned out to be something internationalization-related called the BOM (byte order mark). Eventually I ended up writing a script to walk a directory and remove the first two bytes of every file. (This can probably be done with dd and xargs on UNIX, but I was using Windows at the time, which means that something like this requires spending an hour or so in your favorite programming language.)
These experiences lead me to believe that, for bootstrapped USA startups at least, you shouldn't worry about a market outside the English-speaking world.
If you need to worry about junk like accented characters or moon runes (Chinese/Japanese/Korean characters), it means you're big enough to afford to hire someone specifically to address the problem.
How do you define "English-speaking world", btw? Those too ignorant to have heard of non-ASCII characters (i.e. excluding Canada, as anyone doing business there should at least have heard of French)?
Anyway, anyone actually burnt by something similar on a GNU system should try looking up recode(1).
And personally I think excluding all internationalisation because it's harder is a terrible attitude to have. Particularly these days, when there are online tutorials for pretty much any job imaginable (not to mention the number of helpful experts willing to give up their time for free on various forums and communities).
Ok, this is where I stop worrying about how quickly I write code. I did this (removing the BOM) quite a few times and it took just a few minutes in Python (under Windows). Heck, this could be a two-liner, I think :)
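Not quite a two-liner, but here is a rough sketch of what such a script might look like. It assumes the files are UTF-8 or UTF-16 with a leading BOM, it only touches files matching a glob pattern, and the names strip_bom and strip_bom_in_tree are my own. It does not handle UTF-32, and stripping a UTF-16 BOM still leaves UTF-16 text behind, so for those files you would probably want to re-encode rather than just cut bytes.

```python
# Sketch: walk a directory and remove a leading byte order mark from files.
import codecs
from pathlib import Path

# BOMs this sketch knows about: UTF-8 is 3 bytes, UTF-16 is 2 bytes.
BOMS = (codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading BOM; unchanged if there is none."""
    for bom in BOMS:
        if data.startswith(bom):
            return data[len(bom):]
    return data


def strip_bom_in_tree(root: str, pattern: str = "*.txt") -> int:
    """Rewrite matching files under root in place; return how many changed."""
    changed = 0
    for path in Path(root).rglob(pattern):
        if not path.is_file():
            continue
        original = path.read_bytes()
        stripped = strip_bom(original)
        if stripped != original:
            path.write_bytes(stripped)
            changed += 1
    return changed
```

Checking for the BOM first, rather than blindly cutting the first two bytes, means files without one are left alone.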
EDIT: It works fine for comments, at least.