
Stop using Windows Notepad - nreece
http://blogs.msdn.com/michkap/archive/2010/02/23/9967789.aspx
======
harshpotatoes
So there is this character, the BOM, which is explicitly allowed by UTF-8
standard. And while it is not really necessary, nor recommended, you are still
allowed to put it in your UTF-8 text files to signal that this text file is
encoded in UTF-8. Then there are all of these *nix programs, which don't know
how to deal with these BOMs, simply because the first character in the file
they're reading from isn't the shebang or <?php or something like that, but is
instead this completely allowable BOM. And everybody believes MS should be
fixing their product? Am I missing something?
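
A minimal sketch of the failure described here, in Python rather than anything from the thread. It assumes the loader only recognises a script when its first two bytes are `#!`, and uses Python's `utf-8-sig` codec to stand in for an editor that prepends the UTF-8 signature. The helper name `starts_with_shebang` is made up for illustration.

```python
import codecs

script = "#!/bin/sh\necho hello\n"

plain = script.encode("utf-8")       # no signature at the start
signed = script.encode("utf-8-sig")  # same text with the UTF-8 BOM prepended


def starts_with_shebang(data: bytes) -> bool:
    """Return True if the first two bytes are the '#!' marker."""
    return data[:2] == b"#!"


# The signature is the three bytes EF BB BF, i.e. U+FEFF encoded as UTF-8.
assert signed[:3] == codecs.BOM_UTF8 == "\ufeff".encode("utf-8")

print(starts_with_shebang(plain))   # the shebang is where the loader expects it
print(starts_with_shebang(signed))  # the BOM has pushed it three bytes in
```

The text is identical in both cases; only the three leading bytes differ, which is why the problem is hard to see in an editor.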

~~~
wendroid
<http://plan9.bell-labs.com/sys/doc/utf.html>

Doesn't say anything about BOM.

~~~
harshpotatoes
"In UTF-8, the BOM corresponds to the byte sequence <EF₁₆ BB₁₆ BF₁₆>. Although
there are never any questions of byte order with UTF-8 text, this sequence can
serve as signature for UTF-8 encoded text where the character set is
unmarked."

<http://www.unicode.org/versions/Unicode5.2.0/ch16.pdf>

~~~
wendroid
You can always trust a standards body to screw something up.

------
gfodor
This is like one of those moments where you are quickly brought back to a
painful memory from a more troubled time in your life, so distant in mindset,
if not time itself, that it has faded to be less a memory of life itself than
that of a bad dream.

You shake it off, and it recedes back into the cobwebs, not to be jarred again
until the next time something flies by in your RSS feed about a new Microsoft
product or feature that is a poorly conceived sugar-coated re-implementation
of cron, grep, sed, bash, vim, awk, ssh, unix pipes, or lisp.

Yes, I too was once a full-time Windows user.

------
jrockway
Ok, but everything handles the UTF-8 "BOM" just fine. Adding it only causes
problems when you ./file.sh that has a BOM, and haven't set up your binfmt_misc
magic properly. Fix that, and it works fine.

~~~
jmillikin

      Ok, but everything handles the UTF-8 "BOM" just fine.
    

For small values of "everything". I can't even count the number of bizarre
errors I encounter, only to discover that one of the files I'm working on has
been corrupted by somebody carelessly using Notepad. Usually it's not obvious
what's going on -- the error will be something like "Error parsing file:
illegal byte before start of message body", which provides not much help when
it's 23:30 and I'm staring at a text editor wondering what the hell byte it's
talking about.
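
For the "what byte is it talking about" moment, a rough Python sketch (not from the thread) that prints the first few bytes of a file's contents in hex and strips a leading UTF-8 signature if one is there. The function names `leading_bytes_hex` and `strip_utf8_bom` are hypothetical helpers.

```python
import codecs


def leading_bytes_hex(data: bytes, count: int = 4) -> str:
    """Format the first `count` bytes as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in data[:count])


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (EF BB BF) if present; otherwise return data unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


sample = codecs.BOM_UTF8 + b"<?xml version='1.0'?>"
print(leading_bytes_hex(sample))                   # hex of the first four bytes
print(leading_bytes_hex(strip_utf8_bom(sample)))   # same, after stripping
```

To use it on a real file, read the file in binary mode (`open(path, "rb").read()`) so that no decoding hides the bytes.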

~~~
pak
And therein lies the insanity that is a BOM: it's a character that's meant to
be invisible, even to your UTF-8 capable text editor. With this in mind, it's
not clear what _is_ supposed to be able to view or edit this character, short
of a hex editor. Somebody else mentioned even cat ignores it; how are you
supposed to easily tell that it's there or not?

IMHO a BOM is completely ridiculous to have on a UTF-8 file, because UTF-8 has
no ambiguity over endian-ness that a BOM needs to resolve, and it breaks the
principle that UTF-8 can serve as ASCII when all the codepoints are <= 0x7F.

Is there any point to using it in UTF-8 besides pain, suffering, and "well
UTF-16 and UTF-32 have it"?

The answer, straight from the horse's mouth.

<http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf> , pg. 36

"Use of a BOM is neither required nor recommended for UTF-8, but may be
encountered in contexts where UTF-8 data is converted from other encoding
forms that use a BOM or where the BOM is used as a UTF-8 signature."

That's right, the UTF-8 BOM is neither required nor recommended. I rest my
case.
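
To make the two points above concrete, here is a small Python sketch, offered as an illustration rather than anything authoritative. It checks for the signature on raw bytes and shows that ASCII-only text stops being pure ASCII once a BOM is prepended. `has_utf8_bom` and `is_pure_ascii` are made-up names, and `utf-8-sig` is Python's codec that writes the signature.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the data begins with the UTF-8 signature EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def is_pure_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range."""
    return all(b <= 0x7F for b in data)


text = "hello\n"                       # only codepoints <= 0x7F
without_bom = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")

print(has_utf8_bom(without_bom), is_pure_ascii(without_bom))
print(has_utf8_bom(with_bom), is_pure_ascii(with_bom))
```

Without the signature the UTF-8 bytes are byte-for-byte the ASCII bytes; with it, the file starts with three bytes above 0x7F and an ASCII-only consumer will reject or mangle them.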

------
donaq
Hmm. I am as bewildered by this as the author. Why would anyone edit shell
scripts with notepad?

~~~
btipling
A terrifying fear of vim modes, maybe. I guess people stick with what they're
used to. I don't know, I'm trying here, I really am.

~~~
rbanffy
joe and nano may be your friends.

If one is afraid of Vim, I would not recommend Emacs.

------
sliverstorm
I agree with the author. Right or wrong, Notepad:

1: is on every Windows machine ever

2: just works

I leave it to you, critics, to select something, anything, that is a
fundamental tool available on every *nix machine, and just stop using it.

Hint: nobody but a purist is going to stop using `cat` or `cd` because of a
minor incompatibility with a small percentage of computers.

I love Linux, I love Unix, but I also love that Windows had and still has
Notepad. One of Microsoft's most successful apps in my eyes, that honestly
works. Hell, in any case if it didn't have at least one flaw I'd start getting
suspicious wondering if M$ really did write it after all.

~~~
sketerpot
Does Notepad support large files nowadays, and line endings other than CR-LF?
Because for a long time Notepad definitely did _not_ Just Work.

~~~
sliverstorm
Ok, it wasn't perfect, but the only time CR-LF was an issue was moving to Unix
machines, so I made a habit of doing dos2unix/unix2dos or perl one-liners. I
suppose I should retract my use of just works. Notice you capitalize it and I
don't; I think that explains everything.

Just Works: No user intervention required whatsoever, could be used by a 2nd
grader with no prior knowledge

just works: does its job with perhaps a bug or two that can be worked around
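
The line-ending workaround mentioned above can be sketched in a few lines of Python. This is only an approximation of the idea, not a reimplementation of the real dos2unix/unix2dos tools, and the function names are hypothetical.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Convert Windows CR-LF line endings to Unix LF."""
    return data.replace(b"\r\n", b"\n")


def lf_to_crlf(data: bytes) -> bytes:
    """Convert LF line endings to CR-LF.

    Normalises to LF first so that existing CR-LF pairs are not
    turned into CR-CR-LF.
    """
    return crlf_to_lf(data).replace(b"\n", b"\r\n")


windows_text = b"first line\r\nsecond line\r\n"
unix_text = crlf_to_lf(windows_text)
print(unix_text)
print(lf_to_crlf(unix_text) == windows_text)
```

Working on bytes rather than decoded text keeps the conversion independent of the file's encoding, as long as that encoding is ASCII-compatible (UTF-8 is; UTF-16 is not).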

------
jmillikin

      A long time ago, someone decided that if your file was 100% ASCII and you chose to save it as UTF-8 and you opened the file up again and added some >0x007f character and later saved again that you should not be prompted
    

The author makes this sound like an unreasonable request, but I don't see the
problem. All ASCII text is also UTF-8; if a file is valid ASCII, it should be
opened and saved in UTF-8 mode.
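
A quick Python sketch of that claim, under the assumption that "saved in UTF-8 mode" means plain UTF-8 with no signature. The variable names are illustrative only.

```python
original = b"plain ASCII line\n"   # a file that is 100% ASCII

text = original.decode("utf-8")    # valid ASCII decodes as UTF-8 unchanged
edited = text + "caf\u00e9\n"      # add a character above U+007F
saved = edited.encode("utf-8")     # save as UTF-8, no BOM written

# The untouched ASCII part of the file is byte-for-byte the same as before.
print(saved.startswith(original))
print(saved)
```

Only the new non-ASCII character takes more than one byte; nothing about the existing content has to change, so there is nothing to prompt the user about.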

    
    
      P.S. Isn't there some tool on UNIX that does this correctly?
    

Yes, nearly all of them.

------
pasbesoin
I haven't been to his blog in a while, but Raymond Chen has made various
elucidating comments pertaining to Notepad, over the years.

[http://www.google.com/#hl=en&source=hp&q=site%3Ablog...](http://www.google.com/#hl=en&source=hp&q=site%3Ablogs.msdn.com%2Foldnewthing%2F+notepad&btnG=Google+Search&aq=f&aqi=&aql=&oq=site%3Ablogs.msdn.com%2Foldnewthing%2F+notepad)

------
rbanffy
I must confess I never faced this problem. What annoys me no end is Windows
end-line CRLF pairs.

That insanity must end.

Oh, and a simple solution to this problem is simply to ban Windows. It also
solves a whole lot of other problems and creates an incentive for keeping sane
corporate networks (with interoperable applications and so on).

------
rbanffy
I wonder if a crippled, totally brain-dead, but formally correct
implementation is not Microsoft's way to discourage usage of a standard while
adhering to it in order to get government business...

POSIX subsystem, anyone?

------
hexley
That dialog box is a linguistic nightmare. What's with the OK and Cancel
buttons? I thought Windows was changing to "Save" and "Don't Save" like they
should have done all along.

~~~
ryanelkins
I think that dialog box is from his personal, private build that he mentions
in the footnotes. Therefore, he probably doesn't really care how user friendly
it is. I'm actually surprised it's as well written as it is.

