
Excel: Error when CSV file starts with “I” and “D” - rompic
https://support.microsoft.com/en-us/kb/323626
======
mckoss
Maybe my bug? I wrote most of the original Excel text-based file import code
in 1984 (including the SLYK format, which was the main way we migrated files
from Multiplan to Excel).

~~~
rjbwork
This is one of the great things about HN, and indeed the internet at large.
Someone posts a link to a little quirk of our world, and then someone is able
to pipe up and say "hey that's me! I made that thing!" Even better when it's
something like a software bug.

~~~
drzaiusapelord
Yeah but for every superstar post like this there are thousands of equally
upvoted yet ignorant comments that play on the sympathies/bias of HN readers
for upvotes.

I wish we had more of the former and less of the latter. I suspect no forum is
ever safe from eternal September.

~~~
dredmorbius
What do you suggest as a means for achieving that end?

~~~
drzaiusapelord
Cites from journals (not politicized websites like huffpo, guardian, etc) ,
credential listings, publication listings, etc.

Something like this could filter out the "random geek wanna-be with an axe to
grind" type post.

~~~
dredmorbius
Essentially a reputation system, based on publisher or specific known authors
then?

How would, say, something like Thompson's "Trusting Trust" (though I suppose
that _was_ published in ACM or IEEE), or a Dijkstra or Pike blog post, rate?

Comments from, say, Linus Torvalds on the LKML, or Lennert Pottering on
systemd, or Bill Gates' various book recommendations, etc.?

------
combatentropy
I've run into this error many times for a decade, always when I make a CSV
file of a table whose first column is "ID".

It seems to me that Microsoft by now could have improved its tests. If the
first two letters are "ID", but if "there are no valid SYLK codes after the
'ID' characters," then maybe it was never meant to be an SYLK file. If the
file's suffix is .csv, then maybe you should just treat it as a CSV file. If
the file's suffix is .txt, then maybe you should treat it as a text file.

~~~
13of40
If you're a fan of this sort of thing, try the following in CMD. (It helps if
you read it in the voice of a 90's rapping cartoon cat.)

echo MZ is my name my name is MZ. I'm the hippest loader bug from the sea to
the sea! >foo.txt

.\foo.txt

~~~
Retr0spectrum
What does this do? I don't have access to a windows machine.

~~~
13of40
For historical reasons, the MZ is interpreted as the header of a 16-bit
executable. When you run something under CMD (hopefully I'm getting this
right, it's been a few years) it first runs it using the CreateProcess API,
which just tries to run the file as an executable, regardless of extension,
then it falls back to ShellExecute if it doesn't have an MZ header, which
dispatches based on extension.

Basically you get a dialog saying the text file isn't a compatible executable.
If you change the MZ to anything else, it opens in notepad.

~~~
tehmaco
Windows 7 64 bit here - it just has the error message box, and the same text
in the cmd window, it doesn't open the file though :)

"This version of foo.txt is not compatible with the version of Windows you're
running. Check your computer's system information to see whether you need a
x86 (32-bit) or x64 (64-bit) version of the program, and then contact the
software publisher."

~~~
JonathonW
That's because it's treating it as a 16-bit DOS executable (since it has the
'MZ' magic number but not the rest of the PE header that tells Windows that
it's 32 or 64-bit). 64-bit Windows can't run 16-bit executables at all.

32-bit Windows should fail a little later in the execution process; it can run
16-bit software, but your text file is missing the rest of the MZ-format
header.

------
sly010
Reminds me of a charset detection bug in some versions of notepad [1]:

1\. Open new file

2\. Type "Bush hid the facts"

3\. Save the file

4\. Open the file

5\. The content of the file have changed to "畂桳栠摩琠敨映捡獴"

[1]
[https://en.wikipedia.org/wiki/Bush_hid_the_facts](https://en.wikipedia.org/wiki/Bush_hid_the_facts)

~~~
cesarbs
There was a similar sentence in Brazilian Portuguese. Something about our
largest TV network (Globo) lying about the existence of Acre (a state people
jokingly say doesn't exist, like North Dakota in the US). I can't remember the
exact sentence now though.

:)

~~~
javawizard
acre vai pra globo?

~~~
cesarbs
Oooh, that was it. I had a completely wrong recollection of it, so my previous
comment is BS now :/

------
mhd
So this would be another problem that could be avoided if "CSV" would actually
friggin mean " _comma_ separated values" and not "whatever Windows might
consider comma-equivalent depending on locale and/or phase of the moon"?

~~~
boulos
No. From the detailed description:

> A SYLK file is a text file that begins with "ID" or "ID_xxxx", where xxxx is
> a text string. The first record of a SYLK file is the ID_Number record. When
> Excel identifies this text at the beginning of a text file, it interprets
> the file as a SYLK file. Excel tries to convert the file from the SYLK
> format, but cannot do so because there are no valid SYLK codes after the
> "ID" characters. Because Excel cannot convert the file, you receive the
> error message.

This is usually called "magic string" or "magic number"
([https://en.m.wikipedia.org/wiki/Magic_number_(programming)](https://en.m.wikipedia.org/wiki/Magic_number_\(programming\))).
It has nothing to do with the comma, and everything to do with SYLK using a
pretty risky magic string (ID) and Excel not having a "try SYLK, if that
fails, try as csv".

tl;dr: this is about guessing the input format and has nothing to do with the
delimiter.

~~~
mhd
It's not just ID, there's also mandatory fields after that, separated by a
semicolon. Thus 'ID;P' would be a valid header for SYLK and e.g. for German
"CSV"s. If CSV would stick to commas (and boohoo, just escape more fields when
this causes locale issues), then even such overly simple heuristics would've
sufficed, wouldn't it?

Although you'd just run into other pathetic cases with such an informal format
(CSV, not SYLK). But hey, I need to interchange formats with some IT guys and
don't have a big IT department (ready) and they don't want to parse XLS(X), so
I just use CSV. Now you have two problems.

~~~
philh
"ID;P" is valid at the start of CSV as well. Fields are allowed to contain
semicolons.

~~~
Frank2312
Depending on locale.

I believe in the Canadian French locale (and maybe many other locales), ";" is
considered a separator (a "," equivalent).

~~~
philh
Sure, but mhd was saying that if we stuck to CSV being actually comma-
separated, "ID;P" would distinguish CSV from SYLK. I'm saying that it
wouldn't.

~~~
Someone
mhd doesn't say that:

Thus 'ID;P' would be a valid header for SYLK _and e.g. for German "CSV"s._

~~~
philh
"If CSV would stick to commas ... then even such overly simple heuristics
would've sufficed, wouldn't it?"

~~~
Someone
Ah, thanks. Apparently, mhd knew and forgot :-)

The problem of discriminating between csv and SYLK files is unsolvable, as
every file with only ASCII characters and no commas or quotes is a valid csv
file (with one column) and there are valid SYLK files in that set.

~~~
ufmace
In the general sense, yeah, but you would think it shouldn't be too hard to
make a reasonable guess about which one it is and handle failures gracefully
instead of assuming that it's SYLK based on a tiny amount of data and then
blowing up when it turns out not to be.

------
reilly3000
Jeez, how many SLYK files out there? I venture to guess there is an order of
degrees more attempts at opening CSV files with ID as the first header than
attempted opens of unconverted SLYK files in excel.

When I admire Microsoft, it's most often because despite nearly 4 decades of
bloat to support, they still can ship reliable software to millions.

~~~
lostlogin
I kind of wish I could just use a version from a decade or 2 ago. The (4 year
old) versions of word and excel that the uni site I work for use are so clunky
to navigate. I loved using word in pre OSX says - I wonder how much of that is
rose tinted nostalgia.

~~~
Piskvorrr
Try LibreOffice (Portable if necessary) - it has remained with the
conservative UI.

------
MikusR
"A SYLK file is a text file that begins with "ID" or "ID_xxxx", where xxxx is
a text string. The first record of a SYLK file is the ID_Number record. When
Excel identifies this text at the beginning of a text file, it interprets the
file as a SYLK file. Excel tries to convert the file from the SYLK format, but
cannot do so because there are no valid SYLK codes after the "ID" characters.
Because Excel cannot convert the file, you receive the error message. "

~~~
Piskvorrr
In other words, Excel has all the information to decide that this is _not_ a
SYLK file but a CSV, but just throws an error, because fuck you. Great UX.

That's way up there with the "you can't drag stuff here, should've dragged it
a few px further" from Win98.

~~~
kedean
Who's to say that it should fall back to CSV instead of giving an error? What
if it was actually a malformed SYLK file where further headers were mangled in
transmission?

I do think it should be giving back a more descriptive error though, possibly
one informing the user that it thinks it is a SYLK file and giving them the
option to interpret it differently.

~~~
Piskvorrr
That is Excel team's choice, of course. However, I would have thought that it
would be based on the usefulness of the given import type _currently_ , not in
1988 (I trust there have been new import filters since)

How many SYLK files did you see recently, e.g. within the last 30 years? For
me, the result is 0 (zero). As compared to innumerable swarms of CSVs of all
flavors. Yet even MSO2013 prefers Yonder Hiftorical Curioufity; whence Excel's
fondness and preference for obscure and rare formats, I have no idea.

And yes, giving user at least an intelligible error message would be nice (I
hold no illusions that popping up a selection would be a complex feat: import
logic is usually a gnarly place).

------
chias
I feel like this is one of those bugs that would have taken less time to fix
than it took to document.

You could even do something as trivial as "if Excel fails to open the file as
SYLK, try again as CSV" and cut at least 99% of the problem away.

~~~
zaidf
_I feel like this is one of those bugs that would have taken less time to fix
than it took to document._

After many years, I still have to slap myself when I catch myself thinking
this way. Unless you know the architecture of the software, I've learned that
it's often _significantly_ harder than you imagine to fix seemingly simple and
obvious bugs without breaking something else.

~~~
Dylan16807
Fine, a slight correction then. It should be easy to fix except for terrible
software architecture. Terrible software architecture is not a valid excuse
when you have plenty of time and billions of dollars. So there should be
little forgiveness for this bug still existing.

~~~
zaidf
As a consumer _affected_ by this bug, sure, you may find it inexcusable. But
the basic truth about any mature software product used by tens of millions is
that there will be a laundry list of bugs competing for limited resources. So
there will _always_ be bugs that you want to fix but simply isn't a high
enough priority _relative_ to all the other bugs or features.

~~~
Dylan16807
There are limited resources, but there are enough resources to improve the
architecture over more than a decade. And if they decided to avoid
architecture changes, there would be nothing in the way of throwing lots of
siloed software engineers at bugs to get more of them fixed. They're not
lacking for money with _so many_ customers. It's possible for them to set a
much more inclusive cutoff.

They could have fewer bugs per feature than almost any other product after
being stable and popular for so long. But they choose not to make that a goal.

------
AndyKelley
The lesson here is when choosing your new file format's magic bytes, don't use
your initials or something clever. Have the decency to use a few actual random
bytes as your format's identifier.

For plain text files, it's a little bit different, but there's no excuse for
this problem happening in a binary file format.

Switching topics, I'm imagining a solution to this problem where you don't
actually know which format it is, you have a streaming processor for each
thing it could be, and feed it to each processor one character at a time.
Whenever a processor returns an error, drop it from the list. If the last
processor in the list returns and error, report that error to the user along
with which format that processor was for. If multiple peocessors complete
successfully, you'll have to either rank them, or ask the user what format it
is.

~~~
mark-r
SYLK isn't binary, it's all ASCII.

Interesting idea on passing the file to multiple parsers simultaneously,
although I don't see a benefit over trying them serially.

------
cm2187
I had to check this is not the 1st of April

I particularly like the step by step explaination of how to add an apostrophe
at the begining of the first line with a description of which keyboard key to
press...

~~~
gus_massa
I have three apostrophes in my (Latin America) keyboard: ` ' ´ but IIRC Excel
only recognizes one of them.

------
lifeisstillgood
I had to fake a csv file for testing yesterday, and started out with id, then
went "ahahah", and used name.

I know that's boring but this is one of those generational pieces of knowledge
like "keep your docs up to date" that we need to build into software training
somehow. (Or rather, this bug is not important, but the kind of training that
imparts this knowledge will be vital in building a real software profession)

But yeah, it's a real WTF

~~~
Jaruzel
Embedded knowledge like this, is why guys like us can never retire...

------
rompic
Also reminded me of [http://tburette.github.io/blog/2014/05/25/so-you-want-to-
wri...](http://tburette.github.io/blog/2014/05/25/so-you-want-to-write-your-
own-CSV-code/)

------
garyclarke27
Excel is useless with csv and utf-8. when I need to export to csv, which I
regularly do, I always use Libre Office.

~~~
Freak_NL
Excel is quite often useless with CSV, even if its contents are pure ASCII.
Whether or not Excel will open a standard CSV file delimited with comma's (as
per RFC 4180) properly also depends on the locale of the computer running
Excel. So if you run Excel on Windows with a Dutch locale, it will by default
fail to open the CSV normally and jam all its contents in one cell, because it
expects a semi-colon as delimiter instead of a comma. To open standard CSV you
have to import the file instead.

Someone at some point decided that the Dutch use semi-colons instead of commas
to delimit tabulated data, and decided to apply this logic to CSV files as
well. I would like to know the history behind such a decision! Switch your
locale to English US, and it opens normally.

LibreOffice just opens the CSV and asks the user to confirm the delimiter and
what not regardless of locale.

~~~
jsproc
Yes, this is so horrible. As a workaround I sometimes put "sep=," on the first
line of the spreadsheet. This forces Excel to use the comma as a separator,
regardless of the locale. The downside of course is that every other tool on
the planet shows the extra row, so it's not very useful.

~~~
ptman
I suffer from the reverse problem. I need to ingest CSV files produced by
excel in various locales. Or is there some generally available alternative to
excel that would produce valid RFC CSVs?

~~~
Piskvorrr
There is no such thing as a valid CSV - even the spec is borked, and nobody
follows it anyway. If you have a choice, XLSX is a lesser evil.

~~~
Freak_NL
RFC 4180¹ has been in existence since 2005 and seems perfectly straight
forward. The basic syntax is two pages of very understandable text. I hold
that to be the definition of valid CSV.

CSV libraries tend to adhere to it (and often support additional options
encountered in the wild as well); e.g., Apache Commons CSV².

1: [https://tools.ietf.org/html/rfc4180](https://tools.ietf.org/html/rfc4180)

2: [https://commons.apache.org/proper/commons-
csv/](https://commons.apache.org/proper/commons-csv/)

~~~
supergreg
Is there a way to report a bug to excel asking for proper RFC 4180 support?

~~~
Freak_NL
Probably, but what will it achieve realistically? This (both what OP reports
and this locale-dependent behaviour) is likely a WONTFIX type of bug, as there
are workarounds (don't double-click to open normal CSV if you are in a locale
where different delimiters are used, use import; don't start your CSV files
with "ID"; etc.).

------
rompic
We had a big wtf moment yesterday at work.

Also see
[https://en.m.wikipedia.org/wiki/SYmbolic_LinK_(SYLK)](https://en.m.wikipedia.org/wiki/SYmbolic_LinK_\(SYLK\))

------
code777777
"Applies to Microsoft Excel 2002 Standard Edition, Microsoft Excel 2000
Standard Edition, Microsoft Excel 97 Standard Edition, Microsoft Office Excel
2003 "

~~~
BBlarat
I still get the error in Excel 2016, so I guess it's not fixed yet.

~~~
chetangole
We get a warning in Excel 2016, and can open the file.

------
rbobby
Ok... that workaround is just terrible. Add one single-quote to the first
line?!? What about a closing quote? And wouldn't the side effect of this be
for the first line to be treated as one value (instead of separate values for
each header)?

The workaround needs it's own workaround :)

------
makecheck
There are times when I think the source code of the Unix "file" command should
be a required part of languages’ standard libraries. I have never seen
anything do a better job of describing contents accurately.

It’s really silly to see applications have trouble understanding data. Even on
macOS, where "file" is installed, the graphical interface sometimes fails to
open/preview something that the command-line "file" describes perfectly (i.e.
if it’s really text, just show me the text).

~~~
Piskvorrr
For all its magic, even _file_ is fallible.
[https://news.ycombinator.com/item?id=8171956](https://news.ycombinator.com/item?id=8171956)

------
ZenoArrow
I don't know enough about SLYK to suggest a full solution, but couldn't
additional checks be performed that confirm it's a SLYK file? Other than all
SLYK files starting with ID, are there any other properties that exist in all
SLYK files that could be checked against?

------
bawana
So why didn't SYLK files get their own extension? Having to parse the file to
figure out what it is? Seems so error prone. But maybe Machine learning will
solve this too. Isn't that why extensions were invented- to reduce cognitive
load?

~~~
Someone
They got, but many systems that produced them didn't have such a thing as a
filename extension. Microsoft Multiplan, for example, ran on Commodore 64,
CP/M, TRS-80, etc.
([https://en.m.wikipedia.org/wiki/Multiplan](https://en.m.wikipedia.org/wiki/Multiplan))

And even if they did, chances were that transporting the file between machines
(no, you couldn't move a floppy disk between machines, even if both machines
had a floppy disk. Typical transport involved sending data over a serial line
that only guaranteed to transfer 7 bits/byte, another reason why SYLK is
ASCII) lost the extension.

------
v768
[https://xkcd.com/1700/](https://xkcd.com/1700/)

------
pskomoroch
Weird, I hit this exact error 2 weeks ago

~~~
chrismorgan
Ditto, two days ago.

------
DenisM
One thing they could do is just create a new extnesion, like .CSVX and treat
it as CSV file per RFC. I wish.

------
SubiculumCode
I've been getting this error from CSV I created in Ubuntu. Glad to have
learned what this was about.

------
DenisM
Could that be worked around by enclosing the field in quotes?

------
flamedoge
one of those, how did they miss this?

------
deadowl
I get this error all the time.

~~~
lspears
Same

