
Please Offer an Excel Export Option - smcgivern
http://www.evanmiller.org/please-offer-an-excel-export-option.html
======
codingdave
"Data is ultimately meant to be consumed by people."

I disagree. Information is consumed by people. Data underlies information, and
sometimes is mistaken for it. But raw data is absolutely parsed and processed
before being consumed by the average audience. Some analytic folk will read
raw data, but only to serve a higher level purpose - to answer some question
or gain some knowledge.

But in general, data is analyzed and filtered and summarized and visualized
before being consumed. This author may just work too close to the data on a
day to day basis to realize where the average audience really fits.

~~~
Sir_Substance
That was my interpretation as well.

If we want to talk about ease of use for the consumer, we need to ask what
kind of person downloads datasets from websites.

Not normal people, that's for sure. The target demographic for that is
probably 5% researchers and 95% idly curious developers and statisticians.

The researchers can take the time to deal with an XLS format, but why would
they want to? The developers and statisticians are doing this in their off
time, and are going to want or need (I wouldn't expect a statistician to be
able to work with the XLS documentation) an easier format to deal with. So if
you give it to them in XLS, they're just going to have to turn it into a CSV
immediately before starting to hack up an interpreter for it.

So why not skip the middle man and deliver it in CSV? This author has the
situation almost exactly 180 degrees backwards.

------
tomlock
I'm a long term enterprise user of Excel, and I taught myself to program using
VB6 and have now moved onto Postgres/Python etc. For any serious data
application, CSV is a must, because navigating data types in Excel is a
nightmare. Over and over, colleagues of mine have lost the leading zero in a
string because Excel automatically typed it as a number, or had a cell value
interpreted as a formula, or had a VLOOKUP not work because the number is
typed as a string or vice versa. These problems often have counterintuitive
solutions, and for a basic user, these issues exist in both excel formats and
in csv. Excel has pretty robust ODBC integration that drives data directly
into Pivot Tables or through VBA. In this way you can provide informative (and
pretty) reports to end users, with dynamic aggregations, filters etc. If your
users are more technical than that or require more detailed atomic data, the
widely compatible solution is generally csv. What I'm trying to say is that
the excel format fills a gap that is very small, between completely non-
technical and slightly technical data users, and honestly probably isn't worth
implementing.

------
FabianBeiner
For PHP, I can recommend
[https://github.com/PHPOffice/PHPExcel](https://github.com/PHPOffice/PHPExcel).

------
sp4ke
Sorry but this would only keep the proprietary file format madness that is
Excel. I would prefer to share the data in JSON and let the user decide what
to do with it ...

~~~
CountHackulus
That doesn't solve the lossless encoding of doubles problem.

~~~
stephencanon
Lossless encoding of doubles is a long solved problem. Use hexadecimal
floating point format (%a with printf in C99, inherited by most high-level
languages that sit on top of C). The problem is platforms that still don't
support C99.

~~~
nkurz
I was wondering why he said that CSV was not capable of supporting full
precision for doubles. Are you saying that the problem is not with CSV per se,
but instead the C libc printf function? I'd have thought that modern standard
libraries would do this right with the usual decimal %f and a large enough
precision:
[http://pubs.opengroup.org/onlinepubs/009695399/functions/pri...](http://pubs.opengroup.org/onlinepubs/009695399/functions/printf.html)

~~~
stephencanon
The trouble is that (a) the standard recommends but does not require that all
binary-to-decimal and decimal-to-binary conversions be correctly rounded and
(b) even if it did require correctly-rounded conversions, the precision needs
to be absolutely enormous to guarantee that every double is exactly
representable in the chosen format (which makes it wildly wasteful), otherwise
you'd get different results depending on the rounding mode in effect at the
time of conversion back to double.

------
omonra
As a user, I agree 100%. The only way I use CSV files is to import them in
Excel.

I've always assumed that programmers output to CSV because they are too lazy
to implement an Excel output function.

~~~
bsg75
CSV can be consumed by anything (with the caveats mentioned in the article),
XLS/XLSX by spreadsheets only )if you want to work with the data).

Excel files have row limitations, potential file format incompatibilities.

Depending on the language being used, it can be the same level of effort to
emit XLS file as a CSV file, so unless we are talking about programmers how
don't bother learning how to use output libraries, I suspect CSV is more
common for reasons of compatibility.

~~~
omonra
"CSV can be consumed by anything"

That's the thing. For 99.99% of users there is no _anything_. There is only
Excel. And dealing with CSV files is headache for them.

It's as if you designed parking garages to accommodate both horses and cars -
because _someone out there_ might be coming on a horse.

~~~
bsg75
99.99% is a made-up stat, and highly dependent on audience.

Noting Excel can open CSV natively (again with the caveats mentioned in the
post), my users want a garage not limited to cars, but trucks and bicycles
too.

------
oheard
Is there any reason why the open ODS format would not be a better alternative
for those of us concerned about open standards?

~~~
justincormack
You should also provide it, but some people may not be able to use it with
their tools.

~~~
the_why_of_y
At least MS Office 2007 and later support ODF import/export.

------
codeulike
_In all likelihood they are going to open your CSV file in Microsoft Excel
2004, or if you 're lucky Microsoft Excel 2007_

When did he write this? It says Nov 2014 at the top but that doesn't seem
accurate.

~~~
Demiurge
He also mentioned pandas, which isn't that old. So, it's probably hyperbole.

~~~
NamTaf
It wouldn't surprise me in the slightest that the corporate envronment still
uses Excel 2003 (? not 2004, I thought?) or maybe 2007 rather than 2013.

Then again the WinXP EoL should have changed that, but watch it not have.

~~~
ZenoArrow
Excel 2004 is basically the Mac version of Excel 2003. It's EoL'd too.

As for the article, personally I'd say drop XLS support and go for XLSX. The
more that users are aware their version of Excel is no longer supported, the
more noise they'll make about wanting upgrades. Plus, it's not limiting, in
the sense you can get free tools that open XLSX (LibreOffice, etc...).

I would imagine there'd be XLSX-handling libraries for most mainstream
programming languages. I've used XlsxWriter with Python, it's fairly
intuitive...
[https://xlsxwriter.readthedocs.org/](https://xlsxwriter.readthedocs.org/)

~~~
bradleyland
We do a ton of Excel import/export in our web app. The day we moved to XLSX,
every one of our users cheered because the reliability of our Excel related
features improved dramatically.

The problem with XLS is that its internal format (called BIFF8) has no
official specification. It has been reverse engineered to a great degree, but
every implementation I've used outside of actual "excel.exe" has show stopping
bugs that you will encounter at some point.

XLSX, on the other hand, has an actual published spec (OOXML). There is plenty
of political strife surrounding OOXML, but at least we have a spec we can
develop against. It has also been my experience that "simple" XLSX files can
be constructed more reliably than their BIFF8 counterparts. Because XLSX files
are XML internally, there are a great number of libraries that can be used to
construct the required XML structures, while avoiding edge-case errors in
composition that are inherent to reverse-engineered binary formats. At a bare
minimum, software authors can use something like libxml to construct valid
XML, rather than some ad hoc BIFF8 serializer.

FWIW, we use the axlsx gem for our Ruby app, and it hasn't let us down yet. It
even supports some pretty eccentric Excel features like data validations.

------
kubiiii
An alternative is to provide data as ressources that can be imported using
Excel web queries. I don't thnik vanilla excel can be linked to JSON
ressources, but a simple HTML table does the trick. Excel Web queries can use
parameters, making the use of several ressources faster vs the download of
several files. As a bonus you can set the Excel file for the client, with your
data ready and up to date in it.

------
bane
I've seen a weird hack where an export to excel option was provided, but what
was actually exported was an html file of some sort with a .xls extension that
happened to open fine in Excel. I think it even had different tabs.

If you just create a text file and put an old fashioned html table in it and
rename it to .xls Excel will complain a little, and then open it, and even
respect font tags like <b> and <i>

~~~
troels
xls is a catch-all extension for a bunch of formats that Excel (the program)
will interpret. See my comment below about SpreadsheetML.

------
wluu
There's also the Open Xml SDK, written in C# and works with Mono (in addition
to .NET on Windows)

[https://github.com/OfficeDev/Open-XML-SDK](https://github.com/OfficeDev/Open-
XML-SDK)

Someone has also created a javascript version of the SDK -
[http://openxmlsdkjs.codeplex.com/](http://openxmlsdkjs.codeplex.com/)

~~~
taspeotis
If you're using .NET I would recommend looking at EPPlus instead:
[http://epplus.codeplex.com/](http://epplus.codeplex.com/)

------
dpweb
I was doing CSV but couldn't get multiple files into the same sheet (tabs). I
had 20 CSVs to download I didn't need 20 different files.

I work in JSON mostly do wrapped the excellent XLSX library it was very useful
[https://goo.gl/S7rlFm](https://goo.gl/S7rlFm)

------
microcolonel
CSV export is for Numbers users, I think.

Apple has failed to support OpenDocument Spreadsheets, and their XLS importer
regularly crashes altogether for the operations people at my office, though
maybe it's super stable elsewhere.

~~~
Someone
In my experience, Numbers doesn't even import XLS anymore.

XLSX works fine, though (for the features that Numbers supports)

------
nickporter
I have a product that lets our users (who are non-technical) export data. CSV
exports were a source of many support requests. We switched to XLSX exports
and everybody seems happy.

------
sebastianavina
I use Google Refine (OpenRefine you name it) for exploring data, it's more
powerful than excel for visualizations, have a lot of ways to order it. It's
amazing.

------
jjs3979
Just expose an odata endpoint than can be consumed by excel, various other
analytic tools as well as whatever programming language you want since it is
just rest.

~~~
hobs
So for users that we are thinking are going to have too much trouble
converting a csv to an excel document, we will instead use a custom data
source. I dont think that solution really works for people using it at the
level he is talking about.

------
justincormack
xlslib, at a brief glance, looks like it is based on the .xsl format, not
.xlsx. Wondering if it is easier to generate .xlsx XML files for export, or
not.

~~~
skeletonjelly
They're totally different formats. It'd have to be a separate implementation.
xls is binary, xlsx is XML based.

~~~
troels
xls is not just one format. There are several formats which Excel will
understand, that uses that extension. One of them is a real simple xml-based
format (SpreadsheetML), that is way easier to generate than proper excel files
and has most of the features you need (At least for data transport purpose).
Highly recommend using it.

~~~
Mikhail_Edoshin
I second SpreadsheetML; it's plain XML (not multi-part compressed .xslx) and
it supports multiple sheets, formulas, formatting, and some other bits like
named ranges, print settings, and such.

~~~
justincormack
Ok,so is there a simple implementation guide/library?

~~~
Mikhail_Edoshin
Yes, here it is:

[http://msdn.microsoft.com/en-
us/library/aa140066%28office.10...](http://msdn.microsoft.com/en-
us/library/aa140066%28office.10%29.aspx)

It only describes the data part and doesn't document Excel options such as
print settings and filters; if you need them, create a sample in Excel, save
it as XML Spreadsheet 2003 and examine the result.

(The 'big' Excel format that is a part of Office Open XML format is also
called Spreadsheet ML, but they're different; from what I understand this one
is a subset.)

~~~
the_why_of_y
It's a deprecated format that was essentially Microsoft's first experiment at
an XML based format in the Office XP/Office 2003 days... whether it will
continue to be supported in future versions is anybody's guess, given that the
successor format OOXML is much more widely adopted.

~~~
Mikhail_Edoshin
My impression is that Microsoft is fairly reliable when it comes to supporting
old formats and standards. E.g. I can still send Word and Excel DDE commands
and this is a tech from 90s, Windows 3.1 and stuff. OOXML is certainly more
powerful, but it's also more complex.

------
volker48
"When they export their data as CSV, they’ll probably just bring it into Excel
first to have a look around."

If the first thing the the user is doing is bringing the file into excel and
you don't need multiple tables per file what is the difference to the end user
other than the file having a csv instead of a xls or xlsx file extension?

~~~
onethree
its right there in the article: Dates, times, and durations; Percentages;
Number formatting

the number of times i've opened a csv to find that excel has tried to
determine the dates, and decided that the year thats being referred to is 1914
instead of 2014...

~~~
volker48
I've never had an issue like you describe with dates before. I've always
output them in the csv in Y-m-d format and that has worked as expected.

I should have been more precise and said if your data isn't using the set of
features described in the article there isn't a big difference. Not all data
has dates, time durations, percentages, and number formatting.

