
Calamine: Excel file reader, written in Rust - TAForObvReasons
https://github.com/tafia/calamine
======
geocar
How neat!

Here's a little Excel-file writer in PHP; Excel will read these files, but
this library won't:

    
    
        <?php echo '<?xml version="1.0" encoding="utf-8"?>'; ?>
        <?php echo '<?mso-application progid="Excel.Sheet"?>'; ?>
        <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
            xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" 
            xmlns:o="urn:schemas-microsoft-com:office:office"
            xmlns:x="urn:schemas-microsoft-com:office:excel" 
            xmlns:html="http://www.w3.org/TR/REC-html40">
          <Worksheet ss:Name="Sheet1">
            <?php $data = array(array(4,5,6),array("dude","whatever","m<om")); ?>
            <Table>
              <?php foreach($data as $row): ?>
              <Row>
                <?php foreach($row as $cell): ?>
                <Cell>
                  <Data ss:Type="String"><?= htmlspecialchars($cell, ENT_QUOTES | ENT_XML1, 'UTF-8') ?></Data>
                </Cell>
                <?php endforeach; ?>
              </Row>
              <?php endforeach; ?>
            </Table>
          </Worksheet>
        </Workbook>
    

They're technically in the Microsoft Office XML format[1], and not "Excel
files", but users don't tend to be amused by programmers being technically
right, so I tend not to argue about that sort of thing. I only bring it up
here in case someone wants to search for a specification on it.

[1]:
[https://en.wikipedia.org/wiki/Microsoft_Office_XML_formats](https://en.wikipedia.org/wiki/Microsoft_Office_XML_formats)
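
For anyone curious what reading this format involves, here is a minimal sketch in Python using only the standard library. It assumes the simple Workbook/Worksheet/Table/Row/Cell/Data layout that the PHP template above produces; the function name `read_spreadsheetml` and the sample document are made up for illustration, and features such as sparse cells (`ss:Index`) or merged cells are ignored.

```python
import xml.etree.ElementTree as ET

# Namespace shared by the Workbook, Worksheet, Table, Row, Cell and Data elements.
SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}


def read_spreadsheetml(xml_bytes):
    """Return {sheet_name: [[cell_text, ...], ...]} for a simple document."""
    root = ET.fromstring(xml_bytes)
    sheets = {}
    for worksheet in root.findall("ss:Worksheet", NS):
        # Namespaced attributes are keyed as "{uri}local" in ElementTree.
        name = worksheet.get(f"{{{SS}}}Name")
        rows = []
        for row in worksheet.findall("ss:Table/ss:Row", NS):
            rows.append([
                cell.findtext("ss:Data", default="", namespaces=NS)
                for cell in row.findall("ss:Cell", NS)
            ])
        sheets[name] = rows
    return sheets


# Hypothetical sample mirroring what the PHP template emits.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
    xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet ss:Name="Sheet1">
    <Table>
      <Row>
        <Cell><Data ss:Type="String">4</Data></Cell>
        <Cell><Data ss:Type="String">5</Data></Cell>
      </Row>
      <Row>
        <Cell><Data ss:Type="String">m&lt;om</Data></Cell>
      </Row>
    </Table>
  </Worksheet>
</Workbook>"""

if __name__ == "__main__":
    print(read_spreadsheetml(SAMPLE))
```

Every value comes back as text here because the template writes every cell with `ss:Type="String"`; a fuller reader would look at that attribute to decide how to convert each value.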

~~~
Piskvorrr
Also, Excel will read the old BIFF files (XLS), CSVs, SYLK, and whatnot. You
are doing, in reverse, exactly that which you oppose: "unless this library can
open each and every and any file that Excel can, bug-for-bug compatible, it's
Wrong!"

(While it is true that users will toss anything at you, it's quite pointless
to expect one library to handle all of it: we used one to sniff the content,
another to parse CSVs, yet another to parse XLSX, and in-house code for this
obscure abomination which sadly still gets about 10% of use)
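
The "sniff the content, then hand off to the right parser" step can be sketched roughly as below. This is only an illustration, not the code described above: the function name and return labels are made up, and it only looks at container signatures (OLE2 for BIFF .xls, ZIP for .xlsx, the `ID;` record that opens a SYLK file), so it cannot tell an .xlsx from any other ZIP archive.

```python
# Well-known container signatures.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # compound file, holds BIFF .xls
ZIP_MAGIC = b"PK\x03\x04"                          # ZIP archive, holds .xlsx


def sniff_spreadsheet(head: bytes) -> str:
    """Guess the container format from the first bytes of a file.

    The labels returned here are hypothetical; a caller would map them
    to whichever parser it uses for that format.
    """
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    # Skip a UTF-8 byte order mark and leading whitespace before text checks.
    text = head.lstrip(b"\xef\xbb\xbf \t\r\n")
    if text.startswith(b"ID;"):
        return "sylk"
    if text.startswith(b"<"):
        return "xml"
    # Anything else is treated as delimited text (CSV and friends).
    return "text"


if __name__ == "__main__":
    print(sniff_spreadsheet(b"a,b,c\n1,2,3\n"))
```

A real dispatcher would read the first few bytes of the upload, call something like this, and then open the file with the matching library.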

~~~
geocar
> it's quite pointless to expect one library to handle all of it

Libraries don't just come with an implementation, but also a usage model.
Perl, for example, has:

Spreadsheet::XLSX

Spreadsheet::ParseExcel

These have very different user (programmer) interfaces -- so different that
now there's a Spreadsheet::ParseXLSX, which lets programmers (and programs)
used to one interface consume the new files with very few changes.

This Rust library has its own model, and if it's a _good_ model it makes
sense to implement other file formats with this model -- and indeed this is a
good way to test if it's a good model.

Other things like CSV are so fundamentally different from XLS/XLSX that
they're _going_ to have a different model, and so it makes sense that these
require different libraries.
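
To make the "one model, several formats" idea concrete, here is a small sketch. Everything in it is hypothetical: `WorkbookReader`, `InMemoryReader` and `column_sums` are invented names, and this is not how calamine or the Perl modules are structured. The point is only that caller code written against the shared interface does not care which format sits behind it.

```python
from abc import ABC, abstractmethod


class WorkbookReader(ABC):
    """Hypothetical shared model: a workbook is named sheets, each a grid of cells."""

    @abstractmethod
    def sheet_names(self):
        """Return the sheet names in workbook order."""

    @abstractmethod
    def rows(self, sheet):
        """Iterate over the rows of one sheet, each row a list of cell values."""


class InMemoryReader(WorkbookReader):
    """Stand-in backend; a real one would parse XLS or XLSX bytes instead."""

    def __init__(self, sheets):
        self._sheets = sheets

    def sheet_names(self):
        return list(self._sheets)

    def rows(self, sheet):
        return iter(self._sheets[sheet])


def column_sums(reader, sheet):
    """Caller code written once against the model, not against a file format."""
    totals = []
    for row in reader.rows(sheet):
        for i, value in enumerate(row):
            if i >= len(totals):
                totals.append(0)
            if isinstance(value, (int, float)):
                totals[i] += value
    return totals


if __name__ == "__main__":
    demo = InMemoryReader({"Sheet1": [[4, 5, 6], ["dude", "whatever", "m<om"]]})
    print(column_sums(demo, "Sheet1"))
```

If a second format can be added as another subclass without changing `column_sums`, that is some evidence the model holds up.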

------
igitur
Which C# libraries were used in comparison? The 2 I know of, EPPlus and
ClosedXML, do much more than just read cell values, so it's unfair to compare
against those. For example the additional complexity of handling styles adds
overhead.

Disclaimer: I'm a current ClosedXML contributor.

------
moomin
For me, the gold standard of these libraries is Apache POI and NPOI. Porting
to Rust would be perfectly possible, but it would be _a lot_ of work. Until
you support that level of functionality, it's pretty much irrelevant what your
"performance" is. You can always improve performance by dropping features.

~~~
merb
Actually, Apache POI is extremely slow when reading or writing large Excel
files, so there may well be a need for libraries that process them fast.

~~~
jontro
The streaming versions of poi (SXSSF) should be really fast if you're working
with large documents.

~~~
merb
Only if you know the column size ahead of time. And it's still slower than
other streaming solutions, and at some point it buffers to disk.

------
laacz
There was a time I had to read XLS (BIFF formats) in PHP. Poked around, found
some libraries (slow as hell), decided to port[1] one of them (Python's xlrd).

What followed, even though it was an almost 1:1 port, was the inevitable loss
of a huge part of my life that I'll never get back. However, I learned a lot
about what a mess that format is. Since then, ANY XML format Microsoft uses is
a godsend.

[1]: [https://github.com/laacz/xls-reader](https://github.com/laacz/xls-reader)

------
faitswulff
Was there a particular reason that this library was written? The only thing I
can glean from the repo is that it seems to be faster than C# alternatives.

~~~
masklinn
Somebody needed to read Excel files from Rust, possibly not on Windows? That
would also be the reason why openpyxl and xlrd exist for Python:
[http://www.python-excel.org](http://www.python-excel.org)

e.g. importable data using real spreadsheet files is much more reliable than
CSV, and easier for clients to generate.

~~~
brunoqc
> e.g. importable data using real spreadsheet files is much more reliable than
> CSV, and easier for clients to generate.

It can also be more of a PITA since cells can have different formats.
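
One common instance of this: a date cell is stored as a plain number, and only the cell's format says it should be read as a date. A rough sketch of the conversion, assuming the workbook uses Excel's default 1900 date system (the function name is made up, and the 1904 date system is not handled):

```python
from datetime import datetime, timedelta

# With this epoch, serial numbers from 61 (1900-03-01) onward line up with
# real calendar dates. Smaller serials are affected by Excel treating 1900
# as a leap year, so this sketch refuses them.
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial: float) -> datetime:
    """Convert a 1900-system serial number to a datetime.

    The integer part counts days and the fractional part is the time of day.
    """
    if serial < 61:
        raise ValueError("serials before 1900-03-01 are not handled by this sketch")
    return EXCEL_EPOCH + timedelta(days=serial)


if __name__ == "__main__":
    print(excel_serial_to_datetime(43831))
```

The same stored number could just as well be a quantity or a price, which is why an importer has to look at the cell's format and not only at its value.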

