

Comparisons among Java-based CSV parsers - kmoe
https://github.com/uniVocity/csv-parsers-comparison

======
jtheory
Taken with a grain of salt, of course, because the winner is the one running
the competition.

Worth noting:

\- All but the last-place finisher actually finish quite close together in
performance, given that they're working through a 3-million-record file.

\- Other performance stats could be more relevant (depending on what you're
doing with CSV...): startup time, memory footprint, and any differences in
handling very long or very short rows.

\- Given similar performance on the above, what's actually more important (for
most uses): elegance & consistency of API, support for various CSV formats
(e.g., Excel vs. RFC-4180 etc. vs. flexibility for rolling your own format),
and sensible error handling options (like: don't blow up if there's one row
with a different number of columns).
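On that last point, a tolerant reader could normalize row widths instead of throwing on the first irregular line. A minimal sketch (the `normalize` helper is hypothetical, not from any of the libraries discussed):

```java
import java.util.Arrays;

public class LenientRows {
    // Pad a short row with empty strings (or truncate a long one) so
    // downstream code always sees the expected column count, instead
    // of blowing up on one row with a different number of columns.
    static String[] normalize(String[] row, int expected) {
        if (row.length == expected) return row;
        String[] fixed = Arrays.copyOf(row, expected); // truncates or pads with null
        for (int i = row.length; i < expected; i++) fixed[i] = "";
        return fixed;
    }

    public static void main(String[] args) {
        // Second column missing on this line; treat it as empty rather than failing.
        String[] row = normalize("a,b".split(",", -1), 3);
        System.out.println(Arrays.toString(row));
    }
}
```

Whether padding, flagging, or skipping such rows is right depends on the data, which is why having the option at all matters more than the default.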

I've hardly reviewed any of these, so I can't really compare them usefully, but
I've been using version 1.0 of the Apache Commons CSV parser recently (finally
released after who knows how many years in semi-hibernation!), and it's been
pleasant to work with thus far.

~~~
farmfood
I agree with much of this, especially API simplicity. I usually reach for
openCSV for the same reason.

Definitely applaud the effort, and it would be good to extend the test corpus
in terms of record length and escape complexity. I do think 3M records is on
the low side. It would be good to see scale tests for 10M, 100M, and 1B
records too.

~~~
jtheory
They're mostly operating on streams, so at some point (based mostly on how GC
is managing, I imagine) the speed will be constant per-row regardless of the
record count.

------
KevinEldon
Jackson has a CSV parser that is not in the list:
[https://github.com/FasterXML/jackson-dataformat-csv](https://github.com/FasterXML/jackson-dataformat-csv)

------
jtheory
Side note: CSV is boring and unsexy as formats go; but it's also dead easy for
companies to provide, and even to automate, with minimal technical staff on
hand; this is one of the reasons I'm working with it at the moment ("CSV
uploaded via SFTP" is on the list of interfaces we support for data
integrations).

Think of the character escapes involved in something like XML or even JSON,
for example; for CSV you escape _only_ the double-quote, and you escape that
by adding a second double quote -- so you don't need to mess with escaping
your escape character. The main problem with CSV is more that there are
several possible specs for it...
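That doubling rule is simple enough to fit in a few lines. A minimal sketch of RFC 4180-style field quoting (the `quote` helper is hypothetical, just to illustrate the rule):

```java
public class CsvQuote {
    // RFC 4180-style quoting: wrap the field in double quotes if it
    // contains a comma, a double quote, or a newline, and escape each
    // embedded double quote by doubling it. No backslashes involved.
    static String quote(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    public static void main(String[] args) {
        System.out.println(quote("plain"));      // plain
        System.out.println(quote("a,b"));        // "a,b"
        System.out.println(quote("say \"hi\"")); // "say ""hi"""
    }
}
```

Compare that with XML's entity references or JSON's backslash escapes: there is no escape character that can itself need escaping.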

~~~
IsTom
There's only one RFC though.

~~~
Roboprog
RFC 4180:
[http://tools.ietf.org/html/rfc4180](http://tools.ietf.org/html/rfc4180)

... which is compatible with the data I used to get out of Lotus 1-2-3 (and
some other things) back in the 80s.

------
Roboprog
I think they missed this parser:
[http://ostermiller.org/utils/CSV.html](http://ostermiller.org/utils/CSV.html)

