

CSVjs: Basic CSV parsing and encoding in JavaScript - knrz
https://github.com/knrz/CSVjs

======
userbinator
> text.split("\n")

I don't want to be negative but this will not work on CSV files that contain
newlines inside columns. You should also implement escaping quotes - this is
_very important_ to ensure that the data can round-trip! Good for a first
attempt, but as simple as CSV may look, there are subtle points that you
should be aware of if you want to handle most if not all CSV out there. I've
worked with lots of other implementations that get these little things wrong
too, and it is particularly aggravating when CSV is supposed to be a pretty
standard interchange format for tabular data. Please refer to RFC4180 for the
details.
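
To make the round-trip point concrete, here is a minimal sketch of field encoding with quote escaping, following the quoting rules in RFC 4180. The names `encodeField` and `encodeRow` are hypothetical and not part of CSVjs. The last line shows why `text.split("\n")` miscounts records once a field contains a newline.

```javascript
// Hypothetical helpers for illustration; not the CSVjs API.
function encodeField(value) {
  const text = String(value);
  // Quote only when the field contains a comma, a quote, or a line break.
  if (/[",\r\n]/.test(text)) {
    // Inside a quoted field, each double quote is doubled.
    return '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}

function encodeRow(fields) {
  return fields.map(encodeField).join(",");
}

const row = ["Jane", 'She said "hi"', "12 Main St\nSpringfield"];
const line = encodeRow(row);

// Splitting on "\n" sees two records here, although only one was written.
const naiveRecords = line.split("\n");
```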

~~~
gcb0
yet, that is what most csv code does. heck, ms excel did that until a few
versions ago.

~~~
gav
MS Excel has supported CSV that had newlines in fields for at least a decade.

~~~
EvanPlaice
MS Excel also defaults to using \r as the newline char on the OSX version
(despite \r being obsolete in OSX since v10.1). It's true that CSV parsers
have been around in various forms for a long time but 'complete'
implementations in JavaScript haven't been available until recently.

------
hglaser
This kind of thing is incredibly useful.

Large frameworks are great, but generally I only adopt them when starting a
new project; or the occasional large refactoring project.

Whereas targeted problem-solving libraries like this one get adopted all day
long. Next time I need to parse a CSV on the client, you can bet the first
thing I'll do is Google to see if there's a library I can download and be done
with it.

So, great job solving a targeted problem! Keep building things and keep
contributing. People will find code like this endlessly useful.

~~~
knowtheory
(sorry accidentally hit the downvote button, so commenting to give you the
karma back. hmm that used to work afaik, doesn't seem like it helped.)

~~~
hglaser
I'll live. :)

------
knrz
First anything I've contributed to the community.

As a 16-year-old just getting into software development, it'd be amazing to
get feedback from HN.

~~~
batuhanicoz
Some feedback:

\- Vanilla JS would be cleaner

\- Make it usable with Node. (Check if the "window" variable exists; if not,
export CSV as a module. Should be this easy AFAIK)

\- You have some typos on the README.md

\- Put this on bower and npm

Good work. I was writing software at 16 but I wasn't doing it open source, I
wish I did.
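
For the Node suggestion above, one common way to support several environments is a UMD-style wrapper. This is a sketch under assumptions: the `CSV` global name comes from the library's title, the `parse` placeholder is hypothetical, and the check order (AMD, then CommonJS, then global) is the conventional one rather than the "check for window first" order described above.

```javascript
(function (root, factory) {
  if (typeof define === "function" && define.amd) {
    // An AMD loader such as RequireJS is present.
    define([], factory);
  } else if (typeof module === "object" && module.exports) {
    // Node / CommonJS.
    module.exports = factory();
  } else {
    // No module system: attach to the global object (window in a browser).
    root.CSV = factory();
  }
})(typeof globalThis !== "undefined" ? globalThis : this, function () {
  // Placeholder API for illustration; the real library functions would go here.
  return {
    parse: function () {
      throw new Error("not implemented in this sketch");
    }
  };
});
```

With this shape, `require("./csv")` in Node and a plain script tag in the browser both end up with the same object.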

~~~
knrz
Vanilla JS is up :) It's now usable with both Node and AMD. Typos should all
be fixed. I'll put up on bower and npm once I become RFC4180 compliant.

Thanks for the feedback!

~~~
cabbeer
What's AMD? I keep seeing that acronym but searching it just returns "Advanced
Micro Devices"

~~~
batuhanicoz
Asynchronous Module Definition[0].

[0]
[http://en.wikipedia.org/wiki/Asynchronous_module_definition](http://en.wikipedia.org/wiki/Asynchronous_module_definition)

------
nthitz
d3.js provides a pretty solid JS csv parser
[https://github.com/mbostock/d3/blob/master/src/dsv/dsv.js](https://github.com/mbostock/d3/blob/master/src/dsv/dsv.js)

[https://github.com/mbostock/d3/wiki/CSV](https://github.com/mbostock/d3/wiki/CSV)

~~~
knowtheory
Mike Bostock pointed out that he had already extracted D3's DSV parsing into
its own repo when I expressed my disappointment I couldn't use d3's w/o
rewriting parts of D3's DSV parser.

It lives here:
[https://github.com/mbostock/dsv](https://github.com/mbostock/dsv)

ps it properly scans & tokenizes CSVs and handles quoted fields fine as a
consequence

------
seldo
If you are doing large-scale CSV parsing or encoding in Node, you may also
find these two packages useful:

[https://www.npmjs.org/package/binary-csv](https://www.npmjs.org/package/binary-csv)

[https://www.npmjs.org/package/csv-write-stream](https://www.npmjs.org/package/csv-write-stream)

They are both written with handling very large files in mind, so they use
buffers and streams for i/o.

------
Monkeyget
The CSV 'format' is hell. I just wrote a blog highlighting some of the issues:
[http://tburette.github.io/blog/2014/05/25/so-you-want-to-wri...](http://tburette.github.io/blog/2014/05/25/so-you-want-to-write-your-own-CSV-code/)

~~~
EvanPlaice
It's not as complex as you make it sound. The parser should pass along any
non-terminal characters without issue.

The rest of the edge cases (ex newlines in data) can be handled by using a
proper DFSM (Deterministic Finite State Machine). None of that
String.split('\n').split(',') garbage.
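
As a rough illustration of the state-machine approach (a minimal sketch, not the jquery-csv implementation), the parser below walks the input one character at a time and tracks whether it is inside a quoted field. The name `parseCsv` and the state labels are hypothetical. It accepts `\n`, `\r\n`, and bare `\r` as record separators.

```javascript
// Hypothetical sketch of a character-by-character CSV parser.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = "";
  // States: "start" (field begins), "unquoted", "quoted",
  // "quoteSeen" (a quote was read while inside a quoted field).
  let state = "start";

  const endField = () => { row.push(field); field = ""; };
  const endRow = () => { endField(); rows.push(row); row = []; };

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (state === "quoted") {
      // Commas and line breaks are plain data inside quotes.
      if (ch === '"') state = "quoteSeen";
      else field += ch;
    } else if (state === "quoteSeen" && ch === '"') {
      // A doubled quote is an escaped quote character.
      field += '"';
      state = "quoted";
    } else if (ch === ",") {
      endField();
      state = "start";
    } else if (ch === "\r" || ch === "\n") {
      // Treat \r\n as a single line break.
      if (ch === "\r" && text[i + 1] === "\n") i++;
      endRow();
      state = "start";
    } else if (ch === '"' && state === "start") {
      state = "quoted";
    } else {
      field += ch;
      state = "unquoted";
    }
  }
  // Flush the last record when the input lacks a trailing line break.
  if (state !== "start" || row.length > 0) endRow();
  return rows;
}

const parsed = parseCsv(
  'name,address\r\n"Doe, Jane","12 Main St\nSpringfield"\r\nBob,"He said ""hi"""\r\n'
);
```

Because each character is examined exactly once and the only memory is the current state, newlines and commas inside quotes need no special lookahead.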

If you're processing text as binary without using a string reader that can
differentiate between UTF-8 and ASCII then you're doing it wrong.

With that said, I agree completely that people should use an established
library. Code that has been viewed, used, broken by thousands of users is
infinitely better than any home grown variant.

Source: With lots of blood sweat and tears I authored one of those 'solid'
libraries.

------
EvanPlaice
Ever heard of jquery-csv?

I wrote jquery-csv over two years ago with the goal of being the first
completely RFC compliant CSV parser for JavaScript.

It integrates with the CSV namespace but doesn't depend on it, uses pure
vanilla JS, works with Node.js.

At the very least, if you want to claim RFC compliance you should have a test
runner that verifies that your code doesn't break on the edge cases.

[https://code.google.com/p/jquery-csv/source/browse/test/test...](https://code.google.com/p/jquery-csv/source/browse/test/test.html)

Once you have your state machine working, your best bet to optimize speed is
by limiting string copy operations. I managed this by using a regex tokenizer
that groups any non-terminals (ex data between quotes).

I wrote it with the intent of providing a lib that can effectively parse CSV
data from the browser (loaded remotely via AJAX or locally via the HTML5 File
API).

The biggest weakness of parsing CSV on the browser is the inability to process
data streams. That 2GB memory limit on JS scripts in the browser becomes a
fundamental weakness when you're trying to process large CSV files.

CSV in general is terrible for data storage, unless it's only used for
serialization, because arbitrarily reading any point in the input data stream
requires the parser to start from the beginning. You're basically screwed if
you can't hold the whole dataset in memory as a 2D array.

------
maxerickson
Consider starting the documentation with the default behavior. For example,
in the parse section, if no header option is provided, the first row will be
returned as an array, correct?

Start there: "The default behavior is to return each row of the CSV as an
array." then "If the CSV file has a header, pass in {header: true} to get back
an object using the header values as keys." then "If the file does not have a
header, pass in ...".
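
The first two documented cases could behave roughly like the sketch below. `shapeRows` is a hypothetical helper operating on already-parsed rows; only the `{header: true}` option is taken from the wording above, and the third case is left out because the option it would take is not spelled out here.

```javascript
// Hypothetical helper; not the CSVjs API.
function shapeRows(rows, options) {
  // Default behavior: each row stays an array.
  if (!options || options.header !== true) return rows;

  // With {header: true}: the first row supplies the keys for the rest.
  const [keys, ...records] = rows;
  return records.map((fields) => {
    const record = {};
    keys.forEach((key, i) => { record[key] = fields[i]; });
    return record;
  });
}

const sampleRows = [["id", "name"], ["1", "Ada"], ["2", "Linus"]];
const asArrays = shapeRows(sampleRows);
const asObjects = shapeRows(sampleRows, { header: true });
```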

Also, can a user override a header that is present in the data?

~~~
knrz
Thanks for the feedback on the docs. Working on them now.

As for overriding the header, I think its possible utility offsets the ~3 LoC
that it adds. Coming next commit.

------
mrfusion
So will this let me make a csv file in JavaScript and actually then let the
user download it as a file?

That would really simplify a lot of my workflows instead of having to make
separate CSV and HTML views in Django.

~~~
batuhanicoz
You could create a download file, but only in newer browsers.

We have an internal application which requires CSV export, we use something
like this:

    
    
            csv = cleanTurkishChars(makeCSV(data));
            csv = 'data:application/csv;charset=utf-8,' + encodeURIComponent(csv);
            $("#csvexport").attr({
                'href': csv,
                'target': '_blank'
            });
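
A dependency-free variant of the same idea might look like the sketch below. `csvDataUri` is a hypothetical helper, and the commented `link` lines assume an existing anchor element; the `download` attribute only suggests a file name in browsers that support it.

```javascript
// Hypothetical helper: wraps CSV text in a data URI that a link can point to.
function csvDataUri(csvText) {
  // encodeURIComponent escapes commas, quotes, and line breaks for use in a URI.
  return "data:text/csv;charset=utf-8," + encodeURIComponent(csvText);
}

const csvText = 'id,name\n1,"Doe, Jane"\n';
const uri = csvDataUri(csvText);

// In a browser, the URI would then be attached to a link, for example:
//   link.href = uri;
//   link.download = "export.csv";
```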

~~~
mrfusion
Thanks! Any idea where I can read more about it? What browsers support it? Are
there file size limitations?

------
soggypopsicle
in case someone wants a more mature parser:
[https://github.com/koles/ya-csv](https://github.com/koles/ya-csv)

------
IvanK_net
Hey, I needed a JS CSV parser a few days ago. Here is my parser, I bet it is 100
times faster than yours :)

    function parseCSV(str) {
        var obj = {};
        var lines = str.split("\n");
        var attrs = lines[0].split(",");
        for (var i = 0; i < attrs.length; i++) obj[attrs[i]] = [];
        for (var i = 1; i < lines.length - 1; i++) {
            var line = lines[i].split(",");
            for (var j = 0; j < line.length; j++) obj[attrs[j]].push(line[j]);
        }
        return obj;
    }

~~~
scrabble
Looks like you've missed values enclosed in quotes that might contain new line
characters or commas -- very important for things like address fields.

