
Show HN: Papa Parse 3 - mholt
http://papaparse.com
======
colinramsay
This looks great, particularly the way the website breaks down the main
functionality with plain explanations and code samples. I know that there's a
bit of a bugbear with some people when it comes to reimplementing CSV parsing
but having it in-browser could be very handy.

~~~
mholt
Thanks for your feedback. If it helps to know, the Q&A format on the homepage
is the result of scratching my head for a while, thinking "How would I explain
this to somebody? There's a lot to it, and it's all immediately relevant."

I eventually convinced myself that the last part isn't true. It could be
disclosed in parts. So I decided to run the scenario in my mind instead: "I'll
give somebody this library and imagine that they'll reveal their use cases one
piece at a time and I'll help them through it." Then the conversation flowed
naturally and came out while typing.

~~~
cheese1756
Will the website itself be released under an open license? I could see this
website format being useful for a ton of open source projects.

~~~
mholt
I'd be okay with that, I just hadn't considered it. By format do you mean
design or content (or both)? And are you thinking a Creative Commons license?

~~~
cheese1756
I meant the design, since the page created such a clear template that would be
fantastic for future open source projects to modify. But if you want to
release the content in the open too, all the better. I just think that a lot
of open source projects which struggle with onboarding would be greatly
benefited by using the page as a base.

------
jqueryin
The explanation of features is one of the best I've seen on the landing page.
I didn't even have to hit the documentation to fully understand usage under a
variety of conditions/circumstances.

~~~
mjburgess
On the other hand, it's a CSV parser so it's fairly easy to explain the
features and list all of them (rather than try to select some important ones).

------
thegeomaster
Do you guys think it would be too cheesy/unoriginal to "steal" this Q&A format
for other products? I have a free software library to publish pretty soon, and
I absolutely love your way of explaining how it all works in a cool,
conversational style.

~~~
domiono
Innovation is cloning 80% and innovating on the remaining 20%. :)

Edit: Hey why the downvotes, I didn't say it, Andrew Chen said it. Slide 23
[http://andrewchen.co/2013/10/14/zero-to-productmarket-fit-pr...](http://andrewchen.co/2013/10/14/zero-to-productmarket-fit-presentation/)

~~~
rschmitty
Steve Jobs (who stole it from Picasso, who took it from ?) said "Picasso had a
saying -- 'good artists copy; great artists steal' -- and we have always been
shameless about stealing great ideas."

This was later "clarified" by Bud Tribble: "If you take something and make it
your own ... it's your design and that is the dividing line between copying
and stealing."

My history teacher in high school had a simple challenge to get an A for the
entire year, skipping all tests and quizzes, and free pass to sleep during
class: "Give me a 100% unique idea"

Obviously everyone tried and everyone failed. There are no unique ideas, only
innovations upon existing and past work of others.

So I find it odd anyone would downvote you for the truth!

~~~
mtdewcmu
>> Steve Jobs (who stole it from Picasso, who took it from ?) said "Picasso
had a saying -- 'good artists copy; great artists steal' -- and we have
always been shameless about stealing great ideas."

That's faintly ironic -- he had too much shame to steal the quote from
Picasso. Perhaps he meant that his company employed shameless great artists,
not that he was one himself. Steve Jobs has always struck me as a businessman,
primarily (which is probably what you want in a CEO, if you owned, say, Apple
stock -- not a thief).

------
shekhar101
That's the coolest, most informative, clear and concise FAQ I've seen! Great
job Devs. You are cool! :)

------
jakejake
For what would otherwise be considered a mundane task, this page is absolutely
epic. Nice work!

------
owenversteeg
Amazing project! Everything seems really well put together. I really like the
humor and the Q&A format. Do you mind if I borrow it for v2 of a project I'm
releasing soon?

Also, there is not a single negative comment here. That's a HN record.
Congratulations!

~~~
mholt
It's not a CSV library, right? :) Go for it.

And yeah, I've been surprised! I hadn't anticipated so many comments about the
web site, since the feature is a Javascript library. It's like going to a
movie for the previews.

Definitely grateful for the feedback, though. It's great that quirks are
being constructively reported so they can be fixed.

~~~
javajosh
_> I hadn't anticipated so many comments about the web site..._

Well, first, I think you need to accept the fact that you made a truly
stand-out website, with a useful innovation (the Q&A format with code
examples) that people are going to copy (and to good effect).

But second, comments about the web 'preview' are the GCD of HN - Haskell,
.Net, and Python programmers might still take a peek at a JavaScript library
web page, even if they won't use it immediately.

Third, and this is most important, it would be significantly better if you
published Papa Parse to the NPM registry, because then "real" javascript
programmers can add it to their existing project with a simple "npm install
papaparse --save". Publishing is trivial (install npm, run 'npm init', set
values, then 'npm publish').

~~~
benaiah
s/GCD/LCD

Greatest common divisor, lowest common denominator

~~~
javajosh
I meant GCD. The LCD between people on HN is arguably their personhood - but
that's not useful, since it's true for every online community. The GCD of HN,
on the other hand, has to do with a shared love of problem solving,
particularly in fair contests. The website of a library, as opposed to the
library itself, is an expression of "a solution", which we can all enjoy (and
comment on), even if we don't use it in our work. Therefore website exposition
speaks to the GCD of HN members.

------
damncabbage
Slightly off-topic, sorry – I'm having a hard time reading the text on the photo
background: [http://i.imgur.com/StACWqu.png](http://i.imgur.com/StACWqu.png)

~~~
mholt
Oops. That's awful. What browser are you using?

~~~
grossvogel
Don't think it's a browser issue. The text just doesn't contrast enough with
the diagonal shadows in the lower left of the bg image.

~~~
mholt
I wish that was the case. But he's right, it's not supposed to look like that.

------
prezjordan
Really digging the Q&A format

------
ekmartin
"Mini Papa for production use - Fat Papa for debug and development."

~~~
barnabask
Given the jaunty tone of the documentation, I really think they missed an
opportunity to somehow have a "Beard Papa" version. Randomly inserts cream
puff ingredients or something.

------
Camillo
Great presentation. One question: since this is a performance-oriented
library, did you consider the possibility of letting the user specify a
"chunk" function that receives an array of lines, instead of a "step" function
that only receives one? That way, if you are parsing a file with one million
lines, you can invoke the callback ten thousand times (with a hundred lines
each time) instead of a million times.
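
A rough sketch of the batching idea, in plain JavaScript. `makeRowBatcher` and its `step`/`flush` methods are hypothetical names made up for illustration, not part of Papa Parse; the sketch only assumes that some parser hands over one row at a time.

```javascript
// Hypothetical helper: collects rows from a per-row callback and hands them
// to onChunk in batches of chunkSize. Not part of Papa Parse's API.
function makeRowBatcher(chunkSize, onChunk) {
  let buffer = [];
  return {
    // Call once per parsed row (what a "step" callback would receive).
    step(row) {
      buffer.push(row);
      if (buffer.length >= chunkSize) {
        onChunk(buffer);
        buffer = [];
      }
    },
    // Call once at the end to deliver any rows left over.
    flush() {
      if (buffer.length > 0) {
        onChunk(buffer);
        buffer = [];
      }
    },
  };
}

// Usage: 250 rows in batches of 100 means 3 callback invocations, not 250.
const batchSizes = [];
const batcher = makeRowBatcher(100, (rows) => batchSizes.push(rows.length));
for (let i = 0; i < 250; i++) batcher.step([i, "row " + i]);
batcher.flush();
console.log(batchSizes); // [ 100, 100, 50 ]
```

The trade-off is memory: each batch holds `chunkSize` rows until the callback runs, so a larger batch means fewer calls but a bigger buffer.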

~~~
mholt
Sounds like a great idea. Since that will use more memory, I also want to
allow users of the library to customize the chunk sizes. I'll try to build in
these features for 3.1.

------
kcmarshall
Your demo page is slick. I was able to parse and inspect a couple files I had
lying around locally. Toggling the header and typing options worked great.

~~~
mholt
Awesome, glad it's useful. I was unsure how visitors would feel about having
to open the developer tools, but I feel like they give you a better way to
explore the results than just stringifying some JSON.

------
mnutt
This seems like it might integrate really well with crossfilter[0] which is a
really neat multi-dimensional filtering tool but can completely lock up the
browser as the data is loaded.

[0]
[http://square.github.io/crossfilter/](http://square.github.io/crossfilter/)

------
thecodemonkey
Yes! Thank you! We're using the d3 CSV parser for our CSV upload functionality
[1] but are having a ton of issues with performance and cross-browser support.
Can't wait to try this out.

[1] [http://geocod.io](http://geocod.io)

~~~
mholt
Cool, good luck! Disclaimer: some features definitely won't work in older
browsers.

I saw your service on HN a while back. It generated some excitement among us
here at SmartyStreets, which also uses Papa to process customers' lists
without requiring them to send us their whole file. I think these are great
use cases.

------
mqsiuser
When you convert CSV to and from JSON(/XML) then the interesting part IMHO is
how to convert the hierarchy (of JSON/XML) to CSV and how to encode it (e.g.
in a header-line). Do you account for that?

~~~
mholt
There are some assumptions made. Mainly, that you have an array of arrays or
an array of objects to encode as CSV. For an array of arrays, that's easy:
each array is a row. For an array of objects, each object is a row keyed by
field name.

Data with a varied structure is much trickier, if at all possible, to flatten.
I haven't figured out a magic formula for that (yet?).
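
To make the two input shapes concrete, here is a minimal sketch of flattening them to CSV. `rowsToCsv` and `escapeField` are hypothetical helpers, not the library's implementation, and the sketch assumes the first object's keys define the columns for every row.

```javascript
// Quote a field only when it contains a delimiter, a quote, or a line break;
// embedded quotes are doubled, following the common RFC 4180 convention.
function escapeField(value, delimiter) {
  const text = value === null || value === undefined ? "" : String(value);
  const needsQuotes =
    text.includes(delimiter) || text.includes('"') || /[\r\n]/.test(text);
  return needsQuotes ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Hypothetical helper (not the library's implementation): accepts an array of
// arrays, or an array of objects whose keys become the header row.
function rowsToCsv(rows, delimiter = ",") {
  if (rows.length === 0) return "";
  const line = (fields) =>
    fields.map((f) => escapeField(f, delimiter)).join(delimiter);
  if (Array.isArray(rows[0])) {
    // Array of arrays: each inner array is one row.
    return rows.map((row) => line(row)).join("\r\n");
  }
  // Assumption: the first object's keys define the columns for every row.
  const header = Object.keys(rows[0]);
  const body = rows.map((obj) => line(header.map((key) => obj[key])));
  return [line(header), ...body].join("\r\n");
}

const fromArrays = rowsToCsv([["a", "b"], [1, "x,y"]]);
const fromObjects = rowsToCsv([
  { name: "Ann", note: 'said "hi"' },
  { name: "Bob", note: "" },
]);
console.log(fromArrays);
console.log(fromObjects);
```

Objects with differing keys are exactly where this breaks down: a key missing from the first object never becomes a column.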

~~~
mqsiuser
Thanks for scrolling down :) and answering. Yes, assumptions have to be made. I
was just curious, since I have done something similar recently
([http://www.use-the-tree.com](http://www.use-the-tree.com)). The assumptions
I make there are "that these are reasonable business documents" (e.g.
invoices, orders). E.g. A List of Customers having Orders with Order Lines (3
levels (or more))

------
mescalito
I am very interested in hearing use cases for this, I am having trouble coming
up with something useful to do with it, but looks very interesting nonetheless
and site's indeed really good.

~~~
mholt
For one, here's the use case that inspired the project: SmartyStreets uses it
so people can process their address lists. It was difficult for some customers
to upload their files full of personal data either because of regulations or
just being concerned about privacy... so now they can just do it all in their
browser.

Scientific and research applications would find this useful... any web app
dealing with tabular data could benefit I think. Even if you still have to
upload the whole file to a server, being able to instantly render a preview is
a big win.

------
peaton
Hm, could somebody explain the multi-threaded part to me? I'm a little
confused. I didn't think multiple threads were possible in js. Furthermore,
does this only work on Firefox?

~~~
padenot
Multithreading in js has been available for a while using Web Workers [1].

They are available in most major browsers (including recent IE).

[1]:
[https://developer.mozilla.org/en/docs/Web/Guide/Performance/...](https://developer.mozilla.org/en/docs/Web/Guide/Performance/Using_web_workers)

~~~
peaton
Ok! Interesting. Are web workers gaining traction? This is the first time I've
heard of them. Are there any major users/libraries that depend on ww?

~~~
dangoor
It depends on your browser support requirements. On the Brackets project[1],
we work within Chromium (and "modern browsers" with our in-browser branch), so
we can rely on web workers for doing things like parsing JavaScript for
autocompletion.

Web workers are quite useful and I'm sure there are a fair number of webapps
that take advantage of them.

[1]: [http://brackets.io/](http://brackets.io/)

------
juergen
[Error] ReferenceError: Can't find variable: performance (anonymous function)
(demo.js, line 155) dispatch (jquery.min.js, line 3) handle (jquery.min.js,
line 3)

~~~
mholt
That should be fixed now. Thanks!

------
jflowers45
Haven't used the tool. Not sure I'll need it in the immediate future. But damn
I love the website and how quickly and easily I understood what it does.

------
stevoski
Does it handle quoted CSV fields that span multiple lines?

~~~
mholt
Yep. You can try it out on the demo page.

------
atacrawl
As others have said, really smart way of explaining features. But, I get the
following console error on the demo:

 _Can't find variable: performance_

~~~
mholt
Ah; your browser doesn't support the performance API. (To measure how long it
takes.) I'll fix that.
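
One possible shape for such a fix, as a sketch rather than what the demo actually does: check for the timer with `typeof`, which does not throw on an undeclared global, and fall back to `Date.now()`. The `now` and `timeIt` names are made up for illustration.

```javascript
// Use the high-resolution timer when the environment has one; otherwise fall
// back to Date.now(), which is coarser but universally available.
// typeof on an undeclared global returns "undefined" instead of throwing.
const now =
  typeof performance !== "undefined" && typeof performance.now === "function"
    ? () => performance.now()
    : () => Date.now();

// Hypothetical helper for timing a synchronous task in milliseconds.
function timeIt(task) {
  const start = now();
  const result = task();
  return { result, elapsedMs: now() - start };
}

const timed = timeIt(() => "a,b,c".split(","));
console.log(timed.result.length, typeof timed.elapsedMs);
```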

------
fiatjaf
I looked at this amazing parser some time ago, but it depended on jQuery for
basically nothing. Great news it doesn't anymore.

~~~
retroencabulato
Why?

------
dmachop
As jqueryin has said, the feature list is really good on what I can do. I
don't even have to check on docs for my purpose.

------
santoshsankar
This will be huge for financial data, operational data, etc. that is spit out
of age-old systems in CSV format. Excellent.

------
conickal
What happens when a remote server doesn't support Range headers?

~~~
mholt
Then you can't use the streaming feature. Fortunately, most production-ready
servers support the Range header.
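
For anyone unfamiliar with how ranged requests carve up a file, here is a small sketch of the header values involved. `rangeHeader` and `planRanges` are hypothetical helpers for illustration; how Papa Parse issues its requests internally is not shown here.

```javascript
// Build the Range header value for one chunk. HTTP byte ranges are inclusive
// on both ends, so a 1024-byte chunk starting at 0 is "bytes=0-1023".
function rangeHeader(start, chunkSize) {
  return "bytes=" + start + "-" + (start + chunkSize - 1);
}

// Hypothetical planner: list the headers needed to cover totalSize bytes.
function planRanges(totalSize, chunkSize) {
  const headers = [];
  for (let start = 0; start < totalSize; start += chunkSize) {
    // The final chunk may be shorter than chunkSize.
    const size = Math.min(chunkSize, totalSize - start);
    headers.push(rangeHeader(start, size));
  }
  return headers;
}

console.log(planRanges(2500, 1024));
// [ 'bytes=0-1023', 'bytes=1024-2047', 'bytes=2048-2499' ]
```

A server that honors the header answers with 206 Partial Content and only the requested bytes; one that ignores it answers 200 with the whole file, which is why streaming can't work in that case.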

------
mschulze
The link to File on developer.mozilla.org seems to be wrong.

~~~
mholt
Thanks, you're right! Fixed.

------
sjs382
Awesome name/brand for this tool. I love it.

------
shamsulbuddy
In the demo it says to see the data in the browser's console. That's cool, but
how do we download the data? The results should directly prompt a Save
dialog window.

~~~
mholt
Hmm... I'll consider that for a later upgrade to the demo. For most people, it
seems that seeing the results in the console is enough to know if it does what
they need.

~~~
shamsulbuddy
Yeah, we can wait for a later stage for the implementation in the demo, but in
the meantime, is there any documentation or pointers on how to achieve the
direct download of results?

~~~
mholt
A quick Google search yields this[0] which would probably do the trick. Just
beware of browser compatibility[1] (something that plagues several features of
Papa already, hence it requires a modern browser to use them).

[0]: [http://stackoverflow.com/questions/3665115/create-a-file-in-...](http://stackoverflow.com/questions/3665115/create-a-file-in-memory-for-user-to-download-not-through-server)

[1]:
[https://www.google.com/webhp?q=max%20length%20of%20data%20ur...](https://www.google.com/webhp?q=max%20length%20of%20data%20uri&safe=active#q=max+length+of+data+uri&safe=active)
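
The data URI approach could look roughly like this. It is only a sketch: `csvToDataUri` is a made-up helper, the browser part is shown as comments because it needs a DOM, and the URI length limits mentioned above still apply.

```javascript
// Encode CSV text as a data URI. encodeURIComponent escapes commas and line
// breaks, so the text survives being embedded in a URL.
function csvToDataUri(csvText) {
  return "data:text/csv;charset=utf-8," + encodeURIComponent(csvText);
}

const uri = csvToDataUri("a,b\r\n1,2");
console.log(uri); // data:text/csv;charset=utf-8,a%2Cb%0D%0A1%2C2

// In a browser, the URI could back a download link (assumed markup; some
// browsers may need the link attached to the document before clicking):
//   const link = document.createElement("a");
//   link.href = uri;
//   link.download = "results.csv";
//   link.click();
```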

------
retroencabulato
I will probably receive flak for this, but am I the only one that thinks it
is a little bizarre this much branding and visual design has gone into a CSV
parser?

~~~
wwwwwwwwww
welcome to the silicon valley bubble

------
dyeje
Interesting branding with the whole Papa motif.

------
nobotty
How do you have CSV data with a comma within a field?

~~~
accatyyc
You use another delimiter if you need commas. CSV isn't comma only (contrary
to what the name implies), you could use '!' or anything else.
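
Switching delimiters is one option. The common CSV convention (RFC 4180) also allows a comma inside a field as long as the field is wrapped in double quotes. A minimal quote-aware parser, written for illustration only and not taken from Papa Parse, shows both; `parseDelimited` is a hypothetical name and it assumes a single-character delimiter.

```javascript
// Minimal quote-aware parser, for illustration only (not Papa Parse's code).
// Handles delimiters, line breaks, and doubled quotes inside quoted fields.
function parseDelimited(text, delimiter = ",") {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"'; // doubled quote is a literal quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch; // delimiters and newlines are plain data here
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === delimiter) {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else {
      field += ch;
    }
  }
  // Flush the last row when the input does not end with a line break.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

const parsed = parseDelimited('name,city\n"Doe, Jane","New\nYork"\n');
console.log(JSON.stringify(parsed));
// [["name","city"],["Doe, Jane","New\nYork"]]

// The alternative-delimiter route: with "!" as the separator, commas are data.
console.log(JSON.stringify(parseDelimited("a!b,c", "!"))); // [["a","b,c"]]
```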

------
w00kie
Papa likes!

