Show HN: Papa Parse 3 (papaparse.com)
363 points by mholt on July 16, 2014 | 83 comments

This looks great, particularly the way the website breaks down the main functionality with plain explanations and code samples. I know that there's a bit of a bugbear with some people when it comes to reimplementing CSV parsing but having it in-browser could be very handy.

Thanks for your feedback. If it helps to know, the Q&A format on the homepage is the result of scratching my head for a while, thinking "How would I explain this to somebody? There's a lot to it, and it's all immediately relevant."

I eventually convinced myself that the last part isn't true. It could be disclosed in parts. So I decided to run the scenario in my mind instead: "I'll give somebody this library and imagine that they'll reveal their use cases one piece at a time and I'll help them through it." Then the conversation flowed naturally and came out while typing.

What really got me was the desperation in the "voice" of the questions, slowly being won over by a calm father figure. I've never had to parse CSV in the browser, but now I feel like I should. Powerful stuff!

I'm also a big fan of libraries where the library name is also the API (ie Papa Parse -> Papa.parse()). It just fills me with glee.

Will the website itself be released under an open license? I could see this website format being useful for a ton of open source projects.

I'd be okay with that, I just hadn't considered it. By format do you mean design or content (or both)? And are you thinking a Creative Commons license?

I meant the design, since the page created such a clear template that would be fantastic for future open source projects to modify. But if you want to release the content in the open too, all the better. I just think that a lot of open source projects which struggle with onboarding would be greatly benefited by using the page as a base.

Love the conversational style of the documentation!

Loved the format!

I thought this as well, really good way to explain the benefits and features - through common questions.

Yeah, it explains it so well. Being able to relate to each point and feature is such a key part of explaining your service.

This site might be for a CSV parsing library and not a startup, but it does a better job of telling a story than most startups do. And, to me, that's how you sell and engage users.

The explanation of features is one of the best I've seen on the landing page. I didn't even have to hit the documentation to fully understand usage under a variety of conditions/circumstances.

On the other hand, it's a CSV parser so it's fairly easy to explain the features and list all of them (rather than try to select some important ones).

Do you guys think it would be too cheesy/unoriginal to "steal" this Q&A format for other products? I have a free software library to publish pretty soon, and I absolutely love your way of explaining how it all works in a cool, conversational style.

Innovation is cloning 80% and innovating on the remaining 20%. :)

Edit: Hey why the downvotes, I didn't say it, Andrew Chen said it. Slide 23 http://andrewchen.co/2013/10/14/zero-to-productmarket-fit-pr...

Steve Jobs (who stole it from Picasso, who took it from ?) said "Picasso had a saying -- 'good artists copy; great artists steal' -- and we have always been shameless about stealing great ideas."

This was later "clarified" by Bud Tribble: "If you take something and make it your own ... it's your design and that is the dividing line between copying and stealing."

My history teacher in high school had a simple challenge to get an A for the entire year, skipping all tests and quizzes, and free pass to sleep during class: "Give me a 100% unique idea"

Obviously everyone tried and everyone failed. There are no unique ideas, only innovations upon existing and past work of others.

So I find it odd anyone would downvote you for the truth!

>> Steve Jobs (who stole it from Picasso, who took it from ?) said "Picasso had a saying -- 'good artists copy; great artists steal' -- and we have always been shameless about stealing great ideas."

That's faintly ironic -- he had too much shame to steal the quote from Picasso. Perhaps he meant that his company employed shameless great artists, not that he was one himself. Steve Jobs has always struck me as a businessman, primarily (which is probably what you want in a CEO, if you owned, say, Apple stock -- not a thief).

Of course it would be unoriginal! Do it anyway, it's a good way to explain usage.

Wait a month, swap "papa" with "mama", and you should be good to go.

That's the coolest, most informative, clear and concise FAQ I've seen! Great job Devs. You are cool! :)

For what would otherwise be considered a mundane task, this page is absolutely epic. Nice work!

Amazing project! Everything seems really well put together. I really like the humor and the Q&A format. Do you mind if I borrow it for v2 of a project I'm releasing soon?

Also, there is not a single negative comment here. That's a HN record. Congratulations!

It's not a CSV library, right? :) Go for it.

And yeah, I've been surprised! I hadn't anticipated so many comments about the web site, since the feature is a Javascript library. It's like going to a movie for the previews.

Definitely grateful for the feedback, though. It's great that quirks are being constructively reported so they can be fixed.

> I hadn't anticipated so many comments about the web site...

Well, first, I think you need to accept the fact that you made a truly stand-out website, with a useful innovation (the Q&A format with code examples) that people are going to copy (and to good effect).

But second, the comments about the web 'preview' are the GCD of HN - Haskell, .Net, and Python programmers might still take a peek at a JavaScript library web page, even if they won't use it immediately.

Third, and this is most important, it would be significantly better if you published Papa Parse to the NPM registry, because then "real" javascript programmers can add it to their existing project with a simple "npm install papaparse --save". Publishing is trivial (install npm, run 'npm init', set values, then 'npm publish').

Thank you! And you're right about the audience and their interests. That makes sense.

I'd love to package this thing up for NPM. But I'm not a Node.js developer, so I haven't tried running this in Node. I'll see what I can do though to make that happen.

Interestingly, a lot of people are putting up non-node.js libraries on NPM (front-end stuff), which Bower, Browserify, and others take advantage of. So don't be shy, just document that it's for front end (though I have a feeling it'll probably do just fine in node anyway, the lack of dependencies makes it easy)


Greatest common multiple, lowest common denominator

I meant GCD. The LCD between people on HN is arguably their personhood - but that's not useful, since it's true for every online community. The GCD of HN, on the other hand, has to do with a shared love of problem solving, particularly in fair contests. The website of a library, as opposed to the library itself, is an expression of "a solution", which we can all enjoy (and comment on), even if we don't use it in our work. Therefore website exposition speaks to the GCD of HN members.

I was wondering why he was comparing JS to Grand Central Dispatch myself.

It's really great, usually people around here will tear into any detail of your site that they don't like or find unconventional.

Slightly off-topic, sorry – I'm having a hard time reading the text on the photo background: http://i.imgur.com/StACWqu.png

Oops. That's awful. What browser are you using?

You can use http://aboutbrowser.com to share not just your browser, but its capabilities & CSS support too.

Nice. They should give an option to choose what to share. For example, I don't want to share network info.

Don't think it's a browser issue. The text just doesn't contrast enough with the diagonal shadows in the lower left of the bg image.

I wish that was the case. But he's right, it's not supposed to look like that.

For me the background is grey on that spot and the bg image is only behind the main hero at the top so I definitely think it's a browser issue. Looks great here.

It looks somewhat similar for me on the android browser.

Really digging the Q&A format

"Mini Papa for production use - Fat Papa for debug and development."

Given the jaunty tone of the documentation, I really think they missed an opportunity to somehow have a "Beard Papa" version. Randomly inserts cream puff ingredients or something.

Great presentation. One question: since this is a performance-oriented library, did you consider the possibility of letting the user specify a "chunk" function that receives an array of lines, instead of a "step" function that only receives one? That way, if you are parsing a file with one million lines, you can invoke the callback ten thousand times (with a hundred lines each time) instead of a million times.

Sounds like a great idea. Since that will use more memory, I also want to allow users of the library to customize the chunk sizes. I'll try to build in these features for 3.1.
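
For anyone wanting this before it lands in the library, here is a minimal sketch of the batching idea discussed above. `makeChunker` is a hypothetical helper, not part of Papa Parse; it only assumes that something calls a per-row "step" function and that the caller flushes once at the end.

```javascript
// Hypothetical helper (not a Papa Parse API): collects rows delivered one
// at a time and hands them to `onChunk` in arrays of `chunkSize` rows.
function makeChunker(chunkSize, onChunk) {
  let buffer = [];
  return {
    // Call once per parsed row, e.g. from a per-row "step" callback.
    step(row) {
      buffer.push(row);
      if (buffer.length >= chunkSize) {
        onChunk(buffer);
        buffer = [];
      }
    },
    // Call once after the last row to deliver any leftover rows.
    flush() {
      if (buffer.length > 0) {
        onChunk(buffer);
        buffer = [];
      }
    },
  };
}

// Usage: 250 rows with a chunk size of 100 gives three callbacks.
const sizes = [];
const chunker = makeChunker(100, (rows) => sizes.push(rows.length));
for (let i = 0; i < 250; i++) chunker.step(["row", i]);
chunker.flush();
console.log(sizes); // [ 100, 100, 50 ]
```

The trade-off is the one mentioned in the reply: larger chunks mean fewer callback invocations but more rows held in memory at once.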

Your demo page is slick. I was able to parse and inspect a couple files I had laying around locally. Toggling the header and typing options worked great.

Awesome, glad it's useful. I was unsure how visitors would feel about having to open the developer tools, but I feel like they give you a better way to explore the results than just stringifying some JSON.

This seems like it might integrate really well with crossfilter[0] which is a really neat multi-dimensional filtering tool but can completely lock up the browser as the data is loaded.

[0] http://square.github.io/crossfilter/

Yes! Thank you! We're using the d3 CSV parser for our CSV upload functionality [1] but are having a ton of issues with performance and cross-browser support. Can't wait to try this out.

[1] http://geocod.io

Cool, good luck! Disclaimer: some features definitely won't work in older browsers.

I saw your service on HN a while back. It generated some excitement among us here at SmartyStreets, which also uses Papa to process customers' lists without requiring them to send us their whole file. I think these are great use cases.

When you convert CSV to and from JSON(/XML) then the interesting part IMHO is how to convert the hierarchy (of JSON/XML) to CSV and how to encode it (e.g. in a header-line). Do you account for that?

There are some assumptions made. Mainly, that you have an array of arrays or an array of objects to encode as CSV. For an array of arrays, that's easy: each array is a row. For an array of objects, each object is a row keyed by field name.
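
To make those two cases concrete, here is a rough sketch of an encoder that accepts either shape. It is not the library's implementation: `toCsv` and `escapeField` are illustrative names, and taking the field names from the first object is an assumption made for brevity.

```javascript
// Quote a value only when it contains a comma, a quote, or a line break;
// embedded quotes are doubled.
function escapeField(value) {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Illustrative encoder: accepts an array of arrays or an array of objects.
function toCsv(rows) {
  if (rows.length === 0) return "";
  if (Array.isArray(rows[0])) {
    // Array of arrays: each inner array is one row.
    return rows.map((row) => row.map(escapeField).join(",")).join("\r\n");
  }
  // Array of objects: assumption -- the first object defines the columns.
  const fields = Object.keys(rows[0]);
  const lines = [fields.map(escapeField).join(",")];
  for (const obj of rows) {
    lines.push(fields.map((name) => escapeField(obj[name])).join(","));
  }
  return lines.join("\r\n");
}

console.log(toCsv([["a", "b,b"], ["1", "2"]]));
console.log(toCsv([{ name: "x", n: 1 }, { name: "y", n: 2 }]));
```

Objects with differing keys would silently lose or blank out fields here, which is exactly the "varied structure" problem.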

Data with a varied structure is much trickier, if at all possible, to flatten. I haven't figured out a magic formula for that (yet?).

Thanks for scrolling down :) and answering. Yes, assumptions have to be made. I was just curious, since I did something similar recently (http://www.use-the-tree.com). The assumption I make there is "that these are reasonable business documents" (e.g. invoices, orders). For example: a list of customers having orders with order lines (3 levels or more).

I am very interested in hearing use cases for this. I am having trouble coming up with something useful to do with it, but it looks very interesting nonetheless, and the site is indeed really good.

For one, here's the use case that inspired the project: SmartyStreets uses it so people can process their address lists. It was difficult for some customers to upload their files full of personal data either because of regulations or just being concerned about privacy... so now they can just do it all in their browser.

Scientific and research applications would find this useful... any web app dealing with tabular data could benefit I think. Even if you still have to upload the whole file to a server, being able to instantly render a preview is a big win.

Hm, could somebody explain the multi-threaded part to me? I'm a little confused. I didn't think multiple threads were possible in js. Furthermore, does this only work on Firefox?

Multithreading in js has been available for a while using Web Workers [1].

They are available in most major browsers (including recent IE).

[1]: https://developer.mozilla.org/en/docs/Web/Guide/Performance/...

Ok! Interesting. Are web workers gaining traction? This is the first time I've heard of them. Are there any major users/libraries that depend on ww?

It depends on your browser support requirements. On the Brackets project[1], we work within Chromium (and "modern browsers" with our in-browser branch), so we can rely on web workers for doing things like parsing JavaScript for autocompletion.

Web workers are quite useful and I'm sure there are a fair number of webapps that take advantage of them.

[1]: http://brackets.io/

[Error] ReferenceError: Can't find variable: performance (anonyme Funktion) (demo.js, line 155) dispatch (jquery.min.js, line 3) handle (jquery.min.js, line 3)

That should be fixed now. Thanks!

Haven't used the tool. Not sure I'll need it in the immediate future. But damn I love the website and how quickly and easily I understood what it does.

Does it handle quoted CSV fields that span multiple lines?

Yep. You can try it out on the demo page.

As others have said, really smart way of explaining features. But, I get the following console error on the demo:

Can't find variable: performance

Ah; your browser doesn't support the performance API. (To measure how long it takes.) I'll fix that.
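
One common way to guard against this kind of error is to fall back to `Date.now()` when `performance.now()` is missing. This is only a sketch of that pattern, not the actual fix applied to the demo; `now` and `timeIt` are illustrative names.

```javascript
// Use the high-resolution timer when it exists, otherwise fall back to
// Date.now(), which has coarser (millisecond) resolution.
const now =
  typeof performance !== "undefined" && typeof performance.now === "function"
    ? () => performance.now()
    : () => Date.now();

// Illustrative helper: runs `fn` and reports how long it took.
function timeIt(fn) {
  const start = now();
  const result = fn();
  return { result, elapsedMs: now() - start };
}

const timing = timeIt(() => "a,b,c".split(","));
console.log(timing.result, timing.elapsedMs + " ms");
```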

I looked at this amazing parser some time ago, but it depended on jQuery for basically nothing. Great news it doesn't anymore.


As jqueryin has said, the feature list is really good on what I can do. I don't even have to check on docs for my purpose.

This will be huge for financial data, operational data, etc. that is spit out of age-old systems in CSV format. Excellent.

What happens when a remote server doesn't support Range headers?

Then you can't use the streaming feature. Fortunately, most production-ready servers support the Range header.
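
For context, a rough sketch of what chunked requests look like at the HTTP level. The helper names and the 1024-byte chunk size are made up for illustration; the header format and the 206 status code come from the HTTP range-request specification.

```javascript
// Build the Range header value for a chunk starting at byte `start`.
// The end position in a byte range is inclusive.
function rangeHeader(start, chunkSize) {
  const end = start + chunkSize - 1;
  return "bytes=" + start + "-" + end;
}

// A server that honors the Range header replies 206 Partial Content.
// A plain 200 means it ignored the header and is sending the whole file.
function honoredRange(status) {
  return status === 206;
}

console.log(rangeHeader(0, 1024));    // bytes=0-1023
console.log(rangeHeader(1024, 1024)); // bytes=1024-2047
```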

The link to File on developer.mozilla.org seems to be wrong.

Thanks, you're right! Fixed.

Awesome name/brand for this tool. I love it.

The demo says to view the data in the browser's console. That's cool, but how do we download the data? Ideally the results would directly prompt a Save dialog window.

Hmm... I'll consider that for a later upgrade to the demo. For most people, it seems that seeing the results in the console is enough to know if it does what they need.

Yeah, we can wait for a later stage for this in the demo, but in the meantime, is there any documentation or are there pointers on how to achieve a direct download of the results?

A quick Google search yields this[0] which would probably do the trick. Just beware of browser compatibility[1] (something that plagues several features of Papa already, hence it requires a modern browser to use them).

[0]: http://stackoverflow.com/questions/3665115/create-a-file-in-...

[1]: https://www.google.com/webhp?q=max%20length%20of%20data%20ur...
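
A minimal sketch of the data-URI approach those links point toward, assuming the results have already been turned into a CSV string. `csvDataUri` is an illustrative name, and the browser-only part is left as a comment because it needs a DOM.

```javascript
// Turn CSV text into a data: URI that a link can point at.
function csvDataUri(csvText) {
  return "data:text/csv;charset=utf-8," + encodeURIComponent(csvText);
}

// In a browser (assumption: the anchor `download` attribute is supported):
//   const link = document.createElement("a");
//   link.href = csvDataUri(csv);
//   link.download = "results.csv";
//   link.click();

console.log(csvDataUri("a,b\n1,2"));
```

Browsers cap how long a data URI may be, so this suits small results better than large files.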

I will probably receive flak for this, but am I the only one that thinks it is a little bizarre this much branding and visual design has gone into a CSV parser?

welcome to the silicon valley bubble

Interesting branding with the whole Papa motif.

How do you have CSV data with a comma within a field?

You use another delimiter if you need commas. CSV isn't comma only (contrary to what the name implies), you could use '!' or anything else.

Enclose the field in quotes: a,"b,b",c
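
To show why quoting works, here is a small sketch of a parser that tracks whether it is inside quotes. It is a toy for illustration, not how Papa Parse is implemented; `parseCsv` is a made-up name, and it assumes a comma delimiter and well-formed input.

```javascript
// Toy CSV parser: commas and line breaks inside double quotes are kept as
// part of the field, and a doubled quote ("") inside quotes is one literal quote.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"') {
        if (text[i + 1] === '"') {
          field += '"'; // escaped quote
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += ch; // commas and newlines are ordinary characters here
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field);
      field = "";
    } else if (ch === "\n") {
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else if (ch !== "\r") {
      field += ch; // carriage returns outside quotes are dropped
    }
  }
  // Emit the last row when the input does not end with a newline.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

console.log(parseCsv('a,"b,b",c')); // [ [ 'a', 'b,b', 'c' ] ]
```

The same state flag is what lets a quoted field span multiple lines.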

Papa likes!
