Visualize any public CSV on GitHub in a few clicks (statwing.com)
69 points by glaugh on Apr 29, 2014 | 11 comments

Looks like you handle CSVs and non-GitHub links to datasets. You should add transparent handling of gz/zip files as well as xls. This would be useful for poking around government datasets.

Yeah, agreed. We do actually handle xls files in our main product (this is sort of a demo of an API connection). Probably should have enabled that for this little implementation.

Can't yet take gz/zip files, probably should though.
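For what it's worth, here is a rough sketch of what "transparent" gz/zip handling could look like, using only Python's standard library. It is not Statwing's code; the function name `unwrap_csv_bytes` and the "first .csv member wins" rule for zip archives are assumptions made purely for illustration.

```python
import gzip
import io
import zipfile

GZIP_MAGIC = b"\x1f\x8b"       # first two bytes of any gzip stream
ZIP_MAGIC = b"PK\x03\x04"      # local file header of a non-empty zip archive


def unwrap_csv_bytes(raw: bytes) -> bytes:
    """Return CSV bytes from raw input that may be gzip, zip, or plain.

    Hypothetical helper: detection is by magic bytes, not file extension.
    """
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    if raw[:4] == ZIP_MAGIC:
        with zipfile.ZipFile(io.BytesIO(raw)) as archive:
            # Assumption: take the first member whose name ends in .csv.
            csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
            if not csv_names:
                raise ValueError("zip archive contains no .csv member")
            return archive.read(csv_names[0])
    # Anything else is passed through untouched as plain CSV.
    return raw
```

Sniffing the magic bytes rather than trusting the URL's extension matters for government datasets in particular, since download links often carry no extension at all.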


Very cool!

Is there any limitation on the delimiter? I often choose less frequently used characters like * as delimiters so they don't collide with values in the CSV. It would be nice to see an option to specify or even detect the delimiter. The same could be said for headers.

Thanks for sharing!
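As a sketch of the detection idea: Python's standard `csv.Sniffer` can guess both the delimiter and whether a header row is present. Nothing here reflects how Statwing actually parses files; the helper name `detect_format` and the candidate delimiter set are assumptions for illustration.

```python
import csv
from typing import Tuple


def detect_format(sample: str, candidates: str = ",;\t|*") -> Tuple[str, bool]:
    """Guess the delimiter and header presence from a text sample.

    Hypothetical helper: `candidates` limits the guess to characters we
    are willing to accept, including unusual ones such as '*'.
    """
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(sample, delimiters=candidates)
    # has_header is a heuristic and can be wrong on all-text columns.
    return dialect.delimiter, sniffer.has_header(sample)


sample = "name*age\nalice*30\nbob*25\ncarol*41\n"
delimiter, has_header = detect_format(sample)
rows = list(csv.reader(sample.splitlines(), delimiter=delimiter))
```

Both guesses are heuristics, so an explicit option to override them would still be worth having.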

Very cool. Is it possible to easily embed this into an application?

I bet my users would love that.

I misread and thought it was an app to visualize CVS repositories on GitHub.

Man that would be cool... it's insane CVS is still used by more than one person on this planet.

I wonder if the entire stack could be run in the browser (click-drag to submit a file, perform statistical analysis in the browser, and display results).

Couldn't quickly find the limitations when parsing these .csv files. How many lines would still be OK?

Things will definitely start slowing down pretty linearly after 100k lines, but we often see millions, and most files shouldn't break us as long as they're not over ~500MB.

Edit: Fleshed out explanation

Do you try to check a resource with an HTTP HEAD request to ensure it's under 500MB before ingesting?
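To make the question concrete, a pre-check along these lines might look like the sketch below. It uses only the standard library; the names `size_allowed` and `head_size_allowed` are made up for this example, and the limit is just the ~500MB figure mentioned above. Servers are not obliged to send `Content-Length`, so the check returns `None` when it cannot tell.

```python
import urllib.request
from typing import Optional

MAX_BYTES = 500 * 1024 * 1024  # the ~500MB figure from this thread


def size_allowed(content_length: Optional[str], limit: int = MAX_BYTES) -> Optional[bool]:
    """Interpret a Content-Length header value against a byte limit.

    Returns None when the header is missing or malformed, because a HEAD
    response is not guaranteed to report a size.
    """
    if content_length is None or not content_length.strip().isdigit():
        return None
    return int(content_length) <= limit


def head_size_allowed(url: str, limit: int = MAX_BYTES, timeout: float = 10.0) -> Optional[bool]:
    """Issue a HEAD request and check the reported size (hypothetical helper)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return size_allowed(response.headers.get("Content-Length"), limit)
```

Since the header can be absent or wrong, a check like this only saves bandwidth in the common case and does not replace a limit enforced during the download itself.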

Nope. Folks who sign up for the API generally have some awareness of what size files work and what don't. And since they're uploading for their users, we're aligned around wanting those users to have a good experience.

We haven't worried about it in this particular implementation around the API because we didn't run across many raw GitHub files that were big. And even when we do get the odd big one, we just refuse to process it once we receive it, so it doesn't hurt us much if someone sends us a few GBs.
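A variation on "refuse it once we receive it" is to stop reading as soon as the limit is crossed, so a multi-GB file is never fully buffered. The sketch below is that alternative, not the behavior described above; `read_capped` and `FileTooLarge` are hypothetical names, and the stream can be any binary file-like object, such as an HTTP response body.

```python
import io
from typing import BinaryIO

MAX_BYTES = 500 * 1024 * 1024  # the ~500MB figure from this thread


class FileTooLarge(Exception):
    """Raised when a stream yields more bytes than the allowed limit."""


def read_capped(stream: BinaryIO, limit: int = MAX_BYTES, chunk_size: int = 64 * 1024) -> bytes:
    """Read a binary stream in chunks, aborting once `limit` is exceeded."""
    buffer = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # End of stream reached without crossing the limit.
            return bytes(buffer)
        buffer.extend(chunk)
        if len(buffer) > limit:
            raise FileTooLarge(f"input exceeds {limit} bytes")


small = read_capped(io.BytesIO(b"a,b\n1,2\n"), limit=100)
```

With this approach the worst case costs roughly `limit + chunk_size` bytes of memory, regardless of how large the remote file really is.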
