

Visualize any public CSV on GitHub in a few clicks - glaugh
http://blog.statwing.com/visualize-any-public-csv-on-github-in-a-few-clicks/

======
sureshv
Looks like you handle CSVs and non-GitHub links to datasets. You should add
transparent handling of gz/zip files, as well as xls. This would be useful for
poking around government datasets.

~~~
glaugh
Yeah, agreed. We do actually handle xls files in our main product (this is
sort of a demo of an API connection). Probably should have enabled that for
this little implementation.

We can't take gz/zip files yet, but we probably should.

Thanks/cheers
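
The transparent gz/zip handling suggested above could be sketched like this in
Python. It checks magic bytes (gzip data starts with `1f 8b`, a zip archive
with the local-file-header signature `PK\x03\x04`) instead of trusting the
URL's extension. This is not Statwing's implementation: `open_tabular_bytes`
and `read_rows` are hypothetical names, and it assumes a zip holds a single
CSV and that the text is UTF-8. xls would need a third-party reader and is
left out.

```python
import csv
import gzip
import io
import zipfile
from typing import List


def open_tabular_bytes(data: bytes) -> io.StringIO:
    """Return CSV text as a stream, unwrapping gzip or zip if detected."""
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        data = gzip.decompress(data)
    elif data[:4] == b"PK\x03\x04":  # zip local-file-header signature
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            # Assumption: the archive holds one CSV; take the first .csv member.
            names = [n for n in zf.namelist() if n.lower().endswith(".csv")]
            if not names:
                raise ValueError("zip archive contains no .csv file")
            data = zf.read(names[0])
    # Assumption: UTF-8 text; "utf-8-sig" also drops a leading BOM if present.
    return io.StringIO(data.decode("utf-8-sig"), newline="")


def read_rows(data: bytes) -> List[List[str]]:
    """Parse plain, gzipped, or zipped CSV bytes into rows of strings."""
    return list(csv.reader(open_tabular_bytes(data)))
```

Sniffing content rather than extensions matters for links like raw GitHub
URLs, where the served name may not say what the bytes are.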

------
paulsef11
Very cool!

Is there any limitation on the delimiter? I often use less common characters
like * as delimiters to avoid mis-splitting values in the CSV. It would be
nice to see an option to specify or even detect the delimiter. The same goes
for headers.

Thanks for sharing!
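
Python's standard library already has a heuristic for detecting both the
delimiter and a header row, `csv.Sniffer`. A rough sketch, assuming the first
64 KB of the file is a representative sample; `sniff_and_parse` is a
hypothetical helper name. `Sniffer.has_header` is only a guess: it compares
the first row against the types and lengths of the columns below it, so it
can be wrong on all-text data.

```python
import csv
import io
from typing import List, Optional, Tuple


def sniff_and_parse(
    text: str, candidates: str = ",;\t|*"
) -> Tuple[str, Optional[List[str]], List[List[str]]]:
    """Guess the delimiter and header row, then parse the whole text."""
    sample = text[: 64 * 1024]  # assumption: the first 64 KB is representative
    sniffer = csv.Sniffer()
    # Restricting candidates keeps letters or digits from being chosen.
    # Raises csv.Error if no consistent delimiter is found.
    dialect = sniffer.sniff(sample, delimiters=candidates)
    has_header = sniffer.has_header(sample)
    rows = list(csv.reader(io.StringIO(text, newline=""), dialect))
    header = rows.pop(0) if has_header and rows else None
    return dialect.delimiter, header, rows
```

For example, `sniff_and_parse("x*y\n1*2\n3*4\n")` picks `*` as the delimiter
and treats `x`, `y` as a header, since those values are not numeric like the
rest of their columns.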

------
lionheart
Very cool. Is it possible to easily embed this into an application?

I bet my users would love that.

~~~
glaugh
Yup!

Here's details:
[https://www.statwing.com/overview/integrations](https://www.statwing.com/overview/integrations)

API docs:
[https://www.statwing.com/docs/api](https://www.statwing.com/docs/api)

------
scrollaway
I misread and thought it was an app to visualize CVS repositories on GitHub.

Man that would be cool... it's insane CVS is still used by more than one
person on this planet.

------
sheetjs
I wonder if the entire stack could run in the browser (drag and drop a file,
perform the statistical analysis in the browser, and display the results).

------
caio1982
I couldn't quickly find the limits on parsing these .csv files. How many lines
can they have and still be OK?

~~~
glaugh
Things will definitely start slowing down pretty linearly after 100k lines,
but we often see millions, and most files shouldn't break us as long as
they're not over ~500MB.

Edit: Fleshed out explanation

~~~
toomuchtodo
Do you check a resource with an HTTP HEAD request to ensure it's under 500MB
before ingesting it?
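
The pre-check asked about here could look like this in Python, using only
`urllib.request`. The helper names and the `MAX_BYTES` constant are
hypothetical. A HEAD response may omit `Content-Length` entirely (for example
with chunked transfer encoding), and for a compressed response the header
reports the compressed size, so this can only reject early; it cannot
guarantee the actual download will be small.

```python
import urllib.request
from typing import Mapping, Optional

MAX_BYTES = 500 * 1024 * 1024  # the ~500MB ceiling mentioned in the thread


def size_from_headers(headers: Mapping[str, str]) -> Optional[int]:
    """Return Content-Length as an int, or None if missing or malformed."""
    try:
        return int(headers.get("Content-Length"))
    except (TypeError, ValueError):
        return None


def head_size(url: str, timeout: float = 10.0) -> Optional[int]:
    """Issue a HEAD request and report the advertised size, if any."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return size_from_headers(resp.headers)


def looks_too_big(size: Optional[int], limit: int = MAX_BYTES) -> bool:
    # An unknown size is not treated as too big here; the download itself
    # still needs its own cap.
    return size is not None and size > limit
```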

~~~
glaugh
Nope. Folks who sign up for the API generally have some awareness of which
file sizes work and which don't. And since they're uploading for their users,
we're aligned around wanting those users to have a good experience.

We haven't worried about it in this particular implementation built on the API
because we didn't run across many raw GitHub files that were big. And even
when we do get the odd big one, we just refuse to process it once we receive
it, so it doesn't hurt us much if someone sends us a few GBs.
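
The reply describes refusing oversized files after they arrive. A variant
that stops reading as soon as a byte cap is crossed, so a multi-GB response
never sits fully in memory, might look like this. It is a swapped-in technique,
not what Statwing described, and `read_capped` and `TooLarge` are hypothetical
names. The stream can be any binary file-like object, such as the response
returned by `urllib.request.urlopen`.

```python
import io
from typing import BinaryIO


class TooLarge(Exception):
    """Raised when a stream exceeds the configured byte limit."""


def read_capped(stream: BinaryIO, limit: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read the whole stream, aborting once more than `limit` bytes arrive."""
    buf = io.BytesIO()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of stream
            break
        total += len(chunk)
        if total > limit:
            raise TooLarge(f"received more than {limit} bytes")
        buf.write(chunk)
    return buf.getvalue()
```

Reading in fixed-size chunks keeps memory use bounded by the limit plus one
chunk, regardless of how large the remote file turns out to be.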

