
Show HN: CSV Explorer (YC F1) - Explore CSVs with Millions of Rows - jastr
https://www.csvexplorer.com/
======
xaybey
I usually use csvkit
([https://csvkit.readthedocs.io/en/1.0.1/index.html](https://csvkit.readthedocs.io/en/1.0.1/index.html)).
There are commands to list the columns, filter, and browse the data in a somewhat
formatted way with less. Typing the commands is a huge pain though, and I
would be very interested in a tool that could _instantly_ pop open and let me
peruse. Excel can take minutes to load and eagerly does a lot of unhelpful
formatting on things like dates and decimals.

That being said, I would never want to (and often legally cannot) ETL my data
to some third party. That would be terribly slow. But I would happily pay for
a nice desktop tool to do it for me; command line or GUI.
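
For anyone who wants the "list the columns, filter" part locally without typing csvkit commands, here is a minimal sketch using only Python's standard `csv` module. It is not how csvkit works internally; the function names `list_columns` and `filter_rows` and the sample data are made up for illustration.

```python
import csv
import io


def list_columns(lines):
    """Return the header row, reading only the first record of the input."""
    reader = csv.reader(lines)
    return next(reader, [])


def filter_rows(lines, column, predicate):
    """Yield rows (as dicts) whose `column` value satisfies `predicate`.

    Rows are streamed one at a time, so memory use stays flat
    regardless of how large the input is.
    """
    for row in csv.DictReader(lines):
        if predicate(row[column]):
            yield row


# Tiny in-memory stand-in for a large file. With a real file you would
# pass open("data.csv", newline="") instead (the file name is hypothetical).
sample = "id,city,amount\n1,Oslo,10.5\n2,Bergen,3.0\n3,Oslo,7.25\n"

columns = list_columns(io.StringIO(sample))
oslo_rows = list(filter_rows(io.StringIO(sample), "city", lambda v: v == "Oslo"))

print(columns)
print([row["id"] for row in oslo_rows])
```

Because nothing leaves the machine, this sidesteps the third-party upload problem, though it is obviously no substitute for a tool you can browse interactively.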

~~~
a3n
At a former job, our embedded device logs decoded to csv. Some of them were
too large for Excel.

Pandas handled them without a burp. Pandas in Jupyter (IPython Notebook) was a
godsend.

There's a minimal amount of variable setup, but once you've done that once or
twice it's easy.

Of course, any analysis or manipulation takes a bit of Python code, but I see
that as a feature, not the least because you can read it right there in the
open instead of having to hunt for formulas in cells.

~~~
donquichotte
Same here. I load the data using pandas or parse the rows by hand in Go.
It's interesting to watch your RAM getting filled up while the data is loaded.

Looks like this tool is for non-programmers; it's interesting to see that
there seems to be a market here.
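
On the RAM point: if all you need is a summary rather than the whole table in memory, streaming the rows keeps memory flat. A rough sketch with the standard `csv` module (not pandas or Go; the function name `column_summary` and the sample data are hypothetical):

```python
import csv
import io


def column_summary(lines, column):
    """Stream a CSV and return (count, total, minimum, maximum) for one numeric column."""
    count = 0
    total = 0.0
    minimum = None
    maximum = None
    for row in csv.DictReader(lines):
        raw = row[column]
        if raw == "":
            continue  # skip blank cells rather than guessing a value
        value = float(raw)
        count += 1
        total += value
        minimum = value if minimum is None else min(minimum, value)
        maximum = value if maximum is None else max(maximum, value)
    return count, total, minimum, maximum


# In-memory stand-in for a large log file; only one row is held at a time.
sample = "sensor,reading\na,1.5\nb,\na,4.0\nb,2.5\n"
summary = column_summary(io.StringIO(sample), "reading")
print(summary)
```

The trade-off is that every new question means another pass over the file, which is where a data frame or a database starts to pay off.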

------
jastr
Hi HN!

A few months ago, a college friend reached out because his consulting company
had just received a 60 GB spreadsheet, and they didn't know what to do with
it! They actually tried opening it in Excel.

I'm excited to Show HN CSV Explorer - a simple web tool for opening really big
CSVs! Try it out, and let me know how it goes!

~~~
goatlover
Open it with a programming language like R, Julia or Python that supports
data frames. No reason to upload 60 gigs to the web or Excel.

~~~
jastr
This project came out of some of my contracting work for consultants,
journalists, lawyers - people who don't know what a command line is.

------
deadringerr
Cool idea - since I have to work with sensitive data, it'd be great to have this
functionality without importing/uploading to the internet.

~~~
jastr
Thanks - having worked with health data for a few years I can relate.
Unfortunately, I don't have plans for a desktop app right now.

~~~
seanp2k2
Think not of a desktop app, but of doing everything client-side in JS. That
way, it's still a web app, but you're not schlepping [sensitive|large] data
between front and back-ends. Also, by offloading the work onto clients, it
scales much better - you could host the app on a CDN and have no real back-
end.

~~~
jastr
I originally wrote it entirely in client-side JS, but it didn't scale nicely
past a few hundred thousand rows. For the really big datasets, CSV Explorer
loads them into Redshift - queries take a few seconds!

------
patwalls
This is really awesome. What sparked this idea?

Not sure if you have this feature, but I think it would be really cool if you
could open APIs to your own datasets. This could be really useful for
enterprise applications that do a lot of flat file imports/exports to
push/pull data.

~~~
jastr
Some consultant friends asked for help when they got a 60 GB CSV file from
their client! Since then, I've also found that lots of tech companies share
CSVs internally. I've also worked with journalists attempting to look at large
public government datasets.

~~~
seanp2k2
Plenty of Big Data things also work with CSV. Like >1TB datasets big. It's
pretty insane, but it works.

------
dromenkoning
This tool would be great were it not for me (rightfully) getting fired and
possibly sued by any of my clients if I uploaded even a single file. In fact,
what kind of business with millions-of-rows kind of files would entrust said
datasets to a service that to me seems strangely vague [1] on how the data is
secured, or where it is actually going to be physically stored.

[1]
[https://www.csvexplorer.com/legal/privacy/](https://www.csvexplorer.com/legal/privacy/)

------
jastr
Update: A one minute demo video -
[https://youtu.be/RBiDL5neWDc](https://youtu.be/RBiDL5neWDc)

------
emodendroket
Is there any reason this tool couldn't be 100% client-side?

~~~
jastr
Perhaps - I tried a client-side implementation, but I had issues scaling past
a few hundred thousand rows. I understand the hesitation about uploading data to the
cloud, but I now have users looking at hundreds of millions of rows in seconds
thanks to Postgres!

~~~
Zaheer
You can run Postgres natively as well and do the same thing, no?
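
Roughly what I have in mind, as a sketch: load the CSV into a local database and query it with SQL. SQLite is swapped in for Postgres here purely so the example runs without a server; it is not what CSV Explorer uses, and the `load_csv` helper, the `records` table name and the sample data are made up.

```python
import csv
import io
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier so headers with spaces or quotes still work."""
    return '"{}"'.format(name.replace('"', '""'))


def load_csv(conn, table, lines):
    """Create `table` from the CSV header and bulk-insert the remaining rows."""
    reader = csv.reader(lines)
    header = next(reader)
    cols = ", ".join(quote_ident(name) for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute("CREATE TABLE {} ({})".format(quote_ident(table), cols))
    # Values go through placeholders, never string formatting.
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quote_ident(table), marks), reader
    )
    conn.commit()


# In-memory database and data; a real run would use a file on disk.
sample = "id,city,amount\n1,Oslo,10.5\n2,Bergen,3.0\n3,Oslo,7.25\n"
conn = sqlite3.connect(":memory:")
load_csv(conn, "records", io.StringIO(sample))

totals = conn.execute(
    'SELECT city, SUM(CAST(amount AS REAL)) FROM "records" '
    "GROUP BY city ORDER BY city"
).fetchall()
print(totals)
```

With Postgres itself you would more likely bulk-load with `COPY`, but the shape is the same: import once, then query locally.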

~~~
emodendroket
Not really an all-in-browser solution then.

------
hprotagonist
My solution to this problem, so far, has been ipython and pandas. (and maybe
jupyter notebook for visualization and sharing).

What's the value-add here?

~~~
jastr
It's like a Jupyter notebook for non-engineers, people who don't know what
Python is, e.g. consultants.

------
cocktailpeanuts
What's "YC F1"?

~~~
jastr
I was in the first batch of Y Combinator's Fellowship.

------
chrisweekly
Does this do stuff lnav can't?

------
pclark
You should integrate with commatocolumn.com! ;)

