
Ask HN: What do you use to open large CSV files? - tixocloud
As a data analyst in a large organization, I still deal with large CSV files - wondering what the HN analytics community uses to open them?
======
uridium
[http://www.schematiq.com](http://www.schematiq.com) \- files are loaded
externally from the Excel process but are transparently accessible from within
Excel.

------
Lon7
I use SQL Server's BULK INSERT command to load it into a table. It's simple, fast, and
gives you options to generate error files containing any rows with malformed
data.

------
gaius
If you are confident that it is well-formed, bulk-loading it into SQLite is the
easiest method. There are equivalent methods in R and pandas too.
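The SQLite route above can be sketched with the standard library alone. This is a minimal illustration, not a recipe from the thread: the `sales` table, its columns, and the sample rows are all invented.

```python
import csv
import io
import sqlite3

# A tiny stand-in for the "large" CSV (contents invented for illustration).
csv_text = "id,region,amount\n1,east,10.0\n2,west,12.5\n3,east,7.25\n"

con = sqlite3.connect(":memory:")  # use a file path for real work
con.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL)")

rows = csv.reader(io.StringIO(csv_text))  # for a real file: open(path, newline="")
next(rows)  # skip the header row
# executemany consumes the iterator row by row, so the file never
# has to sit in memory all at once
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
con.commit()

total = con.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

From there, exploration is plain SQL (GROUP BY, LIMIT, etc.) against the table.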

Or you could load it into the Excel data model instead of the grid - look at the
Get & Transform feature - and do your initial exploration there. The data model
(formerly Power Pivot) is very underrated IMHO.

------
mtmail
[http://openrefine.org/](http://openrefine.org/)

------
chudi
We use pandas for opening big data files, manipulating the data, and then saving
it to a pickle file or a SQL database for later consumption.

~~~
gaius
DataFrame.to_sql() seems painfully slow - do you have any tips for speeding it
up?

What I do is dump it as CSV again after manipulation and then bulk load it
into the DB using the DB's native tool. It's messy but 10x faster.
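Short of dropping to a native loader, `to_sql` itself has two knobs that often help: `chunksize` batches the writes, and `method="multi"` packs many rows into each INSERT statement. A hedged sketch (the DataFrame contents are invented; assumes pandas is installed):

```python
import sqlite3
import pandas as pd

# Invented sample data standing in for the real DataFrame.
df = pd.DataFrame({"id": [1, 2, 3], "region": ["east", "west", "east"]})

con = sqlite3.connect(":memory:")
# chunksize batches rows per round trip; method="multi" emits multi-row
# INSERTs, which is usually much faster than the default row-at-a-time path.
df.to_sql("sales", con, index=False, if_exists="replace",
          method="multi", chunksize=10_000)

n = con.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

Even with these, the dump-to-CSV-and-native-bulk-load approach described above tends to win for very large tables.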

------
jklein11
If you can do whatever you need by analyzing one line at a time, I would
recommend reading the file as a stream and processing it that way.

If you can't do whatever you'd like one line at a time, read the file as a
stream and write it to disk with some database(e.g sqlite, MySQL, Mongo). This
will be slower but might be easier to play with the data.
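The one-line-at-a-time approach can be sketched with nothing but the `csv` module. The sample rows and the per-row "work" (a running sum) are invented for illustration; the point is that the file is never loaded whole:

```python
import csv
import io

# Stand-in for a file handle over a huge CSV (contents invented).
data = io.StringIO("id,amount\n1,10.0\n2,12.5\n3,7.25\n")

total = 0.0
count = 0
for row in csv.DictReader(data):  # yields one row at a time, O(1) memory
    total += float(row["amount"])
    count += 1
```

For a real file, replace the StringIO with `open(path, newline="")` and the same loop streams through gigabytes without exhausting RAM.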

------
CreMindES
At first, I just look into it with less, and then I usually use R's
data.table (fread) or some form of SQL for exploration.

------
partisan
Targeting .NET and SQL Server: the LumenWorks CsvReader in conjunction with
the SqlBulkCopy class. The performance is quite good, and it avoids the
permissions and file-access issues you hit when attempting to automate bcp
commands.

------
dtnewman
When you say "large", do you know _how_ large? Solutions might vary based on
whether you are looking at a million rows or a billion rows.

~~~
tixocloud
I'd say several million to several hundred million rows. I haven't seen a CSV
with a billion rows yet, although I have had to consolidate multiple
million-row files into a single CSV.

------
johnmurch
[https://www.csvexplorer.com](https://www.csvexplorer.com)

~~~
tixocloud
This website doesn't get past the corporate firewalls :)

------
limeblack
Are you looking for something for free or open source?

~~~
tixocloud
I'm fine with paid or free. Open source is fine as long as there is a pre-built package.

