
CSVfix is a tool for manipulating CSV data - pessimizer
https://code.google.com/p/csvfix/
======
thezilch
I can't speak for this lib, but I've had a lot of success with csvkit [0],
json2csv [1], and working with JSON directly via jq [2].

[0] [https://github.com/onyxfish/csvkit](https://github.com/onyxfish/csvkit)

[1] [https://github.com/jehiah/json2csv](https://github.com/jehiah/json2csv)

[2] [https://github.com/stedolan/jq](https://github.com/stedolan/jq)

~~~
susi22
This is a great list, but IMO lacks the most powerful (but unfortunately
unpopular) one:

[https://github.com/dbro/csvquote](https://github.com/dbro/csvquote)

Apply it first, then do the normal processing with GNU coreutils and you'll
cover most use cases.
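
A minimal sketch of the idea behind that approach, written in Python rather than taken from csvquote's own source: delimiters and newlines inside quoted fields are swapped for non-printing characters so that line- and comma-oriented tools see one record per line, and the swap is reversed afterwards. The function names and the choice of `\x1f` and `\x1e` as substitutes are assumptions made for illustration.

```python
# Placeholder characters (assumed for this sketch): ASCII unit and record separators.
DELIM_SUB = "\x1f"
NEWLINE_SUB = "\x1e"


def sanitize(text, delimiter=",", quote='"'):
    """Replace delimiters and newlines that occur inside quoted fields."""
    out = []
    in_quotes = False
    for ch in text:
        if ch == quote:
            # An escaped quote ("") toggles twice, so the state stays correct.
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes and ch == delimiter:
            out.append(DELIM_SUB)
        elif in_quotes and ch == "\n":
            out.append(NEWLINE_SUB)
        else:
            out.append(ch)
    return "".join(out)


def restore(text, delimiter=","):
    """Undo sanitize() once the line-oriented processing is finished."""
    return text.replace(DELIM_SUB, delimiter).replace(NEWLINE_SUB, "\n")
```

After `sanitize()`, splitting on `"\n"` and then on `","` is safe for well-formed input. Note that `str.splitlines()` also breaks on `\x1e`, so `split("\n")` is the better choice here.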

~~~
dbro
Thanks very much! You just made my day!

------
georgehotelling
I've had some success cleaning up CSV data in the past using OpenRefine[0]
(née Google Refine, and Freebase Gridworks before that). It is a really
powerful tool for getting data in a consistent format.

[0] [http://openrefine.org/](http://openrefine.org/)

~~~
baldfat
I love OpenRefine, but I prefer scripted steps so I can repeat them easily and
move on to other tools. Currently I use Python and Pandas.

I want a way to use OpenRefine and export the code to Python.

------
elchief
To quickly examine CSV data in PostgreSQL, you can do this:

      CREATE EXTENSION file_fdw;
    
      CREATE SERVER my_server FOREIGN DATA WRAPPER file_fdw;
    
      CREATE FOREIGN TABLE my_csv (
        field_a text,
        field_b smallint,
        ...
      ) SERVER my_server
      OPTIONS ( filename 'some/file/path.csv', format 'csv', header 'true' );
    
      SELECT * FROM my_csv;

~~~
eli
The first Python script I wrote is a fairly ugly hack that converts CSV to
SQLite-compatible SQL so you can query it:

[https://github.com/elidickinson/csv-tools/blob/master/csv2sql.py](https://github.com/elidickinson/csv-tools/blob/master/csv2sql.py)
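
For readers who want to see the general shape of such a conversion, here is a rough sketch using only the Python standard library. It is not the linked script: the function names, the default table name `data`, and the decision to store every column as `TEXT` are assumptions made for illustration, and it expects every row to have the same number of fields as the header.

```python
import csv
import io
import sqlite3


def csv_to_sql(csv_text, table="data"):
    """Yield a CREATE TABLE statement, then one INSERT per non-empty CSV row."""
    rows = csv.reader(io.StringIO(csv_text))
    header = next(rows)
    # Quote identifiers with double quotes, doubling any embedded quote.
    cols = ", ".join('"{}" TEXT'.format(name.replace('"', '""')) for name in header)
    yield 'CREATE TABLE "{}" ({});'.format(table, cols)
    for row in rows:
        if not row:
            continue  # csv.reader returns [] for blank lines
        # Emit values as SQL string literals, doubling single quotes.
        values = ", ".join("'{}'".format(v.replace("'", "''")) for v in row)
        yield 'INSERT INTO "{}" VALUES ({});'.format(table, values)


def load(csv_text, table="data"):
    """Run the generated SQL against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("\n".join(csv_to_sql(csv_text, table)))
    return conn
```

When the goal is only to query the data, passing rows to `executemany` with `?` placeholders avoids hand-quoting altogether. Generating SQL text is mainly useful when you want a file to feed to the `sqlite3` command-line shell.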

~~~
stevekemp
I once wrote a tool to insert Apache logfiles into a SQLite database to run
queries against.

I'm frequently surprised by how popular that project remains:

[http://steve.org.uk/Software/asql](http://steve.org.uk/Software/asql)
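
The same idea can be sketched in a few lines of Python. This is not how asql itself is implemented: it assumes the logs are in Apache's Common Log Format, and the `logs` table, its column names, and the `load_logs` function are hypothetical.

```python
import re
import sqlite3

# Common Log Format: host ident user [time] "request" status size
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def load_logs(lines):
    """Parse log lines into an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logs "
        "(host TEXT, time TEXT, request TEXT, status INTEGER, size INTEGER)"
    )
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that do not look like Common Log Format
        # Apache writes "-" when no response body was sent; store that as 0.
        size = 0 if m["size"] == "-" else int(m["size"])
        conn.execute(
            "INSERT INTO logs VALUES (?, ?, ?, ?, ?)",
            (m["host"], m["time"], m["request"], int(m["status"]), size),
        )
    return conn
```

With the data loaded, a query such as `SELECT host, COUNT(*) FROM logs WHERE status = 404 GROUP BY host` answers the usual "who is hitting missing pages" question.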

~~~
eli
That reminds me, there's actually a funky old Microsoft skunkworks project
that lets you query CSVs and Apache logfiles and all sorts of stuff via SQL:

[http://www.microsoft.com/en-us/download/details.aspx?id=2465](http://www.microsoft.com/en-us/download/details.aspx?id=2465)

I have no idea if it even still runs on current Windows versions.

------
Bjartr
The best tool I've ever used for cleaning up large datasets, CSV or otherwise,
is Google Refine [1].

It takes some time to get used to the workflow, but it's very powerful and
does a great job making messy data usable.

[1] [https://code.google.com/p/google-refine/](https://code.google.com/p/google-refine/)

------
voltagex_
I've used this in an (unfortunately unsuccessful) hackathon project - it
definitely got us out of CSV-cleaning-hell.

------
_e
I'm surprised no one has mentioned Microsoft's Log Parser. It provides query
access to CSV, XML and many types of log files.

[http://technet.microsoft.com/en-us/scriptcenter/dd919274.aspx](http://technet.microsoft.com/en-us/scriptcenter/dd919274.aspx)

------
RexRollman
Way off topic, but one of my favorite iOS apps is CSVtouch. It works well with
.csv files and it lets me keep my data in an open format (although it can only
view files and not edit them).

