

Csvkit - command line utilities for working with csv files - cpenner461
http://csvkit.readthedocs.org/en/latest/index.html

======
knowtheory
It's worth noting that this is a tool that was built by Chris Groskopf (he's
@onyxfish on Twitter) while he was working at the Chicago Tribune.

Chris is now working on a Knight Foundation-funded project called PANDA to
build a FOSS search appliance for tabular data (especially CSV and
spreadsheet-based files), intended for deployment in newsrooms
([http://www.pbs.org/idealab/2011/11/panda-project-releases-a-...](http://www.pbs.org/idealab/2011/11/panda-project-releases-a-first-alpha307.html)).

You can test out the alpha for panda online here:
<http://alpha.pandaproject.net/>

(Chris is also a super nice guy)

------
zvrba
Mucking around with CSV files on the command line is painful. Been there, done
that, got annoyed by the many limitations of that approach, bit the bullet and
learned R. It is a somewhat weird language, but it was one of the best decisions
I ever made in my professional career: now it is my go-to tool for data
analysis and plotting.

CSV data is easily imported into R, where you can analyze it, transform
it, plot it -- everything from a unified command-line interface (there also
exist GUIs, but I haven't used them). The reshape and plyr packages are worth
learning too. There's also an Emacs package for interacting with R (ESS), which
significantly eases interaction; it also works under Windows, and is what I use
in my work with R.

TL;DR: nice project, but it's a toy compared to what you can get from R. (Re
unix philosophy: I'm the type of person that likes to get the job done and I
therefore very often choose pragmatism over idealism.)

~~~
archangel_one
I don't think the intention of this tool is for doing statistical analysis of
data, just for manipulating it at a shell prompt. For example, csvcut is
similar to Unix cut in that it's a binary you can pipe data through, which R
isn't.
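
To make the "binary you can pipe data through" idea concrete, here is a minimal Python sketch of a cut-like column filter. It is not csvcut's actual implementation; `cut_columns` and the sample data are hypothetical names made up for illustration, and it only uses the standard `csv` module.

```python
import csv
import io


def cut_columns(source, dest, names):
    """Copy only the columns listed in `names` from `source` to `dest`.

    Hypothetical helper for illustration; not part of csvkit.
    """
    reader = csv.DictReader(source)
    writer = csv.DictWriter(
        dest,
        fieldnames=names,
        extrasaction="ignore",  # silently drop columns that were not requested
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


# Made-up sample data; note the quoted field that contains a comma.
source = io.StringIO('id,name,city\n1,"Smith, J",Chicago\n2,Lee,Boston\n')
dest = io.StringIO()
cut_columns(source, dest, ["name", "city"])
print(dest.getvalue(), end="")
```

Passing `sys.stdin` and `sys.stdout` instead of the `StringIO` objects would turn the same function into a filter that sits in the middle of a shell pipeline.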

------
veyron
Is there a CSV AWK?

By that I mean something that could read the header line in a CSV and
automatically generate those variables for each line.

Demonstration:

    $ cat test.csv
    Field1,Field2
    1,2
    3,4
    5,6
    $ csvawk '{print Field1+Field2}' test.csv
    3
    7
    11
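
For what it's worth, the behaviour asked for here can be roughly sketched in a few lines of Python. This is only an illustration under assumptions: `csvawk` below is a hypothetical function (no such command is claimed to exist), it takes a Python expression rather than awk syntax, and it uses `eval`, which is only reasonable for expressions you typed yourself.

```python
import csv
import io


def csvawk(expression, lines):
    """Evaluate `expression` once per data row, with header names bound as variables.

    Hypothetical helper for illustration. Cells that look numeric are
    converted to int or float; everything else stays a string.
    """
    def convert(cell):
        for cast in (int, float):
            try:
                return cast(cell)
            except (TypeError, ValueError):
                pass
        return cell

    results = []
    for row in csv.DictReader(lines):
        variables = {name: convert(value) for name, value in row.items()}
        # eval is fine for a trusted one-liner, never for untrusted input.
        results.append(eval(expression, {"__builtins__": {}}, variables))
    return results


sample = io.StringIO("Field1,Field2\n1,2\n3,4\n5,6\n")
print(csvawk("Field1 + Field2", sample))  # [3, 7, 11]
```

Because the rows go through `csv.DictReader`, quoted fields containing commas are kept intact, which is the part plain awk does not give you.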

~~~
ralph
IIRC Aho, Kernighan, and Weinberger's excellent slim tome _The Awk Programming
Language_ implements something similar; awk is used to generate the awk
program, with Field1 replaced by $1, etc.
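
The general idea of rewriting names into positional fields can be sketched as follows. This is not the book's code (which is written in awk); it is a Python approximation, and `translate_to_awk` is a hypothetical name used only for this example.

```python
import re


def translate_to_awk(program, header):
    """Replace each header name in `program` with its awk field ($1, $2, ...).

    Hypothetical helper for illustration; it does plain whole-word
    substitution and knows nothing about awk strings or comments.
    """
    positions = {name: f"${index}" for index, name in enumerate(header, start=1)}
    # Try longer names first so a short name never wins over a longer one.
    names = sorted(positions, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda match: positions[match.group(1)], program)


print(translate_to_awk("{print Field1+Field2}", ["Field1", "Field2"]))  # {print $1+$2}
```

The generated program still has to be run by awk itself, so it inherits awk's field splitting, including its handling of quoted commas.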

~~~
veyron
It's not that simple, because you have to properly handle quoted fields like
"blah,blah", which is supposed to be treated as one field (blindly using
awk -F, will split it into two separate fields).
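
A quick way to see the difference, sketched in Python with a made-up input line: splitting on commas mimics what `awk -F,` does, while the standard `csv` module respects the quotes.

```python
import csv

# Made-up sample line with a quoted field that contains a comma.
line = '1,"blah,blah",2'

naive = line.split(",")            # roughly what awk -F, sees
parsed = next(csv.reader([line]))  # quote-aware parsing

print(naive)   # ['1', '"blah', 'blah"', '2']
print(parsed)  # ['1', 'blah,blah', '2']
```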

~~~
ralph
What's not that simple? I didn't say anything was, just citing a reference.
Did you mean to reply to the parent?

------
rwmj
You might also want to look at csvtool (already in Fedora, Debian, RHEL, etc).
It's a command line tool for doing the same thing, written as part of the
OCaml CSV library.

------
benatkin
Related: <http://docs.python-tablib.org/en/latest/index.html>

------
plasma
MySQL has a CSV storage engine: give it the file to load and you can
read and write it using SQL.

------
dquigley
Anyone know of a similar tool based on Ruby?

~~~
brianobush
These are command-line utilities, so why worry about the language? Just pipe data
into and out of csvkit's tools and level up!

