
Ask HN: What's your favorite command-line tool for working with data? - jeroenjanssens
About a year ago, I wrote a blog post about command-line tools for data science [1]. Thanks to HN, I received a lot of valuable comments and pointers to other great command-line tools! In the past 10 months, I have been writing a book titled Data Science at the Command Line [2]. Ever since that blog post, I've been discovering new tools. On the one hand, that's quite frustrating because it's difficult to keep up and include everything in the book. On the other hand, it's fantastic to see that the command line is still very popular!

In order to gain a better overview of what's available, I thought it'd be nice to ask on HN what your favorite tools are to work with data. Many new tools have been developed in the past year, but your favorite one may just be 10 years old. You may think that I'm too late with this question because the book is already finished, but fortunately the book also discusses the underlying concepts which haven't changed too much in the past forty years.

I'm very much looking forward to hearing about your favorite command-line tools. Bonus points if you reply in CSV format "command,url,reason\n", so I can easily scrape the comments :)

Thanks!

PS. For those who are interested, next Wednesday, I'll be doing a webcast about this topic [3], where I might share the outcome of this discussion.

[1] https://news.ycombinator.com/item?id=6412190

[2] http://shop.oreilly.com/product/0636920032823.do

[3] http://www.oreilly.com/pub/e/3115
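
For anyone curious how replies in the "command,url,reason" format could be scraped, here is a minimal sketch in Python. The function name `parse_reply` is hypothetical, and it assumes that only the first two commas are separators, so the reason field may itself contain commas. It is an illustration, not the script actually used for this thread.

```python
def parse_reply(line):
    """Split one 'command,url,reason' reply line into its three fields.

    Returns None when the line does not contain at least two commas.
    """
    # Split on the first two commas only, so commas inside the reason survive.
    parts = [part.strip() for part in line.split(",", 2)]
    if len(parts) != 3:
        return None
    command, url, reason = parts
    return {"command": command, "url": url, "reason": reason}
```

A line such as `ffmpeg, ffmpeg.org, If I'm generating animations` would come back as a dict with three keys, while a free-form reply without commas would return `None`.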
======
ycombover
GNU sed 4.2.1, awk 3.0.4 and grep 2.4.2,
[http://www.git-scm.com/](http://www.git-scm.com/), Bundled with Windows Git
(which needs an updated find)

Python with pandas, [http://pandas.pydata.org/](http://pandas.pydata.org/) ,
If I need HDF5 or time series

ffmpeg, ffmpeg.org, If I'm generating animations

* I look forward to your book :)

------
fhuszar
jq, [http://stedolan.github.io/jq/](http://stedolan.github.io/jq/), best
tool to handle JSON files in command line #The tool that immediately comes to
my mind is jq, a tool to transform and process JSON objects. It's one of those
powerful tools that is super easy to learn, and once I started using it I just
couldn't live without it. The only negative thing I have to say is that it
does not have good native support for transforming between JSON and CSV.
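
One way to bridge the JSON-to-CSV step is a short script after jq in the pipeline. The following is a minimal sketch in Python, assuming the input is a JSON array of flat (non-nested) objects. The function name `json_to_csv` is made up for this example.

```python
import csv
import io
import json


def json_to_csv(json_text, fieldnames=None):
    """Convert a JSON array of flat objects into CSV text."""
    records = json.loads(json_text)
    if fieldnames is None:
        # Collect keys in first-seen order so the column order is stable.
        fieldnames = []
        for record in records:
            for key in record:
                if key not in fieldnames:
                    fieldnames.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Keys missing from a record are written as empty cells.
        writer.writerow(record)
    return out.getvalue()
```

Nested objects would need to be flattened first (for instance with jq itself) before being passed to this function.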

------
vram22
After all your command-line data munging (possibly in a Unix pipeline), if you
want to convert the resulting text to PDF (without leaving the command line
:-), check this post:

[xtopdf] PDFWriter can create PDF from standard input:

[http://jugad2.blogspot.in/2013/12/xtopdf-pdfwriter-can-
creat...](http://jugad2.blogspot.in/2013/12/xtopdf-pdfwriter-can-create-pdf-
from.html)

It needs xtopdf and ReportLab (use v1.17) and Python (use 2.2 or higher).

Online overview of xtopdf:
[http://slid.es/vasudevram/xtopdf](http://slid.es/vasudevram/xtopdf)

xtopdf on Bitbucket:

[https://bitbucket.org/vasudevram/xtopdf](https://bitbucket.org/vasudevram/xtopdf)

------
hashtag
Clickable:

[1]
[https://news.ycombinator.com/item?id=6412190](https://news.ycombinator.com/item?id=6412190)

[2]
[http://shop.oreilly.com/product/0636920032823.do](http://shop.oreilly.com/product/0636920032823.do)

[3] [http://www.oreilly.com/pub/e/3115](http://www.oreilly.com/pub/e/3115)

------
gexos
CSVKit:
[https://github.com/onyxfish/csvkit](https://github.com/onyxfish/csvkit) and
The R Project for Statistical Computing:
[http://www.r-project.org/](http://www.r-project.org/)

------
crasshopper
Jeroen, I'm just reading your [1] for the first time now. Are you aware of
Dirk Eddelbuettel's `littler`? I believe that might overlap with your Rio tool
to some degree.

~~~
crasshopper
Here's a tiny tool I like:
[https://gist.github.com/isomorphisms/9537586](https://gist.github.com/isomorphisms/9537586)

It just prints the first `head` line and then a few random rows instead of the
top few. For me this is nicer than looking at the top 5 lines every time: I'm
peeping at different parts of the table and thus gradually getting acquainted
with it.
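
The idea can be sketched in a few lines of Python. This is not the code from the gist, only an illustration of the same behavior. The name `peek` and its parameters are hypothetical, and keeping the sampled rows in file order is an assumption made here.

```python
import random


def peek(lines, k=5, seed=None):
    """Return the header line plus up to k randomly chosen data rows."""
    lines = list(lines)
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    rng = random.Random(seed)
    # Sample row indices, then sort them so rows keep their original order.
    picked = sorted(rng.sample(range(len(rows)), min(k, len(rows))))
    return [header] + [rows[i] for i in picked]
```

Calling `peek(open("data.csv"), k=5)` on a file (the file name is a placeholder) would give the header followed by five rows from different parts of the table on each run.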

~~~
crasshopper
And Seth Brown wrote a few useful small functions too.

[http://www.drbunsen.org/explorations-in-
unix/](http://www.drbunsen.org/explorations-in-unix/)

(might have also made it to HN)

------
ole_tange
histogram, [https://github.com/ole-
tange/tangetools/blob/master/histogra...](https://github.com/ole-
tange/tangetools/blob/master/histogram/histogram), So you got this table and
you are not really in the mood for firing up gnuplot, a spreadsheet, or R, but
you would like a quick bar chart here in the terminal. cat data | histogram
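
To show the general idea of a terminal bar chart, here is a minimal sketch in Python. It is not the implementation of the `histogram` tool linked above, and the input format (label and value pairs) is an assumption. The function name `bar_chart` and its parameters are hypothetical.

```python
def bar_chart(pairs, width=40, mark="#"):
    """Render (label, value) pairs as horizontal text bars scaled to `width`."""
    pairs = [(str(label), float(value)) for label, value in pairs]
    if not pairs:
        return []
    peak = max(value for _, value in pairs)
    pad = max(len(label) for label, _ in pairs)
    lines = []
    for label, value in pairs:
        # Scale each bar relative to the largest value; non-positive values get no bar.
        length = round(width * value / peak) if peak > 0 else 0
        lines.append(f"{label.ljust(pad)} {mark * max(length, 0)} {value:g}")
    return lines
```

Printing the returned lines one per row gives a quick visual comparison without leaving the terminal.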

------
kazinator
txr, [http://nongnu.org/txr](http://nongnu.org/txr), Use it all the time and
like it a lot! That keeps me interested in working on it. Started five years
ago and still at it today, more than 1500 commits later, and 27000 LOC.

------
hellageek
I like cat. Always a good start to a pipe chain for a quick look at a small
data set.

~~~
kazinator
Of course, hellageek is totally kidding!

cat is almost always a bad start to a pipe chain, a habit that has come to be
called "UUOC" (useless use of cat):

[https://en.wikipedia.org/wiki/Cat_%28Unix%29#Useless_use_of_...](https://en.wikipedia.org/wiki/Cat_%28Unix%29#Useless_use_of_cat)

------
roycoding
jq has proven useful for dealing with JSON. A nice way to reduce or reformat
your data.

[http://stedolan.github.io/jq/](http://stedolan.github.io/jq/)

~~~
rubiquity
I love jq. I've always wondered, is there an equivalent for working with XML?
Not that I prefer XML over JSON, just that I seem to deal with a lot of
services still serving XML. Ugh.

~~~
jeroenjanssens
I have played a bit with XMLStarlet [1] and xml-coreutils [2], but not too
much. Let me know how you find them!

[1] [http://xmlstar.sourceforge.net](http://xmlstar.sourceforge.net)

[2] [http://xml-coreutils.sourceforge.net](http://xml-
coreutils.sourceforge.net)

------
ibstudios
Interactive Ruby Shell.

