
Ad Hoc Data Analysis From The Unix Command Line - yarapavan
http://en.wikibooks.org/wiki/Ad_Hoc_Data_Analysis_From_The_Unix_Command_Line
======
cstross
Generally good stuff, but my personal taste for the command-line stuff would
run more towards awk -- it's lighter weight and easier to pick up than full-
blown perl.

On the other hand, if you're going to do a lot of this stuff it's worth
tracking down a copy of "Data Munging with Perl" by David Cross (
<http://www.manning.com/cross/> ) which, while slightly out of date (it covers
classic Perl 5 OOP, not New Perl) gives a solid grounding in how to do almost
all of this stuff entirely within Perl.
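To make the awk suggestion concrete: a frequency count over one field -- the kind of job otherwise done with a cut/sort/uniq pipeline -- is a one-liner. This is only a sketch; the sample log lines, the file path, and the field number are invented for illustration:

```shell
# Tiny made-up log file: method, path, status code.
cat <<'EOF' > /tmp/access.log
GET /index 200
GET /missing 404
GET /index 200
EOF

# Tally status codes (field 3 here) and sort by count, descending.
awk '{ count[$3]++ } END { for (c in count) print count[c], c }' /tmp/access.log | sort -rn
# prints:
# 2 200
# 1 404
```

The awk associative array does in one pass what `cut | sort | uniq -c` needs a sort for, which is part of why it scales nicely to ad hoc jobs.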

------
dmv
If this is appealing, I recommend trying RecordStreams
(<http://code.google.com/p/recordstream/>). I use it for ad hoc Big Data
command line analysis almost daily.

~~~
abhikshah
Looks interesting, will definitely have to play with it. How does performance
compare with cut, grep and friends on large files?

------
henning
Gary Bernhardt did a screencast showing an example of this kind of Unix
hacking: [http://blog.extracheese.org/2010/04/a-raw-view-into-my-
unix-...](http://blog.extracheese.org/2010/04/a-raw-view-into-my-unix-
hackery.html) He always impresses me with how quickly he moves from one thing
to the next. It's a beautiful thing to watch, to me.

------
barbolani
awk is very powerful. There is a story here
[http://consultuning.blogspot.com/2008/10/optimizing-mysql-
da...](http://consultuning.blogspot.com/2008/10/optimizing-mysql-data-
loads.html) where the running time of a data load drops from 10 minutes to
less than 1 using three lines of awk.

------
kleiba
This is all well and good -- but good luck with XML data. :-(

~~~
rmc
xmlstarlet is a great command-line tool for manipulating XML data. It's like
sed/awk/grep for XML.

