

Data Science Hand Tools: Unix Power Tools Revisited - yarapavan
http://radar.oreilly.com/2011/04/data-hand-tools.html

======
silentbicycle
Good post. It's also not hard to throw together quick data stream-processing
tools like that in awk or C. I have a couple: drop (opposite of "head"), til
("til 10" -> "0 1 2 3 4 5 6 7 8 9"), slice (read the given 4 KB chunk numbers
from a file, via mmap), and dozens of little scripts. I don't really care for
shell scripting, but awk is great.
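
To give a rough idea of how small such tools can be, here is a sketch of drop and til as shell functions wrapping one-line awk programs. These are guesses at the behavior described above, not the commenter's actual code: the function bodies and argument handling are assumptions, and slice (which needs mmap) is left out.

```shell
# drop N: print everything after the first N lines of stdin.
# (Hypothetical helper; behaves as the opposite of "head -n N".)
drop() {
    awk -v n="$1" 'NR > n'
}

# til N: print the integers 0 .. N-1 on one line, separated by spaces.
# (Hypothetical helper; the BEGIN-only program never reads stdin.)
til() {
    awk -v n="$1" 'BEGIN {
        for (i = 0; i < n; i++) printf "%s%s", (i ? " " : ""), i
        if (n > 0) printf "\n"
    }'
}
```

For example, `printf '1\n2\n3\n' | drop 1` prints lines 2 and 3, and `til 10` prints the sequence quoted above.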

If you want more stuff like this, check out _The Unix Programming Environment_
and _The AWK Programming Language_. Both were co-written by Brian Kernighan.

~~~
mturmon
Doesn't

    tail +n

do what your "drop" does? (It starts output at line n, counting from the
beginning of the input.)

~~~
silentbicycle
It does! Didn't know that, thanks. :) I didn't check, given the name. ("til"
and "drop" are names from k.)
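
One detail worth spelling out is the off-by-one between the two. A minimal sketch, assuming "drop N" means "skip the first N lines" and using `tail -n +K`, the POSIX spelling of the older `tail +K` form (the function name drop_tail is made up for illustration):

```shell
# drop_tail N: skip the first N lines of stdin using tail.
# tail -n +K starts output AT line K, so skipping N lines means K = N + 1.
drop_tail() {
    tail -n "+$(( $1 + 1 ))"
}
```

So `printf '1\n2\n3\n4\n' | drop_tail 2` prints lines 3 and 4, the same as `tail -n +3`.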

------
achompas
Shocked, _shocked!_ to see an O'Reilly data article that does not pimp their
Strata conference or their damn "data science" books. This is a great
refresher on command-line tools, and I would love to see more stuff like this.

~~~
apl

> or their damn "data science" books.

Are they that bad?

------
solsenNet
Unix Power Tools is a great reference for data analysis.

The author does not mention the chapter "You Can't Quite Call This Editing",
which I used extensively for some pretty involved flat-file data analysis.

It covers the great tools cut, tr, sort, and uniq, as well as piping to grep
and grep -v (invert match).
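
As a sketch of how these tools chain together on a flat file, the function below counts how often each value appears in one column. The colon-separated layout, the "#" comment convention, and the name top_values are all assumptions made for the example, not anything from the book or the article.

```shell
# top_values FILE: count the distinct values in the second colon-separated
# field of FILE, ignoring case and skipping lines that start with "#".
# (Hypothetical helper; the file layout is assumed for illustration.)
top_values() {
    grep -v '^#' "$1" |                 # -v inverts the match: drop comment lines
        cut -d: -f2 |                   # keep only field 2
        tr '[:upper:]' '[:lower:]' |    # fold case so "GET" and "get" are one value
        sort |                          # uniq only collapses adjacent duplicates
        uniq -c |                       # prefix each distinct value with its count
        sort -rn                        # most frequent first
}
```

The sort before uniq matters: without it, repeated values that are not next to each other would be counted separately.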

I also came across this great line in the sort man page, describing the SIZE
argument that sets how much memory the sort buffer may use:

SIZE may be followed by the following multiplicative suffixes: % 1% of memory,
b 1, K 1024 (default), and so on for M, G, T, P, E, Z, Y.

...that says about all you need to know about Unix ;)

------
raghava
That nifty map-reduce block using awk-sort-uniq-xargs in the post is very
smart!

I forward <http://tldp.org/LDP/abs/html/textproc.html> to everyone new who
joins my team. Only a few of them make use of it, but it would help them a
lot!

