

Basic Command-Line Data Processing in Linux - symkat
http://symkat.com/1/five-text-processing-tools-you-should-know/

======
pbrumm
Even after using the Linux command line for years, it is always useful to see
how others use it. I tend to overuse Perl regexes to massage the data the way
I want and underuse awk.

My norm is cat file | perl -p -e "s/. _from ([0-9\\.]+) ._ /\1/g" to get the
IP out of the same data file.

Regexes seem to help with messy data or data that contains inconsistent
delimiters.

(some of the stars got stripped by HN so the above won't work)

~~~
nitrogen
_(some of the stars got stripped by HN so the above won't work)_

Try putting a couple of spaces in front of your code line, like this:

    cat file | perl -p -e "s/.*from ([0-9\.]+).*/\1/g"

See <http://news.ycombinator.com/formatdoc>

Edit: for simple regexes, sed works well, too, and probably loads slightly
faster than perl.
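
A rough sed version of the same extraction might look like the sketch below. The sample log line is made up for illustration, since the real data file is not shown in the thread; the only assumption carried over is that the address follows the word "from".

```shell
# Hypothetical sample line; the actual log format is not shown in the thread.
line='Accepted connection from 192.168.0.12 port 22'

# -E turns on extended regexes, so the capture group needs no backslashes.
# Everything before "from " and after the dotted number is discarded.
ip=$(printf '%s\n' "$line" | sed -E 's/.*from ([0-9.]+).*/\1/')

echo "$ip"
```

Lines that do not match the pattern pass through unchanged; adding -n and a trailing p flag (sed -nE 's/.../\1/p') would print only the lines where the substitution happened.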

------
j_baker
I just learned about another one a couple of days ago: cut. Can't believe I
never knew of its existence.

Also, for programmers, I'd recommend ack over grep.

~~~
tsmall
Thanks for the tip. I'd never heard of that one either. It seems like a
simpler awk, or at least a small subset of awk.
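
To make the comparison concrete, here is a small sketch that selects the same two fields with cut and with awk. The record is an invented passwd-style line, not data from the article.

```shell
# Made-up colon-delimited record in the style of /etc/passwd.
record='alice:x:1001:1001:Alice:/home/alice:/bin/sh'

# cut: select fields 1 and 7, using ':' as the delimiter.
with_cut=$(printf '%s\n' "$record" | cut -d: -f1,7)

# awk: the same selection, with the output separator written out by hand.
with_awk=$(printf '%s\n' "$record" | awk -F: '{print $1 ":" $7}')

echo "$with_cut"
echo "$with_awk"
```

cut only handles a single-character delimiter and cannot reorder fields, which is roughly where awk takes over.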

------
schm00
I started a wikibook on this stuff a few years back. It includes material on
inline Perl and gnuplot, and has lots of examples. Check out:
<http://en.wikibooks.org/wiki/Ad_Hoc_Data_Analysis_From_The_Unix_Command_Line>

~~~
Mod_daniel
So very cool! Thank you for posting.

------
wonderzombie
For my part, I don't use awk for anything more complicated than one-liners. I
used it for a while, stopped when I was working on something else, and forgot
all the awk-specific stuff.

My MO these days is if it's anything more complicated than an awk one-liner
like awk '{print $2 " " $NF}', I'll use Python or, lately, Ruby. (Perl would
be fine, too, if I used it in other contexts often enough.)
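
For anyone who hasn't seen it, this is what that one-liner does: $2 is the second whitespace-separated field and $NF is the last one. The input line below is invented purely to show the effect.

```shell
# Invented access-log-like line: method, path, status, bytes.
line='GET /index.html 200 1532'

# Print the second field and the last field, separated by one space.
out=$(printf '%s\n' "$line" | awk '{print $2 " " $NF}')

echo "$out"   # /index.html 1532
```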

That said, there's nothing quite like, well, _programming_ your environment.
The extent to which you can manipulate files, directories, and text in *nix
right out of the box makes me feel privileged to understand it. I can remember
a time when renaming a bunch of images en masse seemed tantalizing but out of
reach. I've since learned quite a bit, and even though it's relatively mundane
now, it still feels magical. Upthread, someone called it "moving mountains."
That's precisely it, and I love it.

Yes, yes. I'm a complete and utter nerd, etc.

------
muyyatin
It's also useful to know that 'sort -u' removes duplicates.

~~~
jimbokun
Although it was important to use the -c flag on uniq for this particular
problem, which is not available with "sort -u."

Which I guess just goes to reinforce the Unix philosophy of tools that do one
job and do it well.
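
A quick sketch of the difference, using a made-up list of addresses: sort -u only tells you which values exist, while sort | uniq -c also tells you how often each one occurred.

```shell
# Made-up input: one address per line, with repeats.
data=$(printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.1)

# sort -u: distinct values only, no counts.
unique=$(printf '%s\n' "$data" | sort -u)

# sort | uniq -c: each distinct value prefixed by its count,
# then ordered with the most frequent first.
counts=$(printf '%s\n' "$data" | sort | uniq -c | sort -rn)

echo "$unique"
echo "$counts"
```

uniq only collapses adjacent duplicate lines, which is why the sort has to come first.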

------
mturmon
They forgot sed.

------
bobf
As others have mentioned, tr and cut are extremely useful. Although I had
overlooked them in the past, expand/unexpand are also very useful! They
convert tabs to spaces, or spaces to tabs. Of course there are other ways to
do that, like substituting with sed, translating with tr, or printing tabbed
data using $1/$2/etc. with awk... they just aren't as simple.
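
A minimal round trip, assuming tab stops every 4 columns (the default is 8) and a throwaway two-field line:

```shell
# Throwaway input: "a", a tab, "b".
tabbed=$(printf 'a\tb\n')

# expand: replace the tab with spaces up to the next tab stop (column 4).
spaced=$(printf '%s\n' "$tabbed" | expand -t 4)

# unexpand -a: turn runs of blanks that end on a tab stop back into tabs.
roundtrip=$(printf '%s\n' "$spaced" | unexpand -a -t 4)

printf '%s\n' "$spaced"
```

Without -a, unexpand only converts leading blanks, so interior runs of spaces would be left alone.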

------
KC8ZKF
See also "Opening the software toolbox" by Arnold Robbins, part of the GNU
Coreutils documentation.

<http://www.gnu.org/software/coreutils/manual/html_node/Opening-the-software-toolbox.html#Opening-the-software-toolbox>

~~~
ludwigvan
GNU Coreutils documentation as a whole is very useful.

<http://www.gnu.org/software/coreutils/manual/>

To read it on the command line, try:

    info coreutils

Also see:

<http://en.wikipedia.org/wiki/GNU_Core_Utilities>

------
ddelony
Hacker News readers probably have at least a passing familiarity with
Unix/Linux, but it's still refreshing to be reminded that you can move
mountains with short commands.

~~~
kd0amg
And even for someone who knows all of this, knowing a good guide makes it easy
to handle requests for help (often preemptively). This is one I can (and just
did) send to a friend who's less familiar with Unix.

------
Andrew-Dufresne
I think paste (merge lines of files) also deserves mention. Besides that, I
have found tail and column to be extremely useful.
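
As a sketch of paste and tail together, the snippet below merges two invented one-column files line by line and keeps the last two rows. column is left out because it is not part of GNU coreutils and may not be installed everywhere.

```shell
# Scratch directory with two invented one-column files.
tmp=$(mktemp -d)
printf '%s\n' alice bob carol > "$tmp/names"
printf '%s\n' 30 25 41 > "$tmp/ages"

# paste joins line N of each file; -d, uses a comma instead of a tab.
merged=$(paste -d, "$tmp/names" "$tmp/ages")

# tail -n 2 keeps only the last two merged rows.
last_two=$(printf '%s\n' "$merged" | tail -n 2)

echo "$last_two"
rm -r "$tmp"
```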

------
obsessive1
I've been using Linux for a while now, and I never thought about how powerful
those simple commands can be.

------
riffraff
Since people already pointed out the missing ack & comm, I'll add: no love for
tr?
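
Three common uses of tr, sketched on made-up strings: changing case, squeezing repeated characters, and deleting characters.

```shell
# Translate lowercase to uppercase using POSIX character classes.
upper=$(printf 'hello world\n' | tr '[:lower:]' '[:upper:]')

# -s squeezes each run of spaces down to a single space.
squeezed=$(printf 'a    b  c\n' | tr -s ' ')

# -d deletes characters; here it strips a DOS-style carriage return.
no_cr=$(printf 'line\r\n' | tr -d '\r')

echo "$upper"
echo "$squeezed"
echo "$no_cr"
```

tr works on single characters read from standard input, so it takes no file arguments and no regexes.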

------
caf
Best textutil you probably haven't heard of (or have forgotten about): comm(1)

~~~
RiderOfGiraffes
I find "diff -y" more intuitive, but I didn't know comm and will explore
potential uses.
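
One possible use, sketched with two invented word lists: comm compares two sorted files and splits the lines into "only in the first", "only in the second", and "in both". The -1, -2, and -3 flags suppress the corresponding column.

```shell
# Scratch directory with two invented, already sorted lists.
tmp=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmp/a"
printf '%s\n' banana cherry date > "$tmp/b"

# Suppress columns 1 and 2: lines present in both files.
common=$(comm -12 "$tmp/a" "$tmp/b")

# Suppress columns 2 and 3: lines only in the first file.
only_a=$(comm -23 "$tmp/a" "$tmp/b")

# Suppress columns 1 and 3: lines only in the second file.
only_b=$(comm -13 "$tmp/a" "$tmp/b")

echo "$common"
rm -r "$tmp"
```

Both inputs have to be sorted for the comparison to be meaningful, so unsorted data is usually piped through sort first.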

