
Sculpting text with regex, grep, sed and awk - g3orge
http://matt.might.net/articles/sculpting-text/
======
gpapilion
Sed and awk are two underused Unix tools. Once you learn how to use them well,
you'll constantly surprise yourself with what you can do in a simple shell
script.

~~~
khafra
I've learned sed well enough to replace simple patterns usefully, and awk well
enough to do a '{print $2}'. I could doubtless do much more if I really spent
some time learning them. But, with the current ubiquity of Python, is there a
big benefit offered by awk and sed?

~~~
billjings
Python's just a different tool. Sed and awk fall down with a sufficiently
involved task, but they shine in duct taping something together quickly at the
command line or in a short shell script.

Two reasons why: one, everything you realistically would want to do with sed
and awk is available immediately. You don't have to import any libraries or do
any setup work, and the language itself implicitly assumes that you're
iterating over delimited text. Getting to '/^GET/ { print $3 }' takes a lot
more characters in python than awk.
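
For comparison, here is a rough Python sketch of what that awk one-liner does. It assumes whitespace-separated fields (approximating awk's default field splitting). The function name `third_field_of_get_lines` and the sample lines are hypothetical, for illustration only.

```python
import re
from typing import Iterable, Iterator

# Same pattern as awk's /^GET/, matched against the whole line ($0).
GET_LINE = re.compile(r"^GET")


def third_field_of_get_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield field 3 of each line matching ^GET, like awk '/^GET/ { print $3 }'."""
    for line in lines:
        line = line.rstrip("\n")
        if GET_LINE.search(line):
            # str.split() with no argument splits on runs of whitespace and
            # drops leading/trailing blanks, close to awk's default splitting.
            fields = line.split()
            # awk prints an empty line when $3 does not exist.
            yield fields[2] if len(fields) > 2 else ""


if __name__ == "__main__":
    # Made-up sample lines; pass sys.stdin instead to filter real input.
    sample = ["GET /index.html HTTP/1.1", "POST /form HTTP/1.1"]
    for value in third_field_of_get_lines(sample):
        print(value)
```

Even with the comments stripped, this is an import, a compiled pattern and a loop, compared with awk's single pattern-action pair.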

Two is that in the case of awk, the language model is superior for simple
parsing of structured text. If I want to extract some content from a specific
XML file, for example, the event based programming model gets me there quickly
and without any specialized libraries.

Of course, python wins as your problems get harder and veer away from
sed+awk's strengths. My rule of thumb is that as soon as I start thinking I
should break things out into functions, I switch to a stouter programming
language.

~~~
cema

      Getting to '/^GET/ { print $3 }' takes 
      a lot more characters in python than awk.
    

True. Then there is Perl, which is closer to awk/sed (intentionally, too) and
thus more compact than Python.

      perl -lane 'print $F[2] if /^GET/'
    

It tends to grow unwieldy as the problems (more precisely, solutions) become
more complex.

------
TheCapn
When I got my first job out of university I was doing C coding for a dept.
that shared space with a group that had heavy unix scripting jobs. They were
swamped with work and I was relatively free. One day one of the acting
Business Analysts asked me if I knew Unix. I responded, "Sure, I know all the
basics." He asked if I knew sed, to which I responded, "No, not really." He
turned away at that point, and I went to Google.

At that point I picked up a few awk/sed tutorials (I was already quite
familiar with grep) and suddenly saw the world for what it could be. I'm
seriously blown away, day to day, by how much easier these two tools make
things. Parsing out data from massive files and making lots of adjustments to
script files is suddenly super easy. I look at some of the stuff I do in my
new job and wonder how I'd get through without these tools. Hell, I even
wrote a crappy C program to do the basic global search/replace that already
existed in sed, before I realized how it worked in vi (:%s/.../.../g).

I also get more Slashdot jokes now...

tl;dr - You don't _know_ Unix until you know sed+awk.

~~~
qntm
Is it acceptable to not know sed, grep or awk but to know Perl?

------
aidenn0
"awk manipulates an ad hoc database stored as text, e.g. CSV files."

I love awk and use it on a daily basis, but its biggest weakness is CSV
files, since the data so often contains commas inside quoted fields. Please
don't use CSV as an example of what awk is good at!

[edit] The easiest workaround, if you do need to do something quickly with a
CSV, is to just use sed to replace unquoted commas with a string not in your
data; for non-throwaway uses, there is a CSV library for awk.
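
The commenter suggests doing this with sed. The same idea is sketched below in Python, purely for illustration. It assumes RFC 4180-style quoting (a doubled `""` is a literal quote inside a quoted field) and no records that span lines. The helper name `replace_unquoted_commas` and the default separator `\x1f` (ASCII unit separator, picked as a character unlikely to appear in the data) are assumptions, not anything from the thread.

```python
import csv
import io


def replace_unquoted_commas(line: str, sep: str = "\x1f") -> str:
    """Replace every comma outside a double-quoted field with `sep`.

    Assumes RFC 4180-style quoting ("" inside a quoted field is a literal
    quote) and that no record spans more than one line.
    """
    out = []
    in_quotes = False
    for ch in line:
        if ch == '"':
            # A doubled "" flips the state twice, so it leaves it unchanged.
            in_quotes = not in_quotes
        if ch == "," and not in_quotes:
            out.append(sep)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    row = 'id,"Smith, John",42'
    # Naive comma splitting, which is what awk -F, does: 4 fields.
    print(row.split(","))
    # After the workaround: 3 fields, with the surrounding quotes kept.
    print(replace_unquoted_commas(row).split("\x1f"))
    # The csv module (the "library" route): 3 fields, quotes removed.
    print(next(csv.reader(io.StringIO(row))))
```

The workaround only fixes the splitting. The quote characters stay in the fields, which is one reason a real CSV parser is the better choice once the script is no longer throwaway.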

~~~
bingaling
csvkit may be useful: <http://news.ycombinator.com/item?id=3477771>

~~~
dredmorbius
Also csvtool (just went hunting for it after seeing your post).
<https://forge.ocamlcore.org/projects/csv/>

------
gmaslov
Oh dear. I like this article, but using XML parsing as the example for sed
made me cringe hard. Please never, ever attempt to work with XML using regular
expressions!

~~~
bwarp
The "can't parse XML using regular expressions" argument is possibly flawed,
as it's a false dichotomy. Every XML document is also text, so it can be
parsed with regular expressions, but only as text, not as structured XML.
That is, regular expressions can parse the textual content, but not the
structure.

It's possibly a hack, but ignoring a level of abstraction for the sake of
simplicity works absolutely fine in a lot of cases.

Also consider that a conforming XML parser must reject broken XML, whereas a
regular expression can still extract data from it. That end of the problem
is far more interesting.

GEB made me appreciate this fact after some rather late nights and
considerable amounts of alcohol.
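
The Python sketch below illustrates the broken-XML point. It uses a made-up input in which one element is never closed. `titles_by_regex` and `titles_by_parser` are hypothetical helpers. The regex only works here because these titles contain no nested markup.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical, deliberately broken input: the <item> element is never closed.
BROKEN = "<feed><title>First</title><item><title>Second</title></feed>"


def titles_by_regex(text: str) -> List[str]:
    # Treats the document purely as text: grabs whatever sits between
    # <title> and </title>, ignoring the element structure entirely.
    return re.findall(r"<title>(.*?)</title>", text)


def titles_by_parser(text: str) -> List[str]:
    # A real XML parser refuses input that is not well-formed.
    root = ET.fromstring(text)
    return [el.text or "" for el in root.iter("title")]


if __name__ == "__main__":
    print(titles_by_regex(BROKEN))
    try:
        print(titles_by_parser(BROKEN))
    except ET.ParseError as err:
        print("parser rejected the input:", err)
```

This also shows the flip side, which is the source of the worry about regexes. The regex knows nothing about structure, so it cannot tell which `<title>` belongs to which element. As written, `.` also does not match newlines, so a title split across lines would be silently missed.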

------
g3orge
the link is missing in the "AWK, according to its creators, Aho, Weinberger
and Kernighan" note.

~~~
mattmight
Thanks for catching that! It's fixed.

