The Awk Programming Language (1988) [pdf] (archive.org)
370 points by shawndumas 9 months ago | 207 comments



I consider awk to be the most useful and underused language in the UNIX ecosystem. I use it daily to analyze, transform, and assemble data, and it always blows my mind that so few people really know how to use it at a decent level. This is an excellent book to give a real idea of what awk is capable of.


> it always blows my mind that so few people really know how to use it at a decent level

Not nearly as surprising as it is to me that now that most developers have forgotten perl, they're turning to awk as an inspiring example of a bygone era.

Seriously: perl basically replaced awk in the mid-90's. It absorbed all the great lessons and added two or three dozen innovations. But it had scary syntax, so everyone used python (which did not replace awk very effectively) and forgot perl. So now we're back at awk. And the scary syntax is all in the Rust world.


Perl deprecates awk & sed [0]. Whereas Python is a bad substitute for those, and because of that, we're back on Awk. I have the feeling that Python simply succeeded as it seems to be a language designed for people that do not enjoy programming, which sadly seem to be the majority in our industry.

The Perl hate is so prevalent in most companies, that you can get into serious issues should you even write a one-liner in it. So here I am, reluctantly using awk/sed and asking too much of Bash/Python.

[0] https://www.manning.com/books/minimal-perl


Ooof. I have to respectfully disagree that perl deprecates awk. If I'm writing awk, it's usually because what I want to do is precisely what awk does by default: read a line, match it against a condition, and perform some action.
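For anyone who hasn't seen it, the "read a line, match it against a condition, and perform some action" loop looks roughly like this. A minimal sketch, assuming made-up sample data; the `sample_rows` and `long_hours` helper names are hypothetical and exist only for illustration:

```shell
# Hypothetical sample data: name, department, hours (one record per line).
sample_rows() {
  printf '%s\n' 'alice eng 42' 'bob ops 17' 'carol eng 35'
}

# awk's implicit loop: for every input line, test the pattern ($3 > 30)
# and, when it matches, run the action (print the name and the hours).
long_hours() {
  sample_rows | awk '$3 > 30 { print $1, $3 }'
}

long_hours
```

There is no explicit open, loop, split or close: the whole program is the single pattern-action pair between the quotes.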

I could do all that in perl, but it's more stuff I'd need to dig up every time I wanted to do something awk-like. People who code perl regularly already have it, for sure. It's a barrier for those of us who don't, and one that awk doesn't have.

Awk isn't perfect, for sure, but I've transitioned from writing simple text processing in perl to doing it in awk because awk provides the basic framework that I'd end up doing ad-hoc and non-idiomatically in perl every time.

Again, all this with that caveat that I'm not a perl programmer. I used it for a couple of semesters in college, which was enough to be dangerous, but not truly proficient.


`perl -n` wraps code in an awk-alike start and finish stanza.

`perl -p` wraps code in a sed-alike start and finish stanza.


So, basically, if I understand, Perl has both awk and sed built in.


Yes, as per what I read long ago, Larry Wall created Perl as a sort of superset of parts of awk, sed, shell and C (maybe not shell).


absolutely shell. Perl acts as glue between multiple processes very nicely. $foo = `foo_command`; is great. Opening named pipes. Python is more awkward because it's less like shell.


Thanks, didn't know that about Perl.


If one writes a Perl one-liner where AWK will do, that’s avarice.

I used to maintain and debug a relatively large and an extremely complex build engine in Perl for a living for several years, and there was no construct in there that could not have been easily written in AWK.


> a language designed for people that do not enjoy programming, which sadly seem to be the majority in our industry

Could you expand a bit on that comment?


If I can add my interpretation of his comment:

Consider which language is sold to people off the street as "How to Program!": it's Python. Most of these people do not enjoy the cognitive effort, detailed typing and symbolic work, algorithmic thinking, architectural thinking, linguistic thinking that is "programming". At best, they enjoy the end result, but not the process.

Python, taught in this way at least, fools people into thinking that programming is about something far simpler and more toylike than it is. Many "hard" disciplines do this: sell children and lay people on toy experiments that momentarily captivate but have no relationship to what a practitioner of that discipline does.

The older system of recruitment was to polarize hard, i.e., to throw people in at the deep end to scare off all the people who would waste time/resources in training. Those who remained really wanted to do that thing (e.g. physics, programming, ...).

Today we're doing something perhaps vaguely immoral: selling people on a career that has no relationship to the sales pitch.


To me, that sounds like gatekeeping - as in "it's only programming when it's insanely difficult".

Somebody playing football in a Sunday pub league is still playing football, and still loves playing football, even if they're not at the same standard as Lionel Messi.

For me at least, I feel the same about programming. I love understanding somebody's problem, and building a solution for them that solves that problem. I normally use PHP (and sometimes Python or Ruby or JavaScript) because they make it easier for me to focus on the problem, rather than language details. I can't always solve the problem because it's too difficult, and perhaps some of my solutions are not 'optimal'. But I feel hurt by the idea that because I don't have a strong understanding of how Python works at a really deep level, I'm not a real programmer.

I also think that's a great way to piss people off who are just getting into the industry, and may one day become great programmers - even Linus Torvalds was a junior once. I'd encourage them to keep going, keep learning, and keep helping people solve their problems (and getting paid good money for doing that).


It's only gatekeeping if people want to be past the gate. I'm not talking about deterring people who are interested.

The vast majority of people do not want to be programmers and would not enjoy programming. Delaying the moment they actually have to do something difficult with a programming language is not especially healthy.

How would you feel about a career in football sold to you on the basis of table pong? Keep playing the pong, and then one day, your face is in the dirt and you drop out.

The self-esteem hack psychology of the 60s-90s equated encouraging people with lying to them, as if the only way we can get programmers is by lying about what programming is about. This isn't encouraging anyone, it's lying to them.


I think you’re confusing “programmer” with “10x programmer”. Plenty of people are capable of implementing business logic in code. Very few are capable of designing that logic — they’re the 10x programmers who know all about data structures, algorithms, etc.

You can’t run a business expecting every employee to be a rockstar. It just doesn’t scale. So you skill it down and put the high-skill people where they can have the most impact.


It sounds like we have a different definition of programming. For me, programming is producing code that gets executed by another piece of hardware or software.

The complexity of the code is not part of the definition. Nor is your understanding of the hardware/software involved.


Enjoying programming is enjoying programming. The way it is sold, sometimes, is not as programming.

Unless you've experienced this sales pitch you're not going to connect with the point I'm making.


Care to give an example pitch from the real world?


Perl isn’t difficult in the same way that the English language isn’t difficult. Easy to get started with, takes a long time to master. Fortunately for us die hard perl types it’s quite capable of cleanly solving all normal dynamic language problems, and some seriously abnormal ones. And because of the insanely good backcompat in perl we’ll see the pendulum swing back to perl 5 some time in the next decade.


>>To me, that sounds like gatekeeping - as in "it's only programming when it's insanely difficult".

The whole point of programming is power and flexibility. Else there would be no need to move beyond logic gates.

The problem is tools sold for newbies tend to take away too much power to make it easy for people to start, but then keep them, right there, all life.


When I did my comp sci undergrad degree, languages per se were not part of what was taught in the classroom. Whatever language that class was using, be it assembler, scheme, C or some other higher level language, it was up to the student to figure out the syntax, how to run the compiler, etc. You got some help and basic examples in the discussion sessions but it was not part of the lectures. And there was no web, no stack overflow, no google at the time. You had a book about the language, some local newsgroups (and the larger USENET) and that was it.

If you didn't enjoy reading, puzzling, and banging your head against the wall you would wash out after a couple of classes.


What language do you think beginners should be learning that would not keep them from what you say is ‘cognitive effort, detailed typing and symbolic work, algorithmic thinking, architectural thinking, linguistic thinking’. Perl? And why is Python incompatible with those things?

Like many of us, I started programming with BASIC, not C or assembly, but I turned out fine. I don’t think people need to or necessarily can absorb all the high level details straight off. To me python seems well organized enough that it makes a great language for beginners.


Python is fine.

Python is perhaps my favorite language, and I use a dozen fairly regularly.

My point was about how python sometimes gets used, and how its design is especially facilitating to that use.


As far as the second part of your comment, I think if you make topics like this impenetrably difficult and complex at first, it will not only filter out the non-dedicated, but also those who don’t yet know that they would like to be dedicated to it.


Python is the rubber sheet analogy of programming.


Aka python helps you think more like the computer does. Perl helps the computer think more like you.


Note that it sounds like a snide remark but it doesn't have to be. One could interpret it as, "Do you want to simply get things done, and not bicker endlessly about microoptimizations or where the braces go? Python is for people who simply want to complete their tasks and then go home and spend their time doing more stuff they think is fun."

It's not necessarily a bad thing to be productive, instead of inventing new problems for yourself to keep having programming things to do.

(Note that none of this necessarily reflects my personal opinion; it is just an alternative way to read the grandparent.)


The cargo cult continues.

It's disgusting how much we all rely on intuition and gut feelings when evaluating large swathes of technology. The internet hates perl so people criticize it without even using it. People will use what everybody else is using without actually trying out the options. There's too much information and so we must go with what others have said, and it all becomes hearsay. Keeping up is more like wizardry than engineering. Go with what the crowd says because I can't possibly install all of those libraries, play with the examples, and give my own evaluation. Hey look a new js framework just came out...


I used it quite a bit, and wrote some applications in it that are still in use a decade later. The "write-only" aspect of Perl is real -- it enables many styles and idioms, and as a result, tends to require reading knowledge of all of them. It also has some unusual design decisions (list vs. scalar context) and some hacks ("bless") that do not contribute to readability, especially for people who do not get to use Perl all day long. Python is a lot more readable, and does most of the same stuff.

What Perl did that was amazing was bring regular expressions "to the masses", and Perl-compatible regular expressions (pcre) are still the de facto standard that most subsequent libraries have used (more or less).

"The internet" is an abstraction and doesn't hate (or love) anything. That itself is the kind of gross generalization you are criticizing. And one can criticize a language and still have respect for it.


Let's be honest -- people only use what everyone else is using because it lowers the bar to getting hired to 90+% of programming jobs.

I can count the number of tool-agnostic development teams that I've met on one hand. Many more have claimed they are when they are not.

If you aren't aiming for the top 10% of jobs (vague quality metric that you can interpret as you wish), then you want to have above-average knowledge of _just_ Python, Go, React, Docker and Kubernetes.

The situation only changes when high profile current/ex-Googlers (or similar) start talking about a language/tool a lot. Then the mass hops on board that train too.


I agree with the thrust of your comment but not the specifics. There are still a ton of Java and JavaScript (not necessarily React) jobs out there in the "bottom 90%".

And I don't think "high-profile" people talking has much effect. Paul Graham talked up lisp for a while, and I was certainly interested (I like lisps), but... there are still precious few jobs that use Lisp. Most people learn technologies that they either need currently or are in use in jobs they know of and might conceivably get.


Even at shops just hiring for Java or JavaScript, I don't think knowing them is a career advantage. They don't make you more hirable than you would be otherwise. That's really the only reason I left those out.

Paul Graham, while a thought-leader of sorts, isn't generally thought of as someone working at either the edge of tech or in large-scale systems. That's why nobody wants to chase the tech he's using vs what Google/Facebook/etc are.


>And the scary syntax is all in the Rust world.

There's something like necessary complexity you can't easily abstract away. I find Rust does a fine job at cleaning up syntax. I'm not a fan of snake_case. Other than that I can't think of anything that's more difficult than the underlying concept in Rust. And it's still close to C (braces and functions) and ML syntax (type after colon and variable name, let bindings) in many ways.

Especially compared with a similarly complex language like C++. Now that is scary syntax, if you're not used to it from 20 years of using C++ and developing Stockholm Syndrome.


Oh, don't be silly. Exactly the same point you're making about rust can be made about both C++ and perl by expert practitioners. Syntactic complexity is linear with expressive power, that's why it's complex to begin with. You just "like" rust, so you view it as a good tradeoff there and not in other languages that you "dislike".

My point above was that this decision is basically one of fashion and not technology. And the proof is how the general consensus about awk has evolved over 20 years as perl has declined.


Are you confusing syntax with grammar? Rust has a large grammar—many reserved words, many compile-time macros, etc.—but not too much in the way of syntax (e.g. novel punctuational operators; novel kinds of literals; etc.)

C++ and Perl, meanwhile, both have tons and tons of syntax, such that they're 1. harder to grasp for people who haven't seen them before, and 2. harder to learn (especially by attempting to Google language features "by name.")

If there was a spectrum with Lisp [or Forth] on one end and APL on the other, Rust might be somewhere right-of-center... but it'd still be pretty far left of C++ and Perl.

Also, given the languages that occupy the ends of said spectrum, I think it should be clear that your position on said spectrum has no correspondence with "expressive power" :)


You said it better than I could :)


Awk has evolved? GNU Awk has---somewhat. It has a clumsy facility in the place of a proper FFI called the "extension API" for binding to C libraries. You need to compile C code to use it, and the API has been a moving target. It has a way to work with XML. It has an "in place" editing mode, and other things like bignum integer support via GMP (requiring the -M option to be used). Plus various minor extensions to POSIX, like full regex support in RS (record separator), a way to tokenize fields positively rather than delimit by matching their separators, a way to extract fixed-width fields, an @include mechanism, a way to indirect on functions and such. None of it adds up to very much.


I like C++, but it's not the language that you would design if you hadn't accumulated so much cruft over the years. Rust didn't have to support C compatibility and compatibility with earlier C++ standards. So Rust could be designed properly for modern use cases.

There's necessary complexity to express certain concepts, but C++ has accumulated a lot of unnecessary complexity over the years.

Rust, in terms of GC-less languages with mostly zero-cost abstractions is the simplest language that I've seen. And having a GCless language with memory safety is not just fashion. It's pretty much the greatest single advancement in language design since the GC itself.


> Syntactic complexity is linear with expressive power

Have you taken a serious look at a lisp? You might be pleasantly surprised. Everything that can be said about lisp has probably already been said, but I'd argue that sexprs have a much higher expressive/complexity ratio than say the C++ grammar.

Somewhat relevant xkcd[1]

[1] https://www.explainxkcd.com/wiki/index.php/224:_Lisp


Second this. Perl replaced both awk and sed. Data scientists marvel at awk/sed but Perl is somehow forgotten. Perl may not be as suited to writing complex programs, but for text processing tasks, it is much more elegant than awk or sed.

`perl -pi -e` is a very powerful idiom.

I would rather use a subset of Perl than awk.


That's the first time I'm hearing Perl being praised for its elegance of all things. Elegance is certainly in the eye of the beholder, but by default is understood in the context of programming languages as "containing only a minimal amount of syntax constructs". By that measure, Perl is spectacularly/absurdly bad with its "sigils" and "there's more than one way" idioms. In fact, I find Perl one of the ugliest languages of all time.

Edit: a-ok, probably elegance is meant in the same sense that C is elegant by cramming everything in a single statement using pre-/post-increment operators and assignments-as-expressions


One has to recall that there were some forces that wanted Perl to become a standard shell rather than a programming language. A shell is usually more limited in features, but it is frequently very forgiving, provides many shortcuts, and there are often multiple ways of doing things.

However, I've never believed it to be possible to have a language both as a shell language and a proper programming language for large-scale projects. I believe the two usecases are fundamentally antithetical, but I'd be happy to be proven wrong.


> However, I've never believed it to be possible to have a language both as a shell language and a proper programming language for large-scale projects. I believe the two usecases are fundamentally antithetical, but I'd be happy to be proven wrong.

I'd say Powershell proves you right. Powershell has a great design, it has optional typing and access to a cornucopia of libraries via .NET.

Even so, they had to make some compromises because of the shell parts (functions return output, for example) which makes it quite finicky as a "proper" programming language.

On the shell side, the very nice nomenclature which makes it very readable and discoverable makes it annoying sometimes to use as a shell. That and the somewhat unwieldy launch of non-Powershell commands.

Someone who attempts to bridge the two has a ton of work to do, both in the research and in the implementation department. I guess Oil Shell (https://www.oilshell.org/) is the most realistic approach we have today. And it's probably still 1-2 years away from release and many more years from mass adoption (if that ever happens).


Yeah, I like the `s///` syntax in sed/perl better than the `sub/gsub` syntax. Plus, regex support is lacking in awk: no backreferences (gawk provides them, but only in the replacement section), while perl has non-greedy matching, lookarounds, code in the replacement section, etc.

Other nice features I'd like in awk are `tr` and `join`.


Why add 'tr' and 'join' to awk when they exist on their own?

That's part of why people avoid perl. It's very capable, but that wide scope is counter to the unix philosophy that prefers simple, focused utilities that can be combined in pipelines.


looks like I didn't word it properly...

you could ask why have sub/gsub when there is sed... that's because you need that for specific field or string in addition to other processing.. similarly, having tr for specific string/field is useful..

I meant join as in perl's join - to construct a string out of array values with specified separator

Some examples:

* https://stackoverflow.com/questions/48920626/sort-rows-in-cs...

* https://stackoverflow.com/questions/45571828/execute-bash-co...

* https://stackoverflow.com/questions/48925359/sorting-groups-...
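For reference, a Perl-style join is easy enough to approximate in plain awk with a user-defined function. This is only a sketch under my own assumptions: `joinarr` and `join_demo` are hypothetical names, and the array is assumed to be indexed 1..n the way `split` fills it:

```shell
join_demo() {
  printf '%s\n' 'a b c' 'x y' | awk '
    # joinarr(arr, n, sep): concatenate arr[1..n] with sep between elements.
    # i and out are extra parameters that callers never pass, so they act as locals.
    function joinarr(arr, n, sep,    i, out) {
      out = arr[1]
      for (i = 2; i <= n; i++)
        out = out sep arr[i]
      return out
    }
    # Split each line on whitespace, then glue the pieces back with commas.
    { n = split($0, parts, " "); print joinarr(parts, n, ",") }
  '
}

join_demo
```

It works, but having to carry this helper around in every script is the kind of friction a built-in would remove.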


I think Perl's niche is sort of its downfall. For example, I used to work at a company with 70% C, 25% shell script, and 5% perl. Any time I ran into a perl script I had to switch my brain into Perl mode, with the understanding that what I was working on would be just as good or better in C or shell. I had nothing against Perl as a language, but always enjoyed exorcising a perl script from the codebase.


Ah, but which subset?


What I like about AWK is that it is described by a few pages of the POSIX standard. If Perl was that ubiquitous and simple, I would prefer it over AWK.


To be clear: I like that about it too. But in the real world, we want tooling that does more stuff. In the mid-90's, "everyone" knew perl, because it was the best choice for problems in this space. And in a world where everyone knows perl, there is no place for awk.

But now we live in a world where no one knows perl, and languages like python and javascript are clumsy and weird in this space. And that makes awk look clever and elegant.

All I'm saying is that in perl-world (which was a real place!), awk wasn't clever and elegant, it was stale and primitive. And that to me is more surprising than awk's cleverness.


I think people underestimate the importance of medium-powered tools. Perl is great, but awk's limitations make it easier to write in a way that the next person can maintain.


That being said, the couple times I’ve paired with a perl wizard were revelatory.


Or just reading about perl wizardry: https://www.hanshq.net/perl-oneliners.html


Can you elaborate on this? What revelations did you experience?


> in a world where everyone knows perl, there is no place for awk.

And yet in 2018, I never think about Perl anymore, but use sed, awk, and grep daily.


This is a great point


>>And that makes awk look clever and elegant.

Eventually the cycle will repeat again. The moment you will have non trivial text work to do, awk will have to give way to Perl.

The rise of Python was really driven by the use case of dealing with standard interfaces like DBs, XML and JSON becoming common. Python hasn't actually replaced Perl in any meaningful way.


Most modern distributions write their tools in Python instead of Perl. Proof: the really long transition process Fedora (and therefore RHEL), Debian and Ubuntu went through to migrate from Python 2 to 3. They could have done it faster if not for their system tools written in Python 2.

In the web space, Python is not huge, but it definitely supplanted the niche Perl used to have. No one I know writes web stuff in Perl anymore.


I totally see your point. And in Perl-world, I would probably use Perl too – I mean, if both tools are equally ubiquitous, why not use the most powerful one?

I entered the field at the tail end of the Perl era, so I've only toyed with it a long time ago.


I hated perl in the mid-90's, and stuck with grep/awk/sed instead, because it was a little more documented-structured.

Meanwhile, if you randomly mashed on your keyboard, it would output a perl script.

I jumped to Python as soon as I found out about it in the late 90's, because it was exactly what I was looking for in self-documenting structuring. It was great for creating parsers with state-machines. I was also the user of my scripts, and I didn't want to have to relearn what I coded a year or so earlier. Python let me pick that up, and to a lesser extent, awk. State machine programming was really self-documenting in Python.


I’m sick of the “perl is line noise” narrative. https://pastebin.com/C9hPS8xR


Here's a picture of the AWK book next to the Camel book https://twitter.com/heinrichhartman/status/77170643345276928...

"#awk doc shows you how to implement a rel-DB and a compiler. #perl doc talks 20p+ about nested data structures."


That is one thing I like better in Python than in Perl. In Perl, I was having difficulty with nested data structures and then realized this was something I did in Python daily without even knowing it was a thing on a conscious level. Who hasn't made some weird dictionary with its values being lists or tuples or something like that?


This is exactly backwards to my eyes. Perl's autovivification behavior (assigning through an empty reference to something autocreates the object of the right type) makes nested data much, much cleaner.

Who hasn't made some weird dictionary with its values being lists and forgotten to check for and write code to create it when it doesn't exist, and had to fix that later after a runtime crash? In perl that's literally covered by the one-line assignment you used to set the field you parsed.

This is why it's sad that everyone's forgotten perl.


Defaultdicts to the rescue! But I think it should be an explicit choice. If you only intend to have lists at some keys, and then accidentally mistype a key, it shouldn't (in my opinion) silently create a list, effectively hiding the bug.
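Awk's flat associative arrays show the same trade-off in miniature, for what it's worth. A small sketch (the `vivify_demo` name and the keys are made up): merely reading an element creates it, while the `in` operator tests membership without creating anything:

```shell
vivify_demo() {
  awk 'BEGIN {
    count["apple"]++               # no prior initialisation needed
    before = ("aple" in count)     # membership test does not create the key
    x = count["aple"]              # a mistyped lookup silently creates it
    after = ("aple" in count)
    print count["apple"], before, after
  }'
}

vivify_demo
```

This prints `1 0 1`: the typo `aple` did not exist before the lookup and does exist after it, with no error in between.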


Sorry, but I didn't catch your example. Can you explain a little more? What does Perl do better there?


Most of awk's 'advanced and hard to do' use cases in Perl go like

    open(my $FILEHANDLE, '<', $file) or
        die "Cannot open $file: $!\n";
    # Read through the lexical handle opened above, one line at a time
    while (<$FILEHANDLE>) {
        chomp;
        # Do your stuff here
    }
    close($FILEHANDLE);
Also perl can do various other things awk just can't. For example removing something from a file and then talking to a database or a web service, or do other stuff like parse a JSON or XML. Deal with multiple files, or other advanced use cases. Unicode work, advanced regexes etc etc.

In fact the whole point of Perl was Larry Wall reaching the upper limits of what one could do with awk, sed and other utilities and having to use them all over the place in C. Then realizing there was a use case for a whole new language.


In fact the boilerplate you describe is mostly redundant if you use perl -p since perl -p results in your program running with this wrapper:

    LINE:
    while (<>) {
	...		# your program goes here
    } continue {
	print or die "-p destination: $!\n";
    }


Correct me if I'm wrong, but this will only be line based won't it? As far as I can tell there is no equivalent of FS/RS/OFS/ORS that make awk the record (not line) based language it is.


> For example removing something from a file and then talking to a database or a web service,

> ...

> Deal with multiple files, or other advanced use cases.

I do exactly this every day with AWK. Solving exactly these kinds of problems features prominently in the AWK programming language book.


It goes further. Awk is when your entire use case fits into the category of iterating lines in a file.

Perl is that plus more things.


Perl simply isn’t worth it. In all the decades of programming, I’ve yet to run into a problem which could only be solved in Perl because it couldn’t be done in AWK.

And I would hereby like to remind you that every computing problem is fundamentally an input-output problem, and because of this intrinsic property, it is possible to reduce all problems in computing to input-processing-output.

Which is exactly the kind of problem AWK is designed to address.

And AWK doesn’t work with lines, it works on records, for which the fathers of the language cleverly chose the default of ‘\n’, which is reconfigurable.
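To make the record point concrete, here is a rough sketch of "paragraph mode", assuming made-up address data (`address_records` and `names_and_cities` are hypothetical helpers). Setting `RS` to the empty string makes blank-line-separated blocks the records, and `FS = "\n"` makes each line within a block a field:

```shell
# Hypothetical input: two multi-line records separated by a blank line.
address_records() {
  printf '%s\n' 'Ada Lovelace' 'London' '' 'Alan Turing' 'Wilmslow'
}

# RS = ""   : records are blocks separated by blank lines
# FS = "\n" : each line inside a block is one field
# OFS       : separator used when print joins its arguments
names_and_cities() {
  address_records | awk 'BEGIN { RS = ""; FS = "\n"; OFS = " | " } { print $1, $2 }'
}

names_and_cities
```

Each two-line block comes out as a single output line, e.g. `Ada Lovelace | London`, without any explicit state machine for "am I inside a block".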


"In all the decades of programming, I've yet to run into a problem which could only be solved in Perl because it couldn't be done in AWK."

Have you ever pondered the number of projects that, in addition to sh, make and common base utilities, require perl during compilation where awk could have sufficed?

As a single example, have you looked at compiling openssl without perl, using awk instead?

Whenever I see a perl prerequisite I question whether it is truly a requirement or whether other base utilities [1] such as awk could replace it.

Assuming it could be removed, how much effort is one willing to expend in order to extinguish a perl dependency?

[1] Some OS projects like OpenBSD make perl a base utility.


The illumos project undertook a massive effort to eradicate any and all dependencies on Perl (which had also been made part of the core operating system decades prior, by themselves no less). While they're still at it, they have managed to rip out most of it and replace it with binary executables in C or shell.

Yes, writing a build engine in AWK would be perfectly doable, but the right tool for that job is Make.


That's easy to do.

But that's not the end of it. The whole point of Perl is to avoid a salad of C and shell utils. Also in many cases the moment you have to deal with >2 files at a time shell utilities begin to show their limits.

The resulting code is often far more unreadable than anything you will ever write in Perl.


I’m sorry you believe that, but it’s simply not true, especially since UNIX is designed to be driven by a combination of binary code and shell executables. Perl is a mess whose syntax borders on hieroglyphic once poured into software. That doesn’t mean that it’s not a capable language; but it’s not a language in which written programs are maintainable or easily debuggable.


That, and awk programs are almost perfectly portable across awk implementations: nawk, gawk, mawk (can't vouch for busybox-awk as I haven't tested it, though I've heard it's good as well).

This is of practical importance for portable scripts, since Debian uses mawk [1], RH ships gawk, and Mac OS uses nawk.

[1] mawk has recently received new commits by its original author after a hiatus of 20 years or so; see https://github.com/mikebrennan000


AWK seems far more modern than Perl, though. Defining a function with, you know, "function" and actually having parameters with names feels like a 21st century language. Calling things "sub" and having it work off an array called @_ doesn't. Yes, I know there are packages to add named parameters to Perl, plus Perl6 has them out of the box, but it is weird that things went backwards in a lot of ways when Perl replaced AWK in the mid 1990s.


(g)awk was also fun to fuzz, recently:

https://blog.steve.fi/if_line_noise_is_a_program__all_fuzzer...

Though sadly still unfixed:

https://bugs.debian.org/816277


You have a really good point. AWK feels incredibly modern for being so old.


Until you find that it has no local variables other than function parameters. In a function, local variables can be simulated by defining additional arguments ... and not passing those. Everything that isn't a function parameter is a global!
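A quick sketch of that idiom, with hypothetical names (`scope_demo`, `bump`, `leaked`) chosen only for illustration. The extra parameter `tmp` is never passed by the caller, so it behaves as a local; `leaked` is not a parameter, so it is global:

```shell
scope_demo() {
  awk '
    # By convention, extra whitespace separates real parameters from "locals".
    function bump(n,    tmp) {
      tmp = n + 1        # local: shadows the global tmp for this call only
      leaked = tmp       # not a parameter, so this writes a global
      return tmp
    }
    BEGIN {
      tmp = "outer"
      r = bump(41)
      print r, tmp, leaked
    }
  '
}

scope_demo
```

This prints `42 outer 42`: the global `tmp` survives the call untouched, while `leaked` escapes the function.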

Awk is stupidly confused whether it is a two-namespace or one-namespace language (think Lisp-2 vs. Lisp-1). For instance, this works:

  function x()
  {
    print "x called"
  }

  function foo(x)
  {
    x() # function x, not parameter x.
  }
but in other situations the two spaces are conflated. For instance sin = 3 is a syntax error because sin is the built-in sinusoid function, even though you're using it as a variable, which shouldn't interfere with function use like sin(3.14).


Actually what you show is that there is a clear rule. First, the symbol is treated as a function. If it is not in the function space, then it is used as a variable name. To make a finer distinction awk would need some form of local variable declaration, which it clearly hasn't.


> First, the symbol is treated as a function. If it is not in the function space, then it is used as a variable name.

That is simply not the case. x() unambiguously treats x as a function, and will fail if there is no such function, even if there is a variable x.

  $ awk 'function foo(x)
         {
           x()
         }

         BEGIN { foo(42) }'
  awk: cmd. line:2: fatal: function `x' not defined
> To make a finer distinction awk would need some form of local variable declaration, which it clearly hasn't.

Also not the case. The purely syntactic context distinguishes whether or not the identifier is being used as a variable or function. Awk sometimes uses it, sometimes not. This doesn't diagnose:

  function x()
  {
  }

  function foo(x)
  {
     x()   # function call allowed
     x = 3 # assignment allowed
  }       
 
  BEGIN { foo(42) }

But:

  function foo(x)
  {
  }       
 
  BEGIN { foo = 3 }

  fatal: function `foo' called with space between name and `(',
  or used as a variable or an array
Why isn't it a problem that the function x() is used as a variable?


Awk is basically what JavaScript syntax is based on (no really!). The following (while not terribly useful) is both awk and JavaScript:

    function f(x, otherArgs) {
      r = otherArgs["x"]
      for (v in otherArgs)
        delete otherArgs[v]
      return r
    }
JavaScript even uses awk-style regexp literals (though with PCRE semantics, and more features, etc.).


And Perl is 1000% better than awk/sed. When my company got switched to Unix-based systems, we all got sent on a week's crash course that included sed and awk.

I have only used sed once, to edit the passwd file on an early Linux system when vi wasn't installed, and awk never. Though having been trained in sed and awk helped with picking up Perl.


Python was never intended to replace Perl. Perl was designed to extract stuff from text files. Python was designed as a scripting language for system programming. IM(Seldom Humble)O Python beat Perl for two reasons (a) batteries included (CPAN became a clusterfuck) (b) C API led to stuff like SciPy, NumPy and Pandas.

FWIW I've used both Perl and Python professionally and Python rules.


Perl is a little older than Python. Most of the ideology of Python was a reaction against Perl (https://wiki.python.org/moin/TOOWTDI). Python has always tried to be as good as Perl on everything. I use Perl a lot for what it was intended: parsing files (often log files and configuration files), producing reports and launching commands (shell replacement). I have tried to use Python for the same tasks, in particular when some of the files were in xml (xml parsing in Python is nicer than in Perl). Regular expression usage is easier in Perl, whereas multithreading is easier in Python. IMHO, the main handicap of Python vs Perl is the lack of autovivification. Python is very good for teaching (whiteboard interview) or as a scripting language around some big libraries (like tensorflow). At my work, the small glue scripts are almost always in shell or in Perl. The Python applications are being rewritten in Java because of maintenance issues (mainly caused by lack of proper typing). Python does not rule here.


You might want to try Ruby for some of those things you reached to Python for. It takes direct inspiration from Perl and does a lot of those things better, IMO.

Nokogiri is hands-down the best tool for dealing with XML that there is.

I have much the same experience with Python and anything significant developed in Python here ends up getting rewritten in Go.


Not to move the post, but that's part of what makes python so great to me. It reads like pseudocode if written with that goal in mind. It's more like having a conversation with the computer. When I need performance, I now have the algorithm written out in an easy to parse way. Then small parts in golang/C are far faster to write, because I can feel out the whole program.

I'm not directly in the software industry though, just writing programs for data analysis in quality and safety programs. I'm sure once you can think directly in something like Go, it'd be faster to write that program first. But it decreases the cognitive load for me.


No worries -- I actually agree with you. We actually don't do very much scripting at my company in general -- Java, Go and JS (for Lambda) are roughly what we've standardized on.

We have a bunch of Python scripts for operations work and it's rare that performance becomes a major concern (...except in/regarding Ansible). Development isn't my team's primary responsibility so this state of things is fine -- our SysEng team can grok Python pretty well, whereas with other languages I wouldn't say this is true.


To me, the main reason of Python success vs Perl is that it's much easier to learn, so anyone can jump in even with limited programming experience.


That is probably true with regard to its adoption, but, as a Perl fan and user, I have to say that in addition, the limitations of Perl 5's syntax begin to show once you get into collections of collections, and passing them to functions. If you are doing this sort of thing every day, it becomes second nature, but as a casual, ad-hoc user, it is no longer in my working memory.

Maybe Perl 6 fixes these things, but learning it is too far down on my to-do list, where it sits just below Ruby.

If I have a problem that can be solved by looping through the lines of a text file and applying some combination of regular expression matching and substitution, the split and join functions, simple arrays and hash tables, I reach for Perl 5.


> Python beat Perl for two reasons [...]

The reason I personally left Perl was the extremely hostile community. When someone asked a simple question, not only would they get an overhaul and namecalling, but so would anyone that tried to help them.

Python seemed to do a much better job at onboarding new people. To me, that seems like the most important reason they won in the long run.


that is a very abnormal experience. Larry Wall has personally helped me with at least two junior issues over the years by just randomly answering in general forums.

FWIW perl6 takes extra special effort to be nice.


> that is a very abnormal experience.

I don't have links handy, but there were official blog posts and such about how they endeavored to fix the problem. Of course I'm not saying it was universal, nor trying to take away from the fact that there are a lot of great people in the Perl community (PerlMonks is a good example)

They did fix it, as far as I can tell anyway, but it was too little too late.


>CPAN became a clusterfuck

Did it? Not used Perl for a while but I never needed to worry about backward compatibility and versions with CPAN, while I find I do with Python (to be fair I am programming more complex stuff than I did in my Perl days).


I've used Python for years and a little Perl. CPAN works pretty well when used with ActiveState...not sure about Unix. Python on Windows stinks when using Pip and there is some VisualStudio C++ library dependency you need and they don't even have anymore. With Python it is best to get a large distribution that bundles everything already such as Anaconda...not worth the hassle.


IMO Python won primarily because its syntax was more English-like, which makes it a lot easier and more readable to regular people who never worked with shell and Linux, and Perl's $#@%>> and regexps confused the hell out of them. That and the boom of PHP killed Perl 5 popularity, while Python got its foot in the door in science/finance niches and survived.


> CPAN became a clusterfuck

This gave me flashbacks to installing some perl module and watching it download half of CPAN.


Just as bad I'd download CPAN module A then CPAN module B and CPAN would totally trash A.


Still NPM is so much worse...


Those who don't learn from history etc.


> forgot perl

Or maybe, hiding out in PHP?

Stealing syntax, releasing versions bigger than 5, smoking regex benchmarks...

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...


> And the scary syntax is all in the Rust world.

JavaScript is heading in the direction of Haskell, Scala, and (to a lesser extent) Rust, too [1].

[1]: https://medium.freecodecamp.org/here-are-three-upcoming-chan...


If a 25 year old programmer were to start learning Perl now, what would you recommend: Perl 5 or Perl 6?


If you are doing it for personal growth and learning, then Perl 6. It is an interesting language that cleanly supports many major programming paradigms including functional, imperative and object oriented and, most probably, others. Additionally, it has many nifty features built in such as concurrency, grammars and rationals. Aside from that, it is also fun to use and has a great community.

If you are doing it for immediate application in a job or wish to acquire a skill you think might be required in a job then the answer is probably Perl 5. Despite the noise about Perl, it is still used a lot in many industries for flow control and inter-tool format conversion as well as many other applications. Many, many people understand perl and use it for rapid development of automation tools. I can tell you for a fact that it is pervasive in the semiconductor industry and is practically a job requirement, there.


Perl6 is mostly complete. Even though it has been getting significant weekly performance improvements (they have a weekly summary of these things), it still has a long way to go. It has a lot of really cool things such as simple to use concurrency, grammars, gradual typing...etc. It is a big language, but not super hard to learn. A bunch of books have been released on it in the past two years. Perl5 is used by companies now. Some are choosing to write new software in it, but there are a lot of legacy apps.

Remember that they are two different languages with some similarities. Perl6 has really good support for making big apps I would say as it has really good OO support out of the box in addition to making FP concepts easy too. Perl5 has atrocious support baked in, but they have the excellent libraries that everyone uses to give them world class support for OO. Still, it feels more natural in Perl6.

On another note, Groovy is another great and fast language that runs on the JVM and works very well in the scripting space with a lot of DSLs for GUI, XML, DB...cool stuff.


> On another note, Groovy is another great and fast language that runs on the JVM and works very well in the scripting space

But the same versioning issue exists for Apache Groovy (2 or 3 ?) that exists for Perl (5 or 6 ?). The Groovy PMC seem to be handling the issue, though, by slowing down the development of Groovy 3 to a crawl so it won't ever ship.

Oh, and calling Groovy "fast" is a bit of a stretch. It hardly matters anyway because scripting for classes written in Java doesn't need "fast".


Depends on what you're looking for:

In many ways, p6 is the superior language. But as an AWK competitor specifically, p5 might be the better choice for performance reasons alone (while p6 has been improving, as far as regex performance goes, it just isn't there yet [1]). You might even be able to write tighter code in p5 (for one, p6 regexes return objects, not strings, so in cases where you actually want the latter, you'll have to throw in boilerplate).

[1] https://gist.github.com/cygx/9c94eefdf6300f726bc698655555d73...


Perl6 has grammars: https://docs.perl6.org/language/grammar_tutorial (recursive regex with sane syntax), which are much better than plain regex in many cases. Otherwise modern Perl5 is much more practical.


If you want to get paid for doing it, Perl 5. If you're just mucking about with personal projects, Perl 6.


Perl 5 and Perl 6 are two different beasts. Perl 5 is closer in spirit to Awk.


Perl5 and perl6 are very different languages - however after perl5 there was nobody left to listen... (Disclaimer: I got used to perl5 and I still think it's quite reasonable)


I didn’t know Perl was capable of filling the same niche as awk, and I think awk is absolutely wonderful. How does Perl improve on it?


What about tcl? Is it better than perl for text processing?


> it always blows my mind that so few people really know how to use it at a decent level

I'll second this -- it's by far the most used swiss-army knife in my arsenal[0]. A little part of me dies when I see scripts doing grep|awk, using awk simply to print a column. After taking some time to actually read the gawk user manual a few years ago, I've found that I can do most things in awk that I was previously using grep/sed (and to a lesser extent) perl/python for in the shell. In my environment, it also comes with the benefit that I know the awk code I'm writing is supported by the version of gawk that's installed on every server/computer I come into contact with -- no having to ensure the right python/perl is available.
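
A small sketch of the grep|awk point, using made-up sample data (the names and columns are assumptions for illustration only). Both forms print the same column; the second lets awk's pattern do the selecting:

```shell
data='alice ok 10
bob error 20
carol error 30'

# Pipeline form: grep selects the lines, awk only prints a column.
with_grep=$(printf '%s\n' "$data" | grep error | awk '{ print $1 }')

# Same result with awk alone: the pattern selects, the action prints.
awk_only=$(printf '%s\n' "$data" | awk '/error/ { print $1 }')

echo "$awk_only"
```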

One thing I wish there was better documentation on, though, is creating complete gawk scripts. With @include directives coupled with AWKPATH, it's really convenient to build up a handful of utility functions that do frequent things and I've found, on several occasions, that I end up writing an .awk script instead of a bash/zsh script and it ends up being a much more straightforward set of code.

[0] Well, specifically, the 'gawk' variant


why not write something about the @include directives like a blog or some posts, that will be awesome.


I think my tendency to reach for grep/sed/awk might be the thing that led me away from software development full-time and into the ops world.

Too many times I encountered 200+ line Python scripts that I could replace with an Awk one-liner and a cron job.

My lazy habit in my Ruby work eventually became to parse JSON/XML/whatever into usable text, shelling out to sed/awk and working with the results. This would not only save me LoC, but was less error-prone.


I use AWK daily; it’s my workhorse programming language and automation tool. Small, fast, no dependencies and portable.

Extremely powerful programming language for big data processing and data driven programming, I often use it to generate shell scripts based on some arbitrary data. With no memory management and providng hash arrays, AWK is an absolute delight to program in. Using it with the functional programming paradigm makes it even more powerful. It blows my mind that Aho, Weinberger and Kernighan managed to design something so versatile and yet so small and fast. I’ve also been using it to replace Python scripts with a ratio of Python:AWK being anywhere from 3:1 to 10:1 as far as lines of code needed to get the same tasks done.


I think my hiccup is that it seems like good awk programmers are expected to be able to pass the full bloody program in as a simple string argument in a bash one-liner. Which is clearly not tenable.

There are a few things I have done which felt more difficult than they should. However, the vast majority of dealing with structured data using awk really has been a pleasant surprise.


Awk did have this reputation of hard-to-read one liners, but it was also always possible to write readable code in it much as you would do in any language.

My favorite bit of Awk code that I ever wrote was this print formatting program for the HP LaserJet II:

https://github.com/geary/awk/blob/master/LJPII.AWK

It printed source code in a "2-up" format, with two pages printed side by side in landscape mode. It looked for the kind of separators that I and many people were fond of back in those days (the source code is a good example) and converted those to graphic boxes. And it tried to be smart about things like breaking pages before a comment box, not after.

I wish I still had a sample printout handy. For 300 lines of what I thought was fairly cleanly written Awk code, it made some nicely formatted printouts.


That's very clean looking code and I think it would be considered so regardless of the language it was written in, IMHO. Nice! :)


Awk makes it easy to write one liners, but it is also very easy to save your script to a file and run it with the -f option.
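
For anyone who hasn't tried it, a rough sketch of the -f workflow. The file name total.awk and the sample input are hypothetical, and a temporary directory is used only so the example cleans up after itself:

```shell
tmpdir=$(mktemp -d)

# Save the program to a file instead of quoting it on the command line.
cat > "$tmpdir/total.awk" <<'EOF'
# total.awk: sum the second column and report it at the end
{ sum += $2 }
END { print "total:", sum }
EOF

# Run it with -f; input still arrives on stdin or as file arguments.
result=$(printf 'a 1\nb 2\nc 3\n' | awk -f "$tmpdir/total.awk")
echo "$result"

rm -rf "$tmpdir"
```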


I think some of the examples in popular documentation use awk -f to read a program from a file, so I don't feel like that's regarded as an unreasonable way to run awk code.

It's true that many programs might well just have a single action (or a single action plus a BEGIN and END), but I'm not sure that resorting to awk -f will give great offense.


one-liners are fun though, especially if you like code golfing..

if you're interested, you could give my tutorial(https://github.com/learnbyexample/Command-line-text-processi...) a try

it has over 300 examples..


I gotta say, I really enjoyed skimming through this, and saving it for future reference. I can already tell there are some basic things that I should've memorized with all the one liners I've used, but most were cobbled together for the immediate task (and many online forum searches assume a lot of prior knowledge), so this presentation and cookbook-style references really help me hold the practical benefits in memory a lot longer. Thanks again!


thanks, that means a lot to me :)

if you are interested in text processing in general, do check the other chapters for grep/sed/perl/sort/paste/pr/etc

I learned a lot writing them up and marvel at the speed and functionality provided by these tools.. for many cases, it is lot easier to write a bash script combining these tools than to write a Perl/Python/Ruby script..


Ahh, what a useful link! Bookmarking immediately. I use AWK like once a year but when I do, I usually struggle a lot. Your examples are really helpful!


thanks, it's a reference for me as well - not easy to remember all the features and tricks


Thanks for that, mate. I'm a regular customer; your work is greatly appreciated.


thanks for the feedback :) happy learning


what I usually do is to wrap the awk script in to a sh function. I never do one-liners unless it is a well known idiom.
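
Something along these lines, as a sketch only. The function name freq and the sample input are invented for illustration; sort is there because awk's for-in order is unspecified:

```shell
# freq N: count how often each value appears in column N of stdin.
freq() {
    awk -v col="$1" '
        { count[$col]++ }
        END { for (k in count) print count[k], k }
    ' | sort -k2
}

result=$(printf 'GET /a\nPOST /b\nGET /c\n' | freq 1)
echo "$result"
```

The awk program stays readable across several lines, and callers only see a normal shell command.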


One thing I also really like about Awk is that you can keep 99% of the language in your head. With Awk it is seldom necessary to even reference the man page. I don't have to interrupt myself with looking up the syntax or the order of the parameters to a function.


Agreed. I learned awk a bit late in my Linux journey (started using Linux in 1999, learned awk in 2016), but I can't imagine not using it daily now. Great tool to manipulate data, and it's great for when you want a Unix tool that doesn't exist (e.g., computing the sum or mean of a stream of numbers).
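
A minimal sketch of the sum/mean case, assuming one number per line on stdin (the input values here are made up):

```shell
result=$(printf '%s\n' 2 4 9 | awk '
    { sum += $1; n++ }                 # accumulate per input line
    END {
        if (n > 0)                     # avoid dividing by zero on empty input
            printf "sum=%d mean=%.2f\n", sum, sum / n
    }')
echo "$result"
```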


I never got past using awk for string replacement that I couldn't pull off as a regex in grep...


awk is a fundamental part of the unix ecosystem. It's in the hall of fame of utils along with the likes of grep. If you use unix or linux in any semi-serious capacity, you are familiar with awk.


I'll just share my biggest Awk-based project here: A tool that creates HTML documentation from Markdown comments in your source code.

It does essentially what Javadoc does, except it uses Markdown to format text, which I find much more pleasing to the eyes when you read source code.

The benefit of doing it in Awk is that if you want to use it in your project, you can just distribute a single script with your source code and add two lines to your Makefile. Because of the ubiquity of Awk, you never have to worry whether people building your library have the correct tools installed.

It doesn't have all the features that more sophisticated tools like Doxygen provides, but I'm going to keep on doing the documentation for my small hobby projects this way.

[1] https://github.com/wernsey/d.awk


It is true that there exists an awk most everywhere, but I've been burned a few times by the discovery that nawk (Solaris (dating myself a bit there...)), gawk (many GNU/Linux distros), and mawk (some GNU/Linux distros? Don't know where I ran into it, but I have in the last couple of years) all have subtle incompatibilities around the edges.

As I recall, gawk in particular has some extensions that are explicitly a superset of the standard functionality. Which is great if you're only targeting gawk and know about it. It's less great if you think you're only targeting gawk, and discover later on that there's a system with a different awk you need to support.

Your project looks neat, by the way. I'm looking forward to taking a closer look.


Yes, there are differences. I used only standard awk facilities, and tested my project with gawk, mawk and nawk.

I've also been burned when I discovered that Raspbian ships with an older version of mawk that did something differently in the way it processed regexen that caused my script to break.


On the other hand, there is a POSIX standard, which you can use as a rough guide to write portable Awk if you suspect portable Awk is needed.


Very cool. But, are there environments which have awk that don't have perl5? Even most commercial unices back in the late 90s shipped some kind of perl.


I had this exact book and used awk on DOS/Novell in the early 90's when scripting choices were pretty scarce. The writing is tremendous - a model of clarity, and worth reading just for that. Anything with Kernighan, Pike or Plauger as author is worth checking out just for the example of clear thinking.


In 1996 I worked as a federal contractor for a US Army base. They had different Unix systems locked down for security reasons. Had Awk and Sed to work with and ordered the books from Amazon.

Oracle databases and other databases exported data in fixed width files and I had to download from several Nix systems to import into one general Nix system using Oracle and then a DOS based Clipper 5 system and an Access 2.0 Windows system and they all had to get the same results.

If not for Awk I could not filter the files from the Nix systems.


Saying "Nix" instead of "*nix" is confusing because of the Nix package manager[0]

[0]https://nixos.org/


I printed this book and went through it and imho just skimming through all of it is worth it: just understanding how it works beyond the basic '{print $2}' is immensely worth it, and being exposed to some 'advanced' techniques gives you a set of techniques that you can reuse in your daily chores (in particular if you're a sysadmin).


This book is worth the read. Just to get in the mindset of the authors. I wish that more programming books could be as concise and useful at the same time.


TXR Lisp provides a Lisp-ified awk in a macro:

http://www.nongnu.org/txr/txr-manpage.html#N-000264BC

> "Unlike Awk, the awk macro is a robust, self-contained language feature which can be used anywhere where a TXR Lisp expression is called for, cleanly nests with itself and can produce a return value when done. By contrast, a function in the Awk language, or an action body, cannot instantiate an local Awk processing machine. "

The manual contains a translation of all of the Awk examples from the POSIX standard:

http://www.nongnu.org/txr/txr-manpage.html#N-03D16283

The (-> name form ...) syntax above is scoped to the surrounding awk macro. Like in Awk, the redirection is identified by string. If multiple such expressions appear with the same name, they denote the same stream (within the lexical scope of the awk macro instance to which they belong). These are implicitly kept in a hash table. When the macro terminates (normally or via non-local jump like an exception), these streams are all closed.


For context, I'm in university, but during one of my internships, a lot of the older developers always seemed to use awk/sed in really powerful ways. At the same time, I noticed a lot of the younger developers hardly used it.

I'm not sure if it's a generational thing, but I thought that was interesting.

Anyways, are there any good resources to learn awk/sed effectively?


Awk really has been superseded by Perl (and therefore arguably by Python, Ruby, etc.) But sed remains a thing of beauty, all its own, and very well worth learning. Hardly a day goes by that I don't use it in some one-off command like

    for i in *.png; do pngtopnm < $i | cjpeg > `echo $i | sed 's/png$/jpeg/'`; done


  for i in *.png ; do
    # ${i%.png} strips the trailing .png before .jpeg is appended
    pngtopnm "$i" | cjpeg > "${i%.png}.jpeg"
  done


Over the years I've written too many awk one liners to count. Most of them look ugly - hell awk makes Perl look elegant - but having awk in your toolkit means that you don't have to drop out of the shell to extract some weird shit out of a text stream. Thanks Aho, Weinberger and Kernighan!


And I'm still waiting for the structural regular expressions version of awk [0].

I very much like awk, I prefer it over sed, because it's easy to read. Also proper man page is all one needs. But I find myself many times doing something like this:

  match($0, /regex/) {
    x = substr($0, RSTART, RLENGTH)
    if(match(x, /regex2/)) {
      ...
    } else if(match(x, /regex3/)) {
      ...
Then I sometimes want to mix and match those strings. Or do some math on a matched number. It's a bit tedious in awk.
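
To make the idiom concrete, here is a small runnable sketch of the match/RSTART/RLENGTH dance followed by math on the matched number. The input lines and the "ms" format are assumptions made up for the example:

```shell
result=$(printf 'took 250ms\nno timing here\ntook 1500ms\n' | awk '
match($0, /[0-9]+ms/) {
    x = substr($0, RSTART, RLENGTH)    # the matched text, e.g. "250ms"
    ms = x + 0                         # numeric prefix of the match
    print ms / 1000 " s"               # convert milliseconds to seconds
}')
echo "$result"
```

It works, but every extraction costs a match() plus a substr(), which is the tedium being described.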

[0] http://doc.cat-v.org/bell_labs/structural_regexps/


It seems that work has already started: https://github.com/martanne/vis.


Correct me if I'm wrong, but as fine as vis is it will not feature stand alone strex-awk.


Awk is great for quick command line scripts and also for running on a very wide range of systems.

I recently wrote a simple statistics tool using Awk to calculate median, variance, deviation, etc. and people say the code is readable and good for seeing the simplicity of Awk.

https://github.com/numcommand/num/blob/master/bin/num


If you want a fast awk, use mawk.

https://github.com/mikebrennan000/mawk-2


Yes. mawk is shockingly fast.

In my perfect world mawk would have some of the gawk extensions, and it would have a csv reader mode to properly split csv into $1...$NF. Because that would be the killer tool.


The new mawk has it, just the old Debian version not yet.


Where is the new mawk please?


...Or transpile it to ANSI C with AWKA.


That is a nice book. Starting with a practical tutorial and going into the structure and language features afterwards on a reasonable page count of just about 200 pages.

I like to use awk when I need something a little more powerful than grep. Nevertheless, when I look at the examples and where the book is heading I prefer R for many of the tasks (in particular Rscript with a shebang).

Just to give an example: If you have to manipulate a CSV file, that would most certainly be possible with awk, but some day there might be a record which does contain the separator and your program will produce some garbage. R on the other hand comes with sophisticated algorithms to handle CSV files correctly.
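
A quick sketch of that failure mode, with a made-up two-line CSV. Splitting naively on commas, the quoted field on the second line is cut in two, so the field counts disagree:

```shell
# Both lines have two logical fields, but the second has a quoted comma.
result=$(printf 'id,name\n1,"Smith, John"\n' | awk -F, '{ print NF }')
echo "$result"
```

The header reports 2 fields and the data line reports 3, which is exactly the kind of garbage a plain -F, script produces on real-world CSV.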

I truly respect awk for what it was and is but I also think that the use-cases where it is the best tool for the job have become very narrow over time.


As I do most of my daily work in cheminformatics with a (shell-based) workflow engine (http://scipipe.org), awk has turned out to be the perfect way of defining even quite complicated components based on just a shell command. These days, pretty much 50% of my components are CSV/TDV data munging with awk! :D

(Can be hard to explain how this works without an image, so an (older) image is found in: https://twitter.com/smllmp/status/984173696448434176 )


I find awk so beautiful. I have written many scripts in awk. It is so good at data transformation. I used it to write a script to delete old and unused records from tables. The book is so beautifully written with amazing clarity of thought.


I recently had a sort of "contest" with someone for parsing the output of a tool. I had to parse some text output into a tree structure.

The other person wrote it in awk, quite quickly. After writing my own version in Python (my version was waaaay over-engineered), I decided to blatantly rip-off the awk solution and re-implement it in Python.

It was almost as simple and as short.

Awk is much more compact as a language, but also way more limited. And it still has its quirks and a certain volume of information you have to gather. I'd say it's more worthwhile to learn Python instead, because you'll be able to use it for other purposes.


From the introduction to Chapter 2:

> Because it's a description of the complete language, the material is detailed, so we recommend that you skim it, then come back as necessary to check up on details.

Any book that recommends skimming is doing something right.


This pdf looks like a scanned book but I can highlight and copy text from it? What exactly is going on here? Does Chrome pdf viewer have built-in OCR?


PDF has long had this feature, most/all readers support it. There is a hidden textual representation included along with the scanned shown content.


Interesting, why not just replace the text then? Where can I find more info about this? I was actually trying to find a good source of info about PDF's recently and couldn't really find much.


Replacing the typeset text with any reasonable fidelity seems like a much harder problem than reproducing the scan and providing the ocr'ed text content. It might still be a good idea to do, maybe some software does this.

I don't have any references, sorry.


>Interesting, why not just replace the text then?

Why would one do that? It's destructive. A typeset book is much more than a text file.

Not to mention OCR is not perfect, especially when math / special symbols are involved.


The lovely and humbling thing about this it was written 3 decades ago and the examples still work. Makes me think of another short elegant piece by Kenneth Church (?) called "Unix for Poets" which shows how to use core UNIX utils to work with text. Also from the mid to late 80s. Perl may have replaced sed and awk but they endure.



I have this book and highly recommend it. I have referenced it numerous times to pull out text manipulation wizardry that stunned others.


Lol. This is the IT equivalent of the hairy dog story about the itemized million dollar invoice: $1 - drill a hole, $999,999 - knowing where to drill the hole.

And by that I mean sometimes it seems easy to solve a problem because you have the skill to do so. It looks easy but that's only because of the time invested in making it easy for you. For anyone else the challenge remains.


I always found it interesting that the Awk paradigm is also the basis for IBM's RPG language. Two very different environments coming up with basically the same elegant solution for the same problem:

1. Run zero or more setup operations.

2. Loop over the lines of a text file and process its columns into an output format.

3. Run zero or more cleanup operations at the end.
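
In awk terms the three steps map onto BEGIN, pattern-action rules, and END. A minimal sketch, using invented name/rate/hours sample data:

```shell
result=$(printf 'Beth 4.00 0\nKathy 4.00 10\nMark 5.00 20\n' | awk '
# 1. setup: runs once before any input is read
BEGIN { print "name pay" }
# 2. per-line processing: runs for each line whose third column is > 0
$3 > 0 { print $1, $2 * $3; n++ }
# 3. cleanup: runs once after the last line
END { print n, "employees paid" }')
echo "$result"
```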


If you want a binary alternative to awk, try using lex.[1]

You can feed in regexps and c code fragments and it will generate c code for you.

[1] https://www.tldp.org/HOWTO/Lex-YACC-HOWTO-3.html


I wanted to know what the language looked like so I went to the first example in the book and found this:

  This is the kind of job that awk is meant for, so it's easy. Just type this command line:
  awk '$3 > 0 { print $1, $2 * $3 }' emp.data


Lots of comments about awk, perl and sed for text processing. What about tcl?


Recycling an older HN discussion on Awk vs Perl

https://news.ycombinator.com/item?id=14647022


I just skimmed it in 30 minutes. I feel I can write some simple stuff now. Except for all the examples it doesn't feel that overwhelming.


I was joking with my coworker some weeks ago about how awk is condemned to be forever used as a cut replacement.
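
In fairness, there is a practical reason people reach for awk here: by default it splits on runs of blanks, while cut treats every single delimiter as a field boundary. A small sketch with a made-up input line:

```shell
line='alpha   beta gamma'   # note the run of three spaces

# cut sees an empty field between each pair of adjacent spaces.
with_cut=$(printf '%s\n' "$line" | cut -d' ' -f2)

# awk collapses the run of spaces into one separator.
with_awk=$(printf '%s\n' "$line" | awk '{ print $2 }')

printf 'cut: [%s]\nawk: [%s]\n' "$with_cut" "$with_awk"
```

Here cut returns an empty second field, while awk returns beta.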


I have this book. I use awk daily to do analysis of Suricata logs. It's great for querying structured text.


I love the typography


That's troff at work.


This has bothered me for a long time. It looks like I'm seeing something other than what other people are seeing. So many of these documents just look awful. This particular one does not look good either:

http://i.imgur.com/e11d0aK.png

http://i.imgur.com/0Ysr7QQ.png

Look at the kerning on "Awk", it's not good. And look at the zoomed-in version, the characters all have pixelisation and jaggies.

These were just viewed using Firefox's default pdf viewer. Is there a way to view them and see a better-quality version of the document?


Given that this is a scan of a book produced on a phototypesetter (no pixels), you probably want a real copy.


Yeah, it was just surprising to me that people were praising this document.


Awk has its uses. If you use the command line you'll probably use Awk occasionally.

I don't get the Perl hate. Perl's unpopularity may have something to do with some of the language's design choices. I think what really killed it was Perl coders. Some of the worst code I've seen happened to be written in Perl. If you follow clean code principles Perl is fine. Mojolicious is an awesome framework. I like it a lot.

Today I code Python and C. I used to code Ruby and before that Perl. I loved Ruby's syntax but Ruby seems to be waning. I'm looking forward to coding in Go. I'll be coding Javascript but I'm not looking forward to it.

Use the tool that fits the job. I have no loyalties to any programming language.


>Perl's unpopularity

Is a myth actually.

> Tom Radcliffe, recently presented a talk at YAPC North America titled “The Perl Paradox.” This concept refers to the fact that Perl has become virtually invisible in recent years while remaining one of the most important and critical modern programming languages. Little, if any, media attention has been paid to it, despite being ubiquitous in the backrooms of the enterprise.

> Yet at ActiveState, we have seen our Perl business continue to grow and thrive. Increasingly, our customers tell us that not only are they using more Perl but they’re doing more sophisticated things with it. Perl itself recently made it back into the Top 10 of the Tiobe rankings, and it remains one of the highest paying technologies. Therein lies the paradox.

https://www.activestate.com/blog/2016/07/perl-paradox


At the bottom of top 20 now and pretty much no job openings in most countries.


With perl's use cases, I would be a bit surprised to see a job specifically for programming in perl. It's a tool that you use to support your other infrastructure. It's not a tool that you generally use to build up that infrastructure.

Similarly, I don't expect to see many jobs openings for wrenchers, but I fully expect a mechanic being hired someplace to be able to use a wrench.


> see a job specifically for programming in perl

That is exactly what you saw some twenty years ago. Perl was used to build infrastructure, like entire back-ends for sites and whatever.

You don't see those jobs anymore.


There are generally no job openings for engineers anywhere, because they all rely on headhunters and recruiters.

Try https://perl.careers/ for example.

I first heard of them from this slide deck that any developer, regardless of tech stack, should read: https://de.slideshare.net/perlcareers/how-to-write-a-develop...


> Awk has its uses. If you use the command line you'll probably use Awk occasionally.

Agreed -- I use awk all the time in shell pipelines.

> I don't get the Perl hate.

The problem is not with writing Perl -- I don't mind using Perl to write scripts and tools. In fact, I like it better than Python for writing glue code.

Reading Perl code -- especially if the code base is old and has been worked upon by multiple people -- now that is a real pain. Don't get me wrong, I have seen some really well written Perl code and have had the good fortune of working with some really smart Perl programmers. But the majority of Perl code I have encountered has been an unreadable mess that makes me want to pull my hair out. There are times when I feel it is more productive to rewrite the code in Python than to spend time on the existing code.

> If you follow clean code principles Perl is fine.

Agreed -- except most Perl coders don't. Worse, they make up a large chunk of the people who contributed to CPAN -- something which was one of the major reasons behind the popularity of Perl.

> Use the tool that fits the job. I have no loyalties to any programming language.

Couldn't agree more.


Yes, Perl is widely known as a write-only programming language.


And yet most Perl code was never hard to read and understand, except for stuff like OOP, but OOP is hard in any language. In fact, the push for OOP is probably one of the notable things that contributed to Perl's decline.


> except for stuff like OOP

That is funny, because one of the objectives of OOP was to give code a better structure and therefore make it easier to understand.


Correct, because good luck trying to figure out what the hell someone else's Perl program is doing later.

I used to spend up to 40 hours in a Perl debugger trying to figure out what the program is doing, and after going 25 layers deep into the call stack, I'd come out one week later none the wiser as to what that damn spaghetti code was doing. Tracking down any bug would always turn into debugging of epic proportions. I never had such problems debugging machine code!

That is why Perl gets so much hate.

My favorite construct to hate was going through the logic and just as I was about finished trying to understand the state machine at that point hitting an unless{} clause which instantaneously wipes the slate clean. Oh how I hate Perl.


> Some of the worst code I've seen happened to be written in Perl

Second that. Worked with a guy who used (and adored) Perl for more than 15 years. His C++ code was so incomprehensible I had to rewrite many things in my spare time.


> I don't get the Perl hate

I would say hate is a strong word but I have 2 issues with Perl:

1) Regex choices made

2) Readability. If someone wrote something in Perl that wasn't working as expected, I would normally rewrite it if that took me less time than figuring out what they wrote. Maybe it was just my poor skills, but man, it can be hard to tell what is actually going on in Perl.


Don't judge JavaScript before you try it... it has a lot of good tooling around it, and an enthusiastic and large community.

There is a reason companies like Netflix and Walmart are heavily invested in using Node.js for their internal services.


Yes, I'm getting there. I'm going to embrace it.


It's like shells or editors. Many `older` beards are mindful that not all tools are equal: most systems shipped some tools by default, and others you had to go and install yourself. Thus they became wise through experience, sticking to tools that did the job and were more commonly available. Hence they all grew up with the likes of vi, sh and awk, which did the job and worked well. Yes, alternatives became more accessible, and now most systems ship with every flavour of shell and editor, as they class the likes of bash, ksh, csh... and emacs/perl as part of those essential common installs. Also, space/storage is less of a factor than it was decades ago.

It really gets down to personal taste now more than it did in the past, but if a tool works, does the job, and the alternatives don't offer any gains, then they stick with that. I'm not saying the alternatives are bad or worse in any way, or indeed better; they are just different and, in some cases, maybe better. At least the core reason, that the old tools were always there and saved adding another dependency and risk factor to a system, has become moot these days.

TL;DR: some older tools were more likely to be on all systems as a lowest common denominator in the older days than now, and legacy always outlives the machine. Hence we still run COBOL today because it works; there may be better solutions, but rebuilding Rome overnight is still avoided.


I don’t think it’s just a mythical old greybeard thing. I’m 28 and I use vim for exactly this reason: a vi-like editor, not necessarily vim, is guaranteed to be available everywhere. I don’t have any particular reason to like vim over emacs, other than the fact that there is value in using standard tools.

My favorite OS, FreeBSD, comes with neither bash, nor emacs, nor Perl, so I don’t consider them part of the “lowest common denominator” set even in 2018.

(Funnily enough, it does come with a C++ compiler!)


The most recent version(s) of JavaScript are quite pleasurable. A lot has happened in the last 4 years.


bold was called heavy, lol


I know many people will downvote, but in my opinion, just say no to this ancient "programming language". It's so confusing, completely text based, designed ages ago in an entirely different environment. There are many better alternatives, like Python or Powershell. Why not use them?



