How to Make Money Using Grep, Sed and Awk (openmonstervision.github.io)
72 points by bsdpunk 9 days ago | 40 comments

"How to make money by posting a pretty boring article to HN and asking for book purchases at the end of it"

I didn't mind; I actually lolled when reading his other blog post about how he botched the personal-question part of an interview: https://openmonstervision.github.io/blog/posts/how-not-to-ge...

That article /is/ funny.

It also doesn't beg for purchases of books he didn't write... :P


Dude! You need help using your tools' features to reduce your pipeline lengths.

- grep derp | sed whatever is the same as sed -n '/derp/{whatever;p}' (you might be interested in sed -r for that matter)

- sed whatever | sed whatever_else is the same as sed -e 'whatever' -e 'whatever_else' (or sed 'whatever; whatever_else')
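
For instance (file name and patterns invented for illustration):

    # two processes:
    grep 'ERROR' log.txt | sed 's/ERROR/error/'
    # one process, same output: print only lines where the substitution succeeded
    sed -n 's/ERROR/error/p' log.txt

    # two sed processes:
    sed 's/foo/bar/' log.txt | sed 's/baz/qux/'
    # one process:
    sed -e 's/foo/bar/' -e 's/baz/qux/' log.txt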

This list is by no means complete, but it already saves a lot of external processes. And seriously: think about writing the whole thing in pure bash or awk; there may be too little gain to justify invoking all these tools if you don't compose the many features they provide.


I disagree; unless it's slow enough to affect performance, I prefer to combine simple pipes.

The only time I have to use more complicated constructions is to get around the stupid problem that passing filenames with pipes is almost impossible to do safely.


I learned when writing bash scripts for cygwin under Windows that Windows doesn't support copy-on-write fork(). This means any new process is extremely expensive to create, because fork() copies the memory of the parent process even when the child doesn't need it.

As a result, I did as much as I could using only bash internals, and very rarely did I use pipes, because they always fork()'d a bash subshell in addition to whatever the process was.

In about 7k lines of bash script written to completely automate the iterative development of two games with a shared game engine featuring dockerized server containers, I was able to avoid using sed in almost all of it. When I did use it (pretty sure only two places), it was basically for mass substitution of variables inside of text files, and the multi-line syntax worked very nicely for this, as I could form the sed line with a loop over all the variables I wanted to replace.
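
A rough sketch of that pattern, with invented variable and file names:

    # build one sed program that replaces @VAR@ placeholders with
    # the values of the corresponding shell variables:
    HOST=localhost PORT=8080
    script=""
    for var in HOST PORT; do
        script+="s|@${var}@|${!var}|g;"   # ${!var}: indirect expansion
    done
    sed "$script" template.txt > config.txt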

It turns out that for most bash scripts, you can do the most common sed substitutions using bash's rich variable substitution expressions. For instance, you see a lot of bash scripts calling commands like "dirname" and "basename" to get the directory and filename parts of a path. There's a much faster way to do this in bash:

  path=/tmp/my/path/to/stuff.txt
  dir="${path%/*}"              # /tmp/my/path/to (like dirname)
  file="${path##*/}"            # stuff.txt (like basename)
  other="${path/stuff/other}"   # /tmp/my/path/to/other.txt
That dir line means "delete the shortest string that matches /* from the end of the string". That file line means "delete the longest string matching */ from the start of the string." That other line means "replace the word stuff with the word other in this string".

There's always find -print0, and many core tools support -z (-0 for xargs). But you have a point, of course; YMMV applies.
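
A minimal sketch of that pattern (file names invented):

    # handle arbitrary filenames (spaces, even newlines) safely:
    find . -name '*.log' -print0 | xargs -0 grep -l 'ERROR'

    # or consume them one at a time in bash:
    find . -name '*.log' -print0 |
        while IFS= read -r -d '' f; do
            printf 'processing %s\n' "$f"
        done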

There's always some reason not to write "the perfect" command. Either being in a hurry, or you just don't care about it.

Considering the commented-out valid code, it looks like he was saving time by just adding another "|" and checking the output, instead of improving the existing command and re-checking. Something I do very often as well.


Glancing over this I thought it said: "Either being hungry or...". Nodding in agreement, I then realised my mistake.

I looked up the freelance job in question: it pays $36 for a 215-page publication from the '80s, including checking where automation has failed.

A question to those who do freelance work: sites like Upwork often have low budgets, where do you guys find better projects with better compensation?


I gave up on all of them. A friend of mine is using Upwork and Freelancer (awful) to find gigs, and he tried to get me involved so I could help him out with overspill. It seemed like you had to quote ludicrously low rates to get terrible work, and despite paying peanuts, everyone expected amazing results.

I ditched it, joined some communities online, did a tiny bit of networking, and announced I was looking for work related to X - that was far more fruitful than sending 10 pitches a day for gigs on those sites and getting nothing in response because someone in India will do it for a dollar. Although I did love seeing the same projects re-appear a week later to fix the crap that their offshore team delivered.


I'm from India and I'm horrified that so many people are willing to do work for nothing or next to nothing...

I see... Thanks for sharing. I think what we seem to be after is contracting/consulting work rather than freelancing; the naming is what'll set the tone.

Haven't worked below $60/h for fun things and $80/h for boring things on Upwork, without much reputation. Sure, I could find a better-paid job, but if you can sell yourself as skilled you don't have to be afraid of the low average rates on there.

Actually, it was a project I priced out at $100, thinking I could rip through it easily. It took me longer than I thought, I'm sad to admit, even with my rough pipes and multiple sed and greppings :)

edit: And the catalog was 1048 pages long not counting the index.


A man who eschews Perl, and all later and more usable tools, in favour of a bastard hodgepodge of shell and awk with a smattering of sed: It's like looking in the mirror.

At what point does it pay off to switch to Python/Node instead of continuing to grow a bash script?

At what point should one switch from an interpreted to a compiled language?

There probably aren't any hard and fast answers. Take Dracut [1], for example. It's a successful utility completely written as a collection of bash scripts. The answer probably depends on the specifics of your needs and specific benchmarks.

Bash gets a lot of hate, but if you take the time to really learn it as a language and use good coding practices (like linting, verbose warnings, and unit testing), then it's not too difficult to write long bash scripts well. I don't think it's really much trickier or dirtier than JavaScript.

One thing that helps is putting the spiritual equivalent of "use strict" at the top of your Bash script:

    shopt -s -o errexit pipefail nounset noclobber
Take a look at `help set` for info on what those settings do. Here's a list of resources that have helped me feel confident when writing bash:

    * Bash Hacker's Wiki [2]
    * ShellCheck [3]
    * Debugging Bash Scripts [4]
Also, this blog post gives some advice that I found useful:

    * Shell Scripts Matter [5]
[1] https://dracut.wiki.kernel.org/index.php/Main_Page

[2] http://wiki.bash-hackers.org/

[3] https://www.shellcheck.net/

[4] http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_02_03.htm...

[5] https://dev.to/thiht/shell-scripts-matter


The beauty of bash scripts is that anything with u+x in your PATH, aliases, and user-defined functions all become first-class citizens with the same interface. The usual plumbing with pipes and redirection -- and sometimes some more advanced stuff like process substitution -- is often all you need.
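
For example, process substitution lets a command's output stand in for a file (a tiny sketch; file names invented):

    # diff two sorted views of the data, no temp files:
    diff <(sort a.txt) <(sort b.txt)

    # feed a loop without running it in a subshell:
    while read -r line; do
        echo "got: $line"
    done < <(grep 'foo' a.txt)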

Moving to Python, say, makes the control flow and the "software engineering" easier, for sure. However, don't underestimate the power of grep, sed and especially awk. I wouldn't want to reimplement a half-arsed, presumably non-bug-free version of awk in Python when I could have just used awk in the first place.


There's no reason you can't orchestrate awk and other tools from python though. (I doubt I'm the only one that's done that before).

I wrote a tool to do that:

https://github.com/ianmiell/shutit

Among many other things I use it to test Kubernetes/OpenShift clusters in Vagrant for Chef recipes:

https://github.com/ianmiell/shutit-openshift-cluster/blob/ma...

and here's some others in a more 'native' python format:

https://github.com/ianmiell/shutit-scripts/

https://github.com/ianmiell/shutit-scripts/blob/master/logme...

https://github.com/ianmiell/shutit-scripts/blob/master/gnupl...


> crazy talk about pushing awk through python

> oh! I wrote a tool to do that!

sniffles

That might just be my favorite HN exchange ever. Y'all remind me of emacs users. Between this and the unusual number of Perl mentions on HN today, my cockles are suitably heated.

That is a compliment. Nary a day passes where HN doesn't give me a reason to keep clicking links and learning.

I started off giggling, and now I've moved from my tablet to a laptop so that I can investigate your shutit - as it looks like a handy tool for learning. The scales look the most interesting.

I have been here for a while, but only recently (past couple of months) decided to comment. I lurked for like 12 months, just to see if I'd fit in. Why? HN continually has commentary about things I haven't yet learned.

In short, this is my awkward attempt to thank you. I'll be spending the afternoon trying to enjoy your shutit scales.


Great, thanks!

You can find more info here on my blog:

https://zwischenzugs.wordpress.com/

do ping me with your experience.


No, of course not, but it’s way easier in Bash because of the first-class-ness of commands. In Python, you’d have to mess about with the subprocess module, etc.

I guess it comes down to which school of wizardry you went to. I'd rather muck around in Python, if only because then I wouldn't dread rereading/editing the whole thing a couple months later.

In my experience it tends to be at around 50 to 100 lines. At this point you usually encounter at least one "gotcha" or difficulty that bash can't handle in an elegant way, and realize that it would probably be a lot faster if you just used another language instead.

As soon as I start adding ifs and battling to remember the exact syntax I give up and move to Python and argparse. Recently I've started using the Begins[1] library to make useful command line tools and scripts even more quickly. I appreciate it's less and less portable at this stage but so much more productive. Most of the time these scripts are very specific anyway so it's not a concern.

The other thing I've noticed amongst peers is that I'm usually the only one (or one of a limited few) that remotely understand the shell script too. So I'd serve the team better writing it with Python. YMMV with that of course.

[1] http://begins.readthedocs.io/en/latest/


There is a lot of historical arcana baked into the command line and bash, but I've found it worth learning like any other language.

Anyway, just for future reference, you can use bash's `help` command to get documentation about builtins. In the case of conditionals, we want to look at `test`:

    $ help test
I find this a lot easier than wading through the bash man page.
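
For instance, the bracket syntax people battle to remember is just the test builtin in disguise (a tiny sketch; the path is invented):

    # [ ... ] runs the test command; `help test` lists every flag:
    if [ -f /tmp/stuff.txt ] && [ -n "$USER" ]; then
        echo 'file exists and $USER is non-empty'
    fi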

Holy crap, xelxebar. Thank you! This is a game-changer. I've been wading through the bash man pages like a fool.

Thanks for pointing to Begins!

I suspect some of this is just a difference in background or culture; shell conditionals look completely obvious and intuitive to me, and I'm pretty sure my team can handle shell better than python.


Begins looks awesome, thanks for sharing! :)

Quite a bit after the point where it pays to convert sed and awk one-liners into Perl one-liners and build a program around it.

Perl may be as dead as sed and awk are now, but that was the initial use case for Perl before CGI (as in Common Gateway Interface - the first primitive interface for web apps) came along.
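
The conversion is usually mechanical; a sketch with invented patterns:

    # sed 's/foo/bar/' file   becomes:
    perl -pe 's/foo/bar/' file

    # awk '{print $2}' file   becomes:
    perl -lane 'print $F[1]' file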


Long after this particular case. I'd be surprised if it could be done in fewer than twice the lines of code the author used in any language other than Perl (which might make the case for using Perl at around this point, if you like Perl). But, Python and Node would surely be a lot longer.

Shell with awk/sed/grep is very powerful and very composable for small parsing tasks like this. (It could have been done with even less code than this, but I don't blame the author for not figuring out all the necessary incantations to do so, as these tools can be fiddly if you don't use them every day.)


My thresholds: a) over 50 lines b) when you want to fork a pipe (get two different results in one pass over the data)
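
The fork-a-pipe case is doable in bash with tee and process substitution (a sketch; the log format is invented), but it's exactly where a bigger language starts to look attractive:

    # one pass over the data, two results:
    tee >(grep -c ' 404 ' > notfound.count) < access.log |
        grep -c ' 200 ' > ok.count
    # (note: the >(...) job can finish slightly after the pipeline does)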

My rule is one line: I don't want to write any bash script to a file; one-liners are as far as it goes for me.

The best way to make money with grep/sed/awk is to become sufficiently l33t in them that you can be gainfully employed by others to "do things" on "the goddam unixes".

hey.. we got more unixes here. can somebody grep them out the way?


grep -v '.*[nN].x' will solve the infrastructure problem...

As you say, the way to make money (with any tool) is to get good enough at using it to be useful to someone who wants to pay you to do so.


Best Comment is Best

If the author had used better variable names, the code would not have been as hard to look at.

That's fair


