Hacker News
Sed – An Introduction and Tutorial (grymoire.com)
281 points by aethertap on Jan 7, 2015 | 73 comments

I just use | perl -pi -e 's/foo/bar/g' , etc for this kind of stuff. Is there anything I can't do with perl on a line that sed will do? I can see how perl is a lot more complex than sed, but I went through the whole perl learning curve back in the late 90's so it doesn't bother me that much..

For me, this is the whole point of perl: You can learn one tool which replaces three, each with their own syntax: Sed, Awk, and Grep.

Wait, why not just use awk? It can do pretty much everything the other two do (and can, like sed, be used to replace other things (like 'head')) and runs faster than perl at some things. (https://news.ycombinator.com/item?id=8858342)

I learned awk in school, and like it. But I like to choose tools which increase programmer-efficiency before machine-efficiency. Perl is just one tool and one easy syntax to learn. Its regex language is now the standard as well.

Only if the runtime turned out to be too slow would I re-examine my choice of tools. This is like avoiding premature optimization.

(I've never found perl to be too slow in practice, btw.)

It's a lot easier to delete specific lines using sed. Also, you can have sed apply a replacement to the n-th instance of something. Doing that in Perl is a bit more complicated and a lot less succinct.

$ echo "foo foo foo foo" | sed 's/foo/bar/3'

foo foo bar foo
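For comparison, here is a rough Python sketch of the same "replace only the n-th match" idea. The helper name `replace_nth` is made up for illustration; it is not part of the `re` module, and this is only one way to do it.

```python
import re

def replace_nth(pattern, repl, text, n):
    """Replace only the n-th (1-based) non-overlapping match of pattern in text."""
    count = 0

    def substitute(match):
        nonlocal count
        count += 1
        # Expand the replacement for the n-th match; return every other match unchanged.
        return match.expand(repl) if count == n else match.group(0)

    return re.sub(pattern, substitute, text)

print(replace_nth(r"foo", "bar", "foo foo foo foo", 3))  # foo foo bar foo
```

It works, but it does illustrate the point: the sed version is a single trailing `3`.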

The Rakudo Perl 6 compiler is still immature and slow, and the -i option (in-place edit) hasn't yet been implemented, but, at least for comparison's sake:

$ perl6 -pe 'next if ++$ == 2' example.txt

... prints all lines except line 2.

This is an example from Perl 6 One Liners[1].

The `$` is just an unnamed variable that is getting incremented once per evaluation (-e is for `evaluate`) which in this case happens once per line (-p is for printing each line of input after eval'ing the code -- unless a `next` applies, in which case that line gets skipped).


$ echo "foo foo foo foo" | perl6 -pe 's:3rd/foo/bar/'

... replaces the third foo with bar.

P6 regexes are far easier to read and way more powerful than P5 regexes. The `:3rd` bit is a general language feature called "Adverbs", in this case applied to the regex focused s/// built in.[2]

[1] https://github.com/sillymoose/Perl6-One-Liners

[2] http://doc.perl6.org/language/regexes#Adverbs

I've read literally nothing about perl 6 but what David Skoll wrote here: http://david.skoll.ca/blog/2010-07-29-perl-sss.html


"I asked on a forum what the goals are for relative size and speed of Perl 6 vs. Perl 5, and a Perl 6 developer responded that a reasonable goal would be to have Perl 6 be twice as big as Perl 5 and take twice as long to start up.

"To achieve this goal, the Perl 6 developers will have to shrink the program size by a factor of 6.1 (that is, get rid of about 84% of the code.) They'll need to reduce startup memory consumption by a factor of 13.7 (that is, cut out 93.7% of their memory use) and reduce startup time by a factor of over 275.

"Oh, and this is after they add in all the missing features required to bring Perl 6 up to production-level."

Has the situation gotten better since 2010?

> Has the situation gotten better since 2010?

Not really. Startup uses about the same RAM. It's about 10x faster.

The best docs I know about performance would be http://pmichaud.com/2012/pres/yapcna-perflt/slides/slide17.h... and http://jnthn.net/papers/2014-yapceu-performance.pdf#page=72

> "... all the missing features required to bring Perl 6 up to production-level."

The latest story is that the last major missing features (Unicode grapheme-by-default and native arrays) will land in the next few months and Perl 6 will be declared "officially ready for production use" by the end of 2015.

Deleting (i.e. not printing) lines: here are the examples for not printing the 4th line:

sed: sed -n '4!p'

awk: awk 'NR != 4'

perl: perl -ne '$. != 4 && print'

Not much between them really.

I would consider 'sed 4d' to be significantly easier than doing the same in Perl, but I don't disagree that it's not that hard to do in either.
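For anyone following along in Python instead, a minimal sketch of the same "skip line 4" filter might look like this. `drop_line` is a hypothetical helper name, and the sample input is invented.

```python
import io

def drop_line(lines, lineno):
    """Yield every line except the one at 1-based position lineno."""
    for current, line in enumerate(lines, start=1):
        if current != lineno:
            yield line

# Stand-in for a real file or sys.stdin.
sample = io.StringIO("a\nb\nc\nd\ne\n")
print("".join(drop_line(sample, 4)), end="")
```

Any iterable of lines works as input, so an open file or `sys.stdin` can be passed in place of the `StringIO` object.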

Probably not, but in some cases sed and AWK might be faster. I am a big fan of AWK. It is limited in some ways that make it impractical to use for really serious programs, but it is very expressive. Look at [1, 2].

[1]: http://c2.com/doc/expense/

[2]: http://www.pement.org/awk/awk1line.txt

I have tested converting sed/awk lines to perl in a few base scripts that worked on a fairly large amount of data. Oddly enough, in every case, perl 5.18 performed at LEAST 1.5x faster, sometimes as much as 3.5x faster. Obviously anecdotal evidence, but recent versions of Perl seem to have gained some good speed.

I've had a similar experience, with Perl up to about 7x faster. I had sed in a few data-mangling pipelines because I assumed simpler=faster, but replacing it with Perl was either a wash or a speedup in every case. This was with the versions of perl and sed in Debian (looks like it's GNU sed), so ymmv with other seds.

The case where I saw a 7x speedup was doing many-times-per-line, fixed-string search/replace on a file consisting of very long lines (an SQL dump where some lines had >1m characters). Perl was IO-bound (so presumably would've been even faster if I'd had better disks), while sed was CPU-bound at a pretty low fraction of the possible IO performance.

Could have to do with charset handling.

For the really complex stuff, there's rejit[0]. I wonder if LuaJIT would work; these tools also need IO tricks to be fast.

[0] https://lwn.net/Articles/589009/

Rafe Colburn from Etsy wrote about this performance oddity: http://rc3.org/2014/08/28/surprisingly-perl-outperforms-sed-...

EDIT to manage expectations: the article doesn't explain why, it just provides benchmarks and one commenter made a suggestion about character handling. More insight still welcome :)

There are many different versions of awk: gawk, (BSD) nawk, mawk, etc. I think OS X uses nawk, but mawk is reputedly faster. Gawk is definitely slower than mawk. I'm not surprised that Perl is faster though.

> I used the default OS X versions of these tools. The versions were Perl 5.16.2, Awk 20070501, and some version of BSD sed from 2005

I'm a firm believer that there are many ways to work effectively. If Perl fills the same needs for you that sed does for some people, then by all means stick to Perl.

However, for someone who knows neither, here are some reasons you might want to choose sed over Perl:

1. sed syntax is pervasive in other tools. For example, to run a substitution from early on in the tutorial in vim, type :%s/abc/(&)/<enter> from normal mode.

2. sed is simpler than Perl. It used to be that Perl filled a unique role as a scripting language but now there are a bunch of languages in that space (most notably Python and Ruby). Python + sed for example would fulfill most of the same functions that Perl does (and there are reasons to choose Python over Perl as scripting languages, although that's a much more complicated domain to discuss).

3. sed is more performant (or so I hear). This has never been a real concern for me, but some people cite this as a concern.

4. sed usually is a bit more terse. For the length of expressions you'll typically be writing with sed, this isn't a big concern.

Disclaimer: There are probably good reasons to choose Perl over sed, too. Not being a Perl guy, I don't know those reasons. I'll leave that to someone who knows more about Perl.

When I learned sed it was because I was taking a *nix class in college, and a professor pointed me at sed and not Perl. I learned sed instead of Perl because it was put in front of me, not because of any weighing of pros and cons. That's how a lot of learning happens. Sometimes simply learning what's put in front of you turns out to be an obvious mistake in retrospect (I was stuck writing VBA for a little while). I don't know whether learning Perl or sed is better for what sed does, but I do know that after maybe 6ish years using sed quite frequently it hasn't turned out to be an obvious mistake.

> "3. sed is more performant (or so I hear)."

There are regexes for which awk (maybe sed too?) will perform at a reasonable speed while perl is incredibly slow. Graph: http://pdos.csail.mit.edu/~rsc/regexp-img/grep1p.png Article: http://swtch.com/~rsc/regexp/regexp1.html

Perl...could...employ much faster algorithms when presented with regular expressions that don't have backreferences.
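The effect is easy to reproduce with any backtracking engine. Below is a small sketch using Python's `re` module (another backtracking matcher, standing in for Perl here) on the pattern family from the linked article: n copies of `a?` followed by n copies of `a`, matched against n letters. The helper name `pathological` and the chosen values of n are just for illustration, and actual timings will vary by machine.

```python
import re
import time

def pathological(n):
    """Return (pattern, text): n optional a's then n required a's, and a string of n a's."""
    return "a?" * n + "a" * n, "a" * n

for n in (5, 10, 15, 20):
    pattern, text = pathological(n)
    start = time.perf_counter()
    matched = re.match(pattern, text) is not None
    elapsed = time.perf_counter() - start
    # The match always succeeds; only the time taken changes with n.
    print(f"n={n:2d} matched={matched} seconds={elapsed:.4f}")
```

The time should grow quickly as n increases, so it is best to raise n cautiously.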

Sed is one of those tools that once you learn it, you'll start to wonder how you ever got by without it.

This is a great set of tutorials, he also wrote one about Awk: http://www.grymoire.com/Unix/Awk.html

Get to know these two tools and you'll be amazed at the hours you can save and what you can do, especially with text files.

I am always amazed by how much people can get done using piped unix commands - but, personally, I find it much easier to just use sed when I need to quickly edit a few files, and if I need to do anything slightly more elaborate, to do it in Python with a script.

It's obviously much slower - but I've never been in a position where I needed insane speed to quickly fix a bunch of files.
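For the "slightly more elaborate" case, a minimal sketch of an in-place substitution script using the standard `fileinput` module could look like this. The function name `substitute_in_place` and the throwaway demo file are assumptions for the example; on real files you would want a backup first (`fileinput.input` accepts a `backup` argument for that).

```python
import fileinput
import os
import re
import tempfile

def substitute_in_place(paths, pattern, repl):
    """Rewrite each file in paths, applying re.sub to every line."""
    if not paths:
        return  # An empty list would make fileinput fall back to sys.argv/stdin.
    compiled = re.compile(pattern)
    # With inplace=True, print() writes into the file currently being read.
    with fileinput.input(files=paths, inplace=True) as stream:
        for line in stream:
            print(compiled.sub(repl, line), end="")

# Demo on a temporary file so nothing real gets modified.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("foo one\nfoo two\n")
substitute_in_place([handle.name], r"foo", "bar")
with open(handle.name) as edited:
    result = edited.read()
os.remove(handle.name)
print(result, end="")
```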

I'll sometimes use sed/awk/cut/etc... to get one off summary info out of large files. Excel/LibreOffice would choke, and loading it into a database for one off is painful.

The nice thing about these tools is you can count on them being on any unix machine w/o worry about installing them on a remote locked down box of some kind. You can also avoid any dependency complexity. They can be real awkward to remember however and the man pages can sometimes be a handful, so I try to store common idioms in gists or in my .bash_profile whenever I develop one.

With regard to cross-platform compatibility, it's worth knowing some of the basic differences between GNU sed and BSD sed. I like the GNU extensions -- particularly the extensions to regular expression such as non-printing characters (e.g. `\t`) and character classes such as `\w` and `\b`. I almost always use the GNU `-i, --in-place` option once I'm satisfied that my sed commands do what I want. A couple of years ago, I was using my other half's Mac (OS X 10.4 with a very old version of BSD sed) and I really missed the GNU extensions.

When you say you store common idioms in your Bash profile, do you mean storing the commands as Bash aliases or functions? I have similar issues with remembering syntax and building sed commands and I've been trying to do something similar to avoid spending time building a complex command from scratch when I already created a similar one some time previously.

I usually store things as aliases...and while sometimes I don't end up using the alias exactly, having a useful name that describes what it does helps me tailor the command later on. For example I have the following alias 'watch_port' that looks like this:

sudo ngrep -W byline -d en6 -qilw 'get|post' tcp dst port

so `watch_port 8080` will print all network traffic over port 8080 on my ethernet. I actually rarely use this as 'watch_port' directly, but it helps me remember quickly how to bend ngrep to my needs.

Thanks for the response. I used to keep a list of long commands that I had constructed in a plain text file, titled `useful-commands.txt` but that became too unwieldy. Now, similar to you, I try to store them as aliases – even if I don’t use the command exactly as it was saved. The hard part is coming up with a good, succinct name (descriptive but not too long) for the alias.

For a long time, I didn’t like using aliases because I didn’t want to become overly reliant on my custom aliases – and then miss them when working on an unfamiliar system. This generally worked out alright when I was able to use `Ctrl-R` with a large Bash history. Now, I think that was an irrational rationale and that aliases are very useful shell features. I’m currently trying to organise my aliases and functions into useful groups such as `home_aliases.sh`, `cygwin_aliases.sh`, etc. so that they can be loaded as needed. I then plan on adding them to a git repository so that they can easily be used – and updated – on different systems.

BTW, thanks for letting me know about ngrep. It looks like a useful complement to tcpdump.

Maybe it's because Perl was really popular at the time I discovered Unix and its tools, but why would you use sed and awk instead of a Perl one-liner? (Or even :s// in vim or M-x query-replace-regexp in emacs, if it's just regex munging)

Perl isn't included in e.g. BusyBox. The additional overhead of including perl in an embedded distribution could be a valid reason for using sed where you could otherwise have used perl.

vim's regex syntax is unusual and sucky. Thankfully running commands is super-easy:

:%!sed …

Exactly. Use Perl (or maybe Python).

sed is bullshit.

Do you know it is not actually possible to get output from sed that does not contain a newline?

Are you sure? sed has -z, it's useful.

I tried posting this to /r/programming. It said already submitted 8 years ago. It's a very good tutorial and I was oblivious of its existence for so long.

To be honest I don't know how much he has improved on the manual. It is such a small language that you could easily read up to the examples very quickly even if you aren't particularly interested in learning.

I would suggest just giving it a look directly at https://www.gnu.org/software/sed/manual/sed.html

Though be forewarned, something that neither document explains well is the actual syntax. As in how addresses and expressions can be used and how to read a script. The syntax is relatively simple to understand looking at some examples, but the lack of clear delimiters between the address, command, and command parameters can confuse beginners.

But that only covers the GNU bits.

If you stick (too closely) to that one, sometimes you might use features only found in GNU Sed and think you're writing portable scripts.

I think this tutorial helps clarify what is and isn't portable.

Well, it does mention explicitly which parts are GNU extensions. But I see what you mean. I can write sed scripts fairly well at this point but clearly didn't internalize any of the notes about what is an extension, thus it is frustrating trying to get the non GNU versions to do anything at all.

probably the best reference besides the o'reilly books.

Peteris Krumins also has a decent walkthrough of sed that essentially goes through and explains it via detailed explanations of sed one-liners (the explanations are original, but the list of one-liners was already popular on the internet).

Book: http://www.catonmat.net/blog/sed-book/

Free online articles: http://www.catonmat.net/blog/sed-one-liners-explained-part-o...

For context of 'why sed and why not x?'

This was written in 1984 (I think) and still works with a few syntax adjustments. I think it is not bad discipline to return to these tools from time to time and remember core UNIX principles.


I am not so sure anything that I currently am writing would/ could be relevant in 30 years. Very humbling.

I have always used grymoire.com for regex and sed tutorials since probably around 2009. Thanks grymoire

Is there a command to automatically escape a string for use in sed?

I got frustrated escaping for simple replacement: https://github.com/jakeogh/replace-text

If you find yourself having to escape a lot of slashes, then you can do something like this:

echo "/foo/bar" | sed -e 's|/foo|/tmp|'

(The article mentions this)

It can be any character really. I like sed 's,foo,bar,' or sed 's@foo@bar@'

Python has re.escape; awk can take variable definitions from the command line:

  echo b |awk '/b/{print a}' a=4
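To make the `re.escape` suggestion concrete, here is a minimal sketch of a literal (non-regex) replacement in Python. `replace_literal` is a hypothetical helper, not a library function, and the sample strings are made up.

```python
import re

def replace_literal(text, needle, replacement):
    """Replace needle verbatim, with no regex metacharacters interpreted."""
    # re.escape protects the pattern side; passing a function as the
    # replacement stops backslashes and \1-style references being expanded.
    return re.sub(re.escape(needle), lambda _match: replacement, text)

print(replace_literal("cost: $5.00 (approx)", "$5.00 (approx)", r"\1 euros"))
```

For a purely literal swap like this, plain `text.replace(needle, replacement)` does the same job; `re.escape` earns its keep when the escaped string is only one piece of a larger pattern.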

A friend of mine has been trying to get me to learn awk, sed, perl or grep. Honestly I only have the patience for one at the moment, which do you think is the best (taking the ease of learning into account)?

IMO ack is a useful complement to grep if you do development: http://beyondgrep.com

Also, for general bash stuff, these are great, and IMO much better than the TLDP site that always ranks high on Google:



Learn grep. If you're familiar with using a shell, it'll be the most immediately useful.

# search current and all child directories for files containing "bananas", case insensitive
$ grep -ir "bananas" .

here's an example of the first useful grep I learned :

history | grep "git push"

Just use Ctrl-R in bash to search backwards in the history

I didn't know you could do that, so it's good to have an option- thanks.

But how do you then pipe that to tail? :D

You don't, it's a readline thing, it doesn't really generate a file you can manipulate. (I expect to be proven wrong with some insane one-liner though).

What you can do is press Ctrl-R again and it will search the older match. There's also forward match but I can't recall the shortcut.

I alias "history | grep" to "hrep" and I'd bet it is one of my top 3-5 commands.

Why not find out:

  history | awk '{a[$2]++}END{for(i in a){print a[i] " " i}}' | sort -rn | head

:D HA! Nice!

2936 git
1166 cd
409 ll
343 ssh
302 l
279 vagrant
223 ls
201 la
151 cat
140 mkdir

uniq seems a bit easier to understand here:

history | awk '{print $2}' | sort | uniq -c | sort -rn
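And for completeness, roughly the same tally as a Python sketch, assuming each history line looks like "  42  git push" so that the command is the second whitespace-separated field. `top_commands` and the sample lines are invented for the example.

```python
from collections import Counter

def top_commands(history_lines, limit=10):
    """Count the second whitespace-separated field of each line, like awk's $2."""
    counts = Counter()
    for line in history_lines:
        fields = line.split()
        if len(fields) >= 2:  # Skip blank or number-only lines.
            counts[fields[1]] += 1
    return counts.most_common(limit)

sample = ["  1  git status", "  2  cd src", "  3  git push", "  4  ls"]
for command, count in top_commands(sample):
    print(count, command)
```

To run it on real history, the output of `history` can be piped into a script that passes `sys.stdin` to `top_commands`.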

Why not use C-r?

This could change your life:


It's a fuzzy searcher for Ctrl-r that shows all possible results, updating as you type.

Sometimes that is what I want, but usually I want a list of all the iterations of something to cut/paste/adjust.

Great reference; with the awk reference on that site, this formed a great resource for learning text manipulation/searching in a Unix environment.

There's also an excellent (non-free) book:

Definitive Guide to sed


I found it to be well worth the money, though I wish it were available as a PDF.


was interested in reading this until i saw the yellow background and couldnt stomach it - looks like a lot of great information just displayed in a horrible way

If it's available on your platform, you could print it to a local file since the print version doesn't use the same CSS as the web (OS X = print to PDF, Win = print to XPS, etc).

Or, presuming you're on a modern browser and care that much about the content, you can just inspect the dom, find that <link type="text/css"...> in the head, and delete it.

On firefox:

View > Page Style > No Style

> If it's available on your platform, you could print it to a local file since the print version doesn't use the same CSS as the web (OS X = print to PDF, Win = print to XPS, etc).

Yeah or you could go read a better book

You'll have a hard time finding one though. The O'Reilly books are the only ones that I know of that might qualify as "better."

The site is ugly, but it's one of the best references for sed.

If you're using Chrome, you can install Clearly[1] and read it with more readable CSS.

[1] https://chrome.google.com/webstore/detail/clearly/iooicodkii...

I had the same opinion about the official sed page[1], until I noticed the footnote. It got my attention and I decided to learn. One of the best, portable, elegant tools I use every day.

[1] http://sed.sourceforge.net/

That's not official. It's a centralized homepage for several different sed implementations, not really affiliated with any of them.

In Firefox:

View/Page Style/No Style

It also looks pretty good in the lynx browser, and probably other text browsers. https://en.wikipedia.org/wiki/Lynx_%28web_browser%29

It's weird, it didn't always look like that. I'd been recommending it for years while it was a very Spartan HTML page with no CSS or inter-page navigation. The horrid new look came early last year.

I use the Clearly browser plugin for pages like this. Works like a charm.

The site is so garish I actually love it for some reason.

Use Instapaper?
