Why is it so difficult to write valid “tar” commands from memory? (dantilden.com)
15 points by FreakyT on Mar 25, 2015 | 44 comments

A couple of nits:

- TFA says that `ln` is "used to create symbolic links, among other things"; I'd drop "symbolic"--that is needlessly restricting what it does. It just creates filesystem links; hard or soft.

- TFA says, "tar is actually short for TApe aRchive". The "a" should belong to "archive" ("Tape ARchive"); this is because of its relation to the older `ar` ("ARchive") command.[^1]

- TFA says, "the filename is actually part of the f flag, and therefore must directly follow it." That's actually not the case for the "dash-less" form of tar invocation; all of the flags go in one string, and then the arguments follow in the same order, rather than right after the associated flag. For example, if I wanted to specify both the "f" and "C" flags, it (c|w)ould be something like `tar cfC argument-to-f argument-to-C`. That said, modern tars also support `tar -c -f argument-to-f -C argument-to-C`.
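To see the two forms side by side, here is a minimal sketch. It assumes a tar that still accepts the old-style flag string (GNU tar and bsdtar do, as far as I know); the names `src`, `a.txt`, `old.tar`, and `new.tar` are made up for the illustration.

```shell
# Sketch only: hypothetical file names, run in a throwaway directory.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir src
echo hello > src/a.txt

# Old style: one flag string, then the flags' arguments in the same order.
# "f" takes old.tar, "C" takes src; a.txt is then looked up inside src.
tar cfC old.tar src a.txt

# Dashed style: each argument sits right after its own flag.
tar -c -f new.tar -C src a.txt

# Both archives should contain the single member a.txt.
tar tf old.tar
tar -t -f new.tar
```

The only difference between the two invocations is where the arguments to `f` and `C` sit; the resulting member list is the same.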

As TFA says, this isn't inherently wrong, it's just different than everything else (I know of no other command that uses that flag syntax[^2]). Understand that it predates the standard convention that everything else follows.

[^1]: Today ar is pretty much only used for creating .a files containing library objects for static linking. It may be noteworthy that the argument order of ar is ALSO "archive-name members..." despite archive-name NOT being an argument to an "-f"-like flag.

[^2]: Actually, come to think of it, `ps` might take that syntax, depending on your system.

I don't like ln as an example of sanity.

  $ ln
  usage: ln [-Ffhinsv] source_file [target_file]
         ln [-Ffhinsv] source_file ... target_dir
"Source = from", "target = to" is how I interpret that. Then if you look at some existing link, you'll see it as:

  my_link --> real_directory
In the direction of the arrow: from, to. But to create that link, the command would really be

  $ ln -s real_directory my_link
The link target/real path is the source and the link to create is the target.

My brain thinks it breaks the idiom of `[cmd] [from] [to]` that cp, mv, etc. use. You wouldn't say the example above is a link from real_dir to my_link, you'd say the opposite (and that's what the cli shows when you ls a link).

I agree that ln is a little confusing. I've often explained it to people that it's the same order as if you were copying the file, but it creates a link instead.
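A small sketch of that "same order as copying" rule of thumb, using a symbolic link so the result can be inspected with `readlink`. The file names are invented for the example.

```shell
# Sketch only: hypothetical file names, run in a throwaway directory.
set -eu
work=$(mktemp -d)
cd "$work"
echo data > original.txt

cp    original.txt copy.txt   # existing file first, new name second
ln -s original.txt link.txt   # same order: existing file first, new link second

# The link points back at the first argument.
readlink link.txt
```

Reading `ln -s A B` as "make B, like cp would, except B is a link to A" tends to be easier to remember than the source/target wording in the usage message.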

Also, in sections 1 and 8 (shell), data typically flows from left-to-right across the line. The filename on the left flows into the link on the right. Data flows from the left of the pipe to the right.

Whereas in section 3 (C), data typically flows from right-to-left across the line. Variable assignment, memcpy, ...

There are of course exceptions (dup2, bcopy), but that's the general style for the order of arguments in *nix.

I have no problems with tar; it's operating on the archive, so the operation goes first, then the archive, then the optionals.

I have to look up ln every time.

Great points! I've made some minor edits to the article to reflect them:

* Description of "ln" is more general

* "Tape ARchive" is now described correctly

* No longer uses the arguments in "dash-less" mode

I've been noticing the dash-less form of program invocations popping up.

What is it exactly called?

I haven't seen it in any programs other than old research UNIX era utilities and their modern implementations. In that context, they're often called "traditional" or "old style" invocation.

Do you have any examples of modern utilities using a dash-less syntax?

I think it gets easier once you internalize that your destination filename is actually an argument to -f. It's not `tar czf (filename) (list of files)`; it's `tar cz(f filename) (list of files)`.

Ironically, I think it'd actually be more intuitive if we used the options fully; tar -c -f filename -z (list of files). But we end up memorizing shorthand before we know what it means, rather than after.
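For comparison, a minimal sketch of the shorthand next to the spelled-out form. File names are hypothetical, and it assumes a tar with gzip support via `z` (GNU tar or bsdtar).

```shell
# Sketch only: hypothetical file names, run in a throwaway directory.
set -eu
work=$(mktemp -d)
cd "$work"
echo one > a.txt
echo two > b.txt

# Shorthand: the word after the flag string is the argument to "f".
tar czf bundled.tar.gz a.txt b.txt

# Spelled out: -f and its filename travel together, wherever they appear.
tar -c -f spelled.tar.gz -z a.txt b.txt

# Both listings should name the same two members.
tar tzf bundled.tar.gz
tar -t -z -f spelled.tar.gz
```

Once the filename is read as belonging to `-f`, moving `-z` before or after it stops looking surprising.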

99 times out of 100, I'm extracting a gzip'd tarball. Thus, I've muscle-memorized "tar xzf file.tar.gz". What helped me do that was a little mnemonic that my friend told me: eXtract Zee File.

GNU tar can figure out how to decompress it on its own. Just xf will do.
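A quick way to check this on your own system, as a sketch: it assumes GNU tar (bsdtar appears to behave the same way when reading from a file) and uses made-up names.

```shell
# Sketch only: hypothetical names, run in a throwaway directory.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir project out
echo hi > project/readme.txt
tar czf project.tar.gz project

# No "z" on extraction: tar inspects the file and picks the decompressor itself.
tar xf project.tar.gz -C out
cat out/project/readme.txt
```

The `z` is still needed when creating the archive; the detection only applies when reading one.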

Just read your answer after writing mine, have been using the same one, adding a German touch to it ;)

This is hilarious, memorable, and actually useful. Thank you zcdziura, may the wind be at your back.

This is ridiculous. There's no difficulty writing tar commands. Just RTFM: man tar; I've written tar commands for 30 years: on day 0 I read the manual page, and since day 1 I've been writing tar commands without any problem.

(Same with find: the first time, a hacker friend dictated a find command to me; next I read the manual, and I've been a happy find user ever since.)

The secret of unix diffusion has been its man pages, full documentation of your local unix system, specific and always available on-line (even when the network is disconnected).

Nowadays, with Google or Stack Overflow, the big problem is that the documentation you find doesn't necessarily (and probably never does) match your specific installation, and that you get more fish than fishing lessons.

Why do I need flags to begin with? tar and untar would have worked fine.

Because originally tar only dealt with tape archives. So the default behavior was (and still is [EDIT: correction, apparently a lot of GNU tar distributions have defaulted to using -f- which sends things to stdout. That is not how tar originally functioned]) to send the results to /dev/st

tar was later changed to read/extract (-x) and write/create (-c) from files (-f). It was also modified to handle gzip (-z) and bzip2 (-j) compression.
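Since the default destination varies between builds, naming it explicitly avoids the question. A minimal sketch, with invented directory names, that uses `-f -` to mean stdout on create and stdin on extract:

```shell
# Sketch only: hypothetical names, run in a throwaway directory.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir src dest
echo payload > src/file.txt

# "-f -" is stdout when creating and stdin when extracting,
# so neither side falls back to a compiled-in default device.
tar -c -f - -C src . | tar -x -f - -C dest
cat dest/file.txt
```

This is the classic "copy a tree through a pipe" idiom; no archive file or tape device is involved at any point.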

Essentially, the problem comes down to having to support all of the old scripts that depend on default tar behavior, even though people rarely write to tape drives with it any more.

Unar is pretty good. Saves a letter, even.


If the thesis were correct then 'zip', which has the arguments in the same order as tar, would also have the same issues.

For example, to add/update an entry in a zip file, do:

    zip test.zip test/test.txt
I have not heard of the zip command line being as frustrating as tar, let alone more so, so I do not believe this thesis is correct.

Well, zip came after tar, so it probably followed tar's order (as did the `jar` command which even follows tar syntax). Now, consider the `ar` command, which came before tar, for which the analogous command would be

    ar cr test.a test/test.txt

That's a good point, though I feel like that may just arise from 'zip' being the less commonly encountered of the two utilities.

I am unable to compare the number of Windows-based developers, who often use zip as an archive format, and Java-based developers, who often use zip to work with jar files, to the number of people who use tar. My guess is that tar users would be in the minority.

How did you come to the conclusion about which is most commonly encountered?

Also, other popular archivers also use the archive-first approach:

    7z.exe a c:\a.7z file1.txt dir2\file2.txt
    rar a -r yourfiles.rar *.txt c:\yourfolder

zip is almost certainly the most used archive format, but that doesn't mean the 'zip' command line tool is more used than the 'tar' command line tool.

Even though I've probably run across more .zip files than .tar files (and certainly more archives in the zip format) in my career, I've undoubtedly typed 't-a-r' on a command line many more times than 'z-i-p'.

Oh, I certainly use tar more than zip. My downloads directory has 67 tar files and 15 zip files. Bear in mind though that the hypothesis is not specific to Unix development. The examples are:

    cp file1.txt file2.txt file3.txt destination/
    mv file1.txt file2.txt file3.txt destination/
The equivalent in MS Windows command shell is:

    copy file1.txt file2.txt file3.txt destination\
    move file1.txt file2.txt file3.txt destination\
The hypothesis makes the prediction that Windows command-line users (which is a subset of all Windows users) will have similar problems with zip/7zip/rar's "inverted" command-line parameters.

Similarly, the hypothesis predicts that people who use the command-line to work regularly with .jar, .odf, and other zip-based file formats will also have problems with the ordering.

I have not heard of such problems, but I am not knowledgeable about those areas of development.

It definitely did take me a few tries to figure out how to use 'zip' the first time in the command line on OSX.

Was it because the argument ordering was backwards from what you expected, or for some other reason, like not knowing the specific command-line options? For me it was the second.

For some years now I have been using a quite funny mnemonic in order to remember how to extract a gzipped tar. It reads "eXtract Ze File" with German accent ;)

Maybe I agree with facepalm. But also with informatimago down at the bottom. It's probably in my best interest to read most unix commands straight from man.

I tend to use tar -czf <file> to untar. If it didn't work I'd proceed to tar -xzf <file> or tar xzf <file>. Even so, sometimes my machine wouldn't accept the command until I tried something like untar (not real!) or unzip or gzip. But those were mostly out of frustration.

I hope I wasn't the only one mouse hovering the XKCD image that appears in the RSS reader (Feedly here) to see its actual ALT.

Hate to post "me too," but I did the exact same thing.

I think that the destination comes first because then you're free to tack on any number of additional files to add to the archive by simply appending them. If the destination were the last argument, any added files would have to come before the destination, which would necessitate an insert, which is more work than a simple append.
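The append-versus-insert point can be sketched with the shell's own argument list. The file names are made up; the idea is only that growing an archive-first command never touches anything but the end.

```shell
# Sketch only: hypothetical file names, run in a throwaway directory.
set -eu
work=$(mktemp -d)
cd "$work"
echo 1 > a.txt; echo 2 > b.txt; echo 3 > c.txt

# Archive name first: each extra member is a plain append at the end.
set -- backup.tar a.txt
set -- "$@" b.txt
set -- "$@" c.txt
tar -cf "$@"          # runs: tar -cf backup.tar a.txt b.txt c.txt

# With a destination-last command such as cp, the same growth would mean
# inserting each new name before the final argument instead.
tar -tf backup.tar
```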

Tar is OK, but for scripting and/or archiving over socket connections, find(1) with pax(1) rules.


It's hard and it has a rather old interface (not a bad one, just old).

Recently I've switched to patool (http://wummel.github.io/patool/) and am quite happy with it.


I don't see how it's difficult. Are [li|u]nix admins getting more stupid?

I figure the relevant XKCD[1] wouldn't exist if this weren't the case.

[1] https://xkcd.com/1168/

Snarky and sarcastic. Honestly, tar is the least hard "hard" thing you'll encounter in the UNIX world.

How about writing a valid firewalld configuration command to open port 8060 TCP and UDP to connections from and have it saved and applied immediately.

linix admins?

well spotted ;)

>Naturally, tar is never really used in this way anymore.

I think Veritas and their thousands of customers would beg to differ.

What? You stick switch parameters after the switches and general parameters on the end just like every other sensible command.

"tar cf"? "cf"? There's your problem! If you follow the anachronistic "bundled flags" approach no wonder you're getting it wrong. "tar -c -f <parm>"!

tar c * > ../foo.tar; gzip ../foo.tar

(you don't need anything else)

A significant improvement is to instead use

tar -c files | gzip > ../foo.tgz

No messing around with flags, just a single one saying that you want to create an archive rather than extract one. No messing around with filename parameters—that's what the shell is there for.
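A runnable sketch of that pipeline, with invented names. One hedge: it spells out `-f -`, on the assumption (raised elsewhere in this thread) that not every tar build writes to stdout when no archive is named.

```shell
# Sketch only: hypothetical names, run in a throwaway directory.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir files
echo text > files/note.txt

# Create to stdout; the shell handles compression and the output file name.
tar -c -f - files | gzip > foo.tgz

# The reverse direction composes the same way.
gzip -dc foo.tgz | tar -t -f -
```

If your tar does default to stdout, dropping `-f -` gives exactly the command above.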

tar czf ../foo.tar.gz *

(GNU tar can do it all in one command)

Both of you are creating a tarbomb and should be ashamed
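For anyone unsure what the difference looks like, a minimal sketch with made-up names: the first archive puts its members at the top level, the second keeps everything under one directory.

```shell
# Sketch only: hypothetical names, run in a throwaway directory.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir foo
echo a > foo/a.txt
echo b > foo/b.txt

# Tarbomb: members sit at the top level and spill into wherever you extract.
( cd foo && tar -czf ../bomb.tar.gz * )

# Friendlier: archive the directory itself so everything unpacks under foo/.
tar -czf foo.tar.gz foo

tar -tzf bomb.tar.gz   # entries have no leading directory
tar -tzf foo.tar.gz    # every entry starts with foo/
```

Listing an unfamiliar archive with `tar -t` before extracting is a cheap way to find out which kind you have.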


I agree, but that's how the original was

I think OP's point is that instead of learning the options of tar, you can just learn the basics, and use unix composability to do the more complex things without learning more.
