The History of the Design of Unix’s Find Command (1995) (cat-v.org)
58 points by pmarin on Oct 2, 2015 | hide | past | favorite | 73 comments



Let's use ( ) for grouping, instead of [ ], so people have to escape them when using find through a Bourne-like shell: \( \). And what better terminating token for -exec than the semicolon!
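For anyone who hasn't run into it, a minimal sketch of the escaping being complained about here (the file names are throwaway):

```shell
# Scratch tree to demo escaped grouping
dir=$(mktemp -d)
touch "$dir/a.c" "$dir/b.h" "$dir/c.txt"

# \( \) group the two -name tests so -o binds as intended; unescaped,
# the shell would treat ( ) as a subshell and never pass them to find.
find "$dir" \( -name '*.c' -o -name '*.h' \) -print
```

This prints the paths of a.c and b.h (order unspecified) and skips c.txt.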


find was actually introduced several years before the Bourne shell, so that's a bit unfair. I assume that the syntax was less awkward when it was first released.

Not that I don't agree that it's a pain to use nowadays.


> WHAT idiot would program a command so that you have to say -print to print the output to the screen. What IDIOT would make a command like this and not have the output go to the screen by default.

You don't need to say -print. It's always printed by default for me, and the man pages (on both linux and OS X) confirm this.

> If the expression contains no actions other than -prune, -print is performed on all files for which the expression is true. [linux]

> If none of -exec, -ls, -print, -print0, or -ok is specified, the given expression shall be effectively replaced by ( given expression ) -print. [OS X]
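Both quoted rules amount to the same observable behavior; a quick sketch to check it on your own system (scratch path invented):

```shell
dir=$(mktemp -d)
touch "$dir/f"

# With no action given, the expression is implicitly wrapped as
# ( expression ) -print, so these two produce identical output:
find "$dir" -name f
find "$dir" -name f -print
```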

Has this only been added in the last 20 years?


I've spent the morning trying to find the answer, and I don't have one. But I figured I'd share what I did gather.

    [N] 1982 4.1BSD
    [?] 1992 POSIX.2 (IEEE Std 1003.2-1992)
    [Y] 1993 FreeBSD 1.0 (though it wasn't mentioned in the man page)
    [Y] 1994 GNU find 4.0
    [?] 2001 POSIX.1-2001 (IEEE Std 1003.1-2001)
    [Y] 2004 POSIX.1-2004 (IEEE Std 1003.1-2004)
Edit: Nevermind, I have an answer:

On 1990-04-17, Keith Bostic made a commit with the message "new version derived from Cimarron Taylor's", which introduced it to BSD between 4.3-Tahoe and 4.3-Reno.

Shortly after, on 1990-06-03, David MacKenzie added an entry to the ChangeLog that added it to GNU find 1.2.


I have written many cross-platform shell scripts over the years. The implicit -print behavior may have been introduced by the GNU implementation and then spread -- I don't know -- but on at least some older versions of UNIX you do indeed need to specify -print or nothing is printed.


Yes, I think that is true. Speaking from memory, but I've worked on Unix for a long time, and I think I remember that on some older Unix versions, you had to specify -print; otherwise it would walk the directory tree according to the conditions you specified (such as filename wildcards, the -o option (for OR), etc.), but would produce no output for that walk.


The -print option was not the default when I first used find (in v6 Unix, back in PDP-11 days). I don't recall what 4.2bsd did, which was the next version of Unix that I used.

I thought it was pretty brain-damaged, and that whoever decided that had decided poorly.

The author of the tool should have pushed back, IMHO. Then again, I don't know the political structure of the shop where find(1) was written, so maybe that was impossible.


Is unix shell one of those things you just have to suffer through for 3-6 weeks before "getting it"? Kind of like learning a musical instrument?

Also would anyone recommend learning how to use Windows PowerShell and again would that have the same kind of insane learning curve?

I'm just asking to make sure I'm not crazy and that command lines do in fact have a very high learning curve. Every time I attempt it, I fail because I don't know how to use these freaking flags and switches, and it's really hard to ask the computer how to use them. And that's if you know which command to use.


One major thing is that there is so much inconsistency.

Several posters here have commented on how find is a terrible tool with weird syntax. It's not actually terrible or weird, it's just different... because it pre-dates most everything else.

And it is hard to know how to ask the computer how to use the tools. Is the flag you need in `cmd -h` (traditional) or `cmd -help` (X11) or `cmd --help` (GNU) or `cmd --usage` (other GNU) or `man cmd` (UNIX; `man` is short for "manual") or `info cmd` (GNU)?

mstade said that "On any reasonable unix" the solution is `man cmd`. That's in general true, but apparently GNU/Linux isn't a reasonable unix then[1] (one may even say that it's not Unix :P ). Often (mostly for GNU commands), the man page is crappy (don't ever bother with the wget man page, `wget --help` is what you want) or non-existent. But it's a good place to start, then you can try working through the options I listed above... after some point you'll gain some intuition for which are the more likely ones to yield results based on who the publisher of the software is.

untothebreach's suggestion to bookmark the man pages for a system that has consistently good man pages is a good one, even if they aren't 100% compatible with the version of the command you have. He recommended OpenBSD, which is a good choice. When I was new, I leaned pretty heavily on the FreeBSD man pages https://www.freebsd.org/cgi/man.cgi .

[1]: Plug: I actually wrote a blog post once on why documentation on GNU/Linux sucks. https://lukeshu.com/blog/poor-system-documentation.html


To make `info cmd` more like `man cmd` do:

    info --subnodes "$@"  | less


Thank you for this one, that's new to me.


GNU/Linux's main help system is "info". And it sucks. At least its interface does.

"pinfo" is much, MUCH better. And it has "hjkl".


Drop "Linux" from that sentence. Texinfo is /the/ GNU documentation system.

The `info(1)` command does have a terrible interface; it's a minimal clone in C of Emacs' `info-mode`. Which is a terrible place to be--non-Emacs users hate it, while Emacs users just use `info-mode`, which is head and shoulders above `info(1)`.

Fortunately, texinfo also generates HTML, PDF, and a dozen other formats, such that it's available in an acceptable format for just about everyone. But, no-one's aware of that because all of the `--help` messages just say "type `info cmd` for the full manual."


I was saddened by the lack of consistency under Linux. I hoped that projects like http://docopt.org/ would help reduce work for developers and users in one go.


I hope this doesn't come across as a "RTFM" comment, because I don't intend it to be, but I have found it helpful to get into the habit of reading man files when I need to know how to use a command. The quality of the man files on your system might not be the best, so I would suggest bookmarking the man pages from the OpenBSD project[1]. It might not always be directly applicable to your system, since some commands are different between linux, osx, and *bsd, but no matter what platform you are on, the OpenBSD man pages are very high quality, and at the very least might point you in the right direction.

1: http://nixdoc.net/man-pages/OpenBSD/


I would really recommend against using the *bsd man pages if you run a gnu system; the differences will drive you madder than the similarities will help, except with common tools (e.g. bash).


Adding to this comment, I recommend the bro (http://bropages.org/) command, which is a wiki-like set of man pages that simply tells you what the command does at a high level, with a few practical examples.

man pages are great for the folks who are used to that type of prose and those who want to get to the bottom of a command. But I've almost always found the man pages unhelpful and sometimes even distracting (check the infamous sudoers manpage, which starts off with a 'Quick guide to EBNF').

So my advice to those who start on Linux is: if man pages aren't good for you, then Google! More often than not, the task you're trying to do is something that someone has already tried to do, and there will be an easier-to-understand description than the man page.


Do not ever try comparing "Linux" man pages with the OpenBSD ones, which are much better than even bro-pages.

A badly written man page, or worse, an absent one, is considered a bug in OpenBSD.


It's very easy for RTFM to sound dismissive, but after a while of being a young hacker, I realized it had a zen-like quality. The answer to life's hardest problems isn't found by asking others; it's found by searching inside yourself and Google.

It took me a while to understand, and I imagine others have had the same problem. Flow is important to programmers, as is self-reliance. Interrupting others for questions is generally a bad idea for both you and them. At the same time, telling others not to interrupt can be discouraging and dismissive, so it's a fine line to walk to tell people who are learning to go away. You must do so in a supportive manner, and there's no simple solution to that. Any one solution you take only lasts for a limited time before it sounds trite and dismissive.


Many thousands of people learned it before you, so it can't be that hard, and you have it way easier than most of them since you have all the world's information at your Googletips. Like everything, you poke around trying different things until something works, and then you do it again. The key is to not be afraid of the damn thing or blame yourself for being too stupid. Everything is stupid; we just all muddle through the best we can.


There are many thousands of neurosurgeons and physics doctorates, so there's that, but learning bit by bit with Google won't ever beat picking up the right book.


Just like I don't think any number of books will ever compensate for a poor ability to teach yourself.


Of course books do compensate for that. Try picking mushrooms by self-teaching and personal feedback versus picking them with a catalog of the edible and inedible ones.


Nah. You can get comfortable with the Unix shell in a week.

First things first: you have to know what's out there in order to find out how to use it. Go to a book store (they still have those, right?) and pick up a 50-page Unix command line reference. Read it cover to cover. Play with each individual command in some way.

Note: this should be a UNIX reference, and not Linux. You want something ancient and decrepit, because it'll be the most universal and simple set of commands to start with.

Get another book, something like an Introduction to Linux System Administration. Besides telling you all about the system itself, this will task you with running commands and modifying different parts of the system; you'll be doing lots of command-line work, all following the book's instructions.

Finally, attempt to write basic shell scripts and run commands with a basic user interface, like lynx or mc. This will not only teach you how the shell interprets everything you type into it, it will give you an insight into the other ways the console works with the terminal, applications, i/o, etc.

It may sound like a lot for a week, but if you spend a couple hours a night on these you can't go wrong. When in doubt, buy another book.


On any reasonable unix, you should be able to simply type `man cmd` where `cmd` is the command you're looking to learn more about. You can try this with `man man` for instance.

Learning how to operate a shell isn't very difficult, but to really become proficient takes some time. It's also important to distinguish between the shell and the operating system it runs on. Some utilities, for instance, may have the same name but slightly different flags or modes of operation, which can throw you off. You'll learn these things along the way, and mostly it doesn't matter unless you find yourself having to deal with many different unix systems (such as when developing general-purpose shell scripts).

I'd certainly argue it's worth learning how to use a shell, if you find yourself having to drop in to one relatively often, or if you're just somewhat interested. There are plenty of resources online, but I'm partial to tldp's guides[1].

I couldn't say much about PowerShell, I know next to nothing about it. I also have the benefit of not having to touch Windows systems much, and when I do I usually drop into git bash or something similar anyway. I'd say the answer to that question might lie in what systems you usually work in, and which ones you want to work in. If you have little interest in unix based systems, then by all means go forth and Windows up, but otherwise I'd suggest learning more about unix shells and specifically: bash. Why bash? It may be the javascript of unix shells – there are certainly more (and less) capable shells out there – but it has the benefit of ubiquity. You can find bash pretty much anywhere. (And with git bash and the likes, even on Windows!)

[1]: http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html


> Also would anyone recommend learning how to use Windows PowerShell and again would that have the same kind of insane learning curve?

I do have to use Powershell at work so I am familiar with it. But you can't seriously claim that Powershell is an example of good design. Its language is as inconsistent as Bash, even with the benefit of hindsight.

PowerShell is only helpful with the flags and switches if you are calling other PowerShell scripts or .NET libraries. If you are calling standard executables, you are back to square one. Which happens to be bash's more common use case.


PowerShell is miles ahead of bash, sh, ksh, zsh etc when it comes to consistency. For instance:

* All of the cmdlets follow the same verb-noun pattern.

* There are only 40 or so standardized verbs, with guidelines for what they should be used for and how to choose the nouns.

* Parameter names are also standardized

* Parameters are parsed by the shell - not individual commands, which means that all cmdlets take parameters the same way, no strange differences like between dd, ls, find etc.

* There's a set of standard parameters supplied by the shell that is common to all cmdlets, and which has the same function on all cmdlets (-ErrorAction, -WarningAction, etc).

* Cmdlets are more single-purpose than typical unix commands. Take for instance unix "find". Why can a find command delete files and why can it execute other commands??

* PowerShell supplies scripting and expressions, meaning that there is no need for tools such as jq, awk, find, xargs, etc., each with its own expression syntax.


Thanks for the write-up, this is very informative.


These days I often feel that Google + Stack Overflow are the real textual UI to a computer, and the command line is a kind of bytecode.

As with any assembler, you can get pretty good at memorizing the commonly used parts, but often you'll just need to look it up.

Would it be possible to make a compiler for this: a "natural language Google search query to bash script" tool to automate the process that everybody does anyway...?


No, probably more on the scale of months. In my case it was years, but I was an intermittent and resentful user.

I'd say ignore the people who recommend man pages; they're pretty much the worst form of documentation out there. The first thing they show you is a cartesian product of all the switches, which can do nothing but confuse you. After that, they list the switches in alphabetical order, which is useful only as a reference after you already understand the command.

It's true, you can search in the man page, but you have to already know the technical term for what you're searching for. Your question may be "How can I do X with Y command?", which man pages often don't help with.

They are only reference pages for those who already know the commands.

Better ways to learn are to search the internet, specifically the awesome unix.stackexchange.com, and best of all, work with someone who knows more than you and wants to help you. At my workplace, there are several Linux experts, and we regularly chat about ways to solve our problems in the shell.

Once you learn them, you will be able to do very, very powerful things that would otherwise take special commands in special apps, which you have to download, install, and perhaps purchase. Such as grepping an entire codebase for a string, as a small example.

Learning a unix shell is like learning a programming language that works on pipes. There is a hump to get over, and once you do, you will be able to express precisely what you want to with your computer. As far as you're concerned, it's the language of the operating system, allowing you absolute control.

Because of my awesome co-workers, I now work daily, productively, in the shell. I use cygwin on windows to develop on remote and virtualized unix environments, and the experience is seamless.


Check out this book: The Linux Command Line.

http://shop.oreilly.com/product/9781593273897.do

There's a review of it by me on that page, under the Reviews link. I think that review (which I spent some time and effort writing) will give you a fairly good overview of the topics you (or anyone) needs to learn to become good at using Linux (or other Unixes) at a user level.

There are also tons of other resources for it on the net, including many at http://tldp.org - The Linux Documentation Project. Some of those may feel more advanced, so you may want to start with the book above, or any other similar book, and then move on to the TLDP and other tutorials on the net.


I used a different O'Reilly book, Essential System Administration or something like that. It had a chapter on the shell and some useful commands to use in shell scripts. Also, the SysV printed documentation had a very nice book about the other commands that you might use, like roff as a word processor.


Yes, Essential System Administration is good. I've read some of it. But I think it is more oriented toward sysadmins, while the Linux Command Line book is for beginners, who may or may not go on to become sysadmins.

Aside: One of the good points the ESA book makes, IMO, is that when doing sysadmin tasks, try to make your changes reversible, i.e. you should be able to go back to a previous state (of a config file, or whatever). That can save you from a hell of a lot of trouble later.


It does take some time to be comfortable at any shell. That applies to WIMP ones too, but you've probably gone through that phase already.

That said, you should be able to do the basics fast, and you'll need time just to expand into the less common corners. Also, practice will make you comfortable, and indeed faster, but it is not a requisite for using it correctly. The learning curve is steep, but there's no barrier anywhere, and no leap in understanding (if you can already code).

As people will keep telling you, the manual is your friend.


I have found the `find` command and its API to be the hardest to learn. 3+ years on a *NIX machine, spending close to 70% of my time on the command line when not browsing, and yet I am still not comfortable with `find`.


Personal Experience: Implement a command line stack machine like an RPN calculator and it'll click. Every argument normally just pops the next arg to determine its state. I purposely made myself use find for about 6 months and I still struggle with flags sometimes, especially when chaining more complex expressions.

Find is nirvana with SSDs. I can throw find commands at the root, -exec'ing out to grep, on my 512GB SSD very quickly.

Lastly, cygwin + find is still faster than Windows Explorer search in my experience, especially if you have an SSD. This is a great way to start teaching yourself the unix command line, also.


Not sure if I'm doing anything wrong, but I find cygwin/msys find very slow on my Windows SSD. I mean, I still generally use it just because I'm familiar with it, but if I'm looking for a file in a huge directory I look up whatever flags I need for dir (dir /s something something).


Its start up is incredibly slow and its operations are slower than native, but then you aren't doing native things.

The argument commonly used is productivity vs speed of operation. Even slow file systems are faster than I can humanly perceive, more often than not.


I combine "find ./" and grep instead of learning how to use find. (I'm a terrible person.)


For all I know you may indeed be a terrible person ;), but this isn't why. The composability of Unix commands is probably its greatest strength.


-name and -iname are the options to know.

But yeah, I think that's the default novice way of using find. I cannot agree with you and the article hard enough - I have an intense, irrational aversion to using 'find' that's absolutely incomparable to my feelings on any other Unix tool. I really can't think of any other utility which needs so many inane arguments to achieve a basic level of functionality. I even hate cracking open the manpages on it - I'll try to bring myself to do it, and then I decide "you know what, fuck it, I'll grep for it".

This is where I throw in a quick plug for bropages. Best tool ever, cannot recommend strongly enough. It's the second thing I install on a new system, right after etckeeper.


I have found it helpful to learn small parts of the `find` API a bit at a time. For example, instead of `find ./ -print | grep "name_of_file"`, I use `find ./ -name "name_of_file"`, or `find ./ -name '*name_of_file*'` if you want to fuzzily search for a filename. You can replace `-name` with `-iname` if you want your search to be case-insensitive.


You mean that's not how you're supposed to use it?

It's such an utterly terrible tool and needs replacing bad.


    find . | grep 'abc'    ===  find . -name 'abc'
    find . | grep -i 'abc' ===  find . -iname 'abc'

Here, saved you a pipe :)

(Oh, and with -exec and -delete, not needing that pipe is incredibly useful. Find is an incredibly powerful command.)


Sorry if this is pedantic, but actually:

    find . | grep 'abc' === find . -name '*abc*'

    find . | grep -i 'abc' === find . -iname '*abc*'
The -name and -iname options do a verbatim file name check if you don't include those wild-card asterisks. I've been inconvenienced by having to go back and add them often enough that this is burned into my brain. :-)
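A quick scratch demo of the difference (file name invented):

```shell
dir=$(mktemp -d)
touch "$dir/my_abc_file"

# Verbatim match: nothing in the tree is named exactly "abc",
# so this prints nothing:
find "$dir" -name 'abc'

# With the asterisks it matches like the grep pipeline does:
find "$dir" -name '*abc*'
```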


I just read the original post (the story about someone writing to Dennis Ritchie) which this comment thread is about. Noticed a mention of find and cpio. So wanted to say:

find, in general, is quite a powerful command [1], and one of the ways of using it, is piping its output to cpio or other commands, to act upon the found filenames (which can include relative or full paths, depending on how you call find). There is also the -exec option to find, which can do more or less what I said above (piping find's output to other commands), but without using a pipeline.

Those are some good reasons why you might want to persevere and learn to use find, despite its awkward syntax. And it's not really that difficult.

[1] One such example is a command I used to use a lot: some variation on piping the output of find to cpio, using the -p option of cpio (for pass, IIRC). E.g.:

    $ find . -name some_wildcard -print | cpio -pdmuv dirname

which would copy an entire directory subtree from one place in the file system to another (or to a directory in a different file system). (Something like XCOPY ... /s /v /e in DOS.) And those cpio options can be used to do things like keeping the permissions of the target files the same as those of the source, or not, etc. Imagine how much time that can save (over having to manually change the permissions after the copy, particularly when there are many files). And that's just one example of the power of find (along with other Unix commands).


xargs is worse than find, I'd say, on account of its useless default mode of operation.

If it accepted newline-delimited input, it would cater for every non-pathological case, rather than (as it does now) failing miserably on many reasonable file names. (Of course, you have xargs -0 - but not all tools output appropriate data. And it accepts shell-style quoting - but do we really want that contagion to spread?)

The bizarre thing is that except for oddities like `echo * | xargs`, every tool I've ever met that outputs lists of file names outputs them with newline as the separator. And any that currently don't would be better off doing that anyway, I'd argue, since most Unix tools are line-oriented.
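As noted above, xargs -0 exists for when both ends cooperate; a sketch of the NUL-delimited pipeline, with throwaway file names (-print0 and -0 are in GNU findutils and the BSDs, though not in older POSIX):

```shell
dir=$(mktemp -d)
touch "$dir/plain.log" "$dir/has space.log"

# NUL cannot appear in a file name, so this survives blanks and newlines
find "$dir" -name '*.log' -print0 | xargs -0 rm

ls "$dir"   # both files gone
```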


Agreed. Posix xargs is bafflingly inane. GNU xargs, at least, supports `xargs -d '\n'` to regain some line-oriented sanity. I prefer to use GNU parallel these days, though, since it's line-oriented by default and a bit more ergonomic.


xargs is very powerful when you want to run one command with many arguments, but each argument is newline-delimited. That's basically all I use it for, and it saves me a lot of looping and subshell creation.

For example, when purging backups from Mercurial after a revert:

    find . -name *.orig | xargs rm


But that's exactly it - if any of the filenames have spaces, that will fail, since xargs will try to delete each part separately. From the man page:

Because Unix filenames can contain blanks and newlines, this default behaviour is often problematic; filenames containing blanks and/or newlines are incorrectly processed by xargs.
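The failure is easy to reproduce (scratch file name invented):

```shell
dir=$(mktemp -d)
touch "$dir/a b.orig"

# xargs splits on the blank, so rm is handed "$dir/a" and "b.orig",
# neither of which exists; the real file survives.
find "$dir" -name '*.orig' | xargs rm 2>/dev/null

ls "$dir"   # "a b.orig" is still there
```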


Excluding filenames containing newlines, I wonder how many weird corner cases this would reveal:

    ls | tr '\n' '\0' | xargs -0 rm
I've been annoyed by ls and xargs not playing together here and there, but the above only just occurred to me. Not sure if it's a good idea yet or not!
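One corner case it does reveal: a file name containing an embedded newline (legal on Unix) is mangled, because tr converts that newline to NUL too. A sketch, using printf to create the awkward name (GNU and BSD ls print raw names when output is piped):

```shell
dir=$(mktemp -d) && cd "$dir"
touch "$(printf 'bad\nname')" good

# tr also converts the newline *inside* the file name to NUL, so
# xargs sees two bogus names, "bad" and "name", plus "good".
ls | tr '\n' '\0' | xargs -0 rm 2>/dev/null

ls   # the newline-named file survives; "good" is gone
```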


find . -name '*.orig' -delete


Blatant plug: My pain at all these unnecessary hieroglyphics & man page work is why I built Crab, so I could use SQL to find things and an EXEC command to run commands on them.

etia.co.uk

Really it shouldn't have to be this hard.


Interesting that that's the direction you went with this.

I've always wished that mysql supported a more unix-like interface, where a few characters of composable interfaces would do, instead of word after word of semi-natural-language syntax.


Wordiness isn't really a problem, because I'm a fast typer and most systems have tab completion and history recall.

For me the benefit is that for lots of problems, whether databases or the filesystem, thinking in terms of sets feels really natural. I can't imagine why you'd want to pipe grep results to grep -v rather than say "and not".

I guess we all prefer the language whose syntax we know the best.


I remember seeing Crab a while back on HN and thinking I didn't see why you would want this, but when you mentioned thinking in sets it was way more clear. It may be because I don't use SQL often that it didn't click. Perhaps that would be a good thing to mention on the website.


Only just getting to grips with it over the past year, and even then I'm not doing anything in the way of advanced use. I have, however, found that knowing only a subset of its features has dramatically improved my efficiency at the terminal. Awk and sed have been a useful learn too - I need to check out gawk. Apparently if I'm trying to use awk and grep together (emphasis on trying), I should just use gawk.


What do you find hard to learn about find? I don't think it's particularly hard to learn, as Unix commands go. A bit more difficult than the average command, maybe. Of course not everything about it may be well designed (as some other comments here say), but that holds true to some extent for any command or language.


find has a worse reputation than it deserves. Yes, its learning curve is steep; when I was learning Unix, find was one of the commands that really stymied me. But like many powerful tools once you've learned it you can do some wonderful things with it.

Here's a random example: Prune a scratch directory of files older than one week.

    find /dir -type f -mtime +7 -delete

How would you do it without find? (Serious question. I know it can be done with other tools but since I can do it in one line with find I've never bothered to look for another solution.)


> How would you do it without find? (Serious question. I know it can be done with other tools but since I can do it in one line with find I've never bothered to look for another solution.)

With python you can do something like

    [os.unlink(file) for file in [os.path.join(*path_parts) for l in [list(zip(repeat(root),files)) for root,dirname,files in os.walk('/tmp') if len(files) != 0] for path_parts in l] if datetime.fromtimestamp(os.stat(file).st_mtime) < datetime.now() - timedelta(days=7)]
...I suspect there might be slightly more elegant solutions too


Woof. That is one python program I would not call "executable pseudocode."


A more idiomatic version is more readable, but also more verbose:

    cutoff_date = datetime.now() - timedelta(days=7)
    for basepath,_,files in os.walk('/tmp'):
        for filename in files:
            filepath = os.path.join(basepath, filename) 
            if datetime.fromtimestamp(os.stat(filepath).st_mtime) < cutoff_date:
                os.unlink(filepath) 
Python is not really that great when forced to oneliner format, and I'm not really a good code golfer anyways.


In PowerShell:

    ls /dir -re | ? LastWriteTime -lt (Get-Date).AddDays(-7) | del
i.e.

1: enumerate the files recursively (ls is alias for Get-ChildItem)

2: pipe through a filter (? is alias for Where-Object) that selects only files written to more than 7 days ago

3: pipe those files to the Remove-Item command (del is alias for Remove-Item)


OS X users: check out the mdfind command. It doesn't work on dot directories, but for quickly finding files, I try it first simply because the command syntax is so much easier for me to remember.


I am on OS X now, but I remember using `locate` on Linux to achieve something similar to OS X's `mdfind`.


just tried it out - it caused iterm to crash


To be fair, iTerm probably caused iTerm to crash.


very true!


This doesn't explain the history of `find`. Saying 'a specification made us do it' is not an explanation, it is passing the buck.


If you want to take a lesson from this, try your best to fight bad specifications, and if you are in a position to accept or not accept code with a bad interface, don't accept it.

Bad code lives forever and find has been a curse on *nixers ever since.


"-exec ls -l {} \;"

"'{} \;' ?? Whaaaa?"

"Hush child; just type the incantation, and all will be well."
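For the uninitiated, roughly what the incantation means (scratch files invented for the demo):

```shell
dir=$(mktemp -d)
touch "$dir/f1" "$dir/f2"

# {} is replaced by each matched path; \; terminates -exec's argument
# list (escaped so the shell doesn't swallow the semicolon).
# This runs one ls per file:
find "$dir" -type f -exec ls -l {} \;

# With + instead of \; the paths are batched into as few ls
# invocations as possible (standardized, though very old finds lack it):
find "$dir" -type f -exec ls -l {} +
```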


I love Unix shell, but I gotta admit, the find command is kind of a mind fück.


http://www.templeos.org/Wb/Adam/Opt/Utils/Find.html

Must handle binary graphics in source code.



