Find Files (Ff) – File Search Utility in Rust (github.com)
76 points by pvsukale3 56 days ago | 78 comments

There's also fd[1], which is written in rust and is super fast.

[1] https://github.com/sharkdp/fd

I just did a little test. fd seems to be about 3 times as fast.

  Command      | Mean [ms]     | Min…Max [ms]
  `fd .conf /` | 525.6 ± 17.7  | 505.3…552.0
  `ff .conf /` | 1760.8 ± 10.2 | 1744.6…1775.3

Looking at the code, I believe it's because fd uses a loop while ff uses function recursion to walk the path tree. Function recursion is easier to write, but it definitely has some overhead.

My experience is that if you need a stack anyway (which you do, in order to walk directories recursively), it's faster to use the built-in call stack than to implement your own.
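To make the tradeoff concrete, here's a minimal sketch of my own (not code from either tool) of an iterative walk where a plain `Vec` plays the role that the call stack plays in the recursive version:

```rust
use std::fs;
use std::path::PathBuf;

// Walk a directory tree iteratively: an explicit Vec serves as the
// stack that function recursion would otherwise maintain implicitly.
fn walk(root: PathBuf) -> Vec<PathBuf> {
    let mut results = Vec::new();
    let mut stack = vec![root];
    while let Some(dir) = stack.pop() {
        let entries = match fs::read_dir(&dir) {
            Ok(e) => e,
            Err(_) => continue, // unreadable directory: skip it
        };
        for entry in entries.flatten() {
            let path = entry.path();
            if path.is_dir() {
                stack.push(path.clone());
            }
            results.push(path);
        }
    }
    results
}
```

The recursive variant is the same logic with `walk` calling itself per subdirectory; either way the work per entry is dominated by the `read_dir` syscalls, not the bookkeeping.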

Further, it seems that this would be I/O-bound, not CPU-bound, so whatever small overhead recursion has over iteration won't be noticeable.

I think the difference is rather that `fd` uses threads and `ff` does not.

> `fd` uses threads and `ff` does not.

Author here. This is true; ff does not use threads right now. I started learning Rust just a few days ago, so I am not yet familiar with advanced topics such as achieving parallelism with threads. My knowledge of Rust is limited at this moment, and I struggled with language concepts such as ownership and lifetimes. I am sure I will be able to improve ff's performance to some extent as I learn more Rust.

UPDATE: Now ff also uses threads to achieve parallelism.
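For readers curious what parallelism could look like here, below is one hedged sketch (my illustration, not ff's actual implementation): spawn a scoped thread per top-level subdirectory and merge results through a channel.

```rust
use std::fs;
use std::path::PathBuf;
use std::sync::mpsc;
use std::thread;

// Hypothetical parallel walk: one scoped thread per top-level
// subdirectory, with results merged through an mpsc channel.
fn parallel_walk(root: PathBuf) -> Vec<PathBuf> {
    let top: Vec<PathBuf> = match fs::read_dir(&root) {
        Ok(entries) => entries.flatten().map(|e| e.path()).collect(),
        Err(_) => return Vec::new(),
    };
    let (tx, rx) = mpsc::channel();
    thread::scope(|s| {
        for start in top {
            let tx = tx.clone();
            s.spawn(move || {
                // Each thread walks its own subtree iteratively.
                let mut stack = vec![start];
                while let Some(path) = stack.pop() {
                    if path.is_dir() {
                        if let Ok(entries) = fs::read_dir(&path) {
                            stack.extend(entries.flatten().map(|e| e.path()));
                        }
                    }
                    let _ = tx.send(path);
                }
            });
        }
    });
    drop(tx); // close the channel so the receiver's iterator ends
    rx.into_iter().collect()
}
```

Real tools like fd use a work-stealing scheme instead, so the load stays balanced even when one subtree is much deeper than the others.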

Author here. Slow behavior while searching through a large number of files is a known issue at this moment. I will definitely try to reduce it to some extent by streaming results immediately to STDOUT (or to a non-TTY device when STDOUT is not a TTY). Right now, the results are accumulated, filtered, and then printed in one go.
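A sketch of what that streaming fix could look like (my own illustration, not ff's code): lock and buffer STDOUT once up front, then write each match as soon as the walk produces it.

```rust
use std::io::{self, BufWriter, Write};
use std::path::Path;

// Write a single match immediately instead of accumulating it.
fn print_match<W: Write>(out: &mut W, path: &Path) -> io::Result<()> {
    writeln!(out, "{}", path.display())
}

// At the call site: lock and buffer STDOUT once, stream every match,
// and flush at the end. Per-line locking is a common hidden cost
// when printing many results.
fn stream_results<'a, I>(matches: I) -> io::Result<()>
where
    I: IntoIterator<Item = &'a Path>,
{
    let stdout = io::stdout();
    let mut out = BufWriter::new(stdout.lock());
    for path in matches {
        print_match(&mut out, path)?;
    }
    out.flush()
}
```

In real code the paths fed to `stream_results` would come from the directory walk; besides lowering peak memory, this gets the first result on screen while the walk is still running.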

Z (fast shell jump) and ripgrep + FZF are among my favorite niceties. Not essential, but so much more joyful to use.

Yes, I use fd, which is fast and great, along with 'rg' for grepping; both by the same author, written in Rust.

hope someone can produce a C version of them for fun

fd wasn't written by me. :-)

The C version of ripgrep is the thing that came before ripgrep, The Silver Searcher.

First, I have a problem with rewriting common utilities (like find, which also supports more search functionality than regex) while changing flags (e.g. `i` to ignore case becomes `s`; incidentally, grep also accepts `i` to ignore case). It makes the system harder to explore interactively, particularly for beginners.

Second, the utility already exists, and this version is significantly slower:

  > time ./ff '.*.cpp' ~/src/ > /dev/null
  real 0m1.195s
  user 0m0.488s
  sys 0m0.671s

  > time find -E ~/src/ -type f -regex '.*.cpp'  > /dev/null

  real 0m0.880s
  user 0m0.407s
  sys 0m0.425s

But wait! This is a totally unnatural use of find. If we use find like users actually do, we get even better performance:

  time find ~/src/ -name '*.cpp'  > /dev/null

  real 0m0.553s
  user 0m0.181s
  sys 0m0.363s

Hardly scientific. But it raises the question: who is this for?

tbh find's command line interface is extremely unnatural, and that alone would justify a switch for me. Ofc, there's no reason the program itself needs to be reimplemented beyond its argument parser, but that doesn't concern me much as long as it's still fast enough.

In ff v0.1.4, the results are dramatically improved.

  > ff '.*.js' ~/projects > /dev/null   
  7.59s user 
  24.21s system 
  337% cpu 
  9.413 total

In ff v0.1.3, the slowness is reduced to some extent.

  > ff '.*.js' ~/projects > /dev/null  
  5.79s user 
  14.77s system 
  97% cpu 
  21.024 total

  > find -E ~/projects -type f -regex '.*.js' > /dev/null  
  3.98s user 
  10.76s system 
  67% cpu 
  21.933 total

Here is a bash/zsh port:

alias ff='find . | egrep'

I hate myself for never thinking of an alias whenever I use a pattern 23 times an hour. Thanks

Or maybe use the built-in

  ff() { find . -regex ".*$1.*" "${@:2}"; };

This will show files where any part of the path matches, not just the leaf name.

`find . -type f -name "$1"` should give you that and potentially faster than GP since you don't need to do any I/O to a second process.

The above is faster because it streams the results as it walks the directory tree.

Why do "convert everything to rust" people appear to prefer solving already solved problems instead of focusing on problems we could not solve but are now possible thanks to Rust?

Apart from the aspect that you're new to a language and just want to learn something, you might also end up making an improved version.

ripgrep is a Rust version of grep. It's the fastest grep I know and it has better Unicode support than GNU grep (because of Rust language decisions).

Looking at memory safety and portability it might also be a good idea to program them in Rust. There are whole classes of bugs that are excluded in Rust which still pop up every once in a while in some of our basic system tools.

I guess porting something you already know to a new language is one less problem to solve.

"If people did not reinvent the wheel we would be still rolling around on wooden logs"

I for one am very happy with tools such as ripgrep and fd, which are more or less just rewrites of decades-old programs but are much faster and generally have better thought-out interfaces.

>"If people did not reinvent the wheel we would be still rolling around on wooden logs"

Or doing this:


Besides the fact that these are great exercises, often enough they also demonstrate solutions that are now possible thanks to Rust.

e.g. ripgrep is faster and makes better use of available hardware (concurrency, SIMD) than grep – thanks to Rust.

>they also demonstrate solutions that are now possible thanks to Rust

This seems specious. If I started a grep implementation from scratch in C{,++}, it isn't clear to me that I couldn't achieve the same (or better) results as ripgrep. Comparing a totally rewritten program to one designed decades ago and concluding 'without Rust, we wouldn't have this faster program with better hardware use' doesn't seem quite right.

You are right. You could. I've said as much numerous times in the past, but unfortunately this doesn't stop folks from saying that the performance difference is directly attributable to Rust. The performance difference is primarily attributable to algorithms.

I have, though, said in the past that I wouldn't have written ripgrep without Rust, which is a very different claim and is opinionated, and mostly a statement about developer productivity and maintenance burden. With Rust, it is very easy to keep ripgrep's trains running across Windows, macOS and Linux.

Have you used these command line utilities yet? They're great!

Exa, ripgrep, fd, bat, broot, ruplacer

And more...

I'd recommend lsd over exa; I also use it to replace tree. I hadn't heard of broot and ruplacer before, but they seem cool.

I've got this in my .zshrc:

  alias ls='lsd'
  alias tree='lsd --tree'

> lsd

Tangent, but man is that a bad name for a CLI tool. Had to add a lot of keywords to not get drugs-related pages.

"!gh lsd"

Though yes, many computer-thing names are bad in the search department.

Would you know what their differences are? At first sight, neither exa's nor lsd's README.md mentions that (except that lsd is slightly faster according to their bench).

Hey, broot is great!

And I'm totally impartial, being the author.

Joking apart, as I've been told broot was being mentioned here, I'm available if you have any questions. I'll do a Show HN post some day, but I'm currently postponing it because of a bunch of new features in the making.

Reference: https://github.com/Canop/broot

I did try rip "grep".

    rg "^SECRET" 
Find nothing. Try to contact someone who is asleep for half an hour, then they tell me they put it in there.

    grep -r "^SECRET" .
yep, it is there, in an .env file. "Uhh yeah sorry I was using ripgrep and it doesn't search hidden files by default but it has a flag for it" "you wut?" "nevermind, thanks bye." <- that is ripgrep. Maybe we got off on the wrong foot.

Anyway, thanks for listing all the tools; you're great.

It does search hidden files, it doesn't search gitignored files. Why are you assuming it's identical to grep?

From https://github.com/BurntSushi/ripgrep

> Like other tools specialized to code search, ripgrep defaults to recursive directory search and won't search files ignored by your .gitignore files. It also ignores hidden and binary files by default.

Have you installed it?

    nurettin@ ~ () $ mkdir test
    nurettin@ ~ () $ echo "TEST SOMETHING" > test/.env
    nurettin@ ~ () $ cd test
    nurettin@ ~/test () $ rg "TEST"

Why? I don't know, because it has grep in the name? Do you now realize where the confusion might lie?

This is part of the whole point of ripgrep. If you don't want it to do smart filtering, then either alias `rg="rg -uu"` or just use grep if it's fast enough for you.

Some people might miss this, yes. That's why it's mentioned in the first couple sentences of the man page.

RTFM is an important thing we remind each other of, in order to reach for our own knowledge instead of relying upon others to give it to us.

And yes, this most important bit of information is in the man page and the help text, but not at the top of the page where people who come to actually download the tool will look first, which is `https://github.com/burntsushi/ripgrep`.

Sure. I'm just trying to justify the actual behavior of ripgrep (which you are poo-pooing) by saying that it is at the very heart of what ripgrep is about. It's not some weird corner case that's supposed to bite you. It is its raison d'etre.

The README mentions that it respects gitignore rules. I just updated it to include hidden/binary files. Thanks.

nurettin 56 days ago [flagged]

Poopoo sounds really funny. Haha.

Still pretty sure grep is a confusing name for something which doesn't grep "because of smart filtering" <- that is just your rules, your way, whatever your heart desires, whatever you want to call "smart".

This is not about Rust anymore (which I actually like)

This is a point you seem to be ignoring in favor of opinions such as "this is the reason for this tool to exist" <- again, your way, your rules.

It is circular to name something after something else, then make it act totally differently, however you want it to: changing the output format, filtering files, then responding to confused people with "hey, this is what I wanted." Yes it is. It doesn't change a thing.

Thanks for the readme update.

Making tools whose names end in "grep" that don't behave exactly the same way as grep itself is almost as old as grep. Every Unix-like system has egrep and fgrep, for instance (in fact, they're usually symlinks to the same executable as grep itself, and I at least pretty much always use egrep in preference to grep). There's an approximate-search tool called agrep that's also decades old and pretty widely used.

ripgrep is no more "totally different" from grep than egrep (different regex syntax) and fgrep (not regexes at all) are.

If I named it something totally different, people might say, "why didn't you just call it a grep, because that's basically what it is." Language and perspective is funny like that.

> whatever you want to call "smart".

Related: https://github.com/BurntSushi/ripgrep/blob/master/FAQ.md#pos...

I like Rust, and I generally like this trend of rewrites, but I will have to support systems with old grep for a long time yet.

So in cases where the new "improved" tool departs from the old standard [1], it just adds one more thing that can screw you, especially if it comes over as a replacement.

grepping for settings in dot files is not exactly a corner case for me.

And setting up aliases is similar, to me. It will just mess with you on systems where you don't have them set up.

I guess it depends on what your primary use case is. I can see how a developer who spends most of his time on his desktop/laptop command line would find this a good default.

[1] And that it does so silently. If something fails hard, or errs on the side of not filtering, it's less of an issue.

The FAQ question I linked in my previous comment directly addresses most of your complaining: https://github.com/BurntSushi/ripgrep/blob/master/FAQ.md#pos... --- In particular, note the statements about what it means to "replace" grep.

> So in cases where the new "improved" tool departs from old standard [1], it just add one more thing, that can screw you, especially if it comes over as a replacement.

The solution to this is so stupidly simple. If this is a problem for you, then _don't use the new tool_.

ripgrep is not a replacement for POSIX grep.

Repeat after me. ripgrep is not a replacement for POSIX grep. I've never once marketed it as such. Others might. Why? See the FAQ for an explanation.

> grepping for settings in dot files is not exactly a corner case for me.

It's not for me either. Why does everything have to be black and white? Just because it isn't covered by default doesn't mean anyone thinks it's a corner case. It is precisely because it is not a corner case that ripgrep makes it very easy to disable "smart" filtering.

And once again, of course, if you do not want to be bothered by this, then don't use the tool.

I read that FAQ before I made my previous comment, and I don't see how it really changes anything I wrote.

I don't expect ripgrep to change, or to cater to my needs, like you said I don't have to use it. I managed to figure that out on my own, but thanks anyway.

Perhaps I expressed myself poorly, or we are misunderstanding each other, but it doesn't really matter.

Thanks for writing ripgrep. I might not use the utility, but when I started learning Rust, ripgrep's source was one of the sources I looked at as a nontrivial CLI app.

Have a nice day.

The tool is called rip grep. You could see it as a way of saying "RIP grep". The tool isn't called grep. The binary isn't called grep. It therefore isn't grep.

For people who alias grep to rg: rg has three compatibility modes, -u, -uu and -uuu; see the man page for details.

Guess we should all just stop doing anything then.

If you’re learning a language, reimplementing something that’s already been solved is a fairly standard learning exercise.

Consider GNU AWK. Part of AWK is regular expression compiling. GNU AWK periodically copy-pastes regular expression handling from GNU grep instead of depending on it like a library. It's ridiculous.

never written 'hello world' before? got to learn the tool before you can solve previously unsolved problems

I don't think there is anything that Rust can do technically that wasn't possible before. I think its benefits are more about implicit organization.

It depends on exactly what you mean. In some sense, yes. It's still all assembly at the end. But, there's some claims about what's realistically possible, rather than theoretically possible. For example, Stylo was attempted in C++ twice, and both times, it failed. The complexity was too much, and there were too many bugs. The Rust attempt succeeded on the first try. shared_ptr's implementation in gcc's libstdc++ uses a heuristic to try and determine if it should use atomics or not. In my understanding, this was done because they were really worried about bugs when people chose the wrong one. Rust has a clean, compile-time checked separation between Arc<T> and Rc<T>.
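A small self-contained example of that compile-time separation (the thread closure and vector contents here are made up purely for illustration):

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

// Arc<T> uses atomic reference counts and may cross threads; Rc<T>
// does not, and the compiler rejects any attempt to send it to one.
fn demo() -> usize {
    let shared = Arc::new(vec![1, 2, 3]);
    let handle = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || shared.len())
    };

    let local = Rc::new(vec![1, 2, 3]);
    // The equivalent line with Rc fails to compile, because
    // `Rc<Vec<i32>>` cannot be sent between threads safely:
    // thread::spawn(move || local.len());
    drop(local);

    handle.join().unwrap()
}
```

Choosing the wrong one is a compile error rather than a runtime heuristic, which is the contrast with the shared_ptr situation described above.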

These kinds of claims are much more interesting to me personally.

Here[1] is the detailed case study of rewriting Stylo in Rust which Steve mentioned in the above comment.

[1] https://hacks.mozilla.org/2019/02/rewriting-a-browser-compon...

I'm not sure an anecdote about a software project failing twice and succeeding on a third try can be used as evidence that the language of the third try made something pragmatically possible. I'm sure Rust helped, but I'm also very confident that I could architect even a large-scale project in C++ well enough that it would remain modular and progress could remain steady.

The team said it, not me. (I realize I didn't make that clear before.) There was a talk at a Rust conference about it in detail.

Modularity was not the issue. Complexity combined with concurrency was. Rust's compile-time checks were what made it feasible to be so aggressive without the bugs.

I was already assuming that concurrency was a big part of that statement. I respect Rust's attempt to solve these issues at the language level, but my experience has led me to think that they need to be solved at the architectural level of a program.

As far as I can tell, concurrency is fundamentally about grouping data by dependencies to find chunks that can be isolated. I think that resources are more about lifetimes and ownership and that resources and data aren't the same thing. This doesn't contradict what Rust is doing, it is more about relying on the language less.

You can see what I'm talking about in this link, though I don't think it has been digestible enough yet without video or examples.



because people want to learn and practice...

Someone already opened an issue to make it ignore files specified in `.gitignore`. `fd` is doing it, `rg` is doing it, and now someone wants this to behave the same. I don't know. Maybe I have some really strange expectations, but when I want to find a file and nothing comes up as a result, I want to be sure nothing is there. And I want to be sure without remembering a bunch of command line options or setting up aliases everywhere I log in because, guess what, maybe they invented some other file with patterns to ignore. Is it only me?

That's the great thing about these tools: there is no need for them to serve as fundamental system tools conforming to some ancient spec like POSIX. There is no need because those tools already exist and work well. So if that's the behavior you want, then just use those tools. They work exceptionally well when called by other programs, like shell scripts, because they have reasonably standardized behavior.

But with these tools, we have the freedom to revisit old assumptions. One of those is smart filtering. I can't tell you how many people tell me that they enjoy that as a good default.

Tools should make the fact that they do smart filtering very clear, because it can be surprising if you aren't expecting it. But otherwise, I'm not particularly sympathetic to your reasoning here because: 1) what I said above about standard tools being available and 2) these tools should have ways of disabling smart filtering. So if it bites you, it should bite you only once. After that, you either learn to like it, learn to setup an alias and forget about it, or ragequit the tool.

99.99% of the time you don't want to search files that are in .gitignore. Ignoring them by default is the sane default, as it reduces the number of false positives. If you do want to search through all files, there's probably a flag for that. You can alias your commands with `alias tool="tool --whatever-flag"`.

I got bitten recently by haproxy using the '!' prefix on .gitignore lines to negate previous lines. This resulted in ag silently not searching any of the code base at all. So if you're implementing .gitignore support, you may want to consider at least bailing on files with this syntax (and perhaps other syntax that is not supported).

ag bug: https://github.com/ggreer/the_silver_searcher/issues/1233
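To make the negation semantics concrete, here's a deliberately tiny matcher sketch of mine (literal names and a trailing `*` only; real gitignore handling, as in git or ag, covers far more syntax):

```rust
// Toy .gitignore-style matcher, just to illustrate '!' negation:
// the last matching pattern wins, and a leading '!' re-includes a
// path that an earlier pattern excluded.
fn is_ignored(patterns: &[&str], name: &str) -> bool {
    let mut ignored = false;
    for pat in patterns {
        let (negated, p) = match pat.strip_prefix('!') {
            Some(rest) => (true, rest),
            None => (false, *pat),
        };
        // Support only literal names and a trailing '*' wildcard here.
        let matched = if let Some(prefix) = p.strip_suffix('*') {
            name.starts_with(prefix)
        } else {
            p == name
        };
        if matched {
            ignored = !negated;
        }
    }
    ignored
}
```

A tool that treats every line as a plain exclude pattern gets the `!` case exactly backwards, which is how a whole code base can silently drop out of the search.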

This is no longer true. `ff` searches through all files recursively by default no matter whether a file or directory is enlisted in ".gitignore" or is hidden. There are flags to change this behavior.

How does this compare to fd, which provides similar functionality and is also written in Rust?

I have answered a similar question here - https://github.com/vishaltelangre/ff/issues/4.

Here's a simple version of a file-find command utility, written in D, with minimal features; it may be useful for beginners to D:

Command line D utility - find files matching a pattern under a directory:


The D stdlib's dirEntries() function makes the job easy, for a basic one like this.

Also, related, a bunch of D command-line utilities, again may be good for beginners to learn a bit from:


What does this tool provide that I can't get from a wrapper around gnu find?

What does GNU find provide that I can't get from a wrapper around Perl?

- It frees you from writing the wrapper (duh)

- It probably has more niceties than a 5-minute wrapper (colors, friendly defaults, etc.)

I've personally used a homemade find wrapper for years, but have now replaced it with `fd` (another find reimplementation). The advantages aren't huge, but I find the niceties I never bothered implementing, well, nice :)

That being said: I did get burned by the default behavior of ignoring hidden files :D

Would've been nice if, by default, it didn't show hidden matches but printed a notice on stderr, something like:

    $ fd thing
    >searching hidden files - ^C to abort<
    >hidden files found (29) - use `--hidden` to show<

If nothing else - ability to easily compile for native Windows and work as expected (with no Cygwin/GnuWin32 fun).

It might also be valuable to projects like Redox.

probably nothing, is that a problem?

Not a problem, but it's still a valid question. I'm all for people writing alternatives to existing tools; I've done so myself multiple times. But when you start promoting your alternative, "so what does it do better?" is a very reasonable question for other people to ask. After all, they are the ones who need to invest time in learning the new tool, and perhaps add it as a dependency in their toolchain/workflow, with all the downsides that has.

The readme states it is ~3 to 5 times faster. And it seems to have a more beginner-friendly syntax.

I am not a person who just has to use the newest and greatest for the sake of it, but I'd take ripgrep/rg over grep whenever I have the choice.

That’s fun! I needed something like this and wrote my own a while ago. I chose to implement it with a different algorithm instead of regex, though.

