It would be nice to have a collection of modern/faster/saner alternatives to common unix tools and similar utilities (made in rust or not) that becomes standard -- or at least easily installable in linux distros and OS X. Not to replace those tools, but to supplement them.
I appreciate all these new incarnations of old school tools but end up sticking with the basics regardless. When I'm sshing into a box to figure out what's going on, my toolset is mostly limited to top, find, xargs, awk, grep, df, du etc. Even for local development now, I'm mostly debugging in a docker container running Alpine or Ubuntu.
Knowing the right incantations is useful in that context and keeps me from installing better tools until they become part of the distro we deploy with.
> When I'm sshing into a box to figure out what's going on, my toolset is mostly limited to top, find, xargs, awk, grep, df, du etc.
I wrote some of the tools on coldtea's list, and I'm pretty much the same way in that I stick to distro tooling in vanilla setups. But I spend enough time on my local workstation that having "better" tools (read: creature comforts) is worth it there! But yeah, if you spend most of your time in vanilla setups, then tools not in the standard distro repos are a hard sell.
What's strange to me is that some of these new incarnations use different syntax and/or flags. If those were the same, you'd get the advantage of migrators not having to relearn anything. It would also allow for the beauty of aliasing the new tool to the old name!
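For the flags that do overlap, something like this already works today -- a sketch, not a guarantee, since neither exa nor ripgrep accepts every flag the originals do:

    # only sensible where the new tool accepts the flags you actually use;
    # neither exa nor ripgrep is a complete drop-in for the original
    alias ls='exa'
    alias grep='rg'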
There is a very simple response to this: if ripgrep were a 100% drop-in replacement for something like GNU grep, then there would be basically no point for it to exist in the first place. The whole point is that it is very similar but does some things differently that a lot of people consider "better." I imagine this is truish for the other tools mentioned in this thread as well. (And still other tools, like xsv, have no pre-existing standard. Standard UNIX tooling doesn't work on CSV.)
Have you considered making a "grep mode" for ripgrep? If it's triggered by the name of the binary, it could provide both the traditional flags as well as its own.
A couple of people have asked for it, but the ROI isn't worth it. Think about all the various POSIX compatibilities, locale support, subtle differences in flag meanings, supporting the different flavors of regex (BREs, EREs) and of course, all the various differences in the regex engine itself, such as leftmost-first vs leftmost-longest matching, and subcapture match locations as well.
I feel like people significantly underestimate the difficulty of being 100% compatible with another tool. Not even GNU grep is strictly POSIX compatible, for example. To get POSIX compatibility, you need to set the POSIXLY_CORRECT environment variable.
And then what do I gain from this? Do you think distros are going to start throwing away GNU grep for ripgrep? No, I don't think so. So what then? A few people will have the pleasure of using ripgrep with grep's flags, even though they are already significantly similar? Not. Worth. It.
Now... If someone does think it's worth it, then they are more than welcome to start that journey. Most of ripgrep is factored out into libraries, so in theory there is a lot of reuse possible. But if you want to support compatibility all the way down into the regex engine, then you might be hosed.
RMS' original convention for requesting GNU tools be compatible with questionable legacy behaviour was POSIX_ME_HARDER, but he toned it down as a gesture of goodwill to the other members of the POSIX committee.
> if ripgrep were a 100% drop-in replacement for something like GNU grep, then there would be basically no point for it to exist in the first place.
That seems to miss a huge lesson: backward compatibility eases transition, new features retain users.
Therefore, your assertion only makes sense if there were no point in attracting existing users to a tool whose added value is entirely irrelevant.
It doesn't miss any lesson. It's a trade off. The headline feature involves a default mode of operation that's fundamentally incompatible with GNU grep. Full stop.
ripgrep is an evolution on ag, which in turn is an evolution on ack. All three tools have similar defaults, so in fact, ripgrep preserves some amount of backward compatibility with previously established tools in terms of the default mode of operation. Just not GNU grep.
This goes further in that the intersection between ripgrep's features/flags and GNU grep's features is quite large---certainly much larger than the differences between them, which is just another form of preserving backward compatibility. This was done on purpose for exactly the lesson you're claiming I missed: backward compatibility eases transition.
(The context of this conversation was a 100% backward compatible version of ripgrep with GNU grep. See my other comments on ROI. Just because I can argue against 100% backward compatibility doesn't mean I've missed the importance of backward compatibility.)
I agree with you, the reason these new tools are useful is they’re much more intuitive than the esoteric flags that have built up over the last 30 years.
The people in charge of the distro. Governance structures vary. And sometimes it's not just about who gets to decide, but also, the technical work required to do it. People need to put in the work to make Rust applications packageable according to the distro's standards. YMMV.
This. And at some point enough of these alternatives might accumulate that a new distro will form using only the new tools. The issue is that most people don't realize just how entrenched the existing tools are. They are part of cross-distro standards because oftentimes they are used for scripting. Imagine a distro that decided not to ship ls in favor of exa. On its own this isn't a problem at all. But combined with the overwhelming amount of FOSS you might want to install on top of your distro, you would suddenly need ls anyway.
Debian in particular has an entire alternatives system. Nothing stops somebody from creating the packages `nice_tools` and `select_nice_tools` that will install those tools and make them the default binaries for the commands.
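A rough sketch of how that could be wired up (the package names above are hypothetical, and the link path here is just one way to avoid clobbering the dpkg-owned /usr/bin/grep):

    # hypothetical postinst step for a "select_nice_tools" package:
    # register rg as an alternative implementation of "grep" with priority 50
    update-alternatives --install /usr/local/bin/grep grep /usr/bin/rg 50
    # let the admin pick interactively among the registered alternatives
    update-alternatives --config grep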
>I appreciate all these new incarnations of old school tools but end up sticking with the basics regardless. When I'm sshing into a box to figure out what's going on, my toolset is mostly limited to top, find, xargs, awk, grep, df, du etc.
That's why I wrote about it being nice to have them all in distros -- perhaps in a single package like coreutils.
Then wherever you are, they'd be just a "package-manager install rustutils" away.
Generally, if one is not an admin of a random hodgepodge of systems (e.g. in big enterprises), then you are:
(1) doing work on your own workstation
(2) administering machines whose provisioning you control (e.g. a person administering a startup's cloud servers)
(3) doing development on some kind of vm
In those cases you can quite easily install a bundle package and make sure those tools are there.
I, for one, don't go out into unknown systems day after day -- even if we have 100s of servers we manage, we DO manage them.
Just my experience, I'm sure there are other types of organizations. In most of my workplaces, having one developer's preferred tools installed as a build step might not pass review. I would be annoyed if fish was installed on all our prod boxes, for example.
I'm glad these tools work for you and I hope the community keeps making them, I just threw in my perspective.
That was just a hyperbolic example, but I can think of a few superficial reasons:
1. Because you end up with a more complex Dockerfile. I've seen curls, clones etc during build - having a remote call for random tools in our build _feels_ like a bad practice.
2. There is also the issue that a fleet of application servers will have different tooling available, which can be frustrating in a pinch. I've done some gnarly things across many hosts, expecting a common set of tools. At AWS, we were responsible for non-nuclear team's services during oncall and would ssh to their boxes. Logging into a box with a different shell would be annoying.
[2] > for i in `cat hosts.txt`; do ssh $i '...'; done
Wondering the same, especially since fish doesn't have to be the default login shell; it's all set per user. Even so, they can just switch to it once they log on.
'fish' I take it is a shell (of which I remain ignorant). I suppose that at the very least, it increases the attack surface of an installation and either makes security harder or the system less secure.
I think I wasn’t clear, but a lot of people are harping on the specific use case I glibly mentioned.
To answer you, I meant in a Docker image build step -- which is where you'd want to install the tool so it's accessible to people debugging the application in the container run context.
A potential solution that would let you have your cake (SSH into wherever) and eat it too (have your favorite tools on hand): use a package manager on your hosts that allows safe installation of per-user packages. That is, installing a package should have absolutely no effect on the system outside the user's environment; a logical implication is that it should be possible to use multiple versions of each package, both inter- and intra-user. Per-user package installation should also not require special privileges (i.e. you shouldn't need to be root if, per the previous safety requirement, it has no effect on the rest of the system). Then you could log in and do your work with the best tools you know, rather than being constrained to what your distro (or your servers' admins) deems useful enough to be installed by default.
It strikes me as an odd choice to accept the least common denominator of tools. As a technologist, that sounds frustrating. And as someone in leadership, I can’t imagine hamstringing my team by making them work within an environment that won’t let them use the best tools for the task at hand.
What surprises me is that even after 25 years of the Internet, we cannot bring in new tools into our environment (be it a remote box I ssh into or a trimmed down docker image) on the fly.
The popular package management tools rely heavily on FHS and like to install stuff into directories that require root permission.
Imagine if there was a tool that could download binaries of modern tools from a certain repo and install them to our ~/bin. Imagine if we could use new command line tools as easily as we can securely download a fully functional, complex application into a browser tab just by typing its URL!
You can. There's nothing stopping you from adding ~/bin to $PATH. Many compilers allow you to specify the destination location either via a ./configure flag (for example) or an environment variable. Or there is always the option of doing a `make build` and then manually copying the binaries into your ~/bin directory without doing a `make install`.
You can also add ~/lib (for example) to your LD_LIBRARY_PATH if you wanted to install custom libraries, and even have your own man page path too.
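Concretely, that per-user setup boils down to a few lines in your shell profile plus building into a user prefix (the paths are just the conventional choices):

    # ~/.profile -- per-user prefix, no root required
    export PATH="$HOME/bin:$PATH"
    export LD_LIBRARY_PATH="$HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export MANPATH="$HOME/share/man:"   # trailing colon keeps the system man path

    # build into the user prefix instead of /usr/local
    ./configure --prefix="$HOME" && make && make install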
All of this is already possible on Linux and Unix. However, I'm not sure it's something you actively want to encourage, as allowing users to install whatever they want on servers lowers the security to the level of the worst user on that server. If it was really deemed necessary that users should be able to install whatever they want, then I'd sooner roll out a dedicated VM per user so at least their damage is self-contained (barring any visible networked infrastructure).
>You can. There's nothing stopping you from adding ~/bin to $PATH. Many compilers allow you to specify the destination location either via a ./configure flag (for example) or an environment variable.
Of course you can, technically. The parent laments that it's not easier to achieve. Already you're talking about manually building, for example. Where's a package manager that will allow for that too, not just the central repo? (Not just asking if such a manager exists in some form, asking where it is in modern popular distros.)
>All of this is already possible on Linux and Unix. However, I'm not sure it's something you actively want to encourage, as allowing users to install whatever they want on servers lowers the security to the level of the worst user on that server.
Which also touches on the parent's question. Why is it not easier AND safer? It's not like we don't have security models that allow for such things...
This is precisely what I am forced to do, and I do this very often when I have to set up my environment on a remote box I don't have root login to. But this is by no means a trivial process compared to how easily we load a complex app into a browser tab just by clicking a URL.
I think it is fair to hope that after 25 years of Internet, it should be easy to bring in new tools from the Internet into our local environment without requiring root access in a safe and trivial manner.
Firstly, the internet is a lot older than 25 years (perhaps you mean the web?) and secondly, if the last 25 years have taught us anything, it's that allowing users to download and install whatever they want usually leads to problems. I mean, I do get your point and even sympathise with it, to a point. But if someone needs SSH access and I don't trust them enough to put them in the sudoers file, then I'm not going to trust them enough to even choose what software to install and run from their own local user area. I've been burnt before by intelligent people doing stupid things because they thought they knew what they were doing. So if you're not a sysadmin and you're logging into a shared host, then you don't get install rights. I don't think that's an unreasonable stance to take.
Nix can now download and install binaries into a home directory? I thought it would require quite a bit of work in the build process and binary patching post-installation, since, you know, a lot of things require absolute paths, like an ELF interpreter for example.
The problem is that after 25 years of the Internet, we have almost completely stopped using multi-user systems.
It even used to be easier to make stuff run from your homedir. We are moving away from it, not towards it. It is really a shame for the few multi-user setups out there.
It would be even nicer to have an open source project that pushes a set of these tools, which could then get picked up by the distros such that in 5 years you can expect them all to be included at a reasonable version, so you do not have to install them by hand on each machine that you land on.
On the other hand, you may want to keep some smart dotfiles around that install them in ~/bin.
It is definitely faster. The ripgrep author did extremely extensive benchmarking against a lot of tools. For the most common use case check out this section, but the rest of the article is very much worth a read as well: http://blog.burntsushi.net/ripgrep/#subtitles-literal
One could argue that a tool like this makes no sense since make is installed almost everywhere, but I'm already using the fish shell, so one more non-standard tool makes no difference. I appreciate the simplicity of "just".
httpie is my go-to HTTP CLI for all systems. Pretty much Postman (which a lot of people use for no reason) but without the annoying UI, and on a single line. It formats JSON and colors it right in the terminal as well, has a super nice query, header and post body syntax, and works well with sessions and cookies.
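A quick taste of that syntax (the URL and field names are placeholders): `=` pairs become JSON body fields, `Header:value` sets headers, and `==` pairs become query-string parameters.

    # POST a JSON body with a custom header and a query parameter
    http POST https://api.example.com/users name=Alice role=admin \
        X-Api-Token:secret page==2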
> Pretty much Postman (which a lot of people use for no reason)
I can't see anywhere in the httpie docs about capturing variables from and/or running tests on the HTTP responses - it's an extremely powerful feature of Postman when you're debugging non-trivial HTTP flows.
That's basically what's happening. AFAIK, ack kinda started it all by observing, "hey, we frequently have very large directories/files that we don't actually want to search, so let's not by default." Depending on what you're searching, that can be a huge optimization! Tools like `git grep` and the silver searcher took it a step further and actually used your specific configuration for which files were relevant or not. (Remember, we're in best guess territory. Just because something is in your .gitignore doesn't mean you never want to search it. But it's a fine approximation.)
Was this a thing back when the BSD and GNU greps were being developed? Was it common for people to have huge directory trees (e.g., `node_modules` or `.git`) lying around that were causing significant search slow downs? Not sure, but it seems to have taken a while for it to become a popular default!
There are of course other tricks, and most of those are inspired by changes in hardware or instruction sets. For example, back in the day, a correctly implemented Boyer Moore was critical to avoid quadratic behavior by skipping over parts of the input. But that's lessish important today because SIMD routines are so fast that it makes sense to optimize your search loop to spend as much time in SIMD routines as possible. This might change how you approach your substring search algorithm!
... so what's my point? My point is that while sometimes optimizations are classic in the sense that you just need to change your heuristics as hardware updates, other times the optimization is in the way the tool is used. Maybe you can squeeze the former into existing tools, but you're probably not going to make much progress with the latter.
I remember when I first put out ripgrep. Someone asked me why I didn't "just" contribute the changes back into GNU grep. There are a lot of reasons why I didn't, but numero uno is that the question itself is a complete non-starter because it would imply breaking the way grep works.
>”hey, we frequently have very large directories/files that we don't actually want to search, so let's not by default”
>Was this a thing back when the BSD and GNU greps were being developed? Was it common for people to have huge directory trees (e.g., `node_modules` or `.git`) lying around that were causing significant search slow downs? Not sure,
Sure, actually. It was called "build" and it was moved out of the source code hierarchy so as not to interfere with tools. Pretty clever, and you don't have to patch every tool each time a new build path appears. This should lead us to some conclusion, but I cannot figure out which. Do you?
Those who don't understand history are destined to repeat it?
Also, that approach seems dependent on all projects you might want to grep (in this example) building cleanly into an external directory, which is naturally never going to be 100%: some people don't know why other software supports those options, some people don't care, some people think that's the wrong solution to the problem, etc. Ultimately someone comes along and builds up a big enough body of experience that they can account for and fix some fraction of the brain-dead behavior out in the wild, and the rest of us get a useful tool.
I'm afraid I don't quite see the purpose of una. There are really only two command lines you have to remember. `7z x` for anything.7z and anything.rar. And `tar xf` for anything.tar.anything.
grep came, then ack-grep, then silver-searcher and a few others, finally ripgrep. For most purposes, ripgrep (the one written in Rust) is the best of all of these tools, typically being the fastest and also the most Unicode-compliant, with the main downside being its non-deterministic ordering of output (because it searches files in parallel).
The only reason for using ack these days is if you’re already invested in a Perl workflow and can use some of the Perl magic powers, and don’t care about comparatively poor performance. That’s not many people.
What do you mean by "invested in a Perl workflow"? Do you mean "comfortable with Perl regexes"?
Other reasons that you may want to use ack:
* You want to define your own filetypes
* You want to define your own output using Perl regexes
* You need the portability of an uncompiled tool in a single source file that you can drop into your ~/bin when you can't install or compile anything on the box.
* You don't want to have to deal with a Rust runtime.
Thanks for the clarifications. I was under the impression that Rust required a lot of tough infrastructure.
I'm glad to see that ripgrep lets you define your own types. That ripgrep does it with globbing vs. the ways that ack allows is just another of the differences between the tools that people have to decide about.
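For reference, the glob-based way ripgrep does it looks roughly like this (the type name and globs here are just made up):

    # define a "web" type from a glob, then restrict the search to it
    rg --type-add 'web:*.{html,css,js}' -tweb 'TODO'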
I had real trouble getting ack to work when I first started using it. On Linux I had some trouble, I can't remember what, but on Windows it was downright hard to get it to work properly. When setting up a new computer once, I switched to ag because I was unable to get ack working on Windows.
I believe that defining the output using Perl regexes was what I was referring to with “Perl magic powers”. I work at FastMail, the backend of which is mostly Perl, and have been unable to convince most people to use ripgrep because they like Perl and ack and occasionally use some of that fancy superpower. I, on the other hand, lack that experience (I’m most used to Python, Rust and JavaScript, and have never seriously worked in Perl) and thus don’t really get anything out of ack over ripgrep, without putting in a fair bit of effort to figure out what I need to do in a particular case. I’m more likely to feed the output through sed or vim if I want to rewrite it, actually!
I’m very glad that ack existed; it led the way with better tools, and I used it heavily for some years. Thanks for making it!
> they like Perl and ack and occasionally use some of that fancy superpower. I, on the other hand, lack that experience
One of the things I'm working on right now is a cookbook that will ship with ack 3, currently about thisclose to going to beta. I want to show examples of the magic powers and see how powerful it can be. Your comment reinforces in my mind that it's an important thing to have.
> I’m very glad that ack existed; it led the way with better tools, and I used it heavily for some years. Thanks for making it!
You're very welcome. I'm continually pleased to see what's come out after it. A search tool arms race is a good thing.
You're very welcome. I'm glad it's made things easier. git grep, ag, rg, whatever: Just use whatever makes you happiest.
It's interesting how git grep makes the ack feature of "ignore your VCS dirs" not so important any more. I wonder how the grep-alike tool world would look if git weren't so dominant, and/or git didn't include its own grep.
ack still shines for one niche application of mine: highlighting one or more regexes in a log file, thanks to its unique `--passthru` option AND its support for PCRE regexes.
So I can have a regex like 'abc\K(def)(?=ghi)'. That will highlight ONLY 'def' in the text, but only if that string is preceded by 'abc' and followed by 'ghi'.
- ripgrep cannot do that as it does not support PCRE
ag supports --passthrough, but I haven't yet felt the need to try that, as ack --passthru has been baked into my workflow and aliases for ages. A sketch of the invocation I mean is below.
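Something along these lines (the log file name is hypothetical):

    # print every line of the log, but highlight only the 'def' that sits
    # between 'abc' and 'ghi' (\K and the lookahead need PCRE)
    tail -f app.log | ack --passthru --color 'abc\K(def)(?=ghi)'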
Thanks. That's good to know, though it's impossible for rg to do something like this at the moment: highlight different regexes in different colors. Note that support for \K from PCRE is really crucial for this.
I'm sorry to be something of a negative Nancy - and I'm sure I'm a corner case - but this is not really an alternative to find. It's an alternative for your editor's fuzzy find and a better version of shell globs.
The absence of -delete, -execdir, -mtime, and of the ability to chain multiple patterns together for non-trivial use cases, means this is practically useless in most places where `find` is used in day-to-day work. Not to mention the "opinionated" choice to ignore a dynamic set of files and directories (what files were skipped? In any directory, or just git repos? Does it backtrack to find a git directory and .gitignore? Does it query out to git? Does it respect user/global .gitignore settings? Does it skip Windows hidden files from a unix prompt, or a dotted file when on Windows?), or the options hidden behind two separate flags.
Perhaps it's just because I'm used to using 'find', but when I reach for it, it's because I need -delete or -execdir, or I'm writing automation that really needs -mtime and chained patterns.
So, I would suggest that you don't call this an alternative to find; it's not. A replacement for shell globs, sure. A replacement for poor file finding in text editors, OK. Just... not find.
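For concreteness, this is the kind of day-to-day invocation I have in mind -- chained patterns plus an action -- with made-up paths and ages:

    # delete logs and temp files older than 30 days, in one pass
    find /var/tmp -mtime +30 \( -name '*.log' -o -name '*.tmp' \) -delete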
> So, I would suggest that you don't call this an alternative to find; it's not. A replacement for shell globs, sure. A replacement for poor file finding in text editors, OK. Just... not find.
It's literally called "a simple [..] alternative to find" and the very first sentence in the README says that "it does not seek to mirror all of find's powerful functionality". I'm not sure anymore how far I have to back off until everyone will be okay with my "advertising" :-)
> The absence of -delete, -execdir, -mtime, and of the ability to chain multiple patterns together for non-trivial use cases, means this is practically useless in most places where `find` is used in day-to-day work
Yeah except in most places where `find` is used in day-to-day work it is just for `find -iname 'foo'`. Anything fancier is probably less than 10% of uses. Here is my .bash_history:
> The absence of -delete, -execdir, -mtime, and of the ability to chain multiple patterns together for non-trivial use cases, means this is practically useless in most places where `find` is used in day-to-day work.
I completely agree with that.
And I would in fact go slightly further: one of my huge annoyances with `find` is that I can never get alternative (-o) non-trivial patterns to work correctly. I think there's room for an alternative tool here, but it should learn from set-theoretic APIs and make non-trivial predicates easy to write and test. Something like mercurial revsets — using fileset patterns and predicates instead of revs, obviously — would be a godsend.
> Not to mention the "opinionated" choice to ignore a dynamic set of files and directories
That has bitten me a few times when using `rg`. Sometimes (but not often enough to remember the switch) I want to grep a lot of binary .so files for a string and am left wondering why nothing was found, before just replacing rg with grep. Just calling `TOOL PATTERN` without switches is the most intuitive thing, and there I agree with `fd` making `fd PATTERN` the default. Unlike BSD find, which always needs at least a `.` to function.
> chained patterns.
The -or/-and are indeed nice, but the -prune logic should get an update. Maybe when fd grows from 80 to 90% of use cases.
Here grep's default behavior is what I expected: I want to know which binary files contain some symbol; the details are less important (yes, I could use objdump or smarter tools, but grep is good enough in 99% of cases).
rg -u should list the matching binary file, maybe rg -uu should walk a few (printable!) chars to the left and right of the binary match and -uuu should just assume it is text, i.e. the user is right and the auto-guess wrong.
That's an interesting use case. You could use the `-l` flag to only list file names that match in combination with `-a` I suppose. That wouldn't let you see the non-binary matched lines though.
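i.e., something like this (the symbol and directory are just examples):

    # list the binary files under lib/ that contain the symbol,
    # without dumping raw binary lines to the terminal
    rg -la 'some_symbol' lib/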
In any case, I filed an issue[1]. I think the UX is pretty tricky to get right though.
I'm with you in that I will probably not use this (the auto .gitignore reading freaks me out). Another elitist aside: I can usually tell that I won't want to use a tool when it has a mac os install option first under the downloads section on its page.
That said, the good thing about open source is you don't have to use it. I imagine the opinionated choices they made are popular amongst that crowd, who seem like the types who mainly use the terminal to code -- hence the .gitignore awareness; so I imagine it's ok for others to use it if they find it makes them more productive for what they do. But the good news is fd won't become a default anytime soon for that reason, so I don't have to worry about this affecting me.
> Another elitist aside: I can usually tell that I won't want to use a tool when it has a mac os install option first under the downloads section on its page.
This is only because it's so much harder to support every possible Linux distro. I actually don't use macOS on my own, but 'Homebrew' seems like a really nice package manager. (Also, the first install option is via "cargo" which is platform independent.)
Quick FYI: xargs is not the same as execdir; execdir executes the command in the same directory as the file. And to account for spaces in filenames (yay, OSX), your command starts getting a bit more complicated: fd -0 .py | xargs -0 stat
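For comparison, the -execdir form (GNU find shown here) runs the command from each file's own directory:

    # run stat with each file's directory as the working directory,
    # batching files per directory
    find . -name '*.py' -execdir stat {} +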
Yes, I'll agree with the people who say that this isn't a drop-in replacement (fewer features and different syntax), but I also just don't care. fd's defaults are brilliantly simple and just plain make sense. Yes, I can use find, but I always, always end up looking up an example. fd was simple enough that I very quickly figured it out by trial and error. And it really is fast.
Thank you sharkdp -- really nicely done. Appreciated.
I'm curious why you (or anyone) uses -exec? It's often painfully slower than piping through xargs and I find the syntax (the semicolon and braces and the required quoting/escaping) uncomfortable. I haven't used -exec in 20 years.
In my opinion, xargs has really niche use cases, and -exec is way more fit for general use.
- There's the issue of command line length, pipe too many files and xargs will just fail.
- There's predictability. With -exec you get to see the entire command line structure, what is escaped, what is not, what names go where. With xargs, a couple of badly named files and you are out of luck.
- Find's syntax is just more general. xargs is limited to pushing parameters into a script, while -exec can run whatever script you want, with single or multiple parameters, or flags, or whatever (rough comparison sketched below).
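A rough side-by-side of what I mean (the filenames and command are placeholders; both forms handle odd filenames, but only because of the -print0/-0 dance on the xargs side):

    # -exec: the whole command line is visible, no quoting surprises
    find . -name '*.log' -exec gzip {} \;
    # batch the arguments (roughly what xargs does for you):
    find . -name '*.log' -exec gzip {} +

    # xargs: needs the -print0/-0 handshake to stay safe with odd filenames
    find . -name '*.log' -print0 | xargs -0 gzip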
> It's often painfully slower than piping through xargs
I've tried it. It has different semantics. First attempt always leads to errors and doing the wrong things. I always have to come up with some workaround to make things work as I expect they would (i.e. how -exec does things by default).
This lack of confidence also means I would never dare using xargs for anything remotely destructive (like rm).
What reason do I have not to use -exec? Why swap a perfectly good hammer for a broken screwdriver?
Besides: IMO the timesaving aspects of find are not strictly tied to how fast find runs. They're tied to what I can automate.
YMMV? Having used find -exec for 20+ years, I find it much more difficult to convince myself to switch to tools like this, and just simpler to use -exec. That, and the extra spawn - and potential for disaster - that putting a | into the mix can incur.
What about commands which only take a single file as input? Is xargs -n1 noticeably faster than -exec? (And if it is, is that actually worth it for clarity's sake?)
xargs does have limits on input length. On some platforms they aren't terribly high. Sounds like -exec is potentially coming though.
Edit: Swore I had gotten ARG_MAX complaints out of xargs in the past. Replies are correct though. I'm wrong here...xargs adjusts on the fly. Perhaps long ago, or some obscure use of xargs?
Why is this a problem? Except in terms of performance, it's completely invisible to the user, as xargs will run the command multiple times to ensure all the arguments are processed. In the most extreme/unrealistic case, if xargs could only support invoking the command with a single argument, then the performance would regress to that of using -exec. Otherwise it's much faster.
Some versions of xargs don't respect the shell's max argument length, so on older platforms (MacOS X 10.4 sticks out in my memory) it's easy to cause xargs to try invoking with a longer command line than permitted.
This is fixed with the `-n` argument to xargs, which lets you specify the number of arguments per invocation.
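A quick illustration of the batching (echo standing in for the real command):

    # xargs groups arguments into batches; -n caps the batch size
    printf '%s\n' a b c d e | xargs -n 2 echo
    # prints:
    #   a b
    #   c d
    #   e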
Any plan to provide find compatibility for a drop-in replacement? My find/locate usage patterns are not worth learning a new sophisticated tool for a 5-second speedup once a week, and I suspect that I'm not alone. This is the main problem with new tools — not repeating the old, familiar style makes them unattractive for non-aggressive users.
Side question: why is speed important? Which tools use fd to gain required performance?
No, fd does not aim to be a drop-in replacement for find. It was actually designed as a "user-friendly" (I know, I know...) alternative to find for most of the use-cases: "fd pattern" vs. "find -iname 'pattern'", smart case, etc. Also, in my view, the colored output is not just a way to make it look fancy, but actually helps a lot when scanning a large output in the terminal.
The speed is not hugely important to me, but I wanted it to be at least comparably fast to find (that's why I decided to try out Rust for this project). Otherwise, nobody would even consider using it.
I know that it will not be for everyone and I also know that it's a lot to ask someone to learn a new tool, but I'm happy if there are some people that enjoy using it. I certainly do ;-)
You can also add the --back option as an alias, so you don't need to type it in every time on systems where that is important. You could also use an environment variable to modify the behavior.
The GNU team did this when they added or changed behavior to the original tools. A lot of them have flags like -posix and such which give them a strict set of behaviors.
I have encountered edge cases with pipes and xargs. If there was something that made it simpler, I'd be all for it. This project aims to make find usable and simpler, so why not?
I see the speed as a secondary concern relative to the simplified interface. This tool looks like it covers about 99% of my usage of find with a much simpler and terser syntax, and I'm looking forward to trying it.
The other big advantage is being `more user-friendly`. Hence no backward compatibility. No doubt some people are used to the old patterns, but lots of people also prefer saner defaults.
IMHO, there is no demand for this kind of "improved tools collection". In the programming world, the programmer creates a sweet working environment to fit his needs by writing many aliases and functions and by customizing his desktop. He complains that the tools collection is complex, but the collection has no opportunity to progress because he rarely uses the tools directly. In the "sysop" world, the user is always logged in remotely on varying machines. There is no incentive to install and customize a machine because he will quickly have to work on another machine without this customization.
I think we need a saner alternative to the man pages of the classic unix tools collection. Something less focused on the seldom-used options and more focused on the traps to avoid and the useful tricks. The poor quality of the documentation is more annoying than the incongruous syntax of the tools.
For me, the speed of find and grep is essentially never a concern, so I do not find these speedy alternatives sufficient to compensate for their non-universality. And when the speed of grep and find does become a concern, I would think the issues are somewhere else -- like having tens of thousands of jpg files scattered everywhere that you need to find in the first place. When the issues are somewhere else, replacing the tool with a better one only hides the problem and makes it worse.
100% of the time I consider using find, what I want is "dir/s foo". Step 1 is decide whether it's worth re-looking-up how to do it. I rarely remember--sometimes I think the man page will be helpful. It's not. ("man find" mentions "-name", which gives me an "illegal option" error. Maybe it's "--name"? Nope. Maybe it's "-n"? Nope. Maybe it's "-f"? Oooh, you have to do path and then add in options, unless those options are -H, -L, -P, -E, -X, -d, -s, or -x, in which case they go before the path? Of course, how silly of me.)
"fd filename" does exactly what I want, and what I'd expect. Gave me what I wanted on the first try.
For my purposes, find is fully replaced, with something actually usable. Small, simple utils are nifty.
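i.e., the difference for my one use case is roughly this (BSD find shown, since that's where I kept tripping up; "foo" is just a placeholder name):

    # what I always have to look up (BSD find wants the path first):
    find . -name '*foo*'
    # what fd gives me on the first try:
    fd foo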
Thank you for the feedback! Most of the credit for the speed goes to the amazing Rust crates 'ignore' and 'regex', which are also used by ripgrep (https://github.com/BurntSushi/ripgrep).
Did you do a benchmark with `--threads 1`? I suspect most of the speedup is coming from that, as the listing of the files is likely to be IO bound (or at least bound by calling into the kernel to read the directory listings).
That has certainly been my experience in the past when experimenting with this sort of thing, that more threads makes a lot of difference.
I did. You are right, multi-threading does not give a linear speed up, but it makes fd about a factor of three faster on my machine (8 virtual cores). With `--threads 1`, fd is on-par with 'find -iname', but still faster than 'find -iregex'.
fd is about a factor of 2-3 faster (for this example):
> time fd -HIe jpg > /dev/null
5,12s user 2,03s system 785% cpu 0,911 total
> time ag --depth 100 -uuu -g '\.jpg$' > /dev/null
0,95s user 1,66s system 99% cpu 2,628 total
I've found that fzf has replaced all uses of find for me except in scripts. fzf has the benefits of being bound to a single key binding and showing me results as I type, rather than after the command runs.
You're right that fzf doesn't replace find. fzf is just an interactive filtering tool. You can pipe anything into it. find is a common way of populating it with data.
I suspect the GP meant that fzf replaced the pattern 'find |xargs $binary' for small interactive use cases. It's much nicer to do '$binary <invoke fzf>'. I use fzf the same way with key bindings to select directories and files powered by find.
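Something along these lines -- the actual key bindings vary by setup, so this is just the shape of the pipeline:

    # interactively pick a file and open it (fd and find are interchangeable here)
    vim "$(fd --type f | fzf)"
    vim "$(find . -type f | fzf)"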
Interesting. Why does multithreading make a big difference? I would have assumed that disk seek latency was the speed-limiting factor; what am I missing?
I should have been more clear, heh. There was a benchmark that didn't use a regex and seemed to turn off the ignore-files feature. I'm not familiar with Rust or those crates; I'm assuming that's what those are there for?
It's exciting to see so many problem solvers use Rust to improve core linux utilities. I've been using Exa as an 'ls' replacement for some time now. 'fd' looks promising but has room to grow, still. Finding a file and then acting upon it is an important feature that ought to be addressed.
The docs don't say: is this meant to replace Busybox or the actual GNU coreutils? As in, all GNU features and flags (which are generally much more comprehensive and IMO nicer than plain old POSIX features).
I do think it (optionally) provides a busybox-esque single-binary interface, but I do not know whether this also translates into a busybox-esque command-line interface.
Whoops, good point. In the long run I think Una's a good candidate for being installed at the systems level, because you really want to install unzip, etc. along with it.
EDIT: The idea being that if you install it with your system package manager it can have unzip etc. as dependencies so that's taken care of automatically.
zsh globs are about a factor of 5 slower (for this example):
> time fd -sIe jpg > /dev/null
1,24s user 0,77s system 758% cpu 0,265 total
> time ls ~/**/*.jpg > /dev/null
0,53s user 0,97s system 98% cpu 1,518 total
The benchmarks that are mentioned in the README are performed for a "warm cache", i.e. I'm running one of the tools first to fill the caches. Then, I'm performing multiple runs of each tool (using bench for some nice statistics) such that both tools profit from the warmed-up cache.
I also perform other benchmarks where I clear the caches. In this case, 'fd' is usually even faster.
For a narrow set of use cases with cold page caches, I have written ffcnt[0]. As the name says, I mostly use it to count how many million files have accumulated over time in a large directory tree. It could be extended to do regex matching on filenames a la find, though; I just never had the need for it.
I am not sure I find that a smart default, as I frequently exclude generated files from git repositories, and when I want to find something I do not know why I would not want to find such a file if my pattern matches it.
I would prefer to let the switch enable the .gitignore logic, but as I don't know the author's use case, theirs might be valid too.
Surely the common case is that you don't want to search generated files? Or at least I personally am almost never interested in generated files because they're not the ones I'm going to be changing/doing stuff with.
In a package manager I wouldn't expect to see it. On HN, especially as a Show HN, the audience has a much higher likelihood of being interested in digging into the code, perhaps even becoming a contributor. That makes the language used much more pertinent.
Rust is somewhat of a special case: I bet some people mention their software is written in Rust just to avoid a bunch of comments proposing to rewrite it in Rust.
Aye, that's because being written in Python, C or Go is - in my eyes - a strike against the project in an absolute sense, because it will be more buggy and difficult to maintain due to lacking the functional-programming abstractions and powerful static-type system of a good ML descendant such as Rust or OCaml. (Or of Haskell, but it tends to lack the practicality of the ML family.)
So of course one does not trumpet that one's project is written in Python, C or Go.
n.b. I said "absolute sense" because all of this is, of course, inapplicable when searching for libraries specifically for a language such as Python, C, or Go.
I don't buy it. People say, "X (written in Foo)" because there is fundamentally an interest in the fact that something is written in Foo (and this may indeed depend on the nature of X). Back when Go was new and shiny, it was the same situation.[1] :-)
> there is fundamentally an interest in the fact that something is written in Foo (and this may indeed depend on the nature of X).
That speaks to a deficit in human cognition, that slapping Foo's name on something makes it interesting only based on Foo's "shininess" rather than Foo's technical merits. Neophilia, perhaps?
Anyway. Go has been in roughly the same state of "another purposefully boring Java-like, but instead of generics it has a different shade of OO and a full gamut of integer machine types" for its entire life thus far, which is why I've never understood most of its hype.
Rust, on the other hand, actually has some serious motivation to its learning curve and its developing ecosystem.
I'd say there are two (positive) reasons for it, since it only really appears for newer languages that are just gaining popularity:
1. An improved tool was made, and presumably was made easier by the language somehow.
2. The tool serves as example of larger product written in that language
Both facilitate further usage of the language (by evangelism), show the language as being up-to-snuff (production-viable), and the existence of this codebase acts as an example for others within that language community.
It's not so much the language makes this tool better, but for that particular community, that the language was used is important, possibly more so than the tool itself.
> and presumably was made easier by the language somehow.
Right, and outside a few very narrow domains that are helped by the bolted-on concurrency, Go generally fails to deliver on those presumptions, for the same reasons that Java wouldn't if it were invented when Go was.
“Be civil. Don't say things you wouldn't say face-to-face. Don't be snarky. Comments should get more civil and substantive, not less, as a topic gets more divisive.”
Calling something bullshit is both unwarranted in this case and unfair to the person whose work you’re dismissing. They built something and gave it to you for free – if you’re not interested just move along.
> The fact that Rust is used is uninteresting, considering there's nothing special about it as it pertains to the other advertised features.
Considering the fact that a programming language's appeal is directly influenced by the size and accomplishments of its community, I'd say it's relevant.
A high level, systems programming, relative new language is used to write software that outperforms the default tools. That's proof that the language has potential and encourages people to try it out. It's how languages and communities grow. If you think it's uninteresting, that's your subjective opinion, and you're free to dismiss it. It certainly doesn't make the language, the project or the title "bullshit" or "trivia relevant only to [...] language flamewars".
There is nothing interesting about the performance improvement that is a product of using a particular language, and particularly this language. It may as well have been written in Java or Haskell.
Speak for yourself. I would love to see someone write a Java or Haskell tool (where the proportion of Rust code in this tool is similar to the proportion of Java/Haskell code in this hypothetical tool) that could compete in a similar set of use cases. I would learn a lot from it.
> There is nothing interesting about the performance improvement that is a product of using a particular language, and particularly this language.
I said it's "proof that the language has potential and encourages people to try it out". But you seem biased against it ("and particularly this language"), which makes for a poor argument.
>A high level, systems programming, relative new language is used to write software that outperforms the default tools.
It does stretch the meaning of "the alternative", and only then outperforms, actually. It is not an alternative to the tool; it is an alternative for a few use cases, which could have been improved by contributing to the original projects, but instead they go for blind fragmentation. Efforts spent meaninglessly and destructively from the unix community pov.
Take a good tool, call it slow, incompatibly rewrite one option optimized to hell, present as an alternative in rust, take rust hype, cv points, attention, hate from those who understand what is done. Take it all, be a man.
> Efforts spent meaninglessly and destructively from the unix community pov.
I think the unix philosophy -- "Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new 'features'." -- is better applied by this `fd` tool than by `find`.
If someone else replaces the set of features offered by `find` with other, faster, rust tools, the effort would be quite welcome.
> take rust hype, cv points, attention,
You're welcome to add to your CV all the ways you can use find.
Thinking of stuff like fd and:
ripgrep: https://github.com/BurntSushi/ripgrep (grep/ag alt)
xsv: https://github.com/BurntSushi/xsv (csv tool)
exa: https://the.exa.website/ (ls)
una: https://github.com/jwiegley/una (multi-compression utils wrapper)
tokei: https://github.com/Aaronepower/tokei (loc stats)
And of course this: https://github.com/uutils/coreutils