
There are plenty of examples like that, though. A Python programmer might not know to compile in release mode. They might not use buffering when reading from a file. They might pass around copious copies of Vec<T> instead of &[T]. The list could go on and on.
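
To make the last one concrete, here's a minimal sketch (the function names are hypothetical, just to illustrate borrowing a slice instead of handing around owned Vecs):

    // Takes ownership, so callers end up cloning the whole Vec to keep their copy.
    fn sum_owned(values: Vec<i64>) -> i64 {
        values.iter().sum()
    }

    // Borrows a slice instead: no copy, and it accepts a Vec, an array, or a slice.
    fn sum_borrowed(values: &[i64]) -> i64 {
        values.iter().sum()
    }

    fn main() {
        let v = vec![1, 2, 3];
        let a = sum_owned(v.clone()); // copious copying
        let b = sum_borrowed(&v);     // just a borrow
        assert_eq!(a, b);
    }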

Sure, and there would probably be some value in a tool which can walk them through the easy stuff before they show a real human their code, which, it turns out, just wasn't tested with release optimisations or whatever.

Still, as I understand it, CTRE means that if you just "use" the same expression over and over in your inner loop in C++ (with CTRE), it doesn't matter, because the regular expression compilation happened at compile time as part of the type system. Your expression got turned into machine code once, for the same reason Rust will emit machine code for name.contains(char::is_lowercase) once rather than somehow re-calculating it each time it's reached - so there is no runtime step to repeat.

This is a long way down my "want to have" list: it's below BalancedI8 and the Pattern Types, it's below compile-time for loops, it's below stabilizing Pattern, to pick an example closer to my heart. But it does remind us what's conceivable.


IDK how we jumped to CTRE. Python doesn't do CTRE. It's doing caching. In Rust, you use std::sync::LazyLock for that. I don't get what the problem is to be honest.
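
For anyone following along, a minimal sketch of that pattern with the regex crate (compile once, lazily, then reuse everywhere) looks something like this:

    use std::sync::LazyLock;
    use regex::Regex;

    // Compiled exactly once, on first use, then reused on every call.
    static WORD: LazyLock<Regex> =
        LazyLock::new(|| Regex::new(r"\b\w+\b").unwrap());

    fn count_words(haystack: &str) -> usize {
        WORD.find_iter(haystack).count()
    }

    fn main() {
        assert_eq!(count_words("hello world"), 2);
    }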

I assume by CTRE you're referring to the CTRE C++ project. That's a totally different can of worms and comes with lots of trade-offs. I wish it were easy to add CTRE to rebar, then I could probably add a lot more color to the trade-offs involved, at least with that specific implementation (but maybe not to "compile time regex" in general).


I jumped to CTRE because it's another way that you can get the better results. The programmer need have no idea why this works, just like with caches.

I agree that there are trade-offs, but nevertheless compile time regex compilation is on my want list, even if a long way down it. I would take compile time arithmetic compilation† much sooner, but since that's an unsolved problem I don't get that choice.

† What I mean here is, you type in the real arithmetic you want, the compiler analyses what you wrote and it spits out an approximation in machine code which delivers an accuracy and performance trade off you're OK with, without you needing to be an expert in IEEE floating point and how your target CPU works. Herbie https://herbie.uwplse.org/ but as part of the compiler.


One of the biggest improvements reported by my users is the smart filtering enabled by default in ripgrep. That can't be contributed back to GNU grep.

Also, people have tried to make GNU grep use multiple threads. As far as I know, none of those attempts have led to a merged patch.

There are a boatload of other reasons to be honest as well.

And there's no reason why I specifically need to do it. I've written extensively on how and why ripgrep is fast, all the way down to details in the regex engine. There is no mystery here, and anyone can take these techniques and try to port them to another grep.


> Assuming UTC for tz is not weird and cron users expect it,

That would definitely be weird and unexpected. My crons are interpreted with respect to my system's configured time zone, which seems way more expected than just using UTC.

Taking a datetime and just assuming it's UTC is often a mistake. It's why the TC39 Temporal proposal (overhauling datetimes for Javascript) won't let you silently do it.


Sure that’s an alternative too.


git-absorb is a complementary tool to `git rebase -i`. git-absorb will create the fixup commits (from your staging area) for you and set them up for use with `git rebase -i --autosquash`.
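
Roughly, and assuming your branch is based on main, the typical invocation looks something like this (a sketch, not the only way to use it):

    # stage the fixes you've made on top of your branch
    git add -u
    # git-absorb guesses which earlier commit each staged hunk belongs to
    # and creates fixup! commits for them
    git absorb
    # fold the fixup! commits into their target commits
    git rebase -i --autosquash main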


It solves a problem that can be solved better by good habits and a solid understanding of git.


That suggests you don't understand what git-absorb does. git-absorb takes a pattern of good habits and makes one part of it faster and easier.

Sure, I guess that's possible.

All I see is a tool that guesses where to aggregate fixup commits when I can't be bothered to view and think about that myself.


About grep, all I see is a tool that looks for lines matching a query when I can't be bothered to just read the file myself.

I can trivialize useful software too. See how ridiculous you sound? That's what software does! It makes our job easier.

It's true that git-absorb guesses, but since you clearly haven't used the tool, you don't know how good it is at guessing. Moreover, false positives and false negatives are not the same in this scenario. A false positive would be very annoying, and I don't think that's ever happened for me in the years I've been using git-absorb. False negatives happen more frequently, but it's fine, because it tells you and then you just fall back to what you would have done manually for whatever it couldn't find a commit for.


I'm with you (see my other top level comment), but

> Then you're simply a `git revert` away from undoing it, without risking breaking anything else

This needs careful qualification. On GitHub at least, it is difficult to ensure every commit passes CI. That can mean having to skip busted commits during a bisect. It doesn't happen often enough in my experience to convince me to give up a cleaner history, but it's a downside we should acknowledge.

Ideally GitHub would make testing each individual commit easier.


A revert two weeks after the fact will create a new and unique tree (untested) in any case. I don't know if you're saying that the original commit or the revert might be untested.

In either case the brand new revert could break something. Who knows, it’s a new state.

> It doesn't happen often enough in my experience to convince me to give up a cleaner history, but it's a downside we should acknowledge.

There are tools for that.

https://github.com/mhagger/git-test


All I'm trying to do is qualify things so that the trade offs can be more honestly assessed. The bisect for finding that commit might not work as well as you hope if you need to skip over commits that don't build or whose tests fail for other reasons. Those things can happen when you aren't testing each individual commit.

I understand there are tools for testing each individual commit. You'll notice that I didn't say it's impossible to ensure every commit is tested. Needing to use random tools to do it is exactly what makes it difficult. And the tool you linked says literally nothing about using it in CI. How good is its CI integration? There's more to it than just running it. On top of all of that, there are more fundamental problems, like increasing CI times/costs dramatically for PRs split into multiple fine grained commits.

Again, anyone here can go look at my projects on GitHub. I try hard to follow atomic commits. I think it's worth doing, even with the downsides. But we shouldn't try to make things look rosier than they actually are.


The negativity in the comments here is unwarranted in my opinion. I've been using `git absorb` for years and it works amazingly well. I use it in addition to manual fixups. My most common use of git-absorb, though definitely not the only one, is when I submit a PR with multiple commits and it fails CI for whatever reason. If fixing CI requires changes across multiple commits (say, lint violations), then git-absorb will almost always find the right commit for each change automatically. It saves the tedium of hunting down the right commit for each change myself. False positives are virtually non-existent in my experience. False negatives do happen, but then you just fall back to the manual approach.

It seems like some would reply and say PRs should just be one commit. Or that they will be squashed anyway. And sure, that is sometimes the case. But not always. I tend to prefer logically small commits when possible, and it's not always practical to break them up across multiple PRs. Perhaps partially due to how GitHub works.

I use this workflow on all of my projects on GitHub.


What is wrong with simply pushing a "Fix linting issues" in a new commit? It's self-contained and very well describes the (single) purpose of the commit.

I share the sentiment about the "logical small commits", and hence I don't see that adding a new fix commit is problematic as long as it is self-contained and purposeful, but perhaps I don't understand what problem this tool is trying to solve.

It says

> You have fixes for the bugs, but you don't want to shove them all into an opaque commit that says fixes

So my understanding is that bugs were found in the PR, and now you want to fix those bugs not by introducing separate "Fix A", "Fix B" and "Fix C" commits, but by rewriting the history of the existing N commits into N' commits so that it blends those A, B and C fixes in.

Maybe I can see this being somewhat useful, but only if those N commits are pushed either directly as a series of patches to the main branch or as a single merge commit, and you want to keep the signal-to-noise ratio reasonable.

But otherwise I think it's a little bit problematic since it makes it harder for the reviewer to inspect and understand what changes have been done. But it also makes it harder for a developer since not all fixes are context-free, e.g. a fix for an issue found in the PR cannot always be attributed to a single self-contained commit in your branch; it's the composition of multiple commits that actually makes the bug appear.


The issue isn't `Fix linting issues`, the issue is a `Fix linting issues` commit for issues you introduced in the code you'll be pushing with that `Fix linting issues` commit. Nobody wants to see your pre-push dev history -- it's not at all interesting. So rebase and squash/fixup such commits, split/merge commits -- do what you have to in order to make the history you do push to the upstream useful.

Now, if you find linting issues in the upstream and fix those, then a `Fix linting issues` commit is perfectly fine.


It makes git bisect more difficult than it needs to be.


Merge commits or patch series already make git-bisect vastly more difficult than a linear history of single self-contained commits. I've used both, and git-bisect on merge commits is a nightmare.


An option is `git bisect --first-parent` to start from your integration points. (Then drill down into that branch if you need to.)


Yeah. But for that we can squash everything into a single commit on merge. Instead of spending a bunch of time making every single commit in an MR perfect both in isolation and all together. And if squashing everything in the MR causes a problem with the git history being too coarse, then that is almost certainly because the MR itself should have been split up into multiple MRs. Not because of how you organized the individual commits that belonged to one MR.


The goal isn't perfection. Splitting PRs has overhead, especially depending on what code hosting platform you're using.

Do you advocate for making code easier to read and understand by humans? What would you do if someone told you, "no I don't want to waste my time trying to make it perfect." It's the same misunderstanding. I try to treat code history like I treat code: I do my best to make it comprehensible to humans. And yes, sometimes multiple PRs is better. But sometimes multiple commits in one PR are better. I use both strategies.


I make multiple commits in the PR and squash it all on merge.

If the resulting squash is too big / touches too many things, it’s because I didn’t split the MR where I probably should have.

Because to me, the bigger waste of time is fiddling with all of the commits in the MR just to make it so that the MR can be merged without squashing.

If someone needs fine-grained history about how exactly a specific feature or bug fix came to be, they can always go to the MR itself and look at what happened there.


> I make multiple commits in the PR and squash it all on merge.

Sometimes I do that. But sometimes I want the history I've curated in my PR to be preserved without the overhead of creating separate PRs (which then often need to be stacked, introducing problems of its own). In which case, I avoid things like "fixup lint" commits. And that's where git-absorb shines. And saving time is exactly the point! git-absorb helps you curate history faster.

> If someone needs fine-grained history about how exactly a specific feature or bug fix came to be, they can always go to the MR itself and look at what happened there.

That's a lot more work than just reading the commit log. But I agree, one can do this if need be. But it's nice to avoid when possible.


> That's a lot more work than just reading the commit log.

Yeah, that’s true. I suppose that we may be working in somewhat different environments.

For example, when someone has big public projects that get a lot of eyeballs on them, like your ripgrep and other projects, it makes a lot of sense to spend extra time making the git log a thing that can be read on its own completely offline.

For the things I work on at my job, there are just a few of us writing the code itself and those coworkers that work on the repos I do will usually check every MR from everyone else. And anyone else in the company working on other repos is usually likely to browse code via the GitLab anyway, if they are interested in actually looking at any of the code that are in the repos of my team. Onboarding new coworkers on the team is also mostly centered around the docs we have, and talking with other people in the team and the company, rather than digging through the git history.

And for me when I am looking to understand some code from our internal repos it’s usually that way too, that I use the GitLab UI a lot, and I look at the MRs that were merged, and I check the associated Jira ticket, and I might ask some coworkers on Slack to explain something.

Most of the time, no one outside of our team is very interested in the code of our repos at all. Discussions revolve around other aspects like the APIs we expose, and the schema of the data we store in tables in the databases, and the Kafka events we produce. That sort of thing.

And everyone at my job will necessarily need to be online when they are working anyways, so that they can use Slack etc. So we are one click away from our GitLab instance when we work. On the off-chance that someone is trying to do some work offline while being on the go and without Internet access available, they probably have some specific kind of work in mind that is sufficiently independent of specific details that they can do without that history.


Yeah folks work differently. I'm just mostly responding to this idea that (roughly paraphrasing) "hey we don't need this tool, you should just be squash merging instead."

FWIW, at my previous role, where I worked on an internal proprietary codebase, we all followed this same philosophy of curating commits. It was primarily to facilitate code review though, and not source history. Stacking PRs on GitHub is really truly annoying in my experience. (Although there is some tooling out there that purports to improve the experience.)


Exactly this


> What is wrong with simply pushing a "Fix linting issues" in a new commit?

If you want every individual commit to be buildable, then it is a no-go. It's also a no-go if you don't squash your PRs.


What do the semantic commit purists do when a rebase causes some arbitrary commit to go from red to green? I've always wondered about that. Commit A becomes A' in a rebase. A is good, but A' is not. A' might be 20 commits ago in a feature branch.


It's hard to follow your example. You say "go from red to green" which I read as "go from failing to passing," but then you go on to say "A becomes A'" where "A is good, but A' is not." Either way, the answer is that if that commit is in your patch series that hasn't been merged yet, then it might make sense to just rewrite A'. But it might not. It could be a ton of work. But also, 20 commits in one branch is rather large, and maybe that's a problem too.

I suppose a purist might say you should go clean up the history so that you have no failing commits, but I don't know anyone who is a true purist. You don't have to be.

Instead of living in a world of black-or-white, join me in embracing the grey. I don't treat commit history as something that must be perfect. I just try to treat history like I treat the source code itself: I write it for other humans. Is the history always perfectly comprehensible? Nope. Just like the source code.


Yeah, that's what I meant.

I guess at the end it's all weighted tradeoffs for me too. I just put less weight on legibility and more on the ability to work on branches co-operatively without force-pushing.


If you're working on a branch cooperatively, then yes, don't rewrite history. But maybe you do rewrite history before merging to main to clean it up, depending.

And this is also why workflow questions are hard. Because "working on branches co-operatively" was previously unstated. It's not universal. I rarely work on branches co-operatively in a long term sort of way. Sometimes I might "take over" a branch to get it merged or similar, but rewriting history in that context is still fine.

It is always the case, even among so-called purists, that you don't rewrite history that you're sharing with others. (Unless you've both agreed to it. But even then, it better be short-lived because it's annoying.)


A red main branch is not what I am talking about at all. I am referring to code being developed in a feature/bug-fix branch that is yet to be merged into the main branch. OP believes that even in the development branch you should not fix your WIP code by adding "Fix linting issues". I thought that was implied by the nature of the discussion, since rewriting the history of code that already resides on the main branch would be beyond my understanding.


“Fix lint” commits also taint git blame.

You could perhaps add some kind of filter to git blame to automatically skip commits whose message is “fix lint” but at some point the cure is worse than the disease.

I also see people argue that merge commits make git bisect harder than squashing in the first place but there is a third option: rebase and fast forward. If every commit in your branch is clean and stand-alone that’s viable. Linter fix commits break that paradigm.


> What is wrong with simply pushing a "Fix linting issues" in a new commit?

Everything.

1. git blame is obfuscated. "Fix lint" is not helpful or relevant. Tell me what actually changed.

2. git log is noisy. More stuff is harder to read than less stuff, and you're making me read more stuff.

3. git bisect is difficult. Interspersed broken commits require more time to sift through. (Lint is a poor example, say it's "fix server config")

4. git cherry-pick is tedious. When you copy the dev change to a release branch (let's say you maintain 1.x, 2.x, etc), you must include errata.


5. git revert is tedious. When one commit turns out to be bad, you can't revert it cleanly because your "Fix linting issues" builds upon it.

> What is wrong with simply pushing a "Fix linting issues" in a new commit? It's self-contained and very well describes the (single) purpose of the commit.

Because I care a lot more about the logical history of the source code versus the actual history of the source code. Commits like "fix linting issues" are more like artifacts of the development process. The commit exists likely because you forgot to run the linter earlier, and only found the issue after creating the commit. Logically speaking, those fixes belong with the relevant change.

Now, if you're just going to squash the PR you're working on, then sure. The extra commit totally doesn't matter. Make as many of those commits as you want.

But if your PR is 5 commits that you want to "rebase & merge" as-is (which is something I do pretty frequently), and you care about the history being easy to navigate, then it can totally make sense to do fixups before merging instead of creating new "fix linting issues" commits. If you follow the latter, then your commit history winds up being littered with those sorts of things. And fewer commits will likely pass tests (because you created new commits to fix tests).
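
For reference, the manual version of that fixup flow looks roughly like this (the commit hash and base branch here are just placeholders):

    # commit the change as a fixup of the commit it logically belongs to
    git commit --fixup=abc1234
    # later, fold all fixup! commits into their targets
    git rebase -i --autosquash origin/main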

I want to be VERY CLEAR, that I do not agree with your use of the word "wrong." I do not want to call your workflow wrong. Communicating the nuances and subtleties of workflows over text here is very difficult. For example, one take-away from my comment here is that you might think I always work this way. But I don't! I use "squash & merge" plenty myself, in which case, git-absorb is less useful. And sometimes breaking a change down into small atomic commits is actually a ton of work, and in that case, I might judge it to not be worth it. And still yet other times, I might stack PRs (although that's rare because I find it very annoying). It always depends.

> But otherwise I think it's a little bit problematic since it makes it harder for the reviewer to inspect and understand what changes have been done. But it also makes it harder for a developer since not all fixes are context-free, e.g. a fix for an issue found in the PR cannot always be attributed to a single self-contained commit in your branch; it's the composition of multiple commits that actually makes the bug appear.

I'm having a hard time understanding your concern about the reviewer. But one aspect of breaking things down into commits is absolutely to make it easier for the reviewer. Most of my PRs should be read as a sequence of changes, and not as one single diff. This is extremely useful when, in order to make a change, you first need to do some refactoring. When possible (but not always, because sometimes it's hard), I like to split the refactor into one commit and then the actual interesting change into another commit. I believe this makes it easier for reviewers, because now you don't need to mentally separate "okay this change is just refactoring, so no behavior changes" and "okay this part is where the behavioral change is."

In the case where you can't easily attribute a fix to one commit, then absolutely, create a new commit! There aren't any hard rules here. It doesn't have to be perfect. I just personally don't want to live in a world where there are tons of "fix lint" commits scattered through the project's history. But of course, this is in tension with workflow. Because there are multiple ways to avoid "fix lint" commits. One of those is "squash & merge." But if you don't want to use "squash & merge" in a particular case, then the only way to avoid "fix lint" commits is to do fixups. And git-absorb will help you find the commits to create fixups for automatically. That's it.


In some teams, you are not allowed to submit any commit that breaks the build, and a lint failure would be considered a broken build.


Do these teams run the pipeline on all the commits when you push multiple commits at the same time to your branch?

Say I have an open MR from a branch that I’ve pushed three commits to so far. I pushed these three commits individually and the pipeline ran green each time.

A coworker of mine points out some errors to me.

I have to touch files that I previously touched in the past three commits again.

I am tempted to commit these changes in one commit. But I decide to try git absorb instead.

So instead of adding a fourth, green commit to my MR, my use of git absorb rewrites all three of my previous commits.

But actually, the changes I was about to put in the fourth commit only work when taken together.

Splitting them up and rewriting the previous three commits will result in build failure if you try to build from the new first commit or the new second commit.

I don’t notice that because I’m on the new third commit.

I force push the three new commits to my branch for my MR. Gitlab runs the pipeline on the last of the commits. Everything looks fine to me.

Your team lead approves the MR and pushes the merge button.

Three months later he scolds me for a broken bisect.


You're right, most places I've worked this applies only to a subset of branches, usually main/master and sometimes other branches considered "stable" such as for the staging env.


We are talking about the commit, or a series of commits thereof, made during the development of code in a feature/bug-fix branch, and not about the commit that you push as a post-fix because one of your previous commits broke something. That was not the discussion as far as my understanding goes.


I’ve been using autofixup for this and it’s been ok but not great, it can be quite slow as things grow, and it doesn’t say anything when there was no match so it’s easy to miss. How does absorb surface that?

> Perhaps partially due to how GitHub works.

That’s definitely a major factor, I’d like to use stacked PRs they sound really neat, but GitHub.

Also even with stacked PRs I figure sometimes you’re at the top of the stack and you change things which affect multiple commits / prs anyway, surely in that case you need to fixup other commits than the ToS?


> I’ve been using autofixup for this and it’s been ok but not great, it can be quite slow as things grow, and it doesn’t say anything when there was no match so it’s easy to miss. How does absorb surface that?

I haven't used autofixup, but:

* git-absorb has always been pretty snappy. I don't think it scales with repository size.

* If there's no match, then the things that don't match stay in the staging area and don't make it into a commit. git-absorb will also note this in its output after running it.


GitHub's quirks definitely make life much harder than it needs to be, but I've been using `git machete` for months now with great success in my team. The __one__ thing GitHub has that makes it all work is the fact that if you merge the parentmost branch, its immediate child will retarget its base branch.

I think if I had full "control" over my company's SCM workflows I would use a tool that considers a branch as a workspace and every commit in the branch becomes its own PR (personal preference, but in my experience it also motivates people to split changes more), but alas.


The term Stacked PRs already sounds like a term that was invented specifically in order to communicate in a GitHub-influenced context. Because Stacked PRs are just a reinvention of being able to review a commit at a time (the stack part is straightforward).


Stacked PRs are like being able to review a commit at a time, but add an additional layer of sequencing. It's most simply thought of as a patch series, where the evolution of each 'patch' is retained.

That additional layer allows finer grained history and to mostly avoid (unreviewed) rebasing. Many teams find those properties valuable.


In this day and age I don’t understand why we just can’t call things commits instead of patches or changesets.

I guess another pet peeve for me.

https://news.ycombinator.com/item?id=41659650


Stacked PRs are a way to surface the lifetime of the proposed changes as they get fixed or updated following reviews.

It has nothing to do with github.


It has nothing to do with GitHub in the sense that GitHub does not support it (I guess; I’m not up to date). It does have something to do with GitHub in the sense that the name (PRs) and the benefits are framed from the standpoint of This is What GitHub Lacks.

Which is a GitHub-centric perspective.

https://news.ycombinator.com/item?id=41514663


I assume you refer to https://github.com/torbiak/git-autofixup. I have also used it, and its ok but not perfect.


I use git autofixup; it was much better than git absorb last time I checked

> it doesn’t say anything when there was no match

that's what it should do

> it can be quite slow as things grow

How? All the slowness (on large repos) I've seen has been fixed.


> that's what it should do

No it is not.

> How?

I don’t know, that’s just an observation from using it: semi-regularly I autofixup changes and it takes a while to do anything.


you're probably using an old version

Same workflow here, and it's become a breeze with autofix, rebase --update-refs, and a small command to push the whole stack. I'm using magit, so I directly see what could not be matched and remains staged.

I am (what I assumed to be) an extensive user of fixup (usually invoked via interactive rebase). I'm intrigued by this but curious as to how it can really save so much time.

Are people fixup'ing a lot more than I do? I might do it once or twice per MR and it's never a large burden to fix the right commit.

If things get really out of hand such that the whole thing is a mess I just squash. Whatever history I had before is almost by definition gross and wrong in this scenario, and I don't mind losing it.


It's the same kind of thing people tell me about ripgrep. "Why bother with ripgrep, grep has always been fast enough for me." That might well be true. Or maybe we value time differently. Or maybe none of your use cases involve searching more than 100K lines of code, in which case, the speed difference between GNU grep and ripgrep is likely imperceptible. Or maybe being faster unlocks different workflows. (The last one is my favorite. I think it's super common but little is written about it. Once something becomes fast enough, it often changes the way you interact with it in fundamental and powerful ways.)

Because I have git-absorb, I tend to be a bit more fearless when it comes to fixups. It works well enough that I can be pretty confident that it will take most of the tedium away. So in practice, maybe git-absorb means I wind up with fewer squashes and fewer "fix lint" commits because I don't want to deal with finding the right commit.

My use of git-absorb tends to be bursty. Usually when I'm trying to prepare a PR for review or something along those lines. It is especially useful when I have changes that I want to be fixed up into more than one distinct commit. git-absorb will just do it automatically for me. The manual approach is not just about finding the right commit. It also means I need to go through git-add in patch mode to select the right things to put into a fixup commit, and then select the other things for the other commits. So it actually winds up saving a fair bit of work in some cases.

Another example is renaming. Maybe PR review revealed the names of some functions were bad. Maybe I introduced two functions to be renamed in two distinct commits. I could do

first rename -> first commit -> second rename -> second commit

Or I could do:

both renames -> git add -p -> first commit -> second commit

Or I could do:

both renames -> git absorb

In practice, my workflow involves all three of these things. But that last git-absorb option eats into some of the first two options and makes it much nicer.


Criticism isn't negativity. We're not Pollyannas here, we're adults who can handle critique.


Can you give me an example of criticism that is not negative? As far as I know, all forms of criticism involve pointing out a flaw or fault. There's constructive criticism, but it's still fundamentally negative.

Either way, feel free to replace the word "negative" with "criticism" in my comment if you want. It expresses the same thing I intended to express: I disagree with the criticism.

If we're not Pollyannas and we're all "adults who can handle criticism," then you should also be able to handle criticism of criticism. It goes both ways.


I've been thinking about your comment that all criticism is negative and I'm not sure where to go with that in my mind. I think it very much depends on your definitions of "criticism" and "negative". For example, a movie critic could praise a movie, 100% score, no faults. That's still a criticism by many definitions.

But the interesting thing about this is "the eye of the beholder". I suppose to some folks, all criticism does seem to be negative... which is why code reviews can turn into nightmares... and helps explain why I recently received a very angry phone call from another engineer because of some changes that were made to "his code".

For the purposes of this discussion, the oddity seems to be the folks jumping to defend the software they had no part in creating. Why does the criticism result in negative feelings for them? I can understand the author taking issue with it, but someone else's criticism should not impact another's ability to utilize the software for whatever purposes.


I don't want to play a definitions game and dissect word meanings here. I think it's usually a waste of time. That's why I implored folks to just accept a revision in wording in my previous comment. What I intended to convey was this:

1. There were many comments providing criticism that I disagreed with.

2. I felt that the vibe given by the comments at the time was not commensurate with how much utility the tool brought me in my own workflow.

So it seemed useful to point this out. That is, that there was a gap in the commentary that I thought I could fill.

Obviously I won't be using this phrasing again because it spawned off a huge meta sub-thread that I think is generally a huge waste of time and space.

> For the purposes of this discussion, the oddity seems to be the folks jumping to defend the software they had no part in creating. Why does the criticism result in negative feelings for them? I can understand the author taking issue with it, but someone else's criticism should not impact another's ability to utilize the software for whatever purposes.

Again, it cuts both ways. Why isn't it odd for folks to jump in and express criticism of software they had no part in creating? If folks can express negative criticism, then why can't I express positive feedback?

I didn't say people shouldn't make those comments. I said it was unwarranted. Or in other words, that it was undeserved and that I disagreed with it. Or that some balance was appropriate. This has literally nothing to do with my ability to utilize the software. I don't know why you're going down that road.


Negative comments are not always a product of negativity. Sometimes it's positive feedback to improve something that has potential.


I think that can be true and my top-level comment can be true simultaneously. I've also clarified what I meant at this point.


I think GP is saying that you didn't need to focus on the negativity (which is in itself negativity), just say the substantive thing that you wanted to say without editorializing about negativity. Your complaint (negativity) about negativity might be over-done, and anyways not useful. We can all shake our own heads at others' possibly-unnecessary negativity without having to be calling it out all the time. Meta arguments are not always useful.

Besides, consider this negative comment:

https://news.ycombinator.com/item?id=41653797

Is it one of the negative comments you didn't like? It seems like a possibly-useful negative comment -- "it didn't work for me because ..." is useful unless it was really a matter of undiagnosed PEBKAC. Would you rather that comment not have been posted? Surely not.


I think I've clarified my position in other follow-up comments sufficiently that everything you said here has already been addressed.

I even already said I would use different wording next time.

And I never said folks shouldn't make those comments. I clarified that too.


I'd be in favour of auto-stickying this. I see a lot of e-ink spilled over arguments that boil down to whether or not it's ok to comment about not liking some aspect of the subject under discussion. There are good reasons not to criticize in some situations, but I don't think they apply here. Either way the arguments are tiresome. We should agree to ban criticism, or agree not to argue about it (barring special circumstances).


I had to look up the reference, and based on the wikipedia plot summary at least, I admit I don't quite get the relevance. I expected a plot where someone handles criticism quite badly and suffers as a result, but in fact the plot was actually about someone who handled criticism very well instead, and improved the lives of others as a result?

So now I'm curious! In what way does Pollyanna relate to adults who can't handle critique? Have I got the wrong Pollyanna by any chance? xD


A Pollyanna is somebody who's cheerful and optimistic to a fault, i.e., even when it's unjustified.

The plot summary of the book is likely not what you should be reading as it's become an idiom. Something like Wiktionary or another dictionary would be a better place to look it up.

In this case, it's not about being able to receive criticism, but about being reticent about _giving_ it.


> The plot summary of the book is likely not what you should be reading as it's become an idiom.

This is a good and perhaps under-appreciated point. When I first read the term "Pollyanna" I made the same mistake as GP. I think if you read "The Prince" to find out what "Machiavellian" meant you'd be no better off than when you started. Even terms like "Kafkaesque" have taken on lives of their own and are probably better not thought of as mere literary references.


Machiavelli's "The Prince" will give you a decent understanding of what people usually mean by "Machiavellian". The book explains what methods would allow an absolute ruler to stay in control of state. It does not generally make moral judgments about those methods.

Machiavelli's "Discourses" is the one that will really confuse a reader looking to understand the colloquial meaning of "Machiavellian". In this book, Machiavelli lays out a vision of a healthy "republic" (or more precisely, res publica) which benefits the people who live in it. Among other things, Machiavelli argues that republics actually benefit from multiple competing factions, and from some kind of checks and balances. Apparently these ideas influenced several of the people who helped draft the Constitution of the United States.

Now why Machiavelli had two such different books on how governments worked is another interesting question...


> Machiavellian

> adjective

> uk /ˌmæk.i.əˈvel.i.ən/ us /ˌmæk.i.əˈvel.i.ən/

> using clever but often dishonest methods that deceive people so that you can win power or control

(from https://dictionary.cambridge.org/dictionary/english/machiave... )

Ymmv, but I think that's far from the point of the book, and isn't even the main topic. It's hard for me to imagine that taking a person who'd never heard the term, letting them read the book, and then asking them to propose a definition would produce anything like the above.


I’ve also thought about that based on the same info, i.e. not reading the source completely.

There is a notion of a “Pollyanna mode” in schema therapy. What it means is ignoring negative facts and challenges with an outwardly positive attitude and avoiding addressing the issues themselves.

This certainly can be harmful to oneself. Another harmful thing is hating and bashing oneself for mistakes and faults and I won’t make a comparative judgement, but a healthy way is supposed to be along the lines of speaking up openly about what bothers you and thinking what can be done about it if at all.


This makes sense. Thank you for your comment! Learnt something new today :)


Wat. ripgrep and fd should be available via your system package manager. You shouldn't need to be "moving these binaries around."

I wonder what things you use that aren't specified by POSIX.


Until it's in the base OS images, there's an institutional cost for large companies installing everyone's favorite 'enhanced' utility, and so they opt to just not do so.

I've spent many years of my career crafting tooling to sync dot files and binaries around and largely over time just gave up on it as the juice is just not worth the squeeze.


What does that have to do with "copying binaries around"? If it isn't in the base OS image, then install it yourself. Or not. And this has nothing to do with POSIX either. Because there is plenty in base OS images that isn't specified by POSIX.

I interact with multiple Unix systems daily too. Some of those systems have ripgrep and some don't. Unless I'm in a particular scenario where I think ripgrep would be useful, I just use grep instead. I don't really see the issue. For a short-lived machine, yeah, I don't bother with syncing dotfiles and all that other nonsense. I just use the bare OS environment and install stuff as needed. But I also have plenty of longer lived Unix machines (like the laptop I'm typing on) where it makes sense to set up a cozy dev environment optimized for my own patterns.


Ooh, I can't just install packages on machines. Change management is a super important thing and bypassing that to get a utility installed via a package manager would be a dumb way to get fired.

Sure, I get it. You don't work in a company that has this sort of culture, but a large number of us do. And you want us to. Do you want AWS engineers being able to just install whatever they want on the hosts running your VMs? Of course not.


[flagged]


The entire thread is about how the benefit of these tools is not worth the hassle of copying the binaries around because everyone can't just install or modify the servers.

You keep modifying the argument for some sort of moral win I guess, devolving into name calling when people just don't agree with your position. Very professional of you.

Regardless, I presume you're just having a bad day. I hope it goes better for you and you can find some peace and calm down.


The original comment said nothing about modifying servers or AWS engineers installing random shit. That was you. I responded to "moving binaries around," and you started yapping about change management. Two totally different things. Like obviously if you have a locked down environment, then only install what you need. But this is not what the original poster was referring to specifically.

ripgrep even specifically calls out this exact use case right in its README: https://github.com/BurntSushi/ripgrep/?tab=readme-ov-file#wh...

> You need a portable and ubiquitous tool. While ripgrep works on Windows, macOS and Linux, it is not ubiquitous and it does not conform to any standard such as POSIX. The best tool for this job is good old grep.

So, you presume too much friendo. Now, go away.


> You are already using “performance will be looked at later” versions of grep

Are you referring to GNU grep? Because if so, this is wildly untrue. GNU grep has been very significantly optimized. There are huge piles of code, including an entire regex engine, in GNU grep specifically for running searches more quickly.

Maybe something like busybox's or BSD's grep would satisfy "performance will be looked at later," but certainly not GNU grep. I can't tell which one you're referring to, but a generic "you are already using" seems to at least include GNU grep.


What I mean is not really that grep hasn't been optimized, but that it kind of looks like it is. It's now far from the state of the art from most perspectives (in part because of you).

And because there exist alternatives that in most workloads are much faster than grep without really endangering grep's prominence, it shows that performance isn't really a criterion in its use nowadays (computers are fast).

My point as a whole is that as long as the implementation isn't catastrophically slow, then any port is good, even if it's slower than GNU's, because the set of workloads that both care about max performance and must be 100% compatible with the original is small enough.


I didn't really want to engage with your broader point. I do disagree with it, but only in a measure of degree, and that's hard to debate because it's perspective and opinion oriented. If performance wasn't as important as you seem to imply, it's very unlikely that ripgrep would have gotten popular given that performance (and filtering) are its two most critical features. That is, that's what I hear from users the most: they like ripgrep because it's fast or because it filters or because of both.

But on the smaller point, it is just factually untrue that GNU grep falls into the "performance will be looked at later" bucket. That was why I responded.

This post is a classic about why GNU grep is fast, demonstrating that GNU grep definitely doesn't look like it wasn't optimized: https://lists.freebsd.org/pipermail/freebsd-current/2010-Aug...

And then before that, there's this: https://ridiculousfish.com/blog/posts/old-age-and-treachery....


> if performance wasn't as important as you seem to imply, it's very unlikely that ripgrep would have gotten popular given that performance (and filtering) are its two most critical features.

I'm not saying performance isn't a feature, but if one wants performance they shouldn't use grep; they should use ripgrep instead.

There was a need for a high performance grep replacement, and you filled it perfectly. But now that ripgrep exists, I don't think it makes much sense for a perfect grep clone to seek max performance, since it will definitely fall behind ripgrep no matter how hard they try (at least because they can't ignore files or restrict the kind of regular expressions they support by default).


I don't really share that philosophy personally. But I do think there is a balancing act. There's a big gulf between "not catastrophically slow" and "state of the art." :-)


> POSIX compliance totally matters for interactive use, too, if POSIX is what you know.

When you use grep on the CLI, do you specifically limit yourself to options specified by POSIX? How do you even know if you do or not?

The `-r/--recursive` flag, for example, isn't even part of POSIX grep. Do you ever use it? What about `-w/--word-regexp`? Not in POSIX either. `-P/--pcre2` isn't. Neither is `-a/--text`.

Maybe what grep is, is actually more than what POSIX says it is.


You're absolutely right on the strict point of POSIX compliance. I think there's a point to be made, though, about just the stability of the functional/CLI interface, aside from what's technically in the POSIX standards. ripgrep is pretty damn good on interface stability AFAIK, but the other tools that similarly don't try to position themselves as POSIX+/POSIXish compatible are a mixed bag. It's nice to just have the stable tools you know, and have them practically everywhere (if you include WSL/msys2/Git for Windows, etc.). And similarly you pretty much know what POSIX+ features you have on a GNU system (most Linuxes), or a BSD one (macOS, OpenBSD, etc.).

I like having an excellent set of system tools from GNU/BSD, etc., then I can install/use the SotA stuff -- and I'll still end up using both sets of tools all the time, even though one set is not absolute best in class, because I don't have to worry about how to use sota-tool 1.2 on one system vs sota-tool 2.1 on another, when there may be important interface changes.

And to provide the full context going back to my first comment: I won't consider using "performance will be looked at later" tools whose purported benefit is just the use of Rust, pretty much at all.


For me, I just find it very dubious when folks point to POSIX as a point of stability. The reality is that POSIX is so barebones in a number of respects, that most "POSIX tooling" actually has more features than what POSIX specifies. And sometimes the behavior of those features differs across implementations, with sed's `-i` flag being an infamous example. As I pointed out, the main problem is that, other than POSIX experts who have memorized the spec, most folks have no idea when they're crossing the line between "POSIX specified" and "extra feature."

I agree that stability is really the main important bit here. That is, is that script I wrote using your tool 5 years ago still going to work the same way today? And that is exactly my point. Because this discussion is totally different if we focus on what actually matters (stability) instead of some flawed idealistic approximation of it (POSIX) that only actually exists in theory. Because even the most spartan of tools, like busybox, implement features beyond what POSIX requires. And people use those not because they are in POSIX, but because they are, in practice, in the real world, stable.

In other words, folks harping on POSIX as an end have confused it with what it really is: a means to an end.


It does not. Almost no regex engine does that.

To add more color to this, the precise details of what "Unicode support" means are documented here: https://github.com/rust-lang/regex/blob/master/UNICODE.md

In effect, all of UTS#18 Level 1 is covered with a couple caveats. This is already a far cry better than most regex engines, like PCRE2, which has limited support for properties and no way to do subtraction or intersection of character classes. Other regex engines, like Javascript's, are catching up. While UTS#18 Level 1 makes ripgrep's Unicode support better than most, it does not make it the best. The third party Python `regex` library, for example, has very good support, although it is not especially fast[1].
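
As a quick illustration of the class set operations mentioned above, using the Rust regex crate (the same engine ripgrep uses):

    use regex::Regex;

    fn main() {
        // Intersection: characters that are both Greek and lowercase.
        let greek_lower = Regex::new(r"[\p{Greek}&&\p{Lowercase}]+").unwrap();
        assert!(greek_lower.is_match("αβγ"));
        assert!(!greek_lower.is_match("ΑΒΓ"));

        // Subtraction: ASCII lowercase letters minus the vowels.
        let consonants = Regex::new(r"^[a-z--[aeiou]]+$").unwrap();
        assert!(consonants.is_match("rhythm"));
        assert!(!consonants.is_match("audio"));
    }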

Short of building UTS#18 2.1[2] support into the regex engine (unlikely to ever happen), it's likely ripgrep could offer some sort of escape hatch. Perhaps, for example, an option to normalize all text searched to whatever form you want (nfc, nfd, nfkc or nfkd). The onus would still be on you to write the corresponding regex pattern though. You can technically do this today with ripgrep's `--pre` flag, but having something built-in might be nice. Indeed, if you read UTS#18 2.1, you'll note that it is self-aware about how difficult matching canonical equivalents is, and essentially suggests this exact work-around instead. The problem is that it would need to be opt-in and the user would need to be aware of the problem in the first place. That's... a stretch, but probably better than nothing.

[1]: https://github.com/BurntSushi/rebar?tab=readme-ov-file#summa...

[2]: https://unicode.org/reports/tr18/#Canonical_Equivalents


Thanks very much for clarifying that. It did seem unlikely: I remember NSString (ask your parents...) supported this level of unicode equivalence, and it was quite a burden. Normalising does feel like the only tractable method here, and if you have an extraction pipeline anyway (in rga) maybe it's not so bad.


Yes, rga could support this in a more streamlined manner than rg, since rga has that extraction pipeline with caching. ripgrep just has a hook for an extraction pipeline.


For the purpose of searching, wouldn’t it be sufficient to do NFC normalization for text? Could hide that behind a command line flag even…


Can you say how that differs from what I suggested in my last paragraph? I legitimately can't tell if you're trying to suggest something different or not.

As UTS#18 2.1 says, it isn't sufficient to just normalize the text you're searching. It also means the user has to craft their regex appropriately. If you normalize to NFC but your regex uses NFD, oops. So it's probably best to expose a flag that lets you pick the normalization form.

And yes, it would have to be behind a CLI flag. Always doing normalization would likely make ripgrep slower than a naive grep written in Python. Yes. That bad. Yes. Really. And it might now become clear why a lot of tools don't do this.
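
To make that concrete, here's a minimal sketch of the work-around (using the regex and unicode-normalization crates; this is just an illustration, not something ripgrep does today):

    use regex::Regex;
    use unicode_normalization::UnicodeNormalization;

    fn main() {
        // "café" with the 'é' in decomposed form: 'e' + U+0301 combining acute.
        let haystack = "cafe\u{301}";
        // The pattern spells 'é' in precomposed (NFC) form: U+00E9.
        let re = Regex::new("caf\u{e9}").unwrap();

        // Canonically equivalent, but no match without normalization.
        assert!(!re.is_match(haystack));

        // Normalize the haystack to NFC (and write the pattern in NFC) and it matches.
        let nfc: String = haystack.nfc().collect();
        assert!(re.is_match(&nfc));
    }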

