Replace std:find_if in 80% of the cases

yellow_lead · on Oct 1, 2021

> So I went through all our usages of find_if and I found that it was only used in a proper way in about 20% of all the cases.

> This means that the Pareto principle applies here too. In 80% of the cases, std::find_if should not have been used.

I don't think this is the pareto principle. Its a rule of probabilities, i.e 1-X.

tomjakubowski · on Oct 1, 2021

The Pareto principle can be stated thus:

    When the percentage sign's
    89 minus nine
    That's Pareto

kragen · on Oct 1, 2021

Agreed. 80% of the `find_if` cases require 20% of the what, exactly? The functions? But he's using four times as many functions now.

felixguendling · on Oct 1, 2021

When reading the title I thought that it is about complexity of linear search vs. better alternatives like hash sets/maps, or at least sorted data structures (binary search).

But if having linear complexity is fine, std::find_if/any_of/none_of/all_of/etc. are of course fine and you should prefer the version that's most expressive.

kragen · on Oct 1, 2021

Of course; as Larry Wall said, doing a sequential search over the keys of a hash table is like clubbing someone to death with a loaded Uzi.

mywittyname · on Oct 1, 2021

It's safer, more reliable, and easier to clean up after?

Zababa · on Oct 1, 2021

I doubt it will be safer and more reliable unless the person doesn't fight back. I do agree for cleanup, especially if the bullets go through.

mywittyname · on Oct 1, 2021

Shouldn't this be titled, "use std::find_if where appropriate"?

The whole point of this is to find every element in the list which matches the lambda. Which is why it returns an iterator: you're supposed to use that value, increment it, then call std::find_if again with it as the first parameter.

Granted, the documentation is failing here, since it doesn't provide an example of this use case.

    for(auto i = begin(v), e = end(v); find_if(i, e, lambda_func) != e; ++i) {
        cout << "Found a match: " << i
    }

If you don't need a list of matching elements, but instead want to know if there was any match, then use a different function designed for that specific use.

kragen · on Oct 2, 2021

That's a valid way to use `find_if`, although you wrote `find_if(i, ` where you meant `i = find_if(`. But I don't think "the whole point" of `find_if` is "to find every element in the list which [fulfills the predicate]". §10 of the STL paper (pp. 41–55) doesn't mention this as a goal of `find_if`, and the result of using `find_if` that way isn't composable with other STL algorithms; for example, you can't use `equal` to compare two such "lists" of matching elements or `unique_copy` to eliminate adjacent duplicates from one of them.

If the STL contained something whose "whole point" was "to find every element in the list" which fulfilled the predicate, it would be an iterator adaptor, like Python 2's `itertools.ifilter`, Python 3's `filter`, Boost's `filter_adaptor`, or C++20's `std::ranges::filter_view`.

The closest thing in the STL is `copy_if`, which is happy to take an output iterator for its destination parameter, for example a `back_insert_iterator`.

kragen · on Oct 1, 2021

Reading these I'm struck by the non-orthogonality and "mental compilation" on display here. In Python instead of

    return numbers.end()
           != std::find_if(numbers.begin(), numbers.end(),
                           [](int number) { return number % 2 == 1; });

I would write

    return any(number % 2 == 1 for number in numbers)

which, similarly, stops examining numbers once it finds an odd one, but works by transforming the sequence of numbers into a lazy sequence of booleans, then applying the generic boolean-sequence-reducer `any` to it. I think this more clearly says what is desired (the C++ version strikes me as something a compiler might output given the Python version, but in this case you had to run the compiler in your mind), and additionally is a better orthogonal decomposition of the problem than computing a first-hit iterator from a sequence of numbers and a number-to-boolean transformation, then comparing it against the end of the sequence.

Sandor's suggested replacement solves the first problem of clearly saying what is desired:

    return std::any_of(numbers.begin(), numbers.end(),
                       [](int number) { return number % 2 == 1; });

But the penalty is even worse orthogonality!

Because of Python's better orthogonality, you can factor out the lazy boolean sequence generator into a function of its own if that's a useful thing to do:

    def oddnesses(numbers):
        return (number % 2 == 1 for number in numbers)

    return any(oddnesses(numbers))

In this particular case the extra degree of composability you get from it is not very useful, but then again, neither is the original code. You could do things like this with that sequence with equal plausibility to the original code:

    all(oddnesses(numbers))  # addressed by all_of in OP
    zip(numbers, oddnesses(numbers))
    sum(oddnesses(numbers))  # punning booleans as ints
    sum(map(operator.mul, numbers, oddnesses(numbers)))

Of course, Python is not really a viable alternative to C++ in most of the places where people are using C++, mostly because it uses 40 times as much CPU and energy and 10 times as much memory, but also because large Python codebases get unmaintainable.

But we could wish for a language with the efficiency and in-the-large maintainability of C++ and the clarity and generality of Python.

jrimbault · on Oct 1, 2021

Because I love Python, and Rust is applicable is many places where C++ is, here's a Rust version :

    // assuming `numbers` is an iterator
    numbers.any(|i| i % 2 == 1)

Rust's iterators generally compile down very well, and I think this exactly as expressive as Python. And I also wished C#'s Enumerable were as performant as Rust's iterators.

mbrubeck · on Oct 1, 2021

And if you write the equivalent to the `find_if` version in Rust, then the standard linter Clippy will automatically suggest the "good" version:

    warning: called `is_some()` after searching an `Iterator` with `find`
     --> src/lib.rs:2:9
      |
    2 | numbers.find(|i| i % 2 == 1).is_some()
      |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: use `any()` instead: `any(|i| i % 2 == 1)`
      |

kragen · on Oct 1, 2021

IMHO it still suffers from less orthogonality; it's precisely equivalent to the C++ std::ranges::any_of version, though pleasantly less verbose.

I think I'm going to have to get serious about Rust. I had a dismaying epiphany writing https://news.ycombinator.com/item?id=28660097 the other day: like Perl, Python just isn't saving me very much effort except for quick hacks, maybe a factor of 2–4 over C (if we discount CVEs, heh). It's not the order-of-magnitude development speed improvement I was hoping for, and now that I've been using it for 20 years I know it never will be. Python starts to buckle under the strain of large projects, they totally fucked up Unicode handling to an epic degree never seen in any other programming language, and it typically costs you orders of magnitude in performance.

And—this is the worst of all—at best a library written in Python is only usable from Python and maaybe the JVM languages, and you can probably call it from a shell script. (I don't know, maybe the CIL languages too.) By contrast, if I write a library in Rust or C or C++ or D I can use it in Rust, C, C++, D, Python, Perl, Ruby, LuaJIT, Julia, PHP, node.js, Scheme, Common Lisp, OCaml, Golang, the JVM languages, the CIL, and maybe even Arduino. So the effort is greater (though, as you point out in this case, not necessarily much greater, though you didn't mention compile time, which sucks pretty bad for both Rust and C++) but the benefit is much greater.

(Also, the Python 2→3 gap was pretty badly flubbed, especially at the social level; no other language community tries to shame people out of maintaining backward compatibility that I know of.)

So I think from now on I'll try to start writing prototypes in Python or OCaml but, if the prototype seems worthwhile, writing the Real Thing in Rust. Which is gonna be pretty rough since I don't know Rust. :)

gpshead · on Oct 2, 2021

We'll link to this in ten years when you're still writing some Real Things in Python assuming this site even stays up that long and we have nothing better to do. ;)

kragen · on Oct 2, 2021

Ha! I forget if I already knew you in 02001, but if I did, you might have said the same thing to me about Perl then. The last time I wrote a Perl module was probably 02005.

Zababa · on Oct 1, 2021

And if numbers isn't an interator, it isn't really worse:

    numbers.iter().any(|i| i % 2 == 1)

I like how explicit it is without being too noisy.

Doctor_Fegg · on Oct 1, 2021

That’s basically borrowed off Ruby. (Which isn’t a bad thing.)

kragen · on Oct 1, 2021

It's not that simple.

Ruby's Enumerable is a concrete class; Rust's std::iter::Iterator is a trait, which has some aspects in common with Ruby classes and other aspects in common with Python protocols such as the Python iterator protocol. std::iter::Iterator is (primarily, though not in this case) an external iterator like Python iterators, STL iterators, and ranges in C++20 or D, not an internal iterator like (the original version of) Ruby Enumerable. Rust's type-checking strategy is more like ML than like Ruby, Python, C++, or D, while its compilation strategy is more like C++'s or D's compilation strategy than those of Ruby, Python, or ML.

Zababa · on Oct 1, 2021

The closure syntax is noted in the "influences" page of the language reference: https://doc.rust-lang.org/reference/influences.html

> Ruby: closure syntax

jackewiehose · on Oct 1, 2021

in C++ I would write

    for (int i: numbers)
      if (i % 2 == 1)
        return true;
    return false;

find_if() etc. is fine if you already have the right predicate function at hand but it becomes quite ridiculous with lambda expressions.

secondcoming · on Oct 1, 2021

This interestingly generates shorter code as the compiler doesn't attempt any loop unrolling

https://godbolt.org/z/eebE9eYP6

kragen · on Oct 1, 2021

I probably would too, and composability would suffer further. The concern you raise about lambda expressions is basically one of syntax rather than semantics. Syntax is important, but as an unrepentant user and occasional implementor of Lisps and Forths, I am in no position to criticize C++ :)

Zababa · on Oct 1, 2021

What I like about your example is that it's way better at communicating intent than the other C++ examples. In JS I would probably use

    numbers.some(number => (number % 2) === 1)

I feel like the other C++ examples given are noisy.

eska · on Oct 1, 2021

Are you aware of ranges?

  std::ranges::any_of(numbers, [](auto i){ return i % 2 == 0; }))

There's also stuff like this for orthogonality https://en.cppreference.com/w/cpp/ranges/transform_view

kragen · on Oct 1, 2021

Yes, which is why I complained about how nonorthogonal the C++ was instead of just how verbose :)

eska · on Oct 1, 2021

See my edit. You're probably interested in views, with which you can do stuff similar to lazy functional languages

  v | std::views::reverse | std::views::drop(2);

Here's an intro https://hannes.hauswedell.net/post/2019/11/30/range_intro/

blt · on Oct 1, 2021

Sadly the C++20 ranges library does not include any facility to create views by writing Python-like generator loops (coroutine loops with "yield"). Creating new views is about as difficult as creating iterators before - i.e. too difficult.

eska · on Oct 2, 2021

Then your complaint is not orthogonality. Orthogonality means that you can use a for each loop, find_if, any_of, ranges::any_of and everything will still compose the same no matter where the data comes from (file, network, a function that advances an iterator). Your complaint seems to be that iterators instead of coroutines are used here.

tomsmeding · on Oct 1, 2021

> But we could wish for a language with the efficiency and in-the-large maintainability of C++ and the clarity and generality of Python.

What about Haskell? Maintainability yes; efficiency mostly. It has a GC that can be a pain for high-memory scenarios if you don't spend a lot of effort optimising, but on the whole Haskell is surprisingly fast for its expressivity.

Any odd? `any odd numbers`. Or equivalently, `or (map odd numbers)`, which is compositional in that `map odd numbers` is just a list again -- no distinction between lazy and strict lists in Haskell. Your other four examples are `and (map odd numbers)`, `zip numbers (map odd numbers)`, `sum (map (fromEnum . odd) numbers)`, `sum (zipWith (*) numbers (map (fromEnum . odd) numbers))`. Note I used `map odd` instead of defining `oddnesses = map odd` and using `oddnesses`, because the definition is shorter than the name. :)

This changes somewhat when you use arrays instead of lists: Haskell's lists are in terms of behaviour mostly "generators that are implicitly stored as a lazy linked list if necessary". So if you want to store stuff more compactly, or you want to index into the list (as opposed to looping over it), you need arrays. But even for arrays the story doesn't change much! Using the 'vector' library, actually all of the above examples work just fine, with no unnecessary intermediate vectors because of some intelligent fusion rules, if you put `V.` before the list-related function names.

kragen · on Oct 1, 2021

Haskell is #1 for generality but #0 for clarity, though I have to admit that some of my Python examples aren't exactly shining exemplars either. But if I write code in Haskell I can only call it from Haskell (which wasn't something I mentioned in my original comment, but is important).

I think there's an underlying tension between generality and clarity. Clarity is when the code can't do things that aren't immediately obvious upon reading it. Generality is when it can. (There's a bit of squirrely equivocation here about what kinds of "thing" is being done, and what sense of "can" is meant, which is why they're just in tension and not really opposite, but I think there's some truth too.)

tomsmeding · on Oct 3, 2021

> But if I write code in Haskell I can only call it from Haskell (which wasn't something I mentioned in my original comment, but is important).

Right, that's a downside. I believe there is a way in which you can call Haskell code from C, but I've never seen anyone use it myself.

> I think there's an underlying tension between generality and clarity.

You're on to something here. C++ has the same problem, where the syntax itself is context dependent: `a * b;` could be a multiplication if `a` is a variable, or a variable definition if `a` is a type. Also template meta-programming, and overloading and {con,des}tructors in general. (Of course, Haskell has the overloading thing even more -- though the containment of IO helps matters quite a bit in my opinion.) Golang explicitly chose to go the clarity way, and it got them some love and plenty of flak.

Maybe one should just conclude from this that Haskell is not a great language for corporate environments. On the other hand, I think it's great for small-team development due to its expressivity and generality. But these two claims do not contradict each other: not every language is good for every purpose.

Zababa · on Oct 2, 2021

> on the whole Haskell is surprisingly fast for its expressivity

I don't know if I would call it "surprisingly" fast. It is plenty fast, but it's a language that's AOT and these languages are usually quite fast, faster than most JIT'd and interpreted languages. Looking at the programming languages benchmark games and the techempower web framework benchmark, Haskell is around Java and Go. Which means that it can compete against the most advanced JIT and a reasonably good AOT, while being more expressive than both. That sounds reasonable to me. I may be underestimating what it takes to make something like Haskell fast though.

kragen · on Oct 2, 2021

I think it's surprisingly fast because it's usually much faster than things like SBCL, (last time I checked) Racket, and modern JS, and languages like Java and Golang are much less expressive.

Zababa · on Oct 2, 2021

I had SBCL around Haskell/Go/Java in my "mental model of language performance", but I don't know much about it and I'm not sure how to evaluate it. From what I understand, Racket uses a bytecode VM and/or a JIT, so Haskell being faster wouldn't surprise me.

For Java and Go, that's true, though in both cases that's mostly because of decisions rather than tradeoffs. Go doesn't have a primitive type system because it makes it faster, but mostly because that's a design choice. Same thing for Java. I'd say you can get a good part of what makes Haskell good by using a ML as the basis for your language . Lack of ad-hoc polymorphism is usually a criticism of ML and its descendants, but Go and Java both lack it too, and both added generics 10 years after the language was created, which ML supports from the beginning. Rust seem to have achieved having parametric and ad-hoc polymorphism, but I don't know how easy or hard it would be to "extract" that "part" of Rust (without the borrow checker) and basically stick a GC on it. I also don't know how performant it would be. Lastly, Go, Java and OCaml all share fast compilation times, Haskell doesn't really and Rust is known for being slow.

Again, I'm far from an expert on all of that, but I think Haskell in the right place in terms of performance and expressivity. It's one of these cases where designing a language for performance in the first place instead of retrofitting it makes things much easier, I think.

kragen · on Oct 2, 2021

> Racket uses a bytecode VM and/or a JIT, so Haskell being faster wouldn't surprise me.

They support AOT native-code compilation too, but they say the JIT generally performs better: https://download.racket-lang.org/docs/5.1/html/raco/ext.html but my experience in the past was that it didn't perform that well. Maybe it's improved.

> Go doesn't have a primitive type system because it makes it faster, but mostly because that's a design choice. Same thing for Java.

I agree: a primary goal for both languages was clarity, even at the expense of some generality. Java was very slow for a decade (HotSpot wasn't released until 01999 and wasn't very good until about 02002), but a few billion dollars and a Fortune-500 bankruptcy later, it's really pretty zippy! (As long as you don't care about startup time.) Golang, by contrast, was designed with higher performance goals, which it fell somewhat short of.

> Lack of ad-hoc polymorphism is usually a criticism of ML and its descendants, but Go and Java both lack it too

Ad-hoc polymorphism is absolutely fundamental to Java; one of its big differences from C++ was that nearly every callsite implied ad-hoc polymorphism, and that's where a lot of the optimization effort went. Golang isn't quite so radical about it, but its interface mechanism exists for almost nothing but ad-hoc polymorphism.

OCaml has ad-hoc polymorphism, FWIW, but people don't use it much.

> both added generics 10 years after the language was created

Golang was released in 02009 and still doesn't have generics, but it will this year. Not far from 10 years.

> Haskell ... It's one of these cases where designing a language for performance in the first place instead of retrofitting it makes things much easier, I think.

I wouldn't have listed Haskell as a language designed for performance. When it came out in 01990 the conventional wisdom was that garbage collection was inherently super slow, especially if you couldn't avoid allocating in your inner loops (by mutating instead).

Allen Wirfs-Brock says Tektronix Smalltalk for the 4405/4406 "may have been the first shipping commercial product, running on a off-the-self processor, to use a generational GC" http://www.wirfs-brock.com/allen/things/smalltalk-things/tek... in 01986, only four years before Haskell was designed. That was the key advance that made efficient pure functional languages feasible, and it was still too early to be sure how it would shake out. That's one reason Java (still Oak in 01992) entirely lacked implicit heap allocations: the conventional wisdom was still that heap allocation was very expensive because it required the GC to run more, so they didn't want to impose that on programmers. It was sort of a self-fulfilling prophecy, because even today the way to make your Java (or Golang) go fast is to avoid allocations in your inner loops. Not a big deal in SBCL, LuaJIT, OCaml, or pretty much anything else with a GC heavily tuned for allocation.

Other aspects of Haskell also involved a lot of performance research. Simon Peyton Jones invented the spineless tagless G-machine in the process of writing GHC, published in a paper in 01992, which was sort of a followup to his 01987 book on efficient implementation of pure functional languages. Even so, through the 01990s, Haskell still wasn't very fast. I think a lot of the stuff that makes Haskell reasonably fast is stuff that's been discovered in the last 25 years, after the basic language design was decided.

I'm also far from an expert on this stuff, though, so my picture of what happened might be missing some important aspects. I still know very little Haskell.

secondcoming · on Oct 1, 2021

> Because of Python's better orthogonality, you can factor out the lazy boolean sequence generator into a function of its own if that's a useful thing to do:

You can do the same in C++:

    static bool isEven(int x) { return x % 2 == 0; }

    bool foo(const std::vector<int>& v)
    {
        return v.end()
            != std::find_if(v.begin(), v.end(), isEven);
    }

kragen · on Oct 1, 2021

Not the same; no sequence.

quietbritishjim · on Oct 1, 2021

I disagree with the article's conclusion. The evidence in it proves that it's better not to use the specialised functions (like any_of).

It's immediately clear to me what each of the find_if snippets does, but I haven't heard of the other functions – and the evidence from the article is that the vast majority of other programmers haven't either (in fact that's out of the subset that bother to use find_if, which I would say means even less of general coders). Sure, it's fairly obvious what they do from the name, but I'd still bother to check their signature in case there's a subtlety. In other words, there may be fewer lines of code, but it's still more cognitive overhead.

In fact I'd go one step further and say it's better to use raw primitives, like for loops (range based where appropriate), where they don't need too much extra code.

leetcrew · on Oct 1, 2021

it seems like your argument boils down to "I prefer a for loop because I'm more familiar with it". if that's true of you and most of the people you work with, that's certainly a valid argument. the opposite happens to be true on my team. we understand and prefer the functions from the algorithms header in most cases.

one argument in favor of functions like any_of is that, once you get familiar with them, they provide additional documentation of intent. I'll concede this is a pretty minor benefit; most uses of any_of replace a pretty simple loop.

quietbritishjim · on Oct 2, 2021

> it seems like your argument boils down to "I prefer a for loop because I'm more familiar with it".

I don't think that's it. A lot of code ends up being calls to functions, and so other types of syntax add a bit of texture. Replacing those with function calls too reduces the code to a smooth paste. It's a bit like if the standard library had a lessThanThree(x) function; it doesn't matter how totally familiar I was with it, I'd still prefer to see x<3.

malkia · on Oct 1, 2021

Wondering if something like this can catch these - https://twitter.com/_fel1x/status/1443613096820543492

toxik · on Oct 1, 2021

Marginally more readable, but is it performing better?

secondcoming · on Oct 1, 2021

No, the codegen is the same

https://godbolt.org/z/eebE9eYP6

twoodfin · on Oct 1, 2021

Indeed, outside of corner cases where iterators are expensive to copy or compare, I’d be surprised if there were any measurable advantage beyond clarity.

dataflow · on Oct 1, 2021

[edit: whoops, never mind]

kragen · on Oct 1, 2021

There's no difference in how many times the predicate is called or with which elements.

sillysaurusx · on Oct 1, 2021

Is "using namespace std" still considered heresy? I rather liked it. It'd make a lot of these examples a little more readable.

riskneutral · on Oct 1, 2021

I don't like it, clarity is better in C++. Things are complicated enough as it is.

kragen · on Oct 1, 2021

How about `using std::find_if`?

dataflow · on Oct 1, 2021

Unqualified calls change the behavior. You end up doing ADL instead of a normal lookup and could potentially end up calling a different function than the one you intended entirely. Sometimes that's intentional (as with swap), but often it's a bug.

sillysaurusx · on Oct 1, 2021

I've been a C++ dev for longer than I care to admit. The idea that you could call the wrong function after putting "using std::string" at the top of your .cpp file doesn't reflect my experience.

You might have a point if people are putting that in header files. But even in header files, "using std::string" is often a standard thing to do in e.g. game engines. It's far easier to change out your string implementation if you're writing "string" instead of "std::string".

dataflow · on Oct 1, 2021

> using std::string

The question was about find_if, which is a function, not string.

eska · on Oct 1, 2021

Definitely don't use it in headers, because it pollutes the namespace of files including it. In .cpp files it's more subjective. You can also resort to only using it in functions, to limit it's scope to code where you make heavy use of std.

josephcsible · on Oct 1, 2021

tl;dr: if the only thing you do with the result of find_if or find_if_not is compare it to .end(), then you should use all_of, any_of, or none_of instead.