
Ruby `reject!` - dmit
http://accidentallyquadratic.tumblr.com/post/157496054437/ruby-reject
======
drodgers
I love this blog, but I'm not sure that I agree with this:

> In my mind, I’d just rather not have a reject! at all, and callers who need
> to mutate an array in-place can figure out how to do safely with respect to
> their own use cases.

We ought to have a standard version of the reject! code precisely _because_
it's so difficult to get right. Saying application developers should solve the
problem themselves just sounds like some sort of political move: 'sure they
may get the edge cases wrong, but then it will be their fault not ours'.

One of the best things about Ruby is that having helper methods for every kind
of possible operation frees application developers from re-solving these edge-
case problems and allows them to just write their business logic.

~~~
erikpukinskis
Edit: I know this sounds like I am trying to dismiss the problem. That's not
my point. I'm trying to say the problem is compounded by the choice to bundle
the solutions to several different problems. If people have criticisms I would
love to read a reply. </Edit>

> We ought to have a standard version of the reject! code precisely because
> it's so difficult to get right.

No, it's easy to get right in any specific case. It's iterating over a dang
list, it's the first thing you learn how to do as a programmer. It's only hard
to get right if you're trying to solve it for every use case with a single
interface.

Essentially: you are taking two different algorithms and trying two switch
between them behind a single interface. Meaty if-statements behind flags are
an indication this is happening. It's cool of you want that but put it in a
library and give it a GitHub per.

An example: I don't use a library to merge attributes on objects. I cut and
paste one of a few snippets or write from scratch. Sometimes you want to
mutate an object, sometimes you want a copy, sometimes you want to shallow
copy or keep a reference to the parent. I could write a nasty complicated
function that intelligently solved "the problem". But all that does is bundle
together a bunch of different solutions to different problems and make it hard
for me to understand which one it picked.

Edit 2: I could be persuaded that what I'm saying is just not Rubyish. I'm a
JavaScript programmer (after many years of Ruby) so maybe there's some truth
in that. I do think that distancing ourselves from the mechanics of list
manipulation can be bad for our code quality.

~~~
pcwalton
> No, it's easy to get right in any specific case. It's iterating over a dang
> list, it's the first thing you learn how to do as a programmer. It's only
> hard to get right if you're trying to solve it for every use case with a
> single interface.

Isn't any correct implementation of any function that retains any only the
elements that pass some predicate going to have to do basically the same thing
that reject! does, or else be slow (creating a new list instead of mutating in
place)?

I guess there's a somewhat simpler implementation that would work for some use
cases (those in which order is not important) that just swaps in the last
element instead of removing. But if maintaining order is important the
complexity is similar.

I think that what would happen if Ruby didn't include reject! is that most
programmers would just needlessly create new arrays all the time, since that's
easier to write.

~~~
STRML
In this case, perhaps mutability is the real problem. Array#filter is quite
fast in JS and never mutates the array.

~~~
rickycook
it might be quite fast, but it's never going to be as fast as a mutable
version of the same thing. there are plenty of cases where you have many
(millions?) of elements and want to remove only a few. in that case, copying
an array of many elements for the sake of immutability is quite wasteful

*edit: i say never, but if you do cool stuff with referencing array ranges so that you don't copy, but reference the unchanged parts of the previous array then sure, but that's more of a low level feature of the language (eg erlang and clojure both do this i thiiiink?) than something you'd want to implement in a library because it can be suuuuper tricky to get right

~~~
elmigranto
No one does immutability with copying, that would be a joke. There is also
nothing tricky about not copying full lists on append at userland lib, just
more advanced data structures.

Yes, it won't be as fast in modifying. But it will be a lot faster in other
operations, so it's a trade off and you chose appropriately for your use case.

Plus, it's a lot easier to read and understand "immutable code".

------
pmarreck
Shameless plug:

After years working with Ruby, I felt that "bang" methods should only be used
for methods which mutate the receiver. Many methods which _do_ do so do not
have a "!" at the end.

To that end I created a little library called "Bang All The Things" :)
[https://github.com/pmarreck/ruby-
snippets/blob/master/bang_a...](https://github.com/pmarreck/ruby-
snippets/blob/master/bang_all_the_things.rb)

I'm convinced that this simplification helps prevent bugs, at the minor cost
of changing some conventions possibly borrowed from other languages.

I have since (mostly) moved on to Elixir which does not have the "mutability
problem" at all.

~~~
steveklabnik
> I felt that "bang" methods should only be used for methods which mutate the
> receiver.

I mentioned this upthread too, but it's a bit broader than that. Consider
[http://api.rubyonrails.org/classes/ActiveRecord/FinderMethod...](http://api.rubyonrails.org/classes/ActiveRecord/FinderMethods.html#method-
i-first) vs
[http://api.rubyonrails.org/classes/ActiveRecord/FinderMethod...](http://api.rubyonrails.org/classes/ActiveRecord/FinderMethods.html#method-
i-first-21) for example. The ! version raises an exception rather than return
nil.

~~~
hawkice
I actually like the ActiveRecord convention more than destructive-to-receiver
-- it was even adopted by the elixir community (which makes even more sense
than ruby, as there really aren't destructive updates in elixir).

------
ouid
I think that perhaps the most interesting part of this article is that it
comes from a blog called 'accidentallyquadratic' with more than one entry.

~~~
jameshart
I remember coming across it before when the left-pad incident happened, and
marvelling at the fact there were already many entries there then. The problem
is now, whenever a new entry on this blog gets linked, I find myself compelled
to go back and read all the entries in it again. I can't help thinking that
over time, that process is going to become prohibitively slow.

~~~
rikkus
This should be the highest voted comment on hackernews, or we're doing
something wrong.

~~~
ouid
it's pretty elegant.

------
eco
One of the more confusing parts of the C++ standard library algorithms is
remove/remove_if. A new user would think calling remove(v.begin(), v.end(),
value_to_remove) would result in the value being removed from the
container[1]. Instead, they'll find that all it did was shift around the
ordering of the vector and leave the elements you wanted removed in place at
the end (if you read the linked article you can already see where this is
going) which is a very unintuitive result. They might look up the function
signature and notice it returns an iterator which is also surprising.

Even after discovering they have to call the container's erase method using
the result of remove and an iterator to the end of the container[2] they'd
probably just decide that remove just has a terrible design. It takes
experience and some data structures knowledge to realize why it is designed
the way it is.

The design of remove means every element that remains are only be shifted (in
the case of an array/vector), at most, one time. The naive approach results in
the problem described in the article where you are constantly shifting the
elements forward as you progress through the container.

The usability of remove is still a problem (the concept shouldn't need a
wikipedia page explaining it) but the way it was done was done for good
reasons (within the design decisions of STL at least). It's probably fixed in
the C++ ranges proposal but I haven't looked.

D's standard algorithm library remove[3] (which is very much based on
Stepanov's STL too) goes one performance step further and lets you specify the
swap strategy so that if stability isn't a requirement the number of shifts
required is the absolute minimum possible (essentially moving elements from
the end of the range into spots freed up by the items being removed).

1\. Slightly more experienced users would quickly notice that it can't remove
the value, it has no reference to the container to change its length.

2\. So common and unintuitive it gets its own wikipedia page:
[https://en.wikipedia.org/wiki/Erase%E2%80%93remove_idiom](https://en.wikipedia.org/wiki/Erase%E2%80%93remove_idiom)

3\.
[http://dlang.org/phobos/std_algorithm_mutation.html#.remove](http://dlang.org/phobos/std_algorithm_mutation.html#.remove)

~~~
wruza
My homegrown C dynamic array has remove() macro with the same purpose (I
didn't know it is in c++). Except that it takes range instead of index and
doesn't move to the end, because you checkout/delete item first, and then
remove, not vice versa. It also has insert function that does guess what. The
module takes around one screen.

Cool thing is that it is not C++, where remove is hardcore stl magic, and I
can just iterate over elements' memory and remove them in sliding ranges, so
multi-remove problem simply doesn't exist. Idk why c++ overengineers
everything. We abstract a generic vector away from our convenience only to to
store items in regular array, because it is the only fast solution for generic
cases.

------
sriharis
Why is `reject!` even a separate, and different implementation than `select!`?
Perf reasons?

Haskell's Data.List doesn't seem to even have a remove/reject. Clojure's
remove is simply a complement of filter.

~~~
yebyen
Same reason ruby has both "if" and "Unless", or capybara has .to_not as well
as .not_to... there are times when the inverse expression is just more natural
and less confusing.

Rubocop, on the other hand against what I've just said, will tell you in
almost every case that you might use Unless, or in agreement with what I think
it is that You are saying here, the preferred idiom is to not use the inverse,
and you should only say "unless" when what you meant to say is unambiguously
!if.

IOW sometimes you really mean reject

~~~
dopamean
I don't think the commenter is asking why the method exists. But more why is
it that the bug exists with reject! and not select!. Why the underlying
implementations of those two very similar methods different?

~~~
yebyen
I caught that too late to avoid the downvoting, shame on me for commenting
before I rtfa

~~~
dopamean
Ah, all good. It happens.

------
Dirlewanger
Avoiding this should be as simple as using the inverse, `select!`, and then
flipping whatever the check is in the block, right?

~~~
yebyen
I think that only half avoids it, or shifts the wrong side of the big O to
opposite of wherever it is now. select! also mutates in place.

But I'm not totally sure about that, after reading the Rubydoc for Array[1],
only reject! has the warning about mutating the array every time the block is
called, select! does not have any such warning. You might be right.

[1]: [https://ruby-doc.org/core-2.2.0/Array.html#method-i-
reject-2...](https://ruby-doc.org/core-2.2.0/Array.html#method-i-reject-21)

------
skywhopper
I can't agree with the conclusion. Sure complexity is complex, but the
solution here would seem to be better performance analysis of the standard
library rather than blindly avoiding risk. How do you know where to stop? All
built in block-accepting methods for enumerator types would seem to be equally
risky in this analysis.

~~~
rocqua
Without much knowledge at all, I'd agree with the conclusion that "block-
accepting methods for enumerator types would seem to be equally risky". It
sounds to me like having non-pure functions that can terminate at any point.
And that sounds like asking for trouble in your code.

I like my safeties in languages though.

------
tzwm
So what is the right way to do the same thing? Use `select` and get new array
included elements we want to keep instead of using `reject!`?

~~~
pmontra
That's what languages with immutable data structures do. Internally they
implement simple data structures (lists, arrays) with more complex ones not to
get a performance hit. For example, they don't want to copy every byte every
time they add or remove from a list.

In ascending order of complexity

[http://theerlangelist.blogspot.com/2013/05/working-with-
immu...](http://theerlangelist.blogspot.com/2013/05/working-with-immutable-
data.html)

[http://concurrencyfreaks.blogspot.com/2013/10/immutable-
data...](http://concurrencyfreaks.blogspot.com/2013/10/immutable-data-
structures-are-not-as.html)

[http://debasishg.blogspot.com/2010/05/grokking-functional-
da...](http://debasishg.blogspot.com/2010/05/grokking-functional-data-
structures.html)

In the case of Ruby, it's a language build on mutability so let's use mutable
data. There are already plenty of languages with immutable data to use if we
want to.

------
jerianasmith
Quite a comprehensive Post ! What makes Ruby block an essential feature of
language is it's power and elegance.

------
pjfitzgibbons
I think worth clarifying, the fix on svn r49255 is in git tag v2_3_1, so if
you're on ~2.3.1 Then you're back to linear #reject.

------
gkya
I find it weird that the case is called a "bug" multiple times. Sure, if its
o(n^2), that's a problem, but if the code works correctly at the end, it's not
a bug.

~~~
pdw
O(n^2) algorithms are insidious in that they will work fine on test data, but
will ruin performance when there's an unusually large input in production.
They're landmines in your code.

~~~
throwaway1579
You do know that the most popular general-purpose, comparison-based sorting
algorithm quicksort has a worst case complexity of O(n^2), right?

~~~
minitech
Good implementations tend to address this with introspection, randomization,
etc..

------
throwaway1579
Can't wait for the entry on quicksort.

