
My Objection to Array#sum - raganwald
http://github.com/raganwald/homoiconic/blob/master/2009-04-09/my_objection_to_sum.md#readme
======
nostrademons
Python does this - sum/any/all are standalone functions, and join() is a
member of the string class instead of the array class. Python also gets a lot
of flak for it, as it seems like every month you can see someone complaining
about why you use `"\n".join(lines)` to join an array instead of
`lines.join("\n")`. People can't seem to wrap their head around it.

Personally, I think the big mistake is to think objects should represent
things in the real world instead of adapting to the needs of the program. For
example, it's very common to need to split a class into FooLike (an
interface), FooImpl, and FooRenderer to avoid coupling specific display logic
to business objects you may want to reuse in other contexts. These have no
analogs in the real world. They're there largely to make sense out of the
different things your program needs to do with a Foo.
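A minimal Ruby sketch of that split, reusing the comment's placeholder names. Ruby has no interface keyword, so FooLike is assumed here to be a module that documents the expected method. The `name` attribute and the HTML markup are illustrative assumptions, not anything from the thread.

```ruby
# FooLike: the "interface". This module documents the expected method and
# fails loudly if an including class forgets to provide it.
module FooLike
  def name
    raise NotImplementedError, "#{self.class} must implement #name"
  end
end

# FooImpl: the business object, with no knowledge of how it is displayed.
class FooImpl
  include FooLike

  attr_reader :name # defined on the class, so it takes precedence over FooLike#name

  def initialize(name)
    @name = name
  end
end

# FooRenderer: display logic that works with anything answering #name.
class FooRenderer
  def initialize(foo)
    @foo = foo
  end

  def to_html
    "<span class=\"foo\">#{@foo.name}</span>"
  end
end

puts FooRenderer.new(FooImpl.new("widget")).to_html # prints <span class="foo">widget</span>
```

The renderer only needs something that answers `name`, so a different business object or a test double can be displayed without touching FooImpl.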

~~~
jherdman
Count me as one of those people that don't get it. If I'm joining the elements
of an array together, I'm operating on the array. Therefore I _expect_ the
join() method to be on the array.

To me, Python can't seem to make up its mind with these weird (IMHO) stand-
alone methods like join(), sum(), any(), etc. That's the kind of thing I'd
expect in a functional language, not in an OO language.

~~~
jerf
The thing is, they aren't methods. They really aren't. They don't operate on
an object using the object's local data. They are generic functions that use
certain defined interfaces that many objects define.

"list.join(str)" and "str.join(list)" are _not the same_. The list.join syntax
only works for lists, but str.join takes _any sequence_, which is anything
that implements the sequence protocol, including lists, dicts, sets, trees,
heaps, iterators in general, and a limitless variety of user-defined
sequences. Once you understand that, you can see it's not a matter of picking
between two spellings of the same functionality: the two spellings offer
radically different functionality. (Putting this on string rather than it being a
free-floating function is perhaps dubious, but at least the operation does
have a sort of irreducible stringy-ness to it in a way that it does not have a
listy-ness to it.)

If you make the "join" a method of list, then you have grotesquely cut its
functionality. Now you are requiring people to implement their own join on
every other object that implements the sequence protocol, which is just silly.
(Of course if they have other needs they may still implement something else,
but why not give them the option of the default?)

This is generic-programming through duck-types, not OO. Similarly for all the
other examples you cite and quite a few more; putting them on "list" is not
"the sensible thing to do", it would be a _grave error_.

    
    
        Python 2.5.2 (r252:60911, May  7 2008, 15:19:09)
        ...
        >>> ", ".join(str(x) for x in xrange(10))
        '0, 1, 2, 3, 4, 5, 6, 7, 8, 9'
    

That's not a list in there.
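Ruby draws the line the other way: `join` is defined on Array, and the Enumerable module that ranges and enumerators share does not provide it. As a rough Ruby analogue of Python's `str.join`, here is a free-standing helper that depends only on `#each`. `join_with` is a hypothetical name, and unlike Python it converts elements with `to_s` instead of raising for non-strings.

```ruby
require "set"

# Hypothetical helper mirroring Python's separator.join(iterable).
# It only calls #each, so it works for arrays, ranges, sets, enumerators,
# and any user-defined collection that yields its elements.
def join_with(separator, items)
  result = +"" # mutable string buffer
  first = true
  items.each do |item|
    result << separator unless first
    result << item.to_s # Python would raise TypeError for non-strings instead
    first = false
  end
  result
end

join_with(", ", 0...10)                       # => "0, 1, 2, 3, 4, 5, 6, 7, 8, 9"
join_with(" ", Set["a", "b"])                 # => "a b"
join_with("|", (1..3).lazy.map { |x| x * x }) # => "1|4|9"
```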

~~~
zupatol
Well join should be on sequence instead of list then.

To me it seems unnatural to see join on string because string feels more
'basic' than a sequence, so I would expect the sequence to know about strings
rather than the opposite. But the argument against join on string is not as
strong as the argument against sum on array.
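In Ruby terms, putting join on the sequence would mean a module that any collection with an `#each` method can mix in, which is how Enumerable itself is built. A sketch under that assumption, with `SequenceJoin`, `joined`, and `Countdown` as hypothetical names:

```ruby
# Hypothetical mixin: join lives on the sequence protocol rather than on a
# concrete list class. The only requirement on the host class is #each.
module SequenceJoin
  def joined(separator)
    parts = []
    each { |item| parts << item.to_s }
    parts.join(separator) # delegate the string building to Array#join
  end
end

# A user-defined sequence that is not an Array.
class Countdown
  include SequenceJoin

  def initialize(from)
    @from = from
  end

  def each
    @from.downto(1) { |n| yield n }
    self
  end
end

Countdown.new(3).joined(" ") # => "3 2 1"
```

Here the sequence module knows about strings, which is the dependency direction zupatol finds more natural; Python's str.join takes the opposite direction.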

~~~
calambrac
This is such a non-issue. Really. The parent posted the reasoning behind the
decision, that reasoning is sound. Your disagreement is purely aesthetic,
there's simply no practical reason why the current way is wrong. Please just
accept that this is how it is, and stop complaining about it. Please?

~~~
nostrademons
That was my point when I started the thread. The Python decision is
objectively justified for technical reasons. Yet a large number of people
viscerally feel it's _wrong_. I blame human irrationality - but then,
programming languages are meant for humans, so shouldn't they accommodate our
irrationality? Otherwise, you get something like Haskell. ;-)

BTW, there are other cases where Python has gone the other way. Multiline
lambdas, for instance. The case against multiline lambdas is essentially that
they _look ugly_ - they let you embed an indentation-sensitive block in what
might be a parenthesized expression, which means you need some dangling
delimiters. But they're certainly useful, as Scheme/Lisp/OCaml/etc. have
shown.

It's similar to the compare-constants-from-the-left idiom in languages where
assignment is a valid expression (notably C). If you always write "if (NULL ==
foo)" instead of "if (foo == NULL)", you eliminate a whole class of bugs. But
I've seen very few programmers do this, because it "feels" wrong to a lot of
people.
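Ruby has the same hazard, since assignment is an expression there as well. This small Ruby sketch (the method names are hypothetical) shows the typo slipping through in the usual order, while the same typo in constant-first form, `nil = foo`, is rejected when the code is parsed:

```ruby
# Intended: `foo == nil`. The missing "=" assigns nil to foo, and the
# condition evaluates to nil (falsy), so the first branch can never run.
# Ruby may also print a warning about this line when parsing it.
def buggy_nil_check(foo)
  if (foo = nil)
    :is_nil
  else
    :not_nil
  end
end

# Constant on the left: reads the same when written correctly...
def yoda_nil_check(foo)
  nil == foo ? :is_nil : :not_nil
end

# ...and the same typo, `nil = foo`, is a syntax error instead of a bug.
def typo_rejected?
  eval("nil = foo")
  false
rescue SyntaxError
  true
end

buggy_nil_check(nil) # => :not_nil (wrong, because of the typo)
yoda_nil_check(nil)  # => :is_nil
```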

~~~
lacker
"if (NULL == foo)" and "'\n'.join(lines)" both feel wrong because you are
putting the most complicated object last. The most complicated thing should go
first so you can sooner figure out what the heck this line of code is talking
about.

I would support making the C compiler just disallow "if (x = y)".

~~~
nostrademons
OTOH, in languages that support first-class functions, it's pretty common to
put the functional argument last because then it can wrap to another line
without leaving dangling parameters:

    
    
        $.each(my_array, function(val) {
          // Do something
          // Do something else
        });
    

That's putting the most complicated object last, and feels much more natural
than:

    
    
        $.each(function(val) {
          // Do something
          // Do something else
        }, my_array);
    

I suspect it's more that English has trained us to read "noun verb object"
sentences. In an if statement, `foo` feels like the subject, and then "==
NULL" is the predicate. In an array map, the array is the subject, and the
function is the predicate. In the Python join example, the array feels like
the subject, "join" is the verb, and "," is the object, which is what makes it
seem so awkward. Ruby's array.sum feels more natural because the array is the
subject and sum is an intransitive verb.

------
avibryant
Why is [1,"two"].sum any different from 1 + "two"? That is, 1.respond_to?(:+)
is always going to be true, and yet sometimes sending the + message to a
number will give a type error.

I don't buy into this idea of interface as binary - that if you respond_to?
the message, it's always appropriate, and if you don't, it's not. Interfaces
are something you look at when you're writing the program, and so they are
interpreted by humans and can be necessarily fuzzy: "#sum will give you the
total of the elements in the array, unless they aren't homogeneous, in which
case you'll likely get an error". Fine. If you don't know enough about the
array in question, don't send #sum.

Similarly, if you don't _know_ that this is a chequing account, it would be
odd to ask it to write a cheque.

But at runtime, you send the message, and you maybe get an error. Whether this
is TypeError or NoMethodError seems entirely irrelevant.

Incidentally, my objection to Array#sum is that you don't know what to use as
the "0" element (what if the elements implement +, but aren't numbers), though
I'd be fine with it being called #numeric_sum or the like.
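Since Ruby 2.4, Array#sum has been part of core Ruby, and it behaves much as described here: every array responds to `#sum`, a mixed array fails at runtime with a TypeError, and the "0" element is only a default that can be replaced by passing an initial value. `numeric_sum` below is a hypothetical helper in the spirit of the suggested #numeric_sum, not a Ruby method.

```ruby
# Every array responds to #sum (core since Ruby 2.4)...
[1, "two"].respond_to?(:sum) # => true

# ...but a mixed array still fails at runtime: 0 + 1 works, 1 + "two" raises.
begin
  [1, "two"].sum
rescue TypeError => e
  puts "TypeError: #{e.message}"
end

# The "0" element is only a default; non-numeric elements need an explicit one.
["a", "b"].sum("")    # => "ab"
[[1], [2, 3]].sum([]) # => [1, 2, 3]

# Hypothetical #numeric_sum-style helper: reject non-numbers up front,
# with a clearer error than whatever + happens to raise.
def numeric_sum(items)
  bad = items.reject { |x| x.is_a?(Numeric) }
  raise ArgumentError, "not numbers: #{bad.inspect}" unless bad.empty?

  items.sum
end

numeric_sum([1, 2.5, 3]) # => 6.5
```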

~~~
jsf
The difference between [1, "two"].sum and 1 + "two" is that in the second case
you are passing "two" to 1's :+ method; that's an argument error, and it doesn't
depend on 1's state. In the first case you are not giving any argument to the
method, so you shouldn't get an error. From the point of view of the array's
user the array is now in an invalid state and that's not something the array
should have allowed to happen.

------
wvenable
I'm not a Ruby programmer so I have a simple question: Is it common to request
whether an object implements a particular method before you make any sort of
call? That seems considerably more fussy than strong typing of interfaces or
even the type-hinting of interfaces that PHP has.

I can't help thinking a cleaner solution is the more traditional way: Create a
subclass called NumberArray that can only hold numbers and has specific
methods for operating on them. But I guess that might not be the "Ruby-way".
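A minimal sketch of the NumberArray idea. It swaps in composition for the suggested subclass: an Array subclass would inherit many mutators (`insert`, `[]=`, `concat`, and so on) that would each need guarding, whereas a wrapper exposes only the ones it checks. The class name comes from the comment; the rest is an assumption.

```ruby
# Hypothetical NumberArray: holds only Numeric values, so #sum is always
# meaningful. It wraps a plain Array instead of subclassing it, so no
# inherited mutator can bypass the type check.
class NumberArray
  include Enumerable # map, select, etc., all built on #each below

  def initialize(*numbers)
    @items = []
    numbers.each { |n| self << n }
  end

  # The only way to add an element, and it rejects non-numbers.
  def <<(number)
    raise TypeError, "#{number.inspect} is not a Numeric" unless number.is_a?(Numeric)

    @items << number
    self
  end

  def each(&block)
    return enum_for(:each) unless block

    @items.each(&block)
    self
  end

  # Safe by construction: every element is Numeric.
  def sum
    @items.sum
  end
end

NumberArray.new(1, 2, 3).sum # => 6
```

Enumerable methods such as `map` still return a plain Array, so the numeric guarantee does not carry through transformations without extra work.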

~~~
raganwald
> Create a subclass called NumberArray that can only hold numbers and has
> specific methods for operating on them.

I can't see anything wrong with this for some cases. Another way to get there
would be to define a NumericCollection module that can be used to extend any
Array.

~~~
tjstankus
I would tend toward that idea: a module that can be included. Injecting a sum
method into the object's ghost class also maintains scope of responsibility,
but feels a bit more magic. I'd opt for less magic. At first, anyway. :)
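Both options, sketched with a hypothetical NumericCollection module. The method is named `total` only because Ruby 2.4+ already defines Array#sum on every array, which would mask the difference: `include` gives the method to every instance of a class, while `extend` places it in one object's singleton ("ghost") class.

```ruby
# Hypothetical module; #total assumes the elements can be added to 0.
module NumericCollection
  def total
    inject(0) { |acc, x| acc + x }
  end
end

# Option 1: include it in a dedicated Array subclass.
class Prices < Array
  include NumericCollection
end

Prices.new([3, 4, 5]).total # => 12

# Option 2: extend one existing array; only that object gains #total.
scores = [10, 20]
scores.extend(NumericCollection)
scores.total                   # => 30
[1, "two"].respond_to?(:total) # => false: other arrays are untouched
```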

------
davidmathers
_Not all arrays can be summed, but they all claim to respond to #sum. This is
extremely broken._

This is how I felt when I first saw Array#sum. But then I took a few hi-

But then I tried it a few times. And now I can quit anytime I want.

I think String#titlecase is correct though. Not all arrays can be summed, but
all strings can be titlecased, including part codes.

~~~
randallsquared
But does String#titlecase do the right thing for ß? :)

~~~
jeffesp
But the point isn't that String#titlecase can't do the right thing here. I
assume there is an implementation of titlecase that will work with ß. But I
can't think of an Array#sum that will work for [1, "two"].

Of course, your :) might have meant you were being funny, in which case I
totally missed your point and explained something you already know.

~~~
randallsquared
Well, there is a tenuous connection in that it's not obvious what to do for
casing of ß. I wrote some Unicode stuff for CL once, and figuring this out was
a nightmare. In any case, if you have two data types that don't make any
other sense as an addition, you could always just convert them both to
bitfields and sum that... now I'm just being silly. Don't mind me.

~~~
davidmathers
According to Wikipedia "ß".upcase is "SS", except in the case of legal
documents it remains "ß".

I checked irb and "ß".upcase is "ß", so I guess strings in ruby are of type
"german legal document."

I also just learned that the reason there's no uppercase ß is that no words
start with ß, so titlecase would never touch it.
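That irb result reflects Ruby versions of the time, whose case conversion only touched ASCII letters. Since Ruby 2.4, String#upcase and its relatives apply full Unicode case mapping, and the ASCII-only behavior is still available through the `:ascii` option:

```ruby
# Ruby 2.4+ uses full Unicode case mapping, including special cases like ß.
"ß".upcase         # => "SS"
"straße".upcase    # => "STRASSE"

# Restricting the mapping to ASCII reproduces the older irb result.
"ß".upcase(:ascii) # => "ß"
```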

------
_pius
I agree with you.

That being said, I suspect that this turns out to be a cultural debate more
than anything else. Remember, Ruby is a freedom language. Ruby could eliminate
many of the issues you raise by being strictly typed, but of course that'd
defeat the purpose.

The whole thing is a question of trust. Do you trust clients enough to give
them the syntactic sugar of Array#sum even if their arrays may not all be
strictly summable?

The answer to that question if you're writing Java is probably no. If you're a
Rubyist, however, the answer is a resounding maybe. For something like
ActiveSupport, it's probably OK. For other more hostile environments, maybe
not.

As to whether it's idiomatic Ruby to provide Array#sum, I'd argue no, even
though it's culturally acceptable.

~~~
scott_s
Disclaimer: I'm not a Rubyist.

I think saying "Ruby is a freedom language" is a cop-out for forgoing good
design. It excuses us from having to think through if the design makes sense,
because we can always throw up our hands and say "Ruby is a freedom language."
But I also think that's not an entirely true statement: Ruby doesn't let
programmers jump to any random line of code. (I did find a neat module that
implemented gotos and labels, but I think it only works at proc granularity,
not line granularity.) It doesn't let programmers muck with the stack
frames or manage their own memory.

The reason for these non-freedoms is that past experience has shown us that in
some domains, some language features are high risk, low yield. I see the
freedom of Ruby as a chance to experiment with many different ideas for
language design, and the practical value of that is the languages we design in
the future will have those lessons baked in. But by defaulting to the idea
that it's a "freedom language," we resist learning.

I say this as a non-Rubyist. I wanted to learn a dynamically typed language,
and I chose Python. I don't have the time now to delve deeply into two
languages. So I'm commenting on this as an observer, but I'm an observer who
sees the value in Ruby, and potential implications to future languages.

~~~
_pius
_I think saying "Ruby is a freedom language" is a cop-out for forgoing good
design._

I think some people could use that designation as a cop-out, but I don't think
I'm doing that above. The point I was trying to make is that Ruby, by design,
is not strictly typed and allows you to do things like metaprogramming easily.
I'm certainly not trying to be an apologist for poor design.

------
koningrobot
I see the point, but I disagree that it's a problem that needs to be solved.
What is the alternative? A "Numbers" class? And how would this look in syntax?
Numbers.new([ 0, 1, 2, 3 ])? I mean, I'm sure you could make it a special
case, but it really isn't a special case. Type inference won't cut it, and
having to specify everything explicitly is tedious (and a matter of where you
draw the line).

And what would you get if you mapped over a Numbers object? An Array or
another Numbers object? Looks like you'll have to specify the return type for
the block. Even then, is an array of numbers really always a Numbers object?

I suppose that in this case, you could use a Numbers module and (somehow) make
every array of numbers magically inherit this. But a generic, extensible way
of doing this would be really, really complex and hard to do efficiently. Does
every newly created collection have to be checked for Numbers-ness?

Really, you can go as wild as you want to, but I'd rather just be able to map,
transpose and flatten around without all of this bureaucracy. Besides that,
I'm fairly certain this is only a problem in single-dispatch languages where
methods are owned by classes.

------
tjic
Calling [1,2,3].inject(0) {|sum, ii| sum + ii } has a cleaner smell, but it's
a lot more typing.

I wonder if the solution might be to have all arrays support sum (in the
respond_to? sense) only if they can actually perform the operation.

Thus

        [1, 2, 3].respond_to?(:sum) => true
        [1, [2, 3]].respond_to?(:sum) => false

I can come up with an O(1) way to do this where we assume that an array does
respond to sum, and degrades to not supporting sum when a non numeric is
inserted ... but getting the array to support sum again if/when the non-
numeric is removed... I'd have to think for a bit to come up with something
better than O(n)...
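One way to get O(1) in both directions, offered as an assumption rather than anything from the post: keep a running count of non-numeric elements, increment it on insertion, decrement it on removal, and advertise `#sum` through `respond_to_missing?` only while the count is zero. `SummableList` and `delete_one` are hypothetical names; locating the element to remove is still a linear scan, but the summability check no longer is.

```ruby
# Hypothetical SummableList: claims to respond to #sum only while every
# element is Numeric, tracked with a running count of non-numeric elements.
class SummableList
  def initialize(*items)
    @items = []
    @non_numeric = 0 # number of elements that are not Numeric
    items.each { |item| self << item }
  end

  def <<(item)
    @non_numeric += 1 unless item.is_a?(Numeric)
    @items << item
    self
  end

  # Removes the first element equal to item, if any. Finding it is a linear
  # scan; only the summability bookkeeping is constant time.
  def delete_one(item)
    index = @items.index(item)
    return self if index.nil?

    removed = @items.delete_at(index)
    @non_numeric -= 1 unless removed.is_a?(Numeric)
    self
  end

  # respond_to?(:sum) consults this hook because no real #sum method exists.
  def respond_to_missing?(name, include_private = false)
    (name == :sum && @non_numeric.zero?) || super
  end

  def method_missing(name, *args, &block)
    if name == :sum && @non_numeric.zero?
      @items.inject(0) { |acc, x| acc + x }
    else
      super
    end
  end
end

list = SummableList.new(1, 2, 3)
list.respond_to?(:sum) # => true
list << [2, 3]
list.respond_to?(:sum) # => false; list.sum would now raise NoMethodError
list.delete_one([2, 3])
list.sum               # => 6
```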

~~~
raganwald

        [1, [2, 3]].respond_to?(:sum) => false
    

I elided a discussion about this case from the post. For some arrays, a
recursive sum is semantically valid. For other arrays a recursive sum is not
semantically valid, and it is not a simple case of knowing whether all of the
leaf elements respond to :+. So again, trying to implement this in the Array
class doesn't work because containers are implementation rather than
interface.

If you are going to use an Array class, I think the array's client is the one
responsible for knowing whether its elements can be summed and if so, whether
recursive summing is valid. The other approach is to subclass or otherwise
create special-case arrays that know about their semantically valid operations
like sum.
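A small Ruby illustration of that point, using core Array#sum (Ruby 2.4+): the same nested array supports at least three plausible "sums", and nothing in its structure says which one the caller means.

```ruby
nested = [[1, 2], [3]]

nested.flatten.sum # => 6          recursive sum of the leaf numbers
nested.sum([])     # => [1, 2, 3]  "sum" as concatenation of the inner arrays
nested.map(&:sum)  # => [3, 3]     one subtotal per inner array
# nested.sum with no argument raises TypeError: the default start value
# is 0, and 0 + [1, 2] is not defined.
```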

~~~
ken
Reminds me of Kent Pitman's article about, among other things, why there's no
generic deep-copy in Lisp: <http://www.nhplace.com/kent/PS/EQUAL.html>

His answer is similar: just because you know the structure of something
doesn't mean you know anything about its semantics.

Because of this, I'm tending to believe that adding more strong-typing to
languages is a losing battle, though I admit this raises more questions than
answers.

~~~
cbeust
> Because of this, I'm tending to believe that adding more strong-typing to
> languages is a losing battle,

I see it the opposite way.

Just because an object says that it responds to a method doesn't guarantee
that invoking said method will succeed. I'm not sure why the original poster
is so surprised by this finding.

Since this observation is valid in both statically and dynamically typed
languages, I prefer a statically typed one since at least, I don't need to
verify that the object does respond to that method before calling it.

~~~
raganwald
A question and answer that might be relevant:
[http://www.reddit.com/r/ruby/comments/8ba5g/my_objection_to_...](http://www.reddit.com/r/ruby/comments/8ba5g/my_objection_to_arraysum/c08rcll)

------
evanmoran
The method should be: array<number>.sum

The article is correct that .sum shouldn't be on an array, but sum shouldn't be
global either. This is why generics exist.

~~~
omouse
Or you can have a sum method that accepts a block/lambda/anonymous function as
an argument.

This is done in Smalltalk with detectSum.

    
    
        people detectSum: [ :person | person age ].
        items detectSum: [ :item | item price ].

~~~
jdminhbg
In Ruby, this would be:

    
    
        class Array
          def detectSum(&block)
            self.map(&block).sum
          end
        end
          
        >> [1, 2, 3].detectSum {|i| i**2}
        => 14
    

Most Ruby people though would just call #map and #sum in succession on an ad
hoc basis.

~~~
omouse
Ew, how inefficient. Here is the Smalltalk code:

    
    
        detectSum: aBlock
            "Evaluate aBlock with each of the receiver's elements as the argument. 
            Return the sum of the answers."
            | sum |
            sum := 0.
            self do: [:each | 
                sum := (aBlock value: each) + sum].  
            ^ sum
    

Anyway, my point still stands. Better to accept a block than to use a sum
method that assumes the Array consists of numbers.
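For comparison, the Smalltalk method ports directly to a single-pass Ruby method built on `inject`, which avoids the intermediate array that `map` creates. `detect_sum` is a hypothetical name, and reopening Enumerable is just a convenience for the sketch. Ruby 2.4 and later also let the core `sum` take a block, which covers the same case in one pass:

```ruby
module Enumerable
  # Hypothetical port of Smalltalk's detectSum: one pass, no temporary
  # array from #map. Like the Smalltalk original, it starts from 0.
  def detect_sum
    inject(0) { |sum, element| sum + yield(element) }
  end
end

[1, 2, 3].detect_sum { |i| i**2 } # => 14
(1..4).detect_sum { |i| i }       # => 10

# Ruby 2.4+ core equivalent: Array#sum and Enumerable#sum accept a block.
[1, 2, 3].sum { |i| i**2 }        # => 14
```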

~~~
evanmoran
I agree that using blocks/closures is a powerful and useful mechanism -- In
fact I think that they are a great implementation technique for this method.

What this comes down to for me is: what makes the best interface?

Specifically:

    
    
      myArray.sum()   
    

is much easier to read and use. In a world where we maintain our code much
longer than we spend writing it, this simplicity is very valuable.

Secondly, it is important to note that since the array uses generics it is not
necessary to check that the items are numbers, as any attempt to insert a non-
number would have thrown an error.

So in summary, I agree with you completely that array<variant>.sum should _not_
exist. But I hope you see my point that array<number>.sum _should_ exist. (If
the code guarantees that the array holds only numbers, there is no reason not
to include the sum method!)

