Why Java Doesn't Need Operator Overloading (and Very Few Languages Do, Really)

chancho · on April 13, 2009

Haskell has a neat solution for dealing with operators/infix.

- Anyone can define their own operators and set their precedence.

- Any operator can be made into a normal prefix function by putting parens around it. x + y = (+) x y

- Any function can be made into an infix operator by putting backquotes around it. mod x y = x `mod` y.

Sure you get a lot of people defining goofy operators: |||, .&., >>>, etc, but there are good API search engines ( http://www.haskell.org/hoogle/ , http://holumbus.fh-wedel.de/hayoo/hayoo.html ) that will look these things up for you. They're just functions with funny names. A good API will not abuse this ability, but it can come in handy. If you're going to have infix operators at all I think this is the most elegant way to do it. Having just a few special operators is a kludge, better to go whole hog like Haskell or not at all like Lisp.

jmillikin · on April 13, 2009

Another great part about Haskell's operators is that, like other functions, they can't be "overloaded". They are defined for a typeclass (Java users: read as "interface"), and trying to import different definitions for the same operator is a compile-time error.

chancho · on April 13, 2009

Well I think you're conflating the simple operator syntax with broader issues of static typing. There's no reason you couldn't have Haskell-style operators in a Python-like language, where the definition of each operator is given by a method of its first argument.

The 'no-overloading' rule of Haskell is kind of annoying. You can import different definitions of the same function, its just that you then have to disambiguate with a prefix. Like 'List.foldl (+) 0 xs' versus 'Map.foldl (+) 0 xs'. This is made more annoying by the fact that core functions such as foldl aren't typeclass methods. AFAICT the only reason you are forced to disambiguate is because otherwise the compiler would need to use Prolog-style backtracking to typecheck your code on account of type inference. It's perfectly possible, its just that the designers didn't want to force this huge requirement on implementors. Plus a lot of people such as yourself feel that having multiple functions with the same name is a bad thing. I personally don't, as long as they have different types and the compiler is able to disambiguate which one you are referring to based on that type.

dkarl · on April 14, 2009

There's no reason you couldn't have Haskell-style operators in a Python-like language, where the definition of each operator is given by a method of its first argument.

That's how Scala does it. Scala also lets you define new operators and use infix notation for methods, which gives programmers no excuse for violating conventional operator semantics. (In C++, the limited choice of operators tempts people to commit such atrocities as using + and * to denote non-commutative operations.)

Operators like + and * seem to cry out for generic treatment a la Common Lisp, but I don't know how well that technique scales to large codebases. Java's single-dispatch model is limited, but it seems to scale acceptably.

jmillikin · on April 13, 2009

Python, by necessity, is more Haskell-ish. I was primarily considering Haskell as opposed to C++, where multiple functions share a name and differ by parameter types. This sort of situation is a true nightmare to debug, and I believe is responsible for some of the backlash against user-defined operators in general.

chancho · on April 13, 2009

I don't see how Haskell typeclass methods are any different from C++ overloaded functions in this regard. In both languages you refer to a concrete function by its signature rather than its name. Eg in Haskell you'd have "(+) :: Int -> Int -> Int" or "(+) :: Float -> Float -> Float". You have this additional way to refer to the abstract version, "Num a => (+) :: a -> a -> a" but I don't see how that helps in debugging, other than to simultaneously set breakpoints on all the concrete functions which constitute the abstract method. Also C++0x concepts bring these two frameworks closer together.

On the larger issue of free-for-all overloading I mostly agree with you: it causes unnecessary confusion. I'm comfortable with it and like it in my own personal code but I could see how you would want to avoid it in larger projects.

SamReidHughes_ · on April 13, 2009

C++ requires that all overloaded versions exist at compile time; Haskell allows them to exist at runtime. For example, in Haskell you can make a datatype like

    data Foo a = Blah a | Bar (Foo [a])

and you'll have no trouble making a function "showFoo :: Show a => Foo a -> String" using straightforward programming. If you try to do that in C++ you'll get infinite template recursion.

Also, C++ templates don't restrict you to match the right type signatures, the way Haskell typeclasses do.

But other than that, they are pretty much the same kind of thing and are used in the same way.

chancho · on April 13, 2009

You've mixed together a lot of stuff here. your Foo data type is recursive. I can achieve something similar in C++ without infinite template recursion, but I have to use pointers to do it. (Your Haskell has pointers too, they're just hidden from you.) Also there is nothing 'runtime' about the example you've got here. I can do the same thing with a tagged-union type in C/C++ and a switch statement, which is what your Haskell compiles down to. Maybe you were thinking of existential types? http://en.wikibooks.org/wiki/Haskell/Existentially_quantifie... . That's something C++ can't do.

---------

Edit: nm I see what you did there with the Foo a = ... | Blah (Foo [a]). That would probably break C++.

jmillikin · on April 13, 2009

In Haskell, you can't have `(+) :: String -> String -> String`, or `(+) :: Float -> Int -> Maybe String`. Maybe I chose a poor phrasing -- I meant that, when debugging, in Haskell it's easier to figure out which function implements the operator. In C++, I usually have to resort to a debugger.

dougp · on April 13, 2009

This is just programming language stockholm syndrome. I like my cell and my kidnappers have my best interest at heart. If I wasn't bound and gagged then I might do something dangerous.

arohner · on April 13, 2009

I've been meaning to write about this subject. During the Ruby Array#sum thing from last week, some rubyists were suggesting that adding Array#sum to the core lang would fix it. No, the problem is not that Array#sum is missing, the problem is that Ruby doesn't have a good way to safely monkey patch classes.

jmillikin · on April 13, 2009

I think a big source of confusion regarding operator overloading comes from the primary exposure to it being in C++, where it is poorly implemented. Java decided to omit it entirely, as a reaction against C++ -- this sort of "they didn't do it right, so we shouldn't do it at all" behavior is my greatest gripe against Java. The problems with operator overloading in C++ are thus:

* First, C++ severely limits the set of available operators. You can't define () to mean exponentiation, (^^) to mean boolean XOR, etc. This results in contortions to fit various useful operations into inappropriate operators. For example, anybody that's experienced C++'s mis-use of (<<) for stream insertion or (+) for sequence concatenation is a victim of this.

An aside on concatenation: this is one of the few cases where I think PHP has it right. Arithmetic addition and sequence concatenation are entirely separate, and merging them into the same operator is a terrible idea. Think of these cases of concatenation on strings:

(1 + 2) = (2 + 1)

(1 + 1 + 1) = (2 + 1)

("1" + "2") /= ("2" + "1")

("1" + "1" + "1") /= ("2" + "1")

Concatenation behaves quite differently from addition, and shouldn't share the same operator.

* Second, there's no base classes for operator parameters in the C++ standard. Ideally, you'd have a `class Number`, a global procedure: `Number operator+ (Number a, Number b){ return a.Add(b); }`, and no need to define odd non-class procedures everywhere.

* Third, C++ culture encourages poor use of operators. The standard library itself uses bit-shift operators (<<) and (>>) for string formatting operations! Python has many of the same "freedoms" as C++, but community mores prevents these things from working their way into major libraries (excepting grandfathered-in poor choices like (+) concatenation and (%) formatting).

* Fourth, C++'s operators are used as crutches to work around other problems with the language. The most galling problem is a lack of syntax for declaring collections. There would be no need for `ostream<<` if you could have something like `cout.Write("foo = %d, bar = %d\n", vector[foo, bar])`.

Of course, most of the potential changes to C++ involve some level of integration between the language and its standard library. For various reasons, such as poor API of the stdlib itself and a desire to maintain the charade of being suitable for low-level development, the C++ designers refuse to do this.

dkarl · on April 14, 2009

Third, C++ culture encourages poor use of operators. The standard library itself uses bit-shift operators (<<) and (>>) for string formatting operations!

You can see elsewhere I slagged on C++ for its limited choice of operators, but this criticism never made sense to me. Using << and >> for stream operations isn't confusing to me. They aren't intrinsically shift operators; that's just the only way they were used in C.

Perhaps this kind of disagreement is why operator overloading is so despised. I see no reason not to use the same operator to mean two different things in two completely different contexts. I don't especially love the iostreams library, but the choice of << and >> to denote insertion and extraction seems perfectly sane to me, and obviously seems insane to you. For some reason method naming isn't so contentious.

The most galling problem is a lack of syntax for declaring collections. There would be no need for `ostream<<` if you could have something like `cout.Write("foo = %d, bar = %d\n", vector[foo, bar])`.

This example doesn't make any sense. First, the iostream designers didn't use format strings because they thought IO type-checking should be handled via the same mechanism as type-checking for other operations. This decision might have been infeasible without operator overloading, but the motivation was type safety. Second, you don't need a collection here. It isn't even helpful. If you wanted to use format strings, you could use the same function signature as printf.

russell · on April 13, 2009

== and != are why Java should have operator overloading. The Java implementation of == for objects is utterly stupid. For objects that have a natural ordering the rest of the comparison operators make sense.

I agree that you should not create nonsensical operators because you can, e.g. '*' for string replication, or '+' for sequence concatenation, although I have been guilty of that one myself.

scott_s · on April 13, 2009

For a good use of operator overloading, check out Boost.Spirit, which implements EBNF grammars directly in C++: http://spirit.sourceforge.net/

An example, from the user's guide:

  struct calculator : public grammar<calculator> {
    template <typename ScannerT>
    struct definition {
      definition(calculator const& self)
      {
        group       = '(' >> expression >> ')';
        factor      = integer | group;
        term        = factor >> *(('*' >> factor) | ('/' >> factor));
        expression  = term >> *(('+' >> term) | ('-' >> term));
      }

      rule<ScannerT> expression, term, factor, group;

      rule<ScannerT> const& start() const
      {
        return expression;
      }
    };
  };

This is nowhere near as concise as it would be in, say, OCaml, but if you're already using C++, it's useful.

frossie · on April 13, 2009

OP writes: "Of course, there are more operators that can be overloaded, such as * or /, but I find that even strong advocates of operator overloading quickly run short of arguments when trying to justify why one would even want to overload them outside of specific algebras."

Isn't that like saying "Why would you ever want a red light outside of an intersection"? I wouldn't, but that doesn't mean I want to do without red lights.

Specific algebras (if by that he means matrices, quantities that carry error quantities with them etc) are powerful and elegant ways of addressing problems and are extremely suited to operator overloading (eg. see the Perl Data Language).

Personally I have never seen + overloaded without the rest of the arithmetic operators being overloaded too, so my experience is sufficiently different that may be entirely missing the point.

iamwil · on April 13, 2009

one I saw is in Hpricot, where the "/" was overloaded to search.

(doc/"table td.country")

I guess this goes along with vim and gmail's convention to use / as the shortcut to search as well.

tomjen · on April 13, 2009

It is sort of an unofficial shortcut in UNIX. less use it as well.

yesimahuman · on April 13, 2009

My favorite use of operator overloading is /= in boost::filesystem. You can append a path to a path by doing this:

mypath /= "home"; mypath /= "mystuff";

natmaster · on April 13, 2009

Sounds like this guy is just against bad interface design. Although on that note, what does 'put' mean? How does that convey a map in any sense?

_csoo · on April 13, 2009

Put means put the value at the key location.

Like in Smalltalk:

    dictionary at: 'key' put: value.