

Better Python APIs - ozkatz
http://ozkatz.github.com/better-python-apis.html

======
JulianWasTaken
It's true that double underscore methods are sometimes underused and that
using them can often make for a better experience, but it's also easy to be
overly clever, or to "misuse" them by using them to add semantics to an object
that is already a bit mis-designed.

The first example in the post (using __repr__) is probably the least
controversial. There are places in the official docs that explicitly recommend
defining __repr__ for every object you create. This is mostly something that I
do do (though there are exceptions of course), but one thing you do want to be
careful with is to make sure that what appears in your repr is relevant,
terse, and helpful. That means that I _will_ and do sacrifice a repr that is
eval'able (which I find to be useless) for a repr that is short and sweet. So
for the example in the article, it may or may not make sense to put
`self.objects` in there. For a "Container" class, it probably does, but don't
extend that to "let's put all our data in the repr for everything".
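
To make the distinction concrete, here is a minimal sketch. `Container` mirrors the article's idea of a class holding `objects`; `Connection` and its attributes are hypothetical, made up only to show a case where dumping all the data would hurt.

```python
class Container:
    """Hypothetical container; the contents are the interesting part."""

    def __init__(self, objects):
        self.objects = list(objects)

    def __repr__(self):
        # For a container, showing the contents is relevant and still terse.
        return f"<Container {self.objects!r}>"


class Connection:
    """Hypothetical object with state that does not belong in a repr."""

    def __init__(self, host, port, buffer):
        self.host = host
        self.port = port
        self.buffer = buffer  # potentially large; deliberately left out below

    def __repr__(self):
        # Not eval'able, but short and enough to identify the object.
        return f"<Connection {self.host}:{self.port}>"
```

Neither repr can be passed to eval, but both tell you at a glance what you are looking at in the REPL.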

__iter__ is nice, and is one of the most oddly underused methods, but rather
than using it there, I think the better thing is to not have the object
encapsulate a search, but to have the object encapsulate a YoutubeAPIClient or
whatever, and have `search` be a method that returns an iterable. This is also
kinda touchy-feely though since I guess there might be examples of __iter__
where I'd feel more comfortable with something like this (a results object).
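
Roughly what I have in mind, as a sketch only: `VideoAPIClient`, `fetch` and `fake_fetch` are hypothetical names, and the canned data stands in for a real HTTP call so the example runs offline.

```python
class VideoAPIClient:
    """Hypothetical client; `fetch` stands in for the real HTTP call."""

    def __init__(self, fetch):
        # `fetch` is an assumed callable: term -> list of raw entry dicts.
        self._fetch = fetch

    def search(self, term):
        # Returns an iterable (a generator) instead of making the client iterable.
        for entry in self._fetch(term):
            yield {"title": entry["title"], "url": entry["url"]}


def fake_fetch(term):
    # Canned data so the sketch runs without network access.
    return [{"title": f"{term} video", "url": "http://example.com/1", "extra": 0}]


client = VideoAPIClient(fake_fetch)
results = list(client.search("python"))
```

The object models the API, and each call to `search` hands back something you can loop over.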

The next one with __sub__ I don't like too much, because I don't like
encouraging slow operations with syntax sugar (besides the typechecking).

Anyways, I guess I'm nitpicking on examples, mostly. Double underscore methods
are generally underused I've found, and if you have doubts on whether you're
overreaching, you can always add a method with the same behavior you're using
them for in case anyone doesn't like how it reads.
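
A small sketch of that last suggestion, using a hypothetical `Playlist` class: the operator just delegates to a named method, so callers can pick whichever spelling reads better to them.

```python
class Playlist:
    """Hypothetical class offering both an operator and a named method."""

    def __init__(self, tracks):
        self.tracks = list(tracks)

    def without(self, track):
        # The explicit, readable spelling.
        return Playlist(t for t in self.tracks if t != track)

    def __sub__(self, track):
        # The operator simply delegates, so both spellings behave identically.
        return self.without(track)
```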

------
pjscott
Nice post overall, but I take issue with a couple of the examples. First, the
YoutubeSearch class has no reason to be a class. It wants to be a function:

    
    
        import requests

        def youtube_search(term):
            url = 'https://gdata.youtube.com/feeds/api/videos'
            params = {'q': term, 'alt': 'json', 'orderby': 'relevance', 'v': '2'}
            r = requests.get(url, params=params)
            r.raise_for_status()         # 4xx and 5xx replies raise exception
            # json() is a method (not a property) in requests 1.0 and later
            for video in r.json().get('feed').get('entry'):
                yield {
                    'title': video.get('title').get('$t'),
                    'url': video.get('link')[0].get('href'),
                }
    

The ExpressiveList's __sub__() method, as written in the article, requires
O(n*k) time to remove k things from a list of length n. By using a set of
things to remove, we can get that down to O(n+k):

    
    
        class ExpressiveList(list):
            def __sub__(self, other):
                if isinstance(other, list):
                    other_set = set(other)
                    return ExpressiveList(x for x in self if x not in other_set)
                else:
                    return ExpressiveList(x for x in self if x != other)
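
One caveat with the set-based version: `set(other)` raises TypeError if `other` contains unhashable items such as nested lists. A possible variant, offered as a sketch rather than a definitive fix, falls back to the slower scan in that case:

```python
class ExpressiveList(list):
    def __sub__(self, other):
        if isinstance(other, list):
            try:
                # Fast path: O(1) average membership tests.
                to_remove = set(other)
            except TypeError:
                # Unhashable items (e.g. nested lists): fall back to the O(n*k) scan.
                to_remove = other
            return ExpressiveList(x for x in self if x not in to_remove)
        return ExpressiveList(x for x in self if x != other)
```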

~~~
jspthrowaway2
The primary reason to make YoutubeSearch a class is, in the future,
implementing searching other video sites and taking advantage of inheritance
in the implementation. That isn't demonstrated here, but the plumbing of
"doing a request", "providing an iterator" and such could be promoted to the
base class and subclasses are only responsible for knowing their domain.
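
A rough sketch of that split. All names here (`VideoSearch`, `fetch`, `parse`, `FakeSiteSearch`) are hypothetical, and the subclass uses canned data instead of a real request:

```python
class VideoSearch:
    """Hypothetical base class owning the shared plumbing."""

    def __init__(self, term):
        self.term = term

    def fetch(self):
        # Subclasses supply the site-specific request; should return raw entries.
        raise NotImplementedError

    def parse(self, entry):
        # Subclasses map a raw entry to a common {'title', 'url'} shape.
        raise NotImplementedError

    def __iter__(self):
        # Shared: fetch on each iteration and normalise every entry.
        return (self.parse(entry) for entry in self.fetch())


class FakeSiteSearch(VideoSearch):
    """Stand-in subclass with canned data instead of a real HTTP call."""

    def fetch(self):
        return [{"name": f"{self.term} clip", "href": "http://example.com/v/1"}]

    def parse(self, entry):
        return {"title": entry["name"], "url": entry["href"]}
```

The base class provides the iterator; each subclass only knows how to talk to its own site and how to read that site's entries.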

~~~
icebraining
When you actually want to implement those other sites, you can replace the
function by a class with a __call__ method.
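
For instance (a sketch with made-up names and canned results in place of real requests), existing callers would not notice the swap:

```python
def search_v1(term):
    # Original function-style API; canned results stand in for a real request.
    return [f"site-a:{term}"]


class Search:
    """Hypothetical class whose instances are called exactly like the old function."""

    def __init__(self, site):
        self.site = site

    def __call__(self, term):
        return [f"{self.site}:{term}"]


# Callers keep writing search("cats") after the swap.
search = Search("site-a")
```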

~~~
hcarvalhoalves
There's no practical difference between a function and a class in this case.

------
krosaen
I suggest the author and readers watch the excellent pycon talk, "Stop Writing
Classes":

<http://www.youtube.com/watch?v=o9pEzgHorH0>

That said, some nice factoids in the post about making your objects readable
in the repl etc.

------
ninetax
I disagree about the operator overloading. It's kind of like a hidden function
that could have ambiguous connotations. It could confuse people because of the
association with the traditional operations.

For simple things like list addition it might be intuitive what the result
will be, but if I had a Vector class and then you were reading some code and
saw two vectors being multiplied like this

    
    
        a * b
    

well is that a dot product or a cross product? Or maybe a is a scalar and it's
scalar multiplication? If it's one of those what do the other operations get
for symbols? isn't it easier to do a.cross(b)?

If you keep it simple, a little syntactic sugar is probably fine. Nobody wants
to be writing a.add(b.subtract(c.mult(d))) for a + b - c * d.
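
Something like this is what I mean, as a sketch with a hypothetical 3-D `Vector`: keep the operator where there is only one sensible reading (addition) and use named methods for the products.

```python
class Vector:
    """Hypothetical 3-D vector with named products instead of an ambiguous `*`."""

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

    def __eq__(self, other):
        return isinstance(other, Vector) and (
            (self.x, self.y, self.z) == (other.x, other.y, other.z)
        )

    def __add__(self, other):
        # Addition has one obvious meaning, so an operator is fine here.
        return Vector(self.x + other.x, self.y + other.y, self.z + other.z)

    def dot(self, other):
        return self.x * other.x + self.y * other.y + self.z * other.z

    def cross(self, other):
        return Vector(
            self.y * other.z - self.z * other.y,
            self.z * other.x - self.x * other.z,
            self.x * other.y - self.y * other.x,
        )

    def scale(self, k):
        return Vector(self.x * k, self.y * k, self.z * k)
```

With this, `a.cross(b)`, `a.dot(b)` and `a.scale(2)` each say exactly which product is meant.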

Anyone have other thoughts on this?

~~~
tikhonj
This is why I like the ability to define your own operator symbols. I think
this is a good compromise between the C++/Python approach and the Java
approach.

The C++ approach has the problems you outlined. A single operator name (like
+) gets overloaded with far too many different--and often completely unrelated--
meanings. Particularly, this violates reasonable expectations about how
certain operators behave; for example, + should be commutative. Python's + for
lists, very clearly, isn't.
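
A quick check in Python makes the point (variable names are just for illustration):

```python
a = [1, 2]
b = [3]

# Integer addition commutes; list "addition" is concatenation and does not.
ints_commute = (1 + 2) == (2 + 1)
lists_commute = (a + b) == (b + a)
```

Here `ints_commute` is True and `lists_commute` is False, since `a + b` is `[1, 2, 3]` while `b + a` is `[3, 1, 2]`.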

The Java approach of not allowing operator overloading or custom operators is
also patently untenable. Just look at the BigInteger class! There are plenty
of other cases when using an infix symbol makes for much clearer code.

With a language like Haskell, you can just come up with new operators as you
need them. In fact, they act exactly like normal function names: the only
difference is in how they're parsed. This also means they're overloaded like
normal functions. So while there is a + operator for a bunch of different
types, it always represents some notion of addition. This feature can be
abused--it's relatively easy to write confusing operator names--but I've found
it to be a net benefit in practice. If used correctly, it makes the code much
clearer; you just have to be a little careful.

For example, in Haskell you use ++ to concatenate lists. So you can't mistake
list concatenation for addition, which is important because they behave in
very different ways.

As an aside, if you're willing to use Unicode symbols--and I think you should
be!--then you can define cross as × and write a.cross(b) as a × b. In a small
expression this does not seem like much of an advantage, but if you are doing
more complicated math it makes the code easier to follow. It's also roughly as
easy to type: with the right input mode, it's just a \times b.

~~~
mercurial
> With a language like Haskell, you can just come up with new operators as you
> need them. In fact, they act exactly like normal function names: the only
> difference is in how they're parsed. This also means they're overloaded like
> normal functions. So while there is a + operator for a bunch of different
> types, it always represents some notion of addition. This feature can be
> abused--it's relatively easy to write confusing operator names--but I've
> found it to be a net benefit in practice. If used correctly, it makes the
> code much clearer; you just have to be a little careful.

I found that in practice it is often abused and leads to code full of .+:, >>>
and other operators which certainly make a lot of sense to the author of the
library, but don't do much for intuitive understanding of a piece of code
without reading the documentation for each imported library.

~~~
tikhonj
This _can_ be a problem, but it is entirely a matter of library design. In
most cases, I haven't found this to be a problem. It helps to follow certain
conventions--for example, a name like <|> can be read as "a different version
of |" (which represents alternation).

I think it's still better than what you get with C++ and Python. I would much
rather have a relatively inscrutable <+> operator that does something vaguely
like addition than having + do multiple completely different things on
different types.

Also, for many of these operators, the clarity of having an infix version
trumps the fact that you'd have to look it up. Take >>> as an example. Even if
it was called something like next, you'd still have to read the Arrow
documentation to understand exactly what it did (it's fairly abstract). And
compare how the code would look:

    
    
        actionA >>> some complicated action >>> actionB
        next actionA (next (some complicated action) actionB)
    

Ultimately, it is impossible to design a language that does not allow any bad
code. I think having custom operators is better than the main alternatives
(C++ and Java styles).

~~~
tehwalrus
If compilers supported proper Unicode _as code_, not just in strings (like, I
believe, Go does), then you could use A ⊗ B for outer products and A ⋅ B for
inner products.

(a quick google implies that Haskell supports this
(<http://www.haskell.org/haskellwiki/Unicode-symbols>). Good!)

~~~
mercurial
You mean, Unicode symbols. Quite a few languages, including Java, support
Unicode letters as identifiers. But frankly, I'd rather stay with cryptic :+:
(see the answer to this question [1] for why)

1: <http://stackoverflow.com/questions/2793792/is-it-a-good-idea-to-use-unicode-symbols-as-java-identifiers>

~~~
tehwalrus
Go has Unicode symbols, but I meant just for operators. The reasons listed are
important some of the time, but for example when you're a bunch of Chinese
developers working on a project it seems sensible to use Chinese identifiers
in the code (especially if you all only have basic English).

In this situation, everyone is going to have a computer set up to use the
character set, so the ����� + ��� situation shouldn't arise.

In terms of confusing things at linking / runtime stage, I didn't think of
that - maybe you're right! Go gets away with it by being completely compiled
down to binary (and requiring the source of all the libraries locally, IIRC.)

~~~
mercurial
As a non-native English speaker, I can tell you it's a very bad idea to mix up
a language with English keywords but identifiers in a different language. The
disconnect between the two is awfully annoying, and the "native language
identifiers" end up being a mix of English and native language anyway (I don't
know how it is with Chinese but many languages don't have useful equivalents to
many IT terms, or they are so terrible that nobody wants to use them).

~~~
tehwalrus
Indeed - I can imagine, with so many English keywords everywhere. The code
I've seen from developers who speak other languages tends to be in English,
with comments in their language - frequent use of Google Translate to
understand what's going on! :)

------
lysol
I always consider these nice to have, because oftentimes they fall on the
implicit side of things rather than explicit. But if you explicitly state "X
is also a generator" in your documentation, then by all means implement these
methods.

