
Migrated WordGap from Python to Go 1 - voidlogic
http://blog.zmxv.com/2012/03/migrated-wordgap-from-python-to-go.html
======
waterside81
We're going through the same thing right now. Porting a few thousand lines of
Python to Go. It's almost a line for line port, with the odd exception here or
there. One thing I really miss going to Go from Python is using getattr or
dict.get(some_key, default_value). Also tuples. And marshalling from a JSON
blob is kind of a pain in Go.

In fact, I started making a list of things Python devs need to know about when
switching to Go. It's all a tradeoff, you lose some expressiveness for
performance.

~~~
voidlogic
"It's all a tradeoff, you lose some expressiveness for performance."

Not only do you gain performance but I think you gain maintainability and
clarity. I value that when you read Go the run-time complexity of operations
is very clear. In many other languages you often find O(n) or worse operations
that appear to many programs in such a way the uninformed assume they are O(1)
operations. In Go that does not happen- almost always what you see is what you
get. I will gladly write a few more lines of simple code to gain clarity.

~~~
wging
Can you post an example?

------
viraptor
I'd like to see the before and after code. Such difference is not impossible
between languages, but it's possible that a better implementation/algorithm in
the original code wouldn't bring a similar speedup...

~~~
jerf
This is the sort of algorithm that really beats on CPython's weaknesses.
Strings are immutable, so you're sitting there and just creating little tiny
string object after little tiny string object, themselves made out of
characters that are fairly heavyweight. You could hope that PyPy or something
could work out how to optimize it, but I'm not sure PyPy would optimize around
the constant strings.

Meanwhile, since Go has true arrays, and since you can probably just assume
we're in English only and let byte[] substitute for real unicode, you can
write very nearly what you'd write in C, albeit more safely, and not be
creating string objects by the thousands, and it wouldn't be particularly
unnatural Go code. A 12x differential is very plausible. I'd believe you could
even go higher.

This is not to say that Go is awesome or that (C)Python is a bad language in
general. I like (C)Python. But it does have some bad cases in it if you need
lots of performance, fiddly little string algorithms and lots of numeric
computation being two of the big ones. (By which I mean numeric computation in
pure CPython, not with Numeric Python or PyPy.) In both cases the sheer rate
of object creation and destruction work dwarfs the real work being done.

~~~
earthboundkid
This gist of what you said was correct, but you made a few technical errors.

First of all, Go's strings are also immutable. To get around this, you can use
the []byte type, as you said, but Go is built with the assumption that all
text is UTF-8. Frankly, I find the suggestion that ASCII is good enough, even
for Americans, absurd.

Second, Python's "list" type and Go's slice type (what you're calling a "true
array") are almost the same under the hood. Both are variable length arrays.
(I'm not sure why Python calls it a "list" since it's not a linked list.) The
only major difference is that Python's lists are always filled with pointers
to the underlying objects. In Go, you have the choice of whether you want to
use pointers to the underlying objects or to put the objects into a contiguous
block of memory.

Speaking more broadly, there's no reason you couldn't use a lot of different
Go practices in Python and end up with much more efficient Python code: pre-
allocating lists of the proper size, for example, or making sure to use
mutable buffers instead of immutable str objects. However, Go makes it natural
to be concerned with these efficiencies, and Python makes it natural to ignore
them. Each language has its strengths.

------
pjmlp
It seems that Go is killing Python and Ruby projects instead of the languages
the original authors thought of.

While in the enterprise everyone is happy using the available set of
languages, on HN we continuously see posts about migrations from Ruby/Python
to Go.

------
NateDad
And the app engine only supports a single thread for Go, imagine if you could
make it multi-threaded.

~~~
phasevar
Multi-threaded wouldn't help the case of a single request, which is what this
post is reporting latency numbers on.

~~~
Encosia
Assuming he's searching a trie to build the list of anagrams, he could
distribute a single request's work over a thread per letter.

~~~
Evbn
If he has multi core, which is often missing in a VM.

------
tansey
If WordGap was written in pure Python, without the use of any matrix libraries
like numpy/scipy, then I think you could have done better. 7x speedup is
actually pretty low for algorithms that I've rewritten to use numpy.

------
bsaul
Rewriting any code in any language pretty much always show improvements.
Unless we're sure it's not a different data structure or algorithm it doesn't
mean that much.

~~~
kkowalczyk
Except it doesn't, not when improvement is so dramatic.

If he rewrote Python in Ruby, the perf would have been more or less the same.

If he rewrote in C, he would get even better perf but at much bigger cost (in
dev time).

The point of his article is that Go is almost as easy to write in as Python
but gives you 10x perf and that's why Go is attractive. It has the right
perf/coding effort ratio.

~~~
bsaul
i understand what you're saying, but my point is that even rewriting from
python to python would show improvement. i've just coded the same project
three times in different languages last year, and my design is getting better
every time, and it has (almost) nothing to do with the language themselves. a
benchmark is a much more reliable thing if you want to compare relative speed.
or a "pet store" app. but not a "real world" project. altough that's still a
really nice experience to read about.

------
kristianp
It would have been useful if there were some implementation detail. Did he go
from hashes in python to maps in go? Or switch to maps of structs for example?

