
Experience report on a large Python-to-Go translation - psxuaw
https://gitlab.com/esr/reposurgeon/blob/master/GoNotes.adoc
======
mappu
Here's another experience report: I ported a small 1 KLOC PHP project to Go
this week (in some spare time between large C++ compile times). The primary
goal was to reduce the number of supported languages we use.

The port happened in the mechanical line-by-line way, copying each PHP file to
a *.go and fixing all the syntax. The project was small enough that automation
wasn't interesting.

I agree with the "1/3 time spent debugging the result". Another complicated
facet was the lack of insertion-order-preserving maps, which PHP applications
end up relying on heavily. The error/exception impedance mismatch was not a
problem in practice at all.
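
For what it's worth, the usual workaround is to pair a map with a slice that records insertion order. A minimal sketch (the `OrderedMap` type and its methods are invented for illustration, not from any library):

```go
package main

import "fmt"

// OrderedMap is a hypothetical sketch of an insertion-order-preserving
// string map, emulating what PHP arrays give you for free.
type OrderedMap struct {
	keys   []string
	values map[string]string
}

func NewOrderedMap() *OrderedMap {
	return &OrderedMap{values: make(map[string]string)}
}

func (m *OrderedMap) Set(k, v string) {
	if _, exists := m.values[k]; !exists {
		m.keys = append(m.keys, k) // remember first-insertion order
	}
	m.values[k] = v
}

func (m *OrderedMap) Get(k string) (string, bool) {
	v, ok := m.values[k]
	return v, ok
}

// Keys returns the keys in the order they were first inserted.
func (m *OrderedMap) Keys() []string { return m.keys }

func main() {
	m := NewOrderedMap()
	m.Set("b", "2")
	m.Set("a", "1")
	m.Set("b", "22") // updating a key keeps its original position
	fmt.Println(m.Keys())
}
```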

According to cloc, the original PHP project (excluding vendor) is 1.0 KLOC,
the resulting Go application is 1.2 KLOC. I imagined Go would have been more
verbose than this, but actually most lines remained 1:1 conversions, and the
Go standard library happened to cover a lot of small utility functions that
had to be separately written in PHP (e.g. for string suffix matching).

Another interesting point is the number of comment lines in cloc appeared to
drop dramatically, since real type annotations are much less verbose than
PHPDoc.

~~~
rob74
Small nitpick: in a statically typed language, you don't have "type
annotations", since an annotation is usually optional. If you're talking about
a function declaration, I think it's called a parameter declaration?

~~~
dpratt71
Small nitpick: some statically typed programming languages have optional type
annotations.

------
jasonpeacock
I'd like to compliment the author on the quality of this post. It's very well
written, data/example driven, fair, and educational.

Overall, a joy to read. Thank you!

~~~
redsymbol
Came here to say this. Engaging, thoughtful, truly well written... This
excellent piece is a real contribution to the body of knowledge of language
design, and I'm grateful I got to read it.

~~~
partiallyzen
I freakin' love Python, but it also feels like the community is top-notch
among languages.

------
melling
Go is probably more verbose because it’s missing list comprehensions, for
example.

It needs map, filter, and reduce to cut down line count. Swift, while probably
not as performant as Go, makes it possible to write in a more Pythonic style.

    
    
       [1,2,3,4,5,6,7,8,9].filter {$0 % 2 == 0}.map {$0 * 2}.reduce(0, +)
    
       ["550", "a", "6", "b", "42", "99", "100"].compactMap{Int($0)}.filter {$0 < 100}
    

[https://github.com/melling/SwiftCookBook/blob/master/functional.md](https://github.com/melling/SwiftCookBook/blob/master/functional.md)
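
For comparison, here is roughly what the first Swift one-liner becomes as an imperative Go loop, which illustrates the verbosity point:

```go
package main

import "fmt"

func main() {
	// Filter the even numbers, double them, and sum the result:
	// the same work as .filter{...}.map{...}.reduce(0, +) above.
	sum := 0
	for _, n := range []int{1, 2, 3, 4, 5, 6, 7, 8, 9} {
		if n%2 == 0 {
			sum += n * 2
		}
	}
	fmt.Println(sum) // 40
}
```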

~~~
ivanech
Note that this is the case because Go lacks generics, so it's not a quick or
easy fix

~~~
adjkant
I'm really hoping to see this soon:

[https://blog.golang.org/why-generics](https://blog.golang.org/why-generics)

------
hartzell
[edit: fixed links]

This was discussed recently on the go-nuts mailing list:
[https://groups.google.com/d/msg/golang-nuts/u-L7PRa2Z-w/kfUSx81PAAAJ](https://groups.google.com/d/msg/golang-nuts/u-L7PRa2Z-w/kfUSx81PAAAJ)

There was also discussion around an earlier post he made about the work:
[https://groups.google.com/d/msg/golang-nuts/WstriKt2jTA/lsZyX4hYAwAJ](https://groups.google.com/d/msg/golang-nuts/WstriKt2jTA/lsZyX4hYAwAJ)

~~~
zellyn
I'm sad that ESR has not (yet) responded to Nigel Tao's comments on the first
thread, which capture pretty much exactly my thoughts from reading the blog
post.

------
outworlder
> The problem directed the choice of Go, not the other way around. I seriously
> considered OCaml or a compiled Lisp as alternatives. I concluded that in
> either case the semantic gap between Python and the target language was so
> large that translation would be impractical. Only Go offered me any
> practical hope.

I wish they had expanded more on this point. Do they mean that rewriting in,
say, Lisp would take longer because it wouldn't be a 'port' and more like
writing a new program from scratch?

EDIT: Spoke too soon. Reading more carefully, I answered my own question.

> Python reposurgeon was 14 KLOC of dense code. At that scale, any prudent
> person in a situation like this will perform as linear and literal a
> translation as possible;

~~~
ptx
Yeah, that point doesn't seem to make a lot of sense. When people talk about
Python and Go being similar, or filling a similar niche, I gather that this
refers mostly to the short compilation time, rich standard library and ease of
use. They are quite different in other ways.

Go might also be more similar in syntax to Python than Lisp is, but the
article talks about the "semantic gap" (not the syntactic gap), which seems
much narrower between Lisp and Python than between Go and Python.

> I did examine two automated tools for Python to Go translation, but rejected
> them because I judged the generated Go code would have been a
> maintainability disaster.

Perhaps indicating that the languages are not all that similar semantically.

------
downerending
Interesting. I'd have expected more than a 50% code expansion going to Go,
maybe even 3x or 5x.

Similarly, he's using 40x speedup as a rule of thumb. I usually think of
Python as 20x slower than C.

Personally I'd be loath to convert a working Python system to Go, but it
sounds like he had good reasons. I do wonder a bit whether divide-and-conquer
or a C extension might not have worked instead.

~~~
jerf
"Interesting. I'd have expected more than a 50% code expansion going to Go,
maybe even 3x or 5x."

This has been my extensive experience as well. I wouldn't be able to use Go if
it was that much more verbose than Python. It certainly isn't as succinct as
Python by any means, but it's not the night-and-day nightmare a lot of HN
posters seem to think it is... provided you actually learn the language.

In fact, one of the questions that I've found coming up in my head is... if
you take the huge, _huge_ pile of features that Python or other languages
bring to the table that Go doesn't, and all their corresponding disadvantages
in terms of having to learn them all, and how they all interact, etc.... and
that's _all_ you get in real, production code... is it really worth it?
Because let's not mince words... it's a long list of features, all of which
superficially seem awesome. And I can craft one-liners that would be a dozen
lines or more of Go... but usually those one-liners have a lot of single-
letter variables in them to focus on all the awesome syntax and features. When
I get into real code with real variable names, the advantage fades fast.

I find this to be food for thought. I still haven't fully integrated it into my
worldview. But I can definitely say I feel like the cost/benefit matrix I had
in my head even three years ago has shifted a lot. Perhaps it would be fair to
say I haven't necessarily lowered my estimate of the benefits of all the fancy
features, but my estimates of their costs have gone significantly up. A lot of
them are really cheap in the moment you're writing them down for the first
time and using them, but carry hidden long-term costs that I feel like younger
me was not accounting for properly, especially if you are not the only
developer.

~~~
stubish
You wouldn't take the _huge_ pile of features all or nothing. You get to
pick and choose. I think the author's wish list is similar to that of most
people who are experienced with more expressive languages. I don't see calls
for Python f-strings or async syntax to end up in Go. But something like a
list-comprehension syntax I think would be really popular (based on what I saw
when it was introduced to Python: originally 'why?', but now many people's
favorite feature).

Interesting how Python is now considered a huge pile of features, considering
how long it took for things like a ternary operator to end up in the language.
Maybe Go is in its Python 1.5.2 lifecycle stage ;)

~~~
jerf
"Interesting how Python is now considered a huge pile of features"

I've been tracking Python since 1.5.2 was common and 2.0 was just coming out.
For me, it's not been a problem because the new features were added
incrementally.

But I've seen new programmers try to come at it in 2019 and 2020, swallowing
what I got spread out in ~15 major releases over ~20 years all in one big
chunk, and learning even the core language now is definitely "a pile of
features", to say nothing of the various library ecosystems. I don't even mean
this necessarily as a criticism per se, just a description. It's not an easy-
to-learn language anymore, and I don't recommend that people model it as such.
It's a power tool for developers, not an easy learning language.

And, yes, the "you can pick and choose the features" is a non-starter in a
team environment. At _best_ , the team can pick and choose, and that takes a
rather strong hand and alignment to achieve. It is a very common case that
you'll just end up with what your teammates use.

~~~
downerending
That's a good point. One of the original appeals of Python is that it started
as a pedagogical language, and it shows. Even Python2 today still has most of
that feel.

Python3 on the other hand, just isn't. You're pretty much required to
constantly deal with Unicode and its related issues, and this is a burden for
beginners. In Python2, you can kick that can down the road almost
indefinitely.

~~~
ptx
That hasn't been my experience. In Python 2 your program would appear to work
fine initially, but the first time your data happens to contain a non-ASCII
character you would get a mysterious UnicodeDecodeError somewhere far away
from where the actual problem is.

In Python 3 it just works again, like it did in Python 1.5, before the whole
str/unicode implicit-conversion mess was introduced in Python 2.

------
DougBTX
There would probably be a much longer list of issues if ESR had converted to
Rust instead, but the syntax for error returns is quite interesting. Rust and
Go both opt not to have exceptions; instead they use error return values.

The original Python code using exceptions was:

    
    
        sink = transform3(transform2(transform1(source)))
    

Making that use error return values looks quite verbose in Go, but Rust has
syntax specifically for that case, making it quite manageable:

    
    
        sink = transform3(transform2(transform1(source)?)?)?
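
For concreteness, here is a sketch of what that pipeline tends to become in Go, with stub `transform` functions standing in for the article's (the bodies here are invented):

```go
package main

import "fmt"

// Stub transforms standing in for the original pipeline's stages;
// each returns (value, error) in the usual Go style.
func transform1(s string) (string, error) { return s + ">1", nil }
func transform2(s string) (string, error) { return s + ">2", nil }
func transform3(s string) (string, error) { return s + ">3", nil }

// pipeline is the one-line Python expression unrolled into the
// explicit check-after-every-call form Go requires.
func pipeline(source string) (string, error) {
	v1, err := transform1(source)
	if err != nil {
		return "", err
	}
	v2, err := transform2(v1)
	if err != nil {
		return "", err
	}
	sink, err := transform3(v2)
	if err != nil {
		return "", err
	}
	return sink, nil
}

func main() {
	sink, err := pipeline("src")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(sink) // src>1>2>3
}
```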

~~~
devmop
It's ugly either way, but for clarity you can move the error checks into the
transformations, leaving the call point clean.

i.e. transform1 returns (result, error) and transform2 accepts (result, error)
and short-circuits if err is not nil.

It allows the expressive and succinct description of a transformation list but
with a bunch of messiness hidden.
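
A minimal sketch of that short-circuiting pattern (the stage functions `nonEmpty` and `upper` are invented for illustration):

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Each stage accepts the previous stage's (value, error) pair and
// short-circuits when the error is already set, so the call site
// stays a clean nested expression.
func nonEmpty(s string, err error) (string, error) {
	if err != nil {
		return "", err // propagate without doing work
	}
	if s == "" {
		return "", errors.New("empty input")
	}
	return s, nil
}

func upper(s string, err error) (string, error) {
	if err != nil {
		return "", err
	}
	return strings.ToUpper(s), nil
}

func main() {
	// The call site reads like the exception-based original.
	sink, err := upper(nonEmpty("source", nil))
	fmt.Println(sink, err) // SOURCE <nil>
}
```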

~~~
andrewzah
This isn't ideal, because it forces each successive function that takes a
Result type to add handling code in the case of an error.

fn(fn(fn()?)?)? is a bit gnarly but better than duplicate code like that, imo.

------
patrec
Adding lookbehinds to the regexp library is a terrible idea.

> The regexp implementation provided by this package is guaranteed to run in
> time linear in the size of the input.

Python's can be exponential in the worst case, because it inherits all the
non-regular "regular" expression mess (such as lookbehind and backrefs) from
Perl.
One would assume esr would have marinated in unix culture for long enough to
be aware of this.

~~~
wittekm
Some projects need performant regexes, and some honestly just don't. I agree
that keeping the base regex library linear is admirable, but it'd be nice if
they offered a well-marked thing like regex.slow_and_perl_like in the stdlib.

~~~
alexhutcheson
[https://godoc.org/github.com/glenn-brown/golang-pkg-pcre/src/pkg/pcre](https://godoc.org/github.com/glenn-brown/golang-pkg-pcre/src/pkg/pcre)

------
cik
I love reading ESR, and it's such a pleasure to see how great his writing is
in 2020. This made me hark back to The Cathedral and the Bazaar - and also
echoed my usage.

I do however find that when you have a Python project that is heavily
dependent on third-party libraries, these things get significantly larger and
more problematic. That's not really a commentary on Go, inasmuch as it's a
byproduct of the longevity of Python.

------
ageofwant
I do not get why people do these total rewrites, especially for working Python
systems. Why throw out the baby with the bathwater? Python is fundamentally a
composing toolkit. Rewrite the slow bits in C++/Rust/Go and wrap them. That's
how all major Python components like NumPy, SciPy, TensorFlow, PyTorch etc.
do it. And that's a major reason why Python dominates today.

Align with the core strengths of Python's philosophy and its toolset and get
the benefit. Why fight it?

~~~
pdonis
_> Rewrite the slow bits in C++/Rust/Go and wrap it._

For reposurgeon, you can't. Not the author, but I have done some hacking on
it. "The slow bits" are not things you can rewrite in a different language and
wrap. They are much too integral to the code.

 _> That's how all major Python components like Numpy, Scipy, Tensorflow,
PyTorch etc. does it._

And what all of these have in common is that the "slow bits" are _not_ like
those in reposurgeon. The wrapped, sped-up things are basically fast
implementations of appropriate basic data types, like Numpy arrays and
vectors. Those aren't the kinds of things that are slow in reposurgeon.

~~~
ptx
So which parts _are_ the slow bits of reposurgeon? ESR seemed to be
saying[1] that the last time he tried profiling it was seven years ago.

[1]
[http://esr.ibiblio.org/?p=8161&cpage=1#comment-2065946](http://esr.ibiblio.org/?p=8161&cpage=1#comment-2065946)

------
3fe9a03ccd14ca5
> _The main barrier to translation was that, while at 14KLOC of Python
> reposurgeon was not especially large, the code is very dense._

Reasoning about somebody else’s dense code is probably one of my least
favorite activities. When I hear about a language being “expressive” or having
“flexible syntax”, I shudder.

~~~
hibbelig
I think the author was reasoning about his own dense code, though.

------
mistrial9
significant mastery ahead!

This is a success story and a teaching document.

.. have to point to this : "Now that I’ve seen Go strings… holy hell, Python 3
unicode strings sure look like a nasty botch in retrospect. " (!)

~~~
AnimalMuppet
I wish he had been a bit more explicit there, TBH. What's better about Go
strings?

~~~
Izkata
Skimming over a post from a quick search [0], it looks like Go strictly uses
byte strings and manipulates them with various keywords and functions. No
encoding/decoding between byte strings (and picking an encoding) and unicode
strings like in python.

[0] [https://blog.golang.org/strings](https://blog.golang.org/strings)

~~~
rauhl
I think that’s not really the right way to put it. Rather, Go offers UTF-8
strings and byte slices, with a simple conversion in either direction, and
various keywords and functions that DTRT for each. One still must worry about
encoding & decoding if one doesn’t want UTF-8, and one must worry about
invalid UTF-8 in a byte slice when converting, but in general Go does what one
would expect with a minimum of fuss.

~~~
0xjnml
Go strings are just sequences of bytes in no particular encoding, i.e. they
can contain arbitrary data. Converting strings to byte slices and vice versa
always works, without ever changing a single bit.
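
A small demonstration of that guarantee:

```go
package main

import "fmt"

func main() {
	// A Go string can hold arbitrary bytes, including invalid UTF-8;
	// converting to []byte and back never alters a single bit.
	s := string([]byte{0xff, 0xfe, 0x00, 'a'})
	b := []byte(s)
	fmt.Println(len(s), s == string(b)) // 4 true
}
```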

------
xiaodai
If it was too slow in Python and is now moving to Go, could there be a time
when it needs to move to Rust/C/C++ for even faster performance? Go seems an
odd choice based on performance considerations alone.

~~~
tejinderss
Especially with node.js/TypeScript, one can reach similar performance to Go
with an arguably much nicer programming language to work with.

~~~
FAKEDETECTOR
Evidence, please.

------
jrockway
Pretty interesting. It is scary to make porting 14,000 lines of code your
"learn a new language" task, but with that in mind, this all seems to have
gone well. Some random thoughts:

> I had to write my own set-of-int and set-of-string classes

map[int]struct{}, map[string]struct{}

    
    
       ints[42] = struct{}{} // insert
       delete(ints, 42) // delete 
       for i := range ints { ... } // iterate
       if _, ok := ints[42]; ok { ... } // exists?
    

> Catchable exceptions require silly contortions

I am not sure why Go has panic/recover, but it's not something to use. panic
means "programming error", recover means "well, the code is broken but I'm
just a humble generic web framework and I guess maybe the next request won't
be broken, so let's keep running". It is absolutely not for things like
"timeout waiting for webserver" or "no rows in the database" as other
languages use exceptions for. For those, you return an error and either wrap
it with fmt.Errorf("waiting for webserver: %w", err) or check it with
errors.Is and move on. Yup, you have to remember to do that or your program
will run weirdly. It's just how it is. There is not something better that
maybe with some experimentation you will figure out. You have to just do the
tedious, boring, and simple thing.

I have used recover exactly once in my career. I wrote a program that accepted
mini-programs (more like rules) via a config file that could be reloaded at
runtime. We tried to prove them safe, but recover was there so that we could
disable the faulty rule and keep running in the event some sort of null
pointer snuck in. (I don't think one ever did!)

> Pass-by-reference vs. pass-by-value

I feel like the author wants []*Type instead of []Type here.
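
A small example of the difference the author may have run into (the `Event` type is hypothetical):

```go
package main

import "fmt"

type Event struct{ N int }

func main() {
	// With []Event, range yields copies: mutating the loop variable
	// does not touch the slice elements.
	vals := []Event{{1}, {2}}
	for _, e := range vals {
		e.N *= 10 // mutates a copy only
	}

	// With []*Event, mutations are visible through the pointer.
	ptrs := []*Event{{1}, {2}}
	for _, e := range ptrs {
		e.N *= 10
	}
	fmt.Println(vals[0].N, ptrs[0].N) // 1 10
}
```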

> Absence of sum/discriminated-union types

True. Depending on what your goals are, there are many possibilities:

    
    
       type IntOrString struct { i int; s string; iok, sok bool }
       func (v IntOrString) String() (string, error) {
           if v.sok {
               return v.s, nil
           }
           return "", errors.New("is not a string")
       }
       func NewInt(x int) IntOrString { return IntOrString{i: x, iok: true} }
       ...
    
    

This uses more memory than interface{}, but it's also very clear what you
intend for this thing to be.

I will also point out that switch can bind the value for you:

    
    
       switch x := foo.(type) {
       case int:
          return x + 1
       case string:
          i, _ := strconv.Atoi(x) // error ignored for brevity
          return i + 1
       }
    

And that you need not name types merely to call methods on them:

    
    
       if x, ok := foo.(interface { IntValue() int }); ok { return x.IntValue() }
    
    

You can also go crazy like the proto compiler does for implementations of
"oneof" and have infinite flexibility. It is not very ergonomic, but it is
reliable.

> Keyword arguments
    
    
       type Point struct { X, Y float64 }
    
       func EuclideanDistance(a, b Point) float64 { ... }
    
       EuclideanDistance(Point{X: 1, Y: 2}, Point{3, 4})
    

> No map over slices

This one is like returning errors. You will press a lot of buttons on your
keyboard. It is how it is.

I personally hate typing the average "simple" for loop:

    
    
       func fooToInterface(foos []foo) []interface{} {
           var result []interface{}
           for _, f := range foos {
               result = append(result, f)
           }
           return result
       }
    

But it's also not that hard. I used to be a Python readability reviewer at
Google. I always had the hardest time reading people's very aggressive list
comprehensions. It was like they HAD to get their entire program into one line
of code, or people wouldn't think they were smart. The result was that the
line became a black box; nobody would read it, it was just assumed to work.

I really like seeing the word "for" twice when you're iterating over two
things.

~~~
zbentley
I feel about list comprehensions the way I feel about regular expressions.
Below a certain point of complexity, both are vastly superior ways of
expressing what's going on. Above that point, comprehensibility drops off
_fast_ , and they immediately become inferior tools.

For example, something like:

    
    
        [some_func(y) for x, y in some_dict.items() if some_condition(x)]
    

...is, at least in my mind, eminently more readable than the imperative
equivalent, and less PEBCAK-risky (i.e. what if you have to do two similar
iterations and forget to use a different accumulator between the two?).

However, I totally agree with you re: "don't get too clever". Pretty much the
instant you have nested comprehensions, or try to get clever iterating over
multiple data structures in a single expression, it immediately becomes _much_
worse than the imperative form, and you should feel a little bashful and bust
out some loops, functions, and accumulators.

Same is true for regex. Something like:

    
    
        ($match) =~ qr/\A(?:foo|bar)[.]com (\d+)-baz$/
    

...while it does require you to know regex, is _way_ simpler and more robust
than writing the equivalent stack of many string-slicing conditions, and less
error-prone. However, once the "too clever" Rubicon is crossed (subjective,
but my personal rule of thumb is: more than 50 chars of non-literals or more
than one lookaround expression), like list comprehensions, it rapidly becomes
_much_ harder to understand than the equivalent long, explicit, string-slicing
form.

As with many things, I think that skill in these areas is a matter of knowing
when to stop.

------
luord
I liked this summation because the migration happened for a truly valid
reason: Python _really_ was a bottleneck. Not that I expected ESR to succumb
to hype driven development, but it's nice to see for sure.

On the article itself: I just knew that error handling would have the biggest
write-up, even when the one writing was someone like ESR. Gods, the error
handling in Go is odious.

Now my obligatory opinion: If only [insert language here, Go in my current
job]'s promise of producing more maintainable code was true; the reality is
that it's just the same nigh unmaintainable hell I've found in nearly every
other project I've worked on. At least Python is nice to read, even (mostly)
when awfully written. Oh, how I miss it.

------
Insanity
The missing 'keyword arguments' could have been replaced with a struct passed
to a function, no? Unless I'm missing something from Python, in Go you could
replace this type of function:

    
    
         func f(x int, y int, c string) 
    

with something like this:

    
    
         type funcOptions struct {
              x, y int
              c    string
         }
         func f(o funcOptions) {} 
     
         f(funcOptions{x:3, y:-1, c: "hello"})
    

So the readability hit would have been more minimal.

~~~
room271
Yes, though you also have to consider default values as a possible source of
error.

~~~
Insanity
True, but you could provide default values as well if you use something like
this: [https://medium.com/@meeusdylan/go-reduce-function-parameters-19b785a87a59](https://medium.com/@meeusdylan/go-reduce-function-parameters-19b785a87a59)

You'd need extra logic to know whether a value has been set at all, though. At
which point the complexity might not outweigh simply... not using them.
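
For reference, a minimal sketch of the functional-options pattern that article describes, where defaults live in the constructor and callers override only what they need (all names here are invented for illustration):

```go
package main

import "fmt"

type config struct {
	retries int
	verbose bool
}

// An option mutates the config; each With* constructor captures
// one override.
type option func(*config)

func WithRetries(n int) option { return func(c *config) { c.retries = n } }
func WithVerbose() option      { return func(c *config) { c.verbose = true } }

func newConfig(opts ...option) config {
	c := config{retries: 3} // defaults applied before any overrides
	for _, o := range opts {
		o(&c)
	}
	return c
}

func main() {
	fmt.Println(newConfig())               // {3 false}
	fmt.Println(newConfig(WithRetries(1))) // {1 false}
}
```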

------
tanilama
One of the better reads in a long time.

The translation assistant you wrote is actually very interesting. Heavily
rule-based, but surprising to see that it actually helps at all.

But the scale of the project itself still seems pretty limited;
reimplementation could still have been an option.

Overall a good read and an interesting approach.

------
transfire
If you want a real surprise, try a rewrite in Elixir.

~~~
transfire
Hmm, down votes... Too bad, b/c I'm serious. I ported a corpus generator some
years ago from Ruby to both Go and Elixir. To my surprise, it was easier to
port to Go, but the Elixir version ran much faster.

