
The ups of downs of porting 50k lines of C++ to Go - logicchains
https://togototo.wordpress.com/2015/03/07/fulfilling-a-pikedream-the-ups-of-downs-of-porting-50k-lines-of-c-to-go/
======
acqq
Do read the answer of the author regarding the performance:

"The throughput of the Go program is quite competitive with the C++ one,
although the server’s IO-bound so most of the time is just spent in socket
write/read syscalls. The latency is at least an order of magnitude worse, due
to Go’s garbage collector, which is amplified by the use of an older Go
version. If the server was latency-critical I don’t think it could have been
written in Go, at least not until the new GC planned for 1.5 or 1.6 is
released (assuming we could upgrade to a newer kernel by the time it's
released)."

~~~
logicchains
Author here. Just a note that by latency-critical, I'm referring to >10
millisecond latencies. If you can tolerate occasional pauses of 400-500
milliseconds, then the GC wouldn't be a problem. Also note that the GC
slowness came from having to scan a fairly large heap (a lot of cached stuff);
it could be avoided by storing all that off-heap, but I suspect that would
complicate the code significantly.
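
To make the "large heap to scan" point a bit more concrete, here is a minimal sketch of a related trick. It is not true off-heap storage and not what the server in the post did: it assumes the cached values can be flattened to bytes and packed into one big pointer-free slab, on the understanding that the Go collector has little to trace in memory that contains no pointers (how much this helps depends on the Go version). `SlabCache` and `entry` are hypothetical names.

```go
package main

import "fmt"

// entry locates one cached value inside the shared byte slab.
// It holds no pointers, so a map of entries gives the GC nothing to follow.
type entry struct {
	off, n uint32
}

// SlabCache is a hypothetical append-only cache: all values live in one
// []byte, indexed by a pointer-free map.
type SlabCache struct {
	index map[uint64]entry
	slab  []byte
}

func NewSlabCache() *SlabCache {
	return &SlabCache{index: make(map[uint64]entry)}
}

// Put copies val into the slab and records where it landed.
// Overwriting a key leaves the old bytes behind (append-only sketch).
func (c *SlabCache) Put(key uint64, val []byte) {
	c.index[key] = entry{off: uint32(len(c.slab)), n: uint32(len(val))}
	c.slab = append(c.slab, val...)
}

// Get returns a copy of the cached value, if present.
func (c *SlabCache) Get(key uint64) ([]byte, bool) {
	e, ok := c.index[key]
	if !ok {
		return nil, false
	}
	out := make([]byte, e.n)
	copy(out, c.slab[e.off:e.off+e.n])
	return out, true
}

func main() {
	c := NewSlabCache()
	c.Put(1, []byte("hello"))
	c.Put(2, []byte("world"))
	v, ok := c.Get(2)
	fmt.Println(string(v), ok)
}
```

The cost is the one mentioned above: every value has to be serialised on the way in and copied on the way out, which is where the extra code complexity comes from.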

Finally, note that by "at least an order of magnitude worse" I'm comparing it
to hyper-optimised C++ that's designed for sub-millisecond latencies, as the
C++ server used the same framework used in latency-critical HFT software.

~~~
pron
May I ask why you've chosen Go over Java, which is becoming very popular in
the HFT industry, even for latency-critical code?

The code generation tools are better (what you call "compile time IO"), the
IDEs are much better, it has generics which you seem to miss, performance is
better (the GCs are state-of-the-art), monitoring is much better, the language
is also regular and simple, and you don't have to write inheritance-heavy code
if you don't like.

As someone who likes both Java and Go, I find it surprising that anyone would
choose the latter for long-running server code, especially where performance
matters. Go is great for quick command-line apps or very simple services, but
when you need to build an important server, Java wins out every time.

Certainly for your particular requirements and preferences, Java seems to have
all of Go's advantages and (almost) none of the disadvantages.

~~~
acqq
How do you keep the number of allocations down in Java? The GC certainly gets
slower with more allocated objects, but I seldom see a Java code example that
doesn't behave as if allocations cost nothing. More than that,
it seems that the whole language is based on that premise? As I consider the
info you already provided a good argument for Java, I hope you can provide
some good links.

~~~
rwallace
Java has a generational garbage collector, so short-lived allocations pretty
much don't cost anything, and by and large it's exactly the short-lived
allocations that are the ones you could have optimized away in e.g. hand-tuned
C++.

~~~
acqq
What about an object being always handled _only_ through the pointer? Does
that mean that the array of 10M objects is actually an array of 10M pointers
and 10M allocated objects, all of which have to be allocated, deallocated and
then traversed by the garbage collector? And what if it's not an array,
but some more complex form? Is there a clean way to group a lot of objects to
be treated by the allocator, deallocator and the GC as the single allocation
unit? I understand that some language lawyers think that's not important
("just use the 'new,' the VM should care and not you") but for somebody like
me who's used to the C level of control and actually cares and measures the
performance differences which can result in the different number of servers
needed to solve the problem, it really is.

~~~
pron
This kind of stuff matters to Java developers, too, as, perhaps surprisingly,
Java has become a high-performance language, especially when it comes to
concurrency (as it offers low-level support for memory fences, and includes
state-of-the-art implementations of concurrent data structures).

As pjmlp said, the issue of "array of structs" is being addressed in Java 10.
In the meantime, for contiguous memory allocation, you can make use of
off-heap memory (which also helps those "more complex forms"). But the flip side
is that Java's memory allocation is a lot faster than C's (i.e. in throughput
-- not latency, as there are GC pauses), and most GCs are copying collectors
that automatically arrange objects contiguously (though it's far from being
good enough for arrays of objects, as you need to follow references, and every
object carries additional overhead, which is precisely why this is being
addressed in Java 10).

------
more_original
> Go also forced me to write readable code: the language makes it impossible
> to think something like “hey, the >8=3 operator in this obscure paper on
> ouroboromorphic sapphotriplets could save me 10 lines of code, I’d better
> include it. My coworkers won’t have trouble understanding it as the meaning
> is clearly expressed in the type signature: (PrimMonad W, PoshFunctor Y,
> ReichsLens S) => W Y S ((I -> W) -> Y) -> G -> Bool”.

I like this description. It's one of the reasons why I prefer OCaml to
Haskell, but I've found it hard to verbalise.

~~~
sfk
I'm still trying to force myself to like Go. The basic problem is that Go is
_too regular_ for me, which makes it painful to read other people's code.

I don't have this problem with functional languages or C written in a free
flowing coding style like djb's.

Is it even possible in Go to have an individual style?

~~~
MetaCosm
Too regular? I gotta be honest, that comment makes very little sense to me. My
problem diving into most codebases (over 10 years spent contracting) was
always multiplied horrifically by non-idiomatic code in language X.

Bob the developer has his own ideas about what + should mean... and uses gobs
of hard to puzzle through magic all over the place. You need to run this
C++-alike code through Bob's Pre-Processor -- but it is fine, cause if QT can
do it, Bob can do it.

I want to read the code to understand the goal and the means for achieving it,
not to be impressed with your brevity or cleverness.

Go was/is developed for teams, which means to some degree to the lowest common
denominator. So far, in the last couple years, I have -- in general -- come to
see these trade-offs as generally wise. Go is exceptionally pragmatic for
working on a team.

~~~
gnuvince
> Too regular? I gotta be honest, that comment makes very little sense to me.

Not agreeing or disagreeing with the parent, but I believe he may be referring
to a common problem in software development that is best summed up by a quote
by Yaron Minsky: "You can't pay people enough to carefully debug boring
boilerplate code. I've tried."

If your Go code is just a series of `if x, err := foo(); err != nil { ... }`,
it can become easy during code review to miss subtle bugs because in one
function somewhere, someone wrote `err == nil` instead. This is not unique to
Go; many programmers feel the same way about Java.
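
A minimal sketch of how small that slip looks on the page. `parsePort` and `parsePortBuggy` are hypothetical helpers made up for illustration; the two differ by a single character in the error check.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// parsePort is the intended version: bail out when err is non-nil.
func parsePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("bad port %q: %v", s, err)
	}
	return n, nil
}

// parsePortBuggy inverts the check: valid input is rejected, and invalid
// input falls through and is returned with a nil error.
func parsePortBuggy(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err == nil { // BUG: should be err != nil
		return 0, errors.New("bad port")
	}
	return n, nil
}

func main() {
	fmt.Println(parsePort("8080"))
	fmt.Println(parsePortBuggy("8080"))
	fmt.Println(parsePortBuggy("oops"))
}
```

Both functions compile and look identical at a glance, which is the review problem being described.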

I find that one of Go's weaknesses is its limited ability to abstract common
patterns. I also believe that for some languages (e.g. Haskell, Common Lisp,
Scala) their main weakness is their apparent _unlimited_ ability to abstract
common patterns, sometimes beyond the understanding of most programmers. I'm
not smart enough to come up with (a) an objective way to measure how
little/how much a language allows abstraction, (b) a language that would hit a
sweet spot of allowing abstraction without going into the deep end.

~~~
jerf
The problem is the sweet spot _moves_ , and it can also be different between
individuals and teams. I wouldn't even be surprised to learn that the sweet
spot for a team is a lower level of abstraction than any individual team
member is capable of, partially due to communication reasons and partially
because "abstraction comfort" is not really on a line and you probably ought
to target the minimum for any given "element" of it on your team.

It's why I've said before that while I'd very much like to _work_ with Go,
it's not my _personal_ favorite. I'm comfortable with Haskell, but I would be
engaging in malpractice to put code into the source control system that
requires that level of fluency with abstraction to understand. Go has a really
solid balance for large teams. If you're currently in a startup with 5
well-chosen, high-skill engineers, you won't have any clue what I mean by that, but
when you've got hundreds of engineers who may touch some bit of code, where
most of them are trying to spend as little of their cognitive budget as
possible on it so even the very smart and very skilled ones are pretty much
just stabbing the code until it does what they want it to, you start to
appreciate a language that limits how hard they can twist the knife.

------
p0nce
> we had for instance a library for interacting with a particular service that
> was generated from an XML schema, making the code perfectly type-safe, with
> different functions for each datatype. In many languages that allow
> compile-time metaprogramming, like C++ and D, IO cannot be performed at compile
> time, so such schema-driven code generation would not be possible.

Actually this would be possible in D since you can read files at compile-time
with the "import(filename)" syntax. Then you can use compile-time parser
generators to parse it.

~~~
logicchains
I didn't realise that was possible; I've updated the post to reflect that.

~~~
p0nce
Avoiding code generation is a stated goal of the language. Too bad the hype
train swings in random directions.

~~~
logicchains
For what it's worth I'd personally have preferred to use D, but politically it
wasn't an option.

------
inglor
"“hey, the >8=3 operator in this obscure paper on ouroboromorphic
sapphotriplets could save me 10 lines of code, I’d better include it. My
coworkers won’t have trouble understanding it as the meaning is clearly
expressed in the type signature: (PrimMonad W, PoshFunctor Y, ReichsLens S) =>
W Y S ((I -> W) -> Y) -> G -> Bool”."

This was worth it - the fact that a language's idioms are "don't write _clever_
code" is extremely positive.

~~~
pjmlp
Especially when reviewing that latest code drop from the off-shoring partner
company.

------
zeekay
I've also found the lack of parametric polymorphism a huge pain point. Type
safety is constantly sacrificed to allow for code-reuse and nicer APIs,
leading to really awful code using type switching at best, inscrutable amounts
of reflection at worst. This seems to plague the Google developers as well,
just look at the Go App Engine APIs.

~~~
learc83
I ported part of a distributed system I have in production to Go about 2
years ago. I remember doing a lot of reflection trying to factor out some
common SQL code. Like this:

    
    
        func CreateRecord(record interface{}) (err error) {
            t := reflect.TypeOf(record)
            v := reflect.ValueOf(record)
    

Where I named the struct the same thing as the table it mapped to.
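
For readers who haven't seen this style, a rough sketch of where such a function tends to go. This is not the original code, which is only shown in part above; `InsertSQL`, the `User` type and the `?` placeholder style are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// User is a hypothetical record type; following the convention described
// above, the struct name doubles as the table name.
type User struct {
	Name string
	Age  int
}

// InsertSQL builds an INSERT statement and its arguments from any struct
// (or pointer to struct) by reflecting over its exported fields.
func InsertSQL(record interface{}) (string, []interface{}, error) {
	v := reflect.Indirect(reflect.ValueOf(record))
	if v.Kind() != reflect.Struct {
		return "", nil, fmt.Errorf("want struct, got %s", v.Kind())
	}
	t := v.Type()
	var cols, marks []string
	var args []interface{}
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if f.PkgPath != "" { // unexported field, skip it
			continue
		}
		cols = append(cols, f.Name)
		marks = append(marks, "?")
		args = append(args, v.Field(i).Interface())
	}
	q := fmt.Sprintf("INSERT INTO %s (%s) VALUES (%s)",
		t.Name(), strings.Join(cols, ", "), strings.Join(marks, ", "))
	return q, args, nil
}

func main() {
	q, args, err := InsertSQL(User{Name: "ann", Age: 30})
	fmt.Println(q, args, err)
}
```

Note that passing the wrong kind of value is only caught at run time, which is the type-safety trade-off being complained about.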

~~~
jerf
There are often alternatives. One of the ones I picked up from Haskell (of all
places) is having an instance of something in hand solely for its type. I
don't know what all you were about to do with that record, but in this case,
you may have been able to use:

    
    
        type Record interface {
            Create() Record
        }
    

which declares an interface of things that know how to create another copy of
themselves and return it. Well, the type doesn't guarantee that the same type
will come out but you can document that. Then you don't need reflection to
create a new instance, as long as you can get an instance of the correct type
from somewhere else. You presumably have some arguments that go in, but it's
reasonably likely (though not certain) that there's some sort of regularity
you can exploit and put into the type signature of Create() up there.

I've used this pattern myself in a generic "game server" that implements a
network protocol and manages creating "rounds" of games and other high-level
bookkeeping, where you pass it in a "prototypical" game object that provides a
"CreateNew()" method on it, thus making it so the core engine can create new
games without ever actually knowing what the game itself is.
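
A minimal sketch of that arrangement, with made-up names (`Game`, `Engine`, `Chess`); the real server is not shown here, so treat this only as an illustration of the shape.

```go
package main

import "fmt"

// Game is what the hypothetical engine needs from any game: a way to make
// a fresh instance of the same kind, plus a name for logging.
type Game interface {
	CreateNew() Game
	Name() string
}

// Chess is one concrete game; the engine never refers to it by name.
type Chess struct{ moves int }

func (c *Chess) CreateNew() Game { return &Chess{} }
func (c *Chess) Name() string    { return "chess" }

// Engine keeps a prototype purely so it can ask it for new instances.
type Engine struct {
	proto  Game
	rounds []Game
}

// StartRound creates a new game from the prototype, without reflection.
func (e *Engine) StartRound() Game {
	g := e.proto.CreateNew()
	e.rounds = append(e.rounds, g)
	return g
}

func main() {
	e := &Engine{proto: &Chess{}}
	a, b := e.StartRound(), e.StartRound()
	fmt.Println(a.Name(), b.Name(), a != b, len(e.rounds))
}
```

As noted above, nothing in the signature forces `CreateNew` to return the same concrete type; that part is convention and documentation.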

If it's good enough for Haskell it's good enough for Go.

(Haskell, a bit confusingly, calls this a Proxy, perhaps because the instance
is standing in as a proxy for the type? I was never quite sure where the name
came from.)

I've actually written quite a bit of Go now without needing reflection, and
the only "interface{}"s in the system are either A: things that legitimately
can be "anything" (on a per-instance basis, i.e., not ADTs that really want to
just have one type in them but legitimately on an object-by-object basis may
contain an "anything") or B: things I really want to say are
"encodable/decodable as JSON by the standard encoding/json library but there's
no clear way to say that in the type". The latter annoys me in theory a great
deal more than it annoys me in practice.

Mind you, I acknowledge there's a point where you'll be left with no other
options but an interface{} or reflection, but people do end up reaching for it
more quickly than is strictly speaking necessary. And it isn't necessarily a
compliment for Go that it makes you think a bit harder for this sort of thing.

------
coliveira
The lack of polymorphism and parameterized types makes Go the C of 2010s. This
practically means that in a few years we will have someone, somewhere,
creating Go++ and the story will repeat itself.

~~~
humanrebar
Go supports several types of runtime polymorphism, including interfaces and
closures. It just doesn't support polymorphism through OO-style inheritance.

C supports runtime polymorphism, for that matter, it just requires a bit of
boilerplate to set up and use function pointers and tagged dispatch.

You might be right about Go++ (ObjectiveGo?) being inevitable, though.

~~~
TylerE
The most painful situation isn't at all what you mean; it has nothing to do
with inheritance at all.

In Go, it is impossible to do "container" polymorphism. That is, write a
function that manipulates, say, an array or map of "somethings", where the
only thing it does with the somethings is assign them, read them (to return
one), or check if two "something"'s are equal to one another.

The necessity of this functionality can be seen by the fact the Go stdlib can
ACTUALLY DO THIS for the built-in functions, but no facility is provided to do
the same for user code.
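
A small sketch of what that looks like in practice for user code, assuming Go as it stood at the time of this thread (no user-defined generics). `IndexOf` and `IndexOfString` are hypothetical helpers.

```go
package main

import "fmt"

// IndexOf works for any element type, but only by erasing it: callers must
// box every element into interface{} and the compiler checks nothing.
// Comparing two values of the same uncomparable dynamic type (e.g. slices)
// with == panics at run time.
func IndexOf(xs []interface{}, target interface{}) int {
	for i, x := range xs {
		if x == target {
			return i
		}
	}
	return -1
}

// IndexOfString is the type-safe alternative: one copy per element type.
func IndexOfString(xs []string, target string) int {
	for i, x := range xs {
		if x == target {
			return i
		}
	}
	return -1
}

func main() {
	names := []string{"a", "b", "c"}

	// A []string is not a []interface{}; it has to be copied element by element.
	boxed := make([]interface{}, len(names))
	for i, n := range names {
		boxed[i] = n
	}

	fmt.Println(IndexOf(boxed, "b"))
	fmt.Println(IndexOf(boxed, 2)) // compiles, and simply never matches
	fmt.Println(IndexOfString(names, "c"))
}
```

The built-in `map`, `chan`, `append` and friends get the typed version for free; user code has to pick between the two functions above.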

~~~
Animats
_"The necessity of this functionality can be seen by the fact the Go stdlib
can ACTUALLY DO THIS for the built-in functions, but no facility is provided
to do the same for user code."_

Right. Go has generic types. Channels and maps are generic types. It just
doesn't have user defined generics. This is a lack.

Getting around this is creating a new level of cruft on top of Go. "Generate"
and reflection are being used to work around the lack of generics. It's going
to be tempting to use "generate" to invoke a macro language. That may not end
well.

------
fit2rule
>>Simple, regular syntax. When I found myself desiring to add the name of the
enclosing function to the start of every log string, an Emacs regexp
find-replace was sufficient, whereas more complex languages would require use of a
parser to achieve this.

It would be really wonderful to have a series of tutorials on this subject. It
might be a good reason for me to learn to use Emacs, anyway ..

~~~
Dewie
Another way to use search in Emacs is to go to a place that you need to
edit/insert text. Instead of using the movement commands or the mouse, use
forward or backwards search to search for the point you need to be at: like
"(str" in "(String s) ...".

I don't know if it is _faster_ for me. But it can feel more ergonomic, since
you don't have to put so much effort into going to a specific point. I'm
getting better at it, though: maybe soon it will become second-nature.

~~~
lclarkmichalek
That way also makes writing kbd macros a lot easier

------
stcredzero
_No inheritance. I’ve personally come to view inheritance-based OO as somewhat
of an antipattern in many cases, bloating and obscuring code for little
benefit_

So everywhere you would use inheritance, you use composition instead? The
stuff you'd have stuck in the superclass, you stick somewhere else and stick
in your struct?

~~~
autarch
> So everywhere you would use inheritance, you use composition instead? The
> stuff you'd have stuck in the superclass, you stick somewhere else and stick
> in your struct?

No, Go provides interfaces (aka traits or roles), which you can use to share
composable functions (aka methods).

~~~
stcredzero
_No, Go provides interfaces (aka traits or roles), which you can use to share
composable functions (aka methods)._

I wasn't talking about creating conventions/apis/facades, rather about how one
reuses code.
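
For the code-reuse side of the question, a minimal sketch assuming Go's struct embedding is the mechanism in play: the would-be superclass becomes an ordinary struct (`Base`, a made-up name) that other structs embed, and its methods are promoted onto them.

```go
package main

import "fmt"

// Base holds what would have lived in a superclass (hypothetical example).
type Base struct {
	ID int
}

// Describe is written once and reused by every type that embeds Base.
func (b Base) Describe() string {
	return fmt.Sprintf("entity #%d", b.ID)
}

// Player embeds Base: its fields and methods are promoted onto Player.
type Player struct {
	Base
	Name string
}

// Monster embeds Base too, but shadows Describe with its own version.
type Monster struct {
	Base
}

// Describe can still reach the embedded implementation explicitly.
func (m Monster) Describe() string {
	return "monster " + m.Base.Describe()
}

func main() {
	p := Player{Base: Base{ID: 1}, Name: "ann"}
	m := Monster{Base: Base{ID: 2}}
	fmt.Println(p.Describe())
	fmt.Println(m.Describe())
}
```

Unlike inheritance, `Base.Describe` never sees the outer `Player` or `Monster`, so there is no virtual dispatch back into the embedding type.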

------
tapirl
I feel many of pros and cons listed in this article are not related to the
porting at all.

btw, I think go is really not a replacement of c++. In my experience, go is
more a replacement of java to improve the development speed.

~~~
logicchains
The pros and cons are all based on what was learned from the porting process.
Unfortunately the confidential nature of the software prevents discussing it
in greater detail.

Go can be a replacement for C++ for programs that didn't need to be written in
C++. There aren't many programs like that around nowadays however, as most of
the time C++ is only used when it's really necessary, such as for extremely
latency-sensitive applications or applications requiring precise
memory/allocation control.

~~~
rakoo
I wouldn't bet on that. I would expect C++ programs to have been written ages
ago and not having been rewritten because of a combination of "it works"/"I
can't read it"/"It's not my code". The most famous C++ to Go port must be
dl.google.com (http://talks.golang.org/2013/oscon-dl.slide#1), which arguably
never needed to be written in C++ in the first place, except it was probably
the only reasonable choice at the time.

In other words: legacy.

------
hitlin37
i think in the long run, having to maintain a code that is readable helps a
lot. even though c++ itself is easy to follow, it starts to get complicated
once you get deeper and deeper in oo where everything is inherited from
something else. this is one thing i like in python modules. they are highly
readable. and then write something in Cython if its time critical. same with
Go, the code feels very clean and easy to maintain. i haven't done parametric
polymorphism in c++, so no idea about it.

~~~
ayrx
> and then write something in cpython if its time critical

I believe you mean Cython? :)

~~~
hitlin37
updated :)

------
kakakiki
"Since one of my reasons for getting into programming was the opportunity to
get paid to use Emacs, this is definitely a huge plus."

Wow! Wish I could say the same!

~~~
blt
"one of my reasons for getting into auto repair was the opportunity to get
paid to use Snap-On tools"

I feel text editor loyalty too, but this statement does feel a bit strange :)

------
xjia
Please, use Dialyzer for Erlang.

BTW, I don't know Go, but Erlang has per-process GC, so there won't be a large
heap to scan.

~~~
masklinn
Go doesn't have per-process GCs, because goroutines share memory. Structures
are not copied or moved across channels, a pointer to the structure is copied
and both sender and receiver get access to the same object in memory.
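
A minimal sketch of the difference, using a made-up `Order` type: sending a pointer shares one object between sender and receiver, while sending the struct by value hands the receiver its own copy.

```go
package main

import "fmt"

// Order is a hypothetical message type.
type Order struct{ Qty int }

// roundTrip sends a pointer through a channel and returns what the
// receiver gets: the very same pointer, not a copy of the struct.
func roundTrip(p *Order) *Order {
	ch := make(chan *Order)
	go func() { ch <- p }()
	return <-ch
}

// roundTripValue sends the struct itself, so the receiver gets a copy.
func roundTripValue(o Order) Order {
	ch := make(chan Order)
	go func() { ch <- o }()
	return <-ch
}

func main() {
	o := &Order{Qty: 1}

	got := roundTrip(o)
	got.Qty = 99 // mutates the sender's object too
	fmt.Println(o.Qty, o == got)

	v := roundTripValue(*o)
	v.Qty = 5 // only the receiver's copy changes
	fmt.Println(o.Qty, v.Qty)
}
```

Because the heap is shared like this, the runtime cannot collect each goroutine's memory independently the way Erlang does per process.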

------
Kiro
I'm a PHP programmer. Can someone explain why the lack of parametric
polymorphism is a big deal?

~~~
one-more-minute
Say you need a Vector2D type:

    
    
        type Vector2D struct {
            x, y int
        }
      

Except, hold on, I have a routine that needs floats. In a dynamic language,
I'd leave off the type hints; with a decent type system I'd parametise the
`int` type; in Go I have to reimplement the whole type:

    
    
        type Vector2DFloat struct {
            x, y float64
        }
    

Lather, rinse and repeat for complex numbers, vectors of vectors, etc. The
only way around this is (a) to use the `interface{}` type (in which case
you're just using a very verbose dynamic language) or (b) to rely on lots of
text-based code generation.

~~~
nadams
Maybe my brain hasn't started firing on all cylinders yet this morning...but
are you saying that:

If I have some function in Go (pseudocode - I don't know Go but it should get
my point across)

f(float x) { }

and call

f(vec.x)

Go can't cast it to a float?

I'm not being sarcastic or anything - I'm genuinely curious on Go and your
example.

~~~
gnuvince
No, what he means is that you would need to write these two functions:

    
    
            func DotProductInt(v1, v2 Vector2D) int {
                 return v1.x*v2.x + v1.y*v2.y
            }
    
            func DotProductFloat(v1, v2 Vector2DFloat) float64 {
                 return v1.x*v2.x + v1.y*v2.y
            }
    

If Go allowed for parametric polymorphism, the type of the elements could be
abstracted away like this (not real syntax obviously)

    
    
            type Vector2D<t> struct {
                 x, y t
            }
    
            func DotProduct(v1, v2 Vector2D<t>) t {
                 return v1.x*v2.x + v1.y*v2.y
            }
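
For comparison, Go did eventually get type parameters (in Go 1.18, well after this thread), so the made-up syntax above can now be written for real. A minimal sketch, where the `Number` constraint and the exported field names are choices made for this example:

```go
package main

import "fmt"

// Number lists the element types the vector may hold (an assumption for
// this sketch; the constraint can be widened or narrowed as needed).
type Number interface {
	~int | ~int64 | ~float32 | ~float64
}

// Vector2D is parameterised over its element type.
type Vector2D[T Number] struct {
	X, Y T
}

// DotProduct is written once and works for every T allowed by Number.
func DotProduct[T Number](v1, v2 Vector2D[T]) T {
	return v1.X*v2.X + v1.Y*v2.Y
}

func main() {
	fmt.Println(DotProduct(Vector2D[int]{1, 2}, Vector2D[int]{3, 4}))
	fmt.Println(DotProduct(Vector2D[float64]{0.5, 2}, Vector2D[float64]{4, 1}))
}
```

Mixing element types, such as a `Vector2D[int]` with a `Vector2D[float64]`, is rejected at compile time, which is the type safety the `interface{}` workaround gives up.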

------
shawn-butler
What is the fascination of software people with LOC as a code measure /
metric?

It seems indicative of nothing, not quality, especially not readability nor
maintainability.

I've never understood this apart from the very early days of punch cards and
memory/storage limitations which placed physical limitations on computation.

~~~
frostmatthew
> It seems indicative of nothing, not quality, especially not readability nor
> maintainability.

It's not meant to be indicative of any of those things (though I'd argue, all
else being equal, maintaining an application with more LOC is harder than one
with less).

It's indicative of complexity and scale. A programmer can read through an
entire 500 LOC program and will know _everything_ about the program. This
becomes much more difficult for a 50K LOC program and outright impossible for
a 5 million LOC program.

Taking a 50K LOC program and bringing it down to 10K (regardless if that's by
refactoring, removing unneeded code, or rewriting in a new language) makes it
much easier for each developer to know/understand a larger portion of the
program.

Prior to my current job I had only worked on relatively small applications
(<25K LOC) and I was blown away by the difference between working on things
like that and working on something measured in the _millions_.

[And in this specific context I doubt it would be on the front page of HN if
somebody took 500 lines of C++ and rewrote them in a hundred lines of Go, i.e.
knowing the LOC is useful to determine if this was a meaningful undertaking or
not]

------
Animats
_"It also allows parallel/async code to be written in the exact same way as
concurrent code, simply by setting GOMAXPROCS to 1."_

Aargh! If your code has race conditions with GOMAXPROCS > 1, it's broken.
"Share memory by communicating; don't communicate by sharing memory." (Ignore
the bad examples in "Ineffective Go". Send the actual data over the channel,
not a reference to it. Don't try to use channels as a locking mechanism.)

------
faragon
What's the point of using GC in high performance software? Seriously. In my
opinion, it makes no sense.

