
Comparing a web service written in Python and Go [pdf] - guai898
https://indico.cern.ch/event/449425/session/1/contribution/6/attachments/1168560/1685802/DAS_python_vs_go.pdf
======
mpdehaan2
I've been seeing a lot of Python vs Go stuff lately and I think a fair amount
of the folks involved in these are not aware of general Python web
architecture patterns.

Of course something compiled directly is going to be a bit faster, but
development time is important too. Python has more libraries and is (for many
people) probably faster to write.

Serving multiple requests is best handled by putting a preforking webserver in
front of Python, whether Apache, nginx, etc. This lets multiple requests in
without any async voodoo code. Twisted, for example, is not the right answer in
this case, because it doesn't get you multiple processes and it messes up the
way you write code (async event-driven code is more time-consuming to
write/debug).
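
To make the preforking idea concrete, here is a minimal sketch, assuming a
Unix system where `os.fork` is available. The names `prefork_server` and `ask`
and the one-line reply are made up for illustration; in practice you would let
the web server do this rather than hand-roll it.

```python
import os
import socket


def prefork_server(num_workers=2):
    """Fork num_workers children that all accept() on one listening socket.

    Hypothetical helper for illustration: each worker serves a single
    connection and exits, so the example terminates on its own.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]

    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child: plain blocking code, no event loop needed.
            try:
                conn, _addr = listener.accept()
                with conn:
                    conn.sendall(f"handled by worker {os.getpid()}\n".encode())
            finally:
                os._exit(0)  # never fall back into the parent's code path
        pids.append(pid)

    listener.close()  # the children keep their inherited copies open
    return port, pids


def ask(port):
    """Connect once and return the worker's one-line reply."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        return conn.makefile().readline().strip()
```

Each worker is ordinary synchronous code; the concurrency comes from the OS
handing incoming connections to whichever process is waiting in `accept()`.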

On the backend, your webserver does not start longrunning backend processes,
but you can launch them using things like celery, which is a process manager
that allows you to start jobs and so forth. Celery can run on any number of
machines, and your backend can scale independently of your frontend if you
wish.

Historically, some very computational parts of Python were often written with
C bindings. While I haven't done so, things like Cython may also be promising
for extensions. There's also things like ctypes for quickly just taking
advantage of native libraries in a Python function.
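
As a small illustration of the ctypes route, here is a sketch that calls the C
library's `strlen` from Python. It assumes a POSIX system; the wrapper name
`native_strlen` is made up for the example.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. find_library may return None on minimal systems;
# CDLL(None) then falls back to the symbols already loaded into the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature so ctypes converts arguments and results correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def native_strlen(data: bytes) -> int:
    """Measure a byte string with the C library's strlen."""
    return libc.strlen(data)
```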

Personally, given, I like how Go has things like channels, but I would never
adopt a programming language for just one specific feature when I lose out on
other features that are valuable to me, for instance, an object model.

(I'm also really curious to see how the typing options in Python 3 play out)

Anyway, I mostly wanted to point out as most people are doing web services
that you should be fronting Python with some sort of web server that allows
preforking, and then the concurrency issue, in my experience, becomes _not a
thing_.

Many backend libraries can easily take advantage of libs like multiprocessing,
which is not 100% friendly in its more complex IPC-type cases, but is pretty
workable.
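
A rough sketch of what that looks like with the standard `multiprocessing`
module, including a bit of the IPC plumbing. It assumes a Unix system (the
"fork" start method is requested explicitly), and `sum_of_squares` and
`run_in_processes` are hypothetical names standing in for real backend work.

```python
import multiprocessing


def sum_of_squares(n, conn):
    """CPU-bound work done in a child process; the result goes back over a pipe."""
    conn.send(sum(i * i for i in range(n)))
    conn.close()


def run_in_processes(inputs):
    """Run one child process per input and collect results in input order."""
    # Assumption: Unix. "fork" is requested explicitly so the sketch does not
    # depend on the platform's default start method.
    ctx = multiprocessing.get_context("fork")
    jobs = []
    for n in inputs:
        # duplex=False returns (receive-only end, send-only end).
        parent_end, child_end = ctx.Pipe(duplex=False)
        proc = ctx.Process(target=sum_of_squares, args=(n, child_end))
        proc.start()
        child_end.close()  # the parent keeps only the receiving end
        jobs.append((proc, parent_end))

    results = []
    for proc, parent_end in jobs:
        results.append(parent_end.recv())
        proc.join()
    return results
```

Each child runs in its own interpreter, so the work is not serialized by the
GIL; the price is the explicit pipe handling.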

~~~
whyever
> Personally, given, I like how Go has things like channels, but I would never
> adopt a programming language for just one specific feature when I lose out
> on other features that are valuable to me, for instance, an object model.

That really depends on your requirements. If you need multithreading (not
multiprocessing), you cannot use Python.

~~~
dec0dedab0de
_That really depends on your requirements. If you need multithreading (not
multiprocessing), you cannot use Python._

About to show my ignorance, but when is multithreading useful when
multiprocessing is not? Assuming it is a use case that is suited for a high
level dynamic language to begin with.

~~~
Consultant32452
Let's say you have a REST request come in. In order to fulfill that request
you need to make 10 REST requests of your own to various back-end systems. If
you can make some of those requests asynchronously you'll greatly reduce your
response time. While I suppose you could mangle your way through it multi-
process, it seems like a bastardization of the model.

~~~
falcolas
If you're just sending HTTP requests, regular threading in Python works fine -
waiting for a socket response doesn't block the execution of other threads.

Python threads are real threads, and things like blocking socket IO do not
block Python execution in other threads.
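
A minimal sketch of that fan-out with plain threads. The backend call is
simulated with `time.sleep` (an assumption standing in for a blocking HTTP
request), and `call_backend` and `fan_out` are made-up names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_backend(backend_id, latency=0.1):
    """Stand-in for a blocking HTTP call.

    time.sleep releases the GIL while it waits, as a blocking socket read does.
    """
    time.sleep(latency)
    return f"response from backend {backend_id}"


def fan_out(num_backends=10):
    """Issue all backend calls at once and collect replies in request order."""
    with ThreadPoolExecutor(max_workers=num_backends) as pool:
        return list(pool.map(call_backend, range(num_backends)))
```

With ten simulated backends at 0.1 s each, the total wait should be close to
one backend's latency rather than the sum of all ten.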

------
andor
Basically their Python version ("3 thread pools, 175 threads") is synchronous
and single (OS)-threaded, while the Go rewrite uses goroutines and multiple OS
threads. The fact that their Python version takes "minutes to startup"
indicates that a rewrite was necessary anyways.

Go is a good tool for the job, Python _threads_ are not. asyncio or one of the
event-based IO frameworks should work much better.
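
For comparison, a minimal asyncio sketch of the same kind of workload: many
concurrent waits on one OS thread. The backend is simulated with
`asyncio.sleep`, and `fetch_backend` and `gather_backends` are hypothetical
names, not anything from the slides.

```python
import asyncio


async def fetch_backend(backend_id, latency=0.1):
    """Stand-in for a non-blocking backend request."""
    await asyncio.sleep(latency)
    return f"response from backend {backend_id}"


async def gather_backends(num_backends=10):
    """Run all requests concurrently on one event loop, preserving order."""
    return await asyncio.gather(
        *(fetch_backend(i) for i in range(num_backends))
    )
```

Run it with `asyncio.run(gather_backends())`.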

As for the problem of sharing data between processes (slide 5): it appears
that this service is read only? If that's true, what do you need to share?
Every process can have its own connection pool. You don't even need
multiprocessing, just use SO_REUSEPORT and start your application multiple
times.
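
A sketch of the SO_REUSEPORT approach, assuming Linux 3.9+ or another platform
whose `socket` module defines `SO_REUSEPORT`. The helper name
`reuseport_listener` is made up; every copy of the application would open its
own listener like this on the same port.

```python
import socket


def reuseport_listener(port=0):
    """Open a listening TCP socket with SO_REUSEPORT set.

    Assumption: the platform defines socket.SO_REUSEPORT (e.g. Linux 3.9+).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind() on every socket that shares the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(("127.0.0.1", port))
    sock.listen(16)
    return sock
```

On Linux the kernel then distributes incoming connections across the
listeners, so no shared state between the processes is needed.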

~~~
mtanski
You could probably get decent performance for a similar application written in
another language (than Python) using 175 threads. 175 threads is not that big
of a deal; the OS can manage it pretty well. It's only when you start talking
about thousands of individual connections and thousands of threads that you
need to worry. Python sucks at that at a low number of threads (GIL).

~~~
fauigerzigerk
175 threads use a lot of memory and cause a lot of context switching. I would
never write an application so that it needs 175 OS threads, because if it
needs that many, how many am I going to need down the road? It's an ominous
sign for scalability in my view, even if it works for a while.

[Edit] I'm assuming a CPU with 8 cores, not some 64 core monster.

~~~
mtanski
175 threads really don't use that much ram. I know userspace stacks are large
by default but most apps don't use them and they are never materialized. So
even if you're using 1MB of stack space for each one that's only 175MB. You
can easily fit that on whatever is the smallest AWS instance.
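
A quick way to check this on your own machine: the sketch below starts 175
idle OS threads and spells out the worst-case arithmetic. The 1 MB per-thread
figure is the assumption from the paragraph above, and `spawn_idle_threads` is
a made-up helper.

```python
import threading


def spawn_idle_threads(count=175):
    """Start `count` OS threads that block on an Event, then release them.

    Returns how many threads were alive at the same time.
    """
    release = threading.Event()
    threads = [threading.Thread(target=release.wait) for _ in range(count)]
    for t in threads:
        t.start()
    alive = sum(t.is_alive() for t in threads)
    release.set()  # let every thread finish
    for t in threads:
        t.join()
    return alive


# Worst case: every thread really touches a full 1 MB of stack.
stack_mb_per_thread = 1
worst_case_mb = 175 * stack_mb_per_thread
```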

I imagine that context switching between 175 OS threads all in the same
process wouldn't really be that big of a deal.
[https://www.quora.com/How-does-thread-switching-differ-from-...](https://www.quora.com/How-does-thread-switching-differ-from-process-switching/answer/Robert-Love-1)

Additionally there are many legitimate cases for a lot of threads, like disk
IO, if you find yourself having to push a lot of bytes to/from a high-IOPS
drive like an SSD / NVM drive. Unless you're doing large sequential transfers
that you can do in one large call, you will need to submit many concurrent
requests to saturate the drive (via threads). Disk IO is not network IO.

~~~
fauigerzigerk
To be honest, I don't really have a good intuition or hard data on where the
context switching overhead (or other limits) starts to bite, because I have
always avoided architectures that go into the hundreds or thousands of
threads.

Maybe you are right and it's one of those urban myths that we sometimes carry
over from times long past based on assumptions that are no longer true.

I would love to have more hard information on that one, because I think that
the currently fashionable async/event based way of doing a lot of things makes
programs much harder to understand and write.

~~~
dboreham
Good rule of thumb for modern kernel and server class hardware : 100's of
(native) threads is ok. 1000's will probably be ok. 10's of 1000 is where you
will start to see trouble and 100's of 1000 will most likely cause you to pull
out your hair. So the comment above about 175 threads being too many is
incorrect.

------
mherrmann
Anybody else find it difficult to believe that a 4k LOC Go project takes 26k
LOC in Python?

~~~
dekhn
Typically rewrites like this focus on core functionality; I truly doubt the
project is a 1:1 equivalent. There may be factorings, as well (functionality
included as part of Go).

~~~
mherrmann
Yes. I really do feel like we are not being told the whole story here.

~~~
dekhn
That said, I'm not really surprised about the performance details. My
experience was that Go made it pretty easy to "light up all the cores" on a
machine. I say this as a person who spent a lot of time releasing the GIL for
multithreaded C++ code hiding behind python front ends.

------
FraaJad
This looks like a report written by someone who is trying to show how their
$favorite system is better than the $other one.

Best to open-source the code for both, plus the benchmarks, and have people go
at it.

~~~
laumars
Not really. It's just a report written by someone who has an existing code
infrastructure and is experimenting with alternative approaches so wrote some
basic scripts for benchmarking.

~~~
FraaJad
While it may be true, when it is posted without the proper context and an
unbiased way to assess the outcome, this presentation will be used as "proof"
that Go is better than Python. (which it might be, but not everywhere)

~~~
laumars
Possibly. But you'd expect most developers to be smart and impartial enough to
read these statistics and take them as a case study rather than hold them as
gospel.

Quite frankly, I'm getting a little sick of the arguments that happen on HN
whenever a Go-related article comes up (and particularly so with Python vs
Go). You get people who seem to hate Go who go all out to criticise Go and/or
statically typed languages. Who argue that pro-Go articles are biased; and so
forth. And then you get the Go fanboys (who seem to be less vocal lately)
who declare that Go is our lord and saviour and we should be rewriting core OS
internals in it. It's just nuts. Sensible people would see that Go is better
at some things and worse at others. And that articles like this are just an
interesting case study and might not apply to their own personal software
problems but not in any way biased beyond the fact that the article is
tailored specifically to their own software problems.

~~~
FraaJad
I don't hate Go, as much as I detest the breathless fans of the language. I am
a "former-ish" python programmer who now uses D for present projects with an
eye towards Rust for building libraries and low-level system stuff. I also use
Nim in place of Python for a few personal scripting projects.

All these languages are statically typed and in my understanding of
Programming Languages, way better than Go.

~~~
laumars
Sorry, I didn't mean that as a dig against you specifically. Just a
generalised comment as I've just noticed a backlash on HN wrt Go.

Nim's often interested me. I'd really love to try it but sadly its lack of
mainstream support means it would be hard to justify using Nim in any
professional capacity, which vastly limits its usefulness to me. And sadly I
don't have enough free time to learn languages for fun these days.

------
aidos
I haven't done any real work in Go yet but this sounds like one of the (many)
use cases it's well suited to.

Unfortunately this overview is light on any meaningful details. As a general
rule a rewrite of any project will result in fewer lines of code, however, in
general, a rewrite of any project is a terrible idea.

Given that this seems to be a situation in which you have a lot of blocking
waiting for concurrent requests, why not try something like gevent?

It's good for people to try different approaches and technologies. I'm glad
they managed to have success with Go, that's good for everyone. It would have
been interesting for the reader to see some of the gory details of hacking
around with the existing codebase to see some of the ideas that may (not) have
worked.

------
alexchamberlain
Yet another Go article not fairly comparing technologies. What about a Python
implementation that used `asyncio`, for example? What about `PyPy`?

~~~
rbanffy
It's a Go rewrite of an existing, and probably old, Python application. You
are asking them, who already did a rewrite in Go and kindly provided their
assessment of the process, to also do a Python rewrite using more modern
approaches.

Feel free to rewrite their old Python app in Python for free. They may thank
you and even use your port.

------
kozak
I'm not saying you shouldn't use dynamic languages at all (in fact, I'm
developing in one right now), but you should keep in mind that you are paying
a computational price for that dynamism every time a line of your code is
executed.

~~~
collyw
And you are paying for developer time otherwise.

~~~
laumars
Static languages don't take _that_ much longer to write than dynamic
languages. But on the flip side: a more performant software stack (regardless
of language paradigm) does reduce your sysadmin time due to them having to
maintain a smaller server cluster, as well as reducing your hardware / cloud
costs. Generally speaking, of course. But this is quite a generalised
discussion as is.

~~~
collyw
Java did the last time I tried it.

~~~
kozak
I once worked on a project that was written in Groovy (a hipster version of
Java, so to speak). At some point I converted it from dynamic to static, by
adding the @CompileStatic directive and adding some type declarations. It
became MUCH more productive and maintainable. If I were to start a new project
on the Java platform right now, I would be certain to choose Groovy with
@CompileStatic instead of Java. It has all the good things of Java without all
the bad things.

~~~
vorg
I've found Groovy's only good for dynamically typed code...

* the stuff you want to write quickly but don't mind it running slowly, i.e. throwaway code

* the small stuff you know won't become a larger system one day, e.g. 30-line Gradle build scripts and code testing Java classes

* when you know you won't be upgrading to Java 8, which Groovy hasn't kept up with syntactically

Groovy's static compilation was tacked on for version 2.0 and doesn't work,
except for sprinkling the occasional @CompileStatic around your code in a
trial and error fashion. You didn't actually say you HAD started a new
statically typed project in Groovy on the Java platform. If you do, use Java 8
which makes much of what Groovy brought to Java 7 redundant, or another
language written from the ground up for static compilation, e.g. Scala or
Kotlin.

In fact, I've found even for code testing Java classes, using Clojure is more
productive than Groovy once you get over the syntax hurdle because macros can
eliminate verbosity in repetitious test scripts in a way functions can't.

------
iamd3vil
Anyone who thinks it's difficult to program in Erlang, please have a look at
Elixir([https://elixir-lang.org](https://elixir-lang.org)). It's quite nice to
work with.

~~~
brokentone
This does not seem relevant to Go or Python.

~~~
flippant
Erlang is mentioned as a solution on page 6.

>[Go is] way to easy to program than Erlang

------
mbreese
Can anyone comment on what the CMS DAS web service is? I'm having a hard time
understanding what it is supposed to do. I'm sure the audience knew or maybe
it's obvious and I'm just missing something.

~~~
andrioni
I'd guess it's this system:
[https://cmsweb.cern.ch/das/](https://cmsweb.cern.ch/das/)

~~~
mbreese
CMS = Compact Muon Solenoid particle detector

DAS = data aggregation service

------
cptwunderlich
Look at the scales for the graphs on page 9. What a ridiculous comparison...

------
esseti
Are the conclusions true in general? I mean, does software written in Go
perform better than software written in Python?

~~~
mhd
Software rewritten in Python often performs better than the original in
Python, too.

------
SjuulJanssen
I think a fairer comparison would be if the author had used a reactor/async-
based solution in his Python code

