
Why we’re writing machine learning infrastructure in Go, not Python - calebkaiser
https://towardsdatascience.com/why-were-writing-machine-learning-infrastructure-in-go-not-python-38d6a37e2d76
======
tus88
Sounds like fairly generic deployment infrastructure that has nothing to do
with machine learning.

But why pass up the opportunity to use a buzzword to get on the front page of
HN?

~~~
SQueeeeeL
I blame us for upvoting them

------
therealrootuser
I think the real lesson here is to choose the language that works best for
your team.

On my team we use Python and Scala. For network-critical I/O work in Python,
asyncio has worked out just fine for our needs. For massive CPU parallelism
needs (at least in sporadic bursts), we've actually found that AWS/Lambda does
pretty well.

Golang seems to be really polarizing. Most engineers on my team have tried
Golang in the past, but haven't liked it, which is why we would never consider
building anything on top of it. Everyone likes Python well enough that it has
kind of become the lingua franca for us.

Deployment is all based around containers or serverless/lambda, and we have a
pretty standardized way of deploying these things by now. Just because a bunch
of k8s tooling is written in Golang doesn't mean I need to rush out and write
my stuff in Golang too.

------
cutler
I'm a bit green on infrastructure & deployment but I don't quite get this. If
your ML algorithm code is still Python how does deployment with Go make that
much difference? It sounds like you're not replacing the Python ML code so why
is this such a big deal?

~~~
cigaaa
I don’t think people who work in infrastructure currently will be surprised
that Go is a better choice than Python for infra, but for those who are newer
to the field of ML or only work on model development (vs deployment), it is
likely surprising that a major part of production ML is best done in a
language other than Python.

~~~
threeseed
I don't buy this comment at all.

I've worked with hundreds of Data Scientists, many new to the industry and
they all know that R and Scala are important and popular languages for ML.

The majority of data engineering today uses Spark, which is written in Scala,
and even when you write Python code against it you can't escape the Java/Scala
internals being exposed.

~~~
7thaccount
Is Scala really that prevalent? R & Python for sure.

------
tracker1
I think, like TFA says, it comes down to ease of deployment for support
tooling. A single executable is easier to distribute than a set of
dependencies and a language runtime. These are tools that run outside
containers to manage code that can run inside containers, where dependency
management and isolation are easier. It makes total sense to me.

------
sandGorgon
I would do it in Python using one of the fast, modern ASGI servers like
uvicorn.

Zero-downtime model updates can be done using a Redis cache to persist models.

In any case, that's a solved problem using HAProxy and Kubernetes.

Not sure why Go has these advantages.

~~~
hnaccy
If you do model inference in the web server process, it will be compute-bound
and lock up the web server. Is there a preferred, clean way (req/rec or
similar) to pass the jobs to a second process and let the web server process
wait for the response without blocking?

~~~
woeirua
Yes, use a dedicated job server/process like Celery.

------
rezeroed
I would've chosen Erlang or Elixir for those reasons. Are we getting another
Go package management solution this year? A pleasure to work with? I've been
ditching Go for Nim recently. Other people seem to be enjoying Crystal. Rust
is great, and coming down the road Zig looks excellent. I think Go has turned
out to be a bit of a damp squib. Considering, unlike the other languages, it
has Google behind it - unimpressed. After six years, I don't expect to be
using it at all within the next year or two.

------
luord
Great, another one of these articles, but this time I feel more confident in
my usual reply, having been working in Go exclusively for a while.

> Implementing all of this functionality in Python may be doable with recent
> tools like asyncio, but the fact that Go is designed with this use case in
> mind makes our lives much easier.

This just makes me think about Armin Ronacher's article on back pressure but,
sure, whatever.

> Building a cross-platform CLI is easier in Go

No, it isn't.

> The performance benefits of a compiled Go binary versus an interpreted
> language are also significant

Ah, yes, because performance is such a key feature of command line interfaces,
as evidenced by bash and its _outstanding_ performance in every benchmark.

> The Go ecosystem is great for infrastructure projects

And the reality discussed in this point would be different if Docker weren't
written in Go. Had the Docker developers chosen _anything else_ , this point
would apply to that hypothetical language, so it isn't an inherent advantage
of Go as a language.

> Go is just a pleasure to work with

No, it really, _really_ isn't, but that's not the point.

This is ultimately the real reason they chose Go: whoever made the original
decision liked it and everything else is post-hoc rationalization.

Which is fine, most of this tends to be subjective.

------
Runawaytrain2
All the stuff that requires speed is written in a language that compiles
directly to machine code, while the machine learning libraries are all
Python-based. That seems standard, no?

~~~
bitexploder
I think that happens, but I don't know about standard. It is pretty obvious
and natural. However, a lot of code is written in Python and it can be hard to
move ML teams to use other tools. Many of them aren't great at programming
because their background is stats/math so it can be hard to move critical code
to more performant solutions without resistance from the teams.

------
toolslive
"in the land of the blind, the one-eyed man is king"

Golang is probably a step up from Python, but it's just that. There are a lot
of issues with Golang. Off the top of my head, the lack of decent error
handling (if err != nil { return nil, err }) and the lack of decent
polymorphism are the most annoying. There's a GitHub repo dedicated to what's
bugging people:

    https://github.com/ksimka/go-is-not-good

~~~
vardump
Well... there's also another side to the coin. There's value in visibility and
lack of magic.

> lack of decent error handling (if err != nil { return nil, err })

Errors are in your face, instead of having exceptions performing invisible
gotos to somewhere far up in the call tree. Explicit error handling is more
code, but your error handling is going to be much more robust.

> or lack of decent polymorphism...

Lack of polymorphism also means you don't have to guess about concrete types
when reading code. When troubleshooting, you can see what's going on without
going through the whole inheritance tree.

Go encourages composition instead of inheritance. That's something I wish more
C++ codebases would do as well. Composition makes code inherently more
maintainable and easier to refactor.
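
A small sketch of what composition looks like in Go, assuming made-up types (`Logger`, `Notifier`, `EmailNotifier`, `Server`) that exist only for this example. Shared behavior comes from an embedded struct and an interface-typed field rather than from a base class.

```go
package main

import "fmt"

// Logger is a small reusable component (hypothetical, for illustration).
type Logger struct {
	prefix string
}

// Logf formats a message and prepends the logger's prefix.
func (l Logger) Logf(format string, args ...interface{}) string {
	return l.prefix + fmt.Sprintf(format, args...)
}

// Notifier is the behavior Server depends on. Any type with a matching
// Notify method satisfies it; no declaration of intent is needed.
type Notifier interface {
	Notify(msg string) string
}

// EmailNotifier is one concrete Notifier.
type EmailNotifier struct {
	addr string
}

func (e EmailNotifier) Notify(msg string) string {
	return "email to " + e.addr + ": " + msg
}

// Server is assembled from parts. Logger is embedded, so Logf is promoted
// onto Server; notifier is a field that can be swapped for another type.
type Server struct {
	Logger
	notifier Notifier
}

// Start reuses both parts without inheriting from either.
func (s Server) Start() string {
	return s.notifier.Notify(s.Logf("started"))
}

func main() {
	s := Server{
		Logger:   Logger{prefix: "[api] "},
		notifier: EmailNotifier{addr: "ops@example.com"},
	}
	fmt.Println(s.Start())
}
```

Running this prints `email to ops@example.com: [api] started`. Replacing `EmailNotifier` with another `Notifier` requires no change to `Server`, which is the refactoring benefit being claimed.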

Go tends to be easy to read and maintain. It does come with some cost. It's
just a matter of where your priorities lie. Software projects spend the
majority of their life as legacy code, something that needs to be maintained.

~~~
allovernow
>Composition makes code inherently more maintainable and easier to refactor.

My understanding of composition comes from interfaces in C#. In my experience
composition is actually more difficult to code and refactor, because it leads
to bloat from repeated code as you cannot simply inherit. Refactoring then can
become tedious as you may need to manually edit every repeated code block.
Whereas with inheritance I define a function once and override it where
necessary.

Am I misunderstanding something?

~~~
vardump
Composition tends to lead to more code reuse, and to more code that's still
usable in an unpredictable future.

Over time your whole inheritance model often turns out to be no longer viable,
when the basic assumptions made years ago are no longer valid.

Worse, because of all of the accumulated cruft, you might not even be able to
change the shape of the monster.

------
oflannabhra
Why not Swift? (I think I know the answer).

Concurrency in Swift is not yet a solved problem, but libdispatch is quite
workable (although not "elegant, out of the box" per the article).

With the work being done in Swift for TensorFlow [0], I'd imagine in a year or
two both the infrastructure and the ML portions of a product like Cortex could
be written in a single language.

[0] - https://www.tensorflow.org/swift

~~~
calebkaiser
Swift is an interesting choice, one we haven't explored in depth. Out of
curiosity, have you done any work with Swift for TensorFlow? What has your
experience been?

~~~
Jerry2
Not him, but the company I work for has moved most of our ML pipelines away
from Python and over to Swift. We started the move after Google announced
support for Swift for TF. Before the move over to Swift, we were rewriting a
lot of the data paths in C++ but were generally unhappy with the state of TF
for C++ libraries. The move to Swift hasn't been without hiccups, but the
community has been extremely helpful and it was worth it overall. The state of
Swift for TF libraries today is quite good.

We looked at Go early on but quickly dismissed it because with the move from
Python to Go, it didn't seem like we were getting enough benefits to warrant
the amount of work that would be required.

------
kelsolaar
Reading the title without care might make one think that you are doing ML in
Go, which, judging by the content of the article, you obviously are not. This
is almost click-bait.

------
nhumrich
> Making all of these overlapping API calls in a performant, reliable way is
> a challenge.

Python's asyncio is pretty hard to beat. For non-CPU-intensive tasks, I find it
a pleasure to work with. Goroutines can still have race conditions.
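
To make the race-condition point concrete, here is a minimal Go sketch of many goroutines updating shared state. `Counter` and `countConcurrently` are hypothetical names for this example. Without the mutex, the concurrent `c.n++` would be a data race and the final count could come up short; with it, the result is deterministic.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter guards its value with a mutex so concurrent increments are safe.
type Counter struct {
	mu sync.Mutex
	n  int
}

// Inc adds one to the counter while holding the lock.
func (c *Counter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// Value reads the counter while holding the lock.
func (c *Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// countConcurrently starts `workers` goroutines that each call Inc
// `perWorker` times, waits for all of them, and returns the total.
func countConcurrently(workers, perWorker int) int {
	var c Counter
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	return c.Value()
}

func main() {
	fmt.Println(countConcurrently(50, 1000))
}
```

Go's race detector (for example `go run -race`) can flag the unguarded version of this code at run time, but the language does not prevent it at compile time.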

> Originally, we wrote the CLI in Python, but trying to distribute it across
> platforms proved to be too difficult

Sure, I get that Go can cross-compile. But what makes Python hard? Python
works on every platform, and distributing is just a "pip install" and "pip
install -U". Surely that's easier than "Download the correct binary for the
platform, unzip it, change permissions, add it to your path, then do it all
over again for every update".

I was the original author of the awseb CLI and we found that pip install was
significantly less of a hurdle than a Go binary and decided to do it in Python
instead. If a user on Windows has a hard time installing Python and pip,
telling them to drop a binary and change their path isn't going to be any
easier.

~~~
aequitas
> Python works on every platform, and distributing is just a "pip install" and
> "pip install -U"

Until you have a dependency which has a C dependency (like a crypto framework,
SQL connector, etc). Suddenly you need an entire compiler toolchain, dev
dependencies, all the library headers, and a decent amount of time. Also the
errors thrown when these compile steps fail are anything but helpful for new
users. If you are lucky there is already a wheel for your platform/arch.

> Surely that's easier than "Download the correct binary for the platform,
> unzip it, change permissions, add it to your path, then do it all over again
> for every update"

This is trivial to automate using a script and has far fewer failure modes to
deal with than pip does (do you have the correct Python version? Is there a
compiler installed for C modules? etc.).

~~~
workthrowaway
> Until you have a dependency which has a C dependency (like a crypto
> framework, SQL connector, etc).

but... that's not a python problem.

every time people say they are having a hard time installing a python module,
it's almost always a non-pure python module. it has an extension in c or c++.
if that tells us anything, it's that mixing c and c++ makes software hard to
install...

a fairer comparison with go here is with a go package that has extensions or
bindings written in another language.

~~~
jvolkman
Sure, but the most popular Python connectors for MySQL and PostgreSQL both
utilize C or C++ extensions, whereas the equivalent connectors for Go are both
written in pure Go. Even if it's not a problem with the language in its purest
form, it is a problem with the ecosystem.

------
flavio81
Anything is faster than CPython. Even PHP!

