
Einstein Analytics and Go - bsg75
https://stackoverflow.blog/2019/10/07/how-salesforce-converted-einstein-analytics-to-go/
======
erokar
Of all new programming languages that have received a certain amount of hype,
Go is the only one I never felt inclined to play with after reading about it.
It just seems too dumbed-down and backwards. It's like someone wished CS
stopped evolving after 1970 and designed a language to pretend it did. The
simplest of abstractions are removed from the language, like operations on
collections that abstract away for loops (e.g. filter, map, reduce, etc.).

I don't doubt that it has good CPU performance and concurrency, the developer
experience just seems so frustrating.

~~~
tschellenbach
I like that it's more productive to work with than C++ and about 20-40 times
faster than Python (more if you take the concurrency improvements into
account). That's a perfect sweet spot for many companies.

~~~
weberc2
I like that the tools just work. I don't have to learn a DSL just to write my
own build system for my project (unlike CMake, gradle, opam, etc); it's just
`go build` or `go test`. And it works for almost everyone's project, so anyone
can contribute to anyone else's project without learning a crappy one-off
build system just to compile the code or run the tests. Similarly, I don't
have to learn a special comment syntax for docs nor write CI jobs to publish
code or documentation packages--docs are just comments above functions and
code and docs are automatically published when you push to version control.
Also, `gofmt` is ubiquitous, so everyone has the same style.

Go was built for software development, and it addresses real, practical
problems.

------
twotwotwo
The post dates the rewrite effort as happening in 2016-7. In the past couple
years escape analysis has gotten a lot of effort
([https://github.com/golang/go/issues/23109](https://github.com/golang/go/issues/23109))
and there's also now mid-stack inlining
([https://github.com/golang/go/issues/19348](https://github.com/golang/go/issues/19348))
which obviously can be useful in itself but also help escape analysis and
other optimizations work.

Does anyone know if there's any big picture work on GC throughput in progress?
The idea of the ROC (fast freeing of memory never shared between threads on
goroutine exit) was canned since it regressed on some workloads and I'm having
trouble finding signs Austin's idea with hashing memory landed either. A
moving GC or a read barrier seem mostly out because they can hurt other things
(C interop, performance of code that doesn't allocate much). Though they seem
to have addressed pauses quite well, and lots of code out there isn't GC-
heavy, it still seems like a key area where it might be possible to give a
"free" throughput win to existing code.

------
moksly
Go is always listed as popular, but if I search my entire country for Go jobs
there isn’t a single one. Well, there is one single listing, but that’s for a
C/C++ job at Google, and it’s listed under “Nice to know” along with Rust and
a few others.

Are we behind the curve?

~~~
peterwwillis
It's a trendy language. It will increase in popularity (e.g. developers want
to learn it because it's new) until it eventually wanes. Remember when
everybody wanted to write Ruby?

This page shows job postings per programming language from searches on
Indeed.com:
[https://www.codeplatoon.org/the-best-paying-and-most-in-dema...](https://www.codeplatoon.org/the-best-paying-and-most-in-demand-programming-languages-in-2019/)
Go isn't on the list. The TIOBE index has Go at #17:
[https://www.tiobe.com/tiobe-index/](https://www.tiobe.com/tiobe-index/)

~~~
fjp
> Remember when everybody wanted to write Ruby?

There is an absolutely mind-boggling number of Ruby jobs in the US.

------
derivagral
I found an interesting subtlety to the message here. They had a Python wrapper
around a C library, and effectively never updated the library. I wonder if
this was due to engineering decisions, hiring/expertise of their staff (or
future hires), or something else (risk? etc). I imagine it is easier these
days to find Go devs than C devs who want to work at a company like this?

Also, for a sense of scale, this rewrite took about 4 years to complete from
first concepts.

~~~
sho
For 9 out of 10 articles like this - "We rewrote our old code in tired old
language FOO with hot new language BAR and it's SO MUCH BETTER!" - I suspect
they could have simply rewritten with the same language, engineered it better
and applied all the lessons learned, and achieved basically the same result,
probably faster. But that doesn't scratch the "new hotness" itch, does it?

That said, considering the requirements this thing sounds like it had to meet,
perhaps golang was the right choice after all. Unlike certain other projects I
know about where they HAD to write everything in golang/microservices/k8s/etc
because it HAD TO SCALE, took 18 months instead of the 3 months it would have
taken with rails but credit where credit is due - those 2 or 3 requests a
minute (peak) are handled very, very quickly.

~~~
sidlls
That may be true in many cases, but this article really resonated with me. I
spent more years than I want to admit working on a large machine learning and
analytics platform that was originally built with Python. It was, frankly, a
nightmare. The dependency problems, the performance issues, the problems with
typing and use of data structures were all huge headaches to manage. I have a
blog in the engineering organization I work in where I'm basically publishing
a series of articles that reflect a similar path the Salesforce article
describes, for basically the same reasons. (My group is using Java, not Go,
though.)

~~~
blub
You're doing ML in Java? :) What are the equivalents of some of the popular
Python libs for that?

~~~
roseway4
DL4J and the ND4J library are a good start and at a high-level offer
equivalent functionality to tensorflow, pytorch, and pandas:
[https://github.com/eclipse/deeplearning4j](https://github.com/eclipse/deeplearning4j)

------
thanatropism
I'm reminded of the trend of starting systems in Lisp (I believe this included
reddit) and then rewriting in Python.

Maybe the Blub paradox has been hitting its limits. The point of using
leftfield expressive languages was that developer time is orders of magnitude
more expensive than machine chugging time. But maybe web-scale-big-data-etc.
needs enough (latency) juice that the scales tip and the comparative advantage
is clearly on the side of adopting "industrial-strength" enterprisey
technologies again.

~~~
robbrit
They touched on this in the article, but I'll go into more detail here since
it's been fashionable for years to bash on "Blub" languages like Java or Go. I
myself was guilty of this for a long time until I started using these
languages in settings where they shine, and developed an appreciation for
them.

The argument has nothing to do with machine chugging time and is entirely
about developer time. The problem with expressive languages like Lisp, Ruby,
Python, etc. is that the language ends up varying from person to person - the
more expressive the language, the more variance there is. This is a feature
when you're a small team because the abstractions you build let you move
quickly, but it is a bug when you're a large team maintaining a piece of
software over years, where developers have come and gone. The ramp-up time to
learn and understand the various abstractions that people have built over the
years ends up accumulating and cancelling out the gains that those
abstractions gave earlier on.

Blub languages on the other hand tend to be more uniform, so it's easier for
someone who isn't very familiar with the code to dive in and understand what
is going on.

~~~
blub
Java is no longer uniform, especially in the latest versions they're adding
more and more features. And there's also Kotlin and project Lombok for people
that want even more excitement.

And yet Java somehow manages to be both a boring language and still have too
many things to learn. This achievement is probably not appreciated enough by
those that criticize it.

------
weberc2
> After these ports, our team has built up some expertise with Go and its
> compiler quirks. But you can still get burned. For example, you can very
> easily write data that you want to place on the cheaper stack instead onto
> the much more expensive heap. You won’t even know this is happening by
> reading over your code. That’s why, as with any new language that you
> require high performance from, you need to monitor processes closely and
> create benchmarks around CPU and memory use. And then share what you learn
> with the community so that this knowledge becomes less tribal.

While Go doesn't have formal semantics that let you specify whether a value is
allocated on the heap or stack, it seems like it would be easy enough to
create your own. The compiler has flags that cause it to output the sites
where it heap-allocates--if you add comments above a function or an allocation
site such as `// noalloc`, then you could write a "linter" that compares the
allocation sites against those comments and errors if one of those noalloc
sites allocates.

In lieu of allocation semantics, this seems like a better approach than
writing performance tests for each of these sites.

~~~
hinkley
There are at least two ways to fail. You can get something wrong from the
start, or you can get it right and fail to keep it that way.

Perf sensitive code can often be that way. Some innocent coworker comes in to
add a feature and they don't get why two idioms are not equivalent from the
compiler's perspective, and so they change things.

Meanwhile, you're busy doing something else. The code is still correct from a
testing standpoint. It still satisfies your code standards. But it no longer
meets your response time expectations. It's a lot of work to maintain the sort
of toolchain that lets you reliably spot this sort of thing at build time or
even in pre-prod environments, and it's difficult to identify places where
those tools are failing to detect problems until there's been a problem.

~~~
weberc2
I agree with all of this. My proposal addresses exactly this problem for the
allocation related bottlenecks. CI will fail if someone introduces an
allocation in a noalloc-annotated block.

------
scythe
Previous discussions here:

[https://news.ycombinator.com/item?id=21203651](https://news.ycombinator.com/item?id=21203651)

I worked on this project; my comment is here:

[https://news.ycombinator.com/item?id=21205504](https://news.ycombinator.com/item?id=21205504)

------
SnarkAsh
Why is this on the Stack Overflow blog?

When I replied to their survey that I wanted "tech articles written by other
developers", I was imagining a platform for Stack Overflow authors to
contribute longer-form work -- an idea that's been floated by staff for most
of the life of the site! I wasn't expecting random cross-promotional content.

------
trimbo
Java was not considered?

~~~
blub
On one hand one would be tempted to dismiss Java as uncool, but on the other
hand we're talking about Salesforce, so this is a very legitimate question.

The truth is probably rather ordinary: even dull corporations want to look
cool.

------
elisharobinson
What is this strange obsession with type? It improves performance, yeah, sure,
but to go around claiming that it is a panacea for all things poorly planned
is pure unadulterated stupidity. I can concede that some immutable data
structures are needed, but types are not for me.

Python doesn't handle threads well; this can be true. But if you are straining
Python's threading system, you ought to take a look at your own abstractions
and design. Chances are you won't get very far before you hit the same
bottleneck in a different language: if you assume Python gave up at X, you
could achieve 100X with Go, but you will almost certainly hit the same issue
when your data grows 100X. (I am aware that computational complexity is a
factor, but the author claims that part is handled in C; from what is
described, it looks like they replaced Flask with Go.)

~~~
wyldfire
> what is this strange obsession with type , it improves performance yeah sure

Static typing allows for a lot of static decisions, which in general brings
better performance for a language. CPython pushes so many decisions to runtime
that the performance impact is very significant.

But the article cites more significant challenges related to type. Not
performance, but design:

> "First, Python uses loose typing, which was great for a small team rapidly
> developing new ideas and putting them into production –but less great for an
> enterprise-scale application that some customers were paying millions of
> dollars for," he writes.

Cue the reminder that in fact Python is strongly (but dynamically) typed. But
LeStum's point stands: dynamic typing hurts developers trying to read/write
unfamiliar Python code. The mypy static type checker should help out a lot,
but I don't think it's very popular yet.

For performance, IMO in this order you should consider (1) PyPy, (2)
multiprocessing, (3) cython and/or c-extensions. (and I suppose implicitly (0)
analyze your algorithm, exploit numpy where possible). If you exhaust those,
Go seems like a great alternative.

~~~
fiedzia
> For performance, IMO in this order you should consider (1) PyPy, (2)
> multiprocessing, (3) cython and/or c-extensions

They are all very limited in what they can do (none can replace the others)
and bring tons of complications. Modern languages can give you all of that
without big issues.

------
crimsonalucard
Analytics needs unstructured data in order to function well as a modular
service. Golang's lack of coproducts makes such data much harder to deal with.

As much as I hate JS, Node.js is actually better for one of these servers. Then
again it's mostly the database that determines the schemas.

------
jbergens
With the current upward trend for Python, I wonder how many companies will
come to the same conclusion in the next 5 years?

Maybe they will end up rewriting large parts of their systems. If not in
Golang, then in some other fast and statically typed language like C# or Java.

------
SomaticPirate
I’m curious what tribal knowledge they are referring to. What are these
incantations to make my Go code faster? Does this mean just stripping debug
symbols or are there flags I should know about?

~~~
weberc2
They were talking about writing code that doesn't allocate in the hot path.
Mostly this is pretty easy to eye-ball, but you can make sure it doesn't
allocate by passing `-gcflags -m` to the `go build` command. Allocation in the
hot path is almost always the initial bottleneck in naive Go code.

------
lugg
> First, Python uses loose typing, which was great for a small team rapidly
> developing new ideas and putting them into production—but less great for an
> enterprise-scale application that some customers were paying millions of
> dollars for.

Code review is supposed to catch people trying to mix return types or mess
with or overload parameters in unexpected ways.

Static typing removes this need but adds a million more.

If your people aren't catching this stuff in review, they're probably not
catching other stuff, that is, static typing isn't going to save you like you
think it is.

------
thumbsdownnow
Simplicity is king.

