The only way this makes sense to me is if they have to contend with lots of expensive parsing, event sequencing, and throttling requirements. Payment APIs, bank websites, etc. can be quite byzantine. I can understand how you might code yourself into a corner with a monolithic Node app and basically just say "F-it, we're doing this synchronously!"
I don't even think it's a terribly bad thing to do, assuming it favors feature velocity... but at that point, I'd recommend moving away from Node towards something like Python. And if you wanted to dip your toes back into async plumbing land, explore Go or Elixir.
I have never seen a good argument for using golang for business logic. If you are writing the actual server then sure, use golang. If you are writing some high-speed network interconnect, use golang. Some crazy caching system, sure use golang. The public WS endpoint, use golang.
But if you need to access a DB with golang for anything more than, like, a session token, then you made the wrong choice and you need to go back and re-assess.
Elixir is in the "germination phase" and I predict massive adoption in the next 5 years. It is a truly excellent platform, every fintech company I know at least has their toe in the water. Everyone I show this video to [1] just says "well, shit."
You hit the nail on the head here. When N different API requests simultaneously time out – all because a ramda.uniq call in one of them received an array of 100,000 nested objects – it's easy to make a spot code fix, but harder to systematically prevent it from happening in the future. There aren't really linters for "bad event loop blockage". Code reviews are the main tool we have, but you'd be surprised what sorts of logic can trickily block the event loop. For API reliability and development velocity in the short-term, by far the easiest approach was to throw more infrastructure at the problem.
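For anyone who hasn't watched this happen live, here's a minimal sketch of the failure mode (plain http server; dedupeHuge is a made-up stand-in for that ramda.uniq call, everything here is illustrative):

    const http = require('http');

    // Stand-in for something like ramda.uniq over 100k nested objects:
    // a synchronous, CPU-bound loop the event loop cannot preempt.
    function dedupeHuge(items) {
      const seen = new Set();
      const out = [];
      for (const item of items) {
        const key = JSON.stringify(item); // deep-ish equality, deliberately expensive
        if (!seen.has(key)) {
          seen.add(key);
          out.push(item);
        }
      }
      return out;
    }

    const bigArray = Array.from({ length: 100000 }, (_, i) => ({ a: { b: i % 1000 } }));

    http.createServer((req, res) => {
      if (req.url === '/slow') {
        dedupeHuge(bigArray); // blocks the whole process while it runs
        res.end('done\n');
      } else {
        res.end('fast\n'); // normally instant, but stuck behind any /slow call
      }
    }).listen(3000);

While dedupeHuge chews through that array, every other request on the process just sits in the queue, which is exactly why N unrelated endpoints time out at the same moment.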
We do use Go for almost all of our other services, and there are an increasing number of integrations written in Python. But we're still using and investing in our Node integrations code for the foreseeable future, and this was an important step for simplifying our infrastructure.
We certainly hope the tooling and rollout process in the post were instructive for anyone using Node, even if their stacks were pristine from day 1 and never need this sort of complex migration :)
Taking a wild guess: Some of their bank integrations probably require browser automation. If you're doing browser automation, the best tool for the job is (currently) Puppeteer, which runs on Node. There are other third-party language bindings for the Chrome dev tools protocol, but Puppeteer is developed by Google as a first-class citizen alongside Chrome.
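For reference, a scraping integration in Puppeteer looks roughly like this; the URL, selectors, and env vars are placeholders, not anything from a real integration:

    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch({ headless: true });
      const page = await browser.newPage();

      // Placeholder login flow; real integrations differ per bank.
      await page.goto('https://bank.example.com/login', { waitUntil: 'networkidle2' });
      await page.type('#username', process.env.BANK_USER);
      await page.type('#password', process.env.BANK_PASS);
      await Promise.all([
        page.waitForNavigation({ waitUntil: 'networkidle2' }),
        page.click('#submit'),
      ]);

      const balance = await page.$eval('.account-balance', el => el.textContent);
      console.log('balance:', balance);

      await browser.close();
    })();

Each launch() spawns a full Chrome process, which is where the per-integration cost comes from.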
Presumably not every integration requires browser automation, so they might not all be running at once. But they have a $25k monthly EC2 bill, so the numbers are in the right ballpark.
FWIW, I reliably have 6 puppeteer/chrome instances (headful, even) going on a single box and it's not even at half capacity.
That was my thought too. They've got a problem where they have no idea what a given transaction costs, and some unpredictable fraction of transactions result in serious work that holds up the event queue.
God knows they could be waiting for some reel-to-reel tape to spin up somewhere...
But the whole point of synchronous I/O is to isolate the programmer from having to think about the fact that spinning up that tape takes a non-zero amount of time. I have a feeling this gets lost sometimes in all the "async I/O is the GREATEST!" craze.
Async is nice, if you can handle it. But that's not easy in complex systems and processes. It is certainly easier to work with an old-fashioned process that blocks while waiting for whatever you need to wait for, and to scale by letting the OS run lots of those processes in parallel. Sure, it's less efficient. But it's easier for the devs to handle.
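In Node terms, the closest thing to that model is the built-in cluster module: one worker per core, and the OS scheduler does the juggling. A minimal sketch:

    const cluster = require('cluster');
    const http = require('http');
    const os = require('os');

    if (cluster.isPrimary) { // `isMaster` on Node < 16
      // One worker per core; the OS scheduler does the juggling.
      for (let i = 0; i < os.cpus().length; i++) cluster.fork();
      cluster.on('exit', () => cluster.fork()); // replace dead workers
    } else {
      http.createServer((req, res) => {
        // If this handler blocks, only this worker stalls; siblings keep serving.
        res.end(`handled by worker ${process.pid}\n`);
      }).listen(3000); // workers share the port via the primary
    }

One slow or blocked worker only hurts itself; the rest keep serving.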
I just read the hidden undertone of this article as "our devs aren't that smart after all".
But you need to know whether you can do that something first, or whether you've already done that something too many times in the last N minutes (and could get blocked, forcing thousands of other somethings into an endless queue). Or whether that something could take too long while you could have been doing 200 other somethings in the same time, etc. It's not that simple.
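The "too many times in the last N minutes" check alone means every call site needs bookkeeping like this (a hand-rolled sliding-window limiter; the names and numbers are invented):

    // At most `limit` calls per `windowMs`, tracked with a sliding window.
    function makeLimiter(limit, windowMs) {
      let timestamps = [];
      return function tryAcquire() {
        const now = Date.now();
        timestamps = timestamps.filter(t => now - t < windowMs);
        if (timestamps.length >= limit) return false;
        timestamps.push(now);
        return true;
      };
    }

    const canCallBank = makeLimiter(100, 5 * 60 * 1000); // invented: 100 calls / 5 min

    async function doSomething(task) {
      if (!canCallBank()) throw new Error('rate limited, retry later');
      return task();
    }

And that's the easy part; deciding what to do when tryAcquire says no (fail fast? back off? requeue?) is where it stops being simple.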
Their velocity might have been slowed by figuring out how to manage 4,000 containers effectively. If they had dealt with managing concurrency effectively sooner, they would have needed 30x fewer containers: roughly 133.
Not so much: they're using ECS, which takes care of a lot of those headaches, and it sounds like they're coordinating with a load balancer / reverse proxy to distribute those requests... A 1:1 request model in that kind of system is really simple to set up. Orchestrating multiple requests per node was probably much more time-intensive.
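At minimum, the "multiple requests per node" version needs a per-process concurrency cap so a burst can't pile up unboundedly. A hand-rolled sketch of what libraries like p-limit do (the cap value is invented):

    // At most `max` tasks in flight per process; extra callers wait their turn.
    function makeSemaphore(max) {
      let active = 0;
      const waiting = [];
      const release = () => {
        active--;
        if (waiting.length > 0) {
          active++;
          waiting.shift()(); // wake the next queued caller
        }
      };
      return async function run(fn) {
        if (active >= max) {
          await new Promise(resolve => waiting.push(resolve));
        } else {
          active++;
        }
        try {
          return await fn();
        } finally {
          release();
        }
      };
    }

    const limit = makeSemaphore(30); // e.g. 30 concurrent integrations per process
    // usage: limit(() => runIntegration(request))

Not rocket science, but it's more moving parts than one-request-one-container.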