
Gotchas from Two Years with Node - scapbi
https://segment.com/blog/gotchas-from-two-years-of-node/
======
jonpress
The root of the event loop issue is not Node-specific. The real underlying
issue here is that one CPU core is given too much work while others sit more
or less idle. Offloading some of the work to another process naturally solves
this; it doesn't really matter whether that other process is a Go program
or a Node.js one - both approaches would have solved the problem. Attributing
credit to Go itself for solving the issue is disingenuous.

If you ran Go as a single thread, you would also run into similar issues. The
main advantage of Go is that it makes it easier to parallelize your code
thanks to goroutines and channels (a single source file can encapsulate the
logic of multiple concurrent processes).

That said, I find that this 'ease of concurrency' makes Go code less readable.
In Node.js, it's really easy to identify process boundaries since the
child_process module forces you to put code into different files and
communicate via loosely coupled IPC channels.

Most of the Node.js vs Go arguments are weak. It's surprising that Node.js is
still outpacing Go in popularity in spite of all this slander.

~~~
nemothekid
> _Most of the Node.js vs Go arguments are weak. It's surprising that Node.js
> is still outpacing Go in popularity in spite of all this slander._

That's a rather defensive position for Node in what is a very rare use case
for the language. It's _unsurprising_ that Node.js is still outpacing Go, given
the large number of JS developers and the fact that Go is pretty much worthless
for hosting front-end web applications (you won't find your favorite asset
pipeline in Go).

It's not surprising at all that they switched to a different language for data
processing & pipelines, though it is somewhat surprising that they chose Go,
given that most teams in a situation like this would switch to the even more
popular JVM/Spark/Storm/Kafka stack.

Finally, your statement that _The real underlying issue here is that one CPU
core is given too much work while others are more or less idle._ isn't
accurate - the issue is that one _thread_ has too much work. No modern OS
built in the last 20 years would allow a single process to hog all the CPU
time unless you explicitly turned off the kernel's scheduling. The root of the
event loop issue is _event loop_ specific, and it's even more Node-specific
since the event loop is pretty much the only way to achieve concurrency in
Node. Other languages (like Go and Java) at least have options for other
models of concurrency.

Consider the following - what if _both_ processes got tied up? Do you just
start another process? Would it be feasible or wise to run 1000 processes?
(No, it wouldn't.) This is a problem you won't come across in Go: by using
goroutines and taking advantage of its scheduler, you can easily run thousands
of goroutines performantly.

That said - this is a rather narrow use case on which to judge that one
language is better than the other. It's just the case that Go is likely better
suited for these kinds of services.

~~~
mercurial
> the issue is that one thread has too much work

The underlying issue is that when you have a consumer-facing API which accepts
HTTP requests with a body, the first thing you should think about is limits.

> Consider the following - what if both processes got tied up? Do you just
> start another process? Would it feasible or wise to run 1000 processes (no
> it wont)? However this is a problem that you won't come across in Go by
> using goroutines and taking advantage of its scheduler, as you can easily
> run 1000s of goroutines performantly.

I have no experience with Go, but my understanding is that goroutines are
green threads multiplexed over a small thread pool. If you get 5 MB of JSON in
N different requests (N = number of cores) at the same time, I don't see Go
generating free CPU time out of thin air. The usual way to go about these
things in a language without multithreading is to have a queue and a process
pool, but this also won't magically solve the issue if all cores are busy.

~~~
nemothekid
> _If you get 5 MB of JSON in N different requests (N = number of cores) at
> the same time, I don't see Go generating free CPU time out of thin air._

You don't, but the scheduler normally won't allow one thread to completely
starve the CPU. Of course, it's clear they should be using limits; however,
the JVM, glibc's thread scheduler, or Go's green threads likely wouldn't allow
a single thread to completely starve the CPU - eventually the scheduler will
step in and divert resources to another thread.

Without limits, in a threaded solution you would see latency increase, but
you wouldn't see the application stop taking requests altogether.

However, there are real benefits to event-loop concurrency, so this shouldn't
be taken as a reason one model is strictly better than another.

------
spion
All these problems are solvable (streams by ditching built-in streams and
replacing them with something decent, errors with promises + typescript, event
loop blocking with a streaming JSON parser). But that still means that out of
the box node is a pretty unsatisfying experience all around.
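The blocking problem that a streaming parser avoids is easy to see directly. This toy snippet (payload size is arbitrary) schedules a zero-delay timer and then parses a large payload synchronously, so the timer cannot fire until the parse finishes:

```javascript
// A large (but arbitrary) payload: ~200k small objects.
const big = JSON.stringify({ rows: Array.from({ length: 200000 }, (_, i) => ({ i })) });

const scheduled = Date.now();
setTimeout(() => {
  // Fires only after the synchronous parse below releases the event loop.
  console.log(`timer fired ${Date.now() - scheduled}ms after scheduling`);
}, 0);

// JSON.parse is synchronous: nothing else runs until it returns.
const parsed = JSON.parse(big);
console.log('parsed rows:', parsed.rows.length);
```

A streaming parser instead consumes the input in chunks across many event-loop ticks, so timers (and other requests) get to run in between.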

It took us a year to arrive at the solutions above, and many prominent members
of the community scoffed at a number of them. Some are still scoffed at.

For example, everyone still thinks that promises should not catch errors, as
if somehow throwing on typos were useful. It's not. The real solution here is
a type system, the sensible error-capturing model of promises that doesn't
destroy all assumptions about code (unlike domains), and a sensible library
like bluebird that reports rather than swallows unhandled errors.

People also swore that streams are the best thing ever, but the reality is
that the built-in streams are an organically grown design that accreted many
flaws along the way. Most of them can be mended, though, and with streams3
things are finally starting to be acceptable.

~~~
xhrpost
> "ditching built-in streams and replacing them with something decent"

What have you found sufficient for this?

~~~
spion
We have an inhouse solution which is basically a set of external functions
that work with built in streams in a way that avoids frustration.

Some of them can be found in
[https://github.com/spion/promise-streams](https://github.com/spion/promise-streams)
\- the others we haven't extracted to an npm module yet. (Promise streams also
has a minimal extension of built-in streams to make them work better with
promises.)

We never managed to find a full replacement, though. It's probably not worth
the effort either, given that it would always have to wrap existing streams,
and the only drawback of external functions is that you cannot invoke them as
methods. One contender that I have hopes for, though, is WHATWG streams [1].

Note: to replace event emitters we built
[https://github.com/doxout/promise-observer](https://github.com/doxout/promise-observer),
which gives you a lot more power and control in terms of execution order and
does away with the "multiple events per object" stringy design that makes
things harder for type systems like TypeScript/Flow.

[1]: [https://github.com/whatwg/streams](https://github.com/whatwg/streams)

------
ExpiredLink
Why not Java EE? Everything out of the box. Tried, proven, standardized. Well
suited for startups:

[http://www.adam-bien.com/roller/abien/entry/a_java_ee_startup_getting](http://www.adam-bien.com/roller/abien/entry/a_java_ee_startup_getting)

~~~
smegel
> Everything out of the box.

Everything except that special sauce called green threads (or goroutines).
They are quite popular these days for good reason - callbacks suck.

~~~
pron
Blatant self-promotion here, but Java's got fibers (aka lightweight threads,
aka goroutines), too:
[https://github.com/puniverse/quasar](https://github.com/puniverse/quasar)

BTW, I don't like the term "green threads" because it has a connotation of
being scheduled onto a single OS thread, while fibers/lightweight threads
employ parallelism.

------
rvirding
Why am I getting this strong deja vu feeling? Hmm. Oh, now I know! These were
exactly the type of problems we were attacking, and solving, 25 years ago when
we were creating Erlang: how you build concurrent, fault-tolerant, non-
blocking systems with low latency. And about 10 years ago there came
implementations of Erlang which handled multi-core transparently and provided
things like load balancing automatically.

I would say you would have to have really extreme requirements to handle that
type of thing yourself.

------
klodolph
> Plenty of times, there will be an uncaught exception which–through no fault
> of your own–bubbles up and kills the whole process.

Really? I'm just a beginner with node.js, and I've been deeply frustrated by
error handling, but if this is true, that's pretty damning. In just about
every other web framework under the sun, you can go wild with exceptions and
the worst you'll get is a 500 response for _that_ request. (Yes, worse
behavior is possible but very uncommon.)

~~~
toxicFork
The same happens with Java, C++, and many other languages.

There are web server packages on node that do catch errors that happen in
synchronous code. For async, you can use promises, which take errors into
consideration.
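A minimal sketch of that promise pattern (the handler and payloads are invented for illustration) - anything thrown inside the chain lands in `.catch` instead of escaping and killing the process:

```javascript
// A tiny request handler built on a promise chain. Anything thrown inside
// the chain (e.g. JSON.parse on bad input) is routed to .catch.
function handleRequest(body) {
  return Promise.resolve(body)
    .then((raw) => JSON.parse(raw)) // may throw on malformed input
    .then((data) => ({ status: 200, data }))
    .catch((err) => ({ status: 400, error: err.message }));
}

handleRequest('{"ok":true}').then((res) => console.log(res.status)); // 200
handleRequest('not json').then((res) => console.log(res.status));    // 400
```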

I'm not sure what happens on unhandled async errors on other languages, I
guess in Java you could have a dangling request (happened to me before) or it
could crash the app (also happened to me before). C++? I haven't really done
much webserver work on that one so someone else may be able to write more
about it.

Also, obviously, with PHP you just get a white page or actual full error
stacks sent to the client - the joy! (edit: of course, only if you don't
handle it properly)

~~~
72deluxe
C++ - couldn't you just put a catch(...) within your request handler and
return 500 if that is hit?

~~~
toxicFork
And you can do the same on node :)

~~~
esailija
Not if you are using any kind of event emitter. These don't participate in
any kind of flow but fire "error" events as a side effect, sneakily in the
background, and if there is no listener for the "error" event on an event
emitter, it will crash the server. And almost everything in core is an event
emitter.

Bonus points if the event emitter object is private to some module that
doesn't expose it and doesn't attach an "error" event handler to it.

------
bontoJR
The switch from Node to Go seems quite popular right now, and it honestly
makes me think there's something wrong with the general perception of Node.

We are currently in a world where almost every web app with even modest
success sees a huge amount of traffic, but we still make the mistake of
picking a technology that seems "good enough" instead of picking a great one,
because the latter looks slightly harder to manage/learn/deploy. I know that
during the early stage pace is very important, and Rails or Node are easier
and faster to handle compared to Scala or Erlang, but sometimes a different
technology at the beginning would save a lot of headaches and night calls. We
still fail at the very early stage to choose the right technology, but nobody
is afraid to admit it and to switch; I find this amazing.

~~~
hayksaakian
I think it makes sense: early on you're still "figuring it out", so a flexible
framework that's easy to dive into is advantageous.

I don't see anything wrong with planning a rewrite X months into a product,
since 90% of things don't make it to month X.

~~~
jestar_jokin
A code rewrite means time spent working on stuff that isn't delivering
features, which means your business could be stagnating, making customers
dissatisfied, giving competitors an opening.

As an example, I recently migrated a Node app from MongoDB to Postgres. This
ended up taking two and a half weeks, due to re-writing a fair portion of the
server-side code. That's a long time to go without delivering new features or
fixes. We justified it because we had inexplicable data loss (not pinpointed
on MongoDB, but a poor reputation is a hard thing to remedy) and our data
model did not suit a document store. But you have to then accept it when the
business folks say "well, why didn't you get it right the first time? Aren't
you supposed to be the expert?".

As technologists, of course we find it fun to try new technologies. But
outside of the main tech hubs, a large proportion of developers aren't working
for tech companies whose main product consists of web services/APIs, in which
case we need to always consider the costs/benefits of any tech switch. If it's
not justified, you're stuck supporting flakey apps until you can move on to
the next gig / learning experience.

~~~
nemothekid
As a counter-example, I recently read that Twitter was built on Rails _6
months_ after Rails was released. Once the concept was proven out and the
production app was failing like crazy (fail whale everywhere), they rewrote
their entire stack in Scala/on the JVM.

Now, should Twitter have spent the first 6 months of its life building out the
perfect infrastructure with proven tools, spending the little money it had
mainly on engineering, or was it justified in pushing that technical debt down
the road to focus on other things?

It seems to me, for most startups, that the marginal cost of building it
"right" today is much higher than rewriting when you can/if you need to.

~~~
jon-wood
Twitter was also originally built as a side project to amuse some friends. It
wouldn't surprise me if someone took the view that they were writing a stupid
little throwaway thing which everyone would probably get bored of, so why not
try learning this hot new web framework everyone's talking about.

------
cpprototypes
Node has a lot of flaws today. But, even with those flaws, it's very useful
for quick prototyping (discipline is required here since it must be a true
prototype which means throwing away the code) and command line scripting (for
me, it has mostly replaced python for small scripting tasks). Using it in a
production service is possible, but should be limited to simple services (not
too much business logic and it should be IO heavy).

But there is a lot of potential in the future. ES6 greatly improves the
Javascript language and it's possible ES7 or 8 will add types. If something
like TypeScript becomes standard Javascript and async/await is added, the
language will become a "serious" language for many developers.

I think the Node of the future will look nothing like the Node of today. When
Javascript/Node gains these features, it has the potential to become a true
server side language.

~~~
mark_l_watson
I agree that JavaScript will get better with ES7, etc. That said, I am halfway
through the edX TypeScript class, and it offers today a lot of what ES6 and
future releases will offer. I also really like ClojureScript, but TypeScript
is an incredibly well designed language.

------
vkjv
I've had some similar experiences as we've scaled from internal only
applications a couple years ago, to handling thousands of requests a second at
peak.

1\. Event loop. Yep, always be careful not to block it. But I think this is
more of a tuning thing. If you're using Java, you figure out how many requests
you can reasonably handle and make that your high-water mark. Do the same
thing in Node: if you take too many connections, start refusing them fast
rather than queueing them forever.

2\. Exceptions. Honestly this has never been an issue. We've used Promises for
async flow from the beginning. This means that there is nearly zero code that
is not inside of a promise chain and hence inside of a try {} catch {}. We
just don't have this problem and we don't crash.

3 / 4\. Oddly enough, I found streams to be a solution to this problem and not
a cause, if using back pressure properly. I highly recommend highland.js.

------
drapper
Some good stuff there, but I thought "don't do any heavy processing in a
consumer-facing node.js instance" was part of Node 101.

------
moonlighter
"To further avoid event loop problems entirely, we’ve started switching more
of our data processing services to Go".

This seems to become a common pattern... start with Node, hit some limits,
rewrite microservices in Go.

------
dreamdu5t
The one problem they use as an example wasn't solved by switching to Go. Go is
great but let's be real, parsing and concurrency problems don't just magically
go away by switching languages.

~~~
moonlighter
Concurrency issues don't magically go away by switching to another language,
but they have a much better chance of getting cleanly addressed by switching
to a language which has explicit/native features as part of its design to help
address concurrency issues, via goroutines/channels in this case.

~~~
mietek
_> by switching to a language which has explicit/native features as part of
its design to help address concurrency issues_

…such as Haskell or Erlang.

------
eva1984
Just curious, though: I have seen quite a number of posts in the past year
mentioning a switch from Node to Go. Is there a pattern here? Why Go in
particular?

~~~
mhogomchungu
> Why Go in particular?

It's the new shiny thing.

~~~
Kurtz79
I think go and node are more or less of the same "age" and "shininess"?

EDIT: Very much so, they were actually released to the public in 2009.

~~~
Macha
Ruby was first released in the mid-90s and became the shiny new thing around
2005, when it was ten years old.

Sometimes the shiny newness is in the public eye rather than in absolute
terms.

~~~
Kurtz79
Not really, Ruby ON RAILS became the shiny new thing, and was released in
2005.

------
davidw
> At any given time there’s only a single running code block.

> But here… there be dragons.

That's one of the things Erlang solves pretty well. Sometimes I sort of
envision it as that robot dog/mule thing that they keep kicking and it gets
back up and keeps going.

[https://www.youtube.com/watch?v=cNZPRsrwumQ](https://www.youtube.com/watch?v=cNZPRsrwumQ)
if you have never seen it before.

~~~
perishabledave
Segment.io seems like the type of challenge that would be right in Erlang's
sweet spot. The talk that Whatsapp gave on scaling Erlang is a good proof
point that it's built to handle these systems quite well.

------
eeZi
PyPy (a JITed Python interpreter) is experimenting with STM to solve the
issue:

[http://morepypy.blogspot.de/2014/11/tornado-without-gil-on-pypy-stm.html](http://morepypy.blogspot.de/2014/11/tornado-without-gil-on-pypy-stm.html)

------
vosper
Their first problem was sending customer requests directly into processing.
They're collecting customer metrics - they should never be unable to handle a
request because another request has consumed all the resources.

A much better way to do this would be to have a very lightweight API putting
events into a stream (Kafka or, since they seem to be on AWS, Kinesis [1]).
Let the stream absorb that crazy customer data, and let your data processing
run full-speed from the stream. You'll get blips and slowdowns, but they won't
affect your ability to receive more data. Log any errors or malformed data
so that customers can see the problems. Do your profiling and optimisation,
but avoid losing data.
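A toy sketch of that shape, with an in-memory array standing in for the durable stream (all names here are illustrative, not anyone's actual API): the ingestion path only enqueues and acknowledges, while a separate consumer drains at its own pace and logs bad data instead of failing at the edge:

```javascript
// In-memory array standing in for a durable stream (Kafka/Kinesis).
const queue = [];

// Lightweight ingestion path: enqueue and acknowledge immediately.
function ingest(rawEvent) {
  queue.push({ receivedAt: Date.now(), payload: rawEvent });
  return { status: 202 }; // accepted; processing happens elsewhere
}

// Separate consumer drains the stream at its own pace; malformed data is
// logged for the customer rather than breaking ingestion.
function drain(handler) {
  while (queue.length > 0) {
    const item = queue.shift();
    try {
      handler(item.payload);
    } catch (err) {
      console.error('malformed event, logged and skipped:', err.message);
    }
  }
}

ingest('{"user":"a"}');
ingest('not json'); // bad data is absorbed at the edge, not rejected
drain((raw) => console.log('processed user:', JSON.parse(raw).user));
```

With a real stream the two halves would be separate processes, so a slow or crashing consumer never blocks the ingest API.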

[1] We're using Kinesis. Very easy to provision, does what it says on the box,
and can easily handle thousands of requests per second.

~~~
15155
Yep. This technology is too cheap/readily available to "do it wrong."

Let the Kafka project handle the nuances of durable, performant queueing -
that's not your business model.

------
TazeTSchnitzel
Arbitrarily limiting JSON size/nesting because it blocks your event loop seems
silly. What if a customer needs to send you something that takes a while to
parse?

Do the parsing in a worker.

------
Kiro
OT but can someone explain this?

PHP: Starts reading from database, blocks everything else, returns data.

Node: Starts reading from database, lets it go and continues with other stuff
until the data is ready and then returns it.

I understand it in theory but since Node is single-threaded, doesn't it need
to use that thread for the database operations? Which means it's blocking the
program until it's done anyway?

~~~
taternuts
I believe what happens is that it registers the function's callback on the
event loop, which it'll check the next time around to see if it's done, then
pushes the work to be handled in a separate pool of threads that libuv
manages. When the work is done, on the next event-loop tick (or the next time
around) it will execute the callback with the results. This is how
asynchronous functions work anyway, but some functions are actually
synchronous, such as JSON.stringify - so if you are doing a JSON.stringify on
a huge JSON object, you're literally grinding everything to a halt until it's
done.

~~~
Kiro
Thanks. So Node is actually using multiple threads behind the scenes or did I
read that wrong? How do I know which kind of operations are pushed there by
libuv?

~~~
taternuts
Yeah, there's a pool of C++ threads that do the work behind the scenes. Any
async operation gets pushed out, and honestly most functions behave
asynchronously. JSON parsing is a sync operation that usually doesn't block
for very long, because it will be a small operation 99.9% of the time, so it's
kind of a pitfall: you almost never see this blocking issue unless you're
parsing huge JSON objects. In nodeland things are expected to be asynchronous,
and if they are not they should be labeled as such so that everyone using them
knows. Check out the filesystem (fs) node.js lib - you'll see calls explicitly
labeled "sync" to denote that they are blocking.

------
aioprisan
Event loop debugging is key, and I'm glad there are some interesting services
offering profiling at this level, especially for frameworks like Meteor:
[https://kadira.io/](https://kadira.io/)

------
failedstartup2
Node, like Rails before it, was an enabler tech - it enabled non-developers to
build stuff. Those with experience would have gone direct to something like
Erlang in the first place, carefully sidestepping the dog that is Go.

------
lobster_johnson
Anyone else thinking this sort of blog post is actually really bad PR?

There are some very obvious architectural issues here, issues that _aren't_
tied to the choice of Node.js. To be perfectly blunt, I wouldn't want to use
a provider whose stack was this immature.

For example, an API shouldn't do any of the processing they describe. The
API's role is to handle submitted requests as fast as possible, so it must
off-load the actual work to a queue that can be processed elsewhere.

I'm also skeptical of solutions like YAL, at least the way it seems to be used
here. IMHO you want local spooling via something like rsyslog, so that you're
impervious to network failures. Logging directly to a remote server means you
become completely reliant on that server.

------
swang
I wonder why they don't use a stream JSON parser like oboe.js?

------
vans
"Simple exceptions should be caught using a linter. There’s no reason to have
bugs for undefined vars when they could be caught with some basic automation
[...} It catches unhandled errors and undefined variables before they even get
pushed to production. " And there are people still thinking untyped languages
are good for the server...

"Node is good when you're using a linter, when you're writing your back end in
go, when you write MASSIVE unit tests..." May be it could be much simpler to
use a more suitable langage, no ? Just saying...
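To illustrate the kind of bug being discussed (the `total` function here is invented, and intentionally broken): a linter or type checker flags the undefined variable without ever running the code, while plain JavaScript only fails when that code path actually executes:

```javascript
// Intentionally buggy: "sumt" is a typo for "sum". A linter (or TypeScript)
// flags the undefined variable statically; plain JavaScript throws a
// ReferenceError only when this function is called.
function total(items) {
  let sum = 0;
  for (const item of items) {
    sum += item.price;
  }
  return sumt; // ReferenceError at runtime; caught statically by a linter
}
```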

