
Windows 3.1 All Over Again - dsego
https://tomjoro.github.io/2017-02-03-why-reactive-fp-sucks/
======
Animats
The real "Windows 3.1 all over again" experience is mixing functions which
block with ones that are "async". If nothing blocks for long, as with
Javascript, that's fine. If everything is threaded, and threads block, that's
fine too. What works badly is mixing the two. If you block something that's
doing cooperative multitasking, the whole system stalls. This is still a big
problem inside browsers.

Python has this problem with "async". If you make a blocking call, you stall
out the "async" stuff. Go gets around this by having a "fiber" or "green
thread" mechanism underneath. If you block on a lock, that doesn't tie up a
resource needed to run other goroutines. This is why retrofitting async to a
language has problems.
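
The stall is easy to demonstrate in any event-loop runtime. A minimal Node sketch (timings are illustrative):

```javascript
// A callback that is due to run "immediately".
let fired = false;
setTimeout(() => { fired = true; }, 0);

// A "blocking" call: a synchronous busy-wait standing in for a blocking
// file read, DNS lookup, etc. Nothing else can run while this spins.
const start = Date.now();
while (Date.now() - start < 100) { /* spin */ }

// The timer expired ~100ms ago, but its callback STILL has not run,
// because the event loop never got control back.
console.log(fired); // false
```

One blocking call anywhere in the chain and every pending callback waits behind it.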

~~~
vog
Note that Python has greenlet and gevent, though. Gevent works by monkeypatching
all blocking calls in the standard library (and common external libraries, such
as database client libs like psycopg2).

~~~
Animats
That's not a standard part of the new "async" feature. It's a leftover from
the Stackless spinoff of Python.

------
gizzlon
This feels like Elixir propaganda =/

Starts off nicely with a good premise, but instead of digging into the
interesting details it glosses over them and starts to preach Elixir.

I don't have any stake in this, I use neither js or Elixir, but this article
is too shallow to be really useful.

~~~
ENGNR
Yeah, he takes a bit of a stab at Scala Play, which as far as I can tell uses
the same concurrency model (actors) as Erlang in the latest release anyway.

------
Lerc
I tend to think of it as a pre-emption issue more than a concurrency or
parallelism issue. Endless loops cause hangs. Being able to pre-empt things
would solve a lot of the problem.

    
    
        10 print "mike is cool"
        20 goto 10
    

simply doesn't have a good analogue in JavaScript. Even in Web Workers, a loop
like that causes the worker to be unresponsive to events. The Shared Memory
addition allows a rudimentary form of this by allowing a flag that an aware
program can check and exit on. It wouldn't be too hard to have a BASIC-to-worker compiler that
compiled the above code to

    
    
        while (true) {
           print("mike is cool");
           if (shared_global_flag === true) break;
        }
    

There are a few JavaScript livecoding websites out there that update the code
as you type. There is a problem with this: broken JavaScript (as half-written
code tends to be) doesn't just fail to work, it breaks other things.

for example

    
    
        var x=1;
        while (x<10) {
          output(x);
          x+=1;
        }
    

Typing this into a livecoding environment breaks because the while loop goes
on forever until you type the code to increment x. It breaks everything
because the while loop blocks you from being able to type the code that would
get you out of the situation.

I don't see the wisdom of disallowing something due to the specter of race
conditions when its absence causes a debilitating problem that is much easier
to trigger.

~~~
greggman
Apparently CodePen runs the code through a parser and inserts exit checks in
all detectable loops.

[https://blog.codepen.io/2016/06/08/can-adjust-infinite-loop-...](https://blog.codepen.io/2016/06/08/can-adjust-infinite-loop-protection-timing/)

~~~
greggman
As for why no races: I think it's because it's basically impossible (or way too
non-performant) to write a JavaScript engine that can handle multiple threads
deleting properties on objects. You'd have to lock on every delete.

~~~
Lerc
That's the nature of JIT though: non-performant but working is OK for the bits
that might cause trouble. JavaScript should just make fast the things it can.

I'd be OK with a light pre-emption that would happen between JavaScript
statements. Instead of genuine pre-emption, have the JavaScript engine include
break checks. A check of a flag and a conditional branch (usually not taken) is
not a huge expense. The cost can be especially ameliorated if you do the check
only on statements that jump to earlier statements.

That fixes the problem of long-running code stopping other things. It doesn't
need system-level threads or parallelism, just a JS engine doing spot checks.

------
BillinghamJ
Author seems to slightly miss the point of Node. It’s designed for IO-bound
work, and essentially nothing else.

If you have a long-running synchronous algorithm, you should not be running it
in Node, or alternatively you could dispatch to another process/C lib and have
it run in a true thread and asynchronously wait for the result.

~~~
jcelerier
> Author seems to slightly miss the point of Node. It’s designed for IO-bound
> work, and essentially nothing else.

No remotely relevant app is only ever IO-bound. There are always some long
lists to sort, some graphs to walk, some intricate canvas renderings to do.

~~~
zbentley
I'm not sure about that assertion.

I work on a giant [Health|Fin|Defense]Tech monolith which has been around
forever, has to do everything for everyone, and has been worked on by hundreds
or thousands of developers with radically different skill levels. It connects
to many databases, external services, etc., and does some immensely complex
data munging just to render what you'd think are simple pages (since the
inflexibility of the backing model and limited space to make denormalizations
of really big data mean everyone has to do super complex aggregations in the
app server, across data sources X, Y, and Z, all to show the user a 10 row
table).

In short, it's huge, ugly, and computationally expensive.

I was asked to quickly research the benefits of switching its platform (a
single-threaded scripting language) to NodeJS (I wasn't told to research
anything other than NodeJS, despite my objections).

I figured the savings would be minimal, since all our application servers are
constantly running out of CPU (page loads crazy expensive, see above) or mem
(aggregations crazy expensive, see above).

So I broke down what the app was doing on some representative servers, working
both from the coarse level (dtrace/system resource usage) to the fine level
(flame graphs of calls/wait time/yield events within the application runtime
itself). I didn't profile the batch processing services; they were RPC'd to
via the renderers, and used more appropriate languages/patterns for huge-data
manipulations. As far as my profiling was concerned, they were functionally
databases.

The result? On average, 88% of time was spent waiting on IO or on blocking
non-block-file system calls. P90 was 99% blocking.

That went totally against my assumptions.

Sure, our webservers were overloaded with non-IO load, but if we were to
switch to non-blocking IO and buy more webservers, we'd have gotten a massive
performance increase _without_ having to change the fundamental architecture
of our webapp.

That was when I started seriously considering the benefits of reactive-style
programming in a single thread, a la NodeJS; it hits a nice balance between
"programmers that aren't necessarily super skilled having to engage with a
full/real concurrency system" and "do everything blocking one per process".

There are tons of downsides, of course. Switching to nonblocking IO after
spending so long in a blocking world would require both massive technical
expenditure, and would probably also require reorganizing the capacity
planning of all the other services/databases the app servers talked to, since
they'd be fielding a lot more requests. Basically, the blocking nature of the
render loops was an informal rate limiter on database queries. Parallelizing
the render loops via processes gave much more direct control of changes in
resource utilization, which is nice for proactive scaling. Additionally,
node/callback style is still harder to learn (even with async/await sugar)
than plain ol' top-to-bottom sequential code. All that said, we'd be rewriting
code in a new platform that _looked_ different, but the code could still _do
the same things, per render, in the same order_ , which is a huge benefit.

A platform that hides preemption/concurrency while allowing people to program
in the sequential style (e.g. Erlang) _might_ have been a better fit, but . .
. we were already using one of the best M:N resource schedulers in the world,
the Linux process scheduler, to multiplex concurrent sequential processes that
were just . . . linux processes. At the end of the day, I gained a lot of
respect for the power and balance struck by single-thread/event-loop-driven
reactive runtimes like Node.

Edits: grammr.

~~~
le-mark
_So I broke down what the app was doing on some representative servers,
working both from the coarse level (dtrace /system resource usage) to the fine
level (flame graphs of calls/wait time/yield events within the application
runtime itself). I didn't profile the batch processing services; they were
RPC'd to via the renderers, and used more appropriate languages/patterns for
huge-data manipulations._

That's very interesting; you were given a huge old legacy app that had scaling
issues (cpu and memory). Presumably the business was tired of throwing
hardware at it? Or they hadn't gotten to that stage yet? Continuing, and
tasked with diagnosing the performance problems you looked at the system and
application. I can read up on dtrace, but how did you profile the
application-level stuff (time/yield)? Was it some functionality provided by
the runtime, like Java VisualVM?

This is a problem many, many companies have: ill-performing legacy apps that
the "legacy" staff aren't capable of handling (because the talented people
left long ago, i.e. "don't move my cheese"). It'd be really educational to see
a write-up of this!

~~~
zbentley
It broke down roughly like this. I can't write it up, and am being vague,
because I don't wanna get yelled at, sorry. Googling the below techniques will
get you started, though.

1\. Simple resource usage (system time vs. user time, memory, etc.) got the
metrics for how long the OS thought the app was spending waiting on IO.

2\. Dtrace was able to slice those up by where/how they were being called,
which syscalls were being made, and what was being passed as arguments. This
was important for filtering out syscalls that would remain a constant, high
cost (e.g. blocking local file operations on old versions of Linux, which we
have, get farmed out to a thread pool in NodeJS, so I pessimistically budgeted
as if that thread pool were constantly exhausted due to volume + filesystem
overuse).

3\. In-runtime profiling. We have the equivalent of java visual VM (well, more
primitive; more like jstack plus some additional in-house nice features we
built, like speculative replay), but for our scripting language platform. That
generated flame graphs and call charts for processes. Those were somewhat
useful, but tended to fall into black boxes where things called into native
code libraries, which was where the dtrace-based filtering data was able to
help disambiguate. Using this we got a comprehensive map of "time spent
_actually waiting_ for IO".

There was a lot more to it than that, though:

Since different syscalls both had different call overhead (and different call
overhead depending on arguments supplied) and different blocking times, all 3
steps were necessary.

For example, an old monolithic chunk of code that did ten sequential queries
to already-open database connections is going to issue select(2) (or epoll or
whatever) at least ten times. Converting to Node, with its single-poll-per-tick
model, would vastly reduce that cost, moving the performance needle a lot. Of
course, that's only true if the ten queries in
question can _actually_ be parallelized, which typically requires
understanding the code . . . if it can be understood, which is not a given.

However, a page render that called ten different HTTP services would make ten
full-cost connect(2) calls in the worst case, ten low-cost (keepalive'd)
connect calls in the best case. Node would still have to make those same ten
calls, making it a less needle-moving thing to move into nonblocking IO
(though the time spent waiting for connect to complete or time out would still
not be paid directly in the render, which had to be accounted for as a
positive). And it goes deeper: depending on the services being hit, keepalive
window, and rate at which they were called during a typical server's render
operations, we had to calculate how often, say, a 50-process appserver worker
pool would be _redundantly_ connecting to those services (because separate
sibling processes can't share the sockets if they're initiating the
connection, and before you ask, I would not like to add to the chaos by
passing open file descriptors over unix sockets between uncoordinated
processes thank you very much). If the redundant connect rate was high, Node
might offer significant savings by allowing keepalive'd sharing of connections
within a single node process (we'd need many fewer of those per server than
appserver workers). If it was low, fewer savings.

TL;DR it's complicated but possible to measure this data using established
techniques. You don't have to get super far down the rabbit hole to get a
_decent_ guess as to whether it will be beneficial to performance, but
transforming _decent_ into _good_ requires a fairly thorough tour of Linux
internals.

And, as always, the decision of whether or not to switch hinged primarily on
the engineering time required, not the benefits of switching. C'est la vie.

------
baybal2
The author seems to lose the point of FP, as is the case with many web
programmers.

Async != FP

Back in my high school days, FP was almost strictly about usage of pure
functions above anything else.

The latecomers to the party, the React guys, then bunched FP together with
async stuff and function polymorphism. That was wholly foreign to the original
FP idea.

I can still quote my high school teacher citing Niklaus Wirth: "If a function
does not interact with external data in any way other than reading it from its
inputs, the whole program can be represented as a spreadsheet."

The original advantages were stated as:

1\. a program written in this manner is very easy to optimize

2\. an interpreter or a compiler executing such a program is easier to write,
and is less error prone

3\. for as long as some kind of middleware sits between the "spreadsheet
function" and the data, it is possible to write a continuous execution log
with ease, as well as to do tricks with getters/setters to start execution
from arbitrary points in the code

All of that is stuff from times when Oberon and Modula were "hip" (seventies)

~~~
pjmlp
Oberon was designed in 1986 and it was hip up to around 1998. :)

------
andrewguenther
> “what else am I supposed to do when I’m waiting for a result from the
> database”? I can’t do anything else until I have that result. I need the
> result to make give to a template to give to the renderer, etc.

I dunno, maybe you could serve one of your 100 other concurrent requests?

> You can do something useful in the meanwhile and get the result later

and then immediately...

> It has nothing to do with hogging or stealing resources.

It has everything to do with this. If you're blocking on something waiting for
a result while you could do something useful in the meantime, that is the
definition of hogging resources.

~~~
christophilus
His point is that blocking while waiting should not be an expensive thing. In
preemptive systems like BEAM, it's not. Blocking a process is not an issue. In
Node/Windows 3.1, it is a big deal, as it means starving other work.

~~~
pas
Await works the same way as BEAM, no?

In theory their performance and behavior should be the same. BEAM and other
N:M green thread systems being maybe faster if they can efficiently use
multiple threads. But since they use message passing anyway (needed for
isolation), they are again almost equivalent to running multiple Node/reactive
request handlers.

N:M is probably useful for desktop apps. But there a single thread is (or
should be) always enough, so the behavior is the same.

~~~
jerf
"In theory their performance and behavior should be the same."

This isn't about performance, it's about how you write code. In Erlang/Elixir
(and also Go and Haskell), you aren't sitting there constantly explaining to
the runtime over and over again how to plumb together bits of "async" code.
Essentially, all code is simply presumed async, and the plumbing is always
already there for you; the act of calling a function includes, for free, all
the "async" plumbing.

I expect anyone who has been doing this for a couple of months should have
noticed it's the _same_ plumbing, over and over again. Here's the long-running
IO task. Here's what to do with it when it's done. Here's what to do with
errors. Here's the next long-running IO task. Here's what to do with it when
it's done. Here's what to do with errors, which is pretty similar to the last
thing. Here's the next long-running IO task. Here's what to do with it when
it's done. Here's what to do with errors; whoops, forget that one, which is
gonna cost me during debugging but nothing in the code, runtime, or syntax
affordances particularly cares that I forgot. Here's the next long-running IO
task....

Yes, they do come out roughly the same in speed for Erlang/Elixir and JS, and
Go can quite easily be substantially _faster_... _and_ easier to write.

And if we want to discuss performance, Go still isn't as fast as it could be.
There's a solid reason why N:M threading can't be the fastest way to run code
in general, but I don't see much reason why it couldn't be something more like
20% away from C speed in general, with some pathological cases where it's much
slower and out of the question (very fast packet handling with those fancy
software TCP cards, for instance).

~~~
WorldMaker
async/await as mentioned by the previous poster is "do-notation" for the
asynchronous monad Promise<T> in modern ES/JS(/TS). It's not terribly
different from the languages you mentioned, even if it is a different approach
because the language started synchronous first. You don't have to do that much
plumbing as with the bad old JS world of Node callbacks, just sprinkle the
async/await as appropriate, make sure things stay in Promise<T> and write
things do-notation style just about like you would write the equivalent
synchronous code.
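
For illustration, the two styles side by side (`getUser`/`getOrders` are made-up stand-ins for real async calls):

```javascript
// Hypothetical async "IO" steps.
const getUser   = id   => Promise.resolve({ id, name: 'ada' });
const getOrders = user => Promise.resolve([{ userId: user.id, total: 42 }]);

// Callback/then style: the plumbing (nesting plus per-step error
// handling) is spelled out at every step.
function reportWithCallbacks(id, done) {
  getUser(id).then(
    user => getOrders(user).then(
      orders => done(null, `${user.name}: ${orders.length} order(s)`),
      err => done(err)),
    err => done(err));
}

// async/await, i.e. do-notation over Promise<T>: the same steps read
// like ordinary sequential code, and one try/catch covers every await.
async function report(id) {
  const user = await getUser(id);
  const orders = await getOrders(user);
  return `${user.name}: ${orders.length} order(s)`;
}

report(1).then(s => console.log(s)); // "ada: 1 order(s)"
```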

From what I've done async/await in ES (and C#) is easier to write than Go's
equivalent, but we're hitting your mileage may vary territory.

------
noncoml
Can we stop with the "Erlang/Elixir is slow" meme? For most web services it is
as fast as Go.

For my last side project I was evaluating Go and Erlang. Just to test how much
faster Go is, I quickly coded a simple HTTP server in both languages (I am not
an expert in either).

The server would get a request with large body, cache it to a file on the
disk, then load it in 128k chunks, get the hash of the chunk and save both the
digest and the 128k data to a database.

To my surprise, Erlang was consistently slightly faster for each request.
Happy to share the code snippets for both if anyone wants to audit/scrutinize.

Edit: Code for both:
[https://pastebin.com/aBQWqkG3](https://pastebin.com/aBQWqkG3)

On my laptop a single request (1.2MB) takes around 230ms for Erlang and 280ms
for Go. Not a scientific test, but it gives an idea.

~~~
lobster_johnson
Your Go code uses "defer" in a for loop, which could be a performance
bottleneck. The deferred function call doesn't run until the saveToDb function
returns. A simple fix would be to reuse the rows variable and put the defer
outside the loop.

I believe it might, depending on the database driver, also cause another
issue. Since you're not closing the rows right away, and not even consuming
the rows, this may force the driver to mark the internal connection as busy
until the defer runs, meaning the next database call would have to open a new
connection. Try closing the rows as soon as you can.

If you don't expect any results, you can also use Exec instead. Then there's
nothing to close.

~~~
noncoml
Just to update that you were right. Without the `defer`, Go is now as fast as
Erlang.

Exec vs Query didn't make a difference.

Yet another update: things get really interesting when I run _wrk_. I don’t
want to spoil it for you. I can let you try it ;)

~~~
egisspegis
Please spoil it for us. I would love to see results, but can't run wrk myself
at the moment.

~~~
noncoml
I get 2x more requests on erlang

~~~
noncoml
OK, I was unfair to golang again. Turns out it needs a bit more tuning
"db.SetMaxOpenConns(10)" and now golang and erlang are on par.

------
le-mark
This article is a fantastic, real-world overview of cooperative and preemptive
multitasking. Note that coroutines and fibers are essentially equivalent
because each can be implemented with the other[1]. Unfortunately this thread
(up to now) has been simply language tribalism. It helps if everyone is on the
same page with respect to technical terms before that all starts.

[1]
[https://stackoverflow.com/a/3325985/2561675](https://stackoverflow.com/a/3325985/2561675)

------
spc476
There are other concepts, like coroutines. At work, I use Lua coroutines to
handle network requests. Basically, any function that can block instead starts
the IO request, then yields the current coroutine. Yeah, the management code
was a bit of a pain to write, but you can write (using the BASIC example from
the article and not Lua, but the idea is similar):

    
    
        10 A = 1
        20 A = A + 1
        30 PRINT "HELLO " + A
        40 PRINT "YES"
        50 GOTO 20
    

where `PRINT` will do a non-blocking write, if there's anything left over, set
the IO descriptor to trigger when ready for writing, and yield(). It's in the
poll loop (select(), poll(), epoll(), kevent(), whatever the API is) to
schedule the proper coroutine.

Yes, this requires making sure the code doesn't do anything CPU intensive, but
for the application I have running, that is the case. And for me, using Lua
coroutines means the main business logic is still sequential, even in an
event-driven environment.

Also, because you call a function to yield execution, you don't need to save
the entire CPU state. For C on x86-64, this is just six registers:
[https://github.com/spc476/C-Coroutines/blob/314607cf058352be...](https://github.com/spc476/C-Coroutines/blob/314607cf058352bee922a4c0d1cfc0a37646d250/coroutine_yield-x86-64.asm#L85)
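
The same structure can be sketched in JavaScript with generators: the business logic is a generator that yields whenever it would block, and a driver (standing in for the poll loop) resumes each coroutine with its "IO" result. Everything here is toy code; the real version would park coroutines on descriptor readiness:

```javascript
const log = [];

// Sequential-looking business logic; each `yield` hands an IO request
// to the scheduler and suspends only THIS coroutine.
function* task(name) {
  const a = yield `read for ${name}`;
  log.push(`${name} got ${a}`);
  const b = yield `write for ${name}`;
  log.push(`${name} got ${b}`);
}

// Round-robin driver standing in for the select()/poll() loop. The
// "IO" completes instantly here to keep the sketch self-contained.
function run(generators) {
  const queue = generators.map(gen => ({ gen, input: undefined }));
  while (queue.length > 0) {
    const { gen, input } = queue.shift();
    const { value, done } = gen.next(input);
    if (!done) {
      queue.push({ gen, input: `result of [${value}]` });
    }
  }
}

run([task('A'), task('B')]);
console.log(log);
// The two "processes" interleave even though each is written sequentially:
// [ 'A got result of [read for A]',
//   'B got result of [read for B]',
//   'A got result of [write for A]',
//   'B got result of [write for B]' ]
```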

------
ComodoHacker
>In Elixir (thanks to Erlang) yo have true pre-emptive multi-tasking

>An Erlang process is not a system process. An Erlang process is a lightweight
process. But here’s a key difference: there is no way in Erlang for processes
to share memory

I'm confused. Is there real concurrency in Erlang? If yes, what do Erlang
processes use under the hood: system processes or threads? If threads, why
aren't they sharing memory?

~~~
ramchip
The VM has a set of threads called schedulers, all running in parallel. Each
scheduler has in turn a set of Erlang processes (possibly thousands) to run.
Every process has a pointer to its own separate chunk of heap where it
allocates its stuff.

The language has no concept of pointer, so there is no way to create a
reference to memory owned by another process. You can send a message to
another process, but internally the VM will copy the message to that process’
heap.

------
AndrewCHM
"I’ve heard a few reasons:" ... that you've considered, then cherry picked two
of them, so you can confirm your views?

To properly reason why the approach is wrong, shouldn't you consider all
significant reasons, including backwards compatibility as probably the biggest
one?

"New language that learns from the mistakes of languages before it" would
generally be better than "language and runtime that is keeping compatibility
with a programming language that was heavily rushed just to fill a feature
point for a web browser"

One might as well state that the grass is green and sky is blue, no?

------
norswap
With all due respect, "the reasons he heard" are bogus.

The only reasons there is no preemptive multithreading in Node is just that
the V8 interpreter is under GIL.

------
qualitytime
" 10 A = 1 20 A = A + 1 30 PRINT_NON_BLOCKING "HELLO" \+ "A", when done
callback { LINE 40 } 40 PRINT "YES" 50 GOTO 20 A common quiz for Javascript is
to ask: which is printed first? “YES” or “HELLO3”? "

Is that a mistake with "HELLO3"? Should it not be HELLO2? Or am I stupid?

~~~
vog
Please note that your formatting is totally screwed, making your fine comment
almost unreadable.

See also the "help" link next to the edit box:
[https://news.ycombinator.com/formatdoc](https://news.ycombinator.com/formatdoc)

------
z3t4
With NodeJS, using fs.readFile is basically the same as scaling across
multiple machines. NodeJS kinda forces you to learn how to manage concurrency,
e.g. what people call callback/promise/future hell.

------
chaostheory
> Concurrency is hard. If you want concurrency you have two choices -
> processes or threads, take your pick.

Sometimes there's a 3rd choice: actors. imo it's much easier to manage than
threads

~~~
lobster_johnson
In Erlang, actors are called "processes".

------
annon23
Yup... I think browsers need to come with some type of bytecode or VM that
supports concurrency so we can finally have a true app platform.

~~~
andrewguenther
Don't you already have this with WebWorkers? This isn't a limitation of the
environment per se as much as it is a limitation of the DOM.

~~~
pmontra
The DOM wants to be accessed sequentially by a single thread, so we ended up
saying single threading is good because we don't have alternatives and because
we're using the same frontend technology for backend jobs (V8). It's a kind of
Stockholm syndrome.

Sequential access to the DOM can be ok because we are the only user of the
browser. Single processing is not so ok on the backend because there could be
thousands of users there. We scale Ruby, Python and Node with multiple
processes (I'm doing it.) I'm also developing an Elixir application using
Phoenix. The approach is similar to writing Rails or Django code (I never sent
a single message in all the application) with the convenience of not having to
manage sidekiq or celery for background jobs (they're in the language) and
autoscaling.

------
chmod775
Can we stop repeating the meme that Node.JS is not multi-threaded?

True, your JS code all runs in a single thread, but all the heavy lifting is
behind asynchronous interfaces, delegating the work to a threadpool in the
background.

What a Node.JS application really is, is a supervisor thread (your JS code)
giving work to a bunch of worker threads. That doesn't sound very
single-threaded to me.

~~~
Mithaldu
Can you start a thread with a JS loop in node so you can have two loops
running at the same time doing CPU-heavy tasks?

If not, then calling it multi-threaded is misleading to a programmer who needs
such things, and instead you'll need to find a more accurate word.

~~~
chmod775
This will repeatedly run 100000 iterations of PBKDF2 in 10 threads:

    
    
        process.env.UV_THREADPOOL_SIZE = 10;
    
        function startOne() {
            const start = Date.now()
            require('crypto').pbkdf2(String(start), 'secretSalt', 100000, 512, 'sha512', (error, result) => {
                console.log(`${result.length}b took ${Date.now() - start}ms`);
                startOne();
            });
        }
    
        for (let i = 0; i < 10; i++) startOne();
    

You can use htop or another tool of your choice to confirm this is the case.

~~~
Mithaldu
Are the threads started by way of external process, thread started within a C
library, or started with an actual Node.js call i can write directly in
node.js code and also use to run literally arbitrary node.js code?

~~~
chmod775
If I'm going to assume that by "node.js" code, you actually mean JavaScript
code running within the V8 engine as a part of the whole that is Node.JS,
then: The answer is no, you cannot do that using pure JS and without using
(external) stuff like WebWorkers.

If you are talking more generally about anything we can run within Node.JS
that uses Node.JS user-facing APIs and works without any modifications to
Node.JS, the answer is yes. You can easily achieve that with node's native
module support, and I would encourage you to do so over committing the folly
of doing anything CPU intensive in JS.

That is if you manage to find any CPU intensive task that isn't already
handled by node built-ins or some library out there.

 _Edit:_

Now you may say: "Ahah! You can't use threads using [limited usage of node].
Also threads must be usable for arbitrary workloads in order to consider a
thing multi-threaded. Node.js is single-threaded!"

I just can't argue with that. It's just a matter of opinions now about what
makes a thing/environment single-threaded, and what makes it multi-threaded.

To me it's simple: [x] Thing uses multiple threads to perform work.

