
Node.js setInterval callback function unexpected halt - luu
https://github.com/nodejs/node/issues/22149
======
pkulak
Ouch, not a good look when someone keeps mentioning that you should never
expect a month of uptime.

~~~
ddtaylor
I am impressed the reporter of the bug kept his cool and did all the debugging
work for them using libfaketime while they blobiated about code not supposed
to be running for 25 days and being unable to reproduce anything.

~~~
jessaustin
Eh, I've seen worse. Overall it seems everyone was pretty cooperative. We're
just lucky this wasn't an _npm_ bug...

------
chatmasta
It’s amusing to see the bottom of issue threads like this grow with messages
“issue was referenced from another project.” It would be cool to generate a
graph of which projects have issues linking to each other. I suppose it would
be sort of like a dependency graph, but one that only includes the most
fragile dependencies. :)

------
thecatspaw
> Note that what I meant by that isn't that your services aren't expected to
> be up for a month. It's that redundancy is expected (2+ instances behind a
> load balancer) so servers are resilient to failure.

I guess thats why he mentioned gameservers specifically, as those often depend
on being a single node

~~~
sp332
And if you boot all your redundant servers at the same time, they're all going
to go down pretty close to the same time as well.

~~~
pvorb
And you also need to know about this bug in order to make your server go down
after 2^31 ms. As stated in the Github issue, the service remained otherwise
functional.

~~~
NullPrefix
So the lesson learned: Dont' check if the service is up, just restart it
anyways.

------
kryptiskt
Here is the commit that fixes it:
[https://github.com/nodejs/node/pull/22214](https://github.com/nodejs/node/pull/22214)

------
samspenc
TL;DR from the GitHub discussion: had to do with an integer overflow related
to oxffffff hex string lengths and unsigned ints, introduced in one of the
Node 10.x versions. This is the pull request that fixes it:
[https://github.com/nodejs/node/pull/22214](https://github.com/nodejs/node/pull/22214)

------
trias
doesn't this fix mean it crashes after 2^32 milliseconds now?

~~~
moefh
No, it only uses an int if the value is less than 0xffffffff, otherwise it
uses a double.

That means it's ok up to 2^53 milliseconds, at which point it starts losing
precision (you can't represent every integer above 2^53 as a double, it gets
rounded).

It doesn't really matter because no computer that exists today will still be
running in 2^53 milliseconds.

------
foreigner
AFAICT hasn't the fix just kicked the can down the road to 2^64 milliseconds?
Future NodeJS coders 585 million years from now are going to be cursing at
us...

------
pvtmert
exactly same thing i thought when i see some timer bug (not recurring events)

i actually came to this bug couple times (nw.js user here) i re-initialize
timers once in a week. :/

------
dmitrygr
Windows 95 had a similar bug

[https://www.cnet.com/news/windows-may-crash-
after-49-7-days/](https://www.cnet.com/news/windows-may-crash-
after-49-7-days/)

------
_pmf_
Seems similar to this beauty:
[https://github.com/nodejs/node/issues/12115](https://github.com/nodejs/node/issues/12115)

~~~
canhascodez
You seem to be making some sort of negative judgment about that particular
issue. Perhaps you'd care to give us the benefit of your reasoning.

~~~
_pmf_
> You seem to be making some sort of negative judgment about that particular
> issue. Perhaps you'd care to give us the benefit of your reasoning.

A freshman learns not to store data in floating point data types if the
significant digits of the data's domain exceeds the floating points guaranteed
1:1 domain.

Some maintainers of node.js's file system module did not (or did not care
about it)[ _].

Given that JS developers are hailed as the pinnacle of software engineering,
this strikes me as odd.

[_] "But JS does not have a proper 64 bit datatype / BigInts" ... that's not
an argument

------
congresstart
Incorrect title, should be 2^31 milliseconds

~~~
femto113
Which is about 25 days, or roughly a month, either of which would make a
better title. I read the article hoping to discover how they ran a test that
was 60 years long...

~~~
rtkwe
I don't think so, seeing 2^31 (units) immediately tells me what's probably
going wrong in a way that 25 days or ~1 month doesn't unless I go and convert
that into milliseconds.

------
truth_seeker
Its a feature rather than a bug ;)

Can't think why someone would do that. Such a large number of milliseconds
just not practical.

~~~
olliej
Long runtime apps, calendar-esque scheduling, etc al@ seem like things that
could reasonably* hit that

* specifically they could reasonably end up with time steps >= 2^31ms. Even if the design isn’t ideal it should still work

~~~
avip
That does not sound like a sound implementation.

~~~
zamalek
The bug occurs if your process reaches ~25 days, period. Regardless of the
interval that you set - it could be 1ms and still fail. It has been 'fixed'
and is now ~49 days, but setTimeout does not throw and so you have to rely on
your process crashing due to some other condition.

If you have setTimeout anywhere in your code (including packages), you will
need to force it to crash/exit once a month.

The implementation of setTimeout in Node is not sound.

------
justicezyx
So this is like a reminder to anyone who always reminder others that you do
not need to solve problems like Google has.

------
baybal2
Well, so does the settimeout. A lot of underlying code is limited to signed 32
bit integers as per prehistoric convention.

JS committee should consider turning timer values into js native signed double

~~~
tomxor
> cumulative running time is not needed to calculate when next to call the
> interval

I don't see how that solves anything, you are just defining a new limit to be
broken, which would occur when the timer resolution is next increased if not
in practice.

It makes sense that either the actual interval or timeout periods are limited
by the underlying types used to store them... but I see no intrinsic reason
why cumulative time that setInterval runs need be limited by any underlying
type (cumulative running time is not needed to calculate when next to call the
interval).

~~~
baybal2
>I don't see how that solves anything, you are just defining a new limit to be
broken, which would occur when the timer resolution is next increased if not
in practice.

I think that having some uniformity in a zoo of rolloverable values will make
life easier to people who don't read the language spec meticulously to the
letter.

And yes, by any extend of human imagination, no programming language can
justify implemention of a plain periodic timer through (oldTime - nowTime) %
interval.

A timer should be a timer, and modern opetating sysyems provide developers
with a whole arsenal of high level timer abstractions just for those people to
not to write their own hacks.

------
jschulenklopper
Would someone seriously encounter a defect like that? Who would set a function
to run after ~25 days (2^31 milliseconds) and expect that that process still
runs after all those days? You're not safeguarding yourself against the very
likely situation that processes could stop over such a long period.

Also, considering the length of that period, it's most likely that the delay
for the function-to-be-called is from a business (and not some technical)
requirement. So, a more fitting design would be to set up a delayed job for
asynchronously executing long-running or later tasks in the background by a
scheduler.

~~~
tobr
Conceptually they are not scheduling a function to be run after a month, but
scheduling it to run twice per second as long as the process is running. When
the process is still running a month later, it is very unexpected that the
scheduling would suddenly stop.

~~~
jschulenklopper
My bad, perhaps because of the original (or confusing) post title earlier. A
scheduling process that stops after 25 days is indeed a more realistic bug
than a function planned for execution in 25 days not being called.

