> prior to 1.2, the Go linker was emitting a compressed line table, and the program would decompress it upon initialization at run-time. in Go 1.2, a decision was made to pre-expand the line table in the executable file into its final format suitable for direct use at run-time, without an additional decompression step.
This is a good choice I think, and the author of the article missed the
most important point: an uncompressed table uses less memory.
This sounds paradoxical, but if a table has to be expanded at runtime
then the whole expanded table has to be loaded into memory.
However if a table is part of the executable, the OS won't even load
it into memory unless it is used and will only page the bits into
memory that are used.
You see the same effect when you compress a binary with UPX (for
example) - the size on disk gets smaller, but because the entire
executable is decompressed into RAM rather than demand-paged in,
it uses more memory.
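If it helps to picture it, here's a minimal sketch (Go on Linux, using /proc/self/exe purely as a stand-in for "a table baked into the executable") of why the heap copy costs more than the mapping:

```go
// Minimal sketch, Linux assumed: contrast a table read (or decompressed)
// into the heap with the same bytes mmapped from a file. The heap copy is
// anonymous memory the kernel must keep (or swap out); the mapping is
// backed by the file and is only faulted in page by page as it is touched.
package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	f, err := os.Open("/proc/self/exe") // stand-in for a table inside the executable
	if err != nil {
		panic(err)
	}
	defer f.Close()

	fi, err := f.Stat()
	if err != nil {
		panic(err)
	}
	size := int(fi.Size())

	// Approach 1: load everything into the heap, the way a decompression
	// step at startup would. All `size` bytes are now resident.
	heapCopy := make([]byte, size)
	if _, err := f.ReadAt(heapCopy, 0); err != nil {
		panic(err)
	}

	// Approach 2: map the file. Nothing is read yet; each page is pulled
	// in from disk on first access and can be dropped again under memory
	// pressure, because the file itself backs it.
	mapped, err := syscall.Mmap(int(f.Fd()), 0, size,
		syscall.PROT_READ, syscall.MAP_PRIVATE)
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(mapped)

	// Touching one byte faults in a single page, not the whole table.
	fmt.Printf("heap copy: %d bytes resident; mapping: touched byte %d\n",
		len(heapCopy), mapped[0])
}
```

The file and sizes are arbitrary; the point is just that the mapping stays pay-per-page while the heap copy is all-or-nothing.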
If you decompress it to an mmapped file, it'll be one of the first things written out to disk under memory pressure anyway, and instantly available in normal situations.
With the ever-decreasing cost of flash and its ever-increasing speed relative to the CPU, compression isn't really worth what it used to be to startup times 10 years ago, though.
Swapping is fine for workstations and home computers. But high performance machines running in production environments will absolutely have swap disabled.
The performance difference between RAM and disk is not an acceptable tradeoff. RAM will be tightly provisioned and jobs will be killed rather than letting the machine OOM.
Memory-mapped files don't go to swap under pressure; they go back to the file they were mapped to. That's no different from the OS knowing your debug data is in the file but just not loaded into memory yet, which is the only other scenario here.
Generally yes. Swapping changes the perf characteristics of that process (and often any other process on the same machine) in unpredictable ways. It's better to have predictable process termination -- with instrumentation describing what went wrong, so capacity planning and resource quotas can be updated. The process failure would generally be compensated-for at a higher level, anyway.
> “process failure would generally be compensated-for at a higher level”
I don’t quite get it. What does it mean? You re-run the process at another time? On another machine?
I personally do have swap on all my production machines. I’m not relying on it, but it acts as a safety net in case unexpected memory usage sneaks into some jobs. This goes with a sane periodic review of host metrics, of course, to ensure safety nets do not become the norm.
I much prefer a slightly delayed job than a failed job.
I guess it depends on the complexity of your distributed system (assuming you’re operating one).
We prefer to have the job OOM-killed and retried elsewhere (which could be a completely different machine), and we have plenty of infrastructure that makes this trivial. This infrastructure also deals with other types of partial failure, such as complete machine failure.
As mentioned above, paging introduces strange perf behaviour. Which may not always be important, but if you're working under tight latency requirements then paging can push you over that boundary.
That sounds strange when we’re happy to see jobs die entirely (that screws latency). But the issue with paging is you have no idea when it’s gonna hit you, and may impact a job that’s behaving perfectly fine, except something got paged out by a badly behaving job.
Ultimately disabling paging is a really good tool for limiting the blast radius of bad behaviour, and making cause-and-effect highly correlated (oh look thing X just OOMed, probably means thing X consumed too much memory. Rather than thing Y has strange tail latency because thing X keeps consuming too much memory). It’s failing fast, but for memory rather than exceptions.
I get your point, and I think we agree here. I mainly wanted to argue against the original statement, which was quite broad: “high performance machines running in production environments will absolutely have swap disabled”. Would you agree to rephrase both our arguments by:
- paging incurs seemingly random performance degradation of processes and should be avoided
- if you have a form of task queue/job distribution system which handles automatic re-run, and can afford at no business cost to restart a process from scratch, then disabling swapping allows fail fast behaviour
- otherwise swapping can be used as a safety net for programs that would be better off slightly late than restarted from scratch
- both scenarios require sane monitoring of process behaviours, to catch symptomatic failures/restart in case 1) and recurring swap usage in case 2)
> if you're working under tight latency requirements then paging can push you over that boundary
Even if you're not under tight requirements, swap can do strange things. I've actually seen situations where hitting swap, even trivially, can cause massive increases in latency.
I'm talking about jobs which took 10s of milliseconds to complete now taking multiple 10s of seconds.
I've even seen some absurdly bad memory management where Linux will make very very poor choices about what to page out.
> Ultimately disabling paging is a really good tool for limiting the blast radius of bad behaviour
>> “process failure would generally be compensated-for at a higher level”
> I don’t quite get it. What does it mean? You re-run the process at another time? On another machine?
I expect the commenter means whatever thing kicked off this job will detect it failed and do something about it (try it again elsewhere, log an alert triggering an admin to go provision more capacity, etc.)
Their view is that a failure is easier to troubleshoot and fix than success with intermittently anomalous characteristics.
To be honest, what I really wanted here is to force the commenter on agreeing
that whatever the architecture/framework/etc., there is no magical solution,
and saying that it will "be compensated-for at a higher level" just hides the
only 2 real possibilities:
- Kill the process and restart it somewhere else (no swap)
- Wait for the process to finish anyway (using swap)
Both choices have their own pros and cons, as discussed in other comments. But
arguments such as "it's better to have predictable process termination", and
"under tight latency requirements then paging can push you over that boundary"
are exposed as much less meaningful once the problem is stated in simple terms.
If the process indeed uses more memory than planned, then you have no guarantee
that restarting it will work this time. Worst case, you may even have to
urgently allocate a new node to handle that 1MB of memory above what was
available. Not sure that is really the best solution "under tight latency
requirements".
My experience would indeed be the reverse: if you are faced with unexpected memory
usage growth, then having some swap on the side to allow your process to finish makes
for a much smoother production system than having to restart it all from scratch
somewhere else/at another time.
In my experience, production is not the place where you want to enforce ideological
concepts such as "a process shall not use more memory than what was planned".
Production is the place where you make compromises and account for everything that
slipped through these concepts, because you have no choice: it _has_ to run.
Staging would be a good place for this though, as it's much less critical if
something fails, and you have more time to fix and account for it.
I really don't want to debug/profile/release/deploy a program in a hurry just because
somehow it ended up using more memory than planned. (or convince infrastructure to
deploy a new node type ASAP to handle that new memory constraint).
Now one could argue that such unplanned memory consumption should have been caught
earlier in the development process, to which I would reply:
- Not everyone is able to properly test programs in a production-like environment,
especially when dealing with resource-hungry processes. In our case for instance
a typical staging machine would be a downscaled version of the production one (fewer
cores, less memory, less bandwidth).
- I tend to plan _also_ for what should _not_ happen (because it will happen). Even
with a perfect environment to calibrate programs before a release, there will be
releases in which the memory usage was not accounted for. Guaranteed.
I’m guessing this is a disagreement on the meaning of “reliable,” and there is room to disagree on that. From my perspective, if your failure model is “it has to run” then you don’t really have a failure model and are just hoping for the best. If you have the resources to improve on that approach and formalize expected failure, then OOM logs are another tool that can give you metrics from production while failing operationally. The real benefit is ease of debugging.
For those of us who build distributed systems, it’s all about number of modes we need to design for, test, and monitor. If all I have to worry about is process death, I can design my service around that. Monitoring for process death is generally pretty straightforward as well.
Gray failures (like process slowdowns), on the other hand, are fairly difficult to design for, and detect. And it can wreak havoc on distributed systems if one of the nodes suffers a gray failure.
It depends on the memory profile of that particular job of course, and the overall latency/throughput requirements of the application. Swap memory being as slow as it is I've experienced few jobs that can reasonably handle it.
Virtual memory works even without swap enabled. Since the mapped file is the binary, and code is never changed after loading, the OS can simply reclaim the pages backing that memory. When there is a page fault, the page is brought back in from the binary.
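If you want to watch that happen, here's a rough sketch (Linux assumed; pass it the path of some large file that hasn't been read recently): mincore(2) reports which pages of the mapping are currently in the page cache, and touching a byte only brings in that one page, which the kernel can later evict and re-fault from the file without any swap.

```go
// Rough sketch, Linux assumed. Usage: go run main.go /path/to/some/large/file
// Shows how many pages of a read-only file mapping are resident before and
// after touching a single byte.
package main

import (
	"fmt"
	"os"
	"syscall"
)

// residentPages counts how many pages of the mapping are currently in RAM,
// according to mincore(2).
func residentPages(b []byte) int {
	pageSize := os.Getpagesize()
	vec := make([]byte, (len(b)+pageSize-1)/pageSize)
	if err := syscall.Mincore(b, vec); err != nil {
		panic(err)
	}
	n := 0
	for _, v := range vec {
		if v&1 != 0 { // low bit set = page currently resident
			n++
		}
	}
	return n
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: demo <large file>")
		os.Exit(1)
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		panic(err)
	}
	defer f.Close()
	fi, err := f.Stat()
	if err != nil {
		panic(err)
	}

	m, err := syscall.Mmap(int(f.Fd()), 0, int(fi.Size()),
		syscall.PROT_READ, syscall.MAP_PRIVATE)
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(m)

	total := (len(m) + os.Getpagesize() - 1) / os.Getpagesize()
	fmt.Printf("%d of %d pages resident before touching anything\n",
		residentPages(m), total)
	_ = m[len(m)/2] // fault in one page in the middle of the file
	fmt.Printf("%d of %d pages resident after touching one byte\n",
		residentPages(m), total)
}
```

(The "before" count depends on what's already in the page cache, which is exactly the point: the kernel decides what stays resident, and the file, not swap, backs whatever it drops.)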
Reliable software has used swapping for years. Artificially imposing a low virtual memory ceiling by disabling swap is just austerity for almost no benefit.
Paging into swap often has almost no perceptible effect in regular usage unless you are running software with microsecond guarantees on infrequently accessed processes (those that get swapped out).
In 'modern' microservice architectures, this is not true. If you look at the k8s approach, reliable software is created through redundancy. A part of the app being killed by the OOM shouldn't matter, it should automatically be rescheduled on another node.
Kubernetes as a platform recommends disabling swap completely, and you have to explicitly allow nodes to have a swap, otherwise it fails.
This is sane behaviour if you're dealing with a large cluster with a complex architecture that no single person could or should know all the ins and outs of. There is no "let's log on to the machine and see what's happening" when dealing with these types of architectures, even at smaller scale.
And a massive part of Go's target is exactly these modern architectures/workloads...
On Linux there is a best-of-both-worlds approach. You can use zram for swap and a userspace OOM killer like earlyoom. If processes start using a bit more memory or leaking, they won't get killed and nothing will slow down much, but they will get killed if things go far enough to cause performance problems.
No, there's no need for a tradeoff at all in a properly implemented compiler. You can certainly compress a table in memory and still have that table be indexable at runtime. Entropy coding is not the only type of compression that exists. DWARF solved this a decade ago.
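To be clear, this is just an illustration of the general idea, not Go's pclntab or DWARF's actual encoding: compress the table in fixed-size blocks, keep a block index, and have each lookup decompress only the block it needs.

```go
// Sketch of a compressed-but-indexable table: entries are stored as
// zlib-compressed blocks of fixed entry count, so a lookup touches and
// decompresses exactly one block. Block size and entry type are arbitrary.
package main

import (
	"bytes"
	"compress/zlib"
	"encoding/binary"
	"fmt"
	"io"
)

const blockEntries = 1024 // entries per compressed block (chosen arbitrarily)

type compressedTable struct {
	blocks [][]byte // each element is a zlib-compressed run of uint32 entries
}

// build splits the entries into blocks and compresses each one.
func build(entries []uint32) *compressedTable {
	t := &compressedTable{}
	for i := 0; i < len(entries); i += blockEntries {
		end := i + blockEntries
		if end > len(entries) {
			end = len(entries)
		}
		var raw bytes.Buffer
		binary.Write(&raw, binary.LittleEndian, entries[i:end])
		var comp bytes.Buffer
		w := zlib.NewWriter(&comp)
		w.Write(raw.Bytes())
		w.Close()
		t.blocks = append(t.blocks, comp.Bytes())
	}
	return t
}

// lookup decompresses only the block containing index i.
func (t *compressedTable) lookup(i int) uint32 {
	r, err := zlib.NewReader(bytes.NewReader(t.blocks[i/blockEntries]))
	if err != nil {
		panic(err)
	}
	raw, err := io.ReadAll(r)
	if err != nil {
		panic(err)
	}
	r.Close()
	return binary.LittleEndian.Uint32(raw[(i%blockEntries)*4:])
}

func main() {
	entries := make([]uint32, 10000)
	for i := range entries {
		entries[i] = uint32(i * 7) // fake line-table entries
	}
	t := build(entries)
	fmt.Println(t.lookup(4321)) // decompresses a single 1024-entry block
}
```

The tradeoff is a decompression step on the lookup path and some cache churn, which is presumably part of why Go 1.2 went the other way; but "compressed" and "random access" aren't mutually exclusive.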