The Linux kernel's inability to gracefully handle low memory pressure

bArray · on Aug 6, 2019

Similarly, there are many annoying Linux bugs:

`pthread_create` can sometimes return a garbage thread value or crash your program entirely, without any way to catch or detect it [1]. High-speed threading is hard enough as it is without the kernel acting non-deterministically.

Un-killable processes after a copy failure (D or S state) [2]. If the kernel is completely unable to recover from this failure, is it really best to make the process hang forever, leaving a machine restart as your only option? I ran into this with a copy onto a network drive with a spotty connection; the actual file itself really didn't matter, but there was no way to tell the kernel this.

Out Of Memory (OOM) "randomly" kills off processes without warning [3]. There doesn't appear to be a way to mark something as low-priority or high-priority, and if you have a few things running, it's just "random" what you end up losing. From a software-writing standpoint this is frustrating to say the least, and it makes recovery very difficult: who restarts whom, and how do you tell why the other process is down?

[1] https://linux.die.net/man/3/pthread_create

[2] https://superuser.com/questions/539920/cant-kill-a-sleeping-...

[3] https://serverfault.com/questions/84766/how-to-know-the-caus...

idoubtit · on Aug 6, 2019

Point 3 is wrong. OOM killing is not random. Each process is given a score according to its memory usage, and the kernel kills the process with the highest score. The way to mark priority in killing is to adjust this score through /proc. All of this is documented in `man 5 proc`, from `/proc/[pid]/oom_adj` to `/proc/[pid]/oom_score_adj`.
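For illustration, a minimal C++ sketch of that knob from inside a process, assuming Linux with procfs mounted. The helper names are made up. Raising your own `oom_score_adj` is how a process can volunteer as an early victim; lowering it is the privileged direction.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Reads one integer from a procfs file such as /proc/self/oom_score.
// Returns `fallback` if the file cannot be opened or parsed.
long read_proc_long(const std::string& path, long fallback = -1) {
    std::ifstream in(path);
    long value = fallback;
    if (!(in >> value)) return fallback;
    return value;
}

// Writes `adj` (valid range -1000..1000) to /proc/self/oom_score_adj.
// Raising the value (a more attractive OOM victim) is always allowed;
// lowering it below the process's minimum typically requires
// CAP_SYS_RESOURCE. -1000 exempts the process from OOM killing.
bool set_own_oom_score_adj(int adj) {
    if (adj < -1000 || adj > 1000) return false;
    std::ofstream out("/proc/self/oom_score_adj");
    out << adj;
    out.flush();  // procfs reports permission errors on the write itself
    return static_cast<bool>(out);
}
```

A supervisor would more likely write to `/proc/<pid>/oom_score_adj` for each child it starts; the resulting badness score is visible in `/proc/<pid>/oom_score`.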

http://man7.org/linux/man-pages/man5/proc.5.html

simias · on Aug 6, 2019

I haven't toyed with that in a long time (probably about a decade, really), but back when I did it was still very difficult to get the OOM killer to behave the way you wanted. IIRC the scoring is fairly complex and takes child processes into account, so while it's technically completely deterministic, it can still be fairly tricky to anticipate how it's going to work out in practice. And I did know about `oom_adj`. Often it worked. Sometimes it didn't. Sometimes it would work too well and not kill a process that was clearly using an abnormal amount of memory. Finding the right `oom_adj` value is more art than science.

Overall I ended up in the "you shouldn't throw passengers out of the plane" camp [1]. The best way to have the OOM killer behave well is not to have it run at all. If I don't have enough RAM, just panic and let me figure out what needs to be done.

[1] https://lwn.net/Articles/104185/

SEJeff · on Aug 6, 2019

And the OOM Killer has been entirely rewritten about twice in the past decade. Christoph Lameter (one of my coworkers, who also wrote the SLUB memory allocator) wrote the very first one. Very little, if any, of his original code is in the current Linux OOM Killer.

The current approach does indeed work much better. You can entirely disable the OOM Killer for a given workload with those procfs handles.

yebyen · on Aug 6, 2019

When I have multi-tenant shared servers that are regularly experiencing low-memory conditions, that's now one of the first things I turn off. I set sysctl vm.oom_dump_tasks=0 and vm.oom_kill_allocating_task=1, because I've found that without those, and without a comfortable amount of swap, the likelihood that a rogue process uses up all the memory and the system becomes completely unrecoverable without power cycling seems to go way up.

The way the score is calculated, as I understand it, processes with large memory footprints that are not particularly long-running, and especially those not owned by root, are the most likely to be killed. My shared servers are running Unicorn for Rails apps, so they almost all look the same under that lens.

I want the process with the runaway memory usage to be killed, so that person becomes aware of the problem when they find their app is down. (There are many better ways to solve this, but in our current system design, there's nothing to tell us that a service is down, so killing some other disused service means it's down until some user comes along and is impacted.) It seems like killing the process that made the last allocation is the most likely way to get this behavior I'm looking for, where the failures are noticed right away.

But I'm not convinced I've fixed anything. Even if the behavior seems better and I haven't had any servers going rogue and killing themselves lately, I am not the sysadmin, and I'm pretty sure the sysadmin just comes and over-provisions the machine so this doesn't happen again, which seems to be the best advice out there. And as I've learned from the discussion around this article, having a fast SSD, on which paging out and expiring from swap all happen fast enough to look almost like memory, almost completely neuters the OOM killer anyway.

Our whole situation is wrong (don't even get me started). I'd like to solve this with containers, where we can set a policy that says "all containers must have a memory request and limit", and thanks to cgroups, this problem never comes back.

wahern · on Aug 7, 2019

> thanks to cgroups, this problem never comes back.

cgroups doesn't save you from the OOM killer. In fact, we're seeing persistent OOM issues in production even when there's more than enough physical memory to satisfy all allocation requests.

Similar to the issue discussed in the leading LKML thread, I/O contention and stalls when evicting pageable memory (e.g. buffer cache) can result in the OOM killer shooting down processes when an allocation request (particularly an in-kernel request, such as for socket buffers) can't be satisfied quickly enough, where "quickly enough" is decided by some complex and opaque heuristics.

The fundamental problem is that overcommit has infected everything in the kernel. The assumption of overcommit is baked into Linux in the form of various heuristics and hacks. You can't really get away from it :(

bArray · on Aug 6, 2019

> OOM killing is not random.

Yeah, I know, hence the quotes around "random".

> The way to mark priority in killing is to adjust this score through /proc.

Haven't heard about this, thanks for the heads up!

fluffything · on Aug 6, 2019

> > The way to mark priority in killing is to adjust this score through /proc.
>
> Haven't heard about this, thanks for the heads up!

While going down that road is technically correct, it is a road full of pain.

A slightly less painful strategy is to disable overcommit. That way, if memory pressure is high and a process calls `malloc`, the call will fail if there is not enough memory, and that process will fail. If you only have a couple of processes in your system that are using most of the memory and you can control them, it is simpler to make them resilient to these kinds of errors than to mess with process scores to control the OOM killer.
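A rough C++ sketch of what "resilient" could mean in practice, with a made-up helper name: rather than assuming `malloc` succeeds, halve the request until it fits or falls below a floor. With `vm.overcommit_memory=2` the shortage surfaces here as a `nullptr`; under the default heuristic mode moderate requests usually "succeed" and any shortage shows up later instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Result of a best-effort allocation; `data` is nullptr if even the minimum failed.
struct Buffer {
    void* data;
    std::size_t size;
};

// Asks for `wanted` bytes, halving the request each time malloc returns
// nullptr, and gives up once the request would drop below `minimum`.
Buffer alloc_best_effort(std::size_t wanted, std::size_t minimum) {
    for (std::size_t size = wanted; size >= minimum && size > 0; size /= 2) {
        if (void* p = std::malloc(size)) {
            return {p, size};
        }
    }
    return {nullptr, 0};
}
```

A cache or buffer pool sized this way degrades (smaller cache, slower) instead of dying, which is the kind of recovery a process can only attempt if the failure is reported to it.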

thayne · on Aug 6, 2019

Unfortunately, the fork/exec method of spawning processes doesn't work for memory-heavy processes (say, a Java server) without overcommit and copy-on-write memory. Not that I think fork/exec is the best way to spawn processes, but it is the standard way on Unix-like systems.

tawy12345 · on Aug 6, 2019

The rationale for disabling overcommit only really makes sense if physical and virtual memory consumption are in the same ballpark. That's true for some workloads but not generally true.

There are totally legitimate use cases for processes that use significantly more virtual memory than physical memory (since virtual memory is relatively cheap, but physical memory isn't). A lot of programs are going to touch and not release all virtual memory they allocate from the kernel, but there are plenty of important counterexamples.

fork()/exec() is one example (which I've been burned by personally), but there are plenty of others. Any program that uses TCMalloc and has fluctuating memory consumption will have a lot of virtual address space allocated but not backed by physical pages. Sophisticated programs like in-memory caches or databases can also safely exploit a larger virtual memory space while keeping the amount of physical memory bounded.
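To make the virtual-versus-physical distinction concrete, here is a Linux-only C++ sketch. It is not how TCMalloc works internally; it only reproduces the same effect. It maps a large region with `MAP_NORESERVE`, writes to a small prefix, and reads `VmSize` and `VmRSS` from `/proc/self/status`. The helper names are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <sys/mman.h>
#include <unistd.h>

// Returns a field such as "VmSize" or "VmRSS" from /proc/self/status, in kB,
// or -1 if it is missing.
long status_kb(const std::string& field) {
    std::ifstream in("/proc/self/status");
    const std::string prefix = field + ":";
    for (std::string line; std::getline(in, line);) {
        if (line.compare(0, prefix.size(), prefix) == 0) {
            return std::stol(line.substr(prefix.size()));  // e.g. "   1234 kB"
        }
    }
    return -1;
}

// Maps `reserve_bytes` of private anonymous memory with MAP_NORESERVE (no swap
// reservation), then writes one byte per page to the first `touch_bytes`.
// Only touched pages get physical frames. Returns nullptr if mmap fails.
void* reserve_and_touch(std::size_t reserve_bytes, std::size_t touch_bytes) {
    void* p = mmap(nullptr, reserve_bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED) return nullptr;
    volatile char* bytes = static_cast<char*>(p);
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    for (std::size_t off = 0; off < touch_bytes && off < reserve_bytes; off += page) {
        bytes[off] = 1;
    }
    return p;
}
```

Reserving 1 GiB and touching 1 MiB should grow `VmSize` by roughly 1,048,576 kB but `VmRSS` only by something on the order of a megabyte.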

bjourne · on Aug 6, 2019

Virtual memory and overcommit aren't the same thing. Virtual memory means using disk space as memory. Overcommit means that the OS allows more memory to be allocated than it can guarantee is available. It is exactly like an airline booking 1,000 passengers for a flight with 500 seats, hoping that only half of them will actually show up. I don't think overcommit is ever needed in a modern system or is even a useful feature. For example, Windows doesn't support it at all.
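On Linux you can look at the "overbooking" directly. A small C++ sketch, assuming procfs: `Committed_AS` in `/proc/meminfo` is the total the kernel has currently promised, and `CommitLimit` is the ceiling that is enforced only in mode 2 of `/proc/sys/vm/overcommit_memory`. The helper names are made up.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Current overcommit policy from /proc/sys/vm/overcommit_memory:
// 0 = heuristic (default), 1 = always overcommit, 2 = strict accounting.
// Returns -1 if the file cannot be read.
int overcommit_mode() {
    std::ifstream in("/proc/sys/vm/overcommit_memory");
    int mode = -1;
    if (!(in >> mode)) return -1;
    return mode;
}

// Returns a field such as "CommitLimit" or "Committed_AS" from /proc/meminfo,
// in kB, or -1 if the field is missing.
long meminfo_kb(const std::string& key) {
    std::ifstream in("/proc/meminfo");
    const std::string prefix = key + ":";
    for (std::string line; std::getline(in, line);) {
        if (line.compare(0, prefix.size(), prefix) == 0) {
            return std::stol(line.substr(prefix.size()));
        }
    }
    return -1;
}

// True when more memory has been promised than strict accounting would allow:
// the flight is overbooked.
bool is_overbooked() {
    const long limit = meminfo_kb("CommitLimit");
    const long committed = meminfo_kb("Committed_AS");
    return limit >= 0 && committed > limit;
}
```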

thayne · on Aug 10, 2019

> I don't think overcommit is ever needed in a modern system or is even a useful feature.

If you want to spawn child processes using fork/exec from a process that uses half the system's memory (not uncommon in server environments), it is useful. In case you aren't familiar with how that works, the parent process makes a copy of itself, including a virtual copy of all the memory assigned to that process. That memory isn't actually allocated or copied until the child process tries to write to it (and then only the specific pages that are written to). Typically, the child process then calls `exec` to replace itself with a new program, which replaces the process memory. Without overcommit or swap, if the parent process is large enough, the fork syscall fails due to insufficient memory.
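The pattern being described, as a minimal POSIX sketch in C++ (the helper name is made up): `fork()` gives the child a copy-on-write view of the parent, and `execvp()` replaces it almost immediately, so very little of the parent's memory is ever actually copied. Under strict accounting, it is the `fork()` call that can fail with `ENOMEM` for a large parent.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs argv[0] (looked up in PATH) in a child process and waits for it.
// Returns the child's exit status, 127 if exec failed, or -1 if fork/wait
// failed or the child was killed by a signal.
int run_and_wait(char* const argv[]) {
    pid_t pid = fork();      // child shares our pages copy-on-write
    if (pid < 0) return -1;  // e.g. ENOMEM when the kernel won't commit a copy
    if (pid == 0) {
        execvp(argv[0], argv);  // on success, never returns
        _exit(127);             // exec failed; skip the parent's atexit handlers
    }
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```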

In a desktop environment using swap/virtual memory is fine. But in a server environment, where the disk may be network-attached (higher latency) and just big enough for the OS and applications, needing significant swap space is often undesirable.

wahern · on Aug 7, 2019

Windows supports overcommit, it's just not the default, and it's not how typical Windows runtimes allocate memory. And the nice thing about making it opt-in is that processes which didn't ask for overcommit won't get shot down when overcommitted memory can't be faulted in.

bjourne · on Aug 8, 2019

No, it doesn't. See quotemstr's explanation here https://lwn.net/Articles/627632/

wahern · on Aug 8, 2019

Thank you for the clarification.

What's left out, though (and perhaps the source of my confusion), is that you typically commit a reserved page from an exception handler, and if the commit fails then presumably in the vast majority of situations the process will simply exit. See https://docs.microsoft.com/en-us/windows/win32/memory/reserv...

If the code dereferencing the reserved-but-uncommitted memory was prepared to handle commit failure beforehand, it would normally have done so explicitly inline. I can't imagine very many situations where I would pass a pointer to some library expecting the library to handle an exception when dereferencing it. There are some situations (and they're one reason why SEH is better than Unix-style signals), but they're extremely niche.[1]

Either way, my point was that Windows does strict accounting, and while you can accomplish something like overcommit explicitly, nobody else has to pay the price for such a memory management strategy. Only the processes gambling on lazy commit end up playing Russian Roulette.
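For comparison, the closest Linux analogue of reserve-then-commit that I know of is mapping address space `PROT_NONE` and later `mprotect`ing parts of it writable. Private `PROT_NONE` mappings aren't charged against the commit limit, so under `vm.overcommit_memory=2` the `mprotect()` is where an out-of-memory failure would surface. A minimal C++ sketch with made-up helper names, not a description of how Windows does it:

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Reserves `bytes` of address space without making it usable.
void* reserve_region(std::size_t bytes) {
    void* p = mmap(nullptr, bytes, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// "Commits" [start, start + bytes) by making it readable and writable.
// `start` must be page-aligned. Under strict accounting this is the call that
// can fail with ENOMEM, so the caller finds out here rather than at first touch.
bool commit_range(void* start, std::size_t bytes) {
    return mprotect(start, bytes, PROT_READ | PROT_WRITE) == 0;
}
```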

[1] In a POSIX-based library I once used guard pages, a SIGSEGV handler, per-thread signal stacks, and longjmp to implement an efficient contiguous stack structure. This was in an extremely performance-critical loop (an NFA generated by Ragel) where checking memory bounds on every push operation had substantial costs (as in multiples slower). AFAICT, it was all C and POSIX compliant (perhaps with the exception of whether SIGBUS or SIGSEGV was delivered). However, because neither POSIX nor Linux supports per-thread signal handlers, you could effectively use this trick in only one component of a process without coordinating signal handlers; you had to hog SIGSEGV handling. SEH would have resolved this dilemma. This being such a niche use case, that wasn't much of a problem, though.
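The core of that trick (guard page, SIGSEGV handler, `siglongjmp`) in a stripped-down, Linux-only C++ sketch with made-up names. It only turns an overflow into a return value instead of growing the stack, it installs the handler process-wide (the limitation described above), and it skips the per-thread `sigaltstack` because the fault here lands on a heap mapping, not on the thread's own stack. Jumping out of a SIGSEGV handler is outside what ISO C++ guarantees, so treat it as illustrative only.

```cpp
#include <cassert>
#include <csetjmp>
#include <csignal>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf g_guard_hit;

// SIGSEGV handler: abandon the push loop and resume at sigsetjmp().
static void on_guard_fault(int) { siglongjmp(g_guard_hit, 1); }

// Maps `data_pages` writable pages followed by one inaccessible guard page.
int* map_with_guard(std::size_t data_pages) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t total = (data_pages + 1) * page;
    void* p = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return nullptr;
    char* guard = static_cast<char*>(p) + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(p, total);
        return nullptr;
    }
    return static_cast<int*>(p);
}

// Pushes up to `n` ints with no bounds check. Writing into the guard page
// raises SIGSEGV, which is turned into an early return. Returns the number
// of ints actually stored.
std::size_t push_until_guard(int* base, std::size_t n) {
    struct sigaction sa {};
    struct sigaction old {};
    sa.sa_handler = on_guard_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);  // process-wide, not per-thread

    volatile int* slots = base;       // volatile keeps the stores in program order
    volatile std::size_t pushed = 0;  // keeps its latest value across siglongjmp
    if (sigsetjmp(g_guard_hit, 1) == 0) {  // 1: restore the signal mask on jump
        for (std::size_t i = 0; i < n; ++i) {
            slots[i] = static_cast<int>(i);
            pushed = i + 1;
        }
    }
    sigaction(SIGSEGV, &old, nullptr);
    return pushed;
}
```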

lokedhs · on Aug 8, 2019

That is solved by adding a reasonable amount of swap space. In practice the swap will never be used, but it's there as a buffer to guarantee that things will work if the forked process doesn't immediately exec.

enneff · on Aug 6, 2019

But any process can be trying to allocate at the time your system runs out of memory, and most applications are not authored to handle malloc failing. Process failure seems easier to work around, from what I’ve seen. Would love to hear more from someone who has contrary experience.

AnIdiotOnTheNet · on Aug 6, 2019

At least you can then properly blame the software for doing the wrong thing and potentially patch it. The kernel should not implement global behavior that encourages improper memory allocation failure handling.

AgentOrange1234 · on Aug 6, 2019

Yes so much. The overcommit approach seems to be to say, “Most people are lazy so we shouldn’t allow anyone to brush their teeth.” Handling malloc failures isn’t rocket science.

the_why_of_y · on Aug 6, 2019

Do you have an example of a nontrivial user-space project that has malloc failure handling that actually works, with tests and all? The one I'm aware of is SQLite. Then there is DBus, whose author disagrees with your assessment.

https://blog.ometer.com/2008/02/04/out-of-memory-handling-d-...

wahern · on Aug 8, 2019

Lua, various OS kernels

Like any aspect of writing software, how you approach the problem affects the complexity of the final product. If someone doesn't make a habit of handling OOM, then of course their solutions are going to be messy and complex; they're going to "solve" the problem in the most direct and naive way, which is rarely the best way.

For example, unless I have reason to use a specialized data structure, I use intrusive list and RB-tree implementations, which cuts down on the number of individual allocations manyfold. Once I allocate something like a request context, I know that I can add it to various lists and lookup trees without having to worry about allocation failure. Most of my projects have more points where they do I/O operations than memory allocations. Should people just ignore whether their reads or writes failed, too?
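A minimal C++ sketch of the intrusive-list idea, using a hypothetical `RequestContext` that can sit on two lists at once. The link fields live inside the object, so linking and unlinking never allocate and therefore cannot fail.

```cpp
#include <cassert>
#include <cstddef>

// Link fields embedded in the owning object; a node starts out unlinked
// (pointing at itself). Copying is disabled because a copied node would
// claim list membership it doesn't have.
struct ListNode {
    ListNode* prev = this;
    ListNode* next = this;
    ListNode() = default;
    ListNode(const ListNode&) = delete;
    ListNode& operator=(const ListNode&) = delete;
};

// Links `node` right after `pos` in a circular list. Never allocates.
inline void list_insert_after(ListNode* pos, ListNode* node) {
    node->prev = pos;
    node->next = pos->next;
    pos->next->prev = node;
    pos->next = node;
}

// Unlinks `node` and resets it to the unlinked state. Never allocates.
inline void list_remove(ListNode* node) {
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->prev = node->next = node;
}

// Hypothetical request context that can be on two lists at the same time.
struct RequestContext {
    int id = 0;
    ListNode active;   // membership in an "active requests" list
    ListNode timeout;  // membership in a "pending timeouts" list
};

// Recovers the owning RequestContext from a pointer to its `active` node.
inline RequestContext* from_active(ListNode* node) {
    return reinterpret_cast<RequestContext*>(
        reinterpret_cast<char*>(node) - offsetof(RequestContext, active));
}
```

The only allocation that can fail is the one that creates the `RequestContext`; everything after that is pointer surgery.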

lokedhs · on Aug 8, 2019

The JVM, and any other runtime which allocates its heap up front. You could argue that's cheating, but it does give you predictable behaviour.

pcwalton · on Aug 6, 2019

Incorrectly handling malloc failures has been responsible for a lot of security problems. We are all better off acknowledging that C programmers are broadly incapable of writing correct OOM handling at scale and figuring out what to do with this fact, instead of wishing that it weren't this way.

throwaway2048 · on Aug 6, 2019

There is now a way to mark processes as OOM-killer exempt

https://backdrift.org/oom-killer-how-to-create-oom-exclusion...

Part of the issue with processes stuck in D state (waiting for the kernel to do something) is that it is deeply tied into kernel assumptions about things like NFS. NFS is stateless, and theoretically servers can appear and disappear at will, with operations continuing to work when they come back. You can make NFS a hell of a lot less annoying in this regard by mounting it with the soft or intr flags; however, if the network disappears or hiccups, you WILL lose data (the network is NEVER reliable; in fact, the entire model of NFS is arguably wrong to begin with).

joosters · on Aug 6, 2019

That's not a problem with NFS, that's a fundamental issue with computers. Things fail. Nothing protects you from losing data; your local log-structured file systems won't save you either. They'll help you protect an existing state from corruption, but they don't protect you from loss.

That new request you just received when a hardware failure occurred? Say goodbye to it, you've no way of ensuring it will make it to storage when the disks have caught fire. Later on, when you've put out the blaze, all that algorithms can do is tell you when things started to get lost.

zzzcpan · on Aug 6, 2019

What you are talking about is unrelated to NFS's broken model. It's still important, though, because these are the reasons why we don't need fsync() to actually flush data to disk, for example: it can be asynchronous and only needs to add a checkpoint to your log-structured filesystem, which will be written to disk later, once enough data is buffered or enough time has passed.

labawi · on Aug 7, 2019

Often ordering would be enough, but sometimes you do want to wait until the changes are on disk. It's a mess.
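For the "wait until it's on disk" case, a minimal POSIX sketch in C++ with a made-up helper name: write everything, `fsync()`, and check every return value, since a write-back error may only be reported by `fsync()` or `close()`.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: writes all of `data` to `path`, then fsync()s it.
// Returns 0 on success or an errno value. (A newly created file's directory
// entry also needs an fsync() on the directory to be durable.)
int write_durably(const char* path, const char* data, std::size_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return errno;
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted before writing; retry
            int err = errno;
            close(fd);
            return err;
        }
        data += n;
        len -= static_cast<std::size_t>(n);
    }
    if (fsync(fd) != 0) {
        int err = errno;
        close(fd);
        return err;
    }
    return close(fd) == 0 ? 0 : errno;
}
```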

quazeekotl · on Aug 6, 2019

This is a bad excuse for the really flaky nature of NFS; other network filesystems do not have its issues.

Volt · on Aug 6, 2019

Like which ones?

throwaway2048 · on Aug 6, 2019

AFS is a good example of a network filesystem that takes properties of the network into account, although it's a little weird and not fully compliant with POSIX filesystem semantics.

However, it scales like crazy and is very powerful and reliable.

IcePic · on Aug 9, 2019

I ran AFS for a very long time, and it does have issues which perhaps other network filesystems don't even notice. It has issues with clients behind NAT, for instance: the callbacks won't work. The limit on files per directory is (or was) lower than for local filesystems, so Maildirs with long mail filenames could get into something that feels like an "out of inodes but not space" situation. All in all, it was very nice though.

marmaduke · on Aug 6, 2019

> the entire model of NFS is arguably wrong to begin with

On local networks (everything attached to a single switch) with good hardware, it is reliable, and soft/intr is the worse choice among others.

To wit NFS is one of the commonly supported VM storage options (libvirt, VMware, etc).

yetanotherme · on Aug 6, 2019

Best of the bad options doesn't make it a good option. It's far from reliable and one of the most common reasons I encounter for deadlocked Linux systems. Well, to be fair, it also hangs a fair share of BSDs and sometimes even a Solaris. If at all possible, I'd suggest avoiding it.

marmaduke · on Aug 6, 2019

What do you propose for a shared file system?

mkesper · on Aug 6, 2019

There is no use in specifying (no)intr as of kernel 2.6.25; it can always be interrupted by SIGKILL. See e.g. https://access.redhat.com/solutions/157873

There are many things "known" about NFS that are legacy.

joshumax · on Aug 6, 2019

What are you referring to regarding pthread_create()? Last time I checked, the thread handle it writes is undefined only when it returns a nonzero code, which, while certainly not handled in a lot of multi-threaded applications, can be checked before anything else is done with the newly created thread.
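For reference, the contract being described, as a small C++ sketch using the raw pthreads API (the helper names are made up): failure is reported through the return value (for example `EAGAIN`), not through `errno`, and the `pthread_t` is only meaningful when that value is 0.

```cpp
#include <cassert>
#include <pthread.h>

// Thread body: bumps the counter it is given. The caller joins before reading.
static void* bump(void* arg) {
    ++*static_cast<int*>(arg);
    return nullptr;
}

// Creates one thread and joins it, propagating any error code.
// pthread_create returns 0 or an error number; it does not set errno.
int spawn_and_join(int* counter) {
    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, bump, counter);
    if (rc != 0) return rc;  // `tid` is unspecified here; don't touch it
    return pthread_join(tid, nullptr);
}
```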

bArray · on Aug 6, 2019

There appear to be cases under heavy create/destroy load where it returns zero but has failed to allocate memory for the thread. This is with tonnes of available memory (hence not a mapping error) and no exceptions thrown.

That said, it's still entirely possible I've made a mistake. Please see here: https://github.com/electric-sheep-uc/black-sheep/blob/master...

The idea is that you do preliminary processing of a camera frame before running a neural network over it.

datenwolf · on Aug 6, 2019

> This is with tonnes of available memory (hence not a mapping error) and no exceptions thrown.

Could be a memory fragmentation issue

EDIT: Also, since you're using C++ threads, you should really use move semantics, because right now you have two points of failure acting on the same thing: the `new` operator may fail when creating the thread instance, and the underlying pthread_create may fail as well.

datenwolf · on Aug 6, 2019

A follow-up on my mention of move semantics, because I feel that this really ought to be pointed out:

In libstdc++ and libc++, the only member variable of std::thread is the native handle to the system thread itself; there are no additional member variables! This means that the size of a std::thread object there is equal to the size of a native handle, usually the size of a pointer, but sometimes smaller.

If you create std::thread with `new`, you're essentially creating a pointer to "possibly a pointer", which comes with all the associated inefficiencies: double indirection, and small allocations that tend to fragment memory. And at the end of the day, to actually use it you still have to lob that outside pointer around on the stack anyway.

So there is zero benefit at all to using dynamic allocation for std::thread. Don't do it! Just create the std::thread instances on the stack; they're just handles/pointers with "smarts" around them, and you can copy them around just as efficiently as you can copy a pointer or an integer. Better yet, if you're not trying to "outsmart" the compiler, you'll often get copy elision where applicable.
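A minimal C++ sketch of the "no `new`" approach, with a made-up function: `std::thread` objects held by value in a `std::vector`. Since `std::thread` is move-only, `emplace_back` constructs each one in place and the vector moves them if it reallocates.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Runs `n` workers held by value (no `new`) and returns the sum of their results.
int sum_of_squares_threaded(int n) {
    std::vector<int> results(static_cast<std::size_t>(n), 0);
    std::vector<std::thread> workers;
    workers.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        // Constructed in place; each worker writes only its own slot.
        workers.emplace_back([&results, i] { results[static_cast<std::size_t>(i)] = i * i; });
    }
    for (std::thread& t : workers) t.join();  // join() makes the writes visible here
    int sum = 0;
    for (int r : results) sum += r;
    return sum;
}
```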

gmueckl · on Aug 6, 2019

std::thread has its copy constructor deleted. This means that creating it on the heap is often your only option if you have to mix it with other types that don't handle moving properly, because move semantics are strictly opt-in.

datenwolf · on Aug 6, 2019

> std::thread has its copy constructor deleted

and for good reason

> your only option if you have to mix it with other types that don't handle moving properly

It's not the only option. The other option is to implement move semantics on the containing type. Properly implemented move semantics give you assurance about ownership and that you don't use the thread interface inappropriately.
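What "implement move semantics on the containing type" might look like, as a C++ sketch with a hypothetical `Worker` class. Declaring the move operations implicitly deletes the copy operations, and the destructor joins so a joinable thread is never destroyed (which would call `std::terminate`).

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical owner of a std::thread; move-only, joins on destruction.
class Worker {
public:
    explicit Worker(std::atomic<int>& counter)
        : thread_([&counter] { counter.fetch_add(1); }) {}

    Worker(Worker&&) noexcept = default;
    Worker& operator=(Worker&& other) noexcept {
        if (this != &other) {
            join();  // assigning over a joinable std::thread would call std::terminate
            thread_ = std::move(other.thread_);
        }
        return *this;
    }
    ~Worker() { join(); }

    void join() {
        if (thread_.joinable()) thread_.join();
    }

private:
    std::thread thread_;
};

static_assert(!std::is_copy_constructible<Worker>::value, "Worker is move-only");
static_assert(std::is_nothrow_move_constructible<Worker>::value, "vector can move it");

// Starts `n` Workers stored by value in a vector and returns how many ran.
int run_pool(int n) {
    std::atomic<int> counter{0};
    {
        std::vector<Worker> pool;
        for (int i = 0; i < n; ++i) pool.emplace_back(counter);  // reallocation moves Workers
    }  // each ~Worker joins its thread here
    return counter.load();
}
```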

gmueckl · on Aug 6, 2019

Deleting the copy constructor is a very arbitrary design decision without any good reason.

Move semantics are a can of worms in themselves. You assume in your comment that you can modify the other types that interact with the types that are only movable. This is only possible if you own all relevant types, which is actually the exception rather than the norm. And even if you own the relevant related types, move semantics transitively enforce themselves onto containing types, which turns their introduction into a sprawling mess of cascading changes with hidden surprises.

datenwolf · on Aug 6, 2019

Here's something to ponder on: What are the proper semantics for copying a thread? What is it you want to express by doing that?

You'll find that usually the copy constructor has been deleted only for those classes where the semantics of a copy are not well defined.

So let's assume you work around that by encapsulating that thread in a std::shared_ptr or a std::weak_ptr. What are the constraints you must work within when using that thread reference?

Usually when you run into "problems" caused by an object not being "move aware" triggered by encapsulating a non-copyable type, this is a red flag that something in your code's architecture is off. Think of it as a weaker variant of the strong typing of functional languages. You probably don't want to have a shared_ptr on a thread inside your object (and the object being copyable), but rather wrap that object in a shared_ptr (or weak_ptr) and pass those around.

gmueckl · on Aug 6, 2019

Your very first question is already leading you down the wrong path: std::thread is a thread handle, not the thread itself. Equating the handle with the thread (a complex construct of a separate stack, separate processor state, separate scheduling state, etc.) is folly.

There are more software architectures between heaven and earth than exist in your philosophy. C++ especially is an old language and most code was written before C++11 started to be adopted. So a pure C++11 style codebase that follows the associated design best practices may be able to deal with std::thread and similarly restricted classes with little friction. But this just isn't the norm. Most big important codebases are too far down different roads to adjust them to play nice with move semantics.

datenwolf · on Aug 6, 2019

> std::thread is a thread handle, not the thread itself

While technically true, semantically there's not much of a difference. Yes, you can copy around a handle, but then you have the burden of tracking all of these copies, so that you don't end up with a dangling handle to a thread long dead… or even worse, a reused handle for an entirely new thread that happens to have gotten the same handle value.

This is why you should not think of std::thread being a handle, but the actual thread. Yes, from a system level point of view there's the handle, and all the associated jazz that comes with threads (their dedicated stacks, maybe TLS, affinity masks, etc.), all of which are non-portable and hence not exposed by std::thread, because essentially you're not supposed to see all of that stuff.

> C++ especially is an old language and most code was written before C++11 started to be adopted.

That is true. Heck, I've still got some C++ code around which I wrote almost 25 years ago. But if you use a feature that was introduced only later, then you should use it within the constraints supported by the language version that introduced it and not shoehorn workarounds "just to make it work".

gmueckl · on Aug 6, 2019

std::thread is just fundamentally flawed. The way it encapsulates the thread itself is just one of the things. Thread CPU affinity cannot be managed. Code cannot query which thread it is running on. There are no thread ids (the handle would be a workable substitute if it were copyable). Threads cannot be killed. In other words, if you take threading seriously std::thread is useless.

I need all of these things except for killing threads. So this is not just an academic list for me.

datenwolf · on Aug 6, 2019

> Thread CPU affinity cannot be managed

That's because the capability of doing so depends on the target environment. C++ is all-target. If you need that you can use `std::thread::native_handle` + the OS's API for that.
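A Linux/glibc-specific C++ sketch of that route, with made-up helper names: take `std::thread::native_handle()` (a `pthread_t` there) and pass it to `pthread_setaffinity_np`, a GNU extension. The declarations need `_GNU_SOURCE`, which g++ defines by default.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <sched.h>
#include <thread>

// Pins `t` to a single CPU via its native pthread handle.
// Returns 0 on success or an errno value (e.g. EINVAL for a CPU we can't use).
int pin_to_cpu(std::thread& t, int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(t.native_handle(), sizeof(set), &set);
}

// Returns the lowest-numbered CPU the calling thread may run on, or -1.
int first_allowed_cpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) return cpu;
    }
    return -1;
}

// Starts a thread that waits for a flag, pins it, then releases and joins it.
int run_pinned(int cpu) {
    std::atomic<bool> release{false};
    std::thread t([&release] {
        while (!release.load()) std::this_thread::yield();
    });
    int rc = pin_to_cpu(t, cpu);
    release.store(true);
    t.join();
    return rc;
}
```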

> Code cannot query which thread it is running on

You're wrong in assuming that. std::this_thread::get_id() exists: https://en.cppreference.com/w/cpp/thread/get_id

> There are no thread ids

Yes, there are. std::thread::get_id() exists (also see above): https://en.cppreference.com/w/cpp/thread/thread/get_id
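Both calls side by side, as a small standard C++ sketch with made-up helper names: the id a thread sees for itself through `std::this_thread::get_id()` matches what its owner sees through `std::thread::get_id()`, and a default-constructed `std::thread::id` means "not a thread".

```cpp
#include <cassert>
#include <thread>

// True if the caller is the thread identified by `id`.
bool running_on(std::thread::id id) {
    return std::this_thread::get_id() == id;
}

// Checks that a thread's own view of its id matches its owner's view.
bool ids_agree() {
    std::thread::id seen_inside;
    std::thread t([&seen_inside] { seen_inside = std::this_thread::get_id(); });
    const std::thread::id from_handle = t.get_id();  // valid while t is joinable
    t.join();  // join() synchronizes with the write to seen_inside
    return seen_inside == from_handle && from_handle != std::thread::id();
}
```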

> Threads cannot be killed.

Not all runtime environments actually support doing this kind of thing. Also, within the semantics of C++, the ability to kill threads opens a gargantuan can of worms. For example, how would you implement RAII-style deallocation and deinitialization of objects created within the scope of a thread?

Or go one level lower: how do you deal with locks held within such a thread? Not all OSes define what to do with synchronization objects that are held by a thread that's been killed. Windows implicitly releases them. Pthreads defines mutex consistency, but after killing a thread holding a mutex, the state of the affected mutex is indeterminate until a locking attempt on the mutex is made.
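The pthreads mechanism being alluded to is the robust mutex. A Linux C++ sketch with made-up helper names, where the owner thread simply returns while holding the lock rather than being killed: the next `pthread_mutex_lock` reports `EOWNERDEAD`, and the survivor must call `pthread_mutex_consistent()` before unlocking, or the mutex becomes permanently unusable (`ENOTRECOVERABLE`).

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

static pthread_mutex_t g_mutex;

// Thread body that takes the lock and then exits without releasing it.
static void* lock_and_exit(void*) {
    pthread_mutex_lock(&g_mutex);
    return nullptr;
}

// Returns what the surviving thread's first lock attempt reported
// (EOWNERDEAD for a robust mutex), after repairing the mutex.
int lock_after_owner_died() {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(&g_mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_t owner;
    if (pthread_create(&owner, nullptr, lock_and_exit, nullptr) != 0) return -1;
    pthread_join(owner, nullptr);

    int rc = pthread_mutex_lock(&g_mutex);
    if (rc == EOWNERDEAD) {
        // We hold the lock now, but the protected data may be half-updated:
        // repair it here, then declare the mutex consistent again.
        pthread_mutex_consistent(&g_mutex);
    }
    if (rc == 0 || rc == EOWNERDEAD) pthread_mutex_unlock(&g_mutex);
    pthread_mutex_destroy(&g_mutex);
    return rc;
}
```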

Killing threads really is something that should be avoided if possible, not just since C++11 but always, because it causes a lot more problems than it solves. If you need something that can be killed without going through too much trouble, spawn a process and use shared memory.

std::thread is very limited because C++ is an all-purpose, all-operating-system, all-environment language, and it must limit itself to the greatest common denominator of threading support you can expect. Realistically, this boils down to: 1. there are threads, and 2. threads are created, run for some time, may terminate, and you can wait for termination.

That's it. Anything beyond that is utterly dependent on the runtime environment. And because of that std::thread does give you std::thread::native_handle, to be able to interface with that.

gmueckl · on Aug 7, 2019

Many features of the STL are optional. So the existence of operating systems that are incapable of providing certain features is not a valid argument for leaving out features that are essential to using threads in any meaningful way on others.

Killable threads are not rocket science, either. You're just limited in the kinds of things these threads can do. But there's no need to get all hung up on that particular feature.

asveikau · on Aug 6, 2019

> Please see here: https://github.com/electric-sheep-uc/black-sheep/blob/0735de...

Creating and destroying threads in a tight loop (and blocking on their destruction, thereby reducing the point of having many) seems like a bad idea. Conceptually you have a maximum of only two threads in any given part of this snippet: the one running the loop and the current instance of visThread. My guess is also that the loop thread spends most of its time waiting for the recently created threads to die. Why not have visThread only get created once and process a queue of events?
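A sketch of that suggestion in C++17: one long-lived worker thread consuming frames from a blocking queue, instead of creating and joining a thread per frame. The frame type (`int`) and the per-frame work (`* 2`) are placeholders, and the names are made up; a real capture loop would probably also bound the queue and drop stale frames.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Minimal unbounded blocking queue: push() from the capture loop,
// pop() from the long-lived worker.
template <typename T>
class BlockingQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            items_.push_back(std::move(item));
        }
        cv_.notify_one();
    }

    // No more items will be pushed; wakes the worker so it can drain and exit.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until an item is available; std::nullopt once closed and empty.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return std::nullopt;
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<T> items_;
    bool closed_ = false;
};

// One worker created once for all frames; `frame * 2` stands in for the
// real per-frame work (preprocessing, then the network).
std::vector<int> process_frames(const std::vector<int>& frames) {
    BlockingQueue<int> queue;
    std::vector<int> results;  // touched only by the worker until join()
    std::thread worker([&queue, &results] {
        while (std::optional<int> frame = queue.pop()) results.push_back(*frame * 2);
    });
    for (int f : frames) queue.push(f);
    queue.close();
    worker.join();
    return results;
}
```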

Anyway without additional evidence you potentially flagged a bug in the c++ standard library rather than pthread_create.

bArray · on Aug 6, 2019

Firstly, thanks for taking the time to look.

> Creating and destroying threads in a tight loop (and blocking on their destruction, thereby reducing the point of having many) seems like a bad idea.

The purpose is that it's pretty much 100% always running a visThread (it's a neural network that takes about 100ms per image). The pre-process on the other hand runs in about 10ms, but there's no reason why it can't be run in advance (+). The neural network can't really be run on multiple cores safely, but it does have OpenMP (parallel loops).

(+) It does create some latency in the output compared to the real world, but it's not a massive deal when it comes down to it.

> Why not have visThread only get created once and process a queue of events?

This is probably the best way to do this, but this was the lazy way with not too much overhead (I think) :)

> Anyway without additional evidence you potentially flagged a bug in the c++ standard library rather than pthread_create.

Possibly. I did run it through GDB and Valgrind, and it reliably seemed to die in pthread_create, but of course that could have been only the trigger. It could also be the aggressive optimization [1].

[1] https://github.com/electric-sheep-uc/black-sheep/blob/0735de...

epiphanitus · on Aug 7, 2019

What do you recommend doing when Linux freezes? It doesn't come up a lot, but when it does it can be kind of unnerving, since the three-finger salute doesn't work.

I would also love to know if anybody has a solution for getting video to play properly in Firefox. I know it's not a bug per se, but it would be nice to not have to switch between browsers all the time.

I've been using Ubuntu for about a year now and otherwise it's been a very positive experience.

bArray · on Aug 7, 2019

> What do you recommend doing when Linux freezes? It doesn't come up a lot, but when it does it can be kind of unnerving, since the three-finger salute doesn't work.

Unfortunately, I don't really have a solution for this. I have an SSD as the main disk and _even now_, when I hit this too hard Linux grinds to a halt. No mouse, no keyboard, just heat, fans and disk light.

One thing that sometimes works for me is the old CTRL+ALT+FX mashing, but not always. Once you can get a shell you can type into you're okay, but of course this doesn't always work.

> I would also love to know if anybody has a solution for getting video to play properly in Firefox. I know it's not a bug per se, but it would be nice to not have to switch between browsers all the time.

What do you mean? It's been reliable for quite a while. There were two issues I used to have: one was not having my graphics card set up (video was running on the CPU), and the other was not having Flash (back when that was still a thing).

> I've been using Ubuntu for about a year now and otherwise it's been a very positive experience.

Yeah, I think it makes one of the better daily drivers.

tomthehero · on Aug 7, 2019

One of those magic SysRq combinations used to work for me. I used it a lot before I upgraded to 12 GB of RAM.
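For anyone wanting to try this: which magic SysRq functions are allowed is controlled by the kernel.sysrq setting (/proc/sys/kernel/sysrq), where 0 disables SysRq, 1 enables everything, and larger values are a bitmask. A small C sketch that decodes that mask using the bit values from the kernel's sysrq documentation; the helper names are made up for illustration:

```c
#include <stdbool.h>
#include <stdio.h>

/* Bit values as listed in the kernel's sysrq documentation. */
enum {
    SYSRQ_LOGLEVEL = 0x2,   /* console log level control */
    SYSRQ_KEYBOARD = 0x4,   /* keyboard control (SAK, unraw) */
    SYSRQ_DUMPS    = 0x8,   /* debugging dumps of processes etc. */
    SYSRQ_SYNC     = 0x10,  /* emergency sync (Alt+SysRq+S) */
    SYSRQ_REMOUNT  = 0x20,  /* remount read-only (Alt+SysRq+U) */
    SYSRQ_SIGNAL   = 0x40,  /* signal processes: term, kill, OOM kill (Alt+SysRq+F) */
    SYSRQ_REBOOT   = 0x80,  /* reboot/poweroff (Alt+SysRq+B) */
    SYSRQ_NICE_RT  = 0x100, /* nicing of all RT tasks */
};

/* 0 disables SysRq, 1 enables every function, anything else is a bitmask.
 * Callers should check for -1 (read failure) before calling this. */
static bool sysrq_allows(long mask, long function_bit) {
    if (mask == 1)
        return true;
    return (mask & function_bit) != 0;
}

/* Reads the current mask from procfs; returns -1 if it can't be read. */
static long read_sysrq_mask(void) {
    long mask = -1;
    FILE *f = fopen("/proc/sys/kernel/sysrq", "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &mask) != 1)
        mask = -1;
    fclose(f);
    return mask;
}
```

Alt+SysRq+F asks the kernel to run the OOM killer, which is usually the most useful one when the machine is thrashing. If the mask disallows it, root can enable everything for the current boot with `sysctl kernel.sysrq=1`, or trigger the same action with `echo f > /proc/sysrq-trigger`.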

epiphanitus · on Aug 7, 2019

Thank you for your answer. It's good to know that dealing with freezes is not a problem only faced by newbs like myself.

Re: playing video, for some reason I can't play South Park on either Firefox or Chrome. Videos on Twitter also won't load in Firefox, though they work fine in Chrome.

bArray · on Aug 7, 2019

> Thank you for your answer. It's good to know that dealing with freezes is not a problem only faced by newbs like myself.

Yeah, it's another bug with desktop Linux. The problem is that Linux "basically" treats the GUI like any other program: when things get heavy, everything gets roughly evenly screwed. OSes designed around a GUI, on the other hand, usually guarantee that GUI-related processes get a minimum amount of CPU time so they don't freeze.
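Linux won't do this for you automatically, but one partial workaround is to deprioritize heavy background jobs yourself so interactive processes win the CPU. A minimal sketch using setpriority(2)/getpriority(2); the helper names are illustrative, and nice values only affect CPU scheduling, not I/O:

```c
#include <errno.h>
#include <stdio.h>
#include <sys/resource.h>

/* Lower the calling process's CPU priority by raising its nice value
 * (19 is the lowest priority). Raising niceness needs no privileges;
 * lowering it again usually does. */
static int make_background(int niceness) {
    if (setpriority(PRIO_PROCESS, 0, niceness) != 0) {
        perror("setpriority");
        return -1;
    }
    return 0;
}

/* getpriority() can legitimately return -1, so errno must be cleared
 * first to tell an error apart from a nice value of -1. */
static int current_niceness(int *out) {
    errno = 0;
    int prio = getpriority(PRIO_PROCESS, 0);
    if (prio == -1 && errno != 0)
        return -1;
    *out = prio;
    return 0;
}
```

A batch job would call make_background(19) before starting its heavy work. For the I/O side, util-linux's `ionice -c 3` (idle class) is the counterpart, though whether it has any effect depends on the I/O scheduler in use.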

Linux should 100% be doing this. Doing lots of hardware I/O shouldn't mean you lose the mouse or keyboard. When you lose control of input, you assume the machine isn't doing anything, when in actual fact it's doing tonnes; it's just not showing you. Even something like Android suffers from this under heavy load, and it's really crap.

The real joke is that it's probably a difficult kernel fix. You would need some kind of watchdog timer for the kernel to make sure it's not getting too bogged down with any one particular task, and then interrupt the ones that are (some tasks don't like to be interrupted) [1]. You would then need to make sure that none of the heavy kernel calls guarantee that the call completes before returning (i.e. blocking), which to be completely honest should be the default position anyway.

I'm not completely up to date on this, but my bet is that the issues come from everywhere. Programs can read/write arbitrarily large amounts of data (RAM, disk, network, bus, etc.), when I believe you should be able to ask the kernel what size it would like you to read/write based on I/O activity and the capabilities of the device. If your program is bogging down the kernel, it should lower your recommended block read/write size. Better yet, this would be compatible with existing software, as programs could choose to ignore it, possibly with the kernel "punishing" programs that eat up lots of kernel time by making them wait longer for their next opportunity. There are a bunch of algorithms for time-slicing tasks, but the best appears to be max-min, with very little organizing overhead [2].
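The watchdog idea in [1] boils down to a heartbeat plus a timeout: the watched task bumps a counter whenever it makes progress, and the watchdog flags it if the counter stops moving for a whole interval. A userspace toy version in C, assuming POSIX threads and C11 atomics; the "stuck" worker is simulated, and the function names and timing values are illustrative only, not how the kernel would do it:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

static atomic_ulong heartbeat;      /* bumped by the worker after each unit of work */
static atomic_bool release_worker;  /* lets the "stuck" worker finish at the end */

static void sleep_ms(long ms) {
    struct timespec ts = { .tv_sec = ms / 1000, .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Simulated worker: makes progress 20 times, then "blocks" like a hung call. */
static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 20; i++) {
        sleep_ms(1);                       /* one unit of work */
        atomic_fetch_add(&heartbeat, 1);
    }
    while (!atomic_load(&release_worker))
        sleep_ms(5);                       /* stuck: no more heartbeats */
    return NULL;
}

/* Watchdog check: true if no heartbeat arrived during the interval. */
static bool stalled(long interval_ms) {
    unsigned long before = atomic_load(&heartbeat);
    sleep_ms(interval_ms);
    return atomic_load(&heartbeat) == before;
}

/* Runs the demo; returns the heartbeat count seen when the stall was flagged. */
static unsigned long run_watchdog_demo(long interval_ms) {
    pthread_t t;
    atomic_store(&heartbeat, 0);
    atomic_store(&release_worker, false);
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return 0;
    while (atomic_load(&heartbeat) == 0)   /* wait until the worker is running */
        sleep_ms(1);
    while (!stalled(interval_ms))
        ;                                  /* keep polling while progress is made */
    unsigned long seen = atomic_load(&heartbeat);
    /* A real watchdog would interrupt, restart or report the task here. */
    atomic_store(&release_worker, true);
    pthread_join(t, NULL);
    return seen;
}
```

With a 200 ms interval, the watchdog should flag the worker only after it has stopped beating, i.e. after all 20 units of work.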
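The closest existing thing to "ask the kernel what size to read/write" that I know of is st_blksize from stat(2), documented as the preferred block size for efficient filesystem I/O. It is a static per-file hint, not the load-aware feedback described above, but a copy loop can at least size its buffer from it. A sketch (the function name is made up; EINTR handling is omitted for brevity):

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copies in_fd to out_fd in chunks of the source's preferred I/O size
 * (st_blksize). Returns total bytes copied, or -1 on error. */
static long long copy_with_preferred_blksize(int in_fd, int out_fd) {
    struct stat st;
    if (fstat(in_fd, &st) != 0)
        return -1;
    size_t bufsize = st.st_blksize > 0 ? (size_t)st.st_blksize : 4096;
    char *buf = malloc(bufsize);
    if (!buf)
        return -1;

    long long total = 0;
    for (;;) {
        ssize_t n = read(in_fd, buf, bufsize);
        if (n == 0)
            break;                          /* EOF */
        if (n < 0)
            goto fail;
        for (ssize_t off = 0; off < n; ) {  /* write() may be short */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0)
                goto fail;
            off += w;
        }
        total += n;
    }
    free(buf);
    return total;

fail:
    free(buf);
    return -1;
}
```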

> Re: playing video, for some reason I can't play South Park on either Firefox or Chrome. Videos on twitter also won't load with Firefox, though they work fine with Chrome.

Hmm, that shouldn't happen, sounds like you potentially have some system-wide badness. A few things to try:

* Disable any customization you made (extensions, add-ons, etc) - see if it is one of these interfering

* Make sure you have all updates and you're running an up-to-date version of Ubuntu (these issues may have been patched already)

* Make sure you have the correct drivers installed for your GPU

But... one thing I did notice was that JavaScript coin miners have gotten so bad that I can't run certain sites without ad-blockers anymore (uBlock Origin is generally recommended). I remember my CPU sitting with one core maxed out just because of the JS engine. I generally run uBlock Origin + NoScript on every page and manually enable temporary scripts on pages I trust. One of the biggest offenders for crazily heavy JS was actually Facebook.

[1] https://en.wikipedia.org/wiki/Watchdog_timer

[2] https://uhra.herts.ac.uk/bitstream/handle/2299/19523/Accepte...

epiphanitus · on Aug 7, 2019

I thought about your comment re: video and I did some more poking around online since I was hoping to avoid having to do a fresh install. I tried updating my video codecs and now Netflix/Hulu works! Hooray!

>> JavaScript coin miners have gotten so bad that I can't run certain sites without ad-blockers anymore

Whoa, is that the reason why some websites are eating a ton of memory?? Is this common?

>> I generally run uBlock Origin + NoScript on every page and manually enable temporary scripts on pages I trust.

Thank you for the recommendations, I'll check those out.

bArray · on Aug 9, 2019

> I thought about your comment re: video and I did some more poking around online since I was hoping to avoid having to do a fresh install. I tried updating my video codecs and now Netflix/Hulu works! Hooray!

Yeah, it's important to pull updates regularly :) In general many of the bugs you'll come across will end up being fixed sooner or later, so it's worth checking regularly.

> Whoa, is that the reason why some websites are eating a ton of memory?? Is this common?

That and just the poor use of JS. Most websites really don't need JS. I use mbasic.facebook.com instead of facebook.com as it'll run without JS, plus you have to manually refresh to get any kind of notification.

azinman2 · on Aug 6, 2019

I have to say, reading all these replies about the OOM killer makes Linux look quite bad. These proc scores are not an elegant solution. I far prefer Darwin’s launchd, which lets you set actual memory limits (soft and hard) and gives you warnings before you cross a threshold. Now, this is more consumer-OS oriented, but something equivalent for servers that lets you express preferences in a more natural way seems desirable.

TheDong · on Aug 7, 2019

systemd lets you configure soft and hard limits on a per-service level, almost identically to launchd. See MemoryHigh= and MemoryMax= [1].

This does depend on cgroupsv2, but it works on most modern distros.

[1]: https://www.freedesktop.org/software/systemd/man/systemd.res...

azinman2 · on Aug 7, 2019

Oh nice! Thanks for the pointer!

jimpudar · on Aug 6, 2019

You can use ulimit to set soft and hard limits for all sorts of system resources (including memory) on Linux.

IcePic · on Aug 9, 2019

...but if the problem is programs not checking malloc()'s return value because it never reports failure, then ulimits by themselves won't help the program. They will help the OS stay alive, which is good, but we still need to deal with the programs that expect to run all the way into a swamp and sink without malloc ever giving them "bad news".

azinman2 · on Aug 6, 2019

True — I forgot about this. But can you do that on a process via some config _before_ the process is created?

jimpudar · on Aug 7, 2019

Yeah, usually you call ulimit before launching the process; the new process inherits the limits. If you want to modify the limits of an _existing_ process, you can use prlimit.
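As a sketch of what jimpudar and IcePic describe: with an address-space limit (RLIMIT_AS, which bash's `ulimit -v` sets), malloc really does return NULL once the limit is hit, even with overcommit enabled. The helper below is hypothetical and only temporarily lowers its own soft limit to demonstrate that:

```c
#include <stdlib.h>
#include <sys/resource.h>

/* Lowers this process's RLIMIT_AS soft limit to `limit` bytes, tries to
 * malloc `request` bytes, then restores the previous soft limit.
 * Returns 1 if malloc succeeded, 0 if it returned NULL, -1 on rlimit errors. */
static int try_malloc_under_limit(rlim_t limit, size_t request) {
    struct rlimit old, lowered;
    if (getrlimit(RLIMIT_AS, &old) != 0)
        return -1;
    lowered = old;
    lowered.rlim_cur = limit;            /* soft limit only; hard limit untouched */
    if (setrlimit(RLIMIT_AS, &lowered) != 0)
        return -1;

    void *p = malloc(request);
    int ok = p != NULL;
    free(p);

    /* Raising the soft limit back up to (at most) the hard limit is allowed. */
    if (setrlimit(RLIMIT_AS, &old) != 0)
        return -1;
    return ok;
}
```

For another process, prlimit(2) takes a PID and does the same job, and util-linux ships a `prlimit` command (e.g. `prlimit --pid PID --as=BYTES`).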

the8472 · on Aug 6, 2019

Memory limits are also available in Linux via cgroups or ulimit.

altmind · on Aug 8, 2019

There is nothing more frustrating than unkillable processes stuck in iowait (D) state. There's no reason for this behavior to exist. And it's so easy to hang forever: one network blip, your NFS client gets stuck, and your programs do too.
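Those D-state processes can't be killed, but you can at least spot them. The state letter is the third field of /proc/<pid>/stat (ps shows it in the STAT column). A hedged C sketch; the function names are illustrative:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Extracts the state letter from a /proc/<pid>/stat line. The command name
 * is wrapped in parentheses and may itself contain spaces or ')', so scan
 * for the LAST ')' and read the field after it. Returns '?' if malformed. */
static char stat_state(const char *line) {
    const char *rparen = strrchr(line, ')');
    if (!rparen || rparen[1] != ' ' || rparen[2] == '\0')
        return '?';
    return rparen[2];
}

/* State of `pid` ('R', 'S', 'D', 'Z', ...); pid 0 means this process.
 * Returns '?' on error. */
static char process_state(pid_t pid) {
    char path[64], line[1024];
    if (pid == 0)
        snprintf(path, sizeof path, "/proc/self/stat");
    else
        snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return '?';
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok ? stat_state(line) : '?';
}
```

For NFS specifically, nfs(5) documents the `soft` mount option, which makes a request fail with an error after its retries instead of waiting forever, at some risk to data integrity.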

Animats · on Aug 6, 2019

Ah, yes, that bug.

Few programs can handle a failure return from "malloc", and Linux perhaps tries too hard to avoid forcing one. Most programs just aren't very good at getting a "no" to "give me more memory". Browsers should be better at this, since they started using vast amounts of memory for each tab.

I used to hit a worse bug on servers. If you did lots of MySQL activity, so that many blocks of open files were in memory, and then started creating processes, you'd often hit a situation where the Linux kernel needed a page of memory but couldn't evict a file block due to some lock being set. Crash. That was years ago; I hope it's been fixed by now.

fluffything · on Aug 6, 2019

> Browsers should be better at this,

Browsers are quite good at this, actually. Major web browsers run on Windows (and even 32-bit Windows!), where there is no overcommit, so malloc can return "no" at any time, which happens quite often when you are limited to at most 4 GB of address space per process.

The only apps that suck at this are Linux-only apps that are never used anywhere else and just assume that all Linux systems have overcommit enabled.
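On Linux the overcommit behaviour mentioned here is a tunable, vm.overcommit_memory, described in the kernel's overcommit-accounting documentation. A small C sketch to check which mode a machine is in; the helpers are illustrative:

```c
#include <stdio.h>

/* vm.overcommit_memory: 0 = heuristic overcommit (the default),
 * 1 = always overcommit, 2 = strict commit accounting.
 * Returns -1 if the setting can't be read. */
static int overcommit_mode(void) {
    int mode = -1;
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    if (f) {
        if (fscanf(f, "%d", &mode) != 1)
            mode = -1;
        fclose(f);
    }
    return mode;
}

static const char *overcommit_name(int mode) {
    switch (mode) {
    case 0:  return "heuristic overcommit";
    case 1:  return "always overcommit";
    case 2:  return "strict accounting: allocations beyond the commit limit fail";
    default: return "unknown";
    }
}
```

Mode 2 makes allocations past the commit limit fail with ENOMEM up front, which is roughly closer to the Windows behaviour described above and a cheap way to find code paths that never check malloc.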

simias · on Aug 6, 2019

>Most programs just aren't very good at getting a "no" to "give me more memory"

I suspect that overcommitting is one of the reasons for this, though. Many programmers in the Linux world have internalized that "malloc can't fail", and the only error handling they bother doing is calling abort() if malloc fails.

Of course the fact that C doesn't provide any sane way to implement error handling probably doesn't help.

needs · on Aug 6, 2019

Handling malloc() failure is almost never done for short-lived programs. For instance, git used to fail as soon as an error popped up (be it from malloc(), open(), etc.). It's just much simpler and more convenient to do so.

While C has no special error handling mechanism in place, error handling can still be done reasonably. IMO, the big reason malloc() errors are rarely handled is that it is quite hard to come up with a viable fallback strategy.
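One commonly cited fallback strategy is an emergency reserve: allocate a chunk of memory at startup and release it when an allocation fails, so the program has room to save state or shut down cleanly. A minimal sketch; all names are hypothetical, and the allocator is passed in as a function pointer only so a failure can be simulated with flaky_malloc():

```c
#include <stdlib.h>
#include <string.h>

static void *emergency_reserve;

/* Grab the reserve at startup and touch it so it is actually resident,
 * otherwise freeing it later may return little real memory. */
static int reserve_init(size_t bytes) {
    emergency_reserve = malloc(bytes);
    if (!emergency_reserve)
        return -1;
    memset(emergency_reserve, 0, bytes);
    return 0;
}

typedef void *(*alloc_fn)(size_t);

/* Tries `alloc`; on failure releases the reserve once and retries.
 * Returns NULL only if the retry also fails or the reserve is spent. */
static void *alloc_with_reserve(alloc_fn alloc, size_t n) {
    void *p = alloc(n);
    if (p == NULL && emergency_reserve != NULL) {
        free(emergency_reserve);
        emergency_reserve = NULL;  /* from here on the program should wind down */
        p = alloc(n);
    }
    return p;
}

/* Test helper: fails the next `failures_left` calls, then behaves like malloc. */
static int failures_left;
static void *flaky_malloc(size_t n) {
    if (failures_left > 0) {
        failures_left--;
        return NULL;
    }
    return malloc(n);
}
```

Under Linux's default overcommit, malloc itself rarely fails, so this mostly pays off under address-space ulimits or strict overcommit.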

simias · on Aug 6, 2019

>Handling malloc() failure is almost never done for short lived programs.

True, and that makes sense for something like git. But in my experience many long-lived programs don't bother to handle ENOMEM gracefully either.

But I guess I'm veering off-topic here, I'm mostly fine with applications crashing of their own volition when they don't have enough memory. I agree with you that in many cases there's no clear recovery path for an application that's out of RAM. It's the OOM-killer I have a problem with.

>While C has no special error handling mechanism in place, error handling can still be done reasonably.

I very much disagree with that. There are a few factors that make error handling in C a pain:

- No RAII, so you have to explicitly handle cleanup at every point you may have to early-return an error (goto fail etc...).

- No convenient way to return multiple values from a function. That means that in general functions signal errors returning some special value like 0 or -1 (even that is very much nonstandard, often even within the same library).
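The explicit cleanup mentioned in the first point usually looks like the pattern below: each acquired resource gets a label, and error paths jump to the label that releases everything acquired so far, in reverse order. A sketch with a hypothetical read_prefix() helper that, kernel-style, returns a negative errno on failure:

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads up to `max` (> 0) bytes of `path` into a new buffer stored in *out.
 * Returns the number of bytes read, or a negative errno on failure. */
static long read_prefix(const char *path, size_t max, char **out) {
    FILE *f;
    char *buf;
    size_t n;
    long err;

    f = fopen(path, "rb");
    if (!f) {
        err = -errno;
        goto out;               /* nothing acquired yet */
    }
    buf = malloc(max);
    if (!buf) {
        err = -ENOMEM;
        goto out_close;         /* release the file only */
    }
    n = fread(buf, 1, max, f);
    if (ferror(f)) {
        err = -EIO;
        goto out_free;          /* release buffer, then file */
    }
    fclose(f);
    *out = buf;
    return (long)n;

out_free:
    free(buf);
out_close:
    fclose(f);
out:
    return err;
}
```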

Oh you want to be able to signal several error conditions? Uh, maybe use several negative codes then? Oh you need those to return actual results? Well maybe set errno then? Don't forget to read `man errno` though, because it's easy to get it wrong. Oh you had a printf in DEBUG builds in there that overwrote errno before you could test it? Oops. Don't do that!
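The errno pitfall is easy to reproduce: any library call made between the failure and the check (such as a debug fprintf) may overwrite errno. The usual defence is to copy errno immediately. A sketch; open_logged() and the DEBUG macro are illustrative:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Opens `path` read-only. On failure returns -1 with the ORIGINAL errno
 * from open() preserved, even though we may log in between. */
static int open_logged(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        int saved = errno;      /* capture before any other call */
#ifdef DEBUG
        fprintf(stderr, "open(%s) failed: %s\n", path, strerror(saved));
#endif
        errno = saved;          /* logging may have changed errno */
        return -1;
    }
    return fd;
}
```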

What's that, your function returns a pointer and not an integer? Ah, mmh, well maybe return NULL in case of error? You want to return several error codes? Well maybe you can just cast the integer code into a pointer and return that, then use macros to figure out which is which. It's terribly ugly? Well the kernel does it so... It can't be that bad right? Oh and what about errno? Remember that?

What's that, NULL is a valid return value for your function? Uh, that's annoying. Maybe use an output parameter then? Oh, or maybe some token value like 0xffffffff, that probably won't ever happen in practice, right? After all, that's what mmap does (MAP_FAILED is (void *) -1).
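For readers who haven't met it, the kernel trick referred to here is ERR_PTR()/IS_ERR()/PTR_ERR() from include/linux/err.h: the top MAX_ERRNO (4095) addresses are never valid pointers, so a negative errno can be returned through a pointer. A userspace re-creation with lowercase names (mine, not the kernel's) and a hypothetical buffer_new():

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

static inline void *err_ptr(long error) { return (void *)(intptr_t)error; }
static inline long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
/* True for the last 4095 addresses, i.e. encoded -1 .. -4095. */
static inline bool is_err(const void *ptr) {
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct buffer {
    size_t len;
    unsigned char data[];
};

/* Returns a new buffer, or an error encoded in the pointer. Never NULL. */
static struct buffer *buffer_new(size_t len) {
    if (len == 0)
        return err_ptr(-EINVAL);
    if (len > SIZE_MAX - sizeof(struct buffer))
        return err_ptr(-ENOMEM);
    struct buffer *b = malloc(sizeof *b + len);
    if (!b)
        return err_ptr(-ENOMEM);
    b->len = len;
    return b;
}
```

Note that is_err(NULL) is false, so a caller that only checks for NULL silently misses every error, which is the failure mode discussed later in this thread.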

So no I wouldn't consider C error handling reasonable in any way shape or form. "Non-existent" is more accurate. You can always work around it but it always gets in the way.

I try to always implement comprehensive error checking in my programs. I do a significant amount of kernel/bare metal work, so it's really important. It's not rare that I end up with functions that contain more error-handling-related code than actual functional code.

Crinus · on Aug 6, 2019

You are making things sound way more complicated than they need to be, the situation is actually very simple: if you need to return multiple error codes, use a return value for the error code and give back things via an output parameter, otherwise just use a sentinel value for error (0, -1 or NULL depending on context, they aren't totally random you know, 0 and nonzero are used for false/true, -1 is used when you expect some index and NULL when you expect some object). When in doubt just use an error return code everywhere (e.g. what many Microsoft APIs - even some C++ ones - do with HRESULT).
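The "error code as return value, result through an output parameter" convention, sketched for a hypothetical parse_int() built on strtol():

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

enum parse_status { PARSE_OK = 0, PARSE_EMPTY, PARSE_INVALID, PARSE_RANGE };

/* The return value carries the error; the result goes through `out`,
 * which is written only on success. */
static enum parse_status parse_int(const char *s, int *out) {
    char *end;
    if (*s == '\0')
        return PARSE_EMPTY;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return PARSE_INVALID;           /* no digits, or trailing junk */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return PARSE_RANGE;
    *out = (int)v;
    return PARSE_OK;
}
```

Because `out` is only written on success, the caller's variable keeps its previous value on any error.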

simias · on Aug 6, 2019

If it's not that complicated, please explain why OpenSSL, the Linux kernel, curl and a multitude of very popular C libraries don't do what you describe. Clearly it's complicated enough that even talented C coders try to cut some corners when given the chance.

C error handling ergonomics are non-existent which means that everybody bakes ad-hoc library-specific conventions that are extremely error-prone.

You could argue that they're doing it wrong and you might have a point but if almost everybody gets it wrong maybe it's fair to blame the language itself a little bit.

Crinus · on Aug 6, 2019

I already gave an example of APIs that do this: pretty much all COM APIs use HRESULT. I do not know why not everyone does this, as I'm not everyone and as such I cannot tell what sort of considerations (if any) were going on. At best I can make some guesses.

BTW curl does seem to do what I wrote above: for example, `curl_easy_init` returns a `CURL` object on success or NULL if there was an error [1], and `curl_easy_perform` returns a `CURLcode` value [2] that looks like it is used across the API to indicate errors.

[1] https://curl.haxx.se/libcurl/c/curl_easy_init.html

[2] https://curl.haxx.se/libcurl/c/curl_easy_perform.html

robert_foss · on Aug 6, 2019

The kernel very much returns sentinel values, if something more complicated has to be transmitted error codes are commonly used. I see nothing wrong with it.

simias · on Aug 6, 2019

I'm not arguing that the kernel devs are doing it wrong. I'm only pointing out that, in my opinion, the way C deals with error handling (that is, by not doing anything at all) is far from reasonable and the cause of many bugs. It's terrible ergonomics.

If you have a kernel function returning a pointer and you think that you're supposed to check for NULL when it actually returns an ERR_PTR in case of errors, you will not only fail to do the check but also end up with a garbage pointer somewhere in your program. If you have an MMU and you try to dereference the pointer you'll have a violent crash, which at least shouldn't be too hard to debug. If you feed the pointer to some hardware module or if you're working on an MMU-less system, then Good Luck; Have Fun.

C doesn't have your back here. It doesn't let you signal how a function reports errors, it doesn't even let you tag nullable pointers.

bjourne · on Aug 6, 2019

Often you need to return error objects. Consider a function for parsing something. You want to return not only the error code, but also the line and column number of the parse error, and a description of it. So you need two output parameters; one for the result and one for the error. Your declaration becomes something like this:

    bool parse(inp_type *a, out_type **b, out_error **c);

where the return value false indicates an error. In C++, you'd just have written something like:

    out_type parse(const inp_type& a);

and thrown an exception on error.

Crinus · on Aug 7, 2019

In C you can return a struct, however a better approach is to use a context object which also contains error information, like:

    ctx_t *ctx = ctx_new();
    if (!ctx)
        return -1; /* allocation failed */
    if (!ctx_parse(ctx, code)) {
        show_error_message(ctx_erline(ctx), ctx_ercol(ctx));
        ctx_free(ctx); /* often done in a goto'd section to avoid missing frees */
        return -1;
    }
    /* ... use the parsed result ... */
    ctx_free(ctx);

This also allows you to extend the API's functionality, error information, etc. in the future while remaining backwards compatible.

machinecoffee · on Aug 8, 2019

Which is great, except that ctx_new() requires a malloc, which then can fail, and now you can't even explain why the thing failed, as you have no context info.

You also have to worry about all of the ctx objects you've created along the way, to free them up as you recover from the low memory error.
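One way around this objection is to let the caller own the context's storage (on the stack or in a static), so creating it can't fail and there's nothing to free on the error path. A sketch with a toy key=value parser; struct parse_ctx, parse_config() and the error layout are hypothetical, not any real library's API:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Caller-owned context: no allocation needed to report an error. */
struct parse_ctx {
    int err_line;          /* 1-based line of the first error, 0 if none */
    int err_col;           /* 1-based column of the first error */
    char err_msg[64];
};

static void parse_ctx_init(struct parse_ctx *ctx) {
    memset(ctx, 0, sizeof *ctx);
}

static bool parse_fail(struct parse_ctx *ctx, int line, int col, const char *msg) {
    ctx->err_line = line;
    ctx->err_col = col;
    snprintf(ctx->err_msg, sizeof ctx->err_msg, "%s", msg);
    return false;
}

/* Toy parser: every non-empty line must look like key=value. */
static bool parse_config(struct parse_ctx *ctx, const char *text) {
    int line = 1, col = 1;
    bool seen_eq = false, line_empty = true;
    for (const char *p = text; ; p++) {
        if (*p == '\n' || *p == '\0') {
            if (!line_empty && !seen_eq)
                return parse_fail(ctx, line, col, "expected '='");
            if (*p == '\0')
                return true;
            line++;
            col = 1;
            seen_eq = false;
            line_empty = true;
            continue;
        }
        if (*p == '=') {
            if (line_empty)
                return parse_fail(ctx, line, col, "missing key");
            seen_eq = true;
        }
        line_empty = false;
        col++;
    }
}
```

Usage is a stack variable: `struct parse_ctx ctx; parse_ctx_init(&ctx);`, then on failure read ctx.err_line, ctx.err_col and ctx.err_msg.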

alyandon · on Aug 7, 2019

That is very similar to the way I handled errors back in my C days.

bjourne · on Aug 8, 2019

Yep, you're absolutely right. But don't tell me that is simple! :)

la_barba · on Aug 6, 2019

> No RAII, so you have to explicitly handle cleanup at every point you may have to early-return an error (goto fail etc...).

I think RAII can be useful, but I've never found any use for it in the systems-level code that I write. Most of the time I'm dealing with resources that were allocated inside a systems library or an external component which just gives me a handle to the resource. I think this is a common enough scenario in systems code that I don't think it's just me.

e.g.

    X = CreateResource()
    Y = TransformResource(X)
    ProcessNewResource(Y)
    Z = TransformResource(Y)
    ...

And so as you transform that resource, you will have multiple ways to unwind the resource depending on where the failure occurs. Even if you wrap X in some RAII container, you don't know what your destructor is going to look like.

Another con of RAII, especially when paired with shared-ownership smart pointers, is that you lose predictability over your resource deallocations. You never know when the last pointer is going to go out of scope, and if it's a 'heavy' resource with a complicated unwind, you're going to get a CPU spike at an indeterminate time. I deal primarily with industrial automation code and I much prefer to have a smooth/even CPU graph. I think this issue is more relevant to systems code, which is the context of this thread.

abjKT26nO8 · on Aug 6, 2019

I've just checked, and it appears that C++'s std::vector::resize may throw std::bad_alloc when allocation fails, but Rust's Vec::resize interface doesn't leave any room to report errors, so I guess that it will panic...? That's sad.

steveklabnik · on Aug 6, 2019

The standard library assumes infallible allocation, yeah. We have plans to add fallible stuff eventually, but we’re still working on our allocator APIs.

hurrrrrrrr · on Aug 6, 2019

Rust in general will abort the process (rather than return an error) on memory allocation failure. There was some discussion about OOM handling a while back, but I don't know the current state.

Animats · on Aug 6, 2019

Now you regret not putting in exceptions.

JJMcJ · on Aug 6, 2019

> vast amounts of memory for each tab

What underlies this? I am astounded to see 1GB of memory returned when I close a couple of tabs.

Chrome and Firefox both seem like this.

jamienicol · on Aug 6, 2019

It's spread across all parts of the browser, but speaking as a Firefox graphics engineer, we use quite a lot of memory. Painting web pages can be slow, so we try to cache as much as possible. When elements scroll separately, or can be animated, we need to cache them in separate buffers. If we get the heuristics wrong (and it's hard to get it right for every web page out there) this can be explosive. It's not helped by the fact that graphics drivers can frequently bring down the whole process when they run out of memory. It's a hard problem, but webrender will help as it needs to cache less.

cosarara · on Aug 6, 2019

Maybe the browser should try to discard some cached data when the system is out of memory. Then some things in the browser would be slower, but the operating system wouldn't hang.

bzbarsky · on Aug 6, 2019

The browser _does_ do that. The hard part is detecting "the system is out of memory". Some OSes notify you when that happens, and Firefox listens to those notifications and will flush caches. Some OSes will at least fail malloc and let you detect out-of-memory that way. Linux does neither, last I checked.

Disclaimer: I work on Firefox, but not the details of the OS "listen for memory pressure" integration.

cosarara · on Aug 6, 2019

Any userspace process can see how low the memory is, though; Firefox could do it itself. Still, if a notification system is already used on other OSes, a very easy solution would be to add such a notification channel in userspace so that any process could ask Firefox to free memory. Right now I am using earlyoom to save my system from freezing. It sometimes kills Firefox, sometimes DBeaver, sometimes VMs. But if it could tell Firefox to chill for a bit and free memory, then I could avoid the massacre (at least sometimes).

bzbarsky · on Aug 6, 2019

> Any userspace process can see how low the memory is though

How, exactly?

Or put another way, how do you reliably tell apart "we are seriously thrashing" and "resident memory is getting close to the physical memory limits but there is plenty of totally cold stuff to swap out and it won't be a problem" from userspace? The kernel is the only thing that can make that determination somewhat reliably.
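For what it's worth, Linux 4.20 (released at the end of 2018) added Pressure Stall Information (PSI), which reports from the kernel side how much time tasks spend stalled waiting for memory, i.e. roughly the "are we thrashing" signal asked about here. It needs a kernel built with PSI enabled, so it may not be available everywhere. A sketch that reads the "full" line of /proc/pressure/memory (the share of time in which all non-idle tasks were stalled at once); struct and function names are illustrative:

```c
#include <stdio.h>
#include <string.h>

struct psi_line {
    double avg10, avg60, avg300;  /* % of time stalled over 10 s, 60 s, 300 s */
    unsigned long long total_us;  /* cumulative stall time in microseconds */
};

/* Parses one PSI line such as
 * "full avg10=1.50 avg60=0.30 avg300=0.05 total=123456".
 * `kind` is "some" or "full". Returns 0 on success, -1 otherwise. */
static int psi_parse_line(const char *line, const char *kind, struct psi_line *out) {
    size_t k = strlen(kind);
    if (strncmp(line, kind, k) != 0 || line[k] != ' ')
        return -1;
    return sscanf(line + k, " avg10=%lf avg60=%lf avg300=%lf total=%llu",
                  &out->avg10, &out->avg60, &out->avg300, &out->total_us) == 4 ? 0 : -1;
}

/* Reads the "full" memory pressure line; -1 if PSI is unavailable. */
static int read_memory_pressure(struct psi_line *out) {
    FILE *f = fopen("/proc/pressure/memory", "r");
    if (!f)
        return -1;
    char line[256];
    int rc = -1;
    while (fgets(line, sizeof line, f)) {
        if (psi_parse_line(line, "full", out) == 0) {
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}
```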

cosarara · on Aug 6, 2019

Maybe there's a better way than this, but the same way earlyoom decides it's time to kill processes (% RAM usage), Firefox would decide it's time to free cache. While using 100% of RAM might be optimal if you aren't going to need more than that, it's not safe.
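A percentage check like this would typically use MemAvailable from /proc/meminfo (Linux 3.14+), the kernel's own estimate of memory available for new allocations without swapping, including reclaimable page cache. A sketch with illustrative names; as pointed out above, a number like this still can't tell cold memory from thrashing:

```c
#include <stdio.h>
#include <string.h>

/* Extracts MemTotal and MemAvailable (both in kB) from meminfo-format text. */
static int meminfo_parse(const char *text, unsigned long *total_kb, unsigned long *avail_kb) {
    const char *t = strstr(text, "MemTotal:");
    const char *a = strstr(text, "MemAvailable:");
    if (!t || !a)
        return -1;
    if (sscanf(t, "MemTotal: %lu kB", total_kb) != 1 ||
        sscanf(a, "MemAvailable: %lu kB", avail_kb) != 1)
        return -1;
    return 0;
}

static int read_meminfo(unsigned long *total_kb, unsigned long *avail_kb) {
    char buf[8192];
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return meminfo_parse(buf, total_kb, avail_kb);
}

/* Percentage of RAM the kernel considers available. */
static double available_percent(unsigned long total_kb, unsigned long avail_kb) {
    return total_kb ? 100.0 * (double)avail_kb / (double)total_kb : 0.0;
}
```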

heavenlyblue · on Aug 6, 2019

So let's say I'd like to write a memory-efficient web page, what should I avoid then?

bzbarsky · on Aug 6, 2019

Based on my experience as a Firefox developer investigating memory usage reports, the worst-performing "normal" web pages in terms of memory have:

1) Lots of script (megabytes).

2) Possibly loaded multiple times (e.g. in multiple iframes).

3) Possibly sticking objects into global arrays and never taking them out (aka a memory leak for the lifetime of the page).

4) Loading hundreds of very large images all at the same time.

5) Loading hundreds or thousands of iframes that all have nontrivial, if not huge, amounts of script. Social media "like" buttons often fall in this bucket.

There are obviously also pathological cases: if your HTML has 1 million elements in it (not a hypothetical; I've seen this happen), memory usage is going to be high, obviously. And arguably having a page with thousands or hundreds of thousands of JS functions is "pathological" too, but it's pretty normal nowadays...

kg · on Aug 6, 2019

For video memory/tile memory usage, avoid anything fancy that's hard to rasterize, for starters: Think things like rounded borders, drop shadows, opacity, transparent background images, etc. The more complex it is to draw your page the more likely that it will end up being cached into temporary surfaces and composited and stuff. For a while for some absurd reason Twitter's main layout had a bunch of rectangles with opacities of like 0.95 or 0.99, so all the layers had to get cached into separate surfaces even though you could barely tell it was happening. Getting them to fix that made the site faster for basically every Firefox and Chrome user. They hadn't noticed.

For JS and DOM memory usage you can use the browser's built in profiler to get a pretty good estimate of where things are going and what you've done wrong.

gnode · on Aug 6, 2019

Rather than guess at what to avoid, you should make use of the memory profilers which Firefox and Chromium developer tools provide. Apparently Firefox's memory profiler is an add-on: https://developer.mozilla.org/en-US/docs/Mozilla/Performance...

heavenlyblue · on Aug 6, 2019

Yeah, but this is a pigeonholing principle.

I don’t want to spend my time developing something only to discover it doesn’t perform well due to some reason.

I would prefer to not use any of the performance killers in the first place.

gnode · on Aug 6, 2019

Firstly, avoid leaking memory (including objects like images and DOM nodes) in JavaScript. Leaking memory here means retaining a reference beyond the end of the object's use. The garbage collector only collects memory which is no longer referenced; it does not attempt to analyse when a reference is no longer used.

Secondly, avoid including unnecessary resources. Many web pages include many libraries which are then mostly unused. Some packaging tools can help eliminate such unused code.

A memory profiler helps in both cases: it detects leaks, and it measures the cost of resources, allowing you to make educated decisions about their inclusion.

Crinus · on Aug 6, 2019

HTML3+

_ofdw · on Aug 6, 2019

JavaScript

heavenlyblue · on Aug 6, 2019

Is that a joke, or are you essentially saying that if I used WebAssembly, then most of the memory usage would go down?

superkuh · on Aug 6, 2019

He's not wrong. If you disable JS by default tabs will take much, much less RAM. Sites that require JS aren't worth the time anyway. Unless you're literally writing an application there's no reason to require executing code to render in text and images. And there's absolutely no reason to not have a no-JS fallback. In fact, there should be a real HTML skeleton first upon which you write JS enhancements.

But these are all things you do if you want to make a webpage for people. If your main concern is corporate profit or saving institutional funds, then SPAs and requiring JS for obfuscation make sense in the anti-user way that corps tend to. It's just faster. Who cares if it's anti-user when profits are on the line?

Narishma · on Aug 7, 2019

Is there a setting to disable all of that caching? In other words, prioritize memory usage over performance?

vaylian · on Aug 6, 2019

To be honest: I do not know. But given how fat most websites are today, I am not that surprised that so much memory is needed. Yes, there is still a major leap from a couple of megabytes to a full gigabyte, but with so many DOM nodes and JS objects I can imagine that even a resource-conservative browser will have trouble keeping memory usage low.

Or are there any browsers where you observe significantly less memory usage on the same websites? (Ignoring limited browsers like Lynx of course)

ygra · on Aug 6, 2019

Most modern browsers cache a lot of data in memory as well to make things like navigating back snappy and avoiding a full page re-load. It's also necessary to cache most of the state of the previous page to allow retaining form fields if you accidentally navigated away (as some forms are only in the DOM, created by JS, and not part of the original HTML).

I believe I've read somewhere that at least Chrome listens for low-memory notifications broadcast by the OS and will evict such caches. So while it uses a lot of memory as long as memory is available, it will also release much of it if necessary.

PeCaN · on Aug 6, 2019

This makes sense but is somewhat annoying since a web browser is not the only program running on my computer (but apparently wants to be) and eats up RAM that could be used for the OS file cache.

I wish there was some sort of allocation API specifically for allocating caches so that recently accessed files could kick out a web browser's cache of a not-so-recently accessed tab or vice versa.

nitrogen · on Aug 7, 2019

POSIX does have something sort of like this in posix_madvise() (and Linux in madvise()), but I couldn't find a specific option for the semantics you described.
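The nearest Linux-specific option I know of is MADV_FREE (Linux 4.5+): the application marks private anonymous pages as discardable, and the kernel may reclaim them lazily when memory runs short instead of swapping them out. It is not quite the semantics asked for (there is no notification, and a reclaimed page simply reads back as zeros, so the owner must be able to rebuild the data). A sketch with hypothetical helper names, falling back to MADV_DONTNEED where MADV_FREE isn't available:

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Cache storage as private anonymous memory (MADV_FREE only applies to that). */
static void *cache_alloc(size_t len) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Marks the region as "free it if you need the memory". Falls back to
 * MADV_DONTNEED (drop the pages right away) if MADV_FREE is unsupported.
 * Returns 0 on success. */
static int cache_mark_discardable(void *p, size_t len) {
#ifdef MADV_FREE
    if (madvise(p, len, MADV_FREE) == 0)
        return 0;
#endif
    return madvise(p, len, MADV_DONTNEED);
}
```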

bjoli · on Aug 6, 2019

On a low-memory computer I had configured FF to not show images unless I alt-clicked them. Together with not using JavaScript, this meant using significantly less memory. I suspect browsers still have to keep raw bitmaps in memory, which for the 2 MB JPEGs you are fed everywhere quickly adds up...

chmod775 · on Aug 6, 2019

They have to be in memory somewhere. They technically don't have to be kept in RAM if they are uploaded to VRAM though.

Assuming RGBA, it's just over 8 MB for a 1920x1080 image.
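The arithmetic, assuming 8-bit RGBA (4 bytes per pixel) and no row padding, plus a hypothetical 4000 × 3000 photo to show how a compressed 2 MB JPEG can become a much larger bitmap once decoded:

$$1920 \times 1080 \times 4\ \text{B} = 8{,}294{,}400\ \text{B} \approx 8.3\ \text{MB}\ (\approx 7.9\ \text{MiB})$$

$$4000 \times 3000 \times 4\ \text{B} = 48{,}000{,}000\ \text{B} = 48\ \text{MB}$$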

gnode · on Aug 6, 2019

Images could be kept compressed until they are painted. Most GPUs support texture compression, so don't need to keep a bitmap for compositing.

chmod775 · on Aug 6, 2019

Texture compression is inherently lossy, so it isn't an option.

It's also really dependent on your textures and what you want to do with them; you don't want the browser to just give it its best shot at compressing your company logo.

iamnotacrook · on Aug 6, 2019

You want your browser to be fast. Browsers are often the only thing running. You have a bunch of RAM. Unused RAM is pointless. Disks are large and fast and are used for swap. It's not clear why you're surprised, what the problem is, or why anyone would put any effort into optimising a browser for memory usage. The number of people who have tens or hundreds of tabs open in the real world isn't as large as it is on HN and other tech sites, and for many people who do, just keeping the URL around so the site can be reloaded is probably good enough.

rat9988 · on Aug 6, 2019

> Browsers are often the only thing running.

If I had to choose one program that is proportionally used the least alone, I would have voted browsers.

plasticchris · on Aug 6, 2019

Javascript? I use noscript and it's less than 200MB/tab, even with major news sites and what not open.

agent008t · on Aug 6, 2019

200MB/tab just to display some text with markdown and maybe some small images? Isn't that insane?

ryandrake · on Aug 6, 2019

That answer is too simple. Exactly what is taking up what memory? I'd love to see an annotated memory map of these processes: M KB of JavaScript text, N MB of image data, O MB of this, P MB of that.

Unpopular opinion time:

My guess is most developers don't care and are not even looking at this anymore, either during development or after release. Nobody seems to even know how much memory their program allocates and how quickly it allocates that memory under various running conditions. I used to challenge my fellow developers: Stop in the debugger right now. About how much memory should the process be using? Nobody even seems to have an order-of-magnitude guess anymore. It's your program, dude! Shouldn't you know this?

You can ask any embedded software engineer exactly how much memory his/her program uses, what's the stack size, what's the heap size, what's statically and dynamically allocated. Sadly, this discipline is pretty much gone outside of that specialized area.
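For a rough answer to "how much memory is my program using?" from inside a C program, getrusage(2) reports the peak resident set size; /proc/self/status also shows the current (VmRSS) and peak (VmHWM) values. A sketch with an illustrative helper name; note that ru_maxrss is in kilobytes on Linux but in bytes on macOS:

```c
#include <sys/resource.h>

/* Peak resident set size of this process so far (kilobytes on Linux).
 * Returns -1 on error. */
static long peak_rss_kb(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_maxrss;
}
```

Logging this at a few known points (startup, after loading data, after a burst of requests) gives the order-of-magnitude figure asked about above.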

CJefferson · on Aug 6, 2019

For browsers, this problem is "once removed". Firefox and Chrome both go to heroic efforts to reduce memory usage. However, given the html and javascript which fill most websites, and users expecting responsiveness, it turns out to be very hard to use less memory.

voltagex_ · on Aug 6, 2019

Chrome: https://developers.google.com/web/tools/chrome-devtools/memo...

Firefox: https://developer.mozilla.org/en-US/docs/Tools/Memory

lanevorockz · on Aug 6, 2019

Browsers are little OSes with encapsulated virtual machines. Every website tends to be a mix of web trackers, ads and server-dependent functionality, so the whole thing can be a big mess.

tempguy9999 · on Aug 6, 2019

If it helps any I have ~2500 tabs open in palemoon (well overdue for an afternoon of tab culling but anyway) and it's ~1.5GB. I never allow JS. So my guess is it's either JS directly, or perhaps JS pulling in extra resources when allowed to run.

michaelmrose · on Aug 6, 2019

I think a personal wiki with clickable links to 2500 sites would be more useful than a browser with 2500 tabs open. It could even be easily curated by more than one person.

That said, Firefox made some strides in handling very large numbers of tabs starting in version 55.

https://www.techradar.com/news/firefoxs-blazing-speed-with-h...

Unfortunately, Palemoon is still more or less Firefox 38, isn't it?

tempguy9999 · on Aug 6, 2019

Just tidying up after myself would fix most of that. I don't know what palemoon~firefox relationship is versionwise.

But my point was more that sans JS it really seems to use up far less memory. Honestly, try nuking js for 1/2 hour and see how it feels.

michaelmrose · on Aug 6, 2019

I get annoyed at all the sites that don't work without JS.

With JS, my usage ranges from 10 tabs using 400 MB of RAM to 30 tabs using less than 1 GB, with the former scenario being more common.

_-___________-_ · on Aug 6, 2019

1 GiB of virtual memory, I assume. IOW, not the same thing as 1/32 of your 32 GiB of RAM, for example.
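On Linux the difference shows up directly in `/proc/<pid>/status`: `VmSize` is the virtual size and `VmRSS` is what is actually resident in RAM. A small parser as a sketch (the helper names are made up for illustration):

```python
from __future__ import annotations


def parse_status_memory(status_text: str) -> dict[str, int]:
    """Pull VmSize (virtual) and VmRSS (resident) out of /proc/<pid>/status text, in kB."""
    fields: dict[str, int] = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            fields[key] = int(value.split()[0])  # value looks like "   123456 kB"
    return fields


def read_process_memory(pid: str = "self") -> dict[str, int]:
    """Read the live numbers for a PID (Linux only); "self" means this process."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status_memory(f.read())
```

Calling `read_process_memory("1234")` with a browser's PID typically shows a `VmSize` far larger than its `VmRSS`.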

zwaps · on Aug 6, 2019

I don't think that's the bug at all. When memory runs out, the entire system stalls, including the UI, but nothing crashes. If these issues are frequent, the system is basically frozen.

I have this in Matlab on Linux. Matlab can actually deal with worker processes being killed, but my machine just locks up. Therefore, we have to run these specific simulations under Windows, where this doesn't occur.

HugThem · on Aug 6, 2019

I have witnessed MySQL bringing Linux servers down, too.

In my case it happens like this:

I have a long running PHP process that constantly fires away mostly SELECT but also a bunch of INSERT and UPDATE statements and also some DELETEs.

Since the DB and the key files do not fit into memory, it's all disk-bound work.

All tables are MyISAM.

Like clockwork, this stalls the virtual machine once per day.

All I can do is to hard power down the VM and restart it. Afterwards the table data is corrupted beyond repair.

Not sure it is related to memory, though, because the memory usage of PHP and MySQL seems to be constant. Most RAM seems to be used by Linux for caches.

kokey · on Aug 6, 2019

The most common cause here is a situation where some queries hang or take a long time to complete while holding a lock on something, and new queries keep coming in. This builds up quickly.

A good way to catch this would be to have something log the list of running queries every couple of seconds. Look at this log after the crash and you'll hopefully be able to identify which are the long-running processes and which are the regular queries that build up.

Fixing it would take a combination of making the queries that cause the locking less lock-heavy, perhaps putting a limit on how many queries can build up, and making the regular queries that build up time out or fail sooner and more gracefully.
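A sketch of the "log the running queries" idea, assuming the rows come from MySQL's `SHOW FULL PROCESSLIST` (whose columns include `Id`, `Command`, `Time`, `State` and `Info`). The database driver and the polling loop are left out; `ProcessRow`, `long_running` and the 10-second threshold are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProcessRow:
    # Subset of the columns that SHOW FULL PROCESSLIST returns.
    id: int
    command: str
    time: int  # seconds the thread has been in its current state
    state: str
    info: str | None  # the SQL text, or None for idle connections


def long_running(rows: list[ProcessRow], threshold_s: int = 10) -> list[ProcessRow]:
    """Return active queries running at least threshold_s seconds, longest first."""
    hits = [r for r in rows if r.command == "Query" and r.time >= threshold_s]
    return sorted(hits, key=lambda r: r.time, reverse=True)
```

Run it on every snapshot and append the hits (with their `State` and `Info`) to a log file; after a stall, the oldest entries near the end of the log are the prime suspects.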

HugThem · on Aug 6, 2019

I fire the queries sequentially, so there is never more than one query running.

kilburn · on Aug 6, 2019

The "once per day" part is suspicious. Maybe a scheduled backup is what's running the other query ;)

HugThem · on Aug 6, 2019

I had the same suspicion, especially since the provider does run a nightly backup of the VMs. But even after turning that off, the VM stalled again the next night.

z3t4 · on Aug 6, 2019

Try doing some rate limiting so you don't cause the deadlock. You should probably also disable the write cache. And if it still doesn't work, switch to a bare-metal machine. And give it a lot of swap and turn up the swappiness. Swapping is a much better alternative than crashing. VPS providers don't like swap because it wears out their SSDs, so swap and swappiness are probably preset too low.

HugThem · on Aug 6, 2019

I tried sleeping 0.1s every 5s or so. It did not help. Still crashed.

I don't think swapping would even occur since neither mysql nor php grow their memory usage over time.

It's not an SSD. It's good old rusty HD.

jval43 · on Aug 6, 2019

>Afterwards the table data is corrupted beyond repair.

That should not happen with a DB even if you turn off the power. Are you sure the hardware is good?

smueller1234 · on Aug 6, 2019

GP is using the MyISAM storage engine. It's not crash-safe. This is sad but expected behavior.

Don't use MyISAM!

ygra · on Aug 6, 2019

Is there any reason to ever use it (or: Why does it still exist)? In-memory databases for caches or other things that are not critical? I have to admit, I was astounded when I first got the error message from MySQL that a table was corrupted and that I should run REPAIR TABLE. That sounded like very weird behavior for a database.

smueller1234 · on Aug 6, 2019

Any remaining reasonable use cases would be sufficiently corner-casey that the first-order approximation is "if you want it to behave like a database, no, you do not want MyISAM".

This being said, at least some years ago, a use case I saw that held SOME water then was generating MyISAM tables offline, importing them as-is into a running MySQL (or taking an instance offline and bringing it back up) and then serving from it read-only. At least at the time, this provided better RO performance than InnoDB. I wouldn't be surprised if that was still true. Please don't do that at home!

Also, I think that until the release before the most recent one, some internal tables were still MyISAM, so MySQL overall had some very rare cases of not being crash-safe. Again, I think that's since been resolved in 8.0(?).

HugThem · on Aug 6, 2019

    Is there any reason to ever use it

It is faster, uses less disk space and has a more logical filesystem layout.

amaccuish · on Aug 6, 2019

What do you mean by more logical? innodb_file_per_table has existed for a while now.

HugThem · on Aug 6, 2019

A flag with that name exists, yes. But it does not separate table data into one file per table. It will still put stuff related to the tables into the central ibdata1 file.

Google "ibdata1 one file per table" to see all the pain it causes.

amaccuish · on Aug 6, 2019

False.

> But it does not seperate table data into one file per table

That's because if it wasn't set when the tables were created, turning the setting on later won't move existing data to the new layout without an OPTIMIZE. If you had it on from the beginning, each table's data lives in its own file. I literally just did an ls on my /var/lib/mysql and there's a folder per database, in which there are 2 files per table (.frm and .ibd).

When innodb_file_per_table is on, and the database has been OPTIMIZEd, only the following is stored in ibdata1 [0]:

- data dictionary aka metadata of InnoDB tables

- change buffer

- doublewrite buffer

- undo logs

[0] https://www.percona.com/blog/2013/08/20/why-is-the-ibdata1-f...

HugThem · on Aug 6, 2019

    only the following is stored in ibdata1

You say "only", I say "clusterfuck".

Just look at the very page you linked to. It's a totally confusing concept that befuddles users and causes questions "we often receive", starts "panic", can "unfortunately" not easily be analyzed and you might need to "kill threads" and initiate "rollbacks" to fix the problems it brings.

MyISAM got that right. One dir per database.

amaccuish · on Aug 6, 2019

Ok but I don't get why you're so obsessed with the fs layout anyway. You should mostly treat it as a black box. And the point of ibdata1 is safety, which as you stated higher up is a serious problem with MyISAM. Even if it's not oom situations, you'll end up stuck sooner or later. You have been warned.

HugThem · on Aug 6, 2019

    You should mostly treat it as a black box

Again, check the very link you posted. People do that. Until shit hits the fan. And then they have to take that black box apart. Which is not easy.

PeCaN · on Aug 6, 2019

welcome to MySQL

_y4o5 · on Aug 6, 2019

In general, though, an out-of-memory condition doesn't always come from the Linux kernel; it can come from the underlying memory allocator, which is typically the allocator in the C runtime (libc). Just because some process's memory allocator returned NULL or threw bad_alloc doesn't mean the system as a whole is running out of memory.

When the kernel itself is running out of memory, it will just start the OOM killer, which kills the process with the highest "badness" score (mostly its memory usage, adjusted by oom_score_adj).

joosters · on Aug 6, 2019

Actually, I'd say that if malloc (or equivalent) returns NULL then the system really is out of memory. Every general-purpose memory allocator is going to contact the OS to ask for more memory if it doesn't have anything free in its own buffers.

But... it's still no good saying 'make your program behave nicely when malloc fails' - even if your own code is perfect, what are the chances that every library you use does the same thing? And even then, Linux by default will optimistically over-allocate memory (and rightfully so!) - with the result that you'll never catch every out of memory condition.

IMO, 'out of memory' is not a property that each single process should try to manage, rather it should be the OS or some other process with a global oversight that monitors memory usage and takes measures when memory gets tight.
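For that global view of how far the system is overcommitted, `/proc/meminfo` reports `Committed_AS` (memory promised to processes) next to `CommitLimit` (a ceiling that is only enforced in strict mode, `vm.overcommit_memory=2`). A minimal parser, with hypothetical function names:

```python
from __future__ import annotations


def commit_status(meminfo_text: str) -> tuple[int, int]:
    """Return (Committed_AS, CommitLimit) in kB from /proc/meminfo contents."""
    values: dict[str, int] = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            values[key.strip()] = int(rest.split()[0])
    return values["Committed_AS"], values["CommitLimit"]


def overcommitted(meminfo_text: str) -> bool:
    """True if more memory has been promised than strict mode would allow.

    In the default heuristic mode (vm.overcommit_memory=0) CommitLimit is
    only reported, not enforced, so this can be True on a healthy system."""
    committed, limit = commit_status(meminfo_text)
    return committed > limit
```

On a live box: `with open("/proc/meminfo") as f: print(overcommitted(f.read()))`.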

_y4o5 · on Aug 6, 2019

You're right that the memory allocator ultimately gets the memory it manages from the OS, but as a programmer you're looking at it through the abstraction its API provides, and any assumption about which particular condition caused a NULL return or a bad_alloc may or may not be correct.

The other point is that there's a distinction between the kernel's view of an OOM condition and some memory manager's OOM condition. Say you run two processes that both allocate X gigs of memory, and both succeed. However, once they start running and actually committing that memory, you'll get a kernel OOM condition and one process is killed. This is the overcommitting you mentioned.

Personally I don't see why people make such a big fuss about dealing with memory allocation failures. Memory is just a resource like any other OS resource: a socket, mutex, pipe, whatever. Normally, in a well-designed application, you throw on these conditions, unwind your stack, and ultimately report the error to the user or print it to the log and perhaps try again later. Just because it's "memory" should not make it special IMHO.
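In Python terms, the "throw and unwind" approach looks roughly like this sketch: the middle layer does nothing special and lets `MemoryError` propagate, and only the top layer decides what to report. `load_records`, `handle_request` and `always_fails` are hypothetical; the last one simulates an exhausted allocator so the error path can be exercised.

```python
import logging

log = logging.getLogger(__name__)


def always_fails(n: int) -> bytearray:
    """Hypothetical allocator that simulates an exhausted heap."""
    raise MemoryError(f"cannot allocate {n} bytes")


def load_records(n: int, allocate=bytearray) -> bytearray:
    # Middle layer: no special casing; a MemoryError unwinds through here
    # like any other resource error (exception neutral).
    return allocate(n)


def handle_request(n: int, allocate=bytearray) -> str:
    # Top layer: the only place with enough context to decide what to tell the user.
    try:
        buf = load_records(n, allocate)
    except MemoryError:
        log.error("allocation of %d bytes failed; rejecting request", n)
        return "error: out of memory, try again later"
    return f"ok: {len(buf)} bytes"
```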

joosters · on Aug 6, 2019

It's the transparency of memory allocation that makes it so difficult to deal with failures. Even 'trivial' library functions could allocate memory, hell even calling a tiny dumb function might cause the stack to require a new page of memory, leading to failure. Just checking that all malloc calls check for NULL isn't even half of it.

Exception handlers won't save you either. Unless you consciously consider every memory allocation failure, your exception handlers will be too high level and result in your program either aborting by itself or becoming unusable. Did you pre-allocate enough resources to pop up an 'out of memory' error window? Good luck failing gracefully.

Memory allocation is special.
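The "pre-allocate resources for the error path" trick is often done with a reserve block that is freed as the first step of handling an allocation failure. A Python sketch under stated assumptions: in C this would be a `malloc`'d buffer handed to `free()`; in CPython, whether the freed memory actually gets reused by the error path depends on the allocator, so treat this as an illustration of the pattern rather than a guarantee. All names are hypothetical.

```python
from typing import Callable, Optional


class EmergencyReserve:
    """Memory held back up front so the failure path has some headroom."""

    def __init__(self, size: int = 1 << 20) -> None:
        self._block: Optional[bytearray] = bytearray(size)

    def release(self) -> bool:
        """Drop the reserve; returns False if it was already spent."""
        if self._block is None:
            return False
        self._block = None
        return True


def simulate_oom() -> str:
    """Hypothetical operation that fails the way an exhausted allocator would."""
    raise MemoryError("simulated")


def run_with_reserve(work: Callable[[], str], reserve: EmergencyReserve) -> str:
    try:
        return work()
    except MemoryError:
        # Release the reserve before doing anything else, so reporting the
        # error is less likely to need memory the heap cannot provide.
        reserve.release()
        return "out of memory: operation aborted"
```

Note that this only covers allocations you can see; it does nothing for a library that allocates and crashes on its own.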

_y4o5 · on Aug 6, 2019

Yes, it's an imperfect world and you can't control what happens in a library but the attitude "it doesn't matter if my code is messed up or not, some library will still do the wrong thing" doesn't help. All you can do is make your code work properly and that's what you (and everyone else) should do.

Again, it's an imperfect world, but saying "it won't work because x, y, z will happen" is a bad attitude. Most of your code should treat it as just another resource allocation failure, and in a sane program that is signaled by propagating an exception up the stack. Now, you might be right that the program could fail when it's time to display a message box to the user or whatever. But somewhere in the middle layers of the code you don't have that context; you don't know that it will fail. Therefore that part of your code really should be (exception) neutral, just like in any other resource allocation case.

cies · on Aug 6, 2019

> Memory allocation is special.

I think Zig treats it as special, making you explicitly write code that handles the case where an allocation fails.

bzbarsky · on Aug 6, 2019

> I'd say that if malloc (or equivalent) returns NULL then the system really is out of memory.

That's very much not true when 32-bit processes are involved. You can easily be out of (non-fragmented) address space in a 32-bit process (whether it's all resident or not) while the overall system is nowhere close to being out of memory.

Even in a 64-bit process you can exhaust the address space without being out of memory if you try hard enough; you just have to try much harder.

That said, even on Linux allocators will return NULL when they're just out of address space; there's no overcommit going on there.
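To make the fragmentation point concrete: what matters for a large request is the largest contiguous unmapped range, not the total free space. A sketch, assuming a 32-bit layout with the common 3G/1G user/kernel split (the top 1 GiB is modeled as an already-mapped region); the function name is hypothetical.

```python
from __future__ import annotations

ADDRESS_SPACE_32 = 1 << 32  # 4 GiB of virtual addresses


def largest_free_gap(mappings: list[tuple[int, int]], space: int = ADDRESS_SPACE_32) -> int:
    """Largest contiguous unmapped range, given half-open (start, end) mappings."""
    largest = 0
    cursor = 0  # end of the highest mapping seen so far
    for start, end in sorted(mappings):
        largest = max(largest, start - cursor)  # gap before this mapping (<= 0 if overlapping)
        cursor = max(cursor, end)
    return max(largest, space - cursor)  # gap after the last mapping
```

With two 16 MiB mappings at the 1 GiB and 2 GiB marks plus the kernel's top 1 GiB, roughly 2.97 GiB of address space is free, but the largest gap is exactly 1 GiB, so a 1.5 GiB `malloc` fails with plenty of RAM left.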

joosters · on Aug 6, 2019

> That said, even on Linux allocators will return NULL when they're just out of address space; there's no overcommit going on there.

Try calling fork() in that process then. By rights, the new process should inherit its own copy of all the address space of the old process, and is free to overwrite it with whatever it wants. Linux (by default) won't stop fork() from succeeding on a process with N GB of RAM even when total memory (+swap) < 2N GB, yet there simply isn't enough memory around for both processes. There's your overcommit.
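The contrast shows up in strict accounting mode (`vm.overcommit_memory=2`), where the kernel's overcommit documentation gives the limit as `CommitLimit = (RAM - hugetlb pages) * overcommit_ratio / 100 + swap` (with `overcommit_ratio` defaulting to 50 and `overcommit_kbytes` unset). In that mode `fork()` charges the child's copy of private writable memory up front, so the scenario above fails at `fork()` instead of later. A sketch of the arithmetic, with hypothetical helper names and all sizes in kB:

```python
def commit_limit_kb(ram_kb: int, swap_kb: int, overcommit_ratio: int = 50, hugetlb_kb: int = 0) -> int:
    """CommitLimit for vm.overcommit_memory=2, using the overcommit_ratio formula."""
    return (ram_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_kb


def fork_fits_strict(parent_private_kb: int, committed_kb: int, ram_kb: int,
                     swap_kb: int, overcommit_ratio: int = 50) -> bool:
    """Would charging a copy of the parent's private writable memory stay under
    CommitLimit? In the default heuristic mode the kernel only rejects obviously
    impossible single requests, so a fork like this normally just succeeds."""
    limit = commit_limit_kb(ram_kb, swap_kb, overcommit_ratio)
    return committed_kb + parent_private_kb <= limit
```

For example, with 8 GiB of RAM and 2 GiB of swap the strict limit is 6 GiB, so a process with 5 GiB of private memory can exist but cannot fork.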

LgWoodenBadger · on Aug 6, 2019

Would it be possible for the kernel to suspend the process in scenarios where malloc would fail instead of returning a failure? Either until enough becomes available for it to succeed, or until something tells the kernel to renew/revive/resume the process and try the malloc again?

rtkwe · on Aug 6, 2019

It could, but if that process is using most of a system's memory, the system will lock up forever, because while the process is frozen it won't release any of its memory.

ailideex · on Aug 6, 2019

You provide limited information but it is not clear the scenario you explain is a bug. If too much memory is locked into resident memory with mlock then this sounds like the expected and correct behavior.

hobbes78 · on Aug 6, 2019

Then I prefer the unexpected and incorrect behaviour of Windows, which freezes the offending application and continues to be responsive, allowing me to kill it if I wish to do so...

notacoward · on Aug 6, 2019

> Few programs can handle a fail return from "malloc"

Fewer than should, that's for sure, but hardly a trivial number. A lot of old-school C programs are very careful about this, and would handle such a failure passably well. Unfortunately, just about every other language tends to achieve greater "expressiveness" by making it harder to check for allocation failure. How many constructors were invoked by this line of code? By this simple manipulation of a list, map, or other collection type? How many hidden memory allocations did those involve? I'm not saying such expressiveness is a bad thing, but it does make memory-correctness more difficult and so most programmers won't even try.

As the world moves more and more toward "higher level" languages, returning an error from malloc becomes a less and less viable strategy. Might as well just terminate immediately, since "most frequent requester is most likely to die" is better than 99% of the OOM-killer configurations I've ever seen.

sinsterizme · on Aug 6, 2019

Glad to see this issue raised! My system hangs for minutes sometimes and is very frustrating compared to Windows and OSX which seem to handle out of memory in a much more user-friendly way. Which seems to be: suspending the offending program and letting the user decide what to do from there. I'm sure there's a reason the Linux kernel doesn't do something similar, but can anyone enlighten me? :)

yjftsjthsd-h · on Aug 6, 2019

Probably lack of integration; if NT hits a memory issue, it can just pass notice to the tightly-coupled userland and GUI. If Linux runs out of memory, even if it internally knows what process to blame... What would it do that makes sense for a headless server, TiVo, and Android phone? Keeping in mind that the kernel folks don't even work that closely with many userspace vendors.

wiml · on Aug 6, 2019

OSX handles this with a kqueue event that can notify userland when the system moves between various memory pressure states; this is hooked into by libdispatch and other userland libraries which will discard caches and so on.

I don't see why Linux couldn't do the same; open /sys/kernel/something and epoll on it.

antientropic · on Aug 6, 2019

This already exists: applications can receive memory pressure events (such as the system reaching "medium" level, where you may want to start freeing some caches) via /sys/fs/cgroup/memory/.../memory.pressure_level. See https://www.kernel.org/doc/Documentation/cgroup-v1/memory.tx....
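A minimal sketch of that cgroup v1 interface, following the procedure in the memory.txt document: create an eventfd, open `memory.pressure_level`, and write `"<event_fd> <pressure_level_fd> <level>"` to `cgroup.event_control`. It assumes a cgroup v1 memory controller, a cgroup directory you pass in (the path in the usage line is hypothetical), write access to that cgroup, and Python 3.10+ for `os.eventfd`.

```python
import os


def registration_line(event_fd: int, pressure_fd: int, level: str = "medium") -> str:
    """Build the string cgroup v1 expects in cgroup.event_control."""
    if level not in ("low", "medium", "critical"):
        raise ValueError(f"unknown pressure level: {level}")
    return f"{event_fd} {pressure_fd} {level}"


def wait_for_pressure(cgroup_dir: str, level: str = "medium") -> int:
    """Block until the kernel reports memory pressure at `level` for this cgroup."""
    efd = os.eventfd(0)
    pfd = os.open(os.path.join(cgroup_dir, "memory.pressure_level"), os.O_RDONLY)
    try:
        with open(os.path.join(cgroup_dir, "cgroup.event_control"), "w") as ctl:
            ctl.write(registration_line(efd, pfd, level))
        # Reading the eventfd blocks until at least one notification has
        # arrived and returns how many were counted.
        return os.eventfd_read(efd)
    finally:
        os.close(pfd)
        os.close(efd)
```

A cache-holding daemon could loop on `wait_for_pressure("/sys/fs/cgroup/memory/myapp")` and drop caches each time it returns.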

packetized · on Aug 6, 2019

The first two nota benes explicitly describe this document being outdated and not what most people expect when it comes to “memory controller”. I am not certain that citing this is a great example.

Sean1708 · on Aug 6, 2019

What about this? Seems to do what they want.

https://serverfault.com/a/949045

useerup · on Aug 6, 2019

Windows (server and desktop versions) will throw up a message dialog on the screen. It will also start to kill off processes just enough to resolve the low memory situation.

During this - unlike Linux - you can actually use the mouse, CLI and close programs yourself.

On top of that, server applications like IIS have built-in watchdogs. If an IIS process grows to use too much memory (60% IIRC) or excessive CPU, the watchdog will recycle the process.

StreamBright · on Aug 6, 2019

I think Windows kernels do not use overcommit, so memory allocation will fail if you run out of memory.

Asooka · on Aug 6, 2019

You could use the message bus to post a message to the service that handles out of memory decisions, which in turn could either

1. Show a GUI with a choice

2. Show a message on the current terminal and ask what to do

3. Just return "kill it now" if there is no interactive session

And if there is no such service, just default to 3. The problem really is that the state cannot be captured and communicated to the user. I doubt the NT kernel itself shows a GUI window, it's probably a service that gets woken up by a kernel exception, which in turn shows the window. Basically, the Linux kernel needs more pluggable functionality for user interactions. It's absolutely fine and even recommended to not have an entire GUI in the kernel, it needs to just provide a mechanism for userspace to capture the event and decide what to do with it.
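The three-way policy above could be sketched as a pure decision function that such a userspace service might run. Everything here is hypothetical (there is no such standard service or API); it just encodes the list, including the fallback to option 3 when no service is available.

```python
from enum import Enum


class Action(Enum):
    SHOW_GUI = "show_gui"          # option 1
    ASK_TERMINAL = "ask_terminal"  # option 2
    KILL_NOW = "kill_now"          # option 3


def decide(service_available: bool, gui_session: bool, terminal_session: bool) -> Action:
    """Pick a response to a low-memory event."""
    if not service_available:
        return Action.KILL_NOW  # nobody to ask: default to option 3
    if gui_session:
        return Action.SHOW_GUI
    if terminal_session:
        return Action.ASK_TERMINAL
    return Action.KILL_NOW  # no interactive session
```

The point of keeping it pure is that the kernel side only needs to deliver the event; everything user-facing stays in userspace.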

chacham15 · on Aug 6, 2019

Throw a signal like it would do if the process were out of memory completely and about to be killed? (for clarification, no snark intended, actual question)

saagarjha · on Aug 6, 2019

What signal is sent when a process is out of memory? I thought malloc would either start returning NULL or you’d fault when trying to access overcommitted memory.

joosters · on Aug 6, 2019

Yes, but ideally you want to be throwing some ‘memory pressure’ signals before absolutely running out of memory, so that programs can take simple actions like emptying caches, etc.

Catching an otherwise-fatal out-of-memory fault and recovering would be too complicated / bug-prone.

pjmlp · on Aug 6, 2019

Android sends low memory events and kills processes based on heuristics.

Then again, Android has a customized Linux kernel.

jcfrei · on Aug 6, 2019

This describes my general Linux experience well: A very stable kernel, with which I never had serious issues on a headless server. But applications in the userspace (apart from the standard GNU packages) are usually a tossup anywhere between system-crashing garbage and perfectly working pieces of software.

LifeIsBio · on Aug 6, 2019

I used to run into this problem all the time in grad school. Once a month or so I'd load a data set, do some dumb Python operation on it that took significantly more memory than I predicted, and BAM! I'd have to restart my laptop.

I just kinda assumed that's how computers worked until I got a Mac a couple of months ago...

The link suggests that there might be some default parameters you could change to protect against this behavior. Does anyone have any suggestions on what settings to change?

kccqzy · on Aug 6, 2019

A Mac is certainly better at handling these kinds of issues but it's by no means totally safe. It tries to compress memory and dynamically allocate more swap, but there's still a limit and you can see that if you accidentally run programs with way higher RAM requirement than you have. I've had multiple occasions where my program used so much RAM that even moving the cursor is an exercise in patience, never mind switching to a terminal window and typing commands to kill the process.

savoytruffle · on Aug 6, 2019

A Mac will keep creating virtual memory swap up to some limit (some multiple of the amount of physical RAM — can't quite remember, possibly 5x) and then it will produce a kind of vague dialog box saying "You've run out of application memory" with a list of applications to force quit.

Avamander · on Aug 6, 2019

But at least you can recover rather cleanly from the issue.

caf · on Aug 6, 2019

If you've foolishly decided to run without swap (like the original post), then suspending the offending program does nothing.

This is because the offending program has allocated a lot of private dirty pages, which can't be dropped from memory because, without swap space, there is nowhere for them to go.

ieoei9jjd · on Aug 6, 2019

Linux use cases tend to be servers where user interaction is unexpected at 3AM? No one around to make a choice, so automate a choice.

IMO despite the standout behavior, I prefer my software to deal with itself.

Systems designed to wait for user input end up having design choices intent on keeping a user using them.

Software is just a tool. Not a lifestyle. Set and forget this shit as much as possible

hvidgaard · on Aug 6, 2019

If the alternative is simply killing a process or crashing the kernel, then surely a better approach would be to suspend the offending process and call a handler that does something. If you want that something to restart the machine, fine. If you want it to notify the administrator, fine.
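A sketch of "suspend, then call a handler" from userspace, assuming something else has already picked the offending PID. `SIGSTOP` cannot be caught or ignored, so the process stops running (and allocating), but it keeps every page it already holds, which is the caveat raised elsewhere in this thread. `suspend_and_handle` and `kill_handler` are hypothetical names.

```python
import os
import signal
from typing import Callable


def suspend_and_handle(pid: int, handler: Callable[[int], None]) -> None:
    """Freeze the process, then let a policy handler decide what happens next."""
    os.kill(pid, signal.SIGSTOP)  # cannot be caught, blocked, or ignored
    handler(pid)


def kill_handler(pid: int) -> None:
    # One possible policy. Another could notify an admin and later resume
    # the process with os.kill(pid, signal.SIGCONT).
    os.kill(pid, signal.SIGKILL)
```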

pmontra · on Aug 6, 2019

There is also the use case of Android phones. One of the answers to the OP is about that case. It seems that Google developed a userspace process to monitor those events: https://lkml.org/lkml/2019/8/5/1121

From that reply it seems that Facebook implemented something similar, I guess for their servers.

jcelerier · on Aug 6, 2019

> Linux use cases tend to be servers where user interaction is unexpected at 3AM? No one around to make a choice, so automate a choice.

Even if it's a small percentage of the overall "computing" population, there are still millions of people running Linux on the desktop (roughly 2% of the 3.2 billion people using the internet makes 64 million, the population of a large European country). That's 64 million people for whom this behaviour is a pain in the arse.

1-6 · on Aug 6, 2019

Does FreeBSD handle this issue better?

mrtweetyhack · on Aug 6, 2019

And you've turned swap off on Windows?

anticensor · on Aug 6, 2019

Windows will not BSOD due to memory pressure.

StreamBright · on Aug 6, 2019

I think it happens only if you have a broken device driver, applications cannot cause BSOD with memory allocation.

EmpirePhoenix · on Aug 6, 2019

Well, I beg to differ: with an unlimited swap file you can quickly hit hard issues after 64 GB of swap use. At that point mallocs in the Windows UI fail (timeouts or something?) in places that apparently aren't designed to handle it, e.g. fonts missing from the shutdown menu, the system being unable to shut down, etc.

1-6 · on Aug 6, 2019

Does it BSOD if it runs out of swap space though?

adamnemecek · on Aug 6, 2019

I'm not sure but the assumption might be that there's generally no user to ask as the computer might be a server.

IshKebab · on Aug 6, 2019

Right but if there is a user to ask then it should ask!!

noncoml · on Aug 6, 2019

Ehm, and how does the OS know which is the “offending” process?

I think you are confusing the issue raised here with your desktop experience.

brianpgordon · on Aug 6, 2019

Currently the Linux kernel computes a score for each process based on some heuristics. There's a good introductory article on LWN:

https://lwn.net/Articles/317814/
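From userspace, the knobs are `/proc/<pid>/oom_score` (the kernel's current score for the process, read-only) and `/proc/<pid>/oom_score_adj` (an adjustment from -1000 to 1000, where -1000 exempts the process), both described in `man 5 proc`. A small sketch with hypothetical helper names:

```python
from __future__ import annotations

OOM_SCORE_ADJ_MIN = -1000  # exempt from the OOM killer
OOM_SCORE_ADJ_MAX = 1000   # preferred victim


def read_oom_score(pid: int | str = "self") -> int:
    """The kernel's current badness score for the process (read-only file)."""
    with open(f"/proc/{pid}/oom_score") as f:
        return int(f.read())


def set_oom_score_adj(pid: int | str, value: int) -> None:
    """Bias the OOM killer for or against a process (Linux only).

    Raising the value works for your own processes; going below the
    process's recorded minimum needs CAP_SYS_RESOURCE, so protecting
    something like sshd with -1000 has to be done as root."""
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError(f"oom_score_adj must be in [-1000, 1000], got {value}")
    with open(f"/proc/{pid}/oom_score_adj", "w") as f:
        f.write(str(value))
```

For services, systemd's `OOMScoreAdjust=` setting does the same thing declaratively.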

lazyguy · on Aug 6, 2019

Yep and it's about as good as just picking a random process and killing it.

It's awesome when you run out of memory and you try to log in only to have it kill sshd.

wyldfire · on Aug 6, 2019

A classic from [1]:

> An aircraft company discovered that it was cheaper to fly its planes with less fuel on board. The planes would be lighter and use less fuel and money was saved. On rare occasions however the amount of fuel was insufficient, and the plane would crash. This problem was solved by the engineers of the company by the development of a special OOF (out-of-fuel) mechanism. In emergency cases a passenger was selected and thrown out of the plane. (When necessary, the procedure was repeated.) A large body of theory was developed and many publications were devoted to the problem of properly selecting the victim to be ejected. Should the victim be chosen at random? Or should one choose the heaviest person? Or the oldest? Should passengers pay in order not to be ejected, so that the victim would be the poorest on board? And if for example the heaviest person was chosen, should there be a special exception in case that was the pilot? Should first class passengers be exempted? Now that the OOF mechanism existed, it would be activated every now and then, and eject passengers even when there was no fuel shortage. The engineers are still studying precisely how this malfunction is caused.

[1] https://lwn.net/Articles/104185/

antisemiotic · on Aug 6, 2019

Fortunately, engineers have invented a way to attach a Strolling Wheelbarrow After Plane, where you can stash the sleeping passengers without ejecting them out of the plane entirely. This has the unpleasant side effect of slowing down the journey for everyone when passengers wake up in a disorderly fashion (and God forbid everyone wakes up at the same time), though.

Annatar · on Aug 6, 2019

What I still do not understand is why people continue to turn a blind eye to this instead of switching to SmartOS. I just don't get it.

nisa · on Aug 6, 2019

How does Solaris/SmartOS handle that situation?

PeCaN · on Aug 6, 2019

It doesn't get in that situation, because malloc() can return null on Solaris (i.e. it never¹ overcommits).

While in general I think this is vastly better than the somewhat insane Linux OOM killer, you can get in awkward situations where you can't start any processes (including a root shell) because you're out of memory.

I rather like the FreeBSD solution to this, which is to not overcommit, but after a certain number of allocation failures it kills the process using the most memory. This prevents situations where you can't start any processes.

There's no one-size-fits-all solution to handling low memory conditions, but the Linux solution manages to almost never do what you want which is kind of impressive in a way.

¹ I seem to recall hearing somewhere that you can allow allocations to overcommit on a per-application basis on later versions of Solaris, but don't quote me on this.

floatboth · on Aug 6, 2019

> FreeBSD solution to this, which is to not overcommit

Where did this myth come from? Did y'all just assume that the vm.overcommit sysctl actually makes sense and zero means "no overcommit"? :)

https://news.ycombinator.com/item?id=20623919

But indeed, OOM killer kills the largest process, which makes more sense in most scenarios than Linux's "badness" scoring.
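To compare the two policies, here is a simplified model, assuming the post-2.6.36 Linux heuristic (resident pages + swap entries + page-table pages, plus `oom_score_adj` times total pages / 1000, with -1000 meaning "never") versus plain "largest process wins". It leaves out many kernel details (cgroup/cpuset constraints, unkillable tasks), and all names and numbers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Proc:
    name: str
    rss_pages: int        # resident pages
    swap_pages: int       # pages swapped out
    pgtable_pages: int    # page-table overhead, in pages
    oom_score_adj: int = 0


def largest_victim(procs: list[Proc]) -> Proc:
    """'Kill the biggest process', ignoring any tuning."""
    return max(procs, key=lambda p: p.rss_pages + p.swap_pages)


def badness(p: Proc, total_pages: int) -> int | None:
    """Simplified Linux-style score; None means the process is exempt."""
    if p.oom_score_adj == -1000:
        return None
    points = p.rss_pages + p.swap_pages + p.pgtable_pages
    return points + p.oom_score_adj * (total_pages // 1000)


def linux_victim(procs: list[Proc], total_pages: int) -> Proc | None:
    scored = [(badness(p, total_pages), p) for p in procs]
    eligible = [(s, p) for s, p in scored if s is not None]
    return max(eligible, key=lambda sp: sp[0])[1] if eligible else None
```

With 1,000,000 total pages, a 400,000-page `mysqld` at `oom_score_adj=-500` is the largest process, but its adjusted score drops to -99,000, so the Linux-style model picks the 350,000-page browser instead.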

PeCaN · on Aug 6, 2019

Huh, I had no idea it worked like that. That's bizarre.

NikkiA · on Aug 6, 2019

Running sshd as an on-demand (Type=socket) service would probably work better, since then the sshd process would be new and thus have a better heuristic score - also not be tying up memory sitting unused in the meantime.

systemd still seems to run it (Type=notify) with the -D option all the time though, at least on the systems I can check.

Dropbear is configured by default as a Type=socket service though.