Hacker News
Measuring memory usage: virtual versus real memory (lemire.me)
47 points by ibobev 52 days ago | 14 comments

This barely scratches the surface of how complicated a topic memory usage is. No mention of shared memory (which is very important on Android, because everything is forked from the zygote), and no mention that Windows separates address space allocation from “committing” to provide pages (so you can’t OOM on a page fault like on Linux, but you can fail more often on allocations).

So you can have virtual address space “usage”, commit “usage”, and physical page “usage” (which can be shared between processes).

And that’s just anonymous memory; file-backed memory gets even more complicated. Are pages that sit in the file cache but are no longer mapped into any process counted as “usage”?

Anyways, for an introduction to a topic I would expect more links to background information, and you can go very deep trying to answer the question of how much memory an application is using. And the first step is to carefully define “memory” and “using”.

Linux actually will let you allocate address space without allocating pages. You just disable all access, and the OS knows that it doesn't actually need to give you any pages. You then commit by using mprotect to enable access to the page. There are madvise hints you can use to tell the OS that you no longer care about what's in a page, although they're a bit misnamed for compatibility reasons (MADV_DONTNEED will immediately free the pages, while MADV_FREE will free them lazily).

Edit: Actually it looks like the behavior might have changed. hmmm. I guess read the man pages tied to the kernel you're using

Edit 2: Nope, I misread the most recent manpage. I should not read manpages when I'm tired. It does zero fill specifically for anonymous mappings

On Windows I always look at the commit charge. I think it's the most intuitive metric. While shared memory etc. do complicate things, I've found they're also not really relevant whenever I'm looking at the memory usage of a process.

Virtual memory is often misunderstood and, in my opinion, underused by application programmers.

For example, a std::vector-style dynamic array is a very common pattern, but the whole array content is copied every time it resizes, a cost that is more or less amortized by doubling the capacity each time.

Yet a very large virtual memory block has the added benefits of address stability, simplicity of implementation, and zero copying. Of course there is no free lunch: the page fault that triggers page allocation will interrupt the process when a new page is first touched.

I should make some benchmark to compare the different approaches at some point…

You will still pay for at least 4 KB, or whatever the page size on your system is. If you reuse the buffer, memory use will never go down; if you reallocate on reuse, you pay the price of communicating with the operating system. And on x86-64 the virtual address space is only 48 bits, which means you can fit only 2^16 separate 32-bit (4 GiB) regions.

To me what you are proposing seems like a clever way of allocating that has a pretty restricted use.

One problem with using the virtual memory system this way is that it will not be portable to platforms without virtual memory (like WASM). Admittedly, probably not a problem for most people, but it should be at least a consideration for library authors (ideally libraries should delegate all their memory management to the library user anyway though).

> Yet, a very large virtual memory block

...of what size? is the problem

I'd be interested

Somewhat on topic, since folks are talking about private vs shared: a handy script for displaying memory usage nicely per application is ps_mem.py [1] (or pip install ps_mem).

And there are a myriad of command incantations to massage the existing data into one-liners [2].

[1] - https://github.com/pixelb/ps_mem/

[2] - https://www.commandlinefu.com/commands/matching/memory/bWVtb...

> Given a choice, I would almost always opt for bigger pages because they make memory allocation and access cheaper

You can just allocate 4 (or however many) pages at once, and trust the OS's ability to speculate accesses. The main advantage of larger pages is smaller page tables and higher TLB hit rates.

> higher TLB hit rates

Not always, right? At least on x86, switching to huge pages also decreases the TLB size, afaik.

    $ x86info -c
      TLB info
       Instruction TLB: 4K pages, 4-way associative, 128 entries.
       Instruction TLB: 4MB pages, fully associative, 2 entries
       Data TLB: 4K pages, 4-way associative, 128 entries.
       Data TLB: 4MB pages, 4-way associative, 8 entries

Unless I'm misunderstanding, despite reducing TLB size you still massively increase hit rate because each hit covers a lot more actual memory, right?

Sure it covers more area, but that area has to be contiguous. System memory in modern bloated systems is anything but that.

The author discourages "micro-optimizing" memory instead of "thinking in pages". But the page transactions between the OS and an application are themselves built from a series of micro-transactions (malloc calls).

I see no reason why key decisions in the micro-optimizations space could not lead to emergent page minimums.
