No. _I_ want to know what we're talking about, as my original question clearly indicates.
You can do "lock-free memory based interprocess communication" with memory (obviously). There is no need to back this memory with files, certainly not files on a hard drive that you would otherwise access using read() and write(). Hence my original question.
First, I didn't assume it a requirement to have two processes read() and write() _directly_ to the same memory (I suppose you meant "file region" here). And idk, it might not be a good idea to require that.
Also, you can use normal (non-file-backed) memory to do the necessary synchronization (lock-free or not). I'm still not seeing why the memory should be backed by a file, that's why I was genuinely asking. One reason why it could be practical that I can now see could be for an embedded database like sqlite, but again I'm not sure it would be a good idea. While it would allow for pretty much setup-less synchronization of otherwise uncoordinated processes, it's a fringe application that might be better implemented with one big flock(). And one reason why it could be not a good idea is that it might couple the file format to a particular CPU architecture.
Another big issue I guess is that the atomics actually do have an effect on the underlying file whenever the pages are flushed. What if the computer shuts down unexpectedly? The synchronization affairs aren't cleaned up, yet the original processes are gone.
You weren't asking, you were saying it wasn't necessary, which you did in the sentence right before this one:
> Also, you can use normal (non-file-backed) memory to do the necessary synchronization (lock-free or not).
Again, this is just a repeated claim, it isn't an explanation. How do you have two processes writing to the same place in memory without memory mapping a file?
> have two processes read() and write() _directly_ to the same memory
I didn't say read() and write(); I said read and write, as in reading and writing with memory addresses. Again, this is all about lock free interprocess communication. You can't write outside your own memory from a process with normal permissions, so how do you share memory with another process?
You memory map the same file. This isn't about the file being written to some sort of persistent storage, that happens on the OS level and doesn't interfere with two running processes communicating with each other. The file can be deleted after the last process closes it. It is just a way for the two processes to have memory mapped into their virtual memory space that overlaps with each other.
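To make the idea concrete, here is a minimal sketch of two processes mapping the same file with MAP_SHARED, assuming Linux (or another Unix) and Python 3. The function name and the scratch file are hypothetical, and `waitpid()` stands in for real synchronization; this is an illustration, not anyone's actual setup from the thread.

```python
import mmap
import os
import struct
import tempfile


def share_via_file_mapping(value: int) -> int:
    """Child writes `value` into a shared file mapping; parent reads it back."""
    fd, path = tempfile.mkstemp()  # hypothetical scratch file used as the rendezvous point
    try:
        os.ftruncate(fd, mmap.PAGESIZE)  # the file must cover the mapped range
        pid = os.fork()
        if pid == 0:
            # Child: open the file by path, the way an unrelated process would.
            try:
                child_fd = os.open(path, os.O_RDWR)
                with mmap.mmap(child_fd, mmap.PAGESIZE, flags=mmap.MAP_SHARED) as mem:
                    struct.pack_into("<Q", mem, 0, value)  # plain store into shared memory
            finally:
                os._exit(0)
        # Parent: map the same file and read what the child stored.
        with mmap.mmap(fd, mmap.PAGESIZE, flags=mmap.MAP_SHARED) as mem:
            os.waitpid(pid, 0)  # crude ordering, enough for a demo
            return struct.unpack_from("<Q", mem, 0)[0]
    finally:
        os.close(fd)
        os.unlink(path)
```

Both mappings refer to the same page-cache pages, so the parent sees the child's store without any read() or write() call on the file.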
You need to deal with memory directly so you can use atomics. You need to use atomics so you can avoid locks.
I thought you might have had some other technique that I'm not aware of but it seems now you were making claims without much behind them, which is disappointing.
These are still memory mapping files using file paths and returning file descriptors as far as I know, which makes sense because you have to have something coordinated between the two processes.
> These are still memory mapping files using file paths ...
No, they're not. The entire purpose of MAP_ANONYMOUS is to avoid using files.
Sources:
1. The Linux Kernel source code [1], where it comes with the code comment: "don't use a file".
2. The glibc source code [2], where it comes with the same code comment: "Don't use a file".
3. The Linux man-pages project documentation of mmap [3], where it is documented thus: "The mapping is not backed by any file; its contents are initialized to zero. The fd argument is ignored"
Similarly for SHM, but if you still don't get the point about MAP_ANONYMOUS, I doubt you'll get it for SHM either.
> ... and returning file descriptors
A socket is a file descriptor. An epoll handle is a file descriptor. On modern Linux kernels, a pid handle is a file descriptor. None of them are backed by "files".
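A quick way to check this is to ask the kernel what sits behind a descriptor. The sketch below assumes Linux and Python 3; `fd_kind` and `fd_kinds` are hypothetical helper names, and the epoll entry is only included where `select.epoll` exists.

```python
import os
import select
import socket
import stat


def fd_kind(fd: int) -> str:
    """Classify a descriptor by the file type bits that fstat() reports."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISFIFO(mode):
        return "pipe"
    return "other"


def fd_kinds() -> dict:
    """Open a few non-file descriptors and report what each one is."""
    a, b = socket.socketpair()
    r, w = os.pipe()
    ep = select.epoll() if hasattr(select, "epoll") else None
    try:
        kinds = {"socket": fd_kind(a.fileno()), "pipe": fd_kind(r)}
        if ep is not None:
            kinds["epoll"] = fd_kind(ep.fileno())
        return kinds
    finally:
        a.close()
        b.close()
        os.close(r)
        os.close(w)
        if ep is not None:
            ep.close()
```

None of these descriptors should come back as a regular file, even though each one is a perfectly valid FD.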
> ... because you have to have something coordinated between the two processes.
FDs are not the only things processes can share, even if you go back to the venerable, original Unices, so I don't see what you mean.
This just goes back to the same question - what do two processes use to map the same memory into their memory space if it isn't a path to a file?
I'm not saying there isn't anything, I'm just seeing an extreme avoidance of an actual answer. The other guy went down a rabbit hole of syncing that memory to storage, which has nothing to do with anything.
I'm starting to think you're even more confused than I had assumed. You were literally given a reasonable possible answer to your question multiple times (MAP_ANONYMOUS). And if there wasn't a big confusion you wouldn't be asking these questions in the first place because you could just make up your own answer.
I'm also left uncertain if you're assuming Linux and not talking about it. At least your objections to general statements are weirdly specific, while you never clarify the context (e.g. what OS you're talking about), and you seem to assume that there couldn't be other ways of achieving stuff. There seems to be a weird lack of understanding of the basics in your comments.
At the core, everything you need to share memory is that the participating processes agree about the (physical) address range of that memory (e.g. a 64-bit starting address and a 64-bit size). You could literally hardcode a physical address range, map this range to arbitrary (and possibly different) virtual address ranges in each of the processes, and start communicating through that shared memory. Note that the mappings are stored in the RAM and CPU, it has nothing at all to do with any files or filepaths.
And this whole discussion is completely pointless anyway because it started with YOU misunderstanding what I meant by "file-backed memory", which is not my fault at all. The term is completely unambiguous, it means (as opposed to POSIX SHM / MAP_ANONYMOUS / whatever) page cache memory that gets synced to an underlying file on a filesystem.
Please stop questioning and start experimenting and understanding what we're saying. We know what we're talking about. You don't.
"MAP_ANONYMOUS|MAP_SHARED mapped memory can only be accessed by the process which does that mmap() call or its child processes. There is no way for another process to map the same memory because that memory can not be referred to from elsewhere since it is anonymous."
> misunderstanding what I meant by "file-backed memory"
No, it started by talking about using atomics for lock free interprocess communication, something MAP_ANONYMOUS can't do.
You hallucinated writing to storage as being part of this, didn't explain yourself and are getting upset about it. Atomic instructions that manipulate memory are orthogonal to what the OS does in the background. No one would think an operation on the order of nanoseconds has anything to do with writing to permanent storage.
> clarify the context (e.g. what OS you're talking about)
This thread is about mmap - it says it in the title.
> it has nothing at all to do with any files or filepaths.
Two processes need some way to map the same memory and they do it through file paths.
> This thread is about mmap - it says it in the title.
I was asking what YOU are talking about. And also, this thread is actually about the approach of memory-mapped file I/O, not about POSIX mmap() specifically.
That's why I was (clearly) making statements that are not tied to any particular OS or platform, from the beginning.
If your boss said "we need these two programs to have lock free IPC through memory" and you said "use MAP_ANONYMOUS" they would say "that is local to the process tree and won't work".
You can try to ignore the context of this thread, but if someone wants IPC, this doesn't work.
> But then that isn't interprocess communication.
It is. It may not be _generic_ IPC, but it is IPC all the same. E.g., this is how postgres does IPC across its processes.
> that is local to the process tree and won't work
Isn't that what SHM is for? But, oh I see, you're willfully ignoring the fact that SHM keys _are not file paths_. So, yeah, I guess in _your_ world, non-file-backed IPC can't work.
> If your boss said ...
Sucks to be your boss, since _you_ don't get the fact that SHM keys and the filesystem are entirely separate namespaces.
> You weren't asking, you were saying it wasn't necessary, which you did in the sentence right before this one:
Quoting my OP: "Why do you need lockless atomic updates to a file-backed memory area? Genuinely curious." Dude.
> it seems now you were making claims without much behind them, which is disappointing.
Well thank you very much.
I get the feeling we might just be talking about the same thing. Or we might be not, I'm not sure.
> How do you have two processes writing to the same place in memory without memory mapping a file?
> You can't write outside your own memory from a process with normal permissions so how do you share memory with another process?
For example on Linux, use shm_open() + mmap(). This is just an example, and granted it uses a file-like API (shared memory objects show up on /dev/shm on a typical Linux) but it is not "file-backed" (I meant disk backed and this might be the misunderstanding) and in particular it's certainly not mapping the database file. It's just one way on one OS to map the same physical memory into different processes' address spaces.
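As a rough sketch of the shm_open() + mmap() route, assuming Python 3.8+ on a POSIX system: the standard library's `multiprocessing.shared_memory.SharedMemory` is implemented on top of shm_open() there. The object name and the function name below are made up for the demo, and the second handle lives in the same process only to keep the example short; another process would attach by the same name.

```python
import os
from multiprocessing import shared_memory


def share_via_named_shm(payload: bytes) -> bytes:
    """Write through one handle, read through a second handle attached by name."""
    name = f"demo_shm_{os.getpid()}"  # hypothetical object name, not a path on disk
    owner = shared_memory.SharedMemory(name=name, create=True, size=4096)
    try:
        # Attach using nothing but the name, as a second process would.
        peer = shared_memory.SharedMemory(name=name)
        try:
            owner.buf[: len(payload)] = payload
            return bytes(peer.buf[: len(payload)])
        finally:
            peer.close()
    finally:
        owner.close()
        owner.unlink()  # remove the name; the memory goes away once unmapped
```

The only thing the two handles agree on is the name; no file on a disk-backed filesystem is created or written.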
If this example approach is "file-backed" to you, then so be it but I think you have willfully misread my comments up to here.
Homework: go back through my comments and identify all the places where I was VERY CLEARLY pointing out that my statement is that no disk-backed file is needed, or where you could reasonably infer this from my use of the term "file-backed", as well as from the general context of the discussion.
> shm_open("/TESTOBJECT"
>>That's a file path
Pedantically, no. It's a name (https://man7.org/linux/man-pages/man3/shm_open.3.html) that identifies a memory object that is only coincidentally also mapped to the file path "/dev/shm/TESTOBJECT" on a typical Linux. shm_open() returns an "FD", though.
On Linux, as a sibling poster noted, you could also use mmap(.. MAP_SHARED | MAP_ANONYMOUS, /*fd*/ -1 ...) , which to my knowledge is entirely "file-free" by any meaning of the term "file". But then again, in my understanding this would only work with child processes because that mapping has to be inherited.
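A minimal sketch of that inherited anonymous mapping, assuming Linux and Python 3 (the function name is hypothetical). The mapping is created before fork(), so the child gets it by inheritance; there is no name or path that an unrelated process could use to reach it.

```python
import mmap
import os
import struct


def share_via_anonymous_mapping(value: int) -> int:
    """Child stores `value` in an inherited anonymous shared mapping; parent reads it."""
    # fd = -1 plus MAP_ANONYMOUS: no file and no name behind this memory.
    mem = mmap.mmap(-1, mmap.PAGESIZE, flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS)
    try:
        pid = os.fork()  # the child inherits the mapping
        if pid == 0:
            try:
                struct.pack_into("<Q", mem, 0, value)
            finally:
                os._exit(0)
        os.waitpid(pid, 0)  # crude ordering, enough for a demo
        return struct.unpack_from("<Q", mem, 0)[0]
    finally:
        mem.close()
```

This works because MAP_SHARED keeps parent and child on the same physical pages after fork(); with MAP_PRIVATE the child's store would land in its own copy.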
On other OSes, there may be completely different APIs to map shared memory that don't involve anything "file" like, either. Quite honestly I can't point you to any because I do only Linux and Windows, but let's just end the discussion here and let's agree that memory != file. I'm angry at myself for wasting another evening fighting a pointless discussion with somebody who would rather argue than try to get my point.
You conflated files with disks on your own. No one did that for you.
> rather argue than try to get my point.
I still don't know what your point is. You have to have something that coordinates between two processes for shared memory interprocess communication and that ends up being file paths for the OS. You asked questions, they were answered and you could have learned something.
The whole point was actually that you can map the same memory into two different processes and use atomics, which is an incredible technique. For some reason you wanted to ignore that and make claims without explanation.
If you didn't want to waste time, you would have explained what you meant or asked questions.
> If you didn't want to waste time, you would have explained what you meant or asked questions.
You clearly haven't done your homework, because I did.
> You conflated files with disks on your own. No one did that for you.
I did not really conflate this. It is just conventional but imprecise terminology, and everyone who gets into such a discussion (especially when starting personal attacks) is expected to know to be careful when one hears "file" that it could mean "filepath", "file descriptor", or "file data" - especially "persistent file data" / "file storage", and that it could or could not mean something specific Unix-y or not Unix-y, or just some unspecific "data object". My usage of the term "file-backed" is definitely clear enough. More so given all the other explanations I made. Even more in the context of mmapping database files.
How about this: You yourself are the one who wasn't clear (or just wrong, not really understanding virtual memory), and I was the one clarifying myself multiple times, and I was the one just trying to make a simple point that could be easily understood by not being stubborn.
> The whole point was actually that you can map the same memory into two different processes and use atomics, which is an incredible technique. For some reason you wanted to ignore that and make claims without explanation.
I never ignored that but said from the beginning that you should share memory, but not file-backed memory. It's standard to share memory between processes and threads (especially threads), not an "incredible technique". It's an essential part of virtual memory management.
Go right back here to my first reply to your first reply, https://news.ycombinator.com/item?id=29943137 . Which has it all. "Because it allows you to do lock free memory based interprocess communication, which can be extremely fast." > " There is no need for file-backed memory to do that. ". Also go read my OP's sibling comment. Go read TFA, or just the title of this discussion. How can you not stop pretending you were just caught in an argument that you could not get out of without acknowledging you were wrong?
My very next comment: https://news.ycombinator.com/item?id=29947339 , "You can do "lock-free memory based interprocess communication" with memory (obviously). There is no need to back this memory with files". That comment also explains the problems of using a persistent file as backing. WHAT THE HELL STOP PRETENDING I WASN'T CLEAR THAT THIS IS ABOUT FILES ON DISK.
The next comment: "you can use normal (non-file-backed) memory to do the necessary synchronization (lock-free or not). I'm still not seeing why the memory should be backed by a file"
Then you wouldn't explain it and eventually admit that you do need to have a file path to give to another process, but only after I asked you to show what you meant multiple times.
And there isn't. It seems you just don't really understand virtual memory, and don't want to acknowledge what everyone else understands by "file-backed memory". Given that, I find it courageous how stubborn you are, and that you started with personal attacks.
> Then you wouldn't explain it and eventually admit that you do need to have a file path
Need to have a file path IN WHICH ENVIRONMENT, IN WHICH CONTEXT??? Could YOU please clarify. We can easily make a simple OS which doesn't have "files" but does have processes that can share memory using virtual memory technology.
Shared memory IPC is fundamentally not about files, and you were even shown a way to set up shared memory mappings between Linux processes using normal userland API entirely without the use of files or file paths - with the restriction that the mappings have to be inherited (fork()).
How someone, even with no real understanding of the topic, could not at the latest at https://news.ycombinator.com/item?id=29947339 acknowledge that I was being perfectly clear that I was talking about persistent files (I literally said on a hard drive), is beyond me. I should have stopped this discussion at that point.
Files being persistent on storage has nothing to do with communicating through shared memory. It isn't necessary and it doesn't interfere if it's there. It is completely orthogonal, I don't know why it would ever be a part of the conversation when talking about direct reading and writing to the same memory.
> Files being persistent on storage has nothing to do with communicating through shared memory.
Files (whether persistent or not) have not really anything to do with communication through shared memory. In the implementation of an API like shm_open(), the VFS (virtual filesystem) is simply the address space and lookup mechanism that an operating system like Linux happens to use in order to find the memory that should be shared.
> It isn't necessary and it doesn't interfere if it's there.
Sure it does interfere. By backing memory needlessly with a persistent file, you're causing disk I/O from the loading and flushing (that can't really be controlled) and potentially bad performance.
Also, as explained, if you use a persistent file to track the synchronization state, the synchronization state won't be reset when the communicating processes die unexpectedly, and this might be problematic.
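The stale-state concern can be illustrated with a small sketch, assuming Linux and Python 3. The `LOCKED` constant, the function name and the scratch file are hypothetical, and the "crash" is only simulated by dropping the mapping without clearing the flag.

```python
import mmap
import os
import tempfile

LOCKED = 1  # hypothetical flag value meaning "some process holds the lock"


def stale_flag_after_crash() -> int:
    """Set a flag in a file-backed mapping, vanish, and see what a later reader finds."""
    fd, path = tempfile.mkstemp()  # hypothetical persistent backing file
    try:
        os.ftruncate(fd, mmap.PAGESIZE)
        with mmap.mmap(fd, mmap.PAGESIZE, flags=mmap.MAP_SHARED) as mem:
            mem[0] = LOCKED  # pretend a process took a lock stored in the mapping
            mem.flush()  # msync(): write the dirty page back to the file
        os.close(fd)  # the "process" goes away without clearing the flag
        fd = -1
        with open(path, "rb") as later_run:
            return later_run.read(1)[0]  # a later run reads whatever was written back
    finally:
        if fd != -1:
            os.close(fd)
        os.unlink(path)
```

With anonymous or shm-style memory that is torn down with its users, a fresh start begins from zeroed pages; with a persistent backing file, the flag is still set.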
> system like Linux happens to use in order to find the memory that should be shared.
Right. Is there some other mechanism to coordinate mapping the same memory between processes? That's all I ever asked.
> Sure it does interfere. By backing memory needlessly with a persistent file, you're causing disk I/O from the loading and flushing (that can't really be controlled) and potentially bad performance.
That is orthogonal, since once you have the memory mapped into both processes you can use atomics for lock free IPC. That's the whole thing. It doesn't matter what the OS does or doesn't do in the background, atomically reading and writing to memory is unaffected.
> It doesn't matter what the OS does or doesn't do in the background, atomically reading and writing to memory is unaffected.
That's not true. If this thing is file backed there is usually no guarantee that the page of virtual memory (i.e. a page of the file data) you're accessing is present in physical memory. You'll cause page faults and data transfers to/from disk. This can delay the execution of an atomic read or write potentially infinitely, or even cause a "crash" of some kind if the disk transfer fails.
You can avoid the page faulting part of this if you somehow pin the memory. Which is completely ridiculous given that all you ever wanted is anonymous memory. I've looked up a website that seems to explain this better (but I haven't checked it deeply). Maybe it helps: https://eric-lo.gitbook.io/memory-mapped-io/pin-the-page
> This can delay the execution of an atomic read or write
You can play "what if" all you want if you don't know what else is running, but this was always about lock free interprocess communication, which is not broken by a page fault or process suspension.
An atomic instruction by design will do everything it needs to when the instruction runs.
Saying the OS can ultimately control the execution of a process is a nonsense cop out to try to skew away from the original point.
> all you ever wanted is anonymous memory
This is local to a process tree and does not work for interprocess communication.
Dude, the example I gave you with shm_open() is creating anonymous memory. That's just what non-file-backed mappings are called, no matter how long you want to keep obsessing about any "file paths".
This doesn't even seem like a reply to what I said.
If you map memory anonymously you aren't doing interprocess communication.
If you don't, you have a file path that the other program can use to map the same memory.
That's it, there is nothing wrong with this. I don't know why this is so upsetting. Mapping memory anonymously is local to the process tree and doesn't work for two different programs communicating.
Ok, I'm extremely embarrassed but it looks like I got the terminology wrong with regard to "Anonymous memory". And sorry for being so upset, at least I finally got something out of it.
It's also a fact that if I'm using disk swap space on a Unix, the same performance and stability issues apply as for disk backed file mappings. In that sense, there really is no difference.