
close() is typically a blocking operation. But when it happens in devfs, procfs, tmpfs, or some other RAM-only filesystem, I expect it to be fast unless documented otherwise.


Especially when you are in devfs you should not assume anything at all! Close in devfs is just a function pointer which is overridden by each of the myriad device drivers that expose files in /dev. Your close() could be the final one which lets the driver perform some cleanup. It might decide to borrow your thread to do it. Maybe some device was about to be ejected/disabled but could not previously because you were holding an FD to it.

The same goes for /proc and /sys which are very similar to /dev in that they represent various entry points into the kernel.


It can be slow every time if your AV software hooks close() to run its expensive scan operation, the way Windows Defender does.


> I expect it to be fast unless documented otherwise.

Logically you should expect it to block indefinitely unless documented otherwise. The exception would be completing within a time bound, the rule is blocking indefinitely.


> Logically you should expect it to block indefinitely

Frankly, that’s completely insane. It should block if and only if there is actual io in flight which could produce a failure return that an application needs. Syscalls should be fast unless there is a very good reason not to be.


> It should block if and only if there is actual io in flight which could produce a failure return that an application needs.

Blocking simply means that the specification does not guarantee an upper bound on the completion time. There is no other meaningful definition. POSIX is not an RTOS therefore nearly all system calls block. The alternative is that the specification guarantees an upper bound on completion time. In that case what is an acceptable upper bound for close() to complete in? 1ms? 10ms? 100ms? Any answer diminishes the versatility of the POSIX VFS.

> Syscalls should be fast unless there is a very good reason not to be.

I think this is an instance of confusing what should be with what is. We’ve been through this before with O_PONIES. The reality is that system calls aren’t “fast” and they can’t portably or dynamically be guaranteed to be fast. So far the only exception to this is gettimeofday() and friends.

Robust systems aren’t built on undocumented assumptions. Again, POSIX is not an RTOS. Anything you build that assumes a deterministic upper bound to a blocking system call execution time will inevitably break, evidenced by OP.


> The reality is that system calls aren’t “fast” and they can’t portably or dynamically be guaranteed to be fast.

Perhaps, but the reality is also that the vast majority of games and other interactive applications routinely make blocking system calls in a tight main loop and expect these calls to take an unspecified but reasonable amount of time.

“It’s a blocking syscall so if it takes 1s to close a file, that’s technically not a bug” is correct, but is any player of “Papers, Please” going to be sympathetic to that explanation? Probably not; they’ll think “Linux is slow,” “Linux is buggy,” “why can’t Linux run basic applications correctly that I have no problem running on Windows or OS X?,” etc.

“Syscalls should be fast unless there is a very good reason not to be” strikes me as a wise operating principle, which weights usability and usefulness of the operating system alongside being technically correct.


> “It’s a blocking syscall so if it takes 1s to close a file, that’s technically not a bug” is correct, but is any player of “Papers, Please” going to be sympathetic to that explanation? Probably not; they’ll think “Linux is slow,” “Linux is buggy,” “why can’t Linux run basic applications correctly that I have no problem running on Windows or OS X?,” etc.

I don’t agree with this logic. Windows and macOS system calls also block. The issue of people considering Linux to be slow is not relevant to the fact that its system calls block. The poorer quality of Linux games, and commercial Linux software in general, is more likely due to smaller market size / profit opportunity and the consequential lack of effort / investment into the Linux desktop/gaming ecosystem.

Now, if your argument is that we should work around buggy applications and distribute hacked patches when the developers have abandoned them, for the sake of improving user experience, then I agree with that.

> “Syscalls should be fast unless there is a very good reason not to be” strikes me as a wise operating principle, which weights usability and usefulness of the operating system alongside being technically correct.

Linux already operates by this principle. We are examining a situation where best effort was not good enough to hide poor application design.


> Linux already operates by this principle. We are examining a situation where best effort was not good enough to hide poor application design.

Linux has this principle as a goal, but it's probably not checked often.

I would say this code fails the principle, independent of particular application problems.


> I would say this code fails the principle, independent of particular application problems.

For every system call you determine satisfies that principle, I could come up with an application-level algorithm that is broken because of it. The principle is aspirational; Linux makes a best effort, as all Unix systems do, not because Linux is buggy but because the principle can never be met 100% given the spec. The core issue here was not close() taking 100ms or whatever it took; the core issue was doing unbounded work on the main drawing thread, which has strict timing requirements.


They're both problems.

This slowness is approaching the point where even checking for joysticks on a dedicated thread would start having delay problems. And spawning a thread per file would be ridiculous and would get even more scorn if it was slow, "why are you spawning so many threads, of course that's not efficient".


> This slowness is approaching the point where even checking for joysticks on a dedicated thread would start having delay problems.

Poorly designed code will perform poorly. Well designed code won’t have delay problems.

> And spawning a thread per file would be ridiculous and would get even more scorn if it was slow,

Where in this entire thread was it suggested to spawn a thread per file? Threads are able to perform more than a single unit of work.


> Poorly designed code will perform poorly. Well designed code won’t have delay problems.

If I need to open and close 20 files every few seconds, and they all might have unpredictable latencies, even the best designed code in the world could have delay problems.

> Where in this entire thread was it suggested to spawn a thread per file?

You just implied that checking all the files on a dedicated thread is still 'poorly designed code', didn't you?

So if a dedicated thread for the whole group of files isn't enough, sounds like you need to move to a thread per file. Unless it's wrong to use close() at all, or something? You can only blame the code so much.


> Blocking simply means that the specification does not guarantee an upper bound on the completion time.

I don't think that's a commonly-accepted (or useful) definition of "blocking." By that definition, getpid(2) is blocking.

> I think this is an instance of confusing what should be with what is.

Who is doing the confusing? I said "should be." Are you saying they're fast now but should be slow? Why?

> The reality is that system calls aren’t “fast” and they can’t portably or dynamically be guaranteed to be fast.

This isn't a portable program; it's a Linux program. The problem isn't that close can't be portably guaranteed to complete in some time bound; it's that Linux is adding what is essentially an extra usleep(100000), with very high probability, for the devfs synthetic filesystem in Linux.

This is entirely an own-goal; Linux has historically explicitly aimed to complete system calls quickly, when that does not break other functionality. It is a bug that can be fixed, e.g., with the proposed patch(es).

POSIX does not mandate that close blocks on anything other than removing the index from the fd table -- it's even allowed to leave associated IO in-flight and silently ignore errors. It makes little sense for a synthetic filesystem without real IO to block close so grossly.


CyberRabbi's definition of blocking is correct and what I've always seen commonly accepted.

Blocking means you don't know how long it'll take, and you want to wait for it to finish. The only safe assumption is that you cannot guarantee how long it'll take.

getpid is therefore accurately described as a blocking call. You don't know how long it'll take. You can profile and make best guesses, but you can never assuredly say how long it'll take.


Every operation in a non-RTOS is blocking by this definition, even local function calls that don’t enter the kernel, because the kernel may switch to another thread at any time. It’s utterly useless as a definition. Much more common is to divide system calls into ones that depend on some external actor and ones that don’t. E.g., recv() on a socket, blocking on a futex held by some other process, or waiting on IO to some disk controller. getpid() is synchronous but does not block.


Blocking in that sense is usually used in relation to some event. E.g. sleep() blocks on a timer, read() blocks on IO, etc.

In the general sense, it means that the call has an indefinite run time. E.g. “this call blocks” = “this call could take an arbitrarily long amount of time”

getpid() is blocking but it likely does not block on IO (though it could as that is allowed by the spec).


If you call getpid, or even local functions, can the rest of your code (in a single thread) continue till getpid returns?

E.g. if you do this inside a function (useless code):

    int pid = getpid();
    std::cout << pid + 2 << std::endl;

Will the output print even if the hypothetical call to getpid takes a second?

If the answer is the print will wait, then it's a blocking call.

If it was an async call, then it could happen concurrently or in parallel, and unless you waited, it would continue on in a non blocking fashion.

Waiting for a return == blocking. It may be quick but unless the spec specifies that it must be synchronous+non-blocking, the distinction between the two is moot.


But with such an extreme definition, can you even show me what an async non-blocking syscall would look like?

Because I'm going to point at the assembly instructions that pass the parameters, and say "an interrupt happens here, delaying it for 1 second".

Any definition of blocking that includes "int fifty() {return 50;}" strikes me as having problems.

More specifically, I'd say there's some amount of "kernel does a thing" that needs to be excusable when you're talking about whether a syscall is blocking or not, otherwise everything is blocking.

Unless we want to say that 'nonblocking' is fake on non-RTOS systems, and not even try to define the term in that context.


There are two points that I've made a couple times that are perhaps getting lost:

1. It's about blocking your logic flow, not about how the system is actually executing it or what the machine code resolves to. If a subsequent call is blocked on a previous one, then it's blocking. Spawning an async function or creating a new thread etc can be blocking, whereas what runs on it isn't (for your current thread).

2. Being blocking or not is independent of performance. A blocking function call can be near instant, it may get inlined, it may take a year to run. Similarly an async or non blocking call can also have the same time complexity. The issue is that if the spec doesn't say it returns instantly, or you don't know for sure that it does, you can't guarantee that the blocking time will be short enough to be acceptable. So while getpid or close will almost always return instantly, it's still blocking. And if the spec doesn't say it's guaranteed, then the performance acceptability in the hot path can change.

End of the day it's all just (often pedantic) semantics to let people describe the execution nature of things so devs can make the best decisions for their performance needs.


I think you replied before I added "Unless we want to say that 'nonblocking' is fake on non-RTOS systems, and not even try to define the term in that context."

Sure, the spec doesn't give a guarantee. But let's say it's impossible to give a guarantee on Linux. Is it really the best option to give up on defining 'nonblocking' entirely? Maybe we should formulate guarantees with an escape hatch for non-RTOS hazards. If we can do that, then getpid deserves one of those conditional guarantees.

And since I'm pretty sure the intent of mentioning getpid was to talk about the code, not the documentation, I think that would make it nonblocking.

> End of the day it's all just (often pedantic) semantics to let people describe the execution nature of things so devs can make the best decisions for their performance needs.

Which is why you don't want to label everything blocking. Nobody can have a useful discussion then.

And also why it's useful to talk about the execution nature of code, even when no spec exists. You don't want to get stuck on implementation details but you shouldn't ignore implementation either.

Edit:

> Spawning an async function or creating a new thread etc can be blocking, whereas what runs on it isn't (for your current thread).

There's some value in talking about functions that way, but for a syscall in particular you need a nonblocking spawn for the syscall to be nonblocking. If that's definitionally impossible, then something bad has happened to the definitions being used.


The only reason I mentioned that spawning threads/creating an async future is blocking is because you had mentioned that async would generate blocking assembly by my definition.

And I agree, it would and therefore the definition is potentially meaningless. But pedantically it is blocking (but the functions called within it aren't to the current thread).

In a colloquial every day sense, I'd not be this pedantic. but this is a thread specifically about that pedantry.

End of the day, if I were talking colloquially, I'd only talk about expensive blocking calls as being blocking, regardless of IO when responsiveness is important. Otherwise it doesn't matter unless it's parallelizable and there are performance gains to be had.


> And I agree, it would and therefore the definition is potentially meaningless. But pedantically it is blocking (but the functions called within it aren't to the current thread).

If I was going for maximally pedantic but still useful definitions, I'd say that a "[non-]blocking syscall" is a different concept from how you'd describe running functions synchronously or asynchronously. And to elaborate, something like: Code that runs asynchronously is non-blocking, code that runs synchronously can be either blocking or non-blocking, and a syscall always has at least some synchronous code.

I like the idea of saying a syscall is non-blocking if the spec says it returns instantly. But I would add on to that, and say that if "this is not a real-time-OS" is the only reason the spec doesn't say it returns instantly, then we should call that non-blocking too. Or "non-blocking*" with a footnote that mentions RTOS issues.

You ask about getpid() taking a second. I'd say that within the model of "put those RTOS issues aside", that doesn't happen and can't happen. Just like we usually exclude unplugging the computer from our execution model, so too we exclude "linux isn't RTOS" from our execution model. getpid can't get stuck waiting on any resources, and does only trivial computation, so it will return immediately.


> I like the idea of saying a syscall is non-blocking if the spec says it returns instantly.

“instantly” is not a strong enough guarantee to call the syscall non-blocking. The caller needs to know exactly how the callee will perform in terms of run time. Most high level RTOSes spec this as saying the call will take a constant amount of time, allowing you to measure the call once during your testing and using that to estimate future runs.

Words like “fast” “slow” “instantly” are not useful in the domain of building real time systems at all. It’s about specifying a predictable run time.

Without providing any spec on the runtime of a system call, the only robust assumption is to assume it blocks indefinitely. When you assume a run time spec for a call where one is not spec’d (e.g. close()) that will inevitably result in unexpected behavior. Using calls that take unbounded time in a process that has strict time requirements is a recipe for failure. The domain of real-time interactive systems is not the same as the domain of batch processing.

> You ask about getpid() taking a second. I'd say that within the model of "put those RTOS issues aside", that doesn't happen and can't happen. Just like we usually exclude unplugging the computer from our execution model, so too we exclude "linux isn't RTOS" from our execution model. getpid can't get stuck waiting on any resources, and does only trivial computation, so it will return immediately.

This further shows that there is a fundamental misunderstanding in how POSIX systems operate. It’s very possible for getpid() to take longer than one second during normal operation because it’s stuck on a resource, and POSIX allows for that on purpose. Every entry into a system call invokes a litany of bookkeeping tasks by the kernel before returning to user space, with the exception of VDSO calls like gettimeofday(). Please see exit_to_user_mode_loop() which gets called before every syscall returns to user space to see all the potential sources of additional latency a call like getpid() may incur: https://github.com/torvalds/linux/blob/c9e6606c7fe92b50a02ce...

Again this is not by accident, this is on purpose. You’ll find a similar loop in all POSIX kernel system call entry/exit code.


Pretend I said 10 microseconds everywhere I said instantly, then. Same argument, more or less.

Anything that could make getpid take too long is outside the scope of what linux could guarantee.

But inside that scope, it's still worthwhile to distinguish between "blocking" and "nonblocking with very specific exceptions"

> It’s very possible for getpid() to take longer than one second during normal operation because it’s stuck on a resource

What resource? I did my best to look at the implementation, but the source code is complicated and scattered. I can't really process your link by itself. How often are these things causing delays?

"Being rescheduled" is already part of the model of any process, anyway. If a system call doesn't make it any more likely that my process stops compared to the baseline, then I think "nonblocking" is a reasonable term to want to use.


> What resource? I did my best to look at the implementation, but the source code is complicated and scattered. I can't really process your link by itself. How often are these things causing delays?

A signal may need to be invoked and that could cause paging to disk. The point is that the kernel is allowed to do a non-predictable amount of work on most system calls and therefore you cannot assume getpid() completes in any amount of time. If you’re building a real time interactive system, then this matters. If you’re building a system that’s allowed to be non-responsive (for running batch processes, network servers) then it doesn’t.


People are going to keep using non-realtime systems to run soft realtime UIs.

We can't make them stop, so it's still important to distinguish between "this syscall might hit a signal or an interrupt, just like every single line of code in the program" and "this syscall might hit a signal or an interrupt, but also it might get stuck waiting on a resource in a way that couldn't have otherwise happened".

If you want to suggest different terms from "nonblocking" and "blocking" I'm open to change. But in the absence of better terms, I'm going to keep using those, with an asterisk that says I'm inside linux and literally anything could technically block.


> People are going to keep using non-realtime systems to run soft realtime UIs.

Very true, and if they want their applications to work well they should write their applications correctly!


The best way to help them write applications correctly is not to say "all syscalls are blocking, none are nonblocking, no other categories".


There are categories, some system calls block on timers, some block on disk io, some block on network io. But they all block, except for gettimeofday() and friends.


I mean I wouldn't say gettimeofday is significantly better than getpid because your thread might switch out anyway. But sure five categories is fine, I just dislike lumping almost everything together.


I'd say that the commonly accepted definition for a blocking call is one that may depend on I/O to complete, releasing control of the CPU core while waiting.

By that definition, getpid() is definitely nonblocking, though it doesn't have an upper bound in execution time. POSIX does not offer hard realtime guarantees.

close() in general would probably be blocking (as a filesystem may need to do I/O), but I'd expect it to behave nonblocking in most cases, especially when operating on virtual files opened read-only. Unfortunately, I don't think those kinds of behavioral details are documented.


A function that sleeps for 5 seconds is blocking. No IO involved.

Blocking just means that you're blocking your current code till you return out of the called function.

Anything else regarding a function call is an assumption unless you know the exact implementation.


> I don't think that's a commonly-accepted (or useful) definition of "blocking." By that definition, getpid(2) is blocking.

When it comes to expecting a specific duration, getpid() is blocking. If you run getpid() in a tight loop and then have performance issues you can’t reasonably blame the system.

> This isn't a portable program; it's a Linux program

But the interface is a portable interface

> POSIX does not mandate that close blocks on anything other than removing the index from the fd table

And what if the fd-table is a very large hash table with high collision rate? How do you then specify how quickly close() should complete? 1ms/open fd? 10ms/open fd? Etc.

It should be clear that the problem here is that the author of the code had a faulty understanding of the system in which their code runs. Today the issue was that close() just happened to be too “slow.” If the number of input devices were higher, let’s say 2x more, then the same issue would have manifested even if close() were 2x “faster.” No matter how fast you make close(), there is a situation in which this issue would manifest itself. I.e. the application has a design flaw.


> Today the issue was that close() just happened to be too “slow.” If the number of input devices were higher, let’s say 2x more, then the same issue would have manifested even if close() were 2x “faster.” No matter how fast you make close(), there is a situation in which this issue would manifest itself.

Close, on an fd for which no asynchronous IO has occurred, should be 10000x faster, or more. It’s unlikely a user will have even 100 real input devices. I agree the algorithm leaves something to be desired, but the only reason it is user-visible is the performance bug in Linux.

I’ve worked on performance in both userspace and the kernel and I think you’re fundamentally way off-base in a way we’ll never reconcile.


> I agree the algorithm leaves something to be desired, but the only reason it is user-visible is the performance bug in Linux.

The only reason it wasn’t user-visible was luck. Robust applications don’t depend on luck.

Something tells me you’ll think twice before calling close() in a time-sensitive context in your future performance engineering endeavors. That’s because both you and I now know that no implementation of POSIX makes any guarantee on the runtime of close() nor will likely do so in the future. That’s just reality kicking in. Welcome to the club :)


There's no guarantee for the runtime of any function. It's perfectly valid for the OS to swap your program instructions to disk, and then take seconds or even minutes to load it back.

It's effectively impossible to avoid depending on what you call "luck". The OS does not provide nearly enough guarantees to build useful interactive applications without also depending on other reasonable performance expectations.


> It's perfectly valid for the OS to swap your program instructions to disk, and then take seconds or even minutes to load it back.

It’s not valid to swap your program instructions to disk if you call mlock() on your executable pages. Indeed, performance sensitive applications do just that. https://man7.org/linux/man-pages/man2/mlock.2.html

> It's effectively impossible to avoid depending on what you call "luck". The OS does not provide nearly enough guarantees to build useful interactive applications without also depending on other reasonable performance expectations.

This is all self-evidently false. You likely wrote your comment on a POSIX-based interactive application. It just takes knowledge of how the system works and what the specifications are. Well-designed programs are hard to come by but they do exist.


Does mlock itself have a guaranteed maximum execution time? Is it guaranteed to return success under the relevant conditions? While that is an excellent way to address the problem I mentioned, you still have to depend on more than just the guaranteed behaviour of the OS.

> You likely wrote your comment on a POSIX-based interactive application. It just takes knowledge of how the system works and what the specifications are.

I wrote my comment on an interactive POSIX application, yes, but I believe my browser depends on "reasonable performance" of OS-provided functions in order to be usable.

It would be a fun exercise to evaluate such a program that supposedly did not. For any given program, I suspect I could patch the Linux kernel in such a way that the kernel still fulfilled all guaranteed behaviour while still making the program unusable.


I agree the application should not have done this. On the other hand, I also agree that indefinite block time is not a useful definition despite being correct in theory; perhaps a more pragmatic one would be a percentile bound per unit of time or compute, so that a consistent 100ms close() call that is provably a bug won't get lost in the definition.


The machine is not running POSIX, it's running Linux which is POSIX-ey, and an RTOS does not guarantee that system calls do not block. The insistence on only referring to POSIX was what caused the O_PONIES debate in the first place.

If one assumes that "there is no upper bound on the completion time", then that also means assuming that a poll/read/write will never return within the lifetime of the machine as it could block for that long (maybe you're using this computer: https://www.youtube.com/watch?v=nm0POwEtiqE), and so it is impossible to implement a functioning, responsive application, much less a game.

In the real-world you need to make slightly more reasonable assumptions. And, again, when interacting with device files you must refer to the kernel documentation rather than POSIX, as POSIX does not describe how these files work in any meaningful way or form.


> poll/read/write

The “non-blocking” nature of those calls was invented for network servers, not for video games. Not only is jitter tolerable there but high latency is allowed from the lowest layers of the stack. It’s not uncommon to simply get no response from a network request.

A video game should never ever do arbitrary system calls on its main drawing thread unless those system calls are specifically intended for that use case. Jitter is not tolerable in this use case since the timing requirements are so strict. The code must produce a frame every 16.6ms, no exceptions. The interface must never become unresponsive.

> RTOS does not guarantee that system calls do not block

RTOSes do indeed provide upper bounds for all calls.

> And, again, when interacting with device files you must refer to the kernel documentation rather than POSIX

Yes that would be a relevant point if it were the case that the kernel documentation for these devices specified that close() should complete within some time bound.


Very similar to people using Node's process.env in hot sections of code and then not understanding what's happening.

https://github.com/nodejs/node/issues/3104

When you call out to the system or libc, things are going to happen, and you should try to be aware of what those are.


Sorry... what? Why the hell was an application using env() to carry application state?!

The environment list is created at init, it's literally placed right behind the C argument list as an array -- AUXV if you want to go read the ABI Specification for it.

Therefore, anything you grab using getenv() can be considered static (barring use of setenv), so the proper and correct thing to do is shove the things you need into variables at init. Unless you yourself are editing it, but even then you should still use a variable, because variables are typed and getenv is not (think of storing port information, where you have to format it into a string to get it into the environment, and then parse it back out of a string). For things like $HOME, those only ever change once, and you should really have a list of those that you check, because you will want to check XDG_HOME_DIR and a few other areas. So you will want those in a list anyway; might as well do it at creation time when the data is fresh.

Anything you set with setenv() only alters your own environment state, and that will carry down to newly created children at creation time. So the only reason I can think of why anyone would do this would be to communicate data to child processes. Except there are so, so many better and non-stringly-typed ways to do this, including global variables. Child processes inherit copies(?) of their parent's state, you can just use that, so there is literally NO reason ever to do this.


… unless you intend to exec after forking


Sure, but just use execve and it's a damn sight safer, because you know exactly the state of your child's environment. You can see this in the CERT C coding guidelines: https://wiki.sei.cmu.edu/confluence/display/c/ENV03-C.+Sanit...

ENV02-C comes into effect as well, if your program is invoked with

    SOME_INTERNAL_VARIABLE=1 PORT=2000 ./prog
then you try to invoke your child with:

    setenv("SOME_INTERNAL_VARIABLE", "2", 1);
    (fork blah blah)


u/CyberRabbi is absolutely correct. It's true that for _some_ kinds of devices you could expect fast close(2) IF the device documents that. But as you can see, implementing this can be hard even for devices where you'd think close(2) has to be fast. Even a tmpfs might have trouble making close(2) fast due to concurrency issues.

The correct thing to do when you don't care about the result of close(2) is to call it in a worker thread. Ideally there would be async system calls for everything including closing open resources. Ideally there would be only async system calls except for a "wait for next event on this(these) handle(s)" system call.



