
Link to the standard:

http://pubs.opengroup.org/onlinepubs/000095399/functions/clo...

Some discussion that is related to this:

http://lwn.net/Articles/365294/

http://stackoverflow.com/questions/7297001/if-close2-fails-w...

Something which I would consider to be "the answer" from Linus for Linux:

http://linux.derkeiler.com/Mailing-Lists/Kernel/2005-09/3000...

http://linux.derkeiler.com/Mailing-Lists/Kernel/2005-09/3171...

http://linux.derkeiler.com/Mailing-Lists/Kernel/2005-09/3202...

(Apparently, the first thing Linux does is deallocate the file descriptor; /then/ it starts flushing pending written data. If this process is interrupted, it will return EINTR, but the file descriptor itself is already deallocated, and may have been reused long before close() returned.)

Honestly, after reviewing all of this material, I think the author of this post might simply not know what "unspecified" means (or, alternatively, is making an abstract problem sound overly concrete and blowing it somewhat out of proportion).

When a specification says "unspecified", it doesn't mean "unknown at runtime, could be anything nondeterministically". It only means that the standard did not specify the result.

Often, this is because some historical artifact or reasonable practical constraint, posed by one or more vendors implementing the standard, meant that different implementations simply /weren't/ going to agree on any one value.

However, for any given implementation, there very likely is a correct answer. In the case of Linux, you should not retry calls to close(): the primary maintainer has told us that the file descriptor is deterministically deallocated even if the function returns EINTR (critical edit: the wording of this sentence was incorrect in the original draft).

That said, there may be another implementation out there, possibly one we may use (but, with a standard like this, it could easily be one that we have never heard of and that already died a death of obscurity years ago), that has the opposite behavior: one where EINTR interrupts something (such as pending writes) before the file descriptor is closed, requiring developers to retry.

It is, however, unfortunate that this happened, and the author is right about issues for developers who attempt to use this API "without assuming more than the standard specified".

That said, there may be other options, such as using SA_RESTART, as argued in this discussion I just found (which also happens to talk about the meaning of "unspecified" ;P):

http://permalink.gmane.org/gmane.os.hurd.devel.readers/265

(edit:) Oh, and people may find it interesting that standards like this are often modified over time, with "interpretations" and outright edits. As an example, here is some vaguely related discussion regarding closing file descriptors during process termination:

http://austingroupbugs.net/view.php?id=498

(edit:) Some more existing discussion:

https://sites.google.com/site/michaelsafyan/software-enginee...

http://lists.freebsd.org/pipermail/freebsd-threads/2010-Augu...

http://unix.derkeiler.com/Newsgroups/comp.unix.programmer/20...

http://www.cpptalk.net/image-vp121851.html

http://programming.itags.org/unix-linux-programming/79468/

Another (very long) Linux-specific thread involving some more responses from Linus:

http://lkml.indiana.edu/hypermail/linux/kernel/0207.2/0391.h...

http://lkml.indiana.edu/hypermail/linux/kernel/0207.2/0409.h...

http://lkml.indiana.edu/hypermail/linux/kernel/0207.2/1173.h...

In this thread we also find someone else claiming the standard is "broken":

http://lkml.indiana.edu/hypermail/linux/kernel/0207.2/0413.h...

With a possibly unhelpful response that at least demonstrates that the examples from the specification did not retry:

http://lkml.indiana.edu/hypermail/linux/kernel/0207.2/0395.h...




When a specification says "unspecified", it doesn't mean "unknown at runtime, could be anything nondeterministically". It only means that the standard did not specify the result.

Sure. But if you want to write portable code, this means you can't assume any particular behaviour.

If you never want to run on anything except recent versions of Linux, life is much easier for you. But some of us don't like being so limited.


Sure. Or just don't close() from a thread where you want to handle signals (without terminating, that is). Signals are the broken bits, frankly, not the POSIX standard. They have never played well with system calls. There's an essay somewhere I remember reading where an ITS hacker looks at Unix for how it handles the interrupted system call problem and comes away horrified at the discovery that it makes the user do it via EINTR.

There's no good reason to be catching asynchronous signals in modern code[1]. See signalfd() et al.

[1] Well, except for things that can only be delivered synchronously like SIGSEGV for user-handled paging. But that's complicated enough that the extra complexity of handling a SIGSEGV delivered out of a syscall is probably tolerable.


> There's an essay somewhere I remember reading where an ITS hacker looks at Unix for how it handles the interrupted system call problem and comes away horrified at the discovery that it makes the user do it via EINTR.

I think you're talking about Worse is Better: http://www.jwz.org/doc/worse-is-better.html

> There's no good reason to be catching asynchronous signals in modern code

In libraries you can never know what the main program will do, and who wants to write a library that says "this library will malfunction if you use OS feature X"?

> See signalfd() et. al.

That's very cool, I had not seen that! But it's Linux-specific.


> There's an essay somewhere I remember reading where an ITS hacker looks at Unix for how it handles the interrupted system call problem and comes away horrified at the discovery that it makes the user do it via EINTR.

Not really the real point of the essay, but yeah, there kind of is. Now come up with a design (with similar premises; you can't just argue that the whole system-call semantics has to change...) which does not involve the user, and we will talk.


Given that SA_RESTART already manages to perform that restart without the user's intervention, it is obviously pretty simple to come up with such a design: you just make SA_RESTART the default behavior. ;P


No reason, huh? I use signals to interrupt threads that block on open() and waitpid(). The only other alternative is sleep-polling, which really sucks.


I said don't catch signals, not don't use them. See signalfd() et al.

Though I'm curious why you want to interrupt those syscalls. In particular, why are you waiting on a process that you don't know is already dead? That's what SIGCHLD is for. And the only case I can think of for a blocking open() is a network- or FUSE-mounted file, in which case the filesystem will surely eventually return an error (and if it doesn't, no amount of voodoo in your application logic is going to correct the bugs in the underlying system).


signalfd() doesn't exist. I'm sorry that the manpage isn't really clear about that.

(It's a Linux-specific extension.)


What about other operating systems like FreeBSD, Solaris and OS X? I have to support many platforms and my code is heavily multithreaded.


FreeBSD: It seems pretty clear in FreeBSD that you should not retry the call to close(); sys_close() calls kern_close() which locks the file, frees the file descriptor (!), and /then/ calls closef() (which, if capable of returning EINTR, is already too late).

http://lists.freebsd.org/pipermail/freebsd-threads/2010-Augu...

http://fxr.watson.org/fxr/source/kern/kern_descrip.c

Mac OS X: Here you should not retry, but you should also be careful; depending on whether you are building under UNIX2003 conformance, you will get different behavior from the close() function in libSystem, which directs you to either the syscall close() or close_nocancel().

If you end up with close(), then the first thing that happens is that __pthread_testcancel(1) is called, and if the thread is cancelled, it will return EINTR before doing anything else: in this case, you would need to retry.

However, I think close_nocancel(), which calls closef_locked(), might be capable of returning EINTR, which will be held and only returned from close_internal_locked() after _fdrelse() has already removed the descriptor.

So, if EINTR can indeed be returned from the closef_locked() call, you would need to /not/ retry, which means the close() version of this call is impossible to use safely on Mac OS X: if I were you, I'd avoid it in favor of close_nocancel() (calling it explicitly if warranted).

http://opensource.apple.com/source/xnu/xnu-1699.24.8/bsd/ker...

Solaris: Unfortunately, I do not know enough about Solaris to provide any useful commentary. :(


Notice how long and complex your discussion of this issue is, and how many different systems you have to investigate in depth to even begin to form a coherent picture.

All we're trying to do is close a file safely. This is core functionality that virtually every application will have to depend on. It would be like free() saying it might be interrupted by a signal, and you have no way of knowing if the memory has actually been deallocated or not. That would be insane.


I believe FreeBSD will never return EINTR for close, but I wouldn't bet my life on it. I've heard rumours that Solaris can return EINTR, but likewise I make no guarantees. I have no idea what OS X does.


If FreeBSD is like OpenBSD, you may get EINTR on NFS, but the file is still dead.


Have you ever tried to write portable code for multiple unixes?


Yes. Afterwards, I learned more about standards and specifications (including taking on a weird fetish for lurking on the mailing lists where people are actively working on them), and realized that my attempts were flawed by premise. ;P

Seriously, POSIX is /not/ able to give you a single program that always works on every system: they tried really hard, but the world isn't perfect. They were (and are) attempting to unify and control something really complex, and they made remarkable progress given how many people they were trying to bring together who already had existing, incompatible implementations.

In this case, they carefully made clear that this behavior was unspecified. That does not mean that they failed or that their specification was broken. In fact, I'd argue the opposite: if they had required implementations to do something specific, the specification would be broken, as it would not have described reality. You can't just claim that the implementations of your standard that people are actively trying to code against don't exist or are incorrect.

Honestly, though, to take a more direct appraisal of your question: the real epiphany for me came when someone clubbed me over the head with the difference between "portable" and "ported", and then demonstrated that all of the people who had come before me, whose work I most admired, had concentrated on writing "portable" code as opposed to "ported" code. The most amazing code I've ever seen is the code that has managed to be easily adapted to changing environments, as it had the simplest design and the most powerful abstractions from the underlying systems, often as a direct result of attempting to embrace so many unrelated platforms.

Which then leads to a "better" question: have you ever tried to write code that could easily be ported between multiple operating systems, whether they be any of the numerous implementations of Unix (old or new, BSD or System V, largely compliant or downright buggy), Windows (using native APIs, not compatibility wrappers), or Mac OS (9, not X)?

If not, I recommend trying it, as that is what "portability" really is: once you experience it for your own code, it is difficult to take projects that insist on only working on a single homogeneous set of environments seriously anymore.


> I think the author of this post simply doesn't know what "unspecified" means.

I'm pretty confident Colin Percival knows what "unspecified" means. I suspect he also knows how it's different from "implementation-defined" (also pointed out in the thread you link to), and how it means that, officially, we don't even have a way to determine what an implementation does, because they don't have to document their choice.

SA_RESTART is also not a guaranteed solution for someone trying to be standards-compliant, since it's part of the XSI extensions, which may not be present or otherwise required for the application.

For such core functionality, this is a dangerous bit of ambiguity.


I claim the issue here is that this standard is not evil/epic magic. It did not and could not solve all possible portability issues between different implementations.

After continuing to research this issue (as I love looking at this kind of stuff: understanding more about the history and complexity of implementations excites me), I finally hit jackpot.

http://programming.itags.org/unix-linux-programming/79468/

HP-UX 11.22: """[EINTR] An attempt to close a slow device or connection or file with pending aio requests was interrupted by a signal. The file descriptor still points to an open device or connection or file."""

AIX 5.3: """If the close subroutine is interrupted by a signal that is caught, it returns a value of -1, the errno global variable is set to EINTR and the state of the FileDescriptor parameter is closed."""

So, here we have two Unix implementations--both of which predated POSIX--that have incompatible definitions of close(). I do not feel it is reasonable to demand POSIX solve this, and sure enough: it didn't.

Now, maybe that makes POSIX "useless" (if what you care about is being able to write code without knowing anything about the target), but I don't think it is fair to claim that the definition of close is "broken".

Instead, it clearly states that the behavior is "unspecified", which is a bothersome yet practically acceptable tradeoff. Some things in life are simply unspecified. :(


> I do not feel it is reasonable to demand POSIX solve this

I think it would be entirely reasonable to demand that POSIX resolve this. But I've always come down on the side of prescriptive standards rather than merely descriptive standards.


Interesting, I didn't know anything actually refused to close the file. HP-SUX sucks again.

From one standpoint, I can see that this would even be the desired behavior, because it gives the app the opportunity to deal with the error in some meaningful way, like finishing the flush. But since everybody else has long since decided that if you care, you flush and then close, being different is just aggravating. Although it sounds like flush-then-close will work everywhere.


I think it's AIX that's truly broken in this case, returning EINTR even though it did actually close the descriptor.

If every implementation returned success when it had closed the descriptor and EINTR only when it hadn't, this would be easy.


I think you're missing the broader point. We don't actually have a problem with the close() behavior differing between implementations, we have a problem because there is no way to determine the correct course of action afterward.

For HP-UX, it's "close it again".

For AIX, it's apparently "everything is fine, don't close it again". Linux, too.

That could be OK, even when undocumented, IF we had a way to discover what had occurred. But we don't. close() isn't broken because it can behave in multiple ways that can't be predicted in advance; it's broken because there's no safe way to proceed.

And by the way, we do know something about the target when we're writing POSIX-compliant code. We know the target is POSIX-compliant. That's the whole point of POSIX.


That is totally fair: even something as simple as a #define available inside a header file could have allowed this to be discovered from the source code, which would not only have avoided breaking any existing code, but would also have spared existing implementations any unreasonable burdens.

I just have a difficult time with the rather strong terminology that the definition is broken: the definition is consistent, reasonable, and even somewhat workable... it is just not terribly useful, satisfying, or comforting; if anything, I'd say "incomplete".

(Although, given that it chose "unspecified" rather than "implementation-defined", one has to wonder whether some existing system actually did something horrific in this case and left non-deterministic behavior. I guess there could always be a constant for "you're screwed", though. ;P)


Sure, there's a way to tell. There are lots of ways of telling which platform you're on (#defines), and platforms can document the expected behavior. If a platform doesn't, then complain to the vendor.

Yes, it sucks to resort to conditional compilation around "close", but it's already been explained why it's necessary. POSIX was not created in a vacuum as a theoretically ideal OS specification.


> SA_RESTART is also not a guaranteed solution for someone trying to be standards-compliant, since it's part of the XSI extensions, which may not be present or otherwise required for the application.

I need to backtrack on this. I just discovered SA_RESTART appears without surrounding XSI tags in the 2008 (Issue 7) standard. This was not the case in the 2004 (Issue 6) standard.

EDIT: I had here a discussion of whether dropping the tags was intentional, but now I see I missed the key sentence in the 2008 update: "Functionality relating to the Realtime Signals Extension option is moved to the Base."

SA_RESTART is core POSIX functionality from the 2008 edition on. Of course there are many systems out there not compliant with 2008, but this does change the picture somewhat.



