
You're right. Looking at my actual code, I instead store the accepted connection to be yielded the next time you call accept, and only cancel an accept call if you drop the entire listener object mid-accept.

The solution proposed in this post doesn't work, though: if the accept completes before the SQE for the cancellation is submitted, the FD will still be leaked. io-uring's async cancellation mechanism is just an optimization opportunity and doesn't synchronize anything, so it can't be relied on for correctness here. My library could have submitted a cancellation when the future drops as such an optimization, but couldn't have relied on it to ensure the accept does not complete.


> You're right. Looking at my actual code, I instead store the accepted connection to be yielded the next time you call accept, and only cancel an accept call if you drop the entire listener object mid-accept.

This is still a suboptimal solution: you've accepted a connection, informing the client side of this, and then killed it, rather than never accepting it in the first place. (Worth noting that Linux (presumably as an optimisation) accepts connections before you call accept anyway, so maybe this entire point is moot and we just have to live with this weird behaviour.)

Now it's true that "never accepting it in the first place" might not be possible with io_uring in some cases, but rather than hiding that under drop, the code should be up front about it and prevent dropping (not currently possible in Rust) in a situation where there might be uncompleted in-flight requests, before you've explicitly made a decision between "oh okay then, let's handle this one last request" and "I don't care, just hang up".


If you want the language to encode a liveness guarantee, i.e. that you do something meaningful in response to an accept rather than just accept and close, you do need linear types. I don't know of any mainstream language that encodes that guarantee in its type system, whatever IO mechanism it uses.

This all feels like the abstraction level is wrong. If I think of a server as doing various tasks, one of which is to periodically pull an accepted connection off the listening socket, and I cancel that task, then, sure, the results are awkward at best and possibly wrong.

But I’ve written TCP servers and little frameworks, asynchronously, and this whole model seems wrong. There’s a listening socket, a piece of code that accepts connections, and a backpressure mechanism, and that entire thing operates as a unit. There is no cancellable entity that accepts sockets but doesn’t also own the listening socket.

Or one can look at this another way: after all the abstractions and libraries are peeled back, the example in the OP is setting a timeout and canceling an accept when the timeout fires. That's rather bizarre: surely the actual desired behavior is to keep listening (and accepting when appropriate) and do the other timed work concurrently.

It just so happens that, at the syscall level, a nonblocking (polled, selected, epolled, or even just called at intervals) accept that hasn’t completed is a no-op, so canceling it doesn’t do anything, and the example code works. But it would fail in a threaded, blocking model, it would fail in an inetd-like design, and it fails with io_uring. And I really have trouble seeing linear types as the solution — the whole structure is IMO wrong.

(Okay, maybe a more correct structure would have you “await connection_available()” and then “pop a connection”, and “pop a connection” would not be async. And maybe a linear type system would prevent one from being daft, successfully popping a connection, and then dropping it by accident.)
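Concretely, that shape is just the classic epoll loop over a non-blocking listener: the wait is the only thing you ever cancel, and the pop never blocks. A rough sketch, with a placeholder port and minimal error handling, not anyone's real code:

    /* "await connection_available(), then pop": epoll supplies the awaitable
       readiness, and a non-blocking accept4 is the pop. */
    #define _GNU_SOURCE
    #include <arpa/inet.h>
    #include <errno.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        int lfd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
        struct sockaddr_in addr = { .sin_family = AF_INET,
                                    .sin_port = htons(8080),
                                    .sin_addr.s_addr = htonl(INADDR_ANY) };
        bind(lfd, (struct sockaddr *)&addr, sizeof addr);
        listen(lfd, 128);

        int ep = epoll_create1(0);
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = lfd };
        epoll_ctl(ep, EPOLL_CTL_ADD, lfd, &ev);

        for (;;) {
            /* "await connection_available()": cancelling this wait is harmless,
               because nothing has been claimed yet. */
            struct epoll_event out;
            if (epoll_wait(ep, &out, 1, -1) <= 0)
                continue;

            /* "pop a connection": completes immediately or fails; there is no
               in-flight operation left dangling to cancel. */
            int cfd = accept4(lfd, NULL, NULL, SOCK_NONBLOCK);
            if (cfd < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
                continue;  /* readiness was spurious or already consumed */

            printf("accepted fd %d\n", cfd);
            close(cfd);    /* a real server would hand this off to a handler */
        }
    }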


> maybe a more correct structure would have you “await connection_available()” and then “pop a connection”

This is the age-old distinction between a proactor and reactor async design. You can normally implement one abstraction on top of the other, but the conversion is sometimes leaky. It happens that the underlying OS "accept" facility is reactive and it doesn't map well to a pure async accept.


I’m not sure I agree. accept() pops from a queue. You can wait-and-pop or you can pop-or-fail. I guess the former fits in a proactor model and the latter fits in a reactor model, but I think that distinction misses the point a bit. Accepting sockets works fine in either model.
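In syscall terms those two flavours are just a blocking accept() versus an accept() on a non-blocking listener that fails with EAGAIN. A tiny sketch (wait_and_pop and pop_or_fail are made-up names; lfd is an already-listening socket whose blocking mode is set elsewhere):

    #include <errno.h>
    #include <sys/socket.h>

    /* wait-and-pop: blocks until a queued connection exists, then claims it
       (lfd in blocking mode). */
    int wait_and_pop(int lfd) {
        return accept(lfd, NULL, NULL);
    }

    /* pop-or-fail: claims a queued connection if one exists, otherwise fails
       immediately with EAGAIN/EWOULDBLOCK (lfd with O_NONBLOCK set). */
    int pop_or_fail(int lfd) {
        int cfd = accept(lfd, NULL, NULL);
        if (cfd < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return -1;   /* queue was empty, nothing claimed */
        return cfd;
    }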

It breaks down in a context where you do an accept that can be canceled and you don’t handle it intelligently. In a system where cancellation is synchronous enough that values won’t just disappear into oblivion, one could arrange for a canceled accept that succeeded to put the accepted socket on a queue associated with the listening socket, fine. But, in general, the operation “wait for a new connection and irreversibly claim it as mine” IMO just shouldn’t be done in a cancellable context, regardless of whether it’s a “reactor” or a “proactor”. The whole “select and, as one option, irrevocably claim a new connection” code path in the OP seems suspect to me, and the fact that it seems to work under epoll doesn’t really redeem it in my book.


This is a simple problem I have met and dealt with before.

The issue is the lack of synchronization around cancellation, combined with not handling cancel failure.

All cancellations can fail, because there is always a race where the operation completes just as you call cancel().

You have two options: synchronous cancel (block until you know whether the cancel succeeded) or async cancel (a callback or other notification).

This code simply handles the race incorrectly, no need to think too hard about this.

It may be that some io_uring operations cannot be cancelled; that is a Linux limitation. I've also seen that there is no async way to close sockets, which is another issue.


TCP connections aren’t correct representations of the liveness of sessions. The incorrectness is acute when it’s mobile browsers connecting over LTE to load balanced web servers. That’s why everyone reinvents a session idea on top of the network.

> Worth noting that Linux (presumably as an optimisation) accepts connections before you call accept anyway, so maybe this entire point is moot and we just have to live with this weird behaviour.

listen(2) takes a backlog parameter that is the number of queued (which I think means ack'd) but not yet popped (i.e. accept'ed) connections.
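A small self-contained demo of that behaviour (loopback, ephemeral port): the client's connect() succeeds before the server ever calls accept(), because the kernel completes the handshake and parks the connection in the backlog queue.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        int lfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = { .sin_family = AF_INET,
                                    .sin_addr.s_addr = htonl(INADDR_LOOPBACK) };
        bind(lfd, (struct sockaddr *)&addr, sizeof addr);   /* port 0: ephemeral */
        listen(lfd, 4);                                     /* backlog of 4 */

        socklen_t len = sizeof addr;
        getsockname(lfd, (struct sockaddr *)&addr, &len);   /* learn the port */

        /* No accept() has been issued yet, but connect() still succeeds: the
           connection is fully established and sitting in the backlog queue. */
        int cfd = socket(AF_INET, SOCK_STREAM, 0);
        if (connect(cfd, (struct sockaddr *)&addr, sizeof addr) == 0)
            printf("client connected before any accept() was called\n");

        /* Only now does the server pop the already-established connection. */
        int sfd = accept(lfd, NULL, NULL);
        printf("server popped fd %d\n", sfd);

        close(sfd); close(cfd); close(lfd);
    }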


> if the accept completes before the SQE for the cancellation is submitted, the FD will still be leaked.

If the accept completes before the cancel SQE is submitted, the cancel operation will fail and the runtime will have a chance to poll the CQE in place and close the fd.
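A rough liburing sketch of that recovery path (not any runtime's actual code; ACCEPT_UD, CANCEL_UD and cancel_pending_accept are made-up names, and it assumes exactly one accept was already submitted with user_data ACCEPT_UD):

    /* Cancel an in-flight accept, then reap both CQEs so a racing completion
       cannot leak its fd. */
    #include <liburing.h>
    #include <unistd.h>

    #define ACCEPT_UD ((void *)1)   /* made-up user_data tag for the accept SQE */
    #define CANCEL_UD ((void *)2)   /* made-up user_data tag for the cancel SQE */

    static void cancel_pending_accept(struct io_uring *ring) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
        io_uring_prep_cancel(sqe, ACCEPT_UD, 0);
        io_uring_sqe_set_data(sqe, CANCEL_UD);
        io_uring_submit(ring);

        /* Drain until both the cancel result and the accept result are seen. */
        int seen = 0;
        while (seen < 2) {
            struct io_uring_cqe *cqe;
            io_uring_wait_cqe(ring, &cqe);
            if (io_uring_cqe_get_data(cqe) == CANCEL_UD) {
                /* res == 0: cancelled; -ENOENT / -EALREADY: the accept already
                   completed or is past cancelling. Keep reaping either way. */
                seen++;
            } else if (io_uring_cqe_get_data(cqe) == ACCEPT_UD) {
                if (cqe->res >= 0)
                    close(cqe->res);   /* accept won the race: close the fd
                                          instead of leaking it */
                /* res == -ECANCELED means the cancel won; nothing to clean up */
                seen++;
            }
            io_uring_cqe_seen(ring, cqe);
        }
    }

Whichever side wins the race, the accept's own CQE is still delivered, so as long as something keeps reaping completions the fd has a place to be closed.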


Hmm, because the cancel CQE will identify (via user_data) the operation it was supposed to cancel? Yes, that could work.


