Hacker News
Show HN: Moustique – C++14 coroutine-based non-blocking IO on Linux (github.com)
68 points by matt42 10 months ago | 31 comments

I haven’t developed much for Linux, but I think it’s a good idea to replace your std::vector<ctx::continuation> fibers with std::unordered_map<int, ctx::continuation> fibers.

Here’s why: https://stackoverflow.com/a/9376493/126995

That StackOverflow comment is wrong. Linux guarantees that for open(2) / creat(2) / socket(2) / etc. "[t]he file descriptor returned by a successful call will be the lowest-numbered file descriptor not currently open for the process." This has been the case for the entire 40+ year history of Unix, and was certainly relied upon in the days when select(2) could only handle file descriptor values that were less than 32.

See the man page at, e.g., http://man7.org/linux/man-pages/man2/socket.2.html from which the quote above is taken. Note that I'm not expressing any opinion on whether to use a vector or an unordered_map in this particular application.

> Linux guarantees that for open(2) / creat(2) / socket(2) / etc.

Not “etc.”: accept(2), which creates the sockets in that code, doesn’t guarantee that.

See the man page at, e.g., http://man7.org/linux/man-pages/man2/accept.2.html

Hmmm. Looking further, the dup(2) man page also mentions "lowest," but pipe(2) doesn't. A documentation oversight, I claim; after all, what would be the point of the kernel not using a single function for "get me an available fd" everywhere that it needs one? Not as reassuring as finding it in the specs, though; here's what http://pubs.opengroup.org/onlinepubs/9699919799/ has to say:

"2.14. File Descriptor Allocation

All functions that open one or more file descriptors shall, unless specified otherwise, atomically allocate the lowest numbered available (that is, not already open in the calling process) file descriptor at the time of each allocation. Where a single function allocates two file descriptors (for example, pipe() or socketpair()), the allocations may be independent and therefore applications should not expect them to have adjacent values or depend on which has the higher value."
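For anyone who wants to see the "lowest numbered available" rule in action, here is a minimal sketch. It assumes a single-threaded process on a POSIX system with a readable /dev/null; the function name is made up for illustration.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Opens /dev/null twice, closes the first descriptor, then opens it again.
// Returns true if the third open() reused the number that was just freed,
// which is what "lowest numbered available" implies when no other thread
// is opening or closing descriptors concurrently.
bool lowest_fd_is_reused() {
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);
    if (a < 0 || b < 0) {
        if (a >= 0) close(a);
        if (b >= 0) close(b);
        return false;
    }
    close(a);                              // frees the lower of the two numbers
    int c = open("/dev/null", O_RDONLY);   // should receive that number back
    bool reused = (c == a);
    if (c >= 0) close(c);
    close(b);
    return reused;
}
```

This only exercises open(2); it says nothing about what accept(2) does on a given kernel.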

> what would be the point of the kernel not using a single function for "get me an available fd" everywhere that it needs one?

To return the lowest-numbered file descriptor, calls to that single function need to be serialized. On multi-core, and especially on NUMA, I can see how that might hurt performance.

Apps don’t normally create thousands of files, or pipes, or client sockets per second. However, for some server software, accepting thousands of sockets per second is normal.

Update, see also multi-queue NICs: https://www.kernel.org/doc/Documentation/networking/scaling....

If your application is long-running and lives under a snapshot-and-restore system, you may run into this behavior. Just a guess.

Note that the Stack Overflow post does not technically apply. accept() finds the smallest available file descriptor; it will never suddenly hand you a file descriptor with a huge integer value.

You can also control this with the "max open file" ulimit setting. This is by default often 1024 - meaning your process can have max 1024 open file descriptors, and the max value of a descriptor will be 1023.

Even if you need to support one or two orders of magnitude more file descriptors, a vector would likely be more efficient due to better memory locality.
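The limit mentioned above can also be read from inside the process. A small sketch using getrlimit(2) with RLIMIT_NOFILE; the helper name is hypothetical.

```cpp
#include <cassert>
#include <sys/resource.h>

// Stores the soft limit on open file descriptors (the value `ulimit -n`
// reports) in `out` and returns true, or returns false if the query fails.
// Descriptor numbers handed to the process stay below this limit.
bool soft_fd_limit(rlim_t& out) {
    struct rlimit lim{};
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        return false;
    }
    out = lim.rlim_cur;
    return true;
}
```

Be careful about pre-sizing a container from this value: on some systems the soft limit is very large or RLIM_INFINITY, so growing on demand is probably the safer default.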

> a vector would likely be more efficient due to better memory locality

Vector’s memory locality only matters when you’re iterating over the elements. For this case it’s essentially a random index.

Besides, unordered_map memory locality is fixable: https://github.com/Const-me/CollectionMicrobench

I'll try switching to unordered_map if it has a negligible impact on performance.
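To make the two options in this sub-thread concrete, here is a rough sketch of both containers keyed by descriptor value. `Fiber` is a placeholder for the real per-connection state (the thread mentions ctx::continuation), and both class names are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Placeholder for the per-connection state; hypothetical.
struct Fiber {
    std::string name;
};

// Option 1: a vector indexed directly by fd, grown on demand.
// Memory use is proportional to the largest fd seen so far.
// Note: the returned reference is invalidated if a later call resizes.
class VectorFiberTable {
public:
    Fiber& at(int fd) {
        assert(fd >= 0);
        const std::size_t idx = static_cast<std::size_t>(fd);
        if (idx >= slots_.size()) {
            slots_.resize(idx + 1);
        }
        return slots_[idx];
    }
    std::size_t slot_count() const { return slots_.size(); }

private:
    std::vector<Fiber> slots_;
};

// Option 2: a hash map keyed by fd.
// Memory use is proportional to the number of live entries,
// regardless of how large the fd values are.
class MapFiberTable {
public:
    Fiber& at(int fd) { return slots_[fd]; }
    void erase(int fd) { slots_.erase(fd); }
    std::size_t size() const { return slots_.size(); }

private:
    std::unordered_map<int, Fiber> slots_;
};
```

Which one is faster here would have to be measured; the sketch only shows the difference in how each reacts to a large fd value.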

Please be fair in the readme and note that a comparison to ASIO is a bit apples to oranges. This one seems to be mostly a learning project, and is heavily work in progress.

Boost.Asio already works with Boost coroutines in a reasonable fashion, and that would certainly be a good route if someone wants to build a production project on top of it.

Regarding the code and example itself:

Having a separate callback for connection close seems to be a bit backward, since it again means the connection state is spread out over multiple callbacks. The purpose of a coroutine is to have everything in scope of that routine. In that case the connection close should be signaled by reading 0 from the socket, or the read returning an error.

Removed the comparison with ASIO. But note that this is not "heavily in progress" but ready for production too.

I took your remark about the closing callback on board. read now returns 0 and write returns false when the connection is closed or lost.
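The "close is signaled by read returning 0" convention looks roughly like this when written as one loop. This is only a sketch: it uses plain blocking read(2) as a stand-in, not the library's own read callable, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <unistd.h>

// Reads from fd until the peer closes or an error occurs and returns
// everything received. All connection state stays in this one scope:
// read(2) returning 0 means end-of-stream, a negative value means error.
std::string drain_until_closed(int fd) {
    std::string received;
    char buf[256];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            received.append(buf, static_cast<std::size_t>(n));
            continue;
        }
        if (n < 0 && errno == EINTR) {
            continue;  // interrupted by a signal, retry
        }
        break;  // n == 0: closed by peer; n < 0: error. Either way, done.
    }
    return received;
}
```

With a pipe, writing "hello" to the write end and then closing it makes this function return "hello" once read(2) reports 0.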

Not super fair to say linux sockets (in the README) and I/O in the title when it's really just a TCP client - you force SOCK_STREAM and bind(2).

This is the default behaviour, but you can supply your own socket file descriptor if you want, using the second overload of moustique_listen:

template <typename G, typename H> int moustique_listen(int listen_fd, G closed_connection_handler, H data_handler);

But you're right, maybe we want two helpers, moustique_listen_tcp and moustique_listen_udp. I've put this on my to-do list.

While you're at it, please add a namespace; it'd make it a lot easier to manage in a project.

Great project by the way - I've put it on my workbench-TODO for a little hacking some time in the next few days. I've got a standard 'command processing server' project that I've been hacking on over the years, I might try to hack up a moustique_ integration, just for grins.

Note: I wrote this yesterday after I wrapped my head around coroutines. This is a work in progress (even if it's already fully working). I'll take any remarks here.

Since you asked, I suggest running your readme through a spell checker. I didn't bother looking further than that.

Not everyone has English as a first or even second language. Not bothering because of spelling seems like a good way to filter out some cool pieces of code.

Sounds like a good way to filter out a certain kind of user.

With some potentially unreadable documentation. Spoken language is not less important than programming languages (and interestingly, from my experience, there seems to be at least some correlation between caring about either).

I did not spend too much time writing the documentation because of the simplicity of the library. I thought that the TCP echo example would be enough. But I'll spend more time on the README, at least fixing the language issues :)

You've got memory leaks and resource leaks there.

Got it. I'll fix them ASAP, thanks.

This is the API:

int moustique_listen(const char* port, G closed_connection_handler, H data_handler);

int moustique_listen(int listen_fd, G closed_connection_handler, H data_handler);

This could be improved, because the port argument could easily be confused with the file descriptor argument.

E.g.: moustique_listen(80, ...) passes an int, so it selects the listen_fd overload instead of listening on port 80.
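A stripped-down illustration of that pitfall, using two hypothetical stand-in functions that only mirror the first parameter of each overload (they are not the library's code):

```cpp
#include <cassert>
#include <string>

// Stand-ins for the two overloads quoted above; each just reports
// which one overload resolution picked.
std::string listen_sketch(const char* /*port*/) { return "port overload"; }
std::string listen_sketch(int /*listen_fd*/) { return "fd overload"; }
```

Calling `listen_sketch("80")` selects the port overload, while `listen_sketch(80)` silently selects the fd overload. Distinct function names or a small wrapper type for the port would be one way to make the mistake a compile error.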

> moustique_listen

What, no love for namespaces?

I did it this way because this library contains just one function. Would you still use namespaces in this case?

It's just more C++ ish to write something::listen rather than something_listen. Namespaces are easy too. Why not?

My fear with this approach is that it allows people to write "using namespace moustique". This would cause:
- name conflicts on the listen function;
- code that is harder to read, since it's no longer obvious that listen comes from the moustique library.

I often use namespaces, but I do think they can have a bad impact on the usability of an API, especially when you have too many deeply nested namespaces and need plenty of "using namespace ..." to compensate.

Why not? (An API is an API, no matter how many functions.)

I would. Both because it's more idiomatic and because it's inevitable more functions will be added.

It looks to me like this was by design. Several items are behind a 'namespace moustique_impl'.
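For reference, the namespaced layout people are asking for could look something like this. The namespace and function names are hypothetical, and the body is a dummy that only returns a string.

```cpp
#include <cassert>
#include <string>
#include <sys/socket.h>  // declares POSIX ::listen(int, int)

namespace moustique_sketch {  // hypothetical name, not the library's

namespace impl {  // internal helpers, analogous to a moustique_impl namespace
inline std::string describe(const char* port) {
    return std::string("listening on ") + port;
}
}  // namespace impl

// Public entry point. The qualified name keeps it distinct from ::listen.
inline std::string listen(const char* port) {
    return impl::describe(port);
}

}  // namespace moustique_sketch
```

A caller writes `moustique_sketch::listen("8080")`, and the POSIX `::listen(fd, backlog)` stays available under its own name. Whether callers then reach for "using namespace" is their choice rather than something the library imposes.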
