> The real NtQueryDirectoryFile API takes 11 parameters
Curiosity got the best of me here: I had to look this up in the docs to see how a linux syscall that takes 3 parameters could possibly take 11 parameters. Spoiler alert: they are used for async callbacks, filtering by name, allowing only partial results, and the ability to progressively scan with repeated calls.
This is a recurring pattern in Windows development. Unix devs look at the Windows API and go "This syscall takes 11 parameters? GROAN." But the NT kernel is much more sophisticated and powerful than Linux, so its system calls are going to be necessarily more complicated.
Comparing VMS and UNIX side-by-side, VMS's approach to a few key areas like I/O, ASTs and tiered interrupt levels is simply more sophisticated. NT inherited all of that. It was fundamentally superior, as a kernel, to UNIX, from day 1.
I haven't met a single person that has understood NT and Linux/UNIX, and still thinks UNIX is superior as far as the kernels go. I have definitely alienated myself the more I've discovered that though, as it's such a wildly unpopular sentiment in open source land.
Cutler got a call from Gates in 89, and from 89-93, NT was built. He was 47 at the time, and was one of the lead developers of VMS, which was a rock-solid operating system.
In 93, Linus was 22 and was starting to "implement enough syscalls until bash ran" as a fun project to work on.
Cutler despised the UNIX I/O model. "Getta byte getta byte getta byte byte byte." The I/O request packet approach to I/O (and tiered interrupts) is one of the key reasons behind NT's superiority. And once you've grok'd things like APCs and structured exception handling, signals just seem absolutely ghastly in comparison.
Since we're going into the history of Windows NT, VMS and Dave Cutler, I'd like to highlight this classic book on the history of all three of the above[1]
It follows the same line of narrative as The Soul of a New Machine
I've never met a single person who understood what they were talking about and referred to "UNIX kernels". It may be true that Linux was once less advanced than NT - this is no longer the case, despite egregious design flaws in things like epoll. It has simply never been true (for example) for the Illumos (née Solaris) kernel.
Which design faults do you think epoll specifically has?
I know there are lots of file descriptors not usable with epoll or rather async i/o in general and that sucks (e.g. regular files).
For networking, I find epoll/sockets nicer to work with than Windows' IOCP, because with IOCP you need to keep your buffers around until the kernel deems your operation complete (rough sketch after the list). I think you have 3 options:
1) Design the whole application to manage buffers like IOCP likes (this propagates to client code because now e.g. they need to have their ring buffer reference-counted).
2) You handle it transparently in the socket wrapper code by using an intermediate buffer and expose a simple read()/write() interface which doesn't require the user to keep a buffer around when they don't want the socket anymore.
3) You handle it by synchronously waiting for I/O to be cancelled after using CancelIo. This sounds risky with potential to lock up the application for an unknown amount of time. It's also non-trivial because in that time IOCP will give you completion results for unrelated I/Os which you will need to buffer and process later.
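Roughly what the lifetime problem looks like (a sketch; conn_t and post_recv are just illustrative names, not a real API):

    #include <winsock2.h>

    typedef struct conn {                  /* hypothetical per-connection state */
        SOCKET        sock;
        WSAOVERLAPPED ov;
        WSABUF        wsabuf;
        char          buf[4096];           /* the kernel may write here until completion */
    } conn_t;

    void post_recv(conn_t *c)
    {
        DWORD flags = 0;
        ZeroMemory(&c->ov, sizeof(c->ov));
        c->wsabuf.buf = c->buf;
        c->wsabuf.len = sizeof(c->buf);
        /* From here until GetQueuedCompletionStatus() hands &c->ov back,
           the kernel owns c->buf -- you can't just free the connection when
           the application is done with the socket, hence options 1-3 above. */
        WSARecv(c->sock, &c->wsabuf, 1, NULL, &flags, &c->ov, NULL);
    }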
On the other hand, with Linux such issues don't exist by design, because data is only ever copied in read/write calls which return immediately (in non-blocking mode).
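For contrast, a minimal non-blocking receive on Linux (sketch, error handling trimmed; handle_data is a hypothetical application callback) -- the buffer only has to exist for the duration of the call:

    #include <errno.h>
    #include <sys/socket.h>

    void handle_data(const char *p, ssize_t n);   /* hypothetical callback */

    void try_read(int fd)                     /* fd was set O_NONBLOCK earlier */
    {
        char buf[4096];                       /* only needs to live for this call */
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n > 0)
            handle_data(buf, n);
        else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            ;                                 /* nothing there yet -- back to epoll_wait() */
    }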
The cool thing is that by gifting buffers to the kernel as policy, the kernel gets to make that choice.
In Linux, you don't, so userspace and kernelspace both end up doing unnecessary copying, and with shared buffers something might poll as ready (and so not block) but actually block by the time you get around to using the buffer.
This is annoying, and it generally means you need more than two system calls on average for every IO operation in performance servers.
As a rule, you can generally detect "design faults" by the number of competing and overlapping designs (select, poll, epoll, kevent, /dev/poll, aio, sigio, etc, etc, etc). I personally would have preferred a more fleshed out SIGIO model, but we got what we got...
One specific fault of epoll (compared to near-relatives) is that the epoll_data_t is very small. In Kqueue you can store both the file descriptor with activity (ident) as well as a few bytes of user data. As a result, people use a heap pointer which causes an extra stall to memory. Memory is so slow...
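For reference, here are the two structures, roughly as declared in the respective headers (exact field types vary a bit across BSDs):

    /* Linux, <sys/epoll.h> */
    typedef union epoll_data {
        void     *ptr;
        int       fd;
        uint32_t  u32;
        uint64_t  u64;                /* everything must fit in these 8 bytes */
    } epoll_data_t;

    struct epoll_event {
        uint32_t     events;
        epoll_data_t data;
    };

    /* BSD, <sys/event.h> */
    struct kevent {
        uintptr_t       ident;        /* the fd itself */
        short           filter;
        unsigned short  flags;
        unsigned int    fflags;
        intptr_t        data;
        void           *udata;        /* extra user data on top of the ident */
    };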
> something might poll and not-block, but actually block by the time you get around to using the buffer
On Linux if a socket is set to non-blocking it will not block. I don't really understand your point with shared buffers. You wouldn't typically share a TCP socket since that would result in unpredictable splitting/joining of data.
> In Linux, you don't, and userspace and kernelspace both end up doing unnecessary copying
I'm not so sure the Linux design where copies are done in syscalls must be inherently less efficient. I'm pretty sure with either design, you generally need at least one memcpy - for RX, from the in-kernel RX buffers to the user memory, and for TX, from user memory to the in-kernel TX buffers. I think getting rid of either copy is extremely hard and would need extremely smart hardware, especially the RX copy (because the Ethernet hardware would need to analyze the packet and figure out where the final destination of the data is!). Getting rid of TX copy might be easier but still hard because it'd need complex DMA support on the Ethernet card that could access potentially unaligned memory addresses. On the other hand, I also don't think you need more than one copy, if you design the network stack with that in mind.
> you need more than two system calls on average for every IO operation in performance servers.
True but it's not obvious that this is a performance bottleneck. Consider that a single epoll wait can return many ready sockets. I think theoretically it would hurt latency rather than throughput.
> As a rule, you can generally detect "design faults" by the number of competing and overlapping designs.
On Linux, I think for sockets, there are only: blocking, select, poll, epoll. And the latter three are just different ways to do the same thing.
On Windows, it's much more complicated - see this list of different methods to use sockets (my own answer): http://stackoverflow.com/questions/11830839/when-using-iocp-...
> One specific fault of epoll (compared to near-relatives) is that the epoll_data_t is very small.
Pretty much universally when you're dealing with a socket, you have some nontrivial amount of data associated with it that you will need to access when it's ready for I/O, typically a struct which at least holds the fd number. Naturally you put a pointer to such a struct into the epoll_data_t. I don't see how one could do it more efficiently outside of very specialized cases.
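The usual pattern being roughly this (sketch; watch_conn is an illustrative name):

    #include <sys/epoll.h>

    struct conn { int fd; /* plus buffers, parser state, ... */ };

    static void watch_conn(int epfd, struct conn *c)
    {
        struct epoll_event ev;
        ev.events   = EPOLLIN;
        ev.data.ptr = c;              /* stash the heap pointer */
        epoll_ctl(epfd, EPOLL_CTL_ADD, c->fd, &ev);
    }
    /* later: struct conn *c = events[i].data.ptr;  -- one dereference per event */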
> I'm not so sure the Linux design where copies are done in syscalls must be inherently less efficient.
Windows overlapped IO can map the user buffer directly to the network hardware, which means that in some situations there will be zero copies on outbound traffic.
> especially the RX copy [...] I also don't think you need more than one copy, if you design the network stack with that in mind.
When the interrupt occurs, the network driver is notified that the DMA hardware has written bytes into memory. On Windows, it can map those pages directly onto the virtual addresses where the user is expecting it. This is zero copies, and just involves updating the page tables.
This works because on Windows, the user space said when data comes in, fill this buffer, but on Linux the user space is still waiting on epoll/kevent/poll/select() -- it has only told the kernel what files it is interested in activity on, and hasn't yet told the kernel where to deposit the next chunk of data. That means the network driver has to copy that data onto some other place, or the DMA hardware will rewrite it on the next interrupt!
If you want to see what this looks like, I note that FreeBSD went to a lot of trouble to implement this trick using the UNIX file API[0]
> On Linux, I think for sockets, there are only: blocking, select, poll, epoll. And the latter three are just different ways to do the same thing.
Linux also supports SIGIO[1], and there are a number of aio[2] implementations for Linux.
epoll is not the same as poll: Copying data in and out of the kernel costs a lot, as can be seen by any comparison of the two, e.g. [3]
Also worth noting: Felix observes[4] SIGIO is as fast as epoll.
> I don't see how one could do it more efficiently
Dereferencing the pointer causes the CPU to stall right after the kernel has transferred control back into user space, while the memory hardware fetches the data at the pointer. This is a silly waste of time and of precious resources, considering the process is going to need the file descriptor and its user data in order to schedule the IO operation on the file descriptor.
In fact, on Linux I get more than a full percent improvement out of putting the file descriptor there, instead of the pointer, and using a static array of objects aligned for cache sharing.
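Roughly what I mean (sketch; sizes and names are illustrative, and the fds were registered with .data.fd = fd rather than a pointer):

    #include <stddef.h>
    #include <sys/epoll.h>

    #define MAX_CONNS  65536
    #define MAX_EVENTS 256

    struct conn {                              /* per-connection state, indexed by fd */
        int    state;
        size_t bytes;
    } __attribute__((aligned(64)));            /* cache-line aligned, no false sharing */

    static struct conn conns[MAX_CONNS];

    static void event_loop(int epfd)
    {
        struct epoll_event events[MAX_EVENTS];
        for (;;) {
            int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
            for (int i = 0; i < n; i++) {
                struct conn *c = &conns[events[i].data.fd];  /* direct index, no pointer chase */
                (void)c;                       /* ... do the I/O on that fd using c ... */
            }
        }
    }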
For more on this subject, you should see "what every programmer should know about memory"[4].
Thanks for the info. Yes I suppose zero-copy can be made to work but surely one needs to go through a LOT of trouble to make it work.
I'm curious about sending data for TCP though: don't you need to have the original data available anyway, in case it needs to be retransmitted? Do the overlapped TX operations (on Windows) complete only once the data has also been acked? Are you expected to do multiple overlapped operations concurrently to prevent bad performance due to waiting for ack of pending data?
Windows was designed for an era where context switches were around 80µsec (nowadays they're closer to 2µsec), so a lot of that hard work has already paid for itself.
> don't you need to have the original data available anyway, in case it needs to be retransmitted? Do the overlapped TX operations (on Windows) complete only once the the data has also been acked?
I don't know. My knowledge of Windows is almost twenty years old at this point.
If I recall correctly, the TCP driver actually makes a copy when it makes the packet checksums (since you have the cost of reading the pages anyway), but I think this behaviour is for compatibility with Winsock, and it could have used a copy-on-write page, or given a zero page in other situations.
And Solaris, (unlike Linux historically at least), supports async I/O on both files and sockets. Linux (historically) only supported it for sockets. I have no idea if Linux generally supports async I/O for files at this point.
Let me rephrase it: there is nothing on any version of UNIX that supports an asynchronous file I/O API that integrates cleanly with the file system cache -- you can do signal based asynchronous I/O, but that isn't anywhere near as elegant as having a single system call that will return immediately if the data is available, and if not, sets up an overlapped operation and still returns immediately to the caller.
They're called threads. Overlapped I/O on Windows is based on a thread pool, which is why you can't cancel and destroy your handle whenever you want--a thread from the pool might be performing the _blocking_ I/O operation, and in that critical section there's no way to wake it up to tell it to cancel. Just... like... on... Unix.
The difference between Windows Overlapped I/O and POSIX AIO is that on Windows it's a black box, so people can pretend it's magical. Whereas on Linux there hasn't been interest (AFAIK) to merge patches that provide a kernel-side pool of threads for doing I/O, and the decades-long debates have spilled out onto the streets. If you view userspace code as somehow inelegant or fundamentally slow, then of course all the blackbox Windows API and kernel components look appealing. What has held Linux back regarding AIO is that Linux (and Unix people in general) have historically preferred to keep as much in userspace as possible.
This is why NT has syscalls taking 11 parameters. In the Unix world you don't design _kernel_ APIs that way, or any APIs, generally. In the Unix world you prefer simple APIs that compose well. read/write/poll compose much better than overlapped I/O, though which model is most elegant and useful in practice is highly context-dependent. As an example, just think how you'd abstract overlapped I/O in your favorite programming language. C# exposes overlapped I/O directly in the language, but doing so required committing to very specific constructs in the language.
As for performance, both Linux and FreeBSD support zero-copy into and out of userspace buffers. The missing piece is special kernel scheduling hints (e.g. like Apple's Grand Central Dispatch) to optimize the number of threads dedicated per-process and globally to I/O thread pools. But at least not too long ago the Linux kernel was much more efficient at handling thousands of threads than Windows, so it wasn't really an issue. That's another thing Linux prefers--optimizing the heck out of simpler interfaces (e.g. fork), rather than creating 11-argument kernel syscalls. In other words, make the operation fast in all or most cases so you don't need more complex interfaces.
> The difference between Windows Overlapped I/O and POSIX AIO is that on Windows it's a black box, so people can pretend it's magical.
No. The difference is that in Windows you can check for completion and set up an overlapped I/O operation in one system call. Requiring multiple system calls to do the same thing means more unnecessary context switches, and the possibility of race conditions especially in multithreaded code. That and, as trentnelson stated, the Windows implementation is well integrated with the kernel's filesystem cache. Linux userspace solutions? Hahaha.
Supplying this capability as a primitive rather than requiring userland hacks is the right way to do it from an application developer's perspective.
As load increases, polling in Unix asymptotically approaches one additional syscall per thread for any number of sockets. That's because a single poll returns all the ready sockets--you're not polling each socket individually before each I/O request[1]. That means if your thread has 10,000 sockets, the amortized cost per I/O operation is <= 1/10,000th of a syscall.
As for IOCP being "well integrated", what does that even mean? In Windows when file I/O can't be satisfied from the buffer cache, Windows uses a thread pool to do the I/O (presuming it just doesn't block your thread; see https://support.microsoft.com/en-us/kb/156932), just like you'd do it in Unix. There's nothing magical about that thread pool other than that the threads aren't bound to a userspace context. Maybe you mean that the kernel can adjust the number of slave threads so that there aren't too many outstanding synchronous I/O requests? But the Linux I/O scheduler can implement similar logic when queueing and prioritizing requests. It's six of one and a half-dozen of the other.
[1] At least, assuming you're doing it correctly. But sadly many libraries do it incorrectly. For example, I once audited Zed Shaw's C-based non-blocking I/O and coroutine library for a startup. IIRC, he had devised an incredibly complex hack to fall back to poll(2) instead of epoll(2) because in his tests epoll(2) didn't scale when sockets were heavily used; he only saw epoll scale for HTTP sockets where clients were long-polling. But the problem was that every time he switched coroutine contexts, he was deleting and re-adding descriptors, which completely negated all the benefits of epoll. Why did he do this? Presumably because to use epoll properly you need to persist the event polling. But if application code closes a descriptor, the user-space event management state will fall out of sync with the kernel-space state, which is bad news. He tried to design his coroutine and yielding API to be as transparent as possible. But you can't do that. Performant use of epoll requires sacrificing some abstraction, similar to the hassles IOCP causes with buffer management.
The benefit of IOCP isn't performance--whether it's more performant or not is context-dependent. The biggest benefit of IOCP, IMO, is that it's the defacto standard API to use. You don't need to choose between libevent, libev, Zed's Shaws library, or the other thousands of similar libraries. On Windows everybody just uses IOCP and they can expect very good results.
The myth that IOCP is intrinsically better, or intrinsically faster, is a result of what I like to call kernel fetishism--that things are always faster and better when run in kernel space. But that's just a myth. IOCP nails down a very popular and very robust design pattern for highly concurrent network servers, but it's not necessarily the best pattern. And sticking to IOCP imposes many unseen costs. For example, it makes it more difficult to mix and match libraries each doing I/O because when you're having to juggle callbacks from many different libraries your code quickly becomes obtuse and brittle. It also demands a highly threaded environment with lots of shared state, but that likewise leads to very complex and bug prone code.
It's only the right way to do it from an application developer's perspective if the behaviour of the official OS-approved interface matches what their application needs, which is unlikely to be the case here. For example, according to https://support.microsoft.com/en-us/kb/156932 there's a fixed-sized pool of threads used to fetch data into cache to fill async I/O requests. If you try to have too many async I/O requests for the number of threads (which can be as small as 3) the excess are automatically converted into synchronous, blocking requests. This renders the API useless for something like an event-driven web server, because any file I/O call could block the main thread until it completes.
(Also, curiously when the data's in the cache that page shows a performance penalty for async reads that complete synchronously from the cache compared to sync reads. Wonder why.)
I've never had a single issue with "async things suddenly becoming synchronous which blocks the main thread" -- if you architect things properly that just never happens, blocking operations are off the hot path, and when you absolutely must block, IOCP's concurrency self-awareness kicks in and another thread is scheduled to run on the core, ensuring that each core has one (and only one) runnable thread.
> (Also, curiously when the data's in the cache that page shows a performance penalty for async reads that complete synchronously from the cache compared to sync reads. Wonder why.)
Because a synchronous operation is always faster than an overlapped operation if it can be completed synchronously.
Lots of stuff happens behind the scenes when an overlapped operation occurs.
Your rephrasing still doesn't matter, Solaris event ports are not signal-based.
My understanding is that Solaris event ports were intended to offer equivalent functionality to Windows' I/O completion ports, so this should not be surprising.
Solaris also has its own native async I/O API in addition to supporting POSIX async.
Heh, 10 years old, original link doesn't work, image is tiny. And it sounds like they were comparing Linux and Apache to IIS and Windows.
It's hard to evaluate this in any way more than "yeah that's a cute spaghetti diagram". If I wanted to drag Linux through the mud visually I'd depict how much time every socket I/O op spends in vfs/fsync stuff. (i.e. you can depict anything to make your point)
That might be a cool diagram to show! I honestly know nothing about kernel programming, just happened to come across that a long time ago and bookmarked it.
The second image is of IIS running on Windows, not Apache. Different software on different OSes. Regardless, it doesn't seem like parent is making the argument that the NT kernel isn't complicated - just that it is superior.
It is more complicated because -- surprise -- I/O usage patterns do not fit in one neat little box, and you have to provide special handling of special cases that do occur frequently in the real world.
I was fascinated by BeOS in the late 90s when I had a lot of enthusiasm (and little clue). All their threading claims just sounded so cool. I was also really into FreeBSD from around 2.2.5 so I got to see how all the SMPng stuff (and kqueue!) evolved, as well as all the different threading models people in UNIX land were trying (1:1, 1:m, m:n).
NT solves it properly. Efficient multithreading support and I/O (especially asynchronous I/O) are just so intrinsically related. Trying to bend UNIX processes and IPC and signals and synchronous I/O into an efficient threading implementation is just trying to fit a square peg in a round hole in my opinion.
As for reading list... I've bought so many old books lately. Here's my "makes the short list" bookshelf: http://imgur.com/DfTUVQx
Lol, I've been saying something like this for some time.
Thanks for the answer. I guess what I'm interested is somewhat obscure/historical operating systems and also HW that are in some way superior to currently popular solutions. The more comparative the better.
Also your reading list has quite a few Oracle SQL entries so I'm guessing it's your preferred DB of choice. What features are you using that aren't available in MySQL or Postgres?
As far as commercial vendors go I really preferred SQL Server from about version 2000 onwards. I got into Oracle again recently for a consulting project with a finance client (who basically have unlimited Oracle licenses) and really quite enjoyed it since the Oracle 7/8 days.
You can do some phenomenally sophisticated things... I extensively leveraged things like partitioning, parallel execution (dbms_parallel_execute!), lots of PL/SQL using the pipelined table cursor stuff, data mining stuff (dbms_frequent_itemset!), index-organized tables, and my god, bitmap indexes were a godsend, direct insert tricks for bulk data loading, external tables were fantastic (you can wrap a .csv in an external table and interact with it in parallel just like any other table -- great for ingesting large amounts of janky .csv data from other parts of the business).
The parallel execution and robust partitioning options were probably the most critical pieces that have no particularly good counterpart in open source land.
They also have a text mining engine which lets you dump almost any filetype in the database and do sophisticated full text queries, including generating html snippets with highlighted matches. And they have a native column type for storing geometry, so you can do geometric queries and aggregates, for which they then also have a web viewer engine with tiled maps support, so you basically have the core of a GIS engine. There's a ton of cool stuff in there, I haven't even looked at half of it.
Oh man how did I forget the geo stuff! I used that extensively! The custom indexes that could do lat/lon proximity joins! And yeah the text mining stuff is great as well. (Ugh, I've forgotten so much already -- such a shame.)
I always thought the external table functionality was a terrible idea, but if the use case is just for import that's more reasonable. Don't you like SQL*Loader or something ;)
It's great when you need to whip something up quickly. My workflow was to create foo_ext, poke around at the data and get some sensible column defaults, then `create table foo select * from foo_ext`. Worked really well, especially for once-off or infrequent stuff.
For batch-oriented stuff where you're getting pretty consistent data at a regular interval I'd go with SQL*Loader.
> But the NT kernel is much more sophisticated and powerful than Linux
That does not follow from the example. All it shows is that Microsoft prefers to put a lot of functionality in one interface, while Linux probably prefers low-level functions to be as small as possible, and probably offers things like filtering on a higher level (in glibc, for example).
Neither explanation has anything to do with sophistication. I personally believe that small interfaces are a better design.
Actually it does, as it mentioned that the extra parameters are for things like async callbacks and partial results.
The I/O model that Windows supports is a strict superset of the Unix I/O model. Windows supports true async I/O, allowing process to start I/O operations and wait on an object like an I/O completion port for them to complete. Multiple threads can share a completion port, allowing for useful allocation of thread pools instead of thread-per-request.
In Unix all I/O is synchronous; asynchronicity must be faked by setting O_NONBLOCK and buzzing in a select loop, interleaving bits of I/O with other processing. It adds complexity to code to simulate what Windows gives you for real, for free. And sometimes it breaks down; if I/O is hung on a device the kernel considers "fast" like a disk, that process is hosed until the operation completes or errors out.
I wrote the windows bits for libuv (node.js' async i/o library), so I have extensive experience with asynchronous I/O on Windows, and my experience doesn't back up parent's statement.
Yes, it's true that many APIs would theoretically allow kernel-level asynchronous I/O, but in practice the story is not so rosy.
* Asynchronous disk I/O is in practice often not actually asynchronous. Some of these cases are documented (https://support.microsoft.com/en-us/kb/156932), but asynchronous I/O also actually blocks in cases that are not listed in that article (unless the disk cache is disabled). This is the reason that node.js always uses threads for file i/o.
* For sockets, the downside of the 'completion' model that Windows uses is that the user must pre-allocate a buffer for every socket that it wants to receive data on. Open 10k sockets and allocate a 64k receive buffer for all of them - that adds up quickly. The unix epoll/kqueue/select model is much more memory-efficient.
* Many APIs may support asynchronous operation, but there are blatant omissions too. Try opening a file without blocking, or reading keyboard input.
* Windows has many different notification mechanisms, but none of them are both scalable and work for all types of events. You can use completion ports for files and sockets (the only scalable mechanism), but you need to use events for other stuff (waiting for a process to exit), and a completely different API to retrieve GUI events. That said, unix uses signals in some cases which are also near impossible to get right.
* Windows is overly modal. You can't use asynchronous operations on files that are open in synchronous mode or vice versa. That mode is fixed when the file/pipe/socket is created and can't be changed after the fact (see the sketch after this list). So good luck if a parent process passes you a synchronous pipe for stdout - you must special case for all possible combinations.
* Not to mention that there aren't simple 'read' and 'write' operations that work on different types of I/O streams. Be ready to ReadFileEx(), Recv(), ReadConsoleInput() and whatnot.
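To illustrate the modality point above -- the choice is made once, at handle creation, and can't be revisited (sketch):

    #include <windows.h>

    HANDLE open_for_async(const char *path)
    {
        /* The sync-vs-async decision is baked in here and can't be changed later: */
        return CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING,
                           FILE_FLAG_OVERLAPPED,   /* omit this and every ReadFile on
                                                      the handle is synchronous, forever */
                           NULL);
    }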
IMO the Windows designers got the general idea to support asynchronous I/O right, but they completely messed up all the details.
You're completely missing how the NT I/O subsystem works, and how to use it optimally.
> * Asynchronous disk I/O is in practice often not actually asynchronous. Some of these cases are documented (https://support.microsoft.com/en-us/kb/156932), but asynchronous I/O also actually blocks in cases that are not listed in that article (unless the disk cache is disabled). This is the reason that node.js always uses threads for file i/o.
The key to NT asynchronous I/O is understanding that the cache manager, memory manager and file system drivers all work in harmony to allow a ReadFile() request to either immediately return the data if it is available in the cache, and if not, indicate to the caller that an overlapped operation has been started.
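In code the caller sees roughly this (sketch, error handling trimmed), assuming the handle was opened with FILE_FLAG_OVERLAPPED and bound to a completion port:

    #include <windows.h>

    void issue_read(HANDLE h, void *buf, DWORD len, OVERLAPPED *ov)
    {
        if (ReadFile(h, buf, len, NULL, ov)) {
            /* Completed synchronously -- the data was already in the cache.
               (Whether a completion packet is still queued depends on
               SetFileCompletionNotificationModes.) */
        } else if (GetLastError() == ERROR_IO_PENDING) {
            /* Overlapped operation started; a completion packet will arrive
               on the port when the read finishes. */
        } else {
            /* genuine error */
        }
    }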
Things like extending a file, opening a file, that's not typically hot-path stuff. If you're doing a network oriented socket server, you would submit such a blocking operation to a separate thread pool (I set up separate thread pools for wait events, separate to the normal I/O completion thread pools), and then that I/O thread moves on to the next completion packet in its queue.
> * For sockets, the downside of the 'completion' model that windows is that the user must pre-allocate a buffer for every socket that it wants to receive data on. Open 10k sockets and allocate a 64k receive buffer for all of them - that adds up quickly. The unix epoll/kqueue/select model is much more memory-efficient.
Well that's just flat out wrong. You can set your socket buffer size as large or as small as you want. For PyParallel I don't even use an outgoing send buffer.
Also, the new registered I/O model in 8+ is a much better way to handle socket buffers without the constant memcpy'ing between kernel and user space.
> IMO the Windows designers got the general idea to support asynchronous I/O right, but they completely messed up all the details.
I disagree. Write a kernel driver on Linux and NT and you'll see how much more superior the NT I/O subsystem is.
> The key to NT asynchronous I/O is understanding that the cache manager, memory manager and file system drivers all work in harmony to allow a ReadFile() request to either immediately return the data if it is available in the cache, and if not, indicate to the caller that an overlapped operation has been started.
> Be careful when coding for asynchronous I/O because the system reserves the right to make an operation synchronous if it needs to. Therefore, it is best if you write the program to correctly handle an I/O operation that may be completed either synchronously or asynchronously.
Microsoft is directly saying that it reserves the right to violate the guarantee you are counting on at any time, and it documents several known cases of this. You can try to guess when this will happen and put those I/O operations on a different thread pool, but you're just playing whack-a-mole. And you're violating Microsoft's own recommendations.
That's not a particularly good article with regards to high performance techniques.
You wouldn't be using compression or encryption for a file that you wanted to be able to submit asynchronous file I/O writes to in a highly concurrent network server. Those have to be synchronous operations. You'd do everything you can to use TransmitFile() on the hot path.
If you need to sequentially write data, wanted to employ encryption or compression, and reduce the likelihood of your hot-path code blocking, you'd memory map file-sector-aligned chunks at a time, typically in a windowed fashion, such that when you consume the next one you submit threadpool work to prepare the one after that (which would extend the file if necessary, create the file mapping, map it as a view, and then do an interlocked push to the lookaside list that the hot-path thread will use).
I use that technique, and also submit prefaults in a separate threadpool for the page ahead of the next page as I consume records I'm writing to. Before you can write to a page, it needs to be faulted in, and that's a synchronous operation, so you'd architect it to happen ahead of time, before you need it, such that your hot-path code doesn't get blocked when it writes to said page.
That works incredibly well, especially when you combine it with transparent NTFS compression, because the file system driver and the memory manager are just so well integrated.
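A stripped-down sketch of the windowing/prefault part (the real thing also handles file extension and the lookaside list; the names and chunk size are illustrative, and offsets are assumed to be allocation-granularity aligned):

    #include <windows.h>

    #define CHUNK_SIZE (4u * 1024 * 1024)       /* illustrative window size */

    /* Map the next chunk of an already-sized file for writing. */
    static void *map_chunk(HANDLE file, ULONGLONG offset)
    {
        HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READWRITE, 0, 0, NULL);
        void *view = MapViewOfFile(mapping, FILE_MAP_WRITE,
                                   (DWORD)(offset >> 32), (DWORD)offset, CHUNK_SIZE);
        CloseHandle(mapping);                   /* the view keeps the mapping alive */
        return view;
    }

    /* Threadpool work item: touch each page so it's resident before the
       hot-path thread writes to it. */
    static VOID CALLBACK prefault(PTP_CALLBACK_INSTANCE inst, PVOID ctx)
    {
        volatile char *p = ctx;
        for (SIZE_T i = 0; i < CHUNK_SIZE; i += 4096)
            (void)p[i];                         /* fault the page in ahead of time */
        (void)inst;
    }

    /* hot path hands the *next* window off early:
       TrySubmitThreadpoolCallback(prefault, next_view, NULL); */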
If you wanted to do scatter/gather random I/O asynchronously, you'd pre-size the file ahead of time, then simply dispatch asynchronous writes for everything, possibly leveraging SetFileIoOverlappedRange such that the kernel locks all the necessary sections into memory ahead of time.
And finally, what's great about I/O completion ports in general is they are self-aware of their concurrency. The rule is always "never block". But sometimes, blocking is inevitable. Windows can detect when a thread that was servicing an I/O completion port has blocked and will automatically mark another thread as runnable so the overall concurrency of the server isn't impacted (or rather, other network clients aren't impacted by a thread's temporary blocking). The only service that's affected is to the client that triggered whatever blocking I/O call there was -- it would be indistinguishable (from a latency perspective) to other clients, because they're happily being picked up by the remaining threads in the thread pool.
> > Be careful when coding for asynchronous I/O because the system reserves the right to make an operation synchronous if it needs to. Therefore, it is best if you write the program to correctly handle an I/O operation that may be completed either synchronously or asynchronously.
That's not the best wording they've used given the article is also talking about blocking. If you've followed my guidelines above, a synchronous return is actually advantageous for file I/O because it means your request was served directly from the cache, and no overlapped I/O operation had to be posted.
And you know all of the operations that will block (and they all make sense when you understand what the kernel is doing behind the scenes), so you just don't do them on the hot path. It's pretty straightforward.
> I disagree. Write a kernel driver on Linux and NT and you'll see how much more superior the NT I/O subsystem is.
Can programming against the userspace interface to the I/O subsystem really be compared to programming against the kernel driver interface to the I/O subsystem? In Linux, kernel drivers have access to structures, services, and layers that userspace doesn't. And can these be compared between a monolithic and a micro-kernel approach, other than what has been debated ad nauseam for micro/monolithic kernels in general (not just for I/O)?
I didn't make my point particularly well there to be honest. Writing an NT driver is incredibly more complicated than an equivalent Linux one, because your device needs to be able to handle different types of memory buffers, support all the Irp layering quirks, etc.
I just meant that writing an NT kernel driver will really give you an appreciation of what's going on behind the scenes in order to facilitate awesome userspace things like overlapped I/O, threadpool completion routines, etc.
I have never worked with systems-level Windows programming, so I don't know the answer to this...but, how is what you're describing better than epoll or aio in Linux or kqueues on the BSDs?
I'm guessing you're coming from the opposite position of ignorance I am (i.e. you've worked on Windows, but not Linux or other modern UNIX), though, since "setting O_NONBLOCK and buzzing in a select loop, interleaving bits of I/O with other processing" doesn't describe anything developed in many, many years. 15 years ago select was already considered ancient.
I think IO Completion Ports [1] in Windows are pretty similar to kqueue [2] in FreeBSD and Event Ports [3] in Illumos & Solaris. All of them are unified methods for getting change notifications on IO events and file system changes. Event Ports and kqueue also handle unix signals and timers.
Windows will also take care of managing a thread pool to handle the event completion callbacks by means of BindIoCompletionCallback [4]. I don't think kqueue or Event Ports has a similar facility.
The documentation for the new thread pool APIs in Windows say that it is more performant than the old ones. Do you have any experience transitioning from the old thread pool to the new one, and if so did you notice performance differences?
Last decade I wrote a server based on the old API and BindIoCompletionCallback. I can't comment on the performance differences, but the newer threadpool API is much better to work with.
IMO one of the biggest problems with the old API is that there is only one pool. This can cause crazy deadlocks due to dependencies between work items executing in the pool. The new API allows you to create isolated pools.
The old API did not give you the ability to "join" on its work items before shutting down. You could roll your own solution if you knew what you were doing, but more naive developers would get burned.
You also get much finer grained control over resources.
I'm really lucky to have decided to dig into this stuff (for PyParallel) around ~2011, by which stage Windows Vista (where all the new APIs had been introduced) had been out for like, 5-6 years, so no, I never had to deal with the old APIs, thankfully.
> The “Why Windows?” (or “Why not Linux?”) question is one I get asked the most, but it’s also the one I find hardest to answer succinctly without eventually delving into really low-level kernel implementation details.
>
> You could port PyParallel to Linux or OS X -- there are two parts to the work I’ve done: a) the changes to the CPython interpreter to facilitate simultaneous multithreading (platform agnostic), and b) the pairing of those changes with Windows kernel primitives that provide completion-oriented thread-agnostic high performance I/O. That part is obviously very tied to Windows currently.
>
> So if you were to port it to POSIX, you’d need to implement all the scaffolding Windows gives you at the kernel level in user space. (OS X Grand Central Dispatch was definitely a step in the right direction.) So you’d have to manage your threadpools yourself, and each thread would have to have its own epoll/kqueue event loop. The problem with adding a file descriptor to a per-thread event loop’s epoll/kqueue set is that it’s just not optimal if you want to continually ensure you’re saturating your hardware (either CPU cores or I/O). You need to be able to disassociate the work from the worker. The work is the invocation of the data_received() callback, the worker is whatever thread is available at the time the data is received. As soon as you’ve bound a file descriptor to a per-thread set, you prevent thread migration
>
> Then there’s the whole blocking file I/O issue on UNIX. As soon as you issue a blocking file I/O call on one of those threads, you have one thread less doing useful work, which means you’re increasing the time before any other file descriptors associated with that thread’s multiplex set can be served, which adversely affects latency. And if you’re using the threads == ncpu pattern, you’re going to have idle CPU cycles because, say, only 6 out of your 8 threads are in a runnable state. So, what’s the answer? Create 16 threads? 32? The problem with that is you’re going to end up over-scheduling threads to available cores, which results in context switching, which is less optimal than having one (and only one) runnable thread per core. I spend some time discussing that in detail here: https://speakerdeck.com/trent/parallelism-and-concurrency-wi.... (The best example of how that manifests as an issue in real life is `make –jN world` -- where N is some magic number derived from experimentation, usually around ncpu X 2. Too low, you’ll have idle CPUs at some point, too high and the CPU is spending time doing work that isn’t directly useful. There’s no way to say `make –j[just-do-whatever-you-need-to-do-to-either-saturate-my-I/O-channels-or-CPU-cores-or-both]`.)
>
> Alternatively, you’d have to rely on AIO on POSIX for all of your file I/O. I mean, that’s basically how Oracle does it on UNIX – shared memory, lots of forked processes, and “AIO” direct-write threads (bypassing the filesystem cache – the complexities of which have thwarted previous attempts on Linux to implement non-blocking file I/O). But we’re talking about a highly concurrent network server here… so you’d have to implement userspace glue to synchronize the dispatching of asynchronous file I/O and the per-thread non-blocking socket epoll/kqueue event loops… just… ugh. Sure, it’s all possible, but imagine the complexity and portability issues, and how much testing infrastructure you’d need to have. It makes sense for Oracle, but it’s not feasible for a single open source project. The biggest issue in my mind is that the whole thing just feels like forcing a square peg through a round hole… the UNIX readiness file descriptor I/O model just isn’t well suited to this sort of problem if you want to optimally exploit your underlying hardware.
>
> Now, with Windows, it’s a completely different situation. The whole kernel is architected around the notion of I/O completion and waitable events, not “file descriptor readiness”. This seems subtle but it pervades every single aspect of the system. The cache manager is tightly linked to the memory manager and I/O manager – once you factor in asynchronous I/O this becomes incredibly important because of the way you need to handle memory locking for the duration of the I/O request and the conditions for synchronously serving data from the cache manager versus reading it from disk. The waitable events aspect is important too – there’s not really an analog on UNIX. Then there’s the notion of APCs instead of signals which again, are fundamentally different paradigms. The deeper you dig, the more you appreciate the complexity of what Windows is doing under the hood.
>
> What was fantastic about Vista+ is that they tied all of these excellent primitives together via the new threadpool APIs, such that you don’t need to worry about creating your own threads at any point. You just submit things to the threadpool – waitable events, I/O or timers – and provide a C callback that you want to be called when the thing has completed, and Windows takes care of everything else. I don’t need to continually check epoll/kqueue sets for file descriptor readiness, I don’t need to have signal handlers to intercept AIO or timers, I don’t need to offload I/O to specific I/O threads… it’s all taken care of, and done in such a way that will efficiently use your underlying hardware (cores and I/O bandwidth), thanks to the thread-agnosticism of Windows I/O model (which separates the work from the worker).
>
> Is there something simple that could be added to Linux to get a quick win? Or would it require architecting the entire kernel? Is there an element of convergent evolution, where the right solution to this problem is the NT/VMS architecture, or is there some other way of solving it? I’m too far down the Windows path now to answer that without bias. The next 10 years are going to be interesting, though.
"The problem with adding a file descriptor to a per-thread event loop’s epoll/kqueue set is that it’s just not optimal if you want to continually ensure you’re saturating your hardware (either CPU cores or I/O)"
Both epoll and kqueue permit multiple threads to poll the same event set. Normally you do this in tandem with edge-triggered readiness (EPOLLET on Linux, EV_CLEAR on BSD) so that only one thread will dequeue an event.
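i.e. something like this (sketch; the drain-until-EAGAIN part is elided):

    #include <sys/epoll.h>

    /* Register once, edge-triggered: */
    static void watch(int epfd, int fd)
    {
        struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = fd };
        epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
    }

    /* Every worker thread blocks on the same epfd; an edge wakes one of them: */
    static void *worker(void *arg)
    {
        int epfd = *(int *)arg;
        struct epoll_event events[64];
        for (;;) {
            int n = epoll_wait(epfd, events, 64, -1);
            for (int i = 0; i < n; i++) {
                /* drain events[i].data.fd until EAGAIN (the edge-triggered contract) */
            }
        }
        return NULL;
    }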
How do you think IOCP is implemented in Windows? There's a thread pool in the kernel which _literally_ polls a shared event queue. It's just hidden so you can pretend it's magical. But conceptually it works almost identically to how you would do it in Unix.
The benefit of IOCP is that it's a native API. Its warts and shortcomings notwithstanding, developers never even need to think about how it's actually implemented. Whereas with epoll and kqueue you either have to roll your own framework, or select from various third-party options. Seeing how the sausage is made can turn some people off. But just because you don't see the gory details doesn't mean it's implemented using magical fairy dust.
There's much to recommend Windows, and many things the NT kernel conceptually gets right. But IOCP vs polling? The only real difference architecturally is how much of the stack sits in user-space vs kernel-space, and how much of the stack is delivered by Microsoft (all of it in the case of IOCP) vs other sources (in Linux, glibc does AIO, while all the event loop and callback code is provided by various libraries or written yourself).
Putting more of the stack in kernel-space doesn't magically make it easier to perform optimizations. That's marketing speak and kernel fetishism. You have to first show why those optimizations can't be achieved elsewhere, like in the I/O or process scheduler. Various Linux components traditionally are more performant (e.g. process scheduling) than in Windows, so many of the optimizations wrt IOCP are arguably clawing back performance lost elsewhere in the system.
According to your website pretty much every other technology runs better on Linux than it does on Windows, and of course pyparallel runs better than everything you tested.
How can I run these tests my self? I specifically want to test it against golang.
Based on what you describe here, I'd say your comparative understanding is about two decades out of date. Async I/O has been a capability in various Unix and Unix-alike kernels for that long.
As in signal based AIO? Have you ever tried to use it in a high performance network server, where you want to have reads also satisfied from the cache if possible?
I think the problem here is not a syscall taking 11 parameters, it's a syscall that merely lists what is inside a directory taking 11 parameters. ataylor_284 explained the reasons (how convincingly is arguable), but at first sight that surely smells of bloat.
I'd also object to the NT kernel being more "powerful". Sure, unixy kernels and NT have their differences, but I don't think either one is superior.
11 parameters may seem bloated but in some cases Unix syscalls weren't designed with enough parameters which caused a bunch of pain necessitating things like
That's the thing. Just by glancing at the API docs, Windows looks more complicated but where the rubber meets the road in terms of real high-performance application development, Windows is way simpler. In Windows you can do in one syscall what would take several in Linux. You can schedule I/O calls across multiple threads in a completely thread-safe manner without having to manage the synchronization yourself -- and since threads go to sleep entirely while waiting for I/O operations to complete, there is no chewing up CPU cycles in a select/epoll loop. So yes, writing "hello world" or simple filters is simpler in Unix -- but writing multithreaded server applications that maximize throughput is simpler in Windows.
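The basic shape of that on Windows is just a handful of lines (sketch; setup of the completion port and handle associations omitted):

    #include <windows.h>

    /* N identical workers parked on one completion port; they sleep (no CPU)
       until the kernel hands them a finished I/O: */
    static DWORD WINAPI worker(LPVOID arg)
    {
        HANDLE port = (HANDLE)arg;
        for (;;) {
            DWORD       bytes;
            ULONG_PTR   key;              /* whatever was registered with the handle */
            OVERLAPPED *ov;
            if (GetQueuedCompletionStatus(port, &bytes, &key, &ov, INFINITE)) {
                /* dispatch on (key, ov, bytes) -- the completed operation */
            }
        }
        return 0;
    }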
Unix is bristling with features designed to "allow you to save me some time". It was designed to make it easy to write quick, "one-off" programs in C. VMS -- the predecessor to Windows NT -- was designed to run long-lasting, high-performance, high-reliability business applications for real users with money on the line (i.e., not just hackers) and Windows NT inherits this legacy.
>the NT kernel is much more sophisticated and powerful than Linux
Source?
It's not sophisticated enough or powerful enough to be the most used kernel on super computers (and in the world). Windows pretty much only dominates the desktop market. Servers, super computers, mainframes, etc, mostly use Linux.
A few years ago there was even a bug in Windows that caused degradation in network performance during multimedia playback that was directly connected with mechanisms employed by the Multimedia Class Scheduler Service (MMCSS), this is used on a lot of audio setups. If they can't even get audio setups right how can people consider anything Windows releases "sophisticated"?
It's made to do anything you throw at it I guess, it's definitely complicated, but powerful and sophisticated aren't words I would use to describe NT.
I would go so far as to say that a large part of why audio is such a CF under Linux is -- wait for it -- lack of real asynchronous I/O.
Audio is asynchronous by nature, and to do that right under Linux you need a "sound server" with all the additional overhead, jank, and shuffling of data between kernel and at least two different user spaces that implies. Audio under Linux was best with OSS, which was synchronous in nature and not suitable for professional applications. JACK mitigated that somewhat, but for an OS to do audio right you need a kernel-level async audio API like Core Audio or whatever Windows is doing these days.
Windows has a sound server too, you know. I believe Core Audio on Mac does too. A large part of why audio is such a CF under Linux is that PulseAudio is incredibly badly written and poorly maintained. My favourite was the half-finished micro-optimization that broke some of the resamplers because the author got bored and never modified them to handle the change, which somehow made it all the way into a release. I dread to think what they'd do with powerful tools like asynchronous I/O.
I don't want to take sides in this discussion but share an anecdote. Hey, maybe even someone knows a solution for this.
I have a PC connected via on-board HDMI to a Denon AVR solely for the purpose of getting the audio to the amplifier. Windows doesn't let me use that audio interface without extending or mirroring my desktop to that HDMI port. Since there is no display connected to the AVR I don't want to extend the desktop, and mirroring heavily decreases performance of the system.
On Debian Sid the computer by default allows me to use the HDMI audio without doing anything to my desktop display. It seems the system realizes that there is no display connected to the AVR but it's still a valid sink for audio.
If I remember correctly, NVIDIA did a decent job with their on-board HDMI driver audio-wise; what brand is yours? I'm 100% sure the functionality is dependent on the driver rather than the OS.
It works fine as in "I can listen to audio on my laptop from multiple programs at once, with a centralized way to control audio volume on a per-application or per-sound-device basis." I literally cannot imagine any audio system doing better than that given the hardware I have to work with.
But this is exactly what made me switch, windows was preventing me from accessing my sound card directly in order to record a remote interview.
I use Linux regularly to record and edit audio. It's free, it works, and I don't have to worry that my OS is actively reducing the functionality of my equipment.
They got audio setups right. The reason the network degradation happened is that video and audio playback were given realtime priority so background processes couldn't cause pops, stutters, etc. At the time Vista was released, most home users didn't have a gigabit network, so the performance degradation would only happen on a small number of users, and most would rather prefer good audio and video performance to a slowdown in network performance in a small percentage of users. With today's massively multicore systems, it's even less of an issue, while linux still has a problem with latency on applications like pro audio.
The reason the network degradation happened is that Microsoft couldn't figure out how to stop heavy network activity causing audio glitches on some systems even after giving audio realtime priority, so they hacked around it by adding a fixed 10,000 packets-per-second cap on network activity regardless of system speed or CPU usage (less if you had multiple network adapters). See https://blogs.technet.microsoft.com/markrussinovich/2007/08/... This was just as much of an issue on multicore systems because the cap was unaffected by the system speed and chosen based on a slow single-core machine.
I don't know why you're getting downvoted since the parent is basically stating random opinions about "power" and "sophistication" without anything to actually back it up.
11 param functions don't say "power" to me. They say "poorly thought out API design". Much can be said for most Windows APIs in general.
Plan 9 replaces ioctl with special files that require writing magic incantations.
It's similar to the various knobs in Linux /proc which require reading and writing specially formatted data. ioctl is simpler in that you don't need to worry as much about formatting the data (the struct declarations take care of that for you), but a file-oriented interface is nicer in that it's a higher-level abstraction--for example, it maps better to different languages, similar to how ioctl requires C or C-like shims whereas /proc can be used from any language that understands open/read/write/close, including the shell.
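A small illustration of the difference (sketch; FIONREAD and /proc/sys/net/core/somaxconn are just arbitrary examples of each style):

    #include <stdio.h>
    #include <sys/ioctl.h>

    /* ioctl: a binary, C-shaped knob */
    int bytes_pending(int sockfd)
    {
        int n = 0;
        ioctl(sockfd, FIONREAD, &n);      /* layout fixed by the kernel headers */
        return n;
    }

    /* versus a file-shaped knob, usable from any language or the shell:
       $ cat /proc/sys/net/core/somaxconn */
    long read_somaxconn(void)
    {
        long v = 0;
        FILE *f = fopen("/proc/sys/net/core/somaxconn", "r");
        if (f) { fscanf(f, "%ld", &v); fclose(f); }
        return v;
    }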
A lot of Microsoft APIs and subsystems are similarly bloated. There are probably tons of factors at play, but I believe being closed-source and having to support many individual use cases is one fundamental reason. (See for example CreateProcess vs. fork...)
When it comes to system call interfaces, it's because Dave Cutler has forgotten more than many modern "kernel hackers" will ever know about how to design an OS.
Dave Cutler's skills aside, Unix predates Windows by decades, and to anyone remotely familiar with kernel development it is clear that the sheer quantity and complexity of subsystems stem from the fact that nobody but Microsoft can actually see, modify, and redistribute Windows' source code.
Unless you can actually say "here's why Windows is qualitatively better" and point out specific tasks Windows does better, I'll just point you to the fact that the internet infrastructure and most of the servers on it, along with every Apple desktop and pretty much every mobile device, run Unix.
> I'll just point you to the fact that the internet infrastructure and most of the servers on it, along with every Apple desktop and pretty much every mobile device, run Unix.
I wonder how much of the internet infrastructure would run Unix if free (as in beer) clones like *BSD and GNU/Linux did not exist in first place.
How much internet infrastructure would actually run Unix if ISPs had to choose between Aix, HP-UX, Solaris, Digital UX, Tru64 and Windows licenses?
I'd guess that, even more important than the cost, up until recently, interacting with Windows in a headless mode was next to useless. Most sysadmins in my experience avoid GUIs like the plague when managing servers.
I started coding in the 80's and I remember how the UNIX market was afraid of Windows NT workstations, before they actually started losing market share to the free (beer) UNIX versions in form of BSD and GNU/Linux.
Why is it valid to conflate every kernel that runs a *nix OS together under "Unix"? Is there any meaningful overarching "Unixy" way in which, say, Xnu and Linux are similar?
NT's approach to async IO is at least somewhat empirically better as it does not require an extra context switch between receiving a "ready" event and actually performing the IO operation.
Completion notification requires committing a (potentially cache-hot) buffer for an operation that might not complete until some time in the future. With readiness notification you only need to have the buffer ready when you know you will use it.
Also, on a high performance poll/epoll/kevent based system you only need to poll cold fds, while you can do speculative direct read/writes to hot fds, so no need for extra syscalls in the fast case.
That doesn't mean that completion notification doesn't have its advantages, especially when coupled with NT's builtin auto-sizing thread pool, but it is not strictly better.
Depending on current load, that will either immediately do an asynchronous operation, or attempt synchronous non-blocking ones up to a certain number, then fall back to asynchronous.
To level the playing field, and if we are to take your opinion seriously, it would be beneficial to know what books or articles or whitepapers you have read to inform yourself about the design of the NT kernel.
I'm sorry, who's "we"? You replied to a comment that presented fact. The fact is that Windows use is mostly limited to desktops. As far as you're concerned, I could be entirely illiterate and my argument would still hold because it's based on fact. If you want to claim Windows is superior, please don't point us to design documentation. Show us actual numbers and use cases where Windows outperforms Unix, or is used in critical infrastructure, etc.
(FYI, I've read the "internals" book for a relatively old version of Windows, along with plenty of books about attacking the Windows kernel through its huge attack surface that exists to accommodate various needs of various software vendors...)
False. You ASSUMED facts based on correlation - "It's used everywhere" doesn't mean anything. "But people must have a reason to use them" STILL doesn't mean anything. "Well, so why don't they use windows" STILL doesn't get you anywhere.
> I could be entirely illiterate and my argument would still hold because it's based on fact.
No. You can't enter into an argument when you know nothing about the subject. That is not how it works.
Why don't YOU present actual facts about the design? Show us you actually understand the internals of the kernel or at least have some rudimentary knowledge. Otherwise you'd just be wasting everyone's time.
Well, that's an (unbacked) opinion, and I don't share it. The NT design is not too bad (obviously especially in contrast with Consumer Windows), particularly given what was achieved in the first few releases (which was made easier by Cutler's serious experience in the area), but today it is far from brilliant, and it has its (huge) share of problems that every serious user of both Windows and Unix-based OSes knows.
Now, at one point way in the past, NT was far above Linux, and some Linux fanboys existed who did not even know what they were talking about, yet had strong opinions about the superiority of the kernel they used. Now we are ironically in the opposite situation: Linux has basically caught up on all the things that matter (preemptive kernel, stability, versatility, scalability) and then quickly overtook NT, yet some people like to talk endlessly about the supposed architectural superiority of NT, which has not delivered anything concrete, widely used, and lasting in the real world, and which MS had to work around and/or redo with another approach (while keeping vestiges of all the old ones) to do all its modern stuff.
What kernel hackers know how to do is detect problems in architectures that look neat on paper. Brilliant ones are able to anticipate them. I don't even have to: history has shown where NT has been held back by its original design.
You piqued my curiosity - just made one by extracting the syscall dispatch table from lxcore.sys and placing it alongside the Linux syscall list: https://goo.gl/QHGe1U
A lot of coverage there, but interesting to see which ones aren't yet implemented, at least in the recent build 14342.
I have installed the current fast-ring build and have tried installing several packages on Windows. Some do install and work (compilers, build environment, node, redis server), but packages that use more advanced socket options (such as Ethereum) or that configure a daemon (most databases) still end with an error. Compatibility is improving with every new build, and you can ditch/reset the whole Linux environment on Windows with a single command, which is nice for testing.
They've said the initial intent is for developers to use it, not for running servers / etc (which is why they only target Windows 10 client and not Windows Server OSs).
There is "running servers" in production and there is "running servers" in dev.
If I can't run the entire stack I use for dev under the subsystem then I will go the other route, which is to continue using VMs. I am excited about the initial release, and the prospect of being able to use Windows for all of the regular things I do, but it's clear that this isn't ready for primetime even as a dev tool.
Yup, when I'm developing I need to run pretty much everything. I guess I can install, say, postgres using the Windows-native version, but then we are back at square one.
Since NT syscalls follow the x64 calling convention, the kernel does not need to save off volatile registers: the compiler has already emitted instructions before the syscall to preserve any volatile registers whose values are needed afterwards.
Say what? The NT kernel doesn't restore caller-saved registers at syscall exit? This seems extraordinary, because unless it either restores them or zaps them then it will be in danger of leaking internal kernel values to userspace - and if it zaps them then it might as well save and restore them, so userspace won't need to.
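For readers not steeped in calling conventions, here is a rough C-level illustration of what the quoted claim means (my own sketch, not ntdll code): under the Microsoft x64 convention RAX, RCX, RDX and R8-R11 are volatile (caller-saved), so anything the caller needs after a call is already in a non-volatile register or on the stack before control reaches the kernel. Whether the kernel also scrubs those volatile registers on the way out, as the objection above asks, is a separate question.

```c
/* Rough illustration (not actual ntdll code).  A caller that needs a value
 * across a call -- or across a syscall treated like a call -- must keep it
 * in a non-volatile register or spill it, because the callee/kernel is free
 * to clobber the volatile ones. */
long caller(long important, long (*do_syscall)(void))
{
    long copy = important;   /* compiler keeps this in a non-volatile register
                                (e.g. RBX) or on the stack across the call */
    long r = do_syscall();   /* may clobber RAX, RCX, RDX, R8-R11 */
    return copy + r;         /* 'copy' is still intact here */
}
```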
I can't think of much that would benefit from this except, perhaps, headless command-line-type applications. The one that comes to mind is rsync. Being able to compile the latest version/protocol of rsync on a Linux machine and then run the same binary on a Windows host would be nice, but the fun seems to end there; plus, with Cygwin, this is largely a no-brainer without M$ help.
What about applications that hook into X Windows or do things like opening the frame buffer device? I've got a messaging application that can be compiled for both Windows and Linux, and depending on the OS I compile a different transport layer. Under Linux it makes heavy use of epoll, which is very different from how NT handles async I/O -- especially with sockets. So my application's "transport driver" is either compiling an NT code base using WinSock & OVERLAPPED I/O or a Linux code base using epoll and pthreads.
Overall it seems like a nice-to-have, but I'm struggling to extract any real benefit.
Can anyone offer up some real good use cases I may be overlooking?
There are both free and commercial X servers for Windows, and you can get a linux app running under WSL to work with one of those X servers very easily. I played with it a little bit and it worked fine.
With this feature, if you're a Linux developer, you're automatically a Windows developer as well. Almost like being able to run all Android or iOS apps on Windows phones.[1][2]
If you disassemble lxcore.sys you can still see hints of the Android subsystem project that it grew from: the \Device\adss and /dev/adss devices, the application name Microsoft.Windows.Subsystem.Adss, various function names containing "Adss", and some other textual references to Android.
It's too bad that x86 hardware doesn't do virtualization as well as IBM hardware. You can't stack VMs. That's exactly what's needed here - a non-kernel VM that runs above NT but below the application.
I thought that a) the conclusion of VMware's "Comparison of techniques" paper [1] was that x86 and possibly everything is Popek-and-Goldberg-virtualizable [2] via binary translation, and b) the last several years of Intel and AMD chips all have hardware virtualization support, including nested virtualization, that made their architectures Popek-and-Goldberg-virtualizable in the obvious way?
Very, very slowly. Microprocessors still have DMA instead of mainframe-like "channels", although we're starting to see MMUs on the I/O side. With channels, devices can't blither all over memory and neither driver nor device need be trusted.
Cygwin is layered above Win32. Win32 has no provision to nicely handle forks. So even if there were an NT API fork syscall (I don't think there is on Windows 10; WSL does not use the NT API, and there is no longer a POSIX/SFU/{whatever classic Unix NT subsystem of the day} as far as I know), this would not go anywhere.
Windows 10 is still Windows NT. The NT native API is widely used these days. It would be a huge departure for MS to stop supporting it in future versions of Windows.
Most parts of the NT API have never been officially documented, officially supported, or stable (in the "won't change" sense), and the tiny parts that have actually been documented come with caveats that they are susceptible to change. Supporting fork through the NT API forever makes no sense if there are no users anymore. They could continue to do it for no specific reason -- just because fork is internally needed by WSL, for example, and so it is easy to export the capability through the NT API -- but I really don't see why they would necessarily do that.
Cygwin programs technically run under the Win32 subsystem, but they're not that cleanly layered. The runtime calls into a lot of Nt* functions, including undocumented ones. midipix (which another commenter mentioned) is another Unix-like environment for Windows that also runs under the Win32 subsystem, and apparently it has successfully implemented a real copy-on-write fork() on top of undocumented NT syscalls, so it's definitely possible.
Interesting, I wonder how much overhead is added to syscalls to look up the process type. Does NT still do this check when no WSL processes are running?
Pretty sure these are different entry points, so you wouldn't need to do anything different for normal Windows processes whether WSL is running or not.
I don't think so... both Linux and Windows binaries use the same SYSCALL CPU instruction, and thus must be going to the same handler in the NT kernel.
They document the WinAPI, but how that talks to the kernel is not documented. You can talk to it directly if you want, but there is nothing from Microsoft on how to do that. So if you see those as the true system calls, they are not documented at all.
Well, tiny parts of the NT API (callable from userspace) are documented, but often with the caveat that they are not stable. (In practice, even some undocumented ones can be considered stable if they are used by enough programs in the wild, especially if they are simple, standalone, and have no Win32 equivalent.)
The very precise mechanism, though, is extremely unstable. For example, virtually every release of Windows (sometimes even an SP) changes the syscall numbers. You have to go through ntdll, which is kind of a more heavyweight version of the Linux VDSO. (The ntdll approach was invented way before the VDSO, though.)
Ntdll is similar to the VDSO in the sense that it is loaded into the memory space of every userspace process. Even that, I think, might have exceptions on the Linux side. Either way, unlike the VDSO, ntdll actually exports functions potentially useful when called from the program. Here is an interesting read: http://undocumented.ntinternals.net/
What do you think the VDSO is used for? It also exports "functions potentially useful when called from the program".
The approach is a little different, though; ntdll exports all of the NT API, and you need to go through it to reach the NT API in a somewhat more stable way than using syscall numbers. OTOH, the VDSO exports only virtual syscalls that gain (or have gained in the past) from being performed in userspace, and even then the corresponding syscalls still exist in the kernel, with both stable numbers and even a stable API.
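As a concrete illustration of "go through ntdll rather than raw syscall numbers", here is a minimal sketch (mine, not from the thread) that resolves one of the partially documented Nt* exports at runtime; the syscall number behind it can change between releases, but the export name stays put.

```c
/* Sketch: reach an NT API through ntdll's export table instead of a raw,
 * release-specific syscall number. */
#include <windows.h>
#include <winternl.h>
#include <stdio.h>

typedef NTSTATUS (NTAPI *NtQuerySystemInformation_t)(
    SYSTEM_INFORMATION_CLASS, PVOID, ULONG, PULONG);

int main(void)
{
    HMODULE ntdll = GetModuleHandleW(L"ntdll.dll");      /* mapped into every process */
    if (!ntdll)
        return 1;

    NtQuerySystemInformation_t query = (NtQuerySystemInformation_t)
        GetProcAddress(ntdll, "NtQuerySystemInformation");

    SYSTEM_BASIC_INFORMATION sbi;
    ULONG len = 0;
    if (query && query(SystemBasicInformation, &sbi, sizeof sbi, &len) == 0)
        printf("processors: %d\n", (int)sbi.NumberOfProcessors);
    return 0;
}
```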
Next step is Microsoft basically needs to turn Windows into a flavour of Linux. If they don't, they're under massive pincer threat from Android and Chrome, which are rapidly becoming the consumer endpoints of the future. Windows is about to "do an IBM" and throw away a market that it created. See PS/2 and OS/2.
They should probably just buy Canonical. That would put the shivers into Google, properly.
Funny, years ago I would have been reflexively flabbergasted at the thought of Microsoft buying Canonical (or any Linux distro producer)... but actually thinking about that concept, and seeing the recent (perhaps less-than-hostile) approach Microsoft has taken towards open source and Linux, it wouldn't be a bad idea. I mean, if Microsoft could have both offerings -- Windows servers and Ubuntu-installed servers -- I suppose that would be a very smart business move. Assuming they don't actually butcher or deny resources to whatever Linux company they buy, I could see several benefits, not only to Microsoft but to developers, system integrators, etc. worldwide. Hey, if a side benefit is that it would spur the market (a la Google, Apple, etc.) a little, to the benefit of us civilians, that's cool too.
I think Microsoft should do what Apple did with BSD Unix aka Nextstep and merge it with their old OS.
Microsoft should take the Windows GUI and put it over Linux as a desktop manager. Microsoft could sell the Windows GUI for Linux users that want to run Windows apps.
I've been heavily downvoted for the view, but the facts are, there are hundreds of billions of dollars being spent in the Linux ecosystem by corporations. Microsoft cannot afford not to be present in it. It's as simple as that. Canonical is starting to eat into Red Hat's support contracts with corporates a bit, which is why I suggested that, but as you say, it could be another big and credible Linux distro (though Ubuntu all over the cloud must surely be tempting). Generally, the idea that Microsoft wants to/must go big into Linux is uncontroversial, for me.
I'm incredibly biased. But that bias has come from assessing the technical details and concluding that NT really is superior, if that's any consolation.
PyParallel flogs the Windows versions of things like Go, Node and Tornado because none of those were implemented in a way that allows the NT completion-oriented I/O facilities to be optimally exploited.
It's depressing, honestly. In the sense that open source software never really comes close to taking advantage of NT because there are no such paradigms on UNIX. It's also complex as hell... I came from a UNIX background and completion ports were just a bizarre black box of overcomplicated engineering -- but after taking the time to understand the why, that was just a blub paradox reaction. And it's been a couple of years now of concerted study to really start appreciating the little details.
"The things that bothers me about all the 'async I/O' libraries out there... is that the implementation -- single-threaded, non-blocking sockets, event loop, I/O multiplex via kqueue/epoll, is well suited to Linux/BSD/OSX, but there's nothing asynchronous about it, it's technically synchronous non-blocking I/O, and it's inherently single-threaded."
Now I'm curious; have you played with ReactOS at all? Do they implement the same VMS paradigm? That is, given decent hardware drivers for I/O on ReactOS, could one expect performance on the same order of magnitude as with a new NT-derived kernel?
I only have superficial experience with it, but from what I can tell there is a huge mismatch between IOCP and the UNIX readiness/poll model, and from my experience most server programs are written primarily for the latter.
You need to design your server code somewhat differently to take advantage of IOCP. What a lot of UNIX-like software does instead is bend IOCP or WaitForMultipleObjects() to behave like Linux. It works, but the performance is not there.
Note that I haven't checked the code for any of the software in your chart, so I could be wrong.
Oh man, seeing UNIX people use WaitForMultipleObjects() as if it were select() (or, gasp, trying to use it for socket server stuff) just kills me.
It's actually a really sophisticated multi-semaphore behind the scenes; but that sophistication comes at an inevitable performance penalty, because the wait blocks are in a linked list and it's generally expensive (in comparison to say, a completion packet being queued to a port) to satisfy that wait (the kernel needs to check lots of things and do locking and unlocking and linked list manipulation).
Here's what I don't understand: if IOCP gives us such a great performance boost, why don't I see people using it, even on Windows systems? The first thing that comes to mind is that IOCP can likely only maintain high performance under certain edge cases that aren't pertinent to the real world; the second is that the other I/O models are needed not for Unix, but for other aspects of the language itself where IOCP is not appropriate. So IOCP likely causes overhead. I don't know much about it, so maybe someone can explain. It sounds revolutionary if it can be applied to the real world and not just edge cases.
People are used to the POSIX readiness model, that's all. And you can kind of make it work on Windows, even though the performance is not great. OTOH, porting an IOCP-oriented application to Linux will give you catastrophic performance because you need to use many threads to replicate the async model on top of the synchronous Linux I/O calls.
And for some reason, people get very defensive when you point out some of the advantages the NT kernel has over Linux. I mean, I use Linux, I don't like to use Windows but I have no problem admitting the IOCP model is better and async I/O on Linux is a sad broken mess. Denying it only serves to keep it that way.
IOCP is the top item on my (short) list of things Windows simply does better. The performance boost you see from designing a server for IOCP from the ground up is jaw-dropping. However, it's hard to grok because it's a very different model, most servers are proprietary code, and the MSDN documentation is barely sufficient (at least, back when I was writing IOCP-based servers in '09).
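For anyone curious what "designing for IOCP from the ground up" looks like at the API level, here is a bare-bones, hypothetical skeleton (my own, heavily simplified, error handling and all real server logic omitted): bind handles to one completion port, issue overlapped I/O, and let worker threads drain completions.

```c
/* Minimal completion-port loop: OVERLAPPED contexts come back with each completion. */
#include <windows.h>
#include <stdio.h>

typedef struct {
    OVERLAPPED ov;                /* must be first so we can cast back from LPOVERLAPPED */
    char       buf[4096];
} IoContext;

static DWORD WINAPI worker(LPVOID arg)
{
    HANDLE port = (HANDLE)arg;
    for (;;) {
        DWORD       bytes;
        ULONG_PTR   key;
        OVERLAPPED *ov;
        BOOL ok = GetQueuedCompletionStatus(port, &bytes, &key, &ov, INFINITE);
        if (!ov)
            continue;                               /* timeout or port-level error */
        IoContext *ctx = (IoContext *)ov;
        printf("%s: %lu bytes on key %p\n",
               ok ? "completed" : "failed", (unsigned long)bytes, (void *)key);
        /* ...process ctx->buf, then issue the next overlapped ReadFile/WSARecv... */
        (void)ctx;
    }
    return 0;
}

int main(void)
{
    /* One port shared by all handles and all workers. */
    HANDLE port = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);
    CreateThread(NULL, 0, worker, port, 0, NULL);

    /* For each handle h opened with FILE_FLAG_OVERLAPPED (file or socket):
     *   CreateIoCompletionPort(h, port, (ULONG_PTR)h, 0);        // associate
     *   ReadFile(h, ctx->buf, sizeof ctx->buf, NULL, &ctx->ov);  // completes on the port
     */
    Sleep(INFINITE);
    return 0;
}
```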
Yeah, I had to figure out how to leverage all the new threadpool stuff with async sockets from scratch for PyParallel; none of the MSDN docs detailed how AcceptEx() and ConnectEx() and DisconnectEx() and all that jazz were meant to be used with CreateThreadpoolIo(), StartThreadpoolIo(), etc.
Even bought a couple of Microsoft books and they only covered it at the file I/O level, too. And the last Winsock book is reaaaaaally getting old.
Also, registered I/O is a similar jump in jaw-dropping performance. Disable completion notifications, pre-register all of your buffers on large pages, and use timers to schedule bulk de-queuing of all available completion events.
You definitely need a feedback loop in there to fine-tune the timings, but holy hell does it fly. You basically get a guaranteed-max-latency state machine (given bandwidth, computational cost of request processing, and number of clients connected).
Not only that but exploiting NT optimally takes a lot of knowledge about a lot of very low-level kernel details, which you sort of need to make a concerted effort to really learn, which is made harder by no official access to (recent) source code.
So I genuinely think there are just fewer programmers out there who specialize in this sort of stuff. Well-above-average programmers definitely trend toward open source contribution, but there isn't really the same level of intellectual curiosity about NT/Windows.
One thing I've noticed is that all the people that grok really low level NT kernel details usually end up as reverse engineering experts.
Oh, except for the OSR internals list, that is usually overflowing with clue (both now and historically).
It also requires enjoying programming on Windows and buying all the Microsoft Press books that have tons of healthy information about how the whole stack works.
Which doesn't work in the free (as in beer) culture world of modern UNIX.
So only those of us who feel like rewarding the work of others get the information about how everything works.
The simplest explanation is that the performance benefit of using Windows and implementing just about any Windows-specific design is outweighed by the cost of the Windows licensing fees, when compared to a measurably worse-performing Linux or FreeBSD solution that costs nothing. So very few bother to treat Windows versions of "backend" software as anything but an afterthought.
It's not even the licencing fees in my opinion. It's that it has had the equivalent of 200 Potterings run amok in it for 30 years. There are so many things that will eclipse any advantage from IO completion ports.
Not really... the license model provides support and creates fairly standard releases that you can leverage more easily, if licensing or selling software is your thing. If all you need is something good enough and cheap, then licensing could get in the way.
I guess you should also mention on each of your comments that you are extremely biased against and paranoid about Microsoft and that you actively target any popular Microsoft article posted here in order to inject something negative, whether it's relevant or not.
I used to run Linux in a VM on Windows and use Chocolatey for package management and Cygwin and PowerShell etc., then I realized I was just trying to make Windows into Linux. Seems to be the way things are going, and with the addition of the Linux subsystem it kind of proves that Windows really isn't a good OS on its own, especially not for developers.
I wish Windows/MS would abandon NT and just create a Linux distro. I don't know anyone who particularly likes NT and jamming multiple systems together seems like an awful idea.
Windows services and Linux services likely won't play nice together (think long file paths created by Linux services and other incompatibilities). For them to be 100% backward compatible they need to not only make Windows compatible with the things Linux outputs, but Linux compatible with the things Windows services output, and to keep the Linux people from figuring out how to use Windows on Linux systems they'd need to make a lot of what they do closed source.
So I don't see a Linux+Windows setup being deployed for production. It's cool for developers, but even then you can't do much real-world stuff that utilizes both Windows and Linux. If you're only taking advantage of one system, then what's the point of having two?
I went ahead and made the switch to Linux since I was trying to make Windows behave just like Linux.
> I wish Windows/MS would abandon NT and just create a Linux distro. I don't know anyone who particularly likes NT and jamming multiple systems together seems like an awful idea.
I do. The NT kernel is pretty clean and well architected. (Yes, there are mistakes and cruft in it, but Unix has that in spades.) It's not "jamming multiple systems together"; an explicit design goal of the NT kernel was to support multiple userland APIs in a unified manner. Darwin is a much better example of a messy kernel, with Mach and FreeBSD mashed together in a way that neither was designed for.
It's the Win32 API that is the real mess. Having a better officially supported API to talk to the NT kernel can only be a good thing, from my point of view.
Well, large parts of the NT API are very close to the Win32 API for obvious reasons, and so are often in the realm of a dozen params and even crazier Ex functions. Internally there are redundancies that do not make much sense (like multiple versions of mutexes or spinlocks depending on which parts of kernel space use them, IIRC), and some whole-picture aspects of Windows make no sense at all given the architectural cost they induce (Winsock split in half between userspace and the obviously needed kernel support is completely, utterly crazy, beyond repair; it makes so little sense you want to go back in time and explain to the designer of that mess how stupid it is).

The initial approach of NT subsystems was absolutely insane (a hard dependency on an NT API core, so you can't do emulation with classic NT subsystems -- you're either limited to OSes with some technical similarities, like OS/2, or to very small communities when doing a new target, like POSIX or SFU were). WSL makes complete sense, though, but it is maybe a little late to the party. Classic NT subsystems are of so little use that MS did not even use them for their own Metro and then UWP things, even though they would very much like to distinguish those from Win32 and make the world consider Win32 legacy. I've read the original paper arguing for putting POSIX in an NT subsystem, and it contained no real strong point, only repeated incantations that this would be better in an NT subsystem and worse if done otherwise (well, for fork this is obvious, but the paper was not even focused on that), with none of the limitations I've explained above ever considered.
Still considering the whole system: an unstable user/kernel interface has few advantages and tons of drawbacks. MS is extremely late to the chroot and then container party because of that (and let's remember that the core technology behind WSL emerged because they wanted a chroot-like isolated userspace on their own OS in the first place, NOT because they wanted to run Linux binaries) -- so, yet another point why classic NT subsystems are useless.
Back to core kernel stuff: the IRQL model is shit. It does not make any sense when you consider what really happens, and you can't really use the arbitrary multiple levels. It seems cute and clean and all of that, but the Linux approach of top and bottom halves and kernel and user threads might seem messy yet is actually far more usable. Another point: now everybody uses multiprocessor computers, but back in the day the multiple HALs were also a false good idea. MS recognizes that now and only wants to handle ACPI computers, even on ARM. Other OSes handle all kinds of computers... Cutler claimed not to like the "everything is a file" approach, but NT does basically the same thing with "everything is a handle". And soon enough you hit exactly the same conceptual limitations (just not in the same places), that not everything is actually the same, so that cute abstraction leaks soon enough (well, it does in any OS).
On a more results-oriented note, one of the things WSL makes clear is that file operations are very slow (just compare an identical file-heavy workload under WSL and then under a real Linux).
So of course there are (probably) some good parts, as in any mainstream kernel, but there are also some quite dark corners. I am not an expert on all of NT's architectural design, but I'm not a fan of the parts I know, and I strongly prefer the Linux way of doing equivalent things.
> Cutler claimed not to like the "everything is a file" approach, but NT does basically the same thing with "everything is a handle". And soon enough you hit exactly the same conceptual limitations (just not in the same places), that not everything is actually the same, so that cute abstraction leaks soon enough (well, it does in any OS).
Explain? Pretty much the only thing you can do with a handle is to release it. That's very different from a file, which you can read, write, delete, modify, add metadata to, etc... handles aren't even an abstraction over anything, they're just a resource management mechanism.
You are right, but those points are details. FDs under modern Unixes (esp. Linux, but probably others) serve exactly the same purpose (resource management). The FDs where read/write can't be used just don't implement those operations (same principle for other syscalls) -- similarly, if you try to NtReadFile on an incompatible handle it will also give you an error back. Both live in a single numbering space per process. NT largely makes use of NtReadFile / NtWriteFile to communicate with drivers, even in quite core Windows components (Winsock and AFD). And NT handles do provide at least one abstraction (that I know of): they can be signaled, and waited for with WaitFor*Objects.
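A tiny sketch of that last point, just to show the waitability abstraction (nothing beyond documented Win32 calls is assumed): the same wait call works on events, processes, threads, mutexes, timers, and so on, because they all resolve to signalable kernel objects.

```c
/* Handles are waitable: WaitForSingleObject works on any signalable object. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE ev = CreateEventW(NULL, TRUE, FALSE, NULL);   /* manual-reset, not signaled */
    SetEvent(ev);                                        /* signal it */

    if (WaitForSingleObject(ev, 1000) == WAIT_OBJECT_0)  /* same call works for any handle */
        printf("event signaled\n");

    CloseHandle(ev);
    return 0;
}
```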
Uh, no, they are very crucial details. For example, it means the difference between letting root delete /dev/null like any other "file" on Linux, versus an admin not being able to delete \Device\Null on Windows because it isn't a "file". The nonsense Linux lets you do because it treats everything like a "file" is the problem here. It's not a naming issue.
And to give you another example, look at how many people bricked their computers because Linux decided EFI variables were files. You can blame the vendors all you want, but the reality is this would not have happened (and, mind you, it would have been INTUITIVE to every damn user) if the OS was sane and just let people use efibootmgr instead of treating every bit and its mother as files. Just because you have a gun doesn't mean you HAVE to try shooting yourself, you know? That holds even if the manufacturer was supposed to have put a safety lock on the trigger, by the way. Sometimes some things just don't make sense, if that makes sense.
"The way it works is useful?"! When was the last time you found it useful to delete something like /dev/null via command line? And how many poor people do you think have screwed up their systems and had to reboot because they deleted not-really-files by accident? Do you think the two numbers are even comparable if the first one isn't outright zero?
It literally doesn't make any sense whatsoever for many of these devices to behave like physical files, e.g. be deletable or whatnot. Yes there is an exception to every nonsense like this, so yes, some devices do make sense as files, but you completely miss the point when you ignore the widespread nonsense and justify it with the exceptions.
Your complaint is with the semantics of the particular file. There's no reason why files in /dev need be deletable using unlink. That's an historical artifact, and one that's being rectified.
"Everything is a file" is about reducing most operations to 4 very abstract operations--open, read, write, and close. The latter three take handles, and it's only the former that takes a path. But you're conflating the details of the underlying filesystem implementation with the relevant abstraction--being a file implies that it's part of an easily addressable, hierarchical namespace. Being a file doesn't imply it needs to be deletable. unlink/remove is not part of the core abstraction. But they are hints that the abstraction is a little more leaky than people let on. Instantiating and destroying the addressable character of a file poses difficult questions regarding what the proper semantics should be, though historically they're deletable simply because using major/minor device nodes sitting atop the regular persistent storage filesystem was the simplest and most obvious implementation at the time.
Hm, I was thinking more about open FDs, not just special file entries on the FS. Well, I agree with you: it's a little weird and in some cases dangerous to have the char/block devices in the FS, and it has already been worked around multiple times in different ways -- in some cases even with multiple different work-arounds simultaneously. NT is better on that point. But not once the "files" are open and you've got FDs.
> the IRQL model is shit. It does not make any sense when you consider what really happens,
On the contrary. It's only when one considers what really happens, especially in the local APIC world as opposed to the old 8259 world, that what the model actually is finally makes sense.
I don't care about the 8259, and I don't see why anybody should except PIC driver programmers; it's doubtful anybody designing NT cared, given that the very first architecture it was designed against was not a PC, and this is typically the kind of thing that goes through the HAL.
IRQL is a purely software abstraction, in the same way top/bottom halves and the various kernel threads are under Linux (hey, some versions of the Linux kernel even transparently switch to threaded IRQs for the handlers; there is no close relationship with any interrupt controller at that point...). IRQL is shit because most of the arbitrary levels it provides are not usable to distinguish anything continuously from an application point of view (application in the historical meaning, no "App" bullshit intended here), even in seemingly continuous areas (DIRQL), so there is no value in providing so many levels with nothing really distinguishing between them -- or, at some level transitions, too many completely different things at once. It's even highly misleading, to the point that the article you link is needed (and it does not even provide the whole picture). I see potential for misleading people with PIC knowledge, people used to real-time OSes (if you try to organize application priority based on IRQL, you will fail miserably), people with backgrounds in other kernels -- well, pretty much everybody.
No, they have a sophisticated user-mode library but use only public kernel APIs. That user-mode library also helped them migrate SQL Server to Linux relatively painlessly.
Well I've personally seen Microsoft employees themselves complaining about the state of NT while saying it's "fallen behind Linux".
An old HN commenter (mrb) once wrote:
> There is not much discussion about Windows internals, not only because they are not shared, but also because quite frankly the Windows kernel evolves slower than the Linux kernel in terms of new algorithms implemented. For example it is almost certain that Microsoft never tested I/O schedulers, process schedulers, filesystem optimizations, TCP/IP stack tweaks for wireless networks, etc, as much as the Linux community did. One can tell just by seeing the sheer amount of intense competition and interest amongst Linux kernel developers to research all these areas.
>Note: my post may sound I am freely bashing Windows, but I am not. This is the cold hard truth. Countless of multi-platform developers will attest to this, me included. I can't even remember the number of times I have written a multi-platform program in C or Java that always runs slower on Windows than on Linux, across dozens of different versions of Windows and Linux. The last time I troubleshooted a Windows performance issue, I found out it was the MFT of an NTFS filesystem was being fragmented; this to say I am generally regarded as the one guy in the company who can troubleshoot any issue, yet I acknowledge I can almost never get Windows to perform as good as, or better than Linux, when there is a performance discrepancy in the first place.
"As Brother Francis readily admitted, his mastery of pre-Deluge English was far from masterful yet. The way nouns could sometimes modify other nouns in that tongue had always been one of his weak points. In Latin, as in most simple dialects of the region, a construction like servus puer meant about the same thing as puer servus, and even in English slave boy meant boy slave. But there the similarity ended. He had finally learned that house cat did not mean cat house, and that a dative of purpose or possession, as in mihi amicus, was somehow conveyed by dog food or sentry box even without inflection. But what of a triple appositive like fallout survival shelter? Brother Francis shook his head."
I initially thought this one was less ambiguous but I have to admit, I think Microsoft's phrasing is right. Let's try some substitution:
"Russia Factory for England" most likely exists inside of Russia and is for the English.
"John's mail for Sally [try: who is out of town]" even with the addition, I presume that John has authored mail for Sally and is not collecting the parcels to give to her.
Here's a trickier one:
"Sampsons' Dinner for Two". This could be the following:
1. A product named "Sampsons' Dinner for Two" bought from a retail store
2. An item "Dinner for Two" on a menu from a restaurant named "Sampsons"
3. A place named "Sampsons' Dinner for Two" with only two-person tables.
4. A product "Sampsons' Dinner" which comes in multiple sizes, one of them being designed for two people. (which is the ambiguous form - presuming there's also say Annie's Dinner for One/Two and Martha's Dinner for One/Two - each with a brand specific cuisine). Even here though, the ownership of which "Dinner for Two" product is still clear - it's the "Sampsons'" or "Martha's" brand.
Regardless of which substitution we try, "Windows Subsystem for Linux" for the most part parses as
"Windows [Subsystem for Linux]" like
"Windows [Media Player]". I don't assume that it's "[Windows Media] Player" - as in some multi-platform software that is tasked with playing the proprietary windows media formats.
It seems weird, but I think it's unarguably the right choice.
Maybe because, the way they stated it, it sounds like a much more attractive technology. It seems like an attempt at regaining ground in the server market.
This would be really useful for distributing Windows apps as Linux binaries. It would make it easier to develop from Linux and target Windows. Need the same for OSX.
Pretty neat stuff. I think that MS should just create their own Linux Distribution & port all MS products. Get rid of the Windows NT Kernel. I believe it's outdated & doesn't have the same update cycle that the Linux Kernel has.
Why run a Linux application/binary on a Windows server OS when you can just run it on a Linux OS and get better performance & stability?
There are leaks galore. NT4, 2000, and more recently, the Windows Research Kernel. Just google something like 'apcobj.c' and see. (Hah, first link was a github repo!)
Here are some, though maybe these are not part of the NT kernel...
1. The use of drive letters A-Z for file system access.
2. Creating symbolic links to files and folders like you can in Unix/Linux. You have to change a setting somewhere to enable this, but there's a security risk.
3. A standard, functional/usable non-GUI terminal application like Unix/Linux ssh. PowerShell doesn't come close.
4. The ability to sudo or su to Admin like in Unix/Linux.
Maybe the above are not kernel related, but the OS-specific layer.
1. The use of a multi-root hierarchy vs. a single root hierarchy is pretty arbitrary. Drive letters in turn are just an arbitrary way to define the multi-root hierarchy.
2. `mklink` [0] has existed since Windows Vista for the NTFS file system. No settings toggling required. (The underlying Win32 call is sketched just after this list.)
3. What is your argument against PowerShell? In what ways does it fall short? I have been pretty successful with using it for various tasks.
4. This is about the only legit claim. Windows always requires full credentials to execute as another user. Windows does provide `runas.exe`, but you must provide the target user's full credentials.
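For completeness, the Win32 call behind `mklink` looks roughly like this (a sketch, not production code; note that creating symlinks normally requires the SeCreateSymbolicLinkPrivilege privilege, or Developer Mode plus the unprivileged-create flag on newer Windows 10 builds):

```c
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Roughly equivalent to: mklink link.txt target.txt
     * dwFlags: 0 = file symlink, SYMBOLIC_LINK_FLAG_DIRECTORY = directory symlink. */
    if (CreateSymbolicLinkW(L"link.txt", L"target.txt", 0))
        printf("symlink created\n");
    else
        printf("failed (error %lu) - privilege or Developer Mode likely required\n",
               (unsigned long)GetLastError());
    return 0;
}
```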
> 3. What is your argument against PowerShell? In what ways does it fall short? I have been pretty successful with using it for various tasks.
PowerShell is an acceptable scripting language. It's a horrible interactive shell. They shouldn't have made "shell" part of its name if they weren't going to at least include basic interactive features like readline compatibility and usable tab completion.
I'm not familiar with what you mean by readline compatibility. I see a GNU Readline library with history functions, but I'm not sure what that means being integrated into a shell, and what features you would expect to see.
My experience with tab completion in PowerShell is great. It completes file paths, command names, command parameter names, and even command parameter values if they're an enumeration. Could you describe what else you would expect to see?
PowerShell's completion is worthless for the most common use case of saving keystrokes. Since it fills in the entire remaining command name and cycles through the possibilities in alphabetical order, instead of completing just what's unambiguous and presenting a list of the possibilities from there, it doesn't help with typing out common prefixes; and if you find yourself cycling through an unreasonably large number of possibilities, you have to backspace to get rid of the unwanted completion (often very long, given the naming conventions) before you can refine your search.
They took a code completion technique that works alright for an IDE and put it on the command line, losing some key usability in the process when they could have just implemented the paradigm that has been standard in the Unix world for decades.
Yes, it does a great job of identifying what the completion possibilities are in almost every context. That's good enough for an IDE, but only half the job when you're making an interactive command line shell.
As for other readline features: it's really annoying to only partially implement a well-known set of keyboard shortcuts.
See, and I personally dislike the system you mention. It bugs me to no end to have only a couple of options, but the system will only fill in the common prefix. Now I have to look, figure out what's there already, figure out what the next letter is, hit it, then hit tab again. If I want to save keystrokes, that's what aliases and symlinks are for, not tab completion.
Also, at least in PowerShell 5, if you run the ISE instead of the cmd-based terminal, it shows an IDE-like overlay of completions while you're typing.
I am fairly certain that they use the same console system. For instance, right-clicking on the title bar gives the exact same options as in `cmd.exe`. I was perhaps a little overzealous in calling it cmd-based; that's a slip-up on my part simply because `cmd.exe` was the only program to use that system before. Thanks for keeping me honest.
Both use the Windows console host, effectively what a terminal emulator is on Unix-likes. It provides the window and a bunch of other functionality (character grid with attributes, history, aliases, selection, drag&drop of files into the window, etc.).
It's just that every console application on Windows uses that host. This includes cmd, PowerShell, Far Manager, or even vi. Sorry, I may have seen it conflated with cmd too often. It just nags me. For Linux users it's probably when everyone starts calling a terminal (emulator) "bash".
https://github.com/lzybkr/PSReadLine perhaps. Personally I find it nice, though it requires a bunch of tweaking to feel comfortable -- maybe less so for people who are used to readline.
Tab completion works nicely and you can even rotate through the different values a parameter can take. Also, ctrl+space lists all the options (parameters, values, files, folders) available at your cursor's position.
I would call it something more than an acceptable scripting language. It is object oriented and it can easily use C# libraries, which is pretty neat.
You are confusing the Win32 subsystem with the NT kernel. They are not the same; the Win32 layer acts as a translation layer. Also, symbolic and hard links are supported by NTFS, they are just not exposed in the UI. There are utilities to create them if you really want to.
The shell itself and the rest of userland have very little to do with the kernel. It seems it's the userland you are upset with. Swapping out the kernel won't fix that.
For some values of "well". If you log on interactively and start a PowerShell session, you do not have administrative powers and cannot get them without opening a new shell. If you log on via PS remoting, you have administrative powers by default and cannot lose them.
UAC is a GUI kludge, and is very grating especially in Powershell.
My biggest pet peeve about Windows is the way it accesses files; I'm not sure if this is a kernel or filesystem issue. But when a remote user has a file open, then as long as that file is open other users are prevented from updating or replacing it. It happens all the time at my work, and I know of no obvious way to work out who has the file open, because as far as I can tell nothing like lsof exists.
This is probably the number one cause of me banging my head against the desk and wishing Windows behaved more like Linux.
What you can do while it's open is partially defined by the dwShareMode parameter in CreateFile. Unfortunately a lot of people look at the daunting documentation, shrug, and put 0 there, which is the least permissive mode. A lot of libraries do that too.
OTOH there are other limits that are not dictated by dwShareMode. Such as deleting files while handles are open - this blocks a new file with the same name from being created until all handles are closed. That's probably the worst one. There are some other crappy ones involving directory handles that I don't care to enumerate.
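To illustrate the dwShareMode point, here is a sketch (not from any particular codebase) of opening a file cooperatively so other processes can still read, write, or rename/delete it; passing 0 instead, as many libraries do, is what produces the familiar "file is in use" lockouts.

```c
#include <windows.h>

/* Open a file for reading while letting other processes read, write,
 * and rename/delete it; contrast with passing 0 as dwShareMode. */
HANDLE open_cooperatively(const wchar_t *path)
{
    return CreateFileW(path,
                       GENERIC_READ,
                       FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                       NULL,                    /* default security */
                       OPEN_EXISTING,
                       FILE_ATTRIBUTE_NORMAL,
                       NULL);
}
```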
2. Filesystem feature, supported by NTFS for quite a while in different styles. It has a security policy because a lot of userland software doesn't know about them. I use them all the time and they work, even if you move things like /Users/.
3. Userland issue. You can compile bash and an ssh server for Windows if you want. PowerShell is quite different, yes.
> 1. The use of drive letters A-Z for file system access.
Why is this a problem? As a user, I've always preferred to have drive letters - it makes it immediately clear if, for example, I'm moving files between different physical drives.
Considering mount points, reparse points, and things like subst, I doubt you can ever really know that. Granted, the deviations from the normal scheme are your own making as a user, but so are the places where you mount volumes from different physical drives in a single root hierarchy.