So I see the historical NT subsystem story restated all the time and that makes ...

skissane · on Nov 29, 2020

> The short lived and of limited scope OS/2 subsys was for binary compat, but given the origin and NT and the similarities of some techs it is not surprising

They could have used a more classic NT environment subsystem architecture for WSL1, it would have mostly worked. The primary reason they didn't was performance, and to a lesser extent compatibility.

Unix systems create lots of short-lived processes, because processes on Unix are relatively cheap. By contrast, NT processes are much heavier weight. This is part of why the POSIX subsystem / Interix / SFU / SUA and Cygwin too always have had performance problems. Introducing a new lightweight process concept, the picoprocess, helped get over the performance issue. But these new lightweight processes can't be allowed to make arbitrary NT syscalls, which rules out the classic NT environment subsystem implementation pattern - have DLLs which make NT syscalls instead of e.g. OS/2 syscalls.

If they'd followed the classical Windows NT environment model, they would have enhanced NT to be able to load ELF executables and then shipped their own libc.so which made NT syscalls. That would have worked fine for most apps which don't make Linux syscalls directly, but would have had worse performance than WSL1 had. It also would have meant more work for them, since they'd be emulating not just the syscall layer but the libc layer too. (They could have sped things up by reusing the GNU libc code; to do that, they would have had to have open sourced most of the code of their subsystem, which would have probably been a positive overall, but I'm not sure how feasible that would have been given internal Microsoft politics around open source.)

To support the minority of apps which make Linux syscalls directly, bypassing libc, they needed something in the kernel to intercept the SYSENTER/SYSCALL/INT 0x80. They could have redirected it back to user mode for handling in their libc.so. That would have worked, it just would have been slower than WSL1 is. (And WSL1 already has enough performance issues, that performance was one of the main motivators for switching to the new architecture in WSL2.)

temac · on Nov 30, 2020

There are tons of programs that do not go through "the" libc under Linux, everybody and their dogs seem to reimplement their own locks recently for example. Plus the libc actually makes a way greater surface to cover given it gives you access to most of the syscall plus other functions.

Plus you still want all the fine semantics of Linux (mmap, cow or at least cow-like, etc.) and for tons of calls the libc is a trivial layer anyway, so you are not gaining much. The layout of processes also. At this point you are back to a big part of picoprocesses again, because you don't want the classic NT layout, nor cb to userspaces, etc.

Maybe would have been more of a practical idea for a BSD, but even then the remark about the surface of the libc API vs syscalls apply.

But Linux has a stable API, so it does not make much sense to do Linux compat without re-implementing it and given you can do just that, well do just that.

And you rightly note at the end that syscall support is needed, but at this point the whole architecture would have made absolutely no sense: if you are going to implement syscall eventually, just implement them in the first place, and not more code.

Also I doubt the only issues of WSL1 were performance. I doubt they would have been able to follow the upstream quickly enough, at least not without a pretty big dedicated team, and it seems difficult to justify. The WSL2 approach leverages techs that are used for many other things.

skissane · on Nov 30, 2020

> There are tons of programs that do not go through "the" libc under Linux

I agree with you there. This is a big problem with using the classic environment subsystem model.

> Plus you still want all the fine semantics of Linux (mmap, cow or at least cow-like, etc.) and for tons of calls the libc is a trivial layer anyway, so you are not gaining much.

One potential advantage to the environment subsystem model – it might have pushed them to enhance the NT API to be closer in feature parity to the Linux API, which could then have benefited Win32.

> I doubt they would have been able to follow the upstream quickly enough, at least not without a pretty big dedicated team, and it seems difficult to justify.

Well, one relevant difference is that Linux is much more of a moving target than POSIX or OS/2 are. POSIX is a formal standard, and changes in formal standards are slow, they take years.

Back in the 1990s, OS/2 was a moving target, with IBM coming out with new versions with new features (2.x, 3.x, 4.x). But, Microsoft had decided they only wanted to support OS/2 1.x, and to ignore all new features in OS/2 2.x and higher, which made Microsoft's task much simpler. And even back when IBM was actively developing OS/2, they couldn't attain anywhere near the pace of change which is happening with Linux. Linux has far more people working on it than IBM ever had working on OS/2; and IBM, certainly back then (I don't know whether it is still true today), was notorious for slow, cumbersome and overly bureaucratic development practices, which was another limit on OS/2's velocity.