
How the Windows Subsystem for Linux Redirects Syscalls - jackhammons
https://blogs.msdn.microsoft.com/wsl/2016/06/08/wsl-system-calls/
======
ataylor284_
> The real NtQueryDirectoryFile API takes 11 parameters

Curiosity got the best of me here: I had to look this up in the docs to see
how the NT counterpart of a Linux syscall that takes 3 parameters could
possibly take 11. Spoiler alert: the extra parameters are used for async
callbacks, filtering by name, allowing only partial results, and the ability
to progressively scan with repeated calls.
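
A rough userspace analogy for three of those extras, sketched in Python. This is not the NT API: `scan_directory`, `pattern`, and `batch_size` are names invented for illustration, standing in for the name filter, partial results, and resumable scanning (async callbacks are left out).

```python
import fnmatch
import os
import tempfile


def scan_directory(path, pattern="*", batch_size=2):
    """Yield directory entries in batches, filtered by a wildcard pattern.

    Hypothetical helper: a userspace analogy for name filtering, partial
    results, and resuming a scan across repeated calls.
    """
    batch = []
    with os.scandir(path) as entries:
        for entry in entries:
            if fnmatch.fnmatch(entry.name, pattern):
                batch.append(entry.name)
            if len(batch) == batch_size:
                yield batch  # partial result; the scan resumes on the next call
                batch = []
    if batch:
        yield batch


def demo():
    """Create four files and scan for the three that match *.txt."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.txt", "b.txt", "c.txt", "d.log"):
            open(os.path.join(tmp, name), "w").close()
        return list(scan_directory(tmp, "*.txt", batch_size=2))


if __name__ == "__main__":
    print(demo())
```

Each `next()` on the generator plays the role of one repeated call; the order of names within a batch depends on the filesystem.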

~~~
bitwize
This is a recurring pattern in Windows development. Unix devs look at the
Windows API and go "This syscall takes 11 parameters? GROAN." But the NT
kernel is much more sophisticated and powerful than Linux, so its system calls
are going to be necessarily more complicated.

~~~
trentnelson
Curiosity got the better of me recently when I re-read Russinovich's [NT and
VMS - The Rest Of The Story](http://windowsitpro.com/windows-client/windows-nt-and-vms-rest-story),
and I bought a copy of [VMS Internals and Data
Structures](http://www.amazon.com/VAX-VMS-Internals-Data-Structures/dp/1555580599).

Comparing VMS to UNIX side by side, VMS's approach to a few key areas like
I/O, ASTs and tiered interrupt levels is simply more sophisticated. NT
inherited all of that. It was fundamentally superior, as a kernel, to UNIX,
from day 1.

I haven't met a single person that has _understood_ NT _and_ Linux/UNIX, and
still thinks UNIX is superior as far as the kernels go. I have definitely
alienated myself the more I've discovered that though, as it's such a wildly
unpopular sentiment in open source land.

Cutler got a call from Gates in 89, and from 89-93, NT was built. He was 47 at
the time, and was one of the lead developers of VMS, which was a rock-solid
operating system.

In 93, Linus was 22, and started "implementing enough syscalls until bash
ran" as a fun project to work on.

Cutler despised the UNIX I/O model. "Getta byte getta byte getta byte byte
byte." The I/O request packet approach to I/O (and tiered interrupts) is one
of the key reasons behind NT's superiority. And once you've grok'd things like
APCs and structured exception handling, signals just seem absolutely ghastly
in comparison.

~~~
jen20
I've never met a single person who understood what they were talking about and
referred to a "UNIX kernel". It may be true that Linux was once less advanced
than NT - this is no longer the case, despite egregious design flaws in things
like epoll. It has simply never been true (for example) for the Illumos (nee
Solaris) kernel.

~~~
trentnelson
I qualified it as "Linux/UNIX kernel" because I wanted to emphasize the kernel
and not userspace.

Solaris event ports are good, but they're still ultimately backed by a
readiness-oriented I/O model, and can't be used for asynchronous file I/O.
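
For anyone unsure what "readiness-oriented" means here, a minimal Python sketch (assuming a Unix-like or Windows host with `socket.socketpair`; `readiness_read` is a made-up name). The kernel only reports that the socket can be read; the program then performs the read itself. In a completion-oriented model the program would instead hand over a buffer up front and be told when it has been filled.

```python
import selectors
import socket


def readiness_read():
    """Wait until a socket is readable, then perform the read ourselves."""
    a, b = socket.socketpair()
    sel = selectors.DefaultSelector()  # epoll on Linux, other backends elsewhere
    try:
        b.setblocking(False)
        sel.register(b, selectors.EVENT_READ)
        a.sendall(b"ping")
        # The kernel only reports "b is readable"; nothing has been copied
        # into a buffer of ours yet.
        events = sel.select(timeout=5)
        data = b""
        for key, _mask in events:
            data += key.fileobj.recv(4096)  # the actual I/O happens here
        return data
    finally:
        sel.close()
        a.close()
        b.close()


if __name__ == "__main__":
    print(readiness_read())
```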

~~~
niels_olson
Here's a nice graphical comparison of syscalls between Linux and Windows

[http://www.visualcomplexity.com/vc/project.cfm?id=392](http://www.visualcomplexity.com/vc/project.cfm?id=392)

Are you saying the Windows flow looks like spaghetti only because the tested
software (Apache) wasn't designed for Windows?

~~~
trentnelson
Heh, 10 years old, original link doesn't work, image is tiny. And it sounds
like they were comparing Linux and Apache to IIS and Windows.

It's hard to evaluate this in any way more than "yeah that's a cute spaghetti
diagram". If I wanted to drag Linux through the mud visually I'd depict how
much time every socket I/O op spends in vfs/fsync stuff. (i.e. you can depict
anything to make your point)

~~~
niels_olson
That might be a cool diagram to show! I honestly know nothing about kernel
programming, just happened to come across that a long time ago and bookmarked
it.

------
luchs
>As of this article, lxss.sys has ~235 of the Linux syscalls implemented with
varying level of support.

Is there a list of these syscalls somewhere? It would be cool to check it
against the recent Linux API compatibility paper [0, 1].

[0]: [http://oscar.cs.stonybrook.edu/api-compat-study/](http://oscar.cs.stonybrook.edu/api-compat-study/)

[1]: [http://www.oscar.cs.stonybrook.edu/papers/files/syspop16.pdf](http://www.oscar.cs.stonybrook.edu/papers/files/syspop16.pdf)

~~~
besselheim
You piqued my curiosity - just made one by extracting the syscall dispatch
table from lxcore.sys and placing it alongside the Linux syscall list:
[https://goo.gl/QHGe1U](https://goo.gl/QHGe1U)

A lot of coverage there, but interesting to see which ones aren't yet
implemented, at least in the recent build 14342.

(I used Filippo Valsorda's work from
[https://filippo.io/linux-syscall-table](https://filippo.io/linux-syscall-table)
as the Linux syscall data source.)

------
Maarten88
I have installed the current fast ring build and have tried installing several
packages on Windows. Some do install and work (compilers, build environment,
node, redis server), but packages that use more advanced socket options (such
as Ethereum) or that configure a daemon (most databases), still end with an
error. Compatibility is improving with every new build, and you can
ditch/reset the whole Linux environment on Windows with a single command,
which is nice for testing.

~~~
skrowl
They've said the initial intent is for developers to use it, not for running
servers / etc (which is why they only target Windows 10 client and not Windows
Server OSs).

~~~
stuaxo
Yup, when I'm developing I need to run pretty much everything. I guess I can
install, say, postgres using the windows native version, but then we are back
at square one.

~~~
Maarten88
Installing postgres on lxss still ends in a 'syscall not implemented' error.

------
caf
_Since NT syscalls follow the x64 calling convention, the kernel does not need
to save off volatile registers since that was handled by the compiler emitting
instructions before the syscall to save off any volatile registers that needed
to be preserved._

Say what? The NT kernel doesn't restore caller-saved registers at syscall
exit? This seems extraordinary, because unless it either restores them or zaps
them then it will be in danger of leaking internal kernel values to userspace
- and if it zaps them then it might as well save and restore them, so
userspace won't need to.

~~~
trentnelson
I think that's referring to the prolog/epilog convention and "homing" of
parameter registers, e.g.

    Frame struct
        ReturnAddress dq ?
        HomeRcx       dq ?
        HomeRdx       dq ?
        HomeR8        dq ?
        HomeR9        dq ?
    Frame ends

    NESTED_ENTRY Foo, _TEXT$00

    ; home the four register parameters into the caller-provided slots
    mov Frame.HomeRcx[rsp], rcx
    mov Frame.HomeRdx[rsp], rdx
    mov Frame.HomeR8[rsp], r8
    mov Frame.HomeR9[rsp], r9

    alloc_stack 64

    END_PROLOG

    ; *do stuff*

    BEGIN_EPILOG

    add rsp, 64
    ret

    NESTED_END Foo, _TEXT$00
    

[https://msdn.microsoft.com/en-us/library/tawsa7cb.aspx](https://msdn.microsoft.com/en-us/library/tawsa7cb.aspx)

------
emcrazyone
I can't think of much that would benefit from this except for, perhaps,
headless command line type applications. The one that comes to mind is rsync.
Being able to compile the latest version/protocol of rsync on a Linux machine
and then run the same binary on a Windows host would be nice, but the fun
seems to end there. Plus, with Cygwin this is already largely a no-brainer
without M$ help.

What about applications that hook to X Windows or do things like opening the
frame buffer device. I've got a messaging application that can be compiled for
both Windows and Linux and depending on the OS, I compile a different
transport layer. Under Linux it makes heavy use of epoll, which is very
different from how NT handles async I/O - especially with sockets. So my
application's "transport driver" is either compiling an NT code base using
WinSock & OVERLAPPED IO or a Linux code base using EPOLL and pthreads.
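
The same per-OS split shows up in Python's asyncio, which makes for a compact sketch of the idea: `ProactorEventLoop` (Windows only) is built on I/O completion ports, while `SelectorEventLoop` uses select/poll/epoll/kqueue. The helper names `make_event_loop`, `echo_once`, and `run_echo` are made up for this example; it is an illustration of the pattern, not the application described above.

```python
import asyncio
import socket
import sys


def make_event_loop():
    """Pick a completion-based loop on Windows, a readiness-based one elsewhere."""
    if sys.platform == "win32":
        # Built on I/O completion ports (overlapped I/O).
        return asyncio.ProactorEventLoop()
    # Built on select/poll/epoll/kqueue, whichever is available.
    return asyncio.SelectorEventLoop()


async def echo_once(message):
    """Send a message over a local socket pair and read it back."""
    loop = asyncio.get_running_loop()
    a, b = socket.socketpair()
    a.setblocking(False)
    b.setblocking(False)
    try:
        # The calling code is identical on both loops; only the
        # "transport driver" underneath differs.
        await loop.sock_sendall(a, message)
        return await loop.sock_recv(b, 1024)
    finally:
        a.close()
        b.close()


def run_echo(message=b"hello"):
    loop = make_event_loop()
    try:
        return loop.run_until_complete(echo_once(message))
    finally:
        loop.close()


if __name__ == "__main__":
    print(run_echo())
```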

Overall it seems like a nice to have but I'm struggling to extract any real
benefit.

Can anyone offer up some real good use cases I may be overlooking?

~~~
quux
There are both free and commercial X servers for Windows, and you can get a
linux app running under WSL to work with one of those X servers very easily. I
played with it a little bit and it worked fine.

------
coverband
With this feature, if you're a Linux developer, you're automatically a Windows
developer as well. Almost like being able to run all Android or iOS apps on
Windows phones.[1][2]

[1] [http://www.pcworld.com/article/3038652/windows/microsoft-kills-project-astoria-the-tool-designed-to-port-android-apps-to-windows-10.html](http://www.pcworld.com/article/3038652/windows/microsoft-kills-project-astoria-the-tool-designed-to-port-android-apps-to-windows-10.html)

[2] [https://developer.microsoft.com/en-us/windows/bridges/ios](https://developer.microsoft.com/en-us/windows/bridges/ios)

Edit: Now I am puzzled as to why this got downvoted?

~~~
besselheim
If you disassemble lxcore.sys you can still see hints of the Android subsystem
project that it grew from: the \Device\adss and /dev/adss devices, the
application name Microsoft.Windows.Subsystem.Adss, various function names
containing "Adss", and some other textual references to Android.

------
Animats
It's too bad that x86 hardware doesn't do virtualization as well as IBM
hardware. You can't stack VMs. That's exactly what's needed here - a non-
kernel VM that runs above NT but below the application.

~~~
pmalynin
You can have nested VMs.

[https://www.kernel.org/doc/Documentation/virtual/kvm/nested-vmx.txt](https://www.kernel.org/doc/Documentation/virtual/kvm/nested-vmx.txt)

~~~
overgryphon
Windows also now supports nested virtualization.

[https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/user_guide/nesting](https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/user_guide/nesting)

------
kevincox
> the Linux fork syscall has no _documented_ equivalent for Windows

Emphasis is mine. I wonder if this is something that cygwin could (ab)use.
Also I wonder why they would need this undocumented call.

~~~
xorblurb
Cygwin is layered above Win32. Win32 has no provision to nicely handle forks.
So even if there was an NT API fork syscall (I don't think there is one on
Windows 10: WSL does not use the NT API, and as far as I know there is no
longer any Posix/SFU/{whatever classic Unix NT subsystem of the day}), this
would not go anywhere.

~~~
pcwalton
> So even if there was an NT API fork syscall

You can do it with NtCreateProcess:
[https://groups.google.com/d/msg/microsoft.public.win32.programmer.kernel/ejtHCZmdyaI/k0D0Jinx9KwJ](https://groups.google.com/d/msg/microsoft.public.win32.programmer.kernel/ejtHCZmdyaI/k0D0Jinx9KwJ)

(The Win32 userland won't understand what you did, but you can still do it.)

~~~
xorblurb
Well, you can do it on some versions of Windows. On Windows 10, and even
future versions of Windows 10, not so sure...

~~~
lmm
Windows 10 is still windows NT. The NT native API is widely used these days.
It would be a huge departure for MS to stop supporting it in future versions
of windows.

~~~
xorblurb
Most parts of the NT API have never been officially documented, officially
supported, stable (in the "won't change" meaning), and the tiny parts that
have actually been documented come with caveats that they are susceptible to
change. Supporting fork through the NT API forever makes no sense if there are
no users anymore. They could continue to do it for no specific reason, just
because fork is internally needed by WSL for example, and so because it is
easy to export the capability through the NT API, but I really don't see why
they would _necessarily_ do that.

~~~
lmm
MS has historically maintained compatibility with even undocumented APIs. Of
course that could change.

------
bla2
Does anybody know how fork() is implemented? This blog post kind of sounds
like fork() would do the slow emulation of it through CreateProcess().

~~~
xorblurb
fork() is properly implemented by the NT kernel. WSL is _not_ layered above
Win32.
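
For reference, the semantics that make fork() awkward to emulate through CreateProcess(): the child starts as a copy of the parent at the point of the call, rather than as a fresh process loaded from an image. A minimal Python sketch of those POSIX semantics (Unix-like hosts only; `fork_demo` is a made-up name, and this says nothing about how WSL implements it internally):

```python
import os


def fork_demo():
    """Fork, let the child modify its copy of a variable, and compare."""
    value = ["parent"]
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: starts with a copy of the parent's memory at the fork point.
        os.close(read_fd)
        value[0] = "child"  # changes only the child's copy
        os.write(write_fd, value[0].encode())
        os._exit(0)
    # Parent: read what the child saw, then reap it.
    os.close(write_fd)
    child_view = os.read(read_fd, 64).decode()
    os.close(read_fd)
    os.waitpid(pid, 0)
    return value[0], child_view


if __name__ == "__main__":
    print(fork_demo())
```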

------
obnauticus
Excellent post, Jack.

------
quux
Interesting, I wonder how much overhead is added to syscalls to look up the
process type. Does NT still do this check when no WSL processes are running?

~~~
stuaxo
Pretty sure these are different entry points, so you wouldn't need to do
anything different for normal Windows processes whether WSL is running or not.

~~~
quux
I don't think so... both linux and windows binaries are using the same SYSCALL
cpu instruction, and thus must be going to the same handler in the NT kernel.

------
_RPM
Does Microsoft document all system calls?

~~~
detaro
They document the WinAPI, but how that talks to the kernel is not documented.
You can talk to it directly if you want, but there is nothing from Microsoft
on how to do that. So if you see those as the true system calls, they are not
documented at all.

~~~
xorblurb
Well, tiny parts of the NT API (callable from userspace) are documented, but
then often with the caveat that they are not stable (in practice, even some
undocumented ones can be considered stable if used by enough programs in the
wild, especially if they are simple and standalone and have no Win32
equivalent)

The very precise mechanism, though, is extremely unstable. For example
virtually every release of Windows (even sometimes SP) changes the syscall
numbers. You have to go through the ntdll, which is kind of a more heavyweight
version of the Linux VDSO. (The NTDLL approach was invented way before the
VDSO, though)

~~~
therein
Ntdll is similar to VDSO in the sense that it is loaded into the memory space
of every userspace process. Even that I think might have exceptions on the
Linux side. Either way, unlike VDSO, Ntdll actually does export functions
potentially useful when called from the program. Here is an interesting read.
[http://undocumented.ntinternals.net/](http://undocumented.ntinternals.net/)

~~~
xorblurb
What do you think the VDSO is used for? It also exports "functions potentially
useful when called from the program".

The approach is a little different though; ntdll exports all of the NT API,
and you need to go through it to reach the NT API in a somewhat more stable way
than using syscall numbers. OTOH, the VDSO exports only virtual syscalls that
gain (or have gained in the past) from being performed in userspace, and even
then corresponding syscalls still exist in the kernel, with both stable
numbers and even a stable API.
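
The "mapped into every process" part is easy to check on Linux by looking for the `[vdso]` entry in a process's memory map. A small sketch, assuming a Linux host with `/proc` mounted (`find_vdso_mappings` is a made-up helper; some sandboxes may not map a vDSO at all, in which case the list is simply empty):

```python
def find_vdso_mappings(maps_path="/proc/self/maps"):
    """Return the lines of this process's memory map that mention the vDSO."""
    try:
        with open(maps_path) as maps:
            return [line.strip() for line in maps if "[vdso]" in line]
    except OSError:
        # Not Linux, or /proc is unavailable.
        return []


if __name__ == "__main__":
    for line in find_vdso_mappings():
        print(line)
```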

------
davidgerard
Yes, yes, but can we run Wine on it?

------
negus
wtf is "pico process" and "pico driver"?

~~~
wereHamster
[https://blogs.msdn.microsoft.com/wsl/2016/05/23/pico-process-overview/](https://blogs.msdn.microsoft.com/wsl/2016/05/23/pico-process-overview/)

------
prirun
Step 1: embrace

------
smegel
Funny they don't mention ioctl.

------
vegabook
Next step is Microsoft basically needs to turn Windows into a flavour of
Linux. If they don't, they're under massive pincer threat from Android and
Chrome, which are rapidly becoming the consumer endpoints of the future.
Windows is about to "do an IBM" and throw away a market that it created. See
PS/2 and OS/2.

They should probably just buy Canonical. That would put the shivers into
Google, properly.

~~~
mxuribe
Funny, years ago I would have been _reflexively_ flabbergasted at the thought of
microsoft buying canonical (or any linux distro producer)...but actually
thinking on that concept, and seeing the recent (perhaps less-than-hostile)
approach that microsoft has taken towards open source and linux, that wouldn't
be a bad idea. I mean if microsoft could have both offerings - for windows
servers and ubuntu-installed servers - i suppose that would be a very smart
business move. Assuming they don't actually butcher or deny resources to
whatever linux company they would buy, i could see several benefits - not only
to microsoft but to developers, system integrators, etc. worldwide. Hey if a
side benefit is that it would spur the market (a la google, apple, etc.) a
little - to the benefit of us civilians - that's cool too.

~~~
orionblastar
I think Microsoft should do what Apple did with BSD Unix aka Nextstep and
merge it with their old OS.

Microsoft should take the Windows GUI and put it over Linux as a desktop
manager. Microsoft could sell the Windows GUI for Linux users that want to run
Windows apps.

~~~
vegabook
Could not agree more. Windows WM as an option on Linux is a clear and logical
strategy.

------
zxcvcxz
I used to run Linux in a VM on windows and use Chocolatey for package
management and cygwin and powershell etc, then I realized I was just trying to
make Windows into Linux. Seems to be the way things are going and with the
addition of the linux subsystem it kind of proves that Windows really isn't a
good OS on its own, especially not for developers.

I wish Windows/MS would abandon NT and just create a Linux distro. I don't
know anyone who particularly likes NT and jamming multiple systems together
seems like an awful idea.

Windows services and Linux services likely won't play nice together (think
long file paths created by Linux services and other incompatibilities), for
them to be 100% backward compatible they need to not only make Windows
compatible with the things Linux outputs, but Linux compatible with the things
windows services output, and to keep the Linux people from figuring out how to
use Windows on Linux systems they'd need to make a lot of what they do closed
source.

So I don't see a Linux+Windows setup being deployed for production. It's cool
for developers, but even then you can't do much real world stuff that utilizes
both windows and Linux. If you're only taking advantage of one system then
what's the point of having two?

I went ahead and made the switch to Linux since I was trying to make Windows
behave just like Linux.

~~~
pcwalton
> I wish Windows/MS would abandon NT and just create a Linux distro. I don't
> know anyone who particularly likes NT and jamming multiple systems together
> seems like an awful idea.

I do. The NT kernel is pretty clean and well architected. (Yes, there are
mistakes and cruft in it, but Unix has that in spades.) It's not "jamming
multiple systems together"; an explicit design goal of the NT kernel was to
support multiple userland APIs in a unified manner. Darwin is a much better
example of a messy kernel, with Mach and FreeBSD mashed together in a way that
neither was designed for.

It's the Win32 API that is the real mess. Having a better officially supported
API to talk to the NT kernel can only be a good thing, from my point of view.

~~~
xorblurb
Well, large parts of the NT API are very close to the Win32 API for obvious
reasons, and so are often in the realm of dozens of params and even crazier
Ex functions. Internally there are redundancies that do not make much sense
(like multiple versions of mutex or spinlock depending on which parts of
kernel space use them, IIRC), and some whole-picture aspects of Windows make
no sense at all given the architectural cost they induce (Winsock split in half
between userspace and obviously needed kernel support is just completely
utterly crazy, beyond repair, it makes so little sense you want to go back in
time and explain to the designer of that mess how stupid this is). The initial
approach of NT subsystems was absolutely insane (hard dep on a NT API core, so
can't do emulation with classic NT subsystems - so either limited to OS having
some technical similarities like OS/2, or very small communities when doing a
new target like the Posix or SFU was) -- WSL makes complete sense, though, but
it is maybe a little late to the party. Classic NT subsystems are of so little
use that MS did not even use them for their own Metro and then UWP things,
even though they would like very hard to distinguish that more from Win32 and
make the world consider Win32 as legacy. I've read the original paper
motivating to put Posix in an NT subsystem, and it contained no real strong
point, only repeated incantations that this will be better in an NT subsystem
and worse if done otherwise (well for fork this is obvious, but the paper was
not even focused on that), with none of the limitations I've explained above
ever considered.

Still considering the whole system, an unstable user kernel interface has few
advantages and tons of drawbacks. MS is extremely late to the chroot and then
container party because of that (and let's remember that the core technology
behind WSL emerged because they wanted to solve the chroot aside userspace
system on their OS in the first place, _NOT_ because they wanted to run Linux
binaries) -- so yet another point why classic NT subsystems are useless.

Back to core kernel stuff, IRQL model is shit. Does not make any sense when
you consider what really happens, and you can't really use arbitrary multiple
levels. It seems cute and clean and all of that, but Linux approach of top and
bottom halves and kernel and user threads might seem messy but is actually far
more usable. Another point: now everybody uses multiprocessor computers, but
back in the day the multiple HALs were also a false good idea. MS recognizes it
now and only wants to handle ACPI computers, even on ARM. Other OSes do all
kind of computers... Cutler pretended to not like the "everything is a file"
approach, but NT does basically the same thing with "everything is a handle".
And soon enough, you hit exactly the same conceptual limitations (except not
in the same places) that not everything is actually the same, so that cute
abstraction leaks soon enough (well, it does in any OS).

On a more result oriented approach, one of the things WSL makes clear is that
file operations are very slow (just compare an exactly identical file heavy
workload under WSL and then under a real Linux)

So of course there are (probably) some good parts, like in any mainstream
kernel, but there are also some quite dark corners, and I am not an expert
about all architectural design of NT but I'm not a fan of the parts I know,
and I strongly prefer the Linux way to do equivalent things.

~~~
wfunction
> Cutler pretended to not like the "everything is a file" approach, but NT
> does basically the same thing with "everything is a handle". And soon
> enough, you hit exactly the same conceptual limitations (except not in the
> same places) that not everything is actually the same, so that cute
> abstraction leaks soon enough (well, it does in any OS).

Explain? Pretty much the only thing you can do with a handle is to release it.
That's very different from a file, which you can read, write, delete, modify,
add metadata to, etc... handles aren't even an abstraction over anything,
they're just a resource management mechanism.

~~~
xorblurb
You are right, but those points are details. FDs under modern Unixes (esp.
Linux, but probably others) serve exactly the same purpose (resource
management). FDs for which read/write can't be used just don't define those
(same principle for other syscalls) -- similarly if you try to NtReadFile on
an incompatible Handle it will also give you an error back. Both are in a
single numbering space per process. NT largely makes use of NtReadFile /
NtWriteFile to communicate with drivers, even in quite core Windows components
(Winsock and AFD). And NT Handles do serve as at least one abstraction (that I
know of): they can be signaled, and waited for with WaitFor*Objects.
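
The Unix half of that claim can be seen in a few lines of Python (a sketch for a Unix-like host; `fd_demo` is a made-up name). A pipe, a socket, and a directory all get small integers from the same per-process table, and read() on an fd that doesn't support it just returns an error:

```python
import errno
import os
import socket


def fd_demo():
    """Show that unrelated kernel objects share one per-process fd space."""
    pipe_r, pipe_w = os.pipe()
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    dir_fd = os.open(".", os.O_RDONLY)
    fds = [pipe_r, pipe_w, sock.fileno(), dir_fd]
    try:
        os.read(dir_fd, 16)  # read() is not defined for a directory
        error_name = None
    except OSError as exc:
        error_name = errno.errorcode[exc.errno]
    finally:
        os.close(pipe_r)
        os.close(pipe_w)
        os.close(dir_fd)
        sock.close()
    return fds, error_name


if __name__ == "__main__":
    print(fd_demo())
```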

So the naming distinction is quite arbitrary.

~~~
wfunction
> You are right, but those points are details.

Uh, no, they are _very crucial_ details. For example, it means the difference
between letting root delete /dev/null like any other "file" on Linux, versus
an admin not being able to delete \Device\Null on Windows because it isn't a
"file". The nonsense Linux lets you do because it treats everything like a
"file" is the problem here. It's not a naming issue.

~~~
strcat
Linux has plenty of file descriptor types that do not correspond to a path,
along with virtual file systems where files cannot be deleted...

Your example of device files is hardly universal, and the way it works is
useful.

~~~
wfunction
And to give you another example, look at how many people bricked their
computers because Linux decided EFI variables were files. You can blame the
vendors all you want, but the reality is this would not have happened (and,
mind you, it would have been INTUITIVE to every damn user) if the OS was sane
and just let people use efibootmgr instead of treating every bit and its
mother as files. Just because you have a gun doesn't mean you HAVE to try
shooting yourself, you know? That holds even if the manufacturer was supposed
to have put a safety lock on the trigger, by the way. Sometimes some things
just don't make sense, if that makes sense.

~~~
jamespo
How many people really did this compared to eg windows users attacked by
cryptolocker?

------
dragonbonheur
.

~~~
geofft
"As Brother Francis readily admitted, his mastery of pre-Deluge English was
far from masterful yet. The way nouns could sometimes modify other nouns in
that tongue had always been one of his weak points. In Latin, as in most
simple dialects of the region, a construction like _servus puer_ meant about
the same thing as _puer servus_ , and even in English _slave boy_ meant _boy
slave_. But there the similarity ended. He had finally learned that _house
cat_ did not mean _cat house_ , and that a dative of purpose or possession, as
in _mihi amicus_ , was somehow conveyed by _dog food_ or _sentry box_ even
without inflection. But what of a triple appositive like _fallout survival
shelter_? Brother Francis shook his head."

~~~
stuaxo
What is this from ?

~~~
geofft
Walter Miller's _A Canticle for Leibowitz_.

------
l3m0ndr0p
Pretty neat stuff. I think that MS should just create their own Linux
Distribution & port all MS products. Get rid of the Windows NT Kernel. I
believe it's outdated & doesn't have the same update cycle that the Linux
Kernel has.

Why run a Linux Application/binary on a windows server OS? When you can just
run it on Linux OS and get better performance & stability.

~~~
jjtheblunt
What makes you believe it's outdated?

~~~
zxcvcxz
Can you show me the source so I can check?

~~~
UK-AL
Actually there was a leak for 2000, most critics said it was surprisingly
good.

~~~
trentnelson
There are leaks galore. NT4, 2000, and more recently, the Windows Research
Kit. Just google something like 'apcobj.c' and see. (Hah, first link was a
github repo!)

