
Midipix: Posix for Windows - kryptiskt
http://midipix.org/
======
mpixorg
The project's pre-pre-alpha will be out very soon (in a week or less) and will
already give a nice taste of its design and speed. The primary focus thus far
has been major challenges (toolchain, process creation and initialization,
signals, glue layer between posix system call layer and libc, etc.). This
means that while you will be able to test functions such as fork(), execve(),
mmap(), or fopen() with a utf-8 path, some of the easy/easier-to-implement
system calls will "surprisingly" still be missing.

For one reason or another, fork() has become something of a fetish in many
discussions of posix on windows. The interface surely has its place, and
figuring out how to efficiently implement it took a huge amount of effort, yet
the vast majority of applications do not truly need it. The fact of the matter
is that even on linux, where fork(2) is natively supported, the sequence
fork+execve is more costly than clone+execve (where clone's flags are CLONE_VM
&& !CLONE_THREAD). For additional reference, see for instance the
implementation of posix_spawn in musl libc
([http://git.musl-libc.org/cgit/musl/tree/src/process/posix_sp...](http://git.musl-libc.org/cgit/musl/tree/src/process/posix_spawn.c)).
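
To make the alternative to fork+execve concrete, here is a minimal sketch in Python, whose `os.posix_spawn` wraps the C library's posix_spawn. The helper name `spawn_and_wait` is made up for illustration, and the sketch says nothing about how midipix itself creates processes.

```python
import os
import sys


def spawn_and_wait(argv):
    """Start argv[0] via posix_spawn (no fork() call of our own) and return its exit code."""
    # os.posix_spawn(path, argv, env) returns the pid of the new process.
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # Terminated by a signal: report it as a negative number.
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    code = spawn_and_wait([sys.executable, "-c", "import sys; sys.exit(7)"])
    print("child exit code:", code)
```

The caller never runs any code of its own in the child, which is the property that lets a libc pick a cheaper primitive than a full fork underneath.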

If high performance of the posix layer were not possible, the project would
not have existed. Among the factors that make high performance possible are 1)
direct use of kernel interfaces (aka the Native API, where most of the runtime
layer is written as a user-space driver), 2) utf-8 as the primary supported
multibyte encoding as a foundational concept rather than an afterthought, and
3) a tls implementation that matches the native tls facility in speed.

~~~
the8472
> For one or the other reason, fork() has become in many discussion of posix
> on windows something of a fetish.

how about this one?

fopen + mmap + unlink

Afaik windows APIs forbid deleting mmaped files.
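
A minimal sketch of that sequence, written in Python against the POSIX behaviour only. The function name `read_mapping_after_unlink` is made up for illustration, and the sketch makes no claim about what the same steps do on Windows or under midipix.

```python
import mmap
import os
import tempfile


def read_mapping_after_unlink(payload):
    """Map a file, unlink it, then read the bytes back through the mapping.

    payload must be non-empty, since an empty file cannot be mapped.
    Returns (bytes read through the mapping, whether the path still exists).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        mapping = mmap.mmap(fd, len(payload), access=mmap.ACCESS_READ)
    finally:
        os.close(fd)  # the mapping does not need our descriptor any more
    os.unlink(path)   # on POSIX this removes the name, not the mapped data
    try:
        return bytes(mapping[:]), os.path.exists(path)
    finally:
        mapping.close()


if __name__ == "__main__":
    print(read_mapping_after_unlink(b"still readable"))
```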

~~~
mpixorg
With FILE_SHARE_DELETE there shouldn't be any particular problem, but I
certainly want to test this with the posix flags you had in mind. Do you
happen to have a minimal example that you deem problematic on Windows?

~~~
wfunction
If you delete the backing file of (or perhaps even close its handle -- I don't
remember) a memory-mapped file, you get VERY bizarre behavior in Windows. I
don't remember what it was, but I remember I did it and the behavior I got was
quite nonsensical.

------
wfunction
This project sounds too good to be true. I would be quite astonished if they
can make fork() work seamlessly without an amazingly high performance cost. In
my experience the only way to do such a thing is to use NtCreateProcess(), but
that function itself seems impossible for anyone besides Microsoft to use
correctly. (I have tried numerous times and failed, and so have many others.)

~~~
TickleSteve
[[https://www.cygwin.com/faq.html#faq.api.fork](https://www.cygwin.com/faq.html#faq.api.fork)]

Here's how it works:

Parent initializes a space in the Cygwin process table for child. Parent
creates child suspended using Win32 CreateProcess call, giving the same path
it was invoked with itself. Parent calls setjmp to save its own context and
then sets a pointer to this in the Cygwin shared memory area (shared among all
Cygwin tasks). Parent fills in the child's .data and .bss subsections by
copying from its own address space into the suspended child's address space.
Parent then starts the child. Parent waits on mutex for child to get to safe
point. Child starts and discovers if it has been forked and then longjumps using
the saved jump buffer. Child sets mutex parent is waiting on and then blocks
on another mutex waiting for parent to fill in its stack and heap. Parent
notices child is in safe area, copies stack and heap from itself into child,
releases the mutex the child is waiting on and returns from the fork call.
Child wakes from blocking on mutex, recreates any mmapped areas passed to it
via shared area and then returns from fork itself.
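
The copying in the steps above exists to reproduce one guarantee of fork(): the child starts with its own copy of the parent's memory, and later writes on either side stay private. A minimal sketch of that guarantee on a system with a native fork (the function name `fork_and_mutate` is made up for illustration, and this is not Cygwin's code):

```python
import os


def fork_and_mutate():
    """Return (value the child sees after its write, value the parent still sees)."""
    state = {"counter": 1}
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: works on its own copy of `state`.
        try:
            os.close(r)
            state["counter"] += 41
            os.write(w, str(state["counter"]).encode())
        finally:
            os._exit(0)  # never fall back into the parent's code path
    # Parent: read what the child reported, then look at our own copy.
    os.close(w)
    child_value = int(os.read(r, 32).decode())
    os.close(r)
    os.waitpid(pid, 0)
    return child_value, state["counter"]


if __name__ == "__main__":
    print(fork_and_mutate())
```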

~~~
wfunction
And where is the copy-on-write happening here? If I recall correctly, the
problem was that attempting to reproduce fork()'s copy-on-write behavior on
Windows resulted in a massive performance hit. (Which, IIRC, contributes to
the slowness of Cygwin.)

EDIT: Oh, I'd skipped the part above "here's how it works"... that just says
exactly the same thing I said above.

~~~
TickleSteve
I've personally never had any real performance issues with Cygwin. What are
you doing that involves lots of forking?

For me, Cygwin has always been pretty much native-speed, it's only an API
translation layer.

~~~
wfunction
> What are you doing that involves lots of forking?

Running scripts?

Also, try enumerating the files in a large directory hierarchy and compare it
with native Windows speed (or Linux speed), it's not even comparable.

~~~
TickleSteve
Fair enough, it probably won't match native for that.

Personally tho, speed has always been sufficient for me to never notice. I use
it most days, but I don't do comparisons.

~~~
wfunction
"Won't match native" is quite the understatement.

It's not a question of a 40% speed difference, more like a > 400% speed
difference last time I checked.

------
antics
I work for Microsoft. As you can imagine, this is a problem we have devoted a
considerable amount of energy to thinking about. A lot of this discussion has
revolved around comparisons to previous solutions (particularly Cygwin and
Interix/SUA), but I think it's worth backing up and thinking about what the
fundamental limitations are of implementing something approaching POSIX
compliance in userspace. I will try to tell the story more or less as it is
told internally by the people who built these things from scratch.

To be transparent, I am sort of worried that the authors have told us in this
thread that they think the number of applications that "really need" `fork` is
small, and that the discussions about POSIX subsystems on Windows are
overindexed on discussions about `fork` without justification.

The truth of the matter is that the semantics of `fork` infect every API that
creates process state. Every library, every syscall that creates process state
will have a clear answer for what happens when a process that is using that
bit of process state calls `fork`. This includes file and socket management,
IPC stuff, threads, signals, and so on. To make `fork` work properly on
Windows, you absolutely need to replicate what UNIX does here, or you will
break a lot of apps, because make no mistake, a _lot_ of apps depend on these
semantics to work correctly. `fork` is not just a function that occasionally
gets called and sometimes needs to be fast for people who are calling it. It
is core to the semantics of the POSIX API, and if you don't treat it as such
you are in for a bad time later.
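
One small instance of that, sketched in Python on a system with a native fork: a descriptor inherited across fork refers to the same open file description, so the file offset is shared between parent and child. The function name `read_across_fork` is made up for illustration.

```python
import os
import tempfile


def read_across_fork():
    """Child reads 6 bytes; parent then reads 5 more from the same descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello world")
        os.lseek(fd, 0, os.SEEK_SET)
        pid = os.fork()
        if pid == 0:
            # Child: this read advances the offset shared with the parent.
            try:
                os.read(fd, 6)
            finally:
                os._exit(0)
        os.waitpid(pid, 0)
        # Parent: continues where the child stopped, not at offset 0.
        return os.read(fd, 5)
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(read_across_fork())
```

Programs such as shells rely on exactly this sharing, so an emulation layer has to answer the same question for every kind of handle it exposes.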

Perhaps more worrying, though, is that there is a serious impedance mismatch
between the UNIX and Windows models of asynchrony. Particularly in the case of
sockets and signals, this difference is immense, and I am extremely skeptical
we will find a good (or even close to production-worthy) solution in usermode.
Maybe these good folks have found something I have missed; to me the approach
just seems doomed.

And of course, this is all just the start of the problem. It is a much longer
trudge to solve the "real" problem, which is immense. The original Interix
POSIX subsystem in NT 3.5 (I'm told) had a `fork` that was just barely enough
to pass the 1003.1-1990 validation suite, and fell over quickly if you pushed
much harder. But they didn't have to mess with the `fork` implementation to
turn that 67kloc into the more robust and conformant SFU 3.x; it was mostly a
long tail of things external to fork that just needed to be ironed out.

On top of that, to really have `fork` interop, you would likely need to
redefine every Win32 API so it did the "right thing" under UNIXy things like
signal delivery and fork, and that is a truly, terrifyingly tall order.

~~~
dalias
These are some good observations. Note that on modern POSIX, calling anything
but async-signal-safe functions after forking in a multi-threaded process
results in undefined behavior. I think it's totally reasonable to consider any
programs using the WinAPI (not just POSIX functions provided through midipix)
as being formally multi-threaded, and to consider the WinAPI non-AS-safe. I'm
not 100% sure what midipix is doing in this regard, but my advice on the
project (I'm the primary author/maintainer of musl libc, which midipix is
using) has been not to worry about making arbitrary functions work after fork,
but only supporting the things that POSIX requires to work. My view in general
is that fork should be phased out, but posix_spawn is not sufficiently
powerful yet to replace all uses of fork+exec -- it can't do advanced uid
changes, setsid, resource limits, etc., and posix_spawn can't be used
effectively from an async signal context because the API (attributes and file
actions) inherently involves allocation.
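
To illustrate the kind of child-side setup described above, here is a sketch of the fork, adjust, exec pattern in Python. The function name `run_in_new_session` and the default limit of 64 are made up for illustration. The Python runtime is of course not async-signal-safe, so this only shows the order of the calls, not a pattern to copy into a multi-threaded C program.

```python
import os
import resource
import sys


def run_in_new_session(argv, max_open_files=64):
    """fork, adjust the child (new session, lower fd limit), then exec argv.

    Returns (child pid, bytes the child wrote to stdout).
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)
            os.setsid()  # become leader of a new session
            _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
            if hard == resource.RLIM_INFINITY:
                soft = max_open_files
            else:
                soft = min(max_open_files, hard)
            resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
            os.dup2(w, 1)  # child's stdout goes to the pipe
            os.execv(argv[0], argv)
        finally:
            os._exit(127)  # only reached if something above failed
    os.close(w)
    chunks = []
    while True:
        chunk = os.read(r, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(r)
    os.waitpid(pid, 0)
    return pid, b"".join(chunks)


if __name__ == "__main__":
    script = "import os; print(os.getsid(0) == os.getpid())"
    print(run_in_new_session([sys.executable, "-c", script]))
```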

------
opaque_salmon
That's an interesting name choice that might make it hard for me to remember
in the future, do you happen to know why they chose it?

In any case, I welcome a posix interface for windows, it would be a cool tool
to have, especially in the case of cross-platform utilities.

~~~
TickleSteve
Any reason Cygwin isn't suitable? That's the go-to project for this type of stuff.

~~~
dalias
The biggest problem with Cygwin is that programs linked with Cygwin inherit
global state from a Cygwin installation on the system they're running on.
If you want to produce a Windows program that just runs on any system you
install it on using Cygwin, it will work right for most users, but if a power
user who has Cygwin installed and has their own custom mounts, options (like
different binary/text mode settings), etc. tries to run your program, it might
break spectacularly. This makes Cygwin a really poor choice for making
binaries you want to distribute as standalone programs.

Aside from that, Cygwin tries too hard to be a complete Unix environment on
Windows, whereas midipix just gives you enough to use interfaces that were
standardized in POSIX as a reasonable, uniform API for all operating systems
to provide. Some functions go beyond that, but you don't have to use them. And
even some things that are mandatory in POSIX are optional in midipix; as I
understand it, you can choose at build time whether you want the overhead of
being able to support tty devices (and the associated semantics like job
control, signals from the controlling tty, etc.).

~~~
TickleSteve
It used to be like that... not anymore.

Cygwin works fine with multiple installations these days
[[https://cygwin.com/faq/faq.html#faq.using.multiple-copies](https://cygwin.com/faq/faq.html#faq.using.multiple-copies)]

The thing that won't work is if you try to mix and match a dll from one
installation with binaries from another, but I think you can agree that that
situation is fair. You just need your paths set up correctly.

------
hsivonen
Having fopen() that takes utf-8 on Windows is a huge deal. I already want to
investigate using this for Firefox on Windows for this reason alone.

~~~
dalias
Yes, this is the really big deal that everyone focused on fork and mmap
semantics and other details is overlooking. Having midipix as the means of
producing Windows versions of cross-platform software like Firefox should make
it possible to remove a lot of ugly #ifdeffery and/or whole "portability
layers" that are avoiding the standard functions like fopen because of a lack
of Unicode support on Windows.

~~~
cygx
I assume midipix uses the same approach as Scheme 48, Racket and Rust[1] to
deal with ill-formed UTF-16?

[1] [https://simonsapin.github.io/wtf-8/](https://simonsapin.github.io/wtf-8/)

~~~
mpixorg
Actually no... the application makes all calls using utf-8, and is expected to
provide it in a well-formed manner so that the system call layer could convert
it to utf-16. In the reverse route, where utf-16 is read by the system call
layer and then converted to utf-8 (getdents(2) and friends), it is expected
that file names be in well-formed utf-16. For a file-system volume to have
ill-formed utf-16 name entries would make for an interesting case... have not
encountered that yet, but will certainly look into that.
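
A rough sketch of the two conversion directions described above, using Python's strict codecs as a stand-in. The function names are made up for illustration, and this is not midipix's implementation; it only shows what "well-formed in, well-formed out" means in practice, including the ill-formed UTF-16 case.

```python
def utf8_to_utf16le(name_utf8):
    """Application -> system: decode strict UTF-8, re-encode as UTF-16LE.

    Raises UnicodeDecodeError if the input is not well-formed UTF-8.
    """
    return name_utf8.decode("utf-8").encode("utf-16-le")


def utf16le_to_utf8(name_utf16):
    """System -> application: return UTF-8 bytes, or None for ill-formed UTF-16."""
    try:
        return name_utf16.decode("utf-16-le").encode("utf-8")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    wide = utf8_to_utf16le("é.txt".encode("utf-8"))
    print(wide, utf16le_to_utf8(wide))
    # A high surrogate (U+D800) followed by 'a' is not well-formed UTF-16.
    print(utf16le_to_utf8(b"\x00\xd8a\x00"))
```

A strict converter like this simply has no UTF-8 spelling for a name containing an unpaired surrogate, which is the gap that WTF-8 was designed to cover.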

------
exDM69
This is very interesting, I could certainly use this for porting some of my
projects to Windows. But what the website is missing is a "getting started"
doc that explains how to set up a toolchain for cross (or native) compiling.

------
TickleSteve
Cygwin anyone?

Cygwin is just a DLL for API translation with an associated package manager
(though many people think it's more heavyweight than that and consequently
don't like it).

~~~
beagle3
Cygwin is just a DLL, and it works reasonably well, but it does come with some
nontrivial baggage:

\- It's GPL licensed, or you buy a license from RedHat; either might be too
onerous for the people who develop midipix

\- You can only have one cygwin1.dll in memory at a given time; If you have
two cygwin using programs, they must both use exactly the same version of the
cygwin1.dll; Which means that you can't just distribute a self-contained
cygwin program and expect it to work.

\- It's a bit clunky at the edges, with the mounts (it looks like you have a
/usr directory on the root, and you can see it with "ls" or cygwin "dir" but
not cmd "dir", for example; user integration is a bit clunky).

I'm not sure midipix will be better - some problems are inherent. However,
cygwin was designed around Win95/NT4 deficiencies some 20 years ago. It has
evolved very gracefully, but it's possible a modern version without all that
legacy will work better.

~~~
TickleSteve
Like you say, some issues are inherent in the problem-space.

but... multiple versions of Cygwin can coexist these days. It did use to be
an issue, granted. tho these days quite a few programs distribute their own
cygwin dll and tools.

I agree, licensing may be an issue for some people.

~~~
beagle3
Thanks. It's good to know that the multiple version issue has been addressed -
although from the FAQ it sounds like there are still a few (unlikely) corner
cases one needs to keep in mind.

------
tkubacki
Deploying Linux apps on Windows Server is easier these days (Hyper-V).

If you really need a decent POSIX OS on Windows just use virtualization.

~~~
cwyers
I'm sympathetic to people who want to develop portable GUI apps across
Linux/Windows/OS X. If you're developing something like Inkscape or
LibreOffice, most of your potential user base is on Windows and they are NOT
going to install a Linux VM just to run your app.

But yeah, there are a lot of people on HN who constantly post about how their
work issued them a cat and they have all these tools they install on their cat
to make it act more like a dog but it makes a really terrible dog anyway and
so cats must be crap. Uh, it's a cat. If you need a dog, go get a dog. Or run
a dog on Hyper-V if you need a dog and a cat at the same time. My favorite is
when people talk about how their favorite scheme they use to get their dog to
catch mice doesn't work for their cat, therefore it's evidence that cats are
terrible at catching mice.

~~~
markbnj
>> I'm sympathetic to people who want to develop portable GUI apps across
Linux/Windows/OS X.

I am too, but I wonder whether it is even a relevant goal anymore. It's one of
those things that has always been a topic of conversation, and has never
really happened in a way that a mass of end users has adopted. Everyone who
ever ran a java desktop app on Windows back in the day probably has some
theory of why, but I think it was mostly because there was never really a
strong need on the user side, despite the idea being so attractive to devs.

Now... I don't even know what the world of work is going to look like in ten
years... Android? Windows? Linux? Mac? Tablet? Phone? Laptop? All of the above
and more, I assume, and the whole interface portability thing seems to be
heading down the exact same road that it headed down on the desktop, i.e.
either write native or accept some lesser solution.

------
zx2c4
A related project, flinux -
[https://github.com/wishstudio/flinux](https://github.com/wishstudio/flinux)
\- allows you to run unmodified linux binaries on Windows. It is in the
process of implementing all the linux syscalls...

------
auganov
I hope passing file descriptors over Unix sockets will work, unlike in
Cygwin. I really want to use ControlMaster in OpenSSH. Btw, if anyone knows a
decent workaround I'll be super thankful as well.
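
For readers unfamiliar with the feature, this is what descriptor passing looks like on a system that supports it. The sketch uses Python 3.9+ (`socket.send_fds` / `socket.recv_fds`, which use SCM_RIGHTS ancillary data) and keeps both ends in one process for brevity; real uses such as ControlMaster pass the descriptor between processes. The function name `pass_fd_over_socketpair` is made up for illustration.

```python
import os
import socket


def pass_fd_over_socketpair(payload):
    """Send the read end of a pipe over a Unix socket and read through the copy.

    payload must be non-empty and small enough to fit in the pipe buffer.
    Returns (bytes read via the received descriptor, whether it is a new fd number).
    """
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()
    try:
        os.write(w, payload)
        os.close(w)
        # The descriptor travels as ancillary data next to a small message.
        socket.send_fds(left, [b"fd"], [r])
        _msg, fds, _flags, _addr = socket.recv_fds(right, 16, 1)
        received = fds[0]  # a new descriptor referring to the same pipe
        try:
            return os.read(received, len(payload)), received != r
        finally:
            os.close(received)
    finally:
        left.close()
        right.close()
        os.close(r)


if __name__ == "__main__":
    print(pass_fd_over_socketpair(b"via passed fd"))
```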

------
kickingvegas
Interix did this, and then they were shelved.

[http://en.wikipedia.org/wiki/Interix](http://en.wikipedia.org/wiki/Interix)

~~~
dalias
Interix is a lot different because it requires installing a system component,
which requires administrator privileges. Midipix produces applications that
(at least as I understand it) run on basically any NT-based Windows with no
special privileges.

~~~
kickingvegas
Sure, but to echo userbinator's comment above, compatibility layers are
limited. Interix was architected to run directly on top of HAL.

------
ised
Was POSIX ever really intended to apply to a non-UNIX OS, e.g., VMS?

The Windows kernel is based on the VMS kernel, right?

And NTFS is based on the VMS filesystem?

[http://en.wikipedia.org/wiki/David_cutler](http://en.wikipedia.org/wiki/David_cutler)

I would have been happy with VMS on the PC.

Instead we got Windoze. How much of our lives has this monstrosity wasted?
Just let it die.

Do daemontools' supervise and svscan need fork()?

~~~
JoeAltmaier
Posix was an API standard, implementable anywhere. The very first POSIX
implementation I assisted with ran on CTOS, which had a message-passing
kernel and built-in networking. Nothing like Unix or Windows.

~~~
ised
Somehow it just became strongly associated with UNIX?

Or UNIX became the embodiment of the concept of "run anywhere"?

------
zvrba
No word about debugging. How are they going to port, say, gdb? Are they going
to implement ptrace, although it doesn't conform to any standard? If so, it
also raises the question of WHICH flavor of ptrace?

Also, POSIX has a number of optional features, like realtime and queued
signals. Which features exactly will be supported?

------
fithisux
Very interesting. Actually, exactly what I was looking for. But how does it
compare with uwin?

------
edwintorok
Maybe I'm missing something obvious but where do I download the installer? Or
do I have to (cross-)build it all from source using mingw-w64?

~~~
mpixorg
The general design allows for building the toolchain in its entirety (libc,
libgcc, libstdc++) independent of the runtime components. The idea behind
this is that changes to the toolchain will fairly soon become rare [but see
below], whereas changes to the runtime will be very frequent at least in the
first couple of years.

Building the cross-compiler requires a native compiler and a shell environment
that is capable of building gcc. The build process is trivial, and thus far
has been tested on several Linux flavors (just asked that this also gets
tested on BSD and OSX). Building gcc in an msys/cygwin environment is always a
bit more tricky, so we have not spent much time trying that. Instead, we are
working hard to become self-hosted as soon as possible.

\+ building the cross-compiler:

git clone git://midipix.org/cbb/cbb-gcc && cd cbb-gcc && ./cbb-midipix-cross-gcc.sh

yep, that's that. This will use $HOME/temp as a temporary folder, and install
the toolchain to $HOME/midipix. Make sure you add $HOME/midipix/bin to your
path in order to use the toolchain. As with all other gcc builds, you need to
have the usual dependencies on the build system (gmp,mpfr,mpc,libelf,texinfo)
and a working shell environment.

\+ pre-pre-alpha, radical changes: some major changes to the toolchain are
underway. If you find it hard to believe that building the cross-toolchain is
that easy please go ahead and run the above commands from a nearby shell, but
please also rebuild it in a day or two (look for a commit message mentioning
"automatic creation of GOT entries" :))

\+ as a kind reminder, cross-compiling is just for building applications;
testing them can only be done with the runtime library, which is not out yet.

~~~
mpixorg
Almost forgot to mention: the current cross-compiler is based on gcc-4.6.4
since that is the last modern gcc which does not depend on C++, and is
therefore easier and faster to build and run. Porting subsequent gcc versions,
and likewise clang and cparser, is a high priority, and will follow the
initial release of the runtime components.

------
greggman
I'm clueless in this area but could git be linked with this so we get a fast
good git on windows?

~~~
ianlevesque
I've never had a problem with msysGit [1]

[1] [https://msysgit.github.io/](https://msysgit.github.io/)

~~~
greggman
Compare it to linux git, it's about 10-20x slower on same hardware

~~~
wumbernang
That's probably the NTFS MFT rather than the actual runtime. Git plops lots of
small files on the disk which end up in the MFT. The MFT is really slow. We
have the same trouble with SVN.

You can tune this a bit with fsutil and turn off last access time, 8dot3
filenames and change mftzone size and it'll be significantly faster.

ReFS is a little better in this respect. I suspect this was a hang-back from
early NT versions which were tuned to larger binary blobs as a normal file
statistic rather than lots of small text files.

~~~
beagle3
Very interesting. Do you have a pointer to some deeper info about the required
NTFS tuning? e.g. specific MFT size recommendations, and possibly other
changes?

~~~
wumbernang
[http://www.ntfs.com/ntfs_optimization.htm](http://www.ntfs.com/ntfs_optimization.htm)

Old but good. fsutil wraps all the registry poking for reference.

------
ElijahLynn
My first response after reading the headline...

But why?

~~~
crpatino
If you create a product that runs on clients' hardware, you want to be able to
run consistently on as many platforms as the majority of your potential
customers use. You are leaving money on the table if you don't.

If a subset of those platforms is Linux + (Commercial) Unixes + Windows, the
Windows part is going to take twice as long and cost you thrice as much to get
in line with the other 2... unless you have some clever solution like the one
proposed here.

