
Cygwin and MinGW utilities may lose files - nanis
https://www.nu42.com/2016/03/tar-anomaly.html
======
jcoffland
Tl;dr if you have two files in a tar archive, one with an .exe extension and
another without, but otherwise with the same name, e.g. test and test.exe,
then both cygwin and msys2 tar will overwrite one of the files with the other
on extraction. The developers insist that this is the correct behavior, refuse
to even consider changing it and will barely even discuss it.

I've often run into cases like this where the developers insist on some
ridiculous behavior and will not budge and even get angry if you question
their decision. Another good example I found recently was with makepkg, a tool
for creating packages for the pacman package system on ArchLinux. makepkg
refuses to run as root. It used to have an option --asroot which would
override this behavior but the devs decided to remove the option in recent
versions. They insist that there is absolutely no reason that you would ever
need to run makepkg as root and it is just too dangerous for you to be allowed
to do so. This makes it extremely difficult to run makepkg inside a Docker
container. They say you can just sudo to the nobody user but in Docker this
takes many nontrivial steps.

Another good example is Bitcoin where the "core" devs currently have a
stranglehold on the community. They've repeatedly made decisions that many
disagree with. Now the community has split into warring factions yet still
"core" retains its power.

Sure you can fork an Open-Source project but the power is held by those who
control the most popular outlet for that software.

~~~
ikeboy
Bitcoin is not a good example, because there can't be competing forks on the
protocol level. Once one fork "wins", the other automatically dies. Therefore,
it's much more important to get the right design, because individuals in the
minority who don't like the choice made by the others have no way to change
anything.

Whereas a typical open-source project can, in theory, be forked with
absolutely no consequences for those who remain on the original fork. Any
change introduced in bitcoin harms those who disagree with it. The analogue in
other software would be auto-updating software whose updates couldn't be
blocked.

Edit: another difference is that if you fork bitcoin and mess up, it can be
unrecoverable. It might crash, or all funds might become unspendable, or a
couple of other things. If it shakes confidence, then you might not be able to
go back.

Also, you can't fully test bitcoin releases in advance because they interact
with the network, especially for forks. So you might not know something's
wrong until it's too late.

A mistake in other software, by contrast, can just be reverted, and can be
tested independently, because instances are localized.

(Disclaimer: I am simplifying a bit here.)

~~~
jcoffland
More than one fork of Bitcoin can and should coexist. There's no reason why
miners couldn't run different code. The worst that can happen is that some
blocks are not accepted by the majority and those miners lose the reward.
This is perfectly ok and will only push the competing software (and miners)
toward some sort of consensus.

I agree that the opinion you lay out above is probably what the majority of
Bitcoiners believe but I don't think it's right. A diversity of code would
improve the Bitcoin ecosystem and distribute the control which is currently
centralized on the "Core" devs.

~~~
ikeboy
They can't coexist for any length of time. That's all I was saying.

Unlike regular software, where different forks can be completely independent.

I don't see what I said that you're disagreeing with.

------
JonathonW
> I am confused as to why this behavior also carries over to MinGW versions of
> these tools. After all, I thought the whole point of MinGW was not to try to
> provide a POSIX layer.

Not really.

MinGW's compilation environment doesn't try to provide a POSIX layer-- if you
build something with MinGW, you get a native Windows executable that runs
exclusively against the APIs provided by Windows (it's a replacement for the
MSVC toolchain).

The MinGW/MSYS/MSYS2 toolchain _itself_, though (including gcc, binutils, and
the shell and coreutils that it's usually installed with), requires a POSIX
environment. They got this by forking Cygwin-- which explains why you see the
same behavior between Cygwin's tar and MSYS's tar. They share the same
behavior because they share code. (The old MSYS was a pretty old fork of
Cygwin that they didn't keep up-to-date with upstream, but MSYS2 tries to stay
pretty close to upstream Cygwin when they can.)

~~~
nanis
Thank you for that explanation.

------
acqq
It seems that there's a way to work around it: the order in which the files
are packed matters!

[https://sourceware.org/ml/cygwin/2009-08/msg00293.html](https://sourceware.org/ml/cygwin/2009-08/msg00293.html)

    
    
        > tar -xvzf test.tar.gz
        > mydir/myexe.exe
        > mydir/myexe
        >
        > ls mydir
        > myexe
    

"if a file foo.exe exists, and an application calls stat("foo"), it will get
told that, yes, "foo" exists."

But:

"if the order of the files in the tar archive is reversed, both files are
unpacked. Or, unpack mydir/myexe.exe explicitly afterwards. The reason that
this works is that Cygwin does not check for a file "foo", if the name of the
file is explicitly given as "foo.exe"."

Anybody tried?

Also, it's a relatively recent change (as in, from the last few years):
"Cygwin always handled the .exe suffix transparently in terms of stat(2)
calls, but Cygwin 1.7 also handles them transparently in terms of open(2) and
any other call."
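
To make the ordering effect concrete, here is a toy model in Python. It is not Cygwin's code: the `ExeAliasingDir` class and the "remove whatever the name resolves to, then create it" extractor are assumptions made up for illustration, based only on the two quoted rules.

```python
# Toy model of the lookup rule quoted above; this is NOT Cygwin's real code.
class ExeAliasingDir:
    """A directory where a lookup of "foo" also matches an existing "foo.exe"."""

    def __init__(self):
        self.files = {}  # on-disk name -> contents

    def resolve(self, name):
        """Return the on-disk name that `name` refers to, or None."""
        if name in self.files:
            return name
        # The ".exe" fallback applies only when the caller did not
        # spell out the suffix explicitly.
        if not name.endswith(".exe") and name + ".exe" in self.files:
            return name + ".exe"
        return None

    def extract(self, name, contents):
        """Assumed extractor: remove what `name` resolves to, then create `name`."""
        existing = self.resolve(name)
        if existing is not None:
            del self.files[existing]
        self.files[name] = contents


def unpack(order):
    """Extract the given member names in order and list what survives."""
    directory = ExeAliasingDir()
    for name in order:
        directory.extract(name, "contents of " + name)
    return sorted(directory.files)


print(unpack(["myexe.exe", "myexe"]))  # ['myexe']
print(unpack(["myexe", "myexe.exe"]))  # ['myexe', 'myexe.exe']
```

Under these assumptions, only the archive with `myexe.exe` first loses a file, which matches the quoted description.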

~~~
nanis
This is illustrated in all three screenshots in my blog post.

~~~
acqq
Now I think I see it. So it seems confirmed.

I hadn't fully analyzed the screenshots. I just read your text and saw the
arrow in the screenshot, and that way saw that the files were missing. Only
then did I read the linked explanation from Corinna, and that explanation is
much clearer: touch or no touch is not relevant as such, only the order of
the files in the resulting tar matters. Now it seems to me that your
screenshots confirm that: it's just about the order during untarring. Thanks.

~~~
nanis
The `touch` example is there to show another instance of unexpected behavior:
`touch test.exe test` only creates `test.exe`.

~~~
acqq
It seems that the archivers ask "does test exist" and when answered "yes"
perform "delete test" which by cygwin >=1.7 actually deletes test.exe? Or even
worse, the "open test" with some flag results in deleting test.exe and
creating test.

But it also seems that touch asks "does test exist" and when answered yes it
doesn't create it, just changes the datetime. Which probably changes the
datetime of test.exe?

Anyway it would be good to investigate which actual calls by the user
applications are being made in each case, to know what the actual sequence in
each case is.
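
Until someone traces the real calls, the touch guess can at least be written down as a toy model in Python. Everything here is an assumption: the hypothetical `touch` helper, the integer clock, and the rule that a bare name falls back to an existing ".exe" file.

```python
def touch(files, names):
    """Toy model of the guessed touch behaviour; not the real utility.

    `files` maps on-disk name -> modification time and is updated in place.
    """
    for tick, name in enumerate(names, start=1):
        target = name
        # Assumed aliasing rule: a bare name that does not exist falls back
        # to an existing "name.exe"; an explicit ".exe" name is used as-is.
        if (name not in files and not name.endswith(".exe")
                and name + ".exe" in files):
            target = name + ".exe"
        # Creates the entry if missing, otherwise only bumps its timestamp.
        files[target] = tick
    return files


print(touch({}, ["test.exe", "test"]))  # {'test.exe': 2}
print(touch({}, ["test", "test.exe"]))  # {'test': 1, 'test.exe': 2}
```

The first call reproduces the reported result (only `test.exe` exists, with its timestamp bumped by the second argument). The second call is just what the model predicts for the reversed order, not something verified here.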

------
Nacraile
Having grappled with mapping between posix paths and the windows
case-preserving-but-insensitive semantics, I can't say that I find this sort of
thing terribly surprising. Unfortunately, this kind of edge-case behaviour is
unavoidable when you're trying to bridge semantic differences between OSes.

~~~
mark-r
I _expect_ to have problems with filename case. Not with .exe extensions. This
would have taken me completely by surprise as well.

------
dspillett
So it is trying to map "file" which has its executable bit set to "file.exe",
and vice versa, potentially clobbering one file with the other. Or is it doing
this irrespective of any posix execute bit that is set?

~~~
colejohnson66
It's probably because Windows implicitly adds the .exe extension when trying
to execute something, and Cygwin works around this by just treating files with
and without the extension as the same name.

~~~
nanis
You are ignoring everything else that appears in `%PATHEXT%`.
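
For reference, `%PATHEXT%` is a semicolon-separated list of extensions that Windows command lookup considers, of which `.EXE` is only one. A rough Python sketch of that expansion, assuming a shortened sample value and a hypothetical helper `pathext_candidates` (real lookup has more rules than this):

```python
def pathext_candidates(command, pathext):
    """List the file names a PATHEXT-style lookup would try for `command`.

    Simplified sketch: real Windows command lookup has more rules.
    """
    extensions = [ext for ext in pathext.split(";") if ext]
    # A command that already ends in a listed extension is tried as-is.
    if any(command.lower().endswith(ext.lower()) for ext in extensions):
        return [command]
    return [command + ext for ext in extensions]


sample_pathext = ".COM;.EXE;.BAT;.CMD"  # shortened sample value, an assumption
print(pathext_candidates("test", sample_pathext))
# ['test.COM', 'test.EXE', 'test.BAT', 'test.CMD']
```

Treating only "test" and "test.exe" as the same name covers just one of these candidates.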

~~~
colejohnson66
And if my theory is right, the fact that I forgot about %PATHEXT% could mean
the Cygwin devs did too when they coded this "feature". But, I'm probably
wrong.

------
captainmuon
Is there any way to disable this "feature", or to patch it out? This magic
handling of ".exe" should really be handled by bash, not by the standard
library.

~~~
_kst_
It would make more sense for it to be handled by execve() or equivalent.

For example, you need to be able to execute a file named "/bin/sh" -- but for
that file to be executable under Windows, it needs to be named "sh.exe". If
you have a file with no extension in its name, Windows more or less doesn't
know what to do with it.

Cygwin's solution to this is to treat "sh" and "sh.exe" as the same file, by
tweaking the filesystem code. If "foo.exe" is the only file in the current
directory, "ls" will show it -- but "ls foo" will show the same file, but with
the name "foo".

Making any system call or library function that searches $PATH for executable
files, or that executes a file, accept "foo.exe" when you specify "foo", would
have been another solution, and perhaps a better one (though there could
easily be some nasty gotchas that I haven't thought about).
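
A minimal sketch of that alternative in Python, assuming a hypothetical `find_executable` helper that stands in for the $PATH search and leaves the filesystem layer alone (permission checks are omitted):

```python
import os
import tempfile


def find_executable(name, path_dirs):
    """Search `path_dirs` for `name`, also accepting `name` + ".exe".

    Hypothetical helper for illustration; not an existing Cygwin function.
    """
    candidates = [name]
    if not name.lower().endswith(".exe"):
        candidates.append(name + ".exe")
    for directory in path_dirs:
        for candidate in candidates:
            full = os.path.join(directory, candidate)
            # Permission checks are left out to keep the sketch short.
            if os.path.isfile(full):
                return full
    return None


with tempfile.TemporaryDirectory() as bindir:
    # Create an empty stand-in for the real binary.
    open(os.path.join(bindir, "sh.exe"), "w").close()
    found = find_executable("sh", [bindir])
    print(os.path.basename(found))  # sh.exe
```

With this design, "foo" and "foo.exe" stay distinct files for tools like tar and touch; only lookups for execution get the fallback.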

There is, as far as I know, no way to implement a POSIX layer on top of
windows without _some_ kind of ugly hack for "*.exe" files (short of running a
self-contained VM, but the point of Cygwin is that you can still access the
Windows filesystem).

~~~
barrkel
The downside of that approach would be breaking thousands (millions?) of
Makefiles.

~~~
_kst_
Yes, that's the nasty gotcha that I hadn't thought about!

------
pervycreeper
Are there any use cases for Cygwin these days on a modern system where VMs are
so easy to set up?

~~~
sp332
You don't have to transfer files between Cygwin and Windows. Keeping Cygwin up
to date is a lot easier than maintaining a whole Linux distro. And you don't
have to wait for it to boot up.

~~~
xorblurb
VMs can share host volumes. Cygwin can suddenly become buggy and has no
releases (it is currently doing shit with ACLs on my install, to the point of
being unusable, and this seems to have affected other people too), so it is
not completely suitable for team work. SSDs are fast and modern Linux distros
boot really quickly, and there is basically nothing to do to maintain a modern
mainstream Linux distro (although you have to avoid distros like Mint that
are oriented too much toward GUI/UX and not enough toward core system
engineering and stability, but they would not give you anything critical in a
VM anyway, because your graphical stuff should be done mainly in your host).

You might want Cygwin for some reasons, but for me that would not be those you
gave compared to a VM (disclaimer, I use both, at least when Cygwin is
working...)

------
julie1
Computer science is starting to look like speaking French.

A lot of rules, the first being to expect exceptions to every rule.

If a natural language can work this way, why not the software industry?

Oh! I forgot: uncertainty is bad, and you may want to get things done in a
way that is expected and that everyone understands without equivocation.

Doesn't inconsistent behaviour defeat the purpose of abstractions? It is an
exception to a well-established expectation.

Cygwin is supposed to have consistent low-level behaviour. At least that is
what POSIX and Unix are all about.

I don't see anything good coming from Cygwin and MinGW brushing off this
criticism in the name of pragmatism.

~~~
xorblurb
You don't, but I do. You don't want to replace every exec of e.g. "cp" with
"cp.exe"; .exe suffixes are rarely handled in real Unix environments; and the
result of a build without .exe under a real Unix would already clash with a
.exe-less entry. Cygwin/MSys/MSys2 are there mainly for compatibility, and in
this context the most compatibility you can achieve (both ways) is to handle
.exe the way they do -- and there should be no exceptions.

This leads to odd behaviors, which is unfortunate, but alternatives would lead
to even more and/or worse ones.

