Cygwin and MinGW utilities may lose files (nu42.com)
42 points by nanis on March 4, 2016 | 55 comments

Tl;dr if you have two files in a tar archive, one with an .exe extension and another without, but otherwise with the same name, e.g. test and test.exe, then both cygwin and msys2 tar will overwrite one of the files with the other on extraction. The developers insist that this is the correct behavior, refuse to even consider changing it and will barely even discuss it.
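
For anyone who wants to check an archive for this before unpacking it under Cygwin or MSYS2, here is a minimal sketch using Python's standard `tarfile` module. The helper name `exe_collisions` and the sample member names are made up for illustration, and the case-insensitive comparison is an assumption about the target filesystem.

```python
import io
import tarfile


def exe_collisions(names):
    """Return sorted (name, name + '.exe') pairs that are both present in names.

    The comparison is case-insensitive, on the assumption that the archive
    will be unpacked onto a case-insensitive Windows filesystem.
    """
    lowered = {n.lower(): n for n in names}
    pairs = []
    for low, original in lowered.items():
        partner = lowered.get(low + ".exe")
        if partner is not None:
            pairs.append((original, partner))
    return sorted(pairs)


def build_sample_archive():
    """Build a small in-memory tar archive containing test.exe and test."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in ("mydir/test.exe", "mydir/test", "mydir/readme.txt"):
            data = name.encode()
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


with tarfile.open(fileobj=build_sample_archive(), mode="r") as tar:
    collisions = exe_collisions(tar.getnames())

for plain, exe in collisions:
    print(f"{plain!r} and {exe!r} may overwrite each other on extraction")
```

This only reports the risky pairs; it does not try to predict which of the two files would survive.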

I've often run into cases like this where the developers insist on some ridiculous behavior and will not budge and even get angry if you question their decision. Another good example I found recently was with makepkg, a tool for creating packages for the pacman package system on ArchLinux. makepkg refuses to run as root. It used to have an option --asroot which would override this behavior but the devs decided to remove the option in recent versions. They insist that there is absolutely no reason that you would ever need to run makepkg as root and it is just too dangerous for you to be allowed to do so. This makes it extremely difficult to run makepkg inside a Docker container. They say you can just sudo to the nobody user but in Docker this takes many nontrivial steps.

Another good example is Bitcoin where the "core" devs currently have a stranglehold on the community. They've repeatedly made decisions that many disagree with. Now the community has split into warring factions yet still "core" retains its power.

Sure you can fork an Open-Source project but the power is held by those who control the most popular outlet for that software.

Bitcoin is not a good example, because there can't be competing forks on the protocol level. Once one fork "wins", the other automatically dies. Therefore, it's much more important to get the right design, because individuals in the minority who don't like the choice made by the others have no way to change anything.

Whereas a typical open-source project can, in theory, be forked with absolutely no consequences for those who remain on the original fork. Any change introduced in bitcoin harms those who disagree with it. The analogue in other software would be auto-updating software whose updates couldn't be blocked.

Edit: another difference is that if you fork bitcoin and mess up, it can be unrecoverable. It might crash, or all funds might become unspendable, or a couple of other things. If it shakes confidence, then you might not be able to go back.

Also, you can't fully test bitcoin releases in advance because they interact with the network, especially for forks. So you might not know something's wrong until it's too late.

A mistake in other software, by contrast, can just be reverted, and the software can be tested independently, because instances are localized.

(Disclaimer: I am simplifying a bit here.)

More than one fork of Bitcoin can and should coexist. There's no reason why miners couldn't run different code. The worst that can happen is that some blocks are not accepted by the majority and those miners lose the reward. This is perfectly OK and will only push the competing software (and miners) toward some sort of consensus.

I agree that the opinion you lay out above is probably what the majority of Bitcoiners believe but I don't think it's right. A diversity of code would improve the Bitcoin ecosystem and distribute the control which is currently centralized on the "Core" devs.

They can't coexist for any length of time. That's all I was saying.

Unlike regular software, where different forks can be completely independent.

I don't see what I said that you're disagreeing with.

It's not ridiculous behaviour; it's the correct behaviour for Cygwin in almost all cases.

Windows executables are identified by their extension. That's in direct conflict with POSIX-style executables. You wouldn't want to have to execute every command-line tool like ls.exe, cp.exe, bash.exe, rm.exe etc. So Cygwin looks on the filesystem twice: both with and without the exe extension.
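
A toy model of that double lookup, in Python. This is only a sketch of the behaviour as described in this comment, not Cygwin's actual code; the function name `resolve` and the order of the two checks are assumptions.

```python
def resolve(name, existing):
    """Model the double lookup against a set of file names.

    Returns the name that would actually be used, or None if nothing
    matches. A name with an explicit '.exe' suffix is looked up as-is only.
    """
    if name in existing:
        return name
    if not name.lower().endswith(".exe") and name + ".exe" in existing:
        return name + ".exe"
    return None


files = {"ls.exe", "bash.exe", "notes.txt"}
print(resolve("ls", files))       # found via the .exe fallback
print(resolve("ls.exe", files))   # found directly
print(resolve("notes", files))    # no fallback for other extensions
```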

Similar things need to happen when building code from tarballs. It's not just finding executables to actually execute; compilers and linkers need to work correctly too, and random build scripts aren't going to always be checking for the .exe version of files. So the behaviour applies to reading and writing as well.

Tar is already surprising in that it overwrites by default, potentially including the archive you're extracting from. It's usually a bad idea to extract tar archives into anything other than empty directories, unless you know exactly what you're doing.

You are missing the point of the bug/feature described in the article. Neither I nor the OP is arguing that you should have to type cp.exe to run the executable. The tar program should always just extract the files contained in the archive. Instead it overwrites some of them in some cases. This is perhaps related to calling a program with or without the .exe extension, but it is an unnecessary side-effect.

I noticed that after I wrote the comment, but it doesn't affect my opinion that this is still the correct behaviour for Cygwin. If anything, there should be Cygwin compatibility code added to tar.

Unless the dev's statement (quoted in TFA) is a blatant lie, this _has_ been discussed multiple times in the past. He also doesn't refuse to discuss the issue, but recommends reading the past discussions before arguing over previously resolved points.

In fact the dev and other users continue to discuss the reasoning for this behavior for quite a lengthy email chain.

Always assume the best of someone until proven otherwise, it saves a lot of stress :)

I hate that sense of entitlement. The whole point of foss is that people can mold software to fit their vision of how stuff should work without caring about what anyone else thinks. More pointedly the point of free software specifically is to guarantee the power to end users to control the software.

You hate the sense of entitlement that people have when they write software to behave how they want it to behave? Yes, that's shocking.

I'm sorry, I might have been unclear. I meant the entitlement that OP was exhibiting by wanting developers to write software that aligns with his view on how it should work instead of theirs.

Just yesterday I ran into a problem where the xfce file manager thunar would show a warning message about using it as root. Every single discussion thread about this issue was closed with a comment about how thunar should not be used as root and that there will be no feature changes to disable this message.

If it refused to work at all, that would be annoying, but why is it such a problem to have a warning message?

> I am confused as to why this behavior also carries over to MinGW versions of these tools. After all, I thought the whole point of MinGW was not to try to provide a POSIX layer.

Not really.

MinGW's compilation environment doesn't try to provide a POSIX layer-- if you build something with MinGW, you get a native Windows executable that runs exclusively against the APIs provided by Windows (it's a replacement for the MSVC toolchain).

The MinGW/MSYS/MSYS2 toolchain itself, though (including gcc, binutils, and the shell and coreutils that it's usually installed with), requires a POSIX environment. They got this by forking Cygwin-- which explains why you see the same behavior between Cygwin's tar and MSYS's tar. They share the same behavior because they share code. (The old MSYS was a pretty old fork of Cygwin that they didn't keep up-to-date with upstream, but MSYS2 tries to stay pretty close to upstream Cygwin when they can.)

Thank you for that explanation.

It seems that there's a way to work around it: the order in which the files are packed matters!


    > tar -xvzf test.tar.gz
    > mydir/myexe.exe
    > mydir/myexe
    > ls mydir
    > myexe

"if a file foo.exe exists, and an application calls stat("foo"), it will get told that, yes, "foo" exists."


"if the order of the files in the tar archive is reversed, both files are unpacked. Or, unpack mydir/myexe.exe explicitely afterwards. The reason that this works is that Cygwin does not check for a file "foo", if the name of the file is explicitely given as "foo.exe"."

Anybody tried?
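
Not a substitute for trying it on a real Cygwin install, but here is a toy Python simulation of the rule quoted above. It assumes that the extracting program removes whatever an existing name resolves to before creating the new file, which is a guess about the mechanism rather than something the quote states. The names `resolve` and `extract` are made up.

```python
def resolve(name, existing):
    """Exact name first; fall back to name + '.exe' unless .exe was explicit."""
    if name in existing:
        return name
    if not name.endswith(".exe") and name + ".exe" in existing:
        return name + ".exe"
    return None


def extract(members, existing=None):
    """Simulate unpacking members, in order, into one directory.

    Assumption: an entry that already 'exists' (after resolution) is
    removed before the new one is created under the archived name.
    """
    files = set(existing or ())
    for name in members:
        hit = resolve(name, files)
        if hit is not None:
            files.discard(hit)   # the old entry is replaced...
        files.add(name)          # ...by a file with the archived name
    return sorted(files)


print(extract(["myexe.exe", "myexe"]))   # ['myexe']
print(extract(["myexe", "myexe.exe"]))   # ['myexe', 'myexe.exe']
```

Under this model the first order reproduces the `ls` output shown above (only `myexe` is left), and the reversed order keeps both files.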

Also, it's a relatively recent change (as in, from the last few years): "Cygwin always handled the .exe suffix transparently in terms of stat(2) calls, but Cygwin 1.7 also handles them transparently in terms of open(2) and any other call."

This is illustrated in all three screenshots in my blog post.

Now I think I see it. So it seems confirmed.

I hadn't fully analyzed the screenshots. I had only read your text and seen the arrow in the screenshot, which showed me that the files are missing, and only then read the linked explanation from Corinna. That explanation is much clearer: whether you use touch or not is irrelevant, only the order of the files in the resulting tar matters. Now it seems to me that your screenshots confirm that: it's just about the order during untarring. Thanks.

The `touch` example is there to show another instance of unexpected behavior: `touch test.exe test` only creates `test.exe`.

It seems that the archivers ask "does test exist" and when answered "yes" perform "delete test" which by cygwin >=1.7 actually deletes test.exe? Or even worse, the "open test" with some flag results in deleting test.exe and creating test.

But it also seems that touch asks "does test exist" and when answered yes it doesn't create it, just changes the datetime. Which probably changes the datetime of test.exe?

Anyway it would be good to investigate which actual calls by the user applications are being made in each case, to know what the actual sequence in each case is.

Having grappled with mapping between posix paths and the windows case-preserving-but-insensitive semantics, I can't say that I find this sort of thing terribly surprising. Unfortunately, this kind of edge-case behaviour is unavoidable when you're trying to bridge semantic differences between OSes.

I expect to have problems with filename case. Not with .exe extensions. This would have taken me completely by surprise as well.

So it seems to me: ".exe" is essentially translated as the "+x" bit and dropped from the file.

Didn't read TFA, was just breezing through, and thought I'd see if I got the gist of it.

So it is trying to map "file" which has its executable bit set to "file.exe", and vice versa, potentially clobbering one file with the other. Or is it doing this irrespective of any posix execute bit that is set?

In my experiments, I did not explicitly set any executable bits using the corresponding `chmod`. Also, you can see in the Cygwin screenshot that neither `test` nor `test.exe` has the executable bit set in the archive.

It's probably because Windows implicitly adds the .exe extension when trying to execute something, and Cygwin works around this by just treating files with and without the extension as the same name.

You are ignoring everything else that appears in `%PATHEXT%`.
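
To make that concrete, a small Python sketch of what a `%PATHEXT%`-style lookup tries for a bare command name. The default value below is only an example (real systems differ), and the function name `pathext_candidates` is made up; this is a simplification, not how Windows or Cygwin actually implement the search.

```python
def pathext_candidates(command, pathext=".COM;.EXE;.BAT;.CMD"):
    """List the file names a PATHEXT-style lookup would try for command.

    pathext is a semicolon-separated list of extensions, as in %PATHEXT%.
    The default value here is only an example.
    """
    extensions = [ext for ext in pathext.split(";") if ext]
    return [command + ext for ext in extensions]


print(pathext_candidates("build"))
# ['build.COM', 'build.EXE', 'build.BAT', 'build.CMD']
```

So `.exe` is just one of several suffixes that Windows will try, which is the point being made here.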

And if my theory is right, the fact that I forgot about %PATHEXT% could mean the Cygwin devs did too when they coded this "feature". But, I'm probably wrong.

Is there any way to disable this "feature", or to patch it out? This magic handling of ".exe" should really be handled by bash, not by the standard library.

It would make more sense for it to be handled by execve() or equivalent.

For example, you need to be able to execute a file named "/bin/sh" -- but for that file to be executable under Windows, it needs to be named "sh.exe". If you have a file with no extension in its name, Windows more or less doesn't know what to do with it.

Cygwin's solution to this is to treat "sh" and "sh.exe" as the same file, by tweaking the filesystem code. If "foo.exe" is the only file in the current directory, "ls" will show it -- but "ls foo" will show the same file, but with the name "foo".

Making any system call or library function that searches $PATH for executable files, or that executes a file, accept "foo.exe" when you specify "foo", would have been another solution, and perhaps a better one (though there could easily be some nasty gotchas that I haven't thought about).
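
A rough Python sketch of that alternative, where only the executable search applies the ".exe" fallback and ordinary file access is left alone. The function name `find_executable` is hypothetical, and the sketch skips permission checks that a real implementation would need.

```python
import os
import tempfile


def find_executable(name, search_dirs):
    """Search search_dirs for name, falling back to name + '.exe'.

    Only this lookup applies the fallback; nothing here changes how
    other file operations see 'foo' and 'foo.exe'.
    """
    for directory in search_dirs:
        for candidate in (name, name + ".exe"):
            full = os.path.join(directory, candidate)
            if os.path.isfile(full):
                return full
    return None


# Demo in a throwaway directory that holds only 'foo.exe'.
with tempfile.TemporaryDirectory() as bindir:
    open(os.path.join(bindir, "foo.exe"), "w").close()
    found = find_executable("foo", [bindir])
    missing = find_executable("bar", [bindir])

print(os.path.basename(found))   # foo.exe
print(missing)                   # None
```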

There is, as far as I know, no way to implement a POSIX layer on top of windows without some kind of ugly hack for "*.exe" files (short of running a self-contained VM, but the point of Cygwin is that you can still access the Windows filesystem).

The downside of that approach would be breaking thousands (millions?) of Makefiles.

Yes, that's the nasty gotcha that I hadn't thought about!

Yes, this. The problematic behavior is in the shell, but they "fixed" it by updating the standard library to have it try to auto-guess what the application wanted. The fix should have been in the shell.

Fixing it in the shell wouldn't have worked. See my other comment.

Are there any use cases for Cygwin these days on a modern system where VMs are so easy to set up?

You don't have to transfer files between Cygwin and Windows. Keeping Cygwin up to date is a lot easier than maintaining a whole Linux distro. And you don't have to wait for it to boot up.

VMs can share host volumes. Cygwin can suddenly become buggy and has no releases (it is currently doing shit with ACLs on my install, to the point of being unusable, and this seems to have affected other people too), so it is not completely suitable for team work. SSDs are fast and modern Linux distros are really fast to boot, and there is basically nothing to do to maintain a modern mainstream Linux distro (although you have to avoid distros like Mint that are oriented too much toward GUI/UX and not enough toward core system engineering and stability; they would not give you anything critical in a VM anyway, because your graphical stuff should mainly be done on your host).

You might want Cygwin for some reasons, but for me they would not be the ones you gave for preferring it over a VM (disclaimer: I use both, at least when Cygwin is working...)


Docker on Windows is just a Hyper-V/VirtualBox VM running Docker.

VirtualBox allows shared directories.

I just point VMs at samba shares. Less effort than configuring the VMs and better control.

The use cases are certainly narrowing, but cygwin is still occasionally useful, e.g.:

- You want an actually functional ssh server running on windows so you can integrate with a mostly-linux test automation environment.

- You're committed to a (terrible) windows-only VCS, but develop some linux applications.

Sure -- I might be constrained to work on Windows development or use Windows tools, and having access to good UNIX/Linux-like command-line utilities for scripting, etc., in that environment is a big win.

Do you actually do that?

There are a lot of hoops one would have to jump through for a seamless experience with a VM. Most of the time Cygwin just works[0]. I tried running my Emacs and terminal in a separate VM; it wasn't so pleasant. Sharing the file system bidirectionally is the biggest problem.

[0] I'm sure for certain domains, say unix'y systems programming it's not true at all.

When I last tried installing cygwin it was horrific. The ui was awful, and involved manually clicking loads of boxes via a clunky ui. Next to impossible to reproduce. Has this improved? Can you just double click on InstallCygwin.exe now?

What's your use case? My intuition is telling me you want it to be something it's not :PP

I use it exclusively for CLI stuff via Mintty (the default terminal emulator). The installation just extracts the basic files and lets you download packages. Yes, setup.exe as a package manager is clunky. There's a third party thing called "cyg-apt" it's not that great but works. Personally I just went ahead and downloaded all available packages (just excluded a bunch that were a few GB large).

My 100 most used commands: https://gist.github.com/auganov/1242885dbd452db05a47

It was a shitty development system for a POS system. Yes, it was too large to want to download everything, but there was no programmatic way to tell it what you wanted (you'd think that someone who aped how a proper OS worked would understand the need for complete command line control of installation, wouldn't you). And yes, it's been 3 years now, but I remember the package manager looking like something someone knocked up in 3 minutes and meant to get around to fixing but never quite managed it. I think if you uninstalled and reinstalled it, it even remembered some of the previous values, making it very hard to retrace your steps with a view to scripting (on paper) what needed to be clicked on, so nobody else had quite the same subset of packages as me.

In any case where one has to mitigate the problems of bringing windows into the mix. At my last job I had several bridges to windows-only systems and silliness that were exposed out to a *nix universe with ssh-invokable cli tools.

Currently my only use case is that it does count in my book as a different variety of *nix. If you are attempting to write portable software, it is another platform you can use for testing that can be brought into a CI pipeline.

A more common use case (though still strange) is people using windows natively, who want to use C, C++, Fortran, Ada, etc to make windows programs (instead of cross compiling from Linux). Generally this category is students, and this is still far more educationally beneficial than attempting to learn skills with long term value from say visual studio's IDE.

edit note: asterisks make things italics on hn

Yes; when you want to operate on your Windows OS filesystem using Cygwin tools; when you want to use Cygwin tools freely in shell pipes with native tools; when you want to run something like Emacs locally with modes that use background executables for semantic / syntax / linting support; when you want to use shell job control with Windows executables; etc.

In no particular order, using vim, openssh, git, perl, bash, grep, tar, etc etc etc on the entire native file system in a POSIX environment is really fantastic. I also have an ubuntu VM to spin up on hand but I find I rarely need to do so. The cygwin mintty console is also far superior to the windows command console.

Yes, when you are forced to work on windows and you need sane CLI tools and a decent shell.

It's nice having a nearly native Unixlike shell and it's more convenient than a VM. Even for simple tools like git and ssh. I use babun and it works great.

I prefer using cygwin and openssh to ssh into VMs rather than putty

Computer Science is looking like speaking french.

A lot of rules. The first rule being to expect exceptions for every rule.

If a natural language can work this way, why not software industry?

Oh! I forgot: uncertainty is bad, and you may want to get things done in a way that is expected and that everyone understands without equivocation.

Doesn't inconsistent behaviour defeat the purpose of abstractions? It is an exception to a well-established expectation.

Cygwin is supposed to provide consistent low-level behaviour. At least that is what POSIX and Unix are all about.

I don't see anything good coming from Cygwin and MinGW brushing off this criticism in the name of pragmatism.

You don't, but I do: you don't want to replace every exec of e.g. "cp" with "cp.exe"; .exe suffixes are rarely handled in real Unix environments; the output of a build without .exe under a real Unix would already clash with a .exe-less entry; and Cygwin/MSYS/MSYS2 are there mainly for compatibility. In this context, the most compatibility you can achieve (both ways) comes from handling .exe the way they do, and there should be no exceptions.

This leads to odd behaviors, which is unfortunate, but alternatives would lead to even more and/or worse ones.

"Computer Science is looking like speaking french.

A lot of rules. The first rule being to expect exceptions for every rule.

If a natural language can work this way, why not software industry?"

Huh? They do both work the same way in that respect. Do you wish they did, or didn't?
