
I Broke Rust's Package Manager for Windows Users - sasheldon
http://sasheldon.com/blog/2017/05/07/how-i-broke-cargo-for-windows/
======
bluejekyll
This is a great example of something else about software. As software grows in
usage and use cases, it starts bumping up against edge conditions which need
to be handled for various reasons.

Cargo now is becoming stronger and more stable because of bugs like this being
discovered. All software goes through this growth cycle. It's great to see
these things worked out in the various projects that support Rust.

There is another point here though; anytime the question comes up to just
rewrite a piece of software, throw out all the technical debt, it's not as
straightforward as it seems. Remember, together with that technical debt lies
a lot of valuable learnings written into the code. I haven't worked on Windows
directly in years, and I never knew that NUL was a reserved file name. I
would have made this mistake, and probably still will in the future.

Which makes me wonder, has anyone written a file name validation crate that
guarantees that you're not writing to any reserved names on the filesystem of
the host OS? A quick search of crates.io doesn't turn anything up.

~~~
curun1r
It also shows how necessary it is to have some sort of deprecation process.
Maintaining nonsensical landmine features for compatibility with an operating
system released 36 years ago is putting the interests of MS's lazy long-term
users ahead of the interests of its current users. Even if MS maintained a
policy of only removing functionality after a 10-year deprecation period, this
"feature" would have been gone long ago. Transitions must be orderly, but they
should still happen.

It's nice that Rust's toolchain is better able to live with Windows' crazy
ecosystem, but that doesn't make Windows any less crazy.

~~~
unscaled
Joel Spolsky famously praised this policy of backward-compatibility at all
costs which he called "The Raymond Chen Camp"[1]. Many agreed with him, but I
always thought that Microsoft compatibility ideals were too radical to be real
wisdom. At some point the list of features you try to keep compatibility with
grows large enough that the Raymond Chen Way becomes unmaintainable.

The received wisdom of the 90s is wrong. Most users don't care about
compatibility, as Apple's success has clearly shown, and most companies are
now out following the Apple road. Large enterprises care about compatibility,
and they pay a lot, but this is not a forward-looking market. They'll keep
buying new versions of your software because of the compatibility, but
if compatibility is the only story you have to offer, you'll slowly lose that
market.

I completely agree with you that Microsoft should have had a strategy for
deprecating these features _back in the 90s_ , when they were already old.

In this specific case of outdated filename restrictions, a phased plan could
have looked like this:

Windows NT 3.5 - Allow accessing all filenames with a special prefix (which
they already did).

Windows NT 4.0 - Make it easy to migrate to sane filenames by providing an
opt-in per-process flag that makes all APIs use them by default. At this point
they can easily dogfood and migrate all Microsoft software to the new APIs, so
you would be able to delete these pesky files in Explorer.

Windows 2000 - Make the new API flag the default for everything compiled with
the latest version of the Windows SDK.

Windows XP - Make the new API the default for any app without a special entry
in the compatibility database.

Somewhere along the road, batch files (the only place where compatibility
with the old filenames was necessary) could easily be made compatible by
modifying the batch parser to replace redirections to NUL with redirections to
\\?\devices\null or something akin. You may see some breakage in scripts which
use NUL and CON in non-standard ways (e.g. as an argument), but the migration
pain won't be huge, and you could still save an old script with a
compatibility flag.

Microsoft obviously didn't go that way, and yeah, all the batch files
written back in 1981 may still work without a hitch, but newer things keep
breaking in strange ways.

[1] [https://www.joelonsoftware.com/2004/06/13/how-microsoft-
lost...](https://www.joelonsoftware.com/2004/06/13/how-microsoft-lost-the-api-
war/)

~~~
to3m
Newer things only break in strange ways because they're broken. So rather than
break the old stuff, why not fix the new stuff?? - because after all,
approximately the only criticism you _can't_ level at the Windows
NUL/PRN/COMx/etc. special names is that they're some kind of surprise that
appeared suddenly out of nowhere! It's been this way for a very long time.

(I wonder if part of this is the rage of Unix fans discovering that portable
means actually, you know, making an effort... and that there's more to it
than just checking it builds on x86 Debian as well as x64 Ubuntu...)

~~~
loa_in_
You can't just say it's been that way for a long time so it's acceptable,
because the industry (and for that matter the Internet) is getting fresh new
people every day. You can't expect them not to be surprised, and you can't
just arbitrarily require them to know something they haven't stumbled upon
until after it caused problems.

------
garaetjjte
Other magic aliases include CON, PRN, AUX, COM1-9 and LPT1-9. They are aliased
to their respective devices in the Win32 namespace "\\.\". COMs and LPTs above
9 don't have aliases in the global namespace and must be accessed explicitly
in the Win32 namespace, e.g. "\\.\COM10" (which itself is a symlink to the NT
native "\Device\Serial9").

In fact, it is possible to create files named NUL, COM1, etc. using the \\?\
prefix (e.g. "\\?\C:\NUL" is a valid path), which disables parsing of the
arcane Win32 magic files. Unfortunately these files cause strange behaviour in
applications that don't use that prefix, Explorer included.

source: [https://msdn.microsoft.com/en-
us/library/windows/desktop/aa3...](https://msdn.microsoft.com/en-
us/library/windows/desktop/aa365247\(v=vs.85\).aspx#namespaces)
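
A minimal sketch of a check for the aliases listed above, written in Rust since
that is the toolchain under discussion. The function name
`is_reserved_windows_name` is hypothetical, the list covers only the names
mentioned in this comment, and treating a name followed by an extension (e.g.
"nul.rs") as reserved follows the advice on the linked MSDN page rather than
any guarantee about every Windows version.

```rust
/// Single-word device aliases from the list above.
const RESERVED: [&str; 4] = ["CON", "PRN", "AUX", "NUL"];

/// Returns true if `file_name` (one path component, not a full path) matches
/// one of the legacy device aliases. Hypothetical helper for illustration.
pub fn is_reserved_windows_name(file_name: &str) -> bool {
    // Assumption: a reserved name followed by an extension is also a problem,
    // so only the part before the first dot is compared.
    let stem = file_name.split('.').next().unwrap_or("");
    // The aliases are matched case-insensitively.
    let upper = stem.to_ascii_uppercase();

    if RESERVED.contains(&upper.as_str()) {
        return true;
    }

    // COM1-COM9 and LPT1-LPT9: three letters plus one digit from 1 to 9.
    let bytes = upper.as_bytes();
    bytes.len() == 4
        && (upper.starts_with("COM") || upper.starts_with("LPT"))
        && (b'1'..=b'9').contains(&bytes[3])
}
```

Note that COM10 and above are deliberately not flagged, matching the point that
only COM1-9 and LPT1-9 have aliases in the global namespace.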

~~~
monochromatic
I still remember using "copy con foo.txt" and ending with ctrl-z to quickly
create a file. It was years before I understood how that actually worked.

------
tatterdemalion
As the blog post mentioned, we solved the issue by deleting the crate from the
package repository and reserving these problematic names. The incident lasted
about 2 and a half hours.

Crate names have to be one or more valid idents connected by hyphens, so no
other clever names like `/home` would be possible to upload. We already had
some crate names reserved and we just needed to add these to the list.

~~~
derefr
Reserving just the crate names won't cover your bases, though, no? I'm not
clear on what exists as part of a crate—but if there's any user control over
the filenames of the _contents_ of the crate (e.g. if the crate's source code
is in there) then any crate might contain a _file_ named e.g. "nul.rs",
triggering the same problem.

~~~
kibwen
I think you're misunderstanding the problem described in the OP. When you
build a project via cargo using the default settings, it fetches the git
repository at [https://github.com/rust-lang/crates.io-
index](https://github.com/rust-lang/crates.io-index) to enable it to resolve
dependencies locally. This git repository contains metadata for each library
on crates.io, where the metadata for a given library is located in a file with
the same name as the name of that library. When the OP uploaded a library
whose name was an illegal filename on Windows, git unexpectedly choked when
updating the local crate index repo, impacting all Windows users.
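
To make the "file with the same name as the library" point concrete, here is a
rough sketch of how a crate name maps to a path inside the index repository.
The layout (1- and 2-character names under `1/` and `2/`, 3-character names
under `3/<first char>/`, everything else under `<first two>/<next two>/`) is my
understanding of the documented Cargo registry index format, not something
stated in this thread, and `index_path` is a hypothetical name.

```rust
/// Computes the relative path of a crate's metadata file in the index.
/// Returns None for empty or non-ASCII names, which this sketch does not
/// try to handle.
pub fn index_path(crate_name: &str) -> Option<String> {
    if crate_name.is_empty() || !crate_name.is_ascii() {
        return None;
    }
    // Assumption: index file names are stored in lowercase.
    let name = crate_name.to_ascii_lowercase();
    let path = match name.len() {
        1 => format!("1/{}", name),
        2 => format!("2/{}", name),
        3 => format!("3/{}/{}", &name[..1], name),
        _ => format!("{}/{}/{}", &name[..2], &name[2..4], name),
    };
    Some(path)
}
```

Under these assumptions a crate called `nul` ends up at `3/n/nul`, so the last
path component is exactly the reserved name that git on Windows cannot check
out.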

It sounds like the concern you're describing is a different matter. It's
likely true that if the source of a crate contains a file named "nul.rs",
cargo on Windows will fail if it attempts to git-fetch the source (unless
you're using Windows Subsystem for Linux, anyway). While this would indeed be
a problem, it would only affect users who elect to use specific libraries,
rather than serving as a denial-of-service for every Rust user on Windows.

~~~
wand3r
I was going to ask how a remote package could do that. Not knowing how Rust
works (or package managers, apparently), I didn't understand how it could be
widespread. Makes sense, damn; that's substantial.

~~~
kibwen
I'm not sure how other package managers do it (it should be noted that this
approach was designed to avoid some problems that other package managers have
encountered), but there is still room for improvement here: ideally, I think
we'd be hashing crate names rather than storing them verbatim on the
filesystem, to enforce more uniform distribution in the trie.

~~~
wand3r
Interesting, hashing them makes sense. It was a corner case that caused a
huge outage, but it was definitely easy to overlook.

------
slobotron
There was a bug in Windows 95 (98 too?) where if you tried to open 'nul\nul'
or 'con\con' etc, it would BSOD instantly. Provided lots of drive-by fun in
computer labs... (got really good at typing win+r con\con)

~~~
tonyarkles
For more fun, you could also target other machines with SMB shares.
\\thevictim\foo\nul\nul would BSOD _that_ machine. Good times.

~~~
cesarb
IIRC, you could also reference it in an HTML page, so the whole computer would
crash when that page was viewed.

------
protomyth
For those who don't use Windows and might need this info:
[https://msdn.microsoft.com/en-
us/library/windows/desktop/aa3...](https://msdn.microsoft.com/en-
us/library/windows/desktop/aa365247%28v=vs.85%29.aspx)

~~~
Kenji
That is a great page. Is there also such a page for Linux and Mac OS?

~~~
slrz
There's path_resolution(7) in the Linux man-pages set plus of course the
relevant parts of the POSIX standard.

------
captn3m0
What I don't understand is why cargo fetches the entire crate list and creates
a directory for every crate (even if you never install it). Why not just have
a single file with the entire list? The issue mentions they use a trie, but
why use the filesystem as the trie store? Why not have a single file?

~~~
kibwen
The original authors of cargo, wycats and carllerche, aren't around today to
ask (it's a weekend!) though IRC attempted to answer regardless:

    
    
      <foo> to keep the number of files in a single directory down
      <foo> tools become unhappy with hundreds of thousands+ of things in a single dir
      <foo> as do filesytems
      <bar> why not just a flat file
      <bar> or sqlite or whatever
      <qux> right now it uses git's deduplication feature
      <qux> aka, when downloading updates you only download the objects that changed
      <qux> but it mostly works on a per file basis
      <qux> so git hashes each file and if the hash didnt change, it doesnt download an update
      <qux> but if it did, it treats it as completely new file, even if its just a little change

~~~
kibwen
Update:

    
    
      <wycats> Because of this: https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomment-193772935
      <wycats> I ran some scenarios against huge repos when I first worked on cargo
      <wycats> Trying to minimize the cost of operations
      <wycats> I landed on the current strategy, and GitHub in the above thread more or less endorsed what we were already doing at that time
      <wycats> Also see https://github.com/rust-lang/cargo/issues/2452

~~~
comex
It's still fundamentally a waste of disk space. On my system, as of a minute
ago, ~/.cargo/registry/index took up about 200MB for three different checkouts
(for some reason). After deleting that and running `cargo update`, only one of
them is recreated, 104MB. Out of that, 57MB is the JSON files and 47MB is git
history. But if I just concatenate all the JSON files, the result is only
33MB, and after gzipping, 3MB. Hypothetically, a non-GitHub-based Cargo could
store only those 3MB (using binary deltas to avoid resending it on every
update), or even 0MB if it just relied on the server to resolve dependencies.

~~~
thristian
Once you've gzipped to achieve that 3MB storage, binary deltas are useless.
Perhaps the data could be (almost certainly is) transferred gzipped, then
expanded to the full 33MB size so binary diffs could be applied to it later,
but setting up a system to do binary diffs is a lot of incidental complexity:
xdelta is a surprisingly complex format, and bsdiff is really tuned for
executables, not arbitrary content (and is pretty complex too).

It sounds like the biggest win would be for cargo to keep using git, but clone
the crates.io index as a bare repository rather than checking out the
plaintext content. Then it would only take 47MB by your count, which is pretty
close to 33MB, and you could still get out the plain content with
`git cat-file` and friends.

~~~
comex
Technically, Cargo /already/ bundles a full copy of libxdelta as part of
libgit2 (in addition to the separate Git binary delta algorithm); I just
checked using nm that it's actually included in the binary. It could probably
be removed, but, well, it probably adds a lot less than 44MB to the binary
size :)

Alternately, since JSON is text, I suppose you could just ensure that whatever
emits this hypothetical merged JSON file puts newlines between different
packages' entries, and then use a regular text diff (on the uncompressed
version, of course). But reading 44MB of JSON isn't instant; it would probably
be better to switch to either a binary format, or even something silly like a
sorted list of JSON strings separated by newlines.

There would be some incidental complexity around generating and applying the
diffs… you'd probably want to precalculate them on the server side, but it
could be rather expensive to, on every change, calculate a diff between the
current version and every previous change. Instead, you could have daily
checkpoints: each day the server would make a checkpoint and calculate a diff
to the last N checkpoints; on every update the server would recalculate the
diff between the latest checkpoint and HEAD. The client would store both HEAD
and a reverse diff to the latest checkpoint (or just store the checkpoint
separately and waste a few MB), so when it updates, it could revert to that
checkpoint and request the diff from there to the new latest checkpoint; it
would also request the diff from the checkpoint to the new HEAD. If its
checkpoint is too old then it would just redownload from scratch.

Overall, not a trivial change, but probably not too hard either.

apt-get does something vaguely similar with its pdiff files.

------
Sir_Cmpwn
Sounds more like a problem with stupid Windows design choices than with
anything you did.

~~~
dagw
Windows is, for better or worse, fiercely proud of its backwards
compatibility. So it's not so much a stupid Windows design choice as a
'stupid' DOS 1.0 design choice (and not even so much a choice as simply a
quirk of how the DOS 1.0 file system worked) that Windows doesn't want to
break backwards compatibility with.

~~~
wand3r
I agree with the parent that it's a bit crazy, but I wouldn't be as critical.
To your point: presumably even if they dropped DOS support, something between
DOS and now likely relies on that. It's a fine line.

------
pvg
In the Mac System 7-ish days, people used to earnestly warn each other not to
name a file '.Sony' (a special name reserved for the floppy driver) as it
supposedly trashed your HD. Although I've never heard of anyone reproducing
it.

~~~
db48x
Now's your chance:
[https://archive.org/details/mac_MacOS_7.0.1_compilation](https://archive.org/details/mac_MacOS_7.0.1_compilation)

I'd try it myself, but I've only got my phone with me.

~~~
pvg
7.0.1 might be a little late, at least, for the supposed catastrophic results.
It doesn't like the file at all but nothing dreadful seems to happen.

~~~
db48x
I tried it in that and System 6
([https://archive.org/details/mac_MacOS_6.0.8](https://archive.org/details/mac_MacOS_6.0.8));
System 6 actually didn't care at all. An interesting bug.

I haven't actually done so, but earlier versions are available if you know
where to look
([https://archive.org/details/mac_Paint_2](https://archive.org/details/mac_Paint_2)).

------
yrashk
Wouldn't it make sense for Cargo not to use crate names in file names, and use
hexadecimally encoded hashes instead?
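
A small sketch of what that could look like, under loud assumptions: the
function name `hashed_file_name` is made up, and FNV-1a (64-bit) is used only
because it fits in a few lines of dependency-free Rust. A real registry would
presumably want a cryptographic hash to make collisions a non-issue.

```rust
/// Maps a crate name to a fixed-width hexadecimal file name.
/// Illustration only: FNV-1a is not collision resistant.
pub fn hashed_file_name(crate_name: &str) -> String {
    // Standard 64-bit FNV-1a parameters.
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;

    let mut hash = OFFSET_BASIS;
    for byte in crate_name.bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(PRIME);
    }
    // Always 16 characters from [0-9a-f], so the result can never be one of
    // the short reserved device names.
    format!("{:016x}", hash)
}
```

The trade-off is that the index stops being browsable by name, so the crate
name would have to live inside the file rather than in its path.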

~~~
brianberns
Yes, or some other identifier that's unique to that crate. Assuming that the
crate name is also a valid file name seems risky.

------
ziikutv
What did you end up calling the new crate?

Edit: I suggest "terminated"

~~~
filleokus
The .toml file in the master branch on GitHub seems to still call it "nul":
[https://github.com/SSheldon/nul/blob/master/Cargo.toml](https://github.com/SSheldon/nul/blob/master/Cargo.toml)

I can't find it on crates.io though.

~~~
ziikutv
I guess we can say it's to be determined or entirely scrapped.

------
Strilanc
Urgh, this "nul" filename / reserved filename bug is probably in a _lot_ of
software.

~~~
tannhaeuser
Every MS-DOS programmer of old knows about nul, con, and the other reserved
names. Those might come from CP/M actually (so are even older), and Atari TOS
had them as well I believe.

------
roryisok
I was working on a video project for a local comics convention, and named the
project file "con.proj". That file hung around until I upgraded my hard drive
because no file manager could delete it.

~~~
stepik777
You can remove such files using Bash on Windows. `rm con.proj` works just
fine.

~~~
roryisok
This was years ago, I'm sure there were ways to do it but I didn't know about
them.

------
alkonaut
It's very tricky to do cross platform file handling stuff, and only the most
mature projects have ironed this out. Just look at your pet project and see if
it handles

- Windows and unix line breaks in text files

- Windows and unix path separators

- BOM and non-BOM marked files if parsing UTF

- Forbidden filenames such as in this article

By "handling" I mean it should accept or fail nicely on unexpected input -
e.g. say that line breaks should be unix style, or paths should use backslashes
etc. Very few projects actually do this well. Even fewer will do even more
complex things like handling too long paths with nice error messages etc.
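
As a rough illustration of the first and third bullets, here is a minimal Rust
sketch that accepts both kinds of input instead of failing on them. The name
`normalize_text` is hypothetical, and it assumes the input has already been
decoded as UTF-8; path separators and forbidden filenames are not covered.

```rust
/// Strips a leading UTF-8 byte order mark, if present, and converts
/// Windows line breaks (CRLF) to unix line breaks (LF).
pub fn normalize_text(input: &str) -> String {
    // A UTF-8 BOM decodes to the single code point U+FEFF.
    let without_bom = input.strip_prefix('\u{FEFF}').unwrap_or(input);
    // Only CRLF pairs are rewritten; a lone CR is left untouched.
    without_bom.replace("\r\n", "\n")
}
```

A stricter tool could instead reject the unexpected form with a clear error,
which is the "fail nicely" option mentioned above.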

~~~
tannhaeuser
Since Windows 10 now comes with an official Linux subsystem, why not just use
POSIX APIs and conventions everywhere, and not bother with Windows-specific
code if possible?

~~~
untog
For one, because the Linux subsystem is an optional install. If you're making
anything user-facing you can't rely on it being there - it's really a tool for
developers, not end-users.

~~~
majewsky
I'm pretty sure there are already programs out there whose install
instructions include activating developer mode and installing WSL Ubuntu.

------
encryptThrow32
I recommend ':?', as it will work in POSIX, but not Windows.

~~~
akerro
This sounds like end of Rust for next month for me...

------
msimpson
While I know nothing of Rust, Diesel, or CrateDB, I do know that Windows uses
a case-insensitive file system and this fix doesn't seem to take that into
consideration. However, the author of the fix does note:

> I believe crates.io's namespace is case insensitive let me know if that's
> wrong

Someone should probably validate that.

~~~
zyx321
Not quite. Windows uses a case insensitive API on top of a case sensitive file
system.

FUN FACT: As of 2017, Windows 10 is (partially) binary-compatible to Ubuntu
Linux. Any application that was originally compiled for Linux will still be
case sensitive when running on Windows 10.

~~~
msimpson
Right. I forgot NTFS is in fact case sensitive. Thanks.

------
toabi
I tried `npm install nul` on my win7 VM and it created a folder called nul
which I can't get rid of ¬_¬

~~~
toabi
Well, notably `npm uninstall nul` gets rid of it. But in the Explorer you
can't do anything with it.

------
hmottestad
Makes me wonder if I can make a crate called "../.." and have it overwrite
some user files.

~~~
kibwen
Crate identifiers are required to be [a-zA-Z] for the first character and
[a-zA-Z0-9_-] for the rest. So, no. :P
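
A minimal sketch of that rule in Rust, taking the two character classes above
at face value. The function name `is_valid_crate_name` is hypothetical and
this is not the actual crates.io code, which also has to consult the list of
reserved names.

```rust
/// Checks a name against the rule described above: first character in
/// [a-zA-Z], every following character in [a-zA-Z0-9_-].
pub fn is_valid_crate_name(name: &str) -> bool {
    let mut chars = name.chars();
    // An empty name, or one starting with anything but an ASCII letter, fails.
    match chars.next() {
        Some(first) if first.is_ascii_alphabetic() => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-')
}
```

Under this rule "../.." and "/home" are rejected, but "nul" passes, which is
why the reserved-name list is needed on top of the character check.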

~~~
aisofteng
More succinctly: [A-z]+[A-z0-9_-] _

~~~
kibwen
If you look closely at your comment, you'll realize the hilarious HN bug that
prevents me from writing it as you suggest. :P

~~~
majewsky

      [a-zA-Z][a-zA-Z0-9_-]*
    

FTFY

------
lsiebert
Hmm... I feel like someone should stick the reserved names into a json
somewhere for easy reference for the next package manager.

------
joshu
Remember not to name anything pr#6 either...

------
HedleyLamar
Don't they have a continuous integration system where they run the unit tests
on all platforms for all checkins to master?

~~~
steveklabnik
Yes. Why do you ask?

~~~
akerro
Obviously, he wanted to know.

