Cargo is now becoming stronger and more stable because bugs like this are being discovered. All software goes through this growth cycle. It's great to see these things worked out in the various projects that support Rust.
There is another point here, though: any time the question comes up of just rewriting a piece of software and throwing out all the technical debt, it's not as straightforward as it seems. Remember, alongside that technical debt lies a lot of valuable learning written into the code. I haven't worked on Windows directly in years, and I never knew that NUL was a reserved name for a file. I would, and probably still will, make this mistake in the future.
Which makes me wonder: has anyone written a filename validation crate that guarantees you're not writing to any reserved names on the host OS's filesystem? A quick search of crates.io doesn't turn anything up.
It's nice that Rust's toolchain is better able to live with Windows' crazy ecosystem, but that doesn't make Windows any less crazy.
Transitions are nice from a development perspective, but I can guarantee you'll never hear someone who uses your library say they're happy about needing to rewrite parts of their code.
Also, Windows doesn't have a monopoly on bizarre filenames/features/etc.; you can find plenty of things in the *nix family as well.
Lastly, Rust is one of the few projects I've seen that has phenomenal Windows support. It's something that's really appreciated and is going to help them capture markets that other software won't.
Misreading. GP talked about MS's lazy long-term users; "lazy" applies to the users, not to Microsoft.
Like what? I'm not aware of special file names in arbitrary directories. Only in known/documented ones like /proc or /dev.
I'd say *nix OSes are too lax in what they allow: anything without a zero byte or a slash is a valid filename.
The fact that multiple slashes are sometimes normalized to be the same as one trips up security code, or code trying to validate that some particular file isn't used (e.g., checking that the filename doesn't start with /dev, or any of a list of other blacklisted directories, will fail if the user passes //dev).
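A minimal Rust sketch of that failure mode; the blacklist and the normalization loop are illustrative, not taken from any real project:

```rust
// Illustrative only: a naive prefix blacklist vs. one that collapses
// repeated slashes the way the kernel does during path resolution.
fn naive_is_blacklisted(path: &str) -> bool {
    ["/dev", "/proc", "/sys"].iter().any(|&p| path.starts_with(p))
}

fn collapse_slashes(path: &str) -> String {
    let mut out = String::new();
    let mut prev_slash = false;
    for c in path.chars() {
        if c == '/' {
            if !prev_slash {
                out.push(c);
            }
            prev_slash = true;
        } else {
            out.push(c);
            prev_slash = false;
        }
    }
    out
}

fn checked_is_blacklisted(path: &str) -> bool {
    naive_is_blacklisted(&collapse_slashes(path))
}

fn main() {
    // "//dev/null" and "/dev/null" name the same file, but the naive
    // check only rejects one of them.
    assert!(!naive_is_blacklisted("//dev/null"));
    assert!(checked_is_blacklisted("//dev/null"));
    println!("ok");
}
```

A real validator would also want to resolve symlinks and "." / ".." components (e.g. via canonicalization) before comparing.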
Symlinks! Oh, gosh, symlinks. Were this not a stream-of-consciousness dump they probably should come first. You can do terrible things with symlinks, like upload a tarball or zip file that creates a symlink to an arbitrary location in the system, then use that symlink reference as a directory reference to plop a file down. (Some archivers prevent this, others don't.)
Also, /dev is just a convention, it's possible to place device nodes anywhere you want.
You can also pretty much mount arbitrary things in arbitrary places via bind mounts. Hard links can also cause some fun with code that assumes file systems aren't cyclic. Windows technically has a lot of these features too, but they're harder to get to and less well known, whereas the various kinds of links are used in base Linux installs and are readily available.
"More Taste: Less Greed? or Sending UNIX to the Fat Farm" describes a V7 derivative that had /dev/deuna, /dev/arp, /dev/ip, and /dev/udp.
12738 Jul 25 1985 usr/sys/inet/tcp_device.c
13461 Aug 6 1986 ./sys/inet.old/old/tcp_device.c
13457 Feb 3 1987 ./sys/inet.old/tcp_device.c
13457 Feb 24 1987 ./sys/inet/tcp_device.c
13542 Feb 20 1990 sys/inet/tcp_device.c
13622 Mar 9 1992 sys/inet/tcp_device.c
Edited to fix formatting.
Here are the commands I used to identify the right file.
find . -type f -print0 | xargs -0 grep -I "/dev/tcp" | less
Edited to add the command sequence for the historical record.
Edited again to fix wording of the first sentence.
You would be correct in then pointing out that if you pass user parameters to bash without treating them as carefully as you'd treat radioactive waste, you're asking for trouble, and that /dev/tcp doesn't offer much that the various "nc"s don't. That's why I was sort of non-committal about condemning them; it's not like they are a massive breach of security. It's just one more thing that can surprise people if they're trying to lock a system down, and that's already a pretty long list. And since it's not clear to me that it could ever be a short list, that's why I wanted to emphasize I wasn't trying to condemn UNIX. It's just a feature that adds little but complexity to bash, while not really offering any functionality that isn't better done with nc or something, and on balance it probably ought to be removed from an already complicated and security-sensitive program.
I agree that having this as a bash feature versus just using nc doesn't seem to buy much. But I think having these in the actual file system is useful. So why not do both: expunge them from bash, and get them into /dev (or maybe /net, or wherever they belong).
I rest my case. ;-)
What do we gain from having NUL everywhere, as opposed to having it in only one specific location, e.g. root?
Also, as an aside, I thought it wasn't a magic file (nul), but rather a magic device (NUL:), which IMO makes a lot of sense.
For a while, if you were in a team in which some developers were on Linux and others were on Mac OS X, and someone on the Linux side checked in a file named with a diacritic, on the Mac OS X side the file appeared to have been deleted (and a new untracked file with the "same name" appeared). Later git grew special code to work around this misfeature.
And yes, Linux has the "bizarre feature" of being way too permissive. A filename is a sequence of bytes of which only the null byte and the slash are forbidden, and only a single or double dot has special meaning; one can have files named with control characters, and/or with something which is not valid in the current character encoding (LC_CTYPE), leading to pain for languages which insist that a string must always be valid Unicode (this includes Rust).
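Rust's side of this can be seen directly with OsStr, which on Unix holds arbitrary bytes; the filename below is made up for illustration:

```rust
// Unix-only illustration: OsStr can hold bytes that are not valid
// UTF-8, because Linux allows any byte except NUL and '/' in names.
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;

fn weird_name() -> &'static OsStr {
    // 0xFF is never valid in UTF-8, yet this is a legal Linux filename.
    OsStr::from_bytes(b"report-\xff.txt")
}

fn main() {
    // Conversion to &str fails: the name is not valid Unicode...
    assert!(weird_name().to_str().is_none());
    // ...and lossy conversion substitutes U+FFFD for the bad byte.
    assert_eq!(weird_name().to_string_lossy(), "report-\u{FFFD}.txt");
    println!("ok");
}
```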
But yeah, nothing compares to the madness that is forbidding simple names like "nul" or "con" or "aux" (alone or followed by any extension) in every single directory, made worse by the fact that you can create files with these names if you use a baroque escaping syntax (which is not available in every API), confusing every other program that does not carefully do the same.
And let's not forget about the fact that the file you just created might not be readable or writable the next instant, because some other process (usually some sort of "antivirus") decided to open it in an exclusive mode. I've seen several projects add retry loops when opening (or moving, or deleting) a file on Windows, to work around that issue.
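A sketch of what such a retry loop might look like in Rust. It assumes Windows' ERROR_SHARING_VIOLATION (OS error code 32); the attempt count and 100 ms back-off are arbitrary choices, not from any particular project:

```rust
use std::fs::File;
use std::io;
use std::thread::sleep;
use std::time::Duration;

// Windows reports a file opened exclusively by another process as
// OS error 32 (ERROR_SHARING_VIOLATION).
const SHARING_VIOLATION: i32 = 32;

fn open_with_retry(path: &str, attempts: u32) -> io::Result<File> {
    let mut last = None;
    for _ in 0..attempts {
        match File::open(path) {
            Ok(f) => return Ok(f),
            // Another process (antivirus, indexer) holds the file
            // exclusively: back off and try again.
            Err(e) if e.raw_os_error() == Some(SHARING_VIOLATION) => {
                last = Some(e);
                sleep(Duration::from_millis(100));
            }
            // Any other failure (missing file, bad permissions) is
            // returned immediately.
            Err(e) => return Err(e),
        }
    }
    Err(last.unwrap_or_else(|| io::Error::new(io::ErrorKind::Other, "no attempts made")))
}

fn main() {
    // A missing file is not a sharing violation, so we fail at once.
    assert!(open_with_retry("no-such-file-anywhere.txt", 3).is_err());
    println!("ok");
}
```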
I was under the impression that the new APFS stopped trying to understand the bytes in filenames at all, thereby switching from 'confusion' to [tableflip] as a policy (which is likely an improvement, but it also amuses me; it's nice to know [tableflip] is about the only response anybody has to certain unicode-isms).
> And let's not forget about the fact that the file you just created might not be readable or writable the next instant, because some other process (usually some sort of "antivirus") decided to open it in an exclusive mode.
The problem on Windows is that too many APIs decided that exclusive should be the default mode if none is specified. That's the safer choice in the sense that it gives the most guarantees (and the least surprise) to the caller, but arguably the adverse effects it causes in other apps are more surprising and harmful in the end.
Each OS has its set of weird, broken, and surprising behavior, most of it in the name of backwards compatibility. One group of people finds one mess bearable and all the others totally brain-dead; other groups have somewhat different opinions.
Everything sucks. Which one sucks less? I pick the one that I know more about.
But actually, I was wrong - GNU make passes the strings it executes to the shell, so you can use nested quotes: CFLAGS='"-I/path/with/spaces"'. Not sure why I thought differently. The shell itself doesn't work this way, though: when it splits a variable into multiple arguments, it just splits on spaces rather than doing any fancier processing. So there are still issues with shell scripts.
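That splitting behavior can be sketched like this (Rust used for illustration; real shells split on IFS, which is whitespace by default, and do not honor quotes embedded in a variable's value):

```rust
// Mimics the naive field splitting the shell performs on an unquoted
// variable expansion: embedded quotes are not re-parsed, so a single
// flag containing a space breaks into two arguments.
fn shell_like_split(s: &str) -> Vec<&str> {
    s.split_whitespace().collect()
}

fn main() {
    let cflags = "-I/path/with spaces";
    // One intended flag becomes two arguments, which is exactly the
    // breakage described for shell scripts.
    assert_eq!(shell_like_split(cflags), vec!["-I/path/with", "spaces"]);
    println!("ok");
}
```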
What constitutes "bizarre" depends a lot on what your prior assumptions are.
The received wisdom of the '90s is wrong. Most users don't care about compatibility, as Apple's success has clearly shown, and most companies are now following the Apple road.
Large enterprises care about compatibility, and they pay a lot, but this is not a forward-looking market. They'll keep buying new versions of your software because of the compatibility, but if compatibility is the only story you have to offer, you'll slowly lose that market.
I completely agree with you that Microsoft should have had a strategy for deprecating these features back in the 90s, when they were already old.
In this specific case of outdated filename restrictions, you could start with what they already did:
Windows NT 3.5 - Allow accessing all filenames with a special prefix (which they already did).
Windows NT 4.0 - Make it easy to migrate to sane filenames by providing an opt-in per-process flag that makes all APIs use them by default.
At this point they can easily dogfood and migrate all Microsoft software to the new APIs, so you would be able to delete these pesky files in explorer.
Windows 2000 - Make the new API flag default for all versions compiled with the latest version of the Windows SDK.
Windows XP - Make the new API default for any app without a special entry in the compatibility database.
Somewhere along the road, batch files (which is the only place where compatibility with the old filenames was necessary) could easily be made compatible by modifying the batch parser to replace redirections to NUL with redirections to \\?\devices\null or something akin to it. You may see some breakage in scripts which use NUL and CON in a non-standard way (e.g. as an argument), but the migration pain wouldn't be huge, and you could still save an old script with a compatibility flag.
Microsoft obviously didn't take that path, and yeah, all the batch files written back in 1981 may still work without a hitch, but newer things keep breaking in strange ways.
(I wonder if part of this is the rage of Unix fans discovering that portable means actually, you know, making an effort... and that there's more to it than just checking it builds on x86 Debian as well as x64 Ubuntu...)
When I hear portable, I immediately think of the Portable Operating System Interface.
Apple is not exactly big in the same markets where MS is big, e.g. enterprise. So while I agree that "most" users don't care, the very few who do care might be important customers for MS.
The only MacBooks I've seen at the various meetups I've been to were at 'hip' dev shops.
by having 3% of the desktop market and 10% of the smartphone market?
And 7% of the PC market:
And greater than 10% in the US.
I would assign more meaning to cash hoard:
Microsoft: ~$100 billion. Apple: ~$250 billion.
It's not. It's a reserved name from the MS-DOS file redirection facilities. If you use the newer file API or the \\?\[path] convention, the reserved words are not an issue and you can create files named for them.
While we're here: CON, PRN, AUX, NUL, COM<n>, and LPT<n> are reserved.
Worse: they're reserved with any extension. Have a file in your repository called "aux.rs"? It will cause problems on Windows.
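A hypothetical validator for these names might look like this in Rust (not an existing crate; the rules encoded are the commonly documented ones: reserved in any directory, case-insensitively, with or without an extension):

```rust
// Hypothetical helper: does this filename collide with a
// Windows-reserved device name?
fn is_windows_reserved(name: &str) -> bool {
    // Only the part before the first dot matters: "aux.rs" is as
    // reserved as "aux".
    let stem = name.split('.').next().unwrap_or(name);
    let upper = stem.to_ascii_uppercase();
    match upper.as_str() {
        "CON" | "PRN" | "AUX" | "NUL" => true,
        _ => {
            // COM1..COM9 and LPT1..LPT9: three letters plus one digit.
            (upper.starts_with("COM") || upper.starts_with("LPT"))
                && upper.len() == 4
                && upper.as_bytes()[3].is_ascii_digit()
        }
    }
}

fn main() {
    assert!(is_windows_reserved("aux.rs"));
    assert!(is_windows_reserved("NUL"));
    assert!(is_windows_reserved("com1.txt"));
    assert!(!is_windows_reserved("null"));  // four letters, not reserved
    assert!(!is_windows_reserved("com10")); // only a single digit counts
    println!("ok");
}
```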
I think you need to be a wee bit pro-active and take a look at your potential deployment targets and try and guard against these types of naming issues. Unix and Linux aren't the only (one true) operating systems in the world.
QString appDataDir = QStandardPaths::writableLocation(QStandardPaths::AppDataLocation);
// ~/Library/Application Support/<APPNAME> on macOS
// C:/Users/<USER>/AppData/Roaming/<APPNAME> on Windows
// ~/.local/share/<APPNAME> on Linux
Also, those locations are user-specific; there's nothing there to support the use case of an app that's available to all users, or that might just be a system service (daemon).
COPY CON MYFILE.TXT
type con > myfile.txt
Also, have fun trying to delete C:\Program Files\Xerox ;-)
In fact, it is possible to create files named NUL, COM1, etc. using the \\?\ prefix (e.g. "\\?\C:\NUL" is a valid path), which disables the arcane Win32 parsing of magic filenames. Unfortunately, these files cause strange behaviour in applications that don't use that prefix, Explorer included.
Crate names have to be one or more valid idents connected by hyphens, so no other clever names like `/home` would be possible to upload. We already had some crate names reserved and we just needed to add these to the list.
And because it was a weekend, much of that time involved me trying to figure out who had the proper credentials for crates.io, and then texting those people until one of them responded. :)
It sounds like the concern you're describing is a different matter. It's likely true that if the source of a crate contains a file named "nul.rs", cargo on Windows will fail when it attempts to git-fetch the source (unless you're using the Windows Subsystem for Linux, anyway). While this would indeed be a problem, it would only affect users who elect to use specific libraries, rather than serving as a denial-of-service for every Rust user on Windows.
Looking at the repo you linked, there's no allowance for that, so at least in this case you should be safe.
null is not a problem, but null.null, on the other hand...
<foo> to keep the number of files in a single directory down
<foo> tools become unhappy with hundreds of thousands+ of things in a single dir
<foo> as do filesystems
<bar> why not just a flat file
<bar> or sqlite or whatever
<qux> right now it uses git's deduplication feature
<qux> aka, when downloading updates you only download the objects that changed
<qux> but it mostly works on a per file basis
<qux> so git hashes each file and if the hash didnt change, it doesnt download an update
<qux> but if it did, it treats it as completely new file, even if its just a little change
<wycats> Because of this: https://github.com/CocoaPods/CocoaPods/issues/4989#issuecomment-193772935
<wycats> I ran some scenarios against huge repos when I first worked on cargo
<wycats> Trying to minimize the cost of operations
<wycats> I landed on the current strategy, and GitHub in the above thread more or less endorsed what we were already doing at that time
<wycats> Also see https://github.com/rust-lang/cargo/issues/2452
It sounds like the biggest win would be for cargo to keep using git, but clone the crates.io index as a bare repository rather than checking out the plaintext content. Then it would only take 47MB by your count, which is pretty close to 33MB, and you could still get out the plain content with `git cat-file` and friends.
Alternately, since JSON is text, I suppose you could just ensure that whatever emits this hypothetical merged JSON file puts newlines between different packages' entries, and then use a regular text diff (on the uncompressed version, of course). But reading 44MB of JSON isn't instant; it would probably be better to switch to either a binary format, or even something silly like a sorted list of JSON strings separated by newlines.
There would be some incidental complexity around generating and applying the diffs… you'd probably want to precalculate them on the server side, but it could be rather expensive to, on every change, calculate a diff between the current version and every previous change. Instead, you could have daily checkpoints: each day the server would make a checkpoint and calculate a diff to the last N checkpoints; on every update the server would recalculate the diff between the latest checkpoint and HEAD. The client would store both HEAD and a reverse diff to the latest checkpoint (or just store the checkpoint separately and waste a few MB), so when it updates, it could revert to that checkpoint and request the diff from there to the new latest checkpoint; it would also request the diff from the checkpoint to the new HEAD. If its checkpoint is too old then it would just redownload from scratch.
Overall, not a trivial change, but probably not too hard either.
apt-get does something vaguely similar with its pdiff files.
I know that some of the people who worked on cargo originally had experience with other package managers - mainly bundler - and I believe bundler used to use a single file, but ran into performance issues.
Anyways, it shouldn't really have mattered that Microsoft didn't care much for this for a long time. If everybody else had just been using the prefix since Windows XP came out then Microsoft would soon have been forced to change their own software as well.
For example, with Python and non-MS C (e.g. clang and gcc on MinGW), they should simply have made the standard libraries implement the file API using the prefix. Of course, if you need to call CreateFile directly you would still be on your own, but if everything else created files you couldn't interact with, then you would probably fix the problem.
I'd try it myself, but I've only got my phone with me.
I haven't actually done so, but earlier versions are available if you know where to look (https://archive.org/details/mac_Paint_2).
Edit: I suggest "terminated"
I like terminated! Good suggestion :D
I can't find it on crates.io though.
- Windows and unix line breaks in text files
- Windows and unix path separators
- BOM and non-BOM marked files if parsing UTF
- Forbidden filenames such as in this article
By "handling" I mean it should accept or fail nicely on unexpected input - e.g. say that line breaks should be unix style, or paths should be backslashes etc. Very few projects actually do this well. Even fewer will do even more complex things like handling too long paths with nice error messages etc.
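The line-break item on that list, for instance, can be handled by normalizing on input; a minimal sketch:

```rust
// Accept both Windows (\r\n) and Unix (\n) line endings by
// normalizing everything to \n on input.
fn normalize_newlines(input: &str) -> String {
    input.replace("\r\n", "\n")
}

fn main() {
    assert_eq!(normalize_newlines("a\r\nb\nc\r\n"), "a\nb\nc\n");
    println!("ok");
}
```

A fuller version would also decide what to do about bare \r (classic Mac endings) and a leading BOM, per the other items on the list.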
git does not support UTF-16LE. The result is that UTF-16LE encoded files will be mangled by the line ending conversion. There is at least one generated Visual Studio file (GlobalSuppressions.cs) that is saved in UTF-16 by default.
I think "if possible" is (at least still) very rarely the case.
And speaking of UI, that's one major hurdle right there.
And Mac classic linebreaks (\r only), for that matter.
> I believe crates.io's namespace is case insensitive let me know if that's wrong
Someone should probably validate that.
FUN FACT: As of 2017, Windows 10 is (partially) binary-compatible with Ubuntu Linux. Any application that was originally compiled for Linux will still be case-sensitive when running on Windows 10.
The former includes a few punctuation characters that are between "Z" and "a".
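Those characters can be listed directly, assuming plain ASCII byte-wise ordering:

```rust
// The six ASCII characters that sort between 'Z' (0x5A) and 'a'
// (0x61): byte-wise ordering places them after every uppercase
// letter and before every lowercase one.
fn between_upper_and_lower() -> Vec<char> {
    (b'Z' + 1..b'a').map(char::from).collect()
}

fn main() {
    assert_eq!(between_upper_and_lower(), vec!['[', '\\', ']', '^', '_', '`']);
    assert!('_' > 'Z' && '_' < 'a');
    println!("ok");
}
```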