Hacker News
The weird world of Windows file paths (fileside.app)
435 points by sharevari on April 21, 2023 | 285 comments



After years of working exclusively on Windows, I took a job that required me to build file management again, this time on macOS and Linux as well as Windows.

All I can say is, this article is the tip of the iceberg on Windows I/O weirdness. You really don't realize how strange it is until you are actively comparing it to an equivalent implementation on the two other competing operating systems day-to-day.

Each of them has their quirks, but I think Windows takes the cake for "out there" hacks you can do to get things running. I sometimes like to ponder what the business case behind all of them was, then I Google it and find the real reasons are wilder than most ideas I can imagine.

Fun stuff!


"get-a-byte, get-a-byte, get-a-byte, byte, byte" - Dave Cutler https://retrocomputing.stackexchange.com/questions/14150/how...

>Windows I/O weirdness. You really don't realize how strange it is until you

"can I get a wut... WUT?" - Developers, Developers, Developers

====

It's just hard for people today to understand what a revolution stdin and stdout were: full 8-bit, but sticking with ASCII as much as possible. There was nothing about them that limited Unices from having whatever performant I/O underneath, but they gave programmers at the terminal the ability to get a lot done right from the command line.

The web itself is an extension of stdin and stdout, and the subsequent encrusting of simple HTTP/HTML/et al standards with layer upon layer of goop that calls for binary blobs can be seen as the invasion of Cutlerian-like mentalities. It's sad that linux so successfully took over that all the people who we were happy to let use IIS and ASP and ActiveX had to come over to this side with their ideas. No idea of which is bad, but which together are incoherent.

client server FTW, bring it back


"No idea of which is bad, but which together are incoherent."

Right, my complaint about filename/path length being exceeded in my earlier post often occurs when a web page is saved by a browser (some web pages have outrageously long filenames).

Incidentally, many years ago I did a tour of Microsoft's operation in Seattle around the time Microsoft introduced subdirectories into MSDOS and the tour guide (can't recall his name but he was responsible for the development of MS's Flight Simulator) gave a considerable spiel about why Microsoft decided to run with the backslash instead of the forward one as per Unix. Even then, I thought 'oh no, here comes confusion', and others with me thought the same. When we challenged him about it, he said we (Microsoft), want to clearly differentiate ourselves from Unix (there was an arrogance about his answer that I well remember).


Flight Simulator was developed outside Microsoft, but within Microsoft you could only be referring to Alan Boyd. I received a similar tour.

I'm sure you heard what you thought you heard, arrogance and all, but IIRC it was backslash because IBM insisted the slash be the switch character. Anybody remember $SWITCHAR?


I know Flight Simulator was originally developed outside MS and I think my tour wasn't that long after MS acquired it.

It's too long ago for me to associate the name 'Alan Boyd' with the person in question but I do remember that he had a loud, penetrating self-assured voice. (Incidentally, he spent considerable time demonstrating Flight Simulator's new features).

You're right, IBM was a large part of the discussion as back then it was the principal client for MSDOS. However, I came away from the visit with the understanding that MS was in full agreement with IBM's decision despite MS's dabblings with Unix.

I had a particular interest at the time as I had a S-100 Godbout CompuPro computer, and in addition to CP/M, I ended up putting Seattle Computer Products' DOS (SB86 from Lifeboat Associates) on it which meant that I had compatibility with MSDOS.

I can understand why MS would have wanted to differentiate MSDOS, with the backslash being one way to do it; what I'm still not clear about is why IBM would have wanted to make such a distinction.

Re $SWITCHAR, very vaguely, but from my point it added confusion. Like some other commands its implementation and architecture appeared to be the result of afterthought rather than good design. I've forgotten much of that stuff.


At least Windows paths are not as clunky as Cutler's prior operating system, VAX/VMS Version 1.00 21-AUG-1978 15:54.

  $ create/directory dm0:[foobar]
  $ set default dm0:[foobar]
  $ copy DM0:[SYSMGR]SYSTARTUP.COM example.txt
  $ dir/full

  DIRECTORY DM0:[FOOBAR]
  21-APR-23 14:42

  EXAMPLE.TXT;1 (615,2) 0./0. 21-APR-23 14:42 [1,4] [RWED,RWED,RWED,RE]

  TOTAL OF 0./0. BLOCKS IN 1. FILE


I rather liked VMS conventions as it was immediately obvious what was the directory part of the path and what was the filename and the device. You could create virtual devices as well, so for example, with VMS TeX, I had TEX_ROOT defined to point to the root of the TeX distribution and you would have TEX_ROOT:[INPUTS] for the input directory, TEX_ROOT:[MF] for Metafont files, TEX_ROOT:[EXE] for executables, etc. and everything was logically arranged. CLD files were another wonderful thing where you could define your CLI interface in an external file and let the OS handle argument and option parsing for you.


> I rather liked VMS conventions as it was immediately obvious what was the directory part of the path and what was the filename and the device.

1. unix doesn't expose devices in the file tree, and thank the [deity] for that

2. directory part: everything to the last /

3. filename: everything after the last /

4. TEX_ROOT=/foo/bar; cd $TEX_ROOT/MF

Could it be any easier?
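
A minimal sketch of points 2 and 3 in Python, assuming the standard posixpath module; the file name plain.mf is made up for illustration. The split is purely lexical, so it says nothing about whether the last component is a file or a directory.

```python
import posixpath


def split_unix_path(path):
    """Split at the last '/' without touching the filesystem."""
    directory, filename = posixpath.split(path)
    return directory, filename


# Point 4 from the list: a root held in a variable, then joined.
tex_root = "/foo/bar"
mf_dir = posixpath.join(tex_root, "MF")

# "plain.mf" is a hypothetical file name used only for this example.
print(split_unix_path(posixpath.join(mf_dir, "plain.mf")))
```

Calling split_unix_path("/foo/bar") gives ("/foo", "bar") whether or not bar is a directory, which is the ambiguity the VMS bracket syntax avoids.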


> unix doesn't expose devices in the file tree

I mean, they are exposed as files (e.g. /dev/sda) within the file tree, but they aren’t exposed by a file path.


Maybe parent is talking about the fact that you don’t know whether bar in foo/bar is a file or a directory?


On 2 & 3, what if you have a directory? There’s no easy way to tell whether /foo/bar is a directory or a file. FOO:[BAR] must be a directory.

I have a vague notion that there might have been the ability to have a virtual device span multiple directories, but as I think about it, this seems unlikely since it would make it ambiguous where something would be created if a directory exists in both root1 and root2, so perhaps not. It’s been 23 years since I last used VMS and 30 since it was my daily driver though, so it’s hard for me to say too much.


Clunky? Version control built in, distributed FS built in...


The downside of building in stuff is that you can't easily replace it. I don't have any VMS experience, but the versioned file systems I've studied are very far from being a replacement for what I would consider version control today. They're more like the automatic writing of backup files in Emacs.

Distributed FS sounds like something with a huge design space. It's better to do that in userspace. Was Plan 9 really the first time virtual file systems could be implemented in userspace? It seems like such an obviously useful idea in retrospect.


The diff3 utility for 3-way merge appeared after the VMS debut (1979).

https://en.wikipedia.org/wiki/Diff3

Can you really call what we see in VMS full revision control when it lacks this capability?


I’m not familiar with the inner workings, but simply moving files feels odd in Windows compared to macOS. It’s most obvious with a big move in terms of data size or file/folder counts: it feels like Windows literally copies the files into memory and then rewrites them to disk, or something similar, which results in poor performance and a long-running cut/copy/paste dialog box. I’ve had some of these run for hours on decent hardware (SSD, etc.) for what I consider small datasets (a couple of GB). It’s been a major Windows gripe of mine for years now.

Meanwhile macOS appears to just change an internal link to the data that’s already written on disk. As such, it’s usually so very fast compared to Windows.


Windows File Explorer does a lot of extra work to get a sense of file sizes and other metadata to try to keep the UI looking fresh/interesting/useful to someone watching the job in real time.

If you need to seriously move/copy lots of files or lots of data in Windows it is generally a good idea to use shell commands. Robocopy [1], especially, is one of the strongest tools you can learn on Windows. (It gets very close to being Windows' native rsync.)

[1] https://learn.microsoft.com/en-us/windows-server/administrat...


Windows does literally copy (parts of) files into memory. More precisely it's Windows Defender Real-Time Protection. It's a real menace when you're dealing with a lot of small files, e.g. node_modules.

Windows Explorer is also slow for an unknown reason.

Doing file operations through the API with Real-Time Protection turned off is several orders of magnitude faster in the case of small files. It's crazy stuff.


A lot of this depends on whether you're crossing devices. If you think of drive letters as mount points it may make more sense - if you're moving between mountable filesystems obviously a move has to be a copy-then-delete; if you're remaining on the same filesystem a move can typically be a rewriting of indexing information only with very limited data rewriting.
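
A rough sketch of that distinction in Python, assuming that comparing st_dev from os.stat is a good enough test for "same filesystem". The function names are made up for the example, and this is not how Explorer itself decides.

```python
import os
import shutil


def same_filesystem(path_a, path_b):
    """Assumption: equal st_dev values mean both paths share a filesystem."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def move_file(src, dest_dir):
    """Move src into dest_dir and report which strategy was used."""
    dest = os.path.join(dest_dir, os.path.basename(src))
    if same_filesystem(src, dest_dir):
        # Same filesystem: only directory/index entries change.
        os.rename(src, dest)
        return "rename"
    # Different filesystem: the data has to be copied, then the source removed.
    shutil.copy2(src, dest)
    os.remove(src)
    return "copy+delete"
```

The standard library's shutil.move follows the same idea: it tries a rename first and falls back to copying and deleting.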

One other thing that can be an issue particularly on NTFS with ACLs is that moving files typically retains their ownership and permissions, while copying files typically inherits the ownership and permissions of the destination. This can bite you if as an administrator you're moving data from one user's account to another because a move will leave the original owner still owning the files.


That's likely to be due to the NTFS file system, another piece of legacy that Windows drags along.


Eh, with moving files on Windows in the same security context it is 'generally' pretty fast on the same drive.... Are you sure you didn't paste in a directory that is setting new security permissions on all files?


Been a while since I looked at the details, but Explorer's file management is generally slow compared to what you can do with the actual Win32 APIs.


> All I can say is, this article is the tip of the ice berg on Windows I/O weirdness. You really don't realize how strange it is until

In one way it is beautiful. "Laid cards lies", you know. Don't mess with the user for your conception of Agile Clean Extreme Code (tm). Each stupid design decision is forever.

Windows .bat files win the awfulness and quirkiness contest with sh by a razor-thin margin. And both are awesome.


Windows has PowerShell these days, and it is actually usable and not at all awful (if a little quirky) compared to bat files.


PowerShell is an absolutely amazing scripting language; it gets things done way quicker than Bash, because it's object oriented, and you don't have to call external tools to get anything done (sed, grep, find, touch, curl, etc.). It can even run raw C# code for you.


This definitely falls into the category for me of "things that I wasn't there for."

Because I learned computers when DOS was a thing, I will always be able to write a .bat or use CMD when necessary, but having been on the UNIX/Linux side since 2003, I didn't learn C# or PowerShell but rather bash, php, ruby. So while I'm friendly with modern Windows now that they closed up the "is it a stable OS" gap with Apple, I don't really know what to do in PowerShell and am more likely to use WSL!


PowerShell is the closest thing to the Xerox PARC REPL experience that ships in the box on modern platforms.

Not only is it a proper programming language, it is also integrated with .NET and COM/DLLs, so you can script not just the OS but any application automation library that is exposed as well.

Nowadays, it is possible to automate anything on Windows via PowerShell, the same OS APIs exposed by GUIs are also accessible to PowerShell.

On the UNIX side there are things like Fish shell that also offer these capabilities, but they aren't as widely adopted as PowerShell is on Windows.


Sorry, but Powershell is not a programming language. It has way too many quirks and gotchas to qualify as a programming language. It is an interactive scripting language first, and a scripted language second. But a programming language it is not.

To give just one example: its automatic boxing and unboxing of arrays disqualifies it as a programming language. Try to return a one-element array from a Powershell function and you'll see what I mean.


It's worth learning at any age, especially now that it is an open-source, cross-platform shell. The PS Koans [1] that recently showed up on HN seemed an interesting way to try to learn it.

[1] https://github.com/vexx32/PSKoans


I just want a shell that runs my commands, I don't want yet another language.

The _beauty_ of bash is you can learn the basics of the language very easily, call out to external tools, and _take that knowledge with you_. Those tools exist independently.


I’ve tried to switch to Powershell a few different times and I always find it to occupy this no man’s land between a quality shell and a quality scripting language. As a shell I find it inferior to BASH and as a scripting language I find it inferior to Python.


“Powershell(tm): Not Entirely Awful (if a little quirky)!”

Seems worth investing a lot of time into given Microsoft’s history of not rug pulling developers.


The first release was 16 years ago and they're still making new releases of it so I'd say it's definitely here to stay ;)


Also it is MIT-licensed open source today: https://github.com/PowerShell/PowerShell


And cross platform: https://learn.microsoft.com/en-us/powershell/scripting/insta...

Not that it’s a particularly compelling feature on Linux with the standard offering, but it’s a good option for cross platform scripts at times, particularly running in docker.


I mean, if I want something that can run on as many platforms as possible without a prior installation, I stick as closely as I can to posix sh. If I want something more flexible that can run consistently, but may require an installation beforehand, I use python. I don’t really see what niche PowerShell would fill for me.


It doesn't have to fill a niche for you. Before cross-platform PowerShell I certainly used Python for some of those kinds of scripts.

I think a lot of it gets down to ergonomics/aesthetics to decide if you find a useful niche for PowerShell for yourself. Python's os module is powerful and lets you run/chain almost any native commands and shell operations you want to spawn, but it is still a very different API and abstraction with different ergonomics and aesthetics than shell-style pipes and redirects.

PowerShell gives you that focus on shell-like pipes/redirects, but then gives you some Python-like power on top of that to also work with the outputs of some commands as objects in a scripting environment. There's a lot of interesting value to comparing/contrasting PowerShell and Python and if you are happy with Python maybe there isn't a big reason to learn PowerShell. PowerShell is there for when you are doing a lot of shell-like processing pipelines and want to write them as such, but have some of that power of a language like Python behind it. It's a lot more powerful than posix sh and it is similarly but differently powerful to Python but it starts from a REPL that looks/acts more like posix sh. I don't know if you have a need for that niche yourself, but I find it useful for that.


A bit like finding in favour of London's old, awful slums by comparing them to the Somme.

"Well yes, kind of..."


PowerShell's biggest problem is error handling. There's just no equivalent to set -euo pipefail.

It ends up with manual error handling to handle the case of bat, ps1 and exe's


`$ErrorActionPreference = "Stop"` is very similar and does that for all of PS1 cmdlets. You still have to check $LastExitCode manually for BAT/EXEs, though.

`$PSNativeCommandUseErrorActionPreference = $true` is an experimental flag [1] as of PowerShell 7.3 that applies the same $ErrorActionPreference to BAT/EXEs (native commands), stopping (if $ErrorActionPreference is "Stop") on any write to stderr or any non-zero return value.

[1] https://learn.microsoft.com/en-us/powershell/module/microsof...


Having to handle exe's and bats separately is _exactly_ the problem with $ErrorActionPreference, and why it's not suitable.

I wasn't aware of $PSNativeCommandUseErrorActionPreference though, seems like it's very new. How does that work with the helpful windows tools that decide to not return 0 on success (hello, robocopy)


The answer to that, including robocopy as the direct example used, is at the bottom of that documentation I linked on $PSNativeCommandUseErrorActionPreference: you set it to $false before calling something like robocopy and then reset it when done.
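
For anyone checking exit codes by hand instead, here is a small Python sketch of the robocopy special case. It assumes Microsoft's documented convention that robocopy's exit code is a bit mask and that values of 8 or higher mean at least one failure; the function names are invented for the example.

```python
# Assumption taken from Microsoft's robocopy documentation:
# exit codes below 8 are informational, 8 and above signal failure.
ROBOCOPY_FAILURE_THRESHOLD = 8


def robocopy_succeeded(exit_code):
    """True when the code only reports copied/extra/mismatched files."""
    return 0 <= exit_code < ROBOCOPY_FAILURE_THRESHOLD


def describe_robocopy_exit(exit_code):
    """Decode the individual bits of the exit code into short labels."""
    labels = []
    if exit_code & 1:
        labels.append("files copied")
    if exit_code & 2:
        labels.append("extra files at destination")
    if exit_code & 4:
        labels.append("mismatched files")
    if exit_code & 8:
        labels.append("some copies failed")
    if exit_code & 16:
        labels.append("serious error")
    return labels or ["no change"]
```

So a generic "non-zero means failure" rule misreports an ordinary successful copy (exit code 1) as an error, which is why robocopy needs the opt-out.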


Sadly, the powershell windows ships with is 5.1.

Why?

Because of backwards Compatibilytytytyy


`winget install Microsoft.PowerShell`

5.1 was the last "Windows-specific"/"Windows-only" PowerShell (and is still branded "Windows PowerShell" more than "PowerShell") before it went full cross-platform (and open source). It's an easy install for PowerShell 7+ and absolutely worth installing. If you are using tools like the modern Windows Terminal and VS Code they automatically pick up PowerShell 7+ installations (and switch to them as default over the bundled "Windows PowerShell"), so the above command line really is the one and only step.


You can also install the latest PowerShell Core (the open-source, cross-platform releases we're talking about) via Scoop, which is a package manager for Windows that works even if you don't have admin rights: https://scoop.sh/#/apps?q=pwsh&s=0&d=1&o=true


Unless I can rely on it being somewhat available, it's not really feasible to use. It's a bit like writing scripts in fish because it's easily installable - nobody is going to use it.

Winget isn't bundled with windows 10 either, (but I think it is with 11), and it's not on windows server.

If I need to install a package manager _and_ a shell, I might as well just install WSL and be done with it.


Winget is auto-installed in Windows 10 by Windows Update and/or Store Update for every copy of Windows 10 with a recent enough build for more than a year or two, so long as that machine doesn't have the Store disabled or blocked. It is bundled inside the "Application Installer Platform" which is a low-level Store package that powers a lot of little things like the "double-click to install an MSIX file" experience and Windows generally keeps up to date quickly if Store updates aren't blocked.

I can't speak to your usage of Windows Server, but provisioning winget and PowerShell 7+ are standard bootstrapping steps in VM images at places I work, because those are generally assumed to be basic equipment at this point.


powershell is still built on crap

I had it randomly throwing exceptions the other day that a path was too long

(it was only about 300 characters...)


It also adds its own special brand of crap .. as in after trying 10x different ways (not kidding: https://social.technet.microsoft.com/wiki/contents/articles/...) of executing an external ffmpeg command over several hrs I eventually wrote a one line .bat file* and was done with it. Never again.

* for %%a in ("*.mp4") do ffmpeg -i "%%a" -vcodec libx265 -crf 26 -tune animation "%%~na.mkv"


It's very simple, you don't need any special magic to run a command.

    gci -filter "*.mp4" | foreach { ffmpeg -i $_ -vcodec libx265 -crf 26 -tune animation ($_ -replace '\.mp4$','.mkv') }

Files with spaces just work.


The irony of commenting that under an article that cites the maximum path cannot exceed 260 characters is not lost on me


The maximum path length of the NT kernel is 32,767 UCS-2 characters. 260 is a limit imposed by the legacy Win32 interfaces IIRC. I believe the W-interfaces get you the full-fat version, it's just that they're so inconsistently used as to all but guarantee that something you need will have problems.

The user-mode stuff is kind of a mess. The kernel-mode stuff is comparatively orthogonal.


You also have to prefix[1] the path with \\?\ unless you've enabled a group policy[2] in Windows 10+.

[1]: https://learn.microsoft.com/en-us/windows/win32/api/fileapi/...

[2]: https://learn.microsoft.com/en-us/windows/win32/fileio/namin...
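
A minimal Python sketch of the prefixing step, assuming the documented forms \\?\C:\... for drive paths and \\?\UNC\server\share\... for network paths. The helper name is made up, and ntpath is used so the string handling can be tried on any OS. The path is normalized first because prefixed paths are passed through without the usual cleanup of forward slashes and ".." segments.

```python
import ntpath

EXTENDED_PREFIX = "\\\\?\\"      # the four characters \\?\
DEVICE_PREFIX = "\\\\.\\"        # the four characters \\.\
UNC_PREFIX = "\\\\?\\UNC\\"      # \\?\UNC\ for network shares


def to_extended_length(path):
    """Return an absolute Windows path in extended-length form."""
    if path.startswith((EXTENDED_PREFIX, DEVICE_PREFIX)):
        return path  # already prefixed, or a device path: leave untouched
    if not ntpath.isabs(path):
        raise ValueError("extended-length paths must be absolute")
    # Normalize here, since the prefix turns off normalization later on.
    path = ntpath.normpath(path)
    if path.startswith("\\\\"):
        # \\server\share\file -> \\?\UNC\server\share\file
        return UNC_PREFIX + path[2:]
    return EXTENDED_PREFIX + path
```

This only builds the string; whether a given program then accepts it is a separate question.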


But the path length can be longer! They just don't care and offload the work.


how is that ironic?


It's like rain on your wedding day


I think the margin of bat vs sh is a larger margin. Batch files use gotos, the entire file is reread for each line, etc.

My main complaint about sh and Unix is that environment variables are mostly inherited from parent processes, and that can be awkward.

(And Windows at least, these days, is seeing a lot more PowerShell adoption.)


> I think the margin of bat vs sh is a larger margin [...] the entire file is reread for each line

Ye agreed. Separation of data and code always was a mistake.

I wonder if this feature of bat files was like a thing once as a "best practice"? Practically, you should only append lines I guess. When I close my eyes, I can see a DOS batch file doing actual batch job processing, appending to itself and becoming an intertwined log and execution script.


"All I can say is, this article is the tip of the ice berg on Windows I/O weirdness."

Well, then, is there a more detailed summary than this one that's accessible?

This one looks very useful and I'll use it, but to make the point about more info it'd be nice to know how they differ across Windows versions.

For example, I've never been sure about path length of 260 and file length of 255. I seem to recall these were a little different in earlier versions, f/l being 254 for instance. Can anyone clear that up for me?

Incidentally, I hit the 255/260 limit regularly; it's a damn nuisance when copying stops because the path is, say, 296 or 320 characters, or more.


There are some APIs that have a lower limit than 260. But the limits can be bypassed using `\\?\` prefixed paths (except when using SetCurrentDirectory) or by enabling long paths https://learn.microsoft.com/en-us/windows/win32/fileio/maxim...


Windows.h defines MAX_PATH as 260.

Many apps do something like

char path[MAX_PATH];

In that case, no amount of prefixing will help you, if random app enforces the limit.
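
To make the arithmetic concrete, a tiny Python sketch, assuming an ASCII-only path so that one character takes one slot in the buffer. The buffer also has to hold the terminating NUL, so the longest path that fits is 259 characters.

```python
MAX_PATH = 260  # the value Windows.h defines, as noted above


def fits_in_max_path_buffer(path):
    """Check whether path plus its terminating NUL fits in MAX_PATH slots.

    Assumption: ASCII-only path, so len() equals the number of slots used.
    """
    return len(path) + 1 <= MAX_PATH


longest_ok = "C:\\" + "a" * 256   # 3 + 256 = 259 characters
too_long = "C:\\" + "a" * 257     # 260 characters, no room for the NUL
print(fits_in_max_path_buffer(longest_ok), fits_in_max_path_buffer(too_long))
```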


"no amount of prefixing will help you, if random app enforces the limit"

I've noticed that, it's partially the reason for my confusion (I didn't wake up for quite a while as I put it down to the different versions of Windows I was running on various machines). Other pains are caused by apps that still don't run Unicode and crash or stop copying when they encounter a non-ASCII character.


Yeah, old ones using raw Win32 calls, where developers haven't read anything beyond Petzold's book.


So is the limit 259 characters plus a null?


Thanks for the reference, I wasn't aware of those changes in Win 10 (I run mainly Linux and have been weaning myself off Win for some years).

"...(this value is commonly 255 characters)."

I think the word 'commonly' in those notes confirms my point in that it's changed slightly over the years. Also, I recall the internal processing was once 16k and not 32k, come to think of it this may have been with the previous version of NTFS (can't remember which version of Win that was). My interest is now piqued so I'll search it out.

That we're even discussing such matters confirms the thrust of the article.


Explorer can resolve some http urls to network paths, e.g. for SharePoint libraries.

Programs, e.g. python scripts, can then use those paths, but only after explorer has resolved them. Before that, the path will be treated as not existing.


From my experience as an EE, working with serial ports is much nicer on Windows (COM1, COM2, etc.) than on Linux, where serial ports are abstracted to files like /dev/ttyACM0 and have a lot more gotchas.

PowerShell is also quite a powerful alternative to Bash/Mingw, although it came out much later.

Windows might do some things differently than UNIX-like OSs, but it does them really well.


Technically, COM1, COM2 etc. are filenames as well. They are just special in that they are available everywhere. That's why you are not allowed to create any file named COM1 or such.

But it's a DOS relic. Actually, Windows has a "Win32 Device Namespace" under "\\.\", which is something like /dev/ in Unixoids. COM1 is actually \\.\COM1: https://learn.microsoft.com/en-us/windows/win32/fileio/namin...
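
A small Python sketch of a check for these names. The list is an assumption limited to the classic set (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) and is not meant to be exhaustive; the function name is made up. Microsoft's naming guidance also says to avoid these names when followed by an extension, so the check looks only at the part before the first dot.

```python
# Assumed, non-exhaustive list of classic DOS device names.
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_reserved_name(filename):
    """True if the file name collides with a classic DOS device name."""
    # Only the part before the first dot matters; case is ignored.
    stem = filename.split(".", 1)[0].rstrip(" ").upper()
    return stem in RESERVED_NAMES
```

Something like this is handy for validating names that arrive from other operating systems before they reach a Windows machine.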


It's a CP/M relic, if I'm not mistaken.

https://en.wikipedia.org/wiki/CP/M


Wow! There's an operating system I ain't heard tell of in a good long while. That's the very first OS I used in a professional context. Got me my first computer store job (in my late "teens") on a CP/M system.


I deal with software that processes files on a Windows system... it loves to break when people on other OSes submit AUX, PRN, COM, File:Name, and tons of other unacceptable names (like 'file ').

I'm glad our new releases work on Linux and we don't have to deal with that crap in 99.99% of cases now.


I've done quite a bit of work with serial ports on Windows, Linux and other unixes. I've also written a serial device driver.

Your comment is very confusing to me. The serial ports are abstracted to a file on Windows just like on unixes - the file is actually discussed in the above article: \COM1

Maybe you're talking about the old days where you would just outb 0x3f8? The modern interfaces are actually fairly similar.


0x3f8 IRQ4, 0x2f8 IRQ3 - still hardcoded in my brain 30yrs on!


My "burned in" code snippet is "call -151" from Apple ][+ days, to drop to the built-in 6502 (dis)assembler/debugger.


MONZ

I spent a lot of time reading the disassembly listing in the back of the manual to see what happens when I jump to the monitor.


Remember typing in entire programs from magazines and computer manuals and saving them to cassette tape or floppy disc? That was "the good ol' days" for sure… :)


There is also the persistent problem of USB serial adapters being assigned incremental numbers until they're in double digits that many tools don't let you select from their GUI. So you have to go in and manually purge those devices to get back to sane numbering.


I just started with using serial ports on windows while doing some Raspberry Pico hobby projects. Something that I find strange is that every new device gets assigned a new comport, I mean let's say I do this for a while one day I will have a comport 100, 200 and so on. Is that right, or does it somehow reset the comports?


That's how it works and generally it's to the user's advantage. We often set specific parameters based on the device's serial number so getting the same COM port is nice, sometimes the devices are so simple that you cannot query its serial number.

Sometimes I'll do a "blank slate" and delete all my accumulated COM ports in Device Manager (need to enable "Show Hidden Devices").


COM ports on Windows are crap nowadays due to how crappy USB to serial adapters have become. I've seen Windows reassigning different COM names to the same device every single time it was unplugged due to it "remembering" what COM port was used previously. Needless to say, that was an anti-feature if there ever was one.


Windows tries to keep a long-term identity for all of the device instances that it knows about (and in the ideal world assign the same COM port number to the same physical serial adapter). For USB this is supposed to be done by a combination of VID, PID and serial number in the device descriptor. But even early on there were a lot of devices that had the serial number empty, and thus Windows came up with some heuristics about when this method is unreliable and switches to identifying the device by its physical position on the USB bus. The whole mechanism is well intentioned, but in the end probably somewhat misguided because it is not exactly reliable and has surprising consequences even if it were reliable.

As a side note: on modern Windows NT implementations the so-called "registry bloat" is a non-issue (anyone who tells you otherwise is trying to sell you something), but keeping a list of every device that was ever plugged into the computer in there is somewhat ridiculous.


> As a side note: on modern Windows NT implementations the so called "registry bloat" is non-issue

How modern? I manage Windows 7 (transitioning to 10) machines that are used for QC in a hardware manufacturing environment that enumerate hundreds of devices (with mostly identical VID/PID) every week. We find that if we don't clear old devices out of the registry every so often, the enumeration time slows to a crawl.


That is kind of a niche use case ;)

In the times when it was a real issue (I would hazard a guess that that means “before XP”) the reason was that the original registry on disk format made every registry access more or less O(n) in bunch of things like the overall on disk hive size, total number of keys, number of subkeys in each of the keys along the path…


It also does this for monitors and USB/Bluetooth earphones. So you end up with earphone(2), monitor(2) even if you never had a second one. The only way to fix it is to delete the hidden device in Device Manager and rename it back in the monitor/audio manager.

It was really confusing to me when the script I use to change sound output and leveling suddenly stopped working after a BIOS/mobo software/whatever Windows update, until I noticed the device had an appended (2).


And this is why I hate Windows in an industrial automation environment. I dislike having to troubleshoot why the configuration of a USB NIC or serial device gets destroyed by plugging it into another port. I had to write a PowerShell script for the USB NIC issue to reapply the NIC settings with a reboot.

Also, always locking an open file is repulsive. Other OSs allow renaming an open file. Not Windows! Thumbs.db stays locked because File Explorer keeps the file open, which prevents deleting an otherwise empty folder and wastes so much time waiting for Windows to unlock the file.

You have to pay me to use Windows!



COM1 = CreateFile("COM1", ...) => Nice!

COM9 = CreateFile("COM9", ...) => Nice!

COM10 = CreateFile("\\\\.\\COM10", ...) => NOT nice!
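
One way around the special case, sketched in Python: always build the \\.\ form, which the Win32 device namespace accepts for COM1 to COM9 as well as for double-digit ports. The helper name is made up, and the snippet only constructs the string; it does not open the port.

```python
DEVICE_NAMESPACE_PREFIX = "\\\\.\\"  # the four characters \\.\


def com_port_path(number):
    """Build a device-namespace path that works for any COM port number."""
    if number < 1:
        raise ValueError("COM port numbers start at 1")
    return f"{DEVICE_NAMESPACE_PREFIX}COM{number}"


print(com_port_path(1))
print(com_port_path(10))
```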


How often do you end up with 10 COM ports?


We do all the time. In industrial automation COM ports are still shockingly popular, although it's usually the USB emulated variety. On a lot of our development and on some of our production tools we end up with COM20 or COM30, not because we have that many running at one time but because over time we've plugged in that many distinct devices. Nowadays most drivers will assign COM(n+1) when they see a device with a new serial number.


UART is available on nearly every microcontroller under the sun, and USB<->UART serial chips are super cheap, so it makes complete sense to me that it'd become the de facto system for interfacing the automation controller with a computer.


Even beyond that, USB is available on many microcontrollers, a USB CDC device is dead simple to implement, the drivers are baked into every modern OS, and all the software developers operating at that layer already know how to interact with text streams. Add in the ease of debugging when you can just manually send/receive ASCII to operate a device, and you've got the makings of a standard practice.


If you use USB dongles for serial adapters, then each path through USB is assigned a different COM number when you plug it in. For example if you plug into USB controller 2, port 3 which goes to a hub, and then you plug into port 2 that gets a number. Now plug the same thing into a different port and it will get another COM number.

Under the hood this is because the USB devices do not have the (optional) unique serial number (or in some cases they all get the same serial number).

https://devblogs.microsoft.com/oldnewthing/20041110-00/?p=37...


I’ve found the assignment of COM ports in Windows really annoying.


Interesting. I very much prefer working with serial ports under Linux than Windows. It's more straightforward and easier to engage in fine control.


So do I, I find the addressing more consistent, too.

It used to be completely predictable when I was working with drivers in 1994 (patching the code), then less predictable when hardware got more diverse, and predictable again (or at least "always the same") with UUIDs.

It was always amateur/hobby dev or sysadmin so I may have had the wrong impression.


It’s the flip side of the ‘we bent over backwards so SimCity runs’ coin. Even though Windows hasn’t supported programs out of this era since 64bit became the standard, it’s still held back by clinging on to the legacy. Because it doesn’t dare say ‘this is too old, run it in a VM’.


The fact these paths are considered at all "weird" just underlines how much we live in a Unix world.

Filesystem paths used to all be weird in the sense that there was more OS diversity. I'm sure some people here remember that classic MacOS paths used colon as the separator:

  Hard Drive:My Folder:My Document

VMS (designed by the same person as Windows NT by the way) had paths that looked like this (per Wikipedia):

  NODE"accountname password"::device:[directory.subdirectory]filename.type;ver

RSTS/E had [project,user] in the filename.

Multics paths:

  >dir1>dir2>dir3>filename

Apple Lisa used dashes as the separator. Etc.


As someone who grew up with Windows, I don't think these paths are that weird at all. Drive letter working directories just make sense, for example. The weirdest part is the (edit: HFS) compatibility mode (file.ext:substream).

One fun surprise is that, for code page reasons, Windows will show ¥ as the path separator in Japanese. In Korean, it's ₩. These characters represent U+005C, which is \ in Latin-compatible character sets.


It's pretty weird that Windows drives use random letters instead of just the name of the disk.


I tend to use /dev/sda1 more than /dev/disk/by-path/pci-0000:00:17.0-ata-1.0-part1. Disk names are nice, but also often longer than 8 characters and usually not very unique.

Starting from A and iterating on through Z makes sense, for an OS that's designed for two drives at most. /dev/sda and /dev/sdb are no less arbitrary than A: and B:.

One major difference was that Unix was used on big servers and couldn't fit itself onto a single disk, so /usr had to be created. DOS and Windows never needed a second drive to boot, so they didn't need to embed their resources into the drive hierarchy.

Of course, you can mount NTFS volumes at any directory you wish since at least somewhere in the early 2000s. Very few people do it, but you can!

For example:

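    # Assumes disk 2 holds exactly one partition and G:\Folder01 is an empty folder on an NTFS volume.
    $Disk = Get-Disk 2
    $Partition = Get-Partition -DiskNumber $Disk.Number
    # Mount that partition at the folder instead of giving it a drive letter.
    $Partition | Add-PartitionAccessPath -AccessPath "G:\Folder01"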


It’s a legacy from the IBM CP/CMS days.

First floppy drive was A, Second B, and when internal Hard Drives came along they defaulted to C to be compatible with computers that had at least 2 disk drives.

https://en.wikipedia.org/wiki/Drive_letter_assignment?wprov=...


If I remember correctly, you could use the B drive even if you had just one unit. It was useful for copying files from one disk to another, even if you didn't have a hard drive as temporary storage.


This was ultimately inherited from IBM’s CMS system [0] (from 1968 I believe) via CP/M and DOS.

[0] https://en.wikipedia.org/wiki/CMS_file_system


> The weirdest part is the HPFS compatibility mode (file.ext:substream).

HPFS had extended attributes, but not substreams. You are thinking about HFS; substreams were added to NTFS to support storing resource forks on network shares used by Macs.


Oops, you're right, added an extra letter to the acronym.


that's amazing. and point taken that Windows is probably extra weird due to its longevity, evolution, and backward compatibility.


Windows is younger than Unix, and the Unix filesystem has "evolved less" due to getting it right the first time, which avoided backward compatibility issues.


that's not a compatibility thing, is it? it's just the alternate streams feature that NTFS implemented.


NTFS implemented it to be compatible with Mac. They then started using it for storing the Mark of the Web and other special system properties, but those practical uses came much later.


> I'm sure some people here remember that classic MacOS paths used colon as the separator

In modern macOS (previously OS X), you’ll eventually bump into those if you need to work with paths in AppleScript. You have to specify when you’re using a POSIX path so it is properly converted. Example:

    $ osascript -e 'POSIX file "/System/Applications/Mail.app/Contents/MacOS/Mail"'
    => file Macintosh HD:System:Applications:Mail.app:Contents:MacOS:Mail


Doesn't MacOS translate "/" into ":" sometimes in save dialog boxes when you type in a filename?


Go in the Finder and try to change a file name to have a colon. macOS will tell you it can’t do it.

Now change it to have a forward slash. macOS will happily abide.

Finally, look at that file’s path in a terminal. Where the Finder shows a forward slash, the terminal will show a colon.

Redoing the AppleScript example:

    $ osascript -e 'POSIX file "/tmp/file with : forward slash.txt"'
    => file Macintosh HD:private:tmp:file with / forward slash.txt


As another example, with ADFS (Advanced Disk Filing System) on the Acorn/BBC computer family, the root directory was specified with `$`, and the directory separator was `.`.

    $dir1.dir2.dir3.filename


A full path on RISC OS included the filesystem:

  ADFS::IDEDisc4.$.Games.!Repton.Arctic

ADFS is the filesystem, IDEDisc4 is the disc name, $ is the root directory, Games is a subdirectory, !Repton is an application directory (since it begins with !) and Arctic is a file within the application directory, not normally referenced by users.

  Resources:$.Apps.!Edit

is the application !Edit from the built-in ROM.


macOS still uses colons as the path separator, it just does a great job of hiding them from the user. If you try to open a file with a slash in its name in a shell, though, you'll need to use a colon.


I suspect that it is the other way around and the Finder and standard dialogs (both of which use slashes as path separator when you type the path) simply show colons in filenames as slashes.


macOS's kernel has BSD roots, so I'd be surprised if its VFS code accepts anything other than unix paths. Just a guess, but it's probably the Cocoa APIs accepting colon paths and translating it to unix paths internally.


> NODE"accountname password"::device:[directory.subdirectory]filename.type;ver

Interesting. When you think about it, that doesn't look all that different from:

scheme://user:password@host:port/directory/filename.type?key=value&key=value#fragment

which is arguably the most common kind of "file path" in use today.
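The parallel is easy to see once a URL is pulled apart into its pieces. A small sketch using Python's standard `urllib.parse` (the URL is a made-up example following the template above, with a numeric port filled in):

```python
from urllib.parse import urlsplit

# Hypothetical URL following the template above.
url = "scheme://user:password@host:8080/directory/filename.type?key=value#fragment"
parts = urlsplit(url)

print(parts.scheme)    # roughly the access method
print(parts.username)  # VMS: accountname
print(parts.password)  # VMS: password
print(parts.hostname)  # VMS: NODE
print(parts.port)
print(parts.path)      # VMS: [directory.subdirectory]filename.type
print(parts.query)
print(parts.fragment)
```

The mapping to the VMS fields in the comments is only a loose analogy; there is no URL counterpart to the `;ver` version suffix.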


I was parsing paths into an array of dirs recently.

The `/` root dir is quirky. You can’t just do `dirPath.split('/')`. You have to handle it as a special case. Would be easier if it had a special name. Like `$/dir1/dir2`.

Or am I missing something?
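For comparison, path libraries tend to do that special-casing for you by returning the root as its own component. A minimal sketch with Python's `pathlib` (`split_dirs` is a hypothetical helper name):

```python
from pathlib import PurePosixPath


def split_dirs(path: str) -> list[str]:
    """Split a POSIX-style path into components; the root stays a separate item."""
    return list(PurePosixPath(path).parts)


print(split_dirs("/dir1/dir2"))  # absolute path: first item is the root
print(split_dirs("dir1/dir2"))   # relative path: no root item
```

An absolute path is then simply one whose first component is `/`, which is close to the `$/dir1/dir2` idea.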


thank goodness unix has cleared this up, with paths, mountpoints, overlay filesystems, chroot, device trees, bind mounts, loopback mounts and probably a few I forgot...

(sort of amazing the original premise, and the exceptions and workarounds you gradually accumulate and take for granted)


Another lesser known fact:

The volume ID string (what you get with mountvol) is, at least up to Windows 10 [1], a version 1 UUID according to RFC 4122, i.e. time- and node-based:

https://www.famkruithof.net/guid-uuid-make.html

https://www.famkruithof.net/guid-uuid-timebased.html

Since Windows creates the UUID the first time it "sees" a volume and usually uses the network card's MAC address as the node, decoding the UUID gives you the MAC address of the PC and the time the volume was first seen. This can be useful for forensics, especially with removable devices, and to verify there has been no manipulation of MountedDevices in the Registry.

[1] Possibly Windows 11 changed that, or at least the UUIDs shown in the article are version 4.
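If you'd rather not paste GUIDs into a website, the decoding is a few lines with Python's standard `uuid` module. This is only a sketch: `decode_volume_guid` is a made-up helper, and the sample volume name is generated on the spot with an invented node value rather than taken from a real machine.

```python
import re
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 timestamps count 100 ns intervals since 1582-10-15 (RFC 4122).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)
GUID_RE = re.compile(r"\{([0-9A-Fa-f-]{36})\}")


def decode_volume_guid(volume_name: str):
    """Return (timestamp, node as MAC-style string) for a version 1 volume GUID."""
    match = GUID_RE.search(volume_name)
    if match is None:
        raise ValueError("no {GUID} found in volume name")
    guid = uuid.UUID(match.group(1))
    if guid.version != 1:
        raise ValueError(f"not a version 1 UUID (version={guid.version})")
    seen = GREGORIAN_EPOCH + timedelta(microseconds=guid.time // 10)
    mac = ":".join(f"{(guid.node >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
    return seen, mac


# Demo: a freshly generated version 1 UUID with a made-up node,
# wrapped the way mountvol prints volume names.
sample = "\\\\?\\Volume{%s}\\" % uuid.uuid1(node=0x001122334455)
print(decode_volume_guid(sample))
```

Keep in mind that the node field is not always a real MAC address, so treat the result as a hint rather than proof.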


So many things wrong with this article. Some things that I noticed by skimming over it:

> UNC paths can also be used to access local drives in a similar way:

> \\127.0.0.1\C$\Users\Alan Wilder

> UNC paths have a peculiar way of indicating the drive letter, we must use $ instead of :.

This is actually incorrect... He's actually accessing some random share that has no real connection to a drive. Yes, sometimes (quite often), the C$ share corresponds to the C: drive's root, but this is by no means given, as one can easily either delete the C$ share, or have it pointing to somewhere else entirely

> When the current directory is accessed via a UNC path, a current drive-relative path is interpreted relative to the current root share, say \\Earth\Asia.

This is also wrong. There is no "current directory" on a UNC share (which can easily be shown by trying to open a command prompt on a UNC share: it will show an error and start you somewhere on C:\users), and the example he gives just tries to access the share "Asia" on the server "Earth".

> Less commonly used, paths specifying a drive without a backslash, e.g. E:Kreuzberg, are interpreted relative to the current directory of that drive. This really only makes sense in the context of the command line shell, which keeps track of a current working directory for each drive.

Also wrong, it's not the command line shell that keeps track of the current directories, it's the Windows kernel itself. But I agree that such a scenario is quite useless as you can never be quite sure on what CWD you are on a given drive

> For the most part, : is also banned. However, there is an exotic exception in the form of NTFS alternate data streams.

Yeah, well, surprise: the ":" is not part of the file name, it's just a separator between filename and stream name. This is like saying that "you cannot have \ characters in a file name, but in directory names it is allowed". No, it's not. It's a separator


> There is no "current directory" on an UNC share

SetCurrentDirectory allows setting the current directory to a UNC share. https://learn.microsoft.com/en-us/windows/win32/api/winbase/...

> Also wrong, it's not the command line shell that keeps track of the current directories, it's the Windows kernel itself. But I agree that such a scenario is quite useless as you can never be quite sure on what CWD you are on a given drive

Not for a long time. It's set as a special (hidden) environment variable like `=C:=C:\current\directory`. https://devblogs.microsoft.com/oldnewthing/20100506-00/?p=14...


>SetCurrentDirectory allows setting the current directory to a UNC share.

Exactly. The cmd prompt not setting UNC paths as current directory was introduced around Windows 2000 (or maybe post-XP, it's been a while) to help legacy batch files being run from a share and then getting confused by being on a UNC path instead of one beginning with a drive letter.

This was also why, when you do a pushd \\server\share cmd.exe puts you on a mapped drive instead of directly on a UNC path.

If you use the Windows native version of tcsh, for example, you can happily use UNC paths as current directories and run commands (provided they don't try to parse drive letters from their CWD)


> we must use $ instead of :.

Eh, I just thought the $ in a Windows NAS share was to ensure the share was hidden from browsing. Microsoft used to have documentation on that, but it seems to be missing from their site after they removed old articles.


that's exactly the reason, yes


The bit about "UNC Paths" is a bit simplified. The "$" shares are administrative shares. They're created by default, you can delete or disable them (though, if you delete them, they'll be recreated on a reboot). You can also add normal users to them.

It should also be noted that while the single drive letter ones are automatically created, the "$" at the end just marks them as hidden. You can create your own hidden shares if you ever want to.


You can also change the permissions on them so you don't need to be an admin to access them.


The (second-)worst offense I'm aware of here is that alternate data stream names can have otherwise special characters in them, like backslashes. So if you (for example) want to strip off the last path component, you technically cannot do this by just stripping everything after the last backslash.

In fact this probably isn't the worst thing - it's even worse than this. Because you first need to strip off the prefix that represents the volume (like C:\) before you can look for a colon. But the prefix can be something like \\.\C:\ or \\.\HarddiskVolume2\, or even \\?\GLOBALROOT\DosDevices\HarddiskVolume2\. Or it can be any mount point inside another volume! (Remember that feature inside Disk Management?)

Moreover you can't even assume the colon and alternate data streams are even a thing on the file system - it's an NTFS feature. So you gotta query the file system name first. And if the file system is something else with its own special syntax you don't know, then in general you can't find the file name and strip the last component at all.

All of which I think means it's impossible to figure out the prefix length without performing syscalls on the target system, and that the answer might vary if the mounts change at run time.


A stream name is somewhat more limited than that:

> All Unicode characters are legal in a streamname component except the following:

> * The characters \ / :

> * Control character 0x00.

> * A streamname MUST be no more than 255 characters in length.

>

> A zero-length streamname denotes the default stream.

https://learn.microsoft.com/en-us/openspecs/windows_protocol...
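Those rules are simple enough to check mechanically. A rough Python sketch (`is_valid_stream_name` is a hypothetical helper; counting the 255-character limit in UTF-16 code units is my assumption, the quoted text just says "characters"):

```python
FORBIDDEN_CHARS = frozenset("\\/:\x00")
MAX_LENGTH = 255


def is_valid_stream_name(name: str) -> bool:
    """Check a stream name against the quoted rules (hypothetical helper)."""
    # Assumption: the limit is counted in UTF-16 code units.
    units = len(name.encode("utf-16-le", errors="surrogatepass")) // 2
    if units > MAX_LENGTH:
        return False
    # An empty name is allowed: it denotes the default stream.
    return FORBIDDEN_CHARS.isdisjoint(name)


print(is_valid_stream_name("Zone.Identifier"))
print(is_valid_stream_name('say "hi"'))
print(is_valid_stream_name("a\\b"))
```

Note that double quotes pass the check, while backslash, slash and colon do not.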


Oops, thanks for the correction! I must've seen this with other characters (most likely double quotes) and not realized slashes and backslashes are an exception.

Though ironically that still doesn't help you strip the last component, since it could still be a volume mount point. Like you don't want C:\mnt\..\foo to suddenly become C:\foo, just like how you don't want \\.\Server\Share1\..\Share2 to become \\.\Server\Share2, or for \\.\C:\..\HarddiskVolume1 to become \\.\HarddiskVolume1, etc.
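To make the hazard concrete: a purely lexical normaliser has no idea that C:\mnt might be a mount point. Python's `ntpath` module (the Windows flavour of `os.path`, importable on any OS) is one such normaliser. This sketch only shows the string-level collapse, not what Windows itself would resolve; `collapse_lexically` is a made-up wrapper name.

```python
import ntpath


def collapse_lexically(path: str) -> str:
    """Collapse '..' purely on the string, without consulting the filesystem."""
    return ntpath.normpath(path)


# If C:\mnt were a volume mount point, this would silently switch volumes.
print(collapse_lexically(r"C:\mnt\..\foo"))
```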


> Moreover you can't even assume the colon and alternate data streams are even a thing on the file system - it's an NTFS feature. So you gotta query the file system name first. And if the file system is something else with its own special syntax you don't know, then in general you can't find the file name and strip the last component at all.

If the :stream syntax is not FS-specific then you can parse the data stream name out statically in almost every case. Yes, you have to work out the prefix, but you can mostly do that statically too, I think:

> In fact this might not even be the worst thing - it's even worse than this because you first need to strip off the prefix that represent the volume (like C:\) before you can look for a colon. But the prefix can be something like \\.\C:\ or \\.\HarddiskVolume2\, or even \\?\GLOBALROOT\DosDevices\HarddiskVolume2\. Or it can be any mount point inside another volume! Which I think means it's impossible to figure out the prefix length without performing syscalls on the target system, and that the answer might vary if the mounts change at run time.

The prefix of `\\.\C:\Foo:Bar` is `\\.\C:` as `C:` couldn't be a file name. The prefix of `\\.\HarddiskVolume2\Foo:Bar` is `\\.\HarddiskVolume2` because the volume name ends at the backslash. The prefix of `\\?\GLOBALROOT\DosDevices\HarddiskVolume2\Foo:Bar`... can be harder to determine but it doesn't matter because clearly there is no letter drive name in sight since a letter drive name would be... a single letter, but if the volume name were a single letter then it might require using system calls to resolve it (`\\?\GLOBALROOT\DosDevices\X\Y:Z\A:B` is harder to parse because X might be the volume name, or maybe Y: might be the letter drive and X might be part of the path prefix).


> If the :stream syntax is not FS-specific

It is, I believe, as I alluded to in the comment.

> `\\?\GLOBALROOT\DosDevices\X\Y:Z\A:B` is harder to parse

As in, this is impossible to do statically in the general case - those names aren't guaranteed to look like that. See the note I had added about mount points. Remember C:\mnt can itself be the mount point of a volume instead of a drive letter. (Junctions present a similar problem, but at least for those, you can make an argument that they're intended to look like physical folders, and treat them similarly. With mount points, you might not have that intention - you might be just trying to go over 26 drive letters.)


> It is, I believe, as I alluded to in the comment.

The FILE_STANDARD_INFORMATION_EX structure alludes to a common handling of alternateStream. Winbtrfs is a great resource on this, since it implements many bells and whistles from NTFS in an open way -- you just grep for a keyword and you will be close. The code exercising the Windows API for testing is src/tests/streams.cpp.

Grep on FILE_STREAM_INFORMATION in the source should provide more useful hits on the source, but phone browsers are clumsy.


Those back slashes were annoying to me since before I knew what they were doing. Whereas the forward slash always made sense.


What is an alternate data stream?


Any data stream which is not the first one.

Data stream is basically the file content and on NTFS a file can have more than one. In practice it is comparable to extended attributes in the Linux world but somewhat superior.

But like extended attributes it doesn't seem to have too much real world use. The only use case for alternate data streams I can remember are the "this file was downloaded from the internet, do you really want to run it" warnings. In such cases the browser attached a standardized marker as alternate data stream to the file.


Au contraire :-). Alternate data streams are widely used by virus writers, and by spies who use them to exfiltrate data from foreign (to the spy) government and corporate Windows IT systems.

You think I jest ? Look up the leaked source code for the US government spy tooling. They hide data to be exfiltrated in an ADS on the root directory of the share :-).

I finally realized ADS were the mother of bad ideas when Ted Tso responded to me asking why I couldn't have them in Linux for the umpteenth time by showing me a Windows task manager screenshot of Myfile.txt as an actively running process.

If the ADS ends in .exe then Windows will happily run it :-).


That standardized marker is also known as the 'Mark of the Web' (MOTW) in case you want to search for more details about it.

In general, there is a tool which comes with the SysInternals suite that allows you to see which files have streams and their size:

https://learn.microsoft.com/en-us/sysinternals/downloads/str...


If you have macOS clients connecting to an SMB file share hosted by a Windows server, they use alternate data streams to store resource forks - like fonts. Makes for a fun 'oh shit' moment if you go to zip up files on Windows to archive, then realize you're missing data when you later unzip, as most compression applications don't keep them.


> they use alternate data streams to store resource forks - like fonts

macOS stores fonts in resource forks? I'm confused, what use does this have and what happens when you accidentally miss them?


> macOS stores fonts in resource forks? I'm confused, what use does this have and what happens when you accidentally miss them?

Classic MacOS considers fonts to be a type of resource, and hence stores them in the resource fork. Contemporary macOS fonts are just ordinary files with a data fork only. I think grandparent is talking about the 1990s, although some of those machines remained in active use through the first few years of this century.

Windows originally considered fonts to be a type of resource too – the original bitmap fonts used with Windows 1.x-3.x are stored as a resource–except unlike MacOS it embeds resources into EXE/DLL file data instead of putting them in a fork. In fact, a .FON file containing a Windows bitmap font is just an EXE with no code, only resources. Nobody really uses this any more, everything is TrueType now and TrueType uses its own file format not resources, but Windows still supports the old bitmap fonts for any legacy apps which still use them.


I originally thought you meant "macOS takes random fonts, stuffs them in resource forks for other non-font files, then bad things happen if the resource forks are ever lost" which makes zero sense to me.

Anyway... so macOS fonts themselves were made of resource forks and therefore trying to transfer fonts themselves across a non-resource-fork-supporting network share will fail? As in, the resource forks were needed in order to use the font file?


> I originally thought you meant

Not me. ajcoll5 made the statement, you expressed confusion with it, I tried to explain what (I assume) ajcoll5 meant.

> Anyway... so macOS fonts themselves were made of resource forks and therefore trying to transfer fonts themselves across a non-resource-fork-supporting network share will fail? As in, the resource forks were needed in order to use the font file?

On Classic MacOS, some files, all the actual contents is in the resource fork, and the data fork is ignored and can be empty. So you copy such a file to a filesystem which doesn't support resource forks, you can end up with an empty file.

A good example of this is executables. 68k Mac executables, all the code is stored in the resource fork (as code resources), and the data fork is ignored and can be empty. So you copy a 68k Mac executable to a forkless filesystem, you can end up with an empty file.

By contrast, PPC Classic MacOS executables, the code is in the data fork, and the resource fork only contained actual resources such as icons or strings, not the code. If you lost the resource fork, you'd still have the code of the executable. But it probably won't run without the icons/strings/etc it expected.

This was how Apple's original (1994) implementation of "fat binaries" worked. The data fork contained the PowerPC binary and the resource fork contained the 68K binary. PPC Macs would load and run the PPC code from the data fork, 68K Macs would ignore the data fork and load and run the code from the resource fork. If you only needed PPC support, you could shrink the executable by deleting all the 68K code resources from its resource fork.

The core resources of Classic MacOS were originally stored in a single file, the "System suitcase". Originally, each installed font was a separate resource in the resource fork of that file; its data fork was unused, except to store an easter egg text message. Fonts were distributed as resources in separate suitcase files, and the "Font/DA Mover" copied them from the distribution suitcases into the system suitcase. So yes, a suitcase file used to distribute a classic MacOS font, the actual font data would be in the resource fork, and the data fork could be empty. In System 7.1, Apple introduced a separate folder called "Fonts". In some MacOS versions (not sure when it was introduced, but definitely was there by System 7.0), Finder displays suitcases as if they were folders, even though they are actually resource forks.

Contemporary macOS doesn't really use any of this stuff. It supports resource forks for backward compatibility, but modern applications don't use them. The "Font Book" app can import Classic MacOS fonts (not bitmap ones, but TrueType and Type 1) from the resource fork of a suitcase file. But once imported, the fonts are stored in ordinary files (with a data fork only) on the filesystem.


> Not me.

Eh, whatever. I originally thought whoever meant. can't edit the comment now.

> On Classic MacOS, some files, all the actual contents is in the resource fork, and the data fork is ignored and can be empty. So you copy such a file to a filesystem which doesn't support resource forks, you can end up with an empty file.

Yeah, that's about what I thought. That makes sense, thank you~


It's used for other things too. Like modern file compression (compact /exe:lzx).


Are you sure we mean the same thing?

You seem to talk about a specific command line argument of the compact command with a Windows typical (and IMO ugly) option style with '/' instead of '--' as option marker and ':' instead of '=' as option value separator.

But that would not be directly related to ADS and I cannot imagine a good use case where the compact command should use ADS.


Yes, look up WofCompressedData. It's the stream name ultimately used by that command.


Thanks, with this keyword I found https://devblogs.microsoft.com/oldnewthing/20190618-00/?p=10...

In context of ADS the first thing I imagined was storing the compressed and uncompressed file alongside. (which is rather silly, why compress at all)

This use case is also kinda strange. Have the compressed content as an ADS, with the primary content stored as a sparse run of zeros that gets filled in when needed/accessed. :/


And used to be used to hide malicious software back in the early days.


C:\foo has a default (primary) data stream; the name of that stream is empty, so it's omitted entirely when writing the name. But the file can also have C:\foo:bar on NTFS. It's a different stream that's part of the original. (Look up "NTFS ADS" or just "NTFS streams".) These are often used to store information tied to a file that shouldn't affect the file contents.


In the late 1990s, there was a bug in MS IIS where if you requested http://example.com/page.php , it would execute the PHP script, but if you requested http://example.com/page.php: , it would give you the PHP source code. Even more than today, it was common to hard-code database connection details, including passwords, into the source code.


One thing that makes Windows paths weird is that the Windows API, NTFS and most Windows tools have different restrictions on file paths.

NTFS would accept almost anything. The Windows API (I think of the old Win32 one) would apply most restrictions the article mentions.

But not, for example, the normalization part. A filename can end with a space, no problem. That once led me to a minor bug in .NET Framework. One of the path-related functions (I think it was Directory.Move) did not correctly apply this normalization and could produce directories with trailing whitespace. Good luck removing/fixing those in Windows Explorer.


>NTFS would accept almost anything.

So for the longest time Adobe software had random bugs where it would create a series of folders named "Application Data" repeating recursively 3000+ characters deep.

Yea, that was fun to try to delete.


Sounds like it wasn't handling junctions correctly. I wonder what obscure/ancient code caused that.


The bit about allowing / as a path separator is one of my favorite bits of DOS/Windows trivia. As a unix guy it's fun to give a windows person a path with the slashes wrong like "z:/foo/bar", being corrected for a unix-ism, then having it actually work!

In practice I think the biggest problem with using forward slashes on Windows is confusing programs which expect "/" to indicate program switches. The non-uniformity of shell parsing is also a big unix/win design difference.


yeah almost no one knows that forward slashes have been acceptable as path separators since (I wanna say) Win95. perhaps MSDOS.


It doesn’t work everywhere. For example tab completion in cmd.exe doesn’t work for a path containing forward slashes (even when quoted), because forward slash is the prefix character for command-line options.


right but that's a cmd.exe thing, not a Windows thing.

Windows supports it, CMD doesn't. programs that you run from a CMD prompt support other options flag syntaxes, so it's just a cmd.exe feature.

CMD.exe is its own thing with its own backwards compatibility requirements and the case could be made that cmd.exe is "Windows" as much as anything else is, so I get it.


The point is, you can’t just blindly use forward-slash as a file system path separator everywhere on Windows. It’s not on equal footing with backslash in that respect.

As another example, you can’t use forward slashes in the File Open dialog of Visual Studio: https://developercommunity.visualstudio.com/t/allow-forward-...


That's incorrect. You're confusing totally separate issues by examining specific pieces of software with product specific bugs. This isn't a valid way to examine the issue: by this metric, spaces aren't supported on unix because many programs choke on them.

In fact, you can use forward slashes across the entire file API on Windows. That's the point.


I'm viewing this from the end user's perspective. They can use backslashes everywhere as a path separator, but they can't use forward slashes everywhere. In that sense, forward slashes are in practice a second-class citizen on Windows. The canonical path syntax is and will remain with backslashes.


I'd qualify that as "almost no non-programmers know". Forward slashes are so useful in languages that use \ as an escape sequence that most programmers do know this.


It actually predates MSDOS, and I believe dates back to PCDOS 2 when support for directories was first added.


Since MS-DOS 2.0.


Can you mix the two in one filepath though? '/' and '\\'?


Yes. You can even write stuff like “cd foo/\bar\/baz”.
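You can see the same tolerance in Python's `ntpath` module, which models Windows path rules as plain string handling on any platform (so this is a sketch of the convention, not a call into Win32; `normalize_separators` is a made-up wrapper name):

```python
import ntpath


def normalize_separators(path: str) -> str:
    """Treat / and \\ alike and collapse doubled separators, lexically."""
    return ntpath.normpath(path)


for raw in ("z:/foo/bar", "foo/\\bar\\/baz"):
    print(raw, "->", normalize_separators(raw))
```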


Depends on the interpreter; batch and PowerShell accept both.


I had to double-check, but I ran into some issues at work where .Net Framework got confused if I used both separators in a path and used ".." to try to access the parent directory.


> UNC paths have a peculiar way of indicating the drive letter, we must use $ instead of :.

I don't believe that's true, I am almost positive they're SMB shares, just like any other, but are created by the system, which is why "accessing drives in this way will only work if you’re logged in as an administrator."


The dollar sign indicates that the share is 'hidden' and can't be enumerated by traditional means. The C$ share is created by default and provides root level access to the system drive, and is locked down by default for this reason

you are correct that they are just SMB shares like any other. They can be removed, though many management processes across different applications assume that those shares will be present


In UNC paths you can append “$NOCSC$” to the hostname to force the client to bypass the “Offline Files” cache. (There are probably other wild undocumented bits like this one hiding in other places in the Windows stack.)


Do you happen to have a source where you learned that? I'm always interested in "teaching myself to fish"


I don't recall. Like the other reply to you says, these get leaked in support, etc. I'll also run "strings" or even Ghidra on closed-source binaries when I'm troubleshooting issues. There's usually good fun to be had from Microsoft binaries doing that. I've discovered undocumented debugging switches, registry entries, etc.

(In version 10.0.19041.985 of cscsvc.dll in Windows 10 I'm seeing the string "If you hit this breakpoint, send debugger remote to BrianAu." Presumably that's "Brian Aust", referenced in a chat[0] re: Offline Files.)

[0] https://techcommunity.microsoft.com/t5/storage-at-microsoft/...


Lots of this stuff we just find out while working various deep MS cases, and then the info just leaks out.


I knew the Windows filesystem layout was super bonkers when I had to explain to fellow devs that on a 64-bit machine, you put the 32-bit libraries in SysWOW64 and the 64-bit libraries in System32.


This is a great article and really illustrates just how hard Windows works to be backwards compatible.

Lots of these (eg: the COM/LPT stuff) could be dropped and wouldn't affect most people either way, but for those things depending on it, it would be a profoundly breaking change.


Newest versions of Windows do let you use these names.


How new?

'echo foo > COM1' returns 'The system cannot find the file specified.' on Windows 11. (Machine doesn't have a COM1; if this wasn't being redirected to the port, it'd have gone into a file of that name.)


You can now use ".\COM1" or "COM1.txt" but not a bare COM1.


Thanks! That's good to know.


I think what's missing from this discussion is an emphasis on how layered Windows paths are.

The Win32 paths are like an emulation layer. They parse the given path and produce a kernel path. Win32 implements all the weird history you know and love as well as things like `.` and `..`. You can use the `\\?\` prefix to escape this parsing and pass paths to the kernel.

The NT kernel has paths like `\Device\HarddiskVolume2\path\to\file`. NT paths are much simpler. There are no restrictions on paths except that they can't contain empty components (notably they can contain nul). At this layer, `.` and `..` are legit filenames.

However, it's the filesystem driver that ultimately says what's a valid filename and what isn't.
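Since `\\?\` turns off the Win32 normalization, the path has to be cleaned up before the prefix goes on. A minimal sketch of that step, assuming an absolute input path; `to_extended` is a made-up name, and the `\\?\UNC\server\share` form for network paths is the documented spelling.

```python
import ntpath


def to_extended(path: str) -> str:
    """Return an absolute Windows path with the \\\\?\\ prefix (sketch)."""
    # Already a literal device path: leave it alone.
    if path.startswith(("\\\\?\\", "\\\\.\\")):
        return path
    # Do the work Win32 would normally do: fix slashes, collapse . and ..
    path = ntpath.normpath(path)
    if not ntpath.isabs(path):
        raise ValueError("the prefix only makes sense on absolute paths")
    if path.startswith("\\\\"):
        # \\server\share\x becomes \\?\UNC\server\share\x
        return "\\\\?\\UNC\\" + path[2:]
    return "\\\\?\\" + path


print(to_extended("C:/a/b/../c"))
print(to_extended("\\\\server\\share\\dir\\.\\f"))
```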


You know, I’ve never tried. Can you use the device paths in explorer without assigning a drive letter?


No, not really. It only supports "normal" win32 paths. You would have to mount it to a drive letter or directory.


A lower-level, more security oriented look at some of the same issues: https://googleprojectzero.blogspot.com/2016/02/the-definitiv...


> Say you, for whatever incomprehensible reason, need to access a file named .., which would normally be resolved to the parent directory during normalisation, you can do so via a literal device path.

Oh no. No. Windows allows files to be named `..`?!


Maybe? https://i.imgur.com/ebo4Nd8.png

But seriously: no, at least not on NTFS. This filename has a trailing space. That is enough to defeat Explorer, though: you cannot move or delete it, and the properties window is broken.


There are even more "strange" cases, JFYI:

https://msfn.org/board/topic/131103-win_nt~bt-can-be-omitted...


Preferring a minimal look (and being immature) my desktop shortcuts for "This PC" and "Recycle Bin" have been renamed with two of the many invisible characters that windows allows.

I also routinely use single extended unicode characters as root folder names and identifiers for various purposes.

Using a search programme like 'Everything', it's a lot easier to find things if I use something like the pilcrow symbol as the root folder for any directory dedicated to text documents, when the alternative is to wade through results for 'documents', 'text', 'reading' or any combination of those words.

For the same reason, I find I can make much more memorable associations. It helps me harness things relationally. I can preserve uncertainty and avoid the frustration and negativity of trying to make shades of grey and rose fit black and white patterns. It does sound a bit new age, but there's no doubt in my mind, flat hierarchical alphanumeric patterns are restrictive, prescriptive, insufficient. For example, a lot of artists actively work to defy pigeonholing. I still need identifiers.

I mean, even if I wasn't into 'bleeding edge' culture, restrictions, problems and frustrations are the normal experience. I think this is illustrated by the unsatisfactory experiences that people find when they try to make id3 tagging "work".

It's as close as I can get to banishing the pervasive 'what-if' heartbreak of WinFS being cancelled. Sadly it doesn't help at all make up for what 'Semantic Web' promised. But that's probably why I'm a believer in GPT and the like.

Is it just me that can't help thinking they are products that have arisen from the need to make non-semantic computing useful again?


Yes, you can create files named `.` and `..`. However, any sensible filesystem driver will reject those names (spoiler: there do exist drivers that aren't sensible).


I can't find a way to create one - so if Windows allows that, it's begrudgingly at best.

You can put `..` earlier in the file name, though.


> Windows allows files to be named `..`?!

Maybe not?

Under unix, if you create a symlink to a directory, e.g. `~/syslogs` is a symlink to `/var/log`, then `..` can be used to traverse the "true" parent directory. So `~/syslogs/../lib` will traverse `/var/log/..` and refer to `/var/lib`, not to `~/lib`.

However, a "normalising" path interpreter will just take something like `~/syslogs/../lib` and change it to `~/lib` without consulting the filesystem.

Given that (AIUI) Windows has supported symlinks for a while now (?), it's possible that files called `..` aren't actually allowed, but the ability to access `..` is still necessary.

(Notably, the article does point out that filenames ending in `.` are disallowed - which should exclude `..` as a name one can give a file.)
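The difference between the two interpretations is easy to reproduce. A small sketch, assuming a Unix-like system where unprivileged symlink creation works; the directory layout is a throwaway stand-in for `~/syslogs` and `/var/log`.

```python
import os
import tempfile


def lexical_vs_resolved(base: str) -> tuple[str, str]:
    """Build base/home/syslogs -> base/var/log and compare both readings."""
    os.makedirs(os.path.join(base, "var", "log"))
    os.makedirs(os.path.join(base, "var", "lib"))
    os.makedirs(os.path.join(base, "home"))
    link = os.path.join(base, "home", "syslogs")
    os.symlink(os.path.join(base, "var", "log"), link)

    path = os.path.join(link, "..", "lib")
    lexical = os.path.normpath(path)    # text-only: strips "syslogs/.."
    resolved = os.path.realpath(path)   # follows the link, then goes up
    return lexical, resolved


with tempfile.TemporaryDirectory() as tmp:
    lexical, resolved = lexical_vs_resolved(tmp)
    print("normalised:", lexical, "exists:", os.path.exists(lexical))
    print("resolved:  ", resolved, "exists:", os.path.exists(resolved))
```

The normalised form points at `home/lib`, which was never created, while the resolved form lands in `var/lib`.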


Seems like that violates the "can't end in a period" rule


Just using Cygwin will show that file names can indeed end in periods (and spaces). The article is very much restricting itself to the standard limitations imposed by the Win32 API, not what the operating system actually allows. Case sensitivity has always been a thing, since Windows NT 3.1, for example; the "forbidden" characters are not so forbidden with the right file access APIs.


No coverage of this nonsense would be complete without also mentioning that CON, AUX and PRN and a couple of others are verboten as filenames in Win. Although apparently you can defeat this via e.g. \\?\C:\con


The article mentions this in the Disallowed Names section.


One of the early stupid annoying teenager programs I wrote was a tool that would spam your desktop with CON.001, CON.002, and so on through the \\?\ trick.

Windows Explorer could not delete the file. You have to specify the \\?\ path to get the delete call to work, but that didn't work well with cmd.exe's `del` command.

I've since used these files to create directories that can't be deleted by automated cleanups and such, like a special folder in %TEMP% that one program needed but didn't create on its own.


The 260-character limit has been the bane of my existence. Even though it can be disabled in the registry, there are gazillions of programs built against old APIs that will still not work. You also get really odd bugs when that plague hits you.
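If you want to catch these before an old tool trips over them, a quick pre-flight check is cheap. A rough sketch: `MAX_PATH` is 260 and includes the terminating NUL, so 259 visible characters is the most that fits. The helper names are hypothetical, and counting Python characters instead of UTF-16 code units is a simplification.

```python
MAX_PATH = 260  # classic Win32 limit, counting the terminating NUL


def exceeds_max_path(path: str) -> bool:
    """True if the path would not fit in a MAX_PATH-sized buffer."""
    return len(path) + 1 > MAX_PATH


def find_long_paths(paths: list[str]) -> list[str]:
    """Filter a list of absolute paths down to the ones likely to break."""
    return [p for p in paths if exceeds_max_path(p)]


ok = "C:\\" + "a" * 256        # 259 characters: just fits
too_long = "C:\\" + "a" * 257  # 260 characters: one too many
print(find_long_paths([ok, too_long]) == [too_long])
```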


Needs mention of auto-generated short file names (8.3 alias, typically with ~ in them) on volumes that don't support long names


Not only there. Windows originally created them on disks that supported long names, too.

That was necessary to support the use case where an older OS tried to read the disk (could happen because the user rebooted into an old DOS, for example, or if an external disk was moved to a different computer)

https://en.wikipedia.org/wiki/8.3_filename#VFAT_and_computer...:

“VFAT, a variant of FAT with an extended directory format, was introduced in Windows 95 and Windows NT 3.5. It allowed mixed-case Unicode long filenames (LFNs) in addition to classic 8.3 names by using multiple 32-byte directory entry records for long filenames (in such a way that only one will be recognised by old 8.3 system software as a valid directory entry).

To maintain backward-compatibility with legacy applications (on DOS and Windows 3.1), on FAT and VFAT filesystems an 8.3 filename is automatically generated for every LFN, through which the file can still be renamed, deleted or opened, although the generated name (e.g. OVI3KV~N) may show little similarity to the original. On NTFS filesystems the generation of 8.3 filenames can be turned off. The 8.3 filename can be obtained using the Kernel32.dll function GetShortPathName.”


Right. Back in the 90s I worked on a network server to allow AppleTalk clients into DOS or OS/2 based networks. The Mac users enjoyed their filename freedom but the PC clients had trouble with the super-weird 8.3 short names. You couldn't really tell what the Mac filenames were.

The other direction worked great, though, DOS filenames always worked on the Mac side of the network.


"Legal" characters is a fun topic.

I realized at some point that there is a discrepancy between what's allowed on the file system and "Windows" itself (or, more exactly, the programs running on Windows and using its APIs to communicate with said file system).

In this case, NTFS totally allows for "illegal" characters such as < > : " | ? * etc... pretty much everything except / and \, and \0, I think.

This makes for funny situations, where sometimes Windows programs cannot deal with that. At best, they can't read, write or rename them... at worst they'll crash, which is always fun.
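For anyone who has to exchange files with Windows programs, it can help to flag such names up front. A minimal sketch of the Win32-level rule (the nine punctuation characters plus control characters below 32); the function name is hypothetical, and this checks a single name component, not a whole path.

```python
# Characters the Win32 naming rules reject inside a file name component.
WIN32_FORBIDDEN = set('<>:"/\\|?*')


def win32_problem_chars(name: str) -> list[str]:
    """Return the characters in `name` that Win32 programs will choke on."""
    return sorted({ch for ch in name if ch in WIN32_FORBIDDEN or ord(ch) < 32})


print(win32_problem_chars("<HTML>"))
print(win32_problem_chars("report.txt"))
```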


This got me to try out the Fileside app (an Explorer/Finder alternative) from the blog post author. It's available for Windows and Mac.

I think this is an interesting space with room for innovation.

Fileside starts out with a grid of four directories: Home, Documents, Desktop and Downloads. You can customize and name new grid layouts that are shown in a sidebar for quick switching. This seems like a neat idea for specific recurring manual workflows.

It doesn't seem to be targeted at the minimalist crowd. Directory entries beginning with a dot are visible (but greyed out). Full Unix-style permissions are shown for each entry, etc.

It looks like it's Electron-based and implemented in a JavaScript SPA framework. It doesn't use the default system font (SF Pro) on Mac. A bunch of other things also don't look or behave as you'd expect.

The font weight in the size column maps to each file's relative size. All the way from very thin to very bold. Kind of cute.

The path completion seems pretty good - as could be guessed from the blog post.

I think this app sometimes confuses power with details/verbosity. There are some gold nuggets in there though.


The worst thing about Windows paths is how unintegrated it all is. You can navigate into all kinds of weird paths in the Win32 COM shell (File Explorer), which is itself possibly the pinnacle of executed design MS ever achieved. But those paths you build... You can't copy them to the clipboard, you can't serialize them, you can't move them between various tools, and particularly not to the command prompt nor to so-called Power"shell". If there ever was a continent of independent fiefdoms, Windows is it :-/. If you don't know what I am talking about, try navigating to your Android phone's image folder in File Explorer. Then try to USE that path in PowerShell or cmd to copy those image files... good luck. There must certainly have been some moron in charge to make SURE things couldn't interoperate on Windows, in spite of them having Explorer's design.


These path shenanigans get even trickier when invoking WSL or Cygwin.

Here's an alias from my Cygwin .bashrc (which took me way too long to figure out) where both the *nix and Windows style paths are invoked:

alias ms="/cygdrive/c/Windows/System32/OpenSSH/ssh.exe -A -i 'C:\Users\me\.ssh\mm-id_rsa' me@myserver"


Or MSYS2. Another tricky variant is using Windows commands (which typically use '/' as the option marker), e.g.

  $  tasklist /V
  ERROR: argument/option invalid - "C:/msys64/V"

or the time I wanted to use xmlstarlet and any xpath expression was interpreted as a Windows path :(

At least for MSYS2 the environment variable MSYS2_ARG_CONV_EXC can be used to prevent conversion. https://www.msys2.org/docs/filesystem-paths/#process-argumen...


I believe it’s because it’s executing a Windows program (ssh.exe) located in Cygwin’s mount for your C: drive and that program therefore expects Windows-style paths.


I know now :)


The no-period-as-final-character limit has gotten me when copying albums to a FAT32 drive. Turns out a few album names end in a period. :-/



There is also this can of worms regarding translations. For example C:\Users will be shown as C:\Benutzer on a German system but still be C:\Users on the FS. "C:\Program Files (x86)" will show up as "C:\Programme (x86)" BUT "C:\Program Files" will stay the same (non translated).

(I forget what this is called, though, and at what layer it takes place.)

Windows XP already could do it, but didn't for the most part. I remember, like ~18 years ago, I was a sysadmin. One user out of 50 got his "My Doc/Pictures/etc." in English but for me on the File-Server it was all in German. Very confusing.


It's the shell (i.e. Explorer) showing a localized name. Although I think in Windows XP it was still baked into the language you installed Windows with and stuck with that.


> That’s what sticking to a policy of total backwards compatibility for several decades gets you!

Kind of? Surely unix file path conventions are of a similar age (if not older), yet somehow they seem to exhibit much less weirdness..


That's because the Unix convention is far simpler: everything except / and \0 (the null byte) is allowed in filenames, which are also case-sensitive (exact byte match).


Wow, incredible write up. Not only did I learn a lot - but it was also the perfect product pitch.

The “and this is why fileside exists” transition at the end - perfect. No “sign up” or anything. Awesome!


Amiga used to have the best file-path convention I've ever worked with: drive:dir/file where drive was not a letter like in Windows but rather the drive's label, so you could have Work:Pictures/HN.jpg.

To go up to the parent directory you had to use an additional slash. So /xxx was the equivalent of Windows ..\xxx, and you could add more slashes to go further up in the directory tree: Work:a/b/c/d////file was the same as Work:a/file.

Drives could be "virtual" ones, similar to "bind" mount points in Unix, that could be associated with multiple positions. E.g. you could assign both System:Libs and Data:MyLibs to the virtual drive LIBS: so that LIBS:xxx would match a file called xxx in either directory.

Files used to have a comment field to store kind of extended attributes, but was seldom used IIRC.

Wildcards were quite unique, I think ? was like regexp . meaning any char, and # like * but prefixed meaning any number of the _following_ char, so that #? would match anything.

I'm sure there were other niceties I can't remember right now.


Ahhh, the memories of naming my folder AUX and then having my Uni professors with administrative privileges not being able to access my files in that folder. Drove them crazy because they thought the secret of the Universe was hidden there and they demanded I let them see what was inside, when in reality I only wanted to show off and had nothing at all. Novell NetWare on top of DOS, year is 1994 - good times.


eh some of these are slightly wrong but yeah it's weird.

MS-DOS included some compatibility for CP/M, and everything since has maintained compatibility with the version prior, with a few exceptions.

so even today we have compatibility built into Windows for things that don't exist anymore in any real capacity.

Microsoft is very serious about backwards compatibility.


Old Microsoft was very serious, young blood Microsoft not so much.


I am not sure where you are coming from with that comment. Microsoft hasn't broken backwards compatibility on anything that I'm aware of in some time.


.NET Framework to .NET Core transition, Xamarin.Forms to MAUI, XNA, the multiple rewrites on the WinRT platform since Windows 8, Windows 8 to 10 user-space drivers framework, .NET Native, C++/CX, Win2D, WinRT for Xbox replaced by WinGDK, .NET SharePoint Framework via a JavaScript one.


exactly zero of those things break backwards compatibility.

programs written using ANY of those technologies run on Windows 11 unmodified, and they will for Windows 12, too.

backwards compatibility means new OS versions can run programs written for older OS versions.

backwards compatibility promises do not prevent you from coming up with new ways to write programs.


Except when the old way no longer works, the bugs don't get fixed, and tooling disappears from Visual Studio.

I have been in the Microsoft ecosystem since MS-DOS 3.3.


I still don't think you know what backwards compatibility is. You are saying anything necessary to get me to shut up and concede this point, and since you are confused about backwards compatibility, you are not correct.

backwards compatibility is not about keeping all features once supported in visual studio in all future versions. that is forwards compatibility. Microsoft does not do that.

we are talking about backwards compatibility: the ability of new operating systems to run software unmodified which ran on old versions of the same operating system.


The part about trailing periods and spaces being disallowed isn’t quite correct. On an NTFS drive, you can actually do

  echo > \\?\C:\path\to\file.

and

  echo > "\\?\C:\path\to\file "

Similarly, files with such names can be created with Cygwin.


"On any Unix-derived system, a path is an admirably simple thing: if it starts with a /, it’s a path. Not so on Windows."

Then, the author goes on to explain that every path on Windows starts with "\\" instead. :)


On Ruby, '/' and '\' function identically, presumably due to normalization underneath.

    File.join(too_long_identifier, some_other_long_identifier).gsub('/', '\\')

is often useless and nauseating to read when

    "#{dir}/#{filename}"

is portable enough for most of Ruby's File APIs.


Here is an excellent article about Paths on Win32: https://googleprojectzero.blogspot.com/2016/02/the-definitiv...


Ultimately, the thing that really distinguishes Linux from Windows is file paths.


This might be some run-of-the-mill weirdness, but I was using a Microsoft file globbing library recently (https://learn.microsoft.com/en-us/dotnet/api/microsoft.exten...) and it handed back file paths with forward slashes (instead of backslashes), even on Windows (and in .NET Framework). I don't know if this is a library someone developed for .NET Core and forgot that it was also going to be used in .NET Framework? Anyways, another reason I don't like the occasional dip into Windows I have to do at my job (which is 80% Mac/Linux).
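When results like that have to be compared against backslash paths from elsewhere, it is usually easier to compare parsed paths than raw strings. A small sketch of the idea using Python's `PureWindowsPath` (not the .NET library in question); the file names are invented, and the helper names are hypothetical.

```python
from pathlib import PureWindowsPath


def to_backslashes(path: str) -> str:
    """Render a path with Windows separators, whatever it came in with."""
    return str(PureWindowsPath(path))


def same_windows_path(a: str, b: str) -> bool:
    """Compare two paths by Windows rules (separator- and case-insensitive)."""
    return PureWindowsPath(a) == PureWindowsPath(b)


print(to_backslashes("src/Models/User.cs"))
print(same_windows_path("src/Models/User.cs", "SRC\\models\\user.cs"))
```

This is still only a lexical comparison: `..` components are kept as-is rather than collapsed.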


It'd be nice if Microsoft read this list and adjusted their software, like perhaps File Explorer, to be able to read and write this data. Or at least delete it.


> Disallowed characters

> < > ” / \ | ? * Never allowed

A friend once somehow created a file called <HTML>.

I don't know how, but he also couldn't delete or do anything with it.


A long time ago I found a hidden directory on my file server that some student had created to store their pirated software. This was back in the days of DOS and Novell NetWare.

Turns out you could create files with "illegal" (and invisible) characters in the filename. The standard OS utilities would not allow them, but the underlying file system did not care. So you could write a short program to do it.

I had to write a utility just to delete it.


They should have fixed that when they went to long file names. It's ridiculous that you can't name a file with its contents' actual title. Random example: http://doi.org/10.1145/327070.327153


There are two possibilities here...

Connecting to a Windows share from another operating system that allows names like that.

There are a few other possibilities where you boot to Linux using a FS driver for NTFS that allows you to create illegal file names. And/or odd things like WSL/Cygwin.


Possibly the most annoying thing about Windows w.r.t. this topic is that you need a large physical C drive to accommodate future Windows bloat. It is very hard to get anything installed to put data on another physical drive. It is impossible to extend C to another physical drive. Realistically you need a 1TB SSD/NVMe as your primary drive. So if you get a laptop, you then usually need to get a high-end one to offer you that.


If they could slowly start adopting Unix file paths and slowly phase out Windows file paths, I think more people would start to use Windows. I would love to want to use Windows. It's a platform that has first class hardware support and paid support, and it's designed (well, in theory) with users and application platforms in mind. And it actually has a bunch of advanced features that Linux and Mac don't have.


Windows was built with a POSIX layer; it already does this.

"Broad software compatibility was initially achieved with support for several API 'personalities', including Windows API, POSIX, and OS/2 APIs – the latter two were phased out starting with Windows XP."

https://en.wikipedia.org/wiki/Windows_NT


Why would it matter? 99% of these are effectively edge cases and feature-wise they're pretty equivalent.


people don't choose to use Windows or not based on the file and path syntax.


You can create files and folders with illegal names like '.' or '..' with Python. Comedy ensues if you try to recursively go into the . directory and Explorer crashes -_- . Despite Windows having rules, apparently not everything needs to play by them, even though it sets those expectations.


GitHub Actions in 2023 has issues with checking out a repository with 256ish-character path names on Windows runners. whups


Add \\wsl$ to the list of "weird paths". This one fits in the general schema the article is about but has the special meaning of "your files in the Linux VM hiding in the Windows box" (aka WSL). It's running as a Plan9 network filesystem.


Is it a weird path? It's just a simulated computer with hostname wsl$. It looks like this is just a network share server running on the loopback interface. Similarly, \\wsl.localhost\ will also show you your WSL files. The dollar sign has some weird special meaning to the Windows security system (trusted account, local computer, etc., in AD it's related to this mess: https://techcommunity.microsoft.com/t5/core-infrastructure-a...) but it's essentially just a standard Windows UNC path.


$ sign just really means "hide this", though it's commonly used on administrative shares.


Starting with Windows 10, it also supports *nix-style file paths.


Can this tool help me identify duplicates in my photo library?


One of these tripped me up, the "Disallowed Names" section has a bunch of not-too-wordlike names, except one.

When I made carefulwords.com, which I made because I wanted a thesaurus where you could just write eg https://carefulwords.com/book for "book" and get the results, I found out the hard way that you cannot make a file named "con" on Windows. Or "con.html", or any file extension. You can try to force this, make it via a script, but then programs like Git will hang when they come across it. So in my thesaurus the actual page is /con-word.html and I just have it rewrite to /con


This has actually changed in Windows 11. You can use "con.html" without fear. "con" is still a bit of a problem. ".\con" will work but not a bare "con".
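If you generate file names from arbitrary words, as in the thesaurus case, a guard like the following avoids the problem on both old and new Windows versions. A sketch only: the list covers the classic device names (CON, PRN, AUX, NUL, COM1-9, LPT1-9) and is not meant to be exhaustive, and the `ignore_extension` switch is a made-up knob for the older "extension doesn't help" behaviour versus the newer behaviour described above.

```python
RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_reserved_device_name(filename: str, ignore_extension: bool = True) -> bool:
    """True if Windows may treat `filename` as a legacy device name.

    ignore_extension=True models older versions, where con.html still
    counts as CON; False models the relaxed rule for names with extensions.
    """
    stem = filename.split(".", 1)[0] if ignore_extension else filename
    return stem.rstrip(" ").upper() in RESERVED


for word in ["con.html", "console.html", "aux", "COM10.txt"]:
    print(word, is_reserved_device_name(word))
```

Renaming the file on disk (the `con-word.html` trick) and rewriting the URL stays the safest option if the site ever has to be checked out on an older machine.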


It's not really a new version of Windows if it does not introduce a new variation of universal path addressing to end all variations of universal path addressing.


Was Dave Gahan left off purposely?

