
Git for Windows accidentally creates NTFS alternate data streams - latkin
http://latkin.org/blog/2016/07/20/git-for-windows-accidentally-creates-ntfs-alternate-data-streams/
======
smhenderson
_The root cause of all this is a relatively obscure NTFS feature called
alternate data streams._

Obscure indeed: I've never seen them used for anything other than hiding
malicious content. Curious, I read about them on Wikipedia[1], and it turns out
they were originally created to support resource forks in Services for
Macintosh. Browsers also use them to flag files downloaded from the internet.

[1] https://en.wikipedia.org/wiki/NTFS#Alternate_data_streams_.28ADS.29

~~~
_wmd
Hardly obscure, every modern OS has an equivalent feature, but only OSX and
Windows unify it with the regular filesystem API.

Streams and resource forks are a play on a now-standard UNIX feature that
almost nobody uses because it has a shitty non-file-based API that also breaks
most tools unless they are specifically aware of them: extended attributes.
Resource forks and extended attributes are almost equivalent in every single
way, except that extended attributes can only be read/written atomically
(limiting their size to strings that will fit in RAM), whereas a fork or
stream can be opened like a regular file. Stick that in your pipe and smoke
it, UNIX sycophants, another case where Windows is more UNIX than UNIX ;)

The file-or-directory vagueness created by the hierarchy of resources buried
within a file also more closely matches how the most popular path naming scheme
on the planet (URLs) works: a URL can always represent both a file and a
collection simultaneously, so I see this as closer to an ideal than the
alternative where files can have no children at all. Sadly nobody actually
uses these APIs like that, because all our tooling is so bad at coping with
it. I sometimes wonder what the world would look like if directories on
popular operating systems had simply been made 0-byte files.

~~~
rwmj
You missed:

- Unix xattrs have a terrible API and awful command-line tools: listxattr(2)
returning \0-separated character arrays with lists of attributes that are next
to impossible to decipher in C? Check! Hiding certain xattrs by default
based only on their names? Check!

- xattrs have magical qualities based on their names, the kernel version, the
kernel configuration, and the filesystem mount options (e.g.
"security.selinux", "trusted.*")

- Some xattrs are \0-terminated (and the APIs set and return the \0, making
them very awkward to use from shell scripts), some aren't, and some are
indeterminate. They can also be binary blobs.
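To make the listxattr(2) complaint concrete: the call fills a caller-supplied buffer with attribute names, each terminated by a NUL byte, and the caller has to walk that buffer. A minimal Python sketch of that walk, run over a hand-built byte string rather than a live filesystem (the function name and sample attribute names are made up for illustration):

```python
def split_xattr_names(buf: bytes) -> list[str]:
    """Split a listxattr(2)-style buffer into attribute names.

    Hypothetical helper: the buffer is assumed to be a run of names,
    each one terminated by a NUL byte, as the man page describes.
    """
    names = []
    start = 0
    while start < len(buf):
        # Every name ends in NUL; a missing terminator raises ValueError.
        end = buf.index(b"\0", start)
        names.append(buf[start:end].decode("utf-8", "surrogateescape"))
        start = end + 1
    return names


# Made-up buffer shaped like what listxattr(2) writes.
sample = b"user.comment\0security.selinux\0trusted.md5\0"
print(split_xattr_names(sample))
```

On Linux, Python's own `os.listxattr(path)` already does this splitting and returns a list of strings; the attribute values themselves (`os.getxattr`) still come back as raw bytes, terminators and all.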

~~~
nailer
Is \0 ASCII null?

~~~
rwmj
Yes. For more details see http://man7.org/linux/man-pages/man2/listxattr.2.html

------
kazinator
The colon has been special since the dawn of DOS. For instance, you cannot use
"con:" as a file name. (In fact, in a fit of extreme stupidity, DOS also
claimed some devices with no colon suffix, like "con" and "prn", effectively
making these into globally reserved names in any directory.)

Stock Cygwin does something special with the colon character, so the Cygwin
git shouldn't have this problem. A path like "C:foo.txt" is not understood by
stock Cygwin as a relative reference in the current directory of drive C; the
colon is mapped to some other character and then this is just a regular one-
component pathname.

In the Cygnal project (Cygwin Native Application Library), paths passed to the
library are considered native. So that certain useful virtual filesystem areas
remain available, I remapped Cygwin's "/dev" and "/proc" to "dev:/" and
"proc:/", taking advantage of the special status of the colon to take this
liberty. You can list these directories (opendir, readdir, ...) and of course
open the entries inside them; but chdir is not allowed into these locations.
(Unlike under stock Cygwin, where you can chdir to /dev). chdir is not allowed
because then that would render the library's current working directory out of
sync with the Win32 process current working directory, which would not be
"native" behavior.

~~~
naz
I remember when any attempted access to c:\con\con would bluescreen any
Windows machine. Hours of teenage fun sending people to a website I'd set up
with <img src="file://c:\con\con">

~~~
tonyarkles
You could do it over Windows File Sharing too! \\someone-elses-machine\c\con\con
would blue-screen _their_ machine!

------
Someone
It's not alone. In MS SQL Server, you can name a database "foo:bar". If you
give a database such a name when you restore it from disk, you'll find that
the database takes zero bytes on disk (at least, that's what Explorer claims).
Your disk space is gone, though.

~~~
_nedR
What? You are saying Windows Explorer doesn't handle this feature properly?
That's insane.

~~~
exceptione
I think that is the correct behaviour, though. The default stream is left
empty in this case, so it should indeed be zero bytes.

Keep in mind that each file can have multiple data streams. Suppose the
system reported the total of all the streams for foo combined... You would
be surprised if you then tried to read that many bytes from foo and saw it
crash, because in reality there are no bytes in the default stream.

However, there are other tools to report the presence of alternate streams.
This is not a feature intended for casual end-users.

~~~
mietek
The user should not have to use a third-party tool to interact with a feature
which is always present in the core OS.

The principle of least surprise applies here: it’s surprising for a user to
find a seemingly-empty file, especially if they expect the file to contain
valuable data.

Clearly, Explorer should make the presence of multiple streams obvious to the
user.

------
duncans
Related to this bug: there used to be a vulnerability in IIS back in the late
90s where you could append ::$DATA to a file name (e.g. Foo.asp::$DATA) and
download a server-side script's source code.

~~~
jameshart
Related - meaning the ::$DATA was interpreted as a request for an alternate
data stream from the file, and then read the default stream?

~~~
duncans
More info: https://technet.microsoft.com/en-us/library/security/ms98-003.aspx
- seems to imply that ::$DATA refers to the default (unnamed) stream.
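NTFS reads a name of the form file:stream:type, which is why both this IIS trick and the git bug in the article happen: "Foo.asp::$DATA" names the unnamed stream of type $DATA (the file's ordinary contents), and "foo:bar" names a stream called bar inside a file called foo. A rough Python sketch of that split; it is pure string handling, never touches the filesystem, and the function name is hypothetical:

```python
def parse_ntfs_stream_path(name: str) -> tuple[str, str, str]:
    """Split an NTFS 'file:stream:type' name into (file, stream, type).

    Hypothetical helper. Missing parts come back as empty strings; an
    empty stream name means the default (unnamed) stream. Drive letters
    such as 'C:' are not handled, so pass only the final path component.
    """
    parts = name.split(":")
    if len(parts) > 3:
        raise ValueError(f"too many colons in {name!r}")
    parts += [""] * (3 - len(parts))  # pad to (file, stream, type)
    file_name, stream, stream_type = parts
    return file_name, stream, stream_type


print(parse_ntfs_stream_path("Foo.asp::$DATA"))  # unnamed $DATA stream
print(parse_ntfs_stream_path("foo:bar"))         # stream "bar" of file "foo"
```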

------
Grue3
I had a related problem with Dropbox. Some files uploaded from my Linux
machine were not synced to my Windows machine. Later I narrowed down this
problem to images being saved from Twitter, which have URLs ending with
":orig". On Linux, Firefox happily saves such images as "blahblah:orig.jpg",
whereas on Windows it uses a space instead of a colon. And of course Dropbox on
Windows would completely ignore filenames that contain colons and report that
the directories are synced, when they obviously aren't.

~~~
Ieyeefae
There's
[https://www.dropbox.com/bad_files_check](https://www.dropbox.com/bad_files_check)

~~~
reycharles
I get hit with a login page. Can anyone describe what is linked to?

~~~
jmiserez
It's just a link to
[https://www.dropbox.com/help/145](https://www.dropbox.com/help/145)

(No login needed there.)

------
artifaxx
That is quite the obscure and interesting issue to run into! Who puts colons
in their filenames though? I haven't ever seen that used...

~~~
Tharkun
Why wouldn't you put colons in filenames? Unless of course you use Windows.
Colons, spaces, backslashes, whatever.

~~~
7952
Artificially restricting filenames is an antipattern that makes things harder
to read.

I use software that sometimes (but mostly not) needs files in DOS 8.3 format.
Because of this, people seem to think it a good idea to use really short acronymed file
names as a matter of course. If it makes sense to use a special character then
people should be able to.

~~~
xg15
This would be the way to go if all program invocations consistently used some
kind of common, higher-level data structure à la PowerShell. As long as
programs rely on parsing their command line according to some syntax the
developer just made up, I'm very glad there are commonly agreed sets of "safe"
and "unsafe" characters. Dealing with shell escapes is horrible enough today
(e.g. quoting rules under Windows, filenames that start with a dash under
Linux...) and this would make matters even worse.

~~~
wahern
Quoting is a much bigger problem than differentiating flags from paths, and on
Unix that's a solved problem: the shell always handles quoting, and Unix
programs only expect a list of words which can contain arbitrary characters
(except NUL, of course). If you invoke a program directly using the exec-
family of syscalls, you don't need to quote anything.

Whereas AFAIU Windows programs expect quoted words to be passed via main(),
and must parse them. The only benefit is that you can disambiguate a filename
with a dash (or slash) based on whether it was escaped, but that's a quite
rare necessity, and of course still relies on the caller quoting them. (Does
cmd.exe quote pathname expansions?)
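The contrast can be sketched in Python (an illustration only; the sample words are made up). With no shell involved, the argument list reaches the child as its argv untouched. On Windows, by contrast, a child process receives a single command-line string that its C runtime splits back into argv, so the caller has to quote; CPython builds that string with `subprocess.list2cmdline`, a helper that exists but is not part of the documented API:

```python
import subprocess
import sys

# Words with a space, a shell metacharacter, a leading dash and quotes.
words = ["a b", "$HOME", "-not-a-flag", 'say "hi"']

# No shell: on POSIX the list is handed to the exec family as argv,
# so nothing needs quoting and the child sees every word unchanged.
child = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1:])", *words],
    capture_output=True, text=True, check=True,
)
print(child.stdout.strip())

# What a Windows caller would have to build instead: one string,
# quoted per the MS C runtime rules (spaces wrapped, quotes escaped).
print(subprocess.list2cmdline(words))
```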

The dash problem is also solved as long as programs use getopt() or
getopt_long(). First, getopt() knows which flags take arguments and which
don't. Knowing this, if a flag takes an argument it doesn't matter whether the
argument begins with a dash or not. One consequence is that there's no such
thing as an "optional" argument to a flag when using getopt and friends, as
that ambiguity cannot be handled cleanly. People who roll their own argument
processing code just so they can get "optional" arguments to flags invariably
don't appreciate the security problem.

Second, a double-dash (--) terminates option processing. getopt stops
consuming command-line options at that point, and optind will index the
first non-flag argument. So if passing a list of filenames to a command, the
correct idiom in Unix is something like `foo -- /path/to/*`. Of course, that
presumes that foo is using getopt or getopt_long, or a compatible argument
processing implementation. Fortunately the vast majority do.
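Python's standard `getopt` module follows the same conventions as the C getopt(3), so it can illustrate both points above. A small sketch (the option letter and filenames are arbitrary):

```python
import getopt

# "o:" declares that -o takes an argument, so getopt hands over the
# next word as that argument even though it starts with a dash.
argv = ["-o", "-weird-output", "--", "-file-starting-with-dash.txt"]
opts, operands = getopt.getopt(argv, "o:")
print(opts)      # [('-o', '-weird-output')]
print(operands)  # ['-file-starting-with-dash.txt']

# Without "--" the dash-prefixed filename is parsed as a cluster of
# short flags, and the unknown -f is rejected.
try:
    getopt.getopt(["-file-starting-with-dash.txt"], "o:")
except getopt.GetoptError as err:
    print("error:", err)
```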

Smart programmers should rarely if ever roll their own argument processing
code. Any headaches (real or imagined) related to a mismatch between the
semantics offered by getopt and what the application might want are usually
dwarfed by the usability and security benefits of adhering to the system
facilities.

On a related note, I've always disliked the way GNU's getopt and getopt_long
permute (reorder) argument lists. I have an inkling it could introduce
needless security issues, though I haven't thought it through carefully.
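The permutation is easy to see with Python's `getopt` module, which offers both behaviours: `getopt.getopt` stops at the first non-option word like POSIX getopt, while `getopt.gnu_getopt` scans past it. A minimal sketch with an arbitrary `-v` flag; note that `gnu_getopt` falls back to the POSIX behaviour if the `POSIXLY_CORRECT` environment variable is set:

```python
import getopt

argv = ["input.txt", "-v"]

# POSIX-style: parsing stops at "input.txt", so "-v" is left behind
# as an ordinary operand.
print(getopt.getopt(argv, "v"))      # ([], ['input.txt', '-v'])

# GNU-style: the list is effectively reordered, so "-v" is still
# treated as a flag even though it comes after a filename.
print(getopt.gnu_getopt(argv, "v"))  # ([('-v', '')], ['input.txt'])
```

The security angle is the second line: a filename that happens to start with a dash, appearing anywhere in the list, can be picked up as an option unless the caller puts `--` in front of the operands.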

------
mcculley
This is interesting. I was just recently working on an app where I wanted to
ensure the UI wouldn't accept problematic characters in filenames. Obviously,
Unix has problems with '/'. I'll add ':' to the list. That's unfortunate. What
else should portable apps avoid?

~~~
warbiscuit
This is cribbing from the source of a filename sanitizer in one of my company's
internal libraries. The function is a little... paranoid... so I'm not
positive all of these are actually forbidden.

/ and 0x00 for Unix

:?"<>/|\* and control chars 0x00 .. 0x1F for Windows

'~!#$&%^; if there's a chance of the filename being passed to a shell without
proper escaping.

Windows also forbids a bunch of filenames matching regex
"CON|AUX|PRN|NUL|COM[1-9]|LPT[1-9]"

Also, ending filenames with a space or period _really_ messes up Windows. File
Explorer can see such a file, but can't delete or rename it.

 _edit: fixed markup_
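The Windows rules listed above can be turned into a small checker. A minimal Python sketch, where the function name and messages are made up; it reports problems rather than renaming, skips the shell-metacharacter list, and assumes (following Microsoft's file-naming guidance) that reserved device names are matched case-insensitively and are also to be avoided with an extension, e.g. "nul.txt":

```python
import re

# Characters listed above as forbidden on Windows, plus control chars 0x00-0x1F.
WINDOWS_FORBIDDEN = set('<>:"/\\|?*') | {chr(c) for c in range(0x20)}

# Reserved device names; only the part before the first dot is checked,
# which is an assumption based on Microsoft's naming guidance.
RESERVED = re.compile(r"(?:CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])", re.IGNORECASE)


def windows_filename_problems(name: str) -> list[str]:
    """Return reasons a single path component is unsafe on Windows (hypothetical helper)."""
    problems = []
    bad = sorted({c for c in name if c in WINDOWS_FORBIDDEN})
    if bad:
        problems.append(f"forbidden characters: {bad!r}")
    if RESERVED.fullmatch(name.split(".")[0]):
        problems.append("reserved device name")
    if name.endswith((" ", ".")):
        problems.append("ends with space or period")
    return problems


for candidate in ["notes.txt", "foo:bar", "nul.txt", "report."]:
    print(candidate, windows_filename_problems(candidate))
```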

~~~
DeltaWhy
That should be NUL (one L). Interestingly, when I tried it in PowerShell,
'type NUL' reports that the file does not exist, but in CMD, 'type NUL'
outputs nothing (it's the DOS equivalent of UNIX's /dev/null). So apparently
some APIs will allow you to use those as filenames while others will choke on
them.

~~~
jstarks
It is likely that the .NET base class libraries block NUL and the other
special file names that Win32 supports. This would explain why PowerShell
(which is written in C#) behaves differently from cmd (which is written in C).

------
AWildDHHAppears
MacOs (i.e., OS 9 and before) had special meaning for colons, too. I wonder
what would happen for git on those platforms.

Edit: Apparently colon is _still_ a special character on Mac!
http://stackoverflow.com/questions/13298434/colon-appears-as-forward-slash-when-creating-file-name

~~~
lostlogin
And this is how we enter the new era. It goes MacOs, OS X then macOS.
Unfortunately the 10.xx has been kept to mess with what is (capitalisation
aside?) a nice tidy up. Maybe dropping the names part, Sierra, would have made
it better. Relying on readers to spot your capitalisation isn't ideal at all,
and what if you start a sentence with macOS, how do you capitalise it?

~~~
kalleboo
To be super pedantic, it went

1. "Macintosh System Software"
2. "Mac OS" (starting with 7.5/7.6)
3. "Mac OS X"
4. "OS X" (starting with Mountain Lion)
5. "macOS" (starting with Sierra)

------
jorangreef
The flip-side of this:

I was running a fuzz test on a backup tool, which verified that file data and
metadata (including timestamps) as reflected by Windows were exactly as
produced by the fuzz test.

I noticed that for some ".eml" files this was not the case. The mtime of these
files was being modified by something else after the initial create by the
application. In the end, it came down to a Windows process which was
automatically indexing ".eml" files and creating an ADS for each of them,
thereby touching the mtime.

This was intentional on the part of Windows, but I never saw it coming.

------
xg15
The problem should be addressed, but the proposed workaround seems strange. So
git should refuse to write the file to disk? How am I supposed to use a git
repo that contains such problematic files on Windows then?

~~~
Ruud-v-A
What is the alternative? Renaming the file?

This was actually an issue with early versions of Servo on Windows: cloning
the repository would fail because it contained a file with a # in the name.

https://github.com/servo/servo/commit/43c999905c01627133240cbb3efe4aef0149abd9

~~~
SpaceManiac
MSYS2 (a Cygwin-based platform) does renaming, mapping colons to U+F03A from
the Private Use Area (which renders in Explorer like a bullet point). Its git
package cloned the repository from the article with no problem, "ls" shows
"foo:bar", and "cat foo:bar" works. Opening the file in non-MSYS tools also
has no problems with the exotic character.
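The remapping described above amounts to adding a fixed offset to each forbidden character, which is consistent with ':' (0x3A) landing on U+F03A. A Python sketch assuming a plain 0xF000 + code point offset; Cygwin's documentation on special characters in filenames describes the same idea, but treat the exact character set here as an assumption (backslash is left out because these environments treat it as a path separator):

```python
# Assumed set of characters Win32 rejects in names; backslash excluded.
_SPECIAL = '"*:<>?|'
_INVERSE = {chr(0xF000 + ord(c)): c for c in _SPECIAL}


def to_pua(name: str) -> str:
    """Map each forbidden char c to chr(0xF000 + ord(c)), e.g. ':' -> U+F03A."""
    return "".join(chr(0xF000 + ord(c)) if c in _SPECIAL else c for c in name)


def from_pua(name: str) -> str:
    """Undo to_pua, restoring the original characters."""
    return "".join(_INVERSE.get(c, c) for c in name)


stored = to_pua("foo:bar")
print(hex(ord(stored[3])))  # 0xf03a
print(from_pua(stored))     # foo:bar
```

The mapping is reversible, which is what lets a POSIX-layer tool show "foo:bar" while the NTFS name holds a private-use character instead of a stream separator.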

------
sickbeard
Putting colons in your filenames is almost as weird as alternate data
streams.

~~~
cordite
It's not a forward slash or a NUL byte. And it is a printable character.
Doesn't seem so wrong to me.

~~~
cygx
It's used as a separator in various places on *nix (e.g. PATH).

~~~
ygra
So are semicolons on Windows, yet still legal in file names. For PATH you can
just quote the ones containing the separator (at least on Windows).

~~~
cygx
Traditionally, there's no way to quote in PATH on *nix. I do not believe
that's changed, so if you cannot just change the name, you'd need to use a
workaround like creating a colon-free symlink.
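A quick illustration of that point, as a Python sketch with made-up directory names: because PATH has no quoting mechanism, splitting it on ':' turns a directory whose name contains a colon into two bogus entries, which is why a colon-free symlink is the usual workaround.

```python
def split_posix_path(value: str) -> list[str]:
    # PATH on *nix is a plain colon-separated list with no escaping,
    # so there is no way to express a directory name containing ':'.
    return value.split(":")


# Intended entries: "/usr/bin" and "/opt/tools:v2/bin" (hypothetical).
print(split_posix_path("/usr/bin:/opt/tools:v2/bin"))
# ['/usr/bin', '/opt/tools', 'v2/bin']
```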

------
fowl2
"McAfee Web Gateway" thinks this is porn, great.

~~~
kristianp
Why would that be I wonder? I don't see any keywords that might trigger it.

That reminds me of web filtering software that blocked my search for "java
proxy", but allowed "java procy", which google understood!

------
ragsagar
Wonder why this site is blocked in UAE! :|

