
Except NTFS does not have "extended attributes" in Linux/Irix/HPFS sense.

Every FILE object in the database is ultimately (outside of some low level metadata) a map of Type-(optional Name)-Length-Value entries, of which file contents and what people think of as "extended attributes" are just random DATA type entries (an empty DATA name marks the default one, which is what you get when you do plain file I/O).

It's similar to ZFS (in default config) and Solaris UFS where a file is also a directory




> Except NTFS does not have "extended attributes" in Linux/Irix/HPFS sense.

Except actually NTFS does have "extended attributes" in the HPFS sense, which were added to support the OS/2 subsystem in Windows NT. And went on to be used by other stuff as well, including the POSIX subsystem (and its successors Interix/SFU/SUA) and more recently WSL (at least WSL1, not sure about WSL2), for storage of POSIX file metadata.

In NTFS, the streams of a regular file are actually attributes of `$DATA` type; the primary stream is an unnamed `$DATA` type attribute, and any alternate data stream (ADS) is a named `$DATA` type attribute. By contrast, extended attributes are not stored in `$DATA` type attributes, they are stored in the file's `$EA` and `$EA_INFORMATION` attributes. I believe `$EA` contains the actual extended attribute data, whereas `$EA_INFORMATION` is an index to speed up access.
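To make the distinction concrete, here is a toy in-memory model of the layout described above. It is only a sketch: the `Attribute` class, the `find` and `stream_names` helpers, and the placeholder byte values are all made up for illustration and do not reflect the real on-disk encoding.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    """Hypothetical stand-in for one typed, optionally named attribute."""
    type: str     # attribute type, e.g. "$DATA" or "$EA"
    name: str     # "" means unnamed
    value: bytes  # payload (placeholder bytes in this sketch)


def find(record, attr_type, name=""):
    """Return the first attribute matching type and name, or None."""
    for attr in record:
        if attr.type == attr_type and attr.name == name:
            return attr
    return None


def stream_names(record):
    """Names of all $DATA attributes; "" is the primary stream."""
    return [attr.name for attr in record if attr.type == "$DATA"]


# A file with a primary stream, one ADS called "bar", and extended attributes.
record = [
    Attribute("$DATA", "", b"primary contents"),
    Attribute("$DATA", "bar", b"alternate stream"),
    Attribute("$EA_INFORMATION", "", b"<placeholder>"),
    Attribute("$EA", "", b"<placeholder>"),
]
```

In this model the extended attributes live in their own attribute types and never show up in `stream_names(record)`, which is the point being made about EAs versus ADS.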

Alternate data streams are accessed using ordinary file APIs, suffixing the file name with `:` then the stream name. Actually, in its fullest form, an NTFS file or directory name includes the attribute type, so the primary stream of a file `foo.txt` is called `foo.txt::$DATA` and an ADS named bar's full name is `foo.txt:bar:$DATA`. For a directory, the default stream is called `$I30` and its type is `$INDEX_ALLOCATION`, so the full name of `C:\Users` is actually `C:\Users:$I30:$INDEX_ALLOCATION`. You will note in `CMD.EXE`, `dir C:\Users:$I30:$INDEX_ALLOCATION` actually works, and returns identical results to `dir C:\Users`, while other suffixes (e.g. `:$I31` or `:$I30:$DATA`) give you an error instead. Windows will let you create named `:$DATA` streams on a directory, but not an unnamed one.
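The naming scheme is just `path:stream:type`, which can be sketched as plain string handling. The helper names `full_stream_name` and `split_full_name` are hypothetical, and the splitter assumes it is always given the fully qualified three-part form.

```python
def full_stream_name(path, stream="", attr_type="$DATA"):
    """Compose the fully qualified NTFS name: path, stream name, attribute type."""
    return f"{path}:{stream}:{attr_type}"


def split_full_name(full_name):
    """Split a fully qualified name back into (path, stream, attr_type).

    Splits from the right so a drive-letter colon such as "C:" stays in the path.
    Only valid for names that already carry both the stream and type parts.
    """
    path, stream, attr_type = full_name.rsplit(":", 2)
    return path, stream, attr_type


print(full_stream_name("foo.txt"))         # primary stream
print(full_stream_name("foo.txt", "bar"))  # ADS named bar
print(full_stream_name(r"C:\Users", "$I30", "$INDEX_ALLOCATION"))
```

On an NTFS volume, handing a name like `foo.txt:bar` to an ordinary file-open call is what reaches the alternate stream; the snippet itself only builds the strings and runs anywhere.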

By contrast, extended attributes are accessed using dedicated Windows NT APIs, namely `NtQueryEaFile` and `NtSetEaFile`.
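For a feel of what those APIs hand back: `NtQueryEaFile` fills a buffer with `FILE_FULL_EA_INFORMATION` entries (a 4-byte `NextEntryOffset`, 1-byte `Flags`, 1-byte `EaNameLength`, 2-byte `EaValueLength`, then the NUL-terminated name followed by the value). Below is a rough sketch of walking such a buffer. It does not call the NT API at all; `parse_full_ea` and `build_full_ea` are hypothetical helpers, the EA names are invented, and the 4-byte entry alignment in the builder is my assumption about how entries are padded.

```python
import struct

# NextEntryOffset (u32), Flags (u8), EaNameLength (u8), EaValueLength (u16)
HEADER = struct.Struct("<IBBH")


def parse_full_ea(buf):
    """Walk a FILE_FULL_EA_INFORMATION-style buffer into (name, flags, value) tuples."""
    entries = []
    offset = 0
    while True:
        next_off, flags, name_len, value_len = HEADER.unpack_from(buf, offset)
        name_start = offset + HEADER.size
        name = buf[name_start:name_start + name_len].decode("ascii")
        value_start = name_start + name_len + 1  # skip the NUL after the name
        value = bytes(buf[value_start:value_start + value_len])
        entries.append((name, flags, value))
        if next_off == 0:  # zero marks the last entry
            break
        offset += next_off
    return entries


def build_full_ea(pairs):
    """Build a test buffer from (name, value) pairs; used only to make sample data."""
    out = bytearray()
    for i, (name, value) in enumerate(pairs):
        raw_name = name.encode("ascii")
        size = HEADER.size + len(raw_name) + 1 + len(value)
        padded = (size + 3) & ~3  # assumed 4-byte alignment between entries
        next_off = 0 if i == len(pairs) - 1 else padded
        entry = HEADER.pack(next_off, 0, len(raw_name), len(value)) + raw_name + b"\0" + value
        out += entry.ljust(padded, b"\0")
    return bytes(out)


sample = build_full_ea([("DEMO", b"\x01\x02"), ("OTHER", b"xyz")])
print(parse_full_ea(sample))
```

The round trip through `build_full_ea` is only there so the parser has something to chew on without a Windows machine; on a real system the buffer would come from the NT call.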

I'm not sure why Windows POSIX went with EAs instead of ADS; I speculate it is because if you only have a small quantity of data to store, but want to store it on a huge number of files and directories, EAs end up being faster and using less storage than ADS do.


EaData & EaFile remind me of the murky memories of OS/2 APIs.

HPFS had a different approach to handling EAs internally, but OS/2 did create an extra file on FAT16 filesystems to store EAs, which could point to the origin of $EA. (HPFS itself has special EA handling implemented in its FNODE, the equivalent of an inode/FILE entry.)

I do not recall the EA actually being used anywhere by new code though, quite shocked by the mention of WSL. Old POSIX subsystem originated before ADSes I think, and might have decided to avoid creating more data types.

My quip about difference of Linux/Irix xattr is related to architectural design involved in the APIs - Irix style xattr API (copied by Linux) is rather explicitly designed for short attributes - do not know if it's still current but I recall something about API itself limiting it to single page per attribute? Come to think of it, that would match certain aspects of Direct IO that AFAIK were also imported from Irix...

Oh, and BTW - NTFS internal structures being accessible as "normal" files is one of the design decisions inherited from Files-11 on VMS, one I quite like from architecture cleanliness pov at the very least.


> I do not recall the EA actually being used anywhere by new code though, quite shocked by the mention of WSL.

This explains it: https://learn.microsoft.com/en-au/archive/blogs/wsl/wsl-file...

uid, gid, mode, and POSIX format timestamps are stored in an EA. It also mentions file capabilities being stored in an ADS. On Linux, capabilities and ACLs are stored in xattrs, so that seems to imply that xattrs are stored in ADS not EA.

> Old POSIX subsystem originated before ADSes I think, and might have decided to avoid creating more data types.

I'm not sure about that, I think support for ADS has been in NTFS from its very beginnings, it was designed to support it from the very start.

Actually, from what I understand, the original design for NTFS – which was never actually implemented, at least not in any version that ever shipped to customers – was to let users define their own attribute types. The reason why their names all start with $, is that was supposed to reserve the attribute type as "system", user attribute types were supposed to start with other characters (likely alphabetic). And that's the reason why they are defined in a file on the filesystem, $AttrDef, and why the records in that file contain some (very basic) metadata on validating them (minimum/maximum sizes, etc). If they were never planning to support user-defined attribute types, they wouldn't have needed $AttrDef, they could have just hardcoded it all in the code.


the dollar sign convention predates NT, it's one of the things inherited from Files-11, where the metadata-files were not hidden from end user, just marked with strict enough permission checks. (A lot of VMS APIs used dollar signs for namespacing, too, and I believe some aspects of the naming scheme come from specific PDP assemblers when referring to some names?)

Looking at NTFS from on-disk structure side, it always seemed quite obvious to me that a lot of accolades given to BeFS applied to NTFS - it's the lack of actually using the abilities - and IIRC a lot of the indexing system is actually used by Windows Search. In tech spaces I always found that mentioned as a "useless thing I disabled", yet I later came across offices where people are very much dependent on the component (it helps that MS Office installs document handlers to index its documents in it).


> Looking at NTFS from on-disk structure side, it always seemed quite obvious to me that a lot of accolades given to BeFS applied to NTFS - it's the lack of actually using the abilities

Microsoft had some very grand plans in this area... Cairo, OFS, WinFS... but they just kept on getting delayed, cancelled, pulled from the beta for too many issues. I think contemporary Microsoft has lost interest in this (it was something Bill Gates was big on) and moved on to other ideas.



