
The structure of the special type of DOS files called directories (1998) - luu
http://averstak.tripod.com/fatdox/dir.htm
======
red_admiral
> Archive bit is somewhat symbolic. It should be set if the file was not
> archived by the backup utility. Never in my life I have seen the use of this
> bit.

I have - back in the days of DOS' BACKUP.COM and RESTORE.COM utilities.

I believe the way it worked was that when you created or edited a file, DOS
set this bit, and when BACKUP.COM archived it, that cleared the bit again (or
was it the other way round?) That allowed you to do a very early form of
incremental/differential backup, and had a whole section explaining it in the
DOS 3.somethingorother manual I first learnt very basic system administration
from.

Back in those days our school PC came like many others with two floppy drives
called A: and B: - which is why hard disk names traditionally start at C: -
and you backed stuff up with a complicated schedule of put system disk in A:,
boot, put data disk in B:, load program then it tells you to swap disks so you
take out the system disk and put the backup disk in A: and when you're done
you put the system disk back in again.

You could set and clear the archive bit manually with ATTRIB +/-A FILENAME.
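A minimal Python sketch of the scheme described above. It assumes the usual convention: DOS sets the archive bit whenever a file is written, and the backup tool clears it. The attribute values are the standard FAT directory-entry bit masks. The in-memory `files` dict and the helper names (`set_archive`, `clear_archive`, `incremental_backup`) are hypothetical stand-ins for real files and a real backup program.

```python
# Standard FAT attribute bits (byte 11 of a 32-byte directory entry).
ATTR_READ_ONLY = 0x01
ATTR_HIDDEN = 0x02
ATTR_SYSTEM = 0x04
ATTR_VOLUME_ID = 0x08
ATTR_DIRECTORY = 0x10
ATTR_ARCHIVE = 0x20


def set_archive(attr: int) -> int:
    """Like ATTRIB +A: flag the file as needing a backup."""
    return attr | ATTR_ARCHIVE


def clear_archive(attr: int) -> int:
    """Like ATTRIB -A: flag the file as already backed up."""
    return attr & ~ATTR_ARCHIVE & 0xFF


def incremental_backup(files: dict) -> list:
    """Hypothetical backup pass over a {name: attribute byte} dict.

    Selects every file with the archive bit set, "backs it up" and clears
    the bit, updating `files` in place. Returns the selected names.
    """
    selected = [name for name, attr in files.items() if attr & ATTR_ARCHIVE]
    for name in selected:
        # A real tool would copy the file to the backup disk here.
        files[name] = clear_archive(files[name])
    return selected
```

Running `incremental_backup` twice in a row returns the changed files the first time and nothing the second time. That is exactly the non-idempotence jmalicki points out.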

~~~
jmalicki
Linux/UNIX has this too, through mtime (a time not a bit)!

Plus it's idempotent, unlike the archive bit! (e.g. if you lost your last
backup, you have a time, not a cleared bit, so you can redo it)
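The following is a rough Python equivalent of the mtime approach, offered as a sketch only. `changed_since` is a hypothetical helper, and it assumes you record the timestamp at which the previous backup started. Because nothing on disk is modified, re-running the same query after a lost backup selects the same files again.

```python
import os


def changed_since(root: str, last_backup: float) -> list:
    """Hypothetical helper: files under `root` modified after `last_backup`.

    `last_backup` is a POSIX timestamp (seconds since the epoch), e.g. the
    value of time.time() recorded when the previous backup started.
    Nothing is written to the files, so the query can be repeated safely.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_mtime > last_backup:
                changed.append(path)
    return sorted(changed)
```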

------
ben509
The most complex operation due to some klugey MS stuff is 7 pseudocode
steps... if that's hellish, then engineering is not a good career choice.

"What can go wrong with long filenames? Give some space to your
imagination..."

They can get out of sync, and then they're going to float around until a long-
filename aware system clears them out. Shockingly, systems that aren't aware
of a new standard aren't aware of the new standard. ¯\_(ツ)_/¯

I've been working on a language for handling data structures much like this,
and have some preliminary notes on a strategy for handling this scenario:
https://tenet-lang.org/versioning.html
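For reference, an LFN-aware system recognizes such stale entries using a one-byte checksum of the 11-byte 8.3 short name, stored at offset 13 of every long-filename entry. If an LFN-unaware tool renames or deletes the short entry, the checksum no longer matches and the LFN entries can be treated as orphans. Below is a Python sketch of the checksum: a rotate-right-and-add over the 11 name bytes, as in Microsoft's FAT specification. `lfn_is_orphaned` is a hypothetical helper that only compares checksums. It ignores the ordinal-sequence checks a real implementation would also make.

```python
def lfn_checksum(short_name: bytes) -> int:
    """Checksum of an 11-byte 8.3 name ("FOO     TXT") as stored in LFN entries.

    For each byte: rotate the running 8-bit sum right by one bit, then add
    the byte, keeping the result in 8 bits.
    """
    if len(short_name) != 11:
        raise ValueError("expected the 11-byte space-padded short name")
    total = 0
    for byte in short_name:
        total = (((total & 1) << 7) + (total >> 1) + byte) & 0xFF
    return total


def lfn_is_orphaned(lfn_checksums: list, short_name: bytes) -> bool:
    """Hypothetical check: do the LFN entries fail to match this short entry?

    `lfn_checksums` holds the byte at offset 13 of each LFN entry that
    precedes the short entry on disk.
    """
    expected = lfn_checksum(short_name)
    return any(c != expected for c in lfn_checksums)
```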

~~~
anyfoo
Yeah. You can give Microsoft a lot of crap, but what never ceases to amaze me
is how inventive they were in keeping everything as compatible as possible,
which they usually (successfully, mostly) pursued at all costs. Of course, that
came with a high price, in terms of complexity and kludginess.

~~~
ben509
It's pretty interesting because in many cases the software evolved faster than
anyone could keep up with, and there are large portions of backwards
compatibility that I suspect were mostly accomplished through trial and error,
especially when it comes to their layout algorithms.

When they opened up the Office file formats, they wrote an XML spec that
corresponds to the fields in the various binary formats. And there are
sections of it where the specification is simply, "this should be formatted
the way MS Word 5.1 does it."

------
geophile
This is not hell.

I have had mercifully little exposure to this world, but I once had to
implement ACLs for a storage system my company built. We stored ACLs, and
needed to make them available through Linux and Windows interfaces.

Linux was easy.

Windows (ca. 2006) was a fucking nightmare that puts this document to shame.
From what I could tell, there was no concept of an API. There were data
structures, and the interpretation of them depended on many things, including
minor OS version.

------
dmitrybrant
For all the quirks and facepalms to be found in the FAT logic, I have to give
credit to the staying power of FAT, and its universal compatibility. Its
relative simplicity makes it perfectly adequate as the default filesystem for
short-term storage (e.g. USB flash drives) where things like resilience and
journaling are not an absolute necessity.

~~~
aboutruby
Its 4GB max file size was a pretty big problem near its end of life.

~~~
kikoreis
Apart from media files, where does that limit really hurt?

~~~
devnonymous
Backups, or more generally tarballs, zips, etc. There have been way too many
times I've had to split zips/tarballs just to copy stuff over to a FAT-formatted
USB disk. Of course I no longer do that, but there was a brief period when this
was a problem.
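The hard limit comes from the directory entry. FAT32 stores a file's size in an unsigned 32-bit field, so a single file can be at most 2^32 - 1 bytes (one byte short of 4 GiB). Below is a minimal Python sketch of the kind of splitting described above. The `split_file` helper and its `NAME.000`, `NAME.001`, ... part naming are assumptions for illustration, not any particular archiver's format.

```python
# DIR_FileSize in a FAT directory entry is an unsigned 32-bit integer.
FAT32_MAX_FILE_SIZE = 2**32 - 1


def split_file(src: str, limit: int = FAT32_MAX_FILE_SIZE,
               bufsize: int = 1 << 20) -> list:
    """Copy `src` into parts src.000, src.001, ... of at most `limit` bytes.

    Hypothetical helper; the parts can be rejoined by concatenating them
    in order. Returns the part paths (empty for an empty source file).
    """
    parts = []
    with open(src, "rb") as f:
        chunk = f.read(min(bufsize, limit))
        while chunk:
            part = f"{src}.{len(parts):03d}"
            parts.append(part)
            written = 0
            with open(part, "wb") as out:
                while chunk:
                    out.write(chunk)
                    written += len(chunk)
                    if written >= limit:
                        break  # this part is full
                    chunk = f.read(min(bufsize, limit - written))
            # Start the next part, or get b"" at end of file.
            chunk = f.read(min(bufsize, limit))
    return parts
```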

~~~
reaperducer
I remember back in Commodore days, ARC or LHARC, or one of the archiving
programs had the ability to span 1541 disks. But I can't remember why that was
necessary. Under what circumstances was I moving files around that were 3x the
size of the computer's memory?

~~~
anyfoo
Good question. Even for programs and games that spanned multiple disks, you’d
explicitly arrange the data so that it makes sense (sometimes even duplicating
some of it), and not just uniformly spread it across all the disks.

I can’t remember a single instance where there was the concept of a file that
spanned multiple disks on the C64. In theory, the hard drive add-ons might
have led to the use of such files (they don’t have to fit into memory at
once, you can just seek within the file), but given their relatively low
popularity, that was probably very rare, if there was any support for such
files in software at all...

------
magnat
> The entry with the name consisting of exactly one dot is the pointer to the
> root directory

Is it? Wasn't the single-dot entry a pointer to the current directory since DOS 1.40?
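In the on-disk FAT format, every subdirectory (but not the root directory) starts with two entries. The "." entry's first-cluster field points at the directory itself, and the ".." entry's first-cluster field points at the parent, with 0 meaning the parent is the root. Below is a simplified Python sketch that builds those two 32-byte entries. It fills in only the name, the directory attribute, and the cluster number (high word at offset 20 for FAT32, low word at offset 26). Timestamps are left zeroed, and the helper names are hypothetical.

```python
import struct

ATTR_DIRECTORY = 0x10


def make_dir_entry(name11: bytes, first_cluster: int) -> bytes:
    """Build a simplified 32-byte FAT directory entry (timestamps left zero)."""
    if len(name11) != 11:
        raise ValueError("name must be the 11-byte space-padded form")
    entry = bytearray(32)
    entry[0:11] = name11
    entry[11] = ATTR_DIRECTORY
    struct.pack_into("<H", entry, 20, first_cluster >> 16)     # high word (FAT32 only)
    struct.pack_into("<H", entry, 26, first_cluster & 0xFFFF)  # low word
    return bytes(entry)


def first_cluster_of(entry: bytes) -> int:
    """Read the first-cluster number back out of a directory entry."""
    (high,) = struct.unpack_from("<H", entry, 20)
    (low,) = struct.unpack_from("<H", entry, 26)
    return (high << 16) | low


def dot_entries(own_cluster: int, parent_cluster: int) -> list:
    """The "." and ".." entries that open a subdirectory.

    Pass parent_cluster=0 when the parent is the root directory.
    """
    return [
        make_dir_entry(b".".ljust(11), own_cluster),      # points at itself
        make_dir_entry(b"..".ljust(11), parent_cluster),  # points at its parent
    ]
```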

------
amaccuish
If only exFAT weren't patent-encumbered :( exfat-fuse works OK, but I'd love
there to be a universal filesystem for external devices.

------
benj111
"If the first character has the code 05, then actually the first character has
the code E5 and it is not a special character. If the first character has the
code E5, then the file was deleted"

I'm struggling to interpret this. Why would the code be changed to E5?

I assume a file starting with 05 isn't deleted.
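In FAT, the first byte of a directory entry doubles as a status flag. 0xE5 marks a deleted entry, and 0x00 marks a free entry with nothing in use after it. A name that genuinely starts with byte 0xE5 (a valid Shift-JIS lead byte, for example) is therefore stored with 0x05 instead and translated back when read. An entry starting with 0x05 is indeed a live file. Below is a Python sketch of both directions; the function names are hypothetical.

```python
FREE_AND_LAST = 0x00  # this entry and every entry after it are unused
DELETED = 0xE5        # entry was deleted
E5_ESCAPE = 0x05      # stands in for a real leading 0xE5 byte


def entry_status(entry: bytes) -> str:
    """Classify a 32-byte directory entry by its first byte."""
    first = entry[0]
    if first == FREE_AND_LAST:
        return "end"
    if first == DELETED:
        return "deleted"
    return "in use"  # includes names stored with the 0x05 escape


def decode_name(entry: bytes) -> bytes:
    """Return the real 11-byte name, undoing the 0x05 escape."""
    name = bytearray(entry[:11])
    if name[0] == E5_ESCAPE:
        name[0] = 0xE5
    return bytes(name)


def encode_name(name11: bytes) -> bytes:
    """Prepare an 11-byte name for storage, escaping a leading 0xE5."""
    if name11[0] == 0xE5:
        return bytes([E5_ESCAPE]) + name11[1:]
    return name11
```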

~~~
ratboy666
Choice of E5 for a deleted entry:

#1 E5 is a "sync byte": it cannot be rotated and misinterpreted, so E5 E5 E5
... can be used to synchronize a bitstream. Note that floppy discs are read
bitwise.

#2 Empty 8" floppy discs came pre-formatted with E5 written everywhere.

#3 05 is a "control-e" in "extended 8 bit ASCII (IBM encoding)". E5 was a
usable character.

With CP/M, the disc bitmap was produced when the disc was "loaded". All
directory entries were scanned, and the allocation bitmap was produced. A
freshly formatted disc would have E5 fill, and thus would have no files. MS-
DOS had a separate allocation table, which was also the file linkage table.
So, the strategy of a fresh E5 filled disc being taken as empty no longer
works. But, the key of a deleted entry having E5 as the initial character was
still used.

Hope this helps.
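Below is a simplified Python sketch of the CP/M behaviour ratboy666 describes, where the allocation map is rebuilt by scanning the directory. It assumes the CP/M 2.2 entry layout with 8-bit block numbers. Byte 0 is the user number, or 0xE5 for an unused entry. Bytes 16-31 list the blocks the entry owns, with 0 meaning an unused slot. The sketch ignores extents and 16-bit block numbers. The `directory_blocks` parameter is a hypothetical stand-in for the blocks the disk parameters reserve for the directory.

```python
ENTRY_SIZE = 32
UNUSED = 0xE5


def build_allocation_map(directory: bytes, total_blocks: int,
                         directory_blocks: int) -> list:
    """Rebuild a CP/M-style block map by scanning every directory entry.

    Simplified sketch: assumes 8-bit block numbers in bytes 16-31 of each
    entry and ignores extents. `directory_blocks` (hypothetical) is how many
    leading blocks hold the directory itself; those are always in use.
    """
    used = [False] * total_blocks
    for block in range(directory_blocks):
        used[block] = True
    for offset in range(0, len(directory) - ENTRY_SIZE + 1, ENTRY_SIZE):
        entry = directory[offset:offset + ENTRY_SIZE]
        if entry[0] == UNUSED:
            continue  # deleted or never used: owns no blocks
        for block in entry[16:32]:
            if block != 0:  # 0 marks an empty slot in the block list
                used[block] = True
    return used
```

A freshly formatted, E5-filled directory therefore yields a map in which only the directory's own blocks are marked used.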

~~~
benj111
Thanks for that. I hadn't come across the idea of a sync byte. For anyone
else, the binary representation of E5 is 11100101. No matter where you start
reading you're always going to know whether you are reading from the start of
the byte or not. Contrast this with null (00000000), where it's impossible to
know where you've started from.
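The property benj111 describes can be checked mechanically. In an endless run of one repeated byte, starting to read k bits late gives you that byte rotated left by k bits. If none of the seven non-trivial rotations equals the byte itself, a correctly aligned read is the only one that produces it. Below is a small Python sketch; `rotations` and `is_self_synchronizing` are hypothetical names. This covers only the byte-rotation argument from the comment and ignores how the drive's FM/MFM encoding actually frames data.

```python
def rotations(byte: int) -> list:
    """All 8 left-rotations of an 8-bit value; index k = rotated by k bits."""
    return [((byte << k) | (byte >> (8 - k))) & 0xFF for k in range(8)]


def is_self_synchronizing(byte: int) -> bool:
    """True if, in an endless run of `byte`, only an aligned read yields `byte`.

    Reading the stream k bits late (0 < k < 8) produces rotations(byte)[k],
    so the byte marks its own boundary exactly when no non-trivial rotation
    equals it.
    """
    return byte not in rotations(byte)[1:]
```

0xE5 passes, and all eight of its rotations are distinct. 0x00 fails because every rotation is 0x00. A periodic pattern such as 0x55 (01010101) also fails, because it looks identical when shifted by two bits.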

This has a bit of info
ftp://ftp.apple.asimov.net/pub/apple_II/documentation/misc/disk_encoding.doc.txt

------
code_duck
“Note that filename cannot consist solely of spaces, but extension can.”

This is the sort of thing that makes me wonder what on earth went wrong, or
even could have been right, about Microsoft. Why would spaces be permissible in
a file extension at all, ever?
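One likely part of the answer is how the name is stored on disk. The 8.3 name is held as two fixed-width, space-padded fields with no dot, so every file without an extension already has an extension field of three spaces. Below is a minimal Python sketch of that packing. `to_fcb_name` is a hypothetical helper that only accepts plain ASCII names and skips DOS's other character restrictions.

```python
def to_fcb_name(filename: str) -> bytes:
    """Pack "name.ext" into the 11-byte, space-padded on-disk form.

    Hypothetical helper: only checks lengths and ASCII, not DOS's full
    list of forbidden characters.
    """
    base, _, ext = filename.upper().partition(".")
    if not (1 <= len(base) <= 8) or len(ext) > 3 or not filename.isascii():
        raise ValueError(f"not a valid 8.3 name: {filename!r}")
    # No dot is stored; a missing extension becomes three spaces.
    return base.ljust(8).encode("ascii") + ext.ljust(3).encode("ascii")
```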

~~~
hiccuphippo
Why even treat the extension as something separate from the filename?

~~~
ben509
Early DOS stuff was written in straight assembler, so to search for *.FOO you
can just:

1. Read a chunk of directory entries into a buffer.
2. Set ptr to the start of the buffer.
3. Check the extension at ptr[8].
4. If the extension matches, do a thing.
5. Set ptr to ptr + 32 and loop.
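Below is the loop ben509 outlines, as a runnable Python sketch. Two assumptions go beyond those steps: the sketch stops at an entry whose first byte is 0x00 (end of directory), and it skips entries marked deleted with 0xE5, following the FAT first-byte rules. This is an idealized version; as the reply notes, MS-DOS 2.0 itself did not search this way. `find_by_extension` is a hypothetical name.

```python
ENTRY_SIZE = 32


def find_by_extension(buffer: bytes, ext: bytes) -> list:
    """Return the 11-byte names of live entries whose extension is `ext`.

    Walks the buffer in 32-byte strides and compares the 3-byte extension
    field at offset 8 of each entry.
    """
    wanted = ext.upper().ljust(3)
    matches = []
    for ptr in range(0, len(buffer) - ENTRY_SIZE + 1, ENTRY_SIZE):
        if buffer[ptr] == 0x00:
            break  # assumption: 0x00 marks the end of the directory
        if buffer[ptr] == 0xE5:
            continue  # assumption: skip deleted entries
        if buffer[ptr + 8:ptr + 11] == wanted:
            matches.append(buffer[ptr:ptr + 11])
    return matches
```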

~~~
badsectoracula
Since the source code has been released [0], I decided to check it out, and
sadly it isn't that smart. Searching is implemented in DIR.ASM [1] by reading
each entry one by one, matching is done only on the filename (encoded as 11
bytes for FILENAMEEXT) and it only handles "?".
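Matching on the 11-byte FILENAMEEXT form with only "?" support amounts to a position-by-position comparison. Below is a Python sketch under that assumption. `fcb_match` is a hypothetical name, not the routine in DIR.ASM.

```python
def fcb_match(pattern: bytes, name: bytes) -> bool:
    """Compare an 11-byte pattern with an 11-byte FILENAMEEXT name.

    Each "?" in the pattern matches any byte at that position; every other
    byte must match exactly. There is no "*": it has to be expanded to "?"s
    before this point.
    """
    if len(pattern) != 11 or len(name) != 11:
        raise ValueError("both arguments must be 11 bytes")
    return all(p == ord("?") or p == n for p, n in zip(pattern, name))
```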

Note that this is for MS-DOS 2.0 which didn't handle "*.FOO" searches on the
kernel side (the star expansion was done on the COMMAND.COM side via the
MakeFcb [2] procedure). MS-DOS 3.0 introduced the star wildcard on the kernel
side, so it might have taken advantage of that... but there isn't source code
for MS-DOS 3.0.

[0] https://github.com/Microsoft/MS-DOS/tree/master/v2.0/source

[1] https://github.com/Microsoft/MS-DOS/blob/master/v2.0/source/DIR.ASM#L154

[2] https://github.com/Microsoft/MS-DOS/blob/master/v2.0/source/FCB.ASM#L30

~~~
anyfoo
Wouldn’t you have to look at how COMMAND.COM implemented * matching then,
assuming that the filename buffer has a similar fixed-length field
representation?

Also note that DOS 2.0 brought quite significant changes to the fs code, as it
was the first version to support directories. Before that, the namespace was
flat.

~~~
badsectoracula
All commands (implemented in the TCODE?.ASM files) call PARSE_FILE_DESCRIPTOR
(defined in MISC.ASM), which just calls MakeFcb (defined in FCB.ASM), which
does the star parsing. See lines 127-133: CX holds the remaining character
count, and if the current character is a star, the rest of the characters are
replaced with ?s. So a FOO<star>.E<star> (...HN eats the stars) would become
FOO?????.E??. The subroutine MUSTGETWORD "under" MakeFcb does the actual
replacing for each part, once for the filename and once for the extension; see
the calls to it and how CX is set up for the lengths.
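Below is a Python sketch of the star expansion as described. Each field (8 characters for the name, 3 for the extension) is copied up to a star, and the remaining positions in that field are filled with "?". Dropping whatever follows the star within the same field is an assumption. It is consistent with anyfoo's recollection in this thread that a star always wildcards the rest of the component, but it has not been checked against FCB.ASM. `expand_field` and `make_fcb_pattern` are hypothetical names.

```python
def expand_field(part: str, width: int) -> str:
    """Pad one field to `width`, turning a '*' into '?' for the rest of it.

    Assumption: characters after the '*' in the same field are dropped.
    """
    out = ""
    for ch in part.upper():
        if ch == "*":
            return out + "?" * (width - len(out))
        if len(out) == width:
            break  # field is full; ignore the excess
        out += ch
    return out.ljust(width)


def make_fcb_pattern(spec: str) -> str:
    """Turn e.g. 'FOO*.E*' into the 11-character pattern 'FOO?????E??'."""
    base, _, ext = spec.partition(".")
    return expand_field(base, 8) + expand_field(ext, 3)
```

Under this model, FOO*BAR.TXT expands to the same pattern as FOO*.TXT, which is why it cannot mean "starts with FOO and ends with BAR".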

MS-DOS 1.25 does pattern matching mostly the same way [0]. The code is a bit
simpler as it doesn't have the crazy macro stuff that MS-DOS 2.0 code has, but
if you look a bit around the files, you'll see that it is mostly the same code
just moved around and split into several files. The file scan again works in
almost the same way, reading each directory entry one by one and doing the
pattern matching after the fact.

[0] https://github.com/Microsoft/MS-DOS/blob/master/v1.25/source/MSDOS.ASM#L557

~~~
anyfoo
Thanks for the thorough analysis! Very insightful. So the 8+3 structure still
simplifies matching: To my recollection at least, in DOS a * would always
wildcard out the rest of the filename component (I did not look in the code),
so things like FOOxBAR.TXT weren't possible. That way though, the common x.TXT
still is.

Read "x" as *, because I could not for the life of me get the star to work as
a star within a word in this comment.

------
tambourine_man
I once heard that the design of FAT was so straightforward that one would
be hard pressed to enforce any patents on it. And that Gates designed it on an
airplane trip.

Does anyone know if any of this is true?

~~~
Someone
Wrote, not designed
([https://blogs.msdn.microsoft.com/oldnewthing/20131008-00/?p=...](https://blogs.msdn.microsoft.com/oldnewthing/20131008-00/?p=3003))

(But it isn’t true.
[https://en.wikipedia.org/wiki/File_Allocation_Table#Original...](https://en.wikipedia.org/wiki/File_Allocation_Table#Original_8-bit_FAT))

------
stevecat
I did not expect tripod.com to still exist.

~~~
marsrover
Seeing a post hosted on tripod.com brings back so many memories. It's my
favorite thing about this submission.

~~~
foobarian
Makes me remember creating files by hand using Norton Utilities' diskedit
directly on the raw bytes. Those were the days :)

------
dana321
"Welcome to hell."

All I needed to read :)

~~~
dana321
Well, 3 people dislike this. I find that funny.

