
A FAT32 fragmenter - jaimebuelta
https://gist.github.com/ryancdotorg/9f557d3513710ce91aed
======
rsi_oww
I did something like this when I was a kid, in QuickBasic. Not as
sophisticated - it would just randomly create, append to, and delete files.

I tested it on floppy disks - if you let it run for a while, the remaining
free space was so fragmented it was almost unusable. You could hear the poor
floppy drive seeking like crazy just to open a tiny text file.

The fun part was defragmenting it afterwards - I miss the days of the
graphical defrag that Norton and MS had.
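The approach described above (randomly create, append to, and delete files) fits in a few lines of Python. This is a hypothetical sketch, not the childhood QuickBasic program: `random_op`, the file names, and the size ranges are made up for illustration, and `target_dir` should be a mount point you don't mind abusing.

```python
import os
import random

def random_op(target_dir):
    """Do one random create/append/delete in target_dir."""
    files = os.listdir(target_dir)
    op = random.choice(["create", "append", "delete"])
    if op == "create" or not files:
        # create a new file with a random amount of random data
        name = os.path.join(target_dir, "frag%05d" % random.randrange(10**5))
        with open(name, "wb") as f:
            f.write(os.urandom(random.randrange(512, 8192)))
    elif op == "append":
        # grow an existing file, likely into non-adjacent free space
        with open(os.path.join(target_dir, random.choice(files)), "ab") as f:
            f.write(os.urandom(random.randrange(512, 4096)))
    else:
        # punch a hole in the allocation by deleting a file
        os.remove(os.path.join(target_dir, random.choice(files)))

# run something like: for _ in range(100000): random_op("/mnt/floppy")
```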

~~~
upperclasstwit
That was such a pure pleasure. I can't exactly picture it but I remember it
was sort of like watching your dwarf fortress.

~~~
nitrogen
My favorite ones would show you a grid representing all the clusters on the
disk. You would see an open block for free space and a solid block for
allocated space.

Then it would highlight a span of blocks to indicate reading.

Finally it would write those clusters elsewhere, indicating so with a
different color.

 _Edit:_ Uhh.. and HN deleted my diagrams. See
[http://pastebin.com/raw.php?i=nKHddiKe](http://pastebin.com/raw.php?i=nKHddiKe)
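That style of display is easy to sketch: one character per cluster, rendered row by row. A toy example (the `cluster_map` helper and its symbols are my own invention, not any real defragger's code):

```python
def cluster_map(allocated, width=64):
    """Render an allocation bitmap the way old defraggers did:
    '.' for a free cluster, '#' for an allocated one."""
    rows = []
    for i in range(0, len(allocated), width):
        rows.append("".join("#" if a else "." for a in allocated[i:i + width]))
    return "\n".join(rows)

print(cluster_map([True, False, True, True] * 32))
```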

~~~
ars
For your enjoyment:
[https://www.youtube.com/watch?v=kPv1gQ5Rs8A](https://www.youtube.com/watch?v=kPv1gQ5Rs8A)

~~~
nitrogen
Neat. I had one for DOS too that would go left-to-right instead of top-to-
bottom, but I don't remember if it was a Norton tool or something else. It was
mesmerizing to me as a kid to watch data moving around on the hard drive and
the computer magically getting faster.

------
Sami_Lehtinen
I've usually done intentional fragmentation by filling the disk with small files
and then deleting them while growing a new file to claim the freed blocks. That
works with basically every file system. Here are some examples where I've used
the method.

People often claim that fragmentation doesn't affect SSD drives, but that's
not true: [http://www.sami-lehtinen.net/blog/ssd-file-fragmentation-myth-debunked](http://www.sami-lehtinen.net/blog/ssd-file-fragmentation-myth-debunked)

This is slightly related. How contiguously growing files are allocated on
different file systems: [http://www.sami-lehtinen.net/blog/test-btrfs-ext4-ntfs-simple-file-allocation-tests](http://www.sami-lehtinen.net/blog/test-btrfs-ext4-ntfs-simple-file-allocation-tests)

~~~
userbinator
On the other hand, your worst fragmented case still gets ~200MB/s with a
_random_ read, which is around the _best_ case with a _sequential_ read for
most HDDs today. A 4k random read on a HDD will be 1-2MB/s at most.

Relatively speaking the SSD slows down to ~50% of its max speed with
fragmentation, but the HDD will be down to around 1%. So fragmentation affects
SSDs somewhat, but HDDs are affected much more severely.

(Which SSD was it, and how was the test setup? That's important for comparison
purposes, as the layout of the filesystem blocks and how they correspond to
the NAND eraseblocks/pages has a huge effect on what fragmentation will do.)

------
ambrop7
Related, I'm doing a FAT32 driver for embedded systems (fully asynchronous!).
It's part of my APrinter firmware project:
[https://github.com/ambrop72/aprinter/tree/master/aprinter/fs](https://github.com/ambrop72/aprinter/tree/master/aprinter/fs)

Currently it has good read support and limited write support (can re-write
existing files but not append or create new files).

Before writing that code I made some prototype read-only code in Python, to
make sure I understand the FS structure properly:
[https://github.com/ambrop72/aprinter/blob/master/prototyping...](https://github.com/ambrop72/aprinter/blob/master/prototyping/fat.py)

~~~
userbinator
Embedded as in "32-bit ARM running Linux", or "8-bit microcontroller"? Either
way, as someone who has written a FAT driver for embedded systems in the latter
category, that looks like it's far more code than it needs to be. Full
read/write functionality can be done in approximately 800 (Z80) machine
instructions. FAT is a linked list, and if you look at it that way, it doesn't
take much code to manipulate one.

I've written a bit more about that before:

[https://news.ycombinator.com/item?id=7492318](https://news.ycombinator.com/item?id=7492318)

Append = find an empty cluster and link it into the chain for the file, then
write into it. Create is similar except you start with an empty chain and also
add an entry to the directory (which is like writing/appending to a file,
because directories are files.)

When allocating next clusters you can reduce fragmentation significantly over
the dumb "first fit" strategy if you scan the FAT to find contiguous free
clusters and apply a next/best/worst fit. Even better if you add an API to
allow tuning the amount of "gap" after a file when creating it, based on
knowledge of how much it may expand in the future.
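Viewed as a linked list, the append operation really is small. A toy in-memory sketch, assuming the FAT is already loaded as a Python list indexed by cluster number (real code would read and write the on-disk FAT; the next-fit scan is one of the strategies mentioned above):

```python
FREE = 0
EOC = 0x0FFFFFFF  # FAT32 end-of-chain marker

def append_cluster(fat, last_cluster):
    """Link a newly allocated cluster onto the end of a file's chain.
    fat is a list indexed by cluster number; returns the new cluster."""
    n = len(fat)
    # next-fit scan: start looking just past the file's current tail
    for off in range(1, n):
        cand = (last_cluster + off) % n
        if cand < 2:  # clusters 0 and 1 are reserved in FAT
            continue
        if fat[cand] == FREE:
            fat[last_cluster] = cand  # old tail now points at the new cluster
            fat[cand] = EOC           # new cluster becomes the tail
            return cand
    raise OSError("no free clusters")
```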

I believe that with good tuning and allocation heuristics, FAT32 can
outperform the far more complex filesystems (ext*, NTFS, etc.) widely believed
to be superior. One idea I've never gotten around to testing out is to modify
the Linux FAT driver and do some benchmarking.

~~~
JohnBooty

       I believe that with good tuning and allocation 
       heuristics, FAT32 can outperform the far more complex
       filesystems (ext*, NTFS, etc.) widely believed to be
       superior. One idea I've never gotten around to testing
       out is to modify the Linux FAT driver and do some
       benchmarking.
    

I don't understand. Has anybody ever claimed that the more complex
alternatives were actually faster?

NTFS and other filesystems that followed FAT32 are "superior" because they
support things like journaling and more robust permissions... things that
unavoidably incur (a least) a small performance hit.

------
f_
This is a pretty neat concept; I've come across this some years back in
bisqwit's video
[https://www.youtube.com/watch?v=lxZyxxHOw3Y](https://www.youtube.com/watch?v=lxZyxxHOw3Y)
\- he calls it "enfragmentation".

~~~
ryan-c
Neat, I wasn't aware of bisqwit's hack. Looks like he supported FAT12 and
FAT16 but not FAT32, whereas I only supported FAT32.

His has better status information.

------
marvy
Request for benchmarks: I'd like to see what effect this has on, say, WinXP
boot time, or maybe some database benchmark. In fact, there's no need to get
so fancy. Let's see how much this slows down computing a file's checksum, or
maybe copying a file.

~~~
ryan-c
(I wrote this horrible thing)

It's pretty horrendously bad on spinning rust - you get very, very close to
100% fragmentation, so the disk has to seek for each cluster. A lower-bound
average seek time is probably something like 4ms; add a similar amount of
rotational latency and you're at roughly 8ms per cluster, so your max transfer
rate is going to be limited to about 125 * cluster size. Depending on the disk
size[0] the cluster size may be up to 32KB, so we're looking at maybe 4MB/sec best
case. I'd guesstimate it'd increase boot time by somewhere between 10x and
40x.

0\. [https://support.microsoft.com/en-us/kb/140365](https://support.microsoft.com/en-us/kb/140365)

~~~
ryan-c
In theory, you could make the disk performance even worse by alternating
between the beginning and end of the disk, though you may need to do a little
tweaking to ensure this doesn't inadvertently allow read-ahead to help.
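A sketch of that alternating allocation order, assuming you have a list of free cluster numbers to hand out (`worst_case_order` is a made-up helper, not from the gist): pair the lowest and highest remaining clusters so consecutive clusters of a file force near-full-stroke seeks.

```python
def worst_case_order(clusters):
    """Interleave clusters from the two ends of the disk so that reading
    the file in order forces a near-full-stroke seek every cluster."""
    clusters = sorted(clusters)
    lo, hi = 0, len(clusters) - 1
    order = []
    while lo <= hi:
        order.append(clusters[lo])
        if lo != hi:
            order.append(clusters[hi])
        lo += 1
        hi -= 1
    return order

# worst_case_order(range(6)) interleaves as [0, 5, 1, 4, 2, 3]
```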

~~~
digi_owl
I seem to recall that this was very bad juju for old HDDs, in that it would
quickly wear out the mechanics of the RW arm.

~~~
ryan-c
There are stories of _really_ old hard drives being made to move around a room
with abusive disk access patterns.

[http://www.catb.org/jargon/html/W/walking-drives.html](http://www.catb.org/jargon/html/W/walking-drives.html)

~~~
digi_owl
"Some bands of old-time hackers figured out how to induce disk-accessing
patterns that would do this to particular drive models and held disk-drive
races."

Now that I would have loved to witness.

------
Shivetya
Having been out of the FAT32 disk environment for a while: is fragmentation
determined by non-contiguous sectors on the drive? Do the methods reading data
from the drives consider the number of heads/platters involved for performance
gains? A file wouldn't necessarily need to be contiguous on a single platter if
some logic mapped it so that the following sectors would pass under a drive
head at the same time.

Granted that would not work out for SSD, but I know next to nothing on their
addressing

~~~
AndrewStephens
>Do the methods reading data from the drives consider the number of
heads/platters involved for performance gains?

In general no, at least not since the early 80s. Although you still hear about
cylinders, heads, and sectors, spinning hard drives have effectively been a
black box to OSes for 30 years and will internally remap sectors wherever they
feel like it (and lie about it if you ask them).

The result is that while you can be reasonably confident that sector 33566 is
followed by sector 33567 and reading both will not involve seeking, things
like reading whole cylinders at a time are not worth the effort since you
don't know where the sectors are.

------
iso8859-1
Could actually be useful for testing, maybe?

~~~
orionblastar
I can see it used to test defragmentation programs. Fragment the disk into
random pieces and then run your defrag program on it and see how long it takes
to organize it.

------
userbinator
Would be interesting to see a version for ext* too.

------
mikeash
This could make for an excellent subtle LART, for when the more traditional
ones can't be used.

------
0x4a42
A fragmenter driven by the game of life with graphics feedback would be pretty
cool. :D

------
raverbashing
So what happens exactly if you run this on Ext3/4, HFS, NTFS or other modern
fss?

~~~
delan
Probably fail spectacularly, as it seems to make modifications to the FAT32
data structures directly. A more generic fragmenter might interact with a file
system via pathological file access patterns, but this doesn’t take that
approach.

------
Aldo_MX
Don't try this on pendrives or sdcards.

------
DiabloD3
It is against HN rules to merely comment about how awesome something is and
not contribute to the conversation in any way.

Screw it, I can't hear your rules over how awesome this is.

~~~
dang
> It is against HN rules to merely comment about how awesome something is

Why do you say that? Of course it isn't.

~~~
DiabloD3
What, really? My bad.

~~~
dang
_Empty comments can be ok if they're positive. [...] What we especially
discourage are comments that are empty and negative—comments that are mere
name-calling._

[https://news.ycombinator.com/newswelcome.html](https://news.ycombinator.com/newswelcome.html)

