
Windows file system compression had to be dumbed down - ingve
https://blogs.msdn.microsoft.com/oldnewthing/20161101-00/?p=94615
======
Dylan16807
> We live in a post-file-system-compression world.

You'd think so, but allow me to point at my steamapps folder, which saves 20%
disk space even with the not-very-good compression that NTFS offers. If it
could use a better algorithm and a 2MB block size instead of 64KB, it could
nearly double that.

Hard drives have always been growing in size. We have always been in a 'post-
file-system-compression' world. But people want to do things like fit on an
SSD, so compression continues to be useful.

I just wish it didn't ultra-fragment files _on purpose_.

~~~
LeifCarrotson
> my steamapps folder saving 20% disk space even with the not-very-good
> compression that NTFS offers

Wait, you run the Steam apps from a compressed folder? Doesn't that kill
performance?

Regardless, I have long felt that the right way for Steam to make this work
would be to have, in addition to the "Download" and "Delete" tools, an
"Archive" button that moves the game to a specified path (by default on the
install drive, but configurable to another drive or a NAS) and compresses it
with whatever compression they want.

I want my Steam apps uncompressed on my SSD, but I don't have room there for
hundreds of gigabytes of Skyrim textures that I haven't played in 6 months.
And if I delete the app, then I have to wait hours (and cause Steam some
expense) to download it again.

Not everyone has multiple drives or a NAS, but a local archive would
definitely be useful.

~~~
outworlder
> Wait, you run the Steam apps from a compressed folder? Doesn't that kill
> performance?

No. In fact, it may improve performance, by virtue of transferring less data
from slower I/O devices. Yes, slower, even if it's an SSD.

~~~
TillE
> Yes, slower

I'm skeptical of this actually being the case in any real-world scenarios.
There have been a number of tests of running games from a RAM disk vs an SSD,
with precious little difference in load times.

~~~
mitchty
It generally depends on the data, but algorithms like LZ4 can decompress
faster than most storage media can keep up, including NVMe drives. Compare
1.8 GiB/s reading raw data versus an effective 3 GiB/s or more reading
compressed data; this is on a Skylake i7 with two NVMe x4 drives. It costs
more CPU, but honestly the CPU would be stalled waiting on I/O otherwise.

The key is that the data sent to the CPU and decompressed there makes up for
the stall from hitting memory or I/O. Comparing RAM vs. SSD is the wrong
comparison to make; with both you're hitting stalls due to memory. You want to
compare reads of uncompressed versus compressed data, along the lines of (and
I'm just making numbers up with this analogy as I'm about to sleep): 900 KiB
of compressed data in, 2 MiB of data out, a 1.1 MiB bonus. Yes, I'm assuming
huge compression, but for the times your CPU is idle it makes perfect sense.

And yes, LZ4 compression on things like movies still helps. I shaved over
200 GiB off my home NAS with ZFS.
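
To make that concrete, here is a rough Python sketch of the "decompression
outpaces the drive" argument. It assumes the third-party python-lz4 package,
and both the synthetic data and the 1.8 GiB/s drive figure are illustrative
stand-ins, not measurements:

    import time
    import lz4.frame  # pip install lz4

    # Roughly 250 MiB of fairly compressible (repetitive) data.
    raw = (b"some fairly repetitive game asset data... " * 1024) * 6144
    packed = lz4.frame.compress(raw)

    start = time.perf_counter()
    lz4.frame.decompress(packed)
    elapsed = time.perf_counter() - start

    decomp_gib_s = len(raw) / elapsed / 2**30  # logical output rate of the CPU
    drive_gib_s = 1.8                          # assumed raw read speed of the drive
    print(f"compressed to {len(packed) / len(raw):.0%} of original size")
    print(f"decompression delivered {decomp_gib_s:.1f} GiB/s of output")
    print(f"reading the compressed stream from a {drive_gib_s} GiB/s drive "
          f"could deliver ~{drive_gib_s * len(raw) / len(packed):.1f} GiB/s")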

------
jonstewart
Interestingly, NTFS in Windows 10 introduces a new codec for compressed files.
See [http://www.swiftforensics.com/2016/10/wofcompressed-streams-in-windows-10.html](http://www.swiftforensics.com/2016/10/wofcompressed-streams-in-windows-10.html)

------
pavlov
_For the algorithm that was ultimately chosen, the smallest unit of encoding
in the compressed stream was the nibble_

Feels to me like nobody talks of nibbles anymore, maybe because we have ample
memory and storage and can usually afford to waste some bits for the
convenience of byte alignment. (It's half a byte, or 4 bits.)

Disk compression is interesting because Microsoft originally included it in
MS-DOS, but lost a lawsuit brought by the company behind a popular utility
called Stacker:
[http://articles.latimes.com/1994-02-24/business/fi-26671_1_software-patent](http://articles.latimes.com/1994-02-24/business/fi-26671_1_software-patent)

That was the first time Microsoft got in hot water for bundling features into
DOS/Windows (the web browser would be the straw that broke the camel's back).

~~~
IntelMiner
"Software patents remain controversial; critics have contended that the U.S.
Patent Office does not understand the industry and issues patents that are too
broad. Patent attorneys said Wednesday's decision will encourage small
software firms to use patents as leverage against the industry's big players"

Oh if only they knew what would happen

~~~
pjc50
_" Patent attorneys said Wednesday's decision will encourage small software
firms to use patents as leverage against the industry's big players"_

The patent industry always wheels out the small inventor as PR when defending
the system. As soon as no one's looking, they use the same system to keep
small inventors out.

------
kstrauser
While I get the point he makes, and I'm certain that the team aren't idiots, I
vehemently disagree. The Amiga had a popular third-party library called XPK
that let you install system-wide codecs, and then any XPK-aware app could use
any of those codecs to compress and decompress data on the fly. There were
also filesystem patches so that the OS itself could detect a file's
compression algorithm and decompress it transparently for apps that didn't
know about XPK.

In short, in the early 90s Amiga had configurable per-file compression
algorithms. There were CPU-optimized versions of almost all of those codecs,
so someone using an ancient 68000 could interact with files compressed by a
PPC. I could pull a drive out of my fast system and give it to a buddy with an
old, slow CPU, and he could either 1) live with the reading speed penalty, 2)
decompress each file one time and then use the unpacked versions, or 3)
recompress each file with an algorithm more friendly to his system.

I don't think the Windows OS team is dumb by any stretch. I do think they
might have been hampered by NIH syndrome, and weren't aware of (and likely
couldn't care less about) how these problems were solved on other OSes.

~~~
Grishnakh
> I don't think the Windows OS team is dumb by any stretch. I do think they
> might have been hampered by NIH syndrome

I think NIH qualifies as a form of stupidity.

------
pyreal
Interesting timing. I just used Windows' built-in compression to free up 25 GB
on a customer's full C: drive, the first time I've used the feature in years. I
noted that the compression ratio didn't seem to be very good.

~~~
rexicus
It's never been worth using due to the slowdown and fragmentation it causes.

~~~
tonyarkles
An interesting angle that I read a paper on in grad school a few years ago but
don't have handy: because of the asymmetric growth of CPU speed vs. hard drive
speed, you can actually get performance _gains_ by enabling filesystem
compression. It seems counter-intuitive, but it boils down to "can I compress
this data faster than the disk can write it?"

If you're blasting highly compressible data to disk, compressing it on the fly
can, in some circumstances, have a net bandwidth greater than just writing the
data to disk (and greater than the disk alone is capable of). Yes, you incur
more CPU load, but it's a net win. It's not universally true, YMMV etc.
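
A back-of-the-envelope model of that trade-off, with made-up throughput
numbers (the function and the figures are illustrative assumptions, not taken
from the paper):

    def effective_write_mb_s(disk_mb_s, compress_mb_s, ratio):
        """Logical MB/s of application data landed on disk with inline compression.

        disk_mb_s     -- raw sequential write speed of the device
        compress_mb_s -- how fast the CPU can compress input data
        ratio         -- compressed size / original size (0.5 means halved)
        """
        # The pipeline is limited either by the CPU (it can only consume
        # compress_mb_s of input) or by the disk, which only has to absorb
        # ratio * input bytes per second.
        return min(compress_mb_s, disk_mb_s / ratio)

    # Spinning rust at 100 MB/s, an LZ4-class compressor at 400 MB/s, 2:1 data:
    print(effective_write_mb_s(100, 400, 0.5))   # 200.0, double the bare disk
    # A fast NVMe drive at 2100 MB/s with the same compressor:
    print(effective_write_mb_s(2100, 400, 0.5))  # 400.0, now CPU-bound and a loss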

~~~
LeifCarrotson
One counter to this argument: while the CPU has thousands of cycles to do
compression as an old 5400 RPM spinning-rust drive seeks or writes, a modern
SSD like the new Samsung 960 Pro can write data at 2100 MB/s, giving the CPU
perhaps one cycle per byte, which makes compression difficult.

The counter to my counter, of course, is that specialized silicon for
compression can easily keep up with even these speeds. In fact, Sandforce SSD
controllers build in compression to boost read and write speeds!

~~~
tonyarkles
Thanks for the reminder that while grad school doesn't feel that long ago,
spinning rust was definitely the name of the game at the time. SSDs were a
thing, but way too expensive and small for non-exotic purposes... :D

------
lucb1e
This article makes a lot of assumptions and has some weird thoughts.

> Well, okay, you can compress differently depending on the system, but every
> system has to be able to decompress every compression algorithm.

But that's not how this works. You don't design an entirely new algorithm for
every performance target; you create one or two algorithms and tweak them,
then write one or two decompressors that can handle any compression setting.
A simple example is LZ77 (used in DEFLATE/gzip), whose decompressor is
extremely fast regardless of your compression settings.
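
As a minimal illustration of "one decompressor, many compression settings",
here is the same idea in Python using DEFLATE via the standard-library zlib
module (the payload is just a placeholder):

    import zlib

    data = b"the same payload compressed at different effort levels " * 1000

    for level in (1, 6, 9):  # fast ... default ... maximum effort
        packed = zlib.compress(data, level)
        # The decompressor never needs to know which level produced the stream.
        assert zlib.decompress(packed) == data
        print(f"level {level}: {len(packed)} bytes")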

> Now, Windows dropped support for the Alpha AXP quite a long time ago

Why does this sound like "... and they changed nothing"?

> you can buy 5TB hard drive from [brand] for just $120.

Sure, but we also have more data to store. If you wanted 100 games in the 80s
you'd need what, 1MB of storage? I'm just guessing, I'm too young to know.
Now that'd be what, 1TB? Storage might be cheaper, but I'm just saying that
storage prices going down doesn't change the fact that compression is a good
idea.

> many (most?) popular file formats are already compressed

Before, file formats were all about data accessibility and recovery on
different or damaged systems, which compression doesn't help; now being
compressed is suddenly a good thing? Additionally, binaries and libraries are
not compressed; many of my documents are just text files; and databases still
benefit from this a lot. (But who has a database on their computer? Anyone
who uses a web browser and an email program.)

> We live in a post-file-system-compression world.

I still think it's a fine idea.

> Tags: History

Lol

------
olavgg
I wonder how effective LZ4 would be as the default compression algorithm for
NTFS. It is extremely effective with ZFS.
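
(For reference, enabling it on ZFS is a one-liner; "tank/games" below is just
a placeholder dataset name:)

    zfs set compression=lz4 tank/games
    zfs get compressratio tank/games   # reports the achieved ratio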

~~~
zdw
Port it to Alpha and let's see. I'll loan you my 21164 over the weekend.

~~~
dfox
The article mentions that the Alpha was weak on bit-twiddling, which is
something that almost certainly got fixed by the introduction of the BWX
instructions on the 21164.

~~~
honkhonkpants
Was AXP even the weakest of the four Windows architectures for this task? PPC
and MIPS can leave bit twiddlers scratching their heads, too.

~~~
dfox
The problem with the original Alpha is that accessing memory as anything other
than aligned 32-bit or 64-bit words (yes, including bytes) counts as
bit-twiddling, because the only load/store instructions it has operate on
those two word sizes.

I can't remember any other non-niche architecture with an 8*2^n word size that
shares this (mis-)feature. (I suspect that the Cray-1 and its derivatives also
share it, but that probably counts as a niche architecture.)

------
yyhhsj0521
Why can't we just use different compression levels? It wouldn't be hard to
build multiple compression algorithms into Windows, so that fast machines use
a high compression level or CPU-demanding algorithms and slow machines do the
opposite. Slow machines could still decompress files from faster machines
efficiently, because during decompression HDD I/O is the bottleneck [1].

[1] [http://superuser.com/questions/135594/what-is-more-important-when-extracting-rars-cpu-or-hdd](http://superuser.com/questions/135594/what-is-more-important-when-extracting-rars-cpu-or-hdd)

~~~
wmf
When this stuff was being designed in the 1990s the hard disk wasn't the
bottleneck.

------
rbanffy
I think there would be a strong case for the decompression logic being able to
read whatever you throw at it, while the compression logic decides how hard to
work to meet its performance requirements, based on what the present machine
can or should do given its speed and/or current load.

Keeping the decompressor robust would easily solve the drive portability
concerns, making the data readable (even if not at maximum speed) across
machines and architectures.

------
my123
Deduplication works well these days.

------
revelation
But of course all of that old code is still there and maintained. We live in a
post-file-system-compression world with ubiquitous file-system-compression.

------
rasz_pl
> We live in a post-file-system-compression world.

says a guy working for a company that ships a system with >6GB (often 10GB in
100K individual small files!) of redundant, NEVER EVER touched data inside
WinSxS. Data that can't be moved off the main drive without serious hacks
(hardlinking). This fits the general pattern of indifference to user hardware.
Another one (my favorite) is the non-movable hiberfil.sys that MUST be on the
primary drive; you can't even move it with hardlinks. There goes 16GB of your
SSD for a file that gets used ONCE per day.

This is what happens when you hire programmers straight out of college and put
them on top-of-the-line workstations.

~~~
leeter
tbf... almost all of WinSxS is hardlinks; there is very little actual
duplication. It's just that Explorer doesn't understand that and reports it as
duplication.

~~~
rasz_pl
easily disproved:

C:\WINDOWS\system32>dism /Online /Cleanup-Image /AnalyzeComponentStore

Windows Explorer Reported Size of Component Store : 8.18 GB

Actual Size of Component Store : 7.78 GB

~~~
acqq
The next line of the output from the command you specified but which you
didn't quote and that gives almost 5 GB on the machine in the example it is
named "Shared with Windows"

[https://technet.microsoft.com/en-us/library/dn251566.aspx](https://technet.microsoft.com/en-us/library/dn251566.aspx)

"This value provides the size of files that are hard linked so that they
appear both in the component store and in other locations (for the normal
operation of Windows). This is included in the actual size, but shouldn’t be
considered part of the component store overhead."

~~~
rasz_pl
1. You specifically said "WinSxS is hardlinks"; this is not true.

2. That still leaves up to 5GB of redundant garbage in WinSxS: things like
multiple versions of random DLLs nobody ever uses, 360MB for ~11 versions of a
'getting started' package consisting of the same movie files, color
calibration data for obscure scanners on a bare minimal install, etc.

Turning on drive compression omits this directory while happily compressing
files in /system.

~~~
acqq
> you specifically said

No, you were talking to somebody else before.

> still leaves up to 5GB of redundant garbage

No, if you read that example, less than 1 GB is stale (the difference between
the first and the third number):

>> Windows Explorer Reported Size of Component Store : 4.98 GB

>> Actual Size of Component Store : 4.88 GB

>> Shared with Windows : 4.38 GB

As I've said, you intentionally didn't quote your machine's third line but I'm
sure it's not a 5 GB difference.

Moreover, that difference can be purged if necessary, which you can do on your
machine with a single command, per the link on the same page:

[https://technet.microsoft.com/en-us/library/dn251565.aspx](https://technet.microsoft.com/en-us/library/dn251565.aspx)

There is a trade-off, which is probably why nobody does it unless really
necessary:

"All existing service packs and updates cannot be uninstalled after this
command is completed"
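
(For the record, the single command that page describes is, if memory serves,
the component-store cleanup with the /ResetBase switch:)

    Dism.exe /Online /Cleanup-Image /StartComponentCleanup /ResetBase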

------
hobarrera
> However, we also live in a world where you can buy 5TB hard drive from
> Newegg for just $120

It's a lot more expensive to install that disk into an ultrabook/MBA.

------
jxy
In other words, if you have control over both software and hardware, you can
deliver a much better user experience.

------
CharlesMerriam2
So many comments are being harsh to Microsoft for bad engineering while
ignoring the real world.

It's Microsoft. Would you be critical of a pigeon for defecating in flight?

