If you open the properties of a folder, you can see how long it can take to calc...

userbinator · on June 8, 2019

you can see how long it can take to calculate the size of a folder

On older Windows (95, 98, XP) I've never had it take an unacceptably long time, and that was with a regular HDD. With an SSD and the large file caches which are possible with today's machines with lots of RAM, it should be even faster.

Besides, it's not as if the operation needs to be synchronous; the sizes can be calculated and displayed when they're ready. Having an option to show them would be useful, and those who don't need it/don't like the extra disk activity could leave it off.

Bjartr · on June 8, 2019

Older versions of Windows used the FAT filesystem rather than the current NTFS filesystem. FAT happened to get you directory size "for free" due to how it works under the hood, whereas NTFS does not (but it's faster at other things as a result). For much more detail check out this[1] post on The Old New Thing blog by Raymond Chen.

[1] http://blogs.msdn.com/b/oldnewthing/archive/2011/12/26/10251...

Sharlin · on June 8, 2019

It is absolutely hideously slow on NTFS to compute the size of a directory that (recursively or not) contains thousands of files. Indeed, it is absolutely hideously slow on NTFS to do just about anything that needs to touch thousands of files at a time.

adzm · on June 8, 2019

Note that if you parse the MFT itself you can calculate this info for the entire drive in the time explorer can calculate it for a single folder. I know there is some overhead with the official api of course and it needs to support other filesystems but we really needed a way to iterate directories faster just to gather this kind of info. There are several disk space utilization / treesize apps and file search apps that take advantage of the speed you get by bypassing the API.

Tempest1981 · on June 8, 2019

The Wiztree app is a great example of the benefit -- 3 seconds to view the entire drive. Although reading the MFT requires elevation.

https://antibody-software.com/web/software/software/wiztree-...

mqus · on June 8, 2019

Since Windows 98, we used Idoswin (https://www.idoswin.de/) which always calculated directory sizes in the background. It showed the sizes of directories it already calculated and made the field empty for every folder where the size was not ready yet. Usually the small directories are ready first and only the big (>1gb and many files) take a few seconds. This is done in the background and therefore doesn't disrupt your workflow at all.

Another method would be to make it midnight-commander style: add a button which will calculate & display the sizes of the directories.

So the time to calculate it is not really an issue. Use-cases are exactly cleaning up space and roughly calculating backup sizes/sizes of folders I want to copy at a single glance. You could also ask: why do we need to display the size of files in the explorer. The reasons are roughly the same.

NikkiA · on June 8, 2019

I use xyplorer, which has a toggle feature for always displaying the folder sizes, and a keyboard shortcut (shift-F5) to display them for the current folder until changed.

This combination seems to work just fine, I don't usually need folder sizes, so I keep the toggle off, and just hit shift-F5 if I really need to see them.

Causality1 · on June 8, 2019

I don't see why the size of the folder couldn't be written to the folder's metadata every time the folder contents are modified.

ComputerGuru · on June 8, 2019

Because it's recursive. It means a file nested a few hundred folders down needs to update each and every parent. And what about hard linked files? Directory junction points? Drives mounted as folders? It would be insane to think the performance would be acceptable.

hexomancer · on June 8, 2019

First of all, most files are nowhere near 10 levels deep, let alone hundreds of levels.

Second, we don't have to do a naive update every single time a file is changed. We can amortize the cost by updating the parent folder sizes only when the file size is changed by a significant amount since last update (say, 10%). And we can do this process recursively is the parent directories. This was it only takes O(1) time to update folder metadata for each file operation and all the metadata are accurate within 10%.

Also I don't see how drives and hard linked files are any different than regular files in this context.

kilburn · on June 8, 2019

> most files are nowhere near 10 levels deep, let alone hundreds of levels.

Filesystem (and low-level in general) stuff must consider worst cases. There is a lot of software out there doing weird things. For instance, npm created a very deep folder hierarchy for a long time (so deep it messed up some path length restrictions in fact).

One way or another the worst case is going to be hit. And then what? The entire computer grinds to a halt? How is the user supposed to discover why?

> We can amortize the cost by updating the parent folder sizes only when the file size is changed by a significant amount since last update (say, 10%).

So you have a log file inside a folder. Because it just keeps growing line by line then the folder's size is never updated. Now you have many Gb's of log file in that folder, and the folder says it is using "4Kbytes".

Moreover, this propagates upwards the filesystem. In the end, your root drive has a "folder size" of X but its actual usage is Y >> X. How is that not going to confuse everyone?

Put in another way, who is going to trust the X number ever? Why would you pay all that accounting penalty for every write to every file to end up with a half-asset broken-by-design feature?

> Also I don't see how drives and hard linked files are any different than regular files in this context.

They are different in that they exist in multiple folders at the same time (so they would trigger multiple size-updating branches).

hexomancer · on June 8, 2019

> So you have a log file inside a folder. Because it just keeps growing line by line then the folder's size is never updated. Now you have many Gb's of log file in that folder, and the folder says it is using "4Kbytes".

You misunderstood what I meant. I didn't mean we should only update if a single change is significant. I meant we only update when the cumulative changes since last update is significant. An example:

Let's say we create a 1mb file. The next time the file is changed, we only update the parent if the change is more than 10% (the new size is greater than 1.1mb or less than 0.9mb). For example if it is a log file and each line is 100 bytes, we update the parent after 1000 new lines (even though the 1000th line is still only 100 bytes). (of course the number of lines before the update depends on the size of the file, so if we had a 1GB log file, we would update after 1000000 new lines).

It is trivial to prove that the estimate is never off by a factor of more than 10% (even for the parent folders). So this is not a "half-assed broken-by-design feature" since it provides strong guarantees in bounds and at least in my personal day-to-day usage, I almost never care about the exact size of a folder but want to have a rough idea of how large it is.

kilburn · on June 9, 2019

I understand your design (and yes, it would work for the simple cases). Even then, it all still boils down to whether you want the overhead of extra computations for every write to get a [lower,upper] bound on the size of what every folder contains or not.

Then there are the complex situations (this is just a small sample I can come up with right on the spot):

What happens when a file is hard-linked under the same ancestor folder? Should its size be counted once or twice?

How do you even know the parents of a file at write time? Current (unix) filesystems only store folder -> [inodes], where an inode in that list may be referenced by other folders. There is just no inode -> folder(s) where it is stored mapping that I know of.

And then there are bind-mounts (similar to "folder hard-links" but not quite), special files/devices, etc.

All in all, it is a huge mess for a questionable benefit. What actual use cases are just not possible without this feature?

gregmac · on June 8, 2019

You still have to read the stored size on each parent to figure this out, at which point the optimization makes sense only if writes are significantly (at least 2x) more expensive than reads, and this is not true for most desktop PCs.

This really boils down to a caching problem, and well, there's a reason it's one of the two hard computer science problems..

hexomancer · on June 9, 2019

Nope. We only have to read the parent of the last changed folder. Plus, any filesystem worth its salts already caches the frequently used parts.

ComputerGuru · on June 9, 2019

If 10 files change by n bytes each, none of which reach your threshold for updating the parent individually, where are you storing the amount each file was changed since the last parent folder update until you deign it appropriate to update the parent? Your design makes no sense.

hexomancer · on June 9, 2019

> where are you storing the amount each file was changed since the last parent folder update

The same place we store the rest of the file attributes?

I think my design does make a lot of sense but you are actively trying to not understand it.

ComputerGuru · on June 9, 2019

Then each time one file changes you need to read all other files in the same folder to determine if the net change satisfies the increment condition for the parent folder!

ken · on June 8, 2019

It certainly can. Apple does this in APFS. They call it "fast directory sizing": https://arstechnica.com/gadgets/2017/09/macos-10-13-high-sie...

kilburn · on June 8, 2019

Apple said they would be doing this. For some reason it doesn't seem they actually did it [1].

[1] https://eclecticlight.co/2019/02/06/how-big-is-that-folder-w...

jmgao · on June 8, 2019

Because if you do that, you have to walk up the entire directory tree on every file operation.

kalleboo · on June 8, 2019

The Mac has been able to do this since what, System 7.5 in 1995? Yeah it can take a while to calculate but on a modern SSD it's not a problem. If you don't like the performance hit it's an option that defaults to off.