
Understanding Disk Usage in Linux - octosphere
https://ownyourbits.com/2018/05/02/understanding-disk-usage-in-linux/
======
dexen
Website seems to be hugged to death, here's an archived copy:
[https://web.archive.org/web/20190429214402/https://ownyourbi...](https://web.archive.org/web/20190429214402/https://ownyourbits.com/2018/05/02/understanding-disk-usage-in-linux/)

The article is well worth reading, as it's reasonably comprehensive and
includes a small foray into COW filesystems.

~~~
minitoar
Is that a reference to Hug Bot?
[https://pbfcomics.com/comics/hug-bot/](https://pbfcomics.com/comics/hug-bot/)

~~~
lbotos
Not sure, but I always thought "hug of death" came from reddit:

[https://en.wikipedia.org/wiki/Slashdot_effect](https://en.wikipedia.org/wiki/Slashdot_effect)

~~~
hwj
Previously, that was called "slashdotting".

(Knowing such things makes me feel kind of old).

------
bscphil
One feature not mentioned in this article is that BTRFS (and some other file
systems) supports transparently compressing files. It's another (very
important) reason why the actual number of bits the file contains might not be
how much of your disk space the file is actually using.
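To make the difference between the two numbers concrete, here is a minimal Python sketch that reads both from stat. The function name disk_usage is made up for illustration, and it assumes a Linux-style st_blocks that counts 512-byte units. On a compressing or sparse-aware filesystem the second number can be much smaller than the first.

```python
import os

def disk_usage(path):
    """Return (apparent_size, allocated_bytes) for a file.

    apparent_size is what `ls -l` shows; allocated_bytes is roughly what
    `du` shows. Assumes st_blocks is in 512-byte units, as on Linux.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```

Calling it on a sparse file (for example one created with truncate) or on a file stored on a compressed Btrfs mount is a quick way to see the two values diverge.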

------
the8472
To show extents - including sharing - of a file you can use the tool _filefrag
-v_ , it's more general than _btrfs filesystem du_ since it uses the fiemap
and fibmap ioctls that are supported by multiple, but not all, filesystems.

------
nwlieb
Related: is it possible to reliably maintain physical disk space quotas in
Linux (similar to cgroups)?

Furthermore, is it possible to say how much "space" you would use if you were
to create a file with a given size, accounting for block-size, fragmentation,
and metadata? Matters such as block-size, inode usage, and metadata seem to
make this very difficult even if you add special integration to the userspace
application, for example by using stat or statfs. This could help prevent
quota overruns for example.

These seem like hard problems unfortunately, and I suspect the best solution
is to just create separate disk partitions for each quota group.
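For the data part alone, a rough lower-bound estimate is possible by rounding the size up to whole blocks. The sketch below is only that: the helper names are hypothetical, and it deliberately ignores metadata, fragmentation, compression, inline data, and preallocation, which is exactly why it cannot be a reliable quota check.

```python
import os

def estimate_data_bytes(size, block_size):
    """Round a file size up to a whole number of blocks.

    This covers data blocks only; metadata, fragmentation, and
    filesystem-specific features such as compression are NOT modeled.
    """
    if size <= 0:
        return 0
    blocks = -(-size // block_size)  # ceiling division
    return blocks * block_size

def filesystem_block_size(path="."):
    """Block size reported by statvfs for the filesystem holding path."""
    return os.statvfs(path).f_frsize
```

For example, estimate_data_bytes(4097, 4096) gives 8192: one byte past a block boundary costs a whole extra block.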

~~~
chungy
First question: Quotas have been supported on Linux for a very long time. All
major (and native) file systems support them.

Secondly: disk usage accounting for metadata as well as regular file data may
or may not be tricky. ZFS always tells you how much data+metadata is used by a
file, helped by metadata itself being dynamically allocated on ZFS like
everything else is. File systems like ext4 that have fixed metadata locations
on disk don't report back metadata allocation with the file; it wouldn't
really be useful to see this information since removing the file doesn't free
any metadata in the ext4 case.

~~~
ifcho
Project quotas appear to be more similar to cgroups. They are available in xfs
and ext4 [https://lwn.net/Articles/623835/](https://lwn.net/Articles/623835/)

~~~
cat199
The traditional 4.2BSD-style quotas (1983!) on Linux also support quotas on
unix groups. Not sure if you had this in mind, but anyway.

I suppose project quotas as outlined here would allow multi-group support
though.

Another option could be sparsely-provisioned COW LVM volumes.

------
zepearl
Questions about "cp --reflink" (I have never used that option so far, but it
sounds useful).

Quoting "man":

 _" When --reflink[=always] is specified, perform a lightweight copy, where
the data blocks are copied only when modified. If this is not possible the
copy fails"_

Q1: this is copy-on-write, right?

Q2: once/if the command completes successfully, are there any (potential)
dangers (e.g. if I then immediately delete the original file) or can the new
file be treated 100% as if I had copied it the classical way (without that
option)?

Thx

~~~
simcop2387
1) Yes, it's copy on write, and it requires support from the underlying
filesystem (Btrfs, maybe XFS; I'm not sure which others support it)

2) It's intended to be treated 100% as if it had been copied the classical
way. It's not a hardlink or a symlink and can be treated as a completely new
file
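To illustrate both answers, here is a hedged Python sketch of a reflink copy with a fallback. It uses the Linux FICLONE ioctl, which as far as I know is what cp --reflink relies on; the constant is hard-coded on the assumption of x86 or ARM Linux, and the function name reflink_or_copy is made up for this example.

```python
import fcntl
import shutil

# _IOW(0x94, 9, int) as encoded on x86 and ARM Linux. This value is an
# assumption of the sketch; some architectures encode ioctls differently.
FICLONE = 0x40049409

def reflink_or_copy(src, dst):
    """Clone src to dst, sharing blocks if the filesystem allows it.

    Returns "reflink" when the blocks were shared, or "copy" when the
    ioctl failed and the data was copied the classical way instead.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return "reflink"
        except OSError:
            # Not supported here (e.g. ext4, tmpfs): fall back to a full copy.
            shutil.copyfileobj(fsrc, fdst)
            return "copy"
```

Either way the result behaves like an independent file: writing to the copy or deleting the original does not affect the other one.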

~~~
cmurf
XFS has supported reflinks for a while now, although it requires a mkfs-time
option to enable it. That option was just enabled by default in xfsprogs-5.1.0,
which is now in Fedora 31 (still prerelease).

~~~
zepearl
Cool => I'll then probably reformat the partition on my root server that hosts
my VM images from EXT4 to XFS => to make full backups from time to time I
would then just have to shut the VMs down for a minute, do a "cp --reflink" of
their image files on the host/dom0, start them up again, and do the slow
download of the "copied" files at any later time.

Any recommendations for a utility (working on a layer higher than the
filesystem) that can synchronize only the chunks of a specific file that have
changed since the last sync, and that produces a "normal" file as its final
result (no behind-the-curtains database - just copy the identical data from the
local/old file and the changes/new data from the remote location to get a
normal up-to-date local file)?

~~~
caf
I believe rsync is the utility you are looking for.

~~~
zepearl
Damn, now I'm ashamed of myself - I've been using rsync forever... :)

I think that when I tested this with rsync a long time back I made a mistake
similar to this one
[https://stackoverflow.com/questions/28819379/block-level-cop...](https://stackoverflow.com/questions/28819379/block-level-copying-and-rsync)
(doing a sync not over a network) and/or this one
[https://www.reddit.com/r/linux/comments/3nhx0p/rsync_block_l...](https://www.reddit.com/r/linux/comments/3nhx0p/rsync_block_level_incremental_backup/)
(doing a sync by adding/removing some bytes instead of just changing their
values, as would happen in the preallocated img-file of the VMs).

I'll test it again - thanks a lot for the info!!
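For the preallocated-image case, where bytes change in place and nothing shifts, the idea of "only rewrite the chunks that changed" can be sketched in a few lines of Python. This is not rsync's algorithm (rsync uses rolling checksums so it also copes with inserted or removed bytes); it is a simplified fixed-offset comparison, and the name sync_changed_blocks is made up for the example. Both files are local here, whereas a real tool would exchange per-block checksums over the network instead of reading both sides.

```python
import os

def sync_changed_blocks(src, dst, block_size=1 << 20):
    """Make dst identical to src, rewriting only blocks that differ.

    Compares the files block by block at fixed offsets and returns the
    number of blocks that had to be rewritten in dst.
    """
    if not os.path.exists(dst):
        open(dst, "wb").close()
    rewritten = 0
    offset = 0
    with open(src, "rb") as fsrc, open(dst, "r+b") as fdst:
        while True:
            new = fsrc.read(block_size)
            if not new:
                break
            fdst.seek(offset)
            old = fdst.read(len(new))
            if old != new:
                fdst.seek(offset)
                fdst.write(new)
                rewritten += 1
            offset += len(new)
        # Drop any tail left over if dst used to be longer than src.
        fdst.truncate(offset)
    return rewritten
```

With an insert or delete near the start of the file every later block shifts and compares as different, which is the failure mode described in the second link above.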

------
ChrisArchitect
add (2018)

