

Ask HN: Why is my directory with 300KB of files 60MB large? - jevinskie

Well, actually, I know how the directory (on my ext4 partition) got this way, but I'm interested in the technical details behind it. I was running many network simulations, each one generating a packet trace file. I was going to parse these offline, but the simulations ended up writing 163GB of small trace files before I ran out of disk space! [I have since changed my scripts to pipe the traces to the analyzer, eliminating the problem.]

After deleting the traces, which took about 20 minutes, `ls -la .` shows that the directory is 60MB in size while containing just 300KB of files. I suspect that some remnants of the links to the trace files remain in the directory inode. Why weren't they removed? Will they get "garbage collected" in the future?

Thanks for your insights,

Jevin
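A minimal sketch to reproduce the effect, assuming Python 3 and that `parent` points somewhere on an ext4 mount (the helper name `directory_size_after_churn` and the `trace_NNNNNN.pcap` file names are hypothetical). It reports the directory's own `st_size`, the number `ls -la` prints for `.`, before, at peak, and after deleting everything. On ext4 the "after" value is expected to stay at the peak; tmpfs and other filesystems may report directory sizes differently.

```python
import os
import tempfile
from typing import Optional, Tuple


def directory_size_after_churn(n_files: int, parent: Optional[str] = None) -> Tuple[int, int, int]:
    """Create n_files empty files in a fresh directory under `parent`,
    delete them all, and return the directory's own st_size (bytes)
    before creating, at peak, and after deleting."""
    with tempfile.TemporaryDirectory(dir=parent) as d:
        before = os.stat(d).st_size
        for i in range(n_files):
            # Each new name adds an entry to the directory's data blocks.
            with open(os.path.join(d, f"trace_{i:06d}.pcap"), "wb"):
                pass
        peak = os.stat(d).st_size
        for name in os.listdir(d):
            os.unlink(os.path.join(d, name))
        # The entries are gone; on ext4 the blocks that held them are kept.
        after = os.stat(d).st_size
    return before, peak, after


if __name__ == "__main__":
    # Hypothetical ext4 location; pass None to use the default temp dir.
    b, p, a = directory_size_after_churn(100_000, parent=None)
    print(f"before={b} peak={p} after={a}")
```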
======
jevinskie
After more research I have determined that this is specified behavior. ext4
doesn't shrink a directory online [1]; the slots freed by deleted entries are
simply reused by files created later. You can work around it in one of two
ways:

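1. e2fsck -D /your/partition

The -D option directs fsck to optimize directories. Basically, it rebuilds
each directory, compacting its entries. Run it on an unmounted filesystem, and
add -f if the filesystem is marked clean, since otherwise e2fsck may skip the
check entirely.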

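2. mkdir tmp && mv big_folder/* tmp/ && rmdir big_folder && mv tmp big_folder

This moves the files into a new, small directory, deletes the old, bloated
one, and then renames the new one back to the original name. Note that the
shell glob `*` skips dotfiles; if any exist, rmdir refuses to remove the
non-empty directory and the chain stops there.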
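The mkdir/mv/rmdir/mv trick can be sketched in Python as follows, assuming the directory and its parent live on the same filesystem so every `os.rename` is a cheap metadata operation. The helper name `compact_directory` and the `.compact-tmp` suffix are hypothetical. Unlike the shell glob, `os.listdir` also returns dotfiles, so hidden entries are moved too.

```python
import os
import stat


def compact_directory(path: str) -> None:
    """Rebuild `path` as a fresh directory so its size reflects only the
    entries it currently holds. Not atomic: other processes using `path`
    during the swap may briefly see it missing or half-moved."""
    path = os.path.abspath(path)  # also drops any trailing slash
    # Create the replacement beside the original (same filesystem).
    tmp = path + ".compact-tmp"  # hypothetical name; mkdir fails if it exists
    os.mkdir(tmp)
    # Keep the original permission bits (ownership is not copied).
    os.chmod(tmp, stat.S_IMODE(os.stat(path).st_mode))
    # os.listdir includes dotfiles, unlike the shell glob `*`.
    for name in os.listdir(path):
        os.rename(os.path.join(path, name), os.path.join(tmp, name))
    os.rmdir(path)  # raises OSError if anything was left behind
    os.rename(tmp, path)
```

Moving entries with `os.rename` rather than copying keeps the operation fast regardless of file sizes, because only directory entries change.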

[1] This thread discusses the problem and some potential online solutions
<http://lkml.org/lkml/2009/5/14/362>

------
jagtesh
I'm not an expert on this, but each non-empty file occupies at least one whole
block on the disk. So, depending on the block size and the number of files,
it's easy for 300KB of data to occupy 60MB of disk space.

e.g., 100,000 files of 1 byte each, each taking one 4KB block, would occupy
~390MB on disk (instead of the 100KB one might assume).
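The arithmetic above as a small sketch, assuming every file is rounded up to whole 4096-byte blocks and ignoring inodes, directory entries, and features such as ext4 inline data (the helper name `allocated_bytes` is hypothetical). Note that this models space used by the files' data, not the size of the directory itself that `ls -la .` reports.

```python
def allocated_bytes(n_files: int, file_size: int, block_size: int = 4096) -> int:
    """Disk space used when each of n_files (file_size bytes each) is
    rounded up to a whole number of blocks."""
    blocks_per_file = -(-file_size // block_size)  # integer ceiling division
    return n_files * blocks_per_file * block_size


if __name__ == "__main__":
    used = allocated_bytes(100_000, 1)
    print(used, "bytes =", f"{used / 2**20:.3f}", "MiB")  # 409600000 bytes = 390.625 MiB
```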

------
aristus
Two possibilities:

1) Hidden files: run `ls -la .` to show them.

2) Some process is still running, and it has an open handle on some deleted
file(s). Deleted file space is not reclaimed until all open handles are
released.
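A minimal sketch of the second point, assuming a POSIX system (the helper name `space_held_by_deleted_file` is hypothetical): after `unlink` the name is gone and the link count drops to 0, yet the data is still reachable, and still occupying blocks, through the open descriptor until it is closed. On a live Linux system, `lsof +L1` lists open files whose link count is below 1, i.e. deleted but still held open.

```python
import os
import tempfile
from typing import Tuple


def space_held_by_deleted_file(n_bytes: int) -> Tuple[bool, int, int]:
    """Write n_bytes to a temp file, unlink it while a handle is still open,
    and return (path_still_exists, link_count, size_seen_via_handle)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * n_bytes)
        os.unlink(path)  # removes the name, not the data
        st = os.fstat(fd)  # the inode stays alive through fd
        return os.path.exists(path), st.st_nlink, st.st_size
    finally:
        os.close(fd)  # last handle closed: now the blocks can be freed


if __name__ == "__main__":
    print(space_held_by_deleted_file(4096))  # (False, 0, 4096)
```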

~~~
jevinskie
Yes, I forgot to mention the -a flag. This is what it shows for the directory:
'drwxr-xr-x 2 jevin jevin 56M 2011-03-07 12:12 .'

All of the involved processes have long since terminated. I'm fairly sure the
files are gone for good; I actually have free space on my drive! :)

