Hacker News
Visualizing Ext4 (buredoranna.github.io)
351 points by giis 9 months ago | 27 comments



I did a true graphical visualisation of ext4 at FOSDEM a few years ago. The video is here, the visualisation starts at about 20 minutes:

https://archive.fosdem.org/2019/schedule/event/nbdkit/

Edit: If you're confused about the bit where I talk about the filesystem trims in "blue", that's because the projector at FOSDEM apparently could not render the light blue colour I was using. I didn't know about this while giving the talk; it looked fine on the laptop screen. There's an accompanying video on my blog which is rendered correctly: https://rwmj.wordpress.com/2018/11/04/nbd-graphical-viewer/


I think that in pursuing the goal to "simplify using a computer", many people end up removing things that could easily be educational without actively trying to teach you anything: things that spark curiosity and inform a bit (like the great example the author shows here). One example that used to exist on actual computers is the trusty old red hard drive light telling you that the hard disk is active. If you were like me, you knew the game was actually going to load this time when the light showed a particular pattern and you heard the hard drive make a satisfying sequence of fast disk reads. A nice compromise seems to be to hide the "advanced view" but keep it there for the curious people who will likely be the next generation of computer nerds making the world go 'round.


When you were a young little nerd, old-timers were saying, "Such a shame modern computers don't have LEDs showing the individual state of each bit of the control registers like our mainframe did. Computers these days are so dumbed down. Users can't even see where the instruction pointer is, which is so useful to get a feel for what the hardware is actually doing."


Sure, but I think it's fair to say that feedback has gotten too minimal when you have to put sleep(10) all over your diagnostic processes because there is no other way to tell if the computer/app is laggy or dead.


There's a command-line utility called pixd [1] that generates similar data visualizations on the command line. That said, it only shows static representations of binary data and is not nearly as cool as buredoranna's animated gifs showing filesystem changes over time.

It can be helpful to plot these sorts of pixel arrangements on a Hilbert curve, rather than plotting pixels line by line. I learned this trick from a Ghidra plugin called cantordust [2]. 3blue1brown offers some mathematical intuition for the effectiveness of a Hilbert curve pixel arrangement [3].

[1] https://github.com/FireyFly/pixd

[2] https://inside.battelle.org/blog-details/battelle-publishes-...

[3] https://www.youtube.com/watch?v=3s7h2MHQtxc&t=311s


I found this nbdkit demo for visualizing filesystem IO interesting - https://rwmj.wordpress.com/2018/11/04/nbd-graphical-viewer/


The author of which is in this thread as well


This inspired me to do this experiment:

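  # 768 KiB image = 512 * 512 * 3 bytes, i.e. one RGB pixel per 3 bytes
  dd if=/dev/zero bs=1K count=$(( 256 * 3 )) of=a.ext4
  mkfs.ext4 a.ext4
  mkdir a
  sudo mount a.ext4 a
  cd a
  sudo chown 1000:1000 .
  # three files, each filled with a single solid colour
  python3 -c 'open("a", "wb").write(b"\xff\x00\x00" * 2000)'
  python3 -c 'open("b", "wb").write(b"\xff\xff\x00" * 2000)'
  python3 -c 'open("c", "wb").write(b"\xff\x00\xff" * 2000)'
  cd ..
  sudo umount a
  # prepend a 15-byte PPM header (printf, because echo -n does not portably expand \n)
  (printf 'P6\n512 512\n255\n' ; cat a.ext4 ) > a.ppm
  convert a.ppm a.png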

The resulting a.png is reversible: you can convert it back to a .ppm file, skip the first 15 bytes, and you should get a valid ext4 image back.
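The arithmetic behind this can be checked with a small Python sketch. The helper names (`ppm_header`, `wrap_as_ppm`, `unwrap_ppm`) are hypothetical, and it only covers the PPM step, not the PNG conversion; it assumes the .ppm you get back carries the same header that was written.

```python
def ppm_header(width: int, height: int) -> bytes:
    """Binary PPM (P6) header for an 8-bit RGB image."""
    return f"P6\n{width} {height}\n255\n".encode("ascii")


def wrap_as_ppm(raw: bytes, width: int, height: int) -> bytes:
    """Prefix raw bytes with a PPM header, treating every 3 bytes as one RGB pixel."""
    if len(raw) != width * height * 3:
        raise ValueError("raw data must be exactly width * height * 3 bytes")
    return ppm_header(width, height) + raw


def unwrap_ppm(ppm: bytes, width: int, height: int) -> bytes:
    """Strip the header again, assuming it is byte-for-byte the one wrap_as_ppm wrote."""
    header = ppm_header(width, height)
    if not ppm.startswith(header):
        raise ValueError("unexpected PPM header")
    return ppm[len(header):]


if __name__ == "__main__":
    image_size = 256 * 3 * 1024          # the dd invocation: 768 blocks of 1 KiB
    print(image_size == 512 * 512 * 3)   # the image fills a 512x512 RGB picture
    print(len(ppm_header(512, 512)))     # number of bytes to skip on the way back
```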


If Twitter didn't do compression, it would be fun to store large files as images, with Twitter as a filesystem.


Well, you could maybe use some QR codes, which are less sensitive to compression XD. OK, OK, it was a joke, just do some steganography.


It's possible to abuse Unicode codepoints to store bytes: https://github.com/qntm/base131072


Very cool. This kind of data visualisation can really help in understanding some of the intricacies of how the disk format actually puts things on disk, e.g. the metadata carefully preallocated for at least some usage. I was interested to see what would happen when it ran out of space, but unfortunately the animation stopped before that point was reached.


The first thing that popped into my mind was the old "defragment disk" visualisation. It looked a lot like the later XP version[0].

Sitting in front of the computer watching the old 95/98 defrag program doing its thing[1] is a nice childhood memory for me.

[0]: https://qvdesign.files.wordpress.com/2012/03/defrag-original...

[1]: https://academy.avast.com/hs-fs/hubfs/New_Avast_Academy/how_...


The crazy thing about the 9x defrag was that it required that no other program access the disk while it was working. When anything else wrote to disk, it would say "disk contents changed, starting over..."

You were expected to close down all programs and little utilities that hid in the systray (sorry, notification area). The OS itself was just the OS, there was no indexing happening, no update check, no random nonsense nobody understands. You could be absolutely sure nothing would access the disk on a cleanly booted win9x.


As 95 went on, making sure hidden applications didn't exist got harder and harder too. Over time, more applications seemed to use the equivalent of a TSR that ran in the background with no icon in the systray/start bar and would regularly fiddle with the disk, restarting the entire process.

I remember some commonly used winmodem driver would cause this behavior.


At that point I'd rather it ran like the old DOS versions or later Filesystem checks at startup (very useful for boot partitions).


I have a similar memory. There's something entrancing about watching a computer do a computationally (in space or time) difficult task with a visualisation of what exactly is happening. For defrag it was mostly a progress bar, but it was still fascinating to young me.


I kind of think of computer microarchitecture as a factory, as in Factorio (which I admit I have not played, because I've been warned against it).

The instruction stream flows into the L3, L2, and L1 caches from DRAM, PCIe, and DMA.

Then there are multiple cores, each with its own register file, shuffling numbers between registers and the caches.

There's reordering going on, there's parallelisation going on, lots of conveyor belts.

It's all one very complicated factory.


Absolutely, it's just that the results of the factory are not physical. You can take this analogy quite far inside software development: engineers are factories producing data for a compute factory to process further, etc. Also, I'll echo the warnings about Factorio: you will lose days. But so very worth it.


This reminded me of innodb_ruby [0]. Super useful set of tools to visualize and learn about the InnoDB structure. Example usage [1].

[0]: https://github.com/jeremycole/innodb_ruby

[1]: https://blog.jcole.us/2014/10/02/visualizing-the-impact-of-o...


If the author is looking at these comments: you could save some bytes transferred and give the user video controls (pause, scrub, adjust speed etc) by converting the gif to a video with something like

  ffmpeg -i ext4.gif -pix_fmt yuv420p -c:v libx264 ext4.mp4
and serving it with

  <video controls>
    <source src="ext4.mp4" type="video/mp4">
  </video>


You can use the Kaitai IDE to visualize various binary formats, down to each byte (or bit). If I remember correctly it has definition files for ext4.
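To give a feel for what such a byte-level definition describes, here is a hand-rolled Python sketch (not Kaitai, and not its definition file) that reads a few fields from the primary ext4 superblock. It assumes the usual ext2/3/4 layout: the superblock starts 1024 bytes into the image, fields are little-endian, and the magic number 0xEF53 sits at offset 0x38. The function name `read_superblock_summary` is made up for illustration.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # the primary superblock starts 1024 bytes into the image
SUPERBLOCK_SIZE = 1024
EXT_MAGIC = 0xEF53         # s_magic, shared by ext2, ext3, and ext4


def read_superblock_summary(image: bytes) -> dict[str, int]:
    """Decode a handful of little-endian fields from the primary superblock."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + SUPERBLOCK_SIZE]
    if len(sb) < SUPERBLOCK_SIZE:
        raise ValueError("image too small to contain a superblock")
    # s_inodes_count at 0x00, s_blocks_count_lo at 0x04
    inodes_count, blocks_count_lo = struct.unpack_from("<II", sb, 0x00)
    # s_log_block_size at 0x18: block size is 2 ** (10 + value)
    (log_block_size,) = struct.unpack_from("<I", sb, 0x18)
    # s_magic at 0x38
    (magic,) = struct.unpack_from("<H", sb, 0x38)
    if magic != EXT_MAGIC:
        raise ValueError(f"bad magic 0x{magic:04x}, not an ext2/3/4 image")
    return {
        "inodes_count": inodes_count,
        "blocks_count_lo": blocks_count_lo,
        "block_size": 1024 << log_block_size,
    }


if __name__ == "__main__":
    # Hypothetical usage against an image file such as the a.ext4 from upthread.
    with open("a.ext4", "rb") as f:
        print(read_superblock_summary(f.read()))
```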


Looking at these diagrams, I wonder if there are any file systems that allow metadata to be stored on a separate device, for example data on an HDD and metadata on an associated SSD. I guess the benefits would not be large enough to outweigh the added complexity, since metadata is much easier to cache in memory.


ZFS does. I've heard of other file systems that can put their journal on a separate device but web search sucks these days and I ran out of time to figure out which.



Facebook abuses XFS realtime mode to do this. Omar discusses it some here: https://lwn.net/Articles/943693/


BcacheFS does this



