
Pushing concerns up to the OS level is the one thing we could be doing in so many places, but haven't really tried in decades. Should we use a universal, format/protocol-agnostic way of having data attached to files? Naaaahhhh. /s



I'm generally a fan of taking advantage of the filesystem, especially when your application is just... storing and viewing files. It irrationally upsets me when an application grafts its own "Library" on top of my perfectly working filesystem, requiring me to import my files into an artificial thing that is just like a filesystem.

On the other hand, extended attributes and other filesystem-specific features could be problematic if you want to share files with other operating systems. If I copy a file to a FAT32 formatted SDCard, I need to worry about what might not copy over.
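To make the "what might not copy over" worry concrete, here is a minimal sketch of a probe that checks whether a directory's filesystem keeps a user extended attribute at all. It assumes Python on Linux, since `os.setxattr` and `os.getxattr` are only available there; the helper name `xattr_roundtrip` and the attribute name `user.comment` are made up for illustration.

```python
import os
import tempfile


def xattr_roundtrip(directory):
    """Return True if a user xattr can be written and read back in `directory`.

    Returns False when the platform has no xattr calls (os.setxattr is
    Linux-only in Python) or when the filesystem rejects the attribute.
    """
    if not hasattr(os, "setxattr"):
        return False
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        try:
            # "user.comment" is a hypothetical attribute name for this demo.
            os.setxattr(f.name, "user.comment", b"hello")
            return os.getxattr(f.name, "user.comment") == b"hello"
        except OSError:
            # Raised when the filesystem does not support user attributes.
            return False
```

Running it against the mount point of the SD card before copying would at least tell you whether the metadata has a chance of surviving there.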


the problem, as you said, is that the common denominator between all filesystems currently in use is to practically forget that metadata exists, because you're only one filesystem transfer away from it disappearing (copy from one hdd to another, send to the cloud, transfer with a usb, etc ...)

the other problem is that the filesystem/desktop environment closely entangles 2 concepts that imho should've been more orthogonal:

- data storage + indexing for apps (a common unified KV store would've been a more general abstraction that other kinds of abstractions could be built upon; it would be nice if you could define your own indexing instead of what's done with folders)
- data/information access for users in the "desktop environment"


Isn't Android an example of disentangling those concepts? Apps have their own folders, isolated from each other, to store data in, but then they present a completely different view of that data to the user. The user is discouraged and often prevented from working with files and the filesystem; instead, they're made to work with opaque apps.

I really, really hate this design.


yes you're right. but that's one instance of doing so, not the only way. I dislike it too since they put the emphasis solely on the app and not on the data.

what I had in mind was just idle thoughts about providing disentangled primitives that could be used to build other things on.

for example the primary key for accessing a file/object could be (computer_id?, storage_id or partition_id, object_id/inode) + ways to define different kinds of indexes based on your use cases.

instead of just making apps into silos you can have the things built on top of these primitives be typed structured data objects + API/interface. have an Object explorer, and programs can declare that they are able to display or manipulate custom data Type X. you can then have GUIs be composable the same way the pipe operator works in the cli.

you can define a regular filesystem on top of these primitives, a relational database, a tag system, or something new altogether. if you don't want folders you wouldn't have to deal with them.
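a rough sketch of what I mean, in Python, purely as an illustration: one flat store keyed by the tuple above, with a folder-style index and a tag index layered on top as two independent views. All names here (`ObjectKey`, `ObjectStore`, `by_path`, `by_tag`) are hypothetical and not taken from any existing system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectKey:
    """Primary key: (computer_id, storage_id, object_id)."""
    computer_id: str
    storage_id: str
    object_id: int


class ObjectStore:
    def __init__(self):
        self.objects = {}   # ObjectKey -> bytes (the only real storage)
        self.by_path = {}   # folder-style index: path -> ObjectKey
        self.by_tag = {}    # tag index: tag -> set of ObjectKey

    def put(self, key, data, path=None, tags=()):
        """Store an object and register it in whichever indexes apply."""
        self.objects[key] = data
        if path is not None:
            self.by_path[path] = key
        for tag in tags:
            self.by_tag.setdefault(tag, set()).add(key)

    def get_by_path(self, path):
        """Look an object up through the folder-style view."""
        return self.objects[self.by_path[path]]

    def find_by_tag(self, tag):
        """Return keys carrying `tag`, ordered by object_id."""
        return sorted(self.by_tag.get(tag, set()), key=lambda k: k.object_id)
```

an object stored without a path simply never shows up in the folder view, which is the "you wouldn't have to deal with folders" part.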

the work on Fuchsia OS seems to explore something along these lines (BlobFs + MinFs + Components). (https://fuchsia.dev/fuchsia-src/concepts/components/v2/intro...) Pharo/Smalltalk seems to explore ideas akin to this as well. (https://pharo.org/)

to be fair the current state of affairs is similar enough with file extensions + mime info, if you squint hard enough and pretend that app and system folders/files don't exist, but it's held together with pinky promises.


>common unified KV store

So, something like the Windows Registry?


Well, I was going to say "limited key size" but it looks like in current windows versions the size is "available memory" so... yeah.


The windows registry is more a unified configuration file for the whole system, I think what GP talked about is more about a general store for data


My gut reaction to this was "isn't that just sqlite"?

I don't think this is what you were thinking of, but I do kind of love the idea of formalizing sqlite file formats where the "metadata" is standardized and the "file" is stored inside. Like a file format for a recipe, or a picture, or ...
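A minimal sketch of what such a file could look like, using Python's built-in `sqlite3` module. The two-table layout (`meta` for standardized key/value metadata, `content` for the single payload) and the function names are my own assumptions for illustration, not an existing standard.

```python
import sqlite3


def create_document(path, kind, metadata, payload):
    """Create a SQLite file holding metadata rows plus one payload blob.

    `metadata` must not contain the key 'kind'; that row is written separately.
    """
    con = sqlite3.connect(path)
    with con:  # commits on success, rolls back if anything raises
        con.execute(
            "CREATE TABLE meta (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        con.execute(
            "CREATE TABLE content ("
            "id INTEGER PRIMARY KEY CHECK (id = 1), data BLOB NOT NULL)"
        )
        con.execute("INSERT INTO meta VALUES ('kind', ?)", (kind,))
        con.executemany("INSERT INTO meta VALUES (?, ?)", metadata.items())
        con.execute("INSERT INTO content VALUES (1, ?)", (payload,))
    return con


def read_metadata(con):
    """Return all metadata as a dict, without touching the payload."""
    return dict(con.execute("SELECT key, value FROM meta"))


def read_payload(con):
    """Return the stored file contents."""
    return con.execute("SELECT data FROM content WHERE id = 1").fetchone()[0]
```

Any tool that speaks SQLite could then inspect the metadata of a recipe or a picture the same way, which is the commonality I was after.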


Isn’t that just a container format, like what video and audio files have used for decades?

I don’t know of any existing container formats with support for a relational DB as one of the embedded streams, but the whole point of container formats is that you can add arbitrary metadata, which of course can be a whole database.

Of course, the way BeOS does what OP is talking about is by having many DB columns within the filesystem itself! (The filesystem is a queryable database).


Yes, I totally get the distinction (and I was among those amazed by BeOS back in the day - I still show the old demo videos to friends who haven't seen it). I hadn't considered the container formats used by media, but in my head it would be the other way around - each file would be a sqlite file first so that they all share some commonality around access and inspection (I'm assuming in my ignorance that the media container formats are different).

Are there any database filesystems today? I haven't really looked, but the last one I heard of was the one that MS abandoned years ago. Actually I suppose Haiku probably still has one? I can't imagine how difficult it would be to get a DB Filesystem as a mainstream choice on Linux, let alone across OSen.


If you want something more tangible than old demos, try HaikuOS. It works wonderfully from a usb drive.

I’m too young to have known BeOS (well, I was a kid in the nineties, so not too young, but afaik BeOS was pretty rare overall and at home). However, I’m old enough to have known OSes that were built around offline usage, and what I loved about trying Haiku is that it reminds me of when your OS was made to use your computer, not to be an internet client.

I feel that having your emails as files is a good example of that: you connect to the internet to get your mails. You disconnect. You want to work on those mails on another computer? No problem, just copy-paste them onto a USB d… I mean floppy disk, answer your mails, put the answers on your floppy disk and send them tonight when you’re back home.

It may feel pretty cumbersome when we have today’s tools but that’s the feeling I feel I lost : owning my data not only legally but physically. And not only physically but physically in a useful way.

It reminds me of the time when you just had to understand simple abstractions like files, folders and windows to own the computer (and you were just one programming language away from mastering it).


Every filesystem is by definition a database system.

Out of extant systems, the closest to BeFS outside of Haiku is NTFS as implemented in Windows. In fact, you have been able to run pretty much all of the BeOS behaviors on NT since ~1994 or so; it's an issue of programs not using them. Part of that is allegiance of user applications to Classic Windows-compatible APIs.[1] Part of the "WinFS" efforts was to break with the old approaches totally and push more indexed/searchable APIs etc. but in the end all we have is a pretty robust internal search engine that is sadly underused (just like the extended attributes support). It really doesn't help that Explorer.exe is in many ways ridiculously outdated, with Windows95/98 peeking out from various corners when you look deeper into how it acts.[2]

Then there's ZFS, but the ZPL/DMU APIs do not include an indexing layer IIRC (also, on systems that use Irix-style xattr APIs you lose the full scope of resource forks).[3]

Both OS/2 (with HPFS) and OSX do some work on integrating metadata into the filesystem, with varying levels of usage and end-user accessibility.

And of course there's some level of integration in AmigaOS Workbench and .info files, but that's arguably the most niche by now and never evolved to this level of use.

[1] Know the regular posts about how you can't create a file named "CON:" or "COM1:" etc in Windows? In Windows NT you actually can, but a) the only way to do it "safely" is to use an alternate NTFS namespace b) I bet most people have never heard there was more than one namespace c) Win32 applications will usually only see the Windows95 LFN-compatible one (in two versions, UCS and ASCII) unless they go out of their way to get access to other namespaces

[2] It's not the most egregious though - at least explorer.exe internally uses paths that work with default APIs of the system. In 2021 I ended up having to dig out an AppleScript for converting MacOS Classic paths to POSIX ones, because it turns out the Finder AppleEvents API returned only Classic paths. Or at least neither I nor anyone I could find knew how to get Finder to return a path that wasn't a Classic HFS one

[3] Irix-style xattr API is limited in capabilities to only add short K/V data to a file. Solaris instead effectively gives you a complete directory attached to a file, while WindowsNT on NTFS treats everything including main content of file as "extended attribute" and opening file as normal is essentially "open the $DATA attribute of the file".


I prefer the chaos of devs randomly storing data in appdata, programdata, the program files dir, the x86 program files dir with mostly but not entirely duplicate data, c:/games/game/game and ../, c:/game/game, ~/games, ~/game/game &./game, ~/documents/game, ~/documents/games/game, ~/game saves/games, etc...


Special place in hell for games storing heavy or frequently modified files in user's Documents dir - nowadays, Documents is often a synced folder backed by OneDrive. The amount of wasted processing, bandwidth and IO wear generated by this is tremendous.


The one that I've always found odd is everything deciding to dump itself in the user profile directory (this is even something that stuff like VS Code does).

XDG_CONFIG_HOME (|| ~/.config) and friends has been a standard for a long time now on *nix (including macOS) and AppData (née Application Data) has been the standard on Windows for over 20 years at this point.
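For what it's worth, honoring this takes only a few lines. A minimal sketch in Python, assuming the XDG rule that an unset, empty, or relative `XDG_CONFIG_HOME` falls back to `~/.config`; the function name `config_home`, its parameters, and the `AppData/Roaming` fallback on Windows are my own choices for illustration.

```python
import os
from pathlib import Path


def config_home(env=None, home=None, os_name=os.name):
    """Return the per-user configuration directory for this platform."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if os_name == "nt":
        # On Windows, per-user settings belong under %APPDATA%.
        appdata = env.get("APPDATA")
        return Path(appdata) if appdata else home / "AppData" / "Roaming"
    xdg = env.get("XDG_CONFIG_HOME", "")
    # The XDG spec says relative paths are invalid and should be ignored.
    if xdg and os.path.isabs(xdg):
        return Path(xdg)
    return home / ".config"
```

An application would then keep its settings in something like `config_home() / "myapp"` instead of a new dot-directory directly in the profile.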


Other than saves, what files do games heavily modify? And if you're complaining about cloud syncing of (auto)saves, I personally think it's a good thing.


One example I experienced recently: Sims 4 uses a subfolder in Documents as a cache for downloaded data and decoded chunks. It creates and deletes files there constantly while the game is running; we're talking dozens of files per minute or more. Few minutes of play, and there's nothing but hundreds of new additions and deletions in the "recent history". Not to mention, all that auto-syncs with any other machine you have online and using the same Microsoft account.

Wrt. Saves, auto-uploading those can be good, but it's unnecessary for games I rent on Steam, which already handles cloud saves on its own.


Files as an interchange format, sure. But as a primary storage system for application structured data they leave a lot to be desired:

- Portability of metadata is lacking (and can be sneakily removed when you least expect it), as other commenters have pointed out.

- Filtering sets of files from out of a (possibly deep) directory hierarchy based on different criteria requires writing a lot of subtly different loops to check metadata. Querying e.g. SQLite handles that part for you once you express what you want, without as much risk of messing up one of those loops.

- Similarly, a schemaful database can prevent your writing incorrectly-shaped (meta)data up front, where filesystems are flexible enough that bad writes may not be noticed until your program tries to read that data back out.

- The accessibility of file-based internal storage systems to human users can sometimes be too high, a la the joke about someone "organizing and renaming things in the win32 folder". Cracking open and messing about with a flat-file all-in-one DB is a higher barrier to screwing around. To be fair, permissions mitigate this risk substantially.

- Intermediate failures with some single-flat-file DBs are much less impactful than with many filesystems. Two parts to this: one is that a more rigid structure in a DB prevents certain invalid writes entirely; the other is transactionality. While plenty of local-flat-file "myapp.library" DBs don't have a good atomicity story underneath (I'm always saddened when I poke at a proprietary data library format and find that it contains a bug-ridden, informally-specified implementation of half of SQLite), and while some file systems make logical atomicity possible to achieve (e.g. via CoW copying data/directories, doing mutations, atomically swapping a source-of-truth link to the new version, and dropping the old), filesystem-as-database systems tend to fail-corrupted often due to unexpected issues (from bugs to "oops, don't have write access on 1/1000 files" to SIGKILL/power loss/drive failure) during data modifications. While I wish more file-based systems were as robust as maildir, I won't hold my breath when SQLite is right there.
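The schema, querying, and transactionality points above can be sketched with Python's built-in `sqlite3` module. The `tracks` table and the helper names are hypothetical; the point is only that a bad row is rejected up front and takes its whole batch with it, leaving the library in its previous consistent state.

```python
import sqlite3


def open_library(path=":memory:"):
    """Open a library DB whose schema rejects incorrectly-shaped rows."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS tracks ("
        "id INTEGER PRIMARY KEY, "
        "title TEXT NOT NULL, "
        "rating INTEGER NOT NULL CHECK (rating BETWEEN 0 AND 5))"
    )
    return con


def add_tracks(con, rows):
    """Insert every (title, rating) row, or none of them."""
    try:
        with con:  # one transaction: commit on success, rollback on error
            con.executemany(
                "INSERT INTO tracks (title, rating) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        # A NOT NULL or CHECK violation rolled the whole batch back.
        return False


def top_rated(con, minimum):
    """Filter by criteria with a query instead of a hand-written loop."""
    cur = con.execute(
        "SELECT title FROM tracks WHERE rating >= ? ORDER BY id", (minimum,)
    )
    return [row[0] for row in cur]
```

Doing the same with a directory of files means writing the validation, the partial-failure cleanup, and the filtering loop yourself.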


> universal format/protocol agnostic

That's not a "should we", that's a "we can't". Too big a civilization-level project.



