Ask HN: Has anyone ever tried building an OS that doesn't use files?
33 points by freework on Sept 12, 2014 | 64 comments
Since the beginning of computer time, all computer systems have been based on the idea of a "computer file". OSX, Linux, Windows, and every other OS has been based on the system of files and folders and filenames and crap like that. I'm not too versed in computer history. Has anyone ever tried making a computer system that does not use files?

In my opinion, referring to the data that computers deal with as "files" made sense in 1954 when every large company had a "paper file system", but in 2014, I think it's time to come up with a new system that better mirrors how we use computers.

I'm working on a project I'm calling "Library Transfer Protocol", which aims to replace the concept of "file" with 'Library item'. Basically, in 2014 computer usage more closely mirrors the workflow of an author (revisions, publishing, etc.) than that of an employee filling up a file cabinet for internal use (thanks to Facebook and the like).

Before I get too into this project, I'd like to know if anyone has ever tried doing something like this before.

Several experimental systems have proposed replacing the traditional file system with what they call a "single level store": you have data structures in memory, then the OS saves them and restores them as needed from disk storage - but there are no traditional files on the disk, and the users don't interact with the disk storage at all - they just interact with the data while it is in memory. In this scheme, the disk storage is something like the pagefile in a conventional OS, but that's all there is - there are no conventional files.
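A rough way to get a feel for the single-level-store idea from user space is Python's standard `shelve` module. This is only a sketch under loud assumptions: `shelve` itself sits on top of ordinary files, so it stands in for the OS paging mechanism rather than implementing one, and the store path and the `notes` key are made up for illustration.

```python
import os
import shelve
import tempfile

# Hypothetical location for the backing store; a real single-level store
# would be managed by the OS, not by a path the program picks.
store_path = os.path.join(tempfile.mkdtemp(), "store")

# "Session 1": work with ordinary in-memory structures.
# writeback=True lets nested objects be mutated in place; they are
# written out when the store is closed.
store = shelve.open(store_path, writeback=True)
store["notes"] = ["buy milk"]
store["notes"].append("call Bob")
store.close()  # stands in for the OS paging the data out to disk

# "Session 2": the same structure is simply there again.
store = shelve.open(store_path, writeback=True)
restored_notes = list(store["notes"])
store.close()
```

The program never names or opens a document; it only touches a data structure, which is the part of the idea this sketch is meant to show.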

An interesting single-level store was discussed by Robert Strandh in his 2004 proposal for a Lisp operating system, Gracle. I can't find the original paper on the web anymore but some pertinent excerpts are in https://github.com/jon-jacky/Piety/blob/master/doc/gracle_ex.... Strandh referenced another experimental OS with a single-level store called EROS. I see he has a more recent LispOS at https://github.com/robert-strandh/LispOS.

I think this will be the way storage goes in the future. With the introduction of memristor-based storage which is both non-volatile and as fast as DRAM and cheaper to produce than mechanical drives, the need to distinguish between 'in memory' and 'on disk' becomes a great deal less important. The distinction which becomes important then is 'what is the user likely to need next'.

The problem I have with flat systems that use tagging or metadata is that one day, when adding pictures, I might tag them as "pictures"; another day I add some pictures, forget that I called them pictures before, and this time call them "photos." A week later I might tag them as "images."

Makes searching and organizing a pain in the ass.

I see no reason why an intelligent system couldn't learn common synonyms. It would be a pain to do on an exclusively local system, but on a networked system it could trade tags with others and learn these synonyms. You might call yours 'photos' while I call mine 'images'. When I transfer a file from you to me, it would be able to translate the tags into my own personal folksonomy. Which would be more convenient than the current situation where I have to rename and integrate into my filesystem every file I download from you.
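As a minimal sketch of the synonym idea, assuming the synonyms have already been learned somehow (the `SYNONYMS` table, the function names, and the sample items are all hypothetical), tags can be folded to one canonical form at search time:

```python
# Hypothetical synonym table: each known tag maps to one canonical tag.
# A real system would learn this table; here it is hard-coded.
SYNONYMS = {
    "pictures": "photos",
    "images": "photos",
    "pics": "photos",
}

def canonical(tag):
    """Return the canonical form of a tag (lowercased, synonyms folded)."""
    tag = tag.strip().lower()
    return SYNONYMS.get(tag, tag)

def search(items, tag):
    """Return the names of items carrying the tag or any of its synonyms."""
    wanted = canonical(tag)
    return [name for name, tags in items.items()
            if wanted in {canonical(t) for t in tags}]

items = {
    "beach.jpg": ["pictures", "2014"],
    "cat.jpg": ["Photos"],
    "dog.jpg": ["images"],
    "notes.txt": ["todo"],
}
matches = search(items, "images")
```

Here a search for "images" also finds the items tagged "pictures" and "Photos", without anyone having to retag them.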

But also, "where in the (tree-structured) file system did I put those notes I made last year?". Strandh observes that a collection of tags is not so different from the names along a file path.

BeOS had a nice idea, although it still used files. Its file system doubled as a database, and you could add arbitrary columns to files. For example, contacts were empty files in a special directory, and had attributes like "address", "name" and so on. Mails were also stored in plain files, but unlike mbox/Maildir all the metadata was stored as attributes, not in the files, making it easy to process them with scripts, or to sort them in the file browser.

It's a lot like the never-finished WinFS from Microsoft.

Funnily, modern file systems (ext3/4, NTFS, HFS+?) all support extended attributes in some form or another. However, they are currently only rarely used: Mostly for the "this file was downloaded from the internet, do you really want to open it" flag. I wish more programs would use them to store interesting metadata, but it's basically a chicken and egg thing now.

Windows and GNOME also have concepts where you can have calculatable attributes - you have a little library that looks up metadata in a database or parses it from the file, and then serves it as an additional attribute on the file (visible in the file properties tab). You can see it e.g. on mp3s or word documents in windows. However, it doesn't seem to be widely used either, and I wouldn't be surprised if that function has been gutted out of GNOME lately.

> Funnily, modern file systems (ext3/4, NTFS, HFS+?) all support extended attributes in some form or another. However, they are currently only rarely used: Mostly for the "this file was downloaded from the internet, do you really want to open it" flag.

For anyone interested in these, they're called Alternate Data Streams in Windows. When you download a file from the Internet, an ADS is created, called "Zone.Identifier", which contains a ZoneId indicating where it came from.

I believe Windows 2000's Explorer allowed the user to add arbitrary tags to any file, and that those were stored in an ADS. For some reason, that was removed in, I think, Windows Vista.

At least one virus stored itself using Alternate Data Streams, which I imagine is related to why they've been more or less downplayed.

I've long wanted metadata fields to help me organize things, but the Achilles heel of systems that use them has always been the lack of standards. I use 2-3 different OS's daily and others less frequently, so I value having the same view of my files no matter where I am. The end result is I just throw everything into the file name.

At one point there was a command+database called pq floating somewhere in the plan 9 ecosystem. You'd pass in a path to a resource and it would return a path to the relevant metadata.


>Basically, in 2014 computer usage more closely mirrors the workflow of an author (revisions, publishing, etc), rather than an employee filling up a file cabinet for internal use (thank to the facebook and the like)

How did you come to this conclusion about patterns of usage? I'd think the typical user/consumer would more likely have 1000 mp3 files rather than 1000 personally authored Microsoft Word documents (or Photoshop PSD files, etc.)

What about another common usage such as digital camera photos? The digital camera (or iPhone) has jpg "files". How would the user mentally translate the "files" living on a FAT32 flash card and copy them to your "Library Items" storage system? Do they keep 2 mental models of storage paradigms in their head? If your proposal includes a driver/wrapper for hiding the FAT32 file inside the concept of "library item", it seems like you're just renaming "files" to "library items". It's more a shift in terminology rather than shift in paradigm as a sibling comment already noted.

The filesystems in an operating system (NTFS, ext3, etc) are already implemented as special purpose databases. The "rows" are file id entries and they each point to the "blobs" which are the file contents. Whatever you propose to build has to reimplement this underlying "database" as well. Whether you call the rows of that database "files" or "library items" or "objects" or "documents", it isn't going to revolutionize the approach.
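To make the "rows pointing to blobs" picture concrete, here is a toy sketch using SQLite from Python's standard library. It is not how NTFS or ext3 are actually laid out; the table names, column names, and helper functions are assumptions chosen only to mirror the description above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- The "blobs": raw file contents.
    CREATE TABLE blobs (blob_id INTEGER PRIMARY KEY, content BLOB NOT NULL);
    -- The "rows": one entry per name, pointing at a blob.
    CREATE TABLE entries (
        path    TEXT PRIMARY KEY,
        blob_id INTEGER NOT NULL REFERENCES blobs(blob_id)
    );
""")

def write_file(path, data):
    """Store the bytes as a blob and add an entry row that points to it."""
    cur = db.execute("INSERT INTO blobs (content) VALUES (?)", (data,))
    db.execute("INSERT INTO entries (path, blob_id) VALUES (?, ?)",
               (path, cur.lastrowid))

def rename(old, new):
    """Renaming touches only the entry row; the blob is left alone."""
    db.execute("UPDATE entries SET path = ? WHERE path = ?", (new, old))

def read_file(path):
    """Follow the entry row to its blob and return the bytes."""
    row = db.execute(
        "SELECT b.content FROM entries e "
        "JOIN blobs b ON b.blob_id = e.blob_id WHERE e.path = ?",
        (path,),
    ).fetchone()
    if row is None:
        raise FileNotFoundError(path)
    return row[0]

write_file("/home/me/song.mp3", b"fake mp3 bytes")
rename("/home/me/song.mp3", "/music/song.mp3")
content = read_file("/music/song.mp3")
```

Whether the `entries` rows are called files, library items, or objects, the structure underneath stays the same, which is the point being made.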

My system is only meant to be used by humans. Software will still use the traditional filesystem.

Instead of the user supplying a filename/path for new content, they submit arbitrary metadata. When retrieving content, they query via a language called LQL (Library Query Language).

Basically the system works a lot like ID3 tags for mp3 files. 'Arbitrary metadata' is a lot like the fields for 'title', 'date', 'album', etc except they are arbitrary (you can use any set of key/value for metadata items)
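A minimal sketch of what "submit arbitrary metadata instead of a filename" could look like, assuming an in-memory store and exact-match lookup. The `Library` class and its method names are hypothetical and are not taken from the actual project.

```python
class Library:
    """Toy store: each item is content plus arbitrary key/value metadata."""

    def __init__(self):
        self._items = []  # list of (metadata dict, content bytes)

    def add(self, content, **metadata):
        """Store content under whatever metadata the user supplies."""
        self._items.append((dict(metadata), content))

    def find(self, **criteria):
        """Return the metadata of items matching every given key/value."""
        return [meta for meta, _ in self._items
                if all(meta.get(key) == value
                       for key, value in criteria.items())]

library = Library()
library.add(b"...", title="Money", artist="Pink Floyd", mimetype="audio/flac")
library.add(b"...", title="Trip notes", author="me", date="2014-09-12")

found = library.find(artist="Pink Floyd")
```

Note that the two items share no metadata keys at all, which is what "arbitrary" means here; no path or filename appears anywhere.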

Libraries can connect to other libraries. A 'Library' is much like an email inbox, except it's one user per machine instead of multiple users per machine. Each library is 'addressed' via a domain name. All communication between libraries is done through HTTPS.

Traditional filesystems use a permission system that is very archaic. In 1960 it was common for there to be multiple users on one machine, but in 2014, everyone has their own machine. File permissions are useless, since I'm the only person who ever uses this computer. "Permissions" in the 2014 sense is which of my Facebook friends gets to see these photos.

So, you're not building a filesystem (because that's hard, and requires concrete engineering skills), but instead a glorified file metadata search?

So, MongoDB with a file URL? (Hint: that's how you could implement the MVP of this, and if you use a URL you can even reference user files they don't store locally)

And file permissions are dead? Because nobody has kids that use the same desktop they do?

This is (from a technical standpoint) the silliest goddamn thing I've ever heard.

From a product standpoint, you could probably pitch and get a few M. Why the fuck not.

What I'm building is most definitely not a "filesystem". Those use filenames and folders and crap like that. My system does away with that.

What you said about a glorified metadata search is right on.

Some people share devices, but most people we share with use other devices.

I'm glad you think it's silly because all great things have haters (Bitcoin, Justin Bieber, etc.)

So, don't be so dismissive of "crap like that". There are a lot of legacy ideas that are with us not from sheer inertia, but because they work.

Many modern filesystems are littered with the reeking remains of attempts at supporting metadata (for example, NTFS), most of which nobody cares about and which just add implementation complexity.

If you want to pitch a more useful and more abstract version of what you're describing ("how can we present searching and accessing a metadata forest backed by traditional hierarchical file stores") then by all means I'll be friendlier but right now you're coming across as a crank ignorant of the history of the ideas you're decrying.

I'm not building a filesystem. I'm building a non-filesystem. Not metadata jammed into a filesystem. A whole new system that is used by the end user in lieu of the filesystem.

"I'm glad you think its silly because all great things have haters"

Stupid things have haters too. Some people "hating" your idea doesn't make it good. Doesn't make it bad either.

I don't know what windows does these days, but OSX has had tags and metadata for "files" for a while. It's even easy to add the info, and search for it. I never bother though because, for me, it doesn't solve any problems or add any value.

Baloo, https://community.kde.org/Baloo is the current incarnation of the Linux semantic desktop, http://www.semanticdesktop.org/

"It's responsible for handling user metadata such as tags, rating and comments. It also handles indexing and searching for files, emails, contacts, and so on"

Cross-device research for semantic desktop: http://www.dime-project.eu/en/home/dime/project/solution/con...

"They laughed at Columbus, they laughed at Fulton, they laughed at the Wright brothers. But they also laughed at Bozo the Clown." -- Carl Sagan

I wouldn't say that having haters is a good measure of "greatness", though.

Have you looked at camlistore? It sounds similar to your idea.

and git-annex

Apple's Newton OS used object-oriented data stores called "soups" rather than a file-oriented storage paradigm:


MTS, z/OS (it's a long story), OS/360, and basically everything written before Multics (the precursor to Unix) didn't use files. Stone age computers.

The hierarchical file system as you described it really only started to come into its own in the mid-60s, with LISP machines at MIT and Multics at AT&T.

Storing data in files, as you call it, is an old and well-known solution to this problem, because finding a node on a tree is simple, and this is how file systems tend to work. Thinking of objects as subsets of various superclasses of objects is easy for people to understand, when you don't explain it in those words.

The reason very old OSes didn't store things like this is that there wasn't much permanent storage. Actually, MTS uses what are roughly files but uses a dot notation to separate them, which will look similar to Usenet

Where data is your current record.


I support moving to a more revision, publish, etc. structure. But moving away from the tried and true hierarchical model will be difficult. Even an object based file system will develop a hierarchy of inheritance.

Multics didn't store "files" per se, though, but segments, which had a max size of 1 MiB of 9 bit bytes (36 bit CPUs). They were organized in a hierarchical manner, with '>' as the separator (vs. '/' for UNIX(TM) and its descendants, and '\' for MS-DOS and Windows since '/' was used in version 1 for arguments :-( ), and were accessed in single level store fashion, mapped into your address space.

Some time after its creation (as I remember hearing, at least), a "multi-segment file" abstraction was created for data sets that were > 1 MiB, something of a kludge.

There's a long history of mocking and decrying the file as a user-level concept, and many applications hide the concept from the user, even though they store user data in the filesystem. Music players (such as iTunes) are an example: there are songs, albums, and artists, but files do not show up in the UI in any way. This is standard practice at the application layer. However, implementing something at the OS layer with the expectation of exposing it directly to the user goes against the current thinking that it's the application layer's job to provide user-friendly concepts, and that the job of the OS is merely to support the application layer in doing that.

If providing support for application-level user-friendly abstractions is what you want to do, then I would suggest studying applications with UI abstractions that you admire and judging your OS storage layer by how well it supports application development.

Files are a shorthand for "documents". Don't be so hung up on terminology.

First, look into history more--there are several non-hierarchical (read: flat) file systems out there.

Second, while the workflow might mirror authoring more closely (which I think is horseshit, but that's neither here nor there) the artifacts of that process are what matter. Existing notions of a "file" map very cleanly onto the storage and organization of such artifacts.

There is an argument to be made for having better querying capabilities or permissions or whatever, but what is to be gained from throwing a commonly-accepted idiom away?

There's nothing to be gained by throwing an idiom away, but maybe there's a lot to be gained with a new idiom.

There is indeed something to be gained by throwing an idiom away: not needing the features which support the idiom which create problems elsewhere, even when that idiom isn't being used.

For instance, if we throw away certain programming features from a language, features which threaten the integrity even of code that doesn't itself use them, we can gain reliability and security.

But there is the time spent in learning a new idiom, and maybe nothing is gained. Tech churn.

On the AS/400 there was no traditional file system, the only storage was in a database.


Unlike the "everything is a file" feature of Unix and its derivatives, on OS/400 everything is an object (with built-in persistence and garbage collection).

But in the end, isn't that just putting files in a database?

Well, if you see it this way, then there is no way to not use files at all. Your favorite song will always be a string of bits and you will have to store it on some persistent storage if you want to listen to it again after power cycling your device. You will also always need some metadata to remember where you stored the bits. You can now decide what to call your bits - files, rows, objects, ... - and make some choices about the kind and structure - flat, hierarchical, indexed, ... - of your metadata, but there is not much more to the whole thing. Maybe you can add some goodies like redundancy, garbage collection, deduplication and what not, but in the end there is not that much of a difference between files, databases, registry entries, user account information or even objects in memory - everything is just a bunch of bits with some metadata.

WinFS was supposed to be in Windows Vista; unfortunately they decided to drop it. I was really looking forward to that.


I was disappointed by this as well. Not so bothered at the time it was dropped, as I was a bit pro Linux and didn't want to see a killer feature that would beat it. There was a lot of talk of an open source equivalent, but it seems that as nothing came of WinFS, nothing open source appeared either. It would be damn useful having a queryable filesystem.

Browsers. Browsers are de facto operating systems (though typically running on another OS, not on bare metal), and they don't think in files. They handle windows, tabs, documents, document elements (DOM nodes) etc.

You could actually make a far better case for Emacs being an operating system rather than a browser, what with the former not having as many security and sandboxing restrictions.

That and you can actually boot from Emacs: http://www.informatimago.com/linux/emacs-on-user-mode-linux....

Really, by your definition, any piece of software with a plugin-based architecture is an operating system. There's certainly a trend we're witnessing where, for the end user, the browser is displacing the underlying operating system, and turning the latter into an environment for hosting the former. But to call the browser an OS is a huge stretch.

> You could actually make a far better case for Emacs being an operating system rather than a browser, what with the former not having as many security and sandboxing restrictions.

But security and sandbox restrictions are very important elements of an OS, don't you think?

> Browsers are de facto operating systems (though typically running on another OS, not on bare metal)

What does this mean? Are you defining "operating system" loosely enough that you could also count the JVM, for example?

Good question; I don't have a single definition for "operating system".

The Wikipedia page for Operating Systems says "An operating system (OS) is software that manages computer hardware and software resources and provides common services for computer programs", and by that definition, the JVM would count as one.

Though most users also expect some GUI/Desktop from an OS, and that's what browsers offer, but not the JVM. (The JVM allows programs to have a GUI, but it doesn't have a GUI itself, for example for launching other programs).

As used in the gaming world, "platform" conveys approximately the same thing and likely wards off nitpicking about what is and isn't a kernel etc.

The One Laptop Per Child laptop uses a Journal instead of folders -- http://wiki.sugarlabs.org/go/Design_Team/Designs/Journal -- and GNOME's "Activity Journal" project is similar. (Smartphones are starting to use some of the same ideas, too.)

I've toyed with the idea of replacing files with processes. If you have some data that you want to keep, you have a process that holds it in its process memory, and can give it to other processes via an IPC mechanism (if the other process is local) or over the network (if remote, although you could of course also use the network locally).

I never got around to trying it out. I think I may have tried to start some discussion on usenet along these lines maybe 10-15 years ago, but no one seemed interested.

A "directory" would simply be a process that provides some kind of lookup service to let other processes find the data storage processes that contain the data they are looking for.

You'd still have disks on your computer, but they would be mostly used as swap space.

The system would include some standard simple data holding and directory processes that implement a Unix-like namespace and permission system, but it would be easy to override this for data that needs special treatment. Just write a new data holding program that implements the special treatment you want and knows how to register with the standard directory processes.

For your project make sure you plan what to do if one application created the item, but another program wants to open it.

Don't try to do it by applications registering types they can open - this never succeeds, there are simply too many file types in the world.

Also think about how to send data to someone else.

And finally think about how to integrate with existing devices that still use files.

Not disagreeing with you, but plan9 went the other way, huh? Rather than 'no files', "everything is a file!"

Trying to provide some balance :)

> I'm working on a project I'm calling "Library Transfer Protocol", which is aiming to replace the concept of "file" and replacing it with 'Library item'. Basically, in 2014 computer usage more closely mirrors the workflow of an author (revisions, publishing, etc), rather than an employee filling up a file cabinet for internal use (thank to the facebook and the like)

Please define how a "library item" is different from a "file".

Is it made of bytes that can be read into a buffer and accessed?

(If not, how can an H.264 video or MP3 object exist as a library item and be processed?)

Do you not have spaces which assign names to library items?

The Hyper-Text Transfer Protocol has already replaced the concept of "file" with "resource". A URI doesn't necessarily name a file.

A Library Item is an abstraction above 'file'.

The user constructs a query then sends it to a server, the server returns a list of items that match the query. For instance this may be a query:

INCLUDING artist contains "Pink Floyd"

and the results will look like this:

[
  {"artist": "Pink Floyd", "title": "Another Brick In the Wall", "mimetype": "audio/mpeg", "album": "The Wall", "url": "http://drive.google.com/blahblah.mp3", ...},
  {"artist": "Pink Floyd", "title": "Money", "mimetype": "audio/flac", "album": "The Dark Side of the Moon", "torrent_url": "http://drive.google.com/blahblah.torrent", ...},
  {"artist": "Pink Floyd", "mimetype": "image/jpeg", "album": "The Wall", "url": "http://drive.google.com/albumcover.jpg", "purpose": "album artwork", ...}
]

You can add more to the query to filter out the content you don't want. The query language is much like a SQL WHERE clause. The query language is meant to be super simple and something your grandma could figure out.

The app then can retrieve the actual file from the url. The end user has no idea about the underlying file crap.
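The query step above can be sketched in a few lines, with heavy caveats: the real LQL grammar is not given in this thread, so this handles only the single clause shape shown (`INCLUDING <key> contains "<text>"`), and the case-insensitive matching, the `compile_query` name, and the trimmed sample items are all assumptions.

```python
import re

# Matches only the one clause form shown in the example query.
CLAUSE = re.compile(r'^INCLUDING\s+(\w+)\s+contains\s+"([^"]*)"$')

def compile_query(query):
    """Turn a one-clause query string into a predicate over metadata dicts."""
    match = CLAUSE.match(query.strip())
    if match is None:
        raise ValueError(f"unsupported query: {query!r}")
    key, needle = match.groups()
    # Assumption: "contains" is a case-insensitive substring test.
    return lambda item: needle.lower() in str(item.get(key, "")).lower()

items = [
    {"artist": "Pink Floyd", "title": "Money", "mimetype": "audio/flac"},
    {"artist": "Pink Floyd", "title": "Another Brick In the Wall",
     "mimetype": "audio/mpeg"},
    {"artist": "Miles Davis", "title": "So What", "mimetype": "audio/mpeg"},
]

matches_query = compile_query('INCLUDING artist contains "Pink Floyd"')
titles = [item["title"] for item in items if matches_query(item)]
```

Filtering further, as a SQL WHERE clause would, amounts to combining several such predicates with `and`.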

Your proposal doesn't really sound like it deprecates files, but rather that it enhances them with attributes. Most operating systems other than Unix have already expanded on the file as a bag of bytes to one that supports features such as extended attributes and forks. Even most contemporary Unix file systems have xattrs (most notably XFS), but overall they seem to have had limited impact, and in the case of NTFS-style alternate data streams, even introduced some nasty security risks.

Really, what you seem to want is a file system with built-in version control and network sharing? Git and Mercurial are already virtual file systems of sorts, I guess.

Look up Common Lisp's file handling functions. The number of options to open a file seems ridiculous in our world, but in that one function you can see a history of how many different ways "files" used to be accessed.

Did you just describe git? Git has revisions of files, and they can be tagged, merged, etc. It still works on top of a conventional file system though, because why reinvent the wheel?

Git is pretty much only for versioning (specifically annotation of versioning). My system is more for indexing/sharing data.

While not exactly an answer to your question, you might find it worthwhile to take a look at the 1060 NetKernel platform - http://1060.org

"NetKernel can be considered a unification of the Web and Unix implemented as a software operating system running on a microkernel within a single computer."


Palm OS?

Yup, was going to say the same thing.

It didn't have files, it had database records instead. In general each app used its own database, but it was possible to read databases from other apps too.

Also came here to say the same thing. IIRC, it was also free-form data, so it was NoSQL before it was cool.

begins trailing off... I wonder if I still have any old PalmOS app code laying around on an old hard drive somewhere. Would be fun to have a look at that. I think that was the last time I did C++.

I still have my Palm V - it's still charged, and I use it occasionally.

What you're building sounds like a document management system: http://en.wikipedia.org/wiki/Document_management_system. Might be worth looking at some to see how they'd fit with what you're envisioning. Even if you write it yourself, it can be helpful to see how others have solved the same problems.

Yep, check out camlistore (http://camlistore.org/). It can be quickly described as a big database at which you throw all your content, along with a JSON containing any attributes you want. These attributes are indexed and then searchable.
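The general shape described here (content thrown into a store together with a JSON document of attributes, which are then indexed) can be sketched as follows. This is not Camlistore's actual data model or API; the `ContentStore` class, the choice of SHA-256 as the content key, and the index layout are assumptions for illustration.

```python
import hashlib
import json
from collections import defaultdict

class ContentStore:
    """Toy store: blobs keyed by their hash, plus indexed JSON attributes."""

    def __init__(self):
        self.blobs = {}                # ref -> content bytes
        self.attrs = {}                # ref -> attribute dict
        self.index = defaultdict(set)  # (key, value as text) -> set of refs

    def put(self, content, attrs_json):
        """Store content with a JSON object of attributes; return its ref."""
        ref = hashlib.sha256(content).hexdigest()
        attrs = json.loads(attrs_json)
        self.blobs[ref] = content
        self.attrs.setdefault(ref, {}).update(attrs)
        for key, value in attrs.items():
            self.index[(key, str(value))].add(ref)
        return ref

    def search(self, key, value):
        """Return the refs of all items whose attribute key equals value."""
        return sorted(self.index.get((key, str(value)), set()))

store = ContentStore()
photo_ref = store.put(b"jpeg bytes", '{"kind": "photo", "year": 2014}')
note_ref = store.put(b"some text", '{"kind": "note", "year": 2014}')

photos = store.search("kind", "photo")
from_2014 = store.search("year", 2014)
```

Because the key is derived from the content, storing the same bytes twice yields the same ref, so duplicates collapse without any naming decision by the user.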

I think the file metaphor is a must for the separation of OS and applications. Unless you have a better conceptual framework to deal with this problem, you'd probably live with "files".

MUMPS... it's sort of a database/os/programming language hybrid that uses a key value store.

Yes, iOS.

Is it on github?

https://github.com/priestc/LibraryDSS that's an old repo from when I tried to write code for this about a year ago. Most likely if I ever get around to working on it again, I'm going to make a new repo for it.
