
What is a file? (2011) - polm23
https://www.microsoft.com/en-us/research/publication/what-is-a-file/#
======
ghusbands
This is a very wordy paper that basically just says: People want to own their
pictures (and associated (Facebook) comments/likes) and similar and want to
know what happens when they delete them, and ownership may be the most
important aspect to visibly preserve.

------
GoToRO
This is the problem when somebody creates something simple that just works:
very quickly there will be a lot of other people trying to add "just a
little thing" to it, and then you have a lot of complexity. It only stops
when nobody can figure out what that thing has become.

------
amelius
Nice, but change the concept of a file, and you'll run into all sorts of
interoperability problems.

~~~
kmicklas
Yeah but it will have to be done eventually. A lot of the pain in modern
programming is trickle down from bad systems abstractions we're stuck with
from Unix and its contemporaries.

~~~
xfer
Bad system abstractions? Can you elaborate on what a good abstraction other
than a file for an operating system would be?

~~~
DonaldFisk
Just have persistent objects (or strongly typed values). LaTeX documents,
tables, images, videos, songs, computer programs, etc. have very little in
common, so why make them all files?

~~~
xfer
A file is an abstraction that provides consistent semantics for the operations
that can be performed on a piece of arbitrary data/resource that the operating
system can expose. Nothing prevents you from creating another layer of
abstraction that works with your various objects.

I would love to see your proposal on a unified semantics of all these possible
"strongly typed values" that the OS can expose.

~~~
DonaldFisk
Please read this:
[http://thomas.enix.org/pub/rmll2005/rmll2005-shapiro1.pdf](http://thomas.enix.org/pub/rmll2005/rmll2005-shapiro1.pdf)

Because EROS processes survive system shutdown, no file system is required to
provide persistence. The operating system instead implements a transactional
block store with only two kinds of objects at the disk level of abstraction:
pages (which hold user data) and nodes (which hold capabilities). Human naming
services (directories) are provided by applications that act as directory
servers. EROS does not have any notion of a “file system root directory” or a
universally shared file system at all. An EROS system is simply a very large
space of objects that are connected together by capabilities. Surprisingly,
the resulting persistence implementation is both simpler and faster than a
conventional file system.

You can actually go further, and dispense with the operating system too, by
running an image-based programming language, such as Smalltalk or Lisp, on the
bare metal. Device drivers and other essential functions of operating systems
then become part of the language.

~~~
spc476
I read the paper, and it appears to describe a scenario where users don't even
own the computer---they just rent time on a computer run by somebody else
(known today as the "cloud"). Also, it seems to describe a silo of
applications where moving a file across applications becomes a significant
action. I'm reminded of a story of a person, who, when asked to copy a file,
loads up the application that made it (in this case, Microsoft Word), opens
the file (from the harddrive) and saves it in a new location (the floppy
drive), not even aware that one could just "copy c:file.doc a:file.doc".

It does not seem like a system I would like.

I'm also skeptical of image based languages (but I'm willing to admit I am
biased towards files).

------
gumby
We are still stuck in the past:

1 - Machines use a Unix filesystem model even cruder than the Multics model it
was derived from

2 - Network services and especially micro services essentially use an object
model (mostly JSON)

3 - And with phones, sandboxes, containers and security concerns, applications
are reverting to file silos (where a given app's data won't interoperate with
other applications). This happens at a macro level too where the FBs of the
world understandably want to rope you into their "walled silo".

Is it even possible to embrace a more flexible and powerful model at this
point or are we stuck?

~~~
blowski
For the majority of people, the benefits of Facebook's walled garden approach
are cheap, immediate and easy to understand. The disadvantages are only
problems for a minority. What's more, Facebook can afford to hire a whole
marketing team to promote its approach.

Until someone can show concrete benefits for the masses, we will continue in
the current direction.

------
pcunite
I love the file hierarchy. Miss it on my iPhone.

~~~
DonaldFisk
Objects are linked to each other in different ways, so some kind of database
(I prefer graph databases) with a proper search engine aware of different
object types (instead of just grep) would be better.

------
eriknstr
> The first suggestion for a way forward is perhaps the most obvious: it
> entails rethinking the role of metadata. [...] But metadata is also now
> becoming central to what users understand as a file, though they might not
> always think of tags, comments, playlist information and so forth as
> metadata. For what a file is is now often bound up with the things added to
> it, not only by the originating user but by others too.

> Consider for example, behaviours reported by [5]. In their study of
> teenagers and their virtual possessions, participants reported that part of
> the value of photos posted on Facebook was the metadata associated with
> them: comments and ‘likes’ were so pertinent that they were sometimes
> printed out alongside photos and pasted onto bedroom walls as a collection.
> This materialisation of the digital is indicative of a difficulty associated
> with the current technological landscape.

> It is not clear how one would digitally export a Facebook photo in order to
> view it in this way with another computer program or application, and this
> remains so despite recent innovations in the Facebook service. Yet it is not
> surprising that users should want to treat these entities in the way they
> treat a file. If they can upload their photos to Facebook, and given that
> they do so the photos are file-like objects, why can they not download them
> again, while retaining the value they have accrued, but still with the
> benefits of file-like properties? Although it is now easier for users to
> export their data from Facebook, these exports, once represented simply as
> ‘a file’ on a hard disk, lose their potency.

Certainly what we don't need is more meta-data attributes on files.

IMO one should either

1\. Create a simple file format that bundles the contents of a post. For
example a zip file with the media, comments, and likes and such, or

2\. Have the post (for example as JSON) and media in separate files and store
the references to the other files in the post, sort of like in HTML. Perhaps
have the computer system be able to extract such references and let you easily
operate on files that "belong together".
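
A minimal sketch of the first option, in Python, might look like the
following. The layout is purely an assumption for illustration: the name
`post.json`, the manifest keys and the helper functions are hypothetical and
do not correspond to any real export format. The manifest also stores a
reference to the media entry, in the spirit of the second option.

```python
import io
import json
import zipfile


def bundle_post(media_name, media_bytes, comments, likes):
    """Pack one post into an in-memory zip archive and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The media goes in unchanged, under its own name.
        archive.writestr(media_name, media_bytes)
        # Hypothetical manifest: it refers to the media entry by name and
        # carries the comments and likes that belong to the post.
        manifest = {"media": media_name, "comments": comments, "likes": likes}
        archive.writestr("post.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


def read_post(bundle_bytes):
    """Return (manifest, media_bytes) from a bundle made by bundle_post."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as archive:
        manifest = json.loads(archive.read("post.json"))
        media = archive.read(manifest["media"])
    return manifest, media
```

Because the result is an ordinary zip archive, it can be copied, backed up
and inspected with existing tools, and a viewer that understands the manifest
could render the post with its comments later on.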

They even mentioned databases and relationships earlier, and grouping files
together in different ways.

\---

> This bundle, this new ‘file’ type, is not merely a complex data type; the
> important thing from the users’ point of view is that it is a mirror of the
> social life that the file enables.

I have no idea what they are trying to say here.

If you create a file format like I said that contains all of the data that
made up the original post then you can represent that at a later point and you
can choose to render it just like facebook would. Surely that's exactly what
the users want?

\---

> However, this immediately raises complexities. For instance, images posted
> to Facebook might be copied not only by the person who posted them, but also
> by others. In these circumstances, should these others be able to copy the
> metadata, the tags as well as the thing-itself? If so, what of the rights of
> the owner or, if you prefer, the maker of the initial file? When people copy
> an originating file, would they be creating a new file or would their new
> entity be a version of the original one? Is there an order of precedence
> that we are proposing and ought this to be reflected in the concept of a
> file that might apply?

Bits don't have color.
[http://ansuz.sooke.bc.ca/entry/23](http://ansuz.sooke.bc.ca/entry/23)

Trying to accurately track origin of a post is going to lead to nothing but
trouble.

Don't try to build a technical solution for something that is not a technical
problem. If someone breaks copyright laws you take them to court and sort it
out there.

> It seems to us that there is a distinction that ought to be made between
> things that are put on the web, which the originator wants to have file-like
> properties (even as that thing develops a social life once on the web), and
> those things that are posted that the user does not want to have file-like
> properties. The properties we are thinking of have to do with questions like
> whether ‘making a copy’ means making a copy, a version of the thing itself,
> or having and owning (as it were) the originating thing itself and all that
> has ensued in that thing’s social life.

WHAT??? Just, WHATT??? Are they purposely trying to ruin the internet? It's
not up to one person to decide in which ways others may copy something. Once
again, if someone is doing something illegal, take them to court. And if
they're not doing something illegal, don't try to restrict what other people
are trying to do.

Fuck this. No, really. I'm done reading that paper.

------
eriknstr
When I first started using computers, I did so on Microsoft Windows 95. The
first applications that I used were Netscape Navigator and MS Paint, as well
as a few games. Being a child when I was introduced to computers, I did not
have any notion about what was going on inside of the computer at all. All I
knew was that there was a screen, a mouse and a keyboard, and that I could
click on things on the screen and I could type on the keyboard.

The first time I was confronted with the notion of a file was certainly when I
had painted something in MS Paint and I had clicked the save button. I think I
had been told to not click "My Computer", which makes sense -- you don't want
a child to accidentally move, delete or rename files on your computer. Hence I
had no notion of the file system. All that was known to me was the desktop, the
start menu, and a select few programs accessible through either icons on the
desktop or in the start menu.

I had played a couple of games on the computer, and in those I could save the
game and then the next time I could resume the game someplace near to where I
had last been -- or rather, I could have my father help me resume the game. So
when I managed to save the painting I sort of expected that it would just show
up on screen the next time I started MS Paint. When it didn't, I was
befuddled for a moment but I just concluded that I didn't understand what had
happened and didn't give much more thought to it. I think this is pretty
typical of how most children treat situations that confuse them.

This is user level 0. You are able to move the cursor and to type a little bit
on the keyboard and to run some specific programs, but that's it.

During the next few years I learned how to work with files in MS Paint and other
specific applications.

However, not all files are equal. If I try to open a file that was made in one
application with another application it will often either result in an error
message or in garbage on the screen.

This TIED my notion of a _file_ to _specific programs, and to the content that
is shown on screen_ for the LONGEST time. It is a bit difficult to explain
what I mean here, but I think that to the majority of the population as a
whole, this is what a file is. They view a file as an icon that you can open
in a SPECIFIC program. And they call that file a "<name of program> file", or
they call it by the extension, but they have no idea, or they have the wrong
idea, about what the contents of the file are. If you give a regular Windows
user two files that both have, say, the extension .dat (a commonly used
generic file extension for data), then they will think that those two files
must necessarily be of the same kind "somehow" and that both of them are to
be opened with some specific, unknown program. This is bad and harmful in
my opinion.
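
To make the point about the extension concrete, here is a small Python sketch
that guesses what a file is from its first bytes rather than from its name.
The three signatures are the well-known ones for PNG, zip and PDF; the table
and the function name `sniff` are only illustrative, and a real tool would
know many more formats.

```python
# Leading "magic" bytes of a few common formats.
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"PK\x03\x04": "zip archive",
    b"%PDF-": "PDF document",
}


def sniff(data):
    """Guess the kind of content from its leading bytes, ignoring any name."""
    for signature, kind in MAGIC.items():
        if data.startswith(signature):
            return kind
    return "unknown"
```

Two files that are both called something.dat can come back as "PNG image" and
"zip archive": the name says nothing reliable about the structure of the data.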

Likewise, I was quite confused for the longest time about "My Documents" and
"My Pictures". I was confused by why things were being put in "My Documents"
by default when those things were not things that I considered to be
"documents".

Furthermore it was quite mystical to me for a long time how "My Computer"
could be on the desktop at the same time as my desktop was under a folder that
was within "My Computer" itself. This however is not a _huge_ deal. Just yet
another thing that didn't make sense to me while I was trapped with the
graphical representation of the system.

The paper is arguing a point of view that a different abstraction should be
used than the hierarchical file system. I agree to some extent but not for the
same reason perhaps.

I think that the desktop metaphor is inherently harmful as a first
introduction to computing. The desktop is fine ONCE you've understood how the
system works from a bit of a different point of view (though perhaps NOT
_necessarily_ a _lower_ level as such), but until then the desktop metaphor
will trick you into believing very many things that are simply not true, and
which are going to come back and bite you in ways like those mentioned in the
paper.

As for doing away with the hierarchical file system, I agree. Throw it
out. I enjoy Unix, but I don't hold the hierarchical file system particularly
dear. In fact, I think Unix has some very powerful ideas, and it sucks a lot
less than Windows, but Unix is just a local optimum and nothing more.

\---

Finally, on a bit of a different note, I'd like to state how I tend to think
of files now.

To me, a file is data. Often that data will have been structured in a
particular way, and sometimes it will have been structured in no particular
way. A valid python program has a structure that allows the python interpreter
to execute it. A text file that someone wrote using a plain text editor has a
certain encoding but no structure beyond that. A file that was created by
putting random data into it will not have any structure. A file that was
corrupted will have some data that does not conform to the intended structure.

I am aware that the order of the bytes, and their property of being one single,
continuous unit, in persistent storage might not match the order that is
presented to applications by the operating system, but in my use of computers
this has not yet mattered so I choose to ignore that fact. So I too am living
a bit of a "lie" with regards to how I think about files, I admit that.

Some files are not really files, but they are convenient because they offer
you a simple interface to some useful functionality. I am speaking of course
of /dev/urandom and friends.
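
As a rough illustration in Python, assuming a Unix-like system where
/dev/urandom exists: the device is read with exactly the same calls as a
regular file. The function name and the fallback branch are additions for
this sketch only.

```python
import os


def read_random_bytes(count, path="/dev/urandom"):
    """Return `count` random bytes, read through the ordinary file API."""
    if not os.path.exists(path):
        # Assumption: on systems without this device file, fall back to
        # os.urandom so that the sketch still runs.
        return os.urandom(count)
    # The same open/read calls as for any regular file, but no stored data
    # backs this path; the kernel produces the bytes on demand.
    with open(path, "rb") as device:
        return device.read(count)
```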

Regular files are representations of "something". Like a text, or a photo, or
anything else that you can create a meaningful representation of. You can load
the data into a program as long as that program has been programmed to
understand the structure that is used in the file that stores your
representation of your data. If the program that you would like to use does
not understand that file format, you convert the file to some other format,
provided an acceptable conversion is possible with an existing piece of
software. If none exists, you implement it yourself, either in the program
that you are using (because it's open source, as is the vast majority of the
rest of your software) or as a standalone program that does just the
conversion. In both cases you are able to do this because the file format is
open and sufficiently simple, or at least the subset of the format that you
need is. If you are using proprietary software (including using multiple
pieces of proprietary software) in combination with proprietary file formats,
then either

1\. The proprietary piece(s) of software is/are able to do _absolutely
everything_ that you need to do (or at least, you think so), or

2\. The data is not sufficiently important to you to warrant building better
tools yourself, or

3\. It's so difficult to build these tools that you don't have an alternative.
For example, the data might come from very complicated equipment that you
couldn't build yourself even if given a million years to do so, and the data
is so complex that you aren't able to understand it from inspection nor from
reverse engineering, or

4\. The requirements changed (see also point one about thinking that the
software did everything that you needed it to do), or

5\. You've done fucked up. You didn't do your research and now you're stuck
with this. (See also point one about thinking that the software did everything
that you needed it to do.)

Anyway, once you've loaded your data into a piece of software that is able to
read it, something happens, and it brings us back to what we were talking
about.

\---

Once you've loaded your data into a piece of software that is able to read it,
or "opened a file" as is often the way that this is achieved, I would argue
that you are not operating on a file. You are working on the data that was in
the file. This is a very important distinction.

When you hit save, you are not saving the file. You are requesting that parts
of the application state be written to persistent storage. Again, a very
important distinction.
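
A toy Python sketch of that distinction follows. The `Editor` class, its JSON
format and the `demo` helper are all hypothetical; the only point is that
edits change the state in memory, and the bytes on disk change only when a
save is requested.

```python
import json
import os
import tempfile


class Editor:
    """Toy application whose in-memory state is separate from any file."""

    def __init__(self):
        self.state = {"text": ""}

    def load(self, path):
        # "Opening a file": copy the data from storage into memory.
        with open(path, "r", encoding="utf-8") as f:
            self.state = json.load(f)

    def type_text(self, extra):
        # Editing touches only the in-memory state.
        self.state["text"] += extra

    def save(self, path):
        # "Saving": write the current state out to persistent storage.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.state, f)


def read_disk(path):
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def demo():
    """Return what is on disk before and after the save request."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "note.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"text": "hi"}, f)
        editor = Editor()
        editor.load(path)
        editor.type_text(" there")
        before_save = read_disk(path)  # the edit has not reached the disk
        editor.save(path)
        after_save = read_disk(path)
    return before_save, after_save
```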

If people knew this and understood this, computers would make more sense to
them. They are right in the paper that the mismatch between what developers
and computer scientists think of as files and what everyone else thinks of as
files is the cause of a lot of the kinds of problems that people have when
they are dealing with files.

But where did the users gain their misinformed ideas about files from? I've
said it already, I think the desktop metaphor is to blame. Again, it's an ok
metaphor _as long as it's not the only metaphor_.

------
aerodog
October 1, 2011

~~~
dang
Thanks, added.

------
guhesexezu
"We suggest that one aspect of this adaptation is to encompass metadata within
a file abstraction"

I thought File Streams would be the ideal place for metadata, except as I read
elsewhere, the metadata isn't transferred when you export to the cloud. And
didn't Microsoft try and fail at this with WinFS? I guess it's easier to write
papers than to actually build the thing. No doubt whoever does actually succeed
in writing one will have to pay Microsoft a patent license.

