
Organizing Files (2005) - ingve
http://www.onlamp.com/lpt/a/6377
======
gabesullice
I love these kinds of old blog posts.

I admire the approach and dedication, but it feels a bit redundant. Aside from
the future dated directories, some creative use of the find command could
achieve a lot of this.

The problem really reminds me of inheritance vs. composition. Like
inheritance, file hierarchy encourages narrow classification and makes shared
traits difficult to manage. Tagging for file organization seems like a much
more elegant solution, letting an article's or file's categorization be a sum
of its basic properties. Services like Pocket have adopted that approach for a
reason, I think.

I haven't looked, but I'd love to see a tag-based file system tool. One could
easily imagine having a nested tree of tags where one would be able to
navigate down to a file via multiple directory paths.

Edit: Evidently, this is called a semantic file system:
[https://en.wikipedia.org/wiki/Semantic_file_system](https://en.wikipedia.org/wiki/Semantic_file_system)
and Tagsistant looks a lot like what I was imagining:
[http://www.tagsistant.net/](http://www.tagsistant.net/)

~~~
ics
TMSU ([http://tmsu.org/](http://tmsu.org/)) and git-annex
([https://git-annex.branchable.com/](https://git-annex.branchable.com/)) also allow for
different tag-based filing methods. OS X has had native file tagging for at
least a version or two now as well, though it's less helpful with many tags.

------
yummyfajitas
Something I'm finding handy, at least for one specific purpose, is emacs org
mode. I have a big file, papers.org. It looks like this:

    
    
        * Computer science       :computer_science:
        ** [[~/org/papers/Category_Theory_Applied_to_Functional_Programming__cain-screen.pdf][Category Theory Applied to Functional Programming]]
           :PROPERTIES:
           :original-source: http://www1.eafit.edu.co/asr/pubs/cain-screen.pdf
           :END:
        ** [[file:papers/bitcoin.pdf][Bitcoin]]    :bitcoin:
        Original paper.
    

All the papers go into ~/org/papers/. I find stuff with text search and tags.
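
For anyone who wants the tag lookup outside of emacs, here is a minimal sketch in Python. It assumes headlines start at column 0 with the tags at the end of the line, as in the snippet above. The names `HEADLINE` and `headlines_with_tag` are made up for this example, and it ignores org's tag inheritance, so it only sees tags written on the headline itself.

```python
import re

# A headline: one or more stars, a title, then optional trailing :tag1:tag2:
HEADLINE = re.compile(r"^(\*+)\s+(.*?)(?:\s+:([\w@#%:]+):)?\s*$")

def headlines_with_tag(org_text, tag):
    """Return titles of headlines that carry `tag` directly (no inheritance)."""
    titles = []
    for line in org_text.splitlines():
        match = HEADLINE.match(line)
        if match and match.group(3) and tag in match.group(3).split(":"):
            titles.append(match.group(2))
    return titles

# Sample text modelled on the papers.org excerpt above
sample = (
    "* Computer science       :computer_science:\n"
    "** [[file:papers/bitcoin.pdf][Bitcoin]]    :bitcoin:\n"
    "Original paper.\n"
)
print(headlines_with_tag(sample, "bitcoin"))
```

Inside emacs, org mode's own tag search does this and also handles inheritance, so the sketch is only useful for scripting.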

This, combined with org mode task management, is actually making me pretty
good at queueing and reading new papers.

~~~
leni536
For managing papers I use Zotero. It has some nifty features, like recognizing
the paper that you import and filling out all the metadata.

------
bensummers
My company has built a nice business around solving this precise problem,
although as a web app for internal use in an organisation rather than for
organising a filesystem.

The key principle is to organise things by their description, not their
position in some sort of filesystem.

If every field in your description allows multi-values, then you can put an
object in multiple places. And if you can search and browse on any field, you
can find it by subject, author, kind of information, and so on.

There are a few other clever ideas in there, but that's the key: Help users
enter really good metadata, then use it to find things within a flat
namespace.
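
To make the idea concrete, here is a small Python sketch of a flat namespace where every field is multi-valued and searchable. It is not Haplo's actual data model or API. The class `FlatStore`, its methods, and the sample file names and field values are all invented for illustration.

```python
from collections import defaultdict

class FlatStore:
    """Objects live in one flat namespace and are found via their metadata."""

    def __init__(self):
        self.objects = {}                # object id -> {field: [values]}
        self.index = defaultdict(set)    # (field, value) -> set of object ids

    def add(self, obj_id, **fields):
        # Every field takes a list, so an object can sit in several "places".
        self.objects[obj_id] = fields
        for field, values in fields.items():
            for value in values:
                self.index[(field, value)].add(obj_id)

    def find(self, **criteria):
        # Intersect the id sets for each field=value pair that was asked for.
        sets = [self.index.get((f, v), set()) for f, v in criteria.items()]
        return sorted(set.intersection(*sets)) if sets else sorted(self.objects)

store = FlatStore()
store.add("report.pdf", subject=["finance", "planning"], author=["alice"])
store.add("notes.txt", subject=["planning"], author=["bob", "alice"])
print(store.find(subject="planning"))
print(store.find(subject="planning", author="bob"))
```

The point of the inverted index is that no field is privileged: browsing by author costs the same as browsing by subject.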

It's open source: [http://haplo.org](http://haplo.org)

We're hiring developers to work in our London office:
[http://www.haplo-services.com/jobs](http://www.haplo-services.com/jobs)

~~~
fit2rule
This ties into what I've often thought about our computerized world, that it
is mostly a matter of semantic differences, similarities, and identities that
prove the mettle of any system considered 'productive'. If my semantic set and
your set are generally coherent, we can get something done; if they are not,
then we spend a lot of time trying to attain semantic equilibrium, before
anything actually gets _done_ according to the business purpose.

The tools that allow us to synchronize our own conceptual copy of the semantic
universe in a way that produces a 'flow' between individuals are the ones that
mostly succeed.

Haplo seems to do a good job of giving a small group the ability to construct
a semantic set with a productive goal. I've looked at it for 5 minutes, and
the only thing I can think to contribute is that I feel it would be of
objective benefit to give more examples of how Haplo has granted a group
enough self-awareness to actually get some work done. There is kind of a meta-
sense to the product, which can either work for you, or against you. Showing
how you can use Haplo to build a small business production/flow-line might
make it a little more clear as to what particular problem you are solving.

Another thing is that the web is really quite functional, on the one hand, but
boring on the other. I wonder how you might gamify Haplo ... Well, I have some
ideas, anyway ..

~~~
bensummers
Humans are very good at describing the world, and communicating through common
vocabulary. The problem comes when you don't model the real world, but instead
model your current business processes within a traditional database.

Haplo's data model encourages you to model the real world, using the normal
shared vocabulary, then hang business processes on top. This eliminates the
semantic problems.

Generally problems come when you try and squeeze your information into what's
easy to create with SQL databases and the current crop of NoSQL document
stores. We spent the time to build an object store which was capable of
handling "information", rather than "data", and it makes an enormous
difference.

Regarding examples, we're working on more overview documentation and some
example applications.

As a company without investors, funded entirely by revenue from our customers,
those customers have to take priority over building example applications. Our
aim is to open source the majority of the work we're doing, and hopefully
those will be good examples.

But here's a product we've built on top of Haplo:
[http://www.phd-manager.co.uk](http://www.phd-manager.co.uk)

~~~
fit2rule
>model the real world .. shared vocabulary .. SQL databases .. object store ..
good examples.

This is the crux of the challenge, I think. Anybody who doesn't know what a
scone is, can't really sell it.

The modeling occurs at a word level, like .. as a dictionary .. and everything
else is just baggage. Get everyone on the same page .. of the dictionary ..
and you get a working group. Isn't software secondary to human interaction?

------
TheLoneWolfling
I want a DB-as-FS.

These all have something in common: namely, that files must be stored in one
place. But, as he discovered, files aren't organized by only one thing.

I want a FS where I can tag things. Effectively, a D(A?)G, not just a tree.

Links help some, but nowhere near enough. Especially with the quirks of some
things trying to handle links.

~~~
networked
>I want a DB-as-FS.

So do I. From Project Xanadu to the BeOS to the WinFS it has been a recurring
idea in computing and for a good reason, I think. However, as far as I am aware
no popular implementation of it for Linux, the BSDs, Windows or OS X exists.

In particular, I have long wanted to implement a tag-based file system.
Tagging should be easier to implement than a full-on DB-as-FS and,
importantly, it would be easier to interop with existing file systems and
tools that talk to them.

My design ideas for it so far are as follows:

You can map each tag to a directory at the tagging file system's mount point.
Each of these N directories would then contain N-1 subdirectories, one for
each of the remaining tags, to allow you to select files that have two or more
tags. For example, the
files with the tags "a" and "b" would be accessible through /tagfs/a,
/tagfs/b, /tagfs/a/b and /tagfs/b/a.
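
A quick Python sketch of that mapping, to show how the path count grows. The `/tagfs` mount point is from the example above; the function name `tag_paths` is made up, and this only enumerates paths rather than implementing a file system.

```python
from itertools import permutations

def tag_paths(tags, mount="/tagfs"):
    """Return every directory under which a file with `tags` would be visible."""
    unique = sorted(set(tags))
    paths = []
    # Every ordering of every non-empty subset of the tags is a valid path.
    for size in range(1, len(unique) + 1):
        for combo in permutations(unique, size):
            paths.append(mount + "/" + "/".join(combo))
    return paths

print(tag_paths(["a", "b"]))
# ['/tagfs/a', '/tagfs/b', '/tagfs/a/b', '/tagfs/b/a']
```

A file with three tags is already reachable through 15 paths, so an implementation would presumably generate these directories lazily on lookup rather than store them.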

In contrast to a DBFS accessed through, say, "/dbfs/SELECT * FROM .../" it
would be possible to use "ls" to get the list of all the tags, to apply the
POSIX permission model in a way that made sense and suchlike. E.g., the
permissions of /tagfs/a/b/c could be an intersection of those for "a", "b" and
"c".

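The permission intersection could be as simple as a bitwise AND of the mode bits. A sketch in Python, where the per-tag modes are hypothetical values picked for the example and `intersect_modes` is an invented name:

```python
import stat
from functools import reduce

def intersect_modes(modes):
    """Bitwise AND of permission bits: a bit survives only if every tag grants it."""
    return reduce(lambda a, b: a & b, modes, 0o777)

# Hypothetical modes for the tag directories "a", "b" and "c"
tag_modes = {"a": 0o755, "b": 0o750, "c": 0o770}
combined = intersect_modes(tag_modes.values())
print(oct(combined), stat.filemode(stat.S_IFDIR | combined))
# 0o750 drwxr-x---
```

This says nothing about ownership, which would need its own rule when the tags belong to different users.
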
One problem with this approach is in how ordinary (non-tag) directories would
interact with the directories that represent your tags. Not distinguishing
them for the user would create a potential for confusing misfiling errors and
data loss on deletion. Distinguishing them by giving the tag directories
special names (e.g., ones that begin with a sigil) or permissions would limit
the system's power. Extended attributes are not easy to visualize in most
GUIs, etc.

------
bigbugbag
My experience with organizing files taught me not to consider the user home
directory as owned by the user. The home directory is littered with lots of
files that most software stores there, so my user data goes into its own
subfolder.

Then I sort my files in directories with an unusual scheme. The first level of
directories is the importance of the files to me:

/buffer: a space for file copies as I work on them and temporary files

/collect: for new files

/datalibrary, /databank, /datastore, /datakeep: these separate data according
to its importance to me. For example, the keep is a smaller size, encrypted
and automatically backed up every day.

The second level of directories is the action related to the data, for example
/listen will receive audio files, /watch video files, /look for pictures.
Other examples include /archive, /customize, /play, /install.

Then, depending on the content, there is a sorting scheme where I sort by
genre, by theme, by name, and so on. Take pictures: if a picture is worth
keeping, it could go in /by_genre/hispeed or /by_genre/tilt-shift, or in
/by_theme/futurama or /by_theme/space, or /by_name/choi xoo ang.

Now that I learned of TMSU, tagsistant and the like, I'm gonna try to make use
of those.

~~~
mfisher87
>The home directory is littered with lots of files that most software store
there so my user data goes into its own subfolder.

This is a really good idea, going to try it!

Can you go into more depth about your first tier and sorting process? I'm
most interested in how you use datalibrary, databank, datastore, datakeep.

------
leni536
Hard or sym links exist for a reason. One doesn't need to sort everything into
only one suitable category. It's not like I have too much to brag about
though, my home directory is a mess too. However sorting everything by date in
the filename? Files already have date metadata, I use "ls --sort=time | head"
a lot.
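
For scripting the same thing, a rough Python equivalent of that pipeline is below. The function name `newest_first` and its defaults are made up here. It skips dotfiles and stops at 10 entries to mimic the defaults of ls and head, and it uses lstat so a dangling symlink doesn't raise.

```python
from pathlib import Path

def newest_first(directory=".", limit=10):
    """List entry names in `directory`, most recently modified first."""
    # Skip dotfiles, as plain `ls` does
    entries = [p for p in Path(directory).iterdir() if not p.name.startswith(".")]
    # lstat reads the link's own timestamp instead of following it
    entries.sort(key=lambda p: p.lstat().st_mtime, reverse=True)
    return [p.name for p in entries[:limit]]
```

Calling `newest_first("some/dir", limit=5)` returns the five most recently modified names in that directory.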

------
fsiefken
I used to use DevonThink on OS X for organizing my info; it was excellent for
archiving and searching through documents. Unfortunately it doesn't work on
Linux, so I am now using an org-mode wiki, projectile and the platinum
searcher: just search, good naming conventions, no tags. Second rate, but I
can use it anywhere (including my phone).

