

The Filesystem Test - programminggeek
http://programminggeek.com/essays/the-filesystem-test.html

======
jjbradley
I know that many shops don't work this way, but when I have designed from a
"data driven" methodology (I know this is an old term, but I have been in the
industry since 1984), I always started with the question: what are the things
we are capturing data about? By starting with the "objects", if you will, I
would usually end up with a data design close to third normal form that still
stayed close to the object level. I have also been a big proponent of the
developers having complete access to the database and, together with the
architect, designing the triggers and stored procedures, mainly to make sure
the data maintained a certain amount of integrity, especially if the system
was going to let end users update and add data with tools such as Access or
Excel. I have seen many issues, especially on large projects with more than a
few developers, when the data integrity rules live only in code close to the
client: you ultimately end up with two or more pieces of code handling
integrity differently, and one or more of them being wrong.
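A minimal sketch of the point above, assuming SQLite through Python's standard `sqlite3` module; the `accounts`/`audit_log` schema and the non-negative-balance rule are hypothetical illustrations, not something from the comment. Because the rule and the audit trigger live in the schema, every client that writes to the database is held to the same check instead of each client re-implementing it.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database whose integrity rules live in the schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE accounts (
            id      INTEGER PRIMARY KEY,
            balance INTEGER NOT NULL CHECK (balance >= 0)  -- enforced for every client
        );
        CREATE TABLE audit_log (
            account_id  INTEGER NOT NULL,
            old_balance INTEGER NOT NULL,
            new_balance INTEGER NOT NULL
        );
        -- Every successful balance change is recorded, whichever tool made it.
        CREATE TRIGGER log_balance_change
        AFTER UPDATE OF balance ON accounts
        BEGIN
            INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
        END;
        """
    )
    return conn


def try_update(conn: sqlite3.Connection, account_id: int, new_balance: int) -> bool:
    """Attempt an update; return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "UPDATE accounts SET balance = ? WHERE id = ?",
                (new_balance, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = make_db()
    with conn:
        conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
    print(try_update(conn, 1, 40))   # True
    print(try_update(conn, 1, -10))  # False: the CHECK constraint fails
    print(conn.execute("SELECT * FROM audit_log").fetchall())  # [(1, 100, 40)]
```

The rejected update leaves no trace in `audit_log`, because the failing statement (including anything its trigger would have done) is rolled back.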

------
Dave_Rosenthal
I think the main part of this that resonates with me is not wanting a
disconnect between the parts of your development that happen "above" and
"below" the database abstraction.

For this reason many people seem to be using databases more as lightweight,
reliable storage and pushing as much logic as possible up into the
application. I hear many stories of people using SQL databases with one big
table with a key column and a value column.
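For anyone who hasn't run into the pattern, a minimal sketch of "one big table with a key column and a value column", assuming SQLite via Python's `sqlite3` module; `KVStore` and the `kv` table are hypothetical names. The database contributes durable, transactional writes, and everything about what a value means stays in the application.

```python
import sqlite3
from typing import Optional


class KVStore:
    """A key-value store layered on a single SQLite table (key column, value column)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB NOT NULL)"
            )

    def put(self, key: str, value: bytes) -> None:
        # INSERT OR REPLACE gives upsert semantics on the primary key.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
            )

    def get(self, key: str) -> Optional[bytes]:
        row = self.conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        return None if row is None else row[0]

    def delete(self, key: str) -> None:
        with self.conn:
            self.conn.execute("DELETE FROM kv WHERE key = ?", (key,))


if __name__ == "__main__":
    kv = KVStore()
    kv.put("user:1", b"alice")
    print(kv.get("user:1"))  # b'alice'
```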

The article also raises the question: why not just use a filesystem for state
storage? Two big issues: (1) it's not very good at creating and updating lots
and lots of tiny files, and (2) aside from some clunky locking, it doesn't have
any good mechanism for controlling concurrent access to data. This is where
databases become valuable.

I build FoundationDB, a database that aims to decouple the storage substrate
(which we are building as an ordered, ACID key-value store) from the data
models and interfaces (which we are building with our community as open-source
projects). That's not to say it solves every problem, but I think it does
align with the author's way of thinking.

~~~
programminggeek
I haven't had a chance to dig into FoundationDB, but your approach sounds very
interesting. I 100% agree that the filesystem has its flaws as the basis for a
production system. What's interesting to me is that once someone has the
database hammer, every problem looks like a nail.

------
lewiscowles1
Hi,

Files: I agree that storing entire files in databases (or at least in the same
database as common field data) is a horrible concept, but paths to files can
and should be stored in the database. That avoids "searching" to locate files
and lets you check modification dates, sort by alternative criteria, and keep
compute-once (or as few times as possible), store-forever data, so that files
can be located easily with just a check to see whether they still exist.

DB procedures being a poor design practice: I disagree completely; they are
merely a separation of logic. Triggers can be a bit much, but procedures can
help, especially when used correctly for the right use cases. It is just as
true that a good programmer/engineer should never set out to favor one method
over another, but should pick the right tool for the job.

DB hammer, programming hammer, buzzword hammer: all pet hates of mine. I have
spent the last decade learning what I do know exceptionally well. I dislike
hammer lovers on most levels and often find that the structures they build
show their level of workmanship.

Provocative article, nice work!

~~~
programminggeek
Your point that DB procedures, as a separation of logic, are a reasonable
choice boils down, I think, to your criteria for a reasonable choice. Good
design is about understanding your priorities and how they affect the
outcome.

For example, if I prioritize tests and testability over just shipping code,
then I will very likely care a lot about where my logic lives and whether I
can test it. If I just care about shipping code, where my logic lives and how
I test it matter less than having a working product.

Well informed tradeoffs are often reasonable choices.

------
fstrati
There is some work on something related to RDBMS independence at MetaModel:
<http://metamodel.eobjects.org>. You may be interested in it, as I am. In
fact, I have ideas to go beyond MetaModel and add software redundancy in the
software layer that deals with the database. My ideas are at:
<https://sourceforge.net/projects/rdbmssr/> and I'm starting to think about
native C++ solutions for Oracle and MS SQL Server as a starting point. I'm
open to discussion: fede [dot] strati [at] gmail [dot] com. Cheers, Federico

~~~
programminggeek
Those are interesting ideas to be sure. DB independence is a worthy and
difficult goal, but you can take it one step further and apply the same
approach to APIs, queues, databases, or any other external system.

------
bryanthompson
The main benefit that I see in writing for the filesystem first is that you
_can_ defer the decision of database type and structure until it actually
becomes important, then you can make that decision for each component of the
system and swap them out independently.

The typical process, for me at least, has been to write out all the
requirements of an app, then to start building migrations, models, and tests.
Flipping this around has made me think about each layer differently and it
_does_ blow minds when you show a simple "jack" that can be swapped out for
any type of database at any time without affecting anything else.
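A minimal sketch of that kind of swappable "jack", assuming Python 3.8+ (`typing.Protocol`); `NoteStore`, `FileNoteStore`, `SqliteNoteStore`, and `rename_note` are hypothetical names, not code from the article. The application function depends only on the interface, so the filesystem version written first can later be replaced by a database-backed one without touching it.

```python
import json
import os
import sqlite3
from typing import Optional, Protocol


class NoteStore(Protocol):
    """The "jack": the only storage surface the application sees."""

    def save(self, note_id: str, note: dict) -> None: ...
    def load(self, note_id: str) -> Optional[dict]: ...


class FileNoteStore:
    """Filesystem-first implementation: one JSON file per note."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, note_id: str) -> str:
        # Hypothetical simplification: assumes note_id is a safe file name.
        return os.path.join(self.root, f"{note_id}.json")

    def save(self, note_id: str, note: dict) -> None:
        tmp = self._path(note_id) + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(note, f)
        os.replace(tmp, self._path(note_id))  # swap in the finished file in one step

    def load(self, note_id: str) -> Optional[dict]:
        try:
            with open(self._path(note_id), encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return None


class SqliteNoteStore:
    """A database-backed drop-in replacement with the same two methods."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS notes (id TEXT PRIMARY KEY, body TEXT NOT NULL)"
            )

    def save(self, note_id: str, note: dict) -> None:
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO notes (id, body) VALUES (?, ?)",
                (note_id, json.dumps(note)),
            )

    def load(self, note_id: str) -> Optional[dict]:
        row = self.conn.execute("SELECT body FROM notes WHERE id = ?", (note_id,)).fetchone()
        return None if row is None else json.loads(row[0])


def rename_note(store: NoteStore, note_id: str, title: str) -> None:
    """Application logic written only against the interface."""
    note = store.load(note_id) or {}
    note["title"] = title
    store.save(note_id, note)
```

The filesystem version writes to a temporary file and then calls `os.replace`, so a reader never sees a half-written note; it does nothing about concurrent writers, which is where swapping in a database starts to pay off.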

------
shawnbaden
Already mentioned and it's a minor point, but regarding putting files into the
database... When I worked at MS on SharePoint, we actually transitioned from
storing files on the file system (Windows) in V1 to storing files in the
database (SQL) in V2 for performance reasons. For a system that works with
lots of files, the file system isn't the place. If it's static content, send
'em to a CDN. If it's active content, put 'em in a database for the reasons
Dave mentioned (concurrency, write speed).

------
andrewflnr
You say "code becomes less object-oriented" like it's a bad thing. Sure, being
tied to a specific implementation of SQL is bad, but OO-structured code is not
necessarily a good thing. If the interesting part about your app is the data,
maybe your app should be structured like your data, rather than vice versa.

~~~
programminggeek
I'm not saying your code should be OO-structured per se. What I'm saying is
simply that your code should influence the database more than the database
influences the code.

------
Dylan16807
Honestly, I've been considering moving some of my personal data from the
filesystem into a database. Filesystems tend to be extremely slow at metadata
operations. I've only used a _single_ program that can quickly list all the
files I have, and it works by bypassing the OS entirely and reading the FAT.

------
rgbrgb
Is this guy saying that we shouldn't be using ORMs?

I guess I don't really think relational code is all that different from OOP
code. Maybe I've been brainwashed by Rails and you guys can help me out.

~~~
programminggeek
In many cases, no, you shouldn't be using ORMs.

