

Ask HN: NoSQL or flat files? - mcartyem

I suspect a lot of companies use complicated tech for no reason. When using a database, even in the simpler case of not needing table joins at all, people shard databases and use caching on the front-end machines. This complicates their design; a recent example is the blog post about Pinterest.

What does that offer that flat files on a shared filesystem don't?
======
UnoriginalGuy
\- Concurrency (of access to records)

\- Performance (e.g. caching, indexing, etc)

\- Less first-party code means less development, debugging, and design time,
with benefits that grow very quickly as a project gets larger

\- Queries are more flexible than APIs (and create a clear separation between
data-source logic and data-usage logic)

\- Scalability; flat files don't have it.

The only benefit of flat files is as a learning exercise. It is useful to
understand how to build a database-like system yourself, so you understand
how things can go wrong and what the edge cases are.

But in production code I have yet to see any situation where a homegrown
flat-file system was better in the long run than an off-the-shelf database
solution. I have seen several flat-file systems that made it impossible for
companies to scale and likely cost them millions in lost revenue as a direct
result, so that is fun too.

------
darkxanthos
When you first start a project and have only one server a single file can be
fine. Once you start getting too many people though you'll start dealing with
file locks. One way of solving that is to map a file to each individual user
(sharding). If you make each entry in the file immutable, this will work
beautifully for a while.
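A minimal sketch of the one-file-per-user, append-only idea, assuming JSON-lines files in a single base directory and at most one writer per user at a time. The layout and the names `user_path`, `append_entry`, and `read_entries` are hypothetical, not something described in the thread:

```python
import json
import os
import tempfile


def user_path(base_dir: str, user_id: str) -> str:
    # Hypothetical layout: one JSON-lines file per user ("sharding" by user).
    return os.path.join(base_dir, f"{user_id}.jsonl")


def append_entry(base_dir: str, user_id: str, entry: dict) -> None:
    # Entries are only ever appended, never edited in place,
    # so earlier lines never change once written.
    with open(user_path(base_dir, user_id), "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def read_entries(base_dir: str, user_id: str) -> list:
    path = user_path(base_dir, user_id)
    if not os.path.exists(path):
        return []  # user has no entries yet
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        append_entry(d, "alice", {"action": "post", "text": "hello"})
        append_entry(d, "alice", {"action": "post", "text": "again"})
        print(len(read_entries(d, "alice")))  # prints 2
```

Because each user writes to their own file, two different users never contend for the same file; contention only comes back when many processes write the same user's file at once.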

If you introduce the idea of groups of users, you'll probably want to create a
separate group text file for each user rather than store the groups as entries
in their pre-existing text file (kind of like an index, though different)...
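One way to read the group suggestion, sketched under the assumption that each user gets a separate, hypothetical `<user_id>.groups` file listing one group per line, so membership can be checked without scanning the user's main entry file:

```python
import os
import tempfile


def groups_path(base_dir: str, user_id: str) -> str:
    # Hypothetical side file, kept apart from the user's main entry file.
    return os.path.join(base_dir, f"{user_id}.groups")


def list_groups(base_dir: str, user_id: str) -> list:
    path = groups_path(base_dir, user_id)
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


def add_to_group(base_dir: str, user_id: str, group: str) -> None:
    # Check-then-append: fine for a single writer, racy with several.
    if group in list_groups(base_dir, user_id):
        return  # already a member; avoid duplicate lines
    with open(groups_path(base_dir, user_id), "a", encoding="utf-8") as f:
        f.write(group + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        add_to_group(d, "alice", "admins")
        add_to_group(d, "alice", "admins")
        add_to_group(d, "alice", "editors")
        print(list_groups(d, "alice"))  # prints ['admins', 'editors']
```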

Main point being: I advocate starting with a text file until the complexity of
managing that file begins to outstrip the complexity of managing a third-party
data store.

Too many people go straight to a third-party data store by default when text
files are extremely capable.

------
anywherenotes
I worked at a company that sells document archiving to businesses, and they
do not use a database. All data is stored on disk in files. They index the
archived data, similar to how Google indexes everything (they don't use
Google). They also use indexed files to keep track of some data, but most of
that is about users and permissions, not the archived documents themselves.
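The commenter doesn't describe the company's actual indexing scheme. As a generic illustration only, here is a minimal inverted index (term to list of files) over a directory of text files, persisted as a single flat JSON file; the tokenizer, file layout, and function names are assumptions:

```python
import json
import os
import re
import tempfile
from collections import defaultdict


def tokenize(text: str) -> list:
    # Lowercased alphanumeric runs; real systems also stem, drop stop words, etc.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(doc_dir: str) -> dict:
    """Map each term to the sorted names of the files that contain it."""
    index = defaultdict(set)
    for name in sorted(os.listdir(doc_dir)):  # assumes doc_dir holds only files
        with open(os.path.join(doc_dir, name), encoding="utf-8") as f:
            for term in tokenize(f.read()):
                index[term].add(name)
    return {term: sorted(names) for term, names in index.items()}


def save_index(index: dict, path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f)


def load_index(path: str) -> dict:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def search(index: dict, query: str) -> list:
    """Return the files containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set(index.get(terms[0], []))
    for term in terms[1:]:
        hits &= set(index.get(term, []))
    return sorted(hits)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as docs, tempfile.TemporaryDirectory() as out:
        for name, text in {"a.txt": "Invoice for ACME Corp",
                           "b.txt": "ACME contract renewal"}.items():
            with open(os.path.join(docs, name), "w", encoding="utf-8") as f:
                f.write(text)
        index_path = os.path.join(out, "index.json")
        save_index(build_index(docs), index_path)
        print(search(load_index(index_path), "acme invoice"))  # prints ['a.txt']
```

Queries then only touch the index file instead of every archived document, which is the main reason this kind of setup can stay fast as the archive grows.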

I don't feel I can give further details, but the company is making money:
nothing you'd read about, but it's doing OK. The solution scales up far enough
that insurance and financial companies are their clients. Terabytes of data,
no big deal. It's quite amazing to see how much faster it runs on customers'
computers with terabytes of data than in-house with test databases on
developers' PCs with only 100 MB of test data. And it doesn't even require an
admin (besides someone to make sure there's enough space on disk).

I think it really depends on your app. Databases aren't always the right
answer, but no one has been fired for using databases. (Maybe that's why
people flock to them; that, plus if everyone's doing it and you're worried
about your resume... well... you'd better do it too.)

------
mattzito
Well, shared filesystems are kind of a mess. If they're some sort of clustered
block filesystem, there are all kinds of weird performance edge cases and
potential split-brain problems.

If you're using NFS or some sort of centralized fileserver, you're going to
bump into UNIX performance limitations around things like opendir() with lots
of files in a single directory. Not to mention, getting any kind of atomicity
over NFS is a huge pain; look at maildir as an example of a working model for
a much simpler situation: email.
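The core of the maildir approach is "write the file somewhere private, then rename it into place," so readers see either nothing or the complete file. A minimal sketch, assuming `tmp/` and `new/` live on the same local POSIX filesystem (where `os.replace` is atomic); it uses `uuid4` for unique names as a simplification of maildir's real naming scheme, and it does not solve NFS's weaker guarantees:

```python
import os
import tempfile
import uuid


def deliver(base_dir: str, data: bytes) -> str:
    """Publish data so readers of new/ never see a partially written file."""
    tmp_dir = os.path.join(base_dir, "tmp")
    new_dir = os.path.join(base_dir, "new")
    os.makedirs(tmp_dir, exist_ok=True)
    os.makedirs(new_dir, exist_ok=True)
    # Simplification: real maildir names combine time, PID and hostname.
    name = uuid.uuid4().hex
    tmp_path = os.path.join(tmp_dir, name)
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach disk before publishing
    # Atomic on a local POSIX filesystem when both dirs are on the same filesystem.
    os.replace(tmp_path, os.path.join(new_dir, name))
    return name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(deliver(d, b"hello") in os.listdir(os.path.join(d, "new")))  # prints True
```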

Sure, of course, in the early days you won't run into these issues, because
you won't have enough objects. But once you get into a few hundred thousand
files, you'll need to start hashing into subdirectories and deal with all the
issues above, depending on your tech.
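Hashing into subdirectories usually means deriving a short directory prefix from a hash of the key, so no single directory grows too large. A minimal sketch, assuming keys are already safe to use as file names; SHA-1 and the two-levels-of-two-hex-characters layout are illustrative choices (256 × 256 = 65,536 leaf directories, so a million files averages about 15 per directory):

```python
import hashlib
import os
import tempfile


def sharded_path(base_dir: str, key: str, levels: int = 2, width: int = 2) -> str:
    """Map a key to base_dir/ab/cd/key using a prefix of its SHA-1 hex digest."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(base_dir, *parts, key)


def write_object(base_dir: str, key: str, data: bytes) -> str:
    path = sharded_path(base_dir, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)  # create ab/cd/ on demand
    with open(path, "wb") as f:
        f.write(data)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(write_object(d, "user42", b"profile data"))
```

Because the path is computed from the key alone, lookups never need to list a directory, which sidesteps the large-directory opendir() problem mentioned above.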

------
srinathsmn
I think the kind and amount of data, and the type of analysis to be performed
on it, should probably drive the decision. Scenarios where you have hundreds
of GBs of data and need to perform text analysis on it would probably make me
think in terms of flat files. But for simple web apps, an RDBMS or a NoSQL
store is probably the way to go.
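As one hedged example of text analysis over flat files: streaming each file line by line keeps memory proportional to the vocabulary rather than to the size of the data. A minimal word-count sketch, where the tokenizer and function name are assumptions:

```python
import collections
import os
import re
import tempfile

WORD = re.compile(r"[a-z0-9]+")


def word_counts(paths) -> collections.Counter:
    """Count words across files, reading one line at a time."""
    counts = collections.Counter()
    for path in paths:
        # errors="replace" keeps one bad byte from aborting a long scan.
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                counts.update(WORD.findall(line.lower()))
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "log.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("The cat\nthe dog\n")
        print(word_counts([path]).most_common(1))  # prints [('the', 2)]
```

Since each file is processed independently, the same loop also splits naturally across processes or machines, one batch of files each.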

------
manidoraisamy
Eventually you might end up writing a database, if you start building on top
of a shared filesystem.

