There is no fundamental difference between a database and a flat file; it is all bytes on disk or in memory in the end. So it is mostly a question of balancing the roles of the hardware, OS, and application software.
For example, if the reason you are using a database is that it does a particularly good job of limiting disk IO, that may not be necessary with fast SSDs. If your reason for splitting into small files is to save RAM, it may not be necessary if you have more RAM. If you want to move to a distributed architecture, maybe your filesystem is not up to the task and you may want a database server.
Bonus points for some open standard, but some kind of understandable format even if proprietary is miles ahead of proprietary and cryptic.
I wish Fusion360 for example had a sanely git-able on-disk format like:
line 0xdeadbeef (0,0) - (123,456)
constraint parallel 0xdeadbeef 0x01234567
(Actually I suppose git - and incremental backups - is the main reason I say I'd still prefer plaintext to db.)
Everything today uses and understands ASCII/UTF-8, and we've used that, plus some handy human conventions like a limited character count between newline characters, to build up tools like git that help manage changes in smaller chunks.
There's nothing preventing us from doing the same for other formats if they become commonly supported or expected. Much like SQLite and some of the tools cropping up.
But ubiquity is extremely special and useful and essential if you want to maintain ownership of your data.
True. I mean, if it were common to put source code and text into SQLite files, then most tools would probably also deal with it. However, there is a case to be made for simplicity. For instance, writing a diff algorithm for plain text files is much simpler and easier to understand than writing diff algorithms for structured or binary data.
I'd argue the opposite. Coming up with a good, human-readable diff on unstructured text within acceptable runtime is not that easy. LCS on many sequences is NP-hard, and even the classic two-sequence dynamic program has unacceptable performance and memory footprint on large inputs.
However, diffing something structured, especially when it has unique keys like most databases do, is really easy. Structure gives you a systematic way to compare data and a handy way to express the differences.
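A minimal sketch of the contrast (the table data is made up for illustration): rows keyed by a primary key can be diffed with plain set operations, while unstructured text needs an LCS-style algorithm, here via Python's stdlib difflib.

```python
import difflib

# Two versions of a keyed "table": primary key -> row data.
old = {1: ("alice", "admin"), 2: ("bob", "user")}
new = {1: ("alice", "owner"), 3: ("carol", "user")}

# With unique keys, the diff is just set arithmetic:
added = new.keys() - old.keys()
removed = old.keys() - new.keys()
changed = {k for k in old.keys() & new.keys() if old[k] != new[k]}

# Contrast with plain text, where there is no key to anchor on and an
# LCS-style algorithm has to infer which lines correspond:
text_diff = list(difflib.unified_diff(["a\n", "b\n"], ["a\n", "c\n"]))
```

The keyed diff is a few set operations; the text diff has to reconstruct correspondence from content alone, which is where the runtime and readability problems come from.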
An Sqlite database may be a binary blob, but thanks to the Sqlite software being open source and ubiquitous, it's pretty easily comprehensible. Sure, not quite as accessible as plain text, but not far from it.
So when you need something just a little bit more advanced than plain text (i.e. SQL functionality), it seems like a good compromise that doesn't (pun intended) compromise openness and accessibility too much.
But, hey, does "plain" text include UTF-8 nowadays, or does it still mean ASCII-only?
There are two standards for this common to CAD, called IGES and STEP. They are both plaintext, though their gittability is questionable.
Is there a non-proprietary somewhat sane way to do version control for spreadsheets (Excel and LibreOffice)?
That is not exactly what proprietary means. Though I feel the same - plain text can be versioned and managed by various tools and that is great!
Plaintext could be in a proprietary format, but that's far better than a proprietary compressed database where it's a job to work out what even made it (if that wasn't proprietary itself!). Proprietary plaintext might even be better than an open-standard blobby format, depending on what you want to do with it. (Better for git and backup, worse for opening in competitor tools.)
If your code is Free and Open Source, then its file-format is presumably not proprietary, whether or not it makes use of an existing standard format (like JSON) or solution (like SQLite).
Do you believe you can do a better job than the dozens of people who have contributed to a DBMS? Go ahead.
Do you not need reliability, integrity, security, or performance? Go ahead.
So use a file to store data instead of database but know the trade-offs.
As soon as you start processing forms and doing anything transactional based on that, you'll be quickly reinventing a lot of wheels if you aren't using a DBMS of some sort.
Doesn't Hacker News run on flat files?
And if you e.g. have multiple dataflow paths with different priorities then you also probably shouldn't be using a DBMS, because they're too much of a closed world. Using database technology as a library would be a better approach; otherwise sooner or later you'll find you need to customize the internals of your DBMS, and with most extant products that's somewhere between hard and impossible.
Could you expand? Does this mean you wouldn't use an ACID database for CRUD?
99% of the time all you want to do is a) commit input events to a durable store as quickly as possible b) read a state-of-the-world that's consistent in the sense that it represents all the consequences of all the inputs up until some (logical) time t, and IME an event sourcing model is if anything better at doing that than an RDBMS is.
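A minimal sketch of that two-part shape, with all names hypothetical: (a) append incoming events to a durable log as fast as possible, and (b) derive a consistent state-of-the-world by folding over every event up to a logical time t.

```python
# Hypothetical minimal event store. In practice `events` would be an
# append-only file or log; here it's an in-memory list for illustration.
events = []

def commit(event):
    # (a) commit the input event to the store as quickly as possible
    events.append(event)

def state_at(t):
    # (b) state reflecting all consequences of inputs up to logical time t
    # (here, events are account deltas and state is a balance)
    balance = 0
    for i, delta in enumerate(events):
        if i >= t:
            break
        balance += delta
    return balance

commit(100)
commit(-30)
commit(5)
```

Real systems snapshot the fold rather than replaying from zero, but the consistency guarantee is the same: `state_at(t)` never shows a partial effect of a later event.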
A database has a whole engine, which I'd argue is a pretty fundamental difference.
I think I get what you were trying to say, though: there's no fundamental difference between database tables and flat files, and there's no fundamental difference between a database and a filesystem.
But "flat" essentially means "one table that doesn't reference (and isn't referenced by) other tables in a reliable way." It's a way of saying 2 dimensions (flat) versus 3+ dimensions (relational). The moment you start having multiple plain text files where columns act as primary and foreign keys, the system is no longer flat, despite using plain text files.
It seems to me that TFA and most comments here are really just talking about using a DBMS vs using plain text files, not using flat vs using relations.
You can tar up your flat files as well, but that is a separate process and can be time consuming for larger directories.
Perhaps there is a use case for a file system backed by a portable database like SQLite, such that you can just mount the volume and send it around without pain, but still prefer the normal command line utils (ls, rm, grep, etc.) over interacting via SQL.
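A hypothetical sketch of the idea (a real mountable version would need FUSE; the paths and data here are made up): a directory tree stored as rows in one portable SQLite file, with the ls/cat equivalents as queries.

```python
import sqlite3

# One portable file holding many named blobs; ":memory:" stands in for a
# real .db file on disk that you could send around.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (path TEXT PRIMARY KEY, data BLOB)")
db.execute("INSERT INTO files VALUES (?, ?)", ("notes/todo.txt", b"buy milk"))
db.commit()

# 'ls' equivalent:
paths = [row[0] for row in db.execute("SELECT path FROM files ORDER BY path")]

# 'cat' equivalent:
(data,) = db.execute("SELECT data FROM files WHERE path = ?",
                     ("notes/todo.txt",)).fetchone()
```

A FUSE driver on top of such a table is what would let ls, rm, and grep work unmodified; the sketch only shows the storage side of that idea.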
(This also works without Docker of course, but Docker makes me feel safer when I do stuff like this)
Since we're talking about copying a folder, I'm not sure why you'd feel safer doing it via an additional layer of abstraction.
I know that PG data is usually in /var/lib/postgresql/data, but I like Docker because if I ever need a 2nd or 3rd database on one system, I can give them more meaningful names and not feel like I'm touching Important System-Wide Stuff.
My intuition is that you're expected to do some kind of migration or export/import, because the on-disk format probably changes between major versions.
We have file formats like ODF which are just zip packages with regular files inside
A DBMS (and in particular an RDBMS) is not merely a persistence medium for an application - a DBMS's main role is to _share_ data between applications.
Curious as to what is not just "all bytes on a disk/memory in the end"? Also can we please stop calling datastores databases?
Many top posts in recent weeks were all about using SQLite for use cases it wasn't originally designed for - using it as a distributed database, even.
Now, here's a post which misses the EXACT use case SQLite was built for, and it's barely even mentioned.
SQLite is quite literally the best solution here. It's a file, you open it, and boom, you've got a database. You can explore it with the sqlite3 CLI or DataGrip or anything you want.
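For instance, exploring an SQLite file needs nothing beyond Python's standard library (the table and data here are made up; point `connect` at a real .db file instead of ":memory:"):

```python
import sqlite3

con = sqlite3.connect(":memory:")  # replace with "app.db" for a real file
con.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
con.execute("INSERT INTO posts (title) VALUES ('hello world')")

# The schema is self-describing, which is what makes the "binary blob"
# explorable: the sqlite_master table lists everything in the file.
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
rows = con.execute("SELECT id, title FROM posts").fetchall()
```

The same queries work from the sqlite3 CLI, DataGrip, or any other SQL tool, which is the sense in which the file is "open" despite being binary.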
This is like saying "let's use PNG for animated images" and then searching for "a format to handle lossless images with transparency" without mentioning PNG.
Boom! What is the program to open (understand) it? Even Elixir's core lib cannot understand it.
Or any SQL DB viewer, like DataGrip.
And guess what the market situation for database viewers is like...?
After Perl I graduated to doing web stuff with a Microsoft Access (!) file stored in the disk alongside my .ASP scripts, exactly like one would do with sqlite today. It was quite performant.
I wonder if your "internet mentor" was the same as mine: Matt from Matt's Script Archive @ http://www.scriptarchive.com.
I tried to ask for mSQL or MySQL, but was told 'no', that they were 'too heavy' for what we needed, so the folks doing Perl were pushed back to the dark ages. We also weren't allowed to use 'shtml' - basic server-side processing for things like 'include' to pull in common footers. Whenever we needed to make a footer change, we had to write multiple search/replace scripts across dozens of client sites. The new 'asp' folks lorded this over us for a while with "look how advanced ASP and Microsoft is - MS really gets the web". I showed what I'd been doing with PHP for 2-3 years at that point, with includes and more. But "well, that's not really powerful enough". Kept moving the goalposts.
tldr - there were many options more advanced than flat files back in the earliest web days, but people were often hamstrung by short-sighted tech decisions made by people who were not responsible for actually delivering the work. Has much changed in the past 25+ years? ;)
I received a panicked call from the admin that turned down my request. He wasn’t happy with me but understood the role he played in creating an expensive disaster.
Plenty of learning to go around in that experience :)
I don’t use MySQL or PHP for anything nowadays, but they were extremely useful to learn in the early 2000s.
Currently I'm looking at GravCMS as an alternative. It's free initially, but it can become somewhat expensive with many official plugins. But its file format is Markdown, and one can combine multiple files into a so-called modular page. It has a backend for editing, forms, and e-mailing of form submissions. Seems perfect for small and mid-sized company web sites.
Another option I considered was Kirby. Its backend UI is configurable. That's nice in theory, but the documentation is somewhat lacking, in my opinion. I've used the starter pack and it took me hours to find the one command needed to add new pages. Its content format is also custom, not Markdown. Finally, it's €100 per site.
Also, a few days ago, I stumbled upon Typemill, which I will check out next week.
One of the newer features is the so-called FlexObjects. It’s an absolutely great idea for a CMS, but explaining the possibilities and technical intricacies seems moot, as the documentation and Discord community are a better place to start learning.
Websites built with Grav are comparable to SSGs in speed, while offering a different kind of ease of use and requiring much less time investment to roll out.
And sticking to the topic: being completely flat-file centric, those websites are a breeze to maintain and, in my albeit limited experience, also a bit sturdier security-wise.
I built many of my (project) websites with it; it‘s blazingly fast (performance-wise and in terms of building your site), super easy to customize, and I never missed my database.
Kirby really makes it easy to create unique stuff, there’s also a nice "Made with Kirby" catalog here:
Still, your average extfs/NTFS is a really bad database. There's a reason people created "real" databases, and as far as I can tell the only advantage of flat files is that you can use a text editor instead of an SQL editor to view them, which seems pretty minor given all the downsides.
SQLite is probably better in almost every scenario.
If you skip that and do your own ingestion and querying... haven't you basically just built your own database?
This might have some performance shortcomings, but given the maturity of caching these days, you can probably just add a caching layer in front of the inefficient query engine when the need arises.
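A minimal sketch of that "caching layer in front of a slow query engine" idea, with all names hypothetical: memoize query results so repeated queries skip the scan.

```python
from functools import lru_cache

# Stand-in for a large flat-file dataset.
DATA = list(range(1_000_000))

@lru_cache(maxsize=128)
def slow_query(threshold):
    # Stands in for an inefficient query engine doing a full scan
    # over flat files on every call.
    return sum(1 for x in DATA if x < threshold)

slow_query(10)   # first call: full scan, result cached
slow_query(10)   # second call: served from the cache
hits = slow_query.cache_info().hits
```

A real deployment would use something like a reverse proxy or Redis with explicit invalidation when files change, but the shape is the same: the slow path runs once per distinct query.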
The CMS I picked is PicoCMS - what flat-file stuff have you used and can recommend? And, probably important, how well does it scale? I can't give much input here; my blog has 21 posts, so... no ceiling in sight yet.
Here I made a post where I ramble a bit about blog systems, as far as I got in contact: https://riidom.eu/posts/021_en
The result looks very good, so I don't think it's worth it. By the way, since you copied your posts to your new blog but still keep them on the old one, you may want to add the canonical meta tag to the old ones.
But I'll look into the issue, thanks for the feedback!
This can lead to easier search and manipulation.
I use Logseq and Noteplan, both store data in plaintext.
As a secondary editor to both I use Sublime Text for very fast search and replace, or diagnosing why Logseq might be having issues.
I’ve had issues before in other note taking / task apps and when I run into problems it is much harder to search and manipulate a database file. That’s even if the app has a sensible database format.
Of course depending on how you use and might need edit data you might find databases are better for you. I’ve always found it easier to work with plaintext for note taking/ task apps.
Basically, if you have your workflow written up anywhere I’d love to read it.
I don't see much value at all in small software for the sake of being small, but files are something really special.
You can SyncThing them. You can version control them. You can browse and delete them.
And best of all, they're not really flat anymore. A file is a typed object that tools know how to work with (as long as you do the sane thing and use a common file type).
Databases are often appropriate, but in those cases, I think sqlite is the better choice more often than not, unless you've truly got a large number of users all doing things at once.
if you need more than that for a static blog (like Gatsby/NextJS), I’d consider you unreasonable
In general I liked Zola but I found it hard to work with. I really wanted to set a few metadata fields, pick and apply a theme, put a few meta-pages (like blog index), write 3 articles and be done with it. And it didn't serve me well for that.
Have you considered writing a guide on how you set up your Zola blog? I'd read it and try it.
in Zola terms theme really means a pre-built site
that said, I don’t have a guide, but I have made a sleek template for Zola (link above) which anybody can use in minutes and deploy on Netlify from GitHub
the template is a blog with a projects section, although it can be extended to add every section imaginable
the price for the theme is significantly lower than the hours you need to put in to build a similar one from scratch
(sorry for ad)
I see the biggest downside being that the webserver will call your PHP app which will then proxy the data through itself one time. Well, maybe you can use ReactPHP or Swoole and then it’s faster than Node.js even…
Also you then have more programmable options, for example around access control lists or capabilities to access stuff, caching and expiring caches.
How does MySQL compare to ext3 for storing terabytes of data?
As a side note: what do y’all think of FUSE to Amazon S3? Just map a path to it and let it do the rest. Goofys, I think, skimps on POSIX compliance in favor of speed.
FUSE for S3 might work, but you run the risk of a generic abstraction triggering a massive set of requests that cost a lot. Something like running find or checking the size of a directory could end up costing you thousands.
The Flat File Extractor command line utility has saved my bacon a few times in my career. If I'm dealing with legacy code, data migrations, needing to quickly perform some surgical extraction or munging into another format - ffe is the tool for me.
EDIT: looks like someone's forked and updated the tool. https://github.com/igitur/ffe
If I did the work today I'd probably store the metadata in an SQLite database running on the frontend, but I'd keep the poem copies in their own files in the folder.
While that may be a problem for some, I don't think it's the problem for the purposes of this article and discussion.
Different situations have different requirements.
If you have zero control over your environment and no ability to inspect it, then I agree that static files are probably the way to go.
>> “… means being able to start your website by dumping everything into a folder and you're pretty much done (1 step installs).”
>> “Dumping everything into a folder and it just works (1 step installs) makes it easy for even beginners to get a website running, but also it makes your website, blog, forum, journal, or whatever, portable. Very easily portable. You need to put your website on a thumb drive and/or move it to a different domain? Done. Easy. It's just like backing up any folder.”
If there is any possible need to integrate into a larger overall system in ways that have yet to be fully imagined or defined, a DBMS might be a better option.
So in the sense of "put up a website by just transferring static files", an Sqlite database isn't quite "a flat file". It's a single file, yes; but its content is "bumpy" -- not "flat".
(Then again, once you have that translate-content-to-HTML process installed on your Web server, updating the content is reduced to just transferring a new .db file to it.)
HTMLy can easily work with 20,000 posts without any noteworthy slowdown.
Can anyone explain how sites like Flatboard handle user input in a flat-file system? How do they handle conflicts with just one file? What's the workflow and setup like - can you use serverless, or do you have to have a full server?
There is a long line of people on the internet complaining that git is hard to learn. But I've never met anybody in the real world who had any problem with the basic cycle of edit - push - review - accept (or the even simpler edit - push).
I wrote a bit about why I moved away from this, but in a nutshell: I am not using "modern" deployment techniques, and uploading the compiled blob each time I write a new post felt like a huge drag to me. This is a general problem pretty much baked into SSGs, but I have hope that someone will address it in the near future.