Smaller is better – The rise, fall, and rise of flat file software (wilcosky.com)
232 points by riidom 33 days ago | 150 comments

Maybe it is just a reflection on the state of filesystems vs databases.

There is no fundamental difference between a database and a flat file, it is all bytes on a disk/memory in the end. So it is mostly a question of balancing the roles of the hardware, OS, and application software.

For example, if the reason you are using a database is that it does a particularly good job at limiting disk IO, then it may not be necessary with fast SSDs. If your reason for splitting into small files is to save RAM, it may not be necessary if you have more RAM. If you want to move to a distributed architecture, maybe your filesystem is not up to the task and you may want a database server.

For me it's about proprietary vs. readable - unless it's obviously necessary for performance I think I'd still prefer 'flat file' to sqlite, but basically I just want something that isn't an incomprehensible blob, that I can use with other tools etc.

Bonus points for some open standard, but some kind of understandable format even if proprietary is miles ahead of proprietary and cryptic.

I wish Fusion360 for example had a sanely git-able on-disk format like:

    line 0xdeadbeef (0,0) - (123,456)
    line 0x01234567
    constraint parallel 0xdeadbeef 0x01234567
Etc., or something.

(Actually I suppose git - and incremental backups - is the main reason I say I'd still prefer plaintext to db.)

Perhaps it was deliberate, but your last paragraph (plaintext vs db) really just reinforces your point of standards. There's nothing particularly special about plaintext aside from its total ubiquity.

Everything today uses and understands ASCII/UTF-8 and we've used that, plus some handy human conventions like limited character count between newline characters, to build up tools like git to help manage the changes in smaller chunks.

There's nothing preventing us from doing the same for other formats if they become commonly supported or expected. Much like SQLite and some of the tools cropping up.

> There's nothing particularly special about plaintext aside from its total ubiquity.

But ubiquity is extremely special and useful and essential if you want to maintain ownership of your data.

> There's nothing preventing us from doing the same for other formats if they become commonly supported or expected. Much like SQLite and some of the tools cropping up.

True. I mean, if it were common to put source code and text into SQLite files, then most tools would probably also deal with it. However, there is a case to be made for simplicity. For instance, writing a diff algorithm for plain text files is much simpler and easier to understand than writing diff algorithms for structured text or binary formats.

> For instance, writing a diff algorithm for plain text files is much simpler and easier to understand than writing diff algorithms for structured text or binary formats.

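I'd argue the opposite. Coming up with a good, human-readable diff on unstructured text within acceptable runtime is not that easy. LCS for two files is not NP-hard (that only holds for an arbitrary number of sequences), but the naive dynamic-programming algorithm needs O(nm) time and memory for files of n and m lines, which is unacceptable for large inputs.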

However diffing something structured, especially when it has unique keys like in most databases, is really easy. Structure gives you an easy way to compare data systematically and is handy to express differences.
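To make the contrast concrete, here is a minimal Python sketch of both sides (the function names are illustrative, not from any library). `line_diff` fills the textbook O(n*m) LCS table, the naive approach the comment says scales badly (real tools such as git default to Myers' algorithm instead), while `diff_by_key` shows how little a keyed diff needs: one hash lookup per key and no alignment problem at all.

```python
def line_diff(old: str, new: str) -> list[str]:
    """Naive LCS-based line diff: O(n*m) time and memory for n and m lines."""
    a, b = old.splitlines(), new.splitlines()
    n, m = len(a), len(b)
    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    # Walk the table to emit kept ("  "), removed ("- ") and added ("+ ") lines.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append("  " + a[i])
            i, j = i + 1, j + 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            out.append("- " + a[i])
            i += 1
        else:
            out.append("+ " + b[j])
            j += 1
    out += ["- " + line for line in a[i:]]
    out += ["+ " + line for line in b[j:]]
    return out


def diff_by_key(old: dict, new: dict) -> tuple[dict, dict, dict]:
    """Keyed diff: records are matched by primary key, so nothing needs aligning."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {k: (old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k]}
    return added, removed, changed
```

`line_diff` is only a demonstration; for real line diffs Python's standard `difflib` module is the practical choice.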

> I think I'd still prefer 'flat file' to sqlite, but basically I just want something that isn't an incomprehensible blob, that I can use with other tools etc.

An Sqlite database may be a binary blob, but thanks to the Sqlite software being open source and ubiquitous, it's pretty easily comprehensible. Sure, not quite as accessible as plain text[1], but not far from it.

So when you need something just a little bit more advanced than plain text (i.e. SQL functionality), it seems like a good compromise that doesn't -- pun intended -- compromise openness and accessibility too much.


[1]: But, hey, does "plain" text include UTF-8 nowadays or does it still mean ASCII-only?

> I wish Fusion360 for example had a sanely git-able on-disk format like:

There are two standards for this common to CAD, called IGES and STEP. They are both plaintext, though their gittability is questionable.

They are structured text though, so I don't think you can just do deltas like with code. You'd have to treat them like binaries, would you not? That's how we always kept our XML files in git: forced to be binaries.

I would really like to see smudge/clean-like filters for git that convert between git objects and working-directory files to deal with structured data or binary files more efficiently, so that diffs are minimal and readable even with those formats.
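Git already provides the hook itself: a `filter=<name>` attribute in `.gitattributes` plus `filter.<name>.clean` and `filter.<name>.smudge` commands in the git config. What is usually missing is a per-format converter. A minimal sketch for JSON, assuming a hypothetical filter named `canonjson` registered with `git config filter.canonjson.clean "python3 canon_json.py --clean"`, `git config filter.canonjson.smudge cat` and a `*.json filter=canonjson` line in `.gitattributes` (the script name is also an assumption):

```python
import json
import sys


def canonicalize(text: str) -> str:
    """Re-serialize JSON with sorted keys and one value per line, so edits give small diffs."""
    if not text.strip():
        return text  # leave empty files untouched
    data = json.loads(text)
    return json.dumps(data, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


if __name__ == "__main__" and sys.argv[1:] == ["--clean"]:
    # Git pipes the working-tree content to stdin and stores whatever we write to stdout.
    sys.stdout.write(canonicalize(sys.stdin.read()))
```

Because normalization happens before content reaches the repository, reordering keys or re-indenting in an editor no longer shows up as a change. The same pattern could apply to other structured formats, provided the conversion is deterministic.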

Code is (generally) structured too?

> main reason I say I'd still prefer plaintext to db

Is there a non-proprietary somewhat sane way to do version control for spreadsheets (Excel and LibreOffice)?

Tell LibreOffice to save the file as Flat XML.

> For me it's about proprietary vs. readable - unless it's obviously necessary for performance I think I'd still prefer 'flat file' to sqlite, but basically I just want something that isn't an incomprehensible blob, that I can use with other tools etc.

That is not exactly what proprietary means. Though I feel the same - plain text can be versioned and managed by various tools and that is great!

I was not aware I gave a definition for it. I didn't mean it as an antonym for 'readable', if that's what you mean.

Plaintext could be in a proprietary format, but that's far better than a proprietary compressed database where it's a job to work out what even made it (if that wasn't proprietary itself!). Proprietary plaintext might even be better than an open-standard blobby format, depending on what you want to do with it. (Better for git and backup, worse for opening in competitor tools.)

I think their point was that proprietary is not a synonym for bespoke.

If your code is Free and Open Source, then its file-format is presumably not proprietary, whether or not it makes use of an existing standard format (like JSON) or solution (like SQLite).

A database can be relational, it can make guarantees, it can be ACID-compliant. With a file, you are responsible for the performance, reliability, integrity and security of the data.

Do you believe that you can do a better job than those tens of guys who contributed to a DBMS? Go ahead.

Do you not need reliability, integrity, security, performance? Go ahead.

So use a file to store data instead of database but know the trade-offs.

Yeah, none of that may matter for a "brochure" type of website, or a blog where you are the only person updating it.

As soon as you start processing forms and doing anything transactional based on that, you'll be quickly reinventing a lot of wheels if you aren't using a DBMS of some sort.

> none of that may matter for a "brochure" type of website, or a blog where you are the only person updating it

Doesn't Hacker News run on flat files?

But only the wheels you need, when you need them. Don’t carry the overhead unless needed.

ACID comes with costs, those guys may be smart but they're not magicians. As a rule of thumb, if you never handle a transaction failure or deadlock with something that is not retrying or failing, you shouldn't be using an ACID-based system since you're not getting any benefit from it.

And if you e.g. have multiple dataflow paths with different priorities then you also probably shouldn't be using a DBMS, because they're too much of a closed world. Using database technology as a library would be a better approach; otherwise sooner or later you'll find you need to customize the internals of your DBMS, and with most extant products that's somewhere between hard and impossible.
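To illustrate the distinction in the rule of thumb above, here is a minimal Python/SQLite sketch (table and function names are hypothetical) of a transaction whose failure is handled by something other than a blind retry. The credit is applied first and the debit second, so when the debit violates the CHECK constraint, the automatic rollback of the already-applied credit is exactly what the ACID machinery is paying for.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """In-memory bank with a CHECK constraint that forbids negative balances."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:
        conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])
    return conn


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move `amount` from src to dst; report failure instead of retrying."""
    try:
        with conn:  # commits on success, rolls back both UPDATEs on any exception
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
    except sqlite3.IntegrityError:
        # The CHECK constraint fired on the debit. Atomicity undid the credit too,
        # so the caller can make a business decision (reject, ask for less, ...)
        # on a consistent state.
        return False
    return True
```

If every `except` branch in an application just retries or gives up, the same guarantees are still being paid for but never used, which is the point of the rule of thumb.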

> As a rule of thumb, if you never handle a transaction failure or deadlock with something that is not retrying or failing, you shouldn't be using an ACID-based system since you're not getting any benefit from it.

Could you expand? Does this mean you wouldn't use an ACID database for CRUD?

Depends how you're interpreting the "U" in CRUD - if you have some complex business logic to resolve conflicting updates to the same object then maybe an ACID database would make sense. But certainly if you're just doing last write wins like 99% of CRUD webapps then using an ACID database is wasteful and pointless. The whole point of ACID is to say that you need these globally isolated transactions and you're willing to pay for them, and while database engines are pretty smart with MVCC and so on there's fundamentally only so much they can do.

99% of the time all you want to do is a) commit input events to a durable store as quickly as possible b) read a state-of-the-world that's consistent in the sense that it represents all the consequences of all the inputs up until some (logical) time t, and IME an event sourcing model is if anything better at doing that than an RDBMS is.
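A minimal sketch of that event-sourcing model, assuming a single writer, a JSON Lines file as the durable store, and a hypothetical `{"account", "amount"}` event schema. Step (a) is a durable append; step (b) folds all events up to logical time `t` (here simply the event's position in the log) into a consistent snapshot.

```python
import json
import os
from pathlib import Path


class EventLog:
    """Append-only JSON Lines event store; logical time t is the event's position."""

    def __init__(self, path):
        self.path = Path(path)
        self.next_t = sum(1 for _ in self._events())

    def _events(self):
        if self.path.exists():
            with self.path.open(encoding="utf-8") as f:
                for line in f:
                    yield json.loads(line)

    def append(self, event: dict) -> int:
        """(a) Commit an input event durably before acknowledging it."""
        t = self.next_t
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"t": t, **event}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.next_t += 1
        return t

    def state_at(self, t: int) -> dict:
        """(b) Fold every event with logical time <= t into account balances."""
        balances = {}
        for e in self._events():
            if e["t"] > t:
                break
            balances[e["account"]] = balances.get(e["account"], 0) + e["amount"]
        return balances
```

A real event store would add concurrency control and periodic snapshots so that `state_at` does not replay the whole log every time.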

The filesystem provides its own ACID guarantees too. Well, some filesystems.
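What you actually get depends on the filesystem and OS. On POSIX systems, the usual way to get atomic, durable whole-file updates out of a plain filesystem is write to a temp file, fsync, then rename over the original. A minimal Python sketch, assuming POSIX rename semantics (`os.replace` also works on Windows, where the directory fsync step is skipped):

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` so readers see either the old or the new contents, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # new bytes are on disk before they become visible
        os.replace(tmp_path, path)  # atomic rename over the old file (POSIX)
    except BaseException:
        os.unlink(tmp_path)
        raise
    if hasattr(os, "O_DIRECTORY"):
        # Also persist the directory entry, so the rename itself survives a crash.
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

This covers atomicity and durability for a single file; isolation and consistency across several files are still up to the application.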

Portability to other systems.

Which platform do you miss, which SQLite does not support?

Portability is more important than guaranteed atomicity, consistency, isolation and durability?

I mean, not always, but I can certainly imagine cases where it might be.

Sometimes. It really depends on the use case. Use the right tool for the job.

What kind of portability? Why isn’t a db portable?

A SQL dump is incredibly portable.
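For SQLite specifically, producing such a dump needs no extra tooling: the `sqlite3` CLI has a `.dump` command, and Python's standard `sqlite3` module exposes the same idea as `Connection.iterdump()`. A short sketch (the helper names are hypothetical):

```python
import sqlite3


def dump_to_sql(conn: sqlite3.Connection) -> str:
    """Serialize a whole SQLite database into a plain-text SQL script."""
    return "\n".join(conn.iterdump())


def restore_from_sql(script: str) -> sqlite3.Connection:
    """Rebuild a database from such a script in a fresh in-memory connection."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(script)
    return conn
```

The output is ordinary SQL text, so it can be diffed or committed to git; loading it into a different engine may still need edits for dialect differences.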

> There is no fundamental difference between a database and a flat file, it is all bytes on a disk/memory in the end.

A database has a whole engine, which I'd argue is a pretty fundamental difference.

I think I get what you were trying to say, though: there's no fundamental difference between database tables and flat files, and there's no fundamental difference between a database and a filesystem.

> there's no fundamental difference between database tables and flat files

But "flat" essentially means "one table that doesn't reference (and isn't referenced by) other tables in a reliable way." It's a way of saying 2 dimensions (flat) versus 3+ dimensions (relational). The moment you start having multiple plain text files where columns act as primary and foreign keys, the system is no longer flat, despite using plain text files.

It seems to me that TFA and most comments here are really just talking about using a DBMS vs using plain text files, not using flat vs using relations.
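A small illustration of the point about keys across plain text files, with two hypothetical CSV files where `posts.author_id` acts as a foreign key into `authors.id`. The data is still plain text, but reading it is already a relational join, and nothing except your own code enforces referential integrity.

```python
import csv
import io


def read_rows(text: str) -> list:
    """Parse CSV text (with a header row) into dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def posts_with_authors(authors_csv: str, posts_csv: str) -> list:
    """Inner join: posts.author_id -> authors.id."""
    authors = {row["id"]: row for row in read_rows(authors_csv)}  # primary-key index
    return [
        (authors[post["author_id"]]["name"], post["title"])
        for post in read_rows(posts_csv)
        # Nothing enforces the foreign key: dangling ids are silently dropped here.
        if post["author_id"] in authors
    ]
```

In practice the two strings would come from files on disk; an RDBMS would instead reject the dangling row at write time.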

File systems are also "whole engines". Otherwise we wouldn't have named files or directory structures.

How is a filesystem not a "whole engine"?

A filesystem is, but a flat file alone is not. That was the only point of my comment - I was just being pedantic.

Some databases also have the ability to be instantly portable, like a sqlite file.

You can tar up your flat files as well, but that is a separate process and can be time consuming for larger directories.

Perhaps there is a use case for a file system backed by a portable database like SQLite, such that you can just mount the volume and send it around without pain, but still use normal command line utils (ls, rm, grep, etc.) instead of interacting via SQL.

I’ve had this research paper on my reading list for a while (but haven’t gotten to reading the full thing)[1]. It's not necessarily just a file system: it lays out an entire operating system backed by a database, where OS state interactions are done through SQL.

1. https://vldb.org/pvldb/vol15/p21-skiadopoulos.pdf

If you're running Postgres in Docker, you can just stop the container and tar up the data volume and start it somewhere else

(This also works without Docker of course, but Docker makes me feel safer when I do stuff like this)

This procedure is officially documented (1).

Since we're talking about copying a folder, I'm not sure why you'd feel safer doing it via an additional layer of abstraction.

(1) https://www.postgresql.org/docs/14/backup-file.html

Why does Docker make you feel safer here? Is it because you feel more confident there is no database process still running?

It's more about, I don't like having a global directory namespace that everything gets dumped into.

I know that PG data is usually in /var/lib/postgresql/data, but I like Docker because if I ever need a 2nd or 3rd database on one system, I can give them more meaningful names and not feel like I'm touching Important System-Wide Stuff.

If you're an ignorant fool such as myself, your knowledge of a database's internals is sufficient only to inspire fear. With Docker I know that whatever black magic the database uses is contained within a black box I can copy wholesale. Anything important must be in there, since I didn't give it anything else!

Does this work even when Postgres is updated to a newer version? This would make me want to use Docker as well.

I'd have to read the Postgres docs to be sure. Postgres has really good docs.

My intuition is that you're expected to do some kind of migration or export/import, because the on-disk format probably changes between major versions.

> You can tar up your flat files as well, but that is a separate process and can be time consuming for larger directories.

We have file formats like ODF which are just zip packages with regular files inside
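As a sketch of that container approach, Python's standard `zipfile` module can bundle a set of flat text files into one archive and read them back. The member names are hypothetical; real ODF packages additionally require an uncompressed `mimetype` entry as the first member.

```python
import io
import zipfile


def pack(files: dict) -> bytes:
    """Bundle {name: text} into a single zip archive (in memory here, a file on disk in practice)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, text in files.items():
            zf.writestr(name, text)  # str content is stored UTF-8 encoded
    return buf.getvalue()


def unpack(blob: bytes) -> dict:
    """Read every member back as text."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {name: zf.read(name).decode("utf-8") for name in zf.namelist()}
```

The archive moves around as one file, while its contents stay ordinary text files that any unzip tool can expose.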

I believe hdf5 is the format that can serve that purpose.

This opinion is quite popular nowadays but IMHO it is caused by misunderstanding what a database and database management system is and what it is for.

A DBMS (and in particular an RDBMS) is not merely a persistence medium for an application; a DBMS's main role is to _share_ data between applications.

I don’t understand this as a distinction from file systems. Networked file systems exist, and I have a lot of production code that shares data as files in some networked file or blob system. Often databases even just store a reference to a file path for stuff like images or other large artifacts that are shared from some file system.

Indeed - a filesystem is a form of DBMS and provides data sharing capabilities. These capabilities are very poor though - especially its data model (or rather lack of one) makes data sharing difficult as applications cannot rely on it and implement their own proprietary data model instead.

Read-only sharing is trivial. Concurrent write accesses, not so much.

> There is no fundamental difference between a database and a flat file, it is all bytes on a disk/memory in the end.

Curious as to what is not just "all bytes on a disk/memory in the end"? Also can we please stop calling datastores databases?

I think the direction is moving toward SQLite, which is a flat file database with more installs than any other, mostly through apps.

A flat file database? It is a single file. This means you need specialized tools to read it, and if you want to sync it, a lot of cloud storage solutions will force you to resync on every change. This is why a transparent file-system DB is great for integration into native storage. Databases don't really optimize for the file system layer. I am hopeful that one day we'll have an auto-organizing flat file system that sorts and sections data automatically in a predictable manner.

I love this point. I had honestly never thought of the question this way before.

HN, are you feeling okay?

Many top posts in the past weeks were all about using SQLite for use cases it wasn't originally designed for, even using it as a distributed database.

Now, here's a post which misses the EXACT use case SQLite was built for, and it's barely even mentioned.

SQLite is quite literally the best solution here. It's a file, you open it, and boom, you've got a database. You can explore it with the sqlite3 CLI or DataGrip or anything you want.

This is like saying "lets use png for animated images" and then searching for "a format to handle uncompressed images with transparency" and not mentioning png.

> It's a file, you open it, and boom, you've got a database.

Boom! What is the program to open (understand) it? Even Elixir's core library cannot understand it.

Any SQLite client, which has not only been included on many devices for 20+ years, but you can also just install `sqlite3` and open it manually anyways. You also cannot open an image file without an image viewer, a pdf file without a pdf viewer, and any "hand rolled" raw text format you built yourself (if the file gets large enough) WILL be unreadable just as much.

Or any SQL DB viewer, like DataGrip.

There are tons of photo viewers and tons of PDF viewers, to the point that the market is saturated, because it IS SO EASY to create them and lots of demand drives their creation. That's unlike a file format for developers with a sophisticated spec. By that logic you could also claim that PostgreSQL is just a bunch of files and folders, that all you need is a viewer, just a binary that reads it, blah blah.

> There are tons of photo viewers and tons of PDF viewers, to the point that the market is saturated, because it IS SO EASY to create them and lots of demand drives their creation.

And guess what the market situation for database viewers is like...?

I remember when I got started with webdev around 2002, a lot was built on flat files. And so did I, because that’s how my “internet mentor” did it: guestbooks, bulletin boards, mailing lists, shoutboxes, CMSs. All in flat files, except I used php and not Perl like him.

I also did some Perl and that was super common. I also remember most 90s non-web software also using flat files. Even enterprise stuff, even when they had centralized storage in a file server. Sometimes they had databases but those were also accessed directly in the disk, sqlite-style. I remember seeing Clipper, FoxPro and MS Access being used in this manner.

After Perl I graduated to doing web stuff with a Microsoft Access (!) file stored in the disk alongside my .ASP scripts, exactly like one would do with sqlite today. It was quite performant.

I wonder if your "internet mentor" was the same as mine: Matt from Matt's Script Archive @ http://www.scriptarchive.com.

I remember scriptarchive, which I’ve browsed quite a lot, but it wasn’t Matt. It was a German guy who had a similar script collection, some of which he sold online. He reviewed and even tested my scripts and helped me along when I got stuck on incorrect cgi-bin file permissions or syntax errors and stuff like that when I was just learning things. Invaluable.

Ha, at that time I ran a website in Germany where I published my own Perl scripts for web forums, guestbooks, counters etc. (all based on flat files) and I helped other people getting started with web programming and "hacking" as well. I know there were a few such sites at the time but mine was quite popular in Germany. We had a really nice community at the time, I still fondly remember all the discussions and the general "small world" feeling the Internet had back then. I guess if you tried to build an online community like that today it would get overrun by trolls and spambots in no time. Those were the good old days.

I started with php and msql back in ... 1996. Then in 98 I took a job where much was perl. This was a web agency, and there was one client running Oracle on a sparc (IIRC), and some folks were testing out 'asp' and using MSSQL 6.5 (again IIRC - 7 wasn't out until later). But for those of us using perl... we were forced into flat files and/or dbm files for our data storage needs.

I tried to ask for msql or mysql, but we were told 'no', that they were 'too heavy' for what we needed, so the folks doing perl were pushed back to the dark ages. We also weren't allowed to use 'shtml' (basic server-side processing for things like 'include' to include common footers). Whenever we needed to make a footer change, we had to write multiple search/replace scripts across dozens of client sites. The new 'asp' folks lorded this over us for a while with "look how advanced asp and Microsoft is - MS really gets the web". I showed what I'd been doing with PHP for 2-3 years at that point, with includes and more. But "well, that's not really powerful enough". They kept moving the goalposts.

tldr - there were many options for more advanced stuff than flat files back in the earliest web days, but often people were hamstrung by short sighted tech decisions made by people who were not responsible for actually delivering the work. has much changed in the past 25+ years? ;)

Same. Around then, I requested MySQL and was told no for the same reasons you were. So I implemented my database as flat files and proceeded to use up all the inodes a few weeks later on a large call center server used for interviewing and data collection.

I received a panicked call from the admin that turned down my request. He wasn’t happy with me but understood the role he played in creating an expensive disaster.

Plenty of learning to go around in that experience :)

Anyone here remember CuteNews? That was my favourite flatfile CMS back in the day.

Yep! Pretty popular in late 90s to mid 2000s from what I remember?

Same here. The first real dynamic website I created used flat files and PHP, because that’s what I learned from friends online. Then shortly afterwards I taught myself how to use MySQL.

I don’t use MySQL or PHP for anything nowadays, but was extremely useful to learn in the early 2000s.

Ah, I miss those days! Writing a guestbook in PHP was how many got started back then!

I'm trying to get away from a DB-based CMS for some company web sites. Static generators won't do for a number of reasons, so a flat-file CMS seems like a good fit.

Currently I'm looking at GravCMS [1] as an alternative. It's free initially, but it can become somewhat expensive with many official plugins. But its file format is Markdown, and one can combine multiple files into a so-called modular page. It has a backend for editing, forms and e-mailing of form submissions. Seems perfect for small and mid-sized company web sites.

Another option I considered was Kirby [2]. Its backend UI is configurable. That's nice in theory but the documentation is somewhat lacking, in my opinion. I've used the starterpack and it took me hours to find the one command to be able to add new pages. Its content format is also custom, not Markdown. Finally, it's €100 per site.

Also, a few days ago, I stumbled upon Typemill [3] which I will check next week.

[1] https://getgrav.org/

[2] https://getkirby.com/

[3] https://typemill.net/

I can wholeheartedly vouch for Grav. It’s absurdly fast, easy to deploy and even easier to template for thanks to Twig. When I was still freelancing and a project was beyond the scope of htmlcssjs, Grav CMS became my tool of choice. Their admin plug-in makes for an easy-to-use backend GUI, and it’s configurable enough to have non-techies use it without losing sleep.

One of the newer features is the so-called FlexObjects. It’s an absolutely great idea for a CMS, but explaining the possibilities and technical intricacies here seems moot, as the documentation and Discord community are a better place to start learning. [1]

Websites built with Grav are comparable to SSG speeds while offering a different kind of ease of use and requiring much less time investment to roll out.

And sticking to the topic: being completely flat-file centric, those websites are a breeze to maintain and according to my albeit limited experience also a bit sturdier security wise.

[1] https://learn.getgrav.org/17/advanced/flex

I can't recommend Kirby enough. I built many of my (project) websites with it, and it's blazingly fast (performance-wise and in terms of building your site), super easy to customize, and I never missed my database.

Those are some pretty interesting websites of yours! I checked them out to see if they shared some sort of common Kirby look-and-feel, but you seem to put enough emphasis on design to offset any such thing. The projects look very interesting too.

Thank you!

Kirby really makes it easy to create unique stuff. There’s also a nice "Made with Kirby" catalog here:


Why do I need a special API to store my data on the file system? Doesn't the file system have an API? And then my interface to that would be a lightweight layer specific to my application.

Yes, exactly! Copying, renaming, backups, hierarchy - it‘s all there already!

+1 to this. I made some websites for hire in my earlier days and really enjoyed using Kirby. It was simple to setup, simple to host, and then simple to teach the clients how to update the content themselves. I just checked there and one of them is still going strong nearly 10 years later.

uncle roger voice just use SQLite haiyaaaaa

I'd love to watch Uncle Roger do programming reviews.

sees unparameterized SQL query being written no no no no no no haiyaaah uncle roger had to put down foot from chair

JomaTech has done a collab or two with him, but I agree -- I feel like Joma could build his own dedicated character of that type.

Maybe I would as well. Who is Uncle Roger though?

He's a character played by a comedian. Here's the video that made him popular https://www.youtube.com/watch?v=53me-ICi_f8

This made me giggle

lol, that guys a fucking racist sponge

FUCKYY whomever downvoted me

Fuck you u racist

I originally wrote https://www.flatpress.org/ as an alternative to WordPress with a flat-file database. I no longer work on it, but an enthusiast user (Arvid) took over the development, so it is now actively maintained.

A file system can be considered a document database with the key being the file name.
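Taking that literally, a minimal Python sketch of a directory used as a key-value document store (the class name and `<key>.json` layout are hypothetical). Writes here are plain overwrites, so a crash mid-write can leave a truncated document; a temp-file-plus-rename step would be needed for atomic updates.

```python
import json
from pathlib import Path


class FileDocStore:
    """A directory used as a document database: file name = key, JSON body = document."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Keep keys flat so they cannot escape the store's directory.
        if not key or key.startswith(".") or "/" in key or "\\" in key:
            raise ValueError(f"unsafe key: {key!r}")
        return self.root / f"{key}.json"

    def put(self, key: str, doc: dict) -> None:
        self._path(key).write_text(json.dumps(doc, sort_keys=True), encoding="utf-8")

    def get(self, key: str) -> dict:
        return json.loads(self._path(key).read_text(encoding="utf-8"))

    def delete(self, key: str) -> None:
        self._path(key).unlink()

    def keys(self) -> list:
        return sorted(p.stem for p in self.root.glob("*.json"))
```

Lookups by key are fast because the filesystem indexes names; anything else (queries on document fields) means scanning every file.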

Some filesystems even have very database-like features, e.g. ZFS has copy-on-write snapshotting and sort-of transactions (you can run a Lua script atomically, though it's meant to be an admin feature). And let's not forget WinFS!

Still, your average extfs/NTFS is a really bad database. There's a reason people created "real" databases, and as far as I can tell the only advantage of flat files is that you can use a text editor instead of an SQL editor to view them, which seems pretty minor given all the downsides.

SQLite is probably better in almost every scenario.

If you used it the way you use Redis, you’d probably run out of inodes and hit other performance issues. Directories aren’t designed to have millions of files in them.
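A common mitigation for the directory-size half of this problem is to fan files out into hash-prefixed subdirectories, much like git stores loose objects under `.git/objects/ab/...`. A minimal sketch (the function name and two-level layout are assumptions): with two levels of two hex digits there are 65,536 leaf directories, so ten million files average roughly 150 per directory. This does nothing for inode exhaustion, since every file still needs its own inode.

```python
import hashlib
from pathlib import Path


def shard_path(root, key: str, levels: int = 2, width: int = 2) -> Path:
    """Map a key to root/<h0>/<h1>/<key>, where h0, h1 are slices of the key's SHA-1 hex digest."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    prefixes = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root, *prefixes, key)
```

The hash spreads keys evenly even when the keys themselves share long common prefixes, and the mapping is deterministic, so no separate index is needed to find a file again.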

Depends on the filesystem. ReiserFS used to be popular for mail servers or especially news servers, precisely because it was great at handling millions of small files.

A database is basically just an abstraction to help you ingest and query flat files on disk.

If you skip that and do your own ingestion and querying... you've just built your own database?

This might have some performance shortcomings, but given maturity of caching these days, you can probably just add caching layer in front of the inefficient query engine when the needs arise.

As someone not overly backend-proficient, I love the concept of flat files. I just rebuilt my blog with another flat-file CMS, literally finished it last night, and today I am reading this article and wondering whether WonderCMS might have been a better choice... but no, I won't start over.

The CMS I picked is PicoCMS. What flat-file stuff have you used and can recommend? And, probably important, how well does it scale? I can't give much input here, my blog has 21 posts, so... no ceiling in sight yet.

Here I made a post where I ramble a bit about blog systems, as far as I got in contact: https://riidom.eu/posts/021_en

Before you worry about 1 million daily visitors. Worry about 50 daily visitors.

Even 10 daily users must be acknowledged as a big milestone

Also it’s waaay easier to scale the flat file solutions to a million visitors.

>but, no I won't start over

The result looks very good, so I don't think it's worth it. By the way, since you copied your posts to your new blog but still keep them on the old one, you may want to add the canonical meta tag to the old ones.

Good idea, thanks!

Just checked your blog. The text size seems too big on mobile https://imgur.com/a/rwHI5WU

For some reason I get typescript errors and a blank page on imgur since today, both in Vivaldi and Firefox.

But I'll look into the issue, thanks for the feedback!

I like plaintext as I have the option to make edits with various editors in addition to the main application.

This can lead to easier search and manipulation.

I use Logseq and Noteplan, both store data in plaintext.

As a secondary editor to both I use Sublime Text for very fast search and replace, or diagnosing why Logseq might be having issues.

I’ve had issues before in other note taking / task apps and when I run into problems it is much harder to search and manipulate a database file. That’s even if the app has a sensible database format.

Of course depending on how you use and might need edit data you might find databases are better for you. I’ve always found it easier to work with plaintext for note taking/ task apps.

Do you use Logseq and NotePlan together, or as separate silos? I’ve been trying to figure out a way to use them in sync, but it doesn’t quite work. NotePlan is pretty good at making org functions possible just in markdown files tho so I keep leaning towards sticking there — but then I remember how powerful org is and I’m back where I started.

Basically, if you have your workflow written up anywhere I’d love to read it.

Lately I've become a pretty big believer in flat files.

I don't see much value at all in small software for the sake of being small, but files are something really special.

You can SyncThing them. You can version control them. You can browse and delete them.

And best of all, they're not really flat anymore. A file is a typed object that tools know how to work with(As long as you do the sane thing and use a common file type).

Databases are often appropriate, but in those cases, I think sqlite is the better choice more often than not, unless you've truly got a large number of users all doing things at once.

i’m using Zola static site generator for my website (after switching from Wordpress)



if you need more than that for a static blog (like Gatsby/NextJS) i’d consider you unreasonable

Last time I tried Zola (about a year ago), it was not very clear how you use and manage themes (git downloads into a subdirectory, IIRC), and it was way too flexible for its own good IMO. What I mean is that you can determine directories and subdirectories for everything by yourself (I think?).

In general I liked Zola but I found it hard to work with. I really wanted to set a few metadata fields, pick and apply a theme, put a few meta-pages (like blog index), write 3 articles and be done with it. And it didn't serve me well for that.

Have you considered writing a guide on how you set up your Zola blog? I'd read it and try it.

Zola isn’t a blog engine per se, but a static site generator with markdown for content management

in Zola terms theme really means a pre-built site

that said, i don’t have a guide, but i have made a sleek template for Zola (link above) which anybody can use in minutes and deploy on Netlify from GitHub

the template is a blog with projects section, although it can be extended to add every section imaginable

the price for the theme is significantly lower than the hours you need to put in to build a similar one from scratch

(sorry for ad)

I've been using PluXml[1] for a while on my personal blog and other sites I've created. The content is stored in XML files. For speed, the post creation date and tags are stored directly in the filename, so it benefits from native OS file search.

[1] https://github.com/pluxml/PluXml
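The metadata-in-filename idea can be sketched as follows. The naming scheme `<YYYYMMDDHHMM>.<tag,tag>.<slug>.xml` is a hypothetical example, not PluXml's actual format: a directory listing alone is enough to filter and sort posts, without parsing any XML.

```python
from datetime import datetime
from pathlib import Path


def parse_name(filename: str) -> dict:
    """Split a hypothetical '<YYYYMMDDHHMM>.<tag,tag>.<slug>.xml' name into metadata."""
    stamp, tags, slug = Path(filename).stem.split(".", 2)
    return {
        "date": datetime.strptime(stamp, "%Y%m%d%H%M"),
        "tags": tags.split(",") if tags else [],
        "slug": slug,
    }


def slugs_with_tag(filenames, tag: str) -> list:
    """Filter and sort posts by date using only file names, never opening a file."""
    posts = sorted((parse_name(n) for n in filenames), key=lambda p: p["date"])
    return [p["slug"] for p in posts if tag in p["tags"]]
```

Usage would look like `slugs_with_tag((p.name for p in Path("posts").glob("*.xml")), "php")`; the date prefix also makes plain shell globs such as `ls 2023*.xml` work as a query.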

writing blog posts in xml isn’t nice

SQLite seems better here. Text files need to be parsed and loaded, and opening files has enough overhead to be noticeable when you do it a lot. Unless you're statically pregenerating the HTML, SQLite is a super fast format with good library support, it gives you fast access to the disk, and it's still a single file.

I think DokuWiki also deserves a mention for being pretty flat-file with a lot of features and extensions

I would argue that DokuWiki actually implements its own little custom database, with its metadata, metadata indexes, fulltext indexes etc., adding complexity and doing it probably in a shittier way than a "real database" would do.

Why don’t more people store all uploaded files as BLOBs in a relational database instead of a file system? What are the downsides?

I see the biggest downside being that the web server has to call your PHP app, which then proxies the data through itself. Well, maybe you can use ReactPHP or Swoole and then it’s even faster than Node.js…

Also you then have more programmable options, for example around access control lists or capabilities to access stuff, caching and expiring caches.
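A minimal sketch of the BLOB approach using Python's built-in `sqlite3` module (SQLite chosen only so the example is self-contained; the same idea applies to MySQL or PostgreSQL). The `uploads` table and the helper names are hypothetical. It shows the access-control point: the permission check sits right next to the data, in application code.

```python
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database with a hypothetical 'uploads' table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS uploads (
               name  TEXT PRIMARY KEY,
               owner TEXT NOT NULL,
               data  BLOB NOT NULL
           )"""
    )
    return conn


def save_upload(conn: sqlite3.Connection, name: str, owner: str, data: bytes) -> None:
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT OR REPLACE INTO uploads (name, owner, data) VALUES (?, ?, ?)",
            (name, owner, data),
        )


def read_upload(conn: sqlite3.Connection, name: str, user: str) -> Optional[bytes]:
    """Return the file's bytes only if `user` owns it (the ACL check lives in app code)."""
    row = conn.execute(
        "SELECT owner, data FROM uploads WHERE name = ?", (name,)
    ).fetchone()
    if row is None or row[0] != user:
        return None
    return row[1]
```

The trade-off is the one raised in the replies: every read goes through the app, and large blobs make the database file grow quickly.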

How does MySQL compare to ext3 for storing terabytes of data?

As a side note, what do y’all think of mounting Amazon S3 via FUSE? Just map a path to it and let it do the rest. goofys, I think, skimps on POSIX compliance in favor of speed.

You usually want as much of your database stored in memory as possible. Binary blobs would bloat out the db pretty quick.

FUSE for S3 might work, but you risk a generic abstraction triggering a massive set of requests that cost a lot. Something like running find or checking the size of a directory could end up costing you thousands.

This seems like the perfect post to spruik this cool tool (not mine, but greatly appreciated by me):


The Flat File Extractor command line utility has saved my bacon a few times in my career. If I'm dealing with legacy code, data migrations, needing to quickly perform some surgical extraction or munging into another format - ffe is the tool for me.

EDIT: looks like someone has forked and updated the tool. https://github.com/igitur/ffe
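For anyone who hasn't met ffe: the core job of pulling fields out of fixed-width records and re-emitting them in another format looks roughly like this in Python. This is not ffe's configuration syntax; `LAYOUT` is a hypothetical record layout and the function names are made up.

```python
import csv
import io

# Hypothetical layout: (field name, 0-based start column, width)
LAYOUT = [("id", 0, 5), ("name", 5, 10), ("amount", 15, 8)]


def parse_fixed_width(line: str, layout=LAYOUT) -> dict:
    """Slice one fixed-width line into named, whitespace-trimmed fields."""
    return {name: line[start:start + width].strip() for name, start, width in layout}


def fixed_width_to_csv(lines, layout=LAYOUT) -> str:
    """Convert fixed-width lines to CSV text; csv handles quoting of commas etc."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=[name for name, _, _ in layout], lineterminator="\n"
    )
    writer.writeheader()
    for line in lines:
        if line.strip():  # skip blank lines
            writer.writerow(parse_fixed_width(line, layout))
    return out.getvalue()
```

Keeping the layout as data rather than code is what makes this kind of tool handy for one-off migrations: a new legacy format only needs a new layout table.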

When I rebuilt my poetry website (using Svelte) I was really keen to get rid of any need for a database on the backend (to save a few pennies on the hosting). I ended up with a folder for all the poems (html snippets) which the frontend can fetch when user navigates to a poem's page. The metadata associated with the poems went into a json file which gets fetched on initial page load.

If I did the work today I'd probably store the metadata in an SQLite database running on the frontend, but I'd keep the poem copy in their own files in the folder.
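A build step could generate that JSON metadata file from the folder of snippets. A sketch in Python, assuming (hypothetically) that each snippet holds its title in an `<h1>` tag and is served under `poems/`; the actual site may store different metadata.

```python
import json
import re
from pathlib import Path

# Hypothetical convention: each snippet holds its title in the first <h1>.
TITLE_RE = re.compile(r"<h1>(.*?)</h1>", re.IGNORECASE | re.DOTALL)


def build_index(poem_dir: Path) -> list:
    """Collect slug, title and fetch path for every HTML snippet in poem_dir."""
    index = []
    for path in sorted(poem_dir.glob("*.html")):
        match = TITLE_RE.search(path.read_text(encoding="utf-8"))
        index.append({
            "slug": path.stem,
            "title": match.group(1).strip() if match else path.stem,
            "file": f"poems/{path.name}",  # URL the frontend would fetch
        })
    return index


def write_index(poem_dir: Path, out_file: Path) -> None:
    """Write the index as the single JSON file fetched on initial page load."""
    out_file.write_text(json.dumps(build_index(poem_dir), indent=2), encoding="utf-8")
```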

Using flat files only solves half the problem, in order to truly be able to host anywhere, you also need to be able to ditch PHP and server side processing requirements. (Wonder CMS, highlighted in this article, requires PHP with a couple of mods.) That’s why I think static site generators are the more interesting code in this space. I’m a big fan of Gatsby, but there are several other great ones as well. This allows you to host your blog on even an old school “tilde club” type site, and it works just fine.

> Using flat files only solves half the problem, in order to truly be able to host anywhere

While that may be a problem for some, I don't think it's the problem for the purposes of this article and discussion.

Different situations have different requirements.

Still, it is a consideration, and it all factors in. Copying your WonderCMS files to another host, great… oops, it doesn’t have PHP mbstring extension, sorry, won’t run. (And if you’re trying to use “free” hosting, you probably don’t have privs to install the missing requirements.)

What I mean is that being able to host anywhere is a constraint you're adding, not one that's in the original article.

If you have zero control over your environment and no ability to inspect it, then I agree that static files are probably the way to go.

I disagree, it was touched on, and glossed over, in the article.

>> “… means being able to start your website by dumping everything into a folder and you're pretty much done (1 step installs).”

>> “Dumping everything into a folder and it just works (1 step installs) makes it easy for even beginners to get a website running, but also it makes your website, blog, forum, journal, or whatever, portable. Very easily portable. You need to put your website on a thumb drive and/or move it to a different domain? Done. Easy. It's just like backing up any folder.”

Sometime in the late 1990's, some programmer in some forum was bragging about the speed of some program of his that was working with a 50,000 record database. I don't remember all the exact details. I replied something along the lines that I regularly load text files having more lines than that into a text editor, that it's almost instant on today's hardware.

I feel like you can achieve some pretty awesome feats with a few files and Docker. Setting up a simple MongoDB is dead simple. Adding a FastAPI service is dead simple. Serving a small site via Flask is dead simple. And with a simple docker exec command I can dump the database to JSON. I can even use MongoDB Compass to tinker with the database, try out queries and create indexes.

Guile was my CMS in the early 00s. Page data was stored in a sexpr file, which a Guile script read and turned into pages.

Does it have to be one or another with SQLite?

Is a flat file usually considered text, or can it be a binary format as well? Different websites give different answers.

It can be binary. In fact, the origin of flat files goes back to early data processing on mainframes. At that time there were no databases (as we know them today) yet. Data was a collection of records stored on punched cards and loaded onto tape (later disks) for sequential processing. Records were not just text; they were a mix of binary, text, and BCD encoding.
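To make "binary flat file" concrete, here is a Python sketch of fixed-length binary records mixing an integer, padded text and packed BCD, read sequentially the way a tape file would be. The 16-byte layout is made up for illustration, and the BCD is simplified (unsigned, without the sign nibble that COBOL's COMP-3 uses).

```python
import struct

# Hypothetical 16-byte record: 4-byte big-endian account number,
# 8 bytes of space-padded ASCII name, 4 bytes of packed BCD balance in cents.
RECORD = struct.Struct(">I8s4s")


def to_bcd(n: int, size: int) -> bytes:
    """Pack a non-negative integer as unsigned BCD, two decimal digits per byte."""
    digits = str(n).rjust(size * 2, "0")
    if len(digits) > size * 2:
        raise ValueError(f"{n} does not fit in {size} BCD bytes")
    return bytes(int(digits[i]) << 4 | int(digits[i + 1]) for i in range(0, len(digits), 2))


def from_bcd(raw: bytes) -> int:
    """Decode unsigned packed BCD: high nibble is the tens digit, low nibble the units."""
    value = 0
    for byte in raw:
        value = value * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    return value


def pack_record(account: int, name: str, cents: int) -> bytes:
    return RECORD.pack(account, name.encode("ascii").ljust(8), to_bcd(cents, 4))


def read_records(blob: bytes):
    """Scan sequentially: record i always starts at byte i * RECORD.size."""
    for offset in range(0, len(blob), RECORD.size):
        account, name, bcd = RECORD.unpack_from(blob, offset)
        yield account, name.decode("ascii").rstrip(), from_bcd(bcd)
```

Because every record has the same size, there is no delimiter or index at all; the "schema" lives entirely in the program that reads the file.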

The article's author seems surprised that there's a flat-file forum. What if we told them that the Fossil SCM is a flat-file (sqlite3) forum, wiki, _and_ SCM all in the same flat file? (Commenting on the original article requires an account i'm not willing to set up.)

Flat files are better suited to small, well-defined projects with simple integration requirements. A lot of simple websites and services fall into this category.

If there is any possible need to integrate into a larger overall system in ways that have yet to be fully imagined or defined, a DBMS might be a better option.

I used to run a shared webhost provider back in the early 2000s. Flat-file software was a big deal for our customers back then because MySQL and PostgreSQL databases cost actual money.

Is sqlite a flat file disqualifier?

I would say yes, that's a disqualification: It's a single file, but not "flat" in the sense of directly human- (or browser-) readable.

I don’t have a dog in the fight, but I would think any real db would be disqualified in the spirit of flat files. But SQLite honestly seems like the best of both worlds to me.

No, SQLite was built to be a flat file db. It's exactly the right solution if you need a db but want it contained in a folder with your other files.

But you can't just put

in a URL and expect a user's browser to display something sensible like it would with an HTML file (or even a fixed-format text data file). You need a process on your server that translates the content in the database to browser-readable HTML.

So in the sense of "put up a website by just transferring static files", an Sqlite database isn't quite "a flat file". It's a single file, yes; but its content is "bumpy" -- not "flat".

(Then again, once you have that translate-content-to-HTML process installed on your Web server, updating the content is reduced to just transferring a new .db file to it.)
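That translate-content-to-HTML process can be very small. A sketch in Python with the standard `sqlite3` and `html` modules; the `posts(slug, title, body)` schema and `render_post` are hypothetical, and in practice the function would be called from whatever CGI/WSGI/PHP-style handler the host runs.

```python
import html
import sqlite3
from typing import Optional


def render_post(conn: sqlite3.Connection, slug: str) -> Optional[str]:
    """Turn one row of a hypothetical posts(slug, title, body) table into an HTML page."""
    row = conn.execute(
        "SELECT title, body FROM posts WHERE slug = ?", (slug,)
    ).fetchone()
    if row is None:
        return None  # the caller would answer with a 404
    title, body = (html.escape(col) for col in row)  # never emit DB text unescaped
    return (
        "<!doctype html>\n"
        f"<title>{title}</title>\n"
        f"<h1>{title}</h1>\n"
        f"<p>{body}</p>\n"
    )
```

Once this exists on the server, publishing really is just replacing the `.db` file, as the comment says.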

I can recommend HTMLy and automad.

HTMLy can easily handle 20,000 posts without any noteworthy slowdown.

I concur. I have been using HTMLy for my blog for 3 years and it is a really nice platform. It is simple and powerful at the same time.

One benefit of flat file is you can host it on GitHub pages.

Turns out there are many ways to keep doing things wrong that about the problem being users...

Flat file websites make sense when you have a blog and such. I don't even know why you'd use a CMS when Next.js or SvelteKit can just generate all the pages from a folder with markdown files. Just write your content in Markdown or pull your content in at the static build stage and your content is ready to go.
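The folder-of-Markdown-in, folder-of-HTML-out idea, stripped to its bones. This is a toy Python sketch, not how Next.js or SvelteKit work internally; the converter deliberately handles only `# ` headings and plain paragraphs, where a real build would use a proper Markdown library.

```python
import html
from pathlib import Path


def markdown_to_html(text: str) -> str:
    """Toy converter: '# ' headings and blank-line-separated paragraphs only."""
    parts = []
    for block in text.strip().split("\n\n"):
        block = block.strip()
        if block.startswith("# "):
            parts.append(f"<h1>{html.escape(block[2:].strip())}</h1>")
        elif block:
            # Join wrapped lines of one paragraph into a single line.
            parts.append(f"<p>{html.escape(' '.join(block.split()))}</p>")
    return "\n".join(parts)


def build_site(content_dir: Path, out_dir: Path) -> list:
    """Render every content_dir/*.md to out_dir/*.html; return the files written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for md in sorted(content_dir.glob("*.md")):
        target = out_dir / f"{md.stem}.html"
        target.write_text(markdown_to_html(md.read_text(encoding="utf-8")), encoding="utf-8")
        written.append(target.name)
    return written
```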

Can anyone explain how sites like Flatboard handle user input in a flat-file system? How do they handle conflicts with just one file? What's the workflow and setup like: can you use serverless, or do you have to have a full server?

One thing CMSes have that SSGs don’t is: the ability to create posts without editing the source. Simply log in to your admin panel and start writing. No Git repository or build scripts required.
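I can't speak to what Flatboard does internally, but one common pattern for flat-file writes is: take an exclusive lock, read the file, modify it, write to a temporary file, then atomically swap it in with `os.replace`. A Python sketch; the `.lock` sidecar convention and `append_reply` are hypothetical. It only needs a normal process with a persistent, writable filesystem, which many serverless platforms don't provide.

```python
import json
import os
import tempfile
import time
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def lock_file(path: Path, timeout: float = 5.0):
    """Portable mutex: O_EXCL creation of a sidecar .lock file succeeds for one writer."""
    lock = path.with_name(path.name + ".lock")
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not lock {path}")
            time.sleep(0.01)  # another writer holds it; wait and retry
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock)


def append_reply(thread_file: Path, reply: dict) -> None:
    """Read-modify-write a JSON thread file without losing concurrent replies."""
    with lock_file(thread_file):
        replies = (
            json.loads(thread_file.read_text(encoding="utf-8"))
            if thread_file.exists()
            else []
        )
        replies.append(reply)
        # Write a temp file in the same directory, then swap it in atomically,
        # so readers never see a half-written thread.
        fd, tmp = tempfile.mkstemp(dir=thread_file.parent, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(replies, f)
        os.replace(tmp, thread_file)
```

A crashed process can leave a stale `.lock` file behind; on Unix, `fcntl.flock` avoids that because the OS releases the lock when the process exits.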

CMS/WYSIWYG functionality does not necessarily have to be baked into the application that serves the content

i've found that folks without git experience are fine to use github.com as a cms when you take 20 minutes to show them how to edit, submit as a pr, and check the preview deployment before merging


There is a long line of people on the internet complaining that git is hard to learn. But I've never met anybody in the real world who had any problem with the basic cycle of edit - push - review - accept (or the even simpler edit - push).

You can directly push after edit? No add, no commit?

Some software joins it all into one step, like the GitHub GUI the GP is talking about.

There are some really nice tools out there that bridge that gap, like Forestry:


CMSes and SSGs aren't necessarily mutually exclusive: WordPress can be an SSG. The only defining thing about SSGs is that they output static webpages for serving; the process of creating them doesn't matter, whether it's handcrafted (Markdown) or integrated into a CMS.

Funny you mention SvelteKit - my former blog version was written with Svelte/SvelteKit (using the static adapter).

I wrote a bit about why I moved away from this, but in a nutshell: I am not using "modern" deployment techniques, and uploading the compiled blob each time I write a new post felt like a huge drag to me. This is a general problem pretty much baked into SSGs, but I hope someone will address it in the near future.
