
How NoSQL forced the evolution of a scalable relational database - rickn
http://blog.memsql.com/nosql/
======
stickfigure
Everyone seems to want to compare against MongoDB, but when I think NoSQL I
think about Google Cloud Datastore and Amazon DynamoDB. Databases which are
fully hosted, infinitely scalable, zero-maintenance, transactional, reliable,
and - at least with Google's offering - scale down to a free tier. They
aren't perfect or applicable in every situation, but they're cheap and easy
enough to allow a one- or two-programmer team to achieve massive scale without
hiring devops.

MemSQL is so expensive you have to call for a quote.

If it's just "the ability to scale" then sure, SQL is back on the menu.
MemSQL, Spanner, CockroachDB are leading the charge and that's great! But you
have to pay a pretty penny for it (in TCO). There's still a lot of value in a
cheap fire-and-forget scalable database, and there are not currently any SQL
options there.

Regarding schemaless: I think this is a divide between people who work in
dynamic languages vs people who work in statically typed languages. Schemaless
databases are just fine in languages like Java; your classes define the schema
with enough rigidity to keep you out of trouble. They wouldn't be my choice
for Javascript though, that's for sure.

~~~
nojvek
I don’t understand the “call to get a quote” sales cycle. I bet MemSQL has a
sales team the size of their engineering team chasing those fat enterprise
deals, but in the grand scheme of things why should I care?

It doesn’t have a free tier, I can’t use it easily, and it just seems like a
lot of hype without any third party to back up their claims. It’s the kind of
thing MongoDB used to say to gain mindshare.

The most popular databases, like MySQL and Postgres, gained their mindshare
because they were open source and anyone could verify they did what they
said. Only a couple of closed-source databases have won big, and behind them
were huge monoliths like Microsoft and Oracle with armies of sales teams.

I’m not saying MemSQL will die. They are probably quite profitable. I work
for an analytics company with a custom database. It’s just that the average
Joe doesn’t get much from it. The cloud-based DBs get the average Joe very
far.

~~~
manigandham
They're focusing on customers that can afford it, so it's not that you
shouldn't care; it's that you're not their primary customer.

You can download the developer edition and it does everything except cross-
region replication if you want to test it.

It's a great product and the real deal. We used it before (as a startup, too);
it did the job, and it is one of the most polished databases out there.

~~~
DenisM
Nick, your account is shadow-banned. Email hn admins to revert the ban.

~~~
DenisM
Aaand it's back to normal. Welcome to the club.

------
keithnz
I think NoSQL, especially things like Mongo, got popular because it is super
easy to program with JavaScript. While scaling is one of the advantages, I'd
be super surprised if many people actually need scaling capabilities ( other
than because their design is super inefficient ).

Recently I've been inspired to play around with kicking out as many layers
between a relational db and a REST Api, largely because I've been watching
[https://www.twitch.tv/nybblesio](https://www.twitch.tv/nybblesio) ( who I
first came across here on HN on a thread about live coding via twitch
streaming ). One of the projects he works on is a SaaS product which is
largely done in PostgreSQL, leveraging
[https://github.com/PostgREST/postgrest](https://github.com/PostgREST/postgrest)
. Seems pretty good, going to be interesting to see how it turns out ( though
I haven't seen him work on it lately )

I tend to like SQL Server and ASP.NET Core. I found
[https://blogs.msdn.microsoft.com/sqlserverstorageengine/2018...](https://blogs.msdn.microsoft.com/sqlserverstorageengine/2018/01/29/simplify-rest-api-development-modern-single-page-apps-sql-server/)
and I've been playing around with this kind of idea, using a lightweight
mapper when needed
([https://github.com/StackExchange/Dapper](https://github.com/StackExchange/Dapper),
which the Stack Overflow people created).

I gotta say, I kind of like it

One of the problems of many stacks is that the frameworks wrap general-
purpose languages over SQL, which is not really a good idea. SQL is a vastly
more capable language for dealing with relational data, and layers built over
the top often dumb down the database.

The REAL problem SQL wrappers need to solve is that most languages don't have
a good interface with SQL. So most often SQL is handled as strings,
manipulated with string operations, and there is no type-safe way of
transitioning data from the DB to a general-purpose language.

Microsoft at one stage had LINQ to SQL, which was quite good... but they
killed it :)

~~~
189723954
>I think NoSQL, especially things like Mongo, got popular because it is super
easy to program with JavaScript.

I chose Mongo for a project recently. I have many years experience with SQL
databases, and I think SQL will become more of a niche in the future.

Here are the reasons:

Everybody uses an ORM with a SQL database. You can pretend they don't and
that everyone is writing raw SQL, but they aren't. An ORM is basically a big,
complicated, and slow piece of software that tries to make a SQL database
into something else. A NoSQL database is like a native version of that: it
starts off with you being able to define collections and schemas in code and
so on.

SQL is a shit way to query a database. I used to write massive queries that
took an hour or so to write just so that the non-programming people could put
them into analytics software that didn't support any other methods of input.
While I was writing these queries, I was just thinking to myself "I can write
this in code in a few minutes instead of an hour or more". The fact is that
programming languages are much better at querying databases than SQL.

Joins are slow and complicated, and ORMs are slow and complicated, so you can
never actually use any of these things in a high-traffic environment. As you
say, not many people will reach these levels of traffic, but it is still a
factor.

Most of the "I moved from mongo to postgres" comments lately are from first-
time programmers who didn't know any of the basic concepts of databases. They
then discovered patterns that the SQL database forced on them and declared
that mongo sucks when, in reality, they just didn't know what they were
doing.

>One of the problems of many stacks is that the frameworks wrap general
purpose languages over SQL, which, is not really a good idea, SQL is a vastly
more capable language for dealing with relational data and layers built over
the top often dumb down the database.

This is not true at all. Programming languages are vastly superior at querying
a database. Just go write a complex query and compare it to the one in linq or
whatever you use.

>Microsoft at one stage had LINQ to SQL, which was quite good... but they
killed it :)

They have Entity Framework, which uses LINQ and is much the same thing. It is
very slow though.

~~~
matwood
> Everybody uses an ORM with a SQL database.

I'm ripping ORMs out of any backend code and using things like jOOQ.

> I was just thinking to myself "I can write this in code in a few minutes
> instead of an hour or more"...

> Joins are slow and complicated

You know what's really slow? Creating joins in code.

> This is not true at all. Programming languages are vastly superior at
> querying a database. Just go write a complex query and compare it to the one
> in linq or whatever you use.

It makes zero sense to pull a bunch of records back from the db using multiple
network calls to join and then filter. Let the db do its job.
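
To make the cost concrete, here is a small sketch (Python with SQLite, and
hypothetical `customers`/`orders` tables) contrasting one join done by the
database with the round-trip-per-row version done in app code. Both give the
same answer; only one ships every row back to the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, 'acme'), (2, 'globex');
INSERT INTO orders VALUES (10, 1, 99.0), (11, 1, 25.0), (12, 2, 40.0);
""")

# One round trip: the database joins, aggregates, and filters.
rows = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
    HAVING SUM(o.total) > 50
""").fetchall()

# The "join in code" version: one extra query per customer, all filtering
# done after the data has already crossed the wire.
totals = {}
for cid, name in conn.execute("SELECT id, name FROM customers").fetchall():
    s = conn.execute("SELECT SUM(total) FROM orders WHERE customer_id = ?",
                     (cid,)).fetchone()[0]
    if s and s > 50:
        totals[name] = s
```

With two customers the difference is invisible; with two million the
per-row round trips dominate everything.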

~~~
mmt
> It makes zero sense to pull a bunch of records back from the db using
> multiple network calls to join and then filter. Let the db do its job.

But then it wouldn't be distributed processing! :)

Seriously, though, consider it for a moment... this pattern has features
similar to something like Hadoop. The data comes from storage nodes (database
servers and, hopefully, their read replicas), goes to processing nodes (app
servers) to have the work done, and then the new data is written back out
over the network to storage nodes and replicated across the network (to the
replica/slave database servers).

If the data volume is particularly low or the compute load (CPU and/or RAM) is
particularly high, the distributed method would make intuitive sense. I
haven't seen it yet, however.

~~~
jandrewrogers
Doing distributed joins correctly requires an architectural/technical
capability that most distributed database engines don't have: decentralized
parallel orchestration. If you have this, you can do joins even with very high
data volumes efficiently given good parallel scheduling algorithms. Most
databases are designed such that there is a single point of control that
declaratively schedules all data flows required to execute the query; this
scales poorly for operations like joins, never mind recursive joins, which is
why you don't see it.

Letting individual database nodes dynamically schedule and orchestrate their
own data flows with each other, essentially allowing each node in the parallel
system to construct its own execution plan in relation to other nodes as it goes
along, does not fit within the "giant distributed file system" paradigm that
most distributed systems are based on. People who design codes for
supercomputers are often familiar with parallel orchestration idioms that work
at extremely large scales but it hasn't crossed over into ordinary distributed
database engines. (This is also a good litmus test for what makes a database
"parallel" as distinct from "distributed".)

Most distributed database architectures are much more centralized than they
need to be, particularly around control of execution planning, and this limits
their expressiveness. It is quite difficult to hack together a distributed
join that performs better than a centralized one without good support for
parallel orchestration.

~~~
dman
Could you point out any open source supercomputing / data products that get
this right (i.e. have decentralized data flows)?

~~~
mmt
I'm obviously not the parent commenter and have no inside knowledge of the HPC
world, but my guess is the main open source supercomputing projects would be
from NASA, the US national labs, and CERN.

You might search HN for the recent announcements about new clusters,
especially top500, and look for the comments discussing using MPI (versus
something custom, I think?), as my recollection is that those topics would
yield further pointers to the actual examples you're looking for.

------
antirez
Redis is not mentioned here at all, so maybe the author is thinking mostly of
other NoSQL software, but in the case of Redis the whole point was not just
the in-memory performance, but the data model as well. My claim is that you
can't really exploit the advantage of using memory if you keep using memory
to represent the same data model that you were using with relational
databases. For instance, think of Redis sorted sets in the use case of
leaderboards in popular games (the same pattern is used in a number of
applications that have nothing to do with games). Using SQL, even an
in-memory one, in such a use case is not going to work. So in the case of
Redis the point was to remove the interface between the user and the way the
data is actually fetched from data structures, to let the user make those
choices in a very direct way. Thus this article does not apply to Redis, in
my opinion. I have the feeling that many other NoSQL products could argue the
same, each in its own area of research and differentiation. For instance, I
have trouble seeing how modern SQL systems can replace CRDT-based stores.
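
For readers who haven't used sorted sets: the leaderboard pattern is ZADD to
record a score and ZREVRANGE to read the top N. This toy Python class only
mimics the semantics (real Redis keeps a skip list so these operations stay
fast at scale; with redis-py the equivalent calls are `zadd` and
`zrevrange`).

```python
class Leaderboard:
    """Toy stand-in for a Redis sorted set (ZADD / ZREVRANGE semantics)."""

    def __init__(self):
        self.scores = {}  # member -> score

    def zadd(self, member, score):
        """Insert or update a member's score."""
        self.scores[member] = score

    def zrevrange(self, start, stop):
        """Members ordered by score, highest first (stop inclusive, as in
        Redis; negative indexes are not handled in this sketch)."""
        ranked = sorted(self.scores, key=self.scores.get, reverse=True)
        return ranked[start:stop + 1]

lb = Leaderboard()
lb.zadd("alice", 3200)
lb.zadd("bob", 4500)
lb.zadd("carol", 2800)
top2 = lb.zrevrange(0, 1)  # the two highest-scoring members
```

The point of the data model is that "top N" and "my rank" are primitive
operations, not queries you have to express through an intermediate language.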

~~~
threeseed
This is the problem with people that use the term NoSQL.

Do you mean Cassandra (BigTable), MongoDB (Document), Riak (Key/Value), Redis
(Mix), Kafka (Log Structured) ? There are dozens of fundamentally different
systems many of which are closer to an RDBMS than their NoSQL peers. And many
of them have rigid schemas so it definitely isn't that either.

~~~
charlesvdv
You can also add Neo4j (graph) as part of the NoSQL family, right?

~~~
skohan
Yeah, in general it seems like NoSQL is used as a proxy for "MongoDB and
other key-value stores."

Graph databases arguably handle relational data better than SQL, and I'd argue
they're much nicer to work with.

~~~
eftpotrm
And MS SQL Server now supports graph databases; it doesn't have to be either-
or.

------
nostalgeek
Well, this is exactly why competition is good, in every domain. That's why
PostgreSQL got its column mode and JSON type: there was a clear use case for
performance for the former and schema-less data for the latter. The EAV
pattern is a plague and I'm glad documents are replacing it.

~~~
icebraining
Postgres has had hstore for schema-less data since 2003.

~~~
igl
That's a blunt statement that overvalues hstore: keys AND values in hstore
can only be text strings.

~~~
MarHoff
With arrays and composite types, the potential for semi-structured schemas
has been there for decades. The concept just got more mainstream with NoSQL,
and easier to maintain and more efficient with indexable types such as JSONB.

------
janemanos
What concerns me is that the article is strongly biased, in the sense that
modern developments in non-relational databases just get ignored while it
rants about their state from a few years ago. Schemaless databases do support
full ACID transactions (MongoDB, ArangoDB), with some you can also enforce a
schema, and not everybody loves SQL. So having competition even on the
query-language side can only improve the status quo.

Yes, thinking about your data model will for sure increase your understanding
of what you are actually doing... and every professional developer does this
as well when choosing a non-relational database. Anything else would be
stupid.

For me the article rather shows that the modern developments in non-relational
databases do affect vendors on the relational side of the spectrum. Otherwise,
it would not make sense to invest so much time in writing such a long article.

------
rickn
My name is Rick Negrin. I run the Product Management team at MemSQL, a
scalable relational database. I recently wrote a blog on my thoughts regarding
NoSQL vs. Relational Databases and I'd love to hear the community’s thoughts
on this.

~~~
sqlcook
Hi Rick, I've been following MemSQL for a few years now; are there any plans
to release a "community" edition? Last time I checked, about 1.5 years ago,
JSON support was very basic and EE pricing (don't remember exact #s) was
rather high. Thanks

~~~
siscia
What are you looking for in a "community" edition?

~~~
ddorian43
Something that you can use in production based on what your DB advertises as
its advantages (high availability + sharding).

~~~
siscia
I was asking because I am the creator of RediSQL[1] -- SQL steroids for Redis
-- which is a less sophisticated product than MemSQL but still has its own use
cases.

Maybe it was enough for the parent; if not, it would be very interesting to
know what is missing.

[1]: [http://redbeardlab.tech/rediSQL/](http://redbeardlab.tech/rediSQL/)

~~~
ddorian43
The same answer applies to your product.

~~~
siscia
Honestly, I believe that for small workloads you can definitely use RediSQL
in production; it will happily hold your cache, or it will be a great SQL
database.

However, I need a way to draw a line between people just using the free
product and people actually supporting the project, so providing as a paid
feature something that big companies will require seemed to me the only way
to go.

Unfortunately, I don't have the capital or the bandwidth to go with a fully
open source product and sell just support, which I don't believe is a good
business model anyway.

If you were in my shoes, would you do something different?

~~~
ddorian43
To be honest, I fail to see what I could use your product for, so I'm out of
the target audience.

Assuming NoSQL is for something very efficient or very scalable, I need some
room to use it before I have to shell out $$. There are many products where I
have to pay before going to production.

~~~
siscia
It really depends on what you are building.

If I were building a fast prototype, I would not use a Postgres box anymore
but just a Redis one.

If you need to cache data in a way more complex than just key->value, you
don't have too many alternatives at the moment.

If you want an easy and fast way to have an SQL engine in memory, again it is
not going to be simple.

If you need a separate database for every one of your users, there are not
many alternatives that I am aware of.

It is definitely not a revolutionary product, but it has its niche. Any of
the problems I mentioned can be solved in a different way, but those
different ways are quite complex.

------
ChicagoDave
NoSQL is an excellent technology for rapid R&D, but once a domain “settles”,
data should be modeled and data stores should be switched to relational or
graph backends.

If a system already has a well-defined domain, NoSQL adds little value.

Of course you could leverage AWS DynamoDB and reduce cost, but you still have
downstream implications for things like reporting, which requires a known
schema with relational paradigms.

~~~
wild_preference
I don’t see how NoSQL could be better for rapid dev when something like
Postgres can migrate its schema and data at once while enforcing constraints.
That’s clearly better precisely when your schema is changing.

~~~
ChicagoDave
In R&D scenarios, NoSQL wins hands down because devs are jamming in schema
changes hourly. That level of churn in a relational model is a massive drag
on time, and the time is wasted.

If it’s a redevelopment of a well-known domain, depending on reporting
requirements, I’d slightly side with relational.

But AWS DynamoDB can be a huge cost savings, so as any good architect will
say, “It depends.”

------
elvinyung
I feel like this topic was covered better by Stonebraker :p
[https://homes.cs.washington.edu/~billhowe/mapreduce_a_major_...](https://homes.cs.washington.edu/~billhowe/mapreduce_a_major_step_backwards.html)

In the late 90s/early 00s Eric Brewer also wrote a bunch of papers about his
experiences scaling stateful systems at Inktomi. The most famous one is
probably the one that introduced/codified the CAP theorem:
[https://pdfs.semanticscholar.org/5015/8bc1a8a67295ab7bce0550...](https://pdfs.semanticscholar.org/5015/8bc1a8a67295ab7bce0550886a9859000dc2.pdf)

------
richpimp
Someone please help me understand. I use relational databases every day for my
projects (db, api, frontend). I'm well-versed in SQL and general best
practices, and my projects scale well (admittedly I'm not in the petabytes
level of scale). Does the desire to use NoSQL databases come from a disdain
for having to think in terms of declarative programming instead of imperative?
Is it desirable to have dynamically typed fields in a database? I've always
thought that the biggest motivation to use NoSQL is simply due to high-level
language programmers' unfamiliarity with relational database logic and SQL
syntax. Once I grasped the concepts, the relational model has proven
indispensable for me.

I will say, though, that it is important to get your schema correct at the
beginning. I've also always worked full-stack, so I don't know what it's like
not to have the ability to manipulate the database.

~~~
njharman
> what it's like not to have the ability to manipulate the database.

In general, "not wanting to understand, depend on, or work with other
systems/teams" (often a symptom of dysfunctional eng/PM organizations) has,
over 20 years of experience, been the number one reason devs go for something
new, something they can control, something they can spin up fast and quick.
This is especially true of front-end devs, because they tend to move the
fastest (faster than traditional backend teams are prepared for) and have the
least experience with back-end systems. NoSQL, cloud, anything they can do
themselves.

------
uncle_d
My own experience chimes with this somewhat, having used MongoDB on my last
project - quite often application data storage requirements are relatively
trivial - and it is a boon to do away with the ORM layer and be able to vary
any given object schema without touching the database (although in practice a
.js data migration script may be involved, so this is moot). As a side note
on a local cultural issue: the fact that we can operate the MongoDB servers
ourselves, whereas RDBMS instances sit with a central team, buried under a
layer of bureaucracy, was also probably an operational consideration.

When it comes to joins, these have lately been added to MongoDB, as has SQL -
although the functionality is still rudimentary compared to a mature RDBMS.

But where we experienced pain was when the business decided they wanted to do
live reporting. We ended up piping the data into a SQL Server instance and
using SSRS.

~~~
lyqwyd
The fact that you could operate the Mongo instance but not the RDBMS is
definitely a cultural issue, or more likely a managerial issue. Definitely
NOT a DB issue. I've worked at places where we could manage the DB ourselves,
and other places where we couldn't even talk to the DBAs without bureaucracy.
It's all about the management in those scenarios. And the company results
were as you would expect.

~~~
uncle_d
Sure - I’m not claiming that as a database issue at all. It’s just a side
addendum to note that sometimes technical decisions can have a political
aspect.

------
bcheung
How are people running migrations in these distributed SQL environments,
especially with something like Kubernetes that has canary / rollback
functionality? Do people run their migrations as a manual step, or is it
integrated into the CI/CD system?

I can't find anybody talking about how they handle it in a scalable way. Not
having a good DB migration story for dev/prod environments with automatic
rollout / rollback makes using a SQL DB a non-starter for me.

Most Docker / Kubernetes documentation and their ecosystem assumes the app is
fully developed and development doesn't exist. Real world companies develop
products and have to perform DB migrations on the production database on a
regular basis with each release.

What am I missing?
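
One common answer is the versioned-migrations pattern that tools like Flyway
and Alembic implement: record applied versions in the database itself, so the
migration step is idempotent and can run as a CI/CD step or Kubernetes init
job before the new pods roll out. A minimal sketch (Python/SQLite, with a
hypothetical schema):

```python
import sqlite3

# Ordered (version, SQL) pairs; in practice these live in files in the repo.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE users ADD COLUMN email TEXT"),
]

def migrate(conn):
    """Apply any migrations newer than what the database has recorded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)")
    applied = {v for (v,) in conn.execute("SELECT version FROM schema_version")}
    for version, sql in MIGRATIONS:
        if version not in applied:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)  # running it again is a no-op: nothing left to apply
```

Rollbacks are the hard part this sketch omits; the usual discipline is to
make each migration backward-compatible with the previous app version so a
canary rollback never needs a schema rollback.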

------
pull_my_finger
I keep looking at Tarantool longingly, but I can't figure out when I would
need it.

~~~
davidgould
I have the same feeling. If you ever figure this out, let me know. It looks
very interesting, but I don't think my clients will let me remake their
production with it and it is difficult to evaluate databases without data or a
workload.

------
fogetti
It's hard to understand the point of this article. First of all, it is a good
example of a completely biased rant.

Secondly, after bashing NoSQL through many chapters, it later uses the same
features that NoSQL was capable of in the first place as a selling point for
MemSQL... Seriously? This is a really biased marketing opinion piece at best.

Not to mention that it talks about new types of analytics but completely
ignores (or at least doesn't mention) the category of low-latency streaming
analytics applications like Apache Storm, Apache Spark, Apache Kafka et al.

> "To do this requires a new breed of analytics systems that can scale to
> hundreds of concurrent queries, deliver fast queries without pre-
> aggregation, and ingest data as it is created. On top of that, they want to
> expose data to customers and partners, requiring an operational SLA,
> security capabilities, performance, and scale not possible with current data
> stores"

Guess what! This is what streaming analytics was invented for.

------
fogetti
> "To give an analogy, imagine libraries saying they are doing away with the
> Dewey Decimal System and just throwing the books into a big hole in the
> ground and declaring it a better system because it is way less work for the
> librarians"

No, to give an analogy, imagine a library where the librarian only accepts
books that are below a given size and have a specific color and weight, and
rejects any book that is even a centimeter bigger, a different color, or just
a gram heavier. Also, from time to time (let's say each month) the librarians
sit together to decide what other books they shall accept (sometimes
extending (and by that inherently limiting) the acceptable ranges to other
properties like title or smell). And to complete the analogy, the librarians
also reject books that are returned without their softcover. That's RDBMS for
you. It works, but not everyone is satisfied with such a service.

------
jokoon
Can't you have ACID by writing to RAM first and disk second, delaying the
commit until after the disk write, with some tolerance for the window between
the RAM write and the disk write? How probable is it that a crash happens in
that window? The data loss would seem very minimal and rare.
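
What this describes is roughly what databases call asynchronous commit or
delayed durability (e.g. PostgreSQL's `synchronous_commit = off`):
acknowledge once the record is in memory, fsync in batches, and accept that a
crash can lose whatever sat in the window. A toy sketch of that trade-off
(hypothetical class, not any real engine's code):

```python
import os
import tempfile

class AsyncCommitLog:
    """Acknowledge writes after the in-memory append; fsync in batches.
    Records still in `pending` are the ones a crash can lose."""

    def __init__(self, path, batch=3):
        self.f = open(path, "a")
        self.pending = []
        self.batch = batch

    def write(self, record):
        self.pending.append(record)      # "committed" as far as the caller knows
        if len(self.pending) >= self.batch:
            self.flush()                 # one fsync amortized over the batch

    def flush(self):
        for r in self.pending:
            self.f.write(r + "\n")
        self.f.flush()
        os.fsync(self.f.fileno())        # now actually durable on disk
        self.pending = []

tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()
log = AsyncCommitLog(tmp.name, batch=2)
log.write("a")              # acknowledged, but only in RAM
log.write("b")              # batch full -> flushed and fsynced
at_risk = len(log.pending)  # nothing left in the loss window
```

So yes, the loss window can be made small and the scheme is widely used; the
only thing given up is the D in ACID for transactions inside that window.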

And is ACID always so important for most databases? I don't think most
databases in the world are related to banks or require that many guarantees
and that much safety. Better hardware often achieves that level of safety.

On another side note, I wish there were an alternative to the "English
sentence" flavor of SQL, so that one could give more precise parameters for
querying data, instead of building a sentence-like query which doesn't always
make a lot of sense compared to a non-sentence programming language.

------
beamatronic
If you want a scalable database that holds JSON and lets you query it with
SQL, you should check out Couchbase.

------
commandlinefan
> the need to retrain people

Last I checked, we stopped doing that back in the early 90's - learning new
things is your problem and you better know all of them by next Tuesday, or
we'll replace you with somebody who does.

------
pluma
I'm surprised how many people still conflate NoSQL with "document database"
and only think of MongoDB or CouchDB. I'd love to hear how MemSQL addresses
graph data, for example.

~~~
hinkley
Because when the fox has been in the henhouse this long, it’s a fox house, not
a henhouse.

------
stephenr
> NoSQL came into existence because the databases at the time couldn’t handle
> the scale required.

Arguably the first "NoSQL" database was a DIT, generally accessed via the LDAP
protocol. OpenLDAP is one of the better known open source instances, but
Novell eDirectory was there in the early part of this century, as was MS AD.

In the case of Directory Services, it's a completely different approach to
both SQL and the common "Document Store" approach of modern "NoSQL" options.
Unless you go cowboy-mode, you're still adhering to schemas (albeit with the
option to combine schemas), you still use indexed attributes to conduct
searches, but you have a literal tree of objects with a more flexible layout
(e.g. multi-value attributes without needing to use a JSON column) than with a
regular SQL RDBMS.

The rise in popularity of 'modern' NoSQL options (most commonly document
stores) is IMO driven by a combination of lazy developers and too much Kool-
aid consumption.

~~~
logophobia
I would say the rise of NoSQL options is driven by the need for scalability
at a few large companies that desperately need it, and cargo-culting by
developers who don't actually need it but want to be like Google and don't
want to take the time to properly learn and optimize SQL.

There's a genuine need for good scalable options, and with stuff like Google
Spanner, those don't necessarily need to be "eventually consistent" or NoSQL.
The simpler data models were probably easier to create and solved the problem
at the time; now I think there'll be a rise of scalable and consistent
databases, the best of both worlds.

~~~
189723954
>don't want to take the time to properly learn and optimize sql.

This sounds like an old wives' tale at this point.

~~~
matwood
> This sounds like an old wives' tale at this point.

Except it's not. I've seen it in person, over and over. I've gone into
systems where people were complaining that "the database is slow", but there
were no indexes. I've seen systems pull back hundreds of thousands of
records, sort them on the app server, and take the first 50. I've seen what
would have been a simple join with an exists clause turned into many round
trips to the DB, with loops and app-code complexity, where a query would have
been 100x easier to reason about.

It's not even about _optimization_ yet, but taking the time to learn even the
surface capabilities of the tool.
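
The "fetch everything and sort in the app" case is easy to demonstrate. A
small sketch (Python/SQLite, with a hypothetical `events` table): both
approaches return the same 50 values, but one lets the database use its index
while the other ships all 100,000 rows to the application first.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, ts INTEGER)")
random.seed(0)
conn.executemany("INSERT INTO events (ts) VALUES (?)",
                 [(random.randrange(10**6),) for _ in range(100_000)])
conn.execute("CREATE INDEX idx_events_ts ON events (ts)")  # the fix people skip

# Anti-pattern: ship 100k rows to the app, sort there, keep 50.
all_rows = conn.execute("SELECT ts FROM events").fetchall()
top_app = sorted((ts for (ts,) in all_rows), reverse=True)[:50]

# Let the database do its job: with the index this touches ~50 entries.
top_db = [ts for (ts,) in conn.execute(
    "SELECT ts FROM events ORDER BY ts DESC LIMIT 50")]

assert top_app == top_db  # same answer, wildly different cost at scale
```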

------
pavbelshippable
Perfect! We at Shippable moved from NoSQL MongoDB to PostgreSQL for several
reasons. Here is a small story.

It started with small problems...

Even though we had the ability to add features at a lightning pace, we
started seeing occasional downtimes which always seemed to come down to
MongoDB. For instance:

> We were very happy to have 24x7 availability with primary and secondary
> instances of MongoDB. However, our perf suddenly deteriorated one day and
> retrieval started taking more than a second per document. We tried using
> many tools and profilers, but could not figure out what was happening.
> Finally, we rebuilt a new server, switched that over as primary, and rebuilt
> our secondary. Retrieval times dropped to 150ms again. This is still an
> unsolved mystery!

> Our Mongo instance reached 4TB and we were proud of our growing adoption.
> Due to the lack of tooling around managing large DBs, we relied on indexes
> to keep the search times low. When NoSQL DBs first became popular, there was
> no way to create uniqueness, so these features were built as an
> afterthought. Some of the bloating of our MongoDB was actually due to
> indexes, but rebuilding them was primitive and the entire DB would lock
> down.

> At one point, we needed to reboot our DB server and it took MongoDB 4 hours
> to come back online. This led to an extended downtime for our service, and
> we had very little visibility into the MongoDB process or status.

And then came the knockout punch! The biggest advantage, and disadvantage, of
MongoDB is that it has a flexible schema. This means that documents in the
same collection (aka table in the old world) do not need to have the same set
of fields or structure, and common fields in a collection's documents may hold
different types of data. In a nutshell, there are no strict schema rules and
this opens it up to a lot of cowboy tinkering. While many developers love the
flexibility, it also puts a very high degree of responsibility on their
shoulders to get things right.

We have written up the whole experience on our blog:
[http://blog.shippable.com/why-we-moved-from-nosql-mongodb-to-postgressql](http://blog.shippable.com/why-we-moved-from-nosql-mongodb-to-postgressql)

~~~
nojvek
Almost everything has a schema, whether you explicitly write it down or keep
it in someone’s head.

If you really want schemaless, then use a JSON type in a relational DB.
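
That suggestion is workable in practice: most relational engines now ship
JSON functions (Postgres `jsonb`, MySQL `JSON`, SQLite's JSON1). A small
sketch with SQLite, assuming a build with the JSON1 functions compiled in
(the default in modern builds):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schemaless documents stored as JSON text in an ordinary relational table.
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("""INSERT INTO docs (body) VALUES
    ('{"name": "ada", "tags": ["math", "computing"]}'),
    ('{"name": "grace", "tags": ["navy"]}')""")

# Query inside the schemaless column with plain SQL.
names = [n for (n,) in conn.execute(
    "SELECT json_extract(body, '$.name') FROM docs "
    "WHERE json_extract(body, '$.tags[0]') = 'math'")]
```

You keep transactions, joins against the structured tables, and SQL, while
the flexible part of the data stays flexible.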

Also, with huge memory and disk machines, one could go multi-terabyte with a
3-machine cluster on a relational database.

