Databases 101 for the Non-Database Professional

dhd415 · on July 5, 2018

I'm not sure articles like this one do non-database professionals any favors by associating unrelated concepts that are often found together such as relational databases and ACID compliance. There are both relational databases that are not ACID-compliant such as MySQL with the ISAM storage engine and non-relational databases that are ACID-compliant such as Spanner, FoundationDB, etc. The data model doesn't imply one or the other at all.

mwexler · on July 5, 2018

Every time I see someone compare ACID approaches to BASE approaches, first thing that comes to mind is the struggle it took to come up with words to match the BASE acronym and force it into being. More examples of such somewhat-cleverness at https://www.reddit.com/r/forcedacronyms/ (mostly SFW).

mamcx · on July 5, 2018

This still fall in the trap of paint RDBMS as "scary" and more "complex" to use:

> The truth is that relational databases scale up just fine when placed in the hands of people that (1) know what they are doing, (2) aren’t afraid to learn new things, and (3) don’t mind doing the work necessary

This is true FOR ANY KIND OF DATABASE.

In fact, a RDBMS is far easier to use because it TAKE CARE OF THE HARDEST THINGS OF COMPUTER DEVELOPMENT. Good luck for the average joe to understand ACID and try to implement it on top of Mongo.

RDBMS are for everyone NoSql are for Pros.

JackCh · on July 5, 2018

Most RDBMS's got a well deserved reputation for "scary and complex" because they are enterprise grade software serious enough to have an entire career named after their administration. Much of the resources you'll find about these databases is written with the assumption that the reader is in that career track, which often isn't true.

This of course has nothing to do with the relational model, and teaching a newbie to use sqlite is a good way of demonstrating that.

mamcx · on July 5, 2018

This is true. Ironically Access and/or FoxPro was a good way to introduce people to it.

jchanimal · on July 5, 2018

As someone who participated in building the first wave of NoSQL, I have to agree with the sentiment here. Except tabular inflexibility hits everyone.

I wrote a blog post about how FaunaDB offers the best of relational and NoSQL models here: https://blog.fauna.com/unifying-relational-document-graph-an...

mamcx · on July 6, 2018

As hobby I'm building a relational language, and think the problem that the RDBMS folks have overlooked is how operate in partail relations (like slices or tables with less + more fields). You are set with the initial schema and that is. Is not easy to store arbitrary portions of relations, not is even easy to operate in-memory instead.

Plus, the fact the DB is far from the code miss a lot of oportunities.

In my mind, I think the solution is not to have a "database language + storage far away" but code all the app with a embebed db.

Is not as weird as it sound. This is how we operate with foxpro.

VLM · on July 5, 2018

One important topic missed is time. MySQL default table engine was non-ACID MyISAM until 2010, the internet is chock full of pre- 2010 era blogposts authoritatively declaring mysql can never be ACID compliant which is doubly wrong because InnoDB was added around 2010 so I guess its possible to have an eight year old untouched install where innodb is available and possibly in use, but not default. Mysql can't do transactions is another decade-obsolete mantra that is very common for same reason.

Another topic not mentioned is scalability and hardware reqs change over time. Something that would have required a large database cluster and Oracle and a huge pile of money in the 90s or some strange nosql in the 00s can be done today on sqlite on a single ras pi. You have to trace the requirements backwards all the way. "We have to do something exotic unreliable and complicated in 2018" is usually a short version of "We have to do something exotic unreliable and complicated in 2018 because we spec'd how to do that in 2003, but we'd never do that again in 2018"

A topic not discussed is the dreaded anti-pattern of internal platform. Make your DB and the paperwork around it complicated enough, your distant devs will take a dump from the DB and emulate a truly awful and buggy simulation of a real DBMS in javascript or whatever. World is full of devs doing "Select * from ..." then writing their own un-indexed WHERE clause in their frontend UI, for example. I've seen people try to be better at sorting than the DB, cringy stuff. Just saying whenever you have a DBMS its a natural human desire for devs to write an inner platform DBMS emulating things that should be done in the DB.

The final topic I have is busywork expands to fill all available hardware and a constant DBMS problem is people will max out any DBMS, no matter how large, with meaningless trivialities if you have the spare gigs and cycles. And you'll never get an honest answer from business users, everything is critical means nothing is actually critical. Those are business anti-patterns all DBMS need to fight against regardless of technology, and are present at all DBMS installations.

Its possible I missed some of those topics if they were in fact in there.

collyw · on July 5, 2018

MySQL is usually shambles still. I am pretty sure that Django defaulted to MyISAM until later than that. Or maybe it's that most installs were not on the most recent version until well after the 2010 release. I have had some really crappy problems to fix because of tables being a mix of MyISAM and InnoDB, tables being locked because one type has a foreign key to another.

Then there are the default string settings....

Agree with you on crappy devs essentially doing NoSQL in the application layer though. I am starting to hate this industry, where chasing the new trend is becoming more valuable than learning to do things well.

VLM · on July 6, 2018

Your string comment reminded me of another thing, some DBMS have a philosophy of consistently storing data, like MySQL, and others have a philosophy of storing consistent data, like postgresql, and people often confuse the two and run into horrible business rules problems when they intermix.

Consider the idea of a column type "date". Is that supposed to merely be a string format and search and sort key like (old?) mysql or should a date column be something the dbms tries (and usually fails) to authenticate? For an example of a problem with the latter, there are Gregorian calendar dates that do not exist or exist depending on the locale of the data (or the locale of the DBMS, or the locale of the table, or maybe the locale of the query session?), what should a hyper-authoritarian DBMS do to be strict enough? Will there be a leap second on the last day of 3210 AD and if we later decide not, what should happen to existing stored timestamps for that leap second, just core dump the DBMS every time you try to start it?

verletx64 · on July 5, 2018

Is saying that DynamoDB is a columnar database rather than a K/V database splitting hairs?

wmwm · on July 5, 2018

“Error establishing a database connection”