
Things I wished more developers knew about databases - zdw
https://medium.com/@rakyll/things-i-wished-more-developers-knew-about-databases-2d0178464f78
======
chaps
Here's a fun bug I had a few years ago -

Had a postgres database which was using pgbouncer for connection pooling. The
most senior developer (24yo or so) we had on the project was using Go to
connect to the database to write some simple reports, but each report took
hours to run, and often had to sleep for 30+ minutes. So, after a while,
pgbouncer would kill their connection, and their report would die. No other
application did this among the many that we had connect to that DB, so it was
definitely strange.

Found out pretty early on in troubleshooting it that they had no mechanism to
keep the connection alive, which makes total sense for why his app died. So,
they put the library standard keepalive function in a loop if the report
wasn't doing anything.. but that didn't fix it.. it made no friggin' sense.
After bashing my head against that for a while, I finally threw my hands up
and asked if they could just run a "SELECT 1" as a keepalive instead of
whatever the Go library was doing. Got a bit of pushback, but just told him to
do it and walked away. That ended up fixing the problem.

Turns out the Go library was trying to be clever in its keepalives (can't
remember what it was doing exactly), in that it made some silly assumptions
that there was nothing in the middle managing connections.

I like to think that dev learned a lot about trust in "magical" libraries
after that.

~~~
kerng
Go code often reinvents/reimplements a lot of things from scratch,
reintroducing problems that have been addressed long ago in other systems.

It's like this new trend, let's rewrite everything in Go to be cool.
Financially makes little to no sense.

~~~
inopinatus
Let’s not just target Go with that sentiment, it applies almost universally,
just in varying degree.

Counterpoint: how is anyone supposed to learn, if not from their mistakes? We
might worry about the blast radius, but there’s no compression algorithm for
experience.

~~~
kerkeslager
Well, for one, you can learn from other people's mistakes, which is better
than learning from your own, because then you don't have to feel the pain.

Go is a deserving target for this criticism because the language itself
deliberately made a lot of the mistakes other language communities made and
learned from, like weak typing[1] and naive garbage collection algorithms.
Literally if you opened an undergraduate textbook on either topic you'd see
much better ways to do things. But early adopters argued vehemently that Go
was simple and didn't need those things.

It does seem like Go is learning from their mistakes here: they've introduced
precise garbage collection and it seems like some form of generic or template
types are inbound in the next few releases. Perhaps in a few years Go will be
a language I am willing to work in. But it would have been nice if a new
language which already had these problems worked out had become popular,
instead Go, which has reached popularity through hype rather than technical
merit[2].

Tracking the history of template/generic types has been somewhat humorous: you
can almost see it in this article[3] where the author starts in with the title
"Who needs generics!", goes on to describe some frankly _horrible_ ways to get
around the lack of generics (it's amazing how complex Go's simplicity can be!)
and finally backpedals in an update ("I am the last one to balk at the idea of
generics in Go."). I don't mean this to be picking on the author here though--
I've seen this history played out on other blogs and in the comments of Hacker
News as well.

[1] Before you flame me on this, ask yourself if you can articulate the
difference between strong and static types, because if you can't, you don't
have the prerequisite knowledge to have an opinion on this.

[2] It's worth noting that the decisions made in Go probably have merit within
the context of Google. The problem is that most Go users aren't at Google, and
have different problems than Google, so the tradeoffs made by Go are
nonsensical for their use cases.

[3] [https://appliedgo.net/generics/](https://appliedgo.net/generics/)

~~~
inopinatus
The question was and is rhetorical. People don’t inwardly digest the mistakes
of others. And anthropomorphising a language? Most peculiar.

None of this makes NIH a less than widespread phenomenon.

The readers of this forum often do know their type theory. Gatekeeping
otherwise won’t go over well, it just reads like an arrogant insult from
someone utterly lacking in self-awareness and accustomed to presuming
themselves the smartest person in the room with the only relevant opinion.
Ironically, given the subtopic, much like Google often does.

~~~
kerkeslager
> The question was and is rhetorical.

The fact that you ask a question not expecting a direct answer is not proof
that a direct answer does not exist.

> People don’t inwardly digest the mistakes of others.

In my spare time, I'm a rock climber, and mistakes in my rope systems can kill
me. The same is true in mycology, firearms, airplane piloting, civil
engineering. If you really feel that you can only learn from your own mistakes
then I guess it's lucky for you that you've chosen to learn in a field where
the stakes aren't life and death.

> The readers of this forum often do know their type theory.

That's true. The same is not true for the many Gophers who repeatedly claim
that Go has a strong type system, which is who that comment was directed at.

~~~
inopinatus
No, but it’s a trap. If someone answers an obviously rhetorical question,
they’re inadvertently demonstrating a predilection for engaging the
construction, not the substance, of a statement, and almost certainly missing
the ironic subtext.

I’d be happy to repeat my assertion though. People don’t inwardly digest the
mistakes of others, which is why educators on safety-critical topics such as
those mentioned must go to extraordinary lengths to extract and convey the
salient teachings, translated into better practices, drills, equipment etc.

Reading the archives of the Dropzone fatalities database, for example, won’t
make me a better skydiver.

Conversely, the best structured educational processes I’ve experienced are
essentially offering the student the opportunity to make their own mistakes,
but under circumstances that don’t have consequences (other than pedagogical
or scholastic)

~~~
kerkeslager
> No, but it’s a trap. If someone answers an obviously rhetorical question,
> they’re inadvertently demonstrating a predilection for engaging the
> construction, not the substance, of a statement, and almost certainly
> missing the ironic subtext.

That's a pretty self-aggrandizing analysis of the situation.

From my perspective, I got you to make the statement, "People don’t inwardly
digest the mistakes of others", which sounds a lot more absurd when you
actually say what you mean plainly instead of hiding it in rhetoric.

Maybe _some people_ go through life that way, but that's a pretty poor life
strategy and I personally make a pretty big effort to learn from other
peoples' mistakes. Maybe I haven't been successful always, but I can point to
lots of examples of where I have.

~~~
inopinatus
> ? "I got you to make the statement" ?

That's quite the signal of bad faith debate. I don't think it's my own
aggrandizement in play here. Quite the reverse. c.f. remarks passim re.
hubris. So there the conversation must end.

~~~
kerkeslager
The guy who thinks he "trapped" me with a rhetorical question accuses me of
arguing in bad faith? Okay...

All I did was get you to say clearly what _you_ believe. If what you believe
is so embarrassing that it's a sign of bad faith to get you to say it in
clearly, maybe believe better things?

------
falcolas
(The 80/20 rule applies below, some developers do care)

Developers... just don't care. They want to spin up an ORM, point it at a URI,
and forget about it.

I've fought this for over a decade now as a DBA, SRE, DevOps, and architect.
Most of the developers don't want to deal with anything infrastructure-wise;
they want to spend all the time they can just focusing on the problem they're
writing software to solve.

Observeability, reliability, scalability - these are all words that are
translated into either "someone else's problem" or "unproductive busywork" in
their minds.

~~~
vsareto
Many interests are pulling developers' attention in several different areas
all the time.

Database, security, accessibility, performance, infrastructure, tooling and
productivity, business concerns, workflow processes (agile), language
concerns, new things

All of these like to say "if only the developer could do $MY_AREA better,
they'd be better developers and we'd have better software". Each of them wants
to pile on more requirements for what devs ought to know.

Let's say we agree that devs should know all of these things. After you tally
up every area's demand, you're probably looking at a 10 year timeline before
someone's decent in all of these areas by spending approx. 1 year on each of
these areas doing meaningful work (as opposed to contrived tutorials).

Even then, things change constantly, so you'd have to continually practice all
of these things. Who knows if we'll add a new category next year? Then it
becomes an 11 year timeline.

At some point, developer responsibility has to stop.

~~~
debaserab2
I don't agree. If you're designing data structures in a code base you shoulder
some of the responsibility for the persistence characteristics of that data.

There's a lot of devs that think database design is the same as starting a new
ORM class and generating a migration file.

> Database, security, accessibility, performance, infrastructure, tooling and
> productivity, business concerns, workflow processes (agile), language
> concerns, new things

Yes, these are all things that devs should strive to know as much about as
possible. Software isn't easy. It takes a long time to become an expert. 10
years sounds about right.

> Who knows if we'll add a new category next year?

Skill domains do come and go, but I think the ones you've listed are solid
staples of web development for the past decade and likely will still be for a
decade more.

~~~
vsareto
> I don't agree. If you're designing data structures in a code base you
> shoulder some of the responsibility for the persistence characteristics of
> that data.

There's usually not a relationship that goes the other way though, for
example, developers don't tell DBAs to pick up code so they can write the
models in our language in addition to the underlying SQL. This highlights a
trend of increasing responsibilities pushed onto the developer.

>Yes, these are all things that devs should strive to know as much about as
possible. Software isn't easy. It takes a long time to become an expert. 10
years sounds about right.

But in some cases it's specialists designating what an expert developer should
know. It's giving away control in some respects. This turns into new job
requirements and a higher barrier for entry. The growth will need to stop at
some point.

~~~
AsyncAwait
Agree with this. There's the whole DevOps nowdays as well, which basically
just shifts what used to be an entirely separate, full-time, role onto the
developer. Adding DBA to that sounds like it would benefit noone, except
perhaps business owners looking for short-term savings at the expense of
productivity, ala open-office floor plans.

~~~
pjmlp
In most places I have worked, knowing the database well enough to manage it
and at very least write stored procedures has always been expected from
developers writing database related code.

I have been doing this for a little more than 30 years now, so it isn't
"nowadays" thing.

~~~
AsyncAwait
Sure, knowing your database well enough to "manage it" in terms of indexes,
bloat, writing procedures, that's a developer responsibility.

What is "new", is developers having to manage the DB backups, k8s clusters,
Docker images, Terraform, load balancers, CI/CD, proxies etc.

This used to be a role for a whole another role. Some places do have an "Ops"
role, but most tasks are still expected off the developer, with Ops at best
providing a helping hand, rather than taking over a task completely.

------
wenc
I never realized this before but many excellent developers struggle with SQL
beyond simple SELECT statements. I have a colleague who is by all accounts a
deeply technical person but one day he confessed to me that he didn't really
grok SQL and that he'd rather work with a "real" procedural programming
language to just store and retrieve data.

Part of it may be due to the fact SQL isn't really a programming language but
a declarative DSL for manipulating sets and tables. Things like GROUP BYs and
PARTITION BYs (window functions) that come naturally to mathematical
types/functional programmers are less intuitive to procedural programmers.

I suspect this was what attracted developers to noSQL databases like Mongo in
the first place -- it's more attuned to a programmatic mindset.

(this is not universally true of course -- many programmers have no issues
with SQL at all.)

~~~
brightball
I firmly believe that every developer should spend 2-3 weeks early in their
career working with nothing but SQL. It will pay huge dividends for the rest
of it.

IMO a lot of the issue is that developers for many years using Java or PHP,
were using SQL to handle everything. The application language was a pass
through later between the client and the database.

Your goal was to accomplish as much as possible in a single query and then to
simply return the results of that query to the interface. That meant
formatting numbers or currency in your SQL. Optimizing inserts or updates to
be handled in a single query. Grouping, counting, left/inner joins, having
statements to filter on aggregate results. More than 1 or 2 queries for the
primary area of the screen was both a rare and foreign experience.

And then ORMs started to slowly integrate themselves into the flow of various
frameworks to automate the repetitive things around CRUD tasks. Then to
address scaling & bloat problems we saw an uptick in REST APIs, microservices
that further made those ORMs the norm...and then many developers started
actively trying to stay within those API constraints to an almost religious
degree which led to a nested payload becoming acceptable fueling the whole
"NoSQL" situation, along with the idea that it was somehow better to repeat
the same data thousands of times over.

A whole lot of people pushed back against this and eventually, it mostly ran
its course. I've often seen resistance to SQL to be driven by _fear_ of SQL
more so than anything else. As soon as people get a basic comfort level with
SQL, it become almost automatic.

~~~
TremendousJudge
> The application language was a pass through later between the client and the
> database.

This style of doing things resulted in spaghetti style unmanageable databases,
filled with an unknowable number of triggers and procedures, all written in
PL/SQL (which is much, much worse than either Java or PHP). The reason why
ORMs started to become popular is that you can write your application without
filling your DB with arcane and inscrutable logic

~~~
wenc
There's a middle way which is very powerful: SQL views (just SQL queries; no
triggers or procedures)

Here's a powerful mindset trick: think of _SQL views as an sort of a REST API_
, but whose access language is SQL and not HTTP, and that returns data in a
table rather than JSON (hierarchical).

I once tried to build a REST API to a database, and someone told me I already
had a battle-tested and highly performant API that outperformed REST at scale
-- it's called SQL. A SQL view is a dynamic lens into the underlying tables,
so even if the underlying tables/schemas were to change, your consumers don't
care as long as they can access the SQL View.

SQL views are also composable: you can build SQL views on top of other SQL
views, and any changes made in the base views are propagated throughout. Need
to add/transform a field? Do it in the view. Need pull in auxiliary data?
Bring it in through a JOIN in the view. I've built many systems by composing
SQL views and they're very maintainable and very flexible. They're kind of
like function compositions but on tabular data.

The rule of thumb is: always access a database through a view, never the
underlying raw tables. In computer science, a great many maintainability
issues are alleviated through a layer of abstraction/indirection, and SQL
views provide exactly that.

This centralization of the core logic becomes especially powerful if the
database is accessed from multiple consumers (webapps, analytics backends,
Tableau, ML tools, etc.) The "API" remains consistent throughout.

~~~
jwdunne
This is a very interesting comment!

Two questions:

Do you have any example code that shows how this works? I get what you’re
saying intuitively but example code will help me bring it to table.

What about cross cutting concerns? I’ve found stored procedures to be a
performant solution here. By version controlling them, and limiting to pure
functions, I found them quite maintainable. Would you instead just define a
new view, or extend an existing one, or refactor into a separate view that’s
then joined into the existing views?

I haven’t delved as far as views, admittedly. One app featured a bit of
complicated logic and eschewing the ORM in favour of raw SQL helped (instead
of getting tangled up in Demeter chains). Despite new developers, who have
used purely ORM for years, shitting their pants at the raw SQL, both of us who
worked on it felt it was the right call. We feel much better about leveraging
more of the database in new projects.

In fact, when we took our experience to a Django project, my colleague wrote a
Manager method in such a way that an ORM favouring developer questioned
because it looked too much like SQL. But it was the obvious implementation to
us after using raw SQL. And, after benchmarking, the most performant.

~~~
wenc
Briefly,

1\. Let me try with a simple example. Suppose you have a fact table A with
fields (ItemID, Item, Amt) where Amt is in USD. Rule of thumb is: don't expose
A to the consumer; instead write a SQL View V_A and expose that instead:

    
    
      CREATE VIEW V_A AS SELECT ItemID, Item, Amt FROM A
    

Then suppose a European counterpart wants to use the same API but needs the
amounts to be in Euros. You can write another view: (in practice the
conversion 0.92 shouldn't be a static number, this is just for illustration)

    
    
      CREATE VIEW V_A_EURO AS SELECT ItemID, Item, Amt * 0.92 AS AmtEUR FROM V_A
    

Expose this to the Europeans. You can keep stacking views on top of other
views. Your U.S. consumers will always see the data through the lens of V_A
and your European consumers will always see it through V_A_Euro.

Suppose the underlying table A now changes. There's been a merger and the
company now stops reporting currencies in USD, and everything is now in
British Pounds so your DBA adds a field AmtGBP and starts populating that
field instead. Amt still contains historical data, but moving forward the data
in Amt will be NULLs; AmtGBP is the new internal baseline currency. From a
VIEW perspective, all you have to do is:

    
    
      ALTER VIEW V_A AS SELECT ItemID, Item, ISNULL(Amt, AmtGBP * 1.23) AS Amt FROM A
    

Your V_A and V_A_EURO consumers (could be Tableau, Excel, other SQL views,
etc.) will still happily receive data per usual, unaware of the internal
changes (the British are coming!) that have occurred. Contract kept.

    
    
      Table A <- View V_A <- View V_A_Euro
    

2\. Cross cutting concerns come in many forms so not sure if I can address.
Stored Procedures are definitely an acceptable abstraction -- they accept
parameters and can return tabular results just like VIEWs. They do however
work in a procedural manner (like subroutines) and can produce side effects,
which is sometimes necessary to accomplish very specific tasks. VIEWs on the
other hand are more similar to pure functions (unless random number generation
is involved) with no side effects. Because views are dynamic, they flex with
your data and VIEW definitions.

~~~
Izkata
There's another step that could be added there, too: After the ALTER VIEW, V
could be slowly incrementally updated over however long you need to back-
populate AmtGBP, and the views will continue to just work the whole time. Once
done, V_A can be simplified to remove the ISNULL and Amt, then Amt dropped
from V. That way you don't get build-up of cruft over the years, and the
experience isn't interrupted for the migration.

(Possibly a bad idea for currency conversion for various reasons, but just
wanted to mention it since this type of migration may be just right for other
data)

~~~
jwdunne
Is there anything you recommend for handling SQL definitions in version
control, development and production envs?

For production, I created a command on the app that loads the stored
procedures into the DB idempotently on each deployment/configuration. This
won’t work if the app server scales but allowed us to store stored procs in
VC.

For development, we ran the command on each page load as a sort of hacky “live
reload”. It didn’t work well (which highlighted the issue with scalability in
production) because Postgres, fairly, doesn’t like parallel redefinitions of
the same stored proc.

I’m not sure how best to automate this. For production, seems like a case of
running a command once per DB server.

And in development, using a fs watcher that loads changes in.

But I don’t know, this is new territory for us and I couldn’t find anything
out there to manage it within the context of a web framework. Perhaps I’m
searching for the wrong thing.

~~~
skdv
Web frameworks like Rails/Django use the idea of migrations to make changes to
the database. The idea is that you have a set of migration scripts like:
migrations/1765_create_table_users.sql migrations/2891_store_procedure_x.sql
migrations/5892_change_store_procedure_x.sql

(.sql/.rb/.py, it doesn't matter).

And you have a "migrations" table in your database that contains the numbers
of the migrations that have been run:

    
    
      select * from migrations;
          version
      ----------------
       1765
       2891
    

Every time you deploy to production automatically check which scripts in your
db/migrations folder don't exist in the migrations table and run them. (In the
current example, you'd run the 5892_change_store_procedure_x.sql that hasn't
been run yet).

How to do with functions/procedures?

You commit the function definitions in a functions folder to your version
system like:

    
    
      db/functions/report_x.sql
      CREATE or REPLACE function report_x() returns ...
    

When you change this file, nothing happens, you need to create a migration to
re-run this code once. In rails migrations would be:

    
    
      class UpdateReportXFun < ActiveRecord::Migration[5.2]
        def up
          execute File.read(
            Rails.root.join('db','functions','report_x.sql')
          )
        end
      end

~~~
jwdunne
Yeah, I’m aware of that, thank you. I was wondering if there was a way with a
faster feedback loop and allowed for bug fixes without creating a new
migration.

~~~
skdv
You don't need to write the migration until you're done. It's possible to have
a very tight feedback loop in any case.

I'm doing a lot of work in a Rails codebase where I edit
views/functions/procedures all the time. My setup is quite usable.

My current setup: I edit those .sql files and run them with psql in my local
while developing (without writing any migration yet).

I have some like this running on one screen to make sure the modified files
are executed by psql immediately as I change them (you could use `guard` too):

    
    
      find ~/projectx/db/functions -type f -name "*.sql" | entr -d -p psql db_name -f /_
    

and I edit the db/functions/*.sql files freely, adding things, changing
behaviour of functions and they are updated on the fly. (I can run tests -or
try things in the browser- to verify my changes work as I expect).

\--

Once I finish and I know everything is great, I just add the migration. The
migration is simply an indicator of which files I've modified and to specify
the right order to run them (which is useful if they are dependencies), like:

    
    
      # migration
      def up
        execute File.read(function1_sql_file)
        execute File.read(function2_sql_file)
      end
    

I could have an alias that automates generating that migration but it's just 4
lines...

[ I'm also using pgTAP to write tests for functions, it's quite nice :) ]

~~~
jwdunne
Oh wow, now I see what you mean. Thank you! That’s work great. I wasn’t aware
of ‘entr’ either, that’s exactly what I had in mind!

I’ll have a look at pgTAP too. Naturally we want to test in CI, I can see this
working really well. I did look at myTAP too, since we have a few MySQL
instances.

------
a_c
Learn about modelling. Database is more than just storing data. Drink less
koolaid of NoSQL, any NoSQL. It is trading initial result with future
development time. SQL has been battlefield tested. No amount of "convenience"
is more convenient than learning the fundamentals.

~~~
strbean
I'm disappointed that there isn't more criticism of the SQL _language_. The
whole NoSQL buzz got me excited, then turned out to actually mean
NoRelational.

It is wild that we are still using a language that looks and feels like COBOL,
and any criticism is met with drive-by disapproval (downvotes and no comments)
or an argument about why relational databases are important.

SQL is a deeply flawed language by standards that are pretty much universal
today - we on HN discuss them daily on posts about new programming languages.
For example, one of the top level comments here:

> In SQL it boils down to the fundamental syntactical requirement to put the
> SELECT clause before the FROM clause. So I have to build up my statement in
> this weird spiral pattern where I change something deeper in before I know
> what I can SELECT in the first place.

If this were the case for a new language post on HN, the author would get run
out of town for this reason alone. And yet, the person who wrote that comment
feels the need to hedge by saying they are "an SQL hater in remediation".

Relational data does require a mode of thinking that many programmers are not
practiced with. Wouldn't it help to have a language for working with it that
isn't outright terrible?

~~~
a_c
SQL stood the test of time. SQL is widely adopted. Once you get the hang of
it, it can be applied on a wide range of RDBMS.

I would argue that countless productivity has been lost to learning yet
another query language for yet another NoSQL db. Mongo has its own query
language. Cassandra has its own. Neo4j has its own. What not. Guess what, few
engineers need these to solve their actual problem. Be it building an
application, a library, a SaaS, a tool with wide database support. The yet
another query language is an imaginary solution to an imaginary problem (most
of the time). Your problem isn't big enough to use whatever NoSQL of choice.

When developers starting out, they get attracted to technologies with the most
marketing money. They beat around the bush of learning things that might not
matter anymore 5 years down the road.

Is english the best language? No, not even close. Should one learn it?
Probably.

Is esperanto a better language? I don't know, maybe. People invent it for a
reason (a problem to solve) after all. Should one learn it? Probably no.

~~~
strbean
> SQL stood the test of time. SQL is widely adopted.

I'm highly dubious of "it is widely used so there" arguments. I think in the
case of SQL, the fact that ORMs and SQL generators are ubiquitous are evidence
that a huge proportion of engineers would rather not write it, and part of why
it is still widely used (most of the time, most people can avoid actually
touching it).

> I would argue that countless productivity has been lost to learning yet
> another query language for yet another NoSQL db. Mongo has its own query
> language. Cassandra has its own. Neo4j has its own. What not. Guess what,
> few engineers need these to solve their actual problem.

This is exactly one of the points I addressed in my comment... none of those
are even relational databases. And this is where the discussion ends up every
time someone says SQL sucks.

> Is english the best language? No, not even close. Should one learn it?
> Probably.

The vast majority of the time, when you are using English, you have no control
over the receiving end. The vast majority of time, when you are using SQL, you
/ your org grabbed an SQL RDBMS to use.

Furthermore, I don't see why a modern, reasonable query language couldn't
transpile to SQL easily when necessary, making all path-dependency / adoption
arguments void.

------
mattferderer
> AUTOINCREMENT’ing can be harmful

I'll add that they should never be trusted to not jump around either! I
imagine everyone makes this mistake at least once in their life.

There is a very high chance that the database will skip a few numbers from
time to time. You will then have someone from an accounting department asking
where Record #XX is.

~~~
hobs
There's a simple fix - Don't you ever expose primary keys to the users. Ever.
Seriously, ever.

A primary key is not an order id, it isnt a person identifier, it isnt a
paycheck - its a thing the database should be using behind the scenes.

All of the things I just mentioned change - besides the primary key.

You'll never have this problem if you separate your business logic from your
keys.

~~~
Macha
This is why I'm skeptical of the suggestion to prefer a natural primary key
like a username. It works fine... until they day the business asks for
changeable usernames because BigClient is now LargeClient and can't stand
anything to still have their old brand identity.

~~~
hobs
Even in the natural key's playground (in my mind) of data warehousing, you
still have to manage changing dimensions, and the inevitable reality that your
fact table was actually another dimension table all long.

------
Jaruzel
Since when did Medium.com offer to log you in with your Google account like
this?

[http://www.jaruzel.com/files/medium-
google.png](http://www.jaruzel.com/files/medium-google.png)

I've got uBlock and Cookie Cleaner running - but it's clearly doing some sort
of cross-site shenanigans to create this pop up.

Time to stop visiting Medium.com I think.

~~~
benatkin
Since a long time ago. This is old news. iFrame sandboxing is now a two-way
street, depending on configuration. Not only can an iFrame be prevented from
accessing the parent frame, a parent frame can be prevented from accessing the
child frame. It is still vulnerable to clickjacking, but to reduce the impact
of that, using one-tap sign up only allows the most basic Google permissions.
[https://news.ycombinator.com/item?id=17044518](https://news.ycombinator.com/item?id=17044518)
[https://developers.google.com/identity/one-
tap/web](https://developers.google.com/identity/one-tap/web)

------
Dowwie
"The fastest way to access to a row in a database is by its primary key. If
you have better ways to identify records, sequential IDs may make the most
significant column in tables a meaningless value. Please pick a globally
unique natural primary key (e.g. a username) where possible."

Has anyone had a problem due to surrogate keys?

~~~
ComodoHacker
I can agree with everything in the article except this one.

>Has anyone had a problem due to surrogate keys?

There's one problem with surrogate keys: they are not convenient to users (too
long and not meaningful).

There are two problems with natural primary keys, and you are guaranteed to
hit one of them at some point.

1\. It turns out your key isn't actually unique. To resolve the collision you
have to replace natural key with a surrogate for one of conflicting entities.
Which is not always possible without risk of another collision.

2\. It turns out your key isn't persistent. You have to change it for some
entities, but you can't because of so many FKs.

~~~
brantam
The article mentions auto-incrementing keys, not surrogates. Not the same
thing at all. Not all incrementing keys are surrogates and not all surrogates
are incrementing keys.

Also, your problem number 1 is a problem whether the natural key in question
is the "primary" one or not. Certainly if you choose the wrong natural key
then you'll have to fix that - that's why you should take care to make a wise
choice of natural key regardless of whether you are also using a surrogate.

------
hermitdev
> You are lucky if 99.999% of the time network is not a problem.

This reminds me of one time I was having networking issues (around a CRUD
GUI). Oddly, I identified the issue the (business) day before my users did. I
was working on a rather large change (OS updates, 32->64 bit) and noticed that
multicast updates broke. The timing was funny. I was working towards a minimal
reproducible example already, and I had a small test app that showed that
multicast was broke on my PC. So, I ran down and tried the app on their
computers, and it worked... between the two of them, but not the company et
large.

That's when I vaguely remembered an email for Network Ops about a router
change on my user's floor over a weekend. I went by the Ops team (they didn't
like email for some reason) and told them what I was seeing.

The short of it was, yeah, the router change screwed my 2 users on the floor
(they got put on the printer VLAN by accident, which didn't receive
multicast). Separately, the issue I saw was a bug in how we built a 3rd party
lib on Windows that provided 64-bit multicast support.

------
PappaPatat
THE thing I wished more developers knew about databases: they are badass when
it comes to data manipulation.

Stop sucking all data and manipulate it in your language of choice. Tell you
DBA what you want done and let her do it for you.

Really, DB's may look like they are the special needs kid in the chain, but
they're magnificent powerhouses when it comes to 90% of what you are trying to
do to data.

------
tonymet
The most important and overlooked characteristic is that most RDBMS use b-tree
for their table-space, meaning all the operations (including search / lookup)
are O(log n). For online (OLTP) applications, this means you will have to
shard sooner or later (assuming your audience is growing)

~~~
mtVessel
I agree with your first statement, but could you explain how b-trees logically
lead to sharding (at some growth point)? What storage structure doesn't lead
to sharding eventually?

------
jasonhansel
I would add: when using SQL, the default transaction isolation level is
probably not what you'd expect.

------
AtlasBarfed
One thing I've noticed that in the medium term of a software service (2-5
years) is that your software should have the ability to do double-writes to
and flip reads between two different datastores.

That will afford migration with as close to transparent migration with as
reduced a downtime or no downtime.

~~~
earhart
That gets tricky if one of the writes fails.

~~~
hobs
Generally you get two phase commit involved here, or an asynch mirror that you
can restore over if there's failure.

~~~
smilliken
Two phase commit makes a promise it can't keep. CAP theorem is not amused.

It's slow, complex, a leaky abstraction, and ahould be avoided for better
consistency models.

~~~
hobs
I generally couldn't agree more, but yet its the first tool in the toolbox for
a lot of databases.

------
wilgertvelinga
Best book on the subject:
[https://www.goodreads.com/book/show/23463279-designing-
data-...](https://www.goodreads.com/book/show/23463279-designing-data-
intensive-applications)

~~~
pointy_hat
One more book on the subject, possibly even closer related to the field of
databases (should cover many of the items mentioned in the post):
[https://www.databass.dev](https://www.databass.dev)

------
stuaxo
I just want my databases to be normalised, and my queries optimised.

Stuff like Django CMS is terribly architected, so code is ping-ponging between
DB and code.

This would have been a non-issue if all the APIs were built with batching in
mind, and the data was normalised in the first place.

~~~
jtchang
Just to be clear you are talking about the CMS and not django itself right?
Django itself is fantastic.

------
tenant
An odd thing that happened to me yesterday with the PostgreSQL ODBC driver
(psqlODBC) v9.3.400. It wouldn't let me insert a string longer than 255
characters long into a character varying field into a local v9.4 database on
windows using a recordset update. I didn't have a problem pasting it in via
pgadmin. Altered the field to text and the problem went away. I've a suspicion
that there is a limit on text in the tens of thousands of characters length
too though despite both those fields being essentially the same thing and
limitless.

~~~
echlebek
That sounds like a bug in the driver. I'm pretty sure varchar on postgres is
just an alias for text, which has no limit.

edit: varchar(n) will issue an error past n bytes, varchar without n is the
same as text. character varying(n) is like varchar(n).

[https://www.postgresql.org/docs/9.4/datatype-
character.html](https://www.postgresql.org/docs/9.4/datatype-character.html)

------
Buttons840
I learned SQL working on a "database as a product". The database was filled
with medical ontologies. It was a perfect environment for learning. We were
always looking for obscure things in that database, and rarely changed data,
just selects for days. In the end I once wrote a query spanning about 100
lines that used common table expressions and found it quite maintainable.

Where can an SQL beginner find such a database to experiment with? I was
fortunate because my job provided me this database to experiment with, but
what about those who are not so fortunate?

~~~
war1025
CTEs are great. It's such a nicer syntax than subqueries.

------
gdubs
There seems to be a wealth of information here. I wonder, though, if the goal
would be better served breaking this up into a series. Or perhaps applying the
80/20 rule to the list to come up with truly “a few” items that will have the
highest leverage. Otherwise, I think it will provoke a lot of discussion and
analysis from people who are experts in databases, but it might remain
impenetrable for the core audience: the majority of developers who ignore this
stuff (according to the author.)

------
xupybd
I could be wrong but it seems to me that she is confusing not storing state
with Idempotent calls to the db. Everything you store in the db is some form
of state.

------
liendolucas
Many years ago I've read some chapters from Itzik Ben-Gan's "Inside Microsoft
SQL Server 2008 T-SQL Querying." It's an excellent book to anyone who wants to
know how things work under the hood. While the title is in fact about SQL
Server there so many things explained out there that apply to almost any SQL
engine that is a worth the purchase.

------
dmtroyer
As a developer, I am glad there are other people in the world who understand
this.

------
xwdv
These sort of articles are very effective, if you nail at least one thing in
your checklist the reader does not know or is confident about, it often
encourages them to continue reading further. I know I did.

------
peterwwillis
In my experience, if you think of a database as a 3rd party API that you do
not control sitting somewhere unknown on the internet, your expectations
become more reasonable.

------
inopinatus
“The best programmers started out as system administrators.”

“The best system administrators started out as programmers.”

Optimally, have one foot in each grave. They call that devops, these days.

------
indymike
Nice! Perfect timing. I just started writing a new greenfield project and
decided to use Postgres instead of a NoSQL after five years of NoSQL.

------
neycoda
You don't have to worry about data if you just have a datum.

------
danielovichdk
I love a good old database

