There times to integrate at the database level. But, the default should be single-application databases.
The rationale is the same as the grandparent's rationale FOR database integration. The odds of needing to share data over time are 1.
Given that shared belief, the problem with database integration is that MANY applications need to share facets of the same data. The single database ends up having a huge surface area trying to satisfy every application's needs.
The resulting schema will have definitions relevant for applications A, B, and C but X, Y and Z.
But, worse, there are dependencies between each application's working schema. This means ensuring integrity becomes harder with every application that integrates.
Finally, integration points are the hardest to change after-the-fact. The more services that integrate, the less ability to make the inevitable model changes necessary to fix mistakes/scale/normalize/denormalize/change solutions.
Thus, "build an api" is the best solution. Well-defined APIs and data-flows between applications helps data and process locality and avoids most of the problems I just listed. The trade-off is you're now meta-programming at the process level— the complexity doesn't disappear, it's just reconceptualised.
This is, more or less, exactly what views are for.
> Thus, "build an api" is the best solution.
And you can do that within the database with stored procedures, perhaps even with the same language you would use in the front-end (depending). And look at the advantages you have:
- No implied N+1 issues because your API is too granular
- No overfetching because your API is too coarse
- No additional service layers needed
- All the information is in the right place to ensure data validity and performance
Let me be clear: I see these as two viable alternatives and different situations are going to determine the appropriate tool. I bring this up because I do think the NoSQL crowd overall has a very distorted and limited picture of what exactly it is RDBMSes provide and why. If people look underneath their ORMs, they may find an extremely powerful, capable and mature system under there that can solve lots of problems well—possibly (but I admit, not necessarily) even _their own_ problems.
This is where we part ways.
We're talking specifically about integration. That means each system have different processes and are talking with other people.
If this is a case of three apps exposing the same process over three different channels (HTTP, UDP, morse code); then, database-level integration makes perfect sense.
But, as soon as differing behaviors comes in, then the database level doesn't— by definition— have enough information to ensure validity. One app thinks columns X and Y are dependent in one way, the other app views it another way. Now, one or the both of those apps are screwed for validity. And this problem grows with N+1.
I am certainly not arguing against good databases. Stored procedures, views, etc. are all great even for a single application. But, I am arguing database level integration should be the rare exception to the rule.
I think there's an asymmetry in your argument.
You are describing all of the problems with data management as though they were specific to schemas in a traditional RDBMS; but glossing over how "building an API" solves those same problems, and whether that method is better or worse.
In other words, "build an API" is the problem at hand, not the solution. A traditional DBMS provides a data definition language (DDL) to help solve the problem (schema design), which isn't a solution either, but it does offer a lot of direction. Saying "build an API" really gives me no direction at all.
I specifically said the problem and the complexity inherent in solving it doesn't disappear with per-application databases.
But, the application (the 'A' part) is where the most context around the information is to be found. Tying multiple applications together at the Data-side (rather than the Application-side) means you don't lose that context... and eliminate your chances of clashing (and the ensuring integrity issues).
At some point you'll need to go into the details, but I don't think a comment is the right format. Maybe a blog post to show exactly what you mean.
For instance, if you have an API with the functions:
item# = add_item(description, price, color)
record_sale(item#, quantity, customer#)
record_sale(item#, quantity, customer#, price=NULL)
But you already have sales data without a price, so you need to migrate it by copying the current item prices over.
And if one of three applications doesn't support the new API, you need to offer the old API as well and it would just use the default all of the time.
That sounds pretty much like a schema to me. What's the difference?
You know what though... you may want to use multiple programming languages, and maintaining an API in multiple languages sucks. So, why not just make a service that each application can connect to, and then let it talk to your database. Then you just need to define a simple protocol that language can use to connect to your service.
Or, you could just skip a few steps and use an actual database to begin with.
Views are designed to avoid that problem.
What is the specific way you handle versioning in an API that is superior to versioning in the schema?