It's a shame this comment is pithy, because I think it's dead-on.
There times to integrate at the database level. But, the default should be single-application databases.
The rationale is the same as the grandparent's rationale FOR database integration. The odds of needing to share data over time are 1.
Given that shared belief, the problem with database integration is that MANY applications need to share facets of the same data. The single database ends up having a huge surface area trying to satisfy every application's needs.
The resulting schema will have definitions relevant for applications A, B, and C but X, Y and Z.
But, worse, there are dependencies between each application's working schema. This means ensuring integrity becomes harder with every application that integrates.
Finally, integration points are the hardest to change after-the-fact. The more services that integrate, the less ability to make the inevitable model changes necessary to fix mistakes/scale/normalize/denormalize/change solutions.
Thus, "build an api" is the best solution. Well-defined APIs and data-flows between applications helps data and process locality and avoids most of the problems I just listed. The trade-off is you're now meta-programming at the process level— the complexity doesn't disappear, it's just reconceptualised.
> The single database ends up having a huge surface area trying to satisfy every application's needs.
This is, more or less, exactly what views are for.
> Thus, "build an api" is the best solution.
And you can do that within the database with stored procedures, perhaps even with the same language you would use in the front-end (depending). And look at the advantages you have:
- No implied N+1 issues because your API is too granular
- No overfetching because your API is too coarse
- No additional service layers needed
- All the information is in the right place to ensure data validity and performance
Let me be clear: I see these as two viable alternatives and different situations are going to determine the appropriate tool. I bring this up because I do think the NoSQL crowd overall has a very distorted and limited picture of what exactly it is RDBMSes provide and why. If people look underneath their ORMs, they may find an extremely powerful, capable and mature system under there that can solve lots of problems well—possibly (but I admit, not necessarily) even _their own_ problems.
> - All the information is in the right place to ensure data validity and performance
This is where we part ways.
We're talking specifically about integration. That means each system have different processes and are talking with other people.
If this is a case of three apps exposing the same process over three different channels (HTTP, UDP, morse code); then, database-level integration makes perfect sense.
But, as soon as differing behaviors comes in, then the database level doesn't— by definition— have enough information to ensure validity. One app thinks columns X and Y are dependent in one way, the other app views it another way. Now, one or the both of those apps are screwed for validity. And this problem grows with N+1.
I am certainly not arguing against good databases. Stored procedures, views, etc. are all great even for a single application. But, I am arguing database level integration should be the rare exception to the rule.
You are describing all of the problems with data management as though they were specific to schemas in a traditional RDBMS; but glossing over how "building an API" solves those same problems, and whether that method is better or worse.
In other words, "build an API" is the problem at hand, not the solution. A traditional DBMS provides a data definition language (DDL) to help solve the problem (schema design), which isn't a solution either, but it does offer a lot of direction. Saying "build an API" really gives me no direction at all.
I specifically said the problem and the complexity inherent in solving it doesn't disappear with per-application databases.
But, the application (the 'A' part) is where the most context around the information is to be found. Tying multiple applications together at the Data-side (rather than the Application-side) means you don't lose that context... and eliminate your chances of clashing (and the ensuring integrity issues).
So you want to put an API in place to access your schema-less database? So that multiple applications can access your database in a consistent manner? That makes sense.
You know what though... you may want to use multiple programming languages, and maintaining an API in multiple languages sucks. So, why not just make a service that each application can connect to, and then let it talk to your database. Then you just need to define a simple protocol that language can use to connect to your service.
Or, you could just skip a few steps and use an actual database to begin with.
Agree with this 100%. In the past the way different applications integrate with each other was through sharing the same database tables. Hence the need to keep the schema synchronized across multiple applications. Nowadays applications should integrate through well-defined APIs and service interfaces. See this Martin Fowler short essay on database styles: http://martinfowler.com/bliki/DatabaseStyles.html