

Data Model = Database - Harkins
http://mikeomatic.net/?p=195

======
TY
In my experience, wrapping all data manipulation operations in a set of stored
procedures usually provides an abstraction layer powerful enough to address
the issues raised by the author.

In this approach, no application can change data in the underlying tables
directly. Change can only be done by calling appropriate stored procedures.
This rule is not optional but mandatory and is enforced by database
permissions given to the database account used by each application: grant
access to stored procedures, deny DML access to the tables.

As an example, if a customer's phone number needs to be updated, client
applications will call a procedure named update_customer_phone(customer_id,
new_phone) instead of issuing a direct SQL statement like

update customers set phone = new_phone where customer_id = XXX

There is no need for the application to know that table named customers even
exists.
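SQLite has no stored procedures, but the grant/deny discipline described above can be sketched with Python's sqlite3 authorizer hook standing in for database permissions. This is only an analogy, and all table, column, and function names here are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, phone TEXT)")
conn.execute("INSERT INTO customers VALUES (1, '555-0100')")

# Analogue of "deny DML access to the tables": block direct writes to
# customers unless the sanctioned code path is currently active.
allowed = {"active": False}

def authorizer(action, arg1, arg2, dbname, source):
    if action in (sqlite3.SQLITE_UPDATE, sqlite3.SQLITE_INSERT, sqlite3.SQLITE_DELETE) \
            and arg1 == "customers" and not allowed["active"]:
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK

conn.set_authorizer(authorizer)

def update_customer_phone(customer_id, new_phone):
    # The "stored procedure": the only sanctioned write path to customers.
    allowed["active"] = True
    try:
        conn.execute("UPDATE customers SET phone = ? WHERE customer_id = ?",
                     (new_phone, customer_id))
    finally:
        allowed["active"] = False

update_customer_phone(1, "555-0199")
```

Any direct UPDATE, INSERT, or DELETE against the table from application code now fails with a "not authorized" error, while the sanctioned function goes through.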

Read access should be provided through views, not to the tables directly.
Views provide an abstraction layer that insulates applications from
underlying table changes.

While this might seem like a lot of work, this approach ends up saving a lot
of headaches in the long run, especially when one database is used by multiple
applications.

The main problem with this approach is unfortunately company politics. In many
organizations, the only people who can change stored procedures are DBAs or a
group of "database developers". The applications themselves are maintained by
"application developers". Usually these are separate groups, reporting to
different people with different priorities, and getting them to work together
is often challenging.

------
tl
It's probably worth pointing out that the article's title is "Data Model !=
Database" as in not equal to.

~~~
Harkins
Given that I copied and pasted it, there's a bug in HN.

~~~
maxwell
I guess it's a good idea to screen out exclamation points in general, but this
is obviously an edge case that should get through.

------
13ren
Databases became an integration point about 30 years ago, when Oracle ported
their DB to every platform in sight and standardized on SQL as a way to
integrate across heterogeneous hardware environments.

I agree with this blog post, but this is clearly a problem that comes up all
the time. My question is: what approaches to _abstract data models_ has the
industry already developed to deal with these problems of schema change and so
on? I have a feeling it's pretty big business.

Also, web services repeat the problem of schema brittleness, because they have
a schema... if that changes, it ripples through all the services that use it.

 _EDIT_ that is, unless you can automatically regenerate the entire service,
based on the schema - this is cool and works for some cases (eg. forms), and
it seems to be the direction the industry is headed (as a no-coding solution,
with many benefits: no tedium, no errors, no expensive/unruly expert
developers).

BTW: the blog author (Mike Arace) is also the developer of the (free) software
he's promoting (CodaServer). I have no problem with that, it's an interesting
article. Perhaps that's why he didn't mention alternatives.
<http://18thstreetsoftware.com/aboutus.php>

~~~
keefe
A lot of people (at least at my current employer) would point you to the
semantic web and RDF. One of the big focuses is model-driven development -
defining application state and behavior in an abstract data model (ontology).
I think RDF is a pretty good standard for representing abstract data models
because it is general (being based on graphs), simple (a triple is just
(subject, predicate, object)), and self-describing. There is a lot of
bleeding-edge work in this area, and most of it suffers from severe
performance problems at the current state of the art.

~~~
fauigerzigerk
I like RDF for its flexibility, but OWL is very weak. You can't even define
very basic constraints that RDBMS schema languages have had for ages. All
value constraints on properties have to be constants. So, for example, you
cannot state that shipping_date must be greater than order_date.
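As a concrete contrast, the cross-column constraint that OWL can't express is a one-line CHECK in SQL. A sketch using Python's sqlite3 (table and column names are just for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The kind of value constraint OWL cannot express: one property compared
# against another, enforced by the schema itself.
conn.execute("""
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        order_date    TEXT NOT NULL,
        shipping_date TEXT,
        CHECK (shipping_date IS NULL OR shipping_date > order_date)
    )
""")
conn.execute("INSERT INTO orders VALUES (1, '2009-03-01', '2009-03-05')")

# An order "shipped" before it was placed violates the CHECK constraint:
try:
    conn.execute("INSERT INTO orders VALUES (2, '2009-03-01', '2009-02-01')")
except sqlite3.IntegrityError:
    pass  # rejected by the database, no application code involved
```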

Querying RDF triples within reasonable performance constraints is very hard
for the general case. I've tried a lot in that area. You just can't get around
the fact that not using upfront knowledge about access paths to define the
storage structures means accepting a huge number of joins. However, it's
certainly a possibility to encode that knowledge in indexes instead of the
logical data model.

Column databases seem to be promising for RDF storage and query.

~~~
keefe
I'm not a big fan of OWL either. I think that you can get all the
expressiveness of OWL and more using rules - in fact the fastest current OWL
reasoner, OWLIM, is based on using rules to model OWL. A constraint like this
would be something like:

    SELECT ?invalid WHERE {
        ?invalid rdf:type your:Class .
        ?invalid your:value1 ?v1 .
        ?invalid your:value2 ?v2 .
        FILTER(?v1 > ?v2)
    }

SPARQL queries can contain a CONSTRUCT clause, which turns them into rules.
For querying an RDF view of a relational database, I don't think anyone has
solved this yet - D2RQ tries, but like you say the performance tends to suck.
A lot of the current promising work uses triple stores to store RDF directly,
with Franz's AllegroGraph a big leader.

------
shaunxcode
He makes quite a few valid points and suggests using a business rules engine
called CodaServer (<http://codaserver.com/>). Does anyone have any
experience/impressions with it?

~~~
jwilliams
Not with Coda, but I've used a lot of commercial implementations (ILog, Fair
Isaac, Experian) and been pretty impressed.

One of the biggest problems I encountered was testing and change control. With
the rules engine you could suddenly change and manipulate things (sometimes on
the fly) - that didn't mean that you should....

------
Retric
I think he has worked with badly designed and maintained enterprise systems.
If only one system is talking to the database, then the Data Model should map
directly to the Database; it's only when you have 20 systems, each with its
own Data Model, all using the same Database, that you run into the kinds of
problems he is talking about.

Edit: That, or you have stupid people building the Database. Remember, just
because SQL is easy does not make the schema unimportant.

~~~
brentb
Even in the case of a database that is only used by a single application,
there are plenty of circumstances when it's better to interface with an
abstract data model rather than the bare database schema.

One reason could be a desire to keep your database (somewhat or totally)
normalized, while also being able to make queries against more natural (for
the programmer) abstractions. It's rarely the case that all of a given user's
information is stored in a single table (and for good reason... read up on
normalization (<http://en.wikipedia.org/wiki/Database_normalization>) if
you're not sure why), but it can sure save a lot of programmer time if your
data model acts like it's all in a single table.
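A minimal sketch of that idea with Python's sqlite3: storage stays normalized across two tables, while the application reads from a view that behaves like one wide table. All names here are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Normalized storage: user identity and email live in separate tables.
conn.executescript("""
    CREATE TABLE users  (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emails (user_id INTEGER REFERENCES users, email TEXT);
    INSERT INTO users  VALUES (1, 'Ada');
    INSERT INTO emails VALUES (1, 'ada@example.com');

    -- The abstraction: a view that reads like a single "user" table,
    -- hiding the join from the programmer.
    CREATE VIEW user_info AS
        SELECT u.user_id, u.name, e.email
        FROM users u JOIN emails e ON e.user_id = u.user_id;
""")

rows = conn.execute("SELECT name, email FROM user_info").fetchall()
```

If the underlying tables are later split or renamed, only the view definition changes; every query against user_info keeps working.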

The case of multiple applications or systems talking to a single database
isn't really all that different from a single application talking to a
database, unless you only have one database call in your entire application.
As soon as you're talking to the database in more than one spot in your code,
unless you're careful, you start running into the brittleness/dependency
issues described by the author.

Even if you're the only one working on a given application, in three months
you'll have little chance of remembering every dependency created by every
database interaction in your application. Abstracting things with a data model
won't fix all of the issues addressed by the author of the article, but it
sure does make it easier to address them.

~~~
Retric
Systems that do this tend to burn a lot of cycles without the coder having any
idea what's going on. For a simple CRUD system that can be OK, but building
abstractions above the database is dangerous from a performance standpoint.

Edit: There are a lot of DB tools to help you. Adding views lets you keep
normalized schemas while simplifying reading data. Triggers can automate
logging, etc. I often see people using high-level data views that end up
giving them less power to manipulate data than the DB has.
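For instance, the trigger-based logging mentioned above can be sketched in a few lines with Python's sqlite3 (the schema is made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts  (id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE audit_log (account_id INTEGER, old_balance INTEGER,
                            new_balance INTEGER);
    INSERT INTO accounts VALUES (1, 100);

    -- The trigger automates the logging: every balance change is recorded
    -- without any application code knowing audit_log exists.
    CREATE TRIGGER log_balance AFTER UPDATE OF balance ON accounts
    BEGIN
        INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
    END;
""")

conn.execute("UPDATE accounts SET balance = 150 WHERE id = 1")
log = conn.execute("SELECT * FROM audit_log").fetchall()
```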

PS: Why assume the schema is going to suck? Often it does suck, but that
should be telling you to change the schema. I mean, I know people who can't
stand ugly code but are willing to turn a blind eye to terrible databases.

------
urlwolf
There's another solution: don't use relational DBs, and use some other storage
method based on RDF. There are lots of smart people working on making these as
fast as possible, and while they will never be as fast as a relational
database (for obvious reasons) they may be fast enough for most applications.

~~~
olegp
Do you know of any specific ones? From what I've seen, RDF storage engines
currently implemented, such as the various engines in Sesame, are all pretty
slow.

I'm doing a bit of work on trying to implement an RDF store on top of HBase -
has anyone thought about this and would be interested in throwing ideas
around?

------
DanielBMarkham
Let's see: stored procs, ORM, web-tiered business logic, functional code...

Use tiers or abstraction to separate changing layers.

I hate to dismiss it as simply a plug for the author's product, but heck, it
ain't like this is a new problem. Why buy yet another product to add into the
stack? Programming languages and tools are made to solve this kind of stuff as
part of their job.

