For the third time this week, in relatively unrelated fields of computer science, I'm reminded of the quote: "Duplication is far cheaper than the wrong abstraction."

An awful lot of the time, a table schema is a terrible abstraction of the actual data it is designed to record. Sometimes it's designed under constraints that exist only to sustain the abstraction itself. Some of those constraints have viable reasoning behind them; some don't. How these structures sustain themselves for... decades... is a mystery to me. The non-relational movements, in part represented by the OP article, are (in part) attempts to shift that work from the data layer to the actual program. Because the real world doesn't have schemas - although that's still, incredibly, a source of intense disagreement.

Just an interesting thing that keeps cropping up. I wonder what the formal, "scientific" name for this is?




The schemas always exist, it’s just a question of where: the database or the code that interacts with the database.
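
A toy illustration of the two places it can live (a sketch in Python; all names invented):

    import sqlite3

    # Option 1: the schema lives in the database and is enforced there.
    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE users (
            id    INTEGER PRIMARY KEY,
            email TEXT NOT NULL UNIQUE
        )
    """)

    # Option 2: the "schema" lives in the code that touches the data,
    # and every caller has to know and repeat these rules.
    def save_user(store: dict, user: dict) -> None:
        if not user.get("email"):
            raise ValueError("email is required")
        store[user["email"]] = user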


The other way of putting it is this: the database gives you the chance to express facts (propositions) about the world in the form of tuples in relations ("rows" in "tables").

You can, naturally, also express facts any other way you like. Comments in this forum, scribbles on the bathroom walls, seven layers of JavaScript buried in 10 microservices.

But a relational DB gives you the tools to at least attempt the expression of those propositions in a way that makes them findable and provable later, with minimal pain, and by people across the organization, not just in one code base.

Doing it in code, your mileage may vary.

And just like in written or spoken language communication, we can express facts poorly or incorrectly in any medium.

It's just that the relational model gives us a tool and framework to help put discipline (and query flexibility) into that process. If we use the tool right, it helps us clarify our logic.
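
To make that concrete, a small sketch (Python's stdlib sqlite3; tables and names invented): each row is a proposition, and a join answers a question nobody anticipated when the rows were written.

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE works_at (person TEXT, company TEXT)")
    db.execute("CREATE TABLE located_in (company TEXT, city TEXT)")

    # Each row asserts a fact about the world.
    db.execute("INSERT INTO works_at VALUES ('alice', 'Acme')")
    db.execute("INSERT INTO located_in VALUES ('Acme', 'Berlin')")

    # An unanticipated question, answered by combining propositions.
    rows = db.execute("""
        SELECT w.person
        FROM works_at w JOIN located_in l ON w.company = l.company
        WHERE l.city = 'Berlin'
    """).fetchall()
    print(rows)  # [('alice',)]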


I'm not entirely sure how true that rings, because it's true by definition in some sense. But having tables be a loose collection of typed, nullable columns, sans natural (maybe compound) primary keys, is a very different experience from the kind of thing you'd expect. It's a schema in the way @dataclass is a schema: it just ends up being a place to put stuff. Natural keys, by contrast, map well to most problem domains; they leave very, very few joins necessary, and foreign key constraints feel almost unnecessary, because when your real data is the key it's really hard to "lose" relationships.
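
Roughly what I mean, as a Python sketch (example names are made up):

    from dataclasses import dataclass
    from typing import Optional

    # A "schema" in the @dataclass sense: typed, nullable, no identity.
    # It's just a place to put stuff.
    @dataclass
    class PartRow:
        part_no: Optional[str] = None
        vendor: Optional[str] = None
        price: Optional[float] = None

    # With a natural compound key, the data itself carries the
    # relationship, so it's hard to "lose" it.
    prices: dict[tuple[str, str], float] = {}
    prices[("M6-bolt", "Acme")] = 0.12  # (part_no, vendor) -> price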


It feels to me like you're mixing the terms schema and relations/joins.

You can have a schema on a table without any foreign keys or relations. But in the above you seem to use those concepts interchangeably.

Taking the OP's comments as being directly about schemas, they seem true to me.


If your app/architecture is effectively "BYO schema", then when one schema is wrong the others aren't necessarily wrong too, and the cost of making a mistake is much lower. And I would also argue that even if your database is normalized and has a strict schema, the code _still_ has the ability to implement its own schema after pulling data out.
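
For instance (a sketch in Python; the User shape is hypothetical), even over a strict, normalized table the application can layer its own, stricter schema on the way out:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class User:
        id: int
        email: str

        def __post_init__(self):
            # The code's own schema, layered on top of the database's.
            if "@" not in self.email:
                raise ValueError(f"bad email: {self.email}")

    # A row as it came back from the (already schema-checked) database.
    row = (42, "alice@example.com")
    user = User(*row)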


> the real world doesn't have schemas

This is simply incorrect. The real world has business logic, and that's ironclad (or should be, for a successful business; assuming you're not acquired, at which point you start the business data schema over). Where business logic has exceptions, those exceptions should be accommodated in the schema. But there should be no daylight between how the business works and how the schema records its every function. Anything else is bad design. It works for decades only if the business makes its decisions partly based on how they are constrained by prior logic, and the DNA of that logic is the schema.


In twenty years of "ERP whispering" I've never known a "formal" "schema" (going to put "schema" in quotes here too; what it actually represents is any strictly validating structure or system) that described a business object as actually utilized (process, reference, part-number database) with anything greater than, say, forty percent commonality. At the high end. The remainder is waived, "ad-hocced", or simply fibbed ("oh sure, all our parts have registered NSNs[1]").

The systems are, at their best, an executive class (VPs, Senior Mgmt) contracting a shaman/priestly class (me) to lay a sheet of order on throbbing chaos.

I probably should mention that it's possible - likely, even - that I have not worked for a non-dysfunctional business, and of course the ERP software ecosystem introduces its own peculiarities on top of that.

I realize I am badly abusing the word "schema" here. A lot of the time, we design schemas based on high level business requirements, and that's what I'm thinking about. There should be a LOT more between schema and business - a whole universe of business architecture, design, and software - but there never seems to be the money to do so.

In the wider context, though, the phenomenological world, I would posit, does not have business logic inherent in the fact of its own existence. This could get to be a very spacious, "dude-where's-my-bong" sort of discussion, but I think the difference in Weltanschauung we're seeing here might be due to divergent experience.

[1] NATO Stock Numbers


It's almost certainly down to divergent experience. The shamaning and priesting I've done has almost exclusively begun with small companies that were working out their business operations at the same time as they were commissioning software. As they grow, the business logic evolves and the software has to grapple with that, but the originally unified logic imposes constraints both ways. The software sets limits on, and is in conversation with, the whims of management... if for no other reason than that ten years in, on version 72, the cost of changing the data schema may outweigh the hoped-for advantage of some off-the-cuff idea to change fundamental operating practices.

More concretely: if a custom logistics system grows up with a company from the ground floor, the business logic tends to bend around the software's limitations [clarification: employees learn to get things done that weren't originally anticipated, in ways that weren't envisioned]; then the software slowly incorporates the new needs into formal features. If you're dealing with rational management (like the original founders), they will see the sense in maintaining logical continuity and will tweak operations as needed to accommodate what should or shouldn't be done in software [as opposed to "in operations" - it's when employees are repeatedly writing on whiteboards or passing papers around for the same thing that it needs to be integrated as a feature].

But telling management that "changing this fundamental aspect is impossible without significant downtime" is often enough to start the conversation about how to shape things so that the schema and the business remain in lockstep.

This isn't perfect. Ask Southwest Airlines, who grew with their own software until they couldn't, and then switched catastrophically to an entirely new system. Sometimes things reach a scale of complexity that the software simply can't conform to. But a really good designer should see those things two or three years out and plan for them.

This is the real power of the shaman. I don't dictate business logic, but I do whisper when it contravenes the hard logic of the schema, I fight to minimize the number of exceptions ("hacks" of any kind) and through this keep the beast in check.


I don't know what you mean by business logic, but the (in my view successful) businesses I have seen operate more on the highly variable whims of upper management than on any sort of ironclad logic.

You notice this quickly when you computerise existing human processes. They are riddled with (sometimes very valuable!) inconsistencies that are hard to fit into the regular computer mold some designer thought was sensible.


I responded at length to the sibling but just want to say - usually where I've come in there is an operations manual that works very much like a schema. Investigating where it's being overridden in daily practice often helps to form more linear processes / more consistent logic than was in the manuals. Once those are worked out, the software becomes the glue that forces employees to follow the processes, and the manual is about the software. But you're right: Stepping into a messy organic system and writing software around it is hard. It's much harder if they aren't willing to be flexible.


That's not what the article is about, but the question of what level of abstraction to use, and how flexible it needs to be, is always interesting.

On table schemas, I think designers will usually have a good idea of how stable a schema needs to be and what should go in it. For instance, someone creating an invoice table will probably already have a set of invariable stuff that needs to go there for sheer legal reasons.

The other part being: an inefficient or slightly clunky table schema is not the end of the world. It can either be fixed, even in pretty active production environments (it's "just" that much more effort- and cost-intensive), or be dealt with at the application layer.

Stacking abstractions is always an option if nothing else helps, and I'd see going denormalized at the very start of a new application as a worse tradeoff, and a sign of not thinking about what the application is supposed to do at its core.
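
On the invoice example: a minimal sketch of that invariable legal core (SQLite via Python; which fields are actually mandated varies by jurisdiction, so treat these as placeholders):

    import sqlite3

    db = sqlite3.connect(":memory:")
    # The invariable, legally driven core; everything else can
    # evolve around it later.
    db.execute("""
        CREATE TABLE invoice (
            invoice_no    TEXT NOT NULL PRIMARY KEY,
            issue_date    TEXT NOT NULL,
            seller_tax_id TEXT NOT NULL,
            buyer_name    TEXT NOT NULL,
            total_amount  NUMERIC NOT NULL
        )
    """)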


Maybe "Impedance Mismatch"?


That's perfect!

The electrical metaphor is a powerful one, as evidenced by how effortlessly it describes a sibling of the OP problem, the "object-relational impedance mismatch". Looking at the most compact expression of the problem - the electrical one, i.e. the math - you start to wonder if the root cause of all of these is scale (observer) vs. speed vs. signal.

Could a logical abstraction be expressed for this whole family of phenomena: impedance matching; object-relational impedance; business system vs. ERP?

For every "reference frame" (electrical, mechanical, software, database, system) an organization node (single developer, team, organization) might be in, there would be a sort of minimum beyond which no unit is discernible. As this unit grows, the risk of "impedance mismatch" grows, even if signal and velocity remain static. If signal and velocity ALSO grow, the probability of mismatch rapidly becomes 100%. Unlike in electronics, the actual physical size of the "carrier wave" is getting bigger[1].

Which, honestly, ok, this all sounds pretty damn obvious. Maybe that's why this is a solved problem in EE, but it's a forty-year-clusterpoop in ERP world. Could it be that the root cause, then, is nontechnical leadership? A PoliSci MS / MBA won't - or can't - see that larger systems necessarily have different signalling / flow, but they "think they can pull this off" because "airplanes and lawnmowers are basically the same thing" and "our culture is always our first product". Blop. Fail. Repeat for two generations, and here we all are.

[1] Which, hmmph, ok, that can happen in some specialized setups. But that's outside this sandbox.


> the real world doesn't have schemas

In the same sense that the real world doesn't have numbers, types, or functions. Platonists might disagree, but in any case it is beside the point. What matters is whether these concepts are useful.

There is no such thing as a schema-less database; the question is whether the schema is formalized, stored, and enforced in the database itself, or exists only implicitly in code or in the minds of the people using the database.
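
A sketch of the "implicit schema" case (Python; field names invented): the rules exist either way, they just live in whatever code happens to read the document, re-enforced ad hoc by every consumer.

    import json

    # A row from a "schemaless" document store.
    doc = json.loads('{"name": "Alice", "age": "35"}')

    # The schema still exists; it's just checked at read time,
    # independently, by every piece of code that touches the data.
    name = doc["name"]            # assumes the key is present
    age = int(doc.get("age", 0))  # assumes age parses as an int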


Hard agree on the computing system schema - it's always there, whether in a bag of brains or as an XSD file.

The usefulness of these concepts lies on a spectrum, with user behavior located in one part of it. Numbers occupy a broad band that overlies most human experience, although not all. Type systems have overhead that occludes some aspects of computing, and by itself the phrase "type system" has fuzzy edges. Somewhat - although less so - the same goes for functions: is a function a callable unit, or is it an explicit method? Platonists, I think, would definitely agree that these things have the inherent quality of existence as a consequence of their nature; Anselm's Ontological Argument is sort of the Final Evolution of dire Platonism.

The world, though, as it exists in nature? Functions, types, even numbers are not primary observables. Our experience of reality is probably more akin to the observed experience of a rainbow, where the observer stands at precisely 138 degrees to the sun... or the rainbow doesn't exist. We're in a 138-degree arc to reality, living on a rainbow, counting the bands. But honestly? Who knows? Probably some learned cosmologist. I am but a MilStdJunkie on the internet.


The world is always 6NF.



