The best thing is not to allow invalid geometries to begin with. Any validation would need to be done in an off-line fashion for a number of reasons (such as needing to retrieve any referred OSM elements), and by that time you can't automatically revert offending changes as any revert carries a chance of an object version conflict.
> The best thing is not to allow invalid geometries to begin with.
The best thing for whom? The developer? Certainly not for the end user, who needs to have invalid geometries while the drawing is being made and the data is still incomplete. Having a file format that won't admit that temporary state means that either the user can't save incomplete draft work, or that an entirely different format will be needed to represent such in-process work.
The article is rightfully criticizing exactly that incomplete way of thinking: changes that don't take into account the full picture or the systemic effects are pushed forward only because they seem "the right thing" from an incomplete understanding of the concerns and needs of all stakeholders.
The right technical decision *must* include them to be correct, and the best design might involve a solution other than "update the file format so that it doesn't accept inconsistent geometry (according to the set of rules that we understand as of today)". But to assess what the right decision is, you need to know how people are using the system in real use cases, beyond classic comp-sci concerns of data storage and model consistency; and to learn that, you need to talk to end users and perform field research to inform your decisions and designs.
> Having a file format that won't admit that temporary state means that either the user can't save incomplete draft work, or that an entirely different format will be needed to represent such in-process work.
Saving such temporary state is very rarely needed in OSM, and it should never be uploaded to the OSM database.
In addition, in almost all cases it can simply be saved as an area whose shape does not yet match the intended one.
> Saving such temporary state is very rarely needed in OSM, and it should never be uploaded to the OSM database.
Maybe, but you're missing the other use case: that in the future you'll need an extension requiring geometries that are considered invalid by the current set of rules, forcing you to update all tools processing the file format to accommodate the new extension.
Keeping storage and validation as two separate steps is a more flexible design, preferable on platforms where data is entered by a large number of users in a complex domain that is not easy to model unambiguously.
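To make that concrete, here's a minimal sketch (hypothetical names, not any real OSM tool's API) of what "storage and validation as two separate steps" looks like: the store accepts any geometry, and validation is a distinct pass that reports warnings instead of refusing to persist the data.

```python
def save_way(store, way_id, node_ids):
    """Persist a way unconditionally; no validity rules are enforced here."""
    store[way_id] = list(node_ids)

def validate_way(node_ids, rules):
    """Run each rule and collect warnings; an empty list means 'valid under today's rules'."""
    return [msg for rule in rules for msg in rule(node_ids)]

# One example rule: a way tagged as an area should form a closed ring.
def closed_ring_rule(node_ids):
    if len(node_ids) >= 2 and node_ids[0] != node_ids[-1]:
        return ["ring is not closed"]
    return []

store = {}
save_way(store, 42, [1, 2, 3])  # draft work saves fine, even though it's an open ring
print(validate_way(store[42], [closed_ring_rule]))  # ['ring is not closed']
```

The point of the split is that adding a new rule (or relaxing one for a future extension) only touches the validation pass; nothing already stored becomes unreadable.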
Think of Wikipedia and what would have happened if its text format had only supported grammatically correct expressions without spelling mistakes, and without letting you save templates with any errors. The project would never have attracted the volume of editors it took to create the initial version with millions of articles, and the product would never have taken off. In an open project with data provided by the general public, keeping user data validation in the same layer as the automatic processing model is a design mistake.
I don't think anyone seriously proposes to include rules like
> You could have rules that say you can’t link Finland to Barbados.
in the data model. That is a red herring.
But rules like "an area must be a valid area" are a good idea, in the same way that Wikipedia requires article source to be text and does not allow saving binary data there.
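The Wikipedia analogy separates two very different kinds of checks: a structural one (is this text at all?), which is cheap and uncontroversial to enforce at save time, versus a semantic one (is the grammar correct?), which is not. A sketch of the structural side:

```python
def is_text(data: bytes) -> bool:
    """Structural check only: the content decodes as UTF-8 text.
    Spelling and grammar are deliberately not checked here."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(is_text("draft with speling mistakes".encode()))  # True: saveable despite errors
print(is_text(b"\x89PNG\r\n\x1a\n"))                     # False: binary data rejected
```

"Area must be a valid area" is being argued for as a rule of the first kind: a basic well-formedness constraint, not a full semantic validation of the mapped reality.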
> Maybe, but you're missing the other use case: that in the future you'll need an extension requiring geometries that are considered invalid by the current set of rules, forcing you to update all tools processing the file format to accommodate the new extension.
I think the way to go is to define several layers of correctness. A data set might then be partially valid. In such cases a tool might, for example, support transitions from a complete valid state A to a complete valid state C by an intermediate partially valid state B. (As databases with referential integrity may allow intermediate states in a transaction where referential integrity is broken.)
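A minimal sketch of what such layers might look like (the level names and rules here are illustrative assumptions, not an existing OSM scheme): each data set is tagged with the strictest level it satisfies, and tools declare which level they require, so an intermediate state B can exist at a lower level than states A and C.

```python
from enum import IntEnum

class Validity(IntEnum):
    WELL_FORMED = 0        # parseable at all
    REFERENTIALLY_OK = 1   # every referenced node exists
    GEOMETRICALLY_OK = 2   # rings closed, etc.

def level_of(way, nodes):
    """Return the strictest validity level this way satisfies."""
    if any(n not in nodes for n in way):
        return Validity.WELL_FORMED
    if len(way) < 2 or way[0] != way[-1]:
        return Validity.REFERENTIALLY_OK
    return Validity.GEOMETRICALLY_OK

nodes = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (0.0, 1.0)}
assert level_of([1, 2, 3], nodes) == Validity.REFERENTIALLY_OK  # partially valid state B
assert level_of([1, 2, 3, 1], nodes) == Validity.GEOMETRICALLY_OK  # fully valid state C
```

An editor could then save anything at WELL_FORMED or above, while the upload step demands GEOMETRICALLY_OK, mirroring how a database tolerates broken referential integrity inside a transaction but not at commit.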
> I think the way to go is to define several layers of correctness. A data set might then be partially valid.
Thanks, that summarizes what I was aiming for. An open platform will be more flexible, and allow for more use cases, the fewer assumptions it bakes in about how it should be used.
> Maybe, but you're missing the other use case: that in the future you'll need an extension requiring geometries that are considered invalid by the current set of rules, forcing you to update all tools processing the file format to accommodate the new extension.
As someone else in this subtree mentioned, apparently this flexibility wasn't needed for the last 20 years.