Schema on Read Considered Harmful

theamk · on June 1, 2021

That's a pretty confusing blog post... It seems to draw a distinction between "schema on read" and "schema on write", but supplies the following examples:

>>> The following are common examples of schema-on-read:

> Blob storage, (S3, Filesystem, Azure, etc)

> Message queues (SQS, Kinesis, Kafka, RabbitMQ, etc) [...]

>>> Common examples of schema-on-write are: [...]

> JSON Schema

> Structured serialization formats (Protocol Buffers, avro, thrift, etc)

But those are not opposites at all! You can put protobufs in S3 or in a message queue. And even with protobufs, you may end up with their "unknown shape" examples if the schema is old and has evolved over time.
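To make the last point concrete, here is a toy sketch in plain Python. It is not the real protobuf API: the JSON-based `encode`/`decode` helpers, the `SCHEMA_V1`/`SCHEMA_V2` tables, and the field names are all made up for illustration. It only mimics the idea of a tag-numbered format, where a reader holding an older schema receives data written with a newer one.

```python
import json

# Hypothetical schema versions: field number -> field name.
# The producer has moved to V2; the consumer still holds V1.
SCHEMA_V1 = {1: "user_id", 2: "email"}
SCHEMA_V2 = {1: "user_id", 2: "email", 3: "signup_source"}


def encode(record, schema):
    """Serialize a record keyed by field number (toy stand-in for a wire format)."""
    name_to_tag = {name: tag for tag, name in schema.items()}
    return json.dumps({str(name_to_tag[k]): v for k, v in record.items()}).encode()


def decode(blob, schema):
    """Split a payload into fields the schema knows and fields it does not."""
    known, unknown = {}, {}
    for tag_str, value in json.loads(blob).items():
        tag = int(tag_str)
        if tag in schema:
            known[schema[tag]] = value
        else:
            # The reader has a number and a value, but no name or meaning for it.
            unknown[tag] = value
    return known, unknown


# Producer writes with the new schema; the blob could sit in S3 or a queue.
blob = encode(
    {"user_id": 42, "email": "a@example.com", "signup_source": "ad"},
    SCHEMA_V2,
)

# Consumer reads with the old schema.
known, unknown = decode(blob, SCHEMA_V1)
print(known)    # {'user_id': 42, 'email': 'a@example.com'}
print(unknown)  # {3: 'ad'}
```

The data was written against a schema, yet the consumer still ends up holding a field of "unknown shape", which is the situation the post attributes to schema-on-read.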

Moreover, most of the issues described -- like "Producers may change field locations or data types without communicating to the consumer teams" -- are very real in pretty much any communication scheme, and are much more indicative of general company culture than of the serialization technology used. After all, nothing in protobufs requires one to warn other teams of schema updates.