Avro solves this problem completely, and more elegantly, with its schema resolution mechanism. Exchanging schemas at the beginning of a connection handshake is hardly burdensome
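For the unfamiliar: in the canonical Java library, resolution happens when you hand the decoder both the writer's schema and your own reader schema. A minimal runnable sketch (the Event schema is made up for illustration):

    import java.io.ByteArrayOutputStream;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.avro.io.EncoderFactory;

    public class ResolutionSketch {
        public static void main(String[] args) throws Exception {
            // Writer's schema: has an extra "note" field the reader doesn't know about.
            Schema writer = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"long\"},"
                + "{\"name\":\"note\",\"type\":\"string\"}]}");
            // Reader's schema: only knows "id".
            Schema reader = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"long\"}]}");

            // Producer side: encode with the writer schema.
            GenericRecord rec = new GenericData.Record(writer);
            rec.put("id", 42L);
            rec.put("note", "hello");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(writer).write(rec, enc);
            enc.flush();

            // Consumer side: resolution needs BOTH schemas.
            GenericDatumReader<GenericRecord> dr = new GenericDatumReader<>(writer, reader);
            GenericRecord resolved = dr.read(null,
                DecoderFactory.get().binaryDecoder(out.toByteArray(), null));
            System.out.println(resolved); // {"id": 42} -- the unknown "note" is skipped
        }
    }

The handshake is simply how the consumer gets that writer schema up front.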
If by "solving" you mean "refuse to do anything at all unless you have the exact schema version of the message you're trying to read" then yes. In a RPC context that might even be fine, but in a message queue...
I will never use Avro again on a MQ. I also found the schema resolution mechanism anemic.
Avro was (is?) popular on Kafka, but it is such a bad fit that Confluent created a whole additional piece of infra called Schema Registry [1] to make it work. For Protobuf and JSON Schema, it's 90% useless and sometimes actively harmful.
I think you can also embed the schema in an Avro message to solve this, but then you add a massive amount of overhead if you send individual messages.
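The spec does define a middle ground, though: the "single-object encoding", which prefixes each message with a 2-byte marker plus an 8-byte CRC-64-AVRO fingerprint of the writer schema instead of the schema itself. A sketch with the Java library (schema made up):

    import java.nio.ByteBuffer;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.message.BinaryMessageDecoder;
    import org.apache.avro.message.BinaryMessageEncoder;

    public class SingleObjectSketch {
        public static void main(String[] args) throws Exception {
            Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"long\"}]}");

            GenericRecord rec = new GenericData.Record(schema);
            rec.put("id", 7L);

            // 10 bytes of framing (C3 01 marker + 8-byte schema fingerprint)
            // per message, instead of the full embedded schema.
            BinaryMessageEncoder<GenericRecord> enc =
                new BinaryMessageEncoder<>(GenericData.get(), schema);
            ByteBuffer framed = enc.encode(rec);

            // The fingerprint only identifies the writer schema; the consumer
            // still has to obtain the schema itself from somewhere.
            BinaryMessageDecoder<GenericRecord> dec =
                new BinaryMessageDecoder<>(GenericData.get(), schema);
            dec.addSchema(schema); // register known writer schemas
            System.out.println(dec.decode(framed));
        }
    }

It trades the embedded-schema overhead for exactly the out-of-band lookup problem discussed elsewhere in this thread: the fingerprint tells you which schema was used, not what it is.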
> but it is such a bad fit that Confluent created a whole additional piece of infra called Schema Registry [1] to make it work.
That seems like a weird way to describe it. It is assumed that a schema registry would be present for something like Avro. It's just how it's designed - the assumption with Avro is that you can share your schemas. If you can't abide by that, don't use it.
I do not think it's unfair at all. Schema Registry needs to add a wrapper and a UUID to an Avro payload for it to work, so at the very least Avro as-is is unsuitable for an MQ like Kafka, since you cannot use it efficiently without some out-of-band communication channel.
Everyone knows you need an out-of-band channel for it; I don't know why you're putting this out there like it's a fault instead of how it's designed. Whether it's RPC where you can deploy your services, or a schema registry, that is literally just how it works.
Wrapping a message with its schema version so that you can look up that version is a really sensible way to go. A UUID is way more than what's needed, since they could have just used a serial integer, but whatever; that's on Kafka for building it that way, not Avro.
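The pattern itself is tiny; a sketch with a hypothetical 4-byte serial ID and a plain map standing in for the registry:

    import java.nio.ByteBuffer;
    import java.util.HashMap;
    import java.util.Map;

    public class EnvelopeSketch {
        // Stand-in for the registry: serial ID -> schema (JSON text here).
        static final Map<Integer, String> REGISTRY = new HashMap<>();

        // Producer side: prefix the encoded body with the schema's ID.
        static byte[] wrap(int schemaId, byte[] body) {
            return ByteBuffer.allocate(4 + body.length)
                    .putInt(schemaId).put(body).array();
        }

        // Consumer side: peel the ID off, look the writer schema up,
        // then decode the remaining bytes with it.
        static void read(byte[] framed) {
            ByteBuffer buf = ByteBuffer.wrap(framed);
            int schemaId = buf.getInt();
            String writerSchema = REGISTRY.get(schemaId);
            byte[] body = new byte[buf.remaining()];
            buf.get(body);
            // ...resolve writerSchema against the local reader schema...
        }
    }

Four bytes of framing and one lookup; the trouble is entirely in who issues the IDs.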
Understanding the data now depends not just on having a schema to be found in some registry, but on your schema registry, with the schemata registered in the same specific order you registered them in. If you want to port some data from prod back to staging, you need to rewrite the IDs. If you merge with some other company using serial IDs and want to share data, you need to rewrite the IDs. Etc.
They're saying that for a sequential number, you must have a central authority managing that number. If two environments have different authorities, they have a different mapping of these sequential numbers, so now they can't share data.
Oh. Like, if you have two different schema registries that you simultaneously deploy new schemas to while also wanting to synchronize the schemas across them.
Having the schema for a data format I'm decoding has never been a problem in my line of work, and I've dealt with dozens of data formats. Evolution, versioning, and deprecating fields, on the other hand, are always a pain in the butt.
If a version-n+1 producer sends a message to the message queue with a new optional field, how do the version-n consumers have the right schema without relying on some external store?
In Protobuf or JSON this is not a problem at all: the new field is simply ignored. With Avro you cannot read the message without the writer's schema.
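To make that concrete, here is a runnable sketch (made-up schemas) of what happens when v2 inserts a field: decoding with the v1 schema alone fails (or, depending on the bytes, silently misreads), and only the two-schema resolution path works:

    import java.io.ByteArrayOutputStream;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.avro.io.EncoderFactory;

    public class EvolutionSketch {
        public static void main(String[] args) throws Exception {
            Schema v1 = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"long\"},"
                + "{\"name\":\"note\",\"type\":\"string\"}]}");
            // v2 inserts a field; Avro has no tags, so field order is everything.
            Schema v2 = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"long\"},"
                + "{\"name\":\"flag\",\"type\":\"boolean\",\"default\":false},"
                + "{\"name\":\"note\",\"type\":\"string\"}]}");

            // v2 producer writes a message.
            GenericRecord rec = new GenericData.Record(v2);
            rec.put("id", 42L);
            rec.put("flag", true);
            rec.put("note", "hi");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(v2).write(rec, enc);
            enc.flush();
            byte[] msg = out.toByteArray();

            // v1 consumer that only has v1: misparses the byte stream.
            try {
                new GenericDatumReader<GenericRecord>(v1)
                    .read(null, DecoderFactory.get().binaryDecoder(msg, null));
            } catch (Exception e) {
                System.out.println("v1-only decode failed: " + e);
            }

            // Same consumer WITH the writer schema: resolution skips "flag".
            GenericRecord ok = new GenericDatumReader<GenericRecord>(v2, v1)
                .read(null, DecoderFactory.get().binaryDecoder(msg, null));
            System.out.println(ok); // {"id": 42, "note": "hi"}
        }
    }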
I mean, a schema registry solves this problem, and you just put the schema into the registry before the software is released.
A simpler option is to just publish the schema into the queue periodically, say every 30 seconds; receivers can then cache schemas for the message types they are interested in.
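The consumer side of that is a small cache keyed by schema fingerprint; a sketch (the announcement framing is made up, the fingerprint helper is from the Avro library):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.avro.Schema;
    import org.apache.avro.SchemaNormalization;

    public class SchemaCache {
        // fingerprint of the writer schema -> parsed schema
        private final Map<Long, Schema> known = new ConcurrentHashMap<>();

        // Called for the periodic schema-announcement messages.
        void onSchemaAnnouncement(String schemaJson) {
            Schema s = new Schema.Parser().parse(schemaJson);
            known.put(SchemaNormalization.parsingFingerprint64(s), s);
        }

        // Called for data messages, which carry the writer's fingerprint.
        Schema writerSchemaFor(long fingerprint) {
            Schema s = known.get(fingerprint);
            if (s == null) {
                // Not seen yet: buffer the message until the next announcement.
                throw new IllegalStateException("unknown writer schema");
            }
            return s;
        }
    }

The gap is the window before the first announcement arrives: you have to buffer or drop messages you cannot decode yet.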
Disagree. Avro makes messages slightly smaller by removing tags, but it makes individual messages completely incomprehensible without the writer schema. For serializing data on disk it's fine, and a reasonable tradeoff to save space, but for communication on the wire, tagged formats allow more flexibility on the receiver end.
The spec for evolving schemas is also full of ambiguity and relies on the canonical Java implementation. I've built an Avro decoder from scratch and some of the evolution behaviour is counter-intuitive.
Protobufs are also undecodable without the schema. You can't even properly log unknown tags, because the wire format is ambiguous: a tag only carries the field number and a coarse wire type (varint, fixed-width, or length-delimited), not the actual data type, so a length-delimited field could equally be a string, raw bytes, or a nested message.
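You can walk the tags without a schema, but this is literally all you get; a sketch with protobuf-java:

    import com.google.protobuf.CodedInputStream;
    import com.google.protobuf.WireFormat;

    public class TagWalker {
        // Prints field numbers and coarse wire types; the logical types
        // (string vs bytes vs nested message, int32 vs sint32...) are
        // simply not present in the encoding.
        static void walk(byte[] msg) throws Exception {
            CodedInputStream in = CodedInputStream.newInstance(msg);
            int tag;
            while ((tag = in.readTag()) != 0) {
                System.out.printf("field %d, wire type %d%n",
                    WireFormat.getTagFieldNumber(tag),
                    WireFormat.getTagWireType(tag));
                in.skipField(tag); // we can skip it, but not interpret it
            }
        }

        public static void main(String[] args) throws Exception {
            walk(new byte[]{0x08, (byte) 0x96, 0x01}); // field 1, varint 150
        }
    }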
> Exchanging schemas at the beginning of a connection handshake is hardly burdensome.
I dunno, that sounds extremely burdensome to me, especially if the actual payload is small.
And how exactly does exchanging schemas solve the problem? If my version of the schema says this field is required but yours says it is optional, and so you don't send it, what am I supposed to do?
Avro makes that case slightly better, because the reader's schema can give the missing field a default value, and then it works.
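For reference, that mechanism is a reader-schema default; a runnable sketch with made-up schemas:

    import java.io.ByteArrayOutputStream;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.avro.io.EncoderFactory;

    public class DefaultSketch {
        public static void main(String[] args) throws Exception {
            // Sender's schema: no "retries" field at all.
            Schema theirs = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Req\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"long\"}]}");
            // My schema: "retries" has a default, so resolution
            // fills it in when the sender omits it.
            Schema mine = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Req\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"long\"},"
                + "{\"name\":\"retries\",\"type\":\"int\",\"default\":3}]}");

            GenericRecord rec = new GenericData.Record(theirs);
            rec.put("id", 1L);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(theirs).write(rec, enc);
            enc.flush();

            GenericRecord got = new GenericDatumReader<GenericRecord>(theirs, mine)
                .read(null, DecoderFactory.get().binaryDecoder(out.toByteArray(), null));
            System.out.println(got); // {"id": 1, "retries": 3}
        }
    }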
It's not worth the boatload of problems it brings in all the other, normal use cases, though. Having the default value in the app, or specified by the protocol, is good enough.