
Strongly Typed Events - pabo
https://www.tbray.org/ongoing/When/201x/2019/12/02/Strongly-Typed-Events
======
thelittlenag
Thinking about events to me is a sign that you are probably concerned with the
wrong level of abstraction. Instead, you should be thinking about building
protocols, of which events may be part. To me at least, a protocol combines
notions of both the means of communication, as well as the semantics of that
communication.

In the context of this article, I would prefer to version a protocol. That
version subsumes the particular schema of the events used for that version of
the protocol. If your protocol is well-designed, then many of the assumptions
of semantic versioning can be applied and relied on, so that parts of your
system speaking an older version can interoperate with parts that speak a
slightly newer version.

I really wish there were more support in the major ecosystems for creating
protocols, and less of an obsessive focus on message encodings. Oh well.
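
The semver-style compatibility rule described above can be sketched as follows. This is a hypothetical illustration; the struct and function names are mine, not part of any particular protocol:

```cpp
#include <cstdint>

struct ProtocolVersion
{
  uint16_t major_version; // bumped on incompatible protocol changes
  uint16_t minor_version; // bumped on backwards-compatible additions
                          // (new event types, new optional fields)
};

// Can a peer speaking `ours` understand traffic from a peer speaking `theirs`?
// The version applies to the protocol as a whole, not to individual events.
bool canSpeakWith(ProtocolVersion ours, ProtocolVersion theirs)
{
  return ours.major_version == theirs.major_version &&
         ours.minor_version >= theirs.minor_version;
}
```

The point of putting the check at the protocol level is that a single number answers "can these two parts talk?" without comparing every event schema pairwise.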

~~~
proc0
> so that parts of your system can speak an older version compatibly with a
> parts that speak a slightly newer version.

I think this doesn't guarantee bug-free event code at all. The usual problems
would still arise; at the least, I can see this creating repetition in logic,
which doesn't scale well in large apps.

Additionally, versioning is needless if you have strongly typed events that
get checked at compile time. Any time you need to add an abstract feature to
the event architecture, the compiler tells you what other parts of the app
broke as a result.
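
To make that concrete, here is a hypothetical sketch (the event names are invented) of the kind of compile-time check meant here, using a C++17 `std::variant` as the event sum type:

```cpp
#include <string>
#include <variant>

// Hypothetical events; the point is only the exhaustiveness check.
struct UserCreated { std::string name; };
struct UserDeleted { int id; };

using Event = std::variant<UserCreated, UserDeleted>;

// The usual overload-set helper for std::visit.
template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template<class... Ts> overloaded(Ts...) -> overloaded<Ts...>;

std::string handle(const Event& e)
{
  return std::visit(overloaded{
    [](const UserCreated& u) { return "created " + u.name; },
    [](const UserDeleted& u) { return "deleted #" + std::to_string(u.id); },
    // Add a third alternative to Event and this visit stops compiling
    // until a handler for it is written here.
  }, e);
}
```

Adding, say, a `UserRenamed` alternative makes every `std::visit` over `Event` fail to compile until it is handled, which is the "compiler tells you what broke" property.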

~~~
thelittlenag
> I think this doesn't guarantee bug-free event code at all.

Please don't misinterpret me. The goal is to have fewer issues, not zero
issues.

> Additionally, versioning is needless if you have strongly typed events that
> get checked at compile time. Any time you need to add an abstract feature to
> the event architecture, the compiler tells you what other parts of the app
> broke as a result.

I disagree. Most type systems, especially those describing data, are really
only good for checking syntactic consistency, and not very good at semantic
consistency. That is, they can tell if a field changes from an Int to a
String, but not if the content of a String field changes its semantics, e.g.
from a user id to an email address.

Lack of ability to validate content pervades wire-appropriate encodings.
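
Within a single program, wrapper types can carry some of that semantic distinction, though the distinction still does not survive a stringly-typed wire format. A hypothetical sketch:

```cpp
#include <string>

struct UserId       { std::string value; }; // same wire shape...
struct EmailAddress { std::string value; }; // ...different meaning

std::string sendWelcomeMail(const EmailAddress& to)
{
  return "welcome mail to " + to.value;
}

// sendWelcomeMail(UserId{"u-123"}); // compile error: a UserId is not an
//                                   // EmailAddress, even though both are
//                                   // "just a String" on the wire
```

Once either struct is serialized to a bare string, the encoding is back to syntactic checking only, which is the pervasive problem above.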

~~~
Archit3ch
Semantic consistency is at least an order of magnitude harder, in my
experience.

------
edejong
s/json/XML/ s/json schema/XSD/

And we’re full circle.

I was hoping for some theoretical event-based systems approach, using
Pi-calculus to prove correct systems composition.

~~~
ProfHewitt
Synchronized requesting in Communicating Sequential Processes (CSP) [Hoare
1978] proceeds as follows: “Such communication occurs when one process names
another as destination [to receive a request] and the second process names the
first as source for [the request] ... [in order that providing the request] is
delayed until the other process is ready [to receive the request].”

Synchronized requesting x with request r (i.e. x!r) can be implemented as
follows using a 2-phase commit protocol:

    
         x.synchronize[Implements Provider ⟦provide ↦ r⟧]
    

so that after x has received a synchronize message with parameter Implements
Provider ⟦provide ↦ r⟧, x can get r from the parameter using a provide message
(cf. [Knabe 1992]). Synchronized sending x a request r (i.e. x!r) can be
algebraically reduced (which is a primary requirement of communication in the
π-calculus [Milner 1993]) because x is provided with r without arbitration by
Implements Provider ⟦provide ↦ r⟧.

Synchronized requesting (i.e. x!r) has the following significant costs in
time, communication bandwidth, and robustness by comparison with
unsynchronized requesting (i.e. x.r):

    
        1. The requester must wait for the receiver’s provide 
           message in order to provide request r.
    
        2. After receiving a synchronize message, the receiver 
           must wait for the request r to be provided  
           (meanwhile holding up processing of other requests).
    
        3. Both the requester and receiver must be online 
           concurrently for communication to take place.
    

Unsynchronized requesting (i.e. x.r) cannot in general be reduced using an
algebraic equation as in [Milner 1993] because, in general, the request must
go through arbitration in order to be received. Although algebraic reductions
may be elegant mathematics, synchronized requesting is not widely used in
large software systems because it is slower, uses more communication
bandwidth, and is less robust than asynchronous requesting (especially for
IoT).

~~~
tempguy9999
I suppose the key to this is in the last line - you're implying something that
doesn't wait is better. That'd be actors; your creation, yes? :)

But out of interest: the syncing that you apparently dislike, and which actors
sidestep by having a receiving queue, allows messages to be processed only
when the receiver is ready (obviously). That's a time overhead. If actors
instead have unbounded queues, then there's a memory overhead, one which may
grow without bound on a finite machine if someone isn't processing their
messages fast enough.

How do actors handle that? If I'm talking rubbish some reading matter is
welcome.

~~~
ProfHewitt
Actually, the point is that being _faster_ is better.

Ideally, a message sent to an Actor is _never_ stored in persistent memory. A
runtime system should never accept an unbounded backlog of communications for
an Actor. Instead, in the worst case, it should generate exceptions for
further requests.
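
That backpressure policy can be sketched as follows. This is hypothetical illustration code, not any Actor runtime's implementation: a bounded mailbox that rejects further posts rather than queueing without limit.

```cpp
#include <cstddef>
#include <queue>
#include <stdexcept>
#include <string>

class BoundedMailbox
{
public:
  explicit BoundedMailbox(std::size_t limit) : limit_(limit) {}

  // Reject (throw) instead of growing an unbounded backlog.
  void post(std::string msg)
  {
    if (queue_.size() >= limit_)
      throw std::overflow_error("mailbox full: backlog limit reached");
    queue_.push(std::move(msg));
  }

  std::string take()
  {
    std::string m = std::move(queue_.front());
    queue_.pop();
    return m;
  }

private:
  std::size_t limit_;
  std::queue<std::string> queue_;
};
```

The exception puts the cost of overload on the sender immediately, instead of deferring it to a machine that eventually runs out of memory.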

Here are references that you requested:

[https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3418003](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3418003)

[https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3459566](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3459566)

~~~
tempguy9999
Much obliged, thank you.

------
ProfHewitt
Strongly-typed events are axiomatized up to a unique isomorphism here:

[https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3418003](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3418003)

[https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3459566](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3459566)

------
agentultra
Versioning is important. I wrote a library in Haskell for indexing a family of
types by a natural number (representing the version). It includes some
generics-based machinery for enabling type-correct migrations from _type n_ to
_type (n + 1)_ [0]. I've used it for migrating schema-less documents to
formats with a strongly-typed schema.
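
The linked library is Haskell; as a rough, hypothetical C++ analogue of the idea, one type per schema version indexed by a number, with a single-step migration between adjacent versions:

```cpp
#include <string>

template<int N> struct EventV;  // one type per schema version

template<> struct EventV<1> { std::string name; };
template<> struct EventV<2> { std::string name; std::string email; };

// One migration per version step; a full upgrade composes these.
EventV<2> migrate(const EventV<1>& e)
{
  return EventV<2>{e.name, /*email=*/""};  // new field gets a default
}
```

Because each step is an ordinary typed function, the compiler checks that every migration produces exactly the next version's shape.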

I'm also using it for versioning event streams and it works well. It'd be
really neat to be able to share the schemas of these messages in a wider
context.

Versioning events and the stream is a useful property to have. The more types
the better.

[0]
[https://hackage.haskell.org/package/DataVersion-0.1.0.0/docs...](https://hackage.haskell.org/package/DataVersion-0.1.0.0/docs/Data-Migration.html)

------
guitarbill
First CloudFormation adds a registry [0], now EventBridge [1]. Wonder if AWS
always builds duplicates, and if we'll end up with a schema service.

[0] [https://aws.amazon.com/blogs/aws/cloudformation-update-cli-t...](https://aws.amazon.com/blogs/aws/cloudformation-update-cli-third-party-resource-support-registry/)

[1] [https://aws.amazon.com/blogs/compute/introducing-amazon-even...](https://aws.amazon.com/blogs/compute/introducing-amazon-eventbridge-schema-registry-and-discovery-in-preview/)

------
proc0
I'm new to Elixir/Erlang, but it seems that's what those languages are going
for: an easy way to properly use event-based architectures. Someone with more
XP here can correct me.

------
CoolGuySteve
I write a lot of low level C++ network/disk stuff. I need minimal decoding
costs and rely on the type system to catch message differences asap. The
following is an example of an append only format that uses the sizeof as a
kind of version.

In practice, not only is this infinitely faster performance-wise, but I find
maintaining it is about the same amount of work as Protobufs or JSON, with
fewer fussy tooling issues.

By using X macros or BOOST_FUSION for the struct definitions, you can infer
the type definition of message fields and automatically write serializers for
JSON, SQL, CSV, Python, or whatever schema format is your favourite.
Unfortunately, all that macro/variadic magic is too verbose and fiddly for a
hnews comment.
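
A minimal sketch of the X-macro idea (hypothetical names; real versions generate far more than one serializer): the field list is written once, then expanded both into the struct definition and into a JSON writer.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// The single source of truth: each field listed exactly once.
#define MSG_FIELDS(X) \
  X(uint32_t, somedata) \
  X(double,   price)

struct Msg2
{
#define DECLARE_FIELD(type, name) type name;
  MSG_FIELDS(DECLARE_FIELD)
#undef DECLARE_FIELD
};

// A serializer generated from the same field list.
inline std::string toJson(const Msg2& m)
{
  std::ostringstream os;
  os << '{';
  bool first = true;
#define EMIT_FIELD(type, name) \
  os << (first ? "" : ",") << "\"" #name "\":" << m.name; first = false;
  MSG_FIELDS(EMIT_FIELD)
#undef EMIT_FIELD
  os << '}';
  return os.str();
}
```

Changing the field list changes the struct and every generated serializer in the same commit, which is the "your code is the schema" property.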

The important thing is that the drudgery of schema maintenance and conversion
can be eliminated using C++'s (limited) reflection capabilities. Your code is
the schema.

In WhateverMessages.h:

    
    
      #pragma pack(push, 1)
    
      template<typename Message>
      struct Header
      {
        uint32_t type = Message::MessageType;
        uint16_t size = sizeof(Message);
      };
    
      struct Msg : Header<Msg>
      {
        static constexpr uint32_t MessageType = 'msg1'; // Can constexpr bswap this to appear correctly in GDB and hex dumps
    
        uint32_t somedata;
        char someText[32];
      };
      #pragma pack(pop)
    
    

In EventHandler.cpp:

    
    
      void processMessage(const Msg& msg)
      {
        // Everything is type safe now, do real stuff
      }
    
      void processMessage(char* data, uint32_t len)
      {
        Header<Msg>* hdr = reinterpret_cast<Header<Msg>*>(data); // any instantiation works: every Header<T> has the same layout
        // should check the packet length is sufficient, return length processed, process multiple messages, etc
        switch(hdr->type)
        {
      #define DISPATCH(m) case m ::MessageType: assert(hdr->size == sizeof(m) /*or warn, whatever */); \
        processMessage(*reinterpret_cast<m*>(hdr)); break;
          DISPATCH(Msg);
          DISPATCH(OtherMsgDeprecated);
          DISPATCH(OtherMsgButForRealThisTime);
      #undef DISPATCH
          default:
            err("Unknown message type: %x (%s)\n", hdr->type, fourCCToString(hdr->type));
            break;
        }
      }
    
    

Another important thing is that the raw binary is the first-class format: it
requires no translation of any kind on x86-64 after the initial dynamic type
dispatch in DISPATCH.

Python, JSON, SQL, etc. are all slow anyway, so we can spend time massaging
their serialization formats afterwards.

------
IshKebab
I think the proper solution is to use a format that _explicitly_ requires a
schema (e.g. Protobuf). If your schema is implicit then people won't bother
using it and it'll be a hassle to get them to even write it down.
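
For illustration (a hypothetical message, not any real service's schema), the explicit-schema style looks like:

```protobuf
// The .proto file is the single, explicit source of truth; readers and
// writers are generated from it rather than implied by application code.
syntax = "proto3";

message UserCreated {
  string user_id = 1;
  string email   = 2;
}
```

Because the schema is a required artifact, it gets written down, reviewed, and versioned like any other code.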

------
mcguire
" _Writing code to map back and forth between bits-on-the-wire and program
data structures is a very bad use of developer time._ "

I'm going to disagree. Those transitions happen very frequently and can
materially affect both latency and throughput. Spending developer time on
these things can give a large return.

" _Among other things, most messages are in JSON..._ "

Ah.

