
Towards a Type System for Containers and AWS Lambda to Avoid Failures [pdf] - cmeiklejohn
http://christophermeiklejohn.com/publications/hotedge-2018-containers-preprint.pdf
======
stephen
I'm not sure what the point of this paper is.

* It talks a lot about containers, but this is really just contracts across systems, whether they run in containers or not. So not sure why the word "container" is necessary.

* It says "towards a type system", but then just muses about IDL/REST/Thrift, says "we need cross-system stuff ... so we should use better types", ...but...what does that look like? There are vague assertions that "we've done this", but I don't see any description of what that actually looks like.

* The Zookeeper/Kafka example, while apt, I'm not sure "having a cross-system type system" is exactly the right fix, as Zookeeper is purposefully (and validly) agnostic about what its clients encode within its file system, so whether the replication data is F/F+1/whatever is meaningless to Zookeeper. So to me the solution is not a "cross-system type system" where Zookeeper becomes aware of Kafka's invariants; it's just Kafka itself correctly/internally interpreting the Zookeeper data. If that means it's an Either[ValidCluster, InvalidCluster] within the Kafka "ClusterInfoApi" abstraction, that's fine, but that's not something the Zookeeper API/IDL is going to care/know about.

* You're never going to get all networked services to agree on the One True networking format/One True type system/IDL, so why even muse about this as a possible approach?
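(For concreteness, here's a minimal TypeScript sketch of the "Kafka interprets the Zookeeper data internally" idea from the third bullet. Every name here, and the MIN_REPLICAS invariant, is invented for illustration; neither project exposes this API.)

```typescript
// Hypothetical sketch: the service-side (Kafka-side) interpretation of an
// opaque Zookeeper payload as Either-like ValidCluster/InvalidCluster.
// Zookeeper itself never sees or enforces any of this.

type ZkBytes = string; // Zookeeper stores opaque blobs; it has no schema

interface ValidCluster { kind: "valid"; brokerIds: number[] }
interface InvalidCluster { kind: "invalid"; reason: string }
type ClusterInfo = ValidCluster | InvalidCluster;

const MIN_REPLICAS = 2; // an invariant of the client, unknown to Zookeeper

function interpretClusterData(raw: ZkBytes): ClusterInfo {
  let payload: unknown;
  try {
    payload = JSON.parse(raw);
  } catch {
    return { kind: "invalid", reason: "unparsable znode payload" };
  }
  if (!Array.isArray(payload) || !payload.every(b => typeof b === "number")) {
    return { kind: "invalid", reason: "payload is not a broker-id list" };
  }
  if (payload.length < MIN_REPLICAS) {
    return { kind: "invalid", reason: `need >= ${MIN_REPLICAS} replicas` };
  }
  return { kind: "valid", brokerIds: payload };
}
```

The point of the sketch: the validity judgment lives entirely inside the client's own abstraction, so the store stays agnostic.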

Disclaimer I don't read academic papers on a regular basis.

~~~
jahewson
“Towards” papers are typically workshop papers which give a preview of ongoing
work, usually by PhD students.

------
keithwhor
This is what we’ve done at StdLib [1] with FaaSlang [2].

FaaSlang uses static analysis and a superset of ESDoc comments to infer types
for the API interface to serverless functions on our system. This allows for
automatic type coercion from query parameters, automatic error checking, and
more - all caked in at the gateway layer before a function is executed.

Zero configuration; just write comments the way you normally would. It’s
almost a healthy midpoint between TypeScript and JavaScript, operating above
the runtime, but theoretically applicable to any language run within a
function container.

[1] [https://stdlib.com](https://stdlib.com)

[2]
[https://github.com/faaslang/faaslang/](https://github.com/faaslang/faaslang/)
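(A rough sketch of the kind of gateway-side inference and coercion described above. The comment format and coercion rules here are purely illustrative and are not taken from the actual FaaSlang spec.)

```typescript
// Toy version of inferring a parameter schema from a doc comment and
// coercing string query parameters before a function is invoked.
// Illustrative only; the real FaaSlang format and semantics may differ.

interface ParamSpec { name: string; type: "number" | "boolean" | "string" }

// Parse lines like "@param {number} amount" out of a doc comment
function inferParams(docComment: string): ParamSpec[] {
  const specs: ParamSpec[] = [];
  const re = /@param\s+\{(number|boolean|string)\}\s+(\w+)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(docComment)) !== null) {
    specs.push({ name: m[2], type: m[1] as ParamSpec["type"] });
  }
  return specs;
}

// Coerce raw query-string values to the inferred types, erroring early
function coerce(specs: ParamSpec[], query: Record<string, string>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const { name, type } of specs) {
    const raw = query[name];
    if (raw === undefined) throw new Error(`missing parameter: ${name}`);
    if (type === "number") {
      const n = Number(raw);
      if (Number.isNaN(n)) throw new Error(`${name}: expected number`);
      out[name] = n;
    } else if (type === "boolean") {
      out[name] = raw === "true";
    } else {
      out[name] = raw;
    }
  }
  return out;
}
```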

~~~
cmeiklejohn
Interesting.

I just went through this a bit and it doesn't appear that the specification
allows for polymorphism in the type system. Am I missing something here?

If that's true, it doesn't seem to address the issues highlighted by the
paper, although does handle simple type checking across interfaces of
primitive types.

~~~
cmeiklejohn
To be clear, it seems you have basic coercions (a la C, or ad-hoc
polymorphism) but no general support for subtype polymorphism nor parametric
polymorphism.
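(For readers following along, the distinction being drawn is roughly the following. This is a generic TypeScript sketch, not tied to either system.)

```typescript
// Ad-hoc coercion: a fixed set of case-by-case conversions.
function coerceToNumber(x: string | boolean | number): number {
  if (typeof x === "number") return x;
  if (typeof x === "boolean") return x ? 1 : 0;
  const n = Number(x);
  if (Number.isNaN(n)) throw new Error("not coercible");
  return n;
}

// Parametric polymorphism: one definition works uniformly for any T,
// without inspecting T's representation.
function firstOrDefault<T>(xs: T[], fallback: T): T {
  return xs.length > 0 ? xs[0] : fallback;
}

// Subtype polymorphism: one interface, many implementations.
interface Shape { area(): number }
class Square implements Shape {
  constructor(private side: number) {}
  area(): number { return this.side * this.side; }
}
```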

~~~
cmeiklejohn
Also, I have to sign up for stdlib? Do I have to pay for it? (Never got that
far, and the fact I have to sign up is quite misleading for something called
stdlib.)

~~~
keithwhor
You have to pay for APIs / Functions that the individual vendor charges for
(if the functions are not free) - for example, MessageBird charges $0.005 per
SMS sent [1] using their sms.create function.

You also pay a nominal fee per ms of compute used when somebody uses an API
_you’ve_ built.

But you can get started for free - $5.00 of credits are included, which
should cover up to 100,000 requests. If you're really experimenting, feel
free to message our team (my email is in my profile here, I think).

[1]
[https://stdlib.com/@messagebird/lib/sms](https://stdlib.com/@messagebird/lib/sms)

------
arcticbull
I don't have nearly as much experience developing these systems, so getting
started without types was a bit daunting. I wrapped my Lambda functions in
protobufs and used a shared common definition repository. Then the lambda
services support either the JSON rep of the protos or full-on binary protos,
and the type checking happens on both ends. Curious what y'all think of this
as a solution.
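(The shared-definition approach can be approximated without protobuf tooling. Here's a hedged sketch of the JSON-representation path with checking on both ends; the type and function names are invented, and in the commenter's setup this role is played by generated protobuf code, which also handles the binary wire format.)

```typescript
// Illustrative "shared common definition" check for the JSON representation.
// Both the caller and the Lambda handler import the same definition, so a
// mismatch is caught at the boundary rather than deep inside the handler.

interface ChargeRequest { userId: string; amountCents: number }

// Runtime guard both ends can share
function isChargeRequest(x: unknown): x is ChargeRequest {
  return typeof x === "object" && x !== null &&
    typeof (x as any).userId === "string" &&
    typeof (x as any).amountCents === "number" &&
    Number.isInteger((x as any).amountCents);
}

// Handler-side decode: reject anything that doesn't match the shared schema
function decodeRequest(body: string): ChargeRequest {
  const parsed: unknown = JSON.parse(body);
  if (!isChargeRequest(parsed)) throw new Error("schema mismatch");
  return parsed;
}
```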

~~~
cmeiklejohn
We're trying to highlight that most of the work you're doing to make sure the
interfaces are well defined and match up manually can easily be done by a type
checker, as you would have if you wrote this as a single application.

~~~
bjz_
How are you going to handle updates and migrations of types? Something I've
been thinking about in terms of dependently typed APIs is versioning and
migration - but more at the code ecosystem level than in terms of running
systems that can't be turned off. But there is kind of a similarity there. You
can think of the ecosystem of written programs as kind of a distributed
system. Would be nice to give library authors the tools to more gracefully
migrate their consumers' code in a type-directed way, and to highlight
potential problems prior to deployment, but this could equally apply in a
distributed computing context.
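(One concrete shape "type-directed migration" could take, as a sketch; the names are invented.)

```typescript
// Versioned message types: the union forces every consumer to handle old
// versions, and the compiler flags any un-migrated code path.

interface UserV1 { version: 1; name: string }
interface UserV2 { version: 2; firstName: string; lastName: string }
type User = UserV1 | UserV2;

// Total migration to the latest version; adding a UserV3 to the union
// without extending this switch becomes a compile-time error thanks to
// the `never` exhaustiveness check.
function migrate(u: User): UserV2 {
  switch (u.version) {
    case 1: {
      const [firstName, ...rest] = u.name.split(" ");
      return { version: 2, firstName, lastName: rest.join(" ") };
    }
    case 2:
      return u;
    default: {
      const _exhaustive: never = u;
      return _exhaustive;
    }
  }
}
```

The type checker highlighting un-migrated paths before deployment is exactly the "prior to deployment" property mentioned above, albeit within one codebase rather than across a running distributed system.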

~~~
geogriffin
Welcome to Erlang/OTP "releases" where the folklore is that Ericsson engineers
spent as much time testing releases (read: state migration code) as they did
application code.

~~~
bjz_
Interesting. Yeah, from what I understand the Erlang philosophy was to just
throw out the idea of a type system and deal with faults dynamically, which is
understandable given the time the language was created. But given what we've
learned about type systems in the intervening years, it would be super nice to
leverage a type system for this, and greatly reduce the testing overhead.

I'd love to see this in an event sourcing context too. This paper, “The Dark
Side of Event Sourcing: Managing Data Conversion”, seems to hint at there
being some interesting algebraic foundations to splitting and merging streams,
adding new fields, etc: [http://files.movereem.nl/2017saner-eventsourcing.pdf](http://files.movereem.nl/2017saner-eventsourcing.pdf)
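(The "adding new fields" case can be sketched as an upcaster applied lazily when reading the stream. Illustrative names only; this is not the paper's notation.)

```typescript
// Old events lack the "currency" field; an upcaster supplies a default so
// downstream consumers only ever see the new shape.

interface DepositedV1 { type: "Deposited"; amount: number }
interface DepositedV2 { type: "Deposited"; amount: number; currency: string }

function upcast(e: DepositedV1 | DepositedV2): DepositedV2 {
  return "currency" in e ? e : { ...e, currency: "USD" };
}

// Replaying a mixed-version stream yields a uniform stream
function replay(stream: (DepositedV1 | DepositedV2)[]): DepositedV2[] {
  return stream.map(upcast);
}
```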

------
bjz_
I'm really happy more work is being done in this space! Also nice to see nods
to various attempts from the past - it's important we learn from their
failings, but also their successes. It's too easy to get stuck in the mindset
of "Ugh CORBA" and "noes, SOAP", without being able to see an opportunity
there. Let's be persistent and figure this stuff out!

------
gm-conspiracy
I am having SOAP-related WSDL flashbacks.

~~~
runT1ME
I would take SOAP/WSDL over barely documented restful APIs and JSON...

~~~
cmeiklejohn
Unfortunately, popular opinion disagrees and that's part of the motivation for
this work.

------
xinjo
I like the premise of the paper, but the motivating examples feel really weak
to me.

~~~
cmeiklejohn
Care to elaborate? We tried hard to draw on industry use cases.

~~~
xinjo
kafka/zk bug: I just don’t think this is a compelling example of an
underspecified interface being to blame. In my view, it’s squarely on kafka to
correctly implement its replication policy, and that likely isn’t something
I’d want to bake into an interface layer at the system/network boundary. Also,
kafka is a fairly foundational infrastructure-y component from a modern
“serverless” perspective, and its interface w/ zk is critical — it just
doesn’t exemplify the type of pain you see from underspecified interfaces in
large microservice deployments. Unlike between tons of small containers, the
kafka/zk interface is worth scrutinizing and warrants a ton of manual testing
and verification.

lambdas/kafka: from my perspective, this one is kind of conflating the value
of Options with the value of a typed IDL with some notion of generics baked
in. It’s not clear to me that Option[Number] is what I’d want in this
scenario, maybe I just want to reject anything that doesn’t validate to a
Float — but I think that’s kind of tangential to whether or not generics in a
more conventional sense would be useful.
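(The two designs being contrasted, as a TypeScript sketch; Option is hand-rolled here, and the paper's Option[Number] is analogous.)

```typescript
type Option<T> = { some: true; value: T } | { some: false };

// Design 1: absorb bad input into the type, forcing callers to handle None
function parseFloatOption(raw: string): Option<number> {
  const n = Number(raw);
  return Number.isFinite(n) ? { some: true, value: n } : { some: false };
}

// Design 2: reject at the boundary, so downstream code only sees numbers
function parseFloatOrReject(raw: string): number {
  const n = Number(raw);
  if (!Number.isFinite(n)) throw new Error(`rejected: ${raw}`);
  return n;
}
```

Whether to thread the uncertainty through (Option) or fail fast at the edge is exactly the design question raised above, and neither answer requires generics per se.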

edit: I guess I would have liked to see examples that show how this would help
with discoverability, code generation, distributed tracing, monitoring,
verifying large systems of microservices. It's a nice position to start with
though -- are you planning on a follow up or an implementation?

------
cbrozefsky
"CORBA, while successful" ....

~~~
bitcrusher
What're they going to say? "CORBA was a giant shitpile and created tire fires
for several years but since this is an academic paper, we have to present
historical context for our ideas"...

~~~
toddh
Your CORBA experience depended on how you used it. If you used the IDL and
messaging and plugged in your own backend it could work quite well. People are
still reinventing new IDLs to this day.

~~~
cryptonector
People have been reinventing IDLs since forever and a day.

S-expressions are the oldest serialization format that I know of. ASN.1 is a
fairly ancient IDL that was used for RPC back in the early 80s (ISODE/ROSE).
There's ONC RPC, with XDR as the IDL. There's DCE RPC (and MS-RPC). And
many, many others. There are tons of serialization formats, and they all
tend to have one or more related RPC frameworks, ad-hoc or otherwise.
Perl5 has several, no?

This is the space of NIH.

Take protocol buffers. It's supposed to be an anti-ASN.1, but it's actually
remarkably similar to ASN.1's DER (distinguished encoding rules), which means
it has lots of the same mistakes. It's all very sad.

~~~
cmeiklejohn
If you actually read the paper, we talk about ASN.1 and why it's insufficient
for what we are doing. That's clearly stated, and it's clear you didn't read
that paragraph.

~~~
toddh
If you want people to be supportive and care about your work, this is not the
way to go about it.

~~~
cryptonector
Thanks for noting this. I was hoping my lengthy reply would get this point
across more subtly, but maybe a more explicit note will be better.

