I don't program professionally, and I struggle with dicts and classes. On one ha...

GeneralMayhem · on June 26, 2022

> You really do need to document different expected dicts somehow, which is basically structs/classes.

As someone who does program professionally, my experience is that if you find yourself needing to do this (and the expectations aren't runtime-dynamic) you've almost certainly gone astray, and you'll eventually find yourself implementing a much poorer version of a type system anyway. Dict contents should always be treated as optional. If you have a case where you have 2 required keys and then a bunch of optional ones, define a struct/class that has those 2 fields and then a dict/map for extra values.
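
A rough sketch of that last suggestion in Python, assuming a dataclass stands in for the struct. The names `Task`, `task_id`, `owner`, `extras`, and `describe` are invented for illustration and come from nowhere in particular:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Task:
    # Hypothetical required fields: the type itself documents them,
    # and omitting one fails at construction time.
    task_id: int
    owner: str
    # Everything else is optional and has to be read defensively.
    extras: dict[str, Any] = field(default_factory=dict)


def describe(task: Task) -> str:
    # Required fields are accessed directly; optional ones go through .get().
    priority = task.extras.get("priority", "normal")
    return f"{task.task_id} ({task.owner}, priority={priority})"
```

With this shape, `Task(task_id=1)` raises a `TypeError` right away instead of a `KeyError` somewhere downstream, and nothing in the code is allowed to assume a particular key exists in `extras`.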

The only reason one might want a "schema" for maps is when you're dealing with something config-driven; for instance, in implementing a SQL engine, or assembling inputs to an ML model. Even then, your code shouldn't have expectations on specific contents, other than that the keys must be the same as some other map/list.

A few other similar rules I follow with weak/primitive types:

* Strings are opaque blobs. The only valid operation on strings is to test two strings for equality. No parsing, no checking "does it have a prefix", no concatenation - if you need comparisons that take multiple elements into account, write a richer type. The exceptions are in implementing wire protocols, and rendering data for human consumption.

* Booleans are not allowed as function arguments or member fields. Define a custom enum instead. You'll almost always end up wanting at least a third possible value. Even if not, it's useful to make the default state some form of invalid, so that you don't have to guess at whether a field is false because you meant false, or false because you forgot to set it.

* If there's a value property that you rely on (e.g., lists being sorted, strings being capitalized, integers being in some range), and that property needs to be preserved across a function-call boundary, wrap it in a type. It doesn't force correctness (unless you're using a language with dependent types), but it's at least stickier than a comment.
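
The boolean and value-property rules above might look something like this in Python. This is only a sketch under assumptions: `Visibility`, `Document`, `can_publish`, and `SortedInts` are hypothetical names, and as noted the wrapper doesn't prove anything, it just makes the invariant harder to lose.

```python
import bisect
from dataclasses import dataclass
from enum import Enum


class Visibility(Enum):
    # Hypothetical enum standing in for an `is_public: bool` field.
    UNSET = 0    # the default is deliberately invalid
    PRIVATE = 1
    PUBLIC = 2


@dataclass
class Document:
    title: str
    visibility: Visibility = Visibility.UNSET


def can_publish(doc: Document) -> bool:
    # "Forgot to set it" is distinguishable from "meant private".
    if doc.visibility is Visibility.UNSET:
        raise ValueError("visibility was never set")
    return doc.visibility is Visibility.PUBLIC


class SortedInts:
    """Immutable wrapper whose contents are sorted at construction time."""

    def __init__(self, values):
        self._values = tuple(sorted(values))

    @property
    def values(self):
        return self._values

    def contains(self, x: int) -> bool:
        # Binary search is only valid because the constructor sorted the data.
        i = bisect.bisect_left(self._values, x)
        return i < len(self._values) and self._values[i] == x
```

A function that takes a `SortedInts` parameter says "this must be sorted" in its signature rather than in a comment, and adding a third `Visibility` state later doesn't require touching every call site that passed `True` or `False`.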

owl57 · on June 26, 2022

Python's implicit string iteration is an annoying trap.

  COMPLETE_STATES = 'done', 'cancelled'

  if state in COMPLETE_STATES:
      …

Then you decide to handle cancelled tasks separately.

  COMPLETE_STATES = 'done'

Boom!

Can't remember a less contrived example right away, but I have broken real code by inadvertently triggering string iteration and spent some time scratching my head. Granted, I don't think I've seen something like this in a PR, only in local development.

P.S. I think last time I stumbled on this, it actually involved Django ORM and changing filter(state__in=COMPLETE_STATES) to filter(state__in=DONE).
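
For anyone who wants to see the failure mode spelled out, here is a small self-contained sketch in plain Python. The helper `is_complete` and the constant names are made up for illustration, and nothing here touches Django. Once the tuple collapses to a bare string, `in` becomes a substring test and iteration yields single characters; the trailing comma is what keeps a one-element tuple a tuple.

```python
def is_complete(state, complete_states):
    # Hypothetical helper: membership test against whatever was passed in.
    return state in complete_states


TWO_STATES = 'done', 'cancelled'   # a tuple of two strings
BROKEN = 'done'                    # just a string, no longer a tuple
FIXED = ('done',)                  # one-element tuple: note the trailing comma

# With the tuple, only exact matches count.
print(is_complete('do', TWO_STATES))   # False

# With the bare string, `in` is a substring test, so partial states match.
print(is_complete('do', BROKEN))       # True

# Iterating the bare string yields characters, not states.
print(list(BROKEN))                    # ['d', 'o', 'n', 'e']

# The trailing comma restores the intended behaviour.
print(is_complete('do', FIXED))        # False
```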