Hacker News new | past | comments | ask | show | jobs | submit login

I don't program professionally, and I struggle with dicts and classes. On one hand, I want to avoid the Java world of needing to learn 8 new classes to use any library. So dicts are lightweight and extensible and feel like the modern way of doing things. One the other hand, all the problems listed in the article are right. You really do need to document different expected dicts somehow, which is basically structs/classes.

The other thing that always burns me are lists. Specifically lists of lists and lists of strings. Since python allows you to index into strings the same way as lists, for some reason I always loose track of where I am in the unpacking stack. This is when I switch to type hints.




> You really do need to document different expected dicts somehow, which is basically structs/classes.

As someone who does program professionally, my experience is that if you find yourself needing to do this (and the expectations aren't runtime-dynamic) you've almost certainly gone astray, and you'll eventually find yourself implementing a much poorer version of a type system anyway. Dict contents should always be treated as optional. If you have a case where you have 2 required keys and then a bunch of optional ones, define a struct/class that has those 2 fields and then a dict/map for extra values.

The only reason one might want a "schema" for maps is when you're dealing with something config-driven; for instance, in implementing a SQL engine, or assembling inputs to an ML model. Even then, your code shouldn't have expectations on specific contents, other than that the keys must be the same as some other map/list.

A few other similar rules I follow with weak/primitive types:

* Strings are opaque blobs. The only valid operation on strings is to test two strings for equality. No parsing, no checking "does it have a prefix", no concatenation - if you need comparisons that take multiple elements into account, write a richer type. The exceptions are in implementing wire protocols, and rendering data for human consumption.

* Booleans are not allowed as function arguments or member fields. Define a custom enum instead. You'll almost always end up wanting at least a third possible value. Even if not, it's useful to make the default state some form of invalid, so that you don't have to guess at whether a field is false because you meant false, or false because you forgot to set it.

* If there's a value property that you rely on (e.g., lists being sorted, strings being capitalized, integers being in some range), and that property needs to be preserved across a function-call boundary, wrap it in a type. It doesn't force correctness (unless you're using a language with dependent types), but it's at least stickier than a comment.


Python implicit string iteration is an annoying trap.

  COMPLETE_STATES = 'done', 'cancelled'

  if state in COMPLETE_STATES:
      …
Then you decide to handle cancelled tasks separately.

  COMPLETE_STATES = 'done'
Boom!

Can't remember a less contrived example right away, but I have broken real code by inadvertently calling string iteration and spent some time scratching my head. Granted, don't think I've seen something like this in a PR, only in local development.

P.S. I think last time I stumbled on this, it actually involved Django ORM and changing filter(state__in=COMPLETE_STATES) to filter(state__in=DONE).




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: