Hacker News new | past | comments | ask | show | jobs | submit login
Code Smell of the Day: Type Keys (2021) (jesseduffield.com)
111 points by genericlemon24 on Jan 2, 2023 | hide | past | favorite | 79 comments



The original version isn't a code smell. It has a very important advantage: read the function and you know that a user can be only one of the two things, a "customer" or an "admin", and you never have to worry about any other cases, because they don't exist.

Make it a callback, and now the code becomes "Here we do something specific to each subclass of users, but this code doesn't tell you what they are, there can be any number of classes and the callback can literally do anything, and if you want to figure out what's happening at this particular step, scour the entire codebase and find all the callbacks that are passed as arguments."

The code has become extensible, and thus harder to reason about. This is a good trade-off if you know you will have more and more user roles, added by different devs (or even different teams), and you want the user creation code to be decoupled from roles.

On the other hand, if most of your business logic is "Do this for customers" and "Do that for admins," then it's a useless extensibility. You will need a major change of your business requirement before a callback would be useful.

YAGNI.


“Do this for customers” and “do this for admins” has to be the number one regret of people who implement systems requiring access controls. Using roles and privileges is a much better approach.


This has a name. It's called "control coupling." Like the author's example, passing in a flag to a function that controls what the function does is control coupling and you should prefer different functions for each case instead.


> you should prefer different functions for each case instead.

No I shouldn't. I use this frequently in c++ to deduplicate common functionality that only differs by a single line. The "flag" is a zero-size struct and checking the flag is done at compile time.

  struct do_bar_tag {}
  template<typename do_bar_t>
  int foo(int x, do_bar_t) {
    if constexpr (is_same<do_bar_t, do_bar_tag>::value) {
      bar(x);
    }
    ...
  }
The compiler will emit two implementations, as you suggest is proper, but my maintenance overhead is reduced. And the cost is zero, in time and space, at runtime.

Of course, this isn't always the right thing to do, and you can definitely overdo it. But in moderation, it's fine. "Code smell" is too often a needlessly strong opinion.


It never stays that way. It starts as

    foo(int x, do_bar_t)
With time, it turns into

    foo(int x, int y, do_bar_t)
and eventually

    foo(int x, int y, int z, int w, do_bar_t)

It's too tempting to add 'just one more' flag. You de-duplicate common functionality but increase overall complexity.


Part of the job is to know when to call it quits and refactor, there's no hard rule. In most cases I'll take a 130 line procedure with three flags that don't heavily interact with each other over three almost-identical 100 line functions.


Porque No los Dos?

In languages with good support for functional features, you can curry and compose functions to your heart's content, allowing you to reuse similar logic and take better advantage of type systems like TypeScript's


It comes down to how deep the type system of a language support this kind of operation. It'll work well with TS, Rust, but it doesn't add much to golang, C++.

But well, usually dynamicizing statically-known types yields premature abstraction


I would actually call it "atomization" more than abstraction, but I agree it can often be premature


Put the flags into a class and pass the class around.


s/class/typed object/ -> "How to write Typescript"


Only, I don't use ints as "flags," so you're doing it wrong. And you're using the slippery slope fallacy, assuming that I'm a DRY absolutist. I'm not. Like I said, it's not always a good thing, and you can definitely overdo it.

The valid complaint about the implementation I show would be that the do_bar logic is at the very top with no preamble, and it may be simpler for the caller to simply call bar(x) in that case. But adding necessary-looking complexity would detract from the clarity of the implementation (and damn c++'s verbosity, it already looks like hot garbage IMHO)


> Only, I don't use ints as "flags," so you're doing it wrong. And you're using the slippery slope fallacy, assuming that I'm a DRY absolutist. I'm not.

You might not "use ints as flags" but a lot of maintenance programmers who know nothing about your idea, have never seen the code before, don't know how the application works or even what it does, will do just that.


What you're describing is multiple policy failures. Why are these "maintenance programmers" implementing novel features? Why are they novices that are unfamiliar with the language and its best practices? Why are these novice-rogue-developer-maintainers' contributions not being reviewed by more competent developers?

Moreover, if these unrestricted-novice-rogue-developer-maintainer-rhinoceri are going to do whatever the hell they feel like, completely disregarding the patterns and practices that I have established in my code, does it matter what I have done?


Because novice programmers are cheaper and businesses make decisions based on short term cost. Especially when projects get old.


I know all that. But you didn't address my point: if management has thrown all good practices to the wind, how am I to blame for the horrendous outcomes?


I'm not blaming you I'm saying most of the time management will pick the cheapest option. That leads to bad outcomes for a lot of legacy software. That then leads to someone spending more money replacing the legacy system


Quit jobs where management is that bad.


This is a tangent, but I’ve never seen the real world work this way. You’re being paid to be the DRI for the code base. You will be blamed because you’re being paid for it to be your responsibility. Actually horrendous outcomes, which can only be related to breaking functionality rather than code prettiness, can be prevented.


And the issue with too many behavioural flags is that now the function (and the implementor) has to deal to all possible combinations, even though by specs only a small subset would be allowed.


For most modern programming problems that we get paid to solve, a bit of duplicate or similar looking code is the least of our problems. Conceptually integrity and managing dependencies/coupling is far more important.


Who is "we" and do you have statistics behind that "most?" And why is "paid to solve" relevant to the discussion? It's okay to say that what I'm saying about my personal experience doesn't resonate with your personal experience, but hand-waving at a nebulous authority isn't making a case.

I use compile-time flags to deduplicate logic because refactoring is a nightmare if tens to hundreds of lines of code are turned into boilerplate. That is to say, I ensure conceptual integrity by maintaining a single source of truth. The nice thing about relatively dry code is that if you need to decouple it, you can easily just duplicate the code in question and run with that. If you have a dozen slightly-different copies of the same boilerplate, it's very rare in my experience for the variations to be well documented, so it ends up taking a lot of work to figure out why the variations exist -- and the real pitfall that I've seen time and time again is that a bug is found in one variant, fixed there, and not propagated to the other variants.


> Who is "we" and do you have statistics behind that "most?"

I love that you've called this out. So many people try and justify their coding preferences by referring to this mythical 'we' or the 'coding community' who have apparently all agreed to to some certain standard and everyone should just fall in line and get behind it without asking any questions. It's such a load of bullshit.


Just because you do it frequently doesn’t necessarily make it good. Your approach increases the function’s cyclomatic complexity. The struct requires the caller to dictate the function’s inner logic instead of the function logic being driven by application data. Maybe your solution is a good fit in your production code. I can’t say. Maybe there is a different approach to solve your boilerplate concerns. There will be exceptions but generally I don’t agree with the example as provided.


> Just because you do it frequently doesn’t necessarily make it good.

Fair point, but in my ~20 years of experience, I've learned what does and doesn't make for more work down the road. To wit: if something is always bad, I don't keep doing it. But I didn't just say I do it frequently, I said that I do it frequently for big-ish functions when it's only toggling a line or so. That context is important: I am not arguing that this is always a good thing. I'm saying that there's a time & place for it, and with a language like C++, it's a zero-cost abstraction.

> ...I don’t agree with the example as provided.

Yeah, neither do I. As I pointed out elsewhere, its a contrived example which can be more simply factored out into two function calls. So unless it's being used tens to hundreds of times, nothing is gained there.


> This has a name.

Good, I'm glad. For some reason the term "Code smell" has itself become a "smell" to me.


When I shared this to my colleague, I call it "dynamicizing statically-known info". It is in the same family as putting hardcoded values in a list and calling a .map to it. It almost always end as a premature abstraction.


> const createUser = (

> attributes: UserAttributes,

> onCreate: (user: User) => void

> ): User => {

> user = User.create(attributes);

> onCreate(user);

This stood out to me, in an otherwise tame article, as probably the most terrible overthought interface possible. A literal callback function passed as a function reference to something that does three lines of code? Good luck with that, I much prefer even a slightly procedual implementation.


The three line function here is an example, but I spent a good part of a sprint having to deal with a 730-line function that could've been used as a perfect "real world scenario" of why this abuse of control flags (or type keys) is a bad idea.


The problem with these examples is that they're contrived and oversimplified. Yes, switched enum behavior can be bad, but it can also be the right solution for the time being (or even forever).

The solution in the article could turn out even worse if the system evolves to require so many kinds of users that you end up with 8 different functions depending on what kind of user you want. You can't hide complexity of this type; you can only move it around as it emerges and try to keep things understandable.

In every design situation you're faced with countless approaches that you can take, with a lot of uncertainty as to how the system will evolve over time. I've found that the "what if" time you spend up front follows a parabolic efficiency curve: Early on in the design, you throw away some ideas that would make the first implementation easy, but would quickly become unwieldly. As the design progresses, you start thinking of more and more "what ifs", and soon you find your up-front architecture getting heavier and heavier, such that if you don't stop with the "what ifs", your first implementation will become a cathedral when all you'll need for the next year is a priest and a bench.

Clear out some easy win design issues up front, but overall don't sweat it and don't spend much time thinking "what if" because you're going to predict the future 50% wrong anyway. Refactoring isn't that hard or time intensive except in old code bases with poor hygene, and your new project is neither (unless you have poor discipline).


These kinds of edicts don't work because they fail the most important rule of programming: do what's simplest and most clear for the code you're writing. Your language is going to have constructs for this kind of thing (polymorphic inheritance, discriminated unions, multi dispatch, etc). Pick the language-native construct that works best. In the case of Typescript this would arguably be polymorphic inheritance, but if you don't have a scenario that benefits in multiple ways from writing classes then using them would be overkill.


A common variant of this is the Boolean argument to flip between function modes

    doAOrB(args, aOrB: bool)
Especially if the toggle gets used in 3-4 places or more in the body, control flow is a nightmare to follow. Instead of 2 paths to read, you have to mentally coalesce 2^n paths into 2. Worst-case scenario is when args has different meanings depending on the value of aOrB.

    @param flag: if set and if aOrB is true, warnings log to stdout and die, otherwise warnings are saved in the foobar field of args


My golden rules of programming:

• All booleans really want to be enums

• All enums really want to be rich data objects

• All rich data objects really want to be discriminated unions

That is, whenever you write some code which accepts a Boolean parameter that drives its behavior, at some point you will regret making it a bool and inevitably need to refactor it to be an enumeration type with more than two values. But then eventually you will realize that you have one behavior in your code that applies to more than one of those enum values (all the ‘type a’ cases but not the ‘type b’ cases), and you will want those enum values to themselves have a Boolean property telling you whether they are of type a or type b (and note that that Boolean will also be subject to this same golden rule in time)

Eventually you’ll find that those different enum values (the type a and type b ones) need different sets of dependent data (type a values all have a ‘target’ as well, but type b ones don’t, say) - so you end up needing a discriminated union to capture the various data objects involved.

All of which leads us to: every if statement and switch statement (over a parameter or input value) is a disguised match on a discriminated union type. Over time, this fate is inevitable.

And if your language doesn’t have discriminated unions, learn a pattern to fake them (usually it’s the abstract factory pattern).


Also: forgot the evolution into its final form, which is a list of rule objects.

A practical example:

Your program starts out accepting a config file that looks like this:

    enableLogging: true
Then it changes into:

    logLevel: WARN
Then that becomes:

    logging: {
        level: WARN
        file: system.log
    }
Then:

    logging: 
        fileLogger: {
            level: WARN
            file: system.log
        }
And finally:

    logging: 
        - fileLogger: {
            level: WARN
            file: system.log
        }
        - consoleLogger: {
            level: DEBUG
        }


The truth of this comment leaves me feeling somewhat emotionally violated.


I’ve seen the general idea here called “part blindness” before: https://www.hxa.name/notes/note-hxa7241-20131124T0927Z.html


I think this evolution paradigm is useful to keep in mind when designing the API, but obviously each successive step is more computationally expensive and more complex (see: error-prone)

You should probably generally use the least advanced of these while maintaining a pathway for future refactors into more complex versions of this


It's just the second law of Thermodynamics at work, no ?


Question is whether and to what extent you can shortcut that lifecycle, and whether it's worth trying.


Right. I tend to think you should just be aware of it, be deliberate in choosing where to situate your code on it, and be adept at refactoring code up the scale.


My favorite version is the “forceX” antipattern, where you introduce boolean flags to turn off some aspect of a function. E.g. “loadData(forceFromDisk: bool)”, where “forceFromDisk == true” bypasses the cache.

Then somewhere down the line someone realizes that actually in some cases of “forceFromDisk == true”, we still want to load the data from the cache, so the function becomes “loadData(forceFromDisk: bool, forceCache: bool)” where “forceCache == true” overrides “forceFromDisk”.

Then somewhere down the line someone realizes that actually in some cases of “forceCache == true”, we still want to load from the disk… Repeat ad infinitum.


Two or more coupled bools are crying to become a union. A function who's control flow needs to be configured cries to be split up.


Previously discussed in August 2021:

https://news.ycombinator.com/item?id=28325563 (74 comments)

Credit: Thanks to @rosebay for pointing this out at https://news.ycombinator.com/item?id=34218449 (now dead)

Also see additional interesting submissions from jesseduffield.com:

https://news.ycombinator.com/from?site=jesseduffield.com


Not everything needs to be DRY. In a lot of codebases, I see this hell-bent attitude to make everything as DRY as possible often at the cost of creating multiple files when all it'd require is 2 functions.

A lot of code smells arise from this attitude. DRY is not mandatory. Oftentimes, repeating yourself is better and clearer. It allows you to grow both functions at their own pace, handle edge cases separately, prevent spaghetti code, write better & simpler tests.


> It allows you to grow both functions at their own pace, handle edge cases separately,

DRY should only be applied to semantically identical code, not merely syntactically identical code. If the code is in the former category, and should be subject to DRY, and you're making changes to one copy, by definition you must need to make changes to the other copy … and it's a bug if you're not.

And those are the worse* kinds of repetition to encounter later on when you're trying to change the code: I've got two functions, ostensibly doing the same thing … except not. Which one is correct? (I.e., "What are the requirements?", and in my career, if I'm encountering this situation in the code, the requirements are never* documented.)


I 100% agree, but try telling that to a programmer with 2 years experience.


Even with over 25 years professional experience I still see code where logic and literal constant values (including e.g. complex regular expressions) are duplicated unnecessarily (requiring constant double maintenance etc.) as more of a problem than overly clever ways to reduce repetition that make the code harder to work with. 95% of the time developers can easily save themselves time upfront and in the future by first checking if the logic they're writing already exists in the codebase somewhere and reusing that (usually with very minor refactoring, e.g. marking a function public and moving it into a suitable shared module). Maybe 1% of the time the necessary refactoring is complex enough that it risks introducing bugs or decreasing code comprehensibility and duplicating the logic (ideally with a comment referring to the original source) is the lesser evil.


In a language like Rust or OCaml with exhaustive pattern matching on algebraic data types, the initial example is the recommended approach, because you will never miss a case, it forces you to think through all possible cases, where as the later refactored example in the article do not have such a constraint.


The same is true for Typescript, hence the userType's restriction to 'customer' and 'admin'.

Though the initial sample code is wrong anyway - note the missing "break" statements inside the switch.


It’s not only true for Typescript, but often times necessary if you require dynamic behaviour that would otherwise be achieved through reflection.


If possible, I would prefer deriving the "userType" from UserAttributes.

I see various methods for doing so are covered in the follow-up post linked at the top of tfa.

This is also avoidable if you have proper class/interface hierarchies, where the type "field/key" is the type itself.

In the case it's not possible and there really is only 1 type that is fully applicable to both admins and normal users then I would probably be fine with either approach (although I'd consider updating the return type to get type safety between admin and normal users!)


Type keys are used in various serialization mechianisms, especially JSON/TypeScript APIs (JSON-RPC, RDF-JSON, etc.). Even XML serializations have type keys with the element names.

In these cases, it makes sense to hide those type keys from the API and only use them in the serialization/deserialization logic.


Maybe, but the final solution is definitely harder to follow because the function definition now doesn't contain all the possible code that can run. You have to "find all references" and read the code of all the callers to know what it might do.

It's more loosely coupled and that always makes code harder to follow. So I'd say it could go either way. Depends on the specific example.


I mean... you already have user.setupNotifications() in your example... the obvious solution would have been a polymorphic user.setup() which is a different implementation for the CustomerUser class and the AdminUser class... if you've already picked OO as your paradigm of choice, at least exploit it for its value, you've already paid the costs.


This feels like inheritance to me, AdminUser inherits from User both have a setup / create method… Or just use a constructor…

For the most part I feel whatever gets the job done in the least lines of code is usually the right solution.

But overall this entire exercise smells suspiciously like a constructor.


Yeah I assumed we were going to get to some sort of types/inheritance based solution as well.

Stuff like this doesn’t bother me a ton unless it’s littered everywhere. If every function can run one of two paths that’s certainly annoying… if there is some high level function that builds one of two “pipelines” and I don’t think about userType again, I don’t care too much.


Some are suggesting that the author is against the use of tagged unions.

I don't read it that way. What the author is against is using parameters for the specific purpose of controlling the flow of a function. Instead, the author recommends breaking the function up into as many functions as there are variants of the type key.

This also applies to boolean flags (where only two variants are possible). There are, of course, exceptions as the author points out.

The small win is that the type key no longer needs to be maintained. The big win is that each function that gets split out has fewer reasons to change. The individual functions are therefore easier to maintain and understand.


> The big win is that each function that gets split out has fewer reasons to change. The individual functions are therefore easier to maintain and understand.

Yet the namespace becomes more polluted and the interconnections between the functions becomes more complicated, particularly during debugging.

It's important to remember that "break big things into smaller things" only pushes complexity someone else rather than eliminating it altogether, while simultaneously increasing the overall complexity of the system by increasing the number of interconnections between the parts. That doesn't mean you shouldn't do it but you should be honest about the consequences.


> Some are suggesting that the author is against the use of tagged unions.

> I don't read it that way. What the author is against is using parameters for the specific purpose of controlling the flow of a function. Instead, the author recommends breaking the function up into as many functions as there are variants of the type key.

Sure. How do you call the right one of the combined set of functions? A conditional branching on the type key at the call site? But then, by the same rule, the calling function should be split into multiple functions instead, and that recursisvely happens until you reach each point an instance of the typed union that might, ever, be passed (indirectly) to the function at issue is defined. And then you have a bunch of sets of near-identical functions where every function in the set needs updated for any required change that doesn’t depend on the tagged union. And the tagged union is no longer really serving the purpose of a tagged union, which is exactly to allow not having that kind of duplication.


I think the idea is that it's better to maintain several functions whose implementations are each linear flow than to maintain a single function that is littered with type key checks to turn on and off parts of its flow. It's trading off comprehension of each function by adding some risk that those functions will have shared parts that go out of sync.


As a nice side effect of fixing this is that it removed the bug where the switch statement didn’t have a break statement.


To be fair, every reasonable linter will treat this as an error or at least a warning and inform the programmer about their blunder. In fact, in TypeScript you don't even need a linter - the compiler itself can be configured to disallow fall-through cases. It's smart about it, too and still allows "empty" cases and makes the last case-statement optional, such that "switch (lorem) { case 'foo': case 'bar': ipsum(); break; case '...': blah(); } is still valid and compiles just fine even with noFallthroughCases configured.


I'm not sure I agree with this. Surely the solution is not to bother with the completely pointless createAdmin and createCustomer functions in the first place and to just call createUser directly where it is needed with the correct parameter. Otherwise you're just repeating yourself.

Most of the time, DRY is better.


The author is against tagged unions. Ok.

https://en.m.wikipedia.org/wiki/Tagged_union


The article felt like it was more against them being used in functions as switches. I don't think (could have missed it) that there was any discussion about using them in transport, because they are necessary there, but deserializing them to richer objects before going into code like was detailed in the article is a good idea IMO.


The article actually specifies that they are known statically, and not actually dynamic values. They are not coming from databases or other external sources and that's a large part of why it's a code smell. From the article, near the top but below where many people in this thread stopped reading:

> its value is always known at compile time. We’re not receiving it as an parameter from an HTTP request or a value from the database.


Sounds like he could use a language with multimethods...


Not sure I agree with the author. It took me less than 3 seconds to understand the initial code, but at least 10 seconds to understand the final result, even I had the knowledge of the initial function. But in some context this pattern can make a perfectly sense. Just not in this example.

Furthermore, from my point of view, creating an anonymous function and then store it in a variable is a bigger code smell.


When I started reading the post, I though the author would talk about pattern matching to get rid of the switch block. In that case, creating a user can be expressed in Elixir as follows:

    def createUser(attributes, "admin") do
        create(attributes) |> setupAdmin |> setupNotifications
    end

    def createUser(attributes, "customer") do
        create(attributes) |> setupCustomer |> setupNotifications
    end

    # example
    User.createUser(%{name: "laura", age: 30}, "admin")
Here, we chain functions using a pipe, |>, assuming that each function returns a user-like variable, like %User{}

I also liked the idea of using a callback for decoupling code (although that approach was discouraged by many users in this post). It could be done in Elixir as follows:

    def createUserCallback(attributes, callback) do
        create(attributes) |> callback.() |>  setupNotifications
    end

    # example
    User.createUserCallback(%{name: "laura", age: 30}, &User.setupCustomer/1)
    User.createUserCallback(%{name: "laura", age: 30}, &User.setupAdmin/1)
Since callback can be any kind of function, we can use pattern matching to enforce that the return type of each function is a user, ie %User{}:

    def createUserCallback(attributes, callback) do
        user = %User{} = create(attributes)
        user = %User{} = callback.(user)
        user = %User{} = setupNotifications(user)
        user
    end
Another approach is using a @spec annotation to define the signature of the callback.

Full code:

    defmodule User do
      defstruct name: nil, age: nil, type: nil

      defp create(_attributes = %{name: name, age: age}) do
        %User{name: name, age: age}
      end

      defp setupNotifications(user = %User{}) do
        IO.puts("User created #{user.type}: #{user.name}")
        user
      end

      def setupAdmin(user = %User{}) do
        %User{ user | type: "admin" }
      end

      def setupCustomer(user = %User{}) do
        %User{ user | type: "customer" }
      end

      def createUser(attributes, "admin") do
        create(attributes) |> setupAdmin |> setupNotifications
      end

      def createUser(attributes, "customer") do
        create(attributes) |> setupCustomer |> setupNotifications
      end

      def createUserCallback(attributes, callback) do
        user = %User{} = create(attributes)
        user = %User{} = callback.(user)
        user = %User{} = setupNotifications(user)
        user
      end
    end


The author has probably never written a compiler before.

Inside any compiler, you are going to have “type keys” and switch statements all over the place.

Even if you are writing a compiler in languages with builtin “tagged unions” and “pattern matching” you are still using “type keys” and switch statements.

Trying to write a compiler without a switch statement would lead to a giant unreadable mess.


> Note that instead of using callbacks we could have subclassed User to AdminUser and RegularUser, with each overriding the onCreate function.

Although the pendulum has swung far from inheritance towards composition in recent years, this is surely a prime case for simply using a subclass and instantiating whichever one you want.


This code is easy to read and understand and refactoring it with a callback seems as premature optimization which does bring own flavor to it. In other words, this is a good example when perfect is the enemy of the good.


My biggest takeaway from this is that CLOS and it's multiple dispatch system is the Platonic Ideal of OOP


This entire article assumes that the type is only used to satisfy what's on display in the example function, which is almost never true.

The article itself is some sort of a programming flex I'd fire you for if you leave it in the code reviews too often. :P


I really dislike the trend for saying “I’d fire you for x” where x is merely a suboptimal choice, or even a harmless decision about which the author has strong opinions. Probably it is said in jest but it creates a hostile atmosphere for juniors, where they are worried they can be fired for using the wrong design pattern.


I envy your access to your recruiting organization.


I read the blog, was mortified but luckily others have already provided feedback in an earlier thread: https://news.ycombinator.com/item?id=28325563

To the author’s credit, I loved that he is thinking analytically and he did take the feedback positively - there are some good ideas in the post.


The code smell I see is the omitted semicolons. The pitfalls of doing this are well documented. Just don't do it.


The commonly cited pitfalls are either strawmen or exceedingly rare. ASI is predictable and reliable. Not once in 5 years of semicolon-less development did it come to bite me or anybody else in a notable manner.

By all means use them in your work, but don’t treat it as ultimate truth. It is not.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: