It comes down to a preference thing. I want to read code like an essay. I don't ...

mlthoughts2018 · on Sept 28, 2018

I also work in an enterprise situation and very many bugs that I deal with from legacy code and others’ code comes directly from in-lining anonymous functions.

Many bugs have to do with utilizing closures to access variables needed in the function body, which then make refactoring harder and make modifying unit tests harder.

This can be even worse in languages like Python where there is a distinction between early binding and late binding and you have to be aware if by closure you’re using a reference name that may be associated with different data during its lifetime, or a name whose value won’t change, because a change to the underlying value could make a difference in what is bound inside the anonymous function at different times when it’s called. The classic example is trying to define functions in a loop where functions use the loop variables by closure. Then being surprised when every one of the functions has a reference to only the final value of the loop variable.

Even in statically typed languages, this makes things much harder to reason about than they should be. On the other hand, making an anonymous function that accepts many arguments for all the data it would try to access by closure is stupid: those arguments need to be documented, and it’s just so much cleaner and maintainable to do that with a regular function definition, which also makes it much clearer what all the conditions are for calling the function.

What’s worse is that these things can be deeply welded into some coding context, like using a flatMap over some TypedPipe in Scala / Scalding, and can result in needing to make in-line functions that are hundreds of lines long with arbitrarily complex function bodies, which then become tied into assumptions about the runtime context you’re embedded in, and then nobody can figure out how to refactor it into a standalone function, so it just grows by attrition to an inline lambda over years and is extremely fragile. Change something seemingly unrelated about the outer context it’s defined in, and suddenly you get unexpected, cryptic compiler errors complaining something’s wrong with the TypedPipe, and you have to dig deeper to understand why it’s related to the anonymous function.

I would say many of the most serious closure-related bugs and bugs related to unrefactorable yet undocumented dependence on an enclosing context that I’ve seen have been largely a direct result of the programming style of in-lining anonymous functions inside functional programming constructs.

I sympathize with your claim of “not reading an essay” especially because people can be prone to try to use functional programming or overloading the Python data model and operator syntax with cutesy bullshit that they try to pass off as expressiveness.

But I think there’s a middle ground where you think about it not like an essay, but just basic modularity and separation of concerns, and write functions separately except when they are very short and really trivial.

danShumway · on Sept 28, 2018

Hrm. I wonder if the differing bugs just comes from which one people are doing more.

I often have to fight to get people not to expose huge amounts of state whenever they build a module, so by far the most common bugs that I see are people being too loose with turning things that really shouldn't be modules into very fragile, cumbersome modules that depend on being used only in specific (undocumented) places with specific (undocumented) setup.

If I was in the opposite situation, and everyone I worked with already inlined code all the time, then probably most of the bugs I'd see would be related to people reusing variables, abusing hoisting by making spaghetti references to variables that are defined later in the function, etc... and in that case I could definitely see myself agreeing with you.

I have on occasion wanted the ability to define an anonymous function that didn't inherit variables from the scope that it was defined in. So I'll give you that - I would love for the ability to make an anonymous function that only has access to variables that are explicitly passed in. If I could isolate variables going into a closure as easily as I can isolate variables going out, I suspect a lot of the problems you're talking about would be easy to solve.

mlthoughts2018 · on Sept 29, 2018

That is a good point that whichever approach displays the most bugs in a given team is likely to just be whatever is the most common approach for that team, by simple base rates.

I’m not sure how we could objectively decide if either of these two approaches is definitively better, but in the specific, restricted case of a framework like Scalding, I’d strongly wager that avoiding in-lining ends up better in the long run. Those cases also have little connection to the weak module design issue you brought up, since it’s usually a module with de facto map reduce boiler plate and then just a few isolated places with any actual implementation, and when those parts are expressed as huge in-lined anonymous functions inside Scalding data type wrappers, I know right away it’s a bad code smell.

danShumway · on Sept 29, 2018

> I’m not sure how we could objectively decide if either of these two approaches is definitively better

The correct approach might just be the opposite of whatever you and your team's predilection is. In your case, you're saying that the teams you work with are using inline functions instead of following a restricted framework, in part because they're using languages that encourage them to just slap a bunch of nested code in instead of writing out the extra boilerplate.

Well, they probably already know to be careful about module design -- so if you encourage them to inline less code, odds are pretty good you won't suddenly wake up in the morning with a codebase with a hundred classes and a bunch of obscure private/public methods named `setupEntityExtractorForCollisionPart3`.

On the other hand, if your team is coming from a Java background and half of them are starting from the position that lambdas are just witchcraft, then it's probably not a bad idea to get them over that fear.

In a setting where everyone is inlining most of their code, probably the tests that are coming out are all integration tests, so... yeah, bias towards creating units so you can unit test. In the opposite situation, I'm actually just trying to get people to stop testing private methods and leaking implementation details into their tests. So I would love if people were testing with a little less granularity.

I know that my initial reaction to you listing off the problems you've run into with people building anonymous functions that couldn't be refactored was just, "yeah, but why the heck would anyone make that mistake? How hard is it to organize the variables in one function?" So I assume that other people might listen to my complaints about improper code reuse and think, "yeah, but why the heck would anyone ever just reuse a method in a class without checking the documentation first?" So my takeaway from that is, "different people struggle with different things."