I want to read code like an essay. I don't want to have to jump to a function definition to figure out what `thingExtractor` does. To me, that's like taking a book and rearranging the chapters in alphabetical order. No, put them in chronological order. Don't define a function three "chapters" ago in your code and then expect me to remember it if you're only going to use it once. I wouldn't read Ikea instructions that way, why should I read code that way?
But again, personal preference. And you don't have to choose either/or -- I love that Javascript's anonymous and named functions are so similar because it means you can switch between the two styles with basically zero cognitive overhead.
I will say, having worked in large enterprise before, I have seen bugs come out of people not inlining code and instead attaching it to classes or leaving it otherwise accessible. Specifically, people will use functions that are intended to be method-specific in other places. Even if those functions are well-written and don't have side-effects (which is often not the case), the original author still doesn't know that other people are now depending on them as an API.
So later on the code gets refactored and those methods get changed or removed, and suddenly you have a broken build. That's something you can partially mitigate by making a function private (although not entirely, because I've also seen it happen in code within the same class/scope). Python doesn't have true privates, but you can also mitigate that problem by just using code reviews to force people to respect `_`. In practice, I often found that it was easier to make the function anonymous, so everyone knew 100% that it was only being used in exactly one place.
I also work in an enterprise situation and very many bugs that I deal with from legacy code and others’ code comes directly from in-lining anonymous functions.
Many bugs have to do with utilizing closures to access variables needed in the function body, which then make refactoring harder and make modifying unit tests harder.
This can be even worse in languages like Python where there is a distinction between early binding and late binding and you have to be aware if by closure you’re using a reference name that may be associated with different data during its lifetime, or a name whose value won’t change, because a change to the underlying value could make a difference in what is bound inside the anonymous function at different times when it’s called. The classic example is trying to define functions in a loop where functions use the loop variables by closure. Then being surprised when every one of the functions has a reference to only the final value of the loop variable.
Even in statically typed languages, this makes things much harder to reason about than they should be. On the other hand, making an anonymous function that accepts many arguments for all the data it would try to access by closure is stupid: those arguments need to be documented, and it’s just so much cleaner and maintainable to do that with a regular function definition, which also makes it much clearer what all the conditions are for calling the function.
What’s worse is that these things can be deeply welded into some coding context, like using a flatMap over some TypedPipe in Scala / Scalding, and can result in needing to make in-line functions that are hundreds of lines long with arbitrarily complex function bodies, which then become tied into assumptions about the runtime context you’re embedded in, and then nobody can figure out how to refactor it into a standalone function, so it just grows by attrition to an inline lambda over years and is extremely fragile. Change something seemingly unrelated about the outer context it’s defined in, and suddenly you get unexpected, cryptic compiler errors complaining something’s wrong with the TypedPipe, and you have to dig deeper to understand why it’s related to the anonymous function.
I would say many of the most serious closure-related bugs and bugs related to unrefactorable yet undocumented dependence on an enclosing context that I’ve seen have been largely a direct result of the programming style of in-lining anonymous functions inside functional programming constructs.
I sympathize with your claim of “not reading an essay” especially because people can be prone to try to use functional programming or overloading the Python data model and operator syntax with cutesy bullshit that they try to pass off as expressiveness.
But I think there’s a middle ground where you think about it not like an essay, but just basic modularity and separation of concerns, and write functions separately except when they are very short and really trivial.
Hrm. I wonder if the differing bugs just comes from which one people are doing more.
I often have to fight to get people not to expose huge amounts of state whenever they build a module, so by far the most common bugs that I see are people being too loose with turning things that really shouldn't be modules into very fragile, cumbersome modules that depend on being used only in specific (undocumented) places with specific (undocumented) setup.
If I was in the opposite situation, and everyone I worked with already inlined code all the time, then probably most of the bugs I'd see would be related to people reusing variables, abusing hoisting by making spaghetti references to variables that are defined later in the function, etc... and in that case I could definitely see myself agreeing with you.
I have on occasion wanted the ability to define an anonymous function that didn't inherit variables from the scope that it was defined in. So I'll give you that - I would love for the ability to make an anonymous function that only has access to variables that are explicitly passed in. If I could isolate variables going into a closure as easily as I can isolate variables going out, I suspect a lot of the problems you're talking about would be easy to solve.
That is a good point that whichever approach displays the most bugs in a given team is likely to just be whatever is the most common approach for that team, by simple base rates.
I’m not sure how we could objectively decide if either of these two approaches is definitively better, but in the specific, restricted case of a framework like Scalding, I’d strongly wager that avoiding in-lining ends up better in the long run. Those cases also have little connection to the weak module design issue you brought up, since it’s usually a module with de facto map reduce boiler plate and then just a few isolated places with any actual implementation, and when those parts are expressed as huge in-lined anonymous functions inside Scalding data type wrappers, I know right away it’s a bad code smell.
> I’m not sure how we could objectively decide if either of these two approaches is definitively better
The correct approach might just be the opposite of whatever you and your team's predilection is. In your case, you're saying that the teams you work with are using inline functions instead of following a restricted framework, in part because they're using languages that encourage them to just slap a bunch of nested code in instead of writing out the extra boilerplate.
Well, they probably already know to be careful about module design -- so if you encourage them to inline less code, odds are pretty good you won't suddenly wake up in the morning with a codebase with a hundred classes and a bunch of obscure private/public methods named `setupEntityExtractorForCollisionPart3`.
On the other hand, if your team is coming from a Java background and half of them are starting from the position that lambdas are just witchcraft, then it's probably not a bad idea to get them over that fear.
In a setting where everyone is inlining most of their code, probably the tests that are coming out are all integration tests, so... yeah, bias towards creating units so you can unit test. In the opposite situation, I'm actually just trying to get people to stop testing private methods and leaking implementation details into their tests. So I would love if people were testing with a little less granularity.
I know that my initial reaction to you listing off the problems you've run into with people building anonymous functions that couldn't be refactored was just, "yeah, but why the heck would anyone make that mistake? How hard is it to organize the variables in one function?" So I assume that other people might listen to my complaints about improper code reuse and think, "yeah, but why the heck would anyone ever just reuse a method in a class without checking the documentation first?" So my takeaway from that is, "different people struggle with different things."
I want to read code like an essay. I don't want to have to jump to a function definition to figure out what `thingExtractor` does. To me, that's like taking a book and rearranging the chapters in alphabetical order. No, put them in chronological order. Don't define a function three "chapters" ago in your code and then expect me to remember it if you're only going to use it once. I wouldn't read Ikea instructions that way, why should I read code that way?
But again, personal preference. And you don't have to choose either/or -- I love that Javascript's anonymous and named functions are so similar because it means you can switch between the two styles with basically zero cognitive overhead.
I will say, having worked in large enterprise before, I have seen bugs come out of people not inlining code and instead attaching it to classes or leaving it otherwise accessible. Specifically, people will use functions that are intended to be method-specific in other places. Even if those functions are well-written and don't have side-effects (which is often not the case), the original author still doesn't know that other people are now depending on them as an API.
So later on the code gets refactored and those methods get changed or removed, and suddenly you have a broken build. That's something you can partially mitigate by making a function private (although not entirely, because I've also seen it happen in code within the same class/scope). Python doesn't have true privates, but you can also mitigate that problem by just using code reviews to force people to respect `_`. In practice, I often found that it was easier to make the function anonymous, so everyone knew 100% that it was only being used in exactly one place.