Hacker News

I agreed with their assessment, though for a totally different reason: the second function is only doing primitive operations and not invoking any other functions.

For the same reason I found that their refactoring actually made the first function worse. Now I have even more jumps that I need to make in order to 'inline' all the code so that I can deeply understand what the function is doing.

There seems to be an implicit assumption that you should just trust the names of the functions and not look at their implementations. This is hopelessly naive. In the real world, functions don't always have obvious names, their implementations can involve subtleties that 'leak' into usage, and so on.

I strongly dislike function extraction that's driven by anything except the need to reuse the code block. Function extraction for readability is about as useful as leaving a comment above the code block. Honestly, I'm more likely to update the comment than the function name, since changing the function name means updating the callers as well.

Edit: to add a bit of nuance, I think a 'primitive' vocabulary is what's essential here. The standard library of a language is primitive. The standard functions of a framework like Ruby on Rails are primitive.

I'm happy to see code written by calling functions/types/operations that I have seen before dozens or hundreds of times. What I don't like is when a feature in a codebase has created its own deep stack of function calls composed out of functions that I know nothing about.

Create a rich base vocabulary that is as widely shared as possible and use that for your work. Avoid creating your own words as much as possible. This way I can glance at your code and not just see: if (BLACK_BOX_1 or WHAT_DO_I_DO_AM_I_LYING) then DO_SOMETHING_BUT_MAYBE_I_ALSO_DO_SOMETHING_ELSE_WHO_KNOWS.
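A minimal sketch of the contrast (Python here only as a neutral example; all names are hypothetical): the first version leans entirely on standard-library vocabulary a reader already knows, while the second hides the same logic behind bespoke helpers that each have to be chased down and verified before the call site can be trusted.

```python
# Version 1: only standard-library vocabulary -- readable at a glance.
def active_admin_emails(users):
    return sorted(u["email"] for u in users
                  if u["active"] and u["role"] == "admin")

# Version 2: the same logic behind bespoke helpers. Each name is a
# claim the reader must verify before trusting the caller.
def qualifies(u):
    return u["active"] and u["role"] == "admin"

def extract_contact(u):
    return u["email"]

def active_admin_emails_v2(users):
    return sorted(extract_contact(u) for u in users if qualifies(u))
```

Both compute the same thing; the difference is how many definitions you must read to convince yourself of that.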




If I have a function call to WHAT_DO_I_DO_AM_I_LYING, that's no better than a block of code with a comment that says "WHAT_DO_I_DO_AM_I_LYING". The difference is that once I've looked carefully at the code, found out what the called function does, and confirmed that the name (optimistically) isn't lying, the called function takes up only one line in the code I'm looking at, whereas the inline block takes up several lines (plus the line for the comment). For me at least, the function call takes less mental space (if the function name is accurate).


It all depends on how often I'm interacting with an area of code. Is it library code I'm using all the time or some obscure private internal that I visit once a year at most?

If I'm only reading that code like ... one time ever, I'm never going to commit to memory all the dozens of random 2-line private functions someone broke their 100-line procedural function into. I'm going to be annoyed, mentally 'inline' it all into one linear block of code, and then forget all about it.

Therefore I'll never remember or trust the private function names and will never know what the module does at a glance. I would be better off with it just inlined into one big procedure in the first place. Don't pollute my mental function cache.

The more some code gets touched and read and updated the more I can justify adding an internal vocabulary to it. Again, something like Ruby on Rails has a huge API surface area compared to plain Ruby, but that's fine, because after using it for a decade I know most of it and it was worth learning the vocabulary since I'm constantly using it.

It's like... commonly reused library or std lib or framework code is useful jargon used and reused by a community of people. But that pattern doesn't downscale. Applying it and creating my own jargon for an obscure module in a codebase is a bit like someone inventing slang all on their own and whining when people ask them to explain what they mean in plain English. I don't want to read Finnegans Wake for my job, just give me some boring realist prose suitable for a 5th-grade reading level.


I think the ideal here, though I haven't seen it in actual production code - it may not be feasible/ergonomic yet - is to use a sufficiently capable type system and property-based testing, so that it is possible to read from those alone what functions do, even if the names and documentation are replaced with random strings.

In such a case the probability a function does something other than what it says on the tin is almost 0. If I am using Forth, the probability of WHAT_DO_I_DO_AM_I_LYING doing something naughty is much higher. In Haskell, it's already pretty unlikely.
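As a hedged sketch of that idea (the function name and its implementation are invented for illustration, and the properties are hand-rolled in Python rather than using a property-testing library): a handful of checked properties can pin down what a function does even when its name tells you nothing.

```python
import random

def WHAT_DO_I_DO_AM_I_LYING(xs):
    # Suppose this opaquely named function actually just reverses a list.
    return xs[::-1]

# Properties that characterize "reverse" regardless of the name:
random.seed(0)
for _ in range(100):
    xs = [random.randint(-50, 50) for _ in range(random.randint(0, 10))]
    out = WHAT_DO_I_DO_AM_I_LYING(xs)
    assert sorted(out) == sorted(xs)           # output is a permutation of the input
    assert WHAT_DO_I_DO_AM_I_LYING(out) == xs  # applying it twice is the identity
    if xs:
        assert out[0] == xs[-1]                # the last element comes first
```

If those properties hold, the probability of the function doing something naughty behind its name drops sharply, which is the Haskell-vs-Forth point in miniature.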

Maybe preferences in this regard depend on language and peer trust?


Arguably, the ideal here would be to abandon our obsession with directly operating on the plaintext source code that is a single, canonical source of truth.

Whether the code is split into 100 little helper functions, or is just one long block of the same code inlined into a single function, or somewhere in between - those are view issues. They do not have - or at least should not have - any implication on semantics. It makes zero sense to try to commit to a single perfect balance, because at any given moment, I may want to have everything inlined, or everything extracted to functions, or everything inlined down to two levels deep in the call stack, etc. It all depends on the reason I'm looking at that piece of code in the first place.

This is one of those holy wars that can't be won, because the problem is the limitation of the medium. Similarly with increasingly arcane ways of lazily flatmapping monoids across sets of endofunctors, all to avoid "callback hell", hidden state, non-local control flow, incomplete type definitions, etc. - all at the same time. You just can't do that - the reason bleeding-edge PLs get increasingly weird with syntax and require a math degree to grok the ideas behind them is that they aren't improving things - they're just sliding along the Pareto frontier[0], hitting the boundary of plaintext capacity. There are only so many cross-cutting concerns you can express simultaneously in the same piece of text without making it impossible to understand. And the irony about cross-cutting concerns is, at any given time you want to ignore most of them.

What I mean is, say I'm reading your 100-helper-methods "clean code" function, trying to fix a tricky bug in the algorithm it's implementing. The helper methods are an annoyance, I want them all inlined (with properly interpolated parameter names - i.e. the exact inverse of "extract method"). Also, unless I have a reason to suspect the bug is related to error handling, I don't want to read any of your fancy Result<T, E> noise you use in lieu of exception handling. Hell, I don't want to see exceptions either - I want all error handling logic to disappear entirely (or be replaced with a bomb emoji, so I remember it's there). Same with logging/instrumentation, and a bunch of async calls and the whole "color of your function" bullshit. Not relevant to my problem, I don't care, I don't want to see this. Hell, I probably don't care about most types involved either.
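For illustration, a minimal Python sketch (all names hypothetical) of what "the exact inverse of extract method" means as a mechanical view transformation: the helper disappears and its parameter names are substituted with the caller's actual expressions.

```python
# View A: extracted helper, as the author wrote it.
def discounted_total(prices, rate):
    return sum(prices) * (1 - rate)

def checkout(cart):
    return discounted_total(cart["prices"], cart["discount"])

# View B: the same code after mechanically inlining the helper, with
# `prices` and `rate` replaced by the caller's expressions. Nothing
# about the semantics changes -- only the presentation.
def checkout_inlined(cart):
    return sum(cart["prices"]) * (1 - cart["discount"])
```

A view-based editor would let you flip between A and B on demand, instead of forcing the codebase to commit to one of them forever.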

Next week, I'm back to the same code, trying to improve some logging and fix the case where error information isn't propagated. Suddenly, I now want to have those 100 helper functions be their own named things. I want to see Result<T, E> - in fact, I probably don't want to see the T part. Logging? Yes. Async? Probably still no.

A month from now, the impossible happens, and I'm given time to optimize the performance of that same piece of code. Obviously, this calls for a different set of readability tradeoffs (and, more than anything, the ability to change them in flight).

So how about we stop wasting all our combined brainpower on arguing in circles over fake problems, which exist only because we insist on only ever working with the one, single, canonical plaintext representation of a program? It's literally the software equivalent of one-size-fits-all shoes - no matter which size you pick, the fit will be bad for almost everyone. Exceptions vs. Result<T, E>, or "lots of small functions" vs. "few big functions", etc. are all just different ways of asking which shoe size is the bestest shoe size - the problem isn't the shoe size, but that we have to pick a single size for everybody.

--

[0] - https://en.wikipedia.org/wiki/Pareto_front


I can't fault your diagnosis of the problem. Fixing it would be... ambitious, though.

But maybe it would be easier in some languages than others. Lisp, being essentially a naked syntax tree, might be amenable to this. And Go has automated formatting tools. Granted, they aren't used for this, but perhaps they could be extended to do so.

But doing this for something like C++? That seems like something that's going to take a long time.


Fixing this is not a language-level problem, it's a tooling-level problem (and, of course, a big philosophical/cultural problem).

For an incremental approach, we could treat C++, or Python, or Lisp, as they are today, as serialization formats. We can keep using the existing process of compiling source code into executables - what matters is that we stop writing that code directly. That doesn't mean writing no code at all, but rather breaking with the idea that what we write is exactly the text files that then get compiled into programs.

Instead, the way forward is to treat the source code as more of a database, an artifact we sculpt (and yes, eventually it would be beneficial to ditch the plaintext source-file representation too). This means the same underlying source code can be viewed and edited in many ways, textual and graphical, tailored to help with specific concerns or abstraction levels. So basically, take the modern Java/C# IDE, but instead of treating its advanced functionality as something that saves you some typing, embrace it as the main way of looking at and modifying code. Then turn it up to 11.

There's a precedent to that, too. It's not Lisp though - it's Smalltalk.



Ah, that's an interesting perspective.

I'll postulate that a core goal, specifically with regard to optimal abstraction/inlining, is ensuring that the functional dependency structure mirrors the reader's internal conceptual representation of the problem domain as closely as possible. I consider this property part of legibility, and more important than readability. What it sounds like you are saying is that different tasks involve different conceptual representations and thus need different views, and I find that reasonable.

I'm not convinced this is possible right now, however - if at all. The crux of the problem is that any extraction supporting more than the abstractions you already implemented would run into obstacles. I suspect automatic abstraction would be a hard problem. And the function-naming problem would re-emerge: you could give functions wrong names, and so could your system. In some cases there will not even be such a thing as an acceptable name, unless you assign random strings, in which case you'll have to check the abstracted code anyway to see what it does. And type signatures/property-based tests can't save you here, because the system would have to determine those. So automatically deriving different views of the code may not be feasible.

Maintaining multiple views of manually implemented function hierarchies over the same data also sounds even more prone to name/documentation drift than the conventional approach.

My point here is that for extraction to work, you have to have already implemented the code you are working on in terms of all the helper functions, and you'd still have to deal with your choice of abstractions and names, description drift, and so on.

Your concrete suggestions seem to be: use an IDE that can inline and fold code, use a binary file format for code, use auto-formatting tools, and give your language multiple ways to do each thing so the IDE can reformat a block of code in a different style to change how it is presented. The only new feature may be using an ML model to tag code according to purpose, so you can hide code by tag or hide all code not matching a tag.

While I agree those could be useful, we'd still end up with a canonical lexical function dependency hierarchy to obsess over. The problem of inlining vs abstracting still comes up there. Now abandoning that would be something, but I don't know how to do it.


I basically 100% agree with this and this is the editor env I wish I could live in.

I think the debate still exists because editors/langs don't work like this and you end up having to compromise one way or another. Obviously, if we could do really fancy code folding / inlining / condensing based on arbitrary user defined filters everyone would just write whatever they wanted.

I guess code review might be the only place where people would still complain, depending on how that looked.



