Hacker News

The older I get, the more my code (mostly C++ and Python) has been moving towards mostly-functional, mostly-single static assignment (let assignments).

Lately, I've noticed a pattern emerging that I think John is referring to in the second part. The situation is that often a large function will be composed of many smaller, clearly separable steps that involve temporary, intermediate results. These are clear candidates to be broken out into smaller functions. But, a conflict arises from the fact that they would each only be invoked at exactly one location. So, moving the tiny bits of code away from their only invocation point has mixed results on the readability of the larger function. It becomes more readable because it is composed of only short, descriptive function names, but less readable because deeper understanding of the intermediate steps requires disjointly bouncing around the code looking for the internals of the smaller functions.

The compromise I have often found is to reformat the intermediate steps in the form of control blocks that resemble function definitions. The pseudocode below is not a great example because, to keep it brief, the control flow is so simple that it could have been just a chain of method calls on anonymous return values.

    AwesomenessT largerFunction(Foo1 foo1, Foo2 foo2)
    {
        // state the purpose of step1
        ResultT1 result1; // inline ResultT1 step1(Foo1 foo)
        {
            Bar bar = barFromFoo1(foo1);
            Baz baz = bar.makeBaz();
            result1 = baz.awesome(); // return baz.awesome();
        }  // bar and baz no longer require consideration

        // state the purpose of step2
        ResultT2 result2; // inline ResultT2 step2(Foo2 foo)
        {
            Bar bar = barFromFoo2(foo2); // second bar's lifetime does not overlap with the 1st
            result2 = bar.awesome(); // return bar.awesome();
        }

        return result1.howAwesome(result2);
    }
I make a point to call out that the temp objects are scope-blocked to the minimum necessary lifetimes primarily because doing so reduces the amount of mental register space required for my brain to understand the larger function. When I see that the first bar and baz go out of existence just a few lines after they come into existence, I know I can discard them from short-term memory when parsing the rest of the function. I don't get confused by the second bar. And I don't have to check the correctness of the whole function with regard to each intermediate value.



What if I want to test some part of the function in isolation? At my current job I have to maintain a huge and old ASP.NET project that is full of these "god-functions". They're written in the style that Carmack describes, and I have methods that span more than 1k lines of code. Instead of breaking the function down to many smaller functions, they instead chose this inline approach and actually now we are at the point where we have battle-tested logic scattered across all of these huge functions but we need to use bits and pieces of them in the development of the new product.

Now I have to spend days and possibly weeks refactoring dozens of functions and breaking them apart into manageable services so we can not only use them, but also extend and test them.

I'm afraid what Carmack was talking about was meant to be taken with a grain of salt and not applied as a "General Rule" but people will anyway after reading it.


Perhaps it suggests our way of testing needs to change? A while back I wrote a post describing some experiences using white-box rather than black-box testing: http://web.archive.org/web/20140404001537/http://akkartik.na... [1]. Rather than call a function with some inputs and check the output, I'd call a function and check the log it emitted. The advantage I discovered was that it let me write fine-grained unit tests without having to make lots of different fine-grained function calls in my tests (they could all call the same top-level function), making the code easier to radically refactor. No need to change a bunch of tests every time I modify a function's signature.

This approach of raising the bar for introducing functions might do well with my "trace tests". I'm going to try it.

[1] Sorry, I've temporarily turned off my site while we wait for clarity on shellshock.


Something to consider, and this is only coming off the top of my head, is introducing test points that hook into a singleton.

You're getting more coupling to a codebase-wide object then, which goes against some principles, but it allows testing by doing things like

function awesomeStuff(almostAwesome) {

  MoreAwesomeT f1(somethingAlmostAwesome) {
    TestSingleton.emit(somethingAlmostAwesome);
    var thing = makeMoreAwesome(somethingAlmostAwesome) 
      // makeMoreAwesome is actually 13 lines of code,
      // not a single function
    TestSingleton.emit(thing);
    return thing;
  };

  AwesomeResult f2(almostAwesomeThing) {
    TestSingleton.emit(almostAwesomeThing);
    var at = makeAwesome(almostAwesomeThing);
      // this is another 8 lines of code. 
      // It takes 21 lines of code to make something
      // almostAwesome into something Awesome,
      // and another 4 lines to test it.
      // then some tests in a testing framework
      // to verify that the emissions are what we expect.
    TestSingleton.emit(at);
    return at;
  }

  return f2(f1(almostAwesome));
}

In production, you could drop the TestSingleton. In dev, have it test everything as a unit test. In QA, have it log everything. Everything outside of TestSingleton could be mocked and stubbed in the same way, providing control over the boundaries of the unit in the same way we're using now.


How brittle are those tests though?

I've had to change an implementation that was tested with the moral equivalent of log statements, and it was pretty miserable. The tests were strongly tied to implementation details. When I preserved the real semantics of the function as far as the outside system cared, the tests broke and it was hard to understand why. Obviously, when you break a test, you really need to be sure that the test was kind of wrong, and this was pretty burdensome.


I tried to address that in the post, but it's easy to miss and not very clear:

"..trace tests should verify domain-specific knowledge rather than implementation details.."

--

More generally, I would argue that there's always a tension in designing tests: you have to make them brittle to something. When we write lots of unit tests they're brittle to the precise function boundaries we happen to decompose the program into. As a result we tend to not move the boundaries around too much once our programs are written, rationalizing that they're not implementation details. My goal was explicitly to make it easy to reorganize the code, because in my experience no large codebase has ever gotten the boundaries right on the first try.


I've dealt with similar situations, and it was what led me to favor the many-small-functions myself. I like this article because by going into the details that convinced him, John Carmack explains when to take his advice, not just what to take his advice on.

I think maybe the answer is that you want to do the development all piecemeal, so you can test each individual bit in isolation, and /then/ inline everything...

That sound like it might be effective?


I'm not sure. If you then go ahead and inline the code after, your unit tests will be worthless. I mean, it could work if you are writing a product that will be delivered and never need to be modified significantly again (how often does that happen?). Then one of us has to go and undo the in-lining and reproduce the work :)


I think I'm going to say that, if it's appropriately and rigorously tested during development... testing the god-functionality of it should be OK.

Current experience indicates however that such end-product testing gives you no real advantage to finding out where the problem is occurring, since yeah, you can only test the whole thing at once.

But the sort-of shape in my head is that the god-function is only hard to test (after development) if it is insufficiently functional; aka, if there's too much state manipulation inside of it.

Edit: Ah, hmm, I think my statements are still useful, but yeah, they really don't help with the problem of TDD / subsequent development.


> Current experience indicates however that such end-product testing gives you no real advantage to finding out where the problem is occurring, since yeah, you can only test the whole thing at once.

I’m not so sure. I’ve worked on some projects with that kind of test strategy and been impressed by how well it can work in practice.

This is partly because introducing a bug rarely breaks only one test. Usually, it breaks a set of related tests all at once, and often you can quickly identify the common factor.

The results don’t conveniently point you to the exact function that is broken, which is a disadvantage over lower level unit tests. However, I found that was not as significant a problem in reality as it might appear to be, for two reasons.

Firstly, the next thing you’re going to do is probably to use source control to check what changed recently in the area you’ve identified. Surprisingly often that immediately reveals the exact code that introduced a regression.

Secondly, but not unrelated in practice, high level functional testing doesn’t require you to adapt your coding style to accommodate testing as much as low level unit testing does. When your code is organised around doing its job and you aren’t forced to keep everything very loosely coupled just to support testing, it can be easier to walk through it (possibly in a debugger running a test that you know fails) to explore the problem.


> I'm not sure. If you then go ahead and inline the code after, your unit tests will be worthless.

Local function bindings declared inline perhaps? It seems to me you could test at that border.


Could this not be achieved in an IDE with an "inline mode"? It could display function calls as inline code and give the advantages of both.


If it's done strictly in the style that I've shown above then refactoring the blocks into separate functions should be a matter of "cut, paste, add function boilerplate". The only tricky part is reconstructing the function parameters. That's one of the reasons I like this style. The inline blocks often do get factored out later. So, setting them up to be easy to extract is a guilt-free way of putting off extracting them until it really is clearly necessary.

But, it sounds like what you are dealing with is not inline blocks of separable functionality. Sounds like a bunch of good-old, giant, messy functions.


I think the claim is that if you don't start out writing the functions you don't start out writing the tests, and so your tests are doomed to fall behind right from the outset.

I'm not fanatical about TDD, but in my experience the trajectory of a design changes hugely based on whether or not it had tests from the start.

(I loved your comment above. Just adding some food for my own thought.)


"I'm not fanatical about TDD, but in my experience the trajectory of a design changes hugely based on whether or not it had tests from the start."

I'm still not sold on the benefits of fine grained unit tests as compared to having more, and better, functional tests.

If the OP's 1k+ line methods had a few hundred functional tests then it should be a fairly simple matter to refactor.

In "the old days" when I wrote code from a functional spec the spec had a list of functional tests. It was usually pretty straightforward to take that list and automate it.


Yeah, that's fair. The benefits of unit tests for me were always that they forced me to decompose the problem into testable/side-effect-free functions. But this thread is about questioning the value of that in the first place.

Just so long as the outer function is testable and side-effect-free.


Say you have a system with components A and B. Functional tests let you have confidence that A works fine with B. The day you need to ensure A works with C, this confidence flies out of the window, because it's perfectly possible that functional tests pass solely because of a bug in B. It's not such a big issue if the surface of A and C is small, but writing comprehensive functional tests for a large, complex system can be daunting.


The intro to the post has Carmack saying he now prefers to write code in a more functional style. That's exactly the side-effect-free paradigm you're looking for.


Even most of the older post is focused on side-effecting functions. His main concern with the previous approach is that functions relied on outside-the-function context (global or quasi-global state is extremely common in game engines), and a huge source of bugs was that they would be called in a slightly different context than they expected. When functions depend so brittly on reading or even mutating outside state, I can see the advantage to the inline approach, where it's very carefully tracked what is done in which sequence, and what data it reads/changes, instead of allowing any of that to happen in a more "hidden" way in nests of function-call chains. If a function is pure, on the other hand, this kind of thing isn't a concern.


> They're written in the style that Carmack describes, and I have methods that span more than 1k lines of code.

I don't think that's the kind of "inlining" being discussed -- to me that's the sign of a program that was transferred from BASIC or COBOL into a more modern language, but without any refactoring or even a grasp of its operation.

I think the similarity between inlining for speed, and inlining to avoid thinking very hard, is more a qualitative than a quantitative distinction.


"I think the similarity between inlining for speed, and inlining to avoid thinking very hard, is more a qualitative than a quantitative distinction."

I think what's being discussed here is quite either of those - this seems to be "inlining for visibility" and possibly "inlining for simplicity".


Is not quite either of those.


Have you seriously never written a 1000 line routine in C from scratch?


Sure, before I knew how to write maintainable code. Before I cared to understand my own code months later.

My first best-seller was Apple Writer (1979) (http://en.wikipedia.org/wiki/Apple_Writer), written in assembly language. Even then I tried to create structure and namespaces where none existed, with modest success.


Another great comeback for the annals of HN (like https://news.ycombinator.com/item?id=35083)


Maybe you should just be testing the 1k functions, if even that, and not the individual steps they take. The usefulness of a test decreases as the part being tested gets smaller, because errors propagate upward. An error in add() is going to affect the overall results, so testing add() is redundant with testing the overall results, and you are just doing busywork making tests for it.


I often question the wisdom of breaking things down into micro functions. Usually when I'm delving into a Ruby code base where they have taken DRY to the extreme (I'm looking at you, Chef). There is so much indirection occurring in the most basic of operations that it becomes a huge PITA trying to get your head around what's happening. An IDE that could interpose the source at the method call could be handy in such situations... I also feel that people conflate duplicating code with duplicating behaviour, which is what DRY is really about.


> I also feel that people conflate duplicating code with duplicating behaviour, which is what DRY is really about.

Indeed. The definition of the DRY principle [1] is:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

and not:

Don't type the same characters into the keyboard multiple times.

People often forget that.

[1]: http://c2.com/cgi/wiki?DontRepeatYourself


You could have the best of both worlds with IDE support; once it's there, people may stop adding so many unnecessary layers just so that functions get "inlined by the IDE".


Look at the Java or .Net worlds as a cautionary example: powerful IDEs are useful for the code you have now but they also enable people to write even more labyrinthine code and the complexity fiends are usually more aggressive at pushing those limits.


Unfortunately, this conflicts with the (common, good) advice to make as many variables final / const as possible, since it separates the declaration of a variable and its assignment.

One nice thing that falls out of Scala's syntax is that it makes this style possible without using mutation:

    val result1 = {
      val bar = barFromFoo1(foo)
      // ...
      baz.awesome()
    }

    val result2 = {
      val bar = ... // unrelated to 'bar' above
      bar.awesome()
    }
All names introduced in the block go out of scope at the end of the block, but the 'value' of the block is just the value of the last line, so you can assign the whole thing to a final / constant variable. This style has all the advantages listed above, and makes it easier to avoid mutation and having uninitialized variables in scope -- I wish more languages had it.


> Unfortunately, this conflicts with the (common, good) advice to make as many variables final / const as possible, since it separates the declaration of a variable and its assignment.

In Java you can separate the declaration and assignment of a final variable as long as every branch provably assigns to the variable. For example:

  final int x;
  if(someCondition) {
      final int a = 1;
      final int b = 2;
      x = a + b;
  } else {
      x = 0;
  }


I did not know this; thanks! I think I still prefer Scala's structural style, but I'll definitely have good use for this when I wander back into the Java universe.


+1 for this. I'm still pretty new to Scala, but coming from my day job (Java), I'm finding more and more reasons to love it.

As a side note, I'd love to see Scala make its way into game development. I've been messing around trying to get libgdx working with it. But I would love something that lets me take full advantage of the Scala language.


I would enjoy reading more comments from people who have been programming for a very long time about how their coding style has changed.


I don't have much to say that'd fit in a comment, but for science here are two of my programs in C doing similar tasks, one from the late 1980s, one from 2008: https://github.com/darius/req https://github.com/darius/vicissicalc


Lines 457-470 of vicissicalc.c: why do you use else if here rather than switch-case?


Might have been because of the `else if (ch == 'q') break;` line. If he used a switch statement he would have needed to use a goto to break out of the loop.


That's a reasonable guess, but a return would work there. I think I did it this way because all the breaks you need in a switch are noisy -- too noisy if you'd like to write one action per line. However, you can mute the noise by lining it up:

        switch (getchar ()) {
            break; case ' ': enter_text ();
            break; case 'f': view = formulas;
            break; case 'h': col = (col == 0 ? 0 : col-1);
which also makes oops-I-forgot-the-break hard to miss. I hadn't thought of that pattern yet. (You could define a macro for "break; case" too; my friend Kragen calls that IF.)

But I mostly stopped coding in C after around this time.


I thought you were the one who suggested the IF and ELSE macros in http://canonical.org/~kragen/sw/aspmisc/actor.c. :)

Interestingly, in http://canonical.org/~kragen/sw/dev3/paperalgo, I haven't yet run into the desire to have more than one `case` in a pattern-matching `switch`. I just added that piece of code from Vicissicalc to the paperalgo page.


The first break is ignored?


Not exactly. But it does create a no-op default. I've never seen/used this pattern, so I would have to go compile this down into assembly and play with it to give you a more complete answer.


Dropped by dead-code elimination. A compiler might conceivably issue a warning that the first break is unreachable, though that's never happened so far.


Thank you very much!


Seconded! Also I would like to see comments on how programming styles in general have changed over the years - does an '80s-era high-quality program still look like a 2010's high-quality program once you factor out the syntactic sugar?


I find it fascinating that larger traditional languages have changed little over time, while the languages of front-end development seem to change daily.

My coding style seemingly morphs every few months now. It's sad to think that stuff I wrote even three years ago is something I would never show someone interviewing me or put in a portfolio nowadays.


If you don't hate the code you wrote six months ago you're stagnating.


This should only be true during the early days of your career. If you're out of the novice stage, you shouldn't be writing hateful code – maybe there's a new library to use or something you now understand about the problem but that's hardly the level of hatred.


For this purpose, it might be worth checking out some classic books such as "Software Tools" and "Project Oberon".


Speaking specifically about C++, lambdas are good for this, if not quite syntactically ideal:

    AwesomenessT largerFunction(Foo1 foo1, Foo2 foo2)
    {
        ResultT1 result1 = [&] {
            Bar bar = barFromFoo1(foo1);
            Baz baz = bar.makeBaz();
            return baz.awesome();
        } ();

        ResultT2 result2 = [&] {
            Bar bar = barFromFoo2(foo2);
            return bar.awesome();
        } ();

        return result1.howAwesome(result2);
    }
Bonus: you can initialize 'const' variables with multiple statements:

    const auto values = [&] {
        std::vector <int> v (n);
        std::iota (begin (v), end (v), 0);
        std::shuffle (begin (v), end (v), std::mt19937 {seed});
        return v;
    } ();


I've been taking this tack more and more as well and while the syntax is never elegant, one at least grows used to it.

I also try to explicitly name which variables I'm capturing (within reason) as it makes it obvious at a glance what can and can't be modified within the lambda. I really wish it was possible to force constness on captured variables :/


I've experimented with this, too. One thing that I also like is that you can have multiple returns within the lambdas and know that the control flow paths will merge again at a common point. The compiler can also make sure that each one returns a value of a compatible type.


I've found myself doing this a lot. Glad to know I'm not the only one. Functions with a single call site don't need to be cluttering up a larger namespace; limited scope is also good for the internal variables.


Why the hell would you want to do that? There's no benefit and only drawbacks over the plain old block syntax.


> Why the hell would you want to do that?

To isolate all of the initialization logic for a single object (e.g., "result1" or "result2") into its own scope. You could use a function (in fact, that's exactly what's being done here—it's just an anonymous function), but moving that logic away from its use and into global scope generally just makes the code more difficult to understand.

> There's no benefit and only drawbacks over the plain old block syntax.

You cannot initialize a const object with the "plain old block syntax." With this, you can, and that is an enormous benefit. It's also much easier to see that the object is initialized; it's not immediately obvious that an uninitialized variable declaration followed by an assignment many lines later will always initialize the variable, but if you initialize it in its declaration, you know it will necessarily be initialized. That is also a very important benefit.

What are the drawbacks? That the syntax is a little uglier? It's not ideal, but it's hardly huge syntactic overhead: four or five non-whitespace characters and a return instead of an assignment. In fact, the syntax is markedly better if you have multiple initialization paths, because you can use return instead of goto; compare

    char* foo;
    {
        char* mem = (char*) malloc (N);
        if (!mem) {
            mem = NULL;
            goto foo_init;
        }
        if (!fgets (mem, N, fd)) {
            free (mem);
            mem = NULL;
            goto foo_init;
        }
    foo_init:
        foo = mem;
    }
    
    the_next_thing:
to

    char* foo = [&] {
        char* mem = (char*) malloc (N);
        if (!mem)
            return NULL;
        if (!fgets (mem, N, fd)) {
            free (mem);
            return NULL;
        }
        return mem;
    } ();
    
    the_next_thing:
Even after the "plain old block" version has been contorted to make assignment to foo unconditional (it's essentially a manually inlined function), it isn't obvious that it's correct. Does the block do "goto the_next_thing" or something similarly hairy? Read the whole thing to find out.

In contrast, once you've seen that foo is initialized by a function and that function ends in an unconditional return, you know that foo cannot not be initialized at the_next_thing, period (even if you throw exceptions into the mix!).

You might argue that you can verify the correctness of the first version, and of course you can, just as you can verify the correctness of an HTML parser written in brainfuck. It's all a matter of cognitive load, of the amount of context you have to hold in your head at one time to reason about the correctness of one part of your code. It's the difference between verifying that the function initializing foo has an unconditional return and verifying that every control path in the initialization block reaches foo_init, when what you're really trying to do is just to show that foo will point to a valid string at "the_next_thing."

In fact, that's basically the primary motivation behind functions, next to code reuse. All this pattern is really doing is modularization-by-function of initialization logic without the cognitive overhead induced by lexically separating the logic's definition from its use. In doing so, it delivers the same advantages as the plain-initialization-block pattern, but preserves the comparatively simple control-flow-barrier semantics of functions.

Compared to goto-spaghetti, typing "[&]" and "();" really isn't that big of a deal.


I'm new to C++ so please forgive my ignorance but where is howAwesome() defined in result1's lambda?


It isn't. It's a part of the type "ResultT1."


Those lambdas are immediately applied, so it's defined in the type returned by them.


He mentions that at the end of the article:

> Some practical Matters --- Using large comment blocks inside the major function to delimit the minor functions is a good idea for quick scanning, and often enclosing it in a bare braced section to scope the local variables and allow editor collapsing of the section is useful. I know there are some rules of thumb about not making functions larger than a page or two, but I specifically disagree with that now -- if a lot of operations are supposed to happen in a sequential fashion, their code should follow sequentially.


"if a lot of operations are supposed to happen in a sequential fashion, their code should follow sequentially."

He is absolutely correct in this. However, he's wrong with regards to the level of abstraction. Those "operations" should be part of functions that could be scattered all over the code base in whatever order they were written. But at the end of the day, they will be called sequentially right next to each other.

I've often found this to be the case. The developers I see that make these "god" functions are unable to write and compose their code in layers. They instead see the entire run (start->finish) of their programs as one giant series of "sequential" "operations". So what ends up happening is they've got high-level code, interspersed with low level io/networking/db calls.


I'll say "it depends."

At work, I have a 450 line function. What it does is very simple:

1) Validate the request; if it's invalid, send back an error.

2) Look up some information.

3) Send that information.

Step 1 is around 300 lines. But a majority of the lines are huge comment blocks describing what the check is for and referencing various RFCs. Yeah, I could have broken this part off into its own function, but ... there will only be one caller and doing so would be doing so for its own sake [1].

Step 2 is one line of code (it calls a function for the information).

Step 3 is the remainder; it's just constructing the response and sending it.

The code is straightforward. It's easy to follow. It's long because of the commentary.

[1] I did have to deal with a code base where the developer went way overboard with functions, most of which only have one caller. It's so bad that the main loop of the program is five calling levels deep. I once traced one particular code path down a further ten levels of function calls (through half a dozen files) before it did actual work.

When I confronted the developer about that, he just simply stated, "the compiler will inline all that anyway." And yes, the compiler we're using can do inlining among multiple object files, which makes it a real pain to debug when all you have is a core file.


> went way overboard with functions, most of which only have one caller. It's so bad that the main loop of the program is five calling levels deep

It almost sounds like you're talking about http://canonical.org/~kragen/sw/dev3/server.s! Or for that matter http://canonical.org/~kragen/sw/aspmisc/my-very-first-raytra...! But it's unlikely you've had to maintain either of them. I intentionally broke things up like that with the idea that it would make it easier to understand. In fact, I originally wrote httpdito as straight-line code and only later factored it into macros to make it easier to understand.

It sounds like you're saying that this was misguided and I should have just used nesting. I'll have to think about that for a while.


In reference to your 450 line function: what you just explained to me are three different sections of some algorithm that you have. One of the sections you already have as a separate function, but the other two are quite large. Imagine you've just found this 450 line function while debugging. How would you know whether you need to go through each line to find the problem, or whether you can step over the 300/1/150-line chunks of it?

There is benefit in being able to group the chunks, and maybe your comments are quite obvious in the grouping. But if I know I'm looking for a "validation" bug, and I come across a grouping of three functions which are called, say, "PrepareRFCRequest", "GetResponseInformation" and "ConstructResponse" then I can very easily deduce that they're not related to my problem and ignore them. You could say they're not my concern.

A lot of these things I talk about build upon each other. i.e. If you then go ahead and put some sort of input validation inside of your "GetResponseInformation" function, then the grouping / function abstraction is pretty much useless and may even be detrimental when debugging, e.g. in my example above.

"there will only be one caller and doing so would be doing so for its own sake" No, there is benefit to putting it in its own function, even if there is only one caller, because creating functions isn't always about reducing duplication; it's about separation of concerns/abstraction and, to me above all, composability.

"[1] I did have to deal with a code base where the developer went way overboard with functions, most of which only have one caller. It's so bad that the main loop of the program is five calling levels deep. I once traced one particular code path down a further ten levels of function calls (through half a dozen files) before it did actual work."

Perhaps you, I, and the developer you speak of have different views of what the "main body/loop" of the program is, or of programs in general.


It's far easier to do a brain dump and write everything in sequence upfront as a proof-of-concept. That should be in the very early stages only though.

Things like database reads/writes should be refactored out immediately. This is boring stuff that only challenges inexperienced developers. Most developers with some experience can grok the idea of a function named 'SaveToDB(stringthing)' or something like it. When not dealing with common operations, the key is repackaging your functions so that the meaning is conveyed through the function's name without the name being excessively long. Short function bodies also ensure that things remain quickly absorbable.
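A minimal Python sketch of what such a thin, well-named wrapper might look like (the function and table names are hypothetical, and sqlite3 just stands in for whatever database you actually use):

```python
import sqlite3

def save_to_db(thing, db_path="app.db"):
    """Persist one record; callers never see SQL or connection details."""
    with sqlite3.connect(db_path) as conn:  # the context manager commits for us
        conn.execute("CREATE TABLE IF NOT EXISTS things (value TEXT)")
        conn.execute("INSERT INTO things (value) VALUES (?)", (thing,))

def load_things(db_path="app.db"):
    """Read everything back; again, the name says it all at the call site."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute("SELECT value FROM things").fetchall()
    return [value for (value,) in rows]
```

The call site then reads as a single intention-revealing line, which is the point being made above.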

Taking if-blocks or loops and putting them into their own function just to shrink the size of the god function to pretend you're refactoring really serves no purpose though (IMO). This is especially true if the number of times those functions are called is less than 2. (believe me, I've seen it!)


> Things like database reads/writes should be refactored out immediately. This is boring stuff that only challenges inexperienced developers. Most developers with some experience can grok the idea of a function named 'SaveToDB(stringthing)' or something like it.

They should be refactored out, yes, but refactored out in a different component with little (ideally none) logic in it, not a different location of the higher-level component.

> Taking if-blocks or loops and putting them into their own function just to shrink the size of the god function to pretend you're refactoring really serves no purpose though (IMO). This is especially true if the number of times those functions are called is less than 2.

Of course it serves a purpose. Take a codebase where you have functions that go over 5 or 10 screens. Now, take a codebase which does the exact same thing, but where the god function is split into smaller and smaller functions depending on the complexity. The second codebase is much easier to read (as long as function names are well chosen). It also means that most, if not all, of the comments necessary to understand codebase 1 (which may or may not be present, and may or may not be up to date) can be removed. The number of call sites for a function does not matter. What matters is how small functions make the code easy to read.


"It's far easier to do a brain dump and write everything in sequence upfront as a proof-of-concept. That should be in the very early stages only though."

I guess I'm different to you in that regard. To me, my "brain dump" is a bunch of function skeletons that I write out at a high-level. E.g. if I know at some point I'll have to parse a file, I brain dump it as:

    result = parse_file(get_file_data())

At that point, I've already defined my "higher-level" brain dump (as you call it) without having to worry about stuff like file-format, filenames, checking for deleted files, etc.

"Taking if-blocks or loops and putting them into their own function just to shrink the size of the god function to pretend you're refactoring really serves no purpose though (IMO)"

Well, depends on the if-blocks I'd say. 99% of the time, I bet you that those blocks and loops can be grouped into logical pieces of work that happen in stages. Claiming that we do it just to "pretend" we're refactoring doesn't really add anything to the conversation.


Exactly, and he also suggests that in the email: "(...) and often enclosing it in a bare braced section to scope the local variables and allow editor collapsing of the section is useful".

I hadn't understood this "maybe leave functions inlined" rant a couple years ago when I first heard about it - it makes a lot of sense now.


This is what I do as well, for exactly the same reasons. (You saved me a post. Thanks!) It works pretty well, I think. As you say, it's nice to have the code just written out in the order in which it will be executed.

It can look a bit daunting at first sight - if you're not used to this style of code, it just looks like your average rambling stream-of-consciousness function - but it's actually pretty easy to keep on top of. And if people really complain, it's super-easy to split out into functions :)


I've been thinking about doing the same thing, but with actual closures/lambdas in more modern languages. Not sure if there's much of a point, though.


It's idiomatic in Rust, even:

   let result = {
       let b = foo(a);
       let mut c = b.see();
       while c {
         c.frob();
       }
       baz(c)
   };


Although the style described above will also work just fine:

    let result;
    {
        let b = foo(a);
        let mut c = b.see();
        while c {
          c.frob();
        }
        result = baz(c);
    }


The downside is that lambdas are usually a bit more complex to write/read than plain old scope blocks.

The very nice upside would be that you could make the inputs to the blocks explicit. In contrast, the fact that foo1 is an input to step1 and foo2 is an input to step2 can only be understood by careful examination.


I don't see how you'd use first-class functions (lambdas) in this situation. Maybe you meant nested functions? A lambda is a piece of code that is usually passed somewhere else - a callback, basically.

If you define a lambda instead of a block of code and call it right there, it could create considerable overhead: if it is a closure, it has to save its current scope somewhere. And that's really unnecessary if you call the function at the same place where you define it.


sjolsen did a good job of illustrating what I meant https://news.ycombinator.com/item?id=8375341 Improvements on his version would be to make everything const and the lambda inputs explicit.

    AwesomenessT largerFunction(Foo1 foo1, Foo2 foo2)
    {
        const ResultT1 result1 = [foo1] {
            const Bar bar = barFromFoo1(foo1);
            const Baz baz = bar.makeBaz();
            return baz.awesome();
        } ();

        const ResultT2 result2 = [foo2] {
            const Bar bar = barFromFoo2(foo2);
            return bar.awesome();
        } ();

        return result1.howAwesome(result2);
    }

It's my understanding that compilers are already surprisingly good at optimizing out local lambdas. I recall a demo from Herb Sutter where std::for_each(someLambda) was faster than a classic for(int i=0;i<100000;i++) loop with a trivial body, because for_each internally unrolled the loop and the lambda body was therefore inlined as unrolled.


In the languages that I'm using (JS, and now Swift), lambdas are only marginally more difficult to write than a pair of brackets. In fact, in Swift, you actually can't write a pair of brackets to designate scope, but you can put stuff in a lambda if you wish.


and C++11 btw ...


GCC has a (non-standard) extension where you can have nested functions:

https://gcc.gnu.org/onlinedocs/gcc/Nested-Functions.html

This allows you to actually write what you want. Nicer still, these internal functions are aware of surrounding context, so they're full closures, and thus you can take their address and pass it around as a callback without needing any void* cookie. I've been using these more and more.

The downsides are that this is a nonstandard gcc-only extension and that it's available in C only, not C++. Depending on what you're doing, these can be deal-breakers.


My brain has taken to interpreting "GCC extension" as "supported by GCC and Clang," but nested functions are one of the few cases where that isn't true,[1] which is worth noting for readers who are concerned about FreeBSD or OS X.

[1] http://clang.llvm.org/comparison.html#gcc


Fwiw there's no real problem with gcc-specific code on FreeBSD if you're writing application software. Kernel code and core utilities in the base install are supposed to be able to compile with clang, but stuff in ports can use anything that is itself available in ports. The ports system makes it easy to specify when something requires gcc to compile vs. any C compiler, and doing so is fairly common (heck, it's not a problem to require even more exotic things either, like Ocaml or gfortran).


This is why I prefer a language where everything is an expression, and I can nest things without limit. If I need a simple, but narrow-use helper function? Why, define it inside the function I need it in, and use it a handful of times (or once).


Doesn't need to be an expression language for that. Algol and Pascal had nested procedures and lexical scope but still had traditional imperative statements.


I meant I want both features. I want to be able to write "let x = if a then y else z", and I also want to nest things.

But if everything is an expression, then nesting seems to arise naturally.


Could you please explain why it'd be useful to do that for something you'd only use once, esp only within that function? I honestly don't see how that's useful or more clear. Some examples with explanations would probably help. Thanks :)

(Edited to fix minor typo)


So I opened up one of my files and found the first instance of this. It was a function to load some subset of data and emit some log info. It was only called once, at the end of a chain of maps and filters.

  let load_x xid = 
    // loading code

  // Later on...
  let data' = foo |> map ... |> filter ... |> ... |> map load_x

Once you view functions as values like anything else, then making single-use, "temporary" functions seems as normal as temporary or intermediate variables.


Just consider naming clarity. It's just like using i for an iterator variable -- if it were global or file-wide in scope it would be disastrous, but because it's local to the block it's simple and intuitive.

Similarly you can give a little helper function a short, obvious name knowing it won't have side-effects.
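For instance, a small Python sketch (names made up): the helper's short name is safe precisely because its scope ends with the enclosing function:

```python
def report(orders):
    # "total" is a tiny, side-effect-free helper; the short name is fine
    # because it is local to report(), like "i" in a loop body.
    def total(order):
        return sum(qty * price for qty, price in order["lines"])

    return [(o["id"], total(o)) for o in orders]
```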


I use literate programming for exactly this kind of tension. You can have blocks of code written elsewhere with a high-level overview for your function, but in the built program, you can see it all in one place the way it will be run.


I agree that dividing the largerFunction into smaller ones has a non-negligible cost, and that cost is never mentioned at school. However, I still think that the benefits provided by moving them out outweigh it.

One thing that I found beneficial is that by dividing the big functions into smaller ones, the resulting functions end up at the same abstraction level. To give an exaggerated example: I try to split my functions so that I never deal with Countries, Provinces, Cities, Buildings, People, Body Members, Organs & Cells in the same scope. I try to only deal with one of them per function (sometimes two, in parent-child cases).

I find that 1-abstraction-level functions are easier to understand, and I gladly "pay the price" of having extra one-use names around for this reason. I do try to restrict the scope of those "extra functions" as much as I can, put related functions together, and reduce side-effects as much as I humanly can.
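A toy Python sketch of the idea (all names hypothetical): each function speaks in terms of exactly one level of the hierarchy, plus its immediate children:

```python
from dataclasses import dataclass

@dataclass
class City:
    name: str

@dataclass
class Province:
    name: str
    cities: list

@dataclass
class Country:
    provinces: list

def describe_country(country):
    # Top level: deals only in provinces.
    return {p.name: describe_province(p) for p in country.provinces}

def describe_province(province):
    # One level down: a province only knows about its cities.
    return [city.name for city in province.cities]
```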


What non-negligible costs are you talking about?


The ones Carmack talks about. Several "concepts" (functions in this case) with implicit relationships (like what calls what, and in what order) are (sometimes) more difficult to process for a human brain than a huge block of sequential code.


Oh, ok. Thanks. I guess that's why we try to avoid unnecessary naming in Haskell, and if we can, just chain our functions like eg

   bigFunction = someFunction callBack . otherFunction arg1 callBack2


The ones Carmack talks about. Several "concepts" (functions in this case) with implicit relationships (like what calls what, and in what order) are (sometimes) more difficult to absorb than a huge block of sequential code.


Right. In my old age, I've actually come to appreciate some aspects of the C pre-99 insistence on declarations occurring at the start of blocks. I think on balance strict adherence is too much (especially the absence of initializers in for-loop headers), but as you say blocking off regions where things are used helps substantially.


Tempting, but extra levels of indentation also lead to poor readability. The other problem is that this style of programming isn't commonplace and will throw off new readers. Better to just stick to standard code, broken up by pseudocode-style comments, I think. Technically, the binding scope of variables is still the entire function, but in practice you can choose to limit the useful scope of a variable to the code block where it is defined. It doesn't take a master coder to notice quickly that the first variable at the top is the one to remember going down the lines. The only remaining problems, then, are that variable names become a little longer to avoid conflicts between the code blocks, and the compiler probably won't be able to optimize the function as well as you'd like, since some extra variables stick around for longer than needed.


I've found using a powerful language like Scala permits me to insert terse one liners, where in Java I might have created a function with a rich, English language name describing what the function does. Instead of a semantic description of the function, you see the function itself.


Some people like to solve this problem with 'goto's, as in the 'small' subfunction in https://golang.org/src/pkg/math/gamma.go

Personally I'd like to see something like Haskell's "where", to create a scope for private functions in the "B style":

    void largerFunction(void) {
        small1();
        small2();
    } where {
        void small1(void) {
            ...
        }
        void small2(void) {
            ...
        }
    }

Until that becomes possible, we can always just make sufficiently small modules with few public functions.


Can be achieved with lambdas or anonymous functions in most languages; c#, c++, javascript, python to name a few :)
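A small Python sketch of both flavors (names made up): a lambda covers the expression-sized case, and a named inner function called once covers multi-statement steps:

```python
a = 4

# A Python lambda is a single expression, so "define and call right there"
# only works for expression-sized steps:
step1 = (lambda x: x * x + 1)(a)  # evaluated immediately, then discarded

# For multi-statement steps, a named inner function called once reads better:
def step2(x):
    doubled = x * 2
    return doubled + 3

result = step2(step1)
```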


Sure, but then you'd have to put the declaration before the function call.


You aren't the only one. I do stuff like this all the time, although I tend to find those kind of code blocks eventually get broken out into separate functions anyway if you wait long enough.


The #region directive in c# is pretty awesome for this and I really wish every language had something similar. Sadly I'm not sure there's a great way to do it for e.g. ruby and obviously the situation would be even worse for homoiconic languages.



Man, he's lucky he's never had to deal with 12k lines in the main executable. Regions are a godsend there.


I used to use them, but now I view #regions as a massive code smell.


This is a nice approach and I have done it too. I appreciate people who care about their code.


IIUC that's the purpose of `where` in Haskell. It gives you function-style notation for the semantics, but without the function-call model.

One can read the high level code in small terms, and dive into details inside the where bound expression.


And of course, you can introduce functions in `where', too.


> mostly-single static assignment (let assignments).

Please don't call this programming style SSA, that's confusing it with the compiler IR pattern. "Immutable variables" is the popular jargon.


It's doing the same thing, except in this case for clarity/ease of debugging.


SSA should be thought of in terms of control flow. The immutable variables are a tool that makes it easier to reason about control flow and that flattens variable scoping. When you have nested lexical scope and immutable variables, that looks nothing like SSA. No one ever calls Java code that uses final SSA.


if certain functions are only called within one other function, couldn't the compiler automatically inline them to get rid of the method call overhead, as long as they're not externs?


His comments are on style, not performance. "In no way, shape, or form am I making a case that avoiding function calls alone directly helps performance." Although he does argue that this style indirectly leads to better performing code.


This really seems like something that should be solved by the IDE.

In VS13 the right-click -> Go to Definition interface is okay... but it definitely could be better


Light Table's code bubbles look pretty nice.

http://www.chris-granger.com/images/lightable/light-full.png


How do you do this in Python? Or is this just an approach you can take in C/C++?


You should be able to do something similar to the C example in most other languages with sane lexical variable scoping. Unfortunately, Python doesn't really play nice with nested scopes/functions, so you are a bit out of luck here.


All comprehensions get their own scope in Python 3.
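A quick illustration of that scoping change:

```python
x = "outer"
squares = [x * x for x in range(4)]

# In Python 3 the comprehension gets its own scope: its loop variable x
# does not leak, so the outer x is untouched (in Python 2 x would end up 3).
print(x)        # "outer"
print(squares)  # [0, 1, 4, 9]
```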


You could define local functions inside your larger sequential function, and then call immediately. Not 100% the same, but in terms of reading-code-in-order, it would have the effect, and it would create the same scope restrictions as the listed code.
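A minimal Python sketch of that suggestion, mirroring the pseudocode at the top of the thread (the string operations are just hypothetical stand-ins for real work):

```python
def larger_function(foo1, foo2):
    # state the purpose of step 1; bar and baz never escape this scope
    def step1(foo):
        bar = foo.upper()
        baz = bar + "!"
        return len(baz)
    result1 = step1(foo1)

    # state the purpose of step 2; an independent bar, no overlap with step 1
    def step2(foo):
        bar = foo * 2
        return len(bar)
    result2 = step2(foo2)

    return result1 + result2
```

As the parent says, it reads in execution order, and the intermediate names are confined to their steps.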


You can't do it in Python, but it's not limited to C/C++ either; this is usually called "block scoping" and it's present in C/C++, C#, I think Ruby, Java, etc.



