Applying the Linus Torvalds “Good Taste” Coding Requirement (medium.com/bartobri)
572 points by Jerry2 on Oct 26, 2016 | 289 comments



I'm reminded of a quote from Moore in "Thinking Forth":

"A lot of conditionals arise from fuzzy thinking about the problem. In servo-control theory, a lot of people think that the algorithm for the servo ought to be different when the distance is great than when it is close. Far away, you’re in slew mode; closer to the target you’re in decelerate mode; very close you’re in hunt mode. You have to test how far you are to know which algorithm to apply."

"I’ve worked out a non-linear servo-control algorithm that will handle full range. This approach eliminates the glitches at the transitioning points between one mode and the other. It eliminates the logic necessary to decide which algorithm to use. It eliminates your having to empirically determine the transition points. And of course, you have a much simpler program with one algorithm instead of three."

"Instead of trying to get rid of conditionals, you’re best to question the underlying theory that led to the conditionals."

That's part of a chapter of the book called Minimizing Control Structures. Forth guys are crazy about taste, and if I've learned anything from reading their stuff, it's that chasing tasteful programming to its end gets very hard.

OP is right on the money. The hard thing is that it is a creative process, and takes a real understanding of the problem you're solving to do it. Worst of all, aside from the feeling of solving a puzzle well, the benefits only begin appearing much later. I'm glad the kernel team takes it seriously.


Be it kernel programming or Forth, the culture shock can hit hard.

Most programming languages try their best to make things easier than easy, in order to maximize programmers' productivity. The focus on this parameter alone over-promotes the "worse is better" or "good enough" mindset.

That doesn't fare well with kernel programming. It's even worse with Forth. As soon as your "word" (function in Forth lingo) has to deal with more than two or three variables, you're in trouble. As soon as you have more than two levels of control-flow structures, nightmares begin.

If you're looking for programming practice then I recommend Forth. It may or may not be a "practical" language for you, but it makes you really think about complexity.


Forth was practical when it was invented though.


It's still practical: OpenBoot/OpenFirmware is what boots big IBM POWER servers. Not sure if Oracle still uses it but Sun did.

OpenBoot is basically like UEFI but less complex and written in a Forth dialect. It's quite cool.



> That's part of a chapter of the book called Minimizing Control Structures.

Since the book is creative commons and freely available, I decided to download the book and have a look. Figure 8.1, the "automatic teller" example is just downright hilarious. The author presents the code and throws a challenge at the reader: "Easy to read? Tell me under what condition the user’s card gets eaten." Here's the code:

  IF card is valid DO
    IF card owner is valid DO
      IF request withdrawal DO
        IF authorization code is valid DO
          query for amount
          IF request <= current balance DO
            IF withdrawal <= available cash DO
              vend currency
              debit account
            ELSE
              message
              terminate session
          ELSE
            message
            terminate session
        ELSE
          message
          terminate session
      ELSE
        IF authorization code is valid DO
          query for amount
          accept envelope through hatch
          credit account
        ELSE
          message
          terminate session
    ELSE
      eat card
  ELSE
    message
  END
(hopefully no transcription errors from copying out of the PDF…)

Yes, it's easy. It gets eaten if the owner is not valid. Took one glance.

The thing that makes me laugh is that the example is contrived. This is the sort of thing code editors solve. Even though I could quickly glance at it and decipher the conditionals by eye, if I were working with this I'd still throw it into an editor that shows me code scope. That makes it trivial and removes any chance of error. This is a problem that has been solved.

I'm sure I will get a lot of responses saying I've missed the point, and yes I'm aware that I've dodged it, but the author acts like the problem of navigating nested conditionals is somehow impossible, which I think is ridiculous. Not only do lots of people do it every day, not only are there tools that help us do so, but there's pretty much no way to exist in this world without depending on software that is written this way (ever looked at the source for GCC, to pick one example?). Not saying it's right, but it's clearly not a deadly problem.


Taste is a subjective thing and is often hard to articulate... I think my biggest complaint about that code is that I had to read all of it to make sure there weren't other cases where the card gets eaten, and to try to follow what was going on.

Taste-wise, an easy fix for this is to do early exits. Instead of nesting, you could easily do (I'm on mobile so no fancy formatting):

If card is not valid: return message

If user is not valid: eat card; return

Etc

Now that's my personal taste. It makes it way simpler to quickly skim and understand what's going on.
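
Spelled out in a C-flavored sketch (every helper name below is an invented stand-in for a step in the book's pseudocode, not anything from the book), the guard-clause version would read something like:

  void session(void)
  {
      if (!card_is_valid())      { message(); return; }
      if (!owner_is_valid())     { eat_card(); return; }
      if (!auth_code_is_valid()) { message(); return; }   /* both paths checked this */

      if (!request_is_withdrawal()) {                     /* deposit path */
          query_for_amount();
          accept_envelope_through_hatch();
          credit_account();
          return;
      }

      int amount = query_for_amount();
      if (amount > current_balance()) { message(); return; }
      if (amount > available_cash())  { message(); return; }
      vend_currency();
      debit_account();
  }

Every failure case is now a single guard line, and "eat card" is visible at a glance.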


> Yes, it's easy. It gets eaten if the owner is not valid. Took one glance.

Nit-pick, but it seems that the card only gets eaten if the card is valid AND the owner is not valid. If the card isn't valid, the owner is never checked and the card isn't eaten.


Just the raw formatting of that could be improved, by cuddling up ELSE IF from separate lines.

Aren't there some END tokens missing? Let me put them in:

  IF card is valid DO
    IF card owner is valid DO
      IF request withdrawal DO
        IF authorization code is valid DO
          query for amount
          IF request <= current balance DO
            IF withdrawal <= available cash DO
              vend currency
              debit account
            ELSE
              message
              terminate session
            END
          ELSE
            message
            terminate session
          END
        ELSE
          message
          terminate session
        END
      ELSE
        IF authorization code is valid DO
          query for amount
          accept envelope through hatch
          credit account
        ELSE
          message
          terminate session
        END
      END
    ELSE
      eat card
    END
  ELSE
    message
  END
Okay, now:

  IF card is valid DO
    IF card owner is valid DO
      IF request withdrawal DO
        IF authorization code is valid DO
          query for amount
          IF request <= current balance DO
            IF withdrawal <= available cash DO
              vend currency
              debit account
            ELSE
              message
              terminate session
            END
          ELSE
            message
            terminate session
          END
        ELSE
          message
          terminate session
        END
      ELSE IF authorization code is valid DO
        query for amount
        accept envelope through hatch
        credit account
      ELSE
        message
        terminate session
      END
    ELSE
      eat card
    END
  ELSE
    message
  END
Okay, just one ELSE IF; not much opportunity for that.


On that topic, I did a somewhat weird thing in C yesterday. I took code like this:

  for (;;) {
    /* original big loop */
  }
and turned it into this:

  if (compatibility_with_old_version) for(;;) {
    /* original big loop: same indentation level! */
  } else for (;;) {
    /* rewritten new loop */
  }
 
I.e.

  if (...) for (...) {

  } else for (...) {

  }
Works for me.


I dunno, the first example seems unsatisfying. The original has that ugly condition, the "good" version seems overly clever. And for all the talk of taste and aesthetics, both versions ignore an elephant in the room: defensively dealing with `entry` being absent from the list.

Not that I've never abused addresses like this. But having written this multiple times, I currently prefer something like this:

  remove_list_entry(entry)
  {
      if (head == entry) {
          head = head->next;
          return;
      }

      for (prev = head;  prev->next;  prev = prev->next) {
          if (prev->next == entry) {
              prev->next = prev->next->next;
              return;
          }
      }
  }
There's still an `if`, but it isn't so bad since it's an early exit. What is "tasteful" about hiding the fact that one of the cases is far simpler than the other? There's conditionals and then there's conditionals. There's cases and then there's cases.

The other benefit of this approach: you eliminate segfaults by construction, because the `while` has been replaced with a much more normal `for` loop iterating through the list. Keeping it normal is 80% of the secret to avoiding security holes.

Rather than appealing to something nebulous like "taste", I prefer to focus on concrete functional benefits. What is a situation where someone using or modifying this function might be subtly led astray? That seems a better guide.

(Earlier version posted on reddit: https://www.reddit.com/r/programming/comments/59cq8r/applyin... )


I don't think anyone's mentioned that there are two functional benefits to Linus's version: testability and performance.

The author touched on testability. Dropping the "if" case eliminates one code path that may contain bugs.

But also, the revised code may perform a lot better if you're searching a lot of short lists. In that case, the target item has a substantial, though less-than-even, chance of being first in the list. This kills the predictability of that "if" branch. Eliminating it can save ten cycles or so, possibly shaving 25% or so off the runtime of this function for short lists. (For large lists, the branch becomes predictable, and you start having cache issues due to the list's linked nature.)

My current side-project is an NES emulator. Nearly all my (non-algorithmic) performance enhancements have been of this nature: find some unpredictable branch nested inside a tight loop → eliminate the branch with "clever" logic → 25% performance gain on that loop.

(That there are no checks for "error" cases is a distraction. There aren't any in the original example either, because it's an example and the error cases aren't relevant to Linus's point.)


I agree partially w.r.t. testability of branchless code, but there's a flipside too: often the root cause of an edge case is in data-structure shape, so full coverage isn't necessarily sufficient. In other words, if an edge case is "folded into" the normal path via something like Linus' trick, you still need to actually test the case where (in this example) `head` is being removed, or else you aren't certain that the folding of cases is valid. And if you're testing that case, then you'll have full coverage of the original (branchy) code too.

Totally agree w.r.t. performance, though. Even if the branch is never taken, control flow can reduce the scope of other compiler optimizations.


Yes, this is a good point. I had thought of this too and somehow convinced myself that the "head" case disappeared (since the non-"head" case would fail if "indirect" weren't initialized properly), but I'll admit that's a shaky argument.


This comment thread was refreshing to read because it was technically enlightening, devoid of malice, and each party conceding arguments while expanding and clarifying places where it broke down or wasn't clear.

Something we can all strive for professionally and while commenting on HN.


I'd say branch mispredictions and cache misses are the two biggest factors in performance. Almost everything else is noise in comparison. I keep saying that if your architecture doesn't account for these, you've wasted 90% of the CPU, guaranteed.

I use the exact same techniques in my game engine (sadly proprietary) - even in JavaScript it's dead easy to run circles around both Three.js and Babylon.js in terms of performance, to the point where I'm simply amazed at how fast JavaScript VMs can be if you really push them.

Eliminating branches, cache misses and allocations in the main loop yielded by far the biggest performance boost. (For the curious, we abused ArrayBuffers to guarantee contiguous memory allocations in JavaScript)


I'm not a low level programmer, so this is a honest question.

In the better version, why doesn't the while condition (that must be tested at least once) present predictability problems whereas the if check in parent does?


The first version had a while loop and a condition. On the other hand, the optimized version only has a while loop.
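
For reference, the "bad taste" version from the article looks roughly like this (a reconstruction; it assumes the usual struct node with a next field, a global head, and that entry is present):

  void remove_list_entry(struct node *entry)
  {
      struct node *prev = NULL;
      struct node *walk = head;

      /* branch point 1: the loop test */
      while (walk != entry) {
          prev = walk;
          walk = walk->next;
      }

      /* branch point 2: the head vs. interior split */
      if (!prev)
          head = entry->next;
      else
          prev->next = entry->next;
  }

The optimized version keeps branch point 1 and eliminates branch point 2.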


I can see that.

However, if I'm not mistaken, branches are costly only when they're mispredicted. Hence, the question (as I see it) is whether one can better predict the behavior of Linus' better version (which only has one conditional) or that presented above (where there's an early return condition followed by the while condition).

It seems to me that the version above (which has two conditionals but is not like Linus' worse version because it has an early return) should be similarly costly to predict. Consider the situations where either most of the calls are to remove the first element (A) or most of the calls aren't (B). Let's see what happens in these cases:

- On Linus' code, the while is evaluated the first time. In case A, the while is correctly predicted to exit the while, the swap is made and the function ends. In case B, where the while yields true many times before hitting our element, you should have to pay the cost of a misprediction when you finally find it.

- In the early return code above, the situation is... the same. In case A the early return if is predicted correctly to yield true, and the function ends. In case B, the early return if is correctly predicted to yield false and then the while predicts mostly trues until you finally hit your element, at which point you pay the same misprediction penalty as above.

So... aren't both functions similarly expensive in terms of branch misprediction costs? Am I fundamentally wrong on my understanding on this issue?

Thanks for the explanation!


Where is that early return you mention? I don't see the return keyword used in either example. They both do the exact same while loop, but Linus' version adds a level of indirection to save a conditional later on. You're still touching the same memory in both cases so they should be identical in terms of cache misses.

From my understanding, the first version has two possible branch misprediction points while Linus' version only has one. This will probably only have a visible impact if the function is called in a loop however.

But to me the biggest advantage of the 2nd variant is its simplicity. This is the only way to stay sane with a growing codebase and keep shipping robust code.

It's no use having a fast function if you can't integrate it optimally with the rest of the codebase. Raw performance isn't won at the micro level but at the macro one. This is where simplicity becomes critically important.


I was referring to the version in this thread's ancestor, which reads:

  remove_list_entry(entry)
  {
      if (head == entry) {
          head = head->next;
          return;
      }

      for (prev = head;  prev->next;  prev = prev->next) {
          if (prev->next == entry) {
              prev->next = prev->next->next;
              return;
          }
      }
  }


Bog-standard, humdrum enterprise programming != kernel programming

When every branch removed on a mainline path in the kernel saves years of compute time globally and every byte removed in the kernel saves terabytes of memory globally, being clever is absolutely the right thing to do. Removing that one branch takes away an expensive pipeline stall[0]. The version akkartik has above adds an extra branch (and expensive pipeline stall) to _each iteration_ of the loop; Linus would probably not be pleased, to put it mildly, if anyone submitted a patch like that.

And the method Linus shows isn't even particularly clever; it's been a common example for handling linked list insertion in C for decades. All these "Linus is being too clever" postings show is that most programmers aren't cut out to be kernel or embedded programmers.

[0] See https://en.wikipedia.org/wiki/Bubble_(computing) and https://software.intel.com/en-us/articles/avoiding-the-cost-.... And, yes, instruction reordering[1] might be able to fill the slots but you don't get to be where Linus is by relying on that. Plus, Linux runs on processors that don't have OOO, like Intel's Atom and microcontrollers.

[1] https://en.wikipedia.org/wiki/Out-of-order_execution


But now you're making argument about performance, not "good taste". Often, performant code is not very tasty.


Taste is malleable and relative. One can even entertain different notions of taste. I personally have at least two – "understandable" and "performant" – and they aren't always (but sometimes are) exclusive of each other.

I suspect Linus's "taste", given Linux's problem domain, is tuned to some combination of performant + testable + reviewable + portable. In the example given, Linus's rewrite meets these requirements; hence his assertion that it is in better taste.


I feel that at this point the notion of 'good taste' has become even more ill-defined, something very personal. That makes the notion not very useful as a coding guideline.


There isn't a domain where taste isn't personal. It's always a blend of art and science. Just because you can't define it exactly though, doesn't mean you can't get closer to it by working at it.


Are you seriously arguing that Linus would think cleanly performant code is not in good taste? You're welcome to ask him that directly and see what he says.

EDIT: Okay, I'm being excessively snarky there; my apologies.

I will say instead that "good taste" depends on the context. What's "good taste" for kernel development isn't necessarily the same as "good taste" for enterprise development or the same as "good taste" for app development.


As a rookie programmer (but 10 years in IT, loving maths), this was my thought exactly. Basically a matter of field/domain definition, insofar as humans still have to interact with the code.

In short: most enterprise code should be elegant and simple, even if verbose (many conditionals, as little nesting as possible, etc) so that a 12-year-old can understand it, because in real life you'll get much turnover and lack of means/skill/motivation. Hence, costly maintenance, etc.

Writing an open-source kernel consistently for 20+ years is an entirely different problem, and the fact that you use code in both these examples is only a distant commonality of the problem sets; it is my experience, and thus my contention, that the human factor and real-world conditions are paramount to ascribing qualitative assertions such as 'good' or 'tasty'. That is, until machines program themselves or others without human oversight. ;)


Haha, getting bullied does not sound very delicious to me.


Yes, of course, I'm certainly not cut out to be a kernel programmer. Were the audience in that TED chat kernel programmers? I thought we were discussing "taste", not what it takes to get a patch into the kernel. Linus does work on other projects, you know.


Pretty much nailed it in the first line :-)

I prefer the shorter version for being shorter, and for generalizing the update point at the end (and giving it a name), which stresses that there will be an update (delete).

Alas, most of my coworkers would prefer the "explicit" version with all of the extra lines.

Aside: I sure miss identifiers with underscores in them, rather than MashedTogetherEnterpriseSelfCrockumentingMego :-(


Screw 'taste', go for obvious.

If you're at BigCo and your code's going to be maintained by disinterested drones/random contractors/etc, obvious code is better.

And lack of checking on entry pissed me off too.


I agree with the condition about "If you're at BigCo". It really depends on who's going to read/edit the code, how often, and how many times it's going to be executed.

For a line of code that's going to be executed on a global scale billions of times every second, and only changed (very rarely) by extremely good developers, then I'd happily trade a bit of clarity for a bit more performance.

But if it were an oft-edited method in an in-house application that's going to be supported by junior devs? I would prefer clarity.

As always, there are no universal rules, only tradeoffs.


If you understand C pointers, this is obvious: it keeps track of where the pointer to the current node came from, and replaces it with a pointer to the next node once it reaches the entry to be removed.
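
For reference, the indirect-pointer version being described reads roughly like this (types added here; the article's snippet leaves them out):

  void remove_list_entry(struct node *entry)
  {
      /* indirect holds the address of whatever slot currently points
         at the node being examined: first &head, then &prev->next */
      struct node **indirect = &head;

      /* walk until the slot points at the entry to remove */
      while (*indirect != entry)
          indirect = &(*indirect)->next;

      /* overwrite that slot; the head needs no special case */
      *indirect = entry->next;
  }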


I agree, in fact taste might rightfully be defined as that which is obvious to an experienced developer in the environment. Code which simply relies on conventions and avoids complexity is often immediately elegant.


This is a great point but I'm trying not to think disparagingly about the skill level of future maintainers, too: deadlines and distraction are a powerful negative equalizer.

The next person to touch that code could be you, in two years after you've mostly moved on to another project / major upgrade and are back in a hurry to make a “quick” change, investigate a security report, deal with a scaling issue, etc


For heaven's sake, everyone, please stop getting hung up on the fact that there's no error checking. Example code usually doesn't show error checking because it obscures the main point of the example.


Not shown: assertions :-)

(which, for kernel code, would be turned off later, anyway)

In languages other than C, with exceptions and stack traces, most precondition checks just become clutter, anyway. Although it would be nice to have "Eiffel" style pre/post-condition (invariant assertion) blocks out-of-line that can be enabled or disabled...


SPARK does that with efficient systems code, then proves the code free of common errors. Other tools can then generate unit tests from the specs, just in case. Frama-C does something similar for C. I'm not sure what the usability or thoroughness is like vs SPARK, though.

http://www.spark-2014.org/about


Totally agree, never play code golf with your production code. Your cleverness will bite you in the arse when you can't be promoted or allowed to move to a new project, because your code is so clever that you're the only one who can properly maintain it.


>If you're at BigCo and your codes going to be maintained by disinterested drones/ random contractors/etc, obvious code is better.

But what if you're Linus Torvalds and maintaining the Linux kernel ?


Obvious is still important. Even kernel developers make mistakes.

The 'good taste' example may be more concise but how likely are you to spot bugs in it?


But I think that's part of the argument. Reasoning about edge cases in code full of nested loops and conditionals is hard. Eliminating branches and opportunities to hide fencepost errors (for instance) goes some way towards reducing cognitive noise so you can focus on the meat.

Of course that's not the full argument. And I'll throw in one more: it isn't anywhere near the top of the list in terms of importance, but aesthetically pleasing, elegant code has a certain amount of value, if only to other programmers. (Naturally, it has to be correct, too.)


More likely, because there are fewer code paths to follow. Given Linus's explanation, I think that's part of the basis of his "taste", not conciseness (which in this example I suspect is a byproduct).


> how likely are you to spot bugs in it?

Depends on what your methods are.


Define obvious. At least the pointer pointer version is obvious to me in this task.

Pointer pointer can't be applied everywhere. If you try to reverse a linked list in O(n) time with O(1) space using a pointer pointer, the code would be less obvious than using a prev pointer.
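
For comparison, a sketch of the prev-pointer reversal being referred to (assuming the usual struct node):

  struct node *reverse(struct node *head)
  {
      struct node *prev = NULL;

      while (head) {
          struct node *next = head->next;  /* save the rest of the list */
          head->next = prev;               /* flip this link backwards */
          prev = head;                     /* advance prev */
          head = next;                     /* advance head */
      }

      return prev;                         /* prev is the new head */
  }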


Not sayin' you're wrong, but...

I'm going to go sit in the corner and weep, now.


I say screw 'taste' and 'obvious' and instead go for high-coverage unit tests, and lots of them.

You can always go back and make it look pretty safely with good tests.

I just love code snobs that think their code is so good they don't need tests.


> You can always go back and make it look pretty safely with good tests.

Yeah, but you won't. I think that's part of the reason why people care about "taste". I want to maximize the chances that essentially the first draft is good enough to be maintained for as long as possible, because years of hard-won experience has taught us that it will.


> Yeah, but you won't. I think that's part of the reason why people care about "taste".

Ahh but you eventually will (at least in my experience). If you don't need to, the code either doesn't matter (dead code) or works just fine.

I think I have looked at my own company's entire code base a couple of times. Do I see nasty crap... all the time. And I often fix it for good taste when it is really bad, but I have to say the value prop isn't very good compared to adding tests or other automated validation (including automated profiling and dead code analysis).

Kernel code and in particular C are sort of special cases because they are hard to test and performance is generally a high concern. But for many other higher level languages and problem domains this is not the case.

I have been meaning to look at how much code bases change over time. In the case of Linux I wonder how much of the kernel code has stayed the same (let's say over a 5-year period). Probably not a ton, but I bet for other domains, particularly business software, the code either just dies or changes dramatically over time.

Then there are, I suppose, other domains where the code really can't physically change often (i.e. space probes and embedded devices)... for those systems I really, really hope they have lots of tests.


Counterpoint: test code is also code and when unit test tyrants rule a team it can easily break down into a mess of maintenance (which suddenly more than doubles).


Speaking of good "taste", you could apply your "taste" to which tests are worthy and which are not (i.e. just add maintenance). The fact of the matter is you need some sort of testing, or some sort of proof that your code works, on a continuous basis, and if you don't have that, IMO I'm not going to say it is good code, particularly when that judgment is based solely on one person's opinion of what good-looking code is. Aesthetics are pretty low on my list compared to performance, readability and automated testing. You may say aesthetics brings those characteristics, and it might, but let's have some tests to prove it.

As for test code being a maintenance burden: it doesn't have to be as buggy, and in fact can even be done in a data-driven style or declaratively. Besides, if it is a true unit test, you should have written the test yourself. If it's your code and you have such good "taste", shouldn't it not "break down into a mess"?

There are plenty of projects, like SQLite and jQuery, that have excellent "taste" not just because of the quality of the code but because of the test coverage. Oh, and I won't go downvote people who disagree (as my parent seems to have done) because I hold that to be poor "taste" IMO.


I don't down vote, and in fact only recently noticed I could. I don't know who down voted you.

To the other points, "no tests" is the other extreme from "test everything." I find both views are unacceptable in practice.


Another issue both versions have is that they will not tolerate head being NULL.

Don't be afraid of double pointers, although they may seem overly clever at first glance, they can be really helpful for situations like this. Linus's code is considerably shorter, and not particularly difficult to debug or to understand, assuming you understand double pointers.

Another case where double pointers can be really useful is in the interface for linked lists. I.e.

    remove_list_entry(node** head, node* entry) { [...] }
Defining the interface that way allows properly removing the last entry from the list. Workarounds like making a list struct and passing a pointer to that are just double pointers in disguise. Returning the new head is error prone, since you can easily forget to reassign it.
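
A sketch of how the body might look with that interface, combining it with the thread's running example (hypothetical; not from the article):

  void remove_list_entry(struct node **head, struct node *entry)
  {
      struct node **indirect = head;   /* start at the caller's head slot */

      while (*indirect != entry)
          indirect = &(*indirect)->next;

      *indirect = entry->next;         /* correctly empties the list when
                                          entry is the only element */
  }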

Double pointers for life!


I have nothing against double pointers: https://news.ycombinator.com/item?id=12798604 (please don't consider the flamey bits there to be directed at you).

That last bit, though, I wouldn't recommend it as a slogan :) How about something like "adapt to the situation for life"?


An early exit is not the worst thing in the world, but an edge case is still an edge case. And if you have full "end of list" checks you make it even worse because now you're doing that in two places too. And if you do something with the entry, now that's in two places.

Sure, the case where it's the head of the list is simpler by itself. But treating it specially means more code, and more duplication, and it's not even faster.

To criticize the cleverness of Linus' code is fine. But the algorithm is the argument here, not the specific way he wrote it.

Edit: Also, I don't really buy into your argument about for loops. You trade one problem for another. Your version won't segfault (except the case where it does), but it will silently fail to remove anything.


1. "The algorithm" is the same in all cases here: a linear scan with a look-behind of 1. We're just talking about how the algorithm gets translated into code.

2. My point wasn't about the `for` loop: https://news.ycombinator.com/item?id=12798192

3. Yes, I don't do any error handling and neither does Linus. I wasn't trying to bring up the importance of error handling. I was trying to bring up the importance of avoiding undefined behavior. "Segfault" was shorthand for that. Dereferencing NULL is undefined behavior. I mostly don't care about "taste", but showing undefined behavior while talking about taste seems egregious.

4. Finally, I don't get nearly as hung up about "duplication" as most programmers: http://www.sandimetz.com/blog/2016/1/20/the-wrong-abstractio.... I realize this is still a minority opinion, but we own the future. (This comment has a bunch of duplicated 'e's, but we both agree that that's unimportant to "DRY" out. Your concerns strike me as similar. When you try to minimize the number of occurrences of "x != NULL" you're doing Hamming compression, not programming.)


This version is more clear for a human, but it does contain duplicate logic.

This line:

          head = head->next;
is another form of this line:

              prev->next = prev->next->next;
Both delete by setting "->next". Code duplication is a common source of bugs, and that's the rationale behind Linus's version, which eliminates duped code.


See my comments regarding blindly removing "duplication" at https://news.ycombinator.com/item?id=12798796 (point 4)


Not only is it duplicated, it also contains nasty behavior: if prev->next equals NULL then dereferencing it would result in a segfault, so in this case another if statement is required.


Like Linus example, this is just an example too and does not cover or comment on all the issues, so there are still elephants lurking in the shadows, just to name a few:

- Can entry be NULL? (segfault)

- Can head be NULL (list is empty)? (segfault)

Since the Linus example has the apparent precondition that the entry to be removed must be present in the list, there are no segfaults with proper API use. Since your code tries to eliminate that precondition, I'd argue "Remove NULL from the list" and "Remove something from an empty list" are things to expect, but you didn't account for them.

Point being: Don't be overly critical of examples; they are just examples.


I guess my fundamental criticism is this: how is a list traversal that doesn't check for the end of list being held up as an example of "good taste"?

You're right that they're all just examples. I'm just a random guy on the internet, and it's easy to push back against me. But we all have the tendency to nod along when someone we respect says something sagely on TED. It seemed useful to point out that the sage has no clothes on in this case. So go ahead, point at my skimpy cladding as well. I won't mind.


I can't see why the code from akkartik would segfault when entry is NULL? What am I missing?


If entry and head are null, segfault on line 4.

If entry is null and head is valid segfault on line 10.

--edit--

As pointed out by my esteemed colleagues below, I didn't read the loop conditional properly. Only the first fault applies.


> If entry is null and head is valid segfault on line 10.

To get to line 10, we need prev->next == entry. Since we are talking about the case where entry == NULL, that requires prev->next == NULL

The for loop condition is prev->next, which will fail if prev->next == NULL, hence line 10 would never be reached in that case.

The difficulty of analysing all of this shows why mutable structures and variables make code hard to follow - perhaps the real good taste solution is to use something like finger trees instead of linked lists, together with a language which eliminates null.


How can you reach line 10 when entry is NULL?

(prev->next==entry) can only be true at the end of the list, but the for-condition already checks that.


I agree with you that moving the conditional to the beginning as an early exit keeps things simple, and while I don't see anything unusual about a 'while' loop and feel they have their place, this 'for' definitely works well.

I would make one small tweak, replacing this line

    prev->next = prev->next->next;
with

    prev->next = entry->next;
Just makes it a bit more straightforward to understand (as well as infinitesimally more efficient).


Yes, I'd be ok with a `while` that checked for the end of the list: https://news.ycombinator.com/item?id=12798192

I really try not to be a nazi about this sort of thing. Very few things matter enough to get into a HN conversation about "taste".


Totally, it's not about the one right way to do something. I do think there's some value in putting thought into how to write.. maybe "clean" code would be a less charged word than tasteful.


> both versions ignore an elephant in the room: defensively dealing with `entry` being absent from the list.

Indeed, both versions are really in poor taste. Both are code snippets which can only work under very restrictive assumptions: the main argument `head` is implied, and the `entry` to be removed must be present. Such a piece of code has to be duplicated and tuned for every list and use case.

I even wonder why Linus bothered to show this example, when he leads one of the most famous software masterpieces.


This isn't an API though. 'head' is fine being implied, and if 'entry' doesn't exist, why is it being passed in ? That's a further code 'smell' to follow up.

Sure, you need to be massively defensive if you're writing an API or boundary, but within an owned codebase, if you're eliminating edge-cases, you should follow them all the way up.

tl;dr if you're calling remove_list_entry, only do so if you have an entry.

Of course, I'm not Linus, so I have no idea of his thought process.


Independently of style, taste, BigCo, edge cases, assumptions and other considerations, I still find a major regression in your version:

Instead of one variable allocation and one test in the loop, your version has one allocation and TWO tests.

Adding weight in loops has a major performance impact.


Oh, absolutely. I have a long history of preferring code to be safe rather than fast[1]. In this case, for any project I had a say in you'd have to convince me with concrete measurements that this particular loop was on the critical path, and then you'd have to write a dozen tests to convince me that it was being used right without the terminal check, and that any future users would get unambiguous errors in debug mode. And after all that, you wouldn't be allowed to claim that taking away one of the conditions is "tasteful".

I actually tend to be pretty lax about coding style, believe it or not. But dereferencing NULL is undefined behavior.

[1] My guru in this regard is DJ Bernstein: https://cr.yp.to/qmail/qmailsec-20071101.pdf


The "bad taste" program of Linus performs two comparisons if it want to remove the first entry of the list. The "good taste" program of Linus, like your program perform only the needed number of comparisons.

For me, your program is similar to the one of Linus where you have unrolled the first iteration of the loop in order to avoid pointer syntax difficulties. Your program also makes it trivial to fix what you called the elephant. I think Linus omitted it for simplicity of the example.

I like your program slightly better than the one of Linus because the syntax is less tricky, but I think the point of Linus was only the superfluous comparison in the "bad" one.


I didn't know that whiles were less normal than fors.

And though I don't object to for loops in this situation, I really had to puzzle yours out, because of the way your loop pointer is one behind the place of interest.

I believe you when you say it is segfault-free, but again I would have to puzzle at the code to be sure.


I think I'd be ok with a `while` that looked like this:

  while (prev->next) {
      if (prev->next == entry) break;
      prev = prev->next;
  }
or even this:

  while (prev && prev->next != entry)
      prev = prev->next;
Both are reasonable enough that they wouldn't trigger my (pretty laissez faire) sense of "taste". My core point: checking for the end of the list should be non-negotiable.

The `for` is just extra icing: it creates a scope for `prev`; you can tell at a glance that it's traversing a list. The update is closer to the termination check. But yes, this bit is negotiable.


Btw the second "tasty" solution in the article may not taste well to cache lines. May just be better to do three separate loops.


I'd add something like

    assert (head && entry)
up there.


A NULL head is not an error (= empty list) and 'assert' would be better as a BUG_ON or panic if it is technically an error.


It is an error though if your code segfaults on a NULL head (and this one does).

I wouldn't care, but if you claim to be defensive, go full monty.


> It is an error though if your code SEGFAULT on NULL head (and this one does).

There is no "SEGFAULT" in kernel code. SIGSEGV is a signal, which is only used at user-level.

NULL dereference in the kernel results in an "oops".

I believe it is also assumed in their code that "entry" is not going to be NULL, meaning "if (head == entry)" will never succeed if head is NULL.


The code crashes if head is NULL, regardless of entry.


There's a compromise enterprise developers should recognize from CS classes that's almost elegant enough for the kernel crowd: use a sentinel node for the element before the head. The edge case is eliminated, and no more &(*)->; ASCII Cthulhu.
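
Sketched out (hypothetical types; the sentinel sits where the bare head pointer used to live):

  struct node { struct node *next; /* payload fields */ };

  struct node sentinel;   /* sentinel.next is the real head */

  void remove_list_entry(struct node *entry)
  {
      struct node *prev = &sentinel;

      while (prev->next != entry)   /* entry assumed present, as in the talk */
          prev = prev->next;

      prev->next = entry->next;     /* no special case for the head */
  }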


That was my first thought; my basic algorithms book specifically mentions a sentinel as a way to eliminate the head boundary condition.

Linus' basic point stands, although his example isn't very good.


Really, if you think the good version is overly clever, you probably have a very fundamental problem with your abstract reasoning. A very human one, as history is full of people having that problem--and ultimately, the good version taking over anyway.

How about this:

    subtract(n){
        if(n==0){
           return m;
        }
        r=m;
        for(x=0;x<n;x++){
            r--;
        }
        return r;
    }
People for a long time had strong objections to a number "0", because it doesn't make sense to have a number for nothing! Why not just have no number at all! 0 is not a number, it's a special case! Wrapping it up in arithmetic like that is just overly clever! After all, 0 is not a number!

The whole problem with that is the preconceived and completely unjustified notion that 0 is not a number. Just accept that 0 is a number, and suddenly, things become much easier to reason about, much more elegant to work with.

Similarly, the whole problem here is that you have the preconceived and completely unjustified idea that removing the first element is a special case. You can define it to be, just as you can define 0 to be a special case. But you could just as well simply let go of that preconceived idea and accept that it's not, and suddenly all the problems that you imagine are there disappear, and you have simpler code, just as having integer types with a zero makes for simpler code.


".. if you think the good version is overly clever, you probably have a very fundamental problem with your abstract reasoning.."

Just to show that you're talking out your ass, here's a code snippet where I did the address of address thing in precisely the same "remove from linked list" situation. In a VM and programming language and operating system that I wrote from scratch[1].

https://github.com/akkartik/mu/blob/a90faae990/edit/003-shor... [2]

Moreover, I lived with it for months before I decided it wasn't worth the cleverness[3]. In this case. So I have some plausible claim to having found the "simplicity on the other side of complexity" in this case[4].

As a general rule I'd recommend not reaching conclusions about people from things they write in a comment on the internet. You're looking at them through a tiny peephole into their belief system. When in doubt, ask questions. "Excuse me sir, do you have much experience with C?", "Are you saying pointers to pointers are always bad?", and so on.

(Clever analogy there about 0. Totally agree with it. Utterly non-applicable in this instance.)

[1] https://github.com/akkartik/mu

[2] Hard to read, I admit. There's a colorized version at http://akkartik.github.io/mu/html/edit/003-shortcuts.mu.html, but I only host the most recent version there.

[3] https://github.com/akkartik/mu/commit/ea5e7fd4cb from Apr 2016. The previous version was from Sep 2015.

[4] https://en.wikipedia.org/wiki/Shuhari


> Just to show that you're talking out your ass, here's a code snippet where I did the address of address thing in precisely the same "remove from linked list" situation. In a VM and programming language and operating system that I wrote from scratch[1].

I'm not sure what exactly that code/that change is supposed to tell me, so I can't tell whether it's a good argument. Mind condensing it down/pointing out what specifically you mean?

> As a general rule I'd recommend not reaching conclusions about people from things they write in a comment on the internet. You're looking at them through a tiny peephole into their belief system. When in doubt, ask questions. "Excuse me sir, do you have much experience with C?", "Are you saying pointers to pointers are always bad?", and so on.

Well, that depends on whether you want to make a point for other people in the discussion or discuss some issue with a specific person. As such, that was not really a judgement of your person, but rather a general statement about a group of people (which obviously might be wrong in some cases).

> (Clever analogy there about 0. Totally agree with it. Utterly non-applicable in this instance.)

You mean not applicable in the case of your code, or in the code in the article, or what else?

(edit: and yeah, I realize how the wording might give the impression that I was talking specifically to/about you, but I really wasn't, I was just using what you wrote as the opportunity, so I ended up addressing you :-)


:) Thanks for the clarification.

Yeah my link is in a strange syntax, but mostly I just wanted to point at the specific line containing address:address, which is the same as a pointer to a pointer. That line is in a function called delete-before-cursor which uses a (doubly) linked list to maintain the state of a text editor.

I came up with this syntax for a teaching environment: http://akkartik.name/post/mu


I love your project! The idea of testing pervading the whole system is a very good one. I also like parameters being immutable unless they're a product; that's a great way of making passing references hurt less.

I still prefer the version Linus recommended. It's not for speed - I think it's more understandable. It's simpler (in the way Rich Hickey says, one strand rather than many) and there is less of it. Proving it correct would be easier, not that I actually would.

When people say "clever" to me, I think of code that is being tricky about something. (My canonical example of clever code is Duff's Device.) Linus' example doesn't read as tricky to me, it reads as very straightforward.


Thanks! I'm very happy that you noticed that little feature about immutability so quickly. I'd love to hear from you off-thread if you have any more questions or comments about Mu (email address in my profile).

I tried in my original comment not to bring Linus into it. I didn't want to be just another person on the internet poking holes at a famous person. If the examples had both included a guard for the end of the list I'd have been like, "eh, no accounting for taste, but ok," and moved on in silence. But we shouldn't be teaching/encouraging people to focus on minor considerations like the number of `if`s when there's a larger problem with undefined behavior. Since you already took a look at Mu, here's a place where I try to elaborate on this belief system: https://github.com/akkartik/mu/blob/07b54625f7/001help.cc#L7...


>What is "tasteful" about hiding the fact that one of the cases is far simpler than the other?

The fact that this is an implementation detail and not a universally true fact about the two cases?


For my money, Linus's example of "good taste" gives up rather a lot of clarity to achieve succinctness. The original is simple and clear. His preferred version is shorter, but also harder to understand because of its use of a complicated indirection. And that's not good taste. It's just showing off.

“Programs must be written for people to read, and only incidentally for machines to execute.” ― Harold Abelson, Structure and Interpretation of Computer Programs


Do you do much C programming?

If I put my "Java goggles" on, the first, "untasty" snippet looks so very obviously to be the right way to do it. But...

If I put my "C goggles" on, the "clever" version from Linus looks much more direct, obvious, and better in general...much like the final example from the article author.


While I agree, I also think that it depends on your background. I guess pointer arithmetic should be second nature for a kernel developer? So there might not be much clarity cost then.


I completely agree (tho you're getting downvoted by others).

Linus's code here is 'clever'... but not good, simple code.


It's a long time since I wrote much C but Linus' version seems like idiomatic C to me. The use of pointers in C is an ordinary thing and those who write a lot of C should be fluent in their use.

It's interesting to apply the same technique to other languages. I have to use VB.net most of the time so here are implementations of the tasteless and tasteful versions in VB (untested so there might be bugs). Even in VB the tasteful version is shorter and, I think, clearer.

    Module Module1

      Public Class ListEntry
        Public value As String
        Public [next] As ListEntry
      End Class

      Public Head As ListEntry

      ''' <summary>
      ''' Straight translation of Torvalds' tasteless version.
      ''' </summary>
      ''' <param name="entry"></param>
      Sub RemoveListEntry(entry As ListEntry)

        Dim prev As ListEntry = Nothing
        Dim walk = Head

        ' Walk the list
        While walk IsNot entry
          prev = walk
          walk = walk.next
        End While

        ' Remove the entry by updating the head or the previous entry.
        If prev Is Nothing Then
          Head = entry.next
        Else
          prev.next = entry.next
        End If
      End Sub

      ''' <summary>
      ''' Straight translation of Torvalds' tasteful version.
      ''' </summary>
      ''' <param name="entry"></param>
      Sub RemoveListEntry1(entry As ListEntry)

        ' A dummy predecessor node stands in for C's pointer-to-pointer.
        Dim dummy = New ListEntry
        dummy.next = Head

        ' Walk the list looking for the thing that points at the thing that we
        ' want to remove.
        Dim indirect = dummy
        While indirect.next IsNot entry
          indirect = indirect.next
        End While

        ' ... and just remove it.
        indirect.next = entry.next

        ' Without real pointers we must copy the (possibly changed) head back,
        ' or removing the first entry would be lost along with the dummy.
        Head = dummy.next

      End Sub
End Module


Actually, I find the double pointer version easier to understand, and incidentally it is also the way I always wrote this in C. And I pitied the Pascal programmers who had to use the original version.

'Simplify so a fool can understand your code, and you will have fools editing it.'


Pascal has references and pointers too so why would they not be able to do it in a very similar way?


In the original Pascal, pointers/refs came only out of new (AFAIR); you couldn't point into structures or to local variables. That makes this trick impossible.

'Modern' 'pascals' were different.


I'll grant that you can't do it quite as efficiently as in C but you can do the same trick as I showed for VB.Net (in another post in this thread) and create a new entry that points at the head then use that as your indirect pointer. The problem is caused not by being unable to point at locals or members but by Pascal's pointers being strongly typed so that you can't simply dereference some random word length memory cell to get a new reference.

So not identical but still simpler than the tasteless version.

I thought that this was such a fundamental idea that it should be on Rosetta Code but the page for it at http://rosettacode.org/wiki/Singly-linked_list/Element_remov... is empty.

Perhaps we should all rush over there and fill it in. :-)

Edit: See entry on RC at http://rosettacode.org/wiki/Singly-linked_list/Element_remov...


Taste is dependent on the audience. An audience of kernel maintainers would probably share Linus' taste here, while an audience of enterprise developers would find the original code more palatable.


I think the specific example can create a distraction because of the use of indirection, but that there is a more general point to discuss: at what point is a chunk of code "too clever?"

One argument is that code should be written so that junior or average (or even below average) developers can be put to work on it. Another view is that part of a junior developer's learning experience ought to include not shielding him from alleged complexity or "clever" constructs, because sometimes they really are better or even unavoidable.

Cleverness for its own sake is one thing to avoid, but I think that's almost too subjective a standard to use.


'cleverness for its own sake' often makes it into standard practice later.

And I seriously fail to see the 'for its own sake' here. This cleverness reduces the number of moving parts, follows 'don't repeat yourself', and thus reduces the ways modifications of the code could break it by overlooking a case.

I just shuddered to think how this would be done in java - with a LinkUpdater interface, and a RootLinkUpdater and ElementLinkUpdater, and one instantiation per iteration, and suddenly we have a lot of code around to keep the active code's function simple - to a point where it is completely pointless because of adding a lot of machinery. (And java does collections differently anyway.)


I agree with you, about this specific example and in general.

When I look back on my career, often something I called 'clever for its own sake' was only clever because of a lack of understanding on my part, and rarely for its own sake, outside of toys and obscurity competitions where such cleverness is the point.


In userspace this is generally true, and it sucks, because 'good taste' should really be an engineering determination, not a compromise to human laziness and mediocre slackers.


I was given a piece of advice very early on in my career that I've always been grateful for, which is fundamentally the same as this. IF and FOR are both code smells.

One case of this is just simplifying loops with some functional goodness

  var listOfGoodFoos = new List<Foo>();
  for(var i = 0; i< listOfAllFoos.Count; i++)
  {
     if(listOfAllFoos[i].IsGood)
         listOfGoodFoos.Add(listOfAllFoos[i]);
  }
  return listOfGoodFoos;
VS

  return listOfAllFoos.Where(x => x.IsGood);

But perhaps a more interesting point is it can also be a sign of DRY gone wrong - two things that aren't actual repeats of each other, but just similar code, are smushed together with lots of IFs hanging around to make it actually work.

A connected piece of advice was "it is easier to push things together than pull them apart" so err on the side of assuming two bits of code are not the same, knowing you can refactor them together later (probably only a few hours later) if it turns out they are in fact pretty similar.


Does it come down to having a deeper understanding of what is available? In your example, I guess a new engineer may not know that Where() is available, but a seasoned engineer should know.

I also think that this is why the social aspect of code review is so important. It is a knowledge sharing mechanism.

edit: grammar


> I guess a new engineer may not know that Where() is available, but a seasoned engineer should know

I'd guess the opposite. It's only in the last few years that many mainstream languages have adopted this sort of functional syntax. The new engineer probably learned it right away from the docs, while the seasoned guy churns out smelly old for loops without a second thought.

GP makes a great point about different loop logic being 'smushed together', that seems to be the most common kind of ugly code (and I've been guilty of it). Perhaps that is psychological. for/while makes loops appear expensive, while the functional code hides that detail and the programmer doesn't think of it.


> I'd guess the opposite.

Might depend on how trendy the new feature is. Because I still see newish Java programmers who apparently don't know that java.util.Date exists, and who also don't know that rolling your own date math is only done by date manipulation library authors and connoisseurs of slow-moving train wrecks.

A second, related but harder to pin down, issue exists here. I see relatively inexperienced developers who, for instance, get a little confidence with Javascript chaining and callbacks, and work themselves into corners where a given filter method (like the 'Where()' example) doesn't quite do what they need, and suddenly they have no idea what to do. So they hack in something awful in the middle, forget that it is a weird homegrown something instead of a library function, and see profoundly strange problems later.

Both of those are things programmers grow out of, hopefully. But they seem to be instances of having APIs with large surface areas causing what amounts to research failures.


In case any other seasoned C++ engineers are worried that they have missed something big in all the new C++ specs, the above code seems to be C#.


It is, sorry, I should have specified. Also I don't think many people would use a loop like that when foreach is available.


For loops, even in C#, are used in performance-sensitive contexts. I believe things have changed across versions of .NET, and things like Mono have their own versions too.

Foreach generally generates (or used to generate) more garbage/allocations. Foreach against some types in some versions of C# (ex: structs) produces no allocations. There are also some ugly mutation-related things you could do using for instead of foreach, but I'd call that bad code.

The same things are true on many platforms - foreach is generally preferable for non-performance-sensitive code. In some languages and libraries, foreach constructs will do things similar to auto pointers or reference counted pointers, which makes them safe enough, but also slower. The point is that foreach can take up more memory, cause garbage, or cause allocations/frees on some platforms and languages, which is not always desirable. For is therefore a better choice if you don't want this behavior. Mostly though, using for instead of foreach is micro-optimizing in these cases. Where I'd usually do it is somewhere critical in my code, like something called many times per frame in a game loop, for example. If I caught someone using a foreach that generated lots of allocations and is called a lot of times in these contexts, I'd probably kill them (and have).

So in short, I guess I'd agree that "many" people would do it, but it's context-dependent, which includes performance goals, target platform, language limitations, and more. Mostly, it's better to just write the code that is safe and works, and go back and fix any bad decisions like this. If it's obvious though, I don't have a problem with making the optimization from the beginning.
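To make the allocation point concrete, here's a minimal C# sketch of the three cases (Foo and Process are made-up stand-ins; the exact behavior varies by runtime version):

  using System.Collections.Generic;

  class Foo { public bool IsGood; }

  static class Demo
  {
      static void Process(Foo f) { /* hypothetical work */ }

      static void Main()
      {
          var foos = new List<Foo> { new Foo(), new Foo() };

          // Plain for loop: indexes directly, no enumerator object at all.
          for (int i = 0; i < foos.Count; i++)
              Process(foos[i]);

          // foreach over List<T> uses the struct List<T>.Enumerator,
          // so there is no heap allocation here either.
          foreach (var f in foos)
              Process(f);

          // foreach over the interface type boxes that struct enumerator,
          // costing one heap allocation per loop.
          IEnumerable<Foo> asInterface = foos;
          foreach (var f in asInterface)
              Process(f);
      }
  }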


Having made my fair share (if not more) of terrible (and terribly embarrassing) errors in C using the for-loop, I love foreach and its brethren in other programming languages. I only write classical C-style for loops very, very rarely these days.


Absolutely - it's why the advice was so good: it led me to try to discover alternative patterns that remove the ifs.

On our team I try to make code reviews about both code quality and learning. Reasonably often we'll go down mini rabbit holes about different ways things could be written and the various merits of them. We also try to have more than two people reviewing to spread the learning even more. That can make it worthwhile rewriting good sections of code to be worse just to discuss why the original way was better.

As a junior I learnt a great deal from reviewing the senior devs' code.


Well the second snippet hides the loop and conditional which surely are implemented in "Where". Is that a code smell by proxy?


Here's one thing to keep in mind. They both end up doing the same thing, but it's more obvious that the Where implementation is not doing anything subtly different from what you think at first glance.

Without reading the for implementation carefully it's much harder to verify:

* You're looping from start to finish
* Over each element
* You're only adding the items to the new list that match the conditions.

That all said, I think this swap is only a good idea if the thing you're swapping with is commonly used. Which admittedly is a chicken-and-egg problem, but does largely exclude non-library function transformations or one-off transformations.


Depends which expresses the actual intent more clearly. To my mind, that it's the list of foos that are good is what's important, and the loop and conditional are implementation details.


In a way, yes. On the other hand, it boils down the code that some poor soul of a maintenance programmer needs to deal with to a single line which is more readable, too. Win-Win.


I don't disagree. But then again I've had peers review code similar to this and flag an issue saying it's too clever because that poor soul might not know what "where" is doing or that the use of lambdas is "too complex."


That is certainly something I've also come across. This kind of functional style is very readable once you understand it, but it does have a learning curve just like learning what "for(var i = 0; i< length; i++)" means.

I actually did a little presentation to the team about how to refactor a for loop into a self-implemented where/filter function to explain the connection. Realising that all the where was doing (potentially with some behind-the-scenes optimisations) was passing the condition for an IF inside a FOR loop suddenly clarified everything for me.

In case anyone's interested, these are the code steps (javascript):

  var people = [{name: 'bob', gender: 'male'}, {name: 'dave', gender: 'male'}, {name: 'sarah', gender: 'female'}];
  
  var getAllMen = function(population) {
      var returnList = [];
      for(var i = 0; i< population.length; i++){
          if(population[i].gender === 'male'){
              returnList.push(population[i]);
          }
      }
      return returnList;
  }
  
  var males = getAllMen(people);
  
  ___
  
  
  var getAllMen = function(population) {
      var returnList = [];
      for(var i = 0; i< population.length; i++){
          if(isMan(population[i])){
              returnList.push(population[i]);
          }
      }
      return returnList;
  }
  
  var isMan = function(person){
      return person.gender === 'male';
  }
  
  var males = getAllMen(people);
  
  ___
  
  var filter = function(anyOldList, filterCondition){
      var returnList = [];
      for(var i = 0; i< anyOldList.length; i++){
          if(filterCondition(anyOldList[i])){
              returnList.push(anyOldList[i]);
          }
      }
      return returnList;
  }
  
  var isMan = function(person){
      return person.gender === 'male';
  }
  
  var males = filter(people, isMan);
  
  var females = filter(people, function(person){
      return person.gender === 'female';
  });
  
  
  var females = filter(people, person => person.gender === 'female');
  var females = filter(people, x => x.gender === 'female');


I definitely agree with this, but note that the semantics in your code are different from those in Linus's example - your code filters out ALL matches.


This is actually an important point. Quite a few times I've written something that started as a list comprehension, changed it to a lambda later in the sprint and finally to a loop when the requirements became more detailed. - The real world is unfortunately full of business requirements that think that they are special flowers.


Indeed, I should have been clearer: this is just a similar example, not equivalent code (and I assume nothing like performant enough for Linus's needs).


I use eight-space tabs when I write C, to force myself to think twice about using if, for, and nested loops.

(I convert it back to four-space tabs before I commit anything.)


Does your coding style not involve lining up function arguments (and other things) that are broken across two lines vertically? Converting between 8 spaces and 4 would screw this up.


My preferred whitespace code-style is to use tabs for indentation and spaces for alignment. Under no circumstances should two consecutive lines be aligning something (with spaces) and have different levels of indentation (with tabs), I can't imagine why you would ever need to do this, so this has never been an issue for me.


> My preferred whitespace code-style is to use tabs for indentation

What does "I convert it back to 4 space tabs before I commit anything" mean, then? If you're putting \t tabs in your file, 4 space tabs vs 8 space tabs refers to the way your editor renders \t. What are you changing before committing, your editor settings?

Edit: oops, just realized you're not the person I was responding to. The convert it back before I commit anything comment suggests he/she is using spaces. Or is very confused.


Sorry, I should have mentioned that I was not the GP. Even if you use spaces rather than tabs, the idea is the same. Do you mean something like the below? Note: \t represents tabbing (not necessarily the number of tabs, just spacing using tabs), and _ represents spaces.

  int f(int x,
  \t\t__int y) // one 4-width tab, pad with spaces
  {
  \t\t// do stuff;
  }

  int f(int x,
  \t\t\t\tint y) // one 8-width tab, whoops
  {
  \t\t\t\t// do stuff;
  }
In this case, you should not be using tabs, since logically you haven't entered a new block yet and as such should not be increasing the indentation. It should be

  int f(int x,
  ______int y)
  {
  \t\t// do stuff;
  }


> Even if you use spaces rather than tabs, the idea is the same

It's not, because there does not exist an editor that is smart enough to recognize that these 4 spaces are indentation while those 4 spaces are alignment. It's not impossible, but show me an editor that does it today (especially with C++) and I'll eat my hat :D

I agree with you that if you're using actual tabs for indentation it works fine.


>It's not, because there does not exist an editor that is smart enough to recognize that these 4 spaces are indentation tabs while these 4 spaces are spacing spaces.

Oh yes I see your point. All the more reason to use tabs for indentation and spaces for alignment. :)


This also comes easily if you are using a more expressive language:

   remove e [] = []
   remove e (x:xs)
     | x == e    = xs
     | otherwise = x : remove e xs
This even has the benefit of making it very clear whether you've remembered the case where 'x' is not in the list.


You don't know what that Where() costs you, though. The abstraction in your second snippet does _at least_ as much work as the code in the first snippet _and_ has an unknown level of extra overhead.

Abstractions are not free. Because we can almost always afford the luxury of abstractions, some people forget that there's an engineering tradeoff there. There's always something underneath that nice, pretty-seeming abstraction that has to do the dirty work. Because that abstraction has to be pretty and handle generic situations, it (not always but often) can't be optimized to be more efficient in specific situations.


Zero-cost abstractions are a thing though. Compiler magic can even fuse together multiple calls like this on the same collection into the equivalent regular loop. So it can in fact be faster rather than slower, in the right context.
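To be fair, C#'s compiler doesn't fuse LINQ chains, but deferred execution gets part of the way there: a chained Where/Select makes a single pass over the source with no intermediate lists. A sketch (made-up Foo, not from the thread's original code):

  using System.Collections.Generic;
  using System.Linq;

  class Foo { public bool IsGood; public int Size; }

  static class Demo
  {
      static void Main()
      {
          var foos = new List<Foo> { new Foo { IsGood = true, Size = 3 } };

          // Deferred: nothing runs yet, and no intermediate list is
          // built between the Where and the Select.
          IEnumerable<int> sizes = foos.Where(f => f.IsGood)
                                       .Select(f => f.Size);

          // A single pass over foos happens here, applying both
          // steps to each element in turn.
          foreach (var s in sizes)
              System.Console.WriteLine(s);
      }
  }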


Like most things, it's all contextual. Were I your manager and we were working on a business app, this might pass code review just fine or I might even suggest what you did here if you had showed me the original.

Another piece of context in a business app might be - "Do we need these values now or later?" That is, should this be lazy, stream results, something like that? I am assuming this is supposed to be C#, but still speaking somewhat generally. You might need to instead yield rather than looping and returning an intermediate list. Better yet, I'd also look for you to return something like a read-only sequence (ex: IEnumerable in C#) since this is a query result, and that can be done with the first version.
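A sketch of that yield version (hypothetical Foo; not the original code):

  using System.Collections.Generic;

  class Foo { public bool IsGood; }

  static class Demo
  {
      // Lazy and read-only from the caller's point of view: elements are
      // produced one at a time, only as the caller enumerates them.
      public static IEnumerable<Foo> GoodFoos(IEnumerable<Foo> all)
      {
          foreach (var foo in all)
              if (foo.IsGood)
                  yield return foo;
      }
  }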

Unfortunately, if we had been writing a game, let's say, I would have failed you miserably for the latter version and added 1 mental strike to the "send you packing" count. Why? For loops are most certainly not code smells on most platforms. As a few people said, the mechanics of "Where" in some languages may just be cloaking a for loop. More importantly, for loops tend to be much more performant, can be inlined more easily, and generally produce less garbage than lambdas and other similar language constructs. It may be different now than when I last looked, but back when I used to touch C#, for was most certainly preferable for anything performance-sensitive provided you didn't need laziness, and if you did, there's still at least "yield" (though at that point, we're approaching what Where does anyway). I could go on about cache lines, thrashing, GC pauses, and so on, but hopefully you get the idea.

Overall, I am afraid I must disagree and point out that the advice you treasure is too generic. FOR and IF are very valuable in certain contexts. I think the real advice someone might have been trying to give you is to aim for more functional constructs - that is, functions that are pure/idempotent/referentially transparent, compose, and operate on sets of data where possible rather than on individual points of data. Like any advice, this isn't universal and quickly unravels in many situations such as performance-sensitive contexts like game programming, real-time programming, and so on, or may in fact just be the best practice of the target language.


I absolutely agree with you about performance stuff, if you are in a position where milliseconds matter this may well not be the right way of doing things.

Also, to open up another interesting area, Linq operating on IQueryable objects can do some pretty interesting stuff in the background, merging SQL queries and the like. I've never looked into whether it does anything similar for IEnumerable - although I will, should the situation come up.

FOR and IF are indeed valuable, but that doesn't make them not smells. For me a code smell is not an anti-pattern - it is just an indication that something has the potential to be wrong. If you are writing lots of IFs and FORs in your code that might well be fine, but it has a good chance of causing problems. Smells, for me, just mean "justify this decision" - not "never do this".


Things like FOREACH are good, sane defaults, so the justification only happens based on what you are doing, as in the performance example.

IQueryable actually is a superset of IEnumerable (implements it - https://msdn.microsoft.com/en-us/library/system.linq.iquerya...). It is generally useful if you want the laziness and ORM/ORM-like features. If you are using LINQ with SQL, then it would be better to return something like that for many use-cases. IEnumerable is good for general use, of course, and when you want to make the assumption that you "don't need" all that stuff, which is a pretty good assumption for more general cases where you just want read-only results or lazy read-only results (via things like yield).
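The mechanical difference, as I understand it: IQueryable's Where takes an expression tree that a provider can translate (e.g. into SQL), while IEnumerable's Where takes a compiled delegate that can only run in memory. A provider-free sketch:

  using System.Collections.Generic;
  using System.Linq;

  static class Demo
  {
      static void Main()
      {
          var source = new List<int> { 1, 2, 3 };

          // Queryable.Where receives Expression<Func<int, bool>>: the lambda
          // is kept as data that a provider could translate into a query.
          IQueryable<int> q = source.AsQueryable().Where(x => x > 1);

          // Enumerable.Where receives a plain Func<int, bool>: it can only
          // ever execute in memory, element by element.
          IEnumerable<int> e = source.Where(x => x > 1);
      }
  }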

I agree about potential. Nested FORs and IFs, though, raise that potential hugely. There is rarely a reason to be nesting more than a few levels deep. In many decades of programming, of course I have nested deeply, yet I have refactored it each time. I have managed to avoid excessive nesting the entire time, as have most of my colleagues. I am sure someone can find a piece of code somewhere that simply cannot be refactored well or justifies itself otherwise, but I'm not afraid to say that statistically it is easy to avoid and hard to justify. I think the nesting issue just raises a bunch of flags including: performance, composability, reusability, clarity, maintenance, and cloaking. So in that sense I do feel it is what you describe as an anti-pattern more than a code smell.


That is more imperative programming (your first example) vs. functional programming (your second example). The problem with functional programming is that you need to know how the functions are implemented to be able to estimate performance.


Especially for something like kernel code, which was the original example. "Obvious cost" matters at least as much as "obvious code" in that world.


Don't write the body of an if without curly braces on the next line. It's a common source of bugs when the code has to be extended (possibly by someone else) and somebody forgets to add the braces.


It's not simplifying it though, it's just hiding the complexity in syntactic sugar. I prefer the first way of doing things vastly over the second. Yeah, it's more code, but it's also more or less "what is actually happening", instead of a euphemism which has to be unpacked.


I disagree, I believe it is properly using abstractions to write simpler, more straight-forward code. I think of syntactic sugar as 1-to-1 replacements. For example, the -> in C and C++ is syntactic sugar for a dereference followed by a field access (ptr->field; (*ptr).field). If it's not a 1-to-1 replacement, then it's more likely to be an actual abstraction.


> I disagree, I believe it is properly using abstractions to write simpler, more straight-forward code.

Straightforward for some readers, or more straightforward for the CPU/GPU?


The first example is code duplication, and almost every line is suspicious.

    var listOfGoodFoos = new List<Foo>(); // is listOfGoodFoos a good name?
    for(var i = 0; i< listOfAllFoos.Count; i++) // is the loop condition correct?
    {
        if(listOfAllFoos[i].IsGood) // the only interesting line is buried in here
             listOfGoodFoos.Add(listOfAllFoos[i]); // is i the right index variable? (vs. j, k, n, especially in nested loops)
    }
    return listOfGoodFoos;
The second example is obviously correct, and highlights the condition.
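(For reference, the second example being praised here is presumably a one-liner along these lines - my reconstruction, not the original:)

  return listOfAllFoos.Where(foo => foo.IsGood).ToList();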

A quote from Tony Hoare comes to mind:

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies.


> it's just hiding the complexity in syntactic sugar.

And that complexity is only written once, instead of multiple times for a single project

You can have as many FOR and IF branches as you want, but only if they are all bugless!


Actually, it's written exactly as often as the original construct.

The idea that writing more than 5 letters, or a for loop here and there, is this giant cesspool source of errors is totally alien to me. Of all the ways in which my programming sucks, this was never among them. So thanks for your permission, I'll use it wisely :P


Yep. As long as the abstraction is tested and functions as advertised it can be very useful.

I'd much rather see a few lambdas in a chain of function calls than blocks of conditionals and loops. Every keyword or bit of code that has to be typed is a potential source of errors, and abstractions of this sort help minimize that.


All programming languages are "euphemisms" in that sense - even C has a lot of layers between it and how modern hardware behaves. It is simplifying if it expresses the important aspects of the result rather than the implementation details of how the machine calculates it.


Sure, but when programming, what the machine does and how long it takes sometimes does matter. I have to think of all those jQuery examples floating about out there that just do $('#foo') in several places - sure, it looks simpler than putting it once into a variable, but from the viewpoint of what the computer ends up doing it's utterly ass-backwards.


There's no reason a `.where` should be slower than an explicit loop though. Indeed it should offer more opportunity for future performance improvements, because it doesn't constrain the implementation with irrelevant details - the compiler is free to e.g. make the loop run backwards, or split the collection into chunks and parallelize, if it figures that that would be faster.
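In C# that parallelism is opt-in via PLINQ rather than something the compiler does automatically, but the declarative form makes it nearly a one-word change. A sketch with a made-up Foo:

  using System.Collections.Generic;
  using System.Linq;

  class Foo { public bool IsGood; }

  static class Demo
  {
      static void Main()
      {
          var foos = new List<Foo>();

          // Sequential query.
          var good = foos.Where(f => f.IsGood).ToList();

          // PLINQ may split the collection into chunks and filter them on
          // multiple threads; note result order is no longer guaranteed
          // unless you also call AsOrdered().
          var goodParallel = foos.AsParallel().Where(f => f.IsGood).ToList();
      }
  }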


At some point that breaks down though, doesn't it? I mean, from one perspective, a conditional in high-level code doesn't describe "what is actually happening", either. That's especially true if an optimizing compiler or interpreter mangled it up.

From another perspective your argument can be applied to any function call in library code.

In either case I don't think your position is that strong.


> I mean from one perspective a conditional in high level code doesn't describe "what is actually happening, either."

Sure, and from another perspective even the most efficient code does nothing to stop the heat death of the universe and is therefore functionally equivalent to doing anything else or nothing at all. However, that's just splitting hairs. It's not really an argument anyway, but rather a preference.


The question is more whether it's a reasonable preference to think syntax X is helpful but syntax Y is hiding things. Thinking deeper might show that the issue is actually familiarity, and the two should be rated the same in terms of abstraction.

