Hacker News
What's wrong with this code, really? (cvmountain.com)
298 points by KiwiCoder on Sept 16, 2011 | 196 comments

About the McConnell quote: "Inefficient programmers tend to experiment randomly until they find a combination that seems to work." The essence of this quote is being passed around quite often these days.

When you first start programming, you generally have no idea what the hell you are doing. You learn all these strange, abstract concepts best by experimenting.

It's easy to dismiss people "jiggling things around until they work", as lesser, more inefficient or just plain bad programmers. Just remember that you were once like that too.

I think there should be more patience among the experienced, for the programmers who are still learning the basics.

I think there's a fundamental concept that isn't taught very well, and that concept is: Computers only ever do (for something like 99.999999999% of instructions) exactly what they are told to do.

They don't have a mind of their own, and programming isn't magic. Opaque languages, libraries and APIs don't help the situation either. I wonder how many programmers start out under the assumption that computers are more-or-less magic?

That summarizes the familiar high-level world of recent times. (But then I'm also old enough to remember when compiler gremlins were ubiquitous.)

The lower-level world can be quite a bit more unseemly, what with the unpredictability of occasionally marginal voltages or power supplies, of RAS recovery and the occasional RAS and ECC errors, of sections of system buses lacking ED/EDC protection, the "fun" that is radioactive RAM and embedded alpha emitters (particularly "entertaining" if you don't have solid EDC), bus transients, and a host of other hardware gremlins.

Looking at the lowest levels of these boxes, I sometimes marvel that modern computers work at all, much less as well as they do.

And that's why I say that it works for the vast majority of instructions that are executed. There is that slim chance that it will flub up, but that kind of error either: A) gets caught in the hardware or B) completely takes down the operating system or app.

The likelihood of some cosmic ray flipping a bit of a counter variable in some loop's sub-millisecond lifespan is just so unimaginably small you can count it impossible.

Physical objects only ever do exactly what they're 'told' to do, period. Rocks don't have a mind of their own, and physics isn't magic. Yet I don't want to live in a world where we don't spend time twiddling with dials and poking around.

I agree with the thrust of your statement, but it's worth keeping in mind that while physics isn't magic it is not deterministic either, once you start working with really small things.

I've noticed this "magic" thing too, among the most clueless members of our profession.

Even if they have no idea how bad they are, which they usually don't.

You see this a lot (even on this forum) with people referring to Rails "magic", when it can't possibly be anything but a bunch of lines of code being executed according to the rules of a certain interpreter.

To paraphrase Agatha Heterodyne, any insufficiently analyzed technology is indistinguishable from magic. I think their terminology is pretty reasonable.

Isn't this identical to what the original post I replied to was saying? Those he refers to as the "clueless" haven't analyzed the technology sufficiently to distinguish it from magic. I wouldn't go so far as calling people clueless, though; thinking of some things as magical black boxes frees up mental space.

Rails magic is vastly different from the magic referred to in the gp. Rails uses a number of dynamic approaches that create results that in other frameworks would take some amount of explicit configuration. The word "magic" in this context is ironic and merely means "dynamic".

Understood, but I would caution that even an inexperienced, inefficient programmer should still understand what their code does and why it works. The best way the more experienced can exercise patience is by giving inexperienced programmers enough time to figure it out rather than pressuring them to flip random switches frantically.

I think that just touches on documentation, extensive commenting, and generally writing straightforward code. I didn't really know the importance of this until I had to rewrite my thesis step by step after it fell apart completely. Now, I make sure to document at least a short explanation of what my code does and what the end result is, no matter how simple.

With a little time, inexperienced and inefficient programmers can often understand the crappy code they wrote just fine. What they can't do is avoid writing it in the first place. For most of them, if you say "great, this works, now make it better," they will have no idea in what direction they should try to improve things.

It's called experience for a reason: the coder has to try (and fail) until they learn enough to write good, efficient, clear code.

> It's easy to dismiss people "jiggling things around until they work", as lesser, more inefficient or just plain bad programmers. Just remember that you were once like that too.

Sure. But you're not ready for much more than an internship in the software industry (if even that), if that's the phase of programming you're at. The point is that you aren't yet ready to write production code if that's your approach.

That depends what you are jiggling around :-) If you can't immediately make basic loops work, then yes, you shouldn't be anywhere near production code yet.

But if it's starting to use some advanced OO concept, a new protocol or similar, experimentation and learning-by-doing is a very valid approach, even if you are highly experienced.

If it was not alright to code something without understanding every single layer from the highest to the lowest level, not many programmers would get anything done.

Except maybe Linus Torvalds or Steve McConnell :-)

Nothing wrong with having patience and helping new programmers, but my compassion ends there. :) I didn't know what I was doing when I started, but I learned it's worth the effort to understand _why_ something works once it does. Not to try is passing the buck, and I've spent plenty of time on the receiving end. I've seen "shotgun" code written by developers with 20 years of experience, and boy does it suck to maintain that:

"Wow, this code is messed up. This whole project is messed up. Wait, there's a weird edge case the author must not have considered. Hmmm, that has some side effects too. Crap, this can't be on purpose, but a bunch of other code touches this thing. Does any of it rely on this behavior?"

With any luck, after burning a day studying a thousand lines of code, you realize that none of it does anything useful, and the bug you were chasing is hiding somewhere else. (Ok, I'm a little better at debugging than that, but you see the point.)

After they're experienced, they can do this:

A novice was trying to fix a broken Lisp machine by turning the power off and on.

Knight, seeing what the student was doing, spoke sternly: “You cannot fix a machine by just power-cycling it with no understanding of what is going wrong.”

Knight turned the machine off and on.

The machine worked.

Programmers who are still learning the basics should not be employed as programmers.

I blame JavaSchools for this.

Computing education should be started in Scheme or Python, without all that accidental complexity-- "public static void main(String[] args)"-- littered around the first program. To absorb all of that requires more than should be expected before people can write a first program. JavaSchools get people used to magic incantations that create a culture where inadequate knowledge is the norm.

After learning the essentials, one can move on to languages with powerful type systems (Ocaml, Scala, Haskell) and the discussion of OOP and when it's appropriate and when not.

People should be starting with simple programs and knowing exactly what they do, rather than creating classes without a clue what a class is and why the concept is important (much less where not to use object-oriented programming).

The argument for starting CS in Java is that it makes grads more employable. Frankly, I would never hire (for a programming position) someone who took one CS course and stopped taking more. The whole raison d'etre for starting CS education in Java is bogus, and I think the practice is actually harmful. As much as I hate the factory metaphor in education, let's "produce" 20 programmers who actually understand what they are doing instead of 200 who don't have a clue.

I don't think it's just a Java thing. It's entirely possible to do this in pretty much any language. For one thing, I saw this kind of approach quite regularly from some programmers many years before Java even existed.

Also, I'm currently teaching myself Haskell as a hobby, and I'm quite regularly going through the "jiggling things round until they work, and then discovering a far more elegant way of doing the same thing" process. I don't feel bad about this - my professional programming days are many years behind me now, and I find that knocking something up and then revisiting it when you've understood more is a great way to learn a language (just not a particularly good way to develop professional applications).

My CS degree program was primarily taught in Java. This was not a problem. Language syntax had little to do with conceptual understanding. I appreciated graduating with a firm grasp of a popular language's syntax and core libraries in additional to an understanding of CS theory.

Can you name a school that teaches a student only one CS course expecting them to be a programmer?

> My CS degree program was primarily taught in Java.

Same here.

> This was not a problem.

It wasn't a problem for me either. It was for ~80% of the class, though.

> Language syntax had little to do with conceptual understanding.

A uselessly verbose, initially incantational syntax adds useless friction, which just makes many people's minds grind.

My experience in teaching languages is that Java and C# were atrocious grind generators whereas Python, Ruby and C were much smoother. JavaScript sits in the middle.

> Can you name a school that teaches a student only one CS course expecting them to be a programmer?

This is never explicit, but the reason many schools make Java the default language is because it's more "employable" than Python or Scheme.

If you're a CS major, you've probably been exposed to Lisp or Python in your AI class, C in your OS course, and ML and Prolog in your PL class, so the "default language" matters a lot less from a CV perspective, because you've been exposed to more languages. The default language matters in this way for those who take only one or two CS courses and then later try out for programming jobs.

Deleting from a container while you are iterating over it should always raise red flags.

Depends on what you want to accomplish and what your container and iterators support. If your Iterator offers a delete method, I think it's a very elegant and clear way to filter a container.

  for (Iterator it = container.iterator(); it.hasNext(); )
    if (!predicate(it.next()))
      it.remove();
It's more ugly and error prone if you've got to juggle an index, though.

Even with plain C, you can just iterate backwards:

    for (i = ctr_size(container); i > 0; i--)
        if (!predicate(container, i - 1))
            ctr_remove(container, i - 1);

Use the goes-to-zero operator.

  for (i = container.size; i-->0;)
    if (deletep(container, i))
      ctr_remove(container, i);

Reminds me of something :)

    for (i = ctr_size(container); i--; )
        if (!predicate(container, i))
            ctr_remove(container, i);

That's arguably better. Nitpick: start at ctr_size(container) - 1. [Feel free to edit your post, and I'll just delete this one.]

Nope. The decrement is occurring in the test, so it occurs after the initialization and before the first iteration.

  filter predicate xs

Haskell, I assume. Two things really make Haskell shine when it comes to problems like this: 1) higher-order list operations -- you're not iterating through indices, so there is no possibility for off-by-one errors, and 2) immutability -- your code takes a new list with the undesirable elements removed; you can't remove elements of a structure at the same time you're iterating through it.

I always love the problems on SPOJ where they give you the number of cases up front, because in Haskell you can almost always throw out that value. Your map function knows when the list is out of elements.

Yeah, I was thinking Haskell, but any functional language will do. Problems like the one in the blog post really make you appreciate them.
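The new-list idiom isn't limited to Haskell; as a quick illustrative sketch, here is the same filter in Python (the function name `keep` is made up for this example):

```python
def keep(predicate, xs):
    # Build a brand-new list; xs is never mutated, so there is
    # no iterate-while-deleting hazard at all.
    return [x for x in xs if predicate(x)]

nums = [1, 2, 3, 4, 5]
evens = keep(lambda x: x % 2 == 0, nums)
# evens == [2, 4]; nums is still [1, 2, 3, 4, 5]
```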

The snippet I posted can easily be made into a generic function<T> that takes a collection<T> and a predicate<T>. For example, the Apache Commons library offers such a function for Java, so where available the code comes down to the fairly similar

  CollectionUtils.filter(collection, predicate);
Of course, aside from being a bit more verbose, the predicate needs to be an object (often a singleton), because you can't pass functions around.

It's not as concise as Haskell, but the addition of FP capabilities to C# make it a lot more pleasant as a practical language than it was in earlier days.

var filteredCollection = collection.Where(x => predicate(x));

As far as I remember you can write that as

  var filteredCollection = collection.Where(predicate);
I agree that C# is a fundamentally usable language.

l = [i for i in l if predicate(i)]

  while (container.Size > 0)
    container.Remove(container[0]);
No need for variables.

No need for looping :)

Yeah, sure :), but I meant those other cases where there is no Clear() method. :)

The correct idiom is delete list head until list is empty.

The iterator may become invalid if the collection it derived from changes.

Iterator delete methods are crazy to begin with since iteration does not correlate with deletion. But anyway it is not clear where the iterator's cursor will point after you delete the current element. You could end up deleting every second element in the container.
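That "every second element" failure mode is easy to demonstrate. A small Python sketch (the same hazard exists in any language whose iterator doesn't cooperate with removal):

```python
xs = [1, 2, 3, 4, 5, 6]
for x in xs:        # the list iterator's hidden index keeps advancing...
    xs.remove(x)    # ...while the elements shift left underneath it
# Each removal skips the element that slid into the current slot,
# so exactly every second element survives: xs == [2, 4, 6]
```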

My example wasn't clearing the list, it's a filter. And depending on what language and library you're talking about, it's very clear where the cursor will point to after a delete. There's nothing crazy about it with an even halfway sane collection framework.

Discussion about iterators removing anything aside, I don't think your example disagrees with the parent's point. His point was that iterating over a container while removing items should raise red flags. And it should, always. You should always inspect such loops with more scrutiny because more things are going on, and it's surprisingly easy to get such code wrong as multiple things are changing out from under you.

In the systems class I TAed, when students had memory corruption problems, removing items while iterating over a linked list was at the top of my list of things to look for.

Isn't clear() itself a "while (not empty) delete element"?

A while loop isn't in itself an iterator. There's a difference between

  while (collection.Count > 0)
      collection.RemoveAt(0);
.. which is a working implementation of clear, though probably superfluous.

  int i = 0;
  while (i < collection.Count)
      collection.RemoveAt(i++);
which does try to walk along the list and is broken.

Also, .Net has a specific exception to stop you modifying a collection while an enumerator is walking along it. Google for "Collection was modified; enumeration operation may not execute"

Maybe, maybe not - depends on the container.

But that's not the point: while (not empty) is not iterating over the collection, so there is no danger of corrupting an iterator.

Very unlikely. The class implementation can almost certainly do it with an O(1) operation, like "self._internal_count = 0".

or self._tabs = new List();

It might be. It also might be a "new List<TabPage>" or even some faster, internal construct. I don't think .NET makes a promise here, though I could be wrong.

Especially when you're re-calculating the count on each iteration :) Inefficient and incorrect. Double whammy. While not as efficient as while(--), the fix gets the point across:

    for (int i=0,j=this.MyControl.TabPages.Count; i < j; i++) {
        this.MyControl.TabPages.Remove(this.MyControl.TabPages[i]);
    }

The problem is that .Count decrements every time you .Remove(). Either change [i] to [0] in your code, or initialize i with .Count-1 and decrement in the loop.

For the record, the second way is typically better: It is usually more (sometimes much more) efficient to remove from the end of a list/array than to remove from the beginning.

True for arrays. Very wrong for singly-linked lists, as in lisp, haskell, erlang, etc.

I'm genuinely curious -- why? The last time I coded my own linked-list, it was doubly-linked, so deleting either the head or the tail was exactly the same.

It's heavily implementation-dependent, but for many implementations of lists and arrays, shift is slower than pop. Shift tends to require moving the entire array around, while pop does not. It's rarely the case with doubly-linked lists, or with Perl arrays (they do something special; they keep the starting offset recorded, or something).

For your case, deleting from either end ought to be fine, but you've made the other implicit tradeoff, because merely accessing items in the middle of a linked list will be slow. In the case of something like a JavaScript array, removing from the front is 80% slower than removing from the end:


Same deal with Python lists. From the Python spec:


It is also possible to use a list as a queue, where the first element added is the first element retrieved (“first-in, first-out”); however, lists are not efficient for this purpose. While appends and pops from the end of list are fast, doing inserts or pops from the beginning of a list is slow (because all of the other elements have to be shifted by one).
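The same Python documentation recommends collections.deque when you need fast removal at the front; a quick sketch:

```python
from collections import deque

q = deque([1, 2, 3, 4])
first = q.popleft()   # O(1): no shifting of the remaining elements
last = q.pop()        # O(1) at the tail as well
# first == 1, last == 4, and q is now deque([2, 3])
```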

Yikes. That's quite a difference in performance ... what I still don't understand though is, why? I've done a little bit of searching and I can't find anything so far on how JS arrays are implemented. Since JS arrays are objects and inherently support things like out-of-order indexes, non-integer indexes and that kind of thing, maybe we can assume it's some kind of hash map? A tree would make sense too, for faster accesses, and if a pop() on the tree meant removing the root node (and rebalancing the tree), then I suppose that would make sense, too.

But, honestly, for the majority of stuff in Javascript, I'd be surprised if some kind of hybrid hash-map / ordered skip list weren't being used instead. Ordered skip lists can be about as fast as a binary tree, without the costs associated with rebalancing, and a lot less complexity. You'd have some tradeoffs in memory usage depending on how you want to tune your skip list, but given the absurd memory requirements for modern software, that doesn't seem to be a consideration amongst programmers anymore.

So ... I accept that removing items from the head of a list in these higher-level languages is (a lot) more expensive. But I still don't get why.

JS arrays are implemented in a variety of ways in different JS engines, but at least V8 and Spidermonkey have a "fast case" where there's an actual C array (single contiguous chunk of memory) and fall back on a slower representation in the rare cases (large holes in the array, non-integer indices, etc).

So in the common case, you are in fact looking at a big memcpy every time you remove an object from the front, unless you do some magic with keeping track of a nonzero offset in your C array. V8 does that magic in some cases but not others, as far as I can tell.

Thanks! That explains a lot. I'm actually really surprised that they're implemented that way (pleasantly surprised) -- I didn't think even "high performance" code was generally done like that anymore.

Most of the time it is simply because a shift operation has to re-address each element in the array (if it is implemented like a ho-hum, classic array) and a pop operation does not have to do this.

I think the assumption was that even arrays indexed by integers are stored as JS Objects, which are basically a hashmap style structure, and so they wouldn't be implemented like a ho-hum, classic array. I'm not sure if that assumption is right, but if it is, there would be no need to re-address each element.

If you don't re-address each element, how do you map the value at index 1 to index 0?

One way is to leave the indexes intact and keep an offset around, so you may map the nth logical index to the nth physical one. (Add 1 on a shift, subtract 1 on an unshift.)

But if you wanted this behaviour, there are more efficient ways: http://en.wikipedia.org/wiki/Circular_buffer
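A minimal sketch of the circular-buffer idea in Python (the class and method names here are made up): removing from the front advances a head offset instead of shifting every element:

```python
class RingBuffer:
    """Fixed-capacity FIFO; popping the front moves an offset, O(1)."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0    # index of the oldest element
        self.count = 0

    def push(self, x):
        if self.count == len(self.buf):
            raise IndexError("buffer full")
        self.buf[(self.head + self.count) % len(self.buf)] = x
        self.count += 1

    def popleft(self):
        if self.count == 0:
            raise IndexError("buffer empty")
        x = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)  # no element moves
        self.count -= 1
        return x
```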

Linked lists aren't used very often in languages that have easy (memory managed, or resizable) access to arrays, unless the specific characteristics of linked lists are desired. Arrays are faster for almost all operations for most collection lengths that come up in human-provided data (i.e. less than 10, almost certainly less than 100 - and in this case, tab pages, very probably only 3 or 4). And on the flipside, if the idiom of the language is to prefer arrays over linked lists, it naturally follows that you tend to want to clear such collections from the end rather than the start.

In terms of performance, another consideration may be important here: invalidation and redrawing of the UI. Controls like tab pages may update the UI for every modification of the tab collection (unless updates have been suspended). Removing from the end will look slightly more pleasant than removal from the start in this case.

And for singly linked lists, deleting the last would be O(n), while the first would be O(1) (just change the head to point to the next node).

Parent has a point for arrays/arraylists, though, you need to copy everything after the element.

Singly linked lists can still maintain a tail and before-tail pointer, maintaining O(1) tail operations.

So after you delete the last element, how do you get the new before-tail pointer?

If you recalculate it right there, you've actually done nothing in terms of the algorithmic complexity. If you defer it either until it's needed or until you next enumerate the list, then you get to O(1) in the case of individual removes at the end (as long as they're interspersed with other operations), but you're still O(n^2) for removing the entire list starting at the tail.

Sigh. You're right. It's amazing how many linked lists I've implemented, and how often I still can screw it up. A singly-linked list with a head and tail pointer allows O(1) tail insertion, but still has O(n) tail deletion.
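For anyone following along, a minimal Python sketch (hypothetical Node class) of why the asymmetry exists: deleting the head is one pointer update, while deleting the tail of a singly-linked list requires walking to the node before it:

```python
class Node:
    def __init__(self, value, next=None):
        self.value, self.next = value, next

def delete_head(head):
    return head.next                   # O(1): just move the head pointer

def delete_tail(head):
    if head is None or head.next is None:
        return None                    # empty or single-element list
    node = head
    while node.next.next is not None:  # O(n): walk to the before-tail node
        node = node.next
    node.next = None
    return head
```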

For a list there's not likely to be a difference. For an array you have to shift all the elements after the one you're removing down by one place; if you're removing from the start of the list you have to adjust every item in the list, but if you're removing from end you don't have to adjust any items.

Won't this.MyControl.TabPages[i] be invalid once i becomes greater than what this.MyControl.TabPages.Count now is? Since the "always remove element 0" code from the article works, I expect your code won't work, because when you remove an element from the list, all subsequent elements now have an index one less than before, so after j/2 elements, you'll overrun the list.

Correct. I think the logical thing to do would be to delete from the end so that the index is always valid.

for (int i=this.MyControl.TabPages.Count - 1; i > 0; i--) { this.MyControl.TabPages.Remove(this.MyControl.TabPages[i]); }

Imagine the count is 2. The first iteration you delete item 1, the second you delete item 0, and then the loop exits.

EDIT: Actually, as someone else pointed out, it's clearer to use a while loop that deletes the 0th item until the collection is empty.

Actually, you will never delete the final element with that code.

I think what you meant was:

for (int i=this.MyControl.TabPages.Count; i > 0; i--) { this.MyControl.TabPages.Remove(this.MyControl.TabPages[i-1]); }

I prefer:

    for (int i=this.MyControl.TabPages.Count - 1; i >= 0; i--) {
        this.MyControl.TabPages.Remove(this.MyControl.TabPages[i]);
    }
Though a simple while loop is much easier to follow, even if it's less efficient than removing the elements in reverse.
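The reason reverse order is safe is that removing an element only renumbers the elements after it. A small Python sketch of filtering in place this way:

```python
xs = [0, 1, 2, 3, 4, 5]
for i in reversed(range(len(xs))):  # highest index first
    if xs[i] % 2:                   # drop the odd values
        del xs[i]                   # only already-visited slots shift
# xs == [0, 2, 4]
```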

Although your point on efficiency stands (at least with data structures that have to reshuffle contents on deletion), the type of loop that you use has nothing to do with the order that you delete the elements. You could easily do something like:

while (MyControl.TabPages.Count > 0) { MyControl.TabPages.RemoveAt(MyControl.TabPages.Count-1); }

For loops are nothing more than while loops with:

(1) an assignment (int i = MyControl.TabPages.Count in this case)

(2) an extra command (i-- in this case) added to the end

Sure. Though if we're talking about efficiency, I would imagine that counting backwards would be slightly faster than getting the current count each time. Though in reality this depends on way too many factors - I imagine TabPages would be stored in cache and getting the count is just as fast as counting backwards. Micro-optimization and all that.

Regarding for vs while, I find the choice is important only in the intent they emphasize: while puts the emphasis on the condition, whereas for puts the emphasis on the iteration. I think in this case the condition (that the list is not empty) deserves more emphasis than the iteration through the elements of said list - hence why I find the while version more readable. YMMV and all that :)

Exactly. For example, in JavaScript, doing something.length would result in re-counting the number of elements, while doing the loop in this style would efficiently store the count in a variable that is fast to access and manipulate. I suppose if you want to be a real optimization junky you'd also use --i instead of i--.

> I suppose if you want to be a real optimization junky you'd also use --i instead of i--.

Ten years ago, sure, but nowadays I trust the compiler to do this for me ;-)

That's also what I read of it.

It's TFA's method, except broken (or not fixed, word it as you prefer).

The article's point about writing code that does what it says it does is fine. But as an interview question, I have trouble imagining that candidates won't figure out that this is a game of "guess the answer I'm looking for" and say that they would rewrite this code.

A better question would be, there's tremendous deadline pressure, the company is in imminent danger of losing a giant deal if we don't have working code for some demo, and you have three features to implement by Monday afternoon. Do you write a new feature immediately, open a ticket for refactoring this loop and then write a new feature, or rehearse your explanation to the big boss that over the lifetime of the software, rewriting the code before adding a new feature was more important?


Just kidding, but trying to make the point that "what do you think of this code" is a little obvious as an interview question.

Most do say exactly that, then we have a chat about what they might do instead. Many immediately say, "um, is there a Clear or RemoveAll method?!" It's all over within a few minutes, we move on. The bigger picture is always going to trump the details, until the day the details have piled up and can no longer be ignored. It's the great big technical-debt elephant in the room.

I think saying "yes, there's a Clear, but what would you do if there weren't?", or "yes, there's a Clear; how would you implement it if you were writing the library?" works well. Interview questions where the right answer is "I know an API or assume something about it correctly" are stupid, but questions where that is the right answer with a more interesting follow-up are not.

The answer isn't "I know an API or assume something about it correctly". The answer is "Gawd that's fucking awful, I can't believe there's not a clear() method. I'd go look, before reinventing the wheel. And if lib was internal I'd suggest adding the clear() method instead of adding horrible loops all over the code <mentions DRY>. And if 3rd party lib I'd question the quality and if we really should be using that lib".

In otherwords showing confidence, humility, communications, and that they have a clue about quality.

Yes, that would be a great answer, and ideally they get to the part you mentioned about adding it to an internal library, to which the next question is, Ok, you're adding it, how do you write it? My point is merely that it's still worthwhile to see an implementation, rather than moving on when you hear the word "Clear".

Any implementation coded up on the fly, from memory, in 30 minutes or less is crap. I don't want to hire people willing to write crap, so I don't ask them to.

I expected that would be the case, just wanted to make it obvious. Meta point: Context matters for code as well as interview questions :-)

The correct answer is "This code looks ugly, and bad".

Thank you for making this as simple as it should be.

Agreed. "What do you think of this code?" is not a hard question.

Surely the right answer would be something to the effect of "If this code survived in the source base until deployment, all is lost anyway."

The point of code review is to catch monsters like this when they are written. If that doesn't happen, what's the point?

That's a very purist view of the world.

This code works and, as he says, was produced by a coder under time pressure. You show me a system and pretty much I'll show you a system that has poor (but working) code in it produced by a competent developer under pressure.

And code review is a useful process but it's no guarantee that issues will be caught any more than system testing or user acceptance testing.

And yet the world keeps on spinning and all is not lost.

I agree, you'd hope this was caught, but if someone gave that answer in an interview even aside from the tone, I'd wonder how much real world experience they had.

If the candidate picks up on the fact that the code is ugly and should be rewritten, and then is able to explain why, then I think the question is worth it. It weeds out the people who honestly don't see anything wrong with that kind of code.

I am not a programmer, so I did not know about the clear() method. I thought it was a rather elegant solution! I guess it's all about perspective.

The problem with this code is that it is quite obfuscated. The "i" variable is essentially always 0, because it is being incremented and decremented over and over. Here is code that does the same thing, but without the useless variable:

   while (this.MyControl.TabPages.Count > 0)
      this.MyControl.TabPages.Remove(this.MyControl.TabPages[0]);
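For the curious, here is a Python sketch of the original article's net effect: the increment in the loop header and the decrement in the body cancel out, so the index stays at 0 forever:

```python
pages = ['a', 'b', 'c']
i = 0
while i < len(pages):
    pages.remove(pages[i])  # always removes pages[0]
    i -= 1                  # the body's decrement...
    i += 1                  # ...cancelled by the loop's increment
# pages == [] when the loop exits, and i was 0 the whole time
```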

Anyone else read it and on the first pass think, "yeah that works". Then on second pass think, "it's not a good idea, but it works." Then finally think, "under pressure I've done worse; at least this works as intended. S/he should probably comment it."

Or is it only me?


I would also add that even as a junior programmer, Clear() was easily learned within the first few minutes, and usually when you have to use a hack like this it's because something has gone wrong. I wouldn't necessarily chalk this up to inexperience or deadlines; it could honestly be that there was a bug and this was the only way to get it to work.

No, I pretty much just recoiled in horror immediately.

Even if I had to avoid .Clear(), there are immediately-obvious, better ways.

  while ( !thing.Empty() )
    thing.Remove( 0 );
In fact, the code isn't just bad; it's risky. What if someone changes "int" to "uint"? It wouldn't even infinite loop / crash... It would remove exactly one element!

> What if someone changes "int" to "uint"? It wouldn't even infinite loop / crash... It would remove exactly one element!

I can't see that. If i is a uint, at the end of the first cycle, i-- changes i to the maximum uint, then i++ changes i to 0, then i < Count compares Count to 0. It is identical to, if uglier than, the following:

  while ( 0 < this.MyControl.TabPages.Count )
     this.MyControl.TabPages.Remove ( this.MyControl.TabPages[0] );
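If you want to convince yourself, the uint arithmetic can be simulated in Python with a 32-bit mask (illustrative only; C# uints wrap the same way modulo 2^32):

```python
MASK = 0xFFFFFFFF  # simulate 32-bit unsigned wraparound
tabs = list(range(3))
iterations = 0

i = 0
while i < len(tabs):
    tabs.remove(tabs[i])
    i = (i - 1) & MASK   # i-- : wraps to 4294967295
    i = (i + 1) & MASK   # i++ : wraps back to 0
    iterations += 1

print(iterations, tabs)  # 3 [] — still removes everything, uint or not
```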

Oh, yep. That's correct, thanks.

Just for the record, your method is inefficient for a lot of implementations of lists/arrays, where it is often far faster to remove from the end than to remove from the beginning.

This is the second time you said that in this thread, now I can't restrain myself anymore - 'list' != 'array', and it all depends on how they're implemented; I see no reason to assert that for 'most implementations' removing from the end is faster. For a single linked list, it's faster to remove from the start. For a double linked list, it doesn't matter (well it depends, it could be slower). For a 'we call it a list but all containers are really hashes', it doesn't matter. For a regular C array, it depends on how you work with it.

The post you are actually responding to answers your questions. Javascript and Python are mentioned.

"Lists" in python are not in fact lists (as the term is traditionally used to refer to linked lists). They are array backed. I expect "arrays" in javascript are similarly arrays...
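For what it's worth, the cost difference is easy to see in Python: `list.pop(0)` shifts every remaining element of the array-backed list, while `collections.deque` gives O(1) removal at both ends:

```python
from collections import deque

# list: removing from the front shifts all remaining elements (O(n) per pop)
front_heavy = list(range(5))
while front_heavy:
    front_heavy.pop(0)   # works, but each call is O(n) on an array-backed list

# deque: a linked-block structure makes both ends O(1)
both_ends = deque(range(5))
both_ends.popleft()      # removes 0 in O(1)
both_ends.pop()          # removes 4 in O(1)
print(list(both_ends))   # [1, 2, 3]
```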

roel_v is correct to point out this misunderstanding.

But we are talking about tabs here. Premature optimization is the root of all evil (to throw another cliché).


I really like your insight about the accumulation of decisions over the course of a project - I've never thought about it in quite that way, that by having one default decision or the other, your project's destiny is completely different. Picking a sane default answer to that question for a project should be thought of as one of the (many) important jobs of its creator.

edit: I really don't understand why the parent comment was downvoted and then deleted.

Spoken by a man who has never encountered a "tabbed notebook"-style UI with multiple rows of tabs, stacked on top of each other. ;)

Premature optimization is the root of all evil, but removing from the beginning of an array-based growable structure rather than the end will turn an O(n) operation into an O(n^2) operation.

And you don't know what's going into those tabs. That cliché shouldn't be used to excuse positively brain-dead choices.

but removing from the beginning of any array-based growable structure rather than the end will turn an O(n) operation into an O(n^2) operation.

I'll bite: in arrays, front deletion (like back deletion) can be done in O(1) time. Keep a pointer to the first element, and just increment it upon front deletion (and free the element if necessary). Obviously, you may want to be smart to avoid memory use getting out of hand.

And you don't know what's going into those tabs.

If the array stores references (pointers), which you'd expect with tabs, that is orthogonal to the discussion. Whatever is going into those tabs, a pointer is a pointer.
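A minimal Python sketch of the head-pointer trick described above (the class name and the compaction threshold are made up for illustration):

```python
class FrontSlicedArray:
    """Array wrapper where front deletion only advances a head index."""
    def __init__(self, items):
        self._items = list(items)
        self._head = 0                    # index of the logical first element

    def pop_front(self):
        item = self._items[self._head]
        self._items[self._head] = None    # drop the reference ("free" it)
        self._head += 1                   # O(1): no shifting
        # Keep memory use in hand: compact once half the backing array is dead.
        if self._head > len(self._items) // 2:
            self._items = self._items[self._head:]
            self._head = 0
        return item

    def __len__(self):
        return len(self._items) - self._head

fsa = FrontSlicedArray([10, 20, 30, 40])
print(fsa.pop_front(), fsa.pop_front(), len(fsa))  # 10 20 2
```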

> I'll bite: in arrays, front deletion (like back deletion) can be done in O(1) time. Keep a pointer to the first element, and just increment it upon front deletion (and free the element if necessary). Obviously, you may want to be smart to avoid memory use getting out of hand.

Actually, that's how array-based queues and deques are often done. To help with memory use, they let the tail pointer wrap around past the end of the array (so you could have the tail at index 2 and the head at index 8).
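A toy Python version of that wrap-around (ring-buffer) queue, with the head/tail arithmetic made explicit (names and fixed capacity are invented for the sketch):

```python
class RingQueue:
    """Fixed-capacity FIFO queue; head and tail wrap around the backing array."""
    def __init__(self, capacity):
        self._buf = [None] * capacity
        self._head = 0   # index of the next element to pop
        self._size = 0

    def push(self, item):
        assert self._size < len(self._buf), "queue full"
        tail = (self._head + self._size) % len(self._buf)  # wraps past the end
        self._buf[tail] = item
        self._size += 1

    def pop(self):
        assert self._size > 0, "queue empty"
        item = self._buf[self._head]
        self._head = (self._head + 1) % len(self._buf)     # head also wraps
        self._size -= 1
        return item

q = RingQueue(4)
for x in (1, 2, 3):
    q.push(x)
q.pop()
q.pop()                  # head advances to index 2
q.push(4)                # tail lands at index 3
q.push(5)                # tail wraps: (2 + 2) % 4 == 0
out = [q.pop(), q.pop(), q.pop()]
print(out)  # [3, 4, 5] — FIFO order preserved across the wrap
```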

Yep, that's correct. I wish "thing.Length - 1" were as clear as "0", but it's not.

Besides, if you really cared about efficiency, you wouldn't be using growable containers... =)

There's usually a Remove() with no arguments that, depending on the data structure involved, will remove from either the beginning or the end (whichever is more efficient for that data structure).

If I'd had to resort to iteration for this, I would have included a prominent, bitchy comment/complaint pointing out how stupid it was that there was no Clear() call or that the Clear() call wouldn't work in this case, etc.

Modifying a loop variable inside a for() loop is generally a very bad idea, that's a red flag for me.

I agree; I think this implementation is completely sloppy. At the same time, I guess I've been desensitized to the word "wrong". To me, wrong just means the results are wrong or it doesn't work. In this instance it actually does work, but there are dozens of other ways that would have worked without giving you the Coding Horror face.

The code was unacceptable from line 1 onwards, and would not pass code review with me on watch.

For one, for loops have invariants.

Next up, you are altering the loop bound in the loop. I do not care if the code works, the code is hard to reason about when it goes all non-linear like that.

Finally, there is a goddamn Clear method.

That is the kind of code you see the next morning and delete; hoping none of your peers review the version control log that closely.

Perhaps the programmer wanted to run test functions on the objects as he deleted them? Or perhaps the data structure is a vector of pointers in C++, in which case using the standard clear method would introduce a massive memory leak.

There are a lot of reasons why someone doing difficult work with complex objects would use a loop to delete them, and using invariants is only possible if you have immutable data structures.

> Perhaps the programmer wanted to run test functions on the objects as he deleted them?

Doesn't justify anything that was done.

> Or perhaps the data structure is a vector of pointers in C++, in which case using the standard clear method would introduce a massive memory leak.

Doesn't justify anything that was done.

> There are a lot of reasons why someone doing difficult work with complex objects would use a loop to delete them

Then use a while loop, and make this explicit. While loops have tests, for loops have invariants. There is a reason both exist, and it's not purely historical.

> and using invariants is only possible if you have immutable data structures.


If someone used clear on a vector of pointers AS RECOMMENDED IN THIS ARTICLE, they've just introduced a serious, serious problem.

readability < does it work

"readability < does it work"

No. In a professional setting, readability ~= does it work.

The code presented works almost by accident. It is clear there is no thought put into it at all. It is confusing, and subsequent modifications could cause errors very easily. It is not acceptable code.

Of course, code needs to work. Of course, if this .Clear method is a memory leak you wouldn't use it. Of course, we need to consider these things. But there is no situation in which this code is acceptable. This code would not even warrant a passing grade in an "Introduction To Programming" class.

> Of course, if this .Clear method is a memory leak you wouldn't use it.

Yes. Exactly.

> using invariants is only possible if you have immutable data structures.

Wow, dude, maybe if you don't know what the words mean, you shouldn't argue about them.

What business do you have using a vector of bare pointers anyway? If you use smart pointers, the container's clear() will automatically free resources held by container elements.

Desire to avoid garbage collection and automatic memory management mostly.

Like the blog author, I thought it was obvious what the problem was: Every time I read that, I'm going to have to figure out what it means. Any time there's a problem or change to code in that area, I have to stop and understand what it's doing.

To clean it up, I'd do 1 of 2 things: Either write a .clear() function, or rewrite it to start at the end and clear the items in reverse.

With the .clear() function, I can at least ignore it because it was tested and worked. (You do write tests, right?)

With it in reverse, it's something I've done numerous times because of how lists work in certain languages. I'd instantly recognize that it's going backwards because the list always starts at 0.

If I wrote it in reverse, I'd also write a comment about why it's in reverse, though, so that anyone else can instantly know why, as well.

Another option would be to run a while() loop that keeps running until the current .count() value is 0.

The other nice thing about running it in reverse is that the .count method (assuming it's a method and not a property) needs only to be accessed in the initial condition setup.

If you're accessing the size of, say, a linked list, you end up sneaking an O(n^2) runtime because it has to re-count the size of the list every iteration to check for termination.

c.f. https://secure.wikimedia.org/wikipedia/en/wiki/Schlemiel_the...

> If you're accessing the size of, say, a linked list, you end up sneaking an O(n^2) runtime because it has to re-count the size of the list every iteration to check for termination.

I'd expect any decent linked list implementation to keep an integer with the current size. Java certainly does: http://www.docjar.com/html/api/java/util/LinkedList.java.htm...

> If you're accessing the size of, say, a linked list, you end up sneaking an O(n^2) runtime because it has to re-count the size of the list every iteration to check for termination.

You're still going to run in O(n^2) if you remove from the back (as you'd do if you run in reverse).

Linked lists are far more efficient if you build in reverse but remove in iteration order (just replace your head pointer with head.next).
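A sketch of that in Python — a hypothetical minimal singly linked list with a cached size, so the length check is O(1) and front removal is just `head = head.next`:

```python
class Node:
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

class SinglyLinkedList:
    def __init__(self, values=()):
        self.head = None
        self.size = 0                    # cached, so len doesn't re-count
        for v in reversed(values):       # build in reverse: O(1) prepends
            self.head = Node(v, self.head)
            self.size += 1

    def pop_front(self):
        value = self.head.value
        self.head = self.head.next       # O(1) front removal
        self.size -= 1
        return value

lst = SinglyLinkedList([1, 2, 3])
drained = []
while lst.size:                          # O(1) check thanks to cached size
    drained.append(lst.pop_front())
print(drained)  # [1, 2, 3] — removed in iteration order
```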

Though you have to be careful with that, because the obvious code for removing from a collection in reverse:

  for(i = list.size; i >= 0; i--)
Is itself a schlemiel algorithm on any singly-linked list...

You're removing i+1 elements here :)

That really depends on the implementation of the linked list. I don't think it would actually behave O(n^2) in most implementations.

for(i = list.size; i--;) list.removeAt(i);
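In Python terms, the two variants behave roughly like this (the first over-runs by one index; the second is the test-then-decrement fix):

```python
items = [0, 1, 2, 3]

# Buggy: for(i = list.size; i >= 0; i--) visits index len(items) first
try:
    for i in range(len(items), -1, -1):   # len .. 0 inclusive: one index too many
        items.pop(i)
except IndexError:
    pass  # blows up immediately on items[len(items)]

# Correct: for(i = list.size; i--;) tests-then-decrements, so i runs len-1 .. 0
items = [0, 1, 2, 3]
for i in range(len(items) - 1, -1, -1):
    items.pop(i)                          # always pops the current last element
print(items)  # []
```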

Besides a .Clear(), wouldn't the second most obvious be to have a count variable external to the loop that is initialized to the number of tabs before entering the loop? Then you still count up, and are able to reach the end of the list.

Either way, we have just come up with three much clearer solutions in what I would guess is at most 5 minutes between us. I would guess we also have the luxury of it not being 9pm with our jobs on the line. I was happy to see the author include note of that rather than just call the original programmer an idiot for not knowing how to write maintainable code.

I'm sure we've all had our 'Why did I code that so badly???' moments. ;) I stopped getting mad at people for bad code quite a while back. It doesn't help either of us. Instead, I just fix it. The bonus to me is the little rush from improving something. Thank goodness for that little rush.

seems more like continuous rush :-(

This is broken in a different way (assuming you mean something like the following):

  int c = this.MyControl.TabPages.Count;
  for ( int i=0; i < c; i++ )
     this.MyControl.TabPages.Remove ( this.MyControl.TabPages[i] );
Let's say you have 4 items. That'll go something like this:

  Original list: (0,1,2,3)
  Remove 0    Item 0 removed, items (1,2,3) become (0,1,2)
  Remove 1    Item 1 (originally 2) removed, items (0,2) become (0,1)
  Remove 2    Only two elements left in the list (0 and 1), so there is no index 2 anymore...
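The same failure reproduces readily in Python, since the cached count outlives the shrinking list:

```python
tabs = ["a", "b", "c", "d"]
count = len(tabs)            # cached before the loop, like `int c = ...Count`
removed = []
try:
    for i in range(count):
        removed.append(tabs[i])
        del tabs[i]          # survivors shift left, so index i skips one
except IndexError:
    pass                     # i == 2, but only two elements remain

print(removed)  # ['a', 'c'] — every other element, then the crash
print(tabs)     # ['b', 'd'] never got removed
```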

Great post - I can get pretty OCD about the way code is written, but I have a hard time complaining about things like this without feeling like a dick since as you pointed out it's not particularly inefficient (even if there was an O(1) Clear() method, how many TabPages are we really working with that it would matter?) But you've reassured me that it's a reasonable thing to do, especially if I know beforehand that it's a piece of code that will probably be used for a long time.

That said, I've learned that with an early stage startup where you're trying to iterate as quickly as possible in a desperate attempt to get somebody to care about your product, you often have to pick your battles. Just yesterday I came across this:

    category_count = []
    for i in range(10):
        category_count.append( db.execute("SELECT count(*) FROM table WHERE category = %d" % i) )
For one thing, this iterate-then-append approach in Python annoys me slightly (list comprehensions are so much cooler!). But far worse, it hits the DB 10 times instead of once, and no matter how small your site is, you obviously can't be having that. How'd it get there? Who knows. It was written in the Django ORM, where the only way to do this is with a pretty obscure incantation like Object.values('category').annotate(count=Count('category')). At the time we were picking up Django as we went, so whoever wrote it probably had no idea that the values() or annotate() methods even existed, and the way it was written got something up on the page and working so we could decide whether or not we'd be throwing it out the next week.

But, whatever: you come across something like this, go throw up, fix it, and move on. And finding these kinds of issues puts into perspective smaller ones like using a for loop where you meant to use a while loop.
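The single-query version boils down to a GROUP BY; here's a sketch using sqlite3 with a hypothetical `items` table (the real code used the Django ORM, so this only shows the shape of the fix):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (category INTEGER)")
conn.executemany("INSERT INTO items VALUES (?)",
                 [(0,), (0,), (1,), (2,), (2,), (2,)])

# One round-trip instead of one query per category:
counts = dict(conn.execute(
    "SELECT category, COUNT(*) FROM items GROUP BY category"))
category_count = [counts.get(i, 0) for i in range(10)]
print(category_count[:4])  # [2, 1, 3, 0]
```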

tl;dr - Having the luxury of sexying up your code as described in the post is strongly dependent on the stage that the project/company is in.

The code in question:

  for ( int i=0 ; i < this.MyControl.TabPages.Count ; i++ )
     this.MyControl.TabPages.Remove ( this.MyControl.TabPages[i] );
Nice analysis into the thinking that went into creating such bad code. Took me a while to even see the i-- at the bottom.

One of the advantages I find in pair-programming is that the 'navigator' (the one who's not typing) will often catch this kind of thing as it is happening. "Hey, that's just a while loop, or better yet just use `Clear`". There are other (greater) advantages to pairing, but this example is something we typically avoid before it gets committed.

That chart of development costs ignores the fact that only successful projects get maintained. Many projects simply get abandoned before they ever gain traction and at that point code quality becomes meaningless.

That's a good point. And if the project consists of mainly bad code, it probably contributes to the project being abandoned, as it gets harder and harder to fix bugs and add new features in a code base like that.

Most projects are abandoned due to political reasons. Very rarely do they get dismissed because someone didn't code to whatever standard of the moment.

Exactly. If you spend 2 years and 10 million developing some new HR software that IBM spends 5 million/year supporting for the next 18 years, then that's a success, even if you spent 90% on support. In fact, the better the project is, the more likely you are to spend more money supporting it after the initial release. The only way to reduce that cost is to spend so long designing your software that it's never actually released.

So it's a win-win?

Whenever a loop index is modified inside a for loop, it should raise immediate red flags. Also, with intellisense in Visual Studio, it shouldn't take more than a few seconds to check if there is .Clear() or .RemoveAll() method.

That said, I've been guilty of doing stupid things like this many times when I'm tired and just want the damn thing to work. It's amazing the kind of errors you make in situations like that.

Yup. A for loop's index should be completely controlled by the control clause. Modifying that control variable inside the loop should trigger air-raid sirens. It wouldn't surprise me if static analyzers flagged it as an error.

I think this is more of an indictment of how we code rather than the programmer. There is nothing fundamentally wrong with code-by-experimentation. With libraries and frameworks growing in complexity over the last decade or so, it's all but impossible to hold all the details in your head of whatever piece of abstraction you're computing with. With dynamic languages and REPLs, it becomes even more standard to experiment until we get the correct result.

The problem is that imperative programming is horrible for code-by-experimentation. You end up with code that works, but is hideously unreadable. Declarative styles can help greatly with this. Functional programming can be a big boon here. But I think we're going to need a fundamental shift soon in either tool quality (say, to automatically refactor that shit code into the most straightforward and readable way), or a new paradigm that will allow code-by-experimentation to always result in readable code.

When pressed for a deadline, a demo, a functioning "something" - I do stuff that might not be the right way to do things because I need to get it to work. If the code ever has a chance of being witnessed by someone else, I always try to:

/* Can't find a clear/remove method, don't have time to screw around with it now, might revisit later, might not. Sorry. */

I should add that I often make these comments even if my code doesn't have a chance of anyone else looking at it. I forget way too often why I did something and need my own reminders. For all we know, this guy may have actually had a good reason to do this (work-around, faster, ...) - a simple qualifying comment for doing something out of the ordinary would have explained it all.

I find this a lot with HTML guys I work with. They add a few px of padding to the top of something to get it vertically centered on Chrome and then it's broken on IE. Then I tell them to fix IE and then it's broken on Chrome.

Then it's two more hours of screwing around until they get it right. Then I show it to them on a notebook with a different DPI setting...

"Let’s pretend for a moment that we are a harassed contract programmer working late, under intense pressure to deliver working code before we can go home."

I would hope that in those kinds of situations I would remember to add a FIXME comment so that I would come back in saner times and make it nice.

Depending on the state of mind, one might even have done something like this (as I can't tell the language used in the example):

this.MyControl.TabPages = new Array();

Try to find something like this when the problem you are confronted with is that a server has to be rebooted every few hours because it eats up memory.

What's wrong with the code, then, is that it was written under a little too much pressure for the developer to think clearly.

    for ( int i=0 ; i < this.MyControl.TabPages.Count ; )
       this.MyControl.TabPages.Remove ( this.MyControl.TabPages[i] );

Why not just:

  while (this.MyControl.TabPages.Count > 0)
    this.MyControl.TabPages.Remove ( 0 );

Indeed. But the point I was making was that this hypothetical coder introduced more complexity by adding "i--;" whereas the code could equally well be fixed, in the same number of keystrokes, while reducing complexity, by removing "i++;".

Very inefficient on most array lists (but a very good idea on linked lists)

How many tabs do you use?

Less than 50?

Then don't worry about the optimization until you have to port it to a PDP-10.

> Less than 50?

No, over nine thousand.

> Less than 50?

No, way more than that.

  while (this.MyControl.TabPages.Count > 0)
    this.MyControl.TabPages.Remove ( this.MyControl.TabPages.Count-1 );

Or Even this.MyControl.TabPages.Clear();

As soon as I saw that snippet I could see what's wrong.

In C# / .NET you can't remove an element from a collection while you're enumerating over it. You can, however, remove the last element; as that's the final iteration, the enumerator isn't used again, so it won't throw an error. The original developer probably tried to remove elements while iterating forward first, encountered an error, and wrote the code to loop through it backwards, using the random-tweaking technique.

What's rather depressing is that a lot of developers I've encountered use the random tweaking methodology, instead of figuring out what's really happening.

Actually, that's not the problem :)

The can't-modify-a-collection-while-enumerating-it issue only comes into play if you're actually using an enumerator (either directly, or as part of a foreach loop) - the code in the article uses a plain for loop along with indexing into the collection, and wouldn't run into the problem.

Rather, the primary "issue" is that without that "i--" at the end it only removes half the elements - after removing an element, all the following elements shift back one index, and so the very next element never gets removed.
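A quick Python check of that "skips every other element" behavior:

```python
tabs = ["t0", "t1", "t2", "t3"]
i = 0
while i < len(tabs):      # plain forward for-loop, no i--
    tabs.remove(tabs[i])  # everything after i shifts back one slot...
    i += 1                # ...so the shifted-in element is skipped

print(tabs)  # ['t1', 't3'] — half the tabs survive
```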

Sorry, I should have been clearer. I was getting flashbacks to when I saw a similar problem (except in a for each loop), that's what set the alarm bells ringing in my head.

When I see nasty code like that, I tend to stop parsing it fully and sniff out the intent. I think it's a form of bad-code blindness (like banner-ad blindness): my brain is protecting me from all the bad code I've seen. If I fully parsed all the really bad code properly, I'd become a dribbling wreck. :) So I tend to look at it at a higher level instead, to stay sane.

1. No iterator is being used here, so the iterator coherence check does not come into play

2. List elements are removed from the front; the code is essentially a complicated version of:

    while (0 < this.MyControl.TabPages.Count) {
        this.MyControl.TabPages.Remove ( this.MyControl.TabPages[0] );
    }

I've also found it depressing that you can't remove an element from an enumerator while you're enumerating through it

Why? I like to think library developers have my back. What's the point in an iterator if it's broken on the 1st thing I try? (OK, the 2nd thing; 1st I code a search through the list, then the teardown.)

Iterators are also pretty much broken when 1) inserting into a list, 2) merging a list into a list, 3) deleting an item from a list.

Ok, they're pretty much broken for absolutely everything BUT searching lists.

I remember the profound disappointment I felt when Java 1 came out, and its iterators were this same lame junk.

So I never use them. They are born to create bugs like this one.

Yeah well except that that's not even related to what's happening here. Good thing though that you spotted it immediately and that you're smarter than all the developers you encounter ^_^

Well, when working on a code base, the first thing I think about is who wrote this and when. How experienced was the developer when they wrote it? What's their coding style, and was it written after coding standards were introduced? I've had the misfortune of inheriting a lot of bad legacy code in the past, and when you see a chunk of bad code, you can either spend a good chunk of time figuring out what it's really doing, or less time figuring out the intent. Depending on who wrote the code, what it does, and various other factors, you can decide which path to take.

With this code snippet it's obvious that the intent is to remove all the elements. So it can be fixed with a .Clear();

From a higher level, you can see that the snippet smells bad, and will need some attention.

I didn't say I'm smarter than all the developers I encounter. I've just encountered a lot of bad developers in my time.

If the code compiles and does what it is supposed to do, then the answer is that nothing is 'wrong' with it.

Writing code that does what it is supposed to do is often not the challenge of software engineering - but writing code that can be easily tested, refactored, altered and ultimately understood by other developers is the harder part.

The conditional statement used in the 'for' loop, whose value cannot easily be determined, is not helpful, and the i--; is 'unusual'.

In any case, it is more useful to code review the unit tests than the code itself.

Consider this:

  for ( int i=0 ; i < this.MyControl.TabPages.Count ; i++ ) {
     try {
        this.MyControl.TabPages.Remove ( this.MyControl.TabPages[i] );
        i--;
     } catch(Exception e) { }
  }

Can this be simplified? If this syntax were used elsewhere in the program, but we didn't want to catch the exception in this particular case, should we copy-paste this, and remove the try block? Might there be a situation where preserving the syntax makes the program clearer?

OK, fairly newbie coder here..

What would be wrong with just setting a variable to the value of 'this.MyControl.TabPages.Count' outside of the for loop and referring to that? i.e.:

    var x = this.MyControl.TabPages.Count;
    for ( int i=0 ; i < x ; i++ )
       this.MyControl.TabPages.Remove ( this.MyControl.TabPages[i] );
as a quick fix, or if someone did not know about while loops or the clear function?

Well, what happens when you try to remove the 10th page when there's only 1 left?

As a rule of thumb, don't ever rely on indexing into a collection if you do random deletes. Usually you'll just blow up your app gracelessly. Sometimes, epic failure ensues.

Through some feats of logic we might deduce that there's always a first element, though, until the collection is empty. So you might do this inside the loop: MyControl.TabPages.RemoveAt(0)

Needless to say, calling Remove when you have the bloody index (on IList collections that is) is counter-productive.

And I guess that code would be okay. I mean, if you had to phrase it: let x be the number of elements in the list; take out the head of the list that many times.

However, is it really the fastest way to clear a list? No. If we could access the class internals, we could just replace the store with a new empty array. Voila, O(1) clear and the garbage collector takes out the trash for you.

Generally, though, don't spend time worrying about implementation if you already have one available. When you've done optimizing all of your stuff (which is never the case), then you could go on to suggest changes to the standard library.

Say we start with three tabs, so i goes 0, 1, 2.

When i reaches 2, two tabs have already been removed, so there is only one tab left. So in the loop body we then do:

    this.MyControl.TabPages[2] // oh no!
In general, modifying a collection while iterating over it is a bit of a code smell, and a lot of iterator implementations will actually throw exceptions if you try it.

OK, I didn't think the code through.. rather, I meant to iterate downwards:

    var x = this.MyControl.TabPages.Count;
    for ( int i=x ; i >0 ; i-- )
       this.MyControl.TabPages.Remove ( this.MyControl.TabPages[i-1] );
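That one does work; a quick Python sanity check of the same shape:

```python
tabs = ["a", "b", "c"]
x = len(tabs)
for i in range(x, 0, -1):
    tabs.remove(tabs[i - 1])   # i-1 is always the current last index
print(tabs)  # []
```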

Related to the part about software maintenance in the last part of the post I recently wrote about "visualizing cost and improvement areas for software maintenance" here: http://marten.gustafson.pp.se/content/visualizing-cost-and-i...

So what's the "correct" way to do this?

Clearing the entire array can usually be accomplished easily, but what if you want to remove only items matching some condition?

Looping backwards, perhaps?

    for ( int i=this.MyControl.TabPages.Count-1 ; i >= 0 ; i-- )
       this.MyControl.TabPages.Remove ( this.MyControl.TabPages[i] );

As I mentioned above, the best way to keep items matching a condition is

  filter predicate xs

In case you don't speak Haskell:

    IEnumerable<T>.Where(Func<T, bool> predicate)
It's quite simple to use:

    var odd_numbers = numbers.Where( n => n%2 == 1 );
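And, for completeness, the Python spelling of the same idea is a list comprehension:

```python
numbers = range(10)
odd_numbers = [n for n in numbers if n % 2 == 1]  # keep items matching the predicate
print(odd_numbers)  # [1, 3, 5, 7, 9]
```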

He got underpaid and bad boss right, but more likely this could be due to frustration. I have seen a coder who used many different ways to code simple things; for example, in one piece of code he used (a and b), (a+b>=2), (1-a*b), and several other ways to do the same thing.

When I was asked to justify flagging this code

Good grief. You are in a sorry environment. What are you doing there? I will fire anyone that writes code like that and checks it in, and then fire anyone who objects to me flagging it in a code review.

I've seen this construct used back-in-the-day in C, but usually with conditional expressions before the remove. Perhaps it was just a crap bit of code from day one, but it's possible that devolved to that state over time.

It's bad because you have to write a long blog post about it explaining all the pitfalls, compare good to bad programmers, talk about maintenance cost, etc. etc.

This code:

for i in 1..100: print i

There's nothing to talk about, it's crystal clear.

You don't need to be a genius to immediately notice that doing i-- inside a for loop is.. OK, just not very smart. ^_^

Changing a loop counter in a loop is always a bad idea? You must implement only very simple algorithms.

There is more than one looping construct. Idiomatic usage of a for loop doesn't include changing the counter variable in the loop's body. ^_^

nice writeup, could really feel your pain/obsession (not a bad thing) -- the sketchiness of the code block from the get-go was cringeworthy for me too - harks back to marking CS assignments and the like back in the day. what a way to start my day too. blech

My initial thought was...

If it has a .Count() and a .Remove(), it should have a .Clear()

wow - my first thought was something unprintable. Took a few minutes of staring at it before I could figure out what the code was doing.

brlewis has the winning answer

Sequence points, kids. Learn to recognize them.

Interesting that you say what's wrong with the code, but don't actually say how it should be done right.

Just in case you're talking to me, HarrietJones, I do actually say how it could be done: with the Clear() method.

that example code is terrible, it deserved no analysis of any kind.

I'm more worried that there are a host of comments, both on the blog and here, by people with enough time to think this through who clearly aren't doing so.

At the very least, I would have hoped everyone had read through the article.

Wow, I would never make a for loop like that. It is weird and breaks convention, and has a high chance of causing a bug.

This is perfectly valid code. You cannot safely modify a collection while it is being iterated, so it's the best way to handle the situation.

The code works, the problem is that it's not very obvious why it's going about it the way it does. You look at it and right away ask yourself "wtf, did I miss something?" simply because it's so unusual.

You might look at it for 2 minutes and figure it out, but that's 2 minutes too long for what's actually being accomplished. The problem is, if you were to come back and look at it again 6 months from now, it would take another 2 minutes.

The point of the article isn't really the method that was used to get to the result, but rather the fact that code should be made easy to read and understand, because 60%+ of the time is spent maintaining it. I usually tell newbie programmers: "code is meant for people to read; machines understand on/off".

Even if you need to borrow such a convoluted approach to clear a collection (as opposed to the more direct clear() method), there are simpler and more readable alternatives:

  while(Pages.count > 0){
      Pages.Remove(Pages[0]);
  }

That's what comments are for. If something is ambiguous, then you should comment it.

As for your approach, I do like that better.

Allocating an int?
