
Refactoring a 300-line if - 6502nerdface
http://groupserver.org/vanpy-2015/slides.html
======
amichal
... into 1000-ish lines in python and some xml dialects spread over 20ish
files that may or may not do the same thing... covered by 5 tests when there
are clearly more then 5 initial states driving the original if....

I would have left it as the big if.

Source:
[https://github.com/groupserver/gs.group.member.canpost](https://github.com/groupserver/gs.group.member.canpost)

~~~
wyldfire
> 1000-ish lines in python ...

Well, an if/elif/elif block that spans 300 lines is often much more difficult
to comprehend than 100 - ten line functions (especially if all 100 share the
same function signature).

That said, I'm not sure if I prefer the xml configuration option. But the
design is overall likely more flexible now than it was before.

~~~
amichal
If the flexibility was desirable then yes but it by definition makes the code
do more then it did before and is therefore harder to _really_ understand.

In this concrete case, these many functions are now spread over the entire
codebase (spanning many repos). In many classes and inheritance trees and
driven by a ton of configuration so there are now many many more
possibilities. Here is one implementation
[https://github.com/groupserver/gs.group.type.discussion/blob...](https://github.com/groupserver/gs.group.type.discussion/blob/master/gs/group/type/discussion/requiredpropertiescanpostrule.py#L95-L116)
which i don't find very readable personally

Reading the README in the first repo I can now see the flexibility and
coupling the rules more closely to their objects was a design goal. So in
fairness the new code is doing a lot more then (what i imagine to be) old code
to be doing. Also I now note this code is 3+ years old so the zope dependency
(which i know nothing about) makes some more sense.

------
Scarblac
That was an exciting ride! First an if statement that was longer than one
would want, then he was splitting things off into small bits and a loop to
make it look neat and dynamic, and then it _exploded into bullshit_ and
stopped looking like Python altogether.

Guess it happens.

~~~
wmt
Complexity tamed! With
Products.CustomUserFolder.interfaces.IGSUser.interfaces.IGSAnnouncementGroup.

~~~
mtdewcmu
The irony is that the documentation[1] is now 335 lines.

[1]
[https://github.com/groupserver/gs.group.member.canpost/blob/...](https://github.com/groupserver/gs.group.member.canpost/blob/master/README.rst)

~~~
incepted
That's pretty much unavoidable for complex problems.

What you want to do is make sure you don't need another 300 lines to implement
that problem, or that if you do, that code is easy to read and to maintain.

~~~
mtdewcmu
It's not a complex problem.

------
iamleppert
Wow, everything I hate about "Enterprise Software" in a single if statement.
Have you ever noticed there are entire businesses built on fine-grained access
control like this, that are utterly pointless bullshit? Probably full of
security holes, leaky abstractions, poorly tested and responsible for creating
a lot of support cases and manual intervention.

Figure out what the trust modality in your application is, and normally it's a
simple graph or can be managed with a straight-forward application of
world/group/user UNIX-style permissions.

~~~
hippo8
Try explaining that to management types. The whole problem with "Enterprise
Software" is its targeted at the manager types, you tell them they are now
going to able to generate 10 reports, tell them there's an app for it the
would throw money at you. They don't care about the people who has to deal
with the system, they don't care it takes 10 more additional steps to do
something that was really easy to do, throw in a bunch of jargon and tell them
its enterprise ready and cloud based, with a touch of big data. BAM! sold!

------
tcdent
I don't care how "broken" my multi-line if statement is, I would _never_ add
Zope as a requirement.

Also, singleton (yuck) accessed via a helper function: yuck. Just import it.

This starts to make sense where OP professes fondness of XML: we have a Java
developer!

~~~
Scarblac
The application it's part of (GroupServer) is based on Zope, so it's there
already and everybody working on it will be familiar with it.

That said, I don't understand yet why its code should be split into 9 pages of
separate repositories (see
[https://github.com/groupserver/](https://github.com/groupserver/))...

------
mmagin
That's not a 300 line if statement, it's broken down into many functions.

But anyway, I see the world of enterprise Java has reached Python.

~~~
incepted
> world of enterprise Java

By that, you mean code that is fast, easy to maintain, typed, tested and
scalable?

~~~
gravypod
Many people who don't use Java refer to Enterprise Java as the subsection of
code written by people who know nothing of efficiency and maintainability
within large companies.

IE: Using XMLs for things that they are not needed for.

------
incepted
He replaced a huge if/else with a rule engine.

The idea is sound since that's exactly what rule engines are for but I feel
that if you don't also post the final listing of the rules you end up with,
you haven't really made the case that you improved the code.

------
bcoates
If you want to actually solve this problem without doing the thing in the
slideshow or shooting yourself, when you see

    
    
        if(self.predicate_function())
          foo()
        else
          bar()
    

and self.predicate_function() is only called once in the program (this is the
case several times in the slideshow example), try pulling the if statement
into the function (and simplify), making a

    
    
        self.handle_case_something()
    

function instead. If the predicate is called more than once, consider
subclassing. You're already doing OO, may as well do it right.

~~~
fleitz
Exactly it seems like you could replace all this code with:

send("#{group.type}_notification") or its python equivalent.

Maybe a hash table with strings as keys and functions as values for a less
'scripty' language.

Perhaps I'm thinking the variable group is of type group, but it also looks
like it suffers from poor separation of concerns... eg. Why are groups sending
email?

Probably also a good candidate for guards instead of nested ifs

------
mtdewcmu
Btw, here is a simpler technique I've used for shortening over-long if-
statements:

Replace

    
    
      if(A) {
        // 300 lines of code...
      }
      else {
        return FALSE;
      }
    

with

    
    
      if(!A) {
        return FALSE;
      }
      // 300 lines of code...

~~~
animal531
I've been using this pattern for ages, but mainly for the amount of
indentation that it saves me.

------
shanemhansen
Huh. Started out with a valid problem, but the solution was terrible. The
validity of the question doesn't prove the correctness of the answer.

~~~
mtdewcmu
I disagree that the problem was valid. The fact that code doesn't look right
to somebody isn't a problem, it's just somebody's opinion. Beauty is in the
eye of the beholder.

------
_asummers
The reduce(..., True) function kind of bothers me in this code. Python has the
all(...) function to achieve that functionality.

~~~
evmar
It bugged me too. Here's the equivalent logic without even using all (which is
definitely the right approach):

    
    
        for rule in self.rules:
            if not rule.canPost:
                return False
        return True
    

That's the same number of lines of code, doesn't require two imports, and is
way easier to follow. I love functional programming but Python (intentionally,
see reduce in py3) is not a language for doing it.

~~~
jacobolus

        return all(rule.canPost for rule in self.rules)

------
justizin
I tried several times to refactor a 10k+ line switch statement which had a
particular case that by all logic should be unreachable, but had a comment
something like this:

    
    
      /* I don't know why, but if you remove this everything breaks */
    

I wonder if that is still there, it's been over ten years.

~~~
tunesmith
I got bit by something like that last year - a previous case above it had some
logic without a break. I removed a case that switched on a class that no
longer existed, but had a break. It was effectively stopping the fallthrough
from the case that was a couple of screens above. Stuff broke, and I cursed
the no-longer-employed developer that felt the need to be clever two years
previous.

Java and PHP allow cases that have logic and no break/return, and it's
(strongly) arguable that it's a bug. I think C# doesn't allow that. It allows
fall-through, but only if the case is empty.

~~~
to3m
In C# you can use `goto' to go to any other case as required - including the
one that's next, syntactically speaking. So if you need a fall-through from a
non-empty case, you can do that, but you do have to be explicit about it.

~~~
acveilleux
And when you take out the case that is "not needed anymore" the compiler will
yell at you. I like it.

------
roma1n
Is the logic equivalent though? In the initial example, there were at least
four outcomes (post, notify not a member, notify not a posting member, notify
group closed) which seems too many to map on a boolean.

------
explorigin
Perhaps a better tool to refactor any such large if-statement: [http://drakon-
editor.sourceforge.net/](http://drakon-editor.sourceforge.net/)

------
fleitz
Typical case of code providing policy instead of mechanism, the solution of
course being to provide policy over numerous classes, rules engines, etc
instead of writing a few methods that provide mechanism.

eg. Why is this method concerned with whether the notification is email or
something else...

Are persons not also the case of a group with one member?

It seems less like refactoring and more like hiding complexity in frameworks.

------
Raphmedia
Did anyone figure out how to zoom the slides?

~~~
eugenekolo2
Save yourself the headache of these kind of slideshows and press the 0 (zero)
at the bottom.

------
gravypod
If all you are doing is trying a bunch of cases (or filters) on some data, why
not just use all and a list comprehension on functions.

all([postAllowFilter() for postAllowFilter in filters]) Then just take a list
of all of your functions.

Really, there just should have been a restructuring of the initial code flow.
It didn't look too messy.

------
mtrn
I noticed a variable named retval in a few slides and was irritated for a
moment.

Maybe because I stopped using variable names like "retval", "data" or even
"result" \- because they convey so little meaning.

~~~
logicallee
>or even "result" \- because they convey so little meaning.

How can anyone write a sin function like:

    
    
      result = sin(degrees*PI/180);
      return result
    

when you OBVIOUSLY should be writing:

    
    
      sin_of_degrees_times_pi_divided_by_180 =    sin(number_of_degrees_in_degrees*PI/180)
      return sin_of_degrees_times_pi_divided_by_180
    

One is unreadable garbage. (Result? WTF does that mean)? While the other is
clean, maintainabe code. Anyone can tell at a glance what
sin_of_degrees_times_pi_divided_by_180 holds. But result? In a function that
calculates sin values? You might as well label your variables OX00001,
OX000002 and so forth.

~~~
jessaustin
This is a weak example, because obviously it should be a one-liner:

    
    
      return sin(degrees*PI/180);

~~~
logicallee
I disagree, and think people should use lots and lots of temporary variables.
variables like "retval", "data" or even "result", as quoted in GP.

I have zero qualms about writing two lines

    
    
      bool passes = /*stuff goes here*/
      if (passes) {
    
         }
    

whereas it is trivial that you could fold this into one line. the ONLY thing
you're adding is the term "passes". Which most certainly _is_ meaningful to
the human reading it.

Compare:

    
    
      if (!(i%2))
    

with

    
    
      bool isodd = i%2;
      if (!isodd)
    

which test do you think you'll accidentally flip the parity of?

Likewise I find "retval", "data", etc, to be perfectly fine variable
placeholders while you treat the thing for a few lines.

~~~
jessaustin
In particular situations "wasted" vars like this are helpful. (although I
agree with sibling comment that informatively-named functions are usually a
better way to communicate meaning than extra vars) If this is your general
rule, however, your code will be like twice as long as it needs to be.

~~~
logicallee
i'll take twice as long over twice as wide any day of the week :)

------
donarb
Then there's the problem of parens on every `if` statement, not needed in
Python unless logic requires them.

------
qnaal
"should have just used perl"

