
Writing code with multiple overlapping side effects with a straight face - miqkt
https://blogs.msdn.microsoft.com/oldnewthing/20170719-00/?p=96645
======
jxramos
"...who firmly believed that the terser your code, the faster it ran. The
author crammed multiple side effects into a single expression, used ternary
operators like they were going out of style, and generally believed that run
time was proportional to the number of semicolons executed, and every variable
killed a puppy."

How does one go about countering that mentality in some rule of thumb fashion
if at all possible? Could a disassembly illuminate identical push/pop
operations on the memory stack as explicit intermediate variables would? Maybe
a simple intermediate result expansion timing profile side by side the terse
code's profile?
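
One way to make that comparison concrete (a CPython sketch using the standard `dis` module; for C, the analogous experiment is diffing the `gcc -O2 -S` output of both versions): the intermediate-variable form costs at most a store/load pair in the bytecode, and an optimizing compiler usually erases even that.

```python
import dis

def terse(a, b):
    return (a + b) * (a - b)

def expanded(a, b):
    total = a + b        # named intermediates cost only a STORE_FAST/
    diff = a - b         # LOAD_FAST pair in CPython bytecode, and
    return total * diff  # typically nothing after register allocation in C

# Compare the bytecode side by side; the difference is a handful of
# store/load instructions, not an algorithmic change:
dis.dis(terse)
dis.dis(expanded)
```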

~~~
mikekchar
I once went to a code review in a very, very large (but now defunct)
corporation. The author of the code did not allocate the memory they were
writing into. There were 8 people at the code review. I could not convince a
single person that writing into unallocated memory was a bad idea. "Allocating
the memory would be slow", they countered. "And besides, we've tested it and
it doesn't crash".

Sometimes you can help people become better programmers. Sometimes you can't.
Try not to stick around in positions where you don't have the
ability/opportunity to make a positive contribution, is my best advice.

But assuming that the person is willing to listen to you, all of the above
seem like good approaches.

~~~
phkahler
What memory were they using? How did they get a pointer to it? Perhaps there
was reason to believe it was safe to use, but that reason had been lost to
time. For example, on many microcontrollers you can have the linker reserve
space for various things (mapping structures over memory-mapped peripherals is
common), so maybe the memory they were using was guaranteed to be available
somehow. Or perhaps they really were clueless.

~~~
mikekchar
Keep in mind that this is more than 20 years ago, so my recollection of the
details may be (probably is!) completely wrong. But this is what I recall.

It was an embedded system with a proprietary language. The compiler didn't
clear the variables when they were created (again, for performance reasons).
For some crazy reason, the language allowed you to declare a pointer without
initialising it. So what would happen is that the pointer would end up being
whatever happened to be on the stack previously. The reason that it didn't
crash in testing was because most of the code in the system was essentially
copy and pasted from other parts of the system. As long as the previously run
function had a pointer in the same location that happened to allocate some
memory in the heap, and then deallocate it, this function would end up using
the same memory and it would be fine. And this is exactly what happened in
this case (because most of the functions had similar shapes and pointers were
declared at the top of the stack frame). The main reason they were reluctant
to change the code was because they had copied it from somewhere else and
didn't understand how it worked. Because it didn't crash, they didn't want to
touch it.

~~~
phkahler
OMG that's insane.

------
userbinator
I wonder if the first one was meant to be

    
    
        a -= a * a;
    

which evolved from

    
    
        a *= a;
    

after the author thought "I need to subtract its square from itself, not just
square it". I can see how this could occur in numerical code.

I've noticed there are two opposing schools of thought on issues of "insane
code" like this; on one side there's "use the language to the fullest if it
makes sense to", and on the other is "avoid anything that might be the
slightest bit confusing". I think that taking either of those to their logical
conclusion is not a good idea --- in the former case, you'll end up with
extremely dense and (initially) difficult-to-understand code, but the latter
case will cause a gradual degradation of code into something approaching the
verbosity of Asm, but with none of the benefits (e.g. "ternary operators are
confusing, don't use them; multiple levels of precedence are confusing, so
always fully parenthesise expressions; nested parentheses are hard to read, so
don't use them either and only do one operation per statement; boolean
expressions are confusing, so always use if/else; nested if/else are
confusing, so don't nest and always refactor to use a function call, etc.
etc.").

It does seem that different languages vary in where they are on this scale,
with more "exotic" ones like APL tending highly towards the former
(interesting related discussion:
[https://news.ycombinator.com/item?id=13565743](https://news.ycombinator.com/item?id=13565743)
) and C#/Java towards the latter. The ideal is probably somewhere in the
middle.

~~~
metaobject
"I've noticed there are two opposing schools of thought on issues of "insane
code" like this; on one side there's "use the language to the fullest if it
makes sense to", and on the other is "avoid anything that might be the
slightest bit confusing".

This is a false dichotomy. How about "use the language to its fullest, within
reason (avoiding things like multiple side effects per source line), but when
non-obvious code is written, document it."

At least, I'd say that describes my approach.

~~~
warrenm
Pretty sure that's why he said, "The ideal is probably somewhere in the
middle." :)

------
d--b
Well, maybe a good way to avoid this is to have the compiler email the manager
rather than printing a warning.

~~~
wyldfire
In all seriousness, management who takes things like static analysis results
seriously is a valuable asset. We should consider teams that have management
like this as superior to the ones who don't. They clearly value foresight and
planning and will probably be less likely to host a chaotic/careless work
environment.

~~~
TeMPOraL
Except when they take it too seriously.

At my $dayjob, I have a problem with a cow-orker with a position slightly
superior to mine ("first among equals", former team lead of the group of
projects to which I'm assigned) taking results of static analysis _too_
seriously - to the level of cargo culting. When bored, he'd introduce "fixes"
to reduce the number of issues reported by SonarQube - "fixes" I have to revert
every now and then, because they reduce the quality of the code (sometimes also
introducing bugs through carelessness), all to optimize the Sonar metric.

To be clear: tools are fine. It's people that are the problem (as usual).

~~~
btschaegg
It also strongly depends on the type of tool you use. If by "static analysis"
you mean "Clang screams at me if I'm juggling with razor blades", the results
mostly seem to be rather useful.

But I've yet to see one of those code-visualizing tools based on metrics that
convinces me. The results I've seen so far are either obvious ("I already knew
that, thanks") or inconclusive and hard to interpret. So, to me those tools
seem to be more of a way to try and convince your manager that you should be
given time for refactorings than actual utilities.

I've been wondering if there could be a benefit in analyzing the commit
history of a project instead so that one could somehow at least visualize
obvious problem areas ("every third bugfix has to touch this piece of code, so
we should really have a proper cleanup there...").

~~~
joevandyk
A "churn" metric would show if a given piece of code is getting changed
frequently. That can identify hot spots in the code.
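
A minimal sketch of such a churn metric with plain git (run it at the root of any repository; the cutoff and output format are up to you):

```shell
# Count how often each file appears across all commits in the history;
# the most frequently touched files ("hot spots") float to the top.
git log --format= --name-only | grep -v '^$' | sort | uniq -c | sort -rn | head -10
```

Cross-referencing that list with the bug tracker ("every third bugfix touches this file") is usually enough to justify a cleanup.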

------
schindlabua
I use the ternary operator all the time.

    
    
      result = boolean_flag
          ? do_this()
          : do_that();
    

I think it's less noisy and nicer to look at than an if-else, provided the
statements aren't too long. God forbid multiple side effects though, even in
languages with less complicated sequencing rules than C/C++.

~~~
dfgdghdf
I think that the ternary operator should only be used for side-effect free
code.

So:

    
    
        auto result = boolean_flag ? 
          compute_property() : 
          compute_another_property();
    
        if (boolean_flag) { 
          do_this();
        } else {
          do_that();
        }

~~~
TeMPOraL
Since everyone agrees, I'll disagree :).

Seriously though, there's nothing in ternary operator that says it should only
be used for side-effect free code. _Especially_ in C++ (or Java), which
doesn't even differentiate between stateless and stateful code on the language
level.

IMO ternary is a perfect solution for the case where you want to assign or
return a different value based on a boolean condition. Whether or not the code
in branches is side-effect free is irrelevant.

Personally, where I try to avoid using ternary operator, is places where I
don't care about returned value. So "yes" for:

    
    
      auto foo = [condition] ? possibly_stateful_then() : possibly_stateful_else();
    

no for:

    
    
      [condition] ? then_statement() : else_statement();

~~~
Mithaldu
> Personally, where I try to avoid using ternary operator, is places where I
> don't care about returned value.

As i asked below: Why though?

~~~
TeMPOraL
No solid reason, just a feeling. I feel that if-else block communicates intent
more clearly when you're dealing with statements - because it _can't_ return
a value in languages we're discussing, it's _specialized_ for that job. The
same motivation makes ternary better when we care about a value.

~~~
Mithaldu
To be honest, with that explanation the decision you made has nothing to do
with ternaries, and is entirely just the mirroring of the decision the if
forced on you.

Also,

> in languages we're discussing

Don't bury that lede. If you're restricting what you're saying to a specific
language say that bold and as the very first thing, so people don't end up
wasting their time parsing your post in a different context. Rude.

~~~
TeMPOraL
> _To be honest, with that explanation the decision you made has nothing to do
> with ternaries, and is entirely just the mirroring of the decision the if
> forced on you._

I like to think of it as using the dedicated tool for a job it was designed
for. In my mind, it's a good rule of thumb because it enhances readability
(compare "the principle of least surprise").

> _Don't bury that lede. If you're restricting what you're saying to a
> specific language say that bold and as the very first thing, so people don't
> end up wasting their time parsing your post in a different context. Rude._

Why? Come on, at this point in the thread we're _still_ discussing C++
derivatives. I thought it was obvious.

Also: if a language has an if/else as an expression _and_ has a ternary
operator, then IMO there's a problem with that language being cluttered with
duplicate syntax for the same thing.

------
euyyn
> First of all, there will be a lot of false positives. For example, you might
> write
    
    
        total_cost = p->base_price + p->calculate_tax();
    

> This would raise the warning because the compiler observes that the
> calculate_tax method is not const, so it is worried that executing the
> method may modify the base_price, in which case it matters whether you add
> the tax to the original base price or the updated one. Now, you may know (by
> using knowledge not available to the compiler) that the calculate_tax method
> updates the tax locale for the object, but does not update the base price,
> so you know that this is a false alarm.

I think the warning in that case is still fair, and I'd be perfectly happy to
get it. calculate_tax() could be modified to affect base_price with this
compilation unit being none the wiser.

~~~
corpMaverick
Agree. calculate_tax should be declared const. Or it should not be used in an
expression like that.

------
ridiculous_fish
v8 at one point made inlining decisions based on character count. Terse code
would be inlined more aggressively, and even adding comments could defeat
inlining: [https://top.fse.guru/nodejs-a-quick-optimization-advice-7353b820c92e](https://top.fse.guru/nodejs-a-quick-optimization-advice-7353b820c92e)

~~~
notamy
I can maybe understand why v8 would do that, but why on earth would comments
be included in that?

~~~
maccard
Because it's faster to get the length of the string before you strip out the
comments, I guess?

~~~
spoiler
It's because back then V8 didn't save the AST in `SharedFunctionInfo`, so the
AST was inaccessible after parsing. To avoid re-parsing big function bodies
just to apply AST-specific inlining heuristics, it used the raw source length
as part of the inlining strategy. This has been fixed for a while now with
TurboFan[0] (also, you can now call `ast_node_count()` and get the AST node
count directly).

[0]:
[https://github.com/v8/v8/wiki/TurboFan](https://github.com/v8/v8/wiki/TurboFan)

EDIT: added information about `ast_node_count()`

~~~
notamy
That makes sense. I figured there had to be a reason why they didn't just
check the AST directly.

------
sgt101
Two unfashionable lessons that I learned when I was a practicing programmer
rather than a hobbyist and loathsome manager.

1. Code is a communication system: between you and the next person to touch
it. You are telling the other person how you made this work. Invest in this,
because the next person will probably be you in six months.

2. Compilers and type systems are your friends. Write code that extracts
errors from the compiler by explicitly constraining things as much as you can
afford to. Avoiding typing or checking via long methods or low-level
datatypes is bad. Short, typed, functional methods and high-level data types
are good.
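
A toy illustration of the second point (hypothetical `Meters`/`Seconds` wrappers, sketched in Python since the thread spans several languages): distinct types for distinct quantities turn silent unit mix-ups into errors that the type checker, or at worst the first run, catches for you.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Meters:
    value: float
    def __add__(self, other: "Meters") -> "Meters":
        # Constrain the operation: only Meters + Meters is meaningful.
        if not isinstance(other, Meters):
            raise TypeError("can only add Meters to Meters")
        return Meters(self.value + other.value)

@dataclass(frozen=True)
class Seconds:
    value: float

# Meters(3.0) + Seconds(4.0) is flagged by a static type checker before
# the program runs, and fails loudly at runtime instead of returning 7.0.
```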

------
nottorp
After gaining a bit of experience programming, I've developed for myself the
following rule:

If I feel particularly smart and proud after writing a piece of code, I look
at it again and remove some smartness in favour of readability.

Unless it's absolutely performance critical, in which case I try to comment
the hell out of what's going on.

~~~
taeric
First, I am not at all contradicting this. :) I just want to get that out of
the way.

I have found it pedagogical to read some more difficult programs, without
trying to find ways to make them necessarily "read" better. In particular,
generic and/or high performance versions of algorithms.

To your point, this is precisely why I will refuse to generalize some parts of
my code on the first pass. First, solve my problem specifically, then try and
generalize. (This is not a waterfall style approach, but I do try and keep it
somewhat phased.)

~~~
nottorp
IMO don't generalize unless you find out later you need the same algorithm
somewhere else.

Otherwise, just leave the specific version in and stick some info in the
comments (like a link to the high level description of the algorithm, if it's
a common one that has a name, plus some pointers to your specifics).

~~~
taeric
Completely agreed. I do recommend playing with it (in like a repl or some
other place) to see if you can possibly improve the algorithm, but have
specific metrics for what an improvement would be.

I'm reminded of a section in Programming Pearls where they explore several
versions of a function, each time improving it. Too often I'm impatient and
want the final solution first.

------
drewg123
I used to think I had a pretty good grasp on how the compiler was going to
encode things.

After spending months looking at disassembled FreeBSD kernel C code in a
profiler, trying to hunt down cache misses, I've realized that it is hard to
count on anything and that my instincts, even after 25+ years of C
programming, are often wrong.

------
bryanrasmussen
I wonder, from this:

> He gave as one example a book from an apparently-successful author (sales of
> over four million and counting) who firmly believed that the terser your
> code, the faster it ran. The author crammed multiple side effects into a
> single expression, used ternary operators like they were going out of style,
> and generally believed that run time was proportional to the number of
> semicolons executed, and every variable killed a puppy.

whether the described author could be identified? Perhaps not absolutely, but
they do have some specific stylistic quirks combined with a particular sales
level.

~~~
jxramos
Books in general also have different constraints: unlike a text editor, they
don't offer an infinite vertical view of the code. Publishing to physical
paper imposes limits on line length both horizontally and vertically. I
wonder if that drove part of it.

------
wyldfire
> The people who would benefit from the warning don't have the necessary
> background to understand it.

In my experience, this is accurate, but underscores a teaching opportunity. If
you show Joe Beginner how to get the warnings, explain how they apply to the
code and why the warning was good, they may slowly begin to appreciate the
warnings. So that even if they don't understand, they won't necessarily
dismiss the warning next time -- instead they'll go ask for help.

~~~
nerdponx
"Did you mean..." or "this is a bad idea because..." type notes in
error/warning messages are underrated teaching tools.

The R package "data.table" does a good job at this. Just scroll through
[https://github.com/Rdatatable/data.table/blob/master/R/data....](https://github.com/Rdatatable/data.table/blob/master/R/data.table.R)
and search for "warning(".

------
mcguire
From one of the comments there:

> _Nathan Reed, July 19, 2017 at 4:34 pm_
>
> _Didn't you just write such code with a straight face yourself only
> yesterday? ;) In your bytecode interpreter example:_

    
    
       memory[NextUnsigned16()] = NextUnsigned8();
    

> _Presumably the NextUnsigned8/16 functions have side effects of advancing a
> buffer pointer, so this is nearly the equivalent of p[x++] = ++x._

------
js8
Somehow I thought this would be about monads and using them to control side
effects.

------
__s
Closest I've come to really embracing that kind of thing is
[https://github.com/serprex/pythonaes/tree/master/aespython](https://github.com/serprex/pythonaes/tree/master/aespython)

NB it really is faster

edit: nvm, Crumb is way more that kind of thing:
[https://github.com/serprex/Crumb/blob/master/C3.py](https://github.com/serprex/Crumb/blob/master/C3.py)
I forgot I concocted that little devil

~~~
proaralyst
Are you expecting that code to be maintainable? It looks like it's been run
through a minifier...

~~~
__s
Nope, haven't had to maintain it in years. I did have to port it to Python3
which was a pain & grew it by ~30 bytes. The minifier was myself. It could
actually be tighter but I optimized it for gzipability since I put it through
a build tool to produce
[https://github.com/serprex/Crumb/blob/master/Crumb.py](https://github.com/serprex/Crumb/blob/master/Crumb.py)

I even did a C version
[https://github.com/serprex/Crumb/blob/master/Crumb.c](https://github.com/serprex/Crumb/blob/master/Crumb.c)
which is a pretty boring implementation besides

    
    
        while(p>q?(j=d==b?:i*q/p,i++<p):(i=c==a?:j*p/q,j++<q))

------
nailer
What does:

    
    
        a -= a *= a;
    

Do?

Looking through C# docs (which I assume this is, but don't have experience
with), it seems a's value is changed to be the previous value of a, minus
itself (ie 0) multiplied by a? In other words:

    
    
        a = 0
    
    ?

~~~
nerdponx
I would hope it's equivalent to

    
    
        a = a * a
        a = a - a
    

That would be the "least astonishing" outcome for me.
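
For what it's worth, in languages that do define left-to-right evaluation of compound assignments (JavaScript, C#, Java), the left-hand `a` is read before the right-hand `a *= a` runs, so the result is `old_a - old_a*old_a` rather than 0 (in C and pre-C++17 C++ the expression is simply undefined behavior). A quick JavaScript check:

```javascript
// The value of the left-hand `a` is captured *before* the
// right-hand side `a *= a` executes and overwrites it.
let a = 3;
a -= a *= a;     // a = 3 - (3 * 3)
console.log(a);  // -6, not 0
```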

------
pmarreck
A lot of these pitfalls and concerns would just completely go away in a
language that used immutable data structures pervasively

------
avip
The pythonic example I've seen more than once:

{"some": "dict"}.setdefault("some", foo())

while expecting foo() to be short-circuited.

~~~
raverbashing
That's awful

What are people expecting with that?

~~~
avip
They expect to have some form of "lazy caching":

    
    
      conf = {}
    
      def get_conn():
        return conf.setdefault("db_conn", expensive_creation_of_db_conn())

