
What is your take on checking return values? - john-shaffer
https://rachelbythebay.com/w/2020/09/02/check/
======
grantjpowell
> They actually said that to me: it makes the code messy so they don't want to
> do it.

I'm going to give one possible interpretation (my own) of the sentiment I
believe the engineers were trying to get across with the statement "it makes
the code messy".

Functionality is an asset, and code is a liability. Minimizing the amount of
code and making the code "less messy" is a way of reducing that liability.
That doesn't excuse not checking for certain error conditions, but I think
it's important to point out that non "messy code" has merit on its own.

On the topic of effective error handling, the number of possible invalid
states in a program (especially dealing with the messy real world) tends to be
combinatoric. Error checking is vital to building a successful application,
but my experience is that in practice it quickly becomes impractical to have
_total_ potential error coverage. At some point tradeoffs need to be made, and
effective engineering is the art of evaluating those tradeoffs.

> "And yet, here we are, because it DID happen."

I'm not trying to imply the author may be suffering from hindsight bias, I
don't have any information that could tell me that. I do know that in similar
situations I've been in in the past I feel like I've suffered from hindsight
bias significantly. Here's a great article on hindsight bias that I think
about often.
[https://www.lesswrong.com/posts/fkM9XsNvXdYH6PPAx/hindsight-...](https://www.lesswrong.com/posts/fkM9XsNvXdYH6PPAx/hindsight-
bias)

The pivotal section from the lesswrong 2.0 article for me

> Viewing history through the lens of hindsight, we vastly underestimate the
> cost of effective safety precautions. In 1986, the Challenger exploded for
> reasons traced to an O-ring losing flexibility at low temperature. There
> were warning signs of a problem with the O-rings. But preventing the
> Challenger disaster would have required, not attending to the problem with
> the O-rings, but attending to every warning sign which seemed as severe as
> the O-ring problem, without benefit of hindsight. It could have been done,
> but it would have required a general policy much more expensive than just
> fixing the O-Rings.

------
nickm12
Tony Hoare, famously, called null references a "billion dollar mistake"[0].
Much preferred to checking return values is to use a language which models
null references in the type system doesn't compile code that would try to use
a null value as if it weren't null.

Ok, fine, but for whatever reason, some of us don't have the option of working
in such a language. What to do then? The author seems to think it's obvious
that the answer is you check for null in any function that could possibly
receive a null value. This means you have to check for null for every
reference type in your system, which is madness.

I agree with the programmers who the author thinks "Do Not Get It". When you
have a null reference bug, you fix the bug where the reference is generated,
not where it's used. If you fix it at the place where it's used, you've barely
fixed anything, because the place where it's generated will continue to send
nulls to unsuspecting callers who will blow up when they assume it's non-null.

It sounds to me like the fix here is to write a validation routine for the
configuration to ensure that it doesn't vend null values to the rest of the
program when it shouldn't. The service can validate its config at load time
and fail to start up if it's invalid. This will block your deployments, but
your customers won't be impacted and your code won't be littered with null
checks that fail to address the root cause.

[0] [https://www.infoq.com/presentations/Null-References-The-
Bill...](https://www.infoq.com/presentations/Null-References-The-Billion-
Dollar-Mistake-Tony-Hoare/)

------
thdc
Definitely agree that you should have validation if you can't guarantee the
validity of your data, but I think there's a difference here.

Coincidentally, I had a discussion about a similar topic today. Basically how
much validation is necessary and at what level should it be at?

    
    
      function add(a, b)
        if a is not a number
          throw error
        if b is not a number
          throw error
        result = a + b
        if result > max possible precise value
          throw error
        etc...
    

I would never have these checks in a function that just added numbers if I
could guarantee that any values passed to it are correct.

The team in the article said that the config shouldn't return null so they
didn't put checks before using that value. To me it seems like they missed
__necessary __validation logic because they didn 't the realize the full
extend of the config.

------
NewEntryHN
Why are we talking about checking return values (which are generated by a
function according to its interface and should not be checked) when we should
be talking about checking configuration values (which are generated by humans
according to whatever they type on their keyboard and should be checked). Also
why are we talking about checking stuff when we should be talking about one
bug bringing the entire system down?

------
burfog
Be sure that your error-handling code checks for errors.

    
    
      while(printf("Error!\n")+1 != sizeof "Error!\n")
        ;
    

Also, it should be bug-free.

~~~
burfog
No, wait, like this:

    
    
      void got_error(void){
        char err[] = "Error!\n";
        int len = strlen(err);
        int ret = printf(err);
        if(len != ret)
          got_error();
      }
    

Obviously, if the error routine faces an error, it should call the error
routine. That is how you handle errors.

Er, but maybe the error routine should also return an error flag in that case?
The caller could then check it and call the error routine if needed.

------
technicalbard
I'm an amateur developer and I write checking code in functions all the time
because you can't be certain of the source of the data!!!

------
tjalfi
RobustProgramming.ps[0] is a great presentation by Stratus Computers about
error handling for writing reliable systems.

I posted it to HN[1] a few years ago.

[0] ftp://ftp.stratus.com/vos/doc/papers/RobustProgramming.ps

[1]
[https://news.ycombinator.com/item?id=13678251](https://news.ycombinator.com/item?id=13678251)

------
rurban
Could be the glibc authors. Why should one ever check for NULL or out of
boundary conditions or overlaps or overflows when the spec forbids it?

