I recently discussed this with an engineering manager for a notable consumer iOS app with tens of millions of downloads. While working out how to reduce their crash rate, his team debated whether to enable assertions in their production iOS app. As any iOS developer knows, even with a great crash reporter and crash database there are a lot of crashes that are hard to figure out.
One faction believed you should never intentionally crash the app, while another faction believed the app was going to crash anyways or maybe do something worse.
They decided to test it out: during a sprint that was already mostly about bug fixes, where they would do extra testing before release, they enabled asserts and closely monitored their crash reporter.
What they found was that the number of crashes did not change much, but the cardinality went down significantly. The learning was that code executing past a disabled assertion may be in one of n different bad states, each of which might lead to a different type of crash. They now had better high-level information about what was causing crashes (knowing which asserted assumptions were wrong), and it helped them reduce their crash rate much more than raw crashes without asserts did (including cases where the crash was in iOS code, not app code).
Well yeah, if an assert() isn't bogus, then anything that follows after it is going to be wrong in some way. In many contexts, crashing is the least worst thing you can do in this situation.
If you're really working on something that HAS to keep running NO MATTER WHAT then you probably want to do the hardcore critical systems stuff like N independently developed systems voting on the outcome. (Ah, but developed to what spec? Now we've just moved the bugs from the code to the spec... At least hopefully there are fewer of them.)
That's a really interesting result. Those are the kind of thoughts you may have, but knowing that it actually worked on a massively used production app is really valuable.
I inherited a hilariously bad(ly written) legacy system.
One of the first things I did was turn on every possible "throw on error, die on error, assert" in development (couldn't do it in production as it'd just fall over constantly).
The constant crashes in dev meant I rapidly fixed the low-hanging fruit (undefined indexes etc.), and then slowly, as I cleaned an area up, I'd leave the asserts in once it went to production, since at that point what I'm asserting is "I think I fixed all the insanity, but tell me straight away if my assumption was bad".
Painful, very painful, but those areas now cause me very few, if any, issues.
I work in visual effects for film, writing internal tools. Usually they're tools that move data from department to department, or that a department uses internally; fairly short processes that serialize data in between.
One place I worked had a team that was very adamant about not having much error checking. Not much of any QC process, either: wait for someone to complain about bad data and respond. Honestly, this worked really well for small, skunkworks-type projects that needed to be nimble. As you would expect, when errors did happen it was because of bad data from further up. You really had to know the system well to be productive (the more cynical of us thought the developers liked this because they could look like heroes).
I prefer to error early and clearly, and I got a lot of pushback. To be fair to them, oftentimes the errors were irrelevant to the specific thing they needed, and it would have been preferable to ignore them and carry on. It felt like I was imposing a bunch of bureaucracy and yak shaving. But bringing on new people and scaling the amount of data going through would have been impossible without more structure.
I find it sad that there seems to be no conventional wisdom in the software business about the proper way to do this. Everyone starts from scratch and tries to learn it the hard way.
If you want some fun (fun as in ruining your day, don't try this without backing up first) fill up your system disk to near capacity, then try to run various apps and system utilities.
All kinds of bizarre things start happening, usually bad. The cause is simple: most of these are written in C, and C programmers rarely check whether writes to the disk succeed.
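A minimal sketch in C of the kind of checking that usually gets skipped (error reporting kept deliberately simple here):

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    /* Write a buffer to disk and actually report failures (e.g. a full disk). */
    int save_blob(const char *path, const void *data, size_t len)
    {
        FILE *f = fopen(path, "wb");
        if (f == NULL) {
            fprintf(stderr, "open %s: %s\n", path, strerror(errno));
            return -1;
        }
        if (fwrite(data, 1, len, f) != len) {
            fprintf(stderr, "write %s: %s\n", path, strerror(errno));
            fclose(f);
            return -1;
        }
        /* fclose() flushes buffered data; a full disk often only shows up here. */
        if (fclose(f) != 0) {
            fprintf(stderr, "close %s: %s\n", path, strerror(errno));
            return -1;
        }
        return 0;
    }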
For example, a few months ago, my Windows box would crash every time it would auto-update. I did a lot of cursing about this; how could Microsoft do this? I eventually realized that the disk was nearly full. Cleared out a few gigs, and the auto-update started working.
This is 2018.
It's not specific to Windows, either. It happens with every OS I've ever tried it with, including Linux. No message like "disk full", or "failed to write file". Just erratic random behavior and weird things happening.
That's right, but at least these days you'll get a seg fault if you try to use it.
Back in the bad old DOS days, NULL pointers pointed at the interrupt dispatch table, so running out of memory (awfully common on a 640K machine) meant you trampled all over the operating system. It was so bad that I'd defensively reboot my machine constantly while debugging.
DOS extenders saved the day, because they ran code in protected mode. I never developed code in real mode again, I just ported fully debugged code to it.
In some other languages, a failed write will kill your application with an IO exception. Or at the very least (in Java) the exception will hit a handler that you are forced to write. The point being, those languages are designed to force a different code path on you in situations like failed IO, and people writing standard libraries make sure to check return values of C calls and throw exceptions if they're bad.
There's a good approach to asserts I adopted years ago for your typical low-risk CRUD/web/mobile app that I'm guessing ~90% of us are working on (as opposed to aviation software like in the article): Use asserts as you normally would, but in production have them fire an event to Google Analytics or some such. This is particularly useful for those middle-ground situations where a full app crash isn't justified because things will probably chug along fine, but nonetheless you want to know something's screwy.
For debug/testing mode I usually pop a toast or dialog so the tester knows to report something to me even if the app is otherwise working fine.
You often say to yourself "Well I know this assert will never trigger but I'll put it here just in case". Then you check your analytics next day after release and get humbled.
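A rough sketch of that middle-ground check in C (one reading of the debug behaviour described above); analytics_report_event and debug_show_toast are hypothetical hooks into whatever reporting and UI layers the app actually has:

    #include <stdlib.h>

    /* Hypothetical app-specific hooks. */
    void analytics_report_event(const char *name, const char *detail);
    void debug_show_toast(const char *message);

    #define SA_STR2(x) #x
    #define SA_STR(x) SA_STR2(x)

    #ifdef NDEBUG
    /* Production build: don't crash, just record that the "impossible" happened. */
    #define SOFT_ASSERT(cond)                                                     \
        do {                                                                      \
            if (!(cond))                                                          \
                analytics_report_event("soft_assert_failed",                      \
                                       __FILE__ ":" SA_STR(__LINE__) ": " #cond); \
        } while (0)
    #else
    /* Debug/test build: make it visible so the tester files a report. */
    #define SOFT_ASSERT(cond)                                                     \
        do {                                                                      \
            if (!(cond))                                                          \
                debug_show_toast("SOFT_ASSERT failed: " #cond);                   \
        } while (0)
    #endif

    /* Usage: SOFT_ASSERT(items_loaded >= 0); */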
>You often say to yourself "Well I know this assert will never trigger but I'll put it here just in case". Then you check your analytics next day after release and get humbled.
That is exactly what asserts are for: checking for things that "cannot happen" - but do sometimes happen.
Software development often does seem like a struggle between reliability/robustness and safety/correctness. Figuring out what to do with asserts falls in this debate!
For embedded development I have often had ASSERTs log out to a serial port with the function name/line # and continue operating - this can then be connected to any logging device. This makes the system a little more robust (it won't crash, right away at least) but less correct/safe.
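Roughly along these lines; serial_log here is just a placeholder for whatever the board's logging routine happens to be:

    #include <stdio.h>

    /* Hypothetical board-specific serial/log output routine. */
    void serial_log(const char *msg);

    /* Log the failed check with its location, then keep running. */
    #define ASSERT_LOG(cond)                                              \
        do {                                                              \
            if (!(cond)) {                                                \
                char buf[128];                                            \
                snprintf(buf, sizeof buf, "ASSERT %s:%d (%s): %s",        \
                         __FILE__, __LINE__, __func__, #cond);            \
                serial_log(buf);                                          \
            }                                                             \
        } while (0)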
During development testing we often had complaints of machines freezing due to asserts, which always increased the priority of those bugs! Definitely a good thing in the long run to fix those bugs and make the system more correct. Stopping all execution may not be the correct 'safe state' for an airplane though!
> Stopping all execution may not be the correct 'safe state' for an airplane though!
It absolutely is. It's (literally) a disaster to allow code that has entered an unknown state to control aircraft functions. There is NO WAY such a design would EVER be certified by the FAA.
This may be a sample size of n=1, but after reading the NASA programming guidelines for remote vehicles and operating equipment on Mars, my programming and system programming got a lot better.
To boil it down very simply, without going into detail:
* On start-up each module allocates all its memory (it has a fixed bound)
* Each loop of each module has an iteration bound
* Assert preconditions and postconditions of routines, and leave them in production.
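Very loosely, and only as an illustration of those three rules (not actual NASA/JPL code), that style looks something like this:

    /* Built WITHOUT defining NDEBUG, so assert() stays active in production. */
    #include <assert.h>
    #include <stddef.h>

    #define MAX_SAMPLES 256   /* fixed bound decided at design time */

    /* All storage is claimed up front; nothing is malloc'd mid-flight. */
    static double samples[MAX_SAMPLES];
    static size_t sample_count;

    void record_sample(double value)
    {
        assert(sample_count < MAX_SAMPLES);   /* precondition: room left */
        samples[sample_count++] = value;
        assert(sample_count <= MAX_SAMPLES);  /* postcondition: still in bounds */
    }

    double average_sample(void)
    {
        assert(sample_count > 0);             /* precondition: something to average */
        double sum = 0.0;
        /* The loop has a fixed iteration bound; it cannot run away. */
        for (size_t i = 0; i < sample_count && i < MAX_SAMPLES; i++)
            sum += samples[i];
        return sum / (double)sample_count;
    }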
That is certainly not how fly by wire systems are designed. The FAA would never certify a design that relied on allowing software in an unknown state to access critical flight controls.
I've never written safety-critical software, but I believe the point is that "stop all execution" is not always an adequate response to an unknown state; this does not imply that "proceed as if nothing went wrong" is an acceptable response. I would imagine the response to an unknown state would be "return to a known state". This could be accomplished by rebooting (returning the software itself to a known good state), by passing control to a redundant system (possibly done by halting all execution of the affected system, but the design needs to consider more than that), or by falling back to some form of safe mode where the bad state is not present but safety-critical functionality can continue.
When you're at 30,000 feet, nobody wants to go experimenting with a system controlling critical flight operations that has gone into an unknown state.
The pilot often has the option of rebooting the system and trying it again, but that's dangerous. On an episode of "Aviation Disasters", I don't remember the exact details, but the pilot got a warning of a failed system. After a consult with the ground, they told him to reboot it. After a while, it failed again. They told him don't worry about it, reboot it again.
It (the airplane, that is) crashed.
Aircraft systems are designed to be decoupled from each other as much as possible, so failures do not propagate. In particular, they must not propagate to the backup system, or the "safe mode". The engineers do work hard at this, and sometimes they don't get it right, and another bitter lesson is learned.
The "safe mode" system must be physically and electrically decoupled from the normal system.
An example of getting it wrong: I've seen demonstrations on TV of people hacking in through the wireless key locks on a car to take control of the brakes. That's seriously bad engineering - not so much the vulnerability to hacking, because that happens, but the fact that the door lock system is connected to the brake system. It's cowboy engineering at its worst.
Keeping systems isolated is an orthogonal concern. The question is whether rebooting the system is safer than shutting it off. The fact that shutting it off should be safe is a separate concern. At a minimum, shutting it off reduces your redundancy; and if the system fails again (even in an undetected way), that should be fine, because you (should) have enough redundancy to handle an arbitrary system failing in unexpected ways.
Usually the hardware is tri or even quad redundant so the system can handle one of the boxes halting execution while it is either reset or other action is taken. If several failures happen one box may be powered off by the others to stop erroneous outputs from the failed box.
May? The offending box gets electrically isolated from the craft if any fault is detected in it. Any fault. Now, the pilot may attempt to restart it by manipulating the circuit breakers, but that's the pilot's option.
At my work (hosted enterprise webapp), both the frontend and backend have an `assert` function and an `assertAndContinue` function. `assert` always crashes on failure, and `assertAndContinue` crashes in dev and logs an error and keeps going in prod. Each time we want to verify a runtime assumption, we decide which type of assert to use. We prefer `assertAndContinue` (and I push for it in code review), but there are reasons to use plain assert:
• In some cases, crashing is a better user experience than proceeding in an unknown state. Usually this is on the backend when data integrity is at risk, but could be on the frontend when we're at risk of making a server call with the wrong arguments. But usually, pretty much anything that doesn't crash is better than crashing.
• `assertAndContinue` requires that you have a reasonable fallback. If code is going to crash the next line anyway, there's no point in `assertAndContinue`. In most cases, it's easy, but sometimes it takes real engineering, like building an error state into the UI component that skips downstream code. When there isn't an obvious fallback, using `assertAndContinue` is a judgement call based on the difficulty of implementing a real fallback, the likelihood of the bug actually happening, and the severity if the bug does trigger.
In code that runs on your hardware, assert() should actually be drop_into_debugger() and then page an oncall eng with the url to attach their debugger to. The act of dropping into the debugger should cause a 502 back to the client with a GUID to track the exact issue.
I have a difficult time imagining any time that assertAndContinue makes sense. If a fallback is necessary and recovery possible, I would have asserts throw and use normal exception handling for that.
Never is a strong word. Suppose the clock app on my phone fails an assert. Technically, this means that the entire phone system entered an unknown state; but there is a good chance that the bad state is contained within the failing app. In this case, it would be possible to recover by killing and restarting the offending process.
Sure, it is possible that the crash was actually caused by bad state in the kernel, or even in the baseband processor, so we can't say for certain whether restarting the app actually put us into a good state. However, the approach of isolating failures at the program level seems to have been greatly successful at improving the reliability of the overall system. It does not seem unreasonable that, when a "single" program becomes sufficiently complex, it would benefit from similar containment.
The process in an operating system with per-process memory protection is a reasonable level of containment for a consumer phone, and terminating/restarting the program is a reasonable solution.
For a safety critical system, that isn't good enough. Put it this way - would you bet your life on it?
For my part, I don't really see the point of it anyway. If the code is actually ready to handle such failures, this is merely an error trace, so it should just be called that.
> this is merely an error trace, so it should just be called that
It's stricter than an error trace because if it's encountered in a development/testing context, it's a hard error and the engineer must fix it to proceed. Plain error traces should only be used for expected failure scenarios (like a third-party service times out, possibly), not situations that would be considered "unexpected" to the developer.
I use a similar pattern for edge cases. We work with weird data and often have processes for cleaning it - non-essential but useful for optimisation. Sometimes we'll have an assumption about the structure based on what we've seen before, but we can't be sure that will always be the case. So if the following code requires a certain format, we check the assumption, log if it's not met, and then don't continue with that cleaning step. More often than not we never see the exception in production.
assert(someVar != null, "Unexpected null someVar");
assertAndContinue(params.val > 0, "Expected greater than 0 val", params.val);
// etc.
someVar.someMethod(params);
It may not look very different, but it adds a documentation-like effect to the code. It lays out the potential failure cases ahead of the code execution. Even if internally `assert` becomes `throw`.
In some cases there are suspicious values that aren't necessarily incorrect. The person writing the code might think: "I don't expect this value but it won't cause an exception in the way a null reference will". You may want to catch these in development, and if they show up, you investigate; if there is no actual problem with the value, then you can update the assertion to more accurately reflect the data.
Exceptions are a control flow mechanism, not necessarily an error collection mechanism. I also frequently work with languages either without exception handling or with exception handling disabled for certain targets.
Even when exception handling is enabled, it's a bit silly and heavyweight if you just need to short circuit a single conditional, loop, or function. Why throw when you can break or return? And when you do use nonfatal-but-logged exceptions, what do you put in the exception handler and what makes it different from assertAndContinue besides minor errata like being specialized for exceptions?
> I also frequently work with languages either without exception handling or with exception handling disabled for certain targets.
In those cases, you definitely cannot use exceptions but I'm still not sure I'd use something like assertAndContinue. If an assertion has failed, something has gone terribly wrong and I can't imagine just continuing.
> it's a bit silly and heavyweight if you just need to short circuit a single conditional, loop, or function. Why throw when you can break or return?
You throw because it's an error condition; you break or return because that's potentially part of the normal flow of operation. It's not heavyweight because it's for a state that should never happen anyway.
> And when you do use nonfatal-but-logged exceptions, what do you put in the exception handler and what makes it different from assertAndContinue besides minor errata like being specialized for exceptions?
All exceptions are "fatal" to the current operation, however you define it. It could be fatal to the entire application, bringing it down completely. Or it could just be fatal to a single function, request, or even UI button click. In comparison, assertAndContinue() would continue the current operation in, by definition, an invalid state.
> If an assertion has failed, something has gone terribly wrong and I can't imagine just continuing.
For most traditional uses of asserts, sure. There are plenty of edge cases you might not want to ship with but that aren't fatal if they do ship, however. In a game this might be something like missing leaderboard definitions. Sure, losing player scores sucks, but it's not as bad as crashing the game outright and losing their progress. It's worth crashing the game at QA time to force it to get fixed; it's worth silently logging in production to avoid even bigger losses to the player. I consider those checks a variation on assertions, and "assertAndContinue" sounds like exactly the sort of thing you'd use for exactly this kind of check. Maybe you don't call them assertions...?
> You throw because it's an error condition; you break or return because that's potentially part of the normal flow of operation. It's not heavyweight because it's for a state that should never happen anyway.
Exceptions take full stack traces in a number of languages. Even in those that don't, compilers usually optimize for the exception-free path and basically ignore the performance of the exception handling path. And just because it "should" never happen doesn't mean it doesn't happen frequently enough to cause perf bugs.
One example: A title I worked on had significant framerate problems if you pawed at the screen just right. The cause? Exceptions thrown from system APIs on invalid touch/finger IDs when querying finger positions. Because they invalidated the IDs before finger up events could be processed.
It could've returned an invalid state, returned the last known finger position for that ID in release builds, or done any number of faster error handling paths and it wouldn't have even been noticed as a problem. Instead, it threw, I doublechecked there wasn't any way to pre-query the finger IDs or process finger up events faster/in time, added a terrible catch statement, and just ate the performance hit.
Exceptions are heavyweight. Sometimes not so terribly so that they're the wrong tool for the job, but sometimes they are.
> All exceptions are "fatal" to the current operation, however you define it. It could be fatal to the entire application, bringing it down completely.
This is the nonfatal-but-logged case I was specifically referring to.
Again: What do you put there?
> In comparison, assertAndContinue() would continue the current operation in, by definition, an invalid state.
Any code invoking assertAndContinue has presumably taken measures to properly handle the continue path, likely by simply aborting the operation being asked of it. Just like any code that uses a try/catch construct presumably would.
Some examples of fallback behavior where I don't think exceptions would help that much:
* Of two arguments to a function, exactly one is supposed to be non-null (since they're different ways of specifying the same thing). If both end up being non-null, assertAndContinue and just use the first one and pretend the second is null.
* When listing all items of type X in a folder, ask the server for all items and do an integrity check that they are indeed in that folder and have the expected type. If any items aren't like that, assertAndContinue and filter those items out and just display the valid ones.
Wow, I'd definitely throw in both those cases. If someone passed a value to the first function and it ignored it, they're likely not getting the result they want which will propagate through the system and be even worse than just dying right there.
To be clear, both are considered bugs and both are reported to our error reporting system as bugs that we will then investigate as if they were crashes. The question is whether it's better to immediately crash the page for the user or whether to proceed in a best-effort way. From our experience, it's very common that the crash is in an area of code unrelated to the primary set of actions the user is trying to take. Having some widget behave strangely in the corner of your screen is much less damaging than completely blocking the user from getting work done.
Also, to be clear, both of these examples are in frontend code, where mistakes are very unlikely to cause lasting problems, since anything affecting data integrity would already need to go across the client-server boundary.
Front-end code (or a desktop app) is a bit different, because you don't crash out in a front-end situation. But I'd still throw, stop the current function/process, and recover at a higher level rather than let the code continue to run. The user shouldn't be stopped from doing anything else or trying again.
If the effects are merely limited to a widget behaving weirdly, this is a sound strategy. And this seems to be the case for front-end code in well designed systems.
We have a similar pattern in my product (large Win32 app). We have "Assert", which throws up a dialog in Debug and then continues execution but is removed in Prod, and then we have "IfFailedCrash"-type asserts that immediately crash the app so that our crash reporting systems can kick in and report back with state information. (Though these can independently be turned off in prod, reverting them to a no-op.)
We end up with a lot of issues where developers ignore the dialogs, so we are slowly moving to more "crash" checks to catch issues more quickly.
I completely agree with the approach outlined in the original post. (It's also called defensive programming.)
Not failing fast and hard when the application is in an unexpected state (which is what assert should check for) is robust only on the surface.
I think we have the debate because it's not just about adding asserts. If you want to add asserts to an existing codebase, essentially making it fail fast and hard, you need a system architecture that is capable of dealing with that failure. If your codebase isn't architected as such, you will probably make things worse for the user by adding extra asserts.
I think that's what the people who are against asserts are afraid of, and rightfully so. But the answer shouldn't be "do not use asserts"; the answer should be to make the system more resilient to failure first.
In my view asserts are just a way of defining the contract of an interface that cannot be specified to the compiler/runtime any other way. They are a workaround for the limitations of those systems, or rather they extend those systems with infinite flexibility.
For example the NaCl "randombytes" function has the prototype:
void randombytes(unsigned char *buffer, unsigned long long length);
It has no way to report an error -- its contract is that it must fill the buffer with high quality random bytes before returning.
And the OpenBSD function "arc4random_buf" has the prototype:
void arc4random_buf(void *buf, size_t nbytes);
So we have a potential type difference between the two interfaces for the amount of data that we can generate.
If you wanted to back randombytes() by arc4random_buf() you have several options:
1. Handle the case within randombytes() where the (unsigned long long) value exceeds the maximum value of (size_t) by making repeated calls to arc4random_buf();
2. Assert, at compile time, that (size_t)'s range falls within (unsigned long long) and there can be no problem (since arc4random_buf() similarly cannot fail);
3. Redefine the contract of randombytes() such that specifying a buffer length that exceeds (size_t)'s maximum value is invalid -- this is really the same as #2, but instead of failing to compile if (SIZE_MAX < ULLONG_MAX), the program will compile and run fine as long as the contract is not violated. The assert in this case is a guard against the violation that should never occur, and helps to enforce the contract during development.
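A sketch of options 2 and 3, assuming C11's _Static_assert and a platform that provides arc4random_buf():

    #include <limits.h>
    #include <stdint.h>
    #include <stdlib.h>   /* arc4random_buf() on OpenBSD/macOS; elsewhere via libbsd */

    /* Option 2: refuse to build at all on a platform where an (unsigned long long)
     * length could exceed what arc4random_buf()'s size_t parameter can hold. */
    _Static_assert(SIZE_MAX >= ULLONG_MAX,
                   "need option 1: chunked arc4random_buf() calls");

    /* NaCl-style contract: fill the buffer with random bytes, no error return. */
    void randombytes(unsigned char *buffer, unsigned long long length)
    {
        /* Option 3 would instead keep a runtime guard here, e.g.
         * assert(length <= SIZE_MAX), and document the stricter contract. */
        arc4random_buf(buffer, (size_t)length);
    }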
With regards to production use of assert, catching a contract violation there means a lot of things have gone wrong. How likely things are to go wrong, and how much damage they do when they do go wrong, can vary from contract to contract. Various kinds of asserts may be used based on such things, from compile-time to run-time debug to run-time production to an inline debugging framework. There are trade-offs and no single answer for all cases.
The larger a codebase, the bigger the value of a crash 'close' to the root cause of the error. Assertions help you get - sometimes substantially - closer to the root cause of a problem, and any assertion that does not hold would lead to a crash - or worse, silent data corruption - anyway, so by all means, leave them on.
The article argues in favor of assertions in production code but acknowledges that the author's experience is from a very specific field with very specialized constraints and extensive design around recovering from aborted components.
This advice is not obviously applicable to different kinds of programming with different constraints and different environments. As with so many things: one solution does not fit all problems.
You're gainsaying, but others have dropped empirical evidence on the other side, averring a low to no downside; and my experience is all on the other side too. It would help if you gave some lived examples.
I think part of what is argued is that people need to look at their software and determine what failure modes are not acceptable. It is impossible to design a system that won't fail so the focus should be on designing parts that detect and handle failures. For those parts you may want to pull in techniques from aerospace such as command monitor setups, assertions, fail-safe modes, and force restarts.
Clearly not all software is flight critical but a lot of software could be improved by treating some parts of it as such.
> High reliability is not achieved by making perfect designs, it is achieved by making designs that are tolerant of failure. Runtime checking is essential to this, as when a fault is detected the program can go into a controlled state doing things like
Very good point. That's exactly the philosophy behind Erlang (the language and its VM; it also extends to Elixir and other BEAM languages).
It has isolated process heaps and process hierarchy supervisors. That is the most critical part of the deal, because it means crashes and failures are controlled and only a small part of the system is affected, without the failure spidering out and putting the rest of the system into an unknown state.
Microservices, or just using OS processes instead of threads, can kind of emulate that. But you can only have so many OS processes, as they can be pretty heavyweight. And with microservices there's a whole other stack of stuff involved, as opposed to just the language environment.
> I find these options far preferable than going into an unknown state and praying.
The idea of avoiding "unknown" state is key. That's also where crashing and restarting comes in - to get back to a known state. Of course you'd also want to log the failure so someone can eventually fix it. But, if a system is well designed, someone won't have to do it at 4 in the morning. I've seen systems crashing and restarting for weeks. Sounds terrible at first, but the alternative would have been having a service that's down completely and waking someone up in the middle of the night to fix the issue.
In my code, most asserts become log messages in release builds. This helps a lot when troubleshooting problems, since customers don't have debuggers installed. I often work on desktop software; unfortunately, it's impossible to achieve an avionics level of reliability because I don't have the luxury of a controlled hardware and software environment.
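One common shape for that, sketched as a C macro; a real product would write to its own log file or ring buffer rather than stderr:

    #include <assert.h>
    #include <stdio.h>

    #ifdef NDEBUG
    /* Release build: the condition is still evaluated, but a failure only
     * produces a log line that can be collected from a customer's machine. */
    #define CHECK(cond)                                               \
        do {                                                          \
            if (!(cond))                                              \
                fprintf(stderr, "CHECK failed: %s at %s:%d\n",        \
                        #cond, __FILE__, __LINE__);                   \
        } while (0)
    #else
    /* Debug build: behave like a normal assert and stop under the debugger. */
    #define CHECK(cond) assert(cond)
    #endif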
It doesn't fix everything, but judicious use of private data constructors and custom types can go a long way towards reducing the need for asserts. They can bubble up potential violations of preconditions and force handler code to be written.
Some Scala examples, but they're hopefully quite language agnostic:
By default macOS apps do not terminate on certain assertions within certain event handlers, because exceptions are caught by the framework. The app just does weird stuff after this happens. This is terrible. Once I figured out how to make it crash I fixed a handful of bugs and the number of reports of inexplicable behavior went way down.
I'm sure you're aware of this, but you can change this with NSSetUncaughtExceptionHandler: https://developer.apple.com/documentation/foundation/1409609.... By default Cocoa registers an exception handler that swallows exceptions and asks the user whether they want to continue after the exception is raised.
The biggest problem with assertions in Python code is that they throw a generic AssertionError when you should be throwing a custom exception class instead. You can't catch the right error (and nothing but the right error) that way.
The one time that it's actually a good idea to intentionally crash an app in production is if failing the assert could e.g. cause irreversible corruption of important data or worse, lead to a security vulnerability/exploit.
e.g. situations where you're writing to a buffer/disk, dealing with raw pointers, etc. (which hopefully shouldn't happen with good design, but is sometimes unavoidable)
Soft assertions. They crash under test and in development, but log in production. Monitor the occurrences in production and alert if they go above some threshold rate. The assertions should have an owning team, and potentially a DRI. This info should be a required argument to the soft-assert function. Otherwise you're just making a mess.
Hard assertions, for things that should never ever happen, or where crashing is preferable to continuing.
There are three main ways to reduce errors in code: tests, asserts and abstractions. All three have advantages and disadvantages and generally deal with different classes of errors.
In this thread, we discuss asserts, not abstractions. A monad is an abstraction. (On the other hand, many people only see tests and forget about the other two solutions, which are just as important.)
Assertions, yes, in so far as they mean "I know this can't happen, but I can't structurally prevent it in the program in an elegant fashion, with the tools I use". However, I can't see how monads are suddenly going to make this problem disappear.
As for throws/try-catch - beyond assertion use, they also report all problems that are outside your control (like IO failing due to HDD damage, or connection dropping). Those kinds of problems are here to stay.
Sure, monads provide an elegant way to structure code with error conditions, but I don't see a problem with using simple assertions, throws, try-catch to enforce invariants. Personally, I don't care about elegance as much as whether or not it works, as long as it is simple to use and maintain. I can return something like a Maybe/Either, or I can throw an exception - both approaches are fine by me if it forces the programmer to account for error cases.
Throwing is a side effect that breaks the flow of control for no good reason. It takes you down the same path as all other spaghetti code.
It's not so much about elegance as it is about purity. Minimizing side effects is the number one way to reduce bugs.
It should be the number one guiding principle when creating reliable software. Which means you simply cannot use the primitive try-catch or similar constructs. Don't break the flow of control. Guide it to a terminal value instead.
These days, when I see try-catch and if-else constructs (which is in most codebases) it’s clear there will be bugs over the life of the application.
It’s fine, use them, but there is a world of greatness when you ditch these faulty constructs. Just like ditching OOP constructs. All built on false premises.
If your application is interacting with the outside world, you will be dealing with side effects, and sometimes error conditions will happen.
For example, today I am working on a service that:
- uses a database
- calls external APIs
- publishes and consumes from a message broker
- interacts with local and remote filesystems over a variety of protocols
All of these things entail error conditions, most of which throw exceptions in the corresponding libraries. Sometimes they are converted to Either/Maybe, and sometimes they are wrapped in "native" checked exceptions.
The important bits:
1. The type system and compiler make sure the programmer has to deal with the error conditions at some point. From a programmer's perspective, a checked exception bubbling up the stack is not very different from returning a monadic object up the stack.
2. The core of the application is entirely pure. No exceptions (in both senses of the word). All side effects are pushed to the boundaries.
Nobody here is saying that monadic error handling isn't good (in case it wasn't abundantly clear, my comment was this thing that people call a "joke"). The issue is pmichalina flippantly disparaging asserts, one of the few good things left in this hell world:
It wasn't abundantly clear, but I'm often humorless.
Sorry for being so crusty. I just found out that about a third of my professional career was essentially wasted because I was too thick to adopt Prolog twenty years ago. It almost physically hurts to think about it.
Anyway, I think OP's right about assertions being a code smell: an assertion is literally the place where you tell the machine you don't know exactly what you're doing, eh?
I've been working towards an error-free mode of development, a combination of Dr. Margaret Hamilton's "Higher Order Software" concept that developed out of the Apollo 11 project with mathematical derivation of programs. I'm just about ready to "productize" it but I'm not going to market it to other developers. I'm going to sell it directly to consumers. They won't know enough to argue with the method, they'll just be able to use the computer to solve problems, I won't tell them that they are programmers.
Bottom line: error-free software development methods have been available for decades but the industry tends to ignore them. In 2018 writing "assert", "throw" or "raise", etc. is a code smell: you are doing it wrong. If I sound cranky it's because I am.
I hope you succeed, and I say that sincerely, but regardless of whether or not program synthesis should have been pursued more aggressively in the past, I'm not aware of any support for your statement that "error-free software development methods have been available for decades" for any reasonable definitions of "error-free" or "available".
>Anyway, I think OP's right about assertions being a code smell: an assertion is literally the place where you tell the machine you don't know exactly what you're doing, eh?
The whole point of the original article is that, today, the only thing keeping planes in the air is fault tolerance and lots of redundancy (see also: Erlang). Assertions are about humility and it's good to be humble.
>In 2018 writing "assert", "throw" or "raise", etc. is a code smell: you are doing it wrong.
But somehow writing ">>=" isn't? On one hand you seem to suggest that error handling is bad because there shouldn't be errors (what about I/O? user error?) while on the other hand you seem to be defending the position that monadic error handling is a panacea.
Brother, I really don't want to have this argument again. I don't care anymore if my peers believe me or not. I'm just going to bring it to market as best I can and see how it goes. Thank you for the well-wishing, and I apologize again for being so cranky.
A few lil points:
> I'm not aware of any support for your statement that "error-free software development methods have been available for decades" for any reasonable definitions of "error-free" or "available".
I know. You are making my point: The methods are there, and no one has ever even heard of them.
The primary method I am talking about was developed by Margaret Hamilton during the Apollo 11 mission to the Moon, and she has marketed it unsuccessfully ever since, AFAIK. Dijkstra once reviewed it and panned it harshly. It was a rare case of him not getting it. It went right over his head.
Hamilton, by the way, is the person who first coined the term "Software Engineering".
I'm not going to go into it further now, suffice it to say if you are typing text into a text editor to write code you are doing it wrong. There's a book, "System Design from Provably-Correct Constructs" by James Martin, that presents Hamilton's method if you're interested.
> The whole point of the original article is that, today, the only thing keeping planes in the air is fault tolerance and lots of redundancy (see also: Erlang). Assertions are about humility and it's good to be humble.
Planes stay in the air because they are designed to survive the failure of the individual parts. Now I would never suggest that proven-correct software cannot fail. Of course it can. Need I invoke the Cosmic Rays? I heed Murphy.
But you only write an assertion if you're not certain, eh? You're saying, in code, I think this will never ever happen, so if it does STOP-THE-LINE.
That kind of uncertainty is absurd, inexcusable, in software. (One of the very few realms that this is true, Herr Gödel notwithstanding. Software is electrified math, math is the "Art of What is Certain".)
I'm saying that our current common methods of software development are pathetic, that we are working far too hard to write software with far too many bugs, and it doesn't have to be this way. I'm further saying (to the OP) not to waste breath arguing with folks who can't or won't even accept the possibility! I've been doing it for a few years now and it's thankless. I've managed to convince exactly one other person and he's a brilliant programmer who came to it from a background in philosophy. He's a thinker rather than a puzzle-addict or complexity junkie.
Anyhow, if I manage to bring this thing together I'll make a noise here on HN (despite YC getting into bed with the Chinese Communists, the greedy fools.)
> On one hand you seem to suggest that error handling is bad because there shouldn't be errors (what about I/O? user error?) while on the other hand you seem to be defending the position that monadic error handling is a panacea.
Nope. Nothing like that. I can't explain it all here and now, but uh... Have you heard of Elm-lang? They boast zero front-end errors. Zero. The Elm compiler makes the JavaScript and it does not error. They do not write "error handling" to achieve this.
pp. 76-77 of Ousterhout's "A Philosophy of Software Design" talk about the issues with boilerplate try-catch. He basically does not approve, as they make readability suffer and often do not do what they advertise (truly catch errors.) I don't know how he would feel about production assertions, but I'm guessing he would buy into it if it helps reduce complexity. In the sense of finding remedies to troublesome bugs--which will occur.
This kind of absolutism only works in the lab and the journal.
The real world of software engineering is enormously varied, and each project faces a different set of legitimate constraints and objectives. It's more valuable to discuss how and when to use "assertions, throws, and try-catch[es]" than to pretend that they're universally superseded by some other construct.