Hacker News
Everything is broken (2014) (medium.com)
84 points by gpvos 66 days ago | 24 comments



One thing not addressed here is: I can make better software. I can fix these problems. I can find them first, and then fix them. I _can't_ do that when I'm under ridiculous time pressure from an "agile coach" who barely knows what an Excel macro is but who, for some reason, is responsible for deciding what gets prioritized and who gets a bonus and who gets downsized. I can't identify and correct complex technical problems when I'm attending a one-hour daily "scrum" explaining that yes, I'm still working on "task" 5398129 and yes, it was a 4 "point" story (because that's the maximum we're allowed), but as it turns out, almost everything interesting and useful is also complex -- in direct opposition to the mindset that drives the oxymoronic atrocity that people now refer to as "agile".


Another way I've seen this put: we can already write software that's reliable the way bridges are reliable. The only problem is that people don't want to pay for it, wait for it, or accommodate it.

Cleanroom development supposedly led to production code with 1 bug per 10k lines back in the 80s, and NASA famously managed Shuttle code with 0 bugs in 500,000 checked lines. Other projects like Ariane 5 were not so lucky, but the general error rate is still incredibly low. To my knowledge, software in general has killed fewer spacecraft than other narrow issues like voltage inconsistencies or launch vibrations. And after the Therac-25 incident, medical device software has followed a similar development track. Even automotive software, subjected to far fewer restrictions and more varied environments, has a better critical-failure track record than well-understood components like brakes and tires.

Producing software like that is orders of magnitude more expensive than producing 'normal' code. It takes far longer, in ways that more money can't neatly displace. It rarely talks to other systems, because doing so requires exhaustive testing at the interface (for every version), or testing both systems as one (with exponential testing requirements). And it's severely limited in range; freeform inputs and best-guess behavior when adjacent systems fail are out of the question.

There are institutional changes that could bring us better software under existing constraints, definitely. But it's also true that the general approach to design is sort of hopeless; the sheer amount of software getting built is inconsistent with what rigorous design looks like in pretty much any field.


> Producing software like that is orders of magnitude more expensive than producing 'normal' code.

Yes. Which I think points out that it's not exactly just a question of _will_ or _desire_, that we don't "want" to pay for it, wait for it, accommodate it.

It would mean anyone producing software would need _an order of magnitude more money_ to do it.

If we're talking about all software... it seems likely that our society literally could not afford it.

I suppose we could produce an order of magnitude _less but good software_ instead.

But we've built a society which is largely based on software, and been able to afford it as a society only because it's largely based on crappy software.


> it's not exactly just a question of _will_ or _desire_, that we don't "want" to pay for it... It would mean anyone producing software would need _an order of magnitude more money_ to do it.

This is a huge point, thank you.

I frequently see these discussions go down the road of personal responsibility, as in "you can't just say that it's ok for you to build things poorly because someone else would have done it if you didn't". Which is narrowly true, but strongly implies that bad code comes from individual programmers choosing to do things the easy way. The reality is that individual programmers can't choose to code this way, certainly not outside OSS. It's a structural difference that requires resources to enact.

> it seems likely that our society literally could not afford it... we've built a society which is largely based on software, and been able to afford it as a society only because it's largely based on crappy software.

And so there's the second major issue.

We're producing software at the sort of pace associated with simple consumer goods. If you're designing a new desk organizer, you can do it without much testing, meeting basic standards like "won't fall apart quickly during normal use" and "isn't made of something poisonous to touch". An engineer might well be involved, but they're looking at issues like ease of manufacture and shipment. Someone designing a basic website or app faces the same conditions: the product will fail if it doesn't work right or is unreliable, and they might face legal trouble if it directly causes harm during normal use.

I keep seeing people - including smart, influential programmers - suggest that software development lacks oversight and needs rigorous professional organizations and licenses to practice. Yonatan Zunger, as an example, draws a comparison to bridge collapses and the development of nuclear weapons, then proposes a licensing regime like those in civil engineering, law, and medicine. (https://twitter.com/yonatanzunger/status/975545527973462016?...)

Frankly, this seems nuts to me. 99.9%+ of software development is lower-stakes than bridge design, and it has enough benefits that constraining it to the pace and price of bridge design has real downsides.

I do see obvious spaces for improvement, like ensuring that there are meaningful costs for companies that lose PII while neglecting basic precautions. But if we're going to keep producing software at even a significant fraction of what we do today, there's probably no way around falling far short of 'best practices' development.


Yep. But I'm less sanguine than you about it. While it's true that most software is lower stakes than bridge design, I can't help but thinking basing huge and ever-increasing parts of our economic and social life on software that is _complete crap_ is going to bite us hard eventually.

> But if we're going to keep producing software at even a significant fraction of what we do today

As a software engineer, I kind of think maybe we _shouldn't be_. Yes, there's little likelihood of any way out of it at this point...

So if nothing else, I'm a lot less dismissive of efforts to promote professional standards than you are, as at least _some_ attempt to respond to the pile of crap we are helping to provide as a foundation for social and economic life. While some software needs more care than others, under capitalism software gets only as much care as required for short-term profits, and we get a giant pile of crap which will at best be a constant annoyance to all of us, and at worst increasingly harm us all.


> While it's true that most software is lower stakes than bridge design, I can't help but thinking basing huge and ever-increasing parts of our economic and social life on software that is _complete crap_ is going to bite us hard eventually.

Sadly, I think you're right.

I only laid out part of my view: most coding isn't comparable to bridge design and shouldn't be attempted in that manner, and attempts to seriously slow down software development are probably hopeless now that we're so immersed.

But I share your fears, because most software falls in the wide gulf between bridge design and desk organizer design. It may not do anything terribly high-stakes, but neither is it a single user per instance and a constrained failure mode. Even simple projects are often public-facing, and running alongside more significant things. Many are open to user input, rely on external dependencies, and gather relatively sensitive info like "who read what".

The standard failure case for software really has no comparison in mass-produced goods: one user employing a flaw in the product to harm another user. It's stuff like Magecart, which unpredictably harms users in settings where no one they knowingly dealt with was malicious or negligent. Worse, software tends to have major exposure to adjacent software and hardware; not only do attacks like XSS turn low-importance software into a threat vector, but major companies (e.g. Lenovo, Sony, Sennheiser) keep shipping hardware that compromises everything running on a machine. Honestly, software seems to fail more like monetary and political systems, where malice converts small mistakes into large, indirect harms.
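
To make the XSS point above concrete, here is a minimal sketch (names and strings are illustrative, not from any real incident): interpolating user input into HTML unescaped is what turns even low-importance software into a threat vector, and the fix is one standard-library call.

```python
# Minimal illustration of the XSS class mentioned above: user-supplied text
# interpolated into HTML runs as markup unless it is escaped first.
import html

user_input = '<script>alert("pwned")</script>'  # hypothetical attacker input

unsafe = f"<p>Hello {user_input}</p>"             # script executes in a browser
safe = f"<p>Hello {html.escape(user_input)}</p>"  # rendered as inert text

print(safe)
```

The asymmetry is the point: the mistake is invisible in normal use and only "fails" when a malicious user supplies the input.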

(And all of that is just how failures happen now. Crap like NPM left-pad suggests that we could see serious global outages over trivial errors, and any major threat like Heartbleed or Spectre could become a disaster if the wrong people get there first.)
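
(For scale, the entire left-pad package was roughly this much logic - the original was about eleven lines of JavaScript; this is a Python rendition of the same idea - and unpublishing it still broke builds across the ecosystem.)

```python
# A Python sketch of the infamous npm left-pad module: pad a string on the
# left with a fill character until it reaches the requested length.
def left_pad(s, length, ch=" "):
    s = str(s)
    while len(s) < length:
        s = ch + s
    return s

print(left_pad("7", 3, "0"))  # -> 007
```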

I'm still pretty dismissive of professional standards, but it's not for lack of concern. There are definitely appealing aspects, especially when I see things like companies using and defending plaintext password storage. I'd like to live in a world where people are at least told why that's bad practice before they do it, and maybe a world where there's some kind of authority to intercede with the idiots who steadfastly defend it after being taught.
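
For what it's worth, the alternative to plaintext storage is cheap and stdlib-only; a minimal sketch (function names here are illustrative, not from any particular codebase): store a salted, deliberately slow hash, never the password itself.

```python
# Sketch of salted password hashing with the Python standard library.
import hashlib, hmac, os

def hash_password(password):
    salt = os.urandom(16)  # unique per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, digest):
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison

salt, digest = hash_password("hunter2")
print(verify_password("hunter2", salt, digest))  # True
print(verify_password("wrong", salt, digest))    # False
```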

I just fundamentally don't think they'll work for most of what ails us, and I expect bad faith to become a problem almost immediately. The examples I see cited aren't just fields with high-stakes work, but ones with defined owners producing linear throughput; a few engineers design one bridge, a doctor treats one patient at a time. Indirect failures happen, bolts shear and drugs have side effects, but even there chain of custody and area of responsibility can be clearly defined. But software seems to function more like fields such as banking, politics, or even intelligence, where effects are nonlinear, often extraterritorial, and ownership of output isn't clear. Professional standards bodies in those fields, even ostensibly powerful ones, seem to incessantly come up short or late. And that's before the infighting starts; already it seems like most calls for standards groups slide near-instantly from "banning malpractice" to "banning work I consider immoral".

Perhaps I'm not so much sanguine as fatalistic. There is too much software being produced, but I'm not sure how to fight that without crippling existing dependencies. There are a few legal changes I'd very much like to see, centered around making companies financially responsible for harms from bad practice so that they at least have to worry about profits. But overall, this feels more like a dilemma than a problem; we've already baked in so many major risks that it's not clear we have a way back.


> Producing software like that is orders of magnitude more expensive than producing 'normal' code. It takes far longer, in ways that more money can't neatly displace.

Absolutely, which is why the solution isn't to say "everyone write software like NASA"; it's to work on reducing the cost of writing correct software while acknowledging that the average dev is, well, average (and I include myself in the category of mediocrity).

We need better tools and better languages that have all the safety rails possible and come with extra bubble wrap.


> it's to work on reducing the cost of writing correct software while acknowledging that the average dev is well average

Absolutely agreed.

I wrote the above because I'm frustrated by how often these conversations lead to either calls for more diligence by individual programmers (who are both average and likely working in broken systems), or draconian oversight approaches that amount to treating website developers as though they're designing bridges.

The really interesting things in this domain aren't attempts to force programmers to do a better job, but to make doing a better job easier. It's stuff like Rust attempting to break new ground in safe-by-default memory management, or Let’s Encrypt lowering the bar to getting set up with TLS. Oauth has some downsides, but it's certainly helped move us away from a world where every random site makes its own sloppy attempt to manage user auth. For that matter, IDEs deserve credit for reacting to common classes of mistake; defining a local variable then operating on the input instead is the sort of thing that's easier to catch with an automated warning than with diligence or even tests.
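
The shadowing mistake described above looks roughly like this (a hypothetical example; any linter or IDE flags the unused local, which is far more reliable than eyeballing it):

```python
# The bug: a cleaned-up local is computed, but the raw input is returned,
# so the normalization silently does nothing. Tools flag `cleaned` as unused.
def normalize(name):
    cleaned = name.strip().lower()
    return name  # bug: should return `cleaned`

def normalize_fixed(name):
    cleaned = name.strip().lower()
    return cleaned

print(normalize("  Alice "))        # the bug slips through unchanged
print(normalize_fixed("  Alice "))  # 'alice'
```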

This sort of change has already taken web reliability from "sketchy at best" to "surprisingly good"; common hosts like AWS make basic load balancing and scaling straightforward, and CDNs and DDoS mitigation have become increasingly standard. We're not going to be free of bugs anytime soon, but I think there's good reason to expect actual progress.


That's because security doesn't bring in money for most products.

People are OK with their private lives being violated; they don't mind discreet censorship or mass monitoring.

Since we make money from those users, and because the market moves fast, the only security most companies are willing to pay for is protection against script kiddies and bad PR.

But there is a reason the average Joe doesn't care about all those things: it works for him, most of the time. The number of people whose lives are seriously disrupted by a security flaw is small. Even across the last decades of data leaks, the individuals actually suffering the consequences are few.

So economically and socially, it's been a winning strategy up to now. Just like pollution, mass consumption, etc.


100% this. I also find it extremely annoying to have to give daily status updates on highly complex, time-consuming problems. It shows a complete lack of understanding of the work being done.

The sooner we move on from this Agile garbage the better.


Agree. Over many years I've realised I actually found waterfall great as a dev. There the worst that could happen was a clueless PM poking in every week and having to be given a vague estimate of how done we were with various tasks. Sure, it probably sucked for that PM, but I was having a good time. Now? Sometimes I have to meaninglessly sum up where I am with a piddly ‘agile’ task MULTIPLE times a DAY. It takes more time to update jira/managers/coworkers than to just do the damn thing. Really winds me up.


Yes, I strongly prefer waterfall. Back in the good old days we had a biz. quarter to deliver a set of features. This allowed time for a spec., thinking about how it would all work, doing a design doc, working with QA on how they'd effectively test the features etc. Weekly status updates were adequate and an appropriate "polling frequency".

All of that is now thrown out the window in the mad hacking frenzy to make progress on the stupid "points" and try and show progress at the dreaded daily standup meeting (even though it's not supposed to be a progress meeting).

Then, to make it worse, various teams' progress is compared by looking at "points completed" without any kind of normalization of difficulty or risk across the different teams.

Then the final insult is the whole debacle is deemed a "success" because we've "switched to Agile", thereby checking off a box that some CTO with too much time on his hands read about in some MIT technology review.

I would actively refuse any job offering "Agile development" unless they had a very practical & pragmatic way of implementing it. The only ones getting any benefit from this crap are the Agile consultants and the PMs.


I wish you had used "task" 19143203 instead ;)


I agree with most of the article. However, what I do disagree with is how they lump "security experts" into a single category:

> Computer experts like to pretend they use a whole different, more awesome class of software that they understand, that is made of shiny mathematical perfection and whose interfaces happen to have been shat out of the business end of a choleric donkey.

I'm sorry, but there are vastly different grades of security experts. Security experts make Kali Linux, and I'm pretty sure everyone runs their user as root despite it being created by security experts.

Now, look at the OpenBSD developers in comparison. Sure, bugs are found as they inevitably are, but they make it very difficult to take advantage of bugs that might be disastrous on other operating systems. They use privilege separation throughout their operating system (and packages where possible), recently announced their way of making ROP-chain exploits basically useless, and relink their kernel every time it's booted so that no two instances are alike (even if it's the same version on another computer). Using defense in depth is key. Unfortunately, it's easy in this field to talk yourself up without walking the walk.

There's a reason OpenSSH is so widely deployed and yet isn't constantly hit by RCE bugs. Sure, there are bugs (as all software inevitably has), but there are definitely different degrees of security experts, and the article fails to mention that, lumping them all in one bucket.


Interesting article.

The first 75% I would send to my non-technical friends and family as well as the technical ones, as an awareness builder.

The last 25% is a surprising call to action (surprising because, from the tone of the article, I expected it to be a fun rant only). It's tricky because a lot of people tend to discount knowledge imparted once they perceive it was done with the objective of persuading them to The Cause. It makes the informational portion more suspect of cherry-picking or bias. In other words, while I personally agree both with the technical rant and the call to action, I feel the majority of my friends would react negatively to the two of them together, just on general principles/reflexes.

Not sure what the solution would be; perhaps two linked articles - first the rant, and then a link to a "Proposed solution" article at the bottom? Most people I know would react better if they felt they were in control of asking for solution guidance, rather than part of a seeming bait & switch...


Postgres was misusing fsync for 20 years. Yes, everything is broken.
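
(For context, the subtlety is that write() succeeding says nothing about durability; you must fsync and treat any fsync error as data loss, since a failed fsync may drop the dirty pages rather than retry them. A minimal sketch of the careful pattern, with an illustrative helper name:)

```python
# Sketch of a durable write: fsync before trusting the data is on disk, and
# treat an fsync error as fatal rather than something to silently retry.
import os, tempfile

def durable_write(path, data):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)  # if this raises, the data must be considered lost
    finally:
        os.close(fd)

path = os.path.join(tempfile.mkdtemp(), "state.bin")
durable_write(path, b"hello")
with open(path, "rb") as f:
    print(f.read())  # b'hello'
```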

I’m actually thinking it might be worth snapshotting my bank accounts, to make sure super reliable bank software doesn’t lose my money. At the end of the day it is written by the same folks as me :)


The number of programming fuckups (or bugs - call them what you want) causing me real-life stress seems to be increasing exponentially as I'm getting older.


That's because the amount of software affecting your real world ("meatspace") life is increasing exponentially without a corresponding decline in bugs-per-line.

This is a real problem, we have to adopt exponentially safer code generation methods or reduce the absolute amount of code in use, or both. (Or we can kiss our asses goodbye.)


I would say the number of bugs increases because, as soon as you push stuff out, there is never enough time to refactor and fix. Because, well... too many people think it "does not bring money".


Not completely sure about that. I grew up in the early industry and it was rife with complete crap software. Really, the only thing somewhat protecting the software at that time was the lack of internet access and threats. Attempting to use that same software now would lead to it being exploited in minutes.

I remember messing with Windows 95 with IE 3 installed. It was really easy to create a NetMeeting shortcut that would crash it with a buffer overflow error.


>The number of programming fuckups (or bugs - call them what you want) causing me real life stress seems to be increasing exponentially as I'm getting older

Why is that? Sure, the number of those fuckups might increase, but the number of fucks one gives as they're getting older should decrease!


Reminds me of Programming Sucks: https://www.stilldrinking.org/programming-sucks


A classic perennial favorite.

Everyone shakes their head knowingly about the problem, but whenever someone here suggests using safer and saner tools, the response is howls of protest.

"Buggy code is the developer's fault. Just try harder!"

"It's BNF/not BNF!"

"It uses/doesn't use types!"

"It doesn't let you achieve the last epsilon of speed!"

"It doesn't allow/does wrong my preferred form of prototypes/mixins/inheritance!"

"Everybody is using something else!"

"This is fine."


"Facebook and Google seem very powerful, but they live about a week from total ruin all the time. ... The US government would fall to a general revolt in a matter of days. It wouldn’t take a total defection or a general revolt to change everything, because corporations and governments would rather bend to demands than die. These entities do everything they can get away with — but we’ve forgotten that we’re the ones that are letting them get away with things."

Amen is all I have to say to that.



