The shockingly obsolete code of bash (erratasec.com)
52 points by sumoruman on Sept 28, 2014 | 68 comments



Among some valid criticism, the author complains a lot about trivial things. He also thinks that assignment inside the comparison part of a for loop is 'weird'.

Sure, it could and perhaps even should be avoided in this particular case, but in general it's a useful and common C idiom. Compiler writers are aware of that and avoid issuing warnings if you parenthesize the offending assignment expression.

He also compares pointer values against NULL. If I were as uncharitable as he has been, I'd argue that this shows a 'shocking' lack of knowledge about how C works. In C, comparing against NULL is as 'bad' as comparing boolean expressions against true and false in other languages.
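
To illustrate both points, a minimal sketch (made-up variable names, not the bash source): the doubled parentheses mark the assignment as deliberate, and the implicit null test is equivalent to the explicit one.

    #include <stdio.h>

    int main(void)
    {
        char *env[] = { "PATH=/bin", "HOME=/root", NULL };
        char **p = env;
        char *string;

        /* Assignment inside the condition; the extra parentheses tell the
           compiler the assignment is intentional, so -Wparentheses stays
           quiet. */
        while ((string = *p++))
            printf("%s\n", string);

        /* Equivalent, with the comparison spelled out. In C a pointer in a
           boolean context is already tested against null, so "!= NULL" is
           as redundant as writing "== true" in other languages. */
        p = env;
        while ((string = *p++) != NULL)
            printf("%s\n", string);

        return 0;
    }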

edited: be less cranky


Compiler writers are aware of that and avoid issuing warnings if you parenthesize the offending assignment expression.

I see no parentheses in that code.


Which would have been a valid complaint.


Out of interest, where exactly is the source code history for bash?


There's https://ftp.gnu.org/gnu/bash/ starting with 1.14.7 and containing diffs back to 1.14.0.

There's also a corresponding git repository, also starting at 1.14.7: http://git.savannah.gnu.org/cgit/bash.git/log/?ofs=50

No idea about earlier revisions.


Sigh. A new famous bug, and once again everyone feels obliged to open up the code and express their tsk-tsk: how bad, how bad.


The best part is when he concludes by passing the buck to "somebody" to fix it.


That ending really galled me as well and I immediately thought of this old yarn:

There was an important job to be done and Everybody was sure that Somebody would do it.

Anybody could have done it, but Nobody did it.

Somebody got angry about that because it was Everybody's job.

Everybody thought that Anybody could do it, but Nobody realized that Everybody wouldn't do it.

It ended up that Everybody blamed Somebody when Nobody did what Anybody could have done.


I had hoped that at least there was some new vulnerability discovered in the code, but he is just bashing the coding style. While it's true that the code could be rewritten to be more readable, nobody is willing to invest that amount of work. The whole issue is blown out of proportion because people use bash in an unsafe manner.


Whilst I feel that making personal attacks on the writer is no good at all, it does look like bash isn't using modern C, and is using nifty shorthands that don't make the code particularly clear.

GNU wants freedom for people to distribute code. But do they want us to have the freedom to understand their code? :-)


Yeah, the code could be more readable by today's standards. OTOH, it's using style and idioms that were common practice when it was written.

Technical debt is something you accrue when you deliberately violate a current best practice for gains elsewhere (usually meeting a deadline). It's not generally used to refer to code that's simply old.

Anyone 20 years from now looking at a snippet of code you write today is going to have similar complaints.


I'm sure they would accept clean-up patches.


I think it would depend what you're doing though. Some of these aren't really a big deal, but they'd create tons of code churn if you went and changed them. They probably don't have enough people to review that many code changes for not a lot of gain. I don't know anything about the maintainers though, so I'm just assuming.


Hey, at least people are looking now. Of course these condescending blog posts serve little purpose, but the author could've noticed something more important in the process of writing it.

Imagine if heartbleed happened and everyone just moved on because bugs happen, deal with it.


at the risk of sounding rude, this guy probably only learned C last week. i mean, all but one of his complaints are noise.

> k&r function headers

unusual, but totally valid, even in C99. a minor syntactical difference. not a style i'd use, but no different to the alternative. and if a static analyser would find this confusing, you probably shouldn't use it.
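
For readers who haven't seen the old style, a small side-by-side sketch with an invented function (not taken from bash):

    #include <stdio.h>

    /* K&R-style definition: parameter types are declared between the
       parameter list and the body. Obsolescent, but still accepted. */
    int
    add_old(a, b)
         int a;
         int b;
    {
        return a + b;
    }

    /* Equivalent prototype-style (ANSI C89 and later) definition. */
    int add_new(int a, int b)
    {
        return a + b;
    }

    int main(void)
    {
        printf("%d %d\n", add_old(2, 3), add_new(2, 3));
        return 0;
    }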

> global variables

i buy this. although your figure of "5" global variables is totally arbitrary. "0, 1, or infinite" principle.

> lol wat?

granted, this for loop is arcane, but it's obvious what it does. note that the suggested alternative increments the index at the end of the loop body, so the index holds a different value inside the body. that difference might not matter here, but perhaps it's a style used somewhere else where it does matter, and it's been used here to maintain consistency.

regardless, it's probably better to use

    const size_t len = strlen(env);
    for(size_t i=0; i<len; i++)
      //etc
and using an assignment expression in a boolean context is totally valid.

> no banned functions

at this point i decided the author clearly does not know what he's talking about, and has joined a bandwagon of code-hate.

banned functions. unsafe memory moves. "what if the destination is too small for the result?". urgh. look at the code that was pasted. strcpy obviously has pitfalls that can easily overwrite random bits of memory, but this code cannot possibly exhibit them. if you want to complain about a safe use of strcpy because of the function's potential unsafeness, well, you may as well argue that C is an unsafe language, as the runtime doesn't catch out-of-bounds accesses, etc.
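
A sketch of the distinction (made-up buffers, not the pasted bash code): strcpy only bites when the destination can be smaller than the source.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        const char *src = "some environment string";

        /* Safe pattern: the destination is allocated from the measured
           length of the source, so strcpy cannot overflow it. */
        char *copy = malloc(strlen(src) + 1);
        if (copy == NULL)
            return 1;
        strcpy(copy, src);
        printf("%s\n", copy);
        free(copy);

        /* Dangerous pattern: a fixed-size buffer that may be smaller than
           the source; this is the case the "banned functions" advice is
           really about. */
        /* char small[8]; strcpy(small, src);  -- would overflow */

        return 0;
    }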

> where to go from here

if you've got style complaints about bash's source, change it, and submit patches. blog posts like the one posted do not help anyone.


Is this trolling?

Maximum of 5 global variables? For loop syntax? "objective" measures of code quality?

This kind of code shows up in any legacy code base, and style isn't what leads to bad code. Bash had a parser bug, a bug which anyone could have introduced irrespective of style.


The only mention of shellshock, "starting with the function that is at the heart of the #shellshock bug", is also the only connection to it.

I suppose the objective measures he refers to are the static analysis he mentions in the next paragraph.

It is hardly controversial that global state is bad, and that code should be clear rather than clever.


code should be clear rather than clever

But clarity depends on context. Personally, in context of the C language, I find code like

    while ((string = *env++))
(which is how the offending loop could have been written as well) perfectly clear. I doubt someone coming from a, say, Java-background would agree.

Similarly, he complains that the loop index was called string_index instead of the 'far superior' i.

With C99, I'd agree. But in a C90 codebase, do you really want to have declarations like

    int i, j;
instead of

    int char_index, string_index;
at the top of a block?
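
A small sketch of that trade-off, with hypothetical loops rather than the bash code: in C90 every declaration sits at the top of the block, far from its use, so descriptive names pull more weight; in C99 the declaration can travel with the loop, where a short i reads fine.

    #include <stdio.h>

    #define ROWS 3
    #define COLS 4

    int main(void)
    {
        /* C90 style: declarations at the top of the block, away from the
           loops, so descriptive names help the reader keep track. */
        {
            int char_index, string_index;
            for (string_index = 0; string_index < ROWS; string_index++)
                for (char_index = 0; char_index < COLS; char_index++)
                    printf("%d,%d ", string_index, char_index);
            printf("\n");
        }

        /* C99 style: the index is declared right in the for statement,
           so a short name is fine. */
        for (int i = 0; i < ROWS; i++)
            for (int j = 0; j < COLS; j++)
                printf("%d,%d ", i, j);
        printf("\n");

        return 0;
    }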


Really? Complaining about the style of a for-loop? They are scraping the bottom of the barrel here.


The whole post reads like an attempt by some security consultancy to capitalize on a currently high-profile issue by writing an "analysis" based on looking at the source code for 10-15 minutes and picking out a few arbitrary things to write a blog post about.


The main issue, I think, is that if something works, nobody wants to fix it. Especially if that something is full of 80s code.

Bash works, pretty well, for a lot of people. So nobody wants to touch it.


Ironically, the shit is free, fork it and have at it! Contribute fixes... "Security guys" don't do that though, do they?

There are a whole lot of really important pieces of code that not a lot of people find sexy to work on.


Yeah, but it works already, and there are also so many features that it's hard to know what could be broken by randomly fiddling around with the code.

(I suppose bash does not have a test suite.)


I think code should be able to hang around for 30 years without automatically being called "shockingly obsolete." Are there choices the programmers could have made that would have held up better over the years?


It's easy to bash Bash with the benefit of hindsight.

I'd be a lot more interested if someone had performed this analysis of Bash and then created some search thing that found similar programming style in other tools, especially if they also found interesting bugs.


then created some search thing that found similar programming style in other tools, especially if they also found interesting bugs.

It's called static analysis, and (judging by the tools we use) such tools probably already ship with rules that would bark loudly at this code.
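
As a trivial example of the kind of rule meant here, even plain compiler warnings flag the pattern; a minimal sketch, assuming GCC or Clang with -Wall (which enables -Wparentheses):

    #include <stdio.h>

    int main(void)
    {
        int x = 0, y = 1;

        /* With -Wall, GCC and Clang warn here, suggesting parentheses
           around an assignment used as a truth value. */
        if (x = y)
            printf("assignment, probably a typo for ==\n");

        /* The doubled parentheses mark the assignment as intentional and
           silence the warning. */
        if ((x = y))
            printf("intentional assignment\n");

        return 0;
    }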


Isn't that what caused the Debian random number bug? Valgrind and Purify spotted weird code; a Debian package maintainer removed it and submitted a patch upstream; upstream didn't notice it.


Yep, it sure was. Bugs introduced because a developer is unfamiliar with the codebase are a thing that happens.


"Global variables are bad. Your program should have a maximum of five, for such things as the global debug or logging flag."

Just exactly five and no more, huh? Better rinse that number off, because we both know where you got it from.


I have to say, this one made me laugh and fit quite well with the rest of the article ...


All it does is confuse programmers

...programmers who don't know C? None of this is particularly tricky C code.

While these functions can be used safely by careful programmers, it's simply better to ban their use altogether.

"If there's a chance something could be dangerous, ban it"? This feels like the same sort of reasoning that the government uses to deprive people of personal freedoms... all in the name of "security". It's not unexpected given these people are in the security industry, but sometimes I wonder what kind of world they really want...

The whole article has the feel of being written by someone to whom "best practices" are gospel, and it misses the extremely simple reason behind the bug: accidentally reusing a function that does both command parsing and execution to evaluate function definitions. The reason for this misuse may be that command execution is being conflated with evaluating function definitions, but more importantly, none of the complaints raised in the article matter; neither changing function definitions to C89+, removing global variables, rewriting the style of loops and renaming their variables, nor using "safer" string functions would've eliminated the bug. Even with all of these, the same mistake could have been made. It's a little amusing that one of the gripes is about variable naming, when I think one of the functions involved in the bug couldn't be named any clearer: parse_and_execute().

There is a certain allure to ticking off some list of "best practices" and thinking that it'll somehow solve everything, but as this bug shows, there's simply no replacement for thinking carefully about the code and what it does when implementing functionality.
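
To make that concrete, a deliberately simplified sketch; the stub below is not bash's actual code and parse_and_execute's real signature differs, but it shows why none of the style complaints touch the underlying mistake: the whole variable value, trailing commands included, reaches the parser.

    #include <stdio.h>
    #include <string.h>

    /* Stub standing in for bash's parser/executor: here it just shows
       what would be evaluated. */
    static void parse_and_execute_stub(const char *text)
    {
        printf("would parse and execute: %s\n", text);
    }

    int main(void)
    {
        /* An exported function definition as it arrives via the
           environment, with an extra command appended by an attacker. */
        const char *value = "() { :; }; echo pwned";

        /* The import path recognizes the "() {" prefix as a function
           definition and feeds the whole value to the parser; however
           cleanly the surrounding loops and string handling are written,
           "echo pwned" is still part of what gets parsed and executed. */
        if (strncmp(value, "() {", 4) == 0)
            parse_and_execute_stub(value);

        return 0;
    }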


Not by themselves, but there has been more than one bug in this code: http://lcamtuf.blogspot.ca/2014/09/bash-bug-apply-unofficial...


As much as I think the fact that assignment in C returns a value causes barbaric, illegible code, C programmers seem to love it (similar uses of increment squick me just as much: return a value or change state, not both). Like it or not, it's a popular feature, so it's silly to argue that the loop isn't idiomatic. The plethora of globals is more worrying to me.

The loop construct shows a coder who is obsessed with terse code, but knows their crap. The globals show somebody who doesn't care about sensibly organized code.


As much as I think the fact that assignment in C returns a value causes barbaric, illegible code, C programmers seem to love it

C doesn't have all the nice sugar that comes with newer or more high-level languages.

Sure, it would look nicer if we were able to write things like

    for (char cc : stdin.chars) { ... }
    for (string : iterator(env)) { ... }
But that's not an option, so we make do with what's available:

    for (int cc; (cc = getchar()) != EOF; ) { ... }
    while ((string = *env++)) { ... }
Assignment expressions are crucial for that.
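
For reference, a complete, runnable version of the first idiom (a minimal cat-like filter):

    #include <stdio.h>

    /* Copy stdin to stdout one character at a time; the assignment to cc
       happens inside the loop condition, which is exactly the idiom the
       parent comment describes. */
    int main(void)
    {
        for (int cc; (cc = getchar()) != EOF; )
            putchar(cc);
        return 0;
    }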


For me, one of the advantages of C is that it lacks syntactic sugar. Writing a couple of extra characters is an acceptable tradeoff. Keeping it as simple as possible, with as few operators and keywords, is better.


These kinds of code issues aren't unique to bash. Just take a look at almost any FOSS project with a bit of history, say GCC, GNU binutils or Xorg, and you'll find the same issues in those projects.

IMNSHO the real issue is with the developers themselves. A huge number of them seem to be stuck in the 80s, and anything invented since then in the field of software engineering is just ignored by them. Like, there are plenty of programming languages now with saner string handling, but no, C89 or pre-ANSI C it is.


> almost any FOSS project

Though we will probably never know, I bet having legacy code isn't exclusive to open source projects.


How long would it take to re-write the code base to modern standards?

Perhaps some of the very large companies who consume this code could contribute to the cost.


I think your great-great-great grandchildren have other plans than to be introducing new bugs in a misguided quest to make old code more aesthetically pleasing.


Exactly the point I was making. Rapid response to vulnerabilities found and perhaps even a bounty on new ones might be the more time efficient approach.


Remember a lot of these projects have portability goals. They are not just developing on Linux on an x86 architecture.

C is one of the few choices that is available on most computing platforms.


If only instead of blogging about how “badly programmed” bash is people went over to the mailing list and helped clean up the code...


Coding standards will only get you so far. The bigger issue in my mind is the risk of having such a powerful shell at the core of arbitrary software stacks.

What I hope comes out of this is that Linux distributions standardize on a bare-bones shell for system-level and even application-level scripts, and more powerful shells are used only for interactive sessions.


But the POSIX shell sh, which is part of the UNIX standard, is not that much less powerful (it's really not that much different from bash). So you would have to get the UNIX group to break backward compatibility (of 40 years) and downgrade the shell to something less powerful.

But why? We have had UNIX shells since 1972 and they work and work well for what they were designed.

Also, the criticism of code style is unfounded. It was written ages ago to the standards of the day, tested to work, and left alone, as it should have been (it works, works well, and why touch it?). It may look strange to people who have only started programming in the last couple of years and who would happily rewrite every piece of working code to conform to their idea of coding standards and how things should be done.


> What I hope comes out of this is that Linux distributions standardize on a bare-bones shell for system-level and even application-level scripts, and more powerful shells are used only for interactive sessions.

It's called POSIX, aka ISO/IEC 9945:

  http://www.unix.org/version3/iso_std.html


Yes, smart guy. I mean in practice.


Debian’s Almquist Shell seems like a step in the right direction. While it has more features than the `system()` call might need, it’s a more pared-down shell that retains POSIX compatibility.


The article assumes that "best practice" in 80s was the same as it is today. It was not. Best practice evolves.


Best Practice evolves because of lessons learned about the problems in previous practice. I think the point of the article isn't that Bash wasn't originally written with 2014 Best Practices, but that it hasn't been kept up to date and therefore hasn't been improved based on those lessons learned.


Not to mention that it is fscking portable and no, it's not easy at all to use all the shiny new features on that IBM whatever that it perfectly runs on.


I think tons of globals smeared across several files was bad practice back in the 80s.


IIRC they were not considered to be the pinnacle of good style, but they were much more acceptable than they are today.

Interesting fact: In the early 60s there was a serious discussion about whether stack frames (called activation records back then) should be global or not.


The time would be better spent creating a pull request instead of a blog post without any added value.


If only it was on GitHub or Bitbucket ;)


Well, it is available on git here: http://git.savannah.gnu.org/cgit/bash.git/


I do not find it shocking that 25-year-old code is obsolete.

Heck, most any project I've worked on that was started more than 3 years ago has obsolete code. If it was started before it was a near-universal habit to write tests, oh yeah we're scared to touch it.

But I agree with the author that this is a sign of things to come. Our digital empire is built on layers upon layers of bad code, and the economic cost to fix it would be enormous. And that's what's shocking. Expect things to get a lot worse before they (maybe?) get better.


It is not obsolete; it uses obsolete technologies and conventions.

Using obsolete technologies does not imply bad code, either.

One doesn't see blog posts complaining about the shockingly obsolete stuff in the Eiffel Tower, either (wrought iron and rivets; why don't they rebuild it in welded steel, or, better yet, 3D print it in steel?), or about the over-engineering in Gothic cathedrals (the ones still standing use more stone than necessary, now that we know how to do strength calculations).

Yes, if you want to make strong claims about the security or performance of bash, its code probably needs work, but there is plenty of newer stuff that also has its issues (for example, the typical mobile phone OS needs quite a few iterations to get its lock screen right, the average brand-new thing also has security issues, etc)


If someone wants to make a modern new shell with no obsolete concepts in it, they should take this as an opportunity to do so.

With no backward compatibility to think of, I can see several nice goals to have. It should use a modern language like python, bridging the unnecessary gap between scripting languages and shell. It should remove environment variables in favor of IPC like kdbus.

And while at it, removing the concept of the tty would get rid of a huge obsolete concept from a bygone age.


MS tried that with powershell. It sucked.


> The philosophy of the Linux kernel is a good one to emulate: documented functionality describing how the kernel should behave will be maintained until Linus's death. Undocumented, undefined behavior that stupid programs depend upon won't be maintained.

I think this is utterly incorrect. Linux principle is "don't break user-space unless we really really have to". That includes dependence on undocumented behavior.


The problem, it seems to me, is that bash is written in an obsolete language. It isn't just a matter of style; the fact that the language and compiler help you so little is a clear sign that the technology itself is obsolete and unsuitable for a critical piece of software.


That you do not use or like it doesn't make a language obsolete.


GNU code in general is hardly beautiful and has plenty of anachronisms. I'm not sure why the author has never bothered to look at the source code of his underlying base system until now.


Complaining about global mutable variables, clarifying loop indexes and termination condition ... The author is ready to learn ML.


And so begins the internet pile on.


This has been a bug since the early 90s. With both heartbleed and this coming in the same year, is it time to stop thinking and teaching the concept that open source inherently makes bugs easier to find?


Doesn't it? Whether or not a product is open source, if no one thinks there's a problem in a code base, no one is going to find the bug. It's only when you either a) are prompted to do an evaluation of the component for pay, education, or self-gratification, or b) come across a problem behavior and are trying to understand what causes it, that you discover such things.

But without the source code, you can do neither of these things easily. The likelihood of the a) case occurring is lower by virtue of accessibility.

With the source code and the development history of the product in the open, we can at least see whether a misbehavior was intentional, whether the exploited logic was questionable, or whether there was simply a mistake in the original thinking. How easily would one have come to those conclusions if the source had been private? The response to the issue would have been very different, and I fail to see how keeping the source closed would have made it any less likely that a bug from the 80s would still be there.

I think it's even more important now than ever to keep source open to make sure bugs can be easily found; even if the benefits are marginal to discover, they are certainly better handled in the aftermath.


What do you suggest the alternative should be to make bugs inherently easier to find?



