
The shockingly obsolete code of bash - sumoruman
http://blog.erratasec.com/2014/09/the-shockingly-bad-code-of-bash.html
======
cygx
Among some valid criticism, the author complains a lot about trivial things.
He also thinks that assignment inside the comparison part of a for loop is
'weird'.

Sure, it could and perhaps even should be avoided in this particular case, but
in general it's a useful and common C idiom. Compiler writers are aware of
that and avoid issuing warnings if you parenthesize the offending assignment
expression.

He also compares pointer values against NULL. If I were as uncharitable as he
has been, I'd argue that this shows a 'shocking' lack of knowledge about how C
works. In C, comparing against NULL is as 'bad' as comparing boolean
expressions against true and false in other languages.

 _edited: be less cranky_

~~~
chris_wot
_Compiler writers are aware of that and avoid issuing warnings if you
parenthesize the offending assignment expression._

I see no parentheses in that code.

~~~
cygx
Which would have been a valid complaint.

~~~
chris_wot
Out of interest, where exactly is the source code history for bash?

~~~
cygx
There's [https://ftp.gnu.org/gnu/bash/](https://ftp.gnu.org/gnu/bash/)
starting with 1.14.7 and containing diffs back to 1.14.0.

There's also a corresponding git repository, also starting at 1.14.7:
[http://git.savannah.gnu.org/cgit/bash.git/log/?ofs=50](http://git.savannah.gnu.org/cgit/bash.git/log/?ofs=50)

No idea about earlier revisions.

------
lelf
_Sigh_. New famous bug, and again it’s everyone’s obligation to open the code
and express their tsk-tsk, how bad, how bad.

~~~
mistaken
I had hoped that at least there was some new vulnerability discovered in the
code, but he is just bashing about coding style. While it's true that the code
could be rewritten to be more readable, nobody is willing to invest that
amount of work. The whole issue is blown out of proportion because people use
bash in an unsafe manner.

~~~
chris_wot
Whilst I feel that making personal attacks on the writer are no good at all,
it does look like bash isn't using modern C, and is using nifty shorthands
that doesn't make the code particularly clear.

GNU wants freedom for people to distribute code. But do they want us to have
the freedom to _understand_ their code? :-)

~~~
jimktrains2
I'm sure they would accept clean-up patches.

~~~
DSMan195276
I think it would depend what you're doing though. Some of these aren't really
a big deal, but they'd create tons of code churn if you went and changed them.
They probably don't have enough people to review that many code changes for
not a lot of gain. I don't know anything about the maintainers though, so I'm
just assuming.

------
foxhill
at the risk of sounding rude, this guy probably only learned C last week. i
mean, all but one of his complaints are noise.

> k&r function headers

unusual, but totally valid, even in C99. a minor syntactical difference. not a
style i'd use, but no different to the alternative. and if a static analyser
would find this confusing, you probably shouldn't use it.

> global variables

i buy this. although your figure of "5" global variables is totally arbitrary.
"0, 1, or infinite" principle.

> lol wat?

granted this for loop is arcane, but it's obvious what it does. note that the
suggested alternative increments the index at the end of the loop body, etc.
might not have been exploited there, but perhaps it's a style used somewhere
else where it _does_ matter, and it's been used here to maintain consistency.

regardless, it's probably better to use

    
    
        const size_t len = strlen(env);
        for(size_t i=0; i<len; i++)
          //etc
    

and using an assignment expression in a boolean statement is totally valid.

> no banned functions

at this point i decided the author _clearly_ does not know what he's talking
about, and has joined a bandwagon of code-hate.

banned functions. unsafe memory moves. "what if the destination is too small
for the result?". urgh. _look_ at the code that was pasted. strcpy, obviously
has pitfalls that can easily overwrite random bits of memory. but this code
cannot possibly exhibit it. if you want to complain about _safe_ use of strcpy
for the functions potential unsafe-ness, well, you may as well argue that C is
an unsafe language, as the runtime doesn't catch out-of-bounds accesses, etc.

> where to go from here

if you've got style complaints about bash's source, change it, and submit
patches. blog posts like the one posted do not help anyone.

------
oppositelock
Is this trolling?

Maximum of 5 global variables? For loop syntax? "objective" measures of code
quality?

This kind of code shows up in any legacy code base, and style isn't what leads
to bad code. Bash had a parser bug, a bug which anyone could have introduced
irrespective of style.

~~~
geon
The only mention of shellshock "starting with the function that is at the
heart of the #shellshock bug", is also the only connection to it.

I suppose the objective measures he refers to are the static analysis he
mentions in the next paragraph.

It is hardly a controversial that global state is bad, and that code should be
clear rather than clever.

~~~
cygx
_code should be clear rather than clever_

But clarity depends on context. Personally, in context of the C language, I
find code like

    
    
        while ((string = *env++))
    

(which is how the offending loop could have been written as well) perfectly
clear. I doubt someone coming from a, say, Java-background would agree.

Similarly, he complains that the loop index was called _string_index_ instead
of the 'far superior' _i_.

With C99, I'd agree. But in a C90 codebase, do you really want to have
declarations like

    
    
        int i,j;
    

instead of

    
    
        int char_index, string_index;
    

at the top of a block?

------
retroencabulato
Really? Complaining about the style of a for-loop? They are scraping the
bottom of the barrel here.

~~~
_delirium
The whole post reads like an attempt by some security consultancy to
capitalize on a currently high-profile issue by writing an "analysis" based on
looking at the source code for 10-15 minutes and picking out a few arbitrary
things to write a blog post about.

------
maximumoverload
The main issue is, I think - if something works, nobody wants to fix it.
Especially if that _something_ is full with 80s code.

Bash works, pretty well, for a lot of people. So nobody wants to touch it.

~~~
TheCondor
Ironically, the shit is free, fork it and have at it! Contribute fixes...
"Security guys" don't do that though, do they?

There are a whole lot of really important pieces of code that not a lot of
people find sexy to work on.

~~~
maximumoverload
Yeah, but it works already, and there are also so many features it's hard to
know what all could be broken by randomly fiddling around the code.

(I suppose bash does not have a test suite.)

------
droob
I think code should be able to hang around for 30 years without automatically
being called "shockingly obsolete." Are there choices the programmers could
have made that would have held up better over the years?

------
DanBC
It's easy to bash Bash with the benefit of hindsight.

I'd be a lot more interested if someone had performed this analysis of Bash
and then created some search thing that found similar programmin style in
other tools - especially if they also found interesting bugs.

~~~
forgottenpass
_then created some search thing that found similar programmin style in other
tools - especially if they also found interesting bugs._

It's called static analysis and (judging by the tools we use) probably already
ship with rules that would bark loudly at this code.

~~~
DanBC
Isn't that what caused the Debian random number bug? Valgrind and purify
spotted weird code; a Debian package maintainer removed it and submitted a
patch upstream; upstream didn't notice it.

~~~
forgottenpass
Yep, it sure was. Bugs introduced because a developer is unfamiliar with the
codebase is a thing that happens.

------
al2o3cr
"Global variables are bad. Your program should have a maximum of five, for
such things as the global debug or logging flag."

Just exactly five and no more, huh? Better rinse that number off, because we
both know where you got it from.

~~~
emeraldd
I have to say, this one made me laugh and fit quite well with the rest of the
article ...

------
userbinator
_All it does is confuse programmers_

...programmers who don't know C? None of this is particularly tricky C code.

 _While these functions can be used safely by careful programmers, it 's
simply better to ban their use altogether._

"If there's a chance something _could_ be dangerous, ban it"? This feels like
the same sort of reasoning that the government uses to deprive people of
personal freedoms... all in the name of "security". It's not unexpected given
these people are in the security industry, but sometimes I wonder what kind of
world they really want...

The whole article has the feel to it of being written by someone to whom "best
practices" are the gospel, and misses the extremely simple reason behind the
bug: accidentally reusing a function that does both command parsing _and
execution_ to evaluate function definitions. The reason for this misuse could
be because command execution is being conflated with evaluating function
definitions, but more importantly, none of the complaints raised in the
article matter; neither changing function definitions to C89+, removing global
variables, rewriting the style of loops and renaming their variables, nor
using "safer" string functions would've eliminated the bug. Without all of
these, the same mistake could be made. It's a little amusing that one of the
gripes is about variable naming, when I think one of the functions involved in
the bug couldn't be named any clearer: parse_ _and_execute_ ().

There is a certain allure to ticking off some list of "best practices" and
thinking that it'll somehow solve everything, but as this bug shows, there's
simply no replacement for thinking carefully about the code and what it does
when implementing functionality.

~~~
yuhong
Not by themselves, but there has been more than one bug in this code:
[http://lcamtuf.blogspot.ca/2014/09/bash-bug-apply-
unofficial...](http://lcamtuf.blogspot.ca/2014/09/bash-bug-apply-unofficial-
patch-now.html)

------
Pxtl
As much as I think the fact that assignment in C returns the value causes
barbaric illegible code, c programmers seem to love it (similar uses of
increment squick me just as much - return a value _or_ change state, not
both). Like it or not it's a popular feature so it's silly to argue that the
loop isnt idiomatic. The plethora of globals is more worrying to me.

The loop construct shows a coder who is obsessed with terse code, but knows
their crap. The globals show somebody who doesn't care about sensibly
organized code.

~~~
cygx
_As much as I think the fact that assignment in C returns the value causes
barbaric illegible code, c programmers seem to love it_

C doesn't have all the nice sugar that comes with newer or more high-level
languages.

Sure, it would look nicer if we were able to write things like

    
    
        for (char cc : stdin.chars) { ... }
        for (string : iterator(env)) { ... }
    

But that's not an option, so we make do with what's available:

    
    
        for (int cc; (cc = getchar()) != EOF; ) { ... }
        while ((string = *env++)) { ... }
    

Assignment expressions are crucial for that.

~~~
readerrrr
For me, one of the advantages of C is that it lacks syntactic sugar. Writing a
couple of extra characters is an acceptable tradeoff. Keeping it as simple as
possible, with as few operators and keywords, is better.

------
dezgeg
These kinds of code issues aren't unique to bash. Just take a look at almost
any FOSS project with a bit of history, say GCC, GNU binutils or Xorg, and
you'll find the same issues in those projects.

IMNSHO the real issue is with the developers itself. A huge amount of them
seem to be stuck in the 80s and anything invented in the field of software
engineering is just ignored by them. Like there are plenty of programming
languages now with saner string handling, but no, C89 or pre-ANSI C it is.

~~~
keithpeter
How long would it take to re-write the code base to modern standards?

Perhaps some of the _very_ large companies who consume this code could
contribute to the cost.

~~~
nknighthb
I think your great-great-great grandchildren have other plans than to be
introducing new bugs in a misguided quest to make old code more aesthetically
pleasing.

~~~
keithpeter
Exactly the point I was making. Rapid response to vulnerabilities found and
perhaps even a bounty on new ones might be the more time efficient approach.

------
andmarios
If only instead of blogging about how “badly programmed” bash is people went
over to the mailing list and helped clean up the code...

------
dasil003
Coding standards will only get you so far. The bigger issue in my mind is the
risk of having such a powerful shell at the core of arbitrary software stacks.

What I hope comes out of this is that Linux distributions standardize on a
bare-bones shell for system-level and even application-level scripts, and more
powerful shells are used only for interactive sessions.

~~~
hadoukenio
> What I hope comes out of this is that Linux distributions standardize on a
> bare-bones shell for system-level and even application-level scripts, and
> more powerful shells are used only for interactive sessions.

It's called POSIX, aka ISO/IEC 9945:

    
    
      http://www.unix.org/version3/iso_std.html

~~~
dasil003
Yes, smart guy. I mean in practice.

------
rumcajz
The article assumes that "best practice" in 80s was the same as it is today.
It was not. Best practice evolves.

~~~
Pxtl
I think tons of globals smeared across several files was bad practice back in
the 80s.

~~~
rumcajz
IIRC they were not considered to be the pinnacle of good style, but they were
much more acceptable than they are today.

Interesting fact: In early 60s there was a serious discussion about whether
stack frames (called activation records back then) should be global or not.

------
PSeitz
The time would be better spent to create a pull request instead of a blog post
without any added value.

~~~
shangxiao
If only it was on GitHub or Bitbucket ;)

~~~
nacs
Well, it is available on git here:
[http://git.savannah.gnu.org/cgit/bash.git/](http://git.savannah.gnu.org/cgit/bash.git/)

------
jrochkind1
I do not find it shocking that 25-year-old code is obsolete.

Heck, most any project I've worked on that was started more than 3 years ago
has obsolete code. If it was started before it was a near-universal habit to
write tests, oh yeah we're scared to touch it.

But I agree with the author that this is a sign of things to come. Our digital
empire is built on layers upon layers of bad code, and the economic cost to
fix it would be enormous. And that's what's shocking. Expect things to get a
lot worse before they (maybe?) get better.

~~~
Someone
It is not obsolete, it uses obsolete technologies/conventions.

Using obsolete technologies does not imply bad code, either.

One doesn't see blog posts complaining about the shockingly obsolete stuff in
the Eiffel Tower, either (wrought iron & rivets. Why don't they rebuild it in
welded steel, or, better yet, 3D print it in steel?), or of the over
engineering in Gothic cathedrals (the ones still standing use more stone than
necessary, now that we know how to do strength calculations)

Yes, if you want to make strong claims about the security or performance of
bash, its code probably needs work, but there is plenty of newer stuff that
also has its issues (for example, the typical mobile phone OS needs quite a
few iterations to get its lock screen right, the average brand-new thing also
has security issues, etc)

------
belorn
If someone want to make a modern new shell that has no obsolete concept in
them, they should take this as an opportunity to do so.

With no backward compatibility to think of, I can see several nice goals to
have. It should use a modern language like python, bridging the unnecessary
gap between scripting languages and shell. It should remove environment
variables in favor of IPC like kdbus.

And while at it, removing the concept of tty would remove a huge obsolete
concept from an bypass age.

~~~
Pxtl
MS tried that with powershell. It sucked.

------
kyberias
> The philosophy of the Linux kernel is a good one to emulate: documented
> functionality describing how the kernel should behave will be maintained
> until Linus's death. Undocumented, undefined behavior that stupid programs
> depend upon won't be maintained.

I think this is utterly incorrect. Linux principle is "don't break user-space
unless we really really have to". That includes dependence on undocumented
behavior.

------
bsaul
The problem seems to me that bash is written in an obsolete language. It isn't
just a matter of style, but the fact that the language and compiler helps you
so little is a clear sign that the technology itself is obsolete and
unsuitable for critical piece of software.

~~~
clarry
That you do do not use or like it doesn't make a language obsolete.

------
vezzy-fnord
GNU code in general is hardly beautiful and has plenty of anachronisms. I'm
not sure why the author has never bothered to look at the source code of his
underlying base system until now.

------
agumonkey
Complaining about global mutable variables, clarifying loop indexes and
termination condition ... The author is ready to learn ML.

------
icantthinkofone
And so begins the internet pile on.

------
benihana
This has been a bug since the early 90s. With both heartbleed and this coming
in the same year, is it time to stop thinking and teaching the concept that
open source _inherently_ makes bugs easier to find?

~~~
fl0wenol
Doesn't it? Whether or not a product is open source, if no one thinks there's
a problem in a code base, no one is going to find the bug. It's when you
either: a) Are prompted to do an evaluation of the component for pay,
education, or self-gratification or b) Come across a problem behavior and are
trying to understand what causes it ... does one discover such things.

But without the source code, you can do neither of these things easily. The
likelihood of the a) case occurring is lower by virtue of accessibility.

With the source code and the development history of the product in the open,
we can at least see whether a misbehavior was intentional, or exploited logic
questionable, and that there might have been a mistake in the original
thinking. How easily would one have come to those conclusions if the source
was not private? The response to the issue would have been very different, and
I fail to see how if the source code wasn't open would it have somehow made it
less likely a bug from the 80s wouldn't still be there.

I think it's even more important now than ever to keep source open to make
sure bugs can be easily found; even if the benefits are marginal to discover,
they are certainly better handled in the aftermath.

