Bug 1202858 – Restarting squid results in deleting all files in hard-drive (redhat.com)
429 points by geoffbp on Mar 24, 2015 | 168 comments

I love how deadpan the bug report is:

    Actual results:
    All files are deleted on the machine.

    Expected results:
    Squid is restarted.
Not many details yet but it sounds similar to the Steam bug [0] from last year.

[0] https://github.com/valvesoftware/steam-for-linux/issues/3671

I started laughing at the lines

    Stopping squid: ................[  OK  ]
    rm: cannot remove `/boot': Device or resource busy

Except the Valve bug passed their QE process (if any) and got out into the wild. This one is in an unreleased version of RHEL, and was caught....

QE? Is that a misspelling of QA, or is there another meaning beyond Quantitative Easing?

Quality engineering. It's Red Hat's term (and a few other software companies' as well), but it's QA.

Though QA is Quality Assurance, when really it's QC (Quality Control) that should have caught the error. QA puts processes in place so that there is a QC step that can catch this. Sorry, going off-topic, but as a tester I dislike being told to 'QA this'.

Oh I see. Thanks.

Note that:

  At the time of this writing, RHEL 6.7 is still pre-beta
  and this bug was found in an *UNRELEASED* update to squid.
Bad enough, but not like it's out in production.

I would bet on the issue being in the init script itself rather than in squid. (I'm assuming squid doesn't run as root by default in RHEL.) If that's true, then it's another point for saner process managers (upstart/supervisord/systemd/...).

Agreed, I should've elaborated. All it takes is something like this in the init script without checking if the variable is empty:

    rm -rf "$STEAMROOT/"*
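To see what that line turns into without risking anything, swap rm for echo. A minimal sketch, assuming bash; STEAMROOT is deliberately left unset here to mimic the failure case, and the exact file list printed depends on the machine:

```bash
#!/usr/bin/env bash
# Harmless preview: echo prints the command line instead of running it.
# STEAMROOT is unset on purpose, so "$STEAMROOT/" becomes just "/".
unset STEAMROOT
preview=$(echo rm -rf "$STEAMROOT/"*)
echo "$preview"    # e.g. rm -rf /bin /boot /dev /etc ... (machine-dependent)
```

The asterisk sits outside the quotes, so the shell globs "/"* into every top-level entry before rm ever sees it.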

And this is why it is important to write something like

    set -eu
on top of your bash scripts -- execution will stop on errors (non-zero retvals) and on undefined variables.
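A small sketch of what set -eu buys you here, assuming bash. It runs the dangerous line in a child shell (so the demo can inspect the result) and uses echo as a stand-in for rm; SQUID_DIR is a hypothetical variable name. The unset variable aborts the child before echo ever runs:

```bash
#!/usr/bin/env bash
# The child shell has set -eu. Expanding the unset (hypothetical) SQUID_DIR
# is an error, so the child exits before echo (standing in for rm) runs.
out=$(bash -c 'set -eu; unset SQUID_DIR; echo rm -rf "$SQUID_DIR/"*' 2>&1)
status=$?
echo "child exit status: $status"
echo "child output: $out"    # an "unbound variable" error, no rm command line
```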

I also include set -o pipefail (a pipeline then fails if ANY command in it fails, and with set -e the script exits). Had to get bitten and waste an hour before that became a habit.
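For anyone unsure what pipefail changes, a quick bash sketch comparing the exit status of `false | true` with and without it:

```bash
#!/usr/bin/env bash
# Without pipefail, a pipeline's status is that of its LAST command,
# so the failure of `false` is silently ignored.
false | true; without=$?

set -o pipefail
# With pipefail, the status is that of the rightmost command that failed.
false | true; with=$?
set +o pipefail

echo "without pipefail: $without, with pipefail: $with"
# prints: without pipefail: 0, with pipefail: 1
```

Under set -e, that non-zero status is what would stop the script.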

set -e and set -o pipefail really should have been the default, rather than an opt-in.

set -o pipefail makes common idioms a pain. Consider using head, which simply exits after it has read a few lines. In this case, the input process gets a SIGPIPE and exits with a non-zero exit code:

Consider /tmp/test.sh:

  set -o pipefail
  yes foo | head

  $ bash /tmp/test.sh >/dev/null
  $ echo $?
  141

That's a bug IMHO which I reported at http://lists.gnu.org/archive/html/bug-bash/2015-02/msg00052....
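One possible workaround (a sketch, not necessarily what the bug report proposes) is to leave pipefail on and inspect PIPESTATUS, accepting a producer killed by SIGPIPE, which bash reports as 128 + 13 = 141. This assumes SIGPIPE has its default disposition; if a parent process ignored it, yes instead gets a write error and exits with 1, and the check below reports a failure:

```bash
#!/usr/bin/env bash
set -o pipefail

yes foo | head -n 3 >/dev/null
statuses=("${PIPESTATUS[@]}")   # copy at once: the next command overwrites it
producer=${statuses[0]}
consumer=${statuses[1]}

# Accept the producer dying from SIGPIPE (141) because head deliberately
# stopped reading early; any other non-zero status is a real failure.
if [ "$consumer" -eq 0 ] && { [ "$producer" -eq 0 ] || [ "$producer" -eq 141 ]; }; then
  echo "pipeline ok (producer status $producer)"
else
  echo "pipeline failed (producer $producer, consumer $consumer)"
fi
```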

I've collated other mishandling of closed pipes at: http://www.pixelbeat.org/programming/sigpipe_handling.html

For a while now, I've thought we should change SIGPIPE's SIG_DFL action to _exit(0).

That's not as simple or clear as you make it sound though.


disagrees and refers to GreyCat's preference not to use -e at the bottom of the list of 'complications'.

From the same page:"rking's personal recommendation is to go ahead and use set -e, but beware of possible gotchas. It has useful semantics, so to exclude it from the toolbox is to give into FUD."

You can use set -e, and turn it off (set +e) for code blocks and things that are problematic. He could also add '|| true', and you may be able to use a colon (:) at the problem points without turning everything off. These are edge cases and you can easily work around them if you're an advanced user.

If you are not an advanced user then you should certainly use -e.

  $ diff -u /tmp/a /tmp/b
  --- /tmp/a	2015-03-24 08:33:00.021919797 -0400
  +++ /tmp/b	2015-03-24 08:33:05.629963015 -0400
  @@ -1,5 +1,5 @@
   #!/usr/bin/env bash
   set -e
  -let i++
  +let i++ || true
   echo "i is $i"
  $ /tmp/a
  $ /tmp/b
  i is 1
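Why `let i++` trips set -e in the first place: let returns status 1 whenever the arithmetic expression evaluates to 0, and a post-increment evaluates to the old value of i. A minimal bash sketch, with a form that always succeeds for comparison:

```bash
#!/usr/bin/env bash
# No set -e here, so we can look at each status instead of exiting.
i=0
let i++ ; first=$?      # expression value is 0 (old i) -> status 1
let i++ ; second=$?     # expression value is 1 (old i) -> status 0
echo "first=$first second=$second i=$i"
# prints: first=1 second=0 i=2

# A plain assignment from arithmetic expansion always has status 0,
# so it is safe under set -e.
j=0
j=$((j + 1)); third=$?
echo "third=$third j=$j"
# prints: third=0 j=1
```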

or check the variable before using it, like any other programming language:

    [[ "$VAR" ]] && rm -rf "$VAR"/*

I think most of these issues stem from the fact that most developers that write shell scripts don't actually understand what they're doing, treating the script as a necessary annoyance rather than a component of the software.

If anyone understands shell scripts, it would be people writing init scripts at Red Hat :)

Anyways, that is not anything like other programming languages. Checking in that way is error prone and not really an improvement (nor equivalent to set -o).

  [[ "$DAEMON_PATH" ]] && rm -rf "$DEAMON_PATH"/*
See what I did there? It's an rm -rf /* bug because "checking variables" is not the answer.

In other programming languages, if an identifier is mis-typed things will blow up. E.g., in ruby if I write:

  daemon_path=1; if daemon_path; puts deamon_path; end
I get "NameError: undefined local variable or method `deamon_path`"

These issues do not always stem from bad developers. Bash's defaults are not safe in many ways and saying "people should just check the variable" isn't helpful here.

Shameless plug for my language "bish" (compiles to bash) which aims to solve many of these annoyances with shell scripting: https://github.com/tdenniston/bish

Bash can also flag use of an undefined variable as an error; it is just not on by default.

set -u

Man page quote: "Treat unset variables and parameters other than the special parameters "@" and "*" as an error when performing parameter expansion. If expansion is attempted on an unset variable or parameter, the shell prints an error message, and, if not interactive, exits with a non-zero status."

Elephant in the room: shell is a bizarre language.

Yeah, everyone always loves to shit on BAT (which is fair, it is terrible) and VBS (which is slightly less fair), but in spite of how many problems Bash has (not least the massive security issue last year), it gets off almost scot free.

These bugs are indicative of Bash's design problems. Why is it used for init scripts? And don't even get me started on how Bash interprets filenames as part of the arguments list when using * (e.g. file named "-rf").

Say what you will about PowerShell, but having a typed language that can throw a null exception is useful for bugs like these. The filename isn't relevant, and a null name on a delete won't try to clear out the OS (it just throws).

> it gets off almost scot free.

Not just scot free - during the Great systemd War of 2014 it was a talking point for the antis that using anything other than the pure, reliable simplicity of shell for service management was MADNESS!

I don't think that was the argument, as much as it was that if a shell script fouled up it was easier to get in and do field repairs because it was interpreted rather than compiled.

>And don't even get me started on how Bash interprets filenames as part of the arguments list when using * (e.g. file named "-rf")

That's not Bash. That's just... programs in Unix. Such is life when everything is stringly typed.

I think a better alternative is something like

    rm -r "${VAR:-var_is_not_set_so_please_fix_this_script}"
which substitutes the var_is_... string if VAR is unset or empty.

BTW, I hate hate hate -f. It has two meanings: 1. 'force' the removal 2. ignore any error

I've seen an instance of this sort of bug in my sysadmin career that I remember. It was a Solaris patch which wiped a chunk of the system.

No, this will remove 'var_is_not_set_so_please_fix_this_script' file if one exists.

If you're suggesting using parameter expansion, at least suggest the correct one (i.e. one that will give a meaningful error message):
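A minimal sketch of the `${VAR:?message}` form of parameter expansion, assumed here to be the kind of expansion meant: it aborts a non-interactive bash with the message when VAR is unset or empty. The child shell and the echo standing in for rm keep this harmless:

```bash
#!/usr/bin/env bash
# VAR is set but empty; ${VAR:?...} treats that the same as unset, so the
# child shell exits with the message and never builds the rm command line.
out=$(bash -c 'VAR=""; echo rm -r "${VAR:?VAR is not set}/"*' 2>&1)
status=$?
echo "status=$status"
echo "$out"    # an error ending in "VAR is not set"
```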


Yep, better. Thanks.

> like any other programming language

some real-world programming languages don't have undefined variables :)

Since the variable he shows is used in a string interpolation, it doesn't have to be undefined.

Being the empty string "" would work just as well.

I wonder why set -eu is not the default setting.

1. open bash

2. set -e

3. type an invalid command or run one that returns non-zero

4. "crap, where did my shell go?"

It could be the default for non-interactive shells without causing this problem. Or we could have a more nuanced rule, where -e means "stop executing the current sequence of commands as soon as there is an error", where a "sequence of commands" is a single line in an interactive shell (so "false; whoami" would print nothing), or the entire file in a script.

The real answer is that this has not been the default in the time between shells being invented and this comment being posted, and so the squillions of lines of shell script out there in the wild keeping the world turning have not been written with this in mind. Making it the default now would break a lot of things.

With the benefit of hindsight, though, I would say that yes, this should have been the default in scripts. Oh well.

There are lots and lots of these 'nuanced rules'.


set -u is good. set -e requires changing a lot of code. See for example


That's not completely true. At least with the GNU tools, 'rm' won't delete the root directory unless you specifically give it the '--no-preserve-root' flag. Since that flag has no use outside of deleting root, it's unlikely the script passes it. With that in mind, the script must do some type of manual deleting for some reason.

I believe that "--preserve-root" applies only to / itself. That means `rm -rf /*` will expand to `rm -rf /bin /dev /etc /lib ...` and delete all anyway.

That's accurate. `rm -rf /*` will still work to delete everything. But that said, `rm -rf "$STEAMROOT/"` can't ever expand to that, and moreover, since the expansion is in double quotes, it won't be subject to pathname expansion by bash. So even "/*/", which would normally expand into "/bin/ /dev/ /etc/ ...", won't. You can see what I mean yourself; just use echo:

    `echo /*`: /bin /dev /etc /lib ...

    `echo /*/`: /bin/ /dev/ /etc/ /lib/ ...

    `echo "/*"`: /*

    `echo "/*/"`: /*/
If you try it with `ls`, you'll find that `ls "/*"` results in `ls: "/*": no such file or directory`.

The original example was `rm -rf "$STEAMROOT/"*`, though (the asterisk being outside the double quotes). That glob will expand.

Ah, my apologies, I think that was Hacker News's markup at work. The '*' wasn't there when I looked before.

This just happened to my coworker today. I'm sitting behind him telling him which commands to type (he's new to Linux...) when suddenly he jumps the gun and presses Enter just as I say "slash". My heart nearly stopped. I didn't even know preserve-root existed (plus I always reiterate not to log in as root). It was a snapshotted VM, but we still would have lost the day's work.

Coworker - intern? :P

It should optimize for this case and run mke2fs instead.

In the original Steam bug the asterisk is outside the double-quotes and path expansion definitely applies.
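A quick way to convince yourself which part gets globbed, sketched in bash inside a throwaway directory so the output is predictable:

```bash
#!/usr/bin/env bash
# Only the part of a word outside the quotes is subject to globbing.
demo=$(mktemp -d)
touch "$demo/a" "$demo/b"

quoted=$(echo "$demo/*")      # * inside the quotes: stays a literal '*'
unquoted=$(echo "$demo/"*)    # * outside the quotes: expands to both files

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
rm -r -- "$demo"
```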

I feel like it would be a frighteningly common bug. I remember one like this from 2011 [1]. Install/packaging/utility scripts usually do not get as much attention and testing as the application code itself.

[1] https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/issue...

I'd say that, given these bugs only very occasionally happen (relative to the huge number of shell scripts out there being executed every day), it's not really "frighteningly common". You only hear about the ones that fail.

By the same logic, memory safety issues only happen rarely, right? Most programs/scripts are going to be tested if they're part of a distribution, and such errors removed. But without polling people it'd be hard to know of the many times this kinda thing messed things up. I personally wiped out a production DB by expanding an unset variable (fortunately immediately after taking a backup).

This is, as the bug notes, a regression, and I'm guessing you're right about it being in the initscript (I'm pretty sure). I used to be a very heavy Squid user and Squid developer and I remember a very similar bug many years ago. It was in the cache_dir initialization code. It would read the configuration file, parse out the cache_dir lines, and if the directories didn't exist it would create them as part of the startup.

There were some circumstances (if there was no cache_dir line configured, or if the cache_dir was a link or something; the details are very sketchy in my mind after so much time) where it would end up destroying /.

I'm guessing this is of that same nature.

RHEL uses systemd.

This is RHEL6.7, it doesn't use systemd.

Oh, I see.

Specifically, this bug would not have happened with systemd since systemd does not leave the handling of pid-files and pid-dirs to shell-scripts.

No, but if an analogous bug happened (systemd forgot to set an internal squidroot variable before clearing the squidroot, for instance), it would be much, much harder to figure out what was going on. Which is really what everybody's complaints boil down to.

That normally does not happen when handling pid-dirs, simply because those are standard and can be handled with standard tools.

Every time I've seen such a bug (honestly, not many), it was created when cleaning a temporary dir.

"systemd" and "sane" only ever go in the same sentence as "sane people don't use systemd".

It looks like a bug in the init script; running it as squid's user wouldn't have destroyed the whole filesystem, likely just squid's config and anything under its /var.

I'll be the first to call out systemd for a lot of things, but not its core init idea. It's the same as daemontools, upstart, supervisord, and others do. Implementation is very different of course, but the idea is common - you run/kill services, not start/stop them. That's the reason we can leave the ugly and error-prone init scripts behind.

> It looks like a bug in the init script

Which is what happens when you have every daemon writing their own PID handling code, running as root, in a language whose interpolation rules nobody really understands.

A legacy of sysv rather than having shell script handle init.

It is quite possible to have the script for PID handling be written once, and imported as needed.

Ha ha ha ha ha ha ha.

I hear people complaining, but why has every distro picked it up then? If it's so insane, why are these people all converging on it?

Devs love it (in particular web service/app/buzzword-of-the-day devs); admins (at least those not in charge of "cattle") loathe it.

This is because what it provides is rapid spin-up of containers and VMs, while everything talks to each other via APIs and D-Bus.

But this rapidity also leads to issues with field repairs and debugging.

"Everyone" is adopting it because the Linux money is in web servers/services.

My good old trick to mitigate that is:

    touch /-@
I also always do it in my home directory:

    touch ~/-@
That's the first thing I do on a new host.

When accidentally running rm -f *, the command expands to -@ first, which is not a valid option and makes the command fail before doing any harm

    rm: illegal option -- @
    usage: rm [-f | -i] [-dPRrvW] file ...
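The same trick, tried out safely in a throwaway directory rather than / or $HOME. This sketch assumes GNU coreutils rm, which rejects the unknown option -@ wherever it appears among the arguments; other rm implementations may stop parsing options at the first non-option, so there it only helps when -@ sorts first:

```bash
#!/usr/bin/env bash
# Sketch of the "-@ guard", run only inside a throwaway directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch ./-@ keep1.txt keep2.txt

# The shell expands * to the three names, one of which is -@.
rm -f * 2>/dev/null
rm_status=$?
survivors=$(ls -A)
echo "rm exit status: $rm_status"
echo "still here:"
echo "$survivors"

# Clean up; -- stops rm from treating -@ as an option.
cd / && rm -rf -- "$demo"
```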

While this is a really nice hack, stuff like this is also the reason I feel really uneasy when writing shell scripts. What works now may suddenly break in the future due to inadequate escaping of filenames.

That won't actually help with rm -rf /*, only with rm -rf * in / or $HOME.

Would it work with `touch "/ -@"` (note the space)?

No, it doesn't work. "./*" expands to "./ -@" as a single field, which rm has no problems with. (Note that, however, this is still the globbing of the shell, as far as I understand.)

how would changing the filename fix it? it's a hack relying on the globbing of the shell. if you're not using the globbing, the hack can never help you.

Very handy trick, more so if you have no root access. Myself, I prefer to rename rm and drop a shell wrapper in its place. It can simply be a case of changing every "*" passed in into "Sorry_Dave_I_can_not_allow_you_to_do_that" (or words to that effect) and every "AsteRISKdeleteALL" into "*", then passing that modified input on to rm. You can adjust how, and add rules, to taste.

That way the pain of having to type AsteRISKdeleteALL instead of * for rm events offsets any anxiety by far.

You can also catch the rm and instead mv the files to a directory with quotas that you can call a recycle bin; some low-end attached storage can be fine for this, as there are not many situations where you're wildcard-deleting under time pressure. You can accommodate this in your own skulker to clean up in a more organised, timely way overall. In any scripts you can give the path to the real rm command if need be; last time I called it P45Generator, but that's not the finest for readability in such scripts.

As someone who uses the commandline a lot but isn't exactly a wizard, why does this work?

Globbing expressions expand to all the files that match.

if a directory has 1.txt, 2.txt, and 3.txt then << rm * >> expands to "rm 1.txt 2.txt 3.txt" and is then executed.

if you have -@, 1.txt, 2.txt, and 3.txt, that expands to << rm -@ 1.txt 2.txt 3.txt >>, and that can't execute.

(if you really wanted to remove your -@ you'd do << rm -- * >> because a double-dash signals the end of command-line options.)

The * is expanded by the shell to a space-delimited list of filenames, but the shell does not adequately escape filenames that can be misinterpreted as arguments to 'rm'.

That's kind of scary — in that case I guess I should avoid creating a file named -rf.

Yes. There was an article linked from HN ages ago (at least a year) that went into mitigation techniques for these issues. As you expect, it basically became fractal, and even then still had bugs. I wish I still had the URL.

I think it may be scarier for code that allows arbitrary execution using command-line arguments. Commands like find or xargs used without defense against this would be a problem, for example on a site that does something precious with your uploaded pet pictures.

The defense against this being the use of -- to signal the end of command-line options.
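Both defenses mentioned in this thread, sketched in bash inside a throwaway directory: `--` marks the end of options, and a `./` prefix on the glob means no expanded name can start with `-`:

```bash
#!/usr/bin/env bash
demo=$(mktemp -d)
cd "$demo" || exit 1

# Defense 1: "--" makes rm treat everything after it as a file name.
touch ./-rf a.txt
rm -- *                # removes the files named -rf and a.txt
left_after_dashdash=$(ls -A)

# Defense 2: with a ./ prefix, every expanded name starts with "./".
touch ./-rf b.txt
printf '%s\n' ./*      # ./-rf and ./b.txt: nothing starts with "-"
rm ./*                 # removes both files, no option confusion
left_after_prefix=$(ls -A)

cd / && rm -rf -- "$demo"
```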

That is really interesting.

Can someone knowledgeable about the shell expand on this? I don't dare test it on my machine.


  $ touch important
  $ chmod 400 important
  $ rm *
  override r--------  vbezhenar/staff for important? n
  $ touch -- -rf
  $ ls -l
  total 0
  -rw-r--r--  1 vbezhenar  staff  0 Mar 24 14:35 -rf
  -r--------  1 vbezhenar  staff  0 Mar 24 14:35 important
  $ rm *
  $ ls -l
  total 0
  -rw-r--r--  1 vbezhenar  staff  0 Mar 24 14:35 -rf
  $ rm -- -rf

I've not tested it, but it should expand just like anything else. The effect would broadly be that running "rm *" in the directory would recurse into subfolders without warning.

Re testing. If you follow this link - https://www.digitalocean.com/?refcode=3fc9a5a35c52 - you get $10 free credit (affiliate link I get $25 if you spend that much in future) on DigitalOcean.

You can spin up a droplet and use the online shell tool or ssh in (very easy when you've set up a cert as the droplet can have the cert setup automatically).

Then you can mess about with a droplet as much as you like, virtually speaking. Once you're done then use the control panel to destroy the droplet - it costs a few ¢ a day and if you don't have a droplet in use (which means active or paused; preserving images is cheaper but non-zero) then you don't pay anything.

Basically sign up and have a year of uptime to mess with a full install of various OS with no charge.

Make sure you don't write "rm -rf /*" in the wrong terminal!

> shell does not adequately escape filenames that can be misinterpreted as arguments to 'rm'

That sounds like a bug to me, or at least depending on suboptimal behavior.

More specifically, it is expanded by the shell:

  $ echo /*
  /bin /boot /dev /etc /home (...)

Hm that's a cool trick. IIRC some distributions (Suse?) had a 'bash' clause where you couldn't do "rm -rf /" without 'y'.

My zsh has this one too: when I 'rm -rf /some/dir/' it always asks if I'm "sure". Truth be told, I'm not even reading the text on stdout anymore; my finger goes to 'y' automatically, which means that if I do something stupid it won't be able to protect me :-P

The last couple of years I stopped doing 'stupid things' by not working on the shell when I'm very tired. That was the cause of my rm-related incidents in the past :-)

Some versions of rm (Ubuntu/Debian if I remember correctly) require --no-preserve-root.

Likely you mean: touch ~/-@

Neat idea! Just tested that in an OS X 10.8 virtual machine; while it works nicely against "rm -rf *", sadly it does not help stop an accidental sudo rm -rf / or ~/. Also, "touch ~-@" created a file in the home directory called "~-@"; in order to set the correct filename, I cd'd into ~ and then ran "touch ./-@".

  touch /--
  rm -rf *
(admittedly, this would be a malicious attempt rather than a careless bug)

Or just use zsh.

Would zsh still protect me if the script explicitly uses Bash (i.e. #!/bin/bash)? Sorry if this is a dumb question, I'm unsure how shells work when calling other kinds of shell scripts.

No. If the script has the #! set as /bin/bash, then it will run as bash.

zsh is an interactive shell. As far as I know, it's not meant to replace bash as the system shell.

Zsh doesn't have to be used interactively.

It's also excellent for scripting, and has far more features than bash.

alias rm='trash-put'

rm -rf -- $EMPTY_VAR/*


We had one of these kinda bugs.

If you uninstalled our software it deleted a major chunk of your Windows registry, crippling your computer. It was a one character error in our script. The first ticket read "Uninstalling [Product] destroys your computer".

I was responsible for customer support. Good times! Was a rough week. We managed to not get sued.

Pretty terrible that such a thing is even allowed by Windows. This is why as a user I like Apple's OS X sandbox.

Parent said "This is why as a user I like Apple's OS X sandbox."

This is a bug from 15 years ago, much much before the sandbox feature was introduced.

A sandboxed iTunes would have prevented that.

A sandboxed iTunes would also prevent syncing your iPod and importing existing music collections, because those both require access to files outside the sandbox, which is probably why Apple hasn't done that.

Programs can read files outside their sandbox if they're asked to by user input[1]. Sandboxing does not prevent interaction with USB devices

[1] See "Powerbox and File System Access Outside of Your Container" at https://developer.apple.com/library/mac/documentation/Securi...

That doesn't mean iTunes can't work in a sandbox, it'd just need to request specific permissions.

A sandbox isn't a totally isolated prison. It's a permissions system. Programs can read specific files and folders outside the sandbox and can even ask the user to add new files/folders to their whitelist.

If a software needs/has root rights, then all bets are off. This is true on any OS.

Windows 8 introduced a sandbox and users hated it. (Probably because it also forced digital signatures but hey, specifics)

About a year ago or so I tried to fix a computer of an OS X user where the Dropbox installer somehow deleted the home directory and replaced it with the dropbox application... That was an odd experience (and one where sandbox didn't help...)

Thanks OP, this actually made me laugh uproariously. Anyway, I'd be willing to bet 100 push-ups that (unless it was malicious and not a bug) this thing is caused by some cleanup code somewhere that originally intended to do "rm -rf /path/to/squid/socket", but the function that was supposed to generate the "/path/to/squid/socket" string instead generated a null, which was then parseString'd into a "" via some + function that was trying to do "/" + null.

But I'm neither a Red Hat user nor an OS dev, so I might be completely wrong.

That's almost exactly how I blew up a test server once. (rsync --delete in place of rm) Taught me to be extremely careful when dealing with absolute directory paths.

Oh man... Rsync deserves extreme caution even compared with other bash commands. It's so easy to erase everything and copy terabytes again.

Hmm, there are two possible candidates in the init.d script for the RHEL 6 package of an older version of squid (doesn't look like bug submitter is using a current version).

In stop():

    rm -rf $SQUID_PIDFILE_DIR/*
and in restart():

    rm -rf $SQUID_PIDFILE_DIR/*
SQUID_PIDFILE_DIR is hardcoded to "/var/run/squid" at the top of my copy of the init script. But, neither of those rm commands check first to make sure that SQUID_PIDFILE_DIR isn't empty (or, better yet, is in /var and doesn't contain ".."), and either the submitter's copy of the script is mangled or something else somewhere is stomping on SQUID_PIDFILE_DIR in the shell environment.
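A rough sketch of the kind of check described above, as a hypothetical helper (clean_pidfile_dir is not in the squid init script). The ".." test is deliberately blunt and also rejects names like foo..bar:

```bash
#!/usr/bin/env bash
# Hypothetical guard for the stop()/restart() cleanup step: refuse to run
# unless the path is non-empty, lives under /var/run, and has no "..".
clean_pidfile_dir() {
  local dir=$1
  case $dir in
    "")          echo "refusing: empty path" >&2; return 1 ;;
    *..*)        echo "refusing: '..' in $dir" >&2; return 1 ;;
    /var/run/?*) ;;  # acceptable location; continue below
    *)           echo "refusing: $dir is not under /var/run" >&2; return 1 ;;
  esac
  [ -d "$dir" ] || return 0       # nothing to clean
  rm -rf -- "${dir:?}/"*          # ${dir:?} is a last-ditch empty check
}

# Bad inputs: each is rejected before rm is ever reached.
clean_pidfile_dir ""             || echo "empty path rejected"
clean_pidfile_dir /              || echo "/ rejected"
clean_pidfile_dir /var/run/../.. || echo "'..' rejected"
```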

...I should grep my init scripts for "rm".

squid-3.1.10-29.el6.src.rpm (from ftp.redhat.com, buried where they keep SRPMs) has squid.init within, and that file has no mention of SQUID_PIDFILE_DIR. A few other spot-checked versions are the same way.

https://github.com/mozilla-services/squid-rpm/blob/master/SO... ... however ... whatever that is, does.

Did they take this upstream init script somehow?

This looks like it:


I guess they applied that change which was obviously written against a very different init script where the variable is actually defined, got QA to test it and immediately backed it out.

Oh, good call. Yeah, you found it.

That would be my guess too, although the package maintainer confirmed it on (presumably) a fresh install. I can't find a candidate rm command anywhere in the SRPM, so maybe an upstream file got merged into distrib somehow? I don't have access to an RHEL system to try it out, and can't find the distrib RPM yet to check.

    "Thanks Swapna and Red Hat QE for catching this issue before the package was released. Great work!"
Looks like this wasn't released into production.

This comment should be at the top. False alarm, squid users, and now back to our regularly scheduled HTTP proxying.

For more details on why this issue doesn't affect squid users: https://rwmj.wordpress.com/2015/03/24/restarting-squid-resul...

It looks like the start-up/shutdown script is doing an rm -rf on a bash variable that's evaluating to null.

If that is, in fact, the case, it is indeed a repeat of the Steam bug that 'kavan mentions elsethread.

Citation needed.

It very well could have been an instance of the bumblebee bug (https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/issue...) where it's "rm -f / some/file" instead of "rm -f /some/file"

Link to the code if you want to make that claim.

It's hard to trust anyone that would ever depend on such a variable without having some sort of precondition.

"it's hard to trust anyone that writes a dumb bug".

Listen to yourself. We've all written dumb bugs. We've all had that one line of code that was an obvious mistake. I still trust people who write a bug here and there because if I didn't I would have to forgo trusting everyone for everyone makes dumb mistakes sometimes.

At this point I refuse to trust anyone who writes shell scripts. Bugs happen, but a shell script is practically guaranteed to have zero automated tests and variable-related bugs are all too common.

Surely your distrust should be aimed at shell scripts not at everyone who has written one.

There has to be some standard. While I didn't use the word dumb, it is clearly careless.

Aren't you glad for QA? Humans are inconsistent.

10 bucks says the lesson learned will be "remember kids, always set -u (and other good ideas, like set -e, set -o pipefail, and personally I like set -o posix but you'll need to give up process substitution for that)."

I'm having trouble finding the related commit that fixed this, can anyone else find it?

This bug is an internal QA bug. The reporter is a Red Hat tester. The buggy RPM was never released outside Red Hat, and so there's no requirement to release the source. When RHEL 6.7 comes out the source will go up on ftp.redhat.com.

Maybe I would add a new record to this https://github.com/icy/bash-coding-style#good-lessons

..aaand this bug happened in 2014 as well?

no, it's my mistake. I have fixed that.

Scary. Even scarier is the fact that the bug has been open for a week, one person has confirmed 100% reproducibility, and no one at Red Hat seems to care. Isn't deleting people's hard drives a big no-no?

It already has a "fixed in version" listed. Fixing it within a day doesn't sound like no one caring :)

Does it really matter if no one commented "Oops, we screwed up"? It's kinda self-evident that there was a mistake and there's not really much to say; it's clear from the description how bad it is and marking it "Fixed in version" already says it all pretty much.

> Does it really matter if no one commented "Oops, we screwed up"?

To me it does. You just deleted someone's hard drive, an apology wouldn't be out of order.

When you're a QA engineer at RHEL working on an UNRELEASED PRODUCT then no, I don't think you need an apology. Maybe a thanks for finding the bug, but this is the whole point of QA and the whole point of QA-ing before it's released.

The context matters a lot. This title is linkbait. It omits mentioning that this was not publicly released and the reporter is a QA for Red Hat. Given that context I doubt you'll still agree an apology is so necessary or that this was not handled well.

This being said, the bugtracker isn't very clear when it comes to ticket status. Currently the status is 'on_qa'. I guess if you use it a lot you're aware of it, but it strikes me as iffy UI.

It's exceptionally clear for anyone even remotely familiar with RHEL or Bugzilla. It's not meant for any Joe's consumption anyway; this is a work tool used by developers and QE...

'exceptionally clear'? How do you detect that bug status from across the room? If that bug page was up on a big screen, how do you detect its status? How do you tell at a glance from a meter away what the status is? All of those things should be doable for an 'exceptionally clear' status. Developers don't have mysteriously different visual capabilities from 'joes'.

> this is a work tool used by developers and QE

Because as we all know, developers don't benefit from good UIs. One wonders why they would use a webpage at all, rather than simply connect to an ncurses-based bugtracker that only supports an 80x24 terminal.

> How do you detect that bug status from across the room?

Know and understand the tools you are working with, and simply click the Modified (History) link at the top to determine the bug timeline: https://bugzilla.redhat.com/show_activity.cgi?id=1202858

I don't understand why I'm getting pounded by downvoters. First someone tells me that a small-typeface 'status' is 'exceptionally clear' and UI doesn't really matter for developers. Next I'm told that in order to detect something from across the room, I have to click through a link.

I realise my comment was snarky, but both of these responses were in a patronising tone themselves. Does bugzilla have some sort of rabid fanbase akin to the vim/emacs wars?

It's like, the very first line of the bug's informational body. And it shows up in Bugzilla searches too: https://bugzilla.redhat.com/buglist.cgi?quicksearch=deleting...

I don't think anyone else has had a tone other than short-spoken

Keep in mind that I did not say anything like 'terrible' or 'worst ever'. I said 'iffy'. I also obviously know where to find the state, because I mentioned it in my original comment.

In return I got that it was 'exceptionally clear' (which it clearly isn't, given there's a few people in this thread that missed it); that UI doesn't matter for developers; a possible insinuation that I don't know the kind of tools devs use; and a follow-up comment that tells me I need to know my tools but then proceeds to completely ignore the use-case it explicitly quoted when telling me what I should do. I don't know how that second one can be seen as anything but patronising.

Continuing on with the theme, your point still doesn't answer any of the issues I had in my downmodded comment regarding what is 'exceptionally clear'. Can you tell at a glance from a meter away? My point isn't about the existence of a field, but its presentation.

If you don't think the UI is iffy, that's fine, we disagree. But let's not make up nonsense about things like devs not benefiting from good UI or offering solutions that don't match the quoted use-case.

> (which it clearly isn't, given there's a few people in this thread that missed it)

I feel this is more because many people wanted to look at the bug itself rather than its metadata (given the title of the link).

> Can you tell at a glance from a meter away?

Seriously, I can't make out anything that isn't colored; I have no idea what good distance vision is like (I'm short-sighted, and even with glasses things are fuzzy).

I wonder whether the status in every Bugzilla installation really needs to be readable from a meter away. Personally I find the presentation fine: it is the first thing I see other than the title, unless I'm specifically not looking for it.

You are being downvoted because you are complaining about something that isn't relevant. Complaining that Bugzilla's UI isn't pretty and clear from across the room really isn't relevant to the issue or to Bugzilla's purpose whatsoever.

If anyone is using an unreleased product on anything other than a test environment with anything other than test data, then the nicest thing anyone could do for them at this point is to simply point out the bug has been fixed.

RedHat is pleased to announce the general availability of our new RedHat Enterprise Data Recovery service. Please contact your account manager for details.

It's already fixed, plus there are many private comments which you cannot see (but I think you see "missing" comment numbers). Also the product in question is not released yet. It's good this was found, but no customer would have been affected unless they were using an alpha.

Well, the bug was reported against RHEL 6.7 which has not been released yet.

RH don't have a great reputation here. Unlike Debian which does proper triage and practices "zero release-critical bugs", RH threw out RHEL7 with loads of critical issues still open.

This is simply not true. Could you provide evidence rather than making stuff up?


Fresh steaming proof as requested:


All high-severity bugs against 7.1, which was released 16 days ago. Check the dates on half of them: they're from before the release date, and half of them haven't even been assigned or triaged.

When 7.0 came out, timedatectl and systemd didn't even work properly. Enabling NTP threw D-Bus errors galore. On some kit it didn't even boot. Total lemon.

RHEL doesn't generally work properly until the .2 releases. I've been using it for 10 years so I've got plenty of experience on the matter.

I would go into detail about the CIFS/smb kernel hangs I've had on 6.x but I've had enough of it by now.

The priority fields are set by developers so they know which bugs they should work on first. The two bugs of mine which appear on that list are both new features for RHEL 7.2. I set the priority of those so I know to work on them first. I really think you need a better query than that one.

Update: I think if you wanted to find out which critical bugs affected RHEL 7.0 on release, you'd probably want to look at the list of z-stream packages (RHEL 7.0.z) which subscribers have access to. These are bugs which didn't affect the installer or first boot, but were important enough to need fixing in RHEL 7.0 after it went out. (If a bug was critical enough to affect installation or first boot, it would have delayed the release).

First "high severity bug against 7.1" on the list:

"Customers would like to be able to use their IdM users to log on to Window clients that a part of the trusted domain."

"Doc Type: Enhancement"

Couple that with what rwmj said, you've effectively debunked yourself.

Red Hat Linux (not RHEL) tended to have issues until the ".2" releases too, all the way back to 4.x in '96/'97.

Ah yes, like good old DSA-1571-1

I concede that one. Have an upvote.

And this is why every bug tracking system should have a triage SLA.

Or, even if you don't want to declare a threshold, publish the current stats on its front page: "Over the last 30 days, our 99-percentile triage wait time was: XX hours."
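A minimal sketch of how such a front-page figure could be computed, assuming the tracker can export the triage wait of each recent ticket as one number of hours per line (a hypothetical export format; `p99_wait` is a made-up name). It uses the nearest-rank definition, i.e. the value at rank ceil(0.99 * N) in sorted order, which is only one of several common percentile definitions.

```shell
#!/bin/sh
# p99_wait (hypothetical helper): read one triage wait time in hours per
# line on stdin and print the 99th percentile using the nearest-rank method.
p99_wait() {
    sort -n | awk '
        { v[NR] = $1 }              # values arrive already sorted ascending
        END {
            if (NR == 0) exit 1     # no data: print nothing and fail
            # nearest rank = ceil(0.99 * N), computed as floor((99N + 99) / 100)
            r = int((99 * NR + 99) / 100)
            print v[r]
        }'
}

# Usage: 100 tickets that waited 1..100 hours -> the 99th percentile is 99.
seq 100 | p99_wait
```

Nearest rank always reports a wait that some ticket actually experienced, which is arguably what you want on a public dashboard.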

Similarly, open tickets with priority=urgent should never go 24 hours without a new comment from the owner.

> Or, even if you don't want to declare a threshold, publish the current stats on its front page: "Over the last 30 days, our 99-percentile triage wait time was: XX hours."

Now there's a good idea. Major open source projects should have software quality dashboards tracking things like that.

We're now seeing hospital emergency rooms displaying their current wait time in minutes on billboards.

Ahh, Red Hat, the distro which chose to symlink a bunch of binary lib files from Apache into the /etc directory.

"Why is grepping /etc taking so long? Binary files in /etc?!? WTF?!?"

Coming from Debian, Red Hat seems to make a lot of irk-worthy choices.
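On the binary-files-in-/etc complaint: GNU grep's `-I` option treats binary files as non-matching, so a recursive search skips them. A small self-contained demo, using a throwaway directory and made-up file names rather than a real `/etc`:

```shell
#!/bin/sh
# Throwaway tree; httpd.conf and mod_fake.so are made-up names for the demo.
demo=$(mktemp -d)
printf 'ServerName example.test\n' > "$demo/httpd.conf"
# A NUL byte makes grep classify the file as binary.
printf 'ServerName\000binary\n' > "$demo/mod_fake.so"

# Without -I, the binary file is reported as a match too (two names printed).
grep -rl 'ServerName' "$demo"
# With -I (same as --binary-files=without-match), binary files are skipped.
grep -rIl 'ServerName' "$demo"
# The temporary directory is left in place; delete it with rm -r when done.
```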

I find the /etc/httpd/logs symlink more annoying. If you want to grep through your Apache configuration, you have to explicitly grep through conf and conf.d; otherwise, running grep -r from /etc/httpd means searching through gigs of Apache logs.

grep -r shouldn't follow symlinks; -R does, however:

      -d, --directories=ACTION  how to handle directories;
                            ACTION is 'read', 'recurse', or 'skip'
      -D, --devices=ACTION      how to handle devices, FIFOs and sockets;
                            ACTION is 'read' or 'skip'
      -r, --recursive           like --directories=recurse
      -R, --dereference-recursive  likewise, but follow all symlinks
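To see the difference the help text describes, here is a sketch with a throwaway tree that imitates the layout where `/etc/httpd/logs` is a symlink to the log directory (all paths are made up under a temporary directory). It assumes GNU grep 2.12 or later; older GNU grep followed symlinks with `-r` as well.

```shell
#!/bin/sh
# Throwaway tree imitating /etc/httpd with logs -> ../../var/log/httpd.
root=$(mktemp -d)
mkdir -p "$root/etc/httpd/conf" "$root/etc/httpd/conf.d" "$root/var/log/httpd"
printf 'Listen 80\n' > "$root/etc/httpd/conf/httpd.conf"
printf 'GET / Listen\n' > "$root/var/log/httpd/access_log"
ln -s ../../var/log/httpd "$root/etc/httpd/logs"

cd "$root/etc/httpd" || exit 1
# -r skips symlinks found while recursing, so only ./conf/httpd.conf matches.
grep -rl Listen .
# -R follows every symlink, so ./logs/access_log is searched and matches too.
grep -Rl Listen .
# Naming the config directories explicitly works with either flag.
grep -rl Listen conf conf.d
```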

requires someone to know there will be symlinks they don't want to follow, though :-)

As opposed to Debian, the distro which chooses to break tomcat (a program which unzips into a single folder and is thereby completely self-contained) up into a million different pieces and scatter them randomly all over your hard drive?

To be fair to Debian, that's how it's supposed to work. You untar into /opt, self-contained, while your apt install puts things where the Filesystem Hierarchy Standard says they go, which means config goes into /etc.

You'll find most distros follow the FHS to some degree, although there are some differences in interpretation.

Yes, it's a distro where you can find EVERY setting in /etc, even if the software creator decided you should know to look somewhere else.

(That said, making the package self contained is the most sensible way for the developer to release it. It's just not a good option for a distro package.)

Is there an article with specifics on this and why it was done?

I have no direct information about this specific case, but in ye olden Unix™ days, there was no /sbin, so all those binaries instead lived in /etc. The Red Hat symlinks could be a backwards-compatibility thing.

You mean bug-compatibility?

No, I would not mean that, since the previous behavior was not a bug.
