
Why the Progress Bar Is Lying to You - krambs
http://www.popularmechanics.com/technology/how-to/tips/why-the-progress-bar-is-lying-to-you
======
sofal
The file download was a bad example. I understand the file download progress
bar to mean how much of the file I have downloaded. No more, no less. Next to
that progress bar is the current rate and an estimate of the remaining time.
This is not what frustrates me. What frustrates me is a progress bar for an
installation task that sits at 99% or 100% for about the same amount of time
it took to get there in the first place. I'm skeptical that this kind of
problem happens merely because the last task happens to be unpredictably long
_every single time_.

I'm betting most of the problems are less about variance in execution time and
more about measuring the right tasks at the right granularity.

~~~
donall
I agree that the download was a particularly bad example, but I think the same
argument can apply to any progress bar. As a programmer, I always measure
progress as the percentage of tasks completed, whether that is bits downloaded,
modules installed, mappers finished mapping, etc.

The mapper example is actually particularly pertinent. Hadoop's GUI shows me
the progress of my M/R jobs with a bar, but it is as dependent on the amount
of other tasks running on the cluster as downloading a file is dependent on
the traffic in the network. There is no sane way to accurately estimate the
amount of time a M/R job is going to take a priori, as far as I know. A
stochastic method would be too variable and a method playing clever tricks
with psychology seems especially insidious, from an engineer's perspective.
You might as well replace it with an "Are we there yet?" button that responds
to you in a soothing voice "Not much longer now".

------
azov
I don't expect a progress bar to accurately predict _time_ \- it's there to show
_progress_. The obvious way of implementing it is to split the task into
milestones of some sort and display how many are completed. It helps if there
are enough milestones that the bar looks smooth and each one takes roughly the
same time, but it doesn't have to be that way.

Even for time estimates - I don't sit and watch the bar move from start to
end. I'll look at it for a few seconds, notice how fast it moves, and go do
something else. That "psychologically friendly" progress bar that starts
slowly and accelerates will do nothing but confuse me.
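The milestone approach described above is a few lines to sketch (the function name and counts here are hypothetical, just to illustrate):

```python
# Milestone-based progress: completed steps over total steps.
# The bar looks smooth only if there are many, roughly equal-cost milestones.

def milestone_progress(completed, total):
    """Fraction of milestones done, clamped to [0, 1]."""
    if total <= 0:
        return 1.0  # degenerate case: nothing to do means we're done
    return min(max(completed / total, 0.0), 1.0)

# Example: an install split into 8 milestones, 6 finished so far.
print(milestone_progress(6, 8))  # 0.75
```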

~~~
pithon
Well I guess the implicit assumption is that the rate of progress is
relatively constant - hence proportional to time.

------
js2
Another approach is doing away with the progress bar altogether. :-) Apple
used to have a boot progress bar. At one point, it worked by recording how long
the _previous_ boot took, then on the next boot it just scaled the progress
bar to complete in the same amount of time.

With respect to download progress bars, one approach is to use an exponential
moving average:

[http://stackoverflow.com/questions/933242/smart-progress-bar-eta-computation](http://stackoverflow.com/questions/933242/smart-progress-bar-eta-computation)
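The core of the linked idea, roughly: smooth the instantaneous transfer rate with an exponential moving average and derive the ETA from that. A minimal sketch (the class name and smoothing factor are arbitrary, not from the linked answer):

```python
class EmaEta:
    """ETA estimator using an exponentially-smoothed transfer rate."""

    def __init__(self, total_bytes, alpha=0.2):
        self.total = total_bytes
        self.alpha = alpha   # weight given to the newest speed sample
        self.rate = None     # smoothed bytes/sec; None until first sample
        self.done = 0

    def update(self, bytes_this_tick, seconds):
        sample = bytes_this_tick / seconds
        self.done += bytes_this_tick
        # EMA: new estimate = alpha * sample + (1 - alpha) * old estimate
        self.rate = sample if self.rate is None else (
            self.alpha * sample + (1 - self.alpha) * self.rate)

    def eta_seconds(self):
        if not self.rate:
            return None
        return (self.total - self.done) / self.rate

# Two one-second ticks at a steady 50 KB/s on a 1 MB download:
eta = EmaEta(total_bytes=1_000_000)
eta.update(50_000, 1.0)
eta.update(50_000, 1.0)
print(round(eta.eta_seconds(), 1))  # 18.0
```

A brief burst of fast ticks only nudges the smoothed rate, so the ETA doesn't whipsaw the way one computed from the raw instantaneous speed would.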

------
wtracy
Here's the research mentioned in the article:
[http://www.chrisharrison.net/projects/progressbars/ProgBarHa...](http://www.chrisharrison.net/projects/progressbars/ProgBarHarrison.pdf)

Briefly, they took a process that proceeded linearly, applied an arbitrary
function to the current progress, and displayed the result in the progress
bar.

The conclusion is that time taken is perceived as less when the progress
appears to start slowly and then accelerate, even when the actual total time
taken is the same.
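In other words, the displayed value is just a reshaping of the real one. A sketch, using a power function as the arbitrary reshaping (the exponent here is made up for illustration, not taken from the paper):

```python
def displayed_progress(actual, exponent=2.0):
    """Map real progress in [0, 1] through a monotone function.

    With exponent > 1 the bar lags early and accelerates toward the
    end, the pattern the study found makes the wait feel shorter.
    Both endpoints stay honest: f(0) == 0 and f(1) == 1.
    """
    return actual ** exponent

print(displayed_progress(0.5))  # 0.25 shown at the true halfway point
print(displayed_progress(1.0))  # 1.0
```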

------
tzs
Two things I've half seriously considered if I ever need to do a progress bar:

1\. Make the errors amusing to the user. For instance, if the operation goes
over the estimate, go ahead and fill the progress bar to 100%. Then pause a
second, and act embarrassed. Change the progress text to something like
"Uhm...err...well, this is a bit embarrassing...", then go on to explain it's
taking a little longer than expected.

Then say something like "OH MY GOD! THAT SPIDER BEHIND YOU IS THE BIGGEST I'VE
EVER SEEN!!!". While the user is presumably turned around to check out the
giant spider, drop the bar back to 80% or so, and pretend nothing ever
happened.

2\. Report progress bar inaccuracy back to the server, which can check whether
the thing is consistently off in one direction and make a correction factor
available to subsequent users of my software.

Might also show stats to the user at the end of the operation, telling them things
like "This progress bar was off by 10%. The average user's progress bar was
off by 20%. Congratulations! Your computer was more reliable and consistent
than average!"

~~~
morsch
I would find option 1 and possibly option 2 off-putting in anything other than
a game. I realise progress bars are a more-or-less rough estimate (ie. "they
lie"). Just make it as good an estimate as you can reasonably do, and that's
fine.

For bonus points, display the underlying data in some way so that I can draw
my own conclusions, supplemented with "sidechannel" information I am likely to
have, e.g. previous experience with the process if nothing else. Extra bonus
points if you make this information optionally visible so that I can tune it
out if I don't need it and my mom doesn't get confused. I really like modern
copy dialogs in that regard (and not-so-modern ones on decent file managers).

If you can't make the estimate reasonable at all, _don't make a progress bar_
and display progress in some other manner instead. E.g. if you are copying
files, and you have no way of knowing how large each file is, consider simply
putting down a simple count ("Files copied: x/y") and if at all possible some
indication that the program is progressing at all ("Current speed: x MB/s").

Finally, please do try to add some kind of progress information for any task
that can take more than a few seconds. It's incredibly frustrating to stare at
a "busy" mouse pointer[1], having to guess whether an application has locked
up or if it's just taking extremely long. If you get it wrong, you usually end
up re-trying and being faced with the same situation. I have had to play
detective around applications misbehaving that way, and having to use top,
iotop, etc. to determine whether an app is using CPU, hard disk or network
resources is both a lot of work and non-conclusive (not to mention beyond the
capabilities of most users).

(This started out as a short reply and ended up being a rather more general
rant; the second time this has happened to me tonight. Sorry about that.)

[1] Though terminal utilities are often just as bad, because they're not
designed with interactive usage in mind. However, each individual terminal
utility tends to do _less_, which means complex processes are broken down
anyway, which yields some progress information.

------
cbr
Several people [1][2] are saying they think of the progress bar as being about
how much of the work (bytes downloaded, milestones completed) is done. It
doesn't matter how you interpret the bar: what matters is what your users will
think, and most people see the bar as indicating time.

[1] <http://news.ycombinator.com/item?id=3641046>

[2] <http://news.ycombinator.com/item?id=3641036>

~~~
alttag
I agree ... mostly.

One of the examples was bytes downloaded. The reason I, and I suspect other
users, interpret that as % downloaded is that in most interfaces (e.g.,
iTunes, Firefox, Safari), below the download progress bar, it says "XK of YK".
(There is also an estimated time remaining, but it's typically not on the left
side.)

In other cases (installs), yes it makes sense that the simplest interpretation
is time, because that's what we care about when installing: how long until we
can use the new shiny widget.

------
ComputerGuru
Very, very interesting piece. At my previous job, I spent perhaps an entire
month (plus all-nighters) working on a progress bar for the restore page of a
backup program. The catch was it was a multi-stage process, and we wanted one
and only one bar to represent the progress as smoothly as possible.

It's by far one of the most deceptively difficult problems I've worked on. If
anyone cares for more info, this is what the restore process looked like:

    
    
       * Single thread that determines the list of files that need to be restored, asynchronously feeding to
       * Multiple threads that download the files from our servers as ZIP files, then each queue their results to either
        * A thread pool with a dynamically adjusted number of worker threads for the unzip and decrypt process, extracting the files to their final destination, OR
        * A different thread pool with a fixed number of threads that does block-level (byte-level differential) restore, which may, when processing a file, need to add files to the first thread mentioned in this list
    

To pull this off, we had to modify the backup process to store enough info to
be able to calculate, immediately when the user presses "start restore," the
total number of files to be restored, the total number of files to be
downloaded (which may be different because ZIP files can contain many small
files to minimize latency and overhead, and also because the byte-level
differential backup has "backup run" outputs), the total bytes to be restored,
the total bytes to be downloaded, the total bytes to be unzipped, etc. etc.
etc.

The progress bar had to move "smoothly enough" across the entire restore. If
you're restoring a hundred and one files, 100 of which are tiny and in a
single zip file, and the 101st being a huge differentially-backed-up file, the
101st will take forever and the progress should reflect that. For multi-GB
restores, the math could give you 100% (with rounding) for over 10 minutes -
you need to jam the progress bar at 99% until it's actually done or you'll get
complaints. Likewise, you can't keep it stuck at 0% even if it rounds down to
0%, or you'll get complaints.
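That clamping logic is simple to sketch on its own; the 1% and 99% thresholds below are the ones described above:

```python
def displayed_percent(raw_percent, finished):
    """Keep a long-running bar out of the dishonest extremes.

    Never show 0% once work has started, and never show 100%
    until the job has actually finished, regardless of rounding.
    """
    if finished:
        return 100
    return min(max(int(raw_percent), 1), 99)

print(displayed_percent(99.7, finished=False))  # 99, not 100
print(displayed_percent(0.2, finished=False))   # 1, not 0
```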

At the end of the day, the formula I came up with weighted for the following:

    
    
       * The number of files to be downloaded
       * The number of files to be restored
       * The size of the files to be restored
       * The number of files to be decrypted
       * The number of files to be differentially restored
       * The number of files required to differentially restore a single file
       * The size of the files required to differentially restore a single file
       * The size of files cached locally that can skip download
       * The number of files coalesced in a single ZIP archive
       * The destination drive (externals are slower than internals)
       * The average download speed
    

Fringe cases, such as attempting to restore a single file that was contained
in a single ZIP with a hundred other files, would end up giving negative
progress when tested with a naïve algorithm: if you adjust for the factors
listed above, you'll end up needing to factor in a (relatively) "huge" number
of bytes for download while you're actually only grabbing a single "small"
file from the ZIP. Basically, if you adjust the weight of one factor for one
case, you'll end up getting non-smooth progress bars for other cases as a result.
Took a lot of charting, a lot of trial and error, a lot of user feedback (each
time from people that had never participated in the test before to prevent any
sort of bias or preconceived notions) to finally get it right. That code is
now classified as "no one touches it, no matter how trivial you think the fix
for a bug you found would be."
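The general shape of such a formula is a normalized weighted sum over the per-dimension progress fractions. A sketch of that shape only; the dimension names and weights below are placeholders, not the ones from the actual product:

```python
def weighted_progress(fractions, weights):
    """Combine several progress dimensions into one fraction.

    `fractions` maps a dimension name (bytes downloaded, files
    restored, bytes unzipped, ...) to its completion in [0, 1];
    `weights` maps the same names to their relative importance.
    """
    total = sum(weights[k] for k in fractions)
    if total == 0:
        return 0.0
    return sum(weights[k] * fractions[k] for k in fractions) / total

# Hypothetical snapshot mid-restore:
progress = weighted_progress(
    {"bytes_downloaded": 0.9, "files_restored": 0.5, "bytes_unzipped": 0.4},
    {"bytes_downloaded": 3.0, "files_restored": 1.0, "bytes_unzipped": 2.0},
)
print(round(progress, 3))  # 0.667
```

The hard part, per the comment above, isn't the sum; it's choosing weights that stay smooth across wildly different restore shapes.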

So, yes. Progress bars lie. It takes a shitload of work to pull these lies
off, and if they told the truth, your users would really make sure you
never heard the end of it. Even if you've already technically done 80% of the
work, if only 20% of the required time has elapsed, that progress bar had damn
well not say 80%. Or 20% either, for that matter.

~~~
bdunbar
_a progress bar for the restore page of a backup program_

Maybe I'm unusual. Aside from tar'ing data to tape on elderly 'nix machines,
years ago, nearly all of the backup / restore jobs that I've done have been
for work - tape libraries, dedicated servers, etc.

I don't _care_ about progress bars, accurate or not. I'm not going to watch a
nifty GUI screen for hours - kick off the job, give me an estimate, notify me
when done.

People really care enough to make it an item worth that much skull sweat?
Weird.

~~~
ComputerGuru
_People really care enough to make it an item worth that much skull sweat?
Weird._

Unfortunately, yes. The B2C desktop software market is such a cutthroat
environment both in terms of what users expect and how much competition there
is. It's an entirely different beast and the devil really is in the details.
For some reason, bloggers, software review sites, magazines, etc. will really
run desktop software through the mill and pick at the nitty-gritties in a way
that they don't with webapps. Perhaps because with webapps the user experience
is rather ephemeral and can be easily patched or fixed whereas with desktop
apps everything has a very "final" feel to it.

------
bdunbar
Irrelevant Digression:

The niftiest progress bar I've ever seen wasn't a progress bar.

It was a little morality play.

As the game installed a dinosaur slowly bounced across the installer window
and _ate_ a hapless user .. who didn't send in his registration card.

Still remember that, seventeen years later.

~~~
sukuriant
Unfortunately, Atari owns a patent on little games loaded for someone to play
while a larger bit of data is being loaded.

~~~
bo1024
What, seriously?? How the hell can you patent that??

~~~
krambs
I would love to see that patent. I have a hard time imagining it would be
enforceable.

~~~
icebraining
It's probably this: [http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=5718632.PN.&OS=PN/5718632&RS=PN/5718632](http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=5718632.PN.&OS=PN/5718632&RS=PN/5718632)

It's actually issued to Namco, but it seems to describe exactly that.

I think I have a PSOne Namco game with a preloader, possibly Tekken. I never
so much as dreamed that was patented, though.

~~~
sukuriant
Yes, sorry. I was going off memory there. That's the patent I recall. Would
update, but no longer have edit ability.

------
InclinedPlane
Suggestion: use a succession of checkpoints (including start and finish) to
keep track of overall progress. Then create a database of checkpoints along
with timestamps (either from historical runs if this is a process that
normally runs often for an individual user or from a few sets of lab runs if
this is a one time thing like an installation). Now, calibrate the checkpoints
to the expected position on an "ideal" progress bar given the historical data
and present that to the user.
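A sketch of that calibration step, assuming each historical run recorded a timestamp (seconds from start) at every checkpoint, with the last entry marking completion:

```python
def calibrate(historical_runs):
    """Average, per checkpoint, the fraction of total run time elapsed.

    Returns the bar position to display when each checkpoint fires,
    so the bar's motion matches how long each phase historically took.
    """
    n = len(historical_runs[0])
    positions = []
    for i in range(n):
        fracs = [run[i] / run[-1] for run in historical_runs]
        positions.append(sum(fracs) / len(fracs))
    return positions

# Two lab runs of a 3-checkpoint install:
print([round(p, 3) for p in calibrate([[2, 8, 10], [3, 9, 12]])])
# [0.225, 0.775, 1.0]
```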

~~~
r00fus
I had a similar problem with a customer that wanted a smoothly progressing
progress bar for a periodic sync process where I had no idea how many actual
entries were being sync'd on any given run (the cost of finding out == cost to
sync)... the number was almost always more than the previous time.

So what I did was simply calculate how long the previous run took, add 10%,
use that as my new baseline, and show progress based on the percentage
between. Often this would mean the progress bar went straight from 93% to
complete - but this was preferred (better than stalling at 100% for minutes at
a time).
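That heuristic fits in a few lines; the 10% pad is the arbitrary fudge factor described above:

```python
def sync_progress(elapsed, previous_duration, finished, pad=1.10):
    """Progress as elapsed time over last run's duration plus a pad.

    Jumps straight to 100% on completion; otherwise capped at 99%,
    so a run that outlasts the padded estimate never claims it's done.
    """
    if finished:
        return 100
    estimate = previous_duration * pad
    return min(int(100 * elapsed / estimate), 99)

print(sync_progress(elapsed=55, previous_duration=60, finished=False))  # 83
print(sync_progress(elapsed=70, previous_duration=60, finished=False))  # 99
```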

Eventually we automated the sync process, and the customer was happier looking
at the occasional exception report instead of a progress bar.

------
dreeves
For anyone who missed it, here's an idea for making progress bars way more
fun: <http://xkcd.com/1017>

And I'll throw in this one just because it's funny (and quite on-topic):
<http://xkcd.com/612/>

------
noblethrasher
Relevant:
[http://www.chrisharrison.net/projects/progressbars/ProgBarHa...](http://www.chrisharrison.net/projects/progressbars/ProgBarHarrison.pdf)

------
georgieporgie
It seems to me that a modern operating system, which might be used on several
different networks throughout the day, and which may offer a variety of
different hardware configurations, should be able to provide information to
applications which could be useful in calculating this sort of thing. Things
like latency, typical data throughput (network and local storage), etc.

~~~
ragweed
For some algorithms, it can be difficult to predict how much time will be
spent processing a particular hunk of data even when it's instantaneously
accessible.

