
Consumer Reports Retesting MacBook Pro Battery, Apple Says Safari Bug to Blame - binaryapparatus
http://www.macrumors.com/2017/01/10/consumer-reports-mbp-battery-safari-bug/
======
EnFinlay
Chances are they have a script that hits a couple different websites that
represent different archetypes: light/heavy text, light/heavy images,
light/heavy js, light/heavy ajax, flash, rendering, dependencies, fonts.
Instead of using the cache dozens of times, they refetch the pages (which
would much more closely represent someone surfing ye olde interwebs). Seems
pretty reasonable to me.
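A harness along those lines could be sketched roughly like this (the archetype URLs, the pacing, and the use of a `Cache-Control: no-cache` request header are all illustrative assumptions, not CR's actual setup):

```python
import time
import urllib.request

# Hypothetical "archetype" pages: text-heavy, image-heavy, JS-heavy, etc.
PAGES = [
    "http://example.com/text-heavy",
    "http://example.com/image-heavy",
    "http://example.com/js-heavy",
]

def fetch_fresh(url):
    """Fetch a page while asking caches along the way for a fresh copy."""
    req = urllib.request.Request(url, headers={"Cache-Control": "no-cache"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def drain_loop(cycles, pause_s=5.0):
    """Cycle through the archetype pages until the battery gives out."""
    for _ in range(cycles):
        for url in PAGES:
            fetch_fresh(url)
            time.sleep(pause_s)  # simulate a user pausing to read
```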

~~~
a2tech
That's correct. The funny bit is they were using a hidden developer feature,
not one of the 'Developer' drop-down features but an actual hidden flag. They
also observed huge internal discrepancies with their testing methodology, but
instead of reporting that to Apple or trying to figure out why their test was
broken, CR reported their results as 'facts'. That's why people and Apple are
irritated about this.

~~~
c0nfused
The test is factual.

The fact that turning off caching triggered a browser bug is not Consumer
Reports' problem, nor do they have an obligation to figure out why. They exist
to report on products as they are, even when those products have bugs.

The idea that battery life for a browsing workload should be tested against
cached sites is silly, as is the idea that not using a menu check box means
they're out to get Apple.

~~~
Terribledactyl
The browsing workload should be partly cached content and mostly new, like in
the real world. Otherwise, what are we doing here?

~~~
slantyyz
Well, no battery test truly represents real world, so I don't think it's a big
deal.

The same testing methodology in the past gave the "Recommended" badge for
Macbook Pros. Why should we question the methodology only when it produces a
result that people don't like?

-- edit: just noticed the article I posted this comment on got merged with
another by the moderators, so it's a dupe within the overall thread.

------
leejoramo
The issue is more complex. Turning off the cache also triggered a bug.

[http://www.imore.com/consumer-reports-fails-earn-macbook-pro-credibility](http://www.imore.com/consumer-reports-fails-earn-macbook-pro-credibility)

> Consumer Reports uses a hidden Safari setting for developing web sites which
> turns off the browser cache. This is not a setting used by customers and
> does not reflect real-world usage. Their use of this developer setting also
> triggered an obscure and intermittent bug reloading icons which created
> inconsistent results in their lab.

~~~
mturmon
This is a good point, which seems to be ignored by other commenters nearby.

Additionally, this seems to be a case where _CR_ (a venerable publication with
many good qualities) used the results of a home-brew single-point test to
characterize a system's overall quality.

This is not a problem isolated to this particular review. There's a lot of
snake oil in audio systems, but _CR_ reviews of audio equipment were a joke
for many years because of over-reliance on specific lab measurements. _CR_ has
also had issues over the years with flawed testing of child safety seats, pet
food, and other products (see their Wikipedia page).

This caveat becomes apparent after subscribing for a few years.

------
purephase
I dunno here. CR's rationale, that users will often visit many different
sites, seems on point to me.

Some users visit the same sites over and over, true. But, a more accurate
browsing pattern would be different sites, new content etc.

More detail on how the browsing is automated might be relevant. Whether they
hit the same 5 sites 1000 times or 1000 sites 5 times would alter whether
disabling the cache makes sense. I'd say it does in the former case, but not
the latter.

------
Will_Do
I guess the way to benchmark browsing and get the most realistic numbers would
be to record the normal browser behavior of a decently-sized sample of people
(say 1000) and automatically replay the behavior of a representative subsample
(10-50 may be enough).

This would take forever and cost quite a lot of money, however, so the much
more reasonable solution is to simply turn off the cache for everything.

I don't agree with Apple or TechCrunch that CR's methodology is wrong --
pretty much anyone would make that same decision, because the alternative is
too much work for too little reward.

~~~
Johnny555
Websites change over time -- they'd have to download a static copy of the
websites and store them locally for repeatable tests (but then it no longer
reflects the real world, since websites often tailor themselves for each
user).

And since browsing activity changes over time too, they'd have to regularly
update the test, which means that tests are not comparable over time.

~~~
pfranz
I don't think that's unreasonable to address. For CPU and GPU benchmarks they
often run BenchMark2014 so you can see improvements compared to previous years
and BenchMark2017 to see the latest and greatest because hardware
architecture, game engine design, and graphics APIs change. You just need to
figure out how best to present that to the user.

------
mmastrac
I guess this really just confirms what I've suspected: the only way to get
decent battery life out of a Mac is if you are using it for literally nothing
but simple web browsing or watching a movie.

Non-Apple apps are death for battery life. I think I'd be lucky to get two
hours of productive development time on my laptops. 1.5 hours is probably
closer to reality. I have two BatteryBox devices that add about 45 minutes to
1 hour each to runtime and it's impossible to get productive work done without
them.

Perhaps Consumer Reports should try testing battery life by running other
apps?

~~~
zitterbewegung
Can you recommend a company that has good battery life for high intensity
workloads? I'm genuinely curious.

~~~
mmastrac
No, as I've been in the Apple ecosystem for about 10 years and haven't shopped
around. I _used_ to have a Dell beast of a laptop back then that was 1) very
heavy, 2) very hot and 3) would last for hours of development time.

I would actually trade whatever weight would be necessary to get my useful
portable development time up to 3-4 hours.

~~~
caconym_
These days I think you'll find that laptops on the thinner and lighter side
rely heavily on CPU throttling and other power saving features, as well as
efficient software, to achieve good battery life.

You may need to go the "brick" route if you have a heavy workload, whether or
not that's due to inefficient software. I've found these days that a lot of
software is simply poorly written and drains much more power than it should
(the Atom editor is a good example).

~~~
mmastrac
Does such a brick still exist? I've seen the new Razer gaming laptop beasts
and wondered if they might do as a development laptop with lots of juice.

~~~
caconym_
Those Razer beasts are exactly what I was thinking. Probably anything with a
primary use case of running a discrete GPU on battery power will be a good bet
for heavy CPU workloads too.

------
tedivm
Consumer Reports did nothing different with the MacBook Pro tests than they do
with their other tests. So using the benchmarks as a comparison point between
Apple and others, such as HP, should still be completely valid.

~~~
ralfd
Read Apples statement to TechCrunch in the article.

------
tener
They turn off the browser cache for _every_ notebook they test. If their
methodology were flawed because of this, they would have noticed long ago.

------
lynndylanhurley
Here's a question - if the results were due to an obscure Safari bug, then why
did Apple remove the battery life estimate with the latest OS X update?

I've been getting 2-3 hours max out of my new 15" 2016 MBP and I don't even
use Safari. I'm just running webpack, vim and Chrome.

~~~
caconym_
I think they removed it because the battery life of the new MBPs has been
getting a lot of negative attention, deserved or not, and in general people
don't seem to be capable of understanding why the estimate would change for
different workloads. Maybe the best PR move was just to remove it altogether.
I wish they hadn't.

It's not the same machine, but I routinely get > 10 hours on my 2016 12"
Macbook with a workflow that includes moderate to heavy browsing in Opera,
coding in Tmux/Vim, and various compilers and interpreters invoked
periodically. Have you had a look in Activity Monitor to see where all your
power is going?

I'm using two non-Apple apps as the core of my workflow (Opera, and Alacritty
for my terminal:
[https://github.com/jwilm/alacritty](https://github.com/jwilm/alacritty)), but
my battery life is still excellent. However, I chose them carefully on the
basis of their efficiency. Things written in Javascript seem to be real
resource hogs (Atom comes to mind), and Chrome is notorious for inefficiency
in memory, at least.

------
hueving
Does anyone here use consumer reports for general shopping guidance? Curious
of the quality.

~~~
acomjean
Generally I've found them to be pretty good with the reviews. The testing is
thorough and they explain what to look for when buying something (I've used it
for appliances).

It's subscription-only (they don't run ads), which means they're not beholden
to the companies they're reviewing for money (a good thing).

They generally make a grid of the items you're looking at, with dots
indicating the ratings. It's very easy to navigate and intuitive.

You'll have to pay for it (or find someone to give you their old copies, or
many libraries have subscriptions).

~~~
jdmichal
I'm a big fan of the grids they do. Really allows you to pick the things that
are most important to you and maximize them.

------
DoodleBuggy
"You're testing it wrong."

It couldn't be that the battery is actually, quite literally, smaller. No,
it's the tester's and the test's fault.

------
sailfast
Speaking personally, I would want any stress test to be the MOST stressful
possible across all laptops for the baseline. Turn off cache (how often do you
have the console open), run a VM, sit on a Hangout, all at the same time. Make
the fans whirr hard.

If you want an "econo" benchmark, go right ahead, but I don't know why one
would criticize Consumer Reports for being hard on a laptop browser. (It
obviously makes sense for hardware makers to dispute the technique, as it
makes the advertised "X hours" marks harder to hit.)

~~~
lucb1e
I can see where you're coming from but I disagree. In that case you are
testing the _battery hardware_ , while the software actually makes a big
difference.

Say you have two phones: one with crap software, and one from a company that
spends time on writing proper, efficient software. If you run a stress
benchmark, they will both be running at 100% CPU all the time, and you'll only
be measuring the battery's capacity divided by the CPU's power draw. All of
that information is already available in the product specification.
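As a rough illustration of that arithmetic (the 76 Wh figure matches Apple's spec for the 2016 15" MacBook Pro, but the 38 W sustained draw is a made-up number):

```python
def stress_runtime_hours(battery_wh, draw_watts):
    """Under a sustained 100% load, runtime collapses to energy / power;
    software efficiency no longer enters into it."""
    return battery_wh / draw_watts

# A 76 Wh battery under a constant 38 W draw lasts 2 hours,
# no matter how well-written the software is.
print(stress_runtime_hours(76.0, 38.0))  # → 2.0
```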

The only reason my Galaxy Note 2's battery life is any good (still the
original battery which is, what, 5 years old now?) is because I did software
tweaks to keep it from holding wake locks when apps are inactive.

~~~
sailfast
That makes sense and you're right, if all you're doing is a head-on stress
test then just let the math do the talking.

I think testing heavy use is probably different than a straight up stress test
in a number of ways that are more applicable to my daily use and more useful.
It's good to know if Safari's caching strategy saves you battery, but as a
non-Safari user that often does not refresh from cache I'm not sure I would
appreciate buying a laptop advertising X battery and finding it to have
changed significantly or be caveated by lots of required settings.

~~~
lucb1e
Alright, that's a good point.

------
cddotdotslash
So Consumer Reports had Safari Developer Mode enabled which forced full page
loads (plus some other bug). And Apple claims "this is not a setting used by
customers and does not reflect real-world usage." While I agree most people
wouldn't have that option enabled, I would argue that the average user is
somewhere between the extremes. For example, most users are not disabling
cache and browsing to 20 MB webpages and clicking refresh hundreds of times.
But at the same time, neither are users leaving cache on and reloading the
same page hundreds of times; they're browsing from site to site, refreshing
Facebook feeds, etc. If Apple can make this statement, I'd be curious as to
what tests they are performing. I'm almost certain the battery specs released
by Apple are not indicative of how average users would use the MacBook (i.e.
brightness at 10%, no movement, WiFi off, all background apps disabled, etc.).

EDIT: as a reply mentions, Apple does provide (some) conditions used for their
tests. However I'd still imagine that the conditions they do choose are the
"most optimal" (for example - who do you know that only browses 25 sites in a
single charge?)

~~~
r00fus
So the test is flawed. You can't simply disable the cache and call it a good test.

Do they do this on other browsers? Browse the same site over and over while
disabling cache?

Why did this not cause a problem when they ran Chrome (did they also disable
cache there)?

The CR tests raise more concerns about how CR runs their tests (and
specifically how they ran this one) than about the Macbook Pro itself.

~~~
slantyyz
>> The CR tests raise more concerns about how CR runs their tests (and
specifically how they run this test) as opposed to the MacbookPro.

Why? At some point, all of these battery tests are arbitrary and not 100%
reflective of the real world.

If I'm not mistaken, previous Macbook Pro models underwent the exact same test
and still got the "Recommended" badge. Nobody was complaining about the test
methodology for those models.

~~~
intopieces
>If I'm not mistaken, previous Macbook Pro models underwent the exact same
test and still got the "Recommended" badge. Nobody was complaining about the
test methodology for those models.

Did Consumer Reports indicate in any previous reviews that this is an aspect
of their methodology? It's my understanding that this aspect is just now being
revealed.

~~~
slantyyz
>> Did Consumer Reports indicate in any previous reviews that this is an
aspect of their methodology?

Probably not.

>> It's my understanding that this aspect is just now being revealed.

That's probably because nobody cared to ask before. The new Macbook Pros were
already under a microscope even before Consumer Reports did their tests. Any
less than stellar review from CR was going to get lots of eyeballs. In any
case, the big reveal is not that caching was disabled, it's that there was a
bug on Apple's side that is the likely root cause, and I'm guessing that bug
didn't exist in the previous versions of Safari that were used to test the
previous Macbook Pros that got the "Recommended" badge.

CR's response today: [http://www.consumerreports.org/apple/apple-releases-fix-to-macbook-pros-in-response-to-consumer-reports-battery-test-results/](http://www.consumerreports.org/apple/apple-releases-fix-to-macbook-pros-in-response-to-consumer-reports-battery-test-results/)

Some key quotes:

"We turn off caching as part of Consumer Reports' standard laptop test
protocol" -- I can only take that at its word. It doesn't say "updated"
standard protocol.

"At Consumer Reports, we test every laptop from every manufacturer in a
comparable way. Because people use laptops differently and because their usage
can vary from day to day, our battery tests are not designed to be a direct
simulation of a consumer’s experience. Rather, we look to control as many
variables as possible, then perform a test that gives potential users a
reasonable expectation of battery life when a computer’s processors, screen,
memory, and antennas are under a light to moderate workload. _This test has
served as a good proxy for battery life on the hundreds of laptops in our
ratings._ "

They don't review hundreds of laptops per year; that's spread out over
multiple years. So I don't think it's a huge leap of faith to assume that they
turned off the cache on prior Macbook Pro models.

FWIW, John Gruber does not "think it’s fair to say that disabling the caches
is unfair or a flawed method" -- which says a lot.

~~~
intopieces
I'm not sure, then, why the lack of scrutiny in previous years is valid
criticism; nobody shines a lamp in her doctor's eyes when she gives her a
clean bill of health. Of course an unusual review will invite unusual
scrutiny.

I just wish CR had been more open with the exact methodology from the start.
What other settings do they activate that vary from a normal user's? They want
us to take their word that their method sufficiently emulates light to
moderate usage, but my own experience has been vastly different from that
review.

~~~
slantyyz
Put yourself in CR's shoes.

You can't test every laptop using the popular sites, because the content
payloads of those sites change over time. You won't get apples-to-apples
comparisons.

The easiest way to do a repeatable browser test is to turn caching off on the
browsers and hit a few web pages that never change and that you control on
your own server. By repeating that test over and over again, it simulates
hitting new sites even though you're hitting the same pages.

If I was in the business of doing laptop reviews, that's how I'd do it.
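A minimal sketch of that kind of setup, using a throwaway local server of never-changing pages (nothing here reflects CR's actual harness):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

FIXED_BODY = b"<html><body>Fixed benchmark page</body></html>"

class FixedPage(BaseHTTPRequestHandler):
    """Serves the same bytes forever, so every run fetches identical content."""
    def do_GET(self):
        self.send_response(200)
        # Forbid caching, so each request exercises a full fetch, much as a
        # disabled browser cache would.
        self.send_header("Cache-Control", "no-store")
        self.send_header("Content-Length", str(len(FIXED_BODY)))
        self.end_headers()
        self.wfile.write(FIXED_BODY)

    def log_message(self, *args):
        pass  # keep the benchmark output quiet

def run_benchmark(requests=10):
    """Hit the fixed page repeatedly; returns total bytes transferred."""
    server = HTTPServer(("127.0.0.1", 0), FixedPage)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/"
    total = 0
    for _ in range(requests):
        with urllib.request.urlopen(url) as resp:
            total += len(resp.read())
    server.shutdown()
    return total
```

In a real battery test the request loop would of course run until the machine dies, with pacing chosen to mimic light-to-moderate browsing.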

-- edit: This blog entry by Marco Arment on the topic is worth reading:
[https://marco.org/2017/01/10/cr-mbp-battery-update](https://marco.org/2017/01/10/cr-mbp-battery-update)

------
richdougherty
Summary: Consumer Reports turned off Safari's page cache to ensure the
browser downloads fresh content. This hurt performance a lot, presumably
because Apple hadn't optimised this usage pattern. Apple has released an
update which fixes the "bug".

My opinion: Disabling the cache as Consumer Reports has done is an unrealistic
testing pattern. Since Consumer Reports runs their tests from local servers
[1], they have a better option available. They can use server-generated
content to control how much content is served from cache and how much is
fresh. They can make the workload anywhere from 0% to 100% fresh by generating
unique URLs for content they want served fresh, and reusing URLs for content
they want to possibly come from the cache.
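That URL scheme could be sketched like this (the `bench.local` hostname and the 10-page cached pool are hypothetical):

```python
import itertools
import random

def make_workload(n_requests, fresh_fraction, seed=0):
    """Build a request list in which roughly `fresh_fraction` of URLs are
    never-before-seen (guaranteed cache misses) and the rest are drawn from
    a small fixed pool that the browser may serve from its cache."""
    rng = random.Random(seed)
    fresh_ids = itertools.count()
    cached_pool = [f"http://bench.local/page/{i}" for i in range(10)]
    urls = []
    for _ in range(n_requests):
        if rng.random() < fresh_fraction:
            # A unique path component guarantees a fresh fetch.
            urls.append(f"http://bench.local/fresh/{next(fresh_ids)}")
        else:
            urls.append(rng.choice(cached_pool))
    return urls
```

Sweeping `fresh_fraction` from 0.0 to 1.0 would then give a battery-life curve across the whole cached-to-fresh spectrum instead of a single extreme point.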

1: [http://www.consumerreports.org/apple/apple-releases-fix-to-macbook-pros-in-response-to-consumer-reports-battery-test-results/](http://www.consumerreports.org/apple/apple-releases-fix-to-macbook-pros-in-response-to-consumer-reports-battery-test-results/)

EDIT: To be clear, I'm not saying Consumer Reports didn't find something
interesting here, I'm just saying they could make a better test in the future.

~~~
1_2__3
If a laptop's battery life is cut in half because you turn off caching in a
browser, that is important information to know, as it suggests the advertised
battery times are the result of carefully sterilized conditions that don't
withstand real-world use. They didn't disable power management or something;
they turned off caching in a single app. Come on.

~~~
sdegutis
That's not true. The vast majority of people don't turn off caching. The
default caching behavior Safari provides is probably tailored to real-life
usage. I could be wrong, but I'm assuming Apple isn't _that_ dense.

~~~
richdougherty
I agree, real-world usage is a mix of cached and non-cached. The only time I
turn off caching is when I'm debugging a website I'm developing.

------
revelation
It doesn't really matter what the test is as long as they run the same one
across the devices.

------
Oletros
The TechCrunch article is very bad.

CR has disabled the cache in all of their tests, on all computers, for years.

And now it's bad because there was a bug in Safari?

------
bluthru
Consumer Reports's original methodology is dumbfounding.

~~~
Oletros
Why?

~~~
bluthru
The vast majority of users don't browse with caching disabled.

~~~
Oletros
The vast majority of users browse different pages.

And why wasn't it a problem until now? CR was praised when they acknowledged
that MacBooks almost always had the best battery life.

