
What We Learned from Analyzing 100M Bugs - okgabr
https://instabug.com/state-of-mobile-app-quality-2018
======
rokob
Pretty much all of these "most" findings are explainable by the distribution
of the installed user base, i.e. they are not real results but just artifacts
of the population sizes.

~~~
aprilledaughn
Yes, you’re right! The report covers data from Instabug users only. We
extracted data from 30K apps with a range of user base sizes, locales,
devices, etc. I definitely agree with you, it’s not a definitive
representation of the market, but we believe we have a good enough sample and
that the findings are valuable for app developers.

We couldn’t find any other data on mobile bugs like this, so we decided to
share what we have for the app dev community to have some benchmarks and
insights.

~~~
minimaxir
Sample size isn't the issue here, although a large heterogeneous sample is
good.

The complaints are e.g. "Most bugs are reported from iPhones" because they are
a very popular type of phone with the customers more likely to report bugs. It
doesn't necessarily mean the iPhone is buggier than others.
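The point can be sketched with toy numbers (purely hypothetical counts, not from the report): raw bug totals track the install base, so per-user rates can tell the opposite story.

```python
# Hypothetical install and bug counts to illustrate the population-size effect.
installs = {"iPhone": 1_000_000, "LG": 50_000}
bugs = {"iPhone": 20_000, "LG": 2_500}

for device in installs:
    rate = bugs[device] / installs[device]
    print(device, bugs[device], f"{rate:.3f} bugs/user")

# iPhone has 8x the raw bug count, but LG has the higher bugs/user rate
# (0.050 vs 0.020), purely because of the population sizes.
```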

~~~
aprilledaughn
Right, we're definitely not saying the iPhone is buggier than others. I guess
the issue is with the wording of the claim. It would be more accurate to say
"Most bugs reported through Instabug are from iPhones."

That's also why we included the bugs/user data since it shows a completely
different distribution across devices.

------
oldgradstudent
They haven't actually analyzed 100M bugs, they've analyzed a list of bug
reports.

They haven't analyzed how quickly bugs are resolved, they've analyzed how
quickly bugs are marked as resolved.

The distinction is important. Nowhere in the report is there any attempt to
judge the quality of the data or its reliability.

Oh, and the honest answer to the question "Why did we create this report?" is
probably PR.

~~~
aprilledaughn
- You’re absolutely right, not all the 100M bug reports are actually bugs.
However, we highlighted this under the "Time to Close" section: "These are
most likely not programmatic bugs, but could be support issues or spam." How
do you think we should highlight this more to avoid confusion?

- About bug resolution time, Instabug is used by many companies as their main
bug reporting tool or they forward these bugs to another bug tracker like Jira
and we have a two-way sync so whenever it gets resolved over there, it’s
resolved at Instabug as well. That’s why we used the word "resolved" not
"fixed" because each company has their own definition. I hope this makes
sense.

- About the quality and reliability of the data: Oh, we didn’t mean to be
protective about this! On the contrary, we’d love to get your feedback. What
would you like to know?

- About your third point, I respectfully disagree. As the person who spent
the most hours working on this report, I can tell you honestly that it was not
for PR. We just wanted to put something out there that would hopefully be
valuable to the people in our community. We initially shared this with our own
users for them to have benchmarks. This is the first time we’ve released
anything like it, so it was an experiment for us to be honest and I’m loving
all these comments because it helps us know what to do better next time
around.

~~~
oldgradstudent
Sorry if I impugned your motives.

I think that this report is only useful in showing app developers that the
patterns they encounter in their bug reports are common in the entire
ecosystem, not special to their specific app requiring further investigation.
Keep the patterns and keep the information about integration with external
tools (customers might find it useful). Scrap the rest.

The main problem in the report is that you try to answer questions that your
data and analysis are inherently incapable of answering. For example:

- "Which manufacturers have the most bugs?"
- "Which UI orientation has more issues?"
- "Which locale has the most bugs?"
- "How does battery affect app stability?"
- "Which OS has buggier apps?"

As other commenters have mentioned, your results could be just artifacts of
the user demographics (or any number of other confounders). The answers are,
at best, meaningless.

There are significant inconsistencies in figures 1 and 2. They definitely do
not agree with "Errors discovered through Instabug are most likely to be
resolved within 24 hours of being reported." (except in the narrow technical
sense of the first day being the most likely day).

Even if the data were sufficient, there's no mention of statistical
significance in the comparisons. For example, Danish is the locale with the most
bugs per user. However, you have quite a lot of locales and random variability
is expected. Is the difference statistically significant?
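For the locale comparison specifically, a quick two-proportion z-test (with made-up counts, since the report doesn't publish raw numbers) shows how a seemingly large gap can fail to reach significance:

```python
import math

# Toy two-proportion z-test: is one locale's bugs/user rate significantly
# higher than another's, or within random variability? All counts hypothetical.
def two_proportion_z(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                       # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))      # two-sided p-value
    return z, p_value

# 15% vs 10% looks like a big gap, but with 200 users per locale it is
# not significant at the usual 0.05 level (p is roughly 0.13).
z, p = two_proportion_z(30, 200, 20, 200)
print(f"z={z:.2f}, p={p:.3f}")
```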

------
coldtea
I expected something much more interesting: e.g. most common types of bugs, or
causes of bugs -- and thus suggestions on how they could be avoided.

~~~
okgabr
Good point, and sorry to disappoint you! The good news is that we still have a
lot to share; this is the first time in six years that we've dug deeper into
our data and shared it with the community. I’m sure we’ll do more soon. A
series about the most common causes of bugs and suggestions on how they could
be avoided would definitely be a great start!

------
dbwest
The report download doesn't work on my Pixel 2. I submitted that bug so they
can analyze it as their 100M+1.

------
LiamPa
Ironic that I have a huge ‘download report’ banner across the middle of the
screen (iPad Pro)

~~~
aprilledaughn
Thanks for reporting this bug! Yup, Instabug has bugs too.

------
bradjohnson
I think the y-axis ("% Bugs") in Fig. 1 has an incorrect scale: it doesn't add
up to 100%.

Also, it doesn't seem consistent with the claim: "Bugs discovered through
Instabug are most likely to be resolved within 24 hours of being reported"

~~~
ericpauley
The figure is showing percent of all bugs, not percent of resolved bugs.
Likely the rest of the 100% is unresolved bugs.

The confusion on the second point hinges on "most likely". You're probably
interpreting that as the expected resolution time, whereas they are using
maximum likelihood estimation. MLE is rather useless in this case, but it is
technically still correct.
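A toy histogram (hypothetical numbers, skewed roughly the way Fig. 1 suggests) shows how the "most likely" day (the mode, which MLE picks out) and the typical resolution time (the mean) can diverge wildly:

```python
from statistics import mean

# Hypothetical resolution-time histogram: days-to-resolve -> number of bugs.
# Day 1 is the single most likely day, yet most bugs take far longer.
histogram = {1: 200, 2: 90, 7: 120, 30: 180, 90: 150, 365: 100}

days = [d for d, count in histogram.items() for _ in range(count)]
mode_day = max(histogram, key=histogram.get)

print("mode:", mode_day, "day(s)")            # day 1 wins the MLE reading
print("mean:", round(mean(days), 1), "days")  # but the average is ~67 days
```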

~~~
okgabr
You’re right! Thanks for adding your thoughts to clarify. The wording “most
likely” could indeed be confusing. How about we change it to “Bugs discovered
through Instabug are most often resolved within 24 hours of being reported”?
Would that be clearer? We could also say explicitly that this is the percent
of all bugs, not the percent of resolved bugs.

------
fhood
Hmm, all this is basically what I would expect....wait..."Most bugs are
reported from iPhones, while more bugs/user are reported from LG devices."

....Why LG?

~~~
aprilledaughn
Thanks for checking out the report! I'm part of the team who put it together
:)

Yeah, we thought LG was interesting too. When it comes to Android, we expected
Samsung to take first place tbh, but we found more bugs/user reported from LG
and Google devices (Fig. 9). This could be explained by our technical user
base and the popularity of Nexus devices with Android developers. So the
higher proportion of bugs/user we see reported is most likely due to internal
beta testing by devs.

We went into this with some expectations and were surprised by some other
findings as well... like Danish being the top locale where bugs/user are
reported from :D

~~~
chatmasta
Or maybe it’s the opposite, i.e. devs are not testing on LG devices so they
miss corner cases and ship bugs to them?

~~~
aprilledaughn
Interesting! Could be. Our analysis is based on what we know about our users'
behavior but certainly not definitive. All the data here is open to
interpretation.

------
1_800_UNICORN
I'm very confused...

"Errors discovered through Instabug are most likely to be resolved within 24
hours of being reported" is one of the TL;DR points, but only ~1.5% of bugs
are resolved within 24 hours.

