
Statistics for Software - mhashemi
https://www.paypal-engineering.com/2016/04/11/statistics-for-software/
======
qznc
A student of mine just finished building a benchmarking tool for applications
[0]. For example, it warns if your sample size is too small. Here is an
example in which he compares GHC performance over the last few years [1].

[0] https://github.com/parttimenerd/temci
[1] https://uqudy.serpens.uberspace.de/blog/2016/02/08/ghc-performance-over-time/

------
rubidium
"When the software industry gets to a point where it leverages this analysis
as much as the hardware industry, the technology world will undoubtedly have
become a cleaner place."

Hear hear! Quality control is usually one of the _primary_ drivers in new
hardware development. With software, I find it's often tacked on at the end.

~~~
rileymat2
I am not sure that is coming anytime soon. Generally, you have the ability to
update software products more cheaply than hardware products. Also, people get
quite a bit of utility out of buggy products; they will jump on a beta if
they think it is useful.

Given that, half-finished, buggy software products have incentives to be
released with less quality control. This will of course vary by use: no one is
getting in a plane whose software has had little quality control.

I would guess that we can look at the attention paid to quality control as a
function of updatability and risk.

~~~
kmkemp
Exactly this. Reminds me of this quote:

"If you are not embarrassed by the first version of your product, you’ve
launched too late."

- Reid Hoffman

------
zatkin
Slightly off-topic, but it's surprising to me that they have this blog that is
up-to-date with their latest brand, yet they've still let large portions of
Paypal.com go without an update to line up with their latest brand.

~~~
duiker101
Well, this blog post will be seen by a relatively small number of people, and
it's just that, a blog. Their main site is used by a vastly larger number of
people, including our grandmas and possibly people with different
accessibility requirements, making it trickier to update. This is probably not
an excuse (I consider a half-updated site worse than a slightly bad full
update), but it does make it less surprising.

------
pandeiro
OP writes really well. Found myself reading the README of his Python web
framework (which I'll never use) just because of the clarity, style and
pedagogical approach.

Hope there's more on the way.

~~~
mhashemi
Whoa! With praise like this, how couldn't there be more? Thank you!

------
ascotan
Oh God. I need to read this. Great post.

------
partycoder
I started by dumping spreadsheets and forcing myself to use R. I also signed
up for datacamp.com.

I never liked spreadsheets or people that like spreadsheets.

------
gnahckire
I don't really like the idea of throwing data away because it ultimately gives
an incomplete view of the system. But it's an easy solution to a hard problem!

~~~
gnahckire
Okay. Not really sure why I got downvoted so much. Why am I wrong?

~~~
mhashemi
Sampling is fundamental to so much of practical statistics. It's more or less
proven and accepted. In real studies, we "throw data away" by just not
collecting it in the first place. As long as you do it right, you still get a
reliable answer.

But if you've already got it all and it all fits in memory, by all means, hold
on to it!
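
To make "throwing data away the right way" concrete, here is a minimal sketch of one common technique, reservoir sampling (Algorithm R). It is only an illustration, not necessarily what any particular tool does; the function name `reservoir_sample` and its parameters are made up for this example.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of at most k items from an iterable.

    Hypothetical helper for illustration: it only ever holds k items in
    memory, no matter how long the stream is.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item replaces an old one
            # with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Example: estimate the mean of 100,000 measurements from 1,000 of them.
measurements = range(100_000)
sample = reservoir_sample(measurements, 1_000, random.Random(42))
estimated_mean = sum(sample) / len(sample)
```

Every item ends up in the sample with the same probability, which is why the estimate stays trustworthy even though most of the data is never stored.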

