Chebyshev's Theorem: at least 89% (1 - 1/3^2) of the values in any distribution with finite variance fall within 3 SDs of the mean - no "human-ish" assumption needed.
PERT estimation: base your estimates on the happiest (O for optimistic), unhappiest (P for pessimistic), and most likely (ML) paths - the recommended formula is (P + O + 4ML) / 6, but you can also just model them as a distribution.
Secretary problem: when attempting to optimize a result, the optimal strategy is to spend the first 1/e (~37%) of the total solution space purely on gathering information (resolving uncertainty) before committing.
So, putting this together...
If you have to give a time estimate for something, your best bet is to
1) get your 3 PERT numbers
2) calc the mean, the SD (for PERT, roughly (P - O) / 6), and 37% of (mean + 3 SD) ("Assessment") - this is how much time it should take (based on your pessimism) to eliminate uncertainty for the remaining work estimate
3) if the mean is less than the Assessment time, there's too much uncertainty and you should offer to assess the effort instead of just doing the work
4) if the mean is more than the Assessment, just offer to do the work with the PERT output from the formula above (a rough sketch of this procedure follows below).
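A minimal sketch of that procedure in Python (my own code, not from the comment; it assumes the usual PERT SD approximation of (P - O) / 6 and uses 0.37 for the secretary-problem fraction):

```python
def assess_or_do(optimistic, most_likely, pessimistic):
    """Decide whether to offer an assessment or just do the work."""
    mean = (optimistic + 4 * most_likely + pessimistic) / 6  # PERT mean
    sd = (pessimistic - optimistic) / 6                      # common PERT SD approximation
    assessment = 0.37 * (mean + 3 * sd)                      # ~1/e of the pessimistic bound

    if mean < assessment:
        verdict = "too much uncertainty: offer to assess the effort first"
    else:
        verdict = "offer to do the work, quoting the PERT mean"
    return mean, sd, assessment, verdict

# e.g. a task you'd guess at 3 days, best case 1, worst case 10:
print(assess_or_do(1, 3, 10))
```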
I find most people are happy to hear you have a concrete plan for assessing something with the explicit purpose of providing precision.
I've found success asking questions such as "Roughly how much will updating my abstract cost? I won't hold you to the number, but are we talking closer to $200 or $800?". I make sure the numbers are quite far apart, and use it to figure high/low ends of required performance, deciding if something that's not currently the bottleneck will become a bottleneck, etc.
Also, converting between hourly and salaried, you can roughly use 40 hrs/week * 50 weeks/year = 2,000 hours/year.
And do explicit bounding, with high/low confidence bounds.
"How many cats live in this neighborhood? Is it more like 1, 10, 100, 1000, and so on?" "Hard lower bound of 1 - that one over there". "Hard upper bound of 1M - we'd be tripping on a cat per sq meter". "Soft lower bound of 10 - I think it likely there are at least 2 more cats somewhere, which gets us to order 10".
In an educational setting, bounding has much nicer conversational dynamics than the much more commonly used point estimates. Instead of identity-entangled long chains of approximation, followed by "I don't like your step C", "Well, your step B is a gross underestimate", and then "oh, the answer is X, but we said something else, which we'll now arbitrarily call right/wrong and move on", you can get a much more collaborative "Can anyone suggest another bound, upper or lower, hard or soft?" "Does anyone have any questions about that proposal?" "Any suggestions on resolving those questions?" "Ok, can anyone suggest another bound?" And when bounds are compared to ground truth, there's the richer followup of which bounds held. And while a blown soft bound might be bad luck, a blown hard bound means something was misunderstood, permitting fruitful failure analysis. And it's intensive "scientific dialog" practice - the discussion around each bound is another iteration.
April estimates 5 hours of work but spends 12 on it. She justifies her overage by communicating with the PM and letting them know ahead of time that the task is worse than they thought, and that to do it right she will need to go over, by roughly 5 more hours. Along the way she takes notes on what she's doing, then provides screenshots and justifications at the end to explain the work.
Bob estimates 5 hours and takes 12. He doesn't communicate, the work isn't finished to spec (or even partially to spec), he can't justify his time, and he argues that roadblocks kept him from finishing.
The latter is the person I regularly deal with and I only know one “April”.
Most memorable from this was watching his reaction while he and I were watching an episode of Star Trek the Next Generation where his exact problem was manifested into reality by his favorite character: https://youtu.be/8xRqXYsksFg
Nonetheless, I may or may not have employed my dad’s and Scotty’s method now that I have a team reporting up to me, each with their own strengths and weaknesses in delivery. Grooming them has been a genuine pleasure, though.
Kevin estimates 5 hours, his manager stares at him and says "I don't think it's really 5 hours, we can do it in 3" and forces Kevin to revise his "estimate". He finishes in 10 hours, clearly justifies the overrun, but gets a negative performance review.
One trick I've found is to work in a company with respected Test Engineers. If you tell them "I could really use an introduction to the different pieces here", rather than saying that you should stop "trying to understand the universe", they are often happy to show you documents like a risk list and other diagrams useful for getting a mental model of a project. If you find TDD useful for getting started and getting into a good cadence with a project, they often have great ideas for how to make a project more testable. They are good people to have backing you up if you're pushing back on other forms of estimate nonsense.
Even better, just take an hourly rate, 2x it, and add three zeros = annual salary; it literally takes seconds to do once you know 2,000 hours is the assumed hours worked per year.
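A trivial sketch of that shortcut (the $50 rate is just an assumed example):

```python
hourly_rate = 50                   # assumed example rate, in dollars
shortcut = hourly_rate * 2 * 1000  # "2x it, add three zeros"
exact = hourly_rate * 2000         # 40 hrs/week * 50 weeks/year
print(shortcut, exact)             # 100000 100000 - same number
```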
Personally I've got 6 paid weeks off every year (their default, I didn't negotiate that number) plus public holidays (IIRC that adds up to another two weeks, but maybe it's just one) with both my current and previous employers, which were in different western European countries, but I don't know if that's average.
In the US maybe 250 (working days a year) is closer, and if you multiply it by 4 in further calculations it conveniently becomes a power of ten.
How can this possibly be true?
If I take five samples of the speed of my car, and I always take them while the car is just setting off, it's never going to be anywhere near the median speed over a twelve hour drive.
I feel like there must be a huge list of extra constraints and caveats you aren't mentioning.
(I'm really bad at stats - genuinely asking.)
In the example you gave, the samples you draw are from the distribution of (the speed of your car just as it is setting off). There would be no guarantee that the estimated median has any relationship for any other distribution, such as (speed of your car during a trip). If you want to estimate the latter you'd need to figure out a way to draw random samples from that distribution.
Which scenario seems more likely ;)
I don't know enough stats to give a firm answer, but I'd reckon there is a key assumption that the samples need to be drawn i.i.d. from a single underlying probability distribution, or perhaps need to satisfy the related assumption of being exchangeable.
In your example of a sequence of samples that increase, they're certainly not exchangeable. I think they're not independent either.
E.g. a thought experiment to give a concrete version of your example, where we define it so there's no randomness at all, to make it easier to think about: let's suppose an idealised situation where we launch a space probe that travels away from the earth at 15 km/second. Suppose we have some way of measuring the distance d(t) that the probe is from earth at some time t after launch. Regard each distance measurement d(t) as a sample. Let's assume we take 5 samples by measuring the distance every 10 seconds after launch. So t_1 = 10 s, ..., t_5 = 50 s, and d(t_1) = 150 km, ..., d(t_5) = 750 km.
The sequence of distance samples d(t_1), d(t_2), d(t_3), d(t_4), d(t_5) is not exchangeable: if we exchange two samples, like d(t_2) <-> d(t_4), the permuted sequence d(t_1), d(t_4), d(t_3), d(t_2), d(t_5) corresponds to the situation "at 10 seconds the probe was 150 km away, at 20 seconds the probe was 600 km away, at 30 seconds the probe was 450 km away, at 40 seconds the probe was 300 km away, at 50 seconds the probe was 750 km away" -- the probability of observing that outcome is an awful lot lower -- based on our understanding of how the physics of the situation works in this idealised example -- than the probability of observing the outcome from the original sequence. (This is pretty sloppy, as I am not clearly distinguishing between observed values and random variables, but hopefully it gives some vague intuition.)
So if you want to estimate the median distance of the probe from the earth from 5 samples, you roughly need to take 5 measurements at 5 times chosen to be uniformly at random from the entire period you are interested in. E.g. if you want to estimate the median distance of the probe from the earth during the first 10 years of travel, you need to draw 5 samples from 5 different times sampled from the uniform distribution over the period [0 seconds, 10 years]. Then the resulting estimated median distance would only apply for the distance of the probe during that time period, it would not be an estimate that could be applied for any different time period.
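A quick simulation of the probe example makes the point (my own sketch; the 15 km/s and ten-year figures come from the comment above). Because the sample times are drawn uniformly at random from the whole period, the five distances bracket the true median about 93.75% of the time, which is the Rule of Five figure quoted below:

```python
import random

SPEED_KM_PER_S = 15.0
PERIOD_S = 10 * 365.25 * 24 * 3600   # first ten years of travel, in seconds

def distance(t):
    return SPEED_KM_PER_S * t        # distance from Earth at time t

true_median = distance(PERIOD_S / 2) # distance is monotonic in t, so the median is at T/2

trials, hits = 100_000, 0
for _ in range(trials):
    samples = [distance(random.uniform(0, PERIOD_S)) for _ in range(5)]
    if min(samples) <= true_median <= max(samples):
        hits += 1

print(hits / trials)                 # ~0.9375
```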
Unfortunately this kind of key assumption is rarely made explicit when teaching people stats. I see research papers all the time making this assumption where it clearly isn't warranted - such as in benchmarking a computer.
> There is a 93.75% chance that the median of a population is between the smallest and largest values in any random sample of five from that population.
> It might seem impossible to be 93.75% certain about anything based on a random sample of just five, but it works. To understand why this method works, it is important to note that the Rule of Five estimates only the median of a population. Remember, the median is the point where half the population is above it and half is below it. If we randomly picked five values that were all above the median or all below it, then the median would be outside our range. But what is the chance of that, really?
> The chance of randomly picking a value above the median is, by definition, 50%—the same as a coin flip resulting in “heads.” The chance of randomly selecting five values that happen to be all above the median is like flipping a coin and getting heads five times in a row. The chance of getting heads five times in a row in a random coin flip is 1 in 32, or 3.125%; the same is true with getting five tails in a row. The chance of not getting all heads or all tails is then 100% − 3.125% × 2, or 93.75%. Therefore, the chance of at least one out of a sample of five being above the median and at least one being below is 93.75% (round it down to 93% or even 90% if you want to be conservative).
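The arithmetic in that quote is easy to check directly:

```python
p_all_same_side = 2 * 0.5 ** 5   # five "heads" in a row, or five "tails"
print(1 - p_all_same_side)       # 0.9375
```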
Rule of three: if you want to compare two things (e.g. whether code A is faster than code B), then measure each three times. If all three As are faster than the Bs, then there is a ~95% probability that it is true.
We are trying to find the probability of selecting three As, from the set (ABABAB). We don't care about the order of the As, so this is a combinations without repetition problem:
6 choose 3
= n! / ((n - r)! r!)
= 6! / (3! 3!)
= 720 / 36
= 20
There is only one state where all three As are chosen, so the probability of getting it by random chance (our null hypothesis) is:
1 / 20 = 5%.
Hence 95% confidence.
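The same count can be checked with Python's math.comb (my own sketch):

```python
from math import comb

orderings = comb(6, 3)           # ways to place the three As among six ranked slots
print(orderings, 1 / orderings)  # 20 0.05
```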
You can't actually speak to the probability that A is faster than B without some kind of assumed prior distribution for A and B.
In particular, it only provides weak evidence that the mean, mode, or median of A is any faster than B's. In the worst case, A happens to be a lot slower than B, but only with very low probability.
But if, for example, B always halts in 10 seconds, and A halts in 1 second 99.99% of the time but runs for 10 years 0.01% of the time, then A is "slower", yet you'd need around 10,000 trials to notice.
First, this isn't really "95% confidence" of anything. You are computing the probability of seeing your outcome in the case that A and B have the same distribution. This is very counterintuitive and I highly recommend people read up on p-values to get an exact sense of what's going on. A low p-value simply tells you that it's unlikely that A and B have the exact same performance; it doesn't really tell you that A is faster than B in any specific manner (mean, median, mode, etc).
Second, the math here is wrong. The p-value is actually 0.10, not 0.05 as presented here, as you should also take into account the possibility of your test returning 3 Bs. You then get p = 2 * 1 / 20 = 2 / 20 = 0.1. (The exact terminology here is whether the test is two-tailed or one-tailed, and usually you want to use the two-tailed test unless you have some special prior knowledge.)
We're ordering the trials by speed, and computing the probability of getting an AAABBB ordering. The probability of ordering AAABBB by random chance is 1/20. That's why we don't care about getting BBB first, and why getting AAA first does tell you that A is faster than B.
You can also click on the link in the OP and see other people explaining the logic of the rule. If there's an error with either my explanation or others, I'm interested in hearing it.
The main key to remember when deriving a p value is that you are computing the probability of seeing an outcome at least as extreme given the null hypothesis, not the probability of seeing your particular outcome.
In this case I am defining the statistical significance of our test ordering (AAABBB) as anything less than or equal to 0.05 of our probability distribution - which corresponds to the 0 Bs bin. This corresponds to a one-sided confidence interval of 95%.
What am I doing wrong?
If this doesn't make sense, I would recommend running simulations under the null hypothesis. You will see that 5% of the time you will falsely conclude that A < B and that in another 5% of the time you will conclude that B < A, leading to an overall false positive rate of 10%.
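A rough version of that simulation (my own sketch; A and B are drawn from the same distribution, so the null hypothesis is true by construction and any "win" is a false positive):

```python
import random

trials = 100_000
a_wins = b_wins = 0
for _ in range(trials):
    a = [random.random() for _ in range(3)]  # same distribution for A and B
    b = [random.random() for _ in range(3)]
    if max(a) < min(b):                      # all three As faster than all three Bs
        a_wins += 1
    elif max(b) < min(a):                    # all three Bs faster than all three As
        b_wins += 1

print(a_wins / trials, b_wins / trials)      # ~0.05 each, ~0.10 combined
```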
This seems to me like good but dangerous advice. I'm all for using back-of-the-envelope estimations as "small angle approximations" where they're not far off, but, in general, iterating the lossy approximation to get a large-scale answer is the wrong thing to do. It works well here, but only because we're dealing with logarithms, which genuinely do convert multiplication to addition, and not mentioning this point seems to tempt one to use it in situations where it's not applicable.
365.25 / 12 * 24 = 730.5
Technically it is 731 if you round up.
Days in a year: 365.2425 (exactly, under Gregorian calendar)
Days in a month: 30.44 (approximation, a twelfth of the above)
> Technically it is 731 if you round up.
Actually, since 365.25 is a bit high (if it were exact we would all still be on the Julian calendar), I would round down to 730 since the multiplication gives a little less than 730.5. But that's still not the same as saying the multiplication gives exactly 730, as others are saying.
Apparently a lot of "we" don't, since a number of people were saying they did the "exact" calculation that kolanos did but got a different answer.
> so maybe cool it on the fun fact witch hunt?
The "witch hunt" seems to me to be on the part of the people who downvoted a perfectly correct and legitimate post, and those who responded to it with a claimed "exact" calculation that was wrong.
IIRC it covers everything from multiplication to square roots to logarithms.
Personally I find that I am unable to do these calculations/estimations unless I write things down. I get lost between powers of 2/MB/TB/nanoseconds and other unit conversions. I understand that this is elementary math, but I am just curious: can most people do similar calculations in their head (for instance during a meeting)?
Powers of 2 are incredibly helpful too: I can estimate that the square root of 4000 is around 60 because 4000 is close to 2 to the 12th, for instance. As a coder it’s well worth your time to memorize powers of 2 up to at least 2 to the 16.
And for exponents 0-9 I have those memorized. So if I'm given the number 2^39 I unwrap that as 512 giga, or just over 500 billion (it seems it's ~550, but estimating with base 2 is still going to be closer than estimating with base 10).
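A sketch of that decomposition (my own illustration, relying on 2^10 ≈ 10^3):

```python
n = 39
small, tens = n % 10, n // 10               # 2^39 = 2^9 * (2^10)^3
estimate = 2 ** small * 10 ** (3 * tens)    # 512 * 10^9, i.e. "512 giga"
actual = 2 ** n
print(estimate, actual, actual / estimate)  # estimate is within ~7% of the true value
```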
Just ask engineer friends random questions from time to time, and when there is no consensus, calculate the results properly.
Eg. How much power could a solar powered fly collect?
How many tons of batteries would I need to drive a train across the USA?
How long can I power my laptop from a solar usb battery bank in the sun each day on a hiking trip?
What's the load time of this webpage I just wrote on a satellite internet connection?
How long would it take to boot up my OS if I could keep RAM intact during a reboot?
How many kilos of coal does it take to cook a pizza?
Edit: just realized this is more about mental math than about back-of-the-envelope estimations
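For example, a rough pass at the first question (all numbers below are my own assumptions, not from the comment): a fly presents maybe a few square millimetres to the sun, full sunlight is about 1 kW/m², and a small photovoltaic cell converts perhaps 20% of that.

```python
fly_area_m2 = 5e-6            # ~5 mm^2 facing the sun (assumed)
solar_flux_w_per_m2 = 1000    # rough full-sun irradiance
cell_efficiency = 0.2         # assumed small-cell efficiency

power_w = fly_area_m2 * solar_flux_w_per_m2 * cell_efficiency
print(f"{power_w * 1000:.1f} mW")   # on the order of a milliwatt
```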
If I tipped 15%, I would probably knock off a zero, then add half to it.
(Unfortunately I don't always get a satisfying answer, but that's life...)
 - https://sirupsen.com/napkin/
So if two servers are in a room 100 feet apart, the time it takes for them to communicate cannot be lower than 100 ns.
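A minimal sketch of that lower bound (light travels roughly one foot per nanosecond in vacuum; signals in fibre or copper are slower, so the real floor is higher):

```python
SPEED_OF_LIGHT_M_PER_S = 3.0e8
distance_m = 100 * 0.3048     # 100 feet in metres
one_way_ns = distance_m / SPEED_OF_LIGHT_M_PER_S * 1e9
print(one_way_ns)             # ~102 ns one-way
```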