Back of the envelope estimation hacks 228 points by bkudria 2 days ago | 75 comments

 Three other hacks which, when combined, have a lot of power. Chebyshev's theorem: at least 89% (1 - 1/9) of the values in any distribution with a finite variance fall within 3 SDs of the mean. PERT estimation: base your estimates on the happiest path (O, for optimistic), the unhappiest path (P, for pessimistic), and the most likely path (ML); the recommended formula is (P + O + 4ML) / 6, but you can also just model them as a distribution. Secretary problem: when searching for the best option under uncertainty, the optimal strategy spends the first 1/e (~37%) of the search just observing, which is enough to learn what the rest of the solution space looks like. So, putting this together, if you have to give a time estimate for something, your best bet is to: 1) get your 3 PERT numbers; 2) calculate the mean, the SD, and 37% of (mean + 3 SD), the "Assessment": this is how much time it should take (based on your pessimism) to eliminate uncertainty for the remaining work estimate; 3) if the mean is less than the Assessment time, there's too much uncertainty and you should offer to assess the effort instead of just doing the work; 4) if the mean is more than the Assessment, just offer to do the work with the PERT output from the formula above. I find most people are happy to hear you have a concrete plan for assessing something with the explicit purpose of providing precision.
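A minimal Python sketch of the estimate-or-assess rule above. It assumes the conventional PERT standard deviation (P - O) / 6, which the comment does not specify, reads "37% of mean + 3SD" as (mean + 3 SD) / e, and uses hypothetical function names.

```python
import math


def pert(optimistic, most_likely, pessimistic):
    """Three-point (PERT) estimate, returning (mean, standard deviation)."""
    mean = (pessimistic + optimistic + 4 * most_likely) / 6
    # Assumption: the conventional PERT spread, one sixth of the full range
    sd = (pessimistic - optimistic) / 6
    return mean, sd


def chebyshev_min_fraction(k):
    """Chebyshev: at least 1 - 1/k^2 of any finite-variance distribution is within k SDs."""
    return 1 - 1 / k**2


def recommend(optimistic, most_likely, pessimistic):
    """Hypothetical helper: decide between quoting the work and quoting an assessment."""
    mean, sd = pert(optimistic, most_likely, pessimistic)
    # "Assessment" time: ~37% (1/e) of a pessimistic horizon, mean + 3 SD
    assessment = (mean + 3 * sd) / math.e
    if mean < assessment:
        return "assess first", mean, assessment
    return "do the work", mean, assessment


print(round(chebyshev_min_fraction(3), 3))  # 0.889
print(recommend(4, 5, 8))    # narrow range: quote the PERT mean
print(recommend(1, 2, 40))   # huge spread: offer an assessment instead
```

With hours as the unit, the 4/5/8 task has a PERT mean of about 5.3 h and is quoted directly; the 1/2/40 task has a mean of about 8.2 h against an assessment budget of about 10.2 h, so the rule says to scope it first.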
 I've found people hate giving numbers to things when estimating how a system would work, or what it would need. Ballparking can really help, though. I've found success asking questions such as "Roughly how much will updating my abstract cost? I won't hold you to the number, but are we talking closer to $200 or $800?". I make sure the numbers are quite far apart, and use the answer to figure out the high/low ends of required performance, decide whether something that's not currently the bottleneck will become one, etc. Also, when converting between hourly and salaried, you can roughly use 40 hrs/week * 50 weeks/year = 2,000 hours/year.
 One can also use an exponential spread: "Is it more like 1, 10, 100, 1000, ...?" And do explicit bounding, with high/low confidence bounds. "How many cats live in this neighborhood? Is it more like 1, 10, 100, 1000, and so on?" "Hard lower bound of 1: that one over there." "Hard upper bound of 1M: we'd be tripping on a cat per sq meter." "Soft lower bound of 10: I think it likely there are at least 2 more cats somewhere, which gets us to order 10." In an educational setting, bounding has much nicer conversational dynamics than the much more commonly used point estimates. Point estimates invite identity-entangled long chains of approximation, followed by "I don't like your step C", "Well, your step B is a gross underestimate", and then "oh, the answer is X, but we said something else, which we'll now arbitrarily call right/wrong and move on". With bounds you can instead get a much more collaborative "Can anyone suggest another bound, upper or lower, hard or soft?" "Does anyone have any questions about that proposal?" "Any suggestions on resolving those questions?" "Ok, can anyone suggest another bound?" And when bounds are compared to ground truth, there's the richer followup of which bounds held. While a blown soft bound might be bad luck, a blown hard bound means something was misunderstood, permitting fruitful failure analysis. And it's intensive "scientific dialog" practice: the discussion around each bound is another iteration.
 I think a lot of people hate giving an estimate because they have learned from experience that, even if someone tells them they won't hold them to the number, it still happens. You might be the exception, but it seems to happen quite regularly.
 Here’s my experience with that: April estimates 5 hours of work but spends 12 on it. She justifies her overage by communicating with the PM and letting them know ahead of time that the task is worse than they thought and that, to do it right, she will need to go over by roughly 5 more hours. Along the way she takes notes on what she’s doing, then provides screenshots and justifications at the end to explain the work. Bob estimates 5 hours and takes 12. He doesn’t communicate, the work isn’t finished to spec or even partially to spec, he can’t justify his time, and he argues that roadblocks kept him from finishing. The latter is the person I regularly deal with; I only know one “April”.
 My old man spent 30 years working as a mechanical engineer for General Electric in their gas turbine division and used to bring home tales similar to this about people who reported to him, which in turn informed how he needed to communicate with his own manager about when to expect deliverables. Most memorable was watching his reaction when he and I saw an episode of Star Trek: The Next Generation in which his exact problem was played out by his favorite character: https://youtu.be/8xRqXYsksFg Nonetheless, I may or may not have employed my dad’s and Scotty’s method now that I have a team reporting up to me, each with their own strengths and weaknesses in delivery. Grooming them has been a genuine pleasure, though.
 You missed one: Kevin estimates 5 hours, his manager stares at him and says "I don't think it's really 5 hours, we can do it in 3" and forces Kevin to revise his "estimate". He finishes in 10 hours, clearly justifies the overrun, but gets a negative performance review.
 I've had discussions like this in the past, where people ask me "can we do it in 3 days instead of 5?" And I've found the easiest way to sell it to them is to ask which 40% of the feature set they would like me to remove from the deliverable?
 Probably the biggest thing that I've found makes estimation harder is when I lack a clear idea of the goals of the project, what the different pieces are and how they interconnect, or how to start on the first step. This sort of barrier was really hard to overcome as a junior engineer because I found it hard to persuade people that "I don't know" or "I don't know what I'm doing with this" was an honest reflection of reality rather than impostor syndrome. One trick I've found is to work in a company with respected Test Engineers. If you tell them "I could really use an introduction to the different pieces here", rather than saying that you should stop "trying to understand the universe", they are often happy to show you documents like a risk list and other diagrams useful for building a mental model of a project. If you find TDD useful for getting started and getting into a good cadence with a project, they often have great ideas for how to make a project more testable. They are good people to have backing you up if you're pushing back on other forms of estimate nonsense.
 > “Also, converting between hourly and salaried, you can roughly use 40 hrs/week * 50 weeks/year = 2,000 hours/year.” Even better, just take an hourly rate, double it, and add three zeros to get the annual salary; it literally takes seconds once you know that 2,000 hours is the assumed number of hours worked per year.
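The doubling trick is just multiplication by 2,000. A small Python sketch, assuming the thread's 40 h/week for 50 weeks/year; the function names are hypothetical.

```python
HOURS_PER_YEAR = 40 * 50  # 2,000 working hours: 40 h/week for 50 weeks


def hourly_to_annual(rate):
    # "Double it and add three zeros" is exactly rate * 2 * 1,000
    return rate * HOURS_PER_YEAR


def annual_to_hourly(salary):
    # The reverse: drop three zeros, then halve
    return salary / HOURS_PER_YEAR


print(hourly_to_annual(50))       # 100000
print(annual_to_hourly(120_000))  # 60.0
```

Using 52 paid weeks (2,080 hours) instead moves the answer by only 4%, which is why the rest of the thread treats the difference as noise.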
 Yes, this is how I do it in my head. I should have mentioned that given the context of the post!
 I wonder what that number is anywhere else in the world. Most countries have a legal minimum of paid time off higher than that (the USA is the exception compared to the vast majority of countries, rich or poor), and most companies in our field give a bit more than the minimum, so what I'm wondering is whether the difference is large enough that rounding it up doesn't make sense. Personally I've got 6 paid weeks off every year (their default; I didn't negotiate that number) plus public holidays (iirc that adds up to another two weeks, but maybe it's just one) with both my current and previous employer, which were in different western European countries, but I don't know if that's average.
 I could be wrong, but I think 50 weeks is the result of rounding down from 52, not of subtracting two weeks of unpaid vacation. Most people who work a full-time hourly job in the US get paid for every week of the year if they only use paid time off.
 For work days per year, I usually use 200. Start from 5 * 52 = 260, then subtract vacation days in Germany (30), some holidays, and potential illness. In the US maybe 250 is closer, and if you can multiply it by 4 somewhere in further calculations, it becomes a power of ten.
 American here. I would use 204: 250 - 10 holidays - 10 days of summer hours (something my company does in the summer) - 26 vacation days. My vacation days can actually roll over from year to year (up to a maximum of 60 days), so I could take off more vacation days. But at a minimum I will get 26 days of vacation (on top of the 20 days of holidays and summer days) per year.
 50 weeks a year is very US focused. Other countries usually have 4 weeks off a year or more.
 In this case, it's not a huge difference.
 And lots of workers, even hourly ones, get paid for 52 weeks despite taking some of those off.
 Rule of five: take five samples. You can be 93% confident that the median lies between the smallest and largest of your five samples. Works for any distribution (not just normal).
 > Works for any distribution (not just normal). How can this possibly be true? If I take five samples of the speed of my car, and I always take them while the car is just setting off, it's never going to be anywhere near the median speed over a twelve-hour drive. I feel like there must be a huge list of extra constraints and caveats you aren't mentioning. (I'm really bad at stats; genuinely asking.)
 There would at least be an assumption that the samples are drawn independently from a single distribution, and the estimate of the median is for that same distribution.In the example you gave, the samples you draw are from the distribution of (the speed of your car just as it is setting off). There would be no guarantee that the estimated median has any relationship for any other distribution, such as (speed of your car during a trip). If you want to estimate the latter you'd need to figure out a way to draw random samples from that distribution.
 Can you do this with a source of infinite samples? If every time I take a sample it's slightly higher, does this still hold?
 I'm not good with stats, but five increasing measurements leave two options: A. You've hit an unlikely coincidence, and you're fine. B. You're not really drawing randomly from the same distribution. Which scenario seems more likely ;)
 > If every time I take a sample it's slightly higher, does this still hold? I don't know enough stats to give a firm answer, but I'd reckon there is a key assumption that the samples need to be drawn i.i.d. from a single underlying probability distribution, or perhaps need to satisfy the related assumption of being exchangeable. https://en.wikipedia.org/wiki/Independent_and_identically_di... https://en.wikipedia.org/wiki/Exchangeable_random_variables In your example of a sequence of samples that increase, they're certainly not exchangeable. I think they're not independent either. E.g., here's a thought experiment giving a concrete version of your example, where we define it so there's no randomness at all to make it easier to think about: suppose an idealised situation where we launch a space probe that travels away from the earth at 15 km/second. Suppose we have some way of measuring the distance d(t) of the probe from earth at some time t after launch. Regard each distance measurement d(t) as a sample. Let's assume we take 5 samples by measuring the distance every 10 seconds after launch, so t_1=10s, ..., t_5=50s, and d(t_1)=150km, ..., d(t_5)=750km. The sequence of distance samples d(t_1), d(t_2), d(t_3), d(t_4), d(t_5) is not exchangeable: if we exchange two samples, like d(t_2) <-> d(t_4), the permuted sequence d(t_1), d(t_4), d(t_3), d(t_2), d(t_5) corresponds to the situation "at 10 seconds the probe was 150 km away, at 20 seconds the probe was 600 km away, at 30 seconds the probe was 450 km away, at 40 seconds the probe was 300 km away, at 50 seconds the probe was 750 km away". The probability of observing that outcome is an awful lot lower (based on our understanding of how the physics of this idealised example works) than the probability of observing the outcome from the original sequence (this is pretty sloppy, as I am not clearly distinguishing between observed values and random variables, but hopefully it gives some vague intuition). So if you want to estimate the median distance of the probe from the earth from 5 samples, you roughly need to take 5 measurements at 5 times chosen uniformly at random from the entire period you are interested in. E.g. if you want to estimate the median distance of the probe from the earth during the first 10 years of travel, you need to draw 5 samples at 5 different times sampled from the uniform distribution over the period [0 seconds, 10 years]. The resulting estimated median distance would then only apply to the distance of the probe during that time period; it would not be an estimate that could be applied to any different time period.
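A small simulation of the probe thought experiment, in Python, assuming the idealised 15 km/s probe and a 10-year window. It checks how often the rule-of-five interval [min, max] contains the true median distance, first with the five fixed early samples (10 to 50 seconds) and then with five times drawn uniformly from the whole window; the helper names are hypothetical.

```python
import random

SPEED_KM_S = 15.0
WINDOW_S = 10 * 365.25 * 24 * 3600  # 10 years, in seconds


def distance_km(t_seconds):
    # Idealised probe: constant 15 km/s away from Earth, no randomness
    return SPEED_KM_S * t_seconds


# Distance is linear in time, so for a uniformly random time in the window
# the median distance is the distance at the window's midpoint.
TRUE_MEDIAN_KM = distance_km(WINDOW_S / 2)


def coverage(draw_times, trials=20_000, seed=0):
    """Fraction of trials whose five samples bracket the true median."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        samples = [distance_km(t) for t in draw_times(rng)]
        if min(samples) <= TRUE_MEDIAN_KM <= max(samples):
            hits += 1
    return hits / trials


# Fixed, early sampling times: 10, 20, 30, 40, 50 seconds after launch
early = coverage(lambda rng: [10 * i for i in range(1, 6)])
# Five times drawn uniformly at random from the whole 10-year window
uniform = coverage(lambda rng: [rng.uniform(0, WINDOW_S) for _ in range(5)])

print(f"fixed early times bracket the median: {early:.3f}")  # 0.000
print(f"uniform random times bracket the median: {uniform:.3f}")
```

The second figure should land near 0.94, the rule-of-five value; the first is zero because every early sample sits far below the median.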
 > a key assumption that the samples need to be drawn i.i.d. from a single underlying probability distribution, or perhaps need to satisfy the related assumption of being exchangeable Unfortunately this kind of key assumption is rarely made explicit when teaching people stats. I see research papers all the time making this assumption where it clearly isn't warranted, such as in benchmarking a computer.
 From Douglas W. Hubbard, How to Measure Anything (3rd ed.), via https://lobste.rs/s/kk89vp/back_envelope_estimation_hacks#c_... > There is a 93.75% chance that the median of a population is between the smallest and largest values in any random sample of five from that population. > It might seem impossible to be 93.75% certain about anything based on a random sample of just five, but it works. To understand why this method works, it is important to note that the Rule of Five estimates only the median of a population. Remember, the median is the point where half the population is above it and half is below it. If we randomly picked five values that were all above the median or all below it, then the median would be outside our range. But what is the chance of that, really? > The chance of randomly picking a value above the median is, by definition, 50%—the same as a coin flip resulting in “heads.” The chance of randomly selecting five values that happen to be all above the median is like flipping a coin and getting heads five times in a row. The chance of getting heads five times in a row in a random coin flip is 1 in 32, or 3.125%; the same is true with getting five tails in a row. The chance of not getting all heads or all tails is then 100% − 3.125% × 2, or 93.75%. Therefore, the chance of at least one out of a sample of five being above the median and at least one being below is 93.75% (round it down to 93% or even 90% if you want to be conservative).
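Hubbard's coin-flip argument translates directly into code. A Python sketch, assuming i.i.d. draws from a continuous distribution; the exponential distribution (median ln 2) is just an arbitrary skewed example to show the result does not depend on normality, and the function names are hypothetical.

```python
import math
import random


def rule_of_five_exact(n=5):
    # The median is missed only if all n draws fall on the same side of it:
    # probability 2 * (1/2)**n for a continuous distribution.
    return 1 - 2 * 0.5**n


def rule_of_five_simulated(draw, true_median, n=5, trials=50_000, seed=1):
    """Monte Carlo estimate of P(min(sample) <= median <= max(sample))."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [draw(rng) for _ in range(n)]
        if min(sample) <= true_median <= max(sample):
            hits += 1
    return hits / trials


# Skewed example: exponential with rate 1, whose median is ln 2
exp_coverage = rule_of_five_simulated(lambda rng: rng.expovariate(1.0), math.log(2))

print(rule_of_five_exact())  # 0.9375
print(round(exp_coverage, 3))
```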
 Right, so random samples taken uniformly across the whole of a finite set of independent samples are the first of the extra constraints you didn't mention... which makes it not useful for many real-world computer-science applications like benchmarking, where you often have samples that are inter-dependent and drawn from an infinite set, so you can't sample them uniformly.
 You don't understand convexity. You don't walk across a river that is 4ft deep on average.
 Came here to post this. I took his course as “The Art of Approximation in Science and Engineering” and absolutely loved it. So much so that it became the name of my CTF team for the next few years.
 Here's one that I learned from HN years ago (https://news.ycombinator.com/item?id=18463449). Rule of three: if you want to compare two things (e.g. whether code A is faster than code B), measure each three times. If all three As are faster than all three Bs, then there is a ~95% probability that it's true. Derivation: we are trying to find the probability that the three fastest runs are the three As, from the set (ABABAB). We don't care about the order of the As, so this is a combinations-without-repetition problem: 6 choose 3 = n! / ((n-r)! r!) = 6! / (3! 3!) = (6 * 5 * 4) / 3! = 120 / 6 = 20. There is only one state where all three As are chosen, so the probability of getting it by random chance (our null hypothesis) is 1 / 20 = 5%. Hence 95% confidence.
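The counting argument can be checked by brute force. A Python sketch that ranks six runs from fastest to slowest and enumerates which three ranks belong to A, assuming (as the derivation does) that under the null hypothesis every assignment is equally likely:

```python
from itertools import combinations

# Ranks 0 (fastest) to 5 (slowest) for three runs of A and three runs of B.
# Under the null hypothesis, every choice of A's three ranks is equally likely.
assignments = list(combinations(range(6), 3))

# The one outcome the rule of three looks for: A holds the three fastest ranks
a_sweeps = sum(1 for ranks in assignments if ranks == (0, 1, 2))
p_one_sided = a_sweeps / len(assignments)

print(len(assignments))  # 20
print(p_one_sided)       # 0.05
```

Counting the mirror-image outcome, B holding the three fastest ranks, doubles this to 2/20 = 0.1, which is the two-sided figure argued for further down the thread.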
 Uh oh... I think you're interpreting the probability incorrectly. Shouldn't this be "if we assume that A and B take the same amount of time, then there's a 5% chance that we would see this result"? You can't actually speak to the probability that A is faster than B without some kind of assumed prior distribution for A and B.
 It should also be pointed out that the one thing you're ruling out is that they have the same distribution. In particular, it only provides weak evidence that the mean, mode, or median of A is any faster than B's. In the worst case, A happens to be a lot slower than B, but with very low probability.
 Yep. It gets at the question of what "faster" means. Implicitly we are saying the average runtime of A is smaller than the average of B. But if, for example, B always halts in 10 seconds, and A halts in 1 second 99.99% of the time but runs for 10 years 0.01% of the time, then A is "slower", but you'd need around 10,000 trials to notice.
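To put numbers on that hypothetical, a Python sketch computing A's mean runtime and the chance that n independent runs include at least one slow one (the runtimes are the comment's made-up values; the helper name is hypothetical):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# The comment's hypothetical: B always takes 10 s; A takes 1 s with
# probability 0.9999 and 10 years with probability 0.0001.
P_SLOW = 0.0001
MEAN_A = (1 - P_SLOW) * 1.0 + P_SLOW * 10 * SECONDS_PER_YEAR
MEAN_B = 10.0


def chance_to_see_slow_run(trials):
    """Probability that `trials` independent runs of A include a 10-year one."""
    return 1 - (1 - P_SLOW) ** trials


print(f"mean runtime: A {MEAN_A:,.0f} s vs B {MEAN_B:.0f} s")
for n in (3, 1_000, 10_000, 50_000):
    print(f"{n:>6} runs: {chance_to_see_slow_run(n):.3f}")
```

A's mean is several thousand times B's, yet three runs almost never reveal it, and even 10,000 runs catch the slow case only about 63% of the time.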
 Yes, this is correct.
 This is neat but I think your derivation needs another step: a Bayesian prior that before this experiment, you thought the probability of A beating B was 50%.
 Both the statistical reasoning and the math are wrong here. First, this isn't really "95% confidence" of anything. You are computing the probability of seeing your outcome in the case that A and B have the same distribution. This is very counterintuitive, and I highly recommend people read up on p-values to get an exact sense of what's going on. A low p-value simply tells you that it's unlikely that A and B have the exact same performance; it doesn't really tell you that A is faster than B in any specific manner (mean, median, mode, etc.). Second, the math here is wrong. The p-value is actually 0.10, not 0.05 as presented here, because you should also take into account the possibility of your test returning 3 Bs first. You then get p = 2 * 1 / 20 = 2 / 20 = 0.1. (The exact terminology here is whether the test is two-tailed or one-tailed, and usually you want to use the two-tailed test unless you have some special prior knowledge.)
 I think the math here should be right, although perhaps the confusion is because I wasn't clear about the role of the order: we're ordering the trials by speed and computing the probability of getting an AAABBB ordering. The probability of ordering AAABBB by random chance is 1/20. That's why we don't care about getting BBB first, and why getting AAA first does tell you that A is faster than B. You can also click on the link in the OP and see other people explaining the logic of the rule. If there's an error in either my explanation or others', I'm interested in hearing it.
 The problem is that you are equally likely to get a BBBAAA ordering, which would give you the same Type 1 error as AAABBB in the case where A and B have the same distribution. The key thing to remember when deriving a p-value is that you are computing the probability of seeing an outcome at least as extreme given the null hypothesis, not the probability of seeing your particular outcome.
 I see your point. But isn't that only true for two-tailed tests? This sort of ordering problem seems more suited to a one-tailed test. Let's say, for example, we represent the bins of the probability distribution as the number of Bs we can obtain among the three fastest runs (0, 1, 2, or 3). In this case I am defining the statistical significance of our test ordering (AAABBB) as anything less than or equal to 0.05 of our probability distribution, which corresponds to the 0 Bs bin. This corresponds to a one-sided confidence interval of 95%. What am I doing wrong?
 A one-sided test is only valid if you know that it's not possible to get a result in the other direction. In the general case you don't know for certain that B can't be faster than A. One-sided tests are almost always invalid, as it's very difficult to know that the other direction is impossible. If this doesn't make sense, I would recommend running simulations under the null hypothesis. You will see that 5% of the time you will falsely conclude that A < B, and in another 5% of the time you will conclude that B < A, leading to an overall false positive rate of 10%.
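The simulation suggested above is short to write. A Python sketch, assuming A and B draw runtimes from the same normal distribution (the 100 ms mean and 10 ms spread are arbitrary; any shared continuous distribution gives the same rates), with a hypothetical function name:

```python
import random


def rule_of_three_false_positives(trials=100_000, seed=2):
    """Simulate the rule of three when A and B have the SAME distribution.

    Returns (rate of 'A is faster' verdicts, rate of 'B is faster' verdicts).
    """
    rng = random.Random(seed)
    a_wins = b_wins = 0
    for _ in range(trials):
        a = [rng.gauss(100, 10) for _ in range(3)]  # hypothetical runtimes, ms
        b = [rng.gauss(100, 10) for _ in range(3)]
        if max(a) < min(b):    # all three As faster than all three Bs
            a_wins += 1
        elif max(b) < min(a):  # all three Bs faster than all three As
            b_wins += 1
    return a_wins / trials, b_wins / trials


a_rate, b_rate = rule_of_three_false_positives()
print(f"A<B: {a_rate:.3f}  B<A: {b_rate:.3f}  either: {a_rate + b_rate:.3f}")
```

Each one-sided false-positive rate comes out near 1/20 = 0.05 and their sum near 0.10, matching the two-tailed p-value.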
 > If you combine the rule of 72 with the powers of two, then you can quickly find out how long it will take for a quantity to increase by several orders of magnitudes.This seems to me like good but dangerous advice. I'm all for using back-of-the-envelope estimations as "small angle approximations" where they're not far off, but, in general, iterating the lossy approximation to get a large-scale answer is the wrong thing to do. It works well here, but only because we're dealing with logarithms, which genuinely do convert multiplication to addition, and not mentioning this point seems to tempt one to use it in situations where it's not applicable.
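The point about logarithms can be checked numerically. A Python sketch comparing "number of doublings times the rule-of-72 doubling time" with the exact compound-growth time; the function names are hypothetical.

```python
import math


def estimate_years_to_grow(factor, rate_percent):
    # Count doublings (2^10 ~ 1000, so 1000x ~ 10 doublings), each 72/r years long
    return math.log2(factor) * 72 / rate_percent


def exact_years_to_grow(factor, rate_percent):
    # Solve (1 + r)^t = factor for t
    return math.log(factor) / math.log(1 + rate_percent / 100)


for rate in (2, 8, 25, 100):
    ratios = [estimate_years_to_grow(f, rate) / exact_years_to_grow(f, rate)
              for f in (1e3, 1e6, 1e9)]
    print(rate, [round(x, 3) for x in ratios])
```

For a given rate, the estimate-to-exact ratio is the same whether the target is 10^3, 10^6 or 10^9, because both expressions are a constant times log(factor). The error never compounds, but it never shrinks either: at 100% growth the per-doubling error (a ratio of 0.72) carries straight through.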
 Less sophisticated, but recently I realized that just plain remembering certain numbers like "average hours per month" (~730) or "seconds per day" (86400) is really helpful to make very quick estimates. A customer wants a quota of 100k queries a day? That averages out to a little more than one query per second assuming equal distribution.
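The handy constants from this subthread, as a small Python sketch; it assumes the mean Gregorian year of 365.2425 days and, for the query example, traffic spread evenly across the day. The helper name is hypothetical.

```python
DAYS_PER_YEAR = 365.2425                   # mean Gregorian year
HOURS_PER_MONTH = DAYS_PER_YEAR * 24 / 12  # ~730.5
SECONDS_PER_DAY = 24 * 60 * 60             # 86,400


def per_day_to_per_second(count_per_day):
    # Assumes an even spread over the day; real traffic usually has peaks
    return count_per_day / SECONDS_PER_DAY


print(round(HOURS_PER_MONTH, 1))                 # 730.5
print(round(per_day_to_per_second(100_000), 2))  # 1.16
```

The 365.25-day Julian year gives exactly 730.5; the gap between the two is far below anything that matters for a napkin estimate.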
 Edit: I stand corrected. 365.25 / 12 * 24 = 730.5. Technically it is 731 if you round up.
 I find it hilarious that so many people are so worked up on the rounding on the third significant figure of a number in the comments section of an article about "back of the envelope estimation hacks".
 Guess that's engineers for you
 I used to think engineers would round to the nearest order of magnitude…
 Other useful numbers: Days in a year: 365.2425 (exact average under the Gregorian calendar). Days in a month: 30.44 (approximation, a twelfth of the above).
 Since you're getting downvotes and several others have posted disagreeing, I'm posting to agree with 730.5. I also get 365.25 * 2 = 730.5. I have no idea why other people are saying this somehow comes out to exactly 730. > Technically it is 731 if you round up. Actually, since 365.25 is a bit high (if it were exact we would all still be on the Julian calendar), I would round down to 730, since the multiplication gives a little less than 730.5. But that's still not the same as saying the multiplication gives exactly 730, as others are saying.
 I did this exact calculation and it comes out to 730, exactly like GP said.
 No, that evaluates to 730. It's the same as 365*2=730.
 The number of days per year is not exactly 365.
 365 * 2 = 730 :)
 There aren't exactly 365 days in a year.
 We understand nothing “exactly” in this universe, but this is a thread about estimation hacks so maybe cool it on the fun fact witch hunt.? Idk just a thought.
 > We understand nothing “exactly” in this universe Apparently a lot of "we" don't, since a number of people were saying they did the "exact" calculation that kolanos did but got a different answer. > so maybe cool it on the fun fact witch hunt.? The "witch hunt" seems to me to be on the part of the people who downvoted a perfectly correct and legitimate post, and those who responded to it with a claimed "exact" calculation that was wrong.
 It's been about 20 years since I read this, but for people interested in quick mental math, here's a great book on doing fast calculations in your head: Dead Reckoning: Calculating Without Instruments https://www.amazon.com/dp/B00BZE4916 IIRC it covers everything from multiplication to square roots to logarithms.
 I can't find a way to contact the author, so I'm just going to post this here: your RSS feed has broken links, it links to www.robertovitillo.com, but the www subdomain does not resolve to your site.
 Whoops, thanks!
 Can most people do these calculations in their head? Personally I find that I am unable to do these calculations/estimations unless I write things down. I get lost between powers of 2/MB/TB/nanoseconds and other unit conversions. I understand that this is elementary math, but still, I am just curious: can most people do similar calculations in their head (for instance, during a meeting)?
 I think it probably helps to know units and powers cold, so you can use your short-term memory for the actual calculation. I was able to do the first calculation in the article because, as it says, it’s just summing exponents on powers of 10. It should be second nature that M=6, G=9, T=12, or that “nano” means -9. Powers of 2 are incredibly helpful too: I can estimate that the square root of 4000 is around 60 because 4000 is close to 2 to the 12th, whose square root is 2 to the 6th, or 64, for instance. As a coder it’s well worth your time to memorize powers of 2 up to at least 2 to the 16th.
 Further, it's good to think of your powers of 2 like this: 10, kilo; 20, mega; 30, giga. And for 0-9 I have those memorized. So if I'm given the number 2^39, I unwrap that as 512 giga, or just over 500 billion (it seems it's ~550, but estimating with base 2 is still going to be closer than estimating with base 10).
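The "every ten in the exponent is one more prefix" trick, as a Python sketch (the function name is hypothetical; it covers exponents 0 to 69):

```python
PREFIXES = ["", "kilo", "mega", "giga", "tera", "peta", "exa"]


def power_of_two_in_words(n):
    """Rough reading of 2**n for 0 <= n < 70, using 2**10 ~ 1000."""
    tens, remainder = divmod(n, 10)
    # 2**remainder is one of the memorized 1..512; each ten adds a prefix
    return f"{2 ** remainder} {PREFIXES[tens]}".strip()


print(power_of_two_in_words(39))  # 512 giga
print(f"{2 ** 39:,}")             # 549,755,813,888
```

The reading undershoots by 2.4% per prefix because 2^10 is 1,024 rather than 1,000, which is why 512 giga is really about 550 billion.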
 Don't underestimate the effectiveness of practice!
 The quickest way to get good at this is practice. Just ask engineer friends random questions from time to time, and when there is no consensus, calculate the results properly. E.g.: How much power could a solar-powered fly collect? How many tons of batteries would I need to drive a train across the USA? How long can I power my laptop from a solar USB battery bank in the sun each day on a hiking trip? What's the load time of this webpage I just wrote on a satellite internet connection? How long would it take to boot up my OS if I could keep RAM intact during a reboot? How many kilos of coal does it take to cook a pizza?
 When tipping, I realized it's harder for me to multiply the bill (say $58) by 15% than by 30% / 2. In the latter case I simply do 60 * .3 / 2, which is $9. I thought about this after realizing, for example, that 18 * 5 is really 18 * 10 / 2. Edit: just realized this is more about mental math than about back-of-the-envelope estimations.
 I usually just do something with 10s. I usually tip 20%, so I knock off a zero, then double it. If I tipped 15%, I would probably knock off a zero, then add half of that to it.
 Order-of-magnitude is extremely important. Frequently I find myself asking "So how much money is it going to save us? $100/mo, $1k/mo, or $10k/mo?" It's basically a nicer (and more concrete) way to ask "Are we sure that this project is worth the money the company's paying us to do it?" (Unfortunately I don't always get a satisfying answer, but that's life...)
 Simon Eskildsen has a monthly newsletter called Napkin Math[0] entirely dedicated to practicing practical real-world Fermi problems like these.
 A post with some examples: http://highscalability.com/blog/2011/1/26/google-pro-tip-use...
 Another neat hack for estimating lower limits on latency is that light travels roughly 1 foot in a nanosecond (in a vacuum; in optical fiber it covers closer to two-thirds of that). So if two servers are in a room 100 feet apart, the time it takes for them to communicate cannot be lower than 100 ns.
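A Python sketch of the same lower bound using the actual speed of light; the 0.67 velocity factor for optical fiber is a typical figure and an assumption, the function name is hypothetical, and real links add switching and serialization delay on top.

```python
SPEED_OF_LIGHT_M_S = 299_792_458
METRES_PER_FOOT = 0.3048


def min_one_way_latency_ns(distance_feet, velocity_factor=1.0):
    """Propagation-only lower bound on one-way latency, in nanoseconds."""
    metres = distance_feet * METRES_PER_FOOT
    return metres / (SPEED_OF_LIGHT_M_S * velocity_factor) * 1e9


print(round(min_one_way_latency_ns(100), 1))        # vacuum
print(round(min_one_way_latency_ns(100, 0.67), 1))  # assumed fiber velocity factor
```

Light actually covers about 0.98 ft per nanosecond, so the 1 ft/ns rule gives a slightly low, and therefore still safe, lower bound.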
 A large hamburger contains approximately one kilowatt hour worth of kilocalories.
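The hamburger figure is a unit conversion. A Python sketch using the thermochemical kilocalorie (4,184 J):

```python
JOULES_PER_KWH = 1_000 * 3_600  # 1 kW sustained for one hour
JOULES_PER_KCAL = 4_184         # thermochemical kilocalorie

kcal_per_kwh = JOULES_PER_KWH / JOULES_PER_KCAL
print(round(kcal_per_kwh))  # 860
```

So one kilowatt hour is about 860 kcal, the figure the comment maps onto a large hamburger.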
 Working out loan repayments: if you borrow 100k over 20 years, then roughly you'll be paying interest on 50k (the average outstanding balance) every year. If the rate is 8%, that's 4k in interest every year. You'll be paying back roughly 5k of the loan every year, so your annual repayment will be 9k a year, or 750 a month.
 (Keep in mind that due to how compounding works, averaging interest in this way will underestimate it slightly.)
 Yes but the point is that it's just a back of an envelope estimate. I'm amazed that it's been downvoted as it's actually a technique that's used in this way all the time, or at least was anyway. I don't know about now when everyone has a smartphone.
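A Python sketch comparing the loan shortcut with the standard fixed-rate amortization formula, assuming monthly payments and monthly compounding (the function names are hypothetical):

```python
def shortcut_annual_payment(principal, annual_rate, years):
    # Interest on half the principal (the average balance) plus straight-line repayment
    return principal / 2 * annual_rate + principal / years


def amortized_monthly_payment(principal, annual_rate, years):
    # Standard annuity formula: P * r / (1 - (1 + r)^-n), with monthly rate r
    r = annual_rate / 12
    n = years * 12
    return principal * r / (1 - (1 + r) ** -n)


shortcut = shortcut_annual_payment(100_000, 0.08, 20) / 12
amortized = amortized_monthly_payment(100_000, 0.08, 20)
print(f"shortcut: {shortcut:,.0f}/month, amortized: {amortized:,.0f}/month")
```

For the 100k, 8%, 20-year example, the shortcut gives 750 a month against roughly 836 for the amortized payment, so it lands about 10% low, in line with the caveat about compounding.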
