If a "good" candidate is a 1-in-100 find, then each false negative means you have to look at another 100 candidates.
Also, if you decrease your false negative rate by more than you decrease your false positive rate, you're actually hiring MORE bad candidates (while spending more time interviewing). I.e., every time you pass on a good candidate, that gives you another chance to make a mistake and hire someone bad.
Your odds of hiring one specific "bad" candidate may be small, but if most candidates are "bad", that actually makes it more likely to hire someone bad each time you pass on someone good.
(1) loss of an entire person-year or even two, because annual reviews need to accumulate evidence before HR can fire (may be less time if you are a startup without real "HR")
(2) amount of cleanup other people have to do after that new bad hire
(3) loss in morale for good people in your team who now perceive the hiring process at the company as "broken"
(4) delays + bugs introduced in the product because actual work probably didn't get done, or was done badly, despite you filling up your headcount
(5) amount of money lost in salaries, signing bonuses, office space and benefits (typically > $200K)
(6) amount of productivity lost because of time wasted by good people in the team trying to "ramp up" your bad hire
(7) emotional stress you caused to good people by making them wonder about their job stability, and to managers who wasted months on paperwork and a lot of explaining
(8) emotional stress you caused to the bad hire you fired, who had moved across the country for you, bought a house on a mortgage and had 3 school-going children
(9) Most likely, if you are a big company, the bad hire didn't actually get fired because the hiring manager never wanted to admit the mistake. S/he was encouraged to join another team or role, or even learned political tricks to get promoted, contributing to the ongoing bozo explosion
(10) I could go on and easily justify probably a 3X-10X loss compared to the case where the bad hire was avoided
1 - Many companies quote their hiring rate relative to the total number of resumes they received, which is wrong. I'll use 10% as the hire rate over the full interviews that needed to be conducted, which is the average of the 5%-15% seen at most companies.
2 - This is bad math. Assuming random trials, it would actually be 5 person-days on average, but the intuitive approach doesn't produce entirely bad results here so we will go with that.
3 - http://guykawasaki.com/how_to_prevent_/
* 99% of applicants are bad
* 50% false negative (You overlook about one good developer for every good developer you hire)
* 1% false positive (one out of a hundred bad devs can snooker you into hiring)
In that scenario, you're twice as likely to hire a bad dev as a good one. And if you halve your false positive rate by increasing your false negative rate by 50%, you're still twice as likely to hire a bad dev, it will just take you twice as much work.
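The arithmetic behind this scenario can be checked with a small sketch. The function name `hire_odds` and the pool size of 10,000 are assumptions made for illustration; the rates are the ones listed above, and "increasing your false negative rate by 50%" is read here as going from 50% to 75%.

```python
def hire_odds(pool, bad_share, fn_rate, fp_rate):
    """Return (good_hires, bad_hires) expected from a pool of candidates."""
    bad = pool * bad_share
    good = pool - bad
    good_hires = good * (1 - fn_rate)   # good candidates who are not rejected
    bad_hires = bad * fp_rate           # bad candidates who slip through
    return good_hires, bad_hires

# Baseline from the comment: 99% bad, 50% FN, 1% FP.
base_good, base_bad = hire_odds(10_000, 0.99, 0.50, 0.01)
# Stricter process: FP halved to 0.5%, FN raised from 50% to 75% (assumed reading).
strict_good, strict_bad = hire_odds(10_000, 0.99, 0.75, 0.005)

print(f"baseline: {base_good:.1f} good, {base_bad:.1f} bad, "
      f"ratio {base_bad / base_good:.2f}")
print(f"strict:   {strict_good:.1f} good, {strict_bad:.1f} bad, "
      f"ratio {strict_bad / strict_good:.2f}")
```

Both cases come out at roughly two bad hires per good hire, while the stricter process yields half as many hires from the same pool, so it takes about twice as many candidates to fill the same number of seats.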
Even if you hire 10% of the candidates you on-site interview, that says nothing about your actual false positive or false negative rate. For all I know, he could be weeding out all the top candidates at the pre-interview stage, and then hiring the best of a mediocre group of people.
It's easy to measure your false positive rate: count the people you are forced to fire (or wish you could fire if not for corporate bureaucracy).
It's harder to measure your false negative rate. The only way you could measure your false negative rate is to pick a random sample of people who fail your interview, AND HIRE THEM ANYWAY. (However, that could be a lawsuit risk. It would be unfair to the people who are hired despite failing the interview. A small business couldn't afford to do it, only some huge corporation could do the experiment.)
Also, I doubt the ability of most businesses to identify the best performers AFTER THEY ARE HIRED and working there for a couple of years.
I feel you are truly confused about FP and FN. Whether there are 99% bad developers out there or whether you hire 10% of the candidates you interview, both of these quantities are independent of FN and FP. FN says that you are turning away X good people, and it's again independent of FP, which ultimately decides how many bad developers you would eventually end up hiring regardless of the other 3 quantities I mentioned. See here: http://en.wikipedia.org/wiki/Confusion_matrix
It's not easy to measure FN, FP, TN or TP. Even good people fail for reasons like a bad manager, and bad people may succeed despite mediocre skills. Looking at who you had to fire or who got promoted doesn't give accurate measurements at all, although they may serve as a weak proxy. The scenario I described was hypothetical, to point out that the cost of FP is far higher than the additional hiring cost due to FN.
I'm using standard terminology here. There are plenty of textbooks and articles on the confusion matrix, precision, recall, ROC etc. Not sure what definitions you are using to arrive at the conclusion that FN increases the number of good hires (it only increases effort).
FP = probability, given a bad candidate, you will hire him
FN = probability, given a good candidate, you will pass
Suppose 100 bad candidates, 10 good candidates.
With FP = 10% and FN = 10%: you make 10 bad hires and 9 good hires.
With FP = 10% and FN = 20%: you make 10 bad hires and 8 good hires.
So increasing FN lowers your yield.
Your statistic (good hires / total hires) tells you nothing about your actual FP or FN value.
If you don't get it, I'm not wasting time on you anymore. You are very dangerous. You think you know statistics, but you don't.
Using the math from that link, if you decrease FN, then PPV increases.
First FP and FN are not probabilities. They are just unbounded numbers. This may feel pedantic but in a moment I'll show you why this is critical. Let me draw the confusion matrix first (G = Good candidates, H = Hired candidate etc):
\ |  H    NH
G |  TP   FN
B |  FP   TN
What you are referring to as probabilities are actually the False Positive Rate (FPR) and the False Negative Rate (FNR) respectively, which are defined as follows:
FPR = FP / (FP + TN) = FP/B
FNR = FN / (TP + FN) = FN/G
precision = P(G|H) = TP/H
So how do we get TP to calculate precision if we only know FPR, FNR, G and B? I did a little equation gymnastics using the above and got the following:
TP = G - G * FNR
H = TP + FP = TP + FPR * B
So now you can plug this into the above equation for precision and find that as you increase FNR while keeping FPR constant, precision goes down. So you are actually correct. Although it might look like an unnecessary exercise versus following intuition, I think the above equations can help calculate the exact drop in precision, which you can then weigh against the cost of FP vs FN to get the operating sweet spot. On my part I need to do some soul searching to figure out why this didn't occur to me before :).
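The substitution described above can be written out as a small function. This is only a sketch: the function name `precision` and the sample inputs (100 bad and 10 good candidates, with FPR held at 10%) are illustrative choices, not part of the original derivation.

```python
def precision(good, bad, fpr, fnr):
    """P(good | hired), using TP = G - G*FNR and H = TP + FPR*B."""
    tp = good - good * fnr      # good candidates who get hired
    hired = tp + fpr * bad      # all hires: true positives plus false positives
    return tp / hired

# Hold FPR fixed at 10% and raise FNR step by step.
for fnr in (0.1, 0.2, 0.5):
    print(f"FNR={fnr:.0%}  precision={precision(10, 100, 0.10, fnr):.3f}")
```

With FPR fixed, each increase in FNR shrinks TP while FP stays the same, so the printed precision falls at every step.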
if you decrease your false negative rate by more than you decrease your false positive rate, you're actually hiring MORE bad candidates
First, false negatives (FN) and false positives (FP) are independent of each other. FP estimates how many bad developers you would end up having regardless of your FN. FN determines how many good developers you would turn away regardless of your FP. If you are confused about this, well, these numbers are part of the appropriately named "Confusion Matrix". I would highly recommend reading up on Wikipedia (http://en.wikipedia.org/wiki/Confusion_matrix) or any textbook before you jump into commenting and throw Bayesian equations around, because you are certainly not using the right terminology. Also, both of these are again independent of the actual % of bad developers out there (i.e. whether the market has 99% bad or 1% doesn't matter, FP solely determines how many bad developers you would end up with).
Next, it might actually be easier for you to think in terms of precision and recall instead of FP/FN. The interviewing process is nothing but a classification problem, and P/R is the standard way to measure its performance. Again, Wikipedia is your friend to brush up on that.
A classic situation in classifier performance is referred to as the precision-recall tradeoff. You can plot that on a curve called ROC and choose your operating point. The way you typically do that is by quantifying how much you would get hurt due to loss in precision (~ more FP) compared to increase in recall (~ less FN). You plug the costs into the equation and decide your operating point. For companies that can rapidly deal with FP, increasing recall may make sense, and the other way around. However, in most cases the many other reasons I listed should prevent you from lowering your precision too much.
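One way to make "plug the costs in and decide your operating point" concrete is sketched below. Everything numeric here is an assumption for illustration: the pool of 100 good and 900 bad candidates, the two operating points, and the cost per missed good candidate. Only the $200K cost of a bad hire echoes the figure quoted earlier. Dividing by the number of hires is also a design choice of this sketch, on the reasoning that a stricter process has to see more candidates to fill the same seats.

```python
def cost_per_hire(good, bad, fpr, fnr, cost_fp, cost_fn):
    """Expected cost of hiring mistakes, divided by the number of hires made."""
    fp = bad * fpr               # bad candidates hired
    fn = good * fnr              # good candidates turned away
    hires = (good - fn) + fp     # true positives plus false positives
    return (fp * cost_fp + fn * cost_fn) / hires

# Hypothetical operating points: (false positive rate, false negative rate).
operating_points = {"lenient": (0.10, 0.10), "strict": (0.02, 0.50)}

costs = {
    name: cost_per_hire(good=100, bad=900, fpr=fpr, fnr=fnr,
                        cost_fp=200_000, cost_fn=20_000)
    for name, (fpr, fnr) in operating_points.items()
}
best = min(costs, key=costs.get)
print(costs, "-> cheapest:", best)
```

Under these assumed costs the strict point is cheaper per hire. Raising `cost_fn` far enough (for example to 500,000) flips the result, which is the tradeoff being described.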
Why should it take 2 years to fire someone? That sounds like a corporate bureaucracy problem.
I had never thought of it this way before but this is an instance of Bayes's rule. If the false negative rate goes too high and the percentage of good programmers is small then yes, the process could actually increase the odds of a bad hire.
Try it out with some numbers.
10100 candidates, 100 are "good".
Suppose you have 2% false positives and 1% false negatives.
You hire 99 good candidates and 200 bad candidates.
Suppose now you have 0.5% false positives and 90% false negatives. (You decreased your false positive rate by 4x but increased your false negative rate by 90x. This is typical for employers who look for every little excuse to reject someone.)
You hire 10 good candidates and 50 bad candidates. Your "good hire" percentage went down, and you're churning through a lot more candidates to meet your hiring quota!
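These numbers can be reproduced with a few lines of Python. The helper name `hires` and the labels "lenient" and "picky" are assumptions for illustration; the pool and the error rates are the ones given above.

```python
def hires(good, bad, fp_rate, fn_rate):
    """Return (good_hired, bad_hired) for a pool screened at the given error rates."""
    return good * (1 - fn_rate), bad * fp_rate

good, bad = 100, 10_000  # 10100 candidates, 100 of them "good"

for label, fp_rate, fn_rate in [("lenient", 0.02, 0.01), ("picky", 0.005, 0.90)]:
    good_hired, bad_hired = hires(good, bad, fp_rate, fn_rate)
    share = good_hired / (good_hired + bad_hired)
    print(f"{label}: {good_hired:.0f} good, {bad_hired:.0f} bad, "
          f"good-hire share {share:.1%}")
```

The good-hire share falls from about a third to about a sixth, and the picky process produces roughly a fifth as many hires from the same pool.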
So, "it is better to pass on a good candidate than hire a bad candidate" is FALSE if you wind up being too picky on passing on good candidates.
Assuming you can identify losers and fire them after a year or two (with decent severance to be fair), you're actually better off hiring more leniently.
It's also even worse when you realize that the candidate pool is more like:
10200 candidates, 100 are "good", 100 are "toxic", and the toxic people excel at pretending to be "good".
Also, the rules for hiring are different for a famous employer and a no-name employer. Google and Facebook are going to have everyone competent applying. If you're a no-name startup, you'll be lucky to have 1 or 2 highly skilled candidates in your hiring pool.
When you make a false negative, you never find out that you passed on someone amazing.
When you make a false positive, it's professional embarrassment for the boss when he's forced to admit he made a mistake and fire them.
So the incentive for the boss is to minimize false positives, even at the expense of too many false negatives. The boss is looking out for his personal interests, and not what's best for the business.
What you're attempting to do works well for hypothetical drug testing or terrorists but not for hiring developers (or anyone else). With the numbers you used you're proposing that less than 1% of all candidates are "good" - nobody would reasonably set the "good" threshold to include only the top 1% of developers.
First, unless you really think we are terrible at hiring as an industry, consider who stays on the market. Even if on a given day all developers who start looking for a job have a skill level that matches the average population, the good developers will find jobs faster, leaving the 4th, 5th and 6th job applications to the developers who did not manage to get hired after applying in a couple of places at the most. So yes, your talent pool on any particular day, just due to this effect, is far worse than the average talent in the industry.
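The effect can be sketched numerically. The numbers below are assumptions chosen only for illustration: half of all job seekers are good, a good developer needs 2 applications to land a job, and a weaker one needs 10.

```python
def good_share_of_applications(good_fraction, apps_per_good, apps_per_bad):
    """Share of all applications that come from good developers."""
    good_apps = good_fraction * apps_per_good
    bad_apps = (1 - good_fraction) * apps_per_bad
    return good_apps / (good_apps + bad_apps)

# Assumed: 50% of job seekers are good; good ones send 2 applications, others 10.
share = good_share_of_applications(0.5, apps_per_good=2, apps_per_bad=10)
print(f"{share:.1%} of applications come from good developers")
```

Even with an evenly split population of job seekers, only about one application in six comes from a good developer under these assumptions, because the weaker candidates keep reapplying.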
Then there's how bad developers are fired or laid off more often than the good ones, so they are added to the pool more often. Typically companies make bigger efforts to keep good developers happy than those that they considered hiring mistakes.
And then there's the issue with the very top of the market being a lot about references and networking. In this town, no place that does not know me would give me the kind of compensation that places that do know me would. I'll interview well, but nobody will want to spend top dollar on someone based just on an interview. In contrast, if one of their most senior devs says that so and so is really top talent, then offers that would not be made normally start popping up. The one exception is 'anchor developers', people that have a huge level of visibility, and you still won't get them to send you a resume at random. You will have to go look for them, at a conference, user group or something, and convince them that you want them in the first place.
My current employer has a 5% hire rate from people interviewing off the street, and that's not because our talent is top 5%, but because you go through a lot of candidates before you find someone competent. We've actually tested this: interviewers do not know the candidates, even when they were referred by other employees. But, as if by magic, when there's a reference, the interviewee is almost always graded as a hire.
Based on my experience, most employers don't hire better than "Pick a candidate at random".
Also, if you had an employee that was super-brilliant, why would you tell someone else so they can hire them away from you?
Based on the people I've worked with over the years, I say that the actual skill distribution is:
5% toxic - These are the people who will ruin your business while deflecting blame to other people.
25% subtractors - These are the people who need more attention and help than the amount of work they get done. In the right environment, they can be useful. (Also, this is mostly independent of experience level. I know some really experienced people who were subtractors.)
60% average - These people are competent but not brilliant. These are solid performers.
9% above average - They can get 2x-5x the work done of someone average.
1% brilliant - These are the mythical 10x-100x programmers. These are the people who can write your MVP by themselves in 1-3 months and it'll be amazing.
You first have to decide if you're targeting brilliant, above average, or average. For most businesses, average is good enough.
If you incorrectly weed out the rare brilliant person, you might wind up instead with someone average, above average, or (even worse) toxic.
Actually, when my employer was interviewing, I was surprised that the candidates were so strong. There was one brilliant guy and one above-average guy (My coworkers didn't like them; they failed the technical screening, which makes me distrust technical screening even more now). They wound up hiring one of the weakest candidates, a subtractor, and having worked with him for a couple of months my analysis of him hasn't changed.
There is no reasonable definition of average that would only allow for 9% above that (or 10% including the 1% you marked as brilliant). Average is usually considered as either the 50th percentile (in which case you would have ~50% above this) or some middle range (e.g. 25th - 75th percentile).
Since you said 60% are average we'll consider an appropriate range as average, the 20th - 80th percentile. That leaves you with 20% of applicants below average and 20% above. Your math falls apart real quick when we're dealing with distributions like 20%/60%/20% instead of 99.5%/0.5%.
[As an aside, the toxics and brilliants are outliers, they should be fairly obvious to a competent interviewer (and as someone who previously spent a decade in an industry where nobody conducts interviews without adequate training I'll be the first to say most interviewers in our industry are not competent)].
So "average" is not really a meaningful term. I mean "average programmer" as "can be trusted with routine tasks".
Behind every successful startup, there was one 10x or 100x outlier who did the hard work, even if he was not the person who got public credit for the startup's success.
If you're at a large corporation and trying to minimize risk, hiring a large number of average people is the most stable path. You'll get something that sort of mostly works. If you're at a startup and trying to succeed, you need that 10x or 100x person.
It has 20 elements and an average of 10. 5% are toxic. 25% are below average. 60% are average. 10% are above average.
I didn't say it was impossible to construct a set that would yield only 10% as above average, I said there "is no reasonable definition of average" - if you feel the above set accurately represents the distribution of the caliber of developers then we clearly have very different opinions of what's "reasonable."
Or one could replace the mathematical term average with the word ordinary. Is it possible for 60% of developers to be ordinary?
That would depend on what set of developers we're looking at:
All developers - this will be very bottom-heavy, people [usually] get better with experience and there are obviously far fewer people who have been doing this for 20 years than have been doing it for two. Additionally people who are bad at a profession are more likely to change careers than those that are good (this is by no means an absolute, I wouldn't even go as far as to say most bad engineers change professions, I'm just saying they're more likely to - further contributing to higher caliber corresponding well to years of experience).
Developers with similar experience - this is much more useful as there's not much point comparing someone who's been doing something for decades with someone on their first job. I would expect this to be a fairly normal distribution.
Developers interviewing for a particular position - applicants will largely self-select (and the initial screening process would further refine that) so this group will largely have similar experience (i.e. you're typically not interviewing someone with no experience and someone with 25 years for the same job). But it won't match the previous distribution because, as someone else commented, the bad ones are looking for work more often (and for a longer period of time). Do the interviewees you wouldn't hire outnumber the ones you would? Yes, definitely. Do they outnumber them by a factor of a hundred to one? Definitely not. Ten to one? Probably not - if they do it probably represents a flawed screening process causing you to interview people you shouldn't (or not interview the people you should) rather than an indication that only one out of every ten developers is worth hiring.
If you substitute with these terms:
- 5% toxic
- 25% subtractors
- 60% competent
- 9% exceptional
- 1% brilliant
...then there's no reason to apply (or defend!) the mathematical definition of "average." And I think those numbers actually seem somewhat reasonable, based on my own exposure to working developers in various industries. What this doesn't count is the "FizzBuzz effect," where ~95% of the people who are interviewing at any one time (in a tight market) tend to be from the bottom end of the spectrum.
Even within the broader pool of programmers, the line between subtractors and competent is very project-dependent, in my opinion. For some levels of project complexity, the line might actually invert to 60% subtractors and 25% competent, while for far less complex projects, it might be 5% subtractors to 80% competent.
In the former case I'd want an exceptional developer, while in the latter the exceptional developer probably wouldn't even apply, or would quit out of boredom.
For example, if you assume that 90% of your employees are "good" and 1% are "toxic", what does that tell you about the candidate pool and/or your interview process?
If I was the boss and had a "toxic" employee, I'd just dump them rather than waiting. I've been forced to work with toxic people because I'm not the boss, and I've noticed that toxic people are really good at pretending to be brilliant.
Over the years, I've also worked with a couple of people who singlehandedly wrote all of the employer's key software. I also worked with several people who wrote a garbage system but conned everyone else into thinking it was brilliant.
If 90% of candidates are "good", then why waste time with a detailed technical screening at all? Just interview a couple and pick the ones you like the best.