

Games People Play - Bayesian filters in recruiting - bdfh42
http://weblog.raganwald.com/2008/03/games-people-play.html

======
dpapathanasiou
Somebody has actually implemented this (<http://londonmiddleware.org/chaff/>),
though it's not clear how seriously the results are used in a real candidate
evaluation.

~~~
raganwald
Thanks for the link, I will add it to the post. +1!

~~~
apathy
You end your article with 'Bayesian filters will not outperform a human in
hiring'. But that's not the point.

A classifier, and especially a supervised classifier, is really just a tool
for intelligence amplification. Make a dumb man perform smarter, and a smart
man perform better than almost anyone without the advantage. Similar to
providing physicians with a checklist for common procedures which has been
developed by iterative analysis of outcomes. It's very hard to become a
physician if you're dumb or lazy, but it's quite easy to get fatigued on a
36-hour residency shift, or get complacent if some 'trivial' procedure
interrupts your microsurgery specialty. And the stakes are far higher in most
surgical interventions.

Naturally, many doctors resist. But the best figure out how to use this to
their advantage. It increases the efficiency of the system. Likewise, having
the 'advice' of a machine that has been trained on a corpus of good multiple-
human-actor decisions, over time, can provide individuals with better judgment
than their experience alone. This is more apparent with a larger corpus and
finer-grained classification -- e.g., multidimensional classification with a
huge corpus and eigenclasses of suitability. Game that, and you're smart
enough to be in management, most likely ;-)

So, I don't believe you should let your detractors off so easily. Maybe a
talented human will outperform a filter with a small corpus. But I'd bet
dollars to donuts that, for someone who isn't a full-time interviewer, the
assistance of a well-trained filter will increase their acuity and throughput,
allowing them to get on with their real jobs and worry less about dumb hires.

You can't really avoid the enthusiasm of junior employees who haven't been
burned, and this is another scenario where a filter can help them gauge their
judgment by providing a historical perspective. "You know the last guy we
hired who interviewed like this, one of your coworkers spent 2 hours a day for
3 months training him, and then we fired him!" That's something you want to
avoid, and I have seen this happen at places like Google where you might think
they'd be immune. But once you let the dumb or negative folks in, it's all
downhill from there.

So -- replace humans? No. Augment them? Yes. It's what computers (and
statistical analyses) are meant for!
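
To make the "advice of a machine" idea concrete, here is a minimal sketch of a supervised filter of the kind discussed above: a two-class naive Bayes scorer that returns a probability for a human to weigh, not a decision. Everything in it is an assumption for illustration (the class name `NaiveBayesScreen`, the labels "good" and "bad", the tokenizer, and the toy training text); it is not the method used by the linked project or by any real recruiter.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


class NaiveBayesScreen:
    """Hypothetical two-class multinomial naive Bayes with add-one smoothing."""

    LABELS = ("good", "bad")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Record one past resume together with the outcome humans assigned."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def prob_good(self, text):
        """Return P(good | text) under the naive independence assumption."""
        vocab = set(self.word_counts["good"]) | set(self.word_counts["bad"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.LABELS:
            # Smoothed class prior, so an untrained filter answers 0.5.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in vocab:
                    continue  # words never seen in training carry no evidence
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
            scores[label] = score
        # Normalize in a numerically stable way (subtract the max log score).
        top = max(scores.values())
        good = math.exp(scores["good"] - top)
        bad = math.exp(scores["bad"] - top)
        return good / (good + bad)


# Toy corpus, invented purely for this example.
screen = NaiveBayesScreen()
screen.train("wrote a compiler and contributed to open source projects", "good")
screen.train("built a pac-man clone and shipped open source tools", "good")
screen.train("synergy driven thought leader with proven results", "bad")
screen.train("results oriented team player with proven synergy", "bad")

advice_a = screen.prob_good("open source compiler")
advice_b = screen.prob_good("proven synergy")
```

With a corpus this small the numbers mean very little; the point is only that the output is a score a reviewer can compare against their own impression.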

------
mynameishere
An employee will cost anywhere from 20K/year to 500K+/year. I think it is
worth spending 2 minutes personally reading each resume.

~~~
raganwald
Ah, my old friend whose name is here. Your comment is absolutely true.

If I may ask, are you mentioning it because you think the post is suggesting
otherwise? Or are you just mentioning it??

~~~
mynameishere
No, you're not suggesting otherwise. But using "Bayesian filtering" (or
whatever variation of it) is best on huge data sets. Working manually, I could
tell you with near 100 percent reliability which email is spam--better than
any filter. It's inefficient for a human to do it, so a process that can
remove 95 percent instead of 100 is acceptable. Inefficiency matters less as
the data becomes smaller and more important.

Real life example: My current manager has some twisted filter on his brain,
whereat he is convinced that a mastery of certain things (like design
patterns, or "OO architecture") is extremely important. We were interviewing
a while back, and some kid said his 'proudest achievement' was a Pac-Man clone
he made. Well, my manager's filter did not include the words "Pac-Man clone"
and so we never even looked at it.

Every good candidate in a creative field is going to go outside the bounds of
_any_ filter you can come up with, training or otherwise. The better they are,
the more likely this is true. A tool that is suitable for flagging "V!agr3" is
not necessarily the tool for...identifying good pharmaceutical researchers.

~~~
pchristensen
That was the main concern I had about the Bayesian resume filter - would it
work at small (or even mid-sized) companies? Sure, with Google getting 10K's
of resumes a month, they could mine some monster data out of it, but if you
hire a couple people a year and get 100 resumes, do you have enough data?

I guess Reg's point was that even if it isn't perfect, it gives you _some_
data, which is a heck of a lot better than _no_ data.
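
One rough way to put numbers on the "do you have enough data?" question is to look at how uncertain a single estimated rate is at different sample sizes. The sketch below uses the Wilson score interval for a binomial proportion; the counts (a keyword appearing in 3 of 5 past good hires versus 3000 of 5000) are invented for illustration and are not taken from any real hiring data.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if trials == 0:
        return (0.0, 1.0)  # no data at all: the rate could be anything
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return (center - half, center + half)


# Same observed rate (60%), very different amounts of evidence.
small = wilson_interval(3, 5)        # a company with a handful of hires
large = wilson_interval(3000, 5000)  # a company with thousands of hires

small_width = small[1] - small[0]
large_width = large[1] - large[0]
```

The small-company interval spans most of the range between 0 and 1, while the large-company interval is only a few percentage points wide, which is one way to see why a filter trained on a couple of hires a year gives much weaker advice.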

~~~
cstejerean
Some data is not always better than no data. Sometimes data can provide a bias
that you don't want. So it has to be the right data that you have. For example,
knowing that someone attended a prestigious university is likely to create a
bias towards hiring that person and ignoring clues that indicate otherwise.

If your classification filter recommends somebody it's possible to let that
piece of information provide the same kind of bias. So if the data you have is
likely to be unreliable you might want to just ignore it.

~~~
raganwald
"So if the data you have is likely to be unreliable you might want to just
ignore it."

That prompts me to ask two questions:

1\. So should you forget about data, or pay attention to collecting good data?
Which course of action is more important?

2\. If you don't make decisions based on data... Just how are you making
decisions?

I am serious about question #2. We aren't talking about face to face
interviews here, we're talking about looking at 200 resumes and deciding which
ten people to call with the expectation of bringing 3-5 of the ten in for
interviews.

In my experience, when people tell me they are using their "experience" and
"judgment," they are actually using a highly biased process, such as selecting
people who went to their University or preferring people who share the same
hobbies.

~~~
cstejerean
1\. Try to collect better data if possible. If that's not possible, ignore the
source of that data (and likely find another source that can provide more
reliable data). So I might ignore the automatic resume filter and perhaps
judge the fitness of a candidate based on how enthusiastic they seem about
work they've done in the past (just an example).

2\. You always need to make decisions based on data. But sometimes you need to
allow your brain to interpret the data for you, even for resumes. If you're
getting more resumes than you can handle reading by hand, you can implement a
spam filter. For example, require candidates to solve a sample problem and
submit the answer along with their resume.

And as for candidates gaming this by providing stock answers to the coding
problems, try this: extract actual problems (bugs or new features) from your
real application, write automated tests for them and then put the problems
online for interested candidates to solve. After a while, identify a new
problem from your application and put that online. Not only are you finding
quality applicants, you're solving real problems at the same time.
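
As a rough sketch of what "write automated tests for them" could look like, the snippet below grades a submitted function against a list of test cases. The problem itself (`parse_duration`-style input such as "1h30m"), the case list, and both sample submissions are hypothetical stand-ins; a real setup would use a problem pulled from your own application and would need sandboxing before running code from strangers.

```python
import re


def grade(candidate, cases):
    """Run a submitted function on (args, expected) pairs; return the failures."""
    failures = []
    for args, expected in cases:
        try:
            actual = candidate(*args)
        except Exception as exc:  # a crash counts as a failed case
            failures.append((args, expected, repr(exc)))
            continue
        if actual != expected:
            failures.append((args, expected, actual))
    return failures


# Hypothetical problem: convert a duration string to a number of seconds.
DURATION_CASES = [
    (("90s",), 90),
    (("2m",), 120),
    (("1h30m",), 5400),
    (("",), 0),
]


def reference_solution(text):
    """One possible correct answer, kept private as the baseline."""
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)([hms])", text))


def stock_answer(text):
    """A copied answer that only strips the last character and ignores units."""
    return int(text[:-1])


reference_failures = grade(reference_solution, DURATION_CASES)
stock_failures = grade(stock_answer, DURATION_CASES)
```

Swapping in a fresh problem then means replacing the case list and the reference solution, while the grading function stays the same.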

These are just some thoughts that went through my head by the way, I can't
speak for how effective these methods are (or would be).

