

What's wrong with OKCupid's matching algorithm. - crasshopper
http://blog.hiremebecauseimsmart.com/post/13178732106/okcupid-whats-wrong-match-algorithm

======
fjh
Minor point, but if you are using factual questions to filter out stupid
people, asking for the "right" number of continents is probably not ideal. You
(and I) might disagree with someone counting Eurasia (or North and South
America) as a single continent, but it is not really an indicator of stupidity
in the way that believing in astrology or thinking the earth is larger than
the sun are.

According to the wikipedia article on continents, different models are taught
in different countries, so you probably end up selecting for a different
attribute than you thought.

~~~
ugh
People I don’t want to date: Anyone who gets hung up on arbitrary definitions
and is ignorant of other cultures.

That question seems to be a pretty good filter, just not for the reasons he
thinks it is.

(As a child – not in school but before – I learned that there are five
continents: Africa, Australia, America, Asia, Europe. I don’t think I was ever
told in school how many continents there are. All we were told were names of
certain landmasses to make sure we are all talking about the same thing when
saying America, Antarctica, South America or Eurasia. Why bother counting
them? That doesn’t even make sense.)

~~~
LaGrange
See, but if you made that question mandatory, you'd weed out me, because I was
taught in a different school in (probably) a different country, and my set of
default continents is different.

OTOH, I think I marked this question as irrelevant.

~~~
thret
I wonder if the question exists just to filter out people who _don't_ mark it
as irrelevant.

------
goodside
Hi. I work at OkCupid.

I can't address all the particulars raised in this analysis, but the biggest
is essentially correct:

> _The worst side effect of the current scoring system, is that a spammer
> could easily answer only the questions with obvious answers (basic facts and
> display of non-bigotry) and get a decently high match percentage with a lot
> of people. At which point, the spammer uploads a picture of an attractive
> guy/girl, writes some generic profile text, and scams away._

The algorithm as described in the FAQ does suffer from this problem. However,
we have enhancements that address the issue very effectively. The FAQ is
slightly out of date, and shouldn't be taken as a complete, exhaustive
description of how we make matches.
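
To make the quoted exploit concrete, here is a minimal Python sketch of a
weighted match score of the kind the FAQ is said to describe. Everything in it
is an assumption for illustration: the weight table (only the 250 for
"mandatory" is mentioned elsewhere in this thread), the geometric-mean
combination, and the names `satisfaction` and `match_percent`. It is not
OkCupid's actual code, and it leaves out whatever enhancements the site applies.

```python
from math import sqrt

# Assumed importance weights; only "mandatory" = 250 appears in the thread.
WEIGHTS = {"irrelevant": 0, "a_little": 1, "somewhat": 10,
           "very": 50, "mandatory": 250}


def satisfaction(asker, answerer):
    """Share of the asker's weighted points that the answerer earns.

    Each profile maps question id -> (own answer, acceptable answers,
    importance). Only questions both people answered are counted.
    """
    earned = possible = 0
    for qid, (_, acceptable, importance) in asker.items():
        if qid not in answerer:
            continue
        weight = WEIGHTS[importance]
        possible += weight
        if answerer[qid][0] in acceptable:
            earned += weight
    return earned / possible if possible else 0.0


def match_percent(a, b):
    """Geometric mean of the two one-way satisfaction scores, as a percent."""
    return 100 * sqrt(satisfaction(a, b) * satisfaction(b, a))


honest = {
    "sun_bigger": ("yes", {"yes"}, "mandatory"),
    "kids": ("no", {"no"}, "very"),
    "smoke": ("no", {"no"}, "somewhat"),
}
# Someone who answers everything and disagrees on one "very important" item.
picky = {
    "sun_bigger": ("yes", {"yes"}, "mandatory"),
    "kids": ("yes", {"yes"}, "very"),
    "smoke": ("no", {"no"}, "somewhat"),
}
# The spammer answers only the question with an obvious answer.
spammer = {"sun_bigger": ("yes", {"yes"}, "a_little")}

spam_score = match_percent(honest, spammer)
picky_score = match_percent(honest, picky)
```

Under these assumptions the spammer scores higher against `honest` than
`picky` does, because the only shared question is one nearly everybody agrees
on and it carries the largest weight.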

~~~
exit
why not let people search by specific conjunctions of questions they've
answered publicly?

~~~
angelbob
Amusingly, you could. Create a new account, answer just those questions as
"mandatory", and see who your matches are. That's a bit effort-intensive,
obviously.

~~~
glimcat
Effort intensive as a one-off, maybe.

If you're one of the aforementioned spammers, I assume you just set it up once
in pywinauto and then go nuts.

------
ianterrell
> _The worst side effect of the current scoring system, is that a spammer
> could easily answer only the questions with obvious answers (basic facts and
> display of non-bigotry) and get a decently high match percentage with a lot
> of people._

That is exactly what I did which led to my meeting my (now long-term)
girlfriend. I was receiving about 5 profile views/week with 500 questions
answered. I scrapped them all, answered 20 or 30 questions with non-offensive
answers, and skyrocketed to 60-100 profile views/week.

~~~
leak
I don't know what your view/message ratio is but I can't imagine any guy has
any decent ratio. The thing I have noticed with online dating is this: girls
get a gang of messages + views and guys get some views. This is true with all
the guys I know who are on okcupid. The 60-100 views/week for you is what
girls I have dated on there say they get in messages, which I'm sure you know
is a much harder thing to accomplish.

The art of online dating, IMO, is not in the questions answered but the
messages sent and secondly how non-offensive your profile description is.
Include keywords "Hiking, Laughing, Friends/Family, Good Beer" and you're in
like gin.

My 2cents :)

~~~
angelbob
Depends what you're looking for. But sure, adapted to your, um, target market.

------
ansy
OKCupid would probably get better matches if it dropped the user submitted
weights and ideal match preferences completely and instead used its database
of people in relationships as training data for a proper machine learning
algorithm.

The current approach is entirely oriented to give people what they think is
important and what people think they want. It would probably be better to
derive that from existing relationships (successes).

I would be a little surprised if people at OKCupid hadn't already thought
about this. Whether there is actually any momentum to change the core matching
mechanic or not remains to be seen.

~~~
FaceKicker
I agree, I wouldn't be surprised if the weights people manually assign to the
questions have very little correlation with the success of their
relationships. People probably suck at knowing what they want.

That's not to say they're meaningless though; e.g., if someone puts
"mandatory" for all his questions, that definitely says something about his
personality, and should be used as a feature in the ML algorithm used for
matching.

~~~
TheCowboy
People may suck at knowing what they want, but I think it's more that people
do not necessarily know what criteria they should want to select given their
goals for finding someone.

Another problem is that a lot of users are probably more casual about the
weights, such as selecting mandatory. (It's common for people to select an
answer and then mark that same answer as unacceptable in a match, for
questions in which it makes no sense for them to do that.)

------
passionfruit
As a person with very strong religious views I've found the matching algorithm
to be great at filtering out those with incompatible views. I think matching
based on ethical, political, and religious views is the matching system's
strength. The true weakness of the algorithm is that it matches poorly for
personality. Two people can have very similar beliefs but be a terrible match
in terms of personality. I would prefer an entirely separate score for that
aspect.

~~~
eftpotrm
Heh. Exact opposite here - I'm religious and just posted here about how bad I
find it at filtering people by religion (or politics, for that matter) for me.
Its ability to match _for me_ based on personality seems rather higher.

I'd suspected for a while TBH that there were odd effects from the balance of
what questions were answered; some topics have more data coverage than others
in the question pool and that (for me) seems to put a noticeable damper on its
precision in other areas.

~~~
crasshopper
> some topics have more .. coverage than others

I've noticed that too. There seem to be a lot of repetitious questions all
aiming at or around faith vs atheism. "Do you believe in fate", "Do you
believe in miracles", etc are talking about the same kind of thing.

------
daimyoyo
OKC has the worst matching algorithm on the Internet. I signed up for an
account, spent a large amount of time getting my profile "100% complete",
answered over 1,000 questions and was still consistently matched with people I
had absolutely no interest in dating at all. I can't count how many times I
saw 0% compatibility scores, people hundreds or even thousands of miles from
me (I'd specifically said no more than 30 from my zip code) and worst of all,
several men (I'm not gay and didn't express any desire at all to meet men on
the site). I deleted my profile after 6 months and when they asked me why I
was leaving I told them their matching process would be better using a
random number generator. It's too bad because OKC is one of the few free dating
sites I know of that actually has a decent number of women using it.

~~~
nandemo
That doesn't sound like a bad matching algorithm, it's more like you found a
bug on the search form: distance and sex are presumably filters in the query,
not input to the matching algorithm. And it sounds like a rare bug too, since
if a lot of people were getting results of the wrong sex that would quickly
become an issue.
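
The distinction between a query filter and a matching input can be sketched in
a few lines of Python. This is only an illustration under assumed field names
(`sex`, `miles`, `match`) and a hypothetical `search` function; nothing here
reflects how OkCupid's search is actually built.

```python
def search(candidates, wanted_sex, max_miles):
    """Apply the hard filters first, then order what is left by match score."""
    eligible = [c for c in candidates
                if c["sex"] == wanted_sex and c["miles"] <= max_miles]
    return sorted(eligible, key=lambda c: c["match"], reverse=True)


people = [
    {"name": "a", "sex": "F", "miles": 12, "match": 71},
    {"name": "b", "sex": "M", "miles": 5, "match": 95},
    {"name": "c", "sex": "F", "miles": 900, "match": 99},
    {"name": "d", "sex": "F", "miles": 25, "match": 88},
]
results = [p["name"] for p in search(people, "F", 30)]
```

In this arrangement a high match score can never pull an excluded profile back
into the results, so seeing the wrong sex or distance would point at the filter
step and not at the scoring.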

~~~
eftpotrm
I may be being harsh, but the comment sounds like a troll to me; he's
describing horrendous, show-stopper bugs like failure to respect basic search
filtering criteria which the apparently large number of HN readers with
accounts on OKC have failed to report. I don't find the post in any way
credible.

------
mtgentry
I'd like to see an analysis between these two sets of data:

A) Dudes that have an opinion about the OKC algorithm

vs.

B) The amount of times they get laid each week

Jokes aside, I'm fascinated with this topic. For most of history, the
likelihood of finding a mate was left to chance. Then OKC comes along and says
it can leap past obstacles such as chance and geography to help you find your
soul mate. That's an incredibly powerful idea.

------
tryitnow
I would have thought that OKC would use some dynamic weighting scheme where
the weights are not constant, but depend on how commonly people answer that
question. For example, questions that just about everyone lists as mandatory
wouldn't be weighted with 250, but with some number proportionately reduced to
reflect the banality of the question.

Hmmm, I'm a bit disappointed or maybe I'm just missing something.
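
One way to read the dynamic weighting idea is to shrink a question's weight by
the share of users who mark it mandatory. The Python sketch below does exactly
that; the linear scaling rule and the name `dynamic_weight` are assumptions made
for illustration, not anything the FAQ or OkCupid describes.

```python
def dynamic_weight(base_weight, mandatory_fraction):
    """Scale a weight down by the share of users who mark the question mandatory.

    mandatory_fraction is a number between 0 and 1. A question that everyone
    marks mandatory ends up with weight 0, since it separates nobody.
    """
    if not 0.0 <= mandatory_fraction <= 1.0:
        raise ValueError("mandatory_fraction must be between 0 and 1")
    return base_weight * (1.0 - mandatory_fraction)


banal = dynamic_weight(250, 0.875)  # most users call it mandatory
rare = dynamic_weight(250, 0.0)     # nobody else calls it mandatory
```

Other scaling rules (a logarithmic one, for example) would serve the same
purpose; the linear version is just the simplest to read.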

~~~
awesomesauce311
Yes that's how it works, the FAQ is out of date.

------
apotheon
I guess I should have read the FAAAQ. I had no idea "mandatory" was being
applied to the matching algorithm in such a naive manner. That sucks.

Of course, it doesn't really matter for me, anyway. I don't use OkCupid as a
dating site. I use it as a way to find things like local libertarians,
programmers, and other essentially platonic things. In fact, for a few years,
I haven't even used it for that -- but that's how I did use it, so dating
criteria are kinda irrelevant, which means answering questions is kinda
irrelevant too. (I've answered quite a few just for shits and giggles,
though.)

~~~
tryitnow
That's interesting, I have tried using OKCupid for dating, but it seems like
all of my matches are women I might be interested in platonically, but not for
dating.

It could be that OKC has developed a very good system for meeting new friends,
but not necessarily for dating.

~~~
aidenn0
Out of curiosity, what qualities are desirable in a female friend that aren't
in one you would date (aside from the obvious like "she's married" or "she's
only interested in dating the wrong gender")?

~~~
daxelrod
I'm guessing that it's the other way around; there are qualities that are
desirable in a person GP would date that are not found in people who would
still be great friends.

~~~
klipt
E.g., being physically attracted to them.

------
ajays
They seem to have tweaked their algorithm after the Match acquisition. My ex,
with whom I had about 65% match (she signed up after we broke up), suddenly
one day became a 80% match. She claimed she hadn't answered any more
questions, and neither had I.

I like OKC, but they don't do even basic filtering of profiles. If they just
verified, say, a mobile phone, it would get rid of the vast majority of fake
accounts.

------
eftpotrm
Interesting maths analysis. I'd certainly noticed that their ability to
reliably rank inside a window of, say, 80% to 95% was rather low, along with
the ability to filter out deal-breakers. Frankly that remained a manual
process; no matter how much data people had supplied, I never could do that
reliably from the numbers alone. As a liberal Christian, I end up doing quite a
bit of manual filtering out of people with very great differences in
either religion or politics which the algorithms simply didn't pick up.

So anyway, some tweaking done. Let's see if it has any visible effect...

------
lell
How about this: instead of showing you match %'s, they show you profiles and
you rate them. Then use recommender systems [1,2] just like netflix. There are
two immediate problems with this: you have the cold start problem [3] and also
if you don't show random profiles, then you introduce bias into their
rankings. The first problem could be solved with side information (you still
answer questions) and the second with a nicely chosen exploration/exploitation
trade off. PMF[4] for the win!

[1] <http://spectrum.ieee.org/computing/software/the-million-dollar-programming-prize>

[2] <http://en.wikipedia.org/wiki/Collaborative_filtering>

[3] <http://dl.acm.org/citation.cfm?id=1352837>

[4] <http://en.scientificcommons.org/42513739>
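
The exploration/exploitation trade-off mentioned above can be illustrated with
an epsilon-greedy rule, which is a much simpler stand-in than PMF and is used
here only as a sketch. The function name `pick_profile`, the `predicted`
ratings, and the choice of epsilon-greedy itself are all assumptions for
illustration.

```python
import random


def pick_profile(predicted, epsilon, rng):
    """Choose the next profile to show.

    predicted maps profile id -> predicted rating. With probability epsilon a
    random profile is shown (exploration); otherwise the profile with the
    highest predicted rating is shown (exploitation).
    """
    if not predicted:
        raise ValueError("no candidate profiles")
    if rng.random() < epsilon:
        return rng.choice(sorted(predicted))
    return max(predicted, key=predicted.get)


predicted = {"p1": 3.2, "p2": 4.7, "p3": 1.9}
best = pick_profile(predicted, epsilon=0.0, rng=random.Random(0))
```

With `epsilon=0.0` the rule always returns the top predicted profile; raising
epsilon trades some of that for ratings on profiles the model would otherwise
never show.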

~~~
tincholio
OKC does have this functionality, it's called quickmatch. I don't know how
they use the ratings you give there, but I doubt they just let it go to waste.

~~~
eftpotrm
Quickmatch has other problems though.

* If you're going to do it properly and give them meaningful data, a 'quick' skim is the exact opposite of what you want to do. Yet it's what the function is set up for; you can't even see a person's questions and answers, which have frequently (to me) proved far more revealing than the profile.

* Again, if you're doing it properly, rating someone highly sends them a mail saying 'someone's interested in you'. Which always looks to me like saying 'Someone's interested in you but doesn't have the guts / energy / enthusiasm to write you a proper message so has gone for the easy way'. Not a great intro.

With a bit of tweaking I agree it could give some useful seed data, but not
much more; fundamentally I think the question-based approach is very good, but
I think their scoring algorithm could benefit from refinement.

------
dschoon
This critique reminds me very strongly of Evan Miller's "How Not To Sort By
Average Rating": <http://www.evanmiller.org/how-not-to-sort-by-average-rating.html>

In both cases, the key observation is to realize that the _meaning_ of the
numbers is more important than the numbers (or how they're calculated, per
se). This point is kind of abstract--especially compared to the salient
examples in both pieces--but if you're looking to understand _why_ this error
is so common, look first to the fact that it's not an error of stupidity or
insufficient maths.

You can sometimes avoid a big practical problem (e.g., an avenue to attack
your system via diluting meaning ("spam")) by abstractly considering the
meaning behind the structures available to you.
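
For readers who have not seen the linked article: its recommendation is to
rank by the lower bound of the Wilson score confidence interval instead of the
raw average. A rough Python sketch follows. The function name and the idea of
applying it to a share of acceptable answers are illustrative assumptions, not
something OkCupid is known to do.

```python
from math import sqrt


def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval for a proportion.

    positive: number of favourable outcomes (e.g. acceptable answers)
    total:    number of observations (e.g. shared questions)
    z:        normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if total == 0:
        return 0.0
    p = positive / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * sqrt((p * (1 - p) + z2 / (4 * total)) / total)
    return (centre - margin) / (1 + z2 / total)


one_of_one = wilson_lower_bound(1, 1)          # perfect, but on one question
ninety_of_100 = wilson_lower_bound(90, 100)    # imperfect, on many questions
```

The point of the bound is that one agreement out of one question ranks well
below ninety out of a hundred, even though its raw average is higher.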

------
Avshalom
I suspect that "doug" in the article's comments has it right. Users are happier
to see dozens/hundreds of "matches" that they know aren't all that good
despite saying 90% than if they saw 3 matches that were more correct followed
by a huge drop down to 50-60%.

~~~
crasshopper
I think it depends on your locale.

Spearfish, SD versus Brooklyn, NY for instance -- the Brooklynites could
benefit more from algorithmic improvements.

------
CJM13
It is crazy to think that a matching algorithm based on answers to seemingly
pointless questions can predict the success of a future relationship. There is
no substitute for interpersonal connection in real life. Those looking for
love and meaningful relationships should get off of their computers and meet
people in person. This is the problem with the majority of dating sites out
there: too much focus on the online experience of users rather than the offline
interaction. The internet is a great way to network and meet new people but
nothing can replace spending time with a person in real life to determine
whether or not you will get along with them.

~~~
crasshopper
CJM, it's just a sort & a gauge to guide search. Not a formula for
relationship success.

------
itmag
I don't think matching algorithms will EVER work, short of human-level AI. The
only matching algorithm that works is the human brain.

So, I think that the right way to go is to just increase the speed and
accuracy with which a human can look for matches. Some ideas here:
<http://ideashower.posterous.com/idea-dating-site-slideshow-audio-voiceovers>

------
philwelch
The other problem is that many of the questions are redundant, ambiguous, or
otherwise poorly constructed. I end up skipping a lot of them just because of
this.

------
orenmazor
I've been a user of okcupid, on and off, for several years.

at this point, I don't really even look at scores anymore. I can tell based on
profile text alone whether I'm likely to get along with a person.

but this is age and experience, not math.

I'd rather okcupid had a feature that tells me what things are going to cause
fights between a person and myself after we date for a while :)

~~~
pbhjpbhj
> _I can tell based on profile text alone whether I'm likely to get along with
> a person._

> _but this is age and experience, not math._

Arguably your brain is running an algorithm that can be expressed
mathematically, we simply don't know the exact nature of the algo.

Age is largely irrelevant, it's mainly a first pass indicator for experience.
Your experience is probably largely a statistics based algorithm with some
unsound choices caused by your psych make-up thrown in to keep things
interesting.

------
heyrhett
This reminds me of the recent Dinosaur Comics. It seems like everyone thinks
they can make a better dating site these days: "BaguettesAll4Me.co.uk is
complete! ... I realized my perfect woman won't say 'ew that's weird' as she
watches me eat a whole baguette" <http://www.qwantz.com/index.php?comic=2088>

------
khafra
> At least they’re not using a non-linear Bayesian splitting tree didactogram

This seems like a weakness to me. I'd love to be able to choose between
different algorithms for matching features and set some relevant coefficients.
It'd be fun to see the profiles that have a minimum hamming distance from
mine, or whatever.
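
As a sketch of what "minimum Hamming distance" could mean here, the Python
below treats each profile as a list of answers to the same questions in the
same order and counts the positions that differ. The helper names and the
sample answers are hypothetical.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length answer lists differ."""
    if len(a) != len(b):
        raise ValueError("answer lists must have the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_profiles(mine, others, k=3):
    """Names of the k profiles whose answers differ least from mine."""
    ranked = sorted(others, key=lambda name: hamming_distance(mine, others[name]))
    return ranked[:k]


mine = ["yes", "no", "yes", "no"]
others = {
    "p1": ["yes", "no", "yes", "yes"],
    "p2": ["no", "yes", "no", "yes"],
    "p3": ["yes", "no", "yes", "no"],
}
closest = nearest_profiles(mine, others, k=2)
```

This ignores importance weights entirely, which is the appeal: it is one of
several interchangeable distance functions a user could pick between.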

~~~
scarmig
You should make a dating site advertising that as its key feature.

Most everyone who would join it would be a giant nerd, even moreso than OkC,
but that'd significantly increase average compatibility!

~~~
philwelch
Unless it was a gay dating site, that would also worsen the endemic problem
with dating sites: namely that there are far, far more men than women on them.

~~~
tomjen3
Actually back when I was using OKC I didn't have a problem finding women -- it
was just that there were few dateable women.

I expect that to change though - one (of the many) differences between men and
women is that a woman's worth as a date falls in her twenties while the typical
male sees his dating value go up.

~~~
crasshopper
<http://blog.okcupid.com/index.php/the-case-for-an-older-woman/>

~~~
tomjen3
I read it back before I settled on OKC.

I am still not interested in an older woman (or a fat one, or a fundie or a
single mom). If I wanted to date my mothers friends I would give them a call.

------
vaksel
it's probably not a big deal from their perspective, since 99% of people
wouldn't realize to do that.

It's essentially good enough for them, since the mainstream audience wouldn't
realize that, and if a few geeks figure out a way to cheat the system, then
that's fine too, since they need all the help that they can get

~~~
flurie
There are other, more popular ways to game the system that have nothing to do
with the algorithm (cf.
<http://www.reddit.com/r/IAmA/comments/ekd1o/iam_my_own_okcupid_wingman_i_have_a_fake_profile/>).

------
tibbon
Hmm, this does make me wonder about some people that I'm "almost perfectly"
matched with. We potentially had too many mandatory questions matches, but a
lot of the more subtle things got lost?

------
rejectedstone
Props for using Chris Rock's "I take care of my kids."

------
winternett
Dear OK Cupid: Just because I weigh 280 lbs, it doesn't mean that I want to
date a girl that weighs 250 lbs...

~~~
parfe
Two way street. 125 pound girl might not be accepting of someone more than
double her weight.

~~~
winternett
Who said anything about 125 lbs, that's a far way away from 280... I joined but
only got recommendations for people my size, but when I go out in public my
choices are always varied. It's not just an algorithm, a feature for random
suggestion should be added.

------
listic
What's wrong with OKCupid's matching algorithm for me is that I can't get to
use it at all! I just can't log in. I've been trying for the last 2 months,
they just keep saying "Sorry, we're having technical difficulties right now.
Check back later.". I have searched news for an explanation of a major
OKCupid outage and found nothing. Is it just me?

~~~
ig1
They may have blocked your ip address.

~~~
listic
Yikes! Looks like that; I've tried logging in via a proxy and it worked. Never
thought they would filter me out this way, saying it's "technical
difficulties".

Is there a way to contact them? What I sent via the Feedback form
was not answered. I seriously thought they had gone out of business (however
improbable that might be, having been acquired just this year for $50 million).

~~~
lhnz
There is no way of contacting them. I've had this problem myself, too. (I
eventually changed IP address when I moved address.)

------
caycep
no role for stochastic processes in generating matches? so sad...

~~~
rcfox
A stochastic input (ie: people) to a linear system is going to look pretty
much the same as a stochastic input to a stochastic system.

------
hendrix
Assuming OKC's target market is the relative majority in English-speaking
countries it really doesn't take much crazy math to find matches, as in the
matching facility is not going to be as important as the random number
generator (or trained monkey) that is filling in the matches. All that really
needs to be taken into account are features of the prospective match that the
two individuals WANT to be taken into account(age/height/race/income
level/political/religious). It isn't that complicated, especially since one of
OKC's tactics is to use quizzes/'looking for friends' to break the stigma of
online dating.

What is the stigma of online dating?

Obviously the implied answer is that if you are using online dating then you
cannot find a date IRL. Or perhaps a kinder answer would be that you have too
much going on to find a date IRL. This is the same principle as a meat-market
nightclub/singles bar where the bouncer lets the guy with the two attractive
women in, but the pack of forever-alones get locked out. Then the slightly
wealthier gentleman pays $ to the nightclub to get in, for the opportunity to
pay $ to one of the attractive women (in the form of a drink) for the chance
to talk to her and (hopefully) get sex/affection/phone #.

Of course this is all crazy, but it is a way in which many
bars/nightclubs/online dating works. If OKC can attract attractive women with
quizzes/validation/'looking for friends' and associated kitsch, they have it
made.

------
georgieporgie
The only online dating site that was worth a moment's time was Facebook, and
then they removed the search facility.

OKCupid was interesting, but not a magic bullet. It was then bought by Match,
which consists primarily of marketing and poor website construction.

Online 'dating' is a social problem more than a computing problem and, as
such, no good solution will ever exist. Anyone male thinking of sinking time
into one would be better served by simply getting out more, taking classes,
adopting a dog and walking it, etc.

~~~
int3rnaut
I have to agree with you in a sense that this is a social problem (but of
course that's what's interesting with graph integration and social media now).
As a guy who took Attraction classes (or PUA training as some might know it)
and created a much higher success rate for myself by negating the traditional
online dating sphere I do believe time is probably better spent offline--
however, given the computing angle of this and the way in which society is
already headed, I still think this is one of those holy grails of the internet.

~~~
georgieporgie
That reminds me... I had a female friend who would likely be considered
attractive to most. She was yammering on about how terribly _interesting_ this
guy was who she met online. He kept asking such interesting questions, and
gave her a "personality test" about a cube in the desert, and a ladder, and...

Several months later, when I was reading The Game, he got around to describing
exactly this 'test'. I laughed for quite some time.

Apparently, PUA cheesiness works online, as well.

~~~
int3rnaut
This is quite off topic but because of how well known 'The Game' is, it should
never be used as a how-to as it once was but more of an inside look at what
such a life entails. Quite a few girls have read such books, and actually look
out for such tells to identify frustrated chumps--and well the "cheesiness"
and the hyper convoluted dating approach of that time just doesn't cut it
anymore (I've seen some old techniques done, and successfully, but it's not
advisable haha--but again a lot of these things are built upon psychological
understandings), and it's a big reason why that field has really had to evolve
in the past few years. Simple common sense and psychological and biological
foundations seem to be the way to go.

------
freemarketteddy
Now a good way to single out the Hacker News Readers would be to identify
males in the computer software industry who suddenly and significantly updated
their profile questions shortly after November 23rd, 2011... would love to see
some real data on how much action these users are getting!

------
phzbOx
What I hate about OKCupid is that all the girls I've met only wanted sex. I.e.
we have sex, it's all fun, and then they leave saying to call them back
whenever I feel like having fun again. It's just not serious.. I clearly
checked "Long time relationship".

