
Coffee Meets Bagel Meets Me – Statistical analysis of online dating - ceph_
https://medium.com/@mydatablog/coffee-meets-bagel-meets-ginny-ea33fd0780ae
======
7Figures2Commas
> In 2014, I went on 45 first dates with guys I met on Coffee Meets Bagel or
> Hinge (but mostly CMB). I started using Hinge around October 2014, and I
> added it to this analysis when I decided I needed more conversation data.
> This covers dates that happened between January 1st 2014 and December 30th
> 2014. I also went on a bunch of second dates, a handful of third dates, a
> smattering of fourth dates, and some past that.

> The purpose of this analysis is to understand how to predict whether or not
> I’ll be compatible with someone based on his profile and the messages that
> he sends.

One of the best online dating studies was published in Psychological Science
in the Public Interest[1].

Its basic conclusion: online dating is great because it expands one's access
to potential partners but the algorithms don't really work and the profile-
based structure creates an "assessment mindset" that leads individuals to be
more picky, judgmental and non-committal. When there's a bunch of new
potential matches available every day, you don't have to accept the fact that
_nobody_ is perfect. The nice guy or gal you had dinner with last night might
pale in comparison to the Mr. or Ms. Perfect who could be in today's batch.

Ironically, the OP's experience demonstrates this dynamic more than it
demonstrates merit to her own analysis. After 45 first dates, and what sounds
like a reasonable number of second, third, fourth and _n_ dates beyond that,
the OP seems content to continue what she calls an "active dating life."

[1] [http://www.psychologicalscience.org/pdf/PSPI-online_dating-p...](http://www.psychologicalscience.org/pdf/PSPI-online_dating-proof.pdf)

~~~
GuiA
Not everyone dates with the end goal of finding a lifelong partner. Some
people date to that end - great. Other people are fine dating "casually",
without a long-term (possibly lifelong) relationship as the goal. In other
words, they might enjoy the company of the nice guy/gal from dinner last night
for a few weeks, and then move on to the next person whenever they feel like
it. That's also fine (although it might not be fine ~for you~).

Also the very concept of "dating" is very much tied to American culture, and
one's perception of it (and its goals/effectiveness) will vary greatly
depending on culture/religion/upbringing/etc. There's a version of you in a
parallel universe brought up in a culture where arranged marriages are the
norm, and you'd be posting in this thread about how ridiculous it is that one
should date at all - your parents should pick your partner for you!

(I've been in a monogamous relationship for the past ~3 years)

~~~
7Figures2Commas
Your point is completely valid; however, the challenge is
expectations/intentions. It's one thing if both parties are seeking the same
thing; it's another if they aren't. I would venture a guess that many times
there are misalignments. In some cases, these may be conscious (somebody is
looking for casual dating on a marriage-oriented site) while in others,
they're unconscious (individuals _believe_ they're looking for a serious
relationship when in reality they aren't ready for one).

Using the author of this post as an example, I wonder how well aligned her
expectations and intentions were with those of her 45 first dates. Note that
in one part of her post, the author refers to "meet[ing] someone" while in
another part she refers to leading an "active dating life." Although you can
obviously expect to meet multiple people before you meet someone with whom you
want to start a monogamous relationship, these might be two different
pursuits.

The questions around expectations/intentions are even more intriguing in light
of the fact that the author says almost all of her dates looked like their
photos and that she didn't have "any especially bad CMB/Hinge dates."

------
thret
"Also true story I generated this data and actually wrote a for-loop that
would print out the top ten most highly-correlated words from 10 to 1 and
pause in between each word to build suspense (for me alone at my computer)."

Adorable.
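
For the curious, a loop like the one she describes might look roughly like this in Python. The words and correlation values below are invented placeholders, and `countdown` and `pause_seconds` are hypothetical names, not anything taken from the article:

```python
import time

def countdown(correlations, top_n=10, pause_seconds=1.0):
    """Print the top_n most correlated words from rank top_n down to rank 1."""
    ranked = sorted(correlations, key=correlations.get, reverse=True)[:top_n]
    announced = []
    # Walk the ranking backwards so the best word comes last.
    for rank in range(len(ranked), 0, -1):
        word = ranked[rank - 1]
        print(f"{rank}. {word} ({correlations[word]:+.2f})")
        announced.append(word)
        time.sleep(pause_seconds)  # the suspense
    return announced

# Placeholder data, not the article's results.
example = {"alpha": 0.42, "beta": 0.37, "gamma": 0.15, "delta": -0.08}
countdown(example, top_n=3, pause_seconds=0)
```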

I am curious why she would use word length and exclamation/emoji/question mark
ratios but not check for spelling or punctuation? Surely they are more
indicative of someone's level of education and reading habits.

~~~
ginny2357
Hey, I wrote the article. Thanks for reading it. This is only guys I went on
dates with -- to be totally honest, I don't usually go out with people who
make lots of spelling/grammar errors.

~~~
rybosome
I once got a date because I sent a woman a message with the improper usage of
"their" in place of "there", then followed up with a second message pointing
out the mistake and my shame in having made it.

~~~
ginny2357
That was very accommodating of her.

~~~
4ydx
I am guessing this is being said in a joking manner.

------
carbocation
> _I hope the variable names are self-explanatory. Any variable with a star
> next to it is statistically significant, which basically means it can be
> used to predict scores /ratings._

For a variety of reasons (e.g., multi-collinearity), following this procedure
would potentially have you tossing the most important contributors to your
model. I would use a different mechanism to evaluate the contributors to your
model.

A more classical way to tell if something contributes to your model is to
compare a model that includes that variable against the same model without
it. How do the AIC (or BIC, or LR, or other metric that you like) of the N
plausible models compare?
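
A minimal sketch of that with/without comparison, in Python with only the standard library. The data are made up for illustration, the helper names (`solve`, `ols_rss`, `aic`) are hypothetical, and the AIC here is the Gaussian least-squares form n*ln(RSS/n) + 2k with constants dropped, so only differences between models mean anything:

```python
import math

def solve(a, b):
    """Solve the linear system a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols_rss(columns, y):
    """Fit y ~ intercept + columns by least squares; return the residual sum of squares."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def aic(columns, y):
    """Gaussian AIC up to an additive constant; lower is better."""
    n = len(y)
    k = len(columns) + 1  # slopes plus intercept
    return n * math.log(ols_rss(columns, y) / n) + 2 * k

# Toy data: y tracks x1 closely, x2 is unrelated noise.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [3, 1, 4, 1, 5, 9, 2, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2]

aic_without = aic([x2], y)      # model that omits x1
aic_with = aic([x1, x2], y)     # model that includes x1
print(f"without x1: {aic_without:.1f}, with x1: {aic_with:.1f}")
```

If dropping a variable makes the AIC noticeably worse, that variable is pulling its weight, whether or not its individual coefficient earned a star.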

As an aside, the article's approach to evaluating the "significance" of
predictors doesn't account for multiple testing. You have ~10 variables in
your model, and your best P value is 0.01, which becomes roughly 0.1 after a
Bonferroni correction for multiple testing (0.01 × 10 tests). That is not
significant at the conventional 0.05 level.
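
The Bonferroni arithmetic, as a small Python sketch. Only the best P value (0.01) and the rough count of 10 variables come from the article; the other P values are placeholders invented for illustration:

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical p-values for ~10 predictors; only 0.01 is from the article.
p_values = [0.01, 0.04, 0.20, 0.35, 0.50, 0.62, 0.71, 0.80, 0.88, 0.95]
adjusted = bonferroni(p_values)
significant = [p for p in adjusted if p < 0.05]
print(adjusted[0], significant)
```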

~~~
minimaxir
Two additional notes:

1) R includes the F-test output that accounts for multiple variables in the
regression: in this case, P = 0.089, which is a problem.

2) For variable/model selection, R's step() function is easy to use and tests
using AIC too:
[https://stat.ethz.ch/R-manual/R-devel/library/stats/html/ste...](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/step.html)
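
For reference, the overall F statistic behind note 1 can be computed from R², the number of observations n, and the number of predictors k. A sketch under assumptions: n = 45 and k = 10 follow the figures mentioned in this thread, but the R² value is made up, and `f_statistic` is a hypothetical helper. Turning F into a P value needs an F-distribution CDF (for example `scipy.stats.f.sf`), which is left out to keep this standard-library only:

```python
def f_statistic(r_squared, n, k):
    """Overall F statistic for a linear regression with an intercept.

    Tests the null hypothesis that all k slope coefficients are zero.
    Degrees of freedom are (k, n - k - 1).
    """
    explained = r_squared / k
    unexplained = (1.0 - r_squared) / (n - k - 1)
    return explained / unexplained

# n and k roughly match the thread; r_squared is a made-up value.
f_value = f_statistic(r_squared=0.35, n=45, k=10)
print(f"F({10}, {45 - 10 - 1}) = {f_value:.2f}")
```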

------
beachstartup
[http://jezebel.com/5902718/creepy-finance-guy-with-spreadshe...](http://jezebel.com/5902718/creepy-finance-guy-with-spreadsheet-of-matchcom-prospects-says-he-was-just-trying-to-be-organized)

if this guy's _only_ creepy, what does that make the person who tracks
punctuation marks, runs linear regressions, and trains a learning algorithm
using data collected on dates?

~~~
Blackthorn
Assuming this is an honest question, the guy is creepy because men preying on
women is _much more common_ than the other way around.

~~~
beachstartup
how exactly is going on dates preying?

~~~
normloman
Nobody's saying that this guy is preying. But the fact that other guys prey
makes unusual behavior like this more suspect. Women have to be more careful
than guys, since they are more likely to be abused. So naturally, they are
more sensitive to unusual behavior. You and I see this spreadsheet and think
"this dude's just awkward, but probably harmless." But some women might think
"serial killer alert." This is partly because of the media (watch enough
Dateline and you'll think every man's a kidnapper). But partly because of
reality - some men do sick shit, and women are scared. My wife has some
stories about past dates she's been on that would repulse you.

~~~
beachstartup
uh, women also do sick shit to men. it's 2015, being a twisted, terrible human
being is now an equal opportunity phenomenon.

~~~
Blackthorn
It's almost like you didn't even read the thread you're replying to. One of
these is _much_ more common than the other.

------
kelukelugames
It's a cute read but there is really no data analysis. Her conclusion is "I
still rather just look at profile pictures."

~~~
caractacus
> It's a cute read

That's all you needed.

------
jpeg_hero
something profoundly depressing about the idea of 45 first dates in a year.

profoundly depressing for both the guy (the crap odds that come from being
1-out-of-45) and for the girl (can't be happy with any one out of 45)

~~~
nothrabannosir
Hey man not at all, meeting people is great, it's fun. Dating is like life:
people worry so much about what you achieve in the end that they forget to
enjoy the ride.

Besides, it's the communication age, isn't it? Date, my children!

nice sorry hiking house outdoors harvard google drink math cornell

Edit: PS: you're 1 out of 45 either way, but if you never date, that's when
your odds are _real_ crap. Dating only a small subset per year, just because
you happened to meet; or getting the girl to stick with you because she just
happened not to have met that guy who's perfect for her; that sounds much more
depressing to me. Unless you believe in fate?

------
brianbreslin
@ginny2357 I am curious what other factors you tracked or kept that weren't
used in the analysis. A friend who used to be a PM at Match said that height,
income, race, and occupation were the four most important factors (in that
order) for women to select men.

If you're ever in Miami, let's discuss this over a drink.

~~~
7Figures2Commas
If your friend is correct, you should probably post your height, income, race
and occupation here so that she can take your offer for drinks under serious
consideration. Your proper use of "you're" and reference to "statistical
factors" aren't enough to stand out on HN.

------
rflrob
I could rationalize the message ratio as sort of a self-fulfilling prophecy.
If she sent many messages, she was probably already interested in him for some
reason, and so would be more likely to rate him highly at the end.

As a methodological point, I would also have taken the log of all the ratios,
especially when doing a linear regression. To a linear regression, a ratio of
.01 looks almost the same as .00001, even though they differ by three orders
of magnitude. Of course, if the dynamic range is relatively small (say, within
2-fold in either direction), maybe it wouldn't matter too much.
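
To illustrate the point about logs with made-up message counts (a sketch; `log_ratio` is a hypothetical helper): on the raw scale, 2:1 and 1:2 sit at 2.0 and 0.5, and tiny ratios all pile up near zero. On the log scale, swapped ratios are mirror images around 0 and tiny ratios stay well separated:

```python
import math

def log_ratio(sent, received):
    """Natural log of sent/received; swapping the counts flips the sign."""
    return math.log(sent / received)

# Hypothetical message counts (her messages, his messages).
pairs = [(20, 10), (10, 20), (1, 100), (1, 100000)]
for sent, received in pairs:
    raw = sent / received
    print(f"raw={raw:.5f}  log={log_ratio(sent, received):+.2f}")
```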

------
hurin
I don't know what's more worrying: turning everything into data as an
industry, or how happy individuals apparently are to apply it to themselves.

Take this with a grain of salt. The statistics, of course, aren't wrong (well,
on a larger scale than this article could manage), but that an individual is
ready to evaluate their own choices primarily in terms of them is rather
scary.

~~~
girvo
Did you read her conclusion? It's not all that serious, just a cute bit of
fun. Nothing wrong with that. Besides, everything _is_ data already; regular
boring introspection is just "data mining" with terrible sample sizes because
of memory, and bad statistics because of biases. ;)

~~~
hurin
I wasn't specifically referring to her analysis. Indeed, as you point out,
with those sample sizes and the number of potential confounding factors, it's
unlikely this data is statistically worth anything.

I was commenting more on the general approach of evaluating one's actions or
choices within the context of a statistical framework.

------
mmastrac
Given that the rest of the posts on the blog were of the rough form "using
marketing channel X to meet people", this probably leads into the next article
on writing technical posts to catch the eye of an interesting crowd... "Should
you make Hacker News your new dating app?"

------
jplahn
I would love to throw some sentiment analysis at this and see what could be
conjured up. Maybe nothing, because the sentiments would likely all be very
similar and it would be hard to fine-tune it enough, but it still might be an
interesting exercise regardless.

------
dynofuz
The model isn't that great. I'd kill some of those variables and run it again
to find something more significant and with fewer confounding factors that
drive down the explanatory power (R²).
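
One caveat worth a quick sketch: in ordinary least squares, plain R² cannot go down when variables are added, so the number that actually rewards pruning is adjusted R². The R² values below are invented for illustration (n = 45 matches the number of first dates in the article), and `adjusted_r2` is a hypothetical helper:

```python
def adjusted_r2(r_squared, n, k):
    """Adjusted R^2 for a regression with an intercept, n rows and k predictors."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)

# Made-up fits: a 10-variable model versus a pruned 4-variable model.
full = adjusted_r2(r_squared=0.35, n=45, k=10)
pruned = adjusted_r2(r_squared=0.30, n=45, k=4)
print(f"full: {full:.3f}, pruned: {pruned:.3f}")
```

With these placeholder numbers the pruned model gives up a little raw R² but comes out ahead once the penalty for extra variables is applied.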

------
quietriot
Please somebody try to make a coffee-enabled bagel.

If you know a bagel maker, and they have not tried adding extra-fine ground
coffee to their bagel dough, might you suggest it, please?

~~~
thirdtruck
Agreed! For that matter, is anyone in the coffee-plus-oatmeal market yet? I
mix the two together on occasion, but haven't experimented with it thoroughly
yet.

~~~
bobbles
Don't suppose you're Australian?

My go-to breakfast of champions is the Carmans crunchy clusters[1] (honey
roasted nut), using a Dare double espresso as the milk.[2]

Soooo gooood

1 [http://www.carmanskitchen.com.au/our-products/clusters#Honey...](http://www.carmanskitchen.com.au/our-products/clusters#Honey-Roasted-Nut-Crunchy-Clusters)

2 [http://www.dareicedcoffee.com.au/Product](http://www.dareicedcoffee.com.au/Product)

~~~
thirdtruck
Different continent, but thanks for the recommendation. :)

