

Build an algorithm to predict friendships, then actually use it to meet people - waxman
https://www.joingrouper.com/cuttlefish

======
avalaunch
This is a cool challenge but the prize is definitely lacking. I think anyone
capable of writing an algorithm of the caliber you're looking for isn't likely
to participate. I could be wrong but I think you're going to have to pony up
some serious cash to get developers taking this seriously. Or you could go the
more standard route and just hire someone to do the job.

~~~
chegra
I would do it because I'm curious.

Edit: Now looking at the dataset, I wouldn't be able to use the model I
developed personally.

------
benhamner
Ping me (b@kaggle.com) if you're interested in running this competition more
formally on [https://kaggle.com](https://kaggle.com).

We've run hundreds of machine learning competitions & offer a real-time
leaderboard to encourage competitive participation, a very active community of
data scientists, and many other features that simplify running this type of
challenge.

------
mariusz79
So basically they are asking people to build them an algorithm that will be a
critical part of their business, in exchange for a free service that will be
based on this algorithm. Right...

~~~
nottombrown
Sorry if this was unclear. You own any code that you write for this
competition.

The prize is that we'll use your algorithm to validate any matches that _you_
go on. If that doesn't seem worthwhile to you, feel free to pass on this
contest.

~~~
Aloisius
Do you allow closed source entries? Rewriting an algorithm implemented in
someone else's code to avoid copyright infringement is trivial, not to mention
inevitable, given the performance requirement differences between a contest and
a production site.

~~~
nottombrown
Sure, just take a row as your input and return a boolean.
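A minimal sketch of what that interface could look like, assuming a row arrives as a dict keyed by column name. The function name `predict` and the threshold are placeholders for illustration, not part of the contest spec; the two column names are taken from elsewhere in this thread.

```python
def predict(row: dict) -> bool:
    """Take one row of the dataset and return True if the pair is predicted to become friends."""
    # Assumption: values may arrive as strings (e.g. from a CSV reader), so cast them.
    f_friends = float(row["f_facebook_friends_count"])
    m_friends = float(row["m_facebook_friends_count"])
    # Placeholder rule with an arbitrary cutoff; a real entry would replace this logic.
    return (f_friends + m_friends) > 1000.0


example_row = {"f_facebook_friends_count": "900", "m_facebook_friends_count": "200"}
print(predict(example_row))
```

Anything behind that signature can stay closed source, since only the input and output shapes are fixed.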

------
mjmahone17
This is interesting, but given your parameters (predict the most friendships),
all you're technically asking for is recall. I'll write an algorithm that has
100% recall: predict that all people become friends with each other.

If this is really a competition (and not just "Here, have fun with our
dataset!"), you need to define the rules a little bit more clearly. How are
you weighing recall vs. precision? Or are you just looking at % correct
labels, where the only two labels possible are "FRIENDS" and "NOT FRIENDS"?
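To illustrate the point, here is a small sketch on made-up labels (not the contest data) showing that "everyone becomes friends" reaches 100% recall while precision and accuracy stay poor. The helper name `precision_recall_accuracy` is my own.

```python
def precision_recall_accuracy(y_true, y_pred):
    """Compute precision, recall and accuracy for boolean labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy


# Made-up ground truth: 2 of 6 pairs became friends.
y_true = [True, False, True, False, False, False]
# The degenerate predictor: everyone becomes friends.
y_pred = [True] * len(y_true)

precision, recall, accuracy = precision_recall_accuracy(y_true, y_pred)
print(precision, recall, accuracy)
```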

~~~
nottombrown
Sorry this was unclear. We meant "_correctly_ predict the most friendships".

You get 1 point for each friendship that you correctly predict _did or did
not_ occur. In the test data set ~50% of pairs became friends, so predicting
"everyone became friends" would get 250 points, whereas a perfect algorithm
would get 500 points.

I'm updating the README now to make our scoring system more clear.
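As a rough sketch of that scoring rule (one point per pair whose outcome is predicted correctly, friend or not), assuming 500 test pairs with exactly half becoming friends. The function name `score` and the synthetic outcomes are illustrative only.

```python
def score(predictions, outcomes):
    """One point for each pair where the prediction matches what actually happened."""
    return sum(1 for p, o in zip(predictions, outcomes) if p == o)


# Synthetic stand-in for the test set: 500 pairs, exactly 50% became friends.
outcomes = [True] * 250 + [False] * 250

all_friends = [True] * len(outcomes)   # "everyone became friends"
perfect = list(outcomes)               # a perfect algorithm

print(score(all_friends, outcomes))  # 250
print(score(perfect, outcomes))      # 500
```

Under this rule the score is just the count of correct labels, so predicting a single class for everyone earns about half the maximum.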

------
thebiglebrewski
This would be a little more fun if there was a cash prize. No offense meant,
groupers look cool, but you'd probably get some more participation that way.

~~~
easy_rider
But then they could just hire an M.S. in Computer Science?

------
nottombrown
Hey HN, Grouper founder here. Let me know if you have any questions about the
contest.

~~~
ddod
This is the sort of thing I'm personally very interested in, and I have some
pretty novel ideas for how I'd approach it. That said, I wouldn't participate
in this because it clearly devalues the industry. You should really rethink
your approach.

Developers who are considering participation in this, I'd suggest you build
something for yourself with data acquired elsewhere.

~~~
libria
> I wouldn't participate in this because it clearly devalues the industry.

People this may be aimed at:

* Experienced devs in boring day-jobs who are seeking some kind of off-time challenge.

* People just getting into ML and want to solve something real.

* CS students with spare time.

You know more about ML than me, but it doesn't sound like they're looking for
a cancer cure; just fishing around for a one-off challenge. Or maybe they're
taking names for future interview candidates.

> Developers who are considering participation in this, I'd suggest you build
> something for yourself with data acquired elsewhere.

Relax, dude. If people think this is an interesting problem to solve, what's that
to you?

------
maxk42
I'll be the first to say it: Your data is either incorrect, arbitrary, or
we're missing some information here.

Why does everyone have "7.5" - 8 siblings and 7.5 - 8 "weekly workouts" and
7.5 - 8 platinum albums?

~~~
nottombrown
The headers with asterisks are intentionally mislabeled. Updated this to be
more clear in the README.

~~~
JFoss117
You write in the README that the mislabeled columns are "from our internal
ratings". Can you give any more definite sense of what this means? What kind
of things are these ratings based off of? What are they designed to reflect?
How are they computed (roughly)?

------
chegra
Mutual Information for the fields:

I(f_facebook_friends_count,members_became_friends) = 0.117320113379

I(m_facebook_friends_count,members_became_friends) = 0.113972809724

I(m_facebook_photos_count,members_became_friends) = 0.0449092782303

I(f_facebook_photos_count,members_became_friends) = 0.0426531483254

I(m_shoe_size_,members_became_friends) = 0.00276175766018

I(m_height,members_became_friends) = 0.00255043390135

I(f_shoe_size_,members_became_friends) = 0.00233148724025

I(m_age,members_became_friends) = 0.00198005768283

I(f_height,members_became_friends) = 0.0013606978915

I(m_weekly_workouts_,members_became_friends) = 0.00123271513215

I(f_age,members_became_friends) = 0.00122660347743

I(m_platinum_albums_,members_became_friends) = 0.00111710129455

I(f_number_of_pets_,members_became_friends) = 0.00108593667378

I(f_pokemon_collected_,members_became_friends) = 0.000880040104571

I(m_number_of_siblings_,members_became_friends) = 0.000830295252089

I(f_platinum_albums_,members_became_friends) = 0.000820683185117

I(m_number_of_pets_,members_became_friends) = 0.000768855827053

I(m_pokemon_collected_,members_became_friends) = 0.000720822383999

I(f_weekly_workouts_,members_became_friends) = 0.000620666529567

I(f_number_of_siblings_,members_became_friends) = 0.00019278884716

I(f_gender,members_became_friends) = 0.000124279429698

I(m_gender,members_became_friends) = 0.000124279429698
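The comment does not say how these values were computed, so the following is only a sketch of one common way to estimate mutual information between a discrete feature and the label from empirical counts. The log base (2, giving bits) and the function name `mutual_information` are assumptions; continuous columns such as friend counts would need to be binned first, and the choice of bins changes the result.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits from paired samples of two discrete variables."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) )
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# Toy data: a feature that determines the label, and one that is independent of it.
label = [0, 0, 1, 1]
informative = [0, 0, 1, 1]
uninformative = [0, 1, 0, 1]

print(mutual_information(informative, label))
print(mutual_information(uninformative, label))
```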

------
icebraining
This reminds me of
[http://robrhinehart.com/?p=1005](http://robrhinehart.com/?p=1005)

The fact that the women are depicted as just three pairs of legs doesn't
help, though.

------
joshfraser
Ok, let's make this more interesting. I'll pay $50 to the first person to
de-anonymize their training set.

------
chbg
members_became_friends = 1/(1+ exp(-1297.88087 * f_shoe_size + m_shoe_size *
m_facebook_friends_count - 11761.6138))
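For anyone who wants to try it, here is a literal transcription of that expression as a function, reading every `*` as multiplication with the usual operator precedence (an assumption, since the formatting of the post may have been altered). The function name and the overflow guard are my additions, and this makes no claim about how well the formula scores.

```python
import math


def members_became_friends(f_shoe_size, m_shoe_size, m_facebook_friends_count):
    """Evaluate the posted expression 1 / (1 + exp(z)) for one row."""
    z = (-1297.88087 * f_shoe_size
         + m_shoe_size * m_facebook_friends_count
         - 11761.6138)
    # Guard added for this sketch: math.exp raises OverflowError for very large z,
    # where the expression is effectively 0 anyway.
    if z > 700:
        return 0.0
    return 1.0 / (1.0 + math.exp(z))


# Made-up inputs, only to show the call shape.
p = members_became_friends(f_shoe_size=8.0, m_shoe_size=8.0, m_facebook_friends_count=500)
print(p, p >= 0.5)
```

To fit the "return a boolean" interface, the probability can be thresholded, for example at 0.5.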

