
Show HN: An API for gender classification - heynk
http://gender.hankstoever.com/
======
sambeau
The only reliable way to find out someone's gender is to ask them what gender
they currently think they are. There are more than two genders and an
individual's gender can change with time.

A better strategy all-round would be to ask yourself whether you need to know
someone's gender. I can think of very few legitimate reasons to know and
record a user's gender and many of them can be dealt with by simply asking
them what they'd like to be referred to as, perhaps in more than one scenario.

~~~
joaomsa
One use I've come across for inferring genders was tackling record
deduplication across multiple data sources. In one data set we might have the
gender information of an individual but have it be missing in another.

Turns out gender is great to include in a blocking key to reduce the number of
comparisons. Extrapolating an inferred gender in the dataset without one was
incredibly helpful.
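
A minimal sketch of that blocking idea, not the actual pipeline: the records,
field names, and the `blocking_key` helper are all made up for illustration.
Records are only compared when they share a key, so adding gender to the key
shrinks the number of candidate pairs.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    # Hypothetical key: first letter of the surname plus the (possibly inferred) gender.
    return (record["surname"][0].lower(), record["gender"])


def candidate_pairs(records):
    # Group records by blocking key, then only pair up records within a group.
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec)
    for block in blocks.values():
        yield from combinations(block, 2)


# Illustrative records only; the gender field may come from a name classifier.
records = [
    {"id": 1, "name": "Jack", "surname": "Smith", "gender": "M"},
    {"id": 2, "name": "Jack", "surname": "Smyth", "gender": "M"},
    {"id": 3, "name": "Cindy", "surname": "Stone", "gender": "F"},
    {"id": 4, "name": "Cindy", "surname": "Stone", "gender": "F"},
]

all_pairs = list(combinations(records, 2))
pairs = [(a["id"], b["id"]) for a, b in candidate_pairs(records)]
print(len(all_pairs), len(pairs))  # 6 comparisons without blocking, 2 with
```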

~~~
aw3c2
You are forgetting that the individual might consider the gender to be private
information. Some people might want to use a different gender in different
contexts. See the ESPN/Grantland suicide issue recently if you care/dare.

~~~
mseebach
An individual that considers gender to be private information, and that uses
different genders in different situations is very unlikely to be using a name
that can be classified with a high degree of confidence as one gender.

A person using the name "Jack" is unlikely to be assumed to be a woman, even
if that person selected "female" from a drop down somewhere. If the same
person uses Jack/M and Cindy/F in different contexts, no fuzzy algorithm is
going to resolve them as the same person (bar some other, stronger ID, such as
a SSN).

EDIT: I initially used "William" as an example. Ironically, it turns out that
name is only 57.6% male. Both Jack and Cindy are 90% male/female.

~~~
drakeandrews
At least in the UK, Jack is a fairly common shortening of Jacqueline. The only
way of determining a user's gender is asking them directly, and if a person
wishes to enter different values into different systems, all the more power to
them.

------
onion2k
While it's a quite interesting coding task to write a classifier, for the
overwhelming majority of applications you simply don't need to know a user's
gender. Making it a public API is a bad thing.

Developers have a horrible tendency to gather as much data on someone as
possible, everything they're willing to give in fact, for the simple reason of
"just in case we need it later". It's far, _far_ better to gather as little as
possible and build something that simply doesn't need to know specifics. If we
build things that are ambiguous, unspecific for age, gender, race,
nationality, etc then the world will be a better and more inclusive place.
Paradoxical as it seems, more privacy actually leads to a more integrated
society. That is universally a good thing (in my opinion, obv.).

------
splitbrain
Without a location parameter this is pretty useless. Andrea for example is a
male name in Spain and Italy as far as I know.

Also this:
[http://www.cscyphers.com/blog/2012/06/28/falsehoods-programm...](http://www.cscyphers.com/blog/2012/06/28/falsehoods-programmers-believe-about-gender/)

~~~
terabytest
I'm Italian, my name is Gabriele and it classifies it as female. That's wrong.
In Italy, Gabriele is a male name. I agree that this is useless without
location information.
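
A rough sketch of what a location-aware lookup could look like. This is not
the API's actual interface: the `classify` function, the `country` parameter,
and the table are hypothetical, and the entries only encode the claims made in
this thread.

```python
def classify(name, country=None, table=None):
    # Prefer a country-specific entry, then fall back to the location-free one.
    table = table or {}
    key = name.lower()
    if country is not None:
        hit = table.get((key, country.upper()))
        if hit is not None:
            return hit
    return table.get((key, None), "unknown")


# Hypothetical table built from the comments above, not from real data.
TABLE = {
    ("gabriele", None): "female",  # what the API reportedly returns today
    ("gabriele", "IT"): "male",
    ("andrea", "IT"): "male",
}

print(classify("Gabriele", table=TABLE))
print(classify("Gabriele", country="it", table=TABLE))
```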

------
joshfraser
I started searching for the most gender ambiguous names I could think of like
"Jesse", "Alex", "Erin", etc. The best one I've found so far is "Angel" at
51.1% male.

~~~
cclogg
This is pretty fun lol. I searched for Santa and it said 99% female haha.

~~~
Renaud
Apparently god is overwhelmingly male at 99.998179%

------
sheraz
I think it is important to make a distinction between gender and sex [1]. The
link below makes a pretty good distinction between the two.

"Sex" refers to the biological and physiological characteristics that define
men and women.

"Gender" refers to the socially constructed roles, behaviours, activities, and
attributes that a given society considers appropriate for men and women.

[1]-
[http://www.who.int/gender/whatisgender/en/](http://www.who.int/gender/whatisgender/en/)

~~~
Renaud
And even sex[1] is sometimes not as binary as we make it to be.

[1]:[http://www.chw.org/display/PPF/DocID/22620/Nav/1/router.asp](http://www.chw.org/display/PPF/DocID/22620/Nav/1/router.asp)

------
hnriot
Why bother with a naive Bayes classifier? Why not just use the dictionary:
when a name matches, the percentage is simply the ratio of the genders. I
don't see the need for a classifier; I was hoping it was going to do something
clever like guess the gender of a document's author.
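
Something like this sketch is what I mean by a plain dictionary lookup. The
`gender_ratio` helper and the counts are made up for illustration (chosen so
"Angel" comes out at 51.1% male); they are not the API's data or its method.

```python
def gender_ratio(name, counts):
    # counts maps a lowercase name to (male_count, female_count).
    entry = counts.get(name.lower())
    if entry is None:
        return ("unknown", None)
    male, female = entry
    total = male + female
    if total == 0:
        return ("unknown", None)
    # Report the majority label with its share of all occurrences.
    if male >= female:
        return ("male", male / total)
    return ("female", female / total)


# Hypothetical counts, for illustration only.
counts = {"angel": (511, 489)}

label, share = gender_ratio("Angel", counts)
print(label, share)
print(gender_ratio("Permelia", counts))  # a name missing from the dictionary
```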

------
bwp
Microsoft is unknown, Linux is male, and Apple is female.

~~~
blueskin_
BSD is unknown.

------
levlandau
Neat. Should probably call out that it's an API for "gender classification of
English names". Did you build this mostly for learning/personal purposes?

------
sdegutis
"Permelia"[1] is definitely a girl's name[2].

[1]:
[http://blog.xkcd.com/2014/01/31/the-baby-name-wizard/](http://blog.xkcd.com/2014/01/31/the-baby-name-wizard/)

[2]:
[http://gender.hankstoever.com/classify/Permelia](http://gender.hankstoever.com/classify/Permelia)

------
nl
I spend a lot of time choosing gender neutral names for story based scenarios
in proposals (the joys of corporate work).

[http://en.wikipedia.org/wiki/Unisex_name](http://en.wikipedia.org/wiki/Unisex_name)
has been mentioned elsewhere, but I've found it pretty useful.

Unlike a lot of people here, I don't think there is anything wrong with an API
like this. It's true that it isn't culturally neutral, but there are times
when any piece of information is useful.

------
jpsim

      if probability == "59.369936" {
        probability = "?"
        gender = "unknown"
      }

------
mastersk3
This is great fun, could you embed share options? I could trigger a share-war
amongst my friends!

------
robinjfisher
My name has a 50.444139% chance of being female.

In the present case, the API is, unfortunately, incorrect.

~~~
Dewie
My name has a 99.99...% chance of being male. But when I change it to the
English/Anglophone version, the chance is only 60.641%.

------
bnegreve
What's the training set?

------
karanbhangui
Apparently Karan is female :(

------
blueskin_
I get ~70% for a name I've never even heard of being used for a female.

------
abe238
Bah, I'd much rather use this app as it's much faster:
[http://apps.microsoft.com/windows/app/gndr/a062944e-744e-495...](http://apps.microsoft.com/windows/app/gndr/a062944e-744e-4955-b685-f3197faa2560)

------
ozh
Fun, although you might want to state it applies to the US only

~~~
abe238
I'm not even sure of that. I have seen much better results from US Census data
with very different probabilities, e.g.
[http://apps.microsoft.com/windows/app/gndr/a062944e-744e-495...](http://apps.microsoft.com/windows/app/gndr/a062944e-744e-4955-b685-f3197faa2560)

------
shobhitjain26
My name is unknown.
[http://gender.hankstoever.com/classify/shobhit](http://gender.hankstoever.com/classify/shobhit)

------
kirchhoff
John - 57.3% ?

~~~
ddeck
Definitely some odd results:

Michael = 52%, Thomas = 62% (although Tom = 98%)

~~~
markcampbell
Thomassina is a woman's name, FWIW.

~~~
yen223
Thom is 99.999963% male though.

