
A better approach to determining gender from a first name - Stromgren
http://genderize.io
======
AndrewDucker
Please just don't.

There is no point antagonising people by guessing information about them
wrongly - particularly if it's something they've become sensitised to by it
occurring frequently.

If you need to know someone's gender (and largely, you don't), then ask them.

~~~
sambeau
It also assumes two distinct genders which is a fallacy.

~~~
clarkm
Sure, but that doesn't preclude it from being a useful metric for something
like ad targeting.

~~~
mrottenkolber
It does exactly that. Don't look for natural metrics in an artificial
universe.

------
marijn
> {"name":"marijn","gender":"female","probability":"1.00","count":1}

Except, of course, that I am male. My name is used for both genders. The thing
completely failed on a few other ambiguous names I tried. I'll second
AndrewDucker's opinion—just don't.

~~~
sdoering
The same goes for the following, a name used for both genders in Italy:

{"name":"maria","gender":"female","probability":"1.00","count":700}

~~~
k__
failed with my name, too.

{ "name": "kay", "gender": "female", "probability": "0.93", "count": 57,
"country_id": "US", "language_id": "en" }

~~~
jakewalker
How is that a "fail"? The probability is listed as 93%.

~~~
seppo0010
A probability of 100% doesn't make it a certainty either.

------
brey
Interesting from a machine-learning perspective - but this strikes me as a
solution looking for a problem.

If any service needs to know gender (and I'm having a hard time thinking of
times you NEED to know gender - dating sites?) - why not just ask? Surely in a
situation where you're reliant on having accurate gender information, guessing
from $firstname and getting it wrong is worse than asking.

~~~
vdaniuk
Why not just ask? More form fields mean fewer conversions. Using the service,
one can ask for the gender later in the registration process, and only if
confidence in the sex detection is lower than a defined threshold.
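
A minimal sketch of that threshold idea, assuming the response shape quoted elsewhere in this thread (`gender`, `probability` sent as a string, `count`). The function name `should_ask_for_gender` and the cutoffs `min_probability` and `min_count` are hypothetical and chosen only for illustration.

```python
def should_ask_for_gender(prediction, min_probability=0.95, min_count=50):
    """Return True when the guess is too weak and the form should ask instead.

    `prediction` is a dict shaped like the API responses quoted in this thread.
    The default cutoffs are illustrative assumptions, not recommended values.
    """
    if prediction.get("gender") is None:
        return True  # the service had no guess at all
    probability = float(prediction.get("probability", 0.0))  # sent as a string
    count = int(prediction.get("count", 0))
    # A high probability backed by very few samples is still a weak guess.
    return probability < min_probability or count < min_count


print(should_ask_for_gender(
    {"name": "maria", "gender": "female", "probability": "1.00", "count": 700}))  # False
print(should_ask_for_gender(
    {"name": "marijn", "gender": "female", "probability": "1.00", "count": 1}))  # True
```

Checking `count` as well as `probability` matters here: a "1.00" built on a single sample would otherwise skip the question.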

~~~
tommorris
You don't need to know the data for most things. The times you do (dating
sites etc.), you need it to be accurate and should ask explicitly.

------
batemanesque
I'm sure this is interesting from a statistical point of view, but does the
tech scene really need yet more reinforcement of a binary view of gender?

~~~
tommorris
You want an enlightened view of the complexity of sensitively handling
transgender people, non-binary genders and other gender and sexual minorities?

This is Hacker News. Such enlightened thought is frowned on by our new
brogrammer overlords. Here's your beer.

~~~
quesera
Wow, that's misplaced hostility.

If you wanted to deride the fact that many folks here won't spend multiples of
effort on special, experimental, no-right-answers-and-likely-to-be-criticized-
for-it-if-you-even-try cases that affect minuscule fractions of their
potential user base, well...get in line behind the IE5 advocates, I guess.

Someone recommends using a free form entry for gender. No amount of
normalization will fix the "ham sandwich" entries (except that we know they
are nearly all male), so you'd trade the integrity of a small percentage of
your data for the appearance of "making an effort" for the vanishingly small
percentage. Net fail.

Just to be clear, my primary feeling here is that -- in the hypothetical case
where gender matters -- you're best served by keeping it simple: (female |
male | other/it's complicated | prefer not to answer). This should serve all
cases equally.

~~~
batemanesque
ah yes, because trying to be decent & inclusive to persecuted minorities is
comparable to supporting an obsolete browser - yr analogy is a perfect
illustration of the grotesque intersection between the Valley & bigotry.

also, "the hypothetical case where gender matters" is a glorious illustration
of straight privilege

~~~
quesera
> a glorious illustration of straight privilege

Gender has nothing to do with orientation. Please don't propagate such
normative misunderstandings.

If I may restate for clarity: in most cases of software implementation, a
user's gender is not important data (obvious exceptions include medical and
related fields).

Generally, gender should not be requested. Where requested, it should not be
required. Where required, one should have no compunction against answering
randomly.

That's prerogative, not privilege.

~~~
batemanesque
opted for "straight privilege" as an alternative to "cisprivilege" that I
thought HN people were more likely to understand. I'd have thought it was
obvious I wasn't actually talking about orientation

------
Filligree
The "probability" return value appears to be a straight average; it returns 1
for "Peter", which is almost guaranteed to be incorrect - all it takes is a
single female Peter, anywhere on the planet.

A better approach, in the absence of more complex models, would be to use
Laplace's sunrise formula.

~~~
mjolk
You're kidding right? Guessing gender for a "show hacker news" with a .io
domain is a clear case of "done is better than perfect."

~~~
stephencanon
Adding 1 to the numerator and 2 to the denominator is a trivial improvement,
not pie-in-the-sky whiteboarding that prevents you from shipping.
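
For concreteness, a minimal sketch of that adjustment (Laplace's rule of succession). The function name `smoothed_probability` and its arguments are hypothetical; nothing here is taken from the service's actual code.

```python
def smoothed_probability(majority_count, total_count):
    """Laplace's rule of succession: (s + 1) / (n + 2).

    majority_count: samples seen with the predicted gender (s).
    total_count: all samples seen for the name (n).
    """
    if total_count < 0 or not 0 <= majority_count <= total_count:
        raise ValueError("need 0 <= majority_count <= total_count")
    return (majority_count + 1) / (total_count + 2)


print(round(smoothed_probability(1, 1), 3))      # 0.667 instead of 1.00
print(round(smoothed_probability(700, 700), 3))  # 0.999
print(smoothed_probability(0, 0))                # 0.5 with no data at all
```

A name seen once no longer reports certainty, while a name seen 700 times barely moves.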

~~~
mjolk
Yes, to fix the "girls named Larry" bug.

------
kmike84
In morphologically rich languages (like Russian) the most discriminative
feature for detecting gender could be the word shape of last name or middle
name, not the first name. So in many languages there is no way to have
meaningful gender prediction by analyzing just the first name. Relative gender
frequency for the first name is useful information, but it is just not
enough for reliable gender prediction.

------
bromagosa
I guess it needs a better training DB: it returns {"gender": null} for
not-so-common names in languages other than English...

[http://api.genderize.io/?name=eloi&language_id=ca](http://api.genderize.io/?name=eloi&language_id=ca)

[http://api.genderize.io/?name=tomeu&language_id=ca](http://api.genderize.io/?name=tomeu&language_id=ca)

[http://api.genderize.io/?name=rigoberta&language_id=es](http://api.genderize.io/?name=rigoberta&language_id=es)

[http://api.genderize.io/?name=presentaci%C3%B3n&language_id=...](http://api.genderize.io/?name=presentaci%C3%B3n&language_id=es)

Credit for distinguishing between names in languages, though! Joan returns
female in English, but male in Catalan.
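
A small sketch of how query URLs like the ones above can be built, including the percent-encoding of non-ASCII names. It only assumes the endpoint and the `name` / `language_id` parameters visible in those links; the helper name `build_query_url` is hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

API_BASE = "http://api.genderize.io/"  # endpoint as it appears in the links above


def build_query_url(name, language_id=None):
    """Build a query URL; urlencode percent-encodes non-ASCII names as UTF-8."""
    params = {"name": name}
    if language_id is not None:
        params["language_id"] = language_id
    return API_BASE + "?" + urlencode(params)


print(build_query_url("joan", "ca"))
print(build_query_url("presentación", "es"))
```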

------
eksith
This project is a fine example of the "Falsehoods Programmers Believe About
Names" [http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-b...](http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/)

------
gambiting
Bear in mind that in some languages this problem doesn't exist. In Polish, for
example, all female names end with an "a". There is not a single exception
to that rule, so if you see a name ending with an "a" it is always a female
name.

~~~
TillE
And in Iceland you can reliably determine gender from a person's second name,
ending in either -son or -dottir.

~~~
zuppy
Yes, but probably there are people in Iceland and Poland with foreign names. I
know that you both wanted to explain that there is a rule in some language and
it's nice to find out about that information, but as long as it doesn't apply
to everybody, I don't consider this a solution.

~~~
gambiting
In Poland you cannot give children foreign sounding names. So unless we are
talking about foreigners visiting, then it does apply to everyone. Like I don't
know a single person to whom that rule would not apply.

------
nefasti
I thought Hacker News had more people speaking more/other languages than
English.

A lot of complaints, excluding the binary gender complaints, totally forget
about how languages like Portuguese / French have male / female differences
for nouns and other language constructs.

Let's say I have to build a phrase that includes the user's profession, like
engineer, and I don't know the gender upfront: in Portuguese it would be
"engenheiro" for male or "engenheira" for female. It does have a lot of
practical uses. And with a big enough training set, the decision of which form
to use for that user is in your hands.
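
A rough sketch of that use case, assuming the response shape quoted elsewhere in this thread. The lookup table `PROFESSION_FORMS`, the function `profession_noun`, the 0.9 cutoff, and the "show both forms" fallback are all hypothetical choices made for illustration.

```python
# Hypothetical lookup table; only the "engineer" forms come from the comment above.
PROFESSION_FORMS = {
    "engineer": {"male": "engenheiro", "female": "engenheira"},
}


def profession_noun(profession, prediction, min_probability=0.9):
    """Pick the gendered Portuguese noun, or show both forms when unsure."""
    forms = PROFESSION_FORMS[profession]
    gender = prediction.get("gender")
    probability = float(prediction.get("probability", 0.0))
    if gender in forms and probability >= min_probability:
        return forms[gender]
    # Assumed fallback: list both forms rather than guess wrongly.
    return forms["male"] + "/" + forms["female"]


print(profession_noun("engineer", {"gender": "female", "probability": "1.00"}))
print(profession_noun("engineer", {"gender": None}))
```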

------
casca
For Icelandic names, it's easy to identify the gender by looking at the last
name. For example Bjarni Benediktsson is definitely male while Katrín
Jakobsdóttir is definitely female.
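
A minimal sketch of that suffix check. The function name `guess_from_patronymic` is hypothetical, and the rule only makes sense when the last name is already known to be an Icelandic patronymic or matronymic: a foreign surname such as "Johnson" would also match the "-son" branch.

```python
import unicodedata


def guess_from_patronymic(last_name):
    """Guess gender from an Icelandic-style last-name suffix, else None."""
    # Normalize so "ó" typed as "o" + combining accent still matches.
    normalized = unicodedata.normalize("NFC", last_name).strip().casefold()
    if normalized.endswith(("dóttir", "dottir")):
        return "female"
    if normalized.endswith("son"):
        return "male"
    return None  # no recognizable suffix: do not guess


print(guess_from_patronymic("Benediktsson"))  # male
print(guess_from_patronymic("Jakobsdóttir"))  # female
print(guess_from_patronymic("Nowak"))         # None
```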

Another strategy is to use gender-neutral terms until you find out the gender,
as asking directly might be considered rude in some cultures.

~~~
mhurron
Is the first name in Iceland the family name or is there something else going
on here?

~~~
jaimebuelta
I'm just guessing, but..

"Benedikt-SSON is definitely male while Katrín Jakobs-DÓTTIR is female"

(Hey, I swear that was before taking a look at
[http://en.wikipedia.org/wiki/Icelandic_name](http://en.wikipedia.org/wiki/Icelandic_name))

------
Grue3
Doesn't know my name.
[http://api.genderize.io/?name=timofei](http://api.genderize.io/?name=timofei)

------
anonemouscoward
{ "name": "петя", "gender": "female", "probability": "1.00", "count": 1 }

Yeah, how about no.

------
ludicast
I like this from a usability standpoint. Just as some forms auto-fill the
city/state based on the zip (and might get it wrong), this enables something
similar. And it might get it wrong, but if your mom gave you a girl's name*
blame her.

It also seems accurate:

Pat = about 50/50, David = all man, Jessica = all woman

Also, wrt to "binary gender identity" complaints, are we all college freshmen
here?

* my own name (Nord) sucks and gave a gender of null. Spent my whole life being called Nerd, Nora, etc. I'm not flipping out.

~~~
masklinn
> wrt to "binary gender identity" complaints, are we all college freshmen
> here?

We aren't, which is exactly why it's a problem.

~~~
ludicast
Nobody is saying a form can't have "other", etc. as an option. Just that
someone's MVP that guesses gender is allowed to do that without political-
correctness police.

I fail to see how this API needs to accommodate transpeople in its 0.1
release.

~~~
eksith

      I fail to see how this API needs to accommodate transpeople in its 0.1 release.
    

You have failed at empathy.

Speaking to the fluidity of human gender, "Other" is the majority of the
spectrum and defaulting to binary is just as naïve as defaulting to ASCII as
expected input in an application/API written in 2013.

Restricting yourself that early in the release cycle (and I'm still dubious of
the merits of this project), doesn't bode well for its future.

Edit: I just read your comment history and, if I'm not mistaken, you're
already biased. Or would you care to elaborate what you wrote here?
[https://news.ycombinator.com/item?id=6451454](https://news.ycombinator.com/item?id=6451454)

~~~
ludicast
>> Speaking to the fluidity of human gender, "Other" is the majority of the
spectrum

I agree 100% about the fluidity of human gender, but rather than lecture
people via a form/api etc. it is probably simplest to have words that most
people use like male and female and something ("other", "trans",
"enlightened", whatever) for the 3rd option.

>> I just read your comment history and, if I'm not mistaken, you're already
biased. Or would you care to elaborate what you wrote here?
[https://news.ycombinator.com/item?id=6451454](https://news.ycombinator.com/item?id=6451454)

First of all that was a joke in the spirit of the Hangover 2. Might not have
been that funny, but was an attempt at humor based on what it was responding
to.

Secondly, I believe you are "biased" :) because, though it is chronologically
juxtaposed to this comment, that's probably a coincidence: if you go
through my comment history, it might be the only one touching on the subject (I
think, no guarantees).

edit - I did make a prison rape joke a year+ ago
([https://news.ycombinator.com/item?id=4148572](https://news.ycombinator.com/item?id=4148572))
but I actually heard from people that it was hysterical* because of the play
on "backbone".

* hysterical is a sexist word, I know.

~~~
eksith

      it is probably simplest to have words that most people use like male and female and something...
    

The simplest is a text field with: "What prefix would you like us to use?" The
End.

There are no assumptions, no assignment of labels, not one bit of imposing
your cultural norms on anyone else. The hardest part of getting over biases is
acknowledging that you have them.

Learn some sensitivity, please.

~~~
ludicast
I'm either being trolled or bullied here.

A text field adds time to type out (which can lose customers, alienate
handicapped etc.), all to accommodate an exception, rather than a rule.

I don't care if you have a slider, dropdown, circle, whatever, but for
usability, a gender option should have poles that require 0 or 1 clicks to get
to (though a text-field for further elucidation is okay). Continuing down this
path, the further step is saying a shoe-size option insults amputees and
lymphedema victims and should be a text-field...

Edit - Another fact is that the person filling out this form might not be the
person described by the form (say for a CRM tool) in which case it matters
more to KISS.

