As a man named Tracy I would just throw out a caution to anyone thinking of using this. At this point in my life I don't really care about people calling me Ms. Tracy Platt. But what it does do is immediately signal that I don't really need to pay attention. Fake familiarity doesn't usually work too well.
As a side note I would like to mention that having a girl's name does have some benefits. For example, my wife can handle bank account issues over the phone for me. Also some near misses: I was assigned a locker in the girls' locker room in junior high but they caught on before I could use it, and once a telemarketer called to offer me a spot in an all-women's resort spa. The only time I said yes to a telemarketer, and I got rejected.
Me and my significant other frequently impersonate each other when dealing with banks and hospitals over the phone (we're a gay couple). It honestly never occurred to me that straight folks can't get away with this. How unfortunate; it's quite a time saver!
I just checked the data file behind this project and Tracy shows up as both a male and female name. If I were to use this, I would only ever let the analysis "leak" to the end-user if it were extremely confident that it's a strong match.
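Roughly what I mean by "only leak when confident", as a Python sketch. The counts in `NAME_COUNTS` are made-up placeholders, not values from the project's data file, and `confident_guess` and the 0.95 threshold are hypothetical names and numbers picked for illustration.

```python
from typing import Optional

# Hypothetical (male_count, female_count) tallies; NOT taken from the real data file.
NAME_COUNTS = {
    "tracy": (300, 700),
    "john": (995, 5),
    "mary": (3, 997),
}


def confident_guess(name: str, threshold: float = 0.95) -> Optional[str]:
    """Return "male" or "female" only when one side dominates; otherwise None."""
    counts = NAME_COUNTS.get(name.strip().lower())
    if counts is None:
        return None  # name not in the table: say nothing
    male, female = counts
    total = male + female
    if total == 0:
        return None
    if male / total >= threshold:
        return "male"
    if female / total >= threshold:
        return "female"
    return None  # ambiguous names such as "Tracy" never reach the end user
```

With these placeholder numbers, "Tracy" comes back as `None`, so nothing downstream gets a chance to address me as Ms.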
Gender identity is actually quite a tricky topic and should be approached carefully. I would discourage anyone from trying to use this library; the real world doesn't fit neatly in your multiple-choice view of gender. For more information, please watch this great talk: http://vimeo.com/61172068.
I can't imagine making a dating site these days: "I'm (name text field), a (sexual identity text field) interested in meeting (sexual identity text field) for the purpose of (dating/friendship/sex/swinging/whatever text field)." Good luck with that business logic. Maybe we do need intelligent agents for this stuff.
> Without knowing what data it's been trained on it's of questionable use.
Literally the very first line of the README has a link to the source data. It contains name frequency data for a number of countries and the README clearly indicates you can provide a country of origin when doing lookups.
> As an aside it's worth noting that as this library is GPL3 it means you can't use this code in any non-GPL product.
Agreed, but putting GPL code into your internal code is incredibly risky because of the vagaries of what counts as distribution under the GPL (for example if you distribute your code to a third-party security auditing company).
first name, birthdate, country/region code to allow for regional variations.
Would make a nice exercise in fuzzy decision processes, but I suspect it isn't a great idea: you'd be better off leaving that field as "unknown" by default and writing "Dear Sir/Madam" if it is unknown.
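The "unknown by default" fallback is about this much code. A Python sketch only: `greeting`, the `gender` field values, and the title mapping are all hypothetical, and the field is assumed to be filled in by the user rather than guessed.

```python
def greeting(surname: str, gender: str = "unknown") -> str:
    """Build a letter opening; fall back to a neutral form when gender is unknown."""
    # Hypothetical mapping; anything not listed here gets the neutral greeting.
    titles = {"male": "Mr.", "female": "Ms."}
    title = titles.get(gender)
    if title is None:
        return "Dear Sir/Madam,"
    return f"Dear {title} {surname},"
```

So `greeting("Platt")` gives "Dear Sir/Madam," and you only get "Dear Mr. Platt," if the field was explicitly set to "male".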
Frankly, it is none of your business what the sex or gender of a user is. I understand that sometimes there is money to be made by collecting this information, but it is also alienating and just plain irrelevant (and I think there is also money to be made in recognizing that people can be fluid).
You can give your users an option to provide you with these details but guessing/requiring is not a good practice.
On a side note, it's interesting that the most common gender neutral title is Dr.
I think you're looking at this through too narrow a use case. I agree you shouldn't be taking individual people and guessing what their genders are. And you should minimize the instances where gender is even relevant in your application.
But what if you have a whole bunch of data and want to do some aggregate statistics? "Do women use our product?" is a perfectly reasonable question to ask yourself. You don't need it to be exact, and it's certainly not reasonable to ask every user. So you use some heuristics and you get some useful data.
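For the aggregate case, something like this Python sketch is all you need. It assumes you already have one per-user label ("male", "female" or "unknown") from whatever heuristic you trust; `aggregate_shares` is a hypothetical helper, not part of any library.

```python
from collections import Counter
from typing import Dict, Iterable


def aggregate_shares(guesses: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of users per label, keeping "unknown" visible."""
    counts = Counter(guesses)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Report the unknown share too, so nobody mistakes the estimate for a census.
    return {label: counts[label] / total for label in ("male", "female", "unknown")}
```

Nothing here is attached back to an individual user; you only ever look at the fractions.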
On the one hand, gender corresponds to identification and behavior, which this predicts (more so than biology). On the other hand, this produces binary output, and sex is more generally accepted to be binary than gender.
So, I think 'sex machine' is appropriate.
Okay, three options, but 'andy' really corresponds to 'unknown', not 'person of androgynous gender'.
Sex isn't as binary as you'd think. There are people with ambiguous genitals, so that doesn't work. There are people with genitals that don't match the sex of the rest of their body. Genes then? There are people that have male genes and female bodies, and vice versa. There are people with both male and female genes (XXY).
I know that most of us already know this, but it is worth repeating: sex is biological, gender is cultural (for instance, "la montaña" - the mountain is feminine in the Spanish language). A "sex machine" would tell you whether you were dealing with a biological male or biological female, or something else, but it would not tell you the gender.
Of course you're technically right, but the pedantry is out of control. In reality, when speaking English in a professional (and pretty much any other) setting, "sex" means "gender."
I don't think it's worth repeating. I'd say it's repeating this kind of misplaced vernacular revisionism that's making us (in which I controversially refer to the disparate collection of users on HN as a single group) look even more anal and oversensitive than we actually are.
A non-zero quantity of people have a gender that is separate and different to their sex. By writing off the distinction between sex and gender as "vernacular revisionism", you are contributing to making the lives of a non-zero quantity of people worse than they need to be. Erasure sucks, please don't perpetuate it.
This is exactly what I am looking for! I run a tech conference and am interested in seeing what percentage of our attendees are male or female. I only have names for historical data, so this should help give a somewhat close approximation of sex!
Haha, wow. It's like someone said, "Hey, what are people reacting incredibly poorly to right now?", then took the answer, and built an almost-useful library with a funny but obviously-destined-to-offend-lots-of-people name.
Really though, geek PC hilarity aside... with so many collisions and uncertainties, this just isn't a practical approach.
We tried to solve a similar problem with our app, too. We were trying to generate questions based on person and occasion (think: "What should I get my boyfriend for his birthday?").
It got interesting when occasions didn't warrant possessives ("What should I get my boyfriend for Christmas"), and when language factors were considered ("What should I get mi abuelo for su cumpleaños"). We decided to try and crowdsource it, which worked OK: essentially we left the occasion empty, and if the person wanted to attach the gender-based possessive to it, they could. Otherwise, we would guess with what information we had. We figured, over time, we could actually create a service where we could sell that information (GaaS: Grammar as a service?).
Turns out, people just wanted to be able to write their own titles, and we quickly trashed the idea in the early phases.
The answer is convention over configuration. See, if we institute a societal convention that your gender is derived from your first name (automatically by a Ruby program), it will save a lot of time and energy and make the world more DRY.