Sex Machine: Get gender from first name in Ruby (github.com)
47 points by _pius 1704 days ago | 54 comments

As a man named Tracy I would just throw out a caution to anyone thinking of using this. At this point in my life I don't really care about people calling me Ms. Tracy Platt. But what it does do is immediately signal that I don't really need to pay attention. Fake familiarity doesn't usually work too well.

As a side note I would like to mention that having a girl's name does have some benefits. For example my wife can handle bank account issues over the phone for me. Also some near misses - I was assigned a locker in the girls' locker room in junior high but they caught on before I could use it, and once a telemarketer called to offer me a spot in an all-women's resort spa. The only time I said yes to a telemarketer and I got rejected.

Me and my significant other frequently impersonate each other when dealing with banks and hospitals over the phone (we're a gay couple). It honestly never occurred to me that straight folks can't get away with this. How unfortunate; it's quite a time saver!

We can, it's just a little more tricky... example being while looking up plane ticket information for an S.O. I'm usually her "personal assistant".

My experience is that the person on the phone usually doesn't care, as long as you have all the relevant info.

My full name is Matthew Stephen Trout.

This means I can filter a substantial percentage of junk mail simply by throwing away anything addressed to 'Ms. Trout'.

... or Grout, Sprout, Trent or any of the other hilarious misspellings people also seem to manage.

Less OT: Using such a thing to pre-select the probably-right option on a gender picker might actually be useful without the false selections causing the "fake familiarity" problem.

I just checked the data file behind this project and Tracy shows up as both a male and female name. If I were to use this, I would only ever let the analysis "leak" to the end-user if it were extremely confident that it's a strong match.
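A minimal sketch of that "only leak when confident" idea, in Ruby. The counts, the `NAME_COUNTS` table, and the `confident_guess` helper are all made up for illustration; none of this is the gem's API or its data.

```ruby
# Hypothetical counts of how often a name was recorded as male or female.
# These numbers are invented for illustration, not taken from the project's data file.
NAME_COUNTS = {
  "tracy"   => { male: 30, female: 70 },
  "matthew" => { male: 995, female: 5 },
}.freeze

# Returns :male or :female only when the dominant share meets the threshold;
# otherwise returns :unknown so nothing uncertain reaches the end user.
def confident_guess(name, threshold: 0.95)
  counts = NAME_COUNTS[name.to_s.strip.downcase]
  return :unknown if counts.nil?

  total = counts.values.sum
  return :unknown if total.zero?

  label, count = counts.max_by { |_, n| n }
  count.to_f / total >= threshold ? label : :unknown
end

puts confident_guess("Matthew")
puts confident_guess("Tracy")
```

With these invented counts, "Tracy" falls below the threshold and stays `:unknown`, which is the behaviour you would want before showing anything to the user.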

Reading the last two sentences made me smile. Of course, the following must be mentioned: http://www.nbc.com/saturday-night-live/video/its-pat/n10133/

Gender identity is actually a quite tricky topic and should be approached carefully. I would discourage anyone from trying to use this library, the real world doesn't fit neatly in your multiple-choice view of gender. For more information, please watch this great talk: http://vimeo.com/61172068.

I can't imagine making a dating site these days: "I'm (name text field), a (sexual identity text field) interested in meeting (sexual identity text field) for the purpose of (dating/friendship/sex/swinging/whatever text field)." Good luck with that business logic. Maybe we do need intelligent agents for this stuff.

I checked how Facebook handles gender the other day and it's still either "Male" or "Female". Strange, because they return gender as a string in API calls.

Came here to say this, thank you.

Without knowing what data it's been trained on, it's of questionable use. What may be an 18-year-old American girl's name may be a 60-year-old German man's name.

The gender of a name can vary heavily by culture and time period, so it would make much more sense for the API to return the data in the form of ranked probabilities.
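Roughly what a ranked-probabilities return value could look like, as a Ruby sketch. The `COUNTS_BY_COUNTRY` table, its numbers, and the `ranked_probabilities` method are hypothetical and only illustrate the shape of the result; they are not how the library works.

```ruby
# Hypothetical per-country counts; the numbers are invented for illustration only.
COUNTS_BY_COUNTRY = {
  "andrea" => {
    italy: { male: 90, female: 10 },
    usa:   { male: 5,  female: 95 },
  },
}.freeze

# Returns an array of [label, probability] pairs, most likely first,
# or an empty array when there is no data for the name and country.
def ranked_probabilities(name, country)
  counts = COUNTS_BY_COUNTRY.dig(name.to_s.downcase, country)
  return [] if counts.nil?

  total = counts.values.sum.to_f
  return [] if total.zero?

  counts.map { |label, n| [label, n / total] }
        .sort_by { |_, prob| -prob }
end

p ranked_probabilities("Andrea", :italy)
p ranked_probabilities("Andrea", :usa)
```

The caller then decides what counts as "confident enough" instead of the lookup collapsing everything into a single label.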

As an aside it's worth noting that as this library is GPL3 it means you can't use this code in any non-GPL product.

> Without knowing what data it's been trained on it's of questionable use.

Literally the very first line of the README has a link to the source data. It contains name frequency data for a number of countries and the README clearly indicates you can provide a country of origin when doing lookups.

> As an aside it's worth noting that as this library is GPL3 it means you can't use this code in any non-GPL product.

The source data is GPL, there wasn't much option.

The data is under a GPL documentation license; why/how would that affect the code?

Can you get probabilities out?

> As an aside it's worth noting that as this library is GPL3 it means you can't use this code in any non-GPL product.

Of course you can. Usage != distribution.

Agreed, but putting GPL code into your internal code is incredibly risky because of the vagaries of what counts as distribution under the GPL (for example if you distribute your code to a third-party security auditing company).

It's probably useful for aggregate statistics. The Jordans and Teagans and Pats are outliers.

As are the Leslies, Shannons, Ashleys, and Tracys.

That makes me think, if you had firstname + birthdate, you could probably be a tiny bit more accurate.

First name, birthdate, and country/region code, to allow for regional variations.

Would make a nice exercise in fuzzy decision processes, but I suspect it isn't a great idea: you'd be better off leaving that field as "unknown" by default and writing "Dear Sir/Madam" if it is unknown.
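For what it's worth, the "unknown by default" approach is only a few lines of Ruby. Everything here is hypothetical: the `KNOWN` table, its entries, and the `guess` and `salutation` helpers are invented to show a lookup keyed on name, country and birth decade with a neutral fallback.

```ruby
# Hypothetical lookup keyed by [name, country, birth decade].
# The entries are invented for illustration, not real statistics.
KNOWN = {
  ["leslie", :us, 1920] => :male,
  ["leslie", :us, 1980] => :female,
}.freeze

# Falls back to :unknown whenever the exact combination is not in the table.
def guess(name, country:, birth_year:)
  decade = (birth_year / 10) * 10
  KNOWN.fetch([name.to_s.downcase, country, decade], :unknown)
end

# Anything other than a definite answer gets the neutral greeting.
def salutation(sex)
  case sex
  when :male   then "Dear Sir"
  when :female then "Dear Madam"
  else              "Dear Sir/Madam"
  end
end

puts salutation(guess("Leslie", country: :us, birth_year: 1985))
puts salutation(guess("Leslie", country: :de, birth_year: 1985))
```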

> As an aside it's worth noting that as this library is GPL3 it means you can't use this code in any non-GPL product.

Well you can always simply ask the author -- if he has got the copyright, he can grant you permission.

Frankly, it is none of your business what the sex or gender of a user is. I understand that sometimes there is money to be made by collecting this information but it is also alienating and just plain irrelevant (and I think there is also money to be made in recognizing that people can be fluid.)

You can give your users an option to provide you with these details but guessing/requiring is not a good practice.

On a side note, it's interesting that the most common gender neutral title is Dr.

I think you're looking at this through too narrow a use case. I agree you shouldn't be taking individual people and guessing what their genders are. And you should minimize the instances where gender is even relevant in your application.

But what if you have a whole bunch of data and want to do some aggregate statistics? "Do women use our product?" is a perfectly reasonable question to ask yourself. You don't need it to be exact, and it's certainly not reasonable to ask every user. So you use some heuristics and you get some useful data.
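Something like this Ruby sketch is what I have in mind for the aggregate case. The `P_FEMALE` table and the `estimated_female_share` method are hypothetical, and the probabilities are invented; the point is that you average per-name probabilities over the whole list and never label an individual.

```ruby
# Hypothetical probability that a given first name belongs to a woman.
# The values are invented for illustration.
P_FEMALE = { "jane" => 0.99, "joe" => 0.01, "tracy" => 0.7 }.freeze

# Estimates the share of women by averaging per-name probabilities.
# Names with no data are skipped and counted separately rather than guessed.
def estimated_female_share(names)
  probs = names.map { |n| P_FEMALE[n.to_s.downcase] }
  known = probs.compact
  share = known.empty? ? nil : known.sum / known.size
  { share: share, known: known.size, skipped: probs.size - known.size }
end

p estimated_female_share(%w[Jane Joe Tracy Xyzzy])
```

Reporting how many names were skipped alongside the estimate makes it obvious how much of the user base the number actually covers.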

I agree. I'm mainly concerned about cases where gender follows you around a website when you're logged in.

"Good Morning Mr. _" and stuff like that should be avoided unless supplied by the user. For your own stats using this library is probably better than asking users for that information.

In quite a few languages, you need to know the gender to write personalised emails, e.g. Cher Joe vs. Chère Jane.

That's a pretty poor definition of "personalised". When people get canned emails which guess their gender incorrectly it just seems a bit sleazy.

"Hi, you don't know me but I want your money."

Shouldn't this be called "Gender Machine"? They get it right in the project description but not in the project name. Weird.

On the one hand, gender corresponds to identification and behavior, which this predicts (more so than biology). On the other hand, this produces binary output[0], and sex is more generally accepted to be binary than gender.

So, I think 'sex machine' is appropriate.

[0] Okay, three options, but 'andy' really corresponds to 'unknown', not 'person of androgynous gender'.

Sex isn't as binary as you'd think. There are people with ambiguous genitals, so that doesn't work. There are people with genitals that don't match the sex of the rest of their body. Genes then? There are people that have male genes and female bodies, and vice versa. There are people with both male and female genes (XXY).

It's probably a cultural thing. "Gender Machine" doesn't invoke the great James Brown song.

I had a friend who insisted "gender" should be used for linguistic use (for example cheese being masculine in French) and "sex" for the boy/girl-ness of a person.

Determining the boy/girl-ness of a chicken is called "sexing" the chick. I see "sex machine" as a machine that sexes by name.

This product is used for linguistic purposes, not for deciding whose head to cut off.

'Male', 'female', and 'androgynous' are sex designations while 'masculine', 'feminine', and 'neuter' are terms of gender.

The code is using the gender of a reference to determine the sex of the referent.

So you could say that they're both right.

"Sex Machine" is a whole lot sexier. It gave me a chuckle.

In other news, searching for "Gender Machine" on Google gives me NSFW ads for sex machines.

I know that most of us already know this, but it is worth repeating: sex is biological, gender is cultural (for instance, "la montaña" - the mountain is feminine in the Spanish language). A "sex machine" would tell you whether you were dealing with a biological male or biological female, or something else, but it would not tell you the gender.

Of course you're technically right, but the pedantry is out of control. In reality, when speaking English in a professional (and pretty much any other) setting, "sex" means "gender."

I don't think it's worth repeating. I'd say it's repeating this kind of misplaced vernacular revisionism that's making us (in which I controversially refer to the disparate collection of users on HN as a single group) look even more anal and oversensitive than we actually are.

A non-zero quantity of people have a gender that is separate and different to their sex. By marking out the differentiation of sex and gender as "vernacular revisionism", you are contributing to making the lives of a non-zero quantity of people worse than they need to be. Erasure sucks, please don't perpetuate it.

When you say things like "updated his profile" in your app, you're talking about individuals, often times to them. Those people have feelings, regardless of how "professional" the context is.

So don't assume anything unless you want to exclude people who don't fit in that binary box. It's not about being pedantic, it's about being considerate.

This is exactly what I am looking for! I run a tech conference and am interested in seeing what percentage of our attendees are male or female. I only have names for historical data, so this should help give a somewhat close approximation of sex!

Haha, wow. It's like someone said, "Hey, what are people reacting incredibly poorly to right now?", then took the answer, and built an almost-useful library with a funny but obviously-destined-to-offend-lots-of-people name.

Really though, geek PC hilarity aside... with so many collisions and uncertainties, this just isn't a practical approach.

We tried to solve a similar problem with our app, too. We were trying to generate questions based on person and occasion (think: "What should I get my boyfriend for his birthday?").

It got interesting when occasions didn't warrant possessives ("What should I get my boyfriend for Christmas"), and when language factors were considered ("What should I get mi abuelo for su cumpleanos"). We decided to try and crowd source it, which worked ok: essentially we left the occasion empty, and if the person wanted to attach the gender-based possessive to it, they could. Otherwise, we would guess with what information we had. We figured, over time, we could actually create a service where we could sell that information (GaaS: Grammar as a service?).

Turns out, people just wanted to be able to write their own titles, and we quickly trashed the idea in the early phases.

I submitted this, but I'm not the author of the library.

I agree with many of the concerns and limitations brought up here, most notably the fluid, non-binary nature of gender. That said, thoughtful application of probabilistic guesses about gender can add value in certain situations. For instance: http://source.mozillaopennews.org/en-US/learning/freeing-plu...

Any reason you called it "Sex Machine"?

How is this useful?

The answer is convention over configuration. See, if we institute a societal convention that your gender is derived from your first name (automatically by a Ruby program), it will save a lot of time and energy and make the world more DRY.

Let's institute a convention to refer to everyone by a unique identifier. Oh wait, that's prison.

Ohhh the controversy that would be generated if this gem ever got big.


I can see this being useful if perhaps you're attempting to target with maybe some email marketing, as long as you're content with it being 70-80% accurate.

Funny. I had plans to build one of these in a month or so's time. Now I can crib the answers. Cool.

>> d.get_gender("Álfrún")

Out of curiosity, why did you choose that name?

This gem is too clever by half.

Looking forward to the upcoming submissions

1. Rename "Sex Machine"
2. Why Women Don't Like Rubyists
3. Don't Publicly Shame the Person Who Suggested the Name Change
4. Take Your 'Sex Machine' and Shove It

Iiiiit's Pat!
