
Show HN: List of movie and book characters I use during application development - NameNickHN
https://github.com/appointmind/fakenames
======
spdustin
Would you humor a fellow HNer and tell me if you're in your early forties?

I happen to be working on a toy machine learning project that, based on the
fictional characters known by someone, predicts their approximate age. Your
list is the first organic validation set that happened onto my machine!

~~~
pjmorris
Fantastic. Want to try a smaller dataset? My stock names start with Alfred E.
Neuman, George Tirebiter, and Ted Nugent.

~~~
spdustin
I had Alfred E. Neuman and Ted Nugent in the data, but I really need more
music references.

It guessed early 70's.

It's written to guess early/mid/late subsets of ten-year ranges, and actually
does have some data for kids/teens. Mostly it had fictional characters in
movies, books and television but some singers snuck in there when I had my
family fill out Excel worksheets, heh!

Any advice on how I could collect this kind of data? A survey on Ask HN?
Mechanical Turk?

EDIT: the only features it uses are fictional names that one can rattle off in
one sitting and the specific age to use for the label. No gender or other
demographic information. So that's what I would collect, just a list of
fictional names a submitter can think of in one sitting, and their age. It's
basically a form of supervised topic classification training seen in other ML
tutorials, but using the age as the training set topic label. I'm
experimenting with enriching the data afterwards with media (book/movie/TV
flags) to see if that feature improves its performance, but I'm teaching a
class this week and don't have any spare time to work on it.
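
For readers who want to picture the setup, here is a minimal sketch of a names-to-age-bucket classifier. It is not the project's actual code: it assumes a plain naive Bayes model with add-one smoothing, and the sample rows and bucket labels are invented purely for illustration.

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (names, age_bucket) pairs, one pair per submission."""
    name_counts = defaultdict(Counter)  # bucket -> how often each name was listed
    bucket_counts = Counter()           # bucket -> number of submissions
    vocab = set()
    for names, bucket in samples:
        bucket_counts[bucket] += 1
        for name in set(names):
            name_counts[bucket][name] += 1
            vocab.add(name)
    return name_counts, bucket_counts, vocab

def predict(model, names):
    """Return the bucket with the highest naive Bayes log score."""
    name_counts, bucket_counts, vocab = model
    total = sum(bucket_counts.values())
    best_bucket, best_score = None, -math.inf
    for bucket, n in bucket_counts.items():
        score = math.log(n / total)  # prior from submission counts
        denom = sum(name_counts[bucket].values()) + len(vocab)
        for name in set(names):
            if name in vocab:  # names never seen in training are ignored
                score += math.log((name_counts[bucket][name] + 1) / denom)
        if score > best_score:
            best_bucket, best_score = bucket, score
    return best_bucket

# Invented toy data: names are the only features, the age bucket is the label.
TOY_SAMPLES = [
    (["Alfred E. Neuman", "George Tirebiter"], "early 50s"),
    (["Alfred E. Neuman", "Ted Nugent"], "late 70s"),
    (["Phil Dunphy", "Liz Lemon"], "early 30s"),
    (["Liz Lemon", "Ron Burgundy"], "early 30s"),
]
model = train(TOY_SAMPLES)
print(predict(model, ["Liz Lemon"]))
```

With this little data the prior dominates, which is the same problem as having too few samples per name.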

~~~
spdustin
I see after going over your comment history that you're 52. Now I feel bad!

Looking at my data, only one person had Ted Nugent. My uncle, who is now 77. I
have new respect for him. A handful of Alfred E. Neuman references are in
there, wildly varying ages. It really underscores the value of a comprehensive
training set to see how wildly off some predictions are when there aren't
enough samples. Plus, just two vectors (as in your case) do not a
classification make.

It does make me wonder about the validation though... I should try validating
the model with varying numbers of input names to see if there's a baseline
where it's able to reliably predict age. I think that's where media would come
in handy...

~~~
pjmorris
My informal theory is that everyone's favorite music is what they were
listening to when they were in high school. I'd guess that's a less well-
specified version of your idea, but I think including musicians/bands would
improve your results. The point someone else made about having access to all
of prior history is valid though; my George Tirebiter reference came from
listening to 'Don't Crush That Dwarf, Hand Me The Pliers' by The Firesign
Theatre, which is a decade or so older than I'd be expected to know if you
just knew my age.

------
benten10
Nice! May I ask if you collected those manually? Perhaps they should be
sorted/divided according to the genre they're from? Books/Movies/Tv-shows,
etc?

This list made me realize a sort-of annoying (sometimes) tendency I seem to
have developed. It appears that my first reaction to cool things is now not
wonder but 'I need to engineer the shit out of it'. My first thought after
looking at the list was not 'wow, cool', but more of 'so if I use Named Entity
Recognition, and a large corpus, I could have tens of thousands of such names
in hours. Maybe I can catch up on computational linguistics literature on the
issue, and even identify the relative importance of characters on the text.
Should be a day-long project'. Need to learn to enjoy things for what they
are, sigh.

~~~
mercer
Haha, I've noticed the same thing.

My approach is in part to learn to accept that it's what I'm like, it's
something that fortunately helps put bread on the table, and fighting it too
much seems pointless. And the other part consists of forcibly turning it off
at times, through meditation or other activities, and seeing if it actually
benefits me or if I'm just trying to be something I'm not.

So far I lean towards 'accepting who I am' with the occasional and very
necessary break. It's only when I become too 'meta' about this process itself
that I get truly unhappy (trying to engineer my periods of non-engineering,
and then to force myself to not engineer this process, and so on).

In fact, it's all the 'meta' stuff in general that seems to be a bigger
problem than any of my natural urges. But I digress...

~~~
benten10
Yeeshh. I should see a therapist for _that_ if nothing else. All that
meta-ness, which I guess might(?) arise from insecurity, often makes me stop saying
things other people would say, because I'm too conscious about the 'meta-
ness'. People sort of use the heuristic that the 'metaness' is a quick way to
identify a weirdo, so I often present myself as a lovable unknowing buffoon
instead. Or maybe none of it is true, and I just make these things up to make
myself feel better about myself.. Hmmm...

See what I mean? ; )

~~~
mercer
I'd say it's mostly normal behavior: a very sensible desire to not be a
weirdo, as being branded one can have bad consequences. But I suppose the more
insecure one is, the more likely it is that this 'internal monologue' becomes
a problem. And of course it can be argued that what is normal is not
necessarily good or healthy :-).

My favorite (short) novel illustrating this 'meta-existence' is Notes From
Underground. I read it in a period where I was very insecure and becoming more
and more withdrawn. Seeing on paper my exact thought process and how it
negatively bleeds into my (social) behavior and becomes a self-fulfilling
prophecy was very confronting, and a great warning and incentive to snap out
of it somehow.

I think the problems arise in a way similar to addiction: the 'meta' becomes
habitual, then compulsive and overpowering, and is experienced as negative,
perhaps even debilitating. But also gives a form of comfort. So we keep doing
it.

A certain degree of 'meta', of self-reflection, is fine, perhaps even necessary in
the complex societies we inhabit, but especially in bad times it can become an
addiction, making those bad times even worse.

I strongly believe being alone is crucial for our mental health. Or to 'just
be', I guess. And being 'meta' in your head is like constantly having
conversations of sorts with yourself or with others. It's not being alone.

In fact, it's worse than being alone, because your mind can puppeteer all
these others, so you really just end up projecting your insecurities and
judgments back onto yourself, except with more authority, because you're
imagining your partner, parent, boss, or friend doing it, which somehow makes
it seem more objective and real and painful.

------
tickthokk
Nice list, for just names it's a great resource. I see some umlauts, spaces in
names, punctuation in names, and it's clearly splittable for first/last name
fields. I'm not sure what other ground could be covered that someone would
need to account for.

I'm not "book" cultured, so a lot of names I don't recognize, but nice shout
outs to 30 Rock and Anchorman :p

Github complains that it's not a properly formatted CSV file. Maybe consider a
TSV? It'd probably still complain.

I've yet to use it, but it's been in my back pocket for when I need it. This
PHP package looks nice if you need more than just names:
[https://github.com/fzaninotto/Faker](https://github.com/fzaninotto/Faker)

~~~
NameNickHN
> it's clearly splittable for first/last name fields.

Yes, I need that for a simple script that generates email addresses and links.
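
A rough sketch of what such a script could look like, not the actual one: it assumes the last space separates first and last name, that accents should be stripped for the email local part, and that `example.com` is an acceptable placeholder domain. All function names here are hypothetical.

```python
import unicodedata

def split_name(full_name):
    """Split on the last space: everything before it counts as the first name."""
    first, _, last = full_name.strip().rpartition(" ")
    return (first, last) if first else (last, "")

def ascii_slug(text):
    """Lowercase, strip accents, and keep only ASCII letters and digits."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed.lower() if ch.isascii() and ch.isalnum())

def fake_email(full_name, domain="example.com"):
    """Build first.last@domain from a display name."""
    first, last = split_name(full_name)
    local = ".".join(part for part in (ascii_slug(first), ascii_slug(last)) if part)
    return f"{local}@{domain}"

for name in ["Keyser Söze", "Alfred E. Neuman"]:
    print(name, "->", fake_email(name))
```

Splitting on the last space keeps middle initials with the first name, which is one arbitrary choice among several.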

------
RobertoG
I didn't recognize most of them, I had to search some to get an idea.

If the goal is testing, an improvement would be to add some
internationalization. There are no characters other than English ones there.
You want to be sure that your first foreigner doesn't break your program.

Actually, maybe it would be a nice project to accept pull requests from around
the world and create a standard international data set.

~~~
NameNickHN
> If the goal is testing, an improvement would be to add some
> internationalization.

The goal is rather to be using some fun names during development. For example,
when I work on our appointment scheduling software, I often need to walk
through the scheduling and registration process in order to check the user
experience. Instead of using the tired old John Smith123 and
john123@example.com I can use some names that I fondly remember from a book or
movie.

~~~
RobertoG
Fair enough.

Can I suggest adding Iñigo Montoya at least?

~~~
NameNickHN
Can't. Haven't read the book yet and was unimpressed with the movie. Sorry.
;-)

~~~
tremendo
Is it only "good guys"? I noticed several WOT names but only from the "light"
side, no Forsaken (maybe harder to get last names) or Padan Fain.

~~~
NameNickHN
I did indeed omit characters I didn't like. Those are not necessarily the
villains. There are quite a few of them in there as well.

------
vbsteven
Nice list. If I need to generate names for sample data I usually just use the
Faker library.

When I'm writing database fixtures for use in tests, I like to manually choose
names from movies/tv-shows for related entities.

For example for an Account with multiple Users I will pick Phil Dunphy for the
owner role, Claire Dunphy for the admin role and Luke/Haley/Alex Dunphy for
regular user roles.
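
As a sketch of that fixture idea, assuming a hypothetical `Account`/`User` model built from plain dataclasses (the class names, role strings, and account name are made up for illustration, not taken from any real project):

```python
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    role: str  # hypothetical role names: "owner", "admin", "user"

@dataclass
class Account:
    name: str
    users: list = field(default_factory=list)

def dunphy_account():
    """One account whose users map a TV family onto the three roles."""
    account = Account(name="Dunphy household")
    account.users.append(User("Phil Dunphy", "owner"))
    account.users.append(User("Claire Dunphy", "admin"))
    for first in ("Luke", "Haley", "Alex"):
        account.users.append(User(f"{first} Dunphy", "user"))
    return account

fixture = dunphy_account()
print([(user.name, user.role) for user in fixture.users])
```

The shared last name makes it obvious at a glance which users belong to the same account in a failing test's output.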

~~~
NameNickHN
> I like to manually choose names from movies/tv-shows

I knew I couldn't be the only one. And using family members to illustrate
different user roles is quite clever.

------
inanutshellus
At work we needed a "clean" dataset for a five-character code. We wanted it to
be something you could say out loud, e.g. "Hey, are you working on FORKS?"
"No, I'm working on CHUCK", so random wasn't an option, and we were afraid an
algorithm, like "consonant-vowel-consonant..." would randomly generate naughty
words.

We ended up using our customers' first names and it was a disaster. We had all
kinds of joke entries put in, like "JERK"... My favorite customer name was
"POOP LENGTH". lol. /facepalm.

Anyway, so in this multimillion dollar enterprise application we're showing
"POOP" to the whole company.

At least it was an intra-enterprise-only app.
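
For what it's worth, the consonant-vowel idea can be made safer with a blocklist. This is only a sketch under assumptions: the letter sets, the tiny blocklist, and the function names are placeholders, and a real deployment would need a far longer reviewed word list.

```python
import random

CONSONANTS = "BCDFGHJKLMNPRSTVWZ"
VOWELS = "AEIOU"
# Placeholder blocklist; a real one would be much longer and reviewed by people.
BLOCKLIST = {"POOP", "JERK", "BUM"}

def make_code(rng, length=5):
    """Alternate consonant and vowel, starting with a consonant."""
    return "".join(
        rng.choice(CONSONANTS if i % 2 == 0 else VOWELS) for i in range(length)
    )

def is_clean(code, blocklist=BLOCKLIST):
    """Reject any code that contains a blocked word as a substring."""
    return not any(bad in code for bad in blocklist)

def clean_codes(count, seed=0):
    """Generate `count` distinct pronounceable codes that pass the blocklist."""
    rng = random.Random(seed)
    codes = set()
    while len(codes) < count:
        code = make_code(rng)
        if is_clean(code):
            codes.add(code)
    return sorted(codes)

print(clean_codes(5))
```

The substring check is deliberately strict, so it also rejects blocked words hiding inside a longer code.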

------
nudpiedo
Is that worth sharing in this community? Has that ever been a problem
worth mentioning to someone? I am only aware of problems related to those names
when a living person feels their name/image is being used in a defamatory or
unauthorized way, but I think anyone can find a fictional or historical name
for that task by herself (or generics such as John Smith/Max Mustermann).

~~~
NameNickHN
> Is that worth sharing in this community?

Since it has been starred and forked on Github I think the answer is yes.

> Has that ever been a problem worth mentioning to someone?

It has been to me. When I work I want to write lines and not think about what
name to use when testing. I have a simple script that creates a link to my
applications so that the registration forms get pre-filled.
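
A minimal sketch of such a pre-fill link, assuming the registration form reads its initial values from query parameters. The parameter names `first_name`, `last_name`, and `email` and the base URL are hypothetical, not the application's real ones.

```python
from urllib.parse import urlencode

def prefill_link(base_url, first_name, last_name, email):
    """Return a URL whose query string carries the values to pre-fill."""
    query = urlencode(
        {"first_name": first_name, "last_name": last_name, "email": email}
    )
    return f"{base_url}?{query}"

print(
    prefill_link(
        "https://example.com/register",
        "Keyser",
        "Söze",
        "keyser.soze@example.com",
    )
)
```

`urlencode` percent-encodes non-ASCII characters as UTF-8, so names with umlauts survive the trip through the URL.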

------
anotherevan
My go-to name is Keyser Söze, which I use whenever I'm writing examples in
documentation and such. It also has the advantage of containing a Unicode
character.

------
onion2k
This isn't a great list because it assumes far too much about what a name is.
Patio11 wrote a great blog post about what developers frequently get wrong
when it comes to people and names;
[http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-b...](http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/)

~~~
patio11
I'd encourage you to phrase feedback of this nature in the general form "Thank
you for doing free work on behalf of the community, which has certainly made
no one's life worse and some people's lives better. I feel like this list
could be even more useful if you added names like ... and/or restructured the
fields you store for names to resemble ... The reason for this is that the
existing list is generated from a subset of fiction which is broadly
representative of part of the global population but which fails to exercise a
lot of consequential cases commonly encountered when working with names. For
more details, see $LINK."

Optionally: "If you'd like, I'd be happy to do some of the legwork for you
there. Would you be open to receiving a pull request?"

------
chris_wot
You really need to add Jack Reacher.

------
_mc
Please add Phoebe Buffay & Regina Phalange .. I will send a pull request
maybe :D

~~~
NameNickHN
Thanks. Added.

------
nazarewk
Love the beginning :P

~~~
NameNickHN
I edited the order of names before creating the repo on Github, to be honest.
:-)

