

How we scaled to generate personalities for 200 million people - enoex1
http://vasir.net/blog/infrastructure/five-labs-generating-200-million-personalities

======
tulpa
Nice notes about managing cache invalidation! Interested in seeing where you
guys go with your product.

------
seanalair
Very interesting. I like the idea of putting everything on one server so you
can just spin more up.

------
tannerc
Nice insights from a technical standpoint, though I'm more interested in the
machine learning aspect. Was it dictionary-based? How does the system account
for sarcasm or the billion+ meme/BuzzFeed posts?

~~~
enoex1
I'll be posting a follow-up about the machine learning bit in the near future.
It uses not just words, but also phrases. For the meme/BuzzFeed posts, more
weight is given to content you write vs. links/articles you post (and if you
do share a link, we only take into account what you say about it, not the
content of the BuzzFeed post itself).
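A minimal sketch of the weighting idea described above, assuming features are word unigrams and bigrams ("words and phrases") and that a post sharing a link contributes only the user's own comment, at a lower weight. The function names, the post dictionary shape, and the weights `AUTHORED_WEIGHT`/`LINK_COMMENT_WEIGHT` are hypothetical illustrations, not Five Labs' actual implementation:

```python
import re
from collections import Counter

# Hypothetical weights: the thread only says authored text counts more than
# link shares; these exact values are assumptions for illustration.
AUTHORED_WEIGHT = 1.0
LINK_COMMENT_WEIGHT = 0.5


def ngrams(text, n_max=2):
    """Return lowercase word n-grams (unigrams and bigrams by default)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def weighted_features(posts):
    """Sum weighted n-gram counts over a user's posts.

    Each post is a dict with 'message' (what the user wrote) and an optional
    'link'. Only the user's own message is tokenized; the linked article's
    content (and the URL itself) is never counted.
    """
    counts = Counter()
    for post in posts:
        weight = LINK_COMMENT_WEIGHT if post.get("link") else AUTHORED_WEIGHT
        for gram in ngrams(post.get("message", "")):
            counts[gram] += weight
    return counts


posts = [
    {"message": "I love hiking"},
    {"message": "so true", "link": "http://example.com/listicle"},
]
features = weighted_features(posts)
```

Here `features["i love"]` is `1.0` (written by the user), `features["so true"]` is `0.5` (a comment on a shared link), and nothing from the linked page appears in the counts.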

It doesn't really try to distinguish sarcasm. Depending on the sample size
(ours used 75k people with ~750m words/phrases), it could conceivably detect
sarcasm. Yeah, totally. /s (Maybe, but probably not)

The study itself is published at
[http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783449/](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783449/)

------
poezn
Posts like this are what I like about Hacker News! It would be great if there
were more examples like this, where organizations share their setups for
real-world projects and presences.

------
applecart
Great post - nice layout of the process used. Thanks for the in-depth look at
what you did to make it work on such a large scale.

------
popwarsweet
Will you be posting any statistics on users' personalities? (Anonymized, of
course.)

P.S. Your name is hazardous bra

