

MailChimp's Project Omnivore: genetic algorithm predicts email abuse rates - bentlegen
http://www.mailchimp.com/blog/project-omnivore-declassified/

======
hadley
But why genetic algorithms? It seems odd to start with something so complex
and so slow instead of starting with something simple (e.g. generalized linear
models) and then working your way up if it is not good enough. I suspect there
are many classical statistical techniques and modern machine learning
algorithms that would give better results, be faster, and rest on
well-understood principles that can be used to diagnose the model.

~~~
gnosis
Genetic algorithms (and other evolutionary algorithms) aren't necessarily
complex or slow. They can get faster and better results than traditional
algorithms. And, best of all, you don't have to understand your problem domain
to use them.

This last point is very important. You could just throw an EA at a problem
without understanding it at all. Just give the EA the task with some variables
to optimize and it'll come up with solutions.

You do have to come up with an automated way of judging which solution is
best, but that's usually much easier than understanding the problem domain
well enough to model it with a traditional technique.
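
As a rough illustration of "variables to optimize plus an automated judge",
here is a minimal sketch of an evolutionary loop. It is not MailChimp's
Omnivore code; the `evolve` function, the `score` judge, and the `TARGET`
values are all made up for the example, and a real fitness function would
score candidates against actual data.

```python
import random

def evolve(fitness, n_vars, pop_size=40, generations=60,
           mutation_scale=0.3, seed=0):
    """Toy evolutionary loop: keep the fitter half, refill with mutated children."""
    rng = random.Random(seed)
    # Start from random candidates; no knowledge of the problem is built in.
    population = [[rng.uniform(-5, 5) for _ in range(n_vars)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[: pop_size // 2]  # survivors are kept unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            mom, dad = rng.sample(survivors, 2)
            # Uniform crossover (each variable from either parent) plus
            # a small Gaussian mutation.
            child = [rng.choice(pair) + rng.gauss(0, mutation_scale)
                     for pair in zip(mom, dad)]
            children.append(child)
        population = survivors + children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

# Hypothetical judge: the closer a candidate is to TARGET, the higher its score.
# The loop above only ever sees the score, never TARGET itself.
TARGET = [1.5, -2.0, 0.5]

def score(candidate):
    return -sum((x - t) ** 2 for x, t in zip(candidate, TARGET))

best, history = evolve(score, n_vars=3)
print("best candidate:", [round(x, 2) for x in best])
print("best score, first vs. last:", history[0], history[-1])
```

Because the fitter half survives unchanged each round, the best score never
gets worse from one generation to the next; whether it gets good enough is a
separate question that only the judge can answer.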

Take a look at Monica Anderson's talks below for a high-level overview of some
of the advantages of using EAs and other "model-free methods":

<http://videos.syntience.com/>

~~~
hadley
Best of all you don't need to understand your problem domain?! This sounds
like a recipe for disaster to me - how do you tell if your model is giving you
nonsense or not?

~~~
cmorrisrsg
We had a fairly conservative cross-validation function that tested the results
of the genetic optimization for a first pass. Following that, Omnivore was put
into "observation" mode, so we could verify that the results it generated were
predictive going forward, and not just biased towards the test set. Over the
course of the past few months, the model has held up quite successfully.

Since accusing a customer of being a spammer without non-heuristic proof can
be an emotion-ridden experience for everyone involved, we were conservative
and optimized for accuracy over efficiency at pretty much every step of the
process.
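
For readers who have not seen the idea, the cross-validation step mentioned
above can be sketched roughly as follows. This is a generic k-fold check, not
the actual Omnivore function: the helper names, the threshold "model", and the
complaint-rate numbers are invented for illustration, and reporting the worst
fold is just one assumed way of being conservative.

```python
import random

def k_fold_scores(rows, train, accuracy, k=5, seed=0):
    """Score a training procedure on k held-out folds it never trained on."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = train(training)
        scores.append(accuracy(model, held_out))
    return scores

# Made-up data: (complaint_rate, is_abusive) pairs for 40 imaginary senders.
rows = [(0.001 * i, False) for i in range(1, 21)] + \
       [(0.050 + 0.001 * i, True) for i in range(1, 21)]

def train(training_rows):
    # Stand-in for the real optimizer: put a threshold midway between the
    # mean complaint rates of the two classes.
    good = [rate for rate, abusive in training_rows if not abusive]
    bad = [rate for rate, abusive in training_rows if abusive]
    return (sum(good) / len(good) + sum(bad) / len(bad)) / 2

def accuracy(threshold, held_out_rows):
    correct = sum((rate > threshold) == abusive
                  for rate, abusive in held_out_rows)
    return correct / len(held_out_rows)

scores = k_fold_scores(rows, train, accuracy)
# A conservative summary: judge the procedure by its worst fold, not its mean.
worst_fold = min(scores)
print("per-fold accuracy:", scores)
print("worst fold:", worst_fold)
```

A check like this only covers data already in hand, which is why watching
predictions on new campaigns afterwards still matters.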

~~~
hadley
I don't doubt the quality of your final model, I just think you might have
found it easier to get there with a better method.

------
mrkurt
We use MailChimp, and I'm impressed that they've managed to make such an
awesome app for something totally un-sexy.

~~~
henning
I love their adorable monkey mascot.
<http://webimages.mailchimp.com/img/classic/bg_campaign.jpg>

~~~
dangrossman
It's not a monkey, it's a chimp.

------
dminor
It's getting more and more difficult to send legitimate email to a list
through your own server these days. MailChimp's free tools have been helpful,
but I think we'll have to bite the bullet and go certified soon.

