> As I'm fairly new to all of this, I'd love to hear how I can turn this into a more viable experiment.

The most important thing is getting an estimate of how good your averages are. Try modifying a couple of things. For example, how does the 67% change with sample size (verification)? How does the number turn out if you feed it biased data, e.g. only male phone numbers (falsification)?
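Both checks can be sketched in a few lines of Python. This is a minimal sketch on made-up data, not the model from the piece. `make_data`, `train_nearest_centroid` and the one-feature nearest-centroid classifier are hypothetical stand-ins for whatever the real pipeline uses. `wald_interval` puts error bars on a single accuracy figure such as 67%. `learning_curve` retrains on samples of several sizes to see how the mean and spread move (verification). `biased_accuracy` trains on one label only (falsification).

```python
import math
import random
import statistics


def make_data(n, rng, p_label1=0.5):
    """Hypothetical synthetic data: one numeric feature per person.
    Label-1 people are drawn around 1.0, label-0 people around 0.0."""
    data = []
    for _ in range(n):
        y = 1 if rng.random() < p_label1 else 0
        data.append((rng.gauss(float(y), 1.0), y))
    return data


def train_nearest_centroid(train):
    """Predict the label whose training mean is closest to x.
    If the training data contains only one label, always predict that label."""
    xs1 = [x for x, y in train if y == 1]
    xs0 = [x for x, y in train if y == 0]
    if not xs0:
        return lambda x: 1
    if not xs1:
        return lambda x: 0
    m0, m1 = statistics.fmean(xs0), statistics.fmean(xs1)
    return lambda x: 1 if abs(x - m1) < abs(x - m0) else 0


def accuracy(model, test):
    """Fraction of test examples the model labels correctly."""
    return sum(model(x) == y for x, y in test) / len(test)


def wald_interval(acc, n, z=1.96):
    """Normal-approximation interval (95% with the default z) for an
    accuracy measured on n test examples."""
    half = z * math.sqrt(acc * (1 - acc) / n)
    return max(0.0, acc - half), min(1.0, acc + half)


def learning_curve(sizes, repeats=20, seed=0):
    """Verification: mean and standard deviation of test accuracy
    over `repeats` fresh training sets of each size."""
    rng = random.Random(seed)
    test = make_data(2000, rng)
    results = {}
    for n in sizes:
        accs = [accuracy(train_nearest_centroid(make_data(n, rng)), test)
                for _ in range(repeats)]
        results[n] = (statistics.fmean(accs), statistics.stdev(accs))
    return results


def biased_accuracy(n=500, seed=0):
    """Falsification: train on one label only, then test on a balanced sample."""
    rng = random.Random(seed)
    test = make_data(2000, rng)
    train = make_data(n, rng, p_label1=1.0)  # every training example has label 1
    return accuracy(train_nearest_centroid(train), test)


if __name__ == "__main__":
    for n, (mean, sd) in learning_curve([10, 100, 1000]).items():
        print(f"n={n:5d}  accuracy {mean:.3f} +/- {sd:.3f}")
    print("95% interval for 67% on 100 test cases:", wald_interval(0.67, 100))
    print("trained on one label only:", biased_accuracy())
```

In this toy setup the spread shrinks as the training set grows, and the one-label run falls back to the test set's base rate (about 50%). An accuracy that stays flat whatever the sample size, or that a lopsided training set matches, is the kind of red flag worth chasing down.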

Thanks. I solemnly swear to not abuse statistics, so I'll play around and update the piece.

The goal is to use this as a hook to get the public interested, then take them along for the ride as I learn. That's why I tried to be really careful to point out all the reasons why this particular model is bad.

I didn't mean to say that you specifically were abusing statistics; it's just very easy to draw dubious conclusions. (Doing this intentionally would be the abuse I was talking about.)

I know, but I still want to take it seriously, and thank you for your criticism.

Thanks to your help, I found a huge, obvious problem that was keeping the accuracy low and remarkably stable no matter what size of training data I tried. Here's the update: http://www.fastcolabs.com/3012908/tracking/im-beating-the-ns...

