What we used for counting is slightly different from what you see in the Twitter widgets (yeah, those tweets are from Twitter directly).
In our backend, we have a pretty conservative filter that matches a bag of phrases, such as "voted for barrack obama", "voted for pres obama", etc. The accuracy is over 95%. Of course, political tweets are full of sarcasm and humor, and Twitter is full of demographic bias. This is just a fun project for us.
We don't remove RT tweets; instead, we count each user only once. If a user retweeted Michelle, they will probably vote for Obama. But if a user has several tweets in favor of Obama, that user is still counted only once.
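The counting scheme described above can be sketched roughly as follows. This is a minimal illustration, not our actual backend code: the phrase list, field names, and labels are hypothetical stand-ins for the real (larger) keyword set.

```python
# Sketch of the counting scheme: a conservative bag-of-phrases match,
# with each user counted at most once. Phrases here are illustrative.
PHRASES = ("voted for barack obama", "voted for pres obama")

def classify(text):
    """Return 'obama' only on a confident phrase match, else None."""
    t = text.lower()
    return "obama" if any(p in t for p in PHRASES) else None

def count_votes(tweets):
    """tweets: iterable of (user_id, text) pairs. Retweets are kept,
    but repeat tweets from the same user are ignored."""
    seen = set()
    counts = {}
    for user, text in tweets:
        label = classify(text)
        if label is None or user in seen:
            continue
        seen.add(user)
        counts[label] = counts.get(label, 0) + 1
    return counts

tweets = [
    ("u1", "Just voted for Barack Obama!"),
    ("u1", "RT: voted for Pres Obama"),      # same user, counted once
    ("u2", "RT voted for pres obama"),       # retweets are kept
    ("u3", "thinking about the election"),   # no confident match: dropped
]
print(count_votes(tweets))  # {'obama': 2}
```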
This is not scientific research, so I didn't compute standard deviations, t-statistics, etc. But I did pull a few hundred tweets from our database and counted how many were misclassified; that's where the accuracy number comes from. The filtering scheme is very simple: classify only when we're confident. Many tweets contain "voted", but we kept only the ones we had strong confidence in and threw away the rest. For the complete set of keywords used for filtering, please feel free to email.
Good observation! It is actually seven-day smoothing to offset some weekly periodicity, but the backward influence is very limited. We also tested this at an hourly granularity without smoothing (the curves become really fuzzy), and it shows very similar results.
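For concreteness, seven-day smoothing of this kind can be done with a centered moving average; the window size and the sample data below are illustrative, not our production setup. A centered window means any point is influenced by at most three days on either side, which is why the backward influence is limited.

```python
# Centered moving average to damp weekly (weekday/weekend) periodicity.
# At the edges, the window shrinks to whatever days are available.
def smooth(daily, window=7):
    half = window // 2
    out = []
    for i in range(len(daily)):
        lo, hi = max(0, i - half), min(len(daily), i + half + 1)
        out.append(sum(daily[lo:hi]) / (hi - lo))
    return out

daily = [100, 90, 95, 300, 110, 40, 50, 105, 95, 100]  # spike on day 4
print([round(x, 1) for x in smooth(daily)])
```

Note that a flat series passes through unchanged, and a one-day spike gets spread over the surrounding week rather than shifted in time.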
There was huge hype around Twitter data one or two years back, followed by suspicion and criticism. Now we should take a more balanced view. Twitter data, and social media data in general, isn't all-powerful, but I believe that if we look at it from the right perspective, we can still learn something meaningful despite its demographic bias. Certainly we have to be very cautious about any conclusion we draw. This is still a work in progress, and we'll keep trying to extract unbiased information from a biased source.