We are continuing to look at ways to improve this system, possibly with Mixpanel.
There's absolutely no indication on Posterous that bots are likely responsible for thousands of the page views shown (which given the long tail nature of Posterous I imagine we're talking about the majority of the page views here for a lot of users).
One user even reported that GA was showing 14 visitors while Posterous was showing 9316:
Given the values are essentially meaningless, why show them to your users at all?
By talking about "deeper insight" rather than openly admitting the data is bad, you're just delaying the inevitable backlash.
These numbers are far from meaningless. At the end of the day the numbers reflect what our servers see -- they track something different from Google Analytics but they are useful as a simple ballpark for how interesting or visited your blog post was. Also, they're realtime, which GA does not provide.
If the majority of page views are generated by bots, then the number of page views you show isn't even a ballpark figure for how interesting or visited the blog post is. It's simply a semi-random number driven by crawler/bot activity that has nothing to do with real human visitors.
If you were simply off by 10% I could accept your answer, but you seem to be regularly off by 200-300%.
Rich Pearson (VP Marketing - Posterous)
@imranghory No - we just don't filter out your own views
and search engine visits
So I built a very simple bot detector based on IP addresses and user agents. With your volume of traffic you can do this too. Be sure to keep updating it (automatically!) and apply the detection retroactively.
The bot detector cut well over 90% of the hits reported in the real time analytics. Users complained that suddenly they were not as popular on twitter as they thought they were, but accepted that the new numbers were more accurate. That's the key benefit you need to explain to your users: accuracy. They care about it a lot more than the vocal minority suggests.
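A minimal sketch of such a detector, assuming hypothetical blocklists (in practice the lists would be generated from published crawler ranges and refreshed automatically, as the comment above stresses):

```ruby
# Minimal bot-detector sketch: flags a hit as bot traffic if either the
# user agent or the client IP matches a blocklist. The lists below are
# illustrative placeholders, not anyone's production lists.
BOT_AGENT_PATTERN = /bot|crawler|spider|slurp|curl|wget/i
BOT_IP_PREFIXES   = ["66.249."] # e.g. a Googlebot range (assumption)

def bot_hit?(ip, user_agent)
  # Missing user agents are treated as bots: real browsers always send one.
  return true if user_agent.nil? || user_agent =~ BOT_AGENT_PATTERN
  BOT_IP_PREFIXES.any? { |prefix| ip.start_with?(prefix) }
end
```

Applying the detection retroactively is then just a matter of replaying logged (ip, user_agent) pairs through `bot_hit?` and recomputing the stored counts.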
P.S. As long as we're here:
request.user_agent =~ /\b(HttpClient|Baidu|Gigabot|Googlebot|libwww-perl|lwp-trivial|msnbot|SiteUptime|Slurp|WordPress|ZIBB|ZyBorg)\b/i
That, theoretically speaking, is supposed to catch most common well-behaved bots. Feel free to copy/paste it if you want, as it is better than nothing, but it probably won't get you in the vicinity of good numbers.
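To see what the pattern actually catches, it can be wrapped in a small helper (the helper name here is mine, not from the comment above):

```ruby
# The user-agent pattern from the comment above, wrapped in a predicate.
BOT_PATTERN = /\b(HttpClient|Baidu|Gigabot|Googlebot|libwww-perl|lwp-trivial|msnbot|SiteUptime|Slurp|WordPress|ZIBB|ZyBorg)\b/i

def bot_user_agent?(user_agent)
  !!(user_agent =~ BOT_PATTERN)
end

bot_user_agent?("Mozilla/5.0 (compatible; Googlebot/2.1)") # => true
bot_user_agent?("Mozilla/5.0 (Windows NT 6.1; rv:5.0)")    # => false
```

Note that the `\b` word boundaries mean only these exact names match; a bot that forges a browser user agent (as mentioned further down the thread) sails straight through.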
Incidentally, since page views are a vanity metric which don't really matter, I don't think that worrying about "good numbers" is that productive either way.
The only reason I was eventually convinced to care for A/Bingo is that artificially suppressed conversion rates can throw statistical tests off, and A/B tests actually create actionable information for my business. For people who blog because they want to be heard, though, would telling them they're more popular than they really are actually hurt them? That strikes me as being close to the canonical white lie, in response to "Does this dress make me look fat?"
It's even worse if you're using it for a company blog (as many startups do), because your blog's page view stats can often become part of investor pitches or sales material to show traction. If you write an advert saying 10,000 people read your blog when they don't, and that influences your customers' buying decisions, you can be prosecuted for misleading your customers.
That said, I know my sites get many thousands of hits from rogue comment spam and email harvesting bots with forged user agents.
Seems to be pretty much par for the course, like when twitter announces a billion new users, most of whom are probably bots and/or quora.
If Posterous can't provide meaningful stats (and I appreciate that filtering out bots may be non-trivial) then they should just avoid providing stats altogether. Bad data is worse than no data as it's much more misleading.
Is it any different from Victoria Secret bras running a cup size or three higher than everyone else, or dress sizes continually shifting downwards, or SAT scores being recentered upwards by 100 points?
Specifically, about tumblr not supporting an email posting option.
View counts are updated every five minutes.
Google Analytics is better at measuring visitors and filtering out impressions triggered by search engine bots, crawlers, or indexers.
See this page for how to set up Google Analytics on your Posterous site: http://help.posterous.com/how-to-add-google-analytics-to-you...
In the US, people refer to the named entity (is). In the UK, they refer to the collection of people that make up the entity (are).
How do UK speakers deal with non-named entities? For example:
My family is/are going out to the movie.
The class was/were getting restless.
These are common gotchas on the SAT in the US; one of their favorite tricks is using plural country names, like "The United States of America (is/are) ...", where in US English "is" is the correct version. Cheap trick that really only tests whether you've memorized ETS's stance on that particular grammar point, IMO, but that's the SAT for you...
Interesting, I didn't know that. I'd have presumed that one would use "is" when referring to the country, and "are" when referring to the group of states. Is this incorrect? (for what it's worth, I'm from the UK ;)
In Australian English (derived from but not identical to the UK):
My family is going to the movie. The class was getting restless.
And the programmer's favourite: The data is corrupt.