Hacker News
Are Posterous Fudging Visitor Statistics? (awesomezombie.com)
53 points by ig1 2508 days ago | 33 comments

Our focus is on building the best product. We are aware of the issue and recommend that you use Google Analytics for deeper insight into visitors and views.

Here is a bit of how the sausage is made: you wouldn't believe how much of a negative response we got from existing users once we went to a fully JavaScript analytics system. So, by sheer volume of user input, we made the decision to return to the original HTTP request method of counting views.

We are continuing to look at ways to improve this system, possibly with Mixpanel.

Do you not think the page view stats are completely misleading your users?

There's absolutely no indication on Posterous that bots are likely responsible for thousands of the page views shown (and given the long-tail nature of Posterous, I imagine that means the majority of the page views for a lot of users).

One user even reported that GA was showing 14 visitors while Posterous was showing 9316:


Whenever Posterous has given a response, the focus has been on disabled JavaScript, etc., rather than the fact that most of these views are likely generated by bots. For example, the email you sent to the user here:


Given the values are essentially meaningless, why show them to your users at all?

Also I imagine you're going to get a very large negative response when you switch to a working method that's going to be along the lines of "where have all my visitors gone".

By talking about "deeper insight" rather than openly admitting the data is bad, you're just delaying the inevitable backlash.

A few weeks ago we had some bugs in new batch processing code that caused some major errors there. These bugs have since been fixed. We are definitely not as far off as 9316 vs 14 in normal operation.

These numbers are far from meaningless. At the end of the day the numbers reflect what our servers see -- they track something different from Google Analytics but they are useful as a simple ballpark for how interesting or visited your blog post was. Also, they're realtime, which GA does not provide.

You're avoiding the question about misleading your users.

If the majority of page views are generated by bots, then the number of page views you show isn't even a ballpark figure indicating how interesting/visited the blog post is. It's simply a semi-random number that depends on crawler/bot activity and has nothing to do with real human visitors.

If you were simply off by 10% I could accept your answer, but you seem to be regularly off by a factor of 200-300%.

Just in case there was any doubt about Posterous not filtering out bots given that Garry hasn't specifically admitted it:


  Rich Pearson (VP Marketing - Posterous)
  @imranghory No - we just don't filter out your own views 
  and search engine visits

Real time or not, misleading numbers are misleading. Being real-time is no 'cover-up' for misleading data. As far as the 'simple ballpark for how interesting or visited your blog post' is concerned: if a bot visited me 1,000 times and just 100 people read it, that doesn't make it interesting. Reporting bot hits is seriously no use.

I had the same exact problem when I built the URL shortener Cligs, and back then there was no way to implement a JS-based solution: the HTTP 301 redirect gave instructions to the browser before it would even get to executing any JS.

So I built a very simple bot detector based on IP addresses and user agents. With your volume of traffic you can do this too. Be sure to keep updating it (automatically!) and apply the detection retroactively.
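
A minimal Ruby sketch of a detector along those lines. This is not the Cligs code: the `Hit` struct, the `BOT_UA` pattern, the `BOT_IPS` list and the sample addresses are all illustrative assumptions. The one point it tries to show is that counts are recomputed from the stored hits, so an updated rule set also applies to old traffic.

```ruby
require "set"

# Hypothetical record of one logged request.
Hit = Struct.new(:ip, :user_agent)

# Illustrative rules only; a real detector would refresh these automatically.
BOT_UA  = /bot|crawler|spider|slurp/i
BOT_IPS = Set["192.0.2.10"]

# A hit is treated as a bot if either its IP or its user agent is listed.
def bot?(hit)
  BOT_IPS.include?(hit.ip) || hit.user_agent.to_s.match?(BOT_UA)
end

# Recount from the stored log, so new rules apply to old hits too.
def human_views(hits)
  hits.count { |hit| !bot?(hit) }
end

hits = [
  Hit.new("198.51.100.7", "Mozilla/5.0 (Windows NT 6.1) Firefox/3.6"),
  Hit.new("198.51.100.8", "Googlebot/2.1 (+http://www.google.com/bot.html)"),
  Hit.new("192.0.2.10",   "Mozilla/5.0 (forged)"),
]

puts human_views(hits)
```

Only the first sample hit survives both checks; the third has a browser-like user agent but comes from a listed IP, which is why the IP list matters alongside the user-agent pattern.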

The bot detector cut well over 90% of the hits reported in the real time analytics. Users complained that suddenly they were not as popular on twitter as they thought they were but accepted that the new numbers are more accurate. That's the key benefit you need to explain to your users: accuracy. They care about it a lot more than the vocal minority suggests.

Wouldn't simply excluding any http request whose user agent contains "googlebot" from the displayed count make it dramatically more accurate without needing a switch to JS-based analytics?

Trust me on this: no. I used to have ABingo ignore hits from several common bots. I then got around to implementing JS challenges. Participants in A/B tests prior to login immediately decreased by 75%, to about what my GA stats showed.

P.S. As long as we're here:

request.user_agent =~ /\b(HttpClient|Baidu|Gigabot|Googlebot|libwww-perl|lwp-trivial|msnbot|SiteUptime|Slurp|WordPress|ZIBB|ZyBorg)\b/i

That, theoretically speaking, is supposed to catch most common well-behaved bots. Feel free to copy/paste it if you want, as it is better than nothing, but it probably won't get you in the vicinity of good numbers.
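
For anyone who wants to see what the pattern does and does not catch, here is a small Ruby check. The `known_bot?` helper and the sample user-agent strings are illustrative assumptions; the third string stands in for a bot that simply claims to be a browser.

```ruby
# The pattern quoted above, unchanged.
BOT_PATTERN = /\b(HttpClient|Baidu|Gigabot|Googlebot|libwww-perl|lwp-trivial|msnbot|SiteUptime|Slurp|WordPress|ZIBB|ZyBorg)\b/i

# True when the user agent names one of the listed bots.
def known_bot?(user_agent)
  user_agent.to_s.match?(BOT_PATTERN)
end

samples = [
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
  "msnbot/2.0b (+http://search.msn.com/msnbot.htm)",
  # A bot that forges a browser user agent looks exactly like this.
  "Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0",
]

samples.each { |ua| puts "#{known_bot?(ua)}  #{ua}" }
```

The first two strings match and the third does not, which is the limitation described above: a user-agent list only removes bots that identify themselves.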

Incidentally, since page views are a vanity metric that doesn't really matter, I don't think that worrying about "good numbers" is that productive either way.

The only reason I was eventually convinced to care for A/Bingo is that artificially suppressing conversion rates can throw statistical tests off, and A/B tests actually create actionable information for my business. For people who blog because they want to be heard, though, would lying to them to tell them they're more popular than they actually are actually hurt them? That strikes me as being close to the canonical white lie, in response to "Does this dress make me look fat?"

The key factor is that for a lot of bloggers, audience is the reason they blog. Audience is the ROI of the time spent blogging. Someone who thinks they've got thousands of visitors may spend more time blogging rather than spending it with their families or working (which they would have done if they only had 10 readers).

It's even worse if you're using it for a company blog (like many startups are doing), because your blog page view stats can often become part of investor pitches or sales material to show traction. If you write an advert saying 10,000 people read your blog when they don't, and that influences your customers' buying decisions, you can be prosecuted for misleading your customers.

I think AWStats has a list of "bot" user-agent regexes. I'd probably start there.

That said, I know my sites get many thousands of hits from rogue comment spam and email harvesting bots with forged user agents.

They're just counting views in a particularly unrealistic but technically not incorrect way.

Seems to be pretty much par for the course, like when twitter announces a billion new users, most of which are probably bots and/or quora.

The test has to be what your users understand that value to mean. I'm betting almost all Posterous users think they're being read by significantly more people than they actually are.

If Posterous can't provide meaningful stats (and I appreciate that filtering out bots may be non-trivial) then they should just avoid providing stats altogether. Bad data is worse than no data as it's much more misleading.

For a site whose key value proposition is vanity - is it really? HotOrNot scales all ratings by (rating / 2 + 5.0), i.e. everybody's above average. According to the post by James Hong where he mentioned this, they seem to have found a notable increase in user satisfaction with the change.
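
A quick worked example of that scaling in Ruby, assuming raw ratings run from 0 to 10 (the scale and the `displayed_rating` name are assumptions; the comment only gives the formula):

```ruby
# Displayed rating as described above: rating / 2 + 5.0.
def displayed_rating(raw)
  raw / 2.0 + 5.0
end

[0, 5, 10].each { |raw| puts "#{raw} -> #{displayed_rating(raw)}" }
```

Under that assumption the lowest possible raw score is shown as 5.0 and a middling 5 is shown as 7.5, so nobody is displayed below the midpoint of the scale.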

Is it any different from Victoria's Secret bras running a cup size or three higher than everyone else, or dress sizes continually shifting downwards, or SAT scores being recentered upwards by 100 points?

I agree with you, but no company's going to leap forward and be honest about numbers; they're going to use whichever makes them look biggest/growing strongest to their users, investors and competitors, and all the numbers are "right" in some context.

"Our goal is to be 100% transparent with everything we do at Posterous, especially when it affects your blog and content" -Sachin cofounder, posterous.com

From: http://news.ycombinator.com/item?id=1309849

They indeed fibbed during their campaign about how posterous is better than blog brand x.

Specifically, about tumblr not supporting an email posting option.

There are many things that trigger a "view" for a post and a site: viewing the blog triggers a view for all posts visible on that page. In addition, a view will be triggered by the blog post page itself, bots and crawlers, and http://posterous.com/explore.

View counts are updated every five minutes.

Google Analytics is better at measuring visitors and filtering out impressions triggered by search engine bots, crawlers, or indexers.

See this page for how to set up Google Analytics on your Posterous site: http://help.posterous.com/how-to-add-google-analytics-to-you...
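
To make the counting rules quoted above concrete, here is a toy Ruby model. The `ViewCounter` class, its method names and the post ids are hypothetical and only follow the help text; this is not Posterous's code.

```ruby
# Hypothetical in-memory counter that follows the rules in the help text.
class ViewCounter
  attr_reader :views

  def initialize
    @views = Hash.new(0)
  end

  # Loading a blog page counts one view for every post visible on it.
  def blog_page_loaded(visible_post_ids)
    visible_post_ids.each { |id| @views[id] += 1 }
  end

  # Loading a single post page counts one view for that post.
  def post_page_loaded(post_id)
    @views[post_id] += 1
  end
end

counter = ViewCounter.new
counter.blog_page_loaded([:a, :b, :c])  # one person opens the front page
counter.blog_page_loaded([:a, :b, :c])  # a crawler fetches the same page
counter.post_page_loaded(:a)            # the person clicks through to post :a

puts counter.views
puts counter.views.values.sum
```

In this model one human and one crawler produce seven recorded views across three posts, which is the kind of gap between "views" and readers that the thread is arguing about.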

I've noticed that as soon as I submit a new post to posterous, it gets about 24 views. I'm quite sure those aren't people.

(Unrelated/offtopic: is "are" or "is" correct in the title?)

It's a cultural difference.

In the US, people refer to the named entity (is). In the UK, they refer to the collection of people that make up the entity (are).

Interesting, good to know.

How do UK speakers deal with non-named entities? For example:

My family is/are going out to the movie. The class was/were getting restless.

These are common gotchas on the SAT in the US; one of their favorite tricks is using plural country names, like "The United States of America (is/are) ...", where in US English "is" is the correct version. Cheap trick that really only tests whether you've memorized ETS's stance on that particular grammar point, IMO, but that's the SAT for you...

"The United States of America (is/are) ...", where in US English "is" is the correct version.

Interesting, I didn't know that. I'd have presumed that one would use "is" when referring to the country, and "are" when referring to the group of states. Is this incorrect? (for what it's worth, I'm from the UK ;)

> My family is/are going out to the movie. The class was/were getting restless.

In Australian English (derived from but not identical to the UK):

My family is going to the movie. The class was getting restless.

And the programmer's favourite: The data is corrupt.

Sorry to reply to myself, but I actually forgot to put my main point in, which was that British English generally refers to the group itself as the entity, whereas it seems that American English refers to the group elements individually.

I didn't know that difference and was wondering the same thing, thanks. It is funny how that subtle difference made me read the title like five times to understand it.

Yes, the poster is from the UK where companies' names like "Posterous" are considered plural:


Terrible title. "Fudging statistics" and "not actively excluding bots" are totally different.

I did actually check a dictionary before I used that term and the definition of "Present or deal with (something) in a vague, noncommittal, or inadequate way" seemed suitable.

Posterous is/are in touch with what users want from a blogging platform, but they seem to be out of touch with how users expect to be treated. [edit: grammar]

Those counts actually include hits by the spiders, too.
