
Google deleted whocalled.us for “Pure Spam” and replaced it with spam - whocalledus
In November Google completely removed whocalled.us from their search results.

It was the first site of its kind, created in 2005 for crowdsourced info about telemarketing numbers. There were scripts for people to utilize in their VoIP boxes, and it was a good honest site.

Google has been trying to weed out telephone spam in their results, and for some reason they deleted the original instead of the useless empty shell sites that spam every possible telephone number combination.

If I search "whocalled.us" on Google now I see #1 the Google+ page for the site, and #2 a WordPress spam page. They deleted whocalled.us with a "Pure Spam" manual action, and literally replaced it with _pure spam_.

whocalled.us has never used spam tactics. That is why copycats quickly beat it in terms of traffic. The first competitor used spammy SEO techniques, and is often the #1 result now for telephone number searches.

I've been making websites since before Google existed, and it feels like my ways are going extinct. Prior to this "Pure Spam" removal, there was a partial action for unnatural links. I was shocked that a search engine sent me a notice that my website would be penalized unless I contacted other websites to have them remove links to my site. Remove links? That is the whole point of the web!

It included some suggestions of "spam" sites with links to mine, and when I clicked one I saw someone's personal blog with some links to their favorite sites. I don't know what world this is, where a search engine tells a webmaster they need to contact a small-time blog owner to have them remove a link recommending your site, but it's not mine.

To me this is still the World Wide Web, where little guys like me can play on the same field as global giants. But in reality, this is Google's game now, and when they kick you off their field, there's not much to do but sulk and go home.
======
mtmail
Your sitemap contains about 1 million URLs. When I look at
[http://whocalled.us/lookup/2104495665](http://whocalled.us/lookup/2104495665)
I see hardly any information other than the phone number itself. And Google
Adsense advertising. Most of the pages look exactly the same. The ones linked
on the homepage contain profanity. The category pages, e.g.
[http://whocalled.us/lookup/sanantonio](http://whocalled.us/lookup/sanantonio)
are just lists of numbers.

From the outside it looks like a typical content database where every entry is
a page with advertising. I can imagine those are no longer in the new Google
algorithm's favor. What makes you think your website deserves to be listed on

The usual process to get Google employees to reconsider indexing the website
again is called a reconsideration request:
[https://support.google.com/webmasters/answer/35843?hl=en](https://support.google.com/webmasters/answer/35843?hl=en)

"I've been making websites since before Google existed, and it feels like my
ways are going extinct."

Yes I think creating content databases and giving every piece of information a
URL and adding untargeted Adsense is no longer a business. Spammers (not
whocalled.us) have been misusing that SEO tactic for too long.

~~~
seba_dos1
> giving every piece of information a URL

Wasn't that supposed to be the whole purpose of URL?

~~~
saurik
This kind of website does not have a URL for every piece of information: it
has a public URL for every potential piece of information, and it expects
those to show up in search results. This would be similar to having every
combination of words show up in your search results as a blank page on
Wikipedia that you can edit to add content. When I use Google to find content,
I want to find actual content, not ten billion placeholders: if the site
doesn't have information on a particular phone number it should return a 404,
not a 200, and not be indexed; there should then be a way to submit
information into the database (actual information, not something totally
useless like "reported") that then creates that URL. From the home page (which
hopefully at this point has high page rank for being a non-spammy resource)
they then should link to a list of "updates", which will be seen to change
often, and the links on the other end of those will also change often, so
Google will pick up new content quickly and efficiently. Yes: I realize that
these websites are trying to rely on the search query as their discovery tool
to get people to add content, and so doing this "harms" them, but if every one
of them does this it becomes a useless discovery tool anyway (as the page of
results is just pages and pages of these placeholders); imagine if every wiki
did this: chaos.
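
A minimal sketch of the 404-versus-200 rule described above, assuming a hypothetical in-memory `database` that maps a number to its reports. The `Report` type, the `has_real_content` check, and the response tuple are illustrative only and are not how whocalled.us actually works:

```python
from dataclasses import dataclass


@dataclass
class Report:
    """Hypothetical record for one reported call."""
    caller_name: str = ""
    comment: str = ""


def has_real_content(reports):
    # Assumption: a number has "actual information" only if at least one
    # report carries a non-empty free-text comment.
    return any(r.comment.strip() for r in reports)


def lookup_response(number, database):
    """Return (status, headers, body) for a lookup page."""
    reports = database.get(number, [])
    if not has_real_content(reports):
        # Placeholder page: answer 404 and ask crawlers not to index it.
        return 404, {"X-Robots-Tag": "noindex"}, "No reports for this number yet."
    comments = [r.comment.strip() for r in reports if r.comment.strip()]
    return 200, {}, "\n".join(comments)
```

With this shape, a URL only starts returning 200 once someone has submitted a real comment for that number.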

~~~
whocalledus
Which website? whocalled.us does not generate or list any URL that does not
have information. Empty pages were never indexed by Google.

If you search Google for a random telephone number you will see a ton of empty
sites who list every possible number. whocalled.us has _never_ done that.

------
klenwell
Sorry to hear this. I always liked whocalled.us. I thought it a useful service.
I used to Google strange numbers I didn't recognize that called my cell phone
(until I just started ignoring them altogether). I didn't really care who
gave me an answer, but among all the copycat sites that quickly popped up, I
came to recognize whocalled.us as a legitimate source of the info I was
looking for. It seemed the most active and least spammy of its class.

Was it perhaps the target of some of the sleazy blackhat SEO tactics that have
been discussed elsewhere on this site?

------
uptown
I can't speak to why they've penalized you, but it sounds similar to what they
did with RapGenius. Rap Genius would pay others to link to them to help
increase their pagerank. Google caught wind of it, and penalized them for
gaming the system.

[http://searchenginewatch.com/sew/news/2321516/rap-genius-no-...](http://searchenginewatch.com/sew/news/2321516/rap-genius-no-seo-genius-lyric-site-fails-to-recover-traffic-after-google-penalty)

I'm not sure why they think you've done the same - but that's likely why they
are blaming you for the links others have created to your site. I agree - if
you did nothing to encourage those links to be created, it's difficult to see
how you should be responsible for their removal, but hopefully the article
I've linked will give you more context as to what Google is probably trying to
weed out from their search results in order to help you get it resolved.

~~~
graeme
It's not just payment. Google also considers an unnatural link profile as
evidence of spammy actions.

This is perhaps reasonable in isolation and in most cases. But it certainly
raises the possibility of negative SEO, where you point large numbers of spam
links to a competitor's site in order to blacklist them with Google.

See this for example:
[http://moz.com/blog/a-startling-case-study-of-manual-penalti...](http://moz.com/blog/a-startling-case-study-of-manual-penalties-and-negative-seo)

I run a site that has received a few tens of thousands of visitors. I checked
webmaster tools. I have a bunch of spam links from btclush.com, marqueefy.com
and some russian sites. I can't actually find the linking pages, btclush.com
itself is blank and all content is on subdomains.

It's not clear to me what I, a small webmaster, should be doing about this.
I'm hoping these sites send spam links to almost every site – in that case
presumably google is aware that site owners are not at fault.

I don't have the time to contact spam sites to tell them not to link to me or
to disavow all these links. Note that the composition of the spam links
changes. The last time I checked different spam sites were linking to me.

------
wallflower
If I had to guess, they are probably going to introduce a similar service to
their Google Voice service soon (e.g. a 'spam' filter that auto
directs/categorizes > voice mail).

Too many ppl I know get spammed by recruiters because they once made a mistake
of putting their real contact phone number on a resume for a major jobs board.

------
z3t4
One huge aspect of the problem is that the browser developers are in bed with
the search engines. Whatever you type in the address bar, it will be
redirected to Google or Bing.

I would almost go so far as to say browsers and search engines are killing the
web. It's no longer a web.

People will tell you to not depend on the search engines. But what do you do
now when you are blocked? Start advertising on AdWords? LOL!

My advice is to build a community. Get people involved with your site, make
them return often.

~~~
pyre
> I would almost go so far as to say browsers and search engines are killing
> the web. It's no longer a web.

How do you want people to find things on the web if not via search engine? Do
you want to go back to the days of the "Web Ring?" Or are people supposed to
crowd source their search via their social graph on Facebook?

~~~
z3t4
I have nothing against search engines. What I'm concerned about is that many
people do not use links or addresses, they just search!

When people no longer use addresses to access stuff, the web will be broken.

Crowd sourced search is actually not a bad idea! Most things are still
discovered through word of mouth and sites like HN. Search engines have a long
way to go before they can actually make better suggestions than humans.

------
sleazebreeze
As a side note, thank you for creating that site. My phone number was sold
some time ago and I regularly get telemarketers calling me and a quick google
search of any unknown number would find your site and let me make the
determination to answer or not.

------
gress
I have been receiving spam calls recently despite being on the national do not
call register. Thanks to google, I wasn't aware of this resource until today.

There is no defense against Google's actions here other than to acknowledge
that they make mistakes and need to take more responsibility given the power
they hold.

If they don't become more transparent about appealing these mistakes,
eventually we as a society should develop a legal recourse. There simply isn't
enough competition in search to enable these kinds of errors to be corrected
through market mechanisms.

------
toddkaufmann
Suppose 100 top spam sites all had a portion of some top legitimate content
site (like NYT). Wouldn't this lower the ranking of the legitimate site?

Ignoring other factors like number of incoming links--which for whocalled.us
is probably a low number (why would anyone link to it?)--it seems like spam
could (temporarily at least) pull down the others somewhat especially once any
results can make it to the first page and get some clicks. In fact if there
are 10 spam sites that link to each other or use some slimy affiliates they
might get better listings.

I use services "like" whocalled.us all the time, usually for nearly every
incoming number I don't recognize, before answering. I've seen whocalled.us
and used it before, others equally often. I've noted a couple times (and even
bookmarked I think) "this site seems less spammy than the others", but I don't
remember which.

If there were some way to differentiate yourself from the others... allow
people to register themselves as not telemarketers, or a business listing, or
say who you are? I don't see how to verify or prevent abuse. I'm sure you've
thought about this much more than me.

------
madsravn
I'm not sure what your site is about. Looking at it right now, nothing really
is stated about the numbers who called. It's a lot of empty fields and
"unknown"s.

I can't see why Google would penalize you like this unless you have been
suspected of doing dirty work.

------
zach
Essentially, this is the way that Google sets the standard for acceptable web
content.

It's subjective, it's vague, and sometimes unfair but that's the burden they
have to bear. Defining the "quality" of a web page for a search has become as
important a factor to web search as any other.

But "search quality" is a leaky abstraction like any other. When someone looks
up a random phone number that just called them, they aren't looking for a blog
post about it. They're looking for a very small amount of information.

Let's imagine we're determining the quality of a web page as a search result.
Say we're looking for the combatants of the Second Boer War, so our query is
"second boer war combatants". Now, if we had a page that listed only and
exactly the names of those combatants, that would seem to be the "answer"
we're looking for. But wait, isn't it better to see a page about the Second
Boer War which has a lot of other information about the conflict? Or a page
about the military history of the most notable combatant? Or even a table of
major historical conflicts with their combatants?

Basically, although it would seem like a precise "answer page" is the best,
there are a lot of other factors. The most popular kind of search results are
often Wikipedia pages, so people seem to want that level of information on a
subject. People sometimes search for something as a way to get to something
else. The nature of the web indicates that pages with links to further
information are more useful than those without them. And of course, as
untrustworthy sites proliferate, the more well-compiled, organized and
correlated information is on a page, the more reliable it is.

So ultimately, I feel like it's inevitable. As search has improved, Google has
to make these kinds of choices, in a quantifiable way, about the nature of a
quality search result. And that ends up, in the tools-make-us stage of the web
we're in, shaping the envelope of what web content is popular, even in some
way acceptable. In turn, the web becomes easier for Google users to use by
becoming easier for Google to index and search effectively.

But this is the more painful reality, as I also know. You get pushed out of
the Google results not by sites with better information, not by sites that are
more relevant, but by sites which seem like they can play the "Google game"
better because they have the resources and profit motive to put together good
metrics for the myriad signals that make you rise in the ranks.

------
tehwebguy
In my opinion you have been extremely fortunate to have made it this long
before getting smacked down. Continue for _my opinion_ which is pretty harsh:

1\. Almost all of your pages probably have no content

You've got 1.9M+ calls. Some numbers like 2145627653 have multiple calls but
none of them provide anything new. Others like 4802550681 have 1 or more calls
but no information other than city + state which is available on like every
phone now. Others have no calls, no information at all.

2\. As a result, almost all search users probably bounce, immediately

People want to know who is calling them, right now. They have like 6 seconds
before the caller hangs up so they need to know - when they realize there is
no information they are going to bail. They are also going to be pissed off
that the top result on Google is asking them for information rather than
giving it to them.

3\. Your traffic is probably almost exclusively from search

Again I'm guessing here but I can't imagine that you are getting more than
5-10% of your traffic from sources outside of people searching for information
on a phone number. That says to me that people don't think there is a reason
to come back.

Yeah, you were first, but who cares? Your site sat around pissing off search
engine users (including me) for a decade and still looks like the MVP it was
in 2006. You may not be actively spamming the web but your website is
passively spamming Google. Those other guys might be worse but who cares? That
doesn't change the other facts and their punishment will likely come soon.
That argument doesn't get you off the hook with Google.

If you want to get back on top you're going to need a better offering than
effectively being the first result on Google and a comments system.

~~~
whocalledus
There's a couple million phone calls reported, but there's less than a million
telephone number pages for Google to index. There's 63 thousand numbers with
more than 1 comment, 195 thousand with at least 1. If the problem is that the
caller ID name with date of call is not rich enough data for Google, then I
can limit the sitemap to only pages with textual comments. But Google is not
saying that, or that they will reinclude it if I do that. They're not saying
anything except that the site is using illicit practices.
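
As a rough sketch of what limiting the sitemap to pages with textual comments could look like: the `comment_counts` mapping and the `build_sitemap` helper are hypothetical, and only the `/lookup/<number>` URL pattern comes from the links earlier in the thread.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(comment_counts,
                  base_url="http://whocalled.us/lookup/",
                  min_comments=1):
    """Build sitemap XML listing only numbers with enough textual comments.

    comment_counts: hypothetical dict of phone number -> count of textual comments.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for number in sorted(comment_counts):
        # Skip pages that only have caller ID name and date of call.
        if comment_counts[number] >= min_comments:
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = base_url + number
    return ET.tostring(urlset, encoding="unicode")
```

Note that the sitemap protocol caps a single file at 50,000 URLs, so 195 thousand commented numbers would still need to be split across several files with a sitemap index.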

Prior to Google removing whocalled.us, it accounted for 68% of traffic. 15%
was direct, 6% Yahoo, 4.6% Bing. The fact that it is an extension of search,
and not primarily a website people want to return to does not in any way
demote its value to people. I do not visit Wikipedia directly, but I still
want it in my search results.

If the time of this kind of site is over, then great, remove them all. Why
pick on whocalled.us?

~~~
tehwebguy
My guess? Because you were first. I wouldn't be surprised if the others are
gone soon.

Side note, searching on your site pulls up a slightly different domain for me.
Is that on purpose?

~~~
whocalledus
I think other sites are interfacing with whocalled.us through the HTML. I see
it listed on sites like Spokeo where it shows comments, and I don't think
they're using the API. Plus the site code is old, and people tend to dislike
it when you mess with how their site works.

So I made whocalld.us as a place to write a new interface from scratch that
uses the same database. That is where I added fulltext indexing for search.
Previously the search box on whocalled.us used Google Site Search. But I want
to remove Google services, so rather than add fulltext search to whocalled.us,
I pointed it to whocalld.us for now.

I figured if I could rewrite the site to be better, then people would allow me
to replace whocalled.us with that one.

I thought maybe that's why Googlebot detected whocalled.us as spam, if it saw
duplicate text on whocalld.us. But I tried things like denying Googlebot
access to whocalld.us with robots.txt, and setting noindex in meta tags, but
nothing helped. If that were the issue then a person could see I own both
sites during the reconsideration request, and either remove the "Pure Spam"
penalty or provide some clue as to how I should fix it. Besides, if that was
enough to get my site removed, then what's stopping malicious people from
doing the same to other sites?

Either way, I should have the freedom to fork my own website to recode it if I
want without having to worry about the Google police. I don't build websites
for Google, so if this is how things work, I'll have to find a way to thrive
on the web without Google's help. We did it before, and we can do it again.

The other domain, whocalld.us, has also been deleted as "Pure Spam".

------
jayzalowitz
Create an Android app that automatically sends telemarketers to voice mail?

~~~
qiqing
Spam calls are a much larger problem in China. If only we had Xiaomi's spam
filtering features in the phone's UI. "MIUI users can label an incoming call
as spam by tapping a button, if enough people do it all MIUI users will see
the number as spam."[1]

1\. [http://techcrunch.com/2015/02/12/liveblog-xiaomi-explains-it...](http://techcrunch.com/2015/02/12/liveblog-xiaomi-explains-itself-to-silicon-valley/)

------
bitL
I am almost sure this is a machine learning fail on Google's side. Your site
probably matches some spammy profile and they just treat their algorithms as
objective truth beyond certain threshold. I think we should all get used to
this broken AI everywhere for the next 20 years...

------
vgeek
You probably didn't spend enough on AdWords.

------
arkitaip
There is very little meaningful content on your site, which makes it hard to
see how it is different from the spam sites.

------
hackaflocka
Google hires people from the top universities. People with 4.0 GPAs. People
who go through strenuous interviews where they are asked questions such as
"how would you move the pyramids of Egypt."

You live in their world. This is their world. Assume that they did something
right, and you did something wrong.

