Are Web Development Search Results Being Manipulated? (impressivewebs.com)
95 points by limedaring 1082 days ago | 54 comments



There's nothing black hat going on here. W3schools is just using wildcard subdomains, and whenever people accidentally link to wwww.w3schools, Googlebot picks it up.

I love W3schools, btw. It got me into coding and I still use it for reference. Hate me all you want, but I'm tired of people bashing them for no reason.

-----


W3schools is a cancer. They have old, outdated information. Their website, their content, and their business model are all a throwback to last decade. They sit atop the search results, doing nothing, and raking in cash while providing a poor service. Meanwhile, their very dominance sucks all the oxygen (page views, ad dollars) away, ensuring that better competitors can't grow.

What redeeming feature do they have? If you aren't easy to use, accurate, comprehensive, or responsive...how good a reference site are you?

I've yet to see anyone bash W3schools for no reason, but I see plenty of people defending them for no reason.

-----


Is there another alternative out there? The Mozilla developer docs are the only thing that comes to mind offhand.

-----


http://www.w3.org/standards/ of course.

-----


You answered your own question. MDN has far better docs.

-----


htmldog (http://htmldog.com/) is a good resource. I know this is a stretch, but there are some really good books out there for quick reference. I know it's not as fast as opening a new tab, but some of the books, like the "Definitive Guide" series, are great.

http://www.amazon.com/s/ref=pd_lpo_k2_dp_sr_sq_top?ie=UTF8&#...

I generally write code at my desk more than on the go, so I always have those books handy.

-----


Tizag, which is second in the results, is a pretty good resource.

-----


"Tizag, which is second in the results, is a pretty good resource."

Their Perl tutorial set is pretty terrible. They call it "PERL" (it's Perl), they believe the latest stable version is 5.10 (it's 5.14) and the code samples are simply awful.

-----


The Opera Web Standards Curriculum, though it isn't a reference, is a fantastic resource for beginners.

Unfortunately they seem to have moved it onto w3c's domain and now it looks completely awful: http://www.w3.org/community/webed/wiki/Main_Page

-----


http://javascripture.com is a pretty quick reference for just javascript. http://dochub.io has MDN docs plus some others in a nice, consistent view.

-----


w3schools has a lot of misinformation...

http://w3fools.com/

-----


In case you missed it, in the original post, I included a link to a JSBin containing many of the subdomains:

http://jsbin.com/upamof/edit#html,live

Those are not accidental. No one types "www.jigsaw". That's intentionally affiliating the URL with the word "jigsaw" that comes from the W3's validator. Look at the other subdomains. Those can't be accidents. It's not a wildcard thing at all.

-----


http://foobar.w3schools.com/jsref/jsref_replace.asp

http://sdfaxdds.w3schools.com/jsref/jsref_replace.asp

That looks like wildcard DNS to me. The question is where the linkbacks are coming from. It is possible w3schools is creating those themselves, in which case that does seem like unethical SEO tricks. But if not, well, they can't be responsible for how people link to them.
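To see why arbitrary names like the two above resolve at all, here is a toy model of a wildcard DNS record. It is a simplified sketch, not a real resolver: the zone data is hypothetical (`example.com` with documentation-range addresses), `*` only stands in for a single extra label here, and the closest-encloser rules of RFC 4592 are not modeled.

```python
from __future__ import annotations

# Hypothetical zone data; 192.0.2.0/24 is a documentation-only address range.
ZONE = {
    "example.com": "192.0.2.10",      # apex record
    "www.example.com": "192.0.2.10",  # explicit www record
    "*.example.com": "192.0.2.10",    # wildcard: answers for any other single label
}


def resolve(name: str, zone: dict[str, str]) -> str | None:
    """Toy lookup: prefer an exact record, else fall back to the parent's wildcard.

    Simplified: "*" matches exactly one label, and none of the
    closest-encloser rules from RFC 4592 are modeled.
    """
    name = name.lower().rstrip(".")
    if name in zone:
        return zone[name]
    _, _, parent = name.partition(".")
    return zone.get("*." + parent) if parent else None


for host in ("www.example.com", "foobar.example.com", "sdfaxdds.example.com", "example.org"):
    print(host, "->", resolve(host, ZONE))
```

With a record like `*.example.com` in place, any made-up subdomain lands on the same server, so the existence of the subdomains alone says nothing about who created the links to them.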

-----


Oh, I know that wildcard subdomains are valid, but it's simply not possible for people to type these subdomains (http://jsbin.com/upamof/edit#html,live) by accident.

My opinion is that they're trying to use technically-allowable but questionable methods to bypass the fact that people are 'blocking' them from search results.

-----


How would people be mistyping "www" as "ww1"?

-----


Looks around guiltily - I still use their site too when I need something quick and I know they have it. Sure, some of the info is old. But some of the stuff doesn't change and is just as relevant now as it ever was. If they come up first in a google result and it's for a query I know they have the answer to, you're darn right I'll click it :)

-----


>They have as much power in web dev search results as Wikipedia has in regular search results. That is just wrong.

Except they've been providing useful info for longer than Wikipedia has been around. Sounds kind of right to me. They may not be as current as they used to be, but for the 12 years that I've used them they have helped fill in my knowledge cracks and I've appreciated it.

I don't understand the hate. If you really want someone to hate, hate the W3C for making their info so cryptic or non-existent for years that places like W3Schools had a market.

-----


Mostly that the Mozilla docs are far superior, and it hurts all of us if newbs continue to hit up really confusing documentation.

-----


May I suggest we all block w3schools.com at: http://www.google.com/reviews/t

Google states that it "may use everyone's blocking information to improve the ranking of search results overall" so this may be the best way to take action.

-----


I'm also sick of seeing their site on the first page of Google; most of the time their content is completely out of date.

You can permanently block them from search results on Google by visiting: http://www.google.com/reviews/t and entering http://w3schools.com

-----


I have had both http://w3schools.com and http://www.w3schools.com in my block list for months, but w3schools still shows up in my search results, and with www, not wwww or www1 or something like that. :-(

-----


This is honestly why I use DuckDuckGo on all of my dev computers. They don't always have the best results, but for stuff like Java(Script), not only do they have the !js/!java syntax for going straight to the docs, but they also don't have over-SEO'd websites cluttering up their results.

Edit: Also, on DDG the same search would have been !js replace, and you would have seen this page: https://developer.mozilla.org/en-US/search?q=replace

-----


I actually prefer not to go straight to the docs because I find the Mozilla developer site to be somewhat slow (and occasionally really slow), and the search page to not be the greatest. So unless I get the name of what I'm looking up exactly right, there's just too much latency. I forget who taught me this (maybe on here), but when I google I just add "MDN" to all my JS searches, and the specific Mozilla page for that topic is (usually) right at the top.

I guess it's kind of like a bang, and way better than having to write "site:developer.mozilla.org"

-----


I give the Moz docs credit for being accurate.

But they're not exactly the easiest things to browse or find what you're looking for in. For webdevs, they've got a load of stuff that you just don't care about.

-----


I like DDG, but I'm finding it hard to break the Google habit.

I did just check out their settings area and it offers a lot of customisation which I love. I might just give it another (duck duck) go!

-----


The trick is to always use DDG, but just add !g to search google when you're not satisfied (or !gi to search google images). You start to use google less and less after you get used to the DDG layout: At first it seems (for some reason) like you're getting less useful results on DDG - but after a while you can see you're getting pretty much the same results, in a different format!

-----


I don't have as many problems with W3Schools as the author of this post and many of the commenters here. What you say is true, but their target audience is mainly newbies, not professionals who know what they are doing. They might provide some outdated information; some things may even be false. But do they deserve to be called a cancer? (I read that in the other comments here, and it wasn't the first time I've read it.) After all, doesn't every website with lots of information make mistakes?

Also, in the case of the author, who was looking up about the second most basic JavaScript method there is, the result seemed appropriate. He didn't even have to click through to the page; the info was right there in front of him.

Don't get me wrong, though: I do agree that their extra subdomains are the wrong thing to do, and I'd fully agree if Google decided to punish them by blocking or lowering results for *.w3schools.com for a couple of months. I also agree that the Mozilla Developer Network may be a much better resource, both for newbies and professionals. But if you are that much against W3Schools, why don't you use MDN's own search function instead of Google's general search?

-----


As you say, they contain outdated and outright false information.

Now, it's not a matter of being wrong once in a blue moon or having one or two articles a bit behind the curve. It's the sheer egregiousness of its mistakes, plus the lack of action despite being shown to be incorrect (see: w3fools.com), that in the eyes of many qualifies it to be called a "cancer". Certainly in the realm of web development resources, that label seems rather valid.

-----


Having all of these mirrors of their domain indexed certainly isn't helping W3schools from an SEO perspective.

It is highly unlikely that this was done intentionally; it's just wildcard subdomains being set, as many have already said.

Read the last few comments on the actual post for a good laugh.

-----


Having two or three domains with the same content should be OK. After all, many sites have a www and simply a top level domain with the same content. But having more than that should be penalized by the search engine gnomes.

-----


Best practices dictate that you use either the top-level domain or www, but not both.

W3Schools is definitely doing something scuzzy here.

-----


Why not use a 301 instead of having duplicated content?

-----


That's what I do, but I'm just saying a lot of sites duplicate the content without intending to do anything nefarious.

-----


Paging Dr. Cutts....

-----


I think Google classifies sites in different buckets. There's the "trusted" sites - the brand names that will only get penalized by their algorithms if they do something very very wrong, and another bucket, where you get less leeway. W3Schools is in the first.

Of course, big brands like JCPenney and Forbes do get penalized, but this often happens when they're called out by the media. Since Panda, Google has for the most part said it wants to rely more on algorithms than on manual intervention.

-----


Which leaves us with these questions:

1) Why is w3schools in that list?

2) If they got it because lots of (outdated) sites are linking to them, why do their alternate subdomains get a similar bonus?

3) This whole privileged "authority" bucket crap stinks. It used to be that a really good sub-page on somebody's GeoCities site could be the No. 1 go-to first result for a certain search topic. Why? Because loads of people that knew it was good would link to it because it was just that good of a comprehensive resource. Hearing more and more about this new ranking method makes me wonder whether it's even possible for a small guy to come up top in the results like that. And it's not so much that I feel for this small guy, but rather that I know I'm missing out on a lot of honest good web content that Google simply isn't showing me.

-----


> this new ranking method makes me wonder whether it's even possible for a small guy to come up top in the results like that

If you search for "Remote Unix" on Google (at least for me), a post on my humble blog is the second result. This isn't exactly the narrowest search term, and I'm certainly no juggernaut of a site, so I take that to mean that small random-ass pages can still rank well for a query.

-----


> because loads of people that knew it was good would link to it because it was just that good of a comprehensive resource

This is exactly why w3schools is at the top of search results, and exactly why http://w3fools.com was started: there are a ton of developers out there who think w3schools is authoritative, correct, and kept up to date, and who link to it and use it. At one time it seemed to fill a niche, but no longer, and the cycle needs to be broken. In the meantime, unless you do some kind of sentiment analysis (and make arbitrary changes based on the leanings of Google, which you seem to be classifying as a bad thing), PageRank at least would probably conclude that the internet still loves w3schools.

> whether it's even possible for a small guy to come up top in the results like that

w3schools is actually about as much of a small guy as you can get. They may have ubiquity, but it's run by like two people.

> I know I'm missing out on a lot of honest good web content that Google simply isn't showing me.

Yes, but there is too much good web content out there for you to see in your lifetime anyway. Meanwhile, defining "good" based on a nebulous query is kind of the crux of the whole problem, isn't it? I'm not convinced there is an answer.

-----


Aren't all search results manipulated, if you think about it? That is what the SEO "experts" do. And I think the right word is "gamed".

-----


This is a clear violation of the webmaster guidelines - the subdomains should get swatted by google soon.

-----


How do we report that to google? Interestingly, I googled that, but couldn't find it :S

-----


The same way you report anything to Google - you whine on your own blog or kick up a stink on social media, and hope a Google employee who cares notices…

(Or, I suspect having a $100k+/month Adwords buy probably gives you a magical number to call.)

-----


Actually Google works pretty hard to keep the Adwords and web spam teams separated; they are discouraged from talking to each other at all.

-----


Apparently not, since this was posted in December.

-----


w3schools.com uses Google APIs and hosts ads using Google services.

This very day, I have it on good authority that their ability to game Google's search algorithm is entirely coincidental.

http://news.ycombinator.com/item?id=3740869

-----


Or maybe they are just trying to parallelize downloads. http://code.google.com/speed/page-speed/docs/rtt.html#Parall...

-----


This certainly would explain a lot. I don't even see the option to block that site, though I have blocked a few others before. It doesn't seem to do anything at all. Bigresource.com is my nemesis.

-----


I believe you can only block a site if you click a link and hit back within a certain amount of time.

-----


Plus, you need to be logged into a Google account.

-----


You need a Google account to block sites, but you can also manually add items to the list here: http://www.google.com/reviews/t

If you enter http://bigresource.com on that page, it will block that site and all subdomains.

-----


Now, this is going to bring them even more backlinks, making sure that they remain at the top once again.

-----


Oh yes, subdomains gone wild have to do with SEO, but not in the way outlined by the author.

Duplicate (sub)domains resulting in duplicate sites with duplicate pages mostly have a negative impact on a web property's performance in the SERPs.

Why?

A link to www1.example.com does not automatically count as a vote for www.example.com, and a link to www.example.com does not count as a vote for www1.example.com. That means they now have two websites, each with one vote, instead of one website with two votes.
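The vote-splitting arithmetic can be shown with a toy link counter. This is only an illustration that assumes one inbound link equals one "vote"; it is not how Google actually weighs links, and the hosts are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from urllib.parse import urlsplit


def votes_by_host(links: list[str]) -> Counter:
    """Count inbound links per exact host name, with no canonicalization."""
    return Counter(urlsplit(url).hostname for url in links)


def votes_canonicalized(links: list[str], base_domain: str, canonical_host: str) -> Counter:
    """Count inbound links after folding the bare domain and every
    subdomain of `base_domain` into a single canonical host."""
    counts: Counter = Counter()
    for url in links:
        host = urlsplit(url).hostname or ""
        if host == base_domain or host.endswith("." + base_domain):
            host = canonical_host
        counts[host] += 1
    return counts


links = ["http://www.example.com/", "http://www1.example.com/"]
print(votes_by_host(links))  # two sites, one vote each
print(votes_canonicalized(links, "example.com", "www.example.com"))  # one site, two votes
```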

Additionally, you have two websites competing with each other, and each of these websites has duplicated pages competing with each other. Both the websites and the pages usually perform worse if Google has doubts about which of the duplicated pages on these duplicate sites is the best page to send the user to. (That said, Google is pretty good at taking the doubt out of the equation for duplicate-subdomain issues.)

If you have a web property with a "subdomains gone wild" issue, it is best practice to canonicalize the duplicates to one (sub)domain, either via the canonical tag or via an HTTP 301 redirect. It almost always (depending on how big the issue was) results in better performance for the canonicalized web property, and it definitely helps the site on the (organic) link-building front.
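A minimal sketch of both canonicalization options, written as plain functions so they could sit behind whatever web framework you use. The domain names are hypothetical, and serving the canonical host over `https` is an assumption.

```python
from __future__ import annotations

from html import escape

# Hypothetical configuration.
BASE_DOMAIN = "example.com"
CANONICAL_HOST = "www.example.com"
SCHEME = "https"  # assumption: the canonical site is served over HTTPS


def canonical_redirect(host: str, path: str) -> tuple[int, str] | None:
    """Return (301, Location) for a request that arrived on a non-canonical
    host of BASE_DOMAIN (www1., wwww., the bare domain, ...), or None when
    the request is already on the canonical host or on another domain."""
    host = host.lower().split(":")[0]  # the Host header may carry a port
    on_our_domain = host == BASE_DOMAIN or host.endswith("." + BASE_DOMAIN)
    if on_our_domain and host != CANONICAL_HOST:
        return 301, f"{SCHEME}://{CANONICAL_HOST}{path}"
    return None


def canonical_link_tag(path: str) -> str:
    """Build the rel=canonical tag a duplicate page can carry instead."""
    url = f"{SCHEME}://{CANONICAL_HOST}{path}"
    return f'<link rel="canonical" href="{escape(url)}">'


print(canonical_redirect("www1.example.com", "/jsref/jsref_replace.asp"))
print(canonical_link_tag("/jsref/jsref_replace.asp"))
```

The redirect moves every visitor (and every link) onto one host; the canonical tag leaves the mirrors reachable and only hints to search engines which copy should get the credit.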

There are a thousand reasons why a web property can have a "subdomains gone wild" issue (it was (once upon a time) even a common black hat practice to spam Google with "subdomains gone wild" duplicates of competitors' sites where possible (and it's possible with sites that have a wildcard subdomain setting (e.g. wildcard.w3schools.com))).

But in most cases, "subdomains gone wild" does not have a positive impact on the SERPs. (Blocking results is not one of these cases.)

But yeah, what's the real issue with w3schools? Why does this sh/t rank so well?

Well, first of all, it should be said: "you are not statistically significant." Just because you (in this case we, the HN readers) are not happy with w3schools does not mean the average searcher (searching for HTML web dev stuff) is not happy with it. They are happy (hey, they don't know better), they use it like crazy, and they are happy with what they find. Since Google measures SERP "long clicks" very effectively, it knows exactly how well the average searcher engages with a page or site. If all users clicked back immediately and didn't stay long on the page, w3schools would not be as dominant as it currently is.

Secondly: links. I just did some backlink checks on some obscure w3schools URLs, and they all have links. Now you might say: oh, they buy links... It doesn't look like it; they have links from old forums, new forums, blogs, .edu domains, ...

And: where is the competition? MDN does a great job, but it is a resource for developers, for people who know what they are doing. w3schools is a great resource for people who do not know what they are doing or what they are looking for, and w3schools has a dedicated page for every single thing or tag that we don't even bother to mention anymore (it's outdated, it's old, some say ugly, but it's there and has a unique description text and a unique example on the page). That means they have a page for most of the people on the internet interested in HTML, people who think of HTML as a "programming language". For them, w3schools is the perfect product, with no competition in sight.

What's the solution?

In the xoogler book "I'm Feeling Lucky", the author describes a case where a wrong product kept showing up again and again for popular product searches. They tuned the algorithm again and again, and the results got better and better, but the product showed up again and again. The engineers didn't know what to do. One day, it was gone. What happened: one engineer had simply bought the product, so it wasn't listed in the store anymore. The issue was fixed.

Bottom line: somebody who cares about the internet (W3C, Mozilla, Google, Adobe, Microsoft, Opera, ...) and profits from good, well-structured HTML and working JavaScript should just buy it (it can't be that expensive).

-----


Nested parentheses: necessary in lisp; kind of a bitch when trying to read a comment.

-----


I like the idea of buying it... The question is, would you trash the site out of spite, or would you remove all the crap and keep the "good" content? :\

-----



