
JQuery 1.6.2 syntax error? You may be the victim of SEO. - gavingmiller
http://encosia.com/jquery-1-6-2-syntax-error-you-may-be-the-victim-of-seo/
======
pierrefar
Hello all

I work at Google as a Webmaster Trends Analyst to help webmasters with issues
like this one.

Looking into this, the first thing I noticed is that blog.jquery.com seems
to be blocking Googlebot from fetching its pages, while the site responds
normally for web browsers: it returns an HTTP 500 error for requests
using a Googlebot user agent. You can see this yourself using a public tool
like Web Sniffer to fetch the page spoofed as Googlebot (
[http://web-sniffer.net/?url=http://blog.jquery.com/2011/06/3...](http://web-sniffer.net/?url=http://blog.jquery.com/2011/06/30/jquery-162-released/&uak=9)
) or using Firefox with the User Agent Switcher and Live HTTP Headers addons.
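
The same comparison can be scripted. What follows is a minimal sketch, assuming a runtime with a global `fetch` that allows setting the `User-Agent` header (Node 18 or later does). The name `compareByUserAgent` and its `fetchImpl` parameter are hypothetical, and the browser User-Agent string is only illustrative.

```javascript
// Illustrative User-Agent strings; any browser-like string works for BROWSER_UA.
const GOOGLEBOT_UA =
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
const BROWSER_UA =
  "Mozilla/5.0 (X11; Linux x86_64; rv:5.0) Gecko/20100101 Firefox/5.0";

// Fetch the same URL once per User-Agent and report the HTTP status of each.
// fetchImpl is injectable so the function can be exercised without a network.
async function compareByUserAgent(url, fetchImpl = fetch) {
  const statusFor = async (userAgent) => {
    const response = await fetchImpl(url, {
      headers: { "User-Agent": userAgent },
    });
    return response.status;
  };
  const browser = await statusFor(BROWSER_UA);
  const googlebot = await statusFor(GOOGLEBOT_UA);
  return { browser, googlebot, differs: browser !== googlebot };
}
```

A result where `differs` is true, with a 500 only on the Googlebot request, would match the behavior described above.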

Unfortunately this is a very common problem we see. Most of the time it's a
mis-configured firewall that blocks Googlebot, and sometimes it's a server-
side code issue, perhaps the content management system.

Separately from that, I also notice that blog.jquery.it is now redirecting
to blog.jquery.com, suggesting they are fixing it on their end too.

If any jquery.com admins want more help, please post on our forums (
<http://www.google.com/support/forum/p/Webmasters?hl=en> ).

Cheers,

Pierre

~~~
yaakov34
Do you have any insight into what makes jquery.it rank so highly? I thought a
pagerank of 8 was very difficult to attain. Was this done with just some SEO,
or did some mega-popular site link to them by mistake?

This is the real problem here, because people do have a lot of trust that very
highly-ranked results on Google won't hurt them.

~~~
josegonzalez
I'd imagine that if they are a perfect duplicate of jquery.com, and jquery.com
is constantly returning a 500 error page to the googlebot, then all the
content on jquery.it would be "unique". There is so much of it, and all highly
relevant content, that it's pretty much a no-brainer that its pagerank would
be so high.

That said, I find it funny that there is so much vitriol against Google in
this case when, as has been noted above, it's likely a misconfiguration on
jquery.com's part.

~~~
yaakov34
Vitriol? Certainly not from me - I think I've been perfectly civil, and I am
trying to understand the situation.

I don't think your explanation holds. Pagerank is based on reputation (created
by links and other means which Google isn't very specific about), more than on
contents. jquery.com has a pagerank of 8, so it can't be all inaccessible to
Google. The GP says that blog.jquery.com is inaccessible. So how does
jquery.it get a pagerank of 8? Your explanation would look correct if
jquery.com had a pagerank of 1 (due to misconfiguration) and jquery.it sneaked
in with a rank of 2. But this isn't the pattern.

------
Encosia
In case the spirit of my post isn't clear, let me be the first to acknowledge
that I wouldn't have run into this problem at all if I had been less careless.
The purpose of posting this was to raise awareness and hopefully help others
avoid making the same blunder (and to point some links at the correct site
with good anchor text).

~~~
mrcharles
Sadly, I foresee a future where you have to whitelist everything in order to
be truly safe. You got screwed over by two characters.

~~~
kragen
That's where we were pre-Google; we called the whitelist our "bookmarks".

~~~
jshen
But it can be more automated than bookmarks. One idea: imagine if your
searches covered pages across all your friends' bookmarks and your own.
"Bookmarks" isn't quite the right term; imagine if it were really easy to say
"this I validate", so that everyone who trusts you could search that as well.

~~~
kragen
That's a really good idea. I wonder why Delicious never did that. They had
your "friends" list and all your friends' bookmarks. I'll implement it in
yamemex for sure!

~~~
jshen
I've wanted to build something like my idea for years, but it's not something
you can do on the cheap. I've thought about focusing on some vertical to make
it cheaper, but meh.

~~~
rokekr
blekko.com looks to be doing something similar to what you described

~~~
jshen
what are they doing?

~~~
msbarnett
Basically they're letting people create "slashtags" that restrict a search to
a customized list of sites.

Eg) I have one for /objc that searches a half-dozen high-quality, good signal-
to-noise, non-spammy objective-c related sites, so that if I search for
"timers /objc" I get only quality content and no farmed spam.

There's a lot more to it but they've essentially farmed out the job of
whitelisting the non-spammy parts of the internet to their users.

Edit: and more to the point of what you were describing, you can easily use
other people's public slashtags, and it will detect and suggest relevant ones
as you use it. It's totally worth playing around with.

~~~
jshen
That's very close, if they let me search my friends' slashtags all at once it
would be perfect.

------
patio11
On the plus side: there are a lot more interesting things a spammer could do
with a copy of jQuery than accidentally include a syntax error.

~~~
VMG
As a side note: I did a diff and it seems that, besides some newlines at the
top and the text at the end, the scripts jquery-1.6.2.js and
jquery-1.6.2.min.js are in fact identical to the originals. This could have changed at any
time, so from a security standpoint, you could even consider the client page
as being compromised.
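
A check like this is easy to repeat. Here is a minimal sketch, assuming you already hold a known-good copy of the file as a string. `describeWrapping` is a hypothetical helper, not part of any library.

```javascript
// Describe how a suspect copy differs from a known-good source text.
// Returns null when the original text does not appear inside the copy at all.
function describeWrapping(original, copy) {
  const core = original.trim();
  const start = copy.indexOf(core);
  if (start === -1) return null;
  return {
    prefix: copy.slice(0, start), // text added before the original
    suffix: copy.slice(start + core.length), // text added after the original
  };
}
```

An empty-looking `prefix` and a short `suffix` would correspond to the "newlines at the top and text at the end" case; a `null` result means the body itself was altered and deserves a real diff.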

------
dlikhten
HA! I started reading and researching and it just made me laugh.

Guess what you get when you go to <http://duckduckgo.com/?q=jquery> THATS
RIGHT, an official site logo, because Gabriel is f-ing awesome and I love
DDG's little almost insignificant features like showing you the official
jquery website vs what you THINK the official one is, since google never helps
you there.

Also I have adblocking and opt-out from google's ad tracking on so I never
suffer these things. But that's what makes DDG so amazing, that opt-outs don't
matter. It's just so clutter free. Putting things into context vs just presenting
you with data. Note that in my search results I even get the nice icons
indicating if the result is spam or not. That website for the fake jquery
is... well, it's not even in the first 100 search results; it may be blocked.

Thanks DDG, you just justified your existence yet again.

~~~
retube
How is the "Official Site" authentication done?

~~~
callahad
There are far too many "official site" links for me to think it's manual... I
know DDG does deep indexing of Wikipedia, perhaps it's using the target of the
"Official website" link at <http://en.wikipedia.org/wiki/Jquery>?

Edit: Yeah, I think that's the case. These searches don't show the "Official
Site" badge: <http://ddg.gg/?q=jenkins+ci> <http://ddg.gg/?q=hudson+ci>, but
these do: <http://ddg.gg/?q=jenkins+software>
<http://ddg.gg/?q=hudson+software>

Compare those queries to the name of the wikipedia pages:
<http://en.wikipedia.org/wiki/Jenkins_(software)>
<http://en.wikipedia.org/wiki/Hudson_(software)>

~~~
dlikhten
You are probably right. He's probably comparing search terms to wiki
information and then looking for homepage info on wikipedia. You know what,
that's good enough for me.

------
VMG
No, "being in a hurry" doesn't excuse downloading and including jquery from
the wrong website. He is lucky that there was a syntax error and the script
didn't work, this could have turned out much uglier.

Edit: the headline should be: "Everybody watch out, a fraudulent jquery
website ranks higher in google than the official website". The syntax error is
the best thing that could happen.

~~~
mtogo
Exactly, i don't understand how he can pin this on google. Sure, it's a
problem that google ranks a ripoff over the main site (as they did with SO
before), but if you _install jQuery from the wrong site_ , not once noticing
the italian TLD, that's your own fault.

~~~
knowtheory
Dude was careless, but that hardly excuses the blame-the-user mentality.
People live in an ecosystem which biases behavior. Dude expected top ranking
page to be the jquery site (invalid assumption) and presumably that if the top
ranked site wasn't the jquery site, it wouldn't be spoofing the actual jquery
site.

Dude fell for what amounts to a phishing scam. Sure he should have been on
better guard, but the circumstances definitely contributed to his user error.

~~~
mtogo
Looks like i pissed some people off with this. Maybe a few other people fell
for this too?

"Dude was careless" is the entire point. Google should not have ranked that
site above jQuery's site, but you should check that you are at least
downloading your JS from _the right domain_.

~~~
SoftwareMaven
Now imagine it's your first time getting a copy of JQuery because you've heard
how awesome it is. You go to Google to find the JQuery site. _How would you
ever know it was a fraudulent site?_

Google is providing a service that vouches for the authenticity of sites by
their ranking in the search results. They failed.

~~~
AndyJPartridge
Google is providing a service that gives search results based on keywords you
enter.

They try to be helpful by ranking them in some fashion, but at the end of the
day it's up to you how you use the results that are returned.

------
rsoto
This is very weird. jQuery.it has the same PageRank as jQuery.com (8/10). This
site dates back to 2007 [1] although it didn't start copying the jQuery site until
2009 [2].

It also has cloned the subdomains: <http://dev.jquery.it/> and
<http://forum.jquery.it/>

What this site appears to do is mirror the content of jQuery.com by copying
everything and then appending the "Time to generate" string. I just checked
adblock, it also adds a Google Ad, which is the point to this.

Obviously Google has messed up big time, but so has the whole web, by linking
to a fake site so much that it has the same page rank as the original.

1:
[http://wayback.archive.org/web/20071001000000*/http://www.jq...](http://wayback.archive.org/web/20071001000000*/http://www.jquery.it/)

2:
[http://wayback.archive.org/web/20090601000000*/http://www.jq...](http://wayback.archive.org/web/20090601000000*/http://www.jquery.it/)

~~~
qeorge
It's fairly easy to fake PageRank via cloaking. You basically just redirect the
Google toolbar queries to the real domain. Of course you're not actually
fooling Google - just people using their toolbar or similar tools to check
PageRank.

That's probably what happened here. The jquery.it domain has nowhere near
enough link strength to get an 8/10:

<http://www.opensiteexplorer.org/jquery.it/a!links>

jquery.com, for comparison:

<http://www.opensiteexplorer.org/jquery.com/a!links>

~~~
rsoto
I think the cloaking was fixed a long time ago. And Google _IS_ being fooled--
that's why it is ranked first on that query.

------
gojomo
Google didn't just misrank this fraud; they also provided the loot (AdSense
revenue) that likely motivated the crime.

They should ban not just jquery.it from both natural rankings and AdSense, but
every other site on the same AdSense account and with the same registered
domain owner.

~~~
getsat
And if the guy running jquery.it was smart, he'd have the custom jQuery he's
hosting randomly replace people's AdSense blocks with his own to make even
more money.

~~~
btilly
If he was really smart, he'd have the custom jQuery also add random links to
him. To make sure that he stays at the top of the ranking.

~~~
slig
Links written from javascript are not indexed by Google, IIRC.

~~~
alecco
[http://code.google.com/web/ajaxcrawling/docs/getting-started...](http://code.google.com/web/ajaxcrawling/docs/getting-started.html)

~~~
untog
All of that requires server-side coding in order to present the content
provided by JS otherwise. It wouldn't do anything in this case.

Although altering jQuery to add a link at the bottom of every textarea
field...

------
tghw
What I find incredible about this is that Google has gotten so good at
returning exactly what we want that we no longer bother checking the
authenticity of it, even when it's something specific. If Google was "lagging
behind on the game" as some people are suggesting, this never would have
happened because we wouldn't be trusting it to return what we want.

~~~
ams6110
Speak for yourself, I always make sure I'm on the vendor's/author's official
site before I download code. I never download code from 3rd parties.

~~~
ojilles
Genuine question: how do you do that?

~~~
tghw
Look at the domain. If I'm downloading jQuery, it should be jquery.com. If I'm
not sure what the domain should be, try to get it from someone I do trust,
like the Google-hosted libraries[1]. If it's from GitHub or BitBucket, try to
make sure the person who owns the repo is actually the one maintaining it (so
I don't get a broken copy or one with something "extra"). It's all mostly just
common sense and a little bit of extra scrutiny.

[1] <http://code.google.com/apis/libraries/devguide.html>
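
The "look at the domain" step can be made mechanical. This is a minimal sketch, assuming you already know which domain you expect. `isFromDomain` is a hypothetical helper built on the standard `URL` class.

```javascript
// Return true only when the URL's host is the expected domain or a subdomain of it.
function isFromDomain(urlString, expectedDomain) {
  let host;
  try {
    host = new URL(urlString).hostname.toLowerCase();
  } catch (error) {
    return false; // not a parseable absolute URL
  }
  const domain = expectedDomain.toLowerCase();
  // The leading dot keeps look-alikes such as "notjquery.com" from matching.
  return host === domain || host.endsWith("." + domain);
}
```

It only answers "is this the domain I expected?". Knowing which domain to expect still has to come from a source you trust.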

------
kristofferR
This should be easy to sabotage/fix for the real jQuery.com guys. Since the
fake .it-site hotlinks directly to the css and custom javascript from
jquery.com they can add some code that will warn users that the site is a
fake, automatically redirect all traffic to jquery.com with a 301, add
canonical headers etc...

~~~
masklinn
Then the fake site would just copy those as well.

~~~
SlyShy
That's kind of the point, is it not?

~~~
masklinn
Uh not really. _Copy_ , not hotlink. The fake site would copy the originals
(no referrer, no special sauce) and would be back to the current situation: a
fake site looking exactly like the real one and ranking higher on google.

~~~
mnutt
It would still be possible, from the javascript itself:

    
    
        if(document.location.host != "jquery.com") {
          document.location = "http://jquery.com";
        }

~~~
meatmanek
What's keeping the jquery.it people from changing that?

------
AlexMuir
Not only has Google ranked the site highly, it seems to be the financier too.
It looks to me that this was done just for the Adsense income.

Why was this done? Here were my first few thoughts:

1\. Display ad revenue. - Maybe initially, there is Adsense markup but the ads
aren't showing for me so perhaps Google has disabled them.

2\. Affiliate income from the links to jQuery books - I can't see an affiliate
code in the links so probably not.

3\. Hijacking the Donate button - No. This leads to a blank page with just a
Time to Generate snippet.

------
gaborcselle
I highly recommend using the Google-hosted versions of JQuery rather than
downloading them and hosting them on your own. You get the benefits of faster
downloads through Google's CDN and since many websites use these, they're pre-
cached in browsers:
<http://code.google.com/apis/libraries/devguide.html#jquery>

------
po
This is also another example of why having more top level domain names will
make things worse, not better.

------
czDev
I don't think the author is to blame at all. I do stuff like this all the time
- I'm working on something, google for it, download and keep on moving. I use
google constantly throughout the day, and trust the top result to be
authentic. As it turns out, that trust may be unwarranted.

------
yaix
>> If you’re in a hurry to download a copy of the latest jQuery revision

Pardon? If you download just any jQuery without even checking the domain you
are downloading from, then you are very careless. That's just like typing your
Paypal password into a form on a website that was linked in an email that
looked like it came from Paypal...

Your copy of jQuery will be able to see anything that happens on the site you
are writing, send any user password to a external server, read session keys,
query your API for any data as a logged in user, etc. You could even build a
botnet out of modified jQuery libraries.

Whenever you download executables, make sure you know where they are coming
from!

------
dspillett
There is a fairly simple heuristic that could combat this, which I assume is
possible with Google's architecture (which I know little about, so I could be
wrong here!):

* if a page A refers to the same external css and images as page B on another site, and those external resources are local to B, then assume B is more original and should be ranked higher than A.

Of course the SEO people will get around this by making sure they take copies
of the css and image assets as well as the html ones once this is implemented,
but at least it'll save the "target" site a little bandwidth.
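
The heuristic above could be sketched roughly as follows. This is only an illustration under stated assumptions: each page is represented by its host plus a list of absolute asset URLs, and `countAssetsOnHost` and `guessOriginal` are hypothetical names. It says nothing about how Google actually works.

```javascript
// Count how many of a page's asset URLs are served from a given host.
function countAssetsOnHost(assetUrls, host) {
  return assetUrls.filter((assetUrl) => new URL(assetUrl).hostname === host)
    .length;
}

// Guess which of two look-alike pages is the original: a page whose assets
// live on the *other* page's host is treated as the copy.
function guessOriginal(pageA, pageB) {
  const aUsesB = countAssetsOnHost(pageA.assets, pageB.host);
  const bUsesA = countAssetsOnHost(pageB.assets, pageA.host);
  if (aUsesB > bUsesA) return pageB.host;
  if (bUsesA > aUsesB) return pageA.host;
  return null; // no signal either way
}
```

Once the copy also mirrors the assets, both counts drop to zero and the function returns `null`, which is the workaround mentioned above.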

~~~
masklinn
> but at least it'll save the "target" site a little bandwidth.

Which really is not the issue at all. Do you also suggest that people fearing
home invasions paint their walls red, so the blood splatters are hidden if
they get shot?

~~~
dspillett
It would also cost the scammer bandwidth. Though that is probably paid for on
a stolen credit card anyway...

Unfortunately the real problem Google is unable to do much about, aside from a
few high-profile things (jQuery would count as high enough profile, but many
similar libraries would not). How do they know, given two apparently identical
chunks of content, which is the original source?

------
ljlolel
Great find. Google is really lagging behind on the game. An italian blog
should never rank higher than the jQuery site. If Google loses our trust, and
we have to be on our guard all of the time, ...

------
JonnieCache
I bet there's a codepath in the magic google algorithm that says, "all things
being equal, favor the page that has adsense." Except normally, things aren't
quite this equal.

~~~
bhartzer
JonnieCache, that couldn't be farther from the truth. Google has recently been
downgrading sites that have too much AdSense on them. The AdSense team has
been at odds with the Google organic algo team for a while now.

~~~
JonnieCache
Yeah, I wasn't being entirely serious, guess that doesn't come across. It
doesn't take much thought to realise that there are lots of glaring problems
with my reasoning there :)

~~~
bhartzer
Not necessarily, it looks to me like your comment was serious. Unfortunately a
lot of people think that just because it's Google that they are going to favor
sites that have AdSense on them, or even that advertising via PPC is going to
help your organic ranking.

------
MatthewPhillips
How is this even possible? Doesn't PageRank work by counting links? Does
jquery.it have more inbound links than jquery.com?

~~~
AgentConundrum
_Doesn't PageRank work by counting links?_

This is a very naive view of Google's algorithms. While link count plays a
significant role in ranking, there are many more factors (including various
secret sauces) that determine where a given link will end up in the SERPs.

Still, it really seems odd that Google shows a fake as the top result here. I
just ran the search myself and while jquery.COM ranks first for a generic
'jquery' search, the release blog post doesn't appear at all when searching
for 'jquery 1.6.2 released'. The closest you get is a link just to
'blog.jquery.com' with the title "jQuery:". It makes me wonder if something
went wrong with the blog release that affected the way the "Google Juice"
flowed down to that post, and this fake site managed to capitalize on it.

------
AndyJPartridge
If I now do a search at google.co.uk for "jquery 1.6.2" the linked article is
the first hit.

The .IT site in question is the fourth.

Things move fast on the interweb, I can't agree with anyone claiming this is
Google's fault.

With the size of the database, the breadth of queries that are done against it
and the myriad of possible returns - how could they reasonably police it?

~~~
kposehn
Here in the US, the linked article is now first...and jQuery.com doesn't
show up until the second page.

------
iter8n
I would have just typed in 'jquery.com' and then clicked the link for the
download page.

~~~
btucker
Because often times you don't remember the domain exactly & then you've got
the same problem. Anyone remember whitehouse.com or countless other examples
of this? The fact that Google is probably 99%+ effective at getting you where
you want to go heavily disincentivizes relying on your brain which may not
have as high a success rate as Google. Plus all the browsers have gone in this
direction since search terms are more usable than domain names.

~~~
iter8n
Of course. This could have happened to any site, but in this case I just found
it funny because 'JQuery -> jquery.com' is so simple and obvious. I'm sure the
author knew the actual domain, but used google out of habit or preference.

------
mrcharles
What worries me about this is the original intentions of the owners of
jquery.it. Obviously they planned something malicious, and were setting up for
it, only random chance resulted in them getting outed before they could begin
deploying their actual malicious code.

Frankly, had you been more careful, worse could have happened down the road.
Still, I would be interested in seeing exactly how jquery.it made it to the
top of the search listings.

~~~
patio11
_What worries me about this is the original intentions of the owners of
jquery.it._

Diff suggests the plan was probably "Serve adsense ads." That is the punchline
to quite a bit of spam.

------
eps
In short -

Beware of downloading JQuery from jquery.it, which may appear before
jquery.com in related Google searches. The end.

------
bhartzer
I'm amazed at how many people jump to conclusions--and say that a bad Google
search result or search spam is always the fault of SEO. SEO is not evil, and
it is not all search engine spam. Most of the time the SEO that I do has more
to do with cleaning up bad or sloppy code and web design errors.

------
EricR23
The jQuery team is well aware of this issue.

------
scottkrager
The victim of SEO?

No, the victim of a scam and Google. The site barely has any inbound links, if
this is SEO, they suck.

This is a google fail, not an SEO fail.

------
fendmark
Saying that you're a victim of "SEO" is about as accurate as saying you are a
victim of "Web Development."

------
chrisjsmith
I think the blogger was being an idiot. You should check the authenticity of
what you are downloading, not just snag it.

It's like eating a kebab dropped in the street.

~~~
aquark
Technically how should he know jquery.com is more trusted than jquery.it?

jquery.com does NOT appear to have a fully valid SSL certificate: Chrome gives
me "the site's security certificate is not trusted!"

Like it or not, Google is an important part of establishing reputation --
that's what pagerank was built on initially and if that becomes worthless then
finding the true source of something becomes very difficult.

~~~
patio11
_jquery.com does NOT appear to have a fully valid SSL certificate_

Hypothetically supposing that jquery.com had a lovely little green lock, that
wouldn't matter, because on jquery.it a) you wouldn't be looking for the
lovely green lock and b) if you did look for it, look here, a lovely little
green lock and c) you didn't click the lovely green lock to see who it was
issued to but if you did d) it was issued to jquery.it, which matches the
address in your bar.

SSL solves one problem, really really nicely: it makes it impossible to
eavesdrop between the user and a trusted endpoint. It does basically nothing
to make sure that the trusted endpoint is the one the user thinks they are
interacting with.

~~~
aquark
True -- the green lock itself wouldn't help here. I was thinking more along
the lines of code signing certificates.

When I visit my bank's web site and drill into the certificate details I can
at least establish that someone my browser vendor trusts (or someone they
trust ...) issued the certificate to an _organization_ called 'Bank of Nova
Scotia' in Toronto, not just the domain name.

If I was able to register micr0soft.com then _hopefully_ I would have a hard
time getting an SSL certificate issued for it. I know there have been a number
of discussions on certificate infrastructure here that show how complex this
can become.

~~~
chrisjsmith
SSL certificates bring nothing other than a false peace of mind. I've seen
fake antivirus software that goes to great lengths to provide verified (!) SSL
encrypted pages to steal your credit card details.

~~~
premchai21
Which fake software is this? If it's already taken control of the client side,
too, couldn't it just be altering the root certificate set rather than
exploiting some weakness of the union of all of the existing roots (which no
doubt have many such weaknesses regardless)?

~~~
chrisjsmith
"Vista Security 2012". It can't touch the root certs as you need elevated
privileges to do that. The entire thing hijacks the user's shell via the
registry. You can log in as another user on the machine and it appears not to
be infected.

Quite well designed really :-)

------
ldar15
"MAJOR BUG! Google search engine found to be open to SEO abuse!!!"

Except that this wouldn't be news. And because it wouldn't be news, the blame
falls entirely on the author.

The author got phished. Kudos for letting everyone know about it. -kudos for
blaming google.

~~~
ldar15
Really???? Mod down???? How about a comment explaining which bit you disagree
with?

------
lhnn
What if Google put subtle warnings on websites from other CC TLDs than .com,
.net or the TLD you live in?

Alternatively, Google could just work on spotting phishing and spam sites.

