JQuery 1.6.2 syntax error? You may be the victim of SEO. (encosia.com)
257 points by gavingmiller on July 6, 2011 | 135 comments



Hello all

I work at Google as a Webmaster Trends Analyst to help webmasters with issues like this one.

Looking into this, the first thing I noticed is that blog.jquery.com seems to be blocking Googlebot from fetching its pages, even though the site responds normally for web browsers: it returns HTTP 500 error headers for requests using a Googlebot user agent. You can see this yourself using a public tool like Web Sniffer to fetch the page spoofed as Googlebot ( http://web-sniffer.net/?url=http://blog.jquery.com/2011/06/3... ) or using Firefox with the User Agent Switcher and Live HTTP Headers addons.
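A minimal way to check the same thing from a script (a Node.js sketch; the "/" path and the exact user-agent strings here are just assumptions for illustration):

    var http = require('http');

    // Request the blog with a given User-Agent and report the HTTP status code.
    function checkStatus(userAgent, label) {
      http.get({
        host: 'blog.jquery.com',
        path: '/',
        headers: { 'User-Agent': userAgent }
      }, function (res) {
        console.log(label + ': HTTP ' + res.statusCode);
        res.resume();  // discard the body
      });
    }

    checkStatus('Mozilla/5.0', 'Browser UA');   // expect 200
    checkStatus('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
                'Googlebot UA');                // a 500 here reproduces the block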

Unfortunately, this is a very common problem we see. Most of the time it's a misconfigured firewall that blocks Googlebot; sometimes it's a server-side code issue, perhaps in the content management system.

Separately from that, I also noticed that the blog.jquery.it URL is redirecting to blog.jquery.com, suggesting they are fixing it on their end too.

If any jquery.com admins want more help, please post on our forums ( http://www.google.com/support/forum/p/Webmasters?hl=en ).

Cheers,

Pierre


Hi Pierre -

Thanks for the details on this! We dug into our WordPress install and realized that W3 Total Cache was configured to block anything with the word 'bot' in its user agent string (sigh). That's now fixed and live.
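In effect, the setting amounted to something like this (illustrative pseudocode only, not actual W3 Total Cache configuration; request and response are hypothetical objects):

    // Anything whose User-Agent contains "bot" was refused -- which of course
    // also matches "Googlebot", "Bingbot", and friends.
    if (/bot/i.test(request.headers['user-agent'])) {
      response.statusCode = 500;   // what Googlebot was actually seeing
    }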

As to the redirect - that's actually a bit of devious magic on our end. Since jquery.it is hotlinking to our JavaScript and CSS we just added a bit of JavaScript to automatically redirect them to the right jQuery.com page:

    if ( /jquery\.it/.test( window.location.hostname ) ) {
        window.location = window.location.href.replace('y.it', 'y.com');
    }
Thanks again for your pointers - looks like the jQuery 1.6.2 blog post is already showing up in the search results, and jQuery.it is not on the first page of results (although "jquery 1.6.1" hasn't updated yet and jQuery.it is still there - I assume that will be remedied as the Google spider makes more progress).


>configured to block anything that had the word 'bot' as part of its user agent string

Yep, that'll do it :) Glad you found it and fixed it quickly!

One follow-up thought for you and anyone in this situation: Set up a Google Webmaster Tools account ( http://www.google.com/webmasters/tools/ ) and make reviewing it part of the webmasters' daily routine. In this particular case, the Crawl Errors page in the Diagnostics section would have flagged this problem very quickly.

A couple of pro tips for Webmaster Tools:

1. Be sure to look at the "date detected" column; it is accurate, yet somehow people miss it.

2. Set up email forwarding for messages Webmaster Tools sends to you: http://www.google.com/support/webmasters/bin/answer.py?answe...


Oddly, Crawl Errors actually displayed "No known errors" - which is strange, since Googlebot was definitely having issues fetching the site. Thanks for the suggestions and finds, though - we'll be looking at it more often for sure! :)


That's usually the case if you've just verified the site for the first time. It takes a few days for the first data to be collected. Because of this we always recommend webmasters verify early (before things break) and monitor constantly.


Of course, the best solution is to get the jquery.it domain name transferred to the genuine owner of jQuery, but that will take time.


Naturally, we're talking with our legal representation about that - we'll have to see what we can actually do. I think we're "ok" for now though as users are getting to the right place in the end.


Do you have any insight into what makes jquery.it rank so highly? I thought a pagerank of 8 was very difficult to attain. Was this done with just some SEO, or did some mega-popular site link to them by mistake?

This is the real problem here, because people do have a lot of trust that very highly-ranked results on Google won't hurt them.


I'd imagine that if they are a perfect duplicate of jquery.com, and jquery.com is constantly returning a 500 error page to the Googlebot, then all the content on jquery.it would be "unique". There is so much of it, and all highly relevant content, that it's pretty much a no-brainer that its pagerank would be so high.

That said, I find it funny that there is so much vitriol against Google in this case when, as has been noted above, it's likely a misconfiguration on jquery.com's part.


Vitriol? Certainly not from me - I think I've been perfectly civil, and I am trying to understand the situation.

I don't think your explanation holds. Pagerank is based on reputation (created by links and other means which Google isn't very specific about), more than on contents. jquery.com has a pagerank of 8, so it can't be all inaccessible to Google. The GP says that blog.jquery.com is inaccessible. So how does jquery.it get a pagerank of 8? Your explanation would look correct if jquery.com had a pagerank of 1 (due to misconfiguration) and jquery.it sneaked in with a rank of 2. But this isn't the pattern.


One factor may be unique fresh content. If jquery.com only returns 500 to Googlebot, jquery.it is one of the most content-rich sites with fresh content on jQuery 1.6.2. When I tried it some hours ago, jquery.com had first place for the query "jquery", but was only on the second page for the more specific query "jquery 1.6.2", while the jquery.it page was still on the first page for that query. It had lost some of its freshness bonus, though (or my search history had pushed it down; you no longer really know what is Google's ranking and what is your own confirmation bubble these days), appearing in seventh place, two places below the OP's post. The other top places were more or less shady news aggregators and download sites. [edit: right now I see heise.de as the highest-ranking download site, in second place on google.de, which is about as reputable as it can get, so Google does actually do something useful here, given the misconfigured jquery.com server.]


It seems that only blog.jquery.com is misconfigured; jquery.com is OK. But it makes sense that this would be a part of the problem, since the blog has the freshest content, and the bad result does come from the blog.

However, this still doesn't explain why jquery.it has such a high pagerank. And blog.jquery.it has a pagerank of only 1. I can't find any interesting-looking links to jquery.it that would explain the high rank. It's also strange that Google returns blog.jquery.it 1st for the search "jquery 1.6.2", since those strings are found at lots of highly ranked blogs (pagerank>1 for sure). We could be dealing, once again, with Google's preference for finding search terms in the domain name. But this puzzle is still not coming together for me. This is why I'd like wiser persons than me to try to get to the bottom of this.


Just look at the backlinks to jquery.it and it should be pretty clear why that site ranks in Google for jquery-related searches.


Can you share some details about what you found? I am not much of a webmaster, but naive searches with Google don't find any massive backlinkage to jquery.it. Maybe they are already cleaning the index, or maybe I didn't search it right.

EDIT: one more data point: Bing doesn't return anything from *.jquery.it for the search in question.


In case the spirit of my post isn't clear, let me be the first to acknowledge that I wouldn't have run into this problem at all if I had been less careless. The purpose of posting this was to raise awareness and hopefully help others avoid making the same blunder (and to point some links at the correct site with good anchor text).


I completely agree with the tone of your post. A spam site outranking the official site of a popular tech download is a big deal. Yes, as a web designer, you should know better, but if they can outrank jquery, they can outrank anything. Including sites meant for non-technical users. How's a non-technical user supposed to know he is making a dangerous download, if there is a nice green lock on his screen and the name of the program is familiar? How do we explain this to them?


Interestingly, I did the exact same thing today: tried downloading jquery 1.6.2, was not careful enough, got a syntax error when I used it.

Just checked, I'd gotten it from jquery.it...


Honestly it seems like the majority of the article's attention is spent criticizing Google for allowing you to download a file from a site you didn't intend to, rather than warning others.


Sadly, I foresee a future where you have to whitelist everything in order to be truly safe. You got screwed over by two characters.


That's where we were pre-Google; we called the whitelist our "bookmarks".


But it can be more automated than bookmarks. One idea: imagine if your searches covered pages across all your friends' bookmarks and your own. Bookmark isn't quite the right term; imagine if it were really easy to say "this I validate", so that everyone who trusts you could search that as well.


That's a really good idea. I wonder why Delicious never did that. They had your "friends" list and all your friends' bookmarks. I'll implement it in yamemex for sure!


I've wanted to build something like my idea for years, but it's not something you can do on the cheap. I've thought about focusing on some vertical to make it cheaper, but meh.


Blekko.com looks to be doing something similar to what you described.


What are they doing?


Basically they're letting people create "slashtags" that restrict a search to a customized list of sites.

E.g., I have one for /objc that searches a half-dozen high-quality, good-signal-to-noise, non-spammy Objective-C related sites, so that if I search for "timers /objc" I get only quality content and no farmed spam.

There's a lot more to it but they've essentially farmed out the job of whitelisting the non-spammy parts of the internet to their users.

Edit: and more to the point of what you were describing, you can easily use other people's public slashtags, and it will detect and suggest relevant ones as you use it. It's totally worth playing around with.


That's very close; if they let me search my friends' slashtags all at once, it would be perfect.


On the plus side: there are a lot more interesting things a spammer could do with a copy of jQuery than accidentally include a syntax error.


As a side note: I did a diff, and it seems that besides some newlines at the top and the text at the end, the scripts jquery-1.6.2.js and jquery-1.6.2.min.js are in fact identical. This could have changed at any time, so from a security standpoint, you could even consider the client page as being compromised.


Exactly. I wonder now if I need to start changing passwords for sites that use jQuery on their login pages. Good thing this did not come out as "lulzsec now has everyone's bank passwords."


HA! I started reading and researching and it just made me laugh.

Guess what you get when you go to http://duckduckgo.com/?q=jquery ... THAT'S RIGHT, an official site logo, because Gabriel is f-ing awesome, and I love DDG's little, almost insignificant features like showing you the official jQuery website vs. what you THINK the official one is, since Google never helps you there.

Also, I have ad blocking and opt-out from Google's ad tracking turned on, so I never suffer these things. But that's what makes DDG so amazing: opt-outs don't matter; it's just so clutter-free. Putting things into context vs. just presenting you with data. Note that in my search results I even get the nice icons indicating whether the result is spam or not. That website for the fake jQuery is... well, it's not even in the first 100 search results; it may be blocked.

Thanks DDG, you just justified your existence yet again.


For what it's worth, Bing (and therefore DDG) gets the more specific search in question right too: http://www.bing.com/search?q=jquery+1.6.2


Yes and no: they point to the official jQuery site only, but the first result is the main page rather than the (more interesting, imo) release post.


The second result is the release post, however - on Google, the release post doesn't show up at all in the first several pages, which is just weird.


If you go to google.com and search for jquery, jquery.com takes the top spot with several sublinks, and jquery.it is nowhere on the first page. It is only for the query "jquery 1.6.2" that blog.jquery.it outranks jquery.com (because blog.jquery.com shuts out Googlebot, as mentioned elsewhere in this thread).


How is the "Official Site" authentication done?


There are far too many "official site" links for me to think it's manual... I know DDG does deep indexing of Wikipedia, perhaps it's using the target of the "Official website" link at http://en.wikipedia.org/wiki/Jquery?

Edit: Yeah, I think that's the case. These searches don't show the "Official Site" badge: http://ddg.gg/?q=jenkins+ci http://ddg.gg/?q=hudson+ci, but these do: http://ddg.gg/?q=jenkins+software http://ddg.gg/?q=hudson+software

Compare those queries to the names of the Wikipedia pages: http://en.wikipedia.org/wiki/Jenkins_(software) http://en.wikipedia.org/wiki/Hudson_(software)


You are probably right. He's probably comparing search terms to wiki information and then looking for homepage info on Wikipedia. You know what, that's good enough for me.


Manually, I would imagine.


That's improbable. I know Gabriel is good, but he's not god.


I love the idea of Duck Duck Go, and would really like to use it, if only it had a different name... Maybe DDG is a solution!


We have the http://ddg.gg/ short domain.


No, "being in a hurry" doesn't excuse downloading and including jquery from the wrong website. He is lucky that there was a syntax error and the script didn't work, this could have turned out much uglier.

Edit: the headline should be: "Everybody watch out, a fraudulent jquery website ranks higher in google than the official website". The syntax error is the best thing that could happen.


No, it doesn't excuse carelessness, but that doesn't excuse Google either. As the saying goes, "Blame is a non-zero-sum game."

We developers should heed his warning to be careful about download sites. AND Google should do a better job of blacklisting spam sites like this.


Algorithmically, how would you tell the difference between a site and its illegitimate clone? It seems like it becomes a problem when it fools enough people that they create natural links to it...


Couldn't they just notice both sites are identical, and one has way more authority?

Doesn't seem that hard to me, compared to the other tasks they do.


jquery.com had the content much earlier than jquery.it - I think it's a safe bet that when there are two sites with nearly identical content, the one that had it earlier should have a higher pagerank


Perhaps a better idea is to notify the user that two sites have nearly identical content, and let the users decide.


Exactly, I don't understand how he can pin this on Google. Sure, it's a problem that Google ranks a ripoff over the main site (as they did with SO before), but if you install jQuery from the wrong site, not once noticing the Italian TLD, that's your own fault.


Dude was careless, but that hardly excuses the blame-the-user mentality. People live in an ecosystem which biases behavior. Dude expected the top-ranking page to be the jQuery site (invalid assumption), and presumably that if the top-ranked site wasn't the jQuery site, it wouldn't be spoofing the actual jQuery site.

Dude fell for what amounts to a phishing scam. Sure he should have been on better guard, but the circumstances definitely contributed to his user error.


Looks like I pissed some people off with this. Maybe a few other people fell for this too?

"Dude was careless" is the entire point. Google should not have ranked that site above jQuery's site, but you should check that you are at least downloading your JS from the right domain.


Now imagine it's your first time getting a copy of JQuery because you've heard how awesome it is. You go to Google to find the JQuery site. How would you ever know it was a fraudulent site?

Google is providing a service that vouches for the authenticity of sites by their ranking in the search results. They failed.


Google is providing a service that gives search results based on keywords you enter.

They try to be helpful by ranking them in some fashion, but at the end of the day it's up to you how you use the results that are returned.


We know from decades of research into human psychology and the causes of accidents that our attention spans are not infinite and that we will often see what we are expecting to see. In my first glance at the spoofed site I did not see the .it in the domain, because the entire context of the site convinced my brain that I would see the URL I was expecting, blog.jquery.com.

We can either rail against the realities of human nature and persist in blaming the user, or we can accept that asking the user to check everything always is a plan guaranteed to fail, and build better tools to help eliminate the problem.

I vote for pragmatism and better tools.


Well, the Italian TLD is not really a red flag in these days of "cute" domain names. Which is the real page for BackPack?

http://backpackit.com/

http://backpack.it/

If the "wrong" one had the same content as the right one, would you know which one to pick?


The website that actually works, in this case...?


Yeah. Let's depend on people always doing the right thing. That's why even flight regulations depend on everybody being extra-careful.

Only not: http://en.wikipedia.org/wiki/Tenerife_disaster#Safety_respon...


This is very weird. jQuery.it has the same PageRank as jQuery.com (8/10). The site dates back to 2007 [1], although it didn't start copying the jQuery site until 2009 [2].

It also has cloned the subdomains: http://dev.jquery.it/ and http://forum.jquery.it/

What this site appears to do is mirror the content of jQuery.com by copying everything and then appending the "Time to generate" string. I just checked Adblock; it also adds a Google ad, which is the point of all this.

Obviously Google has messed up big time, but so has the whole web, by linking to a fake site so much that it has the same PageRank as the original.

1: http://wayback.archive.org/web/20071001000000*/http://www.jq...

2: http://wayback.archive.org/web/20090601000000*/http://www.jq...


It's fairly easy to fake PageRank via cloaking. You basically just redirect the Google toolbar queries to the real domain. Of course you're not actually fooling Google - just people using their toolbar or similar tools to check PageRank.

That's probably what happened here. The jquery.it domain has nowhere near enough link strength to get an 8/10:

http://www.opensiteexplorer.org/jquery.it/a!links

jquery.com, for comparison:

http://www.opensiteexplorer.org/jquery.com/a!links


I think the cloaking was fixed a long time ago. And Google IS being fooled -- that's why it is ranked first for that query.


> It also has cloned the subdomains

and http://code.jquery.it. It's an exact mirror of jquery.com; the only differences are the TLD, the borked download file, and the ads.


Google didn't just misrank this fraud; they also provided the loot (AdSense revenue) that likely motivated the crime.

They should ban not just jquery.it from both natural rankings and AdSense, but every other site on the same AdSense account and with the same registered domain owner.


And if the guy running jquery.it was smart, he'd have the custom jQuery he's hosting randomly replace people's AdSense blocks with his own to make even more money.


If he was really smart, he'd have the custom jQuery also add random links to him. To make sure that he stays at the top of the ranking.


Links written from javascript are not indexed by Google, IIRC.



All of that requires server-side coding in order to present the content otherwise provided by JS. It wouldn't do anything in this case.

Although altering jQuery to add a link at the bottom of every textarea field...


Maybe we should also reconsider the claimed neutrality of search results provided by a company that makes its money by delivering targeted advertising.


What I find incredible about this is that Google has gotten so good at returning exactly what we want that we no longer bother checking the authenticity of it, even when it's something specific. If Google was "lagging behind on the game" as some people are suggesting, this never would have happened because we wouldn't be trusting it to return what we want.


I would frame the situation a little differently: Google got good enough that we started to trust it for this purpose (which we might call a "brand-name lookup service"), but then in the last few years it has declined and become untrustworthy.

IMHO Google is very vulnerable to competition by new brand-name lookup services.


So I guess you're saying you're not feeling lucky anymore?


Speak for yourself, I always make sure I'm on the vendor's/author's official site before I download code. I never download code from 3rd parties.


Genuine question: how do you do that?


Look at the domain. If I'm downloading jQuery, it should be jquery.com. If I'm not sure what the domain should be, try to get it from someone I do trust, like the Google-hosted libraries[1]. If it's from GitHub or BitBucket, try to make sure the person who owns the repo is actually the one maintaining it (so I don't get a broken copy or one with something "extra"). It's all mostly just common sense and a little bit of extra scrutiny.

[1] http://code.google.com/apis/libraries/devguide.html


I'm not saying I would do this, I'm just kind of amazed that Google has gained so much trust that some people have stopped bothering.


This should be easy to sabotage/fix for the real jQuery.com guys. Since the fake .it site hotlinks directly to the CSS and custom JavaScript from jquery.com, they can add some code that warns users that the site is a fake, automatically redirects all traffic to jquery.com with a 301, adds canonical headers, etc.
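A rough sketch of the hotlinking angle (a hypothetical Node.js handler standing in for whatever actually serves those files; not the jQuery team's real setup): when the Referer header points at jquery.it, serve a redirect stub instead of the genuine script.

    var http = require('http');
    var fs = require('fs');

    http.createServer(function (req, res) {
      var referer = req.headers['referer'] || '';
      res.writeHead(200, { 'Content-Type': 'application/javascript' });
      if (/jquery\.it/i.test(referer)) {
        // Requests hotlinked from the clone get a stub that bounces visitors home.
        res.end('window.location = "http://jquery.com/";');
      } else {
        // Everyone else gets the genuine file.
        fs.createReadStream('./jquery-1.6.2.js').pipe(res);
      }
    }).listen(8080);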


Then the fake site would just copy those as well.


That's kind of the point, is it not?


Uh not really. Copy, not hotlink. The fake site would copy the originals (no referrer, no special sauce) and would be back to the current situation: a fake site looking exactly like the real one and ranking higher on google.


It would still be possible, from the javascript itself:

    if(document.location.host != "jquery.com") {
      document.location = "http://jquery.com";
    }


What's keeping the jquery.it people from changing that?


Well, it would work for a while, since I doubt the owner of the fake site is actively monitoring it. He probably has a bunch of similar scraped sites running on autopilot. Even when (if?) he notices it, he'd have to change the operation of the bot to rewrite the links. At that point the SEO would be quite damaged by the canonical tags.

If the canonical tags were added stealthily he probably wouldn't notice at all until his SEO was completely destroyed.


Maybe, but if I were writing a script to scrape and clone a site I'd probably strip canonical meta tags anyways.

There needs to be an international douche law to serve as a deterrent to this kind of behaviour.


Not only has Google ranked the site highly, it seems to be the financier too. It looks to me like this was done just for the AdSense income.

Why was this done? Here were my first few thoughts:

1. Display ad revenue - Maybe initially; there is AdSense markup, but the ads aren't showing for me, so perhaps Google has disabled them.

2. Affiliate income from the links to jQuery books - I can't see an affiliate code in the links so probably not.

3. Hijacking the Donate button - No. This leads to a blank page with just a Time to Generate snippet.


I highly recommend using the Google-hosted versions of JQuery rather than downloading them and hosting them on your own. You get the benefits of faster downloads through Google's CDN and since many websites use these, they're pre-cached in browsers: http://code.google.com/apis/libraries/devguide.html#jquery
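Concretely, that means dropping the local copy and pointing your script tag at the CDN (the URL below follows the ajax.googleapis.com pattern documented at that link):

    <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.min.js"></script>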


This is yet another example of why having more top-level domains will make things worse, not better.


I don't think the author is to blame at all. I do stuff like this all the time - I'm working on something, google for it, download and keep on moving. I use google constantly throughout the day, and trust the top result to be authentic. As it turns out, that trust may be unwarranted.


>> If you’re in a hurry to download a copy of the latest jQuery revision

Pardon? If you download just any jQuery without even checking the domain you are downloading from, then you are very careless. That's just like typing your Paypal password into a form on a website that was linked in an email that looked like it came from Paypal...

Your copy of jQuery will be able to see anything that happens on the site you are writing, send any user password to an external server, read session keys, query your API for any data as a logged-in user, etc. You could even build a botnet out of modified jQuery libraries.

Whenever you download executables, make sure you know where they are coming from!


There is a fairly simple heuristic that could combat this, which I assume is possible with Google's architecture (which I know little about, so I could be wrong here!):

* if a page A refers to the same external css and images as page B on another site, and those external resources are local to B, then assume B is more original and should be ranked higher than A.

Of course the SEO people will get around this by making sure they take copies of the css and image assets as well as the html ones once this is implemented, but at least it'll save the "target" site a little bandwidth.
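A toy illustration of that heuristic (plain JavaScript; the page objects are made-up stand-ins, not anything a real crawler uses):

    // Each "page" records its host and the absolute URLs of the CSS/image
    // assets it references.
    function looksMoreOriginal(a, b) {
      // Count how many of A's assets are actually hosted on B's domain.
      var hotlinked = a.assets.filter(function (url) {
        return url.indexOf('//' + b.host + '/') !== -1;
      }).length;
      // If A leans on B for its own styling and images, treat B as the original.
      return hotlinked > 0 ? b : a;
    }

    var clone    = { host: 'blog.jquery.it',  assets: ['http://blog.jquery.com/style.css'] };
    var original = { host: 'blog.jquery.com', assets: ['http://blog.jquery.com/style.css'] };
    console.log(looksMoreOriginal(clone, original).host);  // "blog.jquery.com"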


> but at least it'll save the "target" site a little bandwidth.

Which really is not the issue at all. Do you also suggest that people fearing home invasions paint their walls red, so the blood splatters are hidden if they get shot?


It would also cost the scammer bandwidth. Though that is probably paid for on a stolen credit card anyway...

Unfortunately the real problem Google is unable to do much about, aside from a few high-profile things (jQuery would count as high enough profile, but many similar libraries would not). How do they know, given two apparently identical chunks of content, which is the original source?


Great find. Google is really lagging behind in the game. An Italian blog should never rank higher than the jQuery site. If Google loses our trust, and we have to be on our guard all of the time, ...


I bet there's a codepath in the magic Google algorithm that says, "all things being equal, favor the page that has AdSense." Except normally, things aren't quite this equal.


JonnieCache, that couldn't be farther from the truth. Google has recently been downgrading sites that have too much AdSense on them. The AdSense team has been at odds with the Google organic algo team for a while now.


Yeah, I wasn't being entirely serious, guess that doesn't come across. It doesn't take much thought to realise that there are lots of glaring problems with my reasoning there :)


Not necessarily; it looks to me like your comment was serious. Unfortunately, a lot of people think that just because it's Google, it is going to favor sites that have AdSense on them, or even that advertising via PPC will help your organic ranking.


It doesn't take long at all, but there are enough tinfoil-hat folks out there that I assumed you were serious.


How is this even possible? Doesn't PageRank work by counting links? Does jquery.it have more inbound links than jquery.com?


Doesn't PageRank work by counting links?

This is a very naive view of Google's algorithms. While link count plays a significant role in ranking, there are many more factors (including various secret sauces) that determine where a given link will end up in the SERPs.

Still, it really seems odd that Google shows a fake as the top result here. I just ran the search myself and while jquery.COM ranks first for a generic 'jquery' search, the release blog post doesn't appear at all when searching for 'jquery 1.6.2 released'. The closest you get is a link just to 'blog.jquery.com' with the title "jQuery:". It makes me wonder if something went wrong with the blog release that affected the way the "Google Juice" flowed down to that post, and this fake site managed to capitalize on it.


If I now do a search at google.co.uk for "jquery 1.6.2" the linked article is the first hit.

The .it site in question is the fourth.

Things move fast on the interweb; I can't agree with anyone claiming this is Google's fault.

With the size of the database, the breadth of queries that are done against it and the myriad of possible returns - how could they reasonably police it?


Here in the US, the linked article is now first... and jQuery.com doesn't show up until the second page.


I would have just typed in 'jquery.com' and then clicked the link for the download page.


Because oftentimes you don't remember the domain exactly, and then you've got the same problem. Anyone remember whitehouse.com, or countless other examples of this? The fact that Google is probably 99%+ effective at getting you where you want to go heavily disincentivizes relying on your brain, which may not have as high a success rate as Google. Plus, all the browsers have gone in this direction, since search terms are more usable than domain names.


Of course. This could have happened to any site, but in this case I just found it funny because 'JQuery -> jquery.com' is so simple and obvious. I'm sure the author knew the actual domain, but used google out of habit or preference.


Exactly. Why do people trust their search engine, Google, so much? And what would be the point of typing jquery.com into Google?

There is an address field, you know.


What worries me about this is the original intentions of the owners of jquery.it. Obviously they planned something malicious, and were setting up for it, only random chance resulted in them getting outed before they could begin deploying their actual malicious code.

Frankly, had you been more careful, worse could have happened down the road. Still, I would be interested in seeing exactly how jquery.it made it to the top of the search listings.


What worries me about this is the original intentions of the owners of jquery.it.

Diff suggests the plan was probably "Serve AdSense ads." That is the punchline to quite a bit of spam.


In short -

Beware of downloading JQuery from jquery.it, which may appear before jquery.com in related Google searches. The end.


I'm amazed at how many people jump to conclusions and say that a bad Google search result or search spam is always the fault of SEO. SEO is not evil, and it is not all search engine spam. Most of the time, the SEO that I do has to do more with cleaning up bad or sloppy code and web design errors.


The jQuery team is well aware of this issue.


The victim of SEO?

No, the victim of a scam and of Google. The site barely has any inbound links; if this is SEO, they suck.

This is a Google fail, not an SEO fail.


Saying that you're a victim of "SEO" is about as accurate as saying you are a victim of "Web Development."


I think the blogger was being an idiot. You should check the authenticity of what you are downloading, not just snag it.

It's like eating a kebab dropped in the street.


Technically, how should he know jquery.com is more trusted than jquery.it?

jquery.com does NOT appear to have a fully valid SSL certificate: Chrome gives me "the site's security certificate is not trusted!"

Like it or not, Google is an important part of establishing reputation -- that's what pagerank was built on initially and if that becomes worthless then finding the true source of something becomes very difficult.


jquery.com does NOT appear to have a fully valid SSL certificate

Hypothetically supposing that jquery.com had a lovely little green lock, that wouldn't matter, because on jquery.it a) you wouldn't be looking for the lovely green lock and b) if you did look for it, look here, a lovely little green lock and c) you didn't click the lovely green lock to see who it was issued to but if you did d) it was issued to jquery.it, which matches the address in your bar.

SSL solves one problem, really really nicely: it makes it impossible to eavesdrop between the user and a trusted endpoint. It does basically nothing to make sure that the trusted endpoint is the one the user thinks they are interacting with.


True -- the green lock itself wouldn't help here. I was thinking more along the lines of code signing certificates.

When I visit my bank's web site and drill into the certificate details, I can at least establish that someone my browser vendor trusts (or someone they trust...) issued the certificate to an _organization_ called 'Bank of Nova Scotia' in Toronto, not just to the domain name.

If I was able to register micr0soft.com then hopefully I would have a hard time getting an SSL certificate issued for it. I know there have been a number of discussions on certificate infrastructure here that show how complex this can become.


SSL certificates bring nothing other than a false peace of mind. I've seen fake antivirus software that goes to great lengths to provide verified (!) SSL-encrypted pages to steal your credit card details.


Well, that, and actually allowing SSL sessions to be encrypted without being trivially susceptible to MITM attacks.


Which fake software is this? If it's already taken control of the client side, too, couldn't it just be altering the root certificate set rather than exploiting some weakness of the union of all of the existing roots (which no doubt have many such weaknesses regardless)?


"Vista Security 2012". It can't touch the root certs as you need elevated privileges to do that. The entire thing hijacks the user's shell via the registry. You can log in as another user on the machine and it appears not to be infected.

Quite well designed really :-)


Find the GitHub account with the most forks and followers. That's probably the official repository, and it will link to the official website. They may not use GitHub, of course, but there are other similar methods.

Other good signs are being linked to from cdnjs, cached-commons, or microjs. If I'm looking to scratch a JavaScript itch, I'll first browse these sites to see if there is a popular tool.

Also, if you're looking for jQuery then it's because you've read about it online somewhere. Simply go back and follow the links.
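A quick sketch of the "most forks/stars" lookup using GitHub's public search API (run it from a browser console; it assumes the API's current response shape, and popularity is of course only a heuristic):

    fetch('https://api.github.com/search/repositories?q=jquery&sort=stars&order=desc')
      .then(function (res) { return res.json(); })
      .then(function (data) {
        var top = data.items[0];
        // full_name, forks_count and homepage point at the likely official project.
        console.log(top.full_name, top.forks_count, top.homepage);
      });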


Well, in this specific case, the header comment in http://code.jquery.it/jquery-1.6.2.js still references jquery.com


This is an interesting point - why does jquery.com have a https version in the first place? And why did someone bother to set it up with a self-signed certificate?

Sounds like perhaps someone was testing something long back (cert was signed in 2009) and just never turned it off.


This is a great question for new users of jQuery, or indeed any software I need to download and integrate with my software/website. When a user is hit with malware, it affects just them (and maybe their email/facebook friends). If I download malware and incorporate it into my software, then I'm now distributing it to my users!

The way I do it is to look for the community. GitHub is a good place to look, and HN is itself a good source of vetting. Google, certainly, but not just the first link I find. In fact, when I first heard about jQuery, I didn't assume that the "real" site could be trusted either: if I'm going to install this on my site and serve it to people who trust me, then it had better be trustworthy.

Now imagine I run a tutorial website, and people come to my site because they trust me, and then they install software they copied from me (or my links), and distribute that to their users. Wow. Kudos to the author: I think it was bad form to blame google here, but the fact that he admitted it all does a lot to reestablish trust.


Just to be clear, I didn't distribute the janky copy of jQuery to anyone myself. I test my samples pretty thoroughly before publishing, and definitely would have caught something like this.

The situation here was that someone was using one of my samples from the jQuery 1.2 era and wanted to see if it would work with 1.6.2. He downloaded the ".it" copy of jQuery to test it with, got the syntax error when he used it in my sample, thought it was because my code didn't work right with 1.6.2, and got in touch with me about it. That's about where the post picks up, when I rushed to grab a copy of 1.6.2 via Google and made the mistake of downloading the ".it" copy without noticing.


Sorry, I didn't mean to suggest in the last paragraph that that actually happened, but it does read that way on review. My apologies.


No worries. I just wanted to make sure you (or others) didn't think I was that careless.


It's not the same. If I type jQuery into Google, click one of the top results, and the site looks exactly like the jQuery site, I'd probably be fooled too. The domain is close enough that you won't catch it out of the corner of your eye.


With the upcoming version of Chrome, there won't even be a URL bar. AFAIK Firefox wants to get rid of it too.

Now, is this an argument for keeping the URL bar? It's obviously error-prone, but the other methods of establishing identity don't seem to be there yet either.


The URL bar will be there; it just won't be visible at all times. It'll still appear when the page is loading or when you select the tab.


Whether he should have downloaded from that site or not, it's ridiculous that it was showing up above the official site in search results. That's his main point.


I think the ridiculous thing is more that he spent that much time making a blog post because he made a mistake whilst obviously trying to do something quickly without thinking.


"MAJOR BUG! Google search engine found to be open to SEO abuse!!!"

Except that this wouldn't be news. And because it wouldn't be news, the blame falls entirely on the author.

The author got phished. Kudos for letting everyone know about it. -kudos for blaming Google.


Really???? Mod down???? How about a comment explaining which bit you disagree with?


What if Google put subtle warnings on websites from TLDs other than .com, .net, or your own country's ccTLD?

Alternatively, Google could just work on spotting phishing and spam sites.



