How not to do URL redirects (… the way Quora does) (webengage.com)
72 points by acharekar on Jan 19, 2012 | 52 comments



Best I can tell, there is zero incentive for Quora (or any other site, for that matter) to care. Their current redirect logic in no way hurts their user experience.

Right now they protect their users' privacy. What benefit do they realize by providing their users' viewing history to other sites?

I personally think that the referer header was never a good idea. I disable it in my browser, and appreciate sites that do right by their users with privacy-protecting default behaviors.


I think that it does benefit Quora for content providers to see how much traffic is being generated from their site. If I knew an article was getting a lot of traction on a site I would spend more time on there, perhaps participate and continue to improve and generate content myself, thus benefiting Quora with more data and more links for everyone.


Of course there is zero incentive for anyone to do it. And if everyone chose to link the way Quora does, you get a Google Analytics dashboard which cannot tell you which URLs are sending traffic to your site/blog. I find it really difficult to imagine.


The long term effect would be that websites can no longer use referrer as a metric. What difference would that make? HTTP resources (webpages) shouldn't change semantic meaning depending on the referrer anyway. Doing so is arguably an unintended use (or abuse) of HTTP.


Absolutely! And see the funny reasons people have been citing in favor of such an act - http://www.quora.com/Why-does-Quora-redirect-to-URLs-in-a-wa...


I hate to crash the party, but why is the premise that "overriding links is absolutely okay" taken for granted?

Says who? Google and their `/url`? Facebook and their `l.php`?


Why wouldn't it be okay? This is a link on their own website, they can control it how they want.


That's the point. It is okay as long as they play nice with HTTP headers and other info which needs to be passed downstream.


Quora don't "need" to do anything. You just want them to.


On sites where there's private information in the URLs + links to external sites, overriding the referrer is necessary in order to protect users' privacy / identity. See https://www.facebook.com/notes/facebook-engineering/protecti... for why we do this at Facebook (I work on that system).


Quora's question pages are community-built public pages. If they have private urls in account pages, they should limit referral protection to them. Also, I wonder, doesn't the 200 OK code confuse search bots?


Indeed, referers are useful information in some cases. For bookmarking apps like http://noteplz.com one useful thing is that along with the bookmark, they also store the referer, so you can later go back to the google search result where you found that bookmark.

On the other hand, with https and url shorteners, referers are a dying breed. The situation with URL shorteners is absurdly funny now, because twitter double-shortens the shortened urls, since most popular sites have their own shortener.


I don't think url shorteners really hurt referrers at all. They typically use 301 or 302 redirects, which preserve the original referrer.


... however since twitter is under https, all of them end up being "t.co". And even if it wasn't https, a double redirect would lose the referrer


Tracking helps you build great analytics. I, as a developer, would have otherwise no idea of what's happening in my app.


Do it client-side. Don't break the web.


Doing it client side has a couple issues.

1. you need to block the click event until you get a response from your analytics endpoint. Google suggests doing this by adding a 100ms delay: http://support.google.com/googleanalytics/bin/answer.py?hl=e...

2. you might get holes in your data for a number of reasons: the user has JS turned off; 100ms isn't long enough for the request to go through; or the user might click off before your script can attach itself to the onclick event.

You definitely don't want to get yourself in a situation where you go down and all outbound links stop working, but if you can fail gracefully, replacing the link makes a lot more sense.


Not to mention that your analytics code won't fire if the user opens a link by any means other than a standard left-click (e.g. middle-click, right-click -> open in new window/tab, or keyboard navigation)


Why does the delay matter? They have to wait for a response regardless of whether it's an AJAX request or a full browser redirect to the record/redirect URL.


This is probably not the case, but is it possible that Quora is intentionally stripping the referer header? Duck Duck Go does just this in the interest of user privacy: why should site X know where I came from and what I was searching? https://duckduckgo.com/privacy.html Seems unlikely in this case but possible.

Incidentally, it seems that encrypted.google.com does this but not regular google. EDIT: This happens for all https->http requests, it's not a google feature (TIL).


The User-Agent generates the Referrer header, not the site. Also, encrypted.google.com doesn't do it; the HTTP specification says that browsers shouldn't send a referrer header in a non-secure request when the referring page was loaded over https.


You are right, I'm writing carelessly. I meant strip loosely as "causes the header to not be sent" or not in full.


encrypted.google.com does this because it uses https. If a website is accessed from https and a link points to anywhere except another secure location, then the referrer is not sent.
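The rule described here can be modelled in a few lines. This is only a sketch of the classic browser default (drop the Referer when going from an https page to a non-https one); `refererFor` is a hypothetical helper, the URLs are placeholders, and real browsers layer further referrer policies on top.

```javascript
// Hypothetical helper: models the classic default where a browser omits the
// Referer header when navigating from an https page to a non-https target.
function refererFor(fromUrl, toUrl) {
  const from = new URL(fromUrl);
  const to = new URL(toUrl);
  const downgrade = from.protocol === "https:" && to.protocol !== "https:";
  if (downgrade) {
    return null; // no Referer header is sent
  }
  // Fragments and credentials are never part of a Referer value.
  from.hash = "";
  from.username = "";
  from.password = "";
  return from.href;
}

console.log(refererFor("https://encrypted.google.com/search?q=x", "http://example.com/"));
console.log(refererFor("http://a.example/page", "http://b.example/"));
```

Under this model the destination in the first call sees no referrer at all, which matches what the comments above report for encrypted.google.com.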


I don't see how this could be the result of a simple mistake. There doesn't seem to be any reason to do redirects this way except hiding the referrer.


Exactly what is pointed out in the post. Why would someone want to hide the original referrer for a link?


That's not the only thing they're doing...

http://nerdr.com/quora-needs-to-die/

Seems they're going down the route of annoying search visitors by hiding information (similar to what expertsexchange was railed on for, although not quite as bad yet).


It's most likely done intentionally to protect against leaking the clicker's identity. See the issue Facebook had back then: http://www.benedelman.org/news/052010-1.html


Sending an incorrect site referrer to a downstream website doesn't solve the identity problem! HTTP headers have existed even before all these applications came into being. One just has to abide by some of those basics.


It can be fixed through a double redirect. Basically, redirect the browser to an internal page that redirects to the original page and have that page redirect to the outbound link.

For example:

Say you're on this page: http://site.com/article?_uid=123 (_uid being the identity leaking query param) and clicked a link that appears to point to: http://google.com/

When a user clicks on that link, the page redirects the user to http://site.com/redirect?target=http%3A%2F%2Fgoogle.com&...

The server will then redirect the browser back to: http://site.com/article

And when the server sees that request with referrer set to /redirect?target=http%3A%2F%2Fgoogle.com, it will then parse out the target url and redirect the browser to http://google.com.

This way, the target url can be given a meaningful referrer url without compromising the user's identity.


Isn't that exactly what Quora is doing?


OP's blog post says Quora is not doing that. It says Quora's redirecting to gigaom.com from http://www.quora.com/_/redirect?url=http%3A%2F%2Fgigaom.com%... instead of http://www.quora.com/What-are-everyday-apps-that-use-cloud-c....

The technique I described allows Quora to customize the referrer associated with an outbound link.


Ah yes, I misread your post. The trouble with that approach is that you have to enumerate the dangerous params, and if the actual page URL needs a private parameter to work, you can't get rid of it.


Right, but you can always pass the canonical url to the redirector. That lets you avoid maintaining a whitelist/blacklist of query params. This should be trivial for Quora as most of their pages already contain the meta tag specifying the canonical url:

    <link rel="canonical" href="http://www.quora.com/What-are-everyday-apps-that-use-cloud-computing" />

They just need to update their outbound link interceptor to take that version instead of the actual url.
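A minimal sketch of that change, assuming a redirector that takes the canonical URL and the outbound URL as query parameters. The function name, the `a.example` and `b.example` hosts, and the parameter names `canonical_url` and `outbound_url` are illustrative assumptions, not anything Quora is known to use.

```javascript
// Hypothetical link builder: the redirector URL and parameter names are
// illustrative assumptions.
function buildRedirectLink(redirectorUrl, canonicalUrl, outboundUrl) {
  const link = new URL(redirectorUrl);
  // searchParams.set percent-encodes the values.
  link.searchParams.set("canonical_url", canonicalUrl);
  link.searchParams.set("outbound_url", outboundUrl);
  return link.href;
}

// In a browser, the canonical URL could be read from the page's canonical
// link tag instead of location.href, for example:
//   document.querySelector('link[rel="canonical"]').href
const link = buildRedirectLink(
  "http://a.example/redirect",
  "http://a.example/pages/3", // canonical URL, no private query params
  "http://b.example/"
);
console.log(link);
```

Because only the canonical URL is handed to the redirector, there is no list of dangerous query parameters to maintain.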


We let you create surveys and display those on your website in a “targeted” manner

A better title for your article would have been:

why to never rely on referers

(which can be blocked or purposely malformed)


Absolutely! The post might have got some attention from Quora in that case :)


Would we get the right referer if 302 is done via quora redirect?


Not sure if I understood this correctly. If Quora chose to send a Location: some-url and Status: 302, it would have definitely worked as expected.


So what should an app do if it wants to track all outbound links and send the real url as referer to the outbound link?


I've described a solution in a different comment on this thread. For each outbound link on the page, build a link that points to a redirector that accepts two query parameters: current page's canonical URL and outbound link's URL. The redirector will redirect the browser back to the canonical URL. Upon receiving the request for the canonical URL, instead of serving normal content, the server redirects the browser to the outbound link's URL on the condition that its referrer came from the redirector. This way, the outbound link gets the correct referrer without using any javascript wizardry. In fact, you can use this technique to customize the referrer to whatever you want.

1. Browser visits http://a.com/pages/3?privacy_leaking_param=1

2. User clicks on an outbound link: http://b.com/

3. Browser gets redirected to redirector at:

    http://a.com/redirect?canonical_url=http%3A%2F%2Fa.com%2Fpages%2F3&outbound_url=http%3A%2F%2Fb.com%2F

    "canonical_url" is set to "http://a.com/pages/3"
    "outbound_url" is set to "http://b.com/"

4. Redirector logs the request and redirects browser to canonical_url (i.e. "http://a.com/pages/3")

5. Code behind http://a.com/pages/3 checks the referrer to see if it came from the redirector.

5a. If it did, parse the outbound_url from the referrer URL and redirect the browser to that URL.

5b. If it didn't, serve normal content.

Basically, every content page needs to also act as a redirector and only redirect when the referrer indicates that the previous request came from the redirector.
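The decision logic in steps 4 and 5 can be sketched as two plain functions. Everything named here (`a.example`, `/redirect`, `redirectorStep`, `contentPageStep`) is a hypothetical stand-in. The sketch deliberately returns an abstract "forward" action rather than a specific HTTP status, because whether each hop is an HTTP 3xx response or a client-side navigation affects which Referer the browser actually sends, and the description above does not pin that down.

```javascript
// All names here are illustrative assumptions.
const SITE_ORIGIN = "http://a.example";
const REDIRECTOR_PATH = "/redirect";

// Step 4: the redirector logs the click and sends the browser on to the
// canonical URL of the page the click came from.
function redirectorStep(requestUrl, clickLog) {
  const params = new URL(requestUrl).searchParams;
  const canonicalUrl = params.get("canonical_url");
  const outboundUrl = params.get("outbound_url");
  clickLog.push({ canonicalUrl, outboundUrl });
  return { action: "forward", target: canonicalUrl };
}

// Step 5: a content page forwards to the outbound URL only when the Referer
// shows the request came from the redirector (5a); otherwise it serves its
// normal content (5b).
function contentPageStep(refererHeader) {
  let referer = null;
  try {
    referer = new URL(refererHeader);
  } catch (err) {
    // Missing or malformed Referer: fall through to normal content.
  }
  const fromRedirector =
    referer !== null &&
    referer.origin === SITE_ORIGIN &&
    referer.pathname === REDIRECTOR_PATH;
  const outboundUrl = fromRedirector
    ? referer.searchParams.get("outbound_url")
    : null;
  return outboundUrl
    ? { action: "forward", target: outboundUrl }
    : { action: "serve" };
}
```

Checking the referrer's origin as well as its path keeps other sites from triggering the forward with a crafted Referer-like URL of their own.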


When a user submits a link and before inserting it into HTML, URL encode it and append it to a generic redirector, such as

www.example.com/redirect?url=http%3A%2F%2Fwww.example.net

www.example.com/redirect should record url and return 302 with Location set to http://www.example.net.
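A rough sketch of both halves, assuming a Node-style server. The host and the `url` parameter follow the example above; `wrapLink` and `handleRedirect` are made-up names, and the handler returns a plain object instead of writing to a real response so the logic stays easy to follow.

```javascript
// Hypothetical helpers for the generic redirector described above.

// Called when a user submits a link, before it is inserted into HTML.
function wrapLink(submittedUrl) {
  return "http://www.example.com/redirect?url=" + encodeURIComponent(submittedUrl);
}

// Records the target and answers with a 302 pointing at it.
function handleRedirect(requestUrl, clickLog) {
  const target = new URL(requestUrl).searchParams.get("url");
  if (!target || !/^https?:\/\//i.test(target)) {
    return { status: 400, headers: {} }; // refuse missing or non-HTTP targets
  }
  clickLog.push(target);
  return { status: 302, headers: { Location: target } };
}

const clicks = [];
const wrapped = wrapLink("http://www.example.net");
console.log(wrapped);
console.log(handleRedirect(wrapped, clicks));
```

Rejecting targets that are not http or https is an extra precaution added in this sketch; it is not part of the original suggestion.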


Can’t you track it (on the server) on the page that sends the 302 response?

Another option would be to link to the real URL, and make a synchronous XHR from JavaScript (to your server) when the link is clicked.


Upgrade the links with jQuery. Here's a simplified version of what I use:

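    // On mousedown, remember the real href and swap in the redirect URL,
    // so the click (or a right-click "open in new tab") goes through it.
    $("a").bind("mousedown", function(e) {
        $(this).data("href", $(this).attr("href"));
        $(this).attr("href","http://example.com/redirect?url=" + $(this).attr("href"));
    });
    // Shortly after mouseup, restore the real href so hovering still shows it.
    $("a").bind("mouseup",function(e) {
        var el = $(this);
        setTimeout(function() {
            el.attr("href", el.data("href"));
        },10);
    });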

This works by switching the url when a user clicks a link to your redirect url, then switching it back a fraction of a second after they mouse up. This means that your redirect works even if the user right clicks and opens in a new window / tab and when a user hovers over a link, they still see the normal URL in the status bar.

On the /redirect url just log any data you need and send a 301 or 302 redirect. The destination site will see your original page as a referrer, not your redirect url.


It doesn't work for keyboard access in the sense that you don't insert the redirect, but at least the link still takes them to the right place.

Seems like the original link following the `url=` should be processed by encodeURIComponent or else any original urls with chars like "&" will break.
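To make the breakage concrete, here is a small sketch comparing the two ways of building the redirect URL. The hosts and the sample link are placeholders.

```javascript
const original = "http://example.net/search?q=a&page=2";

// Unencoded: "&page=2" is parsed as a separate parameter of the redirect
// URL, so the redirector sees a truncated target.
const naive = "http://example.com/redirect?url=" + original;
console.log(new URL(naive).searchParams.get("url"));

// Encoded: the whole original URL survives the round trip.
const encoded = "http://example.com/redirect?url=" + encodeURIComponent(original);
console.log(new URL(encoded).searchParams.get("url"));
```

The first call logs the link cut off at the `&`, the second logs it intact.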


The best way to do it is probably to track clicks on outbound links using javascript.


Aren't there a few cases when this method won't work?


Are you referring to the fact that the browser will interrupt your tracking request because it already started loading the linked page? I haven't really tried, but I believe this can be dealt with if your server-side code expects it to happen.


Since you are a hosted service, you could periodically loop through all of the Quora redirect links you've received and resolve them. This might be against Quora's TOS, though.

I believe Twitter does this with URL shortener links posted in tweets.


Has anyone asked on Quora why Quora does this?


Someone finally asked the question on Quora - http://www.quora.com/Why-does-Quora-redirect-to-URLs-in-a-wa...


A tweet was sent to Quora engineering - https://twitter.com/#!/Sengupta/status/160044848697704448. Could not find any question on Quora though.


So Quora works for you now? That must be nice...


Seems you saw a Quora survey on our site? We had to change the targeting rules to make it a generic "referring site starts with Quora.com" kinda rule instead of specific URLs :(
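A generic rule like that might be sketched as a hostname check rather than a literal string prefix. `referrerIsFrom` is a hypothetical helper for illustration, not the actual targeting code.

```javascript
// Hypothetical rule check: true when the referrer's host is the given domain
// or one of its subdomains, whatever the path or query string.
function referrerIsFrom(referrer, domain) {
  let host;
  try {
    host = new URL(referrer).hostname.toLowerCase();
  } catch (err) {
    return false; // empty or malformed referrer never matches
  }
  return host === domain || host.endsWith("." + domain);
}

console.log(referrerIsFrom("http://www.quora.com/_/redirect?url=x", "quora.com"));
console.log(referrerIsFrom("http://notquora.com/", "quora.com"));
```

Matching on the parsed hostname avoids false positives from look-alike hosts that a plain "starts with" comparison on the raw string could let through.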



