
I too know the websites you visited - oxplot
http://oxplot.github.com/visipisi/visipisi.html
======
ithrewitallaway
Throwaway account. My company created an analytics product around the ability
to track which sites your visitors have visited. It used a different and (at
the time) more reliable technique.

About a year after the product launch we were contacted by a powerful
Washington-based lobby group that wanted to chat. They felt it violated a
site visitor's "reasonable expectation of privacy". I agreed. So we pulled the
feature and dodged a bullet, as this "browser bug" hit the mainstream press a
few months later. The feature wasn't a major part of our product's value prop;
few of our customers used it and none missed it.

So if you're thinking about basing a startup on this, don't. You will get a
call very quickly from organizations much larger than you are asking awkward
questions.

~~~
swah
Open-source it?

~~~
evan_
I'm sure it's the trick where you make a bunch of links to various sites, set
a :visited color, and then interrogate all of the links with javascript to see
what color they are. It's no longer possible on most browsers.

------
kingkilr
Not even close. While I had visited all the sites it said I had, it missed
tons of other sites I'd visited.

~~~
burgerbrain
It said I have not visited any of them, except google (there it just says
"whoops", I'm not sure if that is a hit or not).

~~~
shdon
Yup, got the same. Then rerunning it said I'd visited almost all of the sites.

~~~
deutronium
You'd expect that, as it works by loading an image from each of those sites.

So your cache will end up with images from each of those sites.
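
A toy model of why a rerun differs, assuming the cache behaves like a simple
set of URLs. The `probe` helper and the example URLs are made up for
illustration and are not taken from the demo's source.

```javascript
// Toy model: the browser cache is a Set of image URLs already fetched.
// probe() reports whether the URL was cached, then "loads" it, which
// stores it in the cache as a side effect.
function probe(cache, url) {
  const wasCached = cache.has(url);
  cache.add(url);
  return wasCached;
}

// Hypothetical image URLs; only the first one was fetched by real browsing.
const urls = [
  "http://site-a.example/logo.png",
  "http://site-b.example/logo.png",
  "http://site-c.example/logo.png",
];
const cache = new Set([urls[0]]);

const firstRun = urls.map((u) => probe(cache, u));
const secondRun = urls.map((u) => probe(cache, u));

console.log(firstRun);  // only the genuinely visited site is flagged
console.log(secondRun); // every site is flagged after the first run
```

Real caches are messier (expiry, revalidation), which may be part of why some
reruns reported in this thread came in under 100%.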

~~~
shdon
I get what you're saying, but if the technique really worked, I'd expect it to
have told me the sites I frequently visit on the first run and a 100% score on
the second run, but it was only like 80% of that.

------
shaggyfrog
I think it got 100% for me, on Safari/Mac.

Note that it doesn't need to be 100% accurate to be effective. If it guesses
better than 50% (i.e. coin flip), then it could be used to give guesses with
at least some confidence. No different than analyzing any other noisy dataset.
Because this all works client-side, it can also be done quite invisibly.
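
To make the "better than a coin flip" point concrete, here is a small Bayes'
rule sketch. The prior and the hit and false-alarm rates are invented numbers,
not measurements of this demo, and `posteriorVisited` is a hypothetical helper.

```javascript
// Posterior probability that a site was visited, given the probe said
// "visited". All inputs are probabilities in [0, 1].
//   prior:         how likely a visit is before probing
//   truePositive:  P(probe says visited | really visited)
//   falsePositive: P(probe says visited | not visited)
function posteriorVisited(prior, truePositive, falsePositive) {
  const hit = truePositive * prior;
  const falseAlarm = falsePositive * (1 - prior);
  return hit / (hit + falseAlarm);
}

// Invented example: 30% of visitors use the site, and the probe is right
// 70% of the time in both directions.
const p = posteriorVisited(0.3, 0.7, 0.3);
console.log(p.toFixed(2));
```

With these made-up rates a "visited" answer moves the estimate from 0.3 to
roughly 0.5, while a true coin flip (equal hit and false-alarm rates) would
leave the prior unchanged.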

~~~
diamondhead
There are already much better and more reliable methods to accomplish the
same goal, e.g.

[http://ajaxian.com/archives/spyjax-using-avisited-to-test-yo...](http://ajaxian.com/archives/spyjax-using-avisited-to-test-your-history)

~~~
tmcdonald
I was under the impression that attempts had been made to hide any effects of
:visited styles on the accessible DOM to stop this from working. There's a
particularly good article on Mozilla's attempts [1], and the relevant bug on
Bugzilla. [2]

[1]: <http://dbaron.org/mozilla/visited-privacy>

[2]: <https://bugzilla.mozilla.org/show_bug.cgi?id=147777>

------
mike-cardwell
There's another trick you can use, for detecting if somebody is logged in to
certain sites. See:

[https://grepular.com/Abusing_HTTP_Status_Codes_to_Expose_Pri...](https://grepular.com/Abusing_HTTP_Status_Codes_to_Expose_Private_Information)

Although, the Google test on that page is currently broken. The Facebook and
Twitter ones aren't.

------
rmason
Apparently the key to people not knowing where you visited is to use IE. It
missed sites like Twitter and Facebook that are open for me all the time. It
did get one site correct, HN ;<).

~~~
click170
It guessed HN correctly for me the first time, but on the second run it said I
hadn't visited any of the sites, including HN. Perhaps a bug..

------
AshleysBrain
What would be a possible use of this attack? I can't think of anything useful
you'd do with knowing that you've visited Facebook. And so many people use
sites like Facebook you might get a better success rate just always returning
"visited" rather than measuring this way!

~~~
TorKlingberg
If a malicious website can tell which banking websites you have visited, it
can show a phishing page that looks just like your bank.

------
fasouto
Old but related: Using your browser URL history to estimate gender
[http://www.mikeonads.com/2008/07/13/using-your-browser-url-h...](http://www.mikeonads.com/2008/07/13/using-your-browser-url-history-estimate-gender/)

Seems like I'm 50% male, 50% female :D

~~~
simonbrown
That's because the CSS trick it uses doesn't work any more.

It would be interesting if someone updated it to use this new trick or even
just as a Chrome extension.

------
pnathan
Nope. Only got 1 right.

Best of luck, it's an interesting concept!

~~~
joe_the_user
It's kind of comforting to see the fails here...

------
joejohnson
Again, this seems to be inaccurate for a large number of people. Can we take
these two attempts as evidence that it is hard for malicious websites to
discern our browser history?

------
nolliesnom
With the exception of Facebook (which I visited this morning), the results
were accurate (Amazon, reddit, linkedin, wikipedia, youtube). Spoooky!

------
pwaring
Said I hadn't visited any of the sites, except HN (which is easy to guess
since news.ycombinator.com will be in the HTTP_REFERER field...).

If I re-run the test it still gets some sites wrong (says I haven't visited
them when in fact I have). It even claims I haven't visited Amazon both times
when in fact it's open in another tab.

------
aaronjg
I just tried it twice: once on a public Wi-Fi network, and again when I got
home. It worked very well on the public Wi-Fi, but had many false positives
at home.

It seems to work better on slower internet connections. The script calls a
site "visited" if the response time of the potentially cached image is less
than 1/20 of the time of the certainly uncached image.
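
As a rough sketch of that rule (the function name and the timings are
illustrative assumptions; the 1/20 ratio is the one described above):

```javascript
// Flag a site as "visited" when the possibly cached image loaded in less
// than `ratio` times the load time of an image known to be uncached.
function looksVisited(candidateMs, uncachedMs, ratio = 1 / 20) {
  return candidateMs < uncachedMs * ratio;
}

// Made-up timings in milliseconds.
console.log(looksVisited(5, 400));  // slow link, cached image: well under 1/20
console.log(looksVisited(5, 40));   // fast link, cached image: not under 1/20
console.log(looksVisited(25, 600)); // inflated baseline: a 25 ms load passes
```

The last call hints at how a latency spike on the known-uncached baseline
could let an image that was never cached pass the test.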

On slow connections the cached fetch is much faster than the uncached one. On
fast connections it's only slightly faster. However, the known-uncached images
sometimes show a "10x increase in latency", so based on my (and others')
experience this seems to be a major problem.

One could attempt to normalize this for the sites where appending a random
query string causes higher latency: precalculate the added latency from
images with the random query string on a per-site basis, then subtract it
from "uncachedTime."

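The normalization idea above could look roughly like this. The overhead
table, its numbers, and the function names are hypothetical; the 1/20 ratio is
the one quoted earlier in this comment.

```javascript
// Hypothetical per-site latency (ms) that the random query string adds,
// measured ahead of time. The values are invented for illustration.
const queryOverheadMs = {
  "site-a.example": 180,
  "site-b.example": 0,
};

// Subtract the known overhead from the measured uncached time, never
// going below zero. Sites without an entry are left unadjusted.
function normalizeUncachedTime(site, uncachedMs, overhead = queryOverheadMs) {
  const extra = overhead[site] || 0;
  return Math.max(0, uncachedMs - extra);
}

// The 1/20 rule, applied to the normalized baseline.
function looksVisitedNormalized(site, candidateMs, uncachedMs) {
  return candidateMs < normalizeUncachedTime(site, uncachedMs) / 20;
}

console.log(looksVisitedNormalized("site-a.example", 8, 200)); // baseline 20 ms
console.log(looksVisitedNormalized("site-b.example", 8, 200)); // baseline 200 ms
```

With the overhead removed, the same 8 ms load no longer passes for the site
whose baseline was inflated by the query string.
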
------
prsimp
Doesn't appear to guess correctly in Chrome 15 on OS X (10.7.2). I'm not sure
exactly what the 'whoops' means for google - but I've obviously visited HN and
have visited a few of the others as well.

Screenshot: <http://cl.ly/1i0921270W2b1u190b0W>

------
crocowhile
Didn't work on Opera on Linux (said I never visited any of those sites)

------
copypasteweb
RequestPolicy prevents that approach.

~~~
mahmud
Damn right. Don't hand out cookies to strangers unless you're a girl scout.

------
pella
The images:

      facebook: 'https://s-static.ak.facebook.com/rsrc.php/v1/yJ/r/vOykDL15P0R.png',
      twitter: 'https://twitter.com/images/spinner.gif',
      digg: 'http://cdn2.diggstatic.com/img/sprites/global.5b25823e.png',
      reddit: 'http://www.redditstatic.com/sprite-reddit.pZL22qP4ous.png',
      hn: 'http://ycombinator.com/images/y18.gif',
      stumbleupon: 'http://cdn.stumble-upon.com/i/bg/logo_su.png',
      wired: 'http://www.wired.com/images/home/wired_logo.gif',
      ....

------
georgefox
Really interesting concept. This one wasn't as accurate for me as the original
Firefox-specific proof of concept, though. It only picked up on YouTube and
Wikipedia. What's with the "whoops" on Google?

I do use NoScript and Ghostery, though, and I could see how that might cause
some false negatives.

------
sairamkunala
When the script is run a second time, it shows that every site was visited.
After the first run, I guess everything is cached and it cannot tell a hit
from a miss :)

Running in Chrome's incognito mode is a bit different though. Only 7 show up
as cached the first time it's run.

------
mahmud
It said "not visited" for EVERYTHING, except google which says "whoops". I
have visited nearly all of them in the last 2 months.

But don't despair, I have one of the most hostile browser settings. I have
RequestPolicy, NoScript, and Flashblock.

~~~
bermanoid
Same results here, and I'm not running any of the things you mentioned; also
been to many of those sites, within the past few days in many cases. I'm
basically on stock Chrome (though I do have Adblock).

------
preek
Interesting concept, but I'm quite certain that I haven't only been on xkcd.

<http://dispatched.ch/pic/visipisi-20111203-214939.jpg>

------
cf0ed2aa-bdf5
Mostly right for me except it didn't know I visited twitter and facebook (both
tabs are open right now).

That's probably due to me blocking facebook and twitter widgets on sites other
than Fb and twitter though.

------
joshfraser
In my case, the ones it got wrong were the images that returned a 304 (not
changed) header since they returned significantly faster than fetching the
full image.

------
drunkenmasta
It said no to sites I had visited. Unless that is what it was programmed to
do, I'd say it did not work. You can message me for any other info about the
test.

------
dhs
Of the 5 sites I visited, it correctly flagged HN, WP and YT as visited, and
gave a "whoops" for FB and Google (what does that mean?), both of which I
visited.

------
kevinalexbrown
I got extremely inconsistent answers on multiple runs.

~~~
mvalle
Indeed, as the second time you run it, you have visited all those sites, and
some of those images are in the cache.

The first time, I had one 'visited', the second time about half were
'visited'. I'm surprised not all of them were, though...

------
stewbrew
No, you don't. One false positive, many "whoops", quite a few false negatives.
After calling the script a second time, almost all guesses are wrong.

------
jamesbritt
Apparently I haven't visited HN. :)

I wonder if the use of ghostery, no-script, that sort of thing, is what
bamboozles it? Overall, it looks like it's guessing.

------
ComputerGuru
Completely wrong. Said I visited some I haven't heard of, "whoops" on Google,
and not visited on most of those that I've been to recently.

------
rohit89
It got all of them right for me (Chrome, Windows XP) except for twitter. I got
a "whoops" for google multiple times.

------
swah
Big miss for Twitter, but cool idea anyway.

------
Technopia
The results are not consistent. Each time I click the button it keeps changing
and also lists the wrong sites.

~~~
cf0ed2aa-bdf5
A second try gives me a 'visited' result on almost every page (except
techbuy).

The first try was pretty correct though.

------
fez
It only got 4 out of 15 correct for me.

------
meric
It says I've visited HN and Slashdot. I haven't been to slashdot this year,
but I did go to facebook...

------
ChristianMarks
Dead on for me. Chrome under Ubuntu.

------
wasd
Did not work at all for me. The other one had slightly better results. Win 7
on most recent FF.

------
davidwparker
Got only 1 for me: YouTube.

Several others it said I didn't visit but I did.

And it said I visited linkedin, and I didn't.

------
tronicron
Interesting. Twitter and HN yes, Facebook and LinkedIn no. Chromium on Debian.

------
bad_user
I'm on Firefox 9.

For all the entries I got "not visited", even though I visit a lot of them.

------
aespinoza
Extremely wrong on Chrome. The first time I ran it, it said I had not visited
any of them. Google and HN were definitely browsed today.

Ran it again, _ALL_ of them appeared visited. Even sites like abebooks, which
I have not visited at all.

~~~
dangrossman
That's because the second time you ran it, all the images were in your cache
from the first time you ran it. That's the expected result.

~~~
cookiecaper
Yet another reason this test is useless. If site A uses it, it may get
partially correct data, but when you browse to site B, it will return 100%
positive, most of these being false positives.

I just don't see any practical application for this method with such high
error rates. The methods mentioned above are only valuable if you can
guarantee at least relative reliability. By and large the results have been
seemingly random, with only one or two persons reporting 100% correctness. So
what's the difference between running a test with wildly unreliable results
and just doing something randomly?

~~~
dangrossman
First, it's a proof of concept, that's all.

Even so, even without doing any work to ameliorate these flaws, it could still
be (ab)used. Don't assume that it's only useful if everyone can scan which of
the top 100 websites you've visited.

Any site could use this to check which competitors' sites have been visited.
It's unlikely anyone else has an interest in checking that information, so the
cache is not going to be poisoned by anyone else. With knowledge of which
competitors a potential customer has checked out, you could do some effective
price discrimination -- the guy looking at the $10 solutions sees your lowest
price, while the guy looking at some competing Microsoft Dynamics package
enters a more enterprisey sales funnel.

It's also useful for retargeting. Throw the code up on an ad network and you
only test for cache hits against domains of current advertisers. If there's a
hit, store it in a cookie so you don't need to check the (now filled) cache
again. You can now show ads for companies a person has already had an
interaction with, without having to cookie every visitor to the advertisers'
sites first.

It doesn't take much to come up with (mostly nefarious) uses for this, even
without perfect accuracy and even without the ability to have multiple parties
check the same URLs.

It also doesn't take much to come up with ways to improve the process. You can
ameliorate the problem of overlapping testers by having a large pool of URLs
from each site to check. The average top 1000 site probably has dozens and
dozens of images and other resources per page, each of which can be used for a
cache test.
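
A sketch of the pooling idea, assuming each tester draws one probe URL at
random from a per-site pool. The pool contents and helper names are made up.

```javascript
// Pick one probe URL from a site's pool. `random` is injectable so the
// choice can be made deterministic in tests.
function pickProbeUrl(pool, random = Math.random) {
  if (pool.length === 0) {
    throw new Error("probe pool is empty");
  }
  return pool[Math.floor(random() * pool.length)];
}

// Chance that two independent testers, each picking once, choose the same
// URL from a pool of `size` equally likely entries.
function collisionChance(size) {
  return 1 / size;
}

// Hypothetical pool for one site.
const pool = [
  "http://shop.example/img/logo.png",
  "http://shop.example/img/banner.png",
  "http://shop.example/img/sprite.png",
  "http://shop.example/img/footer.png",
];

console.log(pickProbeUrl(pool));
console.log(collisionChance(pool.length));
```

The larger the pool, the less likely it is that another tester has already
filled the cache for the exact URL being probed.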

------
wnoise
Three false negatives, one false positive, and one "whoops".

------
tct
Guessed all of mine correctly with Chrome on Mac. Scary.

------
geuis
The other one worked fine on iOS. Yours failed all tests.

------
paisible
Got it mostly right initially. I then visited Facebook (which it said I
hadn't been to), and it then told me I visited ALL the websites (except
Facebook).

~~~
1bertlol
That makes sense though (mostly): after you test once, all the images are
cached on the second run.

------
JohnLBevan
Nice trick; though it's a once in a cache-time event.

------
mooki
Got three - one false positive. Firefox on linux

------
Detrus
only got HN for me, using Chrome 15 on OSX.

------
hackermom
Fails for me on Safari 5.1.2 / OS X. It got 1 site right, 1 site wrong, the
rest being "not visited".

------
diamondhead
Not even close for me either. Instead of measuring load time, you can create
"<a>" elements and verify that their rendered color is the color you defined
for visited links. It's a trick of old times...

~~~
joshfraser
a trick that all modern browsers have fixed

~~~
diamondhead
hmm. I thought it was impossible to block that approach, since code should be
able to get the computed value of a style property. I'll try it soon.

------
Craiggybear
More accurate but a couple of false positives.

Chromium on Linux

------
diamondhead
[http://ajaxian.com/archives/spyjax-using-avisited-to-test-yo...](http://ajaxian.com/archives/spyjax-using-avisited-to-test-your-history)

come on...

~~~
joshfraser
as mentioned elsewhere in this thread, that loophole has been closed by all
modern browsers. not to say there aren't other ways to get at that
information, but it's not as simple as checking the color of a link anymore.

------
portentint
Missed about 75% for me.

~~~
sairamkunala
Try running it again. It will say you visited them all!

Script FAIL!

