You’re not anonymous. I know your name, email, and company. (42floors.com)
710 points by theinfonaut 1320 days ago | 231 comments



Several people mention Ghostery[0] against trackers. It offers only partial protection. It is possible to fingerprint a browser without any custom tracking data.

https://panopticlick.eff.org/ <-- check how unique your browser is.

Instead of a script to embed, these firms could provide an API to identify users from the server side. The scripts that capture the profile would be served by the sites themselves rather than from third party services.

Toast.

A possible solution would be to anonymize the browser fingerprint, at least in private mode, i.e. lie about the details of the system.

Google, Mozilla, Opera, can you hear me?

--

[0] http://www.ghostery.com/


Not only does panopticlick say my browser is unique, but it said that last time I visited it, several months ago.

Maybe it's forgotten. Maybe it lies. Maybe every time I rev Firefox Nightly I change identity.

What is true is that every time I leave an email address, it's tagged with the name of the site where I left it.


It does indeed use version information, not just the browser's but also that of every installed plugin, including Flash. So a single rev is going to give a new fingerprint. Plus, far fewer people are checking it than when it first launched, making those who do check it from time to time even more unique.

But note that a constantly changing fingerprint doesn't make it useless for tracking - if a site can keep any kind of cookie to persist between browser updates, it could add the updated signature. Then when you purge cookies and persistent storage, a site can re-add the cookie to keep on identifying you if your signature hasn't changed.

You'd have to purge all persistent storage at the same time as an update to avoid this, and even then (or if you never had persistent data to begin with) your IP or even geographical location will likely be enough to identify you again.
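
To make the re-linking argument above concrete, here is a minimal sketch of the bookkeeping a tracker could do. Everything in it (the `Tracker` class, the cookie and fingerprint strings) is hypothetical and only illustrates the logic described in this comment, not any real product.

```python
# Hypothetical tracker-side bookkeeping; all names are illustrative.
class Tracker:
    def __init__(self):
        self.by_cookie = {}       # cookie id -> visitor id
        self.by_fingerprint = {}  # fingerprint -> visitor id
        self.next_id = 1

    def visit(self, cookie, fingerprint):
        """Return a visitor id, linking whichever identifier is still known."""
        visitor = self.by_cookie.get(cookie) if cookie else None
        if visitor is None:
            visitor = self.by_fingerprint.get(fingerprint)
        if visitor is None:
            # Neither identifier has been seen before: treat as a new visitor.
            visitor = self.next_id
            self.next_id += 1
        # Store both identifiers so that either one alone is enough next time.
        if cookie:
            self.by_cookie[cookie] = visitor
        self.by_fingerprint[fingerprint] = visitor
        return visitor
```

With this scheme a browser update (new fingerprint, same cookie) or a cookie purge (same fingerprint, no cookie) still resolves to the same visitor; only changing both at once produces a fresh id.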


The gmail + trick can be defeated, though. If you own a domain, you can use arbitrary addresses and it becomes more reliable.

I use Google Apps (possibly not a good idea but soo convenient :/) with a catchall. I didn't catch any offender so far...

However (off topic), spammers are spoofing addresses as if they were coming from my domain. I receive two to three dozen automatic replies from mail servers (this address does not exist...).

I've properly set up DKIM and SPF records, making it obvious that these mails are spoofed, but I'm afraid my domain will end up on grey/black lists... Anyone out here familiar with this kind of issue?


The + trick is becoming well known with spammers anyway. I registered with one site as "something+else@example.com" having never used the "something@example.com" address anywhere and within a few days I was getting spam to "something@example.com" and "something+else@example.com". Whoever got that list of emails was clever enough to know to try removing the + portion.

Obviously you can never give out the "something@example.com" address and then assume that everything that goes to that address must be spam, but I've had legitimate contact from companies who have had to email me by removing the + portion because their internal email system wouldn't allow addresses with a + in them.
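
For what it's worth, removing the + portion takes only a few lines, which is probably why list buyers bother. A minimal sketch (the function name `strip_plus_tag` is made up for illustration):

```python
def strip_plus_tag(address: str) -> str:
    """Drop a '+tag' suffix from the local part of an email address."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return address  # no '@' at all; leave the string untouched
    base = local.split("+", 1)[0]
    if not base:
        return address  # local part starts with '+'; nothing sensible to strip
    return f"{base}@{domain}"
```

Anyone holding a leaked list can run every address through something like this and mail both variants.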


They're getting a little wise, but that didn't stop me from receiving spam to "else@example.com" a few weeks ago.


I've been using catch-alls for years. I get 100+ bounced spam emails a day from people spoofing my domains (also have DKIM + SPF). AFAIK none of my domains have ever ended up on blacklists and no one has had trouble receiving legit email from me.


Good to know, thanks.


fastmail.fm lets you set up wildcard MX records -- so if your email is user@example.com, you can get email at company1@user.example.com, company2@user.example.com, etc. instead of user+company1@example.com. I imagine other email providers do this too.


Yes, I have my own domain and have been doing this for years. It generally works as expected, and so far I've had at least four unique addresses that were leaked to spammers. One of them was the address I used for newegg.

If you do go this route, I'd recommend using a whitelisting approach. I do get a lot of spam sent to random addresses at my domain.
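
A rough sketch of what I mean by whitelisting on a catch-all domain. The alias names and `example.com` are placeholders, and a real setup would live in the mail server's filter rules rather than in Python:

```python
# Hypothetical set of aliases the owner has actually handed out.
ISSUED = {"newegg", "hn", "bank"}

def accept(recipient: str, domain: str = "example.com") -> bool:
    """Accept mail only for aliases that were deliberately issued."""
    local, _, rcpt_domain = recipient.lower().rpartition("@")
    return rcpt_domain == domain and local in ISSUED
```

Mail to guessed addresses is dropped, while mail to an issued alias still tells you which site leaked it.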


I've also had personal information leaked through NewEgg. For me it was my name and my parents' email address when I got them to buy me a video card when I was 16. Eventually they got an email addressed to my full name and their email about car insurance rates or some random spam. I would really think NewEgg would be better than that.


Hmm, that's a useful tactic as I'm seeing + addresses becoming much less effective due to the reasons I gave above.


I think a lot of sites have caught on to the + trick, though, because more and more disallow the + character in the email validator.


More like laziness. They probably only allow something like:

  ^[a-z0-9_.]+@(?:[a-z0-9-]+\.)+[a-z0-9]+$
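
A quick way to see the effect of a narrow pattern like that, using Python's `re` module. The pattern below is my own rendering of the idea (lowercase letters, digits, underscore and dot in the local part), not something taken from any particular site:

```python
import re

# Assumed "narrow" validator: no '+', no uppercase, nothing else RFC-legal.
NARROW = re.compile(r"^[a-z0-9_.]+@(?:[a-z0-9-]+\.)+[a-z0-9]+$")

def narrow_accepts(address: str) -> bool:
    """Return True if the narrow pattern matches the whole address."""
    return NARROW.match(address) is not None
```

Plain addresses pass, while anything with a + in the local part is rejected even though it is a valid address.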


In my experience it's been ignorance rather than laziness. I suppose you could argue that the root cause of the ignorance is laziness. If someone is writing a validator, they probably ought to check what constitutes valid input.


It's not a sign of catching on, but of using an old email validation pattern from the days of Matt's Script Archive (seemingly). It's the sign of a website that has used cargo-code programmers. You're probably just seeing it more because you're trying to use plus-addressing more often (confirmation bias), but it's been this way for years and years. It's actually better now than it used to be.


I wish they wouldn't..

https://en.wikipedia.org/wiki/Email_address#Valid_email_addr...


"Maybe every time I rev Firefox Nightly I change identity."

Unless the nightlies have a different behavior than the releases, the patch level is not reported anymore. The changes went into 16.0.2 which was released on 10/26/2012. The initial report on b.m.o[1] reads as follows:

  Steps to reproduce:

  1) Load http://www.delorie.com:81/some/url.txt

  Actual results:

  The User-Agent header exposes the security patch level as either a minor version
  number or as an alpha/beta/pre indicator. This data is exposed twice: in the 
  Gecko version and in the application version.

  While it is of value to expose this data to e.g. AMO, exposing it to all sites
  makes the browser more fingerprintable (see https://panopticlick.eff.org/ ) and
  doesn't serve a purpose more important than user privacy. Point releases don't
  change functionality beyond security and stability fixes, so sites shouldn't be
  sniffing the patch level anyway.

  Making trunk, alpha and beta builds look like release builds for sniffing
  purposes reduces sniffing-related failures that waste time when treated as
  functionality-related regressions by mistake.

  Expected results:
  Expected the version numbers to show the major version of the most recent 
  Firefox beta that Mozilla has shipped and not to show the security patch level 
  or an alpha/beta/pre indicator.

  Additional information:
  Internet Explorer doesn't expose the security patch level in its UA string.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=728831
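
As an illustration of the change the bug asks for, here is a small sketch that coarsens a Firefox-style User-Agent string so the patch level and any alpha/beta suffix are dropped. The regex and the function name are my own assumptions; this is not Mozilla's implementation.

```python
import re

# Matches "rv:16.0.2" or "Firefox/19.0a1" and captures major and minor version.
_VERSION = re.compile(r"\b(rv:|Firefox/)(\d+)\.(\d+)[\w.]*")

def coarsen_user_agent(ua: str) -> str:
    """Keep only major.minor in the Gecko and application version tokens."""
    return _VERSION.sub(r"\1\2.\3", ua)
```

Every user on the same major release then sends the same string, so the header carries fewer identifying bits.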


Amusingly so far my privacy has been best protected by Mobile Safari on a popular iPhone model on the latest OS version:

"Within our dataset of several million visitors, only one in 857,908 browsers have the same fingerprint as yours."

As it doesn't allow for plugins, my fingerprint should (cookies aside) match that of any other <popular device> user.

So maybe the solution here is coming up with a 'secure browse' profile in which every browser reports the same fake fingerprint.

Security in numbers.


> So maybe the solution here is coming up with a 'secure browse' profile in which every browser reports the same fake fingerprint.

There already is: https://www.torproject.org/projects/torbrowser.html.en

This also has the advantage that no other solution has: it completely hides your location as well, whereas even with a "standard" browser, your IP address + time zone alone can do a lot to identify you.


even the tor bundle requires some additional steps for increased privacy http://dpaste.org/hAyZG/


Many of those options are now set by default in Tor Browser.


Last time I checked they were not.


I just checked now - many are but certainly not all. There are a couple of things that probably affect practicality a lot, like referer sending.


Well, some pieces of information, it can be useful to provide to the server. But, yes, standardizing or even dropping certain fields seems like a nice first step.


You said it yourself: every time your Firefox build is updated, you change identity.


Considering the development model used by most modern web browsers, I'm pretty certain those who do this type of fingerprinting in the real world will long ago have adjusted their matching algorithms to be aware of this. I doubt they even care if they get a fully unique fingerprint, they probably just define some percentage of matching as being "close enough to a likely match" to report you as being someone they have seen before.

Just mentioning this lest anyone get the wrong idea that setting your browser to update frequently might be a defense.


Identity matching algorithms use a fuzzy match to identify users because the only way to get an exact match is by storing an identifier on the user's machine, or in memory for the duration of the browsing session. A lot of factors can change and it may not significantly alter the weight of the match.
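
A toy version of such a fuzzy match, to show why one changed attribute barely moves the score. The attribute names, weights and the 0.8 threshold are all invented for illustration; real systems presumably tune these on data.

```python
# Hypothetical weights: attributes that rarely change count for more.
WEIGHTS = {"user_agent": 1.0, "fonts": 3.0, "plugins": 2.0,
           "timezone": 0.5, "screen": 0.5}

def similarity(a: dict, b: dict) -> float:
    """Weighted share of attributes on which two fingerprints agree."""
    total = sum(WEIGHTS.values())
    matched = sum(w for key, w in WEIGHTS.items() if a.get(key) == b.get(key))
    return matched / total

def same_visitor(a: dict, b: dict, threshold: float = 0.8) -> bool:
    """Treat two fingerprints as one visitor if they are similar enough."""
    return similarity(a, b) >= threshold
```

With these weights a browser upgrade that only changes `user_agent` still scores 6/7, well above the threshold.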


One of the most effective methods of fingerprinting people is to enumerate the fonts they have installed on their machines (via Flash). You don't even need the browser version number to uniquely identify most - the only solution is to disable Flash.


As Chrome sandboxes Flash, I wonder how much hassle it would be to only allow a small subset of fonts through?


Most browsers have "run plugins on click" options, very useful in general.


> Maybe every time I rev Firefox Nightly I change identity.

Indeed; the user agent is part of the fingerprint.


Tracking in the short term is still possible though, and some would find it very useful :(


> A possible solution would be to anonymize the browser fingerprint, at least in private mode, i.e. lie about the details of the system.

> Google, Mozilla, Opera, can you hear me?

== This. The system needs to be fixed. Need to know (only) vs nice to know info exch, etc.


> The system needs to be fixed.

This is another problem with having an advertising company (Google) supply a browser that is very popular (Chrome). In fact, Safari and IE are also run by companies with large presences in the online ad market.

I doubt Google in particular will risk antitrust suits by blocking these kinds of very, very unsettling but unfortunately legal trackers, which in part are not so technologically different to GA but combine a few more bits of tech which makes them awfully invasive. We might be able to hack technological solutions together here but this stuff rarely makes it out into people's mainstream browsers.

The most important way of securing people's data over the next 10 years is going to be by way of the browser and the mobile OS, but the thing that is most easily achievable is to have a solid browser that people can trust to implement privacy-preserving technologies. The only browser I can realistically see doing that is Firefox.


I see both your points (browser vendors and extension distribution issues), and I think Mozilla has shown great respect for their users. However, Mozilla's primary revenue source is Google.

Maybe taking a page out of the enterprise playbook and using a proxy, like Squid, would make sense. From reading the Squid manual, it seems like it could play a role as it is quite extensible and sophisticated. Making it easy to set up and customize would be pretty difficult from what I can tell, unfortunately.


Yes, but it takes more than removing most of the identifying information.

First, the precise browser version and OS can probably always be identified by checking for supported features, bugs etc., even if the extreme measure of removing the user agent string were taken.

Add the screen resolution, IP, timing and request patterns (+) and we are all screwed.

(+) e.g. rule out users that are using other sites at the same time. Note that it would be possible to determine if a page is in the currently focused and visible browser tab and forward that information to the tracker.


> Yes, but it takes more than removing most of the identifying information.

The trick is not to remove information, but to poison it.

For example, Panopticlick sees that I have dozens of "system fonts", enough to stand out. I want my browser to lie about the fonts I have, based on settings I choose.

There are many details about my browser and system that are irrelevant to what most sites need to do so lying about them should not interfere with viewing a site.


This is a great point. I totally agree, if you adjust simple things on each visit, then you can use their extra bits of identification against them, in a very obscure way, without limiting the actual checks of functionality that the sites use.


These system fonts are detected through Flash/Java. Disabling these will fix this.


I don't want to provide less, but still accurate, data, I want to introduce suspect data. The end result should be that anyone collecting data on me without my consent should have no idea which of it are accurate such that all of it becomes useless.


But why make it easier for the site to track you?

Force them to do detailed packet timing and their costs will go up, and it will become less economical for black hats to play around with your personal data.

I don't know if nuking the user agent string is a horrible idea, but it's less of a problem today than it was 5 years ago: today, a website can assume all browsers conform pretty closely to a standard. Only really advanced features require user agent sniffing (arguably, if you're sniffing the UA you're doing it wrong).

I think we should make that kind of fingerprinting opt-in, not opt-out.


I think changing the user agent to only report the major version would already go a long way.


I just played around with a Chrome-extension to mess around with panopticlick. In case anyone wants to continue: http://lukaszielinski.de/blog/posts/2012/12/12/panopticlick/


> lie about the details of the system

Rather than trying to hide everything another tactic is to provide random misinformation (different user-agent strings, only presenting a subset of fonts and plugins, etc). Enough to defeat the fuzzy matching that does go on.

Sure, you've got to be careful that you don't do things that may break some sites that rely on this information remaining stable during a session, but that's become far less common with the frequent browser upgrades that go on nowadays.
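
Roughly what I have in mind for the font list, as a sketch. The function and parameter names are hypothetical, and it assumes the fonts in `common` really are installed, so nothing is reported that a page could not actually use:

```python
import random

def reported_fonts(installed, common, rng, keep=0.5):
    """Always report the common fonts; report each other font with probability keep."""
    extras = [font for font in installed if font not in common]
    chosen = [font for font in extras if rng.random() < keep]
    return sorted(set(common) | set(chosen))
```

Calling it with a fresh `random.Random()` per session gives a different subset each time, while the fonts most sites depend on stay stable.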


doesn't this tactic make you even more unique? if a browser behaves differently than the rest of them...

imho the best strategy would be to copy one behaviour everywhere, so that there could be no way to differentiate between users.
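
one way to put numbers on this is the anonymity set: how many browsers share each fingerprint. a tiny sketch (the function name and the sample values are made up):

```python
from collections import Counter

def anonymity_set_sizes(fingerprints):
    """For each browser, count how many browsers report the same fingerprint."""
    counts = Counter(fingerprints)
    return [counts[fp] for fp in fingerprints]
```

if everyone reports the same value every browser sits in one big set, while unique values (random or not) put each browser in a set of one for as long as that value stays the same.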


if the data transfer is done from the client (and i bet it is, as it's much harder to persuade people to run code on their servers) then ghostery and the like still work, because they block the transfer (since the code to do the transfer must be loaded from the weasel site - same origin policy).


Nah.

    +-------+       +----------------+       +------------+
    |Browser|<==A==>| Visited server |<==B==>| ID service |
    +-------+       +----------------+       +------------+
The client data acquisition is done through A (AJAX), then that info is sent through B (API call) to get the identity. The browser doesn't interact directly with the ID service.

The data acquisition scripts would be served directly by the web sites.
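
A sketch of the two hops in the diagram, with the ID service stubbed out as a dictionary. All names are hypothetical; in practice hop B would be an HTTP call from the visited server to the service's API.

```python
import hashlib
import json

# Stand-in for the ID service's database: profile key -> identity.
KNOWN = {}

def profile_key(profile: dict) -> str:
    """Canonical hash of the profile collected by the site's own script."""
    canonical = json.dumps(profile, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def id_service_lookup(key: str):
    """Hop B: what the visited server would ask the ID service (stubbed)."""
    return KNOWN.get(key)

def handle_profile_post(profile: dict):
    """Hop A: first-party endpoint that receives the profile via AJAX."""
    return id_service_lookup(profile_key(profile))
```

The browser only ever talks to the visited server, so a blocker that works on third-party domains sees nothing to block.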


did you read the article? they explicitly say it's implemented in the browser with "tracking code".

you can certainly make something that is "ghostery proof", but (1) this isn't it and (2) it would be more complex to deploy and so gain less traction.


I read the article, and you didn't get my post.

It wasn't about this very company, but tracking in general, which can be implemented without serving code from a third party server.

If solutions like Ghostery become the norm, there are still workarounds, and they will catch up if nothing else works.


> Instead of a script to embed, these firms could provide an API to identify users from the server side. The scripts that capture the profile would be served by the sites themselves rather than from third party services. ... Toast.

Not really. If they share data on the server side, they wouldn't be able to share a cookie - they would have to rely on other means to identify you, such as IP address etc. Not entirely impossible, but not as precise either. And that is spoofable through proxies etc.


Cookies are only used to track your return on that one particular domain. The matching of browser profiles won't be done on the client side.

These scripts will only track you and not receive information.


No, but the identification relies on a cookie placed on the 3rd-party site (leadlander.com). With that domain blocked, there would no longer be a way to identify the user across different sites, which is a key element of this thing working.


That's not correct, you as a user run a snippet of leadlander js code on your web page. This code has no access to the cookies on a leadlander domain.

I've worked on a similar tracking snippet/system for http://www.projectcounter.org/ and this was one of the first things we attempted.

If you block the leadlander domain(s) the script will obviously not run and also consequently won't be able to send fingerprint details back.


I'm quite surprised that panopticlick says I'm uniquely identifiable with Chrome based solely on the browser plugins reported, even though the ones I have are quite pedestrian: just Chrome PDF viewer, QuickTime, PepperFlash, and Flash.

In Firefox, the plugins get them to 1 in 860,000 which leaves only 3 possibilities in their DB of 2.5 mln, even though Firefox loads only QuickTime and Flash.

It must be the combination of codecs I have installed. How do I go about cleaning that up?
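
For reference, the arithmetic behind those figures, as a small sketch. The helper names are mine; the inputs are the "1 in 860,000" and "2.5 mln" numbers quoted above.

```python
import math

def surprisal_bits(one_in: float) -> float:
    """Bits of identifying information in a '1 in N' fingerprint."""
    return math.log2(one_in)

def expected_matches(population: float, one_in: float) -> float:
    """How many browsers in the population are expected to share the fingerprint."""
    return population / one_in
```

`expected_matches(2_500_000, 860_000)` comes out just under 3, and `surprisal_bits(860_000)` is a little under 20 bits.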


I'm not sure about Ghostery, but the TOR Browser bundle (based on Firefox, see https://blog.torproject.org/blog/effs-panopticlick-and-torbu...) does apply a few tricks to normalize the browser fingerprint.


Also when you update Ghostery buglists, it whitelists some sites, you have to select all on purpose each time.


I think you can set Ghostery to block by default: see Ghostery prefs > Advanced > Auto Update > Block new elements by default.


good to know. Thanks.


A good workaround for panopticlick would be to append a random string to the useragent for example, effectively making your fingerprint unique all the time.


It would have to be something a human couldn't engineer around. Like, adding one random extension that you don't have installed. Though, this could cause issues with legitimate sites. The problem is that if it's bogus noise, it can be removed. If it's legitimate information, it's going to screw up something legitimate that relies on it. Maybe make up an extension name that doesn't exist for every request?


They'd just remove the random string...

The browser UA is only one component of the fingerprint, and probably not the most important one.


I think you always win in that cat and mouse game. Add some fake entries to the list of plugins, to the list of fonts, ...


> A possible solution would be to anonymize the browser fingerprint, at least in private mode, i.e. lie about the details of the system.

Plus one for this. I wonder if a plugin alone could change enough info to fool the trackers?


I'm skeptical of this unnamed company's actual abilities. In the initial email how are they able to identify anything about your visitors before you've installed the tracking code? Since they apparently can see search terms used to reach your site the only thing I can think of is their code is running on some site that links to you (perhaps an off-brand search engine?) and they're tracking outbound clicks. Or it's fake.

It's pretty easy to guess company name from IP address, especially if you don't care about accuracy. You can kinda sorta do this in Google Analytics under Audience > Technology > Network. That seems to be roughly what they're doing in the screenshots posted. IMHO, this is not the most serious privacy issue on the web.
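
A minimal sketch of that kind of lookup using Python's standard `ipaddress` module. The ranges are the reserved documentation blocks and the owner names are invented; a real service would use WHOIS or a commercial IP database.

```python
import ipaddress

# Documentation-only address ranges with made-up owners.
RANGES = [
    (ipaddress.ip_network("192.0.2.0/24"), "Example Corp"),
    (ipaddress.ip_network("198.51.100.0/24"), "Example University"),
]

def guess_company(ip: str):
    """Return the owner of the first range containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    for network, owner in RANGES:
        if addr in network:
            return owner
    return None
```

This only gets you to the organization that owns the address block, which is why it works for office networks and says little about home users.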

I would be very curious to hear exactly what percentage of visitors it is able to supply Name and Email for (and how many of those fields look bogus). This sort of individual-level tracking across sites is obviously possible, but I don't think it's common. Google/DoubleClick do not, as far as I know, do any sort of tracking at the level of an individual's name or email address (And why would they? It's asking for regulatory problems and it doesn't really help them much -- they target ads to groups of similar people based on demographics, not to particular named individuals.)


For users without showdead, the user darrennix (who appears to be the same Darren Nix who wrote the article) posted this comment. Why the mods or system would kill it I have no idea.

> It's a fair question and one that I asked myself. If the entire service is a fake, then it is an extremely elaborate one because the name and emails of the individuals it did identify (which I noted was a small percentage) were real.


I don't know what kind of numbers we are talking about here, but if a user clicked through the OP's site to a tracked site, there would be referrer information that could be backtracked.


It wouldn't be able to include what search terms were used to arrive at OP's site like the email seems to be showing.


I imagine (though have no actual clue) that it's more of an e-mail sharing network between sites. You sign up for site A, the API tracks that and allows site B to see the signup details you entered.

On one level, I can see why sites do it. On another, one inch higher level, I can see how any site implementing it is so shortsighted that I'm amazed they didn't immediately go bankrupt as soon as they started.


They can identify you by name/email if you've entered it on a site in their "network". Their network may not be huge, but a (presumably) similar service had a big enough network to capture Sumit Suman's email earlier this week (https://plus.google.com/u/1/106142598193409336347/posts/2jLJ...)


Sure, but how can they possibly know anything about a site that does not include their Javascript or make any calls to their API?


If the site where the form was filled out sets the users' collected info as rather obviously-named Javascript cookies or PHP session vars (e.g. $_SESSION['email'] = $_POST['email']), that's one method, no?

Was anything mentioned about the browser used? Maybe when "auto-fill" browser options are enabled for a user there's a way to access that data.


The site does include their Javascript.

> So I signed up for a demo account and installed (and hastily removed) the tracker


The article author signed up for a demo account after receiving the b2b marketing email containing "a report snapshot for 42Floors.com showing names, companies, and emails of site visitors and the information seemed plausible."

Like the parent, I have no idea how this information could have been obtained. It lists search terms, how could a 3rd party track clicks from SERPs to a website not running their tracking code?


Perhaps I'm being dense here, but the blog post says he signed up for the demo AFTER getting an email that included a "snapshot" of visitors.


The snapshot is fake:

http://news.ycombinator.com/item?id=4911542


Toolbars. People install toolbars all the time.


The initial email comes with what it calls a mock example. I think that's what you're referring to.


HubSpot (and pretty much any other marketing automation tool) has this feature, too. They look up company name and location by IP address and build an anonymous "prospect" record representing each visitor so that salespeople and marketers can detect whether prospects from a given company are hitting the site for information.

The second a prospect submits a web form, all that previous web activity is tied to their email address (and any other info you collected via the form). You now have a real lead.

I don't see any privacy issues with this.

What I would see an issue with is if the tracking company were sending the IP address and cookie back to a central database to query "Does anyone _else_ know who this visitor is?" and then providing PII to any company that uses the tracking service.

The moment you start giving my PII to a company that I didn't voluntarily give it to is when I feel a line has been crossed.


> What I would see an issue with is if the tracking company were sending the IP address and cookie back to a central database to query "Does anyone _else_ know who this visitor is?" and then providing PII to any company that uses the tracking service.

That appears to be exactly what's happening. The email mentions "access to our entire network of identified data ([...] we can identify any visitor [...] if that person has filled out a web form from any other website we are tracking)".


Well, then this is F'd up. I guess I didn't read the post carefully enough. :-/


According to the sales rep, their tracking capability goes far beyond ip lookup. It explicitly involves saving form data from site A and sharing that personal information with site B.
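
In other words, something shaped like the sketch below. The class, the visitor key and the site names are hypothetical; how the visitor key is actually derived (cookie, fingerprint, IP) is not something the sales rep spelled out.

```python
class SharedNetwork:
    """Hypothetical central store shared by every site running the tracker."""

    def __init__(self):
        self.identities = {}  # visitor key -> captured form data

    def form_submitted(self, site: str, visitor_key: str, form: dict) -> None:
        """Site A reports a filled-out form for this visitor."""
        self.identities[visitor_key] = {"source": site, **form}

    def identify(self, site: str, visitor_key: str):
        """Site B asks who this visitor is, regardless of where the data came from."""
        return self.identities.get(visitor_key)
```

The point is that `identify` never checks which site supplied the data, so a form filled out on one site names the visitor on every other site in the network.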


Most DMPs can do that you know. Some better DMPs like BlueKai do it anonymously, but there are companies out there that use and target PIIs. I believe RapLeaf was one such company (correct me if I am wrong)


This isn't what Hubspot does at all. While you are correct in that they, along with Marketo, Eloqua, Pardot, etc all look up company/location via IP, none of these companies are getting information from another website to identify prospects.

In the case of marketing automation, all the data lives within the system and is used by the company - rather than giving that information out - a very different proposition.


Well, isn't the scenario you would take issue with exactly what is happening here? From the article: "For example, if [a visitor] went to XYZ.com and filled out a web form and then [the visitor] later visited 42floors.com, [42Floors] would be able to identify [the visitor] by name/email as well as company details even though [the visitor] never filled out a web form on [42Floors.com]."


Marketo has this too ... they are all over the place! I manage multiple "online" identities so I can track who is me directly (and who is implied me).


I misread the article, the company the OP mentions is doing something far more sinister than what Marketo does. I can understand not wanting to be tracked at all, though.


Just looked through Zendesk's network calls -- looks like it's probably Demandbase. http://www.demandbase.com/landing-page/demandbase-real-time-...

Surprisingly, AdBlockPlus doesn't seem to block it.

Edit: actually it's LeadLander.com as pointed out by NiekvdMaas here http://news.ycombinator.com/item?id=4891764


I'm surprised by their extensive list of customers, and also the fact that customers seem to be happy to be identified as their customers ... http://www.demandbase.com/who-uses-demandbase/customer-list/

Hardware manufacturers & telecomms seem to feature heavily:

(a selection) ... Adobe, Dell, IBM, AMD, Box.net, Cisco, CSC, Comcast, Freescale, HP, Lenovo, Motorola, Novell, Qwest, Salesforce.com, Siemens, Symantec, Verisign, VMWare, Vodafone.

And there's several anti-virus/anti-malware companies listed there.

UPDATE: The LeadLander.com site also lists their customers - Microsoft, Motorola, Red Hat and Cisco, among others.


I wonder why some of these sites would want to participate in an information exchange that seems rather asymmetric. I couldn't find detailed information on the service but I assume they have some kind of mechanism, similar to those in P2P systems, to "encourage" sites to contribute large amounts of information(?) or maybe sites can buy information without having to sell?


Just checked, looks like Ghostery blocks Demandbase.


AdBlock is for blocking (annoying) ads, not trackers.


Depends on the filter list. EasyPrivacy is designed to block trackers, and if you look at the list

https://easylist-downloads.adblockplus.org/easyprivacy.txt

you'll see that Demandbase is there.


Well it does seem to block some retargeting pixels -- and this seems like a worse form of retargeting.


AdBlock blocks requests by URL matching and hides DOM nodes based on a handful of selectors. Whether this blocking and hiding is targeted towards ads or not is incidental.
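
A stripped-down sketch of the request-blocking half, matching on hostname only. The blocked host is a placeholder and this is not Adblock Plus filter syntax, which supports far richer patterns:

```python
from urllib.parse import urlsplit

# Hypothetical entry from a privacy-oriented filter list.
BLOCKED_HOSTS = {"tracker.example"}

def is_blocked(url: str) -> bool:
    """Block a request if its host is a listed domain or a subdomain of one."""
    host = urlsplit(url).hostname or ""
    return any(host == b or host.endswith("." + b) for b in BLOCKED_HOSTS)
```

Whether the list contains ad servers or tracker domains is purely a matter of which list you subscribe to.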


Ghostery does, though.


This sounds eerily familiar. Around a decade ago, a data analytics company called Pharmatrak was actually found guilty of breaking federal wiretapping statutes for doing something very similar. [1] In their case, they had built a network tracking HTTP GET requests to pharmaceutical companies' websites with a web bug [2] and attached cookie. But because some of the pharmaceutical companies were using GETs as the method on HTML forms (remember, this was ten years ago), the users actually ended up making GET requests with personally identifying information in the URL-encoded parameters. Since these GET requests were logged by Pharmatrak, and neither party (the users or the pharmaceutical companies) had consented to giving away personal information to them, Pharmatrak was found guilty of wiretapping.

Pharmatrak eventually won on appeal though, arguing that they had no intention of collecting personal information, which exonerated them because only intentional eavesdropping is a crime.

The company in the OP's article could make no such arguments though. I suspect that their main difference is that they make no assurances of confidentiality to the websites using their software the way Pharmatrak did. Which 1) is just really creepy, and 2) sets them up for trouble with users in California, because California's wiretapping statutes say that it's a crime unless both parties agree to it. [3]

[1] http://cyberlaw.stanford.edu/packets001737.shtml

[2] http://en.wikipedia.org/wiki/Web_bug

[3] I'm not sure if this applies to police, but it definitely does to private parties: http://www.citmedialaw.org/legal-guide/california-recording-...

Edit: Added third reference.
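
To illustrate the GET-form problem: with method GET the browser puts the field values into the URL itself, so anything that logs full URLs also logs the form contents. A small sketch with Python's `urllib.parse` (the site and field names are made up):

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def get_form_url(action: str, fields: dict) -> str:
    """The URL a browser requests when a form is submitted with method GET."""
    return f"{action}?{urlencode(fields)}"

def fields_visible_in_url(url: str) -> dict:
    """What any party that sees or logs the URL can read back out of it."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
```

With method POST the same fields travel in the request body instead, which is why URL logging alone would not have captured them.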


If I found out a site I used employed this tool, I'd both trash them publicly and never use their service again.


sencha.com, activestate.com, sandisk.com, clustrix.com, and about 2000 others use LeadLander. I checked the privacy policies of those four sites and none of them say they are giving away your personal information. On the contrary, they all explicitly say they aren't.

"We do not share any information about you or your company to unaffiliated third parties, except as necessary to administer the communications we offer and as permitted by law. We may use a third party service provider to for communications; that company is prohibited from using our users’ personally identifiable information for any other purpose. If you follow us on Twitter, Facebook or on other social media services, we may use information provided by these services to customize our communications to you. We will not share the personally identifiable information you provide with other third parties unless we give you prior notice and choice." - http://www.sencha.com/legal/privacy/

Nearly every company using LeadLander is breaking the law because their posted privacy policies do not state that they are giving a third party your personal information, and that third party is giving it to others.

Edit: It looks like http://formalyzer.com/formalyze_call.js is the specific js file that uploads personal information. Of the sites I listed only clustrix.com is loading that (on the contact form). The other sites seem to be using LeadLander without the form tracking.


As I understand it, in the US the FTC enforces privacy policy violations. If you don't promise your customer anything, then you're more or less off the hook (as far as I know). But if you do have a privacy policy, and you violate it, then you're misleading consumers.

IANAL, YMMV, etc.


California law requires a privacy policy to be posted:

http://en.wikipedia.org/wiki/Online_Privacy_Protection_Act


"unaffiliated third parties, except as necessary to administer the communications we offer and as permitted by law"

Doesn't that wrap things up? They'd just argue they've shared it with an affiliated third party.


And then it says: "that company is prohibited from using our users’ personally identifiable information for any other purpose."

It turns out that sencha.com might not be sending personal information. clustrix.com appears to be, their privacy policy says:

"The Personal Information we collect is not shared, rented, or sold to any third-parties. We may provide your Personal Information to companies that provide services to help us with our business activities such as shipping your order or offering customer service. These companies are authorized to use your personal information only as necessary to provide these services to us." - http://www.clustrix.com/privacy-policy

I'm not a lawyer, but as a normal native English speaker I read that as saying they are not going to send my name, email, and phone number to another company, who will in turn share it with anyone who pays them. But that's what they are doing. They are selling your personally identifiable information.


Marketo has done company-level tracking for years[0], and if you click through from an email or fill out a form they can keep tracking you as well as back-fill any previously anonymous visits you made (depending on your browser cookie settings, of course). Once it's in the system, they partner with a number of companies, some of whom can help populate contact data[1], eg: "over 1.5 billion opt-in email addresses" -- how plausible is that? They have as customers a few companies[2] you're likely familiar with (eg: VMware).

[0] http://www.marketo.com/small-medium-business/inbound-marketi...

[1] http://launchpoint.marketo.com/strikeiron-inc/747-strikeiron...

[2] http://www.marketo.com/customers/


I've always assumed this is the case, but I also wonder if it applies when clicking to unsubscribe.


Just use Ghostery and share it with everyone you know. And keep using those services for free (ie: not paying with your private data).


No, that's the wrong solution. The traffic still tells them that there is still interest to monetize, they just may need to stoop to new lows to get to it.

Loudly tell them that their spying is unacceptable, then actually follow up on that statement. Ghostery is awesome, but that's a proactive measure. We're talking about appropriate reactions.


While I agree with your counterpoint, it's worth pointing out that Ghostery blocks Google Analytics, too. So they might actually not notice the traffic if they're only looking at Google Analytics (or other blocked analytics tools). [1]

But anyways, I agree with what you're saying. If we care about privacy, we have to be loud about it. I just thought it was worth pointing out that facet of Ghostery.

[1] Yes, you can still see the traffic in the web server logs, but I don't see evidence of many companies still doing that. Google Analytics and the like seem to have completely replaced server logs for traffic analysis.


If they only use Google Analytics (or Piwik, or anything else) and don't fact-check it against server logs, they deserve to feel "low on traffic".

But Ghostery is not able (correct me if I am wrong) to stop the server from logging you. And parsing server logs automatically is not difficult at all.

The funniest thing here is that in Germany, privacy law requires you to anonymize IP addresses when tracking.

But the server logs your full IP nonetheless.

On the original post: The technology advertised to the author would be totally illegal in Germany. And if I ever encountered (via Ghostery) a site that uses them and is based in Germany, I would report it to the authorities.

I just hate this philosophy of bending/breaking the law/common sense just because it is possible and might bring in some bucks. And just because pressure from users might change the regulators' minds in the future. It just feels so totally wrong, so disrespectful towards fellow human beings, that imho everybody who has something to do with things like this should be deported to somewhere like North Korea. Or, like in the Middle Ages, should stand in the pillory (and not in a virtual one).


Is the weasel company's javascript (and/or flash bug) logging all form input back to its own servers to capture name/email when you sign up somewhere else? Are they capturing credit card numbers too?

We can tell the world all day long this is Bad and Unsafe, but within six months it'll be more popular than ad retargeting and the meebo crapbar (because, hey, analytics!).


I can't imagine many legitimate sites participating in this scheme. It would certainly violate many publicly posted privacy policies.


Sadly, opportunistic jerkfaces aren't limited by our privacy-hat-wearing engineer imaginations. They can devise much, much worse schemes we would dismiss in five seconds out of "ethical" concerns. (privacy=dead, remember? do anything to track people and manipulate them into giving you money. if you aren't selling anything, sell the tracking as leadgen.)


I pretty much assume that anything I post to any website could someday come back to haunt me. Expect no privacy on the internet, and you won't be disappointed.


> Are they capturing credit card numbers too?

Doesn't look like it, at least not intentionally. They are trying to capture name, email, phone, and company. Source: http://formalyzer.com/formalyze_call.js


Can someone provide a regex that would identify this tracker? I'd like to run it through our index and see if I can come up with a list of sites that employ it.


Probably not since none of us know who this firm is -- and thus the hostname(s) and/or IP(s) used; we'd probably need to contact the author for that info. Once we know that, the regex would be dead-simple...


Well Darren saw the tracker, and he reads HN, so perhaps.


try: http://www\.leadlander\.com/trackingcode\.asp.*&id=.*
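A rough sketch of how that pattern could be run over page source. The attribute-scraping regex and the sample markup are assumptions for illustration only; a real index may store URLs differently (for example with `&` escaped as `&amp;`).

```python
import re

# Pattern from the comment above, applied to each candidate URL.
TRACKER_RE = re.compile(r"http://www\.leadlander\.com/trackingcode\.asp.*&id=.*")


def find_tracker_urls(html):
    """Return src/href URLs in the HTML that match the tracker pattern."""
    candidates = re.findall(r'(?:src|href)=["\']([^"\']+)["\']', html)
    return [url for url in candidates if TRACKER_RE.match(url)]


# Hypothetical page source; the query parameters are made up.
sample = '<script src="http://www.leadlander.com/trackingcode.asp?x=1&id=123"></script>'
hits = find_tracker_urls(sample)
```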


We've got trends for trackers that we can detect and are currently not loaded by third party JS -

To name a few -

http://trends.builtwith.com/analytics/LeadLander http://trends.builtwith.com/analytics/Hubspot http://trends.builtwith.com/analytics/Marketo

There's a lot of them out there now and most of the big ones continue to grow in popularity.


trackalyzer.com, perhaps


I suspected that, but then I considered if he went to the trouble to obscure it on his blog post then he probably wouldn't divulge it here [publicly].

But I could be completely wrong; is the guy who replied with the rx Darren?


The initial data is fake.

Proof: http://o7.no/Z0huP7

I get emailed by them for every startup I'm involved with and that first email is mostly the same every time as you can see in that screenshot. (Compare it with the one posted in the article and you'll see).

They seem to be targeting startups and make it look like some big VC firms are visiting your site to get you interested. I'm not sure how they come up with the 'search terms', but I guess they could just look at your META-tags or make them up.

In their email they do say it's a "mock example", but still I find it very deceptive.


I suppose it's too much to ask that we as developers and engineers show some fucking backbone and refuse to work on or with these tools and projects? And publicly shame those who do?


But engineers that build components for bombs and missiles should keep chuggin' along.


Those are all overt means of oppression, whose use and abuse is so obvious in most cases that restraint in their usage is exercised by those who wield them.

Something like this is a quiet, terrible thing slinking about unnoticed until it is rather too late. I believe these things have more potential to cause harm than any missile built and kept in stasis.


My point was that shaming engineers for choosing what they work on is stupid because often they work on components and don't ever build a complete missile, or "quiet, terrible, slinking thing" by themselves.

The developers at these kinds of places don't need to know what they're building. They have many tasks assigned to them and one of them is to write an API that collects a single piece of data. Many kinds of data are collected from many places and put into a database. Reports are made and cross-referenced by an analyst. Final reports are generated and fed to a guy who deals with direct marketing or advertising or sales. Any of these jobs could also be done by contractors or third parties.

You can't just tell people how to make a living without understanding what the hell you're talking about. That's my $0.02 anyway.

(P.S. people that work on missiles are often academic researchers and work for both the private and public sector on the same thing for many different clients, and aren't told what it's used for. the more you know...)


Anybody competent enough to build a system like this, even somewhat smaller units thereof, is more than clever enough to see the forest despite the trees and recognize that their work could be used for bad purposes.

Again, let's not argue over defense contractors or some damn fool thing--when you work for Google, when you work for AT&T, when you work for Palantir or HBGary or whoever, you don't get to say "lol not my department I made swing apps and file dialogs" when you find out they've done something bad.

We need to speak out when people work on harmful technologies.


What's one of the "bad purposes" you're talking about, anyway? What's the threshold of "badness"? How do you define what is worth quitting your job, and what might merely annoy a user? Can you even quantify it? Is it illegal, and where? What is 'it', anyway?

I'm talking about things like identifying if somebody is gay or republican or kinky and using the information for profit. Aside from selling it to background-check websites and the like, and the fact that it's information people willingly give up about themselves to entities unknown, I have trouble understanding how you can be so offended you think people should lose their jobs rather than develop potential parts of a potential system that could maybe harm someone at some point.

Your assumption about the "cleverness" of developers is misguided. If a guy is told to write a small piece of code which simply takes HTTP requests from JavaScript and plugs them into a database, he has no idea what the fuck that could be used for. The guy maintaining the database also may not know what the fuck he's looking at, it may just be numbers. Are you really so willfully ignorant as to believe every single outcome of every single human action is cut & dry?


Have you ever read "Scroogled" by Cory Doctorow or Stallman's "The Right To Read"? Both are a bit absurd at first glance but viewed today seem oddly prescient.

I'll even accept your assertion (for the sake of argument only, mind you!) that engineers at a company might only work on some small fragment of JS munging numbers in a database.

At some point, though, an engineer needs to implement the API for a saleable product using that information, or code up a dashboard with element names like "#user-site-history" or "#tracked-profile-visits", or at the very least see the marketing materials the sales folks use to show that the product is competitive due to this information gathering.

Your assertion makes publicity even more important--eventually, some engineer or admin is going to have to get their hands dirty and that is when they need to speak out.

~

To go back and answer your "so what if we have targeted advertising" directly: there is currently no heavily established legal framework of which I am aware that protects metadata about users gathered for the purposes of advertising. I do not know if Google or Facebook is prevented from giving up (for whatever reason!) the results of their ad engine's analysis of user browsing to anyone at a whim.

We (Americans, at any rate) are very lucky that our government at least goes through the motions of liberty enough to not overtly round up deviants and send them off to the camps or send drones after them--this is far from the case in various other countries.

As far as the idea that the information is given up willfully, we're talking about techniques and technology that are really only ten or fifteen years old...the average consumer has not had time to build up any sort of reasonable intuition about what they are sharing or not sharing, or how that information can be linked to other facts about their lives. To say that they've "willingly" given up this information is, I suggest, somewhat misleading.


I used to work on defence projects, and I considered this. I was working on ECM (protective) systems, which I felt was morally acceptable. Had the job been missile guidance, then I might have felt differently.


You could refuse to work on such a project, but then the project development would just get outsourced. I don't like the concept of this service at all, and I think it's a really shady business practice to use it, but I don't think there's really a way short of passing laws to stop it.


Dataium does this too, as covered by WSJ's recent article on the subject [1]

The article goes into depth about how much personal information is sent along to advertisers including a popular dating site's apparently anonymized information about drug use, and sexual orientation.

I think we need a non-profit service that defines a set of privacy licenses (akin to CreativeCommons' licenses) which companies can opt to label their websites/apps with. There would be no policing/auditing [2], but companies found to violate the privacy licenses would be obliged to donate a sum to an organization like the EFF.

Having the privacy policies encompassed by one simple privacy license badge would allow users to quickly and easily identify a company's privacy policies. I believe users would gravitate toward using services that display this license.

Edit: it appears such a service is in the works - http://privacycommons.org

[1] http://online.wsj.com/article/SB1000142412788732478440457814...

[2] The auditing process would likely become complex, costly and corruptible


I like the idea, but unfortunately the damage is already done when you see a site without a privacy badge (since the browser has already executed any JS, sent headers, etc.). But perhaps there could be a plugin like NoScript that searches for given privacy settings on the site and allows/blocks JS and third-party content. (If there is no widespread use of the plugin, then no one will include the privacy badge; if there is widespread use, then there is a strong incentive to abuse the badge. So one would need some way to really enforce the privacy badge...)


quote: At 42Floors, we’ve made the decision not to use any visitor identification tools...

facts (detected by Ghostery at 42floors.com): ClickTale, Facebook Connect, Google +1, Google Analytics, MixPanel, Optimizely, Twitter Button


None of them can be used to single out and identify individual users (except GA... which I believe can be done if you are clever with it)


>None of them can be used to single out and identify individual users

...by 42 floors. They're still telling all those networks that I visited the website.


I can't speak for all of those other sites, but at Chartbeat we're vehemently against tracking any sort of personally identifiable information.


Well, there is this: https://mixpanel.com/people/


42floors would have to know who the people are before they could tell Mixpanel. Further, Mixpanel wouldn't reveal that information to any other accounts other than 42floors.


Going to site A, not providing any info, then going to sites B, C and D and seeing ads for site A haunting you is one thing; capturing your name and email is a new level. If you don't use a tracking blocker, clearing cookies is not always going to work. These persistent trackers are quite sophisticated: they use local storage if possible, IP address, header information and whatever else is available to identify someone, and there is a huge industry behind it. But this one is taking it a little bit too far. Scary.
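As a rough illustration of why clearing cookies doesn't help against this, here is a minimal sketch of a cookie-free identifier built from the IP address and a few request headers. The choice of headers and the hashing scheme are assumptions for illustration, not any particular vendor's method.

```python
import hashlib


def fingerprint(ip, headers):
    """Hash the IP and a few request headers into a stable identifier.

    Hypothetical scheme: real trackers likely combine many more signals.
    """
    keys = ("User-Agent", "Accept-Language", "Accept-Encoding")
    parts = [ip] + [headers.get(key, "") for key in keys]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


# Made-up visitor data; no cookie is involved at any point.
visit_1 = fingerprint("203.0.113.7", {"User-Agent": "Firefox/20.0", "Accept-Language": "en-US"})
visit_2 = fingerprint("203.0.113.7", {"User-Agent": "Firefox/20.0", "Accept-Language": "en-US"})
```

The two visits hash to the same value even if every cookie was deleted in between, which is the point.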

On the other side, most startups, including YC ones, use some sort of tracking for analytics to improve usability and internal flow, so advocating against all trackers and for all users installing a blocker is a double-edged sword.


Transparency: I'm a co-founder at Perfect Audience and we believe strongly in the benefits of retargeting for the end user, for the advertiser and for the content publisher.

I don't see a moral issue with retargeting because at its heart it's anonymous - all we know about a user is a string of sites and maybe search words. However, as soon as that data is correlated against personal information, as soon as the real world data and the digital paper trail are correlated and identifiable it becomes sufficiently creepy to me. Who knows - maybe 5 years from now this will seem innocent and benign compared to the mind-reading banners on the bus stops but this seems like a line in the sand I am willing to draw today.


Yes, I agree this is where the line is. I don't see a big moral issue with ad retargeting as long as there is an opt-out option and a privacy policy somewhere to read. We don't like it when we see ads we don't like (or worse, when people looking over our shoulders can learn a lot about us just based on the ads we get on our laptop in the coffee house), but we all like it when we use it to promote our own projects, or when an actually relevant ad shows up.


What about research or sensitive purchases? If I'm shopping for sex toys, wedding rings, STD medicine, etc. doesn't retargeting violate my privacy?


This is why I've deleted my facebook account and browse with Noscript disabling javascript (except for whitelist), RequestPolicy blocking cross-site requests (except for whitelist), and CookieMonster blocking cookies (except for whitelist).

It wouldn't completely work here (e.g. EFF's panopticlick could still fairly uniquely identify me, or IP address would give away info if I'm not going through my VPN), but it improves things.

It feels kind of extreme, but it's worth it to me. My experience is not broken that much, and I feel like various sites are aggregating less about me. These tracking technologies are not such an issue now, but I foresee at least the possibility of abuse in the future, so I figure I'll do what I can now if it's not too much hassle.

Lastly, at its heart most of this is about advertising, something I know I'm very susceptible to (try as I might to convince myself I'm not). So the better I am at blocking out these things, I think the less money I'll spend in the long run on frivolous nice-to-haves.


Thanks for the RequestPolicy pointer.

(edit) Does anyone know of a Firefox/Chrome add-on that strips referrer info from cross-site requests? That'd be the simplest way to deal with all externally hosted .js and images that double as trackers.


https://addons.mozilla.org/en-US/firefox/addon/refcontrol/


In Firefox, go to about:config and edit the "network.http.sendRefererHeader" integer.

2 = send for clicked links and embedded resources such as images (the default); 1 = send only for clicked links; 0 = never send.


This kind of thing is what I've always seen as the potential end result of things like Google Analytics and also Facebook Connect. Both are products that have JavaScript running on a vast number of websites, with the potential to link to personally identifiable information, in a similar manner to that discussed in the article.

I can't imagine that I'm alone in this train of thought.


You're not, and it's why I largely avoid (at all costs) turnkey solutions that certain websites employ for parts of their site. For instance, the sites that use something like zoho.com or disqus.com for blog comments; even though they're overt about their usage (as opposed to hidden tracking code), I'd rather not be heard at all than willingly yield my personal information.


>This kind of thing is what I've always seen as the potential end result of things like google analytics

Anyone know if GA's privacy policy firewalls the data collected by GA from AdSense and other parts of Google?

I know they modified the privacy policy a year or so ago to integrate data across all their products but does it include GA?


I had to give Dick Smith (A NZ retailer) my phone number before I bought an external the other day.

"Do I _have_ to give you my number before I buy this?"

"yes, but it's for return purposes only"

Of course I received 'promotional' txts the next week. I was hesitant to give it to them for just this reason, but because I had acknowledged I had a phone number I felt obligated to give it to him. Dick Smith is a member of a larger chain; it's no stretch of the imagination to hook up CCTV cameras to an OpenCV instance and send txts to customers when they walk in.

No matter the law, the morals people hold, or what customers want, large companies are always motivated by profit margins. The Consumer Guarantees Act, the Privacy Act, the Bill of Rights Act all become murky when you're dealing with new technology, and law will find it hard to keep up.


If your consumer law is anything like Australia's, this requirement is false. All you need is proof of purchase: the receipt.

Dick Smith used to pull this all the time in Aust before they got pulled up over it. I always told them "sorry, not available" and they just moved on.


I'm surprised that NZ doesn't have stringent laws prohibiting this. In Australia, if you give info for a pacific reason, that's the only thing it can be used for. Heavy fines can result if it's used in any other way.


Isn't everything done in Australia for a Pacific reason?


Darned iPad...


I wonder what the ACCC would say about Dick Smith's practices.


Nothing, he lives in NZ.


Google Voice numbers are great for this (and craigslist) if you don't already use Google Voice as your main number.


It's unfortunate that a lot of people live outside of the US.


Can't you just give them a fake number? That's what I do whenever a retailer asks.


I recognize these screenshots - it's definitely Leadlander. I'm not sure if they do what he claims they do, but they can identify by your IP which company you belong to (assuming you're connecting from the office). There are a lot of companies doing that right now actually.


Read the article, thought that it was something interesting but probably not that applicable to me because I clear cookies on (frequent) browser close, don't enter my details into many sketchy sites, use multiple different (isolated) instances of my browser for different purposes.

Today, I get an email from a site that I visited yesterday and haven't heard from in 6+ months. It's too much of a coincidence for me to assume it's random so I dig into their website a little and they're using one of these services.

TL;DR: even though I'm relatively paranoid with giving out details online, one of these networks seems to have successfully identified me and provided my email to a website that I visited, who then reached out and tried to sell me shit.


Tacky, cynical, nasty, and inevitable.


My thoughts exactly. It's terrible, but completely expected in an age when sites like RottenTomatoes and TripAdvisor already know who I am, which of my friends are on their site etc when I haven't even signed up - all from deep Facebook Connect integration.


When I worked at The Washington Post we were among the first group of companies to integrate the then new Social Graph. I immediately deleted my Facebook account.

I was appalled when I saw that we could identify not only visitors' names, but their friends, and access public photos and all of their profile information. All of this without any action on the user's part and before there were any privacy controls.

This is just the next logical step: federating data collection across multiple sites, not just FB.

I'm obviously in the minority since FB has grown tremendously in the past 2 years but I've not looked back. I dread the forthcoming lack of privacy and anonymity our world is heading toward.


While you can see this data from Facebook--and yes, that's jarring-- what you're allowed to do with it is something different. You can't sell it, you can't sell it to an ad network/exchange, you can't retain it after the user revokes permission; you can't even sell derivatives of the data.

Facebook Connect is the most benign of these sorts of things there is; it's access to data, and the implementors of its widgets and API have an onus to protect it.

Now, of course, there's plenty of bad actors out there, and I'm sure it's sold and exchanged, but technically and legally speaking, you're forbidden from doing so.


Both RottenTomatoes and TripAdvisor require me to authorize them on Facebook before they show me any social data. Are you sure you didn't authorize them in the past and forgot?


I'm almost certain that RottenTomatoes will display your name if you're logged in to Facebook, regardless of whether you've given them permission.

I remember being disturbed when I saw that recently, and immediately sought out and installed a social widget blocker.


If that's happening the widget is most likely an iframe loaded from facebook, and not accessible to the RottenTomatoes server


Not 100% certain, but nearly.

They were just two examples I could think of off the top of my head though. As the other commenter said about TWP, the practice is common. I see my name and other social data displayed on sites I've never signed up to regularly.


I see that too, but in that case the social data is served from Facebook and doesn't go through their server unless you authorize it. Unless there are exceptions I'm not aware of.


So when I see a Like button on a site I've never visited before, that displays my name, FB avatar + social data - you're saying that the site I'm on has no way to know that I've visited it unless I click the Like button? Only Facebook knows that and is displaying it in a way that is undetectable to the owner of the site?

This is a bit over my head programmatically but that doesn't seem possible. If Facebook is serving something to visitors on my site, surely there must be a way for me to capture that data?


Nope, that's how iframes work, you can see it, but they can't get at its contents. Cross domain scripting isn't allowed in an iframe.
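The rule the browser applies is roughly a comparison of (scheme, host, port) between the page and the frame. A simplified sketch of that check follows; it ignores mechanisms such as postMessage or document.domain that let two sides cooperate, and the URLs are hypothetical examples.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def can_read_frame(parent_url, frame_url):
    """True if a script on the parent page may read the frame's contents."""
    return origin(parent_url) == origin(frame_url)


# Hypothetical URLs: a publisher page embedding a Facebook-hosted widget.
allowed = can_read_frame(
    "http://www.rottentomatoes.com/m/some_movie",
    "http://www.facebook.com/plugins/like.php",
)
```

Here `allowed` is False, so the embedding page cannot read the name or avatar that Facebook renders inside its own frame.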


Can you please let us know the name of the company?


ghostery blocks trackers and analytics


Also NoScript for more granular control.

But when I see the anger that these types of plugins, especially AdBlock, produce in content publishers, I wonder if we're not headed towards a new RIAA/MPAA-style battle front. As online publishers of all kinds get more established and consolidate their power, they could start lobbying to regulate against these plugins. It might seem farfetched now, but so did paying a tax to the RIAA for blank media, until it happened.


I use the Ghostery add-on for Firefox, but note that if you enable "GhostRank" then the add-on will send every URL you visit to Evidon. This is purportedly for "tracking the trackers", but it does give one pause.


Their FAQ says otherwise: https://www.ghostery.com/faq#q16

“When a user opts-in to GhostRank, Ghostery sends the following information each time a tracker is encountered:

    the tracker identified by Ghostery
    the blocking state of the tracker
    domains identified as serving trackers
    the time it takes for the tracker to load
    the tracker’s position on the page
    the browser in which Ghostery has been installed
    Ghostery version information”
Nothing about the URL you visit. Do you have reason to believe they're lying?


Ghostery's Alert Bubble does not report any trackers on this Hacker News page, yet every time I reload this page, my HTTP monitor (Charles Proxy) logs a ping to ghostery.com:

  http://l.ghostery.com/api/page/?d=news.ycombinator.com%2Fitem&l=304&s=0&ua=firefox&rnd=7639974
  http://l.ghostery.com/api/page/?d=news.ycombinator.com%2Fitem&l=426&s=0&ua=firefox&rnd=5747246
  http://l.ghostery.com/api/page/?d=news.ycombinator.com%2Fitem&l=346&s=0&ua=firefox&rnd=8989043
Why does Evidon need to know about pages that have no trackers? The FAQ says the domains serving trackers will be identified, not the complete URL path (minus query string parameters) for pages that have no trackers.
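For anyone who wants to see what is in those pings, here is a small sketch that decodes the first logged URL. It only splits the query string; what the `l`, `s`, and `rnd` parameters mean is not stated anywhere in this thread, so the code does not interpret them.

```python
from urllib.parse import urlsplit, parse_qs

# First ping URL from the Charles Proxy log above.
ping = (
    "http://l.ghostery.com/api/page/"
    "?d=news.ycombinator.com%2Fitem&l=304&s=0&ua=firefox&rnd=7639974"
)

# Decode the query string into a flat dict of parameter -> value.
params = {key: values[0] for key, values in parse_qs(urlsplit(ping).query).items()}

# The "d" parameter carries the host plus path of the visited page.
visited = params["d"]
```

After decoding, `visited` is the host and path of the page, without the query string.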


Adam from Ghostery here. This is a known bug, and we're fixing the issue in the next releases. We don't need or want URLs that aren't associated with trackers.


Hey - until the fix, we've made a note on our FAQ. You can visit the FAQ here: https://www.ghostery.com/faq#q16

We'll make a blog post soon, too.


This. Just remember to set it to auto update and to blacklist new trackers.


Should you prefer to whitelist sites yourself instead, I can recommend: https://www.requestpolicy.com


While this is awesome and powerful, it ends up being quite annoying. And Ghostery does not seem to miss anything, it blocks everything it should block, and allows everything it should allow. Just perfect.


Thanks for the tool recommendation. Interesting to note how much it blocks on 42floors.com.


I had a similar experience with FlightFox. I entered my origin and destination and got distracted and closed the tab. I get an email several hours later asking me to start the contest with the exact two places. Creepy, much?


I agree it shouldn't happen. But honestly, is this really worse than what FB does?


Yes. If I integrate with the Facebook API I can't access any user details without first showing them the permissions dialog box. It is possible that advertisers have other access, but I would be extremely surprised if it gave them access to the user's e-mail address.


This is way, way, way worse.

If I'm understanding it correctly, the vendor is offering the following service: place a JS snippet on your website. When a user visits your site, data will be pushed to the vendor's server about the user and what they do on your site (probably keyed on IP and as many other things as they can use to fingerprint). In return for you sharing this data with the vendor, the vendor will give you all of the data on this same user that was contributed by their other clients.

Here is an extreme (yet possible) scenario. You go to a medical forum that uses this software and create an account using your personal email address and real name, both of which you select NOT to be displayed to the public. You then post a message asking about a specific type of back pain you're having. A few hours/days later, you're browsing for a gift for someone, and visit the website of a salon that also uses this software. They can identify that your browser visited medicalforum.com, see the email address and real name you created an account with (since they were passed your form submission directly, without regard to what privacy settings you used for the forum), and see the topic you posted on back pain. So just to be helpful, they email you an advertisement: "Hi {your real name}, we see that you're having some back pain - bring this email in to {salon} for 15% off a massage!"

EDIT: To add, how do you know that you can trust the vendor not to display seriously private data? What if an online store uses this JS, and the vendor has your credit card info, possibly not-so-securely stored? Your information becoming public would be as simple as Asshole Q. Pirate making a fake site with some link-bait, and creating an account with the vendor.


Or worse you go and apply for health insurance online and get denied because of this.


Well with Facebook you sign up and agree to a big EULA that presumably allows it. If I visit site A then site B, site B shouldn't be privy to whatever I said at a different location without agreeing to share it.


Advertising and marketing companies aren't the only ones that do this. Any corporation which owns more than a couple websites collects bits of information about them from each site and then builds profiles of its users, often then selling the information.

Say you own a sports website, a fashion website, a political website, and a gaming website. The user only specifies a tiny bit of information on each website. Each bit is collected into a single user profile from which they can refer to do things like figure out what product advertisements to show them. They use the same techniques to identify users that don't have accounts, and still collect their viewing/interacting habits and add them to the profile.

Sometimes they'll send you an e-mail telling you to check out their gaming website if you're not signed up, because the comments you write in their other websites' forums have to do with gaming. Sometimes they just sell the information to a gaming company. In the case of Target, they might send your teenage daughter a list of baby products for the little one you didn't know she was expecting.

This is not some horrifying violation of privacy. There is a price for all the free shit you get from the internet. Usually it's paid for by all the personal information you leak onto the net. They're just mopping it up and selling it back to you.


I may be one of the few, and perhaps I've just been desensitized by all the social network invasion, but I don't find this stuff that reprehensible. At worst, it's moderately annoying because it's one more email that I have to archive, but it's definitely low on the totem pole of annoyances. Recruiters have been cold calling and emailing me for years based on my LinkedIn and Github profiles, and all I have to do is tell them "no thanks" and my life goes on.

What's the big deal?


We used to call the Soviet Union the "Evil Empire" for collecting about one tenth or less of this data on its people. That and rounding them up en masse to send to the labor camps.

In the US we had this thing called "McCarthyism". For a while back in the 1950's you could very easily be fired from a professional job for having read certain materials (mainly those from the Evil Empire of course) or having had certain political discussions when in college.

Just wait a few years until you need to find a health insurance plan that will agree to pay for the expensive medical treatments that will let you live a few decades extra. We'll see if your past web surfing and consumer habits make you worth keeping around.


Webtrak4.1 Profile for physcab.*

"Buys K-Pop from Amazon.com" "Visits RedTube at least 3 times a week" "Spends at least 15 minutes in gay porn section" "Made campaign donations to Republican National Convention" "Attends Clivesville Baptist Church"

* No, this is not really his web profile. But this is the kind of thing web bugs leak. And while your life may be so boring that it doesn't matter to you, many other people have information they would rather not share with every stranger on the net.


It means that if you visit a website, they could know exactly who you are. That might not be an annoyance when it's a technology startup's homepage, but what if it's a pornography site, or a site for an engagement ring, or a surprise holiday you're planning? There are lots of examples where this could be a concern. Anything with questionable legality (e.g. piracy) would be a more pressing concern.


This is outrageous and very much illegal in the EU.


In other words, the situation hasn't changed since the 90s. http://www.unc.edu/depts/jomc/academics/dri/idog.html (By the way, is anyone using http://samy.pl/evercookie/ in practice?)


I'm rather amazed that any company would put this on their website. What you may be doing, in fact, is likely identifying customers to your competitors. Cross-shopping is very common in most product categories - so it is quite possible that you're giving up your customer to a rival.


Anyone know if this system respects Do Not Track settings?


Like you said, it shouldn't happen, but it's inevitable.


Please... A clever use of GA + Wolfram Alpha can reveal a lot of potentially identifiable information already. You can't expect the Internet to become a big part of our society and, at the same time, remain a place for complete anonymity.


I can see how being a big part of society would exert pressure to realign with society's default approach towards anonymity (1) but I don't see how that would imply any final status of online anonymity. Other forces might exert pressure in the other direction. The technology itself might resist some pressure. Access to certain technologies changes the societies themselves.

(1) And of course society in general used to be and still is pretty anonymous. I can easily buy a newspaper in almost perfect anonymity through regular channels, apparently I need to take special precautions to get the same status online.


Shouldn't we be able to make these systems useless by filling them with loads of fake data/spam?

I.e. if I have to fill out a form somewhere, I would not only submit it once, but several times (ideally automated), ideally with realistic data, i.e. other businesses in my area (so geo-location won't raise a red flag then).

If I visit the next website which employs the same network, they can't really identify me - they have a big set of businesses I could possibly be (or they just take the last one, which would be fake).

At least currently, they do not seem to verify whether the filled out form can be properly validated, i.e. if the user clicked on a confirmation mail or similar.

Anonymity by obscurity :)
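A rough sketch of the decoy idea, assuming the network really does keep every submission (or just the latest one) without validating it. The function name `decoy_submissions` and the sample businesses are made up for illustration.

```python
import random


def decoy_submissions(real, decoys, seed=None):
    """Mix the real form data in among decoys, never in the final slot.

    Keeping a decoy last means a "use the most recent submission"
    rule on the tracker's side would pick up fake data.
    """
    if not decoys:
        raise ValueError("need at least one decoy")
    rng = random.Random(seed)
    batch = list(decoys)
    # Inserting at an index in 0..len(decoys)-1 leaves the last decoy last.
    batch.insert(rng.randrange(len(batch)), real)
    return batch


real = {"company": "My Actual Business", "city": "Springfield"}
decoys = [
    {"company": "Example Bakery", "city": "Springfield"},
    {"company": "Example Plumbing", "city": "Springfield"},
]
for form in decoy_submissions(real, decoys):
    print(form)  # each of these would be submitted in turn
```

Using decoys from the same city mirrors the point above about geo-location not raising a red flag.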


It's sufficient to disable third party cookies and not browse cookied by major social networks to prevent this right?

Ignoring the part where you can be "tracked" by company, but that's just looking up public IP records.


As far as I know, yes. This is what I do. Ironically, doing so and thereby solving the issue he's blogging about makes it impossible to comment via Disqus on the 42floors blog.


Custom hosts files can be used to block trackers across all browsers and applications. I personally use: http://someonewhocares.org/hosts/
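For anyone unfamiliar with the format: each hosts line maps a hostname to an address, so pointing a tracker's domain at a non-routable address makes lookups for it go nowhere. A small sketch that formats such lines; the domains are placeholders and the `0.0.0.0` sink is one common choice, not necessarily what that list uses.

```python
def hosts_block_lines(domains, sink="0.0.0.0"):
    """Format hosts-file lines that point each domain at a sink address."""
    # Normalize first so duplicates that differ only in case or spacing collapse.
    cleaned = {d.strip().lower() for d in domains if d.strip()}
    return [f"{sink} {d}" for d in sorted(cleaned)]


# Placeholder domains, not real trackers.
for line in hosts_block_lines(["tracker.example", "Ads.example ", "ads.example"]):
    print(line)
```

The output would be appended to `/etc/hosts` on Linux or macOS (the file lives elsewhere on Windows).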


Because of how the article is designed, it took me a while to understand that 42floors was not the company performing the tracking. I initially went there to find the name of the company (to put it on permaban in my NoScript), yet the only organization popping up while skimming through the page was 42floors. I was a bit spooked when I checked the NoScript list of blocked resources and saw the URL I thought was tracking me.

After that, I looked at the URL-bar and it took that long for me to click.


I find it interesting that this article should show up the very same day I found out about TAILS - The Amnesiac Incognito Live System (https://tails.boum.org/), a live Linux distribution that uses Tor and other tools to enhance your online privacy. The more I read about online tracking efforts like this, the more I want to set up a wall around my computer.


I would never use this on my site, but I feel that if I were to, it should come with a big tick-box agreement and a simple one-sentence explanation.


Fond memories of the movie Minority Report spring to mind. Startups are, in fact, working on this exact end-game facial recognition based ad technology right now.

http://www.youtube.com/watch?v=6-ZLw2Q7U2M

and the company: http://www.immersivelabs.com/


This will be done in meatspace via facial recognition and the state will likely demand access to this data. Disney are pioneering this sort of tech:

http://occupycorporatism.com/disney-biometrics-and-the-depar...


I am okay with companies displaying ads to me - this is what pays for the web to exist. If however things like this continue to exist then I will take up all options offered to opt out of identification and ad networks. Google and Microsoft etc should take note to shut this sort of behaviour down.


So, how feasible is it these days to do all of your browsing through a VPN? Not that a VPN's going to save you from the attacks mentioned here, but hey, maybe it's time to start getting serious about my privacy.


Totally feasible and you can set it up in less than 15 minutes (either yourself with a cheap vps and openvpn or from one of the hundreds of existing VPN providers). However, if you're serious about your privacy that's only one piece of the puzzle. You also need to invest some time in managing what sites you are happy to accept cookies from and get serious about deciding how far you are willing to go (in terms of inconvenience) to blacklist everything else (using incognito mode, ghostery, noscript, adblock etc.)

Edit: also remember that all a VPN gives you is an encrypted tunnel between your PC and the VPN end point. This means your ISP can't snoop on your traffic and ad providers can't see your real IP (and thus location) but that's all it gives you. It isn't a solution for anonymous surfing. If you want that, use tor.


I noticed ghostery blocked 11 tracking cookies when I went to read this.


See http://news.ycombinator.com/item?id=4954972 for an update


The Tor Browser Bundle is probably the best current tool for anonymous browsing.


I'm not knowledgeable in this area, but would using a VPN prevent this?


No, how so?


Doing this in Spain would be illegal.


This is just terrifying.


wowza


Thanks for figuring out how they do it...

That said, for a (very) long time I've been using separate Linux user accounts: one to check my professional email + G+, one for my personal email + G+ + FB (my FB uses a fake but plausible name), and a third one to surf the Web.

The one surfing the Web is linked to a fake online identity: entirely made up, with fake friends / fake G+ circles, fake StackOverflow / OpenID and basically fake everything.

I then only ever surf through a transparent proxy for anything "work related", so that IP can't be linked to the one my fake identity uses.

It's not difficult to set up: I set up the transparent company Web proxy (a VPN would do too) myself, and basically Linux user accounts take care of the rest.

Now I'll start using different browsers too and, why not, maybe Tor in one of the accounts.

I take it I could take all this a step further and whitelist the websites that my "personal" account is allowed to connect to (using iptables' owner match with --uid-owner).
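Something like the following is what I have in mind: a sketch that only builds the `iptables` command lines for review and does not run them. The account name and destination addresses are placeholders; `-m owner --uid-owner` is the iptables owner match, which applies to locally generated traffic in the OUTPUT chain.

```python
def whitelist_rules(user, allowed_destinations):
    """Build iptables argument lists limiting `user` to the given destinations."""
    base = ["iptables", "-A", "OUTPUT", "-m", "owner", "--uid-owner", user]
    # Keep loopback traffic working for this user.
    rules = [base + ["-o", "lo", "-j", "ACCEPT"]]
    # One ACCEPT rule per whitelisted destination address.
    rules += [base + ["-d", dest, "-j", "ACCEPT"] for dest in allowed_destinations]
    # Everything else from this user is rejected.
    rules.append(base + ["-j", "REJECT"])
    return rules


# Placeholder account name and documentation-range addresses.
for rule in whitelist_rules("personal", ["192.0.2.10", "198.51.100.25"]):
    print(" ".join(rule))
```

Order matters because the rules are appended in sequence and the final REJECT catches whatever was not accepted earlier. DNS lookups by that user would need their own ACCEPT rule, and sites whose IPs change would need the list refreshed.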


That seems extreme. Why not open your e-mail and social networks in incognito/private windows? Personally, I use a browser add-on to remove trackers, but I realise that isn't 100% foolproof.


Some require cookies to be enabled (Facebook, I know, does), so those networks won't let you log in in a private mode.


Chrome Incognito mode (at least) allows cookies, though- it just isolates them to the incognito window and destroys them on close. I just logged into FB and it worked fine.


Correct, cookies are stored for the duration of an incognito session.

I find it a little annoying though that cookies are not sandboxed by tab, rather than by window/session. If I log into FB in incognito, and then do a little more private browsing in a new tab the FB cookies are still accessible in the other tab.

I guess I could be more vigilant with my browsing habits, but I think this is a fair feature to implement in the browser rather than forcing users to jump through more hoops to protect their privacy. On a side note, when will Chrome finally offer API hooks to allow NoScript to be developed for it?!


I'll second this. I find Incognito windows really useful when I want to hop on my wife's laptop and check FB/email/the handful of other things we both have accounts on.


Curious what your friends say/think about the fake fb name as the main value of facebook comes from connecting with your friends..


Several of my friends use pseudonyms on Facebook, although ones that are usually recognisable as manglings of their real names ("John A Smith" -> "Jonas Mith", say). In their cases, I believe the main intent is to stop prospective employers finding their drunken-student-party photos, but I guess it might contribute to privacy in other small ways too.



