Sirubo: Packet filtering to block Google and Facebook tracking (peguero.xyz)
255 points by signa11 66 days ago | 129 comments

This is cool, but not super useful in 2021.

Google and Facebook have already moved their tracking technologies beyond frontend network calls due to a rise in browser-level blocking (browser security policies, international regulations, AdBlock, PiHole, etc). The next generation of tracking tech relies on the backend transfer of data between a website and the ad platform, which is invisible to your own network.

It also shifts the story from "Google and Facebook nonconsensually tracking your every digital move through websites and applications" as described in the article into "websites and applications actively transmitting customer data to Google and Facebook." The websites and applications are no longer passive partners, and they assume the responsibility of managing user consent.

As someone who works in ads, this is a spot-on summation of the current state of tracking. And that's without getting into how machine learning can use seemingly unrelated data points to learn more about you than you could imagine.

At this point if you have even the most basic connections (US bank account, home owner, gps, insurance, a car) the more sophisticated tracking methods can still learn about you and reach you with a decent success rate. It's just that most companies are using crap ad tech.

So how do you morally justify your work? Do you think this work is valuable? I know I sound confrontational, but I'm seriously wondering about this. In particular because the HN crowd in general has been so highly critical of most adtech, but at the same time a significant portion of people here probably works on it or for one of the companies which are behind it.

There are many immoral players in the industry, but there's nothing inherently immoral about advertising.

I help people discover products and services that they love.

I help thousands of business owners find more customers, so they can support their families and create jobs for others.

There's a lot of bad behavior from marketers and sketchy business owners, but I know the work that I'm personally doing has a tangible, positive impact on the lives of thousands of people.

> but there's nothing inherently immoral about advertising.

That's debatable or at least strongly depends on your definition of advertising.

Moral advertisement could, in my opinion, only tell a person about something when they actually need that thing.

For example telling the person what kind of options are available if a person needs to travel from x to y.

Any other form eventually ends right where we are right now, with scientists and psychologists trying to figure out how to manipulate people into doing things they wouldn't otherwise.

Isn't it basically known that FAANG pays the most because they have the most invasive & powerful adtech, and adtech is actually just a euphemism for outreach and control?

And I hate to say it, but they justify their work really easily: they get paid very well for it and someone else would if they didn't anyways

The real battles were fought and lost long ago in this domain, and the chances of success were never high. Power is an addiction and powerful people are the most addicted, they always have and always will aim to subjugate and control, to be a pawn of such a person is definitely sad but maybe worth the lofty goal of having true financial independence

I vote with my conscience, but I work for whoever will pay me the most money.

Does this mean that you would vote for a candidate who promised to impose meaningful regulations on your industry? Even if that meant harming a company you work for / may have equity in? Is there some other solution to this tracking proliferation?

My question assumes you believe the work your industry is doing is unethical but that seems to be implied by your posts.

Absolutely. I'm not an owner and I don't have any equity.

> Is there some other solution to this tracking proliferation?

Not in capitalist countries. Serving someone an ad that is actually relevant to them is good business.

The government doesn't need web cookies to find you, they can just call your ISP or your phone carrier, or check surveillance cameras, ask your bank, so-on. No one from these private companies has the time to look at the data, you are just a number to them.

Though it might feel like it, you aren't being spied on in any meaningful sense. You are just being profiled, as a tax to use all of the free services you have access to.

Paid services that don't sell your data are the way to go, if that kind of thing scares you. But the ads you see online are going to suck.

> But the ads you see online are going to suck.

Even with all the profiling the ads still suck.

It amazes me how little insight all this profiling actually gives advertisers. Sure, they advertise stuff to me that I'm already searching for, but advertising is supposed to be about bringing in new customers, not advertising a product to me after I've already decided what I want, searched for it, and bought it (or decided I don't want it).

I get ads for weeks after that are a complete waste of the advertiser's money, LOL. I don't think I've EVER bought something online that I didn't know I wanted until I saw an ad...

> Even with all the profiling the ads still suck.

It seems silly to me as well. But when it comes down to it, they serve the ads that pay the most, right? So perhaps all the fancy machine learning and all the other garbage is just a way to say to their customers, "Hey, advertise with us, look at all this fancy stuff!"

and then someone comes along and says, "hey, we'll pay more than any other relevant ad..." and POOF. All of it doesn't matter anymore. At the end of the day, it's about getting the dimes in the right pockets.

> The government doesn't need web cookies to find you

Yet the government collects[1] them [2] in[3] bulk[4].

> Though it might feel like it, you aren't being spied on in any meaningful sense.

Not[5] true[6]. There are real harms done.

> Paid services that don't sell your data are the way to go

It's hard for paid services to compete with free ones, so they're incentivized to sell your data anyways.

[1]: https://www.washingtonpost.com/wp-srv/special/politics/prism...

[2]: http://www.washingtonpost.com/world/national-security/nsa-in...

[3]: https://www.forbes.com/sites/thomasbrewster/2019/12/11/googl...

[4]: https://www.washingtonpost.com/technology/2021/09/25/tech-su...

[5]: https://arstechnica.com/tech-policy/2021/07/catholic-priest-...

[6]: https://www.eff.org/deeplinks/2020/01/grindr-and-okcupid-sel...

It's possible for someone to buy public data and match that with their first party data from your account on their service in order to de-anonymize you. But it's unlikely. This same tactic was used in Afghanistan to find terrorists.

My point being most people aren't being hunted by anyone in particular like the US Government or the Catholic Church. It's expensive to set up these honeypot apps, and to buy this data, and spend the time to de-anonymize it. Frankly that church story is kind of insane.

Your second example is not really relevant IMO.

> as a tax to use all of the free services you have access to.

The problem is that the current economics make it more profitable to operate an ad-supported product than a paid product.

Regulation that would make ads less profitable would allow paid products to actually compete on price.

Ah, so you don't find it unethical. Makes sense then you have no qualms accepting your paycheck.

For the record, "capitalist" countries can and do regulate businesses. Those that are democracies do so based on the general interest of the citizenry. Unless you consider anything beyond absolute libertarianism to be "not capitalist". I find these semantic arguments often confuse the issue at hand.

I would love to pay for services that don't track me. You mentioned in another comment it's becoming hard to own a car, have a bank account, insurance, without tracking being baked in. I'm interested in ways we can work to change this. I don't subscribe to your belief that there is no solution, or that the level of tracking involved in the status quo is not "meaningful".

I am personally very jaded with the United States' version of capitalism and regulation. This is a country with for-profit healthcare who also run ads. To say the least.

Your voting record, vehicle ownership, and household income, depending on what state you live in, are sold by your state government itself! We have a long way to go when it comes to ethical capitalism in the United States. Too far, actually.

I absolutely agree there is much work to be done. Let's start doing it. I don't see how offering one's talents to these companies is moving us in the right direction. It's not a meaningless act to accept a higher pay check in exchange for working in such an industry. They are paying more because they want solid employees who will work hard. Unless you're there just to throw sand in the gears, I really don't get it.

> "capitalist" countries can and do regulate businesses. Those that are democracies do so based on the general interest of the citizenry

You have a strong reading of the level of citizen constituency at play in these neoliberal corporate welfare states

This type of cynicism about countries like Canada and the U.S. is unwarranted when looking at the full sweep of history. Progress was made in the past. All is not lost.

Yeah, that's uh... "cool".

Not everyone can just "choose" what field they want to be in. If I got to choose what I did every day, I'd be a writer or a musician. No ethical concerns there.

You don't have to work in ads or tracking technologies. Unless you are quite junior and needed to accept whoever was willing to take you. But based on your knowledge of the industry it seems you've spent quite a bit of time there.

Most people who work for e.g. Google as engineers have plenty of other decent paying options, especially right now. Sure, not everyone does, but most do.

Note there's a difference between advertising and surveillance. "Adtech" proponents are likely to conflate the two since adtech relies on surveillance.

Not GP, don't work in ads (actually I think my former colleagues probably think I left because our role pivoted that way) but I'm of the view that 'if I don't someone else will', and so in a way I'd rather learn more about it, maybe be the inside voice to say No this is way too far etc.

It certainly doesn't particularly interest me - maybe there are interesting problems if you're big on ML - but I wouldn't turn down the best offer I had just because it was in adtech and I wish it didn't exist to go to anyone.

> but I'm of the view that 'if I don't someone else will', and so in a way I'd rather learn more about it, maybe be the inside voice to say No this is way too far etc.

I can't remember where I saw it, but I once read a rather convincing article that made the case that's the exact attitude that allows immoral things to happen, especially at scale. Basically "participate, but try to change it from the inside" pretty much simplifies into just "participate." You give them your talent, and either your attempts at change fail or are so small as to be pointless (e.g. winning the fight to not pack Jews in so tightly in the boxcars that are still going to the concentration camp).

I'm not saying I'd be actively trying to blow things up from inside, my job wouldn't last long, and my whole point was that I'm not an activist about it: someone's getting paid for it and it may as well be me (if I don't have a better offer).

In your horrible analogy, I suppose what I mean is something like (but come on, not remotely close to!) 'why are we burning books and looting art, this is terrible, oh well someone else will if not me, and hey, I like art', but then 'you want to do a holocaust?! No, that is absurdly too far, whistle blow'.

> someone's getting paid for it and it may as well be me (if I don't have a better offer).

And maybe that person is worse at it than you, or without you they don't have enough people with the right skills to succeed.

> In your horrible analogy, I suppose what I mean is something like (but come on, not remotely close to!) 'why are we burning books and looting art, this is terrible, oh well someone else will if not me, and hey, I like art', but then 'you want to do a holocaust?! No, that is absurdly too far, whistle blow'.

Fat lot of good whistle-blowing would do in that example.

There's also the aspect where the last step was too far for you, but you helped take every step before that which enabled that last step to be taken without you.

If your attitude is "someone's getting paid for it and it may as well be me," you probably should just drop the idea that you could change things from the inside, since it amounts to a BS rationalization to take the money.

Since I've been "reassured" that we're not being listened to by smart speakers, I can only assume that familial links i.e. the 'facebook graph', are also exploited by ad companies now as well.

Anec-data: visiting the in-laws in a different part of the country, so my device and location is 'known' to have geographically moved by tracking, they searched for garages and glazing. We return home, and every advert is for garages and glazing.

That’s most likely because your device used their wifi and ended up behind their NAT, using the same IP as them. The anonymous tracking cookie IDs on your device get associated with searches from that IP (i.e. “cookie 1234 is in-market for glazing”), and you take those cookies back home with you. This is actually a massive failure because you’re not likely to purchase glazing at all.

I'm not sure it's a failure. You could argue that showing him ads for glazing could be beneficial in case he spots a good deal and decides to talk to them about it.

Now I don't think the ad network is that smart and explicitly intended for this, but presumably in aggregate they're seeing better results by merging targeting buckets by IP than not, so they continue doing it.
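To make the "merging targeting buckets by IP" idea concrete, here is a minimal sketch of how it could work. It is an illustration only: the event log, cookie IDs, and addresses are all made up, and no real ad network's logic is implied.

```python
from collections import defaultdict

# Hypothetical event log: (cookie_id, ip_address, interest_segment or None).
# All identifiers and addresses are invented for illustration.
events = [
    ("cookie-inlaws", "203.0.113.7", "glazing"),
    ("cookie-inlaws", "203.0.113.7", "garages"),
    ("cookie-visitor", "203.0.113.7", None),    # guest device on the same wifi
    ("cookie-visitor", "198.51.100.20", None),  # same device, back home
]

def merge_segments_by_ip(events):
    """Give each cookie the segments of every cookie that shared an IP with it."""
    segments_by_ip = defaultdict(set)
    ips_by_cookie = defaultdict(set)
    for cookie, ip, segment in events:
        ips_by_cookie[cookie].add(ip)
        if segment is not None:
            segments_by_ip[ip].add(segment)

    merged = {}
    for cookie, ips in ips_by_cookie.items():
        # Union the segments seen on all IPs this cookie ever appeared behind.
        merged[cookie] = set().union(*(segments_by_ip[ip] for ip in ips))
    return merged

profiles = merge_segments_by_ip(events)
print(sorted(profiles["cookie-visitor"]))  # ['garages', 'glazing']
```

The visitor's cookie never searched for anything, yet it inherits both segments from the shared address and carries them home.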

What reassurances have you had about not being listened to?

I keep cam/mic access switched off a lot for this reason, and I've only seen generic denials/refusals that it is or could happen but nothing concrete, much like the pre-snowden outlook towards internet surveillance.

If someone wants to convince me FB, et al are doing this, they can show me packet traces of audio being uploaded for analysis, or in the alternative (if the theory is on-phone analysis) resource consumption data from the phone doing it.

There's been so much discussion of this theory that we'd have seen one of those by now. I have zero love for the surveillance shops, but there's just no real evidence this happens.

Resource consumption for low-accuracy voice transcription is trivial - think in the single-digit milliwatt range[1] for custom hardware. It would also be really easy to hide the resulting small amount of textual data in routine communications with the service's server.

You can't rule out audio transcription on the technical basis that "it's too hard" alone, because it's not too hard.

(for Google, that is - Facebook is constrained by the Android sandbox, but Google has their opaque Google Play services blob on almost every Android phone)

That's obviously not a reason to believe that it does exist - we'd only know that if a Google whistleblower stepped forward, or if someone reverse-engineered Play Services - but we can't rule it out on a technical basis alone.

[1] https://groups.csail.mit.edu/sls/publications/2018/Price_IEE...

Better to focus on the things that we have evidence that they are doing (and there is plenty of abusive behavior we know about) than to speculate about unlikely attack angles and say "we can't rule it out" (proving a negative is nearly impossible). Working that way just leads to focus on the wrong threats.

I'd be satisfied with a "beyond a reasonable doubt"-type argument. Somebody pre-registers some hypotheses, like "I've never thought about beanie babies in my life, but I'm going to talk about them in front of my phone. I expect I'll start seeing ads for them at a noticeably higher rate than before." Such an experiment wouldn't be difficult to conduct, but I've never heard of it being done methodically, only noticed after the fact.

I'd be very pleased to see it done though. Complications will certainly arise, so N should be large, and there should be several independent replications.
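One way such a pre-registered comparison could be scored is a one-sided Fisher exact test on ad counts before and after the conversation. The sketch below is an assumption about how one might set it up, not an established protocol; the counts in the example are invented, and the significance threshold would be fixed before collecting any data.

```python
from math import comb

def fisher_one_sided(hits_before, total_before, hits_after, total_after):
    """P-value for seeing at least `hits_after` on-topic ads in the 'after'
    sample if the on-topic ad rate did not actually change."""
    total_hits = hits_before + hits_after
    total = total_before + total_after
    denom = comb(total, total_after)
    p = 0.0
    # Sum hypergeometric probabilities for outcomes at least as extreme.
    for k in range(hits_after, min(total_hits, total_after) + 1):
        p += comb(total_hits, k) * comb(total - total_hits, total_after - k) / denom
    return p

# Invented counts: 1 beanie-baby ad in 200 ads before, 9 in 200 after.
p_value = fisher_one_sided(1, 200, 9, 200)
ALPHA = 0.05  # chosen in advance, as part of the pre-registration
print(p_value < ALPHA)
```

A single run like this says little on its own, which is why the comment's call for a large N and independent replications matters.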

Maybe it's overly cynical, but this also makes me think of the Volkswagen emissions kerfuffle. Would it be so surprising if the ad software was sophisticated enough to know when it was being tested, and try to play dumb?

My wife and I did this. We don’t watch pro baseball at all and we live in the southeast. Had a 10 minute conversation about the Cincinnati Reds and within a few minutes we were seeing MLB ads on Facebook.

I don’t know how they are doing it…but it happens.

> What reassurances have you had about not being listened to?

I know enough about tech to know that it's a very, very, very hard technical problem, and hiding it is basically impossible. And no one has shown anything even resembling any form of breadcrumb pointing towards it, let alone a proof.

> I know enough about tech to know that it's a very, very, very hard technical problem, and hiding it is basically impossible.

This gets repeatedly asserted, and it's false. Low-accuracy voice transcription is a solved problem, and is relatively easy to hide, as long as you have API access[1]. (so, Facebook is probably in the clear, as neither Apple nor Google are crazy enough to let them have invisible microphone access, but it would be relatively easy for Google (Play Services hook anyone?))

[1] https://news.ycombinator.com/item?id=27142812

I'm very well aware that low accuracy voice transcription is possible.

But it's naive thinking that having an algorithm equals solving a technical problem. That's not even the problem. Problem is how to deploy it, at scale, without anyone leaking it (both employees and vendors). And hiding it so well, that none of the security researchers will be able to find it. And doing all of that in a way, that they can use it and get value out of it.

And then compare the risks of doing that with the risks and ROI of, for example, improving search accuracy, so people will just come and tell you more about stuff they want.

Have you considered that they might be tracking your thoughts and speech through microchips they implanted in you during your Covid vaccination? /s

> It's just that most companies are using crap ad tech.

Including Facebook, apparently. About 90% of Instagram ads were irrelevant to me for about two years. Then I mentioned an interest in photography in a private conversation, now 75% of ads are photography-related.

I once let my then 13yo borrow my laptop to do some research as he wanted to start experimenting with modded minecraft. I sat next to him while he researched minecraft plugins - no opportunity for him to hit some cough other websites, it was totally legit browsing session. Let's just say the advertising bubbles that Google rammed me into literally forced me to install adblockers just to make my web browsing experiences SFW again (also saw a huge spike in hostile ads with malware, and suspect it was too drastic and immediate a change to be mere coincidence).

Saying "crap ad tech" is giving way, way too much credit to ad tech. I'd argue current ad tech IS crap, because the overall methodology and approach is fundamentally crap.

Might not be that strange. The Minecraft plugins community is the perfect group to trick into installing malware. There are shaders and extensions that are hosted on very shady websites and people just act like it is no big deal.

A teenager searching for video game plugins is the perfect target for all kinds of scams - it is absolutely not strange, the system is working as intended within the constraints it's given. Change the constraints by introducing regulation (such as advertising networks being liable for the ads they display) and the problem will go away.

The methodology is not completely crap. The role of the ad network is to get paid to display an ad. They aren't incentivised to ensure the ad is actually relevant - as long as they find some sucker to pay them to display the ad, they'll happily display it. Whether the mark then "converts" is not their problem.

The problem is that the entire advertising industry is crap and has successfully contaminated and taken over the tech industry. The only way out is regulation so that 1) advertisers are liable for what they display (to discourage scam or illegal ads - and in some countries NSFW content would fall into this category as you're supposed to ask or verify the user's age) and 2) privacy regulations that make targeted advertising opt-in so that overall the cost of advertising becomes too great and starts allowing alternative monetization models (such as actually asking the user to pay for the service) to compete.

Let's apply occam's razor.

What's more likely?

1. You mentioned thousands of other things in your private conversations. They suddenly start to narrow down on one specific one.

2. You're interested in photography, and are using a service made to share photos. You follow photography accounts or search the internet for photography-related content, this signal got picked up, you clicked on some ad (most likely by mistake), and the signal got amplified.

To address your breakdown:

1. I distinctly hadn't mentioned photography in private or public messages. Ever. Nor had I tagged any of my camera equipment in any of my posts. Hard to believe, I know, but I created and manage the account with explicit intent and action, preferring to move personal conversations off-platform.

2. I don't follow photographers, and prefer to only follow certain friends. While I don't go to that great an extent of concealing my browsing history, I also make some effort to segregate information flow. Plus, the low-hanging fruit of Firefox + DuckDuckGo + uBlock usually does a decent job of helping with that.

3. In the two years I've had an account, I hadn't seen a single photography-related ad prior to the conversation on photography. Ads typically revolved around random tech products (many of which were irrelevant) and ads for TikTok (zero interest in it).

4. There was a dramatic shift in the content of ads within 24 hours of the conversation on photography. It's conservative to estimate 75% of ads are photo-related. I have a hard time remembering any other ad (a sign of their irrelevancy, in my opinion).

5. Given the demographics of this site, I'd appreciate (for myself and others) if you'd give at least a little bit of benefit of doubt on technical comprehension.

If you are interested, you should know that it's possible to not see any ads at all on a computer (phones are a different beast). It's a bit of a chore but it takes 3 things: Firefox with uBlock Origin + NoScript + delete cookies and history on exit. I haven't seen ads in years. You can go a step further and create startup profiles. It's a bit of a chore because you have to selectively allow domains for content to display (reader mode can help), but that's the price to pay to minimize tracking and remove annoying ads.

Thanks! I rarely see ads on desktop because I use Firefox + ublock + Pi Hole. I do, unfortunately, have to whitelist the occasional domain because some sites that I need to access don't play well with that setup.

What’s more likely? A service that vacuums up every piece of data vacuuming up microphone data, or the same service intentionally excluding this one random piece of data? Every time this comes up, people jump up and say “they’re definitely not listening!” But why do people think that? Facebook gathers as much data as possible, including buying a VPN to inspect supposedly private traffic. Why wouldn’t they listen to a microphone? As an anecdote, I know someone who worked on smart TVs. Part of what they implemented was a system to turn on the tv’s microphone to get an audio fingerprint of the content being watched.

Yeah, I swear they are listening. But if you think about ALL avenues including the people around you then maybe not. But yeah there are still cases where audio monitoring seems to be the only explanation.

> As someone who works in ads

Perhaps you should consider a career change?

I think anyone who works at an ad agency can tell you that they think about a career change all the time.

Either ethical or otherwise, what are some not-crap ad tech companies, that sell their services?

So this means privacy on the internet is now gone?

Can you recommend methods to avoid the tracking?

This is a bit old, but to answer your question.

You need a pi-hole hooked to your open source router that runs traffic through a VPN. You need to use cash or stay at a credit union that doesn't sell your information (if anyone has found one of these please let me know).

You need to not use a cell phone and stay off of most social media.

You need to not utilize credit. You need to use single-use email addresses for everything you sign up for, and furthermore to trust the system you use to create those addresses.

You need to use an OS like TAILS or QUBES and never stay logged in to any platform, reject all optional cookies and flush non-optional ones. There are going to be services that won't let you operate this way.

You need to get stuff shipped to a PO Box. You somehow need to stay out of the healthcare system.

There is some stuff that's out of my expertise, like having a good lawyer to make sure records of your activity are expunged and that you are aggressively wiping your credit score if anything does pop up.

A good way to achieve some of this would be to found an LLC and buy your house and car through it instead of your personal name.

It also would be a good idea to live somewhere where you can obstruct wifi and cell signals from your house without getting into a bunch of trouble, and make sure your guests simply don't bring their devices into your home.

"At this point if you have even the most basic connections (US bank account, home owner, gps, insurance, a car) the more sophisticated tracking methods can still learn about you and reach you with a decent success rate."

Lots of younger people do not have all of the above. If everyone had all the above, then the statement would not contain "if", it would be a given. But let's assume every living person can be "learned about" and "reached". That does not necessarily mean every person is worth learning about or reaching. This sounds very much like a person with a heap of surveillance data trying to make it sound valuable.

"It's just that most companies are using crap ad tech."

As an unwise HN commenter once said, "The market has spoken." :)

"... and reach you with a decent success rate."

How effective is "decent".

Why should I pay for this.

No doubt people are working hard trying to improve surveillance and are making progress against users who haven't a clue what's going on, nor any interest in getting one. Kudos for the easy victory. Like shooting fish in a barrel.

But what's the point in trying to surveil someone like Mr Peguero? He's not hiding his identity or location, or his preferences (Silicon Valley is garbage).

If "adtech" surveillance succeeds against someone blocking Google IPs, then what? What's the end game?

Anyone who is willing to take the time to block Big Tech with a firewall is, IMHO, unlikely to be a very profitable ad target.

Advertisers should only care about people who are likely to spend money on their products/services. However people conducting blanket surveillance trying to pitch to advertisers ("adtech"), they are the ones who have an interest in arguing, honestly or dishonestly, that every last "identified" individual is a worthy ad target. If I am an advertiser, I am not going to be particularly interested in trying to advertise online to someone who is running OpenBSD or blocking Google IPs with nftables.

Individuals like the OP are proactively saying, "No, thank you." (The OP is actually saying "FU".)

Personally I find it's quite easy to control/limit/stop data transfer initiated by websites/popular_browsers using a forward proxy. And I can use AI, too.

However I think blocking and other Google IPs at the firewall is also good practice, not necessarily to stop ads/tracking by websites but to limit the user's resources available to Google. Given their incentives and the fact we as users do not pay them like advertisers do, I think it's naive to trust that Google's employees will, for example, always honour the system DNS settings.
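For readers unfamiliar with what "blocking IPs at the firewall" amounts to, the core check is whether a destination address falls inside a listed prefix. The sketch below shows that check in Python rather than in nftables syntax. The prefixes are placeholders from the RFC 5737 documentation ranges, not real Google or Facebook ranges, and `is_blocked` is a hypothetical helper name.

```python
import ipaddress

# Placeholder prefixes (RFC 5737 documentation ranges), NOT real Google or
# Facebook address space. A real list would have to be sourced separately.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]

def is_blocked(address: str) -> bool:
    """Return True if the address falls inside any blocked prefix."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_NETWORKS)

print(is_blocked("192.0.2.55"))      # True: inside 192.0.2.0/24
print(is_blocked("198.51.100.200"))  # False: outside the /25 (0-127)
```

A packet filter applies the same membership test to every outgoing packet, which is why it works regardless of which application or DNS resolver produced the connection.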

Unrelated but its possible that many OpenWRT users using default settings are actually pinging


Do you have a car registered with the state, a debit card, and a cell phone? A credit score perhaps?

Do you run every single connection to the internet through a VPN service that keeps no records? Do you use TAILS or Qubes OS? Do you actually own your router?

It's a leaky boat.

>The next generation of tracking tech relies on the backend transfer of data between a website and the ad platform, which is invisible to your own network.

But doesn't relying on the publisher's website log statistics instead of the end users' browsers introduce trust and "bad actors" problem? This has been a known "principal-agent" problem[1] for all the decades that 3rd-party ads have existed on the web.

I.e. Google getting onclick statistics from web browsers' Javascript and reporting to Google-owned "doubleclick.com" is different from the server logs of "JoeClickbaitContentFarm.com". Doesn't the contentcreator's website have an incentive to falsify the numbers to get higher payments from the ad network?

It doesn't seem like website self-reported server stats can fully replace end users browsers tracking. Instead, it augments it.

[1] https://en.wikipedia.org/wiki/Principal%E2%80%93agent_proble...

I think OP comment is probably referencing more purchase & consumer data, rather than sell side impression data. E.g. we send non-profit donation data into FB to help optimize our fundraising advertising. And ads are transacted based on conversions a lot of the time, instead of raw impression or click data which I think can optimize for fraud like you say.

Though I haven't thought much about what you bring up or that principal, thanks for sharing.

Ultimately google can render ads through a middleman iframe they control, so even going back to a basic impression count without a bunch of JS controls and measurement they still have lots of tools especially since they have large amount of log in data on their domains only they can verify.

I remember doing something like this when I was just learning the internet: copy-pasting HTML (before it was all rendered with JS) and editing things to make them my own. I used an AdSense account to render mesothelioma keywords and hit refresh a bunch; got paid a small amount, and my parents were impressed a 13-year-old got checks from Google ;) There are a ton more learning resources now, but the days of copy-pasting basic website HTML are gone.

The data is sent from the advertiser to the ad platform, not from the publisher to the ad platform. The advertiser is incentivized to send accurate data for both performance optimization and for campaign measurement purposes.

Ad fraud is a real problem in the ecosystem, but the server-side APIs are actually more secure. You have a private signed backend endpoint rather than public JS that can be injected anywhere and fed fake data by a malicious party.

>The data is sent from the advertiser to the ad platform, not from the publisher to the ad platform.

Then we're talking about different things. This thread has packet filtering to prevent user behavior being sent to Google. For example, see recent thread about Google's click tracking: https://news.ycombinator.com/item?id=28672625

The key is that click choice data on Google's search results page is never seen by advertisers so your explanation of "next gen tracking is by advertisers calling APIs to ad networks" -- isn't relevant to that scenario.

Then another level of tracking, underneath Google's visibility of click behavior on its own search page, is the website (publisher/content creator) that receives the click. Whether any advertisers see this downstream click statistic on an ad network depends on the particular website. E.g. a content creator website might have tracking that sends data to the Google domain "google-analytics.com" but to no advertisers.

Couldn't this be at least partially solved by the advertising agency still embedding tracking code into the web page, and the tracking payloads are included in the web owner's reports back to the advertising agency?

The data could be safeguarded by a cryptographic signature, though there's some trust paths that would need to be solved.
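To make the signature idea concrete, here is a minimal sketch using an HMAC over the tracking payload. The shared secret, the event fields and the function names are all made up for illustration; nothing here is an actual ad-network API.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret agreed between the site and the ad platform.
SHARED_SECRET = b"example-secret-not-real"


def sign_payload(payload: dict, secret: bytes = SHARED_SECRET) -> str:
    """Return a hex HMAC-SHA256 signature over a canonical JSON encoding."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_payload(payload: dict, signature: str, secret: bytes = SHARED_SECRET) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_payload(payload, secret)
    return hmac.compare_digest(expected, signature)


# Hypothetical click event and a tampered copy of it.
event = {"event": "click", "ad_id": "example-123", "ts": 1633000000}
sig = sign_payload(event)
tampered = dict(event, ad_id="example-999")
```

Note the trust-path problem is still there: a signature shows who produced the payload and that it was not altered in transit, not that the numbers inside are honest.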

> The next generation of tracking tech relies on the backend transfer of server logs between a website and the ad platform, which is invisible to your own network

I've not heard of this, what sites are giving logs to networks?

I couldn't tell you a specific site, but here's an explanation of both Google's and Facebook's functionality.

FB: https://developers.facebook.com/docs/marketing-api/conversio...

Google: https://developers.google.com/google-ads/api/docs/conversion...

This is also why companies like Tealium and Segment are currently valued at billions of dollars. They provide a single middleware integration point to funnel customer data to the dozens of marketing companies that are now leveraging server-side APIs instead of browser pixels.
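For a rough idea of what such a server-to-server upload might carry, here is a sketch that assumes the site normalizes and hashes the customer's email before sending. The field names are invented for illustration and are not either platform's actual schema; see the linked docs for the real ones.

```python
import hashlib
import json


def normalize_and_hash(email: str) -> str:
    """Lowercase and trim an email, then hash it with SHA-256."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def build_conversion_event(email: str, value: float, currency: str, ts: int) -> dict:
    """Build a hypothetical server-to-server conversion payload."""
    return {
        "event_name": "purchase",  # hypothetical field names throughout
        "event_time": ts,
        "user": {"hashed_email": normalize_and_hash(email)},
        "value": value,
        "currency": currency,
    }


# The request would go from the site's backend to the ad platform,
# so nothing about it is visible from the visitor's own network.
event = build_conversion_event("  Jane@Example.com ", 25.0, "USD", 1633000000)
body = json.dumps(event)
```

The hash hides the raw address in transit, but any party that already knows the same email can compute the same hash, which is what makes matching possible.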

Not to forget Google's server side containers in Google Tag Manager.


All Shopify powered sites have the functionality available to send tracking data directly to Facebook via their API, if the merchant enables it. https://help.shopify.com/en/manual/promoting-marketing/analy...

"Server Side GTM" is a good starting point for relevant information

Shipping logs would be an extreme breach in my mind.

But I can BS. Without a cross site unique identifier, the logs would not be usable across different sites...

Though... I guess a browser fingerprint could be used as a non-centralized method to generate that unique key...
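A minimal sketch of that idea, assuming each site independently collects the same handful of browser attributes. The attribute names and values are hypothetical; real fingerprinting uses many more signals.

```python
import hashlib


def fingerprint(attributes: dict) -> str:
    """Hash a sorted set of browser attributes into a stable identifier."""
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# The same visitor seen by two unrelated sites (attributes collected in different order).
visitor_on_site_a = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1920x1080",
    "timezone": "UTC+2",
    "fonts": 142,
}
visitor_on_site_b = {
    "fonts": 142,
    "timezone": "UTC+2",
    "screen": "1920x1080",
    "user_agent": "ExampleBrowser/1.0",
}
# A different visitor with a different screen size.
other_visitor = dict(visitor_on_site_a, screen="1366x768")
```

Both sites derive the same key without ever talking to a central ID server, so their logs could be joined on it later.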

Anonymous metrics, just the basics to help us keep track of service availability. Oh and a unique device identifier along with some information about your hardware and OS. Well... and your IP and connection times of course.

Don't worry though, the marketing websites of the commercial services we use to gather and analyse this data say they're keeping your data very safe and secure!

Your privacy means a lot to us.

No no, it’s “we value your privacy” which can be read in a couple of different ways.

Login name, email address, credit card number. Lots of ways for companies to get together like this to follow you around the web and apps. All invisibly. And you'd never know it.

Matching up is called “entity resolution” and reminds me of this recent showHN https://news.ycombinator.com/item?id=28127650
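In its simplest form that matching is just a join on a normalized key. A toy sketch, assuming exact matching on email only; the records and source names are made up, and real entity resolution is fuzzier (names, addresses, devices).

```python
from collections import defaultdict


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so trivially different spellings match."""
    return email.strip().lower()


def resolve_entities(records: list) -> dict:
    """Group records from different sources by normalized email."""
    groups = defaultdict(list)
    for record in records:
        groups[normalize_email(record["email"])].append(record["source"])
    return dict(groups)


# Hypothetical records held by three unrelated services.
records = [
    {"source": "shop", "email": "Jane@Example.com"},
    {"source": "newsletter", "email": " jane@example.com "},
    {"source": "forum", "email": "sam@example.org"},
]
matches = resolve_entities(records)
```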

Ad tech does not need a cross site unique identifier for everyone in order for cross device targeting to work well enough.

There’s other tricks of the trade that make it good enough.

So what we really need is a way to easily and cheaply host “virtual clients”, bots to generate traffic so that real clients disappear in the noise?

This is correct. To expand: things like installed extensions, window size (as well as monitor size), bandwidth and general user speed, adblockers themselves and their individual block lists, and browser are all adding to your trackability profile.

I believe you can get specific information about user's operating system (either through legitimate, direct checks or by exploiting features and using process of elimination such as X version of Chrome is only available on Mac) and of course hardware IDs.

Your IP is obviously out there as an obvious profile that can build a general picture of you in a very similar way to phone numbers. If you use a VPN, the IPs bought by that can also be profiled to narrow you down. If you’ve seen a denial message telling you to not use a VPN, this can be what’s happening.

There are also just official exploit-tier-like features constantly being added. For example, Chrome is adding the ability to see if you're idling on a page.

I've noticed some major internet sites compiling this type of information for use in, for example, permabans. Trolls have otherwise been able to use a VPN or just create a new account. This is a large driver of finding new tracking methods outside of just personalized ads.

I think a lot of this is in its relatively infant stages. I suspect it'll be 5 to 10 years before people become aware and some newsworthy incident of major abuse occurs.

Is there an easy way to truly defuse various web APIs and anonymize browsers wrt that, except IP? I seem to be maintaining an almost globally unique setup for years judging by various browser privacy awareness tests, despite being a completely uninteresting person.

Should I, for example, use Docker containerized browser exclusively, or somehow use Selenium for all browsing traffic, or do something else drastic to that effect?

There's so many gotcha-holes that it's nearly impossible to get them all and still have a usable browser. Security updates are also going to keep you updating, and those introduce new unique identity problems.

Is there any way to be fully anonymous online? No. The best you could do is Tor on a privacy-focused operating system on a disposable computer on public wifi, but there are still loose ways to track that activity.

Or just cameras and transaction logs for buying wifi time or a coffee at that place. If you opt to not buy coffee, employees might remember you as the freeloader. If you pay in cash, you might be that only person who uses cash.

Obscurity is the best we can do right now. A virtual setup like docker using a typical setup with a VPN is the current most reasonable solution we have.

However, things like your grammar and sentence structure or even going to your profile instead of the home page before starting to browse are always going to be weak points unless you write a bunch of random "AI" to counteract that.

But then you're just the weird user doing a lot of random "AI"-like things.

Yeah, that will be the standard answer... However, I don't have many issues with IPs; what I have problems with is the browser. A disposable computer on public Wi-Fi still gives out your computer model and potentially allows identification of your unique machine.

I'm aware that server admins are able to get WHOIS on my IP and run triangulation by latency, which I can avoid with that throwaway-laptop-cash-paid-gloved-hand-Tor-over-free-wifi-Guy-Fawkes pretension, if need be. But my priority is to get a "clean" browser that is indistinguishable from any other.

Sucks it ain't easy in 2021.

"This is cool, but not super useful in 2021."

Why would someone in marketing think this is cool?

"The next generation of tracking tech relies on the backend transfer of data between a website and the ad platform, which is invisible to your own network."

The transfer of data between the user and a website, Google, Facebook or otherwise, is visible to the user.

"It also shifts the story from "Google and Facebook nonconsensually tracking your every digital move through websites and applications" as described in the article into "websites and applications actively transmitting customer data to Google and Facebook.""

Why would websites transfer data to Google and Facebook? I don't know; perhaps every website is different. But if I were a website I would only send data to Google or Facebook if Google and Facebook already had some data of their own.

Users who are actively monitoring the data they send to websites can assume that all websites, including but not limited to Google and Facebook, are sharing user data with each other. That doesn't mean we think they are, but there is no way to verify they are not (or to hold them accountable if they were); thus we know they could be exchanging data, without taking much risk.

> The next generation of tracking tech relies on the backend transfer of data between a website and the ad platform, which is invisible to your own network.

This is a major victory. It proves that technology is effective at bringing about change. It proves we are not powerless. We can force these giants to adapt to us whether they like it or not.

> websites and applications actively transmitting customer data to Google and Facebook

Now we work towards making that illegal.

> The next generation of tracking tech relies on the backend transfer of data between a website and the ad platform, which is invisible to your own network.

Well if that's true then it should be illegal if it's not already. How do you find which sites are participating in this data grab?

"websites and applications actively transmitting customer data to Google and Facebook."

Any website doing this for EU users without their consent is going to run into GDPR issues very quickly indeed.

I'm sure that, like most other consent prompts, it will be opt-out with lots of sketchy dark patterns (like artificial waits) to ensure you don't opt out.

That's actually still in breach of the regulation. However you are right to have concerns as the GDPR is not being enforced seriously.

As mentioned above by @marketingtech, the websites assume the responsibility of managing user consent and therefore are GDPR compliant.

This is why browsing the internet feels like filing for a mortgage now. But it's compliant.

Yes, this has been a terrible escalation of the battle. It means that, in addition to my other defensive tactics, I also need to be sure not to create accounts, even -- or especially -- free ones.

Surely Ycombinator (or others) would never run analysis on comments on HN to spot emerging trends.

Is moderating for users or to keep their inference engine simple?

I mean the backers of Facebook? The schemers who took back control of Reddit?

Surely the most principled of people.

That’s great because that means there can be no more cross-domain tracking. Which is what was the problem all along.

As someone with more time, I prefer to maintain a massive whitelist for my router. Daily websites receive permanent privileges, incidental websites (such as peguero.xyz) receive temporary privileges (e.g. allow traffic for the next minute), everything else is dropped.

I don't have to worry about what chicanery advertising companies are up to when they can't reach me even if they tried.
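The permanent/temporary/drop policy described above could be modelled roughly like this. It is only a sketch of the decision logic, with a hypothetical class name; it is not the poster's actual router setup, and a real deployment would live in firewall or DNS rules.

```python
import time


class Allowlist:
    """Permanent and temporary domain allowances; everything else is dropped."""

    def __init__(self, permanent=()):
        self.permanent = set(permanent)
        self.temporary = {}  # domain -> expiry timestamp in seconds

    def allow_temporarily(self, domain, seconds, now=None):
        """Grant access to a domain for a limited number of seconds."""
        now = time.time() if now is None else now
        self.temporary[domain] = now + seconds

    def is_allowed(self, domain, now=None):
        """Return True for permanent entries and unexpired temporary ones."""
        now = time.time() if now is None else now
        if domain in self.permanent:
            return True
        expiry = self.temporary.get(domain)
        if expiry is None:
            return False  # default: drop
        if now >= expiry:
            del self.temporary[domain]  # expired, so revert to the default
            return False
        return True


acl = Allowlist(permanent={"news.ycombinator.com"})
acl.allow_temporarily("peguero.xyz", seconds=60, now=1000.0)
```

Passing `now` explicitly is only there to make the expiry behaviour easy to check; in normal use the clock is read automatically.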

"So the fourth herd of deer took up residence where the poison-grass sower & his followers couldn’t go and—having taken up residence there—ate food without venturing unwarily into the poison-grass sown by the poison-grass sower. By eating food without venturing unwarily into the poison-grass sown by the poison-grass sower, they didn’t become intoxicated. Not being intoxicated, they didn’t become heedless. When they weren’t heedless, the poison-grass sower wasn’t able to do with them as he liked on account of that poison-grass."

How do you implement that? I have a whitelisting transparent proxy for my kids (contrary to the popular meme around here that all kids are NSA grade hackers determined to defy your every attempt to protect them, it's uncontroversial in my house and works very well). I use squid for that and have a shonky web UI I made to access the logs and update the whitelist acl. I'd like to make it more capable (stuff like temporary unblock like you mention). AFAICT the only way to do such things is writing a squid "helper" that runs as a separate process (/processes). Is that what you're doing?

I use adblock on openwrt with a basic script to write to and revert the whitelist, and to restart dnsmasq. I use qutebrowser and made whitelisting a hints shortcut.

There's almost certainly a better system, but this works for me.

Honest question: is it worth it? Why would you spend your time on managing that temporary white list? Do you think that time is wasted, or not? (I apologize if my phrasing is a bit rude, but i'm really curious about that, and want to understand your thinking)

I think people like this see it as a 'win' – as if they, John Smith, have beaten the dastardly BigCorp. Whereas, in fact, the most that happens is a Junior Marketing Executive at BigCorp says "Right, that guy falls within the 0.5% of techy customers who make things difficult for us. Ah well, it's only been 80,000 of them, well within our margin for this month."

True - however, IMO, the value is in the awareness of tracking and the knowledge of how to block things as such.

It's better to know how the network you rely on for your daily life operates than to know nothing about its internals.

My biggest issue as I age is that I FORGET how to do some of the higher level networking that I used to know innately - and I also lose interest in doing such things and become lazy, complacent, and as I forget things, more and more ignorant to it all...

Take PC Gaming as an example, or server rebuilds.

I could build SUN 650s and many many PC based servers with a blindfold on.

I grew up gaming and ran Intel's Game Development Lab for some time, and was super knowledgeable about all things PC/PC gaming when I had the latest and best hardware literally delivered to me every day at Intel...

Now I don't know shit about 'PCMasterRace' and building these days....

The issue is that people like this fetishise avoiding tracking. It doesn't seem like they have a clear reason why they want to avoid tracking. Do they have sensitive data to hide? Do they ideologically disagree with large companies gathering data? Is it anything else? It honestly doesn't seem like it. It seems more like "stopping them from getting my data" is treated as an end unto itself.

I can’t speak for everyone, but there’s a growing awareness of the risks to society from gathering and spreading all this data.

It surprises me how someone who understands the inner workings as well as the interactions of the systems that society has increasingly expected us to depend on is not scared shitless of how things will look a generation from now.

Who cares about the intricacies of building a pc? You do it every 5 years and it takes a few hours…

I care about no longer remembering something I used to be considered a master at previously. :-(

I don't like knowledge evaporation.

I don't think you know what you are talking about. Here is a link to what they were quoting and might fill you in.


I have no idea where that literary quote is from but it’s pertinent here.

In related news, what software or tools do you use to manage that whitelist? I’ve been considering something similar.

Not OP, but I am using uBlock Origin. Here's how I do it:


Do you use an SSL proxy to catch unwanted requests to CDN's like Cloudflare that would otherwise be allowed?

seems like an enormous amount of effort for essentially no benefit.

I applaud the effort (Boston FTW!). But what can we do to treat the cause, instead of just the symptoms?

Stopping big-tech's tracking is cool, but strictly inferior to removing their incentive to track you.

How to do that? The ads are valuable for a reason – they work. If they stop working (or rather work less – it's a spectrum), then the collected data becomes just impotent bits filling up some HDD somewhere.

I feel the fix will be more along the lines of improving individual psychology and mental wellbeing, rather than entering the arms race of adversarial technology to block packet traffic (or whatever).

The tricky bit here is that advertisers don't need their ads to work, they just need to convince their customers they do. Even if there are reasons to question the metrics used to measure effectiveness (see e.g. [1]) this is all meaningless if ad companies still succeed in convincing people they are selling something of value. Which ironically does mean that one way or another advertising companies are good at advertising something it just might not be the thing they're getting paid for.

[1]: https://thecorrespondent.com/100/the-new-dot-com-bubble-is-h... [1, hn]: https://news.ycombinator.com/item?id=21465873

This is a good point, and thanks for the links.

Sometimes it does feel like gimmicks all the way down. So much of the "digital market economy" is about generating demand, as opposed to answering demand.

Conservatively, I'm in favour of serving people's existing needs rather than making them anxious of FOMO something new. But I can see how demand-generation is more profitable: if you get to define what "useful" or "desirable" means, you're gold. "Competition is for losers."

But I see the solution as essentially the same: education and mental resilience. To move the needle back to personal agency, do the "malicious packet blocking" internally, rather than ex-post with technology.

I am reminded of Keynes' remark: "The secret of success in the stock market is not predicting which stocks will increase in value, but rather identifying stocks a majority of people think will increase in value."

> But what can we do to treat the cause, instead of just the symptoms?

By affecting the bottom line, increasing expenses and/or decreasing profits.

> If they stop working (or rather work less – it's a spectrum)

AdNauseam is an interesting attempt in this space - a browser plugin to automatically "click every ad to fight surveillance" (their words). By clicking everything, clicks become less valuable, at least in theory, but it has not really caught on.

> I feel the fix will be more along the lines of improving individual psychology and mental wellbeing, rather than entering the arms race of adversarial technology to block packet traffic (or whatever).

I agree with this. Ad blocking, ad clicking, packet blocking: it's all thinking too small, always trying to catch up. It will always be behind, and while useful for a niche subset of users, these kinds of technologies are more band-aids than a real solution that triggers fundamental changes to the advertising tracking industry.

What is a real, impactful solution? I don't know, but an area I have not seen explored much, considering by analogy:

Internet : Web :: Big Tech : ???

That is, the web layered on top of the Internet, as a disruptively transforming application, extracting and providing value.

Can another technology be created to build on the foundations provided by Big Tech, delivering value they provide, while avoiding their tracking/advertising downsides? I have little idea what this would look like in practice (how do you disrupt a billion dollar industry?), but if someone can crack this nut, it may change the world. Startup idea elevator pitch: disrupt Big Tech.

you have to poison the well. find a way to scramble enough of the data collected that they can trust none of it.

> The ads are valuable for a reason – they work.

I'll remind everyone that from my experience (starting a web shop and also being a target of ads) there is a lot to be said about ads. Uber's observations a few months back were no big surprise to me.

We turned off one particular network that, according to its own statistics, was involved in most of our sales. Result: too small to measure.

Same goes for my observations as a consumer: I'm fairly certain what ad I get shown is decided not by how relevant it is but by who is dumb enough to pay the most for it without measuring results.

It's interesting that the latest trend of "innovations" in the ad space makes it harder for their clients to tell where exactly the clicks come from.

I've read plenty of claims that it's for increasing lock-in on their analytics platform... But why would people selling ads want to lock their clients into a free loss-leader?

I would love to block their autonomous systems entirely this way, but I fear that for Google it would be impossible without significant impact to my life. Frustratingly, Google is also the hosting provider for a massive number of critical services.

Consider that for many households today, you cannot block Google without also blocking your kid's homework, as Google Classroom has been made a mandatory part of a large portion of schools.
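For anyone wondering what "blocking an autonomous system" boils down to: you take the prefixes the AS announces and drop anything inside them. A sketch of just the membership check; the prefixes below are reserved documentation ranges standing in for a real list, not Google's actual ranges.

```python
import ipaddress

# Hypothetical prefixes standing in for one AS's announced ranges
# (reserved documentation ranges, not real Google prefixes).
BLOCKED_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(address: str) -> bool:
    """Return True if the address falls inside any blocked prefix."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_PREFIXES)
```

The check knows nothing about which service sits behind an address, which is exactly why a school portal or a third-party app hosted in the same ranges gets dropped too.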

A somewhat more simplistic method is the hosts file. It won't stop hard-coded IPs, but it still works pretty well in my experience.
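A small sketch of generating such entries, assuming you already have a list of domains to block. The domains are placeholders, and the output would still need to be appended to the system hosts file by hand.

```python
def hosts_entries(domains, sink="0.0.0.0"):
    """Return hosts-file lines that point each domain at a non-routable sink."""
    cleaned = sorted({d.strip().lower() for d in domains if d.strip()})
    return "".join(f"{sink} {domain}\n" for domain in cleaned)


# Placeholder blocklist with a duplicate and mixed case.
blocklist = ["tracker.example", "Ads.Example", "tracker.example"]
text = hosts_entries(blocklist)
```

Hosts files have no wildcards, so every subdomain needs its own line.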


Hmm, not sure how that would work if some-merchant.com is sending logs of your browsing to FB or GOOG? You likely don't want to block the shop you're browsing (though maybe you'd leave and shop on a different site if you knew...). I don't see how uBlock, hosts-file adblocking/filtering or DNS adblocking/filtering could help, really.

Does this not also prevent you from connecting to anything hosted on Google Cloud Platform?

I would imagine Spotify access would be heavily affected by such a block, considering their sizeable GCP deployments.

I wish the ad industry (especially web) would just break and just completely fall over.

Sure, we'd have this period of time where the world would feel like it's on fire. All your "news sources" (air-quotes) would disappear and everyone would be lost for a bit.

But then, innovation would happen. And the original web would re-emerge. And all the good things that we pine about for the old days of tech would return. People would pay for good news again or maybe Jethro would create the website he's been dreaming about.

Sure the stock market would probably collapse. And we'd probably have another great depression. But I sure think the world would be a better place after all the dust settled.

It would be nice if there were an effort that pulled the public/private keys hidden in the binaries of apps like Facebook/Google and decrypted the traffic for inspection/blocking. Rewriting would be nice but everything seems to be certificate pinned now.

Can an app just use the CT logs? I'm a little out of my depth on this topic.

How does certificate pinning work on corporate networks where all of the clients have a certificate from the local root CA installed and a proxy server examines all encrypted traffic? Presumably that doesn’t break Facebook so maybe there is a loophole there.
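Conceptually, pinning is a comparison between the certificate the server presents and a fingerprint shipped inside the app. A sketch of only that comparison, with placeholder bytes standing in for real DER-encoded certificates; how any particular app behaves behind a corporate proxy is not something this shows.

```python
import hashlib
import hmac


def cert_fingerprint(der_bytes: bytes) -> str:
    """SHA-256 fingerprint of a certificate's raw bytes."""
    return hashlib.sha256(der_bytes).hexdigest()


def pin_matches(presented_der: bytes, pinned_fingerprint: str) -> bool:
    """Compare the presented certificate against the fingerprint baked into the app."""
    return hmac.compare_digest(cert_fingerprint(presented_der), pinned_fingerprint)


# Placeholder bytes, not real certificates.
real_server_cert = b"placeholder-real-server-certificate"
proxy_issued_cert = b"placeholder-certificate-minted-by-corporate-proxy"
PINNED = cert_fingerprint(real_server_cert)
```

A certificate minted by an intercepting proxy can chain to a locally trusted root and still fail this check, because the pin ignores the trust store entirely.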

I've been using a similar technique found here[1] in macOS and it's very effective! The only issue I've had is I always have to redo it after an update. Admittedly I never did much research into why or if that could be overcome... clearly a macOS thing.

Pretty cool.

[1]: https://www.perpetual-beta.org/weblog/blocking-facebook-on-o...

I have just started to learn differential privacy (https://en.wikipedia.org/wiki/Differential_privacy), and am wondering if the following might be feasible in principle: at the browser level, instead of blocking the trackers, add a certain level of noise to the submitted data. This might form a truce between the end users and the trackers. Through statistics, the trackers might still be able to learn something about the end user group as a population; at the same time, each individual user's privacy isn't breached much more than if they were completely offline. Admittedly this might be ridiculous and is just me under the Dunning–Kruger effect as a beginner in this field.
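One classic mechanism along these lines is randomized response, a simple form of local differential privacy. A minimal sketch, assuming a single yes/no attribute per user; the 30% trait rate and the `p_truth` value are arbitrary choices for illustration.

```python
import random


def randomized_response(truth: bool, rng: random.Random, p_truth: float = 0.75) -> bool:
    """Report the true value with probability p_truth, otherwise a fair coin flip."""
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(reports, p_truth: float = 0.75) -> float:
    """Invert the noise: observed = p_truth * true_rate + (1 - p_truth) * 0.5."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth


rng = random.Random(0)
population = [i < 3000 for i in range(10000)]  # 30% of simulated users have the trait
reports = [randomized_response(t, rng) for t in population]
estimate = estimate_true_rate(reports)  # close to 0.3 for a population this large
```

Any single report is deniable, since it may have been the coin flip, while the population-level rate is still recoverable. With `p_truth = 0.75` a truthful answer is reported 7 times as often as a false one, which corresponds to epsilon = ln 7.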

Work with your state government to outlaw even consensual tracking, include fines, and make fines also apply to the development staff.

The first step is a "know your developer" act to prevent software creation from being anonymous. We have regulation around who can practice medicine and practice law; apply the same regulation to software. Remove the license of developers, business analysts, tech writers and project managers who write code to support tracking. It is easy to write evil code when you are protected by behemoth corporations and their lawyers.

Code is speech.

Your proposed law sounds more draconian and dangerous to free thought than the thing it’s supposed to prevent, and can be abused to have the opposite effect of what your intentions are.

We need more general technical literacy, not a further widening of the gap between those who have and know and those who don’t.

Regulation around tracking, fine, but the consequences seem to far outweigh any benefits with the rest of your proposal.

Good idea. I wonder when they'll start using a backup ASN from a subsidiary.

Can I use this on my routers?
