Hacker News | davidfischer's comments

They absolutely do. Every sponsorship you see on a podcast, a YouTube video, or a stream is a contextual ad. Many open source sponsorships are actually a form of marketing. You could argue that search ads are pretty contextual, although there's more at work there. Every ad in a physical magazine is a contextual ad. Physical billboards take into account a lot of geographical context: the ads you see driving in LA are very different from the ones you see in the Bay Area. Ads on platforms like Amazon, Home Depot, etc. are highly contextual and based on search terms.


Founder of EthicalAds here. In my view, this is only partially true and publishers (sites that show ads) have choices here but their power is dispersed. Advertisers will run advertising as long as it works and they will pay an amount commensurate with how well it works. If a publisher chooses to run ads without tracking, whether that's a network like ours or just buyout-the-site-this-month sponsorships, they have options as long as their audience generates value for advertisers.

That said, we 100% don't land some advertisers when they learn they can't run third-party tracking or even third-party verification.


My employer, Read the Docs, has a blog post on the subject (https://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse...) of how we got pounded by these bots to the tune of thousands of dollars. To be fair, though, the AI company that hit us the hardest did end up compensating us for our bandwidth bill.

We've done a few things since then:

- We already had very generous rate limiting rules by IP (~4 hits/second sustained) but some of the crawlers used thousands of IPs. Cloudflare has a list that they update of AI crawler bots (https://developers.cloudflare.com/bots/additional-configurat...). We're using this list to block these bots and any new bots that get added to the list.

- We have more aggressive rate limiting rules by ASN for common hosting providers (e.g. AWS, GCP, Azure), which also hits a lot of these bots.

- We are considering using the AI crawler list to rate limit by user agent in addition to rate limiting by IP. This would allow well-behaved AI crawlers while blocking badly behaved ones. We aren't against crawlers generally.

- We now have alert rules that fire when we get a certain amount of traffic (~50k uncached reqs/min sustained). This is basically always some new bot cranked to the max, usually an AI crawler. We see this roughly monthly and we just ban them.

Auto-scaling made our infra good enough that we don't even notice big traffic spikes. The downside of that is that the AI crawlers were hammering us without causing anything noticeable. Being smart with rate limiting helps a lot.
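For anyone curious what "generous rate limiting by IP" looks like mechanically, here's a minimal token-bucket sketch in Python. This is an illustration, not Read the Docs' or Cloudflare's actual implementation; the rate and burst numbers are just examples in the spirit of the ~4 req/s figure above.

```python
import time
from collections import defaultdict

class TokenBucket:
    """Sustained-rate limiter: each key (e.g. an IP or ASN) gets a bucket
    that refills at `rate` tokens/second up to a maximum of `burst` tokens.
    A request is allowed only if a token is available."""

    def __init__(self, rate=4.0, burst=40):
        self.rate = rate
        self.burst = burst
        # key -> (tokens remaining, timestamp of last update)
        self.buckets = defaultdict(lambda: (float(burst), time.monotonic()))

    def allow(self, key):
        tokens, last = self.buckets[key]
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1:
            self.buckets[key] = (tokens, now)
            return False
        self.buckets[key] = (tokens - 1, now)
        return True
```

The catch the comment points out: when a crawler spreads requests across thousands of IPs, each key stays under its per-IP limit, which is why the ASN-level limits and the bot list matter.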


I'm not the poster you're responding to but I'm one of the founding team of EthicalAds. We're a small team, focused exclusively on marketing to devs, and really trying to show high-quality ads without tracking people (ads are contextually targeted).

You can get a feel for what you'll earn here[1]. Basically you earn 70% of the gross of what we charge advertisers (see advertiser pricing[2]). Keep in mind these are ad views which aren't quite the same as pageviews. They're a subset.
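As a rough sketch of how the revenue share works: the 70% figure is from the comment above, but the CPM and traffic numbers below are hypothetical, just to show the arithmetic (see the linked calculator for real figures).

```python
def publisher_earnings(ad_views, advertiser_cpm, revenue_share=0.70):
    """Estimate monthly publisher earnings: gross advertiser spend
    (CPM = cost per 1,000 ad views) times the publisher's revenue share."""
    gross = ad_views / 1000 * advertiser_cpm
    return round(gross * revenue_share, 2)

# e.g. 100k ad views/month at a hypothetical $3.00 CPM:
# publisher_earnings(100_000, 3.00) -> 210.0
```

Remember that ad views are a subset of pageviews, so actual earnings per pageview are lower than a naive pageview-based estimate.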

Ads are a straightforward path to monetization but not always the best one. If you can make a project work as SaaS or really make sponsorships work for you (this requires effort), those will definitely earn A LOT more money per pageview than ads. Ads require a lot of traffic to work well. Usually you want high tens to hundreds of thousands of pageviews per month.

As for what we look for in publishers (sites that show ads), we're usually looking for high-quality dev-focused sites or projects that don't want to just show Google ads. Per ad, publishers will earn much more with us than with Google display ads, but if you want to stick 4-5 Google ads on your site, video ads, or the like, we can't compete with that, and we don't want our ads on sites that do that. Devs hate them. My email is in my bio if you want to discuss further.

Regardless, good luck on the projects!

[1]: https://www.ethicalads.io/publishers/calculator/

[2]: https://www.ethicalads.io/advertisers/pricing/


My employer, Read the Docs, is a heavy user of Cloudflare. It's actually hard to imagine serving as much traffic as we do, as cheaply as we do, without them.

That said, for publicly hosted open source documentation, we turn down the security settings almost all the way. Security level is set to "essentially off" (that's the actual setting name), no browser integrity check, Tor-friendly (onion routing on), etc. We still have rate limits in place but they're pretty generous (~4 req/s sustained). For sites that don't require a login and don't accept inbound leads or anything like that, that's probably around the right level. Our domains where doc authors manage their docs have higher security settings.

That said, being too generous can get you into trouble so I understand why people crank up the settings and just block some legitimate traffic. See our past post where AI scrapers scraped almost 100TB (https://news.ycombinator.com/item?id=41072549).


There are a few ways first-party cookies can track you. Probably the biggest single one is Google Analytics, which by default uses only first-party cookies. Even without cookies at all, GA could track you across the web, although first-party cookies make this a little easier and "better". First-party cookies can also help trackers in other ways, like CNAME cloaking[1], which basically makes a first-party cookie function similarly to a third-party one.
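To make the CNAME cloaking idea concrete, here's an illustrative sketch. The tracker list and the resolved CNAME map are hypothetical inputs (real detection tools do live DNS lookups); the point is that a first-party-looking hostname can alias a tracker's domain.

```python
# Hypothetical denylist of known tracker domains (not a real list).
KNOWN_TRACKERS = {"tracker.example-analytics.com"}

def find_cloaked(cnames):
    """Given a mapping of first-party hostname -> resolved CNAME target,
    return the hostnames whose CNAME points at a known tracker domain."""
    return {
        host
        for host, target in cnames.items()
        if any(target == t or target.endswith("." + t) for t in KNOWN_TRACKERS)
    }

# Hypothetical DNS results for a publisher's subdomains:
resolved = {
    "metrics.publisher.com": "tracker.example-analytics.com",  # cloaked
    "cdn.publisher.com": "publisher.cdn-host.net",             # benign
}
```

Because `metrics.publisher.com` is a subdomain of the site you're visiting, cookies it sets are first-party as far as the browser is concerned, even though the traffic terminates at the tracker.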

Disclosure: I work for a small privacy focused ad company.

[1] https://webkit.org/blog/11338/cname-cloaking-and-bounce-trac...


SRI generally won't work here because the served polyfill JS (and therefore the SRI hash) depends on the user agent/headers sent by the user's browser. If the browser says it's ancient, the resulting polyfill will fill in a bunch of missing JS modules and be a lot of JS. If the browser identifies as modern, it should return nothing at all.

Edit: In summary, SRI won't work with a dynamic polyfill which is part of the point of polyfill.io. You could serve a static polyfill but that defeats some of the advantages of this service. With that said, this whole thread is about what can happen with untrusted third parties so...
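To illustrate why a pinned hash can't work here, this sketch computes an SRI value the way a build tool would (sha384, base64-encoded). The polyfill bodies are made up; the point is that two different responses can never match one pinned `integrity` attribute.

```python
import base64
import hashlib

def sri_hash(js_bytes):
    """Compute a sha384 Subresource Integrity value for a script body,
    in the "sha384-<base64 digest>" format used by the integrity attribute."""
    digest = hashlib.sha384(js_bytes).digest()
    return "sha384-" + base64.b64encode(digest).decode()

# Hypothetical responses for the same polyfill URL:
old_browser = b"/* Promise, fetch, Array.from polyfills... */"
modern_browser = b""  # modern browser: nothing to polyfill

# Different bytes -> different hashes, so one pinned hash can't cover both.
assert sri_hash(old_browser) != sri_hash(modern_browser)
```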


Oooft. I didn't realize it's one that dynamically changes its content.


So maybe it’s less that the article is selling something and more that you just don’t understand the attack surface?


It absolutely would work if the browser validates the SRI hash. The whole point is to know in advance what you expect to receive from the remote site and verify the actual bytes against the known hash.

It wouldn’t work for some ancient browser that doesn’t do SRI checks. But it’s no worse for that user than without it.


The CDN in this case is performing an additional function which is incompatible with SRI: it is dynamically rendering a custom JS script based on the requesting User Agent, so the website authors aren't able to compute and store a hash ahead of time.


I edited to make my comment more clear but polyfill.io sends dynamic polyfills based on what features the identified browser needs. Since it changes, the SRI hash would need to change so that part won't work.


Ah! I didn’t realize that. My new hot take is that sounds like a terrible idea and is effectively giving full control of the user’s browser to the polyfill site.


And this hot take happens to be completely correct (and is why many people didn't use it, in spite of others yelling that they were needlessly re-inventing the wheel).


Yeah... I've generated composite polyfills with all the pieces needed by the oldest browser I had to support; unfortunately, all downstream browsers would get them too.

Fortunately around 2019 or so, I no longer had to support any legacy (IE) browsers and pretty much everything supported at least ES2016. Was a lovely day and cut a lot of my dependencies.


They are saying that because the content of the script file is dynamic, based on the user agent and what that user agent currently supports in-browser, the integrity hash would also need to be dynamic, which isn't possible to know ahead of time.


Their point is that the result changes depending on the request. It isn't a concern about the SRI hash not getting checked; it's that you can't realistically know what to expect in advance.


Keychain Access has not been "fine". It's had multiple unaddressed data loss bugs. For example, Keychain lost all passwords from all Keychains after the Catalina update[1] and this wasn't fixed in the next 3 Catalina minor updates. Multiple users reported the issue to Apple and the response was crickets. Even if you restored the passwords, it helpfully deleted them all again. I switched to 1Password and declared Keychain Access a lost cause. I don't think I'll be giving them a second chance here.

[1] https://discussions.apple.com/thread/250722178


That does seem like one area where Apple is at fault and Microsoft does better: triaging actual bugs.

Apple has devs. Maybe some teams are short-staffed, but they fix things.

What Apple doesn't seem to have is a functional bugfix priority loop that includes customer input and provides feedback.


It starts with the horrendous thing called “Feedback Assistant”. Black hole if I ever saw one.


Depends on how we define "fine," but your own post clarifies that it was "website passwords -- but not app passwords, secure notes, certs, or keys." That's a pretty big difference compared to "all passwords" and seems like it affected a small number of people. In any case, if we're using anecdotes, I haven't had any issues with it so far and it's been decades. Given how 1Password has been getting shittier over time, I've been looking for an alternative, and I for one am going to give this a shot. You can check in with me in a few more decades and ask me if it went okay.


While I agree with you to an extent, this is not a very good way to check actual in-store prices. Most grocery stores charge a different rate for products that are delivered or even curbsided than they do if you go to the store and buy it yourself. This is true even if you go directly vs. going through an intermediary like DoorDash.

I live in coastal CA and the cheapest eggs at my local market are $3.49/dozen. Trader Joe's and Costco are closer to $2.20/dozen if you just want plain white eggs. The moment you go organic, the cheapest is Costco at about $4/dozen.

Edit: Just to be clear, at Costco you buy 2 dozen rather than 1 and I've divided the price to a single dozen.


For other stores the online price might be different from the in-store price. But I think for Safeway, the online price is the same as the in-store price (which is why the webpage says "Shopping at <address>").


I can't confirm if the price is the same or not, but their terms[1] specifically mention that the price can be different.

> Prices for Offerings you order for delivery or pickup through the Online Grocery Ordering Service may be higher than the prices for such Offerings in our physical stores.

Most outlets I've seen (e.g. Target) are the same in that they just list a higher price on the website than it costs in-store, and they're upfront about this. It takes me 20-25 minutes of in-store picking, including checkout, to shop for my weekly groceries. Even if that's done by a minimum wage worker (~$17/hr here), that's ~$6-8 of service on top of them bringing it out to the curb. In addition, eggs are usually specially packed in their own bag (frequently with a sticker labeled "eggs") when they're bought online and curbsided. It seems a bit naive to me to think that all this service would just be included/free.
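The labor math above works out like this (the $17/hr wage and 20-25 minute figures are the ones from this comment, not official numbers):

```python
def picking_cost(minutes, hourly_wage=17.00):
    """Labor cost of in-store order picking at a given hourly wage."""
    return round(minutes / 60 * hourly_wage, 2)

# 20-25 minutes at ~$17/hr:
# picking_cost(20) -> 5.67
# picking_cost(25) -> 7.08
```

That's the ~$6-8 of embedded service cost before delivery or curbside handling is even counted.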

[1] https://www.albertsonscompanies.com/policies-and-disclosures... (linked from safeway's footer)


Hmm, interesting. I'll need to check the in-store egg prices when I go next to see if they match the website.


Healthcare providers and insurers in the US are bound by HIPAA privacy rules, but data brokers (mentioned in the article) and the ads industry generally are not. For example, if she used an app in the doctor's waiting room that shared/sold location data to a data broker, they can use her location data for retargeting purposes. There have been many cases in the past where advertisers targeted users based on visiting medical or other sensitive locations.

As to how they mailed it to her and got her home address, a data broker who has location data can fairly easily determine a user's home address from that data. Many brokers and networks may also already have an association between a "pseudo-anonymous advertising ID" and real user with name and address. Not saying that location-based retargeting happened this time as the article doesn't give us enough to go off of and other types of retargeting are another possibility.

Overall, I think it's unlikely that the provider or insurer shared her data and other alternatives are more likely.

Disclosure: I work in the ads industry but on contextual targeting only. Some location-based retargeting is terrifying and will probably eventually be criminal. It's a bit of the wild west right now.


I would assume a more likely scenario (although yours is also plausible!) is that someone with cancer will almost certainly search the internet at some point for information about cancer and their treatments. This will immediately be sucked up by the pervasive surveillance economy and used to extract the maximum amount of marginal revenue attainable through any means necessary. You don't need to know they're in a doctor's waiting room; using the internet for information retrieval will specifically inform everyone interested in paying for it what you are searching for.


Yes, I agree. It's also quite likely that people who know people with cancer will search about cancer, and sadly some of them will later need to purchase cremation services. This means that statistically it's not a bad idea to target people who have searched for cancer with cremations. This seems like the most likely explanation to me.

(Edit to add a meta note: Apparently this has to be said on Hacker News because people can't distinguish between someone presenting facts and someone making a defense, but I'm not defending the practice. I think it's abhorrent. But if we can't dispassionately analyze reality to try and understand the motivations, then we've really abandoned reason and lost our way).


All the medical information sites (WebMD, Drugs.com) are filled with ad beacons.


Ya, other types of retargeting like this are also likely. The jump from visiting a website to an advertiser doing physical mailings isn't a big one (political advertising uses this a lot). Long story short, she was probably retargeted based on her own actions and probably not because the insurer or provider did anything illegal.

Edit: I don't want to sound like I'm blaming the victim here. That's not my intent. I just don't think blaming the insurer or provider is fair either. I dump the blame on the data broker/ad network and to a far lesser extent the advertiser.


I wonder what the effect would be of extending HIPAA protections to inferred information. If you have inferred something about a person that is protected by privacy laws, should that inference itself also be protected? How much of a shield should "we're not 100% sure, so it's just a very well-informed guess" be?

I have my own story about advertisers inferring personal information. Relationship status isn't protected, but the last time I went through a breakup, I was suddenly inundated with dating site ads. I don't feel like my shopping or web browsing habits changed, but they must have for the advertisers to figure it out.


> I wonder would the effect would be of extending HIPAA protections to information that you have inferred.

That would be helpful. Also, HIPAA itself isn't exactly a panacea and is full of loopholes. Having effective medical privacy laws would be even better.


I'd just like effective privacy laws in the US generally.


> There have been many cases in the past where advertisers targeted users based on visiting medical or other sensitive locations.

Yep, I used to get ads like that all the time during my cab driving days.

I would often ponder how they made any inferences from my location data, because I went to so many different places. There were definitely patterns (hauling around the same people once you learn their schedule is a big part of the job), but using them to sell me stuff seems worthless.

Now that the YouTubes of the world have taken up the adblocker fight, I still get all sorts of ads for medical stuff that has no direct relation to me. I do try to keep up with all the complicated "don't track me, you fucking stalkers" clicky buttons they like to add, so perhaps I just fall into an age group where their ad dollars shine? Dunno.


I remain convinced that something on my phone listens to incidental conversations. Yesterday my wife was asking about the difference between our (past) Plymouth Voyager and the current Chrysler Pacifica. Today I get an ad in my Google (Android) feed for the Pacifica. I haven't looked at mini-vans in decades and neither of us searched for anything related.

We had just visited the Sloan museum in Flint, MI which has an extensive Buick display so an ad for a Buick would not have been unexpected.

Coincidences like this have happened too many times to be coincidental.


I think these conspiracy theories happen because people don’t understand how easy it is to leak data and how easy it is for data collectors to gather metadata and make a conclusion. Metadata is incredibly powerful and a lot of non-data scientists don’t realize the level of sophistication that companies have in their possession.

The classic example is Target predicting your pregnancy based on specific purchase behaviors. All they have to know is a consistent identifier and your purchase history and they can predict whether you’re pregnant. There’s no need to listen in on conversations or obtain other more detailed user data.

Also, a lot of “private” services and apps really don’t promise jack shit in their privacy policy. They are probably all gathering and selling the data nearly in real time. Their privacy policies are often far more broad, vague, and permissive than their PR will tell you.

You’re with your wife, your devices are often on the same networks, so it’s likely that advertisers know you know each other when you browse. Despite what your wife says, you really don’t know if she interacted with a Pacifica ad or piece of sponsored content. Even if she didn’t search for a Pacifica, it doesn’t have to be specifically something related to minivans, because that information that you are potentially more interested in minivans can come from other metadata.

TikTok manages to figure out your perception of a particular video based on how your fingers are moving on the screen, how long you’re spending on a video, what’s happening when you’re lingering or swiping, etc. You never really have to tell TikTok directly what things you like.

The game of 20 questions works on a similar concept. You can start knowing absolutely nothing and ask a very small number of binary questions to find the specific item the person has in mind, using only metadata.
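The 20-questions intuition is just binary search over a candidate space: each yes/no answer can halve what remains, so n questions distinguish up to 2**n items. A quick sketch:

```python
import math

def questions_needed(n_items):
    """Minimum number of yes/no questions that can always single out
    one item from n_items candidates (each answer halves the space)."""
    return math.ceil(math.log2(n_items))

# 20 questions cover 2**20 = 1,048,576 items, so
# questions_needed(1_000_000) -> 20
```

This is the same reason a handful of weak behavioral signals, combined, can pin down something as specific as a pregnancy or a minivan purchase.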


> Cox Media Group recently gave advertisers an overview of a new technology it calls Active Listening. CMG claimed that its technology can use microphone data from devices like smartphones and tablets, specifically analyzing "pre-purchase conversations." The since-deleted blog post also mentions using AI to determine when the phrases heard from smart devices could be "relevant" to advertisers.

https://www.businessinsider.com/cox-active-listening-claims-...


From the archived page:

> We know what you're thinking. Is this even legal? The short answer is: yes. It is legal for phones and devices to listen to you. When a new app download or update prompts consumers with a multi-page terms of use agreement somewhere in the fine print, Active Listening is often included.

This means you have to give permissions and ignore the orange dot on your screen for this technology to work.


Don't most people just hit Accept blindly?


Probably, but that’s why new versions of iOS have a recording indicator anytime the microphone or camera is active.

And, you know, at some point consent is consent. It’s a giant dialog box that explains everything. Some people might even want an app that records their activity and gives them compensation for doing so (e.g., Microsoft/Bing Rewards). Who am I to tell that person they don’t want that?


> You’re with your wife, your devices are often on the same networks,

I'm on Fi, she's on Verizon. But I don't doubt that data miners know we're together due to consistent proximity. Neither she nor I did any Pacifica related searches.


Sorry, this is nonsense. The healthcare providers do it directly.

99.9% of people who have a healthcare provider at a minimum use its website: communicating with their doctor, prescriptions, etc.

All these websites use adtech stuff, and the apps are even worse.

Look, don't take my word for it; just look at Kaiser Permanente's website:

https://www.kp.org

It references google.com directly.

(also try www.dmv.ca.gov, same thing, same cookies)

