EasyList is in trouble and so are many ad blockers (adguard.com)
661 points by shscs911 on Oct 19, 2022 | 406 comments



A great opportunity right now for CloudFlare to win some goodwill and PR by helping out EasyList for free.

But what about simply enabling a firewall and showing a captcha or similar when the origin IP is from India and requesting that URL, until the situation is under control? I did that recently with CloudFlare's free plan in a similar situation and it worked perfectly (on a much smaller scale, of course).
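
For anyone wanting to replicate that setup: in Cloudflare's firewall rules you would pair an expression along these lines with a challenge action. This is a sketch from memory; the field names may have changed and the path is illustrative:

    (ip.geoip.country eq "IN") and (http.request.uri.path contains "easylist.txt")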


The apps behind these requests cannot render the captcha, as the fetch happens in the background.

However, what you can do is match the user agents and return a global/catch-all adblocking rule that blocks all the content of all pages (by hiding the body element; see the sketch at the end of this comment).

The app developers will notice the issue very fast (because users will report the problem), and mirroring the lists or adding a cache will immediately become their priority.

Bonus: I think some browsers and extensions can execute JavaScript in adblocking rules; https://help.eyeo.com/adblockplus/snippet-filters-tutorial

(which is essentially re-using a gigantic XSS in order to notify the user)
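
A minimal server-side sketch of the user-agent idea above, in Python. The offending user-agent string, port, and file path are made-up placeholders, and ##body is a generic element-hiding rule that blanks every page:

    # Sketch: serve a catch-all "hide everything" list to the misbehaving
    # client and the real list to everyone else. BAD_UA is a placeholder.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    BAD_UA = "SomeMisbehavingBrowser"
    CATCH_ALL = (b"! Your browser is flooding this server; please add caching.\n"
                 b"##body\n")  # generic element-hiding rule: blanks every page

    class ListHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            ua = self.headers.get("User-Agent", "")
            body = CATCH_ALL if BAD_UA in ua else open("easylist.txt", "rb").read()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("", 8080), ListHandler).serve_forever()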


Generally, I like the idea with the user agents filtering and “block everything” rule. No need for geoblocking. Insert a comment about why this is happening and ask for it to be changed.

However, as we’re living in the real world and the authors of the respective browsers strike me as lazy or uninterested, I also bet all that would change is the user agent.


"User agent" is a synonym for "browser". When you say "user agent" here, what you really mean is the contents of the header that identifies the user agent, i.e., browser. Calling it that is a little bit like referring to Chrome's developer tools as "Inspect Element" (based on the mistake that that's supposed to be its name, rather than recognizing that the label is just a simple, descriptive verb/action).


I think the idea was to block users without technically consuming bandwidth. A captcha is equivalent to blocking.


Blocking all page content to knowingly cause unintended behavior… I wonder if this can be considered criminal.

I read that poisoning your own lunch to catch a workplace fridge thief could be considered assault.

EDIT: here’s what I read. https://law.stackexchange.com/questions/966/can-one-be-liabl...

Imagine, say, you update the list to block all URLs, and it impacts some municipal government worker’s ability to update some emergency alert service and causes hundreds of people to be permanently injured.


I don't think so. Google often knowingly and intentionally breaks apps (through API deprecation) because it's more convenient for them or because maintenance is costly. Nothing criminal there.

Same for EasyList: if they decide that a quota of 100,000 requests per IP+UA per day is the maximum, that's their choice. They owe nothing to the consumers of the lists.

That being said, EasyList actually benefits from being distributed in many apps; it is really valuable to influence/control adblocking lists, so the more flexible they are with the browser developers, the better (I guess).


I think you misunderstood what the parent was referring to. The idea was to poison the block list so that any browser matching their criteria (a user agent belonging to the DDoSing browser) would block everything.


If an application can't handle failed web requests, that application is already broken. Web requests can and will fail at any time.


No one is forcing anyone to use this tool; they have every right to send an alert indicating the product a user is using has been abusing their service.

Very much in the same way that image hosts used to change an image for those hotlinking directly to images in the early days of the net.


I appreciated the parent's comment because it points in an interesting direction. No one is forcing anyone to use this tool; no one is forcing anyone to steal their food. In terms of individuals acting in line with expectations, the individual poisoning their own food as a trap shouldn't inconvenience anyone if everyone's being civilized.

Providing a service (which you expect others to consume) and then not only deciding to refrain from providing it, but "poisoning" the output, is an interesting move. We don't consider them equivalent, but in a case where this application was providing some essential service that is not easily replaced, and physical harm was the result, how do we consider it?


I don't think you can remotely compare the two, and no physical harm is actually done. And if an extension stops working because it depends on a list, the list can be removed, the extension can be disabled, a different browser can be used. Ad blocking isn't an essential service, it can be easily replaced, and it isn't being provided as anything but a voluntary service with no uptime or availability assurances.

The made-up scenario of this preventing some critical task from being accomplished is a stretch at best.


True, but I bet 99% of CloudFlare's income comes from companies that wish to see EasyList die in a fire. I'm pretty sure this factors into their strict enforcement of the 'rules'. I mean, this is something between GitHub and CloudFlare, right? And GitHub sure hosts a ton of other .txt files and other stuff that's not 'web content'. They don't enforce it so strictly with other sites.

Still, I'm sure the 'community' can figure out how to keep something like this online. I'd be happy to pony up some cash for decent hosting, and I'm sure many others would be too. If that doesn't work out, something like IPFS, a torrent, or whatever.


Correct. And let's not forget that the company which owns them would also like to see EasyList die in a fire.


Looks like it's fast to download now.


I am following up internally. Looks like there's a combination of this data not being cached and our systems thinking a DDoS was happening (which it sort of was). Getting the full story now.



I'm glad they sorted it out, but I wish there was a proper support route other than "create a sufficient media storm so that an employee tweets the CEO"


I can't understand their argument that a text file 'isn't a web content'; it seems like a bullshit excuse.


This doesn’t sound like bullshit to me. Serving a static text file that is primarily used by applications is not in line with their terms of service.

Cloudflare provides a significant service to the free and open web by subsidizing the hosting costs of static content for websites. They give that away for free under what appears to be reasonable terms.


But it is web content, really. A txt file renders fine in every browser.

Many websites also push text through HTML as part of AJAXy stuff. If they actually enforced this for all sites, their service would no longer be usable.


Movies will render in browsers just fine too; that doesn't mean Cloudflare will allow you to cache them.



Sadly, despite my arguments in the same direction, Cloudflare refuses to host a base64 encoding of the new Flubber.


I can think of few things more static than a .txt file.


You're missing the point. The service Cloudflare donates isn't free. That is the whole point of EasyList's post. There are plenty of comments on this submission doing back-of-napkin math to find a reasonable monthly cost for hosting that text file. If you want to donate that bandwidth, go for it.

But the comments here about Cloudflare's ToS read a lot like folks feeling entitled to getting bandwidth for free. Cloudflare is providing a very specific service for free, and it does a lot of good.

It would be great if Cloudflare decided to donate. But I’d re-evaluate your stance if you’re feeling entitled to their resources.


> You're missing the point.

You're missing the point. If Cloudflare's issue is with bandwidth, then they should say so and leave it at that, not conjure up this pathetic excuse about .txt files somehow not being "web content". Does wrapping that data in <html><body><pre> </pre></body></html> magically fix the bandwidth issues?


Even a plain text file without any tags is a valid HTML5 file. You don't need any tags; <html> and <body> are implied.

All you'd need is a <pre> tag or a <style> somewhere if you wanted it rendered as something other than one large paragraph.

I guess you may need a <!DOCTYPE html>.


Just making a file valid HTML doesn't make it "web content". This file is being fetched by an application, not being viewed by a user.

I'm not sure this is the most reasonable rule, but there are definitely some beneficial aspects to it. For example, the load on human-viewed content is limited by how often people want to view it, not how often their browser wants to redownload it.


> Just making a file valid HTML doesn't make it "web content".

By Cloudflare's rationale it does.

> For example the load on human-viewed content is limited by how often people want to view it. Not how often their browser wants to redownload it.

Bandwidth is bandwidth. If 100,000,000 humans want to download a 10KB text/html page v. 100,000,000 programs wanting to download a 10KB text/plain file, both within the same time period, then that's going to be the same degree of load on Cloudflare's end.


Actually you're missing the point. It doesn't seem like many people are condemning Cloudflare for not serving a bandwidth-heavy file for free (FTA: "CloudFlare does not allow non-enterprise users use that much traffic").

Rather what's being condemned is this nonsense customer service characterization of a text file as somehow not "web content". Easylist.txt is a data file that could just as easily be in JSON (and be larger). Furthermore, as it stands easylist.txt actually looks like it's a valid text/html file, as browsers generally don't insist on <html>/<body> tags. So from both directions it seems like the customer service drone has thrown out this nonsense just to short circuit having to do their job.


I like the HN approach of taking a charitable interpretation of their message.

Clearly EasyList lived on their free tier for a long time without interruption. Only when they used excessive bandwidth did ToS enforcement happen. When they reached out for support, the support agent rightly pointed out that this isn't a website file.

Reading the ToS, the support agent's message appears to be correct. Text files are fine (as is pretty much any format) as long as they aren't the main focus of the HTTP server Cloudflare is fronting. robots.txt would be fine; turning the list into XML or HTML would not be. In this case, the text file isn't there to support the web content of EasyList - it's distributing a text file to applications.

The agent could have added additional context but their message is valid.


You're still conflating "free tier" with "not a web file". According to the article, they are separate issues and Cloudflare wouldn't be willing to host easylist.txt even on a paid plan.

Meanwhile -

1. easylist.txt is used by every single web page I visit. So the overall purpose argument fails.

2. Web pages commonly use non-directly-renderable data files in formats like JSON or XML, so the file purpose argument fails.

3. Text files are and continue to be one of the major formats displayed by browsers. So the file type argument fails.

4. The size of the file is in line with other files cached by Cloudflare. So that argument fails.

If the Cloudflare support rep said "we just don't feel like doing business with you", that would be a different thing. But instead they're throwing out some arbitrarily-framed unfalsifiable reason as if it's a logical justification. And no, customer service drones and corporate policies don't deserve a fundamental benefit of the doubt, per contra proferentem (ambiguous terms should be construed against the drafter). It's impossible to know what they actually mean here besides "we don't like it", and that is the problem.


But it's not JSON or HTML. And it's not meant for browsers. It's clearly a dataset as a text file and not meant as "web(page) content". What's nonsense about that when it's completely accurate?


How is it any different from a json file used by a web app (or mobile app, or desktop app)?

And there is no reason a web app couldn't read data from a txt file instead of a json file.


It's not for displaying a webpage but to power a separate application. Just because you can serve any kind of file over HTTP doesn't mean it's for serving a website. There's a reason Cloudflare doesn't allow large video files either - even though that probably counts even more as "web content".

This is a very straightforward interpretation of the terms and it's strange to see such pushback based on a pedantic technicality when it's clear what the file is being used for.

In fact, if this was the opposite situation and some automated rule was involved in isolating this file, I expect the same people would then want human intervention to clear up the difference based on the context.


> There's a reason Cloudflare doesn't allow large video files either - even though that probably counts even more as "web content".

And that's just as nonsensical. Bytes are bytes; the rationale should be based on bandwidth, not on arbitrary micromanagement of the format of the data consuming that bandwidth. If I encode that video in a giant self-contained blob of JavaScript that feeds the pixels into a canvas or something similarly ridiculous, does that magically fix the bandwidth issues?


No. It is about excessive resource/bandwidth usage. The customer service reply isn't great, but the context itself is clear: this file is clearly not for displaying any part of the web presence of the EasyList site.

Again, this is that "pedantic technicality". Why such a fuss when the actual issue is straightforward, and clearly understood and reiterated by the EasyList team themselves in the post?


> It is about excessive resources/bandwidth usage.

Then the policy would focus on that rather than micromanage the format of the data using those resources/bandwidth. Again: bytes are bytes.

> why such a fuss

Because a policy as nonsensical as "no non-HTML files allowed" artificially limits the usefulness of CloudFlare for precisely zero legitimate reason. I ask again: does wrapping a video in a blob of JavaScript fix the bandwidth issues associated with hosting videos? If I have a 10MB MP3 downloaded 1,000 times v. a 1MB HTML/CSS/JS static site downloaded 10,000 times, what difference does it make?


The difference is in the amount of cache you need. In the small-file case you save 10GB of traffic per 1MB of cache; in the big-file case just 1GB per 1MB, and the big file is going to evict many small files (even if the user only listens to the first 10s). It's no huge difference for a single user/site, but across all users this quickly means needing a multiple of the current cache, which doesn't come for free.
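
Spelling that out with the numbers from the thought experiment above:

    10 MB MP3,  1,000 downloads: 10 GB served from 10 MB of cache ->  1 GB saved per MB cached
    1 MB site, 10,000 downloads: 10 GB served from  1 MB of cache -> 10 GB saved per MB cached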

Also, CF has a product to sell; the free tier is just the demo version. I think at the end of the day the policy is about stopping everyone on HN from using CF as their low-cost DIY video and/or music streaming or download platform.

And I can totally see them reversing the decision and sponsoring that project (it's probably something a support engineer has no power to decide).


> The difference is in the amount of cache you need.

Okay, now run the same thought experiment with 10,000 downloads of a 1MB MP3 v. 10,000 downloads of a 1MB HTML/CSS/JS site. What difference then?


So have a max file size.

Also, the same ToS applies if you use the paid product. Unless you buy an additional addon for non-web traffic.


The issue is about bandwidth and resources. The policy is generic to provide reasonable allowances but stop usage that's clearly outside the limits. That's how "unlimited but with oversight" plans work.


So the solution is to wrap the list in HTML and link it from the webpage?

> why such a fuss

Mockery isn't a fuss. And why? Because we can all picture having to deal with just this sort of idiot and their pet rule.


> It's not for displaying a webpage

The majority of legitimate traffic to these txt files is from browsers (extensions) for the express purpose of displaying websites to a user's specifications (without ads, in this case).

A reasonably low estimate is that 20%-30% of global internet users are behind some kind of adblocker, almost all of which subscribe to the EasyList lists by default. So this txt file potentially shapes the way a BILLION+ internet users see and interact with nearly every single website.

Cloudflare's claim that this isn't web content is full-on reality warping. It only makes any kind of sense when masked under layers of abstractions and lawyer speak.

---

None of this should even matter though; someone seeing the big picture at CF should have done the napkin math and realized that the EasyList bandwidth pays for itself.

Ignoring the soft cushion of the huge amount of globally distributed caching of these files: if EasyList suddenly stopped working for a week, global bandwidth usage could see a spike, a pretty little chunk of which CF may be on the hook to absorb at no cost.


But the support email didn't say it was a violation of the ToS because it is primarily used by non-webpage applications (the ToS does say APIs may be allowed, but doesn't have specific details on when, AFAICT); it said it is a violation because it is a txt file, and that a txt file couldn't be part of a website. In fact, robots.txt and security.txt are standard and common txt files served as part of websites. And robots.txt is also primarily consumed by web crawlers, not used for actually rendering web pages. Does that mean robots.txt violates the ToS too?


> And it's not meant for browsers.

It literally is - specifically, for extensions thereof.


oh it's for browsers all right


A text file is a website file, and that is what annoyed me in that support reply. The web is not just HTML, CSS and JS.

But in order to make the support engineer do his job I need to add a .html extension to it?? Would that be considered a website file?


Interestingly, one could host this on a WWW frontend for Git. Then you'd only need to download (say, a daily) diff. Why download the entire list when you can match a checksum?
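
A client-side sketch of the "match checksum" idea using plain HTTP validators (the URL is illustrative): a Git frontend could additionally serve diffs, but even a conditional GET avoids re-downloading an unchanged list:

    # Sketch: skip the download entirely when the list hasn't changed,
    # using the ETag validator from the previous response.
    import urllib.request, urllib.error

    URL = "https://example.com/easylist.txt"  # illustrative

    def fetch_if_changed(etag=None):
        req = urllib.request.Request(URL)
        if etag:
            req.add_header("If-None-Match", etag)
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.read(), resp.headers.get("ETag")
        except urllib.error.HTTPError as e:
            if e.code == 304:  # not modified: reuse the cached copy
                return None, etag
            raise

    body, etag = fetch_if_changed()      # first run: full download
    body, etag = fetch_if_changed(etag)  # later runs: 304 if unchanged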


So if the text was embedded in a static webpage that the client had to parse locally, that'd be okay?



Does not inspire confidence in Cloudflare, that’s for sure.


I think CloudFlare pretty explicitly do not want people to be confident that they can serve 2 petabytes a month of API data on the free tier


That’s part of the problem though, isn’t it?

Because they certainly want to serve some huge amount of traffic for free while they attempt to become the next abusive monopoly platform.

They’re trying to have their cake and eat it too.


Maybe if they created a web page for EasyList and then hosted that + the lists directly on CF Pages, that would be considered web content?


It probably means that their DDoS protection needs to use JS to get some trust signals


Web content is for consumption by people.


Just tack on a .html file extension and add a <html> tag at top and bottom…problem solved


So does this mean any site with a security.txt file is violating cloudflares ToS?


How about robots.txt, since security.txt is looked at by humans, but robots.txt is almost exclusively looked at by non web browser clients.


The Pwned Passwords project by Troy Hunt is served by the CloudFlare cache. I don't know the scale of Pwned Passwords' bandwidth usage, but CloudFlare could definitely make a similar arrangement here too.


This is a bit different though. You are basically taking away a main revenue stream from websites, their main clients. That sounds like bad optics for them.


I understand, but my reply was with reference to the parent comment:

> A great opportunity right now for CloudFlare to win some goodwill and PR by helping out EasyList for free right now.


Wouldn't their R2 service tick all the boxes for this one?

https://developers.cloudflare.com/r2/platform/pricing/


Sounds like they'd probably be in for at least $500/mo on this, which doesn't seem like a lot if you're serving the amount of data EasyList is, but is a lot if your previous hosting costs were "free".


Most requests will come from background tasks or cron jobs. A captcha wouldn't be possible in those situations, as it would never be seen by anyone.


I’m not sure a captcha would help though. These aren’t intentional attack requests, they’re “legitimate” requests by a clueless developer’s app that happened to get popular.

They just need to serve either an empty response or an intentionally broken rule to break the misbehaving browser and force its developers to fix it.


Yes there is of course that as well!


> EasyList is hosted on Github and proxied with CloudFlare. Unfortunately, CloudFlare does not allow non-enterprise users use that much traffic, and now all requests to the EasyList file are getting throttled.

> EasyList tried to reach out to CloudFlare support, but the latter said they could not help. Moreover, serving EasyList actually may violate the CloudFlare ToS.

Seeing the comments from Cloudflare here, looks like the HN machine has yet again worked its magic to get appropriate attention!


A captcha for all 600 million internet users seems like overkill. Maybe a smaller subnet range.


That would break everyone in India not using one of those broken browsers.


They are already serving access denied replies, so I assume they can identify the browsers via user agent or similar?

If so, returning a bogus file that blocks everything, with a comment in that list asking the developers to use caching or to mirror the file, should be fine.

I wonder if those browsers honor the list when fetching the update, though. It would be awesome if you could just add the EasyList URL itself to the list and lock out further requests right on the device.


Browser developers can choose to fake user agents. Brave uses a generic Chrome user agent, so it cannot be differentiated from regular Chrome.


Everyone in the world is impacted if the site goes down under load. Changing that to everyone in a particular country (perhaps with a given user agent if the free plan allows expressions) would still be an improvement even if other work is needed.


If I recall correctly, there was an image on Wikipedia that was getting billions of downloads a day or something, all from India, because some smartphone had made it a default "hello" image and hotlinked it.

Unfortunately, I can't find a reference to it anymore.



Not that you’d do it, but the temptation there is always to repoint your real application to a different URL and change the original image to something subtly NSFW.


I was debugging a similar issue where a small marketplace run by a friend was being scrapped and the listings were being used to make a competing marketplace look more active than it actually was.

The thing is, they didn't host the scrapped images themselves, they just hot-linked everything.

So through a little nginx config, we turned their entire homepage to an ad for my friend's platform :)


I assume you mean scraped, not scrapped.


Nope, meant scrapped


In case anyone is inspired to do related things, I made a mistake once (troubling and embarrassing), which I'll mention in case it helps someone else avoid my mistake...

In earlier days of the Web, someone appeared to have hotlinked a photo from a page of mine, as their avatar/signature in some Web forum for another country, and it was eating up way too much bandwidth for my little site.

I handled this in an annoyed and ill-informed way that I thought was good-natured, and years later realized it was potentially harmful. I'd changed the URL to serve a new version of the image, onto which I'd overlaid text with progressive political slogans relevant to their country. (I thought I was making a statement to the person about the political issues, and that it would be just a small joke for them before they changed their avatar/signature to stop hotlinking my bandwidth.) Years later, once I had a bit more understanding of the world, I realized that was very ignorant and cavalier of me, and might've caused serious government or social trouble for the person.

Sensitized by my earlier mistake, I could imagine ways that a subtly NSFW image could cause problems, especially in the workplace, and in some other cultures/countries.


Yeah, you could get someone gulag'd pretty easily if you wanted to and they were in the right location.

Subtle things like flipping the image upside down or reversing the colors or other "not quite harmful but quite annoying" responses are probably better, or just serve a 1x1 pixel image of nothing.


Many years ago, back when eBay didn’t even have their own image hosting, I found someone hotlinking to the images from one of my completed auctions for their sale (of an identical product). I ended up swapping the images for ones from urinalpoop.com (seems to no longer exist, but at the time it featured pictures of exactly what you’d imagine by the URL). I ended up getting an angry message from the seller accusing me of “hacking” their auctions.


The angry emails were always the most amusing - the “fbi has seized this domain” image is a fun one to use.


I still have a 5k pixel square blank white gif on my site for times like that (~4kB) that I sub in for anything that gets requested too often, or from particular places.

I was getting hotlinked from controversial sites a lot at one stage, and the common forum software they used didn't force image sizes. So a 5k pixel wide image pushed most of the content off the screen thanks to a centred element :)


I remember, from a long time ago, something about an image that was corrupted and referenced itself internally, so you could crash applications through out-of-memory issues even though the image was only a couple of kilobytes. I might have to find it again to serve to hotlinkers!


Some Japanese porn makers avoid getting pirated in China by placing politically sensitive content in the backgrounds


Wasn't there news about police officers playing music so that videos of them would trigger automated copyright/DMCA takedowns?

https://www.vice.com/en/article/bvxb94/is-this-beverly-hills...


In reality it would only demonetize the video for the uploader, and they wouldn't get paid.


There are many songs completely banned from YouTube, like various Jimi Hendrix songs.


That sounds hilarious! Do you have a link to any articles about this practice?



Widespread in the sense that social media users have done it for a long time, and Chinese users sometimes counteract it by rewriting those into pro-regime phrases, but it's not considered safe for commercial entities to exploit. That one is not a professionally produced film.

1: https://news-infoseek-co-jp.translate.goog/article/president... (og: https://news.infoseek.co.jp/article/president_61325/ )

2: https://i.imgur.com/5hjqu3L.jpg (label on bottle and window sign)


Thanks! That’s such a clever idea


My mind must be in a dark place, because once you mentioned politics I thought of how, just sitting at home, I could easily come up with some kind of image that could literally imprison or kill someone from thousands of miles away, without even getting up from the couch. I think I spent most of my internet youth lusting for such power.


A startup I used to work for had a horror story from before I started, where a small .png file had been accidentally hotlinked from a third party server. The png showed up on a significant % of users' custom homepages (think myspace, etc). At some point the person operating the server decided that instead of emailing someone or blocking the requests, they'd serve goatse up to a bunch of teenagers and housemoms. Mildly hilarious depending on your perspective, I guess?


This once happened with a particular South Korean news website that shamelessly stole and hotlinked a JavaScript file from a third-party website. The domain owner responded by replacing the file, and the website in question showed a warning message and was tilted [1] for a while.

[1] https://twitter.com/dohoons/status/880347968800411648


Or something less malicious, like "Donate to Wikipedia", or some other organization.


> Or something less malicious, like "Donate to Wikipedia", or some other organization.

https://en.wikipedia.org/wiki/Censorship_of_Wikipedia:

“Wikipedia has been blocked in China since 23 April 2019”

⇒ putting ads for Wikipedia on sites likely isn’t safe everywhere.

I think it will be very hard to find “some other organization” that is universally ‘approved’ everywhere.


Been there. Done that. Someone had used our image in their phpBB signature. The hits slowed quite quickly.


Wikipedia is something anyone can edit. It wouldn't be the first time someone did something like that.


...or a nice steak.


One could take some inspiration and simply rotate the image(s) - like in the case of wifi leeches:

https://www.ex-parrot.com/pete/upside-down-ternet.html


Oh if something /ever/ begged to be goatse'd


You do realize it will be displayed to children?


The discussion on wikimedia phabricator: https://phabricator.wikimedia.org/T273741


I worked on an ad-blocker a few months ago. I made the decision to have the filter-list files hosted on our own domain and CDN (similar to what Adguard does with their filters.adtidy.org).

This was done for 2 reasons:

1- Avoid scenarios like this, where you ship code (an extension in this case) that is hard to update and then make that code depend on external resources outside of your control.

2- Avoid leaking our users' IP addresses to each random hosting provider.

So the solution was simple: run a cron job once a day, then host the files ourselves. Pretty happy with that decision now.
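
A minimal sketch of that daily mirror job, assuming a cron entry like "0 4 * * * python3 mirror_lists.py" (URLs and paths are illustrative):

    # mirror_lists.py - sketch: fetch upstream filter lists once a day and
    # serve the copies from our own origin. Names and paths are illustrative.
    import os, tempfile, urllib.request

    LISTS = {
        "easylist.txt": "https://example.org/easylist.txt",
    }
    DOCROOT = "/var/www/filters"

    for name, url in LISTS.items():
        data = urllib.request.urlopen(url, timeout=60).read()
        # write to a temp file and rename so clients never see a partial file
        fd, tmp = tempfile.mkstemp(dir=DOCROOT)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, os.path.join(DOCROOT, name))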


Except neither of those would help in this case. They're already using their own domain name, and it's unclear how they would even build their own CDN at that scale of bandwidth - AdGuard said they're still pushing 100 TB of access-denied pages a month in their similar case. That is a LOT of bandwidth just for access denied messages.


Their point isn't that EasyList could have done anything differently, their point is that they are glad that they didn't decide to rely on others' infrastructure for their own ad blocker, because that makes them resilient against the fallout from this and similar.


Except it wouldn’t make them resilient since, as I pointed out, neither of the things they did would be of any help at all to Easylist in this situation.

It’s great that they’re happy with their choices, but the choices would, in this same situation, likely saddle them with a crippled infrastructure and/or some insane bandwidth bills for suddenly pushing 100 extra TB/m.


EasyList got here because they want all (respectful) apps to be able to use their list. They invited traffic, the problem is only occurring because this unknown browser violates the implicit "as long as you're considerate" rule.

OP, in contrast, wrote their own ad blocker targeting their own servers. They're in control of their ad blocker code and can write it to be respectful of their servers. They're not hosting the lists with the intent of allowing other people to use it, and they're unlikely to attract lazy app developers because the endpoints are (presumably) not listed publicly on the internet to anyone who wants an easy ad blocker list.


It was never suggested that what the original commenter was doing would help EasyList.


I didn’t say the OP did, though it was implied. I was responding to a comment which did… context, my friend.

Edit: so if OP wasn’t implying that their approach was better, what was the point of posting it? Wow, obtuse much?


It was not implied, you added that implication yourself and started responding to things that were not said, which is why the other commenter who replied to you was also confused, my friend.

Edit:

OP was implying their approach was better for themselves than relying on third party servers. It’s hardly obtuse, it’s a related discussion from someone who would otherwise be impacted by the throttling.


Actually if the developers of the web browsers in India did the same thing, easylist would have it easy...


100 TB/month is only like 40 MB/s. I can find VPSes that advertise 200 MB/s dedicated for not much (but still way more than “free”).


This sounds like a similar issue to what Linux distros used to have - I'm not sure how many still do this - which they solved with mirror sites.

On download.easylist.com, have it shuffle and send a redirect to a mirror to download the list. I wonder if universities are still offering these small amounts of space for open source projects?
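
A sketch of the shuffle-and-redirect idea (the mirror URLs are placeholders):

    # Sketch: answer every download with a 302 to a randomly chosen mirror,
    # Linux-distro style. Mirror URLs are placeholders.
    import random
    from http.server import BaseHTTPRequestHandler, HTTPServer

    MIRRORS = [
        "https://mirror1.example.edu/easylist.txt",
        "https://mirror2.example.net/easylist.txt",
    ]

    class RedirectHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(302)
            self.send_header("Location", random.choice(MIRRORS))
            self.end_headers()

    HTTPServer(("", 8080), RedirectHandler).serve_forever()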


Access Denied is an HTTP status code, not a page that you serve. 100 TB per month of status codes suggests something like 30 trillion requests per month. Is that possible?


Just because it's a status code doesn't mean it has no content. Response headers and some basic text?


Point taken, thanks.


Make it a web request so they can get a user agent, and block the impacting user agents.


Since they added "Access denied" for misbehaving browsers, could they instead serve them some sort of bad response that will "surface" the issue to the users? Depending on what would work better and cost less: (1) a small list that blocks major legitimate sites - whoops, the browser is unusable, so users complain to the developer to fix the issue, or abandon it; (2) "hang" the request if the browser loads the list synchronously - blocking the UI thread is a hallmark of a bad developer, so they might well be; (3) stream /dev/zero - might be expensive; maybe serve a compressed zip bomb, if the HTTP spec allows it and/or browsers will process it?


Too much work. Just blackhole all requests originating from India in the firewall, as a start.


Right, block roughly 10% of the world. Great idea!


I'm confused about the ToS comment by Cloudflare. The txt file is on a website, so isn't it web content?

So robots.txt is not allowed to be cached/proxied by Cloudflare? That would be a weird regulation. And I bet everyone violates the Cloudflare ToS then.


It's from this tos page: https://www.cloudflare.com/terms/

2.8 Limitation on Serving Non-HTML Content

...Use of the Services for serving video or a disproportionate percentage of pictures, audio files, or other non-HTML content is prohibited, unless purchased separately...

A huge text/plain artifact, requested often, would seem to fall into that category of "disproportionate percentage" compared to text/html served.


This limitation apparently doesn't apply to R2 / Workers [0].

Maybe EasyList could host the lists there? That's what we do [1] (and the dashboards show 400TB+ per month [2], likely inflated by the traffic between Workers and the Cloudflare Cache).

[0] https://news.ycombinator.com/item?id=20791660

[1] https://news.ycombinator.com/item?id=30034547

[2] https://nitter.net/rethinkdns/status/1546232186554417152


Cloudflare can decide whom they want to do business with. But a plain text file is, in my opinion, sort of HTML. At least it is not "non-HTML" content. A .pdf file would be non-HTML content.

What's also important to note is that the customer here is the one being abused, not the one abusing the service. That should be taken into consideration when deciding if someone is breaking the ToS.


I'd agree that's weird. Seems like if it were simply renamed to .html with no content changes, then it would be okay.

> What else is important to note that the client is being abused and not the client abusing the service. That should be taken into consideration, when deciding if someone is breaking the ToS.

My understanding has this as moot. The issue from Cloudflare's perspective is only that the content is non-HTML and doesn't have anything to do with the rate of traffic (the abuse).


> (i) serving web pages as viewed through a web browser or other functionally equivalent applications, including rendering Hypertext Markup Language (HTML) or other functional equivalents, and (ii) serving web APIs subject to the restrictions set forth in this Section 2.8.

The key is "as viewed through a web browser", imo. This is not really an API and it's not a webpage; it's a data file and would fall under R2 or similar services.


Why do people keep talking like you can't just navigate to a txt file in your browser and have it served as any old web content? That is something I have actually done, many years ago, to search for a domain in these types of lists.

Cloudflare is balancing on a razor with this ToS technicality.


The TOS aren’t referring to content-type headers, magic bytes, TCP headers, browser support of file formats, or any technical implementation.

To oversimplify, they’re saying Cloudflare’s service is to be used for serving websites to browsers.

Serving a static text file that is primarily used by applications is not in line with their terms of service.

Cloudflare provides a significant service to the free and open web by subsidizing the hosting costs of static content for websites. They give that away for free under what appears to be reasonable terms. I’m not sure why you’re trying to “gotcha” through their ToS.

It would be great if Cloudflare would donate resources to EasyList - it would do a lot to help the free and open internet by giving users more power over what gets delivered to their browser. But call that what it is: a donation.


> I’m not sure why you’re trying to “gotcha” through their ToS.

People are doing the opposite: pointing out the hole and asking them to make a better rule. Surely they don't just want the list merely converted into HTML.

> They give that away for free [...]

So they should specify things that influence cost, such as total bytes served, number of files, etc. Currently all you can do is bypass the rule, because you don't know how to cooperate.


It's lawyer speak, but the meaning is clear: "this Cloudflare service is for webpages in a browser, not automated data downloads and distribution".


I see, that makes the position more understandable. I guess the same rule would (should) apply if they did indeed simply change the extension.


> Seems like if it were simply renamed to .html with no content changes, then it would be okay.

Imagine you do that and I DDoS the URL. CF will then mitigate this DDoS by, in part, replacing your html with their Browser Integrity Check html.

If you're serving 'web pages and websites' everything continues to work. What would happen if this list suddenly became an actual webpage.

If your site is serving 'a disproportionate percentage' of non-html you decrease the ability of CF to tell good traffic from bad.


A filter list is definitely not HTML


The minimal spec valid HTML5 document is currently:

    <!DOCTYPE html>
    <title>a</title>
Practically, browsers will accept omitting both of these, and the spec even allows omitting the title "if it is provided by a higher-level protocol".

So it's not that crazy an argument that a plain text file is an HTML document.


Too technical.

They serve websites to browsers for people to view. This file (be it properly formatted .html or .txt) is not a website people go to in their browser; it's used internally by an application. That is the key point.


You're looking at it backwards, though. CF doesn't _actually_ care about what the content is, only that they can apply their DDoS protections to it. If you're serving a text file, that's much more difficult, as they can't replace it with their own content.


Only because they've so comprehensively defined HTML parsing that even parsing random data has a well-defined result.


They host the zipped content files for Have I Been Pwned for Troy Hunt...


That's a special project they decided to take on, not subject to the standard ToS.


They should put EasyList in that special project category. It's just too important to the internet.


My best guess is that CloudFlare wrote this to prevent folks from serving big binary files like photos, music, or video, and this txt file case was an unintended condition that happens to work to CloudFlare's advantage.

text/plain, though, is decidedly not text/html, and I would expect CloudFlare to potentially do some on-the-fly optimizations that are aware of the structure of an HTML file and save terabytes a day at their scale.


> My best guess is...

Some think it's very Oracle of Cloudflare to do so. I do not blame them.


This doesn't sound right to me. Cloudflare also protects web APIs. This text file is an extremely simple web API, but it is still a web API.


If the web APIs were a disproportionate amount of what was served on some customer's specific free CF plan, as compared to the cached HTML, then that doesn't match their ToS.


Sounds like it is meant to deal with multimedia mostly?

But anyway, just rename .txt to .html and you're done.


I imagine that might help with automated ToS rate limiting, but eventually someone at Cloudflare will probably cut them off. It's plain text, but it's basically serving a distributed database. And a hint at their scale is the 100TB of “Access Denied” served up monthly.

Cloudflare just seems to be trying to limit the free tier to "caching website HTML for the purpose of showing it to humans". They have pricing and plans for things other than that.


Simple, but will it break all sorts of automation down the line? All the other ad lists are txt, and I don't know how they would handle other file types, even if the content is unchanged.


Determining file type from the file name suffix is a fool's game and always was.


I hear you there. I'm more thinking someone probably hard-coded the txt file extension somewhere, so something is likely to fall apart simply in handling that file.


Is it? Seems superior to arbitrary magic numbers or headers, and God forbid full naive parsing, in most ways.


I doubt there is any solution that is both robust and simple. In a sense, it is the same problem as that which ad blockers are attempting to solve.


What's wrong with storing and delivering the intended content type as metadata, whether that's headers or filesystem metadata like in Mac OS X?


Transmitting in-band (headers) seems ripe for arbitrary complexity. Someone out there would write a Turing-complete header DSL. And then someone else would write an incompatible alternative implementation.

At least file extension is limited and externally visible (and thus accountable) to third party behavior, which should limit the worst complexity excesses.

Is filesystem metadata actually different (theoretically) from extension? Or just data in a different format?

Extension seems a nice balance between simplicity / brevity and utility, albeit as a hint, not a commandment.


Fun stuff like embedding data into jpgs or pngs.


Then CF replaces the HTML with their Browser Integrity Check. How does the app deal with the list becoming a real "Checking your browser" HTML page?


If I can read it in Lynx, it is web content.


From a legal perspective I can understand such wording, but I wonder why an engineer would simply tell a (non-paying) customer that he violates the ToS, without thinking about it.

I mean, one could simply wrap the content in an HTML body and change the extension, but that would actually increase the data load for no good reason. So it is complete nonsense to complain about txt files being served.


The solution seems simple, just wrap it in a trivial HTML envelope. Enclose it in <pre> tags if needed.


Sounds like they’re just using the wrong service. R2 is designed for object storage, and has 0 egress fees. That’d be the way to go. Not sure why the support engineer didn’t mention it. The standard cloudflare web caching probably doesn’t work well for this use case for whatever reason. The price is only 0.015/GB/mo, so the ~MB(?) of list would be served in perpetuity for less than a dollar.


They're probably still getting many millions of requests a month, so probably more than a dollar, but even 20 million requests a month would only cost $3.60 (the first 10 million are free, then 10 million @ $0.36/million).

I assume you probably know this, but just wanted to share that there are some pricing scales with R2 - they're just pretty generous for a lot of things.


Elsewhere they say they are seeing 36 billion requests/month, so that would be nearly $13,000 just for these access denials.


Actually, you're right. How would this work? Is Cloudflare really willing to foot the bill of 20 TB of bandwidth per day for a small text file that costs $0 to store?


Yes, why not? For reputation and attracting developers it seems to be worth it. If it costs 75K USD/year, that's already paid back with just one big enterprise customer.

Though adblocking is a big business; many actors there are getting large revenue.

For example, Eyeo's income was 50 million USD per year last time I checked (and I guess most of it is actually profit), so they can find a solution if they really want.


Egress is free but not public, i.e. you can't just give anyone a URL. You have to use your own server to fetch content from R2 and then serve it to your visitors. Each fetch costs money, but the first 10 million reads are free, and your own server probably has egress fees.


No, egress is indeed public. Here's an example link, straight to R2:

https://img.phantasmagoria.me/img/96XJrjejoHNdrQv7.jpg

Even if you have a private bucket, you can give people a signed link with read access, for up to two weeks, IIRC.


Ah, I see they added public buckets last month

But it'll still cost them money by number of reads


Hm, yeah, true, you do need to pay for reads, you're right.


Imagine you're trying to block a DDoS attack. If the client is downloading HTML then they likely also have JS enabled giving you a ton of options for running code on their computer to help you decide if the traffic is legitimate.

If they're downloading text you can still use the headers, and some tricks around redirects, but overall you have far less data on which to decide.


Cloudflare caches robots.txt by default when proxied (the only .txt file that they automatically cache); for all other content, the following from their ToS probably applies:

> Use of the Services for serving video or a disproportionate percentage of pictures, audio files, or other non-HTML content is prohibited, unless purchased separately as part of a Paid Service or expressly allowed under our Supplemental Terms for a specific Service.

We will never know the reasoning of the support agent who replied to the EasyList maintainers, but I can imagine that it is indeed disproportionate for EasyList.

I really hope that Cloudflare actually sees that they are making a wrong decision here and actually help the EasyList maintainers.


The ToS isn't that you can't serve plain text; it's that it shouldn't be disproportionate in volume to the cached HTML served.


That could be solved by tacking on 2x the file size worth of pointless html code to the file.


What's the difficulty supposed to be? Serve the same thing with a different MIME type and you're in compliance.


I guess they just need to serve it with a minimal html shell


Yeah... That just doesn't seem right. All web content is text...


text/html is not text/plain, but that doesn't matter: it's not a technical limitation that caused Cloudflare to draw this line.

It's Cloudflare deciding to protect "web content", and not videos or .iso images or other things that are not commonly served while you browse a contemporary website and read HTML.


> it's cloudflare deciding to protect "web content" and not videos or .iso images or other things that normally are not commonly served while you browse a contemporary website and read HTML.

That's false in two ways: first, text is normally served while you browse a contemporary website; second, so are images, which are explicitly called out as potentially violating this clause. Text is the only data that isn't covered by this clause.


> All web content is text...

It's all 1s and 0s too


> The txt is on a website so it is a web content?

Even more wtf- the file extension determines the file content?


Phew. It's just a bandwidth issue. This goofy title made me think advertisers had found a way around ad blockers.


In a way, the advertisers did find a way around ad blockers.

Google built an entire browser and used Manifest V3 as an excuse to cripple ad blockers.

Companies are also paying influencers, twitch streamers, and YouTubers to promote their products in a way that conventional ad blockers can't prevent.


In case anyone reading hasn't heard of it: SponsorBlock for YouTube - Skip Sponsorships - https://chrome.google.com/webstore/detail/sponsorblock-for-y...


SponsorBlock has saved me hundreds of hours of watching YouTube ads and other time-wasting bullshit. The devs deserve to be paid for making this awesome application.


Yeah, it's awesome. Skips useless stuff like intros too.


Were you sponsored to say that?


I'm not the OP, but if I said that line, I'm 100% sure I am not being sponsored. It's just that good.


Or, you know, pay for YouTube Premium and get no ads? Added bonus of supporting the creators who make those videos you watch.


YouTube Premium doesn't skip in-content ad segments in videos. SponsorBlock does.


And yet, you'll still get sponsor segues.


Wow, never heard of it, but that's nice. Also available for Firefox, by the way!


> Companies are also paying influencers, twitch streamers, and YouTubers to promote their products in a way that conventional ad blockers can't prevent.

Which I'm okay with in the same sense that I'm okay with newspaper/magazine ads or billboards or TV/radio commercials: they're annoying, but easy to ignore compared to online ads chewing up CPU time and battery life while actively violating one's privacy.


> in a way that conventional ad blockers can't prevent

Yet. One day someone will create an ad blocker with machine learning that "sees" the ads and deletes them in real time. Should work on all content types, even on augmented reality.


We already have SponsorBlock, which is just a crowdsourced manually created database of segments to skip.


I am on Chromium, using the Manifest V3 version of uBlock right now, and I have noticed no difference between it and Firefox with regular uBlock.

The very interesting thing is that none of Google's ads have ever made it through this new version of uBlock for me.


More ads will rely on unblockable methods over time.


This issue caused CF to irreversibly ban them though, so it's not "just a bandwidth issue" anymore.

> Based on the URL that are being requested at Cloudflare, it violates our ToS as well. All the requests are txt file extension which isn't a web content

> you cannot use Cloudflare to cache or proxy the request to these text files


> This issue caused CF to irreversibly ban them though

Do you have a source for that? The article only mentions them being throttled + the screenshot with the support engineer saying they seem to be breaking the ToS and asking them politely to move back into compliance.


Well, CF is just one service provider. There are bigger issues if they already have such a monopoly that their decisions kill projects worldwide.


Where did you get that they were irreversibly banned? Or banned at all for that matter?


They didn't get banned. They got an email from CF support saying that they cannot cache TXT files and that they'd need to disable the proxy.

This does not mean banned.


EasyList is a txt file. If they can’t host it on CF, it means they can’t use CF for EasyList.

They didn’t ban the whole organization, but effectively told them to stop. I don’t see a difference.


To me, getting banned is when the provider locks out (or just deletes) your account and prevents you from using their service entirely.

CF didn't do this. They sent them an email telling them that what they were doing was a violation of their TOS and to cease doing it. They did not kill off their account. They still have the option to comply and continue with CF, which seems to be what they are going to do at the moment.

Hopefully, CF will grant them amnesty on this one. At the end of the day, an HTML file is just a text file, so I don't see why this would have even mattered to begin with.


It is a bandwidth issue for a volunteer-run project.


Rate-limit the GeoIP ranges for the affected areas, dropping requests once they exceed 20% of active traffic - i.e. the service outages get co-located only with the problem users' areas.

Also, when doing auto-updates: always add a random delay offset of 1 to 180 minutes to distribute the traffic load. Even in an office with 16 hosts or more this is recommended practice, to prevent cheap routers from hitting limits. Another interesting trend is magnet/torrent links being used for cryptographically signed commercial package distribution.
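
A client-side sketch of that jitter; the 1-180 minute window comes from the comment above, and the 24-hour base interval is an assumption:

    # Sketch: add a random 1-180 minute offset to each scheduled update so
    # thousands of clients don't all hit the server at the same instant.
    import random, time

    def next_update_delay(base_hours=24):
        return base_hours * 3600 + random.uniform(1, 180) * 60

    time.sleep(next_update_delay())
    # ...then fetch the list as usual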

Free API keys are sometimes a necessary evil... as sometimes service abuse is not accidental.


That would only work if they had an API; AFAICT, they're just hosting a file.

At this point, they might be better off coordinating with the other major adblocker providers and just outright moving the file elsewhere. Breaking other people's garbage code is better than breaking yourself trying to fix it. Especially on a budget of $0.00.

If the defective code for the browsers is in public repos, it might also be more effective for someone to just fork the code, fix the issue (i.e. only download this file once a month, instead of at every startup), and at least give the maintainers a chance to merge the fix back in.


It is very common to see API keys in URLs for access to what are essentially flat files. Thus, it's fairly trivial to change from:

https://127.0.0.1/file.csv

to

https://127.0.0.1/file.csv?apikey=abc123

This could allow client-specific quotas and easy adoption by maintained projects in minutes. Thus, defective and out-of-maintenance projects would need to be manually updated or would get a 404.
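
A server-side sketch of those client-specific quotas; the quota value and in-memory counters are illustrative (a real deployment would persist them):

    # Sketch: count requests per API key per day and refuse over-quota keys.
    import datetime
    from collections import Counter

    DAILY_QUOTA = 10_000  # illustrative limit
    counts = Counter()
    day = datetime.date.today()

    def allow(apikey):
        global day
        if datetime.date.today() != day:  # new day: reset all counters
            counts.clear()
            day = datetime.date.today()
        counts[apikey] += 1
        return counts[apikey] <= DAILY_QUOTA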

=)


API keys are most successful when they're issued for server-side use; when used client-side, the usual pattern I see is for individual clients to request their own API key?

In this case, the key would need to be distributed to the myriad users who legitimately need to ask for the lists, and could then be scraped by the "attacker", but at least then they'd have to be knowingly malicious vs. accidentally malicious.


You generally add a small "cost" to requesting an API key. For example: submit your email to this form and wait a day.

Browser makers like this will then not reasonably be able to request a new key automatically for every install, so they will just request one and ship it.

Then, when you get abuse like this, you can disable the key.


Ahhh, good point!


Moving the file elsewhere won't fix it. They are serving terabytes of traffic on Access Denied; it won't go away if that changes to "Not Found" instead. The developers already seem entirely ready to ignore their adblocker just not working.


The query limit for the Let's Encrypt SSL service operates in a similar manner. If you hit it more than a few times a week for the same domain/token/key, then you are banned for 5 days.

In general, it is easier to set up filters after differentiating legitimate from nuisance traffic. For example, fail2ban looks at the log of errors from invalid hits and bans the IP or entire ISP block ranges for 5 days. This ban then gets propagated to the rest of the web via spamhaus.org listings.

i.e. the users start to see the world's internet become unreachable as admins start to block traffic at the NOCs' routers, and so on... India knows about karma.

I am more surprised the app store for the apk isn't getting sued for theft of service.


Good luck finding the legal contact of, not to mention suing, some random developer in India who has apparently already abandoned the project.

I will also point out that if you fail2ban the IPs requesting the spamblock list, it could become worse if the browser just retries endlessly in the background. The traffic for a 404 page could be much smaller than the traffic of the very same devices trying again every few seconds, constantly, instead of only checking that 404 on every app restart.


In general, the client socket needs to time out on black-holed connection attempts (several minutes), and the server never sees a TCP handshake packet if the IP is on the global routers' ban lists.

As a side note, some people build spider traps that reply with a pre-baked bzip file as a spoofed compressed HTML response. A client program then dutifully decompresses a few-TB document, and the browser exits due to memory issues. Note most modern browsers are wise to this trick these days, but I doubt a dodgy plugin's disk-usage limit check would catch a client-side storage flood. People shouldn't do this though, even if it is epically funny and harmless. =)


The client can just set a custom timeout and close the socket after 5 seconds of no or low activity. Then try again.

On your side note: EasyList probably does not want to get sued for distributing malicious content to users (and crashing a browser on purpose is arguably malicious).


There are all sorts of games people could play, as a DoS is technically an act of war under some legal systems. For example:

1. Take the top 200 most popular websites in the given nuisance area.

2. Add ban rules to a version-B list that also covers all social media, search engines, and Wikipedia.

3. Look for the user-agent string of that specific problem client, or for extreme API-key quota abuse.

4. Randomly serve the version-B filter list, which breaks the browsing experience on frequent updates. Increase the random breakage until traffic rolls off to normal levels.

The TOS for the ban list file does not specify which sites it will ban, and most users will just assume it is the App that is broken (it is already). People should not do this either, even if it is also funny and relatively harmless. Also, suing people while participating in an attempted crime probably would not go well. =)


Even then I would not do this without clearing it with a lawyer first. You could still end up at the wrong end of a lawsuit that you'll have to defend in India.


There are numerous legal/accounting specialists that protect businesses and investment decisions. We already won't serve _any_ content to IN networks as business policy... so are unlikely to ever have to visit with the cobras.

Have a gloriously wonderful day =)


Not serving means there is already no content. That is different from serving 100TB of 404 content to clients that will never stop.


BitTorrent: switch to that in the long term. Not saying every end user should be a seeder, but there is a big BitTorrent community out there and everyone could help a little bit.

Other options:

- A kind of mirror network (it only needs to make sure integrity can be checked, maybe with a public key)

- And while doing that, why not also support compression? Only devs need to read it, and they can easily run a decompression command; every bit saved would help.


> BitTorrent: switch to that in the long term.

S3 buckets in IAD with <5GB blobs can double up as BitTorrent seeders.

I'd imagine some tech like IPFS/Filecoin/Sia might come in handy too, but I'm unsure how healthy most of these web3 projects are right now.

There's also fosstorrents.com that help seed projects.


Is IPFS strictly Web3?

Sure, it gained traction around blockchain and crypto DeFi, but the storage technology is, ELI5, a massive distributed store.

I could hazard a guess that in terms of philosophy it's closer to BitTorrent than blockchain.


IPFS is just distributed content discovery and download.

Someone advertises the content to the network, then people looking for the content can find it. Usually the people who fetch it then advertise to the network that they have it too.

It has nothing to do with cryptocurrency, but it is commonly used as a great way to embed "larger" content into blockchains and other immutable stores. It works well for this because (1) the CID contains a cryptographic hash of the expected content, and (2) you can change where/how you store the actual content without updating the URL.
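A sketch of that location-independence from the client side (the CID here is hypothetical; any gateway, or a local node, can serve it):

    import requests

    CID = "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi"
    for gw in ("https://ipfs.io", "https://cloudflare-ipfs.com"):
        r = requests.get(f"{gw}/ipfs/{CID}", timeout=30)
        # same CID, same hash-verified bytes, regardless of who serves them
        print(gw, r.status_code, len(r.content))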


The tl;dr I've been given is that IPFS is essentially one global DHT swarm with some searchable/discoverable magic on top. Not exactly the same as DHT torrenting, but close enough for the simple explanation.


Bittorrent won't work, as this list is commonly downloaded by heavily sandboxed browser extensions that can't access torrents.


Oh, but yes every user should be a seeder. Why not?


Mobile connections are usually metered. I'd still seed it 24/7 on my home connection though.


Wouldn't need all users to, and many are mobile or on metered connections.


Assuming it's not a kind of DoS attack, and since it sounds like they can detect the abusing clients (maybe by User-Agent)... some very desperate technical options involve serving an alternate small blocklist that does one of:

1. Try having it block subsequent requests for EasyList itself, just in case the frequent update requests are made with the prior blocklist in effect; a sketch follows this list. (I accidentally did this once, in one of my own experimental blocklists, atop uBlock Origin.) Then the device vendor can fix their end.

2. If the blocklist language and client support it (I suspect they don't), you might safely replace or alter some Web pages, to add a message saying to disable EasyList in the client, or pressure the vendor, or similar. If this affects a lot of users, the meaning will also be spread in other languages to other users, even if not all of them understand any of the languages in the message. But be careful.

3. If you can't get a better message to the user, another option might be to block all requests, to prompt users to disable EasyList or vendor to fix the problem. But before doing this, you'll need to have verified that a combination of shoddy client/device software won't prevent users from using important functions of their devices for significant time. (Imagine this might be their only means of being connected online, and some shoddy client software pretty much prevents it from working, and the user is unable to access critical services.)
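As a sketch, option 1 could be as small as a couple of filter lines (hypothetical, and it only helps if the client applies its own list to its update requests):

    ! sketch: make the list block its own update endpoints
    ||easylist.to^
    ||easylist-downloads.adblockplus.org^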

But before doing any of these desperate technical measures... First, I'd really try to reach people in the country who'll know what's going on, and who can reach and possibly pressure the vendor who's causing the problem. If tech industry people aren't able to help quick enough, reaching out to that government, directly or through your own country's diplomats/officials, might work. Communicating the risks of the desperate technical measures that you're trying to avoid (e.g., possibly breaking critical communications) could help people understand the urgency and importance of the situation.


> Try having it block subsequent requests for EasyList itself

Now you're thinking with portals


A lazy/bad developer ruining something so many people depend on is incredibly annoying.


On the internet, popularity is sometimes indistinguishable from being targeted by a low-orbit ion cannon.


Serve a modified version to rate-limited IPs that only blocks popular Indian sites, and I'm sure it'll be resolved in a day or two.


Or reverse slow loris them: send a byte, sleep for a few seconds, send another byte, etc.
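A toy version (single-threaded sketch; note that holding connections open costs you memory too, so a real tarpit would want async I/O and strict limits):

    import socket
    import time

    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", 8080))
    srv.listen()
    while True:
        conn, _ = srv.accept()
        conn.recv(4096)  # read and ignore the request
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n")
        try:
            for _ in range(600):  # drip one byte every 2s, ~20 minutes total
                conn.sendall(b"!")
                time.sleep(2)
        except OSError:
            pass  # client gave up
        finally:
            conn.close()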


Limit this to the specific user-agent headers of those web browsers though, please.


They already have that part figured out. From the article:

> When we encountered a similar problem last year, we found a simple solution: block the undesired traffic from these apps. Even so, we continue to serve about 100TB of “Access Denied” pages monthly!


The difference is that serving Access Denied leads to the users of these malicious browsers just getting more ads over time, as the filter lists can't be updated anymore. Serving a special list containing popular sites would result in the users almost instantly being unable to access those popular sites, resulting in requests to the developers to fix their shitty browser, or in users switching altogether.


100TB of Access Denied per month is only 38 MB/s, so not even a minor DDoS these days.
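(Worked out, assuming a 30-day month: 100 TB / (30 × 86,400 s) ≈ 10^14 B / 2.59×10^6 s ≈ 38.6 MB/s.)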


Blocking the browsers isn't a solution because they likely fail open, so the user doesn't notice.

Instead, you need to break the user experience so they complain to the developer of the app, thus impacting reputation.

It's unfortunate that the browser's developers are unresponsive, and this circumstance limits the options available to EasyList.


This is an excellent solution.


EasyList should serve the Indian browser (based on user agent) a giant file (expensive), a corrupt file, or some response which causes the app to crash. If the browser crashes on every startup due to a malicious response from the EasyList server, users will likely delete it.


Serving a giant file is going to affect their servers more than the end device. If they could identify the user agent it would be a lot easier to just block it entirely.


Could be done with a gzip bomb.


Or a brotli bomb, for far greater expansion!


And it's a terrible idea, because the users are not at fault; the developers of the app are. And in this case it might just crash, so users would only report "app crashes on startup", if at all.


Then the browser could just pretend it is Chrome (if it isn't already) and that would easily work around your solution.

I think something like this would be best served by moving to IPFS or Bittorrent. A magnet link could be provided and then browsers and plugins could use that to download the file. That way, you can distribute the load.


Just a separate rules file that blocks .in and other popular domains for India will probably work just as well.


No, a file with a single line, "11) You should not download the list on every start", is enough.


1. Add a ToS to the EasyList website that prohibits this sort of abuse. (I don't see any currently.)

2. Send a cease and desist letter to the app creator.

3. If they don't respond, also send a C&D to Google demanding they cease distribution of the malware responsible for the DDoS.

Anyone can send a cease and desist -- it's just a cautionary letter. You aren't obligated to follow through with the threatened legal action.

It doesn't have the force of law behind it, but it'll at least get their attention.

(IANAL)


A C&D not coming from either a law firm or at least a big company is likely to go straight to the trash.


Would someone in India care about a C&D letter sent from someone, say, in Europe or the US? I don't think so


It’s not just somebody. If it’s from the creator of the things your livelihood depends on, you better pay attention.


So I understand some of the criticism raining down on Cloudflare, but I would like to get at another point entirely.

So people writing apps using crappy coding are cratering Easylist basically through unintentional DDoS. It is the apps that suck. Full Stop.

I have noted over the last five to ten years a general decline in the usability of phone apps and full desktop apps: bad UIs, inconsistent behavior, performance issues on things that, at least on the surface, appear trivial.

I don't understand what is causing the general slide in quality, but it is clearly visible, and it seems to me, untenable over the near term.

Is it the tools? Is it the app churn pressure? I really do not know, because I work in a very different part of the industry. We have our own issues there, but that has more to do with technology churn (particularly in wireless standards) than in the tools and platforms.

So what is up with the app world, in general, and the web app world in particular? I am all ears, because as I said, I don't work in that space.


Regarding the 100TB of Access Denied pages: just drop the connection instead.

To make the system more scalable: instead of directly serving the file, serve a bunch of URLs to mirrors plus a checksum, and have the client pick one of them. You can randomize the URLs and maybe add some geo logic. Let people provide mirrors. An additional indirection step like this can prove incredibly powerful for systems that need to scale massively.
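A sketch of the client side of that indirection (the manifest URL and field names are made up):

    import hashlib
    import json
    import random
    import urllib.request

    manifest = json.load(urllib.request.urlopen(
        "https://example.org/easylist.manifest.json"))
    url = random.choice(manifest["mirrors"])   # or pick by geo
    data = urllib.request.urlopen(url).read()
    if hashlib.sha256(data).hexdigest() != manifest["sha256"]:
        raise ValueError("mirror served a stale or tampered copy")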


My first thought as well. Why even provide a response?


It would seem like you could prevent hotlinking by adding 1-5 minutes of latency to every request to a list.

Almost no dev would hotlink an asset that took that much longer to display, at least in critical/common paths. It would force consumers (devs/businesses) of the lists to provide a caching/mirroring solution of some kind for their users.

But on the backend, that request path would be designed just for updating the list cache. Handling 1-5 extra minutes per request, on a request that runs less than a few dozen times a day to update the mirror/cache, is trivial.


The issue with this approach is it's too late. It might work if you designed it from the start, but adding it now would only destroy your poor balancer with all the connections they have to maintain (waiting for the 5 minutes to expire).

It was mentioned in the article that they are now serving up Access Denied, but the problem is one of just too many requests.

At this point, it's likely easier to just kill the domain altogether and get a new one.


This is certainly not a cure to the problem Easylist has right now. This is prevention. About how to design publicly consumable resources to naturally discourage hotlinking, before it is a problem.


That's what the person you're replying to said.


This seems the perfect use case for letting a secure BitTorrent tracker share the lists, then either implementing the client in the browser, or having it as a system service that syncs the necessary files.


Yeah! Just share it through BitTorrent; this is the perfect example of where it can be useful. To be honest, I'm not sure why this isn't built into browsers yet; Opera used to have a client built in.


Torrent was also my first thought. Is there any reason why this is not a good solution?


One downside is that BitTorrent doesn't really share blocks between torrents, so updating the torrent would mean a full re-download.

That being said the data is small so it probably isn't a big deal.

A better solution may be something like IPFS where with a rolling-checksum chunking algorithm you only need to download the changed parts of the list. You could even use IPNS to distribute updates for a fully decentralized setup.


Cloudflare claims that R2 has free egress/bandwidth.

You could try that instead of the "CDN" service

--

Alternatively, try the "cheap" CDN services like Bunny or Beluga, which have packages for high volume like 0.005c/gb

Cloudflare is not really selling a CDN, but all the "smart" services on top of it.

That's why you don't have as much control (like blocking IP/Geos without Enterprise), or run into issues for breaking their ToS.


> like 0.005c/gb

You're off by a factor of 100.

https://www.belugacdn.com/cdn-pricing/

$5000/PB = $5/TB = 500c/TB = 0.5c/GB


beluga

> You pay 1¢ (or less!) for every Gigabyte of data accelerated over our cloud network.

doesn't seem clear cut, as they bundle it into subscription packages, but at this volume you'd surely need their enterprise package (10TB+), which you'd expect to get for 0.01 or less

bunny

> First 500TB $0.005 /GB

> From 1PB-2PB $0.002 /GB


> Even so, we continue to serve about 100TB of “Access Denied” pages monthly!

That is several thousand $ a year.


You wrote 0.005c not $0.005


my bad, not native speaker


Several open source projects, especially Linux distributions, solved this problem long ago by setting up a web of volunteer mirrors. They provide docs and scripts to quickly set up an additional mirror.

The central HTTP server of EasyList could then hand out 302s to fetch the actual file from one of the mirrors. As an alternative to 302s, a modern scriptable DNS server that uses the mirror list could respond with different IPs, round-robin (or even better if it's geo-aware).

DNS TXT records could maybe be used to serve a digest of the file, so that mirrors can't modify it without that being detected.
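Client-side, that check is a few lines (sketch; the record name is hypothetical, and this uses dnspython):

    import hashlib
    import urllib.request

    import dns.resolver  # pip install dnspython

    expected = dns.resolver.resolve(
        "digest.easylist.example", "TXT")[0].strings[0].decode()
    data = urllib.request.urlopen(
        "https://mirror.example/easylist.txt").read()
    assert hashlib.sha256(data).hexdigest() == expected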


Maybe try the open source programs at Fastly.com or Bunny.net

https://www.fastly.com/open-source

https://bunny.net/contact/


This sucks. I hope they can reach the app authors and get it fixed at the source.

Meanwhile perhaps some other CDN provider wants to create some goodwill if Cloudflare isn't willing?


I am not speaking on behalf of the company, but if someone involved with EasyList can contact me (avani@cloudflare.com), I'll see if there is a way to help out.


Thank you! I've passed that to EasyList maintainers.


Just serve an easy list to these browsers that blocks every image and all css. Problem solved.


How do you determine that it is these browsers requesting it and not uBO? It would be easy for the browsers to set their user agent to something like Chrome.


The browsers don't get updated; they're already being served the 100TB of Access Denied or whatever it was.


I wonder what would happen if they renamed the file to robots.txt and then did a redirect at the cloudflare level for the current URL to robots.txt.

I imagine some (many?) clients would handle it poorly, but it would then at least be cacheable. It's not exactly easy to test, though, without an unknown level of damage to legitimate users.


This is the first Cloudflare-related thread in which I haven't seen eastdakota or jgrahamc get involved.



pool.ntp.org hands out specific subdomains for large-scale pool users. This way, it is possible to retire service for a subset of users that use devices that aren't updated anymore and are misbehaving.

The traffic issue is not just punted to the DNS service. It's possible to return a cacheable 127.0.0.1 response, and it's somewhat rare for DNS caches to be constantly powered up and down and to reach out directly to authoritative DNS servers.


Just blackhole every request originating from India. After a while the developers will ask why their users are mad that they get ads all the time.

You then give them the finger, tell them to get their lists elsewhere, or pay for access due to their incompetency.

They will just have to do the needful.


Haha, yeah. Just proposing to blanket punish an entire country for a few developers screwing up, and throwing a little racist joke in there at the end. Good one.


Pretty easy to lump 1B people together over a few bad apples. Think about what it would mean if the world applied that to, say, the US...


> Even so, we continue to serve about 100TB of “Access Denied” pages monthly!

What's the carbon footprint of this?


The thing with Access Denied is that these deprived clients retry with a vengeance, so you end up draining more resources than you'd like. I run a content-blocking DoH resolver, and this happened to us when we blocked IPs in a particular range; the result was... well... a lot of bandwidth for nothing.


Why serve any HTTP replies to those at all? If you are doing it at the IP level, why not just drop all inbound packets from the L3 address?


This is what I was wondering. I'm taking a wild guess that maybe they don't have that level of firewall access and it was being done through filtering by the webserver to provide an access denied.


We were on Netlify way back then, so no L3 blocks. Now on pages.dev and workers.dev, but we haven't needed to enforce any rules yet.


But why bother with deny? Just send a blank text file (or one with as minimal data as needed to satisfy the rogue adblock) to the "blocked IPs" to mitigate the traffic for now. If firewall access exists, just drop the offending incoming traffic entirely.


> Just send a blank text file (or one with as minimal data as needed to satisfy the rogue adblock) to the "blocked IPs" to mitigate the traffic for now.

The sent HTTP body was blank, but I believe we were still sending HTTP headers...

> If firewall access exists, just drop the offending incoming traffic entirely.

True, but the service we were using at the time didn't have a L3 firewall, and so we ended up moving out, after paying the bills in full, of course.


This is the correct answer, and basically you have to set up round-robin DDoS protection that provides these "wrong" answers.

While still trying to allow valid traffic through.


That reminds me of the absolutely insane amount of traffic my mother's Roku TV shits out when it can't resolve/reach its spyware and telemetry services. It's like 95-98% of the blocked traffic on her network.

Is there a clean solution to this problem these days? Like some kind of ad-blocking router that resolves these addresses correctly but then routes packets destined for these services into a black hole, so the requests eventually time out? That would at least slow the repeat request floods down significantly.


In this particular case that's not what happens; they don't retry. They just really want to download updates REALLY often.


1. Restrict access until the developers fix it. 2. Consider encouraging the use of WebTorrent within the extension, so that each user hosts and serves the list.


Webtorrent isn't distributed, so you'd just shift the problem to the tracker/signalling server.

And in this particular case even BitTorrent proper may not have helped, because steady-state BT is distributed, but if a client doesn't persist its state after a bootstrap - and lack of persistence is the issue here - it'd hit the bootstrap server every time. Granted, it'd only be about one UDP packet per client, much less traffic than what EasyList is seeing, but foolish code deployed at scale can still overload services provided on a budget.


One lesson I guess many people (I for one) learned over the years: never ever serve anything directly on your domain, always use a subdomain. You never know which part you'll want to move off of your server...


Just ban all Indian IPs from the website at the firewall.

A proper solution would be to use DNS to forward all Indian traffic to one of the local VPCs.

The idea is to rate-limit users: let them pull blacklist.txt only once a day. You serve the file and add the requestor's IP to a denylist on the firewall, so that any subsequent requests are blocked by the firewall.
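A sketch with ipset (the set name is made up; the web app adds each IP right after serving it the file):

    # entries expire automatically after 24h
    ipset create served24 hash:ip timeout 86400
    iptables -I INPUT -p tcp --dport 443 \
        -m set --match-set served24 src -j DROP
    # from the app, after a successful download:
    #   ipset add served24 203.0.113.7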

Plus, Cloudflare has a geofencing feature.


Just report these bad apps to the Google Play Store and get them taken down for DoSing your website.

Google Play Store will remove them.

Trust in the process.


But this won't uninstall the app from the phones


From the perspective of someone who got their start building small web tools, slapping ads on them, and putting them online (which paid most of my expenses through university), it's kind of funny that the people who want to use the internet "for free" at the expense of website builders now also want to block ads, continuing to extract value out of everyone else for free by asking Cloudflare for a free plan.

Pay for your own darn servers; everyone else has to, and they can't even use ads to cover the costs because of you.


There is a big difference between showing someone ads (think radio and TV ads) and spying on their browsing habits (almost all internet ads).

I am fine with one but not the other. Thus I block them.


Just register a new domain and serve it off that instead.

If this browser is not maintained then it won’t update what it requests, but everyone else can.

Obviously this becomes very whack-a-mole but in this instance it might be an easy win.


I used to host my blog on my own server years ago and some popular site "hotlinked" to an image I had posted. My traffic - which back then I paid for by the Gb - spiked like crazy. I decided to put in Apache referrer checks for images and started serving some porn when it didn't match my domain. That solved the problem pretty quickly.

Easylist should do something similar - requests from India should include a list with all the popular sites like Google, YouTube, aajtak.in, etc. When their browsers suddenly stop working the problem will be solved.


> EasyList is hosted on Github and proxied with CloudFlare.

What is the reason for proxying through Cloudflare? Are there any bandwidth limits or performance issues when directly serving those files from GitHub?


From https://docs.github.com/en/pages/getting-started-with-github...

---

> GitHub Pages sites have a soft bandwidth limit of 100 GB per month.

> In order to provide consistent quality of service for all GitHub Pages sites, rate limits may apply. These rate limits are not intended to interfere with legitimate uses of GitHub Pages. If your request triggers rate limiting, you will receive an appropriate response with an HTTP status code of 429, along with an informative HTML body.

> If your site exceeds these usage quotas, we may not be able to serve your site, or you may receive a polite email from GitHub Support suggesting strategies for reducing your site's impact on our servers, including putting a third-party content distribution network (CDN) in front of your site, making use of other GitHub features such as releases, or moving to a different hosting service that might better fit your needs.

---


I wonder how e.g. NixOS or Homebrew deal with this


NixOS wouldn't be affected by that quota because it doesn't use GitHub Pages as far as I can tell.


GitHub has a soft limit of like 100 GB/month on transfers for Pages. According to the Adguard blog post traffic was already several TBs a day before the issue arose.


Why not only provide the list as a repo? You can't hotlink a repo. And someone abusing raw links is GitHub's bandwidth problem.

'Legitimate' users of the list would clone/pull the repo to their own mirror?


Do you mean like this? https://github.com/easylist/easylist

EasyList updates frequently, many times each day, as the commits to that repo demonstrate.


Exactly, but only via a repo.


I'm curious, if an arbitrary GitHub repo suddenly started attracting hundreds of terabytes of egress, violating GitHub's ToS, would GitHub manage traffic in coordination with the repo's owner, or would they disable the repo and suspend the account?

I suspect the latter. I don't know how to make a repo public but limit web traffic to it. Do you?


I could see disabling viewing raw links. But if the repo becomes popular to fork what would GH do? The friction of using git instead of HTTP will prevent 99.9% of hotlinking. So it probably couldn’t become too popular.


It seems their total traffic is 1000-2000TB per month and they're mostly just serving a single text file.

You can serve that easily from a dozen dedicated servers for a low four-figure amount. Or just get one server with a 10gbit/s connection if you don't care about being geographically close to everyone.

These numbers aren't large, but many CDN companies (and AWS) would like you to believe they are and you should be paying them insane amounts of money to serve that kind of traffic.


Why not just serve a special list to all android users from India that contains the top 100 most popular Indian websites (https://www.similarweb.com/top-websites/india/)? Surely users will stop using that browser when they can't visit those websites anymore because those are now blocked instead of the ads.


Just block all of India for a week. That'll sharpen minds a bit to track down the culprit.

And given that this seems like a near-existential threat, drastic action seems warranted.


Better yet, serve an empty file to the offending user agents


Even better, serve a file that blocks literally everything.


I'm sympathetic to their trouble but we're talking about serving a 330KB text file (150KB compressed), surely this isn't an insurmountable technical hurdle to overcome?

A 1000Mbps dedicated server could serve it 70 MILLION times per day. Considering that most requests wouldn't need the full file served (ETags and whatnot), it could probably sustain a billion requests a day.

What am I missing?


> The overall traffic quickly snowballed from a couple of terabytes per day to 10-20 times that amount.

A 1000Mbps server could only serve 10.8TB, and that's not even accounting for overhead/daily usage patterns/etc


You're missing three orders of magnitude.


The last copy of the .txt file saved by the WayBackMachine is 1.4MB:

https://web.archive.org/web/20220901000327if_/https://easyli...

If you're getting a 330KB file, maybe the server issues are causing the download to fail?


It's not really the serving that's the issue; it's the amount of bandwidth used. In the case of serving simple content (like txt), bandwidth is always going to be the expensive element.


Open and public utility data could be served with p2p technologies.


> There’s an open source Android browser (now seemingly abandoned) that implements ad-blocking functionality

> The problem is that this browser has a very serious flaw. It tries to download filters updates on every startup, and on Android it may happen lots of times per day. It can even happen when the browser is running in the background

EasyList should be offered as a version-controlled copy you grab once and bundle with the app, rather than as a download the app calls at runtime: https://easylist.to/easylist/easylist.txt (currently down as of writing).

The only caveat is that such a list needs updating, so EasyList would need a versioning scheme, and you'd periodically bundle the new version via app updates. It would save a lot of bandwidth.


They're offering a text file. Presumably with an If-modified-since header, although it's hard to check now.

There is no approach you can describe that doesn't run afoul of the described badly-behaved browser app which willfully retrieves the entire file afresh at every init. If it can be downloaded, it will be downloaded directly by the badly-behaved mobile apps.


Call me lazy but I’d just support an If-modified-since header in such a simple case and call it good
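Client-side that's a handful of lines (sketch using requests; the list does appear to send an ETag, per a sibling comment below):

    import requests

    URL = "https://easylist.to/easylist/easylist.txt"
    etag = None  # persist this across runs in a real client

    def fetch_if_changed():
        global etag
        headers = {"If-None-Match": etag} if etag else {}
        r = requests.get(URL, headers=headers, timeout=30)
        if r.status_code == 304:
            return None  # unchanged, almost no bandwidth spent
        etag = r.headers.get("ETag")
        return r.text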


The bad app will just brutally fetch it every time, with not even a cache on its side.

As a quick fix, there are many options for limiting requests per IP per timespan, e.g. fail2ban; you could configure it to punish bad apps without crippling functionality for others. Well, maybe crippling it a little in some very special use cases, but still better than it simply not working.


The list does support an ETag and the 304 even appears to not be rate-limited by Cloudflare.


Pi-hole and pfBlockerNG are two big ones that use these resources too, and setting them up it struck me, as it did you, that simply polling these resources on a set schedule is wasteful.

Podcasting 2.0 has been talking about Podping as a solution, because podcasting has basically the same problem with periodic polling of RSS feeds. Basically, you subscribe and then receive notice when there's been an update; only THEN do you go get it.

https://www.podcasthelpdesk.com/podping-and-other-stuff-with...


The difference is pihole only updates weekly.


Didn't know that. In fact, clicking around in the UI I don't see a way to change it, so good on them for being friendly in this area.

PFBlocker seems to default to once a day.


To change when pihole updates you edit the cron entry at /etc/cron.d/pihole.


That's what I meant: it's buried in a config file, not surfaced in the UI where it would invite abuse.


You can use Firefox on Android and install an ad blocker on it as an extension.


How does that help when you can't download easylist anymore since it's under DDoS?


Scaling serving of a static text file would make a fantastic job interview question, especially as you explore what happens as the text file gets bigger and the number of downloads gets higher. The "correct answer" isn't necessarily obvious in any of these cases.


Download a blocking list on each start >:(

That shows the discrepancy between the linear program-execution model (on the classic desktop) and the quasi-permanently-running model of modern apps. Maybe the site should identify these rowdy browsers and give them a kick.


I am using a pi-hole which uses easylist, but it grabs it from https://v.firebog.net/hosts/Easylist.txt which seems to be working fine.


That's a different list. It only contains the hosts/domains on easylist, because that's all pi-hole (and all host based blockers) can block. It's also hosted by someone else (and they too use Cloudflare, see firebog.net).

The normal easylist is way bigger and has lots of rules for ad blockers like uBlock Origin.


Serve up a text file to connections from India that just has a wildcard entry that causes everything to be blocked. Those crappy browsers will very quickly lose their popularity.


100TB of "Access Denied" replies per month, how many requests served is this?

This is hilarious in an unbelievably terrible and tragic way. The scale is mind boggling.

I wonder which browser it is.


Author here. Actually, it's getting better: I've just looked up the stats, and for the last 30 days we only served 70TB of Access Denied pages, which is about 33-34B requests.


That's 2 trillion requests a month, or 780,000 requests/second. That would be just a little shy of 1% of all of akamai's rps traffic. mind-boggling


Oh no, those were numbers for a month, not a day.


Why would anyone host files at a provider that charges for egress?

EasyList would need more than the 100Mbps Scaleway offers for €2/month, but Scaleway has more capable instances.


I'm not sure, but is IPFS capable of solving the issue?


The gateways would similarly not be thrilled about it.

I would add a redirect to the makers of the browsers in question (so that the leechers get to deal with the traffic themselves; https://en.wikipedia.org/wiki/Inline_linking)


No, the issue is with a misbehaving client program accessing a specific HTTP URL.


Yes. But the client must be a native IPFS node, not relying on a gateway...


I have my pi-hole configured to use the Domains list from https://oisd.nl/ which incorporates a bunch of other lists (including easylist), de-duplicates, and removes a few false positives. They also have an Adblock Plus Filter List that can be used with uBlock Origin.


I can't imagine slowing things down would be a good idea. At their scale, the sheer connection count probably matters more and may contribute a higher proportion of the cost. Bandwidth is expensive, yes, but keeping a connection alive means consuming extra memory for the socket in the kernel, in the app, and in many other places.


> Even so, we continue to serve about 100TB of “Access Denied” pages monthly!

I might be ignorant of the scale of things, but surely a 404 page can't weigh more than half a kilobyte, right?

I think EasyList maintainers have every right to break stuff in this case. They could also easily turn the list into an HTML file, as it's just as parseable as txt.


just add the easylist.to domain to easylist!


Really? "text files" arent web content? What the fuck does cloudflare think CSS and HTML files are?


To be fair to them, this is configuration data, not a piece of a website you would read in a browser. I don’t agree with the policy but it is reasonably clearly worded.



I think GeoIP blocking is a good strategy here. Wikipedia has used this strategy and it has been reasonably successful. People can still log in to GitHub and download the list regardless of where they are located.


The solution to this problem is to use a public R2 bucket with caching enabled. Block all requests to non-existent files with firewall. Now you don't even have to block those browsers from India!


I’m surprised ad companies have not tried this as an ‘attack’ vector already.


It seems they need some form of UUID in order to rate-limit individual clients. I wonder what percentage of their traffic would drop if they started requiring some form of authentication to download this list?


hack-n-patch worm? I recall a white hat doing this for some IoT annoyance.


Why not just limit it to one request per week per IP for those suspect netblocks?


How much of that 100TB is for the stupid HTTP 'Date' header?


A perfect application for a functioning system like BitTorrent.


You'd still need the coordinator node (tracker) for every single request, so you'd still be serving an absolutely insane amount of traffic off one central box.

If you baked something like IPFS or DHT torrent capability into the application and requested your blocklists that way you'd solve the single point of failure problem, but that's asking a whole lot from a shittily maintained and poorly configured browser fork.


Yet another example of how centralization sucks. Some kind of gossip protocol seems perfect for distributing lists like this.


Will the downloading browsers obey a 304 response?

Can that be edge cached with a future GMT expire date?

Alternatively, a 302 response to trusted local mirrors?



Would DNS blocking be affected by this (I presume not since the lists are hardcoded)?


I would just put it on a public git repo and let it sort itself out.


Can’t they run an open collective just to pay the bandwidth bill?


They shouldn't have to foot the bill for this. This is some bad developers releasing a bad fork of a bad browser.

At a minimum, they should have either reduced the requests to something like a monthly download (not great, but far better than requesting a file every startup), or ideally, hosted and updated the file themselves, on their infrastructure. At least that would force them to look at their own hosting bills, instead of crippling a prominent and important contributor to ad-blocking software.


Is Google’s AMP-cache a candidate for distributing the file?


Cloudflare's rationale for a ToS violation is absolutely bonkers. Who cares what the file extension is? By virtue of being accessible on the web via a URL it is by definition web content.


Cloudflare's response makes less sense the more you think about it. We are talking about URLs, where the entire concept of file extensions doesn't apply. Would they be okay if the developer simply changed https://easylist.to/easylist.txt to https://easylist.to/easylist and served the exact same resource? Or does Cloudflare actually want to parse and validate the encoding or format of the served bytes?


Cloudflare's TOS only permit caching for HTML content and related assets. It's probably a useful catch-all they can pull out any time someone uses too much bandwidth.


> It's probably a useful catch-all they can pull out any time someone uses too much bandwidth.

That's exactly what it is, and Cloudflare should be honest about that instead of coming up with an excuse as pathetic as ".txt files ain't web content".


OK, so then why not make a structured HTML page that's served and parsed as the list? Would that be perfectly fine? Cloudflare's stance on this makes no sense.


That would be fine because then CF could replace that HTML with their "Checking your browser before accessing" content. What's the app going to think of that though?


> That would be fine because then CF could replace that HTML with their "Checking your browser before accessing" content.

They could do that anyway by putting up the "Checking your browser before accessing" page and redirecting to whatever file was being accessed. There's precisely zero need to outright modify the HTML files being served to inject such checks.


No they couldn't. If you replace a .txt file with random html data bad things are going to happen.

Tell me, how does CF put a 'Checking your browser' page in my `wget https://easylist.to/easylist/easylist.txt`


> If you replace a .txt file with random html data bad things are going to happen.

Bad things are already happening. Random HTML data is no less useful than no data at all.

> Tell me, how does CF put a 'Checking your browser' page in my `wget https://easylist.to/easylist/easylist.txt`

By serving the HTML instead (with JS that does whatever checks and then redirects to the intended resource), and if the client can't cope with that, then tough shit; the absurdity of such arbitrary evaluation of whether or not clients are sufficiently browsery to be worthy of consuming bandwidth aside, ain't "only browsers directly navigating to this will be able to cope with this and gain access" exactly the point of using such JS-based client checks in the first place?


<sigh> there's a lot of similar comments to this one. In short, it's much harder to protect a text file against DDoS. The ToS say 'a disproportionate percentage ... non-html' likely because they need to be able to apply their browser checks to clients.


> <sigh> there's a lot of similar comments to this one.

<sigh> and a lot of similar rebuttals to this one.

> In short, it's much harder to protect a text file against DDoS.

It's exactly the same difficulty. Bandwidth is bandwidth, bytes are bytes. The only suggestion I've seen where HTML/CSS/JS might be relevant is using JS for client-side probing, and if Cloudflare is indeed injecting arbitrary JS into HTML pages it serves then that's utterly horrifying and is a problem in and of itself.


>> <sigh> there's a lot of similar comments to this one.

> <sigh> and a lot of similar rebuttals to this one.

My point is that a lot of people don't seem to understand the 'why' of this issue and instead appeared to just jump to the conclusion that:

> It's exactly the same difficulty.

When that's not at all the case.

> using JS for client-side probing, and if Cloudflare is indeed injecting arbitrary JS into HTML pages it serves then that's utterly horrifying and is a problem in and of itself.

Well they are. From CF:

>> Cloudflare’s bot products include JavaScript detections via a lightweight, invisible code injection that honors Cloudflare’s strict privacy standards

But even before we consider that, if the request is for html then it's likely coming from a browser. If CF replaces that html with their own then the browser will likely run it allowing them to run all kinds of probes then run the redirect. The same is not true for a .txt file or an image.


> My point is that a lot of people don't seem to understand the 'why' of this issue

We understand the "why" of the actual issue - i.e. the bandwidth consumption - just fine. It's Cloudflare's decision to micromanage data formats consuming that bandwidth (instead of just, you know, evaluating the bandwidth itself and being done with it) and using that as their basis for a ToS violation that's entirely whackadoodle.

> Well they are.

Then that is indeed absolutely horrifying and a problem in and of itself. Privacy concerns (of which there are a multitude) aside, this seems like a really great way to break all sorts of things, and I'd trust the claims of "lightweight", "invisible", and "strict privacy" about as far as I can throw them.

> If CF replaces that html with their own then the browser will likely run it allowing them to run all kinds of probes then run the redirect. The same is not true for a .txt file or an image.

The same is absolutely true for a .txt file or an image. Consider two flows:

1. You click on a link, your browser loads a text file.

2. You click on a link, your browser loads an HTML+JS file, the embedded JS redirects your browser, your browser loads a text file.

Same end result, same client-side probing opportunities, and without needing to rely on swinging JS into the end document like Patrick Bateman swinging an axe into his coworker's face while ranting about "Hip To Be Square".

Or better yet: just don't do this and only care about the raw bandwidth consumed, instead of actively making the World Wide Web a worse place with "clever" tricks like anally probing my browser to see if it's browsery enough for some arbitrary standard of browserness (for the sake of a "protection" that's almost certainly trivial to break with one of the umpteen headless browser solutions anyway).


They can just make it into a web page, and make all extensions parse that web page instead of a TXT file.

A bit ugly, but works.


Stupid question, but can't they rename the txt to .html, put <html> on the front, </html> on the back, and use cloudflare to cache/proxy the content?


I suspect that the real problem isn't the type of content but the fact that it is being requested by robots at a high rate. If you had a text file that humans were actually browsing and reading I doubt Cloudflare would care, but humans read files at a much lower rate than is occurring for this text file.


Say you do that and I DDoS the easylist.html. Cloudflare will start applying their DDoS mitigation. Now everyone's app receives the Browser Integrity Check instead of the list.


What about some form of P2P for this?


This would be a great use case for IPFS


Can’t this just be mirrored?


Here we go: text/plain is not web content. Cloudflare defines the internet.


Can they not block Indian IPs from the Cloudflare dashboard?


That would prevent 1.4 billion people from ad-blocking. Not sure if we want to use these kind of blanket measures as a first response.


The logic here is that it's better to block 1 country and keep it working for everyone else than to leave it as it is and break it for everyone.

It's not ideal, but until the problem is fixed/better solutions are found, I think it's a good "first response".


No, it would block 1.4 billion people from that one specific URL.


I think the free plan allows this. Seems an easy solution.


It'll have a huge impact on anything making legitimate use of the list. Ad blockers on sensible browsers will stop working, etc.

It may be easy, and it may even be the only option, but it's a bad one that will need some thought from the maintainers I expect.


The only thought it would require would be thought they should have been putting into their fetching strategy in the first place.



