A great opportunity right now for CloudFlare to win some goodwill and PR by helping out EasyList for free.
But what about simply enabling a firewall rule and showing a captcha or similar if the origin IP is from India and requesting that URL, until the situation is under control? I did that recently with CloudFlare's free plan in a similar situation and it worked perfectly (of course on a much smaller scale).
The apps behind these requests cannot render the captcha, as the fetch happens in the background.
However, what you can do is match the user agents and return a global/catch-all adblocking rule that blocks all the content of all pages (by blocking the body element).
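As a rough sketch, assuming the clients honor standard Adblock Plus filter syntax, that catch-all list could be as small as:

    ! Please stop fetching this list on every startup; cache it or host a mirror.
    ##body

A line starting with "!" is a comment and "##body" is a generic element-hiding rule, so every page would render blank until the developers react.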
The app developers are going to notice the issue very fast (because users are reporting the problem), and mirroring the lists or adding a cache is immediately going to be their priority.
Generally, I like the idea of filtering on user agents and returning a “block everything” rule. No need for geoblocking. Insert a comment about why this is happening and ask for it to be changed.
However, as we’re living in the real world and the authors of the respective browsers strike me as lazy or uninterested, I also bet all that would change is the user agent.
"User agent" is a synonym for "browser". When you say "user agent" here, what you really mean is the contents of the header that identifies the user agent, i.e., browser. Calling it that is a little bit like referring to Chrome's developer tools as "Inspect Element" (based on the mistake that that's supposed to be its name, rather than recognizing that the label is just a simple, descriptive verb/action).
Imagine, say, you update the list to block all URLs, and it impacts some municipal government worker’s ability to update some emergency alert service and causes hundreds of people to be permanently injured.
I don't think so. Google often knowingly and intentionally breaks apps (through API deprecation) because it's more convenient for them or because maintaining the API is costly.
Nothing criminal there.
Same for Easylist, if they decide that a quota of 100000 requests per IP+UA per day is the maximum, that's their choice.
They owe nothing to the consumers of the lists.
That being said, EasyList actually benefits from being distributed in many apps; it is really valuable to influence/control adblocking lists, so the more flexible they are with the browser developers, the better (I guess).
I think you misunderstood what parent was referring to. The idea was to poison the block list so that any browser matching their criteria (user agent belonging to DDOSing browser) would block everything.
No one is forcing anyone to use this tool; they have every right to send an alert indicating that the product a user is using has been abusing their service.
Very much in the same way that image hosts used to change an image for those hotlinking directly to images in the early days of the net.
I appreciated the parent's comment because it points towards an interesting direction. No one is forcing anyone to use this tool, no one is forcing anyone to steal their food. In terms of individuals acting in line with expectations, the individual poisoning their own food as a trap shouldn't inconvenience anyone if everyone's being civilized.
Providing a service (which you expect others to consume) and then not only deciding to refrain from providing, but "poisoning" the output, is an interesting move. We don't consider them equivalent, but in a case where this application was providing some essential service that is not easily replaced, and physical harm was a result, how do we consider it?
I don't think you can remotely compare the two, and no physical harm is actually done. And if an extension stops working because it depends on a list, the list can be removed, the extension can be disabled, a different browser can be used. Ad blocking isn't an essential service that can't be easily replaced, and it isn't being provided as anything but a voluntary service with no uptime or availability assurances.
The made-up scenario of this preventing some critical task from being accomplished is a stretch at best.
True but I bet 99% of CloudFlare's income comes from companies that wish to see EasyList die in a fire. I'm pretty sure this would factor into their strict enforcement of the 'rules'. I mean, this is something between github and CloudFlare right? And github sure hosts a ton of other .txt files and other stuff that's not 'web content'. They don't enforce it so strictly with other sites.
Still, I'm sure the 'community' can figure out how to keep something like this online. I'd be happy to pony up some cash for decent hosting and I'm sure many would be. If that doesn't work out, something like ipfs, a torrent or whatever.
I am following up internally. Looks like there's a combination of this data not being cached and our systems thinking a DDoS was happening (which it sort of was). But I'm getting the full story now.
I'm glad they sorted it out, but I wish there was a proper support route other than "create a sufficient media storm so that an employee tweets the CEO"
This doesn’t sound like bullshit to me. Serving a static text file that is primarily used by applications is not in line with their terms of service.
Cloudflare provides a significant service to the free and open web by subsidizing the hosting costs of static content for websites. They give that away for free under what appears to be reasonable terms.
But it is web content, really. A txt file renders fine in every browser.
Many websites also push text through HTML as part of AJAXy stuff. If they actually enforced this for all sites, their service would no longer be usable.
You're missing the point. The service cloudflare donates isn't free. That is the whole point of EasyList’s post. There are plenty of comments on this submission doing back of napkin math to find a reasonable monthly cost for hosting that text file. If you want to donate that bandwidth - go for it.
But the comments here about Cloudflare’s ToS read a lot like folks feeling entitled to getting bandwidth for free. Cloudflare is providing a very specific service for free, and it does a lot of good.
It would be great if Cloudflare decided to donate. But I’d re-evaluate your stance if you’re feeling entitled to their resources.
You're missing the point. If Cloudflare's issue is with bandwidth, then they should say so and leave it at that, not conjure up this pathetic excuse about .txt files somehow not being "web content". Does wrapping that data in <html><body><pre> </pre></body></html> magically fix the bandwidth issues?
Just making a file valid HTML doesn't make it "web content". This file is being fetched by an application, not being viewed by a user.
I'm not sure this is the most reasonable rule, but there are definitely some beneficial aspects to it. For example, the load on human-viewed content is limited by how often people want to view it, not how often their browser wants to redownload it.
> Just making a file valid HTML doesn't make it "web content".
By Cloudflare's rationale it does.
> For example, the load on human-viewed content is limited by how often people want to view it, not how often their browser wants to redownload it.
Bandwidth is bandwidth. If 100,000,000 humans want to download a 10KB text/html page v. 100,000,000 programs wanting to download a 10KB text/plain file, both within the same time period, then that's going to be the same degree of load on Cloudflare's end.
Actually you're missing the point. It doesn't seem like many people are condemning Cloudflare for not serving a bandwidth-heavy file for free (FTA: "CloudFlare does not allow non-enterprise users use that much traffic").
Rather what's being condemned is this nonsense customer service characterization of a text file as somehow not "web content". Easylist.txt is a data file that could just as easily be in JSON (and be larger). Furthermore, as it stands easylist.txt actually looks like it's a valid text/html file, as browsers generally don't insist on <html>/<body> tags. So from both directions it seems like the customer service drone has thrown out this nonsense just to short circuit having to do their job.
I like the HN approach of taking a charitable interpretation of their message.
Clearly EasyList lived on their free tier for a long time without interruption. Only when they used excessive bandwidth did ToS enforcement happen. When they reached out for support, the support agent rightly pointed out that this isn't a website file.
Reading the ToS, the support agent's message appears to be correct. Text files are fine (as is pretty much any format) as long as they aren't the main focus of the HTTP server Cloudflare is fronting. Robots.txt would be fine; turning the list into XML or HTML would not be fine. In this case, the text file isn't there to support the web content of EasyList - it's distributing a text file to applications.
The agent could have added additional context but their message is valid.
You're still conflating "free tier" with "not a web file". According to the article, they are separate issues and Cloudflare wouldn't be willing to host easylist.txt even on a paid plan.
Meanwhile -
1. easylist.txt is used by every single web page I visit. So the overall purpose argument fails.
2. Web pages commonly use non-directly-renderable data files in formats like JSON or XML, so the file purpose argument fails.
3. Text files are and continue to be one of the major formats displayed by browsers. So the file type argument fails.
4. The size of the file is in line with other files cached by Cloudflare. So that argument fails.
If the Cloudflare support rep said "we just don't feel like doing business with you", that would be a different thing. But instead they're throwing out some arbitrarily-framed unfalsifiable reason as if it's a logical justification. And no, customer service drones and corporate policies don't deserve a fundamental benefit of the doubt, per contra proferentem (ambiguous terms should be construed against the drafter). It's impossible to know what they actually mean here besides "we don't like it", and that is the problem.
But it's not JSON or HTML. And it's not meant for browsers. It's clearly a dataset as a text file and not meant as "web(page) content". What's nonsense about that when it's completely accurate?
It's not for displaying a webpage but to power a separate application. Just because you can serve any kind of file over HTTP doesn't mean it's for serving a website. There's a reason Cloudflare doesn't allow large video files either - even though that probably counts even more as "web content".
This is a very straightforward interpretation of the terms and it's strange to see such pushback based on a pedantic technicality when it's clear what the file is being used for.
In fact, if this was the opposite situation and some automated rule was involved in isolating this file, I expect the same people would then want human intervention to clear up the difference based on the context.
> There's a reason Cloudflare doesn't allow large video files either - even though that probably counts even more as "web content".
And that's just as nonsensical. Bytes are bytes; the rationale should be based on bandwidth, not on arbitrary micromanagement of the format of the data consuming that bandwidth. If I encode that video in a giant self-contained blob of JavaScript that feeds the pixels into a canvas or something similarly ridiculous, does that magically fix the bandwidth issues?
No. It is about excessive resource/bandwidth usage. The customer service reply isn't great, but the context itself is clear: this file is clearly not for displaying any part of the easylist site's web presence.
Again this is that "pedantic technicality" - why such a fuss when the actual issue is straightforward, and also clearly understood and reiterated by the easylist team themselves in the post?
> It is about excessive resources/bandwidth usage.
Then the policy would focus on that rather than micromanage the format of the data using those resources/bandwidth. Again: bytes are bytes.
> why such a fuss
Because a policy as nonsensical as "no non-HTML files allowed" artificially limits the usefulness of CloudFlare for precisely zero legitimate reason. I ask again: does wrapping a video in a blob of JavaScript fix the bandwidth issues associated with hosting videos? If I have a 10MB MP3 downloaded 1,000 times v. a 1MB HTML/CSS/JS static site downloaded 10,000 times, what difference does it make?
The difference is in the amount of cache you need. In one case you save 10GB per 1MB of cache, in the other just 1GB per 1MB and the big file is going to evict many small files (even if the user only listens to the first 10s).
It's no huge difference for a single user/site, but across all users this quickly means needing a multiple of the current cache, which doesn't come for free.
Also, CF have a product to sell. The free tier is just the demo version: I think at the end of the day the policy is about not having everyone on HN use CF for their low-cost DIY video and/or music streaming or download platform.
And I can totally see them reversing the decision and sponsoring that project (it's probably something a support engineer has no power to decide).
The issue is about bandwidth and resources. The policy is generic to provide reasonable allowances but stop usage that's clearly outside the limits. That's how "unlimited but with oversight" plans work.
The majority of legitimate traffic to these txt files is from browsers (extensions) for the express purpose of displaying websites to a user's specifications (without ads in this case).
A reasonably low estimate is that 20%-30% of global internet users are behind some kind of adblocker, almost all of which are subscribed to EasyList by default. So this txt file is potentially responsible for the way a BILLION+ internet users see and interact with nearly every single website.
Cloudflare's claim that this isn't web content is full-on reality warping. It only makes any kind of sense when masked under layers of abstractions and lawyer speak.
---
None of this should even matter though; someone seeing the big picture at CF should have done the napkin math and realized that the easylist bandwidth pays for itself.
Ignoring the soft cushion of the huge amount of globally distributed caching of these files, if easylist suddenly stopped working for a week then global bandwidth usage could see a spike. A pretty little chunk of which CF may be on the hook to absorb at no cost.
But the support email didn't say it was a violation of the ToS because it is primarily used by non-webpage applications (the ToS does say APIs may be allowed, but doesn't have specific details on when, AFAICT); it says it is a violation because it is a txt file, and that a txt file couldn't be part of a website. In fact, robots.txt and security.txt are standard and common txt files served as part of websites. In the case of robots.txt, it is also primarily consumed by web crawlers, not used for actually rendering web pages. Does that mean robots.txt violates the ToS too?
Interestingly, one could host this on a WWW frontend for Git. Then you'd only need to download (say) a daily diff. Why download the entire list when you can match a checksum?
The Pwned Passwords project by Troy Hunt is served from CloudFlare's cache. I don't know the scale of Pwned Passwords' bandwidth usage, but CloudFlare could definitely make a similar arrangement here too.
This is a bit different though. You are basically taking away a main revenue stream from websites, your main clients. That sounds like bad optics for them.
Sounds like they'd probably be in for at least $500/mo on this which doesn't seem like a lot if you're serving the amount of data EasyList is doing, but is a lot if your previous hosting costs were "free".
I’m not sure a captcha would help though. These aren’t intentional attack requests, they’re “legitimate” requests by a clueless developer’s app that happened to get popular.
They just need to serve either an empty response or an intentionally broken rule to break the misbehaving browser and force its developers to fix it.
> EasyList is hosted on Github and proxied with CloudFlare. Unfortunately, CloudFlare does not allow non-enterprise users use that much traffic, and now all requests to the EasyList file are getting throttled.
> EasyList tried to reach out to CloudFlare support, but the latter said they could not help. Moreover, serving EasyList actually may violate the CloudFlare ToS.
Seeing the comments from Cloudflare here, looks like the HN machine has yet again worked its magic to get appropriate attention!
They are already serving access denied replies, so I assume they can identify the browsers via user agent or similar?
If so, returning a bogus file that blocks everything and adding a comment in that list asking the developers to use caching or mirroring the file should be fine.
I wonder if those browsers honor the list when fetching the update though. Would be awesome if you could just add easylist and lock out further requests right on the device.
Everyone in the world is impacted if the site goes down under load. Changing that to everyone in a particular country (perhaps with a given user agent if the free plan allows expressions) would still be an improvement even if other work is needed.
If I recall correctly there was some image on Wikipedia that was getting billions of downloads a day or something, all from India, because some smartphone had made it a default "hello" image and hotlinked it.
Unfortunately, I can't find a reference to it anymore.
Not that you’d do it, but the temptation there is always to repoint your real application to a different URL and change the original image to something subtly NSFW.
I was debugging a similar issue where a small marketplace run by a friend was being scraped and the listings were being used to make a competing marketplace look more active than it actually was.
The thing is, they didn't host the scraped images themselves, they just hotlinked everything.
So through a little nginx config, we turned their entire homepage into an ad for my friend's platform :)
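For anyone curious, a minimal sketch of that kind of nginx config (the domain names and paths here are placeholders, not the real ones):

    # Serve an ad for the real marketplace to anyone hotlinking the listing
    # images from another site; normal visitors are unaffected.
    location /listings/images/ {
        valid_referers none blocked friends-marketplace.example;
        if ($invalid_referer) {
            return 302 /static/visit-the-real-marketplace.png;
        }
    }

valid_referers sets $invalid_referer for any Referer not on the list, so requests embedded in the scraper's pages get redirected to the replacement image.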
In case anyone is inspired to do related things, I made a mistake once (troubling and embarrassing), which I'll mention in case it helps someone else avoid my mistake...
In earlier days of the Web, someone appeared to have hotlinked a photo from a page of mine, as their avatar/signature in some Web forum for another country, and it was eating up way too much bandwidth for my little site.
I handled this in an annoyed and ill-informed way, which I thought was good-natured, and years later realized it was potentially harmful. I'd changed the image served at that URL to a new version, onto which I'd overlaid text with progressive political slogans relevant to their country. (Thinking I was making a statement to the person about the political issues, and that it would be just a small joke for them, before they changed their avatar/signature to stop hotlinking my bandwidth.) Years later, once I had a bit more understanding of the world, I realized that was very ignorant and cavalier of me, and might've caused serious government or social trouble for the person.
Sensitized by my earlier mistake, I could imagine ways that a subtly NSFW image could cause problems, especially in the workplace, and in some other cultures/countries.
Yeah, you could get someone gulag'd pretty easily if you wanted to and they were in the right location.
Subtle things like flipping the image upside down or reversing the colors or other "not quite harmful but quite annoying" responses are probably better, or just serve a 1x1 pixel image of nothing.
Many years ago, back when eBay didn’t even have their own image hosting, I found someone hotlinking to the images from one of my completed auctions for their sale (of an identical product). I ended up swapping the images for ones from urinalpoop.com (seems to no longer exist, but at the time it featured pictures of exactly what you’d imagine by the URL). I ended up getting an angry message from the seller accusing me of “hacking” their auctions.
I still have a 5k pixel square blank white gif on my site for times like that (~4kB) that I sub in for anything that gets requested too often, or from particular places.
I was getting hotlinked from controversial sites a lot at one stage, and the common forum software they used didn't force image sizes. So a 5k pixel wide image pushed most of the content off the screen thanks to a centred element :)
I remember from a long time ago something about an image that was corrupted and did some internal self-referral, so you could crash applications through out-of-memory issues even though the image was only a couple of kilobytes. I might have to find it again to serve to hotlinkers!
Widespread in the sense that social media users have done it for a long time, and Chinese users sometimes counteract it by rewriting those into pro-regime phrases, but it's not considered safe for commercial entities to exploit. That one is not a professionally produced film.
My mind must be in a dark place, because once you mentioned politics I thought of how, just sitting at home, I could easily come up with some kind of image that could literally imprison or kill someone off from thousands of miles away, without even getting up from the couch. I think I spent most of my internet youth lusting for such power.
A startup I used to work for had a horror story from before I started, where a small .png file had been accidentally hotlinked from a third party server. The png showed up on a significant % of users' custom homepages (think myspace, etc). At some point the person operating the server decided that instead of emailing someone or blocking the requests, they'd serve goatse up to a bunch of teenagers and housemoms. Mildly hilarious depending on your perspective, I guess?
This once happened with a particular South Korean news website that shamelessly stole and hotlinked a JavaScript file from a third-party website. The domain owner responded by replacing the file, and the website in question showed a warning message and was tilted [1] for a while.
I worked on an ad-blocker a few months ago. I made the decision to have the filter-list files hosted on our own domain and CDN (similar to what Adguard does with their filters.adtidy.org).
This was done for 2 reasons:
1- Avoid scenarios like this, where you ship code (an extension in this case) that is hard to update and then make that code depend on external resources outside of your control.
2- Avoid leaking our users' IP addresses to random hosting providers.
So the solution was simple: run a cron job once a day and host the files ourselves. Pretty happy with that decision now.
Except neither of those would help in this case. They’re already using their own domain name, and it’s unclear how they would even build their own CDN given that scale of bandwidth - AdGuard said they’re still pushing 100TB of access denied pages a month for their similar case. That is a LOT of bandwidth just for access denied messages.
Their point isn't that EasyList could have done anything differently, their point is that they are glad that they didn't decide to rely on others' infrastructure for their own ad blocker, because that makes them resilient against the fallout from this and similar.
Except it wouldn’t make them resilient since, as I pointed out, neither of the things they did would be of any help at all to Easylist in this situation.
It’s great that they’re happy with their choices, but the choices would, in this same situation, likely saddle them with a crippled infrastructure and/or some insane bandwidth bills for suddenly pushing 100 extra TB/m.
EasyList got here because they want all (respectful) apps to be able to use their list. They invited traffic, the problem is only occurring because this unknown browser violates the implicit "as long as you're considerate" rule.
OP, in contrast, wrote their own ad blocker targeting their own servers. They're in control of their ad blocker code and can write it to be respectful of their servers. They're not hosting the lists with the intent of allowing other people to use it, and they're unlikely to attract lazy app developers because the endpoints are (presumably) not listed publicly on the internet to anyone who wants an easy ad blocker list.
It was not implied, you added that implication yourself and started responding to things that were not said, which is why the other commenter who replied to you was also confused, my friend.
Edit:
OP was implying their approach was better for themselves than relying on third party servers. It’s hardly obtuse, it’s a related discussion from someone who would otherwise be impacted by the throttling.
This sounds like a similar issue to one Linux distros used to have; I'm not sure how many still do this, but they use alternative mirror sites.
On download.easylist.com, have it shuffle and send a redirect to a mirror to download the list. I wonder if universities are still offering these small amounts of space for open source projects?
Access Denied is an HTTP status code, not a page that you serve. 100TB per month of status codes suggests something like 30 trillion requests per month. Is that possible?
Since they added "Access denied" for misbehaving browsers, can they instead serve them some sort of bad response that will "surface" the issue to the users? Depending on what would work better and cost less... (1) a small list that would block major legitimate sites. Whoops, the browser is unusable, now users complain to the developer to fix the issue, or abandon it. (2) "hang" the request if the browser loads the list synchronously; blocking the UI thread is a hallmark of a bad developer, so they might be doing just that. (3) stream /dev/zero. Might be expensive; maybe serve a compressed zip-bomb if the HTTP spec allows it and/or browsers will process it?
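For option (1), a sketch of what such a tiny replacement list might look like, again assuming standard Adblock Plus filter syntax (the domains are only illustrative):

    ! Temporary list served only to the misbehaving browser.
    ! Your browser is overloading the EasyList servers; please contact its developers.
    ||google.com^
    ||youtube.com^
    ||facebook.com^

Each "||domain^" line is a network-blocking filter; depending on how the client applies filters to top-level documents, this either blocks those sites outright or breaks most of their resources, so users notice immediately.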
...Use of the Services for serving video or a disproportionate percentage of pictures, audio files, or other non-HTML content is prohibited, unless purchased separately...
A huge text/plain artifact, requested often, would seem to fall into that category of "disproportionate percentage" compared to text/html served.
This limitation apparently doesn't apply to R2 / Workers [0].
Maybe EasyList could host them there? That's what we do [1] (and the dashboards show 400TB+ per month [2], likely inflated by the traffic between Workers and Cloudflare Cache).
Cloudflare can decide whom they want to do business with. But a plain text file is in my opinion sort of HTML. At least it is not "non-html" content. A .pdf file would be non-HTML content.
It's also important to note that the client is the one being abused, not the one abusing the service. That should be taken into consideration when deciding whether someone is breaking the ToS.
I'd agree that's weird. Seems like if it were simply renamed to .html with no content changes, then it would be okay.
> It's also important to note that the client is the one being abused, not the one abusing the service. That should be taken into consideration when deciding whether someone is breaking the ToS.
My understanding is that this is moot. The issue from Cloudflare's perspective is only that the content is non-HTML and doesn't have anything to do with the rate of traffic (the abuse).
> (i) serving web pages as viewed through a web browser or other functionally equivalent applications, including rendering Hypertext Markup Language (HTML) or other functional equivalents, and (ii) serving web APIs subject to the restrictions set forth in this Section 2.8.
The key is "as viewed through a web browser" imo, this is not really an API and it's not a webpage; it's a datafile and would fall into R2 or similar things.
Why do people keep talking like you can't just navigate to a txt file in your browser and have it serve as any old web content? Which is something I have actually done many years ago to search for a domain in these types of lists.
Cloudflare is balancing on a razor's edge with this TOS technicality.
The TOS aren’t referring to content-type headers, magic bytes, TCP headers, browser support of file formats, or any technical implementation.
To oversimplify, they’re saying Cloudflare’s service is to be used for serving websites to browsers.
Serving a static text file that is primarily used by applications is not in line with their terms of service.
Cloudflare provides a significant service to the free and open web by subsidizing the hosting costs of static content for websites. They give that away for free under what appears to be reasonable terms. I’m not sure why you’re trying to “gotcha” through their ToS.
It would be great if Cloudflare would donate resources to EasyList - it would do a lot to help the free and open internet by giving users more power over what gets delivered to their browser. But call that what it is: a donation.
> I’m not sure why you’re trying to “gotcha” through their ToS.
People are doing the opposite: pointing out the hole and asking them to write a better rule. Surely they don't want the list merely converted into HTML.
> They give that away for free [...]
So they should specify things that influence cost, such as total bytes served, number of files, etc. Currently all you can do is bypass the rule, because you don't know how to cooperate.
The minimal spec valid HTML5 document is currently:
<!DOCTYPE html>
<title>a</title>
Practically, browsers will accept omitting both of these, and the spec even allows omitting the title "if it is provided by a higher-level protocol".
So it's not that crazy an argument that a plain text file is an HTML document.
They serve websites to browsers for people to view. This file (be it properly formatted .html or .txt) is not a website people go to in their browser - it's used internally by an application. This is the key point.
You're looking at it backwards though. CF doesn't _actually_ care about what the content is, only that they can apply their DDoS protections to it. If you're serving a text file that's much more difficult as they can't replace it with their own content.
My best guess is that CloudFlare wrote this to prevent folks from serving big binary files like photo, music, or video and this txt file case was an unintended condition that happens to work to CloudFlare's advantage.
text/plain, though, is decidedly not text/html, and I would expect CloudFlare to potentially do some on-the-fly optimizations that are aware of the structure of an HTML file and save terabytes a day at their scale.
If web APIs were a disproportionate amount of what was served on some customer's specific free CF plan, as compared to the cached HTML, then that doesn't match their ToS.
I imagine that might help with automated ToS rate limiting, but eventually someone at Cloudflare will probably cut them off. It's plain text, but it's basically serving a distributed database. And a hint at their scale is the 100TB of "Access Denied" served up monthly.
Cloudflare just seems to be trying to limit the free tier to "caching website html for the purpose of showing it to humans". They have pricing and plans for things other than that.
Simple, but will it break all sorts of automation down the line? All the other ad lists are txt and I don't know how they would handle other file types, even if the content is unchanged.
I hear you there. I'm more thinking someone probably hard-coded the txt file extension somewhere, so something is likely to fall apart simply in handling that file.
Transmitting in-band (headers) seems ripe for arbitrary complexity. Someone out there would write a Turing-complete header DSL. And then someone else would write an incompatible alternative implementation.
At least file extension is limited and externally visible (and thus accountable) to third party behavior, which should limit the worst complexity excesses.
Is filesystem metadata actually different (theoretically) from extension? Or just data in a different format?
Extension seems a nice balance between simplicity / brevity and utility, albeit as a hint, not a commandment.
From a legal perspective I can understand such a wording, but I wonder why an engineer would simply tell a (non-paying) customer that they violate the ToS, without thinking about it.
I mean, one could simply wrap the content in an HTML body and change the extension, but that would actually increase the data load for no good reason. So it is complete nonsense to complain about txt files being served.
Sounds like they're just using the wrong service. R2 is designed for object storage, and has zero egress fees. That'd be the way to go. Not sure why the support engineer didn't mention it. The standard Cloudflare web caching probably doesn't work well for this use case for whatever reason. The price is only $0.015/GB/month, so the ~MB(?) of list would be served in perpetuity for less than a dollar.
They're probably still getting many millions of requests a month, so probably more than a dollar, but even 20 million requests a month would only cost $3.60 (the first 10 million are free, then 10 million @ $0.36/million).
I assume you probably know this, but just wanted to share that there are some pricing scales with R2; they're just pretty generous for a lot of things.
Actually, you're right. How would this work? Is Cloudflare really willing to foot the bill of 20 TB of bandwidth per day for a small text file that costs $0 to store?
Yes, why not? For reputation and attracting developers it seems to be worth it.
If it costs 75K USD/year, that's already paid back with one big enterprise customer only.
Though adblocking is a big business; many actors there are getting large revenues.
For example, Eyeo's income was 50 million USD per year last time I checked (and I guess most of it is actually profit), so they can find a solution if they really want.
Egress is free but not public, i.e. you can't just give anyone a URL. You have to use your own server to fetch content from R2 and then serve it to your visitors. Each fetch costs money, but the first 10 million reads are free, and your own server probably has egress fees.
Imagine you're trying to block a DDoS attack. If the client is downloading HTML then they likely also have JS enabled giving you a ton of options for running code on their computer to help you decide if the traffic is legitimate.
If they're downloading text you can still use the headers, and some tricks around redirects, but overall you have far less data on which to decide.
Cloudflare caches robots.txt by default when proxied (the only .txt-file that they automatically cache), for all other content the following from their ToS probably applies:
> Use of the Services for serving video or a disproportionate percentage of pictures, audio files, or other non-HTML content is prohibited, unless purchased separately as part of a Paid Service or expressly allowed under our Supplemental Terms for a specific Service.
We will never know the reasoning of the support agent who replied to the EasyList maintainers, but I can imagine that it is indeed disproportionate for EasyList.
I really hope that Cloudflare actually sees that they are making a wrong decision here and actually help the EasyList maintainers.
text/html is not text/plain but that doesn't matter: it's not a technical limitation that caused cloudflare to draw this line.
it's cloudflare deciding to protect "web content" and not videos or .iso images or other things that normally are not commonly served while you browse a contemporary website and read HTML.
> it's cloudflare deciding to protect "web content" and not videos or .iso images or other things that normally are not commonly served while you browse a contemporary website and read HTML.
That's false in two ways: first, text is normally served while you browse a contemporary website; second, so are images, which are explicitly called out as potentially violating this clause. Text is the only data that isn't covered by this clause.
Sponsorblock has saved me hundreds of hours of watching YouTube ads and other time-wasting bullshit. The devs deserve to be paid for making this awesome application.
> Companies are also paying influencers, twitch streamers, and YouTubers to promote their products in a way that conventional ad blockers can't prevent.
Which I'm okay with in the same sense that I'm okay with newspaper/magazine ads or billboards or TV/radio commercials: they're annoying, but easy to ignore compared to online ads chewing up CPU time and battery life while actively violating one's privacy.
> in a way that conventional ad blockers can't prevent
Yet. One day someone will create an ad blocker with machine learning that "sees" the ads and deletes them in real time. Should work on all content types, even on augmented reality.
This issue caused CF to irreversibly ban them though, so it's not "just a bandwidth issue" anymore.
> Based on the URL that are being requested at Cloudflare, it violates our ToS as well. All the requests are txt file extension which isn't a web content
> you cannot use Cloudflare to cache or proxy the request to these text files
> This issue caused CF to irreversibly ban them though
Do you have a source for that? The article only mentions them being throttled + the screenshot with the support engineer saying they seem to be breaking the ToS and asking them politely to move back into compliance.
To me, getting banned is when the provider locks out (or just deletes) your account and prevents you from using their service entirely.
CF didn't do this. They sent them an email telling them that what they were doing was a violation of their TOS and to cease doing it. They did not kill off their account. They still have the option to comply and continue with CF, which seems to be what they are going to do at the moment.
Hopefully, CF will grant them amnesty on this one. At the end of the day, an HTML file is just a text file, so I don't see why this would have even mattered to begin with.
Rate-limit by GeoIP for the affected areas, dropping requests if they exceed more than 20% of active traffic, i.e. the service outages get co-located only with the problem users' areas.
Also, when doing auto-updates: always add a random delay offset of 1 to 180 minutes to distribute the traffic load. Even in an office with 16 or more hosts this is recommended practice to prevent cheap routers from hitting limits. Another interesting trend is magnet/torrent links being used for cryptographically signed commercial package distribution.
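A sketch of that jitter for a client that updates via cron (the URL and cache path are made up):

    # crontab entry (assumes bash, so $RANDOM is available):
    SHELL=/bin/bash
    # Fetch once a day, sleeping a random 0-180 minutes first so thousands of
    # installs don't all hit the server at the same instant.
    0 4 * * *  sleep $((RANDOM % 10800)) && curl -fsSL -o /var/cache/adblock/easylist.txt https://lists.example.org/easylist.txt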
Free API keys are sometimes a necessary evil... as sometimes service abuse is not accidental.
That would only work if they had an API; AFAICT, they're just hosting a file.
At this point, they might be better off coordinating with the other major adblocker providers and just outright move the file elsewhere. Breaking other people's garbage code is better than breaking yourself trying to fix it. Especially on a budget of $0.00.
If the defective code for the browsers is in public repos, it might also be more effective for someone to just fork the code, fix the issue (i.e. only download this file once a month, instead of on every startup), and at least give the maintainers a chance to merge the fix back in.
This could allow client-specific quotas, and easy adoption for maintained projects in minutes. Thus, defective and out-of-maintenance projects would need to be manually updated or would get a 404.
API keys are most successful when they're issued for server-side use; when used client-side the usual pattern that I see is for individual clients to request their own API key?
In this case, it would need to be distributed to myriad users who legitimately need to ask for the lists and then could be scraped by the "attacker", but at least then they'd have to be knowingly malicious vs. accidentally malicious.
You generally add a small "cost" to request an API key. For example submit your email to this form and wait a day.
Then browser makers like this will not reasonably be able to request a new key automatically for every install. So they will just request one and ship it.
Then when you get abuse like this you can disable it.
Moving the file elsewhere won't fix it. They are serving terabytes of traffic on Access Denied, it won't go away if that changes to "Not Found" instead, the developers seem already entirely ready to ignore their adblocker just not working.
The query limit on the Let's Encrypt SSL service operates in a similar manner. If you hit it more than a few times a week for the same domain/token/key, then you are banned for 5 days.
In general, it is easier to set up filters after differentiating legitimate from nuisance traffic. For example, fail2ban looks at the log of errors from invalid hits, and bans the IP or entire ISP block ranges for 5 days. This ban then gets propagated to the rest of the web via the spamhaus.org listing.
i.e. the users start to see the world's internet become unreachable, as admins start to block traffic at the NOCs' routers, and so on... India knows about Karma.
I am more surprised the app store for the apk isn't getting sued for theft of service.
Good luck finding the legal contact, not to mention suing, some random developer in India who apparently already abandoned the project.
I will also point out that if you fail2ban the IPs requesting the spamblock list, it could become worse if the browser just retries endlessly in the background. The traffic for a 404 page could be much smaller than the traffic of the very same devices trying again every few seconds, constantly, instead of only checking that 404 every app restart.
In general, the client socket needs to time out on black-holed connection attempts (several minutes), and the server never sees a TCP handshake packet if the IP is on the global routers' ban lists.
As a side note, some people build spider traps that reply with a pre-baked bzip file as a spoofed compressed HTML response. Thus a client program dutifully decompresses a document a few TB in size, and the browser exits due to memory issues. Note most modern browsers are wise to this trick these days, but I doubt a dodgy plugin's disk-usage limit check would catch a client-side storage flood. People shouldn't do this though, even if it is epically funny and harmless. =)
The client can just set a custom timeout and close the socket after 5 seconds of no or low activity. Then try again.
On your side note: EasyList probably does not want to get sued for starting to distribute malicious content to users (and crashing your browser on purpose is arguably malicious).
There are all sorts of games people could play, as a DoS is technically an act of war under some legal systems. For example:
1. take the top 200 most popular websites in the given nuisance area
2. add ban rules to a version-B list that also includes all social media, search engines, and Wikipedia.
3. Look at the user-agent string for that specific problem client, or for extreme API-key quota abuse
4. Randomly serve the version-B filter list, which breaks the browsing experience after a frequent update. Increase the random breakage until traffic rolls off to normal levels.
The TOS for the ban list file does not specify which sites it will ban, and most users will just assume it is the App that is broken (it is already). People should not do this either, even if it is also funny and relatively harmless. Also, suing people while participating in an attempted crime probably would not go well. =)
Even then I would not do this without clearing it with a lawyer first. You could still end up at the wrong end of a lawsuit that you'll have to defend in India.
There are numerous legal/accounting specialists that protect businesses and investment decisions. We already won't serve _any_ content to IN networks as business policy... so we are unlikely to ever have to visit with the cobras.
BitTorrent - switch to that in the long term. Not saying every end user should be a seeder, but there is a big BitTorrent community out there and everyone could help a little bit.
Other options:
- A kind of mirror network (it only needs to make sure that integrity can be checked, maybe with a public key)
- And while doing that, why not also support compression (why not? only devs need to read it and they can easily run a decompression command); every bit saved would help.
IPFS is just distributed content discovery and download.
Someone advertises the content to the network, then people looking for the content can find it. Usually the people who find it advertise that they now have it to the network too.
It has nothing to do with cryptocurrency but is commonly used as a great way to embed "larger" content into blockchains and other immutable stores. It works well for this because 1. The CID contains a cryptographic hash of the expected content 2. You can change where/how you store the actual content without updating the URL.
The tl;dr I've been given is that IPFS is essentially one global DHT swarm with some searchable/discoverable magic on top. Not exactly the same as DHT torrenting, but close enough for the simple explanation.
Assuming it's not a kind of DoS attack, and since it sounds like they can detect the abusing clients (maybe by User-Agent)... some very desperate technical options involve serving an alternate small blocklist that does one of:
1. Try having it block subsequent requests for EasyList itself, just in case the frequent update requests are made with the prior blocklist in effect. (I accidentally did this before, in one of my own experimental blocklists, atop uBlock Origin.) Then the device vendor can fix their end.
2. If the blocklist language and client support it (I suspect they don't), you might safely replace or alter some Web pages, to add a message saying to disable EasyList in the client, or pressure the vendor, or similar. If this affects a lot of users, the meaning will also be spread in other languages to other users, even if not all of them understand any of the languages in the message. But be careful.
3. If you can't get a better message to the user, another option might be to block all requests, to prompt users to disable EasyList or vendor to fix the problem. But before doing this, you'll need to have verified that a combination of shoddy client/device software won't prevent users from using important functions of their devices for significant time. (Imagine this might be their only means of being connected online, and some shoddy client software pretty much prevents it from working, and the user is unable to access critical services.)
But before doing any of these desperate technical measures... First, I'd really try to reach people in the country who'll know what's going on, and who can reach and possibly pressure the vendor who's causing the problem. If tech industry people aren't able to help quick enough, reaching out to that government, directly or through your own country's diplomats/officials, might work. Communicating the risks of the desperate technical measures that you're trying to avoid (e.g., possibly breaking critical communications) could help people understand the urgency and importance of the situation.
They already have that part figured out. From the article:
> When we encountered a similar problem last year, we found a simple solution: block the undesired traffic from these apps. Even so, we continue to serve about 100TB of “Access Denied” pages monthly!
The difference is that serving Access Denied leads to the users of these malicious browsers just getting more ads over time, as the filter lists can't be updated anymore. Serving a special list containing popular sites would result in the users almost instantly not being able to access these popular sites anymore, resulting in requests to the developers to fix their shitty browser, or in users switching altogether.
Easylist should serve the Indian browser (based on user-agent) a giant file (expensive), a corrupt file, or some response which causes the app to crash. If the browser crashes on every startup due to a malicious response from the Easylist server, users will likely delete it.
Serving a giant file is going to affect their servers more than the end device. If they could identify the user agent it would be a lot easier to just block it entirely.
And a terrible idea because the users are not at fault, but the developers of the app. And in this case it might crash so they'd only report "app crashes on startup", if at all.
Then the browser could just pretend it is Chrome (if it isn't already) and that would easily work around your solution.
I think something like this would be best served by moving to IPFS or Bittorrent. A magnet link could be provided and then browsers and plugins could use that to download the file. That way, you can distribute the load.
So I understand some of the comments coming down on CloudFlare, but I would like to get to another point entirely.
People writing apps with crappy code are cratering EasyList through what is basically an unintentional DDoS. It is the apps that suck. Full stop.
I have noted over the last five to ten years the general retrograde usability of phone apps and desktop apps in general: bad UIs, inconsistent behavior, performance issues on stuff that, at least on the surface, appears to be trivial.
I don't understand what is causing the general slide in quality, but it is clearly visible, and it seems to me, untenable over the near term.
Is it the tools? Is it the app churn pressure? I really do not know, because I work in a very different part of the industry. We have our own issues there, but that has more to do with technology churn (particularly in wireless standards) than in the tools and platforms.
So what is up with the app world, in general, and the web app world in particular? I am all ears, because as I said, I don't work in that space.
Regarding the 100TB of Access Denied pages: just drop the connection instead.
To make the system more scalable: instead of directly serving the file, serve a bunch of URLs to mirrors plus a checksum. The client must pick one of them. You can randomize the URLs and maybe add some geo logic to it. Let people provide mirrors. An additional indirection step like this can prove incredibly powerful for systems that need to scale massively.
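As a sketch, that indirection response could be a small manifest like the one below (the mirror URLs and checksum are placeholders); the client fetches it from the central server, picks a mirror, and verifies the download against the checksum:

    {
      "version": 1,
      "sha256": "<checksum of easylist.txt>",
      "mirrors": [
        "https://mirror1.example.org/easylist.txt",
        "https://mirror2.example.net/easylist.txt"
      ]
    }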
It would seem like you could prevent hotlinking by adding 1-5 minutes of latency to every request to a list.
Almost no dev would hotlink an asset that took that much longer to display, at least in critical/common paths. It would force consumers (devs/businesses) of the lists to provide a caching/mirroring solution of some kind for their users.
But on the backend, the request would be designed just for updating the list cache. Handling 1-5 extra minutes per request, on a request that runs less than a few dozen times a day to update the mirror/cache, is trivial.
The issue with this approach is it's too late. It might work if you designed it from the start, but adding it now would only destroy your poor load balancer with all the connections it has to maintain (waiting for the 5 minutes to expire).
It was mentioned in the article that they are now serving up access denied, but the problem is one of just too many requests.
At this point, it's likely easier to just kill the domain altogether and get a new one.
This is certainly not a cure to the problem Easylist has right now. This is prevention. About how to design publicly consumable resources to naturally discourage hotlinking, before it is a problem.
This seems the perfect use case for letting a secure BitTorrent tracker share the lists, then either implementing the client in the browser, or having it as a system service that syncs the necessary files.
Yeah! Just share it through BitTorrent, this is the perfect example of where it can be useful. To be honest I'm not sure why this isn't built into browsers yet; the Oracle browser has one.
One downside is that BitTorrent doesn't really share blocks between torrents, so updating the torrent would result in a full download.
That being said the data is small so it probably isn't a big deal.
A better solution may be something like IPFS where with a rolling-checksum chunking algorithm you only need to download the changed parts of the list. You could even use IPNS to distribute updates for a fully decentralized setup.
> You pay 1¢ (or less!) for every Gigabyte of data accelerated over our cloud network.
doesn't seem clear-cut as they bundle it into subscription packages, but at this volume you'd surely need their enterprise package (10TB+), where you'd expect to pay $0.01/GB or less
Several open source projects, especially Linux distributions, solved this problem long ago by setting up a web of volunteer mirrors. They have docs and scripts to quickly set up an additional mirror.
The central HTTP server of easylist could then hand out 302s to fetch the actual file from one of the mirrors. Alternatively to 302s, a modern scriptable DNS server that uses the mirror list could respond with different IPs, round-robin (or even better if it's geo-aware).
DNS TXT records could maybe be used to serve a digest of the file, so that mirrors can't modify it without that being detected.
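A sketch of the 302 approach in nginx (the mirror hostnames are placeholders; split_clients is just a cheap way to spread requests across mirrors):

    # http context: pick a mirror per request.
    split_clients "$remote_addr$request_id" $mirror {
        50%   mirror1.example.org;
        *     mirror2.example.net;
    }

    server {
        listen 443 ssl;   # certificate directives omitted
        server_name easylist.example;

        location = /easylist.txt {
            return 302 https://$mirror/easylist.txt;
        }
    }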
I am not speaking on behalf of the company, but if someone involved with EasyList can contact me (avani@cloudflare.com), I'll see if there is a way to help out.
How do you determine that it is these browsers requesting it and not uBO? It would be easy for the browsers to set their user agent to something like Chrome.
I wonder what would happen if they renamed the file to robots.txt and then did a redirect at the cloudflare level for the current URL to robots.txt.
I imagine some (many?) clients would handle it poorly, but it would then be cachable at least. It's not exactly easy to test though without a level of unknown damage to legitimate users.
pool.ntp.org hands out specific subdomains for large-scale pool users. This way, it is possible to retire service for a subset of users that use devices that aren't updated anymore and are misbehaving.
The traffic issue is not just punted to DNS service. It's possible to return a cachable 127.0.0.1 response, and it's somewhat rare for DNS caches to be constantly powered up and down and reach out directly to authoritative DNS servers.
Haha, yeah. Just proposing to blanket punish an entire country for a few developers screwing up, and throwing a little racist joke in there at the end. Good one.
Pretty easy to lump 1B people together over a few bad apples. Think about what that would mean if the world applied that to (say) the US...
The thing with Access Denied is that these deprived clients retry with a vengeance. So you're instead draining more resources than you'd like. I run a content-blocking DoH resolver, and this happened to us when we blocked IPs in a particular range; the result was... well... a lot of bandwidth for nothing.
This is what I was wondering. I'm taking a wild guess that maybe they don't have that level of firewall access and it was being done through filtering by the webserver to provide an access denied.
But why bother with deny? Just send a blank text file (or one with as minimal data as needed to satisfy the rogue adblock) to the "blocked IPs" to mitigate the traffic for now. If firewall access exists, just drop the offending incoming traffic entirely.
> Just send a blank text file (or one with as minimal data as needed to satisfy the rogue adblock) to the "blocked IPs" to mitigate the traffic for now.
The sent HTTP body was blank, but I believe we were still sending HTTP head...
> If firewall access exists, just drop the offending incoming traffic entirely.
True, but the service we were using at the time didn't have a L3 firewall, and so we ended up moving out, after paying the bills in full, of course.
That reminds me of the absolutely insane amount of traffic my mother's Roku TV shits out when it can't resolve/reach its spyware and telemetry services. It's like 95-98% of the blocked traffic on her network.
Is there a clean solution to this problem these days? Like some kind of adblocking router that resolves these addresses correctly but then routes packets destined for these services into a black hole so the requests eventually timeout? That would at least slow the repeat request floods down significantly.
1. Restricting access until developers fix it.
2. Consider encouraging the use of WebTorrent within the extension, so that each user hosts and serves the list.
Webtorrent isn't distributed, so you'd just shift the problem to the tracker/signalling server.
And in this particular case even BitTorrent proper may not have helped because steady-state BT is distributed but if a client doesn't persist its state after a bootstrap - and lack of persistence is the issue here - it'd hit the bootstrap server every time.
Granted, it'd only be about one UDP packet per client, much less traffic than what EasyList is seeing, but foolish code deployed at scale can still overload services provided on a budget.
One lesson I guess many people (I for one) learned over the years: never, ever serve anything directly on your bare domain; always use a subdomain. You never know which part you'll want to move off of your server later...
Just ban all Indian IPs from the website at the firewall.
The proper solution would be to use DNS to forward all Indian traffic to one of the local VPCs.
The idea is to rate-limit users: let them pull blacklist.txt only once a day. You serve the file and add the requester's IP to a denylist on the firewall, so that any subsequent requests that day are blocked by the firewall.
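A rough in-application sketch of that once-a-day scheme, assuming a Flask front end and an in-memory table (a real deployment would push the IP into the firewall, e.g. an ipset/nftables set, rather than answering 429s from the app):

    import time
    from flask import Flask, abort, request, send_file

    app = Flask(__name__)
    last_served = {}          # ip -> unix timestamp of last successful download
    ONE_DAY = 24 * 60 * 60

    @app.route("/easylist.txt")
    def easylist():
        ip = request.remote_addr
        if time.time() - last_served.get(ip, 0) < ONE_DAY:
            abort(429)        # already fetched today; the firewall could drop these instead
        last_served[ip] = time.time()
        return send_file("easylist.txt", mimetype="text/plain")

One caveat with any per-IP scheme: behind carrier-grade NAT (common on mobile networks) many legitimate users share one address, so a single daily quota per IP will lock plenty of them out.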
From the perspective of someone who got their start building small web tools, slapping ads on them, and putting them online (which paid most of my expenses through university), it's kind of funny that the people who want to use the internet "for free" at the expense of the people who build websites now also want to block ads, and to keep extracting value from everyone else for free by asking Cloudflare for a free plan.
Pay for your own darn servers; everyone else has to, and they can't even use ads to cover the costs because of you.
I used to host my blog on my own server years ago and some popular site "hotlinked" an image I had posted. My traffic, which back then I paid for by the GB, spiked like crazy. I decided to put in Apache referrer checks for images and started serving some porn when the referrer didn't match my domain. That solved the problem pretty quickly.
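The original was Apache rewrite rules; here is the same idea as a tiny WSGI-style sketch (myblog.example and the file names are hypothetical):

    import mimetypes

    ALLOWED = "myblog.example"   # whatever your own domain is

    def app(environ, start_response):
        path = environ.get("PATH_INFO", "/")
        referer = environ.get("HTTP_REFERER", "")
        # Image requests with a foreign (or missing) Referer get the "surprise" file.
        if path.endswith((".jpg", ".png")) and ALLOWED not in referer:
            path = "/not_for_hotlinking.jpg"
        ctype = mimetypes.guess_type(path)[0] or "application/octet-stream"
        with open(path.lstrip("/"), "rb") as f:   # naive lookup, no traversal checks
            body = f.read()
        start_response("200 OK", [("Content-Type", ctype)])
        return [body]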
EasyList should do something similar: requests from India should be answered with a list that includes all the popular sites like Google, YouTube, aajtak.in, etc. When their browsers suddenly stop working, the problem will be solved.
> EasyList is hosted on Github and proxied with CloudFlare.
What is the reason for proxying through Cloudflare? Are there any bandwidth limits or performance issues when directly serving those files from GitHub?
> GitHub Pages sites have a soft bandwidth limit of 100 GB per month.
> In order to provide consistent quality of service for all GitHub Pages sites, rate limits may apply. These rate limits are not intended to interfere with legitimate uses of GitHub Pages. If your request triggers rate limiting, you will receive an appropriate response with an HTTP status code of 429, along with an informative HTML body.
> If your site exceeds these usage quotas, we may not be able to serve your site, or you may receive a polite email from GitHub Support suggesting strategies for reducing your site's impact on our servers, including putting a third-party content distribution network (CDN) in front of your site, making use of other GitHub features such as releases, or moving to a different hosting service that might better fit your needs.
GitHub has a soft limit of 100 GB/month on transfers for Pages. According to the AdGuard blog post, traffic was already several TB a day before the issue arose.
I'm curious, if an arbitrary GitHub repo suddenly started attracting hundreds of terabytes of egress, violating GitHub's ToS, would GitHub manage traffic in coordination with the repo's owner, or would they disable the repo and suspend the account?
I suspect the latter. I don't know how to make a repo public but limit web traffic to it. Do you?
I could see them disabling raw file links. But if the repo becomes popular to fork, what would GH do? The friction of using git instead of HTTP will prevent 99.9% of hotlinking, so it probably couldn't become too popular.
It seems their total traffic is 1000-2000TB per month and they're mostly just serving a single text file.
You can serve that easily from a dozen dedicated servers for a low four-figure amount. Or just get one server with a 10 Gbit/s connection if you don't care about being geographically close to everyone.
These numbers aren't large, but many CDN companies (and AWS) would like you to believe they are, and that you should be paying them insane amounts of money to serve that kind of traffic.
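A quick back-of-the-envelope check on those figures (the 2000 TB/month upper estimate is from the comment above):

    # Average sustained rate implied by ~2000 TB/month of egress.
    monthly_bytes = 2000e12
    avg_gbps = monthly_bytes * 8 / (30 * 86400) / 1e9
    print(round(avg_gbps, 1))   # ≈ 6.2 Gbit/s on average

So a single well-connected 10 Gbit/s box covers the average load, and a dozen 1 Gbit/s dedicated servers cover it with roughly 2x headroom for peaks.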
Why not just serve a special list to all android users from India that contains the top 100 most popular Indian websites (https://www.similarweb.com/top-websites/india/)? Surely users will stop using that browser when they can't visit those websites anymore because those are now blocked instead of the ads.
I'm sympathetic to their trouble, but we're talking about serving a 330KB text file (150KB compressed); surely this isn't an insurmountable technical hurdle to overcome?
A 1000 Mbps dedicated server could serve it 70 MILLION times per day. Considering that most responses wouldn't need the full body (ETags and whatnot), it can probably sustain a billion requests a day.
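Checking that arithmetic against the compressed (~150 KB) file:

    # Raw capacity of a 1000 Mbit/s dedicated link, in full-file downloads per day.
    bytes_per_second = 1e9 / 8
    bytes_per_day = bytes_per_second * 86400     # ≈ 10.8 TB/day
    print(int(bytes_per_day / 150e3))            # ≈ 72 million full downloads/day

With conditional requests answered by a 304 (a few hundred bytes of headers each), far more requests fit in the same pipe, which is roughly where the billion-a-day figure comes from.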
It's not really the serving that's the issue, it's the amount of bandwidth used. When serving simple content (like txt), bandwidth is always going to be the expensive element.
> There’s an open source Android browser (now seemingly abandoned) that implements ad-blocking functionality
> The problem is that this browser has a very serious flaw. It tries to download filters updates on every startup, and on Android it may happen lots of times per day. It can even happen when the browser is running in the background
EasyList should be offered as a version-controlled copy that you grab once and bundle with an app, rather than as a download the app calls at runtime: https://easylist.to/easylist/easylist.txt (currently down as of writing).
The only caveat is that such a list needs to be updated, so EasyList would need a versioning scheme and apps would periodically bundle the new version via app updates. Doing this would save a lot of bandwidth.
They're offering a text file, presumably with Last-Modified / ETag headers so clients can make conditional requests, although it's hard to check now.
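For what it's worth, this is easy to verify from a client once the file is reachable again; a sketch with the requests library (assuming the origin actually sends the validators):

    import requests

    url = "https://easylist.to/easylist/easylist.txt"
    first = requests.get(url)
    conditional = {
        "If-None-Match": first.headers.get("ETag", ""),
        "If-Modified-Since": first.headers.get("Last-Modified", ""),
    }
    again = requests.get(url, headers=conditional)
    print(again.status_code)   # 304 if unchanged: headers only, no body re-sent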
There is no approach you can describe that doesn't run afoul of the described badly-behaved browser app which willfully retrieves the entire file afresh at every init. If it can be downloaded, it will be downloaded directly by the badly-behaved mobile apps.
A bad app will just brute-force fetch it every time, without even a cache on its side.
As a quick fix, there are many options for per-IP limits over a timespan, e.g. fail2ban; you could configure it to punish bad apps without crippling functionality for others. Well, maybe crippling it a little bit in some very special use cases, but that's still better than the service simply not working.
Pi-hole and pfBlockerNG are two big consumers of these resources too, and setting those up it struck me, as it did you, that simply polling these resources on a set schedule was a waste.
Podcasting 2.0 has been talking about podping as a solution, because podcasting has basically the same problem with periodic polling of RSS feeds. Basically, you subscribe, receive a notice when there's been an update, and only THEN go get it.
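Not podping itself, but a minimal sketch of that subscribe-and-notify pattern, with a hypothetical mirror URL and a Flask endpoint standing in for the real protocol:

    import requests
    from flask import Flask

    LIST_URL = "https://example.org/easylist.txt"   # hypothetical mirror

    app = Flask(__name__)

    @app.route("/notify", methods=["POST"])
    def notify():
        # The publisher pings this only when the list actually changes;
        # the subscriber then fetches once, instead of polling on a timer.
        with open("easylist.cache.txt", "w") as f:
            f.write(requests.get(LIST_URL, timeout=30).text)
        return "", 204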
Scaling serving of a static text file would make a fantastic job interview question, especially as you explore what happens as the text file gets bigger and the number of downloads gets higher. The "correct answer" isn't necessarily obvious in any of these cases.
That shows the discrepancy between the linear program-execution model (on the classic desktop) and the quasi-permanently-running model of modern apps.
Maybe the site should identify these rowdy browsers and give them a kick.
That's a different list. It only contains the hosts/domains on easylist, because that's all pi-hole (and all host based blockers) can block. It's also hosted by someone else (and they too use Cloudflare, see firebog.net).
The normal easylist is way bigger and has lots of rules for ad blockers like uBlock Origin.
Serve up a text file to connections from India that just has a wildcard entry that causes everything to be blocked. Those crappy browsers will very quickly lose their popularity.
Author here. Actually, it's getting better. I've just looked up the stats, and for the last 30 days we only served 70TB of access-denied pages, which is about 33-34B requests.
I have my pi-hole configured to use the Domains list from https://oisd.nl/ which incorporates a bunch of other lists (including easylist), de-duplicates, and removes a few false positives. They also have an Adblock Plus Filter List that can be used with uBlock Origin.
I can't imagine slowing responses down could be a good idea. At their scale, the sheer connection count probably matters more and may contribute a larger share of the cost. Bandwidth is expensive, yes, but keeping a connection alive means consuming extra memory for the socket in the kernel, in the app, and in many other places.
> Even so, we continue to serve about 100TB of “Access Denied” pages monthly!
I might be ignorant of the scale of things, but surely a 404 page can't weigh more than half a kilobyte, right?
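Working backwards from the author's figures upthread (about 70 TB of denied responses for roughly 33-34 billion requests in 30 days), each denied response averages around 2 KB once you count the error page body plus headers, so noticeably more than half a kilobyte:

    served_bytes = 70e12     # ~70 TB of "Access Denied" pages over 30 days
    denied_reqs  = 33.5e9    # ~33-34 billion requests in the same window
    print(round(served_bytes / denied_reqs))   # ≈ 2090 bytes per denied response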
I think the EasyList maintainers have every right to break stuff in this case. They could also easily turn the list into an HTML file, since it's just as parseable as txt.
To be fair to them, this is configuration data, not a piece of a website you would read in a browser. I don’t agree with the policy but it is reasonably clearly worded.
I think GeoIP blocking is a good strategy here. Wikipedia has used this strategy and it has been reasonably successful. People can still log in to GitHub and download the list regardless of where they are located.
The solution to this problem is to use a public R2 bucket with caching enabled, and to block all requests for non-existent files at the firewall. Now you don't even have to block those browsers from India!
It seems they need some form of UUID in order to rate-limit individual clients. I wonder what percentage of their traffic would drop if they started requiring some form of authentication to download this list?
You'd still need the coordinator node (tracker) for every single request, so you'd still be serving an absolutely insane amount of traffic off one central box.
If you baked something like IPFS or DHT torrent capability into the application and requested your blocklists that way you'd solve the single point of failure problem, but that's asking a whole lot from a shittily maintained and poorly configured browser fork.
They shouldn't have to foot the bill for this. This is some bad developers releasing a bad fork of a bad browser.
At a minimum, they should have either reduced the requests to something like a monthly download (not great, but far better than requesting a file every startup), or ideally, hosted and updated the file themselves, on their infrastructure. At least that would force them to look at their own hosting bills, instead of crippling a prominent and important contributor to ad-blocking software.
Cloudflare's rationale for a ToS violation is absolutely bonkers. Who cares what the file extension is? By virtue of being accessible on the web via a URL it is by definition web content.
Cloudflare's response makes less sense the more you think about it. We are talking about URLs, where the entire concept of file extensions doesn't even apply. Would they be okay if the developer simply changed https://easylist.to/easylist.txt to https://easylist.to/easylist and served the exact same resource? Or does Cloudflare actually want to parse and validate the encoding or format of the served bytes?
Cloudflare's TOS only permit caching for HTML content and related assets. It's probably a useful catch-all they can pull out any time someone uses too much bandwidth.
> It's probably a useful catch-all they can pull out any time someone uses too much bandwidth.
That's exactly what it is, and Cloudflare should be honest about that instead of coming up with an excuse as pathetic as ".txt files ain't web content".
OK, so then why not make a structured HTML page that's served and parsed as the list... would that be perfectly fine? Cloudflare's stance on this makes no sense.
That would be fine because then CF could replace that HTML with their "Checking your browser before accessing" content. What's the app going to think of that though?
> That would be fine because then CF could replace that HTML with their "Checking your browser before accessing" content.
They could do that anyway by putting up the "Checking your browser before accessing" page and redirecting to whatever file was being accessed. There's precisely zero need to outright modify the HTML files being served to inject such checks.
By serving the HTML instead (with JS that does whatever checks and then redirects to the intended resource); if the client can't cope with that, then tough shit. The absurdity of arbitrarily evaluating whether or not clients are sufficiently browsery to be worthy of consuming bandwidth aside, ain't "only browsers directly navigating to this will be able to cope with this and gain access" exactly the point of using such JS-based client checks in the first place?
<sigh> there's a lot of similar comments to this one. In short, it's much harder to protect a text file against DDoS. The ToS say 'a disproportionate percentage ... non-html' likely because they need to be able to apply their browser checks to clients.
> <sigh> there's a lot of similar comments to this one.
<sigh> and a lot of similar rebuttals to this one.
> In short, it's much harder to protect a text file against DDoS.
It's exactly the same difficulty. Bandwidth is bandwidth, bytes are bytes. The only suggestion I've seen where HTML/CSS/JS might be relevant is using JS for client-side probing, and if Cloudflare is indeed injecting arbitrary JS into HTML pages it serves then that's utterly horrifying and is a problem in and of itself.
>> <sigh> there's a lot of similar comments to this one.
> <sigh> and a lot of similar rebuttals to this one.
My point is that a lot of people don't seem to understand the 'why' of this issue and instead appeared to just jump to the conclusion that:
> It's exactly the same difficulty.
When that's not at all the case.
> using JS for client-side probing, and if Cloudflare is indeed injecting arbitrary JS into HTML pages it serves then that's utterly horrifying and is a problem in and of itself.
Well they are. From CF:
>> Cloudflare’s bot products include JavaScript detections via a lightweight, invisible code injection that honors Cloudflare’s strict privacy standards
But even before we consider that, if the request is for HTML then it's likely coming from a browser. If CF replaces that HTML with their own, then the browser will likely run it, allowing them to run all kinds of probes and then issue the redirect. The same is not true for a .txt file or an image.
> My point is that a lot of people don't seem to understand the 'why' of this issue
We understand the "why" of the actual issue - i.e. the bandwidth consumption - just fine. It's Cloudflare's decision to micromanage data formats consuming that bandwidth (instead of just, you know, evaluating the bandwidth itself and being done with it) and using that as their basis for a ToS violation that's entirely whackadoodle.
> Well they are.
Then that is indeed absolutely horrifying and a problem in and of itself. Privacy concerns (of which there are a multitude) aside, this seems like a really great way to break all sorts of things, and I'd trust the claims of "lightweight", "invisible", and "strict privacy" about as far as I can throw them.
> If CF replaces that HTML with their own, then the browser will likely run it, allowing them to run all kinds of probes and then issue the redirect. The same is not true for a .txt file or an image.
The same is absolutely true for a .txt file or an image. Consider two flows:
1. You click on a link, your browser loads a text file.
2. You click on a link, your browser loads an HTML+JS file, the embedded JS redirects your browser, your browser loads a text file.
Same end result, same client-side probing opportunities, and without needing to rely on swinging JS into the end document like Patrick Bateman swinging an axe into his coworker's face while ranting about "Hip To Be Square".
Or better yet: just don't do this and only care about the raw bandwidth consumed, instead of actively making the World Wide Web a worse place with "clever" tricks like anally probing my browser to see if it's browsery enough for some arbitrary standard of browserness (for the sake of a "protection" that's almost certainly trivial to break with one of the umpteen headless browser solutions anyway).
I suspect that the real problem isn't the type of content but the fact that it is being requested by robots at a high rate. If you had a text file that humans were actually browsing and reading I doubt Cloudflare would care, but humans read files at a much lower rate than is occurring for this text file.
Say you do that and I DDoS the easylist.html. Cloudflare will start applying their DDoS mitigation. Now everyone's app receives the Browser Integrity Check instead of the list.