New headless Chrome has been released and has a near-perfect browser fingerprint (antoinevastel.com)
463 points by avastel on Feb 19, 2023 | 240 comments



I am the PM working on Headless. Feel free to ask questions in this thread and I will try to answer them if I can.

Edit: Please also note that we have not released New Headless yet. We "merely" landed the source code.


There are many comments about potential abuse. I would be curious to know whether your team has ever challenged each other: one part of the team tries to look like a real person accessing a site while the other part tries to detect and block them. If anyone could do this, it would be the creators of Headless.

Why go through the exercise, one may ask? I believe it would be a critical thinking exercise to improve Headless even more while giving website maintainers a way to opt out of receiving traffic from it. If not your team, have you reached out to see if people from Project Zero would take on that challenge in their abundance of spare time? [1]

[1] - https://googleprojectzero.blogspot.com/


We regularly get feature requests for Headless to provide a field or property that can be polled by JS frameworks to detect if Headless is active, e.g. window.isBot.

Well, Headless is open source, which means anybody could build a Headless version with such a property set to "I am a human, trust me!" and employ such a modified binary ... ;-)


Oh absolutely, relying on a header would be a placebo at best. I was thinking more along the lines of having two teams, one that develops Headless and another team at Google that tries to defeat it nonstop. An official game of cat and mouse. Project: Tom and Jerry? I guess legal would never buy into that name.

My own personal method for my silly hobby sites is just to put passwords on things with an auth prompt delay.


Why should Google redteam their headless browser though? As other comments point out there's plenty of ways for bot detectors to id bots even with a browser which mirrors a normal one: https://news.ycombinator.com/item?id=34858056

Almost all of those things are outside the scope of the browser itself. And anyone doing serious bot attacks already has scripts/forks that modify these signals. I don't see how the Chrome team could do much at that level to help stop it.


In theory their blue team could come up with even more advanced puzzles that bots trip over and then open source and document the bot puzzles. I don't know that they would, incentives or lack thereof and all. If nothing else it might make their work day more fun.

Or if I put my evil corp hat on, the incentive could be that they make puzzles that only Headless can get around, and all other bots become trivial to block and obsolete by even the least knowledgeable hobbyist. Perhaps Google releases Nginx, Apache HTTPD, Apache Traffic Server, Envoy and HAProxy modules that only Headless can get around, and all other bots internet-wide are entirely silenced. Chrome becomes the one and only bot to rule them all.


Why would they want to do that?


Oh man, you're making me put that hat back on.

I suppose that Google going through that exercise would mean that they get market dominance on bot-gathered data and anyone not using Chrome Headless would be unable to obtain freebie data. This could enable future features, whatever those may be. *readjusts hat* One future feature could be auto-discovery of Google DNS and Google proxies in GCP so they can learn about new data sources through crowd-sourcing, thus making their big-data sets more complete and their machine learning more powerful. Developers could block the proxies or compile them out, but as we know most people are too lazy to do this and many won't care.

Another advantage would be that eventually the only bots abusing Google would be bots using their code, which they would know how to detect and deal with, since they would implement their own open source anti-bot modules in their web servers, load balancers, etc.

There are more obscure ideas but I am doffing the hat before the hat-wraiths sense it.



You jest, but I could actually see this becoming a thing. I envision a future dystopian internet where people first have to authenticate their network gear, PCs, laptops, cell phones, cars, trucks, e-bikes, toasters and coffee makers to a government-contracted service. Once authenticated, they utilize something similar to that RFC, but probably instead a nonce or JWT tied to their device that gets embedded in the packet header somehow. Then sanctioning a continent, country, state, ISP, city, company, manufacturer, distributor or person would simply be disabling their evil bits, so to speak.

The push for this is starting with adult content [1], but the goal posts could easily be mounted on a train car with a very long and smooth track that only goes downhill.

[1] - https://news.ycombinator.com/item?id=34726509


There's a huge amount of aggro pissy shitthrowing that Chrome is facilitating automation in these threads. Bollocks.

You know what? The Internet Is For End Users [1]. If we're going to cite an RFC, it should be RFC 8890. Not having a better headless Chrome would be a violation of the most basic principles of the internet.

There are some cases where automation can get out of hand, but blocking those efforts should not come at user expense. So says RFC 8890, and a general collective belief/hum-in-the-room. The availability of a good browser like Chrome helping here should not be an issue, given how many other ways bad actors already have to go too far & cause harm to sites. The people who have to deal with this are not the priority & this doesn't radically change their troubles; it does radically help end users wishing to exercise agency, though.

In most cases being able to script & automate a site is a completely basic form of user agency, of no special regard. Headless Chrome being a somewhat tolerable way of doing that scripting is 100% morally correct. It greatly assists us in fulfilling a primary & clear overarching purpose of the internet: to be for end users.

I wish I could say I cannot believe the complaining & whining & snivelling, the pretentious nonsense and acting offended that Chrome would dare help make good automation. I wish I could say I don't think this crowd recognizes or comprehends the basic purpose of the internet, but again, I think I know better; I suspect they do, but their protests are disingenuous, and they have allied their hearts with darker forces, against the user.

[1] https://www.rfc-editor.org/rfc/rfc8890


>Headless is open source, which means anybody could build a Headless version with such a property set to "I am a human, trust me!"

This is flawed reasoning. Just because we can't eliminate abuse from headless browsers doesn't mean we shouldn't work to reduce it. Finding such a modified binary or making it yourself is additional friction that will cause fewer of these bots to exist. Some people may not care whether a website is able to block them, and some may not do the work to read the robots.txt. By implementing these capabilities into the product by default you make the web ecosystem a better place with less abuse. You are right that someone could make a version without the anti-abuse parts, but surely that fork would be less popular and less used.


What if I want the headless browser to look exactly the same? Why should we make a distinction between humans and machines?


If I run a soup kitchen, and Google is sending robots to my establishment which are indistinguishable from humans, I should have the right to ask whether the client is a robot.

I would hope that Google's robots would not be programmed to lie to me, but would be honest.

If robots are required to be honest, then I have a choice to serve them or not. If they are not honest, I do not have a choice.


Then don't add code to your site to make it work differently?

>Why should we make a distinction between humans and machines?

Because machines can be used to abuse a site at a scale that humans can't. Site owners want to protect their site against abuse.


By modifying the browser. It feels like DRM by a different name to me.


Okay? I don't care what you call it. It will reduce the amount of abuse in the world and that is a good thing.


While I appreciate your answer from a technical point of view - indeed it is trivial to modify/spoof - there is an ethical dimension.

Should bots have the legal right to say they are human?

For example - if Google Inc is visiting a web page to collect information about it using a headless browser, and the server asks - are you a bot - should Google be legally or ethically allowed to answer no? (Declarations in headers could remove the need for question/answer chatter.)

(I want to pre-empt dismissing this line of questioning via 'what if Google wants to know how the site will be served to a human, for better search results': Google could include a specific header for that, e.g. "I am a bot, but request that you serve the version of this page served to humans". It would be up to the server to honor or reject that request.)

The defaults Google chooses have compounding effects in our society. If you make it "normal" for bots to pretend to be human, the industry has minimal pressure to hold any standard above what you do, and better norms may never appear, or be delayed by a decade. The alternative is to be thoughtful today and try to create a better world.


https://github.com/paulirish/headless-cat-n-mouse was this basic idea, but open sourced.


The destination of that escalation is DRM.


Do you guys ever think about abusive automation at all, or do you just consider that other people's problem?


Abusive how? Headed chrome can be automated, as can wget.

It's bizarre to ask a client-side program to implement server-side controls for users you want to allow on your site but throttle.


Headed chrome adds a huge amount of overhead, and can also be fingerprinted more easily. This is a lot more declarative and makes it easier to run an abuse farm. Although, per my other comment, I don't see Headless as a tool that will particularly move the needle on abuse cases.


Isn't headed chrome usually fingerprinted by variables inserted by the chromedriver? You can rename these variables and be undetectable (you don't even have to recompile chromedriver, you can use a hex editor or a perl replacement).

At least I've never gotten detected.
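
A minimal sketch of that byte-level patching in Python (the commonly cited "cdc_" marker prefix and the chromedriver path are assumptions for illustration, not something stated above):

    import random
    import string

    MARKER = b"cdc_"  # assumed chromedriver marker prefix

    def patch_chromedriver(path: str) -> None:
        with open(path, "rb") as f:
            data = f.read()
        # Same-length replacement keeps all byte offsets in the binary valid.
        replacement = "".join(random.choices(string.ascii_lowercase, k=len(MARKER))).encode()
        with open(path, "wb") as f:
            f.write(data.replace(MARKER, replacement))

    patch_chromedriver("/usr/local/bin/chromedriver")  # hypothetical path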


There are even Puppeteer plugins that will do it for you. [1]

The best detection I've come across so far (i.e. before this release) has just required I run headless Chrome in headed mode. Granted, I don't do a ton of scraping -- mostly just pulling data out of websites so that I can play with it in aggregate using more civilized tools.

[1]: https://github.com/berstend/puppeteer-extra/tree/master/pack...


You call it abuse. Other people might call it use.


I've not yet encountered anyone who doesn't consider spam to be a form of abuse.


Spam can be an effective way around censorship. What is and isn't abuse often isn't as objective as some people want to pretend.


I am that "anyone" you mentioned. For example, autoposting on 4chan works very well for me. I spam goods on 4chan for people to buy, or create opinions that I push.


[flagged]


Would you please stop posting in the flamewar style? We've had to ask you this in the past as well. It's not what this site is for, and destroys what it is for.

https://news.ycombinator.com/newsguidelines.html


I'm sorry. I'll try to bite my tongue more often when I'm in combative mood. Thanks for putting up with me so far.


You call it use. Other people might call it abuse.


That's my point exactly.


Are we just misquoting the Eurythmics now?


The implications of your question are beyond dystopian


Please elaborate.


Because it suggests adding usage controls, possibly enforced via cloud connectivity, to add restrictions that will inevitably make legitimate usage more difficult, frustrating, and most importantly, subject to outside control. Extend this far enough and the world starts to look like Doctorow's "Unauthorized Bread".

This is an awful world, one designed to reinforce class divide and protect the entrenched and the rich by deliberately handicapping easily-accessible tools, because of a few bad actors. It creates a world where the code for literally everything is the most hideously complex version of itself because it is riddled with constant checks, phone-homes, and arbitrary usage limits. It further pushes us towards a disempowering future where our computing is limited exclusively to appliance-like devices whose inner workings are controlled for us. It stands against the very principle of general-purpose computing.


That's not beyond dystopian. It's just dystopian.

And the implications of a question aren't, either. Just your imagined implications. Questions aren't bad.


See my comment[1] on this very thread.

[1] https://news.ycombinator.com/item?id=34858232


If you are a soy developer who thinks Cloudflare is a god that should solve problems for you, and you use O(n^2) or even worse algorithms in your code so you can't even optimize it, then yes, it is only your problem.

In 2000, sites were built where the code was made precisely so that a DDoS attack was impossible. Now it is a heckin sauce of obfuscated proprietary JS malware.

If your site is like this, you deserved it. Cloudflare and such companies just want your money for solving a 5-minute problem like a WAF that is just a regex, and you have limits even for user-agent filtering, lol.

Stop writing shitcode and learn HTTP and TCP/IP theory, and you will make an antispam filter that is 200% better than any Cloudflare shit, which is simply malware that runs a cryptominer as an "IUAM" mode for their own benefit, and you even pay for it.


For what it's worth, the large "players" already seem to have this capability. They've forced pretty much everyone to roll out captchas, waf-level throttling, proof of work interstitials, and behavior-based fingerprinting.

While my immediate response was the same as yours, I think this actually won't really change much in the way of bad actors.

It's unfortunate, but basic controls (such as throttling, etc.) are pretty much a floor-required feature - one way to avoid this burden is to do things like use a third-party IdP (aka Google login). I'm not happy with the state of things, but I don't think headless will particularly contribute to a material increase in abuse cases.


Now that headless mode is a "real" Chromium instance, is it possible to add extension support to Chrome running in headless mode?


I didn't know this was a restriction before! Interesting. I would have assumed old headless had a profile, and that the typical command-line approaches[1] would let one load extensions. Are we sure that previous headless Chrome didn't have profiles or couldn't load extensions? I think the assumptions here may be incorrect.

The new Chrome headless certainly purports to be "just Chrome" "without actually rendering." One of the notable differences in the new headless mode is that it at least shows the stock/built-in extensions. From the submission:

> Similarly, when it comes to plugins, the old headless Chrome used to return no plugins with navigator.plugins, which is a technique that used to be exploited for detection when Headless Chrome got released 6 years ago, cf this blog post. The new headless Chrome returns the same plugins as a headful Chrome, and that’s the same for the mimeTypes obtained with navigator.mimeTypes:

Perhaps the new headless is faking it, but my impression is that extensions work as normal in the new headless Chrome. How or whether they worked before is another very interesting question I'd like answered.

I do wish the AMA dev had actually replied to this. My hope is that this wasn't an issue before, and that the situation is otherwise unchanged except that the default plugins are now installed/reported, which alters the fingerprint.

[1] https://stackoverflow.com/questions/16800696/how-install-crx...


https://bugs.chromium.org/p/chromium/issues/detail?id=706008

It looks like the new headless mode does support extensions.


Can you talk about your team's motivations for improving headless mode? Any particular use cases in mind?


Here are two of them:

- Test reproducibility
- Automated configuration rollouts in enterprise environments


Improving test environments is a huge upside. I haven't worked on browser automation in nearly a decade, but finding ways to work around shortcomings in the headless environment used to burn a lot of time on that team. I know of many small teams which made deliberate decisions NOT to do any browser automation tests (e.g. Selenium) because some issues required testing hooks in production code.


Is it too late to change the name from "new headless"? It won't be new forever, and then there will need to be a new new mode, or a differently named one that people think is older because it isn't the new mode.


No, obviously, the next version will be called Newer Headless. Then you get the More Newer or Even Newer release. Or my personal favorite NewV2. /s

Using the word "new" in naming conventions is the most moronic and shortsighted way to name things in a product that is quite obviously going to keep changing in the near future.


New College is doing fine even with its name. It's just a name. Doesn't really matter.


Also New Forest.


It reminds me of the "Pont Neuf" ("new bridge" in French), which is the oldest bridge in Paris crossing the Seine.


By all rights, it ought to be EvenLessHead. ~


See also: report_final_draft(1).doc


You would be surprised how much we talked about that. New/old is just relevant for the transition period.


But then how would you have the pleasure of figuring out the sort order between New $Feature, Advanced $Feature, Revamped $Feature and Enhanced $Feature?


What is the actual difference between old and new?

Is there a list, an explanation anywhere?


So this argument can be used these ways:

--headless

--headless=new

--headless=chrome

And each mean something different - but what?

Not documented, very frustrating.

Can you explain the difference between each of the above arguments?


There are two headless modes: "chrome" and "new".

--headless enables the default, which is "chrome".

--headless=new enables "new".


Any chance of a build for the Raspberry Pi?


So the --headless=new doesn't work on any released version of Chrome yet?


what makes the new one “Native” ?


It's real Chromium, not an emulation of a Chromium browser. "Old" Headless was merely pretending to be a Chromium browser; the "new" Headless is a Chromium browser. "Old" Headless requires a parallel/duplicate implementation of features, which leads to subtle behavior differences or makes it infeasible to support certain features, e.g. proper extensions.


wow, i had no idea the old headless was a reimplementation. congrats on landing the new one


Does this mean we might see proper extension support in "New" Headless?


Can this replace chromium embedded framework (CEF)?


I fail to see the connection. Can you elaborate?


[flagged]


What rumors? Can you provide any links or context?


I built a remote browser based on headless Chrome^0 and this is going to make things way easier. It's also great to see Google supporting Chrome use cases beyond "consumer browsing", and perhaps that's in large part been pushed by the "grass roots popularity" of things like puppeteer and playwright.

One thing I'm hoping for (but have heard it would require extensive rejigging of almost absolutely everything) is Extensions support in this new headless.

However, if I'm reading the winds, it seems as if things might be going there, because:

- Tamper scripts now work on Firefox mobile

- Non-webkit iOS browsers are in the works

- It's technically possible to "shim" much of the chrome.extension APIs using CDP (the low-level DevTools protocol that pptr and its ilk are based on), which would essentially lead to a "parallel extensions runtime" and "alt-Webstore" with fewer restrictions, something Google may not look merrily upon

Anyway, back to "headless detection", for the remote isolated browser, I have been using an extensive bot detection evasion script that proxied many of the normal properties on navigator (like plugins, etc), and tested extensively against detectors like luca.gg/headless^1

Interestingly, one of the most effective ways to defeat "first wave" / non-sophisticated bots used to be simply throwing up a JS modal (alert, confirm, prompt) -- for the convenient way it blocks the JS runtime until dismissed, and how you have to explicitly dismiss it.

^0 = https://github.com/crisdosyago/BrowserBox

^1 = https://luca.gg/headless/
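
To illustrate the JS-modal point above: naive automation hangs (or throws on its next call) until the dialog is dismissed explicitly. A rough Python/Selenium sketch of the handling a bot author has to add (the URL is a placeholder, and Selenium/chromedriver are assumed to be installed):

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.common.exceptions import NoAlertPresentException

    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    driver.get("https://example.com")  # placeholder URL

    try:
        driver.switch_to.alert.accept()  # dismiss a blocking alert/confirm/prompt
    except NoAlertPresentException:
        pass  # no dialog was up; continue as normal
    driver.quit()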


I'm assuming the next step will be to bring Cloudflare's pet project of TPM attestation into Chrome, otherwise known as PATs[1]. And just like that, not only would headless be defeated, but all of you using rooted devices and small-time browsers would be left high and dry.

It's "Right to read"[2] all over again.

[1] https://www.ietf.org/archive/id/draft-private-access-tokens-...

[2] https://www.gnu.org/philosophy/right-to-read.en.html


If you can only use PATs on headlessless machines, then they're actually headpats. Everybody loves headpats. I don't see a problem.


Headpats are nice when they're optional. Mandatory headpats are not universally admired.


What stops someone from making a fake TPM that speaks the appropriate protocol and just instantly signs off on every request? AFAIK there isn't some grand/central list of trusted TPM modules. Anyone can implement one as a Linux driver: https://www.kernel.org/doc/html/latest/security/tpm/tpm_ftpm...

A fake TPM would be useless for security but just fine for fooling websites that there is a real human at the computer.


From wikipedia:

> Computer programs can use a TPM to authenticate hardware devices, since each TPM chip has a unique and secret Endorsement Key (EK) burned in as it is produced.

That EK is signed by the TPM manufacturer, and so it’s likely they’ll only trust the keys of physical TPM manufacturers. Good luck forging that in software.


I wonder if we'll get a cat-and-mouse game with miscellaneous TPM manufacturers "accidentally" leaking their keys, getting blacklisted, creating new ones, etc. I'd like to think that there's at least a nontrivial amount of the population wanting to subvert the authoritarian corporatocracy and with the skills to do so.


It's going to be an extremely janky or very private website if they only let you view it when you have one of like a dozen supported and approved hardware TPMs.


The latest Windows version requires a hardware TPM on a device in order to be installed. Every hardware vendor has therefore included a TPM in all their new machines. This was already standard on Apple devices, and many Android devices have one as well.


Sure, but someone who wants to build a web scraper won't care; they could use their own homebrew TPM that does a no-op and claims a user pressed a button or was present when they actually weren't.

I doubt websites will go to the trouble to keep a list of approved TPMs. It's the SSL root certs nightmare all over again and even worse. No one is going to want to deal with managing a whole new giant list of devices, having fire drill updates to revoke compromised ones, etc.


What is the solution to automation then? What do I do when someone hits my content-rich Wordpress blog with a scraper that hits 100 pages a second to download my content, and my database falls over leading to real, legitimate users being unable to use my site? What if it’s not a legitimate scraper but someone with hundreds of proxies uses them to DDOS my site for days? Should I sacrifice my uptime to protect the freedom of those unwilling to attest that they’re running on real hardware?


The method to stop a (D)DoS is the same as it always was: caching and rate limiting.

Re: content scraping -- I was an indie web dev of a sort for a while and people always ask this question, and the answer is it's impossible to stop. Not even Facebook or big content sites like CNet or The Verge can stop it. At the bottom of it, you can just access the site in a browser and save the source. Content scraping is a rephrasing of "viewing content even just once". Stopping it is antithetical to the web and technologically infeasible.
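
As a sketch of the rate-limiting half of that answer, a toy per-IP token bucket (the rate and burst numbers are made up for illustration):

    import time

    class TokenBucket:
        def __init__(self, rate_per_sec: float, burst: int):
            self.rate = rate_per_sec
            self.capacity = burst
            self.tokens = float(burst)
            self.last = time.monotonic()

        def allow(self) -> bool:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at the burst size.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    buckets: dict[str, TokenBucket] = {}

    def allow_request(client_ip: str) -> bool:
        bucket = buckets.setdefault(client_ip, TokenBucket(rate_per_sec=5, burst=20))
        return bucket.allow()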


It's probably actually cheaper to pay people piece rates to do it for you in a browser than to pay a developer to write and maintain a scraping script anyway, so if the latter became genuinely impossible, moving to the former isn't a big deal.


Put your WordPress blog behind a caching proxy with a 5s TTL - that way any amount of traffic to a URL will produce at most one hit every 5 seconds to your backend.

I've used this trick to survive surprise spikes of traffic in multiple projects for years.

Doesn't help for applications where your backend needs to be involved in serving every request, but WordPress blogs serving static content are a great example of something where that technique DOES work.
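
A minimal sketch of that micro-caching idea in Python (the backend host is a placeholder): any burst of requests for the same path hits the backend at most once per TTL window.

    import time
    import urllib.request

    CACHE_TTL = 5.0  # seconds, as in the comment above
    _cache: dict[str, tuple[float, bytes]] = {}

    def fetch_cached(path: str) -> bytes:
        now = time.monotonic()
        hit = _cache.get(path)
        if hit and now - hit[0] < CACHE_TTL:
            return hit[1]  # still fresh; skip the backend entirely
        with urllib.request.urlopen("http://backend.internal" + path) as resp:  # placeholder backend
            body = resp.read()
        _cache[path] = (now, body)
        return body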


Proof-of-work schemes such as Hashcash[1] and simple ratelimiting algorithms can act as deterrents to spamming and scraping attacks.

There are other kinds of non-invasive bot management you can do as well, however, due to various reasons I'm not in a position to talk about it. A few other methods are mentioned at the end of the post being discussed[2].

[1] https://en.wikipedia.org/wiki/Hashcash

[2] https://antoinevastel.com/bot%20detection/2023/02/19/new-hea...
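
To make the Hashcash[1] idea concrete, a toy proof-of-work round trip (the difficulty and encoding are illustrative, not any particular production scheme):

    import hashlib
    import itertools

    def solve(challenge: str, difficulty_bits: int = 20) -> int:
        # Find a nonce whose hash has `difficulty_bits` leading zero bits.
        target = 1 << (256 - difficulty_bits)
        for nonce in itertools.count():
            digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
            if int.from_bytes(digest, "big") < target:
                return nonce

    def verify(challenge: str, nonce: int, difficulty_bits: int = 20) -> bool:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

The server hands out the challenge, the client burns CPU in solve(), and the server checks the result cheaply with verify().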


Proof of work isn't very practical here, because computation is a lot cheaper in datacenters than on phones.


The trick is to prevent the offloading of the proof-of-work challenge to another device, as suggested in the Picasso paper[1].

[1] https://storage.googleapis.com/pub-tools-public-publication-...


Can privacy be preserved with zero knowledge proofs? I don't like the idea of universal fingerprinted devices in an already heavily authoritarian world.


Neat! This does seem like it should work!

Semantic quibble: it's less "proof of work" and more "proof of hardware+work". Or, as they call it, hardware-bound proof of work. The reason you can't offload the challenge to a more powerful device is that they rely on identifying stable differences for each device class that ultimately trace down to the hardware they're running on.


From reading the abstract, isn't this just exploiting the same class of security vulnerabilities that the OP is lamenting are being fixed?


Not sure. Maybe not, if it's about device-specific information instead of headed-vs-headless distinctions?


Wasn't mining in the browser basically shut down by every major browser?

It was done super fast... one can't help but think that Google pulled all the levers they had at Apple/Mozilla to make sure the first viable alternative to advertising was killed before it was born. But I think, as a side effect, it might make PoW sort of impossible?

I don't really know how mining "fingerprinting" works exactly, so I'd be curious to know if I'm wrong.


What killed "mining in the browser", more than anything else, was:

1) It was almost exclusively used for malicious purposes. Very few legitimate web sites used cryptominers, and it was never considered a viable substitute for display advertising; it was primarily deployed on hacked web sites. Browser vendors were relatively slow to react; many of the first movers were actually antivirus/antimalware vendors adding blocks on cryptominer scripts and domains.

2) The most popular cryptominer scripts, like Coinhive, all mined the Monero coin. (Most other cryptocurrencies were impractical to mine without hardware acceleration.) Monero prices were at an all-time high at the time; when Monero prices crashed in late 2018, the revenue from running cryptominer scripts dropped dramatically, making these scripts much less profitable to run. (This is ultimately what led Coinhive to shut down.)


I guess slow/fast is subjective. It didn't seem like enough time passed for a legitimate ecosystem to develop. Just the basic idea of, say, hosting a static site/blog on a VPS with a cryptominer that could pay for itself would have been a game changer - and that was probably just the tip of the iceberg of possibilities. Instead we're still stuck either having to sell our traffic/info to Google/Microsoft, put up ads, or pay for it out of pocket. The entrenched players won.

The hacked-site boogeyman felt overblown (and from what you're saying it sounds like it would have died out anyway). I'm sure it happened, but at least personally I never once came across it. Or if I did, then my CPU spun a bit more and I didn't notice. No real harm done.

More fundamentally we're now in territory where the browser vendors get to decide what javascript is okay to run and which isn't.

Anyway, it's just complaining into the ether :) it is what it is. thanks for the context of the market forces and antivirus companies


> I guess slow/fast is subjective. It didn't seem like enough time passed for a legitimate ecosystem to develop.

Coinhive was live from 2017 - 2019, and it basically ran the whole course from exciting new tech to widely abused to dead over those two years. I don't think it needed more time.

> The hacked site boogieman felt overblown...

Troy Hunt acquired several of the Coinhive domains in 2021 -- two years after the service shut down -- and it was still getting hundreds of thousands of requests a day, mostly from compromised web sites and/or infected routers. It was a serious problem, albeit one which mostly affected smaller and poorly maintained web sites.

https://www.troyhunt.com/i-now-own-the-coinhive-domain-heres...


Make it someone else's problem; put a caching CDN in front of it, like Cloudflare, who have experience with these problems (like intentional or accidental DDOS).


I understand and agree with the suggestion of putting a CDN, but it's somewhat ironic to suggest the use of Cloudflare when that very same company is advocating for the DRM-for-webpages scheme.


Is it not fair to assume that Cloudflare, as a company that has made a name for itself selling various DDoS protection services, realizes it's in an arms race with the old-school way of handling these problems and is pursuing more advanced solutions before the current techniques become entirely useless?

It would be easy to point to the irony of saying "instead of supporting Cloudflare's proposals for PATs, use their CDN product for brute force protection" but on the other hand, they employ a lot of experts in this space and might see the writing on the wall in an increasingly adversarial public internet.


This is a good question, but if you look at it closely, Cloudflare seems to be the only company advocating for attestation schemes for the web.

It’s almost as if the conspiracy theory of Cloudflare acting as an arm of the US government and helping in the centralization of the internet is actually true.


is there such a thing as a caching CDN that effectively protects against scrapers? generally if somebody is going to try and scrape a whole bunch of old infrequently-accessed but dynamically generated pages, most of those won't be in the cache and so the caching proxy isn't going to help at all.

i'm honestly asking, not just trying to disprove you. this is a real problem i have right now. ideally i'd get all my thousands of old, never-updated but dynamically generated pages moved over to some static host, but that's work and if i could just put some proxy in front to solve this for me i'd be pretty happy. but afaik, nothing actually solves this.


Akamai has a scraper filter (I think it just rate limits scrapers out of the box but can be configured to block if you want). I'm not sure how good it is at detecting what is a scraper and what isn't though.


Yeah, AWS has one of these, a set of firewall rules called "bot control". it seems to work well enough for blocking the well-behaved bots who request pages at a reasonable rate and self-identify with user-agent strings (which i'm not really concerned about blocking, but it does give me some nice graphs about their traffic). it doesn't seem to do a whole lot to block an unknown scraper hitting pages as fast as it can.


rate limit. Or paywall.


> What do I do when someone hits my content-rich Wordpress blog with a scraper that hits 100 pages a second to download my content, and my database falls over

It's a blog. Blogs are not complex. Why is your blog's database so awfully designed that 100 pages a second causes it to fall over?

> leading to real, legitimate users being unable to use my site?

You assume that a scraper is not a legitimate user. I argue otherwise. If you don't want a scraper to use your site then put your site behind a paywall.

> What if it’s not a legitimate scraper but someone with hundreds of proxies uses them to DDOS my site for days?

If it's a network bandwidth problem, then a reverse proxy (eg, CDN) solves that.

> Should I sacrifice my uptime to protect the freedom of those unwilling to attest that they’re running on real hardware?

All software runs on real hardware. What is your exact question?

I am accessing this site in a virtual machine. I could be doing it with a headless browser. Why does that matter at all?


PoW captcha like mCaptcha. (It's technically not a captcha, for the pedantic.)


We have a chatbot that can send users screenshots of their CMS views (kanban, calendar, tables, gallery, etc) from inside of Slack.

The screenshotting uses puppeteer and chromium and a read-only session to impersonate the user and screenshot their dashboard.

It uses the old version of Chromium, and there were many gotchas that required a lot of extra scaffolding to actually render our and other websites like they would on my laptop. This will hopefully make it easier for us to maintain once implemented.
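
(A rough equivalent of that flow in Python/Selenium, just for illustration -- the actual stack described above is Puppeteer, and the URL and cookie name here are placeholders.)

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")
    options.add_argument("--window-size=1280,800")
    driver = webdriver.Chrome(options=options)

    driver.get("https://example.com/dashboard")  # placeholder dashboard URL
    driver.add_cookie({"name": "session", "value": "read-only-token"})  # assumed read-only session cookie
    driver.get("https://example.com/dashboard")  # reload with the session applied
    driver.save_screenshot("dashboard.png")
    driver.quit()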


If you add DRM video playback to the fingerprint, it is pretty much impossible to fake...

Either they have a real TPM with a real nvidia graphics card able to decrypt content with a real serial number... Or they don't...

If one graphics card or TPM serial number starts acting bot-like, you can ban just that one.


I browse with DRM disabled. Every time it gives me a notification about it, I view it as a "hah, fingerprinting avoided!" signal.

Sites that use it get my anti-traffic. I don't buy, support, or condone DRM'd media and I actively disable EME on every browser I come across...


> I don't buy, support, or condone DRM'd media

this is good, but it would also be helpful if you supported the anti-DRM movement. Some people have developed ways to get around certain DRM such as Widevine, from dumping your own CDM to Widevine proxies. Just ignoring the problem is not going to make it go away. Over the last two years DRM use for streaming content has increased significantly. If you want to really help, I would look into contributing code to these projects, or donations.


I'm not seeing how that doesn't help support DRM?

It does nothing to dissuade content gatekeepers from employing restrictive DRM on their sites.

Anti-DRM would be avoiding anything that gives money to those that employ DRM to incentivize the removal of the DRM. Frankly, flat out piracy (streaming ripped content) is more likely to result in the removal of DRM than making it appear that the DRM is working well for the provider.


[flagged]


We don’t want to deal with having to be forced into having specific hardware, operating systems, and browsers to watch content we paid for. I’ve had perfectly good monitors that were before HDCP was a thing, and these sites gimp the quality or outright refuse to play media because the monitor didn’t have some bogus technology.


Even as someone who isn't in the slightest interested in unauthorised copying of content, watching videos on anything which isn't VLC on my laptop is such a PITA that I never do it.


DRM has a huge impact on what I consume. For example only being able to watch Netflix at 720p due to running a *nix distro.


Good for you.

There are sites that commercially distribute DRMed video content; say, Netflix. They have a large audience, and they care, whether you and I like it or not.


Using Netflix as the example, Widevine L1 has very limited support on the desktop, i.e. Microsoft Edge on Windows and Safari on macOS.

All other configurations use L3 which is a shared key, e.g. provided by ChromeCDM as it runs entirely on the CPU - which is why Netflix content also works under Linux, albeit L3 is limited to 720p (or 1080p with browser extensions).

Given Chrome's massive browser market share, I'm not sure whether enabling DRM adds anything meaningful to the fingerprint - i.e. I don't think it's possible to revoke an L3 key without pushing out a new version of the CDM to all users of that browser, as has happened once before with Chrome.

FWIW I've tested that Widevine L3 decryption works using a "headless" docker container running Chrome. The only caveat to add is that Chrome must not be started with --headless, but you don't need a real GPU either, Xvfb works just fine.


I've never used Netflix (or other streaming sites like them) because of the DRM. Youtube manages to prove that a streaming model can be very, very profitable without it at all, as does BBC iPlayer.


YouTube uses DRM for licensed content like TV shows


How much of that audience is watching on a device without a video card? Almost none.


AFAICT, the server can avoid serving the DRMed content until the browser proves it has a legitimate DRM-respecting playback capability, which is designed to be hard to feign. That is, unless something like [1] is correctly implemented in the headless mode, DRM content won't be available anyway.

Am I missing anything?

[1]: https://developer.mozilla.org/en-US/docs/Web/API/Navigator/r...


What use case is there for accessing DRM video content using a headless browser?


Automated downloading of the content, I assume.


i love the contrast in these comments. on the one hand you have all the people arguing that headless chrome is unethical because websites need to be able to block bot traffic, and on the other you have actual humans saying they try as hard as they can to behave like a bot.


> ... actual humans saying they try as hard as they can to behave like a bot.

Blaming humans for desiring privacy is bad. No one here is "trying to behave like a bot".

Exaggerated example: "Oh, you don't want to show me, a random stranger on the internet, your ID? You are behaving like a crook!"


i'm not blaming people for wanting privacy. i'm just saying that if you value privacy, you can't also value blocking bots, because in order to block a bot you have to collect enough information to violate the real people's desire for privacy.

and there seems to be significant overlap between the people who think enabling bots is morally wrong, and people who think fingerprinting is morally wrong. if you value privacy, you have to value privacy for all web users even before you've collected enough data to determine whether that web user is a real person or not.


TPMs do not reveal a unique serial number or similar identifier by design for privacy reasons.

A TPM can attest that some measurements were done with it and it can attest that it comes from vendor X. You can block an entire vendor if they don’t behave but not individual TPMs via remote attestation.

You can use a scheme in which you can set up an "identity" on first use and then on next use authenticate the same identity. But that identity is kinda per use case.


I was under the impression that the EK could be used to identify individual TPMs- why can’t it?


I don't believe DRM fingerprinting is used in the wild. Firefox shows when DRM is being used (like Netflix) and I've never seen it used outside that.


Reddit's website uses DRM for fingerprinting - https://iter.ca/post/reddit-whiteops/


Maybe they changed their mind on that, because it does not show me any DRM usage as of now.


> If one graphics card or TPM serial number starts acting bot-like, you can ban just that one.

I don't think you can get the serial number, though?

(And if there was an API for this it wouldn't be a passive one, which makes it inapplicable for fingerprinting)


Also shutting out a lot of older and weird devices (internet fridges, dumb smart TVs, and more, plus many Linux and BSD users) that can't play DRM.

Some sites won’t care, but for some this will be too high a price for avoiding headless bots.


How does this work? Wouldn't a lot of real user-agents not have this capability and therefore not be able to be fingerprinted and banned in this way?


Can you report back the TPM serial number to the webserver?

If so, why isn't this used as an immutable ever-cookie that can't be deleted?


You can't, the parent comment has combined a few real world possible things into an impossible combination.


Why couldn't they just use a software TPM?


How do I set the new part of the headless flag in Python?

The article mentions that to use this you need to specify the --headless=new flag.

I know that to set the headless flag I can just use this code:

    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.headless = True
But how would I specify the new part of the flag/option?


There's a mention of this in the recent Selenium blog post https://www.selenium.dev/blog/2023/headless-is-going-away/#a...

Basically omit options.headless and use options.add_argument("--headless=new") instead.
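
Putting that together with the snippet from the question, a minimal sketch:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")  # instead of options.headless = True
    driver = webdriver.Chrome(options=options)
    driver.get("https://example.com")
    print(driver.title)
    driver.quit()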


The cat & mouse game continues...


PM working on Headless here. Masking bots is not the reason why the new Headless mode was created. The goal is to provide a headless browser that can be used in web tests. The original Headless is essentially a separate browser implemented in parallel to "proper" Chromium. That results in all sorts of subtle reproducibility problems for developers using Headless for their tests.


PM working on Private Browsing mode here. Watching pornography is not the reason why the new Private Browsing mode was created. The goal is to provide a Private Mode that can be used for Christmas shopping. ;)

In all seriousness, despite intentions, and I do love headless mode for actual integration tests with Webdriver, it’s no exaggeration to say that it is likely the single greatest avenue for bots and spam enablement across the entire internet, and imo is probably net Bad.


If it weren't for bots there would be no search engines, no internet archive, no WWW. Bots, and the tools for making them, are essential to the functioning of the web.


It seems more neutral to me. Yes there's a lot of spam and other types of malicious behavior, but I don't think it's good overall to try to eliminate web automation entirely to stop it.


A necessary evil for supporting an open and programmable internet (IMO).


> PM working on Headless here. Masking bots is not the reason why the new Headless mode was created.

Right. But it will be massively used just for that.


Yes, same as many technologies with legitimate uses. Tor is largely used for illegal activities, yet many would say the anonymity it provides for the general public is worth it being created (or the anonymity it provided for US intelligence).


I'm not chastising anyone for building a piece of cool tech, but it does seem like something of a holy grail for bots.


Nice to know you are the arbiter of what is and what isn't a "cool piece of tech"


I don't know whether you're illiterate or just maliciously misinterpreted what I wrote.


This is such good news to hear. Browser test automation was a pretty sore spot. I'm excited for your work.


> Masking bots is not the reason why the new Headless mode was created.

You might consider looking into some resources on Intent vs Impact (eg, [0]).

IMNSHO, anyone working in tech has a responsibility to consider what their creations can be used for, in addition to what they intend them to be used for. There's just too much potential for scalability of nefarious behavior to do otherwise.

[0] https://www.masterclass.com/articles/intent-vs-impact


Please reveal what you work on so I can publicly judge whether you have considered and properly chosen between intent vs impact or any other possible moral failings of your work as I see it.


I’m naive here but why would Chrome release a headless browser that makes it easier for bot developers to avoid detection?


This blog post is written that way because the guy works in the bot detection business so it's what he cares most about.

But there are still plenty of legitimate use cases for wanting a headless browser that perfectly replicates a normal browser environment. The obvious ones are automated frontend testing tools like https://playwright.dev/


Exactly. And as the blog post mentioned, people who have a strong need to block bots have tools other than browser fingerprinting at their disposal. Quoth the post:

> It’s important to leverage other signals such as:

>

> * Behavior (client-side and server-side)

> * Different kinds of reputations (IP, sessions, user)

> * Proxy detection, in particular, residential proxy detection

> * Contextual information: time of the day, country, etc

> * TLS fingerprinting.

Having a headless browser that behaves exactly like a normal one is tremendously useful for making things. And people who really *need* to block bots also need to contend with "mechanical turk" style attackers anyway. These techniques are also very useful against that approach, which still may be cheaper than making an undetectable bot even with a near-perfect Chrome fingerprint available headless.
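
As one concrete example of the TLS fingerprinting bullet above: a JA3-style fingerprint is just an MD5 over ClientHello fields. A sketch, assuming the ClientHello has already been parsed into lists of numeric IDs (the parsing itself is out of scope here):

    import hashlib

    def ja3_hash(tls_version: int, ciphers: list[int], extensions: list[int],
                 curves: list[int], point_formats: list[int]) -> str:
        fields = [
            str(tls_version),
            "-".join(map(str, ciphers)),
            "-".join(map(str, extensions)),
            "-".join(map(str, curves)),
            "-".join(map(str, point_formats)),
        ]
        return hashlib.md5(",".join(fields).encode()).hexdigest()

Two clients sending byte-identical HTTP requests can still produce different hashes if their TLS stacks differ, which is why it's a useful server-side signal.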


> * Behavior (client-side and server-side)

Imagine the number of false positives.

> * Different kinds of reputations (IP, sessions, user)

Almost everyone uses mobile networks right now. One IP can be stuck to thousands of users. Imagine the number of false positives.

> * Proxy detection, in particular, residential proxy detection

Most residential proxies are just common ISP IPs bought through a front or run by a botnet. Imagine the false positives for ordinary home users who get IP-range banned like on 4chan.

> * Contextual information: time of the day, country, etc

    script.execute(() => navigator.dateOffset = Math.random()...)
    script.execute(() => navigator.country = Math.random()...)
    script.execute(() => navigator.etc = Math.random()...)

> * TLS fingerprinting.

Imagine the number of false positives, especially because there are only about 4 common TLS fingerprints across browsers.

Just cope and seethe: your antispam filters will never work, antibot measures fail, Cloudflare Turnstile fails. Bots won, as usual.


We use a headless browser to load an internal webpage (with content that may be updated several times per day) and generate a pdf on-demand.


As a bot developer, without taking legal steps (I do not break the law) there is no stopping me regardless.


Based. Make companies cope further by scraping their prices and making your own prices 5% lower so customers buy from you. Their sites lag even on 32 GB RAM devices, with animations everywhere and zero optimization. This is the only way to compensate for it. Even if you have nothing to sell, you should still abuse their antispam filters.


Because none of the people complaining about headless bots (read probably: content and retail) are major stakeholders from Chrome's viewpoint.


Headless browsers are faster than normal browsers (no GUI), so your tests run faster.


Chrome sets navigator.webdriver to true when controlled by automation.

Until now, bots could simply use headful mode to achieve the same effect that is now made available through the new headless implementation.
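
A quick way to see that flag from the automation side, as a sketch (assumes Selenium and a local Chrome/chromedriver; the URL is a placeholder):

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    driver.get("https://example.com")
    print(driver.execute_script("return navigator.webdriver"))  # True under automation
    driver.quit()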


Are there non-headless browsers modified specifically to have extremely generic fingerprints? Hiding OS, GPU, fonts everything.


Firefox (and probably others) have fingerprint protection. https://support.mozilla.org/en-US/kb/firefox-protection-agai...


Any chromium based forks?


Brave.


Going by how upset the advertising/cyberstalking companies are that Brave is indistinguishable from Chrome, I think this may be the answer.

I don't like the way they pretend(ed) to send funds to websites using their cryptocurrency services, though. Good software, sketchy company.


+1. Also make sure to disable all cryptocurrency and "web3" related plugins for a pleasant experience.


Not a browser, but Arkenfox[1] hardens standard Firefox. It's not for everyone, though, and using something this specific can be a problem in itself.

[1]: https://github.com/arkenfox/user.js/


Tor browser (based on Firefox) seems to fit that bill.


In the end we come back to a full browser, and we have to emulate a mouse that does all the clicking.


We should assume anyone visiting a site without some kind of credentialed login is a 'bot'.

Or for all intents and purposes 'noise' traffic.

It'd be nice for the powers that be to develop an anonymous cookie standard to allow people to flag themselves as 'humans' without enabling the host to know anything about them.

We are fighting wars over problems that we have created for ourselves.


No one is gonna use your malware tracking cookies; I always block them, and Chrome, being a good browser, also detects such usage. Also, by European law, you can't force an "anonymous" cookie on anyone. We already lived through evercookies; now there aren't any evercookies.

If I ever detect that a site uses "anonymous" cookies without my consent, well, you will have to pay me a lot of $$ in compensation. Enjoy, try your luck, man. I need money anyway, and I love litigation.

Move on.


I am using the new headless Chrome for my Browser-Automation SaaS (PhantomJsCloud.com) and it is working great.

It fixes some nagging incompatibilities with certain websites. I don't bother with anti-bot mitigations, and I don't expect this to be useful in that regard. Commercial anti-bot doesn't care how much you spoof your browser fingerprint.

feel free to AMA


I just realized that I use your service to get pool monitoring data.

I configured everything (login & scraping) and started fetching data using your service. Then I discovered that while you have to log in to load the dashboard, their GET API only requires the serial number to fetch the data.


If you want to get billing details automatically (without logging in), just inspect the HTTP response sent back with every API request. It says how many credits are remaining, the API cost, etc.


Does PhantomJsCloud still run PhantomJS? How do you keep it up to date with security patches? Have you benchmarked how this new Chrome performs against the old headless Chrome and against PhantomJS?


No, running on Chrome now. I'm stuck with the unfortunate name. Chrome is much faster and well behaved than PhantomJs ever was. I offered PhantomJs as an option until not long ago, but nobody was using it.


Can you share the code for how to launch a new headless chrome?


If you are using Puppeteer, it's easy, just add "headless":"new" as described here: https://pptr.dev/guides/chrome-extensions

FYI, the new version just helps with compatibility; I have not seen it impact anti-bot detection at all.


I tried with Akamai and it still didn't work. Still need the stealth plugin and some additional tweaks to bypass it.


> navigator.plugins.length = 0

So any website on the Internets can know how many plugins my browser has? Ridiculous!


It would seem like no, in recent times at least. In recent browser versions (Chrome 94+, Firefox 99+, etc.) it's been changed to only report the default PDF plugins.

https://developer.mozilla.org/en-US/docs/Web/API/Navigator/p...


I wish I could automate some of my banking tasks. I tried but couldn't automate Chase, Citi or CapitalOne.

If anyone has a working script to login and perform simple task on one of these sites, please share it.


Last time, I was able to automate Chase by targeting their mobile site which, at the time anyway, had a dedicated URI. The mobile site was simple HTML and easy to scrape.


> the new headless Chrome can still be detected using JS browser fingerprinting techniques [...] however, the task has become more challenging [...] I’m not going to share any new detection signals

Any guesses?


In the bot detection methods I've seen so far on this, a large part of it is timing analyses where there is a significant difference between headed and headless, e.g. graphical operations, audio processing.
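
An illustrative probe of that kind, run here from the automation side just to show the sort of measurement a detector script would take in-page (the loop size, the URL, and any threshold a real detector would use are assumptions):

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    PROBE = """
    const c = document.createElement('canvas');
    c.width = 500; c.height = 500;
    const ctx = c.getContext('2d');
    const t0 = performance.now();
    for (let i = 0; i < 200; i++) {
      ctx.fillStyle = 'rgb(' + (i % 255) + ',0,0)';
      ctx.fillRect(0, 0, 500, 500);
    }
    return performance.now() - t0;
    """

    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    driver.get("https://example.com")  # placeholder page
    elapsed = driver.execute_script(PROBE)
    print(f"canvas probe: {elapsed:.2f} ms")  # a real detector compares distributions, not one number
    driver.quit()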


That, or making sure that the mouse really moved somewhere (in a sensible way) before the click occurred.


This would have false positives for some accessibility software, I believe


True, that's why you don't want to block the pageload on this signal alone, just use it to trigger a captcha.


It's pretty awful to make people who need accessibility software go through more captchas. Those are an accessibility nightmare.


Or even non-disabled people who typically browse using the keyboard only. Please stop sending users who you find inconvenient to captchas!


With Privacy Pass they won't see more captchas, they will actually see fewer of them.


That could be circumvented rather easily I guess, by using a non-headless (head-having? head-full? headed?) browser instead. And perhaps adding some random human-seeming delay in interactions.


Headed browser.

And maybe, but that will make enduser suffer more (as always), as more false-positives will be caught.


This is off topic but when did we get the ability to use spaces in URLs?


Browsers have automatically done the "correct" thing (converting to "bot%20detection") under-the-hood for years in my experience. I remember MS FrontPage-made sites with spaces in the name and IE would work with them.


in what sense? spaces as percent-encoded (%20) values have been around ever since I've used the web. those spaces are occasionally displayed as spaces in the url bar, depending on the context.


The best way to catch a robot is just to slap a captcha there. Everything else is kind of useless and not effective.


Getting captchas solved reliably via a service costs around $1 per 1000 captchas so captchas are kinda useless as well if there's a tiny monetary incentive to get to whatever is behind the captcha.


How is that accomplished? Real humans?


Depends on the captcha but there's many popular services that you can plug into your code through APIs for bypassing captchas (https://www.2captcha.com, https://anti-captcha.com). I think the hardest one is probably the invisible reCaptcha Enterprise.


Real humans, in places where $10 / day is reasonable money.


i'm not sure what the current state of the art is, but my favourite way of solving captchas was the porn vendors who were monetizing their sites by presenting captchas in front of their videos, but the captchas were captured from legitimate sites. so every time a person solved a captcha to watch a video, it enabled a bot to access a captcha-protected resource.


Having people solve CAPTCHAs for you seems more ethical to me than showing them ads - at least with CAPTCHA the cost paid (here: your labour) is clear to the user.




All captchas are soon going to be difficult to solve for humans and easy to solve for bots. Many already are. They also have terrible accessibility.


How do captchas work for blind people behind screen readers? I usually use a lot of keyboard strokes which seems to trigger a lot of captcha systems

So far, the play-audio options are kind of weird, especially if you're hard of hearing.


They sometimes have an audio version; unfortunately, that one is also used to bypass the captcha with audio recognition software.


[flagged]


This is a terrible notion.

> Accessibility is for everyone, including you, if you live long enough and the alternative is worse. So your choice is death or you are going to use accessibility features.

– John Siracusa

Also, making services accessible is not only the obviously right thing to do, but also the law here in EU.

https://en.wikipedia.org/wiki/European_Accessibility_Act


Offering up some of our strength, ability, and comfort to help others who might be less fortunate or whose qualities lie elsewhere is what makes us human, and it probably played a large part in getting us where we are today.

You might be part of a very minuscule group yourself, if this is really what you believe.

Our digital world will never be perfect, but allowing for everyone to at least be able to access and benefit from it is very much something we can and should do.


Educate yourself before writing such selfish nonsense.

https://www.sense.org.uk/about-us/statistics/deafblindness-s...


This is a terrible take. The more technology is integrated into society, the more we need to offer different avenues to access it. Otherwise we'll be excluding the differently-abled from many parts of society, and at some point we really should be able to put that behind us...


Captchas also tell apart the average human visitor from the very committed human visitor that really, really, really needs to do whatever they can do on your website.


Ha ha, good point. When presented with captcha I often decide I don't care that much and just close the page.


I do the same, but sometimes I wish I could give better feedback.

"Dear British Airways, I booked with SAS instead because you assumed a Linux user with Firefox was a bot."

(Or maybe it was the other way round, I forgot.)


They are also very good at distinguishing paying Google users, who get the fast-pass through Google captchas.


Is that a thing? What services do you need to buy to bypass recaptcha?


That means way more captchas after this release, yay


Why even bother?


> However, with recent progress in automatic and audio recognition, [detecting bots with captchas] has evolved

...and that's from 3 years ago

https://antoinevastel.com/javascript/2020/02/09/detecting-we...


Nothing stopped a Chromium fork from doing this earlier.


The game continues. Back in 2010 when I was writing the first in-browser bot detection signals for Google (so BotGuard could spot embedded Internet Explorers) I wondered how long they might last. Surely at some point embedded browsers would become undetectable? It never happened - browsers are so complex that there will probably always be ways to detect when they're being automated.

There are some less obvious aspects to this that matter a lot in practice:

1. You have to force the code to actually run inside a real browser in the first place, not simply inside a fast emulator that sends back a clean response. This is by itself a big part of the challenge.

2. Doing so is useful even if you miss some automated browsers, because adversaries are often CPU and RAM constrained in ways you may not expect.

3. You have to do something sensible if the User-Agent claims to be something obscure, old or alternatively, too new for you to have seen before.

4. The signals have to be well protected, otherwise bot authors will just read your JS to see what they have to patch next. Signal collection and obfuscation work best when the two are tightly integrated together (a toy sketch of the kind of signal involved follows below).
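
To make that concrete, here is a deliberately naive, unobfuscated sketch of the sort of browser-side signal collection being described; the specific property checks are illustrative examples, not what any particular product does:

    // Toy signal collection: each check is a weak hint, not proof of automation.
    function collectSignals() {
      return {
        // Automation frameworks are supposed to set this flag to true.
        webdriver: navigator.webdriver === true,
        // Old headless builds shipped with an empty plugin list.
        noPlugins: navigator.plugins.length === 0,
        // A window with no outer dimensions suggests there is no real display.
        noWindow: window.outerWidth === 0 && window.outerHeight === 0,
        // The advertised UA string, for cross-checking on the server.
        userAgent: navigator.userAgent,
      };
    }

    // A real system would obfuscate this, mix in decoys, and verify
    // the results server-side instead of trusting the client.
    console.log(collectSignals());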

These days there are quite a few companies doing JS based bot detection but I noticed from write-ups by reverse engineers that they don't seem to be obfuscating what they're doing as well as they could. It's like they heard that a custom VM is a good form of obfuscation but missed some of the reasons why. I wrote a bit about why the pattern is actually useful a month ago when TikTok's bot detector was being blogged about:

https://www.reddit.com/r/programming/comments/10755l2/revers...

tl;dr you want to use mesh-oriented obfuscation, and a custom VM makes that easier. It's a means, not an end.

Ad: Occasionally I do private consulting on this topic, mostly for tech firms. Bot detectors tend to be either something home-grown by tech/social networking firms, or these days sold as a service by companies like DataDome, HUMAN etc. Companies that want to own their anti-abuse stack have to start from scratch every time, and often end up with something subpar because it's very difficult to hire for this set of skills.

You often end up hiring people with a generic ML background, but then they struggle to obtain good enough signals and the model produces noise. You do want some ML in the mix (or just statistics) to establish a base level of protection and to ensure that when bots are caught their resources are burned, but it's not enough by itself anymore.

I offer training courses on how to construct high quality JS anti-bot systems and am thinking of maybe in future offering a reference codebase you can license and then fork. If anyone reading this is interested, drop me an email: mike@plan99.net


What are bots used for? I can think of a few reasons; I wrote a scraper/submitter myself in the '90s for a cooperative of subcontractors that was being forced to use an extremely sluggish web app by the big company that provided their gigs.

But I guess there are all kinds of purposes, some benign, some nefarious, and they presumably influence how the bots are operated and detected.


People are paying $500 for bots used to buy the latest Nike/Adidas/... limited edition sneakers. Or video cards a few years ago (for crypto mining).

It's a whole industry.

> If we consider a user base of ~175 users, and a minimum bot price of 200 euros (175 users x 200 euros), then the bot developers made at least 35K euros (~$37K USD) in initial bot sales.

https://datadome.co/threat-research/inside-sneaker-bot-busin...


Artificial scarcity in sneakers is their design decision. These shenanigans should have zero impact on browser policy.


I thought about building something like that for photographers to get gigs from large real-estate photography contractors who sub-contract the work to independent photographers. Automated tools would benefit the photographers greatly. The benefit comes at the expense of those not using automated tools, so the morality of such a tool is at least somewhat questionable.


"The signals have to be well protected, otherwise bot authors will just read your JS to see what they have to patch next. Signal collection and obfuscation work best when the two are tightly integrated together."

JS sounds like a bad match for this task. I perform similar checks from the backend with http headers and Python.

Is there a compelling reason to stick with JS despite the added complexity of obfuscation?

Edit: My use case is different than yours as it's part of a pid-free analytics application. However, bot detection is still an important component of that product.


If you're only relying on http headers, you're missing all but the most trivial of "bots". There are other things you could do with a backend-only approach, but if your code doesn't run on the machine the device actually connects to (e.g. you're behind a load balancer or other reverse proxy), those are largely unworkable.


"If you're only relying on http headers, you're missing all but the most trivial of bots"

Very true. Capturing, processing, and storing analytics data long-term is expensive. If I eliminate even 50% of that noise, the savings will be worth it.

I'm attempting to identify the bulk of bots with http headers and real-time session monitoring. I also have an unauthorized list (known bad actors) and an ignore list (search bots, etc.). It works pretty well but definitely doesn't begin to address the problem as a whole (from a security perspective).

It's an interesting and complex topic.
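
For what it's worth, here is a minimal sketch of that kind of cheap first-pass filter, written as an Express-style Node middleware rather than Python; the header heuristics, IPs, and list contents are made up for illustration:

    const express = require("express");
    const app = express();

    // Hypothetical lists; in practice these would live in a datastore.
    const blockedIPs = new Set(["203.0.113.7"]);        // known bad actors
    const ignoredAgents = [/Googlebot/i, /bingbot/i];    // search bots: serve, but don't count

    app.use((req, res, next) => {
      const ua = req.headers["user-agent"] || "";

      // Unauthorized list: refuse outright.
      if (blockedIPs.has(req.ip)) {
        return res.status(403).end();
      }
      // Ignore list: keep serving, but drop from analytics.
      if (ignoredAgents.some((re) => re.test(ua))) {
        req.skipAnalytics = true;
        return next();
      }
      // Crude header heuristics: this only catches the laziest bots, by design.
      if (ua === "" || /python-requests|curl|wget|libwww/i.test(ua)) {
        req.skipAnalytics = true;
      }
      next();
    });

    app.get("/", (req, res) => res.send("hello"));
    app.listen(3000);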


Re: your ad.

This sounds like a solid product / startup idea to me. I worked on spambot detection in a previous job and it's not at all trivial to solve. Though we were specifically interested in detecting the abusive use of bots, not bots in general, so I focused simply on detecting unusual resource consumption rather than fingerprinting.


There are startups doing this sort of thing already, the article is written by the head of research at one. But tech firms often like to have their own in-house stack with the source code.


Yeah, but non-tech companies, like Nike/Adidas as mentioned in other comments, will need this kind of bot-detection service.



What do you mean by a "mesh-oriented obfuscation"? My best guess is: serving a different subset of the VM detection code to each client?


There are lots of techniques that can fall under that heading. The idea is to tie your logic and obfuscation together so that the things you have to do to undo the obfuscation end up breaking access to other parts of the program. Using the output of hash functions as decryption keys is one well-known approach, but there are others.
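
A toy illustration of the hash-as-key idea (a trivial hash and XOR stand in for real primitives, and the protected string is just an example): the key for the next stage is derived from the exact text being protected, so patching that text silently breaks decryption.

    // Toy hash: a real system would use something much stronger.
    function toyHash(str) {
      let h = 0;
      for (let i = 0; i < str.length; i++) {
        h = (h * 31 + str.charCodeAt(i)) >>> 0;
      }
      return h;
    }

    // "Encrypt"/"decrypt" by XORing each char with bytes derived from the key.
    function xorWithKey(str, key) {
      let out = "";
      for (let i = 0; i < str.length; i++) {
        out += String.fromCharCode(str.charCodeAt(i) ^ ((key >> (8 * (i % 4))) & 0xff));
      }
      return out;
    }

    const check = "navigator.webdriver === true";     // code whose integrity we care about
    const key = toyHash(check);                        // key depends on the exact source text
    const nextStage = xorWithKey("console.log('stage two')", key);

    // Later: recompute the key from the (hopefully unmodified) source.
    // If someone patched `check`, the key differs and eval gets garbage.
    eval(xorWithKey(nextStage, toyHash(check)));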


Heh, I had a feeling you'd show up here. Hi, Mike :)


Long time no see mate :)


Why is my first reaction to the last part "oh no!"? It seems like something that would have more illegitimate/annoying use cases than good ones.


Can't you say the same about a real browser with Selenium driving it? It's been available for years; has it been hugely detrimental to anything?


It's not like spam farms can't use their own version of Chromium that already mimics a real browser. Relying on client-side indicators for your bot detection will only catch the bots that don't care about being caught in the first place. Show an alert that says "welcome to my site!" for any browsers originating from a data center and you've probably filtered most of those out.

I like automating menial tasks in shitty web UIs (e.g. clearing out a list of sessions/search history/ad providers that only allows removing a single entry at a time). Simply using Firefox also gets flagged by a lot of these shitty bot detection services. I've never seen them do any useful work.

The only exception is maybe reCAPTCHA or Cloudflare's alternative; that seems to be quite good at catching actual bots, but I do hate most websites that use them because in Firefox you end up clicking on boats twenty times. They're also trivially bypassed by delegating your spamming to click farms, as 1000 minimum wage workers in a faraway country can be cheaper than paying for dev time to work around the minor nuisances of bot detection.


> As you can imagine, given my position at DataDome (a bot detection company), I’m not going to share any new detection signals as I used to do

Here comes the sales pitch....



