CAPTCHAs are supposed to tell humans and robots apart. If robots are now doing better than humans, maybe we should flip the acceptance criteria and check that you are not performing the task at a superhuman level (until we train bots to do this, too). On an unrelated note, I have found captchas that don't even work and reprompt me all the time, I wonder if there is some naive filtering behavior they are applying.
Do CAPTCHAs in practice really serve to perfectly tell apart humans and robots, or just to thwart the laziest attempts at abusing a website? Everyone knows that a door lock, no matter how good, can be overcome if you are really determined. But the lock still thwarts opportunists. Yes, the goal when we first developed CAPTCHAs may have been to tell apart humans and robots. Realistically, though, if CAPTCHAs can still significantly reduce the traffic from bots, then companies will keep using them.
The difference with a door lock is, even if you are sufficiently determined and have a way of defeating particular kinds of door lock with 95% confidence, that doesn't mean that you can instantly break into 95% of houses that use that lock, because you are one person with a physical body.
If bots that can break CAPTCHAs become widespread, the volume of spam, scams, and other junk traffic is going to cause problems for many people and small websites.
For a long time now, CAPTCHAs have only made bots more expensive. There are plenty of (human-staffed) services that will solve your captchas for $0.001-$0.005 per solved captcha. Better AI lowers the cost, but it's not necessarily a massive change to the status quo.
Solving the puzzle is only one of the requirements for getting past captchas. For better or worse, most captchas use IP (subnet) based rate limiting and cross-site tracking as well, so the physical-body argument holds for captchas too. Try using the web from a free VPN or Tor and see how many captchas you can get past.
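To make the subnet point concrete, here is a toy per-/24 rate limiter (the class name, thresholds, and logic are my own invention for illustration, not any real CAPTCHA vendor's code). It shows why many clients behind one shared VPN or Tor exit quickly exhaust the budget for their whole subnet:

```python
import ipaddress
import time
from collections import defaultdict

class SubnetRateLimiter:
    """Toy sketch: count recent CAPTCHA attempts per /24 subnet."""

    def __init__(self, max_attempts=10, window_seconds=60):
        self.max_attempts = max_attempts
        self.window = window_seconds
        self.attempts = defaultdict(list)  # subnet -> list of timestamps

    def allow(self, ip, now=None):
        now = time.time() if now is None else now
        # Collapse the address to its /24; neighbors share one budget.
        subnet = ipaddress.ip_network(f"{ip}/24", strict=False)
        recent = [t for t in self.attempts[subnet] if now - t < self.window]
        self.attempts[subnet] = recent
        if len(recent) >= self.max_attempts:
            return False  # over budget: serve a harder challenge or block
        recent.append(now)
        return True
```

Distinct users on `10.0.0.x` all draw from the same budget, while a request from `10.0.1.x` starts fresh, which is roughly the "shared exit node looks botty" effect described above.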
Big Tech built their empires on scraping and stealing data (How do you think LinkedIn or Facebook got started?) But when we try to scrape them they throw a massive hissy fit, and so they put a lot of engineering into CAPTCHA systems to keep their data locked away. Eventually, the pinnacle of bots will be something that reads the framebuffer, and manipulates a mouse and keyboard to scrape websites.
This has got these companies freaked out, because all the founders know the dirty secret of their origin story, and that someone else can come along and do exactly the same thing to them.
Not necessarily. If the messages are analyzed, there's usually an intent (solicitation, scam, etc.), so you can filter based on the content, similar to email. The shorter lead forms without a message to analyze are a bit different, since you can only go by IP, email, or phone, but it's possible.
I wonder if part of the solution is for us to start designing the Internet for bots. We're entering a phase where websites are coded by bots, content of all stripes (text/video/audio/images) is generated by bots, and increasingly that fact doesn't necessarily mean that we want to block it.
The main tenets of apps that are resilient to bot spam are 1) scalability, so that they can handle huge quantities of traffic and bot-generated content, and 2) the ability to differentiate high quality from low quality content (regardless of whether it was created by a human). Ironically, AI is probably the solution to the second problem.
> On an unrelated note, I have found captchas that don't even work and reprompt me all the time
This is by design. The ones where you have to identify a bus, crosswalk, etc. are all used to train ML models. Your results are checked against other users' answers for the captcha, but sometimes you are the first person to see the image and there's no way to check your answer, so you'll always get served another.
Another smart thing is that they actually segment the picture by moving the squares slightly.
> This is by design. The ones where you have to identify a bus, crosswalk, etc. are all used to train ML models. Your results are checked against other users' answers for the captcha, but sometimes you are the first person to see the image and there's no way to check your answer, so you'll always get served another.
Do you have a reference for this? I wouldn't have thought a process like that would be needed nowadays for training ML models.
Training ML models has been the point of CAPTCHA services ever since Google bought reCAPTCHA in 2009. It was originally used to provide training data for OCR, but then switched to other data. https://en.wikipedia.org/wiki/ReCAPTCHA
And even though you can do a lot unsupervised these days, supervised labeled data is still really useful for training ML models (often in combination with larger unsupervised corpora).
It is actually a little different from what is noted above. I was told this directly by someone who worked on this project. I don't believe it is that secret. But this is how it works.
The captcha presents you with 9 squares. It selects an identification test at random (crosswalks, trains, buses, stoplights, etc.). For this example, let's say the identification test is to identify crosswalks. The squares are then filled as follows:
1) Two squares that pass the identification test at p < 0.05 (meaning the model is more than 95% confident each IS a crosswalk).
2) One square that passes the identification test at p < 0.01 (99%+ confident, effectively certain it IS a crosswalk).
3) One square that fails the identification test at p < 0.01 (it is almost certainly NOT a crosswalk).
4) Two squares that fail the identification test at p < 0.05 (more than 95% confident each is NOT a crosswalk).
5) Three squares with low confidence either way (p > 0.05).
The captcha then shuffles these 9 images at random and offsets each image a little by altering the crop slightly, to prevent memorization by bots. It then presents the 9 squares to the user, asking them to select according to the identification test.
The captcha scores the user based on their selection of the 6 known squares. The response you give on the 3 low-confidence squares has zero impact on you passing or failing the test. From what I was told, you must identify both of the 99%-confidence squares correctly (the one that passes the id test and the one that doesn't). That is a hard pass/fail. From there, the captcha compares your responses on the 95%-confidence squares to the expected values. It weighs that against other variables such as the speed at which you answer, the movement of the cursor, and other signals (such as selecting and deselecting). It also uses your IP address and Google session data as part of its determination of the likelihood that the user is human. My understanding is that it is moderately forgiving. If the user is determined to be human based on those responses, then your responses are fed back into the confidence intervals for all of the images presented (other than the two "known" squares). Data from users that fail the captcha is discarded, so it doesn't feed into the confidence metrics of the images presented.
From what I was told, you can actually incorrectly identify 2 squares and still pass the captcha. The IP address and mouse movement have a significant impact on the result, as does your ability to identify the two known squares.
Three of the squares are entirely unknown to the model. You are purely feeding the confidence on those images for future use in the CAPTCHA and other Google products. But there is no test where you are "guaranteed to fail" as mentioned above. Every test presented to you can be passed. There are 2 known squares which you MUST answer correctly. Your behavior, your computer data, and your answers on the mid-confidence squares are what further impact your pass/fail determination. The three unknown squares never impact your pass rate. They are filler; the captcha only watches how you interact with the filler squares, not what you actually answer.
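The scheme described above can be sketched as a toy simulation (this is my reconstruction from the comment; the labels, thresholds, and miss allowance are illustrative, not Google's actual code):

```python
import random

def build_grid():
    """Compose the 9 squares as described: 6 known, 3 unlabeled."""
    grid = (
        [("pos_95", True)] * 2      # >95% confident: IS a crosswalk
        + [("pos_99", True)]        # effectively certain: IS (hard gate)
        + [("neg_99", False)]       # effectively certain: is NOT (hard gate)
        + [("neg_95", False)] * 2   # >95% confident: is NOT a crosswalk
        + [("unknown", None)] * 3   # unlabeled: your answer trains the model
    )
    random.shuffle(grid)
    return grid

def score(grid, selections, max_95_misses=2):
    """selections: set of indices the user clicked."""
    # Hard pass/fail on the two near-certain squares.
    for i, (kind, truth) in enumerate(grid):
        if kind in ("pos_99", "neg_99") and (i in selections) != truth:
            return False
    # Forgiving comparison on the 95%-confidence squares.
    misses = sum(
        1 for i, (kind, truth) in enumerate(grid)
        if kind in ("pos_95", "neg_95") and (i in selections) != truth
    )
    return misses <= max_95_misses  # unknown squares never count
```

Note how the three `unknown` squares are simply never consulted by `score`, matching the claim that they only feed future confidence values.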
> But there is no test where you are "guaranteed to fail" as mentioned above. Every test presented to you can be passed.
Sadly, when a site with a captcha has decided to fail your every attempt, there is no feedback on why. You can request new sets as much as you want; you can submit perfect results as much as you want; and you can try the alternate methods offered (such as audio captchas) as much as you want. Typically, when this happens, the following don't help: incognito/private mode, toggling extensions, adding some less direct mouse movement, force-reloading the page. Occasionally, other browsers or other hardware (phone, tablet, desktop), switching between Windows/Linux/Mac, or changing the user-agent string might help.
To that point, when I do the picture captcha (Select the crosswalks type question), I always click a square I know is not valid and then de-select it. Adds some "human-ness" to the interaction and I never get a 2nd challenge that way. Will that be the future? Look for behavior that is too perfect?
I feel like the captcha already takes into account how quickly I select the pictures. If I spend the time to get it perfect, I end up with more challenges than if I just select quickly based on my instincts.
If you assume that most humans spend the minimum effort possible on their captchas, your gut response is also more in line with other responses than your well thought-out response. Even a matching based on the selected squares would pick up on this.
We do this with some "slide this puzzle piece into place" CAPTCHAs; my understanding is the detection is based on how you move the piece in. A computer would be linear in its movement, whereas a human with a mouse would move with some unevenness and/or slowness and/or inaccuracy.
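One crude way to picture that idea: sample the cursor positions at a fixed interval and look at the variance of the per-step speed. A scripted drag moves at near-constant velocity (variance near zero), while a human's has jitter. This is a hedged sketch of the concept; real detectors use far richer features:

```python
def speed_variance(points):
    """points: [(x, y), ...] cursor samples taken at a fixed interval."""
    speeds = [
        ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]
    mean = sum(speeds) / len(speeds)
    return sum((s - mean) ** 2 for s in speeds) / len(speeds)

# A perfectly linear drag vs. one with alternating jitter on the x axis.
bot_path = [(i * 5.0, 100.0) for i in range(20)]
human_path = [(i * 5.0 + (-1) ** i * 1.5, 100.0) for i in range(20)]

assert speed_variance(bot_path) < speed_variance(human_path)
```

A bot trying to pass would of course just add synthetic jitter, which is why this is only one signal among many.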
I think only some parts of a CAPTCHA are challenging you (data is labeled sufficiently for mistakes to be considered a failed solve), and the other parts are still in the process of being labeled, which you are helping to accomplish, so those have no influence on the immediate outcome.
At least that's how the old "type these two words" CAPTCHAs worked. It was crowdsourced human OCR of whatever text the machine OCR couldn't make sense of. I'm not sure if "find the bus/motorcycle/crosswalk/light" is the same way, but perhaps, and it does seem to offer leeway when there's only a few pixels of that item in the frame.
What’s sad is, I’ve started to predict how dumb other humans are at filling in CAPTCHA’s. Often there’s a piece of motorcycle, stair, traffic light, fire hydrant, or non-bus picture that users will mis-select, and I’ll do what I think the masses do.
I used to choose correctly but being sent through 5 chains of CAPTCHAs is modern hell.
I'm not certain that that's what is happening, but I strongly suspect it. It is if nothing else a useful mental model: switching my strategy from selecting every square that contains foo to selecting every square that I think that Joe Schmoe would select gets me through CAPTCHAs more consistently.
That seems to square with the content I'm frequently tested on: crosswalks and traffic signs for self-driving cars, and stairs for walking robots.
I've also noticed that captchas are getting more difficult. Is that because the AI needs to sharpen recognition skills or because that's needed for differentiating human from 'bot?
>CAPTCHAs are supposed to tell humans and robots apart,
Nowadays, it seems the kind of captchas I get when under suspicion of being a bot are simply there to delay. Especially google captcha with their extremely slow fade out box selections.
>>“Furthermore the bots’ solving times are significantly lower in all cases, except reCAPTCHA, where human solving time of 18 seconds is nearly similar to the bots’ time of 17.5 seconds.”
This is currently an obvious tell for the standard CAPTCHAs, as you mentioned,
"performing at a superhuman level". However, it's an easy bot behavioral fix, so.... what is the next step? Offer a game of chess and look for a human the playing style? Seems you'd have to offer a menu of games, but not "Global Thermo...."
> On an unrelated note, I have found captchas that don't even work and reprompt me all the time
Modern captchas do a lot of background work regarding how human your inputs look, whether you have a human-seeming fingerprint, etc. Often when I'm behind a VPN and on Linux I have the same issue, because my setup simply looks "too botty" no matter how good I am at telling which squares have a firetruck in them.
> On an unrelated note, I have found captchas that don't even work and reprompt me all the time, I wonder if there is some naive filtering behavior they are applying.
Curious, do you use a VPN or Tor? Either of those will cause CAPTCHAs to make you solve multiple puzzles.
The only site I visit that has ever been annoying with CAPTCHAs is PCPartPicker.
This is an interesting take, and I've seen something similar done for determining whether someone is cheating at chess. If a player's moves match the engine's moves too often (85%+, I think), then the player is considered to be cheating.
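That heuristic is simple enough to sketch; the 85% threshold below is the figure mentioned above, not an official one, and real cheat detection weighs move difficulty, rating, and much more:

```python
def engine_match_rate(player_moves, engine_moves):
    """Fraction of moves that coincide with the engine's top choice."""
    matches = sum(p == e for p, e in zip(player_moves, engine_moves))
    return matches / len(player_moves)

def looks_like_cheating(player_moves, engine_moves, threshold=0.85):
    return engine_match_rate(player_moves, engine_moves) >= threshold
```

A player who mirrors the engine on every move trips the threshold; a player with a couple of "human" deviations per game does not.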
What it says is that one viable way to get rid of these stupid things may involve ADA lawsuits. Hopefully anti-discrimination lawyers are paying attention. Google has some pretty deep pockets, as do many websites that employ CAPTCHAS.
Failing that, we can't be more than a couple of years away from generalized solvers that can simply be implemented as browser plugins, at least on the desktop. The job of coming up with a fair, equitable and non-discriminatory test that only humans can pass is going to be an impossible one.
The Buster captcha-solver extension[^0], along with the service on your computer that moves the mouse for you, works very, very well. It uses Google's speech-to-text (afaik) to transcribe the audio captcha: Google verifying that another Google service works well. I find it very satisfying to not provide my labor to train Google's computer-vision corpus, but instead have the snake bite its own tail.
Anyway, I highly recommend buster, I barely notice captchas anymore with it.
The only thing it seems that CAPTCHAs are good for these days is to discriminate against VPN users. There are several sites (including govt) that put me in an endless CAPTCHA loop.
I can't use several sites with Mullvad combined with FF and a couple privacy extensions/settings. Using mullvad also gets your comments auto-shadow-banned on reddit and your posts on craigslist immediately deleted.
me too but honestly there seem to be fewer (at least cloudflare) ones past few months. Not sure why, could just have gotten less annoyed with them and accepted it. Hard to say I guess, humans are so context oriented.
I can’t use ChatGPT, period. Their shit’s completely falling apart. The iOS app has managed to degrade from one of the smoothest apps I’ve used to a literally unusable steaming pile of crap in a matter of weeks. Honest question, are they replacing their devs with an LLM?
I completely forgot there was an ios and android app, I've always used the browser to use it and to be fair the browser is always open so it's convenient.
This is why techies will not be able to defeat WEI. It will be the replacement for CAPTCHA. Every place using a CAPTCHA now will switch to requiring cryptographic attestation that you're running a clean copy of FAANG stack on approved hardware. The only effective non-human bots will be physical robots using the analog hole.
It will get bypassed easily. The lengths spammers will go to for "organic" advertising that is automated doesn't really have limits. Fine, add WEI, spammers will write code that literally issues native events to a real browser on a farm of devices. I actually saw this discussion in the WEI proposal GitHub regarding Chrome extensions. What gets a pass?
Captcha was never really the limiting factor for spammers. It was generating human readable content at scale (comments, reviews, responses, etc). But with AI that's not an issue anymore.
For now. But we're slowly walking towards the future where you can only consume web using an official and conformant browser running on a locked-down platform. That'd make it considerably harder for spammers, but of course spam isn't what this is about, it's more like a cherry on top.
I think it will be more of a nationalized identification which is scary. It's the only way I can think of to actually get rid of spam at a "low entry barrier"
I see it like everything else we use for communication. Still junk mail, still robo calls, still spam texts. And you will still get spam under your nationalized id. But YOU(the individual) won't be allowed to do the same.
Edit: on a brighter side with high capacity drives being cheap and various wireless protocols sneaker net might be a thing. Your info won't be up to date but maybe that's not so bad, no doom scrolling. Beware the honeypots though.
Yes, once you get to needing a fake passport, or other equally stringent ID, from a list of authorized countries, the price per spam bot will basically be too high for any viable commercial spam.
This might actually be a pretty enjoyable experience for a lot of users, if every user can be assured that every communication is actually from a real person, a real organization that can be sued, or at least a fake personality that someone put tens of thousands of dollars behind.
For E2E testing you'll have to allow Selenium or something else to drive the browser, and they'll just figure out how much they can disable in a browser to pack as many requests per second onto a given amount of hardware.
It only works if the cost of a FAANG stack is high, and the cost of tech is always decreasing. If you think you are going to protect things with a FAANG stack backed by cryptographic hardware attestation, you will be beaten by the next guy who never needs to pay that cost.
There is a knee-jerk aversion to the word blockchain on hackernews, but crypto folks have been thinking a lot about decentralized sybil resistance. If the choice comes down to WEI or a blockchain based solution, how would you feel about that option?
That’s a succinct description of what seems to be the current state of most adults I know.
It’s clearly burning, nobody seems to be in charge, there’s little hope that anyone is going to fix it, and individual actions seem like shouting into the wind. Even worse, the problem is undefined and hard to measure - and different groups have different complaints and priorities.
Or, said another way:
“That pretty much sums it up for me” - drunk guy in the bar in the movie Groundhog Day
I’m looking! Last year I started actively looking for pro social friends to - let’s say - do a “security and firmware update of my social network.” It’s been successful so far though not perfect or easy
Nearly all the failures of the internet come down to advertising. The surveillance state was built to sell you advertising. Spam isn't exactly advertising but it's not not-advertising either. Captchas are built to protect sites from being viewed without making advertising impressions, usually so the content can be stolen to sell to other advertisers.
We need a new foundation. I don't know what it looks like, exactly, but I think it's going to have to be built around micropayments.
Obviously there will be a satellite that zooms in on your GPS location; you'll have to do a special dance, and your humanity will be verified by motion detection. Obviously robots will eventually be able to do this, and dance better than humans, as well as wear fake skin, but that's a problem for another generation.
One solution was the universally hated worldcoin. I hate it as well and don't think it could work, but the problem they meant to tackle is very real and we should talk about it.
I think we are going to need some kind of certificates vouching for humans being humans.
I wonder if there could be a service which you use with your real identity registered, but the certs it gives you are untracked to still allow for possibility of being anonymous. Although it seems like it should limit it to giving you only a few certs a month. But then bad actors could start buying those certs from people themselves. Still would make it much harder to spam requests.
I could totally imagine Verisign or something doing this. Imagine a world where you have to pay a yearly renewal fee to keep your humanity. They’d love it. Of course, it’s not a claim that the user is a human, but rather the user has a slightly difficult to obtain certificate that had to be requested by a human.
> I wonder if there could be a service which you use with your real identity registered, but the certs it gives you are untracked to still allow for possibility of being anonymous.
This is possible with the German identity card. (I have no idea, but my guess would be that this is possible with other identity cards from Europe or around the world too.) You can even do age checks without transmitting the actual birthday: it just returns whether the card holder is at or above the age in question. You can't cross-reference those certs with other sites.
It really grinds my gears when an identity check wants a video call or whatever instead of using the eID features.
> a service which you use with your real identity registered, but the certs it gives you are untracked to still allow for possibility of being anonymous.
The best you can do is pseudonymity. If the attestations are completely anonymous then you have no way of dealing with bad actors, like those who sell their digital identity to spammers.
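For what it's worth, the usual cryptographic building block proposed for "register your identity once, spend unlinkable tokens later" is a blind signature, as in Privacy Pass-style designs. A toy illustration with deliberately tiny, insecure RSA parameters (demonstration only):

```python
from math import gcd

n, e, d = 3233, 17, 2753  # toy RSA key (p=61, q=53); offers no security

def blind(msg, r):
    """User blinds the token before sending it to the issuer."""
    assert gcd(r, n) == 1
    return (msg * pow(r, e, n)) % n

def sign_blinded(blinded):
    """Issuer signs without ever seeing the underlying msg."""
    return pow(blinded, d, n)

def unblind(sig_blinded, r):
    """User removes the blinding factor, yielding a normal signature."""
    return (sig_blinded * pow(r, -1, n)) % n

def verify(msg, sig):
    return pow(sig, e, n) == msg % n

msg, r = 123, 71
sig = unblind(sign_blinded(blind(msg, r)), r)
assert verify(msg, sig)  # valid, yet the issuer never saw msg
```

The issuer can rate-limit how many tokens one verified identity receives (say, a few per month), while being unable to link a spent token back to the person, which is exactly the pseudonymity trade-off being discussed.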
> One solution was the universally hated worldcoin. I hate it as well and don't think it could work, but the problem they meant to tackle is very real and we should talk about it.
No, it is not a solution. What prevents me from "selling" my biometrics to someone to be used for nefarious purposes? Or are you proposing putting an orb in front of every computer and not allowing access to the web without scanning the user's iris every time they type a web address? Even if such a dystopia were implemented, we would end up with a situation where a bunch of real people in third-world countries get paid $2 a day to sit in front of a computer playing games all day while getting their eyes periodically scanned, to allow a spammer botnet to browse the web.
> I think we are going to need some kind of certificates vouching for humans being humans.
Who are "we"? Me and probably absolute majority of other users are perfectly fine with the way the Web currently works and don't care about the cost of fraud for Google's ad empire.
Ah, yes, I absolutely trust Sam Altman to keep all of my personal data, to never resell it to people I have absolutely no idea of. Just like I absolutely trust him to not completely manipulate worldcoin and generate tokens that will be given to bots. He's totally not in a position where he creates both the problem and the solution.
A bunch of news sites try that, but every link to them on HN (an Internet tech community you'd think would have a lot of people especially sympathetic to merit-based monetization of digital content) quickly gets a comment with a piracy link.
I've always felt one of the main reasons for this is that we never really got true microtransactions on the web. Pretty much all payment processors charge at least $0.20-30 per transaction.
My theory is that if there were a way to frictionlessly pay, say, $0.02 to access a piece of content ad-free, most people would be pretty okay with it. The key part is making the transaction frictionless: no more than 1 click / 1000 ms.
What's the challenge in having this already? There could be an aggregator service that all the news sites join, which pays them out based on your visits. Is the problem that it would be hard to get all news sites or content providers to accept a single universal service?
They'll never do this because it dispels the illusion of independent media. It's the same reason cable news buys and shows ads: they want you to think the rest of the content isn't for sale. The truth is that the news is paid for by people who want to control the narrative, whether it's Logan Paul or Uncle Sam.
Old Flattr was something along those lines. You had to click, though. (Like the old embedded Facebook Likes. Are they old? Haven't seen them in a while; could be because of uBlock, though.)
I really liked it. A few blogs, podcasts, and the KeePass homepage had it. Then they sold to Chinese investors and pivoted to god knows what.
It could work if publisher accounts were billed by the payment processor every month instead of every transaction, with the processor charging a percentage of the monthly transaction volume.
I feel this is a business negotiation issue rather than a tech issue. But then I may be just clueless.
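A back-of-the-envelope comparison shows why batching matters. Assuming a typical card-processing fee of $0.30 + 2.9% per charge (illustrative numbers, not any specific processor's rates) and $0.02 articles:

```python
def per_article_fees(articles, price=0.02, fixed=0.30, pct=0.029):
    """Total processing fees if every article is charged separately."""
    return articles * (fixed + price * pct)

def batched_fee(articles, price=0.02, fixed=0.30, pct=0.029):
    """Total processing fees if a month of reading is charged once."""
    total = articles * price
    return fixed + total * pct

articles = 200  # articles read in a month, i.e. $4.00 of content
print(per_article_fees(articles))  # fees alone dwarf the $4 of content
print(batched_fee(articles))       # one fixed fee; overhead is negligible
```

Charging per article, the fixed fee alone is 15x the article price; aggregated monthly, the overhead drops to about ten percent of the content cost.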
Yeah, I'd like to see a really good microtransactions implementation.
Including some way to keep that from marginalizing people in the current very inequitable capitalist environment.
A non-technical challenge is that you'd need someone to lead this in good faith, and they'd need both principles and clout. The first 5 candidates I thought of just now seemed much better candidates in the past, than currently.
I think most in the tech space tacitly believe that if Alice has a cool string of 0s and 1s, but you have to pay $5 to go into her house to see it, and Bob happens to memorize the string of 0s and 1s and decides to display it publicly for all to see on a billboard, that this is an extremely based and good thing to do and prevents Alice from holding back human progress.
People who don't believe this are probably biased because they are Alice.
I was thinking our tacit belief is usually more "I want that, but I also want to keep my money. whynotboth.gif?"
When challenged, rationalizations might come out.
Circumstantial evidence of this is that we really don't see a lot of clearly altruistic concern for human progress... on other topics.
If we mostly only talk about principled stands when it happens to be very convenient in a selfish way, the selfish way seems a more likely explanation.
For me, the issue is that I don't want to subscribe to the NYT because I never explicitly go to the NYT. If I'm linked to an article there, I'll read it, but subscribing on the off chance that I'm linked to an article there isn't a good reason.
If there were a way to give two cents for each article I read, I'd do that, but there isn't.
That they can't make enough money from contextual advertising with literally the entire internet linking to them all the time is a spectacular failure.
Probably a failure of the advertising model as a whole, I think. I wonder if the incentives would align much better if we had a micropayments provider.
I agree this should be a thing. Or imagine a new distributed internet where every request is an extremely extremely small micro transaction / mining event.
I can't pay for a single article; I have to subscribe. It is intentionally hard to cancel subscriptions. There are enough website data breaches that giving my credit card number to a random subscription site is an unpleasant risk.
I happily pay for Youtube Premium so that I can listen to whatever music I want anytime. Spotify and other services are fungible with that. I sort of happily pay for Netflix, but their catalog is shrinking. I would happily pay a subscription for access to all or at least ~99% of newspapers.
Whatever music did right (ASCAP, BMI) isn't perfect but it's miles ahead of the trash subscription schemes anywhere else.
Yeah this is a super good idea. I would pay $5 a lot of the time to access these just so I don't have to deal. Would have to accept paypal and I'd do it.
I think the moral dilemma is a completely artificial one. You have to do a lot of bending over backwards in your head before you come to the extremely weird conclusion that knowledge can "belong" to a particular person or entity, and the even weirder conclusion that even if you pay for access to the knowledge, you still don't get to share it as you see fit. This is just capitalism-brain, nothing more. It's not natural, and it's definitely not optimal as far as moral maxims go.
That modern society is structured around protecting Alice rather than protecting Bob and helping him free the 0s and 1s is simply a side effect of this, and an extremely unfortunate, progress-stalling one... the kind where you seriously contemplate going back in a time-machine to fix whatever went wrong to make us end up with.. this...
I agree, it's so weird that people who devote time, effort, and resources to the production of articles or books somehow think they're entitled to be compensated for the use of those products! What a bizarre belief! I mean, obviously they should just give them away for free. Not like they need money to survive or anything. They can just go on food stamps and live under a bridge or something.
It really is. Society should be structured such that these people (and really, everyone) can just work on their creative works or whatever they are good at without having to worry about where their next meal is coming from or whether their loved ones and dependents will be provided for. That we have to capitalize and monetize everything instead of just distributing resources fairly is bizarre. If we were to simulate thousands of different realities, I would think this state of being would be like a shitty local minimum that you tweak the hyper-parameters to avoid.
As a creative person currently making an obscenely high amount of money doing rust development (north of $350k/yr), I'd happily throw all of that away to just be able to work on my open source projects in perpetuity if I could trust that society will take care of me, forever, in exchange. In fact, playing this whole capitalism game is a huge waste of my time and energy that I'd much rather spend making creative works without worrying about how I'll monetize them.
As someone who makes creative works, I don't _want_ to have to charge people to use/access/enjoy them.
We finally invent a thing (the Internet) that will let us share knowledge for free, and one of the first things that happens is a bunch of lawyers invent more work and job security for themselves by creating this fantasy notion that you can have Imaginary Property and they call it "Intellectual Property" so it's not immediately obvious how ridiculously selfish and society-retarding it is.
We live in a world of actual scarcity and they invented some artificial scarcity to benefit themselves exclusively, then immediately funded a bunch of bribery^H^H^H^H^H^H lobbying to make it illegal to share information for free.
>As someone who makes creative works, I don't _want_ to have to charge people to use/access/enjoy them.
Same here. I self published some stories on amazon and they won't let me charge less than a dollar for them, or I would. I'm infinitely more interested in people enjoying them than profiting from it. In fact, I found that they almost immediately ended up in some pirated torrent and was like, "How cool is that? Somebody thought it was worth pirating."
This notion that creative types won't create without financial incentive seems to come from lawyers, not creative types.
> We finally invent a thing (the Internet) that will let us share knowledge for free, and one of the first things that happens is a bunch of lawyers invent more work and job security for themselves by creating this fantasy notion that you can have Imaginary Property and they call it "Intellectual Property" so it's not immediately obvious how ridiculously selfish and society-retarding it is.
I think your narrative has gotten the relation between the invention of the internet and the creation of the subcategory of intangible personal property known as “intellectual property” very, very wrong.
Like, intellectual property is older than the USA and the internet is... not.
I get infuriated when a site wants to limit everything to its subscriber base but also wants to clog up my public search results.
No. Get out of my search results.
Pinterest, for example, should NEVER appear in my links or search results. If the New York Times wants to appear in my search results, then that article should be free otherwise GTFO.
Google used to heavily penalize sites that pulled this trick, but then gave in because it interfered with their ad revenue.
This is the main issue I have. I actually maintain paid subs to Washington Post, NYT, plus a local major newspaper. This should be more than enough to cover my share of news reporting costs. Yet if I’m linked to the Atlanta Journal or whatever, I’m expected to subscribe? Really? Every paywalled newspaper should support roaming. If museums can do reciprocal benefits, at WORST there should at least be reciprocal benefits among the roughly 1,000,000 Digital First Media newspapers, and within other large conglomerates. And ideally they should find a way to let a subscriber consume a reasonable amount of articles across a larger network. The higher the subscription tier paid, the more articles. Share a few cents for each article viewed.
Seems strange that the bots are allowed to read content for free, while the humans can't, but also the humans shouldn't read the bot's reposting of the content for free.
It seems to me that if they really don't want humans reading things for free, they shouldn't give it away to the (human owned, operated, and designed) bots.
No, they aren't piracy links. These are where the news site makes its content available to scrapers and then paywalls humans. This is where the news site is trying to take advantage of search engines to get you to click on their SERP, and then make you pay. In a lot of cases you can read this content by proxying off a non-dynamic IP range and changing your user agent.
> These are where the news site makes its content available to scrapers and then paywalls humans. This is where the news site is trying to take advantage of search engines to get you to click on their SERP, and then make you pay. In a lot of cases you can read this content by proxying off a non-dynamic IP range and changing your user agent.
okay... so they're piracy links is what you're saying?
Bots can do credit card transactions today, generate selfies that would fool automated detection (unless you have humans in the loop, in which case bots can generate selfies more cheaply than humans can verify them, and DDoS you that way), and pretend to be friends once 5 real people have vouched for them. Having an older Gmail account is going to do a good job of making sure only the millennials and their botnets get in today, but plenty of real customers in most domains will be blocked too.
All of these might work today just because they aren’t widely used yet. Once they are and are understood they’ll be cracked.
The government OAuth option is maybe the most interesting; it probably just kicks the can to a government site, which is going to have all the same problems, but could potentially benefit from a secure national ID. But now the government has a database of all the sites you are logging into; forget privacy of library records, that's a massive loss to the 4th Amendment in practice. No need for PRISM or other NSA exploit nonsense when sites are literally pinging a government server to verify citizens' network activity.
Why? Put a signing key pair on a secure element (e.g. in the passport). Let the government sign the public key. This way sites can check if it’s a government-approved keypair. You can prove that you own the key pair using the private key. No site ever has to contact the government for authentication.
The only issue is that it could lead to cross-site tracking because you are reusing the same public key everywhere.
They do! The field of zero-knowledge cryptography targets exactly this sort of use case, proving things to a recipient without revealing other things.
In this case, you might send a zero-knowledge proof that you have the private key corresponding to a government-signed public key, without revealing either key and, importantly, without communicating with the government or sending correlatable information on each transaction. (If you were ok communicating with the government you could just get it to sign a random public key for each transaction without needing zero-knowledge cryptography).
You can do a lot more, for example proving "I am 18+ [attested by gov signature]" without revealing your age, birthday or identity, or "I have security clearance level 2 [attested by gov signature]" without revealing anything else, or "I have a driving license [attested by gov signature] and current paid-up insurance [attested by insurer signature]" etc.
The field has advanced technically a lot in the last few years, to the point that those sorts of zero-knowledge proofs are now easy to implement technically and reasonably fast to compute.
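As a toy illustration of the simplest building block, a non-interactive Schnorr proof (Fiat-Shamir) lets you prove knowledge of the private key behind a public key without revealing it. This is only a sketch of the idea, not the full government-signature scheme described above: the modulus and generator here are illustrative choices, and a real deployment would use a vetted group and an audited library.

```python
import hashlib
import secrets

# Toy non-interactive Schnorr proof: prove knowledge of the private key x
# behind a public key y = g^x mod p, without revealing x.
# Parameters are illustrative, not a production group choice.
p = 2**255 - 19          # a prime modulus
g = 2                    # a generator of a large subgroup

def keygen():
    x = secrets.randbelow(p - 1)          # private key
    return x, pow(g, x, p)                # (private, public)

def prove(x, y):
    k = secrets.randbelow(p - 1)          # one-time nonce
    r = pow(g, k, p)                      # commitment
    c = int.from_bytes(hashlib.sha256(f"{g}:{y}:{r}".encode()).digest(), "big")
    s = (k + c * x) % (p - 1)             # response binds nonce + key + challenge
    return r, s

def verify(y, r, s):
    # Recompute the challenge and check g^s == r * y^c (mod p).
    c = int.from_bytes(hashlib.sha256(f"{g}:{y}:{r}".encode()).digest(), "big")
    return pow(g, s, p) == (r * pow(y, c, p)) % p

x, y = keygen()
r, s = prove(x, y)
assert verify(y, r, s)    # verifier is convinced, learns nothing about x
```

The "age over 18" or "government-signed key" statements from the comment above require richer proof systems (e.g. zk-SNARKs over signature circuits), but the verification pattern is the same: recompute a challenge, check one equation, never contact the issuer.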
The complete solution is a trusted third party that verifies you in person. In most countries that would be the government; in the US it could be a private entity like a bank.
Then websites rate-limit each authenticated person.
For sure, and you could probably build off of that by just having a digital identity tied to the passport. But what I was getting at was that people freak out whenever anyone discusses a national identity card.
Yes, and my point was that a national identity card has existed in the US for many years: it's called a passport. So people are freaking out for no reason.
Also, it could be made opt in so the people freaking out would have nothing to object to, and then whoever wants to participate, can.
You can architect it in a way that lets the person choose what they want to share with third parties. The most basic is just: this is an actual human. Then make it illegal for companies to share tracking data based on your passport ID with anyone.
You can also architect it so that the government doesn't know who is trying to authenticate you.
It does prevent you from making multiple accounts with the same company though.
Isn't this what WeChat does? On the mainland you have to give them your bank account information, have a friend vouch for you (and that friend will also be kicked off if you misbehave), do it on a real phone that you can shake in a specific way when asked, and have a government ID.
This approach is used by WeChat: you can't make your own account and can only be invited by an existing user. I suspect that WeChat has reached enough market penetration that this is now worth it.
I have a feeling other humans verifying you mechanicalturk style one time is the future.
But imho, the payment route makes sense, and this is also the problem with random paywalled sites and web monetization. A cash-equivalent bearer-token payment method is what is needed.
1) verify+lock funds
2) the bearer pays the funds, or the funds unlock in x minutes, cancelling the payment
3) token is presented to payment processor who will revoke the token, issue funds to payee with a new token matching that value paid
4) anyone holding that token can pay for stuff; you can just email tokens, move them with USB drives, or store them in a secure vault service
I believe the main obstacles to solving so many web issues are KYC laws and the lack of constitutional-amendment-level laws that give people the right to trade using a bearer token (cash or not) and transfer funds peer-to-peer without a third party or disclosing their personal info (again, just like cash).
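The numbered flow above can be sketched as a tiny in-memory model. `Processor`, `issue`, and `redeem` are hypothetical names invented for illustration; a real system would need signatures on tokens, durable storage, and double-spend protection rather than a Python dict.

```python
import secrets

class Processor:
    """Toy model of the bearer-token payment processor described above."""

    def __init__(self):
        self.tokens = {}                       # token -> unspent value

    def issue(self, value):
        """Step 1: lock funds and mint a bearer token for them."""
        token = secrets.token_hex(16)
        self.tokens[token] = value
        return token

    def redeem(self, token, amount):
        """Step 3: revoke the presented token, pay the payee, and
        return the remainder as a fresh bearer token."""
        value = self.tokens[token]             # KeyError if revoked/unknown
        if amount > value:
            raise ValueError("insufficient token value")
        del self.tokens[token]                 # old token is now revoked
        change = self.issue(value - amount)    # step 4: change stays bearer-held
        return amount, change

proc = Processor()
t = proc.issue(100)                 # step 1: lock 100 behind a token
paid, change = proc.redeem(t, 30)   # steps 2-3: pay 30, get change token
# 'change' is a new bearer token worth 70; the old token no longer redeems
```

The key property the comment is after is that `change` is just a string: whoever holds it can spend it, whether it arrived by email or on a USB drive.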
Probably <1 year away from reliably beating automated detection of this, and 2-3 years from economically fooling humans a high percentage of the time. I was at SIGGRAPH recently, where I saw what seemed like a whole room of papers on exactly this problem of photorealistic faces saying arbitrary things.
If you're asking people to perform arbitrary actions on camera (not just speaking), generating that to a degree that is indistinguishable from video in real-time is much more than 2 to 3 years away.
It also doesn't scale, in part because, without extensive internal controls, it is a major abuse vector in the other direction and will dissuade legitimate users, so it's of extremely limited potential utility.
No, it doesn't, but its existence was the premise of the discussion. I do think something like this could happen if trusted third-party identification services start to pop up, but only as a one-time verification with the option of a one-time in-person verification.
Depends on how badly the given service wants to keep out bots as to whether they'll start requiring some kind of guaranteed identity check.
Who’s verifying it though? 3D rendering won’t fool humans but could be tuned to fool bots.
Unless you have a massive library of possible actions these can be pre-rendered too.
The OP you responded to said "I have a feeling other humans verifying you mechanicalturk style one time is the future."
I don't know if that's feasible, but that's the premise.
>Unless you have a massive library of possible actions these can be pre-rendered too.
Combinatorial explosion gives you a massive number of actions. Hold your right ring finger between your left thumb and pinky and move it around in a circle.
There are tools you could use to limit what you had to pre-render, and it would be a cat-and-mouse game like CAPTCHA is today, but I think until we're at the point where you can completely replace an actor, and do it in real time, picking out a real human will be possible.
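The multiplication behind that combinatorial explosion is easy to see. The choice sets below (fingers, grips, motions, speeds, directions) are made-up examples, but even these five small vocabularies yield thousands of distinct prompts, which is what makes exhaustive pre-rendering painful.

```python
from math import prod

# Hypothetical prompt vocabulary for an "act this out on camera" challenge.
fingers = 10      # which finger to move
grips = 10        # which pair of digits holds it
motions = 8       # circle, figure-eight, tap, wiggle, ...
speeds = 3        # slow / normal / fast
directions = 2    # clockwise / counterclockwise

combos = prod([fingers, grips, motions, speeds, directions])
print(combos)     # 4800 distinct prompts from five small choice sets
```

Each extra independent choice multiplies the library an attacker has to pre-render, while the verifier only ever needs one random draw.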
I’d take the other side of that bet if it was stipulated that the 1 year start after this became a common captcha technique.
People are working on generating animation on a rig from text prompts:
https://www.motorica.ai/
It would be easier if we knew the prompts ahead of time and then we just render transitions.
Real time-ness and a good deal of rendering fidelity can be mitigated by pretending to have a crappy network connection once the prompt is given, unless you are comfortable excluding people without reliable internet.
I think the bigger issue is probably with the premise though. This kind of video rendering is much more taxing but it’s also much more taxing on the verification side. I can’t imagine a company being able to do this at scale the way captchas are to defend against bots, you’d need an automated system on the frontline.
Most of the work going on with rendering people speaking isn't to defeat captchas, so you can't use the speed of that work to judge how fast people will be able to work to defeat a captcha. Also if it were in use people would frequently change the kind of actions they are asking for.
>Real time-ness and a good deal of rendering fidelity can be mitigated by pretending to have crappy network connection once the prompt is given,
Real-time-ness and rendering fidelity are the only parts of this that aren't solved, so there's no real debate to have if those constraints aren't there.
>unless you are comfortable excluding people without reliable internet.
If this were a real thing, I would absolutely bet that companies would exclude people with unreliable internet once it becomes known that all scammers are pretending to have unreliable connections.
"Please find a more reliable connection, or stop by the nearest Identity Check facility to authenticate yourself."
>I think the bigger issue is probably with the premise though.
I don't think this is likely to happen either. I think the most likely solution is trusted third-party in-person verification. Where I could see something like this working is if the third-party verification service offered a one-time remote verification process that functioned something like this.
I get the feeling that newer generations of CAPTCHAs will no longer be trying to filter out bots from entering human-made sites, but humans from entering bot-made sites.
To be honest, it already feels like a resource denial effort. Reminds me of Paul Virilio’s description of systems that deliberately inhibit human speed.
CAPTCHA is just usefully accidental punishment for evading surveillance capitalism while casually browsing the web. There are sites where getting through a CAPTCHA is literally impossible if you are using privacy controls, you just get stuck in an endless loop of CAPTCHA completion. But if you're "logged in" to the surveillance network, there's zero friction whatsoever.
How good are bots at simulating human behavioral patterns these days?
On my back burner I have a crowd-sourced data app, and I keep wondering how I'm going to keep bots out. The ideas of shadow banning, throttling, or an approval queue for everything except known 'real' humans and new users that seem relatively human keep popping up (e.g., two approval queues for 'probably a bot' and 'probably not a bot').
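The two-queue idea can be sketched as a simple router on a bot-likelihood score. Everything here is a hypothetical placeholder: the `bot_score` field stands in for a real detector (which might weigh account age, request timing, content patterns), and the thresholds are arbitrary.

```python
from dataclasses import dataclass

@dataclass
class User:
    verified: bool      # known 'real' human
    bot_score: float    # 0.0 (human-like) .. 1.0 (bot-like); placeholder signal

def triage(user: User) -> str:
    """Route a new submission into publish / light review / strict review."""
    if user.verified or user.bot_score < 0.2:
        return "publish"                  # trusted: skip review entirely
    if user.bot_score < 0.7:
        return "queue:probably-human"     # quick-glance approval queue
    return "queue:probably-bot"           # strict queue, or shadow-ban target

print(triage(User(verified=False, bot_score=0.9)))  # queue:probably-bot
```

The nice property of the two-queue split is that a misclassified human lands in a fast queue rather than being silently dropped, while the expensive scrutiny concentrates on the high-score tail.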
Both of those are prone to being broken by a reasonably good AI, plus verification of either would be too computationally heavy to be cost-effective (also likely involving AI).
> reCAPTCHA: The accuracy [for humans] of image classification was 81% and 81.7% on the easy and hard settings respectively. Surprisingly, the difficulty appeared not to impact accuracy.
I'm surprised, too. My solve rate (as a human) is <50% for reCAPTCHAs of the form "select all squares containing a motorcycle/bicycle".
Does anyone have tips for solving them? (Is the rider part of the motorcycle? Should I select a mostly-empty square containing a single handlebar?) All I've determined is that the system prefers contiguous sets of ~6 squares.
I have stopped trying to get perfect scores on captchas. Instead, I've been testing to see how many errors I can get away with. This is a frivolous act of resistance, as the bots clearly don't need my help anymore.
It's purely anecdotal, but I'm convinced that Google CAPTCHA (which I don't see more often only because I rarely use Google Search) punishes me for being fast. I can half-ass it or do a perfect score, but if I'm done with it in less than 3-5 seconds, I'll only get another puzzle as a reward.
Companies want physical humans to watch what is on their websites (ads + content) while walling off said content at the same time. Every one of these companies is on borrowed time. Eventually AI will be able to read the framebuffer, simulate mouse movements to a tee, and get whatever anyone wants; Tor and VPN networks will be used to circumvent API restrictions, and everything else that blocks people will be learned by the AI. The only limiting factors will be the cost of the compute hosting the AI and how much data you want.
If there were no profit motive (ads) or its correlates (attention), none of this would be necessary. Companies would just provide utility to make people's lives easier. The most utilitarian company would win.
The only reason big tech needs to make this much money is shareholder "value" (code for: 'we need to make our execs and engineers rich so that other companies don't grab them'). This is a massive snowball. It's crash-only thinking (google the term).
Sort of. Ads as a business model exists because many people don't want to pay companies directly (through subscriptions etc). For the company to exist, they have to make money, so they sell people's attention. Unfortunately, we simply haven't found a better way yet.
I'm no fan of capitalism or the ham-fisted way companies try to control their intellectual "property", but "shareholder value" is what gets content created under the system we've got. Somebody ponied up money for a content creator to sit down and do the work. They wouldn't do that if they weren't hoping for a return on that investment.
Me, I'd love to see a world where everybody got enough money to live on, and then got to sit down and create whatever content they want, for the heck of it. They don't own it; we all pay for it and we all get to enjoy it.
But that's not happening any time soon. So I think we're stuck with the problem of intellectual "property" being a thing, and companies trying to artificially limit access to it despite knowing that there are a million ways around it. They're just going to hope that most people, most of the time, would rather take the legal and official route, if it's not too burdensome.
I'm not surprised. I input a 2FA time-based code with a 30 second refresh on it and by the time I got through the captchas, my code had expired. By ~1.5 minutes...
Something I've wondered about is how CAPTCHAs are ADA-compliant. Ironically, this tech may end up helping people who are impaired from doing CAPTCHAs use websites. Just the other day, I watched a friend of mine who has vision health issues do several CAPTCHA rounds to use a state website.
The very paradigm of gating by cognitive traits is inherently a fragile one when we are actively working to bring computers in or beyond parity with our own.
We will probably rest back on the old ways of doing things where we use authentication and trust-building with verified accounts to throttle and gate what users get what privileges.
The evolving coherence of stronger and less siloed identity attestation seems like an obvious way to go here, at least looking way back from the past where we are now.
The goal of modern CAPTCHAs is to determine if the user is "acting like a human". Deciphering warbled, blurry text worked in the days when OCR was difficult. As we can see, that rule no longer applies. Now, it should be about making sure behavior is typical or atypical. Is this a human browsing the web, or is this a script using AI to pass captchas?
As I asked in a similar thread the other day, what about proof-of-work-based CAPTCHAs like https://github.com/mCaptcha/mCaptcha - which I haven't actually seen on any site. Is it used? Since CAPTCHAs can be solved by bots, at least make it more costly for them.
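The core proof-of-work idea is small enough to sketch: the client must find a nonce whose hash with the server's challenge meets a difficulty target, so solving costs CPU while checking costs one hash. This illustrates the general hashcash-style technique, not mCaptcha's actual protocol, and the difficulty constant is an arbitrary toy value.

```python
import hashlib
import itertools

DIFFICULTY = 4  # require this many leading hex zeros in the digest (toy value)

def solve(challenge: str) -> int:
    """Client-side: burn CPU trying nonces until one qualifies."""
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
        if digest.startswith("0" * DIFFICULTY):
            return nonce

def verify(challenge: str, nonce: int) -> bool:
    """Server-side: a single hash, cheap for the defender."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * DIFFICULTY)

nonce = solve("session-abc123")        # ~16^4 hash attempts on average
assert verify("session-abc123", nonce)
```

The asymmetry is the whole point: each additional leading zero multiplies the attacker's expected work by 16 while the server's verification cost stays at one hash, so difficulty can be tuned per client reputation or load.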
And good old web levels of spam and flooding of comment sections with links to advertisements and scams (well, from that intermediary period when many things didn't yet use CAPTCHAs but bots were already prevalent). Sounds great.
I hate captchas as much as the next person and have long suspected that bots can solve them better than I can, but I hate the comment sections and forums made useless by spam messages even more.
Delete all captchas everywhere. While we're at it, delete all challenges. I am tired of wasting time on all this BS. If we put the same amount of time into real solutions like rate-limiting and actually making endpoints secure, there would be no problem to begin with.
As far as I understand, at least for Google's CAPTCHA, they mostly check mouse movement, browser details, activity, and reaction time to decide whether you're human. The challenge itself is only a very last resort. Didn't they use that for training data at some point?
> The bots’ accuracy ranges from 85-100%, with the majority above 96%. This substantially exceeds the human accuracy range we observed (50-85%),” the research paper read.
Wait, did they use humans or computers to measure the accuracy of the solutions?
I remember in the text captcha days there came a point where I just couldn't do them anymore. The letters were so distorted it would take me 10+ tries.