date 2022-01-07
Now, if we look at https://www.hcaptcha.com/labeling we can tell they make money by labeling data sets for a fee. So as a guess, there’s someone out there that needs to improve computer vision detection of transportation vehicles. My guess is it’s a self driving car company, but who knows.
You also need to assume that there are potentially control pictures in it as well.
I think this is a very liable approach.
Instead of saying "oh crap there's SOMETHING there we should stop" they said "huh, no let's loop on testing it until we figure it out or run it over....whichever comes first."
But then, when you're staring at your phone while driving your car out of a parking lot and across the sidewalk where you only miss hitting me (before driving into oncoming traffic!) because I stopped, well you deserve that minor inconvenience of being embarrassed.
Sometimes the bus/boat/truck has motorbikes sometimes bicycles. Is that a petrol-powered bicycle, or a motorbike to the [USAmerican?] person who wrote the rules!? Are all large yellow vehicles buses in USA or do you have minibuses, oh wait, are minibuses buses.
I've worked out fire engines are trucks for captchas, not sure about Transit-type vehicles, lorries are trucks apparently but goods trucks on railways are not trucks!
Is a traffic light only the lens/led array or the black light-holder too? Do pedestrian lights count as traffic lights? Are those weird lights hanging in the middle of junctions 'traffic lights'.
Wish they'd just tell you what counts.
I have noticed that the times I realise after clicking that I missed a square tend to go through, whilst many times I get repeated captchas when I know I got it right. Success, as a user, seems impossible to predict.
In the US, lorry == truck. Never heard the term goods truck before today, but I think it's what we call a boxcar.
A bogie in India seems to be a railway car in the US.
Then there are intermodal containers, smaller than boxcars, I think they are hauled by truck (lorry) after being unloaded from a flatcar.
AKA "flatbed" or "flatbed trailer" :-)
A hexagon would be better as a frame instead of a square.
Every time Google blocks me for refusing to label a motorbike as a "bicycle" I get utterly pissed off. And likewise with the traffic lights on the californian skies. Are the traffic lights the actual lights themselves, or the boom holding them up?
I'm not a human very often, according to Google. hCaptcha tends to let me in...
"Please select all the Cats" then shows picture of a tiger among the common House Cats.
One example I keep ranting about: I think countries outside the US have different terms for "crosswalks".
I personally know them as "zebra crossings", and it took a while for the reCaptcha request to click in my mind.
"On the Internet, nobody knows you're a dog." (Well, except Google, it seems.)
When cloudflare switched to hcaptcha I definitely noticed it.
Literally any other captcha is better than this.
if opacity -ne 100% do_not_click_yet = true
So this is totally useless to prevent bots from solving it.
https://anti-captcha.com/
There are of course options with image recognition, but they're less reliable.
Also, the latency of a human-recognition service is quite high, so while you wouldn't need to solve it yourself you'd need to wait for a number of seconds anyway.
For certain this can't be mainlined. And if we talk about extensions, then at least in the past extension code didn't have enough capabilities to automatically bypass recaptchas.
This would require fake mouse pointer control, and it's obviously not one of the features that the extension API exposes.
A bypass for this fading was obviously implemented the day after it first appeared on reCaptcha.
If rate limiting is needed there is always the CloudFlare way, where you literally show the user "wait" and refresh the page a bit later. This is annoying, but nowhere near as much as reCaptcha fading is.
“ In the year 2012, the corporations of the world paved over the Internet, designing their own network system. Keeping the same name, they developed a system where every piece of information was audited and paid for before it was passed on to the world at large. Those who still followed the ideology of an open and uncontrolled Internet gathered what resources they could and formed the SwitchNet. Build mostly out of discarded technologies and backdoors in the current Internet, it allowed some manner of uncontrolled communication around the world. The "Hacker Outpost" is in need of new recruits to perform missions in information gathering against the corporations, which will allow them to increase the presence of the SwitchNet in the world.”
And the slightly different press release one:
“ In 2012, a new Internet was introduced--one that prohibited users from posting anything on personal home pages, prohibited them from using software of their choice, and from having an e-mail address. Having no place to stay, hackers created the SwitchNet, an underground network operating on the old wires and infrastructure of the original Internet”
If Google says you're a robot, it must be true! You should behave accordingly.
There's actually a comic strip in the newspaper going through this storyline right now. Brewster Rockit: Space Guy! was told by a CAPTCHA that he's a robot, so he's going through life that way. The other robots do not seem to be happy to have him as part of their culture.
http://www.brewsterrockit.com
Take archive.is/archive.fo/archive.today, for example. If you're using Cloudflare DNS (1.1.1.1) or iCloud Private Relay, and you visit https://archive.is/, you'll get what looks like a Cloudflare screening page. It's not, though: that page is part of archive.is and is served to Cloudflare DNS users (which includes iCloud Private Relay users)--the use of reCAPTCHA in place of hCaptcha is a giveaway. You can complete the captcha as many times as you like, but you'll never get in.
And how many times have we completed a captcha on a form only to have it throw another captcha in our face without so much as an error message? Sometimes it's just lousy code.
https://news.ycombinator.com/item?id=28495204
So, I send everyone two captchas. One has a known answer and is required to be correct to access the service. The second captcha answer isn't yet known, so it doesn't matter what the user selects. However, when they get the known answer right, we log their answer for the unknown captcha. Once we get a large enough sample, we then have our top answers for the unknown captcha and can start using it for verification.
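A minimal sketch of that pairing in Python, just to make the bookkeeping concrete. The class name `PairedCaptcha`, the image ids, and the `min_votes` / `min_agreement` thresholds are all made up for illustration; this is not how hCaptcha or reCAPTCHA actually implement it.

```python
from collections import Counter, defaultdict


class PairedCaptcha:
    """Toy model of the known/unknown captcha pairing (hypothetical names)."""

    def __init__(self, known_answers, min_votes=5, min_agreement=0.8):
        self.known = dict(known_answers)    # image id -> trusted label
        self.votes = defaultdict(Counter)   # image id -> counts per submitted label
        self.min_votes = min_votes          # assumed sample size before promotion
        self.min_agreement = min_agreement  # assumed share the top label must reach

    def submit(self, known_id, known_guess, unknown_id, unknown_guess):
        """Return True if the user passes; log the unknown answer only on a pass."""
        if known_id not in self.known or self.known[known_id] != known_guess:
            return False  # failed the control image, so the other answer is discarded
        self.votes[unknown_id][unknown_guess] += 1
        self._maybe_promote(unknown_id)
        return True

    def _maybe_promote(self, image_id):
        """Move an image into the known set once enough users agree on a label."""
        counts = self.votes[image_id]
        total = sum(counts.values())
        if total < self.min_votes:
            return
        label, top = counts.most_common(1)[0]
        if top / total >= self.min_agreement:
            self.known[image_id] = label
```

Usage would be something like `svc = PairedCaptcha({"img1": "bus"})` followed by `svc.submit("img1", "bus", "img2", "boat")` for each visitor; once `img2` collects enough agreeing votes it shows up in `svc.known` and can serve as a control image itself.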
I wonder, what is the minimum number of labels per image to ensure clean data?
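One rough way to put a number on it, as a back-of-the-envelope sketch rather than anything a captcha vendor has published: assume a yes/no label, assume every labeler is independently right with the same probability `p`, and take a majority vote. Both assumptions are simplifications (real users are correlated and some are adversarial), and the function names below are hypothetical.

```python
from math import comb


def majority_error(n, p):
    """Chance that a majority vote of n labelers is wrong.

    Assumes n is odd, the label is binary, and each labeler is
    independently correct with probability p (a simplifying assumption).
    """
    needed = n // 2 + 1  # correct votes required for a correct majority
    p_correct = sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1)
    )
    return 1.0 - p_correct


def min_labels(p, max_error, limit=199):
    """Smallest odd n with majority-vote error <= max_error, or None."""
    for n in range(1, limit + 1, 2):
        if majority_error(n, p) <= max_error:
            return n
    return None
```

Under those assumptions, labelers who are right 90% of the time need 3 votes per image to get under a 5% error rate and 5 votes to get under 1%. If people are no better than a coin flip, `min_labels(0.5, 0.01)` returns `None`: no amount of voting helps.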
I frequently get these from Cloudflare when using Tor Browser. Google is basically unusable with Tor Browser.
I'd login, dismiss any popup or interstitial promotions the bank decided to give me, get to the account page, and tell my script to continue.
My script would then use Selenium to click the download button, click the "custom date range" radio button on download popup, fill in the range fields to cover the last 60 days, pick OFX for the download format, and start the download, prompting me to let it know when the download is finished.
When the download finished, I could then go to one of my other accounts at that bank, tell the script I'm there, and that one gets downloaded, and so on.
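For anyone rebuilding something similar, the only non-Selenium logic in that flow is the date range. A small sketch, assuming the bank's form takes MM/DD/YYYY strings (that format and the function name `download_range` are assumptions, not taken from the original script):

```python
from datetime import date, timedelta


def download_range(days=60, today=None, fmt="%m/%d/%Y"):
    """Return (start, end) strings covering the last `days` days.

    `fmt` is an assumed date format; adjust it to whatever the
    bank's "custom date range" fields actually expect.
    """
    end = today or date.today()
    start = end - timedelta(days=days)
    return start.strftime(fmt), end.strftime(fmt)
```

The two strings would then be typed into the range fields before picking OFX and starting the download.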
My bank isn't giving CAPTCHAs so that would still work if I were to get around to updating my script to deal with some redesigns they did of their pages which broke finding the relevant elements on the page.
But I've found that if I do visit a site that uses hCaptcha while using the Selenium launched browser, it seems to get stuck. Click to tell it I'm not a bot. Then get an image test. Answer that correctly and get another image test. Answer that correctly. Then it goes back to the click if you are not a bot thing, and repeats--two more image tests and back to the beginning.
Here's a program if anyone wants to try this and has the Selenium Webdriver package for Python3 installed. This will open a browser and take you to fanfiction.net. Trying to actually read any story will bring up the CAPTCHA.
#!/usr/bin/env python3
from selenium.webdriver import Chrome
driver = Chrome()
driver.get("https://www.fanfiction.net")
input("press enter when done")
driver.close()
driver.quit()
It used to be that if you added
from selenium.webdriver import ChromeOptions
options = ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument("--disable-blink-features=AutomationControlled")
driver = Chrome(options=options)
There's this project to provide a Selenium Chrome driver that is supposed to not trigger anti-bot detectors [1], but it still hit the CAPTCHA loop when I tried it.
[1] https://github.com/ultrafunkamsterdam/undetected-chromedrive...
The workaround is to simply visit all chapters separately and then point Calibre at the Google Chrome cache folder.
So nice going there, fanfiction.net. Instead of offering a 1-click .epub download like AO3 (which is completely CDN-able with a very long TTL), you now had to serve 50 individual requests. Great engineering work there.
(Obviously they do this to serve ads on every request)
There’s so many anti-detect libraries on GitHub these days. Wonder how many work well.
Try it out next time.
https://imgur.com/a/hoyjctl
Thank you! It's so much easier than being a labeling bot for self driving cars.
Austin has grown a lot since I lived there and a lot of people from outside Texas have moved there, so I'm sure the culture has changed - but when I was there, going from north Austin to south Austin seemed to be this epic trip the locals would only do on a weekend -- and probably pack water and sandwiches for the drive across town. A really exotic senior trip "abroad" for students might be to Houston or Galveston. You probably met your future spouse in grade school. Not very worldly.
> they were firebombing all the hispanic people's cars
Weird. This is sort of unbelievable. I think I know many Hispanics who lived in ATX during 2001, but they’ve never mentioned their car being firebombed.
>>You probably met your future spouse in grade school. Not very worldly.
The meeting spouse in grade school/not worldly typically implies not leaving the area they grew up in at any point. I grew up in a smallish town where this happens frequently. So many people never leave the state let alone the country. Some people never even left the county. Their biggest travel is for school sporting events.
In other words, it's not really a term of endearment as much as another "bless your heart".
Is the place you grew up intrinsically so much worse than the rest of the world? Would it be so bad to be invested in a single community for a lifetime, and to have a deep connection to the people there? I feel like I lack that, deep connections where the pull of cultural influence goes both ways. I'm not convinced that the cosmopolitan breadth of experience we may have gained outweighs the deep experience of locality that we sacrificed to get it.
For many people, the answer is a pretty definitive yes. I grew up in apartheid South Africa and left mainly to escape two years of compulsory military service helping enforce apartheid rule.
But if you speak to immigrants in any first world country, you'll find lots of similar stories. People having migrated because of severe political or economic issues.
> I'm not convinced that the cosmopolitan breadth of experience we may have gained outweighs the deep experience of locality that we sacrificed to get it.
What stops you going back, in that case?
So, you're extrapolating your personal experiences and applying them to everyone on the planet.
Many of my family members met their spouses in their home town. They hardly ever leave that town. They might fly to another country for vacation once a decade, but otherwise find complete fulfillment in the place where they live. Several have never even bothered to get a drivers' license, because they never found the need to have one.
By your standard, that makes them unsophisticated. But they're probably not. They live in New York City.
Indians or Middle Easterners I knew were not infrequently misidentified as Mexican (as was anyone of Hispanic but not Mexican background), but the idea that anyone would be so unfamiliar with the sizable Hispanic population to do a "black/white/terrorist" identification and firebomb "all the Hispanic people's cars" is hard to believe without some news articles discussing this as a big trend in 2001 Austin.
> "Its pretty much black/white/terrorist and hasn't improved all that much in the time since"
I suspect your experience wasn't very representative. After all, Texas has a 40% Hispanic population. Assholes won't be calling them terrorists, they're much more likely to be working them like slaves and paying them scraps off the books.
I suspect the biggest reason for a cross-Austin trip being an event has stayed the same: terrible traffic.
In this case, the intent is probably something like why are hcaptcha's customers centered around transport when there are so many other applications for this kind of labeling?
Bicycle detection is probably one of the more challenging elements as it relates to pedestrians and things you don't want a SD car to run into. Depending on the angle and color of the bicycle, rider position, and background elements, it can be challenging to discern the rider from the bicycle reliably. For the most part, just knowing there is a human present is a good start, but being able to anticipate movement speeds and directions of pedestrians vs. bikers is helpful in anticipating collision paths and distances, and also figuring out distances and terrain (person on bike is slightly elevated above ground, which can cause range perception issues, among other things).
source: have been working in AI/MV space for security/safety applications for 12+ years.
If you're white
> There are already a lot of labeled training sets available with people in various poses, heights, clothing types, etc
But not of various skin colors...
I like getting to identify basic things like cars, airplanes and trains with hCaptcha. It's like a picture book for adults, and feels strangely pleasant compared to other captchas.
https://old.reddit.com/r/google/comments/5udzy4/hey_google_e...
> hCaptcha has one of the largest pools on the planet available for your use. Whatever your scale, we can handle it without expensive upfront commitments. Millions of tasks per day are no problem.
Thanks for pointing this out - I feel abused now.
Everyone join us !
Yeah, it always seems funny to me. I'm using AdNauseam and other techniques for the same reasons they feel I shouldn't be doing it.
This approach mostly fixes the annoying phenomenon where I carefully select the exact right tiles only to be told, "too bad, try again".
Or something along those lines. And then you can get creative displaying same captcha to multiple users, etc...
On average the captcha let me go through, which is actually very scary, since it looks like it prioritizes algorithm training over bot detection...
Does anyone else do this?
You gave it a mostly correct answer, which it can cope with -- by design. It let you through, after all. You're not really accomplishing anything by being defiant, other than making yourself feel slightly better.
That's exactly why I was asking if other people were doing that. If I'm the only one... then yes, it's only useful for making myself feel less like an exploited brain, but if, say, 20% of people start dumping random errors on purpose... then the situation changes quite a lot... and potentially even the business model of shit-captcha might not work.
Isn't that the goal of all virtue signaling?
https://slatestarcodex.com/2013/04/12/noisy-poll-results-and...
As always, https://xkcd.com/1897
Here are a couple of examples:
https://bearbin.net/images/captcha/1.png
https://bearbin.net/images/captcha/2.png
https://bearbin.net/images/captcha/3.png
https://bearbin.net/images/captcha/4.png
There are also attempts to properly standardize it; this is called HYPE [0]. And there are big names like Fei-Fei Li and Michael Bernstein behind it.
[0]: https://arxiv.org/abs/1904.01121
Which leaves me wondering what the point is. If you are generating GAN images per CIFAR or ImageNet class, you know what the label is and don't need to label it. Perhaps they just generate lots of images to fill up the pipeline for the CAPTCHAs, to avoid reuse which could be exploited by spammers, when they have too little paying work?
I've not seen anything like this in the wild. And.... well, now I'm curious about how you had these examples to hand. When and why did you start collecting them?
But other times the pictures are 100% real life images.
Identify "Crosswalks". What the hell is a crosswalk?
"School bus" - what's the difference between a bus currently serving a school and another one?
"Show taxis", there are no black vehicles listed at all
There won't be one. But there will be more and more unethical rich people using Machine Learning and Deep Learning technologies and vast computing power, money, and political clout to gain things for their own, and many people will suffer or at least be worse off as a result of this.
That said, I’m sure a lot of people don’t select squares that have only 1 relevant pixel so the captcha should be lenient.
Identify all the taxis: http://www.cookiesound.com/wp-content/uploads/2012/09/riksch...
When you are instrumenting software with anti-forensic security features to mitigate the speed of some reverse engineering, you run into this specific class of problem, where you need to get a machine to make a verifiable attestation to its identity and integrity and prove to a level of acceptable risk that the message isn't just someone inserting a breakpoint.
If you have ever had to design an "offline mode" for a verified transaction without a 3rd party verifier, you will need to run down this rabbit hole. This is to say, your intuition is a sound one!
https://www.youtube.com/watch?v=1h-seEowtDw
The baseline test seemed like an unnecessary deviance, and more like an active-duty psych exam measuring the psychological effects of the job.
It's also arguably the point of the novel/movies (I'll leave it at that to avoid spoilers).
[0]: https://nautil.us/blog/the-science-behind-blade-runners-voig...
[1]: https://en.wikipedia.org/wiki/Do_Androids_Dream_of_Electric_...
[2]: https://en.wikipedia.org/wiki/Blade_Runner
[3]: https://en.wikipedia.org/wiki/Blade_Runner_2049
And narratively I think it works amazingly. The idea of forcing someone to prove that they're sufficiently inhuman ... shudder.
I'd describe it more as a sense of ennui, and it's always about unicorns...
And yes, I'm currently employed as a blade runner.
Yes, can anyone confirm what they are really looking for in these instances? Further up-thread there are people implying that the "right" answer to the "bicycle" question is that you are supposed to also be selecting motorcycles. I'd love to see a write-up about this from someone in the captcha department. Do they really want to identify bicycles specifically? But they are apparently getting many people clicking on motorcycles for some reason? And for the traffic light question, I only ever pick the elements that actually light up, not the support structure. Are 25% of people selecting the poles?
https://pleroma.remerge.net/notice/AFCYPtCzeNBIIOD808
Captcha can sometimes get so philosophical.
I have my browser setup in a way that makes Cloudflare quite intrusive. I use the Temporary Containers extension on Firefox to open almost all websites in temporary containers (paired with the Containerise extension to whitelist the handful of sites that I like to stay logged in to).
About 30% of the random (like from web searches) sites I visit throw the Cloudflare captcha at me...EVERY SINGLE TIME. I'm so sick of picking out boats and buses that I just close out the tab without bothering to visit the site.
I assume, that if I wasn't using Temporary Containers, a Cloudflare cookie after the 1st captcha would persist for the entire browser session, but there are privacy implications which are beyond the scope of this post.
Anyways, I guess what I'm saying is...Cloudflare sure seems great. Dangerously great.
If you design a web site with this in mind from the start, then there are several ways to make the Captchas less intrusive. However, a lot of Captchas are bolted onto existing solutions after problems have arisen, and then they may hurt the UX.
or you could pay like $1 to cover 1000 captcha solutions. again, not sure if these still exist for newer style captchas though.
Anyways, my post was actually less about captchas and more about Cloudflare's silent consolidation of internet traffic.
https://www.ceros.com/inspire/originals/recaptcha-waymo-futu...
On a more serious note, I can't shake the impression that would be a logical next step for all long and medium distance freight, be it road, water, air or space. Whether it's a good or mature idea is anyone's guess.
src: https://www.vox.com/22436832/captchas-getting-harder-ai-arti...
You can sign up as an accessibility user and set a daily hCaptcha cookie that lets you instantly avoid the captcha (obviously, strict limits to not be abused) but good enough for myself!
https://xkcd.com/1897/
But isn't labeling of those basic concepts in static images pretty much "solved"? I am not an expert in self-driving anything, but I don't see captchas of video from driving, I don't see stills that are half-obscured by snow, I don't see nighttime pics, I don't see weird corner cases like a van with a decal of a cyclist etc.
Why don't we see captchas that seem more likely to be useful to creating datasets relevant to the more challenging problems?
https://news.ycombinator.com/item?id=29840110
Thus the occasional wrong “correct” answers.
https://www.ispot.tv/ad/qzJi/geico-too-many-robot-tests
This makes me think: It must be hard for AI to guess what is and is not a bus right now, but most humans do know what a bus looks like and can pick one from a photo.
But with concerted effort and years of research by our finest minds, we will make an AI that can detect whether something is a bus or not, and then we'll be asked something different instead.
How much train/plane/bike/truck labeling do they need? It seems like these have been standard for several years now, which is what I think the OP is really asking. Why these images, and why for so long?
The obvious answer as others have pointed out is they are selling it to self driving car companies like Waymo.
Now, if we look at https://www.hcaptcha.com/labeling we can tell they make money by labeling data sets for a fee. So as a guess, there’s someone out there that needs to improve computer vision detection of transportation vehicles. My guess is it’s a self driving car company, but who knows.