It was always obvious to me that this was the case. Cameras and sensors do not overcome the chaos of supermarket shelving, but a ton of CCTV makes it fairly straightforward to build somewhat scalable human processes around this sort of thing. Fake it 'til you make it.
Another example that I'd bet is mostly human driven is the e-passport gates at many airports (not US airports). You put your passport in the machine, it points a camera at your face, and then it lets you through. Quick and easy, but I'm sure it's at least partially human driven – the response times aren't great, and it's much easier to show a few details to someone playing enterprise-grade Papers Please in a call centre in Swindon than it is to build some accurate machine learning system that meets the bar and can be understood by the government agency implementing it.
I flew out of SXM (St Maarten) Airport earlier this year. Their new departures terminal features an e-passport gate to exit the country. The agent watching over the turnstiles sits just past the gates in a glass booth. You can see them comparing the photo taken to the photo recorded in the passport. That's all there is to it.
I've seen a row of 20 active e-passport gates and maybe 1 or 2 officers in their booths off to the side, both assessing people in their role as a second line of assessment (I've previously had the gates fail to scan my passport and had to go to the booth). At least in those circumstances, the gates don't seem to be human driven – both because I don't think a 1-to-10 ratio would work, and because the officers were busy doing other things. Maybe this happens in some places, but as far as I can tell, not in the UK/Heathrow or Australia/Melbourne where I use these gates regularly.
Could not scroll on my iPhone 14 so was unable to read.
I went to Amazon Go several times to understand the tech. The one in Seattle had hundreds, possibly thousands, of cameras, but they were apparently all in the ceiling, so I couldn’t figure out how objects could be tracked properly.
Hoping someone from Amazon can confirm or disprove this story.
I'd also like to know. The first story I read said something like '1000 workers were labeling videos for Amazon'. It's not clear to me whether they were training a model to do this, or whether they were actually following every shopper through the video feed and seeing what they picked out.