Hacker News
The ‘invisible’, often unhappy workforce that’s deciding the future of AI (unite.ai)
110 points by Hard_Space on Dec 15, 2021 | 73 comments

Ground truth basic dynamics are similar to the basic dynamics of Prediction Markets. Having experimented with creating Prediction Markets from scratch I have witnessed first hand how the bias of the participants will skew or even nullify the wisdom of the crowd, with wisdom-free results.

> ‘[A] large majority of crowdworkers (94%) have had work that was rejected or for which they were not paid. Yet, requesters retain full rights over the data they receive regardless of whether they accept or reject it; Roberts (2016) describes this system as one that “enables wage theft”.’

So, what was the source of wisdom in rejecting the results?

> So, what was the source of wisdom in rejecting the results?

You can run the same task 3 or 5 times and take the consensus option, rejecting the others. But from experience, unless the taggers have had special training in the current task, the quality of annotations is crap. Then you have to do it all over again with someone in-house and gain nothing from the whole crowdsourcing ordeal.

Outside people tend to think anyone could label examples, but practice shows that it takes a special kind of person to do it well. Probably this is why many get their work rejected.
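For anyone who hasn't seen the "run it 3 or 5 times and take the consensus" approach in practice, here is a minimal sketch. The function names, the data layout and the `min_agreement` threshold are all made up for illustration and don't come from any particular platform.

```python
from collections import Counter

def consensus_label(votes, min_agreement=0.6):
    """Return the most common label, or None if agreement is too weak.

    min_agreement is a hypothetical threshold: the share of votes the
    winning label needs before we trust it.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) < min_agreement:
        return None
    return label

def split_by_consensus(annotations, min_agreement=0.6):
    """annotations maps item_id -> list of (worker_id, label) pairs.

    Returns the accepted label per item, plus the (item_id, worker_id)
    pairs whose answer disagreed with the consensus.
    """
    accepted, rejected = {}, []
    for item_id, pairs in annotations.items():
        winner = consensus_label([label for _, label in pairs], min_agreement)
        if winner is None:
            continue  # no usable consensus, item needs another pass
        accepted[item_id] = winner
        rejected.extend((item_id, worker) for worker, label in pairs if label != winner)
    return accepted, rejected
```

Note that this only measures agreement, not correctness: if most of the untrained taggers make the same mistake, the mistake wins the vote.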

>Probably this is why many get their work rejected.

It's interesting we've built a system where people are allowed to be given a job and then told they're unqualified to perform it. If a construction site hired someone as a heavy machine operator and then immediately decided to fire them, they'd still be owed some form of wages.

Some tasks are only given to people who have completed a training course. But not all customers do it.

If you want to bring workers into a store and train them you have to pay them for the time they spent even if they leave halfway through.

The owed wages wouldn’t be the major problem though…

A day's work at least; that definitely isn't free.

Not a very novel point - The Ghost Workers Powering The AI Economy https://www.forbes.com/sites/adigaskell/2019/09/02/the-ghost...

AI needs to face up to its invisible-worker problem https://www.technologyreview.com/2020/12/11/1014081/ai-machi...

That being said, this one covers two recent research works that are quite interesting (Whose Ground Truth? Accounting for Individual and Collective Identities Underlying Dataset Annotation , The Origin and Value of Disagreement Among Data Labelers: A Case Study of Individual Differences in Hate Speech Annotation)

I wonder if there have been any efforts to sabotage crowdsourced AI training and content moderation by signing up on crowdworking platforms and intentionally providing false responses. A large and tech savvy enough sabotage ring could use a browser extension or the like to keep their responses straight and increase the odds of their fake answers being accepted.

It is pretty common to "verify" workers: a fraction of the questions (often 1% to 10%) are previously asked questions with known correct answers. If those are not answered correctly, the entire dataset from this person is ignored. Depending on the platform, they might get paid less as well.

This is designed to detect workers who either did not understand the instructions, or those who don't care about those and answer randomly. But this works against intentional sabotage as well.
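A rough sketch of that kind of check, assuming each worker's answers arrive as a dict keyed by question id. The names and the 80% cutoff are hypothetical, not what any specific platform uses.

```python
def gold_accuracy(responses, gold):
    """Share of known-answer ("gold") questions this worker got right.

    responses: {question_id: answer} for a single worker
    gold:      {question_id: known correct answer}
    Returns None if the worker saw no gold questions.
    """
    checked = [q for q in responses if q in gold]
    if not checked:
        return None
    correct = sum(responses[q] == gold[q] for q in checked)
    return correct / len(checked)

def filter_workers(all_responses, gold, min_accuracy=0.8):
    """Keep only non-gold answers from workers who pass the gold check.

    min_accuracy is an assumed cutoff. Workers below it (or with no
    gold questions at all) have their whole contribution ignored.
    """
    kept = {}
    for worker, responses in all_responses.items():
        accuracy = gold_accuracy(responses, gold)
        if accuracy is not None and accuracy >= min_accuracy:
            kept[worker] = {q: a for q, a in responses.items() if q not in gold}
    return kept
```

The check is only as strong as the secrecy of the gold set: a worker who already knows which questions are gold passes it regardless of how they answer the rest.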

Speaking from my experience working at data labeling companies, the sabotage does occur, but is not intentionally malicious.

What ends up happening is that some labelers learn what the pre-determined questions and answers are and share these via Facebook and Discord to other labelers. That way, the other labelers can stay on the task longer while providing garbage responses to the non-predetermined question/answer pairs.

It's an arms race with labelers on one end, trying to make a quick buck, and data labeling platforms on the other, trying to get quality labeled data.

It was tried. 4chan tried a coordinated “penis” prank on Recaptcha. Despite the much vaunted community power of the website, and despite being coordinated, nothing happened.

It turns out that they are a drop in the bucket. Not only is there low RoI but also the group is too weak.

Did anything ever come of that? Because nowadays, those captchas are no longer used.

Seems like a lot of tedious work at low pay for little impact, particularly for people who are tech savvy enough.

It's worth hiring people full time to do high quality labeling. Benefits, 401k, the whole 9. A good technician who can interpret nuance has value.

>A good technician who can interpret nuance has value.

I want to highlight this phrase, because I think it is critically important and frequently missing from the worldview of the technical workforce. It is a serious risk that is created by the narratives around STEM education - especially engineering. Good technicians are incredibly valuable.

My first job out of engineering school was in a semiconductor factory. From the day I walked in, I was absolutely dependent on the technicians who worked on the tools to do my job well, as was every other engineer. I could do things they couldn't, but they absolutely could do things I couldn't. There were two types of early career engineers there (1) those who valued what the technicians could teach them and built those relationships and (2) those who did not. That second group struggled. They struggled because they didn't know how to listen to someone who they perceived as having less knowledge/education/value compared to them. Some of them got really upset when they found out senior technicians (typically with an AS) earned more than junior engineers (BS/MS). The reality was, the young engineers were a hell of a lot more replaceable and much easier to automate.

Google recently published a paper on this topic[0] and it is now a required read in the basic and advanced statistics courses that I teach, because a lot of my students are very excited about getting into machine learning roles. Unless you have quality data and information, the complexity of analysis you do on it largely doesn't matter.

[0] https://research.google/pubs/pub49953/

This can be somewhat resolved, if the engineers working on the model are required to do 1% of labeling.

100%...in our case it was 'do 1% of the cleaning/wiping/scrubbing'.

As if I haven't outed myself to any of my students reading this...we do it on the first day in my stats class:

Spend the day collecting data using the board game Operation. I tell you the categories of data to collect, I don't tell you exactly how to collect the data. Then you clean that data for homework...the discussion about whether that data is good quality creates a lot of insight.

It's just not as efficient because the size of a dataset is often the biggest factor in determining performance. Simply, I get way more labels if I'm paying $0.10 each vs the cost of a full time salary. And individuals are prone to burnout with repetitive tasks.

Data labeling is one job that seems to lend itself well to gig work. There were many times where I would have gladly spent a day labeling images in exchange for $50, no need for benefits

>It's just not as efficient because the size of a dataset is often the biggest factor in determining performance.

I fundamentally disagree. It's not the size of the dataset that determines performance but the quality. Sure. Bigger is better. But simply better is *way* better. I can do more with 1000 high quality examples of the thing than I can with 40k low quality ones.

I've gone as far as bringing on 40 digitizers at once full time. I would never have been able to accomplish that project with turks. I would never have been able to call a team meeting and explain a very specific shift in interpretation we would be doing regarding the relationship of feature apple to feature banana, and when and where it would apply.

Whether you think you need oom more examples or a few quality ones depends, I think, on how in the thick of it you've been with regards to ML on theoretical grounds versus doing the boots-on-the-ground work to get a product out. Having control over digitization and labeling, and being willing to pay for high quality labels, has allowed me to get results that directly meet my clients' needs as opposed to sitting in some uncanny valley where the results are close, but no cigar.

The recent trend of self supervision in computer vision disagrees with this hypothesis.

If I get a thousand labels for $0.10/ea and 50% of them are wrong there are two ways to look at the result:

1) I spent $100 on 1000 labels

2) I spent $100 on 500 good labels and 500 bad ones and can’t tell them apart.

If I spent $0.20 per label on 500 labels with better accuracy I end up spending less and getting more.

The only way what you said makes sense is if we ignore the quality of the data as a consideration.
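To put numbers on the two scenarios above, here is a small back-of-the-envelope sketch. The prices and counts are the ones from the comment; the 95% accuracy for the more expensive labels is purely an assumed figure, since no number was given.

```python
def label_budget(price_per_label, n_labels, accuracy):
    """Expected spend and label quality for one labeling run.

    accuracy is the assumed fraction of labels that are correct.
    """
    spend = price_per_label * n_labels
    good = n_labels * accuracy
    bad = n_labels - good
    return {
        "spend": spend,
        "good": good,
        "bad": bad,
        "noise_rate": bad / n_labels,
    }

# Scenario from the comment: 1000 labels at $0.10, half of them wrong.
cheap = label_budget(0.10, 1000, 0.50)

# 500 labels at $0.20; the 0.95 accuracy is an assumption for illustration.
careful = label_budget(0.20, 500, 0.95)

for name, result in (("cheap", cheap), ("careful", careful)):
    print(name, {key: round(value, 2) for key, value in result.items()})
```

Under these assumptions the direct spend is about $100 in both cases and the number of good labels is similar; the difference is the roughly 500 bad labels mixed indistinguishably into the cheap set versus roughly 25 in the careful one, before counting any cost of redoing the work.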

In general, good technicians that can interpret nuance are extremely valuable in every domain.

When you use Turks the toughest problem is managing the workers who work at a high rate but do poor quality work. Some of these people might do better work if they got feedback (e.g. I would be willing to pay them a bonus if they slow down and do a better job). It’s tempting though just to cut them off.

I'm on the team at Surge AI (data labeling platform + workforce), so this article hits home. We started Surge AI precisely because our team has always been against the adversarial, penny labor design of crowdwork systems: systems dependent on multi-annotator consensus lead to poor quality data (and ignore the inherent subjectivity in the rich, language-based tasks I love), they lead to suboptimal outcomes and treatment for all parties, and you get what you pay for!

For example:

1. Most data labeling systems don't allow you to communicate meaningfully with your workforce. In contrast, we prize two-way communication; your data labelers are the ones going through tens of thousands of examples, so they often have amazing feedback for how to improve your data and design your tasks better. And of course, you often have questions for them as well.

2. Context matters. I can't label Spanish hate speech; someone from Mexico City often can't label Madrid slang either.

3. The majority vote isn't always the best one. Real-world data is often personalized and subjective; your opinion on a funny or angry story may not match mine, and that's okay. Our training sets and AI models should reflect that. We just wrote a blog post on the subtle nuances when considering majority votes and inter-rater reliability metrics: https://www.surgehq.ai/blog/the-pitfalls-of-inter-rater-reli...

4. Curating annotator pools. Our product is designed around helping you build custom labeling teams that you trust, who learn the nuances of your domain and stay with you over time.
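As an illustration of point 3, one way to keep disagreement instead of collapsing it is to store the distribution of votes per item alongside (or instead of) the majority label. This is only a sketch with hypothetical names, not a description of how any particular product works.

```python
from collections import Counter

def majority_label(votes):
    """Collapse the votes to the single most common label."""
    return Counter(votes).most_common(1)[0][0]

def label_distribution(votes):
    """Keep the disagreement: the fraction of annotators per label."""
    counts = Counter(votes)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Hypothetical votes from five annotators on one subjective item.
votes = ["funny", "funny", "not_funny", "funny", "not_funny"]
hard = majority_label(votes)
soft = label_distribution(votes)
```

Here the majority vote records only "funny", while the distribution keeps the fact that two of the five annotators disagreed, which a model can be trained against as a soft target.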

Seeing the headline reminded me that I wondered something similar about Google hiring people from another country to be quality guidelines raters, or something like that.

I'd like to know more about the people and the culture there. Not because it's easy to miss some cultural thing here, but because, through not knowing small things, the quality raters could be pushing some type of content into category X and other things into Y; not because it's in the tons of pages of text they are supposed to follow, but because of whatever religion or other cultural things are a thing there.

Haha, and this is how the singularity is going to turn out socialist. Mike Judge should make a movie about this.

The gig economy doesn't lead to happy employees- truly shocking. Who wouldn't want to dedicate their body and soul to a silicon valley corporation?! To be a replaceable cog in the machine is the American dream.

The funny thing about the American dream is that the quintessential examples used to be that people in blue collar or back office labor jobs like mailroom processors could work hard and rise to the top. Kind of a pipe dream but it did happen. Even if you believe that to still be true, gig workers start even lower, below the bottom rung of the ladder. They’re contractors doing spot jobs in mailrooms. Most janitor jobs are gone. They’re now contractors too. Healthcare is to blame for a lot of this. The cost of a low level employee is simply too high when companies need to cover the healthcare coverage costs.

Your point about employee costs rising with healthcare is terrific and needs wider cultural recognition.

This increases the cost of labor, and therefore cost of goods across the economy. It's a big part of the reason why US based manufacturing is increasingly impractical.

Your point expands to all social services our government has outsourced, from retirement benefits to transportation, too.

We'd still have the costs if the government took those in-house. Germans pay around $1500 a month (personal + employer costs) for healthcare. Other government programs add more cost. Bringing those in-house would not really bring the cost of the employee down. Rather it would expand the weight and power of the government via increased taxes.

It wouldn’t change the actual costs of living, but it would remove the inflection points where a full time employee is suddenly much more expensive than a 38-hour-a-week employee.

And the taxes could be spread more progressively, e.g. to match the rate of wealth accumulation people get from passively having wealth.

it distributes the cost across all sectors, which as the above poster commented would be beneficial to hiring low skill workers

The 'American Dream' is not 'Mail room to CEO' though FYI this is still possible.

The 'American Dream' is a stable job, a modest home with 'plumbing' and 'electricity' and maybe one car, plastic tupperware, maybe a dishwasher, and a safe community with basic constitutional protections and a justice and democratic system with integrity and legitimacy, decent public schools and some access to higher education.

All of that is revolutionary in the grand context of history.

The 1950's 'American Middle Class' is the dream, as it is for many migrants to the US for whom those things are like heaven.

The '$100K job, BMW and Corner Office' - that's something else, more like icing on the cake.

People forget how hard it was for the world to create the basic '1950s middle class standard of living' and frankly how delicate the system we live in is.

Within a generation we've come to accept that as more of a 'basic' condition of life, rather than something aspirational, which is a bit tragic. Most immigrants 'get it' though.

> blue collar or back office labor jobs like mailroom processors could work hard and rise to the top

I'd guess a factor in ending this has been the rise or at least growth of an administrative or managerial class in most businesses.

An apprenticeship model lets people grow, and to some extent a similar model used to apply in business, where the executives and managers were all career industry people who started on the shop floor.

The current system is closer to (closer, not the same as) a feudal system where there is a group of nobility that manages a group of serfs, and there is not much movement between the groups, at least at the same company. Admission into the nobility is based on education and other pedigree, and can never really be had by work experience.

Exactly. There is no way to "hard work" your way from junior employee to Founder or CEO. When the CEO leaves, their replacement always comes from the executive class. The junior can work their way up to senior or even low-level manager. But at some point high up on the ladder, employees come from a totally different aristocracy, and you can't work or even buy your way into that clique. Every company I've ever worked for, when they went looking for a VP-Of-Something, they'd never promote from the proletariat--they'd always go externally and look for someone from the nobility.

> Every company I've ever worked for, when they went looking for a VP-Of-Something, they'd never promote from the proletariat--they'd always go externally and look for someone from the nobility.

Interestingly, that might be the case for the Fortune 500. But among the Fortune 10, most CEOs/C-suite are engineers with long tenures at the company.

Actually, if you look back at some of the most influential companies in the Valley at their peak, that's been a rule of thumb since the Fairchild days. Andrew Grove is an example of that [0].

[0] https://en.wikipedia.org/wiki/Andrew_Grove

So one factor here is actually computers: 50 years ago, store clerk was a medium prestige job that had medium pay and growth prospects; there are plenty of stories of people who started as a clerk and ended up in a senior position. But that was because clerks had to know the prices and goods: a major risk to the store owner was customers swapping price stickers, so you needed a clerk who could spot the difference between the expensive onion and the cheap one and knew what the price should be, and these clerks had to stay in one place long enough to know the prices.

Then along came the laser/UPC/computer system and deskilled the job. No longer did you need to know the difference, the computer printed up a natural language description of the item. This allowed stores to get much larger, because no one could know all the prices in a large store (early department stores handled the size by making you pay for items in each department, using store balance accounts to not create too much friction). This fundamentally reshaped the relationship between capital (who owned the stores and the computer system) and labor. It's actually the exact same alienation of the individual from their labor that Marx was writing about in the 1850's, just for jobs that didn't get the same attention.

Similarly, an in-law worked as a teller at a bank. 50 years ago that job would have more upward mobility because banks managed their risk locally: after many years as a teller and then as a loan officer you could move up to be the local person making the decision on loans. Nowadays large banks have centralized that with fancy computer systems to guide the decision, so there are entire paths of advancement that essentially don't exist anymore because computers spit out the answer. There is only shift and then branch manager, with once again the owners of the computer system using it to de-skill jobs and cut off promotion opportunities. (See, e.g. Better.com, using poorly paid contractors to do all the tedious stuff that can't be automated, and then letting the computer do all the high skill work.)

This is, by the by, one of the reasons that computer programmers continue to get paid well, because we are enabling capital to de-skill and bifurcate jobs better and better.

Interesting take. So if I can restate it: automating people's agency (basically their decision making or judgement) creates a "disposable" class in whose growth there is, from management's perspective, almost no point in investing, meaning they have little mobility because there is no demand for a better person who has built on their experience.

Alternatively put, we've commodified human hand eye coordination and basic speech recognition and intent resolution. This view is definitely supported by the current state of the gig economy.

15 years ago, Circuit City laid off their highest paid retail employees, to be replaced by new hires: https://www.nbcnews.com/id/wbna17837882

That is, management was declaring that there was no value in experience whatsoever (until you got to management levels, naturally), and that they could replace anyone with a new hire without loss of performance. (After all, the way you got to be a highly paid retail person was to stick around for a long time and be pretty good at your job.) And while most managers in a retail setting aren't quite as obvious about it, that sure seems to be how they feel: employees are completely fungible, and it isn't worth paying them enough to get them to stay.

(Costco is a notable exception in the US: notice how the standard Costco name badge includes the employee's start year on it; that is one of the ways Costco signals that they do value experience. They are also significantly more profitable per sq ft than Sam's Club, their competitor in the warehouse segment operated by Walmart. I strongly suspect that those two facts are related.)

>blue collar or back office labor jobs like mailroom processors could work hard and rise to the top

The American dream is not about starting low and making it to the top, it's about upward, intergenerational class mobility.

The definition has changed over time, but you have reduced the definition to a single component. It is true that an element of the ethos is to improve the outcomes of your descendants. The American Dream can be reduced simply to mean America exists as a place where you can live a better life. This concept is predicated on the statement in the Declaration of Independence that "all men are created equal." We may agree that this statement was not enacted with sincerity, but we may hope that it will be.

Much of the world views the American Dream as America as a place where one can come, work hard, be treated fairly and be rewarded regardless of social class or circumstances of birth.

The extrapolation of this concept into a larger ethos that includes upward, inter-generational class mobility makes sense, but is just one element of an ethos based on the basic principles of "life, liberty and the pursuit of happiness."

The American dream was also a concept for non-Americans. I mean, it's not like a formal, natural phenomenon we can just observe, so everyone has their own interpretation.

So here is mine, from a French who chose to emigrate in China instead of the US: the idea was that you'd come from your bum-shit country into that vast new land of opportunities where everything was to build, you build it and are rewarded in a fair meritocratic system in ways your rotting homeland could never have provided.

But now, you have to get your Twitter account scanned to be allowed entry under a temporary visa (or hey, try a green card lottery), risk upsetting an untreated schizophrenic while crossing the streets, to go to a gun-heavy anti-police protest anarchists use as an excuse to empty their Molotov stock while the police, too happy to finally have a purpose for their military weapon stocks, shoot at random. And if you are at the wrong place at the wrong time, you can be sent to a private jail. All the while, any accident you would have would not have been covered by any sort of socialized insurance but you'd be under one of the most punitive international tax system. And if you're unlucky, you can always take a payday loan, and join the merry mass of indebted grassroot people who have negative equity for no reason a rational mind could understand.

The American nightmare, it seems to me :s

>here is mine, from a French who chose to emigrate in China instead of the US [...long paragraph of US bashing ...] The American nightmare, it seems to me :s

Mighty ironic of you to go on such a lengthy rant to bash the US off your Western European high-horse (white male too I presume?) living in China like a privileged westerner, but where the stuff you bashed the US for is amateur level stuff.

Be careful with all those Winnie the Pooh memes or you might lose your social credit points, comrade. Could you please point out the free nation of Taiwan on the map, in public, for everyone to see please? My geography is a bit rusty. Also, how are them force labor death camps this time of year over there?

> could work hard and rise to the top

What are the odds of that?

Not good, because getting to the top of any enterprise is inherently competitive.

Everyone wants the job, and there's only one of them to go around.

This is the reason you see such similar personalities occupying extremely high positions in companies. You need a certain mentality and dedication to set everything to the wayside for the pursuit of climbing the corporate hierarchy.

> Everyone wants the job, and there's only one of them to go around.

no, everyone does not want that job. People who do want that job reduce the equation that way, though

If the odds were high that people underwent meteoric career trajectories, that would imply that our sorting mechanisms for placing people into well-matched jobs worked very poorly. In general, I think these mechanisms work pretty well, so people typically end up in gradual career trajectories they're well-suited for.

Well at the very highest it's 1/N, where N is the number of employees at the organization.

There's lateral hires with their eyes on the same job too.

That's why I wrote "at the very highest". It can't be more than that, but it can be less than that.

But they leave a company somewhere else so don't affect the average number unless they move from a smaller company.

Technically non zero

Wow this escalated hilariously fast. Coming from a country with universal healthcare, I find it amazing how you believe your absolutely shitty coverage is to blame instead of, I don’t know, corporate greed?

But yeah, I guess low-level employees should do without healthcare, they just shouldn’t be poor AND sick after all :)

Your response is rude and completely misses the point of the person you responded to.

If companies have to provide healthcare coverage for their fulltime employees, but not their gig workers and contractors, which type of employment do you think they are going to prioritise?

If healthcare costs were applied uniformly through tax (as is done in most countries with universal healthcare) then there would be less reason for employers to prefer one type of employment contract over another for low level jobs.

You’ve got it backwards. By not directly employing low level workers for roles like janitors, companies do not need to pay for their healthcare. It is a very expensive cost per head even if the plan is shit. In America, employees get health insurance from their employer.

Your eagerness to shit on America made you entirely miss the point. Believe it or not, it is extremely obvious to many that universal healthcare would be a cheaper and preferential option.

Most companies just get around this by only scheduling you for 38 hours a week or something like that.

That is a common way, yes. Contractors from a firm is the other big one. There are folks who can spend decades working at the same org, full time, without technically being a real employee

the point is that healthcare costs have to be socialized – if you put the burden on the employer, you necessarily end up with the US system.

In Germany, the employer must pay half of the monthly healthcare plan and the employee pays the other half. It has gone like this for decades in one of the strongest economies of the world. Of course, Apple is worth more than the whole DAX...

Be careful of market cap numbers, they hide information and are almost meaningless: they don't describe volume behavior. If suddenly people realized Apple had to be sold as fast as possible, most of that value would evaporate to reach a more reasonable tangible asset value.

It's possible German stocks are well priced, in a regulated, slow and rational market that cares about fitting the price with the value of the company, and would hold most of their current market cap much more than Apple, were it liquidated.

DAX P/E at the end of 2020 was 27, while Apple's was 35.5. Higher, but not shockingly so.

>It's possible German stocks are well priced, in a regulated, slow and rational market that cares about fitting the price with the value of the company

LOL, well regulated and rational my ass. The Deutsche Bank and Wirecard scandals (plus numerous more) proved the German government is just as corrupt when it comes to manipulating the market and threatening honest journalists, so that some rich and well connected scumbags can get even more obscenely rich. I've worked in several western countries but never saw more high-level corporate corruption than in Germany.

> Wirecard scandal

Oh yes, the regulation (BaFin?) totally was out of order on that one!

Thanks for that insight. When it comes to Stocks & Co. I don't know much, if anything. I find it always interesting to read information, that puts things into a relative perspective. The information I had was from an infographic I saw a few weeks ago.

“Who wouldn't want to dedicate their body and soul to a silicon valley corporation?”

A lot of people here do that but at least they make good money doing it. Gig workers don’t get the money.

This particular article points out specific problems, but I think your assumption that being a replaceable cog is necessarily bad shows a lack of imagination.

An advantage is that you can set your own hours and drop the work whenever you like and not think about it, because someone else will do it. Combined with working from home, it doesn’t seem like all that bad a way to make a few extra bucks? At least, if the problems in the article can be fixed.

It seems like a decent fit when something else is more important in your life and the job comes second or third.

You could just stop all the crap you are doing and believe what you want. If prejudice is already in the question, you cannot gain "neutral" answers. Or nuanced for that matter...

I personally think that the fight against hate speech in the name of minorities is far more offensive than a visit to the darkest corners of 4chan. Perhaps objectively so, since the latter group is more diverse (international audience) and inclusive (low standards) than contemporary academia, which proposes these metrics as a gold standard.

But the technical challenge is not solvable as soon as there are different standards of allowed content. Everybody has to decide those limits for himself, parents in case of minors. At least that is true for universal platforms. Some people live up when they get headwind, some people might feel inhibited. The quest against hate speech is an intuitive one but still very foolish.

In my experience is that people search out offensive material to be offended. There are cases of harassment that are separate from this and often people seem to misjudge this. But AI is far away from making competent choices here.

An annotator service like the one suggested here would mainly be a service to select the sample group to spit out predetermined results. Maybe that is a service Google likes to provide. They are also just people pleasers I guess.

You could strip people of their anonymity and they will get happier in polls, more content with management. How would you evaluate the change in the data you see?

strongly worded here; insensitivity ("I don't care about you") and disregard ("not in your shoes, don't care") are fertile ground for casual offensive speech and no, it is not technology's job to fix that .. BUT in the real world there are a series of feedback loops, from discomfort, to yelling, to a slap in the face or an arrest by an officer. This works in the real world, where is the feedback in a digital world?

The cue here for me was "You could strip people of their anonymity and they will get happier in polls" as an unexamined backing into the feedback topic.. real people have feedback that sticks

What "feedback that sticks" is natural in digital realms, and how can it evolve, not dictatorially?

I'm not really sure what you are angry about here because you don't seem to be arguing against the main conceit that bias of whatever form is a factor in constructing these data sets. Is it just that, for you, bias is inevitable and we simply have to toughen up?

In general I am angry about censorship ambitions and futile and misled attempts to get rid of hate by banning it from the internet. Even while I am aware that hate can reproduce in a simple scheme that can lead to mutual radicalization, previous attempts to contain it all made the situation worse. But correction is nowhere in sight, it is as if people try to implement insanity by committing to the same mistakes over and over.

But I am not that angry and I don't think that can be read out of my comment aside from general disapproval.

I would assume that gig workers did not care as much about hate speech as some academics do and did not flag content as expected. This discrepancy is declared as bias. Fine, be that way...

> The Google researchers suggest that ‘[the] disagreements between annotators may embed valuable nuances about the task’.

On that I agree with the researchers, but would propose that any annotation (hate speech yes/no) would have to fall back to the 'no' and solve the dispute. Otherwise only asking the target will provide any additional understanding. Perhaps asking a supreme court too, but that is not feasible and not even the highest courts are infallible.

"In my experience is that people search out offensive material to be offended."

Lots more, it seems, seek out people to try to deliberately offend / intimidate others.
