Data labeling has been moving to onshore, higher-paid work. There's still a lot offshore, but for LLMs in particular, and for various specialized models, there's a massive trend toward hiring highly educated, highly paid specialists in the US.
But as other commenters have warned: beware of labor laws, especially in CA/NY/MA.
I've had a front-row seat to this: our company hires and employs contract W2 and 1099 workers for the tech industry. Two years ago we started getting a ton of demand from data labeling companies, and more recently from foundation model cos doing DIY data labeling. Companies are converting 1099 workforces to W2 to avoid misclassification, or they're trying to button up their use of 1099s to avoid being offside.
I do agree. The thing I would add, though, is that at least when I was there (W2012 - we were in the same batch!) YC, and PG in particular, were vocal about principles they believed in, especially the idea that naive young founders get kicked around in Silicon Valley and ought to be treated fairly. PG talked about this constantly, and of course it’s a big theme of his essays. I remember one particular instance where a potential investor was behaving questionably and PG offered to step in and talk to them if they crossed a certain line (which they did not, in the end). He was clear about what he considered acceptable and unacceptable and why. And we were nowhere near the cool end of the batch.
Yes YC acts in its interests but in my experience they live by good principles and that makes all the difference. The Chaos Monkeys anecdote is an example of that. So I don’t agree with the article’s framing that they throw their weight around to simply exercise whatever power they have over others for financial gain.
Maybe this is a dumb question, but I thought valuations had fallen a bunch since mid-2022, and now lots of VC firms are struggling with companies (especially mid-to-late stage) that raised at much higher valuations than they could get in the market today. But this firm is saying that current valuations are too high to make new investments. Wouldn't it be a good time to invest in later-stage startups? Or is the issue that the forward growth potential of these companies is lower now for some reason?
Valuations have "fallen" but not actually fallen: there are very bad consequences to raising what's known as a "down round"(1), so no company is interested in actually accepting an investment at the lower valuation. They only do that when they're absolutely forced to, because they desperately need the money.
So while yes, falling valuations seem like a perfect time to buy (buy low, sell high!), in these closed markets it's difficult to find someone willing to accept your money at the lower price. This is a big difference from the publicly traded markets, where you can essentially always buy stock. In these private markets, everyone agrees that the value of a share of company X is lower than before, but no one is willing to sell you a share today at that price, so you can't actually invest your money.
1: Where the top-line valuation is below the previous valuation. This is extremely bad for a company because investors almost always have anti-dilution protections that kick in on a down round, so the loss is generally felt almost entirely by the employees and the founders.
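To make that concrete, here's a rough sketch of how one common form of that protection (a full ratchet) shifts the pain. Every number below is invented for illustration, and real terms vary - weighted-average adjustments are more common and less brutal than this:

    # Rough sketch of full-ratchet anti-dilution in a down round.
    # Every number here is invented for illustration.

    prior_price = 10.00            # price/share earlier investors paid
    new_price = 4.00               # price/share in the down round
    investor_shares = 1_000_000
    founder_employee_shares = 9_000_000

    # Full ratchet: earlier investors are re-priced as if they had bought
    # at the new, lower price, so they receive extra shares for free.
    adjusted_investor_shares = investor_shares * prior_price / new_price

    total_before = investor_shares + founder_employee_shares
    total_after = adjusted_investor_shares + founder_employee_shares

    print(f"founder/employee ownership before: {founder_employee_shares / total_before:.1%}")
    print(f"founder/employee ownership after:  {founder_employee_shares / total_after:.1%}")
    # The investors' stake is made whole; the dilution lands on everyone else.

In this toy case the founders and employees drop from 90% to about 78% before the new money is even counted, which is why nobody wants to trigger these clauses.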
There are several strategies companies can employ. One common approach is to raise an extension or bridge round; by some estimates, roughly 40% of current funding rounds fall into this category.
In these cases, companies raise funds at the same valuation as their previous round, often labeled Series A+, Series C+, or Series B Extension.
Another, less common strategy is to raise on a SAFE (Simple Agreement for Future Equity), which converts to equity at the next priced round.
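For anyone unfamiliar with the mechanics, here's a toy sketch of how a capped, discounted SAFE might convert at that next priced round. The cap, discount, and round numbers are made up, and real SAFEs differ on details like pre- vs post-money caps:

    # Toy SAFE conversion at the next priced round (all numbers invented).
    safe_investment = 1_000_000
    valuation_cap = 8_000_000
    discount = 0.20                      # 20% discount on the round price

    next_round_price = 2.00              # $/share paid in the priced round
    pre_money_valuation = 20_000_000
    shares_outstanding = pre_money_valuation / next_round_price

    # The SAFE converts at whichever price is better for the holder:
    # the cap price or the discounted round price.
    cap_price = valuation_cap / shares_outstanding
    discount_price = next_round_price * (1 - discount)
    conversion_price = min(cap_price, discount_price)

    safe_shares = safe_investment / conversion_price
    print(f"converts at ${conversion_price:.2f}/share -> {safe_shares:,.0f} shares")

The appeal is that nobody has to put a new headline valuation on the company today; the pricing question is deferred to the next round.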
That is possible, or they hit break-even. You'd be surprised how quickly a company can go from -50% margins to positive margins when jobs are on the line.
About a year ago my org made shaving costs our highest priority. Our infra team spent half a year slashing our cloud spend, and we've been pushing hard to become cash flow neutral.
I have to imagine this priority shift is in part due to the money markets being what they are.
Private valuations are weird. There's an abstract sense that 'valuations have fallen', for sure: people are saying things like "I don't think COMPANY_BLAH, which raised $100M at a $1B valuation, is actually worth $1B."
But it was never officially 'worth' that much in the way a public company's market cap is, anyway. If they do a down round, where they raise money at a lower valuation than the previous one, that's generally bad for everyone, so there's a strong tendency to try to 'wait it out': just keep pretending they're still worth $1B and hope the market recovers and no one has to write down their investment.
I was part of the W12 batch of YC (which was a lot smaller, ~60 companies, back when Paul Graham was still leading YC).
YC funds a lot of companies and has always had super high variance in the companies it funds. Entrepreneurs are a wild bunch of people. There have always been companies where the founders turned out to be BS artists or sociopaths. Companies that folded immediately after the program started. Companies with messy cofounder breakups already brewing at the beginning of the batch. Companies that turned out to be slightly scammy. Some of the founders that were in those companies pivoted and became successful.
Picking on Pear AI (which I don't know anything about) as evidence of YC failing is silly. It's also a super early stage company and you really have no idea what they will do.
The test of YC, to me, is: can they keep attracting and picking some of the best founders (which you can't really tell for years), and keep providing the inspiring, warm, but pushy environment that best sets up founders for success and in turn keeps them coming to YC? I'd apply to YC again in a heartbeat if I were ever starting another company.
It takes time to build a company to significant revenue. I'd be curious to rule that out as the primary explanation before reading too much else into this.
This is great, and it's mentioned in the discussion topics section ("modeling randomness in gathering belongings"). But I wonder what the distribution of baggage-gathering time looks like? I'd guess pretty positively skewed, i.e. most people are quick but a few take a long time or a really long time. The speed of the flush stage of a particular wave is capped by the slowest person in that wave.
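A quick way to eyeball that effect is a toy simulation. This assumes lognormal bag-gathering times (a common choice for positively skewed durations); the parameters are guesses, not measurements:

    import random
    import statistics

    # Toy deplaning model: the flush time of a wave is the max of its members'
    # bag-gathering times. Lognormal times give the positive skew guessed above;
    # the parameters (median ~8 s with a long right tail) are pure assumptions.
    random.seed(0)

    def bag_time():
        return random.lognormvariate(mu=2.1, sigma=0.6)   # seconds

    trials = 10_000
    for wave_size in (1, 3, 6, 12):
        flush = [max(bag_time() for _ in range(wave_size)) for _ in range(trials)]
        p95 = statistics.quantiles(flush, n=20)[-1]
        print(f"wave of {wave_size:2d}: mean flush {statistics.mean(flush):5.1f}s, p95 {p95:5.1f}s")

Because the wave is gated on its maximum, the mean flush time grows noticeably with wave size even though the typical individual is quick.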
I do think there will/should be a reckoning about how training data is acquired and attributed. For example, LLMs could attempt to cite sources, or share ad revenue fractionally with all the sources that inform the response they're presenting.
I think that as the magic wears off it's becoming clearer that LLMs are more like fancy search engine UIs than intelligent agents. They surface, remix, and mash up content that everyone else created, without the permission of the creators.
That doesn't mean there won't be economic fallout. Spotify may have figured out legal streaming, but the music industry is still much smaller than it was in the 90s.
> For example, LLMs could attempt to cite sources, or share ad revenue fractionally with all the sources that inform the response they're presenting.
Neither of which address my problem: how do I share with people generally without sharing with AI?
At the risk of sounding flippant, you might print your articles out onto sheets of paper and send them to interested parties by mail.
I'm sure a standard not unlike robots.txt will emerge. That might give some comfort, although I would remain sceptical given that many crawlers refuse to honour it.
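Something like it is already emerging informally: some sites add robots.txt rules for the AI crawlers that publish their user-agent strings (GPTBot and CCBot below are real, documented agents), though whether a given crawler honours the rule is exactly the open question:

    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /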
> you might print your articles out onto sheets of paper and send them to interested parties by mail.
Or, better, do what I've actually done and make the websites private, invite-only.
> I would remain sceptical given that many crawlers refuse to honour it.
Yeah, a robots.txt-like solution isn't adequate for exactly that reason. I don't rely on robots.txt alone to stop crawlers, because of the observation you make here.
It certainly makes it easy for OpenAI, Microsoft, and Google (etc.) to benefit from what I shared, charging a toll to end users and buffered from any consequences of sharing it incorrectly. If I had some assurance they'd link back to my content so that users could see the primary source material, and if they did all this for free, I'd be keener to share.
It seems clear that there will be an initial contraction that we're seeing now, with people being distrustful of others benefiting from their work.
I've been doing art for decades, and so much of what I did in the past got merged into culture without much in the way of remuneration, even when I did get paid. Commercial and fine artists who make money off their work are rare, and the main beneficiaries were large corporations long before OpenAI came to pick the bones clean.
As we circle the event horizon (personally, I'm with the people who argue we passed the point of no return back in the 1930s), it will get more difficult to tell what's going to happen next, but everything only has to be added to the training data once. To a determined attacker, there is no data fortress that can't be raided, and it only has to be raided once.
The old hacker motto, "Information wants to be free", wasn't an ideal to work towards, it was a statement of fact: keeping information locked up is hard and it only has to get loose once.
The problem of how to get paid has always been the main problem facing people who work. I suspect with compensation, like everything else, we'll do the right thing after we've exhausted all other options.
I don't want to share with AI because if what proponents of AI are predicting is correct, I think it will result in very bad things. I don't want to have contributed to that, even a little.
It was nice when local radio DJs had platforms to feature local artists. That's becoming rarer as DJs disappear and the audience shrinks, with listeners moving to algorithmic feeds.
Second this. The 32x64 version is what I used to make a digital tide clock as a project with my nephew: https://www.filepicker.io/api/file/UzFNawTTWq4h5h6YF9Im. It fetches tide data from NOAA to tell you the next high and low tides.
We also made an arcade game with the LED screen.
Not the cheapest route, but it's relatively easy and fun to get working.
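For anyone wanting to try something similar, here's a minimal sketch of the NOAA fetch, assuming the public CO-OPS predictions endpoint; the station ID is just an example and the LED-matrix drawing and error handling are left out:

    import requests

    # Minimal sketch of the NOAA fetch. Parameter names follow NOAA's CO-OPS
    # Data API; station 9414290 (San Francisco) is just an example - use yours.
    URL = "https://api.tidesandcurrents.noaa.gov/api/prod/datagetter"
    params = {
        "station": "9414290",
        "product": "predictions",
        "datum": "MLLW",
        "interval": "hilo",        # only the high/low events, not the full curve
        "date": "today",
        "time_zone": "lst_ldt",
        "units": "english",
        "format": "json",
    }

    resp = requests.get(URL, params=params, timeout=10)
    resp.raise_for_status()
    for p in resp.json().get("predictions", []):
        label = "High" if p["type"] == "H" else "Low"
        print(f"{label} tide at {p['t']} ({p['v']} ft)")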