
If true, does this imply the end of the scaling laws? I'm guessing all these players were expecting a certain level of model improvement for their compute budget. If they're now disappointed, this suggests their estimates were off?

Or is this a disconnect between pretraining loss and downstream performance on real-world tasks?


Scaling laws are largely about data and parameters; compute enters mostly indirectly and more weakly.

If there's no extra data, most scaling predictions start to look bad.
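For concreteness, the Chinchilla-style fit (Hoffmann et al. 2022) has the form L(N, D) = E + A/N^α + B/D^β. Once the data stock D stops growing, the B/D^β term becomes a floor that no amount of extra compute spent on parameters N can get you under.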

None of this is surprising. Every AI/ML researcher has done a back-of-the-envelope estimate of how much data there is and when this must stop. Myself included. It's a common topic at conferences.


Am I understanding correctly that the gist of your argument is that OpenAI, Google, etc. have run out of available data to train LLMs?

What's your estimate for how many tokens that represents?


I suspect it’s more the end of an avenue than the end of the road, but it sure seems like throwing more data and compute at the problem is hitting diminishing returns with existing methods.

It was never a "law"; it was what companies selling models made their investors believe. The limits were obvious from the start.

Is the obstacle compute, or is it available training data?

At some point the LLMs are just redigesting their own vomit.


Very likely the training data.

Punctuated equilibrium

My math is a bit different. Assuming a 20 Cal/hr difference, a 40 hr working week, 52 weeks/year, 7700 Cal per kg of weight loss:

20*40*52/7700=5.4 kg/year
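
Or in Python, for anyone who wants to poke at the assumptions:

    # Back-of-the-envelope check (same assumed numbers as above).
    cal_per_hr = 20        # extra Calories burned per hour
    hrs_per_week = 40
    weeks_per_year = 52
    kcal_per_kg = 7700     # Calories per kg of weight loss
    print(cal_per_hr * hrs_per_week * weeks_per_year / kcal_per_kg)  # ~5.4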

Surprisingly large. Did I mess something up?


Nit: 1 Calorie (capital C) == 1000 calories (lowercase c) == 1 kcal. 1 kCal == 1,000,000 calories, which probably isn't what you meant.

Oh yeah, that old chestnut. Thanks for pointing out. Fixed.

I assume when you say "politically" you include the ongoing war in Ukraine. I think it's a huge factor in this context, given its impact and the risks it presents to folks (especially young men) who choose to remain in Russia.

By some estimates, 900K people have left Russia since the invasion [1], including 100K IT specialists who left in 2022 alone [2] (I haven't looked at the figures since then). I think that's a pretty strong indication of the sentiment.

And wrt the economy, didn't the Central Bank just increase the benchmark rate to 21% [3] (with another 2% hike widely expected in December)?

[1] https://en.wikipedia.org/wiki/Russian_emigration_during_the_...

[2] https://www.technologyreview.com/2023/04/04/1070352/ukraine-...

[3] https://meduza.io/en/feature/2024/10/29/russia-s-key-interes...


In Russia, programmers working at accredited IT companies are safe from mobilization [0].

> I think that's a pretty strong indication of the sentiment.

Some left following their Western employers; some fled the mobilization in 2022.

> And wrt the economy, didn't the Central Bank just increase the benchmark rate to 21%

It did, which is why I said 'mostly'. The Kremlin is pumping money into the defense industry and into payments to contract soldiers, while the Central Bank is doing what it can to curb inflation. Overall it hasn't affected daily life much.

[0] https://www.gosuslugi.ru/armydelay


I'm Russian, I used to be a programmer at an accredited company, and I still wouldn't be safe from the draft because I have no university education. (For me personally this is a moot point because I left Russia, but still.)


Good point, be safe.

> Kremlin is pumping money into defense industry and into payments to contract soldiers, the Central Bank is doing what it can to curb the inflation.

Well, yes, and what do you think its impact on the economy is going to be?


The lowest unemployment ever, some redistribution of wealth in favor of workers in the military industry and soldiers.


That's the immediate result. But what about later, when the war is over? Military spending is non-productive; economically it's the same as giving people free money while they sit on the couch (though nobody gets killed in the latter case).

Military overspending literally killed the Soviet Union, and the men returning from the Afghan war all but guaranteed the unforgettable social climate that followed. I don't see any reason to be bullish on Russia, even if Trump now gives up Ukraine to Putin.


To give a bit more context to protomolecule's replies: back in 2022 every analyst was predicting total collapse of the Russian economy within a year or thereabouts. This didn't happen. So, we now tend not to believe the doomsayers at all. (And I'm not saying we're right! Just trying to explain the biases in play right now.)


Which analysts were those?

The ones I was reading emphasised the need for a strong sanctions regime with penalties for third-party workarounds, reasoning that Russia had prepared its economy for war and had already isolated it to some extent.

Your typical cable TV talking head might have been confident about Russia's economy collapsing, but the actual professional analysts and civil servants who advise governments were much more cautious.


Yes, the Russian economy will crash after the war, just like the American economy did after the end of WW2.


Why this comparison instead of USSR and Afghanistan?


Because it shows the weakness of the argument.



Only 2B? I kind of feel like Microsoft got ripped off now.


I guess that depends on how this whole thing plays out over the long run.


A variant I've seen (at a couple of European investment banks):

Analyst -> Associate -> Assistant VP -> VP -> Director -> Managing Director


These days it's the manager who writes the next-level assessment (NLA) and it's the NLA that forms the main body of the packet that the promo committee looks at. They're also the one soliciting and summarizing peer feedback etc.


That's wildly incompatible with my experience there in the previous decade. I hope the current managers are up to the job! Most of my managers didn't have even the slightest inkling of how to evaluate what I was doing on the job. One of them was mainly occupied with running the "mindfulness" office.


> That's wildly incompatible with my experience there in the previous decade

Yes, there have been some very significant changes over the last few years. Promos are decided in-org. Managers play a much more important role. There are promo quotas, along with the associated pressures (felt more acutely in some orgs than in others).

But, putting that aside, a manager who doesn't have any idea of what their reports are doing or how they're doing it is clearly failing at their job under any of those systems.


In theory, yes. In practice, it would take some pretty extraordinary circumstances for one to get promoted without their manager's support.

For starters, even if it is the candidate themselves who self-nominates, it is still the manager who writes the promo readiness assessment that forms the main body of the packet. It is also the manager's job to solicit peer feedback and represent it to the committee.

One could imagine a scenario where the manager's opinion diverges sharply from the assessment of the session lead and other senior folks sitting on the committee (who decide collectively) but again, those would be some pretty exceptional circumstances...

Layered on top of this are promo quotas, which already mean that some folks who do tick all the boxes aren't getting promoted as soon as they would be otherwise (or at all). That is to say that there are lots of headwinds even if the manager is supportive, let alone when they aren't.


I'd say that's a rather narrow way to look at it.

I know more than a handful of folks who are very happy with their level of responsibility and comp, and don't want higher expectations and more stress even if it would bring in significantly more cash. I personally think that's a pretty healthy way to look at it. Not everything in life is about money.


Right, while there was a "growth" expectation for L4s written into the SWE job ladder, there were no fixed timelines. Enforcement varied from org to org: at least one of my previous orgs periodically conducted talent reviews, specifically looking at cases like long-tenured L4s to decide whether to intervene.

That was before the layoffs started. One of my by-then ex-reports, a very talented and knowledgeable but not at all career-focused long-time L4, got laid off in one of the rounds. :(


> embedding a pre-trained KGE model into a transformer model

Do you have any good pointers (literature, code etc) on the mechanics of this?


Check out PyKEEN [0] and go wild. I like to train a bunch of random models and "overfit" them to the extreme (in my mind, overfitting is the point for this task: you want dense, compressed knowledge).

Then resize the input and output embeddings of an existing pretrained (but small) LLM. (Resizing the input side is only necessary if you're adding extra metadata on input, but make sure you untie the input/output weights.) You can add a linear-layer extension to the transformer blocks, pass it up as some sort of residual, etc. - honestly, just find a way to shove it in: detach the KGE from the computation graph and add something learnable between it and wherever you're connecting it, like a couple of linear layers and a ReLU.

The output side is more important. You can have some indicator logit(s) to determine whether to "read" from the detached graph or to sample the outputs of the LLM. Or just always do both and interpret it.

(like tinyllama or smaller, or just use whatever karpathy repo is most fun at the moment and train some gpt2 equivalent)
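
To make "shove it in" concrete, here's a rough PyTorch sketch of one variant: a frozen KGE entity embedding, detached from the graph, pushed through a small learnable adapter (two linear layers and a ReLU) and added as a residual on top of a transformer block. All names and sizes are made up for illustration, and random tensors stand in for real KGE vectors and token states:

    import torch
    import torch.nn as nn

    class EntityAdapter(nn.Module):
        """Learnable bridge from frozen KGE vectors to the LM hidden size."""
        def __init__(self, kge_dim, d_model):
            super().__init__()
            self.proj = nn.Sequential(
                nn.Linear(kge_dim, d_model),
                nn.ReLU(),
                nn.Linear(d_model, d_model),
            )

        def forward(self, kge_vecs):
            # detach() keeps gradients out of the KGE table ("detach the
            # KGE from the computation graph").
            return self.proj(kge_vecs.detach())

    d_model, kge_dim = 256, 128
    block = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
    adapter = EntityAdapter(kge_dim, d_model)

    # Pretend this came from a PyKEEN model's entity representations.
    kge_table = nn.Embedding(500, kge_dim)
    kge_table.weight.requires_grad_(False)  # frozen, "overfit" KGE

    tokens = torch.randn(2, 16, d_model)         # stand-in token states
    entity_ids = torch.randint(0, 500, (2, 16))  # one linked entity per token
    out = block(tokens) + adapter(kge_table(entity_ids))  # residual injection
    print(out.shape)  # torch.Size([2, 16, 256])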

[0] https://pykeen.readthedocs.io/en/stable/index.html


Sorry if that was ridiculously vague. I don't know a ton about the state of the art, and I'm really not sure there is one - the papers seem to get ever more terminology-dense, and the research mostly seems to end up developing new terminology. My grug-brained philosophy is to make models small enough that you can shove things in and iterate quickly in Colab or a locally hosted notebook with access to a couple of 3090s, or even just modern Ryzen/EPYC cores. I like to "evaluate" the raw model by using pyro-ppl to do MCMC or SVI on the raw logits over a known holdout dataset.
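
As a toy example of the pyro-ppl part (a single softmax temperature over held-out logits is just one thing you might fit - swap in whatever likelihood you actually care about; random tensors stand in for real logits/labels here):

    import torch
    import pyro
    import pyro.distributions as dist
    from pyro.infer import SVI, Trace_ELBO
    from pyro.infer.autoguide import AutoNormal
    from pyro.optim import Adam

    # Held-out logits/labels would come from your model; stand-ins here.
    logits = torch.randn(200, 10)
    labels = torch.randint(0, 10, (200,))

    def model(logits, labels):
        # One global temperature: how miscalibrated are the raw logits?
        temp = pyro.sample("temp", dist.LogNormal(0.0, 1.0))
        with pyro.plate("data", logits.shape[0]):
            pyro.sample("obs", dist.Categorical(logits=logits / temp),
                        obs=labels)

    guide = AutoNormal(model)
    svi = SVI(model, guide, Adam({"lr": 0.05}), loss=Trace_ELBO())
    for _ in range(500):
        svi.step(logits, labels)
    print(guide.median()["temp"])  # posterior median temperature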

Really always happy to chat about this stuff, with anybody. Would love to explore ideas here, it's a fun hobby, and we're living in a golden age of open-source structured datasets. I haven't actually found a community interested specifically in static knowledge injection. Email in profile (ebg_13 encoded).


Thank you for your comments (good further reading terms), and your open invitation for continued inquiry.

The "fomo" / deja vu / impending doom / incipient shift in the Overton window regarding meta-architecture for AI/ML capabilities and risks is by now so glaringly obvious an elephant in the room that it is nearly catatonic to some.

https://www.youtube.com/watch?v=2ziuPUeewK0


We also did something similar in our NTULM paper at Twitter: https://youtu.be/BjAmQjs0sZk?si=PBQyEGBx1MSkeUpX

It was used in non-generative language models like BERT, but it should help with generative models as well.


Thanks for sharing! I'll give it a read tomorrow; I don't appear to have come across it before. I really do wish there were good places for randos like me to discuss this stuff casually. I'm in so many Slack, Discord, etc. channels, but none of them have the same intensity and hyperfocus as certain IRC channels of yore.

