Hacker News | gchadwick's comments

Another example of the growing trend of buying out key parts of a company to avoid any actual acquisition?

I wonder if equity holding employees get anything from the deal or indeed if all the investors will be seeing a return from this?


I wonder if such deals will create employee lawsuits. I'd certainly be looking at legal options if I was one of the founding employees.

It should. Look at what happened at Windsurf when Google did something like this

https://news.ycombinator.com/item?id=44673296


>> one of the founding employees

If you were an employee, you were not a founder. A founding employee would be someone who explicitly "invested" time/money into a company without compensation. If you are also an employee earning a wage, you'd better have a written agreement stating what amount was "investment" and what amount was compensated wage.


Startups typically offer employees, particularly early employees, substantial equity compensation. If the employer is offering this compensation in bad faith, or otherwise preferring one equity holder over another without an explicit contract - then they are at the very least a crappy business partner. A founding engineer with a 2% stake could be missing out on 5-10 million of this transaction.

As an aside, most founders are paid during the entire project. It’s not hard to raise a preseed round to get yourself paid for 6-24 months to work on an idea. If a founder chose to bootstrap - that’s all fine, but let’s not pretend that the employees aren’t taking massive career risks vs “standard” employers.


> If the employer is offering this compensation in bad faith, or otherwise preferring one equity holder over another without an explicit contract - then they are at the very least a crappy business partner.

I don’t know about you, but every company I’ve ever worked at is a shitty business partner if that’s the metric. The standard has always been I get what we agreed to if I was lucky, and otherwise I got full “I’ve altered the deal, pray I don’t alter further” and dared you to defend your rights.

I actually have called their bluff a few times and gotten some money out of it, but it was always a year long event or more to resolution and involved risking even more money on lawyers.


Just one slight problem: people need to eat, and food costs money.

Your startup won't succeed when its founder starves to death. It's why the founder will usually get a bunch of cash during investment rounds [0]: they can't focus on the company if they are constantly worried about cash in their personal life. Unless the founder is already independently wealthy, it is a guarantee that they'll be employed by the company and paid a living wage. Heck, in some countries this is even legally required!

According to your logic, no successful startup will ever have a founder, as any form of pay instantly degrades them to a regular employee, and any kind of risk taken and below-market salary is completely irrelevant. Never mind the fact that they are taking home a minimum-wage salary while working 100 hours a week - they are earning a wage so they can't possibly be a founder.

So if this logic already breaks down for the founder, why couldn't it also break down for early employees whose compensation is mostly in stock options? How is their situation any different from the founder's?

[0]: https://www.stefantheard.com/silicon-valleys-best-kept-secre...


The employees are getting paid twice.

The employees are getting paid zero times.

Do they make a salary?

Startups often pay a shitty salary in exchange for a decent chunk of stock options, with the implicit promise that you'll make bank if you work hard and make the company successful.

Getting screwed out of your payout by such a totally-not-an-acquisition is wage theft. It's like promising a sales-related bonus at the beginning of the year, and then in December changing the metric to "AI-related sales to the CEO's golf buddies".


Startup options are worthless. The only value most people will ever extract from a startup is the experience they had working there, and the salary that was put in their bank account.

I understand that a lot of inexperienced people (like in this thread) think they're going to get rich though.

No, it is not "wage theft" to not get rich when the company exits (by whatever means).


Nobody expects to get rich off working for a startup. The risks are massive, and very few exit with billion-dollar deals. This is taken into account by the people who work there and accept those stock options: 99.xx% chance of being worth essentially zero, but a tiny yet nonzero chance of being able to retire early when it does a billion-dollar exit. It's a lottery ticket, not a promise - every startup employee understands that.

Groq is now changing the deal after the fact by making those stock options worthless 100% of the time. It's like you participate in a lottery, and then the organizer decides to just not do a draw and keep all the proceeds for themselves. Sorry, but that's theft.

Don't intend to pay out in the unlikely event that you hit it big? Then don't offer stock options to your employees and pay market-rate salaries - plus of course a decent premium for the fact that (unlike an established company) your startup can go bust at any time and doesn't offer stable employment. You can't have it both ways.


Startup options are usually worthless, yes, because very few startups end up getting to a position where the options are worth something.

> No, it is not "wage theft" to not get rich when the company exits

I don't think anyone in this thread thinks they're gonna get rich by working for a startup. There's a hope that they will, that's why they are working, but there's no expectation. Maybe there's an expectation of getting a nice tidy sum after an exit (in the 5 or 6 figures) but not in the 7 or 8 figures, at least not if they're just employees and not founders.

What's being discussed is a startup exiting for billions of dollars and the employees with equity seeing zero of it.

Working for a startup usually means lower wages and longer hours, for the chance at striking it rich if the company succeeds. If employees don't see anything when the company succeeds, there's literally no upside to working for a startup.


I recall having to sit through many trainings on how to value employee equity. My experience is that most startup employers try to BS what it means to convince people to value their equity at a significantly higher price than they otherwise should.

If the employer is explicitly making the employee options worthless, then they should be obligated to disclose this. Otherwise it’s trivial to engineer a corporate entity which pays the employees while “licensing” the technology from an IP holding firm. Later they can simply sell the IP holding firm without owing employees a dollar.


It is absolutely wage theft. Equity is part of the deal. Abusing some legal loophole to deprive employees of ownership and liquidity is not okay.

The implicit promise is only partially true. Only very rarely can you find proven talent that will actually forego significant salary. Often when that happens, the person is close to the founders, will have a significant role in shaping the startup, and will get quasi-acquired too.

This promise may have been more true before the 2010s, when public companies were not paying as much in liquid cash and private companies were not valued so aggressively. Fact is, most employees take the startup offer because they don't actually have a liquid offer that's super competitive at that moment, or they are just kind of bored and taking a break from the corporate job that does not give them too many responsibilities, i.e. they are compensated via the title, not just the promise of making bank.


That just means you’re pulling from the lower end of the talent pool. There is nothing wrong with this, but usually talent is correlated with outcome. Most hot startups which are going places are near impossible to get into even for folks with good offers.

Your last sentence is not mutually exclusive with my statement. Both could be simultaneously true. The sheer number of big company employees compared to hot startups makes it hard for everyone good to get into a startup, especially considering startups usually have more specific needs. That said, it could also be the case that the hot startup cannot easily get good employees from big companies.

My point was more that the high end ones they do get are usually in the front piece of the airplane in the acquisition split. Also, the really hot startups are actually paying quite a bit of cash upfront so the original premise of employee sacrifice isn’t as true.


If part of their remuneration is in shares, they have a legitimate interest in the value of those shares.

wdym?

They get a share of the $20B plus now they get to work for Nvidia.

>> one of the founding employees

If you were an employee, you were not a founder. A founding employee would be someone who explicitly "invested" time/money into a company without compensation while also working as an employee. If you are also an employee earning a wage, you'd better have a written agreement stating what amount was "investment" and what amount was compensated wage.


In my career I've seen startups "shut down" and lay off the NA team.

I've seen venture capital acquire startups for essentially nothing laying off the entire product team aside from one DevOps engineer to keep everything running. I've seen startups go public and have their shares plummet to zero before the rank-and-file employees could sell any shares (but of course the executives were able to cash out immediately). I've seen startups acquired for essentially nothing from the lead investor.

In none of these scenarios did any of the Engineers receive anything for their shares.

Yet every day people negotiate comp where shares are valued as anything more than funny money.


I have a friend who worked in a company that got "not acquired" in a similar deal.

She didn't see a dime out of it, and was let go (together with a big chunk of people) within 6 months.


As this gets more common, I think it will eventually lead to startups having a hard time attracting talent with lucrative equity compensation. It will be interesting to see how long it takes until this catches on among employees, but I already wouldn't take any positions in startups with a significant payment in equity anymore. The chances are slim that this pays out anyways, but now even when you are successful, no one will stop some megacorp from just buying the product and key employees and leaving everyone else with their stake in the dust.

At my last job search I didn’t consider any equity based startups seriously because of this trend. It was already such a tenuous path as it stood, but now with the norm established it seems like it’s become impossible for a rank and file employee to get paid out.

I’m more curious how angel investors are being treated in these exits. If _they_ dry up the whole pipeline goes away


Investors with enough into the deal to fight it in court get enough to not fight it. Key employees needed by the 'not acquirer' get compensation sufficient to retain them, although increasingly much of this is under a deferred vesting arrangement to ensure they stay with the 'not acquirer'.

Non-essential employees and small investors without the incentive or pockets to fund a legal fight get offered as little as possible. This structure also provides lots of flexibility to the 'not acquirer' when it comes to paying off existing debts, leases, contracts, etc.

Basically, this is the end of being an early employee or small angel investor potentially resulting in a lucrative payoff. You have to remain central and 'key' all the way through the 'not acquisition'. I expect smaller early stage investors will start demanding special terms to guarantee a certain level of payout in a 'not acquisition'.

I also expect this to create some very unfortunate situations, because an asset sale (as they used to be done) could be a useful and appropriate mechanism to preserve the products and some jobs of a failing (but not yet fully failed) company - which was better for customers and some employees than a complete smoking crater.


that is a great point. it’s one thing to occasionally rugpull employees, who are still at least paid for their services and robbed only of their EV on their options (i say “only”, though i find this increasingly common practice to be absolutely deplorable, to be clear). but how could investors possibly be happy with this becoming the new normal? will it get to the point where these sorts of faux acquisitions also involve paying out investors and only shafting employees? at that point you are only really even getting like a 20% discount over acquiring the company outright, which hardly seems worth it. which is to say that your point is very astute: the investors are definitely the linchpin here.

The company still got $20B of cash(?) on its books; it can pay dividends to its shareholders (investors) and they get their payment. The company can go down the drain afterwards. If it can still make money with its remaining assets, that's just a nice small bonus.

So the only ones getting shafted are the employees.


I suppose the firm could simply roll the 20 billion into a long term asset. It’s not a big deal to anyone except employees if the asset never pays out. Departed employees would not be privy to how the money is eventually exited from the now shell company 20 years hence.

> will it get to the point where these sorts of faux acquisitions also involve paying out investors and only shafting employees?

Yes, correct


20% of 20b isn’t exactly loose change, even for a megacorp.

The chance of rank-and-file employees getting anything has always been small; now it's smaller.

It's already happening. You need a good lawyer to read equity terms to make sure you aren't going to get rug pulled by a founder later on. Even so I still consider equity to generally be worth zero unless the founder is someone I trust fully, since there are so many ways for them to legally not give you anything.

The equity in almost all startups has already been a bait and switch for more than a decade. Most will refuse to answer you about % of equity share anyways, but if they did it's tiny tiny amounts, and in the end half the time it's up to the acquiring entity just how seriously they end up taking it. If you landed at an entity like Google (as I did from the place I was working 15+ years ago) you could be treated well. Elsewhere, not great.

During boom times it made more financial sense to go straight to a FAANG if you could.


If you ask me, there has been a major shift toward turning "startups" into just another form of corporation. It started years ago, when I started seeing things like "Founder Engineer - 0.5% equity" in job posts here.

>As this gets more common

Boy, it would be so nice if a major correction were to drain these massive companies' warchests so that it doesn't become more common.


With the job market being in the state it is in, there will always be people wanting to take their chances.

Let's face it and accept that the golden days of people working in tech startups (and soon large companies) are over.

RIP 1980 - 2023.


LOL@Americans waking up.

I guess you'll have to face the music at some point.


Looking at GDP, the golden age is still right here.

https://data.worldbank.org/indicator/NY.GDP.MKTP.CD?location...

/s


Different kind of gold raining down on us now though

You should have just bought gold

Naa, just wait, it's dribbling down.

it always dribbles down after it dribbles up and then it dribbles down again…

Though (very) unevenly distributed

Need this plotted against cumulative American debt lol

Can you say more about why mechanically she didn't get anything?

If you exercise your options you have real stock in the company, so I don't see how you can get shafted here.

Did investors do some sort of dividend cash out before employees were able to exercise their options? (Obviously shady, but more about investors/leadership being unethical than the deal structure).

Would love to know more about how this played out.


Multiple share classes have been the norm since well before the new acquisition types we see here. It's extremely common in an acquisition for employee shares to be worth nothing while investor and founder shares are paid out.

But these new “acquisitions” aren’t even that. They are not acquisitions at all. They just hire the talent directly with perhaps an ip rights agreement thrown in as a fig leaf.


I'm well aware of dual class shares, but preferences are typically 1x, and none of the deals were for less than the amount raised, so they're not relevant here.

The fact that these are not really acquisitions doesn't change the fact that Groq the entity now has $20b.


Groq doesn't keep that money, it goes to VCs. They claim the company is "pivoting", not "selling" and avoid the payout trigger.

Money can't just "go" somewhere; it needs a reason first, at least for book-keeping. I mean, VCs can get their invested capital back, but on top of that, how would that money be transferred? $20B is a lot, and for sure the VCs will not just write an invoice of $18B for consulting services.

Hey, husband of that friend here. The bought company had huge debts to the investors (it was a startup, not tiny but a small one, that ran for several years), and after those were cashed out from the purchase deal, the employees were left with shares that were worth $0. (Might be that the founders also grabbed some money out of that purchase, no one knows though.)

The employees of that bought company were given an incentive by the buying company to stay for a while and help tear down and integrate their product into the buying company.

One could say shady; I'd say it was just a bad deal.


Thanks for the details.

It's definitely true that common stock gets $0 if the acquisition price is <= (sum raised + debt).

That sort of sounds like the startup wasn't doing well, and the acquisition wasn't for a lot of money (relative to amount raised), which seems very different from these Groq/Windsurf situations.


There have been at least a half dozen of these deals in the past 1-2 years, including Google "licensing" Character.AI to pull its founders back into Google as valued employees.

In the deal mentioned above: my guess is that preferred class shareholders and common shares got paid out but the common shareholders had such a low payout that it rounded down to zero for most employees.

This can happen even in a regular acquisition because of the equity capital stack of who gets paid first. Investors typically require a 1x liquidation preference (they get their investment back first no matter what).
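
To make the 1x preference concrete, here is a toy waterfall sketch with made-up numbers (the function and figures are hypothetical, not from any of the deals discussed here; real cap tables add option pools, debt, participation and multiple preferred series on top):

    # Toy liquidation waterfall, hypothetical numbers only: holders of the 1x
    # preference get their money back first, common stock (founders/employees)
    # splits whatever is left.
    def common_payout(sale_price, amount_raised, common_stake):
        preference = min(sale_price, amount_raised)  # 1x preference comes off the top
        return (sale_price - preference) * common_stake

    print(common_payout(2_000_000_000, 500_000_000, 0.02))  # healthy exit: $30M for a 2% holder
    print(common_payout(400_000_000, 500_000_000, 0.02))    # sold for less than was raised: $0

A deal structured as a licence plus key hires never triggers this waterfall at all, which is the complaint running through the rest of the thread.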


Liquidation preferences are typically 1x these days, so they only matter when companies are sold at fire sale prices where basically nobody is making any money.

The deals are all weird so it's hard to really know what's happening, but if Groq gets $20b, I don't see how common stock holders don't get paid.


Special dividend to priority class and retain the rest to grow the remaining sham company?

I've seen some discussion that paying out normal employees might look more like an acquisition on paper, which they may want to avoid for FTC reasons. I've also seen some discussion that this is a quid pro quo to the Trump family to get Nvidia back into China (Jr. bought in at the September financing round).

Lots of speculation in general, including why Nvidia chose to spend $20B on this.


Do you actually know this is what happened?

Dividends to only one class seems crazy. I would be kind of shocked if that was legal.


No, I have no visibility. I'm saying speculation is rampant is all.

If I had to guess I'd say investors get their returns but non exec employees mostly get screwed.

I was involved in a (obviously smaller) situation with an acquisition that went to a top consumer CPU maker (you can guess). The investors got nothing as the buyout money was used to fund new pivots in the existing company. So no options or shares were monetized, and investors maintained their existing stake that technically had the same value, just most of the value was temporarily all cash. The only people to make out were the ones who went with the asset sale (retention bonus stuff) and the leadership that stayed (raises, etc.)

Is it related to the FTC's "anti-monopoly" stance under Khan? It's continued under the Trump admin, since her successor supposedly approved of her work.

It’s yet another way for investors to screw early employees whose face doesn’t fit.

> However, Groq’s architecture relies on SRAM (Static RAM). Since SRAM is typically built in logic fabs (like TSMC) alongside the processors themselves, it theoretically shouldn't face the same supply chain crunch as HBM.

It's true SRAM comes with your logic: you get a TSMC N3 (or N6 or whatever) wafer, you get SRAM. Unfortunately SRAM just doesn't have the capacity, so you have to augment with DRAM, which you see companies like D-Matrix and Cerebras doing. Perhaps you can use cheaper/more available LPDDR or GDDR (Nvidia have done this themselves with Rubin CPX), but that also has supply issues.

Note it's not really about parameter storage (which you can amortize over multiple users), it's KV cache storage that gets you, and that scales with the user count.

Now Groq does appear to be going for a pure SRAM play, but if the easily available pure SRAM option comes at some multiple of the capital cost of the DRAM option, it's not a simple escape hatch from DRAM availability.
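
For a rough sense of why the KV cache is the problem, here's a back-of-envelope sketch with made-up model dimensions (purely illustrative, not any specific model's numbers):

    # Back-of-envelope KV cache size. Keys and values are cached for every
    # layer, for every token of context, per user.
    layers, kv_heads, head_dim = 64, 8, 128              # assumed architecture
    context_len, bytes_per_elem = 32_000, 2              # 32k-token context, bf16
    per_user = 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem
    print(f"{per_user / 2**30:.1f} GiB of KV cache per user at full context")   # ~7.8 GiB
    print(f"{100 * per_user / 2**40:.2f} TiB for 100 concurrent users")         # ~0.76 TiB

Weights are paid for once and shared across users, but this cache is per user, which is why the low hundreds of MB of SRAM you typically get per chip run out fast without lots of chips or external DRAM.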


SRAM scaling also hit a wall a while ago, so you can't really count on new processes allowing for significantly higher density in the future. That's more of a longer-term issue with the SRAM gambit that'll come into play after the DRAM shortage is over though - logic and DRAM will keep improving while SRAM probably stays more or less where it is now.


You can still scale SRAM by stacking it in 3D layers, similar to the common approach now used with NAND flash. I think HBM DRAM is also directly stacked on-die to begin with, apparently that's the best approach to scaling memory bandwidth too.

It'll be interesting to see if we get any kind of non-NAND persistent memory in the near future, that might beat some performance metrics of both DRAM and NAND flash.


NAND is built with dozens of layers on one die. HBM DRAM is a dozen-ish dies stacked and interconnected with TSVs, but only one layer of memory cells per die. AMD's X3D CPUs have a single SRAM die stacked on top of the regular CPU+SRAM, with TSVs in the L3 cache to connect to the extra SRAM. I'm not aware of anyone shipping a product that stacks multiple SRAM dies; the tech definitely exists but it may not be economically feasible for any mass-produced product.


> AMD's X3D CPUs have a single SRAM die stacked on top of the regular CPU+SRAM, with TSVs in the L3 cache to connect to the extra SRAM.

Just FYI, the latest X3D flipped the stack; the cache die is now on the bottom. This helps transfer heat from the compute die to the heatsink more effectively. In armchair silicon designer mode, one could imagine this setup also adds potential for multiple cache dies stacked - since they do interpose all the signals, why not add a second one? But I'm sure it's not that simple; for one, AMD wants the package z-heights to be consistent between the X3D and normal chips.


The issue is size: SRAM is 6 transistors per bit while DRAM is 1 transistor and a capacitor. Anyone who wants density starts with DRAM. There's never been motivation to stack.


I agree with your description and conclusion. Additionally the companies that can make chip stacks like HBM in volume are the HBM manufacturers. As they are bottlenecked by the packaging/stacking right now (while also furiously building new plant capacity) I can't see them diverting manufacturing to stacking a new SRAM tech.


Every time I read about D|S-RAM scaling I'm reminded of https://www.besang.com/

Ever heard of them? What do you think? Vaporware?


Is Groq different from Grok?


They're unrelated. Groq = chip company, Grok = model by x.ai.


The similarity in names is likely to Groq’s detriment.


Maybe, but they'd been operating under that name for 7 years before Elon came along and decided he needed a name for his model.


The swastika was in use for thousands of years before Hitler and his crowd changed its meaning forever.


It's for a CS course at Stanford, not a PyTorch boot camp. It seems reasonable to expect some level of academic rigour and a need to learn and demonstrate understanding of the fundamentals. If researchers aren't learning the fundamentals in courses like these, where are they learning them?

You've also missed the point of the article: if you're building novel model architectures, you can't magic away the leakiness. You need to understand the backprop behaviours of the building blocks you use to achieve a good training run. Ignore these and what could be a good model architecture with some tweaks will either entirely fail to train or produce disappointing results.

Perhaps you're working at a level of bolting pre built models together or training existing architectures on new datasets but this course operates below that level to teach you how things actually work.


Karpathy's contribution to teaching around deep learning is just immense. He's got a mountain of fantastic material from short articles like this, longer writing like https://karpathy.github.io/2015/05/21/rnn-effectiveness/ (on recurrent neural networks) and all of the stuff on YouTube.

Plus his GitHub. The recently released nanochat https://github.com/karpathy/nanochat is fantastic. Having minimal, understandable and complete examples like that is invaluable for anyone who really wants to understand this stuff.


I was slightly surprised that my colleagues, who are extremely invested in capabilities of LLMs, didn’t show any interest in Karpathy’s communication on the subject when I recommended it to them.

Later I understood that they don’t need to understand LLMs, and they don’t care how they work. Rather they need to believe and buy into them.

They’re more interested in science fiction discussions — how would we organize a society where all work is done by intelligent machines — than what kinds of tasks are LLMs good at today and why.


What's wrong or odd about that? You can like a technology as a user and not want to delve into how it works (sentence written by a human despite use of "delve"). Everyone should have some notions on what LLMs can or cannot do, in order to use them successfully and not be misguided by their limitations, but we don't need everyone to understand what backpropagation is, just as most of us use cars without knowing much about how an internal combustion engine works.

And the issue you mention in the last paragraph is very relevant, since the scenario is plausible, so it is something we definitely should be discussing.


> What's wrong or odd about that? You can like a technology as a user and not want to delve into how it works

The question here is whether the details are important for the major issues, or whether they can be abstracted away with a vague understanding. To what extent abstracting away is okay depends greatly on the individual case. Abstractions can work over a large area or for a long time, but then suddenly collapse and fail.

The calculator, which has always delivered sufficiently accurate results, can produce nonsense when one approaches the limits of its numerical representation or combines numbers with very different levels of precision. This can be seen, for example, when one rearranges commutative operations; due to rounding problems, it suddenly delivers completely different results.

The 2008 financial crisis was based, among other things, on models that treated certain market risks as independent of one another. Risk could then be spread by splitting and recombining portfolios. However, this only worked as long as the interdependence of the different portfolios was actually quite small. An entire industry, with the exception of a few astute individuals, had abstracted away this interdependence, acted on this basis, and ultimately failed.

As individuals, however, we are completely dependent on these abstractions. Our entire lives are permeated by things whose functioning we simply have to rely on without truly understanding them. Ultimately, it is the nature of modern, specialized societies that this process continues and becomes even more differentiated.

But somewhere there should be people who work at the limits of detailed abstractions and are concerned with researching and evaluating the real complexity hidden behind them, and thus correcting the abstraction if necessary, sending this new knowledge upstream.

The role of an expert is to operate with less abstraction and more detail in her or his field of expertise than a non-expert -- and the more so, the better an expert she or he is.


Because if you don't understand how a tool works you can't use the tool to its full potential.

Imagine if you were using single layer perceptrons without understanding separability and going "just a few more tweaks and it will approximate XOR!"
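
For anyone who hasn't hit the separability point before, a tiny self-contained sketch (numpy only, toy data) of why no amount of tweaking helps:

    import numpy as np

    # Toy illustration: XOR is not linearly separable, so a single-layer
    # perceptron (one weight vector plus a bias) can never fit it, no matter
    # how long you keep "tweaking".
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 1, 1, 0])                       # XOR truth table

    w, b = np.zeros(2), 0.0
    for _ in range(1000):                            # classic perceptron update rule
        for xi, yi in zip(X, y):
            pred = int(w @ xi + b > 0)
            w += (yi - pred) * xi
            b += (yi - pred)

    print([int(w @ xi + b > 0) for xi in X])         # never [0, 1, 1, 0]

No line can put (0,1) and (1,0) on one side and (0,0) and (1,1) on the other, so the loop above never converges; you need at least one hidden layer.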


If you want a good idea of how well LLMs will work for your use case then use them. Use them in different ways, for different things.

Knowledge of backprop, no matter how precise, and any convoluted 'theories' will not make you utilize LLMs any better. You'll be worse off if anything.


Yeah, that's what I'm trying to explain (maybe unsuccessfully). I do know backprop, I studied and used it back in the early 00s when it was very much not cool. But I don't think that knowledge is especially useful to use LLMs.

We don't even have a complete explanation of how we go from backprop to the emerging abilities we use and love, so who cares (for that purpose) how backprop works? It's not like we're actually using it to explain anything.

As I say in another comment, I often give talks to laypeople about LLMs and the mental model I present is something like supercharged Markov chain + massive training data + continuous vocabulary space + instruction tuning/RLHF. I think that provides the right abstraction level to reason about what LLMs can do and what their limitations are. It's irrelevant how the supercharged Markov chain works, in fact it's plausible that in the future one could replace backprop with some other learning algorithm and LLMs could still work in essentially the same way.
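
To make the "supercharged Markov chain" half of that mental model concrete, here's a toy sketch (tiny made-up corpus, word-level):

    import random
    from collections import defaultdict

    # Toy word-level Markov chain: record which word follows which, then sample.
    # An LLM swaps this count table for a learned neural predictor over a
    # continuous embedding space, but the generation loop has the same shape.
    corpus = "the cat sat on the mat and the cat slept on the mat".split()
    table = defaultdict(list)
    for prev, nxt in zip(corpus, corpus[1:]):
        table[prev].append(nxt)

    word, out = "the", ["the"]
    for _ in range(10):
        choices = table[word]
        if not choices:              # dead end: no observed continuation
            break
        word = random.choice(choices)
        out.append(word)
    print(" ".join(out))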

In the line of your first paragraph, probably many teens who had a lot of time in their hands when Bing Chat was released, and some critical spirit to not get misled by the VS, have better intuition about what an LLM can do than many ML experts.


I disagree in the case of LLMs, because they really are an accidental side effect of another tool. Not understanding the inner workings will make users attribute false properties to them. Once you understand how they work (how they generate plausible text), you get a far deeper grasp on their capabilities and how to tweak and prompt them.

And in fact this is true of any tool: you don't have to know exactly how to build them, but any craftsman has a good understanding of how the tool works internally. LLMs are not a screw or a pen, they are more akin to an engine; you have to know their subtleties if you build a car. And even screws have to be understood structurally in advanced usage. Not understanding the tool is maybe true only for hobbyists.


Could you provide an example of an advanced prompt technique or approach that one would be much more likely to employ if they had knowledge of X internal working?


You hit the nail on the head, in my opinion.

There are things that you just can’t expect from current LLMs that people routinely expect from them.

They start out projects with those expectations. And that’s fine. But they don’t always learn from the outcomes of those projects.


I don't think that's a good analogy, because if you're trying to train a single layer perceptron to approximate XOR you're not the end user.


None of this is about an end user in the sense of the user of an LLM. This is aimed at the prospective user of a training framework which implements backpropagation at a high level of abstraction. As such it draws attention to training problems which arise inside the black box in order to motivate learning what is inside that box. There aren't any ML engineers who shouldn't know all about single layer perceptrons I think, and that makes for a nice analogy to real life issues in using SGD and backpropagation for ML training.


The post I was replying to was about "colleagues, who are extremely invested in capabilities of LLMs" and then mentions how they are uninterested in how they work and just interested in what they can do and societal implications.

It sounds to me very much like end users, not people who are training LLMs.


The analogy is: if you don't understand the limitations of the tool, you may try to make it do something it is bad at and never understand why it will never do the thing you want, despite it looking like it potentially could.


I think there are a lot of people who just don't care about stuff like nanochat because it's exclusively pedagogical, and a lot of people want to learn by building something cool, not taking a ride on a kiddie bike with training wheels.


That's fine as far as it goes, but there is a middle ground ...

Feynman was right that "If you can't build it, you don't understand it", but of course not everyone needs or wants to fully understand how an LLM works. However, regarding an LLM as a magic black box seems a bit extreme if you are a technologist and hope to understand where the technology is heading.

I guess we are in an era of vibe-coded disposable "fast tech" (cf fast fashion), so maybe it only matters what it can do today, if playing with it or applying it towards this end is all you care about, but this seems a rather blinkered view.


The problem is that not even the ones building them can understand it. Otherwise they wouldn't be breaking their "I promise AGI by the end of next year" promises for the Nth time.

That or they are flat out lying. My money's on the latter.


If everyone had to understand how carburettors, engines and brake systems work to be able to drive a car - rather than just learn to drive and get from A to B - I'm guessing there would be a lot fewer cars on the road.

(Thinking about it, would that necessarily be a bad thing...)


The problem is that we have a huge swathe of "mechanics" that basically don't know much more than how to open a paint can and paint a pig, despite promising to deliver finely tuned supercars with their magic car-making machine.


I'm personally very interested in how LLMs work under the hood, but I don't think everyone who uses them as tools needs that. I don't know the wiring inside my drill, but I know how to put a hole in my wall and not my hand regardless.


The thing is that if you actually learn how they work, they lose all of their magic. This has happened to anyone I know who bothered studying them and is not selling them. So I'd rather people learned. Knowing how a drill works doesn't make you any less likely to use a drill.


Not everybody who drives a car (even as a professional driver) knows how to make one.

If you live in a world of horse carriages, you can be thinking about what the world of cars is going to be like, even if you don't fully understand what fuel mix is the most efficient or what material one should use for a piston in a four-stroke.


Do you go deep into molecular biology to see how it works? It is much more interesting and important.


But the question is whether you have a better understanding of LLMs from a user's perspective, or they do.


Obviously they are more focused on making something that works.


Wow. Definitely NOT management material then.


Which is terrible. That's the root of all the BS around LLMs. People lacking understanding of what they are and ascribing capabilities which LLMs just don't have, by design. Even HN discussions are full of that. Even though this page literally has "hacker" in its name.


I see your point, but on the other hand a lot of conversations go: A: "What will we do when AI does all the jobs?" B: "That's silly, LLMs can't do the jobs." The thing is, A didn't say LLM, they said AI, as in whatever that will be a short while into the future. Which is changing rapidly because thousands of bright people are being paid to change it.


> a short while into the future

And what gives you that confidence? A few AI nerds already claimed that in the 80s.

We're currently exploring what LLMs can do. There is no indication that any further fundamental breakthrough is around the corner. Everybody is currently squeezing the same stone.


The trouble is that "AI" is also very much a leaky abstraction, which makes it tempting to see all the "AI" advances of recent years, then correctly predict that these "AI" advances will continue, but then jump to all sorts of wrong conclusions about what those advances will be.

For example, things like "AI" image and video generation are amazing, as are things like AlphaGo and AlphaFold, but none of these have anything to do with LLMs, and the only technology they share with LLMs is machine learning and neural nets. If you lump these together with LLMs, calling them all "AI", then you'll come to the wrong conclusion that all of these non-LLM advances indicate that "AI" is rapidly advancing and therefore LLMs (also being "AI") will do too ...

Even if you leave aside things like AlphaGo, and just focus on LLMs, and other future technology that may take all our jobs, then using terms like "AI" and "AGI" are still confusing and misleading. It's easy to fall into the mindset that "AGI" is just better "AI", and that since LLMs are "AI", AGI is just better LLMs, and is around the corner because "AI" is advancing rapidly ...

In reality LLMs are, like AlphaFold, something highly specific - they are auto-regressive next-word predictor language models (just as a statement of fact, and how they are trained, not a put-down), based on the Transformer architecture.

The technology that could replace humans for most jobs in the future isn't going to be a better language model - a better auto-regressive next-word predictor - but will need to be something much more brain-like. The architecture itself doesn't have to be brain-like, but in order to deliver brain-like functionality it will probably need to include another half-dozen "Transformer-level" architectural/algorithmic breakthroughs including things like continual learning, which will likely turn the whole current LLM training and deployment paradigm on its head.

Again, just focusing on LLMs, and LLM-based agents, regarding them as a black-box technology, it's easy to be misled into thinking that advances in capability are broadly advancing, and will lift all ships, when in reality progress is much more narrow. Headlines about LLMs' achievements in math and competitive programming, touted as evidence of reasoning, do NOT imply that LLM reasoning is broadly advancing, but you need to get under the hood and understand RL training goals to realize why that is not necessarily the case. The correctness of most business and real-world reasoning is not as easy to check as is marking a math problem as correct or not, yet that capability is what RL training depends on.

I could go on .. LLM-based agents are also blurring the lines of what "AI" can do, and again if treated as a black box will also misinform as to what is actually progressing and what is not. Thousands of bright people are indeed working on improving LLM-adjacent low-hanging fruit like this, but it'd be illogical to conclude that this is somehow helping to create next-generation brain-like architectures that will take away our jobs.


I'll give you that algorithmic breakthroughs have been quite slow to come about - I think backpropagation in 1986 and then transformers in 2017. Still, the fact that LLMs can do well in things like the maths olympiad has me thinking there must be some way to tweak this to be more brain-like. I recently read how LLMs work and was surprised how text-focused it is, making word vectors and not physical understanding.


> Still the fact that LLMs can do well in things like the maths olympiad have me thinking there must be some way to tweak this to be more brain like

That's because you, as you admit in the next sentence, have almost no understanding of how they work.

Your reasoning is on the same level as someone in the 1950s thinking ubiquitous flying cars are just a few years away. Or fusion power, for that matter.

In your defense, that seems to be about the average level of engagement with this technology, even on this website.


Maybe, but the flying cars and fusion ran into fundamental barriers of the physics being hard. With human-level intelligence, though, we have evidence it's possible from our brains, which seem to use less compute than LLMs going by power usage, so I don't see a fundamental barrier to it, just needing some different code.


You could say there is no fundamental barrier to humans doing anything that is allowed by the laws of Physics, but that is not a very useful statement, and doesn't indicate how long it may take.

Since nobody has yet figured out how to build an artificial brain, having that as a proof it's possible doesn't much help. It will be decades or more before we figure out how the brain works and are able to copy that, although no doubt people will attempt to build animal intelligence before fully knowing how nature did it.

Saying that AGI "just needs some different code" than an LLM is like saying that building an interstellar spaceship "just needs some different parts than a wheelbarrow". Both are true, and both are useless statements offering zero insight into the timeline involved.


> I don't see a fundamental barrier to it

Neither did the people expecting fusion power and flying cars to come quickly.

We have just as much evidence that fusion power is possible as we do that human level intelligence is possible. Same with small vehicle flight for that matter.

None of that makes any of these things feasible.


> Still the fact that LLMs can do well in things like the maths olympiad have me thinking there must be some way to tweak this to be more brain like.

That's like saying, well, given how fast bicycles make us, so much closer to horse speed, I wonder if we can tweak this a little to move faster than any animal can run. But cars needed more technological breakthroughs, even though some aspects of them used insights gained from tweaking bicycles.


Yes, it's a bit shocking to realize that all LLMs are doing is predicting next word (token) from samples in the training data, but the Transformer is powerful enough to do a fantastic job of prediction (which you can think of as selecting which training sample(s) to copy from), which is why the LLM - just a dumb function - appears as smart as the human training data it is copying.

The Math Olympiad results are impressive, but at the end of the day is just this same next word prediction, but in this case fine tuned by additional LLM training on solutions to math problems, teaching the LLM which next word predictions (i.e. output) will add up to solution steps that lead to correct problem solutions in the training data. Due to the logical nature of math, the reasoning/solution steps that worked for training data problems will often work for new problems it is then tested on (Math Olympiad), but most reasoning outside of logical domains like math and programming isn't so clear cut, so this approach of training on reasoning examples isn't necessarily going to help LLMs get better at reasoning on more useful real-world problems.


I’m trying not to be disappointed by people, I’d rather understand what’s going on in their minds, and how to navigate that.


And to all the LLM heads here, this is his work process:

> Yesterday I was browsing for a Deep Q Learning implementation in TensorFlow (to see how others deal with computing the numpy equivalent of Q[:, a], where a is an integer vector — turns out this trivial operation is not supported in TF). Anyway, I searched “dqn tensorflow”, clicked the first link, and found the core code. Here is an excerpt:

Notice how it's "browse" and "search", not just "I asked ChatGPT". Notice how it made him notice a bug.


First of all, this is not a competition between “are LLMs better than search”.

Secondly, the article is from 2016; ChatGPT didn't exist back then.


I doubt he's letting LLMs creep into his decision-making in 2025, aside from fun side projects (vibes). We don't ever come across Karpathy going to an LLM, or expressing that an LLM helped, in any of his YouTube videos about building LLMs.

He's just test driving LLMs, nothing more.

Nobody's asking this core question in podcasts. "How much and how exactly are you using LLMs in your daily flow?"

I'm guessing it's like actors not wanting to watch their own movies.


Karpathy talking for 2 hours about how he uses LLMs:

https://www.youtube.com/watch?v=EWvNQjAaOHw


Vibing, not firing at his ML problems.

He's doing a capability check in this video (for the general audience, which is good of course), not attacking a hard problem in ML domain.

Despite this tweet: https://x.com/karpathy/status/1964020416139448359 , I've never seen him citing an LLM helped him out in ML work.


You're free to believe whatever fantasy you wish, but as someone who frequently consults an LLM alongside other resources when thinking about complex and abstract problems, there is no way in hell that Karpathy intentionally limits his options by excluding LLMs when seeking knowledge or understanding.

If he did not believe in the capability of these models, he would be doing something else with his time.


One can believe in the capability of a technology but on principle refuse to use implementations of it built on ethically flawed approaches (e.g., violating GPL licensing terms and/or copyright, thus harming the open source ecosystem).


AI is more important than copyright law. Any fight between them will not go well for the latter.

Truth be told, a whole lot of things are more important than copyright law.


Important for whom, the copyright creators? Being fed is more important than supermarkets, so feel free to raid them?


Conflating natural law -- our need to eat -- with something we pulled out of our asses a couple hundred years ago to control the dissemination of ideas on paper is certainly one way to think about the question.

A pretty terrible way, but... certainly one way.


I am sure it had nothing to do with the amount of innovation that has been happening since, including the entire foundation that gave us LLMs themselves.

It would be crazy to think the protections of IP laws and the ability to claim original work as your own and have a degree of control over it as an author fostered creativity in science and arts.


Innovation? Patents are designed to protect innovation. Copyright is designed to make sure Disney gets a buck every time someone shares a picture of Mickey Mouse.

The human race has produced an extremely rich body of work long before US copyright law and the DMCA existed. Instead of creating new financial models which embrace freedoms while still ensuring incentives to create new art, we have contorted outdated financial models, various modes of rent-seeking and gatekeeping, to remain viable via artificial and arbitrary restriction of freedom.


Patents and copyright are both IP. Feel free to replace “copyright” with “IP” in my comment. Do you not agree that IP laws are related to the explosion of innovation and creativity in the last few hundred years in the Western world?

Furthermore, claiming “X is not natural” is never a valid argument. Humans are part of nature, whatever we do is as well by extension. The line between natural and unnatural inevitably ends up being the line between what you like and what you don’t like.

The need to eat is as much a natural law as higher human needs—unless you believe we should abandon all progress and revert to pre-civilization times.

IP laws ensure that you have a say in the future of the product of your work, can possibly monetise it, etc., which means a creative 1) can fulfil their need to eat (individual benefit), and 2) has an incentive to create it in the first place (societal benefit).

In the last few hundred years intellectual property, not physical property, is increasingly the product of our work and creative activities. Believing that physical artifacts we create deserve protection against theft while intellectual property we create doesn’t needs a lot of explanation.


What you see as copyright violation, I see as liberation. I have open models running locally on my machine that would have felled kingdoms in the past.


I personally see no issue with training and running open local models by individuals. When corporations run scrapers and expropriate IP at an industrial scale, then charge for using them, it is different.


What about Meta and the commercially licensed family of Llama open-weight models?


I have not researched closely enough but I think it falls under what corporations do. They are commercially licensed, you cannot use them freely, and crucially they were trained using data scraped at an industrial scale, contributing to degradation of the Web for humans.


Since Llama 2, the models have been commercially licensed under an acceptable use policy.

So you're able to use them commercially as you see fit, but you can't use them freely in the most absolute sense, but then again this is a thread about restricting the freedoms of organizations in the name of a 25-year-old law that has been a disgrace from the start.

> contributing to degradation of the Web for humans

I'll be the first to say that Meta did this with Facebook and Instagram, along with other companies such as Reddit.

However, we don't yet know what the web is going to look like post-AI, and it's silly to blame any one company for what clearly is an inevitable evolution in technology. The post-AI web was always coming, what's important is how we plan to steward these technologies.


The models are either commercial or not. They are, and as such they monetise the work of original authors without their consent, compensation, and often in violation of copyleft licensing.

> The post-AI web was always coming

“The third world war was always coming.”

These things are not a force of nature, they are products of human effort, which can be ill-intentioned. Referring to them as “always coming” is 1) objectively false and 2) defeatist.


> Continuing the journey of optimal LLM-assisted coding experience. In particular, I find that instead of narrowing in on a perfect one thing my usage is increasingly diversifying

https://x.com/karpathy/status/1959703967694545296



what you did here is called confirmation bias.

> I think congrats again to OpenAI for cooking with GPT-5 Pro. This is the third time I've struggled on something complex/gnarly for an hour on and off with CC, then 5 Pro goes off for 10 minutes and comes back with code that works out of the box. I had CC read the 5 Pro version and it wrote up 2 paragraphs admiring it (very wholesome). If you're not giving it your hardest problems you're probably missing out.

https://x.com/karpathy/status/1964020416139448359


Yes, embedding .py code inside of a speedrun.sh to "simplify the [sic] bash scripts."

Eureka runs LLM101n, which is teaching software for pedagogic symbiosis.

[1]:https://eurekalabs.ai/


I'd say there's a mix of 'Chinese GPUs are not that good after all' and 'Nvidia doesn't have any magical secret sauce, and China could easily catch up' going on. Nvidia GPUs are indeed remarkable devices with a complex software stack that offers all kinds of possibilities that you cannot replicate overnight (or over a year or two!)

However they've also got a fair amount of generality, anything you might want to do that involves huge amounts of matmuls and vector maths you can probably map to a GPU and do a half decent job of it. This is good for things like model research and exploration of training methods.

Once this is all developed, you can cherry-pick a few specific things to be good at and build your own GPU concentrating on making those specific things work well (such as inference and training on Transformer architectures), and catch up to Nvidia on those aspects even if you cannot beat or match a GPU on every possible task. However, you don't care, as you only want to do some specific things well.

This is still hard and model architectures and training approaches are continuously evolving. Simplify things too much and target some ultra specific things and you end up with some pretty useless hardware that won't allow you to develop next year's models, nor run this year's particularly well. You can just develop and run last year's models. So you need to hit a sweet spot between enough flexibility to keep up with developments but don't add so much you have to totally replicate what Nvidia have done.

Ultimately the 'secret sauce' is just years of development producing a very capable architecture that offers huge flexibility across differing workloads. You can short-cut that development by reducing flexibility or not caring your architecture is rubbish at certain things (hence no magical secret sauce). This is still hard and your first gen could suck quite a lot (hence not that good after all) but when you've got a strong desire for an alternative hardware source you can probably put up with a lot of short-term pain for the long-term pay off.


What does "are not good after all" even mean? I feel there are too many value judgements in that question's tone, which blindside Western observers. I feel like the tone has the hidden implication of "this must be fake after all, they're only good at faking/stealing, nothing to see here move along".

Are they as good as Nvidia? No. News reporters have a tendency to hype things up beyond reality. No surprises there.

Are they useless garbage? No.

Can the quality issues be overcome with time and R&D? Yes.

Is being "worse" a necessary interim step to become "good"? Yes.

Are they motivated to become "good"? Yes.

Do they have a market that is willing to wait for them to become "good"? Also yes. It used to be no, but the US created this market for them.

Also, comparing Chinese AI chips to Nvidia is a bit like comparing AWS with Azure. Overcoming compatibility problems is not trivial, you can't just lift and shift your workload to another public cloud, you are best off redesigning your entire infra for the capabilities of the target cloud.


I think my question made it clear I'm not simply assuming China is somehow cheating here - either in the specs of their current product, or in stealing IP.

No, I just struggle to reconcile (but many answers here go some way to clarifying) Nvidia being the pinnacle of the R&D-driven tech industry - not according to me but to global investors - and China catching up seemingly easily.


Unfortunately I think global investors are quite dumb. For example, all the market analysts were very positive about ASML, Nvidia, etc., but they all assumed sales to China would continue according to projections that don't take US sanctions or Chinese competition into account. Every time a sanction landed or a Chinese competitor made a major step forward, it was surprise pikachu, even though enthusiasts who follow news on this topic saw it coming years ago.


To me at least, "not good after all" means their current latest hardware has issues that mean it cannot replace Nvidia GPUs yet. This is a hard problem, so not getting there yet doesn't imply bad engineering, just a reflection of the scale of the challenge! It also doesn't imply that if this generation is a miss, following generations couldn't be a large win. Indeed I think it would be very foolish to assume that Alibaba or other Chinese firms cannot build devices that can challenge Nvidia here on the basis of the current generation not being up to it yet. As you say, they have a large market that's willing to wait for them to become good.

Plus it may not be true, this new Alibaba chip could turn out to be brilliant.


If I'm reading this right, glitching the I2C bus prevents the Secure Enclave from booting. It seems the device recovers from this itself ('Although the device recovered and remained operable'); maybe the Secure Enclave reboots itself after seeing a fault on the I2C?

No evidence of any security issue is presented, though they certainly want to drum it up as something major: 'This is a high-severity, unpatchable design flaw'.


The device "recovering" while entering debug mode on production hardware is the security issue. Fuses are supposed to prevent that. They don’t. That’s the flaw.


If I own an iPhone 15 Pro, how am I impacted by this? Why does this repo say that a hardware recall may be necessary?


If debug logic is still active, attackers with physical access can dump firmware, extract secrets, or bypass protections that should be fused off.

Think: stolen phones, shady repair shops, or border checks — cases where physical access + this flaw = real risk.

That’s why a hardware recall may be necessary... fuses are meant to be irreversible. If they fail, there's no patch.


I thought it was a great book: it dives into all the details and lays them out step by step with some nice examples. Obviously it's a pretty basic architecture and very simplistic training, but I found it gave me the grounding to then understand more complex architectures.


The CTO of Applications reporting to the CEO of Applications (who reports to the actual CEO) is kinda weird? I figure you're either the actual CTO or you're not a C-level exec and should have another title. Just more title inflation I guess. Maybe in the same way you see VPs of X everywhere in some organizations, we'll start seeing CEOs/CTOs of X lower and lower down the org chart.


Sounds to me like another phase in the growth of the "Unaccountability Machine"

https://www.amazon.com/Unaccountability-Machine-Systems-Terr...

Oh, the CTO approved it so we should blame them. No, not that CTO, the other CTO. Oh, so who decided on the final outcome? The CTO! So who is on first again?


“CTO” makes sense as a signal that “the buck stops here” for technical issues. They are the highest-ranking authority on technical decisions for their silo, with no one above them (but two CEO’s above them for business decisions)

If Mira Murati (CTO of OpenAI) has authority over their technical decisions, then it’s an odd title. If I was talking with a CTO, I wouldn't expect another CTO to outrank or be able to overrule them.


It would be quite strange indeed for Mira Murati to have a say over their technical decisions, considering she does not work for OpenAI :)


It's signaling P&L responsibilities. It's not that weird, or at least not unheard of; it's just typically done through "EVP" - so EVP of Applications, VP of Applications Engineering, etc. I'm guessing that the line items those "C"s who are not Sam are responsible for are bigger than those of most F500 executives, and they're using titles to reflect that reality.


Just gonna point out that Google has done this as well, and it's not so much title inflation as it is just acknowledging the fact that if the unit they command were a standalone business, they would well be worth the CEO/CTO title.


Yep. CEO of YouTube, Google Cloud, etc.


This C-level thing has been happening for decades. There are many with the CTO title who manage groups sometimes as small as 5-10 people. And they are not startups but large corporates.


Just look at any media agency (OMG as an example). There are CEOs up the wazoo, one for North America, one for EU etc.

In practice, these are just internal P&Ls.


I'm just sitting here wondering what in the world "Applications" is, is that a subsidiary or what?


A thing that uses a model to have customers.


Apparently that's what they call ChatGPT, Codex, etc ¯\_(ツ)_/¯


Andes (Condor is owned by Andes) seems to get relatively little press vs other RISC-V outfits. My sense is they've been quietly building a very solid RISC-V CPU business with a great IP portfolio.

This latest core looks very interesting, can't wait to see it hit silicon and see what it can really do!


Andes has won a lot of sockets already with their lower-power cores. They have almost become the #1 choice for RISC-V cores.


I think they've been doing more RISC-V deals than SiFive for quite a few years, due largely I think to their proximity to and established relationships with (for NDS32) a lot of the current customers for RISC-V.


Who knows what the closed-source models use, but certainly going by what's happening in open models, all the big changes and corresponding gains in capability are in training techniques, not model architecture. Things like GQA and MLA as discussed in this article are important techniques for getting better scaling, but are relatively minor tweaks vs the evolution in training techniques.

I suspect closed models aren't doing anything too radically different from what's presented here.
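
For readers who haven't met GQA, here's a minimal shape-only sketch (made-up sizes, no mask or batching) of the idea:

    import numpy as np

    # Grouped-query attention (GQA): many query heads share a smaller set of
    # key/value heads, so the KV cache shrinks by the group factor vs full MHA.
    seq, d_head = 16, 64
    n_q_heads, n_kv_heads = 32, 8                  # 4 query heads per KV head
    group = n_q_heads // n_kv_heads

    q = np.random.randn(n_q_heads, seq, d_head)
    k = np.random.randn(n_kv_heads, seq, d_head)   # cached: 8 heads, not 32
    v = np.random.randn(n_kv_heads, seq, d_head)

    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                            # which shared KV head to use
        scores = q[h] @ k[kv].T / np.sqrt(d_head)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[h] = weights @ v[kv]

    print(out.shape, f"KV cache is {group}x smaller than one KV head per query head")

MLA goes about it differently, caching a low-rank latent from which keys and values are reconstructed, but it's aimed at the same thing: a smaller KV cache.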

