
>you'd need a pretty custom toolchain

Just port/extend pytorch to make it easy to utilize fused ops.


Easier said than done. Even with Google-level resources, TPU support for pytorch is patchy (https://arxiv.org/abs/2309.07181). The device abstraction is not great and assumes CUDA in unexpected places.


The Groq AI chip startup has solved this problem. They don't use hand-written kernels at all, instead they use a compiler, and they have the top speed in the world on LLaMA2-70B, 240 tokens/s.

https://www.youtube.com/@GroqInc/videos

Other interesting Groq tidbits - their models are deterministic, the whole system up to thousands of chips runs in sync on the same clock, memory access and network are directly controlled without any caches or intermediaries so they also run deterministically.

That speeds up communication and allows automatic synchronisation across thousands of chips running as one single large chip. The compiler does all the orchestration/optimisation. They can predict the exact performance of an architecture from compile time.

What makes Groq different is that they started from the compiler, and only later designed the hardware.


What is the pass rate on torchbench? This gives a more realistic measure of how good a vendor's pytorch support is.

All the big chip startups have their own pytorch compiler that works on the examples they write themselves. From what I've seen of Groq it doesn't appear to be any different.

The problem is that pytorch is incredibly permissive in what it lets users do. torch.compile is itself very new and far from optimal.


Pytorch XLA is such a pain to use. And once you go TPU you need the same energy to switch back, so you can’t quickly test out how it performs on your problem.


One of the big reasons custom hardware solutions struggle.

IMO - you’d have better luck as a hardware vendor implementing an LLM toolchain and bypassing a general-purpose DL framework. At the very least you should be able to post impressive results with this approach rather than a half-baked pytorch port.


I feel like that would make it harder for a vendor to keep up with the industry.

Say you took all the effort in the world to build your custom LLM toolchain to train a Llama on custom hardware. And then suddenly someone comes up with LoRA. You haven't even finished porting it to your toolchain when someone comes up with GPTQ.

Can't keep up with a custom toolchain imo.

It's like a forked linux kernel. Eventually you're gonna have to upstream if you're serious about it, which is what AMD is actively doing with pytorch for ROCm (masquerading it as CUDA for compatibility).
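
To illustrate the "masquerading" point: on ROCm builds of PyTorch the AMD GPU is reached through the same `torch.cuda` namespace, and `torch.version.hip` is set instead of `torch.version.cuda`. A minimal sketch, assuming PyTorch may or may not be installed; `describe_backend` is a hypothetical helper name, not part of PyTorch:

```python
def describe_backend():
    """Report which GPU backend sits behind the torch.cuda namespace."""
    try:
        import torch
    except ImportError:
        return "pytorch not installed"

    # ROCm builds populate torch.version.hip; CUDA builds populate torch.version.cuda.
    hip = getattr(torch.version, "hip", None)
    cuda = getattr(torch.version, "cuda", None)

    if not torch.cuda.is_available():
        return "no GPU visible through torch.cuda"
    if hip:
        # Existing code that asks for device "cuda" runs unchanged on the AMD GPU.
        return f"ROCm/HIP {hip} (exposed as 'cuda')"
    return f"CUDA {cuda}"


print(describe_backend())
```

User code written against `torch.cuda` therefore does not need to know which vendor is underneath, which is the compatibility argument for upstreaming rather than forking.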


I disagree. llama.cpp[0] is a good counterpoint to this, since it uses a custom ML framework created from scratch. Despite not having the developer team of a large company, it still keeps up with many of the advancements in LLMs.

[0] https://github.com/ggerganov/llama.cpp


llama.cpp is not necessary for creating lots of demand for the chip it was originally written for (Apple M1), whereas new hardware vendors need to demonstrate they can plug in to existing tools to generate enough demand to ship in volume.


> lots of demand for the chip it was originally written for (Apple M1)

To be fair, the M1/M2 chip can't be purchased or used separately from the Mac, unlike GPUs or socketed CPUs, and demand for Macs is already fairly high.


That might be good enough to get a hardware startup acquired, but not good enough to get major sales. Users want pytorch and negligible switching cost between chips.

Bigger problem for startups trying to muscle in on LLMs is that there isn't much room for improvement on existing solutions to do something radically different.


>Bigger problem for startups trying to muscle in on LLMs is that there isn't much room for improvement on existing solutions to do something radically different.

Aye. Unless you are able to notch a 10x cost/performance improvement, the migration overhead will just make it not worth it to switch.


Typically Google resources go to TF before PyTorch, no?


Even after prioritising tensorflow, keras, jax etc., they can still afford to have a very large team working on torch_xla and still hedge their bets with a separate team on torch_mlir.


Everything in an organism is learned.

If you think it isn't then you're not observing the right timescale.


Read the article. They got the mechanism of “learned helplessness” wrong. It’s actually a “learned lack of control”.

This makes every difference in how to treat people who have this problem. All you have to do is teach people how to feel they have control again.


Is this because you don't know how to implement the weight updates in NumPy yourself?


Terrible joke and COMPLETELY missing the point.


Can you explain like I'm 5 why this matters distinctly from how transformers are normally trained with autodiff and what its possible applications are?


I’m talking about attention-only transformers. Those don’t use autodiff but still learn. The math is actually really cool.


> attention only transformers

Can you share any good link on the subject?



Maybe I am missing something, but I don't see any learning without autodiff.


I thought you were asking about attention only transformers. This paper touches on some of it https://arxiv.org/abs/2212.10559v2.


The paper speculates that it is analogous to gradient descent and empirically confirms it is similar in behavior, but it is not a rigorous proof of any kind.

The momentum experiment they made also does not seem related. E.g. it just adds past values to V, which extends the effective context length.
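
For readers following along, the analogy under discussion rests on an exact identity for linear (softmax-free) attention: (W0 + Σ e_i x_iᵀ) q = W0 q + Σ e_i (x_iᵀ q). A minimal sketch in plain Python with made-up toy numbers; the helper names are hypothetical and the learning rate and sign are assumed to be folded into the error signals `es`. It says nothing rigorous about softmax attention, where the correspondence is only approximate.

```python
def matvec(W, x):
    """Matrix-vector product for nested lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def outer(e, x):
    """Outer product e x^T as a nested list."""
    return [[ei * xj for xj in x] for ei in e]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Toy values, chosen only for illustration.
W0 = [[1.0, 2.0], [0.0, -1.0]]   # initial weights of a linear layer
xs = [[1.0, 0.0], [2.0, 1.0]]    # training inputs (play the role of keys)
es = [[0.5, -0.5], [1.0, 2.0]]   # per-example error signals (play the role of values)
q = [3.0, 4.0]                   # test input (plays the role of the query)

# View 1: accumulate the weight update dW = sum_i e_i x_i^T, then apply the layer.
W = [row[:] for row in W0]
for e, x in zip(es, xs):
    dW = outer(e, x)
    W = [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, dW)]
updated = matvec(W, q)

# View 2: keep W0 frozen and add unnormalised linear attention over the examples.
base = matvec(W0, q)
attn = [sum(e[k] * dot(x, q) for e, x in zip(es, xs)) for k in range(len(base))]
attended = [b + a for b, a in zip(base, attn)]

print(updated, attended)
```

Both views give the same output here, so no weights need to change for the layer to behave as if it had been updated. Whether trained softmax transformers actually exploit this is the empirical question the paper addresses.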


> but it is not a rigorous proof of any kind.

Such is the nature of early theories.



How is fraud in a commercial index relevant to our understanding of money and central banking? Did the Enron scandal undermine our understanding of steam turbines?


How is the interbank interest rate being defined on the fly in-mente relevant to our understanding of some large central banks determining their total credit supply by punching some numbers into a computer ex nihilo?

Seems similar enough to me. That's how.


> the interbank interest rate being defined on the fly

All prices are determined on the fly, certainly day-to-day ones. Libor wasn’t the interbank rate, it was one commercial offering, albeit a powerful one. The Fed Funds rate always was and now SOFR are transactionally derived, which is fundamentally different from Libor, which was never anything more than a survey.
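
The survey-versus-transaction distinction can be sketched in a few lines. This assumes, per my reading of the published methodologies, that SOFR is a volume-weighted median of overnight repo transactions and that Libor was a trimmed mean of panel banks' submitted estimates. All numbers and function names below are hypothetical and for illustration only.

```python
def volume_weighted_median(trades):
    """trades: list of (rate, volume). Returns the rate at which the
    cumulative volume, sorted by rate, first reaches half the total."""
    total = sum(volume for _, volume in trades)
    running = 0.0
    for rate, volume in sorted(trades):
        running += volume
        if running >= total / 2:
            return rate

def trimmed_mean(quotes, trim_fraction=0.25):
    """Drop the top and bottom trim_fraction of quotes and average the rest."""
    ordered = sorted(quotes)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical executed trades: (rate in percent, volume).
trades = [(5.30, 100.0), (5.31, 400.0), (5.32, 300.0), (5.35, 50.0)]
# Hypothetical survey answers: what each panel bank says it could borrow at.
quotes = [5.20, 5.25, 5.30, 5.30, 5.32, 5.33, 5.40, 5.60]

transaction_rate = volume_weighted_median(trades)
survey_rate = trimmed_mean(quotes)
print(transaction_rate, round(survey_rate, 4))
```

The first function can only move if money actually changes hands at a different rate; the second moves whenever a panel member changes what it writes down, which is the opening the Libor manipulation used.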


> Seems similar enough to me.

That's a bad criterion if you don't know exactly what you are talking about.


There is a finite amount of energy in this universe and yet here we are at "Practically all prices are determined on the fly".

This is mere bankster handwaving in lieu of calculating physically intrinsic value for a sufficient number of commodities.


> a finite amount of energy in this universe and yet here we are at "Practically all prices are determined on the fly"

This is a silly comparison. Stars don’t model their fusion output. Particles interact on the fly. There is also no model relating entropy to overnight collateralised borrowing rates.

> calculating physically intrinsic value for a sufficient number of commodities

Interbank funds aren’t a finite commodity.


>Stars don’t model their fusion output.

The sum total positive energy contained in the universe can be calculated and predicted.

>Interbank funds aren’t a finite commodity.

This statement is obviously false and can run into brick walls in practice.

The comparison isn't silly in the slightest. Currencies must be coupled to a finite resource to function, lest agent A buy all of agent B's gold using practically nothing but chutzpah.

That you think the comparison is "silly" shows limited/magical thinking on the subject.


> statement is obviously false

No, it isn’t, though misunderstanding it isn’t even fundamental to the flaw in your thinking. A couple of banks can create and destroy an infinite amount of money among them with no real effect. JPMorgan credits UBS a trillion trillion trillion dollars at the latter’s JPMorgan account at the same time UBS credits JPMorgan at its UBS account, and then they both undo it a moment later. No real effect. Hell, JPMorgan could create the money with no counterbalance so they could look at how pretty it is for an indefinite amount of time. Same deal. Regulators won’t be happy, but that’s because of the potential effects of UBS trying to buy the Fed’s balance sheet.

It’s when the interbank market interacts with broader markets that anything real happens.


Again more bankster handwaving.

What need do banks have for that capability where the capability shouldn't clearly be criminalised?


> What need do banks have for that capability where the capability shouldn't clearly be criminalised?

Banks don't legally have that capability.

The point wasn't that banks do this. It's that it would have the same real-world effect (again, outside regulatory action and law enforcement) as me writing you a trillion-dollar IOU.


What's the physically intrinsic value of a paper clip?


Number of joules that can be extracted from the atoms composing the molecules composing the steel wire it's made of.

...How can you not see this?


You screwed up the answer here in this classic Uber-commodity based economy (which no actual economist has ever proposed outside of thought experiments).

The traditional answer when people go down this path is “whatever the producer and consumer agree the price is, based on a currency denominated in joules that can be extracted from an atom”.

By doing so you’ve eliminated all forms of value adding capabilities from your economic system. The paper clip is no more valuable than its unprocessed atomic components, which is clearly not how real value is derived (or your currency is completely divorced from value).


What's the physically intrinsic value of an energy extractor then?


The value of its atoms and the electrons orbiting those atoms (give or take labour costs for transforming those atoms) in joules or watts.


What if there's only one energy extractor in the world?


> Did the Enron scandal undermine our understanding of steam turbines?

If you know anything about it, you probably are aware it's accounting related rather than technology related.


> it's accounting related rather than technology related

Precisely. The accounting scandal has as much to do with the underlying technology as the Libor scandal does with our understanding of the mechanics of banking. Nobody informed walked away from the Libor scandal rethinking the fundamentals of banking in the same way chickens didn’t get bioengineered in response to chicken Libor.


"I guess that without outside criticism we’d all be driving cars that run on cold fusion, cancer would already have been cured 100 times over, etc."

LeCun invented the convnet and may well have been writing scientific research papers since before the author of that sentence was even born lmfao


They're spies. They just stop transmitting when the signal strength meter person's present. It's not rocket science.


they are spies, they’ll make a nice warning device that is concealed, no?


Doesn't surprise me in the slightest. I know a whole community of civilians who are assaulted with microwave weapons daily.

It's used to punish dissidents all over the world.

Here's a retired Royal Navy radiological weapons expert explaining how it's been used on foreign adversaries since the 60s:

https://youtu.be/z99_SzoXZdY

Full video: https://youtu.be/v5Tn89I7uic


Petty criminals are recruited in places like 4chan and KiwiFarms and given computer access to these microwave weapons and trained on how to use them on civilians - children included.

It isn't any specific government or company that is responsible - albeit many are used as front to collect money and influence.

Hacking their devices would uncover identities, as would following the money trail.

Anyone working on this, contact me.


At 1:00, they claim microwaves will trigger a photoelectric cell used as an anti-tamper mechanism in a mine if someone is "beaming" him with microwaves. That doesn't make any sense. Photoelectric cells are there to detect... photons. Light. And they're indeed used as anti-tamper mechanisms.


Microwaves are photons. I don't know if photoelectric cells will respond to that wavelength, but they are the same particle as visible electromagnetic radiation.


Yeah it's supposed to be non-ionising. Curious claim. Excuse me while I read about what happened to Tesla's research on my wirelessly charged phone.


Oh boy, the comments on these are teeming with tin foil hatters and believers in chemtrails.


You can pay people to do that and also to pretend it matters.

I took this photo myself the other day. Look at this contrail starting and stopping: https://ibb.co/t2k8fKd


What is Idem for Blender? Is there a homepage URL?

