
"Everybody will need to do some work if he is to be contented ... a 15-hour week may put off the problem for a great while. For 3 hours a day is quite enough to satisfy the old Adam in most of us!" - Keynes, 1930

Though this was a 100-year prediction, so we've still got three and a half to go!


Trying to lose is also fun (as white)

Some observations:

* Knights are color bound

* You can mate with Knight & King (K+K is still insufficient material)

* 3 fold repetition still applies (and has a popup!)


How do you mate with N+K? Surely your King can't give check, and if your Knight is giving check then the enemy king can just take a step toward it to get out of check.


Here's my attempt at an undergrad-level summary (corrections welcome!):

The core idea is to quantize the KV cache, but to do so in a way that destroys minimal information. In this case, that information is the similarity scores between vectors. The simplest way to do this is to change all the elements from 16 bits of precision to, say, 4 bits (scalar quantization).

These papers improve on it by realizing: almost all the energy (concentration of measure) is towards the equator of the hypersphere (each coordinate is roughly normally distributed with variance 1/d; d=vector dimensionality). (The curse/blessing of hyper dimensionality strikes again.) So when we quantize the elements (think "latitudes", e.g. to the nearest degree) we destroy a lot of information because basically all the vectors were around the equator (so some latitudes have a lot of vectors and some have very few). The idea is to rotate the vectors away from the equator so they're more consistently distributed (to better preserve the entropy during quantization, which I guess was amitport's DRIVE idea).

PolarQuant does a hyperpolar coordinate transform which superficially seems neat for preserving entropy because of this equator/polar framing (and is ultimately unnecessary, as shown by TurboQuant). They also realized there's a bias in the resulting vectors during similarity, so they wrote the QJL paper to fix the bias. And then the TurboQuant paper took PolarQuant + QJL, removed the hyperpolar coords, and added in some gross / highly-pragmatic extra bits for important channels (i.e. elements of the vectors), which is sort of a pathology of LLMs these days but it is what it is. Et voila, highly compressed KV cache.

If you're curious why you can randomly rotate the input, it's because all the vectors are rotated the same, so similarity works out. You could always un-rotate to get the original, but there's no need because the similarity on rotated/unrotated is the same if you compare apples to apples (with the QJL debiasing).

Why was PolarQuant even published? Insu Han is solely on that paper and demanded/deserved credit/promotion, would be my guess.

The blog post is chock-full of errors and confusions.
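To make the "all the vectors are rotated the same, so similarity works out" point concrete, here's a tiny stdlib-only sketch. It is not code from any of the papers: the helper names (`random_orthogonal`, `apply`, `dot`) and the dimension are my own, and the matrix is a generic random orthogonal transform (it may include a reflection), which is all the dot-product argument needs.

```python
import math
import random


def random_orthogonal(d, rng):
    """Build a random d x d orthogonal matrix via Gram-Schmidt on Gaussian rows."""
    rows = []
    while len(rows) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Remove the components along the rows we already have.
        for r in rows:
            proj = sum(a * b for a, b in zip(v, r))
            v = [a - proj * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip the (very unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def apply(matrix, vec):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


rng = random.Random(0)
d = 16  # illustrative dimension, not taken from the papers
Q = random_orthogonal(d, rng)
query = [rng.gauss(0.0, 1.0) for _ in range(d)]
key = [rng.gauss(0.0, 1.0) for _ in range(d)]

before = dot(query, key)                      # similarity in the original basis
after = dot(apply(Q, query), apply(Q, key))   # similarity after the shared transform
```

`before` and `after` agree up to floating-point noise, because (Qq)·(Qk) = q·(QᵀQ)k = q·k. Quantization error is what breaks the exact equality in practice, which is where the debiasing comes in.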


Some corrections: the vectors are un-rotated in practice for future query vectors. This could be removed with a slightly different LLM arch.

PolarQuant does live on in TurboQuant's codebooks for quantization which borrows from the hyperpolar coords


> added in some gross / highly-pragmatic extra bits for important channels

I'm curious what you meant by that. I understood it to only have the MSE quantization vector, a 1-bit QJL vector, and a scalar magnitude.

> PolarQuant does live on in TurboQuant's codebooks for quantization which borrows from the hyperpolar coords

Isn't the turbo codebook the irregularly spaced centroid grid?


> extra bits per channel

Page 18 of the paper:

> As shown in Table 1, our approach outperforms other methods for both Llama-3.1-8B-Instruct and Ministral-7B-Instruct, achieving significantly higher average scores. We evaluate our method using 2.5-bit and 3.5-bit quantization during text generation. These non-integer bit precisions result from our strategy of splitting channels into outlier and non-outlier sets, and applying two independent instances of TurboQuant to each, allocating higher bit precision to outliers. This outlier treatment strategy is consistent with prior work [63, 51]. For example, in our 2.5-bit setup, 32 outlier channels are quantized at 3 bits, while the remaining 96 channels use 2 bits, leading to an effective bit precision of (32 × 3 + 96 × 2)/128 = 2.5. For 3.5-bit quantization, a different ratio of outliers and regular channels leads to a higher effective bit precision. Despite using fewer bits than competing techniques, TurboQuant maintains performance comparable to unquantized models

So they find channels / indices-of-the-vector that are important and give them more bits (3 bits) than the rest (2 bits).
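The effective bit width is just a channel-weighted average, so it's easy to check by hand. A quick sketch (the `effective_bits` helper is mine, not from the paper):

```python
def effective_bits(groups):
    """Average bits per channel for a list of (num_channels, bits_per_channel) pairs."""
    total_channels = sum(n for n, _ in groups)
    total_bits = sum(n * b for n, b in groups)
    return total_bits / total_channels


as_quoted = effective_bits([(32, 3), (96, 2)])   # the split exactly as quoted above
even_split = effective_bits([(64, 3), (64, 2)])  # hypothetical split, for comparison
```

Worth noting: taking the quoted numbers at face value, (32 × 3 + 96 × 2)/128 is 288/128 = 2.25, not 2.5. A 64/64 split would give 2.5. So either the channel counts or the total in that passage seems to be off; I can't tell which from the quote alone.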

>Isn't the turbo codebook the irregularly spaced centroid grid?

Yes, I believe so. They mention it's informed by the concentration of measure and the uncorrelated/independent vectors after the initial conditioning rotation. I feel like it was informed by PolarQuant, but that may just be how I intuit what's going on (because thinking about this in polar coordinates makes more sense in my head). IOW, I think the irregular spacing is maybe informed by PolarQuant.

However they do say, slightly to the contrary: "We find optimal scalar quantizers for random variables with Beta distributions by solving a continuous 1-dimensional k-means problem using the Max-Lloyd algorithm."
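For anyone who hasn't seen Max-Lloyd before, it's k-means in one dimension: alternate between assigning samples to the nearest centroid and moving each centroid to the mean of its cell. Below is a rough sketch of that idea, not the paper's implementation. It works on samples rather than the continuous Beta density the quote describes, and the dimension (64), sample count and the `lloyd_max_1d` / `sphere_coordinate` names are all my own assumptions.

```python
import math
import random


def lloyd_max_1d(samples, k, iters=50):
    """1-D k-means (Lloyd's algorithm): returns k centroids in ascending order."""
    xs = sorted(samples)
    # Initialise centroids at evenly spaced quantiles of the data.
    centroids = [xs[(2 * i + 1) * len(xs) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        # Decision boundaries are the midpoints between adjacent centroids.
        bounds = [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]
        sums = [0.0] * k
        counts = [0] * k
        for x in xs:
            idx = sum(1 for b in bounds if x > b)  # index of the nearest centroid
            sums[idx] += x
            counts[idx] += 1
        # Move each centroid to the mean of its cell (keep it if the cell is empty).
        centroids = [s / c if c else old
                     for s, c, old in zip(sums, counts, centroids)]
    return centroids


def sphere_coordinate(d, rng):
    """First coordinate of a uniformly random point on the unit sphere in d dims."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return g[0] / norm


rng = random.Random(0)
samples = [sphere_coordinate(64, rng) for _ in range(2000)]
centroids = lloyd_max_1d(samples, k=4)  # 4 levels = 2 bits per coordinate
gaps = [b - a for a, b in zip(centroids, centroids[1:])]
```

Because the samples bunch up near zero, the centroids should end up packed more tightly in the middle than at the tails, which is the "irregularly spaced grid". No polar coordinates are needed to get there, which fits the quote.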


Beautiful explanation, thanks!


>Oh, an EREV is fancy way to say "hybrid" ok

Kind of. EREVs are what locomotives have been doing for a century (and to a lesser extent barges), which is called diesel-electric in that field. I agree the terminology is lacking, but EREVs are quite compelling (and their high market share in China supports consumer demand).

Hybrid:

* ICE must run during regular operation (except for ~very short distances at ~very slow speeds) -- this increases operational costs (oil changes, economy, engine designed for torque and wide RPM range).

* Complex drivetrain with wheels moved by electric motors and ICE, axles, etc.

* Generally 10-40 miles of EV range

EREV:

* Basically an EV with a short range, and whenever you want to charge the battery on the go (or use the waste heat from the ICE) it can use an efficient (Atkinson cycle) engine to do so. (Though American EREVs have used poorly suited engines for parts availability and enormous towing numbers.)

* Generally 50-200 miles of EV range

* Think "EV for daily commute; ICE for road trips (and heating)"

IMO EREVs would've been a better development path than hybrids or pure EVs.[0] Immediately lower TCO in various interest rate environments via highly-flexible battery sizes, no cold or range anxiety issues, technically simple drive train and BTMS.

[0] I mean the Prius made a lot of technical strides given the battery technology/costs and familiarity the industry had with ICE at the time. Tesla went full EV, which is a very optimistic approach and works well enough if you stick around the charging network, but the batteries are still expensive and heavy compared to a small ICE + tank.


I agree EREVs make a lot of sense, electric first but not requiring a full commitment, especially for a truck that sometimes has to do things like towing.

https://insideevs.com/news/777407/scout-motors-erev-reservas...

I'm sure this wasn't lost on Ford, 80% of Scout reservations come with the EREV and only 20% BEV.

Maybe one day they will have enough volume in the segment to justify making the pure BEV version again but with parts sharing with the EREV. An advantage to EREV design is that if done smartly you can offer the same vehicle stripped down and BOOM you have a BEV too.


The problem with EREVs is they are more complex than a BEV. More parts to go wrong, to purchase, and ultimately a (potentially) higher price.

The reason for a manufacturer to do EREVs is, IMO, primarily that they can't get hold of batteries at a cheap enough price. And I think that's the weakness of the way Ford has attacked EVs. They haven't (AFAIK) really built out battery plants. As a result, they are at the whims of their supplier for their battery packs.

For a truck like the F150, that's a large pack requirement that probably ultimately killed their margins.

Edit: OK, they've been working on a plant for the last 5 years, but it looks like they've done almost nothing. Like, they literally just have some support structures up.


Studies have shown that hybrids are more reliable than ICE vehicles - it turns out that using EV mode some of the time and the ICE less often increases reliability. No reason an EREV shouldn’t be even better.


One factory was done, and already producing EV batteries. They're converting it to fixed energy storage:

https://www.wdrb.com/news/business/all-1-600-kentucky-batter...


Even if batteries were very cheap, you run into scaling issues where your battery pack ends up very heavy, so then you're using increasingly more energy to lug your heavier battery pack around for everything that isn't long-range towing.


Are they really much more complicated than a hybrid? Think RAV4 Hybrid. I’d much prefer a fully electric drivetrain with an electric generator to the joyless CVT.


EREV is different from diesel-electric in that the EREV has a large battery whereas the diesel-electric locomotive does not. But the "ICE engine drives a generator which drives a motor" philosophy is similar in spirit.


Yes, true; good point. I think this is changing (e.g. regen braking for aux. power on passenger trains, and maybe eventually capacitors for traction drives), but currently, and ~almost all the time, this is correct.


I think the term of art in the automotive space so far has been "series hybrid". But like you said, the differentiation here may just be the size of the battery. Series hybrids are still predominantly driven by fossil fuels, even if the drive is an EV drivetrain, due to the battery mainly acting as an energy buffer.

The absolute sweet spot, as someone from a country with long long distances, is a plugin series hybrid that has ~150-300km EV range and a ~60 litre fuel tank. That's getting me to work entirely electric, and then once a month when I need to see family I can chew down the fossil fuels.


Yeah, the difference is the Powerboost hybrid electric motor is only like 50 hp. I want 350 hp of electric motor that can be powered by either the battery or an onboard ICE.


I wonder about the specs though.

I recall the BMW serial hybrid was called a range extender, because the gas motor couldn't actually put out enough power to drive the vehicle on the freeway.

So basically it was an EV with a small +xx mile extra range from the gas engine.

So no "ICE for road trips", more like "ICE for an additional +xx miles", then you need to recharge.

In comparison, the Chevy Volt had a better hybrid design (not a serial hybrid) and you could drive it on gasoline only.


The i3 was bad, but Ford is planning long range towing with the EREV so it should be fine.


Is there any good comparison of hybrid vs EREV efficiency (when the main battery is depleted), even with an Atkinson cycle ICE for the EREV? My understanding was that the main reason for all this complexity in hybrids was direct-to-wheel power transfer efficiency, while in an EREV there's an efficiency loss when converting ICE output to electric current...


Looking to the Chinese market is insightful, IMO. There's one platform for a luxury sedan, and it gets ~200mi on EV mode (~100MPGe) and then ~400mi on gas. It works out to about 70mpg purely on gas. I'm not sure how it's so high, but I'm guessing a combination of low drag (Cd), an efficient small turbocharged engine (you really only need enough power to maintain high speed, not accelerate up to it), and lots of regen braking.

BYD and Geely have similar systems. Their ICE are around 47% thermal efficiency so like ~double what you'd expect in a pure ICE car + regen and other bonuses.

https://carnewschina.com/2025/08/02/im-motors-launches-stell...


I guess you’d call my Chrysler Pacifica an “EREV” then.

It’s honestly perfect for us. 32 miles on a charge, we barely touch the gas except for the winter when it’s so cold out we need the engine to warm us up. Any other time and the battery is all we need, and it charges overnight on a simple 110V wall outlet. Long trips are still possible, you just drive. We go through maybe 8 tanks of gas per year with our occasional long trips (compared to having to stop at a charging station for an hour, I’ll take it.)


There have been no EREVs produced and sold yet AFAIK (though maybe BMW had a version of the i3 that did? I'm not sure). Dodge has one in the works. Ford has now announced one. The old Chevy Volt was philosophically wanting to be an EREV but was as a practical matter still a parallel hybrid.


The Volt was only "parallel" when running from gas. It was still serial in that when running from battery it only ran from battery, then switched to gas generating electricity, with some mechanical assisted torque in edge cases (usually only past highway speeds or "mountain climbing").

That was mostly because the electrical conversion from a gas generator is still so relatively inefficient and slow compared to a modern battery. The mechanical efficiency of gas engines is relatively better (which is why ICE has survived as a category for so long). Batteries are far more efficient at delivering high power on demand as needed for torque than a gas generator.

Any EREV is going to have that problem and experience those trade-offs. It's an unfortunately defining part of the category. It's also why Chevy has said there's no real future in EREV powertrains: they are a worst-of-both-worlds situation with too many unfortunate trade-offs to consider, such as needing to be parallel in gas-only operation edge cases to meet torque requirements.


That version of the i3 definitely is one. Though the way it limits the gas tank and won't let you control it manually in the US for tax purposes sucks.


No. The ICE isn't connected to the drivetrain in an EREV; it's only used to provide power to the EV drive system.

The Pacifica is what you'd call a plug-in hybrid (PHEV) because the ICE is still connected to the drivetrain.


It relies on an “unintuitive observation”[0] that you can run batches basically for free (up to a limit). So if you only run one inference, you batch it plus a lot of guesses and, if you guess right, can speed up the inference by the number of guesses. If you guess wrong, you're back to regular speed (and still fully correct).

[0] https://x.com/karpathy/status/1697318534555336961
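Here's a toy version of that guess-and-verify loop, to show why a wrong guess costs speed but never correctness. Everything in it is made up for illustration: `target_next` and `draft_next` are arithmetic stand-ins for a big and a small model, and the "batched" verification is simulated with a list comprehension where a real system would do one forward pass.

```python
def target_next(prefix):
    """Stand-in for the big model: deterministic next token from the prefix."""
    return (sum(prefix) * 31 + len(prefix)) % 7


def draft_next(prefix):
    """Stand-in for the cheap draft model: agrees with the target most of the time."""
    guess = target_next(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % 7


def greedy_decode(prompt, n_tokens):
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n_tokens):
        seq.append(target_next(seq))
    return seq


def speculative_decode(prompt, n_tokens, k=4):
    seq = list(prompt)
    target_passes = 0
    goal = len(prompt) + n_tokens
    while len(seq) < goal:
        # 1. Draft k guesses autoregressively with the cheap model.
        guesses = []
        for _ in range(k):
            guesses.append(draft_next(seq + guesses))
        # 2. One "batched" target pass scores every guessed position at once
        #    (simulated with a loop here; on a GPU this is a single forward pass).
        target_passes += 1
        verified = [target_next(seq + guesses[:i]) for i in range(k + 1)]
        # 3. Keep guesses while they match, then append the target's own token.
        accepted = 0
        while accepted < k and guesses[accepted] == verified[accepted]:
            accepted += 1
        seq.extend(guesses[:accepted])
        seq.append(verified[accepted])
    return seq[:goal], target_passes


prompt = [1, 2, 3]
baseline = greedy_decode(prompt, 20)
fast, passes = speculative_decode(prompt, 20)
```

`fast` is token-for-token identical to `baseline`, but `passes` is well below the 20 target calls the baseline needs. Every pass yields at least one correct token (the target's own), so the worst case is regular speed plus the wasted draft work.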


> does it actually make a difference for SRAM?

I have no idea in practice. But for the thermodynamic limit of actually making a difference, any irreversible change requires heat to be generated, e.g. initializing to zero, truncating, or bitshifts with discarded information. In contrast, addition/subtraction/multiplication/bitshifts without over-/under- flow will not necessarily generate heat.

https://en.wikipedia.org/wiki/Landauer%27s_principle

PS. you can also use mass-energy equivalence to extend this to calculate the lower limit of mass for a given quantity of information. TL;DR: The internet weighs 50g https://www.youtube.com/watch?v=WaUzu-iksi8
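For a sense of scale, the Landauer bound is k_B · T · ln 2 per erased bit. A quick back-of-the-envelope sketch (the 300 K default and the 1 GiB example are my own choices):

```python
import math

BOLTZMANN = 1.380649e-23  # J/K, exact under the 2019 SI definition


def landauer_limit_joules(bits, temperature_kelvin=300.0):
    """Minimum heat dissipated when irreversibly erasing `bits` bits."""
    return bits * BOLTZMANN * temperature_kelvin * math.log(2)


per_bit = landauer_limit_joules(1)             # one bit at roughly room temperature
per_gib = landauer_limit_joules(8 * 2**30)     # zeroing 1 GiB of memory
```

At room temperature that's on the order of 10^-21 J per bit, many orders of magnitude below what real SRAM dissipates per write, so in practice the thermodynamic limit is nowhere near the thing that matters.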


reminds me of the classic post: https://workplace.stackexchange.com/questions/93696/is-it-un...

OOC Do you use ChatGPT/Gemini via the chat interfaces, pasting context and design and then copying out?

Thanks!


Largely.

On my team for my jobs, most things are now microservices, so the entire codebase can go into the context and I can just send it the Jira ticket with some clarifications and ask what needs to be done. Follow up question is a test.


That is correct[0], as known from the iPad M4 analysis.

I will say SVT-AV1 has had some significant ARM64 performance improvements lately (~300% YoY, with bitrate savings at a given preset[1][2], so call it a 400% increase), so for many use-cases software AV1 encoding (rather than hardware encoding) is likely the preferred modality.

The exceptions, IMO, are concurrent gaming with streaming (niche on macOS?) and video server transcoding. However, even these exceptions are tenuous: because Apple Silicon doesn't play x86's logical core / boost clock games, and considering the huge multi-threaded performance of M4, I think SW encoding of AV1 is quite feasible (for single streams) for both streaming and transcoding. x86 needs a dedicated AV1 encoder more so due to the single-threaded perf hit from running a multi-threaded background workload. And the bit-rate efficiency will be much better from SW encoding.

That said, latency will suffer and I would still appreciate a HW AV1 encoder.

[0] https://en.wikipedia.org/wiki/Apple_M4

[1] https://www.phoronix.com/news/SVT-AV1-1.8-Released

[2] https://www.phoronix.com/news/Intel-SVT-AV1-2.0


I've been doing that with a Minisforum bd790i[0] -- a Ryzen 9 7945HX, which is a monster of a CPU (16 Zen 4 cores, with very high single-threaded and multi-threaded benchmarks[1]) with a paltry TDP. Minisforum even coined the term "Mobile on Desktop (MoDT)". It's a great platform that is utterly cheap for what you get (mobo w/ PCIe gen5 + CPU + heatsink for less than the price of an individual comparable CPU). Note: this CPU is usually in a laptop with a relatively underpowered cooling solution compared to the linked motherboard with a proper heatsink+fan. Hence the benchmarks have a very wide spread due to varying (usually laptop-based) cooling.

Pertinent to the conversation though, the BIOS is very much lacking, and supposedly software-based fan control is not implemented. That said, running the fans constantly at ~silent levels of rotation keeps the temps cool (you can even run the heatsink without a fan if you want).

[0] https://store.minisforum.com/products/minisforum-bd770i?vari...

[1] https://www.cpu-monkey.com/en/cpu-amd_ryzen_9_7945hx


https://www.llamaindex.ai/ is much better IMO, but it's definitely a case of boilerplate-y, well-supported incumbent vs smaller, better, less supported (e.g. Java vs Python in the 00s or something like that). Depends on your team and your needs.

Also Autogen seems popular and well-ish liked https://microsoft.github.io/autogen/

LangChain definitely has the most market-/mind- share. For example, GCP has a blog post on supporting it: https://cloud.google.com/blog/products/ai-machine-learning/d...

