Hacker News | Kayou's comments

Wait, the Q4 quantization, which is more than 20GB, fits in your 16GB GPU? I didn't know that was possible; I was always restricting myself to models smaller than the VRAM I had.


Yep. These Mixture of Experts models are well suited for paging in only the relevant data for a certain task https://huggingface.co/blog/moe

There are some experiments with just removing or merging experts post-training to shrink models even more https://bknyaz.github.io/blog/2026/moe/


MoE is not suited for paging because it’s essentially a random expert per token. It only improves throughput because you reduce the memory bandwidth requirements for generating a token since 1/n of the weights are accessed per token (but a different 1/n on each loop).

Now shrinking them sure, but I’ve seen nothing that indicates you can just page weights in and out without cratering your performance like you would with a non MoE model


Not entirely true: it’s random access within the relevant subset of experts, and since concepts are clustered you have a much higher probability of repeatedly hitting the same subset of experts.


It’s called mixture of experts but it’s not that concepts map cleanly or even roughly to different experts. Otherwise you wouldn’t get a new expert on every token. You have to remember these were designed to improve throughput in cloud deployments where different GPUs load an expert. There you precisely want tokens spread across experts at random to improve your GPU utilization rate. I have not heard of anyone training local MoE models to aid sharding.


is there anywhere good to read/follow to get operational clarity on this stuff?

my current system of looking for 1 in 1000 posts on HN or 1 in 100 on r/LocalLLaMA is tedious.


Ask any of the models to explain this to you


That blog post was super interesting. It is neat that he can select experts and control the routing in the model—not having played with the models in detail, I tended to assume the “mixing” in mixture of experts was more like a blender, haha. The models are still quite lumpy I guess!


llama.cpp is designed for partial offloading: the most important parts of the model are loaded onto the GPU and the rest into system RAM. I run 500B+ models such as DeepSeek/KimiK2.5/GLM-5 without having that much GPU VRAM.


How much do you use?

I have lots of trouble figuring out what the limits are of a system with x amount of VRAM and y amount of RAM. How do you determine this?


Ideally you'd have (parameter count) * (bits per parameter) VRAM for the entire (presumably quantized, don't forget to account for that) model. So very approximately 16 GiB for a 34B model quantized to 4 bits per parameter.

You can spill to RAM in which case you at least want enough for a single active expert but really that's going to tank performance. If you're only "a bit" short of the full model the difference might not be all that large.

These things are memory bandwidth limited so if you check out RAM, VRAM, and PCIe bandwidth what I wrote above should make sense.

Also you should just ask your friendly local LLM these sorts of questions.
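
A rough sketch of the sizing rule above, in Python. The helper names are made up for illustration, and the estimate covers weights only: KV cache, activations, and runtime overhead are not included, so treat the result as a lower bound.

```python
def model_size_gib(params_billions: float, bits_per_param: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 2**30


def spill_to_ram_gib(size_gib: float, vram_gib: float) -> float:
    """How much of the model would not fit in VRAM (0 if it all fits)."""
    return max(0.0, size_gib - vram_gib)


# The example from the comment: a 34B model at 4 bits per parameter.
size = model_size_gib(34, 4)
print(f"weights: {size:.1f} GiB, spill on a 16 GiB card: {spill_to_ram_gib(size, 16):.1f} GiB")
```

Anything reported as spill would have to be served from system RAM, which is where the bandwidth penalty described above comes in.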


I usually do ask the llm what parameters to use. But that’s why I know so little about parameters!


This is why they say "A3B" meaning only 3B is active at a time, limiting VRAM usage.


The A3B part in the name stands for `Active 3B`, so for the inference jobs a core 3B is used in conjunction with another subpart of the model, based on the task (MoE, mixture of experts). If you use these models mostly for related/similar tasks, that means you can make do with a lot less than the 35B params in active RAM. These models are therefore also sometimes called sparse models.


If you're going to use a lib like pLimit to limit the concurrency of a map, just use pMap which accepts an argument to limit the concurrency and makes it more readable and straightforward

https://github.com/sindresorhus/p-map?tab=readme-ov-file#usa...


I have thrown people under the bus while under pressure from my manager to explain why reviewing a Pull Request takes me several hours or days while other members did it in a few minutes. It was the wrong thing to do but the pressure was unbearable.


I appreciate your honesty, that's not easy to admit even on anon forums. You seem to evaluate it correctly. Just one thing - if work is that shit, e.g. because of a manager, just leave; you don't owe them anything. First try to talk to the manager about what isn't working and what could be improved to bring out your potential, but in any case the search for another job should run in parallel.

I've seen so many brilliant people stuck in jobs they didn't like or even hated, when it would have been trivial for them to stand up and go next door. Yet they didn't. Don't be that guy.


Combine the current job market with, say, a work visa tied to the employer, and it turns out "just" in "just leave" does a lot of heavy lifting...


Sure, but so does the word "unbearable." If your workplace is unbearable, you will probably need to leave sooner or later - either because you found a way out or because you burned out. Whatever GGP did might've saved their hide, but it probably didn't result in a "bearable" level of stress (indeed, they said it was "wrong", so it probably gnawed at them). Stress has a way of diminishing your capacity when you need it most. Better to act sooner than later.


In that case what prevents us from building energy storage facilities that produce hydrogen at night and release the electricity back into the grid during the day? Wouldn't it be more efficient than transporting the hydrogen by truck and transporting the hydrogen battery on every train?


Yes. That would be a better idea.


but it would still be very much more expensive than just solar


Solar isn't even competing against Hydrogen.

Hydrogen is a storage technology, not really an energy source. Hydrogen competes with Li-ion batteries.

How many Li-ion batteries do you need to equal one 200-ton liquid H2 storage tank? At 120MJ per kg, 200 tons == 200,000 kg == about 6.7GW-hrs of electricity. There's no Li-ion battery in the world that's anywhere close to that kind of storage capacity... and various researchers are aiming at 3000-ton H2 storage tanks.

Erase 50% or even 80% of the energy due to inefficiency / costs of cryogenics, and you still have a bigger energy storage tank from H2 than anything possible with Li-ion. And the future of H2 is looking like 10x capacities are reasonable over the next 10 years.

-------

Whatever solar solution you were thinking, just do the same except instead of using big, expensive, heavy Li-ion batteries, replace the storage mechanism with H2 fuel cells.


Higher energy density = higher danger of releasing it unintentionally.

I totally understand why hydrogen is a great rocket fuel. You need this high energy content and the extreme lightness for high Isp.

On land, I suppose, the energy density of 100 tons of hydrogen in one place is unnecessarily high. The fact that hydrogen has no odor, and its flame is entirely infrared, invisible, does not help.

By the same token, I think that lithium batteries have excessive energy density for large-scale land applications, like buffer storage for solar / wind power. Even a lead-acid battery, with all its environmental downsides, weight, etc is at least not a major fire hazard. I suppose that large-scale electricity storage will take off when cheaper and safer, while less dense, alternatives to lithium batteries are commercialized.

As a power source for a car, a lithium battery at least is not cryogenic. OTOH on the scale of a train this may already be not a big problem. Same possibly for an oceangoing ship, but it would be terrible to start losing fuel and power if a bad storm damages the cryogenic system.

An ideal (fantastic) system could use methane and turn it into carbon, only oxidizing the hydrogen. Sadly, similar reactions only work so far with much more complex molecules.


Lithium batteries are extremely flammable. Worse, fires are extremely hard to extinguish. For trains this would be a truly huge fire.

Converting methane into hydrogen is known as steam reforming. We can easily do this, but we don't want to because it is a fossil fuel.


> The fact that hydrogen has no odor, and its flame is entirely infrared, invisible, does not help.

I thought it was UV? (Which, if anything, is even worse).


> In that case what prevents us from building energy storage facilities that produce hydrogen at night and release the electricity back into the grid during the day?

That's literally the plan?

https://www.nrel.gov/news/program/2020/answer-to-energy-stor...

Hydrogen is a newer technology than Li-ion. But yeah, it's more than capable of these things. We just gotta build out pipelines and facilities to handle it.

But no. To deliver MWs of electricity to trains requires using a ton of copper on all rail-lines, as well as advanced transistors to switch that electricity around. Hydrogen storage of electricity is a good idea and is being developed, but there are innate benefits to the fuel-methodology for applications like trains.

In particular, a pipeline will likely transmit more "energy" at cheaper costs than a bundle of copper wires. Steel and concrete pipelines are just cheaper. So instead of building expensive copper wires + expensive transistors to switch electricity all around the place, why not build pipelines?

> Wouldn't it be more efficient than transporting the hydrogen by truck and transporting the hydrogen battery on every train?

Well, first off, the most efficient form of transporting Hydrogen would be a pipeline. But let's say we're dealing with a remote area so a truck is necessary.

1kg of Hydrogen has 120MJ of energy, or 0.033 MW-hrs of electricity. A singular kilogram.

https://www.energy.gov/eere/fuelcells/hydrogen-tube-trailers

These trailers can store 900kg of Hydrogen. Or in other words, 30 Megawatt-hours of Hydrogen based electricity. Larger vehicles, like trains, can likely afford to carry larger containers, possibly even cryogenic liquid-hydrogen that is even more compact.

https://demaco-cryogenics.com/blog/liquid-hydrogen-storage/

Current storage tanks from NASA can hold 270 tons of liquid Hydrogen, with plans to scale to 3000 tons of liquid hydrogen storage. That's 3000000 kg, or 100 Gigawatt-hours of energy storage per tank.

--------

So yeah, the amount of H2 energy storage available far exceeds what is possible with Li-ion technology.

Pipelines will be more efficient at moving Gigajoules / dozens of MW-hours at cheap costs. Trucks and trains can carry the fuel wherever they need to go. Energy storage / Hydrogen batteries will scale to far higher capacities than Li-ion could ever dream of.

--------

Hydrogen is an incredibly light fuel. Its difficulty in transportation is __volume__ rather than its weight. Storage technologies, such as higher pressure (700-bar or higher), and liquid cryogenics are needed for H2 storage to be effective. These technologies are just becoming possible today.

So only now can we dream of what liquid-hydrogen storage tanks can offer us. Literally 100GW-hrs of energy per liquid-hydrogen tank is feasible (while *current* prototypes from NASA are holding 9GW-hrs of energy storage).
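
The conversions in this comment can be checked with a few lines of Python. This is only a sketch: it uses the 120 MJ/kg figure quoted above, the function name is made up, and it counts the energy content of the fuel, before any fuel-cell or cryogenic losses.

```python
H2_MJ_PER_KG = 120.0   # energy content of hydrogen, as quoted in the comment
MJ_PER_MWH = 3600.0    # 1 MWh = 3600 MJ


def h2_energy_mwh(mass_kg: float) -> float:
    """Energy content of a mass of hydrogen, in megawatt-hours."""
    return mass_kg * H2_MJ_PER_KG / MJ_PER_MWH


for label, mass_kg in [
    ("1 kg", 1),
    ("900 kg tube trailer", 900),
    ("270 t tank", 270_000),
    ("3000 t tank", 3_000_000),
]:
    print(f"{label}: {h2_energy_mwh(mass_kg):,.3f} MWh")
```

With these inputs the trailer comes out at 30 MWh, the 270-ton tank at 9 GWh, and the 3000-ton tank at 100 GWh, matching the figures above.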


Even if diesel fuel has only about 3/8 of the energy per mass of hydrogen, it has much higher energy per volume than hydrogen.

For a locomotive of a train, the volume of storage is a much more important limitation than the mass.

Moreover, after adding the mass of the fuel containers, it is likely that diesel fuel has also a greater energy per mass than hydrogen.

Hydrocarbon fuel can also be transported by pipelines.

So none of these arguments show any advantage of hydrogen versus the traditional diesel fuel.
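
To put rough numbers on the mass versus volume point, here is a small Python sketch. The 120 MJ/kg figure for hydrogen comes from the thread and the diesel value is 3/8 of that; the densities (about 0.84 kg/L for diesel, about 0.071 kg/L for liquid hydrogen) are my own approximate assumptions, and tank mass and insulation are ignored.

```python
# Energy per mass (MJ/kg). Diesel is taken as 3/8 of hydrogen, per the comment.
H2_MJ_PER_KG = 120.0
DIESEL_MJ_PER_KG = H2_MJ_PER_KG * 3 / 8   # 45 MJ/kg

# Approximate densities (kg/L). These are assumptions, not from the thread.
LIQUID_H2_KG_PER_L = 0.071
DIESEL_KG_PER_L = 0.84


def mj_per_litre(mj_per_kg: float, kg_per_litre: float) -> float:
    """Energy per volume, given energy per mass and density."""
    return mj_per_kg * kg_per_litre


diesel = mj_per_litre(DIESEL_MJ_PER_KG, DIESEL_KG_PER_L)
liquid_h2 = mj_per_litre(H2_MJ_PER_KG, LIQUID_H2_KG_PER_L)
print(f"diesel: {diesel:.1f} MJ/L, liquid H2: {liquid_h2:.1f} MJ/L, ratio: {diesel / liquid_h2:.1f}x")
```

Under these assumptions diesel holds several times more energy per litre than liquid hydrogen, even though it holds less per kilogram.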


The idea is to create a green fuel of the future, powered by Solar Panels and Nuclear energy, and wind and hydro.

-------

I do think that green-hydrocarbons have a potential future. Green diesel would be a biofuel. But Hydrogen can turn into a hydrocarbon through Syngas synthesis (eventually turning into Kerosene and other "green hydrocarbon" fuels).

If the chemistry works out, maybe that's the future. But experimentation with pure H2 looks promising right now.


I believe that as you say, converting the hydrogen obtained by electrolysis or photolysis into hydrocarbons via syngas is the most likely future solution for all energy storage problems where batteries are inappropriate.

Even if the energy efficiency of a storage cycle is lower than when using the hydrogen directly, the savings due to easier handling and storage are huge.

Moreover, that path will allow the reuse of all the existing infrastructure for hydrocarbons, whose replacement would require a very long time and very high costs.


H2 can be transported over natural gas pipelines. At least as high as 15% hydrogen / 85% Nat Gas.

Maybe that's not enough for the long term, but that mix/ratio should be sufficient to bootstrap the fledgling H2 industry.

If Syngas / synthetic diesel becomes more efficient in the future, we switch to that. I'm not against experimentation or tests.


Thanks for the article on energy storage with hydrogen to resolve peaks on the grid. Seems like a nice solution on paper, although inefficient: this article cites someone from MIT giving a round-trip efficiency of 18%-46%.

https://www.spglobal.com/marketintelligence/en/news-insights...


Considering that night-time energy from nuclear plants, as well as peak noon energy from solar power plants, is going to waste right now, 18% round-trip efficiency is better than the 0% we get from wasting that energy entirely.


But isn't it liquid hydrogen leaks, and the fire hazard they pose, that are delaying the NASA launch right now?


Same on macOS and Firefox with a 4K screen, and the same on Firefox on a Pixel 6: scrolling lags hard on the section with pictures. Making a bad first impression of this new OS.


> Making a bad first impression of this new OS.

This is more than a little ridiculous and you know it.


What lens do you use to take your pictures? You might want to zoom in a bit to make it look the way you see it, at least a 100mm equivalent I would say.


I think this is what separates a good photographer from... me.

I've tried various zoom levels before—and aspect ratios, focal lengths, etc—but I can never capture in an image what I'm seeing with my eyes. Either the enormous mountain is a tiny feature off in the distance, or it fills the frame and all context is lost. I can't seem to find a framing that communicates both the grandness of the subject, and the larger context it's situated in.

Obviously a 2D, cropped image of the landscape is going to have to lose information compared to my 3D, panoramic view of it. But I also know I've seen good photos of these types of things. What are those photographers doing to capture that?


It may help to know that it isn't easy to do.

Two things that might help your images say what you'd like them to say:

1) For depth, try making images that have a "foreground, middle-ground, and background". The 24-28mm-equivalent lenses on smartphones are a perfect training ground for this kind of composition, as it is easier to select foreground elements.

2) Dodging and burning: The human eye is drawn to bright parts of an image. Gently darkening things that are less-important and gently highlighting things (and paths) that are more important can have a huge impact on the perception of an image. The Snapseed app, again on a smartphone, offers a very-intuitive interface (look for the "brush" tool) for learning to dodge and burn.


> I can never capture in an image what I'm seeing with my eyes. Either the enormous mountain is a tiny feature off in the distance, or it fills the frame and all context is lost

Try adding a sense of depth by having a foreground, middle and back.

Look at good landscape photos of mountains or other large features and you’ll see they almost always do this. By having near, mid and far elements of interest you add a sense of scale to the photo.


Images are often stacked too to achieve proper focus throughout the picture. A lot of photos you see aren't physically possible to get in 1 shot.

https://photographylife.com/landscapes/focus-stacking-tutori...


Back when SLR cameras were newish, they came with 55mm lenses. That seemed to match what one eye sees. (You could look through the camera and open the other eye and it seemed to work.) I would have thought wider, as you can see more than the 50mm lens shows you.

https://en.m.wikipedia.org/wiki/Normal_lens#The_problem

But 85mm soon became my most used lens. I had a 135mm too, but it always seemed too long (or not long enough).


100mm is far, far tighter than what the human eye sees. You need to be closer to 40-50mm on a full frame.
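
For anyone who wants numbers, the diagonal angle of view follows from the usual thin-lens approximation, AOV = 2 * atan(d / (2 * f)), with d the sensor diagonal and f the focal length. A quick Python sketch, assuming a 36 x 24 mm full-frame sensor and focus at infinity (the function name is just for illustration):

```python
import math

# Full-frame sensor dimensions in mm (36 x 24), diagonal is about 43.3 mm.
SENSOR_DIAGONAL_MM = math.hypot(36.0, 24.0)


def diagonal_angle_of_view_deg(focal_length_mm: float) -> float:
    """Diagonal angle of view in degrees for a lens focused at infinity."""
    return math.degrees(2 * math.atan(SENSOR_DIAGONAL_MM / (2 * focal_length_mm)))


for focal in (24, 50, 100):
    print(f"{focal}mm: {diagonal_angle_of_view_deg(focal):.1f} degrees")
```

A 100mm lens covers roughly half the diagonal angle of a 50mm lens, which is why it looks so much tighter.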


Yeah, back in the day I would always be afraid to fry my device when plugging in a third-party charger, because of the polarity or voltage. Now the worst thing that can happen is... nothing. For me that's an improvement.


Inner city Paris is limited to 30km/h, I don't think going 25 km/h would be a problem.


Sure enough for the power consumption of the modems, but I don't think the 5G networks are solar powered in Australia. Using multiple networks at the same time will use more power than using only one.


Does a 5G/4G tower consume more energy when one extra device is connected to it (for redundancy), but not actually transferring data? The answer might be yes, but it'd be very marginal.


Yeah, and I think it really shows that WhatsApp, which wasn't originally developed by Facebook, doesn't benefit from the compression. The app seems to have been very optimized originally, but maybe I'm wrong.


Also see the stark contrast with their Facebook Lite app. Makes you wonder what difference all those megabytes in the full app actually make...


Append-only development

