I am running this on a 16 GB AMD Radeon 7900 GRE in a machine with 64 GB of RAM, using ROCm and llama.cpp on Windows 11. I can use Open WebUI or the native GUI for the interface. It is made available via an internal IP to all members of my household.
It runs at around 26 tokens/sec at FP16; FP8 is not supported by the Radeon 7900 GRE.
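Since llama.cpp's llama-server exposes an OpenAI-compatible HTTP API, sharing it over an internal IP just means pointing clients at the server's LAN address. A minimal stdlib-only sketch of building such a request; the IP, port, and temperature here are assumptions, not values from my setup:

```python
import json
import urllib.request

# Assumed LAN address of the llama-server instance -- substitute
# your machine's internal IP and the port you started the server on.
SERVER = "http://192.168.1.50:8080"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build a request for llama-server's OpenAI-compatible endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{SERVER}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# To actually query the server from any machine on the LAN:
# resp = urllib.request.urlopen(build_chat_request("Hello"))
# print(json.load(resp)["choices"][0]["message"]["content"])
```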
I just love it.
For coding, QwQ 32B is still king. But with a 16 GB VRAM card it gives me ~3 tokens/sec, which is unusable.
I tried to get Gemma 3 to write a PowerShell script with a terminal GUI interface, and it ran into dead ends and finally gave up. QwQ 32B performed a lot better.
But for most general purposes it is great. My kid has been feeding it his school textbooks and asking it questions. It is better than anything else currently.
Somehow it is more "uptight" than Llama or the Chinese models like Qwen. I can't put my finger on it, but the Chinese models seem nicer and more talkative.
I have a 7900 GRE, which is the same card except with less memory. I run Gemma 3, Llama 3.1, the QwQ models, and the DeepSeek distilled models using llama.cpp. They run fine; I especially like the new Gemma 3 27B Q6 (a 20 GB model), on which I get 2 tok/s.
I have also run Hunyuan3D-2 and generated 3D models. You have to separate out the model-generation and texture-generation phases, but it works.
I run ComfyUI and bootleg GGUF models. This is all on Windows. Now even WSL2 works, so I am using Ubuntu 24.04 on Windows 11 to run Hunyuan3D-2.
For LLMs, llama.cpp native binaries are available. Everything just works out of the box.
I am impressed; it runs very fast, far faster than the non-turbo version. But most of the time is spent on texture generation, not model generation, and as far as I can tell this speeds up only the model-generation phase. Impressive nonetheless.
I also took a head shot of my kid, ran it through https://www.adobe.com/express/feature/ai/image/remove-backgr..., cropped the image, and resized it to 1024x1024, and it spit out a textured 3D model of my kid. There are still some small artifacts, but I am impressed. It works very well with the assets/example_images. Very usable.
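The crop-and-resize step can be scripted with Pillow instead of doing it by hand. A minimal sketch, assuming the background-removed image is already saved locally; the function name is mine, not part of any Hunyuan3D-2 tooling:

```python
from PIL import Image

def prep_input(img: Image.Image, size: int = 1024) -> Image.Image:
    """Center-crop to a square, then resize to size x size.
    Cropping to a square first avoids distorting the subject."""
    w, h = img.size
    side = min(w, h)
    left = (w - side) // 2
    top = (h - side) // 2
    square = img.crop((left, top, left + side, top + side))
    return square.resize((size, size), Image.LANCZOS)

# Usage (hypothetical filenames):
# img = Image.open("headshot_no_bg.png").convert("RGBA")
# prep_input(img).save("headshot_1024.png")
```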
I don't think so. FIR filters can be unrolled and parallelized over the data, so they can definitely be run on a GPU to great effect. But IIR filters depend on the output of the prior time step, so you can't unroll anything; those would probably be faster to simulate on the CPU.
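A NumPy sketch of the distinction, with made-up coefficients: the FIR output is a convolution over the input alone (every sample computable independently, hence GPU-friendly), while the IIR output is a serial recurrence on its own previous value:

```python
import numpy as np

b = np.array([0.25, 0.5, 0.25])  # FIR feed-forward taps (illustrative)
a = 0.9                          # IIR feedback coefficient (illustrative)

x = np.random.default_rng(0).standard_normal(10_000)

# FIR: y[n] = b[0]*x[n] + b[1]*x[n-1] + b[2]*x[n-2].
# Each output depends only on inputs, so the whole convolution
# can be evaluated in parallel across n.
y_fir = np.convolve(x, b, mode="full")[: len(x)]

# IIR: y[n] = x[n] + a*y[n-1].
# Each step needs the previous output, so the loop is inherently
# sequential -- nothing to unroll across the data.
y_iir = np.empty_like(x)
acc = 0.0
for n in range(len(x)):
    acc = x[n] + a * acc
    y_iir[n] = acc
```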
I think a Valsalva maneuver keeps intra-thoracic pressure high by closing the airway a bit lower down? At least that is how I do it; it feels like the pressure is bottled up at about where my larynx sits.
You could use Signal for messaging. But why not SMS? It is not like text messaging is owned by anyone, the way WhatsApp or Telegram are.
I found that my only requirement for a smartphone is the Uber and Lyft apps. If I could somehow get free of them, I could switch to a Nokia 1100 and be happy.
> Also there are times when I ask a specific question like "What does such and such a library function return?" and they'll say "two strings" but don't know if it's a list or a tuple, or if it ever returns None, or whether it raises an Error etc.
I am curious to understand why you need to do this step? Is it possible to have a more hands off approach?
Give them a task and a deadline and then let them come back with the work. If they have a problem, they can come to you; but if they have not done their homework, they need to solve it themselves. Is it possible this is too much involvement in their work?
> So at that point I'll open up the library documentation and read it with them. But they will never on their own initiative open the docs.
How about telling them that they need to look it up (either in front of you, or after returning to their station)? Would that work for you?
> I'm struggling a bit to collaborate with them because I have a strong bias towards reading (docs or books) to understand and they seem to have the exact opposite. They seem almost sad when I send them a link to raw information.
Maybe they associate this, and what follows, with a negative experience? Is it possible to take a more hands-off approach? Sometimes it helps to appear dumb to the person we are mentoring. Let them figure out the steps that are figure-out-able, like reading API docs. Only help with steps that really need your expertise and experience, not just effort.
I think this is just how the current generation thinks. I am mentoring some interns, and my own kid thinks looking up the solution on YouTube is the right thing to do. They in fact search in the YouTube app on their phone first, and only then go to Google.