
What I don't get is why people gravitate toward trying to show off how many symbols they're able to manipulate in their brain without screwing something up.

It's a computer. It does what it was instructed to do, all 50 million or so of them. To think you as a puny human have complete and utter mastery over it is pure folly every single time.

As time goes on I become more convinced that the way to make progress in computing and software is not with better languages. Sure, those are very much appreciated, since language has a strong impact on how you even think about problems. But it's more about tooling, and about how we can add abstractions to software that leverage the computer we already have, to alleviate the eye-gouging complexity of trying to manage it all by predicting how it will behave with our pitiful neuron sacs.


Don't forget about naming variables like it's a punchcard and every character matters.

Well he DOES have a threadripper now.

And I'm sure the one thread used by the Rust build will be blazing fast.

Slightly off topic, but I would like to better understand why there is so much handwringing about cryptosystems being broken by future quantum computers. Don't we already have quantum-resistant cryptosystems? Why not just switch to them across the board?

> WGPU doesn't support multiple threads updating GPU memory without interference, which Vulkan supports.

This is really helpful to learn about; it's a key thing I want to get right for a good experience. I really hope WGPU can find a way to add something for this as an extension.


Do you know if these things I found offer any hope for being able to continue rendering a scene smoothly while we handle GPU memory management operations on worker threads?

https://gfx-rs.github.io/2023/11/24/arcanization.html

https://github.com/gfx-rs/wgpu/issues/5322


The actual issue is not CPU-side. The issue is GPU-side.

The CPU feeds commands (CommandBuffers) telling the GPU what to do over a Queue.

WebGPU/wgpu/dawn only have a single general purpose queue. Meaning any data upload commands (copyBufferToBuffer) you send on the queue block rendering commands from starting.

The solution is multiple queues. Modern GPUs have a dedicated transfer/copy queue separate from the main general purpose queue.

WebGPU/wgpu/dawn would need to add support for additional queues: https://github.com/gpuweb/gpuweb/issues?q=is%3Aopen+is%3Aiss...

There's also ReBAR/SMA, and unified memory (UMA) platforms to consider, but that gets even more complex.
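
To make that concrete, here's roughly what the single-queue funnel looks like from the wgpu (Rust) side today, as a rough sketch with made-up buffer and encoder names (wgpu API of the ~0.19 era assumed):

    // Rough sketch: request_device hands you exactly one Queue, and both the
    // copy traffic and the render submission go through it.
    async fn run() -> Result<(), Box<dyn std::error::Error>> {
        let instance = wgpu::Instance::default();
        let adapter = instance
            .request_adapter(&wgpu::RequestAdapterOptions::default())
            .await
            .ok_or("no adapter")?;
        let (device, queue) = adapter
            .request_device(&wgpu::DeviceDescriptor::default(), None)
            .await?;

        let vertex_buf = device.create_buffer(&wgpu::BufferDescriptor {
            label: Some("mesh"),
            size: 1024,
            usage: wgpu::BufferUsages::VERTEX | wgpu::BufferUsages::COPY_DST,
            mapped_at_creation: false,
        });
        let mesh_bytes = vec![0u8; 1024];
        let encoder =
            device.create_command_encoder(&wgpu::CommandEncoderDescriptor::default());
        // (render passes would be recorded into `encoder` here)

        // Asset upload and frame submission contend for the same hardware queue:
        queue.write_buffer(&vertex_buf, 0, &mesh_bytes); // copy work
        queue.submit(Some(encoder.finish()));            // render work
        Ok(())
    }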


> The solution is multiple queues. Modern GPUs have a dedicated transfer/copy queue separate from the main general purpose queue.

Yes. This is the big performance advantage of Vulkan over OpenGL. You can get the bulk copying of textures and meshes out of the render thread. So asset loading can be done concurrently with rendering.

None of this matters until you're rendering something really big. Then it dominates the problem.
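
Finding that dedicated copy queue is basically scanning the queue families for one that advertises TRANSFER but not GRAPHICS. A rough, untested sketch with the ash bindings (the function name is mine):

    use ash::vk;

    /// Returns the index of a transfer-only queue family, if the device has one.
    fn find_transfer_queue_family(
        instance: &ash::Instance,
        physical_device: vk::PhysicalDevice,
    ) -> Option<u32> {
        let families =
            unsafe { instance.get_physical_device_queue_family_properties(physical_device) };
        families.iter().enumerate().find_map(|(i, props)| {
            let dedicated = props.queue_flags.contains(vk::QueueFlags::TRANSFER)
                && !props.queue_flags.contains(vk::QueueFlags::GRAPHICS);
            dedicated.then_some(i as u32)
        })
    }

You then create the device with an extra vk::DeviceQueueCreateInfo for that family and submit your staging copies on the second queue, fenced/semaphored against the frame that first uses the data.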


I believe you can load texture data onto the GPU from another thread in OpenGL using pixel buffer objects: https://www.khronos.org/opengl/wiki/Pixel_Buffer_Object

I haven't tried it yet, but will try soon for my open-source metaverse Substrata: https://substrata.info/.
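
The basic shape of the PBO path would be roughly this (untested sketch with the Rust gl bindings; the calls are the same from C++):

    /// Untested sketch: kick off a texture upload that sources from a PBO.
    /// `pbo` and `tex` are existing GL object ids; `pixels` is tightly packed RGBA8.
    unsafe fn upload_via_pbo(pbo: u32, tex: u32, width: i32, height: i32, pixels: &[u8]) {
        gl::BindBuffer(gl::PIXEL_UNPACK_BUFFER, pbo);
        gl::BufferData(
            gl::PIXEL_UNPACK_BUFFER,
            pixels.len() as isize,
            pixels.as_ptr() as *const _,
            gl::STREAM_DRAW,
        );
        gl::BindTexture(gl::TEXTURE_2D, tex);
        // The last argument is an offset into the bound PBO, not a client pointer,
        // so the call can return before the copy actually finishes.
        gl::TexSubImage2D(
            gl::TEXTURE_2D, 0, 0, 0, width, height,
            gl::RGBA, gl::UNSIGNED_BYTE, std::ptr::null(),
        );
        gl::BindBuffer(gl::PIXEL_UNPACK_BUFFER, 0);
    }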


It is possible but managing asynchronous transfers in OpenGL is quite tricky.

You either need to use OpenGL sync objects very carefully or accept the risk of unintended GPU stalls.


Yeah you need to make sure the upload has completed before you try and use the texture, right?


Yes, and you need to make sure that the upload has completed before you reuse the pixel buffer too.

And the synchronization API isn't very awesome; it can only wait for all operations up to a certain point to complete. You can't easily track individual transfers.
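
In practice that means one fence per transfer (or per batch of transfers), polled before you touch the texture or recycle the buffer. Rough sketch with the Rust gl bindings, untested:

    // Right after issuing the PBO upload commands:
    let fence = unsafe { gl::FenceSync(gl::SYNC_GPU_COMMANDS_COMPLETE, 0) };

    // Later, before reusing the pixel buffer or sampling the texture
    // (timeout 0 = just poll, don't block):
    let status = unsafe { gl::ClientWaitSync(fence, 0, 0) };
    if status == gl::ALREADY_SIGNALED || status == gl::CONDITION_SATISFIED {
        unsafe { gl::DeleteSync(fence) };
        // safe to reuse the PBO / use the texture now
    }

The catch is exactly what you said: the fence covers everything submitted before it, so if you want per-transfer granularity you end up juggling a fence per upload.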


Thank you. I hope to see progress in these areas when I revisit this later. I was hoping to be able to go all in on wgpu, but if there are still legitimate reasons like this one to build a native app, then so be it.


It depends on your requirements and experience level. Using WebGPU is _much_ easier than Vulkan, so if you don't have a lot of prior experience with all of computer graphics theory / graphics APIs / engine design, I would definitely start with WebGPU. You can still get very far with it, and it's way easier.


Short version: hope, yes. Obtain now, no.

Long version: https://github.com/gfx-rs/wgpu/discussions/5525

There's a lock stall around resource allocation. The asset-loading threads can stall out the rendering thread. I can see this in Tracy profiling, but don't fully understand the underlying problem. It looks like one of three locks in WGPU, and I'm going to have to build WGPU with more profiling scopes to narrow the problem.
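
(If I remember right, wgpu routes its instrumentation through the `profiling` crate, so adding scopes is mostly sprinkling the macro around the suspect locks and building with the Tracy backend feature enabled. The pattern, as a standalone sketch with made-up names:)

    // Cargo.toml (assumed): profiling = { version = "1", features = ["profile-with-tracy"] }
    use std::sync::Mutex;

    fn register_resource(registry: &Mutex<Vec<u64>>, id: u64) {
        let mut guard = {
            // This zone covers only the time spent waiting to acquire the lock,
            // so contention shows up as its own span in the Tracy trace.
            profiling::scope!("registry lock wait");
            registry.lock().unwrap()
        };
        guard.push(id);
    }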

Very few people have gotten far enough with 3D graphics in Rust to need this. Most Rust 3D graphics projects are like the OP's here - load up a mostly static scene and do a little with it. If you load all the content before displaying, and don't do much dynamic modification beyond moving items around, most of the hard problems can be bypassed. You can move stuff just by changing its transform - that's cheap. So you can do a pretty good small-world game without hitting these problems. Scale up to a big world that won't all fit in the GPU at once, and things get complicated.

I'm glad to hear from someone else who's trying to push on this. Write me at "nagle@animats.com", please.

For a sense of what I'm doing: https://video.hardlimit.com/w/7usCE3v2RrWK6nuoSr4NHJ


What I do currently is just limit the amount of data uploaded per frame. Not ideal but works.


That works better in game dev where you have control over the content. Metaverse dev is like writing a web browser - some people are going to create excessively bulky assets, and you have to do something reasonable with them.


It works with large assets too. Just split the upload into chunks.
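
Rough sketch of what that looks like with wgpu's write_buffer: a fixed per-frame byte budget (the number here is made up) and a cursor carried across frames. Data is assumed padded to a multiple of 4 bytes to satisfy the copy alignment rules.

    const PER_FRAME_BUDGET: usize = 4 * 1024 * 1024; // 4 MiB/frame, tune to taste

    struct PendingUpload {
        dst: wgpu::Buffer,
        data: Vec<u8>, // assumed padded to a multiple of 4 bytes
        cursor: usize,
    }

    /// Call once per frame; returns true once the whole asset is on the GPU.
    fn tick_upload(queue: &wgpu::Queue, up: &mut PendingUpload) -> bool {
        if up.cursor == up.data.len() {
            return true;
        }
        let n = (up.data.len() - up.cursor).min(PER_FRAME_BUDGET);
        // write_buffer stages the copy; offsets and sizes stay 4-byte aligned.
        queue.write_buffer(&up.dst, up.cursor as u64, &up.data[up.cursor..up.cursor + n]);
        up.cursor += n;
        up.cursor == up.data.len()
    }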


Do you have any references? I thought all wgpu objects are wrapped with an Arc<Mutex<>>.


That sounds wild to me. In C++ I remember a time when I increased the frame rate of a particle renderer from 20fps to 60fps+ simply by going from passing shared_ptr (the Arc equivalent) to passing references.


Nevermind. Just an Arc<>.


I wonder how much less ergonomic it is getting there via Vulkan, for the ray tracing shaders.


Damn, that is pretty impressive. I just drive my 3D printer and run the UniFi controller interface with mine. Thanks for the inspiration.


I've been hoarding Apple devices too. I have a 12-inch MacBook that I recently brought up to Ventura. But the blasted thing won't hold a charge. The battery is fine, over 90% health, but something is causing power drain while it sleeps, even when I try to hibernate it.

It's impossible to find a use case for the thing. I might have to sell it.


I have 2x 3090. Do you know if it's feasible to use that 48GB total for running this?


Yes, it runs totally fine. I ran it in Oobabooga / text-generation-webui. The nice thing about it is that it auto-downloads all the necessary GPU binaries on its own and creates an isolated conda env. I asked the same questions on the official 70B demo and got the same answers. I even got better answers with ooba, since the demo cuts the text off early.

Oobabooga: https://github.com/oobabooga/text-generation-webui

Model: TheBloke_Llama-2-70B-chat-GPTQ from https://huggingface.co/TheBloke/Llama-2-70B-chat-GPTQ

ExLlama_HF loader, GPU split 20,22, context size 2048

On the Chat Settings tab, choose the Instruction template tab and pick Llama-v2 from the instruction template dropdown.

Demo: https://huggingface.co/blog/llama2#demo


Are there any specific settings needed to make 2x 3090s work together?


Not really? I just have the cards in separate PCIe slots and ExLlama_HF handles spreading the load internally. No NVLink bridge. I use the "20,22" memory split so that the display card has some room left for the framebuffer.


Do you mean you don't use NVLink or just use one that works? I am under the impression it is being phased out ("PCIe 5 is fast enough") and some kits don't use it.


I don't use NVLink


Interested in this too


I'm very curious what your other components are and how you managed to fit 2 3090s in one PC.


Ouch. I got this wrong, and for half an hour I was under the impression that GPT-4 had gotten it wrong too, then figured out, after reading it again when I got back from a walk, that this is one hell of a trick question. My brain automatically assumed that a man's widow is the man's dead wife, but I see that the correct way to interpret it is that the man is the one who is dead.

It's pretty awesome to realize that from now onward my computers are going to be able to help catch more and more of the holes that clearly exist in my cognition.


There should be an option to control and limit the severity of the flashing produced by the flash-light-for-notifications accessibility setting. I like to use it because I don't want to make my phone obnoxiously loud, because I can't hear its vibration, and I still want a chance to perceive notifications, but usually the light is too damn bright and I do worry it could trigger a seizure in innocent passersby. Having an alternative flashing behavior like a smooth pulsation would be excellent.

