the_harpia_io's comments | Hacker News

escaping bugs in llm-generated code are weirdly hard to catch on review because the logic looks fine - it's the edge cases that are off. had a similar (much less dramatic) thing with a cleanup script that worked fine on ci but went sideways on a dev machine with spaces in the path. nothing wiped but it was close enough that i started testing path handling separately.

the tricky part is the model isn't really "wrong" in any obvious sense. works on most inputs. it just doesn't know what your actual directory structure looks like.
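rough sketch of the failure mode i mean (paths are made up, and i'm using python's shlex just to illustrate - the actual script was shell):

```python
import shlex

# Naive command assembly: fine on CI where paths have no spaces.
def cleanup_cmd_naive(path):
    return "rm -rf " + path + "/tmp"

# Safer: quote the path so the shell sees it as a single argument.
def cleanup_cmd_quoted(path):
    return "rm -rf " + shlex.quote(path + "/tmp")

path = "/Users/dev/My Projects/build"
print(cleanup_cmd_naive(path))
# rm -rf /Users/dev/My Projects/build/tmp   <- shell sees TWO targets,
#                                              including /Users/dev/My
print(cleanup_cmd_quoted(path))
# rm -rf '/Users/dev/My Projects/build/tmp'
```

both versions pass any test where the fixture path has no spaces, which is exactly why it survives review.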


the scheduling is the tell. 17 commands every 30 min isn't analytics or crash reporting - that's systematic fingerprinting with a consistent cadence.

what's frustrating is this is basically invisible without running in a monitored environment. static analysis won't surface it. you'd need behavioral monitoring - network traffic plus syscall tracing - to even know it's happening.

seen similar patterns in CI/CD tooling actually. less blatant but same mechanism - process phoning home way more often than you'd expect, commands that look like routine system auditing. most devs assume third-party tools behave themselves.
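the cadence part is actually the easiest thing to detect if you do have monitoring. toy sketch (function name and numbers are mine, not from the article) - given timestamps of outbound connections from one process, flag near-constant inter-arrival gaps:

```python
from statistics import pstdev

def looks_like_beacon(timestamps, max_jitter=5.0):
    """Flag a suspiciously regular cadence: the gaps between
    events barely vary, unlike organic traffic."""
    if len(timestamps) < 3:
        return False
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return pstdev(gaps) <= max_jitter

# hypothetical capture: something phoning home every ~30 min
beacon = [0, 1801, 3599, 5400, 7202]
normal = [0, 42, 300, 1100, 1150]
print(looks_like_beacon(beacon))  # True
print(looks_like_beacon(normal))  # False
```

real tooling would obviously correlate this with syscall traces and destination hosts, but the regularity alone is a strong signal.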


mostly for the boring stuff - layout scaffolding, new component boilerplate, converting figma specs. works well enough there.

falls apart with anything stateful or that requires understanding the rest of the codebase. hallucinates imports, ignores how you've actually structured things. spent more time correcting context issues than i saved.

the part that bugs me more is the output looks clean but has subtle problems. it'll reach for innerHTML where it shouldn't, handle user input in ways that aren't production-safe. easy to miss in review when you're moving fast and the code looks confident


the copy paste concern is the most interesting bit honestly - even when it's not literally copy-pasting, AI error handling often looks correct but silently eats exceptions or returns wrong defaults. it gets the structure but misses what the code actually needs to do.
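concrete version of what i mean (python, names invented for illustration):

```python
import json

# Pattern AI code often produces: structurally "handled", semantically wrong.
def load_config_bad(raw):
    try:
        return json.loads(raw)
    except Exception:
        return {}  # caller can't tell "empty config" from "broken config"

# What the code actually needed: surface the malformed case.
def load_config(raw):
    try:
        return json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"config is not valid JSON: {e}") from e

print(load_config_bad("{not json"))  # {} -- the bug ships silently
```

the bad version passes review because try/except around json.loads *is* the right shape. it's the return value that's wrong for the domain.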

the boilerplate stuff is spot on though. the 10-type dispatch pattern is exactly where i gave up doing it manually


honestly the AMD-first bit surprised me - usually ROCm support is an afterthought or just broken outright.

curious about BVH traversal specifically. dynamic dispatch patterns across GPU backends can get weird fast. did KernelAbstractions hold up there or were there vendor-specific fallbacks needed for the heavier acceleration structure work?


Well I'm a bit of an AMD "fanboy" and really dislike NVIDIA's vendor lock-in. I'm not sure what you mean by dynamic dispatch across GPU backends - nothing should be dynamic there, and most of the simpler primitives map quite nicely between vendors (e.g. local memory, work groups etc).

To be honest, the BVH/TLAS has been pretty simple in comparison to the wavefront infrastructure. We haven't done anything fancy yet, but the performance is still really good. I'm sure there are still lots of things we can do to improve performance, but right now I've concentrated on getting something usable out.

Right now, we're mostly matching pbrt-v4 performance, but I couldn't compare to their NVIDIA-only GPU acceleration without an NVIDIA GPU. I can just say that the performance is MUCH better than what I initially aimed for, and it feels equally usable as some of the state-of-the-art renderers I've been using. A 1:1 comparison is still missing though, since it's not easy to do one without comparing apples to oranges (already mapping materials and light types from one renderer to another is not trivial).


pbrt-v4 parity is a solid baseline - that codebase already leans hard on NVIDIA so a fair comparison was always going to be messy. surprised wavefront was the harder bit though, i'd have expected BVH tuning to be the nightmare.


To be fair I was surprised too. But I made a relatively simple, straight port from the AMD rays SDK plus some input from the pbrt-v4 CPU BVH code, and it just worked relatively well out of the box... This is the main intersection function, which is quite simple: https://github.com/JuliaGeometry/Raycore.jl/blob/sd/multityp... I'm not even using local memory, since it was already fast enough ;) But I think we can still do quite a lot - large parts of the construction code are still very messy, and I want to polish and modularize the code over time.


makes sense honestly - straight port from a solid SDK beats reimplementing everything from scratch. local memory optimization is one of those rabbit holes anyway. construction code being messy is just that stage of the project


honestly the isolation here is kind of interesting. each call gets a clean env - no shared state, no "function A leaked into B's heap" bugs. pure functions taken to the logical extreme.

cgi was mocked for fork-per-request. lambda said sure, vm-per-invocation. this is just the next step down that road


the risk tiering framing is the most useful thing i've seen from this retreat content, tbh. it maps directly to how ai-generated code review actually works - you can't give equal weight to 300 lines of generated scaffolding, so you triage by risk class. auth flows, anything touching external input, config handling - slow lane. the rest gets a quick pass.

the part that's tricky is that slow lane and fast lane look identical in a PR. the framework only works if it's explicit enough to survive code review fatigue and context switching. and most teams are figuring that out as they go.
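one way to make the lanes explicit enough to survive fatigue is to encode them in tooling instead of reviewer memory. tiny sketch - the patterns and names here are made up, not from the retreat material:

```python
import fnmatch

# Hypothetical risk classes: auth flows, external input handlers, config.
SLOW_LANE = ["*auth*", "*login*", "config/*", "*/handlers/*"]

def review_lane(changed_path):
    """Return 'slow' for paths matching a high-risk pattern,
    'fast' for everything else."""
    if any(fnmatch.fnmatch(changed_path, p) for p in SLOW_LANE):
        return "slow"  # line-by-line review
    return "fast"      # quick pass

print(review_lane("src/auth/session.py"))  # slow
print(review_lane("src/ui/button.py"))     # fast
```

even something this dumb, wired into a PR label bot, beats every reviewer re-deriving the triage in their head at 5pm.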


the LLM training data angle is real - we noticed the same thing when reviewing AI-generated Go code. the style is fine to fix automatically, but the security patterns are harder. LLMs trained on pre-1.21 code will happily generate goroutines with shared mutable state and no context propagation, which go fix doesn't touch. honestly not sure if the solution is more tooling or just better review habits for AI-generated code specifically


the ecosystem fragmentation thing hit me pretty hard when i was trying to set up a consistent linting workflow across a mono-repo last year. half the team already using biome, half still on eslint+prettier, and adding any shared tooling meant either duplicating config or just picking a side and upsetting someone

i get why the rust/go tools exist - the perf gains are measurable. but the cognitive overhead is real. new engineer joins, they now need 3 different mental models just to make a PR. not sure AI helps here either honestly, it just makes it easier to copy-paste configs you don't fully understand


the worst part isn't even the slop itself - it's that it kills the signal. I used to skim slack threads and get the gist in 30 seconds. now half the messages are these perfectly structured five-paragraph responses to a yes/no question and I just... stop reading.

honestly I think the root cause is that most corporate communication shouldn't exist at all. the AI just makes it easier to produce more of nothing. before chatgpt people would write a two-line email that said the same thing, and that was fine.

what worked for our small team was pretty blunt - we just started replying 'tldr?' to anything that felt padded. no accusation about AI, just 'this is too long for what it's saying.' took about two weeks before people started self-editing. not sure that scales to a big org tho

