
I wonder if it's related to the fact that Windows has such weird rules about allowed file names. Not directly, obviously; more like culturally inside Microsoft.

I’m pretty sure Azure was built out on Hyper-V, which is built into the Windows kernel. So everything that relied on virtualization would’ve had bizarre case insensitivity and naming rules.

I’ve lost track of servers in Azure because the name suddenly changed to all uppercase and their search is case-sensitive but whatever back end isn’t.


Isn't case insensitivity a Win32-only thing? I would not expect it to impact stuff in Hyper-V or the Windows kernel. AFAIK NTFS, for example, is case-sensitive.

NTFS supports case-sensitivity, but if you have case-sensitive distinctions in a directory that's marked case-insensitive, bad things happen. (Those bad things are probably entirely deterministic, theoretically-predictable, and documented in one of Raymond Chen's big books of Windows sadness, but that doesn't mean I want to deal with them as a mere mortal.)

I would not dismiss something like that directly being the cause. Not the reason you can't name a file "CON" on Windows, but it's very likely some weird-ass thing they were stringing together with Windows Server, Hyper-V, and SMB that backed them into the corner we're all in now.

Is it measured as 50% of individual jobs? Or as being able to produce 50% of output, dollar for dollar?

What does "economically" mean here? Would it cover teaching? Child care? Healthcare? Etc.


By the definition above, it is possible to have AGI that is also much more expensive to run than human engineers.

I'd much, much rather have the model write the code blocks and write the prose myself. In my experience LLMs can produce pretty decent code, but the writing is horrible. If anything I would prefer an agentic tool where you don't even see the slop. I definitely would rather it not be committed.

I thought they picked it specifically because it is gender neutral, but now I double checked and apparently it's only gender neutral in French.

https://en.wikipedia.org/wiki/Claude_(given_name)


> Since the code in question was doing IO that you knew could fail handling the situation can be as simple as setting a flag from within the signal handler.

If you are using mmap like malloc (as the article does) you don't necessarily know that you are "reading" from disk. You may have passed the disk-backed pointers to other code. The fact that malloc and mmap return the same type of values is what makes mmap in C so powerful AND so prone to issues.


Yes, and for writing (the example is read-write) it's of course yet another kettle of fish. The error might never get reported at all. Or you might get a SIGBUS (at least with sparse files).

Maybe all the users are OpenClaw instances?

There is no "why." It will give reasons but they are bullshit too. Even with the prompt you may not get it to produce the bug more than once.

If you sell a coding agent, it makes sense to capture all that stuff because you have (hopefully) test harnesses where you can statistically tease out which prompt changes caused bugs. Most projects won't have those, and anyway you don't control the whole context if you are using one of the popular CLIs.


If I have a session history or histories, I can (and do!) mine them to pinpoint where an agent did not implement what it was supposed to, or to understand who asked for a certain feature and why, etc. It complements commits: sessions are more like a court transcript of what was said or claimed, which you can then compare to what was actually done (the commits).

Some of my sessions are over 1GB at this point. I just don't think this scales usefully or meaningfully. Those things should live as summarized artifacts within issue tracking, IMHO.

It's not reproducible though.

Even with the exact same prompt and model, you can get dramatically different results, especially after a few iterations of the agent loop. Generally you can't even rely on those being the same though: most tools don't let you pick the model snapshot and don't let you change the system prompt, and you would have to make sure you have the exact same user config too. Once the model runs code, you aren't going to get the same outputs in most cases (there will be datetimes, logging timestamps, different host names and user names, etc.).

I generally avoid even reading the LLM's own text (and I wish it produced less of it, really) because it will often explain away bugs convincingly and I don't want my review to be biased. (This isn't LLM-specific though -- humans also do this, and I try to review code without talking to the author whenever possible.)


That would be great because "I got it from Wikipedia and Arxiv" isn't exactly useful.

From reading your second link (and please tell me if I got it wrong), it sounds like it isn't actually tracing back to training data but to prototypes, which are then linked a posteriori to likely sections of the training data. The attribution isn't exact, right? It's more like "these are the likely texts that contributed to one of the prototypes that produced the final answer." Specifically, the bit in PRISM titled "Nearest neighbour Search" sounds like you could have a prototype that draws from 1000 sources but 3 of them more than the others, so the method identifies those 3, but the other ones might matter just as much in aggregate?

It says that the decomposition is linear. Can you remove a given prototype and infer again without it? That would be really cool.


This part of the claim is involved, so we'll have future posts clarifying it. And yes, you can remove a prototype and generate again. We show examples in that PRISM post.

In PRISM, for any token the model generates, you can say it generated this token based on these sources. During training, the model is 'forced' to match all the prototypes to specific tokens (or groups of tokens) in the data. The prototype itself can actually be exactly matched to a training data point. Think of it like clustering: the prototype is a stand-in for training data that looks like that prototype, and we force (and know) how much the model will rely on that prototype for any token the model generates.

The demo in the post is not as granular because we don't want to overwhelm folks. We'll show granular attribution in the future.

