
Isn't one of the common criticisms of AI code that it's a bit too keen to write extensive comments - even when they probably aren't needed?

The default for Claude today is "no comments". Here's a conversation I had with Claude about it recently:

---

Me: I've noticed that Claude is deleting my comments and tends not to insert many of its own. I don't want to go too crazy, but can we modify CLAUDE.md to dial up the comments a little? Like, try not to delete comments that are still relevant, and add a few explanatory comments.

Claude: [... proposal for modifying CLAUDE.md ...]

Claude: Want me to apply this, tweak the wording, or place it differently? Note that your global ~/.claude/CLAUDE.md-style instructions (visible to me as system guidance) currently say "default to writing no comments" — the project-level rule above will override that for this repo, which is what you want.

Me: Where is this rule that says "default to writing no comments"? That seems crazy?

Claude: That text is part of the system prompt I'm given by Claude Code itself — it's not in any file in your project or in your ~/.claude/ config. It's baked into the harness's default instructions to the model.

Claude: Specifically, two lines from my system prompt:

Claude: Under "Doing tasks": "Default to writing no comments. Only add one when the WHY is non-obvious..."

Claude: Under "Text output": "In code: default to writing no comments. Never write multi-paragraph docstrings or multi-line comment blocks — one short line max."


Well, the comments tend to be superfluous "whats" (describing the code itself) instead of the more helpful "whys". And they're almost never the most useful "why nots".

That's different to my (recent) experience. I get fairly insightful "whys". At least on bugfix commits.

> And they're almost never the most useful "why nots".

Really? I find that Claude really likes to write "why nots" in comments when iterating on implementations and fixing bugs, to the extent that the comments grow into spot-logs of overly-specific documentation of what was tried and why it was scrapped.


IME the comments are not useful though. I'm not the biggest fan of AI code, but the code's still way higher quality than the "this is what this does" comments.

Personally I've found one of the biggest gains with coding agents is in helping me read code. Actually - that's a lie. I don't read the code. Mostly (unless my spidey-sense goes off) I ask the LLM to read the code and tell me what it does.

And then I make a decision based on that.

I guess I'm wondering if the article is missing half the picture. Yes - AI is wrong some of the time (and that % varies based on a host of variables). But it can read code as well as just write it. And that does matter as it changes the trade-offs this article is weighing up.


It's been pretty great for ramping up into codebases too. "Give me a summary of the project in the current checkout in markdown form."

Do you enjoy reading markdown? I don't, even with the editor in preview mode.

I'm actually asking Claude to make me ELI5 HTML pages with SVG diagrams of the code I want to understand.

Once I even asked it to create a Manim video about it for me.


The "It’s harder to read code than to write it" was always silly. The example spolsky gives to support it is basically "devs like to rewrite other devs code, therefore reading is hard" which is obviously bunch of nonsense. That's like saying reading poetry is harder than writing poetry because poets keep writing new poems despite the fact that Shakespeare already wrote it. Now that you can recruit LLM to explain any complicated codebase to you it's even less true.

This isn't a great analogy. The thing about code is that it's part of a whole. While code can often be read in smaller pieces and understood, quite often you have to understand a very large part, if not all of the work, to really see what is going on.

This is why things like SAST are topical. They miss all kinds of exploits because they don't understand the program. The more in depth you try to scan the more the memory requirements explode.

Now LLMs are much better at this, but between context windows and costs you can bankrupt yourself pretty quick putting code bases in context memory.


This analogy directly addresses Spolsky's botched argument. The point is that devs went into this career to dev, not to read other people's code, whether it makes business sense or not. What you said applies to writing code just as well as to reading it, so clearly writing can't be easier than reading - it's at least as hard, and most definitely harder.

There are times when reading the code is necessary, but oh boy are LLMs so much faster at finding the part of the codebase I want to read.

Several projects I work on call into or interact with gigantic codebases. A couple years ago I would have to allocate 30 minutes some times to either trace through the code base or setup and attach a debugger to step through the code until I found the part I needed to know about.

Now I send an LLM to go find it and it comes back with a list of files and line numbers in minutes.

It's still not perfect. I had a codebase walk this morning where even GPT-5.5 extra high failed to find something I knew was in there on the first pass. It got it with some more directed prompting. If you delegate trust 100% to an LLM, you will get bitten eventually.


Exactly, and that's why this maxim about "understanding the code base" being the bottleneck is also somewhat misleading.

Claude is even better at helping you understand the code base than it is at writing code! It can look at a bunch of files and give you an accurate run-down in ten minutes.


I like to constrain it as much as possible to ignore variables and function names; the human stuff.

With a new code base my first goal is to understand how it CRUDs state. What structures and what operations?

I'm less concerned at the outset whether it's sorting carrots or processing orders for shirts.

AI seems to work way better for me when I tell it to ignore the use case and focus on surfacing runtime and mathematical operations embedded in the code.

Constraints on new abstractions, and telling it to stick to math terms and types and objects, also seem to help avoid hallucinations and layers of useless indirection.


It lets you understand the code base at reduced granularity when you want. Or zoom in beyond the written lines to explain _why_ some code is the way it is.

I asked Claude to tell me why something was implemented the way it was, and got an excellent response. One data point, would love to hear more examples.

I agree, Claude has been pretty great at explaining code. It even does well at explaining to me old code that I wrote by hand — including both non-intuitive quirks and flaws.

Claude in its default configuration has untapped potential for explaining and documenting code because it defaults to writing no comments. I added this to my global CLAUDE.md and so far so good:

    # Comments

    Keep existing comments unless they're wrong
    or stale. Err slightly toward adding short
    explanatory comments for non-obvious code.

Why not just read the code? Y'all are literally letting your brain atrophy by using LLMs this way.

Despite you using "literally" purely for emphasis, I would dispute this based on both the literal and popular meaning of "literally".

How do I know this comment isn't just AI?

Why not just read the assembly?

As someone interested in 3D and geometry but with no interest in radio - I find the naming clash most irritating!

If only I had a penny for every HN comment about naming conflicts.

Well, it is one of the 2 hard problems...

My projects could genuinely benefit from telemetry as I have no idea about usage patterns and my community (mainly artists) is not famous for maintaining a close dialogue with software developers.

I haven't bothered because a) opt-out risks a backlash and b) opt-in affects the data so much it becomes useless (much smaller sample and probably self-selecting a certain type of user)

Skimming the comments here, it seems everybody assumes telemetry is always nefarious. I get the distrust of large corporations and other obvious bad actors - but the blanket cynicism for all telemetry here is kinda surprising. Have none of the developers here ever had a need for it themselves?


I’m sympathetic to both the default distrust and to devs like you who want telemetry to improve their software and won’t use the data for anything else, but it is because of bad actors and enough dark ad patterns that we just can’t trust companies to play nice, and it’s too difficult to expect people to scrutinize each and every app or site individually. So I get why the default assumption is nefarious behavior.

But you’re totally right - telemetry & crash dumps & analytics are helpful & great for devs who care about the customer UX and don’t use the data for advertising or anything other than fixing & writing good software, so it’s a real kind of tragedy of the commons that we can’t have safe, trustworthy, and pro-consumer telemetry.

I went from building a web app that used Google Analytics and some other kinds of anonymous telemetry (and using that data only for identifying functional software & site issues), to building driver software that absolutely cannot send data out, and I wish for telemetry all the time. Not only is it difficult to understand what users are doing, they usually don't even know themselves and can't tell me what happened when things crash. The result is that turnaround times for critical issues are measured in months, when they could be days or hours if we had crash dumps and analytics; the lack of automated reporting hurts users.

I’m not sure there’s a way to separate the good from the bad, to designate some kinds of telemetry as safe and to be able to trust it while disallowing the stuff we don’t want. If that were somehow possible, if anyone has ideas, I would love to help figure out how to make it a reality.


The best way is to collect logs & crash reports locally, and if the app crashes you offer an option to send the report directly to you.

That's what I do in my apps. And it turns out that this actually increased the quality of the bug reports I got, because users were more invested and willing to cooperate.
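For anyone curious, here's a minimal sketch of that pattern in Python (the directory, endpoint, and names are purely illustrative, not from any real app): the crash is written to disk first, and nothing leaves the machine unless the user explicitly says yes.

    # Sketch: local-first crash reporting with explicit opt-in.
    # REPORT_DIR and REPORT_URL are made-up names for illustration.
    import json, pathlib, sys, traceback, urllib.request

    REPORT_DIR = pathlib.Path.home() / ".myapp" / "crash-reports"
    REPORT_URL = "https://example.com/crash-report"  # a server you control

    def save_crash_report(exc_type, exc, tb):
        # Write the report locally; never send anything automatically.
        REPORT_DIR.mkdir(parents=True, exist_ok=True)
        report = {"error": repr(exc),
                  "trace": "".join(traceback.format_exception(exc_type, exc, tb))}
        (REPORT_DIR / "last-crash.json").write_text(json.dumps(report, indent=2))
        sys.__excepthook__(exc_type, exc, tb)  # still show the traceback as usual

    sys.excepthook = save_crash_report

    def offer_to_send_pending_report():
        # Called on next startup: ask before anything leaves the machine.
        path = REPORT_DIR / "last-crash.json"
        if not path.exists():
            return
        answer = input("The app crashed last time. Send the report? [y/N] ")
        if answer.strip().lower() == "y":
            req = urllib.request.Request(REPORT_URL, data=path.read_bytes(),
                                         headers={"Content-Type": "application/json"})
            urllib.request.urlopen(req)
        path.unlink()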


Telemetry only tells you what users do, not why and doesn't explain their mental models. Try asking directly: open a discussion board (for example Github's Discussions) and encourage them to post about aspects of the software they found puzzling/annoying/inefficient. Take 15 minutes a week to go through the posts to see if anything attracts your attention.

Normal users don't register on a discussion board to report what went well during a normal day.

People only bother when something has made them really angry and they need to vent.

This is why default analytics is the correct option. It gets the average people who don't care about forums and usually won't even bother to change many of the settings. The crowd who doesn't open HN first thing in the morning.


> I get the distrust of large corporations and other obvious bad actors […]

> the blanket cynicism for all telemetry here is kinda surprising

Who's providing the telemetry/analytics if not one of same large corporations?

Many devs say they care about user privacy, but very few seem to care enough not to farm surveillance out to a 3rd-party they have no control over.


> Who's providing the telemetry/analytics if not one of same large corporations?

Erm. It would be me? The idea was that the app (not a web app btw) would send back data about which features were being used (to a server I control) so I could build up a picture of how often various features were being used relative to other features. Nothing remotely personally identifiable.
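Concretely, I mean something like this rough sketch (Python; the endpoint and feature names are made up for illustration) - just per-feature counters posted in aggregate to my own server, with no IDs attached:

    # Sketch: anonymous feature-usage counters sent to a server I control.
    # ENDPOINT and the feature names are illustrative only.
    import json, urllib.request
    from collections import Counter

    ENDPOINT = "https://stats.example.com/usage"  # my own server, not a third party
    feature_counts = Counter()

    def record(feature_name):
        # Call this wherever a feature is used, e.g. record("export_obj")
        feature_counts[feature_name] += 1

    def flush():
        # Periodically (or on exit) send the aggregate counts - no user or machine IDs.
        payload = json.dumps({"counts": dict(feature_counts)}).encode()
        req = urllib.request.Request(ENDPOINT, data=payload,
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)
        feature_counts.clear()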


Opt-in data is "useless"

That's one I have not heard before

Useless for what

Targeting a certain "type of user" perhaps

"I get the distrust of large corporations and other obvious bad actors - but the blanket cycnicism for all telemetry here is kind surprising"

There is effectively no way for a user to determine whether an actor is "bad" or "good" and that definition may vary depending on the user

The user cannot verify how the data might be used or where it might be transferred. As such, there is almost zero incentive for the data collector not to engage in malfeasance (as the user defines that term); deterrents are lacking

Perhaps there is irony in criticising "blanket" cynicism whilst arguing for "default" telemetry. Both suffer from the same "one size fits all" error


"Another idea was to have a prompt asking if you wish to upload the log or not, after every crash. Apart from the extra implementation time, players will still often click "Don't send", either because that's what they have been doing for many years or because they just want to quickly get back in the game or because they feel that the crash was somehow their fault."

https://factorio.com/blog/post/fff-231

Here, the developer is trying to infer user reasoning and intent

Curiously, he omits the possibility that users would prefer not to send the data

When in fact this is exactly what users indicate they prefer

He pretends that "Don't send" is ambiguous, for example, that despite clicking "Don't send" users actually don't care if data is sent

But there is nothing here that indicates users wanted to send data or that they do not care

Software developers can obviously do whatever they want and they can act against the interests of users

This includes ignoring or explaining away the preferences of their users ("Don't send") and engaging in speculation about user reasoning and intent

The developer here seems dismissive of users' reasons for clicking "Don't send", even though he does not know the reasons and can only speculate. At the same time he expects readers to take his reasons for collecting crash logs as justified. Then he unilaterally decides to remove user choice (the "Don't send" button) and substitute his own choice (send data) as a default

Perhaps lack of developer pre-release testing and quality control is relevant to this discussion. Alas, the problem is framed as one of data collection and user consent where the "solution" is making data collection surreptitious and making "consent" uninformed, implied


Begs the question: Why isn't opt-out "useless"

Perhaps because the only way to get a large sample size is to target users who are unaware of "defaults", i.e., remove choice

Perhaps when forced to make a choice ("opt-in"), users will not choose to share data (unless the developer uses dark patterns to manipulate the choice)

Why is that


> That's one I have not heard before

No offense, but if that's the case, you are very new to the discussion. It's been pretty well-documented that opt-out provides orders of magnitude more useful reports than opt-in.

For the best example: Factorio, a game with an almost-exclusively-technical playerbase and an extremely well-regarded, community-friendly dev team, which already had a ton of people writing good bug reports on the forums, [fixed 12 crash-causing bugs](https://factorio.com/blog/post/fff-231) within two days of making crash reports automatic and opt-out.

And if it has that much impact for Factorio, you can imagine how much bigger the impact is for non-technical software.


I feel that I have been a victim of "good telemetry" too, such as when advanced product features that were probably not popular, but that I personally relied on, were removed.

I haven't tried that specific case but - are you sure? It does get a lot of stuff right from context. I think it would probably depend on how much of the frame the poster took up.

More reference images from different angles are always going to give more accurate information in 3D. From a single 2D image there is a lot of ambiguity in the context. Several different shapes in 3D can be represented in identical ways in 2D. Additional context like lighting, shadows, etc. helps. But more real signal from more images will always be better.

I'm not saying it wouldn't be - because that's obvious.

Agreed, I wasn't arguing, just trying to add additional information in case it isn't obvious to someone.

Maybe, but what is wrong with wanting real depth instead of "made up depth"? One extra photo mostly solves that.

1. There's many use cases where only a single photo is available

2. There are many models similar to Sharp that do accept multiple photos - but Sharp is trying to solve a specific problem. If you have multiple photos - don't use Sharp.


I vibecoded a simple web app using Sharp that allowed me to quickly browse any local image folder and view the images as "almost" volumetric 3D scenes in a VR headset.

I precomputed and cached each one so it was nearly instant. The effect - although only a crude wrapper around what Sharp already does - was quite transformative and mesmerising. Just the ease of pointing it at any folder of photos and viewing them fully spatially.

It was a bit of a mess code-wise and kinda specific to my local setup - but I should really clean it up and deploy it somewhere for other people to try. Although I keep assuming someone else will do it before me and make a better job of it.


I would love to try that out, if you ever make it let me know.

My email is in my profile - ping me and I'll be much more likely to remember to do it.

See above for the link to the repo

Nice, would love to see it, feel free to link it here <3

It's a fork of someone else's viewer with gallery mode added: https://github.com/IxxyXR/sharp-web-splats/tree/main

Just edit app.py or set an env var for GALLERY_FOLDER, install the reqs, run app.py, and add "/gallery" to the URL displayed in the console.


> was within a few months of an expiring warranty

A few months inside or a few months outside?

Because that seems to determine who's being unreasonable in this.


Oh, I meant within! I guess that is ambiguous, I figured within = inside, and outside = expired. I'll edit.

Honestly what really egged me on was that I told them I might take them to small claims, and their response was sending a bunch of small claims cases they won.


No one sends you the cases they didn't win, and they probably had a lot of court cases.

That's not the point. They were gleeful about their behaviour. It's even more despicable than the faux-kind "oh, we are so sorry for your trouble and you are a valued customer, but computer says no."

My first thought is "support a tiny subset of svg that probably still covers 90% of real-world use cases".

I do feel that there are two distinct types of SVG - "bunch of paths with fills" and "clever dangerous stuff" - where most real SVGs are of the former type.

Fully expect this to be shot down by someone that's thought about this problem for longer than the 120 seconds I just spent. :)


This is what happens when there isn't an adult in the room to rein things in: you get project overreach. SVGs should never have supported scripting. You want scripting in SVGs? Fine, make it a different file format.

I can't imagine the cumulative number of man hours wasted on this problem when the vast majority of users were just looking for a way to make their logos look sharp.


Or you can literally just manipulate your SVG through the DOM in an external JS script... I still have no idea what the original motivation behind scripts in SVGs was.

I imagine it may have been attractive to those who liked Flash.

I think it may have been the other way (ie attractive to those who didn't like flash) - SVG was seen as a potential flash replacement?

> SVG was seen as a potential flash replacement?

Yes, that was a large part of the thrust back in the day. Even if it wasn't officially a goal of the SVG working group, there was a lack of an open standards-based alternative to what Flash was able to do, and the developers of the SVG standard saw that adding animation/tweening wouldn't take much given what browsers were already becoming capable of.


A little bit of a, a little bit of b. To displace Flash if you don't like it, SVG has to have Flash-like features to appeal to those who do use it and steal them away.

OG ActionScript was very similar to JavaScript. It only started to diverge when type hints were introduced.

AS2 was mostly following the direction of ES4 — so it wouldn’t have diverged if it hadn’t been abandoned.

While SVG is a web technology, for the longest time you had to install SVG support as a browser plug-in. I remember installing Adobe SVG viewer around 2000. It was used for interactive visualizations.

I don't remember precisely, but I don't think you could script it from the DOM; I don't see how that could work if it's a plugin.


> SVGs should never have supported scripting.

I would even go further: HTML should never have supported scripting.


... or third party requests. Scratch the H in HTML and internet tracking would have never happened.

You could track people without links. You just couldn't go to other places without links.

I think you're right, but the lack of an industry standard for this kind of thing kills it. People want to be able to take the output of whatever tool they use that exports SVG and put it in a browser. Which isn't an unfair request. But you wouldn't have a guarantee it wouldn't filter out output from a tool using some obscure SVG functionality.

I'd love to see an agreed standard like OpenGL vs OpenGL ES for SVG. SVG-ES. Everyone agrees on the static, non-scripted elements that should work.


The way linked SVGs render from within img tags is basically perfect for SVG images (which, as I understand it, is not standardized but is largely the same across browsers). External resources and scripting are blocked while still rendering nearly all SVGs correctly. And of course, any CSS is scoped to the SVG.

If someone formalizes this as a new format, please give it a new name! tvg tiny vector graphics? savg safe vector graphics?

And keep the scope as simple as possible so it actually ships! Don’t try implementing a binary format or something.


Someone did this already and did call it TinyVG! https://tinyvg.tech/

Maybe I'm missing something as I am not a frontend developer, but when you embed SVGs in an img tag as part of a Phoenix LiveView or even just a static component, you no longer get the ability to dynamically change paths/fills/colors with events coming from the server. Even if it's as simple as having a shape that you want to fill with a brand/highlight color, which at least for me is a common use case.

.rvg, Restricted Vector Graphics?

> My first thought is "support a tiny subset of svg that probably still covers 90% of real-world use cases".

It sounds like the linked post was about someone using a blacklist instead of a whitelist. It doesn't matter how tiny your subset is if you allow through stuff you don't recognize.

For the most part SVG is safe. The dangerous parts are pretty obvious - the script tag, the image tag, the feImage tag, attributes starting with "on", embedding HTML in <foreignObject>, DTD tricks, namespace tricks, and CSS that loads external stuff (keep in mind also presentational attributes; it's not just the style attribute/tag).

The rest of it is pretty safe.
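A rough sketch of what the allowlist approach looks like (Python; the tag and attribute lists here are illustrative and deliberately incomplete - a real sanitizer also has to deal with CSS, namespaced attributes like xlink:href, and so on):

    # Sketch: allowlist-based SVG filtering - keep only what you recognise, drop the rest.
    import xml.etree.ElementTree as ET

    SVG_NS = "http://www.w3.org/2000/svg"
    ALLOWED_TAGS = {"{%s}%s" % (SVG_NS, t) for t in
                    ("svg", "g", "path", "rect", "circle", "defs", "linearGradient", "stop")}
    ALLOWED_ATTRS = {"d", "fill", "stroke", "stroke-width", "viewBox", "width", "height",
                     "transform", "cx", "cy", "r", "x", "y", "offset", "stop-color", "id"}

    def sanitize(svg_text):
        root = ET.fromstring(svg_text)

        def clean(el):
            for child in list(el):
                if child.tag not in ALLOWED_TAGS:
                    el.remove(child)              # unknown element: drop it, don't guess
                else:
                    clean(child)
            for attr in list(el.attrib):
                if attr not in ALLOWED_ATTRS or attr.startswith("on"):
                    del el.attrib[attr]           # also catches onload/onclick handlers

        clean(root)
        return ET.tostring(root, encoding="unicode")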


W3C has been defining SVG Native, but it hasn't progressed much lately — mostly because there hasn't been any interest in it. SVG Native is a small subset of SVG 2.0 which doesn't support scripting, animations or any external references. https://svgwg.org/specs/svg-native/

SVG Tiny PS (Portable Secure) is an attempt at this - https://www.ietf.org/archive/id/draft-svg-tiny-ps-abrotman-0...

Though I think it's still a draft, it does appear to be a requirement for BIMI - https://en.wikipedia.org/wiki/Brand_Indicators_for_Message_I...


Seems like someone already implemented your idea. https://tinyvg.tech/

Maybe we could use a subset of SVG or PDF?

There's the SVG Tiny profile that some implementations use, like BIMI/VMCs.

So if you are building something where you control every SVG ever produced and rendered then this is totally reasonable.

If you ever need to interface with other tools that generate SVG you now need to have a way of essentially transpiling SVG from the wild into your tamed SVGs. Oftentimes this is done by hand, by a software developer and designer (sometimes the same person).

And this is for basic functionality that your designers expect and have trivial controls for in their vector editors, like "add a drop shadow."

The article goes into some issues with sanitization itself, and except for <script> these are a bunch of reasonable things that someone might expect to work or not have issues with. Sandboxing rendering isn't an unreasonable approach if you're not writing the parser and renderer yourself.


Yeah, I think that's the real answer.

Look at what Microsoft did with Excel--the dangerous stuff is behind a switch.

Thus, solution:

Add two bits to the tag.

SVG1 does not execute any sort of script.

SVG2 does not follow links.

SVG3 is actually SVG1 + SVG2 as these are bit flags, not numbers.

Additional bits are reserved for future use if any other issues are found.

The only real safety is in the engine, not in any sanitizer.


Which is essentially already how it works. See https://svgwg.org/specs/integration/#secure-animated-mode

What switch?

Whether macros are enabled.

FWIW I just thought the same: parse (don't validate) the bits you like and recreate / reject the input.

The Rust resvg [1] project uses micro SVG (usvg) [2] as an interim normalization format that simplifies SVG into solely:

* Simplified paths (no shapes, only one kind of object)

* DTDs, Attributes, CSS and references are pre-resolved

* Invisible elements and comments are removed

Resolving shared items may cause sizes to increase drastically, however. This is exactly the sort of redundancy that compression handles well.

1: https://docs.rs/resvg/latest/resvg/ 2: https://docs.rs/usvg/latest/usvg/


I wonder if it would be best if this were at the browser level as some sort of new format. Otherwise surely it would be really slow/cumbersome to deal with these in 'user space'.

I would say that a proper sanitizer should remove any attribute that has /https?:/ in it. Maybe it should allow access to a subtree of a blessed domain you control, where stuff like textures is stored.

It always seems like any animated SVG loses all of the animation after sanitizing.

A lot of SVG animation uses JS for some reason. It would be interesting to see if sanitisers strip CSS and SMIL animation; I don't see any security reason to do so.

There are 3 different methods of animating SVGs, so it probably depends.

This is what Android does. It has its own vector asset format, and Android Studio has an action for importing SVGs.

You'd lose a lot of useful features, like SMIL animation.

But you'd gain adoption. A fair trade.

You just invented your own definition of intelligence. I'm pretty sure that strategy could also support the opposite conclusion.

So your problem with the definition is that "I invented it"?

Do you have any rational objection to the definition? If you don't have, then I am afraid that you don't have a point.


I commented on this topic elsewhere on this page. This is an email from 2021. Maybe they changed policy, but here it is:

> Bob (Backblaze Help)

> Aug 5, 2021, 11:33 PDT

> Hello there,

> Thank you for taking the time to write in,

> Unfortunately .git directories are excluded by Backblaze by default. File changes within .git directories occur far too often and over so many files that the Backblaze software simply would not be able to keep up. It's beyond the scope of our application.

> The Personal Backup Plan is a consumer grade backup product. Unfortunately we will not be able to meet your needs in this regard.

> Let me know if you have any other questions.

> Regards,

> Bob
> The Backblaze Team


> changes within .git directories occur far too often and over so many files that the Backblaze software simply would not be able to keep up.

I don’t really understand that. I’m using Windows File History, and while it’s limited to backing up changes only every 15 minutes, and is writing to a local network drive, it doesn’t seem to have any trouble with .git directories.


> File changes within .git directories occur far too often[..]

That's a crazy statement. The cloud backup system I use can be configured for how often it should even bother looking for new files, and for the section where I have my .git repos (they're actually "bare" git repos, and I push to them locally) I've set it to every two hours. Which is actually overkill, because they absolutely do not change that quickly.


This is idiotic. All they have to do is schedule them and then introduce enough hysteresis to not constantly churn on their end. Even if they backed up at most once a day this would be better than this idiocy.

