Hacker News

I added Opus 4.5 to my benchmark of 30 alternatives to your now-classic pelican-bicycle prompt (e.g., “Generate an SVG of a dragonfly balancing a chandelier”). Nine models are now represented:

https://gally.net/temp/20251107pelican-alternatives/index.ht...


I hadn't seen these before; they are so cool! It definitely enhances the idea to see a bunch of different illustrations in the same place.

Blogged about it here: https://simonwillison.net/2025/Nov/25/llm-svg-generation-ben...


Thanks! I feel honored.

Gemini 3.0 Pro Preview is incredible compared to the others, at least for SVGs.

I was about to say the same; suspiciously good, even. Feels like it's either memorised a bunch of SVG files, or has a search tool and is finding complete items off the web to include either in whole or in part.

Given that it also sometimes goes weird, I suspect it's more likely to be the former.

While the latter would be technically impressive, it's also the whole "this is just collage!" criticism that diffusion image generators faced from people who didn't understand diffusion image generators.


I agree with your sentiment: this incremental evolution is getting difficult to feel when working with code, especially in large enterprise codebases. I would say that for the vast majority of tasks, the gap in tooling is much bigger than the gap in foundational model capability.

Also came to say the same thing. When Gemini 3 came out several people asked me "Is it better than Opus 4.1?" but I could no longer answer it. It's too hard to evaluate consistently across a range of tasks.

Did you write the terminal -> html converter (how you display the claude code transcripts), or is that a library?

I built it with Claude. Here's the tool: https://tools.simonwillison.net/terminal-to-html - and here's a write-up and video showing how I built it: https://simonwillison.net/2025/Oct/23/claude-code-for-web-vi...
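For anyone curious how a converter like that works: the core job is mapping ANSI SGR escape codes (the `\x1b[...m` sequences terminals use for color) to HTML spans. This is a minimal hypothetical sketch, not the actual code behind the linked tool; the color table and function name are illustrative assumptions.

```python
import html
import re

# Matches ANSI SGR sequences like "\x1b[32m" (set color) and "\x1b[0m" (reset).
ANSI_RE = re.compile(r"\x1b\[([0-9;]*)m")

# Tiny illustrative subset of the SGR color codes.
COLORS = {"31": "red", "32": "green", "33": "goldenrod", "34": "blue"}

def ansi_to_html(text: str) -> str:
    """Convert a string with ANSI color codes into HTML with <span> styling."""
    out, color, pos = [], None, 0
    for m in ANSI_RE.finditer(text):
        chunk = html.escape(text[pos:m.start()])
        if chunk:
            out.append(f'<span style="color:{color}">{chunk}</span>' if color else chunk)
        code = m.group(1)
        # "0" (or empty) resets; unknown codes leave the current color unchanged.
        color = None if code in ("", "0") else COLORS.get(code, color)
        pos = m.end()
    tail = html.escape(text[pos:])
    if tail:
        out.append(f'<span style="color:{color}">{tail}</span>' if color else tail)
    return "".join(out)

print(ansi_to_html("\x1b[32mPASS\x1b[0m 12 tests"))
# → <span style="color:green">PASS</span> 12 tests
```

A real tool also has to handle bold/dim attributes, 256-color and truecolor sequences, and cursor-movement codes, but the span-per-styled-run structure stays the same.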

Wispr Flow or similar for STT input would boost your already impressive development speed (if you wanted it).

Thank you!

> Thinking blocks from previous assistant turns are preserved in model context by default

This seems like a huge change, no? I often use max thinking on the assumption that the only downside is time, but now there's also the downside of context pollution.


Opus 4.5 seems to think a lot less than other models, so it’s probably not as many tokens as you might think. This would be a disaster for models like GPT-5 high, but for Opus they can probably get away with it.

I think you have an error there about Haiku pricing:

> For comparison, Sonnet 4.5 is $3/$15 and Haiku 4.5 is $4/$20.

I think Haiku should be $1/$5.


Fixed now, thanks.

I wonder if at this point they read what people use to benchmark with and specifically train it to do well at this task.

:%s/There model/Their model/g


