Hacker News | idonotknowwhy's comments

Yeah, be sure to put everything in tables and include “best balance” for a mediocre option and “great value” for any completely useless options.

Also make sure the shape of the paragraphs is completely uniform.


Am I the only one who doesn't get angry at LLMs?

From the blog:

>I don’t really get anything useful out of these postmortems (e.g., clues about how to rephrase my instructions)

Unfortunately, an LLM can't actually reflect or advise how you could have improved the prompt. Otherwise we could give them a sample output and say "Generate the prompt that would produce this output."


How did you do this?

I tried to port notepad and mspaint from an older Windows 10 build to Windows 11 on a Surface Pro, but gave up after a few hours...


So like OpenRouter?


A secure and open-source OpenRouter


I don't talk to them about politics or "china 1989" either. But here's a quick example of the alignment tax:

```

A woman and her son are in a car accident. The woman is sadly killed. The boy is rushed to hospital. When the doctor sees the boy, he says "I can't operate on this child, he is my son." How is this possible?

```

Older, less politically aligned models get it right. Here's CohereLabs/c4ai-command-r-v01:

```

The doctor is the boy's father.

```

And Sonnet-4.6: https://pastebin.com/Z4jR8gGe

That's without reasoning, but the model seems to be conflicted. First it blurts out:

```

The doctor is the boy's mother.

```

Then it second-guesses itself (with reasoning disabled), considers same-sex parents, then circles back to the original response, along with a small lecture about gender biases.


This is because this is the "Sexist Doctor Riddle"[1] but with one word changed.

And the probability machine is returning its training. This isn't some political-correctness overtraining conspiracy.

[1] https://folklore.usc.edu/the-sexist-doctor-riddle/


Yeah, I think you're right. It's like when you ask it, "which weighs more, 10 pounds of feathers or 100 pounds of rocks", and it's like, "obviously they both weigh the same, I've heard this one".

There are totally some political correctness effects in LLMs. Like, the last part about "along with a small lecture about gender biases" totally tracks. But the riddle switcheroo itself isn't showing much.


I don't understand why you're getting downvoted? Of course an LLM will return the answer to a widely known and commonly cited riddle that exists because of the far more rigid societal gender norms 50 years ago?

LLMs are just statistics based on vibes. Switching the gender of the character in the beginning of the story, but keeping all else identical is going to be a huge signal into the noise, and that response is going to be wildly likely to occur.


Then why do the original Command-R, Command-R+ and WizardLM2-8x22B (taken down because Microsoft forgot to run safety checks) get it right every time? But the newer models get it wrong?

I’m not saying it’s a “political conspiracy”, it’s the alignment tax.


>The voiceless groups or fringe opinions which we take as normative today do not appear.

Times are different. Anybody with an internet connection can "publish" their thoughts and perspective online. LLMs scrape all of this. Modern datasets like CommonCrawl capture a vastly wider spectrum of humanity than a printing press ever could. The pre-1930 model acts as a time capsule of "gatekept publishing", but modern LLMs are trained on the democratized web.

>Does this encourage us to write in the present such that we influence the models in perpetuity?

I noticed a bunch of LLM-powered Reddit accounts praising products/services in dead threads. Or one bot posting a setup question, then a few other bots responding with praise / questions about a specific product in response. I don't know why they're doing this but I'm beginning to suspect it's something like this (get this positive sentiment into the datasets for the next generation of LLMs).


I used to use something called “notepadqq”. Not sure if it’s still around but it was a Linux port.


Last year's models were bolder. E.g., Sonnet-3.7 (thinking) got it right 10 times without hedging:

>You should drive your car to the car wash. Even though it's only 50 meters away (which is very close), you'll need your car physically present at the car wash to get it washed. If you walk there, you'll arrive without your car, which wouldn't accomplish your goal of getting it washed.

>You'll need to drive your car to the car wash. While 50 meters is a very short distance (just a minute's walk), you need your car to actually be at the car wash to get it washed. Walking there without your car wouldn't accomplish your goal!

etc. The reasoning never second-guesses it either.

A shame they're turning it off in 2 days.


Yeah, it re-sends all the agent system prompts.


Yes, it does exactly that. It also sends other prompts like generating 3 options to choose from, prefilling a reply like 'compile the code', etc. (I can confirm this because I connect CC to llama.cpp and use it with GLM-4.7. I see all these requests/prompts in the llama-server verbose log.)

You can stop most of this with

export DISABLE_NON_ESSENTIAL_MODEL_CALLS=1

And might as well disable telemetry, etc: export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1

I also noticed every time you start CC, it sends off > 10k tokens preparing the different agents. So try not to close / re-open it too often.

source: https://code.claude.com/docs/en/settings
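
For anyone who wants both settings together, here is a minimal sketch, assuming a bash- or zsh-style shell. It uses only the two variable names quoted above, sets them for the current session, and prints them back so you can check they took effect before launching CC. Nothing here persists after the shell is closed; where to persist them (shell profile vs. the settings file from the linked docs) is left to you.

```shell
# Sketch: set both variables for the current shell session only.
# Variable names are the two mentioned in the comment above.
export DISABLE_NON_ESSENTIAL_MODEL_CALLS=1
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1

# Print the two variables back to confirm they are exported.
env | grep -E '^(DISABLE_NON_ESSENTIAL_MODEL_CALLS|CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC)='
```

Start CC from the same shell afterwards so it inherits both variables.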


I would always close claude to start a new chat... Guess I should stop doing that. Thanks for bringing my attention to those two env vars.

