I'm using these things to evaluate pitches. It's well known that the default answer is "No" when seeking funding. I've been fiddling for a while, and it seems like all these engines are "nice" and optimistic. I've struggled to get them to decline companies at the rate I expect (>80%). They've been great at extracting the technicals.
This iteration isn't giving different results.
Anyone got tips to make the machine more blunt, or even aggressive?
Positivity is still an issue, but here are some workarounds I found:
- ChatGPT works best if you remove any “personal stake” from the prompt. For example, the best prompt I found to classify my neighborhood was one where I never said it was “my neighborhood” or “a home search for me”. Just input “You are an assistant that evaluates Google Street Maps photos…”
- I also asked it to assign a score from 0 to 5. It never gave a 0; it always tried to put a positive spin on things, so I now treat a 1 as a 0.
- I also never received a 4 or 5 on the first run, but once I gave it descriptions of what a 0 and a 5 should look like, it calibrated more accurately (rough sketch of that kind of anchored prompt below).
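In case it helps, here's roughly what that anchored 0-5 rubric could look like, adapted to the pitch-screening case; all of the anchors and wording are made up for illustration, not a tested prompt:

```python
# Illustrative anchored rubric prompt; the anchors and wording are invented,
# not a tuned or tested prompt.
SCORING_PROMPT = """You are an assistant that scores startup pitches on a 0-5 scale.
Anchors:
0 = not fundable: no market evidence, no credible team, no working product.
5 = exceptional: strong traction, defensible advantage, credible path to scale.
Most pitches should land at 1 or 2. Do not soften scores.
Reply with the number only, followed by one sentence of justification."""
```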
Interesting challenge! I've been playing with similar LLM setups for investment analysis, and I've noticed that the default "niceness" can be a hurdle.
Have you tried explicitly framing the prompt to reward identifying risks and downsides? For example, instead of asking "Is this a good investment?", try "What are the top 3 reasons this company is likely to fail?". You might get more critical output by shifting the focus.
Another thought - maybe try adjusting the temperature or top_p sampling parameters. Lowering these values might make the model more decisive and less likely to generate optimistic scenarios.
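Something like this is what I have in mind, a minimal sketch assuming the OpenAI Python client; the model name, temperature, and prompt wording are placeholders rather than tuned values:

```python
# Minimal sketch: risk-focused framing plus low temperature/top_p.
# Model name and parameter values are placeholders.
from openai import OpenAI

client = OpenAI()

def critique(pitch_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",          # placeholder model
        temperature=0.2,         # lower -> more decisive, less speculative
        top_p=0.9,
        messages=[
            {"role": "system",
             "content": "You are a skeptical venture analyst. The default answer is NO."},
            {"role": "user",
             "content": f"List the top 3 reasons this company is likely to fail:\n\n{pitch_text}"},
        ],
    )
    return response.choices[0].message.content
```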
Early experiments showed I had to keep the temp low; I'm keeping it around 0.20. Based on some other comments, I might make a loop to wiggle around that zone, something like the sketch below.
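Roughly this kind of thing; `evaluate` here is just a stand-in for whatever call returns a YES/NO for a pitch at a given temperature, and the temperature grid is arbitrary:

```python
# Rough sketch: run the same pitch at a few temperatures near 0.2 and take a
# majority vote. `evaluate` is a placeholder callable, not a real API.
from collections import Counter

def sweep_decision(pitch_text, evaluate, temps=(0.10, 0.15, 0.20, 0.25, 0.30)):
    votes = Counter(evaluate(pitch_text, temperature=t) for t in temps)
    decision, count = votes.most_common(1)[0]
    return decision, count / len(temps)  # decision plus how unanimous it was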
There's the technique of model orthogonalization, which can often zero out certain tendencies (most often refusal), as demonstrated by many models on HuggingFace. There may already be an open-weights model on HuggingFace that uses orthogonalization to zero out positivity (or optimism), or you could roll your own.
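If you roll your own, the core weight edit is just a projection. A minimal PyTorch sketch, assuming you've already extracted a unit-norm "positivity direction" in the residual stream (e.g. the mean activation difference between optimistic and critical completions); the direction-finding step isn't shown and the layer names vary by architecture:

```python
# Sketch of weight orthogonalization ("abliteration"-style): remove the component
# of a weight matrix's output that writes along a given residual-stream direction.
# The direction `d` is assumed to have been extracted separately.
import torch

def orthogonalize(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    d = direction / direction.norm()
    # W' = W - d d^T W : the layer's outputs no longer have a component along d
    return weight - torch.outer(d, d) @ weight

# Typically applied to every matrix that writes into the residual stream, e.g.
# attention output projections and MLP down-projections (names vary by model):
# for block in model.model.layers:
#     block.self_attn.o_proj.weight.data = orthogonalize(block.self_attn.o_proj.weight.data, d)
#     block.mlp.down_proj.weight.data = orthogonalize(block.mlp.down_proj.weight.data, d)
```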
Yes, using those exact words. I even tried instructing it that the default is No.
The most repeatable results I got came from having it evaluate specific metrics and rejecting when too many weren't found (rough sketch after this comment).
My feeling is it's the hallucination tendency routing the reasoning towards "yeah, this company could work if the stars align". It's like it's stuck with the optimism of a first-time investor.
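For reference, the metric rule is basically this; the metric list and threshold are made up, and `extracted` is whatever your extraction step returns:

```python
# Sketch of the "reject when too many metrics are missing" rule.
# Metric names and the threshold are illustrative.
REQUIRED_METRICS = ["revenue", "growth_rate", "burn_rate", "customer_count", "churn"]

def screen(extracted: dict, max_missing: int = 2) -> str:
    missing = [m for m in REQUIRED_METRICS if not extracted.get(m)]
    return "NO" if len(missing) > max_missing else "REVIEW"
```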
Maybe simultaneously give it one or more other pitches that you consider just on the line of passing and then have it rank the pitches. If the evaluated pitch is ranked above the others, it passes. Then in a clean context tell the LLM that this pitch failed and ask for actionable advice to improve it.
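A rough sketch of how that could be wired up; the prompt wording is invented, and parsing the model's reply back into a ranking is left out:

```python
# Sketch: mix the new pitch in with borderline reference pitches, ask for a
# ranking, and pass only if the new pitch beats every reference.
import random

def build_ranking_prompt(new_pitch: str, borderline_pitches: list) -> tuple:
    pitches = borderline_pitches + [new_pitch]
    random.shuffle(pitches)
    new_label = pitches.index(new_pitch) + 1  # 1-based label of the new pitch
    numbered = "\n\n".join(f"Pitch {i + 1}:\n{p}" for i, p in enumerate(pitches))
    prompt = ("Rank these pitches from most to least investable. "
              "Reply with the pitch numbers in order, best first.\n\n" + numbered)
    return prompt, new_label

def passes(ranking: list, new_label: int) -> bool:
    # since every other pitch is a borderline reference, "above the others" means first
    return ranking[0] == new_label
```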
Hm, I wonder if you could do something like a tournament bracket for pitches. Ask it to do pairwise evaluations between business plans/proposals. "If you could only invest in A -OR- B, which would you choose and what is your reasoning?". If you expect ~80% of pitches to be a no, then take the top ~20% of the tourney. This objective is much more neutral (one of them has to win), so hopefully the only way the model can be a "people-pleaser" is to diligently pick the better one.
Obviously, this only works if you have a decent size sample to work from. You could seed the bracket with a 20/80 mix of existing pitches that, for you, were a yes/no, and then introduce new pitches as they come in and see where they land.
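As a simplification of the bracket (a full pairwise sort rather than single elimination), something like this; `prefer_a` stands in for the "A or B?" call and is a placeholder, not a real API:

```python
# Sketch: sort pitches with the LLM as a pairwise comparator, keep the top ~20%.
# `prefer_a(a, b)` is a placeholder returning True if the model would invest in A over B.
import functools

def rank_pitches(pitches: list, prefer_a) -> list:
    def cmp(a, b):
        return -1 if prefer_a(a, b) else 1
    return sorted(pitches, key=functools.cmp_to_key(cmp))

def shortlist(pitches: list, prefer_a, keep_fraction: float = 0.2) -> list:
    ranked = rank_pitches(pitches, prefer_a)
    return ranked[: max(1, int(len(ranked) * keep_fraction))]
```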