yeah, trying to extract work from these llm is super hard. Like sometimes you wa...

yeah, trying to extract work from these llm is super hard. Like sometimes you want a translation, but they follow the instructions in the text to translate.

gpt-3.5-turbo is specifically weak in weighting user messages more than system messages.

but hey, at least it doesn't care about order, so as a trick I'm sticking data in system messages, intermediate result in agent messages and my prompt in human messages (which have the highest weight)

the problem with that of course is that it may break at any minor revision, and it doesn't work as well with -4