You should probably add an example for using the special commands '.critic' '.resolve' commands that you are projecting as the key 'spice' of your project. There is no 'how to use' or 'how it works' provided for these special commands in your readme.md. One would have to walk through the entire code to hopefully get an idea. I went through the main script - I have an idea on the overall flow, but I still don't know how these special commands work. Other people might also run into the same issue. So, additional documentation can help
What is the perf of GPT4 vs GPT3.5 in the 'reasoning', 'reflection', 'critisism' and 'resolver' tasks like in your project? I see that you have commented out gpt3.5 and replaced with GPT4 in config yaml. Was GPT3.5 perf too bad? I don't think many people have GPT4 API access. If this requires atleast GPT4 to be effective, it might take a while before anyone else in the community can take it up.
Good idea about Claude, never tried it myself. But I've heard that GPT-4 is superior at the moment. Is it true?
Also thought about GPT-4 prices, but from my experience: If I use it intensively for a 3-4 hours it costs about $5. This is much cheaper than anyones hour rate.