I've seen a few YC startups focusing on this, but I haven't decided yet whether we should build this internally or use an external tool.
I would like to see how changing the system prompt or any of the logic in the pre-defined prompts affects the output.
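A minimal sketch of that kind of comparison, in case it helps frame the build-vs-buy question: run the same user input through several system prompt variants and diff each output against a baseline. Everything named here is an assumption for illustration: `fake_model` is a hypothetical stand-in for whatever LLM client you actually use, and `run_variants` and `diff_against_baseline` are made-up helper names, not part of any tool mentioned in this thread.

```python
import difflib
from typing import Callable, Dict

# Hypothetical stand-in for a real LLM call (assumption: your client can be
# wrapped as a function of (system_prompt, user_input) -> text).
def fake_model(system_prompt: str, user_input: str) -> str:
    style = "TERSE" if "concise" in system_prompt.lower() else "VERBOSE"
    return f"[{style}] answer to: {user_input}"

def run_variants(
    model: Callable[[str, str], str],
    variants: Dict[str, str],
    user_input: str,
) -> Dict[str, str]:
    """Run one user input through every named system prompt variant."""
    return {name: model(prompt, user_input) for name, prompt in variants.items()}

def diff_against_baseline(outputs: Dict[str, str], baseline: str) -> Dict[str, str]:
    """Return a unified diff of each variant's output against the baseline's."""
    base_lines = outputs[baseline].splitlines()
    diffs = {}
    for name, text in outputs.items():
        if name == baseline:
            continue
        diff_lines = difflib.unified_diff(
            base_lines,
            text.splitlines(),
            fromfile=baseline,
            tofile=name,
            lineterm="",
        )
        diffs[name] = "\n".join(diff_lines)
    return diffs

# Illustrative prompt variants; replace with your own.
variants = {
    "baseline": "You are a helpful assistant.",
    "concise": "You are a concise assistant.",
}
outputs = run_variants(fake_model, variants, "What is a vector database?")
diffs = diff_against_baseline(outputs, "baseline")

for name, diff in diffs.items():
    print(diff)
```

With a real model the outputs are usually not deterministic, so a line diff like this is only a first look; you would likely want several samples per variant before drawing conclusions.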
I put this list together, and I'm pretty sure one of these tools should solve what you're after: https://llm-utils.org/List+of+tools+for+prompt+engineering
And you can participate in the arena, which pits models against each other. I was surprised to find that I actually voted for GPT-3.5 over GPT-4 for a lot of my use cases.