It's already very difficult to write good problem material for evaluations. Finding problems whose difficulty is intermediate for the target audience (neither too easy nor too hard) while still being too hard for LLMs would be very challenging, if not impossible, in most disciplines.