How well did the algorithms detect insightful analysis, deep understanding beyond the immediate subject matter, factual correctness, salience and an ability to write to a specific audience?

I'm far from well informed, but my understanding of standardised tests is that the standard specifies the algorithm, which already ignores your good points above in order to achieve standardised grading. All that really changes is whether a human or a robot executes the algorithm; the human insight has already been squeezed out of the system.