This is almost as old as automated theorem proving. There exist systems for mathematics, programming, music, essays, and various other analytical fields of study.
I think labeling it an "expression" checker is a good thing, as it narrows down its purpose and scope.
The database class, before it officially became a Coursera course, included exercises using relational algebra in which any expression that produced the correct answer in the general case was accepted.
The grader usually measured how much memory each exercise used across several data sets and fit the results to a quadratic, so a solution had to stay under a bound such as N^2 + 16N + 64 bytes.
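A rough sketch of that kind of check in Python. The course's actual grader, language, and fitting method aren't described here, so everything below is an assumption: `fit_quadratic`, `within_budget`, and the `extrapolate_to` parameter are hypothetical. It fits bytes = aN^2 + bN + c by least squares, then compares the fitted curve against the N^2 + 16N + 64 bound at the tested sizes and at one larger, extrapolated size.

```python
from fractions import Fraction

# Hypothetical sketch of a memory-usage grader: fit measured byte counts to
# bytes = a*N^2 + b*N + c by least squares, then compare against a bound.


def solve(matrix, rhs):
    """Solve a small square linear system by Gauss-Jordan elimination (exact fractions)."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Swap a row with a nonzero entry in this column into the pivot position.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_quadratic(samples):
    """Least-squares fit of (N, bytes) samples; returns [a, b, c]."""
    rows = [[n * n, n, 1] for n, _ in samples]
    ys = [y for _, y in samples]
    # Normal equations: (X^T X) beta = X^T y, with columns N^2, N, 1.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve(xtx, xty)


def within_budget(samples, bound=(1, 16, 64), extrapolate_to=10_000):
    """True if the fitted curve stays at or under bound[0]*N^2 + bound[1]*N + bound[2]
    at every tested N and at one larger, extrapolated N."""
    a, b, c = fit_quadratic(samples)
    sizes = [n for n, _ in samples] + [extrapolate_to]
    return all(
        a * n * n + b * n + c <= bound[0] * n * n + bound[1] * n + bound[2]
        for n in sizes
    )


# Illustrative measurements: one submission uses N^2 + 8N + 32 bytes,
# another uses 2N^2 bytes.
ok = [(10, 212), (100, 10832), (500, 254032), (1000, 1008032)]
too_big = [(10, 200), (100, 20000), (500, 500000), (1000, 2000000)]
print(fit_quadratic(ok))       # [Fraction(1, 1), Fraction(8, 1), Fraction(32, 1)]
print(within_budget(ok))       # True
print(within_budget(too_big))  # False
```

Exact fractions keep the toy fit free of rounding noise; a real grader would more likely use floating-point least squares and allow some slack for measurement jitter.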
One exercise specified that a data structure used ints, but the grader only tested it on values that could be stored in a short. The extra spare memory allowed for an alternate implementation using a more complicated data structure.
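To make the loophole concrete: if the memory budget was sized for ints but every test value fit in 16 bits, packing values into shorts typically halves the per-element cost, and the difference is free for other bookkeeping. A minimal Python sketch of that arithmetic, assuming typical C type sizes. The course's language and the exercise's actual data structure aren't stated, and `fits_in_short` and `spare_bytes` are hypothetical helpers.

```python
from array import array

# Typecode "i" is a C int (usually 4 bytes); "h" is a signed C short (usually 2 bytes).
INT_SIZE = array("i").itemsize
SHORT_SIZE = array("h").itemsize


def fits_in_short(values):
    """True if every value fits in a signed 16-bit short."""
    return all(-32768 <= v <= 32767 for v in values)


def spare_bytes(values):
    """Bytes freed by storing the values as shorts instead of ints."""
    return len(values) * (INT_SIZE - SHORT_SIZE)


# Illustrative test data that, like the grader's, never leaves the short range.
test_values = [3, -17, 1024, 30000]
if fits_in_short(test_values):
    packed = array("h", test_values)  # array() raises OverflowError if a value does not fit
    print(INT_SIZE, SHORT_SIZE, spare_bytes(test_values))
```

The saving only exists because the tests never exercised the full int range; a single test value outside ±32767 would have exposed the packed representation.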
The point of the exercise was to figure out the somewhat trivial optimization that allowed the simpler data structure to run in roughly the same time, not to learn the more complicated data structure. After the more complicated solution was posted to the forums, there was really no incentive for people to figure out the simpler method.