What's the benefit of this over, for example, MagicSchool, which my school is continually pushing on us and which is free?
This next bit is going to sound a bit harsh or confrontational, but trust that I'm coming at it in good faith. I'm a bit on edge about pushing AI for grading. Teachers are... not technically literate folk. They won't understand the nuances of AI, and using it to grade student work seems like heartache and disaster in the making. Are you doing anything to somehow work around the fundamental nature of LLMs to make them suitable for grading?
Not OP, but I work on AI in higher ed at a major university.
I get the concerns about AI grading. The solution isn't to have AI grade entire assignments at once. Instead, break down the assessment into smaller, discrete tasks and develop a grading rubric around those. The idea is to limit how the AI can respond - usually to simple binary choices like completed/not completed, true/false, etc. (Also, the models have been RLHF’d to generally put a positive spin on things, so if anything they’re likely to be overly generous in assessment.)
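For illustration, here's roughly what that decomposition might look like, a hypothetical rubric for a short-answer assignment where every item is reduced to a single narrow judgment (all item names and wording here are made up, not from any real course):

```python
# A hypothetical decomposed rubric: each item is one discrete, narrowly
# scoped judgment rather than a holistic grade of the whole assignment.
RUBRIC = [
    {
        "id": "thesis_present",
        "prompt": "Does the response state a clear thesis?",
        "kind": "boolean",  # graded completed / not completed
    },
    {
        "id": "cites_two_sources",
        "prompt": "Does the response cite at least two of the assigned sources?",
        "kind": "boolean",
    },
    {
        "id": "explains_mechanism",
        "prompt": "Does the response explain the mechanism given in the answer key?",
        "kind": "enum",  # graded Correct / Partially Correct / Incorrect
    },
]
```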
From there, provide the AI with the answer key, student response, rubric, and any other necessary context, then use the Structured Outputs API to force consistent responses for each discrete task. I've had the most success using boolean values or simple enums (like "Correct", "Partially Correct", "Incorrect"). You can include a field for reasoning, then chain AI calls to get a second assessment as verification.
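Concretely, with the OpenAI Python SDK that could look something like the sketch below. The Pydantic model is the enforcement mechanism: the Structured Outputs parse call constrains the response to the schema, and putting the reasoning field before the verdict makes the model justify before it judges. The model name, prompts, and function names are my own placeholders, not anything the parent poster specified:

```python
from enum import Enum

from openai import OpenAI
from pydantic import BaseModel


class Verdict(str, Enum):
    CORRECT = "Correct"
    PARTIALLY_CORRECT = "Partially Correct"
    INCORRECT = "Incorrect"


class TaskGrade(BaseModel):
    reasoning: str  # listed first so the verdict is generated after the justification
    verdict: Verdict


client = OpenAI()


def grade_item(rubric_item: str, answer_key: str, student_response: str) -> TaskGrade:
    """Grade one discrete rubric item against the answer key."""
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",  # any Structured Outputs-capable model
        messages=[
            {"role": "system", "content": (
                "You are grading exactly one rubric item. Judge only this item, "
                "comparing the student response against the answer key."
            )},
            {"role": "user", "content": (
                f"Rubric item: {rubric_item}\n"
                f"Answer key: {answer_key}\n"
                f"Student response: {student_response}"
            )},
        ],
        response_format=TaskGrade,  # schema is enforced, not just requested
    )
    return completion.choices[0].message.parsed


def verify_grade(rubric_item: str, answer_key: str,
                 student_response: str, first: TaskGrade) -> TaskGrade:
    """Chained second pass: independently confirm or overturn the first verdict."""
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {"role": "system", "content": (
                "You are auditing another grader's verdict on one rubric item. "
                "Confirm or overturn it based on the answer key."
            )},
            {"role": "user", "content": (
                f"Rubric item: {rubric_item}\n"
                f"Answer key: {answer_key}\n"
                f"Student response: {student_response}\n"
                f"First verdict: {first.verdict.value}\n"
                f"First reasoning: {first.reasoning}"
            )},
        ],
        response_format=TaskGrade,
    )
    return completion.choices[0].message.parsed
```

One reasonable policy on top of this: if the two passes disagree on an item, route that item to the teacher instead of auto-grading it, so the human stays in the loop exactly where the model is least certain.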