
Having had some experience teaching, designing labs, and evaluating students, in my opinion there is basically no problem here that can't be solved with more instructor work.

The problem is that the structure pushes for teaching productivity which basically directly opposes good pedagogy at this point in the optimization.

Some specifics:

1. Multiple choice sucks. It's obvious that written response better evaluates students and oral is even better. But multiple choice is graded instantly by a computer. Written response needs TAs. Oral is such a time sink and needs so many TAs and lots of space if you want to run them in parallel.

1.5 Similarly, having students do things on computers is nice because you don't have to print anything, and even errors in the question can be fixed live: you just ask students to refresh the page. But if the chatbots let them cheat too easily on computers, doing handwritten assessments sucks because you have to go arrange for printing and scanning.

2. Designing labs is a clear LLM tradeoff. Autograded labs with testbenches and fill-in-the-middle-style completions or API completions are incredibly easy to grade. You just pull the commit before some specific deadline and run some scripts.

You can do 200 students in the background while doing other work, it's so easy. But the problem is that LLMs are so good at fill-in-the-middle and at making testbenches pass.

I've actually tried some more open-ended labs before, and it's very impressive how creative students are. They are obviously not LLMs: there is a diversity of thought and a simplicity of code that you do not get with ChatGPT.

But it is ridiculously time consuming to pull people's code and try to run open ended testbenches that they have created.

3. Having students do class presentations is great for evaluating them. But you can only do like 6 or 7 presentations in a one-hour block, so you will need to spend like a week on them even in a relatively small class.

4. What I will say LLMs are fun for is having students do open-ended projects with faster iterations. You can scope-creep them if you expect them to use AI coding.





I know a teacher who basically only does open questions, but since everything is digital nowadays, students just use tools like Cluely [0] that run in the background and provide answers.

Since the testing tool they use does notice and register 'paste' events, they've resorted to simply assigning 0 points to every answer that was pasted.

A few of us have been telling her to move to in-class testing etc., but as you also note, everything in the school organization pushes for teaching productivity, so this requires convincing management / the school board etc., which is a slow(er) process.

[0] https://cluely.com/


> Written response needs TAs.

Can AI not grade written responses?


I tried that once, specifically because I wanted to see if we could get some sort of productivity enhancement.

I was using local LLMs in the 4B to 14B range; I tried Phi, Gemma, Qwen, and Llama. The idea was to prompt the LLM with the question, the answer key/rubric, and the student answer. Putting the student answer at the end allowed prompt caching of the shared prefix, which made it much faster.

It was okay but not good. There were a lot of things I tried:

* Endlessly messing with the prompt.

* A few examples of grading.

* Messing with the rubric to give more specific instructions.

* Average of K.

* Think step by step, then give a grade.

It was janky, and I'll chalk it up to local LLMs at the time being somewhat too stupid for this to be reasonable. They basically didn't follow the rubric very well. Qwen in particular was very strict, giving zeros regardless of the part marks described in the answer key, as I recall.
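The scaffolding around the model was roughly the following sketch: the static parts (question, rubric, few-shot examples) go first so the runtime can cache that prefix, the model is asked to end with a parseable "GRADE: x" line, and average-of-K smooths over sampling noise. The prompt wording and helper names here are illustrative, and the actual model call is left out:

```python
import re
import statistics

def build_grading_prompt(question, rubric, student_answer, examples=()):
    """Static parts first so a local runtime can cache the prefix;
    only the student answer at the end changes between students."""
    parts = [
        "You are grading a short written response.",
        f"Question:\n{question}",
        f"Answer key / rubric:\n{rubric}",
    ]
    for ex_answer, ex_grade in examples:  # a few worked grading examples
        parts.append(f"Example answer:\n{ex_answer}\nGRADE: {ex_grade}")
    parts.append(
        "Think step by step, then end with a line of the form 'GRADE: <number>'."
        f"\n\nStudent answer:\n{student_answer}"
    )
    return "\n\n".join(parts)

def parse_grade(completion):
    """Take the number off the model's last 'GRADE: x' line, if any."""
    nums = re.findall(r"GRADE:\s*(\d+(?:\.\d+)?)", completion)
    return float(nums[-1]) if nums else None

def average_of_k(completions):
    """Average over K sampled completions, dropping ones where the
    model ignored the output format."""
    grades = [g for g in map(parse_grade, completions) if g is not None]
    return statistics.mean(grades) if grades else None
```

The failure mode described above shows up exactly here: if the model ignores the rubric (or the "GRADE:" format), no amount of parsing and averaging rescues it.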

I'm sure that with the right type of question, the right prompt, and a good GPU it could work, but it wasn't as trivially easy as I had thought at the time.


I would try it now with GPT-5.1.

You still need someone to check the grades. AI can and will totally misgrade.


