CodeAid: A classroom deployment of an LLM-based coding assistant (austinhenley.com)
56 points by jermaustin1 8 months ago | 38 comments



> Direct code solution queries (44%) where students asked CodeAid to generate the direct solution (by copying the task description of their assignment).

Did these solutions' scores get penalized for lack of real understanding? Or, to put it another way, is your class about teaching programming itself, or about teaching how to solve problems using any tool available (including an AI that solves it for you)?


No. Students' usage was anonymized, so the course instructors did not know who used the system in what way. This was to make sure that students could use the tool freely without feeling like the instructors were watching their usage.


OK, cool. I hadn't realized it was designed as an experiment. So that's more something potential users might want to consider when they read about this. Thank you for clarifying.


The explanation shown for *word = "hello" is completely incorrect; the memmove explanation is also incorrect.
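For reference, here is a minimal sketch of what that line actually does in standard C (the variable names are hypothetical, and this assumes word is a char **, since *word = "hello" would be a type error if word were a plain char *):

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        char *s = NULL;
        char **word = &s;
        *word = "hello";           /* makes s point at the string literal; no characters are copied */
        printf("%s\n", s);         /* prints "hello" */

        /* memmove, unlike memcpy, is well-defined even when source and destination overlap */
        char buf[16] = "abcdef";
        memmove(buf + 2, buf, 4);  /* copies "abcd" over "cdef"; buf is now "ababcd" */
        printf("%s\n", buf);
        return 0;
    }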


> It was developed by Majeed as a web app that uses GPT3.5 to power an assortment of AI features

Really need GPT-4 for technical explanations, but it's also much more expensive.

Every once in a while, ChatGPT logs me out and switches to GPT-3, and I immediately notice just due to the quality of the answer.


I'm the lead author of this paper. Feel free to ask me anything! - MK


What do you think about the ethical implications of using unreliable agents as educators?


The same goes for human TAs, who are used extensively in undergrad introductory programming classes. They can also be unreliable in many cases.

Two things can help here:

1. Provide students with the tools and knowledge to critically verify responses, whether they come from an educator or an AI agent.

2. Build more transparent AI agents that show how reliable they are on different types of queries. Our deployment showed that the Help Fix Code feature was less reliable, while the other features were significantly better.

But totally agree that we should be discussing the ethical implications much more.


> The same goes for human TAs, who are used extensively in undergrad introductory programming classes. They can also be unreliable in many cases

I think one difference is that human TAs can, theoretically, be held accountable for their reliability, whereas holding an LLM accountable is a little more difficult.


My experience as a TA is that students definitely do not have the knowledge to critically verify responses.


Doesn't this make AI agents better? Given that human TAs do make mistakes a LOT, or in many cases are just unprepared (e.g. haven't done the programming assignment themselves).

Human TAs have egos; AI doesn't. With proper tools, you should be able to steer an AI agent.

I think humans and AI agents both have their drawbacks and benefits. That's why the last section of the paper discusses how we even need to teach students (or provide tools) to help them decide where to use AI vs. non-AI tools.


TAs can say things like "I don't know, let's figure it out together." LLMs will spit out false information as easily and confidently as true information.


Humans have their drawbacks?

I have to say, I really urge you to consider the nature of your framing here.

As you can tell, I am quietly (or maybe not-so-quietly) appalled by the zeitgeist around generative AI, but even if I were not, I hope I would still see that this kind of linguistic framing is insensitive and self-defeating if not wholly inappropriate and demeaning if you want to see any co-operation from educators, who are, broadly speaking, tired, dedicated, hopeful people who -- unlike LLMs -- can and do place significant moral and ethical value on teaching.


I'm curious as to how you would frame it instead.


I wouldn't put myself in that position.

When the question is of the form "what do you think about the ethical implications of using AIs?" and the quick answer is "humans also make mistakes" I think the entire premise is on pretty shaky ground.


I think I may have been unclear.

>... I would still see that this kind of linguistic framing is insensitive and self-defeating if not wholly inappropriate and demeaning if you want to see any co-operation from educators

I think this was spot on, and I agree with you. I meant to ask: have you thought about a way to frame it in which educators don't feel threatened but instead excited? From my perspective, there's a great chance to enhance both the teaching and learning experience, as well as the results.


I don’t think I can answer because again I think I would not put myself in that position.

I am going to retreat from this argument because this entire topic has me questioning my planned career change away from development towards training.

If this is what people think about human teachers, what at all is the point?


> The same goes for human TAs, who are used extensively in undergrad introductory programming classes. They can also be unreliable in many cases.

Ehh. Those TAs, if they feel they might be wrong, can consult the lecturer/professor. And if they're not sure, they can just say so.

IMO there is little to no comparison between a bad TA and a confidently-wrong LLM (having been a TA who knew to consult the professor if I felt I was not on solid ground).

LLMs have no experience with teaching, they have no empathy for students grappling with the more challenging things, and they can gain no experience with teaching. Because it's not about spewing out text. It's about guiding and helping students with learning.

For example: can an LLM sympathise or empathise with a cybernetics student who is grappling with the whole conceptual idea of Laplace transforms? No. It can only spew out text with just the same level of investment as if it were writing a silly song about cats in galoshes on the Moon.

I wish we were not in this "well humans also..." justification phase.

It is genuinely disrespectful to actual real people and it's founded on projection.

And in this case, it will also shut down the pipeline of academic progression if TAs are no longer hired.

Why are we doing this to academia when the better approach would be giving TAs better training in actual teaching? More-senior academics doing this kind of research is absolutely riddled with moral hazard: it's not your jobs that are immediately on the line.

ETA: sooner or later, people in the generative AI market should really consider not just saying that we should talk about the ethical implications, but actually taking a stand on them. It's not enough to produce something that might cause a problem, rush it into production and just say "we might want to talk about the problems this might cause". Ethics are for everyone, not just ethicists.


Meanwhile almost every TA I had at uni didn't really want to be there. They were there for their PhD, not as a professor in training which would have made your position more understandable. And to boot they rarely spoke English very well. I had a few TAs that I understood so poorly that I stopped attending their labs.

The TA system feels like a hack where the university gets free labor out of PhD students, but the undergrads suffer for it. I don't think there's much to glamorize. Nor do I think there's much to salvage from the days when you needed to attend office hours to get help. You see it as this critical human experience in uni, but I don't.

That said, half my professors at uni also prob didn't want to teach. They were there for research.


> They were there for their PhD, not as a professor in training which would have made your position more understandable.

Right. Not all TAs become professors. But to a first approximation, all professors have TA experience; it's generally their first experience of teaching.

I was paid for my time as a TA, in the UK. It would be illegal for them not to pay.


LLMs are tools. They're not everything. Yes, they can't sympathize or empathize. But if they can help a student to be more productive and learn at the same time, then I'm all in for designing them properly to be used in such educational contexts... "as an additional tool."

We need both humans and AI. But there are problems with both, so that's why they can hopefully complement each other. Humans might have limited patience, availability, etc., while AI lacks empathy and can be over-confident.

> Why are we doing this to academia when the better approach would be giving TAs better training in actual teaching?

Sure, that is a fantastic idea and some researchers have explored it.

But what's wrong with doing exploratory research in a real-world deployment? In the paper we describe, in a very honest way, both where CodeAid failed and where students and educators found it useful.


> We need both humans and AI.

Genuine question: Why do we need both humans and AI? What's the evidence base for this statement?

I feel this is another thing that proponents state as if it's unchallengeable fact, an all-progress-is-good thing.

I question this assertion. People have become all too comfortable with it.

(Personal opinion: I don't think teaching needs AI at all, and if it does, a traditional simple expert system with crafted answers would still be better. I think there's a staggering range of opportunities for improving teaching materials that don't involve LLMs, and they are all being ignored because of where the hot money goes.)


I think my stance is pretty clear about "utilizing" AI in educational settings. We absolutely don't need AI the same way we need air to breathe. But AI could potentially provide some solutions (and create new problems or have adverse effects as well), so why not explore it properly to find out where it works and where it doesn't?


The statement is false to begin with. We don't have AI yet. Maybe when we have software that is truly intelligent, we can let it teach us. Until then I see this more as a buggy interactive textbook, and I agree with the author's description of it as a tool and disagree with the idea of it as a teacher.


Imagine we could, with the snap of a finger, come up with an AI tutor that is objectively better than human TAs. Better as in: between two groups of 10,000 students, those with AI tutors do better on performance metrics 95% of the time than those students with human tutors. Would you be opposed to replacing human tutors then?

If your answer is yes (in some flavor of "protecting and helping the jobs of those who teach"), I would argue your ethics are focused on the wrong group. Teaching is for students to learn, not for teachers to have jobs.

We don't have said technology yet, but it's reasonable to think we can get close. If there's a good chance to improve how well students can learn, I don't think "teachers don't appreciate it" is a good reason not to do it.


> Would you be opposed to replacing human tutors then?

Yes.

> If your answer is yes (in some flavor of "protecting and helping the jobs of those who teach"), I would argue your ethics are focused on the wrong group. Teaching is for students to learn, not for teachers to have jobs.

OK. But your framing here projects upon me the idea that I'm solely concerned about replacing jobs, in order for your argument to succeed. (Though again, that is the cold-rationalist AI zeitgeist[0]: why should people have jobs when an AI can do it?)

It elides the possibility that it is inherently better to learn from a real person, who has invested time and effort into teaching you. What is the point of higher education in particular if you are not learning, at some point, from people who are directly adjacent to cutting edge thinking?

> We don't have said technology yet, but it's reasonable to think we can get close. If there's a good chance to improve how well students can learn, I don't think "teachers don't appreciate it" is a good reason not to do it.

Well then. I can't argue with this, if you think it's OK to take humanity out of teaching. I think — perhaps feel — you are so wrong that I can barely even string the words together to explain. And that is an unbridgeable divide.

[0] https://www.theatlantic.com/technology/archive/2024/05/opena... or https://archive.ph/AL81B

'In response to one question about AGI rendering jobs obsolete, Jeff Wu, an engineer for the company, confessed, “It’s kind of deeply unfair that, you know, a group of people can just build AI and take everyone’s jobs away, and in some sense, there’s nothing you can do to stop them right now.” He added, “I don’t know. Raise awareness, get governments to care, get other people to care. Yeah. Or join us and have one of the few remaining jobs. I don’t know; it’s rough.”'


> It elides the possibility that it is inherently better to learn from a real person, who has invested time and effort into teaching you.

I did elide it, for the sake of the argument. If it turns out that human tutoring is in fact fundamentally better, then there's of course no point in using an inferior system (sweeping accessibility and other concerns under the rug). Go humans, if we're better!

> What is the point of higher education in particular if you are not learning, at some point, from people who are directly adjacent to cutting edge thinking?

For the subset that do research, this matters a lot. But for most everyone else looking for a better job, it's not really relevant.

> Well then. I can't argue with this, if you think it's OK to take humanity out of teaching. I think — perhaps feel — you are so wrong that I can barely even string the words together to explain. And that is an unbridgeable divide

I appreciate your candidness, and perhaps it's true that we may just not be able to agree. For what it's worth, my bet is that tutors' quality will improve rather than tutors being displaced. My point, however, is: I want my kid to learn as well as possible. If that turns out to be with a robot, I'm not making my kid worse off to save some guy's job.


The TAs in my undergraduate intro to programming class were very knowledgeable and reliable, but that is a sample size of 1.


The grad student teachers and TAs in my math courses -- including discrete math -- were at best ambivalent toward us lesser Computer Science students and at worst under-trained and contemptuous.

University of Oregon ~2014ish


I agree with this. I keep trying to instill paranoia in the younger people I work with. Even if you can see that the code is doing set_x(5), if it's crashing 20 lines down, I want you to either print or breakpoint the code here and really prove to me that x is now 5 before I look any further. Sometimes set_x() might not do what you think. Other times there might be something stomping on it between here and there. But I want to be absolutely sure: I don't share your faith in the documentation, I don't trust my own eyes to read the code, I just want to be 100% sure.
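Something like this, as a toy sketch (set_x and x here are just hypothetical stand-ins for whatever the real code does):

    #include <assert.h>
    #include <stdio.h>

    static int x;

    /* hypothetical setter standing in for the set_x(5) in the code under suspicion */
    static void set_x(int v) { x = v; }

    int main(void) {
        set_x(5);
        /* don't take it on faith: print or assert that the value actually landed */
        printf("x after set_x: %d\n", x);
        assert(x == 5);
        /* ...the 20 lines that later crash would follow here... */
        return 0;
    }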


Right. So can an LLM convey that paranoia?

The way a formal methods lecturer explained to me his concerns about the Y2K problem by talking about the embedded systems in the automated medication pumps treating his sick partner, and how without an MMU and code that could not be inspected, there was a non-zero chance that rolled-over dates would cause logging data to overwrite configuration data?

Can an LLM convey a bit of anger and fear when talking about Therac-25?

Even though a TA is often at a much lower teaching level than this, every single person who has ever learned anything has done so with the benefit of a teacher who "got through to them" either on a topic or on a principle.

It's bonkers to compare TAs and LLMs simply on their error rate, when the errors TAs make are of a _totally_ different nature to the errors LLMs can make.


Oh, my point was that somebody has to strike the fear of god into them first, before they start trusting the LLM blindly. I know the LLM can fake this kind of thing, especially if you put in a prompt that forces "as an LLM, I'm probably going to shoot you in the foot randomly," but I'm sure they'll get used to ignoring it.


> Oh, my point was that somebody has to strike the fear of god into them first, before they start trusting the LLM blindly.

We agree on that :-)



That article is borderline rambling, and I don't see how it applies to asking this question.


First off, congrats! Cool paper! Q: did you measure the correlation between CodeAid use and test scores? When educational YouTube videos took off, people liked them a lot, but it turns out people weren't really retaining very much information. You'd think this wouldn't be the case for CodeAid, but it's important to check.


Thanks!

> did you measure the correlation between CodeAid use and test scores?

Unfortunately, we couldn't measure such correlations due to many external factors that can impact students' performance on test scores.

However, our previous research specifically compared students learning Python for the first time with and without LLM code generators. Austin and I wrote a summary of it here: https://austinhenley.com/blog/learningwithai.html

We found that students who performed higher on our Scratch programming pre-tests (before starting the Python lessons), performed significantly better if they had access to the code generator. However, the system we used in that study was very different: it showed the exact code solution to the student's query. Here with CodeAid, we were trying to avoid producing direct code solutions!


> Unfortunately, we couldn't measure such correlations due to many external factors that can impact students' performance on test scores.

Fair. Guess we'll have to roll it out to increase that sample size :)

> We found that students who performed higher on our Scratch programming pre-tests (before starting the Python lessons), performed significantly better if they had access to the code generator.

Interesting!

From the paper:

> Additionally, students in the Codex group were more eager and excited to continue learning about programming, and felt much less stressed and discouraged during the training

This by itself could reap great benefits years down the line. My wife and I have felt the same way at work!



