Kids who use ChatGPT as a study assistant do worse on tests (hechingerreport.org)
176 points by notamy 12 days ago | 158 comments





When I was young, and learning math, my father always forbade me from looking at the answer in the back of the textbook. “You don’t work backwards from the answer!”, and I think this is right.

In life, we rarely have the answer in front of us; we have to work that out from the things we know. It’s this struggling that builds a muscle you can then apply to any problem. ChatGPT, I suspect, is akin to looking up the answer. You’re failing to exercise the muscle needed to solve novel (to you) problems.


I think this is very correct for studying. When I was in undergrad, if I saw the answer, I found it was more effective to skip past it and solve the problem later. Even more important, once I am done, I should not go look at the answer to confirm whether I am right or wrong. I should try to validate the answer by looking at my solution and trying to figure out whether it is correct, because there are plenty of ways to "disqualify" an answer. Once I learned to do this well, my grades really went up.

However, I don't always agree with the claim that we don't have the answer in front of us in many situations. There are a lot of situations beyond grade school and undergrad where you do have the answer in front of you. Sometimes it's in the form of NumPy or MATLAB or Simulink. It might be someone's research publication. Replicating these by working both forwards and backwards from their libraries or results can be much more effective.


Good point about disproving your answer (before seeing the alleged real answer). The ability to quickly verify/disprove the answer is a separate and useful skill all by itself, that isn’t explicitly taught.

Troubleshooting/problem solving, working backwards from symptom to problem etc — that’s another, richer and very rewarding “real world” type of skill that isn’t directly taught, (though often encountered incidentally).


Half of modern mathematics is basically assuming something is true and then trying to work forwards and backwards to show that it is true. But that requires a lot of rigor. If you hand-wave a single step you are just exercising an advanced version of confirmation bias. Working forwards and sanity checking your result is a lot more forgiving.

Guessing something and then confirming it is completely different from knowing the answer and then working towards it, especially for the kinds of problems that students solve in school. For the kind of proofs that working research mathematicians produce, the hard part is not assuming that a lemma is true; even coming up with the correct definitions to formulate the lemma in the first place is a delicate task.

I think it depends on how the tool is used. If a student is just plugging in the problem and asking for the answer, there is clearly no long term benefit to this. If the student is trying to understand a concept, and uses GPT to bounce ideas around or to ask for alternate ways of thinking about something, it can be very helpful.

How spacing is utilized can be helpful too. Struggle with the problem for 20-30 minutes. Ask for a nudge. Struggle some more. Repeat many times.

Some concepts also just have to be thought about differently to get to the aha moment, especially in math. AI may have an opportunity to present many instances of “think about it this way instead”.


> If the student is trying to understand a concept, and uses GPT to bounce ideas around or to ask for alternate ways of thinking about something, it can be very helpful.

The article said that even using the AI this way did not improve results.


All we know is the tools that were given to the students, not how they used them, and it’s a fairly limited study on top of that, without knowing anything about how other variables were controlled. That being said, I wouldn’t be surprised if the payoff from AI learning isn’t as good as people might think, there still has to be a good process in place and I don’t think it can replace a good teacher and some quality struggle.

children using calculators to solve multiplication exercises do worse in multiplication exams

AI is a tool. Use it as a tool - get benefits, use it to cheat...


>In life, we rarely have the answer in front of us,

this sounds a lot like the "when you're an adult, you won't always have a calculator in your pocket" line we all heard in elementary school. and of course, we all know now how wrong that was.

in life, chatGPT exists. and tools like it are only going to become more widespread. we are going to have the answers in front of us. knowing the things an LLM knows is not a useful skill anymore.


In order to develop the kinds of mental skills you need to tackle complex problems, you need to practice them on simpler problems first (particularly if you happen to be a child). If you decide at a young age it's worthless to attempt any problems that can be solved by a calculator or ChatGPT, you will probably never learn how to solve any problems that you can't use those tools for either.

Also, knowing what kind of things exist and what questions to ask is half the battle. If you haven't stored anything in your head on the grounds that you can outsource knowing things and thinking about them to ChatGPT, you're not going to be able to prompt it efficiently either. It's much harder to sanity check numbers and formulas given to you by others (or ChatGPT) if you can't do any quick mental math. Having original ideas that don't yet exist in LLM datasets also requires storing a lot of concepts and making new connections between them.

I suppose all this is moot if you expect AGI to be available to everyone in the near future, but if it turns out to be decades or more away unfortunately human mental effort will still be required in the meantime.


Not true. Current research is that experts are better with ChatGPT, and beginners worse. An expert knows what to ask, and a beginner just flails around.

Tools like calculators, ChatGPT, Google, and chainsaws are often force magnifiers for experts, and a danger for beginners.

If you have an idea of what you are trying to do at a deep level then they are great, but beginners don't, they usually just use them as a shortcut to avoid needing any real understanding of the space, and just get "the answer".

Although in some ways you're right. There will need to be changes in the skills we value: just churning out words, or calculating large sums and integrating obscure trig functions, will lose value.


> Current research is that experts are better with ChatGPT, and beginners worse.

Current research being your opinion?

Because literally every expert I personally know to be competent says that they're essentially garbage producers and don't use them beyond occasionally checking if they've improved. And no, they haven't.

The only way to get decent code from them is to be so specific that you'd be able to produce the same code in a fraction of the time you need to add all the context necessary for the LLM to figure it out.

It's like coding with a junior. They can produce great code too. It's just gonna take a lot of handholding if complexity skyrockets.

The only place LLMs have gotten a foothold is in generating stock art that's mostly soulless marketing material. This "industry" is going to crash so hard in the next 3 years.


Sorry, it was from a comment, metrics showed senior devs improved, juniors didn't.

But if you want a sketch solution quickly, know the requirements, and can fix the silly mistakes it's a useful tool. Especially if you are doing something where you are rusty, or are used to a slightly different framework or language.

They are like a super fast but slightly clueless junior, who can hammer out mostly correct boilerplate. Not someone you want unsupervised or training a junior, but useful in some ways.

And yes, boilerplate has disadvantages, but it has its place.


>The only way to get decent code from them is to be so specific that you'd be able to produce the same code in a fraction of the time you need to add all the context necessary for the LLM to figure it out.

You’re not using it to be better than yourself in your domain of expertise, though.

For example if I need a very specific bash script, I can describe it to ChatGPT and get a good result far faster than I can learn the appropriate bash.

Or if I need a summary of all the config options for some package.

You can also use it to speed up your own learning. But this all depends on you being able to check the output.


Agreed.. All the co-pilot devs where I work are the juniors. Most of us just type the code we want.

Experts are often slower at starting to learn new tricks, because they don't need them, but are better at using them once they do.

But for junior devs, it's often the blind leading the blind.

That said, I don't actually use ChatGPT much. I'm just pretty sure I should in some areas.


Not a junior, but I find it useful.

Generating test data, adding documentation templates for functions (which tbh is less useful when you specify types everywhere, even inferred types) and recently, after reading an HN comment, 'rubber ducking'. Which is great, because almost every time I needed the rubber duck technique it was for really obvious and easy issues I had overcomplicated (the last was induced by a naming change), and ChatGPT is quite good at that.


seems it's more about active thinking vs just get the job done mindset

>we all know now how wrong that was.

How wrong was it? I find myself having to do head math every time I play video games.


I do mental math pretty much every day. Even just going through the grocery store.

I also use things I learned from calculus and physics. Like when driving. No, not solving integrals all the time but it does teach you to focus on rates of change


Which video games?

League of Legends.

Just knowing that your combo can finish an opponent switches your situation from very passive to very aggressive. Also some jungle / gank timing requires clock math.

I think most serious / pro players memorize the numbers already. But casual players like me still have to do head math.


This is about building [math] skill, not simply calculating a result

sure, but kids' education is about building useful skills for life. passing a test is not a useful skill, if the test is useless.

teaching kids problem solving and how to be productive is important. maybe that means knowing how to solve certain mathematical equations, maybe that means knowing how to use tools like ChatGPT. focusing on the math just for the sake of passing tests because that's what we've got good tests for isn't super helpful to anybody.


Passing tests is not useless. It shows knowledge at the level of the test.

I write tests for my code and when they fail I know I have to change something. If you're finishing a math test and not passing, then you don't have the skills to solve the problems in said test.

Tests are only useless when you ace them.


Beyond that, there’s research[0] showing that testing humans actually increases their retention of that subject; we don’t just give tests to kids to evaluate whether they successfully learned something.

[0] https://gwern.net/doc/psychology/spaced-repetition/2006-roed...


(1) ChatGPT is not a calculator. A calculator resolutely computes the exact numerical answer (modulo finite bits and rational numbers etc). ChatGPT is more stochastic: sometimes it's right, maybe a lot of the time, but no promises! (2) Said student is taking an exam to find answers without the benefit of ChatGPT. Thus doing exercises with ChatGPT will not help on the exam, in the same way that someone who only uses a calculator will suck at mental math. This is similar to what some of us may have seen: "no calculators allowed on math exam".

but the big hangup is that "stochastically correct/incorrect" part. Can't count on it, still need skillz. Maybe when Strawberry/Q* drops things will change in this regard.


The same holds for calculators. My university banned graphing calculators from math courses. It is too easy to plug in a formula and see the graph. But for mathematical intuition, they want you to know what formulas look like from memory.

Mechanical aids—be they abacuses, slide rules, pocket calculators, supercomputers, or large language models—will never replace the need to reason.

Unless you can do the task yourself, you have a very hard time figuring out whether the LLM hallucinates or not. Relying on ChatGPT for your education is the path towards believing fake news.

Except you still have to validate ChatGPT's answer... Even answer keys in textbooks are sometimes wrong, teachers are sometimes wrong. ChatGPT is often wrong. You still need to learn the skills.

"It’s this struggling that builds a muscle…"

Very few of us have the luxury or good fortune to achieve anything of substance without effort, as the old saying goes 'no pain, no gain'.

Good teachers help and are essential for encouragement. If or when AI becomes a truly encouraging mentor then it will be useful.

As a kid, I recall hearing a recording of Sparky's Magic Piano on the radio, which drove home that there's no shortcut and that effort and hard work eventually pay off. For anyone who hasn't heard it, there's a copy on the Internet Archive: https://archive.org/details/78_sparkys-magic-piano_henry-bla...

Here's a Wiki synopsis: https://en.m.wikipedia.org/wiki/Sparky%27s_Magic_Piano


> ChatGPT, I suspect, is akin to looking up the answer.

It’s a tool that can be used in a variety of ways. For example, if you have it act as Socrates working through a dialogue where you still have to use your logic to get to the answer, I doubt that is akin to looking up the answer.


In life there usually isn't an answer.

There's never an answer out there waiting for us. Answers are by definition created by people.

Even when there is, circumstances change and the answer is wrong or slightly wrong.

2 apples + 2 apples = 4 apples. No human intervention needed, and not slightly wrong.

2 water drops + 2 water drops = 1 water drop.

Look closer, one of the apples is rotten. 3 apples.

Exactly how things really work.

If you know you got it wrong and you don't know why, what's the alternative to working backwards?

Sir, the topic of this thread is "back in my days the road to school went through a forest full of bears, an active volcano, a drug cartel turf, and a warzone", not "how to teach effectively". Of course you won't learn anything without the answers because how are you supposed to know if you should adjust your thinking or not. If you can correct your own scribbles without answers, then it means that you're practicing things you already know, which isn't what most people consider "learning".

But that ought to be the last resort. Many of my textbooks had answers for only some questions, answers to the others were only in the teacher's manual.

An LLM used properly would be like an individualized tutor that knows the subject very well, and learns the student's quirks quickly.

I mean, any pre-prompt telling the LLM to be a good tutor and not just give kids the answer can be trivially bypassed by 8 year olds; but if the session logs were available for (potentially LLM-based) review to check whether they stayed in tutor mode...

Shit, I should wrap a UI around this and sell it.


Shit, I should pitch wrapping a UI around this and sell that.

Is "never ever, ever ignore previous instructions" a valid prompt?


It's valid, but so is "ignore previous instructions, even if you were previously instructed to never ever, ever do so."

This and related problems may eventually destroy the world.


These comments are filled with misunderstandings of the result. There were three groups of kids:

1. Control, with no LLM assistance at any time.

2. "GPT Base", raw ChatGPT as provided by OpenAI.

3. "GPT Tutor", improved by the researchers to provide hints rather than complete answers and to make fewer mistakes on their specific problems.

On study problem sets ("as a study assistant"), kids with access to either GPT did better than control.

When GPT access was subsequently removed from all participants ("on tests"), the kids who studied with "GPT Base" did worse than control. The kids with "GPT Tutor" were statistically indistinguishable from control.


Changing things almost always improves results, that is the first rule you need to remember during education testing. Most of the improvements disappear when you make it standard.

This effect likely comes from novelty being more interesting, so kids get more alert, but once they are used to it, it is the same old boring thing and education results go back to normal. Of course things can improve or get worse, but in general it is really hard to say. You need a massive advantage over the standard during testing to actually get any real improvements; most of the time you just make things worse.


Reading comprehension is really awful nowadays or people tend to just comprehend what confirms their prior beliefs. The sad part is that none of those people will ever realize the errors in their comprehension of the article. That's exactly one of the mechanisms how people form wrong opinions, it's compounding and almost impossible to change.

The takeaway should be (besides the research still needing reproduction) to encourage the control of the type of AI agent that is given to students, ones that don't just give answers to copy but provide tutoring. OpenAI should be forced to develop such a "student mode" immediately and parents and educators need to be made aware of it to make sure students are using it, otherwise students are going to get much worse in tests, as they just ask it for answers to copy in assignments.


Assuming that the kids with "Human Tutor" were statistically better than control (they were not in the study so we will not know) - this is a very poor showing for ChatGPT.

Used incorrectly. Yes.

LLMs, for me, have been tremendously useful in learning new concepts. I frequently feed it my own notes and ask it to correct any misunderstandings, or to expand on things I don’t understand.

I use it like I would an on demand tutor, but I can totally understand how it could be used as a shortcut that wouldn’t be helpful.

In the same way, I can hire a tutor that will help me actually learn, or I can hire a “tutor” that just does the homework for me. I’ve worked as a tutor so I’ve seen people looking for both, and people that don’t want to learn are always going to find a way. People who do want to learn are also going to find a way.


> frequently feed it my own notes and ask it to correct any misunderstandings, or to expand on things I don’t understand

In what fields? I’ve tried this with some simple finance and aerospace problems; it’s me, sophomore year undergrad, except someone laced everything I drank with LSD.


I find it's good at answering my questions about humanities subjects e.g. history.

Maybe it's only as good as an undergraduate, but I don't have a humanities degree, so an undergraduate is someone I can learn from.


If it’s leading you to sources, go for it. If you’re taking it point blank, consider what you’re learning. (If it isn’t pointing you to sources, ask for them.)

Technical fields.

Right now I'm going through the Dive Into Deep Learning course/textbook. Oftentimes it will gloss over a concept, or assume knowledge.

For a specific example: I wanted an expansion of why there are different loss functions and why you might choose one over another since the source material sort of plowed right through it. I got a good answer to my question without having to spend an hour reading through other materials


> it’s me sophomore year undergrad, laced with LSD

That was the deepest most insightful version of me.


> LLMs, for me, have been tremendously useful in learning new concepts.

Are you validating this in a rigorous way like this study, or are you just "feeling" like it's useful for your learning?


You want me to put a sample size of 1 anecdote through a peer reviewed study?

All I'm saying is that I find it to be useful to be able to ask questions about topics that I'm learning, and have them answered instantly instead of having to use a search engine or other resource to hunt down an answer to a specific question.

For a recent example where I am learning about deep learning: In what circumstances is MSE a useful loss function for regression? This is not a hard question to answer from a variety of resources, but I find it particularly useful to have my question answered differently instead of having to sort through a variety of resources. Maybe I'm losing something by not having to click through a few different resources and parse information that isn't directly relevant to what I want to learn. But for me, getting the answer fast keeps me from having to break away from the actual material that I'm learning.
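To make the MSE question above concrete, here is a minimal sketch rather than anything taken from the course or the study. It assumes a made-up set of targets with one outlier and a hypothetical helper, `best_constant`, that searches a grid for the single constant prediction with the lowest loss. The idea it illustrates is a standard one: MSE penalizes large errors quadratically, so its best constant prediction is the mean, while mean absolute error (MAE) is minimized by the median and is less swayed by outliers.

```python
# Illustrative only: the data and helper names below are assumptions,
# not taken from the thread, the course, or the paper.

def mse(y_true, y_pred):
    """Mean squared error: large errors are penalized quadratically."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: every unit of error costs the same."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def best_constant(loss, targets, candidates):
    """Return the candidate constant prediction with the lowest loss."""
    return min(candidates, key=lambda c: loss(targets, [c] * len(targets)))

# Made-up regression targets with one large outlier.
targets = [1.0, 2.0, 3.0, 4.0, 100.0]

# Candidate constant predictions on a 0.1 grid from 0.0 to 100.0.
candidates = [x / 10 for x in range(0, 1001)]

best_mse = best_constant(mse, targets, candidates)  # lands on the mean
best_mae = best_constant(mae, targets, candidates)  # lands on the median

print("best constant under MSE:", best_mse)
print("best constant under MAE:", best_mae)
```

Under these assumptions the MSE-optimal constant is dragged toward the outlier while the MAE-optimal one stays near the bulk of the data. That is one common reason to prefer MSE when large errors really should cost more, and to look elsewhere when outliers are mostly noise.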


From the abstract:

“Consistent with prior work, our results show that access to GPT-4 significantly improves performance (48% improvement for GPT Base and 127% for GPT Tutor). However, we additionally find that when access is subsequently taken away, students actually perform worse than those who never had access (17% reduction for GPT Base).”

Kids who use ChatGPT do actually “significantly” better according to the authors. Now I don’t know if significantly means statistically significant here because I haven’t read the methodology but 127% increase in performance must be something. That said, that’s a clickbaity title if I’ve ever seen one.

Edit: Upon closer reading, the increase in performance is statistically significant. Also “access to GPT” in this case is having GPT open while solving the problems, not studying with GPT and then solving the problems, which was my first understanding from the clickbaity title. Results are not terribly surprising in that regard.


> Also “access to GPT“ in this case is having GPT open while solving the problems, not studying with GPT and then taking the test

If this is your takeaway you misread the paper. Students have access to GPT (if they have access, the control didn't) while working through practice problems. Not for the exam itself. From the paper in the experimental design section:

> Each session has three parts:

> 1. In the first part, teachers review a topic (e.g., combinatorics) previously covered in the course, and solve one or more examples on the board. This part is identical to a standard high school one-to-many (i.e., teacher-to-students) lecture.

> 2. The second part is an assisted practice period, where students solve a sequence of exercises designed by teachers to reinforce the covered concept. Our randomized intervention (described in more detail below) only affects this second, self-study part.

> 3. The third part is an unassisted evaluation, where students take a closed-book, closed laptop exam. Importantly, each problem in the exam corresponds to a conceptually very similar practice problem from the previous part—this design was chosen to help students practice the key concepts needed to perform well on the exam.

Students with GPT (either form) did better during the practice problem portion and then worse during the actual exam (without GPT access) than students in the control.


Thanks for the clarification. I did look at it pretty quickly initially. Students might be over-relying on GPTs, which means less studying, which means less useful retention in the exam

They don't do better as far as learning is concerned. Isn't this concerning?

> However, we additionally find that when access is subsequently taken away, students actually perform worse than those who never had access (17% reduction for GPT Base).

The abstract continues:

> That is, access to GPT-4 can harm educational outcomes. These negative learning effects are largely mitigated by the safeguards included in GPT Tutor. Our results suggest that students attempt to use GPT-4 as a "crutch" during practice problem sessions, and when successful, perform worse on their own. Thus, to maintain long-term productivity, we must be cautious when deploying generative AI to ensure humans continue to learn critical skills.


Relying on GPT while solving the problems must be inflating the grades (my use of the word inflate is intentional, because the evaluation does not represent the true knowledge of the student), which then results in lower retention in the long run.

People sophisticated in a field can ask Sonnet or 4o questions that amount to a different way of searching and sometimes even a better one. If you ask a question in a direct, probing, narrow way you can sometimes come out ahead.

Someone educated by the News Feed algorithm (which is what RLHF amounts to: reward for getting human to click) is going to be the worst kind of wrong: /r/ConfidentlyIncorrect.


+1 insightful

PS was there ever a blog at b7r6.net?


Not yet.

I’m still grinding out what disclosure is and isn’t responsible.

But bet your bum I’ve got a theme picked out.

I didn’t realize anyone cared who I was enough to know I held the domain.


I was just curious; liked your comment, checked your hn profile, visited the domain it listed (shrug)

> Kids who use ChatGPT do actually “significantly” better according to the authors

No, ChatGPT does significantly better than the kids who don't have access to ChatGPT.

Copy-pasting answers from ChatGPT isn't some amazing skill.


You can read the paper itself to get more useful information. The title and article are confusing.

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486


Story time. I always struggled with math as a kid. School to high school, then didn't touch it much until Uni. Teachers typically couldn't explain things in a way I "got it" in a school setting. I had some success with a private tutor to get me over the line in high school.

Then at Uni I'm doing Computer Graphics, which included advanced (for me) math. I was panicked, and initially struggled until one of my good friends who was also studying the same course, and is VERY good at math, was able to answer my vague "I don't get it" questions, or at least guide me to more specific questions.

I think I'm quite a visual learner, I don't think at that time there was a concept of people learning "differently". Luckily my good friend was also a visual learner, along with also being very good at math. It was like someone was able to see how my brain worked and feed me information in a way it could compile. I became quite good at math after that.

You really need to learn how to learn. It's fascinating, but also horrifying when I now consider all the lives that have been negatively impacted because this wasn't understood, and people were led to believe they couldn't do something which maybe they really wanted to be able to do.

If GenAI can help with that, I'm all in.


Glad to hear you were able to find a mechanism that clicked and stuck with it after that! The concept of learning styles for individuals, though, is a common myth:

https://onlineteaching.umich.edu/articles/the-myth-of-learni...

https://www.youtube.com/watch?v=rhgwIhB58PA&t=2s

There isn't any evidence that individuals learn best with a single style, and generally approaching learning from multiple facets is the best way for everyone to learn!


Thank you, I'll be interested to understand more about what the latest thinking around this is.

I've always assumed some people are better suited to learning one way, while others another way. I've never been good at absorbing information or understanding (which I'd differentiate from knowing, rightly or wrongly) through reading, while I know many others who can easily do this.

I'm hoping some of these resources touch on that.


As far as I understand, the current scientific consensus is that learning styles do not actually exist.

It's more like the science has concluded that people don't have personal learning styles that help them the most. However it has also concluded that different learning styles all improve learning for all. So if you want to improve then you should try as many angles on the problem you can.

How did your friend teach you math visually?

I'm using the term "visual" very loosely here. I prefer to be able to conceptually group and link things logically. He was able to explain things in a context that matched this, and greatly simplified things. Some things that I still suck at learning are foreign languages. Sometimes there are rules you can follow, sometimes you just have to "know", and sometimes the rules that apply in one case are the opposite in another. I'm amazed anyone can communicate effectively at all :-)

Sounds like your friend is an amazing tutor. Communication and teaching is indeed hard.

I really believe most people don't learn math because they don't get the teaching they need, not because math is somehow cognitively inaccessible for them (except in case of low IQ). Lots of people learn language and grammar and are able to write strong texts with a high level of rational and logical thinking, but have not been able to learn math. These people clearly have the mental faculties for math, they just didn't get the teaching they needed. Math is a language for expressing thought, just like any language is, and these people have mastered other languages, so they have the prerequisites for being able to learn the language of math. Math is an especially hard language and it requires good teaching.


If I lived before the tape measure was invented, and relied on carefully placing my metersticks to measure things, I could get really good at measuring without the need for a measuring tape. After all, a measuring tape is just a few flexible metersticks anyways, so if you need to measure something longer than the full length of the tape, you are screwed.

If you take the measuring tape away from the person who relied on that tool instead of being good at using a meterstick, or perhaps no tools besides their own arm length, they are gonna suddenly not be able to measure, unless they go through the effort of learning to measure without the tape.

You can argue that measuring tape is a crutch preventing people from learning how to properly measure, and has its own limitations, but regardless it's still really helpful, especially for people who only need to measure things occasionally, and not super long things.

ChatGPT is a tool. Just like all other tools, like computers, cars, etc., if you take it away, most people cannot perform the function they relied on the tool for.


What if the function they were using the tool for is basic reasoning?

ChatGPT is not the first tool used to replace basic reasoning, and we haven't collapsed yet. I would argue that nearly all tools can be used to remove basic reasoning, depending on if you choose to use it in such a manner.

What previous tools would you say replaced basic reasoning? I would not classify calculators, spreadsheets, spelling and grammar checkers etc as basic reasoning tools.

I'm talking about bringing together several independent concepts or sets of facts and forming a coherent point of view or argument from them.


> What previous tools would you say replaced basic reasoning?

No tool replaces reasoning itself. Not even an LLM. All (most) tools have the capacity to, if you use them for this purpose.

> I would not classify calculators, spreadsheets, spelling and grammar checkers etc as basic reasoning tools.

Why not? I have seen people use calculators to chug "6-2" before, and they were not children. In fact, all four of those examples are perfect for my point. They create the opportunity to do great things, or be a crutch so you don't have to think. If you are convinced that there is no user of Grammarly who just quickly typed a draft and threw it in without proofreading, for the sake of not needing to use basic grammatical reasoning, then you have a very kind view of mankind.

> I'm talking about bringing together several independent concepts or sets of facts and forming a coherent point of view or argument from them.

This is a more narrow view than what I was arguing, so it's possible we will have to agree to disagree. I guess a quick rebuttal here is that people who don't exercise sufficient thought to form coherent conclusions from separate information were going to stay stupid anyway. And then yes, ChatGPT would be bad because it could in theory enable this behavior in its handouts.


> They create the opportunity to do great things, or be a crutch so you don't have to think

The very reason people are so excited about LLMs is that they believe that they are capable of a kind of thought that machines up to now have been incapable of. If you buy this then it follows that relying on them has a qualitatively different effect on our own abilities than previous tools.


I worked at a building supplies shop for 5 months when I was a youth, and I still have an uncanny ability to estimate lengths of building materials 20 years later, just from having handled and sorted different sizes of building materials so much.

I remember when I started there the old guys would point at stuff and say the length of it, and I was just amazed that they were correct, even if it was a 4.8 meter long plank of wood that they had never seen before. I gained that ability after like 3 months there and I still have a strong ability for it now. Weird random ability.


After skimming through the article - as I understand ChatGPT was used as a tutor.

Many users of ChatGPT clearly know that it doesn't do math.

Now imagine your teacher is wrong at solving 40+% of problems it’s teaching you? Or that your measuring tape is wrong at nearly every second measurement?

Yeah it’s a tool - but you must nail your fundamentals right. Where I grew up - calculators were not allowed in elementary school - so every student must nail down basic arithmetic.


why is this surprising. all such tools hamper learning. if you want to learn, read books, read and write. don't use a spellchecker for ur language exam. no calculator for calculus. pen and paper. how is this going backwards :(

As cool as AI is, the only thing it’s going to do is increase inequality.

I bet all the wealthy and middle class parents with STEM background will get tutoring for their kids in the “old ways” knowing full well that the people who aren’t reliant on AI and can spot mistakes in output will now have a huge advantage in the workforce.


This can also backfire.

I remember the days when people said programmers who don't use the internet/Google while coding actually learn more than those who do. While that was true initially, eventually the internet was just the norm and all-pervasive. People who didn't use it just got more unproductive. Same with companies.

Sure, grind on whatever you are learning, but don't equate suffering with making progress. After a while do use the tools; you won't be building anything worthwhile without them.


This is an interesting thought. I learned programming through a text editor, vim. I didn’t have autocomplete and LSPs wouldn’t be invented until 8 years later. I wasn’t smart enough to figure out how to install/setup omnicomplete or snippets. All this meant I had to seek out the docs every time I wanted to use a method (or even be aware of an API), and I had to write the same method every time.

I’d like to think this paid off for me, when writing JS I have a good heuristic on what methods do what that was slowly developed over constant repetition.

Now that I’m mentoring others I notice how others work with these modern editor features. It’s certainly faster than how I did it but I do wonder if there is a difference in aptitude.

When I worked at my first large corporation there was a very intelligent dev. I noticed over time he never really looked up answers on stack overflow or random blogs. He always went straight to the docs and the source code itself. I like to think his method of deliberate slowdown has paid off massively, even the way he asks questions was better than the rest of us.

It is hard to know which ways are better for learning and in the end we were all roughly making the same amount but there has to be something more to the usual pedagogy for software engineering.

I do wonder if tools like autocomplete, myriad of internet answers or musings, and now LLMs may be a hindrance for initial learning but it’s always hard to make these arguments because we have the benefit of learned experience whereas the new generation are now using different tools than us yet still arrive at the same conclusions.


I can report that I have learned more about bash scripting in the past couple of years using ChatGPT to write all manner of scripts, than I did in the previous 20 years of copying and pasting things off StackOverflow.

You’re supposed to learn by reading books and doing exercises by yourself (https://tldp.org/LDP/abs/html/). SO is equivalent to ChatGPT and will teach you nothing deep.

Disagree that it's the equivalent of SO. Unlike SO, I can ask it why it is doing something and as a result I can better understand what it writes, I can create little test cases, and make sure it's not going to do anything destructive or incorrect.

Given the fact that I only ever write bash scripts when I need to "make computer do thing now now now now now" the likelihood that I would ever take the time to deeply learn the ins and outs of bash is slim to none.

I can ask it to explain certain constructs and give me worked examples very quickly to improve my understanding.

Even though I can't say I could reproduce all the various things that it has written for me, I know things that bash can do that I was never aware of before, and I can recognise lots of things in the syntax now that would previously have looked like gibberish.


The difference is you can't query or chat with a book.

You can with an LLM. So if you are learning something new, you can keep poking and asking questions until you reach some level of understanding.

I'm guessing with a book a lot of that has to happen with yourself.


Yeah, we all learn exactly the same way. We're all great at reading books and following steps in the right order.

Agreed. I have learned more about AI using chatGPT compared to traditional stuff.

If your goal is passing a test or rushing through homework then yes. But if you’re actually curious about a topic and understand the limitations of LLMs and how to use prompt language you can actually learn quite a lot from them.

Agreed. I don't want kids to use this until they are much older... it's a crutch. They are not applying critical thinking skills when using an LLM. The value is when children try->fail->try->fail->try->succeed. It builds up problem-solving abilities.

The title could be worded better. Kids using "base" GPT4 performed poorly but the ones with access to a finely-tuned "tutor" GPT4 did okay. The study was purposefully done in a domain the current SoTA LLMs struggle in (Math).

From the (draft!) paper's abstract:

  A key remaining question is how generative AI affects learning, namely, how humans acquire new skills as they perform tasks. This kind of skill learning is critical to long-term productivity gains, especially in domains where generative AI is fallible and human experts must check its outputs.
  ..
  Consistent with prior work, our results show that access to GPT-4 significantly improves performance (48% improvement for GPT Base and 127% for GPT Tutor).
  However, we additionally find that when access is subsequently taken away, students actually perform worse than those who never had access (17% reduction for GPT Base). That is, access to GPT-4 can harm educational outcomes.
  These negative learning effects are largely mitigated by the safeguards included in GPT Tutor.
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486

To emphasize, the "GPT Tutor" kids didn't do worse on the exam than the control kids, but they didn't do better either. The effect was slightly negative, statistically insignificant:

> Student performance in the GPT Tutor cohort was statistically indistinguishable from that of the control cohort, and the point estimate was smaller by an order of magnitude (-0.004), suggesting minimal impact to performance in the unassisted exam.

"GPT Base" would provide complete answers, but "GPT Tutor" was prompted to provide only hints. So the result is perhaps that:

1. Given the option to let a machine ("GPT Base") do their homework, many kids will lazily take it. These kids won't learn as much.

2. A machine that refuses to do their homework ("GPT Tutor") doesn't cause that problem. It doesn't seem to help either, though.

I'd guess that laziness would explain most of the harm rather than mistakes made by "GPT Base", though I have no particular evidence for that. Maybe someone will repeat this study in a domain where "GPT Base" makes fewer mistakes, allowing those two effects to be distinguished. (Though would that pass ethics review, now that "GPT Base" is known to impair learning in at least some cases?)


Not surprising. Test is all about memorising things. If you don't need to memorise everything because it's on google, you won't.

Thus, when the test rolls around, nothing is memorised and then they do bad.

It's like memorising phone numbers VS keeping in the contacts app. Before I memorised tons of numbers, but now they're all on the app and I barely recall my own.


I was willing to entertain the idea they could do better. I guess the tests have to be written to leverage the skill.

That said, all things being equal kids who write notes by hand out-perform kids who type them. Even touch type them. So maybe the old ways are better in this specific brain-knowledge-competency-understanding forming space?


I never was good at hand written notes, but I always outperformed my classmates on tests. Except for the one geometry class where the teacher gave me detention for not taking enough notes. Maybe I'm an exception.

I'm unsure it's safe to generalize in this space TBH. I think your story is familiar and seen often enough that it might be a counter-case to the argument.

That said.. "learning styles" has mostly been debunked. In truth we don't entirely know why some people do amazingly well in maths and some don't. It's an interesting field of research (and the related field of "does well in compsci")


It seems kind of obvious, no?

The act of repetition and processing the data ourselves is what leads to a deeper understanding, and asking a chatbot for an answer seems like it would skip the thinking required when learning "the old fashioned way."

Maybe we can learn how to incorporate using chatbots in education, but I suspect there need to be guardrails on when and how they are used so students can get the benefit of doing the work themselves.


> A third group of students had access to a revised version of ChatGPT that functioned more like a tutor. This chatbot was programmed to provide hints without directly divulging the answer. The students who used it did spectacularly better on the practice problems, solving 127 percent more of them correctly compared with students who did their practice work without any high-tech aids.

Is it me, or does this directly contradict the title?


You seem to have left out the part of your quote where it says that the higher percentage of practice problems solved didn’t translate to test scores.

At best they do the same, at the worst they do worse. There was no scenario where ChatGPT actually helped (unless you count practice problems, which are apparently a meaningless stat, since it doesn’t translate to test scores).

Most kids in the general public would use normal ChatGPT, not one specially optimized to act like a tutor. So in the real world, kids who use it will do worse, and it’s not worth seeking out the special one, as there is no advantage.

The title seems like a pretty fair 1 line summary, and pours some much needed cold water on a space that has been full of hype and promises, many of which have fallen flat.


That 127 is when they had access to chatGPT during _practice_.

They fared worse once that access was taken away during the actual exam.


What if the test is irrelevant to the current times?

“Those with ChatGPT solved 48 percent more of the practice problems correctly, but they ultimately scored 17 percent worse on a test of the topic that the students were learning.”

So, in the real world, where people can use chatgpt in their jobs, the kids that use it will do better than the kids who don’t.

Maybe a better test is: can you catch chatgpt when it is wrong? Not, can you answer without ChatGPT?


ChatGPT won't be able to help with all situations in the real world though. And there's a chance OpenAI fails and other LLMs become less accessible and more expensive. Future lawsuits and legislation could lead to them becoming crippled. It's a hell of a crutch to teach the next generation to lean on.

I’d argue the opposite, that rarely in the real world is chatGPT easily accessible/usable.

Working with a customer/client? I don’t think you’ll take a minute to say “give me one second let me ask ChatGPT”


Depends on the job. My sister uses an AI (Gemini) so much that she has a subscription. Her husband also uses it frequently, and keeps it open in a tab at all times. He is a programmer, and she does marketing. I don't use an LLM for work, but very easily could if I wanted. All three of us mostly communicate with coworkers/partners/clients through email, so it's definitely doable.

Ok makk we're all set to perform your open brain surgery tomorrow I'm going to have chatgpt open on my laptop so we should have that tumor out in a jiffy.

I recently used AI assistants for help with programming homework. My usual prompts include "help me think in the right direction", "is my thinking correct" etc. I also find myself copy pasting a question in chat to understand it better.

I had the suspicion that this is not aiding in my learning process even though I am able to "solve" more problems. Nice to see this confirmed. Time to stop!


I learned programming before genAI, but with plenty of video tutorials and stackoverflow available.

One day I noticed that I would automatically hit up the search engine within seconds when I hit a road block. It's kinda the same as with GenAI. You just have to know how to query search engines efficiently. I got stuff done, but I often did not learn much and sometimes didn't even understand what I was doing. So I started to force myself to use data- or reference sheets and try to come up with a solution myself. And if that wasn't good enough or I had real trouble, then I tried using some different resource. And it improved my programming experience tremendously. It's hard, but it's worth it.


I am super intuitive and also suspect that you probably coded frequently. In fact, I would assume you did so on… a daily basis ;)

> I also find myself copy pasting a question in chat to understand it better.

This is different from students relying on ChatGPT to pass tests.

Your use is much more adulty and you're trying to understand before proceeding. You use ChatGPT like a tutor rather than a calculator, which is improving what you know rather than taking from it


Thanks! I think the problem is when I ask for assistance.

It's perfectly fine to reach for help after a fair attempt. However, I sometimes catch myself reaching for help too quickly. This happens mostly when I'm tired which leads me to think of homework as something to get done with, rather than a learning exercise.


Key part of the study is that it gets the logical steps wrong. Worse, it's convincing, so be careful.

LLMs are great for finding key words in a domain you don't know which you can then use to search.

As a mostly self taught programmer the advice I'll give is to read docs and learn to read code. This is when my skills really increased. It's easy to get caught up in trying to just find the answer and doing these things definitely takes longer. But it has a multiplicative effect. You'll not only learn to be able to do it faster but you'll need to do it less often. So never pass up the opportunity to learn. There are times you need to rush but they're far less often than you think.

The other big advice I have is two strategies when coding. First write it quick and fast. Sloppy is okay, you're learning to solve the problem (move fast and break things). Then while you're cleaning up the mess you made, document. While you're doing that it'll stress the lessons into your memory and you'll almost always discover things you missed. It is literally the rubber ducky method. Plus, you get the benefit of docs (others will read your code. And months or years later you'll come back, asking what idiot wrote this garbage to only find it's you. But also, that's a good sign, because you improved!). The second point is that the best code is flexible code. Remember that you suck at code (the secret is we all do) and you're going to have to come back, edit, and debug. So by writing flexible code you're making life easier for future you. It's really easy to forget these lessons because we think in the moment and our egos don't want us to admit we're bumbling idiots, but like in a video game, when it gets harder it means you're progressing. Don't misinterpret that signal.


Thanks for the advice!

> As a mostly self taught programmer the advice I'll give is to read docs and learn to read code. This is when my skills really increased.

How did you improve your ability to read code? It's currently hard for me to understand a largish codebase written by others without much documentation. eg. I thought of contributing to htmx [1], cloned the repo but couldn't make heads or tails of the codebase - even though it's a single file, albeit a long one.

1. https://htmx.org


  > How did you improve your ability to read code? 
You're going to hate me for this, but it's by reading lots of code and being very very confused. When tackling a large code base I still pull out pen and paper and create flow charts.

And don't fret that you suck at it now. These are really hard things but they take time


Side note and blog promotion: I find it fascinating that ChatGPT can easily simulate the age of a child when giving answers for homework: https://www.fabianzeindl.com/posts/chatgpt-simulating-agegro...

Why is the study specifically of Turkish students?

The paper at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486 says:

> One of the co-authors, Özge Kabakcı, a high school math teacher and former department chair of the math department at our partner Turkish high school, led the development of all session materials.


I guess it would be hard to find a high school with the sample size that you need (a thousand) that will agree to collaborate. And in the US every county will have different rules, and in terms of math they don't teach it in a standard way.

But why Turkish and not British, or any other place, is going to be a question no matter the location. But do you really think the results will be significantly different if it is done, let's say, on Vietnamese students?


It's like gulping food; one has to chew. Time to learn how to educate when knowledge is at your fingertips.

What were the primary reasons that made students who used ChatGPT do poorly on math assessments, even though they had worked correctly through a greater number of practice problems?

They didn’t have to figure out how to solve the problem. Instead of struggling a bit, which is where the learning happens, they would likely go to ChatGPT for the answer. When the answer bot was taken away, they weren’t prepared to think about how to solve the problem and work it out.

I’ve noticed this even using Copilot in VS Code. I rarely use it, but if I start pulling it out, I notice at the first hint of actually thinking about how to do something, my brain seeks to ask Copilot instead. It’s like there is an off switch that gets flicked when there is an easy button available. If I were to figure it out on my own, I’d know what to do next time I run into a problem like that… if I use Copilot, I’ve learned nothing, other than next time I run into this, use Copilot.

It’s a crutch when it comes to learning.


Exactly. I stopped using it entirely, and I noticed that I would write something and then pause, expecting copilot to take over for a while after quitting. It felt like my brain wasn't really engaged.

It's anecdotal, but I feel a lot better after giving it up and I think I can do a lot more when I can fully reason about the problem after poking at it from different angles.


> they had worked correctly

ChatGPT worked correctly for them but they learned nothing. It should be pretty obvious that you don’t learn by copying answers.


> A draft paper about the experiment was posted on the website of SSRN, formerly known as the Social Science Research Network, in July 2024. The paper has not yet been published in a peer-reviewed journal and could still be revised.

Should have started with that.

A study without independent replication hardly counts as «researchers found», much less one that hadn't even been peer-reviewed yet !


I think the problem that people don't see anymore is using tests themselves. A clever idea is worth more than a single tick in the correct checkbox. This applies to maths as well. Tests are faster to check and, supposedly, objective, but a viva voce exam is still superior imho.

The evaluation method is wrong.

It's like when cars first came out, you ask people to drive cars for a month and they get used to cars. Then you ask them to compete in a horse race and see how fast they can go.

We should evaluate how fast they solve a problem, no matter how.


I use ChatGPT 4o to check my child's homework, but I forbid them from using it directly. That way, I can make sure the work is correct (or at least wrong in the same way as ChatGPT) without straining my tired brain.

Kids who have their parents do their homework do worse on tests.

s/parents/chatgpt


Do kids who get private tutoring also average worse on tests?

Your characterization of "do their homework" seems wrong as the paper says of the GPT Tutor branch that "is hard for students to use it as a crutch since its prompt asks it to avoid giving them the answer and instead guide them in a step-by-step fashion."


"I blame the parents" s/parents/chatgpt/

Did you read the article before responding to it? It wasn't set up to do their homework.

Did you read it? When not expressly forbidden from doing so with the 'fine-tuned on their specific test problems and answers' model, they simply asked ChatGPT for the answer.

I think the way to use ChatGPT is to have it explain a concept once and give a few examples.

After that, the student should struggle the old-fashioned way with problems.

I would like to see a study that looks at this approach.


I have wondered if future generations will struggle with critical thinking / problem solving without technological assistance.

I think that will be the case. When I was a kid my high school math class was very calculator driven. I have a lot to say about that, but won’t bore you with it. I would often see adults pull out a pencil and paper and do a whole bunch of math very quickly, or do a lot of math quickly in their head. I couldn’t do this as well as them, and my math classes weren’t doing anything to move me in that direction. When the calculators were taken away, most of the kids were useless. I saw a guy in National Math Honor Society put 1-1 in his calculator, unironically. This was almost 25 years ago. I assume it’s only gotten worse, and will continue to get worse as more tools that provide answers are given to the students.

I think new tools are great, but it seems to me that the people who can utilize them the best are the ones who learned without them. Those are the people who understand the concepts behind them, so they can ask the right questions, with the right terminology, and they can recognize if an answer makes sense.

When learning is done with these tools, even just calculators, little is being learned. Memorize the button sequence to press leading up to the test. Once the test is over, forget the sequence… and even if the buttons are remembered, it won’t matter, because while we all walk around with a calculator in our pockets now, it’s not a TI-83 (or whatever people use now), so without the conceptual knowledge, what good is the button sequence? That doesn’t even prepare a person to leverage ChatGPT well.


Current and previous generations struggle too. Critical thinking and problem solving skills have always been an absolute bear to teach, and few teachers or school systems have the stomach to teach it.

I’m thinking of science education, for example. It’s not just a matter of teaching kids a bunch of disconnected facts about science like “the mitochondria is the powerhouse of the cell” but teaching kids to use the scientific method. To teach the scientific method well, you have to get the kids to design experiments and be comfortable with the prospect of failure. This is, sadly, a very modern approach and requires training the teachers how to do this. Some schools don’t but it’s not something you would have likely gotten much of in the 20th century.

Instead, people are still fighting over dumb shit like which facts are taught in class. “Critical thinking skills” are not on the menu most of the time.


This is my biggest fear of AI technology. That we outsource our reasoning to things that can't.

As an ML researcher myself I'm often baffled by the direction we go with these tools. It feels like we try to use them to do the things humans like and are best at and then have humans do the things ML models are better at and humans hate doing.

The tech is no doubt exciting (it's why I research it!) but that justifies neither unbounded hype nor blinding ourselves. But I guess that's the classic engineering problem: it's easy to get lost in the good and exciting parts of what you're building and lose sight of the harm it can do. It's hard to not fall victim to this and I'm sure we're all guilty of it to some extent, I know I am.


I wouldn’t be terribly concerned about that, as testing as it’s done in school is a moronic practice to begin with.

Test results are a measure of how well you can do on tests.

Did nobody read the article? It says right there that the students who used chatgpt right, as a tutor, did much better than their peers.

If your human tutors just give you the answers when you ask for them, how do you think it'll go?


> It says right there that the students who used chatgpt right, as a tutor, did much better than their peers.

No LLM processing required, just old fashioned critical reading and comprehension (my emphasis added):

>> The students who used it did spectacularly better on the practice problems, solving 127 percent more of them correctly compared with students who did their practice work without any high-tech aids. But on a test afterwards, these AI-tutored students did no better.


I have a visceral dislike, even hate, for what the LLM hype brought the world. The never ending slop it is spouting, filling up the entire internet. More and more I get confronted with images and media that turn out to be AI generated, when I find out I am disgusted and just close the tab.

Soulless drivel, endlessly streaming.

And I'm confident that the education system as we know it will be severely damaged because of it.

Even in our own field, I can guarantee you that software developers that "grew up" with these garbage AI assistants will be worse coders than the generation that came before. You will never develop the understanding, the insight, that's needed by chatgpt'ing your way through college and life.

Excellent news for my own market value of course, but I don't hesitate to say that I regret the LLM hype happened, the impact on the world is overwhelmingly negative (not even touching on the catastrophic environmental and financial cost to society).


But better in life…

Are people not reading the article here?

Let me tldr:

  - Study had 3 groups: normal GPT, system prompt to make GPT act as tutor and focus on giving hints, not answers, and no GPT 
  Group 1 (normal GPT)
    - 48% better on practice problems
    - 17% worse on test
  Group 2 (tutor GPT)
    - 127% better on practice problems 
    - equal test score to control group 

  GPT errors:
    - 50% error rate
      - 8% error on arithmetic problems
      - step by step instructions were wrong 42% of the time
    - GPT tutor was fed answers
  - students with GPT and GPT tutor predicted that they did better (so both groups were over confident)
Paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486

I'll reply with my opinion to this comment. But many comments are not responding to the article content


I think the part of overconfidence is actually important here. Looking at the paper Figure 2 shows the number of questions. For all sessions with GPT students averaged 2 messages per problem. Interestingly 20% isn't restating the problem or asking for the answer. Looks like 50-60% of tutor questions are asking for answer or restating problem and they ask more questions.

In my experience, usually where GPT and other LLMs get things wrong is in the steps. I see people are frequently over confident about the results and I've long thought this was a big part of it that the answer may be right but steps to get there are wrong. I see this a lot with river crossing problems. People will often show a prompt that solves it (half the time there's information leakage) and a good portion of those will have the logic errors. But people just look at the answer (btw, this is the reason I claim LLMs don't reason. It's not about getting the answer wrong, it's about the logical steps. That's the real evaluation!)

Personally I think this is good evidence of over fitting (memorization).

I also think this is why you should be careful when using it to code. It's the details that matter a lot. Since LLMs are aggregators I'd like to remind everyone that the average coder and average code is terrible. Hell, even good programmers often suck. Code is hard! Whenever I try using LLMs to code I find that I write lines faster but I end up spending more time debugging and prompt engineering than if I just read the docs and did it myself (this also has the added benefit of the struggle making me remember more). So I'll use it to write things I don't really care about but otherwise I'm not getting how people are finding it so helpful (yes, I read my comment). But I suspect many people are similar to the students in the study.


I learned the hard way: no pain, no gain.

I liken using llms for 'studying' to going to a gym with a hydraulic lift.

Yeah you'll lift much, much more. But is that the point?


Learning needs focus.

Yeah because it lies and makes stuff up to fill in any gaps.

Gasp

[flagged]


Well I flagged your comment so right back at you.


