And now that they've already cheated and were caught, what happens? Double down? Come clean? Nope, post publicly online for ideas.
This person does not seem to think things through. Glad to see cheaters put in their place, although I wonder what they would have done if ChatGPT didn't exist.
> I wonder what they would have done if ChatGPT didn't exist.
This can be answered, as until late last year it effectively didn't for the average student.
There were 3 main things they'd do (in my experience) for some kind of take-home assignment. In-room or at-home exam cheating are whole different subjects.
1. Pass notes or answers from year to year. Think shared Dropbox folder between cohorts, with papers and solutions to assignments. Simple fix: don't be lazy, write new graded assignments each year. Better yet, short-circuit this by giving students the past assignments openly and encouraging them to work together on them, then set a fresh assignment. They'll actually learn more by practising the previous papers, and they'll be in a good position to do the new work without cheating.
2. Use essay mills to pay someone to write them an "essay". These sites effectively sell bespoke written essays, marketed with pseudo legitimacy as "study materials". Some come with guarantees of uniqueness and that they won't trip plagiarism scanners as they haven't been given to anyone else. Colleges have access to tools that detect content submitted to other colleges previously, as well as content from the open web, so the essay mill pays other more advanced students or graduates to quickly whip up unique essays for paying students.
Without ChatGPT, I can say with confidence this is what they would do, because it's what they did up until late last year.
You could do stylistic analysis or similar to look for changes in writing style over multiple submissions if something seems "off" - with luck you'll find things like variations in the spelling of certain words, or specific shifts in writing style. Or the essay might be overly generic and missing the substance of the specific question topic.
3. Straightforward plagiarism via "copy, paste", or "copy, reword a few unusual words, paste". Occasionally you get something more involved like "copy, paraphrase, paste", or use of a thesaurus to swap words out (often inappropriately) to try to avoid binary comparison detection. Fuzzy detection usually spots this, though. Sometimes you'll simply get the exact same piece of work submitted by multiple students who have colluded and shared their answer with each other.
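To illustrate the fuzzy-detection idea above (a minimal sketch of my own, not any particular vendor's algorithm): comparing word sequences rather than exact strings means a few thesaurus swaps still leave a suspiciously high similarity score.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] over lowercased word sequences, so
    punctuation changes and a handful of swapped words don't hide a copy."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()

original = "The industrial revolution transformed patterns of labour across Europe"
reworded = "The industrial revolution altered patterns of work across Europe"

print(similarity(original, reworded))  # well above what unrelated texts score
```

Real plagiarism scanners are far more sophisticated (n-gram shingling, web-scale indexes), but the principle is the same: reword a few words and most of the sequence still matches.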
Right, totally forgot essay mills were a thing. Originally, outside of plagiarism, I couldn't think of any ways to cheat on an essay.
Part of writing an essay, or at least the type of essay I'm thinking of, is researching and critiquing other views on the subject, so if you find some material to draw from you might as well incorporate it properly.
The other part is presenting your own thoughts - which I assume is where ChatGPT came into play. I'm not aware of anything that could handle this part before ChatGPT (or other LLMs) existed short of presenting other people's thoughts as your own (also plagiarism).
Part of the challenge around essays is likely also what type of essay is sought, and the content required. Some educators are lazy, and just want something submitted with credible sounding words, parroting something suitably deferential to the accepted norms of the field.
Those kinds of essays are the ones most likely to find themselves the subjects of tech or non-tech enabled cheating. They're also effectively low signal-to-noise measures, and I'd discourage them.
When someone is presenting their own views, they can still steal someone else's... But there's going to be more breadth in these, and (hopefully) more substance in the essays.
I personally don't consider essays to be useful at all - maybe they're appropriate in certain disciplines, but I'd argue the more concerned a professor or department is about language models, the less intellectual merit there is in the subject, or at least the work being used to assess understanding of the subject.
Unfortunately, few educators focus on really trying to assess understanding and ability to apply understanding to new situations, but that's the kind of reasoning that generative models will struggle with. A very quick way to catch language model users out might be to ask them a bit about the same topic, then in a second paper give them their own initial assignment back and ask them to critically evaluate it now. Good luck critically evaluating hollow, vapid GPT waffle in timed exam conditions!
It’s astonishing that the word “generative” gets applied to technologies that are designed to duplicate, adapt, interpolate, or outright plagiarize without attribution in most cases. These models do a lot of things and can produce impressive results, but when those results are traced to their sources, “generation” is at best a last-place description of what they do, and may be entirely absent.
What would cheaters do if ChatGPT didn't exist? Probably the exact same things they were doing before ChatGPT existed.
If you meant the OOP specifically, either they would cheat in some old-fashioned way or not cheat at all. Difficult to tell, especially since we don't know whether it was the first time they cheated.
We have collectively become way too ok with cheating. My undergrad institution had immediate expulsion for plagiarism and it was the correct way to do it. It essentially never happened. Contrast that with institutions with "honor codes" and other toothless nonsense and it's night and day. High schools need to have harsh penalties as well (removal from extracurriculars etc) to head this behavior off and/or keep plagiarists out of undergrad programs. If this person is cheating as a grad student then their entire academic history is likely built on lies as well. May their department have the courage to do the right thing and show them the door.
In Norway, self-plagiarism has been a hot topic. Should it be treated as "ordinary" plagiarism if a student re-uses parts of an earlier assignment they have written themselves?
And more generally, is it okay for institutions to provide commercial institutions copies of the work of students who cannot say no? To what degree should students be in control of what happens to their work (i.e., can they put some kind of license to it, to limit what it is used for)? Should the students be compensated when their papers are used to increase the value (i.e., database) of a third-party company?
My university definitely had expulsion as a punishment for plagiarism but it was rarely used in the first instance, especially on undergrads or in cases of collusion with other students rather than something like essay commissioning. If you were caught plagiarising a major assignment I don’t doubt that you’d be expelled, but a small coursework (especially in first year) would have probably just been zeroed out or you would have failed the module.
When I was in college, I worked a few terms as a teaching assistant. Cheating was rampant and I never saw any meaningful consequences. Many of the students that I personally caught still ended up passing the class. This was over 20 years ago.
I went back to grad school 15 years ago. Same story, only it had only gotten worse. I dropped out, in part due to the perceived total lack of institutional integrity.
They aren't reading them anyway. 4000-word assignments are just a sadistic game they still play for some reason. You jump through the hoop and write the paper, then they run it through an automated spelling and grammar checker and make sure it passes the plagiarism check before skimming a few lines and assigning a grade. At this point it's just automatically written papers trying to fool automated paper grading software, like some sort of highly inefficient generative adversarial network.
In some alternate timeline an enterprising person at OpenAI has written a memo in which they requested a new service to be implemented. This service hashes all generated responses before they are sent to the end user and saves each hash in a db somewhere. After OpenAI’s products have become popular enough that plagiarism is a valid concern, OpenAI releases their (paid) generated response history API which allows you to provide a hash and get a response as to whether or not it exists in their database. This puts a stop to 85% of all low effort plagiarism and earns OpenAI a decent additional revenue stream.
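A toy sketch of what such a hash registry might look like (the names and the normalization step are my own invention; a real service would also need fuzzy matching, since editing a single word changes the hash completely):

```python
import hashlib

def fingerprint(text: str) -> str:
    """SHA-256 of lightly normalized text: case-folded, whitespace collapsed."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Provider side: record a fingerprint of every response served.
seen_hashes = set()
seen_hashes.add(fingerprint("The French Revolution began in 1789 ..."))

# Checker side: look up a submitted excerpt.
print(fingerprint("the french  revolution began in 1789 ...") in seen_hashes)  # True
print(fingerprint("The French Revolution started in 1789 ...") in seen_hashes)  # False
```

The second lookup shows the weakness: one synonym swap defeats an exact hash, which is why the other comments here talk about fuzzy or partial hashing.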
But that would immediately scare away all the future potential OpenAI customers… why would OpenAI do that? If customers know that it’s possible to determine that a text came from ChatGPT with a 100% accuracy, then no one would use ChatGPT.
The use of generated content for professional or academic dishonesty is a very tiny subset of all potential use cases. Neither OpenAI nor schools and corporations want to face the backlash from their customers if it is discovered that they allow generated content to be used deceptively. To say that OpenAI would scare away all their potential customers if there is no chance for cheating or deception with generated content is a gross overstatement. Being able to trace the origin of content (or anything for that matter, in the virtual or real world) is a cornerstone of how the world operates, and removing the anonymity of generated content is no different.
The world, society, is not prepared for a scenario where computers generate human quality copy. This is hugely disruptive and will be used by workers and students to cheat their classes and do their work for them for a while.
Because it is for ghost writers or to act as the ghost writer. What good is a text generating AI otherwise if you're not generating text to present as though it's your own? Even if the text is accurate, it's still ghost writing unless the whole point is to specifically present an AI-generated piece of text as AI-generated (aka pure novelty). AI written articles, for example, aren't doing that.
There are many, many uses for a text generating AI other than for nefarious purposes. It can give suggestions for things to do, it can write short stories to amuse you, it can act as an RPG game master, it can help with language learning, and so on. Assuming that it is only for cheaters is like how the music and film industry assumed that the only purpose for ripping CDs and DVDs onto your computer was to pirate them, ignoring that many people wanted to have a media server to stream their legally purchased media within their own homes.
Not the only market, but a big part of it. People who work on social media will definitely make use of ChatGPT… unless Twitter, FB, et al. start to label tweets and messages with “Written by ChatGPT”.
Same goes for anything that has to do with selling written words. Not a small market precisely.
If ChatGPT input and output is not being logged for future use, they're missing a trick. They should provide a tool to universities to paste in a given excerpt and see if it shows up in the output logs. Might even charge for the service!
While there are interesting comments here looking at technical solutions like fuzzy/partial hashing, the solution for a professor is likely a bit different.
Rather than prove the student used ChatGPT (hard for reasons you point out), isn't this better handled as a suspected cheating scenario (collusion, candidate impersonation, etc.)?
Many institutions have a regulation on the books to allow for oral examination to be carried out for any course at the discretion of the course leader, for the purpose of evaluating a student for award of a grade.
If you suspected a student had used significant outside help, you'd presumably do the same - have an oral examination, and discuss the work, ask them to discuss the subject matter, and test their understanding of it.
At that point, either the student crumbles and shows no understanding beyond "paper thin" (as when they've paid an essay mill to write the paper), or they demonstrate a good understanding. If the latter, they meet the learning outcomes of the class and have demonstrated this satisfactorily. And if they have learned the content and can show an understanding of it, the tool has effectively acted as a study aid and actually helped them learn. They have shown they know the material, so they have passed the course legitimately.
I suspect in the longer term we'll see attempts to ask questions which can't be so easily answered by generative waffle, which actually should be an improvement on many types of examination. How this works in less scientific disciplines will be more challenging - I imagine they'll struggle to create "un-GPT'able" questions.
But colleges adapted (or at least good professors did) to remote examinations during the pandemic - declare the exam "open book", and ask questions that require robust understanding of topics, such that sitting and reading the book will run you out of time. Give students questions with generated parameters so each student gets a slightly different variant of numerical questions, etc.
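One way to generate those per-student variants (a sketch; the seeding scheme and parameter names are illustrative): derive a deterministic seed from the student ID, so the same student always gets the same numbers for regrading, but no two students are likely to share them.

```python
import hashlib
import random

def question_params(student_id: str, question_id: str) -> dict:
    """Deterministic per-student parameters: same inputs, same numbers."""
    digest = hashlib.sha256(f"{student_id}:{question_id}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    # Example: a kinetic-energy problem with randomized mass and velocity.
    return {"mass_kg": rng.randint(2, 20), "velocity_ms": rng.randint(5, 50)}

print(question_params("s1234567", "q1"))
print(question_params("s1234567", "q1") == question_params("s1234567", "q1"))  # True
```

Because the parameters are a pure function of (student, question), the marker can regenerate any student's variant on demand rather than storing them all.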
The solution to language models generating believable or compelling waffle doesn't need to be technical - I suspect non-technical solutions may win out. Whether they scale enough (or a professor cares enough to actually sit down with students for a few minutes) is another question, but maybe they should be doing that in the first place?
> Many institutions have a regulation on the books to allow for oral examination to be carried out for any course at the discretion of the course leader, for the purpose of evaluating a student for award of a grade.
This, in my opinion, is the only approach that will work. When I ask cheaters to come in to explain the work, they get cold feet.
> Many institutions have a regulation on the books to allow for oral examination to be carried out for any course at the discretion of the course leader, for the purpose of evaluating a student for award of a grade.
In my country, that would require the whole class to be subjected to an oral exam, as our examination regulations require strict equivalency of evaluation methods across all students that take the course during the term (contents of the exam may differ across students, but the main method of evaluation, and the outcomes to be measured, may not).
> proving that a student used ChatGPT seems difficult.
I wonder if there is a transparent proxy in place logging access to ChatGPT when on a campus network. My employer does this, and makes no effort to hide it. Everything is logged and specific things are captured for evaluation later.
However, devil's advocate here: I took an essay question verbatim from an assessed humanities exercise and very quickly (think seconds of effort) created a prompt and got ChatGPT to write text each time it was prompted, until told to stop.
I copied the output (which I'd asked it to include references in), and gave it to the professor who set the question.
They very quickly confirmed the language model essay was considerably better than their human students' essays. It was more coherent and readable, better answered the question set, was more "on topic", and even better structured - it followed a structure that introduced a position, then substantiated the position, gave some mention to alternative viewpoints, and compared and justified the position in relation to them.
The topic wasn't hard, you could easily Google copious essays on the topic, but the output wasn't pulled from the web verbatim (even in parts), and it was better than the actual students' work!
The biggest failing was the references - language models generate plausible and readable text... So it invented lots of very interesting and plausible sounding references, none of which existed. The titles sounded relevant. The authors were appropriate to the subject matter, the journals or publication venues were also appropriate and relevant. It's just that none of them actually existed!
The same way you prove that someone plagiarized anyone else: you demonstrate that the text was reproduced, in this case probably by running the assignment through ChatGPT yourself and comparing the output to student answers.
ChatGPT has a very verbose, uncanny-valley style to it that likely sounded like nothing the student had organically submitted before; it's not that hard to tell.
> ChatGPT has a very verbose, uncanny-valley style to it that likely sounded like nothing the student had organically submitted before; it's not that hard to tell.
I've had 2 students threaten me and my university with lawsuits over accusations of cheating (not plagiarism, fwiw). And I had far more compelling evidence than an uncanny valley of verbosity.
I think it's more likely that this story is bullshit (as someone else in this forum has also speculated) than the likelihood that a professor would be willing to go to the mattresses with only a stylistic argument for plagiarism.
> it's not that hard to tell.
It might not be that difficult to tell, but proof is another matter.
edit: The students were bluffing with their threats of lawsuits, but it still made for two unpleasant semesters.
The weird thing is that it's quite likely within 5 or 10 years, not only will teachers have given up, but they will expect students to leverage these types of tools.
Why? I know for a fact that the majority of the students in many of the classes I took were cheating. I've found test answers from college classes while borrowing colleagues' calculators. Half the colleagues I work with don't know very basic things they should know from their degrees, with several admitting to not doing the work. I catch lies on resumes every week, and have caught people cheating during interviews. I've known people that cheated at work (outsourced).
I think it's more likely to have happened, by now, than to not have happened.
Cheating is absolutely rampant. My wife witnessed her classmates cheating throughout their coursework (during the early pandemic). One of those people went on to get a coveted spot in the local nursing school and is now a nurse. My wife couldn’t get in, because they use a point system for admissions. The system is rigged to be gamed, and the cheaters are winning the game.
I regularly use ChatGPT to improve my English. I write a paragraph, I don't like it, and I ask ChatGPT to rewrite it, fixing grammar mistakes (English is not my first language). Would that be considered plagiarism? No (just like using Grammarly is not considered plagiarism). So how on earth can the university distinguish between using ChatGPT to rewrite my own words vs using it to write an entire essay?
Whether it's plagiarism (or cheating) depends on the context. If you're in a class where your English ability is being evaluated in your writing, it's cheating. If it's adding new ideas, it's cheating.
If you're writing technical reports that have no English or stylistic weighting in the grading, and you're using it solely as an advanced grammar checker while making sure all the ideas in the content are your own, then it could be acceptable.
I think professors and schools need to realize learning and life will be different now with the mass utilization of AI. Humans are taking a role more of composers than generators of the raw material — writing in this case. And I think, ultimately, it should be ok to use assistive technologies like ChatGPT to write an essay. After all, a human still signs off on it. The human still has to creatively make decisions with respect to prompts and what to include.
And as long as it does not infringe on someone’s copyrighted work, then what is the trouble with using AI to write? I’m sure all scientific papers could be made far more comprehensible in parts with the use of assistive ai tech. It’s like spell check but better
When writing, I often start with a poorly written draft that I then polish during a second pass. Recently, I've been experimenting with ChatGPT to see if it can help with the editing phase. "Please rewrite this to fix grammar and spelling. Please also rewrite it in a professional voice."
Are AI detection tools able to distinguish this (arguably fair) use of AI from "Please write the whole thing for me from scratch"?
The detection tools currently aren't great/perfect.
I've noticed that some of them will flag "hollow" text (think profile-boosting, wannabe-thought-leader type content from LinkedIn) as having a very high probability of being automatically generated. I don't think the text in question was from a generative model (based on when it was posted), but I can see the stylistic traits that detection algorithms look for in hollow, substance-free human-generated text.
I think you would be safest to avoid using it like that for now, as the kinds of traits sought include lengths of sentences, peculiarity of words in relation to each other etc. These properties are likely to feed through into the final text when you ask it to redraft.
In saying this, automated detection isn't likely to be reliable enough to take action over (in the longer term). Though I guess from a very blunt personal viewpoint, if someone's own writing can be confused with a generative model output, it probably isn't very insightful work to begin with!
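As a rough illustration of the sentence-length trait mentioned above (a naive sketch of my own, not how any real detector works): human writing tends to vary sentence length a lot, so unusually uniform sentences can be one weak signal among many.

```python
import re
import statistics

def sentence_length_spread(text: str) -> float:
    """Standard deviation of sentence lengths in words: a crude
    'burstiness' proxy, where flat, uniform prose scores low."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return statistics.stdev(lengths)

flat = "The topic is important. The topic has many aspects. The aspects are varied."
print(sentence_length_spread(flat))  # low: the sentences are nearly the same length
```

On its own this proves nothing, which is exactly why such signals aren't reliable enough to take action over.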
As an aside, I have been playing around with ChatGPT a little. One thing I have found is that because its context window is limited to 4096 tokens, it will often struggle with long documents. Often the beginning and the end won’t agree, or it will “forget” stuff from the start of the document.
> Edit2: I am doing a masters, so this will not be taken lightly. And I recieved 2 grants to study it.
Facts to consider before cheating...