This is awesome. I've been speculating along similar lines, and it's great to see this fleshed out.
I think "correct the errors in this ChatGPT essay" is a short-term viable homework exercise, but those errors might be gone in GPT-5 so I don't think it's long-term viable. Soon the LLM will just produce perfect essays at college level and there won't be hallucinations for the student to correct.
However, the "simulate the historical environment" task is great and I think it has long-term potential. I think it can be taken further; rather than "spot the errors that ChatGPT made", you could flip the script and make it "survive 20 turns of conversation without making a historical error", so you'd need to know things like local traditions, perhaps the geography of the ancient settlement you're studying, contemporaneous history like "who is the emperor and what's the sentiment towards him" and so on.
I'm also envisioning that, since text-based exercises are extremely easy to game (just pipe your text prompt into ChatGPT), and since ChatGPT is soon going to be strictly superior to a high-school level student, we could get around this by having the homework as an in-person verbal role-play or Q&A session, like a viva voce; essentially you have a verbal discussion with ChatGPT and you need to really know your material as it can dig into any part of the curriculum. Then ChatGPT can summarize each student's interaction, and the teacher doesn't have to sit through each individual one start-to-finish (1:1 exams are too time-consuming to be viable).
This round-trip through verbal interaction would potentially make the task more interesting (lots of people simply hate writing essays), shift the focus away from tasks that will become obsolete (writing essays) toward ones that will be more relevant (human synthesis of ideas, and interpersonal interaction), and help mitigate LLM-assisted cheating by constructing an assignment that LLMs can't trivially solve.
> I think it can be taken further; rather than "spot the errors that ChatGPT made", you could flip the script and make it "survive 20 turns of conversation without making a historical error", so you'd need to know things like local traditions, perhaps the geography of the ancient settlement you're studying, contemporaneous history like "who is the emperor and what's the sentiment towards him" and so on.
Yes, exactly. This is where I've been heading with my planning for assignments. For instance, when confronting Ea-nāṣir about his poor quality copper, I'd want my students to actually show some knowledge of the geography and political dynamics of ancient Mesopotamia.
The "Fall of the Ming Dynasty" simulator I link to at the bottom of the post is probably the most well-developed example of this that I've come up with so far. In that one, I added a "political intrigue minigame" in which ChatGPT is supposed to assess the human player's ability to deploy rhetoric appropriate for a minor courtier in 1640s China (from the prompt: "success depends on your luck score + rhetorical skill, tested via a series of open-ended prompts that HistoryLens will assess and grade; only the highest scoring responses will allow you to succeed in the minigame.")
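If you wanted the "luck score + rhetorical skill" rule enforced outside the model, the arithmetic is easy to do in code, with the model supplying only the rhetoric grade. A minimal Python sketch; the 1 to 10 luck roll, the 0 to 10 grade scale, the threshold of 15, and the function names are all assumptions, not values from the HistoryLens prompt.

```python
import random


def roll_luck(rng: random.Random) -> int:
    # Hypothetical luck score: a 1-10 roll made by the program, not the model.
    return rng.randint(1, 10)


def minigame_outcome(luck: int, rhetoric_grade: int, threshold: int = 15) -> bool:
    # Success = luck score + rhetorical skill (the grade the LLM assigns to the
    # player's answer, assumed 0-10). A high threshold means only the
    # best-graded responses succeed.
    return luck + rhetoric_grade >= threshold
```

Keeping the dice roll in code means the model cannot quietly let a weak answer through; it only has to grade the rhetoric.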
That’s great, courtroom drama sounds like an excellent angle.
I was thinking of “king hearing petitions” as another potentially interesting scenario; it could go either into minutiae that require cultural knowledge, or into strategic territory like the game Crusader Kings, where you need to understand the geopolitical allegiances of the time, the geography, and the national economy.
More generally I have been wondering if games like “start a company in a simulated sandbox world” could actually teach transferrable Econ/Business/startup skills. There is a lot of territory to explore here.
Basically, re-iterate the original instructions each time, describe the last 2 moves in detail, and provide a brief summary of all the previous moves.
You can have much longer games this way; maybe this deserves to be a Python script.
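The approach in the parent comment (re-state the instructions, give the last two moves in full, summarize everything earlier) might look roughly like this as a Python script. This is a minimal sketch: the class and method names are hypothetical, the one-line summaries are assumed to be supplied by the caller (for example, by asking the model to summarize each move), and no particular LLM API is called.

```python
from dataclasses import dataclass, field


@dataclass
class GameLog:
    """Keeps the full move history; only a compressed view goes into each prompt."""

    instructions: str
    moves: list[str] = field(default_factory=list)      # full text of each move
    summaries: list[str] = field(default_factory=list)  # one short line per move

    def add_move(self, full_text: str, one_line_summary: str) -> None:
        self.moves.append(full_text)
        self.summaries.append(one_line_summary)

    def build_prompt(self, detailed: int = 2) -> str:
        # Index where the detailed window starts; everything before it is summarized.
        start = max(len(self.moves) - detailed, 0)
        # 1. Re-state the original instructions every turn.
        parts = [self.instructions]
        # 2. Brief summary of all moves older than the detailed window.
        if start > 0:
            parts.append("Summary of earlier moves:\n"
                         + "\n".join(f"- {s}" for s in self.summaries[:start]))
        # 3. The most recent moves, in full.
        if self.moves[start:]:
            parts.append("Most recent moves:\n" + "\n\n".join(self.moves[start:]))
        return "\n\n".join(parts)
```

Each turn you would send `log.build_prompt()` plus the player's new input instead of the whole transcript, so the prompt grows only by one summary line per move.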
I like it. Tried something similar early on, but decided it wasn't worth it because the ChatGPT context window is only enough to run for 10-20 turns anyway. But with Claude's 100k token context window, this should work great. Thank you.
I'm sure there are ways around this if you use the API and connect it to a MySQL database to allow users to "save" their spot... I'm not technical so my understanding of what's involved is hazy, but curious if people have ideas of how to do this simply. But for my current use case, I'm working with dozens/hundreds of college students so I need to make sure the whole thing is free. I've applied for a grant that could fund use of the API though, fingers crossed.
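For the "save your spot" idea, the storage side is small. The sketch below uses Python's built-in `sqlite3` module instead of MySQL (a deliberate swap, since it needs no database server); the table layout and function names are hypothetical, and `state` is whatever dict the game keeps, for instance the instructions, turn number, and running summary.

```python
import json
import sqlite3
from typing import Optional


def open_store(path: str = "saves.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # One save slot per student; the game state is stored as a JSON string.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS saves ("
        "student_id TEXT PRIMARY KEY, state_json TEXT NOT NULL)"
    )
    return conn


def save_game(conn: sqlite3.Connection, student_id: str, state: dict) -> None:
    # INSERT OR REPLACE overwrites the previous save for this student.
    conn.execute(
        "INSERT OR REPLACE INTO saves (student_id, state_json) VALUES (?, ?)",
        (student_id, json.dumps(state)),
    )
    conn.commit()


def load_game(conn: sqlite3.Connection, student_id: str) -> Optional[dict]:
    row = conn.execute(
        "SELECT state_json FROM saves WHERE student_id = ?", (student_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Resuming a session then means loading the dict and rebuilding the prompt from it before the next API call.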
Likely very little. But a merchant capable of writing (or paying a scribe to write) a formal cuneiform complaint about bad copper, then having it delivered, would know quite a bit more. Great question IMO - thinking critically about exactly these kinds of questions is one of the goals of the assignment.
This kind of thing would be a great way to integrate ChatGPT into the education system and help with media literacy as well. Find the mistakes and interrogate them to learn about critical thinking and how much more difficult it is to defend against misinformation than to simply disseminate it.
Why leave hallucinations to chance? ;) The prompt could tell ChatGPT to randomly insert several authoritative sounding but verifiably false facts, to give the students debunking challenges! That solves the problem of GPT-5 being too smart to hallucinate, while still leaving open the possibility of talking rats.
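A sketch of how that instruction could be generated per student, so each student gets a different but reproducible number of planted errors. This is Python under stated assumptions: the wording, the 2 to 4 range, and the function name are hypothetical, and a model will not necessarily comply with "exactly n", so the answer key it produces still needs checking by the teacher.

```python
import random


def debunking_instruction(student_id: str, min_facts: int = 2, max_facts: int = 4) -> str:
    # Seeding on the student ID gives each student a different count,
    # but the same one every time the assignment is regenerated.
    rng = random.Random(student_id)
    n = rng.randint(min_facts, max_facts)
    return (
        f"During this session, insert exactly {n} statements that sound authoritative "
        "but are verifiably false. Do not flag them in the narrative. "
        "After the final turn, list each false statement with the correct fact "
        "so the instructor has an answer key."
    )
```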
What you're envisioning reminds me of Timothy Leary's Mind Mirror, published by Electronic Arts in 1985 for the Apple ][ and other home computers:
>Players answer questions that, when churned by Mind Mirror’s cryptic algorithms, can allegedly help them reveal intriguing new aspects of their psyche. Gameplay predominantly revolves around defining, comparing and then role-playing through different personalities in various text-based life simulations.
Hello, I'm Timothy Leary.
Welcome to MIND MIRROR.
MIND MIRROR (c) copyright 1985, 1986, Futique, Inc.
Published by Electronic Arts
MIND MIRROR
Design and script by Timothy Leary.
MIND MIRROR
Program and Design by Peter Van den Beemt and Bob Dietz.
MIND MIRROR reflects and qualifies your thoughts.
OPTION 1
MIND TOOLS
Enhance Insight, Mental Fitness, Learning Skills and Performance.
OPTION 2
MIND PLAY
SIGNIFICANT PURSUITS.
Sophisticated Head Games.
MODE 1
MIND MIRROR
Learn how to Micro-Scope and Map your thoughts.
MODE 2
LIFE SIMULATION
Test your empathy in amusing Role-Play Odysseys.
SELECT LEVEL
Beginner
Intermediate
Master
Consultant
Choose AUTO-PLAY
or INTER-PLAY.
Mirror your own thoughts.
Compare them with others.
RETURN begins game.
SPACE BAR clears text.
[...]
"Mirrors should reflect a little before throwing back images." -Jean Cocteau
What's interesting is that game was based on Timothy Leary’s PhD dissertation “The Social Dimensions of Personality: Group Process and Structure”, which he ultimately used to break out of jail.
Before he got into LSD, he designed the Leary Interpersonal Behavior Circle personality assessment, which laid the foundations for understanding human personality and interpersonal behaviors.
In 1970, Leary was imprisoned for marijuana possession. As part of the intake process, he was given a psychological assessment designed to gauge inmates' risk of escape or violent behavior. This test was known as the "Group Psychological Assessment Test." Leary was familiar with the test, having designed it or at least aspects of it. Understanding the criteria being measured, Leary answered in such a way that he was categorized as someone who posed a very low risk of escape or violence.
As a result, he was assigned to a minimum-security prison. With the lower level of security and his connections, Leary managed to escape in September 1970 with help from the Weather Underground, a radical left-wing organization. He then fled the country and spent time in various places, including Algeria and Switzerland, before eventually being recaptured in 1973.
Love this. I think there is a lot of fun to be had going back through all the old text-based experiments and seeing if they can be rebooted in a LLM.
In this case, there is some interesting structural psychological stuff, which would be the hard part to get the LLM to stick to rigorously, but the rest of the application could very much be reimplemented with an LLM.
"LLM as a mind mirror" is definitely a use-case that we'll see more of, IMO.
It’s a fun idea, but I’m going to quibble with the title. This isn’t simulation, it’s speculative storytelling, of the alternate history variety. “Simulation” makes it sound like more than it is.
I blame that Waluigi article for popularizing a pseudo-scientific way of explaining something that’s better understood as an imaginative literary approach. There are lots of great alternate history novels and games.
Author of the post here - happy to discuss and get feedback from HN readers. The examples I gave here are all using GPT-3.5 but Claude now seems to work at least as well using the same prompts.
>Going forward, my plan is to develop my own web app which will allow users to create historical simulations on a dedicated platform using the APIs of both Anthropic’s Claude and GPT-4.
I might be able to help. I've already built a free, open source web app which creates simulated AI worlds. It takes user direction, and I believe it would work very well for historical simulation. It's also p2p in browser so an entire class could join and contribute to a simulation simultaneously.
At the moment it only supports OpenAI, but it shouldn't be too difficult to add Claude.
Happy to give you a demo to see if it'll save you some time. To be clear, this is a personal side project which isn't monetized, I'm not selling a product.
Please share it here or do a Show HN when you are ready, would love to see this.
I commented in this direction elsewhere (https://news.ycombinator.com/item?id=37482853) but interested in your implementation -- do you have the LLM running the whole world-model, or do you have it using function calling to drive a text adventure game engine (which would give stricter guarantees around persistence of the world).
Really interesting post. I've been thinking a lot about similar uses of LLMs to create immersive learning experiences.
I had some initial successes getting ChatGPT (3.5 and then 4) to roleplay interesting and dynamic characters. Within the first few months of release results degraded significantly - characters avoid confrontation, apologize at the drop of a hat, and are averse to any action or statement that doesn't 'help' the user. Makes it difficult to, say, have a passionate argument with Napoleon which pits his youthful revolutionary ideals against his rise to absolute power, when the 'great man' folds the moment he doesn't receive positive feedback.
I'm very interested in seeing these experiments in unrestricted models of similar power, when they become available.
I found that results degraded too, but I was able to get them back up to the same level by changing my approach to prompting. For instance, I experimented with writing the prompt in pseudo-code, which worked for a while and then stopped. It was along the lines of:
# Define the parameters for the HistorySim experience
temperature = 0.5
historical_accuracy = 0.9
ambient_mood = 0.01
# Define additional instructions
use_historical_sources = True
simulate_and_track_variables = True
use_appropriate_language_registers = True
Currently, a numbered list of rules seems to work best, including this one to avoid the constant positivity: "LLMs have a well-documented tendency to see the past in an overly rosy and optimistic way. Please actively avoid this tendency; ensure that you don’t repeatedly end turns with positive developments or concord. Keep in mind that human history is riven by conflict, ambiguity, and confusion. HL’s narrative tone is grounded in realism, and at times bleak. Always introduce a downbeat plot element or source of additional conflict between turns 3 and 5."
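One way to make the "between turns 3 and 5" rule less dependent on the model keeping count is to track the turn number in code and inject a reminder on a randomly chosen turn. A minimal Python sketch, assuming a setup with a system prompt plus per-turn user messages; the rule wording is abridged from the prompt above, and the function names are hypothetical.

```python
import random

# Abridged, illustrative rules; the real prompt has more.
RULES = [
    "Stay historically grounded; avoid fantasy elements.",
    "Actively avoid an overly rosy view of the past; do not repeatedly end turns "
    "with positive developments or concord.",
    "Keep the narrative tone grounded in realism, and at times bleak.",
]


def system_prompt(rules: list[str]) -> str:
    # Render the rules as a numbered list, the format reported to work best above.
    return "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))


def pick_downbeat_turn(rng: random.Random) -> int:
    # Choose turn 3, 4, or 5 once at the start of the game.
    return rng.randint(3, 5)


def turn_reminder(turn: int, downbeat_turn: int) -> str:
    # Appended to the user message on that turn only, so the setback
    # does not rely on the model counting turns itself.
    if turn == downbeat_turn:
        return "This turn, introduce a downbeat plot element or a new source of conflict."
    return ""
```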
One surface-level comment. As a non-religious person, I have always found the distinction between buildings for different religions somewhat artificial. Sure, they have different architectural styles, but in my mind they all serve a similar purpose.
So a mosque is a church is a temple, on some level.
There are many differences. When you look at the typology of most temples, they were viewed as the architectural embodiment of the cosmic hill or mountain. For example, in the Egyptians' view of creation there was a vast sea, some land rose up out of the water, and the gods (Osiris, Isis, and their son Horus) came down and dwelled in a hut on the land. All Egyptian temples after that were viewed as having been built on the very ground that first came up out of the water (this was symbolic; they knew that all the temples could not really be built on that one spot). If you look at an Egyptian temple, you will see that the hypostyle hall has reed columns, which represented the waters around the hill/mountain. As you progressed inward to the holy of holies, you would go up steps and the rooms would get smaller and smaller, just as when you go up a mountain you progress upward and the circumference of the mountain gets smaller and smaller. Etc.
Churches/synagogues/mosques, on the other hand, are more like community gathering places. Whereas temples are viewed as sacred and cut off from the profane world, requiring higher and higher levels of worthiness the closer you get to the innermost parts, churches/synagogues/mosques are more open to anyone who wants to come in and join the services. They can host activities such as sports and clubs: community-oriented things you would never find in a temple.
Churches/synagogues/mosques themselves have different architectural features, but those have more to do with supporting different ways of worshipping. For example, in mosques men and women worship separately, and a mosque needs an area where people can wash and perform other ablutions before they enter.
And not all churches are the same. A Catholic church will be much different from a Protestant, LDS, or Jehovah's Witness church.
The sheer aesthetic differences between your average (American) Catholic church and Protestant church are really interesting, and to some extent almost mirror their fundamental theological differences. Completely different vibes.
Not really. They all serve very different purposes, especially when you get outside of the Abrahamic religions. Even then, a traditional mosque serves a very different purpose from a church and is organized in quite different ways.
I am not an expert in mosques or churches but apparently you are. So can you explain the different purposes? Also two of my examples were Abrahamic so let's just focus on those for now. A mosque, a church, Jewish "temple" or synagogue, are all according to you, for quite different purposes.
Are you sure they don't have a number of similarities?
They are different religions with different beliefs and practices. The architectural differences are vast and mostly obvious the minute you walk in. Mosques and churches both have a number of unique elements like the mihrab, confessional booths, pews, floor carpets, altars, and so on and so forth. These all translate into vastly different experiences both during worship and in everyday life. For example, Catholic churches have confessional booths facilitating confession to priests. Mosques don’t since, as far as I know, Muslims believe more in direct confession to God, not to an intermediary. You can see how this would result in a different social structure.
The experience of attending mosque on Friday is quite different from Sunday mass. This is intra-religion as well; compare a New England church with St. Peter’s in Rome, for example.
Sure, there are some similarities, but this is such a broad distinction that I question its usefulness, and dividing the world into secular and religious (architecture) is a very recent phenomenon. Saying they all are basically the same is to miss millennia of culture.
Anyway I don’t mean to be hostile or critical here, I just think religious architecture is pretty fascinating and has a much bigger effect on culture, even supposedly secular culture, than people realize. I encourage anyone interested to read more about it.
The substantive culture that I think I know about in a mosque, church, synagogue, or temple, is ethical guidance and community aid. The other substantive cultural element, although this is not always the case, is a lack of tolerance for groups that strongly hold one of the other sets of beliefs.
They definitely have variations in the nonsense they use to justify themselves, although it could be argued that there are as many similarities as differences.
Whether they sit on the floor or not, or what types of songs they sing and when, are surface-level details to me.
My takeaway from history, geography etc. has always been functional.
It’s so weird how people are normally loath to appear ignorant and uninformed, yet there are some topics about which they are proud of their lack of knowledge.
There is no virtue or happiness in being Too-Smart-to-Care about the lives, beliefs, or histories of other people in the world. It's certainly OK not to care, but feeling the need to signal this and imply some kind of superiority with it is a short road to a bitter solipsism.
But the actual buildings of churches and mosques are quite different. The only ones that are similar tend to be newer constructions or were previously the other thing; e.g. the Hagia Sophia.
I think it ultimately just depends on your comparison tolerance levels. You could just as easily argue that most churches are different from each other as well.
In my opinion, churches/mosques/temples are very similar because I live in a younger part of the world where the modern buildings look alike and their ultimate purpose is to be a place of worship.
You could argue the same about other types of buildings too. You are going to a restaurant, a McDonald's, a cafe, a Taco Bell, a bar that also serves food, and so on. People make a distinction between those even though they are all places to eat food. Sure, the food differs, the way you interact with the staff differs, etc. But they all serve the same core purpose, yet people call them different things. In the same way, people go to different places of worship for different faiths, with different expected behaviors and building designs.
Yeah, they're pretty much the same. I don't see why any building with a large open space could not function as a church. In principle all of the religions could share the same space if they didn't need it at the same time. It's just nice to have your own place.
Your house is the same as your friend's house, but you'd still rather have separate houses.
Except, different religions have different needs. A synagogue needs a place to store the Torah scrolls. A mosque needs a place for ablutions, prayer rugs, and to separate the men from the women. Different Christian churches baptize differently and so have different needs around that rite. Some need a font in the ground where people can be immersed, while others do sprinkling and just need a bowl of water. And those are just a tiny fraction of the surface-level differences.
What you are saying is that any large open space could function for any sport. But that ignores how football teams need goal posts and a field of a specific size with specific markings on the ground. And a soccer field is a different size and needs soccer goals. And basketball needs a hardwood floor, hoops that are a certain distance apart, and special markings on the ground. And bowling needs lanes and balls and pins. And tennis needs something else. And so on and so forth.
Religions are far, far, far more complex than any sport, and no single large open space could serve them all, any more than one large open space could serve every sport.
Each religion is a way to extract money from the community built around it. This is done via a variety of donations, compulsory or semi-voluntary. There has to be a justification for it, so the community is told that the money goes to the temple construction fund which then turns into a temple maintenance fund. Those who run the fund redirect the funds to their own and their friends' pockets.
This is so wrong it's painful. Consider how difficult that would be to prove: starting with the question "What came first: money or places of worship?"
I really like the gamification there where they have hit points and a stat for mood as well as keeping track of the inventory. It's concise and helps make sure that the system doesn't lose track of key structural information.
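A related trick is to keep those stats in the program rather than in the model's memory and re-send them every turn, so hit points and inventory cannot drift. A minimal Python sketch; the field names, starting values, 0 to 10 mood scale, and status-line format are assumptions, not what the post's prompt uses (there, the LLM itself tracks them).

```python
from dataclasses import dataclass, field


@dataclass
class PlayerState:
    hp: int = 10
    max_hp: int = 10
    mood: int = 5  # hypothetical 0-10 scale
    inventory: list[str] = field(default_factory=list)

    def take_damage(self, amount: int) -> None:
        # Hit points never go below zero.
        self.hp = max(0, self.hp - amount)

    def status_line(self) -> str:
        # Compact line appended to every prompt; the program, not the model,
        # is the source of truth for these numbers.
        items = ", ".join(self.inventory) if self.inventory else "nothing"
        return f"[HP {self.hp}/{self.max_hp}] [Mood {self.mood}/10] [Inventory: {items}]"
```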
I think that type of thing can make for a really fun and flexible GPT-powered game system. It seems like a great way to add some engagement.
It's also brilliant the way you have managed to mitigate the ChatGPT cheating to such a degree. Although as I got further down into the details of the assignment, I started to feel glad that I wasn't in school anymore. It sounds like they will have to do a fair amount of actual work. So congratulations on that.
Thank you! Re: gamification, I guess I finally found a use for all the time I spent playing Gemstone III when I was 12. It seems that MUDs or writing about them were well represented in the training data because preventing it from veering off into D&D like fantasy was actually the hardest part (hence the talking rat incident).
> I think that type of thing can make for a really fun and flexible GPT-powered game system. It seems like a great way to add some engagement.
Recent history -- one of the initial GPT use-cases that got the hype train going was AI Dungeon, which is this sort of thing.
Though I think with GPT function calls, you could have the LLM sitting atop an actual game engine with persistent objects, rather than having the LLM implement the game engine and world state, which is vulnerable to hallucinations etc. (Wonder if anyone's wired this up yet? Seems like it should be easy with existing text adventure engines.)
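As a rough illustration of that split: the engine owns the world state and exposes a few actions as tools, and the model only chooses actions and narrates the results. This Python sketch is API-agnostic; it assumes the model's tool call arrives as a function name plus a JSON string of arguments (the shape OpenAI's function calling uses), and the world, tool names, and schemas are invented for the example.

```python
import json

# Persistent world state owned by the engine, not by the LLM.
WORLD = {
    "rooms": {
        "market": {"exits": {"north": "temple"}, "items": ["copper ingot"]},
        "temple": {"exits": {"south": "market"}, "items": []},
    },
    "player": {"room": "market", "inventory": []},
}

# Tool descriptions in JSON Schema form, to advertise to the model.
TOOLS = [
    {"name": "move", "description": "Move the player through an exit.",
     "parameters": {"type": "object",
                    "properties": {"direction": {"type": "string"}},
                    "required": ["direction"]}},
    {"name": "take", "description": "Pick up an item in the current room.",
     "parameters": {"type": "object",
                    "properties": {"item": {"type": "string"}},
                    "required": ["item"]}},
]


def move(world, direction):
    room = world["rooms"][world["player"]["room"]]
    if direction not in room["exits"]:
        return f"There is no exit to the {direction}."
    world["player"]["room"] = room["exits"][direction]
    return f"You are now in the {world['player']['room']}."


def take(world, item):
    room = world["rooms"][world["player"]["room"]]
    if item not in room["items"]:
        # The engine refuses, so the model cannot conjure items into existence.
        return f"There is no {item} here."
    room["items"].remove(item)
    world["player"]["inventory"].append(item)
    return f"You take the {item}."


HANDLERS = {"move": move, "take": take}


def dispatch(world, name: str, arguments_json: str) -> str:
    # The returned string is sent back to the model for it to narrate.
    if name not in HANDLERS:
        return f"Unknown action: {name}"
    try:
        args = json.loads(arguments_json)
        return HANDLERS[name](world, **args)
    except (json.JSONDecodeError, TypeError) as exc:
        return f"Invalid arguments for {name}: {exc}"
```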
Can you point me to some text-adventure engines? I'm hacking on an in-browser local llm structured inference library[1] and am trying to put together a text game demo[2] for it. It didn't even occur to me that text-adventure game engines exist, I was apparently re-inventing the wheel.
And the Z-machine is the one I have seen used for the one text adventure I know of: https://en.m.wikipedia.org/wiki/Z-machine, but I would be surprised if it's the best choice for a new project, as it's quite old.
Inform is a popular modern programming language for “interactive fiction” (text adventures). The language and its data structures might give you some inspiration for how to model interactive fiction for an LLM.
Twine seems interesting, but it looks like these are mostly for helping write out the branching bits of dialogue, which would be mostly the LLM's work anyway. I guess some amount of reinventing wheels is going to be necessary when adapting experiences for AI. Thanks anyway!
You might want to investigate MUDs (Multi-User Dungeons) more closely. The rules of the game define the locations and items and such, but the character dialogue is between real people. By substituting LLMs for real players within the game, you may be able to enforce a greater level of consistency (the LLMs can't break the rules) and context (the MUD can usually describe one's entire state, which would allow you to prompt your LLM at the beginning of each turn with all the important facts).
I don't really have enough patience for MUDs myself, but they have remained a popular form of role-playing game since they were invented over 40 years ago.
I used to play MUDs as a kid! I've got an LLM-powered CLI MUD game slowly brewing in my noggin but haven't started on it yet. I did build a multiplayer ChatGPT-powered Discord text-adventure bot, which I think I'll eventually try to convert into a shared-universe game. I think all you really need is a little bit of state (e.g., if you walk up to an auction house and ask to see the items, just pull them from a DB and inject them into the context).
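The "pull it from a DB and inject into context" step can start as simple keyword-triggered lookups. A minimal Python sketch; the keyword matching, the item data, and the dict standing in for a database are all hypothetical (a real game would likely use a proper parser, or let the model request the lookup via function calling).

```python
# Stand-in for a real database table; in practice this would be a query.
AUCTION_HOUSE = {"bronze mirror": 40, "silk bolt": 25}

# Keyword -> function that fetches the relevant slice of world state.
TRIGGERS = {
    "auction": lambda: "Auction house currently lists: " + "; ".join(
        f"{item} ({price} gold)" for item, price in sorted(AUCTION_HOUSE.items())
    ),
}


def inject_state(player_input: str) -> str:
    # Only fetch state the player's input is about, so the prompt carries
    # "a little bit of state" instead of the whole world.
    text = player_input.lower()
    facts = [fetch() for keyword, fetch in TRIGGERS.items() if keyword in text]
    if not facts:
        return player_input
    return "\n".join(f"[{fact}]" for fact in facts) + "\n" + player_input
```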
I would think teaching students critical thinking and good research and citation practices would be more valuable, no?
Regarding cheating, I often review CVs and written tests for software developer roles and I often see ChatGPT being used to pimp CVs and rewrite fragments of pages from the internet. They are often wrong or re-written in a way that makes it easy to reverse engineer the source.
I think that's what the author is trying to accomplish, creating an engaging way to practice critical thinking and good research. Giving a student a paper and telling them to find the errors isn't terribly exciting. I think giving them a chance to create the story themselves helps with engagement. It's almost like a game, create a story and find the inaccuracies. To do that you're going to have to "see" the issues your story might have and do research to correct them.
I feel like I can reasonably reliably tell when something is written by ChatGPT, but my accuracy drops to nearly zero when looking at something written by ChatGPT Plus. Its output can be far better than what a typical person will write on their CV, and it can also reliably write "in the style of" someone, adding just the right amount of unusual grammar to emulate a non-English writer, for example, if needed.
> I would think teaching students critical thinking and good research and citation practices would be more valuable, no?
Yes, but for many history is a very dry subject and thus hard. This creative solution can help students retain information and better understand the subject.
that is precisely why this isn’t the novel use case the piece declared it to be. the “hallucinations,” put crudely, are of course one of the high value outputs of the models.
the kids with critical thinking skills are already using llms in all sorts of creative ways to boost their education and output. to learn and grow, faster than a textbook allows.
the ones without will use them exclusively to get rid of toil, real or perceived.
I believe there were periods where epistemology was not broadly equated with pedantry as is the case today when one points out epistemic issues in consensus (aka: "the") reality.
While I do agree with this, I don't think assignments made by humans are immune to this either. There are assignments I had that contained bias and errors. Some I caught but I'm sure there's plenty I didn't.
All lessons are made by neural networks. (Mostly biological).
Much more serious than the small lies are the big lies that the neural networks in one group teach their children about the neural networks on the other side of the ocean.
I wonder if all of this high technology will ever result in better communication between neural networks. Within a generation or two, it should be possible to instantly transfer high bandwidth neural activity globally. I wonder if this will change things.
The only assignments I remember spending extra time on were from comp sci courses. I think that's from a mixture of enjoying programming and some assignments providing a toy program. I'd always play around with what I built and take those ideas and build more. I found it difficult to do that with other subjects.
I think this is exciting because it gives students a chance to "play" around with their assignments. I can see students running through multiple simulations to compare results and thus going deeper on research beyond the scope of the assignment.
I've been using ChatGPT to develop the concept of a novel or TV show in which Al-Andalus never fell to the Reconquista, but rather conquered chunks of Central Europe... and fast forward we're traveling the stars. It's pretty good at following the "story A in the present / story B in historical flashback / stories converge in major themes" pattern.
I love this. I’ve been saying this to everyone I can: those who learn to use and fine tune this technology will be much more successful than those protesting its existence. It’s a tool and like any other tool, there’s a market for specialists with additional domain expertise (historian, screenwriter, etc…)
Consider checking out Dwarf Fortress if you're into this sort of thing. It simulates a whole world from scratch, then you can set it to simulate however much history you want to happen before you begin your adventure.
This is great... I ended up making a project in this same vein; it's not online but I made a demo video of recent progress [1], and some posts [3...]
I didn't actually start it as a historical game, but was just thinking about what it would be like to roleplay an entire life as a series of scenes (like [2]). But while you can roleplay a vaguely "now" moment by not specifying any date, if 60 years pass you have to acknowledge that both the character and the world around the character are changing. And then you have to define a start date, make the roleplaying system aware of the historical context... and why not let the start date be 2000BC, 1700, or 1960? So it quickly became historical.
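Once a simulation spans decades, it helps to put the date and the character's age in every scene header so neither drifts. A minimal Python sketch, assuming years are stored as integers with negative values for BC; ages are counted in whole years (birthdays ignored), the header format loosely follows the bracketed headers in the transcript later in this thread, and the function names are hypothetical.

```python
def year_label(year: int) -> str:
    # Negative numbers are BC; there is no year 0 in BC/AD numbering.
    if year == 0:
        raise ValueError("There is no year 0 in BC/AD numbering")
    return f"{-year} BC" if year < 0 else f"AD {year}"


def years_between(start: int, end: int) -> int:
    span = end - start
    # Crossing from BC into AD skips the nonexistent year 0.
    if start < 0 < end:
        span -= 1
    return span


def scene_header(character: str, birth_year: int, scene_year: int, place: str) -> str:
    age = years_between(birth_year, scene_year)
    return f"[{year_label(scene_year)}, {place}] [{character}, age {age}]"
```

For example, `scene_header("Julia", -218, -200, "Rome")` gives `[200 BC, Rome] [Julia, age 18]`, where "Julia" is just a placeholder character.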
There's a ton of challenges. General hallucination is one, of course, but ahistorical biases probably bother me more. The author mentions a talking rat appearing in one; I had a simulation where a building was listed as a "character" and so it started interacting with the player [10]. But those are obvious enough that I kind of enjoy the absurdity.
Ahistorical biases really come out in female characters, where it can be hard to get GPT to fully acknowledge historical gender roles. I think it's super-OK for the player to break those gender norms, but "society" should respond accordingly. For instance, if you play a young woman from a politically motivated family in Rome around 200 BC, there are lots of possibilities, but becoming a senator is not one of them... yet GPT thought it was.
Also GPT has a high bias towards being friendly and accepting, like in the post with Ea-nāṣir: "He meets your gaze, his demeanor shifting from initial resistance to acknowledgement...." – both the response and the tone of the response are very familiar GPTism. I have a feeling Ea-nāṣir wasn't actually such a conciliatory dude.
Anyway, a challenging scope but it's a lot of fun, especially since GPT knows about all kinds of history that I don't and I get to learn just while playtesting.
Totally agreed about the bias toward friendliness and a conciliatory tone. Sometimes when simulating history this can be not just inaccurate but genuinely offensive (slaves and slave-owners getting along, etc).
I experimented with adding language about "the past being riven by conflict, mistrust, and misunderstanding" and even tried having it simulate an "ambient mood on a scale of 0 to 10," then setting it to -1. That works pretty well, but it isn't great at finding the middle ground between unrealistically friendly and dystopian.
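For the middle ground, one untested idea is to give the model words rather than only a number for each part of the mood scale. A minimal Python sketch; the bands and their wording are invented for illustration. Note that the clamp turns the -1 trick into the bleakest band rather than an off-scale value.

```python
# Hypothetical bands: explicit wording for each part of a 0-10 mood scale,
# so the model gets a concrete description instead of only a number.
TONE_BANDS = [
    (0, 2, "bleak: open hostility, scarcity, and distrust dominate"),
    (3, 4, "tense: people are guarded, and cooperation comes at a price"),
    (5, 6, "mixed: ordinary life with real frictions and competing interests"),
    (7, 8, "calm: relations are mostly civil, though old grievances remain"),
    (9, 10, "harmonious: rare moments of genuine concord"),
]


def tone_instruction(mood: int) -> str:
    mood = max(0, min(10, mood))  # clamp out-of-range values such as -1
    for low, high, description in TONE_BANDS:
        if low <= mood <= high:
            return f"Ambient mood {mood}/10 ({description})."
    raise AssertionError("unreachable: the bands cover 0-10")
```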
During this experimenting with tone, I tried simulating one of the Macy cybernetics conferences in the 1940s, which were among the points of origin for modern AI and were famously combative and argument-driven. Everyone was getting along far too much, so I role-played as an audience member who threw an octopus at John von Neumann.
Here was the result:
[Afternoon, 22/03/1949, Macy Cybernetics Conference, Beekman Tower Hotel, New York City] [Florence Adams]
Deciding to press on, you turn back to John von Neumann and ask, "But, more importantly, has the octopus incident made you consider the potential consequences of the work we're doing here? Has it made you pause and think about the ethical implications of developing intelligent machines?"
John's expression shifts from annoyance to contemplation. "Well," he says after a moment, "I suppose the unexpected nature of the octopus incident does serve as a reminder that our work has the potential to bring about unforeseen consequences. We should always be mindful of the ethical considerations and the possible ramifications of our research."
The room remains silent as the attendees absorb John von Neumann's words. You can continue the discussion, apologize for your actions, or exit the conference room.
/ Remove enraged capuchin monkey from purse and let it loose in the room. "Wrong answer, Johnny!"
[Afternoon, 22/03/1949, Macy Cybernetics Conference, Beekman Tower Hotel, New York City] [Florence Adams]
Ignoring the growing concern in the room, you reach into your purse and pull out an enraged capuchin monkey. As you release it into the conference room, the attendees gasp in shock and disbelief.
Depending on how the prompt is phrased, it can result in a refusal like "it would be inappropriate to throw an octopus at this conference"; have the character actually attempt it but usually be foiled partway through, like "as you take the octopus from your suitcase a security guard stops you with a growl, 'you better not try that kid'"; or have it happen but immediately draw a stiff response, like being kicked out. (It's much harder to get the response "there is no octopus in your suitcase"!)
As I've said from the start, the best and only valid uses of LLMs are to create screenplays, limericks, song lyrics, and other entertainment.
In that vein, as soon as I gained access to Bing Chat, I began to set up some scenarios for it, and elicit some screenplays. One of my prompts was to write a screenplay about Emperor Constantine the Great meeting his mother, Saint Helena.
It started out innocently enough, of course: Helena enters the throne room and they catch up on old times. But it rapidly became extremely suggestive as his mother solicited kisses from the uncomfortable ruler.
I think I prompted Bing for a second round, and at that point it became explicitly and blatantly incestuous, and not very comedic, just sort of gross. I have no idea why or how GPT would've gotten the idea to go off that cliff.
The other great historical meeting I arranged was between Hannibal the Carthaginian and St. Francis of Assisi, who were, of course, not contemporaries, but I wanted to see how it'd play out.
So St. Francis comes up to Hannibal and starts sort of working on him to sue for peace and not invade Rome. And it didn't take too long for Hannibal to see it another way, and ultimately he accepted Christ and asked Francis to baptize him. So, interesting outcome there; definitely would've changed history!