I had him as a teaching assistant when I was teaching data structures at Princeton back in Fall 2013. Princeton CS has its professors rotate through its courses as TAs every so often.

That semester, I made a slight mistake on the final exam where I asked students to create an algorithm that could find the second shortest path from s to every other vertex in a graph. I forgot to specify that the second shortest path should be simple (i.e., should not visit any vertex twice). Having to deal with non-simple paths makes the problem much, much harder.

None of the students figured it out in the time available, and I'm sure I would also have been stumped if I had tried to solve the problem. Bob figured it out, though. And then I remember he graded all 150 solutions to the problem himself, having a blast as he went through students' attempts at an effectively impossible problem.

Being that damn smart must feel amazing, almost like having a superpower.

I just tried o1, and it did pretty well with understanding this minor issue with subtitles on a Dutch TV show we were watching.

I asked it "I was watching a show and in the subtitles an umlaut u was rendered as 1/4, i.e. a single character that said 1/4. Why would this happen?"

and it gave a pretty thorough explanation of exactly which encoding issue was to blame.
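The thread doesn't reproduce o1's answer, but one common cause that fits this symptom is UTF-8 bytes being decoded as Latin-1 (ISO-8859-1). In UTF-8, "ü" is two bytes, and the second one (0xBC) is the "¼" character in Latin-1. A minimal Python sketch, assuming that is what happened with these subtitles:

```python
# "ü" (U+00FC) encoded as UTF-8 is the two bytes 0xC3 0xBC.
utf8_bytes = "ü".encode("utf-8")

# Decoding those bytes as Latin-1 maps each byte to its own character:
# 0xC3 becomes "Ã" and 0xBC becomes "¼".
garbled = utf8_bytes.decode("latin-1")

print(utf8_bytes.hex())  # c3bc
print(garbled)           # Ã¼
```

If the player then drops or fails to draw the leading "Ã", only the "¼" is left on screen, which would match what was seen. That last step is a guess on my part.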


4o’s answer seems sufficient, though it provides less detail than o1.


A common problem, no doubt, with a lot of training context. But man. What a time to be alive.

Damn, the model really goes to great lengths on those trivial-seeming but hard problems. Impressive.

That was interesting. I asked it to try to say something in another language, and she read it in a thick American accent. No surprise. Then I asked her to sing, and she said something like "asterisk in a robotic singing voice asterisk...", and then later explained that she's just text to speech. Ah, ok, that's about what I expected.

But then I asked her to integrate sin(x) * e^x and got this bizarre answer that started out as speech sounds but then degenerated into chaos. Out of curiosity, why and how did she end up generating samples that sounded rather unlike speech?

Here's a recording: https://youtu.be/wWhxF7ybiAc

FWIW, I can get this behavior pretty consistently if I chat with her a while about her voice capabilities and then go into a math question.

I'm curious to hear feedback, especially:

1. Given how little student time I want to spend on this, do pull requests / GitHub Actions tests (provided for them) / code reviews seem like the right team processes to introduce?

2. Each student's work is independent of every other student's. Is there some way to have them working on different pieces of the same problem, or having the students' solutions interact in some deeper way than random selection in a lottery? On my first post of this idea, a user named taftster imagined taking inspiration from Factorio in some way, with students each contributing some piece of a large system.

3. Is there a way we could naturally allow pull requests for the contest code or maybe even the README that establishes how to review pull requests? I'm thinking a bit about Fluxx or Nomic here.

On 2, you could recreate Axelrod's classic prisoner's dilemma tournament from 1980. Here's a short writeup about it: https://cs.stanford.edu/people/eroberts/courses/soco/project...

Another game you could try is the "guess 2/3 of the average" game.

You could make the assignment one game like these, or put a few together.
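To make the prisoner's dilemma suggestion concrete, here is a rough Python sketch of a round-robin tournament where each student submission is a strategy function. The function names, the strategy signature, and the `rounds=200` default are my own assumptions for illustration; the payoffs are the values conventionally used for this game (3 each for mutual cooperation, 1 each for mutual defection, 5 and 0 when one side defects).

```python
from itertools import combinations

# Payoffs as (points for A, points for B), indexed by (A's move, B's move).
# "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def tit_for_tat(my_history, their_history):
    # Cooperate first, then copy the opponent's previous move.
    return their_history[-1] if their_history else "C"

def always_defect(my_history, their_history):
    return "D"

def always_cooperate(my_history, their_history):
    return "C"

def play_match(strategy_a, strategy_b, rounds=200):
    """Play one iterated match and return (score_a, score_b)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pts_a, pts_b = PAYOFF[(move_a, move_b)]
        score_a += pts_a
        score_b += pts_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

def round_robin(strategies, rounds=200):
    """Play every pair of named strategies once and total their scores."""
    totals = {name: 0 for name in strategies}
    for (name_a, strat_a), (name_b, strat_b) in combinations(strategies.items(), 2):
        score_a, score_b = play_match(strat_a, strat_b, rounds)
        totals[name_a] += score_a
        totals[name_b] += score_b
    return totals

entries = {
    "tit_for_tat": tit_for_tat,
    "always_defect": always_defect,
    "always_cooperate": always_cooperate,
}
print(round_robin(entries, rounds=10))
```

This sketch skips self-play and random strategies, so it is a simplification rather than a faithful recreation of the original tournament. The appeal for the assignment is that each student's score depends on everyone else's submission.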

Like a lot of schools, our Data Structures course doubles as a soft intro to software engineering. I want them to know how to solve real problems efficiently and manage complexity.

Student time is incredibly precious. The hope here is that students will spend less than an hour on this, and in the process get a taste of how code reviews / automation can help teams function better. I'm hoping I can thread the needle and give them tools that will help them more efficiently complete their capstone project at the end of the semester, without spending too much of their time teaching them these tools.

FWIW, we have discussed the idea of a lower division software engineering class, and I was really hoping Pamela Fox would do this before she headed back to industry (interesting story there: https://blog.pamelafox.org/2022/05/my-experience-as-unit-18-...). We do have an upper div software engineering course as well, though it's not taken by most of our students.

This is a weird assignment idea. I’d love to hear what you think, how you might improve it, etc.

Some specific open questions I have:

1. How can I nudge students towards being creative with their comparators so that we get more variety?

2. Is there a natural extension that would get students working on the same files, yielding a richer demonstration of CI?

3. Could we allow students to somehow edit the contest code itself without breaking the entire idea of the project?

I really don't like this assignment idea as anything more than getting used to the assignment submission and grading system.

1. What is it teaching? Comparator<Integer> is trivial to implement, and you're adding no requirements on it (except maybe having a human readable shortname). Code reviewing a Comparator which merely has to compile and match the two simple rules of Comparators is not a useful code review. Writing CI tests that check those rules for common errors is quite easy (undermining the value of code review), but there's not much extra complexity to shove into the GitHub Actions to show their power. Maybe students will be amazed by a live scoreboard, but I don't expect that anymore. We had live scoreboards for (secretly submitted) assignments back in the 00s, and the world has become significantly faster-updating since then.

2. Incentivizing unique comparators adds in a lot of meta-work. Students will need to monitor the repository and the submissions of other students if the incentive is too high. Some students may wait until the last second to submit their solutions to avoid being detected, and yet other students may try to play spoiler and copy other submissions.

3. I don't like the structure where students submit two things (a number and a comparator) where the outcomes of the lottery are almost entirely based on the number. The comparator, which seems to be the intent of the lesson, isn't tied to the success of a particular student. And since you're only re-rolling comparators in the case of a 0 output for an input pair, there's a pretty significant chance that some student submissions won't even appear in the lottery computation tree.
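For reference, the kind of CI check mentioned in point 1 could look roughly like this. It is a Python stand-in for what would presumably be a Java test against Comparator<Integer>; the function names and the two example submissions are hypothetical. It brute-forces the two rules (sign flips when arguments are swapped, and transitivity) over a small set of sample integers.

```python
import itertools

def sign(value):
    return (value > 0) - (value < 0)

def find_contract_violation(compare, samples):
    """Return a description of the first broken rule found, or None."""
    # Rule 1 (antisymmetry): swapping the arguments must flip the sign.
    for a, b in itertools.product(samples, repeat=2):
        if sign(compare(a, b)) != -sign(compare(b, a)):
            return f"antisymmetry fails for ({a}, {b})"
    # Rule 2 (transitivity): a > b and b > c must imply a > c.
    for a, b, c in itertools.product(samples, repeat=3):
        if compare(a, b) > 0 and compare(b, c) > 0 and not compare(a, c) > 0:
            return f"transitivity fails for ({a}, {b}, {c})"
    return None

def by_last_digit(a, b):
    # Hypothetical valid submission: order integers by their last digit.
    return (abs(a) % 10) - (abs(b) % 10)

def rock_paper_scissors(a, b):
    # Hypothetical invalid submission: a cyclic "order" on residues mod 3.
    return ((a - b) % 3 + 1) % 3 - 1

samples = [0, 1, 2, 7, -3, 10]
print(find_contract_violation(by_last_digit, samples))
print(find_contract_violation(rock_paper_scissors, samples))
```

A check like this can only find violations on the sampled inputs, and it says nothing about whether the comparator does what its author claims.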

1. It's definitely not about the Comparator. The lesson is all about showing off a CI/code review process. This is intended to be a short exercise, expected to take no more than an hour. The CI tests can't check that the comparator actually does what it claims, they'll just test that it's a valid comparator. And I'd build on this in a future pair assignment, where they can use some or all of the steps of the workflow from this more gimmicky assignment.

2. I agree it's a bad use of student time to see if someone's submission is unique, but I suspect there are ways we could structurally push students to do something unique, e.g. gus_massa's post below.

3. Yeah, it's a little weird. The integer submission is just to keep the macguffin of the assignment going. Also for N submissions, we'll use at least N-1 comparators. If we use RNG that cycles through every comparator before repeating, we can make sure everyone gets used at least once.
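The "RNG that cycles through every comparator before repeating" from point 3 could be as simple as a shuffled bag that gets refilled when empty. A small Python sketch, with a hypothetical function name and plain strings standing in for the submitted comparators:

```python
import random

def comparator_draws(comparators, seed=None):
    """Yield comparators in random order, using every one before any repeats."""
    rng = random.Random(seed)
    while True:
        bag = list(comparators)
        rng.shuffle(bag)
        # Hand out the whole shuffled bag before reshuffling.
        yield from bag

draws = comparator_draws(["alice", "bob", "carol", "dave"], seed=1)
first_pass = [next(draws) for _ in range(4)]
print(sorted(first_pass))  # every submission appears exactly once per pass
```

Every block of N draws contains each of the N comparators once, so a submission only goes unused if the lottery needs fewer than N draws in total.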

About 1:

* Generate a list of 1000000 pairs of random numbers. Run the Judge of each student on that list to get a new list of ternary numbers. Compare the list with the lists of all the other students. The distance is the number of differences with the closest one. The one that is farthest away wins a chocolate.

Perhaps add the variants like 123111213 -> 321333231 to detect mirror criteria.

Add 2222... (everyone ties) to avoid giving the chocolate to a lazy student.

(I guess it's easy to cheat with a good hashing method instead of an interesting criterion. :( )

* Generate a list of 1500 random numbers and join it with the 1500 numbers of the students. Use the Judge to compare the number of a student with all the other numbers. The objective is that the number of the student must be close to the middle. Perhaps 60% or 75%, to encourage being creative and cheating to win, but not too much. (I think you don't like "cheating", but for me it adds some fun.)
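The first idea (distance to the closest other Judge, including the mirror and all-ties variants) can be sketched in Python as below. The names are hypothetical, and which outcome gets digit 1 versus 3 is my assumption; the only thing taken from the description is that 2 means a tie and mirroring swaps 1 and 3.

```python
import random

def random_pairs(count, seed=0, low=0, high=10**6):
    rng = random.Random(seed)
    return [(rng.randint(low, high), rng.randint(low, high)) for _ in range(count)]

def verdicts(judge, pairs):
    # Assumed encoding: 1 = negative result, 2 = tie, 3 = positive result.
    return [1 if r < 0 else 3 if r > 0 else 2 for r in (judge(a, b) for a, b in pairs)]

def hamming(xs, ys):
    # Number of positions where the two verdict lists differ.
    return sum(x != y for x, y in zip(xs, ys))

def mirror(vs):
    # Swap 1 and 3, keep 2: the verdicts of the reversed criterion.
    return [4 - v for v in vs]

def uniqueness_scores(judges, pairs):
    """Map each judge name to its distance from the closest rival verdict list."""
    lists = {name: verdicts(judge, pairs) for name, judge in judges.items()}
    all_ties = [2] * len(pairs)
    scores = {}
    for name, mine in lists.items():
        rivals = [all_ties]  # the "lazy student" baseline
        for other_name, theirs in lists.items():
            if other_name != name:
                rivals.append(theirs)
                rivals.append(mirror(theirs))
        scores[name] = min(hamming(mine, rival) for rival in rivals)
    return scores

def digit_sum(n):
    return sum(int(d) for d in str(abs(n)))

judges = {
    "natural": lambda a, b: a - b,
    "reverse": lambda a, b: b - a,
    "digit_sum": lambda a, b: digit_sum(a) - digit_sum(b),
}
print(uniqueness_scores(judges, random_pairs(1000)))
```

Here "natural" and "reverse" score 0 because each is the mirror of the other, which is exactly the case the mirror variant is meant to catch.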

I like these! And I encourage a little bit of cheating. I'm curious to see how the particularly clever students will subvert the system.

As it happens, I had a fun assignment all about subverting the system when I taught security at Princeton back in 2012. In the final assignment (assignment 8), students were given a Linux disk image of a hard drive owned by a guy named Nefarious who had committed a murder. This was an assignment originally created by J. Alex Halderman and Ed Felten, but I went a little extra in my version.

In the assignment text, I mentioned that Nefarious was originally arrested due to an anonymous tip from a pseudonymous "Cecco Beppe". Buried in /home/root on Nefarious's drive, there is an innocuously labeled file: CB.7z, which is notable only in that it is the only file on the drive dated 2012 (intentionally), years newer than all of the other files on the drive. If one went to the trouble of unzipping this file, they'd realize it is a very small disk image inside the disk image.

Booting this image dropped the user into a chat with a depressed artificial intelligence (a custom ALICE chat bot, but with some hard coded responses to advance my story, this was pre-GPT) who, when prodded with the appropriate keywords, told the tale of the AI's depression about the murder and attempted but failed self-deletion -- leaving the AI nothing more than a pathetic and mostly incoherent chat bot.

Further prodding led the AI to reveal the existence of secret assignment #9 as well as a URL for said assignment.

At the URL, there is a zip file containing a flawed pseudorandom generator, a file encryption tool that uses said PRGen, an encrypted file, and a truncated copy of the corresponding plaintext. The truncated plaintext explains that their next task is to complete decryption of the file, which they can do by exploiting the flaw in the PRGenerator.
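The comment doesn't say what the flaw in the PRGen was, so the following Python toy is purely illustrative and not the actual assignment. It assumes an XOR stream cipher whose keystream is the low byte of a 16-bit linear congruential generator. Because the low byte of the state only ever depends on the low byte of the seed, there are just 256 distinct keystreams, and a few bytes of known plaintext are enough to find the right one.

```python
def keystream(seed, length):
    # Toy generator (hypothetical): a 16-bit LCG that leaks only its low byte.
    state = seed & 0xFFFF
    out = bytearray()
    for _ in range(length):
        state = (state * 25173 + 13849) & 0xFFFF
        out.append(state & 0xFF)
    return bytes(out)

def xor_crypt(data, seed):
    # XOR with the keystream; the same call both encrypts and decrypts.
    return bytes(d ^ k for d, k in zip(data, keystream(seed, len(data))))

def recover_seed(ciphertext, known_prefix):
    # Only the low byte of the seed matters, so 256 guesses cover every keystream.
    head = ciphertext[:len(known_prefix)]
    for guess in range(256):
        if xor_crypt(head, guess) == known_prefix:
            return guess
    return None

message = b"Your next task is to resubmit HW1."
ciphertext = xor_crypt(message, 0xBEEF)

# The attacker sees the ciphertext plus a truncated copy of the plaintext.
guess = recover_seed(ciphertext, message[:8])
print(xor_crypt(ciphertext, guess))
```

The recovered value is not the original seed, only an equivalent one that produces the same keystream, which is all the attacker needs.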

The deciphered text then explains that their next task is to go back to the HW1 autograder and submit a new solution for HW1 (which, incidentally, was to develop a PRGen) -- but with the catch that they may only use print statements, i.e. they're just trying to trick the parser. This was based on a security flaw I'd discovered in Princeton's grader while I was working on their first Coursera courses.

Once students submitted this cheating print statement code via the usual Princeton web submit for HW1, the autograder activated a secret message that explained the final part of secret assignment 9, which was to steal the source code for the HW1 autograder and send it to me.

This last part could be done by simply writing code that opens the .class files and prints them to the screen when the autograder script runs.

I had around 10 students (out of ~150) figure this all out; it was great.

I agree with one of the sibling comments here. Comparator is just way too simple of an interface. And you've defined a lot of it in terms of a single individual, and maybe just slightly a "team" in the larger sense of the entire class. It's kind of a "boring" assignment.

Just an idea. What if you created some sort of "robot" contest that defined various areas of functionality? Each robot could have sensors, motor controls, and the ability to reason on these. The interfaces of the robot would be modular and pluggable by student code; each student would collaborate on their team's robot, providing the logic for one of the interfaces. This effectively forces students to collaborate in a single repository.

The competition could include things like:

a) Teams of {n} students would build a robot collaboratively. They would each need to contribute into a shared team git repository and have a suite of CI tools that builds their code. The system pulls each robot and puts it into a continuously running competition, evaluating each robot for fitness. The contest is ongoing, so that each team can improve their team score (with more commits) until the assignment due date. Grades are given according to the top performers.

b) Like (a), but each student may collaborate on multiple robots and issue pull requests across multiple team repositories. The student with the most accepted pull requests is the winner.

I like the idea of a student placing their code into a larger vessel; collaboration with others through the use of modern software development methodologies. I also like the idea of seeing the "score" of the team robot being improved over time, as the team sees their robot moving up the charts. This helps incentivize the team towards improvement. "Gamifying" the system is a great way of encouraging student involvement.

I love that you're doing this assignment. As a full-time software professional who has previously taught as an adjunct, I really appreciate that you're trying to teach concepts that are being actively used throughout industry.

[edit] As inspiration, think about the game Factorio as a potential model (without the graphics). Each student team would be responsible for a "line" on a production floor, each needing to process or assemble parts coming down the conveyor belt. The fittest team is the one that can correctly assemble the parts and deliver packages in the shortest time, etc.

I'll get pondering. Having some sort of Rube Goldberg-esque mega project was my original idea, but finding a specific form for it has been elusive.

Taking inspiration from Factorio in some way is an interesting proposition. Will consider!

