Hacker News
Solving Advent of Code Puzzles with GitHub Copilot: Day 1 (freddiev4.xyz)
100 points by FreddieV4 on Dec 2, 2021 | 65 comments



I can't help but think that whenever copilot is able to help with a solution, it's because the language itself doesn't allow you to express solutions concisely. I looked at the solutions in the article, and very little of the code is actually part of solving the problem. There are a few bytes of relevant information in each line.

I usually use Common Lisp, and whenever I find myself typing boilerplate of a type that copilot would be able to help with, it's because I haven't used the right abstractions.

As an extreme example of this, I solved day 1 in KAP (my APL dialect) and for the second problem, my solution was 8 characters long. There was no real need to have anything type that out for me.


Agreed, even with trivial JS constructs parts 1 and 2 can be solved (including parsing) in 58 chars (and only O(n), not O(n*w) like Copilot's solution). The Copilot prompt alone for just the logic part of part one was 98 chars.

Surprise, programming languages are more efficient at encoding logic than English.


It could be that my use case was simple enough, but for a class I'm taking, we were tasked with creating some code for a web application, as well as unit tests for it, and while Copilot wasn't of much use for coding the actual application, it pretty much nailed autogenerating the test cases.

Now, as I said, this was for a class, so it was simple. Sadly, work policies at my employer forbid the use of Copilot (or anything transmitting code out of the company), so I can't really test it with "real code".


It's a question of whether Copilot can be "creative". I'm not sure, but it did manage to solve some of the harder leetcode questions here https://youtu.be/FHwnrYm0mNc and I'm curious how it will fare on the later Advent of Code problems.


The exact answers to leetcode questions are probably included many times in the Copilot dataset, in many programming languages. Fresh Advent of code problems are much more interesting.


Copilot is trained on examples of programs, so it will not be able to produce programs unlike any it has seen before.

This will not be easy to see in any kind of coding exercise. It is very hard to come up with coding problems with solutions that have never been written before, at least for a coding competition like Advent of Code. For example, we don't know of polynomial time solutions to the Travelling Salesman problem but it wouldn't make sense to have that as a coding challenge in Advent of Code, or Leetcode, or anywhere else.

Since nobody has ever produced a program solving such a problem, Copilot will not be able to produce a solution either, but it's not designed to do that anyway.


I'm fascinated by APL but man is it hard to get used to. I keep waiting for someone to build something like APL but sufficiently modernized as to actually be useful to me :)


That is what I'm trying to do, actually. The core syntax for fundamental array operations is still the same, but there is a layer of regular imperative structures around it, so you can write code that looks like JS if you want. It's just not as efficient.

It's not ready for actual use yet, but any suggestions would of course be appreciated.


Hey! For folks who are doing this year's Advent of Code in Jupyter Notebook:

I'm one of the founders of https://cogram.com, a coding assistant for data scientists. Cogram brings Copilot-like features to Jupyter Notebook.

You can use Cogram by writing ordinary code (and Cogram will complete it), or by writing comments (in which case Cogram completes the code that matches the comment). We've also launched a web app that generates SQL statements from descriptions in simple language.

We're just starting out and would love it if you could try it out, give feedback, and spread the word if you like. It doesn't cost anything -- we have a generous free tier (1000 completions / month).


It’s interesting to see the mix of local identifiers and public code that copilot is able to generate. In the end I disabled it because I had to keep stopping to check its suggestions, really breaking the programming flow. But I imagine with some content and UI improvements it will be invaluable in the future.


I've tried it with Go and it's impressive. However, as you said, having to check generated code is tedious. I feel like I can trust Copilot's code less than what a colleague would produce. So then my coding sessions become code review sessions. I don't want most of my coding to be like that.


Yes, I find it quite fun as long as I can turn it on and off with a single keypress. Then you can have it mostly off, and consult it only when you want to.


Isn’t this working by finding someone else’s solution to the day 1 challenge? As in, since the OP copied and pasted the prompt, it was relatively easy for Copilot to match to the correct solution (also how it knew the window size for the second half was 3)?


I would think today's code is not part of Copilot's training data yet...


Surprise! Everything everyone types into Microsoft software is recorded and uploaded!


I'd bet good money they aren't ingesting, retraining, and pushing production Copilot models on a sub-daily basis.


I think that's kind of the point behind GPT-style models: updates are cheap ... not only in terms of source material but also in terms of feedback on which suggestions are accepted.


I think you need to actually pull the new data from GitHub and fine-tune the model. Maybe it's done very frequently, but that seems unlikely.


Training the model that often and that fast would be impressive. But I don't think that's the case, because they haven't bragged about it.


I get that the point is Copilot and not the code, but

    sum([arr[i] > arr[i - 3] for i in range(3, len(arr))])
will solve part 2. (Obviously for the first part subtract 1 and start the range at 1.)

You don't need functions (but that's kind of an opinion, so whatever).

More importantly, to check if a rolling sum is increasing you don't need to compute that sum. I wonder if Copilot can ever figure that out.


I haven't dug into this too deeply, but how does checking the current arr index against the value 3 positions back work to correctly calculate the answer?


Another poster had the same thing, but explained it a bit further. I get it now. Clever!


> How can I come up with solutions to puzzles more quickly?

If that's the goal, an obvious solution may be to skip the useless comments, type annotations, and verbose function names :) Solving day 1 in Python is about 10 lines of code without even trying to golf it.
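Something along these lines fits comfortably in that budget (a rough sketch, assuming the puzzle input sits in a local input.txt):

    with open("input.txt") as f:          # hypothetical input file
        depths = [int(line) for line in f]

    part1 = sum(depths[i] > depths[i - 1] for i in range(1, len(depths)))
    part2 = sum(sum(depths[i:i + 3]) > sum(depths[i - 1:i + 2])
                for i in range(1, len(depths) - 2))

    print(part1, part2)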


He copied and pasted that comment from the problem.


There's no such text in the problem - see https://adventofcode.com/2021/day/1. That comment describes the solution, not the problem. The program's output is also annotated; you don't do that if you're trying to be fast, since you already know what the output is going to mean.


It would be interesting to see if it's possible to just get Copilot to give you other people's solutions as often people upload their solutions to GitHub.

The comment to trigger Copilot would be something like the following:

# Advent day1 part1


No. It doesn't fetch new code from GitHub or anywhere else; the model has already been trained and only uses the data from that training.


What is with the recent "grassroots" Copilot postings that go uncontradicted in the comment section? Are the critics worn out and now the code laundering machine is pushed by sheer volume of postings?


Regardless of how it found it, or what it was basing it on, puzzle two is missing the key "aha" with such a sliding window type of problem.

Which is that two of the values are the same in both windows you're comparing; you don't actually need to sum anything, just compare the unique values from each (i-3 and i, respectively).
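In Python, that observation boils down to a one-liner (a sketch; `arr` is assumed to be the parsed list of depths):

    part2 = sum(arr[i] > arr[i - 3] for i in range(3, len(arr)))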

If Copilot is trained on a corpus of average devs...I do worry about what this says about the state of our field.


Meh. The problem is so small and simple and straightforward, you don't need any clever "aha" moment. If anything, being clever just makes the code slightly less obvious, because the reader needs to get that "aha" moment too.

...and my solution did actually do things the way you described, so I'm not just being defensive about it :P


These types of puzzles can help educate people as to how things might need to be coded to work in more "real world" situations. At my previous job we used them as the basis for discussion and education amongst the team (hoping to do something similar at current company).

For example, taking 2021_day1, the naive solution that reads in all of the values and then iterates through, calculating and comparing sums, is all well and good; it satisfies the end goal of producing a solution.

What if you had 1,000,000,000 values in the input?

* A program that reads in and stores all of the values before iterating through them now consumes a not insignificant amount of memory.

What if you had to check the differences between the sums of 100,000 consecutive terms?

* A program that calculates and compares the sums (rather than just comparing the two non-common terms) will be doing a lot more unnecessary computation.

What if you had to take a seemingly endless input stream and report the running totals at 1,000,000 iteration intervals?

* A program that reads all of the input before iterating through it is now unsuitable.

..etc..
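To make that concrete, a constant-memory version in Python might look something like this (a sketch with illustrative names; it assumes the usual input.txt):

    from collections import deque

    def count_increases(lines, window=3):
        # Keep only the last `window` values, so memory use stays constant
        # no matter how long the input stream is.
        buf = deque(maxlen=window)
        increases = 0
        for line in lines:
            value = int(line)
            # The old and new windows share window-1 terms, so comparing
            # their sums reduces to comparing the entering vs. leaving value.
            if len(buf) == window and value > buf[0]:
                increases += 1
            buf.append(value)
        return increases

    with open("input.txt") as f:   # works just as well on an endless stream
        print(count_increases(f, window=3))

Calling it with window=1 covers part one.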

Programming is not just about getting the answer, it's knowing where you can do the bare minimum to do that, and knowing under what conditions your bare minimum solution will become impractical and what you would have to do to avoid this.

There's nothing stopping people using AoC as a toy to collect all of the stars (surprisingly few people have all 304 stars, it was under 700 last time Eric answered that question in July 2021). But it can also be used for a lot more than that.


I agree with the general principle of naive vs. specialized solution.

Using AoC day 1 is probably not a good example of it, however, because a lot of modern languages offer streaming iterators, so the naive, declarative solution will still not need to load all the values into memory at once.
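For instance, in Python the sum-the-windows version can stay lazy end to end; a sketch like this (input.txt assumed) only ever buffers a handful of values:

    from itertools import islice, tee

    with open("input.txt") as f:
        values = (int(line) for line in f)
        # Three offset copies of the stream yield the sliding windows lazily.
        w1, w2, w3 = tee(values, 3)
        sums = (a + b + c for a, b, c in
                zip(w1, islice(w2, 1, None), islice(w3, 2, None)))
        # Compare consecutive window sums, again lazily.
        s1, s2 = tee(sums)
        print(sum(cur > prev for prev, cur in zip(s1, islice(s2, 1, None))))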


I'm not sure what streaming iterators have to do with it; the thing I noted was simply that you only need two values at a time. How you get to them from the input is kinda immaterial; the observation is that you can create a more generalized solution (that will be efficient no matter the size of the input, and no matter the size of the window). In fact, if the windows are only offset by one, you don't even need to do any math. It becomes a trivial two pointer problem. Even with a streaming iterator, you're summing up values (either a lot of unnecessary summations across the entire window, or you're adding/subtracting individual values off, which is one step away from realizing the thing being subtracted, and the thing being added, are the only relevant inputs per iteration).

More broadly, it's a neat observation that you can determine the direction of a rolling sum without actually maintaining a sum. And certainly, you can do it without recalculating the sum(s) at every step.


It may be a neat observation, but you're not pushed in any way to recognize it so I struggle to see how it could be the point of the exercise. It's so dead simple to just code it up naively in thirty seconds that unless you notice the trick right off the bat (or already know it), no one's going to think twice that there might be a more efficient or clever solution.


>> so I struggle to see how it could be the point of the exercise

I think you make a mistake in assuming "the" point, as in, singular. From Advent of Code's own website - 'People use them as a speed contest, interview prep, company training, university coursework, practice problems, or to challenge each other.'

We also have seen people use them as an excuse to start doing something in a new language, and, per this post, to play with Github Copilot and understand what it's doing.

So...it's really what you want to make of it. -My- point was simply that if there's an ML model trained with all sorts of approaches devs use to solve sliding window style problems, and it manages to solve the problem but still misses such a generally useful optimization, it does call into question what it was trained on.


> I'm not sure what streaming iterators have to do with it; the thing I noted was simply that you only need two values at a time

Yes, iterators have nothing to do with the 'just compare the two changed values' optimization. I brought them up because the parent brought up issues of eager vs lazy loading and reading infinite input streams in chunk.


When you try to write this kind of thing quickly, it's hard to beat something like `sum(lines[i:i+3]) > sum(lines[i-1:i+2])`. You get points for time of submission, not for the code.


If you just imagine the sums written out, you can see the trick faster than you could type anything involving:

  lines[i+1] + lines[i+2] + lines[i+3] > lines[i] + lines[i+1] + lines[i+2]
It's a single mental step to eliminate `lines[i+1]` and `lines[i+2]` on both sides, arriving at `lines[i+3] > lines[i]`.


...but why would I write anything like that (or even imagine it) when I could just write the thing I wrote above? :) It directly encodes what the puzzle is asking for, so you don't even have to stop and think about it. Of course it can be logically simplified, but if you want to be relevant on the leaderboard you need to act in a matter of seconds.

The whole puzzle can be solved in just two lines of Python, but I certainly wouldn't come up with that in the one minute it took the person who solved it first, since I had to take a step back and notice quite a few things (even if they're ultimately rather simple observations) to come down to such a concise solution.
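For reference, one way such a two-liner might come out (a sketch, assuming input.txt, and leaning on the same shared-terms observation for part two):

    ns = [int(x) for x in open("input.txt")]
    print(sum(b > a for a, b in zip(ns, ns[1:])), sum(b > a for a, b in zip(ns, ns[3:])))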


>> ...but why would I write anything like that (or even imagine it)

Because in the example explaining the steps, it shows you that? I.e., 'The measurements in the first window are marked A (199, 200, 208); their sum is 199 + 200 + 208 = 607. The second window is marked B (200, 208, 210); its sum is 618'; even how the text is aligned on the page, the 200 and 208 are almost directly above/below each other.

Sure, yes, if your goal is just to solve it as quickly as possible for the internet points, by all means do the naive thing. But if you already "had to take a step back and notice a few things to come down to such (a) concise solution", it seems like maybe solving it as quickly as possible wasn't your goal.


Solving it quickly is the goal before sending a solution, since that's the only thing that officially counts in AoC. After doing that, I could take a step back, think about it and make it concise as a self-imposed challenge - which of course also involved not counting the sums.


Kinda presumptuous to call it "the goal", given the reason this submission even happened came from someone for whom it -wasn't- the goal. Not to mention the fact submissions don't close after the leaderboard fills.

"Officially counts" - sure, in the sense of being officially counted. Not in the sense of even what Eric recognizes as the reasons for Advent of Code existing (as there are a number of reasons why people might use it in the About section).


Of course you're right that not everyone cares about speed, but... why are you even saying this? This subthread started with an explicit "When you try to write this kind of thing quickly", and even the article itself asks "How can I come up with solutions to puzzles more quickly?", and yet you got here to argue that speed doesn't matter. For the purpose of this discussion "the goal" is clearly defined, but I'm afraid you missed it.


>> For the purpose of this discussion "the goal" is clearly defined, but I'm afraid you missed it.

Like how you apparently missed the other two bullets the article listed?

Fair that your original comment was about speed, but you're also dismissing the discussion after that (i.e., that you had to notice various nuances to the problem that slowed you down, and that this optimization was something directly noticeable just in the example given in the problem discussion), as well as the comment (i.e., mine) that started this thread, which was about the quality of code being pulled from GitHub and informing Copilot, and NOT about the speed of coming up with a solution on your own under time pressure, which is unaffected in this instance when using Copilot.

In short, you're saying "It's about speed; everything else doesn't matter", and I'm saying "It neither was about speed in my original post, nor is that the only possible goal". You're insisting the place you moved the goalposts is the only place they've ever existed.


> you had to notice various nuances to the problem that slowed you down

How can I ignore my own argument in this discussion that proves my point? :D

> In short, you're saying "It's about speed; everything else doesn't matter"

Of course not, I'm saying "various people have various goals, and noticing that you can simplify sliding windows doesn't matter when you're after speed and naively calling `sum` is faster to type".


It's going to be interesting to see how services like Copilot affect the quality of programmers who start their studies with it available as a tool. It could also have an interesting knock-on effect on hiring: someone could have a portfolio that's almost entirely generated but not a literal copy of anything else.


I suspect it would increase their chance of getting an interview, but they would struggle during the live coding portions.


I decided to use Rust yesterday for Advent of Code to learn something new, and had Copilot enabled.

Copilot suggested `into_iter` instead of `iter`, and my very first Rust experience was a battle with the borrow checker, trying to figure out wth a move is.


Yeah, I imagine having a robot write code in a language you're not already an expert in would be painful.

In my opinion the danger of automation is that anyone who knows less than the machine is not going to recognize a mistake. Plus, a robot can do the easy things first, and now you have a talent pool problem: no one is paying juniors to learn if a robot can do the work of a junior.


For the second puzzle, there is no need to calculate a sum of three elements. I only realized this myself after having thought about it a bit longer.


I can't even get into Github's Copilot pilot program. Are they still adding new users?


They do, but I believe the queue is sorted by users' GH activity.


It would be more interesting to see whether it starts suggesting anything relevant in the 15s, 30s, 45s, 1m, 10m, 1h intervals after a puzzle is released...


It's completely irrelevant.


I don't know ... the video at least is straight up bullshit, right? The terminal at the bottom of the editor session shows that the user has already run "copilot_solution_puzzle_two.py" with the desired answer before he types anything.


Hey, thanks for your comment. I actually did write and run the code once first, and it printed the correct answer, but I forgot to record it. So I deleted all of the code, typed it _again_ from scratch to record it, and ran the program _again_, and it yielded the same result (the second run is what's shown in the video).

This is also something I'm curious about regarding Copilot. I mentioned that "GitHub Copilot has a list of possible solutions to code completions, so I wonder if it’s just luck that the suggested solution was the correct one".

I'm going to test further to see if it just always produces the same code completion, or there's some bit of randomness to the completion.


>> This is also something I'm curious about regarding Copilot. I mentioned that "GitHub Copilot has a list of possible solutions to code completions, so I wonder if it’s just luck that the suggested solution was the correct one".

The Codex model on which Copilot is based had about 30% accuracy on the first solution to a coding problem, but 70% when it was allowed to generate 100 solutions and choose the one that passed unit tests. On the other hand, when the best-of-100 solution was chosen according to the probability assigned to it by the model, it scored 45% [1]. So it's kind of luck.

Basically, Copilot has no way to tell whether it's giving you a right or wrong answer, other than to select the answer with the highest probability according to its training set, which usually means the most common answer in its training set. So the probability that the answer you get is the "right" answer depends on the probability that the right answer is the most common answer to the problem you give it. If that makes sense?

__________

[1] https://arxiv.org/abs/2107.03374

See Figure 1.


That's already mentioned in the blog post: the author first wrote a solution in a normal way, and only then decided to try copilot.


I'm doubtful, too. How would Copilot know what to generate for the second function? How can get_depth_input be sufficient information to generate a function that reads a text file "input.txt" line by line and casts each line to an int? If Copilot truly does that, then it's heavily geared towards toy problems.
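For what it's worth, the helper being described would presumably come out something like this; this is just a guess at the shape, not Copilot's actual output:

    def get_depth_input():
        # Hypothetical reconstruction of the described helper, not
        # Copilot's actual suggestion.
        with open("input.txt") as f:
            return [int(line) for line in f]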


Because he literally already told it what he wants it to provide in his existing solution, which you can see in the tree pane on the left. This experiment means nothing.


Does Copilot use other files open in the editor? Are they becoming a part of the prompt?


> Does Copilot use other files open in the editor?

While I am not 100% sure of the sources, my use of Copilot makes me pretty sure it uses other open files in the editor and other files in the current project folder (whether or not open in the editor), and makes me suspect it may also use the past history of the current file (at least in the same edit session).


That sounds like too much input. Remember that Copilot is based on GPT-3, so its input size is limited to 2048 tokens.

I think it's simpler to assume that "get_*_input" is a common name for a function that reads input from a stream, and so this kind of string is common in Copilot's training data. Again, remember: GPT-3. That's a large language model trained on a copy of the entire internet (the CommonCrawl dataset) and then fine-tuned on all of GitHub. Given the abundance of examples of code on the internet, plus GitHub, most short programs that anyone is likely to write in a popular language like Python are already in there somewhere, in some form.

The form is an interesting question which is hard to answer because we can't easily look inside Copilot's model (and it's a vast model to boot). The results are perhaps surprising, although the way Copilot works reminds me of program schemas (or "schemata" if you prefer). That's a common technique in program synthesis where a program template is used to generate programs with different variable or function names etc. So my best guess is that Copilot's model is like a very big database of program schemas. That, as an aside.

Anyway I don't think it has to peek at other open files etc. Most of the time that would not be very useful to it.


You're right, I was wrong.

> GitHub Copilot uses the current file as context when making its suggestions. It does not yet use other files in your project as inputs for synthesis. [1]

[1]: https://copilot.github.com/


Well I just guessed. So... we arrived at the correct answer through our interaction? :)



