Show HN: GPT Repo Loader – load entire code repos into GPT prompts (github.com/mpoon)
373 points by mpoon on March 17, 2023 | 149 comments
I was getting tired of copy/pasting reams of code into GPT-4 to give it context before I asked it to help me, so I started this small tool. In a nutshell, gpt-repository-loader will spit out file paths and file contents in a prompt-friendly format. You can also use .gptignore to ignore files/folders that are irrelevant to your prompt.

gpt-repository-loader as-is works pretty well in helping me achieve better responses. Eventually, I thought it would be cute to load itself into GPT-4 and have GPT-4 improve it. I was honestly surprised by PR#17. GPT-4 was able to write a valid example repo and an expected output and throw in a small curveball by adjusting .gptignore. I did tell GPT the output file format in two places: 1.) in the preamble when I prompted it to make a PR for issue #16 and 2.) as a string in gpt_repository_loader.py, both of which are indirect ways to infer how to build a functional test. However, I don't think I explained to GPT in English anywhere how .gptignore works at all!

I wonder how far GPT-4 can take this repo. Here is the process I'm following for developing:

- Open an issue describing the improvement to make

- Construct a prompt - start with using gpt_repository_loader.py on this repo to generate the repository context, then append the text of the opened issue after the --END-- line.

- Try not to edit any code GPT-4 generates. If there is something wrong, continue to prompt GPT to fix whatever it is.

- Create a feature branch on the issue and create a pull request based on GPT's response.

- Have a maintainer review, approve, and merge.

I am going to try to automate the steps above as much as possible. Really curious how tight the feedback loop will eventually get before something breaks!
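A minimal sketch of the prompt-construction step above, assuming gpt_repository_loader.py has already written the repository context to output.txt and the issue body is saved in issue.md (both file names are assumptions):

    # build_prompt.py - hypothetical helper for the "construct a prompt" step
    from pathlib import Path

    repo_context = Path("output.txt").read_text()  # output of gpt_repository_loader.py, ends with --END--
    issue_text = Path("issue.md").read_text()      # body of the opened GitHub issue

    # Append the issue text after the --END-- line, as described above.
    Path("prompt.txt").write_text(repo_context + "\n" + issue_text + "\n")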




This is awesome, can't wait to get api access to the 32k token model. Rather than this approach of just converting the whole repo to a text file, what I'm thinking is, you can let the model decide the most relevant files.

The initial prompt would be, "person wants to do x, here are the file list of this repo: ...., give me a list of files that you'd want to edit, create or delete" -> take the list, try to fit the contents of them into 32k tokens and re-prompt with "user is trying to achieve x, here's the most relevant files with their contents:..., give me a git commit in the style of git patch/diff output". From playing around with it today, I think this approach would work rather well and can be like a huge step up from AI line autocompletion.
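A rough sketch of that two-pass flow, assuming the openai Python client; the task string, path walking, and prompt wording are all placeholders, not anyone's actual tool:

    from pathlib import Path
    from openai import OpenAI

    client = OpenAI()
    task = "add a .gptinclude option"  # hypothetical task description

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    # Pass 1: let the model pick relevant files from the path list alone.
    paths = [str(p) for p in Path(".").rglob("*") if p.is_file()]
    picked = ask(
        f"Person wants to do: {task}. Here is the file list of this repo:\n"
        + "\n".join(paths)
        + "\nGive me a list of files you'd want to edit, create or delete, one per line."
    )

    # Pass 2: re-prompt with the contents of those files and ask for a patch.
    contents = "\n".join(
        f"--- {p} ---\n{Path(p).read_text()}"
        for p in picked.splitlines() if Path(p).is_file()
    )
    print(ask(
        f"User is trying to achieve: {task}. Here are the most relevant files with their contents:\n"
        f"{contents}\nGive me a git commit in the style of git patch/diff output."
    ))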


Please see the following repos for tools in this area:

https://github.com/jerryjliu/llama_index

and/or

https://github.com/hwchase17/langchain


Maybe someone can correct me, but my understanding is that you would calculate the embeddings of code chunks, and the embedding of the prompt, and take those chunks that are most similar to the embedding of the prompt as context.

Edit: This, btw, is also the reason why I think that this here popped up on the hackernews frontpage a short while ago: https://github.com/pgvector/pgvector
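A minimal sketch of that embedding-retrieval idea (plain numpy cosine similarity rather than llama_index, langchain, or pgvector; the model name and chunking strategy are assumptions):

    import numpy as np
    from openai import OpenAI

    client = OpenAI()

    def embed(texts: list[str]) -> np.ndarray:
        resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
        return np.array([d.embedding for d in resp.data])

    def top_chunks(chunks: list[str], prompt: str, k: int = 5) -> list[str]:
        # chunks could be e.g. individual functions or fixed-size windows of the codebase
        chunk_vecs = embed(chunks)
        prompt_vec = embed([prompt])[0]
        sims = chunk_vecs @ prompt_vec / (
            np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(prompt_vec)
        )
        return [chunks[i] for i in np.argsort(sims)[::-1][:k]]  # most similar chunks become the context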


This sounds like a reasonable start. Eventually we need to get to the point where we can expose an API for models to request additional information on their own.


Exactly, but this has some scary implications in the future - imagine when it is common practice to allow AIs API access as a matter of course...

When given a prompt, the crawling of many APIs is triggered to build the response - the power of such activity/features will be a scary tool for authoritarian goals.

Imagine if the prompt is "Select all users who have political beliefs, posts, comments, links from APIs A, B, C, etc where sentiment appears to dissent from [party line]"


Imagine that these prompts are not triggered by humans anymore, but by the AI invoking itself.

Sam Altman takes comfort in the thought that their AI does nothing without a human prompting it, so it has a human in the loop, as a circuit breaker if you will.

This assumption is rapidly becoming a mere hope, as right now probably hundreds of developers are working on systems which, when put into production and connected to other systems, might just come down to: the AI is calling itself, and giving itself orders.


Imagine that you just out-did me on how dystopian this AI cyberpunk future could get.

Imagine I want you to shoot me.

Imagine I want [whatever]GPT to make a cyberpunk Anime based on such...


It's just so slow for the autocompletion use case to do it like that. Ideally, you're never chaining serial requests to the LLM. Even if you stuff all the data into a single prompt, the execution time seems to be superlinear with the number of tokens, again getting super slow.


Yeah I agree it's too slow for autocompletion at the moment, but this would be for full feature implementations, not just autocomplete. For example, if I have a repo I want to add a table and rest api implementation in, it can do this: https://imgur.com/a/mIJvaJr (ignore the formatting errors in the UI, somehow parts of it show up as code and others don't, but the api wouldn't have this issue, especially since you can use the system message to enforce output format).

I'm happy to wait even 30-60 seconds for this which I can easily evaluate, criticize (and the model will correct it) and then proceed to just patch and move on. I think the results from this will be much better with the 32k model, but remains to be seen.


Working with GPT becomes like coding in plain English.


There's a reason we don't code in plain English though. Natural language has ambiguities. This is the reason we invented programming languages.

It's best illustrated by the old joke:

  A programmer's wife told him "Go to the store and buy milk and if they have eggs, get a dozen." He came back a while later with 12 cartons of milk.
A good chunk of all bugs in software are down to the requirements being insufficiently well specified. Further, many bugs are the discovery of new requirements when informal specification encounters reality.

"Read from standard input into this byte array" doesn't specify what to do when the input exceeds the byte array.

When you overflow the buffer, you get a "well obviously you're supposed to not do that"... that wasn't stated at all.

When the function keeps going after a newline or a null byte or whatever, there's another "well obviously you're supposed to stop at those points". That was also not specified.

and so on.

At the point you're specifying all these cases and what to do when, it's so specific and stilted, you might as well be using a programming language.


Actually, programming languages were invented because speaking machine code was too much of a pain in the ass! Programming in English is a natural next step. Ambiguity is not an issue -- you keep speaking until it's resolved.

(We already program in English, in a sense, when we tell humans what we want, and they go code it. Now we'll just be telling machines.)


This type of well-defined language actually predates computers by several thousand years. Even way back in antiquity they used "programming languages" like these to get around the inherent ambiguities of natural language.

Originally as formulaic syllogisms and Aristotelian logic, but then onto other forms of codified language, formal logic etc.

Adding more words often makes things less clear, not more so. What you need is well-defined terms with no overloaded meaning.

> (We already program in English, in a sense, when we tell humans what we want, and they go code it. Now we'll just be telling machines.)

Humans get it wrong all the time though. A great many bugs arise from quite simply misinterpreting the requirements. Which leads to requirements becoming more formulaic and resembling a programming language.


I disagree that logic or math are the same as a programming language; a programming language is defined by the fact that a machine can execute it.

Plus also most math and logic is still communicated and developed in a mix of human languages (English etc) and ad-hoc, not rigorously defined notation; it's nowhere near the precision of a programming language.

Though you CAN of course grind it out at that level, if you want, but it's very unwieldy and not how people actually work.

If you're looking to deduce some sort of proof from well-defined principles and you wish to eliminate the possibility of error, then sure well-defined terms (a rigorous language) is useful.

If you're looking to produce a software artifact, just saying what you want in high-level terms and providing iterative natural language feedback is going to work great and be way nicer than trying to formalize everything.

(Maybe not true for low-level plumbing and things that need to be secure. But for like "build an app", "build a game", "make a shell script that does x", I think it will certainly end up being true.)


Well what you want is one and only one behavior. You're going to need to be specific to the point where what you're specifying is that singular behavior. An interesting example is a binary search, which is easy to informally specify, although it's shockingly difficult to produce a correct implementation. [1]

Given what you want is a singular specific behavior, not just sorta that behavior, but exactly that behavior, it makes sense to use a language that caters to specificity.

As an aside, we've experimented with more natural-language like programming languages before (COBOL, BASIC). They're a pain in the ass because you needed to type a novel to do even trivial things. The terseness of modern programming languages is a feature.

[1] https://ai.googleblog.com/2006/06/extra-extra-read-all-about...


You want one behavior from the perspective of the user's desire, but there will be a very large set of possible programs that will achieve that desire. So you just need to be precise w/r/t desire, not with the nitty gritty behavior. Traditional programming languages are great to expressing the nitty gritty behavior, and bad at capturing the desire.

Don't think of LLMs as akin to COBOL or Basic (or HyperTalk, is another great example) -- those are essentially rigid languages with a sprinkling of "Englishness" to try to make them seem less foreign to novice users. And yes, that just gets in the way once you become proficient in rigid languages.

But that's a completely different beast than programming in actual English.

So for example, the reason binary search (a simple algorithm) is hard to implement is because of the mismatch between easy to understand human desire and the pickiness of the rigid language designed for the machine.

Now, it is very easy to implement a correct binary search.

Tell ChatGPT:

> Write a binary search method in C#

It responds:

    public static int Search(int[] arr, int target)
    {
        int left = 0;
        int right = arr.Length - 1;

        while (left <= right)
        {
            int mid = left + (right - left) / 2;

            if (arr[mid] == target)
            {
                return mid;
            }
            else if (arr[mid] < target)
            {
                left = mid + 1;
            }
            else
            {
                right = mid - 1;
            }
        }

        return -1; // Return -1 if the target is not found
    }
We can also ask it:

> Prove it is correct

It responds:

> To prove the correctness of the binary search algorithm, we can use a technique called loop invariant. A loop invariant is a condition that remains true for every iteration of the loop. In this case, the loop invariant is that the target element, if it exists in the array, will always be within the range defined by the left and right pointers.

> Loop Invariant: At the start of every iteration, if the target element exists in the array, it must be within the subarray arr[left, right].

> Initialization: At the beginning, left = 0 and right = arr.Length - 1, which covers the entire array. So, if the target element is present in the array, it must be within the initial range.

> Maintenance: During each iteration, the algorithm examines the middle element (mid = left + (right - left) / 2). There are three cases:

> If arr[mid] == target, the target element is found and returned.

> If arr[mid] < target, the target must be in the right half of the subarray. Therefore, we update left = mid + 1. Since the array is sorted, we are sure the target is in the range arr[mid + 1, right] if it exists in the array.

> If arr[mid] > target, the target must be in the left half of the subarray. Therefore, we update right = mid - 1. Since the array is sorted, we are sure the target is in the range arr[left, mid - 1] if it exists in the array.

> In cases 2 and 3, we have updated either the left or right pointer, maintaining the loop invariant for the next iteration.

> Termination: The loop terminates when left > right. Due to the loop invariant, if the target element exists in the array, it must be in the range arr[left, right]. Since left > right, this range is now empty, which means the target element is not in the array, so the algorithm returns -1.

> In conclusion, the binary search algorithm is correct, as it either returns the index of the target element if it exists in the sorted array, or -1 if it doesn't.

(OK, I'd quibble with that proof a little bit -- it was a little hand-wavy on why the loop must terminate.)

How about the broken binary search you linked to?

Ask ChatGPT:

> Can you find the bug in this java binary search implementation? [Code]

It responds:

> The given binary search implementation appears to be correct, assuming that the input array a is sorted in ascending order. However, there is a potential issue with integer overflow when calculating the middle index mid. To avoid this, you can calculate mid using the following line:

    int mid = low + ((high - low) / 2);
So maybe now you can see how English might be a superior way to program.


Right, I brought it up as an instance of a class of problems that has this property of being easy to specify but difficult to implement correctly. It will know how to implement a binary search because a great deal of articles have been written about the correct way of implementing a binary search, and the pitfalls of this one particular problem are very well documented.

It's almost unique in that the problem has a corpus of literature about how difficult it is to implement correctly, which pitfalls are common, and how to solve them. ChatGPT being able to regurgitate this solution is not a good demonstration of its ability to solve general programming problems.


That's my general point: it's easier to say what you want than to implement it in a low-level (relative to English) language. Hence why English is a good programming language.

And LLMs aren't just good at binary search, they're good at lots of things.

Imagine you are in a room with a programmer who is unquestionably better and more expert than you are.

Now let's say you need to write a program. Would you be better off trying to write it yourself, or describing what you want to the better programmer, and letting them write it?

Obviously the latter!

Given a sufficiently advanced compatriot, English is the preferred programming language.

Now, are LLMs good enough? Probably not yet, but getting there rapidly!


And we are back with Cobol haha full circle


You should probably also include all relevant imports. So in C/C++ add all non-standard headers referenced by those files; in other languages simulate the import system, maybe pruning imported files to just the important parts (type definitions etc).


AFAIK, only the older models allow you to do fine-tuning; I'm not sure GPT-4 will let you create your own fine-tuned model, so basically with the API it will work the same as with the chat GUI.


Just remember the API charge is 6c per input token [1]. If you push 32k input tokens in, you're looking at $2000 per API call just as input.

You... might wanna consider a self hosted alternative for that use case, or at least do like, a `| wc` to get an idea of what you're potentially sending before calling the api.

[1] - https://help.openai.com/en/articles/7127956-how-much-does-gp...
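A token count is a better guide than `| wc`, since tokens aren't words; a minimal check before sending, assuming the tiktoken package and that the repo dump lives in output.txt:

    import tiktoken

    enc = tiktoken.encoding_for_model("gpt-4")  # cl100k_base tokenizer

    with open("output.txt") as f:
        n_tokens = len(enc.encode(f.read()))

    print(f"{n_tokens} tokens")  # compare against the model's context limit (and pricing) before calling the API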


6c per thousand tokens, so $2 per maxed out API call


oh, you're quite right, I didn't see that it was per /1k tokens. My bad!


Still super expensive. You can't build a business around it with these rates. It should be at least 1000 times cheaper if not more.


Depends how good it is now, doesn’t it? If it actually writes code (it does), which is ‘good enough’ compared to a 25-150$/hr human (it does for the lower part of that scale), the $2 is definitely a good business case. For instance, I had to write 1500 lines of js yesterday because one of my colleagues who took the task could not manage it in the previous 2 days and the deadline was today. We would’ve saved about 500$ in total doing it with that expensive API from the start. Now it was free besides my hourly wage (it took me little over an hour) so it made even more sense.


> which is ‘good enough’ compared to a 25-150$/hr human (it does for the lower part of that scale),

I'm not exactly sure how you define that scale; perhaps Minecraft bots or the like, where the damage after complete failure is self-contained to perhaps a few dollars or a few hours of human annoyance. I'm sure there are many niches where a 50% success rate of mass-generated programs can earn you big bucks.

But in my experience Codex does very limited reasoning about code paths. For the current state of the art, you are almost guaranteed to have catastrophic bugs in any non-trivial programs engineered by prompt.


When a bad senior or junior/medior delivers their pr, I (or some other skilled senior) review it and it has issues which I explain and we go in a fix loop. Often I just approve it and fix it myself as it takes too long. That is good enough as that’s going on in all companies. Gpt is the same only many times faster; the loop is instant and sometimes I give up and fix it myself, as it is not going to get it, just like some (depressingly many) humans.


Have you worked with ChatGPT to generate the edge cases/test data, then fed that in to get back out a function or whatever?

I tried it with phone number formats and it came up with more than I could.


Stupid question: "How could all the crypto mining infra be repurposed into GPU GPT prompting farms?"


depends on if you can build something that can classify what api to call.


Eventually you get to a point where the AI simply doesn’t know how to do something correctly. It can’t fix a certain bug, or implement a feature correctly. At this point you are left trying to do more and more prompt crafting… or you can just fix the problem yourself. If you can’t do it yourself, you’re screwed.

I wonder if the future will just be software cobbled together with shitty AI code that no one understands, with long loops and deep call stacks and abstractions on top of abstractions, while tech priests take a prompt-and-pray approach to eventually building something that does kind of what they want.

Or to hell with priests! Build some temple where users themselves can come leave prompts for the general AI to hear and maybe put out a fix for some app running on their tablet devices.


> I wonder if the future will just be software cobbled together with shitty AI code that no one understands, with long loops and deep call stacks and abstractions on top of abstractions

There's plenty of software out there that fits this description if you just remove "AI" from the statement. There's nothing new about bad codebases. Now it just costs pennies and is written in seconds instead of thousands paid to an outsourcing middleman firm that takes weeks or months to turn it around.


At least the shitty software might have unit tests


GPT can write pretty good unit tests imo.


Agreed. It knows what something is supposed to do fairly well.


co-pilot excels in writing tests. It removes so much of the tedium and boilerplate. It seems to understand fixtures once you define one and understands subsequent changes to the boilerplate such as changing usernames, ids, appending items to lists. It is such a massive boost in productivity that those using it are way ahead of those who resist.


That's great. I've done something similar with other tools. It certainly helps automate the drudgery in writing tests for code that I understand and wrote.


I don't see programming going away for this reason. Think about it, if you have to carefully describe what you want to do to an AI - you are just writing a program. Only a program is deterministic and will do what you tell it to, whereas an AI may or may not.

The future that I see, coding and AI are divided into two camps. The one is what we would call "script kiddies" today - people who don't understand how to write software, but know enough to ask the right questions and bodge what they get together into something that mostly works. The other camp would be programmers who are similar to programmers today, but use AI to write boilerplate for them, as well as replace Stack Overflow.


I'm already at the point where I get frustrated when GPT writes some incorrect code - despite it saving me enormous amounts of time. The appetite for productivity seems to be insatiable. I want to be able to create a million lines of code per month by myself.


> I want to be able to create a million lines of code per month by myself.

That sounds terrible. Who is going to read all that code when you need to understand and modify it? The LLMs, I guess?

I'd much rather be able to write 10,000 lines of code that can do what your million lines of code does. Better programming languages, libraries, and other abstractions are what we need.


>> I'd much rather be able to write 10,000 lines of code that can do what your million lines of code does.

Sure, who wouldn't. Unfortunately this is my hypothetical and I get to control what I mean by it. The million lines of code in my hypothetical is good quality, maintainable with reasonable density.

>> Better programming languages, libraries, and other abstractions are what we need.

In the entire history of languages, we've only managed about a 10X improvement via these mechanisms (that is being charitable probably). Several important things are still written in C which would mostly be recognizable to a programmer from 40 years ago. There are still problems to solve but I feel we are on the asymptotic section of the curve in this regard.


So relatable! I find myself getting irritated when it gets stuck on something and I have to intervene, yet it helps absolutely shred LOCs in some other cases.


"If we wish to count lines of code, we should not regard them as lines produced but as lines spent." — Dijkstra


I think these days, LOC is a reasonable metric for a peer-reviewed code base. It is unlikely that the lines of code in a given project can be reduced by 10X in a maintainable way. Can we easily reduce the LOC in the Linux kernel or Postgres by 10X for example? It seems unlikely.

LOC is a fuzzy proxy for the amount of information in a code base. It should never be used for any kind of productivity metric, however, since that is obviously dumb and easily gameable (win by writing terrible code). Of course most productivity metrics fall into the dumb/gameable category, so let's not use them.


Between GPT-3 and GPT-4, the precision required for prompts decreased significantly. In theory it should reach the point where a project-manager type person could describe what is needed and it would simply do it. The main thing missing from attaining that is that GPT-4 basically never responds to questions with requests for clarification; otherwise, a whole team of developers could be reduced to just one proofreader.


That’s not very impressive. With very little precision I can throw a few keywords into google and get a full answer and perhaps some code snippets from stack overflow to solve whatever problem I have in the moment.

Except you can’t just blindly copy code snippets, you have to read the author’s explanation and perhaps adapt things to your own code sometimes, or reject their solution entirely. GPT-4 can’t do this because it doesn’t actually know what the hell it’s doing, it’s just putting stuff together in a form that is most probably correct based on what it has seen in training data for past examples.

I fear for the layman who sees a bunch of AI generated code and think it must be right. Who knows what bugs, security flaws, or performance issues they will run into, that they have no idea how to solve or even to begin asking a prompt for.


> That’s not very impressive.

That’s kind of impressive.

I usually have zero luck with stack overflow except for super trivial things.

Some of the stuff I’ve done has no documentation and I had to find an example (like one other person in the entire history of mankind thought this was a good idea) on some random repo on GitHub. Or I’m implementing code from a paper written 30 or 40 years ago and there’s no example code to look at. I could ask on stack overflow but who needs that abuse?

Admittedly I just do this to amuse myself and think that having a LLM digest a paper and spit out code is the bee’s knees.


> Admittedly I just do this to amuse myself and think that having a LLM digest a paper and spit out code is the bee’s knees.

This is what I have been doing also. Verification/Implementation of Software Engineering papers will be juicy especially with image ingestion.


You must not have used GPT very much based on this response. It is pretty great.


The thing that makes ChatGPT (especially 4) great though is that everything is hyper customized to you and, given a little prompt context (which is pretty trivial to automate and will be built right in to our tools one way or another), it can produce working code that fits your exact needs extremely rapidly.

e.g., if I type into google, "Build me a React component todo list that loads its data using fetch() to /api/v1/todos and is styled with Tailwind", I'm going to get a bunch of stuff that I can maybe use, if I invest 15 minutes wading through cancer-inducing blogs. Whereas at least for trivial problems, ChatGPT just tesseracts something that's extremely close and you copy-paste it in, change a few things, boom, component done. Something doesn't work? Often a follow up message fixes it.


The problem is the unfounded assumption that the code will be shitty 10 years from now. You really want to make that bet given the extreme velocity of recent progress? My expectation is that the code will actually be good.


> software cobbled together with shitty AI code that no one understands

That’s my fear as well. Software may just get shittier overall (aka “good enough”), and in higher volumes, due to it taking less time to crank out using AI.


I think it's going to be the end of libraries, why would people bother with libraries when every service can be churned out in 5 seconds with ChatGPT ?

We're going to witness the greatest copy pasta in history and find out how that goes.


I think it will be a bit of both, why reach for that module when you can just get AI to write the 10% you need so quickly? (Which is actually a net gain I think in many cases, just look at leftpad drama) But the productivity boost from AI will also lead to more libraries available that do useful things in the areas it's not so sharp. They will also start to lean in to AI assisted coding styles too of course.

And once we get good tooling to inject in recent data, source code, etc so it can use those as well... gonna be great


I think the end of libraries would be bad, even for LLMs; they're not superintelligences, and the more spaghetti code they're asked to work with, the less likely they will be able to perform, because statistically it will be very hard to grok. They will increasingly have to deal with more and more entropy.

In fact, they probably perform so well because of the heavy use of libraries we see today.

Also side note:

I doubt you're ever going to be able to upload code to an LLM my friend, not a publicly accessible one. It would be quite dangerous? Could you imagine the attacks, the implanting of back doors etc? Don't think so.

I'd say scraping the web is probably going to be an increasingly dangerous pursuit for ChatGPT now people are getting to better understand the attack vectors available.


Ultimately if everyone stops writing libraries how will GPT get new data to train on? It will form some kind of feedback loop on its own outputs and just write shittier and shittier code.


Most of the software will be GPT itself.


Yes, aren't we under-focusing a bit, right now, on the goals that in the past were achieved by writing code but will in the future be achieved by LLMs themselves (and their replacements)?

For example, maybe we write some code now with the goal of helping a customer service person do their job. But we all know that plenty of people are trying to replace customer service people with LLMs, not use LLMs to write tools to help customer service people.

I see that the LLM still needs to know what's going on with the customer account, and maybe for a long time that takes the form of conventional APIs. But surely something is going to change here?


“This repo is GPL-v3 licensed. Rewrite it while preserving its main functionality”


This is already legal without AI. Copyright protects only expression, not ideas, systems or methods. This is why directly reverse-engineering a proprietary binary to extract the algorithms and systems is legal.


Indeed, but it’s a much more legally dubious proposition when it comes to entire repos. A repo has more potentially creative structure for copyright to attach to. For example the class graph, or the filesystem layout are creative decisions that could potentially be protected. Current LLMs are nowhere near powerful enough to reimplement an entire repo without violating copyright.

For an individual function I can totally believe GPT4 could strip creative expression from it today. For example you could ask it to give a detailed description of a function in English, and then feed that English description back in (in a new session) and ask it to generate a code based upon the description.


Sounds like clean room, and if you can do that for GPL code, you can also do that for proprietary code, which is fair in a sense. Or maybe the question is whether you can re-label the code so written as GPL or MIT... Or, you should let GPT pick a license that it likes.


You can also decompile a proprietary binary, ask GPT to rewrite it, and then re-release it as your own work.

Somehow I feel the lawyers won't agree.


Reimplementing would also help with training data, it's a way of extracting the idea without copying the original form. Works even better on images with variations, you generate the style from image A with content composition from image B, thus extracting the style without the exact expression.


> Copyright protects only expression, not ideas, systems or methods

Copyright is a law agreed on by humans in a social contract created to protect humans and further their interests in a 'fair' manner. There is no inalienable right to copyright, no universal law that requires it; it's not an emergent property of intelligence that mechanically applies to artificial entities.

So while the current copyright laws could be interpreted in the way you suggest for the time being, they are clearly written without any notion of AI, and can and should be revised to incorporate the new state of the world; you can bet creators will push hard in that direction. It's pretty clear that the mechanical transformation of a human body of work for the sole purpose of stripping it of copyright is a violation of the spirit of copyright law *.

*( as long as that machine can't also generate a similar work from scratch, in which case the point becomes moot. But we are far, far, from that point)


"mechanical transformation of a human body of work"

Functional purposes were never meant to be covered by copyright.


For anyone interested, this is called "clean room design"[1]. Unfortunately it doesn't protect against patents, but it does against copyright.

[1] https://en.wikipedia.org/wiki/Clean_room_design


This is an overcomplicated method of applying the idea/expression dichotomy. It is perfectly legal to not use clean room design, as shown by the Sega v. Accolade and Sony v. Connectix cases.


New GPT-4-powered business model just dropped.


Nice!


Rather than prompting GPT into implementing a solution, can we prompt it to try to preemptively find issues with the codebase or missing-functionality?

Also, do we know what languages GPT-4 "understands" at a sufficient level? What knowledge does it have of post-2021 language features, like in C23?


It has no post-2021 knowledge, but while playing with it, I found that you can just paste the documentation (no need to even format it) and it'll just "learn" it. For example, safetensors wasn't available back then apparently, I just copied the docs into it and was able to get it write pretty good pytorch code that incorporates safetensors.


i just asked it about safetensors today! also, got a response that amounted to "i don't know what that is. i'm guessing it's X"


did you post the documentation into the prompt?


I imagine few shot learning would kick in for most new language features. A feature may be new to a particular language, but is it really new?


I tried a new Lisp dialect on it, my own hobby language. It could cope well given explanations initially, but with some degradation after a while. The full transcript went to 84kB, so it must be doing some kind of intelligent summarization behind the scenes to stay as coherent as it did, right? (The standard context window is supposed to be 8k tokens.)

(https://gist.github.com/darius/b463c7089358fe138a6c29286fe2d... paste in painful-to-read format if anyone's really curious. In three parts: intro to language; I ask it to code symbolic differentiation; then a metacircular interpreter.)


you might be able to do a 2 step process where you have it only highlight relevant sections of text given your task as an intermediate step. like a filter step before the task step.


I am skeptical about using the method of generating a large amount of repository data and sending it, because there may be too many files in the repository. I think a better approach might be for OpenAI to open an interface for transferring GIT repositories, and then let OpenAI analyze the repository data, which is similar to what [chatpdf](https://www.chatpdf.com/) is doing.


How much text can you feed GPT-4?

Our codebase is 1 million lines of code.

Can we feed the documentation to it? What are the limits?

Is it possible to train it on our data without doing prompt engineering? How?

Otherwise are we supposed to use embeddings? Can someone explain how these all work and the tradeoffs?


I've been wondering if you could use something like llama-index's tree summarization, but modified to be aware of inter-module dependencies: https://gpt-index.readthedocs.io/en/latest/guides/index_guid...


I'm waiting on my GPT-4 API access so I can use gpt-4-32k which maybe can soak up 10k LOC?

Clearly this will break eventually, but I am playing around with some ideas to extend how much context I can give it. One is to do something like base64 encode file contents. I've seen some early success that GPT-4 knows how to decode it, so that'll allow me to stuff more characters into it. I'm also hoping that with the use of .gptignore, I can just selectively give the files I think are relevant for whatever prompt I'm writing.


> GPT-4 knows how to decode it

I wonder if you could teach it to understand a binary encoding using the raw bytestream, feed it compressed text, and just tell it to decompress it first.


Here is what GPT-4 says about it. "As an AI language model, I can understand and work with various text encoding schemes and compression algorithms. However, to work with a raw bytestream, you would need to provide specific details about the encoding and compression used.

To teach me to understand a particular binary encoding and compressed text format, you should provide the following information:

The binary encoding used (e.g., ASCII, UTF-8, UTF-16, etc.). The compression algorithm employed (e.g., gzip, Lempel-Ziv-Welch (LZW), Huffman coding, etc.). Once you provide these details, I can help you process the raw bytestream and decompress the text. However, keep in mind that my primary focus is on natural language understanding and generation, and I might not be as efficient at handling compressed data as a dedicated compression/decompression tool."


When GPT gives an answer like that, is it actually a meaningful description of its capabilities? Does it have that kind of self-awareness? Or is it just a plausible answer based on the training corpus?

Genuine question.


My guess is that the training data includes things specifically about the GPT itself and its capabilities, so it would be somewhat correct. But it's also known to just make shit up when it feels like it, so you can't 100% trust it, same as with all other prompts/responses.


Base64 encoding increases the size of text by 4/3. Like the other commenter asked, I wonder if another encoding could work
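A quick way to see that expansion for yourself, using only the Python standard library:

    import base64

    raw = ("some source code " * 200).encode("utf-8")
    encoded = base64.b64encode(raw)
    print(len(encoded) / len(raw))  # ~1.33: base64 emits 4 output characters per 3 input bytes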


Have multiple instances of Gpt4 with different parts of the codebase interact with each other to write the whole thing.

Probably doesn't work this way lol


32k tokens is the limit, so you won't be able to load the whole thing into the context.


For an alternative, you can use LangChain.

Unfortunately GPT is not yet aware of what LangChain is or how it works, and the docs are too long to feed the whole thing to GPT.

But you can still ask it to figure something out for you.

For example: “write pseudo-code that can read documents in chunks of 800 tokens at a time, then for each chunk create a prompt for GPT to summarize the chunk, then save the responses per document and finally aggregate+summarize all the responses per document”

Basically a kind of recursive map/reduce process to get, process and aggregate GPT responses about the data.

LangChain provides tooling to do the above and even allow the model to use tools, like search or other actions.
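A plain-Python sketch of that map/reduce idea; LangChain wraps this kind of loop, but the helper below just calls the chat API directly, so treat the model name and chunk size as assumptions:

    from openai import OpenAI

    client = OpenAI()

    def gpt(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def chunk(text: str, size: int = 800 * 4) -> list[str]:
        # crude approximation: ~4 characters per token, so ~800 tokens per chunk
        return [text[i:i + size] for i in range(0, len(text), size)]

    def summarize(document: str) -> str:
        # map: summarize each chunk independently
        partials = [gpt(f"Summarize this excerpt:\n{c}") for c in chunk(document)]
        # reduce: aggregate the partial summaries into one
        return gpt("Combine these partial summaries into one summary:\n" + "\n".join(partials))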


GPT 4 is limited currently to 8k tokens, which is about 6000 words.

You can use our repo (which we are currently updating to include QuickStart tutorials, coming in the next few days) to do embedding retrieval and query

www.GitHub.com/Jerpint/buster


This is what llama-index was designed for! https://gpt-index.readthedocs.io/en/latest/ Would love to incorporate this Github repo loader into LlamaHub


Fully automated junior swe but on hyper speed. Natural next step


How will we grow new senior SWEs in the future?


Software engineering will never be the same. The LLM will teach its users about programming. This is the worst LLM tech will ever be. That is an incredible statement. The rate of error will decrease to near zero or at the very least significantly better than human. Universities will resist at first but the new tools that will emerge will be core curriculum at university. Just as I now don't use the pumping lemma day to day, programmers of the future will not write code. They will primarily review and eventually AI systems will adversarially review and programmers will do final review.

All programmers will become translators from product vision to architecture implementation via guided code review. Eventually this gap will also be closed. Product will say: Make a website that aggregates powerlifting meet dates and keeps them up to date. Deploy it. Use my card on file. Don't spend more than $100/month. The AI will execute the plan.

Programmers will come in when product can't figure out what's wrong with the system.


You must kill one in hand to hand combat before you can take their place.


Given how quickly everything around generative AI has been evolving, would your money be on a new junior SWE becoming a senior SWE first or LLM tooling gaining senior SWE capabilities first?


just start new instances from the base image or useful checkpoints

"The Age of Em" by Robin Hanson thinks through a lot of this in great depth


In a pod.


61 LOC for implementation, 42 LOC for tests.

this repo currently has more HN upvotes than LOC.

very high leverage code!


The code does less than the title appears to claim - it's a simple concatenation of files into one text file. Luckily GPT itself is high leverage and that's all you need.


https://github.com/mpoon/gpt-repository-loader/pull/17/ If you look at this PR, he had ChatGPT write the tests for him.

He wrote the issue on https://github.com/mpoon/gpt-repository-loader/issues/16 and summarized https://github.com/mpoon/gpt-repository-loader/discussions/1...

"Open an issue describing the improvement to make Construct a prompt - start with using gpt_repository_loader.py on this repo to generate the repository context, then append the text of the opened issue after the --END-- line."

Feels like it needs to add a little Github client to be able to automatically append the text of issues at the end of the output. I'm sure ChatGPT can write a Github client in Python no problem.


I've got a hunch that AI will be able to put Hyrum's Law (https://www.hyrumslaw.com) to good use in the future: given an application, generate unit tests for all documented and undocumented behaviours of the system. Do all the refactoring you need afterwards and you'll have a large safety net backing you up. With refactoring complete, regenerate unit tests for all new behaviours of the system.


I’m thinking, if GPT can write entire programs professionally and iterate, it would be OpenAI that benefits the most, and it would be a guarded asset, i.e. not open to the public till it’s safe for release. Most of us probably don’t need to worry about work after that, as that could well be AGI.


OpenAI already has access to any prompts anybody uses to write programs using GPT.

Who’s to say GPT-4 isn’t some ploy to gather data to train private AGI?


This is precisely what I believe is happening.

What other 'relationships' does OpenAI have with [corp/gov] where the private AGI is shared/sold/service as product to NGO or GOV customers?


Or both.


From what I understand this seems useful if you have a model that will accept a large or unlimited number of tokens. I was looking into doing the same thing with ChatGPT and went with ada to find snippets related to the prompt and then to include those with a prompt to ChatGPT: https://bbarrows.com/posts/using-embeddings-ada-and-chatgpt-...

Does ChatGPT 4 now accept more tokens maybe?


ChatGPT 4 currently accepts 8000 tokens and will eventually support 32k


I can't get it to eat 8k tokens. I assume this is only available via the api. The web interface is limited to around 2k tokens.


The help page says you have to select the 8k model to do 8k. If there's no UI for that, then I guess it's API-only. And the 32k one is being rolled out separately. I think you have to sign up for access to that one.


So I have been toying with an AutoPR GitHub Action for a bit but seeing this spurred me to actually put it into a format for y'all to try. It uses GPT-4 to automatically generate code based on issues in your repo and opens pull requests for those changes.

https://github.com/irgolic/AutoPR/

What AutoPR can do:

- Automatically generates code based on issues in your repo.

- Opens PRs for the generated code, making it easy to review and merge changes.

By using this GitHub Action, you can skip the manual steps of prompt design, and let the AI handle code generation and PR creation for you.

Here's how I used it to make the license for itself: https://github.com/mpoon/gpt-repository-loader/issues/23

Feel free to give it a try, and I would love to hear your feedback! It's still in alpha and it works for straightforward tasks, but I have a plan to scale it up. Let's see how far we can push GPT-4 and create a more efficient development process together.


nice. general question: how many lines of code (at 120 char col len) could you send in one prompt?

also, the entire thing is literally 60 lines of python. sometimes i don't get what gets upvoted on HN anymore


> also, the entire thing is literally 60 lines of python.

Which is a lot as you can do this in one line of bash. And have in the past for other reasons.

Something like;

     find . -name "*.py" -exec cat {} + > output.txt


Looks like it does this to me:

   for file in `git ls-files`; do echo $file; cat $file; echo; echo -------; done


So I just tried it out. The output.txt which was generated was... 375mb of mostly binary junk. I'm a lazy lark and have lots of nonsense in this repo which I shouldn't, but I was hoping the tool might be able to detect which files are "meaningful" or not.

I tried again after updating the script to accept a .gptinclude file (this functionality was entirely added by one GPT-4 query). This time the output file was a much more acceptable 744kb.

Now upon hitting the actual API, I'm being informed that there's a token limit of 4096 (which I wasn't aware of and isn't mentioned in the repo).

Doesn't that really severely limit the usefulness? What good is it uploading a repo if you're only limited to 4096 words? That's scarcely a couple files!

Sort of wish I hadn't spent time on this - I feel like in theory it's a nice idea, but so limited in practice that I don't see it being useful for anyone working on something meaningful.


I'm no expert, but wouldn't it make more sense to give the repo context (structure, source code, PRs, Issues, ...) as embeddings? You could use langchain to generate an embedding and send it through the API, as explained here [1]. It then should have access to the context at inference time, which as I understand is better than loading the context in the prompt, which wastes tokens / max output length and has a limit.

[1] https://www.youtube.com/watch?v=veV2I-NEjaM


In an ideal world, you would be able to have a contextId that you would pass to OpenAI prompt calls. And be able to manage that context separately. So you would pass it code files (with expiration dates) And you could also provide a list conversationIDs, so when providing an answer for a particular prompt request, GPT knows what previous prompts and responses to consider.

As of right now, I've never used the API as a developer, but I've heard that you have to provide the ENTIRE context with EVERY prompt request. How do you work around that?


I’m curious to see how this turns out as well, though you’ll probably have to devise a workaround for the token limit for this to be effective for all but the smallest projects.


Hey, I got inspired by this and built https://github.com/andreyvit/aidev, it sends a slice of repo to OpenAI with a prompt, and saves the results back into files. It's in Go, and has built large chunks of itself. (As one of my friends said, that gives “self-documenting code” a totally new meaning.)


This will fail to get the commit history over time, as it just reads files in the directory.

If this was using the git library and https://langchain.readthedocs.io/en/latest/reference/modules... it would be a more complete solution.


now that's a name I haven't seen in years, hi mpoon!

also this is slick as hell


Oh hi fire


hey man, hope things have been going well for you

on the main topic, have you heard about RWKV ( rnn based gpt style network )? the project's actively working on implementing "infinite" context length support, which would probably pair very well with a project like yours


Show HN is back, you heard it here first.


Seeing a lot of comments in here about the token limits.

Another path you can take is to fine tune a model on your business. Each training item has to fit within the token limit, but you can send hundreds of megs of these for training.

It's more expensive to run a FT model, but you don't have to include any prior context (assuming it's common to all prompts).
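For the older completion-style models (the only ones that supported fine-tuning at the time), the training items are just JSONL prompt/completion pairs; a sketch with made-up example data:

    import json

    # Hypothetical examples: each item must fit within the token limit on its own,
    # but the file as a whole can run to hundreds of megabytes.
    examples = [
        {"prompt": "Summarize ticket #123:\n<ticket text>\n\n###\n\n",
         "completion": " Customer reports login timeouts on mobile. END"},
        # ...many thousands more rows
    ]

    with open("training_data.jsonl", "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")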


You can't fine tune gpt-3.5 or gpt-4 though


True, but they all have the same underlying foundation.

ChatGPT is just a big tech demo of what anyone could achieve. Granted, the training data is the hardest part. But, if you have a narrow domain and piles of existing data to work with, I don't see why you can't exceed the performance of these offerings for topics that actually matter to you.


isn't this a massive privacy violation? any employer would most likely not be okay with this.


it’s a proof of concept. you will have on-premise models soon that are privacy preserving, and OpenAI can set up another tier of API that has privacy-preserving TOS (just like cloud providers do)


Not all code is secret


Maybe I didn't understand the usage of this, but how is the output.txt file that this repo generates to be used and provided as input to ChatGPT? Can someone elaborate on this for me?


That’s cool! I tried to get GPT to build a todo list from react to sql and the docker file but got a little stuck, I’ll try with 4 when I use it next


Or you can just type into ChatGPT4 the following prompt: "How can I load a GitHub repository into ChatGPT?"


Before anyone working on commercial code bases thinks to use this, stop. Uploaded code becomes part of OpenAI.


This is not true anymore, they changed their terms so this is now opt in.


It's still probably extremely against any reasonable business's code of conduct.


That you're getting downvoted for an obvious statement like this suggests to me a lot of HNers are disclosing trade secrets and proprietary code to OpenAI that they know damn well they shouldn't.


Why would it be? Most businesses have no issue hosting private stuff on cloud or in github private repos.

TOS and trust matter the most.


Because the business has those relationships worked out. I can't just decide that I trust OpenAI and send my company's code over, that would be insanity.


I always think about this when using a free online prettifier, decoder, and the like. But I'm sure people use those things with code/secrets from work without really considering it, and I think those habits will carry right over to AI chat.


FYI, prettier.io is open source and can be self-hosted. Also if you watch the website’s network traffic, you won’t see it uploading anything to a server: it’s all done client-side.


Good to know!


Did you ask GPT to write the loader?


cat prompt.txt && find . -type f -name '*.py' | xargs -I{} sh -c 'printf -- "---\n\n" && cat {}' && echo "--END--"


Would GitHub copilot not solve the repository load problem?


Am I missing something? From what I understood from Wolfram's description of GPT and GPT in 60 lines of Python, a GPT model's only memory is the input buffer. So 4k tokens for GPT-3, some more but still limited for GPT-4.

To summarize the GPT inference process as I understood it, with GPT3 as example:

1) the input buffer is made of 4k tokens. There are about 50k distinct tokens in the vocabulary. So the input is a vector of token ids. We can see it as a point in a high dimensional space;

2) The core neural network is a pure function: for such an input point, it will return an output vector as large as there are tokens. So here, a 50k element vector, where each entry is the probability that the associated token is the next element.

The very important thing here is that the whole neural network is a pure function: same input, same output. With immensely large super fast memory this function could be implemented as a look-up table, from an input point (buffer) to an output probability vector. No memory, no side effect here.

3) The probability vector is fed into a "next token" function. It doesn't just take the highest probability token (boring result), but uses a "temperature" to randomize a bit, while using the output probabilities;

4) The next token chosen is inserted into the input buffer, keeping the same total number of tokens. Go back to (1) until a "stop" token is selected at (3).

So in effect, the whole process is a function from a point to a point. "Point" here is the buffer seen as a (high dimensional) vector, so a point in a high dimension space. The generation process is in effect a walk in this "buffer space". Prompting puts the model into some part of that space, with some semantic relation to the prompt's semantic content (that's the magic part). Then generation is a walk in this space, with a purely deterministic part (2) and a bit of randomization (3) to make the walk trajectory (and its meaning, which is what we care about) more interesting to us.
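A minimal sketch of the loop in (1)-(4), with model and tokenizer as stand-ins for whatever implementation you have (assumptions, not a particular library):

    import numpy as np

    def generate(model, tokenizer, prompt, temperature=0.8, max_ctx=4096):
        ids = tokenizer.encode(prompt)[-max_ctx:]        # (1) input buffer = last 4k tokens
        while True:
            probs = model(ids)                           # (2) pure function: buffer -> next-token probabilities
            probs = np.exp(np.log(probs) / temperature)  # (3) temperature-adjusted sampling
            probs /= probs.sum()
            nxt = int(np.random.choice(len(probs), p=probs))
            if nxt == tokenizer.stop_token_id:
                break
            ids = (ids + [nxt])[-max_ctx:]               # (4) slide the window, dropping the oldest tokens
        return tokenizer.decode(ids)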

So if this is correct, there is no point in injecting a lot of data into a GPT model: the output is defined by the input buffer size. Just input the last 4k tokens (for GPT-3, more for GPT-4) and you're done: everything else will have disappeared. So here, just input the last 4k tokens of a repo and save some money ;)

To avoid this limitation, one would have to summarize the previous input, and make this summary part of the current input buffer. This is what chaining is all about if I understood correctly. But I don't see chaining here.

Sooo... Am I missing something? Or is the author of this script the one missing something? I don't mind it either way, but I'd appreciate some clarification from knowledgeable people ;)

Thanks


Seems to be very useful! Thanks for making it!


Should call it "Repo Depot"



