The lifecycle of a code AI completion (sourcegraph.com)
237 points by tosh 9 months ago | 85 comments



This article has been around for some time, and it still shines. It focuses on building a product rather than just prototyping an LLM wrapper and waiting for the dark magic of GenAI.

Chapters like "The Strive for Faster Latencies" and "Post-Processing" are truly inspiring.

Creating production-level DevTools demands much more effort than merely wrapping around a ChatCompletion endpoint and mindlessly stuffing a context window with everything accessible inside the IDE (so-called "prompt engineering").


I attempt to use LLMs in coding tasks many times a day, and the capability is there: Opus can make and execute a plan, GPT-4 can find and fix a few persistent typographical errors when Opus attempts verbatim output, and Dolphin-8x7 (or a bunch of other consistently candid models) can de-noise out the static interference from the Morality Police over-alignment.

For a long time it’s been about extravagances like the ability to recover from a lost session, or get a link queried and used as context, or reset the slowly but inevitably corrupted state without losing your painstakingly assembled point in some abstract state space.

Basic product building on the core LLM experience is way more important than incremental improvements on LLMs. I’d take robustness, consistency, and revision control over an LLM breakthrough; a chatbot session is just a tech demo if I lose my work along with my browser tab.

These folks get it, I agree.


> This article has been around for some time, ..

Interesting, what gave you this impression? This article was first published only 4-6 months ago around Oct 2023.

Don't get me wrong, I'm a fan of Sourcegraph and their founder, Quinn, is quite charismatic. I almost went for a job offer with them 5+ years ago. But let's be real, startups are not a winning game for a non-founder; thank goodness I stuck it out with BigCorp. In any case, this isn't that old of a post, cheers.


They seem to have got the correct impression that it's been "around for some time": "some time" does not mean "a very long time", and you just confirmed that it isn't a just-published article but one which, having come out last year, has indeed been around "for some time".

I'm guessing you're just reading into the phrase an implication that isn't there about it being particularly old.


"Some time" does have pretty much that implication. The exact amount of time is unspecified, but I'm unsure that 6 months is enough to count. Maybe this is an American vs British English thing?

You don't have to trust me: https://dictionary.cambridge.org/dictionary/english/for-some...


Well, you could argue that in the fast-evolving AI domain 5 months is already "a fairly long period of time"...


> startups are not a winning game for a non-founder, thank goodness I stuck it out with BigCorp

I'm curious why you think a big corporation has a higher expected value than a startup.



Can confirm.

Spent the last 5 years at 2 different startups.

Presently unemployed and broke due to literally worthless equity.


Because their equity compensation is in liquid stock (RSUs) rather than stock options.


> Congratulations, you just wrote a code completion AI!

> In fact, this is pretty much how we started out with Cody autocomplete back in March!

Am I wrong in thinking that there's only like 3(?) actual AI companies and everything else is just some frontend to ChatGPT/LLama/Claude?

Is this sustainable? I guess the car industry is full of rebadged models with the same engines and chassis. It's just wild that we keep hearing about the AI boom as though there's a vibrant competitive ecosystem and not just Nvidia, a couple of software partners and then a sea of whiteboxers.


For those who might not be aware of this, there is also an open source project on GitHub called "Twinny" which is an offline Visual Studio Code plugin equivalent to Copilot: https://github.com/rjmacarthy/twinny

It can be used with a number of local model services. Currently for my setup on an NVIDIA 4090, I'm running both the base and instruct models for deepseek-coder 6.7b using Q5_K_M quantization GGUF files (for performance) through the llama.cpp "server", where the base model handles completions and the instruct model handles chat interactions.

llama.cpp: https://github.com/ggerganov/llama.cpp/

deepseek-coder 6.7b base GGUF files: https://huggingface.co/TheBloke/deepseek-coder-6.7B-base-GGU...

deepseek-coder 6.7b instruct GGUF files: https://huggingface.co/TheBloke/deepseek-coder-6.7B-instruct...
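
If anyone wants to see what talking to that llama.cpp "server" looks like, here is a minimal Python sketch. The /completion endpoint and its payload fields are llama.cpp's own; the port and sampling settings are assumptions from my setup:

    # Ask a local llama.cpp server for a code completion.
    import json
    import urllib.request

    def complete(prefix, n_predict=64):
        payload = json.dumps({
            "prompt": prefix,        # code before the cursor
            "n_predict": n_predict,  # max tokens to generate
            "temperature": 0.2,      # keep completions deterministic-ish
            "stop": ["\n\n"],        # cut off at the first blank line
        }).encode()
        req = urllib.request.Request(
            "http://localhost:8080/completion",  # assumed default port
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["content"]

    print(complete("def fibonacci(n):\n"))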


You've got to differentiate between training, inference, and hardware. They all benefit from the "AI boom" but at different levels, and have varying levels of substitutability (Google tells me that's a real word).


It makes sense: to come up with a base model, you need a lot of quality training data and tons of compute.

The role of an AI startup is to come up with ideas, and thus useful products. Most pre-AI products are also front ends to existing operating systems and existing databases, because building the whole stack does not make sense. At least we have state-of-the-art open models that we can use freely.


Now think about the amount of products built and scaled worldwide by this audience on top of the few cloud compute providers out there.


Your question can be reformulated as "will the application layer's lunch be eaten by better foundation models?"


Alternatively: Will my drop-shipping business be killed by AliExpress?


I mean... the article goes on to explain all their value add... which of course can be replicated, but it's not as if you can just grab an API key and do the same.


But OpenAI could make this themselves in a few weeks, right? If they happen to decide to, then this company is done.

That's what I don't get about all these AI startups. The core value of their business is an API call they don't own. It's like people who were selling ______ apps on iPhone when Apple added their own built-in _____ features to iOS, except the entire industry is like that.


Well, the same happened in the early days of the PC. There were all these companies that sold basic utilities like disk scanners, defragmenters, partitioners, antiviruses, etc. When operating systems started to get bigger and include these utilities by default, that industry dried up overnight. I don't think there's anything wrong in building an inherently ephemeral business that just seeks to fill a niche that exists just because someone else hasn't filled it yet.


Don't forget text editors! And even though it didn't last, many people made very good money on these.


Well sure, a company with ample funding could in theory do anything. It seems sort of like asking why Google didn't beat GitHub on code hosting. They did try, but their focus was elsewhere, so they gave it up. [1] And OpenAI doesn't seem to be doing developer tools at all?

GitHub and Microsoft are more likely candidates. GitHub Copilot is a competing product that could be improved if they chose to do so.

[1] https://code.google.com/archive/


I'm not really saying that small companies should never compete with big companies.

I'm saying that a small company whose biggest competitor is also their key supplier is, if not doomed, at least operating on borrowed time. If my value proposition is reselling AWS hosting with an easy-to-use dashboard, I better pray to god Amazon never makes AWS simpler.


I feel like that can be said about any SaaS app. The reality is, mega-corps move very slowly and often can't deliver the same user experience as a start up. If everyone had the approach of "well I shouldn't make this app because if Google wanted to, they could do it in a few weeks" we wouldn't have 90% of the startups we see today.


Aren't we all operating on borrowed time though? Even if some company exists for 3 years and gets its founders a small acquihire exit, what's wrong with borrowing some time from AWS/OpenAI? We can't all come up with PageRank, patent it, and form a company around it.


From the article, it sounds like they are using Claude? Maybe there's more than one possible supplier. There probably will be more.

Edit: looks like the autocomplete provider is configurable in the VS Code plugin settings. Not sure what the default is.


Currently for chat we support Claude 2, Claude 3 Haiku, Sonnet, and Opus, GPT-4 Turbo, and Mixtral 8x7b.

For autocomplete we're using Starcoder 16b.

We also support local inference with any LLM via Ollama (this is experimental and only for Cody Free/Pro users). Enterprise customers can choose which LLMs they want.


The “core value” of many companies is a database service they don’t own, but their businesses seem to do just fine. There are a few who get obviated over time (usually they are also slow to react), but they’re a minority.

I would also question the idea that OpenAI could build a code editing companion this robust in a few weeks. It’s been a long time since the models have been the limiting factor in these tools.


I think I’d argue that the models are still a limiting factor for these kinds of tools - but there’s still also plenty of competitive space outside of models. The context you gather, the way you gather it, and your prompting matter.


> The context you gather, the way you gather it, and your prompting matter.

But your API provider gets to see most of that because you send it to them along with your prompts. I don't know what the ToS for these AI providers are like, but it seems like they could trivially copy your prompts.


Uhhh, no they don't? They see the results of RAG but there's very little to distinguish what is context you gathered vs. what isn't. On top of that, there's nothing in the prompt that indicates what decision was made before inserting something into a prompt. Let's say my retrieval pipeline used a mix of semantic search and keyword-based search, and I used some mechanism to decide which results from which search to include for this specific request. Nothing in the prompt will say that, and so the differentiating behavior is still kept "secret" if you will.
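
To make that concrete, here is a toy Python sketch of such a merge step. All names are hypothetical; the point is that none of this ranking logic is visible in the prompt that finally gets sent:

    # Merge semantic and keyword search results with a weighted score.
    from dataclasses import dataclass

    @dataclass
    class Snippet:
        path: str
        text: str
        score: float

    def merge_results(semantic, keyword, semantic_weight=0.7, budget=5):
        scored = {}
        for s in semantic:
            scored[s.path] = Snippet(s.path, s.text, semantic_weight * s.score)
        for k in keyword:
            bonus = (1 - semantic_weight) * k.score
            if k.path in scored:
                scored[k.path].score += bonus  # boost hits both retrievers found
            else:
                scored[k.path] = Snippet(k.path, k.text, bonus)
        # Only the top snippet texts reach the prompt; the weights, the
        # dedup, and the cutoff all stay on the client side.
        return sorted(scored.values(), key=lambda s: s.score, reverse=True)[:budget]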

But I think this line of thinking is a bit ridiculous. Yes, if you send your data to a database service, that provider has your data and could theoretically use it to run you out of business. Except they kinda can't, both as a matter of practicality and legality. OpenAI, Anthropic, and others have SOC 2 compliance. The easiest way to ensure you run yourself out of the market is to start violating that stuff.


Most companies have valuable data that they do own stored in a commodity database they don't. I know if Amazon started charging 10x for RDS, it would be painful but we could migrate in a few weeks.


Which part do you think you couldn't do on your own with your own API key?


They’ve been thinking about the problem continuously for the past year, can run experiments across their user base and thus have a lot more context and insight than someone whipping up an implementation.


Well, they wrote a whole blog post about how they do it... and I'm sure this isn't the only take on how to approach it. They also only have a 30% acceptance rate, which is only 10% over their initial attempt with just a prompt.


One thing I really miss is a standard way to block any copilot/AI code completion tool from reaching specific files. That’s particularly important for .env files containing sensitive info. We don’t want secrets leaking outside our machines; imagine the risks if they become part of the next training dataset. That’d be really easy to standardize; it’s just another .gitignore-like file.


So Cody offers multiple ways to manage this.

In the Cody settings.json file you can disable autocomplete on entire languages/file types.

Additionally, we recently rolled out a Cody Ignore file type where you can specify files/folders that Cody will not look at for context. This feature is still in experimental mode though. https://sourcegraph.com/docs/cody/capabilities/ignore-contex...

With Cody, we have a relationship with both Anthropic and OpenAI to never use any data submitted by Cody users for training, and data is not retained either.


> With Cody, we have a relationship with both Anthropic and OpenAI to never use any data submitted by Cody users for training, and data is not retained either.

Can you say more about this? Is it public? Contractual? Verifiable somehow?


Hi - yes, it's all outlined in our Terms of Use for Cody:

https://sourcegraph.com/terms/cody-notice

Sections I-IV :)


I see that it’s outlined there; I’m just curious how you gained the confidence to believe them (Anthropic/OpenAI). Do you pay for auditors, or do they provide some other evidence that gives you confidence that they won’t train on this data?

Neither Anthropic nor OpenAI seems to be a particularly mature organization, so while maliciously lying to their customers (you) about this would surprise me, “accidentally” not separating data sources would not be very surprising at all…


GitHub employee here

This does exist in GitHub Copilot - it’s called content exclusions: https://docs.github.com/en/copilot/managing-github-copilot-i...

I’m not sure if Cody has a similar feature, or if there’s any move towards a standardised solution.


Not just any subscriber is allowed though:

> This feature is available for organization accounts with a Copilot Business subscription.

And even if you exclude files, the moment anyone starts a chat they are read and sent and could inform suggestions:

> Excluding content from GitHub Copilot currently only affects code completion. GitHub Copilot Chat is not affected by these settings.

Both quotes from your link


> Not just any subscriber is allowed though:

Does it mean engineers created the feature but managers disabled it for some users?


It's called price discrimination (https://en.wikipedia.org/wiki/Price_discrimination); almost all SaaS products use it.

In this context, it means that I (a hobbyist) get to enjoy Copilot at a discounted rate because there are some features (context exclusion) that were costly to implement and that I am not using. So it makes sense for me to take a less expensive plan that only includes the core features without the cruft. If I ever need one of those additional features, I just pay for the engineering time that went into them, which takes less than a few seconds; it's just a button to click on. Fair and square.


Could you make this feature available to every subscriber?


These types of tools should exclude all files and directories in the .gitignore as standard operating procedure, unless those files are specifically included. Not just because of secrets, but also because these files are not considered to be part of the repository source code and it would be unusual to need to access them for most tasks.


It should probably use .gitignore as a default, then let you opt in/out further. This would be a more secure default than including everything.


.gitignore unless there’s a specific .codyignore ?

Or maybe a set of sensible global default ignores, based on the usual suspects from gitignore.io or suchlike ?


We need a standardized .aiignore file that everyone can work with. Aider does this with .aiderignore. They all just need to agree on the common filename


And format. The glob spec that gitignore uses isn’t necessarily trivial.
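
For what it's worth, you can reuse gitignore semantics instead of reimplementing the glob spec. A Python sketch using the third-party pathspec package, with .aiignore as the hypothetical filename discussed above:

    # Filter context candidates through a gitignore-style .aiignore file.
    import os
    import pathspec  # pip install pathspec

    def load_spec(root, ignore_file=".aiignore"):
        path = os.path.join(root, ignore_file)
        lines = open(path).read().splitlines() if os.path.exists(path) else []
        return pathspec.PathSpec.from_lines("gitwildmatch", lines)

    def context_candidates(root):
        spec = load_spec(root)
        keep = []
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                if not spec.match_file(rel):  # drop anything the spec matches
                    keep.append(rel)
        return keep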


I created a CLI tool that copies a GitHub or local repo into a text file for LLM ingestion. It only pulls the file types you specify.

https://github.com/jimmc414/1filellm


I agree, but you really shouldn't keep unencrypted secrets locally to begin with.

Most secret managers allow you to either specify value references in .env files, or provide a way of running programs that specifically gives them access to secrets.


I wouldn’t want Cody interrogating my local Terraform state or tfvars, for example. There might not be unencrypted secrets in there, but there’s configuration data that I don’t want to disclose either.


Why not just use .gitignore itself? It's also unlikely they want to train on build files and such.


That makes sense, but not everyone uses Git. A new file in which you can point to your .gitignore would probably be a good idea.


If you do not use git, you might not be GitHub's target audience.


You can purchase GitHub Copilot and use it (even as a corporation) without using GitHub.

My company buys Copilot for all devs, but doesn’t use GH or GHE.


But your company doesn't use Git? That's all you need for .gitignore.


That's clever, but GitHub is recognized as the way to do things, so I've seen a couple repos where the only commits are of binaries to download, which technically uses git, but I wouldn't really say that's using git.


I often start a new project and work on it using Copilot for quite a while before making it official, running "git init" and creating a .gitignore file.


You shouldn’t have sensitive secrets on your workstation in your .env file though. It’s not what it’s for.


Where would I keep them instead? I always wonder. Let’s say I’m developing an app against the OpenAI API. Where do I put the OpenAI API key in VS Code?

As far as I know, a .env file is the main supported way of setting environment variables in VS Code.


Very interesting! I wonder to what extent this assumption holds when tying completions to traditional code autocomplete.

> One of the biggest constraints on the retrieval implementation is latency

If I’m getting a multi line block of code written automagically for me based on comments and the like, I’d personally value quality over latency and be more than happy to wait on a spinner. And I’d also be happy to map separate shortcuts for when I’m prepared to do so (avoiding the need to detect my intent).


This is great feedback and something we are looking at with regard to Cody. We value developer choice, and at the moment for Chat, developers can choose between various LLMs (Claude 3 Opus, GPT-4 Turbo, Mixtral 8x7b) that offer different benefits.

For autocomplete, at the moment we only support StarCoder because it has given us the best return on latency + quality, but we'd definitely love to support more (and give users the choice to set an LLM of their preference, so if they'd rather wait longer for higher-quality results, they can).

You can do that with our local Ollama support, but that's still experimental and YMMV. Here's how to set it up: https://sourcegraph.com/blog/local-code-completion-with-olla...
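
For the curious, local inference through Ollama is just an HTTP call under the hood. A minimal Python sketch against Ollama's /api/generate endpoint (the model tag is an assumption; use whatever you have pulled locally):

    # Request a completion from a locally running Ollama instance.
    import json
    import urllib.request

    payload = json.dumps({
        "model": "codellama:7b-code",  # assumed: any locally pulled model
        "prompt": "def quicksort(arr):\n",
        "stream": False,               # one JSON response, not chunks
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",  # Ollama's default port
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])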


> We value developer choice, and at the moment for Chat, developers can choose between various LLMs (Claude 3 Opus, GPT-4 Turbo, Mixtral 8x7b) that offer different benefits.

I wish y'all would put a little more effort into user experience. When you go to subscribe it says:

> Claude Instant 1.2, Claude 2, ChatGPT 3.5 Turbo, ChatGPT 4 Turbo Preview

Trying to figure out what's supported was tedious enough[0] that I just ended up renewing my Copilot subscription instead.

[0] Your contact page for "information about products and purchasing" talks about scheduling a meeting. One of your welcome emails points us to your Discord but then your Discord points us to your forum.


Thank you for the feedback. I totally agree there and we'll address this.


> Because of the language-specific nature of this heuristic, we generally do not support multi-line completions for all languages. However, we’re always happy to extend our list of supported languages and, since Cody is open-source, you can also contribute and improve the list. (link to https://github.com/sourcegraph/cody/blob/main/vscode/src/com...)

That link to the list of supported languages is broken. I couldn't find a similar file elsewhere in the repo: maybe the list got folded up into a function in another file? Also a bit annoying that I couldn't find the info on the company's website (though I gave up pretty quickly).
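
From the article's description, the heuristic is roughly of this shape. A toy Python illustration, not Cody's actual code:

    # Only request a multi-line completion when the cursor sits right
    # after a block opener for a language we know about.
    BLOCK_OPENERS = {
        "python": (":",),
        "go": ("{",),
        "typescript": ("{", "=>"),
    }

    def wants_multiline(language, line_before_cursor):
        openers = BLOCK_OPENERS.get(language)
        if openers is None:
            return False  # unknown language: stay single-line
        return line_before_cursor.rstrip().endswith(openers)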


The list of supported languages is at https://sourcegraph.com/docs/cody/faq#what-programming-langu.... It works for all programming languages, but it works better on some than others, depending on the LLM and the language-specific work we've done on context-fetching, syntax-related heuristics, etc. On the LLM point, Cody supports multiple LLMs for both chat and autocomplete (Claude 3, GPT-4 Turbo, Mixtral, StarCoder, etc.), which is great for users but makes it tough to give any formal definition of "supported language".


Even that link doesn't claim to be comprehensive, including an extremely vague "shell scripting languages (like Bash, PowerShell)," as if I am supposed to cross my fingers and hope this means a nonstandard shell like fish works, when it seems like that could introduce weird or inscrutable bugs coming from incompatible bash syntax. (One under-discussed downside of LLM code generation is that it further penalizes innovation in programming languages.)

What I find frustrating is that Cody is doing deterministic language-specific tricks that don't depend on the underlying LLM, so why not just give a list of all the languages you did deterministic tricks for and call that the supported languages? Why be vague? Why deflect to the underlying LLM to encourage people to try your product with languages you won't do any work to support?

Claiming it works on "all programming languages" is lazy and dishonest, and clearly false. There's nothing magical about LLMs that relieves software developers of the duty to specify the limitations of their software.


The code is Apache 2 and we linked it from the blog post. You can see all of the tricks!

Nothing here is dishonest. We’re open about the limitations of the LLM being the main determining factor, and the numerous languages we’ve prioritized making it work well on. I see your point about not mentioning “etc.” in the shell scripting parenthetical. Will update.



I don't think it is. There is a test file which includes C#, Kotlin, etc. among supported languages, and those aren't included in the file you linked: https://github.com/sourcegraph/cody/blob/main/vscode/src/com...

But this test didn't seem to include TypeScript so it's obviously not comprehensive. I'm not convinced this information is actually in one place.


Fantastic article and impressive work by this company. They're basically wrapping LLMs with a working memory and tying it to user input. And thus we step a little closer to AGI/ASI.


It’s probably Retrieval Augmented Generation? Just wanted to put this here in case somebody is not familiar with this pattern.


Has anyone tried Cody and GitHub Copilot and compared them? I'm using GitHub Copilot and wouldn't mind switching to a better alternative.


(I left this comment earlier but I'll c+p here as well)

I wrote a blog post comparing Cody to Copilot a little while ago. Some of the stuff might be outdated now, but I think it still captures the essence of the differences between the two. Obviously I'm a little biased as I work for Sourcegraph, but I tried to be as fair as one could be. Happy to dive deeper into any details. https://sourcegraph.com/blog/copilot-vs-cody-why-context-mat...

Our biggest differentiators are context, choice, and scale. We've been helping developers find and understand code for the last 10 years and are now applying a lot of that to Cody in regards to fetching the right context. When it comes to choice, we support multiple LLMs and are always on the lookout for the right LLM for the job. We recently rolled out Claude 3 Opus as well as Ollama support for offline/local inference.

Cody also has a free tier where you can give it a try and compare for yourself, which is what I always recommend people do :)


that url got cut off and 404s (prob copy/pasta from another comment)

Correct url: https://sourcegraph.com/blog/copilot-vs-cody-why-context-mat...


Thank you for that. I copied an earlier comment and I guess the URL got cut off.


I used them both.

I ended up disabling Copilot. The reason is that the completions do not always integrate with the rest of the code, in particular with non-matching brackets. Often it just repeats some other part of the code. I had far fewer cases of this with Cody. But, arguably, the difference is not huge. And then there's the choice of models on top of this.


I noticed I had a lot fewer of these problems in the last few weeks. I suspect the Copilot team has put a lot more effort into quality-of-life fixes recently.

For instance, I'd often get a problem where I'd type "foo(", and VsCode would auto-close the parenthesis, so my cursor would be in "foo(|)", but Copilot wouldn't be aware of the auto-close, so it would suggest "bar)" as a completion, leading to "foo(bar))" if I accepted it. But I haven't had this problem in recent versions. Other similar papercuts I'd noticed have been fixed.

I haven't used Cody, though, so I don't know how they compare.
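
Presumably the fix is post-processing on the suggestion before it is shown. A minimal Python sketch of the idea (hypothetical, not Copilot's actual logic):

    # If the editor already auto-closed a bracket at the cursor, trim the
    # matching unbalanced closer from the model's suggestion.
    PAIRS = {")": "(", "]": "[", "}": "{"}

    def trim_redundant_closer(suggestion, char_after_cursor):
        opener = PAIRS.get(char_after_cursor)
        if opener is None:
            return suggestion
        depth = 0
        for i, ch in enumerate(suggestion):
            if ch == opener:
                depth += 1
            elif ch == char_after_cursor:
                depth -= 1
                if depth < 0:  # this closer already exists in the buffer
                    return suggestion[:i]
        return suggestion

    # "foo(" typed, editor inserted ")"; model suggests "bar)" -> keep "bar"
    assert trim_redundant_closer("bar)", ")") == "bar"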


I've used Copilot for months and Cody just today. I'm in the habit of using autocomplete to generate multiline chunks of code. So far, Copilot seems a bit better at autocomplete.

In particular, Copilot seems to do better at generating missing TypeScript import statements. These are relative imports of files in the same small repo. Neither of them seems to really understand my codebase in the way that Cody promises - they make up imports of nonexistent files. Copilot sometimes guesses the right file, I think because it understands my naming conventions better.


I switched from Copilot to Supermaven and in my experience it’s more than twice as effective. The suggestions are better and incredibly fast. Copilot was a nice productivity boost, but this is next level for me; I’m genuinely building features noticeably faster.


How does this compare to Copilot? I would be willing to switch (or try it at least) if it can give a better experience inside Neovim.


I wrote a blog post comparing Cody to Copilot a little while ago. Some of the stuff might be outdated now, but I think it still captures the essence of the differences between the two. Obviously I'm a little biased as I work for Sourcegraph, but I tried to be as fair as one could be. Happy to dive deeper into any details.

https://sourcegraph.com/blog/copilot-vs-cody-why-context-mat...

Our biggest differentiators are context, choice, and scale. We've been helping developers find and understand code for the last 10 years and are now applying a lot of that to Cody in regards to fetching the right context. When it comes to choice, we support multiple LLMs and are always on the lookout for the right LLM for the job. We recently rolled out Claude 3 Opus as well as Ollama support for offline/local inference.

Cody also has a free tier where you can give it a try and compare for yourself, which is what I always recommend people do :)

On Neovim, Cody actually does have experimental support for neovim: https://sourcegraph.com/docs/cody/clients/install-neovim. Not all features are supported as in VS Code though.


Cody is very good. I recommend giving it a try.

FWIW I believe Cody is much more actively developed than Copilot is these days, and so it has a more comprehensive feature set.



