This article has been around for some time, and it still shines. It focuses on building a product rather than just prototyping an LLM wrapper and waiting for the dark magic of GenAI.
Chapters like "The Strive for Faster Latencies" and "Post-Processing" are truly inspiring.
Creating production-level DevTools demands much more effort than merely wrapping around a ChatCompletion endpoint and mindlessly stuffing a context window with everything accessible inside the IDE (so-called "prompt engineering").
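For contrast, the "thin wrapper" baseline is more or less this (a rough sketch using the OpenAI Python client; the model name, prompt, and function names are just placeholders, not anything from the article):

    # Rough sketch of the "wrap a ChatCompletion endpoint" approach.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def naive_complete(prefix: str, open_files: list[str]) -> str:
        # "Prompt engineering": dump whatever the IDE has into the context window.
        context = "\n\n".join(open_files)
        resp = client.chat.completions.create(
            model="gpt-4-turbo",  # placeholder model name
            messages=[
                {"role": "system", "content": "Complete the user's code."},
                {"role": "user", "content": context + "\n\n" + prefix},
            ],
        )
        return resp.choices[0].message.content

The article is about everything that has to happen beyond this.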
I attempt to use LLMs for coding tasks many times a day, and the capability is there: Opus can make and execute a plan, GPT-4 can find and fix the few persistent typographical errors that crop up when Opus attempts verbatim output, and Dolphin-8x7 (or a bunch of other consistently candid models) can de-noise the static interference from Morality Police over-alignment.
For a long time it’s been about "extravagances" like the ability to recover from a lost session, or to have a link fetched and used as context, or to reset the slowly but inevitably corrupted state without losing your painstakingly assembled point in some abstract state space.
Basic product building on the core LLM experience is way more important than incremental improvements to LLMs. I’d take robustness, consistency, and revision control over an LLM breakthrough; a chatbot session is just a tech demo if I can lose my work along with my browser tab.
Interesting, what gave you this impression? This article was first published only 4-6 months ago around Oct 2023.
Don't get me wrong, I'm a fan of Sourcegraph and their founder, Quinn, is quite charismatic. I almost went for a job offer with them 5+ years ago. But let's be real, startups are not a winning game for a non-founder; thank goodness I stuck it out with BigCorp. In any case, this isn't that old of a post, cheers.
They seem to have gotten the correct impression that it's been "around for some time": "some time" does not mean "a very long time", and you just confirmed that it indeed isn't a just-published article but one which, having come out last year, has been around "for some time".
I'm guessing you're just reading into the phrase an implication that isn't there about it being particularly old.
"Some time" does have pretty much that implication. The exact amount of time is unspecified, but I'm unsure that 6 months is enough to count. Maybe this is an American vs British English thing?
> Congratulations, you just wrote a code completion AI!
> In fact, this is pretty much how we started out with Cody autocomplete back in March!
Am I wrong in thinking that there's only like 3(?) actual AI companies and everything else is just some frontend to ChatGPT/LLama/Claude?
Is this sustainable? I guess the car industry is full of rebadged models with the same engines and chassis. It's just wild that we keep hearing about the AI boom as though there's a vibrant competitive ecosystem and not just Nvidia, a couple of software partners and then a sea of whiteboxers.
For those who might not be aware of this, there is also an open source project on GitHub called "Twinny" which is an offline Visual Studio Code plugin equivalent to Copilot: https://github.com/rjmacarthy/twinny
It can be used with a number of local model services. Currently, on my NVIDIA 4090, I'm running both the base and instruct models for deepseek-coder 6.7b, using Q5_K_M-quantized GGUF files (for performance), through the llama.cpp "server", where the base model handles completions and the instruct model handles chat interactions.
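Roughly, it's just two llama.cpp server instances, one per model; something like this (file names, ports, and layer counts here are illustrative, use whatever you downloaded and have room for):

    # base model on one port for completions, instruct model on another for chat
    ./server -m deepseek-coder-6.7b-base.Q5_K_M.gguf -c 4096 -ngl 99 --port 8080
    ./server -m deepseek-coder-6.7b-instruct.Q5_K_M.gguf -c 4096 -ngl 99 --port 8081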
You've got to differentiate between training, inference, and hardware. They all benefit from the "AI boom" but at different levels, and they have varying levels of substitutability (Google tells me that's a real word).
It makes sense: in order to come up with a base model, you need a lot of quality training data and tons of compute.
The role of an AI startup is to come up with ideas, and thus useful products.
Most existing pre-AI products are also front ends to existing operating systems and existing databases, because creating the whole stack does not make sense.
At least we have state-of-the-art open models that we can use freely.
I mean... the article goes on to explain all their value add... which of course can be replicated, but it's not as if you can just grab an API key and do the same.
But OpenAI could make this themselves in a few weeks, right? If they happen to decide to, then this company is done.
That's what I don't get about all these AI startups. The core value of their business is an API call they don't own. It's like people who were selling ______ apps on iPhone when Apple added their own built-in _____ features to iOS, except the entire industry is like that.
Well, the same happened in the early days of the PC. There were all these companies that sold basic utilities like disk scanners, defragmenters, partitioners, antiviruses, etc. When operating systems started to get bigger and include these utilities by default, that industry dried up overnight. I don't think there's anything wrong in building an inherently ephemeral business that just seeks to fill a niche that exists just because someone else hasn't filled it yet.
Well sure, a company with ample funding could in theory do anything. It seems sort of like asking why Google didn't beat GitHub on code hosting. They did try, but their focus was elsewhere, so they gave it up. [1] And OpenAI doesn't seem to be doing developer tools at all?
GitHub and Microsoft are more likely candidates. GitHub Copilot is a competing product that could be improved if they chose to do so.
I'm not really saying that small companies should never compete with big companies.
I'm saying that a small company whose biggest competitor is also their key supplier is, if not doomed, at least operating on borrowed time. If my value proposition is reselling AWS hosting with an easy-to-use dashboard, I better pray to god Amazon never makes AWS simpler.
I feel like that can be said about any SaaS app. The reality is, mega-corps move very slowly and often can't deliver the same user experience as a start up. If everyone had the approach of "well I shouldn't make this app because if Google wanted to, they could do it in a few weeks" we wouldn't have 90% of the startups we see today.
Aren't we all operating on borrowed time though? Even if some company exists for 3 years and only manages a small acquihire exit for its founders, what's wrong with borrowing some time from AWS/OpenAI? We can't all come up with PageRank, patent it, and form a company around it.
Currently for chat we support Claude 2, Claude 3 Haiku, Sonnet, and Opus, GPT-4 Turbo, and Mixtral 8x7b.
For autocomplete we're using Starcoder 16b.
We also support local inference with any LLM via Ollama (this is experimental and only for Cody Free/Pro users). Enterprise customers can choose which LLMs they want.
The “core value” of many companies is a database service they don’t own, but their businesses seem to do just fine. There are a few who get obviated over time (usually they are also slow to react), but they’re a minority.
I would also question the idea that OpenAI could build a code editing companion this robust in a few weeks. It’s been a long time since the models have been the limiting factor in these tools.
I think I’d argue that the models are still a limiting factor for these kinds of tools - but there’s still also plenty of competitive space outside of models. The context you gather, the way you gather it and your prompting matters.
> The context you gather, the way you gather it and your prompting matters.
But your API provider gets to see most of that because you send it to them along with your prompts. I don't know what the ToS for these AI providers are like, but it seems like they could trivially copy your prompts.
Uhhh, no they don't? They see the results of RAG but there's very little to distinguish what is context you gathered vs. what isn't. On top of that, there's nothing in the prompt that indicates what decision was made before inserting something into a prompt. Let's say my retrieval pipeline used a mix of semantic search and keyword-based search, and I used some mechanism to decide which results from which search to include for this specific request. Nothing in the prompt will say that, and so the differentiating behavior is still kept "secret" if you will.
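As a toy sketch (hypothetical names, nothing to do with Cody's actual pipeline): the only thing that ever reaches the provider is the final list of snippets, not the scoring or the routing that picked them.

    # Hypothetical hybrid retrieval: the provider only ever sees the merged
    # snippets that end up in the prompt, not which search produced them or why.
    def gather_context(semantic_results, keyword_results, budget=6):
        """Each argument is a list of (path, snippet) pairs, best match first."""
        merged, seen = [], set()
        for path, snippet in semantic_results + keyword_results:
            if path not in seen:      # dedupe across the two searches
                seen.add(path)
                merged.append(snippet)
        return merged[:budget]        # only these strings go into the prompt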
But I think this line of thinking is a bit ridiculous. Yes, if you send your data to a database service, that provider has your data and could theoretically use it to run you out of business. Except they kinda can't, both as a matter of practicality and legality. OpenAI, Anthropic, and others have SOC 2 compliance. The easiest way to ensure you run yourself out of the market is to start violating that stuff.
Most companies have valuable data that they do own stored in a commodity database they don't. I know if Amazon started charging 10x for RDS, it would be painful but we could migrate in a few weeks.
They’ve been thinking about the problem continuously for the past year, can run experiments across their user base and thus have a lot more context and insight than someone whipping up an implementation.
Well, they wrote a whole blog post about how they do it... and I'm sure this isn't the only write-up on how to approach this. They also only have a 30% acceptance rate, which is only 10% over their initial attempt with just a prompt.
One thing I really miss is a standard way to block any copilot/ai code completion tool from reaching specific files.
That’s particularly important for .env files, which contain sensitive info. We don’t want secrets leaking off our machines; imagine the risks if they become part of the next training dataset.
That’d be really easy to standardize, it’s just another .gitignore-like file.
With Cody, we have a relationship w/ both Anthropic and OpenAI to never use any data submitted via Cody users for training and data is not retained either.
> With Cody, we have a relationship w/ both Anthropic and OpenAI to never use any data submitted via Cody users for training and data is not retained either.
Can you say more about this? Is it public? Contractual? Verifiable somehow?
I see that it’s outlined there; I’m just curious how you gained the confidence to believe them (Anthropic/OpenAI). Do you pay for auditors, or do they provide some other evidence that gives you confidence that they won’t train on this data?
Neither Anthropic nor OpenAI seems to be a particularly mature organization, so while maliciously lying to their customers (you) about this would surprise me, “accidentally” not separating data sources would not be very surprising at all…
In this context, it means that I (a hobbyist) get to enjoy Copilot at a discounted rate because there are some features (context exclusion) that were costly to implement and that I am not using. So it makes sense for me to take a less expensive plan which only includes the core features without the cruft. If I ever need one of these additional features, I just need to pay for the engineering time that went into it, which takes all of a few seconds; it's just a button to click. Fair and square.
These types of tools should exclude all files and directories in the .gitignore as standard operating procedure, unless those files are specifically included. Not just because of secrets, but also because these files are not considered to be part of the repository source code and it would be unusual to need to access them for most tasks.
We need a standardized .aiignore file that everyone can work with. Aider does this with .aiderignore; they all just need to agree on a common filename.
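Since .aiderignore already uses .gitignore pattern syntax, a shared .aiignore could look exactly like one, e.g. (hypothetical contents, patterns are just examples):

    # .aiignore (hypothetical) -- same pattern syntax as .gitignore
    .env
    .env.*
    *.tfvars
    terraform.tfstate
    secrets/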
I agree, but you really shouldn't keep unencrypted secrets locally to begin with.
Most secret managers allow you to either specify value references in .env files, or provide a way of running programs that specifically gives them access to secrets.
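For example, with the 1Password CLI the .env file can hold only references, and the real values get injected at run time (vault/item names below are made up):

    # .env -- secret references, no plaintext values
    DB_PASSWORD=op://my-vault/prod-db/password

    # resolve the references and run the program with the real values
    op run --env-file=.env -- ./start-server.sh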
I wouldn’t want Cody interrogating my local Terraform state or tfvars, for example. There might not be unencrypted secrets in there, but there’s configuration data that I don’t want disclosed either.
That's clever, but GitHub is recognized as the way to do things, so I've seen a couple repos where the only commits are of binaries to download, which technically uses git, but I wouldn't really say that's using git.
I often start a new project and work on it using Copilot for quite a while before making it official, running "git init" and creating a .gitignore file.
Very interesting! I wonder to what extent this assumption holds when tying completions to traditional code autocomplete.
> One of the biggest constraints on the retrieval implementation is latency
If I’m getting a multi line block of code written automagically for me based on comments and the like, I’d personally value quality over latency and be more than happy to wait on a spinner. And I’d also be happy to map separate shortcuts for when I’m prepared to do so (avoiding the need to detect my intent).
This is great feedback and something we are looking at in regards to Cody. We value developer choice and at the moment for Chat developers can choose between various LLM models (Claude 3 Opus, GPT 4-Turbo, Mixtral 8x7b) that offer different benefits.
For autocomplete, at the moment we only support Starcoder because it has given us the best return on latency + quality, but we'd definitely love to support more and give users the choice to set an LLM of their preference, so if they prefer waiting longer for higher-quality results, they can.
> We value developer choice and at the moment for Chat developers can choose between various LLM models (Claude 3 Opus, GPT 4-Turbo, Mixtral 8x7b) that offer different benefits.
I wish y'all would put a little more effort into user experience. When you go to subscribe it says:
> Claude Instant 1.2, Claude 2, ChatGPT 3.5 Turbo, ChatGPT 4 Turbo Preview
Trying to figure out what's supported was tedious enough[0] that I just ended up renewing my Copilot subscription instead.
[0] Your contact page for "information about products and purchasing" talks about scheduling a meeting. One of your welcome emails points us to your Discord but then your Discord points us to your forum.
> Because of the language-specific nature of this heuristic, we generally do not support multi-line completions for all languages. However, we’re always happy to extend our list of supported languages and, since Cody is open-source, you can also contribute and improve the list. (link to https://github.com/sourcegraph/cody/blob/main/vscode/src/com...)
That link to the list of supported languages is broken. I couldn't find a similar file elsewhere in the repo: maybe the list got folded up into a function in another file? Also a bit annoying that I couldn't find the info on the company's website (though I gave up pretty quickly).
The list of supported languages is at https://sourcegraph.com/docs/cody/faq#what-programming-langu.... It works for all programming languages, but it works better on some than others, depending on the LLM and the language-specific work we've done on context-fetching, syntax-related heuristics, etc. On the LLM point, Cody supports multiple LLMs for both chat and autocomplete (Claude 3, GPT-4 Turbo, Mixtral, StarCoder, etc.), which is great for users but makes it tough to give any formal definition of "supported language".
Even that link doesn't claim to be comprehensive, including an extremely vague "shell scripting languages (like Bash, PowerShell)," as if I am supposed to cross my fingers and hope this means a nonstandard shell like fish works, when it seems like that could introduce weird or inscrutable bugs coming from incompatible bash syntax. (One under-discussed downside of LLM code generation is that it further penalizes innovation in programming languages.)
What I find frustrating is that Cody is doing deterministic language-specific tricks that don't depend on the underlying LLM, so why not just give a list of all the languages you did deterministic tricks for and call that the supported languages? Why be vague? Why deflect to the underlying LLM to encourage people to try your product with languages you won't do any work to support?
Claiming it works on "all programming languages" is lazy and dishonest, and clearly false. There's nothing magical about LLMs that relieves software developers of the duty to specify the limitations of their software.
The code is Apache 2 and we linked it from the blog post. You can see all of the tricks!
Nothing here is dishonest. We’re open about the limitations of the LLM being the main determining factor, and the numerous languages we’ve prioritized making it work well on. I see your point about not mentioning “etc.” in the shell scripting parenthetical. Will update.
Fantastic article and impressive work by this company. They're basically wrapping LLMs with a working memory and tying it to user input. And thus we step a little closer to AGI/ASI.
(I left this comment earlier but I'll c+p here as well)
I wrote a blog post comparing Cody to Copilot a little while ago. Some of the stuff might be outdated now, but I think it still captures the essence of the differences between the two. Obviously I'm a little biased as I work for Sourcegraph, but I tried to be as fair as one could be. Happy to dive deeper into any details.
https://sourcegraph.com/blog/copilot-vs-cody-why-context-mat...
Our biggest differentiators are context, choice, and scale. We've been helping developers find and understand code for the last 10 years and are now applying a lot of that to Cody when it comes to fetching the right context. When it comes to choice, we support multiple LLMs and are always on the lookout for the right LLM for the job. We recently rolled out Claude 3 Opus as well as Ollama support for offline/local inference.
Cody also has a free tier where you can give it a try and compare for yourself, which is what I always recommend people do :)
I ended up disabling Copilot. The reason is that the completions do not always integrate with the rest of the code, in particular with non-matching brackets. Often it just repeats some other part of the code. I had far fewer cases of this with Cody, though arguably the difference is not huge. And then add the choice of models on top of that.
I noticed I had a lot fewer of these problems these last few weeks. I suspect the Copilot team has put a lot more effort into quality-of-life recently.
For instance, I'd often get a problem where I'd type "foo(", and VsCode would auto-close the parenthesis, so my cursor would be in "foo(|)", but Copilot wouldn't be aware of the auto-close, so it would suggest "bar)" as a completion, leading to "foo(bar))" if I accepted it. But I haven't had this problem in recent versions. Other similar papercuts I'd noticed have been fixed.
I haven't used Cody, though, so I don't know how they compare.
I've used Copilot for months and Cody just today. I'm in the habit of using autocomplete to generate multiline chunks of code. So far, Copilot seems a bit better at autocomplete.
In particular, Copilot seems to do better at generating missing TypeScript import statements. These are relative imports of files in the same small repo. Neither of them seems to really understand my codebase in the way that Cody promises - they make up imports of nonexistent files. Copilot sometimes guesses the right file, I think because it understands my naming conventions better.
I switched from Copilot to Supermaven and in my experience it’s more than twice as effective. The suggestions are better and incredibly fast. Copilot was a nice productivity boost, but this is next level for me; I’m genuinely building features noticeably faster.
I wrote a blog post comparing Cody to Copilot a little while ago. Some of the stuff might be outdated now, but I think it still captures the essence of the differences between the two. Obviously I'm a little biased as I work for Sourcegraph, but I tried to be as fair as one could be. Happy to dive deeper into any details.
Our biggest differentiators are context, choice, and scale. We've been helping developers find and understand code for the last 10 years and are now applying a lot of that to Cody when it comes to fetching the right context. When it comes to choice, we support multiple LLMs and are always on the lookout for the right LLM for the job. We recently rolled out Claude 3 Opus as well as Ollama support for offline/local inference.
Cody also has a free tier where you can give it a try and compare for yourself, which is what I always recommend people do :)