Show HN: Use Code Llama as Drop-In Replacement for Copilot Chat (continue.dev)
187 points by sestinj on Aug 24, 2023 | 52 comments
Hi HN,

Code Llama was released, but we noticed a ton of questions in the main thread about how/where to use it — not just from an API or the terminal, but in your own codebase as a drop-in replacement for Copilot Chat. Without this, developers don't get much utility from the model.

This concern is also important because benchmarks like HumanEval don't perfectly reflect the quality of responses. There's likely to be a flurry of improvements to coding models in the coming months, and rather than relying on the benchmarks to evaluate them, the community will get better feedback from people actually using the models. This means real usage in real, everyday workflows.

We've worked to make this possible with Continue (https://github.com/continuedev/continue) and want to hear what you find to be the real capabilities of Code Llama. Is it on par with GPT-4, does it require fine-tuning, or does it excel at certain tasks?

If you’d like to try Code Llama with Continue, it only takes a few steps to set up (https://continue.dev/docs/walkthroughs/codellama), either locally with Ollama, or through TogetherAI or Replicate's APIs.




I wish someone would make an IntelliJ/Android Studio plugin for Code Llama (or another local LLM). I know both platforms have their own AI features, but that involves sending my code to their servers, which I'm really not a fan of.


We've done work to set Continue up as a JetBrains plugin. It's just a matter of a) figuring out the basics of putting a webview in a JetBrains plugin and b) implementing this class (https://github.com/continuedev/continue/blob/main/extension/...) that communicates with the server. So long story short, we're planning on this soon and welcome PRs. It's a medium-sized task.


Well, you might hold your nose for a few days while Copilot helps you write the IntelliJ plugin for Code Llama. Now that would be a good hacker news post...


I think UniteAI (https://github.com/freckletonj/uniteai) fits the bill for you.

This is my project, where the goal is to unite your AI stack inside your editor (so, speech-to-text, local LLMs, ChatGPT, retrieval-augmented generation, etc).

It's built atop a Language Server, so while no one has made an IntelliJ client yet, it's simple to do. I'll help you if you open a GH issue!


At Refact we have a JetBrains plugin that works with a bunch of local code models: https://github.com/smallcloudai/refact/


But for a code LLM to be useful, the local machine needs a fairly powerful GPU.


Not true anymore. Give codellama-7b-instruct a try; just install Ollama. Pretty mind-blowing performance for the RAM it uses and how fast it is. It's in the ballpark of gpt-3.5-turbo for code-related stuff.
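
If you want a quick sanity check before wiring it into an editor, something like this against Ollama's local REST API works. A minimal sketch, assuming Ollama is running on its default port (11434) and you've pulled the codellama:7b-instruct tag; adjust the model name to whatever you actually pulled:

  import json
  import requests

  # /api/generate streams newline-delimited JSON chunks,
  # each carrying a "response" text fragment.
  resp = requests.post(
      "http://localhost:11434/api/generate",
      json={
          "model": "codellama:7b-instruct",
          "prompt": "Write a Python function that reverses a string.",
      },
      stream=True,
  )
  for line in resp.iter_lines():
      if not line:
          continue
      chunk = json.loads(line)
      print(chunk.get("response", ""), end="", flush=True)
      if chunk.get("done"):
          break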


I just love this so much. It took what, maybe 6 hours for these model binaries to be publicly available? Never seen so much excitement in tech.


This would be great if I had a beefy machine with 32GB RAM and 16 cores. The alternatives presented are to use either TogetherAI's or Replicate's APIs to run Code Llama. Both have a free tier, but eventually you wind up paying. Has anyone calculated the financials and determined that the cost of these alternatives is less than Copilot? Cost aside, there's a concern about quality.

I should just buy a supercomputer...


If you run Ollama locally, the cost is zero. With Replicate, you pay about $0.005 per second, so with an average of say 15 seconds per request to generate the full response, and 25 inputs per day, this comes out to ~$50/month. More expensive than classic Copilot, but also solving a different, potentially more valuable problem. Also likely that costs drop as we're seeing across the board. And at a minimum, the free tiers probably get you all the experimentation you need before figuring out which model is best :) (including Continue's free trial for GPT-4)
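
Spelled out, with the same assumed numbers as above:

  price_per_second = 0.005    # approximate Replicate rate
  seconds_per_request = 15    # time to generate a full response
  requests_per_day = 25
  days_per_month = 30

  monthly = price_per_second * seconds_per_request * requests_per_day * days_per_month
  print(f"~${monthly:.0f}/month")  # ~$56, i.e. on the order of $50/month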


Curious to learn about the different, potentially more valuable problem you mentioned. Is this for Replicate or Continue?


Perfect reason to build a desktop IMHO :-)

You'd be surprised how affordable it can be. Mine sits on a tailnet with my laptop and handles most of the workload for me. Especially while I'm at home, I barely even notice that my dev environment is not running "locally." Also doubles great as a host for audiobookshelf, jellyfin, archivebox, and more. I have virt manager set up too (client on laptop, virtmanager service running on desktop) so I can easily spin up all sorts of VMs. I love it. I spent about $2k but I bought top of the line.


>This would be great if I had a beefy machine with 32GB ram and 16 cores.

Honestly this is a pretty standard workstation setup now. The days of 16GB being sufficient are pretty much over. RAM is cheap. And you can run the 7b model with well under that much.


No need for a super computer.

The 7B model should run on fairly reasonably priced GPUs. So if you have a desktop PC you can probably get a second hand GPU.

My MacBook Pro runs the larger model fine, wouldn’t call it a super computer, but I do think it’s a rather expensive machine.


Yes, you likely will either have to pay a monthly fee or own compute yourself.

Where did this expectation of free compute come from? Copilot costs too.


>Where did this expectation of free compute come from?

The VC-funded "growth and users at any cost, figure out profitability later" model?

i.e. hn gang


Why is this post flagged?

I've yet to try code stuff with AI (even Copilot). How well do local models like Code Llama work compared to GPT? When I use ChatGPT, GPT 3.5 feels a decade behind GPT 4. Half the answers are flawed. It's only GPT4 that boosts my productivity to an impressive level.

If you look at local text models for fiction or chat, they seem way behind even 3.5. See the examples table at https://docs.sillytavern.app/usage/faq/#what-do-you-mean-whe...

Code tasks are the sort of thing where I need as much accuracy as possible. Would something like Code Llama actually boost productivity or is this just "look, we can technically do it too, even if the result is awful" thing?


When will tools/models like these start integrating with code servers and linters in IDEs instead of just yielding supercharged autocomplete?


TJ DeVries works at Sourcegraph and contributes to Neovim; he's building sg.nvim, a plugin that hooks Sourcegraph and their code assistant Cody into Neovim via LSP.


I <3 Neovim, but Copilot has me in VSCode more than I'd like. All of these tools are amazing; seriously excited about the future here.


I use Copilot in Neovim [1]. It was remarkably simple to get installed. Highly recommend.

[1]: https://github.com/github/copilot.vim


I do use that plugin actually; tpope is an absolute boss. The one missing feature I find myself moving to VSCode for, which isn't in the Vim plugin yet, is the ability to open up an interactive chat session and ask:

"Please parse <dict name> and re-key it by <field>", making sure to remove entires where <x> is Blue and converting <y> to Yellow."

and it'll dump out a 40-line (working!) parser in ~3 seconds that I can then further customize. It's honestly remarkable, as you can then interactively ask it to update/adjust: "Can you make that a reusable function where I pass in X, Y, and Z?", "Can you convert that to a tuple comprehension?", and "Can you unroll that loop and add inline comments explaining the regex?" are all further drill-downs I'd use and expect good responses to.
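
Roughly the kind of transform I mean; the field names ("id", "color", "label") are made up, just to show the shape of the result:

  records = {
      "a": {"id": "a1", "color": "Blue", "label": "old"},
      "b": {"id": "b2", "color": "Green", "label": "old"},
  }

  # Re-key by "id", drop entries whose color is Blue, rewrite label to "Yellow".
  rekeyed = {
      rec["id"]: {**rec, "label": "Yellow"}
      for rec in records.values()
      if rec["color"] != "Blue"
  }
  print(rekeyed)  # {'b2': {'id': 'b2', 'color': 'Green', 'label': 'Yellow'}}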

I do have ShellGPT set up to give me (similarly generic) responses in the terminal, but I haven't found a good way to let it see my code yet so it can parse my data structures as fluidly.


You'd need to mod https://github.com/jackmort/chatgpt.nvim to use Copilot or Ollama.


Alright, this looks amazing, thank you. I'm surprised I missed it. What's the best place to stay on top of new/awesome Neovim plugins? I admit I haven't paid as much attention to /r/neovim as in past years after the recent API lockdown, but there has to be more o.0



Surely Steve Yegge will not let this go unanswered.


I guess I've been summoned.

We're primarily focusing on VSCode, IntelliJ and Neovim for Cody. Of course I'll be working on an Emacs version, but that's kinda best-effort for now.

As for the new crop of codegen models, they seem to be getting to parity with GPT/Claude/Bard-class models for code autocompletions, but not so much for other tasks.

We're working on incorporating OSS models, but I'd be surprised if they're ready for prime-time this year. I think next year they'll be huge.

Just my $0.02, take with a grain of salt. Shit moves fast.


I think "best-effort" would be a good tagline for emacs


excuse me @dang a ban is in order


Please share that elisp when you get around to the Emacs version. It doesn't matter if it sucks; you can't let this T.J. hooligan get the last laugh.


Are there any Copilot-style plugins for Sublime Text? I still can't stomach the idea of VS Code.


Good to see Code Llama supported through Continue already.

Are you seeing good results with Code Llama yet?


It's definitely a knowledgeable model. I'm seeing a bit of trouble with stopping at the right point and managing chat context, but this is all a matter of prompt engineering. Very promising, and exciting work to be done!


> I'm seeing a bit of trouble with stopping at the right point and managing chat context, but this is all a matter of prompt engineering

Can you explain what you mean by this, maybe provide some examples?


Yeah, I asked it to write bubble sort in python, and it did so perfectly, but then decided to add a couple dozen parentheses to the end. I then followed up asking it to remove these, and it didn't seem to grasp that there was a conversation, as it just rewrote the same thing.

All said, this was after like 5 minutes of playing with the 13B model, and I was not using any kind of human/assistant formatting, hence the need for at least simple prompting to make that work (or fine-tuning if it isn't trained on conversational data yet).
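
For reference, the instruct variants of Code Llama follow the Llama 2 chat convention, so wrapping turns in [INST] ... [/INST] is roughly the "simple prompting" I mean. A minimal single-turn sketch (the system message is just a placeholder, and some serving layers add the <s> token for you):

  # Build a Llama-2-style instruction prompt for Code Llama Instruct.
  def format_prompt(user_message, system="You are a helpful coding assistant."):
      return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user_message} [/INST]"

  prompt = format_prompt("Write bubble sort in Python.")
  print(prompt)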


Why is it necessary to sign up for two external services if this is running the LLM locally?


You can cobble together an OpenAI-esque server locally with llama.cpp: https://github.com/ggerganov/llama.cpp/issues/2766 and the script here: https://www.reddit.com/r/LocalLLaMA/comments/15ak5k4/short_g... and use Continue's GGML Model option to query it: https://continue.dev/docs/customization#local-models-with-gg...

(I've gotten this working, more or less, but don't have the hardware to make it practical so I can't give any feedback about Continue or the model.)
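
Once the wrapper is up, you can sanity-check it outside Continue with a plain OpenAI-style request. The port and model name below are assumptions; use whatever you started the server with:

  import requests

  resp = requests.post(
      "http://localhost:8081/v1/chat/completions",
      json={
          "model": "codellama",  # most local wrappers ignore this, but the schema wants it
          "messages": [{"role": "user", "content": "Write bubble sort in Python."}],
      },
  )
  print(resp.json()["choices"][0]["message"]["content"])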


Amazing. Was looking for instructions to do precisely that


No need to sign up for any service to use Continue with Ollama. This will do the inference all locally.

I believe if you want to do the inference using Replicate or Together, you’ll have to sign up for their services.


I don't have a Mac.

So it's either have a Mac or don't run locally if you want to use this service?


Sorry about that. We are working on building the Windows and Linux versions. May I ask which specific OS you are on?


Not OP, but would be great to have it working on Linux. How can one help?


^ Exactly. Replicate and Together are just options that are potentially more convenient.


I don't see how things like Cursor[0] (a VS Code fork with a nicer Codex UI) have a moat or chance of profitability in the face of things like this. If you can run models locally or your employer can spin up their own instance for internal use on a decent server, why is anybody paying OpenAI? And that leaves Cursor with no cash flow to skim off of.

[0]: https://www.cursor.so/


You can't run anything even remotely close to GPT-4 locally.

Large enterprises can run a private instance of it, but OpenAI is paid for that (indirectly via Microsoft).


Their value proposition appears to be a fancy UI. Better/cheaper models would only improve their product and grow their addressable customer base.


Anybody can copy that UI once they figure out what works. No moat.


I wouldn't discount good UI.

I don't use Cursor, but consider that GPT-3 sat around for months and nobody was really talking about "AI doom" and "the lightcone" until OpenAI put a good chat UI layer over it.

I would argue that actually capturing the value from these models lies mostly in the UI: allowing the user to seamlessly and quickly extract useful information from the model.


Chat generally being a nice way to interact with the model is one thing, but if anything this proves my point: there are now dozens of little web UIs on GitHub that are as good as the official one or better. You put in an API key and you have fully recreated their UI. I even wrote a CLI[0] in 130 lines of Deno that I prefer over the web UI. People are trying to charge for them but I don't think any will succeed.

[0]: https://gist.github.com/david-crespo/d9dbefe5a50c0f0da9ac3de...


Are you paying OpenAI? I’m not. Facebook’s midterm game on “massive” AI seems to be “commoditize the competition” and ATM it looks like a good bet.


I am paying a couple bucks a month for GPT-4 API calls, but not for Copilot usage.


There's overhead to setting up your own local LLM instance (by that I mean both resource utilization and the setup process itself). So long as this continues (the setup portion at least), there will be people who would rather pay Cursor than invest the time.



