Hi HN,
Code Llama was released, but we noticed a ton of questions in the main thread about how and where to use it: not just through an API or the terminal, but in your own codebase as a drop-in replacement for Copilot Chat. Without that, developers don't get much utility from the model.
This matters all the more because benchmarks like HumanEval don't perfectly reflect the quality of responses. There's likely to be a flurry of improvements to coding models in the coming months, and rather than relying on benchmarks to evaluate them, the community will get better feedback from people actually using the models in their everyday workflows.
We've worked to make this possible with Continue (https://github.com/continuedev/continue) and want to hear what you find to be the real capabilities of Code Llama. Is it on par with GPT-4? Does it require fine-tuning? Does it excel at certain tasks?
If you’d like to try Code Llama with Continue, it only takes a few steps to set up (https://continue.dev/docs/walkthroughs/codellama), either locally with Ollama or through TogetherAI's or Replicate's APIs.