Ask HN: Why doesn't Copilot for Business use your organization's actual repos?

sqs · on Feb 24, 2023

They're working on it: https://githubnext.com/projects/copilot-view/.

We (Sourcegraph) are also working on it, and also bringing other kinds of code intelligence to the LLM so we can answer more kinds of questions: https://twitter.com/sourcegraph/status/1623339664428892162.

dpiers · on Feb 24, 2023

I’m an auditor at a large organization that uses Sourcegraph, and I love it. On one project I had to trace a compliance flow across 14 services written in 4 languages, and Sourcegraph made it easy to navigate everything and document our findings with links to code at specific revisions. I use it to dig through our codebases on a daily basis.

gtaylor · on Feb 24, 2023

Is this going to be available for self-hosted enterprise users?

sqs · on Feb 24, 2023

Yes, we're getting this out to existing customers first (including self-hosted customers). Anyone can sign up on the waitlist in my post above, and we're automatically prioritizing existing customers.

eigenvalue · on Feb 24, 2023

Awesome, thanks for your reply.

louiskw · on Feb 24, 2023

A couple of projects worth looking at that might be good substitutes for what you’re asking for:

https://www.codecomplete.ai/ - current YC batch; looks very promising, and it sounds like this could be a big part of their USP

https://www.tabnine.com/ - the OG code completion, mentions personalised models on their enterprise plan

It’s also on the roadmap for us (AI code search, not completions): https://bloop.ai/

travisjungroth · on Feb 23, 2023

Very limited opinion: I looked into fine-tuning Whisper and GPT, and it seemed like a fiddly thing. The training isn’t as robust as the inference, which makes sense, since it’s running at a different scale. You can manually check models before you release them, but that would put the cost at a different scale than $10/month.

My takeaway was that it’s very easy to mess up a model by fine-tuning: you can overfit or get rapid degradation. Again, this is just from reading about other people’s experiences with other software, so maybe that’s not the case here.

SteveDR · on Feb 24, 2023

My org shot down an idea like this because they don’t want OpenAI to train on (or have access to) our codebase. I don’t blame them.

literalAardvark · on Feb 24, 2023

Neither do I, but it's a dinosaur with an umbrella attitude.

PaulHoule · on Feb 23, 2023

The usual story is that fine-tuning is not very expensive compared to building the foundation model.

eigenvalue · on Feb 23, 2023

Right, that's why I'm really surprised they aren't doing something like that. Especially because they presumably have some amount of idle compute at Azure that could be used to work on this when it's not needed for something else.

theGnuMe · on Feb 24, 2023

It might not be stable. Sometimes fine-tuning can throw things off.

eigenvalue · on Feb 24, 2023

I think there are some good strategies for avoiding this, like setting an extremely low learning rate, or not allowing gradient updates that would result in more than N weights changing by more than K%, things like that.
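A toy sketch of the second idea, over a flat list of floats rather than real model tensors. The names `constrained_sgd_step`, `max_changed` (N) and `max_rel_change` (K, given as a fraction) are hypothetical, and rejecting the whole step is only one reading of "not allowing"; scaling the step down until it passes would be another.

```python
from typing import List, Tuple


def constrained_sgd_step(
    weights: List[float],
    grads: List[float],
    lr: float = 1e-5,              # the "extremely low learning rate"
    max_changed: int = 10,         # N: how many weights may move a lot
    max_rel_change: float = 0.01,  # K as a fraction (0.01 == 1%)
) -> Tuple[List[float], bool]:
    """Apply one plain SGD step, unless it would move more than
    `max_changed` weights by more than `max_rel_change` of their current
    magnitude; in that case reject the step and keep the old weights.

    Returns (new_weights, accepted)."""
    proposed = [w - lr * g for w, g in zip(weights, grads)]
    big_moves = 0
    for w, p in zip(weights, proposed):
        if w == 0.0:
            # Relative change is undefined at zero; count any move as big.
            if p != 0.0:
                big_moves += 1
        elif abs(p - w) / abs(w) > max_rel_change:
            big_moves += 1
    if big_moves > max_changed:
        return list(weights), False
    return proposed, True
```

For example, with `lr=0.01`, `max_changed=1` and `max_rel_change=0.05`, gradients `[0.1, 100.0, 50.0]` on weights `[1.0, 2.0, -0.5]` would move two weights by more than 5%, so the step is rejected and the weights come back unchanged.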

PaulHoule · on Feb 24, 2023

The right approach might be something retrieval based where your own code gets embedded and then searched with some combination of dense similarity and keyword index.
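A minimal sketch of that hybrid retrieval idea. Assumptions: `embed` is a toy hashed bag-of-tokens stand-in for a real code-embedding model, the keyword side is a simple IDF-weighted inverted index rather than full BM25, and the two rankings are merged with reciprocal rank fusion, which is one common way to combine them but not something the comment specifies. All names here are hypothetical.

```python
import math
import re
import zlib
from collections import defaultdict
from typing import Dict, List, Set

DIM = 256  # size of the toy embedding; arbitrary


def tokenize(text: str) -> List[str]:
    """Lowercased identifier parts: getUserName / get_user_name -> get, user, name."""
    return [t.lower() for t in re.findall(r"[A-Za-z][a-z0-9]*|[0-9]+", text)]


def embed(text: str) -> List[float]:
    """Placeholder for a real embedding model: hashed token counts, L2-normalized."""
    vec = [0.0] * DIM
    for tok in tokenize(text):
        vec[zlib.crc32(tok.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class HybridCodeIndex:
    def __init__(self, snippets: Dict[str, str]):
        self.snippets = snippets
        self.vectors = {sid: embed(src) for sid, src in snippets.items()}
        self.postings: Dict[str, Set[str]] = defaultdict(set)  # token -> snippet ids
        for sid, src in snippets.items():
            for tok in set(tokenize(src)):
                self.postings[tok].add(sid)

    def dense_rank(self, query: str) -> List[str]:
        """All snippets, ordered by cosine similarity to the query vector."""
        q = embed(query)
        score = {sid: sum(a * b for a, b in zip(q, v)) for sid, v in self.vectors.items()}
        return sorted(score, key=lambda s: -score[s])

    def keyword_rank(self, query: str) -> List[str]:
        """Snippets sharing at least one query token, ordered by summed IDF."""
        n = len(self.snippets)
        score: Dict[str, float] = defaultdict(float)
        for tok in set(tokenize(query)):
            docs = self.postings.get(tok, set())
            if docs:
                idf = math.log(1 + n / len(docs))
                for sid in docs:
                    score[sid] += idf
        return sorted(score, key=lambda s: -score[s])

    def search(self, query: str, top_k: int = 3, rrf_k: int = 60) -> List[str]:
        """Reciprocal rank fusion: each ranking contributes 1 / (rrf_k + rank)."""
        fused: Dict[str, float] = defaultdict(float)
        for ranking in (self.dense_rank(query), self.keyword_rank(query)):
            for rank, sid in enumerate(ranking, start=1):
                fused[sid] += 1.0 / (rrf_k + rank)
        return sorted(fused, key=lambda s: -fused[s])[:top_k]
```

In a completion setting, the top few snippets would then presumably be pasted into the model's prompt as context, so the organization's code never has to be baked into the weights.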

aunch · on Feb 24, 2023

Codeium has a self-hosted enterprise solution that lets the enterprise fine-tune on its repos within that self-hosted instance!