Ask HN: Why doesn't Copilot for Business use your organization's actual repos?
30 points by eigenvalue on Feb 23, 2023 | 15 comments
I was just looking at the page for Copilot for Business ( https://docs.github.com/en/enterprise-cloud@latest/copilot/overview-of-github-copilot/about-github-copilot-for-business ). It does offer some useful things for business users compared with the regular Copilot product, but it seems to me that they could make it much more powerful by fine-tuning a smaller model on all the repos in an organization (even private ones), so that it could generate additional recommendations that take into account all the existing code a company has. Obviously, users would want assurances that these personalized models wouldn't be shared with anyone outside of the organization, but Microsoft has enough credibility that I think many businesses would try it in hopes of enhanced productivity.

Is it because it would be cost-prohibitive? Maybe it only makes sense to fine-tune a model if there are at least N users in the organization. Anyway, I'm curious whether anyone here has insights into this. I'm also interested in whether other companies offer this kind of product.



They're working on it: https://githubnext.com/projects/copilot-view/.

We (Sourcegraph) are also working on it, and also bringing other kinds of code intelligence to the LLM so we can answer more kinds of questions: https://twitter.com/sourcegraph/status/1623339664428892162.


I’m an auditor at a large organization that uses Sourcegraph, and I love it. On one project I had to trace a compliance flow across 14 services written in 4 languages, and Sourcegraph made it easy to navigate everything and document our findings with links to code at specific revisions. I use it to dig through our codebases on a daily basis.


Is this going to be available for self-hosted enterprise users?


Yes, we're getting this out to existing customers first (including self-hosted customers). Anyone can sign up on the waitlist in my post above, and we're automatically prioritizing existing customers.


Awesome, thanks for your reply.


A couple of projects you can look at that might be good substitutes for what you’re asking for:

https://www.codecomplete.ai/ - current YC batch, looks very promising, and it sounds like this could be a big part of their USP

https://www.tabnine.com/ - the OG code completion, mentions personalised models on their enterprise plan

It’s also on the roadmap for us (AI code search, not completions): https://bloop.ai/


Very limited opinion: when I looked into fine-tuning Whisper and GPT, it seemed fiddly. The training isn’t as robust as the inference, which makes sense, since it’s running at a different scale. You can manually check models before you release them, but that would put the cost at a different scale than $10/month.

My takeaway was that it’s very easy to mess up a model by fine-tuning. You can overfit or see rapid degradation. Again, this is just from reading about other people's experiences with other software, so maybe that’s not the case here.


My org shot down an idea like this because they don’t want OpenAI to train on (or have access to) our codebase. I don’t blame them.


Neither do I, but it's a dinosaur with an umbrella attitude.


The usual story is that fine-tuning is not very expensive compared to building the foundation model.


Right, that's why I'm really surprised they aren't doing something like that. Especially because they presumably have some amount of idle compute at Azure that could be used to work on this when it's not needed for something else.


It might not be stable. Fine-tuning can sometimes degrade what the base model already does well.


I think there are some good strategies for avoiding this, like setting an extremely low learning rate, or not allowing gradient updates that would result in more than N weights changing by more than K%, things like that.
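As a rough illustration of the second idea, here is a minimal sketch in plain Python. The function name `guarded_sgd_step` and its parameters are hypothetical, and rejecting the whole step is just one possible policy (scaling the step down would be another). It is not how Copilot or any particular framework does it.

```python
def guarded_sgd_step(weights, grads, lr=1e-5, max_changed=10,
                     max_rel_change=0.01, eps=1e-12):
    """Apply one plain SGD step unless too many weights would move too far.

    Hypothetical helper for illustration only.
    max_changed    -- the "N" above: how many weights may exceed the limit
    max_rel_change -- the "K%" above, as a fraction (0.01 == 1%)
    Returns (new_weights, applied).
    """
    # Proposed update with a deliberately small learning rate.
    proposed = [w - lr * g for w, g in zip(weights, grads)]

    # Count weights whose relative change exceeds the threshold.
    # eps avoids a zero threshold for weights that are exactly 0.
    n_large = sum(
        1
        for w, p in zip(weights, proposed)
        if abs(p - w) > max_rel_change * max(abs(w), eps)
    )

    if n_large > max_changed:
        # Too disruptive: skip this step and keep the old weights.
        return list(weights), False
    return proposed, True


if __name__ == "__main__":
    w = [1.0, 2.0, -3.0]
    g = [0.5, -0.5, 1.0]
    # Small learning rate: every weight moves by well under 1%, so it applies.
    print(guarded_sgd_step(w, g, lr=1e-3, max_changed=1))
    # Large learning rate: all three weights move by more than 1%, so it is skipped.
    print(guarded_sgd_step(w, g, lr=1.0, max_changed=1))
```

In a real training loop the same check would run over tensors rather than Python lists, but the logic is the same: compute the proposed update, measure how much of the model it disturbs, and only then decide whether to apply it.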


The right approach might be something retrieval-based, where your own code gets embedded and then searched with some combination of dense similarity and a keyword index.
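A toy sketch of that kind of hybrid retrieval, using only the standard library. Everything here is an assumption for illustration: `toy_embed` is a stand-in for a real embedding model (it just hashes character trigrams), the keyword score is simple token overlap rather than a real inverted index, and `alpha` is an arbitrary mixing weight.

```python
import math
import re
import zlib


def tokenize(text):
    """Split text into lowercase identifier-like tokens."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text.lower())


def toy_embed(text, dim=64):
    """Stand-in for a real embedding model: hashed trigram counts, L2-normalized."""
    vec = [0.0] * dim
    s = text.lower()
    for i in range(len(s) - 2):
        # crc32 is deterministic across runs, unlike the built-in hash().
        vec[zlib.crc32(s[i:i + 3].encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def dense_score(query_vec, doc_vec):
    """Cosine similarity (both vectors are already normalized)."""
    return sum(q * d for q, d in zip(query_vec, doc_vec))


def keyword_score(query, doc):
    """Fraction of query tokens that appear in the document."""
    q_tokens = set(tokenize(query))
    if not q_tokens:
        return 0.0
    return len(q_tokens & set(tokenize(doc))) / len(q_tokens)


def hybrid_search(query, snippets, alpha=0.5, top_k=3):
    """Rank snippets by alpha * dense + (1 - alpha) * keyword score."""
    q_vec = toy_embed(query)
    scored = [
        (alpha * dense_score(q_vec, toy_embed(s))
         + (1 - alpha) * keyword_score(query, s), s)
        for s in snippets
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    repo_snippets = [
        "def parse_config(path): return json.load(open(path))",
        "def send_email(to, body): smtp.send(to, body)",
        "class UserRepository: def find_by_id(self, user_id): ...",
    ]
    for score, snippet in hybrid_search("parse_config json", repo_snippets, top_k=2):
        print(round(score, 3), snippet)
```

The top-ranked snippets would then be placed in the model's prompt as context, so the base model never needs to be retrained on the organization's code.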


Codeium has a self-hosted enterprise solution that lets the enterprise fine-tune on its own repos within that self-hosted instance!



