Launch HN: CodeComplete (YC W23) – Copilot for Enterprise
138 points by dingliqing53 6 months ago | 68 comments
Hello HN! We’re Max and Lydia, co-founders at CodeComplete AI (https://codecomplete.ai), an AI-powered coding assistant for enterprise companies. Many large companies can’t use products like GitHub Copilot because of the security and privacy risks, so we’re building a self-hosted version that’s fine-tuned to the company’s codebase.

We love Copilot and believe that AI will change the way developers work. Max wanted to use Copilot when he was an ML engineer at Meta, but leadership blocked him because Copilot requires sending company code to GitHub and OpenAI. We built CodeComplete because lots of other companies are in the same boat, and we want to offer a secure way for these companies to leverage the latest AI-powered dev tools.

To that end, our product is really meant for large engineering teams at enterprise companies who can’t use GitHub Copilot. This generally means teams with more than 200 developers that have strict practices against sending their code or other IP externally.

CodeComplete offers an experience similar to Copilot; we serve AI code completions as developers type in their IDEs. However, instead of sending private code snippets to GitHub or OpenAI, we use a self-hosted LLM to serve code completions. Another advantage of self-hosting is that it’s more straightforward to securely fine-tune on the company’s codebase. Copilot suggestions aren’t always tailored to a company’s coding patterns or internal libraries, so fine-tuning can help make our completions more relevant and avoid adding tech debt.

To serve code completions, we start with open source foundation models and augment them with additional (permissively-licensed) datasets. Our models live behind your firewall, either in your cloud or on-premises. For cloud deployments, we have terraform scripts that set up our infrastructure and pull in our containers. On-prem deployments are a bit more complicated; we work with the customer to design a custom solution. Once everything’s set up, we train on your codebase and then start serving code completions.

To use our product, developers simply download our extension in their IDE (VS Code currently supported, JetBrains coming soon). After authenticating, the extension provides in-line code completion suggestions to developers as they type.
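
For illustration, the round trip from the extension to the self-hosted deployment looks roughly like the sketch below. This is purely a sketch: the endpoint URL, route, and field names are placeholders, not our actual API.

    # Purely illustrative: a hypothetical request an IDE extension could make to a
    # CodeComplete deployment running behind the company firewall. The URL, route,
    # and field names are placeholders, not the real API.
    import requests

    def fetch_completion(prefix: str, suffix: str, language: str) -> str:
        resp = requests.post(
            "https://codecomplete.internal.example.com/v1/completions",  # placeholder endpoint
            json={
                "prefix": prefix,      # code before the cursor
                "suffix": suffix,      # code after the cursor
                "language": language,  # e.g. "python"
                "max_tokens": 64,
            },
            headers={"Authorization": "Bearer <token from IDE authentication>"},
            timeout=2,  # suggestions have to come back fast enough to feel inline
        )
        resp.raise_for_status()
        return resp.json()["completion"]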

Since we’re a self-hosted enterprise product, we don’t have an online version you can just try out, but here are two quick demos: (1): Python completion, fine-tuned on a mock Twitter-like codebase: https://youtu.be/YqkqtGY4qmc. (2) Java completion for "leetcode"-style problems, like converting integers to roman numerals: https://youtu.be/H4tGoFNC8oI.

We take privacy and security seriously. By default, our deployments only send back heartbeat messages to our servers. Our product logs usage data and code snippets to the company’s own internal database so that they can evaluate our performance and improve their models over time. Companies have the option to share a subset of that data with us (e.g. completion acceptance rate, model output probabilities, latencies, etc.), but we don’t require it. We never see your code or any other intellectual property.

We charge based on seat licenses. For enterprise companies, these contracts often demand custom scoping and requirements. In general though, our pricing will be at a premium to GitHub Copilot since there is significant technical and operational overhead with offering a self-hosted product like this.

Having access to these types of tools would have saved us a bunch of time in our previous jobs, so we’re really excited to show this to everyone. If you are having similar issues with security and privacy at your current company, please reach out to us at founders@codecomplete.ai! We’d love to hear your feedback.




Not to be confused with the software development book by the same name: https://en.wikipedia.org/wiki/Code_Complete

Taking that name wouldn’t fly under trademark rules, but fortunately books are covered by copyright, not trademark. But then again, it’s published by Microsoft Press - the publishing arm of your biggest competitor.


It’s still early enough to do a cool rebrand - something more distinctive, non-obvious on the first association but clear on the second.


GitHub and other companies like Amazon have the advantage of scale in terms of datasets. What’s the guarantee that the pre-trained model you’ll fine-tune on a company’s codebase is as good as, say, Copilot? It’s even harder to evaluate when there’s no demo to try - it’s not that hard to set up a pipeline to run your model in the cloud and send invites to potential customers if you want to.


It doesn't make sense to treat Copilot as a competitive benchmark in environments where Copilot isn't even an option for policy reasons, which is exactly what this product is targeting.

Does it provide net positive support to developers? Is that support worth the licensing and maintenance costs? Those would indeed benefit from a demo, but it's also hard to demo something whose value hinges on fine-tuning.

More likely, they'll need to sell low-risk, supported integrations that help an org stand up the project on their own system and evaluate its quality in situ. It's a whole different model than retail SaaS services like Copilot.

This is an exciting announcement for a product meant to fill a gaping hole. The orgs that need it will just have to see how well it works for them.


Thank you!! You've hit the nail on the head - our thoughts exactly. One anecdote: we offer video demos, and one potential customer just said, "I don't need to see it. I know what Copilot does." We do offer pilots for customers after fine-tuning on their codebase!


Lol if Swatcoder is available they could be great at devrel


Let us know, swatcoder!


Hah. That's not where I'm investing my attention these days, but good luck!


What compliance issues are there? Copilot is already being adopted at the enterprise level. Microsoft is not going to miss out on enterprise by trying to ship data out.


Wasn't Copilot only trained on public GitHub repos? Presumably there's nothing stopping anyone from training on them.


Yea, there's tons of public data to train on. Copilot's under fire right now (https://githubcopilotlitigation.com/), but we make sure we only train on permissive licenses because some companies are sensitive to the IP issues here.


Can you give a list of the permissive licenses you train on? MIT for example requires attribution. In fact, most permissive licenses have similar requirements.


MIT only needs attribution for redistributing "all copies or substantial portions of the Software".

Just reading the code shouldn't be illegal, and producing a short snippet based on the code but not identical shouldn't count as a "substantial portion", but IANAL...


So if Microsoft/Github ever offers an on-prem version, what will be the advantage of using your product?


Sounds like they'll finetune the model on your existing codebase. Even if Microsoft/Github offers this functionality, it's the kind of thing where it may be worth paying a premium to have good customer support telling you exactly how to make use of this.

It's like how there are plenty of people paying for DB software which is 'worse' than mainstream free ones, but they really like the fact they can call experts who will tell them what they are doing wrong with query optimization, etc.


>It's like how there are plenty of people paying for DB software which is 'worse' than mainstream free ones, but they really like the fact they can call experts who will tell them what they are doing wrong with query optimization, etc.

And for us, poor peasants, there's always going to be stackoverflow.


Yes! Thanks for elaborating for us! :)


Another thing is that since Microsoft/Github are working with OpenAI's closed source Codex model, we think it's unlikely they'll offer something on-prem anytime soon since they would have to reveal the model weights, and thus risk a leak (ex. Meta's LLaMA model weights got leaked within a week)


That is a... very big assumption that I wouldn't put money on.

It's become pretty clear that models aren't a moat. If everyone has Codex-class capability (which is already happening), there's no real risk to them deploying on-prem, because the model itself is a commodity.


Models can be watermarked. With a sufficiently tight licence, most decent companies won't risk disclosing them.


At that point you would need to compete on service and price. They'll also get the first-mover lock-in advantage. Once a large company deploys your product to 5000 devs, you are pretty much in for life. /me waves to jira.


Congrats on the launch. I think you should share some technical details for a more substantial pitch. You are using the OSS BigCode effort and "The Stack" [1, 2] (as you say in another comment), which is great.

A few questions that might help an enterprise customer: How big is your base model? Where did you find more datasets (maybe just a hint would be sufficient)? Are you using SantaCoder [3]? Anything you can say about your fine-tuning that makes it special? I'm totally on board with you that HumanEval/MBPP are not great benchmarks for the real world - do you have a suggested alternative to help me see the value?

The calculus for an enterprise customer might be: "We could fine-tune a 6B model on our internal code and internal benchmarks (say with a month of work, a few thousand in compute, 2 people on task), but I'd rather buy an off-the-shelf solution like codecomplete.ai. They give us XYZ benefits." Articulate the XYZ for a technical decision maker who will be your target audience. (A rough sketch of that DIY path follows the links below.)

* [1] https://huggingface.co/datasets/bigcode/the-stack

* [2] https://arxiv.org/abs/2211.15533

* [3] https://huggingface.co/bigcode/santacoder
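
To make that DIY calculus concrete, here's a rough sketch of what fine-tuning an open code model on an internal corpus could look like with a HuggingFace-style stack. The model name, paths, and hyperparameters are placeholders, and a real pipeline would also dedupe, strip secrets, and pack files into fixed-length blocks.

    # Rough sketch only: fine-tune an open code model on an internal corpus.
    # Model name, file paths, and hyperparameters are placeholders.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    base_model = "bigcode/santacoder"  # any permissively-licensed base model
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    tokenizer.pad_token = tokenizer.eos_token  # code tokenizers often lack a pad token
    model = AutoModelForCausalLM.from_pretrained(base_model, trust_remote_code=True)

    # "internal_code/*.txt" is a stand-in for the real corpus (one example per line here).
    raw = load_dataset("text", data_files={"train": "internal_code/*.txt"})

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=1024)

    train = raw["train"].map(tokenize, batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="finetuned-internal",
            per_device_train_batch_size=4,
            num_train_epochs=1,
            learning_rate=2e-5,
            fp16=True,
        ),
        train_dataset=train,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()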


Great questions. We want to keep some of our technical details closer to the chest, so I won't go into the specific technologies we're using here.

I will expand a bit on fine-tuning. It's really hard to get this right, and the iteration speed is slow. Of course these companies can build their own, but we want to save them a lot of headache.

So far, we haven't found any off-the-shelf open source base model that works super well for code completions. We've augmented models with a huge amount of data to get to our current performance, and we ran into a lot of pain along the way.


> Permissively Licensed: Trained only on permissively-licensed repos to avoid legal risks

You’re fine-tuning the model. What model are you fine-tuning? I can’t imagine you trained your own LLM from scratch, so how can you possibly guarantee the core model wasn’t trained on non-permissively licensed code?


We're starting off with an open source base model that was trained on The Stack, a dataset containing only permissively-licensed code, and we're further augmenting it with additional repositories with MIT, BSD, or Apache Licenses.
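
As a toy illustration of that filtering step (the license allowlist and helper below are simplified stand-ins, not our actual pipeline):

    # Toy illustration: keep only repos whose SPDX license id is in an allowlist
    # before they enter the training corpus. Simplified stand-in, not the real pipeline.
    PERMISSIVE_LICENSES = {"MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0"}

    candidate_repos = [
        {"name": "example/web-framework", "license": "MIT"},
        {"name": "example/copyleft-lib", "license": "GPL-3.0"},
        {"name": "example/cli-tool", "license": "Apache-2.0"},
    ]

    def is_permissive(repo: dict) -> bool:
        return repo.get("license") in PERMISSIVE_LICENSES

    training_repos = [r for r in candidate_repos if is_permissive(r)]
    # keeps example/web-framework and example/cli-tool, drops the GPL repo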


How are you complying with the attribution requirement for MIT and Apache?

From the MIT License[0]:

>The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

From the Apache License[1]:

>(a) You must give any other recipients of the Work or Derivative Works a copy of this License; and
>
>(b) You must cause any modified files to carry prominent notices stating that You changed the files; and
>
>(c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and

[0] https://opensource.org/license/mit/

[1] https://opensource.org/license/apache-2-0/


Always astonishes me how negative hacker news can be whenever people try to launch stuff. This is cool!


Negative reactions are quicker to arise and quicker to write out, which is a double whammy that causes them to appear more quickly and dominate the thread early. Slower, positive reactions eventually show up (usually!) and in the meantime we get the odd "contrarian dynamic" phenom where "I can't believe how negative these comments are!" objections start popping up and get upvoted to the top. Internet is weird.

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...


The top comment usually goes like:

negative comments -> some presumably "funny" witty joke -> positive/meaningful insight -> people complain about comment quality


For real, any of these threads is guaranteed to have a ton of nitpicks and comparisons with products that are way more mature. It's pretty sad to see, especially in a so-called hacker community. Congrats on the launch!


When I read these threads I mentally prepend "this is neat" to all comments. Perhaps I'm too optimistic, but I think nearly any engagement with a new startup or idea is appreciation on some level.


I usually prepend that to my critical comments ;)

If I'm responding in a critical manner, it's usually because I think there's something useful there.


A web extension to do this automatically would be the hero no one wanted, but the one we deserved.


Because there isn't any public demo available, you can't really say more than "Yay, this seems interesting."

So it's easier to point out obvious potential flaws.


Check out when Dropbox launched

https://news.ycombinator.com/item?id=9224


Thanks enono :)


Any thoughts on an individual "Here you go, but you're on your own" license? Something like this would be massively beneficial to small IT teams who have the same inability to send code to OpenAI, but could use the benefit of a smart auto-complete style system like Copilot. They wouldn't necessarily need the custom integration - even small benefits would be seen from an initial core model. Additional integrations could be sold as piecemeal consulting engagements after the fact if need be.

Just a thought - there's a vast market out there for organizations with dev teams in the 1-10 person range.


The infra costs are probably a minimum of $1000/month, so it doesn't make sense to target small teams with that overhead yet. I'd expect something open source to appear here in the next few months, though it might be based on legally iffy LLaMA weights.


Definitely want to offer this in the future but not currently our ICP. Like rileyphone mentioned, the cost of hosting a model is substantial. Could you please elaborate on why the small IT teams can't send code to OpenAI? We'd love to think more about this!


Congrats on the launch. This seems very compelling. For anyone asking "Why can't GitHub just do this?", (1) never underestimate an extremely smart and motivated team, and (2) competition is critical and valuable.

Let's celebrate when teams build products to fill big needs, instead of dismissing things because of a potential threat from a big company. If we dismiss new things, then we'll just end up in a world where big companies get complacent and devs get less new stuff. (I know many GitHubbers, and they love seeing new stuff and are cheering for it because they can't do everything.)


True, but it's a bold move taking on the market leader with only a weak differentiator that they could match overnight.


Couldn't agree more :)


Love the idea of running this stuff on-prem / in house. Just not being beholden to random updates in the rules or logging policies seems like a big value add. +1 to the other commenter who mentioned they'd love to see a version for smaller teams - there might be more privacy-oriented teams out there than you'd expect. I think even a version that can run in public clouds would be meaningful.

Have you gotten feedback from folks using it on how it compares to Copilot in terms of usefulness? What's the 'order of magnitude' difference?


This seems really cool for a company that has enough software surface area that chances are somebody has already built the thing you need; instead of building a copy yourself, you can integrate with the existing system.


Code discovery and search have been great applications of Copilot and other AI dev tools!


Will you train on accepted/rejected completions and other responses from devs, rather than just the final code, so that clients lose something when moving to a similar tool which will pop up tomorrow?


Yes! We definitely have that on our product roadmap, though we think of it more in terms of improving model performance vs. deterring clients from switching to a different tool. Since we're working with large enterprise codebases and deploying models behind their firewalls, there's (hopefully) a nontrivial switching cost. Thanks for pointing it out!


Have you benchmarked it against Copilot?

Is the training data only the customer's own data? Because that might not achieve great accuracy.

Does it work with any programming language?


We don't have any rigorous benchmarks against Copilot, but we're working on building an evaluation framework to do so. We've played a bunch with traditional academic metrics for codegen (e.g. pass@k) and found they don't correlate super well with real-world performance. Also, we want to mention that we are not competing directly with Copilot. A benchmark against Copilot is useful as we further improve our product, but our main value add here is not that we perform better than Copilot, but rather that we serve a customer segment that can't use Copilot. Would love to hear any thoughts you have.
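
For reference, the unbiased pass@k estimator from the Codex paper is the kind of metric we've been experimenting with (sketch below, not our internal evaluation code):

    # Unbiased pass@k estimator from the Codex paper (Chen et al., 2021).
    # n = samples generated per problem, c = samples that passed the tests,
    # k = sample budget being evaluated.
    import numpy as np

    def pass_at_k(n: int, c: int, k: int) -> float:
        if n - c < k:
            return 1.0  # fewer than k failing samples, so any k-sample draw contains a pass
        return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

    # Example: 200 samples per problem, 13 passed, estimate pass@10
    print(pass_at_k(200, 13, 10))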

For training, we start with a capable open-source base model, augment it with a bunch of permissively-licensed repos, and then fine-tune on the customer codebase.

We currently support C/C++, Go, Gosu, Java, Javascript, Python, Ruby, and Typescript, but we're continuously adding new languages.


What would a benchmark even look like?

Does it make useful code? Does it make the same code?

Or more strictly on something like latency and cost?


How does this compare with https://www.tabnine.com/enterprise, which is also self-hosted, trained on permissively licensed repositories and supports training on private repos?


This is definitely the case in our company:

"Many large companies can’t use products like GitHub Copilot because of the security and privacy risks, so we’re building a self-hosted version that’s fine tuned to the company’s codebase."


They'll change their tune when they get beaten by more permissive competitors


> Trained only on permissively-licensed repos to avoid legal risks

Nice.

Slightly unrelated, but are there any languages that use mostly non-permissive licenses that might have gimped your dataset for that language?



This is good for some business teams, but it still doesn't offer self-hosting or fine-tuning.


I think this is a problem worth solving: it's not only the biggest companies that are concerned with security - many SaaS and tech companies are concerned about their product's source code becoming compromised.


Excited to try this, looks really good! Been looking for something like this for our Pynecone codebase!


Let's talk :). Congrats on the Pynecone launch!


Does the "fine-tuning" also happen on-premises, or do you need access to the code?


Can you let us know which "open source foundation model" you use?


Congrats Lydia & Max! Great demo videos


Thanks ninjaa!


> Trained only on permissively-licensed repos to avoid legal risks

Do any of those “permissive” licences require attribution? Do they all grant a patent license?


Codeium already has this enterprise option, with self-hosting, fine-tuning, and even things like search on top of autocomplete and way more IDEs: https://codeium.com/

How are you different?


aren't you codeium?


Yes, thus the question :)


Don't try to just blithely "smile" your self-promotion away. Your question is a valid one, but it's pretty obviously morally dubious to mention Codeium without disclosing that you're involved with them.



