Launch HN: Buildt (YC W23) – Conversational semantic code search
123 points by Buoy on March 2, 2023 | 42 comments
Hi HN! We’re Ali, Sam and Yang, the founders of Buildt (https://buildt.ai), an LLM-powered IDE extension that allows you to ask highly contextual and semantic questions about your code. It’s a bit like if you had a colleague sitting next to you who has perfect memory of your codebase. Our VS Code extension is here: https://marketplace.visualstudio.com/items?itemName=BuildtAI....

Some demos: https://twitter.com/AlistairPullen/status/162848600700289433... and https://twitter.com/AlistairPullen/status/162848600806408601...

We’ve been devs on projects ranging from mobile apps, arbitrage trading systems, and VR platforms to on-demand startups. Without fail, whenever a codebase gets over a certain size or we inherit legacy code, we get slowed down by not knowing where a certain snippet lives, or how it works. I’m sure we’ve also bothered our colleagues for longer than they would like when first being onboarded.

Current code search products aren’t too different from CMD + F. We’ve often wanted results that aren’t captured by string matches, or that require some nuanced understanding of our codebase: questions such as “How does authentication work on the backend?”, “Find where we initialize Stripe in React”, or “Where do we handle hardware failures?”

Building a tool that helps developers quickly search and understand large codebases requires a contextual understanding of every line of code, plus a way to surface that understanding in a useful format.

First we need to parse your codebase. This isn’t a walk in the park: we can’t simply embed whole code files, because a search result would then only take you to the file containing the match, and no deeper. To find the specific snippets you’re looking for, we need to be much more granular in how we split up your codebase. We use a universal parser (Tree-sitter) to traverse the Abstract Syntax Tree (AST) of your code files and pick out individual functions, classes, and snippets to embed, rather than the entire file. This allows us to work on your codebase at a more semantic level than the raw source code.
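As a rough illustration of this function-level chunking, here is a minimal sketch. It uses Python's built-in `ast` module as a stand-in for Tree-sitter, so it only handles Python source, and the `extract_chunks` name and the chunk fields are assumptions made for the example rather than anything from our actual implementation.

```python
import ast


def extract_chunks(source: str) -> list[dict]:
    """Split one source file into function- and class-level chunks.

    Stand-in for a Tree-sitter traversal: Python's own `ast` module is
    used here, so this sketch only understands Python code.
    """
    tree = ast.parse(source)
    chunks = []
    for node in ast.walk(tree):
        # Keep only the node types we want to embed individually.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append(
                {
                    "name": node.name,
                    "kind": type(node).__name__,
                    "start_line": node.lineno,
                    "end_line": node.end_lineno,
                    # Exact source text of the node, ready to be embedded.
                    "text": ast.get_source_segment(source, node),
                }
            )
    return chunks


if __name__ == "__main__":
    example = "def a():\n    return 1\n\nclass B:\n    def c(self):\n        return 2\n"
    for chunk in extract_chunks(example):
        print(chunk["kind"], chunk["name"], chunk["start_line"], chunk["end_line"])
```

Because each chunk carries its own line range, a search hit can point at the exact function or class instead of only the file.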

Once we have extracted all of the relevant code from the AST, we have to embed it. (We use a number of other search heuristics too, such as edit distance and exact matches, but embeddings are the highest-weighted and core heuristic.) We’ve learned a great deal about the best implementations of embeddings for this use case. In particular, when using embeddings to search between modalities (natural language and code), we found that hypothetical search queries were the best way to surface relevant code, along with a custom bias matrix that better optimizes our embeddings for finding code from short user queries. Simply embedding the user’s search query and searching the answer space with it was a poor solution.
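To make the idea of weighted heuristics concrete, here is a minimal sketch of blending an embedding similarity with string-based signals. The weights, the function names, and the use of `difflib.SequenceMatcher` as a stand-in for a normalised edit distance are all assumptions for illustration; the only thing taken from the description above is that the embedding score carries the most weight.

```python
import math
from difflib import SequenceMatcher

# Hypothetical weights: embeddings dominate, the other heuristics assist.
WEIGHTS = {"embedding": 0.6, "edit": 0.2, "exact": 0.2}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def combined_score(query, query_vec, snippet, snippet_vec, weights=WEIGHTS):
    """Blend embedding similarity with two cheap string heuristics."""
    embedding_sim = cosine(query_vec, snippet_vec)
    # ratio() is a similarity in [0, 1]; it approximates, but is not,
    # a true edit distance.
    edit_sim = SequenceMatcher(None, query.lower(), snippet.lower()).ratio()
    exact = 1.0 if query.lower() in snippet.lower() else 0.0
    return (
        weights["embedding"] * embedding_sim
        + weights["edit"] * edit_sim
        + weights["exact"] * exact
    )


def rank(query, query_vec, indexed):
    """Sort (snippet_text, snippet_vec) pairs from best to worst match."""
    return sorted(
        indexed,
        key=lambda item: combined_score(query, query_vec, item[0], item[1]),
        reverse=True,
    )
```

In this sketch the vectors are passed in by the caller, so any embedding model can be plugged in without changing the scoring code.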

One embeddings heuristic we use is a HyDE comparison, which involves using an LLM to take the user’s search query and generate code that it thinks will be similar to the actual code the user is trying to find. This process is well documented and has given us a huge increase in performance (https://www.buildt.ai/blog/3llmtricks). Another heuristic lets us achieve “search for what your code does, not what it is” functionality, which requires the embeddings to capture some form of understanding of what the code actually does. For this we used embedding customisation to create a bias matrix that transforms the vector space so that the embeddings cluster code by its functionality rather than simply by its literal strings (https://www.buildt.ai/blog/viral-ripout).
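The two heuristics above can be sketched together in a few lines. This is a toy under stated assumptions, not our production code: `embed` is a bag-of-words stand-in for a real embedding model, `generate_code` is a hypothetical callable wrapping whatever LLM you use, and `bias` is assumed to be a matrix learned elsewhere that is applied to every vector before comparison.

```python
import math
import re
from collections import Counter


def embed(text, vocab):
    """Toy bag-of-words embedding; a real system would call an embedding model."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [float(counts[word]) for word in vocab]


def apply_bias(matrix, vec):
    """Multiply a vector by a bias matrix (assumed to be learned elsewhere)."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hyde_search(query, generate_code, snippets, vocab, bias=None):
    """Return the snippet closest to LLM-generated hypothetical code."""
    # Step 1: ask the LLM for code that might answer the query.
    hypothetical = generate_code(query)
    # Step 2: embed the hypothetical code instead of the raw query, so the
    # comparison is code-to-code rather than natural-language-to-code.
    query_vec = embed(hypothetical, vocab)
    if bias is not None:
        query_vec = apply_bias(bias, query_vec)
    # Step 3: score every indexed snippet in the same (biased) vector space.
    scored = []
    for snippet in snippets:
        vec = embed(snippet, vocab)
        if bias is not None:
            vec = apply_bias(bias, vec)
        scored.append((cosine(query_vec, vec), snippet))
    return max(scored, key=lambda pair: pair[0])[1]
```

A call might look like `hyde_search("Find where we initialize Stripe", my_llm, snippets, vocab)`, where `my_llm` is your own function that returns generated code as a string.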

By having a product that lives in your IDE instead of your Git repository, we give you the power of contextual understanding in real time as you’re working on your codebase. There’s no need to context switch or change apps—everything is self-contained; you can easily search for code, have your code refactored and fresh code written from a single extension.

Buildt is free for now as we’re still in beta, but in the future we’ll charge something like $10 per seat per month. We’re currently building the last part of what we consider our core features: cross-file codegen. Soon you’ll be able to just ask Buildt to instantly perform requests such as ‘add firebase analytics to every user interaction’.

We started Buildt as a product to tackle our own frustrations and we’d love for you to try it out and let us know what you think. We can’t wait to hear your feedback, questions, and comments!




Really cool and would love to try it out. Curious why you don’t also launch a web interface. No reason it has to be in the IDE only.

I use IntelliJ products so I can’t use this yet


Interesting suggestion. The extension itself is actually a React project that runs in VS Code, so not much uplift would be required, but we’d have to figure out how we’d actually interact with the codebase in that instance.


github oauth?


How would this work with local changes while developing?


This is awesome! A related problem that currently has no solution I know of is documentation rot. I would definitely pay for an LLM that compared docs and code and told me if they got too far apart.


Why isn't this a feature of documentation frameworks? Like it could be just a simple, "Hey, I see this function in the codebase has changed since the time you wrote the documentation for it, do you want to update its description?"


I would definitely be in danger of getting into the habit of just saying no if I was asked every time it changed, especially early in the dev cycle. However, if it was just at pull request time, I probably wouldn't get frustrated with it.


Slightly OT, but what documentation frameworks do you recommend?


I'm actually building this very thing—shoot me an email at govind <dot> gnanakumar <at> outlook <dot> com if you'd like to be a beta tester.


Oh, I probably should've posted here rather than doing a review on the extension. Sorry!

When I click sign in with Google or whatever it's called, it then redirects me to https://localhost:3002/api/blah blah blah

Would love to give it a go though - looks super impressive. We have an "interesting" code base with a lot of moving parts so I was keen to see whether this helped finding the right part of the codebase I needed.

Excellent idea by the way :)


Hey, thank you for the feedback - are you unable to get beyond the sign in point? If you could drop me a line on ali@buildt.ai I’ll gladly try to sort this out for you!


Congrats on the launch! Copilot was the v0 application of LLMs for code, but I think using them in this way will be even more valuable to devs. Added Buildt to the list at https://github.com/sourcegraph/awesome-code-ai.


Nice product! Any integration planned for jetbrain ides?


Yes we do in the medium term, we fortunately built the product in a modular/headless way so adding further integrations is easier, although we're strapped for dev resource currently so once that problem is alleviated then we can start looking at supporting more IDEs!


If it is headless, I assume that there is an API that you could maybe release to the public so companies could build their own IDE plugins or even integrate it further into their workflow, like adding code fractions automatically to tickets etc.


Yes we’ve definitely considered this as an option, we’ll hopefully be able to explore it more when we have more dev resource in the next few months (just my cofounder and I working on the tech currently). I think it makes a lot of sense to allow people to make their own integrations!


Any plans to go open-source? Untrusted code like this is unlikely to be allowed to exfiltrate data from most secured private codebases.


We have a proprietary product. Should we fear our code getting stolen if we use your plugin?


dont worry. go fast my friend limited time offer


I got quite excited trying this last week, only to be stuck at the importing phase for a while. Then I went back to your landing page, and realized you do not support Ruby yet. Bummer. Got a timeline for that? Good luck with your launch!


The next two languages are Go and Ruby as it happens! We need to make some changes based on learnings from the launch today first but it’s relatively easy for us to implement these new languages


Congrats on the launch!

Currently trying to document a large Angular app; how would Buildt interact with something like Compodoc [0]?

[0] https://github.com/compodoc/compodoc


Any APIs I can use so I can build a plugin for Sublime Text? Also, it seems to hang when analyzing. I left it for a day and it did not finish.


Very few answers or comments from the founders after the first couple hours. That's a bit of a shame. A wasted opportunity for a Launch HN IMHO.


Congrats on the launch! I've really enjoyed the blog posts about LLMs and how you're enabling scanning.

FYI after I signed in I had to wait a long time for my repo to index.


No web interface?

Are we experiencing a move back from the web towards Microsoft again?

Is Microsoft gaining a stronghold on the developer community via their tools and services ecosystem?


> Is Microsoft gaining a stronghold on the developer community via their tools and services ecosystem?

They are owning the services that run the developer tools and ecosystem.

I said this years ago. [0] Why is it taking everyone so long to realize that Microsoft is getting smarter with their newly revived EEE strategy?

[0] https://news.ycombinator.com/item?id=28324999


I have been following your blog for a while now - it's been quite instructive on ways to speed up LLM-based tasks. All the best with your launch!


Thank you very much! Released an article yesterday on my tips for the ChatGPT API if you haven’t already seen it!


congrats on launching! decided to try out the project on a simple Next.js repo i had lying around, just a page and a serverless function. i'm 5 minutes in to "Analysing codebase" and this is way too slow. i'm already bored and about to switch away. just giving feedback. you could have been scanning my repo while giving me that long onboarding with multiple pages to flip thru.


I’ve heard a few reports of slow indexing. It may be load related but I need to investigate further (on a flight currently so will look when I land). If you’re happy to drop me a line on ali@buildt.ai, I’ll try to help figure this out!


In your 3llmtricks link, you are missing a hyperlink to the HyDe paper:

> There’s been a lot of attention around the HyDe paper recently [LINK]


Thanks for flagging - will get this fixed! For reference the paper is here: https://arxiv.org/abs/2212.10496


I love the idea. This sounds like a way better use of conversational AI than "bot that writes commit messages".


Thanks! It’s much harder to implement, that’s for sure, but I totally agree that it’s a great way to interact. Being able to add follow-up messages and clarify things is really great.


Anyone else unable to complete setup? Mine's been stuck on "Analysing codebase" for hours (~30k LoC).


This may be load related - very happy to provide support on your issue if you can drop me a line at ali@buildt.ai


Tree-sitter builds a concrete syntax tree, not an abstract syntax tree, right?


Would love to try but i'm full nvim now, any plans to integrate?


Not currently, I think jetbrains is definitely next in terms of priority but I think as per some suggestions on here and on Twitter we may end up releasing our headless API to allow people to build their own integrations - need to hire some more devs first though!


Would love this as I know Emacs people would build a package in an instant given an accessible API.


exciting! we built (no pun intended) something similar as a free functionality in Codeium (https://www.codeium.com/waitlist/codeium-search) - we just soft launched so anyone on VSCode can opt in to it

it's cool to see other attempts as well. natural language search done properly can definitely accelerate developers by a ton



