Show HN: Sourcebot, an open-source Sourcegraph alternative

peterldowns · 2024-10-02T03:10:25.000000Z

Re-asking [0] as a top-level question, since it has gone unanswered: do you intend to make a business out of this project in some way, or is it a "real" open source project?

I know that intentions can change, but I'm curious how you see it. Sourcegraph was pretty clearly always going to be a business-type-of-project, and like most business projects, relicensed everything to their custom enterprise license. Originally it was Apache 2 [1].

I love open source and I write a lot of it myself [2]. I use the MIT license, just like you've done here, and I admire that. I don't think you owe me or anyone else anything, and the MIT license makes that clear.

I am very interested in this project and I'd love to extend and contribute to it, but only if it's an actual open source project. Seems like every devtools-focused startup these days calls themselves "open source" but fails to actually build a community, because in reality it's just a marketing gimmick. Because the project is actually a company, the people involved never try very hard to build a community of contributors. When the company invariably cannot make money with an open source product, the code gets relicensed to be closed-source. The few people who had contributed end up getting played. That's what happened to Sourcegraph!

So: open source, or open source "for now"?

[0]: https://news.ycombinator.com/item?id=41715776

[1]: https://github.com/sourcegraph/sourcegraph-public-snapshot/c...

[2]: https://github.com/peterldowns

spmurrayzzz · 2024-10-02T16:07:23.000000Z

Not the author, but given that this is a relatively small UI wrapper of a zoekt[1] backend, it seems like the risk here is isolated to the upstream Sourcegraph-maintained search dependency. By relatively small, I mean that the total SLOC for UI code in the entire project is around 3.5k (compared to the backend, which is currently 25x the size). Given that, it seems difficult to ascribe any enterprise motivations. Additionally, the UI seems very useful as-is, even if you had to fork it and build a new community from there.

[1] https://github.com/sourcegraph/zoekt

peterldowns · 2024-10-02T16:26:16.000000Z

There's not really "risk" either way, I'm a fan of open source and I'm also a fan of businesses making money, I just don't want to donate time and energy to a business.

What they've described smells a lot like a thing that needs to become a business — see Sourcegraph — and Brendan [0] and Michael [1] are currently working together at a startup they founded.

I'm getting tired of seeing other businesses pissing in the pool by claiming to be "open source" purely for the marketing benefits, so I figured I'd ask up front and see what they say.

Should be a simple answer either way!

[0]: https://www.linkedin.com/in/brendan-kellam/

[1]: https://www.linkedin.com/in/msukkari/

spmurrayzzz · 2024-10-02T16:46:16.000000Z

Yea I think I understand your motivation re: not donating your time. I guess my assumption was more that the likelihood of a 3k SLOC UI project becoming a business seemed incredibly remote. Perhaps that is misguided.

bshzzle · 2024-10-02T16:51:50.000000Z

Thanks for the thoughtful question.

This is still day 1, so we honestly don't have an answer as to whether we will get to a point where we can monetize - it's too early to tell. However, if we do end up going down that road, I don't think generating revenue and being a good steward of open source are mutually exclusive.

My view is that there is a balance that can exist between open source and building a profitable business that doesn't negatively impact the open source community. Companies that come to mind that I think are striking this balance are PostHog & GitLab.

peterldowns · 2024-10-02T17:00:47.000000Z

Thanks for the reply — to be clear, I understand your answer to be “this is a business but we’re not yet sure how we’ll make money.”

Great work so far; best of luck!

hanwenn · 2024-10-02T08:07:37.000000Z

Hi!

Sorry for not responding to your email, I was swamped.

I looked through the source code, but I can only find UI (i.e., browser) code. Does this do anything beyond delivering a more functional and prettier UI on top of an existing zoekt deployment? If not, everybody would be better served if you tried to improve the UI inside Zoekt, which currently is a live demonstration of (my lack of) web app programming skills.

Have you thought about how you will achieve your further goals (e.g., semantic search)? That will require server-side changes, but you currently have no Go code at all.

bshzzle · 2024-10-02T17:57:50.000000Z

Hey!

Yea that is correct - in its current state, it's functionally a UI wrapper on top of the zoekt-webserver API. One of the reasons why we decided to go with a separate app is that we have much more experience with TypeScript, React, and Next.js (the web framework we are using), so it felt like we could move a lot quicker using what we know.

In terms of semantic search, that is still very early days - my intuition is that having a separate "semantic code indexer" server written in Python would again allow us to move quickly (since all of the ML libraries are written in Python).

blagund · 2024-10-02T13:09:36.000000Z

As for just a UI over Zoekt, let me plug https://github.com/TreeTide/underhood/tree/develop#getting-s.... Uses https://github.com/TreeTide/zoekt-underhood.

peterldowns · 2024-10-02T16:39:06.000000Z

There's also Neogrok, which is quite nice!

https://github.com/isker/neogrok

https://neogrok-demo-web.fly.dev/

maxloh · 2024-10-02T15:04:36.000000Z

Zoekt already has its own UI, though it is very feature-limited and lacks syntax highlighting. Demo: https://cs.bazel.build/

If you’re curious about the source, as I was, here it is: https://github.com/sourcegraph/zoekt/blob/main/web/templates...

morgante · 2024-10-01T20:42:28.000000Z

Awesome to see another open source player in the space, especially after Sourcegraph went closed source.

It looks like you're working on this full-time (and it's a lot of work to build great code search, as I know from working on my own product).

What are your plans for monetizing / building a sustainable business without inevitably going closed source like Sourcegraph?

bshzzle · 2024-10-01T21:54:23.000000Z

Currently, we don't have any plans of monetizing - the main focus for us right now is building something that people want to use :)

peterldowns · 2024-10-02T00:00:11.000000Z

Do you plan on eventually attempting to monetize in some way, or is this open source as in free software as in you legitimately are just creating a new open source project?

I understand intentions can change, but there's a difference, and I'm curious to know the answer.

quest88 · 2024-10-02T14:17:45.000000Z

In lieu of money, how do you know you're building the right thing? For me, money is a good indicator you're building the right thing and solving the right problem.

threecheese · 2024-10-01T19:04:22.000000Z

Regarding your response to “why not use an IDE?”: do you have any other product-like use cases that interest you? The one you mention - search across many repositories - makes a lot of sense for organizations that have (for example) a GitHub Enterprise installation and want to investigate or make changes across multiple components. This is definitely relevant to me, so I wonder what other cool things I can do with it.

bshzzle · 2024-10-01T19:30:58.000000Z

I think in the immediate term, we would like to talk to as many people as we can that have this "search across many repos" problem such that we can dial in the core search experience.

Looking beyond the immediate, I think there is a lot of fertile ground with respect to making engineering teams more efficient beyond just regular code search. Semantic code search, for example, is one of those features that I really wish I had when I was at my last job - it would have made onboarding onto new codebases much easier.

Would love to hear more about your use cases: brendan@sourcebot.dev

mdaniel · 2024-10-02T15:19:48.000000Z

I'll point out that you're missing a stellar opportunity to showcase your own champagne via

  --- a/README.md
  +++ b/README.md
  @@ -1,256 +1,256 @@
  - We do not collect or transmit [any information related to your codebase](https://github.com/search?q=repo:sourcebot-dev/sourcebot++captureEvent&type=code)
  + We do not collect or transmit [any information related to your codebase](https://demo.sourcebot.dev/search?query=repo%3Asourcebot-dev%2Fsourcebot%20captureEvent)

which regrettably currently says "No results found" :-(

smarx007 · 2024-10-02T18:05:02.000000Z

https://demo.sourcebot.dev/search?query=repo%3ATaqlaAI%2Fsou...

but there are a few things that need fixing, at least repo redirects and case-insensitive `repo:` arguments.

bshzzle · 2024-10-02T19:28:37.000000Z

I like this idea! Will fix this in a sec.

awenix · 2024-10-02T01:57:00.000000Z

Another solid code search tool https://github.com/hound-search/hound.

Based on regexp

maxloh · 2024-10-02T06:12:44.000000Z

It seems to be unmaintained. The last commit was more than a year ago.

imp0cat · 2024-10-02T09:17:34.000000Z

Yeah, it's quite old (https://www.etsy.com/codeascraft/announcing-hound-a-lightnin... - 2015). Sourcegraph has a more polished interface; Hound is very bare in comparison.

However, Hound does the job well.

maxloh · 2024-10-02T06:17:15.000000Z

Why not just fork Sourcegraph, instead of building the product from the ground up?

metaroxx · 2024-10-03T19:58:29.000000Z

How would this work if you want to index only specific branches of a repository?

For example, I’d like to index branches release1, release2, etc. but not have it index developers' temporary GitLab MR branches.

I assume HEAD refers to the head of the default branch when cloning the repository.
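
As a rough sketch of what such a filter might look like: glob patterns applied to branch names. This is purely illustrative and not an existing Sourcebot option; `select_branches` and the sample branch names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_branches(branches, include_patterns):
    """Keep only the branches whose name matches at least one glob pattern."""
    return [
        branch
        for branch in branches
        if any(fnmatchcase(branch, pattern) for pattern in include_patterns)
    ]


# Hypothetical branch list: two release branches plus a temporary MR branch.
branches = ["main", "release1", "release2", "feature/tmp-mr-1234"]
selected = select_branches(branches, ["release*"])
print(selected)
```

With the `release*` pattern, only the release branches survive; the default branch would need its own pattern (e.g. `main`) if it should be indexed too.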

IshKebab · 2024-10-01T20:24:01.000000Z

Nice! Still not quite as good as grep.app from an interface point of view. They have instant search-as-you-type results over all of GitHub.

It's not open source but I use it all the time. Far superior to GitHub's search.

richardw · 2024-10-01T22:04:00.000000Z

Anyone know how companies like this keep tabs on so many GitHub repos? I assume very distributed crawling/cloning.

mdaniel · 2024-10-02T15:07:50.000000Z

I'd use their "firehose" API if I were doing it: <https://docs.github.com/en/rest/activity/events?apiVersion=2...> and <https://docs.gitlab.com/ee/api/events.html#list-a-projects-v...>

I don't have the experience to know if that's cheaper (for the hoster) than just periodically calling the $(git fetch --mirror) endpoint. I could see opening a conversation with the major providers asking which they would prefer, since it's in everyone's best interest to not unduly hammer them.

richardw · 2024-10-03T06:31:02.000000Z

Excellent, thank you. Those look like events on a specific resource rather than “firehose” which sounds more like a global events list. Everything GitHub has a quota so there’s no way companies are staying under the normal 5000 or 15000 limit to fetch all of the changes!

mdaniel · 2024-10-03T16:31:35.000000Z

Based on my understanding, yes, the events are global and it is a firehose. The burden would be upon the consumer to drop messages not relevant to the repos it is watching, but that is almost certainly less heartache than trying to add individual subscriptions for thousands(?) of repos. The GitLab one seems less firehose-y but for this specific problem would still likely help not hammer them.

To the best of my knowledge, any such quotas are per API key. It's possible they are per account, but creating accounts is free.

Also, any such mechanism would only be to advise the sync process that a commit (or push) had occurred, and it would still use the $(git fetch --mirror) process but would just be an optimization of not running it (all the time|too infrequently)
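
A minimal sketch of that optimization, assuming event payloads shaped like GitHub's public events feed (a `type` field and a `repo.name` field). The function name `repos_to_fetch` and the sample data are hypothetical, and the actual sync step would still be $(git fetch --mirror):

```python
def repos_to_fetch(events, watched):
    """Return the watched repos that received a push in this batch of events."""
    # Compare case-insensitively, but report names as the caller wrote them.
    watched_by_lower = {name.lower(): name for name in watched}
    changed = set()
    for event in events:
        if event.get("type") != "PushEvent":
            continue  # only pushes mean there is something new to fetch
        repo_name = event.get("repo", {}).get("name", "").lower()
        if repo_name in watched_by_lower:
            changed.add(watched_by_lower[repo_name])
    return changed


# Hypothetical batch of events; most of a firehose is irrelevant and dropped.
sample_events = [
    {"type": "PushEvent", "repo": {"name": "sourcebot-dev/sourcebot"}},
    {"type": "WatchEvent", "repo": {"name": "sourcegraph/zoekt"}},
    {"type": "PushEvent", "repo": {"name": "someone/unrelated"}},
]
watched = {"sourcebot-dev/sourcebot", "sourcegraph/zoekt"}
to_sync = repos_to_fetch(sample_events, watched)

for repo in sorted(to_sync):
    print(f"would run: git fetch --mirror (in the local mirror of {repo})")
```

Only repos in `to_sync` get fetched on this pass; a slower periodic fetch of everything could remain as a safety net for missed events.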

jmakov · 2024-10-01T18:30:12.000000Z

Can somebody share the use case of this? Why not just use your IDE?

bshzzle · 2024-10-01T18:37:28.000000Z

yea it's a fair question - an IDE is often more convenient when you have the code checked-out locally. This becomes a pain when you work in an organization with potentially hundreds of repositories that you need to search across (e.g., an org stores its 100+ microservices in separate repos, and you need to find all places where they make a request to your service).

Hackbraten · 2024-10-02T06:54:02.000000Z

I use ghorg in tandem with ripgrep to address that problem. The former is for checking out the main branches of all repositories, the latter to perform the actual search.
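
For readers unfamiliar with that workflow, here is a rough pure-Python stand-in for the search half, assuming every repo is already checked out as a subdirectory of one root. It is not how ghorg or ripgrep work internally (ripgrep is far faster); `search_checkouts` and the sample files are hypothetical.

```python
import os
import re
import tempfile


def search_checkouts(root, pattern):
    """Yield (repo, relative_path, line_number, line) for each matching line."""
    regex = re.compile(pattern)
    for repo in sorted(os.listdir(root)):
        repo_dir = os.path.join(root, repo)
        if not os.path.isdir(repo_dir):
            continue
        for dirpath, dirnames, filenames in os.walk(repo_dir):
            # Skip git metadata and keep traversal order deterministic.
            dirnames[:] = sorted(d for d in dirnames if d != ".git")
            for filename in sorted(filenames):
                path = os.path.join(dirpath, filename)
                try:
                    with open(path, encoding="utf-8") as handle:
                        for number, line in enumerate(handle, start=1):
                            if regex.search(line):
                                rel = os.path.relpath(path, repo_dir)
                                yield repo, rel, number, line.rstrip("\n")
                except (UnicodeDecodeError, OSError):
                    continue  # binary or unreadable file: skip the rest of it


# Build two tiny fake checkouts and look for callers of "my-service".
with tempfile.TemporaryDirectory() as root:
    samples = [
        ("billing", "requests.get('http://my-service/api')\n"),
        ("auth", "print('hello')\n"),
    ]
    for repo, body in samples:
        os.makedirs(os.path.join(root, repo))
        with open(os.path.join(root, repo, "client.py"), "w", encoding="utf-8") as handle:
            handle.write(body)
    hits = list(search_checkouts(root, r"my-service"))

for repo, rel, number, line in hits:
    print(f"{repo}/{rel}:{number}: {line}")
```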

eptcyka · 2024-10-01T20:04:18.000000Z

I cannot run Xcode on Linux, I cannot run Visual Studio on Linux, I might not have an IDE set up for the language that I want to inspect. Many reasons. Also, some languages practically require arbitrary code execution to make a build, which I'd much prefer to shove into an isolated VM.

metadaemon · 2024-10-01T19:25:48.000000Z

Finding examples of how others implement similar logic is my biggest use case for code searching, but since GitHub copied Sourcegraph, I don't have much of a need for these self-hosted solutions.

rafaelgoncalves · 2024-10-02T17:59:28.000000Z

Yeah, GitHub has a nice search now. My only complaint is that you need to be logged in to use it; besides that, it's really nice.

zdw · 2024-10-01T22:50:43.000000Z

Does this make a copy of each repo on ingest?

Can it work against in-place repos, for example if hosted on the same server as a code forge installation?

bshzzle · 2024-10-01T23:53:52.000000Z

Yea exactly - on ingest it clones the repos and will periodically fetch new revisions.

Currently we don't support in-place repos, but feel free to file an issue and we'd be happy to take a look.

planb · 2024-10-01T19:44:17.000000Z

Great work! Any plans to add Gitea/Forgejo (self-hosted) support?

bshzzle · 2024-10-01T20:37:31.000000Z

Thanks! Yea we would definitely like to support more code-hosts. If you have a sec, could you open an issue so we can track it?

planb · 2024-10-02T05:24:43.000000Z

Looks like someone else already did that: https://github.com/sourcebot-dev/sourcebot/issues/13

schreiaj · 2024-10-02T03:54:07.000000Z

Can you add repos after starting the container? What about persisting indexes across restarts?

Still, neat. Glad to have an easy to deploy open source tool like this.

bshzzle · 2024-10-02T19:11:16.000000Z

Yes - there is a file watcher that should pick up modifications to the configuration file.

And you can persist indexes across restarts by mounting a volume to the `/data` directory (e.g., `-v $(pwd):/data`). Indexes are stored in a `.sourcebot` cache directory.

Thanks for the interest!

acloudbutwhy · 2024-10-01T23:37:12.000000Z

What sort of effort is required for additional host types? I see an issue is opened for self-hosted Bitbucket which would be a blocker for me to try it.

mattfat5 · 2024-10-01T18:07:11.000000Z

This is well done, thanks for sharing.

ashobeiri · 2024-10-01T17:06:33.000000Z

This is really exciting. Happy to see someone building an open source solution in this space

ergocoder · 2024-10-02T02:23:29.000000Z

What a milestone. Sourcegraph is big enough to have its own open source clone.

TavsiE9s · 2024-10-01T20:21:05.000000Z

Any plans for non-GitHub/GitLab integrations? Gitea/Gogs/etc. maybe?

bshzzle · 2024-10-01T20:38:37.000000Z

yes definitely - mind opening an issue so we can track it?

cprogrammer · 2024-10-02T00:16:26.000000Z

Does it support Perforce? I couldn't find it in the schema in the repo.

bshzzle · 2024-10-02T02:04:44.000000Z

No, just GitLab and GitHub atm - but please feel free to file an issue for Perforce support.

cprogrammer · 2024-10-02T17:14:17.000000Z

Thanks, will do.

j4coh · 2024-10-01T17:49:26.000000Z

Cool to see someone carrying on the dream after Sourcegraph lost their way.

bastawhiz · 2024-10-01T21:37:02.000000Z

I haven't followed SG closely. Other than licensing, what have they done to fall out of favor?

Starlevel004 · 2024-10-01T23:11:52.000000Z

They started aggressively pushing their (bad) copilot competitor.

Squarex · 2024-10-02T09:50:58.000000Z

What's wrong with Cody? I find it better than Copilot.

selimthegrim · 2024-10-02T15:08:19.000000Z

People were criticizing their hiring and salary structure too recently

asdev · 2024-10-01T21:14:34.000000Z

Sourcegraph is dead with the advent of LLMs and AI coding tools, right? GitHub cross-repo search is also not bad anymore.

esafak · 2024-10-01T22:19:09.000000Z

Wrong. Unless you want to feed the LLM your entire codebase, which is usually infeasible, you need to be able to retrieve relevant context, which relies on understanding the codebase, as Sourcegraph does. Sourcegraph has a product that does precisely this, called Cody.