Sourcegraph is now open source (sourcegraph.com)
594 points by aranw on Oct 1, 2018 | 93 comments

Sourcegraph CEO here. We're back to work on testing release candidates for Sourcegraph 2.12, coming out this week. I'm especially excited about a few big new features that fit really nicely into the code search and browsing workflows people use Sourcegraph for:

- https://github.com/sourcegraph/about/blob/master/projects/ex...

- https://github.com/sourcegraph/about/blob/master/projects/so...

- https://about.sourcegraph.com/blog/discuss-code-and-docs-in-...

along with a whole host of improvements to code search, code intelligence, self-hosted deployment on a single node and k8s, etc.

Happy to answer any questions folks have about our open sourcing. I summarized the business case at https://twitter.com/sqs/status/1046913901688807424. And we went with the Apache License (2.0) and an open core model because that's about as standard and as open as possible for this kind of thing.

Thanks for creating Sourcegraph for GitLab https://github.com/sourcegraph/about/blob/master/projects/so... we really appreciate it.

Every single developer that used Sourcegraph and talked to me about it loved the product. I think that every code product must have intelligent navigation.

I saw you already responded to my tweet in https://twitter.com/sqs/status/1046986755919024129. We're looking forward to adding Sourcegraph to GitLab by default, and it is awesome that you offer to do the work.

Thanks for choosing the Apache license, which will make this much more acceptable to some companies than, for example, the AGPL.

Sourcegraph CEO here. :) Thanks for creating GitLab and for the kind words. We're indeed looking forward to adding Sourcegraph to GitLab by default. Look out for a prototype/proposal MR from us in a bit. That's one of the super valuable kinds of features/integrations that we can do now that Sourcegraph is open source!

Amazing! Here's an issue we can use to discuss: https://gitlab.com/gitlab-org/gitlab-ce/issues/41925

Awesome! Apart from the Web IDE, having Sourcegraph code navigation by default in the repository view would be great.

It's too bad that a browser extension is required to make this work. I've asked previously for GitLab to provide the necessary hooks so that any plugin could offer this type of functionality in a first class way, and was told that the proposal lacked a "concrete first need". I suppose SourceGraph has now demonstrated the usefulness of such a thing, which is great!

Ref: https://gitlab.com/gitlab-org/gitlab-ce/issues/33047

Yes, we would love to integrate Sourcegraph into all code hosts directly and not require users to install a browser extension. Please provide the same feedback to other code hosts you use (GitHub, Bitbucket, Phabricator, etc.) as well and link to this thread. :)

What is the process for that? I'd like to get that integrated into RhodeCode as well.

Hey! Sourcegrapher who works on the browser extension and other integrations here.

The road to complete integration in a product is a bit longer as it takes collaboration if the product isn't open source. This is what we'd like in every product we integrate with.

However, I just did quite a bit of refactoring to make it easier and more straightforward to add support for new code hosts to the browser extension.

If you want to add support, we'd gladly accept a PR!

Always feel free to reach out and ask questions. Check out how it's done on GitHub, GitLab and Phabricator here: https://sourcegraph.com/search?q=repo:graph%5C/browser-exten...

Thanks for your answer. Our product is actually also open-core. And the source is here: https://code.rhodecode.com/rhodecode-enterprise-ce

In this case, what's the best way to start? Should we open an issue or send a support email?


In that case, let's build it right in to RhodeCode!

A good place to start would be opening an issue on your product's issue tracker and discussing requirements there.

Thanks, we'll discuss this and follow up!

This is exciting!

Hello Mike! Can you please reference the Sourcegraph example in the issue? Maybe this will initiate further discussion and convince someone to reconsider your proposal. Thank you!

Yes, for sure there is a need now. We can do either a plug-in or just integrate Sourcegraph directly into GitLab.

It would be great to do it via a generic plugin interface, if possible.

Sourcegraph engineer here - that's the goal of the Sourcegraph extension API, being generic and code-host agnostic. One Sourcegraph extension can provide the same features on any code host (GitLab, GitHub, Bitbucket, ...)

Hey, I worked on the GitLab support. I'm glad developers are liking it so far!

Thanks for building a great product with clean APIs and user interfaces that are easy to integrate with!

Hey, couple questions for you:

- We are a pretty small shop, only 3 developers, and looking at Sourcegraph for the first time. We have a full Kubernetes cluster for running some of our production applications and would be happy to run this there. The Sourcegraph site lists Quickstart instructions and instructions for Data Center. I am unsure what exactly the difference between them is.

- The Docker image for sourcegraph/server (used in your Quickstart instructions) doesn't have a Dockerfile listed anywhere. Is that available somewhere?

- The Quickstart instructions also seem to instruct mounting the Docker socket into the container. I haven't seen anything that explains why that is required.

Basically wondering which version I should be looking to deploy for our small team.

You should definitely use our Quickstart instructions :) You can deploy it to your existing Kubernetes cluster (or any other Docker environment).

- The Quickstart instructions[1] describe how to set up a single-machine single-container Sourcegraph instance, which is recommended for small teams. Our Data Center[2] offering is for large teams and runs Sourcegraph across multiple machines in multiple containers for high availability, scalability, etc.

- The Docker Hub image sourcegraph/server is the official, free Sourcegraph build. It is called Sourcegraph Core, and it includes various paid enterprise features built in so that upgrading to Sourcegraph Enterprise is easy (you only need to supply a license key, rather than migrating to a new Docker image). So, the exact recipe for building it is not open-source. However, we use the same exact Dockerfile in Sourcegraph OSS[4] which you can take a look at.

- Each language server providing code intelligence runs as a separate Docker container. By mounting the Docker socket into the container, Sourcegraph can automatically start/stop/manage these containers for you. However, this is not required, and in the case of a Kubernetes deployment we do not recommend it. Please see the "Manual Installation" section of our Code Intelligence docs[5] for more information.

[1] https://about.sourcegraph.com/docs

[2] https://github.com/sourcegraph/deploy-sourcegraph

[3] https://about.sourcegraph.com/pricing/

[4] https://github.com/sourcegraph/sourcegraph/blob/master/cmd/s...

[5] https://about.sourcegraph.com/docs/code-intelligence/install

Hey, quick question (and maybe this is just a legacy from before open source):

From[1]: "If you do not wish to pay for code intelligence, you can disable language servers in the Code intelligence section of the site admin area."

The pricing page however says that feature is included in the free version. It just doesn't seem to make sense to me.

[1] https://about.sourcegraph.com/docs/code-intelligence/install

Sourcegrapher here. Yep, this was a reference to an old pricing model. Just pushed a fix, it will be live soon!

Thanks for the response and quick fix. :)

That last link is exactly what I needed! Thank you very much for all the information and help. :)

Congrats Quinn and company! I hope there's not too much of my code lurking in there =)

Oh! Hey Nico :) Hope all is going well! I miss having ya around!

Is it easy to add new languages? I work at a large company with some very old codebases.

@chrismwendt is a Sourcegrapher working on this. Posting this on his behalf:

Here is how I added GraphQL support last week:

- Follow the hello world tutorial https://github.com/sourcegraph/sourcegraph-extension-docs/bl...

- Add logic to handle hovers and definitions https://github.com/sourcegraph/sourcegraph-graphql

- Write the server part https://github.com/chrismwendt/graphql-ws-langserver (once a browser-based GraphQL analysis library exists, this would be unnecessary because the analysis could be done entirely in the browser)

It's available at https://sourcegraph.com/extensions/chris/graphql and once you enable it you can test it out on a GraphQL file such as https://sourcegraph.com/github.com/chrismwendt/graphql-ws-la... (it's pretty slow/janky at the moment and could be improved, but works).

If you're interested in interfacing with other tools too, you might be interested in LSP (Language Server Protocol):

- https://langserver.org/

- https://microsoft.github.io/language-server-protocol/specifi...
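For a sense of what a language server exchanges with its client, here is a minimal sketch of how an LSP message is framed on the wire: a `Content-Length` header, a blank line, then a JSON-RPC 2.0 body. The header format and the `textDocument/hover` method come from the LSP specification; the file URI, the position, and the helper names are made up for illustration.

```python
import json


def frame_lsp_message(payload: dict) -> bytes:
    """Encode a JSON-RPC payload with the Content-Length header LSP requires."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def parse_lsp_message(raw: bytes) -> dict:
    """Decode one framed message (assumes the whole message is in `raw`)."""
    header, _, body = raw.partition(b"\r\n\r\n")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(body[:length].decode("utf-8"))


# Hypothetical hover request: the URI and position are illustrative only.
hover_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/hover",
    "params": {
        "textDocument": {"uri": "file:///example/schema.graphql"},
        "position": {"line": 3, "character": 10},
    },
}

framed = frame_lsp_message(hover_request)
```

A real server would read such messages from stdin or a socket in a loop and answer each request with a response carrying the same `id`.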

Cool! Are you working on IDE plugins, e.g. for Atom?

Yes, we have basic editor/IDE plugins that let you (1) open up Sourcegraph to the current document's file/line and (2) search. Sourcegraph for Atom is in prerelease at https://github.com/sourcegraph/sourcegraph-atom. Other editors are at https://about.sourcegraph.com/docs/integrations/.

What features (other features, if any) are you specifically interested in? It would be awesome if you posted feature request issues on the open-source repos for sourcegraph/sourcegraph or for the editor plugins!

No vim/nvim or emacs editor plugins?

The two most popular editors used by programmers...


Sourcegraph CEO here. I use Emacs personally and configured https://github.com/sshaw/git-link to make it work for me. See https://github.com/sshaw/git-link#building-links-and-adding-.... Will submit a PR or make our own sourcegraph-mode at some point (if anyone else wants to do so, that would be awesome).

Citation needed...

Just wanted to say thanks for making code intelligence part of the core product, Sourcegraph has been wonderful!

I'm especially interested in open source/free software businesses and business models. Are you planning to do an open core kind of thing, or maintain full on open source while charging for hosting and/or other services? I realise that might be a difficult question to answer as you might want to keep your options open, so I'm not asking for a commitment :-) I'm just curious about your plans.

Also do you ever think you might add (for want of a better phrase) custom development services if you aren't already? For example, allowing people to pay a fee to get specific features added to Sourcegraph. What are the kinds of benefits and risks you might see from going in that kind of direction?

I've noticed that a lot of open core businesses tend to avoid offering custom development services -- and if they do it's often perceived as a cost centre rather than a profit centre. It's something that is provided out of necessity of having to provide custom solutions for a few big customers, but the services team often appears to be at odds with the core development team. As you have obviously gone through the thought process of how to make money when your software is out in the open, I wonder what kinds of thoughts you have on that topic.

It's really nice to see this kind of stuff happening and I wish you the best luck possible in your endeavours!

It looks like they're going with open core: https://about.sourcegraph.com/pricing

> Open. For business.

You know, it's funny that they are proudly using this slogan. OpenBSD has mocked it in the past, referring to Intel's firmware blobs for wireless NICs:

> Some asshole said he was "open"

> but he was only open for business.


Open core is no big innovation. It's plain ol' proprietary software. It's just like macOS. Most software nowadays has significant free components. The only company that I know that does truly free commercial software is Red Hat. They don't sell you a drop of non-free software.

Some people will argue that the degree of free software in your proprietary software makes a difference. I argue that it only matters if you need any of the proprietary bits or not. A non-free but desired bit of software in a package can be very restrictive and have the same undesirable impacts as having the whole thing being non-free.

I tend to agree, although I probably wouldn't state it in exactly the way you have. Open core's main goal is to sell proprietary software using the open source version as a loss leader.

Having said that, I think there are some companies/individuals who truly believe that open core is a good way of funding free software development. I have even heard Bruce Perens claim that dual licensing is practically the only reasonable business model for free software. This was quite a long time ago (on Slashdot of all places), so I have no idea if he still feels that way, but I mention it only to point out that the issue is far from clear to a lot of people -- even those in the open source community.

Red Hat is almost unique in their business model and they are one of the very few companies who actively try to make money doing custom development contracts. I have argued many times in the past that people should look at what Cygnus Solutions did and try to emulate it. You may remember that Red Hat acquired Cygnus Solutions for $600 million and installed Michael Tiemann in various executive positions. Although Cygnus (after having taken VC money) ended up doing some open core work, they happily abandoned it after they were acquired by Red Hat.

I've asked the questions in my original post to several open source and open core companies (possibly most notably GitLab) and have received almost the exact same answer from each of them. I'm hoping to receive an answer here (and I won't try to spoil the response by priming it with what I've received before). My main goal is to at least understand why businesses choose to go open core, why they feel that custom development contracts are not feasible, why they can't build a business of hosting and other similar services alone, etc. It may be a widely held belief simply because businesses are not ready to follow the example of Red Hat, or it may be the case that Red Hat really is special. I'd like to understand as best as I can which it is -- and since I don't run an open source/free software business, the best I can do is to ask those who do.

Sourcegraph CEO here. I share your interest in open-source business models, going back quite a while. My public middle school back in 1999 had a mock investment club, and students voted on which stocks to invest in. (It's funny looking back on this.) I campaigned for Red Hat and VA Linux, and we ended up "buying" those (and some JDS Uniphase, naturally).

I plan to write more about this, but here's the summary of why we went open core.

- Hosting is less viable for developer tools, especially those that are intended for companies above ~30 engineers, because those companies usually want their code to stay on-premises. Selling hosting means you'd only sell to small companies.

- Now, compared to 10-20 years ago, customers perceive software needing custom development as bad. Expectations are higher for the product to just work out of the box. They might not ever use it if they can't quickly see that it solves their problem (without getting custom development).

- Doing custom development probably entails a top-down sale, not a bottom-up sale. Compared to 10-20 years ago, bottom-up (having a single dev bring your product in and then spreading from there) is much easier. Bottom-up means that, especially in the early days, you can focus on building the product/eng teams instead of the sales/marketing team.

- Custom development doesn't scale, and we think every dev will be using Sourcegraph (or something like it, if we mess up :) in 5 years. We can only get there by building a product that works for almost everyone, not one that works really well only for a few companies.

- Open core is easy to understand because (if you ignore the availability and open-sourceness of the code, which is purely a benefit) it is just like freemium, which is a well understood concept.

I'm just stating our decision process in choosing open core. I make no claims that these are universally true.

If someone out there is thinking to themselves, "I'd love to pay Sourcegraph to build custom features for my company", then please reach out. At the very least, your desire for those features would be a strong vote in favor of prioritizing them on our roadmap.

Thank you very much for these great answers! It's really helped me understand some more issues that I didn't understand before. I really appreciate the amount of time you took to think through that and write a reply.

As a complete aside, I actually had inside knowledge of the JDS - Uniphase merger (a friend of mine who worked there inadvertently allowed me to see something that made it clear that it was happening a day or so before). It took all my self control not to buy up a lot of JDS stock. I always wonder if I would have gotten in trouble (or gotten my friend in trouble)...

"Open core" and dual licensing are two completely separate business models. They should not be conflated.

Open core business models depend on selling non-free software. Normally all or nearly all development is done by a singular company. The core software could probably just as easily have been free-of-charge, and everything else would have been completely identical. (But on the other hand, if you're not making money, you might as well publish source code in this day and age.)

Dual licensing business models are much more like selling free software than selling non-free software, with the difference that large parts of your potential customers are not interested in copyleft. This is often the case for the embedded world for example. Here it's much more common to find multi-stakeholder communities, since copyleft can bring a much more level playing field.

As a business model dual licensing means having a free software license (which may or may not be copy-left) and a proprietary software license. Having 2 or more free software licenses is not a business model, even if it is handy for letting people get around the GPL. There is no reason to pay for the second free software license. Otherwise you could just use MIT or the equivalent and be rolling in money. Obviously it doesn't work that way. Pedantically, "dual licensing" means to have 2 licenses. Companies are willing to pay for the proprietary software license because they want to use that code with their own proprietary code.

The only potential difference between that and "open core" is that open core generally talks about applications. Dual licensing a library allows people to use that library in proprietary software. But let's not kid ourselves -- that library is proprietary. That's the whole point.

Can you point to a project with dual licensing where all or nearly all of the development is not done by a singular company? This is rare (if it even happens at all), because you need to have copyright assignment in order to relicense the software under the proprietary license. Any other change essentially causes a fork in the free vs. proprietary version. This is precisely the same reason why open core and other dual licensed projects tend to be controlled by a single entity. Unless you fork, there is no way to contribute without having to go through the controlling entity.

Potentially, you have in mind that many dual licensed libraries do not offer more features in the proprietary version as compared to the free version. With open core applications, where the license is less important (or not important) to the customer, you need to add different functionality to get them to buy the proprietary version. So I'll grant you that.

I'm not aware of any common difference in terminology usage. Open core is dual licensing. The fact that applications tend to diverge between the free and proprietary versions is due to the fact that nobody wants to link it to another application -- they just want to use it. In order to get people to pay for the proprietary version you need to add more features. But that's not a difference in strategy, it's just a reality of the thing you are selling. Libraries usually don't bother making different feature sets because there is no need -- people are willing to pay for the proprietary version without the extra work.

What? No, open core is not dual licensing, at least not by what is meant here with "dual licensing".

Dual licensing is about selling exceptions. Selling copyleft exceptions. There's no dual licensing without copyleft, because there is no exception to sell. And yes, it requires a single copyright holder. FFTW is an example of dual licensing. Octave gets FFTW because we like copyleft and Matlab gets FFTW because they like to pay to hide their code.

But it doesn't just have to be a library. MongoDB also sells exceptions because people are irrationally afraid of the AGPL.

I think you were confused by software like Firefox which is dual or triple-licensed and all recipients get to pick which license they want to follow, but that's not what was being discussed here.

Sigh... you are right about the copyleft thing. I'm not sure where my head was. Just so I get this right: you're saying that "open core" is essentially having a copyleft license and a proprietary license, while "dual licensing" is having a copyleft license and another non-copyleft free software license? If so, I can understand what you are saying :-)

Open core is freemium. You can use the base under a free license but if you want fancy features, those are under a non-free license.

Selling exceptions (or what someone else called, "dual licensing") means everyone gets the same software, both free and non-free, but people who don't want copyleft have to pay.

Ironic for Theo to call someone else an asshole. Putting people on the defensive is a great way to not get what you want.


We've been using Sourcegraph at Uber and it has been working pretty well. We saw a performance regression last week with `max` queries but it got fixed fairly quickly (less than a week from reporting to having the fix deployed)

Some wishlist things:

- would be nice if the query language was better documented (not the GraphQL console, the help section for that is comprehensive)

- a coworker had expressed a desire for multi-line searches (which is supposedly in the pipeline)

Sourcegraph CEO here. Thanks for the feedback and sorry for the regression! I filed an issue to improve our search docs at https://github.com/sourcegraph/sourcegraph/issues/190 (will fix in the next day or so) and to add multi-line searches at https://github.com/sourcegraph/sourcegraph/issues/191 (this is one of 3 significant search improvements coming in the next few months).

Awesome! Rock on!

(Sourcegrapher here) Just curious, have you discovered this page before from the in-app help area? https://about.sourcegraph.com/docs/search/query-syntax/

(We will improve our docs here & make them stand out more in the app regardless, I think that is a great suggestion.)

I also very much want multi-line searches, I can't wait for them :)

Yeah, this help page is fairly easy to find (two clicks: ? icon -> "How to search"). I figured I'd ask because for example `max` is not documented there, even though it's pretty important for the sort of stuff I use Sourcegraph for (i.e. finding out all the repos that use X).

I was lucky to connect with Quinn and get the opportunity to volunteer as a guinea pig for open sourcing this code :).

Sourcegraph is an interesting product and company. I see the potential for it to have a bigger positive impact on developer productivity than even Jetbrains IntelliJ has, because the reach can go beyond any particular editor or frontend. It could be a ubiquitous solution to efficient code search and discovery.

Will definitely be watching how things unfold for Sourcegraph. If you haven't already read their Master Plan, it's a compelling vision.


Thanks for running through the README and helping us prep the open-source release last week!

This is amazing for secure code reviews. I have already spun up an instance and am demoing it for my team.

Is it possible to get Jira integration?

Can you test bitbucket.org connection with credentials some place I am not seeing? I can't tell what is causing it to fail.

Is there a way to add markers in the code and then a queue to clear them? I.e., I mark hotspots in the code and want a developer to review them or edit them.

I think a current pain point would be no flat file upload. I understand you only need the .git directory and this was designed with a developer in mind. A lot of times when doing a secure code review we don't get credentials, we just get a current clone of the code. I just tried to do it using the instructions, had some trouble, and ended up just cloning it on the host machine. I usually get a zip/tar of the code.

Another feature that would be awesome is export a search to a CSV file. I could script together something for Jira integration with that pretty quickly. Something to get the line affected #, branch and filepath.

The subscription to searches was a brilliant idea. I wish I thought of it.

SSO already integrated? Nice man!

I'd make the discussions default too, I discovered that after playing with it for a bit. It's an awesome feature.

I look forward to seeing more of it.

Heya, I'm the Sourcegrapher working on the discussions feature[1]. It's great to hear you are enjoying it. We're looking for users to help drive the direction of the feature as it matures and really influence what we work on next with it, so if you're up for it I'd really love to get your feedback on it (as much as you'd be willing to provide). :)

Jira integration -> We're working on it! Mind commenting on https://github.com/sourcegraph/sourcegraph/issues/141 ?

Markers in the code -> If you could archive or dismiss the inline discussion threads (so that they no longer show up in the inline view) would that work well?

Flat file upload -> Could you file a feature request for this at https://github.com/sourcegraph/sourcegraph/issues/new/choose with some more details on how e.g. you would wish to give Sourcegraph access to the code?

Exporting search to CSV -> It might be pretty easy to write something like this with our `src` CLI tool: https://github.com/sourcegraph/src-cli You can get JSON output via e.g. `src search -json 'repogroup:sample error'`
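As a rough sketch of that idea, the script below flattens JSON search results into CSV rows that a Jira import script could consume. The JSON field names (`Results`, `repository.name`, `file.path`, `lineMatches`, `lineNumber`, `preview`) are assumptions about the shape of the output, not a documented schema, so check them against what your version of `src` actually prints. The sample input is made up.

```python
import csv
import io
import json


def search_json_to_csv(search_json: str) -> str:
    """Flatten search results into CSV rows: repository, path, line, preview.

    ASSUMPTION: the input looks like
    {"Results": [{"repository": {"name": ...}, "file": {"path": ...},
                  "lineMatches": [{"lineNumber": ..., "preview": ...}]}]}
    Adjust the keys to match the real output of your `src` version.
    """
    data = json.loads(search_json)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["repository", "path", "line", "preview"])
    for result in data.get("Results", []):
        repo = result.get("repository", {}).get("name", "")
        path = result.get("file", {}).get("path", "")
        for match in result.get("lineMatches", []):
            writer.writerow([
                repo,
                path,
                match.get("lineNumber", ""),
                match.get("preview", "").strip(),
            ])
    return out.getvalue()


# Made-up sample input, not real Sourcegraph output.
sample = json.dumps({
    "Results": [{
        "repository": {"name": "github.com/example/repo"},
        "file": {"path": "main.go"},
        "lineMatches": [{"lineNumber": 41, "preview": "  return err"}],
    }]
})
csv_text = search_json_to_csv(sample)
```

To use it with the CLI, you could replace `sample` with `sys.stdin.read()` and pipe the output of `src search -json` into the script.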

Hope to hear more from you! :)

[1] https://about.sourcegraph.com/blog/discuss-code-and-docs-in-...

Awesome! You guys did what I could not. Congratulations! So excited to see where this takes Sourcegraph, and the community around it.

In my experience, even just using Sourcegraph to browse source code in open source repos is way faster than the native git host browsing (BitBucket, GitHub, Gitlab, etc). We use it daily at my lab, especially as I help others learn the code base. Thanks for Sourcegraph.

Holy cow, does this mean I can use source graph as a single dev wanting the features, but not able to upload private source to a cloud service?

Fwiw, I'd love to pay as a private user for this type of service. It's just that for many, we can't use a tool that leaks code. I look forward to that day, if it's not today :)

Yes! There is a `docker run` command that will spin up a Sourcegraph instance for you at https://about.sourcegraph.com/docs/.

The self-hosted Docker (or, for big clusters, Kubernetes) deployment method has been there for about 10 months now, and devs/companies really like it. Making it so easy to self-host Sourcegraph was a big win, and going open-source is another step in that direction.

What is new is that you can now also build Sourcegraph from source and see/hack on/etc. our source code.

Awesome!! So the Go code intelligence is now free for single devs who are self hosting?


Very nice. Poking through the source now, cmd/frontend/graphqlbackend/textsearch.go is one of the more interesting files I have found so far. It's very neat to look under the hood.

Hey, Ben! Yes, that is a neat source file that shows how Sourcegraph does really fast live code search. (Sourcegraph has hybrid search so you get fast search over any repository and any commit, without needing to wait for indexing. It'll hit an index on the default branch, but for any historical commit/branch that isn't indexed, it has a super-optimized live searcher. We sponsored some contributions to the Go stdlib to improve this, in fact.)
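To make the hybrid idea concrete, here is a toy sketch of the dispatch. It is not Sourcegraph's actual implementation: the `Repo` class, the in-memory trigram index, and every name here are hypothetical, and it handles only literal substrings. The point is just the shape: use the index when the requested revision is the indexed default branch, otherwise scan the files live.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def trigrams(text: str) -> set[str]:
    return {text[i:i + 3] for i in range(len(text) - 2)}


@dataclass
class Repo:
    default_branch: str
    # revision -> {path: file contents}
    files_at: dict[str, dict[str, str]]
    # trigram -> paths, built only for the default branch
    index: dict[str, set[str]] = field(default_factory=dict)

    def build_index(self) -> None:
        for path, content in self.files_at[self.default_branch].items():
            for gram in trigrams(content):
                self.index.setdefault(gram, set()).add(path)


def scan(files: dict[str, str], needle: str) -> list[tuple[str, int]]:
    """Live search: check every line of every given file."""
    hits = []
    for path in sorted(files):
        for lineno, line in enumerate(files[path].splitlines(), start=1):
            if needle in line:
                hits.append((path, lineno))
    return hits


def search(repo: Repo, rev: str, needle: str) -> tuple[str, list[tuple[str, int]]]:
    files = repo.files_at[rev]
    grams = trigrams(needle)
    if rev == repo.default_branch and repo.index and grams:
        # Indexed path: only files containing every trigram can match.
        candidates = set(files)
        for gram in grams:
            candidates &= repo.index.get(gram, set())
        return "indexed", scan({p: files[p] for p in candidates}, needle)
    # Unindexed revision: fall back to scanning everything, no waiting.
    return "live", scan(files, needle)


repo = Repo(
    default_branch="master",
    files_at={
        "master": {"a.go": "package a\nfunc Search() {}\n", "b.go": "package b\n"},
        "old-branch": {"a.go": "package a\nfunc Find() {}\n"},
    },
)
repo.build_index()
```

Calling `search(repo, "master", "Search")` takes the indexed path, while `search(repo, "old-branch", "Find")` falls back to the live scan.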

BTW, you can pull up that file quickly on Sourcegraph:

1. Search for repo:ph/sourcegraph textsearch.go — https://sourcegraph.com/search?q=repo%3Aph%2Fsourcegraph+tex...

2. Go to the first file — https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-...

3. Now click around! You get code intelligence: go-to-definition, find-references, hover tooltips, etc. It works on any open-source repository, and on your internal repositories if you spin up your own Sourcegraph instance.
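The first step boils down to URL-encoding the query into the `q` parameter, as the visible part of the link suggests. A small sketch; the helper name is made up, and only the `q` parameter seen in the link is assumed.

```python
from urllib.parse import urlencode


def sourcegraph_search_url(query: str, base: str = "https://sourcegraph.com/search") -> str:
    """Build a search URL by URL-encoding the query into the `q` parameter."""
    return f"{base}?{urlencode({'q': query})}"


url = sourcegraph_search_url("repo:ph/sourcegraph textsearch.go")
```

For a self-hosted instance you would pass your own instance's search URL as `base`.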

As for licensing, we went with open-source using the Apache License because that's what makes it as easy as possible to get user #1 and user #20 inside of companies. See https://twitter.com/sqs/status/1046913901688807424 for a bit more info there.

So before anything else - congratulations and thank you for releasing something as open source, under a business-friendly licence. Regardless of the project itself, that should be commended.

However. I've never paid much attention before, so perhaps I'm missing something.. but this "just" seems to be the code intelligence you'd expect to find in a good IDE.

Is there more to it? Or is this the 'solution' for the lack of such code intel in the plethora of text editors people use these days?

Sourcegrapher here -- two main things:

1. Sourcegraph offers the code intelligence you're used to in your editor, but instead of being in your editor it is available while you are browsing code on your code host (i.e. on GitHub, BitBucket, etc.), even when reviewing pull requests / diffs!

2. Sourcegraph provides fast, advanced code search across multiple repositories -- akin to what Google and Facebook offer their devs internally. Regular expression support, extremely up-to-date results, etc. You can read more about this here: https://about.sourcegraph.com/docs/search

Know those cool features in your editor that allow you to hover over source and get more detail on what you are looking at? Also, know that feature in an editor that lets you go to definition? Imagine this and not within your editor but rather your browser, whatever host you choose, such as GitHub or GitLab. Sourcegraph enables this feature with browser extensions.

Sourcegraph CEO here. This is a great way to describe how Sourcegraph integrates with your code host. I love it. :)

Links to save you from searching:



Will definitely use to navigate code. It's way better than github/gitlab/bitbucket.

So what's the use case for this? If somebody wants "Find all references", what's preventing them from opening the code in Visual Studio/IntelliJ/Eclipse etc.?

I remember Sourcegraph as being a sort of sidecar to your text editor.

Based on your caret position, it would automatically show you crawled examples of how other people use the functions/classes you are dealing with.

Has that been de-emphasised or gone away?

I think you might be confusing Sourcegraph with Kite[1] (zero relation with Sourcegraph). Kite tries to do something like what you are describing.

With Sourcegraph, we have editor plugins for various editors[2] but they only perform "Open current file on Sourcegraph" and "Search selection on Sourcegraph" actions, nothing like what you are describing yet.

[1] https://kite.com/

[2] https://about.sourcegraph.com/docs/integrations/editor-plugi...

It might have been what I worked on at my internship at SG; demo here: https://vimeo.com/164809163

It makes sense not to make this a big part of SG though, since latency is a huge issue and detracts from the rest of the product which is amazing!

"Master plan" link [0] is 404 in the linked article. (Not sure if this is the right way to report it.)

[0] https://about.sourcegraph.com/plan

Thanks, just fixed (should be live soon).

Very cool! Does it have C or C++ support? Last time I looked it didn't.

Something that understands Python calls into C and vice versa would be interesting to me as well.

OneDev developer here. OneDev is a self-hosted git server and it provides symbol cross-referencing for mainstream languages including Java, JavaScript, CSharp, C, C++, PHP, Go, Python, R, CSS, LESS, SCSS. Taking the Linux project as an example: https://go.onedev.io/projects/linux/blob/master/tools/lguest...

Sourcegraph has experimental C/C++ support. Emphasis on experimental. It uses the best C/C++ language server (based on the Language Server Protocol) we could find, but that still struggles with some codebases.

You can try it on any open-source C/C++ file on Sourcegraph.com before you spin up a self-hosted instance for your own code, such as:


Here is more info on using C/C++ on Sourcegraph (and an animated GIF): https://github.com/sourcegraph/lsp-adapter/tree/master/docke...

That seems to have a lot of rough edges. For me, when browsing C++, there is exactly 1 killer feature: find callers. I recently deployed Kythe just to get this and it took many hours of my time. I would love to have something a little easier to maintain (that works, though).

The kind of cross-reference SourceGraph provides is a must have for me after a few years of using a tool like it to understand big projects (I don't use SourceGraph, I use Grok/CodeSearch).

Google open sourced a version of Grok some time ago (https://kythe.io/). [1] A quick search also found this `OpenGrok` thing which I assume must be yet another generic indexer [2].

I wonder if there are other popular tools out there providing generic cross-referencing of code bases?


I was trying to find the equivalent of this Kythe page (how to write an indexer)[6] in the SourceGraph docs; after 20 minutes I was still confused. I wish there was more clarity on the different pieces of the system. At first I thought SourceGraph could create a cross-reference index from an LSP server, but I guess it makes more sense that it is the other way around? (SG can talk to text editors via LSP).

It must be a lot of work to integrate any single language with SG or Kythe and with LSP; perhaps there are some tools or methodologies to make this easier?


I feel like language designers don't always think about building their implementations with ease of integration as a goal. I remember wanting to play with doing some AST transformations with both TypeScript and Dart (independently) and finding both languages made the task so hard and tedious. Mandatory remark: this is so easy with Lisp derivatives :-)

Things that a good language+ecosystem should have, IMO:

* Easy to retrieve and manipulate CST (for things like writing refactoring and querying tools, writing code formatters/pretty printers, etc). Many languages provide the syntax tree without, say, comments, so it is hard to write a pretty printer.

* Documentation generator with cross-reference navigation.

* Plays nicely with generic build tools like Ninja [3].

* Good debugger and value-pretty-printer support (so I can just drop a breakpoint and print anything easily). And some sort of API to use the debugger programmatically or connect to a remote process.

* An LSP implementation [4], although it seems pretty much every popular language is getting one these days!
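To illustrate the CST bullet above: a pretty printer needs a representation that throws nothing away. The sketch below is a toy, not any real language's tooling, and it covers only the lossless token layer rather than a full concrete syntax tree. The `tokenize` and `print` functions and the token kinds are assumptions invented for this example, and the tokenizer deliberately ignores string literals.

```typescript
type TokenKind = "comment" | "whitespace" | "word" | "punct";

interface Token {
  kind: TokenKind;
  text: string;
}

// Split source into tokens without discarding anything, so comments and
// whitespace survive. Toy rules: "//" line comments only, no string literals.
function tokenize(source: string): Token[] {
  // Alternatives are tried in order: line comment, whitespace run,
  // identifier or number, then any other single character.
  const re = /(\/\/[^\n]*)|(\s+)|(\w+)|([\s\S])/g;
  const tokens: Token[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    const kind: TokenKind =
      m[1] !== undefined ? "comment" :
      m[2] !== undefined ? "whitespace" :
      m[3] !== undefined ? "word" : "punct";
    tokens.push({ kind, text: m[0] });
  }
  return tokens;
}

// Printing is just concatenation, because no token was dropped.
function print(tokens: Token[]): string {
  return tokens.map((t) => t.text).join("");
}

const sample = "let x = 1; // answer\nlet y = x;\n";
const sampleTokens = tokenize(sample);
console.log(print(sampleTokens) === sample);
console.log(sampleTokens.filter((t) => t.kind === "comment").map((t) => t.text));
```

A tree that drops the comment tokens can no longer reproduce the original file, which is exactly the pretty-printer problem described in that bullet.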

1: https://en.wikipedia.org/wiki/Google_Kythe

2: http://oracle.github.io/opengrok/

3: https://ninja-build.org/

4: https://langserver.org/

5: https://about.sourcegraph.com/docs/code-intelligence/

6: https://kythe.io/docs/schema/writing-an-indexer.html

<edit: line breaks>

I've worked a lot with ASTs in the past, and am now working with TypeScript quite a bit. I actually find their AST implementation very nice, and pretty easy to get started on:

    import * as ts from 'typescript'

    // Parse the source text into a fully typed AST.
    const sourceFile = ts.createSourceFile('myclass.ts', `class MyClass { ... }`, ts.ScriptTarget.Latest);
    // Visit each top-level node and branch on its syntax kind.
    ts.forEachChild(sourceFile, (node) => {
      switch (node.kind) {
        // handle the node kinds you care about here
      }
    });

It doesn't get much easier than that in my book. The AST is also fully typed and TS comes with a lot of helper methods, including the transformers package to hook into TS' output pipeline. It's pretty neat.

Some comparisons based on my reading so far:

Adding a language:

  Kythe: a language implementation provides some structured graph
  SourceGraph: a language implementation is an LSP server

Backend logic:

  Kythe: has common logic for querying for relationships
  SourceGraph: simply proxies LSP requests through to the appropriate language server.

  Kythe: An indexer is a language implementation
  SourceGraph: The index means something completely different, it's what keeps track of git repositories.

SourceGraph Architecture: https://github.com/sourcegraph/sourcegraph/blob/master/docs/...

Adding lang server to SourceGraph (probably somewhat stale) https://github.com/sourcegraph/sourcegraph/blob/master/xlang...
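As a rough sketch of what "proxies LSP requests through to the appropriate language server" could look like: pick a server by file extension, build a `textDocument/definition` request, and frame it with a `Content-Length` header as the LSP base protocol specifies. The routing table, server addresses, and function names here are hypothetical and are not taken from the Sourcegraph codebase.

```typescript
// JSON-RPC 2.0 request shape used by the Language Server Protocol.
interface LspRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: unknown;
}

// Hypothetical routing table: file extension -> language server address.
const servers: { [ext: string]: string } = {
  ".go": "tcp://lang-go:4389",
  ".ts": "tcp://lang-typescript:2088",
};

// Choose a language server from the file extension, if one is registered.
function pickServer(uri: string): string | undefined {
  const dot = uri.lastIndexOf(".");
  return dot >= 0 ? servers[uri.slice(dot)] : undefined;
}

// Build a go-to-definition request; LSP positions are zero-based.
function definitionRequest(id: number, uri: string, line: number, character: number): LspRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "textDocument/definition",
    params: { textDocument: { uri }, position: { line, character } },
  };
}

// Count the UTF-8 bytes of a string, since Content-Length is in bytes.
function utf8Length(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code < 0x80) bytes += 1;
    else if (code < 0x800) bytes += 2;
    else if (code >= 0xd800 && code <= 0xdbff) { bytes += 4; i++; } // surrogate pair
    else bytes += 3;
  }
  return bytes;
}

// Frame a message: Content-Length header, a blank line, then the JSON body.
function frameMessage(message: LspRequest): string {
  const body = JSON.stringify(message);
  return `Content-Length: ${utf8Length(body)}\r\n\r\n${body}`;
}

const request = definitionRequest(1, "file:///repo/main.go", 10, 4);
console.log(pickServer("file:///repo/main.go"));
console.log(frameMessage(request));
```

A real proxy would also forward the `initialize` handshake and route responses back by `id`; both are left out here.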

We (I work at Sourcegraph) actually started off with a similar model to Kythe: https://srclib.org/ This project existed before Kythe was published, but was based on ideas that pre-date Kythe. If I remember correctly, Steve Yegge had a blog post about solving the MxN problem, which turned into him doing Grok at Google.

We ended up switching away from that model for a few reasons:

- At the time it was very costly to essentially index the whole OSS world, when most commit indexes would never be read.

- It was slow to index a codebase for a commit, and most of the work was wasted since a developer would often only look at a handful of files.

- Getting incremental indexing working usually required pretty deep integration into the build tool, so was a lot of work per language which didn't scale.

- A lot of tools using an "indexed" model end up only indexing the master/trunk branch, and you don't get your code intelligence features for PRs/etc.

At some point LSP came onto the scene. We were early adopters making a bet that this would take off. It has, and the list of community created LSP servers is large: https://langserver.org/ It also allowed us to switch from an upfront indexing model, to a model which just encodes user intent. So the underlying LSP server can be as lazy as it wants to be with respect to how it responds to the user == increased perf and reduced resources.
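The difference between the two models can be sketched in a few lines. This is a toy under stated assumptions, not Sourcegraph's code: `Analyze` stands in for whatever expensive per-file work a language server does, and `indexUpfront` and `LazyIntel` are names invented for the example.

```typescript
// Hypothetical stand-in for expensive per-file analysis (parsing, type checking).
type Analyze = (path: string) => string[];

// Upfront model: analyze every file of a commit before any user request.
function indexUpfront(files: string[], analyze: Analyze): { [path: string]: string[] } {
  const index: { [path: string]: string[] } = {};
  files.forEach((path) => {
    index[path] = analyze(path);
  });
  return index;
}

// On-demand model: analyze a file the first time a user asks about it, then cache.
class LazyIntel {
  private cache: { [key: string]: string[] } = {};

  constructor(private analyze: Analyze) {}

  symbols(commit: string, path: string): string[] {
    const key = `${commit}:${path}`;
    if (!(key in this.cache)) this.cache[key] = this.analyze(path);
    return this.cache[key];
  }
}

const commitFiles = ["a.go", "b.go", "c.go", "d.go"];

let upfrontCalls = 0;
indexUpfront(commitFiles, (p) => {
  upfrontCalls++;
  return [p + "#sym"];
});

let lazyCalls = 0;
const intel = new LazyIntel((p) => {
  lazyCalls++;
  return [p + "#sym"];
});
intel.symbols("abc123", "a.go");
intel.symbols("abc123", "a.go"); // second call is served from the cache

console.log(`upfront analyses: ${upfrontCalls}, on-demand analyses: ${lazyCalls}`);
```

The upfront version pays for all four files even if nobody opens them; the on-demand version pays only for the one file a user actually looked at.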

Things like cross-repo references don't come with LSP. Many LSP servers assume the user sets the build up correctly. There are also quite a few more assumptions LSP authors make which don't easily translate into an automated server environment. So we have done quite a bit of work to smooth that over / contributed some LSP servers for popular languages.

Seems like you are digging in so you might have more questions. I'll try to keep track of this thread, but also feel free to email me keegan at sourcegraph.com or file issues/questions on our repo.

Thanks for the clarification. The approach makes sense. I am actually impressed by the lack of NIH syndrome, it is rare to see open source software integrated so effectively. That is a +1 for protocol standardization

Is this the thing that caused furore a while ago because it was installed on some editors and it sent all your code to a third party server, or am I thinking of something completely different?

No, that was Kite. They paid editor plugin developers to take over the plugin and then added their code completion, which requires sending all source code to their servers.



That's the one, thanks. I can't wait to try Sourcegraph in that case, it sounds very useful!

No, that is https://kite.com/ (absolutely no relation to Sourcegraph).

Nope, not Sourcegraph!!!

Open source is the way to go. Onwards and upwards.

Was Fair Source a detour or a stepping stone?

Sourcegrapher here (speaking from my own viewpoint): Stepping stone.

A fully open-source codebase[1] under the Apache license is simply a better match as we aim to have a completely open and transparent product and company, where even e.g. planning is done in the open[2].

[1] https://github.com/sourcegraph/sourcegraph [2] https://github.com/sourcegraph/about/blob/master/projects/ab...

Does it work only with Git as the VCS?

Hard pass. I don't want an "intelligent" search, I just want a search. Will be sticking with Agent Ransack.

To be clear, this isn't "intelligent" search. There is no magic here. It's just fast, advanced code search across multiple repositories (like what Google and Facebook offer their devs internally). This means support for regular expressions, always getting extremely up-to-date results, etc. You can read more about it here: https://about.sourcegraph.com/docs/search

There is "code intelligence", which is a phrase we use to describe IDE-like features such as jump-to-definition, find-references, etc.

> Together, these companies succeeded in bringing computing to billions of people. But these billions of people are still using software applications built by just 0.2% of the world's population (those who can code).

> The next step is to make it so billions of people, not just 0.2% of the world population, can build software (not just use it).

I must admit, this sounds indubitably noble.

But if you think that your software-engineering career is safe from (partial) automation, then these Sourcegraph folks have an iridium-plated bridge to sell you.

In not-so-tongue-in-cheek terms: if Sourcegraph succeeds, then tomorrow's software engineers will be - at most - as valuable as today's factory-robot operators.

* * *

(Of course, anyone worth their salt in this industry ought to be able to find the LSP's inherent limitations, and structure their codebases accordingly.)

EDIT: counterarguments are more persuasive than downvotes.
