Back in college I was working on patches to OpenSSL, Chrome, Firefox, Apache, etc., to add support for TLS-SRP, and it was a huge pain to jump into these massive codebases and try to understand them. I was using Emacs and had all of the various language support modes configured, but go-to-definition and cross-references barely worked. Searching was slow, and if I wanted to discuss a piece of code with my CS lab partners, I couldn't just share a link.
A friend felt the same pain but then went to work at Google for a bit. At Google, they have some pretty amazing code reading/searching tools (see https://static.googleusercontent.com/media/research.google.c...), and these tools helped Google build a culture of thoroughly reading and reviewing code. The causality is bidirectional, but having good tools certainly played a role in Google's success.
That friend and I ended up building a product, Sourcegraph, initially for ourselves to make code reading easier. We've now built a successful business out of it with the help of an amazing team. Here it is pulling in the OpenBSD sources: https://sourcegraph.com/github.com/openbsd/src/-/blob/lib/li.... Sourcegraph has advanced features for several languages; see https://sourcegraph.com/github.com/mholt/caddy/-/blob/caddyh..., for example. If you love to read code (or want to), we hope you'll love our product. Email me if you have any feedback/requests.
Someone has it running on the OpenBSD code here: http://bxr.su/OpenBSD/ and it should produce a much more useful representation of the code; see http://bxr.su/OpenBSD/lib/libutil/bcrypt_pbkdf.c#98
Mozilla also has one called DXR, which is designed for their large, C++-heavy codebases: https://wiki.mozilla.org/DXR
The Linux kernel uses LXR, which might be useful too.
here is a comparison chart:
That is, Sourcegraph looks to compare to OpenGrok like GitHub compares to Gitweb. At least from a cursory look.
E.g. a Jira license for 2,000 people costs $24,000 yearly; licensing this for the same number of users would be $1,200,000.
This is way more than organizations of that size tend to pay for top-tier support contracts for software that's critical to business continuity.
Pricing per-user without any advertised discounts is also a trap if you're selling to large organizations. A lot of them tend to, for simplicity's sake, want to just give everyone in the org access to a tool like this, even though only 5-10% of the workforce might be using it. With the way you're pricing it, there's no way it's going to be bought in the first place.
With that said, $1.2 million for 2,000 users yearly for a code-search tool is some pretty insane Kool-Aid pricing, and it prevents me from even recommending it to my bosses - I'd be laughed out of the room.
But if you're a large company with 2,000 engineers, then you could be spending nearly a half-billion dollars on engineering salaries alone. If Sourcegraph makes your developers at least 0.5% more productive, then it pays for itself.
And shouldn't companies be spending 5-10% of their salary budget on getting the best tools? (That would mean tens of millions of dollars annually for this hypothetical 2,000-person company.) Companies routinely pay that for people in other roles, such as salespeople, medical professionals, stock/bond traders, etc. I think we all agree developers deserve the same. :)
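For anyone who wants to check the numbers in this subthread, here is a rough Python sketch of the break-even arithmetic. It only uses figures quoted above ($1.2 million per year, 2,000 users, roughly $500 million in salaries); the variable and function names are made up for illustration, and the salary budget is an assumption from the comment above, not a measured figure.

```python
# Figures quoted in the thread; the salary budget is an assumption, not data.
LICENSE_COST = 1_200_000      # yearly price quoted for 2,000 users
USERS = 2_000
SALARY_BUDGET = 500_000_000   # "nearly a half-billion dollars"


def break_even_percent(license_cost, salary_budget):
    """Productivity gain (in percent) at which the license pays for itself."""
    return 100 * license_cost / salary_budget


cost_per_user = LICENSE_COST / USERS                  # dollars per user per year
salary_per_engineer = SALARY_BUDGET / USERS           # implied average salary
value_of_half_percent = SALARY_BUDGET * 0.5 / 100     # value of a 0.5% gain
threshold = break_even_percent(LICENSE_COST, SALARY_BUDGET)

print(f"cost per user: ${cost_per_user:,.0f}/year")
print(f"implied salary per engineer: ${salary_per_engineer:,.0f}")
print(f"a 0.5% gain is worth: ${value_of_half_percent:,.0f}")
print(f"break-even gain: {threshold:.2f}%")
```

Under these assumptions the license breaks even at a gain of about a quarter of a percent. A lower salary budget raises that threshold proportionally, which is the whole disagreement here.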
A couple things:
a) $500 million / 2,000 engineers is $250,000 per engineer. Even if you take benefits and equipment / licensing costs into consideration, that's really high. Most enterprises aren't located in Silicon Valley and don't pay Silicon Valley engineering salaries.
b) I don't doubt your 0.5% figure, and the argument about the best tools money can buy goes all the way back to the Joel Test, but engineers aren't the ones who make the decision whether to buy these tools, the enterprise bean counters do. Do you have any evidence, even anecdotal evidence, to back up your 0.5% number? Because otherwise, the bean counters just see yet another toy that engineering wants to put on the budget, and the budget is always too restricted to add toys to the budget. Remember how long it took most enterprises to understand that multiple monitors aided productivity? And that's practically obvious. Why does Sourcegraph get a pass? What kind of concrete evidence can engineers pass along to their bean counters to justify the cost of Sourcegraph's licenses?
edit: I want to add, Sourcegraph isn't going to be paid only for engineers. It'll also be paid for QA, engineering management, infrastructure/ops... eventually, everyone wants to store something in version control.
When you say something vague yet absolute like you "had all of the various language support modes configured," that is a big indication that you did not have them configured. There are about four major modes for C/C++. Searching and cross-referencing are done with external tools. The only time I ever thought searching was slow was when using the grep that came with Mac OS X. There is absolutely no way that online tools can beat ag or rg for code searching, especially if you have an SSD. Exuberant ctags and GNU Global work for cross-referencing and support dozens of languages. And you have Magit and VC mode right there to track down source code history.
In my experience, you're right; but that opinion wouldn't make for a nice long-form advertisement for your software project.
Hook them with a first paragraph hinting "I have a solution better than Emacs", add a happy tale about collaboration and friendship, and then pile on the advertisement.
I would recommend the Sourcegraph people post browsers for Linux, L4, Xen, LLVM, ... and other great open source infrastructure projects. You'll drive more interest in your product in a helpful way.
There is a note that shows up when viewing OpenBSD sources
"C/C++ is not yet supported (beyond basic code browsing and text search)"
In my experience, these two languages are the most difficult to find good tools for, to browse, jump, and manage large code bases. Yes, some exist, but I thought this was the point of "good tools matter"?
There's also a list of similar tools at Gnu.org:
Thought it might be of interest for others looking at "source browsing" tools.
Results seem pretty good.
Meanwhile, most IDEs (QtCreator, KDevelop, VS, Eclipse...) have had this feature for years.
I just tried on two devices.
Nexus 5X, Android 7.1.2, Firefox 53.0.2
Samsung Galaxy Tab S3, Android 7.0, Firefox 53.0.2
The page loads and I get that little blue loading indicator but then nothing happens.
Some kinds of understanding involve a no-shortcuts grind. That sort of grind is a big commitment though.
What do I mean by "level"? Let's look at transportation in that regard. At first we just had walking/running. Then we learned how to use horses. Then we developed the wheel and could use horse wagons. Then we discovered the walking bike, etc.
If you are on a low level you may be the fastest on that level, but you may be orders of magnitude slower than people on higher levels. Think horse riding vs. car.
But that is not the biggest problem if you only use that approach. On a higher level you'll also be able to solve problems that you didn't even know were solvable. For instance, if everybody walks you won't even consider visiting other continents. But if you have airplanes you can get there in a few hours and it becomes something people do at least twice a year.
In programming, this "easily solving problems that you didn't even know were solvable" happens if you really learn software architecture from actual tools, APIs, how standards work, etc. The biggest wow for me was when I started to put in the additional 20-50% overhead to become standards-conformant for a standardized API. In the end, when I had a problem I didn't have to code anything, because the other tools were already working with the same API as my tool, and I could just connect them and be done. This way I solved 75% of a semester-long software project in one weekend, and I wouldn't consider myself especially intelligent. I just put in the hours to become standards-conformant, because from learning open source tools I found out it's something that people really do and that it is possible to do.
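To make the "conform to a standard interface and the existing tools just connect" point concrete, here is a small Python sketch. It is not the project described above, just an illustration of the same idea: the class name SortedReadings and the sample values are hypothetical, and the standard interface it conforms to is Python's collections.abc.Sequence.

```python
from bisect import bisect_left
from collections.abc import Sequence
from statistics import mean


class SortedReadings(Sequence):
    """Hypothetical container that conforms to the standard Sequence interface."""

    def __init__(self, values):
        self._values = sorted(values)

    # These two methods are all the Sequence ABC requires; iteration,
    # membership tests and reversed() are then provided as mixin methods.
    def __getitem__(self, index):
        return self._values[index]

    def __len__(self):
        return len(self._values)


readings = SortedReadings([21.5, 19.0, 23.5, 20.0])

# None of the code below was written with SortedReadings in mind. It works
# because the class speaks the interface these standard tools already expect.
average = mean(readings)
position = bisect_left(readings, 21.5)
newest_first = list(reversed(readings))
has_value = 19.0 in readings

print(average, position, newest_first, has_value)
```

The extra effort is the two-method contract. Everything after the class definition is existing library code that connects without any glue.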
You have a production bias to learn only what you need to immediately get your job done, rather than investing time to learn up front which could potentially be more efficient in the long run.
Some kind of curated genius.com for source code would be interesting.
It's easy to read it all on the web, the docs are here: https://golang.org/pkg/ and clicking on a function name shows the source.
There are loads of little things that could be improved, so it's also a nice codebase for contributing to.
In general I think you should pick something that you use every day. There's no point reading OpenBSD for the sake of reading OpenBSD, you'll get far more value out of something familiar.
Opposite of this: STL implementations come to mind.
I've recently hit more than 5 stacked bugs in an older version of libstdc++ on bionic libc...
Lectures for it are here:
This teaches you C, x86-64 arch, stuff like two's complement integers and floating point, how memory and CPUs work, how the C compiler works (linking stage, preprocessor), and how to write/understand signals and processes, file I/O, network code, etc.
After that you can just start reading the OpenBSD code and will figure it out or get an Andrew S Tanenbaum book on Operating Systems.
Note, if you buy the Pearson Global Edition of CS:APP (it's only 10% of the regular price), there's a lot of errata you will have to check. I once got stuck reversing an assembly program into C that did an even/odd parity check, because a print error had it returning the final XOR'd value & 0 instead of & 1.
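To show why that one digit matters, here is a minimal Python sketch of an XOR-folding parity check. It is not the exercise from the book, just the general technique, and the function names and the 32-bit width are assumptions for illustration.

```python
def parity(x):
    """Return 1 if the low 32 bits of x contain an odd number of 1 bits, else 0."""
    x &= 0xFFFFFFFF
    # Fold the word onto itself: after each step the low half holds the XOR
    # of both halves, so bit 0 ends up as the XOR of all 32 bits.
    x ^= x >> 16
    x ^= x >> 8
    x ^= x >> 4
    x ^= x >> 2
    x ^= x >> 1
    return x & 1   # keep only bit 0, which is the parity


def parity_with_misprint(x):
    """Same folding, but with the misprinted mask: & 0 discards every bit."""
    x &= 0xFFFFFFFF
    x ^= x >> 16
    x ^= x >> 8
    x ^= x >> 4
    x ^= x >> 2
    x ^= x >> 1
    return x & 0


for value in (0b0, 0b1, 0b11, 0b111, 0xFFFFFFFF):
    print(f"{value:#x}: parity={parity(value)} misprint={parity_with_misprint(value)}")
```

With & 0 the function returns 0 for every input, so no amount of staring at the assembly will make the C version match.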
If you want to improve your code reading skills and/or C programming skills, then you can probably go ahead and start reading, for example, the OpenBSD source code, even though you don't have any operating systems knowledge.
For now, what's been fun is to load up the same file in both Chromium and Firefox source, and compare the two and how both browsers work.
Chromium source: https://cs.chromium.org/chromium/src/third_party/WebKit/Sour...
Firefox source: https://dxr.mozilla.org/mozilla-central/source/
One problem with static source code analyzers is false positives. Sooner or later you will have to read the code and understand the context. I assume that is a better way to improve your C knowledge, because you REALLY must understand the code. And besides that, the positive effects are more valuable.
Yes they could be reading the source of projects known for their code quality…
> OpenSSL is not an OpenBSD project and the code quality is markedly different :-)[…] and yes, OpenSSL is a bit of a code quality difference than the OpenBSD norm. [nb: these comments were not praising OpenSSL's code quality]
> OpenBSD has proven great at configuration, code quality, and minimalism.
> OpenBSD's incredible code quality quite obviously doesn't apply to the ports tree (and that's not their fault)
> OpenBSD […] has a slower evolution pace and a more carefully planned development model which leads to better code quality overall. Its well deserved reputation of being an ultra secure operating system is the byproduct of a no compromise attitude valuing simplicity, correctness, and most importantly proactivity. OpenBSD also deletes code, a lot of code.
> After scouring the lists and other resources I've yet to find an official reason for OpenBSD dropping LKM support, but would wager it's due to security or code quality/openness ideals.
> OpenBSD, a project that has a frankly psychotic focus on code quality. […] some examples of great code quality. OpenBSD is undoubtedly one of the pin-up projects of the Open Source world, featuring code that is almost supernaturally clean, consistent and direct.
> SELinux, etc. is not that picky about audits and code quality as OpenBSD is.
> “I think our code quality is higher, just because that’s really a big focus for us,” De Raadt says.
such as OpenBSD.
OpenBSD is a fork of NetBSD, another project considered to have above average quality source code. Enough so that Spinellis based his book about code reading on the NetBSD source code.
OpenBSD is one such project.
Also, being able to read and understand code is an important skill in itself.