
Working on Chrome made me develop a tool for reading source code - nebucnaut
https://medium.com/@egraether/why-working-on-chrome-made-me-develop-a-tool-for-reading-source-code-7111ba21a6f0#.2m1e4m8yy
======
barrkel
The symbolic lookups and diagrams described are already implemented in modern
IDEs. People who rely on relatively simple text editors and text / regex
search may be less aware of this, of course.

I do agree that visualizations are lacking. In part this is because complete
analysis is difficult, and rather than do half a job, the tools don't do the
job at all. In part it's also because different tools do a better job: for
example, instrumenting profilers are better at showing control flow.

I generally want three kinds of things in my head when reading code: control
flow, data flow and data shape.

But there are different resolutions to this. Control flow may be simple, at
the method level; or it may be more complex, with asynchronous callbacks,
queuing systems, RPC, web service requests and orchestrations controlled by
configuration.

Data flow may be simply knowing where in the code a particular attribute is
updated and where it is read and used to make a decision. But it's also about
where the data came from ultimately - what all the ingredients are that go
into its formation - and also what other data it in turn affects.

Data shape has to do with the simple shapes of structures, but diagrams are
scarcely needed for that - a glance at the definition of the structure is
enough to commit it to short-term memory. More interesting are global
invariants, local invariants, longer chains (how you get to one distant
structure from another via links), the database model, the configuration data
model, and static vs. dynamic data.

Most of the interesting information is at a higher level than can reasonably
be analyzed statically in most Turing-complete languages. The best info comes
from profiling or debugging.

~~~
bobajeff
In the past, working on a large C++ codebase, I tried using a debugger to
understand how a function gets called.

Even the debugger fails when you come across indirect calls. Also, in gdb,
multiple threads make it easy to miss something when stepping through code.

~~~
realharo
Visual Studio's debugger has some really nice features and a good interface
for dealing with multi-threaded code:
[https://channel9.msdn.com/Shows/Visual-Studio-Toolbox/C-Plus...](https://channel9.msdn.com/Shows/Visual-Studio-Toolbox/C-Plus-Plus-Debugging-Tips-and-Tricks#time=19m42s)

------
jasim
As someone looking in from outside, I find the Chrome codebase a treat. An
anecdote:

Once, early in my career, I was asked to implement a caching layer in an HTTP
library. I had no idea how HTTP caching worked, so there was some reading up
to do. There were the excellent guides from mnot.net, as well as the nicely
written RFC 2616. But Chrome's code was the best of all -
[https://github.com/adobe/chromium/blob/master/net/http/http_...](https://github.com/adobe/chromium/blob/master/net/http/http_response_headers.cc#L990).
If you want to know for a fact how Chrome decides caching, that code is kind
of the heart of it.

Recently I needed to retrieve the "Rendered Font Name" that is available in
the DevTools "Computed CSS" section. This is the name of the system font that
Chrome finally picks based on the font-family property. It is
platform-specific, so it isn't in the DOM, nor is it directly available to
Chrome extensions. The only way to get it was to write an extension that runs
in debug mode and communicates with the browser through its remote debugging
protocol. (This part of the documentation is lacking, but it is an esoteric
topic anyway.) The good news was that the code that does this is well
encapsulated and could easily be extracted into a command-line utility. For
the curious:
[https://chromium.googlesource.com/chromium/src/+/master/thir...](https://chromium.googlesource.com/chromium/src/+/master/third_party/WebKit/Source/platform/fonts/mac/FontFamilyMatcherMac.mm#129)
(there is a nugget about real-world software in the comments)

This is, at a surface level, good code to me. Things are easy to find, there
is a rhyme and rhythm to the system, and it feels welcoming. I would love to
hear the thoughts of the people who designed that system over the years. Most
writing about good code on the internet comes from an OO background, mostly
with regard to information systems. I wonder what people who've written these
systems have to say about building and engineering complex software.

~~~
mattmanser
I'm not a fan of the comments. There are comments that are utterly useless, like:

    // Of course, there are other factors that can force a response to always be validated or re-fetched.

Gee thanks, what might those factors be? A comment like that is worse than not
commenting at all.

Or the ones that really get my goat: comments that just tell you what the
following code obviously does:

    // If there is no Date header, then assume that the server response was
    // generated at the time when we received the response.
    Time date_value;
    if (!GetDateValue(&date_value))
      date_value = response_time;

There's even a line (1019) where they divide by 10 without explaining why.
That's what I would comment.

The code itself is fairly easy to read, though I do wonder why they bothered
using TimeDelta. It just seems to make the code more complicated, and it
results in confusing code like this:

    return TimeDelta();  // not fresh

~~~
groby_b
While I'm not on the net team - I just mangle the UI and annoy Ben ;) - I can
hazard a good guess: the net team used TimeDelta because for any software
project of Chromium's size, strong types are crucial.

If they just returned an int, you'd sooner or later see it passed through
five different places, and at the end nobody would remember whether it's
milliseconds, time ticks, seconds, or even a Unix time instead of a delta.
TimeDelta removes that question.

It also handles operations with subtle pitfalls you might miss if you did
them "by hand", like saturating addition and multiplying by an integer while
handling overflow correctly.

These things might be overkill for a smaller project, but once you have
something with hundreds of contributors, every little bit helps keep the code
base sane.

As for the division by 10 - look up at line 953 :)

Yes, it should be a named constant. And some of the comments could certainly
be better. It's a work in progress. (And if you want to help with that work,
we happily accept patches!)

~~~
mattmanser
Yes, that seems like a good reason to use TimeDelta.

------
realharo
The visualization aspect looks quite interesting, but I'm not sure how many
things this can do that a quality IDE doesn't already.

The example use cases the author mentions ("Following code paths from method
to method", "Finding where an interface is implemented and which methods get
overridden", "Exploring dependencies between types and functions") all sound
like pretty standard features today. Plus, with an IDE you get the benefit of
having them right there in the editor, debugger, etc., and much more.

I would however really like something for inspecting the _run-time_ structure
of an application's objects. Most debugger views are really clunky for looking
at large amounts of data, and even the pretty-print features often don't help
much. Having a zoomable graph with the objects right there in front of me
would really bring my productivity to the next level.

~~~
westoncb
I'm working on a new kind of tool along those lines, and would appreciate any
feedback on whether the format I've designed so far would be useful to you (or
others).
[http://westoncb.com/projects/avd](http://westoncb.com/projects/avd)

(I've been thinking about ways to handle arbitrary object graphs, but an
important requirement was to keep it language-independent, so at the moment it
just deals with common data structure formats: lists, trees, tables, graphs,
hashmaps, etc. My thinking is that most difficult-to-debug algorithms operate
on these anyhow.)

~~~
cellularmitosis
This is amazing! Is the UI OpenGL? Is the visualization program a separate
process? I'm guessing the main process sends info about the monitored data
structures over a socket?

~~~
westoncb
> I'm guessing the main process sends info about the monitored data structures
> over a socket?

Correct!

The UI/server is Java/OpenGL.

------
mwexler
I see comments that "an IDE can do this". Perhaps so. But there are lots of
data-discovery missions where you need to learn a codebase quickly that you
won't be fully compiling or building yourself. Perhaps you need to replicate
an approach, or understand where something in your code needs to change in
order to work with the other code that you can't touch. Tools like the OP's
could become very handy in understanding a codebase without bringing the
entire thing into your IDE.

I agree that some pieces are only really comprehensible at runtime, but I
applaud tools that reflect the need to learn a codebase without necessarily
having to (or being able to) bring all the code into your IDE.

~~~
speps
> learn a codebase quickly that you won't be fully compiling or building
> yourself.

I used an early version of Coati (0.5, I think). It used Clang for the
backend, and loading a new project took ages; in my case, probably longer than
compiling it. I should try again, as they released 1.0 not too long ago.

------
javathrowaway
I feel obliged to mention this talk by Zed Shaw:
[https://vimeo.com/53062800](https://vimeo.com/53062800)

It gives an overview and interpretation of a body of neuroscience research in
the context of teaching programming. I can't quite summarize the whole talk
succinctly (and don't want to lure anyone with catchy titles either), but my
takeaway from it is that those "visual" programming tools are mostly useless
and not going to help significantly.

The reason is how the brain works: switching back and forth between "visual"
and "linguistic" cognition is hard and requires specific training to do
efficiently. Please turn to the talk for references.

------
outcoldman
The trial doesn't really let you try it: you can only use it on some
predefined small projects. As I understand it, the only way for me to try it
on my own code is to buy a license. OK, the author says I can get a refund
within a month if I don't like it, but that's still too much hassle just to
try it. Author, please consider offering a real trial. Also, please add Retina
icons.

~~~
sexyForkBomb
I was curious about this as well. You can email them for a "real" trial.

------
mrkgnao
Chromium Code Search sounds sort of similar to Hoogle. Can anyone with
experience of both confirm this?

Also -- sort of off-topic, but motivated by TFA -- I'd love some way to find
out about quirks that native speakers of $lang have when they speak
$otherlang.

A few common "tells" that I know of, for $otherlang = English, are
Hyphenating-Things-Like-This and spaces before exclamation points or question
marks !

~~~
0x4a42
"spaces before exclamation points or question marks !"

This is probably related to typographic rules. For example, in French you put
a space before and after double punctuation marks (!?:;" etc.) and a space
after single punctuation marks (.,). One exception is the single quote
(apostrophe), which should be neither preceded nor followed by a space.

------
TheMagicHorsey
This company is tackling the same problem with a slightly different approach.
I like that they have made their tools open source and modular. Theoretically
you can use their system with any language by writing a few modules.

[https://sourcegraph.com/](https://sourcegraph.com/)

------
relics443
If it were hosted, and you could point it at a GitHub repo, I'd use it.

Otherwise I'd rather use whatever IDE JetBrains has for it. It might not have
the exact same capabilities (or maybe it does, I didn't look closely enough),
but why use another tool and context if the current one is good enough?

------
zfedoran
This reminds me a little bit of the IDA disassembler. There are moments where
you might face production issues with external JS code and have no source maps
available. A tree view to dissect what is going on would be useful in these
situations.

------
mattnewton
This is really cool. One killer feature you may not have considered is some
kind of visualization of the stack trace, showing how control passes between
the objects & functions. That would really help with dynamic languages, or
large codebases that have been heavily Greenspunned. It would be very
difficult to implement, especially in a cross-platform way, but I bet if you
picked a language community like C++, Java or something else and focused on
it, you would get better results.

------
mondoshawan
This tool looks quite useful for weird build environments with tons of
completely undocumented native code such as AOSP (Android) where IDEs
regularly fall over.

------
godmodus
Trying should be made more accessible.

"Looks" promising.

------
manishsharan
In Java there's an easy way to trace control flow during program execution:
when I join a new contract and am responsible for maintaining legacy Java
code, I use AspectJ with logging aspects that apply before and after advice
to pointcuts. This helps me figure out the application's control flow.

I wonder if something similar is available for C/C++.

------
zimablue
It's closed source?

------
hkon
Reminds me of the code bubbles and Debugger Canvas I used to use in Visual Studio.

------
fapjacks
Incidentally, WebKit is the hairiest pile of code I've ever seen.

------
general_ai
How does it compare to Google Kythe (internally known as Code Search):
[https://github.com/google/kythe](https://github.com/google/kythe)?

