
Show HN: Sourcetrail – Visual Source Explorer Now Supports Python - egraether
https://www.sourcetrail.com/
======
mmmrk
Looks very interesting, but seems to be excruciatingly slow at indexing Python
projects. Ran it on
[https://github.com/fonttools/fonttools/tree/master/Lib/fontT...](https://github.com/fonttools/fonttools/tree/master/Lib/fontTools)
(~230 files) and it took 45 minutes while keeping all 16 cores of my Ryzen
1700 at full load, sometimes using ~16 GB of RAM. Exiting the application ends
in a SEGV, but seemingly without consequences? Also, after moving the database
files to a different folder and renaming the "source group", it now asks me
to re-index everything. AAAARRRGHHH.

~~~
egraether
Sourcetrail dev here. Thanks for the feedback!

Yes, Python indexing speed is still slow, mostly due to the underlying
framework that needs a lot of time to resolve symbol names. We hope we can
improve this.

The crash on closing was already reported:
[https://github.com/CoatiSoftware/SourcetrailBugTracker/issue...](https://github.com/CoatiSoftware/SourcetrailBugTracker/issues/698)
I opened another issue so that it no longer prompts for re-indexing when only
renaming source groups:
[https://github.com/CoatiSoftware/SourcetrailBugTracker/issue...](https://github.com/CoatiSoftware/SourcetrailBugTracker/issues/700)

------
omeid2
Absolutely amazing product! Wish it supported more languages though.

Besides being a great and useful product, their Global Pricing, based on the
Big Mac index, is fantastic. To put this in perspective, the "standard" price
of 744.47$ is twice the median wage in Afghanistan.

~~~
oakslab
Took me a while to find out that you refer to the AUD amount. Where did you
get the global pricing? I selected different countries, but the amounts still
show as equivalent to the USD one?

Edit: They are using the PayPal rate (or close to it).

------
victor106
Never knew a product like this existed. This is a godsend for me as I navigate
a lot of old codebases.

------
albertzeyer
I usually use PyCharm in a very similar manner, to browse others' code locally.
This is much more efficient compared to e.g. browsing/reading the code on
GitHub. Whenever I spend more than a few minutes browsing some code on GitHub,
I just clone the repo instead and open it in PyCharm. Finding usages of
some function, or simply searching for some string, is much faster and
more comfortable (also because the search on GitHub is so limited). I can only
recommend this workflow to anyone who reads others' source code. And reading
others' source code is something I would recommend to everyone anyway.

About Sourcetrail, I see from the homepage that it also has some additional
graphical representation. Is this more helpful over what PyCharm already
offers?

If so, maybe this also can be a feature request to PyCharm.

~~~
bshipp
Another vote for the git clone / PyCharm workflow. If I think I might end
up tweaking the code, I might fork instead of clone, to make generating a pull
request a simple click in the browser. But I'm definitely in love with
PyCharm. Even to the point of using it to evaluate code on remote servers
locally on my laptop. Just perfect.

------
tmikaeld
There exists one for Javascript as well:

[https://github.com/Bogdan-Lyashenko/js-code-to-svg-flowchart](https://github.com/Bogdan-Lyashenko/js-code-to-svg-flowchart)

Including a VSCode extension (but it seems unmaintained):

[https://marketplace.visualstudio.com/items?itemName=lucasbad...](https://marketplace.visualstudio.com/items?itemName=lucasbadico.code-flowchart)

~~~
egraether
Thanks, this looks really nice! It doesn't seem to resolve symbol names
however, so you can't see how different functions/definitions are related to
each other and navigate between them. But the same author is working on
another project that seems to go in that direction:
[https://github.com/Bogdan-Lyashenko/codecrumbs](https://github.com/Bogdan-Lyashenko/codecrumbs)

~~~
tmikaeld
That looks really awesome! Thanks for sharing, or I'd have missed it.

------
nebucnaut
Here is the release post that describes the Python integration in a bit more
detail:
[https://www.sourcetrail.com/blog/sourcetrail_supports_python...](https://www.sourcetrail.com/blog/sourcetrail_supports_python/)

------
dimmuborgir
How does this compare to Code Maps in Visual Studio? Can anyone who has used
both comment?

~~~
egraether
Sourcetrail dev here. To my knowledge, without going into too much detail:

* Sourcetrail has a tighter integration between source code and graph visualization, because it offers fewer features than a full-blown IDE like Visual Studio and can keep the UI simpler. Users of Sourcetrail can use either code or graph to navigate, and both update simultaneously. In Code Maps the visualization stays static until explicitly updated by the user.

* Sourcetrail's visualization is centered around one currently active symbol. Other nodes are then positioned above/below/left/right of it to encode additional meaning into the layout. Nodes are bundled together to de-clutter. In Code Maps the user starts with a certain symbol and then adds more and more symbols to the visualization, building a map. Filters can be applied to de-clutter.

* Sourcetrail needs to index all source code upfront, but afterwards querying and visualization generation is fast. I'm not certain, but to my knowledge Code Maps indexes only on demand/only partly upfront, so queries are not as instant.

If you are interested in Software Visualization, you can watch my recent talk
at ACCU 2019, where I go through some design decisions of our visualization:
[https://www.youtube.com/watch?v=Gvmp3Gzhv8o](https://www.youtube.com/watch?v=Gvmp3Gzhv8o)

~~~
dimmuborgir
Thanks a lot.

------
devbat8712
Oh my god, I've been looking for something like this for like a full year.
This is awesome.

------
argd678
What frameworks did you use to create the UI?

------
topazas
Awesome! Will try without a doubt.

------
Asooka
I tried the demo back when this was still called Coati. Very impressive, but
not quite suitable for my current job. I didn't have time to fully evaluate it
at the time, but these are my impressions from trying to use it on our
codebase (I can't say exactly who we are due to NDAs).

My first criticism would be that the UI wastes a lot of space with all the
curves and thick borders. Might be due to me using Vim most of the time, but
I'm used to a higher text density. It also felt sluggish, but this might have
been improved in the meantime, and also I used it on Linux which is
notoriously bad for graphical responsiveness.

Second, the indexing was kind of slow. Our codebase is very far from the size
of Chrome's, but it is a big commercial C++ project - something like 2M sloc
(with comments). This was compounded by the fact that switching branches often
led to nearly rebuilding a lot of files due to changes in often-used headers.

Third, it can only index one build configuration. Our single source tree is
used to build several products and indexing only one build configuration is
helpful, but often I need to know if changes necessary for product A will
impact product B. It would be really nice if the indexing was split off into
its own daemon and I could e.g. have three daemons looking over three build
configurations on Linux, an additional remote daemon indexing the MacOS
source, and further two running on a Windows host (possibly a VM). This might
sound extreme and convoluted, but the kind of large C++ project where
Sourcetrail would shine is the kind that has its own very opinionated
idiosyncrasies.

Finally, it doesn't actually solve my most frequent use-case. Sourcetrail is
great for browsing and understanding OOP structure of code. However, I am most
often interested in exploring dataflow. I really want to know where a
particular member of a struct is set, where the values in the expression come
from, where those values are set, where the values in those expressions are
from, etc. This can be accomplished with pretty much the same hierarchical
interface that Sourcetrail currently has, but instead of classes and methods,
the basic units should be expressions and values. Another useful feature would
be "where is this value used" \- say you have a member of a struct, or a
method returning a constant value, and you want to know where the value of the
member is used. Not where the member is used, because the value of the member
is often copied around, but without being modified. I would really like
something that can track through assignments to tell me where this value ends
up. Right now Sourcetrail doesn't really cover this usecase better than
Vim+ctags+rtags+ripgrep. Yes, this sounds a bit like "Dropbox is just
sftp+rsync", but I couldn't make Coati work better for my usecase than my
current setup.
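
To make the "where does this value end up" idea concrete, here is a minimal Python sketch (not anything Sourcetrail does). It assumes a deliberately tiny model: only plain `a = b` copies between simple names are followed, and order of execution, calls, attributes and control flow are ignored. The function name `names_carrying_value` and the sample snippet are hypothetical, made up for illustration.

```python
import ast


def names_carrying_value(source: str, start: str) -> set:
    """Return the names that may hold the value first bound to `start`.

    Assumption: only direct name-to-name copies (`a = b`) propagate the
    value. The analysis is flow-insensitive and ignores everything else.
    """
    tree = ast.parse(source)
    carriers = {start}
    changed = True
    # Repeat until no new name is discovered (a simple fixed point).
    while changed:
        changed = False
        for node in ast.walk(tree):
            is_copy = isinstance(node, ast.Assign) and isinstance(node.value, ast.Name)
            if is_copy and node.value.id in carriers:
                for target in node.targets:
                    if isinstance(target, ast.Name) and target.id not in carriers:
                        carriers.add(target.id)
                        changed = True
    return carriers


example = """
limit = 10
cap = limit
other = 5
ceiling = cap
"""

print(sorted(names_carrying_value(example, "limit")))  # ['cap', 'ceiling', 'limit']
```

A real tool would have to follow struct members, function arguments and return values as well, which is where most of the difficulty lies.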

I can't demand that dataflow analysis be implemented, because I can't promise
that I'll use Sourcetrail even if it is, but a data-oriented view of code
might be a worthwhile development to consider.

~~~
egraether
Sourcetrail dev here. Thanks for your extensive feedback!

A lot of things have changed since Sourcetrail was called Coati (about 2 years
ago). We put a lot of work into improving indexing speed, handling multiple
configurations/different languages within one project and reducing
"sluggishness" in the UI. Sourcetrail now runs smoothly on code bases with
multiple MLoC.

But I agree with your suggestion regarding data-flow analysis. That is what
understanding unfamiliar source code often really comes down to. We also had
some user requests to go in that direction. We haven't really looked into this
area so far, because it is a lot harder to collect the data (dynamic analysis) and
it needs a whole new user interface.

While the data collection is solvable (Visual Studio debugger can do it, I
think), I'm not sure whether it is really possible to come up with an
effective user interface that shows which paths the different values take.

To explain why this is hard, let me use a metaphor: With data-flow you need to
deal with a new dimension: time. Sourcetrail can handle dependencies between
definitions really well, before the code is executed: space. What you want is
a tool that combines the two into a space-time exploration tool of source
code. Not sure if possible at all, but very interesting to think about. :)

~~~
j88439h84
I just watched your talk, and found it very interesting, thanks for sharing
it!

I hope you can do the data flow analysis. That would be so cool and SO useful.

Python certainly has the ability to collect the data. A couple existing tools
make use of this.

For example, MonkeyType and Birdseye observe the values that are passed around
by tracing execution during test runs (or even during a production run, but
the performance impact can be substantial).
[https://github.com/Instagram/MonkeyType](https://github.com/Instagram/MonkeyType)
[https://github.com/alexmojaki/birdseye](https://github.com/alexmojaki/birdseye)
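
As a rough illustration of collecting this kind of data by tracing, here is a minimal sketch built on the standard library's `sys.settrace`. It is not how MonkeyType or Birdseye are implemented; the helper `record_arg_types` and the demo functions are hypothetical names, and only positional parameters are inspected.

```python
import sys
from collections import defaultdict


def record_arg_types(func, *args, **kwargs):
    """Run `func` and record the argument types seen for each Python call."""
    seen = defaultdict(set)

    def tracer(frame, event, arg):
        if event == "call":
            code = frame.f_code
            # Positional parameter names come first in co_varnames.
            arg_names = code.co_varnames[:code.co_argcount]
            types = tuple(type(frame.f_locals[name]).__name__ for name in arg_names)
            seen[code.co_name].add(types)
        return None  # no per-line tracing inside the called function

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore the earlier trace function
    return dict(seen)


def scale(x, factor):
    return x * factor


def demo():
    scale(2, 3)
    scale(1.5, 2)


observed = record_arg_types(demo)
print(observed["scale"])
```

Here `observed["scale"]` holds the two signatures that were seen, `("int", "int")` and `("float", "int")`. Tracing every call this way slows execution noticeably, which matches the performance caveat above.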

Even more information can be gleaned from the gc module (see
[https://mg.pov.lt/objgraph/](https://mg.pov.lt/objgraph/) for a tool using
it).
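
A small sketch of the kind of information the gc module exposes, using only `gc.get_referrers` from the standard library. The names `shared`, `holder` and `holder_types` are hypothetical, and the exact set of referrers reported depends on the interpreter version and on where the code runs, so no output is shown.

```python
import gc


def holder_types(obj):
    """Return the type names of the objects that currently refer to `obj`."""
    return sorted({type(referrer).__name__ for referrer in gc.get_referrers(obj)})


shared = [1, 2, 3]          # the object whose holders we want to find
holder = {"data": shared}   # a container that keeps a reference to it

# The dict above shows up among the referrers, next to whatever
# namespaces or frames also happen to hold `shared`.
print(holder_types(shared))
```

Walking such references repeatedly gives an object graph, which is the raw material a visualization could be built on.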

These tools make good progress, but I'd be _very_ interested to see what a
software-visualization expert would come up with.

I'd also love to see how a concurrent execution tree can be visualized. For
example, the wonderful Trio concurrency library is built on a tree of
concurrent tasks. It would be so cool to see which events are happening at the
same time. I've never seen a visualization of how it'd work. (The Trio team is
also extremely friendly on their Gitter chat.)
[https://github.com/python-trio/trio](https://github.com/python-trio/trio)

Structured logging is yet another exciting area. Can we generate
visualizations from logs in OpenTracing/OpenCensus format? Some existing work
is
[https://github.com/jonathanj/eliottree](https://github.com/jonathanj/eliottree)

Gary Bernhardt's "A whole new world" talk
[https://www.destroyallsoftware.com/talks/a-whole-new-world](https://www.destroyallsoftware.com/talks/a-whole-new-world)
proposes extracting data from logs and highlighting important lines from
tracebacks and slow lines from trace timings in the editor.

I haven't used Structurizr, but it seems interesting. Do you have thoughts on
it? [https://structurizr.com/](https://structurizr.com/) has a Python port at
[https://github.com/sixty-north/structurizr-python](https://github.com/sixty-north/structurizr-python)

