
Root: CERN's scientific data analysis framework for C++ - z3phyr
https://root.cern.ch/
======
Herrin
ROOT is my go-to example for peak Object Oriented in the 90s. Look at the
inheritance for something like a 2D histogram of doubles[1].

Trying to find the documentation for how to draw an arrow on a plot was always
fun, because searching for "TArrow ROOT" would inevitably get results for
"taro root".

The XRootD[2] project is pretty interesting, though, and I feel like the
software industry is going to have to start dealing with similar data problems
before too long.

[1]
[https://root.cern.ch/doc/master/classTH2D.html](https://root.cern.ch/doc/master/classTH2D.html)
[2] [http://www.xrootd.org/](http://www.xrootd.org/)

~~~
ddavis
Object oriented is part of the name! If I remember correctly it's some
combination of the original developers' names/initials with OO sandwiched
between.

------
MereInterest
Ooh, boy, rant time. I used ROOT all through my PhD, and it is a royal mess.

As a starter, there is a mass of global state. It maintains a "gDirectory",
which is the currently active directory, either referring to a location on
disk or in memory. Many objects in ROOT will register themselves with the
current gDirectory on creation, and will be destructed when the gDirectory
closes. This includes objects that were declared on the stack, leading to
destructors being called twice.
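
To make the ownership problem concrete, here is a minimal toy model in Python. It is not ROOT's API: `Directory`, `Histogram`, and `current_directory` are hypothetical stand-ins for the behaviour described above, where a constructor quietly hands the object to global state that later tears it down.

```python
# Toy model only (assumed names, not ROOT code): objects register with a
# global "current directory" on creation, and the directory destroys them
# when it closes.

class Directory:
    def __init__(self, name):
        self.name = name
        self.owned = []

    def close(self):
        # The directory destroys everything that registered with it.
        for obj in self.owned:
            obj.destroy()
        self.owned.clear()


current_directory = Directory("memory")  # stand-in for gDirectory


class Histogram:
    def __init__(self, name):
        self.name = name
        self.destroy_calls = 0
        # Hidden side effect: the constructor hands ownership to global state.
        current_directory.owned.append(self)

    def destroy(self):
        self.destroy_calls += 1


def fill_and_leave_scope():
    hist = Histogram("h")      # the caller believes it owns hist
    current_directory.close()  # the directory destroys hist
    hist.destroy()             # scope exit destroys it again, as a C++ stack object would
    return hist.destroy_calls
```

In this sketch `fill_and_leave_scope()` reports two destroy calls for one object, which is the double-destruction pattern in miniature.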

Since large data requires good performance, you might be wondering how this
interacts with multithreading. Not well at all. There are many ROOT internals
that assume a single-threaded environment. If you call TThread::Init(), most
of those are avoided. However, ROOT has some memory tracking code that gets
called on the constructor/destructor of all ROOT objects, and that memory
tracking code is entirely thread unsafe. In any program that links against
ROOT libraries, the memory tracking would be enabled based on a user's .rootrc
configuration file. Tracking down why a program segfaults for some users but
not others, and those segfaults happen at any point in the code that interacts
with ROOT, was quite irritating.

The histogramming, which is the most used form of plotting in high-energy
physics, has a broken class hierarchy. TH2, 2-d histograms as a function of x
and y, are implemented as a subclass of TH1, 1-d histograms. As a result,
there are all manner of functions that are appropriate only for TH2 but can be
called on a TH1. This includes getting a Z axis for something that inherently
doesn't have a Z axis.
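
A rough sketch of the design issue, in Python with made-up class names (none of these are ROOT classes): when the 1-d type is the base of everything, it has to carry methods that only make sense for higher dimensions. Sibling classes under a small shared base avoid that.

```python
# Hypothetical classes for illustration; they only mimic the shape of the
# hierarchy described above.

class LeakyHist1D:
    """1-d histogram that doubles as the base class for every dimension."""

    def axes(self):
        return ("x",)

    def z_axis(self):
        # Lives on the base only so that subclasses can inherit it.
        return "z"


class LeakyHist2D(LeakyHist1D):
    def axes(self):
        return ("x", "y")


class HistBase:
    """Alternative: shared behaviour lives here, dimensions are siblings."""

    axis_names = ()

    def axis(self, name):
        if name not in self.axis_names:
            raise ValueError(f"{type(self).__name__} has no {name!r} axis")
        return name


class Hist1D(HistBase):
    axis_names = ("x",)


class Hist2D(HistBase):
    axis_names = ("x", "y")
```

With the leaky hierarchy, `LeakyHist1D().z_axis()` succeeds even though the object has no such axis. With the sibling layout, `Hist1D().axis("z")` raises a `ValueError` instead.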

And it has a web browser, TGHtmlBrowser. I don't know why it has a web
browser. It doesn't support https, css, or javascript, so I couldn't see how
it performs on any modern website, but it was rather amusing to see it break
on the Acid tests.

Many of the issues with the interpreter were fixed with ROOT6, which uses a
Clang-based interpreter. However, every command executed increases the memory
usage of the program. This becomes an issue with automatically updating
histograms, because the main way that GUIs can be updated is by issuing a
command that is then run through the interpreter.

~~~
SiempreViernes
I always name my TBrowser T_T to express my annoyance at having to use it.
':(' is sadly not a valid name :/

~~~
yummypaint
It's like the graphical file open window is designed to troll the user. It
opens in icon view sorted by name with hidden files visible. Like who would
ever want that? It's guaranteed to never be useful. Switching to list view and
sorting by date requires 3 clicks, and date sorting only works in one
direction. Trying to type to highlight a directory or file by name often
doesn't work.

------
konschubert
Root has some features that are very unique and powerful.

It’s used in particle physics today mostly because it allows performant
out-of-memory, on-disk data processing.

With frameworks like Python pandas, you always end up having to manually
partition your data if it doesn’t fit in memory. And of course, it’s C++, so
by default the data analysis code is pretty performant. This makes a
difference when you can iterate your analysis in one hour instead of 20.
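
For anyone unfamiliar with the idea, this is roughly what out-of-memory processing looks like when done by hand, as a minimal sketch using only the Python standard library. The function names are made up for illustration and this is not how ROOT implements it; the point is only that the full dataset never has to sit in memory at once.

```python
from collections import Counter
from itertools import islice


def read_in_chunks(values, chunk_size):
    """Yield lists of at most chunk_size items from any iterable."""
    iterator = iter(values)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def histogram_out_of_core(values, chunk_size, bin_width):
    """Accumulate bin counts one chunk at a time.

    Only one chunk plus the running counts are held in memory.
    """
    counts = Counter()
    for chunk in read_in_chunks(values, chunk_size):
        for value in chunk:
            counts[int(value // bin_width)] += 1
    return counts
```

Here `values` could be a generator that reads a file too large for memory, line by line; the result is a mapping from bin index to count.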

That being said, when I last worked with it, Root was a scrambled mess with
terrible interfaces and way too many fringe features, e.g. around plotting,
that are better handled by Python nowadays. It even has a C++ command line!!!

I wrote a blog post back then about how I thought it could be fixed:
[https://www.konstantinschubert.com/2016/06/18/root8-what-
roo...](https://www.konstantinschubert.com/2016/06/18/root8-what-root7-should-
have-been.html)

~~~
SiempreViernes
Let's be honest: it's used today because it was used yesterday, and there is a
lot of useful legacy code. Not many _like_ plotting with root, or faffing
about with memory allocation.

The reason it started getting use is that in the 1990s, when the current
generation of experiments were starting up, C++ was hot and Fortran was not.
PAW was old and in Fortran and so the young ones wanted to work with the new
hip ROOT instead[1].

[1]: [https://www.quora.com/Why-does-CERN-use-ROOT/answer/Mario-
Al...](https://www.quora.com/Why-does-CERN-use-ROOT/answer/Mario-Alemi)

~~~
pjmlp
Back when I was at HLT, I remember many talking about ROOT but we didn't use
it much in TDAQ.

~~~
SiempreViernes
Oh man, that’s too hardcore for me. I bow to your superior inside knowledge and
ask for enlightenment on the meaning of HLT & TDAQ.

~~~
pjmlp
It is a Google search away. :)

~~~
konschubert
No it isn't.

------
westurner
[https://root.cern.ch/root-has-its-jupyter-kernel](https://root.cern.ch/root-
has-its-jupyter-kernel) (2015)

> _Yet another milestone of the integration plan of ROOT with the Jupyter
> technology has been reached: ROOT now offers a Jupyter kernel! You can try
> it already now._

> _ROOT is the 54th entry in this list and this is pretty cool. Now not only
> the PyROOT, the ROOT Python bindings, are integrated with notebooks but
> it's also possible to express your data mining in C++ within a notebook,
> taking advantage of all the powerful features of ROOT - plotting (now also
> interactive thanks to [Javascript ROOT](https://root.cern.ch/js/)),
> multivariate
> analysis, linear algebra, I/O and reflection: all available within a
> notebook._

Does this work with JupyterLab now? (edit) Here's the JupyterLab extension
developer guide:
[https://jupyterlab.readthedocs.io/en/stable/developer/extens...](https://jupyterlab.readthedocs.io/en/stable/developer/extension_dev.html)
(edit) here's the gh issue: [https://github.com/root-
project/jsroot/issues/166](https://github.com/root-project/jsroot/issues/166)

...

ROOT is now installable with conda: `conda install -c conda-forge root
metakernel jupyterlab # notebook`

~~~
carreau
For c/c++ in Jupyter, see xeus-cling [https://github.com/QuantStack/xeus-
cling](https://github.com/QuantStack/xeus-cling)

~~~
stochastic_monk
Coincidentally, cling (wrapped by xeus-cling) is also a product from CERN.

~~~
pjmlp
Many of the CERN researchers are pretty deep into C++.

It was there that I got my template meta-programming baptism, back in 2002,
when gcc was still trying to cope with template heavy code.

And curiously, also where I got my first safety heavy code reviews of C++ best
practices.

------
phreeza
This brings back memories from my undergraduate days as a physics student,
where we made extensive use of root in labs. It was a kind of badge of nerd
honor to do analyses in root as opposed to Matlab.

It also came with a sort of C++ interpreter that gave you a repl. I remember
that kind of blew my mind back then.

~~~
saagarjha
> It also came with a sort of C++ interpreter that gave you a repl. I remember
> that kind of blew my mind back then.

That would be Cling (or CINT):
[https://root.cern.ch/cling](https://root.cern.ch/cling)

------
tehsauce
They also have been building a javascript version! A few years ago (as part of
the GSOC program) I worked on parts of their webgl renderer, and ended up
adding a feature to threejs as a result! Sergey has done a great job with it.

Try it in your browser:

[https://root.cern.ch/js/](https://root.cern.ch/js/)

------
V1ndaar
I'm a PhD student in physics, working on CAST [1], a small experiment at CERN.
That fortunately means I don't depend on other people's code too much.

With the C++ interpreter of ROOT you can run so-called ROOT macros. They are
basically just C++ files containing your code at top level in a pair of `{}`.
The "funny" thing is half the time you run a piece of code it'll work the
first time. But then running it again will cause the interpreter to segfault,
due to the mess of global state and the half baked memory management of ROOT.

And don't get me started on the usability of the ROOT classes. A huge amount
of stuff is string based, so throw away your type checking guarantees. I mean
why bother...

Since I really didn't want to continue working with ROOT for my PhD (after
having used it before), I decided to ditch the previous data analysis code and
start from scratch [2]. I decided to use Nim for it, because it provides me
with a powerful language (on par with python for my purposes - aside from
admittedly certain libraries, which I had to wrap), yet still being as fast as
the old ROOT code (in fact faster for my data analysis, but the comparison is
problematic).

Yeah, it cost me a lot of time to get "back to where I started", the code
itself might also not be a shining example of perfection, but I learned a huge
amount about programming and I understand every piece of it. It was totally
worth it. The gains in efficiency thanks to Nim and due to the fact I know the
codebase means I make up a lot of the lost time anyways.

So to any people out there who may be in a similar position, don't be afraid
to throw out what you don't like. :)

[1]
[https://home.cern/science/experiments/cast](https://home.cern/science/experiments/cast)
[2]
[https://github.com/Vindaar/TimepixAnalysis](https://github.com/Vindaar/TimepixAnalysis)

~~~
SiempreViernes
Ah, axion searches! Who said you can't get paid for staring at a wall?

How come you didn't go with Julia? (or Fortran for that matter?)

~~~
V1ndaar
Haha, well you can get paid for it, but you won't be paid well.

It's hard to give an easy answer for that.

But let's step back a little: I played around with Julia several years ago. I
don't know when exactly, probably in 2015 (?). Back then I thought of it as a
faster python for science. Many things I would use python for back then I
could have done in Julia too. Others I couldn't though, because the ecosystem
was still too small. And while Julia was somewhat faster for the things I
tried, it wasn't amazingly faster. My interest in the language dropped
sometime after that. But still, I felt like it was a language targeted
specifically to scientists. Many things I wanted from my analysis framework
however were outside that bubble I felt. I didn't simply want to write a
faster "analysis script". I do realize however, that the language has evolved
a lot and I'm happy it's finding acceptance!

Fortran on the other hand, I never really considered. I suppose modern Fortran
is a pretty good language though.

So why did I choose Nim then? It gave me:

- the ability to produce standalone binaries I could just put on any data
  acquisition pc without a hassle (well ok, be careful about old glibcs)
- it's fast, and I felt right at home syntax wise coming from python
- the community is amazing. The first time I entered Nim's IRC channel I
  noticed Araq, the creator of the language, answering random people's
  questions! In general the community allows for a super quick feedback loop
  to learn the language
- it's a pretty concise language. The whole manual can be read in less than a
  day
- having written some Clojure, I loved the idea of a powerful macro system
- after seeing mratsim's arraymancer library [1] I was happy to 1) have a
  numpy substitute and 2) thought if one person could write such a great
  library in O(1 year) it must be a pretty awesome language to work with :)
- being able to trivially wrap any C code around is super helpful
- a pretty strong type system! No more annoying implicit conversions from any
  type to a bool, error prone implicit int <-> float conversions etc.

I probably forgot many points, but well. The truth is of course, many
languages could have worked for me, but the time I spent with Nim in the
beginning was just super pleasant.

[1]
[https://github.com/mratsim/Arraymancer](https://github.com/mratsim/Arraymancer)

~~~
ziotom78
I followed the opposite route: tried Nim and liked it a lot, then switched to
Julia. Perhaps my typical usage of a scientific language is quite different
from yours. In my case, what drove me away from Nim was the fact that even
small features of the language kept changing in an uncontrolled way. It was
2014, and commits related to some obscure feature were subtly changing the
behaviour of apparently unrelated stuff. Julia changed a lot in the last
years, but in a very controlled way, and always sticking to semantic
versioning. In 2014, Nim 1.0 was said to be around the corner, yet only a few
weeks ago the first RC for version 1.0 was released (Nim 0.20), and it still
broke some basic stuff like bitshift operators.

Moreover, the lack of scientific libraries in Nim was more severe wrt Julia.
Sure, writing C bindings in Nim is easy, but it is not a zero-effort job: you
have to properly test them to check that types get converted correctly, and
you have to write some documentation. Some guy is currently checking the
quality of Nim libraries [1], and he gave very low scores for a few libraries
of mine (rightly so, IMO, e.g., [2]) because they lack documentation.

[1] [https://forum.nim-lang.org/t/5092](https://forum.nim-lang.org/t/5092)
[2]
[https://github.com/ziotom78/nimcfitsio](https://github.com/ziotom78/nimcfitsio)

------
madhadron
I remember when ROOT came out. I remember downloading it, going through it a
bit, and wondering why anyone would leave PAW for it. To this day, I think PAW
is probably a superior system and the ROOT project is best considered an
expensive failure.

The one guy doing C++ (on a different experiment) skipped ROOT entirely and
used Python to orchestrate C++ code.

------
yummypaint
I can't recommend pyroot highly enough. All the benefits of python while
removing some of the worst usability problems in root. Compiling with cython
meshes well, as it tends to provide the biggest performance gains for the
types of tasks people frequently need to perform (like checking lots of
conditionals).

Advice to people starting with ROOT who have lots of data to process: if at
all possible, don't mess with multithreading. Make a single-threaded process
that grabs only the data needed and makes its own output file. Then combine
the outputs with the hadd tool. You can run the single threaded program en
masse with HTCondor or a similar scheme and it 'just works' while remaining
scalable.
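
The pattern, stripped of anything ROOT-specific, looks roughly like the sketch below. `run_job` and `merge_outputs` are hypothetical names: each job histograms only its own slice of the data, and a separate merge step adds the per-job results bin by bin, which is the role hadd plays for ROOT output files.

```python
from collections import Counter


def run_job(events, bin_width):
    """One single-threaded job: histogram only the events it was given."""
    output = Counter()
    for energy in events:
        output[int(energy // bin_width)] += 1
    return output


def merge_outputs(outputs):
    """Add per-job histograms together bin by bin."""
    total = Counter()
    for output in outputs:
        total.update(output)  # Counter.update adds counts, it does not overwrite
    return total
```

Because the jobs share no state, it does not matter whether they run one after another or as thousands of batch jobs; merging the pieces gives the same counts as one pass over everything.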

------
ddavis
Are there any other academic fields so dependent on a single piece of software
the way that (experimental) particle physics is dependent on ROOT?

~~~
bob457
My field is control systems. Every academic I know, and every paper I’ve read
which mentions a software stack, uses matlab/simulink. Simulink appears to me
to have no good alternative (maybe JModelica or something?). There are some
python/Julia alternatives to matlab, but the existing control libraries are
really pretty limited in comparison.

I’m not sure exactly how dependent particle physics is on ROOT, so direct
comparison is difficult.

~~~
ChrisRackauckas
The Modelica systems are a good alternative, but don't really exist in high
level languages yet, other than some transpilers which are a little iffy. We
are planning to change that with Julia though which has enough of an ecosystem
to easily build such an open source tool unlike Python or R.

------
hackworks
Very nice to see this trend on HN. I have been using croot as a REPL for C/C++
for close to a decade.

------
aninteger
I like the user interface. It was built on xclass (a FVWM95 like UI toolkit).
They ported it to Windows by emulating the xlib portions using gdk 1.3.

------
Areading314
Why would someone use this over, say, Spark? What are its unique
capabilities?

