
ROOT – Data Analysis Framework - michaelsbradley
https://root.cern.ch
======
ddavis
An HN post on software I actually use ;) Those of us in the particle physics
world use ROOT quite often. ROOT is pretty old (for me at least; it started in
the mid-90s). The most recent version (ROOT 6) is a great step forward for
modern C++ use. It's very far down the line, but the experimental ROOT 7 code
I've seen is even better.

~~~
ludamad
Glad to hear it's getting better. I've heard not-so-nice things in the past.

~~~
Create
ROOT can't be said to be OO because it breaks the encapsulation in the guts.
There is a massive usage of "g" global pointers : gROOT, gDirectory, gTree,
gEnv, gSystem, gPad, etc... Around one hundred in v5-18-00, a disaster. And
this definitely breaks the fundamental OO principle of encapsulation.

Then since ROOT violates three basic principles of OO (encapsulation,
inheritance, virtuality) we are compelled to conclude that ROOT can't be
considered as an OO software. ROOT is a bright example of people having jumped
to C++ but totally missed the point of OO. At least it will probably stay in
the history of software because of that.

What could be the improvements in a ROOT major revision?

o at least fix the name! Is it ROOT, Root, root? (Hell, we are pretty sure
that any Bazaar model software has at least converged on that!)

o have then a correct namespacing of classes and libs.

o restore encapsulation (then get rid of the g pointers).

o revisit the inheritances. At least have a good histogram class. And arrange
the storage area to be stable (then "fix" the TTree). And please, have an
introspection class that looks like an introspection class.

o use pure abstract interfaces to separate domains. And stick strongly to the
idea to have them pure.

o etc, etc, etc, etc, etc, etc, etc, etc, etc, etc,...

[http://openscientist.lal.in2p3.fr/osc_web_16.11/html/faq.htm...](http://openscientist.lal.in2p3.fr/osc_web_16.11/html/faq.html#faq_ROOT)

Before beginning, I should point out that these are simply my own views and
that I hold no animosity against the developers — their design simply doesn't
work for me. Presumably there are many people "out there" who think ROOT an
excellent piece of software. In complete honesty, though, I have yet to meet
any of them. In fact, I've never had any complaints that this article
misrepresents ROOT, and I've had a fair bit of "fan mail", not to mention
discussions with well-respected developers and physicists who hold precisely
the same views :-)

[http://insectnation.org/articles/problems-with-root.html](http://insectnation.org/articles/problems-with-root.html)

[http://linuxfr.org/nodes/18919/comments/632920](http://linuxfr.org/nodes/18919/comments/632920)

[https://linuxfr.org/nodes/19928/comments/698692](https://linuxfr.org/nodes/19928/comments/698692)

~~~
batbomb
ROOT was the product of Fons and Rene porting PAW from Fortran, learning C++,
OO, flirting with Taligent coding styles, and a bunch of other things all at
the same time.

It was okay for a time, but that time has long passed.

~~~
drauh
Next, someone will be posting SuperMongo
[http://www.astro.princeton.edu/~rhl/sm/](http://www.astro.princeton.edu/~rhl/sm/)

~~~
batbomb
I work with RHL.

------
ephimetheus
I do data analysis of ATLAS data, and it's everywhere. Everyone knows that
ROOT kind of sucks, and some people have moved to matplotlib to do at least
the plotting for them. However, that brings a slew of other problems: for
example, the plot guidelines for ATLAS publications are formulated in ROOT
terms, so other kinds of plots sometimes don't get approved. On the other hand,
there are literally millions of lines of code in the analysis framework that
are heavily based on ROOT, so there is no real way to switch it out.

------
batbomb
There is quite a lot I could say about ROOT, ROOT files, CINT, but I won't.

There are better options. Don't use it unless you are in HEP.

~~~
whyever
What options are better?

~~~
gh02t
Almost anything, to be honest. Matplotlib, R, Matlab, Mathematica etc. are all
much nicer. Those will do most things ROOT does and be much less delicate. In
a lot of places (especially outside CERN) Matplotlib is taking over where ROOT
might have been used, but it's a slow process.

The problem is that ROOT still has a few very specialized features that its
users still need and you can't get elsewhere. And there are a ton of legacy
analysis tools built on top of it that are difficult to port because of how
ROOT is. _And_ a lot of its more extensive users are comfortable with it and
have no motive to change (they're busy with being scientists).

I don't know anybody who actually _likes_ ROOT, but it also won't be going
away any time soon.

~~~
konschubert
The one thing I am missing in the non-ROOT universe is a powerful fitting
framework that can do multidimensional and simultaneous fits in disjoint
function domains.
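
To make concrete what a simultaneous fit means here, a minimal sketch in plain
Python (not ROOT code): two datasets on disjoint x ranges are fitted with
straight lines that share one slope but keep separate intercepts. The function
name `fit_shared_slope` and the toy data are made up for illustration; a real
fitting framework would also handle arbitrary models, uncertainties, and more
dimensions.

```python
def fit_shared_slope(datasets):
    """Least-squares fit of y = a*x + b_k to several datasets at once.

    `datasets` is a list of (xs, ys) pairs. The slope `a` is shared by
    all datasets; each dataset k gets its own intercept b_k.
    Returns (a, [b_0, b_1, ...]).
    """
    sxy = 0.0
    sxx = 0.0
    means = []
    for xs, ys in datasets:
        n = len(xs)
        mx = sum(xs) / n
        my = sum(ys) / n
        means.append((mx, my))
        # Centering on the per-dataset means eliminates the intercepts,
        # so every dataset contributes to the one shared slope.
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    intercepts = [my - a * mx for mx, my in means]
    return a, intercepts


# Toy data: two disjoint x ranges, same slope, different offsets.
low = ([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])           # y = 2x + 1
high = ([10.0, 11.0, 12.0, 13.0], [17.0, 19.0, 21.0, 23.0])  # y = 2x - 3
slope, offsets = fit_shared_slope([low, high])
```

The point of fitting both ranges together is that the shared parameter is
constrained by all the data, rather than being fitted twice and averaged.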

------
alxprc
I am a particle physicist, and used to use ROOT every working day. It is still
used daily by thousands of other particle physicists, though, and is a core
part of many high-energy physics experiments.

I think there are a few objectively neat features of ROOT:

* Versioned persistency of C++ objects deriving from the TObject base class [1];

* Script-like execution of C++ and a C++ REPL based on clang [2]; and

* Dynamic bindings of the C++ classes to Python [3].
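
As a rough illustration of the first point: versioned persistency means each
stored object carries the version of the class that wrote it, so newer code
can still read older files. The sketch below imitates the idea in plain Python
with JSON. It is not ROOT's I/O, and the `Hit` class, its fields, and the
version numbers are hypothetical.

```python
import json


class Hit:
    """Hypothetical detector hit; version 2 added the `time` field."""

    CLASS_VERSION = 2

    def __init__(self, channel, energy, time=0.0):
        self.channel = channel
        self.energy = energy
        self.time = time

    def to_json(self):
        # Store the class version next to the data members.
        return json.dumps({
            "version": self.CLASS_VERSION,
            "channel": self.channel,
            "energy": self.energy,
            "time": self.time,
        })

    @classmethod
    def from_json(cls, text):
        record = json.loads(text)
        version = record["version"]
        if version > cls.CLASS_VERSION:
            raise ValueError(f"record version {version} is newer than this code")
        if version == 1:
            # Version 1 records have no `time`; fall back to the default.
            return cls(record["channel"], record["energy"])
        return cls(record["channel"], record["energy"], record["time"])


# A record written by the old (version 1) class is still readable.
old_record = '{"version": 1, "channel": 7, "energy": 12.5}'
old_hit = Hit.from_json(old_record)

# A record written by the current class round-trips unchanged.
new_hit = Hit.from_json(Hit(3, 4.25, time=1.5).to_json())
```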

There's an accompanying, but independently developed, file access protocol for
reading and writing ROOT files over a network, too [4].

On the other (subjective) hand, ROOT is regarded as a pain to use by ‘analysts’,
the people who use ROOT to make the results that go into physics papers.
There are already some good, old-but-still-valid critiques [5, 6], so I won't
say too much, but I think a large part of the problem comes from two things:

1\. ROOT tries its best to do everything that a particle physicist might want
to do. This encompasses a very wide range of things, and this has led to ROOT
having a very large, often intractable codebase that cannot be modularised.

2\. It has failed to keep up with contemporary coding techniques and analysis
methods. Most of the PhD students I know use the Python interface to ROOT, and
yet the ROOT developers are planning to drop Python support for the next major
version (ROOT 7, which is expected in 2018). Those that do use C++ aren't able
to use even C++11 effectively with ROOT, as its interfaces aren't compatible.

Luckily, I'm confident that analysts will move to a better way. I've been very
encouraged by the astrophysics and machine learning communities in particular,
who are using Python to do low- and high-level analysis on large datasets, as
we do in particle physics, and are producing fantastic results. Tools like
pandas, matplotlib, and scikit-learn are an absolute _joy_ to use in
comparison with ROOT, and the communities within the Python ecosystem are
wonderful: they foster very open code development, and value readable, well-
documented, fast code.

I don't need ROOT to get any better, because I think the future is already
here.

[1]: [https://root.cern.ch/root/html534/guides/users-guide/InputOu...](https://root.cern.ch/root/html534/guides/users-guide/InputOutput.html#inputoutput)

[2]: [https://root.cern.ch/cint-prompt](https://root.cern.ch/cint-prompt)

[3]: [https://root.cern.ch/pyroot](https://root.cern.ch/pyroot)

[4]: [http://xrootd.org](http://xrootd.org)

[5]: [http://www.insectnation.org/articles/problems-with-root.html](http://www.insectnation.org/articles/problems-with-root.html)

[6]: [http://www.insectnation.org/articles/root-wishlist.html](http://www.insectnation.org/articles/root-wishlist.html)

~~~
karies
Background upfront: I'm the guy behind the C++ interpreter and ROOT's new
interfaces. I'm the co-author of the only surviving C++ reflection proposal
and the author of the std::variant proposal. I have contributed to the C++
Core Guidelines
([http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines](http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines)
[https://youtu.be/1OEu9C51K2A](https://youtu.be/1OEu9C51K2A)).

* HEP stores about 0.5 exabytes of data in ROOT format, that's almost exclusively serialized objects that do not know anything about TObject.

* XRootD is not really specific for ROOT files. A better example would maybe be our JavaScript de-serialization library, [https://root.cern.ch/js/](https://root.cern.ch/js/)

* No way will the Python binding be dropped. I wonder where you got that rumor from. About one third of our users are using it.

* HEP is limited by CPU resources, which is part of the reason why HEP decided to use a close-to-bare-metal language for the number crunching part.

* We just made the use of Python and R multivariate analysis tools with ROOT data more straightforward.

* We have people from genomics etc coming to ask for help, because they cannot find a system that scales as well as ROOT does.

And then we have a different perception of the direction out there. I see that
Hadoop was nice but slow, Spark is nice but slow, so now things are moving to
C++, see e.g. ScyllaDB. There is no reason for us to move away from it, but
every reason to make it more usable.

And yes, I agree that this is an issue. But many physicists do not.

~~~
alxprc
Thanks for clarifying. You're right that I was too broad, and it's certainly
true that many physicists don't share my opinion (I'm working on that).

Speed is always a concern, but I don't think it dictates that C++ should be
the primary ‘user-facing’ interface. Numpy is fast, but it doesn't sacrifice a
nice API to achieve it.

Personally, a big difference is that a lot of the Python packages feel fast to
_use_ and, most importantly, to _write_. ROOT can be fast to execute, no
question, but I feel like I'm fighting against it (and I'm sorry that's very
vague and qualitative).

It would be very interesting to hear more about the genomics use-case, and how
they evaluated the other options.

~~~
whyever
I'm using Python for analysis, and I'm running into performance issues
constantly.

------
pjmlp
Although I never used ROOT while at CERN, it surely was part of many of our
discussion subjects at ATLAS-DAQ.

Nice to see it on HN.

~~~
pif
I was in ATLAS DAQ, too, and I'm happy I never had to use too much of ROOT,
too. PAW, on the other hand, it was... charming!

~~~
pjmlp
My area was L2PU and Dataflow related, a decade ago.

So also not that much use of PAW as well.

~~~
pif
I was in the online monitoring group, ever heard about GNAM? And it was around
a decade ago, too.

But PAW was earlier, in the KLOE experiment for my graduation thesis.

------
jbmorgado
My biggest advice about ROOT is: Don't use it really.

Look, ROOT is a very complex framework for data gathering and analysis built
by _physicists_ and it shows every step of the way. The bugs are everywhere and
it does really weird things like setting global variables when you analyze
some piece of data for instance, changing your results for all subsequent
analysis (this particular bug cost me about 2 weeks).
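
The failure mode is easy to reproduce in any language. A minimal sketch in
Python, not ROOT code: a helper quietly changes a module-level setting, and a
later, unrelated calculation gives a different answer. The names `_settings`,
`plot_normalised`, `mean_weight`, and `mean_weight_explicit` are made up for
the example.

```python
# Module-level state shared by every function below (the "global").
_settings = {"weight_scale": 1.0}


def mean_weight(weights):
    """Average of the weights, scaled by the current global setting."""
    scale = _settings["weight_scale"]
    return scale * sum(weights) / len(weights)


def plot_normalised(weights):
    """Pretend plotting helper with a hidden side effect."""
    # Changes the global and never restores it.
    _settings["weight_scale"] = 1.0 / sum(weights)
    return [w * _settings["weight_scale"] for w in weights]


def mean_weight_explicit(weights, scale=1.0):
    """Same calculation, but the scale is an explicit argument."""
    return scale * sum(weights) / len(weights)


weights = [1.0, 2.0, 3.0, 6.0]
before = mean_weight(weights)             # plain average of the weights
plot_normalised(weights)                  # looks harmless
after = mean_weight(weights)              # same call, now a different result
explicit = mean_weight_explicit(weights)  # unaffected by the global
```

Passing the setting as an argument, as in `mean_weight_explicit`, keeps each
result independent of whatever ran earlier in the session.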

And in the end, there isn't really any point in using ROOT.

\- Data gathering can be done with a simple CSV (binary if you wish), a more
advanced SQL database, or in the realm of research with the venerable HDF5
format.

\- Data analysis in C++ or any compiled language just doesn't make much
sense. You can use Python or R. The libraries to read and treat data are
optimized and will make the process much less error prone and probably faster
in the end.

Seriously, don't make the same mistakes as I did just because some older
people in your lab use ROOT and you feel compelled to do it as well. There are
much better tools for the job and I regret not searching for them before
wasting about 6 months of my PhD thesis trying to integrate ROOT in my
research workflow.

~~~
pif
> My biggest advice about ROOT is: Don't use it really.

I partially agree: don't use it as a framework, but do use its libraries, they
are good!

~~~
batbomb
A good chunk of its libraries are re-exported open source libraries exporting
alternate/C++ interfaces though! For example, GSL, FFTW3, and more than a few
others.

I will say that it is nice that it has most any math function you will need. I
know people who get super frustrated when they can't find a Landau
distribution in whatever language/library they are using and then just go back
to ROOT at the end of the day.
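
For anyone stuck without one, the standard Landau density can be evaluated
straight from its integral representation,
p(x) = (1/pi) * integral from 0 to infinity of exp(-t ln t - x t) sin(pi t) dt.
The sketch below does this with Simpson's rule in plain Python. It is not how
ROOT computes it, and the function name `landau_pdf`, the cutoff `t_max`, and
the step count are choices made for this example. It is only meant for
moderate x: for large negative x the integrand oscillates with a huge
amplitude and the result loses accuracy or overflows.

```python
import math


def landau_pdf(x, t_max=40.0, steps=40000):
    """Standard Landau density (location 0, scale 1) by numerical integration."""

    def integrand(t):
        if t == 0.0:
            return 0.0  # limit of the integrand as t -> 0
        return math.exp(-t * math.log(t) - x * t) * math.sin(math.pi * t)

    # Composite Simpson's rule on [0, t_max]; `steps` must be even.
    h = t_max / steps
    total = integrand(0.0) + integrand(t_max)
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0
        total += weight * integrand(i * h)
    return total * h / (3.0 * math.pi)


# The density peaks slightly below zero and has a long tail to the right.
peak = landau_pdf(-0.2228)
left = landau_pdf(-1.5)
right = landau_pdf(1.5)
```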

------
sklogic
And I am still reaching for CERNLIB (PAW) any time I need to plot something.
Could never understand why Rene Brun, Perevozchikov, et al. got so attracted
to the OOP back then.

~~~
pjmlp
Back then OOP was everywhere in C++ world.

I think that HNers that bash J2EE and JEE designs never had the "pleasure" to
enjoy mid-90's C++ OO frameworks.

~~~
sklogic
Yet, it was really upsetting for me to see the smart and very experienced guys
who wrote the beautiful CERNLIB fall into this.

