
Particle physicists turn to AI to cope with CERN’s collision deluge - okket
https://www.nature.com/articles/d41586-018-05084-2
======
danbruc
A simplified model of the task. You throw 10,000 springs of various sizes
(particle trajectories) into a box and record the intersection points (hits)
with a set of (spring-penetrable) nested cylinders (detectors) in that box.
You will get on average about 10 intersection points per spring. Now given
those 100,000 intersection points, reconstruct the size, position, and
orientation of the 10,000 springs. To be geometrically closer to reality, let's
add that all the springs are aligned along one axis and that the cylinders are
nested around that axis.

The difficulty comes from possible ambiguities about which collection of
springs caused the observed intersection points, imperfections in the helical
shape of the springs and the cylindrical shape of the detectors, the limited
resolution of the measured intersection points, and missed (detector
efficiency "only" 99 %) or spurious (detector noise) intersection points.
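
To get a feel for the inverse problem, here is a tiny toy version of that
picture in Python. It is not the actual TrackML simulation; the layer radii,
helix parameters, and noise levels are all made up, but it produces the same
kind of unlabelled hit cloud:

    # Toy version of the "springs in a box" picture: helices along the z axis,
    # hits where they cross nested cylinders. Not the real TrackML simulator.
    import numpy as np

    rng = np.random.default_rng(0)
    cylinder_radii = np.linspace(0.1, 1.0, 10)       # 10 nested detector layers

    def random_helix():
        """Helix parameters: bending radius, initial direction, pitch along z."""
        return rng.uniform(0.3, 2.0), rng.uniform(0, 2 * np.pi), rng.uniform(-0.5, 0.5)

    def hits_for_helix(rho, phi0, pitch, sigma=0.002):
        """March along the helix from the origin and record layer crossings."""
        t = np.linspace(0, 2 * np.pi, 2000)          # turning angle
        cx, cy = rho * np.cos(phi0 + np.pi / 2), rho * np.sin(phi0 + np.pi / 2)
        x = cx + rho * np.cos(phi0 - np.pi / 2 + t)  # circle through the origin
        y = cy + rho * np.sin(phi0 - np.pi / 2 + t)
        z = pitch * t
        r = np.hypot(x, y)
        hits = []
        for R in cylinder_radii:
            crossing = np.nonzero(np.diff(np.sign(r - R)))[0]
            if crossing.size:                        # keep the first crossing, smeared
                i = crossing[0]
                hits.append((x[i] + rng.normal(0, sigma),
                             y[i] + rng.normal(0, sigma),
                             z[i] + rng.normal(0, sigma)))
        return hits

    all_hits = []
    for _ in range(10_000):
        all_hits.extend(hits_for_helix(*random_helix()))
    print(len(all_hits), "unlabelled hits to untangle back into 10,000 tracks")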

~~~
Coding_Cat
Plus, the detector is multi-layered around the point of collision, with each
layer measuring something slightly different and with different accuracies.
Some springs are even completely invisible, and you can only find them by
looking at what is missing (electrically neutral particles).

------
konschubert
What's the news here? Using Machine Learning in data analysis is common
practice in particle physics.

Is the news that they want to use ML in the trigger selection now?

Also, which one of the four experiments is doing this?

EDIT: Ah, it's CMS.

~~~
dukwon
> Is the news that they want to use ML in the trigger selection now?

Not quite (and that wouldn't be news). It's using ML for track reconstruction.
Not even LHCb does this.

~~~
tempay
LHCb does use ML for trigger selections already[1].

[1]
[https://cds.cern.ch/record/2243560?ln=en](https://cds.cern.ch/record/2243560?ln=en)

~~~
dukwon
Yep. I meant we don't use it for track reconstruction.

------
konschubert
The problem with AI in particle physics is that you need to understand the
efficiency of the selection (the rates of false-negative and false-positive
classification) very well. And this rate of course isn't uniform across all
kinds of events. Thus, the uncertainty on the selection efficiency tends to
grow with the complexity of the machine-learning model. This directly hurts
your sensitivity, the very thing you were trying to improve by using an
AI-based process.

It's a trade-off.

~~~
Analog24
As already pointed out, detector efficiency is determined through extremely
detailed simulations of the entire system. Those measurements are done in a
completely orthogonal manner, so the complexity of the model has nothing to do
with the systematic uncertainties that arise from the selection efficiency.

~~~
konschubert
> As already pointed out, detector efficiency is determined through extremely
> detailed simulations of the entire system.

It depends on the experiment; LHCb, for example, does not use simulated
background.

In any case, the more complex your model gets (in number of variables), the
more simulated Monte Carlo events you need to fill that multidimensional
space; the requirement grows exponentially with the number of dimensions
(with, say, 10 bins per variable, 5 variables already mean 10^5 cells to
populate).

~~~
dukwon
> LHCb for example does not use simulated background.

That depends on the analysis. If you're looking at a partially-reconstructed
decay (common in semileptonics) then you can't rely on the regular trick of
choosing a sideband sample to work as your combinatorial background. Also,
it's very common to model specific misidentified or partially-reconstructed
backgrounds using simulation.

> In any case, the more complex your model gets (number of variables) the
> exponentially more simulated Monte Carlo events you need to fill that
> multidimensional space.

I understand this argument if you're trying to model the efficiency in _n_ D
space (through splines, histograms, moments etc), but that's usually done when
you're fitting to _n_ variables, e.g. in an amplitude fit. If you just want
the efficiency of a cut on the score from an MVA algorithm, I don't think it
matters. What definitely matters is that the behaviour on simulation
reproduces that of real signal as faithfully as possible.
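
As a concrete (and heavily simplified) sketch of that last point: the
efficiency of a score cut is a single number measured on simulated signal,
regardless of how many input variables the MVA uses. The classifier scores and
the 0.8 threshold below are placeholders, not anything from a real analysis:

    # Sketch: efficiency of an MVA score cut, estimated on simulated signal,
    # with the usual binomial uncertainty. Scores and threshold are made up.
    import numpy as np

    def cut_efficiency(scores, threshold):
        passed = np.count_nonzero(scores > threshold)
        n = len(scores)
        eff = passed / n
        return eff, np.sqrt(eff * (1 - eff) / n)

    rng = np.random.default_rng(1)
    scores_mc = rng.beta(5, 2, size=100_000)   # stand-in for MVA scores on simulated signal
    eff, err = cut_efficiency(scores_mc, threshold=0.8)
    print(f"selection efficiency = {eff:.3f} +/- {err:.3f}")

The systematic question is then whether the score distribution on simulation
looks like the one for real signal, which is what has to be checked.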

~~~
konschubert
> I understand this argument if you're trying to model the efficiency in nD
> space (through splines, histograms, moments etc), but that's usually done
> when you're fitting to n variables, e.g. in an amplitude fit. If you just
> want the efficiency of a cut on the score from an MVA algorithm, I don't
> think it matters. What definitely matters is that the behaviour on
> simulation reproduces that of real signal as faithfully as possible.

You might have a point there.

------
DrNuke
Hard, hard problem; good luck to the competitors. The new Google-owned Kaggle
is a cheapskate trap, but it's still fun to spend a couple of hours a week
reading the forums and the most brilliant kernels.

------
mkagenius
It's very economical for companies to have hundreds of AI programmers work on
this (for a few months) for $25k.

But surprisingly, the value they get out of it is really only one team's work.

~~~
itissid
Can't they ask the top K teams for their solutions and offer them some smaller
prize? I believe Kaggle supports that. Even if you had 10 top teams of 5
people each and gave them a total of $300k, it's still damn cheap compared to
hiring 4 or 5 full-time engineers for under a year.

~~~
jononor
Usually in Kaggle they ask every team elegible for a prize for their solution.
I believe it is just as much to ensure there was no cheating/hacking, than to
actually make use of the solution itself.

------
dean177
“The top three performers of this phase hosted by Google-owned company Kaggle,
will receive cash prizes of US$12,000, $8,000 and $5,000.”

“incomparably more difficult”

~~~
lima
It's a competition, people aren't doing it for the prizes.

~~~
mkagenius
Yes, but many certainly are doing it for the prizes.

~~~
danbruc
I am not sure; that sounds like a bad idea. Your chances of being one of the
three winning teams are not terribly high, and even if you manage to win, $12k
for three months of work is not that much even if you work alone. Assuming
there will be someone submitting a solution obtained using state of the art
algorithms, it would probably be pretty naive to assume you have any
significant chance of winning by just spending a couple of hours and throwing
some generic machine learning algorithms at the problem.

It immediately looked like a really interesting challenge to me, but after
reading a bit about the state of the art it seems like three months is a
pretty short time to come up with a meaningful result, even if you could work
on it full-time. Many people have already invested a lot of time in this
problem, and existing solutions are quite sophisticated and good. The material
actually mentions that they expect you to have to take into account things
like adjacent detectors overlapping by a few pixels, or particles lighting up
several pixels when they hit the detector at a very shallow angle and cross
several pixels as they pass through it.

The first thing someone probably considers is something like a Hough
transformation, and it turns out the creators of the challenge mention that in
the material and submitted a solution based on it as a benchmark, achieving a
score of about 20 %. If I read the related documents correctly, a meaningful
result will require a score of at least about 90 %, and the state of the art
is probably somewhere around 95 % to 98 %. The current leader is at 26.48 %,
though admittedly the challenge is only 5 days old. I am really curious where
the scores will be at the end.

~~~
goldenkey
The US dollar is a very sought-after commodity in third-world countries. $12k
might be small change to you, but it's 10 years of wages for a smart kid in
another country.

~~~
rbanffy
Not nearly desirable enough to motivate someone to invest the time and
resources (which I assume would be non-trivial). 12k for 3 months of work
amounts to 4k per month, which is a low salary for someone with the kind of
skills this requires.

~~~
beojan
Even $5k for three months is comparable to the low end of the range for PhD
stipends, and $12k is well above the high end.

------
amenod
Someone posted what looks like a very nice summary of domain knowledge:

[https://www.kaggle.com/pranav84/beginner-s-guide-to-cern-s-particle-tracking-data/notebook](https://www.kaggle.com/pranav84/beginner-s-guide-to-cern-s-particle-tracking-data/notebook)

~~~
jononor
Here is a very to-the-point summary of particle tracking:
[http://www.physics.iitm.ac.in/~sercehep2013/track2_Gagan_Mohanty.pdf](http://www.physics.iitm.ac.in/~sercehep2013/track2_Gagan_Mohanty.pdf)
(link found in the Kaggle forums)

------
mattheww
The main thing of interest here is whether a system that knows about physics
(traditional approaches) can be beaten by a system with no a priori knowledge
of physics (out-of-the-box deep learning), or whether someone will find a way
to integrate physics knowledge into a DL approach.

------
jononor
Curious to see what kind of techniques end up doing best on this. I worked on
using GPUs in this area back in 2010 for the ALICE project, which was a pretty
straight port of an existing algorithm for finding certain phenomena given
tracks. I believe the track reconstruction back then used a multistage process
ending with a Kalman filter.
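
For illustration only, here is a very stripped-down version of that final
Kalman-filter stage: fitting a straight track in one projection, layer by
layer. A real tracker propagates a 5-parameter helix state through the
magnetic field and accounts for material effects; none of the numbers below
come from a real detector.

    # Minimal Kalman-filter track fit: state = (position, slope) in one projection,
    # predicted to each layer and updated with the measured hit there.
    import numpy as np

    def kalman_track_fit(z_layers, y_hits, sigma_hit=0.01):
        x = np.array([y_hits[0], 0.0])            # initial guess: first hit, zero slope
        P = np.diag([sigma_hit**2, 1.0])          # generous initial covariance
        H = np.array([[1.0, 0.0]])                # we only measure the position
        R = np.array([[sigma_hit**2]])
        for dz, y in zip(np.diff(z_layers, prepend=z_layers[0]), y_hits):
            F = np.array([[1.0, dz], [0.0, 1.0]])             # propagate to this layer
            x = F @ x
            P = F @ P @ F.T + np.diag([0.0, 1e-6])            # small process noise
            S = H @ P @ H.T + R                               # update with the hit
            K = P @ H.T @ np.linalg.inv(S)
            x = x + K @ (np.array([y]) - H @ x)
            P = (np.eye(2) - K @ H) @ P
        return x, P

    # A fake straight track with slope 0.3, measured on 10 layers with noise
    rng = np.random.default_rng(2)
    z = np.linspace(0.1, 1.0, 10)
    y = 0.3 * z + rng.normal(0, 0.01, size=10)
    state, cov = kalman_track_fit(z, y)
    print("fitted slope:", state[1])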

------
jaddood
I am sure the people at CERN have their reasons for choosing machine learning
for this purpose, but I don't see it as ideal for this case. Machine learning
might be able to give you results faster, but generally with much lower
accuracy. And even supposing the teams get their programs to a really good
accuracy, it is still tuned to the known and expected. In particle physics
research, or any research for that matter, you cannot simply give it a shot,
because your system might miss very interesting collisions while you think
it's working perfectly fine.

I think quantum computing could be really quite useful here, given that the
analysis can benefit from parallel processing. Should ML be necessary, I
recommend developing it in a way that makes its decisions understandable. (I'm
not sure they'll even read this comment to get the recommendation, but let's
give it a try.) Explainable AI is becoming really significant these days, and
some big companies and organisations are working on it. If I'm not mistaken,
DARPA is working on such a project, so it's really not too far off.

------
nonbel
>"In the new problem, she says, you have to find in the 100,000 points
something like 10,000 arcs of ellipse."

Does anyone know how long this currently takes?

~~~
danbruc
That certainly depends on the amount of computing power you can throw at it
and what level of accuracy you are trying to achieve. But the most naive
attempt using a Hough transformation will probably take a couple of
milliseconds on your average computer. However, the accuracy will be far from
good enough with such a simple approach.

I have no idea what amount of time state of the art algorithms will use, but I
could well imagine that it is essentially a tunable parameter that is chosen
to get the best accuracy given the amount of data you have to process and the
time available to complete the task.
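
To give a flavour of what such a naive attempt looks like, here is a toy Hough
transform for straight lines in 2D. It is not the benchmark code, and the bin
counts and test data are invented; real tracks are helices with more
parameters, which is where this approach starts to struggle (see the reply
below):

    # Toy (theta, r) Hough transform: every point votes for all lines it could
    # lie on, and peaks in the accumulator correspond to candidate tracks.
    import numpy as np

    def hough_lines(points, n_theta=180, n_r=200, r_max=2.0):
        thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
        acc = np.zeros((n_theta, n_r), dtype=np.int32)
        for x, y in points:
            r = x * np.cos(thetas) + y * np.sin(thetas)   # r = x cos(theta) + y sin(theta)
            r_bin = np.clip(((r + r_max) / (2 * r_max) * n_r).astype(int), 0, n_r - 1)
            acc[np.arange(n_theta), r_bin] += 1
        return acc, thetas

    # Two noisy lines, 50 hits each: the tallest accumulator peaks recover them.
    rng = np.random.default_rng(3)
    x = rng.uniform(-1, 1, 100)
    y = np.where(np.arange(100) < 50, 0.5 * x + 0.2, -0.8 * x - 0.1) + rng.normal(0, 0.01, 100)
    acc, thetas = hough_lines(np.column_stack([x, y]))
    peak = np.unravel_index(np.argmax(acc), acc.shape)
    print("strongest line candidate: theta =", thetas[peak[0]], "with", acc[peak], "votes")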

~~~
mattheww
Hough transform is basically useless for precision tracking in a high
multiplicity environment. Tracks have 5 degrees of freedom, so the memory
costs make it infeasible. I think ATLAS uses it in the trigger, where you
don't need to actually reconstruct all tracks, just find out if there are a
couple passing certain criteria.

~~~
danbruc
You don't have to accumulate the transformation result into an actual five
dimensional array, you can just transform the points, keep them in a list, a
quad tree, or whatever you like, and then just run a clustering algorithm on
the transformed points. Probably complicated by the fact that every point can
vote for several parameters so that you are not actually clustering points but
something like lines or planes associated with the points.

That also seems to be, though I did not look at the code, more or less what
the creators of the challenge implemented and submitted as a benchmark
implementation, admittedly with the expected poor performance score of only
about 20 %.
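
Roughly, the idea is something like the sketch below: map each hit into
quantities that vary only slowly along a track from the origin, and let DBSCAN
group them. The choice of features here is my own guess for illustration, not
what the organisers actually did:

    # Sketch of "transform the hits, then run a clustering algorithm":
    # hits from the same track should land close together in feature space.
    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.preprocessing import StandardScaler

    def cluster_hits(hits, eps=0.01):
        """hits: (N, 3) array of x, y, z positions; returns one label per hit."""
        x, y, z = hits[:, 0], hits[:, 1], hits[:, 2]
        r = np.sqrt(x**2 + y**2 + z**2)
        features = StandardScaler().fit_transform(np.column_stack([x / r, y / r, z / r]))
        return DBSCAN(eps=eps, min_samples=3).fit_predict(features)

    # labels = cluster_hits(hit_positions)   # label -1 marks unclustered noise hits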

~~~
mattheww
If you know a way to find clusters of intersections of hyperplanes, I'm pretty
sure you can get a highly acclaimed paper out of it, but that is not what a
Hough transform is. The Hough transform is an approximate solution to that
problem which works by sampling the hyperplanes and then polling them. There's
no way to perform a Hough transform without having many more transformed
points than input points, and the more precision you need, the more points you
need.

~~~
danbruc
I see, they actually did two different implementation, one DBSCAN based
clustering approach, one based on a Hough transformation, and submitted the
former one as a benchmark.

Notwithstanding that, I am not yet convinced that a Hough transformation
combined with something similar to a quad tree could not work. More
specifically, I am thinking of delaying the creation of votes. Roughly, the
first point just becomes a node in the tree corresponding to the bounding box
of its entire possible parameter space. Only when we encounter a second point
whose possible parameter space overlaps with that of the first one do we split
the two volumes up into one volume for which both points vote and a few
volumes for which only one of the points votes.

This obviously requires that the possible parameter spaces do not have
terrible shapes that are hard to bound, and I could also see nearly perfectly
overlapping volumes causing issues by generating many small volumes for the
imperfections in the overlap. There are probably more issues and possibly even
showstoppers, but without picking up a pencil and really thinking about it, I
cannot really tell whether or not it could work out. But, as I said, I am also
unable to see immediately why this could never work.

------
JumpCrisscross
Out of curiosity, what fraction of the Bitcoin network’s computing power would
it take to get past this computational wall?

~~~
amelius
Is there even a bound on the computational power we'd need to make an
advancement here?

~~~
jrq
No, but CERN is known to be careful with their PR. Presumably, and this is
just my intuition speaking, a big enough cluster of computers would solve
this, but they're taking an opportunity to experiment with different
techniques and methods for this experiment, and that's pretty much it.

If CERN had an unlimited budget, I suspect they'd do it however they did it
before.

~~~
Analog24
This is not a problem that can be solved just by throwing more compute
resources at it. It's not simply that there is too much data to process; the
real issue is that each detector has a time resolution that goes down to about
a nanosecond. If you get one collision per nanosecond, then it's pretty
straightforward to associate every one of the (possibly) million detector hits
with a single event and reconstruct it accordingly. The issue arises when you
have more than one event (i.e. collision) within each nanosecond window. You
end up with detector readings for each event overlapping each other without a
simple way of disambiguating them. This is called "pile-up".

When I last worked there in 2015, a typical pile-up situation was about 50
collisions per detector reading. It is no simple problem to simultaneously
reconstruct 50 collisions from the same set of overlapping detector
measurements.

~~~
dukwon
From what I've heard, the amount of pile-up ATLAS and CMS can handle is
limited by the CPU time it takes to reconstruct events, which _can_ be
alleviated by throwing more resources at it, but it is much better to develop
quicker reconstruction algorithms.

Towards the end of last year, they had to start levelling the instantaneous
luminosity to 75% of what they could achieve,† primarily to reduce the load on
the grid.

† Edit: the maximum peak luminosity is still 200% of the design value, so the
performance is beyond initial expectations.

~~~
mattheww
To be fair, 50 PU is above the design peak luminosity, let alone the mean. And
I'm sure I've seen plots from both ATLAS and CMS at the end of LS1 that show
improvements in the processing time at 100 PU by factors of roughly 10.

