
CERN has released 300 terabytes of research data from LHC - elorant
http://www.symmetrymagazine.org/article/lhc-data-at-your-fingertips
======
throwaway_yy2Di
Tangentially, there are even larger public datasets coming out of astronomy.
PanSTARRS' imaging survey will make 2 PB available online. They even have a
picture [0] of the completed dataset in transit, if you wondered what 2 PB of
HDDs on a flatbed looks like.

[0]
[https://archive.stsci.edu/mug/mug_2016/PS1_MUG_2016jan14.pdf...](https://archive.stsci.edu/mug/mug_2016/PS1_MUG_2016jan14.pdf#page=11)

~~~
chris_va
'Never underestimate the bandwidth of a station wagon full of tapes hurtling
down the highway.'

~~~
n00b101
2PB (5,000 lbs) / 4 days = 46 Gbps (excluding setup time)

Not bad, but not extraordinary
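
For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation. It assumes decimal units (1 PB = 10**15 bytes) and ignores setup time, as stated above; the function name is just illustrative.

```python
# Effective bandwidth of physically shipping a dataset.
# Figures come from the comment above: 2 PB moved in 4 days.
# Assumption: decimal units (1 PB = 10**15 bytes), setup time excluded.

PETABYTE = 10**15  # bytes, decimal definition


def shipping_bandwidth_gbps(size_bytes: float, transit_days: float) -> float:
    """Return the average throughput in gigabits per second."""
    bits = size_bytes * 8
    seconds = transit_days * 24 * 60 * 60
    return bits / seconds / 1e9


if __name__ == "__main__":
    gbps = shipping_bandwidth_gbps(2 * PETABYTE, 4)
    print(f"{gbps:.1f} Gbps")  # prints "46.3 Gbps"
```

Note that the figure is an average over the whole trip; it says nothing about latency, which here is four days.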

~~~
fra
You should consider the impact of distance here as well...

------
jsprogrammer
>“Once we’ve exhausted our exploration of the data, we see no reason not to
make them available publicly,” says Kati Lassila-Perini, a CMS physicist who
leads these data preservation efforts.

Why wait until they've exhausted their efforts?

~~~
noir_lord
They might want first crack at any major discoveries; if they miss something,
then everyone else gets a crack.

Seems reasonable to me.

~~~
jsprogrammer
Politics and fame over understanding and progress?

~~~
jefe78
If you don't like it, pay for it.

~~~
pkaye
Who paid for this research anyway? Taxpayers or private organizations?

~~~
sgift
CERN is funded by its member states, i.e. taxpayers:

[https://en.wikipedia.org/wiki/CERN#Participation_and_funding](https://en.wikipedia.org/wiki/CERN#Participation_and_funding)

------
hartem_
It's cool seeing technology developed at CERN in the spotlight. There are a
lot of interesting tools developed there that can solve real problems outside
CERN and academia.

One such technology, featured in the article, is the CernVM File System, which
is used to distribute terabytes of scientific software to hundreds of
datacenters all over the world.

A shameless plug:

Apache Mesos recently integrated it to solve the container image distribution
problem
([https://mesosphere.com/blog/2016/03/08/cernvmfs-mesos-contai...](https://mesosphere.com/blog/2016/03/08/cernvmfs-mesos-containers/)).

~~~
Create
_A shameless plug:_

Given that cheap and disposable trainees — PhD students and postdocs — fuel
the entire scientific research enterprise, it is not surprising that few
inside the system seem interested in change. A system complicit in this sort
of exploitation is at best indifferent and at worst cruel.

[http://www.nature.com/news/2011/110302/full/471007a.html](http://www.nature.com/news/2011/110302/full/471007a.html)

Potential missing staff in some areas is a separate issue, and educational
programmes are not designed to make up for it. On-the-job learning and
training are not separated but dynamically linked together, benefiting to both
parties. In my three years of operation, I have unfortunately witnessed cases
where CERN duties and educational training became contradictory and even
conflicting.

[http://ombuds.web.cern.ch/blog/2013/06/lets-not-confuse-stud...](http://ombuds.web.cern.ch/blog/2013/06/lets-not-confuse-students-and-fellows-missing-staff)

Resolution of the Staff Council

- the Management does not propose to align the level of basic CERN salaries
with those chosen as the basis for comparison;

- in the new career system a large fraction of the staff will have their
advancement prospects, and consequently the level of their pension, reduced
with respect to the current MARS system;

- the overall reduction of the advancement budget will have a negative impact
on the contributions to the CERN Health Insurance System (CHIS);

[http://cds.cern.ch/journal/CERNBulletin/2015/46/Staff%20Asso...](http://cds.cern.ch/journal/CERNBulletin/2015/46/Staff%20Association/2063669?ln=en)

And a warning to non-western members:

"The cost [...] has been evaluated, taking into account realistic labor prices
in different countries. The total cost is X (with a western equivalent value
of Y)" [where Y>X]

source: LHCb calorimeters : Technical Design Report

ISBN: 9290831693 cdsweb.cern.ch/record/494264

_A shameless plug:_

The Dangers of Self-Reference

Public relations pioneer Edward Bernays refined the creation and use of press
releases.

Propaganda was used by the United States, the United Kingdom, Germany and
others to rally for domestic support and demonize enemies during the World
Wars, which led to more sophisticated commercial publicity efforts as public
relations talent entered the private sector. Most historians believe public
relations became established first in the US by Ivy Lee or Edward Bernays (he
felt this manipulation was necessary in society), then spread internationally.
Many American companies with PR departments spread the practice to Europe when
they created European subsidiaries as a result of the Marshall plan.

------
swagopopotamus
If only I could run to Fry's and buy a 300TB hard drive.

~~~
daveguy
Well, you can "run out" and buy a 180TB 4U Backblaze Storage Pod assembled for
about $10,500. For $21,000 you can buy two and have 60TB to spare. It's
$8,500/$17,000 if you want to DIY. Not too bad:

[https://www.backblaze.com/blog/cloud-storage-hardware/](https://www.backblaze.com/blog/cloud-storage-hardware/)
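
A rough sketch of that sizing math, using only the numbers quoted above (180TB per pod, about $10,500 assembled, about $8,500 DIY). It assumes raw capacity with no allowance for redundancy or filesystem overhead, and the function names are made up for illustration.

```python
import math

# Figures quoted in the comment above; treat them as approximate.
POD_CAPACITY_TB = 180
ASSEMBLED_PRICE_USD = 10_500
DIY_PRICE_USD = 8_500


def pods_needed(dataset_tb: float, pod_tb: float = POD_CAPACITY_TB) -> int:
    """Smallest whole number of pods whose raw capacity covers the dataset."""
    return math.ceil(dataset_tb / pod_tb)


def plan(dataset_tb: float) -> dict:
    """Pod count, spare raw capacity, and total cost for both buying options."""
    n = pods_needed(dataset_tb)
    return {
        "pods": n,
        "spare_tb": n * POD_CAPACITY_TB - dataset_tb,
        "assembled_usd": n * ASSEMBLED_PRICE_USD,
        "diy_usd": n * DIY_PRICE_USD,
    }


if __name__ == "__main__":
    # 300 TB is the size of the CERN release.
    print(plan(300))
    # {'pods': 2, 'spare_tb': 60, 'assembled_usd': 21000, 'diy_usd': 17000}
```

Any RAID or replication scheme would eat into the 60TB of headroom, so a real deployment might need a third pod.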

~~~
swagopopotamus
Whoa! There have been a lot of improvements since the initial revision.

~~~
atYevP
More to come...soon ;-)

~~~
atYevP
OK it came ->
[https://www.backblaze.com/blog/open-source-data-storage-serv...](https://www.backblaze.com/blog/open-source-data-storage-server/)

See! Soon!

------
LoSboccacc
The LHC experiments should be sensitive to a wide range of factors. I wonder
if randomly correlating every variation in the results obtained under the same
conditions could reveal some unexpected correlations, such as between particle
path variation and earthquakes (just speculating here, not putting a theory
forward).

------
visarga
... on wikileaks!

~~~
Create
In keeping with this spirit, here is a reminder of how we monitor (your) CERN
activities. We monitor all network traffic coming into and going out of CERN.

Our new analysis infrastructure will be able to cope with the automatic live
analysis of about one terabyte of data every day. All this data is stored for
one year.

Transparent monitoring for your protection

------
Achshar
This [1] is apparently the data released. I am no physicist, but that page
doesn't exactly inspire awe among the curious-minded.

They do explain in the original release article how a couple of undergrads
were able to use the data to create something meaningful, but that specific
site could definitely use a UX designer, or two.

[1]
[http://opendata.cern.ch/search?ln=en&p=Run2011A+AND+collecti...](http://opendata.cern.ch/search?ln=en&p=Run2011A+AND+collection%3ACMS-Primary-Datasets+OR+collection%3ACMS-Simulated-Datasets+OR+collection%3ACMS-Derived-Datasets)

~~~
noir_lord
Indeed, I'll use the data from the other LHC with the pretty site instead.

~~~
Achshar
I don't mean it has to be pretty, but that is not even pleasant to look at. I
can provide all the useful data in the world, but if its accessibility is low
then its value is greatly reduced.

~~~
noir_lord
I doubt it makes that much difference in reality; the value is in the data,
and since this data is unique and from a single source, I can't see it
mattering.

Not arguing the value of accessibility, but in this case it's a nice-to-have
rather than an essential.

