1EB with only 30k users, that's a wild TB-per-user ratio. My frame of reference: the largest storage platform I've ever worked on was a combined ~60PB (give or take), and that had hundreds of millions of users.
When experiments are running, the sensors generate about 1PB of data per second. They have to do multiple (I think four?) layers of filtering, including at the hardware level, to get down to actually manageable numbers.
It depends on which experiment. We call it the trigger system, and it varies with each experiment's requirements and physics of interest. For example, LHCb now runs its full trigger system in software (no hardware FPGA triggering), mainly utilizing GPUs for that. That would be hard to achieve under the harsher conditions and requirements of CMS and ATLAS.
But yes, at LHCb we discard about 97% of the data generated during collisions.
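For a rough sense of the cascade, here's a back-of-envelope sketch in Python; the per-stage keep fractions are made-up illustrative values, not an actual trigger menu:

    # Toy model of the trigger cascade; all fractions are assumptions.
    detector_rate = 1e15  # ~1 PB/s off the sensors while beams collide

    stages = [                        # fraction of data kept per stage
        ("hardware/readout filter", 0.001),
        ("software trigger pass 1", 0.05),
        ("software trigger pass 2", 0.2),
    ]

    rate = detector_rate
    for name, keep in stages:
        rate *= keep
        print(f"after {name}: {rate / 1e9:,.0f} GB/s")
    # Only the last few GB/s ever reach disk or tape.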
>1EB with only 30k users, that's a wild TB-per-user ratio.
33TB per user is a lot, but is it really "wild"? I can fill up well over 33TB of pro-camera photos in less than a year if I shoot every day. I'm sure scientists can generate quite a bit more data if they're doing big things like CERN does.
Rucio enables centralized management of large volumes of data backed by many heterogeneous storage backends.
Data is physically distributed over a large number of storage servers, each potentially relying on different storage technologies (SSD/Disk/Tape/Object storage) and frequently managed by different teams of system administrators.
Rucio builds on top of this heterogeneous infrastructure and provides an interface which allows users to interact with the storage backends in a unified way. The smallest operational unit in Rucio is a file. Rucio enables users to upload, download, and declaratively manage groups of such files.
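In practice that unified interface is a CLI plus a Python client. A minimal sketch of the Python side, going by the Rucio docs and assuming a configured Rucio environment; the scope, dataset name, and RSE expression below are placeholders, and exact signatures may differ between versions:

    # Minimal Rucio client sketch; names are placeholders.
    from rucio.client import Client

    client = Client()

    # List data identifiers (DIDs) in a scope.
    for did in client.list_dids(scope="user.jdoe", filters={"name": "*"}):
        print(did)

    # Declarative management: ask for 2 replicas of a dataset on some set
    # of storage elements and let Rucio schedule the transfers needed to
    # satisfy the rule.
    client.add_replication_rule(
        dids=[{"scope": "user.jdoe", "name": "my_dataset"}],
        copies=2,
        rse_expression="TAPE_SITES",  # placeholder RSE expression
    )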
Tape and off-site replicas at globally distributed data centres for science.
Of the 1EB, a huge amount is probably in automated recall and replication, with "users" running staged processing of the data at different sites, ultimately reducing it to a "manageable" GB-TB level for scientists to do science.
Yup, lots of tape for stuff in cold storage, and then some subset of that on disk spread out over several sites.
It's kinda interesting to watch anything by Alberto Pace, the head of storage at CERN, to get an understanding of the challenges and constraints: https://www.youtube.com/watch?v=ym2am-FumXQ
I was basically on the helpdesk for the system for a few years so had to spend a fair amount of time helping people replicate data from one place to another, or from tape onto disk.
For experiment data, there is a layer on top of all of this that distributes datasets across the computing grid. That system has a way to handle replication at the dataset level.
> over the years what discoveries have been made at CERN that have had practical social and economic benefits to humanity as a whole?
Some responders to the question believe I was criticizing a supposed wastefulness of the research. Not knowing the benefits of the discoveries in high energy physics, i.e. the stuff the accelerators are actually built to discover, doesn't mean I was criticizing it.
Responses referenced the contributions made by the development of the infrastructure supporting the basic research itself, which is fine, but not the benefits of high energy physics discoveries.
So to rephrase the question - What are the practical social and economic benefits to society that the discoveries in high-energy particle physics at institutions like CERN have made over the years?
This is not just in relation to CERN, but worldwide, such as those experiments which create pools of water deep underground to study cosmic rays, etc.
So, a large chunk of the benefits are more in the form of 'side-effects' than directly fueled by particle physics discoveries. This is kind of by definition, since the point of particle accelerators as powerful as the LHC is to replicate conditions that cause subatomic particles to fall apart. Same goes for things like neutrino detectors or gravitational wave detectors: they're all looking for things that are barely observable even with those engineering marvels, and we're a long way away from being able to exploit their discoveries economically.
One of the biggest and most 'direct' social and economic benefits (in the sense of being directly associated with high-energy particle physics) would be the development of synchrotron light sources, which contribute much more directly to society. In typical particle accelerators, the emission of synchrotron light is a negative effect, but it turns out to be pretty valuable for materials science. These facilities are usually so busy that they have to pick and choose the study proposals they accept. As an example, some of the early understanding of Covid-19's structure came from studies at synchrotrons. More recently, there are startups attempting to use synchrotron tech to sterilize things.
Besides that, it's mainly indirect effects. A lot of the cost of building and updating these sorts of machines goes towards developing improved sensors, cryogenics, magnets, optics, control systems, networking systems, etc. These all feed into other fields and other emerging technologies.
> Responses referenced the contributions made by the development of the infrastructure supporting the basic research itself, which is fine, but not the benefits of high energy physics discoveries.
I was one of those responders.
There were two very deliberate reasons I specifically avoided talking about particle physics:
1. I interpreted the tone of the original question as highly cynical of any scientific contribution particle physics has made, so I instead went for 'consequential' things: excitement around education, outreach, and other adjacent aspects that are beneficial to humans. I did this to avoid "How has discovering a new boson made my rent cheaper?" types of arguments, which are only ever made in bad faith but have been made to me a disheartening number of times in my career; and
2. I am a scientist and I have collaborators and colleagues at CERN, but I'm not a particle physicist, so I didn't feel adequately qualified to highlight them. I was expecting someone with more expertise would jump in and simply do a better job than I ever could.
If I interpreted the tone of your question incorrectly, please understand that it wasn't an intentional slight, simply an artefact of a) plain text being an incredibly poor medium for communicating nuance; and b) a defensive measure against the kind of bad faith I have had the displeasure of dealing with in the past. And if you were genuinely curious, that's wonderful, and I'm sorry that I didn't offer you more grace in my response.
You're probably getting replies like that because it's a bit of an odd question. Academic research isn't really done to achieve a particular purpose or goal. The practical benefit literally is academic.
It's also one of the first questions from people who very much are criticizing, so even if it was a sincere question it will get lumped in with those. Not recognizing/addressing this when posing the question does nothing to prevent the lumping.
IIRC I had issues with inotify when I was editing files on a remote machine via SSHFS while those files were being used inside a Docker container. inotify inside the container did not trigger the notifications, whereas it did when editing a file with an editor directly on that host.
I think this was related to FUSE: Docker just didn't get notified.
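That matches my understanding: inotify is a kernel-local mechanism, so writes happening on the far side of a network/FUSE mount never generate events on your side. The usual workaround is polling; a minimal sketch using the third-party watchdog package (the mount path is a placeholder):

    # Polling fallback for mounts where inotify doesn't fire (e.g. SSHFS).
    import time
    from watchdog.events import FileSystemEventHandler
    from watchdog.observers.polling import PollingObserver

    class PrintChanges(FileSystemEventHandler):
        def on_modified(self, event):
            print("changed:", event.src_path)

    observer = PollingObserver(timeout=2)  # re-stat the tree every 2s
    observer.schedule(PrintChanges(), "/mnt/remote/project", recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    finally:
        observer.stop()
        observer.join()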
FUSE Passthrough is only useful for filesystems that wrap an existing filesystem, such as union mounts. Otherwise, you don't have an open file to hand over.
yeah but still not great for metadata operations, no?
i remember it was really not great for large sets of search paths because it defeated the kernel's built-in metadata caches with excessive context switching?
Dumb question: is it right to think that the experiments' results are reproducible? If so, what's the value in keeping results from the distant past, given the data is generated at this enormous rate?
Well generally yes, but that isn’t how it works there.
Since the things they want to measure in their experiments are so atomically small, sensor noise becomes a huge problem.
So it’s not enough to find the sensor readings for NewMysteriousParticleX to be sure that it actually exists; it could just have been noise.
So you have to run the experiment again and again until your datapoint is statistically significant enough that you are sure it wasn’t just noise.
A couple of years ago there was a case where they almost found a new particle; the significance was pretty close to the threshold. The problem was that this particle was not expected and would have shaken the foundations of particle physics. Some weeks later the particle had vanished back into the abyss of noise.
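A toy illustration of why more data settles the question: in a simple counting experiment, the naive significance of an excess grows roughly with the square root of the sample size (the numbers below are made up):

    # Naive counting-experiment significance: s signal events over an
    # expected background of b events, z = s / sqrt(b).
    # Convention: ~3 sigma is "evidence", 5 sigma is "discovery".
    from math import sqrt

    def naive_z(signal, background):
        return signal / sqrt(background)

    print(naive_z(30, 100))    # 3.0 sigma: interesting, could be noise
    print(naive_z(300, 1000))  # ~9.5 sigma: 10x the data, sqrt(10)x the z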
Somewhat off topic, but CERN has a fantastic science museum attached to it that I had the privilege of visiting last summer. There is of course Tim Berners-Lee's NeXT workstation, but also so much more. It is also the only science museum I've visited that addresses topics in cyberinfrastructure such as serving out massive amounts of data. (I personally get interested when I see old StorageTek tapes lol.) The more traditional science displays are also great. Check it out if you are ever in the Geneva area. It is an easy bus ride to get out there.
Don’t forget to visit the gift shop too.
They don’t have an online store, so it’s the only place to get CERN ‘gear’.
You can easily overspend there on gifts your friends and family will appreciate (if they know and like CERN and its missions).
What's funny is that I just visited the museum a few months ago, and am coincidentally wearing a CERN hat I got there while reading the post and comments. I also highly recommend checking out the museum!
There are also free tours basically every day, without pre-booking. The itineraries vary, but usually one of the old accelerators (synchrocyclotron) and the ATLAS visitor centre are shown.
slide 22 states that the cost is 1 CHF/TB/month (on 10+2 erasure-coded disks), though it would be interesting to see a breakdown of costs (development, hardware, maintenance, datacenter, servicing, management, etc.)
1 CHF/TB/month is a bit expensive for storage at that scale, so it would definitely be interesting to see what they're spending the money on and what they are (and aren't) counting in that price.
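For scale, a quick back-of-envelope, assuming (unrealistically) that the whole exabyte sat on that 1 CHF/TB/month disk tier:

    # 1 EB at 1 CHF/TB/month, decimal units (1 EB = 10^6 TB).
    tb_total = 1_000_000
    chf_per_month = tb_total * 1.0
    print(f"{chf_per_month:,.0f} CHF/month = {12 * chf_per_month:,.0f} CHF/year")
    # ~1M CHF/month, ~12M CHF/year; in reality much of the exabyte is on
    # cheaper tape, so the real figure should be well below this.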
Tape backup, accessibility, networking, availability... At 1 CHF/TB that's a lot better than my local university, which still charges >100x that for such services internally.
Economies of scale in storage are significant. Also, I don't know why you put up with your university charging 100x that when you can store things on AWS for $5-10/TB/month (or less). That comes with all the guarantees (or more) of durability and availability you get from the university.
I assume most of that exabyte is stored in 2-3 datacenters, in which case the bandwidth cost is actually relatively small. Downloading it would cost a fortune (or take an eternity), but if it stays in the datacenter (or stays in AWS), bandwidth is cheap.
The only commercially-backed storage system is the long-term tape storage system. Still, it has a home-made overlay API over it to interface with the rest of the systems.
All their important "administrative" stuff (think Active Directory/LDAP user database, mail and other essential services) run on proprietary storage systems from a commercial vendor (not IBM though), with enterprise support and all.
At least that was the case a few years ago when we last talked to one of the heads of IT at CERN, but I guess it hasn't drastically changed
That kind of place can draw a certain kind of employee. This is hard to transfer to commercial projects. Sure, employees will always claim to be really motivated, especially in the marketing material, but are they we-are-nerds-working-on-the-bleeding-edge-of-human-knowledge-motivated?
Probably not, but there is surely some manager out there who has made themselves believe they can motivate their employees to show the same devotion for the self-made hardships of some mostly pointless SaaS product. If you want to capture that kind of spirit, what you do needs to fundamentally make sense beyond just making somebody money.
That's exactly how we were able to go to the moon 55 years ago. And why it's complicated today.
It was of course a lot of money. But it was mostly a lot of highly skilled, motivated, devoted people working toward an ultimate common goal.
Money would not have been sufficient by itself.
Since then, a LOT of the smart motivated people have been lured into either banking or adtech. The pay is good and the technical problems can be pretty interesting but the end result lacks that "wow factor".
I've also read that nowadays we are more risk averse, and many people/managers/companies are mostly administrators of the status quo.
Pair that with a lack of vision and public engagement around the current challenges facing humanity.
In other words, if you permit, pure capitalism isn't a sufficiently good motive to get something significant done. But of course most of us don't work towards an ultimate common goal – and neither did most people in those times. One wonders if there is enough meaning left these days to go 'round and ensure most of us feel passionate about the stuff we (have to) do. Maybe we really need a god or war or common enemy to unite all strands into a strong rope.
How much good work have the people reading this thread had to trash because it didn't align with Q3 OKRs? How much time and energy did they put into garbage solutions because they had to hit a KPI by the last day of June?
This is a great point. We work with CERN on a project, and we're all volunteers, but we work on something we need, and contribute back to it wholeheartedly.
At the end of the day, CERN wants to give away project governance to someone else in the group, because they don't want to be the BDFL of anything they help create. It allows them to pursue new and shiny things and build them.
There's an air of "it's done when it's done" and "we need this, so we should build it without watering it down", so projects move at a steady pace, but the code and product are always as high quality as possible.
CERN buddy of mine suggested that exposing a colony of physicists to elevated ambient levels of helium would trigger excessive infrastructure building behavior.
That’s a great observation, and I think generally correct, but there are private companies where that sort of motivation exists, for basically the same reason
People get this very wrong: CERN is extremely underfunded. People really don't understand how expensive running the accelerators is, and most of the budget goes to that. In recent years they even had to run for fewer months than planned because they couldn't afford the rising energy prices.
The buildings are old, the offices suck, you don't even get free coffee and they pay less than the norm in Switzerland. But they have some of the top minds working on very specific and interesting systems, dealing with problems you'd never encounter anywhere else.
I would like to yap more about the current management and their push for/reliance on enterprise solutions, but to cut it short: I really do think CERN is a net contributor to open science, and they deserve more funding.
CERN was a good example of how much can be done with how little when you have the right people.
For a long time, the entire Linux distribution (Scientific Linux) used by ~15K collaborators, the infra, and the grid computing were managed by a team of around 4-5 people.
The teams managing the network access (LanDB), the distributed computing system, the scientific framework (ROOT) and the storage are also small, dedicated skilled teams.
And the result speaks for itself.
Unfortunately, most of that went to shit quite recently when they replaced the previous head of IT with a Microsoft fanboy/girl coming from outside of the scientific environment. The first thing they did was to force Microsoft bloatware everywhere to replace existing working OSS solutions.
I think the majority of the Scientific Linux software came from Fedora/Red Hat and the Linux kernel.
Planning and managing the CERN computing infrastructure is a lot of work, then updating and releasing a famous distro on top of that was impressive.
> Unfortunately, most of that went to shit quite recently when they replaced the previous head of IT by a Microsoft fanboy(girl?) coming from outside of the scientific environment.
Painful to read, so I did a quick check. According to a news post I don't want to link here, but easily found by searching "CERN, the famous scientific lab where the web was born, tells us why it's ditching Microsoft and helping others do the same", the direction taken in 2019 seemed quite the opposite. I am not sure how the current head of IT at CERN, Enrica Porcari, fits into the story. Insider info will be appreciated.
There was a huge initiative at CERN to move to non-MS products.
It was great actually: suddenly we were leaving behind a bunch of bloated MS cruft and working with nice stuff. As someone working at CERN I was really inspired, not just by the support for open source but by how well it all worked.
Then next thing I knew we were doubling down on MS stuff. I don't know what happened. It was sad though, and the user experience did not improve in the end.
I'm not close enough to CERN-IT to know the details. But for what it's worth, no one I knew in IT could think of a good reason for going back.
Considering how massively in bed with the U.S. government and other governments that Microsoft is, and said government has been known for keeping tabs even on allies(1), I'm sure that certain parties have a keen interest in keeping up with what's going on at CERN that's not just scientific curiosity. Strangely these Microsoft evangelists manage to pop up in organizations all the time to reverse any open source initiatives. Could just be a coincidence though.
Don’t see any previous experience at Microsoft [2]. Just a self-taught fan then?
Edit: “Partnership is the art of understanding shared value. In WFP we have a number of partnerships, not many, but the ones that we have are deep, are sustained, are long-term. And definitely UNICC is one of them. Enrica Porcari, Chief Information Officer and Director Technology Division at the WFP” [1]
United Nations International Computing Centre (UNICC) is a Microsoft shop. It's legit to assume, if OP's statement holds true, that she got the business sponsorship going while CIO at the World Food Programme (WFP).
This kind of attempted executive takeover is always the strategy of a team. Who sponsored and voted for her at CERN is the real person of interest.
Joachim Mnich, Director for Research and Computing and her boss [4], holds the position also since January 2021 [1]. Mike Lamont, Director for Accelerators and Technology, also got the job at the same time [2]. Finally Fabiola Gianotti, Director-General, in 2019 extended her tenure for a second term “to start on 1st January 2021” [3].
So in 2019 the initiative to remove Microsoft began. With the renewals and promotions taking effect, it stopped. Interesting. Feeling a strong Microsoft-US vs Munich-DE vibe, with a twist of IT.
“newly created CERN Venture Connect programme (CVC), launched in 2023 […] In establishing CVC, CERN’s Entrepreneurship team entered discussions with Microsoft, with the aim to better leverage the Microsoft for Startups Founders Hub“ [1].
Under the purview of Christopher Hartley, Director of Industry, Procurement & Knowledge Transfer (IPT) [2], Microsoft is gaining more of a foothold at CERN. It wouldn't be too far-fetched to consider Mr Hartley and Ms Porcari as working together to achieve some sort of common good.
Also, the in-kind contributions from hundreds of institutes around the world. Much can, and has, been said about physicist code, but CERN is the center of a massive community of “pre-dropout” geniuses. I can’t count the number of former students who later joined Google and the like. Many frequent HN.
Most of the people who make CERN work aren't working for CERN. The IT department is under CERN, but there are many thousands of "users" who don't get paid by CERN at all. Quite a lot of the fabrication and most of the physics analysis is done by national labs and universities around the world.
The CERN budget at the experiment level is paid mostly by contributions from the institutions that are part of that experiment. I am talking about operations and R&D, and this also includes personnel contributions to different aspects. There is also service work that each of the users must do besides doing physics. I, for example, work on the software development stack besides my current physics analysis. Some of my colleagues work on hardware.
Then there are country-level contributions that pay for CERN infrastructure and maintenance (and inter-experiment stuff) and direct employees' salaries.
The important point here is that (I believe) the 1.4 billion above doesn't account for all the work done directly by institutes. Institutes pay CERN, but they also channel government grants to fund a huge amount of work directly.
Most of the people I know who "worked at" CERN never got a pay check that said CERN on it.
it's only useful for getting loans that you'll pay back with a bigger loan. it's how rich people are always cash-poor but wealthy and live wealthy lifestyles.
How many people order a meal (often out of laziness) per day vs. think about and search the mysteries of the universe? Economically it makes sense that Uber generates a lot more cash.
I think you misinterpreted that there should be a correlation between _valuation_ and _earnings_. Uber's _first_ ever positive year was 2023, after 15 years in business [1]. Uber may be generating cash, but it has also been losing cash a lot faster than it was generating it. Taking 2023 as a reference (~2 billion), it needs another 5 of those years just to recover from its losses in 2022 (9 billion). I understand the economics behind it, but its valuation is way out of reality.
Yes, but that still covers infrastructure (cables) and a lot of equipment for the experiments including but not limited to massive storage and tape backup, distributed local compute, and local cluster management all with users busy trying to pummel it with the latest and greatest ideas of how they can use it faster and better... Not to mention specialist software and licences. 50M doesn't go that far when you factor all of this in
That being said, though, members contribute more than money. A lot of the work done at CERN is not done on CERN budgets, but on the budgets of member institutes.
Good hiring managers can find the hidden gems. These are typically people who don't have the resume to join FAANG immediately, due to lacking the pedigree, but who have lots of potential. Also these same people typically don't last long because they do eventually move on.
Also it helps that Europe is so behind in tech that if you want to do some cutting edge tech you are almost forced to join a public institution because private ones are not doing anything exciting.
> Also it helps that Europe is so behind in tech that if you want to do some cutting edge tech you are almost forced to join a public institution because private ones are not doing anything exciting.
This is genuinely cringeworthy. Do you think that companies in the EU all use COBOL on mainframes and nothing newer than 10 years old is allowed? Airlines and banks here(!) are rewriting their apps to be Kubernetes native... And have been doing so for years. Amadeus (top 2 airline booking software in the world) were a top Kubernetes contributor already a decade ago.
The tech problems being solved at Criteo, Revolut, Thales, BackMarket, Airbus, Amadeus (to name a few fun ones off the top of my head) are no less challenging and bleeding edge than... "the Uber of X" app number 831813 in the US. Or fucking Juicero or Theranos or any of the other scams.
One wonders if things win because they really are better, or because there's sufficient financial momentum behind them. I have worked in the public sector for some years, and I don't think Europe is behind, just that the budgets are a lot smaller. If you want to capture a lot of people in an ecosystem or walled garden, you're going to need money, and lots of it. For all that's good and bad about it, most of that excess is concentrated in the US, in a few hotspots. No need to get distracted and put a flag on somebody like a Zuckerberg or Jobs or Gates though.
> and I don't think Europe is behind, just that the budgets are a lot smaller. If you want to capture a lot of people in an ecosystem or walled garden, you're going to need money, and lots of it
And the initial market you have is quite a bit smaller. Germany is the biggest EU country by population at 84 million, compared to 333 million in the US. Moving into another EU country means translating into a different language, verifying what laws apply to you, how taxes work, etc. Sometimes it's easy (just a translation), sometimes you might have to redo everything almost from scratch (e.g. Doctolib which schedule healthcare appointments, do meetings online with doctors, can be used to share test results, prescriptions - each new country they enter will have a lot of regulations on healthcare data that will need to be applied).
> You can sell everywhere without figuring out how taxes work.
You have to figure out VAT for each country. And any local regulations and laws that apply to your business.
> You think in USA healthcare is unified
Irrelevant. There isn't an entirely separate set of laws and rules that govern healthcare and related data applying to each different hospital or locale. HIPAA is the big thing that applies to everyone. In the EU, you can't even read the related laws for a different country without bilingual lawyers to translate it to you.
Just tacking some detail onto "promote open science".
CERN was/is a large early user and supporter of the open source KiCAD electronics CAD tooling. The downstream impact of improved accessibility to solid ECAD tooling has been a large contributing factor to the growing ecosystem of open electronics.
A lot of really impressive test and measurement equipment to support their research is developed in the open (see https://ohwr.org/project). People on HN are probably most likely to have heard of the White Rabbit timing project, but there's fantastic voltmeter designs, a lot of FPGA projects for carriers, gateware, fun RF designs.
There's a lot of use for the accelerator and sensor knowledge in the medical sector. Technology first developed for high-energy research can be used to improve CT scans[1], deliver better cancer treatment[2] and so on. This goes way back.
But if Berners-Lee hadn't started the WWW, someone else probably would have within a few years: the hard part was the development of the internet, i.e., a flexible low-cost wide-area network where anyone could start a new service (look in /etc/services to see all the services that people have defined over the years) without the need to get permission from anyone.
IIRC the first WWW server went live in 1990, around the same time as Archie (a search engine for anonymous-FTP sites). WAIS and the first Gopher server arrived in 1991, and Veronica (a search engine for Gopherspace) in 1992. Gopher grew rapidly through the early-to-mid 1990s.
The US government's Advanced Research Projects Agency started funding research into "packet-switched networks" in the 1960s, which would eventually lead to the internet; it went live in 1969 (under the name ARPAnet, but only a pedant would say that ARPAnet is not the early version of the internet). Then the USG continued to fund the internet every year till it no longer needed funding in the early 1990s.
So, CERN and Berners-Lee (mostly the latter because no one at CERN other than Berners-Lee cared much about the WWW in its early days before it became a big hit) get some credit for the WWW, but in my reckoning it is a small amount of credit.
A lot of the benefit has come from the expertise gained in building the applications.
Tons of the data science tools have roots in CERN. Tons of interesting statistical methods, tons of experience R&D with superconductors and all manners of sensors.
Tons of math/computation techniques, modeling, etc. would not be here without CERN.
It would be sort of silly to expect that any of their actual discoveries or tests of the SM would have any actual application, but the ancillary benefits are there.
> practical social and economic benefits to humanity as a whole?
Why does it have to be practical? Scientific discovery is a perfectly valid end in its own right, even if it only ever means that we understand the universe better.
The fact that scientific discovery almost always turns out to have practical uses in the long run (centuries, not decades) is an added bonus.
It's not like it's a huge expense either. If Switzerland decided to, it could cover the yearly budget of CERN by itself at the cost of a fraction of a percent of its GDP.
No, but perhaps a bit of cynicism? Of the member states, the highest cost per taxpayer is still less than a bag of peanuts each year, and most people would throw that at the TV over whatever upsets them without thinking. It's collective science, not big pharma, which soaks taxpayer money and then sells the discoveries back to you at a 1000x markup.
And yes, CERN has played an important part in the scientific conversation about where we are in the universe and what it looks like. If you don't think that's important, consider that flat earth cults are working just as hard to derail conversations they don't want to join in good faith...
High energy physics research has contributed some technology with social and economic benefits. Some of that has been direct results coming from pure research into fundamental properties of matter and electromagnetic radiation, some are indirect results that came about because when you build an institute like CERN, it spontaneously generates advances in other areas that solve more general problems (this is known as the "collect a bunch of smart people in a single place, with a lot of resources, to solve a unique problem" strategy). But no, most of the research, pure or applied, has not really had direct practical social and economic benefits to humanity as a whole.
That's entirely missing the point. We, as a society, have decided that we will balance our economic productivity across several different areas: welfare, infrastructure, military, industry, science/research, technology. We believe that investments in areas of research which have no direct benefit can still have positive outcomes, partly through fundamental discoveries, but also by enriching us as a species. We also believe these investments will ensure that we have the freedom to be productive in the future.
A cynic might even say that CERN has played a critical role in keeping people from working on military applications, or working for the enemy.
If your criticism (it's hard not to read your comment as an implicit criticism) is that we should invest the results of our productivity more directly into areas which maximize social and economic benefits, then sure, this is argued about all the time. The SSC was cancelled at least partly because people failed to see the value in having a world-class HEP facility in the US.