>In 2013 DOE presented their exascale vision of one exaFLOP at 20 MW by 2020.[6] Aurora was first announced in 2015 and to be finished in 2018. It was expected to have a speed of 180 petaFLOPS[7] which would be around the speed of Summit. Aurora was meant to be the most powerful supercomputer at the time of its launch and to be built by Cray with Intel processors. Later, in 2017, Intel announced that Aurora would be delayed to 2021 but scaled up to 1 exaFLOP. In March 2019, DOE said that it would build the first supercomputer with a performance of one exaFLOP in the United States in 2021.[8] In October 2020, DOE said that Aurora would be delayed again for a further 6 months and would no longer be the first exascale computer in the US.[9] In late October 2021 Intel announced Aurora would now exceed 2 exaFLOPS in peak double-precision compute.[10]
So you announced a machine in 2015 to be delivered in 2018: a three-year project. In 2017 you then announced you're not going to deliver it at all; instead you announce a "delay" to deliver something different in another three years. That's not a delay, that's just scrapping the project and starting a new one. Just because you call it the same thing doesn't mean anyone believes you.
I do wonder how much cash Intel rinsed out of the DOE for delivering nothing. You would've thought that since the DOE announced a target of 1 exaFLOP in 2020, and Intel delivered 0 exaFLOPS, Intel would take a big loss on this. I bet they didn't.
To make it even more confusing, FLOP is sometimes used to mean "floating point operation", so it can be correct to say that a computation takes 1000 FLOPs. A machine with 2000 FLOPS would do 1000 FLOPs in half a second.
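A tiny sketch of that units arithmetic, using the same (made-up) numbers as the sentence above:

```python
# FLOPs  = floating point operations (an amount of work)
# FLOPS  = floating point operations per second (a rate)

work_flops = 1000            # a computation that costs 1000 FLOPs
machine_rate_flops = 2000    # a machine sustaining 2000 FLOPS

runtime_seconds = work_flops / machine_rate_flops
print(runtime_seconds)       # 0.5, i.e. half a second, as in the example above
```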
Is there any record showing the DoE did not itself request that the first announced specification be stepped up (a leapfrog), given shifting industry performance and especially performance-per-watt targets, roadmaps, application-specific development, or any bureaucratic and budgetary revisions?
(Consider the post-Skylake and AMD EPYC 7001/7002 AVX-512 timeframe, the GPU market with crypto demand eating into wafer budgets, the 100 Gbps+ interconnect shifts with the Omni-Path kill-off and Slingshot introduction, and even plain old changes in the codes people want to run.)
Oh yeah, good point. I was conflating the DOE grant from years ago with the most recent investment in US semiconductors (which I personally think amounts to a shareholder bailout, hence my jibe).
Twitter fails public discourse. What a painful format to read a presumably thoughtful statement from an intellectual. This medium is what our society has come to for public intellectual discourse in the sciences?
Really, can't understand why academia (or anyone frankly) chose Twitter as a means to do public communications. Character limit, poor access control capabilities, actively blocking people without accounts from reading. Like, every aspect seems to be worse than vintage blogging platforms, apart from the fact that somehow everyone is on Twitter now.
>Massively bigger network than the blog-o-sphere provides more value.
I question the value. Yes, there's discovery but nothing is keeping anyone from clicking through a link from a short summary. And, if people don't? Personally I'm OK with that. No one is paying me for eyeballs. Way too many people are obsessed with pageviews even if they don't get compensated for them.
Well, you can't reach me through Twitter. I think the "everyone" you refer to is "everyone who thinks they're important". Lots of politicians, journalists, "celebrities" and academics.
I tried to read the tweeter's opinions, but gave up after three sentences (3 tweets). That's a truly awful way to consume a reasoned argument.
I think I'm mostly confused about academia really. I get it, it's a nice and easy format to quickly share one's thoughts, so the masses are there. Now people who need to speak to the masses (celebrities, politicians) are also there. But academia should be different, I thought. Posts like this one seem to belong on a mailing list or a specialised blog or whatever good formats were invented on the internet. Instead it is composed as a series of linked SMS-sized messages on a walled-garden type of platform.
It's still better than Facebook and LinkedIn on the login front, because I can at least read a tweet without logging in. Showing replies requires a login, but who cares about that garbage?
> What a painful format to read a presumably thoughtful statement
I see this sentiment posted often on HN and I'm hoping you can help me understand this. I never use Twitter so am not familiar with its interface. When I click on the link in the OP I see a very easy to read list of 13 contiguous statements.
I'm using Microsoft Edge on Windows 11. I'm curious what you see that makes the format so painful to read?
Not the OP, but I also find it weird: content is broken up based on text length rather than logical separation, and at some point scrolling down freezes the page with a prompt to log in or create an account (I don't have one and I don't want one). The login freeze is the more annoying one.
All of the statements are short and almost have to be self-contained to make sense. This strips out a lot of nuance and background information that you are assumed to know, or have to dig up (with little help from the tweet), to appreciate most of the context. Without it, all I get from the tweet is "Intel Bad".
To take a couple of examples from the tweet:
> Yes, Intel is shipping to Argonne now. Great. How much of it? All of it at once?
I have no context for this. Presumably Argonne is getting an HPC system from Intel, and I suppose it is late. Is it late by a year, 5 years, 10 years? Is it worse than something else? What am I supposed to make of it? I have to dig up a bunch of background info to understand what this means.
> But I do know the worst thing Intel could do—did do—this week: host a glittering event that heralded an era of sparkling, Moores Law defying technical progress that for people in the HPC camp found *painfully tone deaf.*
Ok...what event? Could you please name it?
3 tweets later...
> This year at the annual Supercomputing Conference
Ok....maybe looking up "annual Supercomputing Conference" will find something (which it does). But there is nothing on "host a glittering event".
I could go on...but you see my point (hopefully). So now I have a bunch of stuff (that I cannot easily verify) from someone I don't know, and presumably I am supposed to take it all at face value.
This person seems like they know what they are talking about, but considering the terrible Signal-to-Noise Ratio of Twitter, I am not going to accept anything on there at face value.
Now imagine the author was slightly slower at typing and some randoms put their comments in the middle of that split-up thread, without even getting to read the whole thing.
I can imagine that, I just don't see it. Every single time someone is complaining about this format it is a contiguous thought stream with no comments or other breaks in between for me.
Can you tell me what client you are reading this in that is showing comments in between the original content? Again, I am simply clicking on the link from OP on Windows 11 with Microsoft Edge. If I view it in Chrome it looks to be in the same identical easy-to-read format. I'm wondering what others are using that are making it so difficult.
Twitter tries to show the original author's thread-tweets uninterrupted. However, sometimes the thread gets broken -- generally in very long threads -- and it starts showing replies from other users rather than the next tweet in the thread.
It's not happening here, so I'd assume that the person you're responding to has just had a few bad experiences trying to read threads like that in the past.
(It happens with the twitter web client and app, so I'd assume it's something to do with how they build the thread server-side, rather than being something where it's meaningful to ask about the client being used.)
Because instead of your eye naturally flowing down the text, you get a sentence or two and then have to jump down to find the next sentence or two while remembering the context. They aren't contiguous at all.
Also because of the constraints of tweet length the sentences are chopped short rather than being whatever their “natural” length, short or long, would be.
The point of writing as the GP suggested is to minimize the friction between written text and comprehension/reasoning. Twitter is designed exactly against that, so the result is a bunch of slogans or tiny sound bites.
That's what I'm saying. They are completely contiguous and easy to read and follow for me when I view the OP with Microsoft Edge or Google Chrome on Windows 11.
I'm trying to figure out - what are you viewing the OP with to make it not contiguous?
I recommend the "Privacy Redirect" browser extension which redirects Twitter to Nitter, which is incomparably more usable, especially not logged in. It will also redirect YouTube and Google Maps to Invidious and OpenStreetMaps respectively (and other services as well). Main issue is that the provided list of instances is not reliable so some weeding out is required at the beginning.
Matter Reader App [0] also creates a more readable version if you share a link to a twitter thread. I was a ThreadReader user until Matter added this feature.
Maybe I'm wrong, but the writing seems to be to blame here, not the medium. I'm ignorant of the topic and had a hard time even understanding what the person was writing about. From my limited understanding, it would have been more informative to say "Intel was supposed to deliver a supercomputer but is late so now the USA is falling behind China in supercomputing." Maybe I'm missing the actual content though, and I won't disagree that the Medium is the Message.
> Please don't complain about tangential annoyances—things like article or website formats, name collisions, or back-button breakage. They're too common to be interesting.
How else would you make 13 related statements to the broadest possible group, allowing them to react to each one? Is your lack of understanding of a platform you don't like showing?
Yes, how did discourse ever happen prior to twitter? We were all just cavepeople, smearing things on walls, until twitter saved us!
I also like that you think any of the 13 statements is even worth individually responding to, when in fact, each one is written in a way to drive reaction, rather than be part of an overall coherent story or thought.
They happen to be a reasonable writer, so they can make it overall seem reasonably coherent, but it's also 100% clear the writing would be better if they weren't trying to generate 13 statements that can be individually responded to.
I find it a bit hard to understand the drama. A private company failed to deliver some speculative future technology according to expectations (and/or agreements?). Doesn't that happen all the time? If it is so important to have this supercomputer, isn't it the (US) government that "failed science"? Eggs in baskets and all that.
A few post docs bet their careers on the availability of an exascale computer in 2018, and they paid the price for it too. This tweet thread is from one of them.
According to the tweets, researchers who reserved time for their projects on the supercomputer had to sign NDAs so they cannot complain about the delays... not exactly a good look.
It should be noted that during that 4 year delay the computer was also upgraded by an order of magnitude (180 petaflops -> 2 exaflops). Intel deserves criticism for their various delays for sure, but the proposed 2018 machine and the shipping 2022/23 machine really are only the same in name.
> The U.S. should have had two exascale supercomputers on the Top 500 in 2021. And Aurora, which has been plagued by Intel’s delays since 2017, should have been first—well before “Frontier” at Oak Ridge National Lab appeared. But Intel dropped the ball over. And over. And again.
It does acknowledge that recent deliveries will allow the project to place on the current supercomputer list by virtue of finally being able to run LINPACK tests ... but that's still a few years late and well short of what was promised to be delivered, tested, bedded, and deep into working by now.
To hazard a guess - the delayed delivery of Aurora to Argonne National Labs, originally planned (in different form) for 2018 and now expected "late 2022"
I usually hit HN in the am as I scan for news feeds and such. Didn’t expect to see my tweet here.
This was written for my followers on Twitter, most of whom are in supercomputing. I understand how it might seem a bit over the top to anyone outside of HPC, but let me explain why I wrote it (not expecting it would have more mainstream views).
People in supercomputing already know what I’m talking about so I didn’t give context.
I’m glad some of the respondents here provided good background on the extended delays and the impact.
The thread was written because, a few months ago, a researcher let me know her work would not be completed since it hinged on Aurora's completion. With Frontier (at Oak Ridge National Lab) oversubscribed, she can't finish her work and her immigration status is in jeopardy.
I felt bad for her but figured it was a one-off. Then I started talking to more and more people. I had never really connected the employment/post-doc/early-career HPC researcher situation with problems with a machine.
The employment and research and academic ‘economy’ around big systems like this is real. Even though by 2021 everyone figured the system wouldn’t be ready by target, it still created a mess. I talked to more, then more people about what the delay meant for their careers.
They were all terrified to say anything publicly. Argonne’s under strict NDA and so is everyone else. Further, everyone was worried about losing funding or “being problematic”
Since I’m not beholden to NDAs and have the freedom to say tough things about big companies and institutions I did. I did it so the people in the same boat as the people I talked to know that their career-jerking issues have been publicly noted. I considered a more formal article for TNP but it didn’t feel like a fit.
What really pushed me to write the thread was all the hype around Intel Innovation this week. They have dropped the ball over and over again in HPC since 2017. They hurt people's careers with these delays. Yet here they were, celebrating the shipping of a processor that is so far behind (as were the promised ones before it that were to comprise Aurora) and talking about zettascale ambitions, when they have struggled to get a single machine out four years late. It just needed to be called out.
I wanted people to see also that processor delays affect far more than server roadmaps.
And yes, I understand semiconductor delivery and how difficult and unpredictable it is. It was the lack of transparency and the painfully awkward way NO ONE talked about it. Except most journalists who mentioned it offhand but didn't fully express what was at stake in terms of people costs.
So while I'm kind of horrified that the tweet meant for HPC people took off, the positive side of it is that people might think about this more deeply when they hear "processor X has been delayed", because for comp sci and application domain scientists, the career disruption can be (and should be communicated as) dramatic.
I have a pretty packed workday but can check in and answer/give more context as I have time/can.
I don't understand the concept of needing a specific amount of performance in a supercomputer to do research. You go to publication with the system you have. I'm sure some researcher calculated that if they had X more resources, their simulation would be able to discover new things in regime Y, but you know what? Often computer scientists find ways to solve problems that traditionally required a supercomputer but can now be solved on a desktop.
Chasing larger supercomputers is a folly, one we do to remain visually competitive with other countries who want to take the US's crown. But they benefit very few while costing a lot- not a good investment except for problems like "I need my submarine to be 5% more streamlined to remain competitive".
It's not necessarily about the machine's raw performance, but capacity. The compute is a shared utility, and the utilization of a supercomputer is quite high. We expect science to grow exponentially, so compute demand grows exponentially too. The cloud is crazy expensive compared to supercomputing -- you don't want missing compute to hold back research.
If science grows exponentially (rather than linearly or sublinearly), you're screwed.
I don't think the cloud is expensive compared to supercomputing, so long as you consider TCO and you're doing proper deal negotiation. I would much rather have an evergreen cluster in the cloud with fast interconnect that I turn on for myself and shut down when I'm done, than share some resource and try to keep utilization high.
This is an odd take. Do we even need 6GHz CPUs or should we be content to stay with 500MHz and better algorithms? There's a limit to what algorithmic improvements can do and improving algorithms is hard. If I had a 6,000 core 60THz computer I could make use of it.
Good question. It's because some extreme scale computer science and application work can only be valid with the high core counts/networking available on what is now just one single supercomputer in the U.S. - That machine is now booked, leaving many unable to complete research or have to wait until 2024. Some research can only be done with that scope of system, and people have been waiting years to test exascale software on this machine. I hope this answers the question.
I question your fundamental premise. For example, I used to use supercomputers to run molecular dynamics simulations. When I found that the supercomputer folks didn't want me to run codes (because at the time, AMBER didn't scale to the largest supercomputers) on their machines, I moved to cloud and grid computing and then built a system to use all of Google's idle cycles to run MD. We achieved our scientific mission without a supercomputer! In the meantime, AMBER was improved to scale on large machines, which "justifies" running it on supercomputers (the argument being that if you spend 15% of the cost of the machine on interconnect, the code had better use the interconnect well to scale, and can't just be embarrassingly parallel).
I've seen scientists who are captive to the supercomputer-industrial complex and it's not that they need this specialized tool to answer a question definitively. It's to run sims to write the next paper and then wait for the next supercomputer. Your cart is pushing the horse.
You know the term "embarrassingly parallel" but you seem to ignore that this term exists because there are other classes of problem which lack this characteristic.
Quite a few important problems are heavily dependent on interconnects, e.g. large-scale fluid dynamics and simulations that are coupled with such dynamics: aerodynamics, acoustics, combustion, weather and climate, oceanographic, seismic, astrophysics and nuclear. A primary component of the simulation is fast wavefronts that propagate globally through the distributed scalar and/or vector fields.
As long as there is a future where computers are growing to increase the scope, fidelity, and speed of these applications, there is also a need for infrastructure research to validate or develop new methods to target these new platforms. There are categories of grants that are written to a roadmap, with interlocking deliverables between contracts. These researchers do not have the luxury to only propose work that can be done with COTS materials already in the marketplace.
And conversely, if your application just needs a lot of compute and doesn't need the other expensive communication and IO aspects of these new, leading-edge machines, it _does_ make sense that your work get redirected to other less expensive machines for high-throughput computing. This is evidence of the research funding apparatus working well to manage resources, not evidence of mismanagement or waste.
One thing I've learned is that even when folks think their problem can only be solved in a particular way (fast interconnect to implement the underlying physics) there is almost always another way, that is cheaper and solves the problem, mainly by applying cleverer ideas.
I'll give (yet another) AMBER example. At some point in the past AMBER really only scaled on fast interconnects. But then somebody realized the data being passed around could be compressed before transmit and then decompressed on the other end, all faster than it could be sent over the wire. Once the code was rewritten, the resulting engine scaled better on all platforms, including ones that had wimpy (switched gigabit) interconnect. It reduced the cost of doing the same experiments significantly, by making it possible to run identical problems on less/cheaper hardware.
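Not AMBER's actual implementation, just a minimal sketch of the break-even reasoning behind compress-before-send; the payload size, compression ratio, and throughputs below are made-up illustrative numbers:

```python
def transfer_time(size_bytes, bandwidth_Bps, compress_ratio=1.0,
                  compress_Bps=None, decompress_Bps=None):
    """Seconds to move size_bytes over a link, optionally compressing first."""
    t = (size_bytes / compress_ratio) / bandwidth_Bps        # wire time for (compressed) payload
    if compress_Bps:
        t += size_bytes / compress_Bps                       # sender-side compression time
    if decompress_Bps:
        t += (size_bytes / compress_ratio) / decompress_Bps  # receiver-side expansion time
    return t

size = 8 * 1024**2        # 8 MiB exchanged per step (made up)
gigabit = 125e6           # ~1 Gb/s switched Ethernet, in bytes/s

raw = transfer_time(size, gigabit)
packed = transfer_time(size, gigabit, compress_ratio=3.0,
                       compress_Bps=2e9, decompress_Bps=4e9)
print(f"raw: {raw*1e3:.1f} ms, compressed: {packed*1e3:.1f} ms")
```

With these made-up numbers the compressed exchange wins by roughly 2-3x, which is the whole point: spend cheap local cycles to save expensive wire time.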
Second- I really do know a fair amount in this field, having worked on both AMBER on supercomputers (with strong scaling) and Folding@Home (which explicitly demonstrated that many protein folding problems never needed a "supercomputer").
I do not know much about your field of molecular dynamics. But, it is my lay understanding that it tends to have aspects of sparse models in space, almost like a finite-element model in civil engineering. Upon this, you have higher level equations and geometry to model forces or energy transfer between atoms. It may involve quadratic search for pairwise interactions and possibly spatial search trees like kdtrees to find nearby objects. Is that about right? And protein folding is, as I understand it, high throughput because it is a vast search or optimization problem on very small models.
Compared with fluid dynamics, I think your problem domain has much higher algorithmic complexity per stored byte of model data. Rather than representing a set of atoms or other particles, typical fluid simulations represent regions of space with a fixed set of per-location scalar or vector measurements. A region is updated based on a function that always views the same set of neighbor regions. Storage and compute size scales with the spatial volume and resolution, not with the amount of matter being simulated. These other problems are closer in spirit to convolution over a dense matrix, which often has so few compute cycles per byte that it is just bandwidth-limited in ripping through the matrix and updating values. But, due to the multiple dimensions, it is also ugly traversals rather than a simple linear streaming problem.
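To make that access pattern concrete, here is a toy sketch of the fixed-neighbor update described above (a Jacobi-style 5-point stencil on a dense 2D grid). It only illustrates the memory-bound shape of such codes, and is nothing like a production CFD solver:

```python
import numpy as np

def stencil_step(field):
    """One sweep: each interior cell becomes the average of its 4 neighbors."""
    new = field.copy()
    new[1:-1, 1:-1] = 0.25 * (field[:-2, 1:-1] + field[2:, 1:-1] +
                              field[1:-1, :-2] + field[1:-1, 2:])
    return new

grid = np.zeros((256, 256))
grid[0, :] = 1.0              # a fixed "hot" boundary condition along one edge
for _ in range(100):          # influence propagates inward one cell per sweep
    grid = stencil_step(grid)
print(grid[1, 128])           # interior value pulled up toward the boundary

# Storage and compute scale with grid volume and resolution, not with how much
# "matter" is in the box, and each update touches far more bytes than it does
# arithmetic, which is why such codes end up bandwidth- and interconnect-bound.
```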
Intel has been locked in MBA sociopath organizational antipatterns for a decade now. There's nothing controversial about your post, it's just stating the obvious.
As others pointed out, the real controversy should be ... did Intel just take the government money and run? Even worse, did they sit on these funds while AMD was perfectly capable of delivering exascale and prevent AMD from providing a second computer? In terms of government boondoggles it probably is peanuts though.
The most shocking part of your post is that AMD got funding to deliver a computer somewhere else. I would have thought Intel could have blocked that with lobbying.
Lobbying is just businesses asking congress people for things. You can't offer them anything of value in return, then it becomes bribery which is highly illegal. Not saying it doesn't happen, but don't equate lobbying with bribery.
>Lobbying is just businesses asking congress people for things. You can't offer them anything of value in return, then it becomes bribery which is highly illegal.
Some of those "congress people" (or their family members) were employed earlier in the firms they pass legislation for, and some are employed later. Still corruption in a different way.
As someone who spent a bit of time in academia, looking back at it, those folks (grad students and postdocs) really under-invest in understanding good career development.
Academics really should know better than to make bets like these.
When I read this, my thought isn't "Intel screwed several researcher's lives", but "Postdocs/grad students need much better coaching". Unfortunately, we know that professors are the wrong people to provide such guidance. If you want to improve things, I would suggest putting your effort into highlighting the larger problem.
Had Intel delivered, the problem I speak of would still be widespread and the hurt will continue.
when it comes to research, never put all your eggs in one basket. This is something that should be learned at the early stages of graduate school, not at the postdoc level.
It blows that they over-promised and got bit by it. And others being bit by it also blows. But betting your academic and professional career on the promises of a third-party company, /especially a semiconductor one/, is naive at best and stupid at worst.
Claiming “Intel failed science” is the worst kind of dramatic hyperbole.
Also, this whole idea that some people can't finish their postdocs etc. because the computer isn't ready reeks of poor planning on their part, simply because they were never guaranteed that their project would even properly run/work/scale/whatever on the new system without potentially years more of work.
This is why I wish this thread hadn't gone more mainstream.
It's nuanced and specific to how things work for researchers at labs.
I do not expect non-HPC folks to get why it is a big deal and why the strong language was needed. Intel failed the science community in the U.S. that relies on the limited systems large enough to handle the very few applications that consume massive numbers of cores/parallelism. That's not the whole of science, but this system was central to some of the grandest-scale problem solving that exists (think planet-scale climate simulations in high resolution).
I respect your opinion but my opinion wasn't for you. It was for the HPC comm.
You know that many of us here are HPC folks who know the nuances... and simply disagree? I mean, sure, I don't think Intel should have taken the contract or gotten any positive PR for this, but at the fundamental level, many people on this site did supercomputing and HPC at national labs or universities, and now work on machine learning HPC on the cloud. My experience spanning both makes me think that chasing time on the fastest supercomputers is not the best way for scientists to be productive.
People are too obsessed with more compute. Definitely, having more clusters, GHz, and FLOPS is a good thing, but in my experience, experiments that take embarrassing amounts of computational resources are extremely low yield.
Algorithmic advances are the real MVPs and remain underrated. They are usually the prerequisites for scientific breakthroughs. E.g. AlphaFold2, MapReduce, Blockchain, etc.
... correct me if I am wrong but I believe that AlphaFold2 required massive computational resources beyond the scope of what most university groups have access to.
And Blockchain and MapReduce have absolutely 0 to do with enabling science.
Which is a pity because I generally believe your statement the algorithms matter is on the money, it's just that the examples you give merely replace one uninformed hype with another uninformed hype that's even less relevant...
Mapreduce, at the time, was actually a huge step forward for many scientific codes even if few adopted it. An enormous amount of computing can be done with the Map-Shuffle-Combine-Reduce paradigm (and the better replacements, such as Apache Beam/Google Flume). I used it (while working at Google) to reimplement a number of processing pipelines with the result of better performance and reliability compared to standard genomics codes.
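For anyone unfamiliar with the shape being described, here is a toy, in-memory illustration of the Map -> Shuffle -> Reduce pattern (counting k-mers in a few made-up DNA reads). It has nothing to do with Google's internal pipelines; it just shows why so much scientific processing fits the paradigm:

```python
from collections import defaultdict

def mapper(read, k=4):
    """Emit (kmer, 1) for every k-length substring of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    return key, sum(values)

reads = ["ACGTACGT", "CGTACGTT", "TTACGTAC"]          # made-up input
pairs = (kv for read in reads for kv in mapper(read))
counts = dict(reducer(k, v) for k, v in shuffle(pairs).items())
print(counts["ACGT"])   # 4: occurrences of that 4-mer across all reads
```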
Ah you're absolutely right. Those examples are more about shifting paradigms over science. I just gave a few examples that came into my head at the moment.
Some example of algorithm advances actually furthering science:
Compressive sensing - enables MRI reconstruction from sub-Nyquist sampling
Union Quick Find - solving percolation problems way faster
KD trees - makes particle simulation a tractable problem
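As a toy illustration of that last point: a KD-tree finds interacting particle pairs within a cutoff in roughly O(n log n) instead of a brute-force n^2 scan (the positions and cutoff below are arbitrary):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 10.0, size=(10_000, 3))  # 10k particles in a 10x10x10 box
cutoff = 0.5                                           # interaction radius

tree = cKDTree(positions)               # build the spatial index
pairs = tree.query_pairs(r=cutoff)      # all (i, j) pairs closer than the cutoff
print(len(pairs), "interacting pairs found without checking all ~50M candidate pairs")
```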
As for AlphaFold2 and the other DeepMind algorithms, they definitely require lots of compute but not so much that they are blocked by advances in supercomputing. This is evidenced by other teams with fewer resources starting to replicate their results.
Is Intel's delay really a showstopper for US research? According to top500.org, 32 of the top-100 supercomputers in the world are located in the US. That's way more than any other nation.
Some problems are too big to run even on these supercomputers. These machines are bound by extremely fast (speed/latency-wise) networks, and the work can't be scaled to multiple sites.
So yes, when you have a grand challenge, and no computer, you can't solve that challenge. You can continue playing with smaller versions of it though, but it won't give you the accuracy or detail you need/want.
So, the count is important, yes, but the capabilities of the clusters you have are also important.
> But at least US researchers cannot claim to be in an disadvantaged position compared to other countries?
I can't give a definitive answer to that, unfortunately. For one, I'm not working in the USA. Next, Europe has a lot of collaborations and cooperation under its belt in this regard. Lastly, China's ecosystem and prowess is completely opaque to me (due both to secrecy and to my not being interested or having the time).
However, the technology is evolving pretty fast. NVIDIA's DPU, AMD/NVIDIA GPUs and AMD processors change the landscape a lot.
Intel's systems haven't been the best choice for heavily GPU-accelerated clusters for a long time. So they might look powerful in enterprise, but HPC is a completely different landscape. Intel has not been the clear leader in everything computing for some time now. They're not the worst, but they're not the best, either.
From what I understand it's not so much a "disadvantaged" problem as it is a time-sharing problem. There are a lot of demands on these and the wait list to perform calculations on these machines starts long before the machines are built.
If you're a researcher waiting on your turn to come up and it gets pushed out four years... and you still don't have an actual end in sight... Well a lot of those grants do expire and you may no longer be able to pay for the compute at all...
You might not need a whole machine, but part of it, and your turn may never come due to limited capacity. We operate on a different model, so our problems are a bit different; however, we see other sites which operate with this model, and the researchers' anxiety about getting their time and using it to the fullest.
Expiration of compute time, or of the time window before research has to be finished, is really crippling in some scenarios.
While 77% of the entire top 500 is running on Intel, only 1 of the top 10 machines is. Of the rest of the top 10, 6 are AMD, 2 are POWER, and 1 is Sunway.
This is very dramatic. Big supercomputer projects are not actually a walk in the park, they are designed to intercept technology several years in the future that does not exist yet. It requires science and engineering to make these things. Sometimes they're late, sometimes they go over budget, sometimes they're cancelled, sometimes acceptance criteria have to be adjusted. This isn't unusual or limited to supercomputers. Look at the LHC, look at the James Webb telescope. They had delays and overruns and problems too. This isn't "failing science", it's a project having project problems.
The big supercomputers aren't all that lucrative, Intel wouldn't be swimming in profit from this thing even if it did go to plan, as it is they'll quite possibly be losing money to penalties and cost overruns.
> This isn't unusual or limited to supercomputers. Look at the LHC, look at the James Webb telescope. They had delays and overruns and problems too. This isn't "failing science", it's a project having project problems.
That is not comparable at all. They're not building the first supercomputer in the world, nor inventing much new tech aside from the usual progression of faster and smaller.
They weren't building the first space telescope or particle accelerator in the world either. I'm not quite sure what the thinking behind pulling that one out like a trump card was.
And believing silicon technology process shrinks and new processor designs that take advantage of it is nothing much new betrays an unfortunate misunderstanding of these technologies. Smaller and faster to you looks like your phone or PC get a little cheaper and faster every few years. The technology that enables that is staggering. These push the limits of materials science and chemistry and a bunch of fields of physics relating to electronics and photonics. Designing the chip requires again pushing boundaries in hard problems in computer science and mathematics to model and compile and optimize and verify the logic.
The two most complicated machines ever made are the microprocessors on the silicon chips, and the factories which make them -- a single one of those costs double what it took to build the LHC at CERN, twice the GDP of Somalia. And that does not include all the R&D cost to reach the point they can be built.
The supercomputer required predicting these things years into the future and intercepting that technology. Intel ran into unforeseen delays in these things which derailed the supercomputer. Not all that surprising that such efforts don't always go smoothly. There are 3 companies left which can manufacture high performance chips, and many that have fallen. There are only a handful that can design high performance CPUs and GPUs.
> They weren't building the first space telescope or particle accelerator in the world either. I'm not quite sure what the thinking behind pulling that one out like a trump card was.
I thought it was obvious but... the fact that a new CPU generation arrives every few years and a bigger space telescope every 30 years? Why do you claim it's because of the tech, based on no source whatsoever, when in almost every case the reason for delays in just about any project is either mismanagement or engineering not being able to fulfill what sales signed on?
"Hurr durr big computers are hard" is not an argument. It's not even a "single machine" (which would be much more complex to realize), it's a bunch of networked nodes as most (all?) modern supercomputers are, which reduces the scale immensely, as once you get the single server it's just a question of interconnectivity (which Intel will buy from switch vendor most likely) and plumbing.
> And believing silicon technology process shrinks and new processor designs that take advantage of it is nothing much new betrays an unfortunate misunderstanding of these technologies. Smaller and faster to you looks like your phone or PC get a little cheaper and faster every few years. The technology that enables that is staggering. These push the limits of materials science and chemistry and a bunch of fields of physics relating to electronics and photonics. Designing the chip requires again pushing boundaries in hard problems in computer science and mathematics to model and compile and optimize and verify the logic.
They are nonetheless iterative. And Intel is still using essentially the same process node as the generation before, and the same as their consumer chips. Intel wasn't inventing a new material or a new way to make chips for these; they planned to use the same chips that will eventually land in servers. If anything it looks like Intel found a clever way to fund their new architecture...
The project already changed direction twice (from 180 petaFLOPS in 2018 to 1 exaFLOP in 2021 to now 2 exaFLOPS), which leads me to believe it's mostly a project management issue. The CPUs the supercomputer was supposed to be built with are already on sale, as the last iteration was upgraded to the "new" 2022 generation.
> I thought it was obvious but... the fact that a new CPU generation arrives every few years and a bigger space telescope every 30 years?
What's your question?
> Why do you claim it's because of the tech, based on no source whatsoever, when in almost every case the reason for delays in just about any project is either mismanagement or engineering not being able to fulfill what sales signed on?
I don't know what you're talking about but you don't seem to have understood what I was saying. It is a technology based project, and just like any big project which relies on advancing the state of the art (like the LHC and JWST), they can have problems including mismanagement.
> "Hurr durr big computers are hard" is not an argument.
That wasn't my argument. Was that your argument for LHC and JWST?
> It's not even a "single machine" (which would be much more complex to realize), it's a bunch of networked nodes as most (all?) modern supercomputers are, which reduces the scale immensely, as once you get the single server it's just a question of interconnectivity (which Intel will buy from switch vendor most likely) and plumbing.
I've worked on supercomputer bids before on big SSIs (the old SGI Altixes) and clusters, including one in the top 10 now, and they all run code I've written. It is actually far, far more than just cabling a bunch of COTS boxes and switches together.
> They are nonetheless iterative.
Certainly not. Some design shrinks and half nodes are relatively small jumps. This supercomputer bid was likely developed around 2013 soon after Intel scaled production of 22nm, and they expected it to be on 10 or even 7nm.
> And Intel is still using essentially the same process node as the generation before, and the same as their consumer chips.
That's because they had so many problems and delays with their silicon, that was not apparent in 2013, their roadmap blew out multiple times, by many years in the end.
> Intel wasn't inventing a new material or a new way to make chips for these; they planned to use the same chips that will eventually land in servers.
And their server business suffered badly as well during that time, for exactly the same reasons.
> The project already changed direction twice (from 180 petaFLOPS in 2018 to 1 exaFLOP in 2021 to now 2 exaFLOPS), which leads me to believe it's mostly a project management issue. The CPUs the supercomputer was supposed to be built with are already on sale, as the last iteration was upgraded to the "new" 2022 generation.
Big projects have big project issues. Project management quite likely had problems; nowhere did I suggest that was not the case. Just like the LHC and JWST had project management failures, cost overruns, re-scoping, etc.
Between OpenMP (older computers), CUDA (Nvidia), HIP (AMD), and now SYCL (Intel), not to mention more interesting architectures, DOE is probably plenty busy and has plenty of very fast computers that are hard enough to keep their codes working on. Aurora being delayed has damaged Intel's reputation and gets chuckled about, but I'm not sure how big of an effect it's really had on the ground.
The Aurora supercomputer is delayed 4 years... given the progress in compute, I wonder whether it will be obsolete on arrival? In 4 years a lot of new competitors should have emerged, right? If my GPU came 4 years late I would not be getting what I paid for (state of the art equipment). Or does Intel continuously adapt it to new architectures?
I wonder what the supply chain is doing to supercomputer builds for the last 2 years. We were waiting on servers, then waiting on GPUs, and now we're apparently waiting on power supplies. And this is just a small addition to a much smaller 'supercomputer' (800 nodes)
Blaming Intel for the US's lack of a new supercomputer is a bit weak. IBM can also produce supercomputers. There are also other computer "manufacturers" in the US: AMD, HP (Itanium), Qualcomm.
"The greatest shortcoming of the human race is our inability to understand the exponential function”. -Allen Bartlett
What she is complaining about is that an exponential trend (Moore's law leading to more compute) is stuttering. In this case it was just for one company and the world was lucky and had backup options. But undoubtedly there will be more stutters in the future, and there will not always be backups.
Science relying on an exponential trend like this will have more challenges in the future.
Same for all the other trends built around brute-force compute, like "big AI" deep learning or a lot of the wasteful programming practices that fill the front pages of HN regularly.
Intel is meant to do a process shrink every 2 years, which is meant to increase compute density and lower power. The last time they successfully did that was 2014 with 14nm, and since then they spent about 7 years getting to 10nm, and even now I'm not sure they're completely there. So basically all the assumptions about anything built with Intel CPUs getting faster were blown out of the water.
Her friends' careers are delayed because Intel keeps promising a supercomputer and they are allotted time with it to work on their research. The supercomputer keeps getting delayed year after year.
"Why doesn't she apply there and do a better job herself" - it's so simple!
Because not every scientific library in the world is ported to GPGPU (or even can technically be). CPU-heavy computing is a rule, not an exception.
E.g. in computational fluid dynamics, there are not too many GPU-ready codes available, and most of those are either beta-quality or already abandoned.
Agreed. In computational biology it is similar. About 5-10 years ago there was huge excitement over GPUs and there were things like GPU-accelerated BLAST (a program for comparing DNA/protein sequences against each other) and the like, but it really didn't revolutionize the field in general. HPC systems like Biowulf (the HPC cluster used by the US National Institutes of Health, a pun on "biology" and "Beowulf") are overwhelmingly CPU-based with a few GPUs that you can request.
Aurora was supposed to get most of its performance from Intel GPUs. And in general, a supercomputer means a very fast network between nodes, as otherwise most of the time is spent in communication. And finally, it's AMD GPUs holding the performance crown for HPC, not Nvidia.