Intel Reveals Post-8th Gen. Core Architecture 10nm+ Ice Lake (anandtech.com)
326 points by bauta-steen on Aug 16, 2017 | 237 comments



Hmmm. It seems Atom/Electron devs need to work hard to slow down new generation processors...


Atom has actually gotten significantly faster recently. It seems they've started to rewrite some core components of it in C++ for performance reasons. I've also used some Electron apps that are very lean. I think it's more of an implementation issue. Slack is pretty terrible at resource usage, for example, but Zeit's Hyper is very efficient and its resource usage is comparable to iTerm in my experience.

Slack's issues with their Electron app shouldn't be particularly surprising, either, considering it was their head architect who published an article on Medium advocating for PHP on the basis of its concurrency model...


> Zeit's Hyper is very efficient

Ladies and gentlemen! I present you the efficient terminal application, which only needs:

         Process        Memory      Threads
         -------        ---------   -------
         Hyper          40.8 MB     32
         Hyper Helper   51.9 MB     15
         Hyper Helper   18.8 MB     12
         Hyper Helper   15.2 MB     4

         Total:         126.7 MB    64
On a serious note, I think the insanity will stop when operating systems start shaming badly written applications and nudging users to get rid of them. It is in Apple's/Microsoft's interest, because users will blame their computers ("My Windows is getting slow...").

Phones already show a list of the power-hungry apps responsible for draining your battery; having this on a desktop would be nice too. If even a terminal needs 64 threads and 126MB of RAM now... it looks like some wrist-slapping is in order.
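
The building blocks for that kind of shaming are already there, for what it's worth. A minimal sketch of a memory "wall of shame" using the third-party psutil package (this only covers RAM, not energy):

    import psutil

    # Crude "wall of shame": the top resident-memory consumers right now.
    procs = []
    for p in psutil.process_iter(['name', 'memory_info']):
        mem = p.info.get('memory_info')
        if mem is None:          # process died or access denied
            continue
        procs.append((mem.rss, p.info.get('name') or '?'))
    for rss, name in sorted(procs, reverse=True)[:10]:
        print(f"{rss / 2**20:8.1f} MB  {name}")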


That's a lot more threads than iTerm uses for me, but it's less memory. Typically my work computer (a 2015 15" MBP) tends to be bottlenecked on RAM, too. 16GB sadly is pretty much the minimum viable amount of RAM for me to do full stack development these days.


I've maxed out my 64GB a few times and have seriously weighed upgrading to 128GB. Running the whole development environment on my localhost (where I'm the actual sysadmin) is really handy if your sysadmins are too busy with the business side of things to help you.


macOS already does present a list of apps that are using a lot of energy if you click on the battery indicator. I thought that Windows had a similar feature, but I'm not sure.


I don't know how useful that feature is for the purpose of "shaming", given that I have never seen Slack on that list on my MacBook Pro.


Slack, Chrome, and Atom are consistently on that list for me. IntelliJ/PyCharm often is as well.


Well, seeing as so many applications remain single-threaded despite years of mainstream multi-core CPUs, that shouldn't be too hard. Especially with many mentors in our industry cautioning new programmers that "concurrency is hard" so often that—at least within my small social network—new developers have been conditioned to avoid subjects like threading.

Yes, concurrency is a challenge, but it's tremendously rewarding. Plus modern languages make it much easier than it was in the past.


Don't worry, I'm sure they'll do it...


VSCode isn't similarly slow. Weird.


Microsoft put a lot of work into how VSCode buffers the HTML it's displaying for the file, and it runs very well because of it. It is actually a very cool setup; if you get a chance, do an element inspection of the Monaco in-browser editor, as it uses the same method. The same cannot be said for Atom, however.


I will say that, despite its many shortcomings and after I was able to overcome the netbook-induced Atom stigmas, I really do like the Atom for small-business NAS applications.


Pretty sure the comment was about the Atom and Electron editor / framework.

https://atom.io/ https://electron.atom.io/


Yes but I too read it twice before I got it.


As a consumer, it's great to see competition picking up in the CPU market. Intel does not seem able to hold its edge in process nodes as TSMC and Samsung are matching Intel, so it has to resort to architecture design.


Well actually I think it's more a case of "increase in performance is becoming harder and harder, the technology leader (Intel) is slowly becoming stuck and the competition (AMD) is catching up". Everybody will be more or less at the same level as it requires huge investments to only get a marginal advantage.

Apart from games, I would not be that afraid to install recent pieces of software (browser, office suite, OS) on a 10-year-old computer (2007 Core 2 Duo). But in 2007 I would not have thought for a minute that I could do the same on a 1997 computer (Pentium II), and installing 1997 software on a 1987 computer (80386)? Just no.

Last time I replaced my CPU (i5-6500 in place of i5-2500), I only saw a marginal improvement.

Even at my former job, I worked with some decent servers (32 to 48 cores, 128GB of RAM) to build Linux masters from scratch (Gentoo based). The oldest server of the bunch (which is 5 years old now) is still the fastest to build a master from scratch; it has fewer cores but a faster clock, and even on parallel tasks like compiling stuff, clock is still a more determining factor than core count.

There are still tons of things to improve in CPUs: power consumption, cost, embedded functionality (SoC)... but performance improvements look like a huge-cost, low-gain adventure right now and for the foreseeable future.


> Apart from games, I would not be that afraid to install recent pieces of software (browser, office suite, OS) on a 10-year-old computer (2007 Core 2 Duo).

Ah, I can answer that one for you: my work computer of that time (E6850, 8 GB RAM), which cost me less than a thousand euros to build back then, has since been reconverted into a work computer for one of my employees, running Windows 10, Office 2016 and Chrome all day long. The only addition has been a 120GB SSD.

It runs much better than the "modern" and "cheap" ~400 euro integrated work computers I bought from ASUS and HP in 2015.


Yep, an SSD and perhaps more RAM is the only upgrade most machines need these days.

I'd rather use an old machine that's been upgraded with an SSD than a brand new machine that only has a mechanical HDD.


Gains in instructions-per-clock are starting to flatten out, and that's where the gains have been coming from in recent years. Some time ago a paper was posted here that showed how, even with an infinite number of transistors, you will still be limited to the range of 3-10 instructions-per-clock for typical programs.

Clock speeds seem to have leveled off, and IPC will only see another gain of 50-100%. Single-threaded performance is close to the limit. What comes after that? Is this the end?


> Gains in instructions-per-clock are starting to flatten out, and that's where the gains have been coming from in recent years.

This is commonly claimed, but it's actually false for x86_64 desktop parts. For a single-core scalar integer workload, the IPC boost from the i7-2700K to the i7-7700K was maybe 20-25% on a great day, but the base frequency increase was a further 20%, and the max boost frequency increase ~15%. The frequency increase is of similar importance to the IPC increase.
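
Putting rough numbers on that (a back-of-the-envelope sketch; the IPC figure is the one quoted above, the clocks are the stock base/boost specs):

    # Back-of-the-envelope composition of the gains quoted above.
    # Stock clocks: i7-2700K = 3.5/3.9 GHz base/boost, i7-7700K = 4.2/4.5 GHz.
    ipc_gain   = 1.225            # ~20-25% IPC "on a great day"
    base_gain  = 4.2 / 3.5        # = 1.20
    boost_gain = 4.5 / 3.9        # ~= 1.15
    print(f"single-thread speedup at base clock:  ~{ipc_gain * base_gain:.2f}x")
    print(f"single-thread speedup at boost clock: ~{ipc_gain * boost_gain:.2f}x")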


When was the last time we saw a 50-100% performance gain in cars? airplanes? spacecraft?

Was it the end of those industries?

Welcome to mature technology.


> spacecraft

Economically? I think this or last year.


Economically, SpaceX is about 3% cheaper than Arianespace.

That's not a 50% or 100% improvement.

Maybe they'll get that improvement once they run recycled rockets all the time, but not before that.


@mlvljr: Your account seems to be shadowbanned, I can’t reply to your comment.

Currently, SpaceX has prices around 56-62 million USD per launch of a normal satellite (with a weight and orbit where they can recover the first stage).

Arianespace launches such lighter satellites in pairs, always two at once, at a price of around 60 million USD per satellite.

The Chinese launchers offer the same at around 70 million USD per launch.

So, the prices aren’t that different.

But, for launches from reused rockets, SpaceX is damn cheap. The first launch on a reused rocket cost below 30 million USD.

So, to recap: today, in the best case, SpaceX is between 4 and 13% cheaper than the next competitor. But in a few years, once they launch mostly reused rockets, they'll be around 50 to 60% cheaper than the next competitor.


I imagine that, while SpaceX will continue to improve their cost/kg to orbit and pretty quickly reach a launch expense of half the current cost with re-usables, until someone else can compete they could just increase their profit per launch enormously. Musk needs some serious capital for his Mars plans. I hope his global satellite internet provider concept works (I can't wait to have an option other than AT&T or Comcast) and brings in the big bucks. Then he won't need to make money on launches and can drop the launch price to close to cost, to help all space activities. Maybe even start selling re-usable rockets to other launch companies. Can't wait to see that day.

Long term, Musk is shooting for a ~100x reduction in launch costs to make a Mars colony feasible. Hope he makes it.


Isn't this an even further argument for cloud computing? If cost savings all come from having more cores at the same price, but end user devices can't put all those cores to work, having more of the compute intensive work happen on the back end amortized over many end users seems like the only way to benefit from improvements in cores per chip.


Memory and storage. Still big gains to be had there. Imagine if your whole hard drive was RAM speed.

Also more specialised cores e.g. DSP, and customisable hardware i.e. FPGA.


I distinctly remember a benchmark (which my google-fu is currently unable to find) between Intel chips with and without the Iris chip. Under similar conditions (base/turbo clock and core count), the Iris chip had about a 20% performance advantage.

It wasn't explained in the benchmark, but the only reason I could imagine was that the Iris chip worked as an L4 cache, because the benchmark was not doing graphics work. That is what the Iris chip does: it sits right there in the package with a whole bunch of memory available to the iGPU, or works as an L4 cache when available.

It's also a great way to do (almost) zero-cost transfers from main memory to (i)GPU memory -- you'd do it at the latency of the L3/L4 boundary. With Intel, that unlocks a few GFLOPs of processing power -- in theory; your code would have to be adapted to work this way in a reasonable fashion, of course.

To sum things up, I agree with you: memory is a path that holds big speedups for processors. I don't know if "the Iris way" is the best path, but it did show promise. Shame that Intel decided to mostly lock it up to the ultrabook processors.


I think the end point will be a massive chip with fast interconnects and a (relatively) huge amount of on-die memory, talking over a fast bus to something like NVMe on steroids.

My new ThinkPad has NVMe and the difference is huge compared to my very fast desktop at work, which has SATA-connected SSDs.


GPUs:

http://michaelgalloy.com/2013/06/11/cpu-vs-gpu-performance.h...

http://www.anandtech.com/show/7603/mac-pro-review-late-2013/...

This is behind much of the interest in machine learning these days. Deep learning provides a way to approximate any computable function as the composition of matrix operations with non-linearities. It does this at the cost of requiring many, many times the computing power. But much of this computing cost can be parallelized and accelerated effectively on the GPU, so with GPU cores still increasing exponentially, at some point it's likely to become more effective than CPUs.
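
To make that last description concrete, here is a minimal numpy sketch of the "matrix operations with non-linearities" idea (random placeholder weights, not a trained model):

    import numpy as np
    # Two matrix multiplies with a non-linearity in between -- the basic
    # building block described above. Weights are random placeholders.
    rng = np.random.default_rng(0)
    x  = rng.standard_normal((1, 64))       # input vector
    W1 = rng.standard_normal((64, 128))     # layer 1 weights
    W2 = rng.standard_normal((128, 10))     # layer 2 weights
    h = np.maximum(0.0, x @ W1)             # matrix op + ReLU non-linearity
    y = h @ W2                              # another matrix op
    print(y.shape)                          # (1, 10)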


"Deep learning provides a way to approximate any computable function as the composition of matrix operations with non-linearities."

Thanks, and I wish this sentence was one of the first things I read when I was trying to figure out exactly what Deep Learning really meant. It's much more comprehensible than the semi-magical descriptions that seem far more prevalent in introductory articles.

It's also fascinating that a seemingly simple computing paradigm is so powerful, kind of like a new Turing Machine paradigm.


"Deep learning provides a way to approximate any computable function as the composition of matrix operations with non-linearities."

This actually describes neural networks in general, not so much "deep learning".

Deep learning comes from being able to scale up neural networks from having only a few tens or hundreds of nodes per layer to thousands and tens of thousands of nodes per layer (and of course the combinatorial explosion of edges in the network graph between layers), coupled with the ability to process and use massive datasets to train with, and ultimately process on the trained model.

This has mainly been enabled by the cheap availability of GPUs and other parallel architectures, coupled with fast memory interconnects (both to hold the model and to shuttle data in/out of it for training and later processing) and the CPU (probably disk, too).

But neural networks have almost always been represented by matrix operations (linear algebra); it's just that there wasn't the data, nor the vast (and cheap) number of parallelizable processing elements, available to handle it. The closest architectures I can think of that could potentially have done it in the 1980s/90s would be from Thinking Machines (Connection Machines) and probably systolic array processors (which were pretty niche at the time, mainly from CMU):

https://en.wikipedia.org/wiki/Systolic_array

https://en.wikipedia.org/wiki/WARP_(systolic_array)

These latter machines started to prove some of what we take for granted today, in the form of the NAVLAB ALVINN self-driving vehicle:

http://repository.cmu.edu/cgi/viewcontent.cgi?article=2874&c...

Of course, today it can be done on a smartphone:

http://blog.davidsingleton.org/nnrccar/

The point, though, is that neural networks have long been known to be most effectively computed using matrix operations; it's just that the hardware (unless you had a lot of money to spend) and the datasets weren't there to enable what we today call "deep learning".

That, and AI winters didn't help matters. I would imagine that if somebody from the late 1980s had asked for 100 million to build or purchase a large parallel processing system of some form for neural network research, they would've been laughed at. Of course, no one at that time really knew that what was needed was such a large architecture, nor the amount of data (plus the concept of convolutional NNs and other recent model architectures wasn't yet around). Also, programming for such a system would have been extremely difficult.

So - today is the "perfect storm", of hardware, data, and software (and people who know how to use and abuse it, of course).


I don't think GPUs are a particularly good solution for this; they aren't the future and won't be around for mass deployment that much longer.


It seems the author is down the 'deep learning' rabbit hole.

>> It does this at the cost of requiring many, many times the computing power. But much of this computing cost can be parallelized and accelerated effectively on the GPU, so with GPU cores still increasing exponentially, at some point it's likely to become more effective than CPUs.

So can any matrix operation. Sadly, there aren't that many algorithms that are efficiently represented by one.


That's quite a statement - what will replace GPUs for the ever increasing amount of ML work being done?


TPU-like chips, though they can be (partially) included on GPUs as well, as is the case with the latest Nvidia/AMD GPUs.


There's nothing special about the TPU. The latest GPUs are adding identical hardware to the TPU, and the name "GPU" is a misnomer now since those cards are not even intended for graphics (no monitor out). GPUs will be around for a very long time, just not doing graphics.


Yep. The core idea of attacking memory latency with massive parallelization of in-flight operations, rather than large caches, makes sense for a lot of different workloads, and that probably isn't going to change.


> Some time ago a paper was posted here that showed how, even with an infinite number of transistors, you will still be limited to the range of 3-10 instructions-per-clock for typical programs.

Do you know what paper that was? I would have thought that with infinite transistors you could speculatively execute all possible future code paths and memory states at the same time and achieve a speedup that way.


Oldie but goodie:

http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-93-6.pdf

Speculation can only take you so far. How do you speculatively execute something like:

   a = a + b[x];
?

You can't even speculatively fetch the second operand until you have real values for b and x.

Trying to model all possible values explodes so much faster than all possible control paths that it's only of very theoretical interest.
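
A toy illustration of that last point (just counting states; this is not from the linked paper):

    # Rough illustration: states to track after k unresolved events.
    for k in range(1, 4):
        control_paths = 2 ** k            # each branch has two outcomes
        value_states  = (2 ** 64) ** k    # each unknown 64-bit value could be anything
        print(f"k={k}: {control_paths} control paths vs {value_states:.3e} value states")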


It's not the end, if we as software developers can stop counting on the hardware folks to improve performance and do the hard work necessary to parallelize our apps. (This includes migrating components to use SIMD and/or GPUs as appropriate.)


I believe the last big improvement in power consumption (at least for Intel) was the Haswell generation of chips. What they have been doing since is cutting back support on older chips (2nd and 3rd generation specifically).


> Apart from games, I would not be that afraid to install recent pieces of software (browser, office suite, OS) on a 10-year-old computer (2007 Core 2 Duo). But in 2007 I would not have thought for a minute that I could do the same on a 1997 computer (Pentium II), and installing 1997 software on a 1987 computer (80386)? Just no.

People like to say a Core 2 Duo can still hang, and while it might be okay for basic tasks (and you could use a 10-year-old computer in 2007 for basic tasks like word processing and internet), modern PCs are much faster.

On benchmarks you're going to be 4-6x faster on most tasks with a 2017 MacBook Pro compared to a 2007 one, and that's before you even get started with anything that takes advantage of SIMD or GP workloads.

The other reason people say they can use a 10 year old PC today is because they've upgraded it. Early Core 2 Duo systems shipped with 512MB or 1GB RAM. They came with very slow 80GB hard drives. Upgrading to an SSD and 4-6GB of RAM is a must.


I agree, these days a Core 2 Duo or a Core 2 Quad is getting long in the tooth. Sandy Bridge and beyond are still decent-performing chips.

Edit: I guess I was ahead of the curve, because my Core 2 Duo box had 2 GB of RAM out of the box (upgraded to 4 GB of RAM later, requiring me to reinstall Windows as a 64-bit edition, and then swapping the chip for a Core 2 Quad).


Well, I'm not denying that, I'm exactly in that situation for my laptop:

A 2007 ThinkPad X61, Core 2 Duo T7100, but with 4GB of RAM and an SSD. To be fair, I'm cheating a little as I use Debian+dwm, which is lighter than Windows 10.

But it holds itself pretty well and CPU limitation will probably not be the reason I change it. The screen resolution (1024x768) will probably be the main motivator.

But this only illustrates that there was some improvement because of RAM evolution (4 to 8 GB of RAM now for average PCs vs 1 to 2 GB back then) and huge improvement because of disk evolution (SSDs vs mechanical drives). The CPU is far from being the main improvement factor for common usage over the last 10 years.


FWIW The technology that often makes older computers more tolerable these days tends to be the SSDs. I have a mid-2009 MBP with 8GB of RAM and a 256GB SSD that I still use for light duty web browsing and as a DAW in my little hobby studio.


> There are still tons of things to improve in CPUs: power consumption, cost, embedded functionalities (SoC)

Don't forget bugs! We have had buggy silicon, microcode and drivers, all of which probably still need fixing.


> Apart from games, I would not be that afraid to install recent pieces of software (browser, office suit, os) on a 10 years old computer (2007 core 2 duo).

The only issue I have with my old laptop from around then is browsing the web. Everything else, the performance is fine.


Maybe the gains are running out, but that song has been sung for decades and hasn't proved true yet.

It's hard not to look at everything and interpret the marginal gains of the past half decade as monopolist laziness/arrogance.


Ice Lake has been delayed from 2017 to 2018, so I'm not sure what competition has to do with it.


> it's rare these days for Intel to talk more than a generation ahead in CPU architectures.

This, to me, looks like the PS2 vs Dreamcast move, where Sony blocked Dreamcast sales with PS2 hype. We've been waiting for 10nm for a while now, and now Intel is essentially confirming "yes, it's really coming this time" where they might not have otherwise--it's hard to not see that as a reaction to Threadripper. At least on the surface, increased competition has led to increased transparency, which is good for consumers...or at least I'm happy knowing I have the option to avoid a big, hot, power-hungry chip if I want to.


The Osborne effect: https://en.wikipedia.org/wiki/Osborne_effect

Although it has been deliberately deployed many times now, as in the example you note.


Seems like Intel is willing to cannibalize Skylake-X to hurt Threadripper. I think that means Threadripper must be looming large at Intel.


Honestly, the X revisions are forgettable to begin with, unless you need that many cores (no ECC either).


And people keep falling for it.

See: Itanium.

Never compare an existing product to one that is not yet being sold. Never.


Itanium was such a strange ... event. A complete outlier, it is even from today's perspective not easy to see why it happened.


Most people buying a laptop don't really care or know what generation of CPU they have. I'm not convinced it hugely impacts sales.


It's the bulk enterprise purchases that they're worried about, not individuals. That may include bulk purchases to build laptops, but it's probably more datacenters and high-performance computing they're worried about.


I didn't wait, even though I'm excited about what Ryzen mobile chips might bring, and frankly an i7-7700HQ is enough for a laptop; if I need more grunt than that, I'll switch to a desktop.


Nice, I never knew what it was called. Thanks!


I like that idea. It would mean Intel had to show its hand a bit early with large promises for the near future. I admit I am a bit nostalgic for the crazy days of CPU clock speeds doubling every year. Then the early GPU wars.


The most obvious/shameless situation like this was when they "launched" their Core i9 processors.


We never had a date for Ice Lake. I think we only started hearing about it this year.

It's Cannon Lake (Intel's first 10nm chip) which is supposed to come out this year.


Wow. Great comment.


Now that Intel has competition again with very competitive new AMD CPUs, it is releasing "new" CPUs to the public that have been waiting in their basement for some time. AMD's CPUs are a lot faster than Intel hoped, so Intel has to skip a generation.


In retrospect, Intel should have bought Nvidia when they had the chance; GPUs are the only area making huge progress year over year now.


Besides GPUs, memory (DRAM), storage (SSDs, hard drives), wired networking (Ethernet, Thunderbolt, Fibre Channel), wireless networking (WiFi, Bluetooth, cellular) and displays (monitors, VR) are all still keeping pace with their respective versions of Moore's Law. Of course they still aren't going to catch up to CPUs any time soon. (Never in the case of networking, since light travels only so fast).


An even better course of action would have been AMD merging with Nvidia instead of ATI.


They couldn't; their logo colors don't match.


What? Nvidia = green, AMD = green :S


Apparently AMD and Nvidia discussed this, but Nvidia's CEO Jensen Huang wanted to be CEO of the combined company. I think AMD saw the deal as an acquisition, not a merger.


But we're all out of DeLoreans...


That ship sailed 11 years ago.


Intel didn't buy NVIDIA, but they did buy Altera, and there's plenty of room for growth with FPGAs.


Aren't FPGAs mainly for the design phase, with the real crunching in the industry done on ASICs? At least the whole automotive industry works that way: FPGAs to design/test stuff, ASICs for production and making money.


Nope, plenty of people do heavy computing with FPGAs.

The speedup from FPGA to ASIC is not that dramatic. The ASIC's real advantage is power draw and amortized cost. FPGAs also have to initialize when powering up.


It mostly depends on how many you are planning to sell. For a given performance/functionality, FPGAs cost more per chip than ASICs. But ASICs come with a much greater upfront fixed cost.
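
As a toy sketch of that trade-off (all the figures below are made-up placeholders, just to show the break-even shape):

    # Toy break-even calculation between FPGA and ASIC per-unit economics.
    asic_nre  = 2_000_000      # one-time mask/tooling cost (placeholder)
    asic_unit = 10             # per-chip cost once in production (placeholder)
    fpga_unit = 100            # per-chip cost, negligible NRE (placeholder)
    breakeven = asic_nre / (fpga_unit - asic_unit)
    print(f"ASIC wins past ~{breakeven:,.0f} units")   # ~22,222 units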


Note the FP of FPGA - "field programmable".

What this means is that to implement the hardware the FPGA represents, you have to "program" it; this is typically done in one of only a couple of languages (VHDL and Verilog, known as Hardware Description Languages, or HDLs).

At one time, Xilinx (Altera's competitor) made an FPGA which could be programmed "on the fly" very quickly; it (well, many thousands) were used for an interesting machine, of which only a few examples survive (the whole thing at the time was surreal, if you followed it - it seemed like a scam more than anything, but real hardware was shipped).

This machine was called the CAM-Brain machine, and was the creation of researcher Hugo de Garis (who is retired, and is a seemingly strange fellow in the AI community - but not as strange as Mentifex):

https://en.wikipedia.org/wiki/Hugo_de_Garis

http://dl.acm.org/citation.cfm?id=591856

https://profhugodegaris.files.wordpress.com/2014/02/arj-rev2...

I encourage you to research this machine, and Mr de Garis, as the whole thing is fascinating (and I will also say, from a design perspective, the shipped CAM-Brain Machine was one of the "sexiest" looking boxen since the early Crays).

CAM-Brain meant "cellular automata machine brain" - it was basically an effort to evolve a neural network using CA and FPGA; the CA would evolve the HDL which described the hardware representation of the NN, which would then be dumped to the FPGA for processing. The process (from what I understand) was iterative.

I don't believe the "kitten" ever went past much more than some early 3D models (maybe some CAD, too) and a software simulator. At least, that's what you can still find out there today (images of the simulator running on Windows NT, iirc).

The effort was noble, but it didn't work for more than simple things. I think it was part of the "evolve-a-brain" NN dead end, which seemed to hold out some promise at the time.

That's just a bit of background, but it shows how Intel and FPGAs can be used for building hardware to represent neural networks (a GPU/TPU is not a neural network - it is merely a processor for the software representation of the neural network). Whether that's their intention, or something else (maybe something like Transmeta tried?) - only they know.


That's pretty cool! Thanks for sharing this! I'll take a detailed look at it :)


Is this Intel's response to AMD Threadripper and EPYC?


No, that would be Cascade Lake-X and Cascade Lake-SP.


According to the table in the article:

- 2011: 32nm

- 2012: 22nm

- 2014: 14nm

- 2018?: 10nm

I don't know much about foundry processes, but it seems that it's taking more and more time for smaller and smaller gains, right? At this rate, how long until we reach sub-nanometer? What are the physical limits of these processes, and do they have any implications for end users? Will we be using 2nm CPUs for 50 years?

Would love to hear the thinking of anyone educated on the topic.

Edit: very intrigued by the sustained downvotes on this ¯\_(ツ)_/¯


The step from 14 to 10 nm is huge, both from a technological perspective on the manufacturing side and in its effect on the number of transistors on a die and the power consumption of those transistors. Remember that power consumption and the number of transistors are related to surface area, so there is a square factor in there: 14² = 196, 10² = 100, so that's almost a doubling of the number of transistors and approximately a halving of the power required per transistor for a given die area.
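
The square-law argument spelled out (a naive sketch that takes the node names at face value; later comments note real node names no longer map to feature sizes this cleanly):

    # Naive scaling from the node names alone.
    area_14 = 14 ** 2      # 196
    area_10 = 10 ** 2      # 100
    print(f"density gain: ~{area_14 / area_10:.2f}x")          # ~1.96x transistors per area
    print(f"power per transistor: ~{area_10 / area_14:.2f}x")  # roughly halved, same die area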


Okay, so the node names are effectively useless at this point. They used to refer to gate length, but no longer, even for Intel. Oh, and Intel's 10nm will actually have lower performance than their 14nm.

Besides, it matters not, the bottlenecks today are in memory and interconnects.


> Oh, and Intel's 10nm will actually have lower performance than their 14nm.

Less than 14++, sure, but 10+ and 10++ will fix that.


Yes, but I was just pointing out that scaling is now neither a panacea nor free.


Performance compared to what? Same power, Same price, or just the performance of the fastest processor of the series?


At some point, maybe we'll start seeing RAM put onto the CPU itself.

I mean, more than it already is (with cache).


You will see stacked memory and silicon interposers, but you won't see main memory on the CPU die. DRAM is based on an array of what is called "trench capacitors." The fabrication process is sufficiently different that they don't even make these in the same facility, much less on the same die process. An array of trench capacitors will always be smaller than transistor based memory (SRAM.)


There are alternatives on the horizon, e.g. STT-RAM, MRAM, memristors, PCMs, ReRAM (same as memristors according to some...), spintronic devices, etc.

There are also other attempts such as FBRAM to replicate DRAM structures without the need for insane A/R trench capacitors.

I believe that such solutions are necessary to continue scaling.


It is not a big problem to make DRAM on pretty much any SOI process; it's just that power consumption and refresh rates will have to be quite high.

The problem with MRAM is unreliable reads. They are excellent for low clock speed devices, but as you go into the gigahertz range, the signal quality of an MRAM cell begins to degrade, and you have to put a Darlington or a BiCMOS transistor on top of it, thus negating its cell-size advantage.


For DRAM, are you talking about standard deep-trench capacitor DRAM or FBRAM?

Agreed on MRAM, but it is also a very immature technology, so there's hope at least, unless you're talking about crossbar crosstalk, which can be solved with a diode.


About floating-body RAM and others that rely on the capacitance of the substrate itself rather than a dedicated capacitor.


Ah, I see. But it's not significantly higher than other DRAMs, right? At least last I checked the difference wasn't that big.


I believe that the data published by people peddling embedded DRAM IP is their "best case scenario", with still-significant alterations to the manufacturing process.


While true, the same math holds for 22² (484) to 14² (196).

Real world gains will never be as high as the math suggests, as you get into leakage currents etc.


Sure, I never meant to imply that previous process steps were much smaller, just that this one is still formidable in its own right. Real world gains will not be 100% but they're a very large fraction of that. Obviously any technological advance in a mature industry is going to show reduced return on investment at some point, it's rather surprising that the ROI on these process shrinks is still worth it given that we are now well beyond what was thought to be possible not all that long ago.


Yeah, so the node names now apparently refer to the "smallest feature size", which is some random thing on the M0 metal layer. Source: a former Intel engineer of more than a decade.


So, nothing like when games consoles used to advertise how many "bits" they had: take whatever has the widest bus and advertise that as the number of "bits", or use tricks like the Atari Jaguar: 2x 32-bit CPUs = 64-bit, right? RIGHT?


That's how Intel define it; other fabs have their own definitions.


There are various limits in play: https://www.extremetech.com/computing/97469-is-14nm-the-end-.... At sub-nanometer we're talking about features of 5-10 atoms across. At that scale you get effects like electrons quantum tunneling between transistors: https://www.theverge.com/circuitbreaker/2016/10/6/13187820/o.... We probably won't get there with existing silicon technology.


I read on HN, a couple months ago, these numbers no longer represent the physical size of anything but are now just a marketing label, a sort of 'performance equivalent to a theoretical size of'.

Anyone know if there's any truth to this? Might try to find the comment later when I have time.


Think of them as relative indications of feature size and of spacing between identical parts (arrays, if you want a software analogy), so even if an actual transistor will not be 10 nm or 14 nm, their relative sizes on one axis will still relate as 10 to 14. Sticking to numbers from a single manufacturer will definitely aid the comparison.

There is a ton of black magic going on here, with layers being stacked vertically and masks not having any obvious visual resemblance to the shape they project on the silicon, because of the interaction between the photons / X-rays and the masks, due to the fact that the required resulting image is small relative to the wavelength of the particles used to project it.

There is a super interesting youtube video floating around about this that I highly recommend, it's called 'indistinguishable from magic':

https://www.youtube.com/watch?v=NGFhc8R_uO4

It's up to date to 22 nm. Highly recommended.


Here is the talk a few years later: https://www.youtube.com/watch?v=KL-I3-C-KBk


I've seen that video several times now and it just doesn't cease to amaze me.

It's really a must-see for anyone interested in processor technology.

Oh, there's a new video! Awesome!


If you think that lithography is challenging, your brain is going to invert looking at the litho technology for 7nm and onwards :)


Yeah, 7nm doesn't mean anything now. Refer to them effectively as product names. Intel is really no better.

The "7nm/10nm" transistor fin pitches are something like ~50 n, and the length is something like 100-150nm


I don't have any direct experience with "deep submicron" stuff, but from what I've read you basically can't trust these numbers to be comparable. The various sizes/spacings don't scale together the way they did for larger feature sizes, so you could have e.g. a "14nm" process where the area of an SRAM cell, NAND gate etc. ends up the same size as another foundry's "20nm" process even though the actual transistors are smaller.


Intel's are actual measurements. TSMC, GF, and Samsung are all marketing BS.


They're all marketing, Intel is no exception. At 40nm and over, the Intel node names were larger than the industry average, now it's the other way around.


> I don't know much about foundry processes, but it seems that it's taking more and more time for smaller and smaller gains, right? At this rate, how long until we reach sub-nanometer? What are the physical limits of these processes, and do they have any implications for end users? Will we be using 2nm CPUs for 50 years?

The lattice constant of crystalline silicon is 0.54 nm, and since it's an FCC structure the distance between neighboring atoms is 0.38 nm. So with a hypothetical 2 nm CPU, some feature would be only roughly 5 atoms across, which leaves VERY little room for manufacturing tolerance. How would one manufacture a device containing billions of transistors with such small tolerances? I don't think we'll ever see such things, at least not with a lithography approach.
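
The arithmetic behind that "5 atoms" figure, as a quick sketch:

    # Silicon lattice constant ~0.543 nm; FCC nearest-neighbour spacing = a / sqrt(2).
    a = 0.543
    spacing = a / 2 ** 0.5        # ~0.38 nm
    feature = 2.0                 # hypothetical "2 nm" feature
    print(f"~{feature / spacing:.1f} atoms across")   # ~5 atoms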

Heck, I think it's close to black magic that they are able to produce transistors on a 10nm process, but apparently experts say that up to 5nm (13 atoms!!!) might be possible.


I think that beyond our current processes, there is the potential for different materials to take the place of XXnm silicon processes, which could fit more transistors in a smaller area.

Research like: http://news.stanford.edu/press-releases/2017/08/11/new-ultra...

"3D" processes which build multiple layers on top of one another may also see more investment as other methods become prohibitively expensive. And once you've cheaply gone to 2 layers in a big blob of epoxy, what's stopping you from doing 4? 8? 16? 32? [Heat dissipation, probably]

But whatever, people have been saying Moore's Law is dead since Moore's Law was invented. Who knows whether we'll technically hit one milestone or another. Things get faster, what the hell.


People are already stacking multiple layers today for memory, although the layers are always manufactured separately and then bonded together in a separate step.

I wouldn't be surprised to see more of that in the future, think caches on top of cores, but I doubt we'll ever see multiple layers of transistors produced in a single step. Technical challenges aside, the economics of trying to produce multiple layers at once are just always going to be worse: higher latency from a wafer entering a fab to finishing the wafer, and much higher rate of defects. (When you produce the layers separately, you can potentially test them separately before putting them together, which is a huge win for defect rate.)

It's possible that manufacturing multiple layers at once might eventually allow for a higher density of vertical interconnects, but I just don't see that becoming the deciding factor.


It's widely assumed 5nm is the limit. I know I've seen others discuss some ideas around how close they can get, but I'm struggling to find the thread. In any case, this may help: https://en.wikipedia.org/wiki/5_nanometer

Found it.. http://semiengineering.com/will-7nm-and-5nm-really-happen/ and the HN discussion from several years ago: https://news.ycombinator.com/item?id=7920108


Also: Is 7nm The Last Major Node? https://semiengineering.com/7nm-last-major-node/


Going from 14nm to 10nm increases the number of transistors per mm² from roughly 37 million to 100 million. That's a huge difference!


This is a sensible question. Not sure either why the downvotes. I'm curious as to the answer myself. Although, I don't know if we'll ever see sub-nanometer. Maybe that's the reason for the downvotes, that sub-nanometer is not really in the realm of what's possible with current CPU architectures and the physics of silicon. Although, that's simply based on today's physics. Who truly knows what the future will bring.


Didn't 14nm arrive late 2015 or in 2016?


If you consider the relative change (worked out in the quick sketch after this list), it looks like:

2011: 32nm

2012: 31% size reduction

2014: 36% size reduction

2018: 28.5% size reduction
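
The percentages above are just the linear ratios of the node names (rounding aside); a quick sketch of the arithmetic:

    # Linear shrink between successive node names, matching the list above.
    nodes = [32, 22, 14, 10]
    for old, new in zip(nodes, nodes[1:]):
        print(f"{old}nm -> {new}nm: {(1 - new / old) * 100:.1f}% size reduction")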


Yes, but in 1, 2, and 4 years respectively.


In 8 years, in 2026, we expect 7nm.


All you get now is power reduction. Performance has been completely bottlenecked since 22nm. Peak computing!


I wonder if this one will come with free backdoors and spyware installed, thanks to the wonderful Intel Management Engine (Intel ME) backdoor. [1][2][3]

Intel (and AMD) keep pushing more and more proprietary code that cannot be read, changed or removed. No one knows exactly what it does, and it has built-in screen and key recording. It's my advice, and the advice of privacy advocates, that no one should purchase or use any processor made by Intel or AMD until they address these serious issues.

1. https://libreboot.org/faq.html#intel

2. https://puri.sm/learn/intel-me/

3. https://news.ycombinator.com/item?id=14708575


I'm not sure this should be discussed in this thread.

Also, I don't know of any alternative that doesn't have large unauditable blobs integrated into the chip.

All ARM SoCs come with radio processors that are running a non-trivial piece of software with full access to the system memory, which is responsible for power management, boot sequence and wireless communications. It is by definition network connected.

AMD has a technology it calls the Platform Security Processor (PSP for short) which does basically the same thing.

To have a processor that doesn't have this kind of technology, you have to give up on decades of advancement in compute power, or buy a very expensive and non-portable POWER8 or POWER9 system.


Why should a serious backdoor, privacy concerns and ethical problems with a monopoly's new product not be discussed in a thread about that product? Not sure I get your point on that.

But yeah, you are totally right on the alternatives. Nothing quite matches Intel and AMD, and a lot of those ARM SoCs have proprietary code running in their bootloaders too. But you can get some processors from 7 years ago that are usable.

OpenPOWER is fantastic, though, and has real potential. There were a few projects out there looking to build a laptop and a personal desktop computer using it, but they unfortunately didn't reach their funding goals.

I think the more people know about Intel and AMD's shady practices, the more funding open hardware projects can get, and maybe in the next few years we can replace Intel and AMD with ethical and open solutions.


I agree, this has to be allowed to be discussed; it's literally about the product.

I hadn't heard about OpenPOWER; I hope more people are made aware of alternatives so they can get funding and traction.

There are some ARM processors that live without blobs. I think Olimex produces what they call open-source hardware (OSHW); is that an acceptable option?


I meant that as in, there have been plenty of dedicated discussions threads on this site and many others regarding the Intel ME. Most people here know about the ME by now, and we don't have to bring it up in every single Intel-related thread.


> a lot of those ARM SoC's have proprietary code running on their bootloader too

Usually possible to replace that blob! e.g. https://github.com/christinaa/rpi-open-firmware for the Raspberry Pi


Check out the Talos II motherboard. It's a workstation-class motherboard with dual POWER9 CPUs for $2750, which is a good price for a workstation IMO. They claim that all their firmware is open source, and the specifications are quite modern. The only problem is the (kind of) exotic architecture, but many people would be able to use it with open source software.


> It's a workstation-class motherboard with dual POWER9 CPUs

Isn't it more accurate to say it might at some point be available as a motherboard with POWER9 CPUs?

I mean, it looks very interesting, but afaik no-one has been shown even a prototype yet?


I was scanning through the comments to see if somebody had already mentioned this, and if you hadn't, I would have.

I am finding the Talos II an increasingly attractive proposition, even though the prices for a full system are quite staggering compared to mainstream hardware.


Though if you're comparing the Talos II to a Mac Pro, the price difference isn't quite as staggering :P


> All ARM SoCs come with radio processors that are running a non-trivial piece of software with full access to the system memory, which is responsible for power management, boot sequence and wireless communications. It is by definition network connected.

The high-end ones used for flagship smartphones/tablets do, but low-end ones used in cheaper tablets/TV boxes and more specialized hardware often don't have any radio interface.


You know how deeply unrealistic your advice is, right?


Do you know the depths of not taking that advice and what lurks in them? Do you know that if everybody simply took it to heart, there'd be nothing unrealistic about it at all? How many months of abstinence and solidarity would be required to end these practices, or the companies if they so wish? And then that money simply shifts to ethical companies and we actually have a future. Or, we keep pretending it's all so very hard, and don't have one.


You are asking for the whole of humanity to stop buying some of the most sought after products of modern times from two of the best-selling makers of that industry.

I am all for some philosophical discussion, but being this detached from reality doesn't do you any good. Being able to see the stars doesn't mean you can reach them right now...

So yes, in summary: it is hard, to the point of impossibility.


Pfff. It's the strike breakers that make it hard.

Keep that up for a while longer, and it will become a physical impossibility, as any gesture of resistance leads to automatic extermination. Until then? Thanks for nothing.


> Do you know the depths of not taking that advice and what lurks in them?

Pretty much everyone can imagine computers being insecure and unreliable, since computers are currently insecure and unreliable.


Not to mention that they wouldn't be secure and reliable just from removing the Intel ME.


Who's talking about unreliability? I'm talking about perfectly reliable tools of oppression.


Interesting. Pragmatically, are there any okay-ish options for consumer-level processors and motherboards that are not Intel or AMD?

ARM devices?


The Talos II[1], which is an IBM POWER9-based machine. It's a bit more expensive than a standard Intel machine (~$2k for the whole prebuilt machine, a bit less for just the motherboard+CPU).

Everything in it is free, including all of the firmware, and the CPU is an open specification.

[1]: https://www.raptorcs.com/content/base/products.html


Does anyone here know someone who works on these various management engines? It'd be interesting to see whether the security services were involved, or whether they really are backdooring all computers, right?

My guess is it's definitely possible, but it would have been popped by foreign agencies by now too, and there would have been a leak of tools to exploit such devices? I guess it's very tempting to be able to hack any device, though, so knowing the NSA they are probably all for doing this, fuck the consequences.


There was this floating around a while ago: https://i.redd.it/id88hvysu3ny.png


Well, this seems quite insubstantial. If an Intel employee in that position wanted to leak some real info, I would assume it would be accompanied by something that gives the information some credibility.


You mean an anonymous post on 4chan about Obama spying on Trump isn't super credible? Color me shocked...


Many people actually need hints like that :-)


"The stories and information posted here are artistic works of fiction and falsehood. Only a fool would take anything posted here as fact."


The backdoor is super useful for large enterprises, don't hope for much.


The issue I have is companies like Google and Puri.sm have asked Intel and AMD for a blank signed blob that completely disables ME but they have refused this. It would take them literally no time at all. This raises all sorts of red flags that something dodgy is going on.

If you had the chance to make a customer who builds millions of Chromebooks happy, wouldn't you take every opportunity to help them, especially if it costs you little to no money at all? Obviously there is a big reason why they don't want this backdoor removed.


Which is why someone with deep enough pockets and some help from the community (crowdfunding?) should invest in making open alternatives possible. Thousands of people have been laid off by big silicon corporations in the past; I refuse to believe there aren't 10 of those people in the world who could be hired to design an open platform. It doesn't have to be as fast as modern processors; if it allows opening a web page at acceptable speed or playing a video at 720p 30fps, that is more than enough for most of us, and more importantly it would send a huge message.

Many would of course disagree, mainly gamers who would sell their soul to the devil for a faster graphics card, or other people who don't care about their privacy.

Once the design is done, there's the fab. Decades ago any company would have had to set up its own, but today there are fabless companies who design chips and fabs producing them for various customers, so it's just a matter of money.

The goal isn't to create an alternative with respect to computing power, but rather in usage. The message is "we're not using your bugged shit to communicate among us or keep our data".


Halvar Flake (Google P0 security guy) talks a bit about it at the last Black Hat Asia: https://www.youtube.com/watch?v=JCa3PBt4r-k.

Basically, he says that even Google is puny (in terms of production units) next to Intel or Samsung, and cannot ask for custom firmware.

Hardware security is currently a shit show because of global monopolies/"oligopolies".


If Google is tiny in Intel's eyes, who isn't?

They operate a top 3 cloud service and have enormous internal data centers as well.


Companies that act as OEMs for enterprises most likely have a larger footprint of Intel installs than Google. Any single company's usage of a product is dwarfed by how many effective installs a large OEM might have.

Maybe if Lenovo, Toshiba, Acer, Dell, etc all asked Intel to provide said blobs (and the threat was tangible) then they would probably reconsider.


It would be more useful for them if it could be controlled at the source level. The management engine would be fine if it were free software and could be replaced.

I'm very unhappy with my old Sun servers, for example, because the management system cannot be upgraded and the servers are no longer supported. I'm stuck with proprietary insecure software that I depend on and that I have no way of changing. It's all worse if the insecure outdated software can only be replaced by soldering wires to a chip on the board.


This is disturbing, to say the least. Given how much effort I've invested in securing myself, it's... disappointing. The rationale, it seems, is that government doesn't count as "someone to be concerned about", from a security point of view.

I'm curious about how one would be associated with a particular chip. I understand that keystrokes can be logged and TCP/IP can be read; you can be scraped. But ultimately, how is their backdoor aware of you, so that you don't appear to them like a needle in a stack of needles? A fascinating and revolting technical conundrum.


If CPU backdoors exist how has nobody logged network traffic required to abuse them?


Malware is in fact using the IME.[1] And remotely exploitable vulnerabilities have been found in it.[2]

[1] https://blogs.technet.microsoft.com/mmpc/2017/06/07/platinum...

[2] https://arstechnica.com/information-technology/2017/05/intel...


I'm surprised I didn't hear about this until now. It looks like the user had to have enabled AMT, so this isn't exactly conspiracy levels.

It is troubling how little priority the computer world gives to proper security models.


Are you certain that the advice you are giving is suited to the security and privacy objectives of the masses?

Also, the management engine is more about the chipset than about the core, which is what the announcement is about.


Yep. Once everything is under control, having the freedom to write your own software, especially software that challenges the rules, will be useless...


So it affects only desktop workloads, right? Because on servers, as long as we use virtual machines, we should be good. Am I right?


No. Basically, the Intel ME is a completely separate ARM processor that's physically stuck onto each Intel processor. It has direct access to everything the Intel chip does: the memory it's allocating, the hardware commands (i.e. keyboard, mouse, display), the software running, the processes running. This all happens at a higher privilege level than the actual Intel processor, and you have no control over it at all.

Basically, whatever you run at any level on your Intel chip can be monitored by the Intel ME, no matter how many VMs, operating systems, or encrypted files/processes you have installed/are using.


It's not an ARM processor in the case of Intel.


Ahh, thanks, sorry, I was getting confused. It's AMD's PSP that uses an ARM-based spyware kit. I wonder what the Intel ME actually runs on then. Probably just another Intel chip?


> no one should purchase or use any processor made by Intel or AMD until they address these serious issues.

So go back to the stone age by not using any PCs/servers? Nice troll.


Have a look at https://minifree.org/ and a few Chromebooks (obviously with the operating system replaced). There are some options, but yeah it's a big problem that the microprocessor market has been locked up by two monopolies.

But I guess people have to make a personal judgement. Are ethics, privacy, and freedom more important than a faster processor to run your games on?


Sure, because we only use computers to play games on...


In addition to MiniFree, there's the Talos II[1] which is an entirely free motherboard and CPU (based on IBM's POWER9). It's a very modern CPU specification, and is also fairly powerful. Currently pre-orders are open. They are a bit pricey (~$2k for a fully prebuilt machine), but if you feel that you want a more powerful CPU that is an option. They also have server offerings.

[1]: https://www.raptorcs.com/content/base/products.html


Back in the days when a new computer became hopelessly obsolete within 3 years, I would never have considered spending that much. But perhaps now I might :)


What will the clock be for single-threaded workloads? Can we finally get above 5GHz?


We have been over 5GHz for a decade; it just takes LN2 to do it.

We aren't going to see ludicrously high clock rates in the foreseeable future. There are a lot of compounding factors as to why, but the biggest ones are the pressure for efficiency driving designs that aren't dumping higher and higher voltage to get frequency, and the diminishing returns on voltage vs frequency (see Ryzen, where a 20% improvement in clocks costs about a 50% increase in power draw across all SKUs, and similar situations happen with Intel).

That being said, a 4GHz Skylake core crushes a 4GHz Core 2 core. Depending on the benchmark used, it can perform anywhere from 80% to upwards of 170% faster per clock. You don't get as dramatic year-over-year improvements in per-cycle performance now, but the innovations leading up to ~2004 (or '06 for the multicore boom) were just stuffing power-hungrier and hotter transistors onto smaller dies.
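
The Ryzen observation above lines up with the usual dynamic-power relation; a rough sketch (the voltage figure is an assumption picked for illustration):

    # Classic dynamic-power relation: P ~ C * V^2 * f.
    # Assumed figure: a +20% clock needing roughly +12% core voltage.
    freq_gain    = 1.20
    voltage_gain = 1.12
    power_gain   = freq_gain * voltage_gain ** 2
    print(f"power draw grows by ~{(power_gain - 1) * 100:.0f}%")   # ~51%, i.e. roughly the 50% quoted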


> We have been over 5GHz for a decade; it just takes LN2 to do it.

Non-x86 systems went over 5 GHz on air many years ago, e.g. IBM POWER.


Delidded Kaby Lake also sometimes does 5GHz on air IIRC


> That being said, a 4GHz Skylake core crushes a 4GHz Core 2 core.

Core 2 Duos were slower per clock than Pentium 4s, I think by quite a bit. It was a real setback for performance when they came out.


That obviously can't be true, except for some very specific workloads.

Core 2 was the breakthrough that left AMD in the dust, and AMD still hasn't recovered to parity. Even if I can't recall the numbers, I fail to see how the very energy-intensive (highly clocked) Pentium 4 could have been faster per clock.

Are you sure you're not thinking of the Pentium 3 to 4 change? After all, Core architecture had more in common with P3, didn't it?


5GHz is not that unreachable:

i5-4670K AT 5GHZ OC On a $30 Air Cooler https://www.youtube.com/watch?v=NUHm2qHI3gc


A lot of i7-7700Ks can be overclocked to 5.1GHz and beyond while on air cooling.


We need something other than silicon to get 4GHz+ at reasonable power usage. I think materials science is going to be the next big thing when we reach the limits of silicon.

I really wish we could see more IPC and 10GHz+; our single-threaded performance has been stuck at the same level for far too long.


Anyone else get redirected to malware?


This has happened to me multiple times.

I've reported it to them a couple of times, last time Ryan Smith told me on twitter that I'm welcome to report it to him direct: https://twitter.com/ryansmithat/status/877409854087806976


Yes! I thought it was my DNS or something. It only happens on mobile.


Anandtech have great content, but their advertising can be... aggressive.

I assume you have an adblocker on desktop but not on mobile?


They have fallen a long way since Adnan ran the site...


Adnan is the guy who may or may not have killed his girlfriend in the 90's. You are thinking Anand. As in AnandTech. And even without Mr. Shimpi AnandTech is still one of the best sources for in-depth reviews of hardware.


They have fallen behind in CPU and GPU testing methodology, especially around games. Their database testing was pretty awful (the dataset fit in memory), and given a variety of other obvious limitations in their testing, I would argue they are not only worse than they were when Anand ran the show, but now significantly worse than a lot of the sites they compete with.

PCPer is a significantly more capable review site these days.


I didn't realize he wasn't still running things. When did that happen?

I hope he made some good money.


You are right about the name, I blame autocorrect.

I think their phone and tablet reviews in particular have gone downhill a lot. For example, there was no deep dive into the newer iPhones and iPads.


This is a reference to the Serial podcast, for anyone wondering. Recommended (both seasons).


And people get mad on here when I say that I use an adblocker. Until the ad industry gets its shit together, I will continue to do so, as not blocking ads is negligent from a security perspective. This is a terrible state of affairs.


>And people get mad on here when I say that I use an adblocker.

Really? Can you link an example?


[flagged]


They don't usually include process names in consumer-targeted marketing material...


They used to go Process, Architecture, Process, Architecture, which was styled as Tick, Tock. Recently they switched to Process, Architecture, More Architecture.

I like to call it Tick, Tock, Clunk.


At this point real compute happens on the GPU. I think we'll see a shift to major apps being driven by GPGPU.

The CPU's performance has started to matter less and less.


What do you mean by major apps? Do you have some reason to believe that apps will suddenly become embarrassingly parallel? Major apps often don't even take full advantage of SIMD instructions on the CPU. As soon as you need a context switch, branching, or fast memory access, your GPU is crap.

GPUs are only good at _very_ specific workloads.


I used to believe that, until I started to get into cryptocoin mining. There were algorithms that were specifically designed to be GPU-resistant, and they were all ported and saw significant gains. It was that experience that pushed me to learn how to program these devices.

The tooling isn't good enough yet, but there is no question in my mind that practically everything will run on a GPU-like processor in the future. The speedups are just too tremendous.

Intel knows this, which is why they purposely limit PCIe bandwidth.


>I used to believe that, until I started to get into cryptocoin mining.

99% of the apps we use are totally unlike cryptocoin mining. And the style in which they are written (and, more importantly, their function) is even more GPU-resistant.

>The tooling isn't good enough yet, but there is no question in my mind that practically everything will run on a GPU-like processor in the future. The speedups are just too tremendous.

Don't hold your breath. For one, nobody's porting the thousands of apps we already have and depend on every day to the GPU.


While true in the short term, it should be noted that Intel is moving more towards the GPU model with its many-core CPUs of slower cores and very wide vector units. There is value in a large core capable of fast out-of-order execution for control purposes, but data processing can be done much faster with a GPU model.

You can even implement explicit speculative execution - simply use a warp for each path and choose at the end. It is very wasteful but can often come out ahead.
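
A minimal sketch of the per-lane version of that trick (placeholder functions and a made-up predicate, not from any real codebase): evaluate both sides of the branch for every element and select at the end, so no lane in the warp actually diverges.

    // Sketch: compute both branch outcomes, then pick one with a predicate.
    // expensive_a/expensive_b and the condition are illustrative placeholders.
    __device__ float expensive_a(float x) { return x * x + 1.0f; }
    __device__ float expensive_b(float x) { return 2.0f * x - 3.0f; }

    __global__ void both_paths(const float* in, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        float a = expensive_a(in[i]);     // "speculate" down path A
        float b = expensive_b(in[i]);     // and down path B
        out[i] = (in[i] > 0.0f) ? a : b;  // keep whichever the branch wanted
    }

Wasteful, as said above, but every ALU stays busy instead of half the warp idling through each side of the branch.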


No, Intel's approach is very different from GPUs, because Intel has a strong incentive not to make porting to GPUs easy (also, wide vectors are far easier to do in a CPU than in a "GPU-like model").


Cryptocoin mining is embarrassingly parallel by its nature. You are trying lots of different inputs to a hash function, so you can always run an arbitrary number of them in parallel. There are various ways to reduce the GPU/FPGA/ASIC advantage, like requiring lots of RAM, but the task is still parallel if you have enough RAM. Something like a JavaScript JIT on the other hand is fundamentally hard to parallelize.
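
To see why it parallelises so trivially, here is a toy CUDA sketch (a made-up mixing function stands in for a real cryptographic hash, and the difficulty rule is invented): each thread tests its own nonce and never has to coordinate with its neighbours.

    // Toy nonce search, not any real coin's algorithm.
    __device__ unsigned int toy_hash(unsigned int x) {
        x ^= x >> 16; x *= 0x7feb352dU;
        x ^= x >> 15; x *= 0x846ca68bU;
        return x ^ (x >> 16);
    }

    // *best is initialised to 0xFFFFFFFF on the host before launch.
    __global__ void search(unsigned int base, unsigned int target,
                           unsigned int* best) {
        unsigned int nonce = base + blockIdx.x * blockDim.x + threadIdx.x;
        if (toy_hash(nonce) < target)   // toy "difficulty" threshold
            atomicMin(best, nonce);     // record the lowest winning nonce
    }

Memory-hard schemes raise the cost per attempt, but the attempts themselves stay independent, which is what makes the task embarrassingly parallel.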


"GPUs are only good at _very_ specific workloads."

Let me detail a completely specific workload, citing my own personal experience over objective facts with citations.

Oh, and let me also take a swipe at Intel again without any verifiable evidence.


Primecoin is a good example. If you look at the implementation, it requires generating a tightly packed bitfield for the sieve and then randomly accessing it afterwards. Lots of synchronization is required so you don't overwrite previously set bits, and the memory accesses are random, so it's suboptimal for the GPU's memory subsystem.

It took under a year for a GPU miner to come out. Having optimized it for Intel in assembly, I was convinced it wasn't possible for a GPU to beat it - and yet it happened.

It turns out that even when used inefficiently, a thousand cores can brute force its way through anything.
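
Roughly, the sieve-marking step looks like this on a GPU (a generic sketch of the technique, not the actual XPM miner code): setting composite bits means atomically OR-ing into shared 32-bit words, which is exactly the synchronization overhead described above.

    // Sketch: each thread marks the multiples of one prime in a packed bitfield.
    // bits[] has (limit + 31) / 32 words and is zeroed before launch;
    // atomicOr is needed because many threads touch the same word.
    __global__ void sieve_mark(const unsigned int* primes, int nprimes,
                               unsigned int* bits, unsigned int limit) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= nprimes) return;
        unsigned int p = primes[i];
        for (unsigned int m = 2 * p; m < limit; m += p)
            atomicOr(&bits[m >> 5], 1u << (m & 31));  // set bit m
    }

On paper the atomics and scattered writes look like a poor fit; in practice the sheer number of threads won anyway.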


But you don't have thousands of cores. You have a medium number of cores (64 in the latest Vega 64 GPU) with a high number of ALUs (and threads) per core. When the GPU executes an instruction for a thread, it checks whether the other threads are executing the same instruction and then utilises all the ALUs at once. This is great for machine learning and HPC, where you often just have a large matrix or array of the same datatype to process, but most of the time that isn't the case.


> a thousand cores can brute force its way through anything

Cool! Please factor this very large RSA modulus for me: <insert some RSA modulus>


I can't tell what you are mocking...

That Primecoin exists? That its PoW is finding prime numbers? That it has a GPU miner?

http://cryptomining-blog.com/2192-gpu-mining-for-primecoin-x...


> a thousand cores can brute force its way through anything

This is so fundamentally wrong that your complete lack of understanding of the underlying math is obvious. Let us use 1024-bit RSA as the key to brute-force. If we used the entire universe as a computer, i.e. every single atom in the observable universe enumerating one possibility every millisecond, it would take ~6 * 10^211 years to go through them all. By comparison, the universe is less than ~14 * 10^9 years old.

And this is for 1024-bit; today we use 2048-bit or larger.


Except you're not brute-forcing through all 2^1024 possibilities; you're factoring a number, which is much easier, and which is why RSA-768 is broken and RSA-1024 is deprecated.


Do you know how to brute-force a prime factorization? Because I'm pretty sure from your comment that you don't. The calculation is based on enumerating all pairs of primes of 512-bit length or less each (the individual primes may be longer, but for napkin math it's a very good approximation).

That is brute-forcing. That faster and better methods exist is irrelevant to a brute-forcing discussion, but I do mention in the very next message in the thread that they exist.
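
As a rough reproduction of that napkin math, under the stated assumption of enumerating all pairs of primes up to 512 bits (constants are approximate, so expect only order-of-magnitude agreement):

    \pi(2^{512}) \approx \frac{2^{512}}{512 \ln 2} \approx 3.8 \times 10^{151}
    \text{prime pairs} \approx \left(3.8 \times 10^{151}\right)^2 \approx 1.4 \times 10^{303}
    \text{checks per year} \approx 10^{80}\ \text{atoms} \times 10^{3}\ \text{s}^{-1} \times 3.15 \times 10^{7}\ \text{s} \approx 3 \times 10^{90}
    \text{years} \approx 1.4 \times 10^{303} / \left(3 \times 10^{90}\right) \approx 5 \times 10^{212}

That lands in the same astronomically large ballpark as the ~6 * 10^211 years quoted above; the exact exponent depends on which constants you plug in.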


XPM has nothing to do with finding primes used in RSA; I'm not sure where that came from. Its PoW is finding Cunningham chains of smaller primes. That said, I wouldn't be surprised if it could be adapted to find larger primes on a GPU significantly faster than on a CPU.


It was simply a response to your statement about brute-forcing anything.

There are ways to factor an RSA modulus that are much faster than brute-forcing, and yes, they work on a GPU. But that has nothing to do with brute-forcing; it's clever math.


The statement was about the relative speed of a thousand dumb cores vs. a couple of really good cores, not about absolute speed.


The statement was stating an absolute. Algorithmic complexity matters more than the speed of the computing hardware. Obviously, more power means you can compute more, but not everything.


That's a very uncharitable reading of what I wrote. The topic of the conversation was GPU performance vs. CPU performance. Despite the GPU being less flexible, its sheer quantity of execution units more than makes up for it.

But no, I suppose it's more likely I was really saying GPUs aren't bounded by the limits of the universe.


The context is Primecoin and its ability to find primes. Prime factorization is related to that, and it would be natural to read your statement as an absolute in that context. At least I did.


That's simply not true. The 'real computation' happening on GPUs is either very heavy floating-point work or graphics-related work; almost everything else runs on the CPU.


GPUs are only useful if your problem is data-parallel. The majority of compute-intensive problems that are also data-parallel have already been shifted to GPUs and SIMD instructions. A GPU isn't some pixie dust that makes everything faster.


I've never run a piece of software that used a GPU for anything other than rendering. I believe I'm part of a very large majority.


If you'd like to, these links will let you do so from the comfort of your (desktop) browser.

https://tenso.rs

http://gpu.rocks/


LibreOffice has OpenCL acceleration for some spreadsheet operations. With the advent of NVMe storage, and the potential bandwidth it yields, I would expect to see database systems emerging that can GPGPU accelerate operations on tables to be way, way faster than what a CPU can handle.


> I would expect to see database systems emerging that can GPGPU accelerate operations on tables to be way, way faster than what a CPU can handle.

Why do you expect that? Many DBMS operations that are not I/O limited are memory limited, and a GPU does not help you there (on the contrary, you get another bottleneck in data transfers to the small GPU memory). What can help is better data organization, e.g. transposed (columnar) storage.


That's why all GPU databases I know of are columnar, or at least hybrid (in the case of IBM DB2 BLU)...

The more complex the operations performed on the columns (the transforms), the better the GPU database will do, because of the higher ops/bytes ratio.
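
To make the ops/bytes point concrete, a hedged sketch (hypothetical column and kernel, not taken from any of the databases named): a filtered aggregate over a single column, which a columnar layout hands to the GPU as one coalesced stream.

    // Sketch of SELECT SUM(amount) WHERE amount > threshold over one column.
    // *result is zeroed on the host; atomicAdd is a naive reduction kept
    // short for illustration (real code would reduce per block first).
    __global__ void sum_where(const float* amount, int n,
                              float threshold, float* result) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n && amount[i] > threshold)
            atomicAdd(result, amount[i]);
    }

The more arithmetic you do per byte loaded (the transforms mentioned above), the further this pulls ahead of a CPU, since the column is streamed through only once.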


What on earth are people doing on spreadsheets that needs GPU acceleration??


People use spreadsheets for anything that you'd use a "normal" programming language for. When I worked at a bank, a real-time trading system was implemented as an Excel spreadsheet. There were third-party and internally-developed libraries to do the complicated stuff (multicast network protocols, complicated calculations that needed to be the same across all implementations, etc.) but the bulk of the business logic and UI were Excel. It's easy to modify, extend, and play with... which also makes it easy to break functionality and introduce subtle bugs. Though the same is true of any software development environment -- things you don't test break.


Right, but most things you do in a "normal" programming language don't run on a GPU either.


Don't think of spreadsheets as glorified tables. Think of them as the world's most-commonly used business logic and statistical programming language. A competitor to R, if you will.


First, who runs business logic on GPUs?

Statistics, sure, that's definitely a good candidate for GPUs. I don't know much about R, but a quick Google search suggests you can run R code on a GPU by working with certain object types, like matrices with GPU-accelerated operations.

That doesn't seem like it maps very well to a spreadsheet unless you have one big matrix per cell. I'm guessing (maybe incorrectly) that when people work with matrices in Excel, they're spread across a grid of cells. You probably could detect matrix-like operations and convert them to GPU batch jobs, but it seems very hard and I'm skeptical of how much you'd gain.

So I'm still wondering what kinds of typical Excel tasks are amenable to GPU acceleration in the first place. People use Excel to do a lot of surprising things, sure. But people use C++ and Python and Javascript for a lot of things too, and you can't just blithely move those over to the GPU.

Maybe it's specific expensive operations, like "fit a curve to the data in this huge block of cells"?


OK, so I googled around a bit more and found a useful presentation on LibreOffice internals: https://people.gnome.org/~michael/data/2014-05-13-iwocl-libr...

Looks like it is indeed identifying large groups of cells with common formulas, and running those calculations on the GPU.
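
Which is essentially the pattern below (my own illustrative kernel, not LibreOffice's actual generated code): a block of cells sharing one formula becomes a single element-wise kernel over the input columns.

    // Sketch: cells C2:C100001 all hold "=A2*1.2+B2" copied down, so the
    // whole column group compiles to one kernel over columns A and B.
    __global__ void column_formula(const double* A, const double* B,
                                   double* C, int rows) {
        int r = blockIdx.x * blockDim.x + threadIdx.x;
        if (r < rows)
            C[r] = A[r] * 1.2 + B[r];
    }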


You've never worked with accounting, I guess. They easily end up with a single spreadsheet that's over 1GB, and that's considered normal.


I'm aware of big spreadsheets, but from what I've seen, they tend to be full of complex and very ad-hoc calculations that (I imagine) don't lend themselves very well to GPUs.

Making very complex tasks run well on a GPU is hard, whereas CPUs are great for dealing with that stuff.

If you have something like a 100,000 row spreadsheet where every row is doing exactly the same calculation on different input data, sure, that starts to make sense. If people are really doing that in Excel, I'm surprised! (but maybe I shouldn't be)


Batch processing. Lots of rows.


Intel chips have a giant iGPU that, on certain models, occupies nearly half the die. It can currently decode 4K Netflix. The technology is equally applicable to GPUs.


The video decoding is done by a hard block called Quick Sync. It's part of the iGPU, but is not using the regular GPU cores.


Yes, but it's made with the same lithography, if not the same mask (or equivalent). The lithography is agnostic to the GPU/CPU debate.


Let's try a compromise: how about multithreaded programs? Or how about programs using less control flow, so that static processors (yes, like the infamous VLIW) can extract ILP?


Right on. Core speeds are not getting any faster. Chips with thousands of cores are the future, and Intel is way behind in this area.


A quick look at AWS shows maybe 5 out of 100 services that use GPUs as their main resource, whereas all of the others use CPUs.

Machine learning may be all the hype right now, but GPU compute is still suboptimal for most software demands.



