Atom has actually gotten significantly faster recently. It seems they've started to rewrite some core components of it in C++ for performance reasons. I've also used some Electron apps that are very lean. I think it's more of an implementation issue. Slack is pretty terrible at resource usage, for example, but Zeit's Hyper is very efficient and its resource usage is comparable to iTerm in my experience.
Slack's issues with their Electron app shouldn't be particularly surprising, either, considering it was their head architect who published an article on Medium advocating for PHP on the basis of its concurrency model...
On a serious note, I think the insanity will stop when operating systems start shaming badly written applications and nudging users to get rid of them. It is in Apple's and Microsoft's interest, because users will blame their computers ("My Windows is getting slow...").
Phones already show a list of the power-hungry apps responsible for reducing your battery life; having this on the desktop would be nice too. If even a terminal needs 64 threads and 126MB of RAM now... it looks like some wrist-slapping is in order...
That's a lot more threads than iTerm uses for me, but it's less memory. Typically my work computer (a 2015 15" MBP) tends to be bottlenecked on RAM, too. 16GB sadly is pretty much the minimum viable amount of RAM for me to do full stack development these days.
I've maxed out my 64GB a few times and have seriously weighed upgrading to 128GB. Running the whole development environment on localhost (where I'm the actual sysadmin) is really handy if your sysadmins are too busy to help you with their side of things.
macOS already does present a list of apps that are using a lot of energy if you click on the battery indicator. I thought that Windows had a similar feature, but I'm not sure.
Well, seeing as so many applications remain single-threaded despite years of mainstream multi-core CPUs, that shouldn't be too hard. Especially with many mentors in our industry cautioning new programmers that "concurrency is hard" so often that—at least within my small social network—new developers have been conditioned to avoid subjects like threading.
Yes, concurrency is a challenge, but it's tremendously rewarding. Plus modern languages make it much easier than it was in the past.
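To make that concrete, here's a minimal sketch (a toy example of my own, with made-up work sizes) of how little ceremony parallelism needs with nothing but a modern standard library:

    # Minimal sketch: a CPU-bound task fanned out across cores using only the
    # standard library. The prime-counting task and chunk sizes are made up.
    from concurrent.futures import ProcessPoolExecutor

    def count_primes(limit):
        count = 0
        for n in range(2, limit):
            if all(n % d for d in range(2, int(n ** 0.5) + 1)):
                count += 1
        return count

    if __name__ == "__main__":
        chunks = [50_000] * 8                      # eight arbitrary work items
        with ProcessPoolExecutor() as pool:        # one worker per core by default
            results = list(pool.map(count_primes, chunks))
        print(sum(results))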
Microsoft put a lot of work into how VS Code buffers the HTML it's displaying for the file, and it runs very well because of it. It's actually a very cool setup; if you get a chance, do an element inspection of the Monaco in-browser editor, as it uses the same method.
The same cannot be said for Atom, however.
I will say that, despite its many shortcomings, and after I was able to overcome the netbook-induced Atom stigma, I really do like the Atom for small-business NAS applications.
As a consumer, it's great to see competition picking up in the CPU market. Intel no longer seems able to hold its edge in process nodes, as TSMC and Samsung are matching it, so it has to rely on architecture design instead.
Well actually I think it's more a case of "increase in performance is becoming harder and harder, the technology leader (Intel) is slowly becoming stuck and the competition (AMD) is catching up".
Everybody will be more or less at the same level as it requires huge investments to only get a marginal advantage.
Apart from games, I would not be that afraid to install recent pieces of software (browser, office suite, OS) on a 10-year-old computer (2007 Core 2 Duo). But in 2007 I would not have thought for a minute that I could do the same on a 1997 computer (Pentium II), and installing 1997 software on a 1987 computer (80386)? Just no.
Last time I replaced my CPU (i5-6500 in place of i5-2500), I only saw a marginal improvement.
Even at my former job, I worked with some decent servers (32 to 48 cores, 128GB of RAM) to build Linux masters from scratch (Gentoo based). The oldest server of the bunch (which is 5 years old now) is still the fastest at building a master from scratch; it has fewer cores but a faster clock, and even on parallel tasks like compiling, clock is still a more determining factor than core count.
There are still tons of things to improve in CPUs: power consumption, cost, embedded functionality (SoC)... but performance improvements look like a huge-cost, low-gain adventure right now and for the foreseeable future.
> Apart from games, I would not be that afraid to install recent pieces of software (browser, office suite, OS) on a 10-year-old computer (2007 Core 2 Duo).
Ah, I can answer that one for you: my work computer of that time (E6850, 8 GB RAM), which cost me less than a thousand euros to build back then, has since been repurposed as a work computer for one of my employees, running Windows 10, Office 2016 and Chrome all day long. The only addition has been a 120GB SSD.
It runs much better than the "modern" and "cheap" ~400-euro integrated work computers I bought from ASUS and HP in 2015.
Gains in instructions-per-clock are starting to flatten out, and that's where the gains have been coming from in recent years. Some time ago a paper was posted here that showed how even if you have an infinite amount of transistors, you will still be limited in the range of 3-10 instructions-per-clock for typical programs.
Clock speeds seem to have leveled and IPC will only see another gain of 50-100%. Single threaded performance is close to the limit. What after that? Is this the end?
> Gains in instructions-per-clock are starting to flatten out, and that's where the gains have been coming from in recent years.
This is commonly claimed but it's actually false for x86_64 desktop parts. For a single-core scalar integer workload, the IPC boost from the i7-2700k to the i7-7700k was maybe 20-25% on a great day, but the base frequency increase was a further 20%, and the max boost frequency increase ~15%. The frequency increase is of similar importance to the IPC increase.
@mlvljr: Your account seems to be shadowbanned, I can’t reply to your comment.
Currently, SpaceX has prices around 56-62 million USD per launch of a normal satellite (with a weight and orbit where they can recover the first stage).
Arianespace launches such lighter satellites two at a time, at a price of around 60 million USD per satellite.
The Chinese launchers offer the same at around 70 million USD per launch.
So, the prices aren’t that different.
But, for launches from reused rockets, SpaceX is damn cheap. The first launch on a reused rocket cost below 30 million USD.
So, to recap: today, in the best case, SpaceX is between 4 and 13% cheaper than the next competitor. But in a few years, once they launch mostly reused rockets, they'll be around 50 to 60% cheaper than the next competitor.
I imagine that while SpaceX will continue to improve their cost per kg to orbit, and will reach launch expenses of half the current cost with reusables pretty quickly, until someone else can compete they could just increase their profit per launch enormously. Musk needs some serious capital for his Mars plans. I hope his global satellite internet provider concept works (I can't wait to have an option other than AT&T or Comcast) and brings in the big bucks. Then he won't need to make money on launches and can drop launch prices to close to cost to help all space activities. Maybe even start selling reusable rockets to other launch companies. Can't wait to see that day.
Long term, Musk is shooting for a ~100x reduction in launch costs to make a Mars colony feasible. Hope he makes it.
Isn't this an even further argument for cloud computing? If cost savings all come from having more cores at the same price, but end user devices can't put all those cores to work, having more of the compute intensive work happen on the back end amortized over many end users seems like the only way to benefit from improvements in cores per chip.
I distinctly remember a benchmark (which my google-fu is currently unable to find) between Intel chips with and without the Iris chip. On similar conditions (clock base/turbo and core count), the Iris chip had about a 20% performance advantage.
It wasn't explained in the benchmark, but the only explanation I could imagine was that the Iris part's extra memory was working as an L4 cache, since the benchmark wasn't doing graphics work. That is what it does: it sits right there on the package with a whole bunch of memory available to the iGPU, or working as an L4 cache when it's not needed for graphics.
It's also a great way to do (almost) zero-cost transfers from main memory to (i)GPU memory -- you'd do it at the latency of the L3/L4 boundary. With Intel, that unlocks a few GFLOPS of processing power -- in theory; your code would have to be adapted to use this in a reasonable way, of course.
To sum things up, I agree with you: memory is a path that holds big speedups for processors. I don't know if "the Iris way" is the best path, but it did show promise. Shame that Intel decided to mostly lock it up in the ultrabook processors.
I think the end point will be a massive chip with fast interconnects and a (relatively) huge amount of on die memory talking over a fast bus to something like nvme on steroids.
My new ThinkPad has NVMe, and the difference is huge compared to my very fast desktop at work, which has SATA-connected SSDs.
This is behind much of the interest in machine learning these days. Deep learning provides a way to approximate any computable function as the composition of matrix operations with non-linearities. It does this at the cost of requiring many, many times the computing power. But much of this computing cost can be parallelized and accelerated effectively on the GPU, so with GPU cores still increasing exponentially, at some point it's likely to become more effective than CPUs.
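To illustrate that sentence with a toy sketch (random weights rather than a trained model): a two-layer network really is just matrix multiplications with a non-linearity in between.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(16)            # input vector
    W1 = rng.standard_normal((32, 16))     # first layer weights (untrained, random)
    W2 = rng.standard_normal((4, 32))      # second layer weights

    hidden = np.maximum(0, W1 @ x)         # matrix op followed by a ReLU non-linearity
    output = W2 @ hidden                   # another matrix op
    print(output.shape)                    # (4,) -- stack more layers to make it "deep"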
"Deep learning provides a way to approximate any computable function as the composition of matrix operations with non-linearities."
Thanks, and I wish this sentence was one of the first things I read when I was trying to figure out exactly what Deep Learning really meant. It's much more comprehensible than the semi-magical descriptions that seem far more prevalent in introductory articles.
It's also fascinating that a seemingly simple computing paradigm is so powerful, kind of like a new Turing Machine paradigm.
"Deep learning provides a way to approximate any computable function as the composition of matrix operations with non-linearities."
This actually describes neural networks in general, not so much "deep learning".
Deep learning comes from being able to scale up neural networks from having only a few 10s or 100s of nodes per layer, to thousands and 10s of thousands of nodes per layer (and of course the combinatorial explosion of edges in the network graph between layers), coupled with the ability to process and use massive datasets to train with, and ultimately process on the trained model.
This has mainly been enabled by the cheap availability of GPUs and other parallel architectures, coupled with fast memory interconnects (both to hold the model and to shuttle data in/out of it for training and later processing) and the CPU (probably disk, too).
But neural networks have almost always been represented by matrix operations (linear algebra); it's just that there wasn't the data, nor the vast (and cheap) numbers of parallelizable processing elements, available to handle it. The closest architectures I can think of that could potentially have done it in the 1980s/90s would be from Thinking Machines (the Connection Machines) and probably systolic array processors (which were pretty niche at the time, mainly from CMU).
The point, though, is that neural networks have long been known to be most effectively computed using matrix operations, it's just that the hardware wasn't there (unless you had a lot of money to spend) nor the datasets - to enable what we today call "deep learning".
That, and AI winters didn't help matters. I would imagine that if somebody from the late 1980s had asked for 100 million to build or purchase a large parallel processing system of some form for neural network research, they would've been laughed at. Of course, no one at that time really knew that such a large architecture was what was needed, nor the amount of data (plus the concept of convolutional NNs and other recent model architectures weren't yet around). Also, programming for such a system would have been extremely difficult.
So - today is the "perfect storm", of hardware, data, and software (and people who know how to use and abuse it, of course).
It seems the author is down the 'deep learning' rabbit hole.
>> It does this at the cost of requiring many, many times the computing power. But much of this computing cost can be parallelized and accelerated effectively on the GPU, so with GPU cores still increasing exponentially, at some point it's likely to become more effective than CPUs.
So can any matrix operation. Sadly, there aren't that many algorithms that are efficiently represented by one.
There's nothing special about the TPU. The latest GPUs are adding hardware identical to the TPU's, and the name "GPU" is a misnomer now, since those cards are not even intended for graphics (no monitor out). GPUs will be around for a very long time, just not doing graphics.
Yep. Simply the core idea of attacking memory latency with massive parallelization of in-flight operations rather than large caches makes sense for a lot of different workloads, and that probably isn't going to change.
> Some time ago a paper was posted here that showed how even if you have an infinite amount of transistors, you will still be limited in the range of 3-10 instructions-per-clock for typical programs.
Do you know which paper that was? I would have thought that with infinite transistors you could speculatively execute all possible future code paths and memory states at the same time and achieve speedup that way.
It's not the end, if we as software developers can stop counting on the hardware folks to improve performance and do the hard work necessary to parallelize our apps. (This includes migrating components to use SIMD and/or GPUs as appropriate.)
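As a trivial example of what that migration looks like in practice (a sketch, nothing more): the same reduction written element-by-element and in vectorized form, where the library can dispatch to SIMD/BLAS under the hood.

    import numpy as np

    a = np.random.rand(1_000_000)
    b = np.random.rand(1_000_000)

    # scalar: one element at a time, hard for the hardware to speed up
    dot_scalar = 0.0
    for i in range(len(a)):
        dot_scalar += a[i] * b[i]

    # vectorized: a single call that the library can run on SIMD units
    dot_vector = np.dot(a, b)
    assert np.isclose(dot_scalar, dot_vector)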
I believe the last big improvement in power consumption (at least for Intel) was the 4th-generation Haswell chips. What they have been doing since is cutting back support on older chips (2nd and 3rd generation specifically).
> Apart from games, I would not be that afraid to install recent pieces of software (browser, office suite, OS) on a 10-year-old computer (2007 Core 2 Duo). But in 2007 I would not have thought for a minute that I could do the same on a 1997 computer (Pentium II), and installing 1997 software on a 1987 computer (80386)? Just no.
People like to say a Core 2 Duo can still hang, and while it might be okay for basic tasks (and you could use a 10-year-old computer in 2007 for basic tasks like word processing and the internet), modern PCs are much faster.
On benchmarks you're going to be 4-6x faster on most tasks with a 2017 MacBook Pro compared to a 2007 one, and that's before you even get started with anything that takes advantage of SIMD or GP workloads.
The other reason people say they can use a 10-year-old PC today is that they've upgraded it. Early Core 2 Duo systems shipped with 512MB or 1GB of RAM. They came with very slow 80GB hard drives. Upgrading to an SSD and 4-6GB of RAM is a must.
I agree, these days a Core 2 Duo or a Core 2 Quad is getting long in the tooth. Sandy Bridge and beyond are still decent-performing chips.
Edit: I guess I was ahead of the curve, because my Core 2 Duo box had 2GB of RAM out of the box (upgraded to 4GB of RAM later, requiring me to reinstall Windows as a 64-bit edition, and then swapping the chip for a Core 2 Quad).
Well, I'm not denying that, I'm exactly in that situation for my laptop:
A 2007 ThinkPad X61 with a Core 2 Duo T7100, but with 4GB of RAM and an SSD. To be fair, I'm cheating a little, as I use Debian+dwm, which is lighter than Windows 10.
But it holds up pretty well, and the CPU limitation will probably not be the reason I replace it. The screen resolution (1024x768) will probably be the main motivator.
But this only illustrates that there has been some improvement because of RAM evolution (4 to 8 GB of RAM now for average PCs vs 1 to 2 GB back then) and huge improvement because of disk evolution (SSDs vs mechanical drives). The CPU is far from being the main improvement factor for common usage over the last 10 years.
FWIW, the technology that often makes older computers more tolerable these days tends to be the SSD. I have a mid-2009 MBP with 8GB of RAM and a 256GB SSD that I still use for light-duty web browsing and as a DAW in my little hobby studio.
> Apart from games, I would not be that afraid to install recent pieces of software (browser, office suite, OS) on a 10-year-old computer (2007 Core 2 Duo).
The only issue I have with my old laptop from around then is browsing the web. Everything else, the performance is fine.
> it's rare these days for Intel to talk more than a generation ahead in CPU architectures.
This, to me, looks like the PS2 vs Dreamcast move, where Sony blocked Dreamcast sales with PS2 hype. We've been waiting for 10nm for a while now, and now Intel is essentially confirming "yes, it's really coming this time" where they might not have otherwise--it's hard to not see that as a reaction to Threadripper. At least on the surface, increased competition has led to increased transparency, which is good for consumers...or at least I'm happy knowing I have the option to avoid a big, hot, power-hungry chip if I want to.
It's the bulk enterprise purchases that they're worried about, not individuals. That may include bulk purchases to build laptops, but it's probably more datacenters and high-performance computing they're worried about.
I didn't wait, even though I'm excited about what Ryzen mobile chips might bring; frankly, an i7-7700HQ is enough for a laptop, and if I need more grunt than that I'll switch to a desktop.
I like that idea. It would mean Intel had to show its hand a bit early, with big promises for the near future. I admit I am a bit nostalgic for the crazy days of CPU clock speeds doubling every year, and then the early GPU wars.
Now that Intel has competition again in the form of very competitive new AMD CPUs, it is releasing "new" CPUs to the public that have been waiting in the basement for some time. AMD's CPUs are a lot faster than Intel hoped, so Intel has to skip a generation.
Besides GPUs, memory (DRAM), storage (SSDs, hard drives), wired networking (Ethernet, Thunderbolt, Fibre Channel), wireless networking (WiFi, Bluetooth, cellular) and displays (monitors, VR) are all still keeping pace with their respective versions of Moore's Law. Of course they still aren't going to catch up to CPUs any time soon. (Never in the case of networking, since light travels only so fast).
Apparently AMD and Nvidia discussed this, but Nvidia's CEO Jensen Huang wanted to be CEO of the combined company. I think AMD saw the deal as an acquisition, not a merger.
Aren't FPGAs mainly for the design phase, with the real crunching in industry done on ASICs? At least the whole automotive industry works that way: FPGAs to design/test stuff, ASICs for production and making money.
Nope, plenty of people do heavy computing with FPGAs.
The speedup going from FPGA to ASIC is not that dramatic. The real ASIC advantages are power draw and amortized cost. FPGAs also have to load their configuration when powering up.
It mostly depends on how many you are planning to sell. For a given performance/functionality, FPGAs cost more per chip than ASICs. But ASICs come with a much greater upfront fixed cost.
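A toy break-even sketch (every dollar figure below is invented, purely to illustrate the trade-off): the ASIC's one-time NRE cost has to be amortized over enough units to beat the FPGA's higher per-chip price.

    fpga_unit = 120.0        # hypothetical cost per FPGA
    asic_unit = 8.0          # hypothetical cost per ASIC
    asic_nre = 2_000_000.0   # hypothetical one-time mask/NRE cost

    def cheaper_option(volume):
        fpga_total = volume * fpga_unit
        asic_total = asic_nre + volume * asic_unit
        return "FPGA" if fpga_total < asic_total else "ASIC"

    for volume in (1_000, 10_000, 100_000):
        print(volume, cheaper_option(volume))
    # break-even here is 2_000_000 / (120 - 8), roughly 18,000 units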
What this means is that to implement the hardware the FPGA represents, you have to "program" it; this is typically done in one of only a couple of languages (VHDL and Verilog, known as Hardware Description Languages, or HDLs).
At one time, Xilinx (Altera's competitor) made an FPGA which could be reprogrammed "on the fly" very quickly; it (well, many thousands of them) was used for an interesting machine, of which only a few examples survive (the whole thing at the time was surreal, if you followed it - it seemed like a scam more than anything, but real hardware was shipped).
This machine was called the CAM-Brain machine, and was the creation of researcher Hugo de Garis (who is retired, and is a seemingly strange fellow in the AI community - but not as strange as Mentifex):
I encourage you to research this machine, and Mr de Garis, as the whole thing is fascinating (and I will also say, from a design perspective, the shipped CAM-Brain Machine was one of the "sexiest" looking boxen since the early Crays).
CAM-Brain meant "cellular automata machine brain" - it was basically an effort to evolve a neural network using CA and FPGA; the CA would evolve the HDL which described the hardware representation of the NN, which would then be dumped to the FPGA for processing. The process (from what I understand) was iterative.
I don't believe the "kitten" ever went past much more than some early 3D models (maybe some CAD, too) and a software simulator. At least, that's what you can still find out there today (images of the simulator running on Windows NT, iirc).
The effort was noble, but it didn't work for more than simple things. I think it was part of the "evolve-a-brain" NN dead end, which seemed to hold out some promise at the time.
That's just a bit of background, but it shows how Intel and FPGAs can be used for building hardware to represent neural networks (a GPU/TPU is not a neural network - it is merely a processor for the software representation of the neural network). Whether that's their intention, or something else (maybe something like Transmeta tried?) - only they know.
I don't know much about foundry processes, but it seems that it's taking more and more time for lesser and lesser gains, right? At this rate, how long until we reach sub nanometer? What are the physical limits on these processes, and does it have any implications for end users? Will we be using 2nm CPUs for 50 years?
Would love to hear the thinking of anyone educated on the topic.
Edit: very intrigued by the sustained downvotes on this ¯\_(ツ)_/¯
The step from 14 to 10 nm is huge, both from a technological perspective on the manufacturing side and in its effect on the number of transistors on a die and the power consumption of those transistors. Remember that power consumption and the number of transistors are related to surface area, so there is a square factor in there: 14² = 196 vs 10² = 100, so that's almost a doubling of the number of transistors and approximately a halving of the power required per transistor for a given die area.
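As a quick sanity check of that arithmetic (idealized, and assuming the node name actually tracks feature size, which, as pointed out below, it no longer really does):

    old_node, new_node = 14, 10                  # nominal node names, in nm
    density_gain = (old_node / new_node) ** 2    # area scales with the square of feature size
    print(density_gain)                          # ~1.96, i.e. almost 2x the transistors per area
    print(1 / density_gain)                      # ~0.51, i.e. roughly half the power per
                                                 # transistor for a given die area (same idealization)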
Okay, so the node names are effectively useless at this point. They used to refer to gate length, but no longer, even for Intel. Oh, and Intel's 10nm will actually have lower performance than their 14nm.
Besides, it matters not, the bottlenecks today are in memory and interconnects.
You will see stacked memory and silicon interposers, but you won't see main memory on the CPU die. DRAM is based on an array of what is called "trench capacitors." The fabrication process is sufficiently different that they don't even make these in the same facility, much less on the same die process. An array of trench capacitors will always be smaller than transistor based memory (SRAM.)
It is not a big problem to make DRAM on pretty much any SOI process; it's just that power consumption and refresh rates will have to be quite high.
The problem with MRAM is unreliable reads. It is excellent for low-clock-speed devices, but as you go into the gigahertz range, the signal quality of an MRAM cell begins to degrade, and you have to put a Darlington pair on top of it, or a BiCMOS transistor, thus negating its cell-size advantage.
For DRAM, are you talking about standard deep-trench capacitor DRAM or FBRAM?
Agree on MRAM, but it is also a very immature technology, so there's hope at least - unless you're talking about crossbar crosstalk, which can be solved with a diode.
I believe the data published by people peddling embedded DRAM IP is their "best case scenario", and it still involves significant alterations to the manufacturing process.
Sure, I never meant to imply that previous process steps were much smaller, just that this one is still formidable in its own right. Real world gains will not be 100% but they're a very large fraction of that. Obviously any technological advance in a mature industry is going to show reduced return on investment at some point, it's rather surprising that the ROI on these process shrinks is still worth it given that we are now well beyond what was thought to be possible not all that long ago.
Yeah, so the node names now apparently refer to the "smallest feature size", which is some random thing on the M0 metal layer. Source - from a former Intel engineer for more than a decade
So, not like when games consoles used to advertise how many "bits" they had: take whatever has the widest bus and advertise that as the number of "bits", or use tricks like the Atari Jaguar: 2x 32-bit CPUs = 64-bit, right? RIGHT?
I read on HN, a couple months ago, these numbers no longer represent the physical size of anything but are now just a marketing label, a sort of 'performance equivalent to a theoretical size of'.
Anyone know if there's any truth to this? Might try to find the comment later when I have time.
Think of them as relative indications of feature size and of spacing between identical parts (arrays if you want to use a software analogy), so even if an actual transistor will not be 10 nm or 14 nm their relative sizes will relate on one axis as 10 nm to 14 nm. Keeping the numbers from a single manufacturer will definitely aid in the comparison.
There is a ton of black magic going on here, with layers being stacked vertically, and masks that bear no obvious visual resemblance to the shape they project onto the silicon because of the interaction between the photons/X-rays and the masks, due to the fact that the required resulting image is small relative to the wavelength of the particles used to project it.
There is a super interesting youtube video floating around about this that I highly recommend, it's called 'indistinguishable from magic':
I don't have any direct experience with "deep submicron" stuff, but from what I've read you basically can't trust these numbers to be comparable. The various sizes/spacings don't scale together the way they did for larger feature sizes, so you could have e.g. a "14nm" process where the area of an SRAM cell, NAND gate etc. ends up the same size as another foundry's "20nm" process even though the actual transistors are smaller.
They're all marketing, Intel is no exception. At 40nm and over, the Intel node names were larger than the industry average, now it's the other way around.
> I don't know much about foundry processes, but it seems that it's taking more and more time for lesser and lesser gains, right? At this rate, how long until we reach sub nanometer? What are the physical limits on these processes, and does it have any implications for end users? Will we be using 2nm CPUs for 50 years?
The lattice constant of crystalline silicon is 0.54 nm, and since it's an FCC structure the distance between neighboring atoms is 0.38 nm. So with a hypothetical 2 nm CPU, some feature would be only roughly 5 atoms across, which leaves VERY little room for manufacturing tolerance. How would one manufacture a device containing billions of transistors with such small tolerances? I don't think we'll ever see such things, at least not with a lithography approach.
Heck, I think it's close to black magic that they are able to produce transistors on a 10nm process, but apparently experts say that up to 5nm (13 atoms!!!) might be possible.
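The napkin math behind those atom counts, using the 0.38 nm neighbor spacing quoted above:

    si_spacing_nm = 0.38                      # distance between neighboring Si atoms
    for feature_nm in (10, 5, 2):
        atoms = feature_nm / si_spacing_nm
        print(f"{feature_nm} nm ~ {atoms:.0f} atoms across")
    # 10 nm ~ 26 atoms, 5 nm ~ 13 atoms, 2 nm ~ 5 atoms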
I think that beyond our current processes, there is the potential for different materials to take the place of XXnm silicon processes, which could fit more transistors in a smaller area.
"3D" processes which build multiple layers on top of one another may also see more investment as other methods become prohibitively expensive. And once you've cheaply gone to 2 layers in a big blob of epoxy, what's stopping you from doing 4? 8? 16? 32? [Heat dissipation, probably]
But whatever, people have been saying Moore's Law is dead since Moore's Law was invented. Who knows whether we'll technically hit one milestone or another. Things get faster, what the hell.
People are already stacking multiple layers today for memory, although the layers are always manufactured separately and then bonded together in a separate step.
I wouldn't be surprised to see more of that in the future, think caches on top of cores, but I doubt we'll ever see multiple layers of transistors produced in a single step. Technical challenges aside, the economics of trying to produce multiple layers at once are just always going to be worse: higher latency from a wafer entering a fab to finishing the wafer, and much higher rate of defects. (When you produce the layers separately, you can potentially test them separately before putting them together, which is a huge win for defect rate.)
It's possible that manufacturing multiple layers at once might eventually allow for a higher density of vertical interconnects, but I just don't see that becoming the deciding factor.
It's widely assumed 5nm is the limit. I know I've seen others discuss some ideas about how close they can get, but I'm struggling to find the thread...
In any case, this may help:
https://en.wikipedia.org/wiki/5_nanometer
This is a sensible question. Not sure either why the downvotes. I'm curious as to the answer myself. Although, I don't know if we'll ever see sub-nanometer. Maybe that's the reason for the downvotes, that sub-nanometer is not really in the realm of what's possible with current CPU architectures and the physics of silicon. Although, that's simply based on today's physics. Who truly knows what the future will bring.
I wonder if this one will come with free backdoors and spyware installed, thanks to the wonderful Intel Management Engine (Intel ME) backdoor. [1][2][3]
Intel (and AMD) keep pushing more and more proprietary code that cannot be read, changed, or removed. No one knows exactly what it does, and it has built-in screen and key recording. It's my advice, and the advice of privacy advocates, that no one should purchase or use any processor made by Intel or AMD until they address these serious issues.
I'm not sure this should be discussed in this thread.
Also, I don't know of any alternative that doesn't have large unauditable blobs integrated into the chip.
All ARM SoCs come with radio processors that are running a non-trivial piece of software with full access to the system memory, which is responsible for power management, boot sequence and wireless communications. It is by definition network connected.
AMD has a technology it calls the Platform Security Processor (PSP for short) which does basically the same thing.
To have a processor that doesn't have this kind of technology, you have to give up on decades of advancement in compute power, or buy a very expensive and non-portable POWER8 or POWER9 system.
Why should a serious backdoor, privacy concerns, and ethical problems with a monopoly's new product not be discussed in a thread about that product? Not sure I get your point on that.
But yeah, you are totally right about the alternatives. Nothing quite matches Intel and AMD, and a lot of those ARM SoCs have proprietary code running in their bootloaders too. But you can get some processors from 7 years ago that are still usable.
OpenPOWER is fantastic though and has real potential. There were a few projects out there looking to build a laptop and a personal desktop computer using it, but unfortunately they didn't reach their funding goals.
I think the more people know about Intel and AMD's shady practices, the more funding open hardware projects can get, and maybe in the next few years we can replace Intel and AMD with ethical and open solutions.
I agree, this has to be allowed to be discussed; it's literally about the product.
I hadn't heard about OpenPOWER; I hope more people are made aware of alternatives so they get funding and momentum.
There are some ARM processors that get by without blobs. I think Olimex produces what they call open-source hardware (OSHW); would that be an acceptable product?
I meant that as in, there have been plenty of dedicated discussions threads on this site and many others regarding the Intel ME. Most people here know about the ME by now, and we don't have to bring it up in every single Intel-related thread.
Check out the Talos II motherboard. It's a workstation-class motherboard with dual POWER9 CPUs for $2750, which is a good price for a workstation IMO. They claim that all their firmware is open source, and the specifications are quite modern. The only problem is the (kind of) exotic architecture, but many people would be able to use it with open source software.
I was scanning through the comments to see if somebody had already mentioned this, and if you hadn't, I would have.
I am finding the Talos II an increasingly attractive proposition, even though the prices for a full system are quite staggering compared to mainstream hardware.
> All ARM SoCs come with radio processors that are running a non-trivial piece of software with full access to the system memory, which is responsible for power management, boot sequence and wireless communications. It is by definition network connected.
The high-end ones used for flagship smartphones/tablets do, but low-end ones used in cheaper tablets/TV boxes and more specialized hardware often don't have any radio interface.
Do you know the depths of not taking that advice and what lurks in them? Do you know that if everybody simply took it to heart, there'd be nothing unrealistic about it at all? How many months of abstinence and solidarity would be required to end these practices, or the companies if they so wish? And then that money simply shifts to ethical companies and we actually have a future. Or, we keep pretending it's all so very hard, and don't have one.
You are asking for the whole of humanity to stop buying some of the most sought after products of modern times from two of the best-selling makers of that industry.
I am all in for some philosophical discussion, but being this detached from reality doesn't do anyone any good. It's not because you can see the stars that you can reach them right now...
So yes, in summary: it is hard, to the point of impossibility.
Keep that up for a while longer, and it will become a physical impossibility, as any gesture of resistance leads to automatic extermination. Until then? Thanks for nothing.
The Talos II[1], which is a IBM POWER9-based machine. It's a bit more expensive than a standard Intel machine (~$2k for the whole prebuilt machine, a bit less for just the motherboard+CPU).
Everything in it is free, including all of the firmware, and the CPU is an open specification.
Does anyone here know someone who works on these various management engines? It'd be interesting to find out whether the security services were involved, or whether they really are backdooring all computers, right?
My guess is it's definitely possible, but it would have been popped by foreign agencies by now too, and there would have been a leak of tools to exploit such devices. Then again, it's very tempting to be able to hack any device, so knowing the NSA they are probably all for doing this, fuck the consequences.
Well, this seems rather insubstantial. If an Intel employee in that position wanted to leak some real info, I would assume it would be accompanied by something that gives the information some credibility.
The issue I have is companies like Google and Puri.sm have asked Intel and AMD for a blank signed blob that completely disables ME but they have refused this. It would take them literally no time at all. This raises all sorts of red flags that something dodgy is going on.
If you had the chance to make a customer who ships millions of Chromebooks happy, wouldn't you take every opportunity to help them, especially if it costs you little to no money at all? Obviously there is a big reason why they don't want this backdoor removed.
Which is why someone with deep enough pockets, and some help from the community (crowdfunding?), should invest in making open alternatives possible. Thousands of people have been laid off by big silicon corporations over the years; I refuse to believe there aren't 10 people among them who could be hired to design an open platform. It doesn't have to be as fast as modern processors; if it allows opening a webpage at acceptable speed or playing a 720p video at 30fps, that is more than enough for most of us, and more importantly it would send a huge message. Many would of course disagree, mainly gamers who would sell their soul to the devil for a faster graphics card, or other people who don't care about their privacy.

Once the design is done, there's the fab. Decades ago any company would have had to set up its own, but today there are fabless companies who design chips and fabs producing them for various customers, so it's just a matter of money. The goal isn't to create an alternative in terms of computing power, but rather in usage. The message is "we're not using your bugged shit to communicate among us or keep our data".
Companies that act as OEMs for enterprises most likely have a larger footprint of Intel installs than Google. Any single company's usage of a product is dwarfed by the effective install base a large OEM might have.
Maybe if Lenovo, Toshiba, Acer, Dell, etc all asked Intel to provide said blobs (and the threat was tangible) then they would probably reconsider.
It would be more useful for them if it could be controlled at the source level. The management engine would be fine if it were free software and could be replaced.
I'm very unhappy with my old Sun servers, for example, because the management system cannot be upgraded and the servers are no longer supported. I'm stuck with proprietary insecure software that I depend on and that I have no way of changing. It's all worse if the insecure outdated software can only be replaced by soldering wires to a chip on the board.
This is disturbing, to say the least. Given how much effort I've invested in securing myself, it's... disappointing. The rationale, it seems, is that government doesn't count as "someone to be concerned about", from a security point of view.
I'm curious about how one would be associated with a particular chip. I understand that keystrokes can be logged and TCP/IP can be read; you can be scraped. But ultimately, how is their backdoor aware of you, so that you don't appear to them like a needle in a stack of needles? A fascinating and revolting technical conundrum.
Yep. Once everything is under control, having the freedom to write your own software, especially software that challenges the rules, will be useless...
No. Basically the Intel ME is a completely separate ARM processor that's physically stuck onto each Intel Processor. It has direct access to everything the Intel chip does. The memory it's allocating, the hardware commands (ie keyboard, mouse, display), the software running, the processes running. This all happens at a higher level than the actual Intel processor and you have no control over it at all.
Basically whatever you run at any level on your Intel chip can be monitored by the Intel ME chip, no matter how many VM's, operating systems, encrypted files/processes you have installed/are using.
Ahh thanks, sorry was getting confused. It's AMD's PSP that uses an ARM based spyware kit. I wonder what Intel ME actually runs on then. Probably just another Intel Chip?
Have a look at https://minifree.org/ and a few Chromebooks (obviously with the operating system replaced). There are some options, but yeah it's a big problem that the microprocessor market has been locked up by two monopolies.
But I guess people have to make a personal judgement. Are ethics, privacy, and freedom more important than a faster processor to run your games on?
In addition to MiniFree, there's the Talos II[1] which is an entirely free motherboard and CPU (based on IBM's POWER9). It's a very modern CPU specification, and is also fairly powerful. Currently pre-orders are open. They are a bit pricey (~$2k for a fully prebuilt machine), but if you feel that you want a more powerful CPU that is an option. They also have server offerings.
Back in the days when a new computer became hopelessly obsolete within 3 years, I would never have considered spending that much. But perhaps now I might :)
We have been over 5GHz for a decade; it just takes LN2 to do it.
We aren't going to see ludicrously high clock rates for the foreseeable future. There are a lot of compounding factors as to why, but the biggest ones are the pressure for efficiency driving designs that don't dump higher and higher voltage in to get frequency, and the diminishing returns of voltage vs frequency (see Ryzen, where a 20% improvement in clocks costs about a 50% increase in power draw across all SKUs, and similar situations happen with Intel).
That being said, a 4GHz Skylake core crushes a 4GHz Core 2 core. Depending on the benchmark used, it can perform anywhere from 80% to upwards of 170% faster per clock. You don't get such dramatic year-over-year improvements by increasing per-cycle performance, but the innovations leading up to ~2004 (or '06 for the multicore boom) were largely just stuffing hungrier and hotter transistors onto smaller dies.
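A rough sketch of why clocks are so expensive (the voltage figure is illustrative, not a measured Ryzen number): dynamic power scales roughly as P ~ C * V^2 * f, and squeezing out more frequency usually means raising the voltage too.

    def relative_power(freq_scale, volt_scale):
        # dynamic power ~ capacitance * voltage^2 * frequency
        return volt_scale ** 2 * freq_scale

    # +20% clock, assuming it needs ~12% more voltage to stay stable
    print(relative_power(1.20, 1.12))   # ~1.5, i.e. about 50% more power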
That obviously can't be true except for some very specific workloads.
Core 2 was the breakthrough that left AMD in the dust, from which it still hasn't recovered to parity. Even if I can't recall the numbers, I fail to see how the very energy-intensive (highly clocked) Pentium 4 could have been faster per clock.
Are you sure you're not thinking of the Pentium 3 to 4 change? After all, Core architecture had more in common with P3, didn't it?
We need something other than silicon to get 4GHz+ at reasonable power usage. I think materials science is going to be the next big thing once we reach the limits of silicon.
I really wish we could see more IPC and 10GHz+; our single-threaded performance has been stuck at the same level for far too long.
Adnan is the guy who may or may not have killed his girlfriend in the '90s. You are thinking of Anand, as in AnandTech. And even without Mr. Shimpi, AnandTech is still one of the best sources for in-depth hardware reviews.
They have fallen behind in CPU and GPU testing methodology, especially around games. Their testing of databases was pretty awful (it fit in memory), and given a variety of other obvious limitations to their testing, I would argue they are not only worse than they were when Anand ran the show but now significantly worse than a lot of the places they compete with.
PCPer is a significantly more capable review site these days.
And people get mad on here when I say that I use an adblocker. Until the ad industry gets its shit together, I will continue to do so, as not blocking ads is negligent from a security perspective. This is a terrible state of affairs.
They used to go Process, Architecture, Process, Architecture, which was styled as Tick, Tock. Recently they switched to Process, Architecture, More Architecture.
What do you mean by major apps? Do you have some reason to believe that apps will suddenly become embarrassingly parallel? Major apps often don't even take full advantage of SIMD instructions on the CPU. As soon as you need a context switch, branching, or fast memory access your GPU is crap.
I used to believe that, until I started to get into cryptocoin mining. There were algorithms that were specifically designed to be GPU resistant and they all were ported and saw significant gains. It was that experience that pushed me to learn how to program these devices.
The tooling isn't good enough yet, but there is no question in my mind practically everything will run on a GPU like processor in the future. The speedups are just too tremendous.
Intel knows this, which is why they purposely limit PCIe bandwidth.
>I used to believe that, until I started to get into cryptocoin mining.
99% of the apps we use are totally unlike cryptocoin mining. And the style they are written in (and more importantly, their function) is even more GPU resistant.
>The tooling isn't good enough yet, but there is no question in my mind practically everything will run on a GPU like processor in the future. The speedups are just too tremendous.
Don't hold your breath. For one, nobody's porting the thousands of apps we already have and depend on everyday to GPU.
While true in the short term, it should be noted Intel is moving more towards the GPU model with its CPUs with many slower cores and very wide vector units. There is value in a large core capable of fast out-of-order execution for control purposes, but data processing can be done much faster with a GPU model.
You can even implement explicit speculative execution - simply use a warp for each path and choose at the end. It is very wasteful but can often come out ahead.
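A rough CPU-side analogue of that idea, sketched with NumPy (on a real GPU this would be separate warps or per-lane predication rather than two full array passes): evaluate both sides of the "branch" and select at the end.

    import numpy as np

    x = np.random.randn(1_000_000)

    taken     = np.sqrt(np.abs(x))                # result if the branch is taken
    not_taken = x * x                             # result if it is not
    result = np.where(x > 0, taken, not_taken)    # pick per element afterwards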
No, Intel's approach is very different from GPUs, because Intel has a strong negative interest in making porting to GPUs easy (also, wide vectors are far easier to do in a CPU than a "GPU like model").
Cryptocoin mining is embarrassingly parallel by its nature. You are trying lots of different inputs to a hash function, so you can always run an arbitrary number of them in parallel. There are various ways to reduce the GPU/FPGA/ASIC advantage, like requiring lots of RAM, but the task is still parallel if you have enough RAM. Something like a JavaScript JIT on the other hand is fundamentally hard to parallelize.
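For contrast, a sketch of why the mining case parallelizes so well (toy difficulty target, standard library only): every nonce can be hashed independently, so the search space splits cleanly across workers.

    import hashlib
    from concurrent.futures import ProcessPoolExecutor

    def search(bounds):
        start, stop = bounds
        hits = []
        for nonce in range(start, stop):
            digest = hashlib.sha256(f"block-header-{nonce}".encode()).hexdigest()
            if digest.startswith("0000"):          # toy "difficulty" target
                hits.append((nonce, digest))
        return hits

    if __name__ == "__main__":
        ranges = [(i * 100_000, (i + 1) * 100_000) for i in range(8)]
        with ProcessPoolExecutor() as pool:
            found = [hit for chunk in pool.map(search, ranges) for hit in chunk]
        print(len(found), found[:2])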
Primecoin is a good example. If you look at the implementation, it requires generating a tightly packed bitfield for the sieve and then randomly accessing it afterwards. Lots of synchronization is required so you don't overwrite previously set bits, and the memory accesses are random, so it's suboptimal for the GPU's memory subsystem.
It took under a year for a GPU miner to come out. Having optimized it for Intel in assembly, I was convinced it wasn't possible for a GPU to beat it -- and yet it happened.
It turns out even when used inefficiently a thousand cores can brute force its way through anything.
But you don't have thousands of cores. You have a moderate number of cores (64 in the latest Vega 64 GPU) with a high number of ALUs (and threads) per core. When the GPU executes an instruction for a thread, it checks whether the other threads are executing the same instruction and then utilises all the ALUs at once. This is great for machine learning and HPC, where you often just have a large matrix or array of the same datatype that you want to process, but most of the time this isn't the case.
> a thousand cores can brute force its way through anything
This is so fundamentally wrong that your complete lack of understanding of the underlying math is obvious. Let's use 1024-bit RSA as the key to brute-force. If we used the entire universe as a computer, i.e. every single atom in the observable universe enumerating 1 possibility every millisecond, it would take ~6 * 10^211 years to go through them all. In comparison, the universe is less than ~14 * 10^9 years old.
And this is for 1024-bit; today we use 2048-bit or larger.
Except you're not brute-forcing through all 2^1024; you're factoring a number, which is much easier, and is why RSA-768 is broken and RSA-1024 is deprecated.
Do you know how to brute-force a prime factorization? Because I'm pretty sure from your comment that you don't. The calculation is based on enumerating all prime pairs of 512-bit length or less each (the lengths of the individual primes may be longer, but for napkin math it's a very good approximation).
That is brute-forcing. That faster and better methods exist is irrelevant to a brute-forcing discussion, but I do mention in the very next message in that thread that they exist.
XPM has nothing to do with finding the primes used in RSA; I'm not sure where that came from. Its PoW is finding Cunningham chains of smaller primes. That said, I wouldn't be surprised if it could be adapted to find larger primes on a GPU significantly faster than on a CPU.
It was simply a response to your statement about bruteforcing anything.
There are ways to factor an RSA modulus that are much faster than brute-forcing, and yes, they work on a GPU. But that has nothing to do with brute-forcing; it's clever math.
The statement was stating an absolute. Algorithmic complexity matters more than the speed of the computing hardware. Obviously, more power means you can compute more, but not everything.
That's a very uncharitable reading of what I wrote. The topic of the conversation was GPU performance vs CPU performance. Despite GPUs being less flexible, the sheer quantity of execution units more than makes up for it.
But no, I suppose its more likely I was really saying GPUs aren't bounded by the limits of the universe.
The context was Primecoin and the ability to find primes. Prime factorization is related to that, and it seemed obvious to read your statement as an absolute in that context. At least I did.
That's just simply not true. The 'real computation' happening on GPUs is either very heavy floating point work or graphics related work, almost everything else is running on the CPU.
GPUs are only useful if your problem is data parallel. The majority of compute intensive problems that are also data parallel at the same time have been shifted to GPUs and SIMD instructions already. A GPU isn't some pixie dust that makes everything faster.
LibreOffice has OpenCL acceleration for some spreadsheet operations. With the advent of NVMe storage, and the potential bandwidth it yields, I would expect to see database systems emerging that can GPGPU-accelerate operations on tables to be way, way faster than what a CPU can handle.
> I would expect to see database systems emerging that can GPGPU accelerate operations on tables to be way, way faster than what a CPU can handle.
Why do you expect that? Many DBMS operations that are not I/O limited are memory limited, and a GPU does not help you there (on the contrary, you get another bottleneck in data transfers to the small GPU memory). What can help is better data organization, e.g. transposed (columnar) storage.
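A tiny illustration of the columnar point (toy data): summing one column of a row-oriented table walks a strided slice, while a column store keeps the same values contiguous, which is what memory-bandwidth-bound scans want.

    import numpy as np

    rows, cols = 1_000_000, 20
    row_store = np.random.rand(rows, cols)          # row-major "table"
    col_store = np.ascontiguousarray(row_store.T)   # same data, columns contiguous

    total_from_rows = row_store[:, 3].sum()         # strided access across rows
    total_from_cols = col_store[3].sum()            # contiguous access
    assert np.isclose(total_from_rows, total_from_cols)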
People use spreadsheets for anything that you'd use a "normal" programming language for. When I worked at a bank, a real-time trading system was implemented as an Excel spreadsheet. There were third-party and internally-developed libraries to do the complicated stuff (multicast network protocols, complicated calculations that needed to be the same across all implementations, etc.) but the bulk of the business logic and UI were Excel. It's easy to modify, extend, and play with... which also makes it easy to break functionality and introduce subtle bugs. Though the same is true of any software development environment -- things you don't test break.
Don't think of spreadsheets as glorified tables. Think of them as the world's most-commonly used business logic and statistical programming language. A competitor to R, if you will.
Statistics, sure, that's definitely a good candidate for GPUs. I don't know much about R, but a quick google suggests you can run R code on a GPU, by working with certain object types, like matrices with GPU-accelerated operations.
That doesn't seem like it maps very well to a spreadsheet unless you have one big matrix per cell. I'm guessing (maybe incorrectly) that when people work with matrices in Excel, they're spread across a grid of cells. You probably could detect matrix-like operations and convert them to GPU batch jobs, but it seems very hard and I'm skeptical of how much you'd gain.
So I'm still wondering what kinds of typical Excel tasks are amenable to GPU acceleration in the first place. People use Excel to do a lot of surprising things, sure. But people use C++ and Python and Javascript for a lot of things too, and you can't just blithely move those over to the GPU.
Maybe it's specific expensive operations, like "fit a curve to the data in this huge block of cells"?
I'm aware of big spreadsheets, but from what I've seen, it tends to be complex and very ad-hoc calculations that (I imagine) don't lend themselves very well to GPUs.
Making very complex tasks run well on a GPU is hard, whereas CPUs are great for dealing with that stuff.
If you have something like a 100,000 row spreadsheet where every row is doing exactly the same calculation on different input data, sure, that starts to make sense. If people are really doing that in Excel, I'm surprised! (but maybe I shouldn't be)
Intel chips have a giant iGPU that on certain models occupies nearly half the die. It can currently decode 4K Netflix. The technology is equally applicable to GPUs.
Let's try a compromise: how about multithreaded programs? Or how about programs using less control flow, so that static processors (yes, like the infamous VLIW) can extract ILP?