Slack's issues with their Electron app shouldn't be particularly surprising, either, considering it was their head architect who published an article on Medium advocating for PHP on the basis of its concurrency model...
Ladies and gentlemen! I present to you the efficient terminal application, which only needs:
Process        Memory     Threads
------------   --------   -------
Hyper          40.8 MB    32
Hyper Helper   51.9 MB    15
Hyper Helper   18.8 MB    12
Hyper Helper   15.2 MB    4
Total:         126.7 MB   63
Phones already show a list of the power-hungry apps responsible for draining your battery; having this on a desktop would be nice too. If even a terminal needs 63 threads and 126.7 MB of RAM now... looks like some wrist-slapping is in order...
Yes, concurrency is a challenge, but it's tremendously rewarding. Plus modern languages make it much easier than it was in the past.
Apart from games, I would not be that afraid to install recent pieces of software (browser, office suite, OS) on a 10-year-old computer (a 2007 Core 2 Duo). But in 2007 I would not have thought for a minute that I could do the same on a 1997 computer (Pentium II), and installing 1997 software on a 1987 computer (80386)? Just no.
Last time I replaced my CPU (i5-6500 in place of i5-2500), I only saw a marginal improvement.
Even at my former job, I worked with some decent servers (32 to 48 cores, 128 GB of RAM) to build Linux masters from scratch (Gentoo-based). The oldest server of the bunch (which is 5 years old now) is still the fastest to build a master from scratch; it has fewer cores but a faster clock, and even on parallel tasks like compiling, clock speed is still a more determining factor than core count.
There are still tons of things to improve in CPUs: power consumption, cost, embedded functionality (SoC)... but performance improvements look like a high-cost, low-gain adventure right now and for the foreseeable future.
Ah, I can answer that one for you: my work computer of that time (E6850, 8 GB RAM), which cost me less than a thousand euros to build back then, has since been repurposed as a work computer for one of my employees, running Windows 10, Office 2016 and Chrome all day long. The only addition has been a 120 GB SSD.
It runs much better than the "modern" and "cheap" ~400 euro integrated work computers I bought from ASUS and HP in 2015.
I'd rather use an old machine that's been upgraded with an SSD than a brand new machine that only has a mechanical HDD.
Clock speeds seem to have leveled off, and IPC will only see another gain of 50-100%. Single-threaded performance is close to the limit. What comes after that? Is this the end?
This is commonly claimed, but it's actually false for x86_64 desktop parts. For a single-core scalar integer workload, the IPC boost from the i7-2700K to the i7-7700K was maybe 20-25% on a great day, but the base frequency increase was a further 20%, and the max boost frequency increase ~15%. The frequency increase is of similar importance as the IPC increase; compounded, that's roughly 1.2 × 1.25 ≈ 1.5× the single-threaded performance.
Was it the end of those industries?
Welcome to mature technology.
Economically? I think this or last year.
That's not a 50% or 100% improvement.
Maybe they'll get that improvement once they run recycled rockets all the time, but not before that.
Currently, SpaceX has prices around 56-62 million USD per launch of a normal satellite (with a weight and orbit where they can recover the first stage).
Arianespace launches such lighter satellites in pairs, two at once, at a price of around 60 million USD per satellite.
The Chinese launchers offer the same at around 70 million USD per launch.
So, the prices aren’t that different.
But, for launches from reused rockets, SpaceX is damn cheap. The first launch on a reused rocket cost below 30 million USD.
So, to recap: today, in the best case, SpaceX is between 4 and 13% cheaper than the next competitor. But in a few years, once they launch mostly reused rockets, they'll be around 50 to 60% cheaper than the next competitor.
Long term, Musk is shooting for a ~100x reduction in launch costs to make a Mars colony feasible. Hope he makes it.
Also more specialised cores, e.g. DSPs, and customisable hardware, i.e. FPGAs.
It wasn't explained in the benchmark, but the only reason I could imagine was that the Iris chip worked as an L4 cache, because the benchmark was not doing graphics stuff. That's what the Iris chip does: it sits right there on the package with a whole bunch of memory available to the iGPU, or works as an L4 cache when available.
It's also a great way to do (almost) zero-cost transfers from main memory to (i)GPU memory -- you'd do it at the latency of the L3/L4 boundary. With Intel, that unlocks a few GFLOPs of processing power -- in theory; your code would have to be adapted to make this work in a reasonable way, of course.
To sum things up, I agree with you: memory is a path that holds big speedups for processors. I don't know if "the Iris way" is the best path, but it did show promise. Shame that Intel decided to lock it up, mostly for the ultrabook processors.
My new ThinkPad has NVMe, and the difference is huge compared to my very fast desktop at work, which has SATA-connected SSDs.
This is behind much of the interest in machine learning these days. Deep learning provides a way to approximate any computable function as the composition of matrix operations with non-linearities. It does this at the cost of requiring many, many times the computing power. But much of this computing cost can be parallelized and accelerated effectively on the GPU, so with GPU cores still increasing exponentially, at some point it's likely to become more effective than CPUs.
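To make that concrete, here is a minimal sketch in Python/NumPy (the layer sizes, random weights, and the ReLU choice are all just illustrative) of what "composition of matrix operations with non-linearities" means:

    import numpy as np

    def relu(x):
        # The non-linearity: without it, stacked layers would collapse
        # into a single matrix product.
        return np.maximum(0.0, x)

    rng = np.random.default_rng(0)
    W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)  # layer 1 weights
    W2, b2 = rng.standard_normal((2, 4)), np.zeros(2)  # layer 2 weights

    def forward(x):
        h = relu(W1 @ x + b1)  # matrix op + non-linearity
        return W2 @ h + b2     # matrix op (linear output layer)

    print(forward(np.array([1.0, -0.5, 2.0])))

Training is then just adjusting the entries of W1, b1, W2, b2 to minimize a loss -- all matrix arithmetic, which is exactly what GPUs chew through.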
Thanks, and I wish this sentence was one of the first things I read when I was trying to figure out exactly what Deep Learning really meant. It's much more comprehensible than the semi-magical descriptions that seem far more prevalent in introductory articles.
It's also fascinating that a seemingly simple computing paradigm is so powerful, kind of like a new Turing Machine paradigm.
This actually describes neural networks in general, not so much "deep learning".
Deep learning comes from being able to scale up neural networks from having only a few tens or hundreds of nodes per layer to thousands and tens of thousands of nodes per layer (and of course the attendant explosion of edges in the network graph between layers), coupled with the ability to process and use massive datasets to train with, and ultimately to run inference on the trained model.
This has mainly been enabled by the cheap availability of GPUs and other parallel architectures, coupled with fast memory and fast interconnects to the CPU (and probably disk, too), both to hold the model and to shuttle data in and out of it for training and later inference.
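To put rough numbers on that explosion of edges mentioned above (the widths here are hypothetical): a fully-connected layer between two equal-width layers has width × width weights, so widening layers 100x multiplies the weight count 10,000x:

    for width in (100, 1_000, 10_000):
        weights = width * width  # edges in one fully-connected layer pair
        print(f"{width:>6} nodes/layer -> {weights:>12,} weights")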
But neural networks have almost always been represented by matrix operations (linear algebra); it's just that there wasn't the data, nor the vast (and cheap) number of parallelizable processing elements, available to handle it. The closest architectures I can think of that could potentially have done it in the 1980s/90s would be Thinking Machines' Connection Machines, and probably systolic array processors (which were pretty niche at the time, mainly from CMU):
These latter machines started to prove some of what we take for granted today, in the form of the NAVLAB ALVINN self-driving vehicle:
Of course, today it can be done on a smartphone:
The point, though, is that neural networks have long been known to be most effectively computed using matrix operations; it's just that neither the hardware (unless you had a lot of money to spend) nor the datasets were there to enable what we today call "deep learning".
That, and the AI winters didn't help matters. I would imagine that if somebody from the late 1980s had asked for 100 million to build or purchase a large parallel processing system of some form for neural network research, they would've been laughed at. Of course, no one at that time really knew that what was needed was such a large architecture, nor the amount of data (plus convolutional NNs and other recent model architectures weren't around yet). Also, programming such a system would have been extremely difficult.
So - today is the "perfect storm", of hardware, data, and software (and people who know how to use and abuse it, of course).
>> It does this at the cost of requiring many, many times the computing power. But much of this computing cost can be parallelized and accelerated effectively on the GPU, so with GPU cores still increasing exponentially, at some point it's likely to become more effective than CPUs.
So can any matrix operation. Sadly, there aren't that many algorithms that are efficiently represented by one.
Do you know which papers those were? I would have thought that with infinite transistors you could speculatively execute all possible future code paths and memory states at the same time and achieve speedup that way.
Speculation can only take you so far. How do you speculatively execute something like:

    a = a + b[x];

You can't even speculatively fetch the second operand until you have real values for b and x.
Trying to model all possible values explodes so much faster than all possible control paths that it's only of very theoretical interest.
People like to say a Core 2 Duo can still hang, and while it might be okay for basic tasks (and you could use a 10-year-old computer in 2007 for basic tasks like word processing and internet), modern PCs are much faster.
On benchmarks, you're going to be 4-6x faster on most tasks with a 2017 MacBook Pro compared to a 2007 one, and that's before you even get started with anything that takes advantage of SIMD or GPGPU workloads.
The other reason people say they can use a 10 year old PC today is because they've upgraded it. Early Core 2 Duo systems shipped with 512MB or 1GB RAM. They came with very slow 80GB hard drives. Upgrading to an SSD and 4-6GB of RAM is a must.
Edit: I guess I was ahead of the curve, because my Core 2 Duo box had 2 GB of RAM out of the box (upgraded to 4 GB of RAM later, which required reinstalling Windows as a 64-bit edition, and then swapping the chip for a Core 2 Quad).
A 2007 ThinkPad X61, Core 2 Duo T7100, but with 4 GB of RAM and an SSD. To be fair, I'm cheating a little, as I use Debian+dwm, which is lighter than Windows 10.
But it holds up pretty well, and CPU limitations will probably not be the reason I replace it. The screen resolution (1024x768) will probably be the main motivator.
But this only illustrates that there was some improvement from RAM evolution (4 to 8 GB of RAM now for average PCs vs 1 to 2 GB back then) and huge improvement from disk evolution (SSDs vs mechanical drives). The CPU is far from being the main improvement factor for common usage over the last 10 years.
Don't forget bugs! We have had buggy silicon, microcode and drivers, all of which probably still need fixing.
The only issue I have with my old laptop from around then is browsing the web. Everything else, the performance is fine.
It's hard not to look at everything and interpret the marginal gains of the past half decade as monopolist laziness/arrogance.
This, to me, looks like the PS2 vs Dreamcast move, where Sony blocked Dreamcast sales with PS2 hype. We've been waiting for 10nm for a while now, and Intel is essentially confirming "yes, it's really coming this time" when they might not have otherwise--it's hard not to see that as a reaction to Threadripper. At least on the surface, increased competition has led to increased transparency, which is good for consumers... or at least I'm happy knowing I have the option to avoid a big, hot, power-hungry chip if I want to.
Although it has been deliberately deployed many times now, as you note one example of.
Never compare an existing product to one that is not yet being sold. Never.
It's Cannon Lake (Intel's first 10nm chip) which is supposed to come out this year.
The speedup from FPGA -> ASIC is not that dramatic. The real advantages are power draw and amortized cost. FPGAs also have to initialize when powering up.
What this means is that to implement the hardware the FPGA represents, you have to "program" it; this is typically done in one of only a couple of languages (VHDL and Verilog, known as Hardware Description Languages, or HDLs).
At one time, Xilinx (Altera's competitor) made an FPGA which could be reprogrammed "on the fly" very quickly; it (well, many thousands of them) was used for an interesting machine, of which only a few examples survive (the whole thing at the time was surreal, if you followed it - it seemed like a scam more than anything, but real hardware was shipped).
This machine was called the CAM-Brain machine, and was the creation of researcher Hugo de Garis (who is retired, and is a seemingly strange fellow in the AI community - but not as strange as Mentifex):
I encourage you to research this machine, and Mr de Garis, as the whole thing is fascinating (and I will also say, from a design perspective, the shipped CAM-Brain Machine was one of the "sexiest" looking boxen since the early Crays).
CAM-Brain meant "cellular automata machine brain" - it was basically an effort to evolve a neural network using CA and FPGA; the CA would evolve the HDL which described the hardware representation of the NN, which would then be dumped to the FPGA for processing. The process (from what I understand) was iterative.
I don't believe the "kitten" ever went past much more than some early 3D models (maybe some CAD, too) and a software simulator. At least, that's what you can still find out there today (images of the simulator running on Windows NT, iirc).
The effort was noble, but it didn't work for more than simple things. I think it was part of the "evolve-a-brain" NN dead end, which seemed to hold out some promise at the time.
That's just a bit of background, but it shows how Intel and FPGAs can be used for building hardware to represent neural networks (a GPU/TPU is not a neural network - it is merely a processor for the software representation of the neural network). Whether that's their intention, or something else (maybe something like Transmeta tried?) - only they know.
- 2011: 32nm
- 2012: 22nm
- 2014: 14nm
- 2018?: 10nm
I don't know much about foundry processes, but it seems that it's taking more and more time for smaller and smaller gains, right? At this rate, how long until we reach sub-nanometer? What are the physical limits on these processes, and do they have any implications for end users? Will we be using 2nm CPUs for 50 years?
Would love to hear the thinking of anyone educated on the topic.
Edit: very intrigued by the sustained downvotes on this ¯\_(ツ)_/¯
Besides, it matters not, the bottlenecks today are in memory and interconnects.
Less than 14++, sure, but 10+ and 10++ will fix that.
I mean, more than it already is (with cache).
There are also other attempts, such as FBRAM, to replicate DRAM structures without the need for insane aspect-ratio trench capacitors.
I believe that such solutions are necessary to continue scaling.
The problem with MRAM is unreliable reads. It is excellent for low clock speed devices, but as you go into the gigahertz range, the signal quality of an MRAM cell begins to degrade, and you have to put a Darlington pair or a BiCMOS transistor on top of it, thus negating its cell-size advantage.
Agreed on MRAM, but it is also a very immature technology, so there's hope at least. Unless you're talking about crossbar crosstalk, which can be solved with a diode.
Real world gains will never be as high as the math suggests, as you get into leakage currents etc.
Anyone know if there's any truth to this? Might try to find the comment later when I have time.
There is a ton of black magic going on here, with layers being stacked vertically, and masks bearing no obvious visual resemblance to the shape they project onto the silicon. That's because the required resulting image is small relative to the wavelength of the particles used to project it, so the interaction between the photons/X-rays and the masks has to be accounted for.
There is a super interesting YouTube video floating around about this that I highly recommend; it's called 'Indistinguishable From Magic':
It's up to date to 22 nm. Highly recommended.
It's really a must-see for anyone interested in processor technology.
Oh, there's a new video! Awesome!
The "7nm/10nm" transistor fin pitches are something like ~50 n, and the length is something like 100-150nm
The lattice constant of crystalline silicon is 0.54 nm, and since it's an FCC structure the distance between neighboring atoms is 0.38 nm. So with a hypothetical 2 nm CPU, some feature would be only roughly 5 atoms across, which leaves VERY little room for manufacturing tolerance. How would one manufacture a device containing billions of transistors with such small tolerances? I don't think we'll ever see such things, at least not with a lithography approach.
Heck, I think it's close to black magic that they are able to produce transistors on a 10nm process, but apparently experts say that up to 5nm (13 atoms!!!) might be possible.
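A quick back-of-the-envelope check of those atom counts (Python, using the 0.38 nm neighbor spacing from the comment above):

    spacing_nm = 0.38  # distance between neighboring Si atoms, per the parent comment

    for feature_nm in (10, 5, 2):
        print(f"{feature_nm} nm -> ~{feature_nm / spacing_nm:.1f} atoms across")

That prints ~26.3, ~13.2 and ~5.3 atoms, matching the "13 atoms" and "roughly 5 atoms" figures.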
Research like: http://news.stanford.edu/press-releases/2017/08/11/new-ultra...
"3D" processes which build multiple layers on top of one another may also see more investment as other methods become prohibitively expensive. And once you've cheaply gone to 2 layers in a big blob of epoxy, what's stopping you from doing 4? 8? 16? 32? [Heat dissipation, probably]
But whatever, people have been saying Moore's Law is dead since Moore's Law was invented. Who knows whether we'll technically hit one milestone or another. Things get faster, what the hell.
I wouldn't be surprised to see more of that in the future, think caches on top of cores, but I doubt we'll ever see multiple layers of transistors produced in a single step. Technical challenges aside, the economics of trying to produce multiple layers at once are just always going to be worse: higher latency from a wafer entering a fab to finishing the wafer, and much higher rate of defects. (When you produce the layers separately, you can potentially test them separately before putting them together, which is a huge win for defect rate.)
It's possible that manufacturing multiple layers at once might eventually allow for a higher density of vertical interconnects, but I just don't see that becoming the deciding factor.
Found it.. http://semiengineering.com/will-7nm-and-5nm-really-happen/
and the HN discussion from several years ago: https://news.ycombinator.com/item?id=7920108
2012: 31% size reduction
2014: 36% size reduction
2018: 28.5% size reduction
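Those percentages are just the linear shrink between consecutive nodes; a quick check in Python (1 - new/old), matching the figures above up to rounding:

    nodes = [(2011, 32), (2012, 22), (2014, 14), (2018, 10)]

    for (_, old), (year, new) in zip(nodes, nodes[1:]):
        print(f"{year}: {old}nm -> {new}nm = {(1 - new / old) * 100:.1f}% size reduction")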
Intel (and AMD) keep pushing more and more proprietary code that cannot be read, changed or removed. No one knows exactly what it does, and it has built-in screen and key recording. It's my advice, and the advice of privacy advocates, that no one should purchase or use any processor made by Intel or AMD until they address these serious issues.
Also, I don't know of any alternative that doesn't have large unauditable blobs integrated into the chip.
All ARM SoCs come with radio processors that are running a non-trivial piece of software with full access to the system memory, which is responsible for power management, boot sequence and wireless communications. It is by definition network connected.
AMD has a technology it calls the Platform Security Processor (PSP for short) which does basically the same thing.
To have a processor that doesn't have this kind of technology, you have to give up on decades of advancement in compute power, or buy a very expensive and non-portable POWER8 or POWER9 system.
But yeah, you are totally right about the alternatives. Nothing quite matches Intel and AMD, and a lot of those ARM SoCs have proprietary code running in their bootloaders too. But you can get some processors from 7 years ago that are usable.
OpenPOWER is fantastic, though, and has real potential. There were a few projects out there looking to implement a laptop and a personal desktop computer using it, but unfortunately they didn't reach their funding goals.
I think the more people know about Intel and AMD's shady practices, the more funding open hardware projects can get, and maybe in the next few years we can replace Intel and AMD with ethical and open solutions.
I hadn't heard about OpenPOWER. I hope more people are made aware of alternatives so they can get funding and momentum.
There are some ARM processors that live without blobs. I think Olimex produces what they call open-source hardware (OSHW); is this an acceptable product?
It's usually possible to replace that blob! E.g. https://github.com/christinaa/rpi-open-firmware for the Raspberry Pi.
Isn't it more accurate to say it might at some point be available as a motherboard with POWER9 CPUs?
I mean, it looks very interesting, but AFAIK no one has been shown even a prototype yet?
I am finding the Talos II an increasingly attractive proposition, even though the prices for a full system are quite staggering by comparison to mainstream hardware.
The high-end ones used for flagship smartphones/tablets do, but low-end ones used in cheaper tablets/TV boxes and more specialized hardware often don't have any radio interface.
I am all in for some philosophical discussion, but actually being this detached from reality doesn't do you any good. Being able to see the stars doesn't mean you can reach them right now...
So yes, in summary: it is hard, to the point of impossibility.
Keep that up for a while longer, and it will become a physical impossibility, as any gesture of resistance leads to automatic extermination. Until then? Thanks for nothing.
Pretty much everyone can imagine computers being insecure and unreliable, since computers are currently insecure and unreliable.
Everything in it is free, including all of the firmware, and the CPU is an open specification.
My guess is it's definitely possible, but wouldn't it have been popped by foreign agencies by now, with a leak of tools to exploit such devices? Then again, being able to hack any device is very tempting, so knowing the NSA, they are probably all for doing this, fuck the consequences.
If you had the chance to make a customer who creates millions of Chromebooks happy, wouldn't you take every opportunity to help them, especially if it costs you little to no money at all? Obviously there is a big reason why they don't want this backdoor removed.
Basically, he says that even Google is puny (in terms of production units) next to Intel or Samsung, and cannot ask for custom firmware.
Hardware security is currently a shit show because of global monopolies/"oligopolies".
They operate a top 3 cloud service and have enormous internal data centers as well.
Maybe if Lenovo, Toshiba, Acer, Dell, etc. all asked Intel to provide said blobs (and the threat was tangible), then they would probably reconsider.
I'm very unhappy with my old Sun servers, for example, because the management system cannot be upgraded and the servers are no longer supported. I'm stuck with proprietary insecure software that I depend on and that I have no way of changing. It's all worse if the insecure outdated software can only be replaced by soldering wires to a chip on the board.
I'm curious about how one would be associated with a particular chip. I understand that keystrokes can be logged and TCP/IP can be read; you can be scraped. But ultimately, how is their backdoor aware of you, so that you don't appear to them like a needle in a stack of needles? A fascinating and revolting technical conundrum.
It is troubling how little priority the computer world gives to proper security models.
Also, the management engine is more about the chipset than about the core, which is what the announcement is about.
Basically, whatever you run at any level on your Intel chip can be monitored by the Intel ME, no matter how many VMs, operating systems, or encrypted files/processes you have installed or are using.
So go back to the Stone Age by not using any PCs/servers? Nice troll.
But I guess people have to make a personal judgement. Are ethics, privacy and freedom more important than a faster processor to run your games on?
We aren't going to see ludicrously high clock rates for the foreseeable future. There are a lot of compounding factors as to why, but the biggest ones are the pressure for efficiency, which drives designs away from dumping higher and higher voltage to get frequency, and the diminishing returns of voltage vs frequency (see Ryzen, where a 20% improvement in clocks costs about a 50% increase in power draw across all SKUs, and similar situations happen with Intel; dynamic power scales roughly with voltage squared times frequency, and higher frequencies demand higher voltage).
That being said, a 4 GHz Skylake core crushes a 4 GHz Core 2 core. Depending on the benchmark used, it can perform anywhere from 80% to upwards of 170% faster per clock. You don't get as dramatic year-over-year improvements by increasing per-cycle performance, but the innovations leading up to ~2004 (or '06 for the multicore boom) were just stuffing hotter, more power-hungry transistors onto smaller dies.
Non-x86 systems went over 5 GHz on air many years ago, e.g. IBM POWER.
Core 2 Duos were slower per clock than Pentium 4s, I think by quite a bit. It was a real setback for performance when they came out.
Core 2 was the breakthrough that left AMD in the dust, from which it still hasn't recovered to parity. Even if I can't recall the numbers, I fail to see how the very energy-intensive (highly clocked) Pentium 4 could have been faster per clock.
Are you sure you're not thinking of the Pentium 3 to 4 change? After all, Core architecture had more in common with P3, didn't it?
i5-4670K AT 5GHZ OC On a $30 Air Cooler
I really wish we could see more IPC and 10 GHz+; our single-threaded performance has been stuck at the same level for far too long.
I've reported it to them a couple of times, last time Ryan Smith told me on twitter that I'm welcome to report it to him direct: https://twitter.com/ryansmithat/status/877409854087806976
I assume you have an adblocker on desktop but not on mobile?
PCPer is a significantly more capable review site these days.
I hope he made some good money.
I think especially their phone and tablet reviews have gone downhill a lot. For example, no deep dive into the newer iPhones and iPads.
Really? Can you link an example?
I like to call it Tick, Tock, Clunk.
The CPU's performance has started to matter less and less.
GPUs are only good at _very_ specific workloads.
The tooling isn't good enough yet, but there is no question in my mind practically everything will run on a GPU like processor in the future. The speedups are just too tremendous.
Intel knows this, which is why they purposely limit PCIe bandwidth.
99% of the apps we use are totally unlike cryptocoin mining. And the style they are written in (and more importantly, their function) is even more GPU-resistant.
>The tooling isn't good enough yet, but there is no question in my mind practically everything will run on a GPU like processor in the future. The speedups are just too tremendous.
Don't hold your breath. For one, nobody's porting the thousands of apps we already have and depend on everyday to GPU.
You can even implement explicit speculative execution - simply use a warp for each path and choose at the end. It is very wasteful but can often come out ahead.
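A toy sketch of that "compute both paths, choose at the end" pattern (Python/NumPy on the CPU as an illustration of the idea, not real warp-level GPU code; the branch functions are made up):

    import numpy as np

    x = np.linspace(-3.0, 3.0, 8)
    cond = x > 0

    # "Speculatively" evaluate both sides of the branch for every element...
    if_path   = np.sqrt(np.abs(x))  # what the taken branch would compute
    else_path = x * x               # what the not-taken branch would compute

    # ...then select per element at the end. Half the work is thrown away,
    # but there is no divergent control flow.
    result = np.where(cond, if_path, else_path)
    print(result)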
Let me describe a completely specific workload, citing my own personal experience over objective facts with citations.
Oh, and let me also take a swipe at Intel again without any verifiable evidence.
It took under a year for a GPU miner to come out. Having optimized it for Intel in assembly, I was convinced it wasn't possible for a GPU to beat it - and yet it happened.
It turns out that even when used inefficiently, a thousand cores can brute-force their way through anything.
Cool! Please factor this very large RSA modulus for me <insert some RSA modulus>
That Primecoin exists?
That its PoW is finding prime numbers?
That it has a GPU miner?
This is so fundamentally wrong that your complete lack of understanding of the underlying math is obvious. Let us use 1024-bit RSA as the key to brute-force. If we used the entire universe as a computer, i.e. had every single atom in the observable universe enumerate 1 possibility every millisecond, it would take ~6 * 10^211 years to go through them all. In comparison, the universe is less than ~14 * 10^9 years old.
And this is for 1024-bit; today we use 2048-bit or larger.
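The parent's figure checks out if you take roughly 10^86 particles for the observable universe (a high-end estimate; the commonly cited atom count is closer to 10^80, which would only make the result larger):

    # Back-of-the-envelope check of the ~6 * 10^211 years figure.
    keyspace  = 2 ** 1024         # possibilities for a 1024-bit key
    particles = 10 ** 86          # assumed particle count of the observable universe
    rate      = particles * 1000  # one possibility per particle per millisecond

    years = keyspace / rate / (365.25 * 24 * 3600)
    print(f"~{years:.1e} years")  # ~5.7e+211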
That is brute-forcing. That faster and better methods exist is irrelevant to a brute-forcing discussion, but I do mention in the very next message in that thread that they exist.
There are ways to factor an RSA modulus that are much faster than brute-forcing, and yes, they work on a GPU. But that has nothing to do with brute-forcing; that's clever math.
But no, I suppose it's more likely I was really saying GPUs aren't bounded by the limits of the universe.
Why do you expect that? Many DBMS operations that are not I/O limited are memory limited, and a GPU does not help you there (on the contrary, you get another bottleneck in data transfers to the small GPU memory). What can help is better data organization, e.g. transposed (columnar) storage.
The more complex the operations performed on the columns (the transform), the better the GPU database will do, because of the higher ops-to-bytes ratio.
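A toy illustration of the columnar point (Python/NumPy on the CPU, but the same reasoning applies to what you'd ship over PCIe to a GPU; the field names are made up):

    import numpy as np

    n = 1_000_000
    # Row-oriented: every record's fields are interleaved in memory.
    rows = np.zeros(n, dtype=[("price", "f8"), ("qty", "f8"), ("note", "S48")])

    # Column-oriented: one contiguous array per field.
    price = rows["price"].copy()
    qty   = rows["qty"].copy()

    # A transform touching two fields streams 16 bytes/record from the
    # column store, vs 64 bytes/record (including the unused "note" field)
    # from the row store -- 4x fewer bytes moved for the same arithmetic.
    revenue = price * qty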
Statistics, sure, that's definitely a good candidate for GPUs. I don't know much about R, but a quick google suggests you can run R code on a GPU, by working with certain object types, like matrices with GPU-accelerated operations.
That doesn't seem like it maps very well to a spreadsheet unless you have one big matrix per cell. I'm guessing (maybe incorrectly) that when people work with matrices in Excel, they're spread across a grid of cells. You probably could detect matrix-like operations and convert them to GPU batch jobs, but it seems very hard and I'm skeptical of how much you'd gain.
Maybe it's specific expensive operations, like "fit a curve to the data in this huge block of cells"?
Looks like it is indeed identifying large groups of cells with common formulas, and running those calculations on the GPU.
Making very complex tasks run well on a GPU is hard, whereas CPUs are great for dealing with that stuff.
If you have something like a 100,000 row spreadsheet where every row is doing exactly the same calculation on different input data, sure, that starts to make sense. If people are really doing that in Excel, I'm surprised! (but maybe I shouldn't be)
Machine learning may be all the hype right now, but it is still suboptimal for most software demands.