> Intel started using automated place and route techniques for the 386 processor, since it was much faster than manual layout and dramatically reduced the number of errors. Placement was done with a program called Timberwolf, developed by a Berkeley grad student. As one member of the 386 team said, "If management had known that we were using a tool by some grad student as a key part of the methodology, they would never have let us use it."
The grad student was Carl Sechen, advised by Alberto Sangiovanni-Vincentelli: https://ieeexplore.ieee.org/document/1052337
"...we finally made
the decision that we should go with automatic place and route. Neither one of those things existed at
Intel and the concern was could we get it done in time and would it blow up the areas of the chip so that
they wouldn’t fit and then it would all fall apart and we’d have to do it by hand. So what we did, we got an
automatic placement program from a grad student at Berkeley, it was called Timberwolf and we checked
it out and it seemed to do an adequate job so we had his software. He moved to MIT to work on another
project and we actually had a terminal set up in his campus room where he’d fix bugs in the auto
placement program as they came up. But luckily the whole thing came together and worked. There are
several points in time where we’d get stuck and have to be waiting for him to fix his program. So that
would take the individual cells and put them within a rectangle in an optimal situation for speed.
"...I was just going point out that if management had known that we were using a tool by some grad
student as the key part of the methodology, they would never have let us use it."
EDIT: I didn't realize that righto.com had an article on i386 place and route with standard cells that also links to the panel interview. It identifies the specific areas of the i386 die that used standard cells.
How to let the world be populated by people who can make free decisions about their work and their employer. Nothing to do with negative externalities, unless you think feudalism was particularly green.
The two sides (a grad student, Intel) are not operating on an equal field. As such the "free decisions" aren't really a valid dimension here, as we have to include how unfair the field is in this case (...in all cases).
Intel did 152 BILLION DOLLARS in stock buybacks over 35 years and ruined their own lead. They screwed/squeezed a lot of people so they could juice the stock price - to the point where they lost the semiconductor lead (as that wasn't what they were most worried about - they were mostly concerned with stock price). We need to consider all of these things when we determine if the field is fair for an employee.
Of course it's not equal - it shouldn't be. If you want to be able to seriously negotiate with Intel as an equal, you have to do a lot more than be a grad student.
Why would one grad student have the same power as 10000 employees of a company? Surely when you're finishing with ominous repetition you must know something's a bit off[0].
When I go to the page, I get the CF "are you human" check, which I complete.
However, every image load is also getting that check, but those checks are not presented to me - the image just doesn't load because an HTML page is being returned instead.
The other day I tried to scan a file on VirusTotal. I got an infinite loop of infinitely slow fading "select all the fire hydrants", followed by rejection, 10 times in a row before I gave up.
Almost as though they had rejected me before the captcha and were merely torturing me for their amusement?
Even more bizarrely, VirusTotal presented a second upload form on the captcha page... which itself is captcha-free...
I use uMatrix and I'm well familiar with the Cloudflare "are you human" steps... I am not encountering the problem you describe, and Cloudflare is not listed as being involved on the dashboard.
You do realize that a large part of the web routes through buttflare even if no buttflare domains are involved, right? Whether one of those requests returns the buttflare captcha instead of the requested resource is controlled by buttflare and can vary from user to user.
No. Actually, the state of the art of EDA software is worse.
My project has been to design (and create) better EDA software that simulates and optimizes the design and can therefore shape and place each individual transistor optimally, to achieve lower power, higher speed and lower cost.
There is only one drawback compared to all existing EDA software: my EDA tools must run on a ($100k) small supercomputer or FPGA cluster, because they deal with a billionfold more transistors than existing EDA software and that takes more compute.
It means my software is much cheaper than existing EDA software but will yield much better chips and wafers with much faster, better, cheaper and fewer transistors.
I'm eager to give a talk on my EDA software as well; please consider inviting me to give it.
Other researchers and companies have proven that optimizing transistor design and placement over standard cell libraries and PDKs can be done, for example:
I am very certain (but have no hard proof) that this is what Apple did on their M1, M2, M3, M4 and M5 processors, especially their high end M2 and M5 Ultra chips.
What I'm claiming here is that humanity can design three to four orders of magnitude faster computer chips, using at least two orders of magnitude less energy, making chips orders of magnitude cheaper, if we only used better EDA software (CAD => SYM => FAB) than we use today. Moore's law is not at an end. I'd be happy to provide proof of this, but that takes a bit more effort than a HN comment.
I don't know about this at any detailed level, but doesn't designing standard cells for leading edge nodes involve a lot of trial and error? Are the issues that can occur even well understood, to the level that they can be simulated?
With the approach you mention, would it involve creating "custom standard cells", or would the software allow placement of every transistor outside of even a standard cell grid? If the latter, I would have trouble believing it could be feasible with the order of magnitude of computing power we have available to us today.
The best results will be with custom shapes and custom individual placement of every transistor, outside standard cells but within the PDK rules. Going outside the PDK rules will be even better but also harder.
The trial and error you do mostly by simulating your transistors, which you then validate by making the wafers. You can simulate with mathematical models (for example in SPICE), but you should eventually try to simulate at the molecular, the atom/electron/photon and even the quantum level, though each finer grained simulation level will take orders of magnitude more compute resources.
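To give a flavor of what the mathematical-model level means in practice, here is a minimal sketch of a square-law long-channel MOSFET drain-current model, the kind of simplified device equation SPICE-class simulators build on (all parameter values are illustrative placeholders, not from any real PDK):

    # Minimal square-law NMOS drain-current model (long-channel approximation).
    # Threshold voltage and transconductance parameter are illustrative, not from a PDK.
    def nmos_id(vgs, vds, vth=0.5, k=2e-4):
        """Drain current in amps for the given gate-source and drain-source voltages."""
        if vgs <= vth:
            return 0.0                                  # cutoff (ignores subthreshold leakage)
        vov = vgs - vth                                 # overdrive voltage
        if vds < vov:
            return k * (vov * vds - vds * vds / 2.0)    # triode region
        return 0.5 * k * vov * vov                      # saturation region

    # Sweep the gate voltage at a fixed 1.2 V drain bias for a rough I-V curve.
    for mv in range(0, 1300, 100):
        vgs = mv / 1000.0
        print(f"Vgs={vgs:.1f} V  Id={nmos_id(vgs, 1.2) * 1e6:.1f} uA")

Real simulators replace this with far more detailed compact models, and molecular- or quantum-level simulation is more detailed (and expensive) still, which is where the compute cost explodes.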
Chip quality is indeed limited by the magnitude of computing power and software: to design better (super)computer chips you need supercomputers.
We designed a WSI (wafer scale integration) with a million processor cores and terabytes of SRAM on a wafer with 45 trillion transistors, which we don't dice into separate chips. It would cost roughly $20K in mass production and would be the fastest, cheapest desktop supercomputer to run my EDA software on, so you could design even better transistors for the next step.
We also designed a $800 WSI 180nm version with 16000 cores, with the same transistors as the Pentium chip in the righto.com article.
Has this WSI chip been taped out/verified? I must admit I am somewhat skeptical of TBs of SRAM, even at wafer scale integration. What would the power efficiency/cooling look like?
The full WSI with 10 billion transistors at 180nm has not been taped out yet; I need a $100K investment for that. This has 16K processors and a few megabytes of SRAM.
I taped out 9 mm² test chips to test transistors, the processors, programmable Morphle Logic and interconnects.
The ultra-low power 3nm WSI with trillions of transistors and a terabyte of SRAM would draw a megawatt and would melt the transistors. So we need to simulate the transistors better and lower the power to 2 to 3 terawatt.
There is a YouTube video of a teardown of the Cerebras WSI cooling system where they mention the cooling and power numbers. They also mention that they modeled their WSI on their own supercomputer, their previous WSI.
This sounds exciting, but the enormous and confusing breadth of what your bio says you are working on, and the odd unit errors (lowering "a megawatt" to "2 to 3 terawatt"), are really harming your credibility here. Do you have a link to a well-explained example of what you've achieved so far?
Are you concerned that going away from standard cells will cause parametric variation, which reduces the value proposition? Have you tested your approach on leading FinFET nodes?
Hello, I am interested in your research as well as MicroMagic's. The Claremont (32nm Pentium) and MicroMagic are the only application processors that have utilized NTV by stabilizing the voltage at 350mV-500mV. I started a project to make solar powerable mobile devices: https://hackaday.io/project/177716-the-open-source-autarkic-... My email is available in the linked GitHub repos.
We've designed a $0.10 ultra low power 8/16/32/64 bit processor SoC with built-in MPPT buck/boost converters, so they can be powered directly from single solar cells or small Li-ion cells. They have low power networking so you can create clusters. I'm not sure yet if they will be even lower power than your 5 mW processors, but 1 mW is our aim.
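For readers unfamiliar with MPPT: the controller logic is conceptually just a perturb-and-observe loop like the toy sketch below. Everything here (the fake panel model, step size, duty cycle range) is made up for illustration and is not our actual converter design:

    # Toy perturb-and-observe MPPT loop; all numbers are illustrative only.
    def mppt_step(v, i, prev_power, duty, step):
        """One P&O iteration: nudge the converter duty cycle toward peak power."""
        power = v * i
        if power < prev_power:
            step = -step                     # last perturbation hurt, so reverse direction
        duty = min(max(duty + step, 0.0), 1.0)
        return duty, power, step

    # Fake solar panel: power 6*d*(1-d) peaks at duty = 0.5 (purely synthetic).
    def panel_vi(duty):
        return 2.0 * (1.0 - duty), 3.0 * duty

    duty, power, step = 0.3, 0.0, 0.02
    for _ in range(50):
        v, i = panel_vi(duty)
        duty, power, step = mppt_step(v, i, power, duty, step)
    print(f"settled near duty={duty:.2f}, power={power:.2f} W")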
I would argue that a solar powered computer would benefit from a 2 megabyte (could be as low as 128KB) SRAM operating system with a GUI, like Squeak or Smalltalk-80, instead of the Linux you propose. We've learned a lot from the low power OLPC designs.
Thanks for the invite, I'm eager to collaborate on your solar powered computers, but I'm having trouble finding your email in your GitHub repos. Could you email us at morphle73 at g mail dot com?
That sounds very suitable for solar powered IoT and IIoT sensors, so the talk about GUIs feels confusing.
Zephyr or FreeRTOS are perfectly fine with sub-2 MB amounts of SRAM.
this is pretty exciting! i agree about the squeak-like approach. what would you use for the screen? i've been thinking that sharp's memory-in-pixel displays are the best option, but my power budget for the zorzpad is a milliwatt including screen, flash, and keyboard, not just the soc
There are ultra low power ePaper displays that only need power to change pixels but need no power to light the display or hold an image. They are usually black and white or grayscale.
> Typically, the energy required for a full switch on an E-Ink display is about 7 to 8mJ/cm2.
> The most common eInk screen takes 750 - 1800 mW during an active update
The Smalltalk-80 Alto, the Lisa and the 128K Mac had full window GUIs in black and white, and desktop publishing.
The One Laptop Per Child (OLPC) had low power LCD color screens especially made for use in sunlight and would combine nicely with solar panels.
hey, i've been looking for those numbers for years! where did you get them?
the particular memory lcd i have is 35mm × 58mm, which is 20cm², so at 7½ millijoules per square cm, updating the same area of epaper would require 150 millijoules to update if it were epaper. the lcd in fact requires 50 microwatts to maintain the display. so, if it updates more than once every 50 minutes, it will use less power than the epaper display, by your numbers. (my previous estimate was 20 minutes, based on much less precise numbers.) at one frame per second it would use about a thousand times less power than epaper
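the arithmetic is easy to sanity-check; here's the same comparison as a few lines of python, using the numbers quoted upthread (which may or may not be representative of real panels):

    # compare one full epaper refresh against the memory lcd's static hold power.
    # figures are the ones quoted upthread, treated as rough estimates.
    epaper_energy_per_cm2 = 7.5e-3        # joules per cm^2 per full update
    area_cm2 = 3.5 * 5.8                  # 35 mm x 58 mm panel, about 20 cm^2
    lcd_hold_power = 50e-6                # watts to maintain the memory lcd image

    update_energy = epaper_energy_per_cm2 * area_cm2       # joules per epaper update
    breakeven = update_energy / lcd_hold_power              # seconds of lcd hold per update
    print(f"one epaper update  ~ {update_energy * 1000:.0f} mJ")
    print(f"break-even interval ~ {breakeven / 60:.0f} minutes")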
so in this context epaper is ultra high power rather than ultra low power. and the olpc pixel qi lcds, excellent as they are, are even more power-hungry
pixel qi and epaper both have the advantage over the memory lcd that they support grayscale (and pixel qi supports color when the backlight is on)
>At IDF last year Intel's Justin Rattner demonstrated a 32nm test chip based on Intel's original Pentium architecture that could operate near its threshold voltage. The power consumption of the test chip was so low that the demo was powered by a small solar panel. A transistor's threshold voltage is the minimum voltage applied to the gate for current to flow. The logical on state is typically mapped to a voltage much higher than the threshold voltage to ensure reliable and predictable operation. The non-linear relationship between power and voltage makes operating at lower voltages, especially those near the threshold very interesting.
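To make that non-linearity concrete: dynamic switching power goes roughly as activity * C * V^2 * f, so dropping from ~1.0 V to a ~0.45 V near-threshold operating point cuts the energy per operation by roughly 5x even before the clock is slowed. A back-of-the-envelope sketch with made-up numbers (not Claremont's actual figures):

    # Rough dynamic power estimate: P ~ activity * C_switched * Vdd^2 * f.
    # Capacitance, frequencies and voltages below are illustrative, not measured.
    def dynamic_power(c_switched, vdd, freq, activity=0.1):
        return activity * c_switched * vdd ** 2 * freq

    c = 1e-9                                              # 1 nF of switched capacitance (assumed)
    nominal = dynamic_power(c, vdd=1.0, freq=1e9)         # full voltage, 1 GHz
    ntv = dynamic_power(c, vdd=0.45, freq=100e6)          # near-threshold, 100 MHz
    print(f"nominal: {nominal * 1e3:.0f} mW   near-threshold: {ntv * 1e3:.2f} mW")
    print(f"energy per operation is {1.0 ** 2 / 0.45 ** 2:.1f}x lower at 0.45 V")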
I'd love to but what do you want me to elaborate on?
We started making EDA tools and simulators (CAD, SYM, FAB as Alan Kay says) and designing a wafer scale integration to run parallel Squeak Smalltalk (David Ungar's ROARVM) in 2007, and we are still working on it in 2024, so I estimate 30,000 hours now. I call that very ambitious too.
> Also, have you checked out the OpenROAD[1] project? It’s a pretty impressive
No, it is not pretty impressive EDA software. OpenROAD software quality is like Linux, "a budget of bad ideas" as Alan Kay typifies it. OpenROAD is decades-old sequential program code: millions of lines of ancient C, C++ and bits of Python, mostly written in the very low level C language, riddled with bugs and patches. The tools are bolted together with primitive scripts and very finicky configurations and parametric rules. Not that the commercial proprietary EDA software is any better; that is usually even worse, but because you don't see the source code you can't see the underlying mess.
Good EDA tools should be written by just a few expert programmers and scientists in just a few thousand lines of code and run on a supercomputer.
So the first ambitious goal is to learn how to write better software (than the current dozens of millions of lines of EDA software code). Alan Kay explains how [1-3]:
I watched the video of your talk and it seems impressive. I work in an IP design firm; though I don't have any decision power there, I could get you a foot in the door. If you're interested in trying to convince the brass, could you send an email to Eli dot senn at dolphin dot fr?
If you have working software that gives orders of magnitude improvements, needing $100k worth of hardware would be no barrier at all. That's a fraction of just the EDA software license budget for many projects.
I wouldn't want to do that; GPU compilers are not open source and the hardware is undocumented. As a scientist I feel they are extremely badly designed.
My transistor and atomic simulation software is extremely parallel, but not in the limited SIMD way that GPUs are.
As the article mentions, producing an optimal layout is an optimisation problem where the related decision problem is NP-complete. Even laying out standard cells has to be done using heuristic solutions - blowing up the size of the problem by going from cells to transistors just makes that worse.
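For flavor: the heuristic TimberWolf itself was built on is simulated annealing, and a toy annealing placer looks roughly like the sketch below. The problem size, random two-pin nets and plain wirelength cost are stand-ins for illustration; production tools use far richer cost functions and move sets:

    import math
    import random

    # Toy simulated-annealing placement: assign N cells to N slots in a row,
    # minimizing total wirelength of random two-pin nets. Illustrative only.
    random.seed(0)
    N = 40
    nets = [(random.randrange(N), random.randrange(N)) for _ in range(80)]

    def wirelength(pos):
        return sum(abs(pos[a] - pos[b]) for a, b in nets)

    pos = list(range(N))                  # pos[c] = slot of cell c
    cost = wirelength(pos)
    temp = 10.0
    while temp > 0.01:
        for _ in range(200):
            i, j = random.randrange(N), random.randrange(N)
            pos[i], pos[j] = pos[j], pos[i]              # propose swapping two cells
            delta = wirelength(pos) - cost
            if delta <= 0 or random.random() < math.exp(-delta / temp):
                cost += delta                            # accept the move
            else:
                pos[i], pos[j] = pos[j], pos[i]          # reject: undo the swap
        temp *= 0.9                                      # cool down
    print("final wirelength:", cost)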
The logic is built out of standard gates and logic blocks like flip-flops anyway, so the overhead of using standard cells that implement those building blocks likely isn't too great.
This is more apocryphal than lore, but the understanding I've picked up from EE friends is that standard cells are used because they're proven to work in a given fab process. You don't want your layout software coming up with a trillion different gate prototypes in the midst of laying out your logic circuit!
I'll give you an alternate take: the compute power available to EDA software has been roughly scaling at the same rate as transistors on a die. So the complexity of the problem relative to compute power available has remained somewhat constant. So standard cell design remains an efficient method of reducing complexity of the problems EDA tools have to solve.
That's an interesting thought. However, it assumes that the problem scales with the number of transistors, i.e. O(N). I expect that the complexity of place and route algorithms is worse than O(N), which means the algorithms will fall behind as the number of transistors increases. (Technically, the underlying problem is NP-complete so you're doomed anyway, but what matters is the complexity of the heuristics.)
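A toy run of that argument, with a made-up superlinear exponent just to show the shape of the effect (nothing here is a measured complexity of any real placer):

    # Assume transistor count doubles per generation, available compute scales ~N,
    # and heuristic place-and-route cost grows like N^1.3 (an assumed exponent).
    n0 = 1e6
    base = n0 ** 1.3 / n0                  # runtime units at generation 0
    n = n0
    for gen in range(8):
        rel = (n ** 1.3 / n) / base        # runtime relative to generation 0
        print(f"gen {gen}: N={n:.1e}  relative runtime={rel:.2f}x")
        n *= 2

Even a mildly superlinear heuristic loses ground steadily once transistor counts grow faster than per-run compute.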
It's worse than that, isn't it? Not only are the algorithms presumably super linear, the transistor count has been increasing exponentially, but the compute power per transistor has been decreasing over time. See e.g. [1].
Although I suppose if the problem is embarrassingly parallel, the SpecINT x #cores curves might just about reach the #transistors curve.
yeah, that plots single-threaded performance, not total compute power. the point it's making is that now those transistors are going to parallelism rather than to single-threaded performance, and also the compute power per transistor stopped increasing around 02007 with the end of dennard scaling
your problem doesn't have to be ep to scale to 10² cores
i suspect it's true that compute power per transistor is dropping because thermal limits require dark silicon, but that plot doesn't show it
All tools in use in the last-gen industry (40-12nm) currently make extensive use of standard cell libraries provided by foundries. I don't expect the current gen nor the next gen to change anything.
I don't think it's a software issue -- AIUI the issue is that foundries will only let you use blocks for which the process was tested, or the yields would be unreliable/all over the place
no, the mask must be made within the tight rules of the proprietary (and very secret) PDK of the TSMC Fab for that node. Just getting it certified that it fits the rules will cost millions.
It's probably more of a node thing than a fab thing. You would have a much easier time getting the fab to do random stuff for you on a legacy node compared to a leading edge node.
Leading edge nodes are basically black magic and are right on the edge of working vs producing broken chips.
You as a customer would never want to be in a position where you are solely responsible for yields.
There are only a few fabs with nodes smaller than 28nm. Yes, all fabs are like that, with the exception of a few tiny experimental labs at research institutes or universities.
Maybe in reality it is somewhere in-between, where TSMC says: here is a set of standard cells that we tested; you can use them but if you modify anything then it's at your own risk.
Maybe in theory, yes. In practice the fab will never allow you to do anything at your own risk, because it might contaminate or break their $170 million machine. If you have to offer a few billion extra to cover that risk, you're better off building your own fab instead.
It takes years just to figure out how to use the machines to produce actual working chips. They're the most specialized, intricate, expensive machines in the world. They're operated in enormous clean rooms that only allow for 1 half-micron sized particle per cubic foot. No fab is going to run them without exactly following the procedures they have spent billions developing and testing. They are also going to avoid potential delays as much as possible because these machines have to be running for as much of their useful life as possible to recoup the vast expense. The level of risk-averseness is insane, but warranted.
If you haven't watched this video, I highly recommend it: Indistinguishable From Magic: Manufacturing Modern Computer Chips - https://www.youtube.com/watch?v=NGFhc8R_uO4 It's a little outdated now, but very comprehensive and gives you an idea of how totally nuts the chip business is.
Think of it less like moving transistors around and more like moving small groups of atoms around (also, you can get single transistor cells for analog and power designs). The processes required to place the rough groups of atoms in roughly specific places involves extreme amounts of energy relative to the size of what you are working with, which is supplied in a combination of chemical, radiation, and thermal forms. As a result, predicting what additional effects attempting to form specific micro-shapes might have is nontrivial. They extensively simulate and then carefully test all of their approved cells on an ongoing basis. Deviating from this could cause unknown issues, including damage, but doing the necessary work to prequalify your custom cells would be prohibitively expensive for most applications.
Going outside of the PDK rules you risk contamination, because a chip machine is a high speed, ultra-accurate, automated mechanical and chemical laboratory. It sprays extremely corrosive acids and vaporizes metals. Going a nanometer outside of the rules would send droplets or flakes of chemicals flying around at high speed.
I'm very skeptical of this too. I don't see a mechanism where moving transistors could break the machine. Although I wonder if you could blow up the die testing machine by making a chip that was one big charge pump :-)
It's not obvious what "moving" means in this thread, but no foundry would ever let you make a design that grossly violates the design rule checks (DRC). It's hard to imagine how something risking damage to anything would get through all the many, many checks, but wasting time and money is definitely possible and to be avoided.
As an aside, few analog layouts will use the "line of diffusion" style that's common in the standard cells. And in analog one can find more exotic transistor patterns, like waffle [1], that aren't used in digital. Many things are possible, but it has to pass DRC.
Skeptical here as well, no one is proposing a serious mechanism for damage like "you left gate oxide uncovered by silicide so it contaminates the CMP machine" or whatever.
What I can imagine, is that the foundry only tests their magic OPC algorithm with DRC clean inputs. If your mask isn't DRC clean, who knows what's coming out the other side.
One difference between the standard cells in the article and the current ones is that the routing channels have been eliminated thanks to the many metal layers we now have. Back then we couldn't really afford to have metal cross the Vdd and ground lines at the top and bottom of the cells so we just stretched the polysilicon lines to the top and bottom edges. Routing was done by continuing the poly into the channel and then connecting cells with metal. This meant that though the decapped poly lines are just one thing in the photos, in terms of design the parts inside the cells are standard and the parts in the channel are custom.
This scheme works even with just poly and one level of metal, but if you have enough metal layers then you can run them through the cells themselves. You just have to avoid the vias that take the inputs and outputs down to the transistors. You have an additional gain if you flip every other row of cells so that the PMOS of two rows have the Vdd rail overlap and the NMOS of two rows have the ground rail overlap.
This is so cool! "Dissecting" a processor like this could be a fun educational activity to do in schools similar to dissecting a frog, but without the animal rights issues.
Personally, I think everyone should try opening up a chip. It's easy (if the chip isn't in epoxy) and fun to look inside. You need a metallurgical microscope to examine the chip closely, but you can see interesting features even with the naked eye.
I didn't know there is such a thing as a metallurgical microscope. What makes them different from biological microscopes? And what is their primary purpose? I am assuming they don't make microscopes just for dissecting chips.
A regular biological microscope shines the light from below. This is good for looking at cells, but not so useful when looking at something opaque. A metallurgical microscope shines light from above, through the lens. They are used for examining metal samples, rocks, and other opaque things.
An external light works for something like an inspection microscope. But as you increase the magnification, you need something like a metallurgical microscope that focuses the light where you are looking. Otherwise, the image gets dimmer and dimmer as you zoom in.
In some places, you've shown the same part of the circuit both with and without the metal layers. How did you find the same location on the die after taking the die out of the microscope, removing the additional layers and putting it back?
I figured that I would want to study the standard-cell circuits, so I made a detailed panorama of one column of standard-cell circuits with the metal. Then after removing the metal, I made a second panorama of the same column. This made it easy to flip back and forth. (Of course, it would be nice to have a detailed panorama of the entire chip, but it would take way too long.)
Biological microscopes illuminate the sample from below, as the samples are typically transparent. Metallurgical microscopes illuminate reflective samples from above.
*"Below" meaning "on the opposite side from the objective" - you illuminate _through_ the sample.
Metallurgical microscopes illuminate the sample "from the top side". The actual implementation even goes as far as making sure the illumination happens on the optical axis of the objective (as if the light was emitted from your eyes/camera, reflected from the sample and then seen by your eyes/camera). They are also called reflected light or epi-illumination microscopes.
Biological microscopes, on the other hand, illuminate the sample from the back side (which doesn't work for fully opaque objects).
Discarded RFID cards and the like provide a practically free source of minimally-encapsulated ICs, also often made on an old large process that's amenable to microscope examination.
Having looked at a few RFID cards, there are a couple of problems. First, the dies are very, very small (the size of a grain of salt) so they are hard to manipulate and easy to lose. Second, the die is glued onto the antenna with gunk that obstructs most of the die. You can burn it off or dissolve it with sulfuric acid, but I haven't had success with more pleasant solvents.
Decapping a processor produces toxic waste, which has to be disposed of. Processors, properly handled, last a lot longer than frogs, and can be re-used again and again: to a first approximation, processors do not wear out. I would expect that manufacturing a new processor causes more suffering to more frogs than is caused by killing a frog for dissection.
That said: we have video players in our pockets. Sure, dissecting one frog might be a more educational experience than watching somebody else dissect a frog, but is it more educational than watching 20 well-narrated dissections? I suspect not. I don't think we need to do either.