
The Mill CPU Architecture – Threading [video] - Avi-D-coder
https://www.youtube.com/watch?v=7KQnrOEoWEY
======
phaedrus
Lately I've been thinking that something inspired by the Belt idea could be
applied to making an innovative homebrew 4-bit TTL CPU. The concept is, feed
each of the 4 output bits from a 74181 ALU into a separate chain of 74HCT595
shift registers. (Instead of an implicit accumulator, every machine code
command would be an implicit push-result-into-belt.) Taking a slice of each
parallel data out pin across the shift register lanes, you get a "belt" slot.
I.e. the 4 shift register chips' d0's give you ALU out from time t-0, the 4
d1's give you ALU out from t-1, etc. The effect is that instead of an
accumulator register, you have 32 bits of ALU output _history_ (or 64 bits or
more if you chain more than one 74595 for each bit). Then, having created an
ersatz very wide demux, connect all 4x8 or 4x16 lines to a group of 74151
muxes to get back down to 4-bit register bus values - with the capability that
different groups of muxes can independently point to any nibble in the
history-memory.

Although the same end result could be accomplished without going through a
4x16 "bit matrix" or "crossbar", the setup has some nice properties,
particularly for a hobbyist TTL CPU:

* Generating 16 bit address lines and 8 or 16 bit memory-data IO lines could be done by grabbing 2 or 4 nibbles at a time.

* You could drive LEDs or a (decoded) hex 7-segment display directly from the bit-matrix lines to see all N history values.

* If you have at least two mux units, the "A" and "B" inputs of the ALU chip could be pointed at different history values.

* The two mux units could be ganged to feed the wider RAM-address and RAM-data registers.

* When doing math or logic on 8, 16, etc. bit values, you wouldn't have to change the register selectors (mux addresses) to change nibbles: the act of pushing the result moves the next operands into position under the "tape head".

That last point means that a fairly simple TTL circuit could flexibly support
4,8,12,16,...,64 bit ALU ops (provided enough 74HCT595s were connected to
provide 2x that many bits total). Just set up the initial data and operation,
load a TTL counter with the desired number of nibbles, and let 'er rip at the
max speed the 74181 can handle.

I call this "The Suspenders" CPU architecture.
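For anyone who wants to play with the idea before wiring chips, the scheme is easy to simulate. Here's a toy Python model of it (the class, slot numbers, and the 16-bit add routine are my own illustration, not part of any real design): four '595 lanes collapse into a list of nibbles, every result is implicitly pushed, and a multi-nibble add runs with both mux addresses held constant.

```python
BELT_DEPTH = 8  # one 74HCT595 per ALU output bit -> 8 nibbles of history

class SuspendersBelt:
    def __init__(self):
        # belt[0] is the ALU result from t-0, belt[1] from t-1, ...
        self.belt = [0] * BELT_DEPTH

    def push(self, nibble):
        # Clocking the '595 chains: every ALU result is implicitly pushed.
        self.belt = [nibble & 0xF] + self.belt[:-1]

    def mux(self, slot):
        # A group of 74151 muxes pointed at one nibble of history.
        return self.belt[slot]

def add16(belt, a, b):
    """16-bit add out of 4-bit ALU ops with *fixed* mux addresses."""
    # Pre-load operands low-nibble-first: a0..a3, then b0..b3.
    for x in (a, b):
        for i in range(4):
            belt.push((x >> (4 * i)) & 0xF)
    MUX_A, MUX_B = 7, 3          # selectors never change during the loop
    carry = 0
    for _ in range(4):           # "load a counter ... let 'er rip"
        s = belt.mux(MUX_A) + belt.mux(MUX_B) + carry
        carry = s >> 4
        belt.push(s)             # pushing moves the next nibbles into place
    # Result nibbles r0..r3 now sit at slots 3..0.
    return sum(belt.mux(3 - i) << (4 * i) for i in range(4))

print(hex(add16(SuspendersBelt(), 0x1234, 0x0FFF)))  # -> 0x2233
```

The trick is visible in the loop: each push shifts every stored nibble down one slot, so slot 7 sees a0, a1, a2, a3 on successive iterations while the selector bits stay frozen.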

~~~
ChuckMcM
This is the kind of experiment that's fun to run on a small FPGA evaluation
board (or a larger CPLD).

------
saosebastiao
I've been thinking for a while about both Itanium and the Mill, both of which
I feel are pretty big advancements in the state of the art for computer
architectures. Itanium was a complete flop and the Mill is yet to be seen.

Explaining success or failure is always fraught with peril and prone to
oversimplification, but I can't help but think that Intel massively
fucked up one thing and one thing only: they targeted the enterprise market.
Not just the commodity data center market, but the _enterprise_ enterprise
market, the we-only-buy-IBM hyperconservative and slow to move market. The
type that still runs COBOL on mainframes (and still calls them mainframes)
because it works and _recompiling_ isn't even close to an option. Nobody in
this market wants a nominal increase in computing power if it comes at the
expense of backwards incompatibility.

They couldn't have seen it beforehand because smartphones weren't a thing, but
a few years after launch they had their ideal target market right in front of
them. Smartphone manufacturers will do _anything_ for an incremental
improvement in power efficiency. They'll take that improvement and exploit
every ounce of it both upmarket in high end phones and downmarket in android
burners for 3rd world countries. And people go through phones like
crazy...nobody is running software on their phones that is more than 1-2 years
old, and most OSes and apps have been updated at least once in the last 3
months. A recompile and migration to a new architecture for this market isn't
even 1% of the hurdle that enterprise software was.

I hope that if the Mill makes it to market, they get the market right.
I'd hate to see another innovation get the shaft because of something as dumb
as some marketing decisions.

~~~
rogerbinns
Your view of what happened with Itanium isn't what really happened. There is
a wonderful talk by Bob Colwell (architect of the P6 / Pentium Pro) given for
the Stanford EE380 course in 2004 titled "Things cpu architects need to think
about". Sadly it looks like all online copies have gone. I very highly
recommend the whole talk if you can find it.

Itanium was supposed to take over everything, not "enterprise". Amusingly its
performance projections were based on 36 hand-coded instructions from a
representative inner loop in SPEC, and management went ahead based on that.
Even though they would leapfrog x86 in theory, in practice x86 did a steady
march in performance improvements (helped by Intel's fabs). As Itanium got
late, rather than cancel the project, they decided x86 was for the masses and
Itanium for enterprise.

I really like that they are trying Mill, but suspect RISC-V is going to soak
up the dollars and attention.

------
chubot
I got excited about Mill a couple months ago after watching some videos (I
finally understood a little bit, after seeing their materials pop up for
years). It's refreshing to see a design that crosses hardware/software
boundaries rather than just hacking on one side of the fence.

But then I noticed that Mill is not an ISA but a family of ISAs? (Small,
Medium, Large or something like that) They are hacking LLVM so that it knows
about all of the ISAs.

But doesn't this cause a problem for say JIT compilers? (JVM, every major
JavaScript engine) Every single JIT compiler has to know about 4 ISAs? I get
that they are similar, but that seems onerous. Debugging tools have to know
about them too. Even strace is coupled to the ISA. I think the costs may have
been underestimated.

Anything that changes the hardware/software boundary is already risky because
you have to change two things at once. But if you're going to do that, I would
think it should be a single stable interface?

In other words I think the coupling between their own compiler tech and the
hardware is too close. Not everything is a portable C program. There are
still people running Fortran, not to mention non-LLVM-based compilers like Go.

~~~
infogulch
> Every single JIT compiler has to know about 4 ISAs?

Actually it's way worse, but much better.

Better first: Typically, binary programs targeting the mill don't target a
specific machine; they target a hypothetical "general mill" machine code
called GenAsm [0]. GenAsm makes some generous assumptions about the hardware,
like an infinite belt and the presence of every machine instruction. Included
with each
machine is the Specializer [1], a program that takes GenAsm and converts it
down to the _specialized_ binary encoding for this specific machine; think of
it like a linker. This includes translating infinite belt semantics into
finite belts, polyfilling any missing machine instructions with microcode,
etc, etc. This process is very fast and the OS can cache the result. JITs can
use an API to convert generated GenAsm into runnable machine code which
includes running it through the Specializer. The specializer is built along
with all the other tooling based on the specification that Ivan mentioned
briefly.

Now for worse: because mill machines are specification-driven, there could be
_many_ more than just 4 ISAs. There could be more ISAs than they have
customers, depending on the needs in each case. But it's no big deal
because everything targets GenAsm and the machine code differences will be
specialized away.
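As a rough sketch of what specialization could look like (the opcode names, spec format, and polyfill table below are invented for illustration; only the overall shape -- one generic IR, a per-machine spec, a mechanical lowering pass -- comes from the talks):

```python
# Toy "specializer" in the spirit of the Mill's GenAsm lowering.

GENERIC_PROGRAM = [
    ("add",  0, 1),   # operands are belt positions (0 = newest)
    ("mul",  0, 2),
    ("rotl", 0, 3),   # suppose some target lacks a rotate instruction
]

# Per-machine specification: belt length and native instruction set.
GOLD_SPEC   = {"belt_len": 32, "native": {"add", "mul", "rotl"}}
COPPER_SPEC = {"belt_len": 8,  "native": {"add", "mul"}}  # no rotl

# Hypothetical microcode-style polyfills for missing instructions.
POLYFILLS = {
    "rotl": [("shl", 0, 1), ("shru", 1, 2), ("or", 0, 1)],
}

def specialize(genasm, spec):
    """Lower generic code to one concrete machine: polyfill missing ops,
    and (simplistically) fill values the finite belt can't reach."""
    out = []
    for op, *args in genasm:
        if any(a >= spec["belt_len"] for a in args):
            out.append(("fill", max(args)))  # bring a spilled value back
        if op in spec["native"]:
            out.append((op, *args))
        else:
            out.extend(POLYFILLS[op])
    return out

# On the big machine the program passes through unchanged; on the small
# one, rotl is replaced by the shift/or sequence.
print(specialize(GENERIC_PROGRAM, COPPER_SPEC))
```

The real specializer of course deals with encoding, scheduling, and belt renaming, but the point stands: the per-machine differences are absorbed in one fast, cacheable pass rather than in every compiler and JIT.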

I'm pretty sure it's the Specification talk [2] that goes into the most detail
about this.

[0]:
[http://millcomputing.com/wiki/GenAsm_(code_representation)](http://millcomputing.com/wiki/GenAsm_\(code_representation\))

[1]:
[http://millcomputing.com/wiki/Specializer](http://millcomputing.com/wiki/Specializer)

[2]:
[https://millcomputing.com/docs/specification/](https://millcomputing.com/docs/specification/)

~~~
chubot
_There could be more ISAs than they have customers, depending on the needs in
each case. But it's no big deal because everything targets GenAsm and the
machine code differences will be specialized away_

Yeah so this is the point I'm quibbling with. I have no doubt it's technically
possible. I'm saying that it will hinder adoption, and they're probably
underestimating the diversity of software components that generate native
code, and underestimating the cost of modifying all those components.

Something like Xen succeeded because it was _designed up front_ to be a trivial
modification to kernels -- i.e. paravirtualization.

This sounds like a whole new architectural element. They're not only changing
the interface between the CPU and the compiler; they're also changing the
relationship between the kernel and the CPU (aside from there being a
different ISA.)

It's good to test assumptions, and I wish them luck. But after being somewhat
excited about it, I feel it's just too ambitious. I'd love to be proven wrong
though.

~~~
infogulch
To be clear, I'm just a random that's been following the mill project for a
while, so please don't take my answers as gospel.

The mill is a new, novel, ISA, which will require compilers to support it as a
target. There's no getting around that. But once the tooling is ready, typical
programs written in high level languages like C (i.e. excluding inline
assembly and architecture-specific assumptions) will be a compiler flag away
from being able to distribute binaries that run on all mill chips.

If it's the specializer you're concerned about, it's intended to be very
transparent to the typical user, integrated into the system. Most users are
completely unaware that a thing called the linker even exists; this should be
similar. By the way, just building the spec (which defines belt size, available
instructions, etc) generates a fully functioning specializer for that machine.
I'm pretty sure it does that _today_.

------
__s
Curious how applicable the Mill is for real-time use cases.

The other end is that, with the instruction set not being binary stable, I'm
curious how well the Mill would be useful for something like Singularity
[https://en.wikipedia.org/wiki/Singularity_%28operating_syste...](https://en.wikipedia.org/wiki/Singularity_%28operating_system%29)
or a hypothetical WebAssembly OS where userspace programs are an IR for the OS
to compile. IIRC the Mill is supposed to have its own IR for program
portability.

Binary translation viability is key if they want to support Windows -- see
current Windows 10 for ARM

~~~
SAI_Peregrinus
Around 4:30 in the linked video: it's statically scheduled, in-order, and all
opcodes have fixed execution latency. Should be good for real time.

~~~
ema
There is still variable timing depending on whether the needed data happens to
be in cache or not. Not sure how hard it is to code in such a way that it is
predictable what data is gonna be in cache at which times.

~~~
snuxoll
DRAM access is always going to be variable by nature, though they reduce the
problem somewhat by never reordering loads and stores. Stores will always
write back to D$1 cache and only hit DRAM when a line needs to be flushed from
the last level cache, so assuming your data and code fits in cache you can
theoretically have 100% determinism (although all of the specific latency
numbers are model-dependent, so just like a traditional DSP you'll have to
tune for each target).
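To make the determinism point concrete: once every op has a fixed, model-specific latency and nothing is reordered, the worst-case timing of a straight-line block that stays in cache is literally a sum. (The latency numbers below are invented for illustration, not real Mill figures.)

```python
# Static worst-case timing under fixed, model-specific op latencies.
# With no reordering and all accesses hitting cache, the cycle count
# of a straight-line block is a simple sum -- fully deterministic.
LATENCY = {"add": 1, "mul": 3, "load_hit": 3}  # illustrative numbers only

block = ["load_hit", "mul", "add", "add"]
cycles = sum(LATENCY[op] for op in block)
print(cycles)  # -> 8
```

This is exactly the kind of arithmetic a hard real-time engineer wants to be able to do on paper, and what dynamic reordering and variable-latency ops take away on a conventional out-of-order core.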

~~~
SAI_Peregrinus
It's also worth noting that hard real-time systems tend to be custom designed,
and often for high-budget products like medical devices or test equipment. So
it is probably possible to add a bunch of SRAM to the board, and guaranteeing
latency to that memory is very easy.

------
Quequau
I've only been paying a little attention to this but it does seem pretty
interesting and maybe even promising.

Have they made any indications about production time lines recently?

~~~
petermonsson
Don’t get your hopes up. The first video is 4 years old.

~~~
snuxoll
To be fair, these guys aren't Intel or even AMD or ARM - they're a small team
with limited funding. I'm not going to be shocked if Mill themselves never
release production silicon, but worst case their novel ideas will only be
under patent lock-and-key for a period of time - someone else will have the
opportunity to make use of them eventually.

With that said, I'd love to see these things come off a fab line someday -
there's a lot of potential in the ideas behind the Mill architecture; whether
they'll pan out or not remains to be seen, but if they fail I'd rather see
another Itanic than see the Mill never make it to market in the first place.

------
monk_e_boy
Is there a video that explains the Mill CPU in detail but isn't 15 hours long?
I could go for 2 or 3 hours.

~~~
ithkuil
FWIW I wholeheartedly recommend watching all the material about the mill CPU;
I found it deeply refreshing and illuminating. (disclaimer: I'm a software
engineer, with basic knowledge about hw)

~~~
monk_e_boy
I've watched a few. But I don't need 15 hours of info, 3 is fine :) I'm not
THAT interested (yet!)

------
lasermike026
Do not meddle in the affairs of wizards, for they are subtle and quick to
anger.

~~~
tromp
I remember a variation ending with "for you are crunchy and good with ketchup"

~~~
teddyh
That’s dragons, not wizards.

------
justin_vanw
Why are we talking about this forever? Can't someone just make one of these,
demonstrate that they don't work well in practice, and then we can stop
wasting time on it?

~~~
CarVac
They are only talking as quickly as they can get patents filed. Once the
patents are in place then they'll get to taping out, I presume.

~~~
justin_vanw
Filing a patent takes about 2 months, they've been on this nonsense for 4
years or so.

Edit: 14 years...

~~~
kinghajj
IIRC, they had something on the order of ~50 patents, which they only started
on 3-4 years ago. The first 10 years of the project were entirely spent on
_thinking up_ the new, patentable ideas that go into the thing. As Ivan
says, you can't put a schedule on insights--they happen when they happen.

~~~
rcxdude
Not to mention this is still basically a spare time project for all of them.

~~~
justin_vanw
Well I guess I'll wait another 14 years before I comment again. I'm sure
you'll all still be here pretending this is a real thing.

