
LLVM Meets the Truly Alien: Mill CPU Architecture [video] - pohl
https://www.youtube.com/watch?v=QyzlEYspqTI
======
state
It's so unusual to find projects like this. I have so much respect for these
guys. It's always fun to see what they're working on. Lots of the technical
specifics are over my head, but I really appreciate their tone and approach.

~~~
igravious
Previous discussion (with a member of their team chiming in):
[https://news.ycombinator.com/item?id=9856334](https://news.ycombinator.com/item?id=9856334)

Seems like a _fascinating_ architecture. I don't get the double program
counter bit; it also seems like too much cognitive overhead to me. But the
belt (a queue of unnamed slots), and free function calls with multiple return
results so that regular ops and user-defined functions work the same way,
that sure is clever.

Godard mentions that the CPU space has seen little to no innovation. But what
about Transmeta? They were doing some cool stuff, no? And the DEC Alpha, that
was super interesting. Probably lots more besides. I'm sure if I bothered to
check that Wikipedia would have a CPU graveyard page :) Thing is, x86 has been
the only game in town for a long time with Motorola challenging in the past
and ARM challenging now. Maybe time for a three horse race?

~~~
TheOtherHobbes
Alpha was amazing, and the DEC chip people were best-in-the-world talented. If
DEC had made some smarter choices about personal computing we might have had
10 or 20GHz desktops now.

Otherwise, CPU design is almost the poster-child for technical debt. The
problems are more social and economic than technical. Once an architecture
dominates the market it's almost impossible to replace it with something
better. The best hope is to do what ARM are doing and sneak up on the market
sideways, starting from a segment with much less competition.

~~~
Scaevolus
DEC would be able to violate the laws of physics?

The GHz barrier isn't caused by lack of talent, it's caused by basic thermal
limits of silicon chips.

~~~
rancur
maybe they were focused on fabbing from germanium ingots?

>germanium ingots

100 GHz vs the ~3 GHz limit on silicon

(which is the reason why we sometimes use strained silicon-germanium)

------
Someone
As to finding instruction boundaries in a variable-length instruction
format: you could reuse the trick they use to start two instruction streams
from a single address. Run a stream of fixed-size instruction-length fields
downward from that address, and use it to find the boundaries in the stream
of variable-length instructions running in the ’normal’ direction.

You could have:

    
    
      offset  : -----000000000111111
      offset  : 54321012345678901234
      contents: 64212iijkkllllmmmmmm (digits are instruction lengths, letters are instructions)
    

So, reading offsets -1, -2, -3, etc. you read off the sizes of the
instructions you can read in offsets 1, 2, 3, etc. If you have 4 decode units,
you can replace the sizes of the instructions by their offsets, resetting the
offset every 4th instruction.

With instruction sizes of 1-4 bytes, you could store instruction sizes in 2
bits per instruction. That would lower the pressure on your cache for the
extra information you store. There also would be more space in your
instruction set because you no longer would have to worry about creating an
instruction that has a prefix that also is a valid instruction. For example,
you could use 0x01 both for INC and ADD IMMEDIATE VALUE, where it means the
former if the instruction length is 1, the latter if it is longer, with the
size of the immediate determined by the size of the instruction.

Would that be a net win? I wouldn’t know, but it seems simpler (but ’simpler’
looks like a negative in their worldview) than creating those two streams that
somehow must work together to do what a single statement sequence in the
source code does.
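For concreteness, here's a toy Python sketch of the idea (entirely my own illustration, not anything from the talk): the size stream tells the decoder where each variable-length instruction begins, and sizes 1-4 can be packed two bits apiece.

```python
def decode(sizes, body):
    """Split `body` into instructions using the separate size stream.

    sizes[i] is the byte length of the (i+1)-th instruction, i.e. the
    value you'd read at offset -(i+1) in the reversed stream above.
    """
    out, pos = [], 0
    for size in sizes:
        out.append(body[pos:pos + size])
        pos += size
    return out

def pack_sizes(sizes):
    """Pack sizes 1-4 into 2 bits each (stored value = size - 1)."""
    word = 0
    for i, s in enumerate(sizes):
        word |= (s - 1) << (2 * i)
    return word

def unpack_sizes(word, n):
    """Recover n sizes from the packed 2-bit fields."""
    return [((word >> (2 * i)) & 0b11) + 1 for i in range(n)]

# Mirrors the ASCII diagram: reading offsets -1, -2, ... gives 2, 1, 2, 4, 6.
print(decode([2, 1, 2, 4, 6], "iijkkllllmmmmmm"))
# ['ii', 'j', 'kk', 'llll', 'mmmmmm']
```
(The 2-bit packing only covers sizes 1-4, as in the comment; the diagram's 6-byte instruction would need a wider field.)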

~~~
igravious
I don't really understand your solution, but then again I did not understand
the Mill's solution. Can you explain it again?

All I will say is that they seem to have really tried to simplify and
regularize the Mill _except_ in the opcode stream / decode step, where they
say that two streams are optimal! Seems like it would be a nightmare and
introduce all sorts of complexities. It was the only bit of the talk I didn't
grok, whereas most other bits had me nodding my head and thinking "oh, neat".

~~~
igodard
(Mill team) The streams are not streams of instructions; they are streams of
half-instructions. The Mill is a wide-issue machine, like a VLIW or EPIC; each
instruction can contain many operations, which all issue together. Each
instruction is split roughly in half, with some of the ops in one half and
some in the other. On the Mill, the split is based on the kind of operation
and the info that it needs to encode: all memory and control flow ops on one
side, all arithmetic on the other, although other divisions are possible.

Each half is grouped with the same half of other instructions to make a
stream, and the two streams decode together, step by step, so each cycle one
instruction, comprising both halves, decodes and issues.

The result is to double the available top level instruction cache, and cut in
half the work that has to be handled by each of the two decoders.
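A rough Python model of that split (the op categories and names here are my guesses for illustration, not Mill specifics):

```python
# Hypothetical split criterion: memory/control ops on one side, the rest
# (arithmetic) on the other.
FLOW_OPS = {"load", "store", "br", "call"}

def split_streams(instructions):
    """Split each wide instruction into a memory/control half and an
    arithmetic half, accumulating like halves into two streams."""
    flow, exu = [], []
    for ops in instructions:        # one wide instruction = a list of ops
        flow.append([op for op in ops if op in FLOW_OPS])
        exu.append([op for op in ops if op not in FLOW_OPS])
    return flow, exu

def issue(flow, exu):
    """Decode step by step: each cycle one half from each stream pairs up,
    and the full wide instruction issues together."""
    return [f + e for f, e in zip(flow, exu)]

flow, exu = split_streams([["load", "add"], ["br", "mul", "sub"]])
print(issue(flow, exu))  # the original wide instructions, reassembled
```
Each decoder only ever sees one kind of half, which is where the claimed halving of per-decoder work comes from.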

------
aidenn0
Does anybody know The Mill's plan for commercialization?

~~~
analognoise
Generate buzz, get bought, never produce anything.

~~~
kjs3
You forgot "have a legion of sycophants/sockpuppets that systematically
downmod any post that's less than a furious jerkoff over how amazing Mill is".

~~~
chc
It's a middlebrow dismissal. It _should_ be downvoted. That's not sycophancy,
it's just Hacker News' losing battle to keep the conversation stimulating and
worthwhile.

~~~
kjs3
I guess if your definition of "stimulating and worthwhile" is "uncritical
fawning", then I can understand your support of downmodding anything critical.
Thanks for the confirmation of my original suspicion.

~~~
analognoise
I agree - an entirely new architecture with nothing to speak for it (no FPGA
code to try it out, no production chips, just paper this, paper that). I'm
much more impressed by RISC-V - it exists in a form you can actually use.

~~~
kjs3
Yup...the RISC-V folks have a working, open source, ~1GHz processor. That runs
real software. Today.

But what do they know?

Mill has...videos. And patent applications. A strong social media presence.
And a simulator that they really, really promise you'll be able to use _real
soon now_ and is going to _totally_ own everything. And they're going to fab
the chips themselves, long after everyone but Intel figured out that was
stupid. Or something.

Because that sort of "we have the off-the-wall architecture that will change
_fucking everything_ in computing forever" attitude worked so well for the
iAPX432, Transputer, Rekursiv, i860, everything Chuck Moore has ever done,
TI9900, most (not all) things called VLIW, etc. Right?

Best of all? A completely uncritical, fawning audience on HN.

------
luckydude
The Mill CPU has been going on and on about how it's going to be awesome some
day. I spoke with one of their guys at Greg Chesson's wake; it seemed open
ended.

Yup, you have good ideas. How about you ship? Then we can see if the ideas
meet reality and work.

I'm friends with one of the guys who did the P6, which became the basis for
pretty much every x86 chip since. He shipped stuff.

Ideas are easy, execution is hard, I have yet to see the Mill people execute.

I'd be crazy happy if they stepped up and made me look stupid because they
shipped a kick ass chip.

~~~
uxcn
As I understand, they aren't seeking venture capital backing, so progress is
slow. I know they have (had) plans to implement a Mill on an FPGA, but even
going to a usable ASIC will probably be a challenge.

It's a major architectural change as well. Most existing systems rely on
virtual memory, which the Mill doesn't support.

~~~
adwn
> _Most existing systems rely on virtual memory, which the Mill doesn't
> support._

I believe you're mistaken. The Mill _does_ support virtual memory; the main
differences to most systems are that its caches use virtual addresses (the TLB
is between the L2 cache and the DRAM), and that it's a single-address-space
architecture.

~~~
aidenn0
ARM did that with the L1 cache in the '90s and it was _terrible_ to target a
VM OS to. Cache aliasing is a nightmare and a never-ending source of bugs for
e.g. shared memory.
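A toy model of the aliasing problem (numbers are mine, not any real ARM part): when a virtually-indexed cache's index field extends above the page-offset bits, two virtual mappings of one physical page can land in different sets, so writes through one alias aren't visible through the other.

```python
PAGE_SHIFT = 12    # 4 KiB pages: bits 0-11 are the page offset
LINE_SHIFT = 5     # 32-byte cache lines
SETS = 512         # 16 KiB direct-mapped cache: set index uses bits 5-13

def cache_set(vaddr):
    """Set index computed from the *virtual* address, as in a
    virtually-indexed cache."""
    return (vaddr >> LINE_SHIFT) % SETS

# Two virtual addresses that differ only above the page offset, so the OS
# could legally map both to the same physical page (shared memory):
va1, va2 = 0x2000, 0x3000
print(cache_set(va1), cache_set(va2))  # different sets -> stale aliases
```
The same physical bytes can now sit in two cache lines at once, which is why aliasing bugs with shared memory are so hard to chase down.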

~~~
mike_hearn
It's very likely that they will need to write their own OS from scratch for
the Mill. They might be able to reuse a lot of code from Linux, but it won't
_be_ Linux, as the whole ethos of the Mill is to rethink everything.

In particular the Mill does not have the same notion of a process, the CPU
itself is aware of threading, it has its own security design that isn't
anything like user/supervisor mode, there are no syscalls, and programs have
to be specialised before they can be run, a la ART on Android.

I suspect once they have LLVM up and running, one of the next biggest wins
they'll want to go for is porting HotSpot. The only other company I'm aware of
that was able to actually make money selling a really crazy custom CPU
architecture is Azul with their Vega arch, and they did it by making Java
bytecode their exposed bytecode. The actual CPU opcodes were an implementation
detail and everything compiled on the fly.

The nice thing about porting the JVM is tons of business-critical, performance
sensitive code would Just Work™. And then you have a ready made market for
companies with huge Java (e.g. Hadoop) jobs that are very power/performance
tradeoff sensitive.

~~~
dogma1138
Is VEGA an actual discrete CPU architecture or something they've
licensed/copied from existing chips?

Azul made their money from selling high-performance JVMs. They have their
"VEGA" compute appliances for Java applications, but I've never actually seen
ISA documentation for the hardware, which would be a very interesting thing
to see....

~~~
mike_hearn
It's apparently entirely custom. There are some tech talks about it online.

------
quux
I'm a little confused about how the "pointerness" of pointers is getting lost
in LLVM. The Mill can't be the only architecture that encodes extra data in
the high bits of pointers, right? The 64-bit Objective-C runtime and the Xbox
360 come to mind as systems where this is done. Clearly LLVM can generate good
code for 64-bit Objective-C... Am I missing something?

------
nly
Linking to a 2.3 GB file on HN is perhaps not the best move.

~~~
iso8859-1
I put it on YouTube:
[https://www.youtube.com/watch?v=QyzlEYspqTI](https://www.youtube.com/watch?v=QyzlEYspqTI)

EDIT: Updated with complete video link

~~~
nly
Why is it 7 minutes short?

~~~
iso8859-1
Seems my download was truncated at 2485374429 bytes, a mere 137934 over the 2
GB mark for some reason. I'll fix it and post the new link. EDIT: The link was
updated.

~~~
dang
Please do. If there's a complete copy on YouTube we'll change the post to that
URL.

Edit: I've changed the URL to the truncated one temporarily, because even a
truncated most-of is probably better than a large file download.
[http://llvm.org/devmtg/2015-04/Videos/SD/day%202/Ivan_1.mp4](http://llvm.org/devmtg/2015-04/Videos/SD/day%202/Ivan_1.mp4)
is a URL for the whole thing.

~~~
iso8859-1
Here it is, dang:
[https://www.youtube.com/watch?v=QyzlEYspqTI](https://www.youtube.com/watch?v=QyzlEYspqTI)

~~~
dang
Yay! Thanks, updated. I hope to watch this later.

------
faragon
Those CPUs look like they have huge potential, but there is no really working
compiler for them. In my opinion, they could try to engage Fabrice Bellard,
the genius behind QEMU and TCC, or people with experience on the Mali GPU
architecture, which shares many points with the Mill CPU approach.

~~~
TazeTSchnitzel
They'll get there. They aren't burning through money, they don't have any:
they're not a properly formed company, just volunteers. It'll get done when it
gets done.

~~~
listic
> they're not a properly formed company

They aren't? Really?

~~~
wolf550e
For a long time the founders didn't have employees and didn't draw a paycheck.

