ARM Releases Machine Readable Architecture Specification (alastairreid.github.io)
167 points by walterbell on Apr 22, 2017 | 18 comments

I work at a chip company. All of our register definitions and other specifications are created in a machine-readable but human-friendly YAML format.

The specs drive scripts which generate the hardware, test interfaces, C headers for software and the PDF manual we give to our customers. I assume everyone is doing this: how else can you guarantee thousands of pages of documentation and a large API match the product perfectly?
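
A minimal sketch of what such a generator script might look like, assuming a made-up register layout: the block name `UART`, the register and field names, and the `generate_header` helper are all hypothetical, and the spec is an inline Python structure standing in for a parsed YAML file.

```python
# Hypothetical register spec. In a real flow this would be loaded from a
# YAML file; it is inlined here so the sketch has no dependencies.
REGISTERS = [
    {"name": "CTRL", "offset": 0x00, "fields": [
        {"name": "ENABLE", "lsb": 0, "width": 1},
        {"name": "MODE", "lsb": 1, "width": 3},
    ]},
    {"name": "STATUS", "offset": 0x04, "fields": [
        {"name": "BUSY", "lsb": 0, "width": 1},
    ]},
]


def generate_header(block_name, registers):
    """Render a C header with offset, shift and mask macros for each register."""
    guard = f"{block_name}_REGS_H"
    lines = [f"#ifndef {guard}", f"#define {guard}", ""]
    for reg in registers:
        prefix = f"{block_name}_{reg['name']}"
        lines.append(f"#define {prefix}_OFFSET 0x{reg['offset']:02X}u")
        for field in reg["fields"]:
            # Mask covers `width` bits starting at bit position `lsb`.
            mask = ((1 << field["width"]) - 1) << field["lsb"]
            fname = f"{prefix}_{field['name']}"
            lines.append(f"#define {fname}_SHIFT {field['lsb']}u")
            lines.append(f"#define {fname}_MASK 0x{mask:08X}u")
        lines.append("")
    lines.append(f"#endif /* {guard} */")
    return "\n".join(lines)


header = generate_header("UART", REGISTERS)
print(header)
```

The same `REGISTERS` structure could feed other generators (documentation tables, test benches), which is the point of keeping a single machine-readable source.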

What's new is ARM is giving that internal specification to customers. Makes a lot of sense given their customers are also chip companies with similar practices. I think it's a good idea and everyone should do it.

> What's new is ARM is giving that internal specification to customers.

Not just customers but everyone, judging by the fact that I could just click the links in the post and download the files.

Yes, ARM's architecture licensees (companies allowed to design their own ARM-compatible processors) have had access to the v8-A specs for years. The new bit is making it available so that researchers, companies and individuals can use it in software projects.

This is a lot more than register specifications. It is a specification of the complete functionality of the architecture, written in a format that allows you to generate simulators, do formal verification of RTL code, or probably even generate synthesizable RTL code as well.

Probably best not to get too excited about generating hardware from the specs... That would compete with ARM's main source of revenue and my reading of the license is that that is the one thing you cannot do with the spec. But I am not a lawyer.

Or, on a more technical note, my feeling is that if that is what you wanted to do, then you would not start from the style of specification ARM has released.

Licensing aside, it is still pretty bold for a fabless, IP-only company like ARM to release a specification this detailed, unlike e.g. Intel or AMD, who could still have an advantage even if others made competing implementations, because of their fab technology.

I'd bet there already are clones/semi-clones of ARMs hidden in various obscure far-East ICs which have no user-accessible firmware... we just don't hear about them, because the ones we do hear about would've been sued out of existence. I've seen presumably-unlicensed MIPS cores a lot in those.

I repressed dreams of this so many times over the years that I'm tearing up a little.

This is pretty insanely awesome. One of the things this should enable would be a transpiler that could compile the spec into an LLVM or GCC backend automatically. Pretty cool to design a new architecture and then automatically get compilers for a bunch of programming languages for it.

Now if we had a peripheral specification like this, we could auto-generate code against a given HAL, and that would pretty much make the instruction set architecture completely irrelevant to systems folks building things.

I don't see how you could generate a compiler backend from this -- the spec/pseudocode give you the "this instruction has these effects and side effects" mapping, but you can't easily reverse that to get the "if I want this effect then I need this instruction sequence" mapping which a compiler backend uses. Am I missing something?

As a gross simplification, a compiler is just a very complex pattern-matcher: it turns sequences of characters in source code into a series of machine instructions by matching the operations they perform. This spec gives part of that mapping, but it's probably quite sufficient: think about how a human can write Asm after reading the docs, and the algorithm (s)he uses to do that.

To give a concrete example, suppose you want to compile "c = a + b;". You would realise this is an addition operation, then look at the sizes of the variables and consider where they are/will be. You then search for instructions in the instruction set which can perform addition. Maybe the source operands are in memory and the only addition instructions operate on registers (like ARM) in which case you would then have to solve the sub-problem of getting them into a register, by searching for more instructions which do that; or perhaps (like x86) only one of the operands need be in a register, so you only need to solve that sub-problem once. After you've chosen an instruction, update the machine state and continue with the next operation. Etc. etc. A compiler uses a similar algorithm, and could be driven by this spec. It's not in a good form for efficient compilation (e.g. indexing by operation(s) performed would be far better), but it contains the necessary mapping.
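
The search described above can be sketched roughly as follows, assuming a tiny hypothetical load/store target. The instruction table, the `effects` tags, the register names and the pseudo-assembly syntax are all invented for illustration; in particular the tags are hand-written here, whereas with the real spec they would have to be derived from the instruction pseudocode.

```python
# Hypothetical instruction table: each entry names the operation performed
# and any extra effects beyond it. Nothing here is taken from the ARM spec.
INSTRUCTIONS = [
    {"mnemonic": "ADDS", "op": "add", "effects": {"sets_flags"}},
    {"mnemonic": "QADD", "op": "add", "effects": {"saturates"}},
    {"mnemonic": "ADD", "op": "add", "effects": set()},
    {"mnemonic": "LDR", "op": "load", "effects": set()},
    {"mnemonic": "STR", "op": "store", "effects": set()},
]


def select(op):
    """Search the table for an instruction that performs `op` and nothing else."""
    for insn in INSTRUCTIONS:
        if insn["op"] == op and not insn["effects"]:
            return insn["mnemonic"]
    raise LookupError(f"no plain instruction for {op!r}")


def compile_binary(dest, op, lhs, rhs, in_register=None):
    """Emit pseudo-assembly for `dest = lhs op rhs` on a register-only ALU.

    `in_register` maps variables that already live in a register to that
    register; every other variable is assumed to be in memory.
    """
    regs = dict(in_register or {})
    free = [r for r in ("r0", "r1", "r2", "r3") if r not in regs.values()]
    code = []
    for var in (lhs, rhs):
        if var not in regs:
            # Sub-problem: get the operand into a register first.
            regs[var] = free.pop(0)
            code.append(f"{select('load')} {regs[var]}, [{var}]")
    result = free.pop(0)
    code.append(f"{select(op)} {result}, {regs[lhs]}, {regs[rhs]}")
    code.append(f"{select('store')} {result}, [{dest}]")
    return code


program = compile_binary("c", "add", "a", "b")
print("\n".join(program))
```

If `a` is already in a register, `compile_binary("c", "add", "a", "b", {"a": "r0"})` skips the first load, which is the "update the machine state and continue" step in miniature.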

Of course there are plenty of subtle details in this whole process, but it is very possible to do this type of "table-driven meta-compilation". At least for Asm, there are a few table-driven (dis)assemblers which operate on a very similar principle.
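
For the disassembler half of that, a minimal table-driven sketch might look like this. The 16-bit encoding (opcode in bits 15:12, then three 4-bit register fields) is a toy format invented for the example and is not an ARM encoding; `TABLE`, `FIELDS` and `disassemble` are hypothetical names.

```python
# Toy 16-bit encoding, invented for illustration (not a real ARM encoding):
#   bits 15:12 opcode, 11:8 rd, 7:4 rn, 3:0 rm
FIELDS = {"rd": (8, 4), "rn": (4, 4), "rm": (0, 4)}

# Each entry matches when (word & mask) == value; `fmt` is the output syntax.
TABLE = [
    {"mask": 0xF000, "value": 0x1000, "fmt": "ADD r{rd}, r{rn}, r{rm}"},
    {"mask": 0xF000, "value": 0x2000, "fmt": "SUB r{rd}, r{rn}, r{rm}"},
    # MOV requires the unused rm field to be zero, hence the wider mask.
    {"mask": 0xF00F, "value": 0x3000, "fmt": "MOV r{rd}, r{rn}"},
]


def disassemble(word):
    """Decode one instruction word by scanning the table for a matching pattern."""
    for entry in TABLE:
        if (word & entry["mask"]) == entry["value"]:
            operands = {
                name: (word >> lsb) & ((1 << width) - 1)
                for name, (lsb, width) in FIELDS.items()
            }
            # str.format ignores operands the format string does not use.
            return entry["fmt"].format(**operands)
    # Unknown pattern: fall back to a raw data directive.
    return f".word 0x{word:04X}"


print(disassemble(0x1123))
```

All the knowledge about the instruction set lives in `TABLE`, so supporting another instruction means adding a row rather than writing new decoding logic.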

If anything, a "table-driven decompiler" using this spec might be a bit easier, since that is directly mapping machine instructions into their higher-level operations.

My point is that the spec does not give the behaviour-to-machine-instruction mapping, only the other way round, and reversing the mapping is non-trivial.

Table-driven assemblers and disassemblers would be straightforward to produce from this spec, yes, because those just need to care about the instruction patterns and the mnemonics. A compiler backend needs to care about the semantics of the instructions, which are in the pseudocode, and thus harder to automatically reason about. (For instance, lots of the instructions include an add somewhere, but how do you find the one that is just a plain add rather than saturating or SIMD or also setting the condition flags?) Writing something smart enough would, I think, be vastly more work than just doing a compiler backend by hand, and likely produce worse code anyway. (A sibling comment suggests it would be PhD-thesis-level work and I think I'd agree with that.)

I'm reasonably sure you can reverse it to get that, though I don't know what techniques would be used. "Easily" is another matter, though a good enough programming language would let you "prove" that you have a correct reverse mapping, for some definition of prove.

I agree - that should definitely be possible. Not necessarily easy though - but if anybody is looking for a PhD topic or, better yet, a whole research group looking for a challenge, you have all the bits you need to make it happen.

What a wonderful development. I can't imagine trying to do so from an Intel PDF manual! [0]

[0]: https://github.com/google/CPU-instructions

There has been a steady industry of people retyping the manuals of all the major architectures for use in formal verification. At PLDI last year, there was even a paper where a team used synthesis techniques to automatically generate an instruction spec for x86 by generating tests for processors and using the answers to refine their current guess.

And within ARM, there were lots of people transcribing bits of the ARM manual into various forms: C/C++, LLVM .td files, Verilog, spreadsheets, etc. Apart from all the wasted effort, this also created a verification problem: each transcription had its own set of bugs that now had to be fixed. And it missed a verification opportunity: if everyone used the same master copy then each time one group spots a bug and reports it, the spec gets better for everybody. I talked about this virtuous cycle at S-REPLS last year: https://alastairreid.github.io/papers/srepls4-trustworthy.pd... (S-REPLS is a programming language seminar in SE England.)

This paper might be of interest[0]: Stratified Synthesis: Automatically Learning the x86-64 Instruction Set – PLDI 2016

[0]: https://raw.githubusercontent.com/StanfordPL/stoke/develop/d...

ARM, still kicking! This is pretty neat. As M-J-Fox said, most developers end up having to write documentation to get around the code not being machine readable. I had to do this as well for school projects.

Good for them.

Incidentally, there was some discussion about this on reddit too. https://www.reddit.com/r/programming/comments/66kyez/arm_rel...

Topics included the license, what you can do with it, what other specs are out there (for ARM and others), ARM's love of arm/leg/limb-related acronyms and how much people hate XML.
