What is a System-on-Chip (SoC), and why do we care if they are open source? (bunniestudios.com)
275 points by picture on Nov 9, 2020 | 89 comments



More eyes on a design would help.

Proprietary SOCs suck when you run into a bug in system bring-up (and if you are going off the beaten track even a little, you will), and you have a vendor that:

- Doesn't believe you. In fact, is quite vocal about not believing you.

- Takes your hard-won repro code (with "a possible workaround, what do you think?") and is silent for months;

- Finally publishes a TR about that bug, along with your very own workaround code, verbatim and unattributed, and you never even get an email acknowledging the issue.

... and after nearly a year of chasing the stupid thing down, as a last resort grants you a five minute phone audience with the chip designer. Sixty seconds into the problem description he says, "Oh, I know what's going on..." and proceeds to fix everything by telling you what parts of the documentation are lies. Your workaround turns out to be how you have to do it. You feel a little ill.

An open source SOC would not necessarily be better understood by more people, but it probably couldn't be worse.


That sounds oddly specific, are you okay?


I've had something closely resembling what OP described happen to me twice. In both cases the relevant chip errata document is behind an NDA, but in one of them my bug was the first one on the list.

That one was a "don't use these particular memory locations, or disable the cache (which makes the chip unusably slow)" kind of bug, and I'm still simultaneously amazed that I managed to figure it out and angry that I had to.


Been in a similar position too. Didn't end up getting it added to the errata since we reported it against an engineering sample; instead they just added a sentence about how 'this status bit flag is "unreliable"', buried somewhere in the middle of the first public release of the reference manual, without any warning as to what that implies downstream. At least it went in the real docs, I guess. But this was after them ghosting us about it, so the whole encounter still left a bad taste in my mouth.


The NDA problem in the SoC world is completely out of control.


These are the best comments on HN! It's amazing how much incompetence there is in the tech industry nowadays.


I'm fine, thanks for asking :-)

It was a specific example, one of many from a vendor that I will not name, and over a decade ago.


And sadly, actually getting to talk to someone at the end is the impressive part here from what I've seen...


Why do you feel ill? This is how building up expert knowledge works.


Because their year-long arduous journey could, apparently, be avoided with a 60 second chat with the right person.


Yes. Even in tech it comes down to this: it doesn't matter what you know, it matters whom you know.


> it doesnt matter what you know, it matters whom you know.

While technically the literal reading of your sentence is not incorrect, the phrase above refers to nepotism, which is not the issue the OP has.


Would you have accepted his answer though? It was the same answer given in the first place.


I would have believed the chip designer. He explained the problem (an issue involving synchronization between different clock domains in this case) and why the workaround (that WE had to figure out, without help from the vendor) was correct.

The sample drivers written by the company were all buggy, by the way. They wound up doing a cut-and-paste with our workaround in their next board support drop.


What do you mean by accepting his answer? What alternative do you propose?


Oh hey, I was just playing with this... it's really cool!

There's a "Linux on LiteX-VexRiscv" design [1] that adds a DDR controller and MMU to the shared bus as well as a handful of other pieces that allow you to boot a Linux kernel image, mount a filesystem, and get a shell prompt over a serial terminal or ssh.

You can then use familiar interfaces to talk to whatever peripherals you decided to include, e.g.:

    $ echo 1 > /sys/class/gpio/gpio508/value          # drive a GPIO pin high
    $ echo 50 > /sys/class/pwm/pwmchip0/duty_cycle    # set the PWM duty cycle (in ns)
I got it to run on Greg Davill's [2] Orange Crab FPGA board [3] last night, which was actually pretty easy, as it's one of the supported boards. It was surprisingly usable, even with the soft processor only running at 64 MHz. This was also the first time I used the open source synthesis / place-and-route tools to do anything more complicated than an adder, and they were fast and worked flawlessly.

[1] https://github.com/litex-hub/linux-on-litex-vexriscv

[2] https://twitter.com/gregdavill - He posts a lot of really cool macro and microscope photos of electronics assembly

[3] https://github.com/gregdavill/OrangeCrab
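
For anyone curious what "fast and worked flawlessly" means in practice: the LiteX build for an ECP5 board like this drives yosys, nextpnr and Project Trellis under the hood. A minimal sketch of that flow (file names hypothetical, flags from memory; --25k assumes the 25F board variant):

    $ yosys -p 'synth_ecp5 -json soc.json' soc.v                 # synthesis
    $ nextpnr-ecp5 --json soc.json --lpf orangecrab.lpf \
        --25k --package CSFBGA285 --textcfg soc.config           # place & route
    $ ecppack soc.config soc.bit                                 # bitstream generation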


This is about the Precursor [0] to Betrusted [1].

[0]: https://www.crowdsupply.com/sutajio-kosagi/precursor

[1]: https://betrusted.io/


These terms tend to invite a lot of confusion. First of all, open source is not the same as free. In fact, the best open source hardware is a proprietary implementation of code that is maintained by the open-source community, and unless you have endless time and resources, you probably don't want to do this yourself. In chips, an open-source implementation typically involves a "free" instruction set architecture, and the hot one today is RISC-V. Others include MIPS and POWER, which are open-source, but you can't play with the source code like you can with RISC-V.

An SoC is a whole different beast, because it contains all of the other things needed to create a system, including on-chip memory, I/O, interconnects, possibly some analog components, and it all has to work together. The more complex ones have multiple power domains, circuits that turn off when others turn on, and some of these devices need embedded software. Depending upon the process node it was developed at, it also may require multiple voltages and a complex power delivery network. And if you really want to push the performance, you probably want to put this into a complex package, possibly alongside other chips. Having configurability in there in the form of an FPGA or some programmable logic is an interesting option, which is what Intel has done and presumably what AMD will do with its proposed acquisition of Xilinx. That helps keep it tuned to changes in AI and machine learning algorithms without having to completely redo the design.

The challenge will be finding design tools to make sure you haven't messed up anywhere. The free tools tend to be difficult to use and generally ineffective. The commercial tools are much better, but they're also expensive. And the more complicated the design, the more you'll probably need to buy some expensive hardware or lease it from the cloud. Programmability won't solve any of this. It will simply help avoid obsolescence, or at least slow it down.

There's a good article on open source here: https://semiengineering.com/riding-the-risc-v-wave/, with more links at the bottom if you need more.



I'm all for "free" over "open", but from what I understand, the difference is ideological. It sounds to me like you're equating "open source" with "visible source", which is of course inaccurate. A public repo with no license file is not "open source". I'm more familiar with these issues in software than hardware, though, so maybe I've missed something here.


Windows XP has its source available too, yet it isn't open source. It really does depend on the license.


One of the open FPGA tools:

https://symbiflow.github.io/


Whatever happened to OpenSPARC? I don't see any FOSS designs based on OpenPOWER either.


IBM released an open POWER core.

https://github.com/openpower-cores/a2o


This open source chip is an FPGA with all the cost and speed limitations that carries. What would it take to create a truly open-source chip? I assume a chip fab can be contracted to build them given a workable design. Is this conceivable for a general purpose CPU? Will these FPGA approaches get us closer to such a goal?


For the Precursor platform, though, I believe being FPGA-based is a specifically-desired feature. Bunnie did a hardware security talk at Chaos Computer Congress 36 (https://www.bunniestudios.com/blog/?p=5706) that argued for using FPGAs as a means to gain visibility into the "hardware" to find implementation backdoors.


As the other comments say, it's probably going to cost a million bucks to get something made. I wouldn't be shocked to hear numbers way lower for some smart folks. I wonder, for example, what it cost GreenArrays to get their chips made, and how big their run was. But $1M is probably a pretty safe ballpark starting place.

And we have pretty great open source CPU & RAM design tools. Magic is free. There's a lot of great tools for a lot of this.

What is not so great is all the un-core. And GPU, if you want that.

For a while, I believe, the SiFive boards were coming with TileLink [1] as the main way to talk to the core; it's a pretty low-level cache-coherent fabric used nowhere else. There's decent off-the-shelf tooling to design a lot of the digital systems here. But I don't know what resources, if any, are available to help you build a chip-to-chip interface to expose the TileLink. And there's basically the world's tiniest market for things to plug in on the other end anyways.

When you buy an SoC, almost all embedded folk expect a bunch of semi-standard capabilities. I2C, I2S, DisplayPort, USB, Ethernet, &c are all common things you might want. While we can make a decent CPU nowadays, afaik we are still in very, very early days, very pre-implementation, for almost all of these. Thus far almost everyone relies on buying closed IP to provide connectivity for the SoC they are making. It's quite likely the case as well for the new SiFive Unmatched board; I expect they bought someone's PCIe IP & integrated it.

We are starting to see more neat work with FPGA implementations of various protocols like USB3 & PCIe, but these all rely on the FPGA having really good transceivers that tackle the PHYsical layer and do the dirty work. Building actual transistors to be receivers & transmitters into & out of the chip: that's where, atm, we are all but babes, it feels like.

[1] https://www.youtube.com/watch?v=EVITxp-SEp4


> What's more, SoCs incorporate components from a small number of vendors supplying designs for USB, DDR, and PCI controllers: "this means the same disused logic motifs are baked into hundreds of millions of devices, even across competing brands and dissimilar product lines."

https://mamot.fr/@pluralistic/105183378364921755

I really think this is the killer point of it all. The cores are the easy part. The digital part. Are we making headway anywhere else? Eh, yes, a tiny bit? Barely?


On the other hand, https://www.efabless.com/ offers:

$70K / 20 weeks / 100 samples

See also: https://theamphour.com/503-fabless-chip-design-with-mohammed...


Don’t get me wrong, I think what those guys are doing is cool, but it’s 130nm, a fairly small die, and it’s not a fully custom ASIC. That’s how they’re getting the cost down to under $100K. “Degree of Customization: Only what is available through configuration of the design template”


Thanks for pointing that out. Yes, the $1mm number I cited in my post is the estimated raw cost of a full mask set for a relatively modern process in the 2Xnm or smaller range.

Definitely older processes have much cheaper mask sets -- in part because there are just fewer metal layers, and fewer & cheaper masks required to image the transistors when the wavelength of light used (193nm) is close to the line width of the devices themselves.


>Will these FPGA approaches get us closer to such a goal?

The same HDL that can be configured into an FPGA can be used to make chips.

But we won't get there without the FPGA step. Chips aren't designed directly anymore. The sort of complexity we need these days means it's impossible to get right on the first try, and mistakes are very costly with ASICs.


> The same HDL that can be configured into an FPGA can be used to make chips.

Not exactly: you'll likely have to swap the FPGA's hard blocks for ASIC "hard IP", such as I/O (especially high-speed interfaces like HDMI/DP and MII, but also RAM controllers), power and clock management...


I'm also worried about this, which is part of the reason why the Precursor design avoids using hard IP blocks like RAM and USB controllers. It does limit the performance of the system (we have a simple async RAM interface and full-speed USB only), but the explicit priority is transparency over performance.

The main attack surfaces to worry about when translating this design to an IC are the RAM, boundary scan and eFuse macros, and perhaps slightly less so the PLL and ADC blocks. While these are not trivial blocks to be worried about, there may be things we can do at a design level to complicate attempts to bake back doors into these blocks, and we're investigating methods to verify their correct in-silicon construction non-destructively.


What about the metrology cutouts you have to leave in every 2mm by 2mm region on 28nm and lower processes? I hear the interval is even smaller on 14nm/7nm.

I always thought those were only half for process control and half to force you to leave a blank spot on your mask where they could pattern in nasty additional circuitry on a few wafers.

Edit: TSMC calls this the "Dummy TCD macro". If you have the 28nm design rule book T-N28-CL-DR-002 version 1.3 it's Section 6.3.3 "Dummy TCD Design Insertion Guideline" (if you don't have the design rule book then no, I can't send it to you, please don't ask). The TCD macro is a black box -- nobody outside TSMC gets to see what's in it. You have to leave a 9um*3.5um empty region on all layers in every 2mm by 2mm region of your maskset.

Now that Multi Layer Maskset (MLM) is widespread every fab has the ability to "blade off" regions of a mask and expose a pattern from a different mask in that region (of course wafers processed this way are much much more expensive). The TCD macro rules force you to leave an empty hole in your design at regular intervals where this can be done without breaking your chip.


> I always thought those were only half for process control and half to force you to leave a blank spot on your mask where they could pattern in nasty additional circuitry on a few wafers.

Holy fucking hell. So that is what the comment in https://news.ycombinator.com/item?id=24919073 might have alluded to?

Is there any way to take a random TSMC-fabbed chip and inspect this region, or are the feature sizes too small?


> So that is what the comment in https://news.ycombinator.com/item?id=24919073 might have alluded to?

Oh, no, it gets much worse. Take the blue pill, turn back from this rabbit-hole now; Here Be Dragons.


The amount of FUD that goes on HN related to hardware never ceases to amaze me.

Sometimes I find myself double checking if I am really on HN and not on Facebook.


Of course "the same HDL" is a reductive statement, since components like high-speed IO, or even clock gates are not easily portable between FPGA vendors or foundries. But the user-land Verilog/VHDL code that describes how your custom thing "works" is generally portable.


I don't know what the current status is, but I think https://hackaday.com/2020/06/30/your-own-open-source-asic-sk... is a good start on getting an open-source chip fabricated.


Current status is "it's dead, Jim".

The promised "November run" did not materialize, and there's nothing solid on rescheduling it.

Kind of a let-down.


> What would it take to create a truly open-source chip?

From the article: at least US$1M for the masks, plus a boatload of other upfront investment. If you're going for bleeding-edge tech like new fab processes, orders of magnitude more.


Thanks, I skimmed and missed that.


Google, FOSSi and eFabless are working on a 100% open source chip design workflow that would allow you to use only open source tools to create a chip, and thanks to other efforts, open source chips and ISAs exist already:

See: https://www.youtube.com/watch?v=EczW2IWdnOM
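
That workflow is the OpenLane RTL-to-GDSII flow targeting the SkyWater 130nm PDK. A sketch of kicking the tires (from memory; exact make targets vary by version):

    $ git clone https://github.com/efabless/openlane
    $ cd openlane
    $ make pdk openlane          # fetch the sky130 PDK and the tool container
    $ ./flow.tcl -design spm     # run the automated RTL-to-GDSII flow on a bundled example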


Let's compare this to QubesOS: which adversary can hack QubesOS while not being able to hack this?



If I have control of QubesOS' ISP -- it's pretty easy to "hack" Qubes, sadly: there is pretty much no way to validate the Qubes public keys, they're not signed by anything else. So the software can be substituted when you download it.

More along the lines of what you're asking-- the virtualization creates a huge attack surface and there is a long history of VM escapes, microarch sidechannels, etc.


Qubes OS v4+ does not use typical software virtualization methods. The VT-d hardware virtualization it uses was broken only once, and it was done by the Qubes founder: https://en.wikipedia.org/wiki/Blue_Pill_(software)

>There is pretty much no way to validate the qubes public keys, they're not signed by anything else.

Not sure what you mean here. All Qubes keys are signed by the master key which belongs to the main developers.


VT-x itself is not the main attack surface. It's all the interfaces between the inside and the outside of the VMs, for example graphics hardware emulation, network hardware emulation, etc. etc. There have been many of these bugs, on qubes/xen, and basically every other popular hypervisor. I have personally found a kernel memory disclosure vulnerability affecting xen, which I tested under qubes (it wasn't a very useful bug, but the point is, they exist and they are plentiful).

Furthermore, the "blue pill" attack is not a vulnerability in VT-x itself. In fact, it does not exploit any vulnerabilities at all - it simply works off the idea that code running in ring n can hide itself from code running in ring n+1 (the bluepill hypervisor being ring -1).


Those bugs of course exist, but I would not say "there have been many of these bugs, on qubes/xen... they exist and they are plentiful".

The actual number is 67 in ~10 years: https://www.qubes-os.org/security/xsa/, which is more secure than anything else AFAIK.

Note, this is counting older Qubes versions, where the virtualization was much less secure.


> it uses was broken only once, and it was done by the Qubes founder

Well, not quite:

> "XSA-213 is a fatal, reliably exploitable bug in Xen," said the security team of Qubes OS, an operating system that isolates applications inside Xen virtual machines. "In the nearly eight-year history of the Qubes OS project, we have become aware of four bugs of this calibre: XSA-148, XSA-182, XSA-212 and now XSA-213."

https://www.csoonline.com/article/3193718/xen-hypervisor-fac...

> Qubes keys are signed by the master key which belongs to the main developers.

Right, and that master key is signed by nothing. There is no way provided to verify it.


> https://www.csoonline.com/article/3193718/xen-hypervisor-fac...

> May 3, 2017

I said Qubes 4+, which was released in 2018: https://www.qubes-os.org/news/2018/03/28/qubes-40/. All previous versions relied on software virtualization (and were compatible with a much wider range of hardware).

> Right, and that master key is signed by nothing. There is no way provided to verify it.

Sorry for my ignorance, but how does one securely verify a master key without risking compromising it?


Those with knowledge of what the master key is sign the master key, those with knowledge of those people and their keys sign those keys, and so on and so on until you have signed one of the keys in the set. Aka, "the web of trust".


The Qubes Master Signing Key is kept on an air-gapped machine accessible only to the developers. It signs their keys, and the fact that they are signed by it is already the evidence of their validity. It's set to never expire, and so it is extremely sensitive. I don't see how you can make it better in terms of security.

https://www.qubes-os.org/security/verifying-signatures/#1-ge...


By making it so that people can securely receive it.

If Qubes' ISP is malicious or compromised, they can intercept http connections towards Qubes' servers. This allows them to trivially obtain an SSL cert for keys.qubes-os.org, etc.

Armed with a valid SSL cert, they MITM traffic. When a target downloads qubes they give them a tampered version, when a target downloads the master key, they give them a lookalike key.

The user can happily verify the tampered qubes with the lookalike key.

PGP proves nothing if you cannot verify that you have an authentic copy of the key.

Qubes gives some handwavy suggestions for how to check the key, which do not work.

The first item "Use the PGP Web of Trust." cannot be done because the key isn't signed by anyone. The other suggestions are inapplicable or won't achieve anything if the target's network is being tampered with.

This is not some newfangled problem. PGP has the ability to sign keys for precisely this reason. The main Qubes developers should have personal PGP keys which they use to sign the master key. Their personal keys should be certified by other FOSS developers that they've met (maybe the master key too). Then people who are interested in obtaining high confidence in the key can inspect the chain from their own key to the qubes masterkey.
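
In gpg terms, the check that ought to be possible looks something like this (key IDs are placeholders, not real Qubes key IDs):

    $ gpg --recv-keys <master-key-fingerprint>     # fetch the claimed master key
    $ gpg --check-sigs <master-key-fingerprint>    # list who has certified it; a chain of
                                                   # valid signatures should lead back to a
                                                   # key you already trust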

Obviously not everyone will perform careful validation, but if some do then substituting the key becomes riskier. Unfortunately, not only is the key not verifiable at the moment, but the current situation looks pretty similar to what an ongoing key replacement attack would look like.


Thank you for the interesting thoughts.

> If Qubes ISP is malicious or compromised they can intercept http connections towards Qubes' servers. This allows them to trivially obtain a SSL cert for keys.qubes-os.org, etc.

The malicious party would have to compromise a lot of websites for that: https://www.qubes-os.org/faq/#should-i-trust-this-website.

Apart from that, let's see what the developers reply: https://qubes-os.discourse.group/t/there-is-no-way-to-valida....


Signatures should go from the devs keys to the master key, not the other way around.


There is no such requirement anywhere. You can check the key by other means: https://qubes-os.discourse.group/t/there-is-no-way-to-valida...


Any other method of verification makes zero sense, it is bizarre the Qubes folks would think otherwise.


Beyond being bizarre, it makes it look compromised.

(I'm not suggesting that it is-- it's just an extremely bad failure mode when you undermine your users ability to detect key substitution attacks by always looking like a key substitution is in progress).

Then again, it's not just them. OpenSSL did this previously too-- replaced their key and the only source for the new key was the same https site as the software, and the key was signed by nothing and it stayed this way for a month.

During which time most Linux distributions shipped the new update, which presumably means that they're not doing due diligence either. OpenSSL promptly fixed it after I reported that I couldn't validate their key.

Fedora itself now has a nearly non-verifiable key; unlike Qubes, they don't even have a master key that signs their release keys, or sign new release keys with old ones. Presumably Qubes picked up their bad practices from Fedora.

It used to be that various developers at redhat would sign the fedora keys and post the signatures to the keyservers, but the DOS attacks on keyservers mean they can't do this anymore.


Can you please be more specific? How does it make Qubes look compromised? The Qubes website is absolutely not the only place to get the Master Key. See the discussion linked above: you can get the key from t-shirts, from the qubes-os.org website, from GitHub, from the Qubes Discourse forum and other places. You can also find this key signed by different people.


efabless.com

Open source chips are not commonplace right now, but maybe in like 10-15 years.

Love bunniestudios posts.


thanks for reading them!


Bunnie, I have an encryption idea! I'd like to make hw: https://www.notion.so/Horcrux-Encrypted-Messaging-78af0a3f32...


> Of course, this flexibility comes at a cost: an FPGA is perhaps 50x more expensive than a feature-equivalent SoC and about 5-10x slower in absolute MHz

If an FPGA SoC of the same power is 250-500x more expensive (50x the cost times the 5-10x clock deficit), how many of the issues of proprietary SoCs become tolerable?

Wouldn't using a non-SoC architecture (processor + support chips) become a better solution?


That would be even slower than an FPGA.


So is the plural systems on a chip (I hope so), or system on a chips (surely not)? And is hyphenation a valid get out of jail free card for people who for some reason prefer the latter? (so system-on-a-chips, still seems a monstrosity to me). Basically I want John Siracusa to be wrong about at least one thing like everybody else :)


I'd imagine the proper plural (assuming multiple systems each on a chip) would be "systems on a chip", cf. "brigadiers general".


Systems on a chip seems wrong, since there’s more than one chip. I’d probably say systems on chips or just SoCs.


SoC means just add power: everything you need to make a chip operate is integrated. No need for a hundred active components! Terms and conditions apply! Something like that.


Okay, I haven't been accounting for system-on-chip versus system-on-a-chip.


Given the features and cost, what are the likely class of suitable applications for Precursor? It looks like the guts of a typical (consumer) router e.g., or a set-top device, but maybe not a commodity IOT thing (overkill and cost) and certainly not an iPad (performance, for one).


Precursor is focused on applications that prioritize evidence-based trust over everything else. Thus it's envisioned that it could be useful for applications such as password managers, authentication token generation, secure comms, and crypto wallets; but given its high price point, it only makes sense for at-risk individuals.

Thus it's overkill for your average user, but if you're a comptroller with million-dollar signature authority on corporate bank accounts or a journalist operating in hostile situations, it could be worth the cost.


i suppose "a journalist operating in hostile situations" would be better served by some kind of "steganographic solution" than a pretty shiny Precursor

otherwise, i wouldn't bet he'll come back with all the nails he had in the beginning (maybe if it's disguised as some kind of "stupid" appliance..)


I don't understand what Rust has to do with the chip circuitry, beyond stuff that could be compiled to RISC-V code to run on the emulator. But it seems to have more stuff involved.

Also, I don't understand what about AES can be taking up so many LUTs.

Clue, please?


Researched.

The Rust business appears to be stuff to generate compile-time constants identifying addresses of peripheral registers for (Rust) code written to run on the core. Apparently the addresses change as you add and remove various peripheral gadgets.

It is doing AES rounds sequentially. It might be that real-time generation of round keys uses up a lot of LUTs.


Is it possible to verify the chip's contents are what they say they are? With software we can run some kind of signature generator and verify the signature of the bits matches, but that seems like it would be impossible with hardware?


It's a bit like trying to detect whether you are running in a virtual machine or not. You can't trust a chip to run a piece of code to introspect on itself; instead, a more reliable characterization of a chip may come from examining how it fails when you throw out-of-spec stuff at it.

For example, glitching the clock may lead to a repeatable behavior that could only be produced if no additional circuitry has been added to the critical path that leads to the glitch.


> Is it possible to verify the chip's content is what it says it is?

No, and that's the big problem.

You can't even tell by grinding the layers off one by one and inspecting them with a microscope. There was a paper published recently showing how to backdoor a random number generator by fiddling with the dopants. No visual difference at any magnification level. I suppose you could test for it with an electron microscope but you'd have to know what you're looking for and where to look.


Isn’t that the reason people aren’t supposed to trust the RDRAND instruction from Intel?


Correct.


Librem is expensive and has fewer features than an Android, but a lot of that cost went into more open hardware. It's not cheap to do that.


I don't think it's a good analogy.

Librem uses a very proprietary, very closed SoC.

The SoC has decent documentation, relative to the average of what passes for documentation these days, but that's about it.


> Librem uses a very proprietary, very closed SoC.

That is FUD. Yes, it is proprietary/closed. To say that it is _very_ proprietary and _very_ closed would imply that it is somehow _more_ proprietary and closed than normal. However, the higher-quality documentation available for the i.MX series doesn't make it any more closed. It is ever so slightly more open.


>FUD

"Very" was meant to highlight the contrast with an SoC that has open RTL.

i.MX is indeed one of the better documented ARM SoC families.


Is there any good source to read more about how closed it is?


Um, you have the burden backwards; closedness means a lack of documentation.

If you want to try, go here:

  https://www.nxp.com/products/processors-and-microcontrollers/arm-processors/i-mx-applications-processors/i-mx-8-processors/i-mx-8m-family-armcortex-a53-cortex-m4-audio-voice-video:i.MX8M?tab=Documentation_Tab
And try to download the "Security Reference Manual for i.MX 8M Dual/8M QuadLite/8M Quad". You can't. It's only available under NDA to extremely high-volume customers. And that manual is where all the goodies are.


I don't know if the author of the website is here, but please consider redesigning it. This is what it looks like on a 1900x1200 screen: https://i.imgur.com/Ivyse91.png

The use of white space really is too liberal.

Luckily Firefox has reader view, but that shouldn't be necessary to read an article.


Studies have repeatedly shown that maximum readability is at fixed widths capped to a maximum line length (in both inches and words). Obviously a screenshot does not convey PPI and you could be viewing it on a 60" display or on a 22" screen, but you get the point.



