Somebody asked why FPGA EDA tools are so large. Well, this is the number one reason.
So, at the end of the day, the real reason FPGA companies don't open-source their bitstream (and, as I said, the actual database) is simply that it would be a major undertaking for them to document it in any way that would actually make it possible for the community to use. An FPGA is NOT a processor, so it is not as easy to document as an instruction set.
So, very hard to do, and not enough of a business justification to do so (combined with old-school management that doesn't really understand the value of open source). That is it.
BTW, it would actually be relatively doable to document the basic logic cell, but the problem is that in today's modern FPGAs, the logic portion is a relatively small part (in terms of complexity) compared to the very complex I/O interfaces.
I think the best you can hope for (and what I believe both X and A are moving towards) is a more flexible tool flow and heavy use of technologies like Partial Reconfiguration, which should allow you to build lots of tools and simply use the FPGA tools (mostly P&R and timing analysis) as "smaller" black boxes, while allowing open source or third parties to build the higher-level system integration tools (which, IMO, is what is more needed today).
I think this is your best argument. I believe one of the Big Two did open-source a large aspect of their FPGAs or software a while back. Maybe the bitstream. Nobody cared and they didn't make more money. So, they have no incentive to do that, plus the negative results that would probably follow.
I was one of the biggest champions of opening our SW and worked hard to create open APIs to give access to the device database and timing engine, even where there was no real business justification. The limitation was always what we could make usable without shipping an Altera engineer with the SW. A fair amount is available but unfortunately undocumented. If you look closely enough, some of the features in Quartus are written in plain Tcl, which you can reverse engineer.
I still like the analogy with CPUs. If there were no gcc or LLVM and the vendors all had their own compilers, there would be little incentive to open up the ISA. In a world with gcc and LLVM, you're dead in the water if there isn't a port.
I was a little surprised to hear that a big part of the job is documentation. How do the chip design teams communicate with the tool development teams? Or is there a problem with releasing internal documentation?
FPGA manufacturers keep their bitstreams secret because there is no real demand for openness. The customers are happy with the product; they are using the chips and they work. Outside of the tinkerer world there exists a vast realm of industry where access to the bitstream format would be a nice-to-have but not a must. If it were a must, a manufacturer would surely take the lead on this and capture the market share of the others.
For tinkerers and open source proponents this is of course a less-than-ideal situation; we'd like to see that bitstream documented, because then we could make tools that operate on bitstreams and tools that generate them. But for the vast majority of the customers (some exceptions do exist) this is not a big issue.
The most common complaint is that the toolchains are cumbersome, slow, and too large, not that the bitstreams aren't open (funny, though: those toolchains would be fixable if the bitstreams were open...).
The article explained that the reasons were two: I.P. and tooling. The short version for I.P. is that there's all kinds of stuff integrated with FPGAs on many nodes efficiently, and you'll always have less of it due to less cash. As far as tooling goes, the vast majority of the Big Two's expenses are R&D for EDA tools like HLS and logic synthesis. That's the hard part that took them a decade or more to get working well enough to be usable as an ASIC alternative, or easier to pick up for new HW engineers. You can't just duplicate the FPGA HW: you need EDA tools to map hardware designs efficiently and correctly to that HW.
The I.P. problem is tolerable, but the EDA problem isn't. There are academic tools to draw on, and Mentor even has an FPGA tool that can be re-targeted. Yet Xilinx and Altera give away most of their tooling for free, with the best stuff dirt cheap compared to six-digit EDA tools. A new contender must have the FPGA, onboard I.P., and an EDA tool that's just as effective, despite lacking the years and hundreds of millions in R&D. Just ain't happening. So, the focus has to be on niches Xilinx and Altera haven't conquered yet.
Now, I do have several business models I'm tweaking that might allow for a new OSS FPGA and toolchain, with both non-profit and commercial strategies. But you basically need big players, individually or in combination, willing to lose millions each year to support R&D on it. There are also a few firms that short-cut development whose tools would help a lot; the effort could acquire them. So, there are possibilities, but it will be a niche, unprofitable market regardless.
Consider: open tooling drives user device adoption, which in turn drives tooling refinement. At some point, people working on the tooling are going to have ideas that simply didn't happen, or never gained management support, on the commercial side.
Now if this enables some advancement that suddenly makes iCE40 devices more appealing in end products, then Xilinx and Altera are going to be on the outside looking in. If, further, Lattice watches what's happening and develops a hardware enhancement that accelerates/reduces power/etc. for whatever development has happened on the open tooling side, this will further entrench their style of FPGA architecture.
For example, I am certain that the openness of Linux has essentially killed the search for new "process models" on both the software and the hardware side. (Think address space structure, and mmu design.)
However, if we are realistic, the web paradigm imposes very few architectural requirements. Somebody, anybody, could re-architect the stack from the moment a GET request hits a NIC to provide whatever the existing mass of cruft on x86 systems provides, probably for far less cost, far more performance, and far better efficiency.
The question then becomes, who are you doing it for? If it's for customers, and you require them to use specialized tools, then they're always going to whine about lock in. (There are ways to solve this problem, but this is already getting rather long.) So the only hope is that you build an app that leverages your infrastructure that no one can clone without a greater investment in traditional tooling.
This is all just a long-winded way of saying that I do believe there is something "out there" in having open FPGA tooling. The time is right. I see a lot of future in SoCs with integrated FPGA fabric featuring open tooling.
Personally, I'd love to see something like ARM Cortex-M7 core(s) + FPGA. Do your general-purpose stuff on the microcontroller, and accelerate specific tasks with dynamic reconfiguration of the fabric.
What is going to happen, however, is Intel will release a massive, overpriced, Xeon with an inbuilt Altera FPGA and only the likes of Google and ilk are going to be able to afford to do anything with it.
Here's hoping though. I have belief in the Chinese!
Citation needed. Given that most of the device adoption that's already taken place was with seriously-expensive closed-source tools that aren't exactly paragons of UX quality, I think this is assuming a LOT.
My guess is that the quality of the tools (and I'm handwaving "open" into "higher quality" which is not guaranteed) is a distant concern compared to lots of things: device capabilities, power envelope, $/10k, IP library, and many more take precedence.
They do. There's been all kinds of open tooling and more open hardware. What did most people buy and keep investing in? Intel, AMD, IBM, Microsoft, FPGA Big Two, EDA Big Three, etc. Got the job done easily, reliably enough, and at acceptable price/performance.
I call out the OSS crowd all the time on why they haven't adopted the GPL'd Leon3 SPARC CPUs and open firmware if it means so much. It doesn't have X, it costs Y, or they're too lazy to do Z. Always.
"What is going to happen, however, is Intel will release a massive, overpriced, Xeon with an inbuilt Altera FPGA and only the likes of Google and ilk are going to be able to afford to do anything with it."
That was my prediction. I look forward to it, though, as a low-latency, ultra-high-bandwidth interface is what FPGA co-processors need most. First seen in SGI Altix machines, as I recall. It was a smart acquisition by Intel.
Far as ARM + FPGA, maybe you'll like these:
" I have belief in the Chinese!"
They're already trying to live up to it:
What we personally would love to see (or not) doesn't really matter if the economic underpinnings aren't there.
Back to my analogy: the current situation is like having everything you need to produce assembler or binary, but no tools to work with them. You can't debug them, modify them, hot-load them, do JITs, app restructuring... any improvement to your software that requires access to the assembler or binary in RAM. Quite limiting, eh, especially given the benefits of JITs?
The situation is similar for FPGAs: open bitstreams would allow them to be manipulated in situations where that would improve performance, flexibility, and so on. Instead, we get these closed bitstreams generated all at once by inefficient tools. That poses artificial limits, especially for embedded or dynamic scenarios. Opening them up would open the full power of academic and OSS innovation to those areas. Much of the best innovation in ASICs and FPGAs came out of academia, so we want the rest opened for improvements.
Funny you should say that. My initial thought on reading the article was, "I wonder if I could set up an open source FPGA company." Of course, the answer is, "No," because I don't have anywhere near the resources that would be needed to do it (in terms of money, expertise, and so on).
From what I know, this is the only public effort that's remained up so far:
That is ignoring all the complexity that makes modern FPGAs what they are: three different variants of LUTs, on-board block RAMs, hard IP blocks, or even a whole ARM subsystem. Even your quoted article says that the reverse engineering was possible because "There are not many different kinds of tiles or special function units".
The interfaces to all of these are complex and change regularly as they release new variants.
You might, with the right team, and after many, many months of effort, work out most of what you need to program one of the simpler Altera/Xilinx devices. But they're a moving target.
The best analogy I can think of is this: every month, ARM releases a processor with some changes to the instruction set. Tracking that with an open source compiler would be hard enough even if you DID have documentation.
ARM "hard" peripherals can be assigned to MIO pins, which have nothing to do with the PL, or EMIO which can be routed through PL to the FPGA pins. In both cases, the user must write to PS registers to configure internal muxes and assign the peripheral to a given set of pins. In the latter you must assign the ARM pin as an input in the PL and route it to a PL output as in any regular FPGA design.
>The ARM subsystem has bus connections to programmable logic which happens entirely in the FPGA (e.g., you connect a 16550 UART to the ARM via AXI bus, etc).
Yes, it has, but the interconnect subsystem is configured in software, not through the bitstream.
> I'd say Zynq is a far more complex bitstream than just a vanilla FPGA.
Zynq's PL is a Xilinx 7 Series fabric; I wouldn't say the two are much different in complexity. I'm not saying they're simple, though...
However, this has just changed. There is an excellent project at http://www.clifford.at/icestorm/ that has reverse-engineered the bitstream format for a series of Lattice FPGAs and provides an open source end-to-end solution... And AFAIK they haven't been told off for doing so.
I therefore have one of their USB-stick evaluation boards on my desk to play with as soon as I can. They also have 'hacker friendly' packages.
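For anyone curious what that end-to-end flow looks like, here is a minimal sketch in Python that assembles the IceStorm toolchain invocations: Yosys for synthesis, arachne-pnr for place-and-route, icepack for bitstream packing, and iceprog for flashing the USB stick. The tool names are from the project; the exact flags shown are assumptions based on its documented usage and may differ between versions.

```python
# Sketch of the open-source iCE40 flow from the IceStorm project.
# Tool names (yosys, arachne-pnr, icepack, iceprog) are real; the
# specific flags are assumptions and may vary between tool versions.

def icestorm_flow(top="top", device="1k", pcf="pins.pcf"):
    """Return the shell commands for a Verilog-to-bitstream flow."""
    return [
        # Synthesize Verilog to a BLIF netlist with Yosys
        ["yosys", "-p", f"synth_ice40 -blif {top}.blif", f"{top}.v"],
        # Place and route for the iCE40 with arachne-pnr
        ["arachne-pnr", "-d", device, "-p", pcf, f"{top}.blif", "-o", f"{top}.asc"],
        # Pack the textual ASCII bitstream into the binary format
        ["icepack", f"{top}.asc", f"{top}.bin"],
        # Flash the bitstream onto the evaluation board
        ["iceprog", f"{top}.bin"],
    ]

for cmd in icestorm_flow():
    print(" ".join(cmd))
```

The notable thing is how short this is compared to a vendor flow: four small command-line tools, each doing one step.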
The most useful tools in FPGA dev have little to do with the part that translates Verilog/VHDL into something the FPGA understands.
Just going to add this:
I had to install VMware (on OS X) and load an Ubuntu ISO just to play with Xilinx's tooling. I dev on win, Ubuntu, and OS X. Primarily Ubuntu. But just a heads up to anyone reading this that can positively affect this: if even I am on a Mac, it's time to expand your offering.
I don't think that a tool that can't produce working binaries (bitstream or native code) would be that useful. Can you imagine a GCC that couldn't produce native code but instead had to generate COBOL sources (with an almost random feature set) to be compiled by the vendor's tools?
>[...]Intel about their microcode[...]
Intel has a widely known instruction set that can be used with their CPUs; there are many compilers, interpreters (and more) that generate and use it without problems, so I don't see how this analogy is relevant.
>There is so much more to the process of creating something useful with an fpga.
You need tools, and potential creators of such tools need access to the bitstream format used by FPGAs to create useful ones.
>The most useful tools in fpga dev have little to do with the part of translating the Verilog/Vhdl into something the fpga understands.
But without the ability to translate your code (any code, not just VHDL/some variant of Verilog) into a bitstream, where would you run* it? It's hard to consider a tool useful for FPGA development when it can't produce something that can be run* on an FPGA...
*oh come on, don't be pedantic
The tools available for that task are compact and I'd imagine quite portable based on the OSes they are already supported on. Huge vendor downloads aside, I don't think the problem in the ecosystem is with those tools not being OSS.
Intel's instruction set (the ISA, if you will, for those reading) is at a higher level than microcode. Microcode is specific to an architecture, but not to the instruction set. No programmer touches microcode and, as far as I'm aware, it's secret and proprietary.
The useful tools to be created that would be open are not related to bitstream formats. (This is actually an assumption that I cannot fully verify at this time.)
I'm not sure how to address the last point. I'm not trying to be pedantic at all. I'll come back to this as I wrap my head around it.
Edit: I'm continuing to make small edits to grammar and wording.
Also, there are many useful tools that could be created if the bitstream format were known.
Some of which could be useful for other devs, not only FPGA devs. And there are so many things you could potentially do with your bitstream, like generating it on the fly.
While FLOSS tools would be nice (gcc or LLVM compiling a function to be accelerated by the FPGA, and then the proper bitstream being generated on startup by the FPGA driver), just having alternatives to the vendor-supplied toolchain would be good.
Those tools generally are more "barely tolerable necessities" than "useful". Also, I think we're overdue some more languages for hardware development; both Verilog and VHDL are about 30 years old.
All this stuff, including the libraries, can be pretty compact. There is no justification whatsoever for the ISE and Vivado bloat; the essential tools should be very compact. There are multiple copies of a MicroBlaze toolchain in there, for example. WTF? I need exactly none. Zero. Nil. Stop bundling all the crap, please! Put it in a separate package so everybody can happily ignore it.
That they were trying to include or support most of their devices or use cases in one distribution was my guess as to the reason for the bloat. Thanks for confirming it. I'll add that they'd do even better writing it in a Lisp or ML-like language with macros plus an efficient compiler. The code would be readable, robust, efficient, and would take no more space than necessary.
Cadence uses Lisp extensively, and their distributions are also not very small and are hard to maintain.
I just think they could be several times smaller if they only included necessary functionality and added extras repo-style, depending on one's device, use case, and so on. BTW, is Lisp used for extensions over there, or for the whole application? And which Lisp?
And, yes, it's about time for the monolithic distros to die. People are already spoiled by package managers; there is no justification for 10 GB downloads any more.
"Customers will damage their FPGAs with invalid bitstreams, and blame us for selling unreliable parts."
This used to be true when FPGAs had internal tristate resources (so you could drive the same net from competing sources); this is no longer the case for modern devices.
EDIT: Modern processes actually make this problem worse, since wires are by definition smaller, decreasing their failure threshold. Even a short for a nanosecond can cause irreversible damage.
That jumped out at me. 1 nanosecond is 10^-9 seconds. Assuming two switching elements with a 10 Ohm RDS(on) connected to the rails (probably on the low side; such small elements usually have a rather high ON resistance), that's a 20 Ohm series resistance across 3.3 V, or about 0.165 A and roughly 0.54 W of dissipation, assuming an instantaneous rise (which it won't be), which leaves you with about 0.5 nanojoules of energy spread out over two locations. That's an extremely small amount of energy to be able to cause damage. Surprising!
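Running the numbers, as a quick sanity-check sketch in Python (same assumed figures as above: two ~10 Ohm RDS(on) devices in series across a 3.3 V rail, shorted for 1 ns):

```python
# Back-of-the-envelope energy from a 1 ns rail-to-rail short,
# using the parent comment's assumed figures.
V = 3.3            # supply voltage (V)
R = 2 * 10.0       # two switching elements at ~10 Ohm RDS(on) in series
t = 1e-9           # duration of the short: 1 ns

I = V / R          # current through the short (A)
P = V * I          # power dissipated (W), i.e. V^2 / R
E = P * t          # energy (J)

print(f"I = {I:.3f} A, P = {P:.3f} W, E = {E * 1e9:.2f} nJ")
# roughly 0.165 A, 0.54 W, and ~0.5 nJ split across the two devices
```

Which agrees with the ~0.5 nJ figure above: a tiny amount of energy, though concentrated in two very small transistors.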
This is no longer true of modern FPGAs. There is no way to create a signal that is driven by multiple sources; the structure of the FPGA guarantees that any net will have exactly one driver.
"the structure of the FPGA guarantees that any net will have exactly one driver."
It's common for electronics manufacturers to release diagrams like this: http://i.imgur.com/k6o8mwO.png detailed enough to give you an idea of how the circuit behaves and why, but well short of a complete internal schematic for the entire device.
Google "Xilinx FPGA cell" https://www.google.co.uk/search?tbm=isch&q=xilinx+fpga+cell and you'll find similar approximate diagrams exist for FPGA cells. That's the detail level I'm interested in, and I think it's reasonable enough to believe it would exist?
"Wrong" in the sense that they are only proper abstractions for the officially supported functionality. For example, the DSP48E blocks of the Xilinx Virtex-5 can be chained for higher precision, but if you interpret the diagrams literally and try to build unsupported functions, it won't work as you expect.
What I suspect is happening is that the good old tristate bus can't be turned around fast enough for high-speed designs, so some sort of mux network has been built to replace it. The place to look is probably the patents.
The die-space issue is an artifact of using SRAM to store the configuration (using flash is likewise too big and cumbersome), not to mention the power usage. However, the new non-volatile memories coming up (like memristors or nano-RAM) are absolutely minuscule and will have a dramatic effect on the competitiveness of FPGAs. All of a sudden you can have an extremely regular structure, with no specialized tiles, with performance comparable to ASICs. In fact, there's an argument that they will outperform ASICs: as die features shrink into the realm of quantum effects, being able to function despite a relatively high degree of errors (which FPGAs can route around with minimal performance loss) will become a huge advantage.
Here's a paper that talks about it:
The very regular structure of the iCE40 is quoted as a reason for its choice as a reverse-engineering target - this helps a great deal with the tooling issue. If the above prediction is true, then FPGAs can potentially be extremely regular, on par with ASICs in performance, and potentially cheaper to produce because of economies of scale. These factors will, I believe, change the dynamics of the market to a point where an open source FPGA is a viable option for crowd-sourcing.
That premise doesn't seem right. The routing adds overhead, but that's not the real complexity. FPGAs have always been about trying to get more performance and flexibility at the same time. So, instead of simple structures and symmetry, FPGAs have all kinds of things on them, ranging from complex logic units to MACs to processors to accelerators. This creates difficulty for both OSS and commercial EDA tools in handling them efficiently.
Customers want this stuff, though, because it lets them get more done with less (or acceptable amounts of) money. The open or simpler alternatives don't. So, the market won't shift to them. If anything, like with OSes and smartphone SoCs, the barrier to entry will only grow, with those accepting something open or simple being a niche market.
Note: Lattice iCE40 is already serving a niche market. So, it fits my assessment.
When nonvolatile nanoscale memory becomes more widespread (the demand driven by SSDs, but FPGAs get to benefit), the additional interconnect power costs get close to zero, and when things get so small that nanowires are much smaller than the smallest gate, the space requirements start being negligible as well.
At that point there's no real advantage to having those complex tiles, because the equivalent configured FPGA circuit is just as efficient, so you might as well take advantage of the flexibility of a completely uniform fabric.
The point of view taken in the paper I linked above is that even LUTs are an unnecessary optimization at that point.
I'm not saying all of this is an inevitability - but it's my bet.
No, they don't. FPGAs have LUTs that represent other logic gates in weird ways for flexibility. They also traditionally have software-programmed macro-cells, like DSPs and MACs, which are logic-programmed. And there's an interconnect and power-saving tricks on top of that. Different enough that properly synthesizing to FPGAs is a different subfield, with different techniques and sometimes 5- to 6-digit software to do it well on heterogeneous tiles. I've got simpler, free ones that can synthesize or optimize logic with primitive gates.
"The lower the power and space efficiency of the interconnect, the larger the incentive is to put ASIC-style interconnected gates into little islands in the FPGA fabric - hence the various complex tiles like MACs that you mention."
"At that point there's no real advantage to having those complex tiles, because the equivalent configured FPGA circuit is just as efficient, so you might as well take advantage of the flexibility of a completely uniform fabric."
You seem to be looking at the technical side for the simplest, most elegant solution. Bets on that usually fail, because what drives these markets is consumer demand and what's good for business. Consumers want things faster, cheaper, optimized for their use case, and so on. That, not interconnects or whatever, prompted the creation of complex tiles that could accelerate their workloads. The SOC, HPC, and cloud server markets have been going in the same direction with offload engines for the same reason.
The other end of the problem is the suppliers. They know they need to differentiate to sell more chips. So, they create chips with different specs, new types of LUTs, onboard I.P., accelerators, and so on. They, academics, and startups then create tools to try to utilize them effectively. Consumer demand for this sort of thing isn't going away, and neither is the need to differentiate with it. So, this pattern will remain regardless of technical arguments, as it always has in every part of the computing industry.
So, we get to nanoscale memory, nanowires, nano-FPGAs... whatever fictional, better tech you want. So, they get created. Now everyone has super-fast, low-power chips with tons of logic. Parkinson's Law and the above pattern kick in: companies need to differentiate, solution providers start using exponentially more resources for their competitiveness, and users want ways to run all this cheaper/faster/whatever. So, they start adding I.P., using different types of nano-blocks, getting clever with interconnects, and so on.
In other words, you've swapped out the physical components but changed nothing that drove them to complexity in the first place. The drivers will still be there in the age of nano-FPGAs. So, they'll still be complex, consumers' appetites will still be insatiable, and EDA runtimes will be higher than ever. This outcome is always the safe bet.
LUTs are composed of gates - by gates I mean literal transistors etched in the silicon. They're the simplest form of optimization that FPGAs use to get around the cost of re-routable interconnects.
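To make the LUT terminology concrete for anyone following along: a k-input LUT is essentially a 2^k-bit truth table addressed by its inputs, which is what lets one identical cell implement any k-input logic function. A toy model in Python (the bit-ordering convention here is an assumption; real devices vary):

```python
# Toy model of a k-input LUT: a tiny memory whose address lines are
# the logic inputs and whose stored bits define the function.
# The bit-ordering convention is illustrative; real devices differ.

def lut(config_bits, inputs):
    """Evaluate a LUT. config_bits is a 2**k-length tuple of 0/1 values;
    inputs is a k-length tuple of 0/1 (inputs[0] = LSB of the address)."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return config_bits[addr]

# Configure a 2-input LUT as AND: output is 1 only at address 0b11.
AND = (0, 0, 0, 1)
# The very same cell configured as XOR: 1 at addresses 0b01 and 0b10.
XOR = (0, 1, 1, 0)

print(lut(AND, (1, 1)))  # 1
print(lut(XOR, (1, 1)))  # 0
```

The point is that reprogramming the cell means rewriting `config_bits`, not rewiring transistors; the bitstream is largely a huge collection of such configuration bits plus routing settings.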
However, you seem to be missing my point: complex tiles will no longer be an advantage, for performance or for differentiation. We're reaching the bottom of what silicon can do, and the rules of the game are going to change - the future isn't going to be just smaller and faster versions of what we have now.
And while I agree that differentiation is a powerful driver, it happens just as often as not that the ability to differentiate goes away. That's how we get commoditization. Certainly there will still be the ability to differentiate based on the various analog peripherals on the chip (GPIOs, DACs, etc.), but in terms of the actual digital FPGA fabric, I think that will go away, and the benefits of standardization and reduced SKUs (i.e., in terms of supply-chain management and economies of scale) will start to win out.
And, certainly the market-driven complexity you talk about will always exist, but it will be driven from the silicon side of it.
I really doubt that, given what markets have done so far. I hope you're right, though. It would make things so much better for us. I won't bet on it, but I'd like it to happen.
Making the bitstream format a public API would make it harder for them to change/update/improve it without looking like assholes for breaking third-party software.
That's an interesting point; I think you're the first person I've seen make it. Customers' work mostly ends at the RTL level, so I don't see this as a real issue - especially if customers are told not to depend on the bitstream staying the same from device to device.
Others note that the bitstream format is only a small piece of the puzzle, and while I agree, I don't yet care about the HDLs and all of the tools built on top. Just figuring out the bitstream format for more FPGAs would be a huge win and would enable a free toolchain to begin to be written.
Anyone wanting to see more projects, info on what developing open HW will take w/ potential paths, and so on can follow my links here:
Anyway, we have most of what we need in terms of tooling. It's just going to be a huge loss getting the initial I.P. built and ASIC-proven on each process node. Best route is academics w/ professional support each doing a piece of key I.P., using grants + discounts to build it, and then public domaining it. So, who knows a near billionaire who wants to change the HW world for the better? And who can fight patent suits?
For at least this one person, downloading closed source software from the chip manufacturer is not satisfactory.
The comments also add that these downloads can be on the order of 8-20GB.
Has anyone ever wondered what is in those large binaries?
Does someone think the larger size somehow offers potentially more IP protection?
Does the FPGA utilities world have anything like teensy_loader_cli? It's about 28k. Not limited to MS Windows. Works with both BSD and Linux.
Each aspect of synthesis, equivalence checking, testing, etc. has whole MS and PhD theses dedicated to it. I'm sure the result can be a lot smaller than 20 GB, but it's still going to be incomprehensible to one person except in pieces. And that person would have to be an expert on every aspect of hardware development, from digital to analog to the wires that make up the gates. As with major OSes or software, you're always going to be taking someone else's word that a huge chunk of it is safe. Might as well plan around that.
* Simulation and verification tools
* Licensed IP cores for a bunch of common tasks
* Toolchains for at least three different platforms (ARM, Microblaze, PowerPC)
* 32-bit and 64-bit versions of everything
* A JRE
Bottom line is, there's legitimately a ton of stuff in there.
This way the entire toolchain can be open source, keeping only the bitstream packer closed. Lawyers are happy, users are happy - win-win.