They apparently existed from 1969 to around the late 1970s or perhaps early 1980s, during which time they made computers that were open to microcode programming by the end user -- they even gave the end user a manual on how to do it:
This manual should be entitled "How to build a late 70's style Microcode programmable computer". It really gets into the guts.
No keyboard or screen, although I seem to remember it had an 8in floppy drive.
It's been so long I can hardly remember it, but I distinctly remember those cards with the square metal cans, many of them socketed. It also had a really nice set of schematics and a service manual inside the case.
edit: After some google digging, it must have been something like a 3274 terminal controller.
10 or 15 years ago I was reading the classifieds, and a local company was dumping an entire mainframe system with terminals. I don't remember which one it was, but I think I looked up the model numbers and found that it had been something like a $150,000 mainframe system ten years earlier. The price? Make an offer! I probably could have had it for a couple of hundred bucks and U-Haul rental charges!
But yeah, I would have wanted one of those if it worked!
It would just be cool to have...
More info: http://www.righto.com/2016/09/xerox-alto-restoration-day-5-s...
Start of story (the part 1 link is not easy to find): http://www.righto.com/2016/06/restoring-y-combinators-xerox-...
The Zuse Z3 is probably the simplest Turing-Complete computer that was ever invented (in 1941 no less!); I'd start there:
An ALU (Arithmetic Logic Unit) that does simple addition and subtraction, binary negation, and integer comparisons.
A collection of registers that store binary bit patterns.
A set of data path switches that connect the elements together in various ways - e.g. so you can connect a register to an ALU and do some math on it, or copy the output of one register to another.
There's also an instruction decoder which converts MOV AX, BX into a set of control signals for all the other parts. For example, it sets up the data path switches to connect AX and BX, and then triggers a write on AX.
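If it helps to see those pieces as code, here's a rough sketch in Python. The register names, the two "instructions", and the dict-as-data-path are all made up for illustration; a real machine does this with wires and latches, not dictionaries:

    registers = {"AX": 0, "BX": 0}

    def alu(op, a, b):
        # A minimal ALU: add, subtract, negate, compare -- as in the Z3 description above.
        if op == "ADD":
            return a + b
        if op == "SUB":
            return a - b
        if op == "NEG":
            return -a
        if op == "CMP":
            return int(a == b)
        raise ValueError("unknown ALU op")

    def decode_and_execute(instruction):
        # "Hardwired" decoding: match the mnemonic and drive the data path
        # (here, plain dict reads/writes) accordingly.
        mnemonic, *operands = instruction.replace(",", " ").split()
        if mnemonic == "MOV":            # MOV dst, src : copy src into dst
            dst, src = operands
            registers[dst] = registers[src]
        elif mnemonic == "ADD":          # ADD dst, src : dst = dst + src via the ALU
            dst, src = operands
            registers[dst] = alu("ADD", registers[dst], registers[src])
        else:
            raise ValueError("unknown instruction")

    registers["BX"] = 42
    decode_and_execute("MOV AX, BX")     # routes BX onto the bus, writes AX
    print(registers)                     # {'AX': 42, 'BX': 42}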
The first instruction decoders were made from hardwired logic. They shipped with the computer, and they were impossible to change. 
Then it was realised that the logic could be replaced by a kind of nano-program for each machine instruction which set up all the elements dynamically.
This could be baked into ROM, or it could be loaded on boot. The latter meant instruction sets could be updated to add new features to the CPU. This also meant the same hardware could run two different instruction sets. (A nice trick, but often less useful in practice than it sounds.)
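To make the "nano-program per instruction" idea concrete, here's a rough continuation of the same toy sketch. The micro-op names and table layout are invented; a real control store holds bit patterns driving control lines, not strings. The point is that the decoder becomes a table lookup, and the table is just data that could sit in ROM or be loaded at boot:

    # Toy microcode store: each machine instruction maps to a sequence of
    # micro-operations that drive the data path.
    MICROCODE = {
        # MOV dst, src : copy src into dst
        "MOV": [("read_reg", "src"), ("write_reg", "dst")],
        # ADD dst, src : dst = dst + src
        "ADD": [("read_reg", "dst"), ("alu_add", "src"), ("write_reg", "dst")],
    }

    def run(instruction, registers):
        mnemonic, dst, src = instruction.replace(",", " ").split()
        operands = {"dst": dst, "src": src}
        latch = 0                                    # internal bus/latch between steps
        for micro_op, which in MICROCODE[mnemonic]:  # step through the microprogram
            reg = operands[which]
            if micro_op == "read_reg":
                latch = registers[reg]
            elif micro_op == "alu_add":
                latch = latch + registers[reg]
            elif micro_op == "write_reg":
                registers[reg] = latch

    regs = {"AX": 1, "BX": 41}
    run("ADD AX, BX", regs)
    print(regs)   # {'AX': 42, 'BX': 41}

Because MICROCODE is just data, adding or fixing an instruction is a table edit rather than a hardware change -- which is exactly the dev-time and update advantage described below.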
The real advantage was a cut in development time. Instead of having to iterate on board designs with baked-in instruction decoding, the hardware could be (more or less...) finished and the instruction set could evolve after completion. Bugs could be fixed at much lower cost.
It also meant the instruction set could be extended almost indefinitely with no extra hardware cost. (DEC's VAX was the poster child for this, with linked list manipulation and polynomial math available as CPU instructions.)
And it meant that cheaper CPUs in a range could emulate some instructions in software, while more expensive CPUs could run them at full speed in microcoded hardware - all while keeping code compatibility across a CPU family.
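A rough sketch of that last point, continuing the toy model (the MUL instruction and the split between "microcoded" and "emulated" are invented for illustration): a cheaper family member traps instructions it has no microcode for and runs a software routine instead, so binaries stay compatible across the range:

    # Toy "trap and emulate": this (pretend) low-end model microcodes only MOV
    # and ADD; anything else traps to a software routine.

    def mul_in_software(registers, dst, src):
        # Software fallback for a MUL instruction the cheap model lacks.
        registers[dst] = registers[dst] * registers[src]

    SOFTWARE_ROUTINES = {"MUL": mul_in_software}

    def execute(instruction, registers, microcoded=("MOV", "ADD")):
        mnemonic, dst, src = instruction.replace(",", " ").split()
        if mnemonic in microcoded:
            # fast path: handled by the (toy) microcode of this model
            if mnemonic == "MOV":
                registers[dst] = registers[src]
            else:
                registers[dst] = registers[dst] + registers[src]
        elif mnemonic in SOFTWARE_ROUTINES:
            # trap: same instruction, emulated in software on the cheaper model
            SOFTWARE_ROUTINES[mnemonic](registers, dst, src)
        else:
            raise ValueError("illegal instruction")

    regs = {"AX": 6, "BX": 7}
    execute("MUL AX, BX", regs)   # emulated here; a bigger model would microcode it
    print(regs)                   # {'AX': 42, 'BX': 7}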
The modern situation is complicated. Modern CPUs are fully modelled in software before being taped out and manufactured, so boot-loadable microcode isn't as useful as it once was.
ARM is fully hardwired (so far as I know) but x86 has a complex hybrid architecture with some microcoded elements - although I believe most of that is fixed on the die, with only a small patch area that can be updated at boot.
 More complex CPUs have floating point support, but the principle is the same.
 In fact the earliest decoders were diode arrays, which could be swapped out and replaced. So the idea of microsequencing has been around almost since the first CPUs were built.
Another approach is https://en.wikipedia.org/wiki/No_instruction_set_computing
which allows programming the CPU directly, without relying on a static instruction set (essentially programming at or below the microcode level).
Then there is a slight terminological problem with the tendency of both Intel and AMD to label essentially any binary blob they don't feel like documenting as "microcode", ranging from a few bytes of configuration data, through actual CPU microcode, to complete RTOS images for some embedded CPU.
They weren't great but for a small, ambitious company without much chance to iterate they definitely weren't bad.
The technology got incorporated into NVIDIA's Project Denver ARM chips, which were very fast… sometimes. Actually, Denver 2 is shipping in the new Jetson devkit, but I haven't seen much about that or been able to get my hands on one.
That's the general problem with VLIW chips: they're amazing on some workloads and pretty bad on others. Itanium, for all its faults, was by far the best general-purpose VLIW CPU.
Your comment about VLIW is true, but in this case that was a second-order effect. Far, far worse was the effect of the JIT (CMS): a small kernel running over a long time (e.g., DVD playback) would work absolutely excellently, even on the Crusoe, whereas a large codebase with relatively little reuse (say, Word) would give very uneven performance. This problem was never solved [at Transmeta].
The irony is that today, x86 compatibility is irrelevant in most places.
One thing I've thought about is having a code-morphing CPU as a sort of accelerator that a single process could be offloaded to (of course, this would require an OS that can marshal processes). Think a database server, a JVM, NodeJS—long-running processes that would benefit from JIT.
Might even be viable with upcoming cache-coherent interconnects like CCIX and GenZ. A more ambitious implementation could offload groups of functions.
I would credit them for pushing Intel (and AMD) to optimize for power savings rather than raw speed, because Intel started making significant perf/W improvements.
Transmeta processors were (I believe) the first x86 processors to automatically vary voltage and clock speed multiple times per second. In fact, I wrote a small utility that would display the current clock speed on your taskbar in a graph because it changed so rapidly.
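The same kind of monitoring is easy to reproduce today. A minimal sketch, assuming a Linux machine that exposes cpufreq through sysfs (the original utility described above was a Windows taskbar graph, which worked differently):

    # Poll CPU 0's current clock frequency once a second and print it.
    # Assumes Linux with the cpufreq sysfs interface available.
    import time

    PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"

    while True:
        with open(PATH) as f:
            khz = int(f.read().strip())    # sysfs reports the value in kHz
        print(f"{khz / 1000:.0f} MHz")
        time.sleep(1)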
Anyone remember Moblin? Apparently Intel started work on that Linux distro because Microsoft balked at an x86 platform without PCI device lookup, and Intel had removed the PCI stuff in an effort to cut power drain.
As an aside, running Windows on ARM to this day requires special SoCs.
You still needed to add a display controller but that was true of all the competing CPUs too.
It appears to be based on the System/360 architecture.
Don't forget to google "John Titor" in this context ...
[Edit] Some links:
1) The sections on SCAMP (the prototype forerunner of the IBM 5100) and on Small Machines in "The IBM family of APL systems" by A.D. Falkoff:
2) On PALM and 32-bit microcode, this talk page at Wikipedia:
3) On the John Titor story: https://groups.google.com/forum/#!topic/alt.folklore.compute...
* Bit-Slice Design: Controllers and ALUs, Donnamaie White
* Bit-Slice Microprocessor Design, Mick and Brick
Also, 1.2MB per floppy must have been massive back in '78.
BTW, when they say gate arrays, do they mean that the whole thing was implemented on FPGAs?
These are chips with a large grid of gates, which they have in common with FPGAs, PLDs, PALs and the like, but you program them by changing the final metalisation layer during manufacture rather than by feeding a bit-stream into them at initialisation.
They were cheaper and faster to make than truly custom chips. A stock of etched and doped silicon wafers could be held by the manufacturer, so only one mask, for the final metal layer, needed to be produced for each customer. This also meant that the turnaround from ordering a new design to delivery was faster, as only that single mask had to be prepared.
Some resources (including sources for a UNIX/X11): http://computermuseum.informatik.uni-stuttgart.de/dev_en/ibm...