Could you point me to a description of the VLIW compiler problems? In 1981 a small group of us coerced the Unix Version 7 portable C compiler to generate VLIW assembler as a senior project. There was nothing astonishing going on; the pcc had perhaps a couple dozen things it needed to be able to generate (conditional execution, arithmetic, pointers), and it was a simple matter of not using stuff before the (very primitive and shallow) pipeline was able to deliver it. After graduating I lost touch with that kind of fun tech - I was hired to modify accounting software written in BASIC. I've recovered ;-).
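For anyone who hasn't seen it, the "don't use stuff before the pipeline can deliver it" rule can be sketched roughly like this. This is only an illustration, not what the pcc port actually did: the latency table, the register names and the `schedule` helper are all made up for the example, and it models a single issue slot rather than a real wide instruction word.

```python
# Hypothetical result latencies in cycles (assumed values, not from any real machine).
LATENCY = {"load": 3, "mul": 2, "add": 1}


def schedule(instrs):
    """Insert explicit nops so no source register is read before it is ready.

    instrs is a list of (op, dest, srcs) tuples in program order.
    """
    ready = {}   # register name -> first cycle at which its value may be read
    out = []
    cycle = 0
    for op, dest, srcs in instrs:
        # Earliest cycle at which every source operand has been delivered.
        earliest = max((ready.get(s, 0) for s in srcs), default=0)
        while cycle < earliest:
            out.append(("nop", None, ()))
            cycle += 1
        out.append((op, dest, srcs))
        ready[dest] = cycle + LATENCY[op]
        cycle += 1
    return out


program = [
    ("load", "r1", ()),
    ("load", "r2", ()),
    ("add", "r3", ("r1", "r2")),
    ("mul", "r4", ("r3", "r3")),
]
scheduled = schedule(program)
for op, dest, srcs in scheduled:
    print(op, dest or "", *srcs)
```

With these assumed latencies the second load is not ready until cycle 4, so two nops end up in front of the add. A real compiler would try to fill those slots with independent work instead of nops.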
The most readily accessible one is the Itanium compiler. Intel (and SGI) worked to create a compiler that could maximize the use of the processor's VLIW instruction set to achieve application-specific performance.
This was presented by Intel to its enterprise customers as the 'secret sauce' that would give Itanium the edge over SPARC and other 64 bit architectures. They have invested millions in making this effort work.
However, reception of the Itanium compiler workflow was mixed at best. On some workloads it outperformed the alternatives; on others it simply matched them. The process for training the compiler, which seemed to me at the time to be an outgrowth of the branch prediction efforts, involved running synthetic benchmarks, collecting information about utilization of the execution units and then synthesizing new instruction mixes for the applications. The imposition of such a heavyweight process, which needed to be repeated for nearly every change to the code base, worked against the benefits promised. Since code is likely to change often, there were proposals to wait until you were 'ready to ship' before optimization and tuning. But once shipped, patches are made, bugs fixed, anomalies corrected. Changing any line of code could create a massive stall in the pipeline and crush performance until the system was re-tuned.
I don't know if that experience was universal, but it was common enough that such stories were everywhere at places like the Microprocessor Forum and other conferences.
One of the problems with VLIW architectures is the lack of binary compatibility between CPU generations.
Suppose you had a 4-way VLIW architecture and the next generation became 8-way. Even if the new CPU can run the old 4-way code, that code fills only half of the issue slots, so it runs at about half the speed the hardware could deliver, i.e. you need to recompile your software.
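A toy calculation of the 4-way vs. 8-way case, as a rough sketch only. It assumes 16 fully independent operations and one bundle issued per cycle, which is far kinder than real code; `pack` and the op names are hypothetical.

```python
def pack(ops, width):
    """Greedily pack independent ops into bundles of at most `width` slots."""
    return [ops[i:i + width] for i in range(0, len(ops), width)]


# Assumption: 16 operations with no dependencies between them.
ops = [f"op{i}" for i in range(16)]

old_binary = pack(ops, 4)   # bundles fixed at compile time for the 4-way CPU
recompiled = pack(ops, 8)   # same ops recompiled for the 8-way CPU

# One bundle issues per cycle, however many hardware slots it leaves empty.
cycles_old = len(old_binary)
cycles_new = len(recompiled)

# Fraction of the 8-way machine's issue slots the old binary actually uses.
utilization_old_on_8 = sum(len(b) for b in old_binary) / (len(old_binary) * 8)

print(cycles_old, cycles_new, utilization_old_on_8)
```

Under these assumptions the old binary needs 4 cycles on the 8-way machine where the recompiled one needs 2, because the bundle boundaries are baked into the binary and the hardware does not repack them.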
In a sense, they did. And they did famously fail to meet expectations.
But considering that nowadays other stacks which rely on jitting regularly achieve real-world performance that is competitive with much native-compiled software, it seems safe to presume that Transmeta's performance problems stemmed from reasons beyond the basic idea behind CMS.