The most readily accessible one is the Itanium compiler. Intel (and SGI) worked to create a compiler that could maximize the use of the VLIW instruction set of the processor to achieve application specific performance.
This was presented by Intel to its enterprise customers as the 'secret sauce' that would give Itanium the edge over SPARC and other 64 bit architectures. They have invested millions in making this effort work.
However, reception of a workflow for the Itanium compiler was mixed at best. Some workloads it out performed, others it simply matched. The process for training the compiler, which seemed to me at the time to be an outgrowth of the branch prediction efforts, involved running synthetic benchmarks, collecting information about utilization of the execution units and then synthesizing new instruction mixes for the applications. The imposition of such a heavy weight process which needed to be repeated for nearly every change to the code base, worked against the benefits promised. Since code is likely to change often, proposals of waiting until you are 'ready to ship' before optimziation and tuning. But once shipped patches are made, bugs fixed, anomalies corrected. Changing any line of code could create a massive stall in the pipeline and crush performance until the system was re-tuned.
I don't know if that experience was universal, but it was common enough that such stories were everywhere at places like the Microprocessor Forum and other conferences.