Here's an RTL-level approach:
Here's one for the hybrid Globally-Asynchronous, Locally-Synchronous model:
On low-level details, here are two on asynchronous synthesis:
Synthesis and verification together:
And here are a few designs from my collection:
Note: That's an earlier design, which I'm including since older process nodes are still available via MOSIS and Europractice. Any patents have probably expired. It also worked on first-pass silicon.
And I just randomly found this paper, which has an optimization algorithm to reduce the area and latency disadvantages of asynchronous circuits: