That "works" (minus circular references) in simple cases, but it still has bad worst-case performance: if your decrements chain, releasing one object can set off a long cascade of further decrements and frees, all at once.
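To make that worst case concrete, here's a minimal C++ sketch. It uses std::shared_ptr as a stand-in for any refcounted pointer, and the names `Node`, `make_chain` and `frees_from_one_release` are made up for illustration. Dropping the only reference to the head of a linked list frees every node in the list, one after another, inside that single `reset()` call.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Counts Node destructor calls so we can see how much work one release triggers.
static std::size_t g_destroyed = 0;

// Hypothetical list node: each node keeps its successor alive.
struct Node {
    std::shared_ptr<Node> next;
    ~Node() { ++g_destroyed; }  // `next` is released right after this body runs
};

// Builds a chain of `n` nodes (n >= 1) reachable only through the returned pointer.
std::shared_ptr<Node> make_chain(std::size_t n) {
    std::shared_ptr<Node> head;
    for (std::size_t i = 0; i < n; ++i) {
        auto node = std::make_shared<Node>();
        node->next = std::move(head);
        head = std::move(node);
    }
    return head;
}

// Drops the last reference to a chain of `n` nodes and reports how many
// nodes were destroyed by that single decrement.
std::size_t frees_from_one_release(std::size_t n) {
    auto head = make_chain(n);
    assert(head.use_count() == 1);
    g_destroyed = 0;
    head.reset();  // one count hits zero, which cascades down the whole chain
    return g_destroyed;
}
```

`frees_from_one_release(1000)` reports 1000 destroyed nodes from one `reset()`. The destruction here is also recursive (each destructor releases the next node), so a long enough chain can overflow the stack as well as stall.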
And in something like boost::shared_ptr, the cache performance is awful, because the reference count lives in a separate allocation (the control block), so every increment/decrement goes through an extra indirection to memory that is getting hit constantly.
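A rough sketch of that layout, using a hypothetical `SplitRc` class rather than the real Boost code: the handle stores one pointer to the object and a second pointer to a separately allocated count, so every copy and every release chases the second pointer into a different allocation than the object itself.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical non-intrusive refcounted pointer, laid out roughly like
// shared_ptr built from a raw pointer: object and count are two allocations.
template <typename T>
class SplitRc {
public:
    explicit SplitRc(T* obj) : obj_(obj), count_(nullptr) {
        try {
            count_ = new std::size_t(1);  // the separate allocation for the count
        } catch (...) {
            delete obj;  // don't leak the object if the count can't be allocated
            throw;
        }
    }

    SplitRc(const SplitRc& other) : obj_(other.obj_), count_(other.count_) {
        ++*count_;  // touches the count's memory, not the object's
    }

    SplitRc& operator=(SplitRc other) noexcept {  // copy-and-swap
        std::swap(obj_, other.obj_);
        std::swap(count_, other.count_);
        return *this;  // `other` now holds our old state and releases it
    }

    ~SplitRc() {
        if (--*count_ == 0) {  // extra pointer chase on every release
            delete obj_;
            delete count_;
        }
    }

    T& operator*() const { return *obj_; }
    std::size_t use_count() const { return *count_; }

private:
    T* obj_;
    std::size_t* count_;
};
```

For comparison, creating the object through boost::make_shared (or std::make_shared) puts the object and its count in one allocation, and boost::intrusive_ptr keeps the count inside the object itself, which avoids the second allocation entirely.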
Actually, general malloc performance is horrible, and that refcounting implementation will be the absolute worst. In an accurate GC, objects can be moved around in memory, which keeps free space contiguous and makes allocation cheap. Add support for generations and it becomes significantly faster than manual allocation. Refcounting misses all of that, and it screws up on circular references to boot.
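The "moved around in memory" point is what makes a generational nursery cheap: once survivors have been copied out, the free space is one contiguous block, so allocating is just a bounds check plus a pointer increment. Below is a minimal sketch of only that fast path, using a hypothetical `BumpNursery` class; the tracing and copying a real collector does are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical nursery for a copying/compacting collector. Free space is
// always one contiguous block, so allocation is "round up, bump a pointer".
class BumpNursery {
public:
    explicit BumpNursery(std::size_t bytes) : buf_(bytes), top_(0) {}

    // Returns nullptr when the nursery is full; a real GC would run a
    // minor collection at this point instead (not shown).
    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);  // power of two
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buf_.data());
        const std::uintptr_t cur = base + top_;
        const std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
        const std::uintptr_t aligned = (cur + mask) & ~mask;
        const std::size_t offset = static_cast<std::size_t>(aligned - base);
        if (offset > buf_.size() || size > buf_.size() - offset) return nullptr;
        top_ = offset + size;           // the whole "allocator" is this bump
        return buf_.data() + offset;
    }

    // Once survivors have been evacuated, the entire nursery is free again.
    void reset() { top_ = 0; }

    std::size_t used() const { return top_; }

private:
    std::vector<unsigned char> buf_;
    std::size_t top_;
};
```

By contrast, malloc generally has to find a suitably sized free block and later recycle it on free, because it can never move live objects to close the gaps.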