Why do you need metadata to describe where the "patches" are located? Why not simply put all frames one below another, with all unchanged areas transparent? If missing transparency support in old browsers is an issue, just use a special, otherwise unused color instead. Since PNG does a very good job of compressing away those empty areas, the resulting image won't become noticeably bigger.
Is there a reason you excluded this option?
From memory, PNG doesn't compress as well as you might intuitively expect in this scenario; the resulting files ended up around 3x the size.
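For anyone who wants to check this on their own frames, here is a minimal Python sketch of the stacked-frames idea. It is an illustration only, not the method used by the tool under discussion: the names `diff_frame`, `stack_frames` and `deflate_size` are made up for this example, frames are assumed to be lists of rows of RGBA tuples, and zlib on the raw bytes is only a rough stand-in for PNG's DEFLATE stage (it ignores PNG row filters and chunk overhead).

```python
import zlib

# Assumed marker for "unchanged" pixels. A pixel that really is fully
# transparent black would be ambiguous under this simple scheme.
TRANSPARENT = (0, 0, 0, 0)

def diff_frame(prev, curr):
    """Return curr with every pixel that equals prev replaced by TRANSPARENT."""
    return [
        [TRANSPARENT if p == c else c for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]

def stack_frames(frames):
    """Stack the first frame in full, then each diff frame below it."""
    stacked = [list(row) for row in frames[0]]
    for prev, curr in zip(frames, frames[1:]):
        stacked.extend(diff_frame(prev, curr))
    return stacked

def deflate_size(rows):
    """Byte count of the zlib-compressed raw RGBA data (a proxy, not a real PNG size)."""
    raw = bytes(channel for row in rows for pixel in row for channel in pixel)
    return len(zlib.compress(raw, 9))
```

Calling `deflate_size(stack_frames(frames))` gives a number to compare against the size of the first frame plus only the changed rectangles. How the two compare depends heavily on the content of the frames, so measure on real data before drawing conclusions.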