I believe it was figured out from a combination of analysis of the blob, examining the patents (which contain quite a lot of information), trial-and-error, and careful examination of the fragments of VC4 source released by Broadcom (their big source release a year or so back contained quite a lot). I know that at least one person wrote a program which would run arbitrary instructions on the Pi and analyse the state of the registers afterwards, which is a neat trick.
The VC4 is a really nice processor, BTW: dual core, lots of registers, efficient instruction packing, a 64x64x8-bit vector unit, an integrated single-precision FPU that uses the integer registers (no double precision, alas), a 1 kB on-chip lookup table... but, bizarrely, no adc or sbc instructions (add-with-carry and subtract-with-carry), so 64-bit arithmetic is hard. Very weird.
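To make the adc/sbc point concrete, here's a rough sketch in C of the usual workaround on a 32-bit machine with no carry-in instruction: do the low words first, recover the carry (or borrow) with an unsigned compare, then fold it into the high words. This is not taken from VC4 code or any Broadcom toolchain; `u64_pair`, `add64` and `sub64` are names I made up for illustration.

```c
#include <stdint.h>

/* Hypothetical type: a 64-bit value held as two 32-bit halves. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* 64-bit add without adc: unsigned addition wraps modulo 2^32,
 * so the low-word sum overflowed exactly when it is smaller
 * than one of its operands. */
static u64_pair add64(u64_pair a, u64_pair b) {
    u64_pair r;
    r.lo = a.lo + b.lo;
    uint32_t carry = (r.lo < a.lo);   /* 1 if the low add wrapped */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* 64-bit subtract without sbc: a borrow out of the low word
 * happens exactly when a.lo < b.lo. */
static u64_pair sub64(u64_pair a, u64_pair b) {
    u64_pair r;
    r.lo = a.lo - b.lo;
    uint32_t borrow = (a.lo < b.lo);  /* 1 if the low subtract wrapped */
    r.hi = a.hi - b.hi - borrow;
    return r;
}
```

So each 64-bit add costs an extra compare and an extra add compared with a plain add/adc pair, which is roughly what "hard" means here: doable, just clunkier.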
https://github.com/hermanhermitage/videocoreiv