This paper describes a new basis for the implementation of the shifter functional unit in microprocessors that can implement new advanced bit manipulations as well as standard shifter operations. Our design is based on the inverse butterfly and butterfly data path circuits, rather than the barrel shifter or log-shifter designs currently used. We show how this new shifter can implement the standard shift and rotate operations, as well as more advanced extract, deposit, and mix operations found in some processors. Furthermore, it can perform important new classes of even more advanced bit manipulation instructions like arbitrary bit permutations, bit gather (or parallel extract), and bit scatter (or parallel deposit) instructions. Thus, our new functional unit performs the functionality of three functional units—the basic shifter, the multimedia-mix unit, and the advanced bit manipulation functional unit, while having a latency only slightly longer than that of the log-shifter. For performing only the existing functions of a shifter, it has significantly smaller area.
Something I have always thought would be interesting is if memory was more bit addressable. Would we find non-obvious efficiencies, like the ones in the Shifter paper, by generalizing from byte to bit addressing?
* More aesthetically pleasing (to me!)
* Flexible bit-scalable integer ranges
* Flexible bit-scalable floating point precisions
* More efficient use of data layout and transfer
* Other non-obvious benefits?
Giving up 3 bits of 64-bit addresses for bit level precision doesn't seem like a big disadvantage for address space.
= 18,466,744,073,709,551,616 bytes
= 18 quintillion bytes
= 18 EiB
= 18,466,744,073,709,551,616 bits
= 2,305,843,009,213,693,952 bytes
= 2.3 quintillion bytes
= 2.3 EiB
Arbitrary bitwidth loads from Arbitrary bit offsets? Basically https://www.cplusplus.com/reference/cstring/strncpy/
I'd like to see structs be representable in the CPU and the CPU would generate all the load store instructions, stencil computations would have a hardware level description.
Could you see supporting this on say a RISC-V cpu? Or would it be a new ISA designed to support bit addressability? If you patched it in, what would it look like?
What if memory was multidimensional? It would be excellent for storing extra metadata, counters, permission bits, provenance, event listeners.
- Mux complexity would be drastically increased which costs area, power and latency.
- Additional gearboxing around SRAM's would be far more complex.
You might be right, but a long rethink of all that would be worth it before concluding that. I would say I am holding onto hope, but I doubt this will happen any time soon. :)