That is a trade-off towards density that doesn't seem worth it, when all it would take to save the transistors in the implementation is a 16-bit NOP for padding and a few more bytes of memory.
Maybe they did the actual math and figured it's still cheaper? Might be worth it.
Their argument is that since there will eventually be 8-byte instructions, those will have the same cache-line issues (though that could be addressed by requiring 8-byte instructions to be 8-byte aligned).
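To put rough numbers on the alignment point, here is a minimal sketch that counts how many start offsets within a cache line make an instruction straddle into the next line. The 64-byte line size and the helper names `crosses_line` and `count_straddling_offsets` are assumptions for illustration, not anything from a spec.

```python
def crosses_line(addr: int, size: int, line_size: int = 64) -> bool:
    """True if an instruction of `size` bytes starting at `addr` spans two cache lines."""
    return addr // line_size != (addr + size - 1) // line_size


def count_straddling_offsets(size: int, align: int, line_size: int = 64) -> int:
    """Count the start offsets in one line (multiples of `align`) where the instruction straddles."""
    return sum(crosses_line(off, size, line_size) for off in range(0, line_size, align))


# Assumed 64-byte cache line throughout.
print(count_straddling_offsets(size=4, align=2))  # 4-byte instruction, 2-byte aligned -> 1
print(count_straddling_offsets(size=8, align=2))  # 8-byte instruction, 2-byte aligned -> 3
print(count_straddling_offsets(size=8, align=8))  # 8-byte instruction, 8-byte aligned -> 0
```

Under these assumptions, natural alignment removes the straddling case entirely, and the price is the occasional padding NOP.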