SSDs mostly tell the host system that they have 512-byte sectors, or sometimes 4kB sectors, and the typical flash translation layer works in 4kB sectors because that's a good fit for workloads coming from a host system that usually prefers to do things (e.g. virtual memory) in 4kB chunks. But the underlying NAND flash page size has been 16kB for years.
Emulating 4kB or 512B sectors when the underlying media has a 16kB native page size really doesn't add much more complexity on top of the stuff that was already required to handle the fact that erase blocks are multiple megabytes.
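To make that concrete, here is a toy flash translation layer in Python. It is a minimal sketch, not how any particular SSD firmware works, and it assumes 4kB logical sectors, 16kB NAND pages, and a hypothetical 256 pages per erase block (4 MiB). The names `ToyFTL`, `write`, `flush`, and `read` are made up for illustration. It buffers four 4kB sector writes, programs them together into one fresh 16kB page (flash can't be overwritten in place), and keeps a logical-to-physical map. There is no garbage collection or power-loss handling; the `stale` counter just shows where reclaiming space at erase-block granularity would come in.

```python
SECTOR = 4096                      # logical sector size exposed to the host (assumed)
PAGE = 16384                       # native NAND page size
SECTORS_PER_PAGE = PAGE // SECTOR  # 4 logical sectors per physical page
PAGES_PER_BLOCK = 256              # hypothetical: 256 x 16kB = 4 MiB erase block


class ToyFTL:
    """Hypothetical toy FTL: out-of-place writes, no garbage collection."""

    def __init__(self, num_blocks):
        self.capacity = num_blocks * PAGES_PER_BLOCK  # total physical pages
        self.pages = {}    # physical page number -> list of 4 sector payloads
        self.l2p = {}      # logical sector -> (physical page, slot in page)
        self.pending = []  # (lba, data) buffered until a full page is ready
        self.next_page = 0 # next never-written page (append-only log)
        self.stale = 0     # superseded sectors that GC would eventually reclaim

    def write(self, lba, data):
        assert len(data) == SECTOR
        self.pending.append((lba, data))
        # Only program NAND once a whole 16kB page's worth of sectors is buffered.
        if len(self.pending) == SECTORS_PER_PAGE:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        if self.next_page >= self.capacity:
            raise RuntimeError("out of free pages (this sketch has no GC)")
        ppn = self.next_page
        self.next_page += 1
        slots = [data for _, data in self.pending]
        # A partial flush (e.g. on a host sync) pads the page, wasting space;
        # avoiding that is where the "guess what the host will do" logic creeps in.
        slots += [bytes(SECTOR)] * (SECTORS_PER_PAGE - len(slots))
        self.pages[ppn] = slots
        for slot, (lba, _) in enumerate(self.pending):
            if lba in self.l2p:
                self.stale += 1  # old copy stays on flash until its block is erased
            self.l2p[lba] = (ppn, slot)
        self.pending = []

    def read(self, lba):
        # Newest buffered data wins over anything already on flash.
        for pending_lba, data in reversed(self.pending):
            if pending_lba == lba:
                return data
        if lba not in self.l2p:
            return bytes(SECTOR)  # never-written sectors read back as zeros
        ppn, slot = self.l2p[lba]
        return self.pages[ppn][slot]


ftl = ToyFTL(num_blocks=1)
for lba in range(4):
    ftl.write(lba, bytes([lba]) * SECTOR)  # four 4kB writes fill one 16kB page
ftl.write(2, b"\xff" * SECTOR)             # rewrite sector 2: buffered, not in place
ftl.flush()                                # forces a padded, mostly empty page
```

After this runs, sector 2 lives in slot 0 of page 1 and its old copy in page 0 is counted as stale. The mapping itself is simple; the hard part is deciding when to flush partial pages and when to reclaim blocks full of stale data.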
The complexity doesn't come from the emulation. It comes from trying to do the emulation efficiently based on assumptions about the behaviour of the other moving parts... which are also doing the same thing.
So, you've got firmware that is pretending you've got 512B/4kB chunks when really you have 16kB pages, and that is anticipating how the other layers might be doing things in order to maximize performance.
Then you have a filesystem/VFS layer, which tries to optimize its access patterns by anticipating how the underlying solid state storage might really be doing things in 16kB units and how it might be optimizing 512B & 4kB accesses to fit that.
Both those layers are dealing with filesystem journaling and how that might impact performance.
Then you might have a database, which is now trying to anticipate how the filesystem and the underlying firmware might be optimizing access patterns, and so it's trying to optimize to fit all that.
You also potentially have application logic that is trying to anticipate how the database might do things...
What you tend to end up with is many layers of redundant caching, all working against each other very inefficiently.