Edit: the other thing is that choosing LSM trees (and other data structures) isn't just about storage, it's about access patterns. With an LSM-tree, you're optimizing for frequent insertions, at the expense of querying. With B-trees, you spend more time inserting data into the tree, but it's easier to query later.
High-performance data systems do optimize for this tradeoff even at the level of L1/L2/L3 cache and main memory, let alone between main memory and disk (which is far slower). That this opportunity for optimization exists is an irrefutable consequence of the cache hierarchy, i.e. the fact that there are faster and slower bits of memory, and the faster bits tend to be closer to the CPU and smaller. And this structure exists mainly because of cost, and the laws of physics.
If you're always inserting stuff, you want the location you're inserting into to be easily accessible. With LSM trees, you insert sequentially, so it's easy to keep that spot in CPU cache. With B-trees, you have to traverse the tree to find each insertion point, so data is constantly flying in and out of cache as you locate and insert things.
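To make the write-path difference concrete, here's a toy sketch (my own illustration, not code from any real engine): an LSM-style store appends writes sequentially to an in-memory buffer and periodically flushes it as a sorted run, so the hot insert path is a plain append, while reads have to consult the buffer plus every flushed run.

```python
import bisect

class TinyLSM:
    """Toy LSM-style store: cheap sequential writes, more work on reads."""

    def __init__(self, flush_at=4):
        self.memtable = []    # recent writes, appended sequentially
        self.runs = []        # immutable sorted runs (the "on-disk" part)
        self.flush_at = flush_at

    def put(self, key, value):
        # The hot path is a sequential append -- the cache-friendly part.
        self.memtable.append((key, value))
        if len(self.memtable) >= self.flush_at:
            # Flush: sort once and write out a whole run sequentially.
            self.runs.append(sorted(self.memtable))
            self.memtable = []

    def get(self, key):
        # Reads pay the price: newest-first scan of memtable, then a
        # binary search in every run until the key is found.
        for k, v in reversed(self.memtable):
            if k == key:
                return v
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

db = TinyLSM()
for i in range(10):
    db.put(i, i * i)
print(db.get(7))  # → 49
```

A real LSM tree also compacts runs in the background to bound read amplification; this sketch skips that, but the asymmetry (appends on write, multi-run lookups on read) is the point.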
In my opinion, it's a nice middle ground between the B-tree and LSM approaches.
Intel's Optane is one example, although it isn't quite as fast or as durable as Intel's initial marketing claimed.
As regards O_DIRECT vs DMA, I'm not sure what exactly you're saying. The Understanding the Linux Kernel book, for example, uses them quite interchangeably (4th edition, 2.6 kernel). Block devices are handled through interrupts and DMA, and these are lower-level means of doing IO. The Scylla devs use them quite interchangeably as well: https://github.com/scylladb/seastar/blob/master/core/reactor...
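For what it's worth, the distinction at the syscall level is small: O_DIRECT is a userspace hint that bypasses the page cache, while DMA is how the block layer moves the data underneath, regardless of the flag. A minimal sketch (mine, not from the linked Seastar code) of using O_DIRECT from Python, assuming Linux; note it demands block-aligned buffers and lengths, and some filesystems (e.g. tmpfs) reject it outright:

```python
import mmap
import os
import tempfile

BLOCK = 4096  # O_DIRECT requires block-aligned buffers, offsets, and lengths

def read_first_block(path):
    """Read the first block of `path` bypassing the page cache.

    Returns None where O_DIRECT isn't available (non-Linux platforms,
    or filesystems like tmpfs that reject the flag with EINVAL).
    """
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    except (AttributeError, OSError):
        return None
    try:
        # An anonymous mmap is page-aligned, which satisfies the
        # alignment requirement; an ordinary bytearray might not.
        buf = mmap.mmap(-1, BLOCK)
        os.readv(fd, [buf])
        return bytes(buf)
    finally:
        os.close(fd)

path = os.path.join(tempfile.gettempdir(), "odirect_demo.bin")
with open(path, "wb") as f:
    f.write(b"x" * BLOCK)
data = read_first_block(path)
os.unlink(path)
```

Whether the read ends up as a DMA transfer is decided by the driver and block layer either way; the flag only changes whether the page cache sits in between.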
The wording might not have been optimal, but the author never implied that O_DIRECT is DMA.
disclaimer: I'm the author.