> I tested various fio options, but didn't notice this one - I'll check it out! It might explain why I still kept seeing lots of interrupts raised even though I had enabled the I/O completion polling instead, with io_uring's --hipri option.
I think that should be independent - --hipri (i.e. IORING_SETUP_IOPOLL completion polling) and the batching options are separate mechanisms.
> edit: I ran a quick test with various IO batch sizes and it didn't make a difference - I guess because thanks to using io_uring, my bottleneck is not in IO submission, but deeper in the block IO stack...
It probably won't get you drastically higher speeds in an isolated test - but it should help reduce CPU overhead. E.g. on one of my SSDs,
fio --ioengine io_uring --rw randread --filesize 50GB --invalidate=0 --name=test --direct=1 --bs=4k --numjobs=1 --registerfiles --fixedbufs --gtod_reduce=1 --iodepth 48
uses about 25% more CPU than when I add --iodepth_batch_submit=0 --iodepth_batch_complete_max=0, while the resulting IOPS are nearly the same as long as there are enough cycles available.
This is via a filesystem, so ymmv, but the batching mechanism should be mostly independent of that.
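To make the mechanism concrete, here's a minimal liburing sketch (error handling trimmed; the file path and sizes are made up for illustration) of what batched submission buys you: queue a pile of SQEs first, then pay for a single io_uring_enter() syscall for the whole batch instead of one per I/O.

```c
/* Sketch only: batched submission with liburing.
 * The file path and sizes are hypothetical. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <liburing.h>

#define DEPTH      48
#define BLOCK_SIZE 4096

int main(void)
{
    struct io_uring ring;
    /* Plain setup; IORING_SETUP_IOPOLL (fio's --hipri) would go in the
     * flags argument - completion polling is orthogonal to batching. */
    if (io_uring_queue_init(DEPTH, &ring, 0) < 0) {
        perror("io_uring_queue_init");
        return 1;
    }

    /* Hypothetical pre-existing test file, >= DEPTH * BLOCK_SIZE bytes;
     * O_DIRECT matches fio's --direct=1. */
    int fd = open("/tmp/testfile", O_RDONLY | O_DIRECT);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* O_DIRECT wants aligned buffers. */
    void *bufs[DEPTH];
    for (int i = 0; i < DEPTH; i++)
        if (posix_memalign(&bufs[i], BLOCK_SIZE, BLOCK_SIZE))
            return 1;

    /* Queue a whole batch of reads without entering the kernel... */
    for (int i = 0; i < DEPTH; i++) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        if (!sqe)
            break;
        io_uring_prep_read(sqe, fd, bufs[i], BLOCK_SIZE,
                           (__u64)i * BLOCK_SIZE);
    }

    /* ...then submit all of them with one io_uring_enter() syscall.
     * Calling io_uring_submit() once per SQE would do the same I/O,
     * just with DEPTH times as many syscalls - that's the CPU overhead
     * the batch options avoid. */
    io_uring_submit(&ring);

    /* Reap the completions for the batch. */
    for (int i = 0; i < DEPTH; i++) {
        struct io_uring_cqe *cqe;
        if (io_uring_wait_cqe(&ring, &cqe) < 0)
            break;
        if (cqe->res < 0)
            fprintf(stderr, "read %d: %s\n", i, strerror(-cqe->res));
        io_uring_cqe_seen(&ring, cqe);
    }

    close(fd);
    io_uring_queue_exit(&ring);
    return 0;
}
```

Build with something like gcc -O2 demo.c -o demo -luring. fio's --iodepth_batch_submit / --iodepth_batch_complete_max steer the same tradeoff from the outside: how many I/Os travel per trip into the kernel.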