With regard to code size in this comparison someone associated with llvm-mos remarked that some factors are: their libc is written in C and tries to be multi-platform friendly, stdio takes up space, the division functions are large, and their float support is not asm optimized.
I wasn't really thinking of the binary sizes presented in the benchmarks, but more in general. 6502 assembler is compact enough if you are manipulating bytes, but not if you are manipulating 16 bit pointers or doing things like array indexing, which is where a 16-bit virtual machine (with zero page registers?) would help. Obviously there is a trade-off between speed and memory size, but on a 6502 target both are an issue and it'd be useful to be able to choose - perhaps VM by default and native code for "fast" procedures or code sections.
A lot of the C library outside of math isn't going to be speed critical - things like IO and heap for example, and there could also be dual versions to choose from if needed. Especially for retrocomputing, IO devices themselves were so slow that software overhead is less important.
More often than not the slow IO devices were coupled with optimized speed critical code due to cost savings or hardware simplification. Heap is an approach that rarely works well on a 6502 machine - there are no 16 bit stack pointers and it's just slower than doing without - However I tend to agree that a middle ground 16 bit virtual machine is a great idea. The first one I ever saw was Sweet16 by Woz.
I agree about heap - too much overhead to be a great approach on such a constrained target, but of course the standard library for C has to include it all the same.
Memory is better allocated in more of a customized application specific way, such as an arena allocator, or just avoid dynamic allocation altogether if possible.
I was co-author of Acorn's ISO-Pascal system for the 6502-based BBC micro (16KB or 32KB RAM) back in the day, and one part I was proud of was a pretty full featured (for the time) code editor that was included, written in 4KB of heavily optimized assembler. The memory allocation I used was just to take ownership of all free RAM, and maintain the edit buffer before the cursor at one end of memory, and the buffer content after the cursor at the other end. This meant that as you typed and entered new text, it was just appended to the "before cursor" block, with no text movement or memory allocation needed.