Passing via the stack is slow: it requires pointer indirection and forces entities that reside in registers out into memory (thanks to SRA, scalar replacement of aggregates, purely local structures might otherwise live only in registers).
The AMD64 ABI was designed to allow passing aggregates in registers, the same as scalars (many x86 ABIs passed everything in memory). This way, grouping some related parameters in a structure, for whatever reason, does not imply a performance penalty.
BTW, passing small structures by value is now fairly common in C++, although objects that are not trivially copyable must always be passed via memory (a temporary, typically on the caller's stack) through a hidden pointer parameter, even in the by-value case.
Of course, large objects must be passed on the stack when there are not enough registers to hold them (the ABI defines exactly when that's necessary).
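To make the distinction concrete, here is a rough sketch in C++. It assumes the System V flavour of the AMD64 ABI together with the Itanium C++ ABI; the types `Point`, `Big`, `Named` and the helper `register_candidate` are made up for illustration. The 16-byte limit is a simplification: the real rules classify each eightbyte of the aggregate and also depend on how many argument registers are still free at the call site.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical example types, not taken from any real code base.
struct Point { double x; double y; };   // two doubles, trivially copyable
struct Big   { double v[8]; };          // trivially copyable, but 64 bytes
struct Named { std::string name; };     // non-trivial copy constructor and destructor

// Rough approximation (an assumption, not the ABI's actual algorithm):
// a by-value argument can travel in registers only if copying and
// destroying it is trivial and it fits in two eightbytes (16 bytes).
template <typename T>
constexpr bool register_candidate() {
    return std::is_trivially_copyable<T>::value
        && std::is_trivially_destructible<T>::value
        && sizeof(T) <= 16;
}

// Taking Point by value: under the assumptions above the two doubles
// would arrive in registers, just like two separate scalar parameters.
double sum(Point p) { return p.x + p.y; }

// Small usage example exercising the helper on the three types.
inline void usage_example() {
    assert(register_candidate<Point>());   // small and trivial
    assert(!register_candidate<Big>());    // too large, goes in memory
    assert(!register_candidate<Named>());  // passed through a hidden pointer
    assert(sum(Point{1.0, 2.0}) == 3.0);
}
```

To see what a particular compiler actually does, compare the assembly generated for `sum(Point)` with that of a function taking `Big` or `Named` by value.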