Compared to Protobuf, there is definitely going to be a trade-off. Protobuf uses variable-width integer encoding, whereas SBE uses only fixed-width integers, which means there will be a lot of zero-valued high-order bytes and padding taking up space. But variable-width encoding is costly because it involves a lot of unpredictable branching -- this is exactly where SBE's speed advantage comes from. The exact amount of overhead will vary depending on usage; an SBE message could be about the same size as the equivalent Protobuf message, or it could be many times larger. (This is a general problem with benchmarks of serialization systems: the difference is highly dependent on use case, so you should never assume your own application will see the same kind of performance as any particular benchmark.)
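To make the size/branching trade-off concrete, here is a minimal Python sketch of the two integer encodings. The function names are made up for illustration and this is not code from Protobuf or SBE; it only handles unsigned 64-bit values.

```python
import struct


def encode_varint(value):
    """Encode a non-negative int as a base-128 varint (7 data bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf, pos=0):
    """Return (value, next_pos). Note the data-dependent branch on every byte."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_fixed64(value):
    """Always 8 bytes, little-endian, however small the value is."""
    return struct.pack("<Q", value)


def decode_fixed64(buf, pos=0):
    """Return (value, next_pos). No loop and no per-byte branch."""
    return struct.unpack_from("<Q", buf, pos)[0], pos + 8
```

With these, encode_varint(1) is a single byte while encode_fixed64(1) is eight bytes, seven of them zero. The varint decoder, though, has to test a continuation bit on every byte it reads, which is the kind of branching I mean.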
Cap'n Proto also uses fixed-width encoding, but offers an optional "packing" step that is basically a really fast compression pass that aims only to remove zeros. I've found that this usually achieves similar sizes to Protobufs while still being faster, because the packing algorithm can be implemented in a tight loop whereas a Protobuf encoder usually involves sprawling generated code that does all kinds of things. SBE could pretty easily adopt Cap'n Proto's packing algorithm.
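Here is a simplified sketch of the zero-removal idea in Python. It only does the basic step: each 8-byte word becomes one tag byte (one bit per non-zero byte) followed by the non-zero bytes. The real Cap'n Proto packing also has run-length special cases for all-zero and all-non-zero words, which I've left out, so this is not wire-compatible. The function names are hypothetical.

```python
def pack_words(data):
    """Replace each 8-byte word with a tag byte plus its non-zero bytes."""
    if len(data) % 8:
        raise ValueError("input must be a whole number of 8-byte words")
    out = bytearray()
    for start in range(0, len(data), 8):
        word = data[start:start + 8]
        tag = 0
        nonzero = bytearray()
        for bit, byte in enumerate(word):
            if byte:
                tag |= 1 << bit  # bit i set means byte i of the word is non-zero
                nonzero.append(byte)
        out.append(tag)
        out += nonzero
    return bytes(out)


def unpack_words(packed):
    """Inverse of pack_words: re-insert the zero bytes the tag says were dropped."""
    out = bytearray()
    pos = 0
    while pos < len(packed):
        tag = packed[pos]
        pos += 1
        for bit in range(8):
            if tag & (1 << bit):
                out.append(packed[pos])
                pos += 1
            else:
                out.append(0)
    return bytes(out)
```

Two fixed-width 64-bit fields holding 1 and 0x0102 take 16 bytes unpacked and 5 bytes after pack_words. The whole thing is one small loop over words, with no per-field generated code.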
SBE is probably slightly leaner than Cap'n Proto because it doesn't use pointers. Instead, SBE outputs the message tree in preorder. Only fixed-width elements are found at fixed offsets; the rest have to be found by traversing the tree. The downside of this is that if you want to find one element in a list, you must first traverse all of the elements before it (and all of their transitive children). Cap'n Proto's approach using pointers allows you to process input in the order that is most convenient for you (rather than in the exact order in which it was written) and makes it easy to seek directly to one particular element (which is awesome when you're mmap()ing a huge database), but does have a slight cost in terms of message size.
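A toy Python illustration of that access-pattern difference. Neither layout below is the actual SBE or Cap'n Proto wire format; they are invented just to show sequential traversal versus jumping through an offset table, and all the names are hypothetical.

```python
import struct


def encode_sequential(items):
    """Length-prefixed items written back to back, in order."""
    out = bytearray()
    for item in items:
        out += struct.pack("<I", len(item)) + item
    return bytes(out)


def get_sequential(buf, index):
    """Reaching item `index` means skipping over every item before it."""
    pos = 0
    for _ in range(index):
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4 + length
    (length,) = struct.unpack_from("<I", buf, pos)
    return buf[pos + 4:pos + 4 + length]


def encode_indexed(items):
    """A count, then one (offset, length) entry per item, then the item bytes."""
    header_size = 4 + 8 * len(items)
    table = bytearray(struct.pack("<I", len(items)))
    body = bytearray()
    for item in items:
        table += struct.pack("<II", header_size + len(body), len(item))
        body += item
    return bytes(table + body)


def get_indexed(buf, index):
    """One fixed-offset table read, then a direct jump to the item."""
    offset, length = struct.unpack_from("<II", buf, 4 + 8 * index)
    return buf[offset:offset + length]
```

For the items b"a", b"bcd", b"" and b"efgh", the sequential form is 24 bytes and the indexed form is 44 bytes, but get_indexed does the same amount of work for any index while get_sequential walks everything before the item it wants.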
I'd add that SBE is targeting the FIX community, where there are many but bounded message types, and where lists (or lists of lists) are rare and already confined to the parts of systems that people associate with lower performance requirements.
Cap'n Proto does not seem to have this focus (forgive me for assumptions about intentions) and therefore is more designed around supporting the general data structure case.
I'd say that in the cases where the performance differences between these 2 systems really matter you would be unable to make a fair judgement without implementing your real world problem in both and testing the differences. That said, I had an argument today with a colleague about stop bit encoding vs fixed width encoding and without testing real world solutions I was hesitant to make any claims, even though my experience and bias would be toward fixed width encoding.
As for Protobuf, it has been a while since I've worked with it, but when I did I had two specific performance bottlenecks other than the variability (on the JVM): 1) encoding non-primitive fields, and 2) not being able to do allocation-free programming with it.