If anyone from NVIDIA is reading this, I feel like this is a good and under-explored area of research that'd be very tractable for you: figure out architectures custom-designed to do well on recent NVIDIA GPUs, and especially on Jetson Xavier.
One thing people don't realize is that EfficientNet/EfficientDet aren't necessarily the best choice _for their specific dataset_. In a way, a lot of these academic networks are overfit to the task of, say, detecting objects in MSCOCO. If your dataset doesn't look like MSCOCO, there's no guarantee whatsoever that they will do well on it. The same goes for ImageNet and classification. ImageNet is very hard: to do well on it, your net has to do something most humans can't do without substantial training, like recognizing the various dog breeds. If your problem is simpler (and nearly all of them are), chances are you don't need as complicated a model to do well on it. Indeed, an overly complicated model is likely to do worse than a model that's "just complicated enough", due to overfitting, greater sensitivity to noise in real-world data, and so on. Not to mention it will naturally limit your experiment throughput, which is one of the most important factors in getting a model that does something practical.
I think part of the problem is that it's currently too easy to reach for the big models - why train one from scratch when you can just tweak a few layers and get better results?