DeepSeek doesn't have the infrastructure to support Apple hardware. I suspect they are also not interested in catering to Apple's needs, given their mission and small size.
DeepSeek's open-source inference code, while correct, may not be fully efficient. For example, MLA (multi-head latent attention) needs its matrix multiplications evaluated in the right associative order to be efficient.
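To illustrate the associativity point (this is a generic low-rank sketch, not DeepSeek's actual code; the dimensions and factor names A, B are made up): MLA stores projections in factored low-rank form, so multiplying a vector through the factors one at a time is far cheaper than materializing the full matrix first, even though both orders give the same result.

```python
import numpy as np

d, r = 4096, 512           # hypothetical model dim and low-rank (latent) dim, r << d
rng = np.random.default_rng(0)
A = rng.standard_normal((d, r))  # down-projection factor (illustrative)
B = rng.standard_normal((r, d))  # up-projection factor (illustrative)
x = rng.standard_normal(d)

# Inefficient order: materialize the full d x d matrix first.
# Cost: ~d*r*d multiplies for (A @ B), plus d*d for the matvec.
y_slow = (A @ B) @ x

# Efficient order: keep everything as matrix-vector products.
# Cost: ~r*d + d*r multiplies -- vastly fewer when r << d.
y_fast = A @ (B @ x)

# Same result (up to floating-point error), very different cost.
assert np.allclose(y_slow, y_fast)
```

With these sizes the naive order does on the order of d²·r ≈ 8.6 billion multiplies versus about 2·d·r ≈ 4 million for the factored order, which is the kind of gap a straightforward but unoptimized implementation can leave on the table.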
Have you run any benchmarks yet? I'm interested in how many tokens/sec you can get. Though in the end it should be more efficient to run the model on distributed server clusters.
Recent developments like V3, R1, and s1 are actually clarifying, pointing towards more understandable, efficient, and therefore more accessible models.