You should be looking at 7-8B models, then. https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct is considered pretty strong for its size. That said, you shouldn't expect more than glorified autocompletion at that size.
I'm not familiar with the practical memory requirements on Macs, but I suspect that with 16 GB of integrated RAM you won't have issues running 14B models even at q6_k. It would certainly be fine at q4, and a model that size is definitely capable of writing code from instructions, as well as minor refactoring, generating docstrings, etc.
The model itself will fit just fine, of course, but you'll also want a large context for coding. And since it's integrated RAM, it's also used by everything else running on your system, like your IDE, your compiler, etc., which are all fairly memory hungry.
Also keep in mind that, even though it's "unified memory", the OS enforces a certain quota for the GPU. If I remember correctly, it's something like 2/3 of the overall RAM.
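To put rough numbers on that, here's a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the bits-per-weight figures are approximate averages for llama.cpp-style quants, the layer/head counts are what I believe a Qwen2.5-14B-class model uses (check the model's `config.json`), the KV cache is assumed to be fp16, and the 2/3 GPU quota is just the figure recalled above. It ignores compute buffers and runtime overhead, so treat the results as a lower bound.

```python
from dataclasses import dataclass

GIB = 1024 ** 3

# Approximate average bits per weight (assumed values, not exact).
Q4_K_M_BPW = 4.85
Q6_K_BPW = 6.56


@dataclass
class ModelSpec:
    n_params: float   # total parameter count
    n_layers: int     # transformer layers
    n_kv_heads: int   # key/value heads (grouped-query attention)
    head_dim: int     # dimension per attention head


# Assumed architecture for a 14B-class model; verify against config.json.
MODEL_14B = ModelSpec(n_params=14.7e9, n_layers=48, n_kv_heads=8, head_dim=128)


def weights_bytes(spec: ModelSpec, bits_per_weight: float) -> float:
    """Size of the quantized weights."""
    return spec.n_params * bits_per_weight / 8


def kv_cache_bytes(spec: ModelSpec, context_tokens: int, bytes_per_value: int = 2) -> int:
    """K and V tensors for every layer, KV head, and token (fp16 by default)."""
    return (2 * spec.n_layers * spec.n_kv_heads * spec.head_dim
            * context_tokens * bytes_per_value)


def fits(spec: ModelSpec, bits_per_weight: float, context_tokens: int,
         total_ram_gib: float, gpu_fraction: float = 2 / 3) -> bool:
    """True if weights plus KV cache fit inside the assumed GPU quota."""
    budget = total_ram_gib * GIB * gpu_fraction
    needed = weights_bytes(spec, bits_per_weight) + kv_cache_bytes(spec, context_tokens)
    return needed <= budget


if __name__ == "__main__":
    for name, bpw in (("q4_k_m", Q4_K_M_BPW), ("q6_k", Q6_K_BPW)):
        for ctx in (8192, 32768):
            need = weights_bytes(MODEL_14B, bpw) + kv_cache_bytes(MODEL_14B, ctx)
            verdict = "fits" if fits(MODEL_14B, bpw, ctx, 16) else "does not fit"
            print(f"{name} @ {ctx} ctx: {need / GIB:.1f} GiB needed, {verdict} in 2/3 of 16 GiB")
```

Under those assumptions, the context length matters about as much as the quant level: the KV cache grows linearly with the number of tokens, and it comes out of the same GPU quota as the weights.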
* The server runs in a Docker container which has an SSH server installed and running in the background. The only reason for SSH is that it's what edeliver/distillery uses.
* The CI (a local GitHub runner) also runs in a Docker container, which handles building and deploying the updated releases when changes are merged to master.
* We use edeliver to deploy the hot upgrades/releases from the CI container to the server container. This happens automatically unless we stop it, which we do for larger merges where a restart is needed.
* The whole deployment process is done in a bash script which uses the git hash for versioning and edeliver for deploying, and at the end it runs the database migrations.
I'm not going to say it's perfect but it's allowed us to move pretty damn fast.
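For anyone curious what that flow looks like, here's a rough Python sketch of the same steps (the real thing is a bash script, and this is not it). The function names, the `production` environment name, and the choice of revisions are all made up for illustration. The `mix edeliver` subcommands are the ones I know from the edeliver README, but check them against your edeliver version before relying on this.

```python
import subprocess


def short_git_hash(repo_dir: str = ".") -> str:
    """Return the short hash of HEAD, used here as the version identifier."""
    result = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def deploy_plan(previous_rev: str, new_rev: str,
                environment: str = "production",
                hot_upgrade: bool = True) -> list[list[str]]:
    """Build the list of commands for one deployment (hypothetical helper)."""
    if hot_upgrade:
        # Hot upgrade: build an upgrade between two revisions, deploy it live.
        steps = [
            ["mix", "edeliver", "build", "upgrade",
             f"--from={previous_rev}", f"--to={new_rev}"],
            ["mix", "edeliver", "deploy", "upgrade", "to", environment],
        ]
    else:
        # Larger merges: ship a full release and restart the node.
        steps = [
            ["mix", "edeliver", "build", "release", f"--revision={new_rev}"],
            ["mix", "edeliver", "deploy", "release", "to", environment],
            ["mix", "edeliver", "restart", environment],
        ]
    # Migrations run last, once the new code is in place.
    steps.append(["mix", "edeliver", "migrate", environment])
    return steps


def run_plan(steps: list[list[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for argv in steps:
        print("$ " + " ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failure


if __name__ == "__main__":
    # Placeholder revisions; in CI these would come from git.
    run_plan(deploy_plan("abc1234", "def5678"), dry_run=True)
```

Keeping the plan as plain data makes it easy to dry-run in CI and to switch between the hot-upgrade path and the full-restart path with one flag.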
I always remind myself that even the celebrated works supposedly have glaring holes in them. If they can get popular and be cherished, then my work too doesn't have to be "water-tight" at all times.
Just think about the human batteries in The Matrix. Anyone who heard that assumed the machines were really using human brains as biocomputers, but no, the movie actually insists on the battery explanation.
Or someone could read it as Ericsson management not really understanding the capabilities of their own internal tools, and having read up on Java or .NET in some HBR article.