He mentions the Colab for Dreambooth, which only takes ten minutes or so to train on an A100 (the premium GPU). You can have it shut down after it finishes, and it saves to Google Drive. Super easy.
I've trained a few smaller models using their Dreambooth notebook, but I think for 4000 training steps, an A100 will usually take 30-40 minutes. I believe Replicate also uses A100s for their Dreambooth training jobs.
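To square the "ten minutes" figure with the 30-40 minute one, here is a rough back-of-the-envelope sketch. It assumes throughput is constant and ignores startup and checkpoint-saving overhead, so treat the numbers as ballpark only:

```python
# Back-of-the-envelope throughput from the figures quoted in this thread.
# Assumption: steps/second is constant and setup/saving time is ignored.
STEPS = 4000                         # training steps mentioned above
MINUTES_LOW, MINUTES_HIGH = 30, 40   # reported A100 wall-clock range


def steps_per_second(steps: int, minutes: float) -> float:
    """Average training steps completed per second of wall-clock time."""
    return steps / (minutes * 60)


fast = steps_per_second(STEPS, MINUTES_LOW)    # best case in the range
slow = steps_per_second(STEPS, MINUTES_HIGH)   # worst case in the range

# How many steps would fit into a ten-minute run at those rates?
ten_min_low = round(slow * 600)
ten_min_high = round(fast * 600)

print(f"{slow:.2f}-{fast:.2f} steps/s")
print(f"~{ten_min_low}-{ten_min_high} steps in ten minutes")
```

So under those assumptions a ten-minute run would cover roughly a quarter to a third of a 4000-step job, which would fit if the quicker runs simply use fewer steps.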
Ah I see, you're right, 40 minutes sounds about right for that amount of training. Curious why you decided to train on 40 images? I've used 15 for two separate subjects in Dreambooth with excellent results. I'm no expert, just experimenting the same way you are, but I haven't trained on more than 15-20 images per subject.
I've found the most important part is spending a good amount of time getting the prompts right, although I'm not sure whether showing the person in an environment and describing the objects around them helps give the model a "sense of scale". For example, if I just train "wincy" in the fast Dreambooth, "wincy" is the only token it knows. With no other info in the prompts, it didn't know what in the image was "wincy" (me). I accidentally did this when training on my wife (no prompts at all), and she got really mad at me over how ugly the results were ("you made me ugly!" haha).
Have you tried it with and without your dog in an environment, and then describing that environment in the prompts for the training data?
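In case it helps, here is a minimal sketch of what I mean by describing the environment. It writes one caption per training image that pairs the subject token with a scene description. The sidecar `.txt` layout, the `build_caption` and `write_captions` helpers, the class word, and the example scenes are all my own assumptions for illustration; your notebook may expect captions in a different form, so check what it actually reads.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def build_caption(token: str, class_word: str, scene: str) -> str:
    """Combine the subject token, its class word, and a scene description."""
    return f"photo of {token} {class_word}, {scene}"


def write_captions(image_dir: Path, token: str, class_word: str,
                   scenes: Dict[str, str]) -> List[Path]:
    """Write a .txt caption next to each image name listed in `scenes`.

    `scenes` maps an image file name to a hand-written description of
    what surrounds the subject in that image (hypothetical layout).
    """
    written = []
    for image_name, scene in scenes.items():
        caption_path = (image_dir / image_name).with_suffix(".txt")
        caption_path.write_text(build_caption(token, class_word, scene),
                                encoding="utf-8")
        written.append(caption_path)
    return written


if __name__ == "__main__":
    # Demo in a throwaway directory; the file names and scenes are made up.
    with tempfile.TemporaryDirectory() as tmp:
        paths = write_captions(
            Path(tmp), "wincy", "person",
            {"img01.jpg": "sitting at a wooden desk next to a laptop",
             "img02.jpg": "standing in a park beside a bench"},
        )
        for p in paths:
            print(p.name, "->", p.read_text(encoding="utf-8"))
```

The idea is just that every caption names the token alongside the other things in the frame, so the token is not the only word the model has to explain the whole image.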