Hi! It depends on how far you want to go. For this project we did a lot of exploration, because we had Google-scale infrastructure. Replicating all the exploration would need a lot of GPUs (in the 100s); replicating just the experiments that actually went into the paper, maybe a few dozen. Training something similar to what's in the demo with the code we will release takes 1 V100 :)
We can't release the internal training set, but expect a dataset of a few hundred thousand images (e.g., openimages) to be sufficient, maybe even less (AFAIK this has not been explored in a controlled setting).