For instance, there have been a few papers on "transfer learning" where a network is trained on video data for the equivalent of tens of thousands of hours, and then used to control a robot (where the robot's inputs are partly from video). The pre-learned weights help significantly, as you'd imagine.
In a related sense, it's often useful to use a pretrained network as the input to your model: you run the images through the pretrained network, which outputs a simplified representation, and then train a second model on that representation. That's already quite useful in practice, and I could see something like it being very useful here. Train on one task with lots of data, then switch to something similar with less data.
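A minimal sketch of that feature-extractor pattern, with a frozen random projection standing in for real pretrained weights (the names and toy data here are hypothetical, just to show the shape of the idea): the "backbone" is never updated, and only a small head is trained on its outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained backbone: frozen weights, never updated.
W_frozen = rng.normal(size=(64, 8))

def extract_features(inputs):
    # (n, 64) flattened inputs -> (n, 8) simplified representation
    return np.maximum(inputs @ W_frozen, 0.0)  # ReLU

# Toy labeled data for the downstream task (small, as in the comment).
X = rng.normal(size=(20, 64))
y = (X.sum(axis=1) > 0).astype(float)

feats = extract_features(X)

# Train only the small head (logistic regression via gradient descent);
# the backbone stays fixed throughout.
w = np.zeros(8)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w)))
    w -= 0.1 * feats.T @ (p - y) / len(y)

preds = (1.0 / (1.0 + np.exp(-(feats @ w))) > 0.5).astype(float)
print("train accuracy:", (preds == y).mean())
```

In a real pipeline the frozen projection would be replaced by, say, the convolutional layers of a network pretrained on a large dataset, but the division of labor is the same: big-data features in front, small-data model on top.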
Having actuators isn't necessary for consciousness: people with locked-in syndrome are conscious but can't move.