I'm dubious of that in cases where the training set isn't distributed. If we call the training itself copyright infringement, is downloading an image infringement? Is caching?
I think it's more a question of derivative work. Normally a derivative work is an infringement unless it falls under fair use.
Now a human can take inspiration from, say, 100 different sources and probably end up with something that no one would recognize as derivative of any of them. But it also wouldn't be obvious that the human did that.
But with an ML model, it's clearly a derivative in the sense that the learned function is mathematically derived from its dataset, and so are all the resulting outputs.
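To make that concrete, here's a minimal, purely illustrative sketch: a toy linear model in Python (nothing like a real generative system), just to show that the learned parameters are a function of every training example, so dropping any single example shifts every subsequent output by some small amount.

    # Toy illustration (hypothetical example, not any real model):
    # the fitted weights depend on the whole dataset, so removing one
    # training example changes every output, however slightly.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))   # 100 "training works", 3 features each
    y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

    def fit(X, y):
        # Ordinary least squares: weights are an explicit function of all the data.
        return np.linalg.lstsq(X, y, rcond=None)[0]

    w_full = fit(X, y)
    w_minus_one = fit(X[1:], y[1:])   # same model, one training example removed

    query = np.array([0.3, 0.7, -0.1])
    print(query @ w_full)        # output of the model trained on everything
    print(query @ w_minus_one)   # slightly different: that one example leaves
                                 # a (tiny) trace in every output

The point isn't the magnitude of the difference; it's that the dependence is mathematically there for every example, which is exactly what makes "derivative" hard to reason about at this scale.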
I think this raises a new question, though, because until now "derivative" kind of implied that the output was recognizable as being derived.
With AI, you can tweak it so the output doesn't end up being easily recognizable as derived, but we know it's still derived.
Personally, I think what really matters is what the legal framework around it should be. How do we balance the interests of AI companies against those of the developers, artists, and citizens who authored the dataset that enabled the AI to exist? And what rights should each party be given?
The real kink in applying "derivative work" here, to me, is that the entire dataset goes into the model and is, to some vanishingly small extent, used in every output. How can we meaningfully assign ownership through that transformation and mixing? And when we do, how do we do it without exacerbating the existing problems of copyright in art? We already can't use characters and settings created during our own lifetimes in our own expression, because Disney got life + 70 through Congress.