there are literally thousands of years of artwork that fall under public domain,...

adlpz · on Nov 4, 2022

My guess is that it's not so much about the amount of available data as about how accessible it is. Scraping the internet seems to be one of the preferred ways of gathering vast amounts of data, in particular text and images.

Telling public domain works apart from everything else is not a trivially automatable task.

If you rely only on curated libraries of vetted public domain content, you don't get anywhere near the expected amount of variability and diversity.