
There is tons of English text and images with permissive licensing. All of Stack Overflow and Wikipedia is Creative Commons. Anything created by the US government or many other governments is public domain.

The terms of the CC BY-SA licenses that Stack Overflow and Wikipedia largely use cannot practically be satisfied in a data model. By design, all outputs derive from all sources to some extent, and the licenses generally require that those sources be specifically identified, so you can’t just say “from Wikipedia” or “from Stack Overflow”; you have to say “from such-and-such a page, by so-and-so”.

“Permissive” is not enough. You need no-strings-attached, and attribution is a string. Hence mostly talking about public domain materials, which make up the vast majority of suitable materials.

I suspect that covered works of the US federal government would make up quite a large fraction of the public domain material (as reckoned by US law) from the last 70 years. I don’t believe it’d be enough to be particularly useful, certainly not for pop culture knowledge or colloquial idiom.

