It will be prohibitively difficult to train the model without some kind of hardware assistance (CUDA). This means that if we're building an ImageNet object detector, even if the code implements the model correctly on the first try, training it to close-to-state-of-the-art accuracy would take several consecutive months of CPU time. Torch has rudimentary support for OpenCL, but it isn't there yet. Very good pre-trained models, released under academic-only licenses, also help fill the gap. (That is about as permissive as such models can be licensed, since the ImageNet training data itself is under an academic-only license anyway.)
I'm not sure what niche this project fills. If you want an open-source neural network framework, you have several high-quality choices. If you need good models, you can either use any of the state-of-the-art academic-only ones, or you'd have to collect a dataset entirely by yourself.
Does it necessarily follow that a machine-learning model is a derivative work of all the data it's trained on? As far as I know, the law in this area isn't really settled, and many companies are operating on the assumption that this isn't the case. It would lead to absurd conclusions in some cases: for example, if you trained a model to recognize company logos, you'd need permission from the logos' owners to distribute it.
(This assumes traditional copyright law; in jurisdictions like the E.U. that recognize a separate "database right," it's another story.)
I'd like to note that some publishers, like Elsevier, grant access to their dataset (full texts of articles) under a license with the condition that you cannot freely distribute models learned from their data.