It would certainly alleviate the license concerns. If it was possible to train it to a level (that produces effective output), then sure.
As a thought experiment, I thought "what would happen if we trained it on our 15 million lines of product code + my language-ext project". It would almost certainly produce something that looks like 'us'.
But:
* It would also trip over a million or so lines of generated code
* And the legacy OO code
* It will 'see' some of the extreme optimisations I've had to built into language-ext to make it performant. Something like the internals of the CHAMP hash-map data-structure [1]. That code is hideously ugly, but it's done for a good reason. I wouldn't want to see optimised code parroted out upfront. Maybe it wouldn't pick up on it, because it hasn't got a consistent shape like the majority of the code? Who knows.
Still, I'd be more willing to allow my team to use it if I could train it myself.
> This would enforce bad behaviors and make it even harder for fresh developers to argue against it.
I think this is a significant point. It maintains the status quo. We change our guidance to devs every other year or so. New language features become available, old ones die, etc. But we're not rewriting the entire code-base every time, we know if we hit old code, we refactor with the new guidance; but we don't do it for the sake of it, so there's plenty of code that I wouldn't want in a training set (even if I wrote it myself!)
As a thought experiment, I thought "what would happen if we trained it on our 15 million lines of product code + my language-ext project". It would almost certainly produce something that looks like 'us'.
But:
* It would also trip over a million or so lines of generated code
* And the legacy OO code
* It will 'see' some of the extreme optimisations I've had to built into language-ext to make it performant. Something like the internals of the CHAMP hash-map data-structure [1]. That code is hideously ugly, but it's done for a good reason. I wouldn't want to see optimised code parroted out upfront. Maybe it wouldn't pick up on it, because it hasn't got a consistent shape like the majority of the code? Who knows.
Still, I'd be more willing to allow my team to use it if I could train it myself.
[1] https://github.com/louthy/language-ext/blob/main/LanguageExt...