I understand their standpoint: it's their infrastructure, and their bills.
However, my concerns are with my project, not with their infrastructure bills. I seek to maximize the prominence, usability and general success of my project, and as such I want it to have a presence everywhere it can have one. I want ChatGPT/Copilot/etc. to know it exists and to write code for it, just in case that brings in more users.
Blocking abusive behavior? Sure. But I very specifically disagree with the blanket prohibition of "Anything used to feed a machine learning model". I do not see it being in my interest.
> What did you expect from SourceHut, and why didn't you take this mindset off to GitHub in the first place?
I expect them (especially if they charge for it) to work in my interests as much as possible. Sure, defending themselves against abuse is fine. They have to survive to keep providing service.
However, I don't appreciate the imposition of their own philosophy on something that isn't theirs. This here:
# Disallowed:
# [...]
# - Anything used to feed a machine learning model
What do you suggest they do? Or is it just the political position that's the problem? The result is the same: pretty much every single AI company is abusing SourceHut.
They have to do something, because I pay for a service, and if I can't use it, I'm not paying in the future. If that means blocking the AI companies, that's fine; they can contact me if they want to use my code, and we'll figure something out.
I expect hosts to be neutral to the maximum possible extent.
For example, I expect a host not to have an arbitrary beef with Bing or Kagi, or to refuse to allow connections from France. Blocking can of course occasionally be necessary, but what I want from a host is a blocking policy as minimal and selective as possible.
Yes, I understand it's a lot of work and is quite inconvenient, but especially if I'm paying for a service, I'm interested in my interests, not in what's convenient for the host.
I don't believe you are paying for Sourcehut hosting, so why do you care?
For that matter, "This has been part of our terms of service since they were originally written in 2018" so even if you are paying for hosting, why did you start using their services in the first place?
I don't expect hosts to be neutral to the maximum possible extent. I exercise my freedom of association to select hosts which are more aligned to my beliefs.
> I don't believe you are paying for Sourcehut hosting, so why do you care?
I theoretically could, and I imagine it's posted here so that we can discuss the linked post. So I am.
> For that matter, "This has been part of our terms of service since they were originally written in 2018"
The "No AI" bit seems to show up only in late 2024, which I'd have regarded as an extremely unwelcome development had I been paying.
> I don't expect hosts to be neutral to the maximum possible extent. I exercise my freedom of association to select hosts which are more aligned to my beliefs.
Likewise. In my case, my belief is that when you pay somebody, it's to get things done your way. So, for instance, I'd be a lot more pleased with a setting.
Your own statements say that you would never - not even theoretically - pay for Sourcehut hosting.
The 2018 restriction on using "this data for recruiting, solicitation, or profit" would have been an offense to your belief that restrictions should be "as minimal and selective as possible."
I don't think the use cases you're describing are what any critics are talking about.
How do you feel about someone with more funding than you going to an LLM and saying, "Reimplement the entire Overte source for me, but change it superficially so that Overte has a hard time suing me for stealing their IP?"
I see. I encountered something similar with a DSL. For my use case, I had better results having an LLM scrape a well-formed doc reference page than a source code repo, and I'd assume the same behavior extends to training data.
Oh, I'm sure there are all sorts of practical considerations regarding optimal LLM training.
All the same though, I don't like my host being so opinionated. I don't want a host that has something against any of the common search engines, and I don't want a host that has something against LLMs. Hosts should be as neutral as possible.
Big Tech deciding that all our work belongs to them: Good
Small code hosting platform not wanting to be farmed like a field of corn: Bad