
It looks like they announced two changes here:

1. They are charging for use of the Overflow API, not the data itself.

2. They are enforcing the attribution clause of the existing CC-BY-SA license on the content. In their opinion, if an AI bot's answer includes parts from the Stack Overflow API (or data dump?), it should credit the most closely matching answer by linking back to Stack Overflow.




It is not clear if they can enforce the CC license at all here: they are not the authors of the content, and their ToS do not contain any clause to delegate enforcement to SE.


Excellent answer.

Following up on the parent, for those who don't already know, EE = ExpertsExchange, an older question-answer website that got strong Google SEO for the questions but eventually hid the actual answers behind a paywall. If you wanted to see the answer, you had to pay. StackOverflow was, at least partially, a reaction to that in the beginning.

There was initially some confusion about the license behind the answers, and they went through some license-spinning (https://stackoverflow.com/help/licensing).

Now, most things at StackOverflow (and the other StackExchange websites) are under CC BY-SA 3.0 or 4.0. Important to point out that there is no ban on training in any of those licenses, probably because they were written before that was even a thing. However, regardless (and the obligatory IANAL), attribution should certainly be included if the output is directly derivative of code on SO. (How to track thousands of attributions across a large codebase is another question.)

Whether closing down the API is abiding by the spirit of the license is an open question, but it certainly seems to be allowed by the letter of the license.

My personal feeling is that this is rowing upstream, and that the large and incredibly well-funded companies like OpenAI, Anthropic, Google, and Meta have already scraped all of that historical data, so this really only hurts the new startups that are poorly funded. However, I'm sure that making this deal with Google et al will be a good thing for Stack Exchange as a whole, and perhaps the funding will breathe new life into Stack Overflow (et al).


It’s probably considered enough of a transformation that it no longer falls under the license, but that’s the main question on copyright that’s getting solved in courts right now anyway.

I wonder if the Share Alike license could be updated to include sharing models trained on that data. I’d certainly like to see more CC-licensed models out there.


Can't wait for the avalanche of excuses from AI proponents about how attribution is impossible because their stealing machine totally doesn't steal and even if it did couldn't tell you what it's stealing from and anyway this is the FUTURE!!!1! How dare you be opposed to literally any technological change we decide to call progress go back to your cave luddite



