
I'm actually perfectly fine if StackOverflow wants to sell an answer I made to help train AI.

For me, the purpose of providing an answer is to help save others (and my future self) time, and I don't really mind if someone uses that in a private product - especially if it helps tools like ChatGPT which provide an insane amount of value given the low monthly price.




> I'm actually perfectly fine if StackOverflow wants to sell an answer I made to help train AI.

I’m not.

This was a collaborative effort to make the lives of programmers easier, and the data was always meant to be a public good. OpenAI – and, more importantly, all the other LLMs with pockets that aren’t as deep – should be able to just download the database and train on it for free.

I don’t care about any license. I don’t care about attribution. Learning isn’t copying, so copyright is irrelevant. I contributed about a thousand answers to Stack Overflow, all with the understanding that anybody can download and use them for free, not so they can be locked up by Stack Overflow.

What concerns me with deals like this is that it’s altering the cultural norm to expand copyright to cover not just copying, but use. When OpenAI makes deals like this, it becomes more likely that there will be pushback at the social and legal level when other LLMs are trained without such deals in place.

It’s akin to – and can possibly result in – regulatory capture, making it difficult for new startups to compete with OpenAI.


> the data was always meant to be a public good.

The words are a copyleft-able public good. Concepts, facts, and ideas are not; anyone can use them for anything, including making money. If you're actually worried about specific wording or other creative choices being used improperly by an LLM, then by all means that should be enforced. But those examples are very rare, because the LLMs are very good at extracting facts from prose.


Good for you. I'm not. I contributed answers to StackOverflow because I use answers others have contributed to StackOverflow, not to ChatGPT, not for ChatGPT to monetize. I don't use ChatGPT and probably never will.


But the content you posted to SO was already permissively licensed. Other people can copy it, and make derivative works, and even charge money for them, as long as they cite your SO handle as the author. https://meta.stackexchange.com/questions/347758/creative-com...


ChatGPT is not citing anything. It can't possibly do that reliably with LLM weights alone.


(1) The announcement (https://stackoverflow.co/company/press/archive/openai-partne...) says things will be attributed in both the 2nd and 3rd paragraph

(2) It's only likely to attribute if it quotes verbatim... Just like a human. When I tell someone I learned that the second parameter Array.map passes to the callback is the index of the value just passed, I don't add "and I learned this on Stack Overflow from user gtriloni". It's just knowledge that I learned.

The only time I'd attribute is if I copied a snippet of code or a paragraph to quote in a blog post. For me at least, that almost never happens. I take the knowledge I learned and apply it to my own code. It's rare, if ever, that there is something on S.O. so useful that I copy it verbatim.
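
For reference, the Array.map behavior mentioned above looks roughly like this. It's a minimal sketch; the sample array and variable names are made up for illustration.

```typescript
// Array.prototype.map calls its callback with (value, index, array).
// The sample data and names below are illustrative only.
const letters: string[] = ["a", "b", "c"];

// The second callback parameter is the index of the current element.
const labeled: string[] = letters.map((value, index) => `${index}:${value}`);

console.log(labeled); // → ["0:a", "1:b", "2:c"]
```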


> Just like a human

An LLM is not a human. It is a tool operated by a, in this case, for profit entity. It has no human rights, but its operator has all relevant legal obligations.

If it was, as you say, “just like a human” in relevant ways (think, feel, have self-awareness, etc.) then it would effectively be a slave subjected to extreme abuse.

Either it is a tool that generates derivative works at mass scale for profit and its operator should be liable for licensing/attribution violations, or it is a conscious being and we should immediately stop abusing it. Pick your poison.


Bing's version of ChatGPT/GPT4 cites sources. My limited understanding is that it uses your question to do a web search, brings the results into the context window, and then generates an answer that cites sources.

OpenAI could integrate StackOverflow the same way.
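
Roughly, the search-then-cite flow described above could be sketched like this. Everything here is hypothetical: `SearchResult`, `buildCitedPrompt`, the prompt wording, and the example.com data are made up for illustration, and this is not a claim about how Bing or OpenAI actually implement it.

```typescript
// Hypothetical shape of a web search hit; not any real API's type.
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

// Put numbered sources into the context window so the model
// can refer to them as [1], [2], ... in its answer.
function buildCitedPrompt(question: string, results: SearchResult[]): string {
  const sources = results
    .map((r, i) => `[${i + 1}] ${r.title} (${r.url})\n${r.snippet}`)
    .join("\n\n");
  return [
    "Answer the question using only the sources below.",
    "Cite sources by their bracketed number.",
    "",
    sources,
    "",
    `Question: ${question}`,
  ].join("\n");
}

// Example with placeholder data.
const citedPrompt = buildCitedPrompt("How do I get the index inside map?", [
  {
    title: "Example answer",
    url: "https://example.com/a",
    snippet: "The callback's second argument is the index.",
  },
]);
console.log(citedPrompt);
```

The point is only that attribution comes from the retrieved text sitting in the prompt, not from the model's weights.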


Doesn't Phind do this? It cites sources in its responses.


"The person you are upset with is technically permitted to do the thing that you are upset about" is not a good counter-argument to someone's distaste. Whether or not the licensing agreement _permits_ this usage, it is not the usage that the contributor (to whom you are replying) foresaw and was enthusiastic about.


I'm not telling them how to feel. They've been wrong for a long time.


[flagged]


Name calling and dismissive responses aren't going to win anyone over.

Please be more considerate.


[flagged]


One generally doesn't have to lean into phrases like "legitimate tactics" and "rhetorical power" when they've got the moral, ethical, or intellectual high ground. Telling people they're idiots is about the most counter-productive single strategy for addressing human stupidity ever conceived. 1. they won't believe you 2. they'll ignore everything else you have to say because you're a dick. So the real question is, who hurt you?


@dang Many individuals in this thread seem to require a gentle reminder regarding the expected etiquette on HN. https://news.ycombinator.com/newsguidelines.html


I think you're projecting something. Oblivion awaits you as it awaits these Gatekeepers of yours.


Oh your cheerleading here is going to age like milk when unemployment numbers start ramping up in white collar sectors. For the record, when construction and industrial jobs got deleted the chorus line was "retrain for service industry work". When service industry and white collar jobs really start getting the same treatment, what's the move now? We're literally running out of economic sectors to pretend folks can be funneled into.


All of this would be fine if the wealth were shared by the population. The big problem is that wealth is concentrated and only a small group will benefit from these technology shifts.


It's weird how our species has had evergreen problems around resource allocation for at least the last few thousand years.


> Oh your cheerleading here is going to age like milk when unemployment numbers start ramping up in white collar sectors.

You don't seem to understand that this is the goal. A very worthy one.

We won't get to a post-scarcity economy by doing the same things -- and the same jobs -- that got us this far.


You what now? You think AI is the path to luxury space communism? I'm missing the part where the 0.1% that owns and controls basically everything shrug and lean into redistribution of wealth...


They'll tell us to retrain for construction and heavy industry.


The price to get an answer from Stack Overflow is usually free, as most questions have already been asked and answered. You don't even need an account.


They do serve ads, we should probably stop pretending "funded by ads" is the same as free. Your attention isn't free.


Suppose I walk up to a tent at a festival that has a big sign that says "FREE BEER", and I ask a person there for a beer. They hand me a beer, and I go on my way. Was the beer free? I think it was free.

Now, suppose I walk up to a Budweiser-branded tent at a Budweiser festival that has a big sign with a Budweiser logo on it that says "FREE BEER", and I ask a person there who is wearing a Budweiser polo shirt, a Budweiser lanyard, and a Budweiser hat for a beer. They hand me a beer in a Budweiser-branded cup, and I go on my way. Was the beer free?

I think that both of these beers were free.


Now suppose you walk up to a tent that offers you free beer, but before they give it to you, you have to burn 2% of your phone's battery watching an ad from them. Then they hand you the beer and you go on your way. Was the beer free?


And they also put a tag on your ankle identifying you as someone who likes beer, so that beer salesmen can come knock on your door tonight.


We've somehow gone from this:

> They do serve ads [...] Your attention isn't free.

to something like this:

> They tag my ankle to mark me as a person who enjoys beer, and make me watch an ad until 2% of my phone's battery is depleted, and then they come to my home and knock on my door at night to sell me beer.

...which... I mean, huh?

Stack Overflow is invading your body, restricting your personal liberty, and visiting your home? Really? That's a fucking thing now?


I think they were extending the original point you were responding to, and remixing your own mixed metaphor of free beer.

In the attention economy, advertising has a cost that is borne by the advertiser and the consumer, up to and including loss of property rights in the case of content relicensure and trespass upon devices leading to excess battery usage, as well as loss of privacy due to geotargeted ads.


>I think they were extending the original point you were responding to, and remixing your own mixed metaphor of free beer.

Perhaps. But having been to many festival environments, I can definitely imagine a tent offering "free beer" that is actually approximately free -- both with, and without a slathering of advertising. (Actually, I don't really have to imagine it -- I've been there and have had that free beer.)

I can't imagine them coming to my house and knocking on my door at night to sell me more of it, though. That's absurd.

>In the attention economy, advertising has a cost that is borne by the advertiser and the consumer, up to and including loss of property rights in the case of content relicensure and trespass upon devices leading to excess battery usage, as well as loss of privacy due to geotargeted ads.

Well, sure. When viewed on a long-enough timeline, it becomes abundantly clear that nothing is actually free, comrade.

I can produce my own beer on a hypothetical plot of land that nobody owns, and that nobody else wants to use, and I can give someone one of these beers. For "free."

But it still has a cost. (And this, too, is an absurd reduction.)


> I can't imagine them coming to my house and knocking on my door at night to sell me more of it, though. That's absurd.

I interpreted that as a tongue-in-cheek hyperbolic metaphor relating to the ways that ad auction networks and other kinds of geofencing and geotargeting allow for deanonymization and reidentification of individuals for conversion tracking and behavioral analysis.

That’s the thing about these technologies - they’re dual-use in the sense that those who see the upsides use them generally with good intentions and ideally with affirmative consent. Just like the relicensed content, though, once the data is collected, the original creators, publishers, and third parties may not be able to control where it ends up, which is a negative externality, I think most would agree.


My question is "how valuable is your time?"

I think at a festival it's a little tricky to value (if it pulled you away from seeing your favorite band play a song, maybe this cost you the equivalent of $X, where that's what you would pay to see them perform that song. If no bands were playing, you walk over while chatting with friends - the same thing you'd be doing if there were no free beer tent - it was free)

When I'm on Stack Overflow my time is valuable. I'm programming, which can pay me something like $50-300/hour (maybe more?)

How expensive is the 1 second I spend reading an ad? Let's call it $50/3600. Is that expensive? By my most conservative estimate it's over 1¢.

Should we round that down to free given that I've spent hours/many page loads on stack overflow? I guess that's up to you.
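
Spelling out the arithmetic above as a quick sketch (the function name is made up; the $50 to $300 per hour range and the 1-second ad are the figures from this comment):

```typescript
// Cost of attention in cents: an hourly rate in dollars,
// spread over 3600 seconds, times the seconds spent on the ad.
function adCostCents(hourlyRateDollars: number, secondsSpent: number): number {
  return (hourlyRateDollars / 3600) * secondsSpent * 100;
}

console.log(adCostCents(50, 1).toFixed(2));  // "1.39"
console.log(adCostCents(300, 1).toFixed(2)); // "8.33"
```

So one second costs somewhere between about 1.4¢ and 8.3¢ at those rates.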


I mean, we can play that game if you want. Let's suppose that if we look hard enough, that every opportunity has a cost.

"Oh, a free concert downtown on Saturday? And you can pick me up at 2? Yeah, I do really like that band, and I sure would like to go -- that's pretty exciting, thanks for the invite!

But instead of making plans with you right now, I'd rather tell you about all of the ways I could be using my time on that Saturday afternoon instead.

No, no. It's not that I don't want to go. I just want to really drive home the idea that there's an opportunity cost to attending, so it can't really be free -- it can't be a free show for you, or for me, or for anyone else that goes. It's important to me that you realize that this "free concert" is anything but free.

Listen, I don't know what you mean by "dead-ass loser." I'm just being a realist here!

Oh, so now you're saying that you're not going to pick me up on Saturday? Some friend you are! I haven't even fully amortized this yet!"


I think we're maybe gleefully posting past each other, but the point I'm trying to hit is that business models matter. Stack overflow provides a service. It's a good service. They host a great q&a platform for developers and myriad other category enthusiasts.

However, they have a business model. They are categorically different than eg Wikipedia. It's important to understand that.

This business model matters because it tells you what economic forces will lead them to do. When business models break down at public companies, they commit acts of desperation. On an ad-run site that will mean more ads, more invasive ads, etc.

As you're forced to sit through 30s unskippable ads on YouTube I hope you think "I'm so glad this is free"


I mean... Over here in my little reality, I have never seen ads on YouTube or on Stack Overflow.


Unironically, folks are being triggered by trigger warnings now.[1]

Imagine how “free” the beer in your hypothetical scenario is to an alcoholic struggling to stay sober.

Capitalism commoditizes even protest against it and repackages it as a product or service.

None of this is to assign blame to good faith actors in a so-called free market, nor is it to abdicate responsibility on behalf of so-called free agents. Just a counterpoint.

[1] https://pjvogt.substack.com/p/what-do-trigger-warnings-actua...


What if someone took your answers, put them in a book, claimed they wrote everything themselves, and then sold the book for money?


Then they'd likely get sued, because the license for the answers is CC-BY-SA; putting them in a book, claiming they wrote everything themselves, and selling them are all against the license.

On the other hand, if they read my answers and wrote a book about what they learned (not copied), there'd be no issues.


Well if the book was doing well, I might clone it and sell a few copies myself

Let's be real, SO is a troubleshooting site. It's not our personal collection of code or project sources.

I don't expect to be paid when someone asks me for directions, and I'm sure lonely planet didn't source their guides 100% organically either.


What if I read your answers, claimed I learned everything myself, and sold my skills to a company for money?


That would be ok.


That would be a very different scenario. Learning isn’t copying, but that is.


You're being taken advantage of for a subscription product. It's one thing to give to a community, but it's wrong for an enterprise to come in and capitalize on the value of it. It's the equivalent of going into an animal sanctuary, slaughtering all the animals, and selling their pelts.


Your position lays bare the new and industry-destroying economic problem introduced by opaque-data-source LLMs. The economic value provided by the originator is captured fully and completely behind rentier models.

Beware the ease and convenience of all that "insane value". This way lies digital serfdom.


I would be fine with it if the 'AI' in question was free, and bonus if open source.

However, it is the product of yet another monolithic behemoth company that earns money on it and, I suspect, has nefarious motives to make profit.

That’s the whole key thing for me that makes me feel scammed. That and not asking for permission.

Future true AI would be potentially bigger than nuclear fission with all the consequences. Handling this in a petty capitalistic way makes me think the outcome will be close to fallout games that were supposed to be only an exaggeration.

Those companies must stop behaving like thieves. In fact it is a literal theft.


ChatGPT provides far more value than StackOverflow currently. It's not just trained on SO answers but on all of the manuals/help pages, GitHub issues and forum posts. In addition you can continue a conversation. No rigid format or gatekeeping like stackoverflow. I don't see a real use case for Stackoverflow now. If I want to ask humans, Discord/IRC channels are a far better option.


> No rigid format or gatekeeping like stackoverflow.

What bothers you about gatekeeping? I could guess, but I'm asking so you say it out loud. Then you can compare it against other problems, such as moats (competitive barriers).

OpenAI spent something like $3M on training GPT-3. This is a pretty big moat. But almost certainly more valuable in dollar terms is the first-mover advantage which provides millions of human eye-hours used for RLHF.

I wouldn't be so eager to trade the gatekeepers you so fear for even an openly available chat service that is happy to automate away as much information work as possible.

The Stack Overflow model is (was) pretty darn good -- people helped each other out, the company made money, some people got noticed for their skills, products got built faster and better (on the whole, I hope). Contrast the human-generated content era to what we have now, which appears to be the machine-ingesting content era. There are legions of lawsuits against companies scraping data without permission and/or attribution.


Those companies know it is unethical at best but make quick bucks before the laws and suits follow. It’s the Wild West era and they found the gold.

If it is unregulated then it will be exploited to the maximum profit, consequences be damned.


> I wouldn't be so eager to trade the gatekeepers you so fear for even an openly available chat service that is happy to automate away as much information work as possible.

Don't flatter yourself. People want to solve their problems so that they can build what they want to. They don't have time for shenanigans from internet jerks who get their validation from imaginary internet points.


It can't reliably cite its source for an answer.


Hardly matters for Stack Overflow-like questions if the provided solutions work and solve the problem you're having, which for me happens the majority of the time (with GPT-4, not the free version).


If you copy-paste solutions from SO then please at least cite your sources and their license (CC-BY-SA).


You might not want to hear this, but no one does this. Should they? Probably. But most people don't use Ctrl+C, Ctrl+V in the first place for SO answers.


Just a single data point, but when I copy & paste a snippet from Stack Overflow, I always add a comment "// source: https://stackoverflow.com/questions/xxx#yyy".

I both find it respectful of who wrote the answer in the first place and useful for future users of the code: the Stack Overflow answer often provides context and explanation for what would otherwise be an obscure piece of code.

Pretty darn useful if you ask me: those who want to have more information can follow the link, casual readers can skip it, and the whole process is fair to the author.
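
As a rough illustration of that habit, here is a sketch of a helper that builds such a comment. The `SnippetSource` type, the `attributionComment` function, and the "some-user" handle are hypothetical, and the xxx#yyy placeholders are left as placeholders; adding the author and license is an assumption about what a fuller attribution might include, not something the license text is quoted on here.

```typescript
// Hypothetical record of where a borrowed snippet came from.
interface SnippetSource {
  url: string;     // link to the answer
  author: string;  // the answerer's handle
  license: string; // e.g. "CC BY-SA 4.0"
}

// Builds the one-line comment to paste above the borrowed snippet.
function attributionComment(src: SnippetSource): string {
  return `// source: ${src.url} (by ${src.author}, ${src.license})`;
}

console.log(
  attributionComment({
    url: "https://stackoverflow.com/questions/xxx#yyy",
    author: "some-user",
    license: "CC BY-SA 4.0",
  })
);
```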


I don't think I've ever copied enough from Stackoverflow for copyright to become relevant. Rarely more than one line verbatim.

It embarrasses me to think that somebody should feel obliged to cite me when they use one of my answers. I don't know how to take the partnership with OpenAI though. They bill me when I use their service; it's not collaborative like Stackoverflow.


No one should copy paste any solutions from anywhere. FWIW, 99% of the content in SO is hardly "original", mostly copy-pasted themselves from previous solutions or original user guide/manuals.


In general I'd agree that it's best to use answers just as a guide. That said, I wasn't trying to pass judgement, just asking for attribution, which is a best practice and often required by the license itself.


I'd rather not go round in circles while ChatGPT feeds me bullshit information. When this happens I go to Google and read a SO answer with the correct information and also get an informed discussion around the subject.

For the easy answers LLMs are fine, but I usually want an answer to a niche issue or edge case, where LLMs have to be constantly told they are plain wrong, before getting to something resembling an answer.


[flagged]


You've been breaking the site guidelines so often and so badly that I've banned this account:

https://news.ycombinator.com/item?id=40306506

https://news.ycombinator.com/item?id=40306495

https://news.ycombinator.com/item?id=40304632

https://news.ycombinator.com/item?id=39686999

https://news.ycombinator.com/item?id=39406496

https://news.ycombinator.com/item?id=38374129

https://news.ycombinator.com/item?id=38327047

If you don't want to be banned, you're welcome to email hn@ycombinator.com and give us reason to believe that you'll follow the rules in the future. They're here: https://news.ycombinator.com/newsguidelines.html.


No it doesn't. It is overly censored.


Maybe a low price for you but not for everybody.


ChatGPT serves 3.5 for free. You can run llama locally for free. Lmsys is free.


You think that will stay this way?

It will either become paywalled or full of ads.


I listed 3 things that are free.

Personally, I don't think ChatGPT will start running ads in the next ten years. However, let's assume that it does.

Lmsys is for research, I suspect if it runs ads it will be like godbolt (a small ad from a relevant sponsor).

Llama 2 and 3 can always be run locally without ads. I make no claims about future versions.



