Stack Overflow Community Is Not Happy with the OpenAI Deal (favtutor.com)
75 points by thunderbong 14 days ago | 57 comments



This might be a controversial opinion, but to me it seems kinda petty to remove your answers from SO to stop OpenAI from improving their models.

Honestly I don't quite understand all the reluctance around AI. It will either improve productivity, or it will be irrelevant. If it doesn't provide quality, it will be outcompeted.

The only consistent argument I can see against AI requires us to view employment as a social welfare program, but if that's the case, why aren't we considering an actual social welfare program?


The claim I've heard is that you're essentially feeding your own knowledge for free into a proprietary system that can be used to generate cash for whatever corporation owns that system (i.e. ChatGPT from OpenAI). I think it's pointless to redact content for this purpose as well but clearly some people have strong takes against AI training.


It's funny, since Stack Overflow has done EXACTLY this since day one (i.e. generate cash, with user knowledge provided for free).

The only difference is that SO uses community, gamification, and reputation facades to convince users to participate for free.

With OpenAI it's simply a black box; no credit is given.

So I guess the lesson is people are willing to participate and share things for free, as long as they're given credit, community standing or something along those lines.


For me it is less about credit and more about access. Stack Overflow is public and freely available, so I'll give answers for the benefit of the community. ChatGPT is a product: it's locked behind accounts and limited unless you're paying.

They changed the deal on their end? I’ll delete my posts.


But your answers will still be available on SO, unless you remove them. Your answers were free and publicly available until you removed them. Making them also available to paying customers of ChatGPT does not change that at all.

In fact, ChatGPT will probably still be able to answer those questions, so removing your answers actually only removes them from the public, thus forcing people to use a paid product instead.

You had one goal, and your actions achieved the opposite.

Siuan Sanche's law of unintended consequences ought to be taught in primary school. Unfortunately it isn't.


That's a completely silly reply. SO didn't lock me out of the content I created there and charge me $20/mo to re-use it.

> The claim I've heard is that you're essentially feeding your own knowledge for free into a proprietary system that can be used to generate cash for whatever corporation owns that system

Which is exactly how Stack Overflow has operated from day 1: You feed your knowledge into a system owned by someone else.

Also, it’s ridiculous to think that the answers haven’t already been scraped and cataloged every which way for AI training purposes.

The only people who suffer at this point are the people trying to use Stack Overflow. Deleting posts now is an own goal. People will see the information missing from Stack Overflow and switch to asking ChatGPT.


That was always the situation.

The difference between Stack Overflow and one predecessor, e.g. Experts Exchange, was that SO explicitly wasn't making people pay to access that knowledge.

It was to make the internet better. And it did. I've learned a lot through SO sites and if the votes are to be believed, I've made the internet better for tens of thousands of people.

I don't know what AI having access to my content does but I don't think it changes the sums. I answer things, people benefit, SO makes money somehow.


> you're essentially feeding your own knowledge for free into a proprietary system that can be used to generate cash for whatever corporation owns that system

I've been contemplating this as well. There's a big difference between that quote and this quote:

> you're essentially feeding your own knowledge for free into an open system (the web) that can be used to generate cash for whatever corporation or person best utilizes that system


This feels more akin to a company mirroring stackoverflow and passing it off as its own, which I would object to.


But that was also always allowed under Stack Overflow’s terms. That’s why there are so many Stack Overflow clones.

It's ironic that Stack Overflow went with very open and permissive licenses, which have normally been very popular with the community, yet people are outraged when the data is actually being used openly.


Aren't people already feeding knowledge into Stack Overflow for free, which is a proprietary system used to generate cash?


At least before, the answers would benefit the whole community, fulfilling the spirit of CC licensing. Once it's fed to an LLM, it's essentially a form of laundering, as it's dubious that the output of the models will also be free under CC. The "open" in OpenAI is effectively false advertising. It's a proprietary enterprise misleading people by pretending to be open.

It still does, though. I don't see why "the whole community" cannot include OpenAI. I mean, I've used knowledge gleaned from SO for the benefit of many large corporations who employed me in the past; that's not new.

"The whole community" and OpenAI are not mutually exclusive, despite what you may feel about it.


Because then OpenAI won't honor the license once content goes through the LLM information-laundering machine.

Yeah, but you get points and badges for feeding it free knowledge, they get the cash, go figure. It's the perfect pre-NFT grift.


>> feeding your own knowledge for free into a proprietary system

Isn't that exactly what stackoverflow was?


I think people feel differently about contributing to SO or Wikipedia or even Quora than they would about labelling CIFAR images, for instance. Maybe it's a distinction without a difference, but people don't usually contribute to things like Stack Overflow with the objective of training an AI model.


It is an odd stance as their efforts still help others, though a different company benefits financially.

I don't find anything odd about it.

There is no (reliable) attribution.

Their efforts can be used to create a silo of information that others are required to pay for (e.g. when SO shuts down, is inaccessible, or is otherwise made non-functional).

Their effort might be used to create completely wrong or even harmful content, with the training material used only to learn how to convince people to believe the AI output.

Yes, this was all possible before, without LLMs, done by humans (or machine translation for example).

But not at this scale. SO and the comments there were still the authoritative source, and written by humans (without the need for any proof...)

But there is a growing cohort of people who want to use AI as a knowledge black box, search engine, encyclopedia, even as an authoritative source.

And with it comes the intent to cleanse this "knowledge" of any individual authorship or traceable source.

It is not an odd stance to oppose this, even if the concrete actions expressing this stance might be futile in this particular case.


But what happens after a long time, when most of the content on the internet is AI generated and thus knowledge is limited to what the AI knows?

It might improve productivity in the short term but when we start experiencing the long term effects, it might be too late to go back.


Seems like a very unlikely scenario. And as far as doomsday scenarios go, this is fairly benign.

If people getting stupid is the worst possible outcome, AI is a lot safer than I thought, and I was already on board



> This might be a controversial opinion, but to me it seems kinda petty to remove your answers from SO to stop OpenAI from improving their models.

It’s reminiscent of Reddit moderators trying to shut down their subreddits in protest recently.

At the time they thought they were doing what everybody wanted, forcing Reddit executives to bend to their will, and taking control of the platform.

In reality, the only people who were inconvenienced were the users who couldn’t get info they needed for a few weeks. The vast majority of people didn’t care about the issue because they didn’t use 3rd party clients (especially those who were casual users, unlike the power mods). A few months later the protests are all but forgotten and the sentiment has shifted so far that Reddit mods are viewed as the villains most of the time.

Six months from now, I bet this protest will also be forgotten. When people click on Google links to Stack Overflow and find the key answer missing, the villain will be the SO user who threw a tantrum and deleted their post, with the original drama being long forgotten.


Personally, I'm kinda there already. Petty users removing answers out of spite are the villains. Doing stuff out of spite usually makes you the villain, and this is not an exception to that.

> It will either improve productivity

This is only a good thing if you benefit from overall increases in productivity. Most of us don't benefit at an individual level

The general population has been receiving very diminishing returns from widespread increases in productivity, the ownership class has been capturing a majority of the benefit for far too long

Until you can convince me that this will be fixed, I don't see why I should be embracing anything that "improves productivity"


Following this logic, how do you feel about automated tests? Without those, there'd be a much larger market for manual testers than there is currently.

I'm just confused about this standpoint, you seem to be opposed to actually delivering value to your employer? Sounds like you have a bad boss or something, maybe switch jobs? Or heck, create your own company if you want a bigger piece of the cake.

I'm probably missing something about your philosophy, because from where I'm standing, it kinda seems like you're saying that because you have learned programming, you're not only entitled to free money, but you also want an equal share to those who have both put in work and taken on risk to create the company you work for?

Which all kinda sounds like what you actually want is UBI, but only for you and your friends (who have higher education).

What am I missing?


> Following this logic, how do you feel about automated tests? Without those, there'd be a much larger market for manual testers than there is currently.

This is a great example

With automated tests, manual QA positions are cut. Now the responsibility to write those tests falls on developers. Sometimes, rarely, there are automation QA people, but in most places the tests are written by the devs, in my experience.

So the company is saving money on a bunch of salaries, the company captures the increased productivity that automated tests offer, executives get bigger bonuses, the investors get a nice bump, the individuals who now have extra responsibility get nothing

Maybe the overall developer market gets a higher salary, eventually, if you believe in the trickle down effect

> but you also want an equal share to those who have both put in work and taken on risk to create the company you work for

I don't expect to earn an equal share, but I do expect to benefit when the company does. This is more and more rare

> Which all kinda sounds like what you actually want is UBI, but only for you and your friends

You could not be further from the truth. People work harder for a smaller share every day. I just want that share to start growing, or at least stop shrinking.

I suppose, yeah, failing that, I want to not have to work as hard. I want some benefit from all of this so-called productivity, instead of all of that going to rich people who lecture me about how much "risk" they take gambling some percent of their net worth that is more than I have made in my lifetime.


So I actually agree with you on the social issues, but I really don't think targeting automation is the way forward.

If we want to work less, we need to automate more to make that even theoretically possible. The social situation of just a few people benefiting is orthogonal to the technology being used.

Which is why targeting specific tech in a fight for social justice is misguided, ineffective, and probably even counterproductive.


It seems they are unhappy about OpenAI as a single company entity getting all that power. They aren't convinced that OpenAI won't just snowball, leaving everyone else behind.


Yeah, that's why (and maybe you are referring to this) a lot of people like Universal Basic Income.

However, the realistic issue is that, looking at the last decades, wealth doesn't spread evenly. The Internet has helped create huge, almost global online monopolies (Google, Amazon, etc.) at an unprecedented rate, instead of spreading the wealth evenly.

So AI will most likely have a similar trajectory: the productivity gains will be monopolised. And we don't have a way of distributing wealth after the fact in Western Capitalism (maybe rightly so), so this is why people are afraid.


"However, we must not forget that AI needs to learn as well from vast sources."

Well, that's actually the problem. This current wave of AI is not really "learning" anything. An AI with any sort of generalizable reasoning ability would just need basic sources on programming syntax and semantics and figure the rest out on its own. Here, instead, we see the need to effectively memorize variations of the same thing, say, answers to related programming questions, so that they can be part of an intelligent-sounding response.

I was dubious about the value of GenAI as a search tool at first, but now see that it's actually well suited for the role. These massive models are largely storing information in a compressed form and are great at retrieving it and doing basic rewrites. The next evolution in Expert Systems, I suppose, although lacking strong reasoning.


> An AI with any sort of generalizable reasoning ability would just need basic sources

That is a completely unsupportable assertion.


No, it's the very definition of being able to generalize.


Exactly, imagine thinking you can learn how to program Java or C just from being handed the language specification, or even learn how to play chess just by being told the rules of the game.

Humans don't learn anything of substance just from being told the strict rules; we also learn from a wealth of examples expressed through a variety of means, some formal, some poetic, some even comedic.

Heck, we wouldn't even need Stack Overflow to begin with if we could learn things just from basic sources.


> imagine thinking you can learn how to program Java or C just from being handed the language specification

Throw in the standard library documentation, and that's exactly how many of us learned how to program before projects like Stack Overflow or even the web existed for the public. We took those rules, explored the limits of them using the compiler, and learned.

Stack Overflow is, IMO, a shortcut through portions of the exploration and learning phase. Not a bad thing, but importantly it's not required either.


Yeah, I was programming from way before Stack Overflow existed, and I simply call BS.

No one learned Java or C by reading either of the following documents:

https://docs.oracle.com/javase/specs/jls/se10/html/index.htm...

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf

It's cool to romanticize the past though.


I mean...

Humans do this.

Not perfectly, no, but we do.

As the only general intelligences we know of so far, I'd say it's support for the assertion that an AI with general reasoning abilities wouldn't need SO or other examples to figure out how to do specific tasks.


I highly doubt deleting posts will matter; Stack Overflow will most likely still serve them to OpenAI, would be my guess.


Yup. OpenAI will have the answers and humans won’t except through OpenAI.


Especially if enough people maliciously pollute the commons in support of their political ideology.


As long as SO feels some pain, I'm good with it. Reward good behavior, punish bad. It's not a difficult concept. OpenAI may get what it wants, but SO will also pay for engaging with them in the first place.

When was the Stack Overflow community ever happy?


> One theory related to the invention of the Jacquard loom by Frenchman Joseph Marie Jacquard in 1801. The Jacquard loom uses punch cards to automate the raising and lowering of warp threads, allowing one to achieve many more sheds than a human weaver could ever grapple with. This invention, however, put many textile workers out of a job. The theory is that these employees threw their wooden clogs (the French word for these clogs is “sabots”) into the delicate machinery to wreak havoc.

"The Origin of the Word Sabotage" ( https://handwovenmagazine.com/origin-of-the-word-sabotage/ )


Tough situation. The amount of time I spend on Stack Overflow has probably dropped by 90% since I started paying for ChatGPT4. Clearly it's only capable of what it is because of the massive amount of data, like Stack Overflow, that it was trained on.


I have been unimpressed with ChatGPT4's ability to generate code for problems of even medium complexity. Even if given partial or complete code from another language and told to translate it!

If we're seeing a heavy drop in Stack Overflow usage, then my guess is that Stack Overflow was getting most traffic from some very basic queries and ChatGPT is eating that base out from under them. Better for Stack Overflow that they partner with OpenAI and focus on serving the higher end that they have left.


Does this interaction seem like low complexity?

https://chatgpt.com/share/e92bf633-4815-45bc-8ae9-9068fe257d...

[EDIT: pasted in wrong link]

There is basically no information out there on how to write error-tolerant parsers for language servers. My entire knowledge before starting work on this was someone giving me a three-sentence explanation on the F# Discord for an approach using an intermediary AST doing a tolerant first pass.

The key to handling medium- and large-complexity tasks with an LLM is to break them up into less complex tasks.

First, I showed an example of a very simple parser, parsing floats between brackets and asked for a response that parsed just strings between brackets, then I asked:

I'm working on a language server for a custom parser. What I really want to do is make sure that only floats are between brackets, but since I want to return multiple error messages at a time for the developer I figure I need to loosely check for a string first, build the AST, and then parse each node for a float. Does this seem correct?

I get a response of some of the code and then specifically ask for this case:

can you show the complete code, where I can give "[1.23][notAFloat]" as input and get back the validatedAST?

There's an error in the parser so I paste in the logged message. It corrects the error, so I then ask:

now, instead of just "Error", can we also get the line and char numbers where the error occurred?

There's some more back and forth but in just a few iterations I've got what amounts to a tutorial on using FParsec to create error tolerant parsers with line and column reporting ready for integration with the language server protocol.
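The two-pass approach described above (a loose structural pass that collects every bracketed group, then a validation pass that checks each node is a float and reports all errors with positions) can be sketched outside of FParsec. Below is a minimal illustration in Python rather than the commenter's F#; the `Node` type and function names are invented for this sketch, under the assumption that input looks like `"[1.23][notAFloat]"`.

```python
import re
from dataclasses import dataclass

@dataclass
class Node:
    """A loosely parsed bracket group, with its source position."""
    text: str
    line: int
    col: int

def loose_parse(source):
    """First pass: accept ANY '[...]' group, recording line/col.
    This never fails, so every group reaches the validation pass."""
    nodes = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for m in re.finditer(r"\[([^\]]*)\]", line):
            nodes.append(Node(m.group(1), line_no, m.start() + 1))
    return nodes

def validate(nodes):
    """Second pass: keep nodes whose text parses as a float;
    collect an error message for each one that doesn't,
    instead of stopping at the first failure."""
    valid, errors = [], []
    for node in nodes:
        try:
            valid.append(float(node.text))
        except ValueError:
            errors.append(f"line {node.line}, col {node.col}: "
                          f"'{node.text}' is not a float")
    return valid, errors

values, errors = validate(loose_parse("[1.23][notAFloat]"))
# values == [1.23]; errors reports the second bracket group with its position
```

Because the first pass never rejects input, the developer gets every diagnostic in one go, which is the property a language server needs.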

If anyone would like to point me in the direction of such a tutorial that already exists I would very much appreciate it!


You need an account to see your link.


I wonder if there could be a way to label my data as GPL. When I answer an SO question, I do so because I want to help a random person. If a person wants to make an AI model from my answers, I'm also okay with that. But if they hoard my data and build a walled garden from it, I'm no longer happy with that. If OpenAI were actually open, that would probably annoy fewer people.


> I wonder if there could be a way to label my data as GPL.

You can claim whatever you want, but by posting on the site you’ve given them a license as well. It’s a condition of using the website.

You’re free to license your content separately and post it on your own website, but anything you post on Stack Overflow goes under their rules.



I guess it was inevitable. I feel that whatever decision they make, even canceling the deal, it can't get better. The ship has sailed.


> Stack Overflow users who have tried to delete their answers to prevent them from being used by OpenAI have been either banned or suspended by the site moderators.

Looks like a class action suit in the making: banning users for doing what was already allowed.


What is the complaint in this suit you're hypothesizing? How would damages be computed?

Lawyers like to sue, so I am sure some lawyers are looking. Also, deleting right now is allowed on all social apps.

Well, forget about new answers.



