This might be a controversial opinion, but to me it seems kinda petty to remove your answers from SO to stop OpenAI from improving their models.
Honestly I don't quite understand all the reluctance around AI. It will either improve productivity, or it will be irrelevant. If it doesn't provide quality, it will be outcompeted.
The only consistent argument I can see against AI requires us to view employment as a social welfare program, but if that's the case, why aren't we considering an actual social welfare program?
The claim I've heard is that you're essentially feeding your own knowledge for free into a proprietary system that can be used to generate cash for whatever corporation owns that system (i.e. ChatGPT from OpenAI). I think it's pointless to redact content for this purpose as well but clearly some people have strong takes against AI training.
It's funny, since Stack Overflow has done EXACTLY this since day one (i.e. generate cash, with user knowledge provided for free).
The only difference is SO uses community, gamification & reputation facades to convince users to participate for free.
With OpenAI it's simply a black box; no credit is given.
So I guess the lesson is people are willing to participate and share things for free, as long as they're given credit, community standing or something along those lines.
For me it is less about credit and more about access. Stack Overflow is public and freely available - I'll give answers for the benefit of the community. ChatGPT is a product; it's locked behind accounts and limited unless you're paying.
They changed the deal on their end? I’ll delete my posts.
But your answers will still be available on SO, unless you remove them. Your answers were free and publicly available until you removed them. Making them also available to paying customers of ChatGPT does not change that at all.
In fact, ChatGPT will probably still be able to answer those questions, so removing your answers actually only removes them from the public, thus pushing people toward a paid product instead.
You had one goal, and your actions achieved the opposite.
Siuan Sanche's law of unintended consequences ought to be taught in primary school. Unfortunately it isn't.
> The claim I've heard is that you're essentially feeding your own knowledge for free into a proprietary system that can be used to generate cash for whatever corporation owns that system
Which is exactly how Stack Overflow has operated from day 1: You feed your knowledge into a system owned by someone else.
Also, it’s ridiculous to think that the answers haven’t already been scraped and cataloged every which way for AI training purposes.
The only people who suffer at this point are the people trying to use Stack Overflow. Deleting posts now is an own goal. People will see the information missing from Stack Overflow and switch to asking ChatGPT.
The difference between Stack Overflow and a predecessor like Experts Exchange was that SO explicitly wasn't making people pay to access that knowledge.
It was to make the internet better. And it did. I've learned a lot through SO sites and if the votes are to be believed, I've made the internet better for tens of thousands of people.
I don't know what AI having access to my content does but I don't think it changes the sums. I answer things, people benefit, SO makes money somehow.
> you're essentially feeding your own knowledge for free into a proprietary system that can be used to generate cash for whatever corporation owns that system
I've been contemplating this as well. There's a big difference between that quote and this quote:
> you're essentially feeding your own knowledge for free into an open system (the web) that can be used to generate cash for whatever corporation or person best utilizes that system
But that was also always allowed under Stack Overflow’s terms. That’s why there are so many Stack Overflow clones.
It's ironic that Stack Overflow went with very open and permissive licenses, which have normally been very popular with the community, yet people are outraged when the data is actually being used openly.
At least before, the answers would benefit the whole community, fulfilling the spirit of CC licensing. Once it's fed to an LLM, it's essentially a form of laundering, as it's dubious whether the output of the models will also be free under CC. The "open" in OpenAI is effectively false advertising. It's a proprietary enterprise misleading people by pretending to be open.
It still does, though. I don't see why the "whole community" can't include OpenAI. I mean, I've used knowledge gleaned from SO for the benefit of many large corporations that employed me in the past; that's not new.
"The whole community" and OpenAI are not mutually exclusive, despite what you may feel about it.
I think people feel differently about contributing to SO or wikipedia or even quora than they would labelling CIFAR images for instance. Maybe it's a distinction without a difference but people don't usually contribute to things like stack overflow with the objective of training an AI model.
Their efforts can be used to create a silo of information that others are required to pay for (e.g. when SO shuts down, is inaccessible, or is otherwise made non-functional).
Their effort might be used to create completely wrong or even harmful content, with the training material used only to learn how to convince people to believe the AI's output.
Yes, this was all possible before, without LLMs, done by humans (or machine translation for example).
But not at this scale. SO and the comments there were still the authoritative source, and written by humans (without the need for any proof...).
But there is a growing cohort of people who want to use AI as a knowledge black box, search engine, encyclopedia, even an authoritative source.
And with it comes the intent to cleanse this "knowledge" of any individual authorship or traceable source.
It is not an odd stance to oppose this, even if the concrete actions expressing this stance might be futile in this particular case.
> This might be a controversial opinion, but to me it seems kinda petty to remove your answers from SO to stop OpenAI from improving their models.
It’s reminiscent of Reddit moderators trying to shut down their subreddits in protest recently.
At the time they thought they were doing what everybody wanted, forcing Reddit executives to bend to their will, and taking control of the platform.
In reality, the only people who were inconvenienced were the users who couldn’t get info they needed for a few weeks. The vast majority of people didn’t care about the issue because they didn’t use 3rd party clients (especially those who were casual users, unlike the power mods). A few months later the protests are all but forgotten and the sentiment has shifted so far that Reddit mods are viewed as the villains most of the time.
Six months from now, I bet this protest will also be forgotten. When people click on Google links to Stack Overflow and find the key answer missing, the villain will be the SO user who threw a tantrum and deleted their post, with the original drama being long forgotten.
Personally, I'm kinda there already. Petty users removing answers out of spite are the villains. Doing stuff out of spite usually makes you the villain, and this is not an exception to that.
This is only a good thing if you benefit from overall increases in productivity. Most of us don't benefit at an individual level.
The general population has been receiving sharply diminishing returns from widespread increases in productivity; the ownership class has been capturing the majority of the benefit for far too long.
Until you can convince me that this will be fixed, I don't see why I should be embracing anything that "improves productivity".
Following this logic, how do you feel about automated tests? Without those, there'd be a much larger market for manual testers than there is currently.
I'm just confused about this standpoint, you seem to be opposed to actually delivering value to your employer? Sounds like you have a bad boss or something, maybe switch jobs? Or heck, create your own company if you want a bigger piece of the cake.
I'm probably missing something about your philosophy, because from where I'm standing, it kinda seems like you're saying that because you have learned programming, you're not only entitled to free money, but you also want an equal share to those who have both put in work and taken on risk to create the company you work for?
Which all kinda sounds like what you actually want is UBI, but only for you and your friends (who have higher education).
> Following this logic, how do you feel about automated tests? Without those, there'd be a much larger market for manual testers than there is currently.
This is a great example
With automated tests, manual QA positions are cut. Now the responsibility to write those tests falls on developers. Sometimes, rarely, there are automation QA people, but in most places the tests are written by the devs, in my experience.
So the company saves money on a bunch of salaries and captures the increased productivity that automated tests offer; executives get bigger bonuses, the investors get a nice bump, and the individuals who now have the extra responsibility get nothing.
Maybe the overall developer market gets a higher salary, eventually, if you believe in the trickle-down effect.
> but you also want an equal share to those who have both put in work and taken on risk to create the company you work for
I don't expect to earn an equal share, but I do expect to benefit when the company does. This is more and more rare.
> Which all kinda sounds like what you actually want is UBI, but only for you and your friends
You could not be further from the truth. People work harder for a smaller share every day. I just want that share to start growing, or at least stop shrinking.
I suppose, yeah, failing that I want to not have to work as hard. I want some benefit from all of this so-called productivity, instead of all of it going to rich people who lecture me about how much "risk" they take gambling some percentage of their net worth that is more than I have made in my lifetime.
So I actually agree with you on the social issues, but I really don't think targeting automation is the way forward.
If we want to work less, we need to automate more to make that even theoretically possible. The social situation of just a few people benefiting is orthogonal to the technology being used.
Which is why targeting specific tech in a fight for social justice is misguided, ineffective, and probably even counterproductive.
It seems they are unhappy about OpenAI, as a single corporate entity, getting all that power. They aren't convinced that OpenAI won't just snowball, leaving everyone else behind.
Yeah, that's why (and maybe this is what you're referring to) a lot of people like Universal Basic Income.
However, the realistic issue is that, looking at the last few decades, wealth doesn't spread evenly. The Internet has helped create huge, almost global online monopolies (Google, Amazon, etc.) at an unprecedented rate, instead of spreading the wealth evenly.
So AI will most likely follow a similar trajectory: the productivity gains will be monopolised. And we don't have a way of redistributing wealth after the fact in Western capitalism (maybe rightly so), which is why people are afraid.
"However, we must not forget that AI needs to learn as well from vast sources."
Well, that's actually the problem. This current wave of AI is not "learning" anything, really. An AI with any sort of generalizable reasoning ability would just need basic sources on programming syntax and semantics and could figure the rest out on its own. Here, instead, we see the need to effectively memorize variations of the same thing, say, answers to related programming questions, so that they can be part of an intelligent-sounding response.
I was dubious about the value of GenAI as a search tool at first, but now I see that it's actually well suited to the role. These massive models largely store information in a compressed form and are great at retrieving it and doing basic rewrites. The next evolution of expert systems, I suppose, although lacking strong reasoning.
Exactly, imagine thinking you can learn how to program Java or C just from being handed the language specification, or even learn how to play chess just by being told the rules of the game.
Humans don't learn anything of substance just from being told the strict rules; we also learn from a wealth of examples expressed through a variety of means, some formal, some poetic, some even comedic.
Heck, we wouldn't even need Stack Overflow to begin with if we could learn things just from basic sources.
> imagine thinking you can learn how to program Java or C just from being handed the language specification
Throw in the standard library documentation, and that's exactly how many of us learned how to program before projects like stackoverflow or even the web existed for the public. We took those rules, explored the limits of them using the compiler, and learned.
Stackoverflow is, IMO, a shortcut through portions of the exploration and learning phase. Not a bad thing, but importantly it's not required either.
Since we are the only general intelligences we know of so far, I'd say that's support for the assertion that an AI with general reasoning abilities wouldn't need SO or other examples to figure out how to do specific tasks.
As long as SO feels some pain, I'm good with it. Reward good behavior, punish bad. It's not a difficult concept. OpenAI may get what it wants, but SO will also pay for engaging with them in the first place.
> One theory relates to the invention of the Jacquard loom by Frenchman Joseph Marie Jacquard in 1801. The Jacquard loom uses punch cards to automate the raising and lowering of warp threads, allowing one to achieve many more sheds than a human weaver could ever grapple with. This invention, however, put many textile workers out of a job. The theory is that these employees threw their wooden clogs (the French word for these clogs is "sabots") into the delicate machinery to wreak havoc.
Tough situation. The amount of time I spend on Stack Overflow has probably dropped by 90% since I started paying for ChatGPT4. Clearly it's only capable of what it is because of the massive amount of data, like Stack Overflow, that it was trained on.
I have been unimpressed with ChatGPT4's ability to generate code for problems of even medium complexity. Even if given partial or complete code from another language and told to translate it!
If we're seeing a heavy drop in Stack Overflow usage, then my guess is that Stack Overflow was getting most of its traffic from some very basic queries and ChatGPT is eating that base out from under them. Better for Stack Overflow to partner with OpenAI and focus on serving the higher end that they have left.
There is basically no information out there on how to write error-tolerant parsers for language servers. My entire knowledge before starting work on this was a three-sentence explanation someone gave me on the F# Discord of an approach that uses an intermediary AST for a tolerant first pass.
The key with handling medium and large complexity tasks with an LLM is to break it up into less complex tasks.
First, I showed it an example of a very simple parser that parses floats between brackets and asked for a version that parsed just strings between brackets. Then I asked:
I'm working on a language server for a custom parser. What I really want to do is make sure that only floats are between brackets, but since I want to return multiple error messages at a time for the developer I figure I need to loosely check for a string first, build the AST, and then parse each node for a float. Does this seem correct?
I get a response with some of the code and then specifically ask for this case:
can you show the complete code, where I can give "[1.23][notAFloat]" as input and get back the validatedAST?
There's an error in the parser so I paste in the logged message. It corrects the error, so I then ask:
now, instead of just "Error", can we also get the line and char numbers where the error occurred?
There's some more back and forth, but in just a few iterations I've got what amounts to a tutorial on using FParsec to create error-tolerant parsers with line and column reporting, ready for integration with the language server protocol.
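For the curious, the end result looked roughly like this - a minimal sketch of the two-pass idea rather than the exact code from the chat, and names like RawNode and validate are my own:

```fsharp
open System
open FParsec

// Loose AST node: the raw text plus the position where its bracket opened.
type RawNode = { Text: string; Line: int64; Col: int64 }

// Pass 1 (tolerant): accept *any* text between brackets, recording positions,
// so one malformed value doesn't abort the whole parse.
let rawNode : Parser<RawNode, unit> =
    pipe2 getPosition
          (between (pchar '[') (pchar ']') (manyChars (noneOf "]")))
          (fun pos text -> { Text = text; Line = pos.Line; Col = pos.Column })

let rawAst = many rawNode .>> eof

// Pass 2 (strict): validate each node as a float, collecting every error
// instead of stopping at the first one.
let validate (nodes: RawNode list) =
    nodes
    |> List.map (fun n ->
        match Double.TryParse n.Text with
        | true, v  -> Ok v
        | false, _ -> Error (sprintf "Line %d, col %d: '%s' is not a float"
                                     n.Line n.Col n.Text))

// "[1.23][notAFloat]" yields Ok 1.23 plus one positioned error.
match run rawAst "[1.23][notAFloat]" with
| Success (nodes, _, _) -> validate nodes |> List.iter (printfn "%A")
| Failure (msg, _, _)   -> printfn "%s" msg
```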
If anyone would like to point me in the direction of such a tutorial that already exists I would very much appreciate it!
I wonder if there could be a way to label my data as GPL. When I answer an SO question, I do so because I want to help a random person. If a person wants to make an AI model from my answers, I'm also okay with that. But if they hoard my data and build a walled garden from it, I'm no longer happy with that. If OpenAI were actually open, it would probably annoy fewer people.
> Stack Overflow users who have tried to delete their programming source code to prevent it from being used by OpenAI have been either banned or suspended by the site moderators.
Looks like a class action suit in the making: banning users for doing what was already allowed.