> Published content will be later used to train subsequent models, and being able to distinguish AI from human input may be very valuable going forward
I find this to be a particularly interesting problem in this whole debacle.
Could we end up having AI quality trend downwards due to AI ingesting its own old outputs and reinforcing bad habits? I think it's a particular risk for text generation.
I've already run into scenarios where ChatGPT generated code that looked perfectly plausible, except that the actual API used didn't really exist.
Now imagine a myriad of fake blogs using ChatGPT under the hood to generate blog entries explaining how to solve commonly searched-for problems, which then get spidered and fed into ChatGPT 2.0. Such things could end up creating a downwards trend in quality, as more and more of such junk gets posted, absorbed into the model and amplified further.
I think image generation should be less vulnerable to this: all images need tagging to be useful, "ai generated" is a common tag that can be used to exclude old outputs from being reingested, and with artwork precision doesn't matter so much anyway. If people like the results, then it doesn't matter that much that something isn't drawn realistically.
> Could we end up having AI quality trend downwards due to AI ingesting its own old outputs and reinforcing bad habits? I think it's a particular risk for text generation.
This is exactly what I don't like about Copilot, maybe even more than the IP ethics of it. If it really succeeds, it's going to have a feedback loop that amplifies its own code suggestions. The same boilerplate-ish kind of code that developers generate over and over will get calcified, even if it's suboptimal or buggy. If we really need robots to help write our code, to me that means we don't yet have expressive enough languages.
Today's programmers can see Copilot output and probably think "well, that's not optimal". Fast forward five years: new CS grads are using Copilot 3.0 and are used to the specific auto-completes Copilot gives for certain tasks, as they may never have needed to go beyond some of the more basic suggestions.
It “feels” like an older programmer seeing a younger web dev and going “you’re wasting MB of memory!”
While it's true the web has gotten slower in many regards, and indeed memory may have been wasted, business value creation typically doesn't care if a few MB are sub-optimally wasted, while the previous generation does.
This is a whole class of problem, it seems to me. A semi-automated approach can seem fine when there's an executive function over the top of it, exercised by someone who has not just the knowledge, but current and honed knowledge. But over time, what's keeping that knowledge current and honed?
The airline industry has talked about this, of course, and the adoption of robotic surgery has opened up a whole new training problem, because its escape hatch when it goes wrong or can't complete a procedure is often "complete the surgery manually". Which is fine on Day 1 of robotic surgery, but what about Day 2, when surgeons typically don't have hundreds of similar procedures under their belts? And where the only time they're called on to exercise the skill is in difficult edge cases?
We have basically turned driving a standard transmission into a weird old person quirk or niche enthusiast skill in the United States. If an automatic transmission required a similar manual fallback or check, how well would that work? Well, it would work fine if basically everyone already had a lot of practice driving a manual--but now? It wouldn't work well at all. Of course, automatic transmissions don't fail like that, and are a lot better at switching gears than AI assistants are at generating code. I worry about the semi-automated approach to self-driving, where the driver may not actually have currency with their driving skill, and where--in the instance that it's necessary--a driver has to react to more complicated situations (they don't have practice with the simple ones, and they have to react not to a hazard but to their car's failure to react to a hazard).
And as also pointed out by others, it not only requires effort, but knowledge, and that knowledge will be systematically degraded the more AI-ish code generation is used.
OTOH, those with super-diligent hacker attitudes will start to learn how to find the flaws in generated code and optimize it, thus leveraging the tool, but most will just move on to the next task/ticket as soon as the AI-ish code passes the unit tests. So, super-leveraging AI-generated code will be rare.
> And as also pointed out by others, it not only requires effort, but knowledge, and that knowledge will be systematically degraded the more AI-ish code generation is used.
How is that different than the plague of junior devs we've always had? Devs will get more senior by identifying and correcting issues in AI code rather than code from their peers. Seems OK, like we just got a whole lot more coding capacity.
That, and I bet the generic boilerplate code that Copilot produces is on average much better than the boilerplate code the average dev might use. So it removes the lower-level work.
It is. There are a lot of problems with copilot, but one magical thing is the way it rewards best practices like thinking through what a function is supposed to do before starting to write it.
If you write a good comment describing name, inputs, outputs, logic, and exceptions, the "generate code from comment" capability is kind of amazing. I'm a terrible, hacky programmer, and it has wholly converted me to documentation-first.
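To make that concrete, here's a rough sketch of what I mean (the function and its details are made up purely for illustration, and the body is just the sort of thing a Copilot-style completion might produce from a comment this specific):

    # retry_fetch: download a URL with exponential backoff.
    # Inputs: url (str), max_retries (int, default 3), base_delay (float seconds, default 0.5).
    # Output: the response body as bytes.
    # Logic: attempt the request; on failure sleep base_delay * 2**attempt and retry.
    # Raises: RuntimeError after max_retries consecutive failures.
    import time
    import urllib.request

    def retry_fetch(url, max_retries=3, base_delay=0.5):
        for attempt in range(max_retries):
            try:
                with urllib.request.urlopen(url) as resp:
                    return resp.read()
            except OSError:
                time.sleep(base_delay * 2 ** attempt)
        raise RuntimeError(f"gave up on {url} after {max_retries} attempts")

Writing the comment first forces me to pin down the inputs, outputs and failure cases before any code exists, which is exactly the discipline I used to skip.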
The only Copilot-type system I'd consider using is one that did the opposite - generated comments from my code, so my lazy butt doesn't have to write them manually.
I think that can surely be the case, but just like with any AI there may need to be manual review to assess the optimal way to go about task X, and retraining.
I can see this go both ways:
Boilerplate code being a great generic solution to a set of problems, but a more seasoned programmer may say "that works, but for our use case the trade-offs don't make sense"
Or alternatively, “this code wasn’t something I knew I could do in language X, and it’s far more efficient”
But enterprise FizzBuzz is a demonstration that exactly that phenomenon will happen without AI, merely with books, YouTube videos, or blogs (the calcification) and cargo culting (the lazy application).
For real, most developers repeatedly copy/paste things they find on the internet without understanding how they work. So the AI isn’t doing anything special that humans don’t already do.
Source: Bitter old SRE who has had to fix many broken software patterns ripped out of StackOverflow and the like.
As someone in SEO, I've been pretty disgusted by the desire for site owners to want to use AI-generated content. There are various opinions on this, of course, but I got into SEO out of interest in the "organic web" vs. everything being driven by ads.
Love the idea of having AI-Free declarations of content as it could / should help to differentiate organic content from generated content. It would be very interesting if companies and site owners wished to self-certify their site as organic with something like an /ai-free.txt.
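To illustrate, a self-certification file could be as simple as a few declared fields; the format below is entirely made up, since no such standard exists:

    # /ai-free.txt (hypothetical, self-declared, not independently verifiable)
    policy: no-ai-generated-content
    scope: /blog/ /guides/
    assistive-tools: spellcheck, grammar-checker
    contact: webmaster@example.com

Like robots.txt, it would only be as trustworthy as the site operator who publishes it, but it would at least give crawlers and readers something explicit to point to.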
Curious, isn't SEO the thing that ruined search to a big extent? A 1000-word article where the actual answer needs a fraction of that size. Or interesting content buried because it is not "SEO optimized". Or companies writing blog content and making it look helpful while actually shilling and highlighting their own product. Plus tons of other things.
So now you need something like ChatGPT to cut through the noise?
> So now you need something like ChatGPT to cut through the noise?
I once employed a journalist to write about the pros and cons of wedding insurance. Just to give you a clue how long ago this was, it was a unique article at the time.
Many years later, every article you will read about wedding insurance (there will be many thousands) is around 90% similar in style and content to the one I paid for.
I dare say you could use any other topic as an example: one thoroughly researched original and many thousands of similar copies. I can't see how ChatGPT is not going to make this situation much worse.
My guess is that ChatGPT is going to solve the SEO spam problem by changing the way we search for things. Instead of searching for webpages that have information about a topic, we're going to ask an AI.
It'll tell you what the pros and cons of wedding insurance are, and because eventually it'll have access to your calendar, it'll tailor the answer to the fact that you're having a destination wedding during monsoon season in that area.
Once this kind of AI search becomes the default way we look for information, there won't be a point to creating SEO spam anymore. It'll create other problems, of course, but that's the way it goes with new technology.
> Instead of searching for webpages that have information about a topic, we're going to ask an AI.
I think this is probably true for some people, the same sort of person who sees something on Facebook and assumes that it's true. [1] But there are quite a lot of people for whom "according to whom?" is the next question after being told something factual. For them, I think search's job is to find relevant sources and get out of the way.
But I think even finding out is a long way away. The main thing that ChatGPT has nailed is glibness. It produces text that sounds authoritative, whether or not it's correct. And it's often incorrect. People may try ChatGPT search out of novelty or because it feels human. But if they depend on it and feel the real-world impact of a confidently wrong answer, they're going to treat it as a human that's untrustworthy. A blowhard, a liar, a fool. So I'm sure the major search players are going to be very cautious rolling out chat-like things. Google has spent decades building up consumer trust, and they don't need a zillion articles about people whom a too-confident chat steered wrong.
> But I think even finding out is a long way away. The main thing that ChatGPT has nailed is glibness. It produces text that sounds authoritative, whether or not it's correct. And it's often incorrect.
Perfect example I found a while back: If you know Chinese or Japanese, ask ChatGPT for the stroke order of a certain character and watch how confidently it tells you how to draw a nonsensical scribble.
Even when you ask it for the stroke order of 一, it will tell you to draw a vertical line!
> But if they depend on it and feel the real-world impact of a confidently wrong answer, they're going to treat it as a human that's untrustworthy. A blowhard, a liar, a fool.
Maybe that's good. There are glib liars on the net, and not all of them are ChatGPT. If people learn to be skeptical of fine-sounding content on the net, maybe they'll apply it to humans, too.
The most basic technique for estimating trust in an answer today is to check the sources, see who said it, why, who is agreeing with them, etc.
If some AI just spits out an answer without any references, you cannot do that. You either have to blindly trust the answer, which will be dangerous, or you'll have to blindly distrust the answer, at which point the AI will be useless.
Regardless of spam, there is another fundamental issue with AI: accountability. Any text you've read had a real person behind it with real intentions, malice and greed or honesty and exploration. It would be very difficult to hold an AI accountable for any offence committed against accuracy or honesty. With a person, you can slowly get to the bottom of it and develop a relationship. AI will muddy the waters: instead of writing and thinking themselves, people will offload everything to an AI that can do it faster, and they'd even evade any responsibility for their text, since they could claim that the AI might have inaccuracies and does not reflect their own opinion.
I don't think what you're saying really applies to SEO articles, though. If you don't get wedding insurance because you read some SEO article that recommends against it, even if the advice is clearly bad, can you really hold them accountable? It's tough for me to imagine you'd win that lawsuit.
> With a person, you can slowly get to the bottom of it and develop a relationship.
With this kind of content (with most content on the internet, I'd argue), you really can't.
I'd agree that there is no significant difference in accountability between content farms and AI. But I think that's burying the lede, because what makes HN so appealing to many is the personally written blogposts, the experts in specific fields, and the silly hobby projects. AI has the unique ability to mimic all of that for highly specific audiences with significantly lower cost and effort. This would impact the curation efforts of many user-generated websites and could create a new status quo where a nobody is presumed guilty of AI-generated articles, SEO hacking, and upvote manipulation before anyone entertains the thought that a natural person could be making their first concerted effort to build a reputation and prove their honesty.
For now, HN is a bastion of conversation across many countries and domains.
What is the difference in accountability and ability to get to the bottom of who's responsible, between AI and someone just hiring an offshore content farm to write crap content?
>Once this kind of AI search becomes the default way we look for information, there won't be a point to creating SEO spam anymore
What about creating new, relevant, interesting content that no-one will ever see because search no-longer exists? Will site owners continue to do it knowing AI will crawl it, and never send traffic? Probably....not?
How do LLMs of the future get better if website owners are no longer incentivized to create content?
ChatGPT is trained on those same SEO spam blog posts, so I'm not sure how it solves the fundamental problem. People aren't going to create content for corporate giants to vacuum up as training material for their AI
There are various ways of looking at it and, of course, all sorts of people involved. My focus has always just been to encourage people to treat the search engines as an index that will pick up and rank quality content if you treat them that way. It still works very well today.
I don't see the point. There's lots of old content out there that won't get tagged, so lacking the tag doesn't mean it's AI generated. Meanwhile people abusing AI for profit (eg, generating AI driven blogs to stick ads on them) wouldn't want to tag their sites in a way that might get them ignored.
Does use of a search engine violate the "No AI" covenant with oneself?
Variation on the Turing Test: prove that it's not a human claiming to be a computer.
Modeling premises and Meta-analysis are again necessary elements for critical reasoning about Sources and Methods and superpositions of Ignorance and Malice.
Maybe this could encourage the recreation of the original Yahoo! (If you don't remember, Yahoo! started out not as a search engine in the Google sense but as a collection of human curated links to websites about various topics)
I consider Wikipedia to be a massive curated set of information. It also includes a lot of references and links to additional good information / source materials. Companies try to get spin added and it's usually very well controlled. I worry that a lot of ai generated dreck will seep into Wikipedia, but I am hopeful the moderation will continue to function well.
It would basically depend on how serious the search engines are about wanting created vs. generated content. Generated content is ultimately going to be a regurgitation.
>but I got into SEO out of interest in the "organic web" vs. everything being driven by ads.
If the owner of an SEO site wants to use AI for "content generation", doesn't that mean they didn't care about the human-generated content in the first place?
Seems like a choice between garbage and slightly more expensive garbage. What is interesting or organic about that? Back in the day, people used to put things on their websites because they cared about it and wanted to say it.
I read this more as meaning that they work for a legitimate company or two that is trying to organically improve their search results without stooping to nasty tricks.
But if said company doesn't care about the human-generated content quality in the first place (evidenced by the fact that they're willing to replace it with AI generation), how is that not also a "nasty trick" by your standard?
At the end of the day they just want to optimize search results. And the Overton window of acceptability currently allows "human generated SEO content" but not "AI generated SEO content". It's just an arbitrary rule.
I think the difference between created content vs. generated content is much more than arbitrary. If the leading search engines truly don't care about the difference, I'd predict a future where you have more bifurcation between a human web and a bot-driven one.
Don't worry, it won't be small shops doing it. It will be the majors, if that's where the money is.
To quote Yann LeCun:
Meta will be able to help small businesses promote themselves by automatically producing media that promote a brand, he offered.
"There's something like 12 million shops that advertise on Facebook, and most of them are mom and pop shops, and they just don't have the resources to design a new, nicely designed ad," observed LeCun. "So for them, generative art could help a lot."
How could it possibly help unless there were some independent verification mechanism though? If there's a motivation to lie about the content being "organically generated" because that's what search users prefer to find, then clearly people will.
And it's hard to imagine what that verification process would look like given current technology.
What about AI-assisted writing? E.g. improving style, grammar, readability, making explanations clearer and better structured? Especially for non-native writers this is a challenge, and not many can hire an editor or even a proofreader. I wonder if such use gets "penalized" by search engines the same way AI-generated content might?
> I've already run into scenarios where ChatGPT generated code that looked perfectly plausible, except that the actual API used didn't really exist.
So the next question has to be: Was this still the right answer?
I've personally had plenty of instances in my programming career where the code I was working on really needed functions which were best shopped out to a common API.
To avoid interrupting my flow and to better inform the API I'd be designing for this, I just continued to write as if the API did exist. Then I went on to implement the functions that were needed.
Perhaps the bot was right to presume that there should be an API for this. You might even be able to then prompt ChatGPT to create each of the functions in that API.
Exactly, that there is an end to the rabbit hole is a limitation of today's models.
If something does not exist, it should be generated on the spot. GPT5 should check for the existence of an API and if it exists, test and validate it. If it fails tests or doesn't exist, create it.
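As a rough sketch of the loop I mean (only the existence check below is real; the generation step is a stub standing in for a capability that doesn't exist today):

    # Hypothetical "check, test, otherwise create" loop.
    import importlib.util

    def module_exists(name):
        # True if a package with this name is importable in the current environment
        return importlib.util.find_spec(name) is not None

    def resolve_dependency(name):
        if module_exists(name):
            return f"use existing package: {name}"
        # imagined fallback: have the model generate the missing library,
        # run tests against it, and only accept it if they pass
        return f"generate, test, and validate a new implementation of: {name}"

    print(resolve_dependency("json"))        # stdlib, exists
    print(resolve_dependency("parallaxqt"))  # made-up name, triggers the fallback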
Well, this is ChatGPT, not Copilot, so I'd assume that OP was looking for a snippet using a public library rather than an internal API. In that context, suggesting you use an API that doesn't exist is just wrong.
I've definitely done this with Copilot, though—it will suggest an API that doesn't actually exist but logically should in order to be consistent, and I'll go create it.
That seems more like misplaced expectations. Someone may have given you the impression that Copilot was supposed to do things like that, whereas that expectation doesn't seem to be present for you in relation to ChatGPT.
However, as far as I know, the OpenAI team has not made it a goal to have ChatGPT only produce functional code using existing APIs. So I'm not sure we can call that an incorrect answer based on context.
If the API it demonstrated using logically should exist, it seems like the right answer is still to just go create it.
If I asked a coworker how to do X in framework Y, giving me the name of a function that should exist but doesn't is not a correct answer. If they told me "well, that's the function that should exist, you should just go submit a PR to framework Y", I would stop asking that coworker for help.
The difference I was drawing between ChatGPT and Copilot wasn't that Copilot has functionality ChatGPT doesn't, it's that it has context ChatGPT doesn't, so it suggests things related to internal APIs. In a conversation with ChatGPT it would be very difficult to get help with internal APIs, hence my assumption that OP wasn't referring to APIs they have any control over.
I was not suggesting the bot was telling the user to go make a PR to an open source framework but rather that they could create a library that contains those functions if that was the logical thing to do. Which is why I asked if that actually seemed the longer term right thing to do.
While I can easily agree that Copilot is probably the better tool for such questions, it is not clear from the parent comment whether their prompt to ChatGPT was asking to create code to do X or create code to do X using only existing publicly available libraries.
It's not immediately obvious that the bot failed to understand the question or that the answer was an example of the bot failing to understand the programming language. It could easily be that the user had an implied expectation of usability in a context they did not give to the bot.
That scenario is more like you asking a random person on the street, who happens to know Y framework, how to do X in that framework. Your coworker can be expected to get that you are looking for an answer that gets your current task done faster than you would be able to do without their assistance. The person on the street could not reasonably be expected to get that unless you give them that context.
>I've already run into scenarios where ChatGPT generated code that looked perfectly plausible, except that the actual API used didn't really exist.
Yes! I remember generating a seemingly reasonable R script except that the library that it called to do most of the work didn't exist! It was like code from an alternate dimension!
I asked if there were any Open Source libraries that implemented a certain algorithm. It gave me links to 3 different GitHub repos, none of which existed.
This take strikes me as a little off. Programming languages are language. Unlike natural languages, they are also based on context-free grammars. So an understanding of programming languages should actually be easier for even a general language model to incorporate than natural languages.
We can expect a bot like this to not really get context clues in natural language, although they seem to be getting better at that, but context is not necessary to have a true and functional understanding of a programming language. That was the point of creating such languages.
Using an API that doesn't exist but logically should once the use cases are demonstrated is not an example of lacking understanding, it is an example of advanced insight. A human might have invented the necessary functions inline with the rest of the project but if they are expressing functionality that is commonly applicable, then a common API for those functions is what the humans would eventually converge upon to clean up the code from the initial inline implementation, making it more consistent and readable.
My answer was mildly tongue in cheek, and I see where you’re going.
On the other hand, one of the other posters asked "to generate a parallax effect in Qt/QML. It simply used a QML Element with the name Parallax". Is this an insight, or is this answering "yes, I could" to "could you pass me the salt?". Maybe the line between the two is a fine one, and I just didn't realize that yet.
In general, it feels like copying part of the question (“write parallax code”) in the answer is the easy part of the task…
Saw that, and yea, that's totally a fall-back cop-out type answer.
I was pretty much just questioning everyone dismissing the whole category of answers like the R script one.
To me there does seem to be some nuance here that's worth noticing. Some examples of this type of response are indeed too cheap and can be chalked up to lack of training data or something.
But in other cases it's actually not immediately obvious whether the answer the user got was their fault for not specifying that they are expecting code that works without additional supporting libraries.
A language model can't reasonably be expected to understand an expectation of usability or fitness for purpose in a context the user didn't specify.
The failure to realize that the API doesn’t exist and therefore the code won’t work in practice, however, is a major lack of insight and understanding.
Agreed. That does seem to be an example of a language model failing to understand the context in which the question was asked.
The user was implicitly expecting code that would function when executed immediately and as written, with no additional supporting libraries included. This is different from code that would function correctly when executed after having downloaded relevant existing packages. Which is different from code that would function if executed alongside additional supporting code from private libraries the user might not have access to. Etc...
Yet any of those answers fit for the same prompt, "create an R script that does such and such". The bot's lack of insight is on the likely intention behind the prompt rather than on the requested language. I'd say if it produces any code that fits the syntax and grammatical structure of the requested language, that's enough to say it understands the language.
Can confirm that this happened to me when I asked ChatGPT to generate a parallax effect in Qt/QML. It simply used a QML Element with the name Parallax.
Yeah, a few times when I ask for a reference to something outlandish, it generates a perfectly realistic looking paper alongside a DOI link that's completely made up. Both the paper and the link do not exist!
I have to wonder, were any of the hypotheses in those papers plausibly viable areas of inquiry?
Perhaps it could be useful if they train the bot to identify cases like this and state that no such references exist but also provide a thesis or suggest a line of study that would produce such a reference.
> Published content will be later used to train subsequent models, and being able to distinguish AI from human input may be very valuable going forward
Every discussion on AI takes the example of ChatGPT and its inherent flaws, but AI-generated content doesn't have to be dull and low quality.
One question that bothers me is: does it really matter? If AI-generated content is on par with human-made or even better, does it matter anymore that an AI generated it?
Maybe it's the sentimental value, empathy, fidelity?
If an AI had written Mozart's Requiem would it lessen its interest, its beauty?
I don't think AI has to be low-quality for GP's concern to be valid.
Humans get inputs from a large variety of sources, but if an AI's input is just text, then there's the potential for AI's input to mostly consist of its prior output. Iterate this, and its model could gradually diverge from the real world.
The equivalent in human society is groupthink, where members of a subgroup get most of their information from the same subgroup and end up believing weird things. We can counter that by purposely getting inputs from outside of our group. For a text-model AI, this means identifying text that wasn't produced by AI, as the article suggests.
AI is often limited in ways we aren't, but it also trivially consumes more than we can in a lifetime.
"Just text" in the case of GPT-3, but also it is trained on a token count exceeding the number of times an average synapse in a human brain will fire in a lifetime.
It can still get biases from the training set; while I'm not sure if "group think" is quite the right phrase, it does seem to "want" everyone to get along even when asking it to create multiple characters engaged in a conflict. (Or perhaps that's just an artefact of it estimating that I want that). Reminds me — in a bad way — of Jules Verne's From The Earth to the Moon.
Consuming and producing vast amounts of information is what makes the problem potentially worse than human groupthink. It enables the situation where AI is mostly consuming information produced by AI. That's the feedback loop I'm calling "groupthink." It could end up diverging from reality in the same way that chaotic functions diverge widely due to tiny differences in the initial conditions. The same problem exists if the AI consumes other types of information that it also produces.
Humans are more grounded by having a presence in the physical world. Plus they draw on various sources considered more reliable, like formal training, scientific papers, textbooks, quality journalism, etc. If we want AI to be reliable, we'll need it to put the most weight on similar sources, and maybe even have some real-world presence with sensors and robots.
Eventually AI will be able to produce new reliable information itself. But for that, it would have to recognize factual inconsistencies between sources and logical inconsistencies in arguments, and figure out how to resolve those, and do math correctly. I don't know what the state of the art is here, though ChatGPT tends to fail at basic arithmetic.
> That's the feedback loop I'm calling "groupthink."
While I think I get your point, I'd call that failure mode "believing its own BS", and (perhaps I'm just being cynical here) I think humans collectively also have this failure mode.
That said, there's an old saying: "To err is human, to really foul up requires a computer" — it is quite possible for a machine, with merely the same category of flaws we have and no others, to be really bad for the world just because it's really fast and doesn't sleep.
There’s nothing saying AIs have to have text input, it’s just the method with the lowest friction of imagination. That’s why books of text have been around for so long.
There are already AIs that take input via image, video, and audio. The AI tech is input agnostic and only requires that someone figures out a way to get the input in.
> Every discussion on AI takes the example of ChatGPT and its inherent flaws, but AI-generated content doesn't have to be dull and low quality.
To get away from that we'd have to dramatically change our approach. The LLMs we have are trained on as much content as possible and essentially average out the style of their training data. What it writes reads like a B-grade high school essay because that is what you get when you average all the writing on the internet.
It's not obvious to me that a creative approach that boils down to "pick the most likely next word given the context so far" can avoid sounding bland.
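As a toy illustration of that point (a made-up two-word context table, not a real model): greedy decoding always returns the same most-likely continuation, and only sampling at some temperature ever produces anything surprising.

    # Greedy "pick the most likely next word" vs. temperature sampling,
    # over a tiny hand-made conditional distribution (not a real LM).
    import random

    next_word = {
        ("the", "weather"): {"is": 0.7, "was": 0.2, "seems": 0.1},
        ("weather", "is"): {"nice": 0.6, "bad": 0.3, "electric": 0.1},
    }

    def greedy(context):
        probs = next_word[context]
        return max(probs, key=probs.get)               # always the blandest choice

    def sample(context, temperature=1.5):
        probs = next_word[context]
        weights = [p ** (1.0 / temperature) for p in probs.values()]
        return random.choices(list(probs), weights=weights)[0]

    print(greedy(("the", "weather")), greedy(("weather", "is")))   # "is nice", every time
    print(sample(("the", "weather")), sample(("weather", "is")))   # varies run to run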
>One question that bothers me is: does it really matter? If AI-generated content is on par with human-made or even better, does it matter anymore that an AI generated it?
> If an AI had written Mozart's Requiem would it lessen its interest, its beauty?
I think it's about intent. Art is interesting and beautiful to us because there is an undeniable human intent in creating it and a vision behind it.
ChatGPT and DALL-E are pretty cool, but I think until AI gets its own intent and goals, it's pretty fair to try to separate human art and AI art.
I've seen plenty of images generated by Midjourney and Stable Diffusion that I would describe as "interesting" and/or "beautiful".
For that matter, nature can obviously be both, and it doesn't have intentional design nor vision behind it. So it's clear that it's not a universal requirement.
What I meant is that something that is "interesting" and/or "beautiful" is just artistic, such as nature as you pointed out. For it to be art, there has to be intent behind it, otherwise it's just aesthetically pleasing.
My point was that art is more than just something that's aesthetically pleasing.
I would say that art is something that is deliberately created to be aesthetically pleasing. If it's done by an AI that was designed with intent to generate such things, I would consider them art, as well.
But if we're talking about definitions, surely what really matters is how most in society understand "art"? Now suppose we went around showing Midjourney-generated pics to random people on the streets and asking them whether it's art or not; how many do you think would say "no", or ask questions about artist's intent before giving an answer?
Most would say it's art, which is why these models are so interesting and have had such quick adoption. If you were to reveal that it's AI-made, some would also change their mind or say that it's art but not exactly in the same sense that human-made art is art.
I just think that until AI has real intent, it's a different thing than what we do. Which is not saying anything bad about it, quite the contrary.
The definitions will inevitably shift over time, they're already ultimately subjective. If you believe in god, it's not hard to argue that nature is art and quite a lot of people believe in god.
Yes! There's a common problem where people think that an ecosystem is infinite, or at least sufficiently large, when it's not. We've done similar with dumping in the ocean, and now we've all got plastics in our blood, and we assumed soil quality was a given over time, too. AI content released into the wild will be consumed by AI; how can it not be? You've got a system which can produce content at a rate several orders of magnitude higher than a human's, so of course the content ecosystem will be dominated by AI-generated content, and of course the quality of content generated by AI systems, which rely on non-AI-generated content to train, will go down over time.
I feel like we are nearing peak "uncurated" content, both for humans and machines. Humans are still grappling with our novel abundance problems.
As we move forward, I suspect we will see an increase in curation services, and AI models will do more with less. You can bootstrap a productive adult human on an almost infinitesimal slice of the training sets we are using for the current gen of AI; can't imagine future approaches are going to need such an unbounded input to get better results - but might be wrong!
If content is curated for its quality, whether or not it's AI generated (or assisted) doesn't matter.
We’d have to adjust capitalism to deal with the “novel abundance” problems. Most of the drive for novel content/audiences is simply to decide which people get a cut of the revenue (and/or audience).
If we focused on quality and stopped caring about who gets paid for what I suspect that not only would we have better quality overall but we’d also push the boundaries much faster thus making things even more interesting.
Strong disagree. Capitalism (can argue whether that's what we have or not) uses capital to respond to markets. Markets are signals of what people want.
The scarcity mindset in an era of information abundance isn't being perpetuated by capitalism, it's being perpetuated by people making their own decisions one transaction at a time.
You make some strong logical cases so I'd like to use some of your language to try to give another perspective on my points:
Markets in our current system are (IMO) a distorted view of what people want. For example TVs: How much of the market is people who simply want a nice clean picture and how much is people who are feature-chasing based on the TV marketing? And then how much of the 'market' is actually smoke and mirrors accounting to increase the value of companies?
Some things that I think are only common because of capitalism (or "capitalism") and could therefore be phased out if we transitioned to an AI-supported system:
- "Keeping up with the Jones'"
- Needing short hits of artificial 'excitement' (such as the kind of novelty drip-fed to us by the movie industry)
- Keeping ideas secret as a form of Intellectual Property protection, and therefore the scarcity mindset that there are no good solutions and no new problems to solve (since we'd all be exposed to the latest of those if we chose)
- Capital being used for poor-quality projects (since many of these again are just about trying to grab more money for the people making them)
- People making their own decisions one transaction at a time. This is definitely underrated. Sure some people will continue to think short-term just because that's how their brains are wired, but I suspect the majority will find themselves able to breathe slowly for the first time and they will naturally unwind in to longer and longer-term thinking.
I think that in addition to influencing future models, AI content will also influence how humans think and write. People will start ironically and unironically copying GPT's style in their own writing, causing human produced content to increasing resemble AI content.
High school students that are prohibited from using AI for their essays will have a bad time. Even if they don't use AI chatbots themselves, they will unknowingly cite sources that were written by AI, or were written by someone who learned about the topic by asking ChatGPT.
Hmm, forgetting natural language for a moment and instead considering programming languages: it's pretty easy to generate nonsense but semi-plausible looking ASTs without the help of AI. Could this be used to attack GitHub's Copilot?
Step 1. Release a tool that generates nonsense code across a thousand repositories, and allow anybody to publish crap to GitHub.
Step 2. Copilot trains on those nonsensical repositories because it can’t distinguish them from the real thing.
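Step 1 doesn't even need an AST library in any deep sense; a template filler with random identifiers already produces code that parses but means nothing. A minimal sketch (all names generated at random):

    # Generate "nonsense but syntactically valid" Python by filling a fixed
    # template with random identifiers, then confirm it parses.
    import ast
    import random
    import string

    def rand_name(k=8):
        return "".join(random.choices(string.ascii_lowercase, k=k))

    def nonsense_module():
        fn, arg, var = rand_name(), rand_name(), rand_name()
        src = (
            f"def {fn}({arg}):\n"
            f"    {var} = [{arg} * i for i in range({random.randint(2, 99)})]\n"
            f"    return sum({var}) % {random.randint(3, 7)}\n"
        )
        ast.parse(src)  # raises SyntaxError if invalid; this template never does
        return src

    print(nonsense_module())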
Imagine this as a security attack vector. Instead of nonsense, spam a bunch of repos with code that does a specific thing but in a very hard to understand way. Then add in a small piece of very hard to understand, but legit looking malicious code. Copilot trains on it and then starts feeding it to developers around the world. Probably easier ways to achieve this, but interesting to think about.
Those blogs already exist. Pretty much 90% of the results I see in Google for non-technical household related queries. Just incoherent rambling that sounds plausible but is complete nonsense.
> Published content will be later used to train subsequent models, and being able to distinguish AI from human input may be very valuable going forward
Not all AI content is necessarily bad, nor all human content good. We need a way to separate the good from the bad, not AI from human, and it might be impossible to do 100% correctly anyway.
I think I would compare it to Stack Overflow. Some of the solutions do exist there, but not all are applicable to the use case or the exact circumstances of the person asking, and yet the prompt used by the AI would remain the same. SO has its rating system, but it has the same issue as the sentence above. From that perspective, we have identified potentially good human output ( assuming it wasn't already polluted with AI output, which seems less and less likely ) that should only be accessible by humans, and we would need a separate forum for bad AI output ( that should be verified by humans as bad but maybe only be accessible by AI once verified ).
I am just spitballing. I do not really have a solution in mind. It just sounds like an interesting problem going forward.
> Could we end up having AI quality trend downwards due to AI ingesting its own old outputs and reinforcing bad habits?
No, because models are already trained like that. Datasets for large models are too vast for a human to even know what's inside, let alone label them manually. So instead they are processed (labeled, cropped, etc) by other models, with humans overseeing and tweaking the process. Often it's a chain with several models training each other, bootstrapped from whatever manual data you have, and curated by humans in key points of the process.
So it's actually the opposite - the hybrid bootstrapping approach that combines human curation and ML labeling of bulk low-quality data typically delivers far better results than training on a small but 100% manual dataset.
> They are processed by other models, humans overseeing and tweaking the process. Often it's a chain with several models training each other, bootstrapped from whatever manual data you have, and curated by humans in key points of the process.
A great description of what actually happens when you deal with massive datasets. One way to inspect a large dataset is to cluster it, and then look at just a few samples from each cluster, to get an overview.
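A minimal sketch of that inspect-by-clustering workflow, with a tiny TF-IDF plus k-means pipeline standing in for whatever embeddings and clustering a real pipeline would use:

    # Cluster a pile of texts, then print a couple of samples per cluster
    # to get a quick overview of what the dataset actually contains.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    texts = [
        "how to bake sourdough bread at home",
        "sourdough starter feeding schedule",
        "install the nvidia driver on ubuntu",
        "fix black screen after ubuntu driver update",
        "best wedding insurance policies compared",
        "is wedding insurance worth the cost",
    ]

    vectors = TfidfVectorizer().fit_transform(texts)
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)

    for cluster in range(3):
        samples = [t for t, l in zip(texts, labels) if l == cluster][:2]
        print(f"cluster {cluster}: {samples}")

With billions of documents you would obviously swap in proper embeddings and a scalable clusterer, but the "look at a few samples per cluster" step stays the same.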
Okay, for image models I think humans could help a lot more than we give them credit for. We can read and parse images WAY faster than you might think.
What if we just crowdsource and have a new Folding@home protein thing, but this time it's for classifying data sets? LAION-5B has 5 billion image-text pairs; if we got 10,000 people together that'd just be… 500,000 per person, which would take… a while, but not forever. Humans can notice discrepancies super quickly. Like a slide show: display the image and the text pair at a speed set by the user, and pause and tweak ones that are outright wrong or shitty.
Boom, refined image set.
Maybe? I’m looking at the LAION-5B example sets on their website and it seems to literally be this simple. A lot of the images seemed pretty poorly tagged. You get a gigantic manually tagged data set, at least for image classification.
I assume at some point, ChatGPT needs some kind of text ranking. Popular texts are usually correct (content and presentation) and useful, so they should rank higher. At some point, low-quality texts are filtered out. Personally, I don't care if a text is written by a human or a machine as long as it's good.
> Could we end up having AI quality trend downwards due to AI ingesting its own old outputs and reinforcing bad habits? I think it's a particular risk for text generation.
I just had my eyes opened reading that, because humans also do exactly that, inadvertently.
This isn't an issue, because it's possible to add prose quality and content accuracy scores to training data and train the model to predict those quantities during generation, which would allow you to condition the generation on high prose quality/accuracy. It just requires a small update to the model, and a shit ton of data set annotation time.
Likewise, images can be scored for aesthetics and consistency and models updated to predict and condition in the same way.
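One concrete way to wire up the text side of that (a control-token scheme rather than literally predicting the scores, but the data-prep looks similar) is to bucket the human ratings and prepend them to each training example, then prompt with the bucket you want at generation time. A rough sketch, with bucket names and thresholds invented for illustration:

    # Turn human quality/accuracy ratings into control tokens prepended to each
    # training example, so generation can later be conditioned on them.
    def bucket(name, score):
        # score assumed to be a 0-1 human rating; thresholds are arbitrary
        level = "high" if score >= 0.8 else "mid" if score >= 0.5 else "low"
        return f"<{name}:{level}>"

    def to_training_example(text, quality, accuracy):
        return f"{bucket('quality', quality)} {bucket('accuracy', accuracy)} {text}"

    # At inference you'd prompt with "<quality:high> <accuracy:high> ..." and let
    # the model continue in that register.
    print(to_training_example("Wedding insurance typically covers cancellation costs.", 0.9, 0.85))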
How would you score them at scale without training some model to differentiate real vs. AI content? If you need to train such a model, where would you get the data from?
We don't need to differentiate AI vs Human, just accurate and well written vs not. We'd do that the same way we've scored stuff at scale so far - grad students and crowdsourcing.
That just isn't true. It's expensive, but entirely doable. Also, it's perfectly normal to perform initial model training on a large data set to capture the statistical properties of language, then perform a second stage of model training on more curated data to cause the model to actually do what you want.
Current LLMs are trained on as much data that can be scraped from the public internet. It’s simply not possible to annotate that much data, even with crowdsourcing. It’s not even a matter of cost. You’d basically need to duplicate the amount of data on the internet. I don’t think you’re appreciating the scale of the data involved in training these models.
Not necessarily. The bloom model (a GPT competitor and similarly sized) was trained on 1.5T of text, which reduces down to 350B unique tokens. If you took a histogram of those unique tokens, it would have a very long tail with probably 1% or less being well represented. That leaves 350M common tokens to serve as the basis for token tuples being fed into crowdsourcing. There are probably ~2-5B very common token sequences, if you had 5 people view each token sequence and give it a few scores, and that process took ~1-2 minutes (these sequences are short), that leaves a conservative estimate of 50 billion person minutes, or ~34 million person days. If you paid these workers $15/hour, that comes out to $12.5 billion dollars, which is not prohibitively expensive for any big tech company when spread out over several years, particularly when it provides a massive competitive advantage.
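Spelling that estimate out, taking the high end of each range quoted above:

    # Back-of-envelope check of the annotation cost estimate above.
    sequences = 5e9              # very common token sequences (high end)
    raters = 5                   # people scoring each sequence
    minutes_each = 2             # minutes per rating (high end)

    person_minutes = sequences * raters * minutes_each     # 5e10, i.e. ~50 billion
    person_days = person_minutes / (60 * 24)               # ~3.5e7, i.e. ~34 million
    cost_usd = (person_minutes / 60) * 15                  # at $15/hour: ~1.25e10

    print(f"{person_minutes:.2e} person-minutes, {person_days:.2e} person-days, ${cost_usd:.2e}")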
BLOOM isn’t as good as GPT-3 because it doesn’t use as much training data. LLM quality is still data bound [0]. Further limiting data by requiring annotation is not going to work, at least with the current LLM modeling approach.
As the scale of input data goes up linearly, the scale of commonly observed input patterns goes up logarithmically. If we bumped the scale up an order of magnitude in terms of common input tokens, that still means we could annotate the important part of a 150TB text corpus for 125B worth of human annotation. Given that could break the budget of even large corporations, realistically we'd probably train a model to predict the scores of interest using a fraction of that much human annotation, which would be inferior but still a massive improvement. It is also likely that corporations would team up with indirect competitors to share the cost of annotation and gain an advantage against direct competitors.
How do you figure? Let's say a commonly observed input pattern comprises 1% of training data. For a data set of size N, 0.01 * N examples will contain the pattern. If we increase the size to 2N, 0.01 * 2 * N examples will contain the pattern. Why is the growth logarithmic?
As the data set size increases, the average frequency of non-trivial most frequent patterns will go down, and the tail will get much larger. Thus if you had a 1% cutoff, the percentage of the data set hitting this cutoff goes down as the data set size goes up. Take a look at pareto distributions with high alpha to understand the statistics of it.
Of course, this is only true if new data is distinct from old data. If you just copied your data set 10x and pretended it was a 10x larger data set, it would behave like you expect.
Hmm, I'm still not convinced. Gathering training data can be thought of as sampling the underlying distribution of the data. In that sense, you'd expect the proportions of things to converge towards the underlying distribution as you gather more data.
That would be true if we were sampling from the underlying distribution in an unbiased and balanced way from the beginning. Instead data is generated and incorporated one set at a time, and each set is biased. Jargon and terms vary, but the language plumbing is the same - new sets bolster common phrases/idioms and lengthen the tail with specific tokens.
Keep in mind though, language isn't a stationary process.
Even if each dataset is biased, I'm still not sure how you derived logarithmic growth from the general notion of bias in data. For instance, assuming the data is biased, perhaps it is biased in the other direction and contains more common patterns compared to the underlying distribution.
Ah, I see what you mean: the number of unique examples increases logarithmically with data size, which kind of makes sense. Language, in this case, follows a power law.
I think your argument is that this means smaller datasets are ok because they contain "most" of what the larger datasets contain. But I think this data-power-rule implies the opposite. ML models can often get to 80-90% accuracy on some task. Unfortunately, these models often aren't that useful because that missing 10% of accuracy matters a lot to users. So what this data-power-rule implies is that, in order to get the last 10% of gains, you need 10x the amount of data.
Well, to get back to my original point, if we're trying to improve the quality and accuracy of model writing, and we want to do that by adding quality and accuracy scores to short token sequences, the power law distribution means we could get coverage on a significant portion of the data set by scoring just the most frequent sequences that aren't linguistic trivia. We could probably get to 50% average coverage fairly cheaply, and while diminishing returns would kick in and make getting to 80 or 90% much more expensive, at that point we could use a model to estimate the remainder, and have a perfectly suitable quality/accuracy scores to condition the model on. The model would output those quality/accuracy scores for the generated token sequence as well, so portions of output that were low quality/of questionable accuracy could be flagged.
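A toy simulation of that coverage curve, with a Zipf-distributed pattern frequency (exponent chosen arbitrarily) standing in for real token-sequence statistics:

    # Under a heavy-tailed (Zipf-like) frequency distribution, scoring only the
    # most frequent patterns already covers most occurrences, with diminishing
    # returns after that. Sizes and exponent are arbitrary.
    import numpy as np

    rng = np.random.default_rng(0)
    draws = rng.zipf(1.5, size=1_000_000)        # each draw = one pattern occurrence

    _, counts = np.unique(draws, return_counts=True)
    counts = np.sort(counts)[::-1]               # pattern frequencies, most common first
    coverage = np.cumsum(counts) / counts.sum()

    for top_k in (10, 100, 1_000):
        print(f"top {top_k:>5} patterns cover {coverage[top_k - 1]:.0%} of occurrences")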
An arms race between generative AI and discriminative AI is a good thing, it'll lead to higher quality models since the only way to "win" the game is to make output that is indistinguishable from legitimate content.
In fact, that is basically the GAN algorithm in meta form.
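For anyone who hasn't seen it spelled out, the GAN loop really is that arms race in miniature. A stripped-down sketch on toy 1-D data (all sizes and hyperparameters arbitrary):

    # Stripped-down GAN loop: a generator learns to mimic samples from N(4, 1.25)
    # while a discriminator learns to tell real from generated.
    import torch
    import torch.nn as nn

    G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))   # noise -> fake sample
    D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))   # sample -> real/fake logit
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()

    for step in range(2000):
        real = torch.randn(64, 1) * 1.25 + 4.0          # "legitimate content"
        fake = G(torch.randn(64, 8))                    # "generated content"

        # Discriminator: push real toward 1, fake toward 0
        d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Generator: fool the discriminator into calling fakes real
        g_loss = bce(D(fake), torch.ones(64, 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()

    print(G(torch.randn(1000, 8)).mean().item())        # should drift toward ~4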
There will likely be selective pressure from human interaction with the data to curate good content above bad.
After all, we had the issue of millions of auto-generated bad pages in the web 1.0 SEO days. Search engines addressed it by figuring out how to rely more heavily on human behavior signals as an indication of value of data.
The thing that concerns me is that we may end up with a downward trend in accuracy.
If AI writes the bulk of the content, how long will it be before people simply do not put in the work to make sure things are true or put in the work to discover and understand new true things?
>Such things could end up creating a downwards trend in quality, as more and more of such junk gets posted, absorbed into the model and amplified further.
I feel like a similar thing already happened with YouTube recommendations
To your point, anecdotally, the system is heavily gamed. The other day, I saw reviews pop up for a restaurant that had not even opened yet. Either reviewers got a sneak peek behind the chef's curtain or those reviews are not quite true.
Sadly, word of mouth again becomes de facto the only semi-reliable way to separate crap from non-crap, and even that comes with its own set of issues and attempts at gaming.
There is so much stuffing for a simple idea that I'm not sure if this piece deserves its own title, but I'll give it the benefit of the doubt.
One thing that I wonder though is how we will draw the line. If I'm writing a piece and do a Google search, and in that way invoke BERT under the hood, is anything that I write afterwards "AI-tainted"? What about the grammar checker? Or the spot removal tool in photoshop or gimp? Or the AI voice that reads back to me my own article so that I can find prose issues?
And that brings the other problem: do the general public really know the extent of AI use today, never mind in the future?
With all of that out of the way, yes, I would rather read text produced by human beings, not because of its quality--the AI knows, sometimes humans can't help themselves and just keep writing the same thing over and over, especially when it comes to fiction--but just to defend human dominance.
>What about the grammar checker? Or the spot removal tool in photoshop or gimp? Or the AI voice that reads back to me my own article so that I can find prose issues?
We could get this whole discussion back to some semblance of sanity if we stopped calling any form of remotely complicated automation "AI". The term might as well be meaningless now.
Nothing about any of all these "AIs" is intelligent in the sense of the layman's understanding of artificial intelligence, let alone intelligence of biological and philosophical schools of thought.
> There is so much stuffing for a simple idea that I'm not sure if this piece deserves its own title, but I'll give it the benefit of the doubt.
Frankly I had the same thought writing it :D
It's more of a stake in the ground sort of a thing I guess?
What I really want is somebody saying "hey, there is an open standard already here" so I can use it.
The idea has some legs, but they are weak for the many reasons pointed out to me by fair criticism of "digital veganism". The main one is that labelling is one small part of quality. Tijmen Schep in his 2016 "Design My Privacy" [1] proposed some really cool ideas around quality and trustworthiness labelling of IoT/mobile devices, but ran into the same issues. Responsibility ultimately lies with the consumer, and so long as consumers remain uneducated as to why low quality is harmful, and cannot verify the provenance of what they consume or the harmful effects, nothing will change.
Right now we seem to be at the stage of "It's just McDonald's/KFC for data - junk food is convenient, cheap and not a problem - therefore mass production generative content won't be a problem".
The food analogy is powerful, but has limits, and I urge you to dig into Digital Vegan [2] if you want to take it further.
>One thing that I wonder though is how we will draw the line. If I'm writing a piece and do a Google search, and in that way invoke BERT under the hood, is anything that I write afterwards "AI-tainted"? What about the grammar checker? Or the spot removal tool in photoshop or gimp? Or the AI voice that reads back to me my own article so that I can find prose issues?
>And that brings the other problem: do the general public really know the extent of AI use today, never mind in the future?
The line is drawn at human ownership/responsibility. A piece of content can be 'AI-tainted' or '100% produced by AI'; what makes the difference is whether a human takes responsibility for the end product or not.
Responsibility and ownership always lies with the humans. Even supposedly 100% AI generated content is still coming from a process started and maintained by humans. Currently also prompted by a human.
The humans running those processes can attempt to deny ownership or responsibility if they so choose but whenever it matters such as in law or any other arena dealing with liability or ownership rights, the humans will be made to own the responsibility.
Same as for self-driving cars. We can debate about who the operator is and to what extent the manufacturers, the occupants, or the owners are responsible for whether the car causes harm but we'll never try to punish the car while calling all humans involved blameless. The point of holding people responsible for outcomes and actions is to drive meaningful change in human behaviors in order to reduce harms and encourage human flourishing.
In terms of ownership and intellectual property, again the point of even having rules is to manage interactions between humans so we can behave civilly towards each other. There can be no meaningful category of content produced "100%" by AI unless AI become persons under the law or as considered by most humans.
If an AI system can ever truly produce content on its own volition, without any human action taken to make that specific thing happen, then that system would be a rational actor on par with other persons and we'll probably begin the debate over whether AI systems should be treated as people in society and under the law. That may even be a new category distinct from human persons such as it is with the concept of corporate persons.
> ... yes, I would rather read text produced by human beings, not because of its quality ... (snip) ... but just to defend human dominance.
One could make a strong argument that defending moral principles is preferable to preferring the underlying creative force to have a particular biological composition.
As an example, I don't want a system to incentivize humans kept as almost-slaves to retype AI generated content.
How can one tell the difference between all the gradations of "completely" human generated to not?
> One of my favorite products is “100% Fat-Free Pickled Cucumbers Fit (Gluten Free), “ which I once saw at the grocery store.
On my first flight to the US, in the '90s, a rather obese lady in the row in front of me asked the flight attendant: "Excuse me. Do you have fat-free water?"
The flight attendant hesitated a split second, her face not moving an inch. Then she smiled and replied: "We certainly have fat-free water, madam. I'll fetch you a bottle straight away."
A few years ago, in a hotel in London, I had complimentary water bottles on the night stand. The label said "Organic Water from Scotland". I was like: uhh, organic water from Scotland is probably cattle piss. I prefer inorganic, non-bio water.
The long term impact of the ease of generating low nutrition digital content using language models may be that people put down their devices and return to the real world. We’re already far down that path with the existing internet where most content is generated for SEO.
Anything you’re consuming on the internet or even on a TV may just be random noise generated by some model so why waste your precious time consuming it?
On the flip side why waste your time producing content if it’s going to be drowned in a sea of garbage mashed together by some language model?
> The long term impact of the ease of generating low nutrition digital content using language models may be that people put down their devices
The problem is people don't always make the wise decision. Evidence: the junk food industry is alive and kicking.
Some people will disconnect from devices, but others may just say "this is the way things are now" and adjust themselves to the flavor of junk content.
Why are you assuming that it will make writing worse not better?
Just because it can be used by non-experts to create crappy written work doesn't mean it can't also be used by people who work with it to augment and improve their existing written work.
<< Did the printing press also increase the amount of crap in circulation?
<< Why are you assuming that it will make writing worse not better?
Both of these are fascinating questions and, to me anyway, both can be answered with yes. The sheer amount of writing increased exponentially once more people could read, write and publish their own writings (and the internet only exacerbated this trend). In accordance with the Pareto principle, most of it was of poor quality, but the upside was that good output likely increased in absolute numbers as well (a few people are bound to write something decent).
I think the parent is looking back at history and reasonably inferring the likely result (more crap).
Sorry, but all I can think of after reading this blog post is the evil bit RFC: https://www.ietf.org/rfc/rfc3514.txt which has had just as much effect on internet security as this proposal will have on controlling ai generated content.
This post reminds me of Samuel Butler's novel, Erewhon.
"Assume for the sake of argument that conscious beings have existed for some twenty million years: see what strides machines have made in the last thousand! May not the world last twenty million years longer? If so, what will they not in the end become? Is it not safer to nip the mischief in the bud and to forbid them further progress?"
That reminds me of the epilogue to H.G. Wells's The Time Machine:
“ He, I know—for the question had been discussed among us long before the Time Machine was made—thought but cheerlessly of the Advancement of Mankind, and saw in the growing pile of civilization only a foolish heaping that must inevitably fall back upon and destroy its makers in the end. If that is so, it remains for us to live as though it were not so.”
That passage has haunted me since. I often wonder if that is the answer to the Fermi paradox. Civilization might be but a brief spark in the long night, separated from others by both time and distance insurmountable.
2. By someone who has consumed AI-generated content
3. With a ~2000-era spellchecker
4. By someone using ~2020-era neural speech-to-text software
5. With a ~2020-era spellchecker
6. By someone with an AI in the traditional editor role (reads it, gives detailed feedback)
7. By a human and an AI working together where the AI generates everything initially but the human fixes things and nothing goes out without human endorsement.
I'd probably draw the line at 7, but you could also argue for 6 or even 5.
I've always maintained that for any food product labeled "Home Style" or "Home-Made Flavor", the product must also feature a photograph of the factory floor where the product is made.
I was thinking about this exact problem a few days ago when I created a site hosting poems that were either 100% AI-written or 100% human-written.
https://news.ycombinator.com/item?id=34472478
Then I asked people to guess the authorship. Amazingly, people guess correctly only about 70% of the time. https://random-poem.com/
I'm guessing it will get even harder to tell as the AI improves further down the road.
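Out of curiosity, a quick back-of-the-envelope check that 70% really is better than coin-flipping. This is only a sketch: the number of guesses below is an assumption, since I only quoted the rate above.

    from math import comb

    def p_at_least(k: int, n: int, p: float = 0.5) -> float:
        # Probability of getting at least k of n guesses right by pure chance.
        return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

    # Assumed sample size; only the ~70% accuracy figure is reported above.
    n_guesses = 200
    n_correct = round(0.70 * n_guesses)
    print(f"chance of >= {n_correct}/{n_guesses} correct by guessing: "
          f"{p_at_least(n_correct, n_guesses):.1e}")

With a couple hundred guesses, 70% accuracy is far beyond luck, yet still a long way from reliably telling the two apart.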
Good question. However, this is 100% ChatGPT. Perhaps it knows that Weird Al intentionally misspells words in his work, hence it has intentionally introduced this typo, which makes this type of AI even more awesome.
--edit
.. weird:
I asked ChatGPT if there is a typo in this poem. This is what it responded with:
ChatGPT: It appears that there is an intentional typo in the first line of the poem "Weird Al, oh Weird Yankovic Al" instead of "Weird Al Yankovic". Yankovic being the surname of the artist, this addition can be seen as a playful and humorous way to refer to the artist, and give the poem a personal touch.
This reminds me of the ‘portrait drawings’ to camera transition.
LLMs have given us a more interesting corridor in the Library of Babel - https://en.m.wikipedia.org/wiki/The_Library_of_Babel - but separating the wheat from the chaff will still be the human endeavor because of the infinite possible BS.
I'm not sure how to best describe it, but every time I interact with AI, there is very little emotional response from it. It's a "good enough" response rather than one that provokes a sense of awe or horror.
I've been experimenting with writing recently and wanting to provide some AI imagery to match the emotions I'm expressing. A painting like "Wanderer above the Sea of Fog" evokes many emotions. But when I use the same description such as:
"a man standing upon a rocky precipice with his back to the viewer; he is gazing out on a landscape covered in a thick sea of fog through which other ridges, trees, and mountains pierce, which stretches out into the distance indefinitely."
I get the store-brand version that doesn't evoke any emotion. It is "good enough" to get the point across, but lacks the response. As with the countless recreations of the Mona Lisa, there is just something about organic perfection. I'm sure AI will get there one day, but who knows if we will react to it with this sense of wonderment.
Do you think your emotional response is related to the external knowledge of who produced the painting or is it only based on the visual impression?
We probably can still figure out if a painting is original or AI-generated, but I don't think we will be able to much longer as AI improves.
The question would be whether we could feel emotions even if the source material is artificial. I think the answer is yes. The human brain can easily be tricked.
I guess it is more that AI doesn't struggle to create. Humans go through many emotions in the process of creating art. The outcome isn't as important as the appreciation that the artist went through that grueling process; that appreciation, and the outcome being what it is, is the response I'm trying to express.
So more to the human struggle of the production. At least for me.
>a man standing upon a rocky precipice with his back to the viewer; he is gazing out on a landscape covered in a thick sea of fog through which other ridges, trees, and mountains pierce, which stretches out into the distance indefinitely."
Might want to look into better prompt engineering. This is pretty generic and there are better prompts you can use.
We just need to give the AI memories from real people and give them a life span short enough that they won't realize it... and then if they go bananas and start causing harm, make another AI to terminate them... I swear I have heard this strategy used elsewhere.
Hear me out, the problem is not AI generated content but what this content stands for.
Why do we use text? Half of it is about getting something from someone else, but even more importantly we write text to change something in the world using our words.
The problem with AI pretending to be a human only exists when that AI doesn't get anything from us and is only good for extracting information.
It's utterly futile to argue with AI-generated content here on HN, but it's an amazing experience on ChatGPT, precisely because when we write something to each other here on HN, we expect that our words will create some impact somewhere. Some opinion will change, or we will befriend someone.
I have zero problems interacting with an AI that is an individual machine in Lisbon, learning and evolving as life happens. On the other hand, I hate the idea that I'm talking with a machine in Palo Alto and the only output of my conversation is some statistics that a VC will gaze over and optimize for his own gain.
Just think about how meaning is transferred from human to human: we can compress meaning into a few markings and extract it on the other side only because, as individuals, we have experienced life; a marking saying "bored" is enough to transfer a very complex situation from person to person. In the current state of AI, being bored doesn't mean anything to LLMs, but if individual AI machines lived human-like lives, I think it would start having meaning for them too.
IMHO, the problem with AI-generated content currently is exactly the same as with SPAM or other non-genuine content, and has nothing to do with its origins being biological or electronic.
We humans will put eyes on a ball and call it our friend; we don't have a problem with that.
Proving content to be AI-free is fraught. It might be practical in narrow contexts, but I think it will be mostly impractical. It will even be theoretically impossible in many cases.
So, the article's underlying philosophy is "not there yet". It does not adequately address various real world challenges.
1. How is AI-generated content different from algorithms? I'd suggest drawing a line may be nonsensical.
2. What is the precise ethical motivation for wanting to avoid AI-influenced computation? I don't see a compelling case.
Examples:
A. Do we want civil engineers to use optimization software? Yes.
B. Do we like spelling and grammar checkers? Yes.
C. Do we want content generation software to suggest topics and hyperlinks? Yes.
D. Do we want to try out AI music? Yes. And we want to remix it.
E. Do we want AI to make our health care more accessible and affordable? Yes.
If our goals include protecting human rights, health, dignity, and so on, we had better darn well formulate our philosophy and policy goals in a coherent way.
You know that to certify something as kosher, a rabbi has to be paid to stand there all day next to the kishke stuffing machine or whatever. Makes the consumable a lot more expensive. On the other hand, I see job opportunities.
[edit]
Actually, just to expand on this a bit, this is effectively an argument for establishing a mark of authenticity that all literate civilizations have always striven to place on their intellectual output. There has never been a single rule uniting these efforts, but civilizations which placed more emphasis on safeguarding and defending the precision of the written word have tended to be rewarded with greater longevity. There's no reason that trend shouldn't accelerate when faced with the threat of inundation by meaningless language models. Just like gold coinage, we're looking at a period of debasement and inflation.
You could actually argue that language models themselves are an expression of anti-semitism, in the sense that they're an attempt to undermine the sacredness of the written word, to destroy or wash out the way that the meaning of words ennobles humanity, and to eradicate the special relationship that the law of language and the language of law create between God and Man. I only say it seems anti-semitic because that particular concept, as a high/sacred value, seems unique to Judaism (from my perspective, I can't think of another culture that considers it an inviolable precept) and so this attempted abolishment of the human hand in the written word seems particularly targeted at those who consider the word sacred; maybe this is yet to be threshed out. Maybe Bari Weiss will write about it once some nazis have ChatGPT come up with a totally bunk but plausible corruption of the Talmud. But love of the written word is something that should rightly be a general human value, because we'll live or die with it, Jews and everyone else, whether we want to or not. All civilizations fall when their coin is debased, and our coin today is information.
[edit2] also, I'm drunk, and I love y'all. I hope this stimulates debate, not hate.
I hate to say it, but it does not sound that far-fetched. In some weird ways, tech already feels like magic to a lot of people, and the people surrounding it are today's wizards and priests willing that power into existence.
In a practical sense, as a society, we do need something to separate good from bad. A technopriest caste does sound like a fun job description to me (which will inevitably be corrupted and result in its own schism).
Is "AI-free" clear? If I google something during the making of my project, is the project AI free? What if I type on a phone keyboard with autosuggestions? There is machine learning involved in all kinds of stuff and AI is a pretty general term.
Edwin Brady often talks about his (digital) lab assistant. I think that’s a good mental image: an AI assistant is what we’ve used so far and I think is fine. They help us bounce ideas back and forth (e.g. searching) and correct typical mistakes (typos, type errors in Edwin’s case). But as soon as they’re running the lab and we become the assistant, I think that’s where most people would draw the line.
Those pushing the concept of "AI-Free" have yet to nail down what amounts and types of automation may be used in the production of a thing before it can no longer be labeled "AI-Free".
On the other hand, "Organic" is very well defined at this point. In the US there is a whole legal framework around labeling foods, drugs, and cosmetics.
There is even a training and accreditation system in place for qualifying people as certifiers of organic practices as well as an application and review process for farms and manufacturers to be certified to use the label.
I disagree. We have called online distribution "content distribution" and the people who contribute stories, videos, and art "content creators" for at least a decade. It's a catch-all generic term for all content that another person would consume via a platform. In a general work setting you're probably not producing content for a platform, but you are certainly producing content for your coworkers and customers. It's so you don't have to say: "stories, blogs, vlogs, photos, digital art, products and any other thing that is consumed by others."
Agreed. If anything, the word 'content' captures a broader array of work output in the mind of the general public than 'work', which conjures up a lot of things, but rarely artistic endeavors. I do dislike using it to describe my own habits, but it would appear that ship has sailed. In both my tech and non-tech circles, the stuff they do is content consumption. I am not sure when it became so prevalent.
Well, we've got part of the Mentat angle covered already.
"It is by caffeine alone I set my mind in motion. It is by the juice of Java that thoughts acquire speed, the teeth acquire stains, the stains become a warning. It is by will alone I set my mind in motion."
> Customers and the audience should be able to know and choose if they are interacting with AI-generated content. More importantly, they need to be able to choose to interact exclusively with artisanal, human-produced ideas.
good that automattic is setting this policy for themselves, but as long as this is opt-in there is zero chance this fantasy will be reality
also @simonw’s “AI veganism”: https://news.ycombinator.com/item?id=32639643 it's a good term because it will be about as popular and have as loud and passionate and few adherents as actual veganism
> good that automattic is setting this policy for themselves
To clarify, this is not a policy but a design choice and the intent of one PM :D
I cannot speak for the actual company policy because we are at an exploration phase.
> also @simonw’s “AI veganism”: https://news.ycombinator.com/item?id=32639643 it's a good term because it will be about as popular and have as loud and passionate and few adherents as actual veganism
Diets are such a rich metaphor for the approach to content!
Fast food on the consumption side, veganism, kosher, paleo etc. on the production side
> Some of the AI-generated output is factually wrong
While this is undoubtedly true and a problem that needs to be addressed, it's worth considering that humans get things factually wrong too sometimes (intentionally or not). So perhaps a more interesting question is how much more or less correct an AI is than a human on a given task.
People talk about this with self-driving cars all the time. Arguably a self-driving car does not need to drive perfectly (not that perfection isn't a good goal), but if it can drive more safely than the average human driver, there's still a significant chance of improving overall road safety.
Not saying there won't be those who do, but caring if something was created by AI instead of a human is about as intelligent as being bothered by your neighbor being a certain nationality, or boycotting a movie because one of the actors has the "wrong" skin color.
It matters what we consume, plenty of humans peddle rotten ideas and false narratives. With AI, I'd look into the models and the intentions of those who created/run it. What sort of content it delivers. Just as it makes sense with human authors to look up their biography and publishing history.
I disagree. I think humans have a higher creative ceiling, and so the truly moving, life-altering pieces of art will always need that human element; I'd rather not waste my time consuming the AI content when its peaks are lower.
AI content will be like modern pop songs and superhero movies. They fit a formula and people will still consider them "great" because the bar has been lowered by algorithms and a reduced appetite for risk. But deep down we know they are not in the same league as works from The Beatles, Sting, or Martin Scorsese.
Here [1] is a picture of the Mona Lisa. Even if one isn't really into art, you can't help but be drawn into that smile. What did it mean? What was she thinking? What did Leonardo see, and when painting it did he intend to convey what he saw? Or was it a mystery to himself as well, something unbelievably well conveyed in the painting itself?
The reason these questions are so fun to consider is that we know the answers exist, even if we also know we shall never know them. By contrast, had this image just been something output by yet another neural network generator, there would be little to no interest of the same sort, as you would know that anything you infer beyond the most surface level is exactly the same as seeing shapes in the clouds.
Being into art, I think your example perfectly demonstrates why we can't get AI soon enough. There is no good reason for thousands of tourists to crowd around the Mona Lisa on a daily basis. They don't even understand what they're looking at.
Would people pay millions of dollars for an AI generated black square NFT? Maybe some would too, it really shows the retardation of the entire space. With the mainstream it's not even about art and craftsmanship but about big names and make-believe. And that's not just art but all fields including science to a degree. When we get AI, hopefully more people are going to start focusing on substance instead of marketing and illusion.
It's not clear that people actually care about and want AI-free OC, at least if you look at what kind of content is being consumed. Right now Google search seems to prioritize non-organic content, with search results often being a stream of blogspam and reddit/quora shill/astroturf crap that if not AI-generated, is close enough in terms of tone, accuracy, and originality, that it might as well be.
Meanwhile you never get any results from 4chan or KiwiFarms (sites with much more organic content), unless you go out of your way to specifically ask for it.
Google not prioritizing 4chan and KiwiFarms is a terrible example of "not wanting organic content". They're not just "organic content", they're cesspools of the worst kind of organic content. 4chan is notoriously filled with questionable to outright illegal content/activity, and KiwiFarms is just a website to organize doxxing and online harassment. I don't understand why that's your standard for "organic content". I would have to heavily question your activity if you're being thwarted by a lack of those sites in your search results. I'm personally very happy with 4chan links not showing up in search results.
I also think you're misunderstanding how Google prioritizes content. It's not showing the content people want, as much as content that's optimized with SEO to look appealing.
> KiwiFarms is just a website to organize doxxing and online harassment
KF is a gossip site, like Lipstick Alley or DataLounge, but with a focus on e-celebs and internet phenomena. I go there whenever I want to get the dirt (ie. truth) about e-celebs or some weird internet trend or subculture. Its users literally just passively document and discuss this stuff, and there's a policy (and ethos) of "no touch." You and others endorsing its erasure (despite it not breaking any law) would know this (and perhaps secretly do) if you actually lurked there for a while instead of just credulously accepting its critics' characterization. It's also strange how the site is supposedly such a hive of doxxers, harassers, etc., yet nothing seems to happen to the journalists and tech people organizing against it.
I think the real reason it's in hot water is precisely because they just passively, permanently document so much, some of which some people with money and influence don't want documented. Maybe they documented something someone (some billionaire?) secretly doesn't want out there?
The sorts of people being "documented" on KiwiFarms are not celebrities. They're usually vulnerable people with some sort of mental illness who are struggling. And I don't buy for a moment the "no touch" policy. Just because you can't use a specific website to harass someone, does not mean you can't use the information on the website to harass them off-platform. This is a bad take. There's a major difference between journalism, tabloid journalism (which I also consider worthless and wrong), and stalking vulnerable people on the internet. Or as you call the latter, "documenting".
> That's because you haven't spent much, if any, time on the site. Lurk more.
Zero interest. You don't have to eat white lead to know it will give you cancer.
> there is some truly bizarre social phenomena that you can't find documented elsewhere
There's a word for this: stalking. Many of these "social phenomena" are people, and like I said, ones who suffer with mental illness. You're quite literally advocating for, and enabling real-world stalking when you endorse write-ups about these sorts of people.
> apps which literally can be (and have been) used to coordinate harassment
Used for, vs created to enable. There's a difference you're either ignoring or willfully blind to here.
4chan constantly deletes its own content. There are a limited number of slots for threads, and making a new one causes the oldest one to disappear. Google does not care much for dead links.
Additionally, image submissions are basically never described in the text, so they are unsearchable even when they are live. There's some exceptions with archives but now you're in power-user territory.
Not being compatible with search engines is a side-effect that its userbase actually likes. You are either currently there or you are not.
There are archives, like rebeccablacktech and 4plebs, which Google likewise blackholes. You could argue Google also does not care much for archives, yet I still get StackOverflow clone results, for some reason.
The solution that comes to mind is something like the PGP web of trust, except the web would consist of verified humans.
This didn't work for PGP because people in general don't care about that. And I think people in general don't care if their content is AI-generated or not.
It's not like all human-generated content on the web is tremendously accurate or well-written. Hell, maybe the AI will even be better. :)
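A minimal sketch of the attestation half of that idea, using Ed25519 signatures from Python's cryptography package. The hard parts of any web of trust (verifying that a key really belongs to a human, and getting others to countersign it) are left out, and the function names are my own, not part of any existing standard.

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # A verified human generates a keypair once; the public key is what other
    # participants would countersign in a web-of-trust model.
    author_key = Ed25519PrivateKey.generate()
    author_pub = author_key.public_key()

    def sign_content(private_key: Ed25519PrivateKey, content: str) -> bytes:
        # Sign a piece of content, attesting "a verified human published this".
        return private_key.sign(content.encode("utf-8"))

    def is_endorsed(public_key, content: str, signature: bytes) -> bool:
        # Check a signature against the claimed author's public key.
        try:
            public_key.verify(signature, content.encode("utf-8"))
            return True
        except InvalidSignature:
            return False

    post = "This comment was written by a human."
    sig = sign_content(author_key, post)
    print(is_endorsed(author_pub, post, sig))        # True
    print(is_endorsed(author_pub, post + "!", sig))  # False

The signing part is trivial; as with PGP, the unsolved part is getting enough people to care about exchanging and verifying keys.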
OpenAI is working on a watermark for their models (if not already there) that would recognize content as AI-generated. If it's good enough, it should make it possible to filter out AI content when training new models.
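Nothing public says how such a watermark would be exposed to people assembling training data, but if a detector existed, the filtering step before retraining could look roughly like this; the watermark_score function here is purely hypothetical.

    from typing import Callable, Iterable, List

    def filter_training_corpus(
        documents: Iterable[str],
        watermark_score: Callable[[str], float],  # hypothetical detector: ~0.0 human, ~1.0 watermarked
        threshold: float = 0.5,
    ) -> List[str]:
        # Keep only documents the detector considers unlikely to be AI-generated,
        # so a new model is not trained on its predecessor's own output.
        return [doc for doc in documents if watermark_score(doc) < threshold]

    # Usage with a stand-in detector; a real one would have to come from the model vendor.
    corpus = ["Some human-written blog post.", "Some suspiciously fluent filler text."]
    kept = filter_training_corpus(corpus, watermark_score=lambda doc: 0.0)
    print(len(kept), "documents kept for training")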
Although I like most of the post, some thoughts:
First of all, with every invention we have assumed that human beings would no longer be required. In the end we get more efficient, but we still need workers. Since industrialisation the trend has been toward core business and standardisation; I see AI as an opportunity to have more custom, service-driven business on an industrial scale, and that is the real promise. This could start with service desks providing actual service rather than the unusable rubbish they do now.
As for art and text: this industry is relatively new, and came with the internet and PowerPoint decks. And yes, this will probably be impacted by AI, but do we really have to be sorry about that? We will have the same page-filling nonsense created by AI that is now created by cheap labour. Look at recipe sites: all kinds of crap to pad recipes into content so that copyright is somewhat addressed, but nobody reads it anyway.
As with all other changes we have had before: there will be some impact on business and life as we know it, but not as much as anyone expects upfront.
The problem is -- just like the food products the author uses as analogy -- there is no reasonable agreed-on line between "natural" and "unnatural", "homemade" vs "factory", and "AI" vs "not-AI" because the processes at work are a gradient.
Is my spell or grammar check an AI? No, why not? What if I translate a word from another language while writing, using an automatic translation tool? What about the algorithms that boost (or don't boost) my post through various networks, and pick an audience for it? And the analytics tools I use? etc. etc. And all of this will just become murkier and murkier.
"AI" is itself a problematic term, since as many of us know many of the things so-marketed are little more than a bundle of heuristics or a complicated statistical model. And many of the "authoring" tools generating texts or images or code are really just copy-and-pasting-and-mutating things without a lot of magic in-between, but in clever ways that "trick" us into seeing it as original authorship. (Ok us humans do this too)
Perhaps the most reasonable dividing line is: is there a machine here that is emulating a human, pretending to be one? That is perhaps the thing that bothers me most about the recent wave with ChatGPT, Google's Meena etc -- the authors of these systems didn't have to create systems that present a human-like identity, with pronouns and the illusion of selfhood. But they did, and now we have the Lemoine incident etc.
Reminds me very much of Dune, and its "Orange Catholic Bible"'s “Thou shalt not make a machine in the likeness of a human mind.” I think Herbert was onto something -- the question of whether something is truly human (or human created) or not might become the key important issue of our time and the ambiguity and confusion around that is going to become troublesome.
Great piece, it seems to have produced a couple of non-ai-augmented thoughts...
1. When ambitious young cooks / farmers etc set out to make a name for themselves in the food industry, I don't get the impression they're rushing towards kosher food development en masse. But I could be wrong.
2. Over the course of my life I've observed (and lived) the response to disillusionment with mass-market factory-farm food: a movement towards local, organic, small-batch, and visibly produced / performed food production.
3. I want to believe a distaste or boredom with ai produced content will lead us all offline and make us more focused on art, music, writing, advice etc produced by people we actually know in real life. Like a farmers market but for everything.
4. If I had to guess, though, I'd say it's more likely in the short term to just increase the popularity of live streaming everything.
I should really have had chatGPT proof read this for me and optimize it for upvotes.
I felt reading the article that AI/ML vs "organic" content will become the difference between fine hand-crafted furniture and flat-pack Ikea stuff.
Sure, the flat-pack is better than nothing for a temporary apartment, but it is something to move beyond, not what we want to be stuck with for the rest of our lives and in all situations.
Quality furniture, especially hand-crafted, is expensive and far from being universally affordable. It's quite likely that the same thing will apply here wrt "artisan hand-crafted content", and thus AI-generated stuff will still constitute the vast majority of what the society consumes.
That said, if it gets good enough that you have to be an expert to tell AI-generated stuff from human-made, does it really matter? The upper classes will always play the silly "look what I can afford!" status games, and these often don't have a utilitarian component to them - something doesn't need to be actually better, it just needs to be recognized as more refined. Well, and cost more.
We take it for granted these days that most text we read is generated by humans. But imagine a future where it is assumed most text is AI by default, and human text is only as common as computer generated text is today. Imagine 9 out of 10 comments on hackernews being AI authored. This comment, AI authored.
The organic analogy to AI refers to locally-produced crops grown with natural fertilizers and pre-industrial pest and weed control strategies, as an antidote to industrial-fertilizer soaked, herbicide-and-pesticide drenched, antibiotic-and-synthetic-hormone laden, globally-transported heavily-processed foodstuffs.
While this does sound good to most people, it's always worth remembering that Old South plantations produced nothing but 100% organic cotton, despite their atrocious practices of human slavery.
If we include this perspective, AI has some extremely positive potential. One of the most promising trends in agriculture is the adoption of AI for weed and pest control, as well as fertilizer use. AI robots crawling over fields can use image recognition to identify weeds (and kill them with IR lasers) as well as pest infestations in their early stages. Application of fertilizer to individual plants on an as-needed basis (rather than just dumping large volumes on the entire field) also becomes possible. This can be done far faster and more efficiently than by human laborers walking up and down rows of crops, weeding by hand. Potentially, this could make organic-style agriculture as cost-effective as the industrial variety.
Sure, this means fewer field labor agricultural jobs - which are pretty tough, backbreaking jobs by any measure. Similar arguments apply to a lot of the drudgery in creative and artistic endeavors.
Of course, AI could also replace most corporate board positions, and a lot of upper management as well, and hey, why not the shareholders and investors too? Are they of any more value to the operation of a business than the replaceable grunt workers are?
Let's make the world's smartest AI and have it make decisions about how capital should be allocated to promising startups, removing fallible humans from the loop. Resource optimization on steroids!
Yes, I read too much science fiction. See William Gibson, Iain M. Banks, Hannu Rajaniemi, and Adrian Tchaikovsky for examples of what can go wrong (and maybe right).
> do you really want to cry watching a movie that was 100% produced by robots?
I've had a lot of fun having ChatGPT write stories for me: I'd ask it to make changes, to add a character, add a motivation, etc. I'm just playing around, and it's 100% produced by a robot and I enjoy it. I don't personally mind having an emotional response to a story generated by a "robot". I don't really understand how its being bot-generated cheapens the experience. The emotions that I feel are elicited by my thoughts and reflections based on what I've read and experienced, not by the robot.
Our emotions and desires are mediated by machines literally day in and day out. There is no point whatsoever in drawing an arbitrary line at a damn movie.
Maybe a good step would be figuring out a way to not immediately commodify all human activity and then smugly tell the affected to learn how to do something new when their life's work is rendered obsolete by technology.
There is a case to be made that talking to a human is the bland option now. Amidst all the censoring and self-censoring and assumption of bad faith, there is a lot that is not possible to discuss in polite company, even online, but is interesting. AIs seem more willing to give me a straight answer rather than going off on moral tangents or ego trips. I have had discussions with the chat that went deeper than with the smartasses on Reddit. I thought that was a lost art.
So, just like with good cheese, maybe I will go with full-fat.
I think we should judge content independently of whether it was created by AI or humans. Labeling AI-free content doesn't seem to add much value to me in general (with exceptions).
> How much of it will be being certain that you are reading something generated by human before you're willing to commit to having an emotional response, even if the output is identical, right?
... Interesting. Like, do you really want to cry watching a movie that was 100% produced by robots? Maybe not.
I certainly wouldn't mind. AI-generated content is a statistical summarization of knowledge produced by infinite humans randomly typing on typewriters, after all...
I believe AI will soon become a helpful tool for many people, just like how smartphones made photography accessible to more people and JavaScript made it easier for people to enter the tech field.
However, we will have a difficult time when it comes to social media, because AI content will become more and more prevalent.
It seems plausible the next social media will remove the social part completely, and just have people as consumers and creators.
I've started to 'learn' the ChatGPT tone (you can even give ChatGPT some text and ask it whether it thinks it was AI-written). Now that I know the general structure and language it uses, I've been spotting it all over Reddit, HN and some blog posts.
I have noticed it makes me bored of reading content; I start to skim through it, assuming it's just AI-generated waffle.
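I even tried turning that 'learned tone' into a toy filter. A sketch along these lines (the phrase list is just my own guess at the tells, nothing like a real detector) already flags a surprising amount of what I skim past:

    # Stock phrases that, in my experience, show up disproportionately in
    # ChatGPT-style output; a hand-picked, purely illustrative list.
    TELLTALE_PHRASES = [
        "it's important to note that",
        "it is important to note that",
        "as an ai language model",
        "in conclusion,",
        "overall,",
        "there are several factors to consider",
    ]

    def chatgpt_tone_score(text: str) -> float:
        # Crude score: fraction of telltale phrases present in the text.
        lowered = text.lower()
        hits = sum(1 for phrase in TELLTALE_PHRASES if phrase in lowered)
        return hits / len(TELLTALE_PHRASES)

    sample = "It's important to note that these symptoms can vary. Overall, consult a doctor."
    print(f"tone score: {chatgpt_tone_score(sample):.2f}")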
I think the quality will keep improving, because humans will keep curating the training data to compete for best results. The downside is that "quality" is conflated with "performs best for a particular audience and platform" which could just as easily mean re-ingesting junk and spitting it back out... because that's what people respond to.
Reminds me of the Dovetail phyle in Neal Stephenson's The Diamond Age where the rich value purely human made goods in a world of matter compilers that can almost instantly make or duplicate just about anything.
On the other hand, I find Stable Diffusion the most interesting thing going on in art at the moment...
You'll need a special AI soon just to read and process the flood of superficially reasonable sounding bullshit AI will be throwing at you from all kinds of directions. On the bright side, the quality of writing style might overall increase with AI paraphrasing and style correction/adaptation tools.
Funny, just yesterday I asked ChatGPT for some nutritional advice:
>can you give me the most common symptom of every vitamin deficiency
Sure, here is a list of common symptoms associated with deficiency in certain vitamins:
Vitamin A: night blindness, dry skin, and poor immune function
Vitamin B1 (thiamine): fatigue, irritability, and muscle weakness
Vitamin B2 (riboflavin): cracks and sores at the corners of the mouth, itchy and scaly skin, and light sensitivity
Vitamin B3 (niacin): pellagra, which includes symptoms such as skin inflammation, diarrhea, and dementia
Vitamin B5 (pantothenic acid): fatigue, insomnia, and muscle cramps
Vitamin B6 (pyridoxine): anemia, skin disorders, and depression
Vitamin B7 (biotin): hair loss, scaly skin, and brittle nails
Vitamin B12: pernicious anemia, which can cause fatigue, muscle weakness, and nerve damage
Vitamin C: scurvy, which can lead to bleeding gums, easy bruising, and poor wound healing
Vitamin D: rickets in children, and osteomalacia in adults, which can cause muscle weakness, bone pain, and fractures
Vitamin E: nerve damage, muscle weakness, and impaired vision
Vitamin K: bleeding disorders, easy bruising, and poor blood clotting
It's important to note that these symptoms can also be caused by other factors and deficiency in vitamins can manifest differently between people. Consult with a doctor if you suspect a deficiency.
>can you list the most recommended food source for each vitamin
Sure, here is a list of some food sources that are high in certain vitamins:
Vitamin A: sweet potatoes, carrots, kale, spinach, and apricots
Vitamin B1 (thiamine): whole grains, nuts, seeds, and legumes
Vitamin B2 (riboflavin): almonds, mushrooms, and dairy products
Vitamin B3 (niacin): chicken, fish, and peanuts
Vitamin B5 (pantothenic acid): mushrooms, avocado, and sweet potatoes
Vitamin B6 (pyridoxine): chicken, fish, and chickpeas
Vitamin B7 (biotin): eggs, nuts, and sweet potatoes
Vitamin B12: fish, meat, and dairy products
Vitamin C: oranges, strawberries, kiwi, and bell peppers
Vitamin D: fatty fish such as salmon, mackerel and sardines, egg yolks and mushrooms
Vitamin E: almonds, sunflower seeds, and spinach
Vitamin K: leafy greens like spinach, kale, and broccoli, also found in parsley, Brussels sprouts and even kelp.
> do you really want to cry watching a movie that was 100% produced by robots?
The fear reminds me of the early days of hip hop, when songs remixed from the past were blasted as unoriginal. I think we've all mostly agreed now that you can build new content that honors the old while being completely fresh.
This sums it up perfectly in my mind; the analogy is extremely apt IMHO. As a musician I find it impossible to spontaneously produce melodies, rhythms, etc., that aren't heavily influenced by what I've listened to - perfectly analogous to how AI-generated art is heavily influenced by the billions of training images it has ingested. The difficulty in composing is finding a way to transform, combine, and synthesize the influences into something new. I have been approaching generative AI in the same way and can't help but feel that some of the fears are overblown. It takes work to get coherent results. If your own work is incoherent or wholly derivative, your AI outputs will also be. If your work isn't incoherent or highly derivative, then I'd argue you've already done a major part (if not most) of the creative work in your mind - the AI is just another tool to realize it, one which empowers human artists to be creative like never before.
EDIT: Okay, now I'm shocked to see your comment grayed out all the way at the bottom. Interesting times we live in.
This strikes me as a distinction without a difference. Content creation is not even close to the same as GMO or processed foods. People are justified in being concerned about how food is made precisely because there are meaningful differences in the nutritional value and trace chemical content in the end products. And that matters because we don't fully understand the role those factors play in long term human health. Created content does not share similar distinctions based on how it was produced.
At the level of our personal experience of a story, image, or sound it does not matter if the content was human generated or even if the stories are real; they either elicit an emotional response or they don't.
This is why we can enjoy fiction and art in the first place. It's also why a given piece of art can mean different things to different people.
Deliberately cutting yourself off from the meaning a piece could have for you just because you aren't yet sure that a human produced it is unnecessarily speciesist and only acts to your own detriment. Should one likewise refuse to engage with works that had assistance from the artist's pets?
With regard to news stories and other informational content, if two stories about the same event are both accurate, timely, informative, and/or insightful, it does not matter if one was written by a human on a typewriter and the other written by a bot after having been prompted by a human. The present state of the art still requires some human editing to hit those marks but we're quickly approaching a time where there will not be any detectable differences.
This idea that people deserve to invent a new category of things to be prejudiced about seems silly and unhelpful to me. Why bother inventing arbitrary distinctions just so people can play favorites and be judgemental about it? People are already free to judge content based on relative merits. That seems like enough to me.
And it's important to account for the fact that the end goal with these tools is that there will be no differences between human and AI generated content. The teams working on these things have already made considerable progress to that end, it's not hard to imagine the next couple of iterations actually achieving that.
So anyone trying to create these arbitrary labels is starting an uphill battle for a fundamentally unhelpful end state.
We really don't need to put that kind of energy into reinforcing people's desire to feel superior. We have more than enough elitism/tribalism/racism/otherism around as it is.
At the end of the day, good content is good content regardless of who or what produced it. We should focus on recognizing and celebrating the beauty, regardless of its origin.
The best part is the “gluten free” anti AI crowd will be left in the dust. They may think they’re being classy snobs but being against AI is going to be tantamount to saying you eat crayons and don’t read well.
I think it’s pretty on-point though, because while Celiac disease is real (gluten intolerance), most people saying no to gluten don’t have it. The notion of gluten “sensitivity” or allergy is debated by medical science to say the least. It’s largely hypochondria, mass hysteria, and often a trendy meme for snobs and hipsters.
Respectfully, you’re wrong. Celiac is not the only kind of gluten intolerance, and I’m speaking from personal experience. I eat gluten, and I get severe issues with digestion. I stop eating gluten, the issues magically disappear. I thought my issues were just “normal” until a couple of years ago, and that gluten intolerance was a BS made-up thing. I was wrong on both counts.
The easiest way to signal that your content is AI free is to publish content that would be very difficult to do with AI, eg conversations between two known personalities on recent news stories.
FWIW I would like to shamelessly plug https://www.humanproved.com here. The idea is that a human needs to pledge that their content is human-generated to get a unique badge, which they can display alongside their content along with a URL. Someone can use the URL to verify the timestamp at which the badge was issued.
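I haven't looked at how the site implements this under the hood, so purely as an illustration of the general idea (hash the content into a timestamped pledge that the issuer records and anyone can later re-check), a rough sketch:

    import hashlib
    import json
    import time

    def issue_badge(author: str, content: str) -> dict:
        # Record a pledge: who claimed authorship of what, and when.
        # In a real service this record would be stored and served by the issuer.
        return {
            "author": author,
            "content_sha256": hashlib.sha256(content.encode("utf-8")).hexdigest(),
            "issued_at": int(time.time()),
        }

    def verify_badge(badge: dict, content: str) -> bool:
        # Anyone holding the badge record can check it matches the published content.
        return badge["content_sha256"] == hashlib.sha256(content.encode("utf-8")).hexdigest()

    article = "An essay written entirely without AI assistance."
    badge = issue_badge("example-author", article)
    print(json.dumps(badge, indent=2))
    print(verify_badge(badge, article))  # True

Of course, a hash and a timestamp only prove when the pledge was made, not that a human actually wrote the text, which is why the pledge itself has to carry the weight.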
>I don’t think you can stop technological progress. Humanity moves in the direction of better and more sophisticated tools to make life more convenient, and it will be more convenient to introduce more and more AI-produced content into the culture.
That is a rather boldly positive perspective.
History shows us that great technologies can come and go and are never distributed uniformly. Think of sewers and drainage systems, for example.
Obviously, sophisticated technologies can be used in odious endeavors, where more efficiency means more atrocity. Think of genocides, for example.