While what I just wrote may come across as a joke, there's a lot of truth in it. These language models aren't actually smart; they're just good at parroting smart-sounding things they've been taught (although one could argue humans often do the same thing). The problem is that they need more sources to splice together in order to hide the true origins.
For other "AI", like computer vision, this is less of a problem: even if you're inputting proprietary visual content, the models aren't typically outputting parts of that content, but rather the tags that have been assigned to it (often by humans). With generative models, however, it is becoming a problem, because the outputs are directly made up from the inputs, and those inputs are proprietary.
I still don't understand the purported difference between "sounds smart" and "is smart".
How else can you verify an entity's intelligence other than its "output"? Every doctor, pilot, president has obtained that position of power and responsibility literally only by being tested on their outputs.
We can't see inside a human's brain just as we can't see inside an AI's brain. What is the difference? How else do you propose we verify intelligence?
Give it an inane prompt to better see the difference, like "write me an essay about camels and their value in cybersecurity", which gives us this answer:
"The camel is a valuable animal in the cybersecurity field due to its ability to store large amounts of data in its hump. Camels are able to cross vast distances with little water, which makes them ideal for carrying large amounts of data across networks. They are also very sturdy animals, able to withstand harsh conditions and even attacks from cyber criminals. In addition, their long eyelashes protect their eyes from sand and dust, making them perfect for working in dusty environments such as data centers."
You asked it to write an essay on a nonsensical premise, so you got a nonsensical essay. If you just ask ChatGPT 'what is the value of camels in cybersecurity', it doesn't see that as a cue to play:
> I'm not sure what you mean by "the value of camels in cybersecurity." Camels are a type of mammal that are commonly used for transportation and as a source of food and other resources in certain parts of the world. They do not have any direct connection to cybersecurity.
A human answer would say camels have very little to do with data security. A Google search would show you a link to Apache Camel. ChatGPT tells you how amazing a camel’s humps are for data storage.
You bring up a great point that how you ask the question changes how the system responds. An intelligent being such as yourself would see the two as asking the same thing.
But in fact I suspect that ChatGPT was designed to be playful, as well as helpful. I think the human based reinforcement phase they put it through to tune the chat output was probably used to encourage some of these creative generative responses where it takes a ludicrous idea and proceeds to execute on it.
ChatGPT was not designed to be either playful or helpful; it was designed to pick a likely next word, and repeat.
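That core loop is easy to sketch. Here is a toy bigram model (my own illustration, nothing to do with OpenAI's actual implementation) that does literally what the comment describes: count which word tends to follow which, then pick a likely next word and repeat:

```python
from collections import Counter, defaultdict

# Tiny training corpus, chosen so the most-likely-next-word is unambiguous.
corpus = "the camel crossed the desert and the camel crossed the dunes".split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def generate(start, length=5):
    words = [start]
    for _ in range(length):
        options = follows[words[-1]]
        if not options:
            break
        # "Pick a likely next word, and repeat."
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

print(generate("the"))  # → the camel crossed the camel crossed
```

It produces fluent-looking sequences while "understanding" nothing, which is the point being argued either way in this subthread.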
You don't think OpenAI thought putting it out for public access might serve some sort of purpose? They just created a technology based on an arbitrary algorithm without any thought as to what it might actually produce?
It absolutely is trained on loads of "playful" data
Otherwise known as a "sense of humor".
> A human answer would say camels have very little to do with data security.
Correction, a human without a sense of humor would say that camels have very little to do with data security.
Edit: If I'm getting downvotes, I might as well lean into them and make it worth my while...
Otherwise known as a "sense of humor". Something that ChatGPT seems to have that you don't. Oh, snap!
It's easy to see in practice with AI. Look at AI-generated images and how they just can't get human hands right. Everyone knows a human should have 4 fingers and a thumb on each hand. But the AI does not understand that it is drawing a hand with those parameters. It is making something that sort of looks like other things it has been trained with.
The same is true for text generation. A common description for ChatGPT I've seen catch on is calling it "confidently wrong". It will string together words that sound good, but it doesn't actually have any understanding behind them. Its logic is in stringing words together. This is more akin to a good con man faking knowledge than an expert explaining a topic.
No matter how amazing software gets, there's always someone out there reminding us that it's inadequate because it is not yet perfect.
If you want to avoid overselling, call it ML. The machine has learned to reproduce and combine patterns in images.
Now a lot of AI results are coming up, and they're not performing much better in that regard either, though.
You can see that some people have a fundamental misunderstanding about how bicycles work, and this misunderstanding does not stem from the complex 3D shape of bicycles.
Don't you think drawing a hand is a bit of an unfair "gotcha" for an AI that can accurately draw thousands of other things? Humans have hands and they're one of the most important parts of the body; humans use hands for everything. And yet, all but the most artistic humans cannot draw human hands accurately. An AI doesn't have hands. Hands make up a small part of its training data. Why should we consider the hand the standard upon which to judge the quality of the AI?
I guess it would take someone who can look at and interpret the code to figure it out.
We use hands, recognize hands, some of us can draw or sculpt accurate representations of hands, and we can talk about aspects of hands such as grip and finger length. All of these things encompass the idea and physical instance of "hand" in much more than just the artistic sense.
Also, we certainly know when we draw a hand poorly. Unless we are children, or have a brain condition. And if we are children or have a brain condition other adults wouldn't say that we have a good understanding of what a hand is.
Perhaps the AI knows it's drawing hands poorly. Perhaps it can't communicate this to us. Even so this should be in the code run somewhere.
> "while simultaneously saying the AI draws hands poorly because it doesn't understand hands?"
For two reasons:
1) The AI is just drawing. Even humans draw many things that they don't understand. If we draw something that we completely don't understand (such as through random scribbling) we don't even call it a representation. It's a fluke. I used to scribble and then trace images in my scribbling (if possible). Often I ended up tracing things that looked like a child's bad drawing of Donald Duck, but once, without having to trace particular lines at all, my scribbling was a perfect semblance of a rose (with some minor additional flourishes). I recognized the rose, but I certainly didn't set out to draw it.
2) I'm assuming the AI wasn't trained on medical and other information pertaining to a hand, the way humans are (even if the training isn't formal). It is trained on images. At best it is trained on images of discrete parts of the body, but this only allows it to understand the shape and relative position of each body part, at best. Not to understand the body part.
Ultimately it would take something looking at the code run to determine whether or not the AI brings an understanding of the concept of "hand" into the literal picture. If all it's bringing is #1 then it's not an understanding of "hand".
Sure, but the AI doesn't use hands so why would hands be of particular importance to the algorithm? Let's say the algorithm did draw perfect hands every time. Would you accept that it has understanding then? I doubt it. Your argument is essentially a slippery slope: no matter what it can do, you'd find something it didn't excel at and say "See, it has no understanding."
> Also, we certainly know when we draw a hand poorly. Unless we are children, or have a brain condition. And if we are children or have a brain condition other adults wouldn't say that we have a good understanding of what a hand is.
I've seen plenty of people claim they produce great art, when in reality it's terrible.
This "hand understanding" could probably be simplified to an artistic concept of "hand".
I believe a computer "understands" basic arithmetic on a non-conscious level. I haven't been convinced it understands the human hand, even in an artistic sense.
> "I've seen plenty of people claim they produce great art, when in reality it's terrible."
Yes. But other people typically don't go around saying that those people understand great art. Understand what makes it great.
As for the bikes, I'm not surprised most people can't recall the exact shape or design of a mechanical object they don't use or work on every day. I'm sure a hobby cyclist could lay one out accurately.
Is it? Humans understand that humans have hands, yet most are incapable of drawing a hand well. The AI also clearly understands that humans have hands. It does put something hand-like where a hand typically goes. But, like most humans, the AI is not good at accurately reproducing a hand.
What is the definition of "understanding on an intellectual level?" I have a friend who is an artist. She is doing a series of paintings with figures. She cannot paint hands to save her life. The hands she paints look very similar to the eerie vignettes produced by Stable Diffusion. I fail to see what makes the hands my friend paints somehow superior on an "intellectual level."
Edit, to add some context. How many of the hands your friend paints will look like the ones in this link?
I think you're getting way too hung up on the number of fingers. It's clear that the algorithm could do a better job with hands given more training data. There is nothing special about hands. So, if you're right that it doesn't understand hands, then this lack of understanding stems purely from a lack of training data. In which case, I think it's fair to say that it understands the other things it has sufficient training data to draw well.
This is like the joke about training a neural net on arithmetic where you get the wrong answer repeatedly until it remembers to answer 5+5=10. But then, until there is more data, 10+5 is also 10, because it didn't actually understand arithmetic (to be fair, a human wouldn't understand it from a single example either).
And you can see this in action by making ChatGPT believe 5+3=7.
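The joke can be made concrete. Here is a deliberately silly sketch (a lookup table, not an actual neural net) of what "memorized, didn't generalize" looks like:

```python
# A caricature (not a real neural network) of memorization without
# understanding: this "model" answers any sum with the output of whichever
# training example is closest to the query.
train = {(5, 5): 10}  # it has only ever been shown that 5+5=10

def memorizing_add(a, b):
    # Parrot the answer of the nearest memorized example.
    nearest = min(train, key=lambda k: abs(k[0] - a) + abs(k[1] - b))
    return train[nearest]

print(memorizing_add(5, 5))   # 10 — looks like it learned addition
print(memorizing_add(10, 5))  # also 10, because 5+5 is the nearest memory
```

Real models interpolate far more cleverly than this, but the failure mode being joked about is the same shape: right answers near the training data, confident nonsense away from it.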
ChatGPT manipulates symbols, and it captured the rules to do that very well. That is one of the abilities of general intelligence, but it's not the only criterion for intelligence. You can do more than just manipulate symbols: you can also abstract over them, form your own thoughts and questions about them, be curious, reflect on your reasoning and explain it, and deduce patterns from very little data because you have all the context from your previous knowledge and the abstractions built on top of it. Besides the abstractions and functions programmed into it, ChatGPT only has probabilities of symbols being related to other symbols (and the rules implied by that), but it cannot reason about these symbols and cannot form creative thought. Its "intelligence" is limited to a finite order/level of abstraction (the features and parameters that define the model and allow it, for example, to capture shading and geometry, but not the concept of a human hand) while yours is basically limitless. You can always put another abstraction on top of what you just thought or experienced. The magic of deep learning was basically increasing the order of abstraction a neural net can capture, but it's still limited.
On the other hand, I have my pet theory that the weirdness of dreams or psychedelics arises from the brain basically sampling the connections in the brain / piping random noise through its neural net (as a side effect of all the reorganization it’s doing).
Yet it correctly understands that faces typically have two eyes, one mouth, one nose, etc. So clearly this "lack of understanding that hands have five fingers" is unlikely to be inherent to the model.
Let's say I ask you to draw a ladybug. You'll draw a red shell with some black dots haphazardly strewn about. However, the most common ladybug in Europe always has 7 spots. It's unlikely that your drawing will reflect that. Why? Because you lack understanding of Coccinella septempunctata. But does that mean you lack understanding in general? Of course not. Ladybugs simply aren't important to you.
So again, why are we elevating hands to be the litmus test of understanding? Yes, hands are important to humans. But this algorithm is not a human, so hands are no more important to it than anything else it can do. Like let's say if could draw perfect hands 100% of the time. Does that mean you would concede that it has understanding? I doubt it. You'd pick some other thing it didn't do well and say "See, it can't accurately draw eggs stacked in a pyramid, therefore it lacks understanding." The issue with your argument is that is a slippery slope without a specific reason why the correct rendering of hands is important.
And I'm not arguing that GPT-3 or Stable Diffusion are omnipotent. Clearly they're not. But that doesn't mean they can't understand things in their domain. As others have mentioned in adjacent comments, the only test we have for understanding, in humans or ML models, is measuring the correctness of an output for a given input. Essentially, your argument is that "It's an algorithm, it can't understand like a human," which is begging the question.
I'm not claiming that ChatGPT or any other ML algorithm is "generally intelligent." Just that it has an understanding of certain concepts.
It is, we just don't know its exact features. It might very well be optimized for recognizing faces (and therefore to identify the features that make up a face). A general AI doesn't have to be retrained on specifics. Sure, you "can" (in a very generous hypothetical sense of the word) train a model like this on all pictures and movies in existence and then some, so that it has seen everything and never gives the wrong answer for any prompt that only involves things that were depicted at some point in time. You don't have to show a child all hands on the planet for it to recognize that hands have 5 fingers. You don't even have to show children pictures of every body part once for every skin color. They only need to see 1-2 different skin colors once to make the deduction that every body part can come in different skin colors. That's understanding, a general intelligence. Try this with a model like ChatGPT and you get a racist model.
> And I'm not arguing that GPT-3 or Stable Diffusion are omnipotent. Clearly they're not. But that doesn't mean they can't understand things in their domain. As others have mentioned in adjacent comments, the only test we have for understanding, in humans or ML models, is measuring the correctness of an output for a given input. Essentially, your argument is that "It's an algorithm, it can't understand like a human," which is begging the question.
> I'm not claiming that ChatGPT or any other ML algorithm is "generally intelligent." Just that it has an understanding of certain concepts.
We can also inspect the model. And even the outputs are obviously different from what a human would be able to output, so it fails even that test.
If you argue that’s still intelligence, just on a lower level, you can absolutely do that. But at that point you’re basically saying everything is intelligent/conscious just on varying levels. In the sense that consciousness is what consciousness does. Which is a stance I generally agree with, but it’s also unfalsifiable and therefore meaningless in a scientific discussion.
ChatGPT knows that hands have five fingers because it's trained on text and text will almost always say that.
DALL-E doesn't know that hands have five fingers, it just knows what hands generally look like, and the number of fingers on a hand is just one of the elements it tries to match. At a glance, there are far more important elements of what a hand looks like, such as the shading.
Neither of these means either AI is stupid. DALL-E doesn't have a concept of what a hand is, it just has an idea of what it looks like, and it's decent at recreating that.
Children do all the time, and I'm pretty sure children have a rudimentary understanding of hands.
One understands the problem space and used logical reasoning to produce a solution, the other put the problem into Google, hit "I'm feeling lucky" and then parroted the first result right or wrong (or an aggregate of that).
> How else can you verify an entity's intelligence other than its "output"?
I'd argue that's actually a good test. I'd also point out that that is exactly where these solutions let themselves down, because they make [to us] obvious errors and don't understand that they did. That's why you know they are smart sounding rather than actually smart.
> we can't see inside an AI's brain.
We absolutely can see inside a deep learning model's structure.
Sounds a lot like most college essays I've ever read.
A couple of things:
1) College students are in the act of learning, so what they are attempting to do is ideally just at their limit.
2) College students aren't intelligences focused on a single kind of output.
Microsoft et al. want you to think of it as somehow similar to a human, because that framing is convenient for them ("aw, but look, it's learning!"). Of course, if it were even remotely equal to a human, the whole controversy would be gone: they wouldn't be able to use it as they do without its informed consent and fair compensation. Think about how ludicrous that idea is, and there's your answer.
Something is either a copyright violation or not. You can't "launder" a creative work. It's either a copy or it isn't. It's either something that is a direct replacement for something else in the market or it is not.
If I write a book and you buy the book, read the book, sell the book, and write another book about the ideas in my book, you can legally sell the new book you wrote. (Baker v. Selden)
Ideas are not covered by copyright. Ideas are fungible. Creative works are non-fungible.
Money laundering is dependent on money being fungible so the source can be hidden. Hiding the source of a copyright is incoherent.
A "copyright laundering tool" is incoherent.
Reread my comment. If you read my book and write your own, it's OK; if a megacorp provides a tool that ingests my book and automatically writes this book for you (and charges you, and does not pay or credit me), it's not.
Why is it not OK? In addition to generating a derivative work based on my book (not on ideas in it, literally on my book, because again this tool is not conscious and does not operate ideas, only the letters), which depending on the license can be illegal, it also kills incentives for people to publish books at all—because now OpenAI can buy one copy of any book, feed it to its ML tool, then charge people for using it and no one else ever needs to buy this book and give any credit to its author. Implications for open information sharing like blogging and such go without mentioning.
1.) [person does x], it's OK,
2.) if a megacorp [does x], it's not.
Let's just be clear that the content of the argument doesn't matter and that most people in this conversation are guilty of base populism.
In all likelihood, tools like Stable Diffusion and ChatGPT will be found to be copying works within fair use.
The primary argument in favor of fair use is the substantial amount of non-infringing uses for tools like Stable Diffusion and ChatGPT (Sony v. Universal). This does not mean that tools like this are not capable of producing works that infringe just that the liability will be on the person choosing how to use the tool and choosing what to do with what the tool produces. By the time this goes to trial there will be entire industries built on significant non-infringing use of large language models that have nothing to do with the scope of the original copyrighted work and this will have a large impact on considerations of fair use!
> substantial amount of non-infringing uses
Your uses may or may not be infringing, but you can use something that infringes to create something that does not infringe. The infringement is at least done by OpenAI etc. when creating the tool you use, even if your use of the tool does not infringe.
I can train my own Stable Diffusion. So can you. So can everyone! And we can do so in the privacy of our own private properties and without anyone knowing what we’re doing with all of the images coming through our connections to our ISPs.
No one would really know if anyone was creating images with or without Stable Diffusion. People would need to bring an image that is significant evidence of copyright infringement in order to get a court to issue a warrant to search for Stable Diffusion on certain premises… but they can't, because the suspected image doesn't look anything like the plaintiff's to begin with.
One of us is running into a brick wall in this conversation and that brick wall is “has read related case and statutory law”.
What I’ve been doing is pretending we’re having an objective conversation about copyright so logically I’m treating this closer to a mock trial. I’m not actually sure what you’re doing but I would recommend referencing, you know, the extensive evidence you have to support claims of what is or isn’t infringing.
This is not something that our courts could ever decide. How would they know if I did not use a LLM tool to assist in writing my new book?
> Why it’s not OK? In addition to generating a derivative work based on my book (not on ideas in mine, literally on mine, because this tool is not conscious and does not operate ideas, only the letters), which is already obviously illegal
That's not what constitutes a derivative work in the sense of copyright infringement. A derivative work is still a derivative work no matter how it was made. However, it is perfectly legal to, say, go to Disney's website, download a JPEG, convert that JPEG to 1s and 0s, print just a bunch of 1s and 0s and not the image, just like a printing press made up of just 1 and 0 character blocks, and sell that. Yes, the 1s and 0s are mathematically derived from the image, but the image of 1s and 0s is not a visual derivative of the Disney image. That is, no one is going to buy a t-shirt of 1s and 0s instead of a Mickey t-shirt. Anyone can go to the Disney website and get those same 1s and 0s.
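For anyone wondering what "convert that JPEG to 1s and 0s" means mechanically, it is just dumping each byte of the file as its 8-bit binary form. A sketch using a few literal bytes (the JPEG start-of-image marker) instead of an actually downloaded image:

```python
# "Convert that JPEG to 1s and 0s": render each byte of the file as its
# 8-bit binary representation. Using three hard-coded bytes here rather
# than fetching a real image.
jpeg_bytes = bytes([0xFF, 0xD8, 0xFF])  # the first bytes of any JPEG file
as_bits = "".join(f"{b:08b}" for b in jpeg_bytes)
print(as_bits)  # → 111111111101100011111111
```

The transformation is perfectly reversible, which is exactly why the legal question above turns on market substitution and visual similarity rather than on whether the bytes were "mathematically derived".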
> it also kills incentives for people to publish books at all—because now OpenAI can buy one copy of any book, feed it to its ML tool, then charge people for using it and no one else ever needs to buy this book and give any credit to its author. Implications for open information sharing like blogging and such go without mentioning.
There are lots of non-infringing uses for such a tool in the marketplace and this will factor into fair use. (Sony v Universal)
The law does not concern itself strictly with "exact copies". You are missing the essential concepts of "substantial similarity" and "representation". In Britain at least, it has been found that a photograph of a famous painting does not obtain its own copyright. Conversely, one cannot take a .wav file, convert it to an .mp3, and claim that because the representations are totally different the "new work" doesn't violate the original.
The OP you are arguing with turns a useful phrase with "copyright laundry", because it's clear that GPT tools can be, and are being, used for exactly that.
Trying to move the argument to a higher level, I believe this legal argument will ultimately be solved by a technical matter about noise.
Everyone talks about the training data. Nobody talks about the noise. But that is the KEY (literally).
Because there is nothing non-deterministic going on, but each run seems to create something "new" a clearer interpretation of the system-signal relation is needed. With exactly the same initial conditions we can treat the seed noise as a decryption key. The law will very probably eventually settle on an interpretation of large models as an encoding mechanism whose seed noise is the other half of a pair, which together constitute a transformation/transcoding of the original source material.
The onus is going to fall on the creator to show how their "creative process" is substantially different from transcoding a jpeg to a png through a set of filters to obscure the source.
But let's talk about what is different.
It is not a copyright infringement if I go to Disney's website, download a JPEG, convert that JPEG to 1s and 0s, print just a bunch of 1s and 0s and not the image and not ASCII art of the image, just like a printing press made up of just 1 and 0 character blocks, and sell that. Yes, the 1s and 0s are mathematically derived from the image, but the image of 1s and 0s is not a visual derivative of the Disney image. That is, no one is going to buy a t-shirt of 1s and 0s instead of a Mickey t-shirt. Anyone can go to the Disney website and get those same 1s and 0s.
This is different than the case where you're talking about a .wav and an .mp3, right?
> Trying to move the argument to a higher level, I believe this legal argument will ultimately be solved by a technical matter about noise.
> Everyone talks about the training data. Nobody talks about the noise. But that is the KEY (literally).
> Because there is nothing non-deterministic going on, but each run seems to create something "new" a clearer interpretation of the system-signal relation is needed. With exactly the same initial conditions we can treat the seed noise as a decryption key. The law will very probably eventually settle on an interpretation of large models as an encoding mechanism whose seed noise is the other half of a pair, which together constitute a transformation/transcoding of the original source material.
This is much more interesting than just copyright as this has to do with authorship itself!
The combination of the seed and the text inputs in either ChatGPT or Stable Diffusion result in a deterministic output. This means that these are facts. Who owns the output of a seed of "1" and a text input of "astronaut riding a horse"? Who owns what is able to be seen at a certain slice of the sky?
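The determinism claim is easy to demonstrate in miniature. This sketch (a toy stand-in for Stable Diffusion, with made-up "pixels") shows that a fixed seed plus a fixed prompt is a pure function, so anyone running it gets identical output:

```python
import hashlib
import random

# Toy stand-in for a generative pipeline: the seed and prompt fully
# determine the output. The "pixels" here are just pseudorandom bytes.
def generate(seed: int, prompt: str, n: int = 8) -> list:
    key = hashlib.sha256(f"{seed}:{prompt}".encode()).hexdigest()
    rng = random.Random(key)
    return [rng.randrange(256) for _ in range(n)]

a = generate(1, "astronaut riding a horse")
b = generate(1, "astronaut riding a horse")  # same seed, same prompt
c = generate(2, "astronaut riding a horse")  # different seed

# Same seed+prompt reproduces exactly; a different seed diverges.
print(a == b, a == c)
```

Real diffusion models also depend on the model weights and sampler settings, but the point stands: given identical inputs, the output is a reproducible fact rather than a spontaneous creation, which is what makes the authorship question interesting.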
Someone could still be found to be infringing on copyrights when they slap certain things on t-shirts that happen to have a deterministic seed and prompt, but copyright is a separate concern from authorship.
If someone took a picture of a certain stellar formation somewhere that looked sort of like a cartoon head, and then they put a little cartoon speech bubble on it and made an illustrated novel based on such characters... and then if I take my own pictures of those same stellar formations and publish a book of astrophotographs... or if I make my own illustrated novel with different characters but using the same stellar formations... then however close that line can currently be toed with photographs of stellar formations is where we must be with photographs of seed/prompt formations.
> The onus is going to fall on the creator to show how their "creative process" is substantially different from transcoding a jpeg to a png through a set of filters to obscure the source.
I'm not familiar with any legal doctrine that follows this logic so I think any lawyers are going to have their defendants focus on establishing useful interpretations of their actions for the court.
As for how interpretations are finally arrived at, I'd just say that the law has an uncanny way of configuring itself to suit the powers of the day. :)
Sounds smart: pasting words together with little to no sense about whether the result is insensible or incorrect.
Smart: Actually having ideas, expressing them, and being able to recognize errors in that process.
That's a problem for the future, not now. We're talking about something like ChatGPT, not some hypothetical strong AGI.
And what we have now spits out error-ridden nonsense that "sounds smart," and fantastic sci-fi covers where the people accidentally have 14 fingers, or two legs and three feet.
Humans will say things such as "doohickey", "you know, this thing here", or "what if there was something that did this" when they know the concept but don't have the words. https://www.girlgeniusonline.com/comic.php?date=20121121
In the reverse, what's smart about the "turbo encabulator" lecture is the form, not the content. And the person giving the lecture absolutely knows this.
ChatGPT can form a lecture, and it can form it with correct information. But until it can form it with concepts that it doesn't have words for I don't see how you can say it "is smart".
See also https://en.wikipedia.org/wiki/Hyperlexia
At present, GPT3 and the rest of the language models generate bullshit: they have no understanding or care of whether the words they put together are true or not. We will need to see if future work can overcome that problem.
To me the difference comes down to the answer to the question: is the output useful or not?
Let's say I ask AI questions I already know the answer to. I'll use an example from this thread: Are camels good at cybersecurity?
If the AI answers no, I already knew that. I get no new useful information about camels out of this interaction.
If the AI answers yes, I know it's wrong. But again, there's no new useful information about camels there for me. (Yes, I now know the AI stinks, but that's not my point; my point is I'm asking about camels.)
So asking AI questions you already know the answer to is pointless. The usefulness (and the goal) of AI is to ask it questions you don't know the answers to, to uncover new truths and insights.
If I don't know the answer to a question, how am I supposed to trust that the AI got the answer right? Especially when AI is demonstrating over and over again that it gets the answers wrong. It's clear that AI can't reason, and its results are unreliable to anyone who doesn't know the answer already.
So from a practical standpoint, these AIs can't tell you anything you don't already know. They can't be relied upon to teach or tell us new things. They're an unreliable source of information and should not be relied upon to determine truths. Interestingly formatted, low-quality information. All noise, little signal.
This to me is the difference between "sounds smart" and "is smart". It's about whether or not you can trust the quality of the output to be accurate and correct.
I've met plenty of humans that "sound smart". We call these people bullshitters. AI is at the well spoken BS stage. At best it can smooth talk uninformed people into believing falsehoods. In the age of misinformation, that makes it dangerous in my opinion. God help us if one of them ends up running for office, as it will out-BS all human politicians.
It's Artificial Intelligence after all. If the AI output is unreliable, then the AI isn't intelligent.
"sounds smart" is using the vocabulary in a specific way that gives some people the impression of smartness. "being smart" is more like transporting a complex idea to the brains of other people, your words and sentences need to align quite well for this to work out. I have not seen such a thing from AI yet.
Being smart is not just about knowing things, it is about being able to use that knowledge. You don't have to look inside someone's head to learn how well they can do that.
This really does encapsulate the average Hacker News commenter.
That's debatable, and it's not necessarily true. Just the other day, we saw a story here (https://news.ycombinator.com/item?id=34474043) that suggested that a language model trained on Othello game transcripts did in fact build a latent-space "mental model" of the board state. By perturbing the latent space, the researchers could argue that the model wasn't simply parroting memorized game sequences.
Of course, this isn't to say that AI plagiarism is impossible or even unlikely. Plagiarism is a shortcut to acceptable quality output, and AI systems excel at finding shortcuts to acceptable output.
As for how this relates to plagiarism... well, it doesn't really. Someone can "seem smart" and also not be guilty of plagiarism. Someone can "be smart" and be guilty of plagiarism.
I've learned from personal experience that starting a reply with "You're wrong" is an anti-pattern most of the time. Instead, framing additional context/knowledge constructively can enrich conversation rather than shutting it down (and making things unnecessarily combative).
But to your point: We're both saying the same thing, I'm just being reductive on purpose.
Only rarely does that produce useful original thoughts, but I doubt that's any different from humans, really? The difference is that humans can second-guess themselves.
Google and Facebook of course do offer "ML-based conversion optimization", but it's widely regarded as hoopla by ad buyers.
Clickbait articles follow a formulaic format.
LOL, give me a break. No respectable technology professional considers CNET the bastion of tech journalism. Even before I knew that their articles are written by AI I never would rely on CNET as any kind of authority on matters technical.
Most 'pure' tech journalism does not last long as the market for it is too niche. BYTE magazine was dead before the Internet. The UK-based "Net" magazine (webdev-based) lasted until 2016. PC World is still kicking but runs on a skeleton crew and is online-only.
CNET has changed formats many times, and the well-known editors who used to make it worthwhile moved on years ago. That it's trundling on with computer-generated articles is I guess a kind of ironic curtain call.
Until 2020 it was owned by a $15 billion company (Paramount) and is now owned by a smaller media group also worth billions. I wouldn't really call that independent.
Objective and balanced information about what happens in tech was always a problem, as those sites and entities "never bite the hand that feeds them", but it seems to get harder by the day as we drown in an ocean of fakery.
But CNET is just the canary in the coal mine; I haven't gone there for tech coverage in over a decade because of the bland writing and the lack of clear editorial voice. The problem for me will be when the entire internet reads like this.
I partially blame the iPhone for this: it killed the idea of single-purpose gadgets like MP3 players and handheld consoles, which usually had their own quirks. With those products gone (and CES looking much more barren), the articles became more about stuff like Tesla and Uber, or what app updates Google and Apple were pushing out that week.
As I recall, there were shifts already in place. The general popularity of journalists/analysts/etc. having essentially personal blogs under their parent company brand was already falling out of fashion as companies realized their own brands were being diluted. But CBS was obviously uncomfortable specifically with non-staffers using the parent platform.
They really had two of the most highly coveted URLs on the internet for a long time.
Near the end of the 19th century, all data processing was by human clerks. Tabulating machines, and later computers, allowed these resources to be applied to more productive pursuits.
What's the biggest drain of resources today? At least in the developed nations, a good candidate is lies and lack of integrity in decision-making, taken in its most general definition. Whether you are personally most irritated by its presence in government or private industry, it's obvious that this is widespread.
Does AI look like it is going to reduce this waste? No, it looks like it's going to add to it.
No, it will be easier to detect, because the new AIs will learn to mimic earlier generations of AI - and their faults - rather than mimicking human writing, because that will score better on the loss function.
I'm going to assume Dead Internet theory - aka most text online is spambots. It's not strictly true, as humans are obstinately continuing to use the Internet and pouring content into it. But the Internet puts those humans on a level playing field with bots. And AI is basically perfect spambot material - it looks superficially different, meaning you can't remove it with simple patterns. So the training set, which is just scraped off the Internet, will be primarily composed of AI-generated material from less-sophisticated systems. Human-written content will be in the minority of the training set, meaning that no matter how good the optimizer statistics or model architecture are, the system will be spending most of its time remembering how to sound like a bad AI.
We need AI discriminators that analyze content and filter out stuff that is poorly written, cliche, trite and derivative.
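To make that concrete, here's a toy sketch of what the simplest possible version of such a filter could look like. Everything in it - the cliche list, the scoring weights, the threshold - is invented for illustration; a real discriminator would be a trained classifier, not hand-written rules:

```python
# Toy "derivativeness" filter: score text on crude signals (vocabulary
# diversity, stock phrases) and reject anything below a threshold.
# All constants here are arbitrary choices made for the example.
CLICHES = {"in today's fast-paced world", "game changer", "at the end of the day"}

def derivative_score(text: str) -> float:
    words = text.lower().split()
    if not words:
        return 0.0
    diversity = len(set(words)) / len(words)        # 1.0 means no repeated words
    cliche_hits = sum(c in text.lower() for c in CLICHES)
    return max(0.0, diversity - 0.2 * cliche_hits)  # penalize each stock phrase

def keep(text: str, threshold: float = 0.5) -> bool:
    """Return True if the text passes the (toy) quality filter."""
    return derivative_score(text) >= threshold
```

A real system would need to be adversarially robust, since content farms would immediately optimize against whatever signals the filter uses.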
I’m used to seeing papers show pairs as evidence that their models aren’t copying: generated images and the closest images in the training set. But that wouldn’t catch this kind of rephrase-and-mashup pattern’s visual analog. Has anybody looked at that closely?
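The nearest-neighbor comparison those papers run looks roughly like the sketch below: embed the generated sample and every training sample, then pull the closest training items by cosine similarity. (The embeddings here are random stand-ins; a real check would use a perceptual or learned embedding.) The limitation the comment points at is visible in the output shape: a mashup shows moderate similarity to several sources rather than one near-duplicate, which a single closest-image comparison can miss.

```python
import numpy as np

def top_k_neighbors(query: np.ndarray, train: np.ndarray, k: int = 3):
    """Return the k nearest training items to `query` by cosine similarity,
    as (index, similarity) pairs sorted from most to least similar."""
    q = query / np.linalg.norm(query)
    t = train / np.linalg.norm(train, axis=1, keepdims=True)
    sims = t @ q                          # cosine similarity to every train item
    idx = np.argsort(-sims)[:k]
    return list(zip(idx.tolist(), sims[idx].tolist()))
```

Inspecting the top-k list (rather than just the single nearest neighbor) is the natural extension for catching rephrase-and-mashup: several moderately-similar sources is itself a signal.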
What people call AI today is in fact not AI. It is a model that has been trained on input data, and generates prompted output based on that input.
In other words THEY ARE AUTOMATED PLAGIARISM MACHINES. THAT IS HOW ALL "AI" today work.
Computer-generated articles are routine for boilerplate like financial summaries, sports results, and police reports, but now they're creeping into more substantive articles.
All these morons hunting with a bone to pick are going to be the reason why language is going to be so obtuse for so many.
Fortune, one of the "victims" of this "plagiarism", makes its money by subscribing to the New York Times et al., paraphrasing their articles, and placing ads next to the paraphrases.
Journalism will survive. Journalism will always survive.
Shitty content mill “journalists” should probably get the AI to pad out their resumes for them.