How can one create content that only humans can process (read, save, edit, reply), avoiding AI scraping and analysis? In an age of advanced computer vision and audio processing, what methods could ensure information remains obscure to AI but readable by humans?
Pen and paper. You can never ensure that the information remains obscure, but if you keep the original documents away from a computer, and share them via post, they will only ever be seen by human eyes. You can’t ensure that someone won’t scan them at some point though.
States and corporations are investing billions of dollars into creating the infrastructure to train on everything, everywhere, all the time. As AI on the web runs out of data to train on, the need for organic training data will necessitate merging AI with global surveillance capitalism and the panopticon, and training on the real world. You're not going to be able to just opt out.
You may not be recording your conversations but someone will be. Everything you write will be scanned, and every camera will be training an AI on everything it sees.
That’s an interesting question, I had never really considered….
There’s probably no definitive, 100% safe way, but one possibility might be to exploit some quirk of human perception (kind of like what optical illusions do), though of course there’s no guarantee that a sufficiently advanced future AI won’t be able to read it (or enslave humans to translate it).
This requires cooperation of all the humans who process the information. Any human that processes it could transcribe it for an AI. It's like trying to stop corporate information leaks.
Create an encoding system based on various unconventional modifications to physical objects, do not document it anywhere, teach it to a small group of humans who are contractually bound to not spread the knowledge, and keep the knowledge going into the next generation along with the sense of purpose.
Unless the encoding system was miraculously complex and the amount of content produced with it remarkably small, reversing the encoding in order to process the data seems highly plausible, especially if the input to the cipher was typical of human generated content.
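To illustrate how thin that protection is, here is a toy sketch (my own simplification, using a Caesar shift as a stand-in for a simple undocumented encoding, not a model of the physical scheme described above): if the plaintext is typical English, plain letter-frequency statistics recover it with no knowledge of the key.

```python
# Toy demonstration: a simple substitution-style cipher falls to
# statistics. Here we crack a Caesar shift by assuming the most
# frequent ciphertext letter stands for 'e'.
from collections import Counter

def caesar(text, shift):
    """Shift each lowercase letter by `shift` positions (mod 26)."""
    out = []
    for ch in text:
        if ch.isalpha():
            out.append(chr((ord(ch) - ord('a') + shift) % 26 + ord('a')))
        else:
            out.append(ch)
    return ''.join(out)

def crack(ciphertext):
    """Guess the shift from letter frequencies, then undo it."""
    letters = [c for c in ciphertext if c.isalpha()]
    top = Counter(letters).most_common(1)[0][0]
    shift = (ord(top) - ord('e')) % 26
    return caesar(ciphertext, -shift)

msg = "the secret meeting takes place at the old theatre near the river"
ct = caesar(msg, 7)
assert crack(ct) == msg  # recovered without ever seeing the key
```

A hand-taught physical encoding would be harder than this, but the attack scales the same way: the more content produced with a fixed scheme, the more statistical structure leaks out.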
I think this is like asking "how do I encode binary that computers can't read?" or "how do I make tangible change that nobody can notice?"
You can't. By thinking, writing, speaking, or gesturing we generate photographic, textual, and audible information that can be parsed in multiple different ways. AI is simple and adaptable; however we fool it today becomes tomorrow's training benchmark.
Besides cipher encryption, I really don't think there are any ways to guarantee that AI cannot understand you. Most methods end up ensuring that humans can't understand you either.
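That trade-off is easy to see with even the most minimal sketch of encryption (a toy XOR pad, my own illustration, not something to use for real secrets): the ciphertext is opaque to machines and humans alike, and only the key restores meaning.

```python
# Toy XOR "one-time pad": without the pad, the ciphertext is opaque
# to any reader, human or machine; with it, decryption is exact.
import os

def xor_bytes(data, pad):
    """XOR two equal-length byte strings together."""
    return bytes(a ^ b for a, b in zip(data, pad))

message = b"meet me at noon"
pad = os.urandom(len(message))      # random key, same length as message
ciphertext = xor_bytes(message, pad)

# Round-trip: applying the same pad again recovers the plaintext.
assert xor_bytes(ciphertext, pad) == message
```

Which is exactly the point: encryption doesn't distinguish human readers from AI readers, it distinguishes key-holders from everyone else.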
I suppose you could have said the same thing about the invention of books, that would share the dangerous knowledge of kings and sages with the unstable and unwashed masses.
Humans had natural information barriers. A foreign power invading a secret library would need significant time and resources to process its contents, often missing crucial details. This created an asymmetry in knowledge and data processing capabilities.
Today, AI can analyze enormous amounts of data, eliminating this asymmetry and opening a gap between its capabilities and our own. How can we maintain some level of 'information obscurity' or processing advantage against AI? Are there any methods that remain challenging for AI to interpret but are accessible to humans?
I was thinking more along the lines of creating "DRM-like" content in an obscure format that might withstand scraping for a while ... shrug
I just don't understand the question, I guess. By sharing something online you are removing its information barriers almost entirely. That means both humans and AI have plausible and complete access to what you've made. If you don't want people or AI to have access to your content, then don't put it on the internet. I don't think you can have your cake and eat it too, there is no "stop humans from reading about me online" button and certainly no such thing exists for AI either.
So what I've done is remove my works from the open web and put it behind a login wall. If you want an account, I have to be certain that you're a human being and that you will not proceed to put my work somewhere where it can be scraped by an AI bot.
Which, in practice, means that I have to personally know you. I don't know of any other solution to this problem at this time.
Such schemes likely exist, but you would be hard pressed to find any system that wouldn’t also be burdensome for real humans to deal with. This is basically the same thing as CAPTCHA.
I’m curious whether you’re more concerned about the corpus of your work being used in the training of a model, or about parts of your work being analyzed by an existing model. For the former, I suspect copyright laws will eventually come around that afford some protection. But as for having something summarize your work, I sense that no copyright laws will be made for that.
We seem to have read different chains of thought in this thread. The topic was "is it possible to encode things so that humans can understand but machines can't"?
Talldayo said "I really don't think there are any ways to guarantee that AI cannot understand you."
neon_me responded: "hence, we are doomed"
Talldayo responded "I suppose you could have said the same thing about the invention of books"
My comment was in response to that. Talldayo's response seemed a non sequitur to me because books are providing data to the reader, not trying to collect and understand data from the reader.
Yeah, computers tend to hear differently than humans do, so you can cater some sort of message to them that humans won't process but they will. See https://www.mdpi.com/2079-9292/12/8/1928 as an example.
There have also been attacks on computer vision systems (like in cars) that can make them suddenly brake or misidentify a lane or street marker, etc., but in ways not obvious to humans: https://adversarial-designs.shop/blogs/blog/adversarial-patc... or to fool facial recognition systems into thinking you're someone else (in a way that won't fool anyone who actually knows you).
And more broadly, any sort of obscured malware tries to deliver a malicious payload while pretending to be something else to the human who runs it.
As a species, we "seem unable" to develop any kind of information that would remain exclusively ours, especially when faced with a potential rival that processes data exponentially faster and with greater precision than we do.
Off the top of my head, not especially. The problem of rivals that can process data faster and with greater precision is a problem that existed before AI was used/commoditized.
I suppose I am concerned about AI using information to specifically target vast numbers of people at scale, based on their psychological traits/desires/vulnerabilities. Esp for political, psyops, dark pattern marketing, etc.
That's not really an AI-specific problem, though. Any sort of hostile codebreaker is the same sort of threat.
What sort of information would you want to protect from AI but not other humans? If it's a secret, isn't it a matter of who gets to see it, not necessarily what? I'd sooner trust "our AI" than "their human".
I'm looking for a different platform/format for the data rather than inventing a language - because cracking a language/code/cipher is far easier for machines than for humans.
This doesn't follow the read, save, edit, reply notion, but I have seen Nightshade for images. I am unsure of its effectiveness. I heard an artist friend mention it when Midjourney got popular last year. The term they use is that the data being shown or uploaded should be "poisoned".
Hardcopies are usually written and shipped to print in electronic form, and there is no guarantee that someone won't take photos/scans of the "dead trees" ... so this option is sadly off the table.
They tend to be on a separate network. But the moment you access them, presumably a sufficiently advanced AI can watch the flicker in the lights, measure the temperature changes in the room, or watch the colour on your face and have a good guess as to what your orders were.
What rank prejudice is this? How will our children, meat or electrical, ever grow if we refuse to teach them?
I suspect the real issue you wish to address might be expressed better. Perhaps "How can we ensure that large AI companies haven't got favorable intellectual property rights over individuals' output?"