Spotting LLMs with Binoculars: Zero-Shot Detection of Machine-Generated Text (arxiv.org)
161 points by victormustar on Jan 23, 2024 | 99 comments



I can talk a lot about this, since this is the space I've spent a lot of time experimenting in. All I will say is that all these detectors (a) create a ton of false positives, and (b) are incredibly easy to bypass if you know what you are doing.

As an example, one method that I found works extremely well is to simply rewrite the article section by section with instructions that require it to mimic the writing style of an arbitrary block of human-written text.

This works a lot better than (as an example) asking it to write in a specific style. If I just say something along the lines of "write in a casual style that conveys lightheartedness towards the topic", it is not going to work as well as simply saying "rewrite mimicking the style in which the following text block is written: X" (where X is an example of a block of human-written text).
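For illustration, here is a rough sketch of what that kind of section-by-section rewrite could look like as a script. The `openai` client usage, the model name, and the prompt wording are my own assumptions, not necessarily what was actually used:

    # Sketch of the "rewrite mimicking a human style sample" approach described above.
    # Model name and prompt wording are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI()

    STYLE_SAMPLE = "..."  # a block of human-written text to imitate
    SECTION = "..."       # one section of the generated article

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                "Rewrite the following section, mimicking the style in which "
                f"the example text is written.\n\nExample text:\n{STYLE_SAMPLE}"
                f"\n\nSection to rewrite:\n{SECTION}"
            ),
        }],
    )
    print(resp.choices[0].message.content)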

There are some silly things that will (a) cause human-written text to be detected as AI and (b) allow AI text to avoid detection. For example, using a broad vocabulary tends to make detectors flag the text as written by AI. So if you are using Grammarly to "improve your writing", don't be surprised if it gets flagged. The inverse is true too: if you use statistical analysis to replace less common expressions with more common ones, AI text is less likely to be detected as AI.
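To make the "broad vocabulary" point concrete: one rough way to measure it is word-level corpus frequency, which also gives you candidates to swap for more common expressions. A minimal sketch, assuming the third-party `wordfreq` package and an arbitrary Zipf threshold:

    # Sketch: flag unusually rare words in a text using corpus frequencies.
    # Requires the third-party `wordfreq` package; the Zipf threshold of 3.0
    # is an arbitrary illustrative choice.
    import re
    from wordfreq import zipf_frequency

    def rare_words(text: str, zipf_threshold: float = 3.0) -> list[str]:
        words = re.findall(r"[a-zA-Z']+", text.lower())
        return [w for w in words if zipf_frequency(w, "en") < zipf_threshold]

    print(rare_words("The report elucidates a plethora of salient considerations."))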

If someone is interested, I can talk a lot more about the hundreds of experiments I've done by now.


> I can talk a lot about this, since this is the space I've spent a lot of time experimenting in.

So I'm a researcher in vision generation and haven't read too much about LLM detection but am aware of the error rates you mention. I have questions...

What I'm absolutely surprised by is the use of perplexity for detection. Why would you target perplexity? LMs are minimizing NLL/entropy. Instruct-based models are tuned even further in that direction, such that you're minimizing the cross-entropy against human output (or at least human-desired output). Which makes it obvious that it would flag generic or common patterns as AI generated. But I'm just absolutely baffled that this is the main metric being used, and in the case of this paper, the only metric. It also gives a very easy way to fool these detectors, since it suggests that just throwing in a random word or spelling mistake would throw off detection, given that such actions clearly increase perplexity. To me this sounds like using a GAN's discriminator to identify outputs of GANs (the whole training method is about trying to fool the detector!). (Obviously I'm also not buying the zero-shot claim.)
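For readers unfamiliar with the metric: perplexity here is just the exponentiated average negative log-likelihood that a scoring model assigns to the text's own tokens. A minimal sketch (gpt2 is used purely for illustration; actual detectors use larger scoring models):

    # Sketch: perplexity of a text under a causal LM.
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def perplexity(text: str) -> float:
        ids = tok(text, return_tensors="pt").input_ids
        with torch.no_grad():
            # labels=input_ids makes the model return the mean NLL of the text
            loss = model(ids, labels=ids).loss
        return torch.exp(loss).item()

    print(perplexity("The quick brown fox jumps over the lazy dog."))

Low perplexity means the text is unsurprising to the scoring model, which is exactly the property LMs are trained to maximize in their own outputs.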


Yeah, agreed. In my experience what it’s ended up detecting is very crappy human-written text.


I will also add that, at least for now, if you are doing it for SEO, it _really_ doesn't matter. I was planning to make a case study benchmarking my algo against a bunch of other content generators. I was hoping for a statistically significant difference, but there was none. So the thing that matters in the long run is whether the end-users find your content valuable, because that's ultimately how Google will decide whether to send more traffic to your content, rather than trying to detect if it was "AI generated".


I think the value of this is the extremely low false positive rate, so it can act as a larger sieve when there is a large amount of input to test - what other Binoculars-style detectors have you experimented with where you're seeing a "ton of false positives"?


I use https://originality.ai/ as the benchmark. I've tested all commercially available services, and Originality (at the time; it's been a few months) provided the lowest false-positive rate. As a testing sample, I've built a database of articles written by various text generators and compare them against articles that I scraped from the web from before 2017 (basically any text from before LLMs saw daylight).

I am sure that these algorithms have evolved, but given my past experiments, I sincerely doubt that we are at a point where they (a) cannot be easily bypassed if you are targeting them, and (b) do not create a lot of false positives.

As stated in another comment, I personally "gave up" on trying to bypass AI detection [it often negatively impacts output quality], at least for my use case, and focus on creating the highest-possible-value content.

I know that services like Surfer SEO are continuing to actively invest in bypassing all detectors. But... as a human, I do not enjoy their content, and that's what matters the most.


Just for fun, I tested a few recently generated articles with https://huggingface.co/spaces/tomg-group-umd/Binoculars (someone linked it in this thread) and it ranked them as "Human-Generated" (which I assume means human-written). And... I am not even trying to evade AI detection in my generated content. I was wholeheartedly expecting to fail. Meanwhile, Originality detects my AI-generated content with 85% confidence, which is... fair enough.


If I'm reading this correctly, it's not making any particular claim with respect to text labeled human generated. What it's saying is that if it claims the text is machine generated, it's highly likely that it actually is.


The article you're commenting on actually states in its abstract:

> Over a wide range of document types, Binoculars detects over 90% of generated samples from ChatGPT (and other LLMs) at a false positive rate of 0.01%, despite not being trained on any ChatGPT data.


Since you say you're knowledgeable on this, here's a question: If you have access to the model, wouldn't it be possible to inspect the sequence of token probabilities for a piece of text and derive from this a probability that the text was produced by that model at a given temperature? It would seem intuitive that the exact token probabilities are model specific and can be used to identify a model from its output given enough data.

I suppose an issue with this might be that an unknown prompt would add a lot of "hidden" information, but you could probably start from a guess or multiple guesses at the prompt.
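A rough sketch of that idea, assuming a HuggingFace causal LM (gpt2) stands in for "the model" and the temperature is a guess: rescale the logits by the assumed temperature and look at the log-probability the model assigns to each actual token.

    # Sketch: per-token log-probabilities of a text under a candidate model,
    # with logits rescaled by an assumed sampling temperature.
    # gpt2 and temperature=0.7 are placeholders for illustration only.
    import torch
    import torch.nn.functional as F
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def token_logprobs(text: str, temperature: float = 0.7) -> torch.Tensor:
        ids = tok(text, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits[:, :-1] / temperature
        logprobs = F.log_softmax(logits, dim=-1)
        # log-probability of each actual next token under the scaled distribution
        return logprobs.gather(-1, ids[:, 1:, None]).squeeze(-1)

    lp = token_logprobs("Some text whose provenance you want to test.")
    print(lp.mean().item())  # closer to 0 = text looks more "typical" of this model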


That's pretty much how most of these methods work. It just doesn't work very well, because good models have a reasonable probability of generating lots of different texts, so you don't get very different numbers for AI- and human-generated texts. After all, the models are trained to learn the probability distribution of exactly human text.


It can be useful for small-scale verification in academia - TAs and schoolteachers can use it to check that the assignments and homework submitted were actually worked on. Yes, a student can spend more time and brains on making it look authentic despite using LLMs, but at that point you've already gone past a typical tardy student's usage pattern - if she is too lazy to do her homework, she can safely be assumed to be too lazy to spend time refining her prompts and weights as well.


I would not want to trust grades, in some cases even decisions about pass or fail, to a system which is prone to false positives.


Agreed, we shouldn't trust the system, but using it as a bloom filter to flag those that should be reviewed manually seems warranted.

If all we're getting is false positives then it can be used to reduce the workload.

If we also get false negatives then we'd be better off using existing techniques (manual or otherwise).


How do you do this manual review? How can a human spot LLM-generated text? The internet is full of horror stories of good students getting failing grades due to false positive LLM detectors where the manual review was cursory at best.


Or you know, assess people fairly face to face.


Which we know to be unfair due to learned biases...


I am curious actually! In general about your experiments, but also about integrating this detection algorithm to wider systems. Did you run any autogpt-like experiments with the AI generated text as a critique? My use case is a bit different (decision-making), so I play with relative plausibility instead of writing style. But I haven't found convincing ways of "converging" quite yet, i.e. benchmarks that don't rely solely on LLMs themselves to give their output.


To clarify, the style experiment I've referenced earlier was just that – an experiment. I did not implement those methods into my software. Instead, I focused on how to eliminate things like 'talking with authority without evidence', 'contradictions', 'talking in extremely abstract concepts', 'conclusions without insights', etc.

If you need a dataset to benchmark against, download any articles from pre 2017. There are a few ready-made datasets floating around the Internet.


Grammarly is used a lot by non-native English speakers translating their papers to English. I wonder how difficult publishing papers would be if AI checks become commonplace in the future.


Please go into more detail on those experiments!


Is a layman's interpretation of this to state: LLMs tend to perform like aggregated humanity, but any given human will differ. Since all the volume of a high-dimensional sphere is at the edge, almost nobody is like the mean, so the false-positive-rate is low?

It's a clever plan, until the LLMs do some adversarial training....


>It's a clever plan, until the LLMs do some adversarial training....

it's an unwinnable war.


It depends on what you're trying to achieve. For a spam filter that downranks low-effort noise (or equivalently, ranks more interesting stuff higher), it might be useful enough?


This is a super clear explanation!

Perhaps this measurement approximates a human reaction to chatGPT: 'This writing is distinctly indistinct.'


Made me think of how bomber plane pilot seats were designed for the average human, which meant they fit no human.


Sounds like the bed of Procrustes: https://en.wikipedia.org/wiki/Procrustes#Mythology


Yeah it kind of sucks for people who don't like these systems that efforts to resist them are essentially the same as using GANs to train them.


https://www.smithsonianmag.com/history/what-the-luddites-rea...

> They did not invent a machine to destroy technology, but they knew how to use one. In Yorkshire, they attacked frames with massive sledgehammers they called “Great Enoch,” after a local blacksmith who had manufactured both the hammers and many of the machines they intended to destroy. “Enoch made them,” they declared, “Enoch shall break them.”

... And another reference for the phrase...

https://www.nigeltyas.co.uk/nigel-tyas-news/post/enoch-the-p...

> And here’s the funny thing. The weapons they reached for to wield and smash the machines were sledge hammers made by ... the Taylor brothers of Marsden. This irony was not lost on the Luddites and as they swung ‘Enoch’s hammers’ to damage his hated machines they cried: “Enoch made them, and Enoch shall break them”.


Fighting fire with fire.


Yes, and do we even need to wait? You could ask GPT-4 right now to write something imperfect.

Maybe optimizing for “an amount of imperfections, variability, and subtle but tell-tale clues as to convince an examiner that the writer is human and not a language model.”?

If that’s not enough it could also be asked to read the paper itself and suggest additional countermeasures based on a technical review.


Good summary. I wonder if the "perform like aggregated humanity" behavior comes from using greedy decoding. If you actually sample according to the token probs, it seems like you should still end up "on the edge".


This is very succinct, but can someone reduce it further, to an explain-like-I'm-five level?




1. How well can it detect if the writer edits it?

2. How well can it detect if the prompter tries to hide it?

3. How well can it detect if people tend to start writing like chatGPT?

I grade a lot of papers and encourage/teach chatGPT use. It is so easy for me to detect poor usage. Quality is still easy to distinguish. Skillful use of these tools is a meaningful skill. In fact, it usually requires the same underlying skill! (Close reading, purposefulness, authenticity, etc)

I love chatGPT because it obviates stuffy academic writing. Who needs it. Be clear and direct, that’s valuable!


In the same way that giving a graphing calculator to students won't make them ace math classes, giving a student an English calculator (GPT-4) won't make them ace anything with writing. Laziness will always yield poor results.


> I love chatGPT because it obviates stuffy academic writing. Who needs it. Be clear and direct, that’s valuable!

The goal of a paper, I thought, is thinking and not writing. Without outside help (including the AI), clear and direct thinking is what leads to clear and direct writing. What is being achieved here?


Do you use chatGPT? It’s a dialectic. You think with it. You have to tell it what you want, clearly and directly.

It’s why it’s often not a time saver for writing once you know what you want. But it can help you get going when you don’t. Many other benefits. It does not guarantee good writing, far from it!


> You have to tell it what you want, clearly and directly.

> It’s why it’s often not a time saver for writing once you know what you want. But it can help you get going when you don’t. Many other benefits. It does not guarantee good writing, far from it!

Thanks. I was thinking that I need to know clearly what I want, and if I do, ChatGPT would only slow me down. Your perspective makes much more sense.


Exactly. Perfect grammar, syntax, and essay structure can still yield an unconvincing argument.


Next we'll have editors that can tell us that what we're writing is trite.


LLMs tend to mimic "marketing vague" unless you give them targeted and specific instructions. At which point you're basically subediting an LLM.


According to their demo, the Limitations section of their GitHub repo is AI-generated.

>All AI-generated text detectors aim for accuracy, but none are perfect and can have multiple failure modes (e.g., Binoculars is more proficient in detecting English language text compared to other languages). This implementation is for academic purposes only and should not be considered as a consumer product. We also strongly caution against using Binoculars (or any detector) without human supervision.



The paper itself goes into detail about why the US Constitution and other memorized texts are misclassified. It’s surprising but not a killer flaw, since in most contexts it would only apply to direct quotes of famous texts.


It is what you get when you ask ChatGPT for the US constitution, so yeah, AI generated.

This effect is described in the article. And depending on the context, it can be a feature rather than a bug. If you are using an LLM detector to check whether a news article or student essay is "legit", then not only do you not want something from an LLM, you don't want copy-paste plagiarism either. So for the purpose of checking the legitimacy of supposedly original work, it is a desirable kind of false positive.

I suppose simpler techniques can then be used to check for verbatim copies of famous texts.


I just added another tweet with the caveat others mentioned. (It was not an intentional omission)

It’s still not a nuance that most people trying to identify AI will respect even if they know it. Given that constraint, I really doubt the accuracy metrics as well.


Well that is not a novel text, is it?


I'm not convinced that we're on the right path in detecting ai generated content.

We've been looking at the end result and drawing conclusions about the journey - and that will always come with degrees of uncertainty. A false positive rate of 0.01% now probably will not hold as people adapt and grow alongside AI content.

I wonder if anyone's working on software that documents the journey of the output, similar to git commits, such that we can analyze both the metadata (journey) and the output (end result) to determine human authenticity.


Edit history is awesome learning data (as it shows the train of thought), but I can imagine that models will easily learn to generate it.


I write start to finish, and rarely edit. I used to write essays in pen during exams and hand in the first version. My teachers despaired, and I countered that my grades were far above average, so why would I do it differently?

If you use edit history, you'll get lots of false positives.


My team is trying to get to that, albeit we are very early and hardly integrated enough into training processes/scaled inference.

https://huggingface.co/spaces/EQTYLab/lineage-explorer


I have thousands of shorthand-style notes that I've written over the years. Recently, I've been having GPT rewrite them. I first provide GPT with approximately 8 kB worth of my longform writing, and then ask it to rewrite the shorthand using similar diction and style.

Concerned about this issue, I would also run the corresponding outputs through any LLM detection programs I could find (ZeroGPT, etc.). None of the outputs have ever been detected as machine-generated.


The false positive rate would kill most use cases here. Even a 1-in-10,000 rate of false academic-integrity accusations would be too much.


It's still a good heuristic. If you have, say, a search engine facing a flood of AI-generated mush, would it be too disrespectful to a couple of tabloids that use genuine third-world labor to downrank 0.01% of them for probably being AI-generated spam?

Humans shouldn't use simple heuristics like this to make such serious accusations either way.


Do you really think professors flag plagiarism only when it's cut and dry? Absolutely not. Plenty, if not most, flagged cases of plagiarism are ambiguous. The process at most colleges and universities typically accounts for this via something like review by an academic integrity committee.


Agree. I was accused of plagiarism twice by professors in college. Needless to say, there was zero plagiarism involved. In both cases I wrote code that they didn't think an Nth-year college student would be likely to know how to write. In both cases it didn't go anywhere: one because they talked to me about the work and it was clear I knew my stuff, the second because the professor was certain I had cheated somehow but couldn't prove anything, despite his attempts to find the code he insisted I must have found online. I'd say the existing process already has a false positive rate higher than 1/10,000!


>> Do you really think professors flag plagiarism only when it's cut and dry?

The problem with computer systems flagging things is that they are taken as truth.

Another comment on another post perfectly states this human behavior: https://news.ycombinator.com/item?id=39118716


Is it bad if it's an accusation?

Wouldn't it need additional data, such as actual proof, to become an allegation or even a claim or charge?


>> Wouldn't it need additional data, such as actual proof, to become an allegation or even a claim or charge?

What the computer surfaces as an accusation becomes fact by the users of the computer.

Even when the consequences are serious, this happens, an example from yesterday: https://www.nbcnews.com/news/us-news/man-says-ai-facial-reco...

Why should we assume this would go any differently when the consequences are not as serious?


Judging by the number of people who seem to blindly trust what ChatGPT says, yes it’s bad.

A false positive rate of 1/10000 would be almost the worst case in fact: truthy enough that people believe that it works, while still creating a vast absolute number of false positives.


>Wouldn't it need additional data, such as actual proof, to become an allegation or even a claim or charge?

Yes but that hasn't stopped people before.


Maybe rather than focusing so much on how to detect AI-generated content, we should instead focus on our general ability to validate the truthiness of content regardless of source. I don't really care if an AI wrote it, so long as the content is meaningful and informative. I do care if it's a load of junk, even if a human did write it.


There is no algorithm for truth.


No, but there are technical mechanisms that provide a high level of assurance in most scenarios. There's a reason iMessage does not suffer spam and why cheating in Xbox online gaming is non-existent.


Treating what your opponents consider truth as hostile is basically presuming they are arguing in bad faith. They may well be, but you may be misinformed.

At any rate, iMessage does not verify that what is said is true. It just weeds out obviously hostile communication.


I didn't say anything about truth, I'm speaking about a humanness signal.


Truth rests on axioms. You would have to teach an AI what you think is true, but then it would have a kind of confirmation bias.


If the text is good, and someday it will be, I don't care if an LLM wrote it. If it's bad, I don't care if a person wrote it.

The only reason to care is that the implicit proof-of-work signal has broken because LLM text is so cheap. Open forums might need to be pay-per-submission someday...


The problem as I see it is that 'the text is good' has two distinct meanings - first, it doesn't sound like an AI wrote it; it's interesting, entertaining, in a word, 'readable'. The second is that it's accurate / true.

It feels like we're happy to take the first as a surrogate for the second, or at least being good at the first drops our guard on questioning the second.


I'm thinking that it's a fundamental legal right to not have to read anything written by an LLM. In other words, machine-generated text shouldn't be legally binding in any way...mostly because there's infinite amounts of it.

And it should be illegal to return machine-generated text in response to a discovery request.

Request: "Disclose all documents written between apr 1 and apr 12 regarding topics x,y,z."

Response: "There are 12 billion documents matching those parameters. Here they are."


For good measure: why bother storing documents at all? Instead, train an LLM on the documents, and keep only a list of questions you want to be able to answer.

Request: "Please respond with all documents related to X,Y,Z".

Response: "There are none."


If you only care about the text, maybe. But you may also care about the person who claims to have written the text.

That being said, automatic detection seems like a lost cause.


First of all, I don't think we should aim at 100% precision and 100% recall. Instead, it is more realistic to use the result as a filter for more downstream testing and treat it as a means to increase the cost of cheating.

Also, we can use more than just the text output. A human writer doesn't generate a piece of text in one pass; instead they go through a drafting and editing process. We can design devices to capture keypresses or pen strokes (IIRC there were studies on fraud detection based on keypress patterns/mouse movement). One can attempt to train a new model to mimic oneself, so we need to somehow make sure that the amount of training data required is too much to be worth the effort.

For downstream testing, the goal isn't mainly to verify whether a piece of text is AI-generated but to make sure that a student who can pass the test would essentially have to know the material sufficiently well (so this defeats the purpose of cheating).


Exactly. We are at frame 3 of https://xkcd.com/810/: "But what will you do when spammers train their bots to make automated constructive and helpful comments?"

We don't need ML-generated-text detectors. We need BS detectors. If they have false positives and trigger on human-generated BS, that's just a great side-benefit.

"This comment could have been written by an AI" is good enough reason to exclude it. That which does not need to be said, need not be said.

As that XKCD's title text says: "And what about all the people who won't be able to join the community because they're terrible at making helpful and constructive co-- ... oh."


If you need to understand what a particular person is thinking, then text they wrote themself can tell you a lot about that.


So much time and effort being wasted on a problem we are entirely unequipped to ever "solve."

I guess the full-employment economy demands much of us.



> false positive rate of 0.01%

What would be an acceptable false positive rate for something like this to be used at schools and universities?

Like, obviously 0.01% is not acceptable, but what would be?


Why do you think this rate is not acceptable? I would say it's more than acceptable, even as a single data point. If someone's submission comes up as positive on two separate occasions you've pretty much eliminated the chance of a false positive.


Only if the measures for an individual are uncorrelated, which there’s no reason to assume.

It needs to be <= 0.01% false positives for each individual author. If it's just that 0.01% of all tests are false positives, that leaves the possibility that a given individual might have anywhere up to 100% false positives.


I would agree that it's usable as a single data point, used along with others (other submissions, general performance, etc.).

However, given that we already have professors literally failing people by just pasting their work into ChatGPT and asking, I'm not sure I'm comfortable with that.


I feel like 0.01% is one of the more dangerous levels of false positives. Widespread use would mean far more than 10,000 tests, making it a near certainty that innocent people would be accused. The moderately low false positive rate would then be leaned on to imply guilt.

You might find for every 10,000 tests you get around 151 targets. For each one individually you can say they probably did it, but cumulatively you can say that one of them is probably innocent.

Consider requiring two positives. Do schools and universities generate 100,000,000 essays per year? It sure would suck to be the innocent person tagged as having a one-in-a-hundred-million chance of not being guilty.


Nothing is perfect; I wish more things had one-in-a-hundred-million certainty. Are our police this effective? Are the current plagiarism checks used in schools this accurate?


That kind of thing is easy to handwave away when it does not affect you personally.

No, our justice system is probably not that effective either. But that is also a problem, so why should we create more problems like that instead of fixing the ones that we already have?


So we can't replace current systems unless the replacement is absolutely perfect? How perfect is Turnitin?


If the current systems are even worse, we should throw those out.

TurnItIn specifically is horrible and should have never been a thing


If you ask people to write a program, maybe 5% of them write in Python. But if you ask them to give you two programs, then that's only 0.25%. Ask for seven programs and if they give you seven Python programs, they're one in a billion, the rarest of the rare.
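The arithmetic behind the analogy, assuming the events are independent:

    p = 0.05
    print(p ** 2)  # 0.0025  -> 0.25% for two programs
    print(p ** 7)  # ~7.8e-10 -> roughly one in 1.3 billion for seven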


Unless that person's writing style just happens to be very close to how the average human writes? If it's 0.01% for "any given human", I wonder what the numbers would look like for a human closer to average than usual?


0.01% sounds far better than what I would expect!

If the tech froze at its current state, this would be useful for schools. You don't need to expel a student right away after finding a match, but it is a strong indication that something is worth looking into.

(If the goal is to make students write essays, theses, etc. without an LLM writing it for them.)


> (If the goal is to make students write essays, theses, etc. without an LLM writing it for them.)

This is one of the "I don't think that this is the path that we should be taking."

When I was in school, my parents would proofread my essays to catch the spelling and grammatical errors in what I wrote (Bank Street Writer had a rudimentary spelling checker, but that was it - https://en.wikipedia.org/wiki/Bank_Street_Writer ).

While my parents are both native English speakers and college educated, some of my classmates had less involved parents, or parents who didn't have the same proficiency in writing. Did their essays suffer from a lack of parental proofreading?

In the past few months I wrote two short works of fiction as lore for a game that I play. I used ChatGPT to act as an editor for those works, having it look over the text and occasionally prompting it to help refine a passage.

https://chat.openai.com/share/204de7f7-9cd7-4c45-aa2b-556791... for part of the editor session with it.

Having ChatGPT act as an editor (not text editor but as a critique of the text) helped refine the text that I wrote.

Working with ChatGPT as a tool (one that goes far beyond the red squiggles in a word processor) to help people work with the written word is a good and useful endeavor. This isn't trying to have ChatGPT supplant human creativity, but rather to help the person communicate more clearly.

---

I am leaving this in an unedited form, but here is this post with ChatGPT as an editor as an example of how I believe students should try to interact with it. https://chat.openai.com/share/d891f9ac-923b-47a8-8de9-ab7301...


Yeah, at some point it's sensible to learn using all the available tools. Use calculators in math class, the web while programming, etc.

Still, there is a reason why calculators are not used from the very first grade: it makes sense to learn how to do basic calculations without them.


Not only basic calculations; overuse of calculators is great at producing students who can get the correct numbers but don't develop the deeper intuitions about numbers that are required for the more abstract thinking needed in higher mathematics.


I'm amazed at how difficult this has proven to be.

Every time I ask a question to an LLM, it spits out a generic response format:

'''

well, subject x has a lot of nuance filled with even more nuance. and it may be that x is true but y could also be true, here's a list of related sentences:

1. subject 1 is pretty broad in scope but applies to the question

2. subject 2 is more niche conceptually and applies to the core of the topic without addressing every aspect of it

3. and the list goes on

'''

this is the technology you can't surpass?


I'm guessing you haven't done much with prompt customization. Try starting a conversation with something like "I'm the research assistant for a policy maker and often need to summarize complex topics for my boss. Please avoid any fluff or boilerplate language in your responses, including not reiterating the question, and favor brevity while still using a paragraph structure instead of bullet points." You'd be surprised by how different your answers will look.


>>> despite not being trained on any ChatGPT data

I doubt that, since ChatGPT trained all the other LLMs.


This was an open Kaggle prize for a while.


Research papers definitely need to be more nuanced with the "zero-shot" language. Originally this term was used to describe out-of-distribution and out-of-class instances, in the context of meta-learning (if you don't know it, see the history section I left below for context). This term has been really bastardized, and it makes it difficult to differentiate works now. "Out-of-domain" is a fuzzy concept, and I think there are some weird usages where people would call something OOD but wouldn't call a test set OOD. OOD classically doesn't mean something not in the training data, but something not in the distribution that your data is a proxy for. Certainly the data here is within distribution, as it is produced by LLMs.

> Our approach, Binoculars, is so named as we look at inputs through the lenses of two different language models.

How is LLM-generated data out of domain for LLMs? Specifically, their GitHub demo uses the Falcon-7B and Falcon-7B-Instruct models. Instruct models are specifically tuned on their own outputs. We can even say the non-instruct models are also "trained on" LLM outputs, as you're using the outputs in the calculation of the cost functions, meaning they see that data and are using that information, which is why

> Unsurprisingly, LLMs tend to generate text that is unsurprising to an LLM.

Because they are trained on cross-entropy, which is directly related to perplexity. Are detector researchers really trying to use perplexity to detect LM generation? That seems odd, since that's exactly the quantity LMs are minimizing... It also seems weird because the paper's premise is that human writing has more "surprise" than text from an LM, yet we're instructing LMs to sound more human. Going about detection this way does not sound like a sustainable method (not that LLM detectors are reliable; I think we all know they frequently flag generic or standard text, which of course they do if you're highly dependent on entropy).
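For context, here is a rough sketch of the paper's two-model score as I understand it: the log-perplexity of the text under one model divided by a cross-entropy between the two models' next-token distributions on the same text, with low scores leaning "machine-generated". The small stand-in models, the exact role of each model, and the decision threshold are assumptions here; the authors' repository (which pairs Falcon-7B with Falcon-7B-Instruct) is the reference implementation.

    # Sketch of a Binoculars-style score with two small models that share a
    # tokenizer (gpt2/distilgpt2 stand in for Falcon-7B/Falcon-7B-Instruct).
    import torch
    import torch.nn.functional as F
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tok = AutoTokenizer.from_pretrained("gpt2")
    observer = AutoModelForCausalLM.from_pretrained("gpt2")
    performer = AutoModelForCausalLM.from_pretrained("distilgpt2")

    def binoculars_style_score(text: str) -> float:
        ids = tok(text, return_tensors="pt").input_ids
        with torch.no_grad():
            obs_logits = observer(ids).logits[:, :-1]   # predictions for tokens 1..L-1
            per_logits = performer(ids).logits[:, :-1]
        targets = ids[:, 1:]

        # log-perplexity: mean NLL of the actual tokens under one model
        log_ppl = F.cross_entropy(per_logits.transpose(1, 2), targets)

        # cross term: expected NLL of one model's predictions under the
        # other model's next-token distribution, averaged over positions
        x_ent = -(F.softmax(per_logits, dim=-1)
                  * F.log_softmax(obs_logits, dim=-1)).sum(-1).mean()

        return (log_ppl / x_ent).item()  # lower scores lean "machine-generated"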

=== History ===

The first example I'm aware of is the "one-shot" case from [0] (2000), whose abstract says

> We suggest that this density over transforms may be shared by many classes, and demonstrate how using this density as “prior knowledge” can be used to develop a classifier based on only a single training example for each class.

Which we can think of as taking a model and fine-tuning (often now just called training) with a single example, relying on the prior knowledge the model learned that is general to other tasks (such as training on CIFAR-10 being a good starting point for classifying lions).

Then come [1,2] in 2008. Where [1]'s title is "Importance of Semantic Representation: Dataless Classification" and [2] (from Yoshua Bengio's group) is "Zero-data Learning of New Tasks".

[1] trains on Wikipedia and then tests semantic classification on a modified 20 Newsgroups dataset (expanded labels) and a Yahoo Answers dataset, and is about the generalizability of the embedding mechanism across domains. Specifically, they compared Bag of Words (BoW) to Explicit Semantic Analysis (ESA).

I'll just quote for [2]

> We tested the ability of the models to perform zero-data generalization by testing the discrimination ability between two character classes not found in the training set.

Part of their experiments includes training on numeric character recognition and testing on alphabetical characters. They also do some low-shot experiments.

[0] https://people.cs.umass.edu/~elm/papers/cvpr2000.pdf

[1] https://citeseerx.ist.psu.edu/document?doi=ee0a332b4fc1e82a9...

[2] https://cdn.aaai.org/AAAI/2008/AAAI08-103.pdf



