Each year, companies spend billions of dollars gathering survey data to guide product decisions. However, a growing percentage of this data is AI-generated. This bad data can lead to misguided decisions and cost companies billions.
Unfortunately, a large body of research has shown that identifying AI-generated text isn’t reliable. At roundtable.ai, we discovered a different approach: keystroke tracking. After collecting millions of responses for our survey data-cleaning API, we noticed that AI and human respondents produce text in fundamentally different ways. Here's what we found.
-- Building an AI keystroke dataset --
Consider this question:
What temperature do you typically keep your home at and why?
And the following responses:
- About 70, I like it cold but dont want to spend too much on electricity.
- In my home, I usually set the temperature to around 70° Fahrenheit (21° Celsius). This temperature strikes a comfortable balance for me—it’s warm enough that I don’t feel the need for extra layers in colder months, and it’s cool enough that it doesn’t feel stifling during warmer seasons. I also find that this temperature helps keep energy bills manageable, providing a cozy atmosphere without overusing the heating or cooling system.
It's obvious which response is AI-generated: the second one includes degree symbols, temperature conversions, and perfect paragraph structure. While humans could write this way, they rarely do in a rushed online survey.
-- AI keystroke patterns --
By focusing on questions where human and AI responses clearly diverge, we built a large dataset of labeled responses and their corresponding typing patterns. While the content was often similar, the typing patterns revealed consistent differences.
Human typing has a natural variance. The time between keystrokes follows a characteristic distribution with random pauses and bursts, with pauses most common at the end of words and phrases. See https://app.roundtable.ai/keystrokes/human for an example.
Humans also make typos and use backspaces and other editing techniques to fix them. By contrast, AI responses typically show unnaturally consistent intervals between keystrokes with minimal variance, and almost never include corrections or backspaces (https://app.roundtable.ai/keystrokes/ai). Other AI responses look like human-AI hybrids, where AI pipes in text programmatically and humans then edit the response (https://app.roundtable.ai/keystrokes/hybrid).
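The signals described above (variance in the time between keystrokes, and how often a respondent corrects themselves) can be sketched in a few lines of Python. This is only an illustration, not our production detector: the event format (a list of `(key, timestamp_ms)` pairs), the `"Backspace"` key name, and the function name `keystroke_features` are all assumptions made for the example.

```python
from statistics import mean, pstdev


def keystroke_features(events):
    """Summarize a keystroke log.

    events: list of (key, timestamp_ms) tuples in typing order.
    This format is hypothetical and chosen only for this sketch.
    """
    n_keys = len(events)
    times = [t for _, t in events]
    # Time elapsed between each pair of consecutive keystrokes.
    intervals = [b - a for a, b in zip(times, times[1:])]

    if intervals:
        avg = mean(intervals)
        # Coefficient of variation: spread of the intervals relative to
        # their mean. Near zero means machine-like, evenly spaced typing.
        cv = pstdev(intervals) / avg if avg > 0 else 0.0
    else:
        avg, cv = 0.0, 0.0

    backspaces = sum(1 for key, _ in events if key == "Backspace")
    return {
        "n_keys": n_keys,
        "mean_interval_ms": avg,
        "interval_cv": cv,
        "backspace_rate": backspaces / n_keys if n_keys else 0.0,
    }


# Made-up logs: a human-like burst with a correction, and evenly spaced input.
human_log = [("a", 0), ("b", 120), ("Backspace", 400), ("c", 450), (" ", 900)]
scripted_log = [(ch, i * 50) for i, ch in enumerate("hello")]

print(keystroke_features(human_log))
print(keystroke_features(scripted_log))
```

In this toy example the scripted log has an interval variation of zero and no backspaces, while the human-like log shows both irregular timing and a correction. A real classifier would need many more features and labeled data; these two numbers only show the kind of signal involved.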
-- Real-world impacts --
To demonstrate how bad actors corrupt survey data, we ran a test study about an intentionally bad product - a solar-powered refrigerated hat. By classifying responses as "Human" or "AI", we found that the AI responses created two types of problems in downstream analysis.
First, AI responses showed consistently higher willingness to pay ($135 for AI vs. $40 for humans), replicating a general pattern of AI optimism compared to human responses.
Second, these responses added noise. Whereas human responses formed logical customer segments (motivated, indifferent, etc.), the flagged data formed nonsensical ones - for example, a segment rating the hat highly useful but being unwilling to pay.
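To make the willingness-to-pay gap concrete, here is a small worked calculation of how undetected AI responses would shift a blended average. It assumes the $135 and $40 figures are group averages, and the contamination shares are hypothetical values picked for illustration, not numbers from our study.

```python
def blended_wtp(ai_share, ai_wtp=135.0, human_wtp=40.0):
    """Average willingness to pay when a fraction ai_share of responses is AI.

    The defaults are the two figures quoted above, treated here as group
    averages (an assumption). ai_share is a hypothetical contamination rate.
    """
    if not 0.0 <= ai_share <= 1.0:
        raise ValueError("ai_share must be between 0 and 1")
    return ai_share * ai_wtp + (1.0 - ai_share) * human_wtp


for share in (0.0, 0.10, 0.25, 0.50):
    print(f"{share:.0%} AI responses -> blended average ${blended_wtp(share):.2f}")
```

Under these assumptions, a 25% share of AI responses would lift the apparent average from $40 to $63.75, which is enough to make a bad product look viable.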
-- Our ask --
Identifying AI content through text analysis is extremely difficult and usually unreliable. We suggest analyzing typing patterns, which can be more robust. The complex keystroke data provides clear signals that are much harder to fake (and often more understandable) than content alone. Of course, this is a cat-and-mouse game. As detection methods evolve, so do evasive techniques, and we're constantly updating our models.
Our detector was developed for enterprise surveys, but we’re exploring other domains where we can apply similar technology. Do you deal with AI fraud in your domain? We’d love to learn more about applications in other industries.