I'm always suspicious of tests when test coverage is the main metric. I've seen developers write tests that don't really check anything but run all the code paths. I've also seen tests that check every bit of output, which end up being brittle.
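Something like this, to pick a contrived example (the module and framework choice here are made up, not from any real codebase):

```typescript
import { it, expect } from "vitest";
import { renderInvoice } from "./invoice"; // hypothetical module under test

// Coverage-padding test: it exercises the code path but asserts nothing meaningful.
it("renders an invoice", () => {
  renderInvoice({ items: [{ name: "Widget", price: 10 }] });
  expect(true).toBe(true); // always passes
});

// Brittle test: it pins the entire output string, so any harmless copy or markup tweak breaks it.
it("renders an invoice exactly", () => {
  const html = renderInvoice({ items: [{ name: "Widget", price: 10 }] });
  expect(html).toBe('<div class="invoice"><span>Widget</span><span>$10.00</span></div>');
});
```

The first inflates the coverage number without checking behavior; the second fails on every change, meaningful or not.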
How well do the tests hold up over time, and how well are the tests validating the contract of the code instead of just historical behavior and quirks?
We actually use real user sessions to train our model, so when I say "coverage", our main metric is covering as many user behaviors as possible.
We collect data in a privacy-focused way, essentially anonymizing all sensitive information, since we don't need the user-specific context, only the main flow.
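As a rough sketch of what I mean by scrubbing (simplified, not our actual pipeline; the event shape and helper names are made up for illustration):

```typescript
// Hypothetical session event shape; real sessions carry much more metadata.
interface SessionEvent {
  action: "click" | "input" | "navigate";
  selector: string;      // which component was touched
  value?: string;        // free text the user typed
  url: string;
}

// Keep the flow (what was done, to which component), drop the user-specific content.
function scrubEvent(event: SessionEvent): SessionEvent {
  return {
    ...event,
    // Replace typed text with a type-preserving placeholder so the model still
    // learns "this field takes an email-like string" without seeing the email.
    value: event.value ? placeholderFor(event.value) : undefined,
    // Strip IDs from the path, keeping only the route shape.
    url: new URL(event.url).pathname.replace(/\/\d+/g, "/:id"),
  };
}

function placeholderFor(value: string): string {
  if (/\S+@\S+\.\S+/.test(value)) return "user@example.com";
  if (/^\d+$/.test(value)) return "12345";
  return "sample text";
}
```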
If this is trained on user sessions, how would the model learn to generate tests for edge cases that wouldn’t necessarily show up in the training data?
We train the model on user sessions to learn how to use an app. The model learns how to execute specific flows, but also how to interact with components in a more general sense. Since most developers use composable components, usage patterns repeat across the same app.
Then, during test generation, we bias the model to explore edge cases (in a few ways), and the model is still able to complete those flows even with few training samples.
In other words, we direct the model toward certain goals and flows, and also add chaos to the process, which results in the model executing unexpected flows.
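A simplified sketch of the biasing idea, something in the spirit of epsilon-greedy exploration (not our actual implementation; the `Policy` interface, `chaos` weight, and action shape are made up for illustration):

```typescript
// Hypothetical action type: something the agent can do in the app under test.
interface Action {
  kind: "click" | "input" | "navigate";
  target: string;
}

// Assumed interface: a learned policy scores candidate actions for the current
// app state, with the score nudged toward a described goal or flow.
interface Policy {
  score(state: string, action: Action, goal: string): number;
}

// Pick the next action: mostly follow the learned, goal-biased policy, but with
// probability `chaos` take a random candidate to surface unexpected flows.
function nextAction(
  policy: Policy,
  state: string,
  goal: string,
  candidates: Action[], // assumed non-empty
  chaos = 0.1
): Action {
  if (Math.random() < chaos) {
    return candidates[Math.floor(Math.random() * candidates.length)];
  }
  return candidates.reduce((best, a) =>
    policy.score(state, a, goal) > policy.score(state, best, goal) ? a : best
  );
}
```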