Hi HN, we’re Mayank and Matt of Roundtable (https://roundtable.ai/). We use LLMs to produce cheap, yet surprisingly useful, simulations of surveys. Specifically, we train LLMs on standard, curated survey datasets. This approach allows us to essentially build general-purpose models of human behavior and opinion. We combine this with a nice UI that lets users easily visualize and interpret the results.
Surveys are incredibly important for user and market research, but are expensive and take months to design, run, and analyze. By simulating responses, our users can get results in seconds and make decisions faster. See https://roundtable.ai/showcase for a bunch of examples, and https://www.loom.com/share/eb6fb27acebe48839dd561cf1546f131 for a demo video.
Our product lets you add questions (e.g. “how old are you”) and conditions (e.g. “is a Hacker News user”) and then see how these affect the survey results. For example, the survey “Are you interested in buying an e-bike?” comes back 28% ‘yes’ [1]. But if you narrow it down to people who own a Tesla, ‘yes’ jumps to 52% [2]. Another example: if you survey “where did you learn to code”, conditioning on the answer to “how old are you?” makes a dramatic difference. For “45 or older” the answer is 55% “books” [3], but for “younger than 45” it’s 76% “online” [4]. One more: 5% of people answer “legroom” to the question “Which of the following factors is most important for choosing which airline to fly?” [5], and this jumps to 20% when you condition on people over six feet tall [6].
You wouldn’t think (well, we didn’t think) that such simulated surveys would work very well, but empirically they work a lot better than expected—we have run many surveys in the wild to validate Roundtable's results (e.g. comparing age demographics to U.S. Census data). We’re still trying to figure out why. We believe that LLMs that are pre-trained on the public Internet have internalized a lot of information/correlations about communities (e.g. Tesla drivers, Hacker News, etc.) and can reasonably approximate their behavior. In any case, researchers are seeing the same things that we are. A nice paper by a BYU group [7] discusses extracting sub-population information from GPT/LLMs. A related paper from Microsoft [8] shows how GPT can simulate different human behaviors. It’s an active research topic, and we hope we can get a sense of the theoretical basis relatively soon.
Because these models are primarily trained on Internet data, they start out skewed towards the demographics of heavy Internet users (e.g., high-income, male). We addressed this by fine-tuning GPT on the GSS (General Social Survey [9] - the gold standard of demographic surveys in the US) so our models emulate a more representative U.S. population.
We’ve built a transparency feature that shows how similar your survey question is to the training data, which gives a rough confidence metric for our accuracy. If you click ‘Investigate Results’, we report the most similar GSS questions (measured by cosine distance between LLM embeddings) as a way of estimating how much extrapolation / interpolation is going on. This doesn’t quite address the accuracy of the subpopulations / conditioning questions (we are working on this), but we thought we were at a sufficiently advanced point to share what we’ve built with you all.
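For anyone curious what "most similar in terms of cosine distance" looks like mechanically, here is a minimal sketch. It is not Roundtable's actual code: the function names, the placeholder question labels, and the tiny 3-dimensional vectors are all made up for illustration, and real LLM embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query_vec, corpus, k=3):
    """Return the k (similarity, label) pairs closest to query_vec.

    `corpus` maps a question label to its embedding vector.
    """
    scored = [(cosine_similarity(query_vec, vec), label)
              for label, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical embeddings for placeholder training questions (toy values).
toy_corpus = {
    "training_question_a": [0.9, 0.1, 0.0],
    "training_question_b": [0.1, 0.9, 0.1],
    "training_question_c": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a new survey question.
query = [0.8, 0.2, 0.1]

for score, label in most_similar(query, toy_corpus, k=2):
    print(f"{label}: {score:.3f}")
```

Cosine distance is just one minus this similarity, so ranking by highest similarity and ranking by lowest distance give the same order. A low top score would suggest the new question sits far from anything in the training set, i.e. more extrapolation.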
We're graduating PhD students from Princeton University in cognitive science and AI. We ran a ton of surveys and behavioral experiments and were often frustrated with the pipeline. We were looking to leave academia, and saw an opportunity in making the survey pipeline better. User and market research is a big market, and many of the tools and methods the industry uses are clunky and slow. Mayank’s PhD work used large datasets and ML for developing interpretable scientific theories, and Matt’s involved developing complex experimental software to study coordinated group decision-making. We see Roundtable as operating at the intersection of our interests.
We charge per survey. We are targeting small and mid-market businesses who have market research teams, and ask for a minimum subscription amount. Pricing is at the bottom of our home page.
We are still in the early stages of building this product, and we’d love for you all to play around with the demo and provide us feedback. Let us know whatever you see - this is our first major endeavor into the private sector from academia, and we’re eager to hear whatever you have to say!
[1] https://roundtable.ai/sandbox/e02e92a9ad20fdd517182788f4ae7e...
[2] https://roundtable.ai/sandbox/6b4bf8740ad1945b08c0bf584c84c1...
[3] https://roundtable.ai/sandbox/d701556248385d05ce5d26ce7fc776...
[4] https://roundtable.ai/sandbox/8bd80babad042cf60d500ca28c40f7...
[5] https://roundtable.ai/sandbox/0450d499048c089894c34fba514db4...
[6] https://roundtable.ai/sandbox/eeafc6de644632af303896ec19feb6...
[7] https://arxiv.org/abs/2209.06899
[8] https://openreview.net/pdf?id=eYlLlvzngu
[9] https://www.norc.org/research/projects/gss.html
If a researcher comes out and says, “Surveys show that people want X, and they do not like Y,” and then others ask the researcher if they surveyed people, the answer would be “no.”
Fundamentally, people wanting feedback from humans will not get that by using your product.
The best you can say is this: “Our product is guessing people will say X.”