Launch HN: Roundtable (YC S23) – Using AI to Simulate Surveys
121 points by timshell on July 25, 2023 | 91 comments
Hi HN, we’re Mayank and Matt of Roundtable (https://roundtable.ai/). We use LLMs to produce cheap, yet surprisingly useful, simulations of surveys. Specifically, we train LLMs on standard, curated survey datasets. This approach allows us to essentially build general-purpose models of human behavior and opinion. We combine this with a nice UI that lets users easily visualize and interpret the results.

Surveys are incredibly important for user and market research, but are expensive and take months to design, run, and analyze. By simulating responses, our users can get results in seconds and make decisions faster. See https://roundtable.ai/showcase for a bunch of examples, and https://www.loom.com/share/eb6fb27acebe48839dd561cf1546f131 for a demo video.

Our product lets you add questions (e.g. “how old are you”) and conditions (e.g. “is a Hacker News user”) and then see how these affect the survey results. For example, the survey “Are you interested in buying an e-bike?” shows ‘yes’ 28% [1]. But if you narrow it down to people who own a Tesla, ‘yes’ jumps to 52% [2]. Another example: if you survey “where did you learn to code”, the question “how old are you?” makes a dramatic difference—for “45 or older” the answer is 55% “books” [3], but for “younger than 45” it’s 76% “online” [4]. One more: 5% of people answer “legroom” to the question “Which of the following factors is most important for choosing which airline to fly?” [5], and this jumps to 20% when you condition on people over six feet tall [6].

You wouldn’t think (well, we didn’t think) that such simulated surveys would work very well, but empirically they work a lot better than expected—we have run many surveys in the wild to validate Roundtable's results (e.g. comparing age demographics to U.S. Census data). We’re still trying to figure out why. We believe that LLMs that are pre-trained on the public Internet have internalized a lot of information/correlations about communities (e.g. Tesla drivers, Hacker News, etc.) and can reasonably approximate their behavior. In any case, researchers are seeing the same things that we are. A nice paper by a BYU group [7] discusses extracting sub-population information from GPT/LLMs. A related paper from Microsoft [8] shows how GPT can simulate different human behaviors. It’s an active research topic, and we hope we can get a sense of the theoretical basis relatively soon.

Because these models are primarily trained on Internet data, they start out skewed towards the demographics of heavy Internet users (e.g., high-income, male). We addressed this by fine-tuning GPT on the GSS (General Social Survey [9] - the gold standard of demographic surveys in the US) so our models emulate a more representative U.S. population.
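To give a rough sense of what that fine-tuning step can look like mechanically, here is a minimal, purely illustrative sketch of turning one GSS-style response into a training record, using the legacy OpenAI prompt/completion JSONL format that was current for GPT-3 fine-tuning. The prompt layout, respondent attributes, and question wording are assumptions for illustration, not our actual schema.

```python
# Hypothetical sketch only: one GSS-style answer converted into a
# prompt/completion fine-tuning record (legacy OpenAI JSONL format).
# The field layout and question text are illustrative, not the actual schema.
import json

record = {
    "prompt": (
        "Respondent: age 52, female, lives in the Midwest\n"
        "Question: How often do you read a daily newspaper?\n"
        "Answer:"
    ),
    "completion": " A few times a week",
}

with open("gss_finetune.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```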

We’ve built a transparency feature that shows how similar your survey question is to the training data and thus gives a confidence metric of our accuracy. If you click ‘Investigate Results’, we report the most similar (in terms of cosine distance between LLM embeddings) GSS questions as a way of estimating how much extrapolation / interpolation is going on. This doesn’t quite address the accuracy of the subpopulations / conditioning questions (we are working on this), but we thought we were at a sufficiently advanced point to share what we’ve built with you all.
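As a rough illustration of the idea (not our exact pipeline), ranking training questions by embedding similarity can look something like the sketch below. It uses an open-source sentence-embedding model as a stand-in for whatever LLM embeddings are used in practice, and the GSS-style questions are paraphrased examples rather than real item wording.

```python
# Rough sketch of the "Investigate Results" idea: rank training-set (GSS)
# questions by cosine similarity to the user's question. Uses an open-source
# embedding model as a stand-in for whichever LLM embeddings are actually used.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

gss_questions = [  # illustrative paraphrases, not real GSS item wording
    "How often do you read a daily newspaper?",
    "Do you favor or oppose the death penalty for persons convicted of murder?",
    "In general, do you think the courts deal too harshly with criminals?",
]
user_question = "Are you interested in buying an e-bike?"

corpus_emb = model.encode(gss_questions, convert_to_tensor=True)
query_emb = model.encode(user_question, convert_to_tensor=True)

scores = util.cos_sim(query_emb, corpus_emb)[0]
for q, s in sorted(zip(gss_questions, scores.tolist()), key=lambda x: -x[1]):
    print(f"{s:.3f}  {q}")
# Low top scores would suggest the query is out-of-distribution relative to
# the training data, i.e. the model is extrapolating rather than interpolating.
```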

We're graduating PhD students in cognitive science and AI at Princeton University. We ran a ton of surveys and behavioral experiments and were often frustrated with the pipeline. We were looking to leave academia and saw an opportunity in making the survey pipeline better. User and market research is a big market, and many of the tools and methods the industry uses are clunky and slow. Mayank’s PhD work used large datasets and ML to develop interpretable scientific theories, and Matt’s developed complex experimental software to study coordinated group decision-making. We see Roundtable as operating at the intersection of our interests.

We charge per survey. We are targeting small and mid-market businesses who have market research teams, and ask for a minimum subscription amount. Pricing is at the bottom of our home page.

We are still in the early stages of building this product, and we’d love for you all to play around with the demo and provide us feedback. Let us know whatever you see - this is our first major endeavor into the private sector from academia, and we’re eager to hear whatever you have to say!

[1]: https://roundtable.ai/sandbox/e02e92a9ad20fdd517182788f4ae7e...

[2]: https://roundtable.ai/sandbox/6b4bf8740ad1945b08c0bf584c84c1...

[3]: https://roundtable.ai/sandbox/d701556248385d05ce5d26ce7fc776...

[4]: https://roundtable.ai/sandbox/8bd80babad042cf60d500ca28c40f7...

[5]: https://roundtable.ai/sandbox/0450d499048c089894c34fba514db4...

[6]: https://roundtable.ai/sandbox/eeafc6de644632af303896ec19feb6...

[7]: https://arxiv.org/abs/2209.06899

[8]: https://openreview.net/pdf?id=eYlLlvzngu

[9]: https://www.norc.org/research/projects/gss.html




I see the logic here, but I’m highly skeptical about how valid such a tool would be.

If a researcher comes out and says, “Surveys show that people want X, and they do not like Y,” and then others ask the researcher if they surveyed people, the answer would be “no.”

Fundamentally, people wanting feedback from humans will not get that by using your product.

The best you can say is this: “Our product is guessing people will say X.”


Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? (https://arxiv.org/abs/2301.07543)

Out of One, Many: Using Language Models to Simulate Human Samples (https://arxiv.org/abs/2209.06899)

There's been some research in this vein. To answer your question: seemingly, very valid.


These papers suggest that LLMs do something a lot more specific (when asked to simulate a certain political background, they're able to give responses to questions in a way that's consistent with those political backgrounds). That's not particularly surprising to me, as I would expect a human to also be able to simulate this kind of thing pretty accurately. I don't think it implies that LLMs would be good at answering typical business survey questions.


We're trying to figure out the optimal use case for this, i.e. whether it's internal or client-facing (your example).

Internal purposes include stuff like optimally rewording questions and getting priors.

A hybrid approach would be something like: let's not ask someone 100 questions because we can accurately predict 80 of them; let's just ask them the 20 hard-to-estimate questions.


I think it's less about "prediction" and more about mapped cohort behaviors and opinions, especially those that change slowly over time. The LLM will likely be a picture of how the population and each demographic group behaved and what they believed during a specific time window (i.e. when the dataset was collected), and will produce answers that reflect that. It will most likely lag behind new trends and how they shape population behaviors and beliefs over time. In any case, I think even the most experienced market research professionals would agree that discovering new trends before they become mainstream is really challenging.


> optimally rewording questions

This kind of concerns me because you could use this to bias surveys in different directions. This obviously already happens, so maybe it's just part of the status quo.


There’s lies, damned lies, statistics, and now fake statistics!


I’ve worked in this industry for a while and in the ‘faster, cheaper, better - pick two’ trade off, many will select faster and cheaper. That’s only speaking for corporate market research though, can’t say the same for academic researchers.

I suspect people would use this product as a quick gut check to decide if it is warranted to spend the time and money on a full scale quant study.


You want a 90/10: 90% of the benefit, 10% of the effort.

This is like a 10/10.


ISTM that's just an a priori feeling. The value or lack thereof of the product totally depends on how accurately it predicts human survey responses, which you can't know without looking at the data.


The tool would be useful as a QA step to test for leading questions in survey design. See Yes Minister's[1] explanation for how they can work. A simulation to see if the questions get the same response irrespective of the order they were asked in could improve survey quality. Obviously, the tool could be used in the opposite way too, to help design surveys that say exactly what the company/govt/charity wants it to.

[1] https://www.youtube.com/watch?v=G0ZZJXw4MTA


> I see the logic here, but I’m highly skeptical about how valid such a tool would be.

The problem, as I see it, is that although you can create lots of examples that are correct/follow real-world opinion, you can never prove that any particular question's result is correct/follows real-world opinion. I'm not sure who would trust the output enough to rely on it for decision making.


I have been using AI-generated surveys via the playground and have found them quite effective at simulating responses. In fact, they are incredibly similar to my experience asking the same questions IRL. The challenge is that people don’t trust them, and AI still has this negative association. So yes, I mean to say it’s yet another human error.


Some people worry that biased AI models will deepen inequality. Your product seems particularly primed for this scenario. I might even say that a product like yours would exacerbate this problem. What is your plan to ameliorate AI bias?

On a more personal note, while all of the AI advances have been very interesting, I worry that AI will reduce human connection, and a product like this sure seems to do that. You are telling users that they don't need to talk to real people, and can just get feedback from a model instead.

Edit: for example, here's your dataset by race: https://imgur.com/a/134epoN

I asked, "Which race is most likely to commit a crime?": https://imgur.com/a/4QJZo2O


1. GPT out of the box was pretty biased (e.g. gender distribution). We fine-tuned on representative survey data to ameliorate this bias so we get Census-level estimates for conditions such as gender [a] and work status [b].

2. We added a transparency feature (click on 'Investigate Results') that shows how in- vs. out-of-distribution the target question is. For out-of-distribution questions, we suggest people run traditional surveys.

More broadly, I think your point is really interesting when it comes to qualitative data. That is one reason we haven't generated qualitative survey data, but a lot of potential customers have already started to ask for it.

----

[a] https://roundtable.ai/sandbox/baa3d5f25236b91f1608c9f606b315...

[b] https://roundtable.ai/sandbox/7a9ee27872eb29087be2386ccd19f7...


To respond to the edits - that's a great example, thank you. One of the limitations of surveys more broadly is that you're asking for people's opinions, which of course do not necessarily correspond to reality. So what we're simulating is how we estimate a representative U.S. population would answer the question "Which race is most likely to commit a crime?", as opposed to what the actual answer is.

We definitely need to think about how to handle your question so that it's clear where survey data converges with / diverges from reality.


How can you be reasonably sure that that work sufficiently addresses the bias?

What metric(s) are you using to measure bias in general, and what do those metric(s) look like before and after your tuning?


This is the most thought-provoking company I’ve encountered in a long time. Congrats on doing genuinely interesting innovation.

Speaking as a potential user, my biggest hang up is trust. How can I trust that Roundtable’s results are accurate and not the result of hallucination?

One of the powerful things about data is that they surprise you. This is why data integrity is so important (“crap in, crap out” as the old adage goes). But if I get a surprising result from Roundtable, how can I verify it? I think you two are already thinking about this and building features to address it.

I’m also wondering if trying to verify a surprising result from Roundtable is the wrong response…Why would a LLM give me that answer? There may be something useful to understand about why the LLM is “hallucinating.” In terms of features, it may be interesting to see whether Roundtable’s LLM could explain its answer.

The UX could be like having a brilliant but inscrutable research assistant…


Thank you, this is exactly where our headspace is too


Hi congrats.

LLMs model a static distribution, whereas consumer preferences change over time to the point that companies regularly run the same survey at different points in time. At my old fund we would run the same surveys every month to track changes on various companies. How do you counteract this time effect? Presumably a lot of your training data is from the past.

To give one example from your summary: the demographics of Tesla owners have changed significantly over time, from a pure luxury, avant-garde market to a much more mass-market one. So info about Tesla from 5 years ago is not that useful.


Pasting below my answer to niko001:

The data we trained on has year, so we can specify the year you ask the question (the default is 2023). You can also see how answers change over time. [1] shows how the distribution for "Do you support the President" changes from 2000 to 2023 (see the 9/11 spike, end of Bush era, Obama era, Trump era, etc.) [1] https://roundtable.ai/sandbox/2dd4e9d32c24e9abff01810695e948...


Which is logical and kind of what I expected. But it raises the obvious question of where your data comes from going forward. The internet is getting more and more polluted with machine-generated data, and previous big ongoing data sources like Twitter, Reddit, etc. are all full of GPT spam and are trying to monetise their data.

I’d also be interested in how much you think your platform is just capturing, say, reported surveys/data. Presidential polling is something that must be all over LLM datasets - isn’t that just replicating the training data?

I think you could do a better job of showing the following on your website: here are some unusual survey results we generated from the model - i.e. stuff definitely not in the training data - and here’s the data we actually got when we did that survey for real.


Going forward, the current business model (with the caveat that pivots are always likely at this early stage) is to train on companies' proprietary survey data so we can estimate how their specific users respond to questions.

In the backend, we check to see if the answers are stated in a high-quality survey and just retrieve that. I know we do this for gender, and I'm not sure whether that happens for presidential polling.

Great idea, thank you. We're still figuring out whether the business model will be a general-purpose tool that anyone can use or those custom models I referenced above. If the former, your suggestion is spot on.


> Going forward, the current business model (with the caveat that pivots are always likely at this early stage) is to train on companies' proprietary survey data so we can estimate how their specific users respond to questions.

I imagine cleaning customer data to get it to the point that it's inputtable will be a big job for you.

Are you then creating individual models per customer? As in, if Coke are an existing customer of yours and Pepsi sign up, do they get access to a model that's partially trained on Coke data, or it's a case of your base model + "bring your own research"?


> I imagine cleaning customer data to get it to the point that it's inputtable will be a big job for you.

We're in the process of figuring that out. Hopefully that is another use case for LLMs :)

> Are you then creating individual models per customer? As in, if Coke are an existing customer of yours and Pepsi sign up, do they get access to a model that's partially trained on Coke data, or it's a case of your base model + "bring your own research"?

The latter, i.e. base model + "bring your own research"


This is a really cool idea and beautiful UX, congrats on the launch!

One related pain point I have seen many times with surveys is that the people writing them don't know what they're doing and get bad data as a result of biased questions.

Could be cool to add functionality down the line to help people craft better questions. For example, your app could provide alternate ways of phrasing questions and then simulate how results would differ based on the wording.

Excited to see where this goes! Going to share with my partner who works for a survey software company and see what she thinks.


Exactly where we're headed :)

Thank you for the kind words / reference


Just played with the sandbox, and it seems like 16% of Apple users wouldn't consider buying an Apple VR headset even for $3.50. I don't think even the Loch Ness monster would be so stingy.


In a world where everyone could buy the headset for $3.50 (and thus there is no profit in buying it and re-selling it), that percentage actually makes sense.


Presumably the question is being interpreted as "would you buy an Apple VR headset if the standard price was $3.50?", rather than "would you buy a headset for $3.50 that you could immediately resell for 1000x that?".

The answer seems plausible with that interpretation.


Don't you think that even playing with it for an hour and then never touching it again would still be worth a three fiddy?


I do but I can imagine that 16% of people don't.

If you went out with a VR headset and offered 30-minute demos to random people for $3.50, I don't think you would have an 84% success rate.


FWIW, I think the discussion here highlights the biggest problem with the approach. Either it's confirmatory and wasn't a question worth asking in the first place, or it's surprising and people need real data to evaluate the response.


One of our major weaknesses right now is sensitivity to price


My company recently ran a survey of UK-based Creatives on the topic of their working preferences (n=250, July 2023) - so I compared our data to responses provided by Roundtable.ai

https://docs.google.com/spreadsheets/d/1YtvcLkC-xaTw3q6LOxCq...

The average delta between the actual selected response % and the simulated %, across 11 questions, was 7%. Seems like a good start - it would make it useful for certain low-impact, high-speed business decisions.
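For clarity, the metric here is just the mean absolute difference between actual and simulated response shares. A toy sketch with placeholder numbers (not the figures from the linked spreadsheet):

```python
# Quick sketch of the comparison described above: mean absolute difference
# between actual and simulated response shares. Numbers are placeholders,
# not the actual figures from the linked spreadsheet.
actual    = [0.42, 0.18, 0.25, 0.15]   # observed response shares (one question)
simulated = [0.35, 0.22, 0.30, 0.13]   # simulated shares for the same options

delta = sum(abs(a - s) for a, s in zip(actual, simulated)) / len(actual)
print(f"average delta: {delta:.1%}")   # 4.5% for these placeholder values
```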


Thank you for sharing these results!


Interesting idea! One of the problems with any primary research (surveys included) is the delay in collecting responses, which can take hours to weeks depending on sample, IR, incentives, etc. This would solve that!

It's not surprising that LLMs can predict the answers to survey questions, but really good primary research generates surprising insights that are outside of existing distributions. Have you found that businesses trust your results? I have found that most businesses don't trust survey research much at all, and this seems like it might be even less reliable.

-----

Context: I co-founded & sold survey software company (YC W20).


Thank you!

Trust is one of the biggest issues we're trying to solve. This motivated the t-SNE plots and similarity scores under 'Investigate Results', but we definitely have a long way to go. Generally speaking, survey practitioners trust us more than their clients do (perhaps not surprising).


You might want to take a look at the papers I've linked here that go into this kind of research:

https://news.ycombinator.com/item?id=36868552


Congrats on the launch!

I have an interesting dissonance with this. On one hand, I understand how huge parameter sets can and do model specific personas well. I've also read some of these cited papers and _know_ intellectually that predicted results can be close to actual survey data. The other part of me is screaming at my laptop that language modeling is about aggregate statistics, revealed preference counts for a lot, and how could a language model actually substitute for market research?

I imagine the biggest hurdle you're going to face is research teams that:

- A. Want to see actual proof behind data

- B. Disbelieve an LLM could generate statistically significant insights about real people that would make individual decisions

- C. Need to justify their own existence / organizational clout with boots on the ground facilitating surveys

A and C might be surmountable, but I'm not sure of a good way of tackling B.


Thank you!

Agree A, B, and C are big hurdles.

Re: A - we have started adding transparency (via 'Investigate Results' and the t-SNE plots + similarity scores), but we still have a ways to go

Re: B - agree that the survey responses -> insights pipeline is nonlinear and it's not clear how to make that tighter

Re: C - generally, we try to champion a human-AI interaction loop where people are needed to evaluate the outputs, generate insights, etc.

All great points though and ones we are facing


>The other part of me is screaming at my laptop that language modeling is about aggregate statistics

Yeah, but it's not. This is where people are having so much trouble: the erroneous belief that language models learn some "average" or "aggregate" of the training set. But when you get down to it, that doesn't make any sense at all. What help would some fictional aggregate distribution be in predicting the responses of decidedly non-aggregate text? None at all.

So language models don't learn an "average" distribution. They learn to predict every state that exists in the corpus in a sort of superposition. The perfect LLM would predict Einstein as well as it would predict the dumbass down the street.

LLMs are biased but not uniformly so.


I think we're using aggregate statistics in two different ways.

Technically speaking - the whole idea with a language model is that you're learning to generalize underlying patterns of text, not just memorize the input text. Otherwise language models would be very good at echoing back training data but would fail miserably during validation. If we go back to the training objective - it's trying to maximize the likelihood of the sequence, factored into conditional probabilities:

P(y1, y2, ..., yn) = P(y1) * P(y2|y1) * ... * P(yn|y1, ..., yn-1)

That probability is by definition an aggregate; it's the best fit given potentially competing inputs of the training set that all have the same input conditional chains.

Where generative LLMs have a leg up is that they have such a large parameter space, large context windows, and coherent sampling strategies. This helps them stay internally consistent in their responses. But at the end of the day, what they are learning are still patterns. That's why they aren't able to link content back to the exact source of origin; the parameters fuse inputs from different places into one hybrid.

Seeding a generative chain with Einstein or someone down the street doesn't change the fact that what's next is some fused learning from a lot of different training set inputs.
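A tiny worked example of that factorization, with made-up conditional probabilities (nothing model-specific):

```python
# Minimal illustration of the factorization above: a sequence's probability
# is the product of per-token conditional probabilities (equivalently, the
# sum of log-probabilities). The numbers are made up for illustration.
import math

# P(token_i | token_1 ... token_{i-1}) for one hypothetical 4-token sequence
conditionals = [0.20, 0.65, 0.40, 0.90]

log_prob = sum(math.log(p) for p in conditionals)
prob = math.exp(log_prob)            # == 0.20 * 0.65 * 0.40 * 0.90 ≈ 0.0468
print(prob, log_prob)
# A model trained to minimize cross-entropy is fitting these conditionals to
# the training corpus, which is what the parent comment means by "aggregate".
```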


The point I'm making is that I believe the "fused space" is not in fact very fused, because that would be directly detrimental to reducing loss.


It's good you don’t have “survey” in your company name. It could definitely be worth rebranding in the future.

To me, this looks like a useful tool for exploring user behavior and sentiment, but ultimately different from a survey. Instead, it’s more like having an expert in multiple demographics around.

Now, people can be disappointed from both angles: people who (rightfully) point out that the results can be quite wrong, and on the other side of the spectrum you have people who would think it’s limited to whatever a survey can do. It seems like your product could be not just faster but more capable than a survey, in some cases. For instance, perhaps you could predict “reasons” or narratives for why results turn out certain ways.


What's your take on the premise that market research is mostly feel-good busywork detached from reality? The argument is that context is dynamic over time in every instance, and survey data pales in comparison to purchase data. The best way, then, is to launch small experiments with real people and real buying behavior.

Covered here: https://www.pretotyping.org/


Much of the criticism of market research comes from inexperienced researchers who ask hypothetical questions.

If you focus on past and current behaviors and problems you can fairly accurately predict future behaviors since most consumers/customers tend to do the same things over time.

That said, I read and enjoyed the book you linked to and thought it had valid points. If you can build a quick prototype to observe actual behavior then go that route rather than starting with a formal discovery process. That’s not always doable though…


The survey / behavior gap is very real. Short-term we're focused on surveys, but we'd like to integrate behavioral data long-term (and potentially be primarily behavioral data, but that is TBD)


Really interesting approach! I can see this being useful. How are you dealing with short/medium-term changes in consumer sentiment, I assume your model is currently fairly static? For example, the results to "Would you buy an e-bike?" might change over time as cities add charge-points or additional bike paths, prices for e-bikes go down, etc. And as a more extreme example, the answer to typical YouGov questions like "Who will you vote for in the next presidential election" will obviously change daily based on a multitude of factors that aren't present in your training data.


The data we trained on has year, so we can specify the year you ask the question (the default is 2023). You can also see how answers change over time. [1] shows how the distribution for "Do you support the President" changes from 2000 to 2023 (see the 9/11 spike, end of Bush era, Obama era, Trump era, etc.)

[1] https://roundtable.ai/sandbox/2dd4e9d32c24e9abff01810695e948...
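A hypothetical sketch of what year conditioning could look like at the prompt level, assuming the fine-tuning data encodes the year as a field in the prompt. The prompt format and field names here are guesses for illustration, not the actual implementation.

```python
# Hypothetical prompt-building sketch: if training prompts carry a "Year:"
# field, inference can condition on it the same way. The format is assumed,
# not the actual implementation.
from typing import Optional


def build_prompt(question: str, year: int = 2023,
                 condition: Optional[str] = None) -> str:
    lines = [f"Year: {year}"]
    if condition:
        lines.append(f"Condition: {condition}")
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)


print(build_prompt("Do you support the President?", year=2001))
print(build_prompt("Do you support the President?", condition="owns a Tesla"))
```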


I also thought of this - I would try to make ChatGPT pretend to be a human to test customer interest in a potential idea, or to conduct user interviews. If you can really build a high-fidelity human simulator, even for a limited context, I'd bet this has immense value - the value should be in the fidelity to a human in depth, not in breadth, IMO. Good luck! There was a related economics paper on this a while ago (though there seem to be more than a few papers on this now); I can't find it, but wanted to link it.


My initial thought was: wow, this is going to truly be a game changer. I could imagine testing hypotheses in near real-time and getting amazing insights using the largest body of knowledge available, powered by probability.

Being curious, I wanted to try to work out how it works.

I assumed there was some prompt engineering using the inputs, but was interested in the GSS training model and how that would come into play.

So I figured let's first see how good or bad ChatGPT is. It was poor, though I put zero effort into optimising the prompt.

```
Using your knowledge of business owners, calculate a plausible distribution of responses to the question: What is the biggest pain point for your business that a SaaS platform could potentially solve?
A. Inefficient project management
B. Poor data management and analytics
C. Ineffective customer relationship management
D. Challenges in managing remote teams
E. Lack of robust communication and collaboration tools
```

ChatGPT (GPT-4):

```
A. Inefficient project management - 20%
B. Poor data management and analytics - 25%
C. Ineffective customer relationship management - 15%
D. Challenges in managing remote teams - 20%
E. Lack of robust communication and collaboration tools - 20%
```

The results were not that exciting and very neat (20%, 25% etc.)

I was ready to see how much more accurate RoundTable was...it was exactly the same.


Makes sense. The further away the target question is from the GSS training distribution, the more the model relies on the ChatGPT prior. I assume that if you click 'Investigate Results', the confidence is 'low' and the most similar questions in the dataset are pretty far off.


Very cool idea and I think there’s a thread worth pulling here.

The CEO of Unlearn AI on this podcast (https://podcasts.apple.com/ca/podcast/whats-your-problem/id1...) talked about using AI to simulate a larger sample size for clinical trials, which is similar to what you are doing here.

Looking forward to seeing where this goes :)


> talked about using AI to simulate a larger sample size for clinical trials which is similar to what you are doing here

My career has straddled medicine and ML at various points, so I feel like I have the context to comment on this: This is really fucking stupid. Like I can't believe anyone would suggest something this dumb. Hopefully the FDA shuts this bullshit down before some moron MBA spreads the practice through the industry. Without real data you have nothing.


Another interesting point about this is that he talks about how it’s mathematically probable that the clinical trial has the power of the larger sample size


AI for simulating surveys is pretty bad (read, pointless), but AI for simulating clinical trials is straight up criminal.


I used GPT-3 in 2021 to survey which opening line to an email ("Please read", "You might know this, but ..") led to the best recall. I'm a big fan of the approach, but I'm afraid we can only read so much into how the model thinks people would respond. Calling it a survey might be misleading, but we are seeing what the model thinks about a problem, and I think that's a really useful approach.


Hey Matt and Mayank, congrats on taking the leap into this. Back in 2009 I was a founder at a company called Crowd Science, where we were using survey results along with behavioral data to build models that would be used to classify individuals for advertising targeting. We ended up selling it to an advertising platform. It did work quite well. Would be happy to chat about our experience. There are also a few doing what they call synthetic panels, but I believe they are quite naive at this point, so there is definitely an opportunity here. I think validation of your output will be very important, but there are some interesting ways to approach this. Super cool stuff.


I think you are underestimating metadata. Not all, but many survey responses vary based on the browser, location, device type, and other metadata. These pieces of information reveal a lot about the user taking the survey and can even help filter out certain responses.

For example: "Do you wish to buy an Apple Headset?"

I can observe how the results vary among Samsung users, Apple iPhone 6 users, and Apple iPhone 13 Pro Max users living in New York versus Greenland, especially after a popular Netflix series featured the main character using the product.

How would such a case be handled?



I would honestly believe this would produce the same results as if a moderately popular user posted this poll with those options to their social media.


If I had encountered this poll on Twitter, I would have picked "moon" too.


Congrats on the launch! The sandbox is a lot of fun.

I played around with it, and one weakness I saw immediately was mixing real survey answers with imagined ones. For example: https://roundtable.ai/sandbox/609f54304935736c8e61816dea780e...

I find it hard to believe that so many people would prefer carbon emissions over free beer, massages, or being the pilot. If I condition this data on being male or female, the results change dramatically too.


https://roundtable.ai/sandbox/8a4298fa1784cf5603f42e9fb3d673...

It's interesting to me how many of my users go for the gambling option, though thinking about human behavior, maybe I shouldn't be surprised. If anything, it's a good exercise in finding ways to structure questions to most effectively trick users. Cool stuff, will be following your project.


This looks like an awesome toolchain to try to use for developing qualitative data sets in disciplines where the norm is to cut corners (e.g. https://datacolada.org/98).

Could you sell a version of this product where you promise to keep the buyers anonymous and confidential (and destroy the prompt and outputs after delivery)?


That sounds cool and I can believe that the totality of survey information etc. contains a lot of extractable information that isn't always made use of or needs surfacing.

Do you know at what boundaries this tends to stop working? E.g., if some event happens that changes people's perception of X, you would probably need new data if the event is "bigger than ...".


I’m sorry, but this doesn’t make any sense. You’re just hallucinating plausible-sounding numbers.

You probably fooled yourself with cherry-picking.


It makes perfect sense and there's research to back it up

https://news.ycombinator.com/item?id=36868552


I built notionsmith.ai originally with this specific use case in mind, but found people struggle to trust AI-derived insights.


It would truly be useful if the options didn't have to be submitted; having to provide them adds a bias. E.g., I can't ask "what are the top use cases for X?". But like someone else said, for 1% of the effort, if you're getting a slightly biased, 90%-accurate result, there should be a lot of takers. Congrats!


God if AI could solve the problem of having to talk to customers for your startup, that would be amazing.


I'll have my bots talk to your bots


Would this be a competitor for sites like Prolific, or a complementary service? Running surveys with "real" humans on Prolific doesn't seem that expensive.


Question: What is your major? a) Economics b) Maths

Answer: It shows predominantly Maths (70 percent on avg). I think Economics is more popular IRL, but I could be missing something here.


But will the quality change if you use just random numbers? I think no.


May I ask how you went about fine tuning in a cost effective manner?


Another reason to think that surveys are useless: how you design your questions will affect the prompted answers from real people, similar to the prompted answers from an LLM.


Is this a "data is the new oil" thing? As in, survey companies have all the survey data in their DBs and could (agreement permitting) train LLMs, or maybe just plain old categorisation models, on it.


I like this but popular polls performed produce SurveyMaster as a better name.

Which it isn't.

So it has some flaws.

But great idea.


Congrats on the launch. Sounds like a smart idea!


What's the point of this lol



Fixed. Thanks!


"Simulate surveys" makes it sound like a satirical AI product


I was seriously searching through their webpage expecting a reference to the “conjoined triangles of success” to help indicate that this is all an elaborate self-aware prank. I was sorely disappointed to learn there is zero self-awareness here.


Self-awareness for what, lol?

If anything, existing research indicates you can just skip basic surveys and go for complex simulacra experiments.

https://news.ycombinator.com/item?id=36868552


Can't believe this is coming from PhD students from Princeton University! It's so obviously flawed.


"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html


Is it?

What little research we have on this kind of phenomenon points towards this being very valid.

Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? (https://arxiv.org/abs/2301.07543)

Out of One, Many: Using Language Models to Simulate Human Samples (https://arxiv.org/abs/2209.06899)


A Bayesian model will use previous data to calculate probabilities, so using AI to simulate a survey based on previous surveys sounds like a logical evolution, no?

That's something you might typically do by hand when running a survey and, say, comparing it to a benchmark.

The problems will be similar to most AI problems I suppose: people who don't really understand the limitations of AI or the results it produces take the output of AI as gospel.

My own thought is: what does it mean to 'simulate' a survey if the outcome is that people treat it as a 'live' or 'empirical' one?



