A homework question in someone’s 11th grade statistics class (columbia.edu)
160 points by Tomte on Dec 6, 2022 | 258 comments



> Stepping back, it’s interesting to see a homework question where there’s no unambiguously correct answer. This is sending a message that statistics is more like social science than like math, which could be valuable in itself—as long as it is made clear to students and teachers that this is the case.

Well, wait a minute. Do we know whether the purpose of this question is to make students consider all the options broadly, or is it that there is a "right" answer (in the sense that all other answers will be marked wrong) and it's just poorly written? I ran into those kinds of questions all the time on standardized tests, and they never let me answer in the form of an essay explaining the pros and cons!


Since it's a multiple choice question, one assumes there is a correct answer, and presumably that is A, because there is a clear argument for it (you wish to sample from the population distribution, which can be viewed as a mixture of an initial-respondents distribution and an initial-non-respondents distribution and so you cannot claim to have sampled from the mixture until you've sampled from both mixture components, since they are probably statistically distinct).

Tangentially though, a big question here is how wise it is for the US education system to be based so heavily on multiple choice questions. There are other countries with decent education systems that do not do this.


This smells like a scenario where the teacher isn't looking for the "right" answer, speaking broadly or objectively.

Instead, I suspect there is a particular concept or perspective they have been trying to impress on their students recently, and in this case they are expecting the answer that most demonstrates they have paid attention to the recent lessons.

I could be wrong of course.


I remember taking a multiple choice test for some scholarship 35 years ago. 20 questions, with the explicit instruction that for any question all of the choices could be correct, or none of them, or any combination, and you were supposed to check only the answers that were correct. It was a VERY hard test. In addition, you could score less than zero, since every correct item that you checked would add points, but every incorrect item you checked would subtract points AND every correct item you didn't check would also subtract points. You didn't get points for not checking incorrect items.

I've never seen a test like that for school work.


I had a few of those in university, they were a favorite of a particular professor for a while. On the very first exam, given that you had to get 50% of points to get a passing grade, IIRC about 20% had a score above 0.

Those exams got easier after a couple years. I imagine there were some words to be had with the dean.


> a big question here is how wise it is for the US education system to be based so heavily on multiple choice questions

There was a great blog post I read a while back on redesigning multiple choice tests to allow the student to indicate the "confidence" of their response, with a more confident answer being rewarded/penalized more heavily than a response with low confidence. This allowed for a statistically better sample of how well the student learned the material.

I thought for sure the post was written by Scott Aaronson, but I haven't been able to find it despite extensively searching his blog, so maybe it was someone else.


I believe you're thinking of this post by Terry Tao: https://terrytao.wordpress.com/2016/06/01/how-to-assign-part...


Yep, that was it! Thanks!


Is this this? https://theeffortfuleducator.com/2020/06/22/confidence-weigh...

Found with the following Google query: "multiple choice" "school" "confidence" "blog"


If I'm taking that test, why would I ever give a confidence estimate other than 0% and 100%? If I think I have a positive expected return, it's worth the risk to go for max points. If I don't, then I don't want to lose fewer points, I want to not answer at all, or equivalently give it zero weight.


Assuming the sibling to your comment linking to Terence Tao is the correct one, the resolution is to harshly penalize being confidently wrong. The points are proportional to log(2p) (where p is your subjective probability of being correct), so you can theoretically lose all the points ever by saying your confidence is 100% and choosing the wrong answer.
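
For concreteness, a minimal sketch of a rule of that shape (assuming the log(2p) form described above; the exact normalization in Tao's post may differ):

  import math

  def log_score(p_assigned_to_correct):
      # Score for a 2-option question when you assign probability p to the
      # answer that turns out to be correct: p = 1.0 earns the full point,
      # p = 0.5 ("no idea") earns 0, and p -> 0 heads toward minus infinity.
      return math.log2(2 * p_assigned_to_correct)

  print(log_score(1.0))   # 1.0: confidently right
  print(log_score(0.5))   # 0.0: fully hedged
  print(log_score(0.01))  # about -5.6: confidently wrong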


That nightmare test design, even if it is the post they meant, doesn't fit the description I replied to. You are not ""allowing"" the student to state their certainty if you make the default penalty for a wrong answer either minus 4 or minus infinity. You are forcing a huge change.


If you calculated scores using a mean squared error loss function, it'd be better to hedge your bets if you were uncertain.


Can you explain what that calculation looks like? I'm at a loss.


Mean squared error is a function used in linear algebra/machine learning/statistics to optimize estimation functions. To calculate, you take the difference between the observed result and the “guess” of your model, then square it to get your loss.

In the case of taking a test, let’s say you’re answering a true/false question, true represented by 1 and false represented by 0. Let’s also assume you have no idea which one is correct, it’s a coin flip to you.

If you choose True, then 50% of the time the correct answer is true and you'll have 0 loss, because (1-1)^2 is 0. The other 50% of the time you'll have a loss of 1, because (1-0)^2 is 1.

So your expected loss is 0.5(1)+0.5(0)=0.5

On the other hand, if you guess 0.5 (true with a confidence level of 50%), then 100% of the time your error is 0.5, and your mean squared error is 0.25.

In other words, you minimize your expected loss by guessing your true confidence level. This can be mathematically proven to work for any confidence level.

This could be adapted to multiple choice questions by treating each option as a true/false question.

https://en.m.wikipedia.org/wiki/Mean_squared_error
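
To make the arithmetic above concrete, here's a small sketch of the squared-error idea (a hypothetical scoring function, not taken from the linked article):

  def expected_squared_loss(reported, p_true_is_1):
      # Expected (reported - outcome)^2 when the correct answer is 1 with
      # probability p_true_is_1 and 0 otherwise.
      return (p_true_is_1 * (reported - 1) ** 2
              + (1 - p_true_is_1) * (reported - 0) ** 2)

  # Coin-flip question: a hard "True" (1.0) costs 0.5 on average, while
  # reporting your actual 50% confidence costs only 0.25. In general the
  # expected loss is minimized by reporting your true probability.
  print(expected_squared_loss(1.0, 0.5))  # 0.5
  print(expected_squared_loss(0.5, 0.5))  # 0.25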


Okay, so the certainty is combined with the raw answer into a single number between true and false.


Correct. Or for more complex questions you could feasibly model it as guessing a point in multi-dimensional space. For example, if the answer to a question is a single word, and your loss function is the mse of semantic similarity between the guess and the true answer, but you, the student, think it might be one of 3 words, you could take a weighted average of the vectors representing those words in latent space of a large language model to minimize your mean squared error, where each of the weights is your estimation of the probability of that word being correct.

Sorry, that was very wordy, but hopefully you get the point.

An easier to understand, but perhaps less sensible example would be to do the same thing in a quiz about arithmetic, so 5+5=9 and 6+2=7 is less wrong than 5+5=10 and 6+2=1.


> one assumes there is a correct answer, and presumably that is A, because there is a clear argument for it (you wish to sample from the population distribution, which can be viewed as a mixture of an initial-respondents distribution and an initial-non-respondents distribution and so you cannot claim to have sampled from the mixture until you've sampled from both mixture components, since they are probably statistically distinct)

But you can't survey non-respondents--because they don't respond!

What option A will actually result in is having a mixture of people who responded on the first try, and people who responded on the second try. In principle, neither of those will be representative of non-respondents. Whether this creates a significant problem in practice will depend on lots of other assumptions.

My own first choice would be either E or D, depending on (a) whether you have enough resources available to send 30 more emails, and (b) how much statistical power you are sacrificing with a sample of only 90 instead of 120. (If you were doing things properly, the number of emails you send out initially would be larger than the actual number you needed to get enough statistical power, by some factor that would depend on what fraction of people you expected to respond. In which case D would be the obvious option.)
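
For a rough sense of (b), a back-of-the-envelope sketch (assuming a simple random sample, a proportion-type question, worst case p = 0.5, and no finite-population correction):

  import math

  def margin_of_error(n, p=0.5, z=1.96):
      # Approximate 95% margin of error for an estimated proportion p
      # from a simple random sample of size n.
      return z * math.sqrt(p * (1 - p) / n)

  print(round(margin_of_error(90), 3))   # ~0.103, i.e. about +/-10.3 points
  print(round(margin_of_error(120), 3))  # ~0.089, i.e. about +/-8.9 points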


The idea with A is that you will get a more representative sample if you follow up specifically with the people who did not initially respond. Yes, some will never respond, but the ones that do answer will increase the quality of the data, moreso than an additional random sample. The people who did not initially respond are more likely to be busy/lazy/unengaged, and without as many responses from that cohort, the data will be skewed.


> The idea with A is that you will get a more representative sample if you follow up specifically with the people who did not initially respond.

Perhaps. But you are also creating a potential confounder, since you now have two categories of responders instead of one.


That seems like a pretty small potential confounder. It doesn't seem to bother the US Census, for example, that they get some people's response via mail right away, and others they have to go knock on the door and pester them to respond. Any random sample of people is going to include some who really like answering questionnaires, and some who hate it or are too distracted or busy. Designing a survey where you plan to contact people multiple times seems perfectly normal and reasonable.


> That seems like a pretty small potential confounder. It doesn't seem to bother the US Census, for example, that they get some people's response via mail right away, and others they have to go knock on the door and pester them to respond. Any random sample [...]

The decennial census is not a random sample, though the Census Bureau does surveys that are random samples separately.


The Census uses this technique when doing random samples. One of the ways to improve accuracy is to put a lot of effort into contacting a random sample of non-respondents, since that level of effort simply isn't viable at scale.


The US Census is not doing sociological research on people's opinions. They're collecting factual information. Big difference.


If you want social science research, check out the General Social Survey by NORC. They have been following the same cohorts of people for the last 50 years with biannual surveys. The data is open to view at https://gssdataexplorer.norc.org/


Option A is absolutely the worst choice. The non-responders have already selected themselves into a non-random set, and making a request to them alters the entire experiment.


> The non-responders have already selected themselves into a non-random set

Yes, and you're clawing back a non-random portion of that non-random set, which can be expected to improve the sample quality overall.


Why would this be expected to improve sample quality? You want a random sample, not a non-random piece of a non-random subset. Non-random sampling just introduces more confounders and makes your sample quality worse.


I would choose A, too, but if you can do A, the questionnaire isn’t anonymous.

I would go for A’: email all of them, reminding them to fill in the questionnaire, if they haven’t already done so.


> how wise it is for the US education system to be based so heavily on multiple choice questions

Who said anything about "US education system"?

Where in the original article does it say "US"?


Interestingly, in Israel they call a test with multiple choice questions "an American test".


Well, the question uses the word "freshmen", which I believe is US-only.


That's the point! Evaluating knowledge of university-level topics with childish multiple choice questions is so strongly associated with the United States that there's not that much doubt it's a US test.


I remember bombing a biology exam because I spent like 60% of the available time on a single question: "Is it good or bad for digestion to drink water with dinner?"

I mentally went over all the course notes, and was sure this was never covered directly. So, I thought, something must have been covered from which I can deduce this. Again I went over all the course notes mentally, and decided that was certainly not the case. I could have made up an argument why it would be good for digestion, and why it would be bad for digestion.

Since no explanation was asked, I decided to approach it as a multiple choice question. Since I could recall more evidence for it being bad for digestion, I simply wrote that down.

I got 0 points. According to the teacher's key, the answer could be either, as long as it included an argument (the expected one was along the lines of "water makes food soft", which was not covered in the course because it's common sense).

Still get frustrated about it. Something can't be both bad and good. The question states "X is bad or good, which is it?", not "think of an argument why X could be bad or why X could be good". It basically rewards bullshitting and punishes answering the question faithfully.

I went into STEM after that.


Which discipline is biology included under:

A) Science

B) Technology

C) Engineering

D) Mathematics

E) None of the above


Which discipline is biology included under:

A) Science

   As far as we know consciousness runs on magic, so biology isn’t science. 
B) Technology

   Biology isn’t created by an intelligence, so it can’t be technology. 
C) Engineering

   If it is engineering, it’s very bad engineering. What kind of engineer runs a toxic waste pipe directly through a recreation area?
D) Mathematics

   Everything is math, too vague. 
E) None of the above

   Maybe E.


Biology doesn’t really study consciousness.


You're wrong in A - biology doesn't care about consciousness. And in B - biology can be very much created and simulated these days. C is a fun joke, but realistically - it's good efficiency and we have reflexes that shut off most nasty stuff happening by accident when we recreate.


Animal behavior is part of Biology.

Humans are animals, so Psychology is a subfield of Biology in that sense.

If consciousness is relevant for Psychology it is indirectly relevant for Biology, and even more so if it is relevant for the behavior of any species apart from humans.

I'm pretty sure Biology is Science, though, even if there could be some magic hidden there.

PARTS of Biology also fall under B, C and D (Biotech, Bioengineering, Game Theory), so one could claim that the answer is ALL of the above.


Biology isn't STEM now?


It is, I used it more as a synonym for math and engineering, but that's apparently incorrect.


There's no B in STEM /s


1) WHAT IS THE DESIRED SAMPLE SIZE obviously it isn't 120, because only a moron desires a sample size of 120 and sends out 120 random spam requests

2) WHAT IS THE NECESSARY CONFIDENCE INTERVAL

3) WHAT IS THE POPULATION SIZE

But in general, a 75% response rate is likely fantastic, and should be sufficient if the original "120" sample size was rationally selected. So D strikes me as overwhelmingly the best answer.


If it’s being used for evaluation, then there should be a “correct” (or “best”) answer. If it’s not used for evaluation, then I’d probably be more confused than enlightened by a multiple choice question with no best answer, and prefer it phrased as a discussion question or similar.

Just my $0.02.


Option C is missing, that must be the unambiguously correct answer.


What do you think it might say?


C. Use the 90 responses as a sample if it supports your intended conclusion. Start over otherwise.


Check whether the 90 responses received confirm or conflict with the findings your boss wants to see.


Jinx!


Kick the non responders out of school. Now the sample is representative, or else.


On standardized tests, couldn't they be fake questions that exist only to catch cheaters?


William Cochran, best known for Cochran's Theorem (https://en.wikipedia.org/wiki/Cochran%27s_theorem), wrote a book about sampling techniques in 1977 (https://ia801409.us.archive.org/35/items/Cochran1977Sampling...) and dedicated an entire chapter (13) on the different approaches you can take for the cases of non-response to voluntary surveys.

One approach that might fit here is similar to what he calls "Double Sampling", which is basically to email the 30 that didn't respond, and even if you get something like 5-10 responses back, you can use that smaller sample to roughly represent the entire 30 that didn't respond. Not perfect (this will increase the error bars for your survey results), but nothing in survey design is.
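
A minimal sketch of the combined estimate this produces, as I understand it (illustrative code, not from the book; the follow-up subsample's mean is weighted up to stand in for all 30 non-respondents):

  def double_sampling_estimate(first_wave, followup_subsample, n_nonrespondents):
      # first_wave: values from the 90 who answered the first email.
      # followup_subsample: values from a random subsample of the 30 hold-outs,
      # reached with extra effort; its mean stands in for all of them.
      n1 = len(first_wave)
      n = n1 + n_nonrespondents
      mean1 = sum(first_wave) / n1
      mean2 = sum(followup_subsample) / len(followup_subsample)
      return (n1 * mean1 + n_nonrespondents * mean2) / n

  # Toy numbers: 90 first-wave scores averaging 4.0, and 8 of the 30
  # hold-outs reached on follow-up averaging 3.0.
  print(double_sampling_estimate([4.0] * 90, [3.0] * 8, 30))  # 3.75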

A few other options are also offered, such as taking a much smaller representative sample, but somehow forcing 100% participation (possibly do-able in some circumstances) to help provide a gradient for the whole population, including those who didn't respond to the initial survey.


I was contacted for the Swiss labor market survey and ignored that because I was very busy with other things (business, baby, etc.) It required multiple extended phone interviews.

They followed up and emphasised that they take the experiment very seriously and that my answers would be considered representative of 500 people in Switzerland, notionally all juggling the same priorities as me.

I was very impressed at the seriousness and made time to take the survey.


The problem is representation, not response. Forcing responses or sampling again … I'm not sure either will give you that.


This is the exact type of question that tricks those with more nuanced knowledge but doesn't trick those that memorize textbook answers. I'm not sure we should be encouraging this type of learning, despite it being how a lot of standardized testing questions work.


I wonder how this type of question even gets written. Does the author purposely put in these "gotcha" questions to see if you read the textbook? Or are they literally unaware of how badly the question is written - ie: their knowledge is limited to the point they don't see the nuance or ambiguity?


I would say it is an utter failure of a question if an expert in the topic is perplexed by the answer and there is (presumably) a supposedly correct answer.

Knowledge is punished? Don’t ever learn beyond the book? That’s outrageous.


I used to fully agree with that feeling but not anymore.

IMO there are two kinds of tests:

1. Tests to help you retain information.

2. Tests to evaluate your knowledge.

Assuming this question belongs to the first kind (memorization) and the answer is in a textbook being studied, then this is a good question.


So, a good question for a bad kind of test? Why shouldn't all tests be type 2?


Tests are one of the best techniques for retaining knowledge. It's called active learning, as opposed to passive learning.

Spaced repetition systems such as Anki, Mnemosyne, or SuperMemo are based on 2 concepts: (1) knowledge fades over time unless reactivated, and (2) asking questions is very effective at reactivating knowledge.


You can also get spaced repetition through homework and revisiting material through lectures. I'm not denying that tests also do this (and also force studying), but let's be honest that most course structures aren't created in a format that promotes actual spaced repetition. You're moving the goal post a bit. If it is up to the student and not the teacher then the test is irrelevant to the discussion.


And type 2 tests don't achieve that?

Remember, we're talking about the benefit of questions that encourage the ignoring of nuance.


They can but:

- it's more time-consuming to create good evaluation questions.

- it's more time-consuming to provide a nuanced answer.

- it's more time-consuming to evaluate nuanced answers.

If the purpose of the tests is to help students retain the knowledge they've been recently exposed to, then the first kind of tests are more effective. That is, the teacher will be able to provide students with more tests, thus helping them retain more in a shorter amount of time for everyone.

It's a compromise.


Ok, you're thinking type 1 vs type 2 is multiple choice vs written answers.

I'm thinking type 2 can have multiple choice, but then there's only 1 answer that can be correct.

Type 1 can have multiple choice, but puts multiple correct answers and expects you to pick the one that is most directly covered in the class, marking the other correct answers incorrect. That, to me, is testing information retention without nuanced knowledge.


On the first part of your post:

We don't have the full test, so maybe the teacher had a preface saying "In the context of book XYZ...". That is what I do when I create flashcards for myself. I specify what knowledge I am testing. We may also assume it was implicit for the students.

I could also imagine that forcing the student to make one choice is a feature, not a bug. For example this decision may teach the student that (1) not everything is binary and a single book has no absolute truth, (2) to move forward, you need to make a choice.

Regarding the second part:

> That, to me, is testing information without knowledge.

Memorization vs. Understanding is a false dichotomy. Both are required for proper learning. See https://www.coursera.org/learn/learning-how-to-learn.


> Memorization vs. Understanding is a false dichotomy.

Hey, you're flipping us around. Who was it that wrote this?:

> IMO they are two kinds of tests:

I'm the one that's arguing that it's a false dichotomy, that tests need to cover both and not one without the other.

> I could also imagine that forcing the student to make one choice is a feature, not bug. For example this decision may teach the student that (1) not everything is binary and a single book has no absolute truth

By writing a test that forces non-binary things to appear binary and presents the book as an absolute truth? That's going to teach them the opposite?

> (2) to move forward, you need to make a choice.

You're grasping at straws with this. You're telling me that marking correct answers incorrect is good because it teaches a life lesson?


Yes, rereading the thread, I think I'm going too far.

My main and original point is the following: tests that only test memorization are good on their own ("Tests to help you retain information.").

You need both memorization and understanding to learn. Yes, some tests can improve both; but focusing on only one aspect - here memorization - is fine too.


Yeah, to circle back, as I understood you, type 2 do both, type 1 only memorization, and I called type 1 bad, because, like the question of this post, they can punish understanding. You say that's fine; I still say it's bad, but I guess it's just a difference of standards.


Yes, thank you for the discussion!


Testing information retention is fine. But the information "The textbook claims A is the correct answer" is not worth remembering, is it?


It is not critical but I think it is interesting.

For example, I've read Code Complete 2, and while browsing online I found that some of the claims in the book were not solid.

- https://www.sicpers.info/2012/09/an-apology-to-readers-of-te...

- https://www.lesswrong.com/posts/4ACmfJkXQxkYacdLt/diseased-d...

The points made in the above links are interesting, but because I remember reading this particular book, they stick with me more deeply.

Also it probably depends heavily on the topic.

For example, in economics or philosophy, there are different schools of thought and I think it is valuable to know which school a "fact" is attached to. And before learning about every school, it is probably easier to just remember where you first read it.

This is not critical, but maybe having this information can help you create new insights.


That is a fair point and I realize that I do remember some facts because the questions on the respective exams were, in my opinion back then, terrible. I would, however, consider most of these facts trivial by now. Still, it has played a role in my understanding of the matter.

On the other hand, these tests are usually meant to evaluate learning progress and are not themselves thought of as teaching material. I find this quite unfortunate and would really wish that an exam were more like an individual learning session but here we are.

Thus, while this question as it is might advance understanding, it might at the same time hinder progress because someone will fail the exam for giving the wrong answer.

Or for spending too much time on a question that someone who failed to get the lesson answered in a second. This is in answer to the idea that the question is not meant to be answered "correctly" but just to stimulate learning. Unless the whole exam does not get graded - which would be great and hilarious at the same time - I fail to see the fairness here.


I think we're talking about two different things. I have no problem with tests in general. But I do have a big problem with test questions that are intended to be tricky.


Actually, that sounds like it's working as intended


The answer has to be A, under reasonable assumptions. If you assume that the university wants to find out the opinions of all students, and if you assume that students who didn't respond still have opinions, then the problem is that the students who didn't respond may have different opinions from the students who did, skewing the results. Only choice A tries to solve this problem.

People here are drastically overthinking it. Yes, it's true that the problem didn't state those assumptions, but you are supposed to make them, not to say "the university didn't give us its criteria for 'best plan' so all the answers are equally good". Given the way that problems like this work, it's obviously meant to test knowledge of the specific idea that the people who respond may be nonrepresentative of all the opinions. Only response A shows knowledge of this idea.


"Has to be" is a stretch. By nagging the non-respondents you are introducing a bias that the first group of respondents were not exposed to. If you annoy a respondent, it's very reasonable to assume that can influence their responses (probably negatively).

Option D seems objectively better to me over Option A based on this bias.


Option A is strictly superior to option D, as you still have the data for the original 90 responses. If the second-round responses are similar, you have increased confidence that you are not running into responder bias. If they're different, you can publish both and note uncertainties and the need for further study.


I hear your argument, but that also introduces a slippery slope of cooking the data to support your original hypothesis. Now that you have the original data set and the expanded data set, one is likely to do that better than the other.

If the research industry were as pure as science itself of course that would never happen...


I actually vote against A. My work was always sending out 'completely anonymous' surveys that I generally ignored.

Resending to tell me I hadn't done it yet was indication enough to me that it wasn't completely anonymous at all.

While the question doesn't mention anonymity specifically, I'm sure it remains a concern in some percentage of respondents' minds.


But I could make a table Request(email, url, submitted) and a table Response(data)

And you couldn't connect the response to the user. That's the important part, right? Not the ability to resend a request.
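
Roughly, a sketch of that design (hypothetical field names and URL; the point is that the Response store carries no key back to the Request row):

  import secrets

  requests = {}   # token -> {"email": ..., "submitted": bool}
  responses = []  # bare answer payloads: no token, no email

  def send_survey(email):
      token = secrets.token_urlsafe(16)   # per-user survey URL
      requests[token] = {"email": email, "submitted": False}
      return f"https://survey.example/{token}"

  def submit(token, answers):
      requests[token]["submitted"] = True  # enables reminders to non-submitters
      responses.append(answers)            # stored with no link to the token

  def emails_to_remind():
      return [r["email"] for r in requests.values() if not r["submitted"]]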


I'd argue recording anything at all, including whether I did it, breaks the 'completely anonymous' contract.

Then we're left with...mostly anonymous. But at that point it's a black box. It could be done how you describe. Or it could be attaching my name to the survey and emailing the CEO directly. I'd never know the difference.


I mean sure. If you think someone is lying to you then it doesn't matter.

But I disagree with "getting a reminder means its not anonymous" its completely orthogonal.


Are you assuming the request to respond is broadcast to all recipients regardless of response status?


No. The implementation is something like: email is the primary key, and the url is per user.

Going to the page and submitting will do two things: the Request table will be updated and marked submitted, and the Response table will be populated with only the data.


If you can derive the email at submission time then it's not anonymous.


It seems like the disagreement is between what makes data anonymous.

If I write a letter and don't sign it, it's anonymous. Someone could use a corpus of my text and infer I wrote it. That doesn't mean I didn't write anonymously.

I could make 2 updates to 2 tables, and the end result would be that having both tables wouldn't let you correlate the data with submitters.

Yes, if you control every aspect of the process you can lie to people. That's not the point. If you think someone is lying to you to harm you, why would you interact with them?


The point is there are ways to achieve this even assuming bad actors, and doing something that's obviously flawed doesn't seem helpful.


> obviously


I work for a company that actually runs surveys, and believe me: we don't know who the respondents are. They get a link with a randomly generated id, and that's it. For us, even their email address is hidden, because respondents register to a panel at another company, which deletes this information three months after the survey closes (a survey usually runs for a few days). I can't vouch for the other company, of course, but I do believe them. Sometimes it would be really nice to know who these people are, because not only is non-response a problem, bad responses are an even greater problem, so we'd like to know who to exclude in advance.

So much for unbiased representation.


The question never says the survey is, or claims to be, completely anonymous.


Two fallacies:

1. The non-responders are not responding for similar reasons.

2. Those reasons are somehow related to their opinions.

Both are clearly false. You cannot assume anything about their opinions from the fact that they didn't respond. All you have is their (non)action - which could have been caused by being ill, busy, distracted by personal issues, absent, drunk, or any number of other reasons.

None of the answers are correct. If you wanted representative data you would have to:

1. Confirm with a different sample but also...

2. Use multiple different channels and methods (phone, in-person survey on campus, maybe an email survey with some kind of benefit to encourage participation, etc) to try to eliminate the non-responders who have personally valid but statistically irrelevant reasons (away, ill, etc).


As the blog post says non-responders do tend to have different opinions on average than responders. Typically they do try to demographically determine who those non-responders are so they can adjust and increase the weight of that group proportionally to their limited response. This is how most polling works in the real world and it does provide better accuracy.
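
A minimal sketch of that kind of reweighting (all numbers hypothetical; each group's respondents are weighted by the inverse of that group's response rate):

  def weighted_mean(groups):
      # groups: list of (group_size_in_population, respondent_values).
      # Each respondent gets weight group_size / number_of_respondents.
      num = den = 0.0
      for pop_size, values in groups:
          w = pop_size / len(values)
          num += w * sum(values)
          den += w * len(values)  # equals pop_size
      return num / den

  # Hypothetical: 60 "engaged" students (50 respond, avg 4.2) and
  # 60 "busy" students (only 10 respond, avg 3.0).
  print(round(weighted_mean([(60, [4.2] * 50), (60, [3.0] * 10)]), 2))
  # 3.6, versus 4.0 if you just average the 60 raw responses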


Suppose the non-responders are identical to the responders. Then nothing is lost by trying to get them to participate again. But if they are not, replacing them will lose information.

> ill, busy, distracted by personal issues, absent, drunk

But being ill correlates (lightly) with being older, and being drunk heavily correlates with opinions. Being absent might correlate with wealth (and not even linearly). Personal issues probably correlate with education. Add them together, and you've got a fairly large part of the population missing from your sample.


>The answer has to be A

We have this problem at work. You try to contact customers and they never respond. Sending repeated queries has no effect.

The real answer is: The question is poor and ambiguous, therefore the question is wrong. It reflects poorly on the academic institution asking it. I would be in search of a new school for my child if I saw many test questions like this. They didn't even enumerate the answers correctly.

It is asking for the best plan of action for the University without clearly speccing out the goal and all its parameters. The goal may be "Determine if at least half of our first year students had a positive experience." and that can possibly be solved with 90 responses if at least 60 had positive experiences.

It's just like when your boss asks for a time estimate on a task he's barely defined. It depends on the spec he's asking from you. Without clear specification, the question isn't answerable.


> It reflects poorly on the academic institution asking it.

Only in so far as it is posed with a multiple choice response and no opportunity to make a comment.

The question itself is about a commonplace occurrence in statistical sampling.


>The question itself is about a commonplace occurrence in statistical sampling.

It certainly is, but once again, the question does not specify enough information to choose an answer. Maybe A is correct, if you have a backup/secondary email to try. Maybe, like so many of us who work with email notifications, they can see the email wasn't opened. Or it bounced. Or it was opened and flagged as spam.

Is that all the information you have? You emailed 120 of them and received 90 clickthroughs? Well sir, that is an astounding clickthrough rate! Perhaps that exceeds every expectation of responses and is itself actionable. Perhaps the best plan of action for the University is to discover their magic clickthrough sauce and monetize it.

What is the best plan of action? Impossible to answer when they haven't specified a goal.


You may have missed the part where I pointed out that it's a fine question .. that requires an opportunity for comment.

It's not a multi choice question (a form I dislike) but a question that requires a reasoned response.


>it's a fine question

>It's not a multi choice question

But it IS a multiple choice question, and it is ambiguous, which is why it's not a fine question. It's a sloppy, poorly constructed question, which isn't even enumerated correctly, allowing the instructor to select the "correct" answer based on unspecified reasoning, potentially discriminating against the person answering the question (which I suspect is the intention).


> which isn't even enumerated correctly

You keep saying this, but the screenshot shows that the student has used an elimination tool (the X on the left) to eliminate one of the possible answers (C).

> allowing the instructor to select the "correct" answer based on unspecified reasoning, potentially discriminating against the person answering the question (which I suspect is the intention).

The question looks like it's from albert.io, so the answer is predetermined (and generally provided by the vendor).


I have no reason to believe that students that ignored the survey the first time would respond to a second survey. There will always be non-respondents, it's just part of the domain. I'd vote for sending out 120 more surveys simply because 120 is a very small sample size to begin with.


There’s always a risk that if you increase sample size without addressing selection bias, you’ll just end up with more confidence in an incorrect conclusion. (“Dewey defeats Truman!”)


Right, you also don't know what was covered in the class. But doing the best you reasonably can to collect a random sample and then going with it is almost certainly the best answer in this case. (Although you can make an argument for D.)


Option A is absolutely the worst choice. The non-responders have already selected themselves into a non-random set, and making a request to them alters the entire experiment.


Surely the responders have also self-selected into a non-random set?


The set you wanted, so IN the experimental design. Or did the experiment include also begging people to fill out a questionnaire they had initially trashed? Many people appear to think so.


The people who didn't respond, when nudged, can answer carelessly because they are just trying to speed through it. They should be treated as a separate population and not merged into the rest.


This is how we do it in reality. Missing answers are very important!


No, the question says the original 120 were "sampling". There is no "all".


> Only choice A tries to solve this problem.

No, it doesn't. It doesn't get any information from non-responders; it can't, because they don't respond! It only gets information from some people who respond on the second try instead of the first.


Choice A will likely get some answers from students who didn't respond the first time, increasing the representativeness of your sample. It's only futile if you conflate as "non-responders" the people who didn't respond on the first opportunity with the smaller number who wouldn't respond even given a second opportunity.


> Choice A will likely get some answers from students who didn't respond the first time, increasing the representativeness of your sample.

Possibly. But it also introduces a potential confounder.

> It's only futile if you conflate as "non-responders" the people who didn't respond on the first opportunity with the smaller number who wouldn't respond even given a second opportunity.

No, the non-responders are the ones who never respond, period. Which, as I said, means you can't sample them to see what they're like.

What you're doing, instead, is creating two different subsamples of responders: those who responded on the first try, and those who responded on the second try. Which might make things worse, since now you have an additional variable that you created by sending a second email to those who didn't respond the first time.


Option A is absolutely the worst choice. The non-responders have already selected themselves into a non-random set, and making a request to them alters the entire experiment.


> The non-responders have already selected themselves into a non-random set, and making a request to them alters the entire experiment.

This is a better statement than any of mine of the point I've been trying to make.


There is nothing in the question which suggests some of them (or all) wouldn't respond to a second, third, or nth prompt. In many surveys, non-respondents receive multiple outreach attempts.


> There is nothing in the question which suggests some of them (or all) wouldn't respond to a second, third, or nth prompt.

Of course not. But making second, third, or nth requests is not "sampling from non-responders". It's just creating more and more different groups of responders. Which doesn't improve anything statistically that I can see; in fact it might make things worse, because you are introducing more and more potential confounders.


When this is a concern, I do two analyses -- one with and one without the potentially confounding group. It is still better to obtain the data, since otherwise we knowingly have a biased sample. (Somewhat similar to only surveying people who "seem approachable.")
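
A minimal sketch of that two-analysis idea (hypothetical numbers; it just reports the estimate with and without the follow-up wave so the sensitivity is visible):

  def compare_waves(wave1, wave2):
      # Estimate with only the first-wave respondents, then pooled with the
      # people who answered after the reminder, plus the shift between them.
      m1 = sum(wave1) / len(wave1)
      pooled = (sum(wave1) + sum(wave2)) / (len(wave1) + len(wave2))
      return {"wave1_only": m1, "pooled": pooled, "shift": pooled - m1}

  # Hypothetical satisfaction scores (1-5): 90 first-wave, 12 after a reminder.
  print(compare_waves([4.1] * 90, [3.2] * 12))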


> When this is a concern, I do two analyses -- one with and one without the potentially confounding group.

If this analysis tells you there is no significant difference between the subgroups, that's nice, because you can then just lump them into one and not worry about it any more. (This also tells you you didn't actually need to send out the second email, but what's done is done.)

But if this analysis tells you there is a significant difference between the subgroups, you don't know if it's something that was there already or something that you created by sending out a second email to people who had not responded to the first. So you're not actually better off than if you hadn't sent the second email; you're worse off, because you now have a confounding variable present and you don't know what to do about it.

> otherwise we knowingly have a biased sample

You have a biased sample regardless of whether you send out a second email or not.


It seems you could possibly apply this same logic to exclude any subgroup.

For example, you must sample the mall-goer population. Some people "seem approachable"/etc. ("respond to your first email"), and you approach them. You are worried that others will be annoyed if you approach. If this assumption is true (which it may not be), their responses could be biased and confounding.

Should you still survey them? I believe the stats textbooks would say yes, since it will result in a more representative sample.

I assume it would be a detriment to the survey if you exclude some groups merely because they aren't compatible with your survey-taking techniques. All surveys have some inherent measurement error, and there are some techniques to detect biases/inconsistencies without excluding entire groups.


> It seems you could possibly apply this same logic to exclude any subgroup.

No, just any subgroup that, by construction, is not reachable with whatever technique you are using. The key point is that there will always be such subgroups. See further comments below.

> Should you still survey them?

If you mean, should you still send out surveys the first time even though you don't know how people will respond? If you're going to do a survey at all, you have no choice, so this question is pointless.

If you mean, should you continue to pester people who didn't respond the first time, on the grounds that "it will result in a more representative sample", I would question whether that is really the case. See further comments below.

> I assume it would be a detriment to the survey if you exclude some groups merely because they aren't compatible with your survey-taking techniques.

It's a "detriment" if you fail to realize that there is no way to avoid it. Taking a survey inherently means you get responses from people who respond to surveys and not from people who don't. Going back and asking a second time for a response from people who didn't respond the first time inherently means you get responses from people who will respond the second time and not from people who won't. And so on. There is no way to avoid the fact that any survey-taking technique will inherently exclude some proportion of the population.

Once you realize this, you realize that the idea of "getting a more representive sample" is based on a false premise, that you can somehow "include" everyone if you just go about it the right way. You can't.


"Getting a more representative sample" is premised on getting as representative a sample as possible, not necessarily perfectly.

Using language such as "pester" is making assumptions about the group. In fact, many likely were busy or have just forgotten to reply. As I mentioned above, there are techniques for assessing biased/inconsistent responses.


> "Getting a more representative sample" is premised on getting as representative a sample as possible

But representative of what? You can't get a sample that is "representative" of non-responders--because they don't respond.

If you ask once and include whoever responds in your sample, your sample is representative (you hope) of people who respond.

If you ask non-responders again, your sample now includes people who responded when asked once, and people who responded when asked twice. But the "asked twice" part introduces an extra variable: did the fact that you asked them twice change something that you would rather had not been changed? Given that, it's not clear that the second sample is any more "representative" of anything useful than the first.


Representative of the population under study.

In practice, "extra variables" are naturally present in every study. For example, surveying mall-goers in the morning vs evening. Or surveying people at one end of the mall vs the other. These variables are likely far more confounding than "must send a followup email".

People asked in the evening might be more annoyed, but you shouldn't assume this a priori and decide to skip surveying them. Just as "busy people" in the studied population might forget to initially respond to an email and need a reminder.

Just as a personal anecdote, I've dealt with this issue quite a lot. In my surveys, the people who respond initially are almost always eager to give glowing reviews. If we didn't send follow-ups, we would have extremely positively biased results. Maybe it is true that sending reminders makes respondents slightly more negative than they normally would be. However, we'd completely exclude "unsatisfied people" otherwise and the survey results would be worthless.


> "extra variables" are naturally present in every study

That's true, but it doesn't change the fact that if you ask people a second time to respond, that's an extra variable you created, not one that was naturally there already. It's not a good idea to create extra variables in the course of doing the study.

It's true that the people who respond to the survey without any further prompting are not necessarily going to be representative of the entire population. But that doesn't mean you can "fix" that by further prompting. What it actually means is that surveys are a tool with only limited usefulness. That's just an inherent, inconvenient truth about surveys.


In terms of honest research, discovering uncertainties is as important as discovering trends, you just duly present these in your conclusions. If you are motivated to reach certain conclusions from the outset, you are looking for politics or marketing class, not statistics class.


The question cannot be answered. What does 'best' mean? The statistically most profound? The cheapest for the university? The optimized combination of both? ... And no, assuming that 'best' is clearly defined just because it is a statistics test is not valid.

Apart from this, the statistical best method is to include 'did not respond' in the model, as (an)other commenter(s) has (have) already mentioned. Bayes rules, btw!

Reselecting 120 students is no different from going with the 90, if the not-/responding is a confounding variable, as you just select more from the responder group.


> What does 'best' mean? The statistical most profound? The cheapest for the university?

It means the method producing the sample most representative of the population. It's fairly obvious they're not asking about (say) budget/cost minimization on a statistics test.


'Fairly obvious' is not really scientific.

I may be a little bit picky, but if someone cannot formulate a clear question in such a simple case, he will fail more badly in more complicated cases.


Yeah, they're all different types of wrong. Which is cool for an essay question, but strikes me as BS for multiple choice.


Exactly. The question doesn’t even state a goal. Without a goal, there can’t be an answer to "the best action".


the most profound one in that case was probably C, or 'no answer'


This is simultaneously a great question and a terrible multiple-choice question.


Exactly. There needs to be a few lines for justification of your logic below the answers. Also, why was "c" missing other than to make this seem slapdash?

But to the author's point, I wish that more people understood just how "soft" statistics can be compared to other mathematics disciplines. Some folks just view statistics as gospel, regardless of methods actually applied.


C is missing in the screenshot because the student had already used the elimination tool to hide it.


Ooof. My inferential thinking is suffering this morning apparently.


If you've designed your sampling properly, a 75% response rate is more than sufficient. Maybe that was choice C? This isn't really a statistics question per se; it's a question about how to design and operate a research survey. Suggesting that you need a 100% response rate is absurd and perpetuates a deep misunderstanding of research design.


My survey shows that 100% of people answer unsolicited phone calls.

Before running with missing data, you need to make the case that there's no plausible reason why non-respondents would have different answers from respondents.


> My survey shows that 100% of people answer unsolicited phone calls.

Did you not count all the people that didn’t respond?

This is just a problem with inference, not the sampling. “100% of the _respondents_ answer unsolicited phone calls _at least some of the time_”.


Yes, that's the point. People who respond to surveys are different from people who don't. The challenge is: when you ask some question that might be correlated to responsiveness, like "how many hours per week do you work?", you basically end up with zero information about what you care about (how many hours average people work.)


> If you've designed your sampling properly

You are making a much stronger assumption than this. You are assuming access to demographic information of each subject, as well as population base rates for a sufficient set of attributes to correct for non-response bias. Maybe we have that, maybe we don’t. If we don’t, option A is a good tack.


Where does "75%" come from? If I were trying to estimate what proportion of the population can read, or what proportion of the population is too busy to answer a questionnaire, then I wouldn't assume that the 75% who responded are representative of the whole population. Those are extreme examples but there are lots of things that might be correlated with people's willingness to respond.

Perhaps choice C was: Start again with a new random sample of students, but this time offer FREE FOOD to everyone who responds.


Nobody operating a survey of this type assumes they'll get 100% response rate. (Although it is very common to offer something like a gift certificate to N randomly chosen people who answer the survey.)


I mostly agree but on the subject of student life, those judging student life to be poor could be the most likely not to engage with the survey. I would go forward with the data, but the response rate should be discussed in the report.


Without any a priori information about the similarity of the responders and the non-responders, you should assume that the non-responders have colluded to hide the effect from you as much as possible. This is not a random sample anymore.

You can still get information out of your sample of that form, but it's very bad, and you should only pay attention to it if you can't afford to get anything better.
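
For a yes/no item, that adversarial assumption gives simple worst-case bounds (a sketch; sampling error is ignored on top of this):

  def worst_case_bounds(yes_among_respondents, n_respondents, n_sampled):
      # Bounds on the "yes" proportion in the full sample if the missing
      # people could all be "no" (lower) or all be "yes" (upper).
      n_missing = n_sampled - n_respondents
      low = yes_among_respondents / n_sampled
      high = (yes_among_respondents + n_missing) / n_sampled
      return low, high

  # Say 63 of the 90 respondents answer "satisfied"; 30 of 120 never answered.
  print(worst_case_bounds(63, 90, 120))  # (0.525, 0.775)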


The only correct answer is "A: mail them" if you want to stay statistically relevant.

You cannot ignore the group that didn't answer the questionnaire, as they will most likely expose some of the behavior that you are researching (i.e. about life etc), and might have a huge impact on your results.

So, the statistical result you currently have (based on the 90/120 students) will most likely be biased, and is invalid. (25% of missing input that might heavily impact your outcome is most likely making your results useless)

Thus, the only way to make it statistically relevant is getting more answers from the no-show group.

B. If you start over and do the same thing you will most likely get similar results/no-shows. So that will not be a good solution

D. As explained before, ignoring the no-shows results in a potentially biased outcome.

E. If your initial set is most likely biased, adding another 30 subjects will not fix your initial bias.
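
To illustrate the point about B and E just reproducing the same non-response bias, here's a toy simulation (every number is made up purely for illustration):

  import random

  random.seed(0)

  # Two kinds of students with different true satisfaction rates and
  # different chances of answering an emailed survey.
  def make_student():
      engaged = random.random() < 0.5
      satisfied = random.random() < (0.8 if engaged else 0.4)
      answers_first_email = random.random() < (0.9 if engaged else 0.6)
      answers_after_reminder = answers_first_email or random.random() < 0.5
      return satisfied, answers_first_email, answers_after_reminder

  students = [make_student() for _ in range(100_000)]

  truth = sum(s for s, _, _ in students) / len(students)
  first_wave = [s for s, r1, _ in students if r1]
  with_reminder = [s for s, _, r2 in students if r2]

  print(round(truth, 3))                                    # ~0.60
  print(round(sum(first_wave) / len(first_wave), 3))        # ~0.64, biased high
  print(round(sum(with_reminder) / len(with_reminder), 3))  # ~0.62, closer to the truth

Resending to a fresh sample of 120 (option B) just draws from the same first-wave distribution again, so it lands near the biased number rather than the true one.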


The behavior that the non-responders are exposing is that they don't like answering surveys. Factor that in how you will, but if you beg/incent/demand that they answer, you are no more likely to get honest answers.

I remember kids in school who would just answer "A" to every question because they didn't like surveys and didn't care.


> the only way to make it statistically relevant is getting more answers from the no-show group.

You can't get answers from the no-show group, because they don't respond! You are just creating three groups instead of two: those who responded on the first try, those who responded on the second try, and those who don't respond at all.


The fact that you have to nag them will introduce bias by itself. Now 25% of your sample will be biased by the nag. It’s just a question of whether that’s worth it compared to not including them at all.

Anyway, this should be an open ended question.


> You cannot ignore the group that didn't answer the questionnaire, as they will most likely expose some of the behavior that you are researching (i.e. about life etc), and might have a huge impact on your results.

This is completely paradoxical.

You are saying that using the data from the 90 would be jumping to conclusions, because you would probably be ignoring data that would not match those 90.

But making this claim IS jumping to conclusions, because you are making an assumption (that the 30 have something in common explaining why they didn't fill out the form).


See: non-response bias [1]

Parent comment should've said "could" instead of "will most likely", but their point is correct.

[1] https://en.wikipedia.org/wiki/Participation_bias


The article you link to establishes the reality of participation bias, but it does not exactly endorse option A. It does say "In e-mail surveys those who didn't answer can also systematically be phoned and a small number of survey questions can be asked. If their answers don't differ significantly from those who answered the survey, there might be no non-response bias. This technique is sometimes called non-response follow-up." This is not, however, the same as option A, which (as far as it goes) commingles responses from those who respond to the second prompt with those from the first, potentially concentrating a non-response bias in those who don't respond to either prompting. Furthermore, neither option A nor the above quote offers a remedy if evidence of non-response bias is found.


It's really not. The 30 might have something in common, which calls any findings that exclude them into question. This doesn't rely on any unreasonable assumptions.


Option A is absolutely the worst choice. The non-responders have already selected themselves into a non-random set, and making a request to them alters the entire experiment.


You guys are thinking too much. The correct answer is whatever the teacher said the answer is in a previous class. If the teacher said in class that the sky is black, and the exam answers are: A) blue, B) black, C) white, D) grey... then the sky is B) black.

School is not about learning or thinking. It's about absorbing info fast and spitting it back out fast.


IMO, this is almost certainly a badly-written question with a single “correct” answer, A.

The common thread is that all the answers are related to “non-response bias.”

This is the type of Q that results from question-writers that have (1) no deep stats background/passion and (2) a list of textbook concepts to write multiple choice Qs for.

I used to go CRAZY in school, until I started thinking of these Qs not as “what’s the answer” and more “what answer is most likely to result in a question like this?”


Given the topic is "student life, academics, and athletics", would you care about the non-responses? If the topics correlate well to non-responses then you'd have a reason to care about the missing results. The best answer here is D, because there is always attrition when doing surveys and getting 100% responses isn't usually a goal for this type of experiment.


A- Asking students who haven't answered the questionnaire will likely yield a very low response rate (not a lot of entropy gained here)

B- Another shot at conducting the experiment? Great, now we can do more statistics on two independently sampled sets (lots of entropy)

D- Why would we discard the chance of obtaining more information?

E- Worse than B


Since it's a "statistics class", I'd assume "best plan of action" would mean "which one of the options introduces less amount of bias" here. Like many I also struggled a bit with the question in isolation, but considering this IS a statistics class, it's probably trying to find how students would minimize bias here.

For this reason, assuming the 30 people didn't answer due to some given bias*, and assuming they were properly and randomly selected, option B (sending to another 120 students) would probably hit the same bias again, and we should expect around 30 of those students to also not answer because of it. So B and D are both hitting the same bias. Same for E, but with a smaller sample size. The only viable option is A, for which (assuming a normal bias) we should expect that some of the students who did not answer initially would answer after some nudging, hence reducing the total amount of bias. We could still be missing the most strongly biased students here, and that might be relevant; but with this strategy we would at least maintain or reduce the amount of bias (vs maintaining it with all other options).

We could also offer some stronger incentive for those 30 to answer, either stick or carrot, but that's not an option.

*This could be anything, and we are not measuring or identifying it. Some examples of possible biases: students with too many courses may be disinclined to answer optional questionnaires, some may not feel directly affected by the questionnaire and so choose not to answer, students in university dorms may tend to answer more often than commuters, etc.
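
To make that comparison concrete, here is a minimal back-of-the-envelope sketch (Python; the second-contact response rate is an assumption the question doesn't give us) of how many responses each option adds, and whether any of them come from the previously unreached group:

  first_round_rate = 90 / 120        # observed response rate, 75%
  second_contact_rate = 0.4          # ASSUMED: share of initial non-responders who reply to a nudge

  # Expected extra responses per option; comments note which come from the
  # previously unreached (initial non-responder) group.
  options = {
      "A: re-email the 30 non-responders": 30 * second_contact_rate,  # all from the unreached group
      "B: fresh sample of 120":            120 * first_round_rate,    # none from the unreached group
      "D: keep the 90 as-is":              0,
      "E: 30 new random students":         30 * first_round_rate,     # none from the unreached group
  }
  for option, gained in options.items():
      print(f"{option}: ~{gained:.0f} extra responses")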


I disagree, the answer ought to be whatever was planned in advance (before they saw the initial data). Once they've peeked at the data, it's not good to change the experimental procedure as is being suggested here.

The plan should have included some accommodation for non-response (like a reminder email, a suitable reward, etc), but if it didn't it's too late now. They should analyze the data they collected and apply lessons learned to the next survey.

A single reminder email is probably not a big source of bias, but something like increased rewards really could be. To make it ridiculous, if the school emailed the 30 non-responders and offered them $100,000 cash to fill out the survey, then asked "how do you feel about your school?" they will get very different responses than they got from the students who were offered a $5 bookstore gift card.


Depending on the value of "whatever was planned in advance" you could potentially do both.


Exactly, it's not about always doing X and never doing Y, but designing a procedure that best answers your research question (within the constraints of practicality) and reporting results in a way that they accurately reflect what you really did.


What really matters is if the bias of your experiment is understood/understandable by downstream consumers of your result. Question is poor in my opinion because it displays an ignorance of the basics of experiment consumption.


There's no right answer.

I suspect a few things are going on here.

1. The teacher forgot to include that the University wants 100 responses to the survey.

2. Missing answer C likely contains something like "Randomly select 14 more students to receive the survey"

That would make it an actual statistics question: you'd need to calculate the response rate from the original 120 and then compute the additional solicitations required to meet the response-count goal of 100.
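
If that reconstruction is right, the arithmetic would be something like the sketch below (Python; the 100-response goal and the "14 more students" wording are this comment's guesses, not part of the original question):

  import math

  responses, emailed, goal = 90, 120, 100
  rate = responses / emailed                     # 0.75 observed response rate
  still_needed = goal - responses                # 10 more responses wanted
  extra_invites = math.ceil(still_needed / rate) # ceil(10 / 0.75) = 14 additional students to email
  print(rate, still_needed, extra_invites)       # 0.75 10 14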


That still doesn't fix the problem with the question. Without knowing the goal of the survey (what it is trying to measure), you can't determine the (in)significance of the non-responses.

Each answer has a failing:

A - student didn't have much to say, answers tainted by student being annoyed and clicking through the survey to stop the spam

B - surveyor didn't like the answer they got, so they drew again. that's straightforward biased sampling

D - might not make for enough data, as you said

E - student body isn't stateless, and may have had time to talk about the survey. "Hey friends, let me know if any of you get asked to do that survey because I want to give them a piece of my mind"

Choosing between them is only possible by making some simplifying assumptions, which requires knowing what you're trying to find out.


I've enrolled in online college to get my undergrad degree.

Some of my courses were related to certificates that many people list on their resume.

The questions are very bad. Very very bad.

One example:

Which of the following considerations are most important when choosing a software solution:

A. System Requirements
B. Something else
C. Distribution Method / Software Installation
D. Something else

My answer is Distribution Method (ie, how do you install it, or is it available as a web service). After working in tech for 20 years, distribution is still the hardest problem, and the one that can take the most time, when you consider the total time by all stakeholders.

They said that is not the answer because there are software installation managers.

---

I understand why that certificate has the answer they gave, and I understand why the test is in multiple choice format; I just deeply disagree with each of those decisions. After taking that test, I feel like anyone with that cert will need to be retrained after hiring. I feel like it is worse than useless for many of its claimed benefits. (You do learn some about hardware and standards, which is nice)


This is likely the result of an overworked and underpaid academic being asked to quickly draft a few questions for the online exam tomorrow. It is definitely a sign of insufficient quality control.


Not underpaid. No experience in real life software engineering. Probably thinking of this ideal world where there is planning and then a waterfall implementation.


I think waterfall hits the problem on the head. The course and certification are the product of a waterfall process. Consider all the work to update the questions on the course: The course syllabus, training material, tests. There are processes for each of those, with stakeholders to sign off.

If you want to change a question or add a new priority, you might make one of your stakeholders angry because that was their favorite topic and they have strong opinions.

The rollout has to be planned so that people currently studying the material are not caught off guard when they take the test.

I have a lot of empathy for the people involved, but it's still a bad product.


If you send another note to the 30 that did not respond, they are no longer randomly selected, they have become another category that took two times to respond. In fact, once you choose to do anything but use the original sample you've altered your experiment.

On the other hand, it is a fucking preferences survey sent to students so do whatever...


If the population of students that respond after one survey request is different from the population of students that respond after two survey requests (or don't respond at all), then the original 90 responses aren't representative of the entire student population. "Altering your experiment" is a distraction, because the presumed goal is to learn about the entire student population.


In order to answer a question in the form of "what is the best approach" the student needs to know what they want to optimize for. There is no global optimization target, so without this the student needs to guess the underlying goal in order to say what is "the best".

Question is bad.


> A large university emailed a questionnaire to a randomly selected sample of 120 students in this year’s freshman class. ... Thirty of the students emailed did not respond to the questionnaire.

I don't answer questionnaires because they're almost always gamed by the sender to cheat me out of something -- or when I do provide answers they're always false in a way directly opposite of whatever I feel like.

For a direct example, the University sent questions and then asks other students to solve why some students didn't answer. I presume there's no field for a long-form answer for when none of the canned answers provide the nuance. That's a gamed questionnaire and provides zero honest value.

Other answers might be "the student's email address is invalid" or "the student is inundated with other duties" or "the student doesn't understand the questions" or "the student disagrees that any of the answers are valid".

The best plan of action is for the University to directly contact (in-person) the students and find out how to help the student to answer the question. It is, after all, in the business of helping students to educate themselves, right?


If you wander outside the walls of the teacher's imagination you are a criminal. They fool you into thinking you're learning let's say statistics, but you're really learning to obey, and to submit your memory and your time.


I frequently wonder what opportunities I've missed out on having never been to high school or college. On the other hand, I'm frequently glad to have never had to deal with that sort of B.S.


"Did not respond" is a legitimate result of an experiment and should be included when making statistics. Increasing sample size doesn't change this.

The question itself seems confused.


Did not respond is indeed a legitimate result, however (as the blog points out) if the non-responders differ from the responders then every evaluation you do on the responders will be biased.

For example, if you ask students about their satisfaction with teaching, I'd guess that students with a bad experience are more likely to reply to your survey. Based on the data you gathered you will think that the teaching at the uni is worse than it really is.
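
A toy simulation shows how big that distortion can get (a sketch with made-up satisfaction scores and made-up response probabilities; the numbers are assumptions, not data):

  import random
  random.seed(0)

  # Made-up population: satisfaction on a 1-5 scale, uniformly distributed
  population = [random.choice([1, 2, 3, 4, 5]) for _ in range(10_000)]

  # ASSUMED: unhappier students are more likely to fill in the survey
  respond_prob = {1: 0.9, 2: 0.8, 3: 0.6, 4: 0.4, 5: 0.3}
  respondents = [s for s in population if random.random() < respond_prob[s]]

  true_mean = sum(population) / len(population)
  observed_mean = sum(respondents) / len(respondents)
  print(f"true mean {true_mean:.2f}, observed mean {observed_mean:.2f}")  # observed comes out lower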


Yup! And the “right” way to handle that bias depends on a lot of background information/subject matter experience.

If the question were “Which musical acts do you want for the Spring Fling festival?”, it might be okay -- or even smart -- to ignore the non-responders. Including data from people unlikely to attend is probably unhelpful. If you’re asking about workloads or engagement, you certainly can’t assume that data is missing at random or the non-responses are irrelevant.

For teaching specifically, one of the smartest questions I’ve seen is “How well do you think you’re doing in this course?” The crosstabs can help address response bias.


>I'd guess that students with a bad experience are more likely to reply to your survey.

Or only those with strong feelings one way or the other answer.

Or those with strongly negative feelings fear that the survey isn't really anonymous and they worry about retribution.

A. is pretty much how this would be done in most real-world situations: make a second attempt to get people to answer, and then go with what you have, assuming you got some reasonable response rate -- which 75% probably is.


In fact, when you conduct surveys, it's very common to ask screener questions. Among likely voters, among IT decision makers, among developers, etc.

It's fairly clear in this case: undergraduate(?) students.

In general, surveys are trying to get statistics from a demographic that's interesting to the person doing the survey, such as buyers or influencers of purchase decisions for a given product.


Wanting to use data is not a valid reason to use data that isn't suitable. If you send out 120 surveys and get 90 back, you can't make assumptions about what those 30 would have said; you just have to present the data you have.


Eh, it’s trickier than just “go with what you’ve got.”

For example, you should be checking whether the response rate is associated with other factors and incorporate that into your analysis. You might find that you have pretty good data from unhappy students, but not satisfied ones, or vice versa.


I mean it is extremely common to mislead (often unintentionally and with the best motives) and use the performance of collecting data to give credence to that. The alternative is to be up front about your methodology, which means not making assumptions at multiple stages in the process, and not shading the conclusions by 'looking for other factors' or other things. When you do multiple rounds of 'fixing' data you are just injecting assumptions about the true distribution, which violates the entire point of collecting data at all. If you 'know' what the answer should look like, just write down that assumption and skip the extra steps, OR ensure the methodology will allow the data to prove you wrong, or allow the data to show a lack of a conclusion (including by lack of data).

I realize I'm taking a very harsh stance here, but I've seen again and again people 'fixing' data in multiple rounds, the effect of which is any actual insight is removed in favor of reinforcing the assumptions held before collecting data. When you do this at multiple steps in the process it becomes very hard to have a good intuition about whether you've done things that invalidate the conclusion (or the ability to draw any conclusion at all).


+1 to the confused question

What are we trying to test? That isn't clear so it's impossible to actually know how to follow up.

In this case "did not respond" could be perfectly sufficient, but we couldn't possibly know because all we know is:

"The questionnaire included topics on student life, academics, and athletics."


As an example, maybe you want to know if freshmen from different dorms had more or less satisfaction than average, but want to deal with the fact that different dorms had different response rates or could even have different reasons for not responding. The fear is that the data are not "Missing Completely at Random" (MCAR), meaning the missingness of the data is correlated with some of your predictors, e.g. freshmen who are too busy having a great time at one dorm, or students who are ultra-depressed at another dorm, didn't check their email.

One solution would be to impute your missing data taking into account what data you do have about the missing participants (e.g. maybe dorm, major, gender). In a Bayesian context you can include this imputation as part of the model fitting, which means uncertainty gets appropriately added to the results.

This is a good primer on how to handle this using the `brms` package in R:

https://cran.r-project.org/web/packages/brms/vignettes/brms_...
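
For readers who don't work in R, here is a deliberately crude Python illustration of the same concern (not the Bayesian imputation from the brms vignette, just group-wise mean imputation on made-up data, to show how missingness that varies by dorm can shift the overall estimate):

  # Made-up data: (dorm, satisfaction score, or None if the student never responded)
  data = [
      ("North", 4), ("North", 5), ("North", None), ("North", 4),
      ("South", 2), ("South", None), ("South", None), ("South", 3),
  ]

  observed = [s for _, s in data if s is not None]
  naive_mean = sum(observed) / len(observed)   # over-weights the dorm that responded more

  # Impute each missing value with the mean of its own dorm's observed responses
  dorm_means = {
      dorm: sum(s for d, s in data if d == dorm and s is not None)
            / sum(1 for d, s in data if d == dorm and s is not None)
      for dorm in {d for d, _ in data}
  }
  imputed = [s if s is not None else dorm_means[d] for d, s in data]
  adjusted_mean = sum(imputed) / len(imputed)

  print(f"naive {naive_mean:.2f}, dorm-adjusted {adjusted_mean:.2f}")

Single mean imputation like this understates uncertainty, which is exactly why the comment above suggests folding the imputation into the Bayesian model fit instead.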


Could this be something where contacting non-responders or sending out additional questionnaires could skew the results? Maybe non-response is considered part of a poll and by fighting against that you would somehow be putting your thumb on the scale?

I'm not an expert either in statistics or in polling; so I'm just speculating.


> I'm not an expert either in statistics or in polling

Fortunately for us, the author is :)

The author has an outline available for a course on survey sampling for those interested in references and topics to learn more. http://www.stat.columbia.edu/~gelman/surveys.course/surveys_...


It certainly could and depends on the questions being asked. If you are sending out a questionnaire via email and some of the questions are along the lines of "how many times a day do you check your email?" or "how much time do you spend at your computer?" you could certainly get some skewed results.


Of course, it would not be a random sample if 75% get one questionnaire and 25% get two questionnaires and encouragement.


Probably close enough, and potentially closer than just sending out one request.


Perhaps:

* Take the first 90 results.

* Send the email, wait some specified time (Perhaps 75th percentile of time of how long it took to get the first 90 results)

* Apply Bayes Theorem if there are new replies

and move on.

Alternatively, having someone go physically TALK to students in quads and cafeterias will probably generate a higher quality of data. I'm guilty of it myself, but sometimes after a negative experience, I'm apt to just leave a 1-star review when it should really be a 3 or 4, out of spite. I think the internet has the inhibition-removing quality that encourages these behaviors, whereas if someone outside the store or whatever just asked me "Hey, how was your experience?" I'd have probably said, "Oh, it was alright, just wish they'd had the sauce I always order, I had to get something different."

But anyhow. It's probably C.


Tests and homework aren't written for someone who is just going in cold; you're supposed to take the class and read the material.

Since there's no way to judge the required sample size based on the information, it's extremely likely that option A was something specifically discussed.


That might make sense in a different case, but the actual question doesn't have enough context to answer it.

"What is the best plan of action for the university?"

"Best" meaning what?


It is a ridiculous and intellectually embarrassing question. I just picture what someone like Feynman would think of a question like this.

I wouldn't be shocked if this is a higher-order effect of too much influence from certain 20th-century French philosophers. We can't even make test questions that aren't bullshit.

There literally should be a "none of the above" choice because of the lack of information in the question.


Eh. As someone who has been involved with survey work--both giving and receiving--standard practice in this situation would probably be A and it wouldn't shock me if it had been covered in class. D is not wrong and might be the right answer if, e.g., emailing out a survey to a broader mailing list of people who don't have a close affiliation with you--i.e. you don't want to pester a general population. However, A is pretty much what is done for employee satisfaction surveys and things of that type.


Is it a test on the trivia of typical survey conduct, or a question to probe understanding of statistics? If it's a trivia question then ask it like "What is the most likely way a survey would follow up in the event of ...".


I think Feynman would be pleased with the question. He was well grounded in the real world of doing science with incomplete datasets. For researchers who gather data with surveys, this is a frequently encountered problem, so it is legitimate to discuss it in a statistics course.

If there is a problem, it is the multiple choice format. This issue of handling incomplete datasets would make a good essay question.

That said, it is perfectly reasonable to choose D and work with the information you have. That is what real-world clinical studies do when they report the number of people who dropped out of a study.

It is also reasonable to choose A and attempt to get better coverage. That is what real world election pollsters do.


It's a homework question, not a test question.

The philosophical part might be the point


Philosophy of X is how we talk about X, the terminology and the state of our understanding. Philosophy is a huge part of learning, but we usually don't call it philosophy.


That would make sense if there was a freeform "why did you give this answer" section. As a multiple-choice question, there's nothing you can infer about the answer.


Or maybe it was a control question and the teacher is going to treat the spread of results as their own little survey...


But this obviously isn't a math class, which is the point. I think this post should be read in the context of "make students take statistics instead of calculus", then students will get questions like this instead of studying a topic with clear and unambiguous questions.


Calculus is full of ambiguity. Not in a mathematical sense, but as someone who has recently TA'd a calculus class, I can confidently assert that no mathematics happens there. Mathematics is only unambiguous insofar as you can check your own work, and none of those students would have been capable of that. They're just memorizing rules with many caveats that are unintuitive either out of an absurd desire to restrict the curriculum, or because they are natural in the context of their proofs, which students are basically forbidden from seeing.

Universities have decided that making calculus as capricious as possible serves their interests of having a weeder class.


In what way is a statistics class not a math class?


There is math statistics and non-math statistics. This question is a non-math statistics question.


It's essentially a mechanics of conducting surveys question.


But how does that speak to the nature of the entire class?


Questions like this, where you must have attended the class to know the answer, are often introduced to penalise no-shows. It can also expose suspected contract cheating in online exams.

However, it is quite risky, since it probably wouldn’t stand in court, e.g. when a failed student sues the school.


This is the kind of thing that is obvious for people who are good test takers. When you know the context of the question, then you know exactly what answer the instructor is looking for.


> Stepping back, it’s interesting to see a homework question where there’s no unambiguously correct answer. This is sending a message that statistics is more like social science than like math, which could be valuable in itself—as long as it is made clear to students and teachers that this is the case.

I think I agree with this sentiment, but not the specific wording if taken literally. There is one "unambiguously correct answer", but it happens to be problematic in practice. The others are all "unambiguously incorrect", though possibly usable for a weaker result. Also, statistics is much more like math than social science: it's important to be precise and correct rather than rely on qualitative, hand-wavy reasoning.


I think statistics is the adapter between the social sciences and math, rather than being part of or similar to either.


Statistics can be that certainly but it's entirely optional. Analyzing JWST data uses statistics without the social science interpretive aspect.


Here is my answer:

The expected response rate of this year's freshman class to the questionnaire is 75 % (90 out of 120).

Option B, "Start over with a new sample of 120 students from this year’s freshman class." is on average not improving the situation, because we must assume that the most likely outcome is again a response rate of 75 %.

Option D, "Use the 90 questionnaires that were submitted as the final sample." is, of course, also not improving the situation.

Option E, "Randomly choose 30 more students from this year’s freshman class and email them the questionnaire." would give us on average 30*0.75=22.5 more returned questionnaires, which would improve the representativeness of the survey in comparison to Options B and D.

Option A, "Send another email to those 30 students who did not respond encouraging them to complete the questionnaire." is the most interesting one. If we assume that the missing data from these 30 people is not random, in other words that there is a correlation between not responding and some of the answers to the questions, it would be important to attempt to recover as much of this missing data as possible in a second call. The unknown figure is the response rate to such a second call.[1] But even if it is below 75 %, it could increase the representativeness of the survey more than Option E, because it would be the only chance to represent first-time non-respondents in the survey. However, we must also assume that the conscientiousness of the answers does not differ between first- and second-time respondents. Under these assumptions, I would choose Option A.

[1] This is why we cannot really be sure that Option E would not be better in a specific case. If the second-time response rate is 0 %, Option E would trivially be better; if the second-time response rate is 75 % or more, Option A would be clearly better. Somewhere in between is the sweet spot where the advantage in terms of representativeness switches from Option E to Option A. Where exactly this spot lies depends on the unknown parameter of how much the answers of first-time respondents and second-time respondents actually differ.
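
As a rough way to explore footnote [1], here is a small sketch (every number is an assumption: the gap between responders' and non-responders' true means, and the second-call response rate r; it also ignores variance, which is part of why the crossover point is fuzzy) comparing how far the resulting estimate sits from the true population mean under Option A versus Option E:

  # ASSUMED ground truth: responders and non-responders genuinely differ
  mean_responders, mean_nonresponders = 3.8, 3.0
  share_nonresponders = 30 / 120
  true_mean = (1 - share_nonresponders) * mean_responders + share_nonresponders * mean_nonresponders

  def estimate_option_a(r):
      # 90 first-call responders plus r * 30 second-call responders (drawn from the non-responder group)
      n_first, n_second = 90, 30 * r
      return (n_first * mean_responders + n_second * mean_nonresponders) / (n_first + n_second)

  # Option E adds ~22.5 responses, but all from responder types, so its expected
  # estimate stays at mean_responders no matter how many extra emails go out.
  estimate_option_e = mean_responders

  for r in (0.0, 0.25, 0.5, 0.75, 1.0):
      print(f"r={r:.2f}  A={estimate_option_a(r):.3f}  E={estimate_option_e:.3f}  truth={true_mean:.3f}")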


I would think "A" is a good way to proceed. This might be a silly question, but I wonder how it affects us if we send this reminder message to everyone versus just the people in our sample who we know have not responded yet. I know that maybe some of the people who did previously respond might get confused and possibly try to respond again. I would be interested in seeing if they update their answers or if they stay consistent. Any statistical insights on this?

EDIT: I thought about this some more and considered that sending out a reminder message to everyone may be akin to starting a new survey using a non-random sample. But by reaching out to just nonrespondents, maybe it’s okay.


I got an undergraduate degree in statistics, and what we were taught was that there are lots of valid ways to sample but how you sample affects how you can generalize the results.

Since this is a college survey and not an FDA trial, it's probably ok to proceed with D or E in addition to A (B is the only bad answer). That said, there most certainly is a correlation between non-response and lower scores on the survey, so when presenting findings it's important to flag that concern. In general, surveys are a pretty weak means of drawing conclusions, but sometimes they're the only means available.


FWIW, ChatGPT answered this in one go:

"It is best for the university to use the 90 questionnaires that were submitted as the final sample. Since the university selected a random sample of 120 students and only 90 responded, those 90 responses provide a representative sample of the opinions of this year’s freshman class. Starting over with a new sample or adding 30 more students to the sample could introduce bias into the results. Additionally, sending another email to the 30 students who did not respond may not be effective in increasing the response rate."


Not for me:

"Of the options provided, the best option would be to send another email to the 30 students who did not respond to the initial questionnaire, encouraging them to complete it. This would allow the university to increase the response rate and collect more comprehensive data. Starting over with a new sample of 120 students would be time-consuming and may not necessarily produce better results, since there is no guarantee that the new sample would have a higher response rate. Using the 90 questionnaires that were submitted as the final sample would not provide a representative sample of the entire freshman class, and choosing 30 more students at random and emailing them the questionnaire would not address the low response rate from the initial sample."


Convincing, confident and opposing bullshit arguments, this thing should be in politics


Ok now put $1 in the GPT jar


This is a great high-school-level question to test basic understanding of selection bias. A is the only answer that acknowledges that responses to web surveys have inevitable self-selection bias.


F. None of the above

That's definitely the correct answer.

It might have been C: (C. Deal with the 120 data you have already as is) but since it has been deleted we don't know.


What's the difference between your C and option D (use the 90 samples)?

If I got 120 rows, where 30 was NULL, I'd throw those 30 away.


Those 30 are not NULL, they're "chose not to answer", usually classified as "don't know / won't answer"


I don’t think any of the answers there are unambiguously correct.

Did you even need 120 responses in the first place? Did you truly expect every single student to respond to an optional survey? How did you train 75% of your student population to respond to a survey in the first place? 10% seems like a more reasonable estimate.

The correct answer is to account for non-response in the first place, but I guess that was option C.


There is often attrition when doing surveys. If the correct answer is always A, then that implies there can be no attrition when completing the survey. Since the list of topics asked about doesn't seem to correlate with non-responses (unlike, say, a survey about sexual harassment, where non-responses may be very important), the correct answer would likely be D, to discard the subpopulation.


There is no correct answer (though there could be incorrect answers).

Perhaps you want to investigate why the 30 students didn't submit the questionnaire. In the name of the study's sanctity, you'd best convince them to complete it. But you must not coerce them, or else the results would be tainted. So you invent a time machine...

OK, I'm done.


I think another option would be to determine whether those who didn't respond to the questionnaire belong to a certain type of student, for example the type of person who doesn't participate in social activities. Either way, it is important to know that some part of the population is different from the rest.


I'd pick D because it's the only answer that requires not doing any additional work.


I don't think they actually want you to respond as if you work there. I think most students probably know that. Most likely this question was in a unit about certain types of statistical biases. A is very clearly the correct answer.


> What is the best plan of action for the university?

Stop spamming the students. :p


> Stepping back, it’s interesting to see a homework question where there’s no unambiguously correct answer.

Yeah, until the teacher tells you that you chose the _wrong_ answer.


I suppose it depends on the questionnaire that was sent out. Sounds like survivor bias might come into play. Was the questionnaire about dying from a university buffet?


Maybe this was a "Poll" type multiple choice question, given that it's online? Could be used as a jumping-off point for discussions next class.


The answer is C. But we don't know what it is, so we are trolled into debating which of the remaining answers is most likely correct. Answer: none.


Am pretty sure a better answer than any of these would be:

Randomly choose *40* more students from this year’s freshman class and email them the questionnaire.


This sounds like a good question for a political pollster. How do they evaluate all the people who don't answer the phone?


Real life statistics would use what he/she said. Adjust the response using the population statistics. Qed.


The problem with E is that it creates a biased sample of people who are willing to respond.


Why is this front page worthy? This is the academic equivalent of a shitpost imo.


I agree; ambiguous or poorly written questions are nothing new in high school academics. I suppose it's an interesting topic to argue over, but this question isn't anything special.


I think the best answer is "B. Start over with a new sample of 120 students from this year's freshman class.", given that you can still use the 90 answers you already have (I don't see why not). This would give the largest sample size.


If the questionnaire's goal is to discover "Why don't students respond?", then you could repeatedly take random samples of 120 students until, by chance, they all respond

You'd then have a lot of student data, but it is completely skewed

Solution is to stop doing statistics


This seems like it introduces another flavor of selection bias. You're still selecting a sample composed of people who respond and ignoring those who don't. I don't have a better answer, though.


The questionnaire is about "student life, academics, and athletics".


So "Are students academic enough to understand this questionnaire, or wheelchair bound to stop them posting responses?"

The skew still stands

A meta point is it is probably impossible to remove bias

EDIT: sorry, didn't see you're the person I replied to. I think you made a very good choice -- in the face of the unknown, increasing sample size and attempting to make it unbiased is possibly the best step


Given how surveys work in the real world, D would be done.


> This is sending a message that statistics is more like social science than like math

I think the real issue at hand is this question belongs in a sociology class, not a statistics class.


- "I think I’ll choose C."


The best course of action is to stop selling overpriced textbooks.


ChatGPT says:

  The best plan of action for the university is to use the 90 questionnaires that were submitted as the final sample. This is because the sample of 120 students was randomly selected, which means that the 90 students who did respond to the questionnaire are representative of the entire population of students in this year's freshman class. By using the responses from these 90 students, the university will be able to obtain accurate and reliable results from the questionnaire.


Why should we care what ChatGPT says?


ChatGPT is completely incorrect, almost like it has no actual insight into the question... The 90 that did respond may or may not be representative of the rest of the students.

Example:

I send out a questionnaire about post-incarceration plans to 120 ex-inmates. I get 90 responses that are very positive, and conclude that 100% of inmates have positive outcomes after prison. In reality, the 30 ex-inmates didn't respond because they had committed suicide, been murdered, or were back in prison.


Extreme differences between those who answer a survey and those who don't are obviously always an area of concern. That said, under most circumstances the alternative is either to not bother with the data at all (or do some qualitative research, e.g. interviews), or to go with what you have and discuss potential areas of bias.


The topic for this questionnaire is "student life, academics, athletics", which is an important part of the question when determining if non-responses are important. In this case ChatGPT gets it correct, but explains the reason very poorly.


No it is not correct.

"the 90 students who did respond to the questionnaire are representative of the entire population"

How can anyone know this without knowing the size of the freshman class? For that matter, how does anyone know that 120 would have been a good number?


The more people you survey, the higher the confidence that the sample is representative of the entire population -- in other words, the narrower the confidence interval. Since the population was randomly sampled and the non-responses don't correlate with the topic questions, you can assume that the 90 students are representative of the entire population, to within some confidence interval.
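
For a sense of scale, a minimal sketch of that margin (assuming a yes/no question, a large freshman class so the finite-population correction is negligible, and no non-response bias, which is a big if):

  import math

  n = 90       # responses in hand
  p = 0.5      # worst case for a yes/no question
  z = 1.96     # ~95% confidence
  margin = z * math.sqrt(p * (1 - p) / n)
  print(f"+/- {margin:.1%}")  # roughly +/- 10 percentage points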


Yes, but we know neither the size of the population nor the confidence interval had the full 120 students responded.

Secondly, non-responses may actually correlate to the topic questions! For all anyone knows, students may not respond because

- too busy studying

- too busy partying

- too busy with intramural sports

- too disillusioned, feeling that it will not make a difference.

Etc


So, option A :)


Option D



