Hacker News
Surprisingly popular (wikipedia.org)
241 points by tosh 3 months ago | 87 comments

The capital is not Philadelphia, it's Harrisburg. So the idea must be that most people think they are right and others will agree with them, but there are people who get it right and know that others get it wrong, and they are responsible for the difference in scores.

There are four groups of people:

A - "Philadelphia is the capital, and others will agree."

B - "Philadelphia is the capital, but most others won't know that".

C - "Harrisburg is the capital, and others will agree."

D - "Harrisburg is the capital, but most others won't know that."

This technique eliminates groups A and C from consideration, and measures the difference in size between groups B and D.
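To make the cancellation concrete, here is a minimal sketch with made-up group sizes (the numbers are assumptions for illustration, not from the article or the paper). It assumes the Wikipedia-style second question, where each respondent picks the answer they expect the majority to give.

```python
# Hypothetical group sizes (made-up numbers, for illustration only).
groups = {
    "A": 55,  # answers Philadelphia, expects most others to agree
    "B": 5,   # answers Philadelphia, expects most others to say Harrisburg
    "C": 10,  # answers Harrisburg, expects most others to agree
    "D": 30,  # answers Harrisburg, expects most others to say Philadelphia
}


def surprise_margins(g):
    """Return (actual share - predicted share) per answer, in percentage points."""
    total = sum(g.values())
    # Question 1: what each respondent answers themselves.
    actual = {"Philadelphia": g["A"] + g["B"], "Harrisburg": g["C"] + g["D"]}
    # Question 2: which answer each respondent expects the majority to give.
    predicted = {"Philadelphia": g["A"] + g["D"], "Harrisburg": g["C"] + g["B"]}
    return {city: 100 * (actual[city] - predicted[city]) / total for city in actual}


margins = surprise_margins(groups)
print(margins)
```

A and C appear on both sides of each subtraction and drop out, so the Harrisburg margin is just (D - B) / total. With these numbers that is 25 points in favor of Harrisburg.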

Both groups B and D think they know something other people don't, but B is wrong and D is right.

In cases where people feel like they have "inside" knowledge, generally speaking it's because they are correct and knowledgeable (group D), not because they are misled (group B).

You often arrive in group D by virtue of having been in group A and then learning the actual truth of the matter.

You can arrive in group B by falling prey to a conspiracy theory, which makes this technique perhaps invalid in such cases? I wouldn't be surprised if the question "Do vaccines cause autism?" had a surprisingly popular answer of "Yes".

You should consider adding this breakdown to the wikipedia article

This is a great summary / reduction of the article. I appreciated how you broke it down into groups and made the example more concrete.

It's not a summary/reduction, it's a better article with more information.

Wow, thank you!! :)

Thanks :)

if you don't post this to Wikipedia, may someone else post it on your behalf (i.e. do you accept Wikipedia's license on your words above?)

I posted it. Thanks :)

>conspiracy theories

I can't say what such a poll would say, but I feel fairly confident that using any sort of "surprising popularity" measure is no guarantee of Truth, but only an excuse after the fact to explain a result.

As you say, for some question the "surprisingly popular" answer could just as easily be dead wrong.

Really great explanation. This is more a way of finding an opinionated (smug?) minority who believe they know better than the crowd.

Sometimes experts could be right, but other times wrong. Vaccines/climate change/current politics could all have a strong effect here and still be wrong.

This comment thread is confusing because HN users have edited the article several times today.


That is indeed exactly the underpinning of the technique, not that you can tell from the wiki article.

It's hinted at, but you really have to latch on to this statement at the beginning.

>is a wisdom of the crowd technique that taps into the expert minority opinion within a crowd.

For what it's worth I found the Wikipedia article clearer.

The article should state more clearly that the answer is no, as it made no sense to me before reading your comment. (Edit: it's fixed now)

Thanks, I read the article multiple times and couldn't really understand why someone would give the wrong answer and then assume that most people would give the right answer (I didn't know Pennsylvania wasn't the capital of Philadelphia).

> I didn't know Pennsylvania wasn't the capital of Philadelphia

Yes you did. :)

You know what I meant :(

What about people who aren't sure?

If you give an answer but it's a guess, you'd be less likely to say people agree with you.

Assuming a binary question (very important): if you aren't sure, and you think most people think differently from you, why would you hold on to a belief that lacks both primary and secondary support?

"My best guess is X but I believe most people would contradict me" suggests that you consider yourself an expert who knows better than most people.

The problem with this technique is that it is useless.

We live in cultural bubbles, tending to cluster close to people like us (people who speak the same language, with similar points of view, economic levels, culturally engraved answers for solving the same problems, eating the same food, cooked in the same way, exposed to the same advertising, popular songs, sports teams and ideological propaganda). People make most of their friends in the same school while being taught the same things by the same teacher.

Therefore, we are notoriously bad at guessing what other people think... outside our cultural bubble.

If you ask people questions like "Will your neighbors enjoy a pig meat barbecue?", "Are these mushrooms poisonous?" or "Is it OK to put soap directly in your bath water?" they can provide a reliable answer, but the results can't be extrapolated outside the cluster, so they are useless for finding the truth. All cultures choose to ignore big chunks of human knowledge or do some things plain wrong. A poisonous mushroom can be made edible by an obscure, culturally transmitted cooking technique. Many Jews will dislike pig meat. Many Japanese will not find it OK to put soap in the bath. The majority of people on the planet don't know and will not care what the capital of Guangdong, Entre Ríos, or Philadelphia is.

If your starting variables are unconstrained (the people answering the question are anonymous and from an unknown pool), the technique is a sociological die that will return different results each time (AKA pseudoscience).

On the other hand, how would you communicate uncertainty in the context of a binary question?

The way I understand it, it's effectively a good way to identify trick questions: the people who correctly identify the "trick" (by saying that other people would fall for it and reply differently) are more likely to be right overall, and that's reflected in the percentage difference.

Wouldn't you get the same result simply by saying "let's just consider the answers from people who said others would respond differently because they're more likely to be right"? That's more intuitive to me and I believe it's effectively the same operation.

Yeah, it's a bit like "second order" wisdom of the crowds.

First order: "How many jelly beans are in this jar?"

Second order: "What percentage of people will get the wrong answer to this question about Pennsylvania's capital?"

There's also the interplay with the Keynesian beauty contest: https://en.wikipedia.org/wiki/Keynesian_beauty_contest

In some sense the stock market is an infinite regress of these "what will other people think about value..." questions.

Shares do actually have a fundamental value, based on their lifetime dividends, buybacks and liquidation. So unlike, say, Bitcoin or the Dollar, there is something to be gained from knowing the zeroth-order reality.

Historically most shares did have a fundamental value.

When Buffett was starting out in the 1950s-1960s, shareholders had stronger rights and many shares did pay dividends. The worst-case scenario would be gaining a controlling share and forcing some sort of payback.

Today there are many classes of shares which are disturbingly close to buying random cryptocoin.

Think about buying non-voting shares of Google, Facebook, Zynga, etc. Or how about buying whatever that travesty of an Alibaba ADR is? What are you getting when you buy Alibaba in the US?

What are you really getting when you buy those kinds of shares?

You have very little hope of getting dividends.

You have no hope of getting controlling shareholders to give up their controlling "some animals are more equal" shares.

So there is very little intrinsic value in those.

Thus the only value is the "castles in the air" valuation.

You've raised a very good point, and that's something I spend a lot of time thinking about. Zynga is a great example because there's a serious chance that they could go out of business before paying a single penny to shareholders.

However, for those with fear of that situation, there are saner equities on the market.

Disagree. Think about what makes a share liquid: lots of potential buyers (i.e. people perceiving it as a safe investment).

If you were offering a long period of healthy dividends for a good price, I wouldn't care if I could sell it.

Sure, but where does the amount of dividends you're being given come from? All I'm saying is there's no objective "baseline", it's all perceived value no matter how you look at it.

> Think about what makes a share liquid: lots of potential buyers

You don't need lots of potential buyers for liquidity, only one for what you're trying to sell. It's why very low volume secondary markets can function and provide liquidity.

A lot of people will also obviously invest into things even when they do not consider them safe investments. If the potential return is thought to be high enough, large numbers of people will routinely invest into unsafe investments with full awareness that it's at least somewhat dangerous (whether stocks, crypto, real-estate, schemes, etc).

I get what you're trying to say, but having multiple potential buyers is literally the definition of liquidity. If you only have one buyer for what you're trying to sell, then it's very easy for someone else to sell their share and boom, you can't get rid of yours. These secondary markets work because there's a small number of buyers AND sellers, but if there happened to be a sudden surge in sellers, the low liquidity of the market would eventually show itself.

It's an interesting heuristic, but without an exposition it's hard to know if there is any reason to believe it. Has this technique been assessed across various domains? Does it work better for political facts than for scientific facts? Where does it fail? It's a provocative article and a cool thought, but it seems more like a hypothesis.

(In particular, I'm referring to the assertion at the end of the article: "Because of the relatively high margin of 10%, there can be high confidence that the correct answer is No.")

One of the wiki's sources might have been a better linked article.

The MIT summary [1] notes "The researchers first derived their result mathematically, then assessed how it works in practice, through surveys spanning a range of subjects, including U.S. state capitols, general knowledge, medical diagnoses by dermatologists, and art auction estimates." Across all those areas, this technique had error rates about 20% lower than other competing techniques. Those techniques ranged from simple majority vote to two different kinds of confidence-weighted scoring.

The paper earned a prestigious publication in Nature.

1: http://news.mit.edu/2017/algorithm-better-wisdom-crowds-0125

Awesome, thank you. And 'nacc found the link to the Nature article as well: https://news.ycombinator.com/item?id=20547787

One interesting thing about the paper is that it seems that the Wikipedia article incorrectly describes the procedure: respondents were not asked to guess whether the majority would agree with their position. They were asked to guess what per cent of other respondents would agree. I think that's a pretty severe difference in the method.
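A rough sketch of what the percentage-based variant could look like for a yes/no question. Two assumptions of mine, not taken from the paper: the predictions are combined with a plain arithmetic mean, and the responses are invented.

```python
from statistics import mean

# Hypothetical responses: (own answer, predicted % of respondents answering "yes").
responses = [
    ("yes", 90), ("yes", 80), ("yes", 85),
    ("no", 75), ("no", 70),
]


def surprisingly_popular(responses):
    """Return (winner, actual % yes, mean predicted % yes) for a yes/no question."""
    votes_yes = sum(1 for answer, _ in responses if answer == "yes")
    actual_yes = 100 * votes_yes / len(responses)
    # Assumption: average the individual percentage predictions.
    predicted_yes = mean(prediction for _, prediction in responses)
    # "yes" wins only if it is more popular than predicted; an exact tie
    # falls through to "no" in this simplified sketch.
    winner = "yes" if actual_yes > predicted_yes else "no"
    return winner, actual_yes, predicted_yes


winner, actual_yes, predicted_yes = surprisingly_popular(responses)
print(winner, actual_yes, predicted_yes)
```

Here "yes" is the majority answer (60%), but respondents expected it to get 80% on average, so "no" is the surprisingly popular answer.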

Digging down a bit this method is published in Nature! https://www.nature.com/articles/nature21054

Looks like an idea for a semi-supervised ensemble method for machine learning:

Prepare two equally sized ensembles of classifiers, let's call them A and B.

1. Train each classifier in ensemble A on labelled data to predict whether a picture contains a cat.

2. Take some other unlabelled dataset and collect answers from classifiers from A for each picture from this dataset.

3. Train each classifier in ensemble B to predict the average answer of classifiers from A for each picture from the unlabelled dataset.

Then for a picture from the test dataset it would be possible to get answers from ensemble A and from ensemble B and calculate what would be the surprisingly popular answer.
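The final combination step might look something like this. It is only a sketch: the function name and the inputs are hypothetical, training of both ensembles is left out, and whether this beats plain voting is untested.

```python
def surprisingly_popular_label(votes_a, predictions_b):
    """Combine two ensembles' outputs for a single picture.

    votes_a: 0/1 votes from ensemble A (1 means "cat").
    predictions_b: ensemble B's estimates of A's mean vote, each in [0, 1].
    """
    actual = sum(votes_a) / len(votes_a)
    predicted = sum(predictions_b) / len(predictions_b)
    # "cat" is surprisingly popular if A votes for it more often than B expected.
    return "cat" if actual > predicted else "not cat"


# Hypothetical outputs for one test picture.
votes_a = [1, 0, 0, 1, 0]
predictions_b = [0.2, 0.3, 0.25, 0.2, 0.3]
label = surprisingly_popular_label(votes_a, predictions_b)
print(label)
```

In this made-up case only 40% of A votes "cat", so a majority vote would say "not cat", but B expected about 25%, so the surprisingly popular label is "cat".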

Please do this.

Let's look at another toy example, using made-up numbers:

Q: Is the earth flat?

    Yes: 10%
    No:  90%
Q: What do you think most people will respond to that question?

    Yes:  3%
    No:  97%

    Yes: 10% -  3% =  7%
    No:  90% - 97% = -7%

The earth is flat, I guess ;)

A more realistic example of that would be:

Q. Is the earth flat?

    Yes: 0.1%
    No: 99.9%
Q: What do you think most people will respond to that question?

    Yes: 10%
    No: 90%
Analysis:

    Yes:  0.1% - 10% = -9.9%
    No:  99.9% - 90% =  9.9%

Conclusion: the earth is not flat.
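For anyone who wants to plug in their own made-up numbers, the arithmetic in both toy examples reduces to one subtraction per answer. A small sketch (the function name is mine):

```python
def sp_margins(actual_yes, predicted_yes):
    """Margins in percentage points; both inputs are the "yes" percentages."""
    return {
        "yes": round(actual_yes - predicted_yes, 1),
        "no": round((100 - actual_yes) - (100 - predicted_yes), 1),
    }


first = sp_margins(10, 3)     # 10% answer yes, 3% expect yes to be most popular
second = sp_margins(0.1, 10)  # 0.1% answer yes, 10% expect yes to be most popular
print(first)
print(second)
```

The first call gives +7 for "yes" and the second gives +9.9 for "no", matching the two hand calculations. For a binary question the two margins are always equal and opposite, so only the sign of one of them matters.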

Not sure where you're getting the "10% believe the earth is flat" from, but if that's just your guess then it's pretty reasonable for people to say they think 10% of people will respond yes to the question.

No, his number is right.

It's not about exactly what the percentage of flat-Earthers would be (be it 1% or 0.1%), it's about which question gets the higher "no" percentage.

And it should be Q2, because:

Normal people will answer "No" to both questions.

Flat-Earthers will answer "Yes" to the first question. Most of them would, however, answer "No" to the second, because they're well aware they are the minority on this topic (think "wake up sheeple!").

If there are only 0.1% of flat earthers, why would 10% of people think it to be the most popular answer? That does not seem realistic at all.

Or did you interpret the question as suggested by bena in his sibling answer, ie as an estimate for the answer distribution (that would then have to be averaged) instead of a choice of the most popular answer (as was done in the wiki article)?

>That does not seem realistic at all.

That does seem realistic considering how often the topic of flat-earthers comes up in discussions on social media, despite it being an extremely niche view.

OP suggested that 10% of people think there are more than 50% of flat-earthers when the real number is 0.1%. Phrased another way, 10% of people expect half the people they meet will be flat-earthers when it's 1 in 1000.

Doesn't seem realistic to me.

> Not sure where you're getting the "10% believe the earth is flat" from

I'm not sure that "yes" implies that the person asked believes the Earth is flat. I would probably be confused if someone asked me such a dumb question (why?), and I might answer "yes" to make the situation obscure for them too, not just for me.

> Not sure where you're getting the "10% believe the earth is flat" from,

I think OKCupid got numbers like this for the question, or for something like "is the Earth bigger than the Sun". I attribute that to the smartass constant, or rather to the fact that dumb questions elicit dumb responses.

Works with creationism then.

But, interestingly, there may be a symmetry between people who overestimate the amount of creationists and creationists who underestimate their numbers.

So maybe that technique is actually valid. It spots an asymmetry in people who think the majority is wrong on something.

The second question does not ask for an estimate of prevalence, but to pick the answer you think most popular. You can over-estimate the amounts as much as you like as long as the majority choice remains the same, ie it would not matter whether you think there are 0.1% or 49.9% of creationists...

Given how 10% of people responded "yes" and so few thought people would respond "yes," I would agree that "yes" is in fact surprisingly popular.

The article suggests "surprisingly popular"==correct answers from experts.

It could just as easily be crazy answers from crazy people.

We cannot evaluate that claim based on hypothetical figures. 10% claiming the Earth was flat would be a very biased sample in the real world. It would more likely be 98% and 99%, resulting in only 1% surprisingly popular.

I would argue that the test is non-informative for controversial questions of any kind. These are cases where

* each side believes that those who disagree are wrong, not just mistaken in their remembrance of a fact.

* each side is well-informed about which view is more popular

The fact that everyone is willing to stick to their answer despite everyone knowing what "most people" think means that an answer's being surprisingly popular doesn't bear either way on its being right. The test is ideal for simple questions of fact which are not disputed.

I think this is a fair criticism, though you haven't explained it well: a minority expert might be a misled minority.

That was left as an exercise for the reader :p

As you said, knowingly going against the consensus does not an expert make.

This also shows my problem with the way the concept is presented.

The correct answer to the second question is "No". That is what most people would say. So technically, if we saw Yes:0, No:100, that would just mean everyone knows what the popular choice is.

What they want to ask is "How many people would say the answer is "No"?" Then if the answers are far afield, we can say whether or not an answer is surprisingly popular.

So in the article and with your example, the second question isn't actually a reflection of the proportions of the first. They're kind of disconnected.

Q: Is the earth flat? -> "Not controversial enough..." (nowadays, in 2019) ie. it does not fit the "surprisingly popular" definition because it isn't popular. There's just a few loudmouths who make it seem like that (combined with a bunch of jokers who joke about it). Hence you don't get many people who will argue most people will respond Yes. You could regard that as a flaw, I see it as a feature.

Yes, it's definitely vulnerable to situations where a large number of people have been "seeded" with the specific belief that they are part of a minority that holds the correct answer.

Still, the article says this:

    Because of the relatively high margin of 10%, there can 
    be high confidence that the [surprisingly popular answer 
    is correct]
I would maintain this holds true, even though we can name some easy counterexamples like flat-earthism.

One, I'm sure that nowhere near 10% of an educated population believes in flat-earthism.

Two, I think this technique is certain to yield the correct answer quite a bit more often than it will yield an incorrect one. Since it claims to achieve "high confidence" and not "absolute confidence" I think it would still be a pretty valuable metric in many instances.

You could argue that “yes” is indeed a surprisingly popular answer to a question that should have 0 yes’s. However, the 7% margin probably isn’t considered significant enough to draw a conclusion on the real answer.

Good example.

Guess this technique may also detect facts that are involved in common conspiracy.

I’m not sure what to think of a Wikipedia article without any sections on Uses, and the only Example being unreferenced.

Notice the relationships between Dempster–Shafer theory -- AKA the theory of belief functions / evidence theory -- for reasoning with uncertainty...


How does this play out in the stock market?

In the cases of Brexit and the 2016 US Presidential election, betting markets reported a majority of money on Remain and Clinton with a majority of bets on Leave and Trump.

Perhaps the number of bets is analogous to what people believe, while total money bet is analogous to what people think others believe. If so, the surprisingly popular answer is the one with the smallest average bet.
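A quick sketch of that analogy with invented market figures (nothing here is real betting data). It treats the share of bets as the "own answer" side and the share of money as the "predicted" side, which is the assumption made above, not an established method.

```python
# Made-up market: most of the money on Remain, most of the bets on Leave.
market = {
    "Remain": {"money": 700_000, "bets": 4_000},
    "Leave": {"money": 300_000, "bets": 6_000},
}


def shares(market, key):
    """Fraction of the total (by money or by bets) on each outcome."""
    total = sum(side[key] for side in market.values())
    return {name: side[key] / total for name, side in market.items()}


bet_share = shares(market, "bets")     # proxy for "what people believe"
money_share = shares(market, "money")  # proxy for "what people think others believe"

margin = {name: bet_share[name] - money_share[name] for name in market}
surprising = max(margin, key=margin.get)
smallest_avg_bet = min(market, key=lambda n: market[n]["money"] / market[n]["bets"])
print(surprising, smallest_avg_bet)
```

For two outcomes, the side whose bet share exceeds its money share is always the side with the smaller average bet, so both lookups pick the same outcome ("Leave" with these numbers).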

The obvious analog in the stock market is average order size, but this isn't a generally-available statistic, and it's not clear to me whether it would be meaningful.

Volume (shares traded per time unit) is closely-watched, however, and several technical analysis "indicators" compare price changes to volume.[1][2][3][4]

High liquidity is a state of relatively small price changes per unit of trading volume. It generally means traders are not paying large premiums/discounts to previous trades. In a way, this corresponds to smaller bets on future returns, and thus, to surprisingly popular returns...

[1] https://en.wikipedia.org/wiki/Ease_of_movement

[2] https://en.wikipedia.org/wiki/Force_index

[3] https://en.wikipedia.org/wiki/Volume-price_trend

[4] There are similar concepts in modern portfolio theory, such as Amihud illiquidity.

I see people mentioning whether the answer is right or not, but that's actually irrelevant. The heuristic only measures how surprising the popular answer is to the population.

It's not presented as irrelevant in the article. The article presents it as "a wisdom of the crowd technique", and ends with "there can be high confidence that the correct answer is No".

So it is assumed that people using this technique could be inferring the correct answer by finding the surprisingly popular answer, not just making an observation about what the population thinks.

Using this on subjective questions (e.g. “Is Cymbeline one of Shakespeare’s best plays?”) would help you find... what? Underrated things?

I'm not sure the algorithm would emit any answer in the case where everyone has an equal level of objective foundation for their subjective belief.

But it would probably help in cases where popular opinion is entirely misinformed about the subjective question, not having any basis other than (already misinformed) hearsay on which to form their own subjective opinion.

So, for example, if there was a musician who had an absolutely terrible song that somehow became the song they were best known for (being a "one-hit wonder" whose song wasn't really a "hit"), the public might believe that that song is their best song, since it's the only song of theirs the public has ever heard of. Experts (i.e. people who have heard more than the one song of theirs), on the other hand, would tend to agree that it's certainly not their best song.

(Given that example, I'm inclined to suggest that you could use this algorithm to determine when people are being judged overly-harshly for things, e.g. whether to ban someone from a website just because they've received a lot of reports about that person's behavior.)

The example uses it to spot a case where most people are wrong, but some large minority of people expect that most people will answer incorrectly, while themselves answering correctly. A large enough difference (10%, in the example case) between the "what do you guess others will answer?" and what people actually answered indicates the majority opinion is, in fact, wrong.

Have any political pollsters ever done this sort of question format?

You mean like this:

1. Who will you vote for?

2. Who do you think will win?

I don't know if we can learn anything from the surprisingly popular answer here, but now I wonder if question 2 would be a better predictor of the election result. But it will be tainted by previous polls and news.

Forgetting one thing: surprisingly popular to whom? You need to answer the "to whom" part.

The wisdom of crowds is a truly fascinating thing... the fact that people don't really need to know what they're talking about as long as there's enough of them really blows my mind.

The concept in the article is pretty much a counter example of that. It says that if there is a core part of the crowd that are expert on the topic at hand, and they answer differently from the crowd wisdom, there is a way to find out about it using a meta-question about the crowd wisdom itself.

The insanity of crowds

I wonder if this could be viewed as an application of https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect

Nope. That is overconfidence of the ignorant, and this is dealing with trusting the high confidence of the correct.

It doesn't really matter here that people answered incorrectly.

I don't think so. It's not saying anything about how highly confident the correct are. It's about how they know that most people are incorrect. It's about misconceptions.

There is this core part of the unpopular-but-correct answerers that knows that the incorrect answer is popular. The larger this core part is, the higher the "score" of the "surprisingly popular" answer is. It's about finding something that is "mistakenly popular".

Yeah but this is in the context of not knowing the correct answer.

Saying that you believe the popular option is different to yours is a display of high confidence in your own answer, and low confidence in other people's knowledge.

Dunning-Kruger only deals with highly confident wrong people, which isn't the interesting group in our surprisingly popular measurements.

Bringing Dunning-Kruger into it would make more sense if there was an "I'm not 100% confident" option, but instead it was an absolute dichotomy.

It doesn't really mean that all of the people answering incorrectly are completely confident and therefore are Dunning-Krugering. Someone who is not overconfident at 60% confidence adds the same "yes" data point that someone who is 100% confident and wrong does.

In October 2016:

* Do you think Trump will win the US Presidential election?

* Do most voters think Trump will win?

Philadelphia should be the capital of Pennsylvania

It's a surprising pattern... Most U.S. state capitals are not their largest cities, and most large cities are not capitals. Phoenix is the only city in the top 10 that is also a capital.

Again? Party like it's 1799.

Pittsburgh/Philadelphia should be the capital.

There is already widespread belief, in the rest of the state, that the Philadelphia area (as the largest concentration of population and wealth) has an unfairly outsized influence over Harrisburg.

The result of moving the capital to Philadelphia would surely strain the legitimacy of the state government even more.

Moving it to Pittsburgh would accomplish the same exact thing except it would be arguably even less just.

There are any number of reasons why keeping it in Harrisburg is imperfect but it's probably the least bad solution.

No, I really mean Pittsburgh/Philadelphia, like Minneapolis/St. Paul, Seattle/Tacoma, or Raleigh/Durham.

Yes, the distance between respective city centers is farther than those others, but that makes it more inclusive.
