A group of 3,000 citizens is making better forecasts than CIA analysts (npr.org)
297 points by sizzle on Apr 2, 2014 | 135 comments

This Economist post[0] addresses some of the many comments about statistical outliers:

> The big surprise has been the support for the unabashedly elitist “super-forecaster” hypothesis. The top 2% of forecasters in Year 1 showed that there is more than luck at play. If it were just luck, the “supers” would regress to the mean: yesterday’s champs would be today’s chumps. But they actually got better. When we randomly assigned “supers” into elite teams, they blew the lid off IARPA’s performance goals. They beat the unweighted average (wisdom-of-overall-crowd) by 65%; beat the best algorithms of four competitor institutions by 35-60%; and beat two prediction markets by 20-35%


I would bet that the only reason they're beating prediction markets is because the markets aren't "real money" or they aren't able to participate in them.

If they were, they would go make money in the prediction markets, and the prediction markets would become more accurate.

Could this be simple selectivity? I don't know how the CIA selects analysts, but I doubt it's by evaluating a large pool of applicants on their predictions. On the other hand, we have a group of 30 people, selected for their forecasting accuracy from a population of 3,000. Put that way, I would expect the 30 to be consistently better.

These people are also isolated from the internal agency narrative, I imagine that can help quite a bit.

In other words, what if the "taxpayer-funded" forecasts we're permitted to know about are "on the books", so to speak, and so much more is "off the books" and memory-holed beyond even the typical "black magic marker, and white out" form of redacted classification that some times surfaces into the public eye?

Truths so brutal, and predictions so harsh that no one dares write them down.

But now we enter the information age, where certain realities are essentially undeniable and can no longer be concealed from civilian panopticons like Google.

Sort of like how TV Networks (and even basic cable channels to a degree), with their Standards & Practices departments, waste time sanitizing television destined for the public air waves and radio spectrum, while media on the internet operates with an almost total lack of restraint.

You can find ads online.

Basically, get a degree in a related field (international politics, security, languages, demographics, economics, etc), and have decent marks. Apply. Don't tell anyone (until they have briefed you on what you can say). Don't have any skeletons in your closet.

It's basically like getting a USAID job, but probably not as competitive (lots of funding = lots of positions, and most people doing these kinds of degrees tend to not be very pro-defence) and with slightly higher security requirements.

Predictions are all about discerning signals from noise, and some people seem to be naturally better at discernment. I'm sure it can be learned as well (to a degree), but I'm not at all surprised that some people are exceptionally better than others.

Very nice. Thanks for finding that.

>If it were just luck, the “supers” would regress to the mean: yesterday’s champs would be today’s chumps.


If I just got really freaking lucky with my predictions yesterday, then there's no reason to imagine I would get really freaking lucky again. Over time, my average accuracy would become the mean of the distribution.

On the other hand, if I actually _am_ good at making predictions, then I will continue to make good predictions.

I need to know more. If the questions are all yes/no questions ("Will there be a significant attack on Israeli territory before May 10, 2014?"), and the sample set is large enough, you would -expect- there to be some outliers who mostly got things right, by pure chance. And even outside of that, I want to know what sort of average the CIA agents were batting; if they were hitting just 50%, I would -expect- nearly any sample size to have an outlier who did better.

That is, to borrow the old get rich quick scheme, take a list of 1024 addresses. Send to half of them "Stock X will be higher on (date) than today", and to the other half "Stock X will be lower on (date) than today". Repeat with the set you sent the correct 'predictions' to, and continue repeating until you're down to four people who you've sent the correct prediction to every time. Suggest they sign up for your premium stock tips newsletter.

All you're doing is playing statistics, you're not actually demonstrating any predictive ability.
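The arithmetic of that scheme is easy to sketch. A toy simulation (hypothetical function name, coin-flip "market" standing in for the stock), assuming the halving process described above:

```python
import random

def newsletter_scam(n_recipients=1024, n_rounds=8):
    """Simulate the classic stock-tip scam: each round, tell half the
    remaining recipients 'up' and the other half 'down', then keep only
    those who received the prediction matching a random market move."""
    recipients = list(range(n_recipients))
    for _ in range(n_rounds):
        half = len(recipients) // 2
        up_group = recipients[:half]    # told "stock will go up"
        down_group = recipients[half:]  # told "stock will go down"
        market_went_up = random.random() < 0.5
        recipients = up_group if market_went_up else down_group
    return recipients

random.seed(0)
# 1024 / 2**8 = 4 people have now seen 8 consecutive "correct" predictions
print(len(newsletter_scam()))  # 4
```

No predictive ability anywhere, yet four people are guaranteed to exist who saw a perfect track record.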

"and the sample set is large enough, you would -expect- there to be some outliers who mostly got things right, by pure chance."

According to the article, "In fact, she's so good she's been put on a special team with other super forecasters whose predictions are reportedly 30 percent better than intelligence officers with access to actual classified information."

If they mean that after the team was formed, the predictions are still 30% better, then random chance ceases to be an acceptable explanation. If they mean that the team they formed was 30% better on the predictions used to grant entrance into the team, then your objection still holds. Unfortunately, I don't think the article makes it clear which is the case.

That said, there are definitely some mechanisms that could lead an outside group to higher accuracy on these sorts of things. The two biggest are: 1. The insiders are simply that bad, for whatever reason, and random chance can easily beat them, and 2. The failures of the insiders are correlated and overwhelm their theoretically superior information, and I'd especially finger potential ideological biases there, which could affect both information filtering and the analysis. (It is one thing to believe the world should be a certain way as a result of one's ideology... it is quite another to believe the world must already be a certain way as a result of one's ideology, even when one is staring evidence in the face that it is not. One is noble and laudable... the other somewhat less so.)

As a guess, I would suspect that the researchers who set up the experiment are more careful about their analysis than NPR is with its reporting. NPR needs to popularize its information, thus ensuring that we, the casual reader, can get a grasp of what's going on. That necessarily dumbs down the content.

Why did they go halfway, though? If you're going to dumb things down, why not just say "scientists found that" instead of giving a distorted view of the experiment, popularizing a wrong idea of what real science actually looks like?

It's not clear they continue to be 30% better - just that they started out that way and got put on a team.

I'm still not convinced these folks aren't just outliers. People in all walks of life give far too much credence to lucky outliers - CEOs are hired that way, ball players, stock portfolios, sports predictors. If you make enough predictions and cherry-pick the best, you can always find a 'miracle man'.

"It's not clear they continue to be 30% better"

Yes, that was my point. The article failed to make it clear. It is worded such that it could go either way.

I agree they may be outliers. However, I can't dismiss out-of-hand the possibility that they are actually better. Outliers are common... but so are systematic/correlated weaknesses within large organizations.

> but so are systematic/correlated weaknesses within large organizations

Indeed, that's such a big problem that there's a sort-of famous book written by a CIA analyst, "Psychology of Intelligence Analysis", that gives a pretty clear breakdown of how and why analysts fail to reason correctly about situations.

It also included examples of how Robert Gates (yes, the later SECDEF) managed to improve analyst reporting by actually reading the reports submitted and figuratively shredding the ones which were bad.

Before that there had been effectively no downside to submitting reports which were crap, intellectually speaking, and eventually things had become just professional-sounding educated guesses.

Large agencies don't have the benefit of small groups self-critiquing their own performance, so they have to try very hard to set and enforce standards of quality. When they don't do this it should not be surprising when quality drops.

Total guess here, but you would imagine that since it is a project based on statistical prediction, they would continue to measure their ability to predict.

Yes I look forward to a follow-up on this group

> "According to one report, the predictions made by the Good Judgment Project are often better even than intelligence analysts with access to classified information"

This makes me wonder how well analysts do, when they aren't given classified intelligence information. It's entirely possible that the analysts' skills are being hamstrung by information they think they know, but isn't actually true.

> and I'd especially finger potential ideological biases there, which could affect both information filtering and the analysis

An excellent book on just this subject


In general these are valid remarks. But we're not talking about random clueless people deciding to play statistics, the research is obviously conducted by people who know how this kind of stuff (statistics) works. This reminds me of laypeople trying to poke holes in theories of physics (and I'm not talking about quacks and lunatics, just well intentioned people who think they have a better explanation based on rudimentary high school or even undergraduate science and math). It's such a charming, cute naivete. Do they really think their simple ideas never occurred to a single professional physicist who immediately saw it was wrong for a number of reasons?

So, do you really think the researchers in this case hadn't thought of this? I mean yeah, it's OK to doubt and inquire, but come on, this is so rudimentary...

> If the questions are all yes/no questions

The article says (or at least implies) that the subjects give probabilities of "yes" to each question.

EDIT: Oh, and to add, the article clearly says they're not just making shit up and randomly coming up with probabilities. They collect open source info on events and base their estimates on that. So it's not just blind guessing.

As to the researchers probably know what they're doing, see my original comment and the very first thing I said, "I need to know more". It's not charming cute naiveté, I simply really think that the article tells me nothing about the methodology.

See response here as to why they're still yes/no questions (TLDR; the outcomes are yes/no) - https://news.ycombinator.com/item?id=7515824

Well, I didn't mean your comment was a case of naivete, because, TBH, the analogy with my layman physics example isn't perfect, but it reminded me, it's a similar thing: people who are or should be aware that they have only superficial info on something, act as though what they know is all there is to it. The point is, we can't possibly come up with a meaningful critique of this research based on the given article, and, I think, the most reasonable course of action in this particular case is to assume the researchers know what they're doing (considering who's involved).

I think the only thing to conclude from the article is that normal people can beat CIA analysts. This is similar to the story about monkey stock picking. (near useless FT story url http://www.ft.com/intl/cms/s/0/abd15744-9793-11e2-b7ef-00144...)

Hm, I wouldn't say normal people, but subpopulations of normal people (i.e. not individuals but groups). It's an important distinction in this case. But it depends on how CIA analysts' estimates are aggregated if at all. In fact it would be extremely interesting if CIA normally got predictions by averaging multiple analysts' estimates. That would mean normal people with open source intel are beating trained analysts with confidential intel! In that case, we could say normal people beat CIA analysts. But to me this article implies analysts' estimates are not taken as averages. Any CIA analysts here to set the record straight? :)

And further, more people in the average means less variance. Do they compare sub-populations of equal size?
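That variance point is easy to sanity-check under a toy assumption (hypothetical function, each member's estimate modeled as the true value plus independent uniform noise):

```python
import random
import statistics

random.seed(0)

def crowd_error(crowd_size, true_p=0.6, noise=0.2, trials=500):
    """Average absolute error of a crowd-mean estimate, where each
    member independently reports the true probability plus noise."""
    errs = []
    for _ in range(trials):
        guesses = [true_p + random.uniform(-noise, noise)
                   for _ in range(crowd_size)]
        errs.append(abs(statistics.mean(guesses) - true_p))
    return statistics.mean(errs)

# A crowd of 30 averages out far less noise than a crowd of 3,000,
# so comparing sub-populations of different sizes is apples to oranges.
print(crowd_error(30) > crowd_error(3000))  # True
```

Under independence, the standard error of the mean shrinks like 1/sqrt(n), which is exactly why the comparison group sizes matter.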

>was given access to a website that listed dozens of carefully worded questions on events of interest to the intelligence community, along with a place for her to enter her numerical estimate of their likelihood.

If you need to know more, try reading the article? It says that they're not yes/no questions.

I'm participating in this project, as are at least a couple other HNers (surprising to no one, huh?).

I won't go into details about it because it's not my place to do so at this point (maybe after).

But I will confirm that for every response you give, you're required to enter a percentage estimate of likelihood. For example, you'd enter 90% or 72.212% or whatever on whichever question you're responding to. So there's a potential mechanism for further ranking of participants beyond the binary. The voting mechanism itself is more complicated, but again, I'll leave discussion for when it's over.

The interesting thing about this is that it raises the [theoretical, in the absence of information about weighting/ranking systems] possibility that the professional intelligence analysts' relative underperformance is less due to misidentifying high/low-probability events more often than the amateur "superforecasters", and more due to the systematic error of overconfidence in the evidence they have when weighting their estimates. For example, the amateurs may be more likely to pick 50% on events where there genuinely isn't enough information to forecast, and less likely to assign single-digit probabilities to events which no available evidence suggests are likely but which nevertheless happen. In other words, it sounds highly plausible that if you asked simple binary questions about expected outcomes, both groups would give almost identical answers and usually be correct, but the professionals would be more confident when both groups are wrong.

If this is the case, then it's reasonable to assume the CIA's statisticians would have done the analysis and know whether that's the reason these "superforecasters" are better, though I have my doubts.

I guess the reverse is also possible: professional intelligence analysts are systematically tending towards being overcautious and tend to pick numbers towards the middle of the range, either out of a desire not to look silly or because they're more aware of policy implications. But subjectively I'd assign that a lower probability.

That's very interesting! How did you become involved in this project?

Whether or not they happened is a yes/no though. It's not "I believe this is 60% likely to happen" "Why yes! It was exactly 60%!", it is instead "This person said it was 60% likely to happen, and it happened. That means this person was right, for a certain degree of right"; we don't actually know what that means.

It may be that they're weighting it based on confidence level (so saying 0% chance on something that happens counts against them, but 49% chance on something that happens counts against them less), but it still counts as a yes/no in that given enough people, and a random distribution of answers, you would expect a subset of people to always be right (though the 'amount' of right changes, this person said 51% chance of it happening, this person said 100% chance of it happening; they both got it right).
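One standard way to score probability forecasts against yes/no outcomes so that "51% and right" counts for less than "100% and right" is the Brier score, which forecasting tournaments commonly use. A minimal sketch (hypothetical function name):

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and binary
    outcomes (1 = happened, 0 = didn't). 0 is perfect, 0.25 is what
    an always-50% forecaster scores, 1 is maximally wrong."""
    return sum((p - o) ** 2
               for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Saying 100% on an event that happens beats saying 51%, even though
# both count as "right" in the binary sense:
print(brier_score([1.00], [1]))  # 0.0
print(brier_score([0.51], [1]))  # 0.2401
print(brier_score([0.50], [1]))  # 0.25
```

Under this kind of scoring, a lucky always-guesser still regresses over many questions, because the penalty is continuous rather than binary.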

Agreed. How do we know that the top 1% predictors are there through skill rather than luck? Knowing more about the questions, numbers etc would help us to come to a good idea about how many people should make these predictions by chance alone, and so on.

Or, she could be a "Teela Brown" -- someone who is naturally (or unnaturally...) lucky.


I don't see why this is being downvoted. As someone who has read the Ringworld novels, I couldn't help but think of Teela Brown and her luck.

Because chiph is offering a fictional explanation for a real world phenomenon. There have been lots of scientific studies involving randomness in fields as diverse as economics, game theory, gambling behavior, and parapsychology; if 'luckiness' was a measurable characteristic of people then you'd expect it to reliably show up in such contexts. Invoking it as an explanation where a simpler alternative exists (the idea that some people are more skilled at analyzing current affairs) is illogical.

I can't downvote, but I expected the link to point to someone real, not a fictional character.

And on the opposite side of the spectrum : http://en.wikipedia.org/wiki/Roy_Sullivan

I'm assuming they're smart enough to know that and that the article was just glossing over the details.

Now the fun question; are there any anti-forecasters? People who guess wrong so consistently and powerfully they're as good as super forecasters?

This is a really interesting question, actually.

In my undergrad ecology class, we were assigned a scent-based laboratory experiment in which all the male and female members of the class wore the same white t-shirt for 48 hours in order to accrue bodily odors -- heavy workouts, perfume, and showering were all prohibited. At the end, everyone in the class sniffed all of the t-shirts and guessed which sex had worn each one.

Most people guessed at about chance (50-50), and a few guessed slightly better than chance. I was far worse than chance, guessing only 10% correctly. In a way, anti-forecasting is a type of forecasting, too: if someone had asked for my guess and then guessed the opposite, they'd have had superior guessing power. This paradigm falls apart quickly when the chances of events aren't so clear, however.

Ah, the old Costanza maneuver.

I wonder how practice/training would impact those results.

This is a common factor considered when researching approaches to crowdsourcing. The article at [0] shows how their approach deals with anti-forecasters ("adversarial answers") by doing exactly as you say: inverting their answers and counting them as good.

[0] - http://scholar.google.com/scholar?cluster=144091158652582285...

That's an excellent question I'd like to know the answer to. I've personally known lots of people who were ideologically predisposed towards certain biases who would consistently predict outcomes of political events incorrectly.

e.g. "if <insert presidential candidate> is elected, that's gonna be the end of America" with similarly failed predictions over the next four years when that candidate is elected.

No, there were not.

Well, people claiming to be psychic have sometimes managed to get worse-than-random results when tested...

For a prediction market anyone can join, check out https://scicast.org/

What's neat about SciCast is that you can make predictions conditioned on how other questions turn out.

e.g. I think that if the price of bitcoin exceeds $1,000 by 2015, at least one presidential candidate will accept bitcoin donations. If BTC ends up below $1,000, I make no prediction.

Just to add, the SciCast project came out of the DAGGRE project, which was a competitor to Good Judgment in the IARPA prediction challenge.

Pure speculation: random, untrained and disinterested parties might be better than CIA analysts at making geopolitical predictions because their results aren't politicized. "Are there WMDs in Iraq?"

I actually thought the opposite.

A lot of these federal positions are military. So I assumed it was some enlisted guy out of high school that didn't want a desk job but wants to be firing guns. He's been doing it years and wants out. He's not invested in any of this analysis.

Compared to the guy that self-selected to sign up for an analysis website as a hobby. He finds it fun to think and analyze. He gets points for being right and has a score to improve.

The other factor is workloads. If a CIA analyst has to make X decisions in Y time, then they may be rushing.

I read this story as a failure of federal services to reward successful performers and remove failing performers.

And there's always going to be a political aspect to recruiting. If you believe this terrorism stuff is overblown and most other countries keep themselves to themselves and aren't out to get us, you probably won't be signing up to be a CIA analyst. But if you think terrorists and communists are a big threat to the country, signing up with the CIA would seem much more important.

I was having the same thought: at what point, and how, do you factor in "if I make this prediction, which is contrary to my boss's expectations, will I still have a job?" Surely there is a huge amount of pressure to row in the same direction, towards a predetermined groupthink (which is terrible, of course).

There's also the "if I make this prediction and I'm wrong, will it affect me?" bias too.

The more involved you are, and especially the more accountable you are held for your decisions, the more you're likely to second guess yourself. There's normally a reason your second guess wasn't your first guess.

It's totally unclear what they are actually doing.

Are they averaging everyone's answer (wisdom of the crowd)? Are they looking back at who got lucky and declaring them "super forecasters" (and then seeing large regressions to the mean)? Did they have separate periods of testing to try and control for this? Did they have a control group of similar size just flipping coins?

The way this is written it sounds like someone used a Gladwell book as their entire science education.

There are multiple experimental groups in the project, which use different methods. One of the methods is a prediction market, where you can buy or sell shares of any given outcome, with the market price being the group estimate of the probability. For example, if the group thinks something will happen with 35% probability, you can buy shares for $35 (fake money). When the questions are eventually resolved, correct shares are valued at $100, and incorrect shares are valued at $0.
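A toy illustration of the incentive in that market mechanism, using the $35 example above (hypothetical function name; real markets add spreads and fees):

```python
def expected_profit(price, p_true, payout=100):
    """Expected profit per share when buying at `price` an outcome you
    believe has probability `p_true`. Shares pay `payout` (fake dollars)
    if the event happens, 0 otherwise."""
    return p_true * payout - price

# Market prices an event at $35 (implied 35%), but you think it's 50% likely:
print(expected_profit(35, 0.50))  # 15.0 per share in expectation -> buy
print(expected_profit(35, 0.20))  # -15.0 -> sell instead
```

Traders with better probability estimates profit in expectation, which is what pushes the market price toward the group's best estimate.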

There were multiple experiment groups at the beginning, but I'm not sure if any are left besides Good Judgment. DAGGRE, the prediction market group, is out.

Edit: Oops, I misunderstood. Apparently Good Judgment also has a prediction market group inside of it. Thanks nbouscal.

They may be good at prediction, but who writes their questions?

It's probably much easier to reach a successful prediction when a lot of the framing has already been done for you. Take the Ukraine question, for example: there's already a lot of intelligence encapsulated in it, such as the city and date where the CIA is expecting a Russian invasion. As a post mentions below, the questions may even be phrased/biased towards the right answer.

I wonder if they would be as good at prediction if they didn't have the resources and intelligence of the CIA providing them with the correct focus, prioritization, and framing.

The CIA is also providing focus, prioritization, and framing for themselves, and is being outperformed.

Now I'm really confused. The title of this post was originally "So You Think You're Smarter Than A CIA Agent ", which is consistent with the title of the article. Now it's been changed to editorialize the content? Isn't that the opposite of HN policies?

Is this policy written somewhere? The only trend I've noticed is that I always agree with anyone complaining about a changed submission title. (I don't think you're complaining, are you? The NPR title is pretty cheap IMHO.)

You can read the guidelines here:


This may fall under:

" Otherwise please use the original title, unless it is misleading or linkbait. "

I would call it linkbait, cool. Sad to see coming from NPR.

Thanks for the link, it's been a few years since I read those rules.

I had the same thing happen to one of my posts. It wasn't changed to the title of the article or to something that made it more descriptive, it was just given a random new title that made it more vague.

Now, actually selecting the outliers (the people who, on average, made really good predictions despite access to limited information) and data-mining their feature set (psychological traits, information diet, background, etc.): THAT would be an interesting study...

They are doing a bit of that too. They have found correlations with IQ and other metrics. http://www.cis.upenn.edu/~ungar/papers/forecast_AAAI_MAGG.pd...

Highly recommend this book, which is on the subject ... http://en.wikipedia.org/wiki/The_Wisdom_of_Crowds

This is a fantastic book. It explains most of the questions currently in the thread about when large groups of people produce more effective answers than any one of their members, and when they do not. If you found this article interesting, the book is worth reading.

The book also briefly covers the Policy Analysis Market, a program to experiment with using a futures market to help predict terrorist attacks. As this program used real money, and offered the opportunity to make money off of terrorist attacks, some members of Congress were outraged and the program was immediately cancelled (http://news.bbc.co.uk/2/hi/americas/3106559.stm). This version discussed in this NPR article seems more publicly palatable. In fact, it looks like the similarity was already reported: http://articles.latimes.com/2012/aug/21/nation/la-na-cia-cro...

By the way, can anyone recommend a more rigorous book than Wisdom of Crowds on this subject?

This one [1] is recommended by the guys behind the Radio Lab podcast (specific episode [2]). I left the referral tag on it since it'll go to help them produce more great radio.

[1] http://www.amazon.com/exec/obidos/ASIN/0385503865/wnycorg-20

[2] http://www.radiolab.org/story/91502-the-invisible-hand/

John Cook has a recent post that gets to the point:


Crowds (as in mobs) aren’t wise; large groups of independent decision makers are wise.

I am a big fan / proponent of prediction "markets" or similar otherwise incentivized crowd prediction tools. Happy to see someone execute it within the government while avoiding the terrible branding and public relations faux pas of past "terrorism market" endeavors.

Hopefully this will scale more broadly, continue to improve its incentives, and increase our capability to prepare for future events.

The issue of crowd sourcing answers to questions is a very interesting one. I've recently been reading research in this area with a colleague, and it is far more involved than the article makes it sound.

If you are interested, some of the first people to work on the problem were Dawid & Skene [0], back in 1979. They were trying to solve the problem of clinicians asking patients how they were feeling: each time they asked, the patient might give a slightly different answer. The goal was to estimate the "true" answer from a number of "noisy" answers (sound familiar after reading the OP?)

The general intuition for most future research into the topic is that if you tend to agree with others more often than not, your score goes up. If your score is higher, your answers are trusted more.

Then there are a whole bunch of interesting additions to this principle, such as factoring in question difficulty [1] (e.g. questions which have the most disagreement can be considered more difficult, and hence you may want to ask more people about these questions to get a clearer picture). It is also helpful to seed certain questions for which the answer is "known" (may or may not be possible, depending on the project). Then you can use the answers to these questions to help score people even more accurately.

An even more interesting approach is to take into account people who answer adversarially, that is, who almost always answer incorrectly (either due to malicious intent or because they consistently misinterpret the questions in a backward fashion). If you have somebody who answers incorrectly 90% of the time, then they provide almost exactly the same information as someone who answers correctly 90% of the time. All you need to do is flip their vote once you've identified this.
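A toy sketch of that flipping idea (this is not the actual Dawid & Skene EM algorithm, just a hypothetical accuracy-weighted vote where below-chance workers get a negative weight, which inverts their answers):

```python
def estimate_truth(answers, n_iter=10):
    """answers: dict of worker -> list of 0/1 answers, one per question.
    Alternates between scoring workers against the current truth
    estimate and re-voting with weight (2*acc - 1), which is negative
    for below-chance workers -- effectively flipping their votes."""
    n_q = len(next(iter(answers.values())))
    # start from a plain majority vote
    truth = [round(sum(a[q] for a in answers.values()) / len(answers))
             for q in range(n_q)]
    for _ in range(n_iter):
        # score each worker against the current truth estimate
        acc = {w: sum(a[q] == truth[q] for q in range(n_q)) / n_q
               for w, a in answers.items()}
        # accuracy-weighted vote over signed answers (2*a - 1)
        truth = [1 if sum((2 * acc[w] - 1) * (2 * answers[w][q] - 1)
                          for w in answers) > 0 else 0
                 for q in range(n_q)]
    return truth

# Two mostly-accurate workers and one consistent adversary:
answers = {
    "good1":     [1, 0, 1, 1, 0, 1],
    "good2":     [1, 0, 1, 0, 0, 1],
    "adversary": [0, 1, 0, 0, 1, 0],
}
print(estimate_truth(answers))  # [1, 0, 1, 1, 0, 1]
```

Note the outcome on question 4: the plain majority vote gets it wrong, but once the adversary's weight turns negative, their wrong answer becomes useful evidence and the estimate recovers.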

The whole field is fascinating, and there seem to be more and more publications coming out with interesting approaches to the problem.

[0] http://scholar.google.com/scholar?cites=15953840037809938294...

[1] http://scholar.google.com/scholar?cluster=144091158652582285...

One thing that I really enjoyed about this article was its description of "the wisdom of crowds." Typically that phrase is shrouded in marketing bullshit that just boils down to the idea that groups are always smarter than individuals, because magic. That idea on its own is really counterintuitive and hard to accept.

It was a good call to talk about the scientific basis for the idea; I wish more places that relied on crowdsourcing would do the same thing.

> That idea on its own is really counterintuitive and hard to accept.

Not really... For example, have you heard of the Cicada puzzle? That is solved by groups of people, and probably wouldn't be solved if it was just one person.

Multiple people with specialized knowledge > one person with specialized knowledge.

We didn't evolve to be social animals because it would make us stupid...

A note on the title change:

When the story title is linkbait and there are no good subtitles, we sometimes draw on the first sentence of the story. In this case I used the description under the photo. The key thing is to use only language that's already there.

Doesn't this article seem to contradict itself? It cites superforecasters' amazing ability, then talks about how better forecasting is the result of averaging the noise from a crowd of diverse opinions...

That's what jumped out at me as well. First the article extols how the Wisdom of Crowds is smarter than the experts. Then it goes on to say how some of the crowd are superforecasters, (essentially experts), and the rest of the crowd makes no difference.

It would be interesting to see how small the group of superforecasters would need to get before their accuracy falls below that of the CIA analysts.

Good test!

This is a great topic for a site that uses some "wisdom of crowds" mechanisms as part of the story and comment ranking. Though HN is different in some significant ways: in the experiment, subjects are asked carefully worded questions and give answers without being shown any information from others, whereas here there are several signals available before voting on a story or comment, like prior vote count and previous comments, which can bias a vote.

They started off reasonable. Ask ordinary people to assign probabilities to potential world events that can be verified. The next step is also reasonable. With a large enough crowd, the estimation noise around the "true" probability will cancel out. That's how the crowd mean estimated the ox weight so closely.

But then they turned left onto stupid street. They took the people who, perhaps by chance alone, predicted results closest to the crowd mean, and put them on a "team". What the heck is the point of that? It just reduces the size of the crowd and shows a fundamental misunderstanding of statistics.

Past performance does not necessarily predict future results.

Start with 10000 coins, and flip them all. Keep the ones that land heads. Flip those, and keep the ones that land heads. Repeat until all coins land on tails, or one coin remains as the "champion of heads." What is the probability that when it is flipped again, it comes up heads? Still 50%. It isn't a supercoin. You just managed to select it from a giant group by making it the sole outlier from a series of independent trials.

Though you should probably check to make sure it actually has a tails side, just in case.
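The coin tournament above is simple to simulate, and it shows exactly the regression-to-the-mean point: the "champion" is still just a fair coin.

```python
import random

rng = random.Random(42)

# Run the elimination tournament: flip all coins, keep the heads,
# repeat until zero or one coins remain.
def run_tournament(n_coins):
    while n_coins > 1:
        n_coins = sum(rng.random() < 0.5 for _ in range(n_coins))
    return n_coins  # 0 or 1 survivors

# Retry until some tournament actually crowns a champion
# (occasionally every remaining coin lands tails at once).
while run_tournament(10_000) != 1:
    pass

# The champion is still just a fair coin: flip it many more times.
heads = sum(rng.random() < 0.5 for _ in range(100_000))
print(heads / 100_000)  # hovers around 0.5, not 1.0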

> It just reduces the size of the crowd and shows a fundamental misunderstanding of statistics.

Why do you assume that what's in the NPR article is all there is to it, that there's no info, technical or other, left out?

Of course they thought of that, and checked. There's no regression to the mean: the elite group further improved, and their estimates were 65% better than the overall crowd's.[0]

[0] http://www.economist.com/news/21589145-how-sort-best-rest-wh...

That article is much more informative. Kudos to you, nzp.

Heh, actually, kudos to joshuahedlund[0], I found it in his comment.

[0] https://news.ycombinator.com/item?id=7515263

How does the 'crowd' adjust for bias in sources? If they are all picking their answers by similar methods, such as Google searches or listening to the news, how does the crowd avoid the biases of those news sources? You can't just 'average them out' if there are systematic skews.

e.g. just because half of the people are watching (say) CNN and the other half are watching (say) Fox News doesn't magically mean that the average of their opinions will be correct. What if both news sources overplayed the likelihood of terrorist attacks? How would the crowd adjust to this?

>First, if you want people to get better at making predictions, you need to keep score of how accurate their predictions turn out to be, so they have concrete feedback.

>But also, if you take a large crowd of different people with access to different information and pool their predictions, you will be in much better shape than if you rely on a single very smart person, or even a small group of very smart people.

Both points are clearly true. But it's obvious that the CIA, and probably anything political, has problems "keeping score," not getting more voices.

Obviously there are some very smart people working in government intelligence agencies, including the CIA, MI6, etc. However, in my experience working in the private intelligence sector, I found that many former government employees lacked creativity and relied too heavily on 'classified sources' and their network. The best analysts were extremely lateral thinkers who used Google and public databases, but would never have passed a government test on account of their quirks.

> In other words, there are errors on every side of the mark, but there is a truth at the center that people are responding to, and if you average a large number of predictions together, the errors will end up canceling each other out, and you are left with a more accurate guess.

This just reminds me all too much of the Twitch Plays Pokémon phenomenon that took place last month. The fact that they actually are completing games is, frankly, astounding.

It seems to me that all professional experts are subject to groupthink. However, could anonymous lay forecasters still make such excellent predictions without those groups in place?

Anyway, I think all this argues in favor of Robin Hanson's concept of "Futarchy":


Maybe the founding fathers' "fear" of the ignorant masses, and their choice of representative government, needs to be reevaluated. Go back to direct, actual democracy, but with ranges instead of yes/no, so as to capture the "average is the true signal" effect.

Maybe the first step is changing the voting for representatives from yes/no to a 1-100 scale of how much you want this person to win.

So could I suggest that CIA forecasts are affected by internal politics and the perceived needs of the political establishment, and afflicted by biases in top-secret material that cannot easily be counterbalanced?

Whereas if they put these super-forecasters in the CIA and said "if you don't predict what I want, it's off to Afghanistan with you," they would soon regress to the mean.

Most people in the world only have access to public information, so if you think most geopolitical events are brought about by "the masses" rather than by some Platonesque elite (which is of course very much debatable), it may well make sense to base your predictions on the data that is available to most players.

Hearing the article, I felt thrown into a PKD novel. One of the recurring themes in his work is precogs, ordinary people with a special ability to forecast the future, or to mispredict it very accurately. Large corporations use groups of precogs to plan marketing campaigns, product designs, etc.

The Wikipedia page on prediction markets has a storied history and some interesting bits. Curious that the US military had a prototype that was shut down for being a "terrorism futures market."


"The wisdom of crowds is a very important part of this project, and it's an important driver of accuracy," Tetlock said

This part threw me off. But overall, crowdsourced estimates could be a useful tool, provided everyone is actually trying to guess correctly (motivated by national pride, monetary reward, etc.).

I went to undergrad with quite a few future CIA analysts. You should not overestimate the average intelligence of CIA analysts. No doubt they have some brilliant people, but you shouldn't be surprised that amateurs with Google News outperform careerists inside a giant bureaucracy.

That article felt like it ended far too early. I was left with a "is that it?" feeling. Not that I was underwhelmed by the message of the article, but that I couldn't find a message in it. If it had a point beyond "crowdsourcing works", I missed it.

True for far, far too many articles IMO.

Assuming you can do that (i.e. forecast world events), does anyone know how you could use it to your own advantage, besides investing in stocks (not necessarily on a stock exchange; OTC or classic VC/angel investing counts too) or commodities?

John Brunner's book "The Shockwave Rider" (1975), which was heavily inspired by Toffler's "Future Shock", had this concept (called a Delphi pool) as a major and successful component of the future society.

Delphi pools were invented by RAND, not Brunner, IIRC, have been studied over the years (Armstrong's _Principles of Forecasting_ has some papers on them), and work differently from prediction markets: Delphi pools collect a small number of experts, asks for written analyses and breakdowns and qualitative assessments (not necessarily the precise numerical probabilities of prediction markets), feeds each expert's analysis to the others, asks them for a new analysis incorporating the others, and repeats this for 2 or 3 rounds until a rough consensus has been reached. This is not very similar to prediction markets.

Doug Stanhope has a good rant about these opinion polls:


Are the underpinnings of this experiment in line with the idea behind the HyperLogLog story[0] I read about the other day?


If you have 3000 people predicting the outcomes of events with, say, a probability of 0.3 each, then after 6 trials there would be about 2 people who got all of them correct, and many others with very good records, out of pure chance alone.

Unless they publish the detailed numbers, we can't be sure what's going on here.
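The parent's back-of-envelope number checks out (the 0.3 per-event hit rate is the parent's assumption, not a figure from the article):

```python
# Expected number of perfect 6-for-6 records produced by pure luck,
# given 3000 forecasters and a 0.3 chance of a correct call per event.
n_people, p, n_events = 3000, 0.3, 6
expected_perfect = n_people * p ** n_events
print(expected_perfect)  # about 2.19
```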

If some people with access only to normal sources are able to perform so well against people with access to lots of classified sources and information, what are we to conclude?

Either they are just lucky or the people at the CIA aren't as competent as we think they are.

If we are unwilling to accept either of these theories, then we need to know what makes these people good at it. I don't see how plain simple analysis can beat people at the CIA.

But the point here seems to be to use the group average, not to find outliers with good prediction ability per se.

That's exactly it, which is why I was so confused about the "team" of "superpredictors". While you might learn something by comparing their methods with everybody else, you still have to do the same amount of scrutiny on everyone in the crowd to learn anything useful.

With enough data, you can apply an automatic adjustment to even the most horrible predictors, to make their contribution to the final result more useful. If, for instance, someone always predicts exactly 10% more favorable results for Israel, independently of the facts, you can just reverse that for that one person on every question mentioning Israel, to produce a bias-adjusted result. That is tremendously difficult, though, and not nearly as useful as just increasing the size of the crowd.
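A minimal sketch of that kind of adjustment (the forecaster, topic, and numbers here are the hypothetical ones from the comment, not anything from the study):

```python
# Hypothetical bias adjustment: if a forecaster runs a measured
# 10 points too favorable on one topic, subtract that offset from
# their raw probabilities before pooling, clamping to [0, 1].
def debias(raw_probs, offset):
    return [min(1.0, max(0.0, p - offset)) for p in raw_probs]

raw = [0.80, 0.55, 0.30]      # raw forecasts on the biased topic
adjusted = debias(raw, 0.10)  # strip the systematic +10% tilt
print([round(p, 2) for p in adjusted])  # [0.7, 0.45, 0.2]
```

The hard part, of course, is measuring the offset reliably per person and per topic, which is why a bigger crowd is usually the cheaper fix.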

Which is the whole point. With a big enough crowd, you don't need to do that at all, because the stupid little biases unrelated to the facts tend to be noise, not signal.

"In fact, Tetlock and his team have even engineered ways to significantly improve the wisdom of the crowd"

Does anyone know what these methods might be?

One example briefly mentioned was providing feedback to respondents to let them know when they were getting things right or wrong.

And how would that work?

Take their example question - 'Will North Korea launch a new multi-stage missile before May 1, 2014?' That's a yes/no question. But none of the participants knows the answer. They are just trying to forecast the future based upon the balance of probabilities. So if NK does launch a missile, the people who answered 'no' were wrong, but the reasons upon which they came to pick 'no' could still have been right. So giving feedback to these people and telling them that they were wrong does not magically aid/improve predictions.

Perhaps a simpler example could be used. 'Will the next roll of this die score 3?' Well, it's obviously more likely that some other result will happen. But if the 3 does come up, you can't say that all those people who said 'no' are worse at predictions...

Even trickier - it sounds like the participants are giving probabilities. So if you guessed some possibility was 42% likely and then it did in fact happen - are you right or wrong? I don't think there is an answer to that question.
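There actually is a standard answer to that question, though it isn't in the article: grade probability forecasts with a proper scoring rule such as the Brier score. A single 42% forecast is never simply "right" or "wrong", but over many events a lower average score identifies the better estimators.

```python
# Brier score: mean squared difference between stated probability
# and outcome (0 or 1). Lower is better; a perfect oracle scores 0.
def brier(forecasts):
    """forecasts: iterable of (stated_probability, outcome as 0 or 1)."""
    forecasts = list(forecasts)
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)

print(brier([(0.42, 1)]))  # (0.42 - 1)^2, about 0.3364: penalized, not "wrong"
print(brier([(0.42, 0)]))  # 0.42^2, about 0.1764: rewarded, not "right"
```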

PredictionBook [1] does it that way. You get a chart of how many times you assigned a certain probability and what percent of those predictions came true.

[1] http://predictionbook.com/
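That calibration-chart idea is easy to sketch: bucket predictions by stated probability and compare against the observed frequency in each bucket. A well-calibrated forecaster's "60%" predictions come true about 60% of the time. (The data below is invented for illustration.)

```python
from collections import defaultdict

def calibration_table(predictions):
    """predictions: list of (stated_probability, outcome_bool).
    Buckets by stated probability (nearest 10%) and reports the
    observed hit rate in each bucket."""
    buckets = defaultdict(list)
    for p, happened in predictions:
        buckets[round(p, 1)].append(happened)
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}

preds = [(0.9, True), (0.9, True), (0.9, False), (0.9, True),
         (0.6, True), (0.6, False), (0.6, True), (0.6, False)]
print(calibration_table(preds))  # {0.6: 0.5, 0.9: 0.75}
```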

Simple math applied to a large enough set of predictions will tell you whether or not your probabilities are accurate. It is the same as playing poker: every hand you are essentially betting on probabilities multiple times ("okay, right now I think I have a 50% chance of winning this hand, the pot is X, and I have to put in Y to keep playing, so I'm going to continue"). In the short term it is very hard to know whether you are putting money in when you "should", because of variance. In the long term the variance disappears, and you can see that, for example, you are making an estimated 2 bets per 100 hands, which means you are "more correct" at estimating the odds than the people you are playing against. It's a little more complicated than that, of course, because you also use bets as weapons to drive better hands out through bluffing and semi-bluffing, etc., but the general point holds: with a large enough sample size of actual results compared against predicted ones, your probabilities will hold up or they won't.

Works great for poker, where you play lots of hands. Works less great for the situation in the article. Just how many North Korean missile launches do you need a person to predict before you know they are lucky or not?

Doesn't matter, as long as they predict a large enough sample of events. Just requires an assumption that they will be as accurate on every type of event, which probably isn't true, but close enough for statistics

I did exactly this in high school:

I set up an experiment to find the people best at guessing Zener cards picked at random.

I surveyed 100 people and the best 5 did notably better than the others. I dubbed these people "clairvoyant". (Spoiler: on retesting, they weren't, but then again I only followed this process once...)

Wisdom of the crowds will work just fine for anything modestly obvious, and it will fail for any black swan events that intelligence actually needs to be aware of - if the predictors even get asked about those.

This reminds me of the future predicted in Rainbows End by Vernor Vinge.

Where he explores the ability to spin up pools of analysts like a cloud server.

Same rules apply to other institutions e.g. investment banks and recommendations. Everybody sticks close to the herd.

Was the project shut down? (The website is not working; sorry if I missed that in the article.)

GJP is still running and just added some new IARPA questions today. Registration for the next season may not be open yet, though.

A crowdsourced prophet. This is not new; it happened before with the "peak oil" community.

The formatting of the questions and answers could contribute to a candidate's success as well. Unlikely events could be phrased differently than likely events with the intention of skewing the success rate to the positive.

I would hate for data from this organization to be used for any sort of real decision making process. You might as well hire a psychic.

Inaccurate headline. Citizenship is not required for participation.

How does this forecasting market compare to something like Intrade?

Intrade is a futures market where people with insider knowledge can buy into a position. So if you know of a scandal for a political candidate that will cause him to withdraw, you can go buy up all the other candidates and make some money when he does. This alters his price, making the market a predictor before the news actually comes out.

Additionally, there is the "put your money where your mouth is" aspect of it. While every candidate in the primaries is announced as "the next president of the United States," you can see who people really think will win based on which futures they purchase.

The CIA effort is more like taking a whole lot of people, asking them to predict something, and seeing who is good at it. There appears to be some adjustment of those guesses across the breadth of the crowd, but the method is not described. In contrast to futures markets, the players in the CIA effort have no stake in the game.

Right, so one would expect something like Intrade or other prediction markets to leverage both the statistical "wisdom of crowds" and insider information to be even more accurate.

The GJP prediction market is pretty similar to how Intrade operated, but a head to head comparison is going to be very difficult inasmuch as a) Intrade has been shut down for like a year now; and b) Intrade's contracts overlapped only a little bit with the IARPA questions.

I noticed this phenomenon a lot when I used to work at Intrade.

I sometimes have good runs in blackjack.

Where's the API?

It might be possible to put something together using Amazon's Mechanical Turk.


False classified information might actually work against the analyst who has access to it.

The same piece of info, if public, will have to stand on its own, without the bias introduced by restricted access / multiple levels of filtering / the attached opinions of people wearing suits.

Even true classified information can cause problems if overvalued. The questions seem to be rather general predictions so publicly available geopolitical context likely matters more than any individual bit of inside information.
