> The big surprise has been the support for the unabashedly elitist “super-forecaster” hypothesis. The top 2% of forecasters in Year 1 showed that there is more than luck at play. If it were just luck, the “supers” would regress to the mean: yesterday’s champs would be today’s chumps. But they actually got better. When we randomly assigned “supers” into elite teams, they blew the lid off IARPA’s performance goals. They beat the unweighted average (wisdom-of-overall-crowd) by 65%; beat the best algorithms of four competitor institutions by 35-60%; and beat two prediction markets by 20-35%.
If they were, they would go make money in the prediction markets, and the prediction markets would become more accurate.
Truths so brutal, and predictions so harsh that no one dares write them down.
But now we enter the information age, and certain realities are essentially undeniable and can no longer be concealed from civilian panopticons like Google.
Sort of like how TV networks (and even basic cable channels, to a degree), with their Standards & Practices departments, waste time sanitizing television destined for the public airwaves and radio spectrum, while media on the internet operates with an almost total lack of restraint.
Basically, get a degree in a related field (international politics, security, languages, demographics, economics, etc), and have decent marks. Apply. Don't tell anyone (until they have briefed you on what you can say). Don't have any skeletons in your closet.
It's basically like getting a USAID job, but probably not as competitive (lots of funding = lots of positions, and most people doing these kinds of degrees tend to not be very pro-defence) and with slightly higher security requirements.
On the other hand, if I actually _am_ good at making predictions, then I will continue to make good predictions.
That is, to borrow the old get rich quick scheme, take a list of 1024 addresses. Send to half of them "Stock X will be higher on (date) than today", and to the other half "Stock X will be lower on (date) than today". Repeat with the set you sent the correct 'predictions' to, and continue repeating until you're down to four people who you've sent the correct prediction to every time. Suggest they sign up for your premium stock tips newsletter.
All you're doing is playing statistics, you're not actually demonstrating any predictive ability.
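To make the selection effect concrete, here's a minimal sketch of that scheme in Python (everything is invented; no skill is modeled anywhere):

```python
import random

def newsletter_scam(n_marks=1024):
    """Send half the list 'stock will rise' and half 'stock will fall',
    then keep only the people who received whichever letter turned out
    to be 'correct'. Repeat until only a handful are left."""
    marks = list(range(n_marks))
    perfect_streak = 0
    while len(marks) > 4:
        half = len(marks) // 2
        told_rise, told_fall = marks[:half], marks[half:]
        stock_rose = random.random() < 0.5   # the market doesn't care about our letters
        marks = told_rise if stock_rose else told_fall
        perfect_streak += 1
    return marks, perfect_streak

survivors, streak = newsletter_scam()
print(f"{len(survivors)} people just saw {streak} correct predictions in a row")
# Nobody predicted anything; the streak is pure selection.
```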
According to the article, "In fact, she's so good she's been put on a special team with other super forecasters whose predictions are reportedly 30 percent better than intelligence officers with access to actual classified information."
If they mean that after the team was formed, the predictions are still 30% better, then random chance ceases to be an acceptable explanation. If they mean that the team that they formed was 30% better on the predictions used to grant entrance into the team, then your objection still holds. Unfortunately, I don't think the article makes it clear which is the case.
That said, there are definitely some mechanisms that could lead an outside group to higher accuracy on these sorts of things. The two biggest are:

1. The insiders are simply that bad, for whatever reason, and random chance can easily beat them.

2. The failures of the insiders are correlated and overwhelm their theoretically superior information. I'd especially finger potential ideological biases there, which could affect both information filtering and the analysis. (It is one thing to believe the world should be a certain way as a result of one's ideology... it is quite another to believe the world must already be a certain way as a result of one's ideology, even when one is staring evidence in the face that it is not. One is noble and laudable... the other somewhat less so.)
I'm still not convinced these folks aren't just outliers. People in all walks of life give far too much credence to lucky outliers - CEOs are hired that way, ball players, stock portfolios, sports predictors. If you make enough predictions and cherry-pick the best, you can always find a 'miracle man'.
Yes, that was my point. The article failed to make it clear. It is worded such that it could go either way.
I agree they may be outliers. However, I can't dismiss out-of-hand the possibility that they are actually better. Outliers are common... but so are systematic/correlated weaknesses within large organizations.
Indeed, that's such a big problem that there's a sort-of famous book written by a CIA analyst, "Psychology of Intelligence Analysis", that gives a pretty clear breakdown of how and why analysts fail to reason correctly about situations.
It also included examples of how Robert Gates (yes, the later SECDEF) managed to improve analyst reporting by actually reading the reports submitted and figuratively shredding the ones which were bad.
Before that there had been effectively no downside to submitting reports which were crap, intellectually speaking, and eventually things had become just professional-sounding educated guesses.
Large agencies don't have the benefit of small groups self-critiquing their own performance, so they have to try very hard to set and enforce standards of quality. When they don't do this it should not be surprising when quality drops.
This makes me wonder how well analysts do when they aren't given classified intelligence information. It's entirely possible that the analysts' skills are being hamstrung by information they think they know but that isn't actually true.
An excellent book on just this subject
So, do you really think the researchers in this case hadn't thought of this? I mean yeah, it's OK to doubt and inquire, but come on, this is so rudimentary...
> If the questions are all yes/no questions
The article says (or at least implies) that the subjects give probabilities of "yes" to each question.
EDIT: Oh, and to add, the article clearly says they're not just making shit up and randomly coming up with probabilities. They collect open source info on events and base their estimates on that. So it's not just blind guessing.
See response here as to why they're still yes/no questions (TLDR; the outcomes are yes/no) - https://news.ycombinator.com/item?id=7515824
If you need to know more, try reading the article? It says that they're not yes/no questions.
I won't go into details about it because it's not my place to do so at this point (maybe after).
But I will confirm that for every response you give, you're required to enter a percentage estimate of likelihood. For example, you'd enter 90% or 72.212% or whatever on whichever question you're responding to. So there's a potential mechanism for further ranking of participants beyond the binary. The voting mechanism itself is more complicated, but again, I'll leave discussion for when it's over.
If this is the case, then it's reasonable to assume the CIA's statisticians would have done the analysis and know the reason these "superforecasters" are better: doubt.
I guess the reverse is also possible: professional intelligence analysts are systematically tending towards being overcautious and tend to pick numbers towards the middle of the range, either out of a desire not to look silly or because they're more aware of policy implications. But subjectively I'd assign that a lower probability.
It may be that they're weighting it based on confidence level (so saying 0% chance on something that happens counts against them, while saying 49% counts against them less), but it's still a yes/no question in the sense that, given enough people and a random distribution of answers, you would expect some subset to always be right. The 'amount' of right changes (one person said a 51% chance of it happening, another said 100%), but they both got it right.
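The article doesn't say what scoring rule is actually used, but a Brier score (squared error on the stated probability) is one standard rule that captures exactly that intuition. Rough sketch:

```python
def brier_score(forecast: float, outcome: int) -> float:
    """Squared error between the stated probability and what happened
    (outcome is 1 if the event occurred, 0 if it didn't). Lower is better."""
    return (forecast - outcome) ** 2

# Event happens (outcome = 1):
print(brier_score(0.00, 1))  # 1.00: "no chance" on something that happened is maximally penalized
print(brier_score(0.49, 1))  # 0.26: a hedged wrong call hurts far less
print(brier_score(0.51, 1))  # 0.24: a barely-right call earns only slightly more credit
print(brier_score(1.00, 1))  # 0.00: a confident correct call scores best
```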
In my undergrad ecology class, we were assigned a scent-based laboratory experiment which involved all the male and female members of the class wearing the same white t-shirt for 48 hours in order to accrue bodily odors-- heavy workouts, perfume, showering, etc were all prohibited. At the end, everyone in the class sniffed all of the t-shirts and guessed which t shirt had been worn by what sex.
Most people guessed at about chance (50-50), and a few guessed slightly better than chance. I was far worse than chance, guessing only 10% correctly. In a way, anti-forecasting is a type of forecasting, too. If someone had asked me for my guess and then guessed the opposite, they'd have had superior guessing power. This paradigm falls apart quickly when the chances of events aren't so clear, however.
 - http://scholar.google.com/scholar?cluster=144091158652582285...
e.g. "if <insert presidential candidate> is elected, that's gonna be the end of America" with similarly failed predictions over the next four years when that candidate is elected.
What's neat about SciCast is that you can make predictions conditional on how other questions turn out.
e.g. I think that if the price of bitcoin exceeds 1000 by 2015, at least one presidential candidate will accept bitcoin donations. If BTC ends up below 1000, I make no prediction.
A lot of these federal positions are military. So I assumed it was some enlisted guy out of high school that didn't want a desk job but wants to be firing guns. He's been doing it years and wants out. He's not invested in any of this analysis.
Compared to the guy that self-selected to sign up for an analysis website as a hobby. He finds it fun to think and analyze. He gets points for being right and has a score to improve.
The other factor is workloads. If a CIA analyst has to make X decisions in Y time, then they may be rushing.
I read this story as a failure of federal services to reward successful performers and remove failing performers.
The more involved you are, and especially the more accountable you are held for your decisions, the more you're likely to second guess yourself. There's normally a reason your second guess wasn't your first guess.
Are they averaging everyone's answer (wisdom of the crowd)? Are they looking back at who got lucky and declaring them "super forecasters" (and then seeing large regressions to the mean)? Did they have separate periods of testing to try and control for this? Did they have a control group of similar size just flipping coins?
The way this is written it sounds like someone used a Gladwell book as their entire science education.
Edit: Oops, I misunderstood. Apparently Good Judgment also has a prediction market group inside of it. Thanks nbouscal.
It's probably much easier to reach a successful prediction when a lot of the framing has already been done for them. If you take the Ukraine question, for example, there's already a lot of intelligence encapsulated, such as the City and Date when the CIA is expecting a Russian invasion. As a post mentions below, the questions may even be phrased/biased towards the right answer.
I wonder if they would be as good at prediction if they didn't have the resources and intelligence of the CIA providing them with the correct focus, prioritization, and framing.
This may fall under:
" Otherwise please use the original title, unless it is misleading or linkbait. "
Thanks for the link, it's been a few years since I read those rules.
The book also briefly covers the Policy Analysis Market, a program to experiment with using a futures market to help predict terrorist attacks. As this program used real money, and offered the opportunity to make money off of terrorist attacks, some members of Congress were outraged and the program was immediately cancelled (http://news.bbc.co.uk/2/hi/americas/3106559.stm). The version discussed in this NPR article seems more publicly palatable. In fact, it looks like the similarity was already reported: http://articles.latimes.com/2012/aug/21/nation/la-na-cia-cro...
By the way, can anyone recommend a more rigorous book than Wisdom of Crowds on this subject?
Crowds (as in mobs) aren’t wise; large groups of independent decision makers are wise.
Hopefully this will scale more broadly, continue to improve its incentives, and increase our capability to prepare for future events.
If you are interested, some of the first people to work on the problem were Dawid & Skene, back in 1979. They were trying to solve the issue of clinicians talking to patients and asking how they are: each time they asked, the patient might give a slightly different answer. The goal was to estimate the "true" answer from a number of "noisy" answers (sound familiar after reading the OP?)
The general intuition for most future research into the topic is that if you tend to agree with others more often than not, your score goes up. If your score is higher, your answers are trusted more.
Then there are a whole bunch of interesting additions to this principle, such as factoring in question difficulty  (e.g. questions which have the most disagreement can be considered more difficult, and hence you may want to ask more people about these questions to get a clearer picture). It is also helpful to seed certain questions for which the answer is "known" (may or may not be possible, depending on the project). Then you can use the answers to these questions to help score people even more accurately.
An even more interesting approach is to take into account people who answer adversarially, that is, they always answer incorrectly (either due to malicious intent, or because they consistently misinterpret the questions in a backwards fashion). If you have somebody who answers incorrectly 90% of the time, then they provide almost exactly the same information as someone who answers correctly 90% of the time. All you need to do is flip their vote once you've identified this.
The whole field is very fascinating, and there seem to be more and more publications coming out with interesting approaches to the problem.
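This isn't Dawid & Skene's actual EM procedure, just a toy sketch of that flip-and-weight intuition with made-up raters and accuracies:

```python
def weighted_vote(answers, accuracy):
    """Combine yes/no votes, weighting each rater by how far their past
    accuracy sits from chance and flipping the reliably-wrong ones."""
    score = 0.0
    for rater, vote in answers.items():
        acc = accuracy[rater]
        if acc < 0.5:
            vote = not vote          # someone wrong 90% of the time is as
        weight = abs(acc - 0.5)      # informative as someone right 90% of the time
        score += weight if vote else -weight
    return score > 0

# Accuracies estimated from seeded questions with known answers (all hypothetical)
accuracy = {"alice": 0.9, "bob": 0.55, "carol": 0.1}
answers = {"alice": True, "bob": False, "carol": False}
print(weighted_vote(answers, accuracy))  # True: carol's "no" flips to "yes" and joins alice
```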
It was a good call to talk about the scientific basis for the idea; I wish more places that relied on crowdsourcing would do the same thing.
Not really... For example, have you heard of the Cicada puzzle? That is solved by groups of people, and probably wouldn't be solved if it was just one person.
Multiple people with specialized knowledge > one person with specialized knowledge.
We didn't evolve to be social animals because it would make us stupid...
When the story title is linkbait and there are no good subtitles, we sometimes draw on the first sentence of the story. In this case I used the description under the photo. The key thing is to use only language that's already there.
It would be interesting to see how small the group of superforecasters would need to get before their accuracy falls below that of the CIA analysts.
But then they turned left onto stupid street. They took the people who, perhaps by chance alone, predicted results closest to the crowd mean, and put them on a "team". What the heck is the point of that? It just reduces the size of the crowd and shows a fundamental misunderstanding of statistics.
Past performance does not necessarily predict future results.
Start with 10000 coins, and flip them all. Keep the ones that land heads. Flip those, and keep the ones that land heads. Repeat until all coins land on tails, or one coin remains as the "champion of heads." What is the probability that when it is flipped again, it comes up heads? Still 50%. It isn't a supercoin. You just managed to select it from a giant group by making it the sole outlier from a series of independent trials.
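If you want to convince yourself, here's a quick simulation of that exact procedure (the numbers are made up):

```python
import random

def champion_next_flip(n_coins=10_000):
    """Eliminate every coin that lands tails, round after round, until one
    'champion of heads' remains, then flip that champion once more."""
    coins = n_coins
    while coins != 1:
        coins = sum(random.random() < 0.5 for _ in range(coins))
        if coins == 0:                 # everyone landed tails; start over
            coins = n_coins
    return random.random() < 0.5       # the champion's next flip

flips = [champion_next_flip() for _ in range(1000)]
print(sum(flips) / len(flips))         # ~0.5: it was never a supercoin
```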
Though you should probably check to make sure it actually has a tails side, just in case.
Why do you assume that what's in the NPR article is all there is to it, that there's no info, technical or other, left out?
Of course they thought of that, and checked. There's no regression to the mean; the elite group further improved, and their estimates were 65% better than the overall crowd.
e.g. just because half of the people are watching (say) CNN and the other half are watching (say) Fox News doesn't magically mean that the average of their opinions will be correct. What if both news sources overplayed the likelihood of terrorist attacks? How would the crowd adjust to this?
>But also, if you take a large crowd of different people with access to different information and pool their predictions, you will be in much better shape than if you rely on a single very smart person, or even a small group of very smart people."
Both points are clearly true. But it's obvious that the CIA, and probably anything political, has problems "keeping score," not getting more voices.
This just reminds me all too much of the Twitch Plays Pokémon phenomenon that took place last month. The fact that they actually are completing games is, frankly, astounding.
Anyway, I think all this argues in favor of Robin Hanson's concept of "Futarchy":
Maybe the first step is changing the voting for representatives from yes/no to a 1-100 scale of how much you want this person to win.
Whereas if they put these super-forecasters in the CIA and said "if you don't predict what I want, it's off to Afghanistan with you," they would soon regress to the mean.
This part threw me off. But overall, crowdsourced estimates could be a useful tool, provided everyone is actually trying to guess correctly (motivated by national pride, a monetary reward, etc.).
Unless they publish the detailed numbers, we can't be sure of what's going on here.
If some person with access only to normal sources is able to perform so well against people with access to lots of classified sources and information, what are we to conclude?
Either they are just lucky or the people at the CIA aren't as competent as we think they are.
If we are unwilling to accept any of these theories then we need to know what makes these people good at it. I don't see how just plain simple analysis can beat people at the CIA.
With enough data, you can apply an automatic adjustment to even the most horrible predictors, to make their contribution to the final result more useful. If, for instance, someone always predicts exactly 10% more favorable results for Israel, independently of the facts, you can just reverse that for that one person on every question mentioning Israel, to produce a bias-adjusted result. That is tremendously difficult, though, and not nearly as useful as just increasing the size of the crowd.
Which is the whole point. With a big enough crowd, you don't need to do that at all, because the stupid little biases unrelated to the facts tend to be noise, not signal.
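A toy sketch of that per-person correction; the rater, the topic, and the ten-point lean are all invented for illustration:

```python
def debias(forecast: float, estimated_bias: float) -> float:
    """Subtract a rater's estimated systematic lean and clamp to [0, 1]."""
    return min(1.0, max(0.0, forecast - estimated_bias))

# Hypothetical rater who historically runs about 10 points too favorable on one topic:
print(debias(0.72, 0.10))  # 0.62
print(debias(0.05, 0.10))  # 0.0 (clamped at the boundary)
```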
Does anyone know what these methods might be?
Take their example question - 'Will North Korea launch a new multi-stage missile before May 1, 2014?' That's a yes/no question. But none of the participants knows the answer. They are just trying to forecast the future based upon the balance of probabilities. So if NK does launch a missile, the people who answered 'no' were wrong, but the reasons upon which they came to pick 'no' could still have been right. So giving feedback to these people and telling them that they were wrong does not magically aid/improve predictions.
Perhaps a simpler example could be used. 'Will the next roll of this dice score 3?' Well, it's obviously more likely that some other result will happen. But if the 3 does come up, you can't say that all those people who said 'no' are worse at predictions...
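One way to judge such forecasters fairly is calibration over many questions, rather than right/wrong on a single outcome. This is just an illustrative sketch, not anything the project describes:

```python
import random
from collections import defaultdict

def calibration(forecasts, outcomes, bins=10):
    """Bucket forecasts by stated probability and report how often the
    event actually happened in each bucket; a well-calibrated forecaster's
    buckets match their labels even when individual calls look 'wrong'."""
    hits, totals = defaultdict(int), defaultdict(int)
    for p, happened in zip(forecasts, outcomes):
        b = min(int(p * bins), bins - 1)
        totals[b] += 1
        hits[b] += happened
    return {b / bins: hits[b] / totals[b] for b in sorted(totals)}

# Saying "1/6" on every fair-die roll of 3 is never "right" on a single roll,
# but across many rolls it is exactly right:
rolls = [random.randint(1, 6) == 3 for _ in range(6000)]
print(calibration([1 / 6] * len(rolls), rolls))   # {0.1: ~0.167}
```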
I set up an experiment to find the people best at guessing Zener cards picked at random.
I surveyed 100 people and the best 5 did notably better than the others. I dubbed these people "clairvoyant". (Spoiler: on retesting, they weren't, but then again I only followed this process once...).
Wisdom of the crowds will work just fine for anything modestly obvious, and it will fail for any black swan events that intelligence actually needs to be aware of - if the predictors even get asked about those.
Where he explores the ability to spin up pools of analysts like a cloud server.
I would hate for data from this organization to be used for any sort of real decision making process. You might as well hire a psychic.
Additionally, there is the "put your money where your mouth is" aspect of it. While every candidate in the primaries is announced as "the next president of the United States," you can see who people really think will win based on which futures they purchase.
The CIA effort is more like taking a whole lot of people, asking them to predict something and seeing who is good at it. There appears to be some manipulation of those guesses across the breadth of the crowd but it is not mentioned. In contrast to futures markets, the players in the CIA effort have no stake in the game.
The same piece of info, if public, will have to stand on its own, without the bias introduced by restricted access / multiple levels of filtering / the attached opinions of people wearing suits.