There's calibration, but you can also just look at contests that pit the community aggregate against individual forecasters and see who wins. The Metaculus aggregate was dominant in this contest predicting 2023 outcomes, for example: https://www.astralcodexten.com/p/who-predicted-2023
He's never lost a bet (with academics, researchers, pundits, etc.) in 17 years of public betting. You can see his current open bets here and predict whether he'll win.
There are several AI categories on the track record page; make sure you're not selecting just one of them or you'll miss a lot. There's a careful analysis of the overall track record on AI questions here:
https://www.metaculus.com/notebooks/16708/exploring-metaculu...
The short version is that the Brier score is much better than .25 for AI questions, and the weighted Metaculus Prediction is more accurate still.
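(For anyone unfamiliar: the Brier score is just the mean squared error between forecast probabilities and binary outcomes, so always guessing 50% scores exactly 0.25, which is why that's the chance baseline. A minimal sketch; the forecasts and outcomes below are made-up numbers, not Metaculus data:)

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better; always guessing 0.5 scores exactly 0.25, which is
    why 0.25 is the "chance" baseline.
    """
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical forecasts on five binary questions (illustrative only)
forecasts = [0.9, 0.1, 0.8, 0.3, 0.2]
outcomes = [1, 0, 1, 0, 0]

print(brier_score(forecasts, outcomes))   # 0.038
print(brier_score([0.5] * 5, outcomes))   # 0.25, the chance baseline
```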
> The short version is that the Brier score is much better than .25 for AI questions, and the weighted Metaculus Prediction is more accurate still.
I added more categories. One year out, the Brier score is 0.217. I agree that's better than chance, but is it "much better"?
That said, this is dominated by bad community predictions pre-2020, and there's not much recent data for binary questions. I agree that CRPS is better, but it's not clear to me from that link how early they're evaluating questions. Accuracy gets better closer to the resolution date; I'm claiming that longer-term predictions are shakier.
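(Since CRPS came up: for continuous and date questions it plays the role the Brier score plays for binary ones. A sketch of the standard empirical estimator, computed from a made-up forecast ensemble rather than anything Metaculus publishes:)

```python
def crps_ensemble(samples, observed):
    """Empirical CRPS for an ensemble forecast, via the identity
    CRPS(F, y) = E|X - y| - 0.5 * E|X - X'|, with X, X' drawn from F.

    Lower is better; it generalizes Brier-style scoring to
    continuous outcomes.
    """
    n = len(samples)
    spread_to_truth = sum(abs(x - observed) for x in samples) / n
    internal_spread = sum(abs(x - y) for x in samples for y in samples) / (n * n)
    return spread_to_truth - 0.5 * internal_spread

# Made-up ensemble for a "days until X" style question; not real data
samples = [300, 320, 340, 360, 420]
print(crps_ensemble(samples, observed=350))
```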
Can I see the list of questions used in this analysis somewhere? Is it literally just the set of questions I see when I filter for "Resolved" and "Artificial Intelligence"?
My impression from browsing that set of questions is that it's a mix of pretty trivial things like "how expensive will ChatGPT be?" or "when will Google release Bard?". There are very few questions in the bunch I'd even consider interesting, let alone ones where the Metaculus prediction appears to have offered any meaningful insight.
No, it also includes 'AI and Machine Learning,' 'Artificial Intelligence,' and 'Forecasting AI Progress.' The list of resolved questions is roughly this: https://www.metaculus.com/questions/?status=resolved&has_gro... (though that will include a few questions that have resolved since the analysis).
Anyone can forecast, sure. But there's a large body of research on the accuracy of aggregated forecasts and on the ability of forecasters to become more accurate with practice. (Thinking here in particular of work by Mellers & Tetlock.)
Metaculus provides a transparent track record of community forecasts here: https://www.metaculus.com/questions/track-record/. It's very difficult for any one person to consistently beat the community.
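(Metaculus doesn't publish the exact weighting behind the Metaculus Prediction, but the flavor of aggregation studied by Mellers, Tetlock, and others is easy to sketch: pool individual probabilities as a weighted mean of log-odds, then optionally extremize. The weights and extremizing factor below are illustrative assumptions, not Metaculus's actual scheme:)

```python
import math

def pool_forecasts(probs, weights=None, extremize=1.0):
    """Weighted mean of log-odds (a geometric mean of odds), optionally
    extremized by raising the pooled odds to a power.

    A standard baseline from the aggregation literature; NOT Metaculus's
    actual weighting, which isn't fully public.
    """
    if weights is None:
        weights = [1.0] * len(probs)
    total = sum(weights)
    mean_log_odds = sum(
        w * math.log(p / (1 - p)) for w, p in zip(weights, probs)
    ) / total
    return 1 / (1 + math.exp(-extremize * mean_log_odds))

# Five hypothetical forecasters on one binary question
community = [0.60, 0.70, 0.55, 0.80, 0.65]
print(pool_forecasts(community))                  # plain pooled probability
print(pool_forecasts(community, extremize=1.5))   # pushed further from 0.5
```

Extremizing reflects the empirical finding that pooled crowd forecasts tend to be underconfident, since independent forecasters each hold only part of the available evidence.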