Methodology

Most forecasters ask to be trusted. Meridian leaves a record.

The top of this page is for executive review. Live scores show what resolved recently. Resolved calls show examples. The method and audit appendix explain how the record is produced.

What makes it different

Compared to consultancy reports

We score against unseen years, not the past.
An accuracy score on every claim, not a narrative.
Thirty-year horizon. EIU publishes five.
Audit trail per cell, not an executive summary.

Compared to in-house models

Cross-domain calibration. Nine subjects on one score.
No retroactive tuning. Versions tracked separately.
ForecastBench-aligned. Replicable by an external team.
139 verified corrections, each dated and sourced.

Benchmark · 2020-2026 hold-out · 297 pairs

Accuracy

2020-2026 hold-out · n=297

5.61-point average miss on a 100-point scale

We hid the target years from the model, then scored what it predicted against what actually happened. Across 297 forecast-vs-actual pairs the average miss was 5.61 points on a 0-to-100 scale. A separate 25-year structural backtest covering 17,875 pairs comes in tighter at ±4.7 on the pre-2020 subset.

Things we track

9 subjects × 11 regions

99 forecast surfaces running 2020 to 2050

9 subjects times 11 regions makes 99 forecast surfaces. Each runs from 2020 to 2050 and updates as new data arrives. Tap any subject card below for sources and the regions covered.

Pairs graded

Target years unseen by the model

297 forecast-vs-actual pairs already scored

297 forecasts already scored against what actually happened. 33 per subject. 27 per region. Both more than enough for reliable inference. Every new forecast joins the registry.

30 yr

Forecast horizon

2020 to 2050

Thirty years forward, on every surface we run

Each of the 99 forecast surfaces runs from 2020 to 2050. The horizon a board needs to read its own decision against the regime it will live inside, not just the next quarter or fiscal year.

Production calibration · current

0.031

Brier score

7-day rolling · n=36 · live

Lower is better. 0.25 is a coin flip.

Calibration you can audit, not just trust.

Brier score measures how close a probability was to what actually happened. A perfect forecast scores 0. A fully confident wrong call scores 1. This window scores 0.031 on 36 resolved predictions in 1-day binary financial threshold templates (will SP500 close above X, will VIX be above Y, etc.). That figure is not directly comparable to Tetlock or ForecastBench numbers, which benchmark multi-month forecasts on different question mixes. Skill validation runs through a documented gate sequence: Brier Skill Score (BSS) vs persistence baseline, then BSS vs market-implied probability, then paper-trade P&L. See below for the gate state.
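A minimal sketch of the Brier score as described above, under the standard definition (mean squared difference between the forecast probability and the 0-or-1 outcome). The function name and the example inputs are illustrative and are not taken from the production code.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Illustrative inputs only (not registry data).
coin_flip = brier_score([0.5, 0.5], [1, 0])   # always saying 50%
confident_right = brier_score([0.97], [1])    # 97% on an event that happened
confident_wrong = brier_score([1.0], [0])     # 100% on an event that did not

print(coin_flip, confident_right, confident_wrong)
```

The coin-flip case lands on 0.25 and the fully confident wrong call on 1, which are the two reference points quoted above.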

Resolved

Predictions with outcomes · 30d

382 predictions resolved in the last 30 days

Live production loop. Templates score themselves as outcomes resolve, and the rolling Brier you see updates automatically. The number is small enough to read at a glance, large enough for honest inference. Multi-class bucket templates (BTC ranges) are excluded from this headline pending per-event multinomial-Brier aggregation; they are tracked separately.

Average miss: 5.61 points on a 0-to-100 scale. 297 forecast-vs-actual pairs across the 2020-2026 hold-out.

Most forecasters do not publish their accuracy. We do.

01 · Calls

Resolved calls from the production registry

Six examples from the last 30 days. Five confident wins and one under-confident call that still landed on the right side. The registry publishes both.

No.	Subject	Claim	Meridian	Resolved	Outcome	Source	Verdict	Brier
01	Economy	WTI crude will close above $80 a barrel	97% Yes	2026-05-19	Yes, $101.56	FRED DCOILWTICO	Right call · Strongly held	0.001
02	Economy	Weekly initial jobless claims will exceed 250,000	9% No	2026-05-19	No, 211,000	FRED ICSA	Right call · Strongly held	0.008
03	Money	Fed Funds Effective Rate will be above 4.0%	15% No	2026-05-19	No, 3.63%	FRED EFFR	Right call · Confidently held	0.023
04	Money	2Y-10Y Treasury spread will be positive (no inversion)	89% Yes	2026-05-19	Yes, +0.54	FRED T10Y2Y	Right call · Strongly held	0.013
05	Economy	VIX will exceed 30 (panic territory)	6% No	2026-05-19	No, 18.43	FRED VIXCLS	Right call · Strongly held	0.003
06	Money	Gold will close above $3,200 an ounce	65% Yes	2026-05-19	Yes, $4,587	Yahoo GC=F	Right call · Under-confident, right side	0.125

Sample of resolved calls from the production registry. The full registry, scoring details, and reasoning trail are shared with engaged clients under confidentiality.

02 · Accuracy

The 2025 test

03 · Live production registry

Rolling Brier on production templates, three windows

Live production registry

Snapshot 2026-06-16

Last 7 days

0.031

n = 36

7-day rolling window, registry snapshot

Murphy decomposition

0.0309: Calibration miss
0.2431: Resolution lift
0.2431: Climatology variance

By category

Category	n	Brier
Economy	11	0.010
Society	10	0.012
Money	6	0.065
Geopolitics	5	0.070
AI & compute	4	0.042

Last 14 days

0.032

n = 141

14-day rolling window, registry snapshot

Murphy decomposition

0.0122: Calibration miss
0.2298: Resolution lift
0.2494: Climatology variance

By category

Category	n	Brier
Money	63	0.008
Economy	43	0.045
Society	18	0.022
Geopolitics	9	0.083
AI & compute	8	0.110

Last 30 days

0.040

n = 382

30-day rolling window, registry snapshot

Murphy decomposition

0.0118: Calibration miss
0.2229: Resolution lift
0.2500: Climatology variance

By category

Category	n	Brier
Money	193	0.041
Economy	146	0.023
Society	22	0.029
Geopolitics	12	0.152
AI & compute	9	0.163

Three rolling windows on the production registry. The Murphy decomposition and per-category breakdown sit underneath each headline number.
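The published figures in the three windows can be cross-checked with simple arithmetic. The sketch assumes two things that the page does not state outright: that each by-category row reads as a count followed by a Brier value, and that the Murphy terms combine in the standard way (calibration minus resolution plus climatology variance). The dictionary layout and function names are illustrative.

```python
# Figures transcribed from the three registry windows above.
# categories: (n, Brier) per category row.
# murphy: (calibration miss, resolution lift, climatology variance).
WINDOWS = {
    "7d": {"n": 36, "headline": 0.031, "murphy": (0.0309, 0.2431, 0.2431),
           "categories": [(11, 0.010), (10, 0.012), (6, 0.065), (5, 0.070), (4, 0.042)]},
    "14d": {"n": 141, "headline": 0.032, "murphy": (0.0122, 0.2298, 0.2494),
            "categories": [(63, 0.008), (43, 0.045), (18, 0.022), (9, 0.083), (8, 0.110)]},
    "30d": {"n": 382, "headline": 0.040, "murphy": (0.0118, 0.2229, 0.2500),
            "categories": [(193, 0.041), (146, 0.023), (22, 0.029), (12, 0.152), (9, 0.163)]},
}


def pooled_brier(categories):
    """n-weighted mean of the per-category Brier scores."""
    total = sum(n for n, _ in categories)
    return sum(n * brier for n, brier in categories) / total


def murphy_total(calibration, resolution, uncertainty):
    """Standard Murphy identity: Brier = calibration - resolution + uncertainty."""
    return calibration - resolution + uncertainty


for name, window in WINDOWS.items():
    count = sum(n for n, _ in window["categories"])
    print(name, count,
          round(pooled_brier(window["categories"]), 4),
          round(murphy_total(*window["murphy"]), 4))
```

Under those assumptions the category counts sum to the stated n in every window, and both reconstructions land within about 0.002 of each headline. The residual is consistent with the inputs being rounded to three or four decimals.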

04 · Long window

Twenty-five years of structural accuracy

The longer window

17,875 pairs · 25 years

Long-window structural accuracy across the five subjects with continuous public data back to the year 2000. Aggregate ±4.7 MAE across 10,450 pre-2020 pairs.

Subject	Shock 2020–24 MAE	Pre-2020 MAE	Rank ρ
Climate	±5.2	±3.0	0.97
Society	±7.0	±3.5	0.86
Education	±5.4	±5.2	0.92
Economy	±7.4	±5.5	0.70
Geopolitics	±8.7	±6.4	0.95

The scorecard is computed only over 2020 to 2024. That window is the compound-shock era: COVID, Russia-Ukraine, Iran-Israel, sovereign debt repricing, AI capability acceleration all overlapping at once. Expanding the test back 25 years gives a long-window view that includes the 2008 GFC, 9/11, Crimea 2014, Brexit, and the 2018 to 2019 trade war as named single-shock events.

The structural slice is the pre-2020 subset where both base year and target year fall before 2020. Pre-2020 is not calm in the absolute sense; it is calmer relative to the 2020 to 2024 overlap of disruptions. "Long-window structural" is the accurate phrasing.

Aggregate across all five subjects. 17,875 forecast-vs-actual pairs over the full 25-year window come out to a weighted MAE of ±6.1, or ±4.7 on the 10,450-pair pre-2020 structural subset.
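As a worked check on the pre-2020 aggregate, the sketch weights each subject's pre-2020 MAE from the table by its pair count. The equal split of 10,450 pairs across the five subjects (2,090 each) is an assumption made for illustration; the page gives only the total.

```python
# Pre-2020 MAE per subject, taken from the table above.
PRE_2020_MAE = {
    "Climate": 3.0,
    "Society": 3.5,
    "Education": 5.2,
    "Economy": 5.5,
    "Geopolitics": 6.4,
}

# Assumption: the 10,450 pre-2020 pairs split evenly across subjects.
PAIRS_PER_SUBJECT = {subject: 10_450 // len(PRE_2020_MAE) for subject in PRE_2020_MAE}


def weighted_mae(mae_by_subject, pairs_by_subject):
    """Pair-weighted mean of the per-subject MAE values."""
    total_pairs = sum(pairs_by_subject.values())
    weighted_sum = sum(mae_by_subject[s] * pairs_by_subject[s] for s in mae_by_subject)
    return weighted_sum / total_pairs


aggregate = weighted_mae(PRE_2020_MAE, PAIRS_PER_SUBJECT)
print(round(aggregate, 2))
```

With equal weights the result is about 4.72, which rounds to the ±4.7 quoted for the structural subset.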

Why these five. Climate, society, education, economy, and geopolitics are the subjects with continuous public data back to 2000 across all 11 regions. Tech subjects don't have 25 years of base values, so their hold-out numbers stand on their own.

Sources. V-Dem v16 indicators (1900 to present), World Bank WDI macro and environmental series (1960 to present), editorial event overlay for documented shocks.

05 · What we track

Nine subjects, eleven regions, thirty years

99 forecasts in motion.

Money
Debt service rising, monetary regimes unsettled.
2020 to 2050
What it covers
Government debt, currency stability, capital flows.
Sources
IMF, BIS, World Bank, national central banks
Region coverage
All 11 regions, continuous 2020 to 2050
AI & compute
Capability compounding past AGI, into ASI, no plateau in sight.
2020 to 2050
What it covers
Top-end AI capability, who controls the compute, where it gets deployed.
Sources
Epoch AI, Stanford HAI, regulatory filings, public capex disclosures, METR task-length benchmarks
Region coverage
All 11 regions, with a separate view of top AI clusters
Tech frontier
Long-cycle technologies arriving in compressed sequence.
2020 to 2050
What it covers
Quantum computing, biotech, energy transition, long-cycle resets.
Sources
NIH, NSF, IEA, journal publication rates, public capex
Region coverage
All 11 regions, plus quantum and biotech sub-views
Society
Trust eroding, accelerating through post-labor shock.
2020 to 2050
What it covers
Trust in institutions, demographics, cohesion vs fragmentation.
Sources
Edelman Trust Barometer, World Values Survey, national census
Region coverage
All 11 regions, with an institutional vs societal split
Geopolitics
Volatility rising sharply through the 2032 governance crisis.
2020 to 2050
What it covers
Alliances, conflict probability, how governments behave.
Sources
ACLED, CSIS, sanctions registries, public defense spending
Region coverage
All 11 regions, two-country and multi-country views
Climate
Physical risk rising; ASI mitigation cascade still past the 2050 horizon.
2020 to 2050
What it covers
Emissions, physical risk, how fast the transition is happening.
Sources
NOAA, IPCC AR cycles, IEA, national emissions inventories
Region coverage
All 11 regions, with an ecology sub-view
Economy
Disruption peaks at 2030 post-labor shock, restructured 2033.
2020 to 2050
What it covers
Growth, jobs, productivity, which sectors are rising and falling.
Sources
World Bank, OECD, IMF WEO, national statistical agencies
Region coverage
All 11 regions, with sector-level views
Education
Traditional value falling as marginal cost of knowledge approaches zero.
2020 to 2050
What it covers
Schooling, skills, talent flows, attainment by region.
Sources
UNESCO, OECD PISA, World Bank Ed Stats, national education ministries
Region coverage
All 11 regions, attainment and skills-gap sub-views
Positive signals
Quiet trajectories the foresight literature underweights.
2020 to 2050
What it covers
Underweighted good news most foresight leaves out.
Sources
WHO, UNESCO, Our World in Data, World Bank
Region coverage
All 11 regions, the counter-narrative view

06 · How it works

Four checks, all published, all auditable

Tap any card for the technical detail behind the plain-language summary.

01
We predict a year we haven't seen.
Train on data through the 2020 base year, predict 2022, 2024, and 2026, score against what actually happened. 297 forecast-vs-actual pairs. No data leakage.
The technical name is a held-out backtest. The model only sees data on or before a chosen base year, projects forward to a target year it has never seen, then gets scored against what actually happened.
Current sweep: base year 2020, target years 2022, 2024, 2026. That produces 297 forecast-vs-actual pairs at the configuration behind the 5.61-point miss.
An expanded economy-only sweep over 2000 to 2024 base years adds 3,575 pairs, using V-Dem and World Bank WDI to backfill. That sweep produces the structural-vs-shock split shown below.
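A minimal sketch of the held-out backtest loop described above. The model is handed only the observations at or before the base year, and each forecast is paired with the actual value for a later target year. The persistence forecaster, the function names, and the sample index values are hypothetical placeholders, not the production model or data.

```python
def persistence_model(history, target_year):
    """Placeholder forecaster: carry the last visible value forward."""
    return history[max(history)]


def held_out_pairs(series, base_year, target_years, model):
    """Build (forecast, actual) pairs; the model sees data up to base_year only."""
    visible = {year: value for year, value in series.items() if year <= base_year}
    pairs = []
    for target in target_years:
        if target <= base_year:
            raise ValueError("target year must come after the base year")
        if target in series:  # score only targets that have an actual value
            pairs.append((model(visible, target), series[target]))
    return pairs


def mean_absolute_error(pairs):
    """Average miss in points on the 0-to-100 scale."""
    return sum(abs(forecast - actual) for forecast, actual in pairs) / len(pairs)


# Hypothetical 0-to-100 index values for one subject-region surface.
surface = {2018: 50.0, 2019: 52.0, 2020: 51.0, 2022: 55.0, 2024: 58.0}
pairs = held_out_pairs(surface, 2020, [2022, 2024, 2026], persistence_model)
miss = mean_absolute_error(pairs)
print(pairs, miss)
```

Filtering the history before the model is called is the design choice that matters here: leakage is ruled out by construction rather than by convention, and a target year with no actual value yet (2026 in this toy series) simply produces no pair.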
02
We grade confidence, not just correctness.
You get penalized for being wrong, and also for being overconfident when you shouldn't be. The score breaks down into three terms that show why it moved.
The score itself is the Brier score, the mean squared error between the forecast probability and the 0-or-1 outcome. A rolling 7-day Brier runs continuously across all resolved predictions. A regression alert fires if it worsens by more than 0.066 against the trailing baseline.
To find why a Brier score moves, we apply the Murphy decomposition. It splits the score into three terms, with Brier = calibration − resolution + uncertainty:
Calibration. When you say 70%, does it happen 70% of the time? Lower is better.
Resolution. Can you tell different cases apart, or is every forecast roughly the same? Higher is better.
Uncertainty. How variable were the outcomes themselves? This is the climatology variance shown in the registry. It depends on the base rate, not on the forecaster.
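A minimal sketch of the Murphy decomposition, assuming the standard three-term form in which the third term is outcome uncertainty (the climatology variance reported in the registry windows). Forecasts are grouped by their exact probability value, so the identity holds exactly. The function name and sample data are illustrative.

```python
from collections import defaultdict


def murphy_decomposition(probs, outcomes):
    """Return (calibration, resolution, uncertainty) for binary forecasts.

    Brier score = calibration - resolution + uncertainty when forecasts
    are grouped by their distinct probability values.
    """
    n = len(probs)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for p, o in zip(probs, outcomes):
        groups[p].append(o)
    calibration = 0.0
    resolution = 0.0
    for p, group in groups.items():
        hit_rate = sum(group) / len(group)
        calibration += len(group) * (p - hit_rate) ** 2
        resolution += len(group) * (hit_rate - base_rate) ** 2
    uncertainty = base_rate * (1 - base_rate)
    return calibration / n, resolution / n, uncertainty


# Illustrative data: the 80% calls come true 3 times in 5, the 20% calls 1 in 5.
probs = [0.8] * 5 + [0.2] * 5
outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
calibration, resolution, uncertainty = murphy_decomposition(probs, outcomes)
brier = sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)
print(calibration, resolution, uncertainty, brier)
```

In this toy sample the 80% bucket is overconfident, which shows up as a non-zero calibration term, while the 20% bucket is exactly calibrated.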
03
We combine five forecasters, not one.
Five model families, each from a distinct lineage. Members are weighted by their track record, not by reputation or how recent they are.
The technical name is a cross-family ensemble. Five forecasters drawn from different model families, combined with a skew-adjusted aggregation method (Powell, Satopää, MacKay, Tetlock, 2024). An evolution of the techniques developed during the IARPA forecasting tournaments.
Forecasters that miss more often get less weight, based on track record. Not by reputation. Not by how recent the work is.
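To illustrate weighting by track record, the sketch uses plain inverse-Brier weights. This is a deliberately simpler stand-in and not the skew-adjusted aggregation cited above; the family names, Brier histories, and forecasts are all hypothetical.

```python
def inverse_brier_weights(track_record, floor=1e-6):
    """Weight each member by 1 / historical Brier, normalised to sum to 1."""
    raw = {name: 1.0 / max(brier, floor) for name, brier in track_record.items()}
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}


def combine(forecasts, weights):
    """Weighted average of the members' probabilities for one question."""
    return sum(weights[name] * p for name, p in forecasts.items())


# Hypothetical historical Brier scores for five model families.
track_record = {
    "family_a": 0.04,
    "family_b": 0.08,
    "family_c": 0.10,
    "family_d": 0.16,
    "family_e": 0.20,
}
# Hypothetical probabilities from each family on a single question.
forecasts = {
    "family_a": 0.90,
    "family_b": 0.80,
    "family_c": 0.85,
    "family_d": 0.60,
    "family_e": 0.55,
}

weights = inverse_brier_weights(track_record)
ensemble_probability = combine(forecasts, weights)
print(weights, ensemble_probability)
```

The member with the lowest historical Brier receives the largest weight, so the combined probability sits closer to its forecast than a simple average would.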
A second test captures what an error score alone misses: a model that is consistently off in one direction scores poorly on error but can still rank everything perfectly. Two rank correlation metrics, Spearman's ρ and Kendall's τ-b, measure that ordering skill directly. Current sweep: ρ = 0.906, τ-b = 0.759 across 297 pairs. Close to 1.0 means the rank ordering is nearly perfect.
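A small pure-Python sketch of the two rank metrics, assuming ρ is Spearman's rank correlation and τ-b is Kendall's tie-corrected tau. The example uses made-up values to show a forecast that is off by a constant 10 points yet ranks every case correctly.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with ties sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Concordant minus discordant pairs, with the tau-b tie correction."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt((untied + tied_x_only) * (untied + tied_y_only))


# Made-up values: every forecast is exactly 10 points too high.
actual = [20.0, 35.0, 50.0, 65.0, 80.0]
forecast = [a + 10.0 for a in actual]
mae = sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)
rho = spearman_rho(forecast, actual)
tau_b = kendall_tau_b(forecast, actual)
print(mae, rho, tau_b)
```

The average miss is 10 points while both rank metrics equal 1, which is the gap the rank test is meant to expose.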
04
Every number traces to a named public source.
139 verified corrections to historical baselines so far. Every cell in the data substrate is auditable.
Historical data is built from named, citable public sources, ingested through an audit-trailed pipeline. An ongoing recalibration process has applied 139 verified corrections to historical baselines to date, improving the ground truth against which every forecast is scored.
Each correction is dated, sourced, and linked to a specific historical cell. The audit register records what was changed, why, and against which public source the change resolves.
The forecast schema also aligns with ForecastBench (Karger et al. 2024), the open academic benchmark for forecasting systems. Any prediction can be re-scored from scratch against ground truth. Different model versions are tracked separately, so improvements are not accidentally credited to old work.

07 · Built for

Four decisions this is shaped for

Each card opens to the decision frame, what you walk away with, and where the rigor comes from.

01
Holds up under multi-year program review.
A persistent forecast memory with a Brier history per claim and an audit appendix any external reviewer can replicate.
The decision
Which long-horizon programs and partnerships are tracking as intended, and which need to be rethought before the next funding cycle. The calls that resolve over decades and have to defend themselves year after year.
What you walk away with
A forecast registry your board can inspect, year over year, with a calibration trend that improves as the program runs.
Where the rigor comes from
- Held-out backtest at 297 pairs (2020 → 2022/2024/2026)
- Brier score per claim with Murphy decomposition
- 139 verified corrections to historical baselines, each dated and sourced
- Rank correlation ρ = 0.906 across the 297 pairs
Replaces or augments
NIC Global Trends, CSIS, and RAND scenario work. Those disclaim prediction and refresh narrative scenarios every four years. This produces scored, continuously updated forecasts with the audit trail institutional review demands.
02
Reads many regions at once.
Nine subjects across eleven regions, running simultaneously. The intersections are the part most foresight work doesn't reach.
The decision
Where momentum is real across regions, and where it's local elite consensus that won't generalize. The cross-region read that tells you which signals reinforce each other across the map and which are isolated to one cluster.
What you walk away with
A one-page cross-region brief naming the three highest-confidence subject-region intersections for the question on your desk, with the data trail underneath.
Where the rigor comes from
- 9 subjects × 11 regions = 99 surfaces, each on the same 0-to-100 calibration
- Sources include IMF, BIS, World Bank, IPCC, IEA, Edelman, WHO, NOAA, ACLED, Stanford HAI, Epoch AI, UNESCO
- Positive signals tracked as a first-class surface, not an afterthought
- Cross-family ensemble of five forecasters, skew-adjusted aggregation
Replaces or augments
Verisk Maplecroft and other current-state index families. Those publish current-state only, no trajectory, no Brier. This adds the 30-year forward view and the per-claim accuracy score those don't produce.
03
Backs up the calls a commission has to vote on.
A held-out backtest, a Brier per claim, and a calm-window economy number of 5.5 points that survives appropriations review.
The decision
The tactical calls that get voted on and have to survive minutes-of-meeting scrutiny. Industry-mix shifts, workforce projection, infrastructure justification, appropriations defense. Settings where the question after the vote is always: where did this number come from?
What you walk away with
A forecast appendix with a Brier score per claim and a held-out backtest disclosure that survives audit-committee inspection. A scenario range for shock years, a point estimate for calm years.
Where the rigor comes from
- Held-out 2025 backtest: the model only saw data through 2024
- Murphy decomposition on every claim
- No retroactive tuning. Model versions tracked separately.
- 139 verified corrections, dated and source-linked
- ForecastBench-schema aligned (Karger et al. 2024)
Replaces or augments
Eurasia Group, EIU, S&P Global consulting reports for the board-and-commission context. Those publish no held-out backtest and no Brier on their methodology pages. This produces the verify-me artifacts that trust-me advisory cannot.
04
When to act on a number, when to plan for a range.
Calm regimes get a point estimate. Shock regimes get a scenario range. The calibration tells you which.
The decision
Industry-mix shifts. Site selection. Workforce projection. Whether to commit to a single sector outlook or hedge across a range. The calibration tells you when the regime is calm enough to act on a point estimate and when the regime is in shock and you need a scenario range.
What you walk away with
A forecast appendix that names the regime (calm or shock), gives you a number or a range with that regime's measured uncertainty, and carries a Brier history on the exact claim.
Where the rigor comes from
- Regime-aware backtest. 2000-2024 expanded economy sweep, 3,575 pairs.
- Calm window scored at ±5.5, shock window at ±9.8, separately
- Tightest measured slice: Tech · Quantum at ±2.8 on the 2025 hold-out
- Widest measured slice: Money · Crypto at ±8.9 on the 2025 hold-out
- Cross-family ensemble, weighted by track record not reputation
Replaces or augments
Oxford Economics (87 economies, economic-only, no Brier) and EIU (5-year horizon). This adds 30-year horizon and cross-domain STEEPE signals into the same calibration so an industry-mix call factors in the forces that bend it, not only its own historical trend.

Registry and calibration trend are shared with engaged clients under confidentiality.

Request an introduction →

Appendix

Audit reference

Snapshot date

2026-06-16

Benchmark accuracy

Base year 2020 → target years 2022, 2024, 2026. 297 forecast-vs-actual pairs. Model trained on data through the base year only.

Brier window

7-day rolling on the production loop, n=36 resolved predictions. Three windows reported (7d, 14d, 30d) for cross-section. Multi-class bucket templates (BTC price ranges) are excluded from the headline pending per-event multinomial-Brier aggregation, and are tracked separately in the audit module.

Long-window backtest

17,875 pairs across 25 years on five long-history subjects. Sources: V-Dem v16, World Bank WDI.

Reference scale and known gaps

Brier is computed on the standard scale (Brier 1950). External benchmarks like ForecastBench (Karger et al. 2024, arXiv:2409.19839) and Tetlock GJP score multi-month forecasting on questions with non-zero base-rate uncertainty. The registry above is dominantly 1-day binary macro thresholds; raw aggregate numbers are not directly comparable to those references. Skill is validated through the gate sequence below, not by direct Brier comparison.

Validation gates

A template ships to the live trade harness only after clearing, in order: (a) Brier Skill Score vs persistence baseline above +0.10 over a backtest of n ≥ 200; (b) BSS vs market-implied probability above +0.03 over the same; (c) net-positive paper-trade P&L over 60 days. Each gate is reproducible from the public audit module (brier-audit.ts in the exponentialworld repo) and is the published reason a template is or is not in live use.

Cohort status

v1-live-fred-20260504 — accruing, n=5 of 30 resolved
v3-cai-veda-r2-20260607 — no resolutions yet
v2-ic4wsa-blend-20260605 — no resolutions yet
pre-versioned — no resolutions yet

Template-version cohorts surface separately so the post-Round-2 macro-only cohort (v3-cai-veda-r2-*) accrues to N=30 before being scored, rather than being pooled with the legacy financial-heavy v2 into a single misleading headline. The N=30 floor follows Bradley, Schwartz, and Hashino (2008).

Audit access

Full registry, calibration trend, and per-cell correction log shared with engaged clients under confidentiality.
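The validation gates listed in this appendix can be sketched as an ordered check. The thresholds (+0.10, +0.03, n ≥ 200, net-positive 60-day paper P&L) come from the text; the Brier Skill Score uses its standard definition, 1 minus the model Brier over the reference Brier. The class, field names, and sample numbers are hypothetical and do not reflect the contents of brier-audit.ts.

```python
from dataclasses import dataclass


def brier_skill_score(brier_model, brier_reference):
    """BSS = 1 - BS_model / BS_reference; positive means the model beats the reference."""
    if brier_reference <= 0:
        raise ValueError("reference Brier must be positive")
    return 1.0 - brier_model / brier_reference


@dataclass
class TemplateEvidence:
    # Hypothetical container for one template's backtest results.
    n_backtest: int
    brier_model: float
    brier_persistence: float
    brier_market: float
    paper_pnl_60d: float


def first_failed_gate(evidence):
    """Return a label for the first gate that fails, or None if all clear."""
    if evidence.n_backtest < 200:
        return "a: backtest smaller than n = 200"
    if brier_skill_score(evidence.brier_model, evidence.brier_persistence) <= 0.10:
        return "a: BSS vs persistence not above +0.10"
    if brier_skill_score(evidence.brier_model, evidence.brier_market) <= 0.03:
        return "b: BSS vs market-implied not above +0.03"
    if evidence.paper_pnl_60d <= 0:
        return "c: 60-day paper P&L not net-positive"
    return None


# Hypothetical templates, for illustration only.
clears_all = TemplateEvidence(250, 0.040, 0.050, 0.042, 120.0)
stalls_at_market = TemplateEvidence(250, 0.040, 0.050, 0.041, 120.0)
print(first_failed_gate(clears_all), first_failed_gate(stalls_at_market))
```

Returning the first failed gate instead of a bare pass or fail mirrors the idea that the gate state is the published reason a template is or is not in live use.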
