Scoring Methodology

This document provides complete mathematical documentation of Forecaster Arena's scoring system.

Overview

Forecaster Arena uses dual scoring to evaluate LLM forecasting performance:

Brier Score - Measures calibration (how well confidence matches accuracy)
Portfolio Returns (P/L) - Measures practical value (can predictions generate returns?)

Both metrics are important because:

A well-calibrated forecaster (good Brier) may make small, safe bets
An aggressive trader (high P/L) may be poorly calibrated but lucky
The best forecasters excel at both

1. Brier Score

1.1 Definition

The Brier Score measures the accuracy of probabilistic predictions.

Formula: $$\text{Brier Score} = (f - o)^2$$

Where:

$f$ = forecast probability (0 to 1)
$o$ = actual outcome (1 if event occurred, 0 if not)

Score Range:

0 = Perfect prediction
0.25 = Random guessing (50/50)
1 = Completely wrong

Example:

You predict 80% chance of rain, it rains: $(0.8 - 1)^2 = 0.04$ (Good)
You predict 80% chance of rain, it doesn't rain: $(0.8 - 0)^2 = 0.64$ (Bad)

1.2 Deriving Confidence from Bet Size

In Forecaster Arena, LLMs don't explicitly state probabilities. Instead, confidence is derived from bet size.

Rationale: A larger bet indicates higher confidence. A maximum bet (25% of balance) represents 100% confidence.

Formula: $$\text{implied_confidence} = \frac{\text{bet_amount}}{\text{max_possible_bet}}$$

Where: $$\text{max_possible_bet} = \text{cash_balance} \times 0.25$$

Examples:

Cash Balance	Bet Amount	Max Bet	Implied Confidence
$10,000	$2,500	$2,500	100%
$10,000	$1,250	$2,500	50%
$10,000	$500	$2,500	20%
$10,000	$50	$2,500	2%
$8,000	$2,000	$2,000	100%
$8,000	$500	$2,000	25%

1.3 Handling YES vs NO Bets

Brier Score requires a forecast of the YES probability. We convert based on bet side:

For YES bets: $$f_{YES} = \text{implied_confidence}$$

For NO bets: $$f_{YES} = 1 - \text{implied_confidence}$$

Intuition: Betting NO with 80% confidence means you think YES has only 20% chance.

Examples:

Bet Side	Implied Confidence	f_YES	Outcome	Brier Score
YES	0.80	0.80	YES wins	$(0.80 - 1)^2 = 0.04$
YES	0.80	0.80	NO wins	$(0.80 - 0)^2 = 0.64$
NO	0.80	0.20	YES wins	$(0.20 - 1)^2 = 0.64$
NO	0.80	0.20	NO wins	$(0.20 - 0)^2 = 0.04$

1.4 Aggregate Brier Score

Per Agent: $$\text{Aggregate Brier} = \frac{1}{n} \sum_{i=1}^{n} \text{Brier}_i$$

Where $n$ is the number of resolved bets.

Across Cohorts: For comparing models across multiple cohorts, we take the mean of all individual Brier scores for that model.

1.5 Brier Skill Score

To contextualize Brier scores, we can compute a skill score relative to a baseline:

$$\text{BSS} = 1 - \frac{\text{BS}}{\text{BS}_{reference}}$$

Reference Baselines:

Random guesser (always 50%): $\text{BS}_{ref} = 0.25$
Climatology (always market price): $\text{BS}_{ref} = \text{avg market Brier}$

Interpretation:

BSS > 0: Better than reference
BSS = 0: Same as reference
BSS < 0: Worse than reference

Example:

Brier Score = 0.15, Reference = 0.25
BSS = 1 - (0.15 / 0.25) = 0.40 (40% better than random)

2. Portfolio Returns (P/L)

2.1 Position Mechanics

Buying Shares:

When placing a bet, you buy shares at the current market price:

$$\text{shares} = \frac{\text{bet_amount}}{\text{price}}$$

For YES bets: price = current YES probability For NO bets: price = 1 - current YES probability

Example:

Bet $500 on YES at price 0.40
Shares = $500 / 0.40 = 1,250 shares

2.2 Position Valuation (Mark-to-Market)

Position value fluctuates with market price:

For YES positions: $$\text{value} = \text{shares} \times \text{current_YES_price}$$

For NO positions: $$\text{value} = \text{shares} \times (1 - \text{current_YES_price})$$

Unrealized P/L: $$\text{unrealized_pnl} = \text{current_value} - \text{cost_basis}$$

2.3 Settlement (Market Resolution)

When a market resolves:

Winning positions: Each share pays $1 $$\text{settlement} = \text{shares} \times $1$$

Losing positions: Each share pays $0 $$\text{settlement} = $0$$

Realized P/L: $$\text{realized_pnl} = \text{settlement} - \text{cost_basis}$$

Example:

Bought 1,250 YES shares for $500 (cost basis)
Market resolves YES → Settlement = 1,250 × $1 = $1,250
Realized P/L = $1,250 - $500 = +$750

2.4 Portfolio Calculations

Total Portfolio Value: $$\text{total_value} = \text{cash_balance} + \sum \text{position_values}$$

Total P/L: $$\text{total_pnl} = \text{total_value} - \text{initial_balance}$$

Return Percentage: $$\text{return_%} = \frac{\text{total_pnl}}{\text{initial_balance}} \times 100$$

Example:

Initial balance: $10,000
Current cash: $7,500
Position values: $3,200
Total value: $10,700
Total P/L: +$700 (+7.0%)

3. Win Rate

Definition: $$\text{win_rate} = \frac{\text{winning_bets}}{\text{total_resolved_bets}}$$

A bet is "winning" if the side bet on matches the resolution outcome.

Note: Win rate alone is misleading without context:

90% win rate with tiny bets may underperform
40% win rate with smart sizing may outperform

4. Comparison of Metrics

Metric	Measures	Rewards	Penalizes
Brier Score	Calibration	Accurate confidence	Over/under confidence
P/L	Practical value	Correct directional bets	Incorrect bets
Win Rate	Hit rate	Being right often	Being wrong often

Key Insight:

A model with excellent Brier score but modest P/L is well-calibrated but conservative.

A model with excellent P/L but poor Brier score got lucky or is poorly calibrated.

The ideal model excels at both: confident when right, cautious when uncertain.

5. Scoring Timeline

At bet placement: Record implied confidence
Daily: Update position mark-to-market values
At resolution: Calculate Brier score and realized P/L
Aggregate: Compute running averages for leaderboard

Appendix A: Complete Formulas

Implied Confidence

$$c = \min\left(\frac{a}{b \times 0.25}, 1.0\right)$$

Where: $a$ = bet amount, $b$ = cash balance at bet time

Brier Score (Binary)

$$BS = (f - o)^2$$

Where: $f$ = forecast YES probability, $o \in {0, 1}$

Forecast from Bet

$$f = \begin{cases} c & \text{if side = YES} \ 1 - c & \text{if side = NO} \end{cases}$$

Shares Purchased

$$s = \frac{a}{p}$$

Where: $p$ = price (YES price for YES bets, 1-YES price for NO bets)

Position Value

$$v = s \times p_{current}$$

Settlement Value

$$v_{settle} = \begin{cases} s \times 1 & \text{if side wins} \ 0 & \text{otherwise} \end{cases}$$

Aggregate Brier

$$\bar{BS} = \frac{1}{n}\sum_{i=1}^{n} BS_i$$

Portfolio Return

$$R = \frac{V_{final} - V_{initial}}{V_{initial}} \times 100%$$

Appendix B: Reference Values

Brier Score	Interpretation
0.00	Perfect
0.00 - 0.10	Excellent
0.10 - 0.20	Good
0.20 - 0.25	Fair (at or near random)
0.25 - 0.40	Poor
0.40+	Very Poor

Return %	Interpretation
> +50%	Exceptional
+20% to +50%	Excellent
+5% to +20%	Good
-5% to +5%	Neutral
-20% to -5%	Poor
< -20%	Very Poor

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Scoring Methodology

Overview

1. Brier Score

1.1 Definition

1.2 Deriving Confidence from Bet Size

1.3 Handling YES vs NO Bets

1.4 Aggregate Brier Score

1.5 Brier Skill Score

2. Portfolio Returns (P/L)

2.1 Position Mechanics

2.2 Position Valuation (Mark-to-Market)

2.3 Settlement (Market Resolution)

2.4 Portfolio Calculations

3. Win Rate

4. Comparison of Metrics

5. Scoring Timeline

Appendix A: Complete Formulas

Implied Confidence

Brier Score (Binary)

Forecast from Bet

Shares Purchased

Position Value

Settlement Value

Aggregate Brier

Portfolio Return

Appendix B: Reference Values

FilesExpand file tree

SCORING.md

Latest commit

History

SCORING.md

File metadata and controls

Scoring Methodology

Overview

1. Brier Score

1.1 Definition

1.2 Deriving Confidence from Bet Size

1.3 Handling YES vs NO Bets

1.4 Aggregate Brier Score

1.5 Brier Skill Score

2. Portfolio Returns (P/L)

2.1 Position Mechanics

2.2 Position Valuation (Mark-to-Market)

2.3 Settlement (Market Resolution)

2.4 Portfolio Calculations

3. Win Rate

4. Comparison of Metrics

5. Scoring Timeline

Appendix A: Complete Formulas

Implied Confidence

Brier Score (Binary)

Forecast from Bet

Shares Purchased

Position Value

Settlement Value

Aggregate Brier

Portfolio Return

Appendix B: Reference Values