Bet Impact Model Report
Overview
The Bet Impact Model helps you score, prioritize, and measure the effectiveness of strategic initiatives (bets) based on their expected or actual impact on metrics. The model supports two distinct modes: Forecast-Informed Mode for forecast-based scoring, and Observed-Effect Mode for measured-impact analysis using regression.
When to Use This Report
Use the Bet Impact Model report when you need to:
Prioritize Your Roadmap
- Problem: You have more ideas than resources and need to decide which initiatives to pursue
- Solution: Score all potential bets using expected impact, confidence, importance weighting, and effort to identify the highest ROI opportunities
Make Resource Allocation Decisions
- Problem: Multiple teams are competing for limited engineering, design, or budget resources
- Solution: Objectively compare bets across teams using a standardized scoring methodology to allocate resources where they’ll have the most impact
Balance Quick Wins vs. Big Bets
- Problem: You need a mix of short-term results and long-term strategic initiatives
- Solution: The effort-normalized scoring reveals both high-impact/low-effort quick wins and high-impact/high-effort transformational bets
Account for Uncertainty
- Problem: Some bets are proven concepts while others are speculative, but you’re comparing them equally
- Solution: Confidence levels and uncertainty bands risk-adjust your scores, so proven bets score higher than equally-sized but uncertain bets
Measure Actual Impact After Launch
- Problem: You launched an initiative but don’t know if it actually worked or if metrics changed due to other factors
- Solution: Observed-Effect Mode uses regression to isolate the true measured effect of your bet, controlling for confounding variables
Validate Your Predictions
- Problem: You want to improve your team’s ability to forecast impact
- Solution: Compare pre-launch expected impact (Forecast-Informed Mode) with post-launch measured impact (Observed-Effect Mode) to calibrate future estimates
Consider Strategic Importance
- Problem: A bet might affect a low-priority operational metric or a critical north star metric
- Solution: Importance weighting ensures you account for strategic value of the metrics being impacted
Build an Evidence-Based Culture
- Problem: Prioritization is based on opinion, politics, or who argues loudest
- Solution: Replace subjective debates with data-driven scoring that anyone can understand and trust
Track Portfolio Performance
- Problem: You’ve launched multiple bets but don’t know which ones delivered and which ones flopped
- Solution: Use Observed-Effect Mode to measure all your bets consistently and build a knowledge base of what works
What It Measures
Forecast-Informed Mode
Used before launching a bet to prioritize which initiatives to pursue.
- **Bet Total Score**
  - Aggregates impact across all metrics the bet influences, using a delta-magnitude weighted average of edge scores by default
  - Can be configured to use simple summation with `aggregationMode: "sum"`
  - Formula (per edge): `score = (|delta| × confidence_weight × importance_weight) / effort`
  - Higher scores indicate better return on investment
- **Per-Metric Impact**
  - Delta: The average change per forecast period in each metric over the horizon (not the cumulative total). For cumulative impact, multiply delta by the number of forecast points actually used (the lesser of the horizon and the available forecasts).
  - Percentage Change: The relative change from baseline (set to 0 when the baseline is near zero, to avoid extreme ratios)
  - Shows which metrics will be most affected by the bet
- **Confidence Weighting**
  - Based on your assessment of how certain you are about the impact
  - Levels: High (0.9), Medium (0.8), Low (0.6), None (0.5)
  - Reduces scores for uncertain bets to account for risk
- **Importance Weighting**
  - Derived from the metric's strategy type:
    - North Star: 1.0 (primary success metric)
    - Goal: 0.8 (key performance metrics)
    - Input: 0.5 (operational/input metrics)
  - Can be explicitly overridden with a custom value (0-1) when you have specific knowledge
- **Effort Normalization**
  - Divides impact by effort (person-weeks) to calculate ROI
  - Helps identify high-impact, low-effort opportunities
- **Aggregate Impact**
  - Sum of absolute deltas across all affected metrics
  - Shows the total expected magnitude of change from the bet
- **Uncertainty Bands & Risk-Adjusted Scoring** (when forecast percentiles are available)
  - Score Range: Scores based on the p10 (pessimistic), p50 (median), and p90 (optimistic) forecast percentiles. Since scores use magnitude, check delta signs for direction.
  - Risk-Adjusted Score: The median score discounted by an uncertainty penalty (capped at 50%)
  - Uncertainty Penalty: Based on the spread between the p90 and p10 forecasts
  - Use this to compare bets with different levels of forecast certainty
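Because delta is reported as an average per forecast period, converting it to a cumulative figure is a one-line calculation. A minimal Python sketch of the rule described above (the function name is illustrative, not part of the product API):

```python
def cumulative_impact(delta_per_period, horizon, available_forecast_points):
    """Cumulative impact = average delta per period × forecast points actually used.

    The points actually used are the lesser of the horizon and the number of
    available forecast points, per the report's definition of delta.
    """
    return delta_per_period * min(horizon, available_forecast_points)

# Average delta of 2.5 per period, 12-period horizon, but only 8 forecast points:
print(cumulative_impact(2.5, horizon=12, available_forecast_points=8))  # 20.0
```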
How to Create This Report
1. **Add bets and metrics to your board**
   - Add the strategic initiatives (bets) you want to evaluate
   - Add the metrics these bets are expected to impact
2. **Connect bets to metrics**
   - Draw connections from each bet to the metrics it aims to influence
   - Configure each connection with a confidence level and expected impact
3. **Generate the report**
   - Select a bet on the board
   - Click the "Report" button in the popup menu
   - Choose "Bet Impact Analysis" from the report options
Observed-Effect Mode
Used after launching a bet to measure actual effectiveness.
- **Effect Size**
  - The measured impact of the intervention on the metric, in raw units
  - Calculated using regression: `metric = β₀ + β₁×intervention + β₂×control₁ + ... + ε`
  - β₁ represents the effect size of your bet (the change per 0→1 intervention switch)
- **Statistical Significance**
  - P-value: The probability of observing an effect at least this large if the true effect were zero
  - P < 0.05 indicates a statistically significant impact
  - Helps distinguish real effects from random variation
- **Confidence Interval**
  - 95% CI: The range of plausible effect sizes (lower bound to upper bound)
  - Provides an intuitive sense of measurement precision
  - Narrower intervals = more precise estimates
- **Confidence & Uncertainty**
  - Standard Error: Measures the precision of the effect estimate
  - Confidence Weight: Derived from the p-value and used in scoring
  - Lower p-values yield higher confidence weights
  - Reliability: A result-level reliability classification (`high`/`medium`/`low`) that combines data quality gates, statistical significance, and confidence interval sharpness
- **Controlled Analysis**
  - Accounts for other factors (control metrics) that might influence the outcome
  - Separates the bet's measured effect from the confounding variables you include as controls
  - Helps ensure you're measuring the bet's impact, not external factors
- **Score & Interpretation**
  - Combines effect size, significance, and importance weight into a single score
  - Provides a human-readable interpretation of results
  - Example: "moderate positive effect (significant, p=0.032, confidence weight=0.95, 95% CI: [12.3, 45.7])"
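The effect-size idea above can be shown with a small self-contained sketch: for a binary 0/1 intervention with no controls, the OLS slope of the metric on the intervention series equals the difference between the "on" and "off" group means. This illustrates the regression concept only; the product's actual implementation also fits control variables and reports standard errors and p-values.

```python
def effect_size(intervention, metric):
    """OLS slope of metric on a single intervention series (β₁).

    Illustrative only: computes the slope in closed form as cov(x, y) / var(x).
    """
    n = len(intervention)
    mx = sum(intervention) / n
    my = sum(metric) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(intervention, metric))
    var = sum((x - mx) ** 2 for x in intervention)
    return cov / var

# Off for 4 periods, then on for 4 periods (hypothetical data):
intervention = [0, 0, 0, 0, 1, 1, 1, 1]
metric = [100, 102, 99, 101, 111, 109, 112, 110]

# For a binary intervention, β₁ equals mean(on) − mean(off): 110.5 − 100.5 = 10.0
print(effect_size(intervention, metric))  # 10.0
```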
Scoring Formula Details
Forecast-Informed Mode Score
score = (|delta| × confidence_weight × importance_weight) / effort

Where:
- `delta`: Average change per forecast period over the horizon
- `confidence_weight`: 0.5 (None), 0.6 (Low), 0.8 (Medium), or 0.9 (High)
- `importance_weight`: 0.5 to 1.0 based on metric strategy type (north_star=1.0, goal=0.8, input=0.5), or an explicit override
- `effort`: Person-weeks or a similar effort measure

⚠️ Important: The score reflects magnitude/ROI, not direction. A large negative (harmful) effect can produce a high score. Always check the delta sign and metric desirability to determine benefit vs. harm.
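The per-edge formula is straightforward to compute. A minimal Python sketch using the confidence and importance weights documented in this report (the function and constant names are illustrative, not part of the product API):

```python
# Weights as documented in this report.
CONFIDENCE_WEIGHTS = {"none": 0.5, "low": 0.6, "medium": 0.8, "high": 0.9}
IMPORTANCE_WEIGHTS = {"north_star": 1.0, "goal": 0.8, "input": 0.5}

def edge_score(delta, confidence, strategy_type, effort, importance_override=None):
    """Per-edge score: (|delta| × confidence_weight × importance_weight) / effort."""
    if effort <= 0:
        raise ValueError("effort must be a positive number")
    if importance_override is not None:
        importance = importance_override  # explicit 0-1 override
    else:
        importance = IMPORTANCE_WEIGHTS[strategy_type]
    return (abs(delta) * CONFIDENCE_WEIGHTS[confidence] * importance) / effort

# Average delta of 12 per period, high confidence, goal metric, 4 person-weeks:
# (12 × 0.9 × 0.8) / 4 = 2.16
print(edge_score(12, "high", "goal", 4))
```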
Observed-Effect Mode Score
score = (|effect_size| × significance_weight × importance_weight) / effort

Where:
- `effect_size`: The regression coefficient for the intervention (raw metric units)
- `significance_weight`: A function of the p-value (higher for p < 0.05)
- `importance_weight`: Same as in Forecast-Informed Mode
- `effort`: The actual effort spent on the bet

⚠️ Important: As in Forecast-Informed Mode, the score is based on absolute magnitude. Check the interpretation and the effect size sign for directionality.
Configuration Options
Confidence Levels
- High (0.9): You have strong evidence or historical data supporting the expected impact
- Medium (0.8): Reasonable assumptions but with some uncertainty
- Low (0.6): Speculative or unproven hypothesis
- None (0.5): Baseline uncertainty, no specific evidence
Metric Strategy Types
Metrics are classified by their strategic importance in the hierarchy:
- North Star (1.0): The primary success metric that defines overall business health
- Goal (0.8): Key performance metrics that drive the north star
- Input (0.5): Operational and input metrics that influence goals
Custom Importance Weight Override
You can explicitly set an importance weight (0-1) to override the strategy type default when you have specific knowledge about the metric’s strategic value or scope.
Effort Metrics
- Measured in person-weeks, person-months, or story points
- Should include all phases: design, development, testing, rollout
- Be consistent across bets for accurate comparison
Aggregation Modes
- Weighted Average (default): Total score is delta-magnitude weighted average of edge scores. Prevents “edge farming” where adding many small-impact edges inflates scores.
- Sum: Total score is sum of all edge scores. May exceed configured max score. Use when you want additive scoring across metrics.
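The difference between the two aggregation modes can be sketched as follows. Weighting edge scores by |delta| means that adding many near-zero edges barely moves the total, while `sum` grows with every edge added (names are illustrative):

```python
def total_score(edges, aggregation_mode="weighted_average"):
    """Aggregate per-edge scores into a bet total score.

    edges: list of (score, delta) pairs.
    Default is a |delta|-weighted average, which resists "edge farming";
    "sum" simply adds all edge scores.
    """
    if aggregation_mode == "sum":
        return sum(score for score, _ in edges)
    weights = [abs(delta) for _, delta in edges]
    if sum(weights) == 0:
        return 0.0  # no measurable impact on any edge
    return sum(score * w for (score, _), w in zip(edges, weights)) / sum(weights)

# One meaningful edge plus one tiny edge (hypothetical scores and deltas):
edges = [(2.0, 10.0), (0.1, 1.0)]
print(total_score(edges))          # weighted average stays close to 2.0
print(total_score(edges, "sum"))   # sum grows with every edge added
```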
Interpreting the Results
Pre-Launch (Forecast-Informed Mode)
- **Prioritizing Bets**
  - Rank bets by total score to identify the highest-ROI opportunities
  - Review per-metric contributions to understand the value drivers
  - Consider risk-adjusted scores (when available) versus pure potential
- **Resource Allocation**
  - Focus on high-score, low-effort bets for quick wins
  - Balance the portfolio between high-confidence incremental bets and low-confidence transformational bets
  - Use aggregate impact to understand the scale of change
- **Expectation Setting**
  - Share delta (remember: average change per forecast period, not cumulative) and percentage change with stakeholders
  - Communicate confidence levels to set realistic expectations
  - Document assumptions for post-launch validation
- **Understanding Uncertainty**
  - Review score ranges (p10/p50/p90) to see best- and worst-case scenarios
  - Compare risk-adjusted scores to understand downside protection
  - Use the uncertainty penalty as a measure of forecast volatility
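As a rough illustration of how a spread-based discount can work: the report describes the risk-adjusted score as the median score discounted by an uncertainty penalty capped at 50%, but does not specify the penalty formula. The sketch below assumes a simple spread-over-median penalty purely for illustration; the product's actual formula may differ.

```python
def risk_adjusted_score(score_p50, score_p10, score_p90):
    """Illustrative only: discount the median score by a spread-based penalty.

    Assumption (not from the spec): penalty = spread / (2 × median), capped at 50%.
    """
    if score_p50 <= 0:
        return 0.0
    spread = score_p90 - score_p10
    penalty = min(0.5, spread / (2 * score_p50))
    return score_p50 * (1 - penalty)

# Wide band (p10=1.0, p90=3.0) hits the 50% cap; narrow band barely discounts.
print(risk_adjusted_score(2.0, 1.0, 3.0))
print(risk_adjusted_score(2.0, 1.9, 2.1))
```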
Post-Launch (Observed-Effect Mode)
- **Validating Hypotheses**
  - Compare the effect size to the expected delta from Forecast-Informed Mode
  - Check whether the p-value is below 0.05 for statistical significance
  - Review the 95% CI to understand the range of plausible impacts
  - Check the interpretation for directional accuracy (positive vs. negative)
- **Learning & Iteration**
  - For significant positive effects: scale up and replicate
  - For non-significant effects: investigate confounds, insufficient data, or a true null effect
  - For significant negative effects: roll back or adjust the approach
- **Improving Estimates**
  - Track actual vs. predicted impact to calibrate future forecasts
  - Adjust confidence levels based on the accuracy of past predictions
  - Refine effort estimates based on actual time spent
Applying Insights
Short-Term Actions
- **Bet Prioritization**
  - Create a ranked list of bets using total score
  - Start with the top-scoring bets that fit current capacity
  - Queue lower-scoring bets or deprioritize them permanently
- **Resource Planning**
  - Allocate team capacity to the highest-impact bets
  - Balance workload using effort estimates
  - Plan for dependencies between related bets
- **Stakeholder Communication**
  - Share the scoring rationale and methodology
  - Present per-metric breakdowns for transparency
  - Set expectations using confidence levels and uncertainty bands
Medium-Term Strategy
- **Portfolio Management**
  - Monitor the distribution of bets across strategy types
  - Ensure a mix of quick wins and transformational initiatives
  - Track aggregate impact across all active bets
- **Continuous Measurement**
  - Run statistical analysis for all launched bets
  - Compare predicted vs. actual impact systematically
  - Build a knowledge base of what works
- **Process Improvement**
  - Refine scoring based on actual outcomes
  - Adjust confidence and importance weight calibration
  - Improve effort estimation accuracy
Long-Term Planning
- **Strategic Direction**
  - Identify patterns in the characteristics of high-impact bets
  - Double down on strategy types that consistently deliver
  - Develop organizational capabilities in high-leverage areas
- **Data-Driven Culture**
  - Use scoring to replace subjective prioritization
  - Make bet impact measurement a standard practice
  - Train teams to think in terms of expected value and ROI
- **Predictive Capability**
  - Build forecasting models based on historical accuracy
  - Develop intuition for effort and impact estimation
  - Create playbooks for common bet types
Notes & Limits
API Migration Note (Mode Naming Update)
The Bet Impact API now uses clearer mode names:
- `forecast_informed` (new canonical name, replaces `hybrid`)
- `observed_effect` (new canonical name, replaces `statistical`)

Backward compatibility is enabled during migration:
- Input: The API still accepts the legacy `mode` values `hybrid` and `statistical` and maps them to the new canonical values.
- Output: `forecastInformedResults` is the canonical response key (the legacy alias `hybridResults` is still included). `observedEffectResults` is the canonical response key (the legacy alias `statisticalResults` is still included).
- Response header: `X-Deprecation-Notice` is included to signal legacy alias deprecations.

Recommended client update path:
- Send canonical `mode` values now.
- Read canonical response keys now.
- Keep legacy fallback parsing only temporarily, for older deployments.
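A client-side sketch of the recommended update path: read the canonical keys first and fall back to the legacy aliases only while older deployments remain. This assumes the response has already been parsed from JSON into a dict; the helper name is illustrative.

```python
def extract_results(payload):
    """Prefer canonical response keys; fall back to legacy aliases during migration."""
    forecast = payload.get("forecastInformedResults", payload.get("hybridResults"))
    observed = payload.get("observedEffectResults", payload.get("statisticalResults"))
    return forecast, observed

# Works against both old and new server deployments:
legacy_payload = {"hybridResults": [{"score": 2.1}], "statisticalResults": []}
canonical_payload = {"forecastInformedResults": [{"score": 2.1}], "observedEffectResults": []}
print(extract_results(legacy_payload))
print(extract_results(canonical_payload))
```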
Important Behavioral Notes
- **Magnitude vs. Direction**: Scores use absolute values (magnitude). A large harmful effect scores as high as a large beneficial effect. Always check the delta/effect size sign and metric desirability before acting.
- **Average vs. Cumulative Delta**: The `delta` field shows the average change per forecast period over the horizon, not the total cumulative impact. For cumulative impact, multiply delta by the number of forecast points actually used (the lesser of the horizon and the available forecasts).
- **Aggregation Default**: The total score uses a delta-magnitude weighted average by default, not a simple sum. This prevents score inflation from adding many low-impact edges. Configure `aggregationMode: "sum"` if you want additive scoring.
- **Importance Weight Semantics**: When derived from the strategy type, this weight represents strategic value/hierarchy (north star > goal > input), not necessarily population reach. Use an explicit override (0-1) when you have specific reach or scope data.
- **Score Floor Exceptions**: The `minScore` configuration sets a floor for scores, but two cases always return exactly 0 regardless of `minScore`: (1) zero-impact edges where delta ≈ 0, and (2) Learning Mode results with insufficient data quality. This ensures "no impact" and "unreliable" cases are clearly distinguished from low-but-real impact.
- **Zero-Impact Contribution Fallback**: When the total delta magnitude across all edges is zero (all deltas ≈ 0), contribution percentages default to equal shares (100/N% per edge). Interpret this as "no measurable impact" rather than a meaningful distribution of contributions.
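The zero-impact contribution fallback described above can be sketched as follows (the function name is illustrative, not part of the product API):

```python
def contribution_shares(deltas):
    """Per-edge contribution percentages, weighted by |delta|.

    When all deltas are zero, fall back to equal shares (100/N each);
    read that case as "no measurable impact", not a real distribution.
    """
    total = sum(abs(d) for d in deltas)
    if total == 0:
        return [100 / len(deltas)] * len(deltas)
    return [100 * abs(d) / total for d in deltas]

print(contribution_shares([6, 3, 1]))   # [60.0, 30.0, 10.0]
print(contribution_shares([0, 0, 0]))   # equal-share fallback
```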
Data Quality & Statistical Limitations
For Observed-Effect Mode:
- **Strict data quality gates**: Results show "cannot be reliably determined" (score=0, confidence=0) when the minimum requirements aren't met (sample ≥ 6, R² ≥ 0.1, df ≥ 3, valid MSE, no SE fallback). This is intentional, to prevent misleading results.
- **What to do when data quality is insufficient**:
  - Add more time: Collect more observations before and after the intervention
  - Add better controls: Include relevant confounding variables as control metrics
  - Ensure variance: Verify the intervention has variance (not all 0s or all 1s) and that the metric changes over time
  - Review intervention coding: Make sure intervention periods are correctly identified
- **Intervention coding**: Observed-Effect Mode interprets the effect as the "change per unit increase" in the intervention variable. For binary interventions (0/1), this means "the change when switching from off to on." For continuous interventions, it represents a "dose-response" effect (e.g., change per dollar spent). Use non-negative intervention values; for the clearest interpretation, use binary coding.
- **No automatic seasonality/trend handling**: The regression is a simple linear model without built-in time trend or seasonal adjustments. For metrics with strong trends or seasonality, add a time index or seasonal indicators as control variables to improve the estimates.
- **Overfitting risk**: The model uses in-sample R² without cross-validation or regularization. Use minimal, well-justified controls; avoid adding many controls relative to your data length. The degrees-of-freedom requirement (df ≥ 3) is a minimum; aim comfortably above it for reliable estimates.
- **No convergence reporting**: Gradient descent stops when the loss change falls below a tolerance or when it hits the max-iteration cap; the result doesn't report which happened. On difficult datasets (high collinearity, many controls), estimates may be less reliable.
- **Numeric fields when data quality is insufficient**: When data quality gates are triggered (e.g., SE fallback, low R²), score and confidence are set to 0, but the effect size, CI, and p-value may still show numeric values. Treat these as unreliable when dataQuality warnings are present.
- **Correlation ≠ Causation**: Results are directional evidence, not guaranteed causation. For strong causal claims, ensure a proper experimental setup (randomization, control groups) or use well-justified controls to account for confounding.
For Forecast-Informed Mode:
- Forecast quality directly affects delta accuracy. Poor forecasts lead to unreliable scores.
- Uncertainty bands (p10/p50/p90 scores) are only available when forecasts include percentile predictions.
- Confidence levels are subjective assessments. Calibrate them against historical prediction accuracy.
General:
- The model focuses on metric-level impact. It does not account for qualitative factors, strategic alignment beyond the metric hierarchy, or non-metric benefits.
- Forecast periods are point-based: `horizon`, `lagDays`, and `rampDays` are measured in forecast periods (e.g., if forecasts are daily, these represent days; if weekly, weeks). Ensure the forecast cadence matches your timeline expectations.
- Scale dominance in multi-metric bets: Scoring uses absolute deltas, so metrics with larger natural scales (e.g., revenue in millions of dollars) will dominate smaller-scale metrics (e.g., daily signups in hundreds) even when the percent changes are comparable. For multi-metric bets, consider: (1) comparing metrics with similar scales, (2) reviewing percent changes alongside scores for relative impact assessment, or (3) running separate single-metric analyses when scales differ significantly.
- Results assume metrics are correctly defined, tracked, and represent what you intend to measure.
Data Requirements
Forecast-Informed Mode Requirements
- Each bet must have at least one edge (connection to a metric)
- Each edge requires:
- Valid forecast data (at least 1 point, preferably covering the horizon)
- Expected impact specification (mode: percentage or absolute, and value)
- Confidence level (none, low, medium, or high)
- Strategy type for the target metric (north_star, goal, or input)
- Effort must be a positive number
Observed-Effect Mode Requirements
- Minimum sample size: 6 observations (more is better; aim for 20+ for reliable estimates)
- Intervention data: Binary (0 or 1) recommended for on/off experiments. Continuous values acceptable for dose-response analysis. Must have variance (not all same value).
- Validation mode: Use `interventionValidation: "binary"` when you want strict 0/1 enforcement; the default `"non_negative"` supports dose-response-style interventions.
- Metric data: Must have variance (not constant) and align by date with the intervention
- Date alignment: Metric and intervention (and controls if provided) must have matching dates
- Minimum degrees of freedom: df = n - p ≥ 3 (sample size minus number of parameters)
- Minimum model fit: R² ≥ 0.1 (model explains at least 10% of variance)
Pro tip: For cleanest results, use a clear before/after intervention period with stable baseline and treatment phases.
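The documented minimums are easy to pre-check before running an analysis. A sketch of the gate logic (the thresholds mirror the documentation above; the function is illustrative, and the real implementation additionally checks MSE validity and SE fallback):

```python
def passes_quality_gates(n_samples, n_parameters, r_squared):
    """Pre-check the documented minimums: sample ≥ 6, df = n − p ≥ 3, R² ≥ 0.1."""
    df = n_samples - n_parameters
    return n_samples >= 6 and df >= 3 and r_squared >= 0.1

# 8 observations, intercept + intervention (2 parameters), R² = 0.25:
print(passes_quality_gates(8, 2, 0.25))   # True
# 6 observations but 4 parameters leaves df = 2, below the minimum:
print(passes_quality_gates(6, 4, 0.25))   # False
```

Note that df shrinks with every control metric you add, which is one more reason to keep the control set minimal.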
Example Use Cases
Product Roadmap Prioritization
Score all proposed features to determine quarterly roadmap. Balance quick wins (high score, low effort) with strategic bets (high impact, higher effort).
Marketing Campaign Selection
Evaluate different campaign ideas pre-launch using Forecast-Informed Mode. After launch, use Observed-Effect Mode to measure actual lift and refine future campaign selection.
A/B Test Analysis
Use Observed-Effect Mode to rigorously measure experiment effects, accounting for confounding factors. Prioritize rollout of winning variants based on effect size and significance.
Resource Allocation
When deciding between competing initiatives, use bet scoring to make objective, data-driven decisions about where to invest limited resources.
By leveraging the Bet Impact Model, you can move from opinion-based to evidence-based prioritization, systematically improve prediction accuracy, and maximize the return on your strategic investments.