
What is the Bet Impact Analysis?

The Bet Impact Analysis helps you compare strategic initiatives (“bets”) by estimating how much lift they can create on your metrics relative to the effort required. It’s built for prioritisation: decide what to do next, explain why, and keep a tight feedback loop between planning and real outcomes.

This is not a black-box prediction tool. It’s a transparent framework: you can see the assumptions, the expected metric deltas, and how those inputs drive the ranking.


What Value It Gives

  • Better prioritisation – Compare bets on “leverage per effort,” not volume of opinions.
  • Clear trade-offs – Make assumptions explicit (impact, timing, confidence) so discussions become concrete.
  • Stronger alignment – Connect initiatives to the metrics they’re meant to move and agree on what “impact” means.
  • A learning loop – Score bets before shipping, then measure impact after shipping to calibrate future estimates.
  • Risk awareness – Use uncertainty and diagnostics so you don’t over-invest in fragile bets.

Common Use Cases

  • Quarterly planning and sequencing – Decide what to build first when capacity is limited.
  • “Which bet is worth it?” – Compare an ambitious bet vs. a smaller bet with faster ROI.
  • Leadership alignment – Turn competing priorities into a shared, metric-grounded ranking.
  • Setting expectations on timing – Model lag and ramp so stakeholders don’t expect instant movement.
  • Post-launch accountability – Measure what actually happened and refine confidence for the next cycle.

Two Modes: Plan Now, Learn Later

Planning Mode (Before a Bet Ships)

Use Planning Mode to estimate impact using a baseline forecast and your assumptions.

  • Baseline forecast – Segflow estimates the “business as usual” path for each metric from your historical data.
  • Expected impact + timing – You set an expected change (absolute or %) plus optional lag and ramp.
  • Effort-normalised scoring – Bigger impact gets more credit, but effort reduces the score so you can compare ROI.
  • Confidence + reach – Confidence and metric importance (north star vs goal vs input) weight the score.
  • Uncertainty-aware – When forecasts include uncertainty bands, Segflow shows low/median/high outcomes and a risk-adjusted score.

Learning Mode (After a Bet Ships)

Use Learning Mode to estimate real-world impact once the bet is live.

  • Intervention effect estimation – Estimate the change in the metric when the bet is “on” vs. “off.”
  • Controls supported – Add related metrics as controls to reduce noise and improve attribution.
  • Effect + significance – Results include effect size and a p-value so you can separate signal from noise.
  • Data quality guardrails – If the dataset is too small or unstable, the analysis warns you and de-emphasises the score.

How Bets Are Ranked

A bet can be connected to multiple metrics. Segflow scores each bet → metric connection, then aggregates those into a single bet score and a contribution breakdown.

Under the hood (Planning Mode scoring)

  1. Compute expected lift per metric
    Segflow compares a baseline forecast to an impacted scenario where your expected impact is applied (including lag/ramp). This produces a delta and % change.

  2. Convert qualitative inputs into weights

    • Confidence is mapped to a weight (e.g., medium confidence weighs more than low).
    • Reach is based on metric importance (north star > goal > input) or an explicit override.
  3. Score each connection (edge)
    Segflow follows a RICE-like shape:
    score ∝ |delta| × confidence × reach ÷ effort
    (Magnitude is used, so you must still sanity-check direction.)

  4. Adjust for uncertainty (when available)
    If forecasts include percentile bands, Segflow produces a low/median/high score range and discounts the median score when uncertainty is wide (up to a 50% penalty).

  5. Aggregate into a bet score
    By default, metric connections are weighted by impact magnitude so the bet score reflects the biggest levers. You’ll also see which edges contribute most.
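
To make the scoring shape concrete, here is a minimal TypeScript sketch using the documented default weights. The names (EdgeInput, scoreEdge, aggregateBetScore) are illustrative, not Segflow's actual API:

```ts
// Illustrative sketch of the RICE-like edge score and default aggregation.
type Confidence = "none" | "low" | "medium" | "high";
type Importance = "north_star" | "goal" | "input";

// Documented default weights.
const CONFIDENCE_WEIGHTS: Record<Confidence, number> = {
  none: 0.5, low: 0.6, medium: 0.8, high: 0.9,
};
const REACH_SCALE: Record<Importance, number> = {
  north_star: 1.0, goal: 0.8, input: 0.5,
};

interface EdgeInput {
  delta: number;          // expected change vs. baseline (per forecast period)
  confidence: Confidence;
  importance: Importance;
  effort: number;         // > 0, in one consistent unit (e.g., person-weeks)
}

// score ∝ |delta| × confidence × reach ÷ effort
function scoreEdge(e: EdgeInput): number {
  const confidence = CONFIDENCE_WEIGHTS[e.confidence];
  const reach = REACH_SCALE[e.importance];
  return (Math.abs(e.delta) * confidence * reach) / e.effort;
}

// Default aggregation: weight each edge score by its share of total delta
// magnitude, so the bet score reflects the biggest levers.
function aggregateBetScore(edges: EdgeInput[]): number {
  const totalMagnitude = edges.reduce((s, e) => s + Math.abs(e.delta), 0);
  if (totalMagnitude === 0) return 0; // zero-impact edges always score 0
  return edges.reduce(
    (s, e) => s + scoreEdge(e) * (Math.abs(e.delta) / totalMagnitude),
    0,
  );
}
```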

Under the hood (Learning Mode measurement)

  1. Measure the effect while the bet is active
    Segflow fits a regression model with an intervention indicator (bet off/on), plus optional controls.

  2. Report “lift” and confidence

    • Effect size: estimated change in the metric’s units when the bet is on (lift).
    • p-value: whether the effect is statistically distinguishable from noise.
    • Confidence weight: derived from significance; weak evidence is down-weighted.
  3. Apply data-quality guardrails
    When the dataset is too small or the model is unstable, Segflow flags it and sets the confidence/score to 0 so you don’t over-interpret.

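For intuition, here is a minimal TypeScript sketch of the no-controls case with a binary intervention, where the OLS slope reduces to the difference in means between “on” and “off” periods. Segflow’s actual model also supports controls (full multiple regression), which this sketch omits; estimateLift is a hypothetical name:

```ts
// Intervention effect for a binary (0/1) bet indicator with no controls.
// The OLS slope for a binary regressor equals the difference in group means.
interface EffectEstimate {
  effectSize: number; // change in the metric's units when the bet is on
  stdError: number;
  tStat: number;      // compare to a t-distribution (df = n - 2) for a p-value
}

function estimateLift(metric: number[], intervention: number[]): EffectEstimate {
  // Assumes both groups are non-empty (the intervention has variance).
  const on = metric.filter((_, i) => intervention[i] === 1);
  const off = metric.filter((_, i) => intervention[i] === 0);
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const meanOn = mean(on);
  const meanOff = mean(off);
  const effectSize = meanOn - meanOff; // the measured "lift"

  // Residual variance (MSE) with n - 2 degrees of freedom.
  const sse =
    on.reduce((s, y) => s + (y - meanOn) ** 2, 0) +
    off.reduce((s, y) => s + (y - meanOff) ** 2, 0);
  const df = on.length + off.length - 2;
  const mse = sse / df;

  const stdError = Math.sqrt(mse * (1 / on.length + 1 / off.length));
  return { effectSize, stdError, tStat: effectSize / stdError };
}
```
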
Data Quality Thresholds (Learning Mode):

  • Minimum 6 observations required
  • R² ≥ 0.1 (model must explain at least 10% of variance)
  • Degrees of freedom ≥ 3 (observations minus parameters)
  • Intervention must have variance (not all 0s or all 1s)
  • Metric must have variance (not constant)
  • MSE must be finite (no regression failures)
  • Standard error calculation must succeed (no matrix inversion failures)
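
A sketch of how these guardrails might be checked, assuming hypothetical names (DiagnosticsInput, checkDataQuality); the standard-error check is omitted since matrix-inversion failures surface during regression itself:

```ts
// Illustrative data-quality guardrails mirroring the thresholds above.
interface DiagnosticsInput {
  n: number;                    // observations
  numParams: number;            // intercept + intervention + controls
  r2: number;                   // in-sample R²
  interventionVariance: number;
  metricVariance: number;
  mse: number;
}

function checkDataQuality(d: DiagnosticsInput): string[] {
  const warnings: string[] = [];
  if (d.n < 6) warnings.push("fewer than 6 observations");
  if (d.r2 < 0.1) warnings.push("R² below 0.1");
  if (d.n - d.numParams < 3) warnings.push("degrees of freedom below 3");
  if (d.interventionVariance === 0) warnings.push("intervention has no variance");
  if (d.metricVariance === 0) warnings.push("metric is constant");
  if (!Number.isFinite(d.mse)) warnings.push("regression failed (non-finite MSE)");
  return warnings; // any warning → confidence and score set to 0
}
```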

Intervention Coding:

  • Binary (0/1) recommended for on/off experiments—effect size means “change when switching from off to on.”
  • Continuous values valid for dose-response analysis—effect size means “change per unit increase” (e.g., per dollar spent, per hour of effort).
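
A quick illustration of the two coding styles, with made-up values:

```ts
// Binary coding: effect size = change when switching from off to on.
const binaryIntervention = [0, 0, 0, 1, 1, 1, 1];

// Continuous coding (dose-response): effect size = change per unit increase,
// e.g., per dollar of daily ad spend.
const adSpendPerDay = [0, 0, 120, 250, 250, 400, 400];
```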

Regression Limitations:

  • Regression is a simple linear model without built-in trend or seasonality handling. For time series with strong trends or seasonality, add a time index or seasonal indicators as control variables (see the sketch after this list).
  • Overfitting risk: The model uses in-sample R² without cross-validation or regularization. Use minimal, well-justified controls; avoid adding many controls relative to your data length. The degrees-of-freedom rule (df ≥ 3) is a minimum—aim comfortably above it. Watch dataQuality warnings.
  • No convergence reporting: Gradient descent stops when loss change falls below a tolerance or when it hits the max-iteration cap; the result doesn’t report which happened. On difficult datasets (high collinearity, many controls), estimates may be less reliable.
  • When data quality is insufficient, numeric fields (effect size, CI, p-value) may still appear but should be treated as unreliable. Check dataQuality warnings before interpreting results.
  • These data quality thresholds are fixed and cannot be configured.
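
For the trend/seasonality point above, here is a sketch of hypothetical helpers that build a time index and day-of-week indicators to pass as controls; the names and shapes are illustrative:

```ts
// Linear time index to absorb a steady trend.
function timeIndex(n: number): number[] {
  return Array.from({ length: n }, (_, t) => t);
}

// One-hot day-of-week indicators (6 columns; Sunday is the baseline)
// to absorb weekly seasonality in daily data.
function dayOfWeekDummies(dates: Date[]): number[][] {
  return dates.map((d) => {
    const dow = d.getDay(); // 0 = Sunday
    return Array.from({ length: 6 }, (_, k) => (dow === k + 1 ? 1 : 0));
  });
}

// Each column would be passed as a control alongside the intervention
// indicator. Keep the parameter count small relative to your data length:
// df ≥ 3 is a minimum, not a target.
```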

Example: Picking the Next Bet (Illustrative)

Scenario: You have two competing initiatives: “Improve onboarding” and “Launch a referral program.” You want the highest ROI bet for the next 6–12 weeks.

How you’d use Bet Impact Analysis:

  1. Connect each bet to the metrics it’s meant to move (e.g., activation, retained users, referrals, CAC).
  2. Estimate expected impact, lag/ramp, effort, and confidence for each connection.
  3. Run Planning Mode and compare total score and risk-adjusted score.

What you might learn:

  • Onboarding ranks highest because it’s moderate impact, faster to ship, and moves a high-reach metric.
  • Referrals has higher upside, but larger uncertainty and higher effort, so it ranks lower on a risk-adjusted basis.

What you do next:

  • Ship the top bet, then use Learning Mode to estimate measured lift and update future confidence based on what actually happened.

What You Provide

Required Inputs

  • Bets + effort – Use one consistent unit (e.g., person-weeks).
  • Bet → metric connections – Which metrics the bet is intended to move.
  • Expected impact – Absolute or percentage change, with optional lag/ramp timing.
  • Confidence – Your evidence level for the estimate.
  • Metric importance / reach – North star vs goal vs input (or an explicit reach override).

Optional Inputs (Advanced)

  • Forecast horizon – The number of forecast periods used for Planning Mode scoring (default: 90). These are forecast points, not calendar days—if forecasts are weekly, this means 90 weeks. If fewer forecast points are available than the specified horizon, scoring uses all available points.
  • Controls (Learning Mode) – Other metrics that might explain part of the movement.
  • Scoring configuration – Custom confidence/reach scales or aggregation behavior.

Lag and Ramp Configuration

  • lagDays – Forecast periods before impact begins (max 365). Use when effects are delayed (e.g., marketing campaigns need time to reach audience). Note: these are forecast points—for daily forecasts, 1 = 1 day; for weekly forecasts, 1 = 1 week.
  • rampDays – Forecast periods for linear ramp from 0% to 100% impact (max 365). Use for gradual rollouts.
  • Ramp is linear: at period lagDays + rampDays/2, impact is 50%.
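
A minimal sketch of the resulting multiplier, assuming a 0-based forecast period index t; the function name is illustrative:

```ts
// Linear lag/ramp multiplier: 0 before lagDays, then a linear ramp
// from 0% to 100% over rampDays periods.
function impactMultiplier(t: number, lagDays: number, rampDays: number): number {
  if (t < lagDays) return 0;                    // impact hasn't started
  if (rampDays === 0) return 1;                 // instant full impact after lag
  return Math.min(1, (t - lagDays) / rampDays); // linear 0% → 100%
}

// e.g., lagDays = 14, rampDays = 28:
// period 14 → 0%, period 28 → 50%, period 42 onward → 100%.
```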

Advanced Configuration Options

  • aggregationMode – "weighted_average" (default) or "sum". Weighted average weights edges by delta magnitude; sum mode adds all edge scores together (useful when you want to reward bets that move many metrics).
  • clampRiskAdjusted – When true, risk-adjusted scores are clamped to min/max bounds. Default: false.
  • confidenceWeights – Custom weights for confidence levels (must be 0-1). Default: none=0.5, low=0.6, medium=0.8, high=0.9.
  • reachScale – Custom weights for metric importance types (must be 0-1). Default: north_star=1.0, goal=0.8, input=0.5.
  • minScore – Minimum score floor (default: 0). Scores are clamped to this value. Exception: Zero-impact edges (delta ≈ 0) and Learning Mode results with insufficient data quality always return score = 0, bypassing minScore. This ensures “no impact” and “unreliable” cases are clearly distinguished from low-but-real impact.
  • maxScore – Maximum score ceiling (default: uncapped). When set, scores are clamped to this value. Note: In sum aggregation mode, total scores may exceed maxScore.
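
Putting these together, an illustrative configuration object using the documented defaults (the exact shape Segflow accepts may differ):

```ts
// Sketch of a scoring configuration with the documented default values.
const scoringConfig = {
  aggregationMode: "weighted_average" as const, // or "sum"
  clampRiskAdjusted: false,
  confidenceWeights: { none: 0.5, low: 0.6, medium: 0.8, high: 0.9 },
  reachScale: { north_star: 1.0, goal: 0.8, input: 0.5 },
  minScore: 0,
  // maxScore omitted → uncapped
};
```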

What You Get Back

Core Outputs (Planning Mode)

  • Bet ranking – Total score per bet for side-by-side prioritisation.
  • Expected lift – Average delta per forecast period and % change per bet → metric connection. For a cumulative total, multiply delta by the actual forecast points used (the lesser of horizon and available forecasts); see the worked example after this list. Note: Percentage change is set to 0 when the baseline is near zero to avoid extreme ratios.
  • Contribution breakdown – Which metric connections drive the bet score. Note: In weighted average mode (default), percentageOfTotal reflects each connection’s share of total delta magnitude. In sum mode, it reflects each connection’s share of total absolute score. When total impact is zero (all deltas ≈ 0), contributions default to equal shares (100/N% per edge)—interpret this as “no measurable impact” rather than meaningful distribution.
  • Baseline vs impacted values – The “business as usual” forecast and the impacted scenario.
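
A worked example of the cumulative-total note above, with made-up numbers:

```ts
// Turning per-period expected lift into a cumulative total.
const avgDeltaPerPeriod = 12;       // e.g., +12 signups per forecast period
const horizon = 90;                 // requested forecast points
const availableForecastPoints = 60; // forecast points actually available

const pointsUsed = Math.min(horizon, availableForecastPoints); // 60
const cumulativeLift = avgDeltaPerPeriod * pointsUsed;         // 720 signups
```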

Core Outputs (Learning Mode)

  • Measured lift – Estimated effect size in the metric’s units when the bet is active.
  • Statistical confidence – p-value and an evidence-weighted confidence score.
  • 95% Confidence Interval – Range of plausible effect sizes (lower to upper bound) for the measured lift.
  • Score – Computed score based on effect size, significance, and reach, normalised to scoring bounds. Used for ranking statistical results.
  • Diagnostics – Model fit and data quality warnings to prevent false certainty.

Optional Outputs

  • Risk-adjusted scores – Discounted scores when forecast uncertainty is wide.
  • Score ranges – Scores based on p10/p50/p90 forecast percentiles. Since scores use magnitude (absolute delta), check delta signs for direction. Uncertainty bands are most reliable when percentile forecasts cover the full horizon.
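
For intuition, a sketch of how the wide-band discount could work. The exact discount function Segflow applies is not documented here, so the formula below (penalty proportional to relative band width, capped at the stated 50%) is an assumption:

```ts
// Hypothetical risk adjustment over p10/p50/p90 scores: discount the
// median score when the low–high band is wide relative to the median.
function riskAdjustedScore(low: number, median: number, high: number): number {
  if (median === 0) return 0;
  const relativeWidth = (high - low) / Math.abs(median);
  const penalty = Math.min(0.5, relativeWidth / 2); // capped at 50%
  return median * (1 - penalty);
}
```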

How to Interpret Results

  • Scores are relative – Use them to rank bets in the same planning cycle, not as absolute predictions.
  • Direction is your responsibility – Scoring uses magnitude; always confirm the movement is desirable (e.g., lower CAC is good, higher churn is not).
  • Effort matters – A smaller lift can outrank a larger lift if it requires far less effort.
  • Use risk-adjusted when uncertain – When score ranges are wide, sequence based on conservative, risk-adjusted scores.
  • Look at contributions – A bet can rank high because of one key metric; know which connection is carrying the score.
  • Watch for scale dominance – Scoring uses absolute deltas, so metrics with larger scales (e.g., revenue in millions) naturally dominate over smaller-scale metrics (e.g., daily signups in hundreds) even if percent changes are similar. For multi-metric bets, consider using metrics with comparable scales, or review percent changes alongside scores for relative comparison.
  • In Learning Mode, read lift + p-value together – Big lift with weak significance is directional; small lift with strong significance is real but modest.
  • Heed data quality warnings – If the analysis flags insufficient data, treat results as learning signals, not decisions.

Best Practices

  • Keep impact assumptions realistic; conservative inputs are often more decision-useful.
  • Use lag/ramp to match reality (rollouts and delayed effects are normal).
  • Keep effort estimates consistent across teams so comparisons stay fair.
  • Prioritise a small set of primary outcome metrics per bet; add secondary metrics only when they matter.
  • Re-run as bets progress and update assumptions as you learn.
  • Pair with Driver Analysis: identify the levers, then score bets by how efficiently they move them.

Summary

The Bet Impact Analysis turns prioritisation into a transparent, metric-grounded process. Use Planning Mode to choose what to ship next — and Learning Mode to measure what actually happened, so your prioritisation gets smarter every cycle.
