Performance Metrics & Scoring
Return metrics
Total return — the overall percentage gain or loss over the test period. It is the headline number, but it is never sufficient on its own.
Annualized return — total return normalized to a per-year basis, allowing comparison across strategies tested over different time periods.
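As a rough illustration of the two definitions above, the sketch below computes both from a list of equity values. The function names and the periods_per_year parameter are illustrative assumptions, not part of any particular backtesting tool, and the annualization assumes compound (geometric) growth with one equity value per period.

```python
def total_return(equity):
    """Fractional gain or loss from the first to the last equity value."""
    return equity[-1] / equity[0] - 1.0


def annualized_return(equity, periods_per_year):
    """Compound annual growth rate implied by the equity series.

    Assumes one equity value per period, so len(equity) - 1 periods elapsed.
    """
    years = (len(equity) - 1) / periods_per_year
    return (equity[-1] / equity[0]) ** (1.0 / years) - 1.0


# Hypothetical yearly equity values covering two years.
equity = [100.0, 110.0, 121.0]
print(f"total return:      {total_return(equity):.1%}")
print(f"annualized return: {annualized_return(equity, periods_per_year=1):.1%}")
```

With daily data, periods_per_year would typically be set to about 252 trading days; that figure is a convention and depends on the market being tested.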
Equity curve — the visual trajectory of portfolio value over time. A smooth, upward-sloping curve is more desirable than a jagged one that arrives at the same endpoint — the path matters, not just the destination.
Risk metrics
Maximum drawdown — the largest peak-to-trough decline during the test period. This is arguably the single most important risk metric. A strategy that returns 50% but suffers a 40% drawdown along the way carries far more risk per unit of return than one that returns 30% with only a 10% drawdown.
Maximum drawdown duration — the longest time the strategy spent below a previous equity high before recovering to a new one. This is not necessarily the duration of the deepest drawdown, since a shallow drawdown can take longer to recover. Long drawdown durations test psychological endurance in live trading.
Volatility — the standard deviation of returns, measuring how much the equity curve fluctuates period to period.
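The three risk metrics can be sketched from an equity series as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, drawdown duration is measured in periods as the longest stretch below a prior high (a drawdown still open at the end of the data counts up to the last period), and volatility is annualized with the square-root-of-time convention.

```python
import math
import statistics


def max_drawdown(equity):
    """Largest fractional decline from a running peak to a later trough."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def max_drawdown_duration(equity):
    """Longest run of consecutive periods spent below a prior equity high."""
    peak = equity[0]
    current = longest = 0
    for value in equity[1:]:
        if value >= peak:      # recovered: new (or equal) high resets the clock
            peak = value
            current = 0
        else:                  # still under water
            current += 1
            longest = max(longest, current)
    return longest


def annualized_volatility(equity, periods_per_year):
    """Sample standard deviation of period returns, scaled to one year."""
    returns = [b / a - 1.0 for a, b in zip(equity, equity[1:])]
    return statistics.stdev(returns) * math.sqrt(periods_per_year)


# Hypothetical equity curve: peaks at 120, falls to 90, later recovers.
curve = [100.0, 120.0, 90.0, 100.0, 130.0, 125.0, 140.0]
print(max_drawdown(curve))           # 0.25 (from 120 down to 90)
print(max_drawdown_duration(curve))  # 2 periods below the 120 high
```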
Risk-adjusted metrics
Sharpe ratio — return per unit of risk: the return in excess of the risk-free rate, divided by the volatility of returns. For an annualized Sharpe ratio, a value above 1.0 is generally considered good and a value above 2.0 excellent. Because it normalizes return by risk, the Sharpe ratio allows direct comparison between strategies with different return and risk profiles.
Sortino ratio — similar to Sharpe but only penalizes downside volatility. A strategy with high upside volatility but low downside volatility will have a higher Sortino than Sharpe, reflecting that upside volatility is desirable.
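A minimal sketch of both ratios, computed from per-period returns, is shown below. The function and parameter names are assumptions made for illustration. The Sortino version uses one common convention for downside deviation (root mean square of shortfalls below the target, taken over all periods); other conventions exist and give somewhat different values.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_per_period=0.0, periods_per_year=252):
    """Mean excess return divided by its sample standard deviation, annualized."""
    excess = [r - risk_free_per_period for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


def sortino_ratio(returns, target_per_period=0.0, periods_per_year=252):
    """Like Sharpe, but the denominator only counts returns below the target."""
    excess = [r - target_per_period for r in returns]
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    if downside == 0.0:
        return math.inf  # no period fell below the target
    return statistics.mean(excess) / downside * math.sqrt(periods_per_year)


# Hypothetical yearly returns: large gains, small losses.
returns = [0.04, -0.01, 0.05, -0.02, 0.06]
print(round(sharpe_ratio(returns, periods_per_year=1), 2))
print(round(sortino_ratio(returns, periods_per_year=1), 2))
```

In this example the gains are much larger than the losses, so the Sortino ratio comes out well above the Sharpe ratio: the large positive returns inflate the Sharpe denominator but not the Sortino one.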
Trade statistics
Win rate — percentage of trades that were profitable. A common misconception is that higher win rates are always better. A strategy with a 30% win rate can be highly profitable if the average winner is much larger than the average loser; at that win rate, the average winner must exceed roughly 2.3 times the average loser just to break even before costs.
Average win / average loss — the mean profit on winning trades versus the mean loss on losing trades. The ratio between these (reward-to-risk ratio) combines with win rate to determine overall profitability.
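To make the interaction between win rate and reward-to-risk concrete, the sketch below computes both from a list of per-trade profits and losses, along with the expected profit per trade (expectancy) and the break-even win rate. The names trade_stats and breakeven_win_rate are illustrative assumptions, and the trade list is made up for the example.

```python
def trade_stats(pnls):
    """Summarize per-trade profit and loss figures (positive = win, negative = loss)."""
    wins = [p for p in pnls if p > 0]
    losses = [-p for p in pnls if p < 0]  # stored as positive magnitudes
    win_rate = len(wins) / len(pnls)
    loss_rate = len(losses) / len(pnls)
    avg_win = sum(wins) / len(wins) if wins else 0.0
    avg_loss = sum(losses) / len(losses) if losses else 0.0
    return {
        "win_rate": win_rate,
        "avg_win": avg_win,
        "avg_loss": avg_loss,
        "reward_to_risk": avg_win / avg_loss if avg_loss else float("inf"),
        # Expected profit per trade; equal to the mean of pnls.
        "expectancy": win_rate * avg_win - loss_rate * avg_loss,
    }


def breakeven_win_rate(reward_to_risk):
    """Win rate at which expectancy is zero for a given reward-to-risk ratio."""
    return 1.0 / (1.0 + reward_to_risk)


# Hypothetical strategy: wins only 30% of the time, but winners are 3x losers.
pnls = [300.0] * 3 + [-100.0] * 7
stats = trade_stats(pnls)
print(stats["win_rate"], stats["reward_to_risk"], stats["expectancy"])
print(breakeven_win_rate(stats["reward_to_risk"]))  # 0.25
```

Here the 30% win rate sits above the 25% break-even rate implied by a 3-to-1 reward-to-risk ratio, so the strategy is profitable on average despite losing most of its trades.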
Trade count — the total number of trades in the test period. Low trade counts (under roughly 30, a common rule of thumb) reduce statistical confidence in the results: with so few trades, the measured performance may owe more to chance than to a real edge.
Average trade duration — how long positions are held on average. This indicates whether the strategy is short-term (many brief trades) or position-oriented (fewer, longer trades).
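One way to see why a small sample is a problem is to look at the uncertainty around an observed win rate. The sketch below uses the normal approximation to a binomial proportion and assumes trades are independent, which real trades often are not, so it should be read as a rough lower bound on the uncertainty. The function name is an illustrative assumption.

```python
import math


def win_rate_margin(win_rate, n_trades, z=1.96):
    """Approximate half-width of a 95% confidence interval for the win rate.

    Uses the normal approximation and assumes independent trades.
    """
    return z * math.sqrt(win_rate * (1.0 - win_rate) / n_trades)


for n in (30, 300, 3000):
    margin = win_rate_margin(0.5, n)
    print(f"{n:>5} trades: 50% win rate, plus or minus {margin:.1%}")
```

With 30 trades, an observed 50% win rate is consistent with a true win rate anywhere from about 32% to 68%, which is too wide to separate a good strategy from a poor one.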
How to read metrics together
No single metric tells the full story. A useful evaluation framework:
- Start with the equity curve — does the path look reasonable?
- Check maximum drawdown — can you tolerate the worst-case decline?
- Compare Sharpe ratio — is the return justified by the risk?
- Verify trade count — is there enough statistical sample?
- Check win rate and reward-to-risk together — is the expected profit per trade positive?
- Look at drawdown duration — can you wait that long for recovery?
A strategy that passes all six checks is worth investigating further. A strategy that fails any of them deserves skepticism.
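The checklist can be expressed as a simple screening function, sketched below. Everything here is hypothetical: the dictionary keys, the function name, and especially the limits, which are personal tolerances and not recommended values. The equity-curve check stays a manual judgment, recorded as a boolean.

```python
def failed_checks(metrics, limits):
    """Return the names of the checks a strategy fails, in checklist order."""
    checks = {
        "equity curve": metrics["equity_curve_ok"],  # manual visual judgment
        "max drawdown": metrics["max_drawdown"] <= limits["max_drawdown"],
        "sharpe ratio": metrics["sharpe"] >= limits["min_sharpe"],
        "trade count": metrics["trade_count"] >= limits["min_trades"],
        "win rate + reward-to-risk": metrics["expectancy"] > 0,
        "drawdown duration": metrics["max_dd_days"] <= limits["max_dd_days"],
    }
    return [name for name, passed in checks.items() if not passed]


# Hypothetical tolerances and backtest results, for illustration only.
limits = {"max_drawdown": 0.20, "min_sharpe": 1.0, "min_trades": 30, "max_dd_days": 180}
metrics = {
    "equity_curve_ok": True,
    "max_drawdown": 0.12,
    "sharpe": 1.4,
    "trade_count": 18,
    "expectancy": 20.0,
    "max_dd_days": 95,
}
print(failed_checks(metrics, limits))  # ['trade count']
```

An empty list means the strategy is worth investigating further; any entry in the list is a reason for skepticism, not an automatic rejection.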