Performance Metrics & Scoring

June 16, 2026

Return metrics

Total return — the overall percentage gain or loss over the test period. The headline number, but never sufficient on its own.

Annualized return — total return normalized to a per-year basis, allowing comparison across strategies tested over different time periods.
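As a minimal sketch of the two definitions above, the following assumes compound (geometric) annualization and a test period measured in years. The function names and the sample figures are illustrative only and are not taken from any particular platform.

```python
def total_return(start_value: float, end_value: float) -> float:
    """Fractional gain or loss over the whole test period."""
    return end_value / start_value - 1.0


def annualized_return(total: float, years: float) -> float:
    """Compound per-year rate that would produce `total` over `years`."""
    return (1.0 + total) ** (1.0 / years) - 1.0


# Hypothetical backtest: 10,000 grows to 14,400 over 2 years.
tr = total_return(10_000.0, 14_400.0)
ar = annualized_return(tr, years=2.0)
print(f"total: {tr:.2%}, annualized: {ar:.2%}")  # total: 44.00%, annualized: 20.00%
```

Annualizing this way makes a 44% gain over two years directly comparable with, say, a 25% gain over a single year.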

Equity curve — the visual trajectory of portfolio value over time. A smooth, upward-sloping curve is more desirable than a jagged one that arrives at the same endpoint — the path matters, not just the destination.

Risk metrics

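Maximum drawdown — the largest peak-to-trough decline in portfolio value during the test period, usually expressed as a percentage of the peak. It is often treated as the most important risk metric because it shows the worst loss you would have had to sit through. A strategy that returns 50% but experiences a 40% drawdown along the way is very different from one that returns 30% with only a 10% drawdown.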
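One common way to compute maximum drawdown is to track a running high-water mark and record the largest percentage drop below it. The sketch below assumes the equity curve is a simple list of portfolio values; the sample numbers are made up for illustration.

```python
def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                     # running high-water mark
        worst = max(worst, (peak - value) / peak)   # drop from that mark
    return worst


# Hypothetical equity curve: the deepest decline is 120 -> 90.
equity = [100.0, 120.0, 90.0, 110.0, 150.0, 135.0]
mdd = max_drawdown(equity)
print(f"{mdd:.1%}")  # 25.0%
```

Note that the later dip from 150 to 135 is only 10% of its own peak, so it does not change the result.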

Maximum drawdown duration — how long the strategy spent in its deepest drawdown before recovering to a new equity high. Long drawdown durations test psychological endurance in live trading.
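The definition above can be sketched as follows. This is one possible reading, with assumptions: duration is counted in bars (rows of the equity series), "recovering" means equity first regains the level of the peak that preceded the deepest drawdown, and a drawdown that never recovers is counted up to the end of the data. Some tools instead report the longest time spent below any prior high, which can give a different number.

```python
def deepest_drawdown_duration(equity: list[float]) -> int:
    """Bars from the peak before the deepest drawdown until that peak is regained."""
    peak_value, peak_index = equity[0], 0
    worst_depth, worst_peak_index = 0.0, 0
    for i, value in enumerate(equity):
        if value >= peak_value:                      # new (or equal) equity high
            peak_value, peak_index = value, i
        depth = (peak_value - value) / peak_value
        if depth > worst_depth:                      # deepest drawdown so far
            worst_depth, worst_peak_index = depth, peak_index
    if worst_depth == 0.0:
        return 0                                     # never fell below a prior high
    target = equity[worst_peak_index]
    for j in range(worst_peak_index + 1, len(equity)):
        if equity[j] >= target:                      # first bar back at the old peak
            return j - worst_peak_index
    return len(equity) - 1 - worst_peak_index        # still underwater at the end


# Hypothetical curve: peak of 120 at bar 1, regained at bar 4.
equity = [100.0, 120.0, 90.0, 110.0, 150.0, 135.0]
bars = deepest_drawdown_duration(equity)
print(bars)  # 3
```

To express the result in calendar time, multiply by the bar length or index into the timestamps that accompany the equity series.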

Volatility — the standard deviation of periodic returns (for example daily or monthly), usually quoted on an annualized basis. It measures how much the equity curve fluctuates from period to period.
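A minimal sketch of the calculation, assuming simple (not log) returns, the sample standard deviation, and 252 trading periods per year for annualization. All three are conventions chosen for illustration, not requirements; the sample returns are made up.

```python
import math
import statistics


def periodic_returns(equity: list[float]) -> list[float]:
    """Simple return from each bar to the next."""
    return [later / earlier - 1.0 for earlier, later in zip(equity, equity[1:])]


def annualized_volatility(returns: list[float], periods_per_year: int = 252) -> float:
    """Sample standard deviation of returns, scaled by sqrt(periods per year)."""
    return statistics.stdev(returns) * math.sqrt(periods_per_year)


# Hypothetical daily returns.
returns = [0.01, -0.02, 0.015, 0.005, -0.01]
vol = annualized_volatility(returns)
print(f"annualized volatility: {vol:.1%}")
```

The square-root scaling assumes returns in different periods are roughly uncorrelated; for monthly data, `periods_per_year` would be 12.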

Risk-adjusted metrics

Sharpe ratio — return per unit of risk: the strategy's return minus the risk-free rate, divided by the volatility of returns. As a rule of thumb, an annualized Sharpe above 1.0 is generally considered good and above 2.0 excellent. Because it puts return and risk on a single scale, the Sharpe ratio allows direct comparison between strategies with different return and risk profiles.
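The formula in parentheses above can be sketched as below. The assumptions here are that returns and the risk-free rate are both per-period figures, that the sample standard deviation is used, and that the ratio is annualized with 252 periods per year. The input data is hypothetical.

```python
import math
import statistics


def sharpe_ratio(
    returns: list[float],
    risk_free_per_period: float = 0.0,
    periods_per_year: int = 252,
) -> float:
    """Mean excess return divided by its standard deviation, annualized."""
    excess = [r - risk_free_per_period for r in returns]
    spread = statistics.stdev(excess)
    if spread == 0.0:
        return float("nan")  # undefined when returns never vary
    return statistics.mean(excess) / spread * math.sqrt(periods_per_year)


# Hypothetical daily returns.
returns = [0.01, -0.005, 0.007, 0.002, -0.004, 0.008]
sharpe = sharpe_ratio(returns)
print(f"annualized Sharpe: {sharpe:.2f}")
```

A handful of observations like this is far too few for a meaningful estimate; the example only shows the mechanics.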

Sortino ratio — similar to Sharpe, but the denominator counts only downside volatility, meaning returns that fall below a target such as zero or the risk-free rate. A strategy with high upside volatility but low downside volatility will have a higher Sortino than Sharpe, reflecting that upside volatility is desirable.
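To illustrate the difference, the sketch below computes both ratios per period (not annualized) on a made-up return series with large gains and small losses. It assumes a target return of zero and the usual downside-deviation convention in which periods at or above the target contribute zero to the denominator.

```python
import math
import statistics


def sharpe_per_period(returns: list[float]) -> float:
    """Mean return over sample standard deviation (risk-free rate taken as zero)."""
    return statistics.mean(returns) / statistics.stdev(returns)


def sortino_per_period(returns: list[float], target: float = 0.0) -> float:
    """Mean excess return over downside deviation relative to `target`."""
    excess = [r - target for r in returns]
    shortfalls = [min(0.0, e) for e in excess]       # gains count as zero
    downside_dev = math.sqrt(sum(s * s for s in shortfalls) / len(shortfalls))
    if downside_dev == 0.0:
        return float("nan")  # undefined when no period falls below the target
    return statistics.mean(excess) / downside_dev


# Hypothetical series: big winners, small losers.
returns = [0.04, -0.005, 0.03, -0.004, 0.05, -0.006]
sharpe = sharpe_per_period(returns)
sortino = sortino_per_period(returns)
print(f"Sharpe: {sharpe:.2f}, Sortino: {sortino:.2f}")
```

On this series the Sortino comes out well above the Sharpe, because the large positive swings inflate the standard deviation but leave the downside deviation untouched.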

Trade statistics

Win rate — percentage of trades that were profitable. A common misconception is that higher win rates are always better. A strategy with a 30% win rate can be highly profitable if the average winner is much larger than the average loser.

Average win / average loss — the mean profit on winning trades versus the mean loss on losing trades. The ratio between these (reward-to-risk ratio) combines with win rate to determine overall profitability.
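The way win rate and reward-to-risk combine can be sketched as an expected profit per trade (often called expectancy). The figures below are hypothetical and ignore costs such as commissions and slippage; `avg_loss` is passed as a positive amount.

```python
def expectancy(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Expected profit per trade; avg_loss is a positive number."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss


def breakeven_win_rate(avg_win: float, avg_loss: float) -> float:
    """Win rate at which expectancy is exactly zero."""
    return avg_loss / (avg_win + avg_loss)


# Hypothetical strategy: wins only 30% of the time, but winners are much larger.
per_trade = expectancy(win_rate=0.30, avg_win=500.0, avg_loss=150.0)
needed = breakeven_win_rate(avg_win=500.0, avg_loss=150.0)
print(f"expectancy: {per_trade:.2f} per trade, breakeven win rate: {needed:.1%}")
# expectancy: 45.00 per trade, breakeven win rate: 23.1%
```

With a reward-to-risk ratio of roughly 3.3 to 1, this strategy only needs to win about 23% of its trades to break even, which is why a 30% win rate can still be profitable.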

Trade count — the total number of trades in the test period. Low trade counts (under 30 is a common rule of thumb) reduce statistical confidence in the results, because a handful of lucky or unlucky trades can dominate the outcome.
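One way to see the effect of sample size is to put a confidence interval around the observed win rate. The sketch below uses the Wilson score interval at roughly 95% confidence (z = 1.96). This is an illustrative choice rather than something prescribed here, and it assumes trades are independent, which real trades often are not.

```python
import math


def wilson_interval(wins: int, trades: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for the true win rate."""
    p = wins / trades
    denom = 1.0 + z * z / trades
    center = (p + z * z / (2 * trades)) / denom
    half = z * math.sqrt(p * (1 - p) / trades + z * z / (4 * trades ** 2)) / denom
    return center - half, center + half


# Same observed 60% win rate, very different sample sizes (hypothetical).
small = wilson_interval(wins=12, trades=20)
large = wilson_interval(wins=120, trades=200)
print(f"20 trades:  {small[0]:.0%} to {small[1]:.0%}")
print(f"200 trades: {large[0]:.0%} to {large[1]:.0%}")
```

With 20 trades the interval still includes a 50% win rate, so the apparent edge could be noise; with 200 trades the whole interval sits above 50%.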

Average trade duration — how long positions are held on average. This indicates whether the strategy is high-frequency (many short trades) or position-oriented (fewer, longer trades).

How to read metrics together

No single metric tells the full story. A useful evaluation framework:

1. Start with the equity curve — does the path look reasonable?
2. Check maximum drawdown — can you tolerate the worst-case decline?
3. Compare Sharpe ratio — is the return justified by the risk?
4. Verify trade count — is there enough statistical sample?
5. Check win rate + reward-to-risk — does the math work?
6. Look at drawdown duration — can you wait that long for recovery?

A strategy that passes all six checks is worth investigating further. A strategy that fails any of them deserves skepticism.
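The six checks can be sketched as a simple screening function. Everything here is hypothetical: the class and field names are invented for illustration, the equity-curve check is a human judgment passed in as a flag, and the drawdown limits are placeholders to be replaced with your own tolerances. Only the Sharpe threshold of 1.0 and the minimum of 30 trades echo figures mentioned earlier in this article.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    equity_curve_ok: bool          # set by a person after looking at the chart
    max_drawdown: float            # positive fraction, e.g. 0.40 for 40%
    sharpe: float
    trade_count: int
    win_rate: float
    avg_win: float
    avg_loss: float                # positive amount
    drawdown_duration_days: int


@dataclass
class Tolerances:
    max_drawdown: float = 0.25                 # placeholder, personal choice
    min_sharpe: float = 1.0
    min_trades: int = 30
    max_drawdown_duration_days: int = 180      # placeholder, personal choice


def failed_checks(m: Metrics, t: Tolerances) -> list[str]:
    """Names of the checks that did not pass, in the order listed above."""
    expectancy = m.win_rate * m.avg_win - (1.0 - m.win_rate) * m.avg_loss
    checks = [
        ("equity curve", m.equity_curve_ok),
        ("maximum drawdown", m.max_drawdown <= t.max_drawdown),
        ("Sharpe ratio", m.sharpe >= t.min_sharpe),
        ("trade count", m.trade_count >= t.min_trades),
        ("win rate + reward-to-risk", expectancy > 0.0),
        ("drawdown duration", m.drawdown_duration_days <= t.max_drawdown_duration_days),
    ]
    return [name for name, passed in checks if not passed]


# Hypothetical candidate: solid on most measures, but a 40% drawdown.
candidate = Metrics(True, 0.40, 1.3, 120, 0.30, 500.0, 150.0, 90)
failed = failed_checks(candidate, Tolerances())
print(failed or "passes all six checks")  # ['maximum drawdown']
```

A screen like this only flags candidates for closer inspection; it does not replace reading the metrics together.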

Matthew J. Goss, Jr.

Retired COMEX/NYMEX floor trader, Goldman Sachs and FlexTrade Systems alumnus, multi-instrumentalist, published author, and independent mathematics researcher. Founder of Quantiterate.