
backtesting · 10 min read

Monte Carlo Simulation: Stress-Testing a Strategy Against 1,000 Futures

A single backtest shows one history. Monte Carlo shows the thousand histories that could have happened — and tells you whether your strategy survives bad luck.

By Quantinger Research

The Problem With One Backtest

A backtest gives you exactly one number for each metric: one total return, one max drawdown, one Sharpe ratio. It tells you what did happen on one specific sequence of historical trades.

But that sequence was partly luck. Your winners and losers arrived in a particular order. If the same trades had arrived in a different order (the same wins and losses, just shuffled), your equity curve could look very different. The drawdowns would land in different places. The worst losing streak might have hit when your account was smaller, doing far more damage.

A backtest can't show you this. It shows one path through history. Monte Carlo simulation shows you the thousand paths that were equally possible — and reveals whether your strategy's success depended on a lucky ordering or whether it has genuine, robust edge.

What Monte Carlo Actually Does

The core idea is resampling. You take your strategy's real trade results — the actual sequence of wins and losses from your backtest — and you reshuffle them thousands of times, building a new equity curve each time.

Each shuffle represents an alternate history: the same strategy, the same edge, the same individual trade outcomes, but arriving in a different order. Some shuffles will produce smooth equity curves. Others will cluster the losses early, producing brutal drawdowns. Others will cluster wins early, producing a deceptively smooth ride.
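As a rough illustration of the shuffle, here is a minimal Python sketch. It assumes each trade is recorded as a fractional return on equity and compounded from a starting balance of 1.0; the trade list and the helper names (equity_curve, max_drawdown, shuffled_drawdowns) are hypothetical and not taken from any particular backtester.

```python
import random

def equity_curve(trade_returns, start=1.0):
    """Compound per-trade fractional returns into an equity curve."""
    curve = [start]
    for r in trade_returns:
        curve.append(curve[-1] * (1.0 + r))
    return curve

def max_drawdown(curve):
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    peak = curve[0]
    worst = 0.0
    for value in curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def shuffled_drawdowns(trade_returns, n_sims=1000, seed=0):
    """Max drawdown of each alternate history produced by reshuffling."""
    rng = random.Random(seed)
    trades = list(trade_returns)  # copy so the caller's list is not reordered
    results = []
    for _ in range(n_sims):
        rng.shuffle(trades)
        results.append(max_drawdown(equity_curve(trades)))
    return results

# Hypothetical per-trade returns (assumption: fraction of equity per trade).
trades = [0.04, -0.02, 0.03, -0.02, 0.05, -0.03, 0.02, -0.02, 0.04, -0.01]
drawdowns = shuffled_drawdowns(trades)
print(f"mildest: {min(drawdowns):.1%}, harshest: {max(drawdowns):.1%}")
```

Under this sizing assumption every shuffle ends at the same balance. What changes from one shuffle to the next is the path, and therefore the drawdown.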

After a thousand simulations, you have a distribution — not a single number, but a range of possible outcomes with probabilities attached. Now you can ask the questions that actually matter:

  • In the worst 5% of possible orderings, how bad does the drawdown get?
  • What's the probability the strategy ends a year profitable?
  • How wide is the range of likely returns?

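This is the difference between "my strategy made 50%" and "my strategy makes between 20% and 80% in 90% of possible futures, with a 5% chance of a drawdown exceeding 35%." The second statement is honest. The first is a single lucky (or unlucky) draw presented as destiny. One caveat: if you simply reorder the full trade list under a consistent sizing rule (a fixed amount or a fixed fraction per trade), every path ends at the same total return. A spread of returns only appears when you resample with replacement or simulate a shorter horizon than the full history. The drawdown distribution varies in either case.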
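To make the step from raw simulations to a probability statement concrete, here is a small sketch. It assumes you already have one final return and one maximum drawdown per simulation; the function names and the five sample outcomes are hypothetical, and the percentile helper uses simple linear interpolation between ranks.

```python
def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between ranks."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

def summarize(final_returns, max_drawdowns):
    """Turn simulated outcomes into the headline probability statements."""
    n = len(final_returns)
    return {
        "return_p05": percentile(final_returns, 5),
        "return_p50": percentile(final_returns, 50),
        "return_p95": percentile(final_returns, 95),
        "prob_profitable": sum(r > 0 for r in final_returns) / n,
        # Drawdowns are stored as positive magnitudes, so the bad tail
        # is the HIGH end of the list (95th percentile).
        "drawdown_worst_5pct": percentile(max_drawdowns, 95),
    }

# Hypothetical outcomes from five simulations (a real run would use 1,000+).
final_returns = [-0.10, 0.05, 0.20, 0.35, 0.50]
max_drawdowns = [0.30, 0.22, 0.15, 0.12, 0.08]
stats = summarize(final_returns, max_drawdowns)
for name, value in stats.items():
    print(f"{name}: {value:.1%}")
```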

The Two Flavors of Resampling

There are two main ways to run the simulation, and they answer slightly different questions.

Trade-order resampling (shuffling). You keep your exact set of trades but randomize their order. This tests sequence risk: whether your results depend on a fortunate ordering. It preserves the actual win rate, the actual size of each win and loss, and therefore the total return, changing only when the trades occur. This is the most common and most intuitive method.

Bootstrap resampling (sampling with replacement). Instead of just reordering, you draw trades randomly from your historical set with replacement — meaning the same trade can appear multiple times, and some won't appear at all. This creates synthetic trade sequences that could plausibly come from the same underlying strategy. It tests a broader question: across many plausible samples of this strategy's behavior, what's the range of outcomes?
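A minimal sketch of the bootstrap variant, under the same assumption that trades are fractional returns compounded from 1.0. The trade list and the function name bootstrap_final_returns are hypothetical; random.choices from the standard library does the sampling with replacement.

```python
import random

def bootstrap_final_returns(trade_returns, n_sims=1000, seed=0):
    """Final return of each synthetic history drawn with replacement."""
    rng = random.Random(seed)
    trades = list(trade_returns)
    finals = []
    for _ in range(n_sims):
        # Same number of trades as the original, but any trade may repeat
        # and some may be missing entirely.
        sample = rng.choices(trades, k=len(trades))
        equity = 1.0
        for r in sample:
            equity *= 1.0 + r
        finals.append(equity - 1.0)
    return finals

# Hypothetical per-trade returns (assumption: fraction of equity per trade).
trades = [0.04, -0.02, 0.03, -0.02, 0.05, -0.03, 0.02, -0.02, 0.04, -0.01]
finals = bootstrap_final_returns(trades)
print(f"worst: {min(finals):.1%}, best: {max(finals):.1%}")
```

Unlike a pure shuffle, the final return differs from run to run here, because each synthetic history contains a different mix of the original trades.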

Both are valid. Shuffling is stricter about preserving your actual results; bootstrapping explores a wider space of possibilities. A thorough analysis runs both.

Reading a Monte Carlo Fan Chart

The standard visualization is a "fan chart" — many faint equity curves overlaid, fanning out from the starting point. The spread of the fan shows the range of possible outcomes. A tight fan means your strategy's outcome is fairly predictable regardless of ordering. A wide fan means luck plays a large role.

The key lines on the chart are the percentile bands:

  • The median (50th percentile) line is your "expected" path — half of simulations did better, half did worse.
  • The 5th percentile band is your bad-luck scenario — only 5% of orderings produced a worse result. This is what you should plan around.
  • The 95th percentile is your good-luck ceiling — don't expect it.
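The percentile bands can be computed by slicing across the simulated curves at each trade index. The sketch below is one way to do that, assuming bootstrap-generated curves of equal length; all names and the trade list are hypothetical, and plotting is left out.

```python
import random

def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between ranks."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

def bootstrap_curves(trade_returns, n_sims=1000, seed=0):
    """Simulated equity curves, each starting at 1.0."""
    rng = random.Random(seed)
    trades = list(trade_returns)
    curves = []
    for _ in range(n_sims):
        equity, curve = 1.0, [1.0]
        for r in rng.choices(trades, k=len(trades)):
            equity *= 1.0 + r
            curve.append(equity)
        curves.append(curve)
    return curves

def percentile_bands(curves, qs=(5, 50, 95)):
    """For each trade index, take percentiles across all simulated curves."""
    steps = list(zip(*curves))  # steps[t] holds every curve's equity at step t
    return {q: [percentile(step, q) for step in steps] for q in qs}

# Hypothetical per-trade returns (assumption: fraction of equity per trade).
trades = [0.04, -0.02, 0.03, -0.02, 0.05, -0.03, 0.02, -0.02, 0.04, -0.01]
bands = percentile_bands(bootstrap_curves(trades))
print(f"final equity, 5th/50th/95th: "
      f"{bands[5][-1]:.3f} / {bands[50][-1]:.3f} / {bands[95][-1]:.3f}")
```

Note that these bands are pointwise: the 5th-percentile line is not one simulated path, but the 5th-percentile equity at each step taken separately.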

The single most important number to extract is the 5th-percentile maximum drawdown: the drawdown that only the worst 5% of simulations exceed. (If you record drawdowns as positive magnitudes, that is the 95th percentile of the list.) This answers: "If I'm unlucky but not catastrophically so, how much of my account should I expect to lose at the worst point?" If that number is bigger than you can stomach, the strategy is too risky for you regardless of its average return.

Why This Catches Overfit Strategies

Monte Carlo has a powerful side effect: it exposes strategies whose success was fragile.

A genuinely robust strategy produces a reasonably tight fan of outcomes, most of them profitable. The edge shows up across orderings because the edge is real.

An overfit strategy, one that captured the specific noise of one historical period, often falls apart under resampling. Its backtested success depended on a particular sequence of fortunate trades. Resample that sequence and the drawdowns balloon, the favorable clustering of wins disappears, and a meaningful fraction of simulations end in ruin. If 20% of your Monte Carlo paths lead to a 50%+ drawdown, your "great backtest" was riding luck.
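The check described above reduces to counting paths. A short sketch, assuming you already have one maximum drawdown per simulated path (stored as a positive fraction); the function name and the ten sample values are hypothetical.

```python
def fraction_exceeding(max_drawdowns, threshold):
    """Share of simulated paths whose max drawdown is at least `threshold`."""
    return sum(dd >= threshold for dd in max_drawdowns) / len(max_drawdowns)

# Hypothetical max drawdowns from ten simulated paths.
max_drawdowns = [0.12, 0.55, 0.18, 0.25, 0.61, 0.09, 0.33, 0.21, 0.15, 0.28]

ruin_share = fraction_exceeding(max_drawdowns, 0.50)  # 2 of the 10 paths
# The 20% cut-off mirrors the example in the text; it is not a standard.
fragile = ruin_share >= 0.20
print(f"{ruin_share:.0%} of paths hit a 50%+ drawdown; fragile = {fragile}")
```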

This is why Monte Carlo belongs in every validation workflow alongside walk-forward analysis. Walk-forward tests whether the edge generalizes across time. Monte Carlo tests whether it survives across sequence. Together they catch many of the ways a backtest lies.

The Limits — Be Honest About Them

Monte Carlo is powerful but not omniscient, and using it well means knowing what it can't do.

It assumes your trades are independent. Standard resampling treats each trade as an independent draw from the same distribution. But real markets have regimes: winning and losing trades sometimes cluster because market conditions cluster. If your strategy has serial correlation (losses tend to follow losses), naive resampling underestimates real-world drawdown risk. More sophisticated block-bootstrap methods, which resample runs of consecutive trades rather than single trades, preserve some of this clustering.
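One common form is the moving-block bootstrap: draw contiguous runs of trades from the original sequence and stitch them together. The sketch below is a minimal version of that idea; the block size of 3, the trade list, and the function name are assumptions for illustration, and choosing a block size for real data takes more care.

```python
import random

def block_bootstrap(trade_returns, block_size, rng):
    """Resample contiguous blocks so short-range clustering is kept."""
    trades = list(trade_returns)
    n = len(trades)
    if not 1 <= block_size <= n:
        raise ValueError("block_size must be between 1 and len(trade_returns)")
    sample = []
    while len(sample) < n:
        # Pick a random start so the whole block fits inside the history.
        start = rng.randrange(n - block_size + 1)
        sample.extend(trades[start:start + block_size])
    return sample[:n]  # trim the last block to the original length

# Hypothetical per-trade returns with a losing streak in positions 1-3.
trades = [0.03, -0.02, -0.025, -0.01, 0.04, 0.05, 0.02, -0.015, 0.035, -0.03]
rng = random.Random(0)
synthetic = block_bootstrap(trades, block_size=3, rng=rng)
print(synthetic)
```

Within each block the original order is intact, so a losing streak shorter than the block length can reappear whole in the synthetic history instead of being scattered.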

It can't invent outcomes your strategy never produced. If your backtest period never contained a 2008-style crash, Monte Carlo resampling of those trades won't either. It explores reorderings of what happened, not entirely new market regimes. A strategy that looks bulletproof in Monte Carlo can still be destroyed by a market event outside its historical sample.

Garbage in, garbage out. If your underlying backtest is flawed (unrealistic fills, no slippage, look-ahead bias), Monte Carlo just produces a thousand flawed futures. It inherits the quality of your backtest; it doesn't fix it.

How to Use It in Practice

A disciplined workflow:

  1. Build and walk-forward-validate your strategy first. Monte Carlo on an unvalidated strategy is premature.
  2. Run trade-order resampling (1,000+ simulations) on the out-of-sample results.
  3. Extract the 5th-percentile max drawdown and the probability of a profitable outcome.
  4. Size your position so that even the 5th-percentile drawdown is survivable — not the median, the 5th percentile.
  5. If a meaningful fraction of paths lead to ruin, reject the strategy regardless of its average return.

The trader who sizes for the median outcome gets destroyed by the bad-luck ordering that eventually arrives. The trader who sizes for the 5th percentile survives it and stays in the game. Monte Carlo is how you find that number before the market finds it for you.
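Step 4 of the workflow can be sketched as a search over position size. The version below assumes that scaling position size scales every trade's return linearly (no margin or liquidity effects), uses trade-order shuffling, and tries a coarse grid of size multipliers; the trade list, the 15% drawdown limit, and all function names are hypothetical.

```python
import random

def max_drawdown_of(trade_returns):
    """Max peak-to-trough decline of the compounded equity path."""
    equity = peak = 1.0
    worst = 0.0
    for r in trade_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst

def tail_drawdown(trade_returns, scale, n_sims=1000, seed=0):
    """Drawdown exceeded by only about 5% of shuffles at a given size."""
    rng = random.Random(seed)
    trades = [r * scale for r in trade_returns]  # assumption: linear scaling
    results = []
    for _ in range(n_sims):
        rng.shuffle(trades)
        results.append(max_drawdown_of(trades))
    results.sort()
    return results[int(0.95 * (len(results) - 1))]

def largest_safe_scale(trade_returns, dd_limit, step=0.05):
    """Largest size multiplier on the grid whose tail drawdown fits the limit."""
    n_steps = round(1.0 / step)
    for i in range(n_steps, 0, -1):  # try full size first, then smaller
        scale = i * step
        if tail_drawdown(trade_returns, scale) <= dd_limit:
            return scale
    return 0.0  # nothing on the grid is survivable

# Hypothetical full-size trade returns and a 15% tolerable drawdown.
trades = [0.08, -0.06, 0.10, -0.05, 0.07, -0.08, 0.09, -0.04, 0.06, -0.07] * 3
scale = largest_safe_scale(trades, dd_limit=0.15)
print(f"size multiplier: {scale:.2f}")
```

The search reruns the simulation at each candidate size instead of dividing the limit by the full-size drawdown, because compounded drawdowns do not shrink exactly in proportion to position size.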

The Bottom Line

A backtest is one story. Monte Carlo is the distribution of stories that were equally likely. It converts "my strategy made X%" into an honest probability statement about what could happen — including how bad the bad cases get. For anyone risking real capital, that honesty is the difference between informed risk and blind hope.


Run Monte Carlo on your own strategies: Quantinger's backtester includes Monte Carlo fan charts with percentile bands and drawdown distributions built in.