Methodology

FXOptimize is an audit-grade research tool. Same input + same seed produces the same output, every time. Below is what each number means and how it's computed — so you can decide for yourself whether to trust it.

TL;DR

Lower bound, not best case

Most portfolio-analysis and propfirm-pass-rate tools rank by the point estimate — the single most likely number. That's gameable (cherry-pick a favorable backtest, claim 95% pass rate) and brittle (small samples inflate point estimates).

FXOptimize instead reports the lower bound of a 95% bootstrap confidence interval. If two strategies tie on point estimate but one has a wider CI, the narrower-CI strategy ranks higher because its evidence is more stable.

Concrete example. Strategy A has CI [40%, 95%] (point estimate 67.5%). Strategy B has CI [55%, 90%] (point estimate 72.5%). Even though A's point estimate looks comparable, FXOptimize ranks B as the stronger candidate — its floor is 15 percentage points higher. The lower bound is what survives the worst trial in the data.
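The ranking rule above can be sketched in a few lines of Python (the shipped engine is closed-source Rust, so this is illustrative only; the strategy names and tuples are hypothetical):

```python
# Illustrative only: rank candidates by the lower bound of the 95% CI,
# not by the point estimate. CI bounds are expressed as fractions.
strategies = {
    "A": (0.40, 0.95),  # CI [40%, 95%], point estimate 67.5%
    "B": (0.55, 0.90),  # CI [55%, 90%], point estimate 72.5%
}

# Sort descending by CI lower bound: the floor that survives the worst trial.
ranked = sorted(strategies, key=lambda s: strategies[s][0], reverse=True)
print(ranked)  # B ranks first: its floor is 15 percentage points higher
```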

Walk-forward windows

For each metric, FXOptimize slides a fixed-size window across the user's MT4/MT5 backtest with stride = window/2. Window length matches the relevant time horizon — for propfirm pass-rate it matches the firm's evaluation period (e.g. 30 days for FTMO 2-step Phase 1); for portfolio drawdown it's user-configurable.

A 1-year backtest yields ~24 windows. A 2-year backtest yields ~48. For unlimited-time challenges (FundedNext, FundingPips, FXIFY, Goat) we cap at 60 days per window — long enough to compound, short enough that the bootstrap has enough samples to estimate the tails.
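The window arithmetic can be reproduced with a short sketch (illustrative Python, not the engine's Rust; the function name is hypothetical):

```python
def walk_forward_starts(n_days: int, window: int) -> list[int]:
    """Start offsets of fixed-size sliding windows with stride = window // 2."""
    stride = window // 2
    return list(range(0, n_days - window + 1, stride))

# A 1-year backtest with a 30-day window and 15-day stride:
starts = walk_forward_starts(365, 30)
print(len(starts))  # 23 windows, i.e. the "~24" quoted above
```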

Sliding windows beat single-period validation because they capture regime changes. A strategy that was profitable Q1-Q3 2023 but blew up in Q4 needs to surface that volatility in the CI; one big window hides it, many small windows expose it.

Monte Carlo trade-shuffling

Within each window, FXOptimize replays the trades 1,000 times with within-day trade order shuffled. Within-day shuffling preserves daily clustering (so daily DD remains realistic) but captures path-dependence: the same trades in different intraday order can pass or fail daily-DD rules differently.

The window result is the fraction of replays that survived every rule (for propfirm pass-rate) or the distribution of drawdown outcomes (for portfolio analysis). The per-window results form a sample, which we then bootstrap.

What we do not shuffle. Trades are NOT shuffled across days — that would destroy the realism of daily-DD evaluation. We shuffle within-day only. For grid/martingale/recovery EAs that depend on cross-day position state, the within-day-exchangeability assumption is violated, and FXOptimize flags those EAs and excludes them from primary-match selection rather than producing a misleading number.
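One replay of the within-day shuffle can be sketched as follows (illustrative Python, assuming trades are `(day, pnl)` pairs; the real engine operates on full MT4/MT5 trade records):

```python
import random
from collections import defaultdict

def shuffle_within_days(trades, seed):
    """One Monte Carlo replay: permute trade order inside each day,
    never across days, so daily clustering and daily-DD checks stay intact."""
    rng = random.Random(seed)
    by_day = defaultdict(list)
    for day, pnl in trades:
        by_day[day].append(pnl)
    replay = []
    for day in sorted(by_day):   # day order is preserved
        pnls = by_day[day][:]
        rng.shuffle(pnls)        # only intraday order changes
        replay.extend((day, p) for p in pnls)
    return replay

trades = [(1, -50), (1, 120), (1, 30), (2, 80), (2, -20)]
replay = shuffle_within_days(trades, seed=42)
# Each day keeps the same multiset of trades; only intraday order moves.
```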

Bootstrap 95% confidence intervals

To estimate uncertainty around the mean of those per-window results, FXOptimize bootstraps the sample 5,000 times: resample N windows with replacement, compute the mean of each resample, then take the 2.5th and 97.5th percentiles of the resampled means. That is the 95% confidence interval.
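The percentile-bootstrap step is simple enough to verify by hand; here is a minimal Python sketch (the engine is Rust, so this only mirrors the procedure described above):

```python
import random
import statistics

def bootstrap_ci(window_results, n_boot=5000, seed=42):
    """95% percentile-bootstrap CI for the mean of per-window results."""
    rng = random.Random(seed)
    n = len(window_results)
    means = sorted(
        statistics.fmean(rng.choices(window_results, k=n))  # resample w/ replacement
        for _ in range(n_boot)
    )
    lo = means[int(0.025 * n_boot)]   # 2.5th percentile of resampled means
    hi = means[int(0.975 * n_boot)]   # 97.5th percentile
    return lo, hi
```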

The CI captures sampling uncertainty across walk-forward windows. Narrower CIs mean tighter agreement across the full backtest period. Wide CIs mean the metric is unstable across regimes — a signal worth surfacing, not hiding.

Sample-size guard

FXOptimize requires at least 12 walk-forward windows of data. Below that, the bootstrap CI is too unstable to be useful — too few unique resamples to estimate the tails. Rather than producing a misleading high-but-uncertain estimate, the report returns NoneViable with a warning that explains why.
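The guard amounts to a hard precondition before any bootstrap runs; a sketch (illustrative Python, hypothetical return shape):

```python
MIN_WINDOWS = 12  # below this, the bootstrap tails are too unstable to trust

def analyze(window_results):
    """Refuse to report a number when the sample is too thin."""
    if len(window_results) < MIN_WINDOWS:
        return {"status": "NoneViable",
                "warning": f"Only {len(window_results)} windows; "
                           f"need >= {MIN_WINDOWS} for a stable CI."}
    return {"status": "OK",
            "mean": sum(window_results) / len(window_results)}
```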

This is the brand commitment in math form. The brave UX is "we tell you the data is too thin" — not "we'll give you a number anyway."

Failure-mode attribution

For propfirm pass-rate, when a single firm-side rule (weekend holding, news trading, max position size) accounts for ≥70% of failed iterations, the report surfaces it as Rule mismatch instead of a misleading 0% CI-low. The portfolio is structurally incompatible with that firm — not statistically inferior.

The report also identifies the worst-EA contributor: which EA in the portfolio was the largest negative contributor on failure-causing days across the most failed iterations. This makes the failure attribution actionable — you know which EA to drop, not just that the portfolio failed.
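Both attributions reduce to counting over the failed replays; a sketch under the assumption that each failed iteration records its breached rule and its worst-contributing EA (names and data shapes are hypothetical):

```python
from collections import Counter

RULE_MISMATCH_SHARE = 0.70  # threshold from the methodology above

def attribute_failures(failed_iterations):
    """failed_iterations: list of (breached_rule, worst_ea) per failed replay.
    Returns a verdict plus the EA most often the worst contributor."""
    rules = Counter(rule for rule, _ in failed_iterations)
    eas = Counter(ea for _, ea in failed_iterations)
    top_rule, top_count = rules.most_common(1)[0]
    if top_count / len(failed_iterations) >= RULE_MISMATCH_SHARE:
        verdict = ("rule_mismatch", top_rule)   # structural, not statistical
    else:
        verdict = ("statistical", None)
    return verdict, eas.most_common(1)[0][0]

fails = [("weekend_holding", "GridEA")] * 8 + [("daily_dd", "TrendEA")] * 2
verdict, worst_ea = attribute_failures(fails)
# 8 of 10 failures share one rule (80% >= 70%), so: rule mismatch, drop GridEA.
```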

Reproducibility

Every randomness source is seeded with rand_chacha, which is deterministic across platforms. The default seed is 42. Same input + same seed = bit-identical report.
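The seed contract can be demonstrated in any language; the engine uses Rust's rand_chacha, but this Python sketch illustrates the same guarantee (the `report` function is a stand-in, not the real engine):

```python
import random

def report(seed=42):
    """Stand-in for the engine: all randomness flows from one seeded RNG."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(5)]

assert report(42) == report(42)   # same seed -> identical output
assert report(42) != report(43)   # different seed -> different output
```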

The engine is implemented in Rust, compiled to WebAssembly (~300 KB bundle), and runs entirely in your browser. Trade data never leaves your device. The Verified Badge generator embeds the high-level summary (CI bounds, master seed, issued date) in the URL fragment, which browsers do not transmit to servers.

What FXOptimize does not do

This is the part most tools skip. We don't.

Audit trail

Every report stamps the engine version, the rule-set version, and the master seed.

If two reports were generated on the same input but show different numbers, the engine version or rule-set version moved. The version log is in the blog.

Open methodology, closed source

The methodology is fully documented — what's described above is the entire flow. The implementation is closed source (this is how the project sustains itself), but every claim above is verifiable: run the same backtest twice with the same seed and compare. Run a known-good portfolio and verify the CI lower bound matches what you'd compute by hand on a reasonable subset.

If you find a methodology gap or a number that looks off, email [email protected] with the input + the output and I'll dig in. Real reply, usually within 24 hours.

Read a sample report →
Try the analyzer free →
About the founder →