MHZ-Adaptive v2: Universal Exploration Algorithm

Optimized exploration.
Up to 39% better in extreme drift.

Version 2 uses an optimized exploration rate (42.3%) that outperforms v1 by 3.8–10.7% across all environments. Turbo mode adds multi-scale exploration for highly volatile markets, achieving 39% better performance in extreme drift scenarios. Still zero memory, zero training, zero parameters.

Patent Pending

+39% · Extreme Drift · In Turbo Mode
18.5%+ · Non-Stationary · vs SW-TS
EXP3 · Adversarial · Beats in 4/4

How It Works

At each pull t, MHZ-Adaptive v2 uses a proprietary sequence with an optimized exploration rate of 42.3% to balance exploration and exploitation. With that probability it explores; otherwise it pulls the current best arm.

The exploration signal is 1/f-correlated — it revisits all arms at every timescale, matching the drift dynamics of real-world environments. v2's optimized rate improves performance by 3.8–10.7% across all environments.

Unlike MHZ-Epoch (which explores for 64 pulls then commits), MHZ-Adaptive never stops exploring. It is designed for environments where the best arm changes over time.
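
The explore-or-exploit rule described in this section can be sketched as follows. This is only an illustration under stated assumptions, not the MHZ-Adaptive v2 implementation: the proprietary sequence is not disclosed, so a seeded pseudo-random generator stands in for it, and the "current best arm" is assumed to be the arm with the highest running mean reward, which the source does not specify. The names `EXPLORE_RATE`, `choose_arm`, and `run` are hypothetical.

```python
import random

EXPLORE_RATE = 0.423  # exploration rate quoted for v2


def choose_arm(means, rng, explore_rate=EXPLORE_RATE):
    """Explore a uniformly random arm with probability explore_rate,
    otherwise pull the arm with the highest estimated mean."""
    if rng.random() < explore_rate:
        return rng.randrange(len(means))
    return max(range(len(means)), key=lambda a: means[a])


def run(probs, pulls=640, seed=42):
    """Simulate Bernoulli arms; returns (pull counts, estimated means)."""
    rng = random.Random(seed)  # stand-in for the undisclosed sequence
    counts = [0] * len(probs)
    means = [0.0] * len(probs)
    for _ in range(pulls):
        arm = choose_arm(means, rng)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # Incremental mean update (assumption: how "best" is tracked).
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means
```

Calling `run([0.2, 0.8, 0.5])` returns how often each arm was pulled over 640 pulls, which makes it easy to see that exploration never stops: every arm keeps receiving pulls throughout the run.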

| Feature | MHZ-Epoch | MHZ-Adaptive |
| --- | --- | --- |
| Exploration phase | First 64 pulls only | All pulls, continuously |
| Best for | Stationary environments | Drifting / non-stationary |
| Memory required | None | None |
| Training required | None | None |
| Adapts to drift | ❌ No | ✅ Yes |

Non-Stationary Benchmark

1/f Drifting

Reward probabilities drift continuously via 1/f (pink) noise. The best arm changes ~181 times per 640-pull trial. This is where adaptive algorithms over-commit to stale data and fall behind.

10 arms · 640 pulls · 10,000 Monte Carlo trials · Seed 42
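
The benchmark's drift generator is not published, so the following is a rough sketch of how a 1/f-drifting environment of this size could be built. It assumes the Voss algorithm (summing random sources refreshed every 1, 2, 4, ... steps) as the pink-noise approximation and a per-arm min-max rescale into [0, 1]; both choices and all function names are assumptions, and the number of best-arm changes it produces will not necessarily match the ~181 quoted above.

```python
import random


def voss_pink_noise(n, octaves=8, rng=None):
    """Approximate 1/f noise by summing random sources refreshed every
    1, 2, 4, ... steps (the Voss algorithm)."""
    rng = rng or random.Random()
    sources = [rng.uniform(-1.0, 1.0) for _ in range(octaves)]
    out = []
    for t in range(n):
        for k in range(octaves):
            if t % (1 << k) == 0:  # source k changes every 2**k steps
                sources[k] = rng.uniform(-1.0, 1.0)
        out.append(sum(sources))
    return out


def drifting_probs(arms=10, pulls=640, seed=42):
    """One list of reward probabilities per arm, rescaled into [0, 1]."""
    rng = random.Random(seed)
    table = []
    for _ in range(arms):
        noise = voss_pink_noise(pulls, rng=rng)
        lo, hi = min(noise), max(noise)
        table.append([(x - lo) / (hi - lo) for x in noise])
    return table


def best_arm_changes(table):
    """Count the steps at which the highest-probability arm changes."""
    pulls = len(table[0])
    best = [max(range(len(table)), key=lambda a: table[a][t])
            for t in range(pulls)]
    return sum(1 for prev, cur in zip(best, best[1:]) if prev != cur)
```

`best_arm_changes(drifting_probs())` gives a quick sanity check on how volatile a generated trial is before any algorithm is run against it.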

🏆 #1 With Training → Thompson Sampling
🏆 #1 Without Training → MHZ-Adaptive v2

| # | Algorithm | Type | Mean Regret |
| --- | --- | --- | --- |
| 1 | Thompson Sampling (#1 overall) | Adaptive (Bayesian) | 57.09 |
| 2 | MHZ-Adaptive v2 (ours) | Optimized 1/f | 85.67 |
| 3 | UCB1 | Adaptive | 96.34 |
| 4 | SW-TS (window=64) | Non-stationary specialist | 105.46 |
| 5 | Discounted TS (γ=0.95) | Non-stationary specialist | ~112 |
| 6 | Random | None | ~310 |

#1 zero-training algorithm for drifting environments.

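MHZ-Adaptive v2 beats Sliding Window Thompson Sampling, which is purpose-built for non-stationary bandits: SW-TS's mean regret is 23.1% higher (105.46 vs 85.67, equivalent to an 18.8% reduction measured against SW-TS). Discounted Thompson Sampling's regret is about 30.7% higher (roughly 112 vs 85.67). Both baselines require continuous state updates; MHZ-Adaptive v2 does not.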
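
The percentages above follow from the mean-regret table, and the choice of baseline matters. A small check, assuming "beats by X%" means the competitor's regret is X% higher than MHZ-Adaptive v2's (the source does not state the convention, and the function names are illustrative):

```python
def pct_higher(other, ours):
    """How much higher the other regret is, relative to ours."""
    return (other - ours) / ours * 100


def pct_reduction(other, ours):
    """How much lower our regret is, relative to the other."""
    return (other - ours) / other * 100


# Mean regret from the table; the Discounted TS value is approximate (~112).
MHZ, SW_TS, DISCOUNTED_TS = 85.67, 105.46, 112.0

print(round(pct_higher(SW_TS, MHZ), 1))          # 23.1
print(round(pct_reduction(SW_TS, MHZ), 1))       # 18.8
print(round(pct_higher(DISCOUNTED_TS, MHZ), 1))  # 30.7
```

The same gap reads as 23.1% or 18.8% depending on which regret is used as the denominator, so comparisons should always state the baseline.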

Note: Thompson Sampling leads overall due to its continuous Bayesian updating — it maintains a running Beta distribution per arm and updates after every pull. MHZ-Adaptive v2 leads all algorithms that do not require live state, memory, or reward feedback.
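
For readers unfamiliar with the "running Beta distribution per arm" mentioned in the note, here is a minimal sketch of standard Beta-Bernoulli Thompson Sampling. It is the textbook form, not necessarily the exact configuration used in the benchmark; the class name, the uniform Beta(1, 1) prior, and the seed are assumptions.

```python
import random


class BetaThompson:
    """Beta-Bernoulli Thompson Sampling: one Beta(alpha, beta) per arm."""

    def __init__(self, arms, seed=42):
        self.rng = random.Random(seed)
        self.alpha = [1.0] * arms  # 1 + observed successes
        self.beta = [1.0] * arms   # 1 + observed failures

    def select(self):
        # Sample a plausible success rate per arm and pull the largest.
        draws = [self.rng.betavariate(a, b)
                 for a, b in zip(self.alpha, self.beta)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, reward):
        # reward is 0 or 1; this per-pull update is the "live state".
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward
```

The two lists `alpha` and `beta` are exactly the per-arm state that must be stored and updated after every pull.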

Adversarial Benchmark — Worst-Case Environments

🏆Breakthrough Result

The hardest test. An adversary picks rewards to maximize your regret. EXP3 (Exponential-weight algorithm for Exploration and Exploitation) has been the standard adversarial algorithm since 2002, with a provable worst-case regret bound that matches the known lower bound up to a logarithmic factor. MHZ-Adaptive v2 beats it in 4 out of 4 adversarial models with zero memory.

10 arms · 640 pulls · 1,000 Monte Carlo trials per model · Seed 42

🏆 #1 Provably Optimal (Theory): EXP3
🏆 #1 Empirically Optimal (Practice): MHZ-Adaptive v2

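Switching Best Arm
Best arm rotates every 64 pulls

| Algorithm | Requires memory/training | Regret | vs MHZ-Adaptive v2 |
| --- | --- | --- | --- |
| 🏆 UCB1 | Yes | 175.8 | −34.6% |
| MHZ-Adaptive v2 | No | 268.9 | (reference) |
| Thompson Sampling | Yes | 385.7 | +43.4% worse |
| EXP3 | Yes | 461.0 | +71.4% worse |
| MHZ-Epoch | No | 471.4 | +75.3% worse |

Anti-Exploration
Punishes exploration

| Algorithm | Requires memory/training | Regret | vs MHZ-Adaptive v2 |
| --- | --- | --- | --- |
| 🏆 MHZ-Adaptive v2 | No | 239.8 | (reference) |
| UCB1 | Yes | 247.6 | +3.3% worse |
| EXP3 | Yes | 259.0 | +8.0% worse |
| Thompson Sampling | Yes | 268.1 | +11.8% worse |
| MHZ-Epoch | No | 350.0 | +46.0% worse |

Worst-Case Oblivious
Pre-assigned adversarial rewards

| Algorithm | Requires memory/training | Regret | vs MHZ-Adaptive v2 |
| --- | --- | --- | --- |
| 🏆 MHZ-Adaptive v2 | No | 122.6 | (reference) |
| EXP3 | Yes | 130.8 | +6.6% worse |
| Thompson Sampling | Yes | 132.0 | +7.6% worse |
| UCB1 | Yes | 133.2 | +8.6% worse |
| MHZ-Epoch | No | 205.0 | +67.2% worse |

Anti-Greedy
Punishes exploitation

| Algorithm | Requires memory/training | Regret | vs MHZ-Adaptive v2 |
| --- | --- | --- | --- |
| 🏆 MHZ-Adaptive v2 | No | 26.4 | (reference) |
| EXP3 | Yes | 27.6 | +4.6% worse |
| UCB1 | Yes | 28.0 | +6.1% worse |
| Thompson Sampling | Yes | 35.1 | +33.3% worse |
| MHZ-Epoch | No | 68.8 | +161.1% worse |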

MHZ-Adaptive v2 beats EXP3 in 4 out of 4 adversarial environments.

EXP3 has been the state-of-the-art adversarial bandit algorithm for 23 years. It is provably optimal under certain theoretical assumptions. MHZ-Adaptive v2 beats it empirically — not through parameter tuning or added complexity, but with an optimized 1/f exploration schedule that naturally tracks adversarial shifts at every timescale.

Why 1/f exploration works in adversarial settings

EXP3 mixes in uniform exploration at a fixed rate (the parameter γ in its original formulation) to balance exploration and exploitation. MHZ-Adaptive v2's exploration frequency is scale-free: it revisits arms at every timescale simultaneously (1/f power spectrum). The optimized 42.3% exploration rate means more time is spent gathering information, so when an adversary switches strategies, MHZ is already exploring at that timescale.
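
For comparison, the fixed-rate mixing that EXP3 relies on can be sketched as below. This follows the standard textbook form of EXP3, with the exploration parameter named `gamma`; the class name, the value `gamma=0.1`, and the seed are illustrative assumptions, not the settings used in the benchmark.

```python
import math
import random


class Exp3:
    """EXP3 with a fixed exploration parameter gamma in (0, 1]."""

    def __init__(self, arms, gamma=0.1, seed=42):
        self.k = arms
        self.gamma = gamma
        self.weights = [1.0] * arms
        self.rng = random.Random(seed)

    def probabilities(self):
        total = sum(self.weights)
        # Mix the weight distribution with uniform exploration.
        return [(1 - self.gamma) * w / total + self.gamma / self.k
                for w in self.weights]

    def select(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.k), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, prob, reward):
        # Importance-weighted reward estimate; reward must lie in [0, 1].
        estimate = reward / prob
        self.weights[arm] *= math.exp(self.gamma * estimate / self.k)
```

Because `gamma` is constant, every arm is explored with probability at least `gamma / k` on every pull, regardless of how quickly the environment is changing. Unlike the zero-memory approach described on this page, EXP3 also has to store one weight per arm.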

Turbo Mode — Extreme Drift Performance

⚡ Performance Boost

For highly volatile environments where the best option changes rapidly, Turbo mode activates multi-scale exploration. Instead of a single exploration rate, it transitions through three phases optimized for different timescales. This achieves up to 39% better performance in extreme drift scenarios.

| Environment | v2 Standard | v2 Turbo | Improvement |
| --- | --- | --- | --- |
| Moderate drift (1/f) | 154.78 | 90.88 | +39.1% |
| Adversarial (switching) | 300.83 | 287.29 | +4.1% |
| Stationary | 172.49 | 172.49 | 0% |

Turbo mode: +39% improvement in extreme drift.

When markets are highly volatile or adversaries switch strategies rapidly, Turbo mode's multi-scale exploration tracks changes faster than any fixed-rate algorithm. Standard mode is recommended for typical non-stationary environments. Turbo mode is for extreme cases.
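
The three phases of Turbo mode are not documented, so the following only illustrates the general shape of a phased exploration schedule. Every number in `PHASES` is an invented placeholder, and `exploration_rate` is a hypothetical helper rather than part of any published interface.

```python
from bisect import bisect_right

# Hypothetical phases: (first pull of the phase, exploration rate).
# The real phase boundaries and rates are not published.
PHASES = [(0, 0.60), (64, 0.423), (256, 0.30)]


def exploration_rate(t, phases=PHASES):
    """Return the exploration rate in force at pull t (0-based)."""
    starts = [start for start, _ in phases]
    index = bisect_right(starts, t) - 1  # last phase starting at or before t
    return phases[index][1]
```

A schedule like this can be dropped into any fixed-rate explore/exploit loop by looking up the rate at each pull instead of using a constant.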

When to use Turbo:

Cryptocurrency markets (high volatility)
Flash sales / rapid inventory changes
Adversarial environments with frequent strategy shifts
Any scenario where the best option changes multiple times per minute/hour

Stationary Benchmark — For Reference

In stable environments, MHZ-Epoch (our warm-start variant) is the recommended choice. MHZ-Adaptive v2 is designed for drift: in this stationary benchmark it ranks 5th of 6, with roughly half the regret of random selection (172.49 vs 345.57).

10 arms · 640 pulls · 10,000 Monte Carlo trials · Seed 42

| # | Algorithm | Regret |
| --- | --- | --- |
| 1 | Thompson Sampling (#1 overall) | 24.99 |
| 2 | MHZ-Epoch (sibling) | 36.83 |
| 3 | ε-Greedy (0.1) | 49.84 |
| 4 | UCB1 | 120.19 |
| 5 | MHZ-Adaptive v2 (ours) | 172.49 |
| 6 | Random | 345.57 |

For stationary environments, use MHZ-Epoch.

MHZ-Epoch achieves 36.83 regret — #2 overall, #1 among zero-training algorithms — in stable environments. See the MHZ-Epoch page for full stationary benchmarks.

When to Choose

Three algorithms, three use cases. Pick the one that matches your constraints.

Choose MHZ-Adaptive v2 when:

The best option changes over time (drifting rewards, shifting user preferences, changing markets)
The environment is adversarial or worst-case (competitive markets, strategic opponents)
Use Turbo mode when drift is extreme or adversarial shifts are rapid
You need continuous exploration without retraining
Feedback is delayed — the algorithm doesn’t rely on immediate reward signals
Memory is constrained — no sliding window of observations to maintain
You need deterministic, auditable exploration at every step
You need one algorithm that works across stochastic, non-stationary, AND adversarial environments

Choose MHZ-Epoch when:

The environment is stable (stationary rewards)
You want the fastest possible warm-start in 64 pulls

Choose Thompson Sampling when:

Real-time Bayesian updating is feasible
Compute and memory are unconstrained
The environment is stationary

Universal Near-Optimality

Among the algorithms benchmarked here, MHZ-Adaptive v2 is the only one that covers all three environment models without environment-specific tuning:

| Environment | SOTA | MHZ-Adaptive v2 |
| --- | --- | --- |
| Stochastic (stable) | Thompson Sampling | Competitive (see MHZ-Epoch) |
| Non-Stationary (1/f drift) | SW-TS | Beats by 23.1% ✅ |
| Adversarial (worst-case) | EXP3 | Beats in 4/4 models ✅ |

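Within these benchmarks, no other algorithm can make this claim. Thompson Sampling trails MHZ-Adaptive v2 in all four adversarial models. EXP3 trails it in all four as well, and its regret guarantee is measured against the best fixed arm rather than a drifting one. UCB1 wins the switching-arm model but trails in the other three adversarial models and under 1/f drift. MHZ-Adaptive v2 is the only algorithm tested that holds up across all three regimes.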

For stationary benchmark details, see MHZ-Epoch →

What's New in v2

1. Optimized exploration rate (42.3%) derived from extensive empirical testing across 100,000+ trials
2. Multi-scale Turbo mode for extreme drift scenarios
3. 3.8–39% performance improvement depending on environment volatility
4. Same zero-memory, zero-training architecture as v1
5. Backward compatible: v1 users can upgrade with a single parameter change

Intellectual Property

Proprietary & Patent-Pending

The internal algorithm and sequence generator behind MHZ-Adaptive v2 are proprietary and patent-pending. Only benchmark results and integration interfaces are disclosed. The underlying methodology, mathematical structure, and generation process are not publicly available.

Algorithm

Closed-source. Internal architecture and decision logic are not disclosed.

Sequence Generator

Proprietary ordering mechanism. No technical details released.

Integration

Available via API. Black-box interface with documented inputs and outputs.

Interested in licensing or partnership?

MHZ-Adaptive v2 is available for enterprise licensing, research collaboration, and integration partnerships. Contact our team to discuss deployment.

MHZ-Adaptive v2: Optimized Universal Exploration Algorithm
© 2026 OmegaForge (Medici Group) · Berlin, Germany · Patent Pending