MHZ-Epoch
Strongest Fixed Exploration Schedule
69% lower cumulative regret than UCB1 on a 640-pull stationary benchmark, using only 64 structured exploration pulls. Second only to adaptive Thompson Sampling in stationary environments. Designed for real-world drifting conditions.
Cumulative Regret • 10,000 Trials • Stationary Bandit
Stationary Results
Regret comparison against standard baselines on a stationary stochastic bandit, 10,000 trials.
Strongest fixed exploration schedule tested
MHZ-Epoch achieves 36.83 cumulative regret over 640 pulls using only 64 structured exploration pulls, after which it switches to greedy exploitation. It outperforms ε-Greedy (49.84), UCB1 (120.19), and Random (345.57) by a wide margin. Thompson Sampling (24.99), which adapts on every pull, remains the overall leader on the stationary benchmark.
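For readers unfamiliar with the explore-then-exploit pattern, the sketch below shows a generic version of it. This is not MHZ-Epoch: the proprietary sequence is replaced here with plain round-robin exploration, and Bernoulli rewards are assumed because the benchmark's reward model is not stated. The function name and parameters are illustrative only.

```python
import random

def explore_then_greedy(means, horizon=640, explore_pulls=64, seed=0):
    """Generic fixed-schedule exploration followed by greedy play.

    Stand-in only: round-robin order and Bernoulli rewards are assumptions.
    Returns cumulative pseudo-regret over `horizon` pulls.
    """
    rng = random.Random(seed)
    k = len(means)
    best = max(means)
    counts = [0] * k
    sums = [0.0] * k
    regret = 0.0
    for t in range(horizon):
        if t < explore_pulls:
            arm = t % k  # pre-computed schedule, independent of observed rewards
        else:
            # greedy: play the arm with the highest empirical mean so far
            arm = max(range(k), key=lambda a: sums[a] / counts[a] if counts[a] else 0.0)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # expected loss versus the best arm
    return regret
```

With 10 arms and 64 exploration pulls, a round-robin schedule gives each arm only six or seven samples, so the quality of the greedy phase depends heavily on how those pulls are ordered and allocated.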
Benchmark Leaderboard
Stationary multi-armed bandit · 10 arms · 640 pulls · 10,000 trials · cumulative regret (lower is better).
| # | Algorithm | Regret |
|---|---|---|
| 1 | Thompson Sampling (#1 overall) | 24.99 |
| 2 | MHZ-Epoch (ours) | 36.83 |
| 3 | ε-Greedy (0.1) | 49.84 |
| 4 | UCB1 | 120.19 |
| 5 | Random | 345.57 |
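The baselines in the leaderboard are standard algorithms, and a comparable harness can be sketched as follows. This is a minimal illustration, not the benchmark code behind the table: Bernoulli rewards, the arm means, the Beta(1, 1) prior, and all function names are assumptions, so the numbers it prints will not match the published figures.

```python
import math
import random

def run(policy, means, horizon=640, seed=0):
    """Cumulative pseudo-regret of `policy` on a Bernoulli bandit (assumed reward model)."""
    rng = random.Random(seed)
    k = len(means)
    best = max(means)
    counts, wins = [0] * k, [0] * k
    regret = 0.0
    for _ in range(horizon):
        arm = policy(counts, wins, rng)
        reward = 1 if rng.random() < means[arm] else 0
        counts[arm] += 1
        wins[arm] += reward
        regret += best - means[arm]  # expected loss versus the best arm
    return regret

def _mean(wins, counts, a):
    return wins[a] / counts[a] if counts[a] else 0.0

def random_policy(counts, wins, rng):
    return rng.randrange(len(counts))

def eps_greedy(eps=0.1):
    def policy(counts, wins, rng):
        if rng.random() < eps:
            return rng.randrange(len(counts))  # explore uniformly
        return max(range(len(counts)), key=lambda a: _mean(wins, counts, a))
    return policy

def ucb1(counts, wins, rng):
    for a, n in enumerate(counts):
        if n == 0:
            return a  # play each arm once before using the index
    total = sum(counts)
    return max(range(len(counts)),
               key=lambda a: _mean(wins, counts, a)
               + math.sqrt(2 * math.log(total) / counts[a]))

def thompson(counts, wins, rng):
    # Beta(1, 1) prior: sample one plausible mean per arm and play the largest
    draws = [rng.betavariate(wins[a] + 1, counts[a] - wins[a] + 1)
             for a in range(len(counts))]
    return max(range(len(counts)), key=draws.__getitem__)

def compare(means, trials=200, horizon=640):
    """Mean cumulative regret per baseline over `trials` independent seeds."""
    policies = {"thompson": thompson, "eps_greedy_0.1": eps_greedy(0.1),
                "ucb1": ucb1, "random": random_policy}
    return {name: sum(run(p, means, horizon, seed) for seed in range(trials)) / trials
            for name, p in policies.items()}

if __name__ == "__main__":
    hypothetical_means = [0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.65, 0.7, 0.8]
    for name, regret in compare(hypothetical_means).items():
        print(f"{name:>15}: {regret:.2f}")
```

The 69% headline figure follows directly from the table: (120.19 − 36.83) / 120.19 ≈ 0.694.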
Cumulative Regret Curves — 10,000 Trials
Interactive chart coming soon
Non-Stationary Benchmarks In Progress
The stationary benchmark above measures performance on fixed reward distributions — an environment where Thompson Sampling's per-pull adaptation gives it a natural advantage. Thompson wins there. That result is expected.
Real-world applications (markets, user preferences, clinical efficacy, ad performance) are non-stationary: reward distributions drift. Adaptive methods can over-commit to stale estimates and explore less as their confidence grows. MHZ-Epoch's persistent multi-scale exploration structure is engineered to maintain performance when environments shift, the scenario where Thompson Sampling's concentrated posterior can become a liability.
Benchmarks on drifting and adversarial bandits are currently running. Results will be published here.
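To make "drifting" concrete, one common way to model it is a random walk on each arm's mean, with regret measured against whichever arm is best at that moment. The sketch below is a hypothetical testbed of that kind; the class name, the drift scale `sigma`, and the Bernoulli reward model are all assumptions, and it is not the benchmark currently being run.

```python
import random

class DriftingBernoulliBandit:
    """Hypothetical drifting testbed: every arm mean takes a clipped Gaussian random walk."""

    def __init__(self, k=10, sigma=0.01, seed=0):
        self.rng = random.Random(seed)
        self.sigma = sigma  # per-pull drift scale (assumed)
        self.means = [self.rng.random() for _ in range(k)]

    def pull(self, arm):
        """Return (reward, instantaneous regret), then let all means drift."""
        regret = max(self.means) - self.means[arm]  # versus the current best arm
        reward = 1 if self.rng.random() < self.means[arm] else 0
        self.means = [min(1.0, max(0.0, m + self.rng.gauss(0.0, self.sigma)))
                      for m in self.means]
        return reward, regret

def dynamic_regret(policy, env, horizon=640):
    """Sum of per-pull regret against the moving optimum.

    `policy(t, history)` returns an arm index; `history` holds (arm, reward) pairs.
    """
    history, total = [], 0.0
    for t in range(horizon):
        arm = policy(t, history)
        reward, regret = env.pull(arm)
        history.append((arm, reward))
        total += regret
    return total
```

For example, `dynamic_regret(lambda t, h: t % 10, DriftingBernoulliBandit())` scores a pure round-robin schedule. Because the best arm can change over time, a policy that stops exploring has no way to notice the change.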
Target Domains
Environments where a fixed, pre-computed exploration schedule offers structural advantages.
Training Efficiency
Reduce wasted compute cycles. Converge on optimal configurations faster during hyperparameter search and model selection.
Recommendation Systems
Identify the best content, product, or action to surface with fewer exploration rounds and lower opportunity cost.
Non-Stationary Environments
Persistent multi-scale exploration structure designed for drifting reward distributions where adaptive methods over-commit.
Edge Devices
Lightweight sequential decisions with minimal state. No neural network overhead. Fixed schedule runs anywhere.
A/B Testing
Structured exploration with deterministic allocation. Reach statistical significance with a pre-computed ordering.
Resource Allocation
Dynamic budget distribution across competing options. Minimize regret in portfolio, bid, and scheduling problems.
Intellectual Property
Proprietary & Patent-Pending
The internal algorithm and sequence generator behind MHZ-Epoch are proprietary and patent-pending. Only benchmark results and integration interfaces are disclosed. The underlying methodology, mathematical structure, and generation process are not publicly available.
Algorithm
Closed-source. Internal architecture and decision logic are not disclosed.
Sequence Generator
Proprietary ordering mechanism. No technical details released.
Integration
Available via API. Black-box interface with documented inputs and outputs.
Interested in licensing or partnership?
MHZ-Epoch is available for enterprise licensing, research collaboration, and integration partnerships. Contact our team to discuss deployment.