#1 zero-training exploration.
Immune to feedback delay.
MHZ-Epoch's regret is 69% lower than UCB1's in stationary environments. It's immune to delayed feedback. And it's been evaluated on 25 million movie ratings and 500,000 real ad impressions. Zero training. Zero ML infrastructure. Runs on a laptop.
69% lower regret vs UCB1 • Stationary Environment • 10,000 Trials
For drifting environments, see MHZ-Adaptive, which beats SW-TS by 23%+.
Non-Stationary Benchmark
1/f Drifting
The hard problem. Reward probabilities drift continuously via 1/f (pink) noise, and the best arm changes ~181 times per trial. Adaptive algorithms over-commit to stale data and fall behind. MHZ-Adaptive v2's optimized 1/f exploration matches the environment's own dynamics and beats both non-stationary specialists in this benchmark, SW-TS and Discounted TS.
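To make the drifting setup concrete, here is a minimal sketch of a 1/f-style environment. It is an illustration only: the generator uses a Voss-style sum of random sources refreshed at octave-spaced intervals, and the names `pink_noise`, `drifting_probs`, `best_arm_changes`, the 8 sources, and the mapping to the range [0.1, 0.9] are all assumptions, not the benchmark's actual generator.

```python
import random


def pink_noise(n, n_sources=8, rng=None):
    """Approximate 1/f noise in [-1, 1] with a Voss-style generator (assumed)."""
    rng = rng or random.Random()
    sources = [rng.uniform(-1, 1) for _ in range(n_sources)]
    out = []
    for t in range(1, n + 1):
        # Source k is refreshed every 2**k steps, so higher k drifts more slowly.
        for k in range(n_sources):
            if t % (2 ** k) == 0:
                sources[k] = rng.uniform(-1, 1)
        out.append(sum(sources) / n_sources)
    return out


def drifting_probs(n_arms=10, pulls=640, seed=42):
    """Return probs[t][arm]: one pink-noise track per arm, mapped to [0.1, 0.9]."""
    rng = random.Random(seed)
    tracks = [pink_noise(pulls, rng=rng) for _ in range(n_arms)]
    return [[0.5 + 0.4 * tracks[a][t] for a in range(n_arms)] for t in range(pulls)]


def best_arm_changes(probs):
    """Count how often the index of the best arm changes between pulls."""
    best = [max(range(len(row)), key=row.__getitem__) for row in probs]
    return sum(1 for prev, cur in zip(best, best[1:]) if prev != cur)


probs = drifting_probs()
changes = best_arm_changes(probs)
print(f"best arm changed {changes} times in {len(probs)} pulls")
```

The change count from this sketch depends on the assumed generator and scaling, so it should not be expected to reproduce the ~181 figure.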
10 arms · 640 pulls · 10,000 Monte Carlo trials · Seed 42
| # | Algorithm | Regret |
|---|---|---|
| 1 | Thompson Sampling (#1 overall) | 57.09 |
| 2 | MHZ-Adaptive v2 (ours) | 85.67 |
| 3 | UCB1 | 96.34 |
| 4 | SW-TS (window=64) | 105.46 |
| 5 | Discounted TS (γ=0.95) | ~112 |
| 6 | Random | ~310 |
MHZ-Adaptive v2: #1 zero-training algorithm in drifting environments.
MHZ-Adaptive v2 beats SW-TS, the published state-of-the-art non-stationary specialist, by 23.1%: SW-TS's regret of 105.46 is 23.1% higher than MHZ-Adaptive v2's 85.67 (equivalently, MHZ-Adaptive v2's regret is 18.8% lower). SW-TS requires a sliding window of stored observations and continuous updates. MHZ-Adaptive v2 requires neither. See the full MHZ-Adaptive v2 benchmark →
Note: Thompson Sampling leads overall due to its continuous Bayesian updating — it maintains a running Beta distribution per arm and updates after every pull. MHZ-Adaptive v2 leads all algorithms that do not require live state, memory, or reward feedback.
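For reference, the per-arm Beta updating described in the note looks roughly like the textbook Beta-Bernoulli Thompson Sampling sketch below. This is the standard public algorithm, not MHZ code; the class name `BetaBernoulliTS`, the uniform Beta(1, 1) prior, and the three example arm probabilities are assumptions chosen for illustration.

```python
import random


class BetaBernoulliTS:
    """Textbook Beta-Bernoulli Thompson Sampling (illustrative, not MHZ code)."""

    def __init__(self, n_arms, rng=None):
        self.rng = rng or random.Random()
        # Beta(1, 1) prior per arm: alpha tracks successes, beta tracks failures.
        self.alpha = [1.0] * n_arms
        self.beta = [1.0] * n_arms

    def select(self):
        # Sample once from each arm's posterior and play the largest sample.
        samples = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Live state: the posterior changes after every observed reward.
        if reward:
            self.alpha[arm] += 1.0
        else:
            self.beta[arm] += 1.0


def run(probs, pulls=640, seed=42):
    """Play a stationary Bernoulli bandit and return (pseudo-regret, policy)."""
    rng = random.Random(seed)
    policy = BetaBernoulliTS(len(probs), rng)
    best = max(probs)
    regret = 0.0
    for _ in range(pulls):
        arm = policy.select()
        reward = rng.random() < probs[arm]
        policy.update(arm, reward)
        regret += best - probs[arm]
    return regret, policy


regret, policy = run([0.1, 0.5, 0.9])
print(f"pseudo-regret after 640 pulls: {regret:.2f}")
```

Every call to `select` depends on the rewards seen so far, which is the live state, memory, and reward feedback the note refers to.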
Stationary Benchmark — Also Competitive
In stable environments, MHZ-Epoch holds its ground. It outperforms UCB1 and ε-Greedy while remaining the only zero-training algorithm in the top tier.
10 arms · 640 pulls · 10,000 Monte Carlo trials · Seed 42
| # | Algorithm | Regret |
|---|---|---|
| 1 | Thompson Sampling (#1 overall) | 24.99 |
| 2 | MHZ-Epoch (ours) | 36.83 |
| 3 | ε-Greedy (0.1) | 49.84 |
| 4 | UCB1 | 120.19 |
| 5 | Random | 345.57 |
#1 zero-training algorithm in stable environments.
Beats UCB1 by 69% and ε-Greedy by 26%. The only algorithms ranked above it require live Bayesian inference on every pull.
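The percentages above follow from the regret table as relative reductions. A small sketch of that arithmetic, using only the table values (the helper name `regret_reduction` is ours):

```python
def regret_reduction(ours, baseline):
    """Fraction by which `ours` lowers regret relative to `baseline`."""
    return (baseline - ours) / baseline


# Mean regret from the stationary benchmark table.
stationary = {"mhz_epoch": 36.83, "eps_greedy": 49.84, "ucb1": 120.19}

vs_ucb1 = regret_reduction(stationary["mhz_epoch"], stationary["ucb1"])
vs_eps = regret_reduction(stationary["mhz_epoch"], stationary["eps_greedy"])

print(f"vs UCB1: {vs_ucb1:.1%}")        # 69.4%
print(f"vs eps-Greedy: {vs_eps:.1%}")   # 26.1%
```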
Delayed Feedback Benchmark — Latency Robustness
Real-world decisions rarely get immediate feedback. Clinical trials take months. Ad conversions take days. Hiring outcomes take years. Thompson Sampling assumes immediate feedback — when it's delayed, performance collapses. MHZ-Epoch's exploration phase is deterministic and feedback-free. Delay doesn't affect it at all.
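The structural point can be illustrated with a small delayed-feedback harness. This is a sketch under stated assumptions: `run_with_delay` is a hypothetical harness, and the round-robin schedule is only a stand-in for a fixed, feedback-free exploration order. It is not the MHZ-Epoch sequence, which is not disclosed.

```python
import random
from collections import deque


def run_with_delay(select, update, probs, pulls, delay, seed=0):
    """Play `pulls` rounds; each reward reaches `update` only `delay` pulls later."""
    rng = random.Random(seed)
    pending = deque()  # entries are (round_when_visible, arm, reward)
    history = []
    for t in range(pulls):
        # Deliver every reward whose delay has elapsed before choosing an arm.
        while pending and pending[0][0] <= t:
            _, arm, reward = pending.popleft()
            update(arm, reward)
        arm = select(t)
        reward = rng.random() < probs[arm]
        pending.append((t + delay, arm, reward))
        history.append(arm)
    return history


N_ARMS = 10


def fixed_schedule(t):
    # Stand-in for a pre-computed order: depends on the pull index only.
    return t % N_ARMS


def ignore_feedback(arm, reward):
    # A feedback-free phase never reads rewards.
    return None


probs = [0.5] * N_ARMS
no_delay = run_with_delay(fixed_schedule, ignore_feedback, probs, pulls=64, delay=0)
full_delay = run_with_delay(fixed_schedule, ignore_feedback, probs, pulls=64, delay=640)
print(no_delay == full_delay)
```

Because the schedule never reads a reward, the sequence of pulls is identical at any delay. A policy whose `select` depends on `update`, such as Thompson Sampling, has no such guarantee.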
10 arms · 640 pulls · 1,000 Monte Carlo trials · Delay values: 0 to 640 pulls
At just 5% feedback delay, MHZ-Epoch beats Thompson Sampling (p = 6.33e-04, highly significant). In a 40-day campaign, that's 2 days. In a 1-year trial, that's 18 days.
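The day figures are the same 5% fraction applied to different horizons. A quick check of the arithmetic (the 365-day year is an assumption, and `scaled_delay` is a name introduced here):

```python
HORIZON_PULLS = 640
DELAY_PULLS = 32

# Delay expressed as a fraction of the benchmark horizon.
delay_fraction = DELAY_PULLS / HORIZON_PULLS


def scaled_delay(fraction, horizon):
    """Delay in the horizon's own units (pulls, days, ...)."""
    return fraction * horizon


campaign_days = scaled_delay(delay_fraction, 40)   # 40-day campaign
trial_days = scaled_delay(delay_fraction, 365)     # 1-year trial, assuming 365 days

print(f"{delay_fraction:.0%} delay: {campaign_days:.0f} days of 40, {trial_days:.2f} days of 365")
```

The 1-year figure comes out at about 18.25 days, which the text rounds to 18.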
[Figure: Mean Regret vs Feedback Delay]
The only algorithm immune to feedback delay.
MHZ-Epoch's regret is constant regardless of how long feedback takes. Thompson Sampling, the 90-year-old gold standard, sees its regret grow 14× when feedback is fully delayed. At D=32 (a 5% delay on 640 pulls), MHZ-Epoch is already better.
Real-World Interpretation
Real-World Validation — MovieLens 25M
Synthetic benchmarks are useful, but real-world data is the ultimate test. We evaluated MHZ-Epoch on the MovieLens 25M dataset — 25 million movie ratings from 162,000 real users. Using replay evaluation (the industry-standard offline testing method), MHZ-Epoch demonstrated robust performance on actual recommendation data without any training or parameter tuning.
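Replay evaluation, in its usual form, walks through logged events and scores only those where the policy's choice matches the logged action. The sketch below shows that general technique on a toy log. The function name `replay_evaluate`, the `(arm, reward)` event format, and the round-robin policy are assumptions; how MovieLens ratings were mapped to arms and rewards in the actual evaluation is not specified here.

```python
def replay_evaluate(select, update, logged_events):
    """Replay estimate of a policy's mean reward from logged (arm, reward) pairs."""
    matched = 0
    total_reward = 0.0
    for logged_arm, reward in logged_events:
        # The policy is asked for its next choice; non-matching events are skipped.
        if select(matched) == logged_arm:
            update(logged_arm, reward)
            total_reward += reward
            matched += 1
    mean_reward = total_reward / matched if matched else 0.0
    return mean_reward, matched


# Toy log: (arm that was actually shown, observed reward).
log = [(0, 1), (1, 0), (1, 1), (0, 0), (0, 1)]


def round_robin(step):
    # Hypothetical stand-in policy over 2 arms; not the MHZ-Epoch sequence.
    return step % 2


def no_update(arm, reward):
    return None


mean_reward, matched = replay_evaluate(round_robin, no_update, log)
print(f"matched {matched} of {len(log)} events, mean reward {mean_reward:.3f}")
```

The mean reward over matched events is the replay analogue of a CTR on matched recommendations.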
MHZ-Epoch needs no training period and no warm-up phase: its deterministic exploration starts working on real user preferences from the first recommendation. With a 50% CTR on matched recommendations, MHZ-Epoch demonstrates precise taste discovery through its 64-pull deterministic exploration phase.
Proven on 25 million real ratings.
MHZ-Epoch works on actual user data, not just simulations. The 64-pull exploration phase surfaces good movies without first learning user preferences, which makes it well suited to cold-start scenarios where you have no user history. The 50% CTR on matched recommendations suggests that deterministic exploration can discover user taste as effectively as Bayesian learning.
Real-World Applications
Ad-Tech Validation — Criteo Production Data
We evaluated MHZ-Epoch on the Criteo Ad Click dataset: 500,000 real ad impressions from Criteo's production advertising system. This is the same data that powers billion-dollar ad platforms. MHZ-Epoch reached 94.7% of Logistic Regression's CTR (0.2443 vs 0.2580) without using any context features or requiring any training infrastructure.
Dataset Details
Performance Comparison
| Algorithm | CTR | Context Used | Training Required | Infrastructure |
|---|---|---|---|---|
| ε-Greedy | 0.2738 | ❌ | ❌ | Laptop |
| UCB1 | 0.2616 | ❌ | ❌ | Laptop |
| Logistic Regression | 0.2580 | ✅ 13 features | ✅ Online learning | ML pipeline |
| MHZ-Adaptive v2 | 0.2533 | ❌ None | ❌ None | Laptop |
| LinUCB | 0.2513 | ✅ 13 features | ✅ Online learning | ML pipeline |
| Random | 0.2489 | ❌ | ❌ | Laptop |
| MHZ-Epoch | 0.2443 | ❌ None | ❌ None | Laptop |
94.7% of Logistic Regression's CTR. Zero ML infrastructure.
MHZ-Epoch runs on a laptop and achieves 94.7% of what industry-standard machine learning delivers. No feature engineering. No model training. No GPUs. No serving infrastructure. Just OmegaForge's proprietary algorithm running on commodity hardware. This is what zero-infrastructure ad-tech looks like.
ML pipeline: feature extraction pipeline, online SGD training, model serving, GPU acceleration
MHZ-Epoch: a single lightweight data structure and a counter
When to Choose
Three algorithms, three use cases. Pick the one that matches your constraints.
Choose MHZ-Epoch when:
Choose MHZ-Adaptive when:
Choose Thompson Sampling when:
Target Domains
Environments where a fixed, pre-computed exploration schedule offers structural advantages.
Training Efficiency
Reduce wasted compute cycles. Converge on optimal configurations faster during hyperparameter search and model selection.
Recommendation Systems
Identify the best content, product, or action to surface with fewer exploration rounds and lower opportunity cost.
Non-Stationary Environments
Persistent multi-scale exploration structure designed for drifting reward distributions where adaptive methods over-commit.
Edge Devices
Lightweight sequential decisions with minimal state. No neural network overhead. Fixed schedule runs anywhere.
A/B Testing
Structured exploration with deterministic allocation. Reach statistical significance with a pre-computed ordering.
Resource Allocation
Dynamic budget distribution across competing options. Minimize regret in portfolio, bid, and scheduling problems.
Intellectual Property
Proprietary & Patent-Pending
The internal algorithm and sequence generator behind MHZ-Epoch are proprietary and patent-pending. Only benchmark results and integration interfaces are disclosed. The underlying methodology, mathematical structure, and generation process are not publicly available.
Algorithm
Closed-source. Internal architecture and decision logic are not disclosed.
Sequence Generator
Proprietary ordering mechanism. No technical details released.
Integration
Available via API. Black-box interface with documented inputs and outputs.
Interested in licensing or partnership?
MHZ-Epoch is available for enterprise licensing, research collaboration, and integration partnerships. Contact our team to discuss deployment.