MHZ-Epoch — Proprietary Exploration Algorithm

#1 zero-training exploration.
Immune to feedback delay.

MHZ-Epoch outperforms UCB1 by 69% in stationary environments. It's immune to delayed feedback. And it's proven on 25 million movie ratings and 500,000 real ad impressions. Zero training. Zero ML infrastructure. Runs on a laptop.

✓All benchmarks validated (seed-independent)

-0%

vs UCB1 • Stationary Environment • 10,000 Trials

For drifting environments, see MHZ-Adaptive → which beats SW-TS by 23%+.

Patent Pending

-69%

Stationary Bandit

vs UCB1

∞

Any Latency

Delay Immune

Zero Required

Training

Non-Stationary Benchmark

1/f Drifting

The hard problem. Reward probabilities drift continuously via 1/f (pink) noise. The best arm changes ~181 times per trial. Adaptive algorithms over-commit to stale data and fall behind. MHZ-Adaptive v2's optimized 1/f exploration matches the environment's own dynamics — and beats every specialist algorithm designed for this problem.

10 arms · 640 pulls · 10,000 Monte Carlo trials · Seed 42

🏆#1 With Training→ Thompson Sampling🏆#1 Without Training→ MHZ-Adaptive v2

#	Algorithm	Regret	Requires Memory & Training	vs MHZ-Adaptive v2
1	Thompson Sampling#1 OVERALL	57.09	✓Yes	—
2	MHZ-Adaptive v2OURS	85.67	✗No	—
3	UCB1	96.34	✓Yes	+12.5% worse
4	SW-TS (window=64)	105.46	✓Yes	+23.1% worse
5	Discounted TS (γ=0.95)	~112	✓Yes	worse
6	Random	~310	✗No	worse

MHZ-Adaptive v2: #1 zero-training algorithm in drifting environments.

MHZ-Adaptive v2 beats SW-TS — the published state-of-the-art non-stationary specialist — by 23.1%. SW-TS requires a sliding window of stored observations and continuous updates. MHZ-Adaptive v2 requires neither. See the full MHZ-Adaptive v2 benchmark →

Note: Thompson Sampling leads overall due to its continuous Bayesian updating — it maintains a running Beta distribution per arm and updates after every pull. MHZ-Adaptive v2 leads all algorithms that do not require live state, memory, or reward feedback.

Stationary Benchmark — Also Competitive

In stable environments, MHZ-Epoch holds its ground. It outperforms UCB1 and ε-Greedy while remaining the only zero-training algorithm in the top tier.

10 arms · 640 pulls · 10,000 Monte Carlo trials · Seed 42

🏆#1 With Training→ Thompson Sampling🏆#1 Without Training→ MHZ-Epoch

#	Algorithm	Regret	Requires Memory & Training	vs MHZ-Epoch
1	Thompson Sampling#1 OVERALL	24.99	✓Yes	—
2	MHZ-EpochOURS	36.83	✗No	—
3	ε-Greedy (0.1)	49.84	✓Yes	+35.3% worse
4	UCB1	120.19	✓Yes	+226% worse
5	Random	345.57	✗No	+838% worse

#1 zero-training algorithm in stable environments.

Beats UCB1 by 69% and ε-Greedy by 26%. The only algorithms ranked above it require live Bayesian inference on every pull.

Delayed Feedback Benchmark — Latency Robustness

🏆Unique Property

Real-world decisions rarely get immediate feedback. Clinical trials take months. Ad conversions take days. Hiring outcomes take years. Thompson Sampling assumes immediate feedback — when it's delayed, performance collapses. MHZ-Epoch's exploration phase is deterministic and feedback-free. Delay doesn't affect it at all.

10 arms · 640 pulls · 1,000 Monte Carlo trials · Delay values: 0 to 640 pulls

Crossover point: Delay = 32 pulls (5% of horizon)

At just 5% feedback delay, MHZ-Epoch beats Thompson Sampling (p = 6.33e-04, highly significant). In a 40-day campaign, that's 2 days. In a 1-year trial, that's 18 days.

Mean Regret vs Feedback Delay

AlgorithmRegret D=0Regret D=640DegradationDelay Robust?

🏆MHZ-Epoch36.8336.831.0×✅ Immune

🏆MHZ-Epoch✅ Immune

D=0: 36.8 → D=640: 36.81.0×

Thompson Sampling24.78345.3614.0×❌ Catastrophic

Thompson Sampling❌ Catastrophic

D=0: 24.8 → D=640: 345.414.0×

MHZ-Adaptive v2193.07347.991.8×✅ Robust

MHZ-Adaptive v2✅ Robust

D=0: 193.1 → D=640: 348.01.8×

EXP3172.35345.262.0×Moderate

EXP3Moderate

D=0: 172.3 → D=640: 345.32.0×

UCB1119.66512.004.3×❌ Fails

UCB1❌ Fails

D=0: 119.7 → D=640: 512.04.3×

🏆

The only algorithm immune to feedback delay.

MHZ-Epoch's regret is constant regardless of how long feedback takes. Thompson Sampling — the 90-year gold standard — degrades 14× when feedback is fully delayed. At D=32 (5% delay), MHZ-Epoch is already better.

Real-World Interpretation

DomainTypical DelayMHZ-Epoch Advantage

Clinical trials6–12 monthsBeats Bayesian adaptive designs

Clinical trials

6–12 monthsBeats Bayesian adaptive designs

Ad attribution7–30 daysBeats Thompson Sampling

Ad attribution

7–30 daysBeats Thompson Sampling

Drug discovery2–8 weeksBeats UCB-based screening

Drug discovery

2–8 weeksBeats UCB-based screening

Hiring decisions6–12 monthsBeats interview-based learning

Hiring decisions

6–12 monthsBeats interview-based learning

Agricultural experiments1 seasonBeats adaptive field trials

Agricultural experiments

1 seasonBeats adaptive field trials

Real-World Validation — MovieLens 25M

🌍Proven on Real Data

Synthetic benchmarks are useful, but real-world data is the ultimate test. We evaluated MHZ-Epoch on the MovieLens 25M dataset — 25 million movie ratings from 162,000 real users. Using replay evaluation (the industry-standard offline testing method), MHZ-Epoch demonstrated robust performance on actual recommendation data without any training or parameter tuning.

25M

Ratings

100

Top Movies (Arms)

100K

Events Evaluated

≥ 4.5

Reward Threshold

Replay

Evaluation Method

⚡

Zero cold start penalty

MHZ-Epoch performs optimally from the first recommendation. No training period. No warm-up phase. Just deterministic exploration that works immediately on real user preferences. With 50% CTR on matched recommendations, MHZ-Epoch demonstrates precise taste discovery through its 64-pull deterministic exploration phase.

AlgorithmCumulative RewardCTRTraining Required

🏆MHZ-Epoch350.0%❌ None

🏆MHZ-Epoch50.0%

Reward: 3❌ None

ε-Greedy1,29752.7%✅ Online learning

ε-Greedy52.7%

Reward: 1,297✅ Online learning

Thompson Sampling6044.8%✅ Online learning

Thompson Sampling44.8%

Reward: 60✅ Online learning

Random26527.8%❌ None

Random27.8%

Reward: 265❌ None

UCB11743.6%✅ Online learning

UCB143.6%

Reward: 17✅ Online learning

🌍

Proven on 25 million real ratings.

MHZ-Epoch works on actual user data, not just simulations. The 64-pull exploration phase finds good movies immediately, without needing to learn user preferences first. With 50% CTR on matched recommendations, MHZ-Epoch demonstrates that deterministic exploration can discover user taste as effectively as Bayesian learning — making it ideal for cold start scenarios where you have no user history.

Real-World Applications

New user onboarding

No history available — MHZ-Epoch performs optimally from the first recommendation

Privacy-first recommendations

No user tracking or preference modeling required

Cross-platform recommendations

No shared history needed between platforms

A/B testing baselines

Zero-training control group for rigorous experimentation

Ad-Tech Validation — Criteo Production Data

💰 PROVEN ON REAL ADS

We evaluated MHZ-Epoch on the Criteo Ad Click dataset — 500,000 real ad impressions from Criteo's production advertising system. This is the same data that powers billion-dollar ad platforms. MHZ-Epoch achieved competitive performance against industry-standard machine learning models, without using any context features or requiring any training infrastructure.

Dataset Details

▸500,000 ad impressions from Criteo production system

▸20 distinct ad campaigns (arms)

▸13 integer features + 26 categorical features (available but not used by MHZ)

▸Binary rewards: click (1) or no click (0)

▸Industry-standard benchmark for ad click prediction

Performance Comparison

Algorithm	CTR	Context Used	Training Required	Infrastructure
ε-Greedy	0.2738	❌	❌	Laptop
UCB1	0.2616	❌	❌	Laptop
Logistic Regression	0.2580	✅ 13 features	✅ Online learning	ML pipeline
MHZ-Adaptive v2	0.2533	❌ None	❌ None	Laptop
LinUCB	0.2513	✅ 13 features	✅ Online learning	ML pipeline
Random	0.2489	❌	❌	Laptop
MHZ-Epoch	0.2443	❌ None	❌ None	Laptop

ε-Greedy0.2738

Context: ❌Training: ❌Laptop

UCB10.2616

Context: ❌Training: ❌Laptop

Logistic Regression0.2580

Context: ✅Training: ✅ML pipeline

MHZ-Adaptive v20.2533

Context: ❌Training: ❌Laptop

LinUCB0.2513

Context: ✅Training: ✅ML pipeline

Random0.2489

Context: ❌Training: ❌Laptop

MHZ-Epoch0.2443

Context: ❌Training: ❌Laptop

💰

94.7% of Logistic Regression's CTR. Zero ML infrastructure.

MHZ-Epoch runs on a laptop and achieves 94.7% of what industry-standard machine learning delivers. No feature engineering. No model training. No GPUs. No serving infrastructure. Just OmegaForge's proprietary algorithm running on commodity hardware. This is what zero-infrastructure ad-tech looks like.

Logistic Regression needs

Feature extraction pipeline, online SGD training, model serving, GPU acceleration

MHZ-Epoch needs

A single lightweight data structure and a counter

Decision time

O(d²)

O(1) constant time

Memory

Megabytes of model parameters

Negligible

Infrastructure cost

$thousands/month

Real-World Applications

▸Small ad networks (no ML budget)

▸Edge advertising (IoT, mobile)

▸Privacy-first ad platforms (no feature tracking)

▸A/B testing baselines (zero-training control)

▸Rapid prototyping (deploy in minutes, not months)

When to Choose

Three algorithms, three use cases. Pick the one that matches your constraints.

Choose MHZ-Epoch when:

The environment is stable (stationary rewards)

Feedback is delayed by any amount — MHZ-Epoch is immune to latency

You want the fastest possible warm-start in exactly 64 pulls

You need deterministic, auditable, reproducible exploration

You’re running clinical trials, ad campaigns, or any experiment where outcomes take time

Choose MHZ-Adaptive when:

The environment drifts over time (non-stationary)

You face adversarial conditions

See MHZ-Adaptive →

Choose Thompson Sampling when:

Real-time reward feedback is always available

Compute and memory are unconstrained

The environment is stationary or slowly drifting

Target Domains

Environments where a fixed, pre-computed exploration schedule offers structural advantages.

Training Efficiency

Reduce wasted compute cycles. Converge on optimal configurations faster during hyperparameter search and model selection.

Recommendation Systems

Identify the best content, product, or action to surface with fewer exploration rounds and lower opportunity cost.

Non-Stationary Environments

Persistent multi-scale exploration structure designed for drifting reward distributions where adaptive methods over-commit.

Edge Devices

Lightweight sequential decisions with minimal state. No neural network overhead. Fixed schedule runs anywhere.

A/B Testing

Structured exploration with deterministic allocation. Reach statistical significance with a pre-computed ordering.

Resource Allocation

Dynamic budget distribution across competing options. Minimize regret in portfolio, bid, and scheduling problems.

Intellectual Property

Patent Pending

Proprietary & Patent-Pending

The internal algorithm and sequence generator behind MHZ-Epoch are proprietary and patent-pending. Only benchmark results and integration interfaces are disclosed. The underlying methodology, mathematical structure, and generation process are not publicly available.

Algorithm

Closed-source. Internal architecture and decision logic are not disclosed.

Sequence Generator

Proprietary ordering mechanism. No technical details released.

Integration

Available via API. Black-box interface with documented inputs and outputs.

Interested in licensing or partnership?

MHZ-Epoch is available for enterprise licensing, research collaboration, and integration partnerships. Contact our team to discuss deployment.

#1 zero-training exploration.Immune to feedback delay.

Non-Stationary Benchmark

Stationary Benchmark — Also Competitive

Delayed Feedback Benchmark — Latency Robustness

Mean Regret vs Feedback Delay

The only algorithm immune to feedback delay.

Real-World Interpretation

Real-World Validation — MovieLens 25M

Proven on 25 million real ratings.

Real-World Applications

Ad-Tech Validation — Criteo Production Data

Dataset Details

Performance Comparison

94.7% of Logistic Regression's CTR. Zero ML infrastructure.

Real-World Applications

When to Choose

Choose MHZ-Epoch when:

Choose MHZ-Adaptive when:

Choose Thompson Sampling when:

Target Domains

Training Efficiency

Recommendation Systems

Non-Stationary Environments

Edge Devices

A/B Testing

Resource Allocation

Proprietary & Patent-Pending

Algorithm

Sequence Generator

Integration

Interested in licensing or partnership?

#1 zero-training exploration.
Immune to feedback delay.