Causal · Counterfactual · RLHF Research Console
Tenant: consolidated · 14 tenants
Window
live · refreshed 14s ago
Causal ROI · Counterfactuals · Simulation Lab · Learning Timeline · Rec Quality · Policy Evolution · Treatment vs Control
AI Policy Evolution Center
Version-by-version reward, objective weights, and shadow→prod promotion
live · 28d window · n=18,420
Policy lineage
5 versions · current = v4.5 (shadow)

Version  Date    Reward          Accept  Dominant objective     Change
v3.0     Mar 14  0.612           58%     Greedy margin          Baseline
v3.4     Apr 02  0.681 (+6.9%)   63%     + Risk discount        Added downside penalty
v4.0     Apr 28  0.742 (+6.1%)   71%     + Causal head          DR-learner integrated
v4.2     May 09  0.798 (+5.6%)   74%     + Service constraint   OTIF guardrail
v4.5     May 16  0.847 (+4.9%)   78%     + Counterfactual reg.  CF regularizer (shadow)
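
The deltas shown next to each reward in the lineage table above can be checked with a few lines of arithmetic. The sketch below is a minimal check in Python and not the console's own calculation: the function name `reward_deltas` is hypothetical, and only the reward values are taken from the table. It computes each step both as an absolute difference (in reward points ×100) and as a relative change.

```python
# Reward per policy version, copied from the lineage table.
rewards = {"v3.0": 0.612, "v3.4": 0.681, "v4.0": 0.742, "v4.2": 0.798, "v4.5": 0.847}


def reward_deltas(rewards):
    """Return (version, absolute delta in points x100, relative delta in %) per step.

    Hypothetical helper for illustration; relies on dict insertion order
    matching the version order in the table.
    """
    versions = list(rewards)
    out = []
    for prev, cur in zip(versions, versions[1:]):
        abs_pts = (rewards[cur] - rewards[prev]) * 100      # e.g. 0.681 - 0.612
        rel_pct = (rewards[cur] / rewards[prev] - 1) * 100  # e.g. 0.681 / 0.612 - 1
        out.append((cur, round(abs_pts, 1), round(rel_pct, 1)))
    return out


for version, abs_pts, rel_pct in reward_deltas(rewards):
    print(f"{version}: {abs_pts:+.1f} points, {rel_pct:+.1f}% relative")
```

The absolute column reproduces the figures displayed in the table (+6.9, +6.1, +5.6, +4.9), which suggests the "%" deltas there are differences in reward points rather than relative changes. The relative change for v4.2 → v4.5 works out to +6.1%, which coincides with the Off-policy uplift figure reported under Policy effectiveness; whether the console derives that figure this way is not stated.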
Objective weight delta · v4.2 → v4.5
Carrier reliability     +6.0pp
Lane volatility         -3.0pp
Accessorial risk        +7.0pp
Fuel surcharge          -4.0pp
Service margin          +5.0pp
Counterfactual prior   +11.0pp

Legend: previous · current
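
As a quick sanity check on the objective weight deltas above, the sketch below sums them and ranks them by magnitude. It is illustrative only: the dictionary name `deltas_pp` and the helper `summarize_deltas` are hypothetical, and the values are copied from the panel.

```python
# Objective weight deltas (percentage points), v4.2 -> v4.5, copied from the panel.
deltas_pp = {
    "Carrier reliability": 6.0,
    "Lane volatility": -3.0,
    "Accessorial risk": 7.0,
    "Fuel surcharge": -4.0,
    "Service margin": 5.0,
    "Counterfactual prior": 11.0,
}


def summarize_deltas(deltas):
    """Return the net change and the objectives ordered by absolute change.

    Hypothetical helper for illustration only.
    """
    net = sum(deltas.values())
    ranked = sorted(deltas, key=lambda name: abs(deltas[name]), reverse=True)
    return net, ranked


net, ranked = summarize_deltas(deltas_pp)
print(f"net change: {net:+.1f}pp")
print("largest shift:", ranked[0])
```

The six deltas net to +22.0pp rather than zero. If the weights were normalized to sum to 100%, the deltas would have to cancel, so either the weights are unnormalized or the panel omits objectives whose weight decreased; the source does not say which.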
Policy effectiveness
Expected reward          0.847
Acceptance rate          78%
KL divergence (prev)     0.082
Exploration variance     0.31
Off-policy uplift        +6.1%
Shadow eval (n)          12,408
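
The "KL divergence (prev)" metric above presumably measures how far the current policy's action distribution has moved from the previous version's. The console does not state the direction of the divergence, the logarithm base, or the action space, so the following is only a minimal sketch of the standard discrete formula KL(p || q) = Σ p_i · ln(p_i / q_i), using made-up probabilities. The function name `kl_divergence` and both example distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same actions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # 0 * ln(0 / q) is taken as 0 by convention
        if qi == 0:
            return math.inf   # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


# Hypothetical action probabilities, NOT taken from the console.
prev_policy = [0.50, 0.30, 0.20]
curr_policy = [0.40, 0.35, 0.25]

print(f"KL(current || previous) = {kl_divergence(curr_policy, prev_policy):.4f} nats")
```

KL divergence is asymmetric, so KL(current || previous) and KL(previous || current) generally differ; a reported value such as 0.082 is only comparable across versions if the direction and base are held fixed.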