bandit_router
Bandit router: Thompson Sampling or UCB (Upper Confidence Bound) for query-to-model selection.
Classes
ArmStats
dataclass
Statistics for a single arm (model).
BanditRouterPolicy
BanditRouterPolicy(*, strategy: Literal['thompson', 'ucb'] = 'thompson', exploration_factor: float = 2.0, min_pulls: int = 3, reward_threshold: float = 0.5)
Multi-armed bandit router using Thompson Sampling or UCB.
Each (query_class, model) pair is an arm. Rewards come from trace outcomes.
Source code in src/openjarvis/learning/bandit_router.py
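As a rough illustration of the Thompson Sampling strategy over (query_class, model) arms, the following is a minimal sketch and not the code in bandit_router.py. The `thompson_select` function, the `counts` table, and the Beta(1, 1) prior are all assumptions made for this example.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical arm table: (query_class, model) -> [successes, failures].
Arm = Tuple[str, str]


def thompson_select(
    counts: Dict[Arm, List[int]],
    query_class: str,
    models: List[str],
    rng: random.Random,
) -> str:
    """Draw a success rate from each arm's Beta posterior; pick the largest draw."""
    best_model, best_draw = models[0], -1.0
    for model in models:
        successes, failures = counts.get((query_class, model), [0, 0])
        # Assumed Beta(1, 1) (uniform) prior, so unseen arms can still be chosen.
        draw = rng.betavariate(successes + 1, failures + 1)
        if draw > best_draw:
            best_model, best_draw = model, draw
    return best_model


# Illustrative data only: model-a has mostly succeeded on "code" queries.
counts = {("code", "model-a"): [40, 2], ("code", "model-b"): [3, 30]}
rng = random.Random(0)
picks = [
    thompson_select(counts, "code", ["model-a", "model-b"], rng)
    for _ in range(200)
]
```

Because each call samples from the posterior instead of taking the mean, an arm with few observations still wins occasionally, which is how exploration arises in this strategy.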
Functions
route
route(context: RoutingContext, models: List[str]) -> str
Select model using the configured bandit strategy.
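For the 'ucb' strategy, one plausible reading of the `exploration_factor` and `min_pulls` parameters is a UCB1-style score with forced exploration of under-sampled arms. The sketch below is written under that assumption; `ucb_select` and the `stats` layout are hypothetical and may not match how `route` actually uses these parameters.

```python
import math
from typing import Dict, List, Tuple


def ucb_select(
    stats: Dict[str, Tuple[int, float]],
    models: List[str],
    exploration_factor: float = 2.0,
    min_pulls: int = 3,
) -> str:
    """Pick a model by UCB1-style score; stats maps model -> (pulls, total_reward)."""
    # Assumption: any arm with fewer than min_pulls observations is tried first.
    # max(..., 1) also guarantees pulls > 0 before the division below.
    for model in models:
        if stats.get(model, (0, 0.0))[0] < max(min_pulls, 1):
            return model

    total_pulls = sum(stats[m][0] for m in models)

    def score(model: str) -> float:
        pulls, total_reward = stats[model]
        mean = total_reward / pulls
        # Confidence bonus shrinks as an arm accumulates pulls.
        bonus = math.sqrt(exploration_factor * math.log(total_pulls) / pulls)
        return mean + bonus

    return max(models, key=score)


# Illustrative data only.
stats = {"model-a": (10, 9.0), "model-b": (10, 2.0)}
chosen = ucb_select(stats, ["model-a", "model-b"])
```

With equal pull counts the bonus term is identical for both arms, so the higher mean reward decides the choice.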
update
Update arm statistics with observed reward.
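The signature of `update` is not shown here, so the following only sketches what updating per-arm statistics might look like. `SketchArmStats` and `record_reward` are hypothetical names, and treating a reward at or above `reward_threshold` as a success is an assumption, not documented behavior.

```python
from dataclasses import dataclass


@dataclass
class SketchArmStats:
    """Hypothetical stand-in for per-arm statistics (not the real ArmStats)."""

    pulls: int = 0
    total_reward: float = 0.0
    successes: int = 0
    failures: int = 0

    @property
    def mean_reward(self) -> float:
        # Avoid dividing by zero for an arm that has never been pulled.
        return self.total_reward / self.pulls if self.pulls else 0.0


def record_reward(
    arm: SketchArmStats, reward: float, reward_threshold: float = 0.5
) -> None:
    """Fold one observed reward into the arm's running statistics."""
    arm.pulls += 1
    arm.total_reward += reward
    # Assumption: the threshold turns a continuous reward into a
    # success/failure count usable by a Beta posterior.
    if reward >= reward_threshold:
        arm.successes += 1
    else:
        arm.failures += 1


arm = SketchArmStats()
record_reward(arm, 0.9)
record_reward(arm, 0.2)
```

Keeping both the raw reward total and the thresholded counts would let the same record serve a mean-based strategy such as UCB and a count-based one such as Thompson Sampling.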
get_stats
Get arm statistics.
Functions
ensure_registered
Register BanditRouterPolicy if not already present.