heuristic_reward
Heuristic reward function: a weighted score computed from latency, cost, and efficiency.
Classes
HeuristicRewardFunction
HeuristicRewardFunction(*, weight_latency: float = 0.4, weight_cost: float = 0.3, weight_efficiency: float = 0.3, max_latency: float = 30.0, max_cost: float = 0.01)
Bases: RewardFunction
Computes a scalar reward based on latency, cost, and token efficiency.
Each component is normalised to [0, 1] and combined via a weighted sum.
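The weighted-sum scheme described above can be sketched roughly as follows. This is an illustrative stand-in, not the library's implementation: the class name `HeuristicRewardSketch`, the `score` method, and its arguments are hypothetical, and the normalisation rules (inverting latency and cost against their maxima, clamping efficiency to [0, 1]) are assumptions, since the page does not spell them out. Only the parameter names and defaults come from the documented signature.

```python
from dataclasses import dataclass


def _clamp01(x: float) -> float:
    """Clamp x to the closed interval [0, 1]."""
    return max(0.0, min(1.0, x))


@dataclass(frozen=True)
class HeuristicRewardSketch:
    """Hypothetical stand-in for HeuristicRewardFunction (not the real class)."""

    # Defaults copied from the documented constructor signature.
    weight_latency: float = 0.4
    weight_cost: float = 0.3
    weight_efficiency: float = 0.3
    max_latency: float = 30.0
    max_cost: float = 0.01

    def score(self, latency: float, cost: float, efficiency: float) -> float:
        # Assumption: lower latency and cost are better, so each is scaled
        # by its maximum, clamped to [0, 1], and then inverted.
        latency_score = 1.0 - _clamp01(latency / self.max_latency)
        cost_score = 1.0 - _clamp01(cost / self.max_cost)
        # Assumption: efficiency is already a ratio where higher is better.
        efficiency_score = _clamp01(efficiency)
        # Weighted sum of the three normalised components.
        return (
            self.weight_latency * latency_score
            + self.weight_cost * cost_score
            + self.weight_efficiency * efficiency_score
        )


if __name__ == "__main__":
    reward = HeuristicRewardSketch()
    print(reward.score(latency=15.0, cost=0.005, efficiency=0.5))
```

Because the default weights sum to 1.0 and each component lies in [0, 1], the combined reward also stays in [0, 1]. With custom weights that do not sum to 1.0, that bound no longer holds in this sketch.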