heuristic_reward

Heuristic reward function: a weighted score computed from latency, cost, and token efficiency.

Classes

HeuristicRewardFunction

HeuristicRewardFunction(*, weight_latency: float = 0.4, weight_cost: float = 0.3, weight_efficiency: float = 0.3, max_latency: float = 30.0, max_cost: float = 0.01)

Bases: RewardFunction

Computes a scalar reward based on latency, cost, and token efficiency.

Each component is normalised to [0, 1] and combined via a weighted sum.

Source code in src/openjarvis/learning/heuristic_reward.py
def __init__(
    self,
    *,
    weight_latency: float = 0.4,
    weight_cost: float = 0.3,
    weight_efficiency: float = 0.3,
    max_latency: float = 30.0,
    max_cost: float = 0.01,
) -> None:
    self.weight_latency = weight_latency
    self.weight_cost = weight_cost
    self.weight_efficiency = weight_efficiency
    self.max_latency = max_latency
    self.max_cost = max_cost
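
The constructor above only stores the weights and caps, and the scoring logic itself is not shown on this page. As a rough illustration of the "normalised to [0, 1] and combined via a weighted sum" description, here is a minimal standalone sketch. The function name `sketch_reward`, the helper `clamp01`, the "lower is better" inversion for latency and cost, and the assumption that efficiency already arrives as a fraction in [0, 1] are all hypothetical and not taken from the library.

```python
def clamp01(value: float) -> float:
    """Clamp a number into the closed interval [0, 1]."""
    return max(0.0, min(1.0, value))


def sketch_reward(
    latency: float,
    cost: float,
    efficiency: float,
    *,
    weight_latency: float = 0.4,
    weight_cost: float = 0.3,
    weight_efficiency: float = 0.3,
    max_latency: float = 30.0,
    max_cost: float = 0.01,
) -> float:
    """Hypothetical weighted-sum reward; NOT the library's actual implementation.

    Assumptions (not confirmed by the source):
    - latency and cost use the same units as max_latency and max_cost
    - lower latency and lower cost are better, so their scores are inverted
    - efficiency is already a fraction where 1.0 is best
    """
    # Normalise each component to [0, 1]; values at or above the cap score 0.
    latency_score = 1.0 - clamp01(latency / max_latency)
    cost_score = 1.0 - clamp01(cost / max_cost)
    efficiency_score = clamp01(efficiency)

    # Combine the three normalised components via a weighted sum.
    return (
        weight_latency * latency_score
        + weight_cost * cost_score
        + weight_efficiency * efficiency_score
    )


if __name__ == "__main__":
    # A run at half of each cap with mid-range efficiency.
    print(sketch_reward(latency=15.0, cost=0.005, efficiency=0.5))
```

With the default weights summing to 1.0, a sketch like this keeps the result in [0, 1]. If custom weights are passed that do not sum to 1.0, the output range scales accordingly. Whether the real class renormalises weights is not stated here.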