types
¶
Episode dataclasses for orchestrator training.
Adapted from IPW's episode_builder.py. These types represent the core
data structures for orchestrator RL/SFT training: actions, observations,
episode steps, and complete episodes with aggregate metrics.
Classes¶
OrchestratorAction
dataclass
¶
OrchestratorObservation
dataclass
¶
OrchestratorObservation(content: str, latency_seconds: float = 0.0, cost_usd: float = 0.0, energy_joules: float = 0.0, power_watts: float = 0.0, tokens: int = 0)
EpisodeStep
dataclass
¶
EpisodeStep(turn: int, action: OrchestratorAction, observation: OrchestratorObservation)
Single step in an episode.
Attributes¶
Episode
dataclass
¶
Episode(task_id: str, initial_prompt: str, steps: List[EpisodeStep] = list(), final_answer: str = '', ground_truth: str = '', correct: bool = False, total_energy_joules: float = 0.0, total_cost_usd: float = 0.0, total_latency_seconds: float = 0.0, total_tokens: int = 0, max_power_watts: float = 0.0, metadata: Dict[str, Any] = dict())
Complete RL episode with aggregate metrics.
Attributes¶
steps
class-attribute
instance-attribute
¶
steps: List[EpisodeStep] = field(default_factory=list)
Sequence of episode steps, each pairing an action with its observation.
final_answer
class-attribute
instance-attribute
¶
Final answer produced by the orchestrator.
correct
class-attribute
instance-attribute
¶
Whether the final answer matches the ground truth.
Functions¶
add_step
¶
add_step(action: OrchestratorAction, observation: OrchestratorObservation) -> None
Add a step to the episode and update aggregate metrics.
Source code in src/openjarvis/learning/orchestrator/types.py
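The aggregate update performed by add_step can be pictured with a minimal sketch. It assumes the totals are running sums and that max_power_watts is a running maximum, which the field names suggest but this page does not confirm. ObservationSketch and EpisodeSketch are hypothetical stand-ins, and actions are reduced to plain strings.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ObservationSketch:
    """Hypothetical stand-in mirroring the documented OrchestratorObservation fields."""

    content: str
    latency_seconds: float = 0.0
    cost_usd: float = 0.0
    energy_joules: float = 0.0
    power_watts: float = 0.0
    tokens: int = 0


@dataclass
class EpisodeSketch:
    """Hypothetical stand-in holding only the aggregate fields of Episode."""

    steps: List[Tuple[str, ObservationSketch]] = field(default_factory=list)
    total_energy_joules: float = 0.0
    total_cost_usd: float = 0.0
    total_latency_seconds: float = 0.0
    total_tokens: int = 0
    max_power_watts: float = 0.0

    def add_step(self, action: str, observation: ObservationSketch) -> None:
        # Assumption: totals accumulate by addition, peak power by max().
        self.steps.append((action, observation))
        self.total_energy_joules += observation.energy_joules
        self.total_cost_usd += observation.cost_usd
        self.total_latency_seconds += observation.latency_seconds
        self.total_tokens += observation.tokens
        self.max_power_watts = max(self.max_power_watts, observation.power_watts)


episode = EpisodeSketch()
episode.add_step("search", ObservationSketch("hit", energy_joules=2.0, power_watts=30.0, tokens=10))
episode.add_step("answer", ObservationSketch("4", energy_joules=3.0, power_watts=20.0, tokens=5))
```

Keeping the aggregates on the episode itself means later metrics can read them directly instead of re-scanning every step.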
num_turns
¶
compute_ipj
¶
Compute Intelligence Per Joule (IPJ).
Returns: IPJ score (higher is better). 0.0 if energy is zero or the answer is incorrect.
Source code in src/openjarvis/learning/orchestrator/types.py
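This page does not give the exact IPJ formula, only its zero cases. A minimal sketch, assuming a correct answer counts as one unit of "intelligence" divided by the total energy spent, might look as follows. The function name and the 1.0 numerator are assumptions made for illustration.

```python
def compute_ipj_sketch(correct: bool, total_energy_joules: float) -> float:
    """Hypothetical IPJ: 1 / energy for a correct answer, else 0.0."""
    # Documented zero cases: incorrect answer, or no energy recorded.
    if not correct or total_energy_joules == 0.0:
        return 0.0
    # Assumption: a correct answer scores 1.0, so less energy gives a higher IPJ.
    return 1.0 / total_energy_joules
```

Under this assumption, two correct episodes are ranked purely by energy use, which matches the "higher is better" reading of the score.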
to_dict
¶
Convert episode to dictionary for serialization.
Source code in src/openjarvis/learning/orchestrator/types.py
EpisodeState
dataclass
¶
EpisodeState(initial_prompt: str, history: List[Tuple[OrchestratorAction, OrchestratorObservation]] = list(), final_answer: Optional[str] = None)
Mutable state during episode execution.
Attributes¶
history
class-attribute
instance-attribute
¶
history: List[Tuple[OrchestratorAction, OrchestratorObservation]] = field(default_factory=list)
History of (action, observation) pairs.
final_answer
class-attribute
instance-attribute
¶
Final answer (set when is_final_answer action is taken).
Functions¶
add_turn
¶
add_turn(action: OrchestratorAction, observation: OrchestratorObservation) -> None
Add a turn to the episode history.
Source code in src/openjarvis/learning/orchestrator/types.py
num_turns
¶
to_episode
¶
to_episode(task_id: str, ground_truth: str, correct: bool) -> Episode
Convert state to Episode for reward computation.
Source code in src/openjarvis/learning/orchestrator/types.py
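The hand-off from mutable state to a finished episode can be sketched as below. StateSketch and FinishedEpisodeSketch are hypothetical stand-ins with actions and observations reduced to strings. Numbering turns from 0 and mapping a missing final answer to an empty string are assumptions, not documented behavior.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FinishedEpisodeSketch:
    """Hypothetical stand-in for Episode, without the aggregate metrics."""

    task_id: str
    initial_prompt: str
    steps: List[Tuple[int, str, str]] = field(default_factory=list)
    final_answer: str = ""
    ground_truth: str = ""
    correct: bool = False


@dataclass
class StateSketch:
    """Hypothetical stand-in for EpisodeState."""

    initial_prompt: str
    history: List[Tuple[str, str]] = field(default_factory=list)
    final_answer: Optional[str] = None

    def add_turn(self, action: str, observation: str) -> None:
        self.history.append((action, observation))

    def to_episode(self, task_id: str, ground_truth: str, correct: bool) -> FinishedEpisodeSketch:
        episode = FinishedEpisodeSketch(
            task_id=task_id,
            initial_prompt=self.initial_prompt,
            # Assumption: an unset final answer becomes the empty string.
            final_answer=self.final_answer or "",
            ground_truth=ground_truth,
            correct=correct,
        )
        # Assumption: turns are numbered from 0 in history order.
        for turn, (action, observation) in enumerate(self.history):
            episode.steps.append((turn, action, observation))
        return episode


state = StateSketch(initial_prompt="What is 2 + 2?")
state.add_turn("calculator: 2 + 2", "4")
state.final_answer = "4"
finished = state.to_episode(task_id="task-0", ground_truth="4", correct=True)
```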
PolicyOutput
dataclass
¶
PolicyOutput(thought: str, tool_name: str, tool_input: str, is_final_answer: bool = False, raw_text: str = '', confidence: float = 1.0)
Functions¶
normalize_number
¶
Try to parse a string as a number.
Returns None if the string is not a valid number.
Source code in src/openjarvis/learning/orchestrator/types.py
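A minimal sketch of this kind of parsing, assuming the result is a float and that surrounding whitespace and thousands separators are tolerated (neither detail is confirmed by this page), could be written as follows. The function name is hypothetical.

```python
from typing import Optional


def normalize_number_sketch(text: str) -> Optional[float]:
    """Hypothetical parser: return a float, or None when parsing fails."""
    # Assumption: whitespace and comma thousands separators are ignored.
    cleaned = text.strip().replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None
```

Returning None instead of raising lets a caller fall back to string comparison when either side is not numeric.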
extract_answer
¶
Extract the core answer from a potentially verbose response.
Handles patterns like:
- "The answer is 4"
- "Result: 4.0"
- "4" (unchanged)
- "Therefore, the answer is approximately 4"
Source code in src/openjarvis/learning/orchestrator/types.py
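The listed patterns can be handled with a pair of regular expressions, as in the sketch below. The patterns, the trailing-period cleanup, and the function name are assumptions chosen to cover the four documented examples. The actual implementation may recognize more or different phrasings.

```python
import re

# Hypothetical patterns covering the documented examples only.
_PATTERNS = [
    re.compile(r"answer\s+is\s+(?:approximately\s+)?(.+)", re.IGNORECASE),
    re.compile(r"result\s*:\s*(.+)", re.IGNORECASE),
]


def extract_answer_sketch(response: str) -> str:
    """Return the text after a known lead-in phrase, or the input unchanged."""
    text = response.strip()
    for pattern in _PATTERNS:
        match = pattern.search(text)
        if match:
            # Drop a sentence-final period such as "The answer is 4."
            return match.group(1).strip().rstrip(".")
    return text
```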
grade_answer
¶
Grade a predicted answer against the expected answer, using tolerant matching.
Handles:
- Exact string match (case-insensitive)
- Numeric comparison with tolerance
- Answer extraction from verbose responses
Args:
    predicted: The model's answer.
    expected: Ground truth answer.
    tolerance: Tolerance for numeric comparisons.
Returns: True if answer is correct.
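The three matching strategies can be combined as in the following sketch. The helper names, the default tolerance of 1e-6, the use of an absolute (rather than relative) tolerance, and the extraction pattern are all assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical lead-in phrases; the real extractor may recognize others.
_LEAD_IN = re.compile(
    r"(?:answer\s+is|result\s*:)\s*(?:approximately\s+)?(.+)", re.IGNORECASE
)


def _to_number(text: str) -> Optional[float]:
    try:
        return float(text.strip().replace(",", ""))
    except ValueError:
        return None


def _core(text: str) -> str:
    match = _LEAD_IN.search(text)
    core = match.group(1) if match else text
    return core.strip().rstrip(".")


def grade_answer_sketch(predicted: str, expected: str, tolerance: float = 1e-6) -> bool:
    """Try the raw prediction first, then its extracted core."""
    target = expected.strip()
    target_number = _to_number(target)
    for candidate in (predicted.strip(), _core(predicted)):
        # Strategy 1: case-insensitive exact match.
        if candidate.lower() == target.lower():
            return True
        # Strategy 2: numeric match within an absolute tolerance (assumption).
        number = _to_number(candidate)
        if number is not None and target_number is not None:
            if abs(number - target_number) <= tolerance:
                return True
    return False
```

For example, under these assumptions `grade_answer_sketch("The answer is 4", "4.0")` succeeds through extraction followed by numeric comparison, while `grade_answer_sketch("5", "4")` fails every strategy.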