policy_model¶
Policy model wrapper for orchestrator training.
Adapted from IPW's policy.py. Wraps a HuggingFace causal LM
(e.g. Qwen3-1.7B) to predict structured actions in the orchestrator
environment. All torch/transformers imports are guarded so the
module can be imported without GPU dependencies.
Classes¶
OrchestratorPolicyModel¶
OrchestratorPolicyModel(model: Any = None, tokenizer: Any = None, max_tokens: int = 256, temperature: float = 0.7)
Wrapper around a causal LM for orchestrator policy prediction.
Input format (prompt)::

    Task: {initial_prompt}
    Available tools: calculator, think, ...
    History:
    Turn 1:
      Thought: ...
      Tool: ...
      Observation: ...
    What should you do next?
    Format your response as:
    THOUGHT: [your reasoning]
    TOOL: [tool_name]
    INPUT: [input for tool]
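The prompt layout above can be sketched as a small builder function. This is a hypothetical helper, not part of the module's API; the wrapper assembles the prompt internally, and `history` here is assumed to be a list of `(thought, tool, observation)` tuples, one per prior turn.

```python
from typing import List, Tuple

def build_prompt(
    initial_prompt: str,
    tools: List[str],
    history: List[Tuple[str, str, str]],
) -> str:
    """Assemble a prompt in the documented input format (sketch)."""
    lines = [
        f"Task: {initial_prompt}",
        f"Available tools: {', '.join(tools)}",
        "History:",
    ]
    # One numbered turn per (thought, tool, observation) triple.
    for i, (thought, tool, obs) in enumerate(history, start=1):
        lines += [
            f"Turn {i}:",
            f"  Thought: {thought}",
            f"  Tool: {tool}",
            f"  Observation: {obs}",
        ]
    lines += [
        "What should you do next?",
        "Format your response as:",
        "THOUGHT: [your reasoning]",
        "TOOL: [tool_name]",
        "INPUT: [input for tool]",
    ]
    return "\n".join(lines)
```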
Output format (from model)::

    THOUGHT: [reasoning]
    TOOL: [tool_name]
    INPUT: [input]
    --- or ---
    FINAL_ANSWER: [answer]
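A minimal parser for this output format might look like the following. This is an illustrative sketch, not the module's actual parsing code; it returns a plain dict rather than an `OrchestratorAction`, and the field-extraction regexes are an assumption about how lenient the real parser is.

```python
import re

def parse_policy_output(text: str) -> dict:
    """Parse the documented THOUGHT/TOOL/INPUT or FINAL_ANSWER format (sketch)."""
    # A FINAL_ANSWER line terminates the episode; check for it first.
    final = re.search(r"FINAL_ANSWER:\s*(.*)", text, re.DOTALL)
    if final:
        return {"final_answer": final.group(1).strip()}
    fields = {}
    # Each field runs until the next ALL_CAPS header or end of text.
    for key, pattern in (
        ("thought", r"THOUGHT:\s*(.*?)(?=\n[A-Z_]+:|\Z)"),
        ("tool", r"TOOL:\s*(.*?)(?=\n[A-Z_]+:|\Z)"),
        ("input", r"INPUT:\s*(.*?)(?=\n[A-Z_]+:|\Z)"),
    ):
        m = re.search(pattern, text, re.DOTALL)
        fields[key] = m.group(1).strip() if m else None
    return fields
```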
Source code in src/openjarvis/learning/intelligence/orchestrator/policy_model.py
Functions¶
from_pretrained classmethod¶
from_pretrained(model_name: str = 'Qwen/Qwen3-1.7B', gradient_checkpointing: bool = False, load_in_8bit: bool = False, device: Optional[str] = None, **kwargs: Any) -> 'OrchestratorPolicyModel'
Load model from a HuggingFace checkpoint.
Raises ImportError if transformers is not installed.
Source code in src/openjarvis/learning/intelligence/orchestrator/policy_model.py
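A guarded usage sketch for `from_pretrained`, mirroring the module's own optional-import style. The keyword arguments come from the signature above; the package and `transformers` are assumed to be installed, and the snippet degrades to `None` otherwise.

```python
# Hedged sketch: load the policy model if the dependencies are present.
try:
    from openjarvis.learning.intelligence.orchestrator.policy_model import (
        OrchestratorPolicyModel,
    )

    policy = OrchestratorPolicyModel.from_pretrained(
        "Qwen/Qwen3-1.7B",
        gradient_checkpointing=True,  # trade compute for memory during training
        device="cpu",                 # or "cuda" when a GPU is available
    )
except ImportError:
    # transformers (or openjarvis itself) not installed; fall back gracefully,
    # as the module's guarded imports are designed to allow.
    policy = None
```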
from_checkpoint classmethod¶
Load from a previously saved checkpoint directory.
Source code in src/openjarvis/learning/intelligence/orchestrator/policy_model.py
predict_action¶
predict_action(state: EpisodeState, available_tools: List[str]) -> OrchestratorAction
Predict the next action given the current episode state and the list of available tools.
Source code in src/openjarvis/learning/intelligence/orchestrator/policy_model.py
save¶
Save the model and tokenizer to the given path.