taubench_env
taubench_env
¶
TauBench task environment — native OpenJarvis agent in tau2 simulation.
Plugs OpenJarvis's inference engine into tau2-bench's orchestrator as a
HalfDuplexAgent, so the multi-turn conversation loop, user simulator,
domain tools, database, and evaluation all come from tau2-bench while the
agent's LLM calls go through OpenJarvis.
Classes¶
JarvisHalfDuplexAgent
¶
JarvisHalfDuplexAgent(tools: list, domain_policy: str, engine: Any, model: str, temperature: float = 0.7, max_tokens: int = 4096)
A tau2 HalfDuplexAgent backed by OpenJarvis's inference engine.
Replaces tau2's built-in LLMAgent while keeping the same interface so the Orchestrator, UserSimulator, and evaluation work unchanged.
Source code in src/openjarvis/evals/execution/taubench_env.py
TauBenchTaskEnv
¶
TauBenchTaskEnv(record: EvalRecord, engine_key: Optional[str] = None, model: Optional[str] = None, temperature: float = 0.7, max_tokens: int = 4096, user_model: Optional[str] = None, num_trials: int = 1, telemetry: bool = False, gpu_metrics: bool = False)
Per-task environment for TauBench evaluation.
Creates an OpenJarvis-powered agent, plugs it into tau2's orchestrator, runs the simulation, and stores results in record.metadata for the scorer.