Skip to content

student_runner

student_runner

Real student runner for distillation experiments.

Replaces the MagicMock() in the experiment runner script with a callable that actually invokes the student model via vLLM (or any OpenAI-compatible engine) and returns structured results.

Classes

StudentResult dataclass

StudentResult(content: str, score: float = 0.0, trace_id: str = '', latency_seconds: float = 0.0, tokens_used: int = 0)

Result from running the student model on a task.

VLLMStudentRunner

VLLMStudentRunner(host: str = 'http://localhost:8001', model: str = 'Qwen/Qwen3.5-9B', temperature: float = 0.6, max_tokens: int = 4096)

Invoke the student model via a vLLM OpenAI-compatible endpoint.

PARAMETER DESCRIPTION
host

vLLM server URL (e.g. http://localhost:8001).

TYPE: str DEFAULT: 'http://localhost:8001'

model

Model name as registered in vLLM (e.g. Qwen/Qwen3.5-9B).

TYPE: str DEFAULT: 'Qwen/Qwen3.5-9B'

temperature

Sampling temperature.

TYPE: float DEFAULT: 0.6

max_tokens

Max tokens for the student response.

TYPE: int DEFAULT: 4096

Source code in src/openjarvis/learning/distillation/student_runner.py
def __init__(
    self,
    host: str = "http://localhost:8001",
    model: str = "Qwen/Qwen3.5-9B",
    temperature: float = 0.6,
    max_tokens: int = 4096,
) -> None:
    import httpx

    self._host = host.rstrip("/")
    self._model = model
    self._temperature = temperature
    self._max_tokens = max_tokens
    self._client = httpx.Client(base_url=self._host, timeout=300.0)

Functions

build_benchmark_samples_from_traces

build_benchmark_samples_from_traces(trace_store: Any, *, limit: int = 50, min_feedback: float | None = None) -> list

Build PersonalBenchmarkSample objects from the trace store.

Pulls recent traces (optionally filtered by feedback score) and converts them into benchmark samples the teacher can reference.

Source code in src/openjarvis/learning/distillation/student_runner.py
def build_benchmark_samples_from_traces(
    trace_store: Any,
    *,
    limit: int = 50,
    min_feedback: float | None = None,
) -> list:
    """Build PersonalBenchmarkSample objects from the trace store.

    Pulls recent traces (optionally filtered by feedback score) and
    converts them into benchmark samples the teacher can reference.
    """
    from openjarvis.learning.optimize.personal.synthesizer import (
        PersonalBenchmarkSample,
    )

    traces = trace_store.list_traces(limit=limit)
    samples = []
    for t in traces:
        fb = getattr(t, "feedback", None)
        if min_feedback is not None and (fb is None or fb < min_feedback):
            continue
        samples.append(
            PersonalBenchmarkSample(
                trace_id=t.trace_id,
                query=t.query,
                reference_answer=t.result[:2000] if t.result else "",
                agent=t.agent,
                category="benchmark",
                feedback_score=fb if fb is not None else 0.0,
            )
        )
    logger.info("Built %d benchmark samples from traces", len(samples))
    return samples