vllm_metrics

Scraper for vLLM's Prometheus metrics. Fetches and parses the server's /metrics endpoint.

Classes

VLLMMetrics dataclass

VLLMMetrics(ttft_p50: float = 0.0, ttft_p95: float = 0.0, ttft_p99: float = 0.0, gpu_cache_usage_pct: float = 0.0, e2e_latency_p50: float = 0.0, e2e_latency_p95: float = 0.0, queue_depth: float = 0.0)

Parsed vLLM performance metrics.
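The signature above implies a plain dataclass whose fields all default to 0.0. The following is a minimal reconstruction from that signature, not a copy of the project source. The field comments are assumptions based on the field names only.

```python
from dataclasses import asdict, dataclass


@dataclass
class VLLMMetrics:
    """Reconstructed from the documented signature; comments are assumptions."""

    ttft_p50: float = 0.0             # time to first token, median
    ttft_p95: float = 0.0             # time to first token, 95th percentile
    ttft_p99: float = 0.0             # time to first token, 99th percentile
    gpu_cache_usage_pct: float = 0.0  # GPU KV-cache usage
    e2e_latency_p50: float = 0.0      # end-to-end request latency, median
    e2e_latency_p95: float = 0.0      # end-to-end request latency, 95th percentile
    queue_depth: float = 0.0          # presumably the number of queued requests


# A default instance has every field zeroed.
empty = VLLMMetrics()
print(asdict(empty))

# Dataclasses compare by value, so a result can be checked against the default.
print(VLLMMetrics(queue_depth=3.0) == empty)
```

Because every default is 0.0, an all-zero instance cannot by itself distinguish "no data was retrieved" from "the server reported zeros".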

VLLMMetricsScraper

VLLMMetricsScraper(host: str = 'http://localhost:8000')

Scrapes vLLM's Prometheus /metrics endpoint.

Source code in src/openjarvis/telemetry/vllm_metrics.py
def __init__(self, host: str = "http://localhost:8000") -> None:
    self._host = host.rstrip("/")
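
The constructor strips trailing slashes so that the scraper can append "/metrics" without producing a double slash. A small standalone illustration follows; `metrics_url` is a hypothetical helper written for this example and is not part of the module.

```python
def metrics_url(host: str = "http://localhost:8000") -> str:
    """Build the metrics URL the same way the scraper does (hypothetical helper)."""
    # rstrip("/") removes every trailing slash, not just one.
    return f"{host.rstrip('/')}/metrics"


print(metrics_url("http://localhost:8000"))
print(metrics_url("http://localhost:8000/"))
```

Both calls yield the same URL, so callers may pass the host with or without a trailing slash.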

Functions

scrape

scrape() -> VLLMMetrics

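Fetch and parse vLLM metrics. Returns zeroed metrics if the request cannot connect, times out, or comes back with an HTTP error status. Other exceptions are not caught here and propagate to the caller.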

Source code in src/openjarvis/telemetry/vllm_metrics.py
def scrape(self) -> VLLMMetrics:
    """Fetch and parse vLLM metrics. Returns zeroed metrics on error."""
    try:
        resp = httpx.get(f"{self._host}/metrics", timeout=5.0)
        resp.raise_for_status()
    except (
        httpx.ConnectError, httpx.TimeoutException, httpx.HTTPStatusError,
    ) as exc:
        logger.debug("Failed to fetch vLLM metrics: %s", exc)
        return VLLMMetrics()

    return self._parse(resp.text)
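
The `_parse` method is not shown on this page. As a rough idea of what parsing the Prometheus text format can involve, here is a standalone sketch that uses only the standard library. Everything in it is an assumption: the function names are hypothetical, the metric names are ones vLLM has exposed in some versions and may differ in yours, and the quantile is a coarse estimate (the upper bound of the first bucket that reaches the target count) rather than whatever the real implementation does.

```python
import math
import re

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# The "le" label of a histogram bucket (its upper bound).
_LE = re.compile(r'(?:^|,)\s*le="([^"]+)"')


def parse_samples(text):
    """Return (name, labels, value) tuples, skipping comments and bad lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, labels, raw = match.groups()
        try:
            samples.append((name, labels or "", float(raw)))
        except ValueError:
            continue
    return samples


def gauge(samples, name):
    """Return the first sample value for `name`, or 0.0 if it is absent."""
    return next((v for n, _, v in samples if n == name), 0.0)


def histogram_quantile(samples, name, q):
    """Coarse quantile: upper bound of the first bucket holding q of all counts."""
    buckets = {}
    for n, labels, value in samples:
        le = _LE.search(labels)
        if n == name + "_bucket" and le:
            bound = float(le.group(1))  # float() accepts "+Inf"
            buckets[bound] = buckets.get(bound, 0.0) + value
    ordered = sorted(buckets.items())
    if not ordered or ordered[-1][1] <= 0:
        return 0.0
    total = ordered[-1][1]  # bucket counts are cumulative
    finite = [b for b, _ in ordered if math.isfinite(b)]
    for bound, count in ordered:
        if count >= q * total:
            if math.isfinite(bound):
                return bound
            # Quantile falls in the +Inf bucket: report the largest finite bound.
            return finite[-1] if finite else 0.0
    return 0.0


# Assumed metric names and made-up sample values, for illustration only.
SAMPLE_TEXT = """\
# TYPE vllm:gpu_cache_usage_perc gauge
vllm:gpu_cache_usage_perc{model_name="m"} 0.25
vllm:num_requests_waiting{model_name="m"} 3.0
vllm:time_to_first_token_seconds_bucket{le="0.1",model_name="m"} 5.0
vllm:time_to_first_token_seconds_bucket{le="0.5",model_name="m"} 9.0
vllm:time_to_first_token_seconds_bucket{le="+Inf",model_name="m"} 10.0
"""

samples = parse_samples(SAMPLE_TEXT)
print(gauge(samples, "vllm:num_requests_waiting"))
print(histogram_quantile(samples, "vllm:time_to_first_token_seconds", 0.50))
```

Returning 0.0 for missing metrics mirrors the zero defaults of `VLLMMetrics`, so a parser built this way would degrade to zeroed fields when a metric name is not present in the scrape.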