gpu_monitor
GPU monitoring via pynvml: a background poller for GPU metrics.
Classes
GpuHardwareSpec (dataclass)
Peak theoretical capabilities for a known GPU model.
GpuSnapshot (dataclass)
GpuSnapshot(power_watts: float, utilization_pct: float, memory_used_gb: float, temperature_c: float, device_id: int = 0)
A single point-in-time reading from one GPU device.
GpuSample (dataclass)
GpuSample(energy_joules: float = 0.0, mean_power_watts: float = 0.0, peak_power_watts: float = 0.0, mean_utilization_pct: float = 0.0, peak_utilization_pct: float = 0.0, mean_memory_used_gb: float = 0.0, peak_memory_used_gb: float = 0.0, mean_temperature_c: float = 0.0, peak_temperature_c: float = 0.0, duration_seconds: float = 0.0, num_snapshots: int = 0)
Aggregated GPU metrics over an inference bracket.
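The reduction from individual readings to one aggregate could look like the sketch below. It is an illustration only: the two dataclasses mirror the signatures listed above, while the `aggregate` helper, the use of `(timestamp, snapshot)` pairs, and the trapezoidal energy estimate are assumptions rather than the module's actual implementation. It handles a single device for brevity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GpuSnapshot:
    power_watts: float
    utilization_pct: float
    memory_used_gb: float
    temperature_c: float
    device_id: int = 0


@dataclass
class GpuSample:
    energy_joules: float = 0.0
    mean_power_watts: float = 0.0
    peak_power_watts: float = 0.0
    mean_utilization_pct: float = 0.0
    peak_utilization_pct: float = 0.0
    mean_memory_used_gb: float = 0.0
    peak_memory_used_gb: float = 0.0
    mean_temperature_c: float = 0.0
    peak_temperature_c: float = 0.0
    duration_seconds: float = 0.0
    num_snapshots: int = 0


def aggregate(timed: List[Tuple[float, GpuSnapshot]]) -> GpuSample:
    """Hypothetical helper: fold (timestamp_seconds, snapshot) pairs into a GpuSample."""
    if not timed:
        return GpuSample()
    snaps = [snap for _, snap in timed]
    n = len(snaps)
    # Assumed energy estimate: trapezoidal integration of power over time.
    energy = sum(
        0.5 * (a.power_watts + b.power_watts) * (tb - ta)
        for (ta, a), (tb, b) in zip(timed, timed[1:])
    )
    return GpuSample(
        energy_joules=energy,
        mean_power_watts=sum(s.power_watts for s in snaps) / n,
        peak_power_watts=max(s.power_watts for s in snaps),
        mean_utilization_pct=sum(s.utilization_pct for s in snaps) / n,
        peak_utilization_pct=max(s.utilization_pct for s in snaps),
        mean_memory_used_gb=sum(s.memory_used_gb for s in snaps) / n,
        peak_memory_used_gb=max(s.memory_used_gb for s in snaps),
        mean_temperature_c=sum(s.temperature_c for s in snaps) / n,
        peak_temperature_c=max(s.temperature_c for s in snaps),
        duration_seconds=timed[-1][0] - timed[0][0],
        num_snapshots=n,
    )


readings = [
    (0.0, GpuSnapshot(100.0, 50.0, 10.0, 60.0)),
    (1.0, GpuSnapshot(200.0, 100.0, 12.0, 70.0)),
    (2.0, GpuSnapshot(200.0, 100.0, 12.0, 70.0)),
]
summary = aggregate(readings)
```

With these three readings the trapezoidal estimate gives 150 J for the first second and 200 J for the second, so `summary.energy_joules` is 350 J over a 2 s window.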
GpuMonitor
Background GPU poller using pynvml.
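Usage::

    # Import path inferred from the source location shown below.
    from openjarvis.telemetry.gpu_monitor import GpuMonitor

    mon = GpuMonitor(poll_interval_ms=50)
    with mon.sample() as result:
        # ... run inference ...
        pass
    # The sample is populated once the block exits.
    print(result.energy_joules)
    mon.close()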
Source code in src/openjarvis/telemetry/gpu_monitor.py
Functions
available (staticmethod)
Return True if pynvml is importable and can be initialized.
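A check of this kind might be written as in the following sketch. The function name `pynvml_available` is hypothetical, and shutting NVML down again after the probe is an assumption about what is desirable, not a description of what `available` actually does. Only `pynvml.nvmlInit()` and `pynvml.nvmlShutdown()` are used from the library.

```python
def pynvml_available() -> bool:
    """Return True if pynvml imports and NVML initializes, else False."""
    try:
        import pynvml

        pynvml.nvmlInit()
        # Assumption: release the probe's handle right away so the check
        # leaves no NVML session open.
        pynvml.nvmlShutdown()
    except Exception:
        # Covers a missing package (ImportError) as well as NVML errors
        # such as a missing driver or shared library.
        return False
    return True


has_gpu_telemetry = pynvml_available()
```

Catching the broad `Exception` is deliberate here: a probe should report "unavailable" for any failure rather than propagate it to the caller.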
sample
sample() -> Generator[GpuSample, None, None]
Context manager that polls GPUs during the block, then populates the sample.
If pynvml is unavailable or no devices are found, yields an empty
GpuSample without starting a background thread.
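The pattern described here (yield a result object, poll in the background, fill the object in on exit) can be sketched with the standard library alone. Everything in this example is a stand-in: `PowerSample` is a reduced, hypothetical version of GpuSample, and `read_power` is a hypothetical callable replacing the real pynvml query, so the sketch runs without a GPU.

```python
import threading
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Callable, Generator, List, Optional


@dataclass
class PowerSample:
    """Hypothetical, reduced stand-in for GpuSample."""

    mean_power_watts: float = 0.0
    peak_power_watts: float = 0.0
    num_snapshots: int = 0


@contextmanager
def sample(
    read_power: Optional[Callable[[], float]],
    poll_interval_s: float = 0.05,
) -> Generator[PowerSample, None, None]:
    result = PowerSample()
    if read_power is None:
        # No reader available: yield the empty result and start no thread.
        yield result
        return

    readings: List[float] = []
    stop = threading.Event()

    def poll() -> None:
        # Take one reading immediately, then one per interval until stopped.
        while True:
            readings.append(read_power())
            if stop.wait(poll_interval_s):
                break

    worker = threading.Thread(target=poll, daemon=True)
    worker.start()
    try:
        yield result
    finally:
        # Stop the poller, then populate the object the caller already holds.
        stop.set()
        worker.join()
        if readings:
            result.mean_power_watts = sum(readings) / len(readings)
            result.peak_power_watts = max(readings)
            result.num_snapshots = len(readings)


with sample(lambda: 150.0, poll_interval_s=0.001) as measured:
    time.sleep(0.02)  # stands in for the inference call

with sample(None) as empty:
    pass
```

Because the result is mutated in the `finally` clause, it is only meaningful after the `with` block exits, which matches the usage shown for GpuMonitor, where `energy_joules` is read after the block.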
close
Shut down pynvml if it was initialized.
Functions
lookup_gpu_spec
lookup_gpu_spec(name: str) -> Optional[GpuHardwareSpec]
Return the GpuHardwareSpec for the given name, or None if unknown.
Matching is a case-insensitive substring lookup against the keys in
GPU_SPECS.
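A case-insensitive substring lookup of this kind could be sketched as follows. Several details are assumptions: the `tdp_watts` field is hypothetical (the real GpuHardwareSpec fields are not listed above), the table entries are illustrative, and the match direction assumed here is that a table key must occur inside the reported device name.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class GpuHardwareSpec:
    # Hypothetical field; the real dataclass's fields are not shown above.
    tdp_watts: float


# Illustrative table; the real GPU_SPECS keys and values may differ.
GPU_SPECS: Dict[str, GpuHardwareSpec] = {
    "A100": GpuHardwareSpec(tdp_watts=400.0),
    "H100": GpuHardwareSpec(tdp_watts=700.0),
}


def lookup_gpu_spec(name: str) -> Optional[GpuHardwareSpec]:
    needle = name.lower()
    for key, spec in GPU_SPECS.items():
        # Assumed direction: the table key must occur inside the device name.
        if key.lower() in needle:
            return spec
    return None


known = lookup_gpu_spec("NVIDIA a100-sxm4-80gb")
unknown = lookup_gpu_spec("Some Other Device")
```

Substring matching lets one short key cover the many marketing variants a driver may report for the same model. The trade-off is that the first matching key wins, so overlapping keys would need to be ordered from most to least specific.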