energy_nvidia

NVIDIA energy monitor — hardware counters (Volta+) with polling fallback.

Classes

NvidiaEnergyMonitor

NvidiaEnergyMonitor(poll_interval_ms: int = 50)

Bases: EnergyMonitor

NVIDIA energy monitor using pynvml.

Primary mode (Volta+): Reads the cumulative hardware energy counter from nvmlDeviceGetTotalEnergyConsumption() at the start and end of a measurement. The counter is reported in millijoules, so (end - start) / 1000 gives joules.
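
A minimal sketch of the counter-delta arithmetic, assuming a zero-argument callable that returns the current counter reading in millijoules. The helper name `measure_energy_j` is hypothetical and not part of this module; with pynvml, the callable could be `lambda: pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)`.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def measure_energy_j(
    read_counter_mj: Callable[[], int],
    workload: Callable[[], T],
) -> Tuple[T, float]:
    """Run ``workload`` and return (result, energy in joules).

    Hypothetical helper: ``read_counter_mj`` returns a cumulative
    energy counter in millijoules, e.g. a wrapper around
    pynvml.nvmlDeviceGetTotalEnergyConsumption(handle).
    """
    start_mj = read_counter_mj()  # counter value before the workload
    result = workload()
    end_mj = read_counter_mj()  # counter value after the workload
    # The counter is cumulative, so the difference is the energy used
    # in between; divide by 1000 to convert millijoules to joules.
    return result, (end_mj - start_mj) / 1000.0
```

With a fake counter that reads 5000 mJ and then 17500 mJ, the helper reports 12.5 J.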

Fallback mode (pre-Volta): Integrates power samples from nvmlDeviceGetPowerUsage() (reported in milliwatts) over time with the trapezoidal rule, the same algorithm as the legacy GpuMonitor.
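
The trapezoidal rule can be sketched as follows, assuming the samples are collected as `(timestamp_s, power_mw)` pairs ordered by time. The helper `trapezoid_energy_j` is hypothetical; the legacy GpuMonitor's actual code is not shown here.

```python
from typing import Sequence, Tuple


def trapezoid_energy_j(samples: Sequence[Tuple[float, float]]) -> float:
    """Integrate (timestamp_s, power_mw) samples into joules.

    Hypothetical helper: assumes samples are ordered by timestamp and
    power is in milliwatts, as nvmlDeviceGetPowerUsage() reports.
    """
    energy_j = 0.0
    for (t0, p0_mw), (t1, p1_mw) in zip(samples, samples[1:]):
        # One trapezoid: average power in watts times interval in seconds.
        avg_power_w = (p0_mw + p1_mw) / 2.0 / 1000.0
        energy_j += avg_power_w * (t1 - t0)
    return energy_j
```

For example, 100 W rising to 200 W over one second, then a steady 200 W for two more seconds, integrates to 150 J + 400 J = 550 J. Fewer than two samples yield 0 J because there is no interval to integrate.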

In both modes, a lightweight polling thread still runs to collect utilization, memory, and temperature metrics, because no hardware counter exists for those.
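
A background poller of this kind might look like the sketch below. `MetricPoller` and its `sample_fn` parameter are hypothetical names, not this module's API. Waiting on a `threading.Event` instead of calling `time.sleep` lets `stop()` return promptly rather than sitting out a full interval.

```python
import threading
from typing import Callable, Dict, List


class MetricPoller:
    """Hypothetical background sampler, not the library's own class.

    Calls ``sample_fn`` every ``interval_s`` seconds on a daemon thread
    until ``stop()`` is called; always records at least one sample.
    """

    def __init__(
        self,
        sample_fn: Callable[[], Dict[str, float]],
        interval_s: float = 0.05,
    ) -> None:
        self._sample_fn = sample_fn
        self._interval_s = interval_s
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.samples: List[Dict[str, float]] = []

    def _run(self) -> None:
        while True:
            self.samples.append(self._sample_fn())
            # wait() returns True as soon as stop() sets the event.
            if self._stop_event.wait(self._interval_s):
                break

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> List[Dict[str, float]]:
        self._stop_event.set()
        self._thread.join()  # the thread has finished appending after this
        return self.samples
```

In a real NVIDIA monitor, `sample_fn` could read pynvml calls such as `nvmlDeviceGetUtilizationRates`, `nvmlDeviceGetMemoryInfo`, and `nvmlDeviceGetTemperature`. The 0.05 s default here simply mirrors the 50 ms `poll_interval_ms` default in the constructor signature.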

Source code in src/openjarvis/telemetry/energy_nvidia.py
def __init__(self, poll_interval_ms: int = 50) -> None:
    self._poll_interval_s = poll_interval_ms / 1000.0
    self._handles: List = []
    self._device_count = 0
    self._device_name = ""
    self._initialized = False
    self._hw_counter_available = False

    if _PYNVML_AVAILABLE:
        try:
            pynvml.nvmlInit()
            self._device_count = pynvml.nvmlDeviceGetCount()
            self._handles = [
                pynvml.nvmlDeviceGetHandleByIndex(i)
                for i in range(self._device_count)
            ]
            if self._handles:
                self._device_name = pynvml.nvmlDeviceGetName(self._handles[0])
                if isinstance(self._device_name, bytes):
                    self._device_name = self._device_name.decode()
            self._initialized = True
            self._hw_counter_available = self._probe_hw_counter()
        except Exception as exc:
            logger.debug("NVIDIA energy monitor initialization failed: %s", exc)
            self._initialized = False
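
The listing calls `self._probe_hw_counter()` without showing its body. One plausible approach, sketched here as an assumption rather than the module's actual code, is to attempt a single counter read on the first device and treat any error as "no hardware counter". NVML documents `nvmlDeviceGetTotalEnergyConsumption` for Volta or newer devices. The hypothetical `probe_hw_counter` takes the reader as a parameter so the sketch runs without a GPU.

```python
from typing import Any, Callable, Sequence


def probe_hw_counter(
    handles: Sequence[Any],
    read_energy_mj: Callable[[Any], int],
) -> bool:
    """Hypothetical probe: True if the first device's energy counter reads.

    ``read_energy_mj`` would be pynvml.nvmlDeviceGetTotalEnergyConsumption
    in a real monitor; it is a parameter here so the sketch is testable.
    """
    if not handles:
        return False  # no devices, nothing to probe
    try:
        read_energy_mj(handles[0])
    except Exception:
        # e.g. pre-Volta GPUs, where NVML reports the call as not supported
        return False
    return True
```

A `False` result would select the trapezoidal-integration fallback described above.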