efficiency
MFU/MBU efficiency calculator for GPU inference telemetry.
Computes Model FLOPs Utilization (MFU) and Model Bandwidth Utilization (MBU) to quantify how efficiently a model uses available GPU compute and memory bandwidth.
Classes
EfficiencyMetrics
dataclass
EfficiencyMetrics(mfu_pct: float = 0.0, mbu_pct: float = 0.0, actual_flops: float = 0.0, peak_flops: float = 0.0, actual_bandwidth_gb_s: float = 0.0, peak_bandwidth_gb_s: float = 0.0, ipj: float = 0.0)
Results of an MFU/MBU efficiency calculation.
Functions
estimate_model_flops_per_token
Estimate the FLOPs needed to process one token in a forward pass of a transformer.
For dense models, FLOPs per token ≈ 2 * params. For MoE models, pass
active_params_b (the number of parameters active per token, in billions).
Args:
param_count_b: Total parameter count in billions.
active_params_b: Active parameters per token in billions. If None,
defaults to param_count_b (dense model).
Returns: Estimated FLOPs per token.
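The 2 * params rule above can be sketched in a few lines. This is a minimal illustration, not the library's source: the name sketch_flops_per_token is hypothetical, and the unit conversion (billions to raw parameter count) is an assumption based on the argument descriptions.

```python
from typing import Optional


def sketch_flops_per_token(
    param_count_b: float, active_params_b: Optional[float] = None
) -> float:
    """Hypothetical sketch of the 2 * params rule of thumb (not the library code)."""
    # A dense model activates every parameter, so fall back to the total count.
    active_b = param_count_b if active_params_b is None else active_params_b
    # Convert billions to a raw count; roughly 2 FLOPs (multiply + add) per parameter.
    return 2.0 * active_b * 1e9


dense_flops = sketch_flops_per_token(7.0)        # 7B dense model
moe_flops = sketch_flops_per_token(47.0, 13.0)   # made-up MoE: 47B total, 13B active
print(f"dense: {dense_flops:.3e} FLOPs/token, MoE: {moe_flops:.3e} FLOPs/token")
```

Under this sketch only the active parameters drive the estimate, so an MoE model with many total parameters can still be cheap per token.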
Source code in src/openjarvis/telemetry/efficiency.py
estimate_model_bytes_per_token
Estimate bytes of memory loaded per decode step.
Args:
param_count_b: Total parameter count in billions.
bytes_per_param: Bytes per parameter (default 2.0 for FP16).
Returns: Bytes loaded per token.
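A plausible reading of this estimate is that every weight is read from memory once per decode step. The sketch below assumes exactly that and ignores KV-cache and activation traffic. The name sketch_bytes_per_token is hypothetical and the body is an assumption, not the library's source.

```python
def sketch_bytes_per_token(param_count_b: float, bytes_per_param: float = 2.0) -> float:
    """Hypothetical sketch: weight bytes read per decode step (not the library code)."""
    # Billions of parameters -> raw count, times the storage size of each parameter.
    return param_count_b * 1e9 * bytes_per_param


fp16_bytes = sketch_bytes_per_token(7.0)        # 7B model in FP16 (2 bytes/param)
int4_bytes = sketch_bytes_per_token(7.0, 0.5)   # same model with 4-bit weights
print(f"FP16: {fp16_bytes / 1e9:.1f} GB/token, INT4: {int4_bytes / 1e9:.1f} GB/token")
```

Note that the estimate uses the total parameter count, as documented, so quantization (a smaller bytes_per_param) is the only lever that lowers it in this model.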
Source code in src/openjarvis/telemetry/efficiency.py
compute_efficiency
compute_efficiency(param_count_b: float, active_params_b: float | None, gpu_peak_tflops: float, gpu_peak_bandwidth_gb_s: float, tokens_per_sec: float, num_gpus: int = 1, energy_joules: float = 0.0, accuracy: float = 0.0, bytes_per_param: float = 2.0) -> EfficiencyMetrics
Compute MFU, MBU, and derived efficiency metrics.
Args:
param_count_b: Total parameter count in billions.
active_params_b: Active parameters per token in billions (None for dense).
gpu_peak_tflops: Peak theoretical TFLOPS per GPU (e.g. 312 for A100 SXM FP16).
gpu_peak_bandwidth_gb_s: Peak memory bandwidth per GPU in GB/s (e.g. 2039 for A100 SXM).
tokens_per_sec: Measured generation throughput (tokens/second).
num_gpus: Number of GPUs used for inference.
energy_joules: Total energy consumed in joules (for IPJ calculation).
accuracy: Accuracy score in [0, 1] (for IPJ calculation).
bytes_per_param: Bytes per parameter (default 2.0 for FP16).
Returns: EfficiencyMetrics with all computed values.
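To make the two ratios concrete, here is a worked sketch using the A100 SXM figures quoted in the argument descriptions. The formulas are assumptions, not the library's source: MFU is taken as achieved FLOPs/s over aggregate peak FLOPs/s, and MBU as achieved weight-read bandwidth over aggregate peak bandwidth. The function name sketch_efficiency is hypothetical, and IPJ is left out because its definition is not given here.

```python
from typing import Dict, Optional


def sketch_efficiency(
    param_count_b: float,
    active_params_b: Optional[float],
    gpu_peak_tflops: float,
    gpu_peak_bandwidth_gb_s: float,
    tokens_per_sec: float,
    num_gpus: int = 1,
    bytes_per_param: float = 2.0,
) -> Dict[str, float]:
    """Hypothetical MFU/MBU calculation (assumed formulas, not the library code)."""
    active_b = param_count_b if active_params_b is None else active_params_b
    # Achieved compute: ~2 FLOPs per active parameter per generated token.
    actual_flops = 2.0 * active_b * 1e9 * tokens_per_sec
    peak_flops = gpu_peak_tflops * 1e12 * num_gpus
    # Achieved bandwidth in GB/s: all weights read once per generated token.
    actual_bw_gb_s = param_count_b * bytes_per_param * tokens_per_sec
    peak_bw_gb_s = gpu_peak_bandwidth_gb_s * num_gpus
    return {
        "mfu_pct": 100.0 * actual_flops / peak_flops if peak_flops > 0 else 0.0,
        "mbu_pct": 100.0 * actual_bw_gb_s / peak_bw_gb_s if peak_bw_gb_s > 0 else 0.0,
    }


# Illustrative input: 7B dense FP16 model on one A100 SXM at 100 tokens/s.
result = sketch_efficiency(7.0, None, 312.0, 2039.0, 100.0)
print(f"MFU: {result['mfu_pct']:.2f}%  MBU: {result['mbu_pct']:.2f}%")
```

With these made-up inputs the sketch gives an MFU well under 1% and an MBU of roughly two thirds, which matches the usual picture of single-stream decoding being limited by memory bandwidth, not compute.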