data
TrainingDataMiner — extract supervised training pairs from the TraceStore.
Provides three extraction modes:
- SFT pairs — (input, output) pairs from high-quality traces for supervised fine-tuning.
- Routing pairs — per-query-class statistics identifying the best model for each class.
- Agent config pairs — per-query-class statistics identifying the best agent and tool combination.
Classes
TrainingDataMiner
Extract supervised training pairs from stored traces.
| PARAMETER | DESCRIPTION |
|---|---|
| trace_store | Any object with a … |
| min_quality | Minimum … |
| min_samples_per_class | Minimum number of samples a query class must have to appear in routing/agent-config results. |
Source code in src/openjarvis/learning/training/data.py
Functions
extract_sft_pairs
Return SFT training pairs from high-quality traces.
Each entry is a dict with keys: input, output,
query_class, model, feedback.
Duplicate (input, output) pairs are collapsed; the first
occurrence is kept.
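The filtering and de-duplication described above can be sketched in plain Python. This is an illustrative stand-in, not the library's implementation: the function name `extract_sft_pairs_sketch`, the list-of-dicts trace shape, the default threshold of 0.7, and the assumption that "high-quality" means a feedback score at or above `min_quality` are all hypothetical.

```python
from typing import Any

# Keys that each SFT entry carries, as listed in the docstring above.
SFT_KEYS = ("input", "output", "query_class", "model", "feedback")


def extract_sft_pairs_sketch(
    traces: list[dict[str, Any]],
    min_quality: float = 0.7,  # hypothetical default, not taken from the library
) -> list[dict[str, Any]]:
    """Keep traces whose feedback meets the threshold, dropping repeated pairs."""
    seen: set[tuple[str, str]] = set()
    pairs: list[dict[str, Any]] = []
    for trace in traces:
        feedback = trace.get("feedback")
        # Assumption: traces without feedback, or below the threshold, are skipped.
        if feedback is None or feedback < min_quality:
            continue
        key = (trace["input"], trace["output"])
        if key in seen:
            continue  # duplicate (input, output): the first occurrence wins
        seen.add(key)
        pairs.append({k: trace.get(k) for k in SFT_KEYS})
    return pairs


# Hypothetical traces for illustration only.
sample_traces = [
    {"input": "2+2?", "output": "4", "query_class": "math",
     "model": "model-a", "feedback": 0.9},
    {"input": "2+2?", "output": "4", "query_class": "math",
     "model": "model-b", "feedback": 1.0},   # duplicate pair, dropped
    {"input": "Capital of France?", "output": "Lyon", "query_class": "qa",
     "model": "model-a", "feedback": 0.1},   # below threshold, dropped
]
pairs = extract_sft_pairs_sketch(sample_traces)
print(pairs)
```

Note that the duplicate is resolved by order of appearance, not by score, so the entry from `model-a` survives even though the `model-b` trace has higher feedback.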
extract_routing_pairs
Return per-query-class routing recommendations.
Returns a dict mapping query class to:
- best_model: model with the highest average feedback for the class.
- avg_feedback: average feedback across all models for the class.
- sample_count: total number of qualifying traces in the class.
- all_models: dict of {model: {"avg_feedback": float, "count": int}}.
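The shape of the returned mapping can be illustrated with a small aggregation over in-memory traces. This is a sketch under stated assumptions rather than the library's code: the function name, the trace field names, and the choice to compute the class-level `avg_feedback` as the mean over all traces (rather than the mean of per-model means) are hypothetical.

```python
from collections import defaultdict
from typing import Any


def extract_routing_pairs_sketch(
    traces: list[dict[str, Any]],
    min_samples_per_class: int = 2,  # hypothetical default
) -> dict[str, dict[str, Any]]:
    """Group feedback by query class and model, then pick the best model per class."""
    # query_class -> model -> list of feedback scores
    scores: dict[str, dict[str, list[float]]] = defaultdict(lambda: defaultdict(list))
    for trace in traces:
        scores[trace["query_class"]][trace["model"]].append(trace["feedback"])

    result: dict[str, dict[str, Any]] = {}
    for query_class, per_model in scores.items():
        flat = [s for model_scores in per_model.values() for s in model_scores]
        if len(flat) < min_samples_per_class:
            continue  # too few samples to recommend anything for this class
        all_models = {
            model: {"avg_feedback": sum(vals) / len(vals), "count": len(vals)}
            for model, vals in per_model.items()
        }
        best_model = max(all_models, key=lambda m: all_models[m]["avg_feedback"])
        result[query_class] = {
            "best_model": best_model,
            # Assumption: class average is taken over traces, not over model means.
            "avg_feedback": sum(flat) / len(flat),
            "sample_count": len(flat),
            "all_models": all_models,
        }
    return result


# Hypothetical traces for illustration only.
routing_traces = [
    {"query_class": "code", "model": "model-a", "feedback": 1.0},
    {"query_class": "code", "model": "model-a", "feedback": 0.5},
    {"query_class": "code", "model": "model-b", "feedback": 0.5},
    {"query_class": "chat", "model": "model-b", "feedback": 1.0},  # only one sample
]
routing = extract_routing_pairs_sketch(routing_traces)
print(routing)
```

With these inputs the "chat" class is omitted because it has a single sample, which is below the assumed `min_samples_per_class` of 2.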
extract_agent_config_pairs
Return per-query-class agent and tool recommendations.
Returns a dict mapping query class to:
- best_agent: agent with the highest average feedback.
- best_tools: most frequently used tools by the best agent.
- avg_feedback: average feedback across all agents for the class.
- sample_count: total number of qualifying traces in the class.
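A rough sketch of how such a recommendation could be derived is shown below. It is not the library's implementation: the function name, the `agent` and `tools` trace fields, the `top_k_tools` cutoff, and the trace-level averaging are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Any


def extract_agent_config_pairs_sketch(
    traces: list[dict[str, Any]],
    min_samples_per_class: int = 2,  # hypothetical default
    top_k_tools: int = 3,            # hypothetical cutoff for best_tools
) -> dict[str, dict[str, Any]]:
    """Pick the best agent per query class and its most frequently used tools."""
    by_class: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for trace in traces:
        by_class[trace["query_class"]].append(trace)

    result: dict[str, dict[str, Any]] = {}
    for query_class, rows in by_class.items():
        if len(rows) < min_samples_per_class:
            continue
        agent_scores: dict[str, list[float]] = defaultdict(list)
        for row in rows:
            agent_scores[row["agent"]].append(row["feedback"])
        best_agent = max(
            agent_scores,
            key=lambda a: sum(agent_scores[a]) / len(agent_scores[a]),
        )
        # Count tool usage only in traces produced by the best agent.
        tool_counts = Counter(
            tool for row in rows if row["agent"] == best_agent for tool in row["tools"]
        )
        result[query_class] = {
            "best_agent": best_agent,
            "best_tools": [tool for tool, _ in tool_counts.most_common(top_k_tools)],
            "avg_feedback": sum(r["feedback"] for r in rows) / len(rows),
            "sample_count": len(rows),
        }
    return result


# Hypothetical traces for illustration only.
agent_traces = [
    {"query_class": "research", "agent": "react", "tools": ["search", "calc"], "feedback": 1.0},
    {"query_class": "research", "agent": "react", "tools": ["search"], "feedback": 0.5},
    {"query_class": "research", "agent": "simple", "tools": [], "feedback": 0.5},
]
agent_configs = extract_agent_config_pairs_sketch(agent_traces)
print(agent_configs)
```

Counting tools only within the best agent's traces keeps the tool list consistent with the agent being recommended, so tools favored by weaker agents do not leak into the result.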