Per-Role Breakdown
Pass rate and mean metric scores per role template, sorted by pass rate (weakest first).
| Role | N | Pass | Correctness | Completeness | Context | Context | Opening | Voice | Faithfulness |
|---|---|---|---|---|---|---|---|---|---|
| llm_genai_engineer | 22 | 55% | 0.79 | 0.65 | 0.81 | 0.75 | 0.40 | 0.62 | 0.89 |
| backend_engineer | 25 | 64% | 0.83 | 0.72 | 0.84 | 0.81 | 0.47 | 0.65 | 0.92 |
| mlops_engineer | 18 | 67% | 0.82 | 0.74 | 0.85 | 0.80 | 0.48 | 0.68 | 0.91 |
| ml_engineer | 28 | 71% | 0.85 | 0.78 | 0.88 | 0.82 | 0.55 | 0.71 | 0.93 |
| frontend_engineer | 15 | 73% | 0.81 | 0.76 | 0.85 | 0.81 | 0.52 | 0.69 | 0.91 |
| sre | 14 | 79% | 0.85 | 0.78 | 0.86 | 0.83 | 0.55 | 0.72 | 0.92 |
| data_engineer | 20 | 80% | 0.87 | 0.82 | 0.88 | 0.84 | 0.60 | 0.74 | 0.93 |
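
A breakdown of this shape can be produced by grouping per-sample results by role, then taking the pass fraction and the mean of each metric within each group. The sketch below is a minimal illustration, not the pipeline that generated the table: the record layout (`role`, `passed`, and one float per metric), the function name `per_role_breakdown`, and the toy records are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def per_role_breakdown(records, metrics):
    """Group records by role; return one row per role, weakest pass rate first.

    `records` is a hypothetical list of dicts with keys "role", "passed",
    and one float per name in `metrics`.
    """
    by_role = defaultdict(list)
    for rec in records:
        by_role[rec["role"]].append(rec)

    rows = []
    for role, recs in by_role.items():
        row = {
            "role": role,
            "n": len(recs),
            # Fraction of samples in this role that passed.
            "pass_rate": sum(1 for r in recs if r["passed"]) / len(recs),
        }
        for m in metrics:
            row[m] = mean(r[m] for r in recs)
        rows.append(row)

    # Ascending pass rate puts the weakest role at the top.
    return sorted(rows, key=lambda row: row["pass_rate"])


# Toy data for illustration only; not taken from the table above.
records = [
    {"role": "sre", "passed": True, "correctness": 0.9},
    {"role": "sre", "passed": True, "correctness": 0.8},
    {"role": "backend_engineer", "passed": True, "correctness": 0.9},
    {"role": "backend_engineer", "passed": False, "correctness": 0.7},
]
breakdown = per_role_breakdown(records, ["correctness"])
```

Sorting by pass rate alone leaves the order of tied roles dependent on input order; a secondary key such as the role name would make the ordering deterministic.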