MOCK · trigger workflow to load real dataLast run: 5h agoJudge: gpt-4o-mini
Run eval

Trend Over Time

Score trajectory across 4 prompt versions / runs

Per-Metric Trajectory
each line = one of the 7 metrics across runs
Run Log
most recent first
RunLabelDateNPass
20260509T013000Z-mock-baselinevoice-rule-fix-v25/9/2026, 1:30:00 AM50064.2%
20260501T030000Z-no-essayno-essay-rule5/1/2026, 3:00:00 AM45157.2%
20260422T030000Z-voice-tightenvoice-tighten-v14/22/2026, 3:00:00 AM40855.4%
20260415T030000Z-baselinebaseline-v14/15/2026, 3:00:00 AM31252.2%