run report
⚙ Configure
Revisions
— click a commit to include / exclude it everywhere · oldest → newest
Overview
X axis
Series (one line per)
Chart
lines
bars
Label adoption
Marker
Distributions
box = median + quartiles, dashed line = mean (hover a point for model/tier/task/run)
Model
Boxes by
revision
model (per revision)
Points
none (clean boxes)
outliers only
every run
Scale
logarithmic
linear
linear (clip outliers)
Coverage
— % of expected runs done per task × revision (hover for counts) · click a cell for the per-model breakdown (⚙ to pick models)
Tasks
— click a task to expand it: what it asks, how a match is scored, and what each model answered
Run results
Group by
model
task
model & task