ELA · Grades 6-8
EL Education K-8 Language Arts
EL Education
Judge anthropic/claude-opus-4-7 · Signal Studio judge (claude-opus-4-7) · 2026-05-25
86% indicator exact match
Gateways 3/3 Meets
Agreement vs gold
Indicator level: gold and the judge are treated as two raters over the rubric criteria.
Exact match: 85.7% (28 indicators vs gold)
Weighted κ: 0.84 (ordinal-weighted agreement)
MAE: 0.21 (lower is better)
Signed bias: +0.14 (judge over-scores)
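The report describes the weighted κ only as "ordinal-weighted", so the exact weighting scheme is not stated. As a minimal sketch, assuming linear disagreement weights and scores already mapped to a shared ordinal scale (both are assumptions, as are the function name and its parameters), a weighted κ for two raters could be computed like this:

```python
from collections import Counter


def weighted_kappa(gold, judge, categories, power=1):
    """Weighted Cohen's kappa for two raters (hypothetical helper).

    power=1 gives linear weights, power=2 quadratic weights. Which one the
    dashboard uses is not stated, so linear is only an assumption here.
    """
    n = len(gold)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    observed = Counter(zip(gold, judge))   # joint counts per (gold, judge) pair
    gold_marg = Counter(gold)              # marginal counts for the gold rater
    judge_marg = Counter(judge)            # marginal counts for the judge

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for a in categories:
        for b in categories:
            # Disagreement weight: 0 on the diagonal, 1 at maximum distance.
            w = (abs(index[a] - index[b]) / (k - 1)) ** power
            observed_disagreement += w * observed[(a, b)] / n
            expected_disagreement += w * (gold_marg[a] / n) * (judge_marg[b] / n)

    if expected_disagreement == 0:
        # Both raters used one identical category throughout; kappa is
        # formally undefined, and 1.0 is returned here purely by convention.
        return 1.0
    return 1 - observed_disagreement / expected_disagreement


# Toy data, not taken from the report.
print(round(weighted_kappa([0, 1, 2, 2], [0, 1, 2, 1], [0, 1, 2]), 4))  # 0.7143
```

The toy input is illustrative only; reproducing the reported 0.84 would require the full set of 28 gold and judge scores, which this page does not list.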
Gateway rollup
Indicator scores rolled up to EdReports gateway ratings (sequential gating + no-0s cap), gold vs judge.
G1 · Text Quality and Complexity
28/28 pts
Gold: Meets → Judge: Meets (agree)
G2 · Building Knowledge
32/32 pts
Gold: Meets → Judge: Meets (agree)
G3 · Usability
24/25 pts
Gold: Meets → Judge: Meets (agree)
Gateway-level agreement: exact 100% · κ 1.00 (3 gateways)
Divergences (4)
Indicators where the judge's score differed from gold. Amber = judge under-scored, blue = over-scored.
1l · 1→2 · gold Partially Meets Expectations · judge 2 pts
2h · 2→4 · gold Partially Meets Expectations · judge 4 pts
3j · 2→4 · gold Partially Meets Expectations · judge 4 pts
3b · 2→1 · gold Meets Expectations · judge 1 pt
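The headline agreement figures can be cross-checked from the four divergences listed above. The sketch below assumes that differences are measured in raw indicator points and averaged over all 28 indicators, with the 24 non-divergent indicators contributing a difference of zero. Variable names are illustrative.

```python
# (indicator, gold points, judge points) for the four reported divergences
divergences = [("1l", 1, 2), ("2h", 2, 4), ("3j", 2, 4), ("3b", 2, 1)]
n_indicators = 28  # total indicators compared against gold

# Judge minus gold; every other indicator matched exactly (difference 0).
diffs = [judge - gold for _, gold, judge in divergences]

exact_match = (n_indicators - len(divergences)) / n_indicators
mae = sum(abs(d) for d in diffs) / n_indicators
signed_bias = sum(diffs) / n_indicators  # positive means the judge over-scores

print(f"{exact_match:.1%} {mae:.2f} {signed_bias:+.2f}")  # 85.7% 0.21 +0.14
```

Under these assumptions the computed values match the reported exact match (85.7%), MAE (0.21), and signed bias (+0.14).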