Leaderboard

Side-by-side benchmark results for T4 and G5 experiment sets.

Lower time is better.

G5 Foundation (Canonical)

Metric	Value
Baseline pipeline mean	2407.974 ms
Runtime warm request mean (3-run)	82.707 ms
Runtime warm request p99 (3-run)	91.738 ms
Baseline/runtime ratio (pipeline)	29.11x

G5 Cold First-Hit — True TTFT (3-run Means, 2026-02-17)

Model	Pre-fix TTFT	Post-fix TTFT	Speedup
qwen	27574.564 ms	1774.951 ms	15.535x
donut	67360.388 ms	572.485 ms	117.663x
bart	77520.798 ms	743.652 ms	104.243x
minilm	23.342 ms	22.698 ms	1.028x

All values above are runtime-instrumented timing.ttft_ms.
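
The Speedup column is the pre-fix TTFT divided by the post-fix TTFT. A minimal sketch that recomputes it from the table values; the names `ttft_ms` and `speedup` are illustrative and not part of the benchmark tooling:

```python
# Pre-fix and post-fix TTFT in milliseconds, copied from the table above.
ttft_ms = {
    "qwen": (27574.564, 1774.951),
    "donut": (67360.388, 572.485),
    "bart": (77520.798, 743.652),
    "minilm": (23.342, 22.698),
}


def speedup(pre_ms: float, post_ms: float) -> float:
    """Return how many times faster the post-fix first hit is."""
    if post_ms <= 0:
        raise ValueError("post_ms must be positive")
    return pre_ms / post_ms


# Hypothetical helper dict: one speedup factor per model.
speedups = {model: speedup(pre, post) for model, (pre, post) in ttft_ms.items()}

for model, factor in speedups.items():
    print(f"{model}: {factor:.3f}x")
```

Rounded to three decimals, the results match the Speedup column.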

Internal vs External Routing (G5, 2026-02-17)

Metric	Internal	External	Ratio
Mean latency	94.849 ms	97.927 ms	1.032x

Task	Internal	External
general_short	150.767 ms	152.274 ms
receipt_extract	80.732 ms	81.270 ms
search_grounded	46.945 ms	57.237 ms
summarize_short	100.950 ms	100.928 ms

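Internal routing is faster on three of the four tasks; summarize_short is effectively tied, with external ahead by 0.022 ms. The ratio is external divided by internal, so a value above 1 means internal wins.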
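
The per-task comparison can be recomputed from the table. The sketch below assumes the ratio is external latency divided by internal latency, and that the reported mean latency is the unweighted mean of the four task values (the arithmetic is consistent with both). The names `latency_ms` and `routing_ratio` are illustrative only:

```python
from statistics import mean

# (internal_ms, external_ms) per task, copied from the table above.
latency_ms = {
    "general_short": (150.767, 152.274),
    "receipt_extract": (80.732, 81.270),
    "search_grounded": (46.945, 57.237),
    "summarize_short": (100.950, 100.928),
}


def routing_ratio(internal_ms: float, external_ms: float) -> float:
    """External / internal; a value above 1 means internal is faster."""
    return external_ms / internal_ms


ratios = {task: routing_ratio(i, e) for task, (i, e) in latency_ms.items()}

# Assumption: the summary row is the unweighted mean across tasks.
internal_mean = mean(i for i, _ in latency_ms.values())
external_mean = mean(e for _, e in latency_ms.values())

# Tasks where internal routing is strictly faster.
internal_wins = [task for task, r in ratios.items() if r > 1]

for task, r in ratios.items():
    print(f"{task}: {r:.3f}x")
print(f"overall: {routing_ratio(internal_mean, external_mean):.3f}x")
```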

Historical Legacy Mixed-Mode Context

Set	Runtime HTTP request mean	Runtime HTTP request p99
T4 (2026-02-15)	146279.609 ms	156769.1 ms
G5 (2026-02-15)	77449.605 ms	83346.187 ms
G5 registry-cached single run (2026-02-16)	82.913 ms	91.877 ms

Parity Health

Set	Checked	Failed	Strict
T4	3	0	true
G5	3	0	true

Findings Changelog

Dated summary of major experiment findings and interpretation.

Routing Comparison

Internal vs external routing benchmark results from Plan v2 Track B.
