Canonical G5 Artifact Set

The exact files selected as the official, most recent result set.

What "Canonical Artifact Set" Means

It means selecting one exact, complete set of runs as the official reference.

This prevents a single comparison from mixing metrics that come from different runs, hardware, or timestamps.

Current Canonical Set

Date: 2026-02-16 (UTC)
Hardware class: AWS g5.xlarge (NVIDIA A10G)

Files:

Baseline: /benchmarks/g5-20260215/baseline_20260215T064542Z.json
Warm run 1: /benchmarks/g5-20260216-foundation/runtime_foundation_warm_r1_20260216T215110Z.json
Warm run 2: /benchmarks/g5-20260216-foundation/runtime_foundation_warm_r2_20260216T215340Z.json
Warm run 3: /benchmarks/g5-20260216-foundation/runtime_foundation_warm_r3_20260216T215609Z.json
Cold run 1: /benchmarks/g5-20260216-foundation/runtime_foundation_cold_r1_20260216T215848Z.json
Cold run 2: /benchmarks/g5-20260216-foundation/runtime_foundation_cold_r2_20260216T220317Z.json
Cold run 3: /benchmarks/g5-20260216-foundation/runtime_foundation_cold_r3_20260216T220744Z.json
Foundation summary: /benchmarks/g5-20260216-foundation/foundation_summary_20260216T221232Z.json
Parity (same runtime revision): /benchmarks/g5-20260216-fix2/parity_fix2_20260216T170320Z.json
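Because the point of a canonical set is not to mix runs, the filenames alone allow a quick sanity check. The sketch below only inspects paths. It does not open or validate the JSON contents. It confirms that the foundation runs and summary share one directory, that the runs were recorded in the listed order (warm r1 to r3, then cold r1 to r3, which matches the timestamps in the filenames), and that the summary postdates every run. The helper names and dict keys are hypothetical. The baseline and parity files are left out of the directory check because the list above intentionally draws them from other sets.

```python
import posixpath
import re
from datetime import datetime, timezone

# File paths copied from the "Current Canonical Set" list; the dict keys are hypothetical labels.
CANONICAL_FILES = {
    "baseline": "/benchmarks/g5-20260215/baseline_20260215T064542Z.json",
    "warm_r1": "/benchmarks/g5-20260216-foundation/runtime_foundation_warm_r1_20260216T215110Z.json",
    "warm_r2": "/benchmarks/g5-20260216-foundation/runtime_foundation_warm_r2_20260216T215340Z.json",
    "warm_r3": "/benchmarks/g5-20260216-foundation/runtime_foundation_warm_r3_20260216T215609Z.json",
    "cold_r1": "/benchmarks/g5-20260216-foundation/runtime_foundation_cold_r1_20260216T215848Z.json",
    "cold_r2": "/benchmarks/g5-20260216-foundation/runtime_foundation_cold_r2_20260216T220317Z.json",
    "cold_r3": "/benchmarks/g5-20260216-foundation/runtime_foundation_cold_r3_20260216T220744Z.json",
    "summary": "/benchmarks/g5-20260216-foundation/foundation_summary_20260216T221232Z.json",
    "parity": "/benchmarks/g5-20260216-fix2/parity_fix2_20260216T170320Z.json",
}

# Assumed recording order of the foundation runs.
RUN_KEYS = ("warm_r1", "warm_r2", "warm_r3", "cold_r1", "cold_r2", "cold_r3")

# Filenames end in a UTC stamp such as _20260216T215110Z.json.
STAMP = re.compile(r"_(\d{8}T\d{6}Z)\.json$")


def file_timestamp(path: str) -> datetime:
    """Parse the UTC timestamp embedded in an artifact filename."""
    match = STAMP.search(path)
    if match is None:
        raise ValueError(f"no UTC timestamp in {path!r}")
    return datetime.strptime(match.group(1), "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)


def check_foundation_set(files: dict) -> list:
    """Return a list of problems; an empty list means every filename check passed."""
    problems = []
    runs = [files[key] for key in RUN_KEYS]
    # The foundation runs and their summary should all live in one run-set directory.
    dirs = {posixpath.dirname(p) for p in runs + [files["summary"]]}
    if len(dirs) != 1:
        problems.append(f"foundation files span several directories: {sorted(dirs)}")
    # The runs should have been recorded in the order listed in RUN_KEYS.
    stamps = [file_timestamp(p) for p in runs]
    if stamps != sorted(stamps):
        problems.append("foundation runs are not in chronological order")
    # The summary must have been written after the last run it summarizes.
    if file_timestamp(files["summary"]) <= max(stamps):
        problems.append("summary does not postdate every run it summarizes")
    return problems


print(check_foundation_set(CANONICAL_FILES) or "filename checks passed")
```

A check like this is cheap enough to run before any aggregation step, so a stray file from another set fails early instead of silently skewing the summary.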

Latest Cold Optimization Set (Supplemental)

Date: 2026-02-17 (UTC)
Set id: g5-20260217-cold-indexcache

Files:

Key Results

Warm Steady-State (3-run aggregate)	Value
Request mean	82.707 ms
Request p99	91.738 ms
Startup to healthy	2004.489 ms
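This page does not say how the three warm runs are combined into one aggregate. One common approach, assumed here and not confirmed by the artifacts, is to pool the per-request latencies from all runs and compute the statistics over the pooled samples, using the nearest-rank method for p99. The function names are hypothetical.

```python
import math
from statistics import mean


def nearest_rank_percentile(samples, pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer pct keeps pct * n / 100 exact enough for ceil() to pick the right rank.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def aggregate_runs(runs) -> dict:
    """Pool per-request latencies (ms) from several runs, then summarize the pool."""
    pooled = [latency for run in runs for latency in run]
    return {
        "mean_ms": mean(pooled),
        "p99_ms": nearest_rank_percentile(pooled, 99),
    }
```

Pooling weights every request equally. Averaging the three per-run means instead gives the same mean only when the runs contain the same number of requests, and a p99 cannot be averaged across runs at all, which is why the pooled form is sketched here.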

Cold First-Hit — True TTFT (3-run, pre-fix 2026-02-17)	TTFT mean	Full latency mean
qwen	27574.564 ms	27652.393 ms
donut	67360.388 ms	67391.855 ms
bart	77520.798 ms	77560.962 ms
minilm	23.342 ms	47.583 ms

Cold First-Hit — True TTFT (3-run, post-fix 2026-02-17)	TTFT mean	Full latency mean
qwen	1774.951 ms	1851.099 ms
donut	572.485 ms	603.490 ms
bart	743.652 ms	783.444 ms
minilm	22.698 ms	46.528 ms

Note: Both cold first-hit tables report the runtime-instrumented timing.ttft_ms value rather than an SSE-based proxy. The post-fix runs include the tensor lookup index cache changes.
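The size of the index cache improvement is easiest to read as a ratio of pre-fix to post-fix TTFT means. The values below are copied from the two tables above; the dict and function names are illustrative only.

```python
# TTFT means (ms) copied from the pre-fix and post-fix cold first-hit tables.
PRE_FIX_TTFT_MS = {"qwen": 27574.564, "donut": 67360.388, "bart": 77520.798, "minilm": 23.342}
POST_FIX_TTFT_MS = {"qwen": 1774.951, "donut": 572.485, "bart": 743.652, "minilm": 22.698}


def speedups(pre: dict, post: dict) -> dict:
    """Pre-fix mean divided by post-fix mean, per model (higher is a bigger improvement)."""
    return {model: pre[model] / post[model] for model in pre}


for model, ratio in speedups(PRE_FIX_TTFT_MS, POST_FIX_TTFT_MS).items():
    print(f"{model}: {ratio:.2f}x")
```

For qwen, for example, 27574.564 / 1774.951 is roughly 15.5x, while minilm barely changes (about 1.03x), consistent with the fix targeting the large models' cold load path.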

Baseline vs Runtime (reference)	Value
Baseline pipeline mean	2407.974 ms
Runtime warm request mean	82.707 ms
Ratio (baseline/runtime)	29.11x
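The ratio row is the baseline pipeline mean divided by the warm runtime request mean. A minimal reproduction using the two values from the table (the constant and function names are illustrative):

```python
BASELINE_PIPELINE_MEAN_MS = 2407.974  # "Baseline pipeline mean" from the table
RUNTIME_WARM_MEAN_MS = 82.707         # "Runtime warm request mean" from the table


def speedup_ratio(baseline_ms: float, runtime_ms: float) -> float:
    """How many times lower the runtime mean is than the baseline mean."""
    if runtime_ms <= 0:
        raise ValueError("runtime mean must be positive")
    return baseline_ms / runtime_ms


print(f"{speedup_ratio(BASELINE_PIPELINE_MEAN_MS, RUNTIME_WARM_MEAN_MS):.2f}x")  # prints 29.11x
```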

Parity Status

Strict mode requested: true
Checked models: 3
Failed models: 0
Donut parity: intentionally skipped (documented in parity JSON)
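The parity JSON's schema is not shown on this page, so the sketch below uses a hypothetical ParityStatus record with made-up field names. It also assumes, without confirmation from the source, that strict mode means every skipped model must carry a documented reason, as Donut's does. Under those assumptions, a release gate could look like this:

```python
from dataclasses import dataclass, field


@dataclass
class ParityStatus:
    # Hypothetical structure; the real parity JSON fields may differ.
    strict_mode: bool
    checked_models: list
    failed_models: list
    skipped_models: dict = field(default_factory=dict)  # model name -> documented skip reason


def parity_gate(status: ParityStatus) -> bool:
    """Pass only if nothing failed, something was checked, and (in strict mode) every skip is documented."""
    if status.failed_models:
        return False
    # Assumed strict-mode rule: an undocumented skip counts as a failure.
    if status.strict_mode and any(not reason for reason in status.skipped_models.values()):
        return False
    return bool(status.checked_models)
```

Requiring at least one checked model keeps an empty or truncated parity file from passing the gate by default.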

