Canonical G5 Artifact Set
Exact files selected as the official latest result set.
What "Canonical Artifact Set" Means
A canonical artifact set is one exact, complete set of run files designated as the official reference.
Pinning a single set avoids mixing metrics from different runs, hardware, or timestamps.
Current Canonical Set
Date: 2026-02-16 (UTC)
Hardware class: AWS g5.xlarge (NVIDIA A10G)
Files:
- Baseline: /benchmarks/g5-20260215/baseline_20260215T064542Z.json
- Warm run 1: /benchmarks/g5-20260216-foundation/runtime_foundation_warm_r1_20260216T215110Z.json
- Warm run 2: /benchmarks/g5-20260216-foundation/runtime_foundation_warm_r2_20260216T215340Z.json
- Warm run 3: /benchmarks/g5-20260216-foundation/runtime_foundation_warm_r3_20260216T215609Z.json
- Cold run 1: /benchmarks/g5-20260216-foundation/runtime_foundation_cold_r1_20260216T215848Z.json
- Cold run 2: /benchmarks/g5-20260216-foundation/runtime_foundation_cold_r2_20260216T220317Z.json
- Cold run 3: /benchmarks/g5-20260216-foundation/runtime_foundation_cold_r3_20260216T220744Z.json
- Foundation summary: /benchmarks/g5-20260216-foundation/foundation_summary_20260216T221232Z.json
- Parity (same runtime revision): /benchmarks/g5-20260216-fix2/parity_fix2_20260216T170320Z.json
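
One way to sanity-check a set like this is to group its files by directory and read the UTC stamp embedded in each filename, so a file from the wrong run or day stands out. The following is only a minimal sketch, not part of the benchmark tooling: the helper names `artifact_timestamp` and `group_by_directory` are hypothetical, and it assumes every artifact name ends in a `_YYYYMMDDTHHMMSSZ` stamp followed by an extension, as the paths listed here do.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import PurePosixPath

# Paths copied verbatim from the canonical file list.
CANONICAL_FILES = [
    "/benchmarks/g5-20260215/baseline_20260215T064542Z.json",
    "/benchmarks/g5-20260216-foundation/runtime_foundation_warm_r1_20260216T215110Z.json",
    "/benchmarks/g5-20260216-foundation/runtime_foundation_warm_r2_20260216T215340Z.json",
    "/benchmarks/g5-20260216-foundation/runtime_foundation_warm_r3_20260216T215609Z.json",
    "/benchmarks/g5-20260216-foundation/runtime_foundation_cold_r1_20260216T215848Z.json",
    "/benchmarks/g5-20260216-foundation/runtime_foundation_cold_r2_20260216T220317Z.json",
    "/benchmarks/g5-20260216-foundation/runtime_foundation_cold_r3_20260216T220744Z.json",
    "/benchmarks/g5-20260216-foundation/foundation_summary_20260216T221232Z.json",
    "/benchmarks/g5-20260216-fix2/parity_fix2_20260216T170320Z.json",
]

# Matches the trailing UTC stamp in names such as "..._20260216T215110Z.json".
STAMP_RE = re.compile(r"_(\d{8}T\d{6})Z\.[A-Za-z0-9]+$")


def artifact_timestamp(path):
    """Return the UTC datetime encoded at the end of an artifact filename."""
    name = PurePosixPath(path).name
    match = STAMP_RE.search(name)
    if match is None:
        raise ValueError(f"no UTC timestamp in artifact name: {name}")
    stamp = datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")
    return stamp.replace(tzinfo=timezone.utc)


def group_by_directory(paths):
    """Map each parent directory name to the timestamps of its files."""
    groups = defaultdict(list)
    for path in paths:
        groups[PurePosixPath(path).parent.name].append(artifact_timestamp(path))
    return dict(groups)


if __name__ == "__main__":
    for set_dir, stamps in sorted(group_by_directory(CANONICAL_FILES).items()):
        print(set_dir, len(stamps), min(stamps).isoformat(), max(stamps).isoformat())
```

The script only inspects path strings, so it can run anywhere; checking that the files actually exist or that their contents agree would need access to the benchmark storage.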
Latest Cold Optimization Set (Supplemental)
Date: 2026-02-17 (UTC)
Set id: g5-20260217-cold-indexcache
Files:
- Summary JSON: /benchmarks/g5-20260217-cold-indexcache/cold_indexcache_summary_20260217T185316Z.json
- Summary Markdown: /benchmarks/g5-20260217-cold-indexcache/cold_indexcache_summary_20260217T185316Z.md
- Cold run 1: /benchmarks/g5-20260217-cold-indexcache/runtime_truettft_cold_indexcache_r1_20260217T184653Z.json
- Cold run 2: /benchmarks/g5-20260217-cold-indexcache/runtime_truettft_cold_indexcache_r2_20260217T184821Z.json
- Cold run 3: /benchmarks/g5-20260217-cold-indexcache/runtime_truettft_cold_indexcache_r3_20260217T184949Z.json
Key Results
| Warm Steady-State (3-run aggregate) | Value |
|---|---|
| Request mean | 82.707 ms |
| Request p99 | 91.738 ms |
| Startup to healthy | 2004.489 ms |
| Cold First-Hit — True TTFT (3-run, pre-fix 2026-02-17) | TTFT mean | Full latency mean |
|---|---|---|
| qwen | 27574.564 ms | 27652.393 ms |
| donut | 67360.388 ms | 67391.855 ms |
| bart | 77520.798 ms | 77560.962 ms |
| minilm | 23.342 ms | 47.583 ms |
| Cold First-Hit — True TTFT (3-run, post-fix 2026-02-17) | TTFT mean | Full latency mean |
|---|---|---|
| qwen | 1774.951 ms | 1851.099 ms |
| donut | 572.485 ms | 603.490 ms |
| bart | 743.652 ms | 783.444 ms |
| minilm | 22.698 ms | 46.528 ms |
Note: Both cold first-hit tables use the runtime-instrumented timing.ttft_ms value (not an SSE proxy). The post-fix runs include the tensor lookup index cache changes.
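
To see how much the fix changed cold first-hit latency per model, the pre-fix TTFT mean can be divided by the post-fix TTFT mean. The sketch below does only that arithmetic on the values from the two cold tables; the dictionary and function names are illustrative, not taken from the benchmark code.

```python
# Mean cold TTFT in milliseconds, copied from the pre-fix and post-fix tables.
PRE_FIX_TTFT_MS = {
    "qwen": 27574.564,
    "donut": 67360.388,
    "bart": 77520.798,
    "minilm": 23.342,
}
POST_FIX_TTFT_MS = {
    "qwen": 1774.951,
    "donut": 572.485,
    "bart": 743.652,
    "minilm": 22.698,
}


def ttft_speedups(pre_ms, post_ms):
    """Return pre-fix / post-fix TTFT per model (values above 1 mean faster)."""
    return {model: pre_ms[model] / post_ms[model] for model in pre_ms}


if __name__ == "__main__":
    for model, factor in ttft_speedups(PRE_FIX_TTFT_MS, POST_FIX_TTFT_MS).items():
        print(f"{model:>7}: {factor:.1f}x")
```

These are ratios of 3-run means, so they describe the aggregate change and say nothing about run-to-run variance.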
| Baseline vs Runtime (reference) | Value |
|---|---|
| Baseline pipeline mean | 2407.974 ms |
| Runtime warm request mean | 82.707 ms |
| Ratio (baseline/runtime) | 29.11x |
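
The ratio in this table can be reproduced directly from the two means. A minimal check, with constant and function names chosen here for illustration:

```python
# Values in milliseconds, copied from the baseline-vs-runtime table.
BASELINE_PIPELINE_MEAN_MS = 2407.974
RUNTIME_WARM_REQUEST_MEAN_MS = 82.707


def speed_ratio(baseline_ms, runtime_ms):
    """Return baseline / runtime; rejects non-positive runtime values."""
    if runtime_ms <= 0:
        raise ValueError("runtime_ms must be positive")
    return baseline_ms / runtime_ms


if __name__ == "__main__":
    ratio = speed_ratio(BASELINE_PIPELINE_MEAN_MS, RUNTIME_WARM_REQUEST_MEAN_MS)
    print(f"{ratio:.2f}x")
```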
Parity Status
- Strict mode requested: true
- Checked models: 3
- Failed models: 0
- Donut parity: intentionally skipped (documented in parity JSON)