Treni

Canonical G5 Artifact Set

Exact files selected as the official latest result set.

What "Canonical Artifact Set" Means

It means picking one exact, complete run set as the official reference.

That avoids mixing metrics from different runs, hardware, or timestamps.
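The rule above can be sketched as a small consistency check. This is only an illustration: the `Artifact` record, its fields, and the file names are hypothetical and are not taken from the project's tooling. The idea is that every file in a candidate set should carry the same set id and hardware class before the set is treated as canonical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """Hypothetical record describing one result file."""
    path: str
    set_id: str
    hardware: str


def mixed_fields(artifacts):
    """Return the field names whose values differ across artifacts.

    An empty list means the set is internally consistent.
    """
    mixed = []
    for field in ("set_id", "hardware"):
        if len({getattr(a, field) for a in artifacts}) > 1:
            mixed.append(field)
    return mixed


# Hypothetical file names; the second entry comes from a different run set.
candidate = [
    Artifact("warm.json", "g5-20260217-cold-indexcache", "g5.xlarge"),
    Artifact("cold.json", "some-other-set", "g5.xlarge"),
]
print(mixed_fields(candidate))
```

A non-empty result would mean the candidate set mixes runs and should not be promoted to canonical.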

Current Canonical Set

Date: 2026-02-16 (UTC)
Hardware class: AWS g5.xlarge (NVIDIA A10G)

Files:

Latest Cold Optimization Set (Supplemental)

Date: 2026-02-17 (UTC)
Set id: g5-20260217-cold-indexcache

Files:

Key Results

| Warm Steady-State (3-run aggregate) | Value |
| --- | --- |
| Request mean | 82.707 ms |
| Request p99 | 91.738 ms |
| Startup to healthy | 2004.489 ms |

| Cold First-Hit: True TTFT (3-run, pre-fix 2026-02-17) | TTFT mean | Full latency mean |
| --- | --- | --- |
| qwen | 27574.564 ms | 27652.393 ms |
| donut | 67360.388 ms | 67391.855 ms |
| bart | 77520.798 ms | 77560.962 ms |
| minilm | 23.342 ms | 47.583 ms |

| Cold First-Hit: True TTFT (3-run, post-fix 2026-02-17) | TTFT mean | Full latency mean |
| --- | --- | --- |
| qwen | 1774.951 ms | 1851.099 ms |
| donut | 572.485 ms | 603.490 ms |
| bart | 743.652 ms | 783.444 ms |
| minilm | 22.698 ms | 46.528 ms |

Note: Both cold first-hit tables report the runtime-instrumented timing.ttft_ms value, not an SSE proxy. The post-fix run includes the tensor lookup index cache changes.
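To make the effect of the fix easier to read, the per-model improvement can be derived from the two cold first-hit tables. The sketch below only divides the published pre-fix TTFT mean by the post-fix TTFT mean; the function and variable names are illustrative and not part of the project.

```python
# Cold first-hit TTFT means in milliseconds, copied from the two tables above.
pre_fix_ttft_ms = {
    "qwen": 27574.564,
    "donut": 67360.388,
    "bart": 77520.798,
    "minilm": 23.342,
}
post_fix_ttft_ms = {
    "qwen": 1774.951,
    "donut": 572.485,
    "bart": 743.652,
    "minilm": 22.698,
}


def ttft_speedups(pre, post):
    """Return the pre-fix / post-fix TTFT ratio per model (above 1 means faster)."""
    return {model: pre[model] / post[model] for model in pre}


speedups = ttft_speedups(pre_fix_ttft_ms, post_fix_ttft_ms)
for model, factor in speedups.items():
    print(f"{model}: {factor:.1f}x")
```

On these numbers the three larger models improve by roughly 15x (qwen), 118x (donut), and 104x (bart), while minilm, which was already fast before the fix, is essentially unchanged.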

| Baseline vs Runtime (reference) | Value |
| --- | --- |
| Baseline pipeline mean | 2407.974 ms |
| Runtime warm request mean | 82.707 ms |
| Ratio (baseline/runtime) | 29.11x |
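The reported ratio can be reproduced directly from the two means. A minimal check, with illustrative variable names:

```python
# Means in milliseconds, copied from the Baseline vs Runtime figures above.
baseline_pipeline_mean_ms = 2407.974
runtime_warm_request_mean_ms = 82.707

# Ratio is defined as baseline / runtime, so higher means the runtime is faster.
ratio = baseline_pipeline_mean_ms / runtime_warm_request_mean_ms
print(f"{ratio:.2f}x")  # prints "29.11x"
```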

Parity Status

  • Strict mode requested: true
  • Checked models: 3
  • Failed models: 0
  • Donut parity: intentionally skipped (documented in parity JSON)
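The status fields above can be read as the output of a simple gate. The sketch below is an assumption about how such a summary might be produced, not the project's parity tool. The status strings, the function name, and the reading of strict mode as "any failed model fails the whole set" are all hypothetical, and so is the mapping of the three checked models to qwen, bart, and minilm.

```python
def parity_summary(statuses, strict=True):
    """Summarize per-model parity statuses ("pass", "fail", or "skipped")."""
    checked = {m: s for m, s in statuses.items() if s != "skipped"}
    failed = sorted(m for m, s in checked.items() if s == "fail")
    skipped = sorted(m for m, s in statuses.items() if s == "skipped")
    return {
        "strict": strict,
        "checked": len(checked),
        "failed": len(failed),
        "skipped": skipped,
        # Assumption: in strict mode a single failed model fails the whole set;
        # in non-strict mode failures are reported but do not block.
        "ok": (not failed) or (not strict),
    }


# Hypothetical statuses consistent with the figures listed above.
statuses = {"qwen": "pass", "bart": "pass", "minilm": "pass", "donut": "skipped"}
summary = parity_summary(statuses)
print(summary)
```

Skipped models are counted separately rather than as passes, so an intentional skip such as Donut stays visible in the summary.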
