Benchmark Status
What has already been run, what remains, and why.
Direct Answers
- "Did we run the start benchmark (Phase 1 baseline)?" Yes.
- "Did we rerun after true TTFT and cold fixes?" Yes (2026-02-17 on G5).
- "Did we run all benchmarks in the full plan?" Not fully.
- "Is there anything else to run?" Yes (Phase 3 loops, A100, H100).
What Has Been Run
Phase 1 (Baseline, Python stack)
- T4 set: baseline JSON exists.
- G5 set: baseline JSON exists.
- Includes cold-start breakdown, warm model runs, and pipeline runs.
Phase 2 (Minimal runtime benchmark)
- T4 set: runtime JSON exists.
- G5 set: runtime JSON exists.
- Includes cold starts, model run timing, and HTTP request latency.
- True TTFT rerun exists (measured from runtime timing, not the SSE proxy).
- Cold optimization rerun exists (taken after the tensor index-cache fix).
Week 3 (Numerical parity)
- T4 parity: strict mode, 0 failures.
- G5 parity: strict mode, 0 failures.
- Donut is intentionally excluded from the parity check and reported as skipped.
Phase 3 comparison report
- T4 comparison report exists.
- G5 comparison report exists.
Latest Key Findings (2026-02-17)
- Warm path on G5 remains strong (~80.8 ms mean, ~89.6 ms p99).
- Internal routing is slightly faster than external routing (external/internal ratio: 1.032x).
- Cold TTFT dropped dramatically after the tensor index-cache fix:
  - qwen: 27.6 s -> 1.77 s (15.5x)
  - donut: 67.4 s -> 0.57 s (117.7x)
  - bart: 77.5 s -> 0.74 s (104.2x)
  - minilm: roughly unchanged (23.3 ms -> 22.7 ms)
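The speedup figures above are the before/after ratio of cold TTFT. The Python sketch below recomputes them from the rounded values listed in this report; the `COLD_TTFT_S` table and the `speedup` helper are hypothetical illustrations, not part of the benchmark tooling. Because the reported ratios were presumably computed from unrounded timings, recomputing from the rounded values gives slightly different numbers (for example, 27.6 / 1.77 is about 15.6x rather than 15.5x).

```python
# Hypothetical cross-check, not the benchmark harness itself.
# Cold TTFT in seconds (before, after) the tensor index-cache fix,
# copied from the rounded figures in this report.
COLD_TTFT_S = {
    "qwen": (27.6, 1.77),
    "donut": (67.4, 0.57),
    "bart": (77.5, 0.74),
    "minilm": (0.0233, 0.0227),  # 23.3 ms -> 22.7 ms
}


def speedup(before: float, after: float) -> float:
    """Return how many times faster `after` is than `before` (before / after)."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


if __name__ == "__main__":
    for model, (before, after) in COLD_TTFT_S.items():
        print(f"{model}: {before:g} s -> {after:g} s ({speedup(before, after):.1f}x)")
```

Small differences in the first decimal are expected from rounding; a large mismatch would suggest a transcription error in the report.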
What Is Still Missing Per Plan
Following the full plan sequence, these items remain:
- Phase 3 agentic loop capability study.
- A100 run set.
- H100 run set.
- Final paper-grade figure/table package.
Canonical Clarification
- Full-system canonical set remains g5-20260216-foundation.
- Cold optimization is tracked as g5-20260217-cold-indexcache (latest cold-specific canonical evidence).
Artifact Pointers
- True TTFT set: /benchmarks/g5-20260217-truettft/
- Cold index-cache set: /benchmarks/g5-20260217-cold-indexcache/
- Routing comparison set: /benchmarks/g5-20260217-routing/