Treni

Benchmark Status

What has already been run, what remains, and why.

Direct Answers

  • "Did we run the start benchmark (Phase 1 baseline)?" Yes.
  • "Did we rerun after true TTFT and cold fixes?" Yes (2026-02-17 on G5).
  • "Did we run all benchmarks in the full plan?" Not fully.
  • "Is there anything else to run?" Yes (Phase 3 loops, A100, H100).

What Has Been Run

Phase 1 (Baseline, Python stack)

  • T4 set: baseline JSON exists.
  • G5 set: baseline JSON exists.
  • Includes cold start breakdown, warm model runs, and pipeline runs.

Phase 2 (Minimal runtime benchmark)

  • T4 set: runtime JSON exists.
  • G5 set: runtime JSON exists.
  • Includes cold starts, model run timing, and HTTP request latency.
  • True TTFT rerun exists (runtime timing, not SSE proxy).
  • Cold optimization rerun exists after tensor index-cache fix.
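The distinction between true TTFT and a proxy measurement can be illustrated with a minimal sketch. It assumes the runtime exposes tokens through a Python iterator; `measure_ttft` and `fake_generate` are hypothetical names used only for illustration, not part of the Treni codebase. The timer starts immediately before the request is issued and the first timestamp is taken when the first token arrives inside the process, so no HTTP or SSE framing is included.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_ttft(generate: Callable[[], Iterable[str]]) -> Tuple[float, float, int]:
    """Return (ttft_seconds, total_seconds, token_count) for one request.

    `generate` is a hypothetical zero-argument callable that starts a
    request and yields tokens as the runtime produces them.
    """
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in generate():
        if ttft is None:
            # First token observed in-process: this is the TTFT sample.
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    if ttft is None:
        raise ValueError("generator produced no tokens")
    return ttft, total, count


def fake_generate() -> Iterator[str]:
    """Stand-in for a model runtime (assumption, for demonstration only)."""
    time.sleep(0.02)  # simulated prefill / load cost before the first token
    yield "hello"
    for tok in (" ", "world"):
        time.sleep(0.005)  # simulated per-token decode cost
        yield tok


ttft, total, n_tokens = measure_ttft(fake_generate)
print(f"ttft={ttft * 1000:.1f} ms total={total * 1000:.1f} ms tokens={n_tokens}")
```

A cold measurement would wrap the same call around a freshly started process, so that model loading is included in the first-token time.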

Week 3 (Numerical parity)

  • T4 parity: strict mode, 0 failures.
  • G5 parity: strict mode, 0 failures.
  • Donut is intentionally excluded from the parity check and reported as skipped.
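As a rough sketch of what a parity check with an explicit skip list might look like: the function name, tolerances, and sample outputs below are all assumptions for illustration, and the actual strict-mode thresholds are not stated on this page. Each model's candidate output is compared element-wise against the reference, and skipped models are recorded explicitly rather than silently dropped.

```python
import math
from typing import Dict, List, Set


def parity_report(
    reference: Dict[str, List[float]],
    candidate: Dict[str, List[float]],
    skip: Set[str],
    rel_tol: float = 1e-5,  # hypothetical tolerance, not the project's value
    abs_tol: float = 1e-6,  # hypothetical tolerance, not the project's value
) -> Dict[str, str]:
    """Map each model name to 'pass', 'fail', or 'skipped'."""
    report: Dict[str, str] = {}
    for name, ref in reference.items():
        if name in skip:
            report[name] = "skipped"
            continue
        out = candidate.get(name)
        ok = (
            out is not None
            and len(out) == len(ref)
            and all(
                math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
                for a, b in zip(ref, out)
            )
        )
        report[name] = "pass" if ok else "fail"
    return report


# Made-up outputs, for demonstration only.
reference = {"qwen": [0.1, 0.2], "minilm": [1.0, 2.0], "donut": [3.0]}
candidate = {"qwen": [0.1, 0.2000001], "minilm": [1.0, 2.0]}

report = parity_report(reference, candidate, skip={"donut"})
failures = sum(1 for status in report.values() if status == "fail")
print(report, "failures:", failures)
```

Reporting the skipped entry alongside the pass and fail counts keeps a "0 failures" result from being read as full coverage.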

Phase 3 comparison report

  • T4 comparison report exists.
  • G5 comparison report exists.

Latest Key Findings (2026-02-17)

  • Warm path on G5 remains strong (~80.8 ms mean, ~89.6 ms p99).
  • Internal routing is slightly faster than external routing (external/internal ratio of 1.032x).
  • Cold TTFT dropped by one to two orders of magnitude on three of the four models after the tensor index-cache fix:
    • qwen: 27.6s -> 1.77s (15.5x)
    • donut: 67.4s -> 0.57s (117.7x)
    • bart: 77.5s -> 0.74s (104.2x)
    • minilm: roughly unchanged (23.3ms -> 22.7ms)
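The speedup factors above are before/after ratios. The short sketch below recomputes them from the rounded figures on this page; the dictionary layout and the `speedup` helper are illustrative only. Recomputing from rounded inputs gives roughly 15.6x, 118x, and 105x, slightly different from the quoted factors, which were presumably derived from unrounded timings (an assumption, since the raw values are not shown here).

```python
# Cold TTFT in seconds as (before, after), copied from the rounded
# figures in the list above. minilm is converted from milliseconds.
cold_ttft_seconds = {
    "qwen": (27.6, 1.77),
    "donut": (67.4, 0.57),
    "bart": (77.5, 0.74),
    "minilm": (0.0233, 0.0227),
}


def speedup(before: float, after: float) -> float:
    """Return how many times faster `after` is than `before`."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


speedups = {
    name: speedup(before, after)
    for name, (before, after) in cold_ttft_seconds.items()
}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```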

What Is Still Missing Per Plan

If the full plan sequence is followed, the following items remain:

  1. Phase 3 agentic loop capability study.
  2. A100 run set.
  3. H100 run set.
  4. Final paper-grade figure/table package.

Canonical Clarification

  • Full-system canonical set remains g5-20260216-foundation.
  • Cold optimization is tracked as g5-20260217-cold-indexcache (latest cold-specific canonical evidence).

Artifact Pointers
