Treni Experiment Docs

Benchmark Status

What has already been run, what still remains, and why.

Direct Answers

"Did we run the starting benchmark (Phase 1 baseline)?" Yes.
"Did we rerun after the true TTFT (time to first token) and cold-start fixes?" Yes (2026-02-17 on G5).
"Did we run every benchmark in the full plan?" Only partially.
"Is there anything else to run?" Yes (the Phase 3 agentic loop study and the A100 and H100 run sets).

What Has Been Run

Phase 1 (Baseline, Python stack)

T4 set: baseline JSON exists.
G5 set: baseline JSON exists.
Includes the cold-start breakdown, warm model runs, and pipeline runs.

Phase 2 (Minimal runtime benchmark)

T4 set: runtime JSON exists.
G5 set: runtime JSON exists.
Includes cold starts, model-run timing, and HTTP request latency.
A true TTFT rerun exists (measured with runtime-side timing rather than an SSE proxy).
A cold-start optimization rerun exists, taken after the tensor index-cache fix.
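The difference between true TTFT and an SSE proxy can be illustrated with a minimal Python sketch. It assumes a hypothetical `start_generation` callable that yields tokens from inside the runtime; `measure_ttft` is likewise a hypothetical helper, not the project's benchmark harness. The clock starts when generation is requested and stops at the first yielded token, so HTTP framing and client-side SSE buffering are not part of the measurement.

```python
import time
from typing import Callable, Iterable, Optional


def measure_ttft(
    start_generation: Callable[[], Iterable[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> Optional[float]:
    """Seconds from requesting generation to receiving the first token.

    Hypothetical helper: timing is taken next to the token producer, so
    HTTP framing and client-side SSE buffering are not part of the number.
    """
    t0 = clock()
    for _token in start_generation():
        return clock() - t0  # stop timing at the first token
    return None  # the generator produced no tokens
```

Injecting the clock keeps the helper testable with fixed timestamps; in a real run the default `time.perf_counter` would be used.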

Week 3 (Numerical parity)

T4 parity: strict mode, 0 failures.
G5 parity: strict mode, 0 failures.
Donut is intentionally excluded from the parity check and reported as skipped.
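One way to picture a strict parity check that reports skipped models separately is sketched below. The tolerance value, the flat output vectors, and the `check_parity` and `ParityResult` names are all assumptions for illustration; this page does not state the project's actual comparison criteria.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set


@dataclass
class ParityResult:
    model: str
    status: str  # "pass", "fail", or "skipped"
    max_abs_diff: float = math.nan


def check_parity(
    reference: Dict[str, Sequence[float]],
    candidate: Dict[str, Sequence[float]],
    skip: Set[str],
    atol: float = 1e-5,  # hypothetical strict tolerance
) -> List[ParityResult]:
    """Compare candidate outputs against reference outputs model by model."""
    results: List[ParityResult] = []
    for model, ref in reference.items():
        if model in skip:
            # Excluded on purpose, but still reported so it stays visible.
            results.append(ParityResult(model, "skipped"))
            continue
        out = candidate.get(model)
        if out is None or len(out) != len(ref):
            results.append(ParityResult(model, "fail", math.inf))
            continue
        diff = max((abs(a - b) for a, b in zip(ref, out)), default=0.0)
        results.append(ParityResult(model, "pass" if diff <= atol else "fail", diff))
    return results
```

Under this sketch, a run has 0 failures when no result has status "fail"; skipped models count neither as passes nor as failures.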

Phase 3 comparison report

T4 comparison report exists.
G5 comparison report exists.

Latest Key Findings (2026-02-17)

Warm-path latency on G5 remains low (~80.8 ms mean, ~89.6 ms p99).
Internal routing is faster than external routing: the external/internal latency ratio is 1.032x, i.e. external routing takes about 3% longer.
Cold TTFT dropped sharply after the tensor index-cache fix:
- qwen: 27.6s -> 1.77s (15.5x)
- donut: 67.4s -> 0.57s (117.7x)
- bart: 77.5s -> 0.74s (104.2x)
- minilm: roughly unchanged (23.3ms -> 22.7ms)
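The speedup figures above are ratios of durations (before / after), and the warm-path numbers are a mean and a 99th percentile over per-request latencies; the 1.032x routing figure is the same kind of duration ratio. The sketch below shows one way to compute these; the nearest-rank p99 definition and the helper names are assumptions, since this page does not say which percentile method was used. Recomputing from the rounded values on this page gives slightly different ratios (for example 27.6 / 1.77 ≈ 15.6x for qwen and 67.4 / 0.57 ≈ 118.2x for donut), presumably because the reported ratios were computed from unrounded timings.

```python
import statistics
from typing import Sequence, Tuple


def p99(samples: Sequence[float]) -> float:
    """Nearest-rank 99th percentile (assumed definition; others interpolate)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceil(0.99 * n) avoids floating-point rounding at the boundary.
    rank = (99 * len(ordered) + 99) // 100  # 1-based rank
    return ordered[rank - 1]


def summarize_ms(samples: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, p99) for per-request latencies in milliseconds."""
    return statistics.mean(samples), p99(samples)


def speedup(before_s: float, after_s: float) -> float:
    """Ratio of durations: how many times faster `after` is than `before`."""
    return before_s / after_s
```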

What Is Still Missing Per Plan

Relative to the full planned sequence, these items remain:

Phase 3 agentic loop capability study.
A100 run set.
H100 run set.
Final paper-grade figure/table package.

Canonical Clarification

The full-system canonical set remains g5-20260216-foundation.
Cold-start optimization is tracked separately as g5-20260217-cold-indexcache, the latest canonical evidence specific to cold starts.

Artifact Pointers

True TTFT set: /benchmarks/g5-20260217-truettft/
Cold index-cache set: /benchmarks/g5-20260217-cold-indexcache/
Routing comparison set: /benchmarks/g5-20260217-routing/
