Treni

Findings Changelog

Dated summary of major experiment findings and interpretation.

At A Glance

  • Warm request path on G5 is stable and fast in the current runtime.
  • Internal routing beats external routing on matched benchmark tasks.
  • Cold start had a major bottleneck; tensor lookup indexing removed most of it (cold TTFT is about 15x to 118x faster on qwen, donut, and bart).
  • The remaining cold cost is concentrated mostly in the Qwen first-hit path.

Timeline

Latest Key Numbers

Warm Path (G5)

  • Warm steady-state request mean: ~80.8 ms
  • Warm steady-state p99: ~89.6 ms

Routing (Internal vs External, G5)

  • Internal mean: 94.849 ms
  • External mean: 97.927 ms
  • External/Internal: 1.032x (internal faster)
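As a quick check on the ratio above, the following minimal sketch recomputes it from the two reported means. The variable and function names are illustrative only and are not taken from the Treni benchmark harness.

```python
# Reported routing means on G5 (values copied from the list above).
internal_mean_ms = 94.849
external_mean_ms = 97.927


def latency_ratio(slower_ms: float, faster_ms: float) -> float:
    """Return how many times longer the slower path takes than the faster one."""
    return slower_ms / faster_ms


ratio = latency_ratio(external_mean_ms, internal_mean_ms)
saved_ms = external_mean_ms - internal_mean_ms

print(f"External/Internal: {ratio:.3f}x")
print(f"Internal routing saves {saved_ms:.3f} ms per request on average")
```

The gap is small in absolute terms (about 3 ms per request), so the comparison only holds for the matched tasks that were benchmarked.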

Cold TTFT Before vs After Index Cache (3-run means, G5)

Model     Before (ms)   After (ms)   Speedup
qwen      27574.564     1774.951      15.535x
donut     67360.388      572.485     117.663x
bart      77520.798      743.652     104.243x
minilm       23.342       22.698       1.028x
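The speedup column is simply the before value divided by the after value. The sketch below reproduces it from the reported 3-run means and also identifies which model still has the largest cold TTFT after the index cache. The dictionary layout and helper name are assumptions made for illustration, not part of the runtime.

```python
# Cold TTFT 3-run means on G5, in milliseconds: (before, after index cache).
cold_ttft_ms = {
    "qwen": (27574.564, 1774.951),
    "donut": (67360.388, 572.485),
    "bart": (77520.798, 743.652),
    "minilm": (23.342, 22.698),
}


def speedup(before_ms: float, after_ms: float) -> float:
    """Speedup factor: how many times faster the 'after' measurement is."""
    return before_ms / after_ms


speedups = {model: speedup(b, a) for model, (b, a) in cold_ttft_ms.items()}

# Model with the largest remaining cold TTFT after the index cache.
slowest_after = max(cold_ttft_ms, key=lambda model: cold_ttft_ms[model][1])

for model, factor in speedups.items():
    print(f"{model:<7} {factor:8.3f}x")
print(f"Largest remaining cold TTFT: {slowest_after}")
```

With these inputs, the largest remaining cold TTFT belongs to qwen, which matches the summary note that the remaining cold cost sits mostly in the Qwen first-hit path.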

What Was Actually Tested

  1. Baseline (Python/dependency path) runs on T4 and G5.
  2. Runtime cold and warm request-path benchmarks.
  3. True runtime-reported TTFT (time to first token), not an SSE first-event proxy.
  4. Internal-vs-external routing comparison on matched tasks.
  5. Week 3 numerical parity checks (strict mode; donut intentionally skipped in parity harness).

What Is Not Finished Yet

  1. Phase 3 agentic loop capability study (retrieval correction, tool-state adaptation, confidence-gated branching).
  2. A100/H100 reruns from the original expansion phase.
  3. Paper-grade figures package.
