Treni

TODO

Live execution checklist and next actions.

Priority Order

Current Checklist

Track A: Cold/Hot Foundations

  • True time-to-first-token (TTFT) instrumentation in the runtime request path.
  • 3x cold-first-hit repeatability set (G5).
  • 3x warm steady-state repeatability set (G5).
  • Cold bottleneck fix: per-model tensor lookup index cache.
  • Cold rerun after fix with artifact pack.
  • Add stage-level cold decomposition metrics (tokenizer load, index build, tensor upload, first decode step).
  • Optimize remaining Qwen cold-first-hit stages after decomposition.
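A minimal sketch of what stage-level cold decomposition could look like, assuming wall-clock timing around each stage is sufficient. The StageTimer helper and the stage names in the usage note are hypothetical illustrations based on the stages listed above, not the runtime's actual instrumentation.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Hypothetical helper: accumulates wall-clock time per named stage."""

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable so the timer can be tested deterministically.
        self._clock = clock
        self.durations_ms = {}

    @contextmanager
    def stage(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Record the duration even if the stage raises.
            elapsed_ms = (self._clock() - start) * 1000.0
            self.durations_ms[name] = self.durations_ms.get(name, 0.0) + elapsed_ms

    def breakdown(self):
        """Return {stage: (milliseconds, share of total measured time)}."""
        total = sum(self.durations_ms.values())
        return {
            name: (ms, ms / total if total > 0 else 0.0)
            for name, ms in self.durations_ms.items()
        }
```

In use, each cold-path stage would be wrapped in its own block, for example `with timer.stage("tokenizer_load"): ...`, followed by `index_build`, `tensor_upload`, and `first_decode_step`. If tensor upload is asynchronous on the GPU, the stage would also need a device synchronization before the block exits, otherwise the time is attributed to a later stage.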

Track B: Internal vs External Routing

  • Minimal external baseline harness.
  • Matched task set and budgets.
  • Internal vs external run and report (G5).
  • Add explicit failure-amplification tests (timeouts/retries under load).
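One way the failure-amplification tests could be framed, as a sketch only: count how many attempts each logical request costs once retries are enabled, and report attempts per request alongside the residual failure rate. The function names and the choice of TimeoutError as the retryable failure are assumptions for illustration.

```python
def call_with_retries(fn, max_retries):
    """Call fn up to max_retries + 1 times; return (succeeded, attempts)."""
    attempts = 0
    for _ in range(max_retries + 1):
        attempts += 1
        try:
            fn()
            return True, attempts
        except TimeoutError:
            # Assumption: only timeouts are retried; other errors propagate.
            continue
    return False, attempts


def amplification(results):
    """Summarize a list of (succeeded, attempts) tuples.

    attempt_amplification is total attempts divided by logical requests;
    a value of 1.0 means no retry traffic was generated.
    """
    if not results:
        raise ValueError("no results to summarize")
    requests = len(results)
    attempts = sum(a for _, a in results)
    failures = sum(1 for ok, _ in results if not ok)
    return {
        "attempt_amplification": attempts / requests,
        "failure_rate": failures / requests,
    }
```

Running the same task set at increasing concurrency and comparing attempt_amplification between the internal and external paths would show whether retries multiply load on the slower path.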

Track C: Agentic Loop Capability

  • Freeze 3 loop scenarios and success criteria.
  • Implement evaluators (success rate + steps-to-convergence).
  • Run internal vs external loop benchmark.
  • Publish trace-backed capability report.
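The two evaluator metrics named above could be computed roughly as follows. This is a sketch under stated assumptions: the LoopRun record is hypothetical, and steps-to-convergence is averaged over successful runs only, since a failed run never converges.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class LoopRun:
    """Hypothetical record of one agentic loop run."""
    scenario: str
    succeeded: bool
    steps: int  # steps taken before the loop stopped


def evaluate(runs):
    """Return success rate and mean steps-to-convergence for a set of runs."""
    if not runs:
        raise ValueError("no runs to evaluate")
    successes = [r for r in runs if r.succeeded]
    return {
        "success_rate": len(successes) / len(runs),
        # Assumption: only successful runs count toward convergence steps.
        "mean_steps_to_convergence": (
            mean(r.steps for r in successes) if successes else None
        ),
    }
```

Evaluating each frozen scenario separately, once for the internal path and once for the external path, keeps the comparison matched.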

Expansion

  • Full A100 run set.
  • Full H100 run set.
  • Paper-grade figure/table package.

Immediate Next Actions

  1. Implement stage-level cold instrumentation for Qwen/BART/Donut.
  2. Produce a short cold-stage breakdown report from 3-run measurements.
  3. Patch next cold bottleneck and rerun cold validation set.
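For step 2, a short breakdown report could be derived from the per-run stage timings along these lines. This is a sketch: it assumes each run is a mapping from stage name to milliseconds with the same stages in every run, and the function names are hypothetical.

```python
from statistics import mean, stdev


def summarize_stages(runs):
    """Aggregate per-stage timings across runs.

    runs: list of dicts mapping stage name -> milliseconds, one per cold run.
    """
    if not runs:
        raise ValueError("no runs to summarize")
    summary = {}
    for stage in runs[0]:
        values = [run[stage] for run in runs]
        summary[stage] = {
            "mean_ms": mean(values),
            # Sample standard deviation needs at least two runs.
            "stdev_ms": stdev(values) if len(values) > 1 else 0.0,
            "min_ms": min(values),
            "max_ms": max(values),
        }
    return summary


def largest_stage(summary):
    """Name of the stage with the highest mean time: the next candidate to patch."""
    return max(summary, key=lambda stage: summary[stage]["mean_ms"])
```

With only three runs the standard deviation is a rough indicator, so reporting min and max next to the mean helps show whether the largest stage is stable across runs before it is chosen for step 3.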
