TODO
Live execution checklist and next actions.
Priority Order
Current Checklist
Track A: Cold/Hot Foundations
- True time-to-first-token (TTFT) instrumentation in the runtime request path.
- 3x cold-first-hit repeatability set (G5).
- 3x warm steady-state repeatability set (G5).
- Cold bottleneck fix: per-model tensor lookup index cache.
- Cold rerun after fix with artifact pack.
- Add stage-level cold decomposition metrics (tokenizer load, index build, tensor upload, first decode step).
- Optimize remaining Qwen cold-first-hit stages after decomposition.
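The following is a minimal sketch of the stage-level cold decomposition. It assumes each stage can be wrapped as a Python callable. `StageTimer`, `cold_run`, `summarize`, and the stage keys are hypothetical names chosen for illustration, not the runtime's actual API. The stub stages stand in for the real tokenizer load, index build, tensor upload, and first decode step. Reporting median/min/max across the 3 runs keeps single-run noise out of the breakdown.

```python
import statistics
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List

# Hypothetical stage keys mirroring the cold decomposition item above.
STAGES = ("tokenizer_load", "index_build", "tensor_upload", "first_decode_step")


class StageTimer:
    """Records wall-clock duration (ms) of each named stage in one cold run."""

    def __init__(self) -> None:
        self.durations_ms: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            self.durations_ms[name] = (time.perf_counter() - start) * 1000.0


def cold_run(stage_fns: Dict[str, Callable[[], None]]) -> Dict[str, float]:
    """Run each stage once, in order, and return its duration in ms.

    GPU stages must block until device work finishes (e.g. a device
    synchronize), otherwise asynchronous launches are under-reported.
    """
    timer = StageTimer()
    for name in STAGES:
        with timer.stage(name):
            stage_fns[name]()
    return dict(timer.durations_ms)


def summarize(runs: List[Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Median/min/max per stage across repeated cold runs (e.g. the 3-run set)."""
    return {
        name: {
            "median_ms": statistics.median(r[name] for r in runs),
            "min_ms": min(r[name] for r in runs),
            "max_ms": max(r[name] for r in runs),
        }
        for name in STAGES
    }


if __name__ == "__main__":
    # Stub stages only; replace with the real loader/upload/decode calls.
    stubs = {name: (lambda: time.sleep(0.01)) for name in STAGES}
    runs = [cold_run(stubs) for _ in range(3)]
    for name, stats in summarize(runs).items():
        print(f"{name}: median {stats['median_ms']:.1f} ms")
```

For true cold-first-hit numbers, each run should start from a fresh process. The in-process loop in `__main__` only demonstrates the aggregation step.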
Track B: Internal vs External Routing
- Minimal external baseline harness.
- Matched task set and budgets.
- Internal vs external run and report (G5).
- Add explicit failure-amplification tests (timeouts/retries under load).
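One way to quantify failure amplification is to count backend attempts issued per logical request. The sketch below assumes a synchronous client with a fixed retry budget. `run_with_retries`, `AmplificationResult`, and `expected_amplification` are hypothetical names and are not part of the existing harness. The closed form also assumes independent per-attempt failures. Under load, failures tend to be correlated, because retries add load and more requests time out. A measured amplification well above the closed form is therefore one signal worth flagging.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class AmplificationResult:
    logical_requests: int
    total_attempts: int
    failed_requests: int

    @property
    def amplification(self) -> float:
        """Backend attempts issued per logical request (1.0 = no retries)."""
        return self.total_attempts / self.logical_requests


def run_with_retries(
    call: Callable[[], bool], n_requests: int, max_attempts: int
) -> AmplificationResult:
    """Issue n_requests logical requests, each retried up to max_attempts times.

    `call` returns True on success and False on a timeout/error.
    """
    attempts = failures = 0
    for _ in range(n_requests):
        for _attempt in range(max_attempts):
            attempts += 1
            if call():
                break
        else:  # every attempt for this request failed
            failures += 1
    return AmplificationResult(n_requests, attempts, failures)


def expected_amplification(p_fail: float, max_attempts: int) -> float:
    """Closed form for independent failures: attempt k+1 runs with prob p_fail**k."""
    return sum(p_fail ** k for k in range(max_attempts))


if __name__ == "__main__":
    rng = random.Random(0)

    def flaky() -> bool:
        # Hypothetical backend that fails 50% of attempts independently.
        return rng.random() >= 0.5

    result = run_with_retries(flaky, n_requests=10_000, max_attempts=3)
    print(result.amplification, expected_amplification(0.5, 3))
```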
Track C: Agentic Loop Capability
- Freeze 3 loop scenarios and success criteria.
- Implement evaluators (success rate + steps-to-convergence).
- Run internal vs external loop benchmark.
- Publish trace-backed capability report.
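The following is a minimal sketch of the loop evaluators. It assumes each run yields a trace with the scenario name, the system under test, a pass/fail flag against the frozen success criteria, and the number of loop steps taken. `LoopTrace`, `LoopScore`, and `evaluate` are hypothetical names. Steps-to-convergence is taken here as the median step count over successful runs only. That is one reasonable definition, not a settled one.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Optional, Tuple


@dataclass
class LoopTrace:
    scenario: str
    system: str      # e.g. "internal" or "external"
    succeeded: bool  # met the frozen success criteria
    steps: int       # loop steps taken before stopping


@dataclass
class LoopScore:
    runs: int
    success_rate: float
    median_steps_to_convergence: Optional[float]  # None if no run succeeded


def evaluate(traces: List[LoopTrace]) -> Dict[Tuple[str, str], LoopScore]:
    """Score each (scenario, system) pair: success rate and median steps of successful runs."""
    groups: Dict[Tuple[str, str], List[LoopTrace]] = {}
    for t in traces:
        groups.setdefault((t.scenario, t.system), []).append(t)
    scores: Dict[Tuple[str, str], LoopScore] = {}
    for key, group in groups.items():
        ok_steps = [t.steps for t in group if t.succeeded]
        scores[key] = LoopScore(
            runs=len(group),
            success_rate=len(ok_steps) / len(group),
            median_steps_to_convergence=float(median(ok_steps)) if ok_steps else None,
        )
    return scores


if __name__ == "__main__":
    traces = [
        LoopTrace("scenario_1", "internal", True, 4),
        LoopTrace("scenario_1", "internal", True, 6),
        LoopTrace("scenario_1", "internal", False, 10),
        LoopTrace("scenario_1", "external", False, 10),
    ]
    for key, score in evaluate(traces).items():
        print(key, score)
```

Steps-to-convergence should be reported together with success rate. A system that fails the hard scenarios can look faster on the ones it does solve.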
Expansion
- Full A100 run set.
- Full H100 run set.
- Paper-grade figure/table package.
Immediate Next Actions
- Implement stage-level cold instrumentation for Qwen/BART/Donut.
- Produce a short cold-stage breakdown report from 3-run measurements.
- Patch next cold bottleneck and rerun cold validation set.