Leaderboard
Side-by-side benchmark results for T4 and G5 experiment sets.
Lower time is better.

| Metric | Value |
|---|---|
| Baseline pipeline mean | 2407.974 ms |
| Runtime warm request mean (3-run) | 82.707 ms |
| Runtime warm request p99 (3-run) | 91.738 ms |
| Baseline/runtime ratio (pipeline) | 29.11x |

| Model | Pre-fix TTFT | Post-fix TTFT | Speedup |
|---|---|---|---|
| qwen | 27574.564 ms | 1774.951 ms | 15.535x |
| donut | 67360.388 ms | 572.485 ms | 117.663x |
| bart | 77520.798 ms | 743.652 ms | 104.243x |
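| minilm | 23.342 ms | 22.698 ms | 1.028x |

All TTFT values in the table above are the runtime-instrumented `timing.ttft_ms`; Speedup is pre-fix TTFT divided by post-fix TTFT.

| Metric | Internal | External | Ratio (external / internal) |
|---|---|---|---|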
| Mean latency | 94.849 ms | 97.927 ms | 1.032x |

| Task | Internal | External |
|---|---|---|
| general_short | 150.767 ms | 152.274 ms |
| receipt_extract | 80.732 ms | 81.270 ms |
| search_grounded | 46.945 ms | 57.237 ms |
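| summarize_short | 100.950 ms | 100.928 ms |

Internal routing is faster on three of the four tasks; on summarize_short the two are effectively tied (external is 0.022 ms faster). The ratio is external / internal, so a value above 1 means internal wins.

| Set | Runtime HTTP request mean | Runtime HTTP request p99 |
|---|---|---|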
| T4 (2026-02-15) | 146279.609 ms | 156769.1 ms |
| G5 (2026-02-15) | 77449.605 ms | 83346.187 ms |
| G5 registry-cached single run (2026-02-16) | 82.913 ms | 91.877 ms |

| Set | Checked | Failed | Strict |
|---|---|---|---|
| T4 | 3 | 0 | true |
| G5 | 3 | 0 | true |
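
The derived columns in the tables above (baseline/runtime ratio, speedup, internal vs. external ratio) can be recomputed from the timing columns. The following is a minimal sketch of that arithmetic, with the values copied from the tables; the names `TTFT_MS`, `ROUTING_MS`, `speedup`, and `internal_wins` are illustrative assumptions and are not part of the benchmark tooling.

```python
# Timings in milliseconds, copied from the tables above.
# All names here are hypothetical; they do not come from the benchmark code.

# model -> (pre-fix TTFT, post-fix TTFT)
TTFT_MS = {
    "qwen": (27574.564, 1774.951),
    "donut": (67360.388, 572.485),
    "bart": (77520.798, 743.652),
    "minilm": (23.342, 22.698),
}

# task -> (internal latency, external latency)
ROUTING_MS = {
    "general_short": (150.767, 152.274),
    "receipt_extract": (80.732, 81.270),
    "search_grounded": (46.945, 57.237),
    "summarize_short": (100.950, 100.928),
}

BASELINE_PIPELINE_MEAN_MS = 2407.974
RUNTIME_WARM_MEAN_MS = 82.707


def speedup(before_ms: float, after_ms: float) -> float:
    """Ratio of the slower (before) time to the faster (after) time."""
    return before_ms / after_ms


def internal_wins(internal_ms: float, external_ms: float) -> bool:
    """True when internal routing has the lower latency."""
    return internal_ms < external_ms


# Per-model TTFT speedup, rounded like the Speedup column.
ttft_speedups = {
    model: round(speedup(pre, post), 3) for model, (pre, post) in TTFT_MS.items()
}

# Baseline/runtime ratio for the pipeline row.
pipeline_ratio = round(speedup(BASELINE_PIPELINE_MEAN_MS, RUNTIME_WARM_MEAN_MS), 2)

# Mean latency per route and the external / internal ratio.
mean_internal = sum(i for i, _ in ROUTING_MS.values()) / len(ROUTING_MS)
mean_external = sum(e for _, e in ROUTING_MS.values()) / len(ROUTING_MS)
routing_ratio = round(mean_external / mean_internal, 3)

# Which tasks internal routing actually wins.
wins = {task: internal_wins(i, e) for task, (i, e) in ROUTING_MS.items()}

if __name__ == "__main__":
    print(ttft_speedups)
    print(pipeline_ratio, routing_ratio)
    print(wins)
```

Comparing per task rather than only on the mean is what surfaces the near-tie on summarize_short, which the 1.032x mean ratio alone would hide.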