# PAct Phase1B OT Router Official-Default Evaluation

- Samples: `0` through `99` (100 consecutive sample IDs)
- Sampler policy: PAct official defaults; no `--ss_steps`, `--slat_steps`, `--ss_cfg_strength`, or `--slat_cfg_strength` passed.
- Dataset validation: `True`

| Variant | Mean score | Part-count MAE | Joint F1 | Success rate |
| --- | ---: | ---: | ---: | ---: |
| Official PAct | 40.33 | 2.05 | 0.10 | 1.00 |
| OT first | 39.37 | 2.06 | 0.08 | 1.00 |
| Mask prior | 40.26 | 2.06 | 0.10 | 1.00 |
| RGB edge | 40.48 | 2.06 | 0.10 | 1.00 |
| Virtual patch | 39.52 | 2.07 | 0.08 | 1.00 |
| First-third edge | 41.68 | 2.14 | 0.13 | 1.00 |
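
The per-variant deltas against the official baseline follow directly from the table. The sketch below recomputes them from the rounded table values only; the dictionary layout and the helper name `deltas_vs_baseline` are illustrative assumptions, not part of the evaluation code.

```python
# Rounded values copied from the results table above (not from raw logs).
RESULTS = {
    "Official PAct":    {"mean_score": 40.33, "part_mae": 2.05, "joint_f1": 0.10},
    "OT first":         {"mean_score": 39.37, "part_mae": 2.06, "joint_f1": 0.08},
    "Mask prior":       {"mean_score": 40.26, "part_mae": 2.06, "joint_f1": 0.10},
    "RGB edge":         {"mean_score": 40.48, "part_mae": 2.06, "joint_f1": 0.10},
    "Virtual patch":    {"mean_score": 39.52, "part_mae": 2.07, "joint_f1": 0.08},
    "First-third edge": {"mean_score": 41.68, "part_mae": 2.14, "joint_f1": 0.13},
}
BASELINE = "Official PAct"


def deltas_vs_baseline(results, baseline=BASELINE):
    """Return metric deltas (variant minus baseline), rounded to 2 decimals."""
    base = results[baseline]
    return {
        name: {key: round(metrics[key] - base[key], 2) for key in metrics}
        for name, metrics in results.items()
        if name != baseline
    }


deltas = deltas_vs_baseline(RESULTS)

# Higher mean score is better; lower part-count MAE is better.
best_by_score = max(deltas, key=lambda name: deltas[name]["mean_score"])
lowers_part_mae = [name for name, d in deltas.items() if d["part_mae"] < 0]

for name, d in deltas.items():
    print(f"{name}: score {d['mean_score']:+.2f}, part MAE {d['part_mae']:+.2f}")
```

Because the inputs are already rounded to two decimals, the joint F1 deltas from this sketch are coarser than the three-decimal figures quoted in the readout.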

## Readout

- All 6 variants completed 100/100 exports. No VLM analysis was run.
- Command audit passed: none of the variant commands include `--ss_steps`, `--slat_steps`, `--ss_cfg_strength`, or `--slat_cfg_strength`; all logs resolve to `sssteps25_slatsteps25`.
- The best aggregate result is `First-third edge`: mean score `41.68`, a `+1.35` delta over official PAct, with `60/100` samples improved and mean joint F1 rising from `0.104` to `0.129`.
- `RGB edge` is roughly neutral: mean score `40.48`, a `+0.15` delta, with joint F1 essentially tied with official PAct.
- `OT first`, `Mask prior`, and `Virtual patch` do not provide a reliable aggregate gain in this full Eval100 run.
- Part count remains the bottleneck: no variant lowers part-count MAE below official PAct's `2.05`, and the best-scoring variant, `First-third edge`, is slightly worse at `2.14`. Dense objects such as #26 and #73 still collapse badly in all variants.
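
The command audit described in the readout can be approximated with a small check like the one below. This is a sketch under assumptions: only the four flag names and the `sssteps25_slatsteps25` tag come from this report, while the function names, the example command, and the example log path are hypothetical placeholders.

```python
import shlex

# Flags that must NOT appear if the sampler is left at official defaults.
FORBIDDEN_FLAGS = (
    "--ss_steps",
    "--slat_steps",
    "--ss_cfg_strength",
    "--slat_cfg_strength",
)
# Tag that the logs are expected to resolve to under official defaults.
EXPECTED_TAG = "sssteps25_slatsteps25"


def forbidden_flags_in(command):
    """Return the forbidden flags found in a shell command string."""
    found = []
    for token in shlex.split(command):
        # Handle both "--flag value" and "--flag=value" spellings.
        name = token.split("=", 1)[0]
        if name in FORBIDDEN_FLAGS:
            found.append(name)
    return found


def audit_passes(command, log_path):
    """True if no sampler override is passed and the log carries the default tag."""
    return not forbidden_flags_in(command) and EXPECTED_TAG in log_path


# Hypothetical command and log path, for illustration only.
example_command = "python run_variant.py --variant first_third_edge"
example_log = "logs/first_third_edge_sssteps25_slatsteps25.log"
print(audit_passes(example_command, example_log))
```

Tokenizing with `shlex.split` rather than substring matching avoids false positives from a flag name that happens to appear inside a path or a quoted argument.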

Conclusion: Phase 1B is complete on Eval100. The Stage1 OT route runs at official-default sampling, and a wider first-third edge injection gives a modest gain in mean score and joint F1. The current Phase 1 implementation still does not solve the central part-decomposition/cardinality problem, so Phase 1 should not be treated as scientifically solved. Phase 2 should focus on stronger mask softening/dropout and on edge/marginal constraints.
