# PAct Phase1B OT Router Official-Default Evaluation

- Samples: `29, 60, 62, 74, 91`
- Sampler policy: PAct official defaults; no `--ss_steps`, `--slat_steps`, `--ss_cfg_strength`, or `--slat_cfg_strength` passed.
- Dataset validation: `True`
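The sampler policy can be audited mechanically from the saved command and its stdout. Below is a minimal sketch that assumes each run's command is stored as a shell string and that the resolved step tag appears verbatim in stdout. The function name `uses_official_defaults` and the script name `run_pact.py` in the usage line are hypothetical and not part of the PAct tooling. Only the four flag names and the `sssteps25_slatsteps25` tag come from this report.

```python
import shlex

# Sampler overrides that must be absent for an official-default run (flag names from this report).
OVERRIDE_FLAGS = ("--ss_steps", "--slat_steps", "--ss_cfg_strength", "--slat_cfg_strength")
# Step tag that this run's stdout logs resolve to (from this report).
DEFAULT_STEP_TAG = "sssteps25_slatsteps25"


def uses_official_defaults(command: str, stdout: str) -> bool:
    """Hypothetical audit: True if no override flag is passed and stdout shows the default tag."""
    tokens = shlex.split(command)
    # Normalize "--flag=value" to "--flag" so both spellings are caught.
    passed = {tok.split("=", 1)[0] for tok in tokens if tok.startswith("--")}
    if passed.intersection(OVERRIDE_FLAGS):
        return False
    return DEFAULT_STEP_TAG in stdout


# Usage with a hypothetical command line:
# uses_official_defaults("python run_pact.py --sample 29", log_text)
```

Matching exact flag names, instead of substring search, avoids false positives on unrelated flags that merely share a prefix.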

| Variant | Mean weighted score | Part-count MAE | Mean joint F1 | Success rate |
| --- | ---: | ---: | ---: | ---: |
| Official PAct | 50.35 | 0.00 | 0.20 | 1.00 |
| OT first | 42.04 | 0.00 | 0.00 | 1.00 |
| Mask prior | 42.47 | 0.00 | 0.00 | 1.00 |
| RGB edge | 42.49 | 0.00 | 0.00 | 1.00 |
| Virtual patch | 42.51 | 0.00 | 0.00 | 1.00 |
| First-third edge | 51.07 | 0.00 | 0.20 | 1.00 |
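Each row reduces five per-sample results to four numbers. The sketch below shows one plausible aggregation, assuming every column is a plain mean over the samples. The `SampleResult` record and its field names are hypothetical, and the weighted-score formula itself is not given in this report, so `score` is treated as an opaque input. It also makes the saturation issue concrete: part-count MAE compares only counts, so it is zero whenever every prediction matches the ground-truth count, regardless of how well the parts are actually segmented.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SampleResult:
    """Hypothetical per-sample record; field names are illustrative, not the pipeline's schema."""
    score: float      # weighted VLM score for the exported object (formula not shown here)
    pred_parts: int   # predicted part count
    gt_parts: int     # ground-truth part count
    joint_f1: float   # joint-level F1 for this sample
    success: bool     # whether a valid object was exported


def summarize(results: list[SampleResult]) -> dict[str, float]:
    """Aggregate one variant's samples into the four table columns (assumed to be simple means)."""
    return {
        "mean_score": mean(r.score for r in results),
        "part_count_mae": mean(abs(r.pred_parts - r.gt_parts) for r in results),
        "joint_f1": mean(r.joint_f1 for r in results),
        "success_rate": mean(1.0 if r.success else 0.0 for r in results),
    }


# Two made-up samples, both predicted 2/2 parts, with very different quality:
rows = [SampleResult(80.0, 2, 2, 1.0, True), SampleResult(20.0, 2, 2, 0.0, True)]
summary = summarize(rows)
```

Both made-up rows get a part-count error of 0 even though their scores differ by 60 points, which is the same reason `part_count_mae=0` carries no signal on this 2-part subset.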

## Readout

- This Phase1B run used official-default PAct sampling for every row. The saved commands do not include `--ss_steps`, `--slat_steps`, `--ss_cfg_strength`, or `--slat_cfg_strength`, and all stdout logs resolve to `sssteps25_slatsteps25` (25 steps for each of the `ss` and `slat` stages).
- All variants exported valid objects for all 5 samples, and all 30 VLM calls (6 variants × 5 samples) completed.
- On this easy 2-part subset, part-count separation is saturated: every variant predicts 2/2 parts for every sample, so `part_count_mae=0` does not prove better part segmentation.
- The wider `first-third edge` route is the only OT variant that beats official PAct on the mean weighted score, and only slightly (`51.07` vs `50.35`, +0.72), while keeping the same mean joint F1 (`0.20`).
- The single-block OT variants are stable but currently worse than official PAct on this subset: their mean weighted scores fall to `42.04`-`42.51` and their mean joint F1 drops from `0.20` to `0.00`. The score gap comes mainly from sample #62, which drops from the official score of `82.05` to about `40`-`41`.
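The claim that sample #62 drives the gap can be checked with simple arithmetic: changing one score out of five shifts the mean by one fifth of that change. The sketch below assumes the "single-block" variants are the four rows between official PAct and `first-third edge`, and it takes "about 40-41" literally as the range 40.0 to 41.0. Only #62 has a per-sample score in this report, so nothing is assumed about the other four samples.

```python
# Figures from the table and readout; only sample #62 has a per-sample score in this report.
OFFICIAL_MEAN = 50.35
# Assumption: these four rows are the single-block OT variants.
SINGLE_BLOCK_MEANS = {"OT first": 42.04, "Mask prior": 42.47, "RGB edge": 42.49, "Virtual patch": 42.51}
N_SAMPLES = 5
OFFICIAL_62 = 82.05
VARIANT_62_RANGE = (40.0, 41.0)  # "about 40-41" in the readout, taken literally


def mean_shift_from_one_sample(before: float, after: float, n: int) -> float:
    """Change in an n-sample mean caused by changing one sample's score from `before` to `after`."""
    return (before - after) / n


# Range of the mean drop that sample #62 alone would cause.
low = mean_shift_from_one_sample(OFFICIAL_62, VARIANT_62_RANGE[1], N_SAMPLES)
high = mean_shift_from_one_sample(OFFICIAL_62, VARIANT_62_RANGE[0], N_SAMPLES)

for name, variant_mean in SINGLE_BLOCK_MEANS.items():
    gap = OFFICIAL_MEAN - variant_mean
    print(f"{name}: observed gap {gap:.2f}, #62 alone accounts for {low:.2f}-{high:.2f}")
```

The #62 drop alone moves the mean by roughly 8.2 to 8.4 points, while the observed gaps are 7.84 to 8.31. This suggests the other four samples change by well under a point on average, although per-sample scores would be needed to confirm it.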

Conclusion: Phase 1B is complete as an official-default pilot evaluation. The result is not yet strong positive evidence. It shows that the OT route runs end to end and that one wider edge variant is competitive, but this easy subset is saturated in part count and therefore cannot validate part-separation gains. The next Phase 1 check should use harder multi-part cases from Eval100.
