# PAct hard-case OT + VLM exploration

Generated: `2026-05-14T22:13:10+00:00`

Portal: `http://106.14.105.96:28080/experiments/pact-transporter-hardcase-ot-vlm-20260514/index.html`

This run takes the 11 hardest samples from the official non-PM diagnostic set and compares six edited variants against the raw official PAct outputs.
Each edited variant now exports a full transformed mesh as a GLB file in addition to the lightweight box proxy.

## Scope

- hard samples: `11`
- caption model: `microsoft/git-base`
- caption prompt: `Unprompted image caption from microsoft/git-base.`
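
As a rough illustration of what unprompted captioning with this model looks like, the sketch below follows the standard Hugging Face `transformers` usage for GIT checkpoints. It is not the code used in this run: the function name `caption_image`, the `max_length` value, and the image path are hypothetical, and it assumes `transformers`, `torch`, and `Pillow` are installed.

```python
from pathlib import Path

MODEL_ID = "microsoft/git-base"  # caption model named in the Scope list


def caption_image(image_path: Path, max_length: int = 50) -> str:
    """Return an unprompted caption for one image (hypothetical helper)."""
    # Heavy dependencies are imported lazily so this module loads without them.
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    image = Image.open(image_path).convert("RGB")
    # No text prompt is passed: only the pixel values condition generation.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values=pixel_values, max_length=max_length)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
```

The helper reloads the model on every call to stay short; a batch job would load the processor and model once and reuse them across samples.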

## Summary

| variant | strict F1 | count err | axis err | tree valid |
|---|---:|---:|---:|---:|
| raw | 0.000 | 1.727 | 65.45 | 1.000 |
| OT-2D | 0.000 | 2.818 | 90.00 | 1.000 |
| OT-Proto | 0.000 | 1.182 | 90.00 | 1.000 |
| NonOT-Hier | 0.000 | 1.909 | 65.45 | 1.000 |
| VLM-Seg | 0.000 | 2.909 | 90.00 | 1.000 |
| VLM-Joint | 0.000 | 1.182 | 90.00 | 1.000 |
| VLM-Struct | 0.000 | 1.909 | 65.45 | 1.000 |
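
To make the comparison against the raw row easier to read, the following sketch parses the table above and prints each variant's change in count error and axis error relative to `raw`. The helper name `parse_table` is illustrative and not part of the experiment code; the numbers are copied from the table.

```python
SUMMARY = """\
| variant | strict F1 | count err | axis err | tree valid |
|---|---:|---:|---:|---:|
| raw | 0.000 | 1.727 | 65.45 | 1.000 |
| OT-2D | 0.000 | 2.818 | 90.00 | 1.000 |
| OT-Proto | 0.000 | 1.182 | 90.00 | 1.000 |
| NonOT-Hier | 0.000 | 1.909 | 65.45 | 1.000 |
| VLM-Seg | 0.000 | 2.909 | 90.00 | 1.000 |
| VLM-Joint | 0.000 | 1.182 | 90.00 | 1.000 |
| VLM-Struct | 0.000 | 1.909 | 65.45 | 1.000 |
"""


def parse_table(md: str) -> dict[str, dict[str, float]]:
    """Parse a simple Markdown table into {variant: {metric: value}}."""
    lines = [ln.strip() for ln in md.strip().splitlines()]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    rows = {}
    for ln in lines[2:]:  # skip the header row and the alignment row
        cells = [c.strip() for c in ln.strip("|").split("|")]
        rows[cells[0]] = {k: float(v) for k, v in zip(header[1:], cells[1:])}
    return rows


rows = parse_table(SUMMARY)
raw = rows["raw"]

# Difference of each edited variant from the raw baseline.
deltas = {
    name: {
        "count err": round(m["count err"] - raw["count err"], 3),
        "axis err": round(m["axis err"] - raw["axis err"], 2),
    }
    for name, m in rows.items()
    if name != "raw"
}

for name, d in deltas.items():
    print(f"{name:<11} count err {d['count err']:+.3f}  axis err {d['axis err']:+.2f}")
```

A negative delta means the variant reports a smaller error than the raw output on these samples.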

## Interpretation

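The six variants are:

- OT-2D: the segmentation-aware transport attempt.
- OT-Proto: the prototype-bank joint attempt.
- NonOT-Hier: the structure-first hierarchy repair.
- VLM-Seg, VLM-Joint and VLM-Struct: use the caption model to inject semantic priors into count, joint and structure decisions, respectively.

On these samples no variant improves strict F1, which is 0.000 in every row, and tree validity is 1.000 in every row, so neither column separates the methods.
OT-Proto and VLM-Joint report the lowest count error (1.182 against 1.727 for raw), but their axis error rises from 65.45 to 90.00.
OT-2D and VLM-Seg raise both count error (2.818 and 2.909) and axis error (90.00).
NonOT-Hier and VLM-Struct keep the raw axis error (65.45) with a slightly higher count error (1.909).
VLM-Joint and VLM-Struct report the same numbers as OT-Proto and NonOT-Hier, respectively.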

## Hard Samples

Each entry lists the sample ID followed by its `microsoft/git-base` caption, reproduced verbatim.

- `electronics_104011`: `this is a vector illustration of a computer screen.`
- `electronics_103972`: `digital art selected for the #`
- `electronics_103867`: `digital art selected for the #`
- `electronics_103978`: `3d model of a box`
- `small_appliances_103043`: `digital art selected for the #`
- `electronics_104020`: `digital art selected for the #`
- `electronics_103988`: `a box of chocolates.................................`
- `electronics_103878`: `the box for the printer.`
- `electronics_104030`: `the box in the middle of the road`
- `small_appliances_103016`: `3d model of a box`
- `small_appliances_103466`: `3d model of a box`
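
Several of the captions above repeat. The sketch below counts how often each caption occurs, using only the IDs and captions listed in this section; it is an after-the-fact tally, not part of the experiment pipeline.

```python
from collections import Counter

# Sample ID -> caption, copied verbatim from the list above.
CAPTIONS = {
    "electronics_104011": "this is a vector illustration of a computer screen.",
    "electronics_103972": "digital art selected for the #",
    "electronics_103867": "digital art selected for the #",
    "electronics_103978": "3d model of a box",
    "small_appliances_103043": "digital art selected for the #",
    "electronics_104020": "digital art selected for the #",
    "electronics_103988": "a box of chocolates.................................",
    "electronics_103878": "the box for the printer.",
    "electronics_104030": "the box in the middle of the road",
    "small_appliances_103016": "3d model of a box",
    "small_appliances_103466": "3d model of a box",
}

counts = Counter(CAPTIONS.values())
for caption, n in counts.most_common():
    print(f"{n:2d}  {caption}")

# Captions that mention a box, regardless of the object category in the ID.
box_ids = [sid for sid, cap in CAPTIONS.items() if "box" in cap]
print(f"{len(box_ids)} of {len(CAPTIONS)} captions mention a box")
```

The 11 samples share only 6 distinct captions: 4 samples receive `digital art selected for the #` and 3 receive `3d model of a box`.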
