# GAPartNet Kinematic Segmenter -> PAct Trial

Date: `2026-05-17`

This run tests a replacement for the failed Appendix-D-style preprocessing on GAPartNet CAD renders. Instead of asking a `VLM -> SAM2 -> VLM classification -> VLM merge` pipeline to infer part masks from rendered facets, I trained a small supervised segmenter on GAPartNet's dataset-native semantic/kinematic masks and exported the predicted view-0 masks in PAct's expected input format.

This is not an official PAct fine-tune and does not modify PAct's Stage 1 or Stage 2 model. It is a preprocessing/conditioning experiment: can we predict the kind of 2D part mask PAct actually needs, then run official PAct inference unchanged?

## Setup

- Training data: 6 GAPartNet objects, views `1-14`.
- Validation data: the same 6 objects, views `15-19` (held-out views, not held-out objects).
- Prediction target: view `0`.
- Input: rendered RGBA image.
- Target: cleaned, contiguous GAPartNet part mask derived from `semantic_masks_merge_fixed/*.npz`.
- Model: small U-Net-style RGBA segmenter, at most 9 labels.
- PAct inference: official `infer_imgs.py`, no PAct weights changed.
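
The split above can be written down as a short sketch. The object IDs come from the results tables in this report; the `build_split` helper and the `(object, view)` tuple layout are illustrative assumptions, not the layout used by `run_kinematic_segmenter.py`.

```python
# Hypothetical reconstruction of the view split described in the Setup list.
OBJECTS = [
    "GAPartNet_Refrigerator_10068",
    "GAPartNet_StorageFurniture_35059",
    "GAPartNet_Dishwasher_11622",
    "GAPartNet_Microwave_7119",
    "GAPartNet_Table_19179",
    "GAPartNet_Oven_101773",
]

TRAIN_VIEWS = range(1, 15)   # views 1-14
VAL_VIEWS = range(15, 20)    # views 15-19
PREDICT_VIEW = 0             # view exported to PAct


def build_split(objects, views):
    """Return every (object, view) pair for the given views."""
    return [(obj, view) for obj in objects for view in views]


train_items = build_split(OBJECTS, TRAIN_VIEWS)
val_items = build_split(OBJECTS, VAL_VIEWS)
predict_items = build_split(OBJECTS, [PREDICT_VIEW])

print(len(train_items), len(val_items), len(predict_items))  # 84 30 6
```

The counts match the `train items` and `val items` rows reported under Segmenter Metrics.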

## Segmenter Metrics

| metric | value |
|---|---:|
| train items | 84 |
| val items | 30 |
| best observed val mean part IoU | 0.637 at step 450 |
| final val mean part IoU | 0.613 at step 500 |
| final val pixel accuracy | 0.886 |
| view-0 predicted-mask mean IoU | 0.410 |
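
For reference, one common way to compute mean part IoU and pixel accuracy over integer label maps is sketched below. This is an assumption about the metric definition (per-label IoU averaged over labels present in either map); the actual script may average differently or treat background separately.

```python
import numpy as np


def mean_part_iou(pred, target, num_labels=9):
    """Average IoU over labels that appear in the prediction or the target."""
    ious = []
    for label in range(num_labels):
        pred_mask = pred == label
        target_mask = target == label
        union = np.logical_or(pred_mask, target_mask).sum()
        if union == 0:
            continue  # label absent from both maps, so it does not count
        intersection = np.logical_and(pred_mask, target_mask).sum()
        ious.append(intersection / union)
    return float(np.mean(ious)) if ious else 0.0


def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label equals the target label."""
    return float((pred == target).mean())


# Toy 3x3 label maps, purely illustrative.
pred = np.array([[0, 0, 1], [0, 1, 1], [2, 2, 1]])
target = np.array([[0, 0, 1], [0, 0, 1], [2, 2, 2]])

print(mean_part_iou(pred, target), pixel_accuracy(pred, target))
```

Note that pixel accuracy can stay high while mean part IoU drops, because small parts contribute few pixels but count as a full label in the IoU average.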

Per-sample view-0 mask IoU:

| sample | category | predicted-mask IoU |
|---|---|---:|
| `GAPartNet_Refrigerator_10068` | Refrigerator | 0.806 |
| `GAPartNet_StorageFurniture_35059` | StorageFurniture | 0.719 |
| `GAPartNet_Dishwasher_11622` | Dishwasher | 0.469 |
| `GAPartNet_Microwave_7119` | Microwave | 0.286 |
| `GAPartNet_Table_19179` | Table | 0.163 |
| `GAPartNet_Oven_101773` | Oven | 0.016 |

![mask IoU metrics](kinematic_segmenter_metrics.png)
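
As a quick consistency check, the aggregate view-0 figure can be recomputed from the per-sample table. The values are copied from the table above; the dictionary layout is only for illustration.

```python
from statistics import mean

# Per-sample view-0 predicted-mask IoU, copied from the table in this report.
view0_iou = {
    "GAPartNet_Refrigerator_10068": 0.806,
    "GAPartNet_StorageFurniture_35059": 0.719,
    "GAPartNet_Dishwasher_11622": 0.469,
    "GAPartNet_Microwave_7119": 0.286,
    "GAPartNet_Table_19179": 0.163,
    "GAPartNet_Oven_101773": 0.016,
}

mean_iou = mean(view0_iou.values())
worst_sample = min(view0_iou, key=view0_iou.get)

print(f"{mean_iou:.3f}")  # 0.410
print(worst_sample)       # GAPartNet_Oven_101773
```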

## PAct Export QA

Official PAct inference completed on all 6 predicted-mask inputs and exported articulated objects for all of them.

| sample | exported parts | vertices | semantic names |
|---|---:|---:|---|
| `GAPartNet_Dishwasher_11622` | 2 | 13,532 | drawer, base |
| `GAPartNet_Microwave_7119` | 2 | 10,851 | base, door |
| `GAPartNet_Oven_101773` | 2 | 43,159 | base, door |
| `GAPartNet_Refrigerator_10068` | 3 | 28,162 | door, base, drawer |
| `GAPartNet_StorageFurniture_35059` | 2 | 32,851 | base, door |
| `GAPartNet_Table_19179` | 2 | 32,329 | door, base |

A QA pass here means only that the export is non-empty; it does not mean the articulation is correct. The visual comparison below is the real readout.

![PAct comparison](kinematic_segmenter_pact_comparison.png)
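
The non-empty criterion used by the QA pass can be sketched as follows. The `ExportSummary` type and `export_is_non_empty` function are hypothetical; the real structure of `qa_report.json` is not shown in this report. Part names and vertex counts are taken from the export table above.

```python
from dataclasses import dataclass


@dataclass
class ExportSummary:
    """Hypothetical per-sample summary; not the real qa_report.json schema."""
    sample: str
    part_names: list
    vertex_count: int


def export_is_non_empty(summary):
    """Pass if at least one part and at least one vertex were exported."""
    return len(summary.part_names) > 0 and summary.vertex_count > 0


exports = [
    ExportSummary("GAPartNet_Dishwasher_11622", ["drawer", "base"], 13532),
    ExportSummary("GAPartNet_Microwave_7119", ["base", "door"], 10851),
    ExportSummary("GAPartNet_Oven_101773", ["base", "door"], 43159),
    ExportSummary("GAPartNet_Refrigerator_10068", ["door", "base", "drawer"], 28162),
    ExportSummary("GAPartNet_StorageFurniture_35059", ["base", "door"], 32851),
    ExportSummary("GAPartNet_Table_19179", ["door", "base"], 32329),
]

qa_results = {e.sample: export_is_non_empty(e) for e in exports}
print(all(qa_results.values()))  # True
```

A check like this passes the oven sample even though its view-0 mask IoU is 0.016, which is why it cannot stand in for a correctness check.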

## Conclusion

This route is meaningfully better aligned with PAct than the Appendix-D VLM/SAM2 trial, because it learns the dataset's actual kinematic mask protocol instead of segmenting rendered facets. However, the first small-scale model is not production-ready:

- Strong cases: refrigerator (view-0 mask IoU 0.806) and storage furniture (0.719). Their predicted masks are close to the GT-derived masks, and the PAct outputs look structurally plausible.
- Usable but imperfect: dishwasher (0.469). The main body is captured, but the mask boundaries are distorted.
- Failed or weak cases: microwave (0.286), table (0.163), and oven (0.016). PAct still exports a non-empty mesh, but the conditioning mask is wrong enough that the joint and part semantics are not trustworthy.

The practical decision is therefore:

1. Keep the official PAct inference path unchanged.
2. Replace the fragile VLM/SAM2 preprocessing with a supervised dataset-native part segmenter.
3. Add a strict mask-quality gate before PAct inference. If the predicted-mask confidence or IoU proxy is low, mark the sample as not reliable instead of silently exporting it.
4. Scale training beyond 6 objects, with category-conditioned sampling and more view diversity, before using it as the default GAPartNet preprocessing route.
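
A minimal sketch of the mask-quality gate from step 3, assuming the segmenter exposes per-pixel softmax probabilities. Ground-truth IoU is not available at inference time, so this uses the mean top-class probability over foreground pixels as a stand-in proxy. The function names, the proxy choice, and the `0.8` threshold are all hypothetical and would need calibration against validation IoU.

```python
import numpy as np


def mask_confidence(probs, alpha):
    """Mean top-class probability over foreground pixels.

    probs: (num_labels, H, W) softmax output of the segmenter (assumed).
    alpha: (H, W) alpha channel of the rendered RGBA input.
    """
    foreground = alpha > 0
    if not foreground.any():
        return 0.0
    return float(probs.max(axis=0)[foreground].mean())


def gate(sample, confidence, threshold=0.8):
    """Route a sample to PAct inference or flag it as not reliable."""
    status = "run_pact" if confidence >= threshold else "not_reliable"
    return {"sample": sample, "confidence": confidence, "status": status}


# Toy 2-label, 2x2 example; only the top row is foreground.
probs = np.array([
    [[0.9, 0.6], [0.5, 0.5]],
    [[0.1, 0.4], [0.5, 0.5]],
])
alpha = np.array([[1, 1], [0, 0]])

confidence = mask_confidence(probs, alpha)
decision = gate("toy_sample", confidence)
print(decision["status"])
```

Softmax confidence is often poorly calibrated, so a proxy like this should be checked against the held-out-view IoU before it is trusted as a gate.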

## Artifacts

- Segmenter script: `evaluations/pact_kinematic_segmenter_20260517/run_kinematic_segmenter.py`
- Training/prediction report: `training_and_prediction_report.json`
- Predicted PAct inputs: `pact_inputs_predicted_view0/`
- Official PAct inference outputs: `inference_pact_from_predicted_masks/`
- QA report: `inference_pact_from_predicted_masks/qa_report.json`