# Official Appendix-D Style Preprocessing Trial

Date: `2026-05-17`

This trial reruns PAct preprocessing along the paper's Appendix-D route, as closely as the locally released code allows:

`VLM granularity -> SAM2 candidate masks -> VLM part classification -> VLM part merging -> PAct mask.exr -> official PAct inference`

It is intentionally kept separate from the earlier GAPartNet GT-derived preprocessing route: no GT mask cleanup, tiny-label filtering, or contiguous-label repair is applied here.

## Setup

- Appendix-D runner: `/data/250010098/PAct/scripts/run_vlm_sam2_mask_labeling.py`
- Local implementation note: `/data/250010098/PAct/PAPER_APPENDIX_D_VLMSAM2_REPRO_20260419.md`
- SAM2 model: `facebook/sam2.1-hiera-small`
- VLM backend: OpenAI-compatible GLM visual model, `glm-4.6v-Flash`
- PAct model: `/data/250010098/PAct-Transporter/model_cache/PAct000/PAct`
- Source images: repaired RGB render inputs from `pact_repair_visual_quality_20260517/inference_inputs_rgb_fixed_six`
- Completed official-style preprocess samples: 5
- Failed sample: `GAPartNet_StorageFurniture_35059`, blocked by repeated VLM HTTP `429` (rate-limit) responses during Stage 1

This is still not an exact paper reproduction: the paper describes GPT/VLM usage, while this machine used the available GLM visual endpoint. Structurally, however, it is the official Appendix-D style pipeline: VLM + SAM2 + VLM merge.

## Preprocessing Result

| sample | VLM Stage0 category | SAM2 segments | final parts | final part status |
|---|---|---:|---:|---|
| `GAPartNet_Dishwasher_11622` | Air Conditioner | 6 | 6 | all `leftover_segment_*: fixed` |
| `GAPartNet_Microwave_7119` | Appliance, tall with upper display panel | 2 | 2 | all `leftover_segment_*: fixed` |
| `GAPartNet_Oven_101773` | refrigerator | 6 | 1 | `part: fixed` |
| `GAPartNet_Refrigerator_10068` | suitcase | 5 | 5 | all `leftover_segment_*: fixed` |
| `GAPartNet_Table_19179` | cabinet | 6 | 6 | all `leftover_segment_*: fixed` |
| `GAPartNet_StorageFurniture_35059` | piano | incomplete | incomplete | VLM Stage1 repeatedly returned `429` |

The failure is already visible before PAct inference: SAM2 follows rendered faces and color/material facets rather than kinematic bodies, and Stage 1 classification returned no semantic segments for any completed sample. Stage 2 therefore fell back to fixed parts: one `leftover_segment_*` per SAM2 segment for four samples, and a single fixed `part` for the oven.
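The fallback behavior in the preprocessing table can be illustrated with a minimal sketch. It assumes Stage 1 output is a `{segment_id: (part_name, motion_type)}` mapping and that Stage 2 merges segments sharing a part name; the function `merge_with_fallback`, that input schema, and the `leftover_segment_<id>` numbering are hypothetical and not taken from `run_vlm_sam2_mask_labeling.py`. The sketch does not reproduce the oven case, where six segments collapsed into a single `part: fixed`, so the real runner evidently has at least one other path.

```python
from typing import Dict, List, Tuple


def merge_with_fallback(
    segment_ids: List[int],
    stage1: Dict[int, Tuple[str, str]],
) -> Dict[str, dict]:
    """Hypothetical Stage-2 merge with a fixed-leftover fallback.

    segment_ids -- SAM2 candidate mask ids for one image
    stage1      -- assumed Stage-1 output, {segment_id: (part_name, motion_type)};
                   empty when the VLM returned no semantic segments
    """
    parts: Dict[str, dict] = {}
    for seg in segment_ids:
        if seg not in stage1:
            # Unclassified segment: kept only as a fixed leftover part.
            parts[f"leftover_segment_{seg}"] = {"segments": [seg], "motion": "fixed"}
            continue
        name, motion = stage1[seg]
        # Segments sharing a Stage-1 part name are merged into one part;
        # the motion type of the first such segment is kept.
        part = parts.setdefault(name, {"segments": [], "motion": motion})
        part["segments"].append(seg)
    return parts


# Empty Stage-1 output, as observed for the completed samples:
parts = merge_with_fallback(list(range(6)), {})
print(len(parts), {p["motion"] for p in parts.values()})  # 6 {'fixed'}
```

With a non-empty Stage-1 mapping, for example two segments labeled `("door", "revolute")`, those segments would merge into one articulated part; none of the completed samples reached that path.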

## PAct Inference QA

Official PAct inference was run on the 5 completed Appendix-D masks.

Output root:

`inference_official_appendix_d_5/seed42_slatcfg7.0_sscfg7.0_sssteps8_slatsteps8_artioutmean_feature_regression_steps`
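The run directory name appears to encode the sampling configuration. A small parser makes those settings explicit. The field meanings (guidance strength and step counts for two sampling stages, plus an articulation-output mode taken as everything after `artiout`) are inferred from the token names alone and are an assumption, as is the helper `parse_run_tag`.

```python
import re

# Hypothetical reading of the run-directory name; field meanings are inferred
# from the tokens, not from PAct documentation.
RUN_TAG = re.compile(
    r"seed(?P<seed>\d+)"
    r"_slatcfg(?P<slat_cfg>[\d.]+)"
    r"_sscfg(?P<ss_cfg>[\d.]+)"
    r"_sssteps(?P<ss_steps>\d+)"
    r"_slatsteps(?P<slat_steps>\d+)"
    r"_artiout(?P<arti_out>\w+)"
)


def parse_run_tag(tag: str) -> dict:
    """Split a run tag into typed settings; raise ValueError if it does not match."""
    m = RUN_TAG.fullmatch(tag)
    if m is None:
        raise ValueError(f"unrecognized run tag: {tag}")
    g = m.groupdict()
    return {
        "seed": int(g["seed"]),
        "slat_cfg": float(g["slat_cfg"]),
        "ss_cfg": float(g["ss_cfg"]),
        "ss_steps": int(g["ss_steps"]),
        "slat_steps": int(g["slat_steps"]),
        "arti_out": g["arti_out"],  # remainder after "artiout"
    }


cfg = parse_run_tag(
    "seed42_slatcfg7.0_sscfg7.0_sssteps8_slatsteps8_artioutmean_feature_regression_steps"
)
print(cfg["slat_cfg"], cfg["ss_steps"])  # 7.0 8
```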

Mesh QA:

| sample | QA status | reason | exported parts | notes |
|---|---|---|---:|---|
| `GAPartNet_Dishwasher_11622` | pass | geometry exists | 6 | generated mesh, but driven by facet-like mask |
| `GAPartNet_Microwave_7119` | pass | geometry exists | 2 | no meaningful articulation |
| `GAPartNet_Oven_101773` | fail | too few parts | 1 | semantic collapse to base |
| `GAPartNet_Refrigerator_10068` | fail | missing `object.json` | 4 GLB parts | export incomplete in this 5-sample run |
| `GAPartNet_Table_19179` | pass | geometry exists | 5 | visual opening exists but mask source is non-semantic |

Raw QA pass rate is `3/5 = 0.60`. This number should not be interpreted as success, because the QA script only checks coarse mesh existence. The visual comparison and the intermediate stage JSON files show that the official-style masks are not semantically valid for these synthetic GAPartNet renders.
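To make the distinction concrete, the sketch below re-scores the five samples from the two tables above with a coarse check and a stricter one. The coarse rule (an `object.json` plus at least two exported parts) is an approximation of the QA script inferred from the reported fail reasons, not its actual code. The strict rule additionally requires at least one non-fixed part in the Stage-2 mask, which none of the samples has. The oven's `object.json` is assumed present; that assumption does not change its outcome.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SampleResult:
    name: str
    exported_parts: int
    has_object_json: bool
    movable_mask_parts: int  # Stage-2 parts whose status is not "fixed"


def coarse_pass(r: SampleResult, min_parts: int = 2) -> bool:
    # Approximates the coarse QA: metadata present and more than one part.
    # The min_parts threshold is an assumption.
    return r.has_object_json and r.exported_parts >= min_parts


def strict_pass(r: SampleResult) -> bool:
    # Hypothetical stricter check: also require at least one articulated part.
    return coarse_pass(r) and r.movable_mask_parts > 0


def pass_rate(results: List[SampleResult], check: Callable[[SampleResult], bool]) -> float:
    return sum(check(r) for r in results) / len(results)


# Values from the preprocessing and mesh QA tables; every Stage-2 part is fixed.
RESULTS = [
    SampleResult("GAPartNet_Dishwasher_11622", 6, True, 0),
    SampleResult("GAPartNet_Microwave_7119", 2, True, 0),
    SampleResult("GAPartNet_Oven_101773", 1, True, 0),  # object.json assumed present
    SampleResult("GAPartNet_Refrigerator_10068", 4, False, 0),
    SampleResult("GAPartNet_Table_19179", 5, True, 0),
]

print(f"coarse: {pass_rate(RESULTS, coarse_pass):.2f}")  # coarse: 0.60
print(f"strict: {pass_rate(RESULTS, strict_pass):.2f}")  # strict: 0.00
```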

## Visual Evidence

![Appendix-D vs clean baseline](appendix_d_vs_clean_baseline_visual_comparison_5.png)

The comparison columns are:

1. input RGB,
2. SAM2 overlay,
3. final Appendix-D mask,
4. PAct output from Appendix-D mask,
5. earlier GT-derived clean-mask baseline.

The left three columns show where the pipeline breaks: the Appendix-D masks isolate diagonal render facets, thin material strips, and handles instead of stable kinematic bodies. The GT-derived clean baseline is not an official Appendix-D reproduction, but it shows that PAct itself behaves much better when the 2D masks correspond to part bodies.

## Conclusion

For these GAPartNet CAD-render inputs, the official Appendix-D style pipeline is currently a negative result. It is not robust to our synthetic render distribution:

- VLM category recognition is unstable on CAD renders.
- SAM2 segments rendered mesh facets instead of functional parts.
- VLM Stage 1 returned empty classifications for all completed samples.
- Stage 2 therefore produced only fixed parts (leftover segments, or a single fixed `part`), not articulated semantic parts.
- Downstream PAct can still synthesize meshes, but articulation quality is not reliable because the conditioning mask is wrong.

This supports the earlier diagnosis: for GAPartNet training/evaluation, a dataset-native preprocessing route is required. The official Appendix-D route is more appropriate for real or natural-looking single images, while GAPartNet should use semantic/kinematic supervision or a render pipeline that exposes actual part-body masks.
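As an illustration of the second option, a renderer that writes a per-pixel link id (one integer per kinematic body, with a sentinel value for background) could be turned into a part-body label map with a few lines of NumPy. The link-id render, the `-1` background sentinel, and `link_ids_to_part_mask` are hypothetical; the sketch only produces an integer label array and does not write PAct's `mask.exr`.

```python
import numpy as np


def link_ids_to_part_mask(link_id_image: np.ndarray, background_id: int = -1) -> np.ndarray:
    """Remap a hypothetical per-pixel link-id render to contiguous part labels.

    Background pixels become 0; each distinct link id becomes 1..K in
    ascending id order.
    """
    mask = np.zeros(link_id_image.shape, dtype=np.int32)
    fg = link_id_image != background_id
    ids = np.unique(link_id_image[fg])  # sorted unique foreground link ids
    # searchsorted gives each foreground id its index in the sorted id list.
    mask[fg] = np.searchsorted(ids, link_id_image[fg]) + 1
    return mask


render = np.array([[-1, 7, 7], [3, 3, -1], [12, 7, -1]])
print(link_ids_to_part_mask(render))
```

Labels are contiguous (1..K) within one image. If consistent labels across views are needed, map against the object's full link list rather than the ids visible in a single image.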

## Files

- Visual comparison: `appendix_d_vs_clean_baseline_visual_comparison_5.png`
- 5-sample QA: `inference_official_appendix_d_5/qa_report.json`
- 5-sample PAct inputs: `inference_inputs_appendix_d_5/`
- Stage outputs: `preprocess/<sample>/stage0_granularity.json`, `stage1_classification.json`, `stage2_merge.json`, `summary.json`
- Exported articulated objects: `inference_official_appendix_d_5/seed42_slatcfg7.0_sscfg7.0_sssteps8_slatsteps8_artioutmean_feature_regression_steps/exported_arti_objects/`