# Textured Source-Mask Official PAct Trial

Generated: `2026-05-17T15:46:29Z`

## Goal

Test whether replacing GAPartNet's weak rendered inputs with locally available textured source assets from the GRScenes/ArtVIP-side packages improves official PAct inference, while keeping the PAct inference path itself as close to the official release as possible.

## What Was Kept Official

- PAct inference used the official script: `/data/250010098/PAct/infer_imgs.py`.
- Model weights used the local official checkpoint cache: `/data/250010098/PAct-Transporter/model_cache/PAct000/PAct`.
- No architecture changes, no extra heads, and no fine-tuning were used in this trial.
- The input layout follows PAct's expected per-sample format: a near-natural RGB image (`_processed.png`), a part-label mask (`_mask.exr`), and optional preview overlays.
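
A minimal sketch for checking that a data directory matches this layout before launching inference. It assumes a flat directory where each sample `<stem>` has `<stem>_processed.png` and `<stem>_mask.exr`. That naming and the helper `find_incomplete_samples` are hypothetical and are not taken from `infer_imgs.py`.

```python
from pathlib import Path

# Hypothetical layout: <stem>_processed.png (RGB) and <stem>_mask.exr (part labels)
# side by side in one flat directory. Preview overlays are optional and ignored.
REQUIRED_SUFFIXES = ("_processed.png", "_mask.exr")


def find_incomplete_samples(data_dir):
    """Return {stem: [missing suffixes]} for samples lacking a required file."""
    present = {}
    for path in Path(data_dir).iterdir():
        for suffix in REQUIRED_SUFFIXES:
            if path.name.endswith(suffix):
                stem = path.name[: -len(suffix)]
                present.setdefault(stem, set()).add(suffix)
    # Report only stems that have some, but not all, required files.
    return {
        stem: [s for s in REQUIRED_SUFFIXES if s not in found]
        for stem, found in sorted(present.items())
        if len(found) < len(REQUIRED_SUFFIXES)
    }
```

Under that assumed layout, `find_incomplete_samples("pact_inputs_textured_source_masks")` should return an empty dict when every sample has both required files.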

## What Was Not Fully Official

This is not the paper's Appendix D preprocessing route, which is approximately:

`GPT/VLM part proposal -> SAM2 segmentation -> VLM part classification -> VLM merge`.

Instead, I rendered the RGB image and per-link masks directly from the source-native SDF link structure. This makes the trial an upper-bound diagnostic of what PAct does with clean inputs in its expected format, not a reproduction of the official data-preparation pipeline.
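
One way to turn source link structure into mask labels is to merge links that are rigidly attached (fixed joints) and give each resulting kinematic part a single label. The sketch below does this with union-find. The `(parent, child, joint_type)` tuples and the `link_labels` helper are illustrative assumptions and do not describe what the render script actually does.

```python
def link_labels(links, joints):
    """Map each link name to a mask label; links joined by fixed joints share one.

    links:  link names, in the order labels should be assigned.
    joints: (parent_link, child_link, joint_type) tuples (hypothetical format).
    Label 0 is left free for background, so part labels start at 1.
    """
    root_of = {link: link for link in links}

    def find(link):
        # Follow pointers to the group root, halving the path as we go.
        while root_of[link] != link:
            root_of[link] = root_of[root_of[link]]
            link = root_of[link]
        return link

    for parent, child, joint_type in joints:
        if joint_type == "fixed":
            # A fixed joint means the child cannot move relative to its parent.
            root_of[find(child)] = find(parent)

    group_label, labels = {}, {}
    for link in links:
        root = find(link)
        group_label.setdefault(root, len(group_label) + 1)
        labels[link] = group_label[root]
    return labels


print(link_labels(
    ["body", "lid", "handle", "wheel"],
    [("body", "lid", "revolute"), ("lid", "handle", "fixed"), ("body", "wheel", "continuous")],
))
# {'body': 1, 'lid': 2, 'handle': 2, 'wheel': 3}
```

Here the handle rides on the lid, so both get label 2, while each movable joint still yields a separate label.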

Local GRScenes caveat: most objects in the locally available GRScenes releases inspected here are marked `is_textured: false`. The first test with textured inputs therefore used storage/suitcase assets from the local `artvip_sdf` package, whose metadata points back to textured PartNetMobility-style suitcase sources.
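
A quick way to see how many candidates survive this caveat is to split objects on the `is_textured` flag. This sketch assumes the per-object metadata has already been loaded into a dict keyed by object id. The function name and that loading step are hypothetical.

```python
def split_by_texture(metadata):
    """Split {object_id: metadata_dict} into (textured, untextured) id lists.

    A missing or non-boolean `is_textured` value counts as untextured, so only
    objects explicitly marked textured are kept as candidates.
    """
    textured, untextured = [], []
    for obj_id, meta in sorted(metadata.items()):
        target = textured if meta.get("is_textured") is True else untextured
        target.append(obj_id)
    return textured, untextured
```

Treating a missing flag as untextured is a conservative choice. It may drop some usable assets, but it avoids silently feeding untextured renders into a texture-positive test.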

## Inputs

Rendered with:

`/data/250010098/PAct-Transporter/evaluations/pact_textured_source_preprocess_20260517/render_textured_sdf_to_pact_pyrender.py`

Input root:

`/data/250010098/Unified_dataset/packages/scenesmith_unified_articulated_v0.4_sourcefaithful_scenesmith_format_20260507/artvip_sdf/storage`

Samples:

| sample | source links | visual meshes | source-mask note |
|---|---:|---:|---|
| `839` | 5 | 224 | multiple tiny side/handle labels; hardest case |
| `840` | 4 | 24 | body-dominant mask with a small movable label |
| `842` | 2 | 32 | cleanest two-part mask |

Manifest:

`pact_inputs_textured_source_masks/manifest.json`

## Official PAct Run

Command:

```bash
/data/250010098/conda_envs/trellis2/bin/python /data/250010098/PAct/infer_imgs.py \
  --data_dir pact_inputs_textured_source_masks \
  --outdir inference_official_pact_textured_source_masks \
  --model_path /data/250010098/PAct-Transporter/model_cache/PAct000/PAct \
  --batch_size 1 --export_arti_objects --save_glb \
  --arti_out_mode mean_feature_regression_steps --arti_mean_num 20 \
  --slat_cfg_strength 7.0 --ss_cfg_strength 7.0 \
  --ss_steps 12 --slat_steps 12 --grid_size 3 \
  --no-save_video_grid --no-save_cond_vis_grid \
  --render_num_frames 12 --video_fps 6 \
  --mesh_simplify_ratio 0.95 --texture_size 512
```

Output root:

`inference_official_pact_textured_source_masks/seed42_slatcfg7.0_sscfg7.0_sssteps12_slatsteps12_artioutmean_feature_regression_steps`
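
The run directory name appears to encode the sampler settings. A small helper that rebuilds it makes results from reruns with different settings easier to locate. This sketch assumes that the pattern observed in this single run holds in general. The command above passes no seed, so `42` is presumably the script's default.

```python
def run_dir_name(seed, slat_cfg, ss_cfg, ss_steps, slat_steps, arti_out_mode):
    """Rebuild the per-run output folder name (pattern inferred, not documented)."""
    return (
        f"seed{seed}_slatcfg{slat_cfg}_sscfg{ss_cfg}"
        f"_sssteps{ss_steps}_slatsteps{slat_steps}_artiout{arti_out_mode}"
    )


print(run_dir_name(42, 7.0, 7.0, 12, 12, "mean_feature_regression_steps"))
# seed42_slatcfg7.0_sscfg7.0_sssteps12_slatsteps12_artioutmean_feature_regression_steps
```

Pass the CFG strengths as floats, because `7` and `7.0` format differently.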

## Mesh QA

| sample | exported parts | semantic names | vertices | faces | coarse QA |
|---|---:|---|---:|---:|---|
| `839` | 3 | `base`, `door` | 20368 | 33380 | pass |
| `840` | 2 | `base`, `door` | 13783 | 22453 | pass |
| `842` | 2 | `base`, `door` | 12329 | 19979 | pass |

QA report:

`inference_official_pact_textured_source_masks/qa_report.json`
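
For reference, a coarse per-object check of this kind can be sketched as below. The criteria are at least two parts, a `base` part, at least one non-base part, and no degenerate parts. These criteria and the `PartStats` record are illustrative assumptions, not the checks that produced `qa_report.json`.

```python
from dataclasses import dataclass


@dataclass
class PartStats:
    name: str
    vertices: int
    faces: int


def coarse_qa(parts, min_parts=2):
    """Return a list of failure reasons; an empty list means 'pass'."""
    problems = []
    if len(parts) < min_parts:
        problems.append(f"only {len(parts)} part(s) exported")
    names = [p.name for p in parts]
    if "base" not in names:
        problems.append("no part named 'base'")
    if all(n == "base" for n in names):
        problems.append("no movable (non-base) part")
    for p in parts:
        # A mesh needs at least one triangle (three vertices) to be usable.
        if p.vertices < 3 or p.faces < 1:
            problems.append(f"degenerate part {p.name!r}")
    return problems
```

Returning reasons instead of a bare boolean makes a `fail` in the report self-explanatory.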

## Visual Readout

Comparison sheet:

`textured_source_pact_comparison.png`

The results are materially better than the earlier fragmented outputs obtained from GAPartNet inputs:

- `842` is the cleanest case. The generated object preserves the red suitcase-like body and produces a plausible two-part articulated object.
- `840` keeps a coherent brown storage/suitcase body, although the generated geometry is simplified and does not reproduce all source details.
- `839` is usable but visibly weaker. The central body is recognizable, but the side handles and other small labels become mismatched small dark parts whose articulation is not physically reliable.

## Conclusion

On these three samples, textured source assets plus source-native masks help PAct avoid the worst failure mode, residual fragments. The official model exports complete meshes and plausible simple articulated structures when the conditioning image/mask pair is visually clean and close to PAct's expected input distribution.

But this does not solve the deeper problem by itself. The improvement is strongest for simple two-part suitcase/storage cases. For dense multi-link or tiny-part objects, PAct still tends to simplify, merge, or misplace small articulated components. The more reliable path is therefore:

1. Keep official PAct inference as the geometry/articulation generator.
2. Replace noisy generic 2D segmentation with dataset-native or learned kinematic part masks.
3. Prefer textured/natural RGB conditioning over semantic-color CAD renders.
4. Add a mask-quality gate before PAct: reject tiny-label-heavy masks or merge them according to kinematic semantics before inference.
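
Step 4 can be sketched as a pre-inference gate on the integer part-label mask. This assumes the `.exr` mask has already been decoded into a 2-D integer array with 0 as background. The thresholds (`min_fraction`, `max_tiny`) and the tiny-label-to-parent map passed to `merge_labels` are hypothetical and would need tuning, for example on cases like `839`.

```python
import numpy as np


def mask_quality_gate(labels, min_fraction=0.01, max_tiny=1, background=0):
    """Decide whether a part-label mask is clean enough to send to PAct.

    labels: integer array of per-pixel part ids.
    Returns (accept, tiny_ids), where tiny_ids are the labels covering less
    than `min_fraction` of the foreground (non-background) pixels.
    """
    ids, counts = np.unique(labels[labels != background], return_counts=True)
    if ids.size == 0:
        return False, []  # no foreground at all: nothing to condition on
    fractions = counts / counts.sum()
    tiny_ids = ids[fractions < min_fraction].tolist()
    return len(tiny_ids) <= max_tiny, tiny_ids


def merge_labels(labels, merge_into):
    """Relabel tiny parts into their kinematic parent, e.g. {handle_id: body_id}.

    Each lookup uses the original array, so merges do not chain into each other.
    """
    merged = labels.copy()
    for src, dst in merge_into.items():
        merged[labels == src] = dst
    return merged


# Toy mask: one large body label plus two single-pixel "handle" labels.
mask = np.ones((10, 10), dtype=np.int32)
mask[0, 0], mask[0, 1] = 2, 3
print(mask_quality_gate(mask, min_fraction=0.05))  # (False, [2, 3])
print(mask_quality_gate(merge_labels(mask, {2: 1, 3: 1}), min_fraction=0.05))  # (True, [])
```

Rejecting and merging are complementary. The gate flags masks like `839`'s, and the merge step turns them into the simpler body-plus-movable-part layout that worked best for `842`, provided a kinematic parent is known for each tiny label.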