PAct Official Failure Modes — phenomena run

Dataset audit: True · samples: 5 · mean weighted score: 46.47

manual benchmark_index selection: 19, 22, 27, 35, 72

report.json · report.md

#19 GRScenes / architectural_fixtures

VLM card

score 52.5 · part MAE 0 · joint F1 0.33

skipped

#22 ArtVIP / household_items

VLM card

score 36.1 · part MAE 0 · joint F1 0.00

skipped

#27 ArtVIP / household_items

VLM card

score 55.6 · part MAE 0 · joint F1 0.50

skipped

#35 ArtVIP / household_items

VLM card

score 35.5 · part MAE 0 · joint F1 0.00

skipped

#72 PartNetMobility / major_appliances

VLM card

score 52.6 · part MAE 0 · joint F1 0.22

skipped
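
As a sanity check on the summary line, the per-sample figures above can be re-aggregated by hand. The sketch below is only an illustration: the weighting behind "mean weighted score" is not stated in this report, so it computes a plain unweighted mean, and the variable and function names are hypothetical rather than part of the benchmark tooling.

```python
# Per-sample values copied from the cards above, keyed by benchmark_index.
scores = {19: 52.5, 22: 36.1, 27: 55.6, 35: 35.5, 72: 52.6}
joint_f1 = {19: 0.33, 22: 0.00, 27: 0.50, 35: 0.00, 72: 0.22}


def mean(values):
    """Plain arithmetic mean (assumption: the report's weighting is unknown)."""
    values = list(values)
    return sum(values) / len(values)


unweighted_score = mean(scores.values())
mean_joint_f1 = mean(joint_f1.values())

# Order samples from lowest to highest score to surface the worst failures first.
worst_first = sorted(scores, key=scores.get)

print(f"unweighted mean score: {unweighted_score:.2f}")
print(f"mean joint F1:         {mean_joint_f1:.2f}")
print(f"samples, worst first:  {worst_first}")
```

The unweighted mean of the displayed scores comes to about 46.46, slightly below the reported 46.47. The gap is consistent with rounding of the per-sample scores or with the unstated weighting, but this report does not say which.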