PAct Min-Evidence Span2 Hard-5 Official Defaults

Dataset audit: True · samples: 5 · mean weighted score: 50.81

manual benchmark_index selection: 0, 4, 26, 73, 97

report.json · report.md

#00 ArtVIP / major_appliances

VLM card

score 55.9 · part MAE 0 · joint F1 0.36

skipped

#04 GAPartNet / small_appliances

VLM card

score 68.8 · part MAE 3 · joint F1 0.74

skipped

#26 GAPartNet / electronics

VLM card

score 46.5 · part MAE 1 · joint F1 0.11

skipped

#73 PartNetMobility / electronics

VLM card

score 44.4 · part MAE 0 · joint F1 0.08

skipped

#97 PartNetMobility / major_appliances

VLM card

score 38.5 · part MAE 1 · joint F1 0.00

skipped
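The header reports a mean weighted score of 50.81 but does not say how the samples are weighted. The minimal Python sketch below cross-checks that figure by aggregating the five per-sample rows listed above with equal weight per sample; that equal weighting is an assumption. The dictionary layout, the field names, and the `summarize` helper are hypothetical and are not part of the PAct tooling. The unweighted mean of the rounded scores comes out at 50.82. That matches the reported 50.81 within the rounding of the per-sample scores, each of which is shown to one decimal place.

```python
from statistics import mean

# Per-sample metrics copied from this report, keyed by benchmark_index.
# Field names are hypothetical labels for the values shown in each card.
SAMPLES = {
    0: {"dataset": "ArtVIP", "category": "major_appliances", "score": 55.9, "part_mae": 0, "joint_f1": 0.36},
    4: {"dataset": "GAPartNet", "category": "small_appliances", "score": 68.8, "part_mae": 3, "joint_f1": 0.74},
    26: {"dataset": "GAPartNet", "category": "electronics", "score": 46.5, "part_mae": 1, "joint_f1": 0.11},
    73: {"dataset": "PartNetMobility", "category": "electronics", "score": 44.4, "part_mae": 0, "joint_f1": 0.08},
    97: {"dataset": "PartNetMobility", "category": "major_appliances", "score": 38.5, "part_mae": 1, "joint_f1": 0.00},
}


def summarize(samples):
    """Unweighted aggregates over the listed samples (equal weight per sample is an assumption)."""
    rows = samples.values()  # dict views can be iterated more than once
    return {
        "mean_score": round(mean(r["score"] for r in rows), 2),
        "mean_part_mae": mean(r["part_mae"] for r in rows),
        "mean_joint_f1": round(mean(r["joint_f1"] for r in rows), 3),
        # benchmark_index of the highest- and lowest-scoring sample
        "best_index": max(samples, key=lambda i: samples[i]["score"]),
        "worst_index": min(samples, key=lambda i: samples[i]["score"]),
    }


if __name__ == "__main__":
    print(summarize(SAMPLES))
```

On these five samples, the mean part MAE is 1 and the mean joint F1 is about 0.258. Sample #04 scores highest and #97 scores lowest. If the official weighting is not uniform, replace `mean` with a weighted average over the same rows.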