PAct hard-case OT + VLM exploration

The hard subset is selected from the official 100-sample non-PM diagnostics. The VLM variants use a local image-to-text model, and all variants are compared using the same strict scorer.

hard samples: 11
caption model: microsoft/git-base
raw strict F1: 0.000
best strict F1: 0.000

Variant Summary

variant      strict F1   count error   axis error   tree valid
OT-2D        0.000       2.818         90.00        1.000
OT-Proto     0.000       1.182         90.00        1.000
NonOT-Hier   0.000       1.909         65.45        1.000
VLM-Seg      0.000       2.909         90.00        1.000
VLM-Joint    0.000       1.182         90.00        1.000
VLM-Struct   0.000       1.909         65.45        1.000
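
Since strict F1 is 0.000 for every variant, the secondary columns are the only ones that separate the methods. The sketch below copies the summary rows into Python and groups variants that share the same count error and axis error. The helper names (`group_identical`, `best_by`) are illustrative and not part of the report tooling, and the code assumes that a lower error value is better.

```python
from collections import defaultdict

# Rows copied from the Variant Summary:
# (variant, strict F1, count error, axis error, tree valid)
ROWS = [
    ("OT-2D", 0.000, 2.818, 90.00, 1.000),
    ("OT-Proto", 0.000, 1.182, 90.00, 1.000),
    ("NonOT-Hier", 0.000, 1.909, 65.45, 1.000),
    ("VLM-Seg", 0.000, 2.909, 90.00, 1.000),
    ("VLM-Joint", 0.000, 1.182, 90.00, 1.000),
    ("VLM-Struct", 0.000, 1.909, 65.45, 1.000),
]


def group_identical(rows):
    """Group variant names by their (count error, axis error) pair."""
    groups = defaultdict(list)
    for name, _f1, count_err, axis_err, _valid in rows:
        groups[(count_err, axis_err)].append(name)
    return dict(groups)


def best_by(rows, column):
    """Return the variants with the lowest value in the given column index.

    Assumption: lower is better for the error columns.
    """
    lowest = min(row[column] for row in rows)
    return [row[0] for row in rows if row[column] == lowest]


if __name__ == "__main__":
    for metrics, names in group_identical(ROWS).items():
        print(metrics, names)
    print("lowest count error:", best_by(ROWS, 2))
    print("lowest axis error:", best_by(ROWS, 3))
```

On these rows, OT-Proto and VLM-Joint share one metric pair and NonOT-Hier and VLM-Struct share another, while OT-2D and VLM-Seg differ only in count error.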

Charts

Method buttons below load full transformed mesh GLBs for OT-2D, OT-Proto, NonOT-Hier, VLM-Seg, VLM-Joint, and VLM-Struct. Box proxy GLBs are retained in report.json as proxy_glb.
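
The layout of report.json is not shown on this page, so the following is only a sketch for locating the retained box proxies without assuming a schema. It walks arbitrary nested JSON and collects every value stored under the `proxy_glb` key. The function names and the example structure are hypothetical; only the key name `proxy_glb` and the file name report.json come from the report.

```python
import json
from pathlib import Path


def find_key(node, key, path=""):
    """Yield (path, value) for every occurrence of `key` in nested JSON data."""
    if isinstance(node, dict):
        for k, v in node.items():
            child = f"{path}.{k}" if path else k
            if k == key:
                yield child, v
            # Keep descending in case the key also appears deeper down.
            yield from find_key(v, key, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_key(item, key, f"{path}[{i}]")


def load_proxy_glbs(report_path):
    """Read a report file and list every proxy_glb entry it contains."""
    data = json.loads(Path(report_path).read_text(encoding="utf-8"))
    return list(find_key(data, "proxy_glb"))


if __name__ == "__main__":
    # Made-up structure for illustration only; the real schema may differ.
    example = {
        "samples": [
            {"id": "a", "variants": {"OT-2D": {"proxy_glb": "a.glb"}}},
            {"id": "b", "proxy_glb": "b.glb"},
        ]
    }
    for location, value in find_key(example, "proxy_glb"):
        print(location, "->", value)
```

Searching by key rather than by fixed path keeps the sketch usable whether the proxies are stored per sample, per variant, or both.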

electronics_104011

Printer · caption: this is a vector illustration of a computer screen.

[images: processed image, mask image]
raw F1: 0.000 · best F1: 0.000

electronics_103972

Printer · caption: digital art selected for the #

[images: processed image, mask image]
raw F1: 0.000 · best F1: 0.000

electronics_103867

Printer · caption: digital art selected for the #

[images: processed image, mask image]
raw F1: 0.000 · best F1: 0.000

electronics_103978

Printer · caption: 3d model of a box

[images: processed image, mask image]
raw F1: 0.000 · best F1: 0.000

small_appliances_103043

CoffeeMachine · caption: digital art selected for the #

[images: processed image, mask image]
raw F1: 0.000 · best F1: 0.000

electronics_104020

Printer · caption: digital art selected for the #

[images: processed image, mask image]
raw F1: 0.000 · best F1: 0.000

electronics_103988

Printer · caption: a box of chocolates.................................

[images: processed image, mask image]
raw F1: 0.000 · best F1: 0.000

electronics_103878

Printer · caption: the box for the printer.

[images: processed image, mask image]
raw F1: 0.000 · best F1: 0.000

electronics_104030

Printer · caption: the box in the middle of the road

[images: processed image, mask image]
raw F1: 0.000 · best F1: 0.000

small_appliances_103016

CoffeeMachine · caption: 3d model of a box

[images: processed image, mask image]
raw F1: 0.000 · best F1: 0.000

small_appliances_103466

Toaster · caption: 3d model of a box

[images: processed image, mask image]
raw F1: 0.000 · best F1: 0.000
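
As a rough diagnostic of the captions listed above, the sketch below checks how often a caption names its sample's category and which captions repeat. The sample tuples are copied from this page. The substring matching rule and the helper names are assumptions made for illustration and are unrelated to the strict scorer.

```python
import re
from collections import Counter

# (sample id, category, caption) copied from the per-sample entries above.
SAMPLES = [
    ("electronics_104011", "Printer", "this is a vector illustration of a computer screen."),
    ("electronics_103972", "Printer", "digital art selected for the #"),
    ("electronics_103867", "Printer", "digital art selected for the #"),
    ("electronics_103978", "Printer", "3d model of a box"),
    ("small_appliances_103043", "CoffeeMachine", "digital art selected for the #"),
    ("electronics_104020", "Printer", "digital art selected for the #"),
    ("electronics_103988", "Printer", "a box of chocolates................................."),
    ("electronics_103878", "Printer", "the box for the printer."),
    ("electronics_104030", "Printer", "the box in the middle of the road"),
    ("small_appliances_103016", "CoffeeMachine", "3d model of a box"),
    ("small_appliances_103466", "Toaster", "3d model of a box"),
]


def split_camel(name):
    """Turn a CamelCase label such as 'CoffeeMachine' into 'coffee machine'."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name).lower()


def mentions_category(category, caption):
    """Assumed rule: the lowercased category phrase occurs in the caption."""
    return split_camel(category) in caption.lower()


def category_hits(samples):
    """Return ids of samples whose caption names the sample's category."""
    return [sid for sid, cat, cap in samples if mentions_category(cat, cap)]


def repeated_captions(samples, top=2):
    """Return the most frequent captions with their counts."""
    return Counter(cap for _sid, _cat, cap in samples).most_common(top)


if __name__ == "__main__":
    hits = category_hits(SAMPLES)
    print(f"{len(hits)} of {len(SAMPLES)} captions name the category:", hits)
    print(repeated_captions(SAMPLES))
```

Under this matching rule only electronics_103878 names its category, and two generic captions account for 7 of the 11 samples.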