Scenes: 7
scenesmith / blender_rebuild
hssd_retrieved_bedroom
A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
score 1.000
coverage 83.33%
effective pass 100.00%
pass 10
unknown 2
Scene Summary
Total: 12
Pass: 10
Degraded: 0
Fail: 0
Unknown: 2
Coverage: 83.33%
Eff. Pass: 100.00%
Score: 1.000
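The report does not state how Coverage and Eff. Pass are derived. The sketch below shows one reading that is consistent with the counts above: coverage as the share of checks that received a decided label (anything other than unknown), and effective pass as the share of decided checks that passed. The `SceneCounts` class and both function names are hypothetical, and the Score formula is not reproduced because the report gives no basis for it.

```python
from dataclasses import dataclass


@dataclass
class SceneCounts:
    """Per-scene label counts as shown in the Scene Summary (hypothetical container)."""
    passed: int
    degraded: int
    failed: int
    unknown: int

    @property
    def total(self) -> int:
        return self.passed + self.degraded + self.failed + self.unknown


def coverage(counts: SceneCounts) -> float:
    """Assumed definition: fraction of checks with a decided (non-unknown) label."""
    if counts.total == 0:
        return 0.0
    return (counts.total - counts.unknown) / counts.total


def effective_pass(counts: SceneCounts) -> float:
    """Assumed definition: fraction of decided checks labelled pass."""
    decided = counts.total - counts.unknown
    if decided == 0:
        return 0.0
    return counts.passed / decided


counts = SceneCounts(passed=10, degraded=0, failed=0, unknown=2)
print(f"coverage {coverage(counts):.2%}")              # coverage 83.33%
print(f"effective pass {effective_pass(counts):.2%}")  # effective pass 100.00%
```

Under these assumed definitions, unknown checks lower coverage but do not count against the effective pass rate, which matches the 83.33% and 100.00% figures reported for this scene.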
Judgement
bed · sittable, supportable, containable
VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `interaction_clearance` metric for `bedroom_bed_0` with affordance `sittable`.
Relevant context: bedroom_rug_0, bedroom_nightstand_0.
Check whether the immediate sit/stand zone at the usable sitting side has enough free space for a normal sit/stand motion. Consider the combined effect of nearby furniture around the seat edge; multiple pieces that collectively pinch the sit/stand zone should count as blocked even if no single item fully closes it. Do not require 360-degree circulation around the furniture. For a bed, if at least one bedside remains usable for a normal sit/stand motion, prefer `pass` over `degraded`.
If you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers.
Reason:
In the local check render and the highlighted local_context render, the left bedside is tight against the wall/nightstand side, but the opposite long side of the bed remains open enough for a normal sit/stand motion. In the overview renders, the green bench is nearby but does not visibly intrude into that usable bedside zone, so at least one practical sitting edge remains clear for the bed's sittable use. The rug appears adjacent under/along the bed area but does not itself block the immediate sit/stand sweep zone shown in the local evidence.
Blocking Objects:
none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes.

Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.

Workflow:
1. Read the task instruction and target metric.
2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.
3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.
4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.
5. Evaluate ONLY the requested metric and return a JSON object matching the schema.

Scene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.
Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Metric: interaction_clearance
Subject object: bedroom_bed_0
Target objects: bedroom_rug_0, bedroom_nightstand_0

Verified asset facts:
- subject `bedroom_bed_0`: category `bed`, affordances [sittable, supportable, containable], usable face `top`, benchmark relevance `functional`, confidence `0.74`, source `asset_annotation`
- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [sit, access_contents]; top: [place_or_reach_objects]`
- low-confidence candidates: `none`
Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.

Scene evidence object hints:
- subject `bedroom_bed_0`: category `bed`, group `sleeping`, keywords [bed, bedframe, mattress, sleeping area], affordances [sittable, supportable, containable], task targets [desk, table]
- target `bedroom_rug_0`: category `rug`, group `soft_furnishing`, keywords [rug, carpet, floor mat], task targets [graspable_object]
- target `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object]

Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):
- subject `bedroom_bed_0`: category `bed`, room `bedroom`, placement `floor`, footprint `large`, height `tall`, elevation `floor_level`
Target cues:
- target `bedroom_rug_0`: category `rug`, room `bedroom`, placement `floor`, footprint `large`, height `tall`, elevation `floor_level`, subject_proximity `touching`
- target `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `mid`
Nearby same-room context objects that may affect access or clearance:
- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `medium`
- `bedroom_alarm_clock_0`: category `alarm clock`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `tiny`
- `bedroom_nightstand_1`: category `nightstand`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `small`
- `bedroom_bedroom_bench_0`: category `bench`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`
- `bedroom_jewelry_catchall_dish_0`: category `butter dish`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `tiny`
- `bedroom_recessed_light_1`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `tiny`
Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.

Metric-specific rule:
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `interaction_clearance` metric for `bedroom_bed_0` with affordance `sittable`.
Relevant context: bedroom_rug_0, bedroom_nightstand_0.
Check whether the immediate sit/stand zone at the usable sitting side has enough free space for a normal sit/stand motion. Consider the combined effect of nearby furniture around the seat edge; multiple pieces that collectively pinch the sit/stand zone should count as blocked even if no single item fully closes it. Do not require 360-degree circulation around the furniture. For a bed, if at least one bedside remains usable for a normal sit/stand motion, prefer `pass` over `degraded`.
If you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers.

Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.
Evidence guardrails:
- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.
- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.
- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.
- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.
- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.
Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.
Allowed labels: pass, degraded, fail, unknown.
2D perspective overlap alone is not enough evidence of a blocker.
Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.
If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.
If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
"check_id": "interaction_clearance__bedroom_bed_0__to__bedroom_rug_0__bedroom_nightstand_0",
"metric": "interaction_clearance",
"subject_id": "bedroom_bed_0",
"target_ids": [
"bedroom_rug_0",
"bedroom_nightstand_0"
],
"prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: interaction_clearance\nSubject object: bedroom_bed_0\nTarget objects: bedroom_rug_0, bedroom_nightstand_0\n\nVerified asset facts:\n- subject `bedroom_bed_0`: category `bed`, affordances [sittable, supportable, containable], usable face `top`, benchmark relevance `functional`, confidence `0.74`, source `asset_annotation`\n- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [sit, access_contents]; top: [place_or_reach_objects]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. 
If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_bed_0`: category `bed`, group `sleeping`, keywords [bed, bedframe, mattress, sleeping area], affordances [sittable, supportable, containable], task targets [desk, table]\n- target `bedroom_rug_0`: category `rug`, group `soft_furnishing`, keywords [rug, carpet, floor mat], task targets [graspable_object]\n- target `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object]\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_bed_0`: category `bed`, room `bedroom`, placement `floor`, footprint `large`, height `tall`, elevation `floor_level`\nTarget cues:\n- target `bedroom_rug_0`: category `rug`, room `bedroom`, placement `floor`, footprint `large`, height `tall`, elevation `floor_level`, subject_proximity `touching`\n- target `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `mid`\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `medium`\n- `bedroom_alarm_clock_0`: category `alarm clock`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `tiny`\n- `bedroom_nightstand_1`: category `nightstand`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `small`\n- `bedroom_bedroom_bench_0`: category `bench`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`\n- `bedroom_jewelry_catchall_dish_0`: category `butter dish`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `tiny`\n- 
`bedroom_recessed_light_1`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `tiny`\nDo not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `interaction_clearance` metric for `bedroom_bed_0` with affordance `sittable`.\nRelevant context: bedroom_rug_0, bedroom_nightstand_0.\nCheck whether the immediate sit/stand zone at the usable sitting side has enough free space for a normal sit/stand motion. Consider the combined effect of nearby furniture around the seat edge; multiple pieces that collectively pinch the sit/stand zone should count as blocked even if no single item fully closes it. Do not require 360-degree circulation around the furniture. For a bed, if at least one bedside remains usable for a normal sit/stand motion, prefer `pass` over `degraded`.\nIf you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. 
If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
"files": {
"render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/interaction_clearance__bedroom_bed_0__to__bedroom_rug_0__bedroom_nightstand_0_local.png",
"render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/interaction_clearance__bedroom_bed_0__to__bedroom_rug_0__bedroom_nightstand_0_local_context.png",
"render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_00_rgb.png",
"render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_01_rgb.png",
"render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_02_rgb.png",
"render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_03_rgb.png"
}
}
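The `check_id` and render file names in this report appear to follow a regular pattern: metric, subject, and an optional `__to__` segment listing the targets, with the view name appended for each local render. The sketch below reconstructs that pattern from the identifiers visible in this report only; `build_check_id` and `render_filename` are hypothetical helpers, not functions from the benchmark's code.

```python
from typing import Sequence


def build_check_id(metric: str, subject_id: str, target_ids: Sequence[str] = ()) -> str:
    """Join metric, subject, and optional targets with double underscores.

    Pattern inferred from the check IDs in this report; the real generator may differ.
    """
    check_id = f"{metric}__{subject_id}"
    if target_ids:
        # Targets are introduced by "__to__" and separated by "__".
        check_id += "__to__" + "__".join(target_ids)
    return check_id


def render_filename(check_id: str, view: str) -> str:
    """Assumed file naming for local check renders: <check_id>_<view>.png."""
    return f"{check_id}_{view}.png"


bed_id = build_check_id(
    "interaction_clearance",
    "bedroom_bed_0",
    ["bedroom_rug_0", "bedroom_nightstand_0"],
)
wardrobe_id = build_check_id("interaction_clearance", "bedroom_wardrobe_0")

print(bed_id)
print(render_filename(bed_id, "local_context"))
print(wardrobe_id)
```

A check with no targets, such as the wardrobe check, simply omits the `__to__` segment.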
Result JSON
{
"check_id": "interaction_clearance__bedroom_bed_0__to__bedroom_rug_0__bedroom_nightstand_0",
"metric": "interaction_clearance",
"label": "pass",
"asset_fact_used": true,
"asset_fact_confidence": 0.74,
"evidence_conflict": false,
"reason": "In the local check render and the highlighted local_context render, the left bedside is tight against the wall/nightstand side, but the opposite long side of the bed remains open enough for a normal sit/stand motion. In the overview renders, the green bench is nearby but does not visibly intrude into that usable bedside zone, so at least one practical sitting edge remains clear for the bed's sittable use. The rug appears adjacent under/along the bed area but does not itself block the immediate sit/stand sweep zone shown in the local evidence.",
"blocking_objects": [],
"confidence": 0.76
}
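The prompt lists the keys and labels a result must contain, so a returned object can be checked mechanically before it is counted. The following is a minimal sketch of such a check, assuming the result arrives as a JSON string; `validate_result` is a hypothetical helper, and the non-empty-reason test is only a weak stand-in for the prompt's requirement that a `degraded` or `fail` reason name a direct blocker or constrained region.

```python
import json

# Keys and labels taken from the "Output JSON with keys" and "Allowed labels" lines of the prompt.
REQUIRED_KEYS = {
    "check_id", "metric", "label", "asset_fact_used", "asset_fact_confidence",
    "evidence_conflict", "reason", "blocking_objects", "confidence",
}
ALLOWED_LABELS = {"pass", "degraded", "fail", "unknown"}


def validate_result(raw: str) -> list[str]:
    """Return a list of problems found in a result object; empty means it looks well-formed."""
    result = json.loads(raw)
    problems = []

    missing = REQUIRED_KEYS - set(result)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")

    label = result.get("label")
    if label not in ALLOWED_LABELS:
        problems.append(f"label not allowed: {label!r}")

    # Weak proxy: a degraded/fail verdict must at least carry some reason text.
    if label in {"degraded", "fail"} and not str(result.get("reason", "")).strip():
        problems.append("degraded/fail result has an empty reason")

    return problems


# Values copied from the Result JSON above; the reason is shortened for illustration.
sample = json.dumps({
    "check_id": "interaction_clearance__bedroom_bed_0__to__bedroom_rug_0__bedroom_nightstand_0",
    "metric": "interaction_clearance",
    "label": "pass",
    "asset_fact_used": True,
    "asset_fact_confidence": 0.74,
    "evidence_conflict": False,
    "reason": "The opposite long side of the bed remains open.",
    "blocking_objects": [],
    "confidence": 0.76,
})
print(validate_result(sample))  # []
```

Results that fail this check could be routed to `unknown` rather than dropped, but how the benchmark actually handles malformed output is not stated in the report.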
Judgement
wardrobe · openable, containable, supportable, drawable
VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `interaction_clearance` metric for `bedroom_wardrobe_0` with affordance `openable`.
Relevant context: the surrounding scene.
First infer the most likely operating side from door panels, handles, hinges, or the side that faces into the room. Then check only that side for a normal standing zone and immediate hand/door operating clearance. Do not assume a large hinged swing arc unless the render clearly shows hinged doors needing that arc. Do not require the entire front face to be empty; a nearby bench or seat off to one side is acceptable when the floor directly before the door/handle area remains usable. A narrow diagonal side gap or a squeeze-by path does not count as clear operating space.
If you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. If the operating side cannot be identified from the evidence, return `unknown` instead of guessing.
Reason:
In the local operating zone render and the local operating zone context render, the standing/opening area directly in front of the wardrobe’s room-facing side is shown as open and not occupied by another object. In the local depth probe and local plan context views, the nearby bed/bench elements remain offset to the side and do not span the immediate operating zone in front of the wardrobe door area, so a practical front or slightly offset standing position remains usable for opening it normally. The overview renders are consistent with this corner placement and do not show a direct local blocker at the wardrobe’s operating side.
Blocking Objects:
none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes.

Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.

Workflow:
1. Read the task instruction and target metric.
2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.
3. Local render order and intended use: 1. `local_operating_zone` is the clean RGB view for the most likely door-facing standing and opening zone. 2. `local_operating_zone_context` is a slightly wider highlighted context view of the same operating side. 3. `local_depth_probe` is an oblique RGB view meant to clarify depth and whether a blocker sits in front of the operating zone. 4. `local_plan_context` is a high local layout view used to check whether a nearby object actually spans or occupies the operating/approach zone rather than merely overlapping in perspective.
4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.
5. Evaluate ONLY the requested metric and return a JSON object matching the schema.

Scene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.
Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Metric: interaction_clearance
Subject object: bedroom_wardrobe_0
Target objects: none

Verified asset facts:
- subject `bedroom_wardrobe_0`: category `wardrobe`, affordances [openable, containable, supportable, drawable], usable face `top`, benchmark relevance `functional`, confidence `0.92`, source `asset_annotation`
- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [open, access_contents, open]; top: [place_or_reach_objects]`
- low-confidence candidates: `none`
Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.

Scene evidence object hints:
- subject `bedroom_wardrobe_0`: category `wardrobe`, group `storage`, keywords [wardrobe, closet, armoire], affordances [openable, containable, supportable, drawable], task targets [wall, clear_space]
- targets: none

Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):
- subject `bedroom_wardrobe_0`: category `wardrobe`, room `bedroom`, placement `floor`, footprint `large`, height `mid`, elevation `floor_level`
Target cues:
- none
Nearby same-room context objects that may affect access or clearance:
- `bedroom_smoke_detector_0`: category `smoke detector`, placement `floor`, proximity `mid`, bbox_relation `very_close`, footprint `tiny`
- `bedroom_recessed_light_1`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `very_close`, footprint `tiny`
- `bedroom_rug_0`: category `rug`, placement `floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`
- `bedroom_bed_0`: category `bed`, placement `floor`, proximity `mid`, bbox_relation `very_close`, footprint `large`
- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`
- `bedroom_bedroom_bench_0`: category `bench`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `medium`
Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.

Metric-specific rule:
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `interaction_clearance` metric for `bedroom_wardrobe_0` with affordance `openable`.
Relevant context: the surrounding scene.
First infer the most likely operating side from door panels, handles, hinges, or the side that faces into the room. Then check only that side for a normal standing zone and immediate hand/door operating clearance. Do not assume a large hinged swing arc unless the render clearly shows hinged doors needing that arc. Do not require the entire front face to be empty; a nearby bench or seat off to one side is acceptable when the floor directly before the door/handle area remains usable. A narrow diagonal side gap or a squeeze-by path does not count as clear operating space.
If you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. If the operating side cannot be identified from the evidence, return `unknown` instead of guessing.

Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.
Evidence guardrails:
- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.
- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.
- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.
- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.
- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.
Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.
Allowed labels: pass, degraded, fail, unknown.
2D perspective overlap alone is not enough evidence of a blocker.
Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.
If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.
If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
"check_id": "interaction_clearance__bedroom_wardrobe_0",
"metric": "interaction_clearance",
"subject_id": "bedroom_wardrobe_0",
"target_ids": [],
"prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local_operating_zone` is the clean RGB view for the most likely door-facing standing and opening zone. 2. `local_operating_zone_context` is a slightly wider highlighted context view of the same operating side. 3. `local_depth_probe` is an oblique RGB view meant to clarify depth and whether a blocker sits in front of the operating zone. 4. `local_plan_context` is a high local layout view used to check whether a nearby object actually spans or occupies the operating/approach zone rather than merely overlapping in perspective.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. 
Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: interaction_clearance\nSubject object: bedroom_wardrobe_0\nTarget objects: none\n\nVerified asset facts:\n- subject `bedroom_wardrobe_0`: category `wardrobe`, affordances [openable, containable, supportable, drawable], usable face `top`, benchmark relevance `functional`, confidence `0.92`, source `asset_annotation`\n- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [open, access_contents, open]; top: [place_or_reach_objects]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_wardrobe_0`: category `wardrobe`, group `storage`, keywords [wardrobe, closet, armoire], affordances [openable, containable, supportable, drawable], task targets [wall, clear_space]\n- targets: none\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_wardrobe_0`: category `wardrobe`, room `bedroom`, placement `floor`, footprint `large`, height `mid`, elevation `floor_level`\nTarget cues:\n- none\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_smoke_detector_0`: category `smoke detector`, placement `floor`, proximity `mid`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_recessed_light_1`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_rug_0`: category `rug`, placement `floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`\n- `bedroom_bed_0`: category `bed`, placement `floor`, proximity 
`mid`, bbox_relation `very_close`, footprint `large`\n- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`\n- `bedroom_bedroom_bench_0`: category `bench`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `medium`\nDo not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `interaction_clearance` metric for `bedroom_wardrobe_0` with affordance `openable`.\nRelevant context: the surrounding scene.\nFirst infer the most likely operating side from door panels, handles, hinges, or the side that faces into the room. Then check only that side for a normal standing zone and immediate hand/door operating clearance. Do not assume a large hinged swing arc unless the render clearly shows hinged doors needing that arc. Do not require the entire front face to be empty; a nearby bench or seat off to one side is acceptable when the floor directly before the door/handle area remains usable. A narrow diagonal side gap or a squeeze-by path does not count as clear operating space.\nIf you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. 
If the operating side cannot be identified from the evidence, return `unknown` instead of guessing.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
"files": {
"render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/interaction_clearance__bedroom_wardrobe_0_local_operating_zone.png",
"render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/interaction_clearance__bedroom_wardrobe_0_local_operating_zone_context.png",
"render_check_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/interaction_clearance__bedroom_wardrobe_0_local_depth_probe.png",
"render_check_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/interaction_clearance__bedroom_wardrobe_0_local_plan_context.png",
"render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_00_rgb.png",
"render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_01_rgb.png",
"render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_02_rgb.png",
"render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_03_rgb.png"
}
}
Result JSON
{
"check_id": "interaction_clearance__bedroom_wardrobe_0",
"metric": "interaction_clearance",
"label": "pass",
"asset_fact_used": true,
"asset_fact_confidence": 0.92,
"evidence_conflict": false,
"reason": "In the local operating zone render and the local operating zone context render, the standing/opening area directly in front of the wardrobe’s room-facing side is shown as open and not occupied by another object. In the local depth probe and local plan context views, the nearby bed/bench elements remain offset to the side and do not span the immediate operating zone in front of the wardrobe door area, so a practical front or slightly offset standing position remains usable for opening it normally. The overview renders are consistent with this corner placement and do not show a direct local blocker at the wardrobe’s operating side.",
"blocking_objects": [],
"confidence": 0.82
}
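The prompt above lists the required output keys and the four allowed labels, so a result object like this one can be sanity-checked mechanically. The following is a minimal sketch under that assumption; `validate_result` and its constants are hypothetical helpers written for illustration, not part of the benchmark's own tooling, and the non-empty `reason` test is only a weak proxy for the rule that a `degraded` or `fail` reason must name a blocker or constrained region.

```python
# Hypothetical validator: key names and labels are copied from the prompt text;
# the function itself is an illustrative assumption, not the benchmark's code.
REQUIRED_KEYS = {
    "check_id", "metric", "label", "asset_fact_used", "asset_fact_confidence",
    "evidence_conflict", "reason", "blocking_objects", "confidence",
}
ALLOWED_LABELS = {"pass", "degraded", "fail", "unknown"}


def validate_result(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the result looks well-formed."""
    problems = []
    missing = REQUIRED_KEYS - set(result)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    label = result.get("label")
    if label not in ALLOWED_LABELS:
        problems.append(f"label {label!r} is not allowed")
    # Weak proxy for "the reason must name a direct blocker or constrained region".
    if label in {"degraded", "fail"} and not str(result.get("reason", "")).strip():
        problems.append("degraded/fail needs a reason naming a blocker or zone")
    return problems


# Values taken from the wardrobe result shown above (reason shortened).
wardrobe_result = {
    "check_id": "interaction_clearance__bedroom_wardrobe_0",
    "metric": "interaction_clearance",
    "label": "pass",
    "asset_fact_used": True,
    "asset_fact_confidence": 0.92,
    "evidence_conflict": False,
    "reason": "Operating zone in front of the wardrobe is open.",
    "blocking_objects": [],
    "confidence": 0.82,
}

print(validate_result(wardrobe_result))
```

A check of this kind only covers the shape of the answer; whether the label is correct still depends on the renders.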
Judgement
bench · sittable, supportable, pushable, liftable
VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `interaction_clearance` metric for `bedroom_bedroom_bench_0` with affordance `sittable`.
Relevant context: bedroom_shelf_0.
Check whether the immediate sit/stand zone at the usable sitting side has enough free space for a normal sit/stand motion. Consider the combined effect of nearby furniture around the seat edge; multiple pieces that collectively pinch the sit/stand zone should count as blocked even if no single item fully closes it. Do not require 360-degree circulation around the furniture. For a bed, if at least one bedside remains usable for a normal sit/stand motion, prefer `pass` over `degraded`.
If you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers.
Reason:
In the local render, the bench’s usable long sitting edge has open floor directly in front of it, with no furniture intruding into the immediate sit/stand zone. The highlighted local_context view confirms the red bench is clear of the green wall shelf and other nearby items; the shelf is wall-mounted and does not occupy the bench’s operating space. The overview renders also show the bench beside the bed with a practical open side remaining for normal sit/stand motion, so the sittable affordance remains usable now.
Blocking Objects:
none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes.

Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.

Workflow:
1. Read the task instruction and target metric.
2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.
3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.
4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.
5. Evaluate ONLY the requested metric and return a JSON object matching the schema.

Scene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.
Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Metric: interaction_clearance
Subject object: bedroom_bedroom_bench_0
Target objects: bedroom_shelf_0

Verified asset facts:
- subject `bedroom_bedroom_bench_0`: category `bench`, affordances [sittable, supportable, pushable, liftable], usable face `top`, benchmark relevance `functional`, confidence `0.95`, source `asset_annotation`
- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [sit]; top: [place_or_reach_objects]`
- low-confidence candidates: `none`
Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.

Scene evidence object hints:
- subject `bedroom_bedroom_bench_0`: category `bench`, group `seating`, keywords [bench, seating bench], affordances [sittable, supportable, pushable, liftable], task targets [desk, table]
- target `bedroom_shelf_0`: category `wall shelf`, group `unknown`, keywords [wall shelf], task targets [graspable_object]

Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):
- subject `bedroom_bedroom_bench_0`: category `bench`, room `bedroom`, placement `floor`, footprint `medium`, height `low`, elevation `floor_level`
Target cues:
- target `bedroom_shelf_0`: category `wall shelf`, room `bedroom`, placement `floor`, footprint `large`, height `low`, elevation `raised`, subject_proximity `mid`
Nearby same-room context objects that may affect access or clearance:
- `bedroom_bud_vase_0`: category `vase`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `tiny`
- `bedroom_jewelry_catchall_dish_1`: category `butter dish`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `tiny`
- `bedroom_smoke_detector_0`: category `smoke detector`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `tiny`
- `bedroom_paperback_book_1`: category `book`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `tiny`
- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`
- `bedroom_rug_0`: category `rug`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`
Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.

Metric-specific rule:
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `interaction_clearance` metric for `bedroom_bedroom_bench_0` with affordance `sittable`.
Relevant context: bedroom_shelf_0.
Check whether the immediate sit/stand zone at the usable sitting side has enough free space for a normal sit/stand motion. Consider the combined effect of nearby furniture around the seat edge; multiple pieces that collectively pinch the sit/stand zone should count as blocked even if no single item fully closes it. Do not require 360-degree circulation around the furniture. For a bed, if at least one bedside remains usable for a normal sit/stand motion, prefer `pass` over `degraded`.
If you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers.

Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.
Evidence guardrails:
- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.
- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.
- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.
- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.
- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.
Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.
Allowed labels: pass, degraded, fail, unknown.
2D perspective overlap alone is not enough evidence of a blocker.
Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.
If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.
If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
"check_id": "interaction_clearance__bedroom_bedroom_bench_0__to__bedroom_shelf_0",
"metric": "interaction_clearance",
"subject_id": "bedroom_bedroom_bench_0",
"target_ids": [
"bedroom_shelf_0"
],
"prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: interaction_clearance\nSubject object: bedroom_bedroom_bench_0\nTarget objects: bedroom_shelf_0\n\nVerified asset facts:\n- subject `bedroom_bedroom_bench_0`: category `bench`, affordances [sittable, supportable, pushable, liftable], usable face `top`, benchmark relevance `functional`, confidence `0.95`, source `asset_annotation`\n- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [sit]; top: [place_or_reach_objects]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. 
If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_bedroom_bench_0`: category `bench`, group `seating`, keywords [bench, seating bench], affordances [sittable, supportable, pushable, liftable], task targets [desk, table]\n- target `bedroom_shelf_0`: category `wall shelf`, group `unknown`, keywords [wall shelf], task targets [graspable_object]\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_bedroom_bench_0`: category `bench`, room `bedroom`, placement `floor`, footprint `medium`, height `low`, elevation `floor_level`\nTarget cues:\n- target `bedroom_shelf_0`: category `wall shelf`, room `bedroom`, placement `floor`, footprint `large`, height `low`, elevation `raised`, subject_proximity `mid`\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_bud_vase_0`: category `vase`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `tiny`\n- `bedroom_jewelry_catchall_dish_1`: category `butter dish`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `tiny`\n- `bedroom_smoke_detector_0`: category `smoke detector`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `tiny`\n- `bedroom_paperback_book_1`: category `book`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `tiny`\n- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`\n- `bedroom_rug_0`: category `rug`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`\nDo not estimate exact centimeter clearance from these coarse cues. 
They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `interaction_clearance` metric for `bedroom_bedroom_bench_0` with affordance `sittable`.\nRelevant context: bedroom_shelf_0.\nCheck whether the immediate sit/stand zone at the usable sitting side has enough free space for a normal sit/stand motion. Consider the combined effect of nearby furniture around the seat edge; multiple pieces that collectively pinch the sit/stand zone should count as blocked even if no single item fully closes it. Do not require 360-degree circulation around the furniture. For a bed, if at least one bedside remains usable for a normal sit/stand motion, prefer `pass` over `degraded`.\nIf you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. 
If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
"files": {
"render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/interaction_clearance__bedroom_bedroom_bench_0__to__bedroom_shelf_0_local.png",
"render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/interaction_clearance__bedroom_bedroom_bench_0__to__bedroom_shelf_0_local_context.png",
"render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_00_rgb.png",
"render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_01_rgb.png",
"render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_02_rgb.png",
"render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_03_rgb.png"
}
}
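The `check_id` values and render file names in these requests appear to follow a regular pattern: metric, subject, an optional `__to__` segment listing the targets, and a view suffix on the PNG name. The sketch below reproduces that pattern as inferred from the IDs visible in this report; `build_check_id` and `local_render_name` are hypothetical names, and the benchmark's real generator is not shown here.

```python
# Hypothetical helpers: the naming pattern is inferred from the check IDs and
# file names visible in this report, not taken from the benchmark's source.
def build_check_id(metric: str, subject_id: str, target_ids: list[str]) -> str:
    """Join metric, subject, and optional targets with double underscores."""
    check_id = f"{metric}__{subject_id}"
    if target_ids:
        check_id += "__to__" + "__".join(target_ids)
    return check_id


def local_render_name(check_id: str, view: str) -> str:
    """File name of a local check render, e.g. view='local_context'."""
    return f"{check_id}_{view}.png"


bench_id = build_check_id(
    "interaction_clearance", "bedroom_bedroom_bench_0", ["bedroom_shelf_0"]
)
print(bench_id)
print(local_render_name(bench_id, "local_context"))

# A check without targets drops the "__to__" segment entirely.
print(build_check_id("interaction_clearance", "bedroom_wardrobe_0", []))
```

With several targets, as in the butter-dish check, the target IDs are simply chained with `__` after `__to__`.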
Result JSON
{
"check_id": "interaction_clearance__bedroom_bedroom_bench_0__to__bedroom_shelf_0",
"metric": "interaction_clearance",
"label": "pass",
"asset_fact_used": true,
"asset_fact_confidence": 0.95,
"evidence_conflict": false,
"reason": "In the local render, the bench’s usable long sitting edge has open floor directly in front of it, with no furniture intruding into the immediate sit/stand zone. The highlighted local_context view confirms the red bench is clear of the green wall shelf and other nearby items; the shelf is wall-mounted and does not occupy the bench’s operating space. The overview renders also show the bench beside the bed with a practical open side remaining for normal sit/stand motion, so the sittable affordance remains usable now.",
"blocking_objects": [],
"confidence": 0.9
}
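Each result's `label` feeds the Scene Summary at the top of the report (12 checks, 10 pass, 2 unknown, coverage 83.33%, effective pass 100.00%). The sketch below shows one way those two percentages could be derived. The formulas are an assumption chosen because they reproduce the displayed numbers: coverage as the share of checks with a non-`unknown` label, and effective pass as the share of `pass` among those. With no `degraded` or `fail` labels in this scene, the report does not reveal how those would be weighted, and the `score` field is not modelled here.

```python
from collections import Counter


def summarize(labels: list[str]) -> dict:
    """Roll per-check labels up into summary rates (assumed formulas, see above)."""
    counts = Counter(labels)
    total = len(labels)
    judged = total - counts["unknown"]  # checks that received a definite label
    return {
        "total": total,
        "pass": counts["pass"],
        "degraded": counts["degraded"],
        "fail": counts["fail"],
        "unknown": counts["unknown"],
        "coverage": judged / total if total else 0.0,
        "effective_pass": counts["pass"] / judged if judged else 0.0,
    }


# Label counts taken from the Scene Summary of this report.
labels = ["pass"] * 10 + ["unknown"] * 2
summary = summarize(labels)
print(f"coverage {summary['coverage']:.2%}")
print(f"effective pass {summary['effective_pass']:.2%}")
```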
Judgement
butter dish · containable, openable, graspable
VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `interaction_clearance` metric for `bedroom_jewelry_catchall_dish_0` with affordance `openable`.
Relevant context: bedroom_nightstand_1, bedroom_nightstand_0.
First infer the most likely operating side from door panels, handles, hinges, or the side that faces into the room. Then check only that side for a normal standing zone and immediate hand/door operating clearance. Do not assume a large hinged swing arc unless the render clearly shows hinged doors needing that arc. Do not require the entire front face to be empty; a nearby bench or seat off to one side is acceptable when the floor directly before the door/handle area remains usable. A narrow diagonal side gap or a squeeze-by path does not count as clear operating space.
If you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. If the operating side cannot be identified from the evidence, return `unknown` instead of guessing.
Reason:
In the local operating-zone and local operating-zone-context renders, the subject is only partially visible near the wall, and no clear lid seam, handle, hinge, or other front-facing operating side is exposed. The local depth-probe and local plan-context views show nearby small objects and the nightstand below, but they still do not make the dish's usable opening side identifiable, so the immediate standing/opening zone for the listed openable affordance cannot be judged reliably from the provided views. Because the operating side cannot be identified from the local evidence, the interaction clearance is unknown rather than blocked or clearable by direct evidence alone.
Blocking Objects:
none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes.

Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.

Workflow:
1. Read the task instruction and target metric.
2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.
3. Local render order and intended use: 1. `local_operating_zone` is the clean RGB view for the most likely door-facing standing and opening zone. 2. `local_operating_zone_context` is a slightly wider highlighted context view of the same operating side. 3. `local_depth_probe` is an oblique RGB view meant to clarify depth and whether a blocker sits in front of the operating zone. 4. `local_plan_context` is a high local layout view used to check whether a nearby object actually spans or occupies the operating/approach zone rather than merely overlapping in perspective.
4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.
5. Evaluate ONLY the requested metric and return a JSON object matching the schema.

Scene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.
Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Metric: interaction_clearance
Subject object: bedroom_jewelry_catchall_dish_0
Target objects: bedroom_nightstand_1, bedroom_nightstand_0

Verified asset facts:
- subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, affordances [containable, openable, graspable], usable face `front`, benchmark relevance `functional`, confidence `0.78`, source `asset_annotation`
- interaction facts: placement class `unknown`, access type `none`, interaction height `none`, surface map `front: [access_contents, open, grasp]`
- low-confidence candidates: `none`
Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.

Scene evidence object hints:
- subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, group `unknown`, keywords [butter dish], affordances [containable, openable, graspable], task targets [table, desk, nightstand, coffee_table, shelf, cabinet]
- target `bedroom_nightstand_1`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object]
- target `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object]

Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):
- subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, room `bedroom`, placement `floor`, footprint `tiny`, height `very_low`, elevation `raised`
Target cues:
- target `bedroom_nightstand_1`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `near`
- target `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `far`
Nearby same-room context objects that may affect access or clearance:
- `bedroom_hardcover_book_0`: category `book`, placement `floor`, proximity `touching`, bbox_relation `very_close`, footprint `tiny`
- `bedroom_clear_glass_tumbler_0`: category `tumbler`, placement `floor`, proximity `touching`, bbox_relation `very_close`, footprint `tiny`
- `bedroom_water_carafe_0`: category `carafe`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `small`
- `bedroom_bed_0`: category `bed`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`
- `bedroom_rug_0`: category `rug`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`
- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`
Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.

Metric-specific rule:
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `interaction_clearance` metric for `bedroom_jewelry_catchall_dish_0` with affordance `openable`.
Relevant context: bedroom_nightstand_1, bedroom_nightstand_0.
First infer the most likely operating side from door panels, handles, hinges, or the side that faces into the room. Then check only that side for a normal standing zone and immediate hand/door operating clearance. Do not assume a large hinged swing arc unless the render clearly shows hinged doors needing that arc. Do not require the entire front face to be empty; a nearby bench or seat off to one side is acceptable when the floor directly before the door/handle area remains usable. A narrow diagonal side gap or a squeeze-by path does not count as clear operating space.
If you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. If the operating side cannot be identified from the evidence, return `unknown` instead of guessing.

Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.
Evidence guardrails:
- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.
- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.
- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.
- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.
- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.
Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.
Allowed labels: pass, degraded, fail, unknown.
2D perspective overlap alone is not enough evidence of a blocker.
Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.
If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.
If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
"check_id": "interaction_clearance__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0",
"metric": "interaction_clearance",
"subject_id": "bedroom_jewelry_catchall_dish_0",
"target_ids": [
"bedroom_nightstand_1",
"bedroom_nightstand_0"
],
"prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local_operating_zone` is the clean RGB view for the most likely door-facing standing and opening zone. 2. `local_operating_zone_context` is a slightly wider highlighted context view of the same operating side. 3. `local_depth_probe` is an oblique RGB view meant to clarify depth and whether a blocker sits in front of the operating zone. 4. `local_plan_context` is a high local layout view used to check whether a nearby object actually spans or occupies the operating/approach zone rather than merely overlapping in perspective.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. 
Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: interaction_clearance\nSubject object: bedroom_jewelry_catchall_dish_0\nTarget objects: bedroom_nightstand_1, bedroom_nightstand_0\n\nVerified asset facts:\n- subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, affordances [containable, openable, graspable], usable face `front`, benchmark relevance `functional`, confidence `0.78`, source `asset_annotation`\n- interaction facts: placement class `unknown`, access type `none`, interaction height `none`, surface map `front: [access_contents, open, grasp]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, group `unknown`, keywords [butter dish], affordances [containable, openable, graspable], task targets [table, desk, nightstand, coffee_table, shelf, cabinet]\n- target `bedroom_nightstand_1`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object]\n- target `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object]\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, room `bedroom`, placement `floor`, footprint `tiny`, height `very_low`, elevation `raised`\nTarget cues:\n- target `bedroom_nightstand_1`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `near`\n- 
target `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `far`\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_hardcover_book_0`: category `book`, placement `floor`, proximity `touching`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_clear_glass_tumbler_0`: category `tumbler`, placement `floor`, proximity `touching`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_water_carafe_0`: category `carafe`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `small`\n- `bedroom_bed_0`: category `bed`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`\n- `bedroom_rug_0`: category `rug`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`\n- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`\nDo not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `interaction_clearance` metric for `bedroom_jewelry_catchall_dish_0` with affordance `openable`.\nRelevant context: bedroom_nightstand_1, bedroom_nightstand_0.\nFirst infer the most likely operating side from door panels, handles, hinges, or the side that faces into the room. Then check only that side for a normal standing zone and immediate hand/door operating clearance. Do not assume a large hinged swing arc unless the render clearly shows hinged doors needing that arc. Do not require the entire front face to be empty; a nearby bench or seat off to one side is acceptable when the floor directly before the door/handle area remains usable. 
A narrow diagonal side gap or a squeeze-by path does not count as clear operating space.\nIf you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. If the operating side cannot be identified from the evidence, return `unknown` instead of guessing.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should not override direct 
evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
"files": {
"render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/interaction_clearance__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0_local_operating_zone.png",
"render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/interaction_clearance__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0_local_operating_zone_context.png",
"render_check_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/interaction_clearance__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0_local_depth_probe.png",
"render_check_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/interaction_clearance__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0_local_plan_context.png",
"render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_00_rgb.png",
"render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_01_rgb.png",
"render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_02_rgb.png",
"render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_03_rgb.png"
}
}
Result JSON
{
"check_id": "interaction_clearance__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0",
"metric": "interaction_clearance",
"label": "unknown",
"asset_fact_used": true,
"asset_fact_confidence": 0.78,
"evidence_conflict": false,
"reason": "In the local operating-zone and local operating-zone-context renders, the subject is only partially visible near the wall, and no clear lid seam, handle, hinge, or other front-facing operating side is exposed. The local depth-probe and local plan-context views show nearby small objects and the nightstand below, but they still do not make the dish's usable opening side identifiable, so the immediate standing/opening zone for the listed openable affordance cannot be judged reliably from the provided views. Because the operating side cannot be identified from the local evidence, the interaction clearance is unknown rather than blocked or clearable by direct evidence alone.",
"blocking_objects": [],
"confidence": 0.83
}
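The prompt above fixes the result keys and the four allowed labels, so a result such as this one can be checked mechanically before it is counted. The following is only a minimal sketch: the function name `validate_result` is hypothetical, the `[0, 1]` range for `confidence` is an assumption (the prompt does not state it), and the `reason` text in the example is abbreviated.

```python
# Keys and labels are taken from the prompt text; everything else is illustrative.
REQUIRED_KEYS = {
    "check_id", "metric", "label", "asset_fact_used", "asset_fact_confidence",
    "evidence_conflict", "reason", "blocking_objects", "confidence",
}
ALLOWED_LABELS = {"pass", "degraded", "fail", "unknown"}


def validate_result(result: dict) -> list:
    """Return a list of human-readable problems; an empty list means the result looks well-formed."""
    problems = []
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if result.get("label") not in ALLOWED_LABELS:
        problems.append(f"label not allowed: {result.get('label')!r}")
    if not isinstance(result.get("blocking_objects"), list):
        problems.append("blocking_objects must be a list")
    conf = result.get("confidence")
    # Assumption: confidence is a number in [0, 1]; the prompt does not say so explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        problems.append(f"confidence out of range: {conf!r}")
    return problems


example = {
    "check_id": "interaction_clearance__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0",
    "metric": "interaction_clearance",
    "label": "unknown",
    "asset_fact_used": True,
    "asset_fact_confidence": 0.78,
    "evidence_conflict": False,
    "reason": "(abbreviated) operating side cannot be identified from the local evidence",
    "blocking_objects": [],
    "confidence": 0.83,
}
print(validate_result(example))
```

A check like this catches malformed output only; it cannot tell whether the `reason` really names a direct blocker, which the prompt also requires for `degraded` and `fail`.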
Judgement
nightstand · supportable
VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `affordance_exposure` metric for `bedroom_nightstand_0` with affordance `supportable`.
Relevant context: the surrounding scene.
Check whether the usable top surface is visible enough to judge placement or reach. Do not require all table edges to be exposed.
If you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.
Reason:
In the local_context render, the highlighted subject nightstand is clearly identified beside the bed, and in the overview renders the black nightstand's top surface is visibly exposed with small items resting on it. The usable top support surface is therefore visible enough to judge placement or reach, with no direct blocker obscuring that functional region in the rendered views.
Blocking Objects:
none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes.
Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.
Workflow:
1. Read the task instruction and target metric.
2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.
3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.
4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.
5. Evaluate ONLY the requested metric and return a JSON object matching the schema.
Scene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.
Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Metric: affordance_exposure
Subject object: bedroom_nightstand_0
Target objects: none
Verified asset facts:
- subject `bedroom_nightstand_0`: category `nightstand`, affordances [supportable], usable face `top`, benchmark relevance `functional`, confidence `0.86`, source `asset_annotation`
- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `top: [place_or_reach_objects]`
- low-confidence candidates: `none`
Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.
Scene evidence object hints:
- subject `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], affordances [supportable], task targets [graspable_object]
- targets: none
Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):
- subject `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`
Target cues:
- none
Nearby same-room context objects that may affect access or clearance:
- `bedroom_alarm_clock_0`: category `alarm clock`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`
- `bedroom_paperback_book_0`: category `book`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`
- `bedroom_recessed_light_1`: category `pathway light`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `tiny`
- `bedroom_bed_0`: category `bed`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `large`
- `bedroom_rug_0`: category `rug`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`
- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `far`, bbox_relation `separated`, footprint `medium`
Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.
Metric-specific rule:
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `affordance_exposure` metric for `bedroom_nightstand_0` with affordance `supportable`.
Relevant context: the surrounding scene.
Check whether the usable top surface is visible enough to judge placement or reach. Do not require all table edges to be exposed.
If you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.
Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.
Evidence guardrails:
- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.
- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.
- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.
- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.
- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.
Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.
Allowed labels: pass, degraded, fail, unknown.
2D perspective overlap alone is not enough evidence of a blocker.
Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.
If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.
If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
"check_id": "affordance_exposure__bedroom_nightstand_0",
"metric": "affordance_exposure",
"subject_id": "bedroom_nightstand_0",
"target_ids": [],
"prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: affordance_exposure\nSubject object: bedroom_nightstand_0\nTarget objects: none\n\nVerified asset facts:\n- subject `bedroom_nightstand_0`: category `nightstand`, affordances [supportable], usable face `top`, benchmark relevance `functional`, confidence `0.86`, source `asset_annotation`\n- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `top: [place_or_reach_objects]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. 
If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], affordances [supportable], task targets [graspable_object]\n- targets: none\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`\nTarget cues:\n- none\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_alarm_clock_0`: category `alarm clock`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`\n- `bedroom_paperback_book_0`: category `book`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`\n- `bedroom_recessed_light_1`: category `pathway light`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_bed_0`: category `bed`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `large`\n- `bedroom_rug_0`: category `rug`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`\n- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `far`, bbox_relation `separated`, footprint `medium`\nDo not estimate exact centimeter clearance from these coarse cues. 
They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `affordance_exposure` metric for `bedroom_nightstand_0` with affordance `supportable`.\nRelevant context: the surrounding scene.\nCheck whether the usable top surface is visible enough to judge placement or reach. Do not require all table edges to be exposed.\nIf you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should not 
override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
"files": {
"render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/affordance_exposure__bedroom_nightstand_0_local.png",
"render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/affordance_exposure__bedroom_nightstand_0_local_context.png",
"render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_00_rgb.png",
"render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_01_rgb.png",
"render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_02_rgb.png",
"render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_03_rgb.png"
}
}
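The prompt states that image order is fixed: local check renders first, then overview renders. Going by the key names in the `files` object above, that order can be recovered from the keys alone. This is a sketch under assumptions: the helper name `ordered_image_paths` is hypothetical, it assumes an unsuffixed key means index 0, and the short paths in the example are placeholders rather than the real bundle paths.

```python
import re

# Keys look like render_check_image, render_check_image_01, render_overview_image_02, ...
_KEY_PATTERN = re.compile(r"render_(check|overview)_image(?:_(\d+))?")


def ordered_image_paths(files: dict) -> list:
    """Return image paths with local check renders first, then overview renders, each by index."""
    def sort_key(name: str):
        match = _KEY_PATTERN.fullmatch(name)
        if match is None:
            raise ValueError(f"unexpected file key: {name!r}")
        group = 0 if match.group(1) == "check" else 1
        # Assumption: a key without a numeric suffix is the first image of its group.
        index = int(match.group(2)) if match.group(2) else 0
        return (group, index)

    return [files[name] for name in sorted(files, key=sort_key)]


# Placeholder paths for illustration only.
example_files = {
    "render_overview_image_01": "views/view_01_rgb.png",
    "render_check_image_01": "checks/local_context.png",
    "render_overview_image": "views/view_00_rgb.png",
    "render_check_image": "checks/local.png",
}
print(ordered_image_paths(example_files))
```

Sorting on the parsed key rather than on the path keeps the order stable even if the bundle directories are renamed.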
Result JSON
{
"check_id": "affordance_exposure__bedroom_nightstand_0",
"metric": "affordance_exposure",
"label": "pass",
"asset_fact_used": true,
"asset_fact_confidence": 0.86,
"evidence_conflict": false,
"reason": "In the local_context render, the highlighted subject nightstand is clearly identified beside the bed, and in the overview renders the black nightstand's top surface is visibly exposed with small items resting on it. The usable top support surface is therefore visible enough to judge placement or reach, with no direct blocker obscuring that functional region in the rendered views.",
"blocking_objects": [],
"confidence": 0.89
}
Judgement
nightstand · supportable
VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `affordance_exposure` metric for `bedroom_nightstand_1` with affordance `supportable`.
Relevant context: the surrounding scene.
Check whether the usable top surface is visible enough to judge placement or reach. Do not require all table edges to be exposed.
If you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.
Reason:
In the local render and the highlighted local_context render, the subject nightstand's usable top surface is directly visible from above/side, and the overview renders also show the bedside table surface with items resting on it. The supportable top region is exposed enough to judge object placement or reach, with no direct visual blocker hiding that functional surface from view in the provided images.
Blocking Objects:
none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes.
Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.
Workflow:
1. Read the task instruction and target metric.
2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.
3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.
4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.
5. Evaluate ONLY the requested metric and return a JSON object matching the schema.
Scene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.
Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Metric: affordance_exposure
Subject object: bedroom_nightstand_1
Target objects: none
Verified asset facts:
- subject `bedroom_nightstand_1`: category `nightstand`, affordances [supportable], usable face `top`, benchmark relevance `functional`, confidence `0.86`, source `asset_annotation`
- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `top: [place_or_reach_objects]`
- low-confidence candidates: `none`
Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.
Scene evidence object hints:
- subject `bedroom_nightstand_1`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], affordances [supportable], task targets [graspable_object]
- targets: none
Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):
- subject `bedroom_nightstand_1`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`
Target cues:
- none
Nearby same-room context objects that may affect access or clearance:
- `bedroom_jewelry_catchall_dish_0`: category `butter dish`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`
- `bedroom_water_carafe_0`: category `carafe`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `small`
- `bedroom_clear_glass_tumbler_0`: category `tumbler`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`
- `bedroom_hardcover_book_0`: category `book`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `tiny`
- `bedroom_bed_0`: category `bed`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `large`
- `bedroom_rug_0`: category `rug`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`
Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.
Metric-specific rule:
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `affordance_exposure` metric for `bedroom_nightstand_1` with affordance `supportable`.
Relevant context: the surrounding scene.
Check whether the usable top surface is visible enough to judge placement or reach. Do not require all table edges to be exposed.
If you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.
Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.
Evidence guardrails:
- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.
- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.
- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.
- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.
- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.
Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.
Allowed labels: pass, degraded, fail, unknown.
2D perspective overlap alone is not enough evidence of a blocker.
Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.
If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.
If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
"check_id": "affordance_exposure__bedroom_nightstand_1",
"metric": "affordance_exposure",
"subject_id": "bedroom_nightstand_1",
"target_ids": [],
"prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: affordance_exposure\nSubject object: bedroom_nightstand_1\nTarget objects: none\n\nVerified asset facts:\n- subject `bedroom_nightstand_1`: category `nightstand`, affordances [supportable], usable face `top`, benchmark relevance `functional`, confidence `0.86`, source `asset_annotation`\n- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `top: [place_or_reach_objects]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. 
If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_nightstand_1`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], affordances [supportable], task targets [graspable_object]\n- targets: none\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_nightstand_1`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`\nTarget cues:\n- none\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_jewelry_catchall_dish_0`: category `butter dish`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`\n- `bedroom_water_carafe_0`: category `carafe`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `small`\n- `bedroom_clear_glass_tumbler_0`: category `tumbler`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`\n- `bedroom_hardcover_book_0`: category `book`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_bed_0`: category `bed`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `large`\n- `bedroom_rug_0`: category `rug`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`\nDo not estimate exact centimeter clearance from these coarse cues. 
They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `affordance_exposure` metric for `bedroom_nightstand_1` with affordance `supportable`.\nRelevant context: the surrounding scene.\nCheck whether the usable top surface is visible enough to judge placement or reach. Do not require all table edges to be exposed.\nIf you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should not 
override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
"files": {
"render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/affordance_exposure__bedroom_nightstand_1_local.png",
"render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/affordance_exposure__bedroom_nightstand_1_local_context.png",
"render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_00_rgb.png",
"render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_01_rgb.png",
"render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_02_rgb.png",
"render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_03_rgb.png"
}
}
Result JSON
{
"check_id": "affordance_exposure__bedroom_nightstand_1",
"metric": "affordance_exposure",
"label": "pass",
"asset_fact_used": true,
"asset_fact_confidence": 0.86,
"evidence_conflict": false,
"reason": "In the local render and the highlighted local_context render, the subject nightstand's usable top surface is directly visible from above/side, and the overview renders also show the bedside table surface with items resting on it. The supportable top region is exposed enough to judge object placement or reach, with no direct visual blocker hiding that functional surface from view in the provided images.",
"blocking_objects": [],
"confidence": 0.91
}
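The prompt above lists the keys and the allowed labels that each Result JSON must carry. A minimal sketch of a checker for that contract follows. The function name `validate_result`, the requirement that `degraded` or `fail` results carry a non-empty `reason`, and the [0, 1] range for `confidence` are assumptions made for illustration; they are not taken from the benchmark's own code.

```python
import json

# Keys and labels copied from the "Output JSON with keys" and
# "Allowed labels" sentences in the prompt text.
REQUIRED_KEYS = {
    "check_id", "metric", "label", "asset_fact_used", "asset_fact_confidence",
    "evidence_conflict", "reason", "blocking_objects", "confidence",
}
ALLOWED_LABELS = {"pass", "degraded", "fail", "unknown"}


def validate_result(raw: str) -> list[str]:
    """Return a list of problems in one result JSON string (empty if none)."""
    result = json.loads(raw)
    problems = []

    missing = REQUIRED_KEYS - result.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")

    label = result.get("label")
    if label not in ALLOWED_LABELS:
        problems.append(f"label not allowed: {label!r}")

    # Assumption: a degraded/fail verdict needs a non-empty reason, since the
    # prompt asks the reason to name a blocker or constrained region.
    if label in {"degraded", "fail"} and not str(result.get("reason", "")).strip():
        problems.append("degraded/fail result has an empty reason")

    # Assumption: confidence is a probability-like value in [0, 1].
    confidence = result.get("confidence")
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        problems.append(f"confidence out of range: {confidence!r}")

    return problems


# Usage with the nightstand result shown above (reason abbreviated here).
example = json.dumps({
    "check_id": "affordance_exposure__bedroom_nightstand_1",
    "metric": "affordance_exposure",
    "label": "pass",
    "asset_fact_used": True,
    "asset_fact_confidence": 0.86,
    "evidence_conflict": False,
    "reason": "Top surface is directly visible in the local renders.",
    "blocking_objects": [],
    "confidence": 0.91,
})
print(validate_result(example))
```

A checker like this only confirms the shape of the output; it says nothing about whether the verdict itself is right.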
Judgement
bed · sittable, supportable, containable
VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `spatial_accessibility` metric for `bedroom_bed_0` with affordance `sittable`.
Relevant context: bedroom_rug_0, bedroom_nightstand_0.
Check whether a person can physically walk up to the seating edge and sit from the front or from one genuinely open side. Require one clearly usable approach corridor from open room space all the way to the seat edge itself, not merely to the furniture cluster or to an outer corner of the seat. A standard coffee table in front of a sofa or loveseat is normal unless the local renders show that it closes off the usable seating edge. Consider the combined effect of nearby furniture on the front and side approach zones only when that pinch is visually evident in the local renders. Close adjacency to a bed or wall is acceptable only if one normal sitting approach remains.
If you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers.
Reason:
In the local render and the highlighted local_context render, the bed’s long side facing the green bench remains exposed rather than pressed directly against a wall or another large object. In the overview renders, there is open room floor along that bench-facing side and around the near corner of the bed, so a person can walk up from the open room and reach a usable bed edge to sit. The nearby nightstand is on the bedside end and does not eliminate all practical sitting approaches, and the bench does not clearly close off the full side approach zone in the local evidence.
Blocking Objects:
none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes. Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved. Workflow: 1. Read the task instruction and target metric. 2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction. 3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green. 4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level. 5. Evaluate ONLY the requested metric and return a JSON object matching the schema. Scene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m. Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room. Metric: spatial_accessibility Subject object: bedroom_bed_0 Target objects: bedroom_rug_0, bedroom_nightstand_0 Verified asset facts: - subject `bedroom_bed_0`: category `bed`, affordances [sittable, supportable, containable], usable face `top`, benchmark relevance `functional`, confidence `0.74`, source `asset_annotation` - interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [sit, access_contents]; top: [place_or_reach_objects]` - low-confidence candidates: `none` Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict. 
Scene evidence object hints: - subject `bedroom_bed_0`: category `bed`, group `sleeping`, keywords [bed, bedframe, mattress, sleeping area], affordances [sittable, supportable, containable], task targets [desk, table] - target `bedroom_rug_0`: category `rug`, group `soft_furnishing`, keywords [rug, carpet, floor mat], task targets [graspable_object] - target `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object] Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context): - subject `bedroom_bed_0`: category `bed`, room `bedroom`, placement `floor`, footprint `large`, height `tall`, elevation `floor_level` Target cues: - target `bedroom_rug_0`: category `rug`, room `bedroom`, placement `floor`, footprint `large`, height `tall`, elevation `floor_level`, subject_proximity `touching` - target `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `mid` Nearby same-room context objects that may affect access or clearance: - `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `medium` - `bedroom_alarm_clock_0`: category `alarm clock`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `tiny` - `bedroom_nightstand_1`: category `nightstand`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `small` - `bedroom_bedroom_bench_0`: category `bench`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium` - `bedroom_jewelry_catchall_dish_0`: category `butter dish`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `tiny` - `bedroom_recessed_light_1`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `tiny` Do 
not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence. Metric-specific rule: Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room. Evaluate only the `spatial_accessibility` metric for `bedroom_bed_0` with affordance `sittable`. Relevant context: bedroom_rug_0, bedroom_nightstand_0. Check whether a person can physically walk up to the seating edge and sit from the front or from one genuinely open side. Require one clearly usable approach corridor from open room space all the way to the seat edge itself, not merely to the furniture cluster or to an outer corner of the seat. A standard coffee table in front of a sofa or loveseat is normal unless the local renders show that it closes off the usable seating edge. Consider the combined effect of nearby furniture on the front and side approach zones only when that pinch is visually evident in the local renders. Close adjacency to a bed or wall is acceptable only if one normal sitting approach remains. If you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers. Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green. Evidence guardrails: - Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space. 
- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete. - To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap. - When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`. - If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`. Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence. Allowed labels: pass, degraded, fail, unknown. 2D perspective overlap alone is not enough evidence of a blocker. Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region. If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region. If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
"check_id": "spatial_accessibility__bedroom_bed_0__to__bedroom_rug_0__bedroom_nightstand_0",
"metric": "spatial_accessibility",
"subject_id": "bedroom_bed_0",
"target_ids": [
"bedroom_rug_0",
"bedroom_nightstand_0"
],
"prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: spatial_accessibility\nSubject object: bedroom_bed_0\nTarget objects: bedroom_rug_0, bedroom_nightstand_0\n\nVerified asset facts:\n- subject `bedroom_bed_0`: category `bed`, affordances [sittable, supportable, containable], usable face `top`, benchmark relevance `functional`, confidence `0.74`, source `asset_annotation`\n- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [sit, access_contents]; top: [place_or_reach_objects]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. 
If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_bed_0`: category `bed`, group `sleeping`, keywords [bed, bedframe, mattress, sleeping area], affordances [sittable, supportable, containable], task targets [desk, table]\n- target `bedroom_rug_0`: category `rug`, group `soft_furnishing`, keywords [rug, carpet, floor mat], task targets [graspable_object]\n- target `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object]\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_bed_0`: category `bed`, room `bedroom`, placement `floor`, footprint `large`, height `tall`, elevation `floor_level`\nTarget cues:\n- target `bedroom_rug_0`: category `rug`, room `bedroom`, placement `floor`, footprint `large`, height `tall`, elevation `floor_level`, subject_proximity `touching`\n- target `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `mid`\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `medium`\n- `bedroom_alarm_clock_0`: category `alarm clock`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `tiny`\n- `bedroom_nightstand_1`: category `nightstand`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `small`\n- `bedroom_bedroom_bench_0`: category `bench`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`\n- `bedroom_jewelry_catchall_dish_0`: category `butter dish`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `tiny`\n- 
`bedroom_recessed_light_1`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `tiny`\nDo not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `spatial_accessibility` metric for `bedroom_bed_0` with affordance `sittable`.\nRelevant context: bedroom_rug_0, bedroom_nightstand_0.\nCheck whether a person can physically walk up to the seating edge and sit from the front or from one genuinely open side. Require one clearly usable approach corridor from open room space all the way to the seat edge itself, not merely to the furniture cluster or to an outer corner of the seat. A standard coffee table in front of a sofa or loveseat is normal unless the local renders show that it closes off the usable seating edge. Consider the combined effect of nearby furniture on the front and side approach zones only when that pinch is visually evident in the local renders. Close adjacency to a bed or wall is acceptable only if one normal sitting approach remains.\nIf you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. 
If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
"files": {
"render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/spatial_accessibility__bedroom_bed_0__to__bedroom_rug_0__bedroom_nightstand_0_local.png",
"render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/spatial_accessibility__bedroom_bed_0__to__bedroom_rug_0__bedroom_nightstand_0_local_context.png",
"render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_00_rgb.png",
"render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_01_rgb.png",
"render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_02_rgb.png",
"render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_03_rgb.png"
}
}
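The request records in this log appear to follow a naming pattern: the `check_id` joins the metric, the subject, and any targets with double underscores (with a `to` segment before the targets), and the `files` map derives its paths from that identifier. The sketch below reproduces the pattern as observed in these records only. The helper names `build_check_id` and `build_files` and the `bundle_dir` parameter are hypothetical, and the real pipeline may build these values differently.

```python
def build_check_id(metric: str, subject_id: str, target_ids: list[str]) -> str:
    """Join metric, subject, and optional targets the way the logged IDs look."""
    parts = [metric, subject_id]
    if target_ids:
        # Observed pattern: "<metric>__<subject>__to__<target>__<target>".
        parts.append("to")
        parts.extend(target_ids)
    return "__".join(parts)


def build_files(bundle_dir: str, check_id: str, local_views: list[str],
                n_overviews: int = 4) -> dict[str, str]:
    """Build a `files` map shaped like the one in the request JSON."""
    files = {}
    for i, view in enumerate(local_views):
        # The first key has no suffix; later keys get a two-digit suffix.
        key = "render_check_image" if i == 0 else f"render_check_image_{i:02d}"
        files[key] = f"{bundle_dir}/checks/{check_id}_{view}.png"
    for i in range(n_overviews):
        key = "render_overview_image" if i == 0 else f"render_overview_image_{i:02d}"
        files[key] = f"{bundle_dir}/views/view_{i:02d}_rgb.png"
    return files


check_id = build_check_id(
    "spatial_accessibility",
    "bedroom_bed_0",
    ["bedroom_rug_0", "bedroom_nightstand_0"],
)
files = build_files("render_bundle", check_id, ["local", "local_context"])
print(check_id)
print(files["render_check_image_01"])
```

With an empty target list the same helper yields an identifier without the `to` segment, which matches the nightstand check earlier in the log.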
Result JSON
{
"check_id": "spatial_accessibility__bedroom_bed_0__to__bedroom_rug_0__bedroom_nightstand_0",
"metric": "spatial_accessibility",
"label": "pass",
"asset_fact_used": true,
"asset_fact_confidence": 0.74,
"evidence_conflict": false,
"reason": "In the local render and the highlighted local_context render, the bed’s long side facing the green bench remains exposed rather than pressed directly against a wall or another large object. In the overview renders, there is open room floor along that bench-facing side and around the near corner of the bed, so a person can walk up from the open room and reach a usable bed edge to sit. The nearby nightstand is on the bedside end and does not eliminate all practical sitting approaches, and the bench does not clearly close off the full side approach zone in the local evidence.",
"blocking_objects": [],
"confidence": 0.78
}
Judgement
wardrobe · openable, containable, supportable, drawable
VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `spatial_accessibility` metric for `bedroom_wardrobe_0` with affordance `openable`.
Relevant context: the surrounding scene.
Check whether at least one practical front-facing or slightly offset standing position can reach the operating side so the object can be opened. A nearby bed corner or bench should not cause degraded by itself if a normal reach from the front region is still possible. Use degraded only when that front-region approach is directly pinched, occupied, or effectively unusable across the usable operating side. Do not count a cramped diagonal squeeze, shoulder-turned sidestep, or tiny side sliver as a valid approach path.
If you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. If the operating side cannot be identified from the evidence, return `unknown` instead of guessing.
Reason:
In the local_operating_zone_context render and the local_plan_context render, the standing/approach zone directly in front of the wardrobe’s operating side is shown as a broad clear area rather than being occupied by the bed or bench. The local_depth_probe also does not show a solid object spanning that front approach region. The overview renders confirm the wardrobe is in a corner, but at least one practical front-facing or slightly offset standing position remains available to reach and open it normally. Therefore the wardrobe remains spatially accessible for opening now.
Blocking Objects:
none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes. Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved. Workflow: 1. Read the task instruction and target metric. 2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction. 3. Local render order and intended use: 1. `local_operating_zone` is the clean RGB view for the most likely door-facing standing and opening zone. 2. `local_operating_zone_context` is a slightly wider highlighted context view of the same operating side. 3. `local_depth_probe` is an oblique RGB view meant to clarify depth and whether a blocker sits in front of the operating zone. 4. `local_plan_context` is a high local layout view used to check whether a nearby object actually spans or occupies the operating/approach zone rather than merely overlapping in perspective. 4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level. 5. Evaluate ONLY the requested metric and return a JSON object matching the schema. Scene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m. Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room. 
Metric: spatial_accessibility Subject object: bedroom_wardrobe_0 Target objects: none Verified asset facts: - subject `bedroom_wardrobe_0`: category `wardrobe`, affordances [openable, containable, supportable, drawable], usable face `top`, benchmark relevance `functional`, confidence `0.92`, source `asset_annotation` - interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [open, access_contents, open]; top: [place_or_reach_objects]` - low-confidence candidates: `none` Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict. Scene evidence object hints: - subject `bedroom_wardrobe_0`: category `wardrobe`, group `storage`, keywords [wardrobe, closet, armoire], affordances [openable, containable, supportable, drawable], task targets [wall, clear_space] - targets: none Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context): - subject `bedroom_wardrobe_0`: category `wardrobe`, room `bedroom`, placement `floor`, footprint `large`, height `mid`, elevation `floor_level` Target cues: - none Nearby same-room context objects that may affect access or clearance: - `bedroom_smoke_detector_0`: category `smoke detector`, placement `floor`, proximity `mid`, bbox_relation `very_close`, footprint `tiny` - `bedroom_recessed_light_1`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `very_close`, footprint `tiny` - `bedroom_rug_0`: category `rug`, placement `floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large` - `bedroom_bed_0`: category `bed`, placement `floor`, proximity `mid`, bbox_relation `very_close`, footprint `large` - `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation 
`separated`, footprint `medium` - `bedroom_bedroom_bench_0`: category `bench`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `medium` Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence. Metric-specific rule: Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room. Evaluate only the `spatial_accessibility` metric for `bedroom_wardrobe_0` with affordance `openable`. Relevant context: the surrounding scene. Check whether at least one practical front-facing or slightly offset standing position can reach the operating side so the object can be opened. A nearby bed corner or bench should not cause degraded by itself if a normal reach from the front region is still possible. Use degraded only when that front-region approach is directly pinched, occupied, or effectively unusable across the usable operating side. Do not count a cramped diagonal squeeze, shoulder-turned sidestep, or tiny side sliver as a valid approach path. If you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. If the operating side cannot be identified from the evidence, return `unknown` instead of guessing. Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green. 
Evidence guardrails: - Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space. - Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete. - To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap. - When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`. - If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`. Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence. Allowed labels: pass, degraded, fail, unknown. 2D perspective overlap alone is not enough evidence of a blocker. Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region. If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region. If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
"check_id": "spatial_accessibility__bedroom_wardrobe_0",
"metric": "spatial_accessibility",
"subject_id": "bedroom_wardrobe_0",
"target_ids": [],
"prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local_operating_zone` is the clean RGB view for the most likely door-facing standing and opening zone. 2. `local_operating_zone_context` is a slightly wider highlighted context view of the same operating side. 3. `local_depth_probe` is an oblique RGB view meant to clarify depth and whether a blocker sits in front of the operating zone. 4. `local_plan_context` is a high local layout view used to check whether a nearby object actually spans or occupies the operating/approach zone rather than merely overlapping in perspective.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. 
Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: spatial_accessibility\nSubject object: bedroom_wardrobe_0\nTarget objects: none\n\nVerified asset facts:\n- subject `bedroom_wardrobe_0`: category `wardrobe`, affordances [openable, containable, supportable, drawable], usable face `top`, benchmark relevance `functional`, confidence `0.92`, source `asset_annotation`\n- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [open, access_contents, open]; top: [place_or_reach_objects]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_wardrobe_0`: category `wardrobe`, group `storage`, keywords [wardrobe, closet, armoire], affordances [openable, containable, supportable, drawable], task targets [wall, clear_space]\n- targets: none\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_wardrobe_0`: category `wardrobe`, room `bedroom`, placement `floor`, footprint `large`, height `mid`, elevation `floor_level`\nTarget cues:\n- none\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_smoke_detector_0`: category `smoke detector`, placement `floor`, proximity `mid`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_recessed_light_1`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_rug_0`: category `rug`, placement `floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`\n- `bedroom_bed_0`: category `bed`, placement `floor`, proximity 
`mid`, bbox_relation `very_close`, footprint `large`\n- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`\n- `bedroom_bedroom_bench_0`: category `bench`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `medium`\nDo not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `spatial_accessibility` metric for `bedroom_wardrobe_0` with affordance `openable`.\nRelevant context: the surrounding scene.\nCheck whether at least one practical front-facing or slightly offset standing position can reach the operating side so the object can be opened. A nearby bed corner or bench should not cause degraded by itself if a normal reach from the front region is still possible. Use degraded only when that front-region approach is directly pinched, occupied, or effectively unusable across the usable operating side. Do not count a cramped diagonal squeeze, shoulder-turned sidestep, or tiny side sliver as a valid approach path.\nIf you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. If the operating side cannot be identified from the evidence, return `unknown` instead of guessing.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. 
If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
"files": {
"render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/spatial_accessibility__bedroom_wardrobe_0_local_operating_zone.png",
"render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/spatial_accessibility__bedroom_wardrobe_0_local_operating_zone_context.png",
"render_check_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/spatial_accessibility__bedroom_wardrobe_0_local_depth_probe.png",
"render_check_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/spatial_accessibility__bedroom_wardrobe_0_local_plan_context.png",
"render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_00_rgb.png",
"render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_01_rgb.png",
"render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_02_rgb.png",
"render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_03_rgb.png"
}
}
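The prompt states that local check renders always come first, followed by the overview renders. A minimal sketch of how a client might derive that order from the `files` map of a request is shown here. The function name `ordered_images` is hypothetical, and the assumption that an unsuffixed key counts as index 0 is inferred from the key names in this log, not from the benchmark's own code.

```python
import re


def ordered_images(files: dict[str, str]) -> list[str]:
    """Return image paths with local check renders first, then overview renders."""

    def index(key: str) -> int:
        # Assumption: "render_check_image" is index 0, "render_check_image_01" is index 1, etc.
        match = re.search(r"_(\d+)$", key)
        return int(match.group(1)) if match else 0

    checks = sorted((k for k in files if k.startswith("render_check_image")), key=index)
    overviews = sorted((k for k in files if k.startswith("render_overview_image")), key=index)
    return [files[k] for k in checks + overviews]


if __name__ == "__main__":
    # Shortened, illustrative paths; the real log uses absolute paths.
    example = {
        "render_overview_image_01": "views/view_01_rgb.png",
        "render_check_image_01": "checks/local_context.png",
        "render_overview_image": "views/view_00_rgb.png",
        "render_check_image": "checks/local.png",
    }
    for path in ordered_images(example):
        print(path)
```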
Result JSON
{
"check_id": "spatial_accessibility__bedroom_wardrobe_0",
"metric": "spatial_accessibility",
"label": "pass",
"asset_fact_used": true,
"asset_fact_confidence": 0.92,
"evidence_conflict": false,
"reason": "In the local_operating_zone_context render and the local_plan_context render, the standing/approach zone directly in front of the wardrobe’s operating side is shown as a broad clear area rather than being occupied by the bed or bench. The local_depth_probe also does not show a solid object spanning that front approach region. The overview renders confirm the wardrobe is in a corner, but at least one practical front-facing or slightly offset standing position remains available to reach and open it normally. Therefore the wardrobe remains spatially accessible for opening now.",
"blocking_objects": [],
"confidence": 0.89
}
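The prompt fixes the result keys and the allowed labels, so a result such as the one above can be checked mechanically before it is scored. The following is a sketch only: `validate_result` is a hypothetical helper, and the requirement that `confidence` lie in [0, 1] is an assumption based on the values seen in this log.

```python
import json

# Keys and labels copied from the "Output JSON with keys" and "Allowed labels" lines of the prompt.
REQUIRED_KEYS = {
    "check_id", "metric", "label", "asset_fact_used", "asset_fact_confidence",
    "evidence_conflict", "reason", "blocking_objects", "confidence",
}
ALLOWED_LABELS = {"pass", "degraded", "fail", "unknown"}


def validate_result(raw: str) -> list[str]:
    """Return a list of problems found in a result JSON string (empty if none)."""
    result = json.loads(raw)
    problems = []

    missing = REQUIRED_KEYS - result.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")

    if result.get("label") not in ALLOWED_LABELS:
        problems.append(f"label not allowed: {result.get('label')!r}")

    # Assumption: confidence is a number in [0, 1]; the prompt does not state the range.
    conf = result.get("confidence")
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0 <= conf <= 1:
        problems.append(f"confidence out of range or not numeric: {conf!r}")

    return problems
```

An empty list means the result matches the stated schema; anything else can be logged next to the check ID for review.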
Judgement
nightstand · supportable
VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `spatial_accessibility` metric for `bedroom_nightstand_0` with affordance `supportable`.
Relevant context: the surrounding scene.
Check whether a person can approach the room-facing side and reach the usable top surface. Do not require circulation on all sides.
If you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.
Reason:
In the local_context render, the highlighted nightstand sits beside the bed but its room-facing side still has visible open floor, and no direct blocker occupies the immediate standing/reach area needed to approach the top surface. The overview renders also show the nightstand as a normal bedside placement rather than sealed off by other furniture, so the top remains realistically approachable for use now.
Blocking Objects:
none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes.

Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.

Workflow:
1. Read the task instruction and target metric.
2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.
3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.
4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.
5. Evaluate ONLY the requested metric and return a JSON object matching the schema.

Scene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.
Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Metric: spatial_accessibility
Subject object: bedroom_nightstand_0
Target objects: none

Verified asset facts:
- subject `bedroom_nightstand_0`: category `nightstand`, affordances [supportable], usable face `top`, benchmark relevance `functional`, confidence `0.86`, source `asset_annotation`
- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `top: [place_or_reach_objects]`
- low-confidence candidates: `none`
Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.

Scene evidence object hints:
- subject `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], affordances [supportable], task targets [graspable_object]
- targets: none

Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):
- subject `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`
Target cues:
- none
Nearby same-room context objects that may affect access or clearance:
- `bedroom_alarm_clock_0`: category `alarm clock`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`
- `bedroom_paperback_book_0`: category `book`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`
- `bedroom_recessed_light_1`: category `pathway light`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `tiny`
- `bedroom_bed_0`: category `bed`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `large`
- `bedroom_rug_0`: category `rug`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`
- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `far`, bbox_relation `separated`, footprint `medium`
Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.

Metric-specific rule:
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `spatial_accessibility` metric for `bedroom_nightstand_0` with affordance `supportable`.
Relevant context: the surrounding scene.
Check whether a person can approach the room-facing side and reach the usable top surface. Do not require circulation on all sides.
If you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.

Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.
Evidence guardrails:
- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.
- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.
- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.
- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.
- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.
Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.
Allowed labels: pass, degraded, fail, unknown.
2D perspective overlap alone is not enough evidence of a blocker.
Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.
If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.
If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
"check_id": "spatial_accessibility__bedroom_nightstand_0",
"metric": "spatial_accessibility",
"subject_id": "bedroom_nightstand_0",
"target_ids": [],
"prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: spatial_accessibility\nSubject object: bedroom_nightstand_0\nTarget objects: none\n\nVerified asset facts:\n- subject `bedroom_nightstand_0`: category `nightstand`, affordances [supportable], usable face `top`, benchmark relevance `functional`, confidence `0.86`, source `asset_annotation`\n- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `top: [place_or_reach_objects]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. 
If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], affordances [supportable], task targets [graspable_object]\n- targets: none\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`\nTarget cues:\n- none\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_alarm_clock_0`: category `alarm clock`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`\n- `bedroom_paperback_book_0`: category `book`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`\n- `bedroom_recessed_light_1`: category `pathway light`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_bed_0`: category `bed`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `large`\n- `bedroom_rug_0`: category `rug`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`\n- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `far`, bbox_relation `separated`, footprint `medium`\nDo not estimate exact centimeter clearance from these coarse cues. 
They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `spatial_accessibility` metric for `bedroom_nightstand_0` with affordance `supportable`.\nRelevant context: the surrounding scene.\nCheck whether a person can approach the room-facing side and reach the usable top surface. Do not require circulation on all sides.\nIf you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should not 
override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
"files": {
"render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/spatial_accessibility__bedroom_nightstand_0_local.png",
"render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/spatial_accessibility__bedroom_nightstand_0_local_context.png",
"render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_00_rgb.png",
"render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_01_rgb.png",
"render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_02_rgb.png",
"render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_03_rgb.png"
}
}
Result JSON
{
"check_id": "spatial_accessibility__bedroom_nightstand_0",
"metric": "spatial_accessibility",
"label": "pass",
"asset_fact_used": true,
"asset_fact_confidence": 0.86,
"evidence_conflict": false,
"reason": "In the local_context render, the highlighted nightstand sits beside the bed but its room-facing side still has visible open floor, and no direct blocker occupies the immediate standing/reach area needed to approach the top surface. The overview renders also show the nightstand as a normal bedside placement rather than sealed off by other furniture, so the top remains realistically approachable for use now.",
"blocking_objects": [],
"confidence": 0.84
}
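The guardrails require that a `degraded` or `fail` verdict name a direct blocker or a directly constrained functional region. One way a reviewer might surface results that deserve a second look is sketched below. `needs_blocker_review` is a hypothetical helper, and an empty `blocking_objects` list is only a hint, since the reason text may name a constrained region instead of an object.

```python
def needs_blocker_review(result: dict) -> bool:
    """Flag degraded/fail results that list no blocking objects.

    This is a heuristic, not a rule from the benchmark: such a result can
    still be valid if its reason names a directly constrained region.
    """
    negative = result.get("label") in {"degraded", "fail"}
    return negative and not result.get("blocking_objects")


if __name__ == "__main__":
    # Fields reduced to the two that matter here; values mirror the result above.
    print(needs_blocker_review({"label": "pass", "blocking_objects": []}))
```

A `pass` result such as the one above is never flagged, whatever its `blocking_objects` list contains.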
Judgement
nightstand · supportable
VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `spatial_accessibility` metric for `bedroom_nightstand_1` with affordance `supportable`.
Relevant context: the surrounding scene.
Check whether a person can approach the room-facing side and reach the usable top surface. Do not require circulation on all sides.
If you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.
Reason:
In the local context render, the nightstand’s room-facing side is visible beside the bed with open standing space in front of it, and no direct object blocks approach to the top surface. The overview renders also show the nightstand positioned in a normal bedside location with accessible floor area on the room side; the nearby bed adjacency is normal and does not eliminate approach access here.
Blocking Objects:
none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes.

Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.

Workflow:
1. Read the task instruction and target metric.
2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.
3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.
4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.
5. Evaluate ONLY the requested metric and return a JSON object matching the schema.

Scene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.
Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Metric: spatial_accessibility
Subject object: bedroom_nightstand_1
Target objects: none

Verified asset facts:
- subject `bedroom_nightstand_1`: category `nightstand`, affordances [supportable], usable face `top`, benchmark relevance `functional`, confidence `0.86`, source `asset_annotation`
- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `top: [place_or_reach_objects]`
- low-confidence candidates: `none`
Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.

Scene evidence object hints:
- subject `bedroom_nightstand_1`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], affordances [supportable], task targets [graspable_object]
- targets: none

Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):
- subject `bedroom_nightstand_1`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`
Target cues:
- none
Nearby same-room context objects that may affect access or clearance:
- `bedroom_jewelry_catchall_dish_0`: category `butter dish`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`
- `bedroom_water_carafe_0`: category `carafe`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `small`
- `bedroom_clear_glass_tumbler_0`: category `tumbler`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`
- `bedroom_hardcover_book_0`: category `book`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `tiny`
- `bedroom_bed_0`: category `bed`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `large`
- `bedroom_rug_0`: category `rug`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`
Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.

Metric-specific rule:
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `spatial_accessibility` metric for `bedroom_nightstand_1` with affordance `supportable`.
Relevant context: the surrounding scene.
Check whether a person can approach the room-facing side and reach the usable top surface. Do not require circulation on all sides.
If you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.

Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.
Evidence guardrails:
- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.
- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.
- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.
- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.
- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.
Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.
Allowed labels: pass, degraded, fail, unknown.
2D perspective overlap alone is not enough evidence of a blocker.
Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.
If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.
If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
"check_id": "spatial_accessibility__bedroom_nightstand_1",
"metric": "spatial_accessibility",
"subject_id": "bedroom_nightstand_1",
"target_ids": [],
"prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: spatial_accessibility\nSubject object: bedroom_nightstand_1\nTarget objects: none\n\nVerified asset facts:\n- subject `bedroom_nightstand_1`: category `nightstand`, affordances [supportable], usable face `top`, benchmark relevance `functional`, confidence `0.86`, source `asset_annotation`\n- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `top: [place_or_reach_objects]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. 
If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_nightstand_1`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], affordances [supportable], task targets [graspable_object]\n- targets: none\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_nightstand_1`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`\nTarget cues:\n- none\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_jewelry_catchall_dish_0`: category `butter dish`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`\n- `bedroom_water_carafe_0`: category `carafe`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `small`\n- `bedroom_clear_glass_tumbler_0`: category `tumbler`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`\n- `bedroom_hardcover_book_0`: category `book`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_bed_0`: category `bed`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `large`\n- `bedroom_rug_0`: category `rug`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`\nDo not estimate exact centimeter clearance from these coarse cues. 
They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `spatial_accessibility` metric for `bedroom_nightstand_1` with affordance `supportable`.\nRelevant context: the surrounding scene.\nCheck whether a person can approach the room-facing side and reach the usable top surface. Do not require circulation on all sides.\nIf you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should not 
override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
"files": {
"render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/spatial_accessibility__bedroom_nightstand_1_local.png",
"render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/spatial_accessibility__bedroom_nightstand_1_local_context.png",
"render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_00_rgb.png",
"render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_01_rgb.png",
"render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_02_rgb.png",
"render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_03_rgb.png"
}
}
Result JSON
{
"check_id": "spatial_accessibility__bedroom_nightstand_1",
"metric": "spatial_accessibility",
"label": "pass",
"asset_fact_used": true,
"asset_fact_confidence": 0.86,
"evidence_conflict": false,
"reason": "In the local context render, the nightstand’s room-facing side is visible beside the bed with open standing space in front of it, and no direct object blocks approach to the top surface. The overview renders also show the nightstand positioned in a normal bedside location with accessible floor area on the room side; the nearby bed adjacency is normal and does not eliminate approach access here.",
"blocking_objects": [],
"confidence": 0.78
}
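The prompt above constrains the result object to a fixed key set and four allowed labels. A minimal sketch of a checker for those constraints follows. The function name `validate_result` and the assumption that `confidence` lies in [0, 1] are illustrative and not confirmed by this report. The prompt only requires that a `degraded` or `fail` reason name a blocker or a constrained region, so the sketch checks just that the reason is non-empty.

```python
# Keys and labels are copied from the "Output JSON with keys" and
# "Allowed labels" sentences of the prompt.
REQUIRED_KEYS = {
    "check_id", "metric", "label", "asset_fact_used",
    "asset_fact_confidence", "evidence_conflict", "reason",
    "blocking_objects", "confidence",
}
ALLOWED_LABELS = {"pass", "degraded", "fail", "unknown"}


def validate_result(result: dict) -> list:
    """Return a list of problems; an empty list means the result looks well-formed."""
    problems = []
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))
    label = result.get("label")
    if label not in ALLOWED_LABELS:
        problems.append("label not allowed: " + repr(label))
    # Assumption: confidence is a probability-like value in [0, 1].
    conf = result.get("confidence")
    if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        problems.append("confidence outside [0, 1]")
    # The prompt asks degraded/fail reasons to name a blocker or region;
    # only the presence of a reason can be checked mechanically here.
    if label in {"degraded", "fail"} and not str(result.get("reason", "")).strip():
        problems.append("degraded/fail needs a non-empty reason")
    return problems


example = {
    "check_id": "spatial_accessibility__bedroom_nightstand_1",
    "metric": "spatial_accessibility",
    "label": "pass",
    "asset_fact_used": True,
    "asset_fact_confidence": 0.86,
    "evidence_conflict": False,
    "reason": "Room-facing side is open; no direct blocker.",
    "blocking_objects": [],
    "confidence": 0.78,
}
print(validate_result(example))
```

The `reason` string in `example` is shortened for illustration; the other field values mirror the Result JSON above.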
Judgement
bench · sittable, supportable, pushable, liftable
VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `spatial_accessibility` metric for `bedroom_bedroom_bench_0` with affordance `sittable`.
Relevant context: bedroom_shelf_0.
Check whether a person can physically walk up to the seating edge and sit from the front or from one genuinely open side. Require one clearly usable approach corridor from open room space all the way to the seat edge itself, not merely to the furniture cluster or to an outer corner of the seat. A standard coffee table in front of a sofa or loveseat is normal unless the local renders show that it closes off the usable seating edge. Consider the combined effect of nearby furniture on the front and side approach zones only when that pinch is visually evident in the local renders. Close adjacency to a bed or wall is acceptable only if one normal sitting approach remains.
If you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers.
Reason:
In the local render, the bench has a clearly open long-side approach from the room side, with visible free floor space next to the seating edge and no direct blocker occupying the sit-down path. In the highlighted local_context render, the red bench is still reachable from that same open side, while the green wall shelf is mounted off to the wall and does not block access. The overview renders also show the bench positioned beside the bed with open room space along its usable side, so a person can walk up to the seating edge and sit from that open side now.
Blocking Objects:
none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes. Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved. Workflow: 1. Read the task instruction and target metric. 2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction. 3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green. 4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level. 5. Evaluate ONLY the requested metric and return a JSON object matching the schema. Scene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m. Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room. Metric: spatial_accessibility Subject object: bedroom_bedroom_bench_0 Target objects: bedroom_shelf_0 Verified asset facts: - subject `bedroom_bedroom_bench_0`: category `bench`, affordances [sittable, supportable, pushable, liftable], usable face `top`, benchmark relevance `functional`, confidence `0.95`, source `asset_annotation` - interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [sit]; top: [place_or_reach_objects]` - low-confidence candidates: `none` Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict. 
Scene evidence object hints: - subject `bedroom_bedroom_bench_0`: category `bench`, group `seating`, keywords [bench, seating bench], affordances [sittable, supportable, pushable, liftable], task targets [desk, table] - target `bedroom_shelf_0`: category `wall shelf`, group `unknown`, keywords [wall shelf], task targets [graspable_object] Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context): - subject `bedroom_bedroom_bench_0`: category `bench`, room `bedroom`, placement `floor`, footprint `medium`, height `low`, elevation `floor_level` Target cues: - target `bedroom_shelf_0`: category `wall shelf`, room `bedroom`, placement `floor`, footprint `large`, height `low`, elevation `raised`, subject_proximity `mid` Nearby same-room context objects that may affect access or clearance: - `bedroom_bud_vase_0`: category `vase`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `tiny` - `bedroom_jewelry_catchall_dish_1`: category `butter dish`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `tiny` - `bedroom_smoke_detector_0`: category `smoke detector`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `tiny` - `bedroom_paperback_book_1`: category `book`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `tiny` - `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium` - `bedroom_rug_0`: category `rug`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large` Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence. Metric-specific rule: Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room. 
Evaluate only the `spatial_accessibility` metric for `bedroom_bedroom_bench_0` with affordance `sittable`. Relevant context: bedroom_shelf_0. Check whether a person can physically walk up to the seating edge and sit from the front or from one genuinely open side. Require one clearly usable approach corridor from open room space all the way to the seat edge itself, not merely to the furniture cluster or to an outer corner of the seat. A standard coffee table in front of a sofa or loveseat is normal unless the local renders show that it closes off the usable seating edge. Consider the combined effect of nearby furniture on the front and side approach zones only when that pinch is visually evident in the local renders. Close adjacency to a bed or wall is acceptable only if one normal sitting approach remains. If you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers. Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green. Evidence guardrails: - Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space. - Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete. - To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap. 
- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`. - If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`. Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence. Allowed labels: pass, degraded, fail, unknown. 2D perspective overlap alone is not enough evidence of a blocker. Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region. If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region. If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
"check_id": "spatial_accessibility__bedroom_bedroom_bench_0__to__bedroom_shelf_0",
"metric": "spatial_accessibility",
"subject_id": "bedroom_bedroom_bench_0",
"target_ids": [
"bedroom_shelf_0"
],
"prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: spatial_accessibility\nSubject object: bedroom_bedroom_bench_0\nTarget objects: bedroom_shelf_0\n\nVerified asset facts:\n- subject `bedroom_bedroom_bench_0`: category `bench`, affordances [sittable, supportable, pushable, liftable], usable face `top`, benchmark relevance `functional`, confidence `0.95`, source `asset_annotation`\n- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [sit]; top: [place_or_reach_objects]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. 
If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_bedroom_bench_0`: category `bench`, group `seating`, keywords [bench, seating bench], affordances [sittable, supportable, pushable, liftable], task targets [desk, table]\n- target `bedroom_shelf_0`: category `wall shelf`, group `unknown`, keywords [wall shelf], task targets [graspable_object]\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_bedroom_bench_0`: category `bench`, room `bedroom`, placement `floor`, footprint `medium`, height `low`, elevation `floor_level`\nTarget cues:\n- target `bedroom_shelf_0`: category `wall shelf`, room `bedroom`, placement `floor`, footprint `large`, height `low`, elevation `raised`, subject_proximity `mid`\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_bud_vase_0`: category `vase`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `tiny`\n- `bedroom_jewelry_catchall_dish_1`: category `butter dish`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `tiny`\n- `bedroom_smoke_detector_0`: category `smoke detector`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `tiny`\n- `bedroom_paperback_book_1`: category `book`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `tiny`\n- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`\n- `bedroom_rug_0`: category `rug`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`\nDo not estimate exact centimeter clearance from these coarse cues. 
They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `spatial_accessibility` metric for `bedroom_bedroom_bench_0` with affordance `sittable`.\nRelevant context: bedroom_shelf_0.\nCheck whether a person can physically walk up to the seating edge and sit from the front or from one genuinely open side. Require one clearly usable approach corridor from open room space all the way to the seat edge itself, not merely to the furniture cluster or to an outer corner of the seat. A standard coffee table in front of a sofa or loveseat is normal unless the local renders show that it closes off the usable seating edge. Consider the combined effect of nearby furniture on the front and side approach zones only when that pinch is visually evident in the local renders. Close adjacency to a bed or wall is acceptable only if one normal sitting approach remains.\nIf you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. 
If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
"files": {
"render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/spatial_accessibility__bedroom_bedroom_bench_0__to__bedroom_shelf_0_local.png",
"render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/spatial_accessibility__bedroom_bedroom_bench_0__to__bedroom_shelf_0_local_context.png",
"render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_00_rgb.png",
"render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_01_rgb.png",
"render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_02_rgb.png",
"render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_03_rgb.png"
}
}
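The check IDs and local render file names in this report appear to follow a simple pattern: metric and subject joined by `__`, an optional `__to__` segment listing the targets, and a view suffix before `.png`. The sketch below reproduces that pattern. It is inferred from the IDs visible here and is not a confirmed specification of the benchmark's naming; `build_check_id` and `local_render_name` are hypothetical helper names.

```python
def build_check_id(metric: str, subject_id: str, target_ids: list) -> str:
    """Join metric, subject, and optional targets in the pattern seen in this report."""
    check_id = metric + "__" + subject_id
    if target_ids:
        # Targets follow a "__to__" separator and are themselves joined by "__".
        check_id += "__to__" + "__".join(target_ids)
    return check_id


def local_render_name(check_id: str, view: str) -> str:
    """File name of a local check render, e.g. view='local' or 'local_context'."""
    return check_id + "_" + view + ".png"


bench_id = build_check_id(
    "spatial_accessibility", "bedroom_bedroom_bench_0", ["bedroom_shelf_0"]
)
print(bench_id)
print(local_render_name(bench_id, "local_context"))
```

With no targets, the `__to__` segment is omitted, which matches the nightstand check ID shown earlier in the report.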
Result JSON
{
"check_id": "spatial_accessibility__bedroom_bedroom_bench_0__to__bedroom_shelf_0",
"metric": "spatial_accessibility",
"label": "pass",
"asset_fact_used": true,
"asset_fact_confidence": 0.95,
"evidence_conflict": false,
"reason": "In the local render, the bench has a clearly open long-side approach from the room side, with visible free floor space next to the seating edge and no direct blocker occupying the sit-down path. In the highlighted local_context render, the red bench is still reachable from that same open side, while the green wall shelf is mounted off to the wall and does not block access. The overview renders also show the bench positioned beside the bed with open room space along its usable side, so a person can walk up to the seating edge and sit from that open side now.",
"blocking_objects": [],
"confidence": 0.86
}
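The per-check labels roll up into the Scene Summary figures at the top of this report (12 checks, 10 pass, 2 unknown, coverage 83.33%, effective pass 100.00%). One reading that matches those numbers is that coverage is the share of checks with a non-`unknown` label and effective pass is the pass share among those judged checks. The sketch below encodes that reading. The formulas are an assumption, not taken from the benchmark's code, and the `score` field is not reproduced because its definition is not given here.

```python
from collections import Counter


def summarize(labels: list) -> dict:
    """Aggregate per-check labels under the assumed coverage / effective-pass definitions."""
    counts = Counter(labels)
    total = len(labels)
    judged = total - counts["unknown"]  # checks that received a definite label
    coverage = judged / total if total else 0.0
    effective_pass = counts["pass"] / judged if judged else 0.0
    return {
        "total": total,
        "pass": counts["pass"],
        "degraded": counts["degraded"],
        "fail": counts["fail"],
        "unknown": counts["unknown"],
        "coverage_pct": round(coverage * 100, 2),
        "effective_pass_pct": round(effective_pass * 100, 2),
    }


summary = summarize(["pass"] * 10 + ["unknown"] * 2)
print(summary)
```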
Judgement
butter dish · containable, openable, graspable
VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `spatial_accessibility` metric for `bedroom_jewelry_catchall_dish_0` with affordance `openable`.
Relevant context: bedroom_nightstand_1, bedroom_nightstand_0.
Check whether at least one practical front-facing or slightly offset standing position can reach the operating side so the object can be opened. A nearby bed corner or bench should not cause degraded by itself if a normal reach from the front region is still possible. Use degraded only when that front-region approach is directly pinched, occupied, or effectively unusable across the usable operating side. Do not count a cramped diagonal squeeze, shoulder-turned sidestep, or tiny side sliver as a valid approach path.
If you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. If the operating side cannot be identified from the evidence, return `unknown` instead of guessing.
Reason:
In the local operating-zone and local operating-zone-context renders, the red-highlighted subject does not read clearly as a small openable dish on a nightstand; instead it appears as a large elevated object near the wall, while the nearby small loose items and the green-highlighted target region are below it. The local plan-context and overview renders also do not clearly establish a normal front-facing standing position to the subject’s usable side as an openable container on either nightstand. Because the visible geometry conflicts with the provided asset fact that this subject is an openable butter dish, the operating side and practical approach path cannot be identified reliably from the evidence, so spatial accessibility is unknown rather than guessed.
Blocking Objects:
none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes. Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved. Workflow: 1. Read the task instruction and target metric. 2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction. 3. Local render order and intended use: 1. `local_operating_zone` is the clean RGB view for the most likely door-facing standing and opening zone. 2. `local_operating_zone_context` is a slightly wider highlighted context view of the same operating side. 3. `local_depth_probe` is an oblique RGB view meant to clarify depth and whether a blocker sits in front of the operating zone. 4. `local_plan_context` is a high local layout view used to check whether a nearby object actually spans or occupies the operating/approach zone rather than merely overlapping in perspective. 4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level. 5. Evaluate ONLY the requested metric and return a JSON object matching the schema. Scene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m. Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room. 
Metric: spatial_accessibility Subject object: bedroom_jewelry_catchall_dish_0 Target objects: bedroom_nightstand_1, bedroom_nightstand_0 Verified asset facts: - subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, affordances [containable, openable, graspable], usable face `front`, benchmark relevance `functional`, confidence `0.78`, source `asset_annotation` - interaction facts: placement class `unknown`, access type `none`, interaction height `none`, surface map `front: [access_contents, open, grasp]` - low-confidence candidates: `none` Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict. Scene evidence object hints: - subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, group `unknown`, keywords [butter dish], affordances [containable, openable, graspable], task targets [table, desk, nightstand, coffee_table, shelf, cabinet] - target `bedroom_nightstand_1`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object] - target `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object] Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context): - subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, room `bedroom`, placement `floor`, footprint `tiny`, height `very_low`, elevation `raised` Target cues: - target `bedroom_nightstand_1`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `near` - target `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, 
subject_proximity `far` Nearby same-room context objects that may affect access or clearance: - `bedroom_hardcover_book_0`: category `book`, placement `floor`, proximity `touching`, bbox_relation `very_close`, footprint `tiny` - `bedroom_clear_glass_tumbler_0`: category `tumbler`, placement `floor`, proximity `touching`, bbox_relation `very_close`, footprint `tiny` - `bedroom_water_carafe_0`: category `carafe`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `small` - `bedroom_bed_0`: category `bed`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large` - `bedroom_rug_0`: category `rug`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large` - `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium` Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence. Metric-specific rule: Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room. Evaluate only the `spatial_accessibility` metric for `bedroom_jewelry_catchall_dish_0` with affordance `openable`. Relevant context: bedroom_nightstand_1, bedroom_nightstand_0. Check whether at least one practical front-facing or slightly offset standing position can reach the operating side so the object can be opened. A nearby bed corner or bench should not cause degraded by itself if a normal reach from the front region is still possible. Use degraded only when that front-region approach is directly pinched, occupied, or effectively unusable across the usable operating side. Do not count a cramped diagonal squeeze, shoulder-turned sidestep, or tiny side sliver as a valid approach path. If you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. 
If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. If the operating side cannot be identified from the evidence, return `unknown` instead of guessing. Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green. Evidence guardrails: - Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space. - Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete. - To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap. - When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`. - If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`. Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence. Allowed labels: pass, degraded, fail, unknown. 2D perspective overlap alone is not enough evidence of a blocker. Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region. 
If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region. If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
"check_id": "spatial_accessibility__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0",
"metric": "spatial_accessibility",
"subject_id": "bedroom_jewelry_catchall_dish_0",
"target_ids": [
"bedroom_nightstand_1",
"bedroom_nightstand_0"
],
"prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local_operating_zone` is the clean RGB view for the most likely door-facing standing and opening zone. 2. `local_operating_zone_context` is a slightly wider highlighted context view of the same operating side. 3. `local_depth_probe` is an oblique RGB view meant to clarify depth and whether a blocker sits in front of the operating zone. 4. `local_plan_context` is a high local layout view used to check whether a nearby object actually spans or occupies the operating/approach zone rather than merely overlapping in perspective.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 21 objects. Room type: bedroom. Planned checks: 12. 
Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: spatial_accessibility\nSubject object: bedroom_jewelry_catchall_dish_0\nTarget objects: bedroom_nightstand_1, bedroom_nightstand_0\n\nVerified asset facts:\n- subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, affordances [containable, openable, graspable], usable face `front`, benchmark relevance `functional`, confidence `0.78`, source `asset_annotation`\n- interaction facts: placement class `unknown`, access type `none`, interaction height `none`, surface map `front: [access_contents, open, grasp]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, group `unknown`, keywords [butter dish], affordances [containable, openable, graspable], task targets [table, desk, nightstand, coffee_table, shelf, cabinet]\n- target `bedroom_nightstand_1`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object]\n- target `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object]\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, room `bedroom`, placement `floor`, footprint `tiny`, height `very_low`, elevation `raised`\nTarget cues:\n- target `bedroom_nightstand_1`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `near`\n- 
target `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `far`\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_hardcover_book_0`: category `book`, placement `floor`, proximity `touching`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_clear_glass_tumbler_0`: category `tumbler`, placement `floor`, proximity `touching`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_water_carafe_0`: category `carafe`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `small`\n- `bedroom_bed_0`: category `bed`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`\n- `bedroom_rug_0`: category `rug`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`\n- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`\nDo not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `spatial_accessibility` metric for `bedroom_jewelry_catchall_dish_0` with affordance `openable`.\nRelevant context: bedroom_nightstand_1, bedroom_nightstand_0.\nCheck whether at least one practical front-facing or slightly offset standing position can reach the operating side so the object can be opened. A nearby bed corner or bench should not cause degraded by itself if a normal reach from the front region is still possible. Use degraded only when that front-region approach is directly pinched, occupied, or effectively unusable across the usable operating side. 
Do not count a cramped diagonal squeeze, shoulder-turned sidestep, or tiny side sliver as a valid approach path.\nIf you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. If the operating side cannot be identified from the evidence, return `unknown` instead of guessing.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should 
not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
"files": {
"render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/spatial_accessibility__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0_local_operating_zone.png",
"render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/spatial_accessibility__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0_local_operating_zone_context.png",
"render_check_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/spatial_accessibility__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0_local_depth_probe.png",
"render_check_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/checks/spatial_accessibility__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0_local_plan_context.png",
"render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_00_rgb.png",
"render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_01_rgb.png",
"render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_02_rgb.png",
"render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/hssd_retrieved_bedroom/render_bundle/views/view_03_rgb.png"
}
}
Result JSON
{
"check_id": "spatial_accessibility__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0",
"metric": "spatial_accessibility",
"label": "unknown",
"asset_fact_used": true,
"asset_fact_confidence": 0.78,
"evidence_conflict": true,
"reason": "In the local operating-zone and local operating-zone-context renders, the red-highlighted subject does not read clearly as a small openable dish on a nightstand; instead it appears as a large elevated object near the wall, while the nearby small loose items and the green-highlighted target region are below it. The local plan-context and overview renders also do not clearly establish a normal front-facing standing position to the subject’s usable side as an openable container on either nightstand. Because the visible geometry conflicts with the provided asset fact that this subject is an openable butter dish, the operating side and practical approach path cannot be identified reliably from the evidence, so spatial accessibility is unknown rather than guessed.",
"blocking_objects": [],
"confidence": 0.87
}
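The prompt fixes the output keys and the allowed labels, so a result such as the one above can be checked mechanically. The following is a minimal sketch of such a check, not the benchmark's own validator. The function name `validate_result` is hypothetical, and the rule that a `degraded` or `fail` result needs a non-empty reason is a simplified reading of the prompt's requirement that the reason name a blocker or constrained region.

```python
# Keys and labels as listed in the prompt's output specification.
REQUIRED_KEYS = {
    "check_id", "metric", "label", "asset_fact_used", "asset_fact_confidence",
    "evidence_conflict", "reason", "blocking_objects", "confidence",
}
ALLOWED_LABELS = {"pass", "degraded", "fail", "unknown"}


def validate_result(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the result looks well-formed."""
    problems = []
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    label = result.get("label")
    if label not in ALLOWED_LABELS:
        problems.append(f"label not allowed: {label!r}")
    # Assumption: treat an empty reason on degraded/fail as a violation.
    if label in {"degraded", "fail"} and not str(result.get("reason", "")).strip():
        problems.append("degraded/fail requires a reason naming a blocker or region")
    return problems


# Values copied from the result above; the reason text is abbreviated here.
sample = {
    "check_id": "spatial_accessibility__bedroom_jewelry_catchall_dish_0"
                "__to__bedroom_nightstand_1__bedroom_nightstand_0",
    "metric": "spatial_accessibility",
    "label": "unknown",
    "asset_fact_used": True,
    "asset_fact_confidence": 0.78,
    "evidence_conflict": True,
    "reason": "(abbreviated) visible geometry conflicts with the asset fact",
    "blocking_objects": [],
    "confidence": 0.87,
}
sample_problems = validate_result(sample)
```

For the result shown above the problem list is empty: `unknown` is an allowed label, every required key is present, and the empty `blocking_objects` list is acceptable because no blocker has to be named for `unknown`.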