scenesmith / mesh

scenesmith-example-data_scene_000

A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.

Score 0.857 · coverage 58.33% · effective pass 85.71% · pass 6 · fail 1 · unknown 5

Scene Summary

Total 12
Pass 6
Degraded 0
Fail 1
Unknown 5
Coverage 58.33%
Eff. Pass 85.71%
Score 0.857
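
The summary figures above are consistent with two simple ratios over the check counts. The sketch below reproduces them under assumptions that the report itself does not state: coverage is taken to be the share of checks with a decided label (pass, degraded, or fail), and effective pass is taken to be passes over decided checks. Because this scene has no degraded checks, how the real scorer weights `degraded` cannot be inferred from these numbers; the names `CheckCounts`, `coverage`, and `effective_pass` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CheckCounts:
    """Label counts for one scene (hypothetical container, not the tool's own type)."""
    passed: int
    degraded: int
    failed: int
    unknown: int

    @property
    def total(self) -> int:
        return self.passed + self.degraded + self.failed + self.unknown

    @property
    def decided(self) -> int:
        # Assumption: every label except `unknown` counts as decided.
        return self.passed + self.degraded + self.failed


def coverage(counts: CheckCounts) -> float:
    """Share of planned checks that received a decided label."""
    return counts.decided / counts.total if counts.total else 0.0


def effective_pass(counts: CheckCounts) -> float:
    """Share of decided checks that passed (assumed: degraded earns no credit)."""
    return counts.passed / counts.decided if counts.decided else 0.0


counts = CheckCounts(passed=6, degraded=0, failed=1, unknown=5)
print(f"total={counts.total}")                       # total=12
print(f"coverage={coverage(counts):.2%}")            # coverage=58.33%
print(f"effective pass={effective_pass(counts):.2%}")  # effective pass=85.71%
```

Under these assumptions the reported score of 0.857 coincides with the effective pass rate rounded to three decimals; whether the two always coincide is not something this single scene can confirm.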

Judgement
bed · sittable, supportable, containable · VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room. Evaluate only the `interaction_clearance` metric for `bedroom_bed_0` with affordance `sittable`. Relevant context: bedroom_rug_0, bedroom_nightstand_0. Check whether the immediate sit/stand zone at the usable sitting side has enough free space for a normal sit/stand motion. Consider the combined effect of nearby furniture around the seat edge; multiple pieces that collectively pinch the sit/stand zone should count as blocked even if no single item fully closes it. Do not require 360-degree circulation around the furniture. For a bed, if at least one bedside remains usable for a normal sit/stand motion, prefer `pass` over `degraded`. If you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers.
Reason: In the local render and the highlighted local_context render, the bed’s long side next to the rug shows an open sit/stand zone without a direct object intruding into the bedside edge. The nearby nightstand is adjacent at one side, but it does not close off the usable sitting edge, and the bench at the foot stays at the bed end rather than blocking the bedside motion zone. The overview renders also support that at least one bedside remains practically usable for a normal sit/stand motion, so the bed’s sittable interaction clearance remains functional now.
Blocking Objects: none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes.

Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.

Workflow:
1. Read the task instruction and target metric.
2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.
3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.
4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.
5. Evaluate ONLY the requested metric and return a JSON object matching the schema.

Scene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.
Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Metric: interaction_clearance
Subject object: bedroom_bed_0
Target objects: bedroom_rug_0, bedroom_nightstand_0

Verified asset facts:
- subject `bedroom_bed_0`: category `bed`, affordances [sittable, supportable, containable], usable face `top`, benchmark relevance `functional`, confidence `0.52`, source `asset_annotation`
- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [sit, access_contents]; top: [place_or_reach_objects]`
- low-confidence candidates: `none`
Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.

Scene evidence object hints:
- subject `bedroom_bed_0`: category `bed`, group `sleeping`, keywords [bed, bedframe, mattress, sleeping area], affordances [sittable, supportable, containable], task targets [desk, table]
- target `bedroom_rug_0`: category `rug`, group `soft_furnishing`, keywords [rug, carpet, floor mat], task targets [graspable_object]
- target `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object]

Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):
- subject `bedroom_bed_0`: category `bed`, room `bedroom`, placement `floor`, footprint `large`, height `tall`, elevation `floor_level`
Target cues:
- target `bedroom_rug_0`: category `rug`, room `bedroom`, placement `floor`, footprint `large`, height `tall`, elevation `floor_level`, subject_proximity `touching`
- target `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `mid`
Nearby same-room context objects that may affect access or clearance:
- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `medium`
- `bedroom_alarm_clock_0`: category `alarm clock`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `tiny`
- `bedroom_nightstand_1`: category `nightstand`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `small`
- `bedroom_bedroom_bench_0`: category `bench`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`
- `bedroom_canvas_0`: category `canvas_canvas`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`
- `bedroom_jewelry_catchall_dish_0`: category `butter dish`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `tiny`
Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.

Metric-specific rule:
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `interaction_clearance` metric for `bedroom_bed_0` with affordance `sittable`.
Relevant context: bedroom_rug_0, bedroom_nightstand_0.
Check whether the immediate sit/stand zone at the usable sitting side has enough free space for a normal sit/stand motion. Consider the combined effect of nearby furniture around the seat edge; multiple pieces that collectively pinch the sit/stand zone should count as blocked even if no single item fully closes it. Do not require 360-degree circulation around the furniture. For a bed, if at least one bedside remains usable for a normal sit/stand motion, prefer `pass` over `degraded`.
If you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers.

Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.
Evidence guardrails:
- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.
- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.
- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.
- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.
- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.
Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.
Allowed labels: pass, degraded, fail, unknown.
2D perspective overlap alone is not enough evidence of a blocker.
Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.
If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.
If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
  "check_id": "interaction_clearance__bedroom_bed_0__to__bedroom_rug_0__bedroom_nightstand_0",
  "metric": "interaction_clearance",
  "subject_id": "bedroom_bed_0",
  "target_ids": [
    "bedroom_rug_0",
    "bedroom_nightstand_0"
  ],
  "prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: interaction_clearance\nSubject object: bedroom_bed_0\nTarget objects: bedroom_rug_0, bedroom_nightstand_0\n\nVerified asset facts:\n- subject `bedroom_bed_0`: category `bed`, affordances [sittable, supportable, containable], usable face `top`, benchmark relevance `functional`, confidence `0.52`, source `asset_annotation`\n- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [sit, access_contents]; top: [place_or_reach_objects]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. 
If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_bed_0`: category `bed`, group `sleeping`, keywords [bed, bedframe, mattress, sleeping area], affordances [sittable, supportable, containable], task targets [desk, table]\n- target `bedroom_rug_0`: category `rug`, group `soft_furnishing`, keywords [rug, carpet, floor mat], task targets [graspable_object]\n- target `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object]\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_bed_0`: category `bed`, room `bedroom`, placement `floor`, footprint `large`, height `tall`, elevation `floor_level`\nTarget cues:\n- target `bedroom_rug_0`: category `rug`, room `bedroom`, placement `floor`, footprint `large`, height `tall`, elevation `floor_level`, subject_proximity `touching`\n- target `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `mid`\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `medium`\n- `bedroom_alarm_clock_0`: category `alarm clock`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `tiny`\n- `bedroom_nightstand_1`: category `nightstand`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `small`\n- `bedroom_bedroom_bench_0`: category `bench`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`\n- `bedroom_canvas_0`: category `canvas_canvas`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`\n- `bedroom_jewelry_catchall_dish_0`: 
category `butter dish`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `tiny`\nDo not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `interaction_clearance` metric for `bedroom_bed_0` with affordance `sittable`.\nRelevant context: bedroom_rug_0, bedroom_nightstand_0.\nCheck whether the immediate sit/stand zone at the usable sitting side has enough free space for a normal sit/stand motion. Consider the combined effect of nearby furniture around the seat edge; multiple pieces that collectively pinch the sit/stand zone should count as blocked even if no single item fully closes it. Do not require 360-degree circulation around the furniture. For a bed, if at least one bedside remains usable for a normal sit/stand motion, prefer `pass` over `degraded`.\nIf you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. 
If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
  "files": {
    "render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/interaction_clearance__bedroom_bed_0__to__bedroom_rug_0__bedroom_nightstand_0_local.png",
    "render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/interaction_clearance__bedroom_bed_0__to__bedroom_rug_0__bedroom_nightstand_0_local_context.png",
    "render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_00_rgb.png",
    "render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_01_rgb.png",
    "render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_02_rgb.png",
    "render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_03_rgb.png"
  }
}
Result JSON
{
  "check_id": "interaction_clearance__bedroom_bed_0__to__bedroom_rug_0__bedroom_nightstand_0",
  "metric": "interaction_clearance",
  "label": "pass",
  "asset_fact_used": true,
  "asset_fact_confidence": 0.52,
  "evidence_conflict": false,
  "reason": "In the local render and the highlighted local_context render, the bed’s long side next to the rug shows an open sit/stand zone without a direct object intruding into the bedside edge. The nearby nightstand is adjacent at one side, but it does not close off the usable sitting edge, and the bench at the foot stays at the bed end rather than blocking the bedside motion zone. The overview renders also support that at least one bedside remains practically usable for a normal sit/stand motion, so the bed’s sittable interaction clearance remains functional now.",
  "blocking_objects": [],
  "confidence": 0.67
}
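
The prompt lists the keys and labels a result must contain, so a result like the one above can be checked mechanically before it is scored. The following is a minimal sketch of such a check, not the benchmark's own validator: the function name `validate_result` is hypothetical, and the rule that confidences fall in [0, 1] is an assumption based on the values seen here (0.52 and 0.67). The `reason` string in the sample is abbreviated for space.

```python
import json

# Keys and labels copied from the "Output JSON with keys" and
# "Allowed labels" lines of the prompt.
REQUIRED_KEYS = {
    "check_id", "metric", "label", "asset_fact_used", "asset_fact_confidence",
    "evidence_conflict", "reason", "blocking_objects", "confidence",
}
ALLOWED_LABELS = {"pass", "degraded", "fail", "unknown"}


def validate_result(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the result looks well-formed."""
    problems = []
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if result.get("label") not in ALLOWED_LABELS:
        problems.append(f"label not allowed: {result.get('label')!r}")
    if not isinstance(result.get("blocking_objects", []), list):
        problems.append("blocking_objects must be a list")
    for key in ("confidence", "asset_fact_confidence"):
        value = result.get(key)
        # Assumption: confidences are numbers in [0, 1].
        if value is not None and not (isinstance(value, (int, float)) and 0 <= value <= 1):
            problems.append(f"{key} outside [0, 1]: {value!r}")
    return problems


sample = json.loads("""
{
  "check_id": "interaction_clearance__bedroom_bed_0__to__bedroom_rug_0__bedroom_nightstand_0",
  "metric": "interaction_clearance",
  "label": "pass",
  "asset_fact_used": true,
  "asset_fact_confidence": 0.52,
  "evidence_conflict": false,
  "reason": "(abbreviated)",
  "blocking_objects": [],
  "confidence": 0.67
}
""")
print(validate_result(sample))  # []
```

The check deliberately does not require `blocking_objects` to be non-empty for `degraded` or `fail`, because the prompt also accepts a reason that names a constrained functional region instead of a blocking object.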
Evidence
interaction_clearance__bedroom_bed_0__to__bedroom_rug_0__bedroom_nightstand_0
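
The two check IDs in this report appear to follow a `<metric>__<subject>__to__<target>__<target>` pattern, with the `__to__` part omitted when there are no targets. This convention is inferred from the IDs and the matching `subject_id` and `target_ids` fields in the request JSON; it is not documented in the report. A small sketch that splits an ID on that assumption (the helper `parse_check_id` is hypothetical and would misparse any object ID that itself contains a double underscore):

```python
def parse_check_id(check_id: str) -> dict:
    """Split a check ID into metric, subject, and targets (inferred convention)."""
    head, sep, tail = check_id.partition("__to__")
    metric, _, subject = head.partition("__")
    # Without a "__to__" separator the check has no target objects.
    targets = tail.split("__") if sep else []
    return {"metric": metric, "subject_id": subject, "target_ids": targets}


bed = parse_check_id(
    "interaction_clearance__bedroom_bed_0__to__bedroom_rug_0__bedroom_nightstand_0"
)
print(bed["subject_id"])  # bedroom_bed_0
print(bed["target_ids"])  # ['bedroom_rug_0', 'bedroom_nightstand_0']

wardrobe = parse_check_id("interaction_clearance__bedroom_wardrobe_0")
print(wardrobe["target_ids"])  # []
```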

Judgement
wardrobe · openable, containable, supportable, drawable · VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room. Evaluate only the `interaction_clearance` metric for `bedroom_wardrobe_0` with affordance `openable`. Relevant context: the surrounding scene. First infer the most likely operating side from door panels, handles, hinges, or the side that faces into the room. Then check only that side for a normal standing zone and immediate hand/door operating clearance. Do not assume a large hinged swing arc unless the render clearly shows hinged doors needing that arc. Do not require the entire front face to be empty; a nearby bench or seat off to one side is acceptable when the floor directly before the door/handle area remains usable. A narrow diagonal side gap or a squeeze-by path does not count as clear operating space. If you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. If the operating side cannot be identified from the evidence, return `unknown` instead of guessing.
Reason: The evidence is too ambiguous to judge the wardrobe’s immediate opening/standing clearance. In the local operating-zone and local operating-zone-context renders, the highlighted object appears as a low horizontal cabinet-like form beside the bed, while in the local plan-context render the highlighted subject appears as a vertical wall-side rectangle; these views do not clearly agree on the subject’s shape or operating face. Because of that conflict, I cannot reliably identify the wardrobe’s true front/door side or confirm the exact standing zone directly in front of it from the local renders. The overview renders also show nearby furniture, but they do not resolve which face is the usable opening side well enough to judge clearance without guessing. Under the asset-fact conflict and unclear local evidence, the correct result is unknown rather than inferring blockage or clear access from perspective alone.
Blocking Objects: none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes.

Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.

Workflow:
1. Read the task instruction and target metric.
2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.
3. Local render order and intended use: 1. `local_operating_zone` is the clean RGB view for the most likely door-facing standing and opening zone. 2. `local_operating_zone_context` is a slightly wider highlighted context view of the same operating side. 3. `local_depth_probe` is an oblique RGB view meant to clarify depth and whether a blocker sits in front of the operating zone. 4. `local_plan_context` is a high local layout view used to check whether a nearby object actually spans or occupies the operating/approach zone rather than merely overlapping in perspective.
4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.
5. Evaluate ONLY the requested metric and return a JSON object matching the schema.

Scene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.
Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Metric: interaction_clearance
Subject object: bedroom_wardrobe_0
Target objects: none

Verified asset facts:
- subject `bedroom_wardrobe_0`: category `wardrobe`, affordances [openable, containable, supportable, drawable], usable face `top`, benchmark relevance `functional`, confidence `0.64`, source `asset_annotation`
- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [open, access_contents, open]; top: [place_or_reach_objects]`
- low-confidence candidates: `none`
Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.

Scene evidence object hints:
- subject `bedroom_wardrobe_0`: category `wardrobe`, group `storage`, keywords [wardrobe, closet, armoire], affordances [openable, containable, supportable, drawable], task targets [room, wall, clear_space]
- targets: none

Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):
- subject `bedroom_wardrobe_0`: category `wardrobe`, room `bedroom`, placement `floor`, footprint `large`, height `mid`, elevation `floor_level`
Target cues:
- none
Nearby same-room context objects that may affect access or clearance:
- `bedroom_art_print_0`: category `art_print_art`, placement `floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`
- `bedroom_smoke_detector_0`: category `smoke detector`, placement `floor`, proximity `mid`, bbox_relation `very_close`, footprint `tiny`
- `bedroom_recessed_light_1`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `very_close`, footprint `tiny`
- `bedroom_rug_0`: category `rug`, placement `floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`
- `bedroom_bed_0`: category `bed`, placement `floor`, proximity `mid`, bbox_relation `very_close`, footprint `large`
- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`
Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.

Metric-specific rule:
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `interaction_clearance` metric for `bedroom_wardrobe_0` with affordance `openable`.
Relevant context: the surrounding scene.
First infer the most likely operating side from door panels, handles, hinges, or the side that faces into the room. Then check only that side for a normal standing zone and immediate hand/door operating clearance. Do not assume a large hinged swing arc unless the render clearly shows hinged doors needing that arc. Do not require the entire front face to be empty; a nearby bench or seat off to one side is acceptable when the floor directly before the door/handle area remains usable. A narrow diagonal side gap or a squeeze-by path does not count as clear operating space.
If you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. If the operating side cannot be identified from the evidence, return `unknown` instead of guessing.

Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.
Evidence guardrails:
- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.
- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.
- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.
- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.
- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.
Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.
Allowed labels: pass, degraded, fail, unknown.
2D perspective overlap alone is not enough evidence of a blocker.
Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.
If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.
If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
  "check_id": "interaction_clearance__bedroom_wardrobe_0",
  "metric": "interaction_clearance",
  "subject_id": "bedroom_wardrobe_0",
  "target_ids": [],
  "prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local_operating_zone` is the clean RGB view for the most likely door-facing standing and opening zone. 2. `local_operating_zone_context` is a slightly wider highlighted context view of the same operating side. 3. `local_depth_probe` is an oblique RGB view meant to clarify depth and whether a blocker sits in front of the operating zone. 4. `local_plan_context` is a high local layout view used to check whether a nearby object actually spans or occupies the operating/approach zone rather than merely overlapping in perspective.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. 
Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: interaction_clearance\nSubject object: bedroom_wardrobe_0\nTarget objects: none\n\nVerified asset facts:\n- subject `bedroom_wardrobe_0`: category `wardrobe`, affordances [openable, containable, supportable, drawable], usable face `top`, benchmark relevance `functional`, confidence `0.64`, source `asset_annotation`\n- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [open, access_contents, open]; top: [place_or_reach_objects]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_wardrobe_0`: category `wardrobe`, group `storage`, keywords [wardrobe, closet, armoire], affordances [openable, containable, supportable, drawable], task targets [room, wall, clear_space]\n- targets: none\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_wardrobe_0`: category `wardrobe`, room `bedroom`, placement `floor`, footprint `large`, height `mid`, elevation `floor_level`\nTarget cues:\n- none\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_art_print_0`: category `art_print_art`, placement `floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`\n- `bedroom_smoke_detector_0`: category `smoke detector`, placement `floor`, proximity `mid`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_recessed_light_1`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_rug_0`: category `rug`, placement 
`floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`\n- `bedroom_bed_0`: category `bed`, placement `floor`, proximity `mid`, bbox_relation `very_close`, footprint `large`\n- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`\nDo not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `interaction_clearance` metric for `bedroom_wardrobe_0` with affordance `openable`.\nRelevant context: the surrounding scene.\nFirst infer the most likely operating side from door panels, handles, hinges, or the side that faces into the room. Then check only that side for a normal standing zone and immediate hand/door operating clearance. Do not assume a large hinged swing arc unless the render clearly shows hinged doors needing that arc. Do not require the entire front face to be empty; a nearby bench or seat off to one side is acceptable when the floor directly before the door/handle area remains usable. A narrow diagonal side gap or a squeeze-by path does not count as clear operating space.\nIf you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. 
If the operating side cannot be identified from the evidence, return `unknown` instead of guessing.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
  "files": {
    "render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/interaction_clearance__bedroom_wardrobe_0_local_operating_zone.png",
    "render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/interaction_clearance__bedroom_wardrobe_0_local_operating_zone_context.png",
    "render_check_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/interaction_clearance__bedroom_wardrobe_0_local_depth_probe.png",
    "render_check_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/interaction_clearance__bedroom_wardrobe_0_local_plan_context.png",
    "render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_00_rgb.png",
    "render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_01_rgb.png",
    "render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_02_rgb.png",
    "render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_03_rgb.png"
  }
}
Result JSON
{
  "check_id": "interaction_clearance__bedroom_wardrobe_0",
  "metric": "interaction_clearance",
  "label": "unknown",
  "asset_fact_used": true,
  "asset_fact_confidence": 0.64,
  "evidence_conflict": true,
  "reason": "The evidence is too ambiguous to judge the wardrobe’s immediate opening/standing clearance. In the local operating-zone and local operating-zone-context renders, the highlighted object appears as a low horizontal cabinet-like form beside the bed, while in the local plan-context render the highlighted subject appears as a vertical wall-side rectangle; these views do not clearly agree on the subject’s shape or operating face. Because of that conflict, I cannot reliably identify the wardrobe’s true front/door side or confirm the exact standing zone directly in front of it from the local renders. The overview renders also show nearby furniture, but they do not resolve which face is the usable opening side well enough to judge clearance without guessing. Under the asset-fact conflict and unclear local evidence, the correct result is unknown rather than inferring blockage or clear access from perspective alone.",
  "blocking_objects": [],
  "confidence": 0.26
}
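The prompt above asks the model to return a JSON object with a fixed set of keys and one of four allowed labels. A minimal sketch of how such a result could be checked is shown here. The function name `validate_result` is hypothetical, and the assumption that `confidence` lies in [0, 1] is inferred from the values in this report, not from any documented schema.

```python
import json

# Keys and labels copied from the prompt text in this report.
REQUIRED_KEYS = {
    "check_id", "metric", "label", "asset_fact_used",
    "asset_fact_confidence", "evidence_conflict", "reason",
    "blocking_objects", "confidence",
}
ALLOWED_LABELS = {"pass", "degraded", "fail", "unknown"}


def validate_result(raw: str) -> list[str]:
    """Return a list of problems found in one Result JSON string (empty if none)."""
    result = json.loads(raw)
    problems = []

    missing = REQUIRED_KEYS - set(result)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")

    if result.get("label") not in ALLOWED_LABELS:
        problems.append(f"label not allowed: {result.get('label')!r}")

    # Assumption: confidence is a probability-like value in [0, 1].
    conf = result.get("confidence")
    if isinstance(conf, (int, float)) and not 0.0 <= conf <= 1.0:
        problems.append(f"confidence out of range: {conf}")

    return problems


wardrobe = json.dumps({
    "check_id": "interaction_clearance__bedroom_wardrobe_0",
    "metric": "interaction_clearance",
    "label": "unknown",
    "asset_fact_used": True,
    "asset_fact_confidence": 0.64,
    "evidence_conflict": True,
    "reason": "The evidence is too ambiguous to judge.",
    "blocking_objects": [],
    "confidence": 0.26,
})
print(validate_result(wardrobe))
```

A result that passes this check is only structurally valid; the sketch does not judge whether the `reason` actually names a blocker, which the prompt requires for `degraded` and `fail`.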
Evidence
interaction_clearance__bedroom_wardrobe_0

Judgement
bench · sittable, supportable, pushable, liftable VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room. Evaluate only the `interaction_clearance` metric for `bedroom_bedroom_bench_0` with affordance `sittable`. Relevant context: bedroom_shelf_0. Check whether the immediate sit/stand zone at the usable sitting side has enough free space for a normal sit/stand motion. Consider the combined effect of nearby furniture around the seat edge; multiple pieces that collectively pinch the sit/stand zone should count as blocked even if no single item fully closes it. Do not require 360-degree circulation around the furniture. For a bed, if at least one bedside remains usable for a normal sit/stand motion, prefer `pass` over `degraded`. If you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers.
Reason: In the local render and the highlighted local_context render, the subject bench appears tipped onto its side rather than upright, and the overview renders show the same overturned orientation. That directly conflicts with the provided asset fact that the bench's usable sittable face is the top, so I cannot reliably judge the normal sit/stand clearance for a usable sitting side in its current state from this evidence alone.
Blocking Objects: none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes.

Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.

Workflow:
1. Read the task instruction and target metric.
2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.
3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.
4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.
5. Evaluate ONLY the requested metric and return a JSON object matching the schema.

Scene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.
Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Metric: interaction_clearance
Subject object: bedroom_bedroom_bench_0
Target objects: bedroom_shelf_0

Verified asset facts:
- subject `bedroom_bedroom_bench_0`: category `bench`, affordances [sittable, supportable, pushable, liftable], usable face `top`, benchmark relevance `functional`, confidence `0.67`, source `asset_annotation`
- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [sit]; top: [place_or_reach_objects]`
- low-confidence candidates: `none`
Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.

Scene evidence object hints:
- subject `bedroom_bedroom_bench_0`: category `bench`, group `seating`, keywords [bench, seating bench], affordances [sittable, supportable, pushable, liftable], task targets [desk, table]
- target `bedroom_shelf_0`: category `wall shelf`, group `unknown`, keywords [wall shelf], task targets [graspable_object]

Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):
- subject `bedroom_bedroom_bench_0`: category `bench`, room `bedroom`, placement `floor`, footprint `medium`, height `low`, elevation `floor_level`
Target cues:
- target `bedroom_shelf_0`: category `wall shelf`, room `bedroom`, placement `floor`, footprint `large`, height `low`, elevation `raised`, subject_proximity `mid`
Nearby same-room context objects that may affect access or clearance:
- `bedroom_bud_vase_0`: category `vase`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `tiny`
- `bedroom_smartphone_1`: category `smartphone_smartphone`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `tiny`
- `bedroom_jewelry_catchall_dish_1`: category `butter dish`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `tiny`
- `bedroom_smoke_detector_0`: category `smoke detector`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `tiny`
- `bedroom_paperback_book_1`: category `book`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `tiny`
- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`
Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.

Metric-specific rule:
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `interaction_clearance` metric for `bedroom_bedroom_bench_0` with affordance `sittable`.
Relevant context: bedroom_shelf_0.
Check whether the immediate sit/stand zone at the usable sitting side has enough free space for a normal sit/stand motion. Consider the combined effect of nearby furniture around the seat edge; multiple pieces that collectively pinch the sit/stand zone should count as blocked even if no single item fully closes it. Do not require 360-degree circulation around the furniture. For a bed, if at least one bedside remains usable for a normal sit/stand motion, prefer `pass` over `degraded`.
If you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers.

Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.
Evidence guardrails:
- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.
- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.
- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.
- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.
- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.
Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.
Allowed labels: pass, degraded, fail, unknown.
2D perspective overlap alone is not enough evidence of a blocker.
Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.
If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.
If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
  "check_id": "interaction_clearance__bedroom_bedroom_bench_0__to__bedroom_shelf_0",
  "metric": "interaction_clearance",
  "subject_id": "bedroom_bedroom_bench_0",
  "target_ids": [
    "bedroom_shelf_0"
  ],
  "prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: interaction_clearance\nSubject object: bedroom_bedroom_bench_0\nTarget objects: bedroom_shelf_0\n\nVerified asset facts:\n- subject `bedroom_bedroom_bench_0`: category `bench`, affordances [sittable, supportable, pushable, liftable], usable face `top`, benchmark relevance `functional`, confidence `0.67`, source `asset_annotation`\n- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [sit]; top: [place_or_reach_objects]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. 
If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_bedroom_bench_0`: category `bench`, group `seating`, keywords [bench, seating bench], affordances [sittable, supportable, pushable, liftable], task targets [desk, table]\n- target `bedroom_shelf_0`: category `wall shelf`, group `unknown`, keywords [wall shelf], task targets [graspable_object]\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_bedroom_bench_0`: category `bench`, room `bedroom`, placement `floor`, footprint `medium`, height `low`, elevation `floor_level`\nTarget cues:\n- target `bedroom_shelf_0`: category `wall shelf`, room `bedroom`, placement `floor`, footprint `large`, height `low`, elevation `raised`, subject_proximity `mid`\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_bud_vase_0`: category `vase`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `tiny`\n- `bedroom_smartphone_1`: category `smartphone_smartphone`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `tiny`\n- `bedroom_jewelry_catchall_dish_1`: category `butter dish`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `tiny`\n- `bedroom_smoke_detector_0`: category `smoke detector`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `tiny`\n- `bedroom_paperback_book_1`: category `book`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `tiny`\n- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`\nDo not estimate exact centimeter clearance from these coarse cues. 
They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `interaction_clearance` metric for `bedroom_bedroom_bench_0` with affordance `sittable`.\nRelevant context: bedroom_shelf_0.\nCheck whether the immediate sit/stand zone at the usable sitting side has enough free space for a normal sit/stand motion. Consider the combined effect of nearby furniture around the seat edge; multiple pieces that collectively pinch the sit/stand zone should count as blocked even if no single item fully closes it. Do not require 360-degree circulation around the furniture. For a bed, if at least one bedside remains usable for a normal sit/stand motion, prefer `pass` over `degraded`.\nIf you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. 
If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
  "files": {
    "render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/interaction_clearance__bedroom_bedroom_bench_0__to__bedroom_shelf_0_local.png",
    "render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/interaction_clearance__bedroom_bedroom_bench_0__to__bedroom_shelf_0_local_context.png",
    "render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_00_rgb.png",
    "render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_01_rgb.png",
    "render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_02_rgb.png",
    "render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_03_rgb.png"
  }
}
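The check identifiers and render file names in these requests appear to follow a regular pattern: the metric, the subject, an optional `to` segment with the targets, all joined by double underscores, with the view name appended for each render. The sketch below reproduces that pattern as observed in this report. Both helper names are hypothetical, and the pattern is inferred from the logged values, not taken from the tool's source.

```python
def build_check_id(metric: str, subject_id: str, target_ids: list[str]) -> str:
    """Join metric, subject, and optional targets the way the logged IDs appear to be built."""
    parts = [metric, subject_id]
    if target_ids:
        parts.append("to")
        parts.extend(target_ids)
    return "__".join(parts)


def render_name(check_id: str, view: str) -> str:
    """File name of a local check render for a given view (assumed naming)."""
    return f"{check_id}_{view}.png"


bench_id = build_check_id(
    "interaction_clearance", "bedroom_bedroom_bench_0", ["bedroom_shelf_0"]
)
wardrobe_id = build_check_id("interaction_clearance", "bedroom_wardrobe_0", [])

print(bench_id)
print(render_name(bench_id, "local_context"))
print(render_name(wardrobe_id, "local_plan_context"))
```

Checks without targets, such as the wardrobe check, simply omit the `to` segment.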
Result JSON
{
  "check_id": "interaction_clearance__bedroom_bedroom_bench_0__to__bedroom_shelf_0",
  "metric": "interaction_clearance",
  "label": "unknown",
  "asset_fact_used": true,
  "asset_fact_confidence": 0.67,
  "evidence_conflict": true,
  "reason": "In the local render and the highlighted local_context render, the subject bench appears tipped onto its side rather than upright, and the overview renders show the same overturned orientation. That directly conflicts with the provided asset fact that the bench's usable sittable face is the top, so I cannot reliably judge the normal sit/stand clearance for a usable sitting side in its current state from this evidence alone.",
  "blocking_objects": [],
  "confidence": 0.95
}
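Results labelled `unknown`, like this one, reduce the coverage figure in the scene summary. The sketch below shows one way the summary numbers could be derived from the per-check labels. It assumes that coverage is the share of checks with a non-`unknown` label and that effective pass is the share of `pass` among those decided checks; these formulas reproduce the figures reported for this scene (58.33% and 85.71%) but are an inference, and how `degraded` would be weighted cannot be determined from this scene because none occurred.

```python
from collections import Counter


def summarize(labels: list[str]) -> dict[str, float]:
    """Compute coverage and effective pass rate from per-check labels (assumed formulas)."""
    counts = Counter(labels)
    total = len(labels)
    decided = total - counts["unknown"]  # checks with a definite verdict
    coverage = decided / total if total else 0.0
    effective_pass = counts["pass"] / decided if decided else 0.0
    return {
        "total": total,
        "decided": decided,
        "coverage": coverage,
        "effective_pass": effective_pass,
    }


# Label counts taken from the scene summary: 6 pass, 1 fail, 5 unknown.
labels = ["pass"] * 6 + ["fail"] + ["unknown"] * 5
summary = summarize(labels)
print(f"coverage {summary['coverage']:.2%}")
print(f"effective pass {summary['effective_pass']:.2%}")
```

Under these assumptions the reported score of 0.857 equals the effective pass rate rounded to three decimals.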
Evidence
interaction_clearance__bedroom_bedroom_bench_0__to__bedroom_shelf_0

Judgement
butter dish · containable, openable, graspable VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room. Evaluate only the `interaction_clearance` metric for `bedroom_jewelry_catchall_dish_0` with affordance `openable`. Relevant context: bedroom_nightstand_1, bedroom_nightstand_0. First infer the most likely operating side from door panels, handles, hinges, or the side that faces into the room. Then check only that side for a normal standing zone and immediate hand/door operating clearance. Do not assume a large hinged swing arc unless the render clearly shows hinged doors needing that arc. Do not require the entire front face to be empty; a nearby bench or seat off to one side is acceptable when the floor directly before the door/handle area remains usable. A narrow diagonal side gap or a squeeze-by path does not count as clear operating space. If you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. If the operating side cannot be identified from the evidence, return `unknown` instead of guessing.
Reason: In the local operating zone render and the highlighted local operating zone context render, the red-marked subject appears as a small oval object positioned very low near the wall and below/adjacent to the green-highlighted nightstand, not as a clearly identifiable butter-dish-like openable object with a readable front operating side. The overview renders show the same subject near the floor/wall area rather than clearly on a nightstand surface. Because the visible geometry conflicts with the provided asset fact that this subject should be judged as an openable butter dish with a usable front side, the operating side cannot be identified reliably, so interaction clearance cannot be judged from the evidence without guessing.
Blocking Objects: none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes.

Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.

Workflow:
1. Read the task instruction and target metric.
2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.
3. Local render order and intended use: 1. `local_operating_zone` is the clean RGB view for the most likely door-facing standing and opening zone. 2. `local_operating_zone_context` is a slightly wider highlighted context view of the same operating side. 3. `local_depth_probe` is an oblique RGB view meant to clarify depth and whether a blocker sits in front of the operating zone. 4. `local_plan_context` is a high local layout view used to check whether a nearby object actually spans or occupies the operating/approach zone rather than merely overlapping in perspective.
4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.
5. Evaluate ONLY the requested metric and return a JSON object matching the schema.

Scene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.
Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Metric: interaction_clearance
Subject object: bedroom_jewelry_catchall_dish_0
Target objects: bedroom_nightstand_1, bedroom_nightstand_0

Verified asset facts:
- subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, affordances [containable, openable, graspable], usable face `front`, benchmark relevance `functional`, confidence `0.55`, source `asset_annotation`
- interaction facts: placement class `unknown`, access type `none`, interaction height `none`, surface map `front: [access_contents, open, grasp]`
- low-confidence candidates: `none`
Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.

Scene evidence object hints:
- subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, group `unknown`, keywords [butter dish], affordances [containable, openable, graspable], task targets [table, desk, nightstand, coffee_table, shelf, cabinet]
- target `bedroom_nightstand_1`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object]
- target `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object]

Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):
- subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, room `bedroom`, placement `floor`, footprint `tiny`, height `very_low`, elevation `raised`
Target cues:
- target `bedroom_nightstand_1`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `near`
- target `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `far`
Nearby same-room context objects that may affect access or clearance:
- `bedroom_hardcover_book_0`: category `book`, placement `floor`, proximity `touching`, bbox_relation `very_close`, footprint `tiny`
- `bedroom_clear_glass_tumbler_0`: category `tumbler`, placement `floor`, proximity `touching`, bbox_relation `very_close`, footprint `tiny`
- `bedroom_water_carafe_0`: category `carafe`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `small`
- `bedroom_wall_sconce_0`: category `wall_sconce_wall`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `small`
- `bedroom_canvas_0`: category `canvas_canvas`, placement `floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`
- `bedroom_bed_0`: category `bed`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`
Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.

Metric-specific rule:
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `interaction_clearance` metric for `bedroom_jewelry_catchall_dish_0` with affordance `openable`.
Relevant context: bedroom_nightstand_1, bedroom_nightstand_0.
First infer the most likely operating side from door panels, handles, hinges, or the side that faces into the room. Then check only that side for a normal standing zone and immediate hand/door operating clearance. Do not assume a large hinged swing arc unless the render clearly shows hinged doors needing that arc. Do not require the entire front face to be empty; a nearby bench or seat off to one side is acceptable when the floor directly before the door/handle area remains usable. A narrow diagonal side gap or a squeeze-by path does not count as clear operating space.
If you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. If the operating side cannot be identified from the evidence, return `unknown` instead of guessing.

Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.
Evidence guardrails:
- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.
- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.
- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.
- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.
- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.
Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.
Allowed labels: pass, degraded, fail, unknown.
2D perspective overlap alone is not enough evidence of a blocker.
Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.
If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.
If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
  "check_id": "interaction_clearance__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0",
  "metric": "interaction_clearance",
  "subject_id": "bedroom_jewelry_catchall_dish_0",
  "target_ids": [
    "bedroom_nightstand_1",
    "bedroom_nightstand_0"
  ],
  "prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local_operating_zone` is the clean RGB view for the most likely door-facing standing and opening zone. 2. `local_operating_zone_context` is a slightly wider highlighted context view of the same operating side. 3. `local_depth_probe` is an oblique RGB view meant to clarify depth and whether a blocker sits in front of the operating zone. 4. `local_plan_context` is a high local layout view used to check whether a nearby object actually spans or occupies the operating/approach zone rather than merely overlapping in perspective.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. 
Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: interaction_clearance\nSubject object: bedroom_jewelry_catchall_dish_0\nTarget objects: bedroom_nightstand_1, bedroom_nightstand_0\n\nVerified asset facts:\n- subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, affordances [containable, openable, graspable], usable face `front`, benchmark relevance `functional`, confidence `0.55`, source `asset_annotation`\n- interaction facts: placement class `unknown`, access type `none`, interaction height `none`, surface map `front: [access_contents, open, grasp]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, group `unknown`, keywords [butter dish], affordances [containable, openable, graspable], task targets [table, desk, nightstand, coffee_table, shelf, cabinet]\n- target `bedroom_nightstand_1`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object]\n- target `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object]\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, room `bedroom`, placement `floor`, footprint `tiny`, height `very_low`, elevation `raised`\nTarget cues:\n- target `bedroom_nightstand_1`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `near`\n- 
target `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `far`\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_hardcover_book_0`: category `book`, placement `floor`, proximity `touching`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_clear_glass_tumbler_0`: category `tumbler`, placement `floor`, proximity `touching`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_water_carafe_0`: category `carafe`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `small`\n- `bedroom_wall_sconce_0`: category `wall_sconce_wall`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `small`\n- `bedroom_canvas_0`: category `canvas_canvas`, placement `floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`\n- `bedroom_bed_0`: category `bed`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`\nDo not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `interaction_clearance` metric for `bedroom_jewelry_catchall_dish_0` with affordance `openable`.\nRelevant context: bedroom_nightstand_1, bedroom_nightstand_0.\nFirst infer the most likely operating side from door panels, handles, hinges, or the side that faces into the room. Then check only that side for a normal standing zone and immediate hand/door operating clearance. Do not assume a large hinged swing arc unless the render clearly shows hinged doors needing that arc. 
Do not require the entire front face to be empty; a nearby bench or seat off to one side is acceptable when the floor directly before the door/handle area remains usable. A narrow diagonal side gap or a squeeze-by path does not count as clear operating space.\nIf you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. If the operating side cannot be identified from the evidence, return `unknown` instead of guessing.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, 
confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
  "files": {
    "render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/interaction_clearance__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0_local_operating_zone.png",
    "render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/interaction_clearance__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0_local_operating_zone_context.png",
    "render_check_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/interaction_clearance__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0_local_depth_probe.png",
    "render_check_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/interaction_clearance__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0_local_plan_context.png",
    "render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_00_rgb.png",
    "render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_01_rgb.png",
    "render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_02_rgb.png",
    "render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_03_rgb.png"
  }
}
Result JSON
{
  "check_id": "interaction_clearance__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0",
  "metric": "interaction_clearance",
  "label": "unknown",
  "asset_fact_used": true,
  "asset_fact_confidence": 0.55,
  "evidence_conflict": true,
  "reason": "In the local operating zone render and the highlighted local operating zone context render, the red-marked subject appears as a small oval object positioned very low near the wall and below/adjacent to the green-highlighted nightstand, not as a clearly identifiable butter-dish-like openable object with a readable front operating side. The overview renders show the same subject near the floor/wall area rather than clearly on a nightstand surface. Because the visible geometry conflicts with the provided asset fact that this subject should be judged as an openable butter dish with a usable front side, the operating side cannot be identified reliably, so interaction clearance cannot be judged from the evidence without guessing.",
  "blocking_objects": [],
  "confidence": 0.86
}
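The prompt fixes an output contract: nine keys and four allowed labels. A result like the one above can be checked against that contract mechanically. The following is a minimal sketch only; `validate_result` is a hypothetical helper, not part of the benchmark tooling, and the non-empty `reason` check for `degraded`/`fail` is a rough stand-in for the rule that the reason must name a direct blocker or constrained region.

```python
import json

# Keys and labels as listed in the prompt's "Output JSON with keys" and
# "Allowed labels" lines.
REQUIRED_KEYS = {
    "check_id", "metric", "label", "asset_fact_used", "asset_fact_confidence",
    "evidence_conflict", "reason", "blocking_objects", "confidence",
}
ALLOWED_LABELS = {"pass", "degraded", "fail", "unknown"}


def validate_result(raw: str) -> list:
    """Return a list of contract problems found in a result JSON string."""
    result = json.loads(raw)
    problems = []
    missing = REQUIRED_KEYS - set(result)
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))
    label = result.get("label")
    if label not in ALLOWED_LABELS:
        problems.append(f"label {label!r} is not allowed")
    # Heuristic (assumption): only checks that a reason is present at all.
    if label in {"degraded", "fail"} and not str(result.get("reason", "")).strip():
        problems.append("degraded/fail needs a reason naming the blocker or region")
    return problems


# Abbreviated copy of the result above; the reason text is shortened here.
example = json.dumps({
    "check_id": "interaction_clearance__bedroom_jewelry_catchall_dish_0"
                "__to__bedroom_nightstand_1__bedroom_nightstand_0",
    "metric": "interaction_clearance",
    "label": "unknown",
    "asset_fact_used": True,
    "asset_fact_confidence": 0.55,
    "evidence_conflict": True,
    "reason": "operating side cannot be identified reliably",
    "blocking_objects": [],
    "confidence": 0.86,
})
print(validate_result(example))
```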
Evidence
interaction_clearance__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0

Judgement
nightstand · supportable VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room. Evaluate only the `affordance_exposure` metric for `bedroom_nightstand_0` with affordance `supportable`. Relevant context: the surrounding scene. Check whether the usable top surface is visible enough to judge placement or reach. Do not require all table edges to be exposed. If you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.
Reason: In the local_context render, the subject nightstand is highlighted in red as a small box-like object near the wall, but its usable top surface is not clearly presented from above. The clean local render is dominated by another large purple object in the foreground, so it is unclear whether the nightstand's actual top surface is the visible surface or whether the subject is partially out of view. The overview renders show the room layout, but they also do not cleanly expose the highlighted nightstand's top surface enough to judge placement/reach visibility with confidence. Because the decisive top surface is not clearly identifiable across the local views, the affordance exposure is uncertain rather than clearly blocked or visible fully enough to pass decisively.
Blocking Objects: none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes.

Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.

Workflow:
1. Read the task instruction and target metric.
2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.
3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.
4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.
5. Evaluate ONLY the requested metric and return a JSON object matching the schema.

Scene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.
Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Metric: affordance_exposure
Subject object: bedroom_nightstand_0
Target objects: none

Verified asset facts:
- subject `bedroom_nightstand_0`: category `nightstand`, affordances [supportable], usable face `top`, benchmark relevance `functional`, confidence `0.60`, source `asset_annotation`
- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `top: [place_or_reach_objects]`
- low-confidence candidates: `none`
Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.

Scene evidence object hints:
- subject `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], affordances [supportable], task targets [graspable_object]
- targets: none

Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):
- subject `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`
Target cues:
- none
Nearby same-room context objects that may affect access or clearance:
- `bedroom_alarm_clock_0`: category `alarm clock`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`
- `bedroom_smartphone_0`: category `smartphone_smartphone`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`
- `bedroom_paperback_book_0`: category `book`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`
- `bedroom_recessed_light_1`: category `pathway light`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `tiny`
- `bedroom_wall_sconce_1`: category `wall_sconce_wall`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `small`
- `bedroom_canvas_0`: category `canvas_canvas`, placement `floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`
Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.

Metric-specific rule:
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `affordance_exposure` metric for `bedroom_nightstand_0` with affordance `supportable`.
Relevant context: the surrounding scene.
Check whether the usable top surface is visible enough to judge placement or reach. Do not require all table edges to be exposed.
If you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.

Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.
Evidence guardrails:
- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.
- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.
- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.
- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.
- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.
Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.
Allowed labels: pass, degraded, fail, unknown.
2D perspective overlap alone is not enough evidence of a blocker.
Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.
If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.
If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
  "check_id": "affordance_exposure__bedroom_nightstand_0",
  "metric": "affordance_exposure",
  "subject_id": "bedroom_nightstand_0",
  "target_ids": [],
  "prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: affordance_exposure\nSubject object: bedroom_nightstand_0\nTarget objects: none\n\nVerified asset facts:\n- subject `bedroom_nightstand_0`: category `nightstand`, affordances [supportable], usable face `top`, benchmark relevance `functional`, confidence `0.60`, source `asset_annotation`\n- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `top: [place_or_reach_objects]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. 
If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], affordances [supportable], task targets [graspable_object]\n- targets: none\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`\nTarget cues:\n- none\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_alarm_clock_0`: category `alarm clock`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`\n- `bedroom_smartphone_0`: category `smartphone_smartphone`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`\n- `bedroom_paperback_book_0`: category `book`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`\n- `bedroom_recessed_light_1`: category `pathway light`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_wall_sconce_1`: category `wall_sconce_wall`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `small`\n- `bedroom_canvas_0`: category `canvas_canvas`, placement `floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`\nDo not estimate exact centimeter clearance from these coarse cues. 
They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `affordance_exposure` metric for `bedroom_nightstand_0` with affordance `supportable`.\nRelevant context: the surrounding scene.\nCheck whether the usable top surface is visible enough to judge placement or reach. Do not require all table edges to be exposed.\nIf you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should not 
override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
  "files": {
    "render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/affordance_exposure__bedroom_nightstand_0_local.png",
    "render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/affordance_exposure__bedroom_nightstand_0_local_context.png",
    "render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_00_rgb.png",
    "render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_01_rgb.png",
    "render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_02_rgb.png",
    "render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_03_rgb.png"
  }
}
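The `check_id` values in these requests appear to follow a simple pattern: metric, subject, and, when targets exist, a `to` segment followed by the target ids, all joined with double underscores. The sketch below reproduces the two ids seen in this report. `build_check_id` is a hypothetical name, and the pattern is inferred from the examples rather than confirmed by the tool's source.

```python
def build_check_id(metric, subject_id, target_ids=()):
    """Join metric, subject and optional targets with '__' (inferred pattern)."""
    parts = [metric, subject_id]
    if target_ids:
        # Targets are introduced by a literal "to" segment in the observed ids.
        parts.append("to")
        parts.extend(target_ids)
    return "__".join(parts)


# The two ids that appear in this report.
no_targets = build_check_id("affordance_exposure", "bedroom_nightstand_0")
with_targets = build_check_id(
    "interaction_clearance",
    "bedroom_jewelry_catchall_dish_0",
    ["bedroom_nightstand_1", "bedroom_nightstand_0"],
)
print(no_targets)
print(with_targets)
```

The render file names under `render_bundle/checks/` reuse the same id with a view suffix such as `_local` or `_local_context`.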
Result JSON
{
  "check_id": "affordance_exposure__bedroom_nightstand_0",
  "metric": "affordance_exposure",
  "label": "unknown",
  "asset_fact_used": true,
  "asset_fact_confidence": 0.6,
  "evidence_conflict": false,
  "reason": "In the local_context render, the subject nightstand is highlighted in red as a small box-like object near the wall, but its usable top surface is not clearly presented from above. The clean local render is dominated by another large purple object in the foreground, so it is unclear whether the nightstand's actual top surface is the visible surface or whether the subject is partially out of view. The overview renders show the room layout, but they also do not cleanly expose the highlighted nightstand's top surface enough to judge placement/reach visibility with confidence. Because the decisive top surface is not clearly identifiable across the local views, the affordance exposure is uncertain rather than clearly blocked or visible fully enough to pass decisively.",
  "blocking_objects": [],
  "confidence": 0.73
}
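An `unknown` label such as this one seems to reduce coverage rather than the pass rate. The scene summary figures (12 checks, 6 pass, 1 fail, 5 unknown, coverage 58.33%, effective pass 85.71%) are consistent with the sketch below. The formulas are an assumption inferred from those numbers: coverage is taken as decided checks over all checks, and effective pass as passes over decided checks. How `degraded` is weighted cannot be determined from this scene, since its count is 0; here it is simply counted as decided but not passing.

```python
from collections import Counter


def summarize(labels):
    """Aggregate per-check labels into coverage and effective pass (percent)."""
    counts = Counter(labels)
    total = len(labels)
    # Assumption: every label except "unknown" counts as a decided check.
    decided = total - counts["unknown"]
    coverage = decided / total if total else 0.0
    effective_pass = counts["pass"] / decided if decided else 0.0
    return {
        "total": total,
        "coverage": round(coverage * 100, 2),
        "effective_pass": round(effective_pass * 100, 2),
    }


# Label counts taken from the scene summary of this report.
labels = ["pass"] * 6 + ["fail"] * 1 + ["unknown"] * 5
print(summarize(labels))
```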
Evidence
affordance_exposure__bedroom_nightstand_0

Judgement
nightstand · supportable VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room. Evaluate only the `affordance_exposure` metric for `bedroom_nightstand_1` with affordance `supportable`. Relevant context: the surrounding scene. Check whether the usable top surface is visible enough to judge placement or reach. Do not require all table edges to be exposed. If you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.
Reason: In the local render, the nightstand's top surface is directly visible in the foreground, with a broad horizontal usable area exposed enough to judge object placement or reach. The local_context and overview renders also show the nightstand area from surrounding angles, and no solid object is visibly covering the top functional region. Because the supportable top is visibly exposed rather than hidden, this affordance exposure check passes.
Blocking Objects: none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes.

Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.

Workflow:
1. Read the task instruction and target metric.
2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.
3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.
4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.
5. Evaluate ONLY the requested metric and return a JSON object matching the schema.

Scene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.
Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Metric: affordance_exposure
Subject object: bedroom_nightstand_1
Target objects: none

Verified asset facts:
- subject `bedroom_nightstand_1`: category `nightstand`, affordances [supportable], usable face `top`, benchmark relevance `functional`, confidence `0.60`, source `asset_annotation`
- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `top: [place_or_reach_objects]`
- low-confidence candidates: `none`
Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.

Scene evidence object hints:
- subject `bedroom_nightstand_1`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], affordances [supportable], task targets [graspable_object]
- targets: none

Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):
- subject `bedroom_nightstand_1`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`
Target cues:
- none
Nearby same-room context objects that may affect access or clearance:
- `bedroom_jewelry_catchall_dish_0`: category `butter dish`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`
- `bedroom_water_carafe_0`: category `carafe`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `small`
- `bedroom_clear_glass_tumbler_0`: category `tumbler`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`
- `bedroom_hardcover_book_0`: category `book`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `tiny`
- `bedroom_wall_sconce_0`: category `wall_sconce_wall`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `small`
- `bedroom_canvas_0`: category `canvas_canvas`, placement `floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`
Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.

Metric-specific rule:
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `affordance_exposure` metric for `bedroom_nightstand_1` with affordance `supportable`.
Relevant context: the surrounding scene.
Check whether the usable top surface is visible enough to judge placement or reach. Do not require all table edges to be exposed.
If you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.

Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.
Evidence guardrails:
- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.
- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.
- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.
- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.
- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.
Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.
Allowed labels: pass, degraded, fail, unknown.
2D perspective overlap alone is not enough evidence of a blocker.
Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.
If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.
If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
  "check_id": "affordance_exposure__bedroom_nightstand_1",
  "metric": "affordance_exposure",
  "subject_id": "bedroom_nightstand_1",
  "target_ids": [],
  "prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: affordance_exposure\nSubject object: bedroom_nightstand_1\nTarget objects: none\n\nVerified asset facts:\n- subject `bedroom_nightstand_1`: category `nightstand`, affordances [supportable], usable face `top`, benchmark relevance `functional`, confidence `0.60`, source `asset_annotation`\n- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `top: [place_or_reach_objects]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. 
If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_nightstand_1`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], affordances [supportable], task targets [graspable_object]\n- targets: none\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_nightstand_1`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`\nTarget cues:\n- none\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_jewelry_catchall_dish_0`: category `butter dish`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`\n- `bedroom_water_carafe_0`: category `carafe`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `small`\n- `bedroom_clear_glass_tumbler_0`: category `tumbler`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`\n- `bedroom_hardcover_book_0`: category `book`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_wall_sconce_0`: category `wall_sconce_wall`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `small`\n- `bedroom_canvas_0`: category `canvas_canvas`, placement `floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`\nDo not estimate exact centimeter clearance from these coarse cues. 
They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `affordance_exposure` metric for `bedroom_nightstand_1` with affordance `supportable`.\nRelevant context: the surrounding scene.\nCheck whether the usable top surface is visible enough to judge placement or reach. Do not require all table edges to be exposed.\nIf you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should not 
override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
  "files": {
    "render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/affordance_exposure__bedroom_nightstand_1_local.png",
    "render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/affordance_exposure__bedroom_nightstand_1_local_context.png",
    "render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_00_rgb.png",
    "render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_01_rgb.png",
    "render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_02_rgb.png",
    "render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_03_rgb.png"
  }
}
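The prompt states that the image order is fixed: local check renders first, then the overview renders. A minimal sketch of how the `files` map above could be turned into that order follows. It assumes the key pattern seen in this request (`render_check_image`, `render_overview_image`, each with an optional `_NN` suffix); the function name `ordered_images` is hypothetical and not part of the benchmark.

```python
import re

# Assumed key pattern, inferred only from the request shown above.
_KEY = re.compile(r"^render_(check|overview)_image(?:_(\d+))?$")


def ordered_images(files: dict[str, str]) -> list[str]:
    """Return image paths with local check renders first, then overviews."""

    def sort_key(name: str) -> tuple[int, int]:
        match = _KEY.match(name)
        if match is None:
            raise ValueError(f"unexpected file key: {name}")
        group = 0 if match.group(1) == "check" else 1  # check renders come first
        index = int(match.group(2) or 0)  # a missing suffix is treated as index 0
        return (group, index)

    return [files[name] for name in sorted(files, key=sort_key)]
```

Sorting on a `(group, index)` tuple keeps the result independent of the key order in the JSON object.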
Result JSON
{
  "check_id": "affordance_exposure__bedroom_nightstand_1",
  "metric": "affordance_exposure",
  "label": "pass",
  "asset_fact_used": true,
  "asset_fact_confidence": 0.6,
  "evidence_conflict": false,
  "reason": "In the local render, the nightstand's top surface is directly visible in the foreground, with a broad horizontal usable area exposed enough to judge object placement or reach. The local_context and overview renders also show the nightstand area from surrounding angles, and no solid object is visibly covering the top functional region. Because the supportable top is visibly exposed rather than hidden, this affordance exposure check passes.",
  "blocking_objects": [],
  "confidence": 0.61
}
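The prompt lists the expected output keys and the allowed labels, so a result like the one above can be checked mechanically. The sketch below is illustrative only: `validate_result` is a hypothetical helper, and treating an empty `blocking_objects` list as a problem for `degraded` or `fail` is an assumption, since the prompt also accepts a reason that names a constrained functional region instead of an object.

```python
REQUIRED_KEYS = {
    "check_id", "metric", "label", "asset_fact_used", "asset_fact_confidence",
    "evidence_conflict", "reason", "blocking_objects", "confidence",
}
ALLOWED_LABELS = {"pass", "degraded", "fail", "unknown"}


def validate_result(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the result looks well-formed."""
    problems = []
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    label = result.get("label")
    if label not in ALLOWED_LABELS:
        problems.append(f"label {label!r} is not allowed")
    # Assumption: a named blocker is approximated by a non-empty list here.
    if label in {"degraded", "fail"} and not result.get("blocking_objects"):
        problems.append("degraded/fail without a listed blocking object")
    return problems
```

For the result shown above (label `pass`, empty `blocking_objects`), the function returns an empty list.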
Evidence
affordance_exposure__bedroom_nightstand_1

Judgement
bed · sittable, supportable, containable VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room. Evaluate only the `spatial_accessibility` metric for `bedroom_bed_0` with affordance `sittable`. Relevant context: bedroom_rug_0, bedroom_nightstand_0. Check whether a person can physically walk up to the seating edge and sit from the front or from one genuinely open side. Require one clearly usable approach corridor from open room space all the way to the seat edge itself, not merely to the furniture cluster or to an outer corner of the seat. A standard coffee table in front of a sofa or loveseat is normal unless the local renders show that it closes off the usable seating edge. Consider the combined effect of nearby furniture on the front and side approach zones only when that pinch is visually evident in the local renders. Close adjacency to a bed or wall is acceptable only if one normal sitting approach remains. If you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers.
Reason: In the local render, the bed’s left/front-left seating edge remains reachable from open floor space, with the nightstand only occupying a normal bedside position rather than closing that approach. In the overview renders, the wardrobe/cabinet on the right and the bench near the foot constrain other sides, but they do not eliminate the clearly usable approach corridor to the exposed left side of the bed and the seat edge itself. This leaves one practical side approach for walking up and sitting on the bed now.
Blocking Objects: none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes.

Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.

Workflow:
1. Read the task instruction and target metric.
2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.
3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.
4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.
5. Evaluate ONLY the requested metric and return a JSON object matching the schema.

Scene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.
Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Metric: spatial_accessibility
Subject object: bedroom_bed_0
Target objects: bedroom_rug_0, bedroom_nightstand_0

Verified asset facts:
- subject `bedroom_bed_0`: category `bed`, affordances [sittable, supportable, containable], usable face `top`, benchmark relevance `functional`, confidence `0.52`, source `asset_annotation`
- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [sit, access_contents]; top: [place_or_reach_objects]`
- low-confidence candidates: `none`
Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.

Scene evidence object hints:
- subject `bedroom_bed_0`: category `bed`, group `sleeping`, keywords [bed, bedframe, mattress, sleeping area], affordances [sittable, supportable, containable], task targets [desk, table]
- target `bedroom_rug_0`: category `rug`, group `soft_furnishing`, keywords [rug, carpet, floor mat], task targets [graspable_object]
- target `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object]

Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):
- subject `bedroom_bed_0`: category `bed`, room `bedroom`, placement `floor`, footprint `large`, height `tall`, elevation `floor_level`
Target cues:
- target `bedroom_rug_0`: category `rug`, room `bedroom`, placement `floor`, footprint `large`, height `tall`, elevation `floor_level`, subject_proximity `touching`
- target `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `mid`
Nearby same-room context objects that may affect access or clearance:
- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `medium`
- `bedroom_alarm_clock_0`: category `alarm clock`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `tiny`
- `bedroom_nightstand_1`: category `nightstand`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `small`
- `bedroom_bedroom_bench_0`: category `bench`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`
- `bedroom_canvas_0`: category `canvas_canvas`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`
- `bedroom_jewelry_catchall_dish_0`: category `butter dish`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `tiny`
Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.

Metric-specific rule:
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `spatial_accessibility` metric for `bedroom_bed_0` with affordance `sittable`.
Relevant context: bedroom_rug_0, bedroom_nightstand_0.
Check whether a person can physically walk up to the seating edge and sit from the front or from one genuinely open side. Require one clearly usable approach corridor from open room space all the way to the seat edge itself, not merely to the furniture cluster or to an outer corner of the seat. A standard coffee table in front of a sofa or loveseat is normal unless the local renders show that it closes off the usable seating edge. Consider the combined effect of nearby furniture on the front and side approach zones only when that pinch is visually evident in the local renders. Close adjacency to a bed or wall is acceptable only if one normal sitting approach remains.
If you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers.

Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.
Evidence guardrails:
- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.
- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.
- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.
- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.
- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.
Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.
Allowed labels: pass, degraded, fail, unknown.
2D perspective overlap alone is not enough evidence of a blocker.
Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.
If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.
If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
  "check_id": "spatial_accessibility__bedroom_bed_0__to__bedroom_rug_0__bedroom_nightstand_0",
  "metric": "spatial_accessibility",
  "subject_id": "bedroom_bed_0",
  "target_ids": [
    "bedroom_rug_0",
    "bedroom_nightstand_0"
  ],
  "prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: spatial_accessibility\nSubject object: bedroom_bed_0\nTarget objects: bedroom_rug_0, bedroom_nightstand_0\n\nVerified asset facts:\n- subject `bedroom_bed_0`: category `bed`, affordances [sittable, supportable, containable], usable face `top`, benchmark relevance `functional`, confidence `0.52`, source `asset_annotation`\n- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [sit, access_contents]; top: [place_or_reach_objects]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. 
If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_bed_0`: category `bed`, group `sleeping`, keywords [bed, bedframe, mattress, sleeping area], affordances [sittable, supportable, containable], task targets [desk, table]\n- target `bedroom_rug_0`: category `rug`, group `soft_furnishing`, keywords [rug, carpet, floor mat], task targets [graspable_object]\n- target `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object]\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_bed_0`: category `bed`, room `bedroom`, placement `floor`, footprint `large`, height `tall`, elevation `floor_level`\nTarget cues:\n- target `bedroom_rug_0`: category `rug`, room `bedroom`, placement `floor`, footprint `large`, height `tall`, elevation `floor_level`, subject_proximity `touching`\n- target `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `mid`\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `medium`\n- `bedroom_alarm_clock_0`: category `alarm clock`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `tiny`\n- `bedroom_nightstand_1`: category `nightstand`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `small`\n- `bedroom_bedroom_bench_0`: category `bench`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`\n- `bedroom_canvas_0`: category `canvas_canvas`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`\n- `bedroom_jewelry_catchall_dish_0`: 
category `butter dish`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `tiny`\nDo not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `spatial_accessibility` metric for `bedroom_bed_0` with affordance `sittable`.\nRelevant context: bedroom_rug_0, bedroom_nightstand_0.\nCheck whether a person can physically walk up to the seating edge and sit from the front or from one genuinely open side. Require one clearly usable approach corridor from open room space all the way to the seat edge itself, not merely to the furniture cluster or to an outer corner of the seat. A standard coffee table in front of a sofa or loveseat is normal unless the local renders show that it closes off the usable seating edge. Consider the combined effect of nearby furniture on the front and side approach zones only when that pinch is visually evident in the local renders. Close adjacency to a bed or wall is acceptable only if one normal sitting approach remains.\nIf you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. 
If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
  "files": {
    "render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/spatial_accessibility__bedroom_bed_0__to__bedroom_rug_0__bedroom_nightstand_0_local.png",
    "render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/spatial_accessibility__bedroom_bed_0__to__bedroom_rug_0__bedroom_nightstand_0_local_context.png",
    "render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_00_rgb.png",
    "render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_01_rgb.png",
    "render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_02_rgb.png",
    "render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_03_rgb.png"
  }
}
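The check IDs in this report appear to follow a simple pattern: the metric, then the subject, then an optional `__to__` segment listing the targets, all joined with double underscores. The sketch below reproduces that pattern as inferred from the IDs shown here; `make_check_id` is a hypothetical helper, and the actual generator may differ.

```python
def make_check_id(metric: str, subject_id: str, target_ids: list[str]) -> str:
    """Compose a check ID in the form seen in this report (inferred, not confirmed)."""
    parts = [metric, subject_id]
    if target_ids:
        # Targets follow a literal "to" segment, each separated by "__".
        parts.append("to")
        parts.extend(target_ids)
    return "__".join(parts)


request = {
    "metric": "spatial_accessibility",
    "subject_id": "bedroom_bed_0",
    "target_ids": ["bedroom_rug_0", "bedroom_nightstand_0"],
}
check_id = make_check_id(
    request["metric"], request["subject_id"], request["target_ids"]
)
```

The local render filenames in the `files` map look like this ID with a `_local.png` or `_local_context.png` suffix, which is consistent with the same naming scheme.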
Result JSON
{
  "check_id": "spatial_accessibility__bedroom_bed_0__to__bedroom_rug_0__bedroom_nightstand_0",
  "metric": "spatial_accessibility",
  "label": "pass",
  "asset_fact_used": true,
  "asset_fact_confidence": 0.52,
  "evidence_conflict": false,
  "reason": "In the local render, the bed’s left/front-left seating edge remains reachable from open floor space, with the nightstand only occupying a normal bedside position rather than closing that approach. In the overview renders, the wardrobe/cabinet on the right and the bench near the foot constrain other sides, but they do not eliminate the clearly usable approach corridor to the exposed left side of the bed and the seat edge itself. This leaves one practical side approach for walking up and sitting on the bed now.",
  "blocking_objects": [],
  "confidence": 0.72
}
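Each result's `label` feeds the scene summary at the top of the report (12 checks: 6 pass, 0 degraded, 1 fail, 5 unknown). The sketch below shows one reading that reproduces the reported coverage of 58.33% and effective pass rate of 85.71%. It assumes coverage is the share of checks with a label other than `unknown`, and effective pass is the share of `pass` among those. How `degraded` is weighted in the final score is not stated in the report, so this sketch does not model it.

```python
from collections import Counter


def summarize(labels: list[str]) -> dict[str, float]:
    """Compute coverage and effective pass rate under the assumptions stated above."""
    counts = Counter(labels)
    total = len(labels)
    judged = total - counts["unknown"]  # checks with a definite label
    coverage = judged / total if total else 0.0
    effective_pass = counts["pass"] / judged if judged else 0.0
    return {"coverage": coverage, "effective_pass": effective_pass}


# Label counts taken from the scene summary of this report.
labels = ["pass"] * 6 + ["fail"] * 1 + ["unknown"] * 5
summary = summarize(labels)
```

With these counts, coverage is 7/12 and effective pass is 6/7, which match the reported percentages after rounding to two decimals.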
Evidence
spatial_accessibility__bedroom_bed_0__to__bedroom_rug_0__bedroom_nightstand_0

Judgement
wardrobe · openable, containable, supportable, drawable VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room. Evaluate only the `spatial_accessibility` metric for `bedroom_wardrobe_0` with affordance `openable`. Relevant context: the surrounding scene. Check whether at least one practical front-facing or slightly offset standing position can reach the operating side so the object can be opened. A nearby bed corner or bench should not cause degraded by itself if a normal reach from the front region is still possible. Use degraded only when that front-region approach is directly pinched, occupied, or effectively unusable across the usable operating side. Do not count a cramped diagonal squeeze, shoulder-turned sidestep, or tiny side sliver as a valid approach path. If you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. If the operating side cannot be identified from the evidence, return `unknown` instead of guessing.
Reason: In the local_plan_context render, the red-highlighted wardrobe has open floor directly in front of its operating side, with the bed only offset near the lower-right edge rather than spanning the full front approach. In the local_operating_zone_context and local_depth_probe views, no object is shown occupying the standing/opening zone in front of the wardrobe, so at least one practical front-facing or slightly offset approach position remains usable now. The overview renders are consistent with this and do not show a clearer front-path blockage at the wardrobe corner of the room.
Blocking Objects: none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes.

Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.

Workflow:
1. Read the task instruction and target metric.
2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.
3. Local render order and intended use: 1. `local_operating_zone` is the clean RGB view for the most likely door-facing standing and opening zone. 2. `local_operating_zone_context` is a slightly wider highlighted context view of the same operating side. 3. `local_depth_probe` is an oblique RGB view meant to clarify depth and whether a blocker sits in front of the operating zone. 4. `local_plan_context` is a high local layout view used to check whether a nearby object actually spans or occupies the operating/approach zone rather than merely overlapping in perspective.
4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.
5. Evaluate ONLY the requested metric and return a JSON object matching the schema.

Scene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.
Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Metric: spatial_accessibility
Subject object: bedroom_wardrobe_0
Target objects: none

Verified asset facts:
- subject `bedroom_wardrobe_0`: category `wardrobe`, affordances [openable, containable, supportable, drawable], usable face `top`, benchmark relevance `functional`, confidence `0.64`, source `asset_annotation`
- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [open, access_contents, open]; top: [place_or_reach_objects]`
- low-confidence candidates: `none`
Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.

Scene evidence object hints:
- subject `bedroom_wardrobe_0`: category `wardrobe`, group `storage`, keywords [wardrobe, closet, armoire], affordances [openable, containable, supportable, drawable], task targets [room, wall, clear_space]
- targets: none

Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):
- subject `bedroom_wardrobe_0`: category `wardrobe`, room `bedroom`, placement `floor`, footprint `large`, height `mid`, elevation `floor_level`
Target cues:
- none
Nearby same-room context objects that may affect access or clearance:
- `bedroom_art_print_0`: category `art_print_art`, placement `floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`
- `bedroom_smoke_detector_0`: category `smoke detector`, placement `floor`, proximity `mid`, bbox_relation `very_close`, footprint `tiny`
- `bedroom_recessed_light_1`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `very_close`, footprint `tiny`
- `bedroom_rug_0`: category `rug`, placement `floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`
- `bedroom_bed_0`: category `bed`, placement `floor`, proximity `mid`, bbox_relation `very_close`, footprint `large`
- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`
Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.

Metric-specific rule:
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `spatial_accessibility` metric for `bedroom_wardrobe_0` with affordance `openable`.
Relevant context: the surrounding scene.
Check whether at least one practical front-facing or slightly offset standing position can reach the operating side so the object can be opened. A nearby bed corner or bench should not cause degraded by itself if a normal reach from the front region is still possible. Use degraded only when that front-region approach is directly pinched, occupied, or effectively unusable across the usable operating side. Do not count a cramped diagonal squeeze, shoulder-turned sidestep, or tiny side sliver as a valid approach path.
If you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. If the operating side cannot be identified from the evidence, return `unknown` instead of guessing.

Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.
Evidence guardrails:
- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.
- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.
- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.
- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.
- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.
Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.
Allowed labels: pass, degraded, fail, unknown.
2D perspective overlap alone is not enough evidence of a blocker.
Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.
If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.
If evidence is insufficient, use unknown instead of guessing.
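
The output rules above (required keys, allowed labels, and the requirement that a `degraded` or `fail` verdict is justified in its reason) can be checked mechanically before a result is accepted. The following is a minimal sketch, not the benchmark's own validator: the function name `validate_result` is hypothetical, and the non-empty-reason test is only a rough stand-in for "the reason must name a direct blocker or directly constrained functional region", which code cannot verify semantically.

```python
# Keys and labels copied from the prompt's "Output JSON with keys" and
# "Allowed labels" lines. Everything else here is an illustrative assumption.
REQUIRED_KEYS = {
    "check_id", "metric", "label", "asset_fact_used", "asset_fact_confidence",
    "evidence_conflict", "reason", "blocking_objects", "confidence",
}
ALLOWED_LABELS = {"pass", "degraded", "fail", "unknown"}


def validate_result(result: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []

    missing = REQUIRED_KEYS - result.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")

    label = result.get("label")
    if label not in ALLOWED_LABELS:
        problems.append(f"label not allowed: {label!r}")

    # Approximation: a degraded/fail verdict needs at least some stated reason.
    if label in {"degraded", "fail"} and not str(result.get("reason", "")).strip():
        problems.append("degraded/fail result has an empty reason")

    return problems


example = {
    "check_id": "spatial_accessibility__bedroom_wardrobe_0",
    "metric": "spatial_accessibility",
    "label": "pass",
    "asset_fact_used": True,
    "asset_fact_confidence": 0.64,
    "evidence_conflict": False,
    "reason": "Open floor in front of the operating side.",  # shortened for the example
    "blocking_objects": [],
    "confidence": 0.78,
}
print(validate_result(example))
```

A result that fails this check could be retried or recorded as `unknown`, in line with the prompt's preference for `unknown` over guessing; that handling is a design suggestion, not something the report states.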
Raw JSON
Request JSON
{
  "check_id": "spatial_accessibility__bedroom_wardrobe_0",
  "metric": "spatial_accessibility",
  "subject_id": "bedroom_wardrobe_0",
  "target_ids": [],
  "prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local_operating_zone` is the clean RGB view for the most likely door-facing standing and opening zone. 2. `local_operating_zone_context` is a slightly wider highlighted context view of the same operating side. 3. `local_depth_probe` is an oblique RGB view meant to clarify depth and whether a blocker sits in front of the operating zone. 4. `local_plan_context` is a high local layout view used to check whether a nearby object actually spans or occupies the operating/approach zone rather than merely overlapping in perspective.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. 
Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: spatial_accessibility\nSubject object: bedroom_wardrobe_0\nTarget objects: none\n\nVerified asset facts:\n- subject `bedroom_wardrobe_0`: category `wardrobe`, affordances [openable, containable, supportable, drawable], usable face `top`, benchmark relevance `functional`, confidence `0.64`, source `asset_annotation`\n- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [open, access_contents, open]; top: [place_or_reach_objects]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_wardrobe_0`: category `wardrobe`, group `storage`, keywords [wardrobe, closet, armoire], affordances [openable, containable, supportable, drawable], task targets [room, wall, clear_space]\n- targets: none\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_wardrobe_0`: category `wardrobe`, room `bedroom`, placement `floor`, footprint `large`, height `mid`, elevation `floor_level`\nTarget cues:\n- none\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_art_print_0`: category `art_print_art`, placement `floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`\n- `bedroom_smoke_detector_0`: category `smoke detector`, placement `floor`, proximity `mid`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_recessed_light_1`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_rug_0`: category `rug`, placement 
`floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`\n- `bedroom_bed_0`: category `bed`, placement `floor`, proximity `mid`, bbox_relation `very_close`, footprint `large`\n- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`\nDo not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `spatial_accessibility` metric for `bedroom_wardrobe_0` with affordance `openable`.\nRelevant context: the surrounding scene.\nCheck whether at least one practical front-facing or slightly offset standing position can reach the operating side so the object can be opened. A nearby bed corner or bench should not cause degraded by itself if a normal reach from the front region is still possible. Use degraded only when that front-region approach is directly pinched, occupied, or effectively unusable across the usable operating side. Do not count a cramped diagonal squeeze, shoulder-turned sidestep, or tiny side sliver as a valid approach path.\nIf you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. If the operating side cannot be identified from the evidence, return `unknown` instead of guessing.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. 
If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
  "files": {
    "render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/spatial_accessibility__bedroom_wardrobe_0_local_operating_zone.png",
    "render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/spatial_accessibility__bedroom_wardrobe_0_local_operating_zone_context.png",
    "render_check_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/spatial_accessibility__bedroom_wardrobe_0_local_depth_probe.png",
    "render_check_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/spatial_accessibility__bedroom_wardrobe_0_local_plan_context.png",
    "render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_00_rgb.png",
    "render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_01_rgb.png",
    "render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_02_rgb.png",
    "render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_03_rgb.png"
  }
}
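
The prompt states that local check renders always come first, followed by the overview renders. A minimal sketch of how the `files` mapping in a request like the one above could be turned into that fixed order is shown below. It assumes that the numeric suffix on each key (`_01`, `_02`, ...) encodes the intended position and that an unsuffixed key is position 0; the function name `ordered_image_paths` and the shortened paths in the example are hypothetical.

```python
import re

# Matches keys such as "render_check_image", "render_check_image_02",
# "render_overview_image", "render_overview_image_03".
_KEY_PATTERN = re.compile(r"^render_(check|overview)_image(?:_(\d+))?$")
_GROUP_RANK = {"check": 0, "overview": 1}  # local checks before overviews


def ordered_image_paths(files: dict[str, str]) -> list[str]:
    """Return image paths with local check renders first, then overview renders."""
    keyed = []
    for key, path in files.items():
        match = _KEY_PATTERN.match(key)
        if match is None:
            continue  # skip keys that are not render images
        kind = match.group(1)
        index = int(match.group(2) or 0)  # unsuffixed key is treated as index 0
        keyed.append((_GROUP_RANK[kind], index, path))
    return [path for _, _, path in sorted(keyed)]


# Shortened, hypothetical paths; the real request uses absolute paths.
files = {
    "render_overview_image_01": "views/view_01_rgb.png",
    "render_check_image_01": "checks/local_operating_zone_context.png",
    "render_overview_image": "views/view_00_rgb.png",
    "render_check_image": "checks/local_operating_zone.png",
}
for path in ordered_image_paths(files):
    print(path)
```

Sorting on an explicit (group, index) pair avoids relying on dictionary insertion order, which would silently break if the request were rebuilt with keys in a different sequence.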
Result JSON
{
  "check_id": "spatial_accessibility__bedroom_wardrobe_0",
  "metric": "spatial_accessibility",
  "label": "pass",
  "asset_fact_used": true,
  "asset_fact_confidence": 0.64,
  "evidence_conflict": false,
  "reason": "In the local_plan_context render, the red-highlighted wardrobe has open floor directly in front of its operating side, with the bed only offset near the lower-right edge rather than spanning the full front approach. In the local_operating_zone_context and local_depth_probe views, no object is shown occupying the standing/opening zone in front of the wardrobe, so at least one practical front-facing or slightly offset approach position remains usable now. The overview renders are consistent with this and do not show a clearer front-path blockage at the wardrobe corner of the room.",
  "blocking_objects": [],
  "confidence": 0.78
}
Evidence
spatial_accessibility__bedroom_wardrobe_0

Judgement
nightstand · supportable VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room. Evaluate only the `spatial_accessibility` metric for `bedroom_nightstand_0` with affordance `supportable`. Relevant context: the surrounding scene. Check whether a person can approach the room-facing side and reach the usable top surface. Do not require circulation on all sides. If you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.
Reason: In the local_context render, the highlighted nightstand (red) has open floor space on its room-facing side, with the bed beside it but not closing off the approach. The overview renders also show the nightstand reachable from the adjacent open floor area, so a person can approach and reach the usable top surface. No direct blocking object is visible in the local evidence around the required approach zone.
Blocking Objects: none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes.

Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.

Workflow:
1. Read the task instruction and target metric.
2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.
3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.
4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.
5. Evaluate ONLY the requested metric and return a JSON object matching the schema.

Scene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.
Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Metric: spatial_accessibility
Subject object: bedroom_nightstand_0
Target objects: none

Verified asset facts:
- subject `bedroom_nightstand_0`: category `nightstand`, affordances [supportable], usable face `top`, benchmark relevance `functional`, confidence `0.60`, source `asset_annotation`
- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `top: [place_or_reach_objects]`
- low-confidence candidates: `none`
Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.

Scene evidence object hints:
- subject `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], affordances [supportable], task targets [graspable_object]
- targets: none

Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):
- subject `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`
Target cues:
- none
Nearby same-room context objects that may affect access or clearance:
- `bedroom_alarm_clock_0`: category `alarm clock`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`
- `bedroom_smartphone_0`: category `smartphone_smartphone`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`
- `bedroom_paperback_book_0`: category `book`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`
- `bedroom_recessed_light_1`: category `pathway light`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `tiny`
- `bedroom_wall_sconce_1`: category `wall_sconce_wall`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `small`
- `bedroom_canvas_0`: category `canvas_canvas`, placement `floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`
Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.

Metric-specific rule:
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `spatial_accessibility` metric for `bedroom_nightstand_0` with affordance `supportable`.
Relevant context: the surrounding scene.
Check whether a person can approach the room-facing side and reach the usable top surface. Do not require circulation on all sides.
If you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.

Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.
Evidence guardrails:
- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.
- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.
- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.
- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.
- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.
Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.
Allowed labels: pass, degraded, fail, unknown.
2D perspective overlap alone is not enough evidence of a blocker.
Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.
If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.
If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
  "check_id": "spatial_accessibility__bedroom_nightstand_0",
  "metric": "spatial_accessibility",
  "subject_id": "bedroom_nightstand_0",
  "target_ids": [],
  "prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: spatial_accessibility\nSubject object: bedroom_nightstand_0\nTarget objects: none\n\nVerified asset facts:\n- subject `bedroom_nightstand_0`: category `nightstand`, affordances [supportable], usable face `top`, benchmark relevance `functional`, confidence `0.60`, source `asset_annotation`\n- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `top: [place_or_reach_objects]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. 
If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], affordances [supportable], task targets [graspable_object]\n- targets: none\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`\nTarget cues:\n- none\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_alarm_clock_0`: category `alarm clock`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`\n- `bedroom_smartphone_0`: category `smartphone_smartphone`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`\n- `bedroom_paperback_book_0`: category `book`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`\n- `bedroom_recessed_light_1`: category `pathway light`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_wall_sconce_1`: category `wall_sconce_wall`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `small`\n- `bedroom_canvas_0`: category `canvas_canvas`, placement `floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`\nDo not estimate exact centimeter clearance from these coarse cues. 
They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `spatial_accessibility` metric for `bedroom_nightstand_0` with affordance `supportable`.\nRelevant context: the surrounding scene.\nCheck whether a person can approach the room-facing side and reach the usable top surface. Do not require circulation on all sides.\nIf you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should not 
override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
  "files": {
    "render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/spatial_accessibility__bedroom_nightstand_0_local.png",
    "render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/spatial_accessibility__bedroom_nightstand_0_local_context.png",
    "render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_00_rgb.png",
    "render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_01_rgb.png",
    "render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_02_rgb.png",
    "render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_03_rgb.png"
  }
}
Result JSON
{
  "check_id": "spatial_accessibility__bedroom_nightstand_0",
  "metric": "spatial_accessibility",
  "label": "pass",
  "asset_fact_used": true,
  "asset_fact_confidence": 0.6,
  "evidence_conflict": false,
  "reason": "In the local_context render, the highlighted nightstand (red) has open floor space on its room-facing side, with the bed beside it but not closing off the approach. The overview renders also show the nightstand reachable from the adjacent open floor area, so a person can approach and reach the usable top surface. No direct blocking object is visible in the local evidence around the required approach zone.",
  "blocking_objects": [],
  "confidence": 0.82
}
Evidence
spatial_accessibility__bedroom_nightstand_0
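
Each Result JSON contributes one label to the scene summary at the top of the page (Total 12, Pass 6, Degraded 0, Fail 1, Unknown 5, Coverage 58.33%, Eff. Pass 85.71%). Those figures are consistent with coverage being the share of checks that are not `unknown`, and effective pass being the share of passes among the non-`unknown` checks. The sketch below reproduces the arithmetic under that assumption; the report does not state the formulas, the function name `summarize` is hypothetical, and how `degraded` results are weighted cannot be inferred from this scene because none occur.

```python
from collections import Counter


def summarize(labels: list[str]) -> dict:
    """Tally labels and derive coverage and effective pass rate (assumed formulas)."""
    counts = Counter(labels)
    total = len(labels)
    judged = total - counts["unknown"]  # checks with a definite verdict
    coverage = judged / total if total else 0.0
    effective_pass = counts["pass"] / judged if judged else 0.0
    return {
        "total": total,
        "pass": counts["pass"],
        "degraded": counts["degraded"],
        "fail": counts["fail"],
        "unknown": counts["unknown"],
        "coverage_pct": round(100 * coverage, 2),
        "effective_pass_pct": round(100 * effective_pass, 2),
    }


# Label counts taken from the scene summary: 6 pass, 1 fail, 5 unknown.
labels = ["pass"] * 6 + ["fail"] * 1 + ["unknown"] * 5
print(summarize(labels))
```

With these inputs the judged count is 7, so coverage is 7/12 and effective pass is 6/7, which round to the 58.33% and 85.71% shown in the summary.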

Judgement
nightstand · supportable VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room. Evaluate only the `spatial_accessibility` metric for `bedroom_nightstand_1` with affordance `supportable`. Relevant context: the surrounding scene. Check whether a person can approach the room-facing side and reach the usable top surface. Do not require circulation on all sides. If you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.
Reason: In the local_context render, the red-highlighted nightstand sits beside the bed with its room-facing side open, and no object is visibly occupying the immediate standing/reach area in front of it. The overview renders also show the nightstand at the bedside with accessible open floor on the outward side, so a person can approach that side and reach the usable top surface now. Normal bed adjacency does not block this access here.
Blocking Objects: none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes.

Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.

Workflow:
1. Read the task instruction and target metric.
2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.
3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.
4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.
5. Evaluate ONLY the requested metric and return a JSON object matching the schema.

Scene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.
Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Metric: spatial_accessibility
Subject object: bedroom_nightstand_1
Target objects: none

Verified asset facts:
- subject `bedroom_nightstand_1`: category `nightstand`, affordances [supportable], usable face `top`, benchmark relevance `functional`, confidence `0.60`, source `asset_annotation`
- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `top: [place_or_reach_objects]`
- low-confidence candidates: `none`
Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.

Scene evidence object hints:
- subject `bedroom_nightstand_1`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], affordances [supportable], task targets [graspable_object]
- targets: none

Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):
- subject `bedroom_nightstand_1`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`
Target cues:
- none
Nearby same-room context objects that may affect access or clearance:
- `bedroom_jewelry_catchall_dish_0`: category `butter dish`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`
- `bedroom_water_carafe_0`: category `carafe`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `small`
- `bedroom_clear_glass_tumbler_0`: category `tumbler`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`
- `bedroom_hardcover_book_0`: category `book`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `tiny`
- `bedroom_wall_sconce_0`: category `wall_sconce_wall`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `small`
- `bedroom_canvas_0`: category `canvas_canvas`, placement `floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`
Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.

Metric-specific rule:
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `spatial_accessibility` metric for `bedroom_nightstand_1` with affordance `supportable`.
Relevant context: the surrounding scene.
Check whether a person can approach the room-facing side and reach the usable top surface. Do not require circulation on all sides.
If you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.

Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.
Evidence guardrails:
- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.
- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.
- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.
- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.
- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.
Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.
Allowed labels: pass, degraded, fail, unknown.
2D perspective overlap alone is not enough evidence of a blocker.
Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.
If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.
If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
  "check_id": "spatial_accessibility__bedroom_nightstand_1",
  "metric": "spatial_accessibility",
  "subject_id": "bedroom_nightstand_1",
  "target_ids": [],
  "prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: spatial_accessibility\nSubject object: bedroom_nightstand_1\nTarget objects: none\n\nVerified asset facts:\n- subject `bedroom_nightstand_1`: category `nightstand`, affordances [supportable], usable face `top`, benchmark relevance `functional`, confidence `0.60`, source `asset_annotation`\n- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `top: [place_or_reach_objects]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. 
If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_nightstand_1`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], affordances [supportable], task targets [graspable_object]\n- targets: none\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_nightstand_1`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`\nTarget cues:\n- none\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_jewelry_catchall_dish_0`: category `butter dish`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`\n- `bedroom_water_carafe_0`: category `carafe`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `small`\n- `bedroom_clear_glass_tumbler_0`: category `tumbler`, placement `floor`, proximity `near`, bbox_relation `overlapping_or_touching`, footprint `tiny`\n- `bedroom_hardcover_book_0`: category `book`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_wall_sconce_0`: category `wall_sconce_wall`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `small`\n- `bedroom_canvas_0`: category `canvas_canvas`, placement `floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`\nDo not estimate exact centimeter clearance from these coarse cues. 
They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `spatial_accessibility` metric for `bedroom_nightstand_1` with affordance `supportable`.\nRelevant context: the surrounding scene.\nCheck whether a person can approach the room-facing side and reach the usable top surface. Do not require circulation on all sides.\nIf you choose `degraded` or `fail`, name at least one direct blocker or the directly constrained functional region. If no direct blocker is visible, prefer `pass` or `unknown`.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should not 
override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
  "files": {
    "render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/spatial_accessibility__bedroom_nightstand_1_local.png",
    "render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/spatial_accessibility__bedroom_nightstand_1_local_context.png",
    "render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_00_rgb.png",
    "render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_01_rgb.png",
    "render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_02_rgb.png",
    "render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_03_rgb.png"
  }
}
Result JSON
{
  "check_id": "spatial_accessibility__bedroom_nightstand_1",
  "metric": "spatial_accessibility",
  "label": "pass",
  "asset_fact_used": true,
  "asset_fact_confidence": 0.6,
  "evidence_conflict": false,
  "reason": "In the local_context render, the red-highlighted nightstand sits beside the bed with its room-facing side open, and no object is visibly occupying the immediate standing/reach area in front of it. The overview renders also show the nightstand at the bedside with accessible open floor on the outward side, so a person can approach that side and reach the usable top surface now. Normal bed adjacency does not block this access here.",
  "blocking_objects": [],
  "confidence": 0.85
}
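The prompt above fixes the output keys and the four allowed labels, so a result object like this one can be checked mechanically before it is aggregated. The following is a minimal sketch of such a check, not the benchmark's own code: the function name `validate_result` is hypothetical, and the rule applied to `degraded`/`fail` (a non-empty `reason`) is a simplification of the prompt's requirement that the reason name a blocker or constrained region.

```python
# Hypothetical validator for a single check result; the key list and labels
# are copied from the prompt text, everything else is an assumption.
ALLOWED_LABELS = {"pass", "degraded", "fail", "unknown"}
REQUIRED_KEYS = {
    "check_id", "metric", "label", "asset_fact_used", "asset_fact_confidence",
    "evidence_conflict", "reason", "blocking_objects", "confidence",
}


def validate_result(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the result looks well-formed."""
    problems = []
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    label = result.get("label")
    if label not in ALLOWED_LABELS:
        problems.append(f"label not allowed: {label!r}")
    # Simplified stand-in for "the reason must name a direct blocker or
    # directly constrained functional region": only checks that it is non-empty.
    if label in {"degraded", "fail"} and not str(result.get("reason", "")).strip():
        problems.append("degraded/fail requires a non-empty reason")
    return problems


# Values taken from the Result JSON above; the reason is abbreviated here.
example = {
    "check_id": "spatial_accessibility__bedroom_nightstand_1",
    "metric": "spatial_accessibility",
    "label": "pass",
    "asset_fact_used": True,
    "asset_fact_confidence": 0.6,
    "evidence_conflict": False,
    "reason": "(abbreviated for the example)",
    "blocking_objects": [],
    "confidence": 0.85,
}
```

Calling `validate_result(example)` returns an empty list for this result; a result with a label outside the four allowed values, or with missing keys, returns one message per problem.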
Evidence
spatial_accessibility__bedroom_nightstand_1

Judgement
bench · sittable, supportable, pushable, liftable VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room. Evaluate only the `spatial_accessibility` metric for `bedroom_bedroom_bench_0` with affordance `sittable`. Relevant context: bedroom_shelf_0. Check whether a person can physically walk up to the seating edge and sit from the front or from one genuinely open side. Require one clearly usable approach corridor from open room space all the way to the seat edge itself, not merely to the furniture cluster or to an outer corner of the seat. A standard coffee table in front of a sofa or loveseat is normal unless the local renders show that it closes off the usable seating edge. Consider the combined effect of nearby furniture on the front and side approach zones only when that pinch is visually evident in the local renders. Close adjacency to a bed or wall is acceptable only if one normal sitting approach remains. If you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers.
Reason: In the local render and the highlighted local_context view, the subject bench (red/orange) appears tipped onto its side rather than upright, so the normal seating edge is not clearly identifiable. The overview renders also show the same sideways orientation. Because this visually conflicts with the provided asset fact that the bench is currently usable as a sittable object, I cannot reliably judge whether a person can walk up to the seat edge and sit from the front or an open side in the current scene state without guessing about the intended usable face or approach zone now that it is overturned/sideways-positioned.
Blocking Objects: none
Prompt
You are an affordance-aware functional critic for 3D indoor scenes.

Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.

Workflow:
1. Read the task instruction and target metric.
2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.
3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.
4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.
5. Evaluate ONLY the requested metric and return a JSON object matching the schema.

Scene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.
Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Metric: spatial_accessibility
Subject object: bedroom_bedroom_bench_0
Target objects: bedroom_shelf_0

Verified asset facts:
- subject `bedroom_bedroom_bench_0`: category `bench`, affordances [sittable, supportable, pushable, liftable], usable face `top`, benchmark relevance `functional`, confidence `0.67`, source `asset_annotation`
- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [sit]; top: [place_or_reach_objects]`
- low-confidence candidates: `none`
Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.

Scene evidence object hints:
- subject `bedroom_bedroom_bench_0`: category `bench`, group `seating`, keywords [bench, seating bench], affordances [sittable, supportable, pushable, liftable], task targets [desk, table]
- target `bedroom_shelf_0`: category `wall shelf`, group `unknown`, keywords [wall shelf], task targets [graspable_object]

Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):
- subject `bedroom_bedroom_bench_0`: category `bench`, room `bedroom`, placement `floor`, footprint `medium`, height `low`, elevation `floor_level`
Target cues:
- target `bedroom_shelf_0`: category `wall shelf`, room `bedroom`, placement `floor`, footprint `large`, height `low`, elevation `raised`, subject_proximity `mid`
Nearby same-room context objects that may affect access or clearance:
- `bedroom_bud_vase_0`: category `vase`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `tiny`
- `bedroom_smartphone_1`: category `smartphone_smartphone`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `tiny`
- `bedroom_jewelry_catchall_dish_1`: category `butter dish`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `tiny`
- `bedroom_smoke_detector_0`: category `smoke detector`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `tiny`
- `bedroom_paperback_book_1`: category `book`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `tiny`
- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`
Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.

Metric-specific rule:
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `spatial_accessibility` metric for `bedroom_bedroom_bench_0` with affordance `sittable`.
Relevant context: bedroom_shelf_0.
Check whether a person can physically walk up to the seating edge and sit from the front or from one genuinely open side. Require one clearly usable approach corridor from open room space all the way to the seat edge itself, not merely to the furniture cluster or to an outer corner of the seat. A standard coffee table in front of a sofa or loveseat is normal unless the local renders show that it closes off the usable seating edge. Consider the combined effect of nearby furniture on the front and side approach zones only when that pinch is visually evident in the local renders. Close adjacency to a bed or wall is acceptable only if one normal sitting approach remains.
If you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers.

Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.
Evidence guardrails:
- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.
- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.
- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.
- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.
- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.
Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.
Allowed labels: pass, degraded, fail, unknown.
2D perspective overlap alone is not enough evidence of a blocker.
Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.
If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.
If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
  "check_id": "spatial_accessibility__bedroom_bedroom_bench_0__to__bedroom_shelf_0",
  "metric": "spatial_accessibility",
  "subject_id": "bedroom_bedroom_bench_0",
  "target_ids": [
    "bedroom_shelf_0"
  ],
  "prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local` is the main clean RGB local inspection view. 2. `local_context` is a highlighted context view; subject is red and targets are green.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: spatial_accessibility\nSubject object: bedroom_bedroom_bench_0\nTarget objects: bedroom_shelf_0\n\nVerified asset facts:\n- subject `bedroom_bedroom_bench_0`: category `bench`, affordances [sittable, supportable, pushable, liftable], usable face `top`, benchmark relevance `functional`, confidence `0.67`, source `asset_annotation`\n- interaction facts: placement class `floor_furniture`, access type `none`, interaction height `none`, surface map `front: [sit]; top: [place_or_reach_objects]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. 
If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_bedroom_bench_0`: category `bench`, group `seating`, keywords [bench, seating bench], affordances [sittable, supportable, pushable, liftable], task targets [desk, table]\n- target `bedroom_shelf_0`: category `wall shelf`, group `unknown`, keywords [wall shelf], task targets [graspable_object]\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_bedroom_bench_0`: category `bench`, room `bedroom`, placement `floor`, footprint `medium`, height `low`, elevation `floor_level`\nTarget cues:\n- target `bedroom_shelf_0`: category `wall shelf`, room `bedroom`, placement `floor`, footprint `large`, height `low`, elevation `raised`, subject_proximity `mid`\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_bud_vase_0`: category `vase`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `tiny`\n- `bedroom_smartphone_1`: category `smartphone_smartphone`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `tiny`\n- `bedroom_jewelry_catchall_dish_1`: category `butter dish`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `tiny`\n- `bedroom_smoke_detector_0`: category `smoke detector`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `tiny`\n- `bedroom_paperback_book_1`: category `book`, placement `floor`, proximity `mid`, bbox_relation `nearby`, footprint `tiny`\n- `bedroom_semi_flush_light_0`: category `pathway light`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `medium`\nDo not estimate exact centimeter clearance from these coarse cues. 
They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `spatial_accessibility` metric for `bedroom_bedroom_bench_0` with affordance `sittable`.\nRelevant context: bedroom_shelf_0.\nCheck whether a person can physically walk up to the seating edge and sit from the front or from one genuinely open side. Require one clearly usable approach corridor from open room space all the way to the seat edge itself, not merely to the furniture cluster or to an outer corner of the seat. A standard coffee table in front of a sofa or loveseat is normal unless the local renders show that it closes off the usable seating edge. Consider the combined effect of nearby furniture on the front and side approach zones only when that pinch is visually evident in the local renders. Close adjacency to a bed or wall is acceptable only if one normal sitting approach remains.\nIf you choose `degraded` or `fail`, name the blocker or the directly constrained approach or sit/stand zone. Consider the combined effect of nearby furniture only when the local renders clearly show that the front and side access are genuinely pinched or eliminated. A standard coffee table in front of a sofa or loveseat is normal unless the local evidence shows it closes off the usable seating edge. Do not infer exact centimeter clearance from auxiliary geometry or bounding-box numbers.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. 
If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
  "files": {
    "render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/spatial_accessibility__bedroom_bedroom_bench_0__to__bedroom_shelf_0_local.png",
    "render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/spatial_accessibility__bedroom_bedroom_bench_0__to__bedroom_shelf_0_local_context.png",
    "render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_00_rgb.png",
    "render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_01_rgb.png",
    "render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_02_rgb.png",
    "render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_03_rgb.png"
  }
}
Result JSON
{
  "check_id": "spatial_accessibility__bedroom_bedroom_bench_0__to__bedroom_shelf_0",
  "metric": "spatial_accessibility",
  "label": "unknown",
  "asset_fact_used": true,
  "asset_fact_confidence": 0.67,
  "evidence_conflict": true,
  "reason": "In the local render and the highlighted local_context view, the subject bench (red/orange) appears tipped onto its side rather than upright, so the normal seating edge is not clearly identifiable. The overview renders also show the same sideways orientation. Because this visually conflicts with the provided asset fact that the bench is currently usable as a sittable object, I cannot reliably judge whether a person can walk up to the seat edge and sit from the front or an open side in the current scene state without guessing about the intended usable face or approach zone now that it is overturned/sideways-positioned.",
  "blocking_objects": [],
  "confidence": 0.9
}
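An `unknown` label such as this one is not counted as a failure, but it does appear to reduce coverage. The sketch below shows one way the figures in the scene summary at the top of this report could arise from the per-check labels. The formulas are an assumption inferred from those numbers (12 checks, 6 pass, 1 fail, 5 unknown), not taken from the benchmark's source, and how `degraded` is weighted cannot be determined from this scene because its degraded count is 0. The function name `summarize` is hypothetical.

```python
from collections import Counter


def summarize(labels: list[str]) -> dict[str, float]:
    """Assumed aggregation: unknown checks lower coverage, not the pass rate."""
    counts = Counter(labels)
    total = len(labels)
    judged = total - counts["unknown"]  # checks with a definite label
    coverage = judged / total if total else 0.0
    # Assumption: effective pass is the share of passes among judged checks.
    effective_pass = counts["pass"] / judged if judged else 0.0
    return {
        "coverage": round(coverage * 100, 2),
        "effective_pass": round(effective_pass * 100, 2),
    }


# Label counts from the scene summary: 6 pass, 1 fail, 5 unknown.
labels = ["pass"] * 6 + ["fail"] + ["unknown"] * 5
summary = summarize(labels)
```

Under these assumptions `summary` holds a coverage of 58.33 and an effective pass of 85.71, which matches the percentages shown in the scene summary.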
Evidence
spatial_accessibility__bedroom_bedroom_bench_0__to__bedroom_shelf_0

Judgement
butter dish · containable, openable, graspable VLM
Question
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room. Evaluate only the `spatial_accessibility` metric for `bedroom_jewelry_catchall_dish_0` with affordance `openable`. Relevant context: bedroom_nightstand_1, bedroom_nightstand_0. Check whether at least one practical front-facing or slightly offset standing position can reach the operating side so the object can be opened. A nearby bed corner or bench should not cause degraded by itself if a normal reach from the front region is still possible. Use degraded only when that front-region approach is directly pinched, occupied, or effectively unusable across the usable operating side. Do not count a cramped diagonal squeeze, shoulder-turned sidestep, or tiny side sliver as a valid approach path. If you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. If the operating side cannot be identified from the evidence, return `unknown` instead of guessing.
Reason: In the local operating zone and local operating zone context renders, the subject sits very low against the wall with the purple bed mass occupying the immediate front approach area, leaving no normal standing position directly in front of its usable side. In the local plan context render, the dish is also tucked beneath/just in front of bedroom_nightstand_1 and crowded by the nearby carafe and tumbler, confirming that the front reach zone is pinched rather than open. The overview renders support that this corner is constrained by the bed and nightstand cluster, so opening the dish would not be realistically accessible from a normal approach path now.
Blocking Objects: bedroom_bed_0, bedroom_nightstand_1, bedroom_water_carafe_0, bedroom_clear_glass_tumbler_0
Prompt
You are an affordance-aware functional critic for 3D indoor scenes.

Core principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.

Workflow:
1. Read the task instruction and target metric.
2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.
3. Local render order and intended use: 1. `local_operating_zone` is the clean RGB view for the most likely door-facing standing and opening zone. 2. `local_operating_zone_context` is a slightly wider highlighted context view of the same operating side. 3. `local_depth_probe` is an oblique RGB view meant to clarify depth and whether a blocker sits in front of the operating zone. 4. `local_plan_context` is a high local layout view used to check whether a nearby object actually spans or occupies the operating/approach zone rather than merely overlapping in perspective.
4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.
5. Evaluate ONLY the requested metric and return a JSON object matching the schema.

Scene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. Room shell: 4.50m x 4.00m x 2.70m.
Task instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Metric: spatial_accessibility
Subject object: bedroom_jewelry_catchall_dish_0
Target objects: bedroom_nightstand_1, bedroom_nightstand_0

Verified asset facts:
- subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, affordances [containable, openable, graspable], usable face `front`, benchmark relevance `functional`, confidence `0.55`, source `asset_annotation`
- interaction facts: placement class `unknown`, access type `none`, interaction height `none`, surface map `front: [access_contents, open, grasp]`
- low-confidence candidates: `none`
Use these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.

Scene evidence object hints:
- subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, group `unknown`, keywords [butter dish], affordances [containable, openable, graspable], task targets [table, desk, nightstand, coffee_table, shelf, cabinet]
- target `bedroom_nightstand_1`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object]
- target `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object]

Auxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):
- subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, room `bedroom`, placement `floor`, footprint `tiny`, height `very_low`, elevation `raised`
Target cues:
- target `bedroom_nightstand_1`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `near`
- target `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `far`
Nearby same-room context objects that may affect access or clearance:
- `bedroom_hardcover_book_0`: category `book`, placement `floor`, proximity `touching`, bbox_relation `very_close`, footprint `tiny`
- `bedroom_clear_glass_tumbler_0`: category `tumbler`, placement `floor`, proximity `touching`, bbox_relation `very_close`, footprint `tiny`
- `bedroom_water_carafe_0`: category `carafe`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `small`
- `bedroom_wall_sconce_0`: category `wall_sconce_wall`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `small`
- `bedroom_canvas_0`: category `canvas_canvas`, placement `floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`
- `bedroom_bed_0`: category `bed`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`
Do not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.

Metric-specific rule:
Task: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.
Evaluate only the `spatial_accessibility` metric for `bedroom_jewelry_catchall_dish_0` with affordance `openable`.
Relevant context: bedroom_nightstand_1, bedroom_nightstand_0.
Check whether at least one practical front-facing or slightly offset standing position can reach the operating side so the object can be opened. A nearby bed corner or bench should not cause degraded by itself if a normal reach from the front region is still possible. Use degraded only when that front-region approach is directly pinched, occupied, or effectively unusable across the usable operating side. Do not count a cramped diagonal squeeze, shoulder-turned sidestep, or tiny side sliver as a valid approach path.
If you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. If the operating side cannot be identified from the evidence, return `unknown` instead of guessing.

Image order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.
Evidence guardrails:
- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.
- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.
- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.
- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.
- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.
Output JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.
Allowed labels: pass, degraded, fail, unknown.
2D perspective overlap alone is not enough evidence of a blocker.
Normal adjacency priors should not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.
If you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.
If evidence is insufficient, use unknown instead of guessing.
Raw JSON
Request JSON
{
  "check_id": "spatial_accessibility__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0",
  "metric": "spatial_accessibility",
  "subject_id": "bedroom_jewelry_catchall_dish_0",
  "target_ids": [
    "bedroom_nightstand_1",
    "bedroom_nightstand_0"
  ],
  "prompt_text": "You are an affordance-aware functional critic for 3D indoor scenes.\n\nCore principle: judge whether the task CAN be done in the current scene, without assuming it has already been pre-solved.\n\nWorkflow:\n1. Read the task instruction and target metric.\n2. Inspect the local check renders first and treat them as the primary evidence for the requested interaction.\n3. Local render order and intended use: 1. `local_operating_zone` is the clean RGB view for the most likely door-facing standing and opening zone. 2. `local_operating_zone_context` is a slightly wider highlighted context view of the same operating side. 3. `local_depth_probe` is an oblique RGB view meant to clarify depth and whether a blocker sits in front of the operating zone. 4. `local_plan_context` is a high local layout view used to check whether a nearby object actually spans or occupies the operating/approach zone rather than merely overlapping in perspective.\n4. Then inspect the 4 attached overview renders for room-scale layout and approach context. These overviews are captured from a slightly elevated height and are not meant to mimic adult eye level.\n5. Evaluate ONLY the requested metric and return a JSON object matching the schema.\n\nScene summary: scenesmith scene with 28 objects. Room type: bedroom. Planned checks: 12. 
Room shell: 4.50m x 4.00m x 2.70m.\nTask instruction: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nMetric: spatial_accessibility\nSubject object: bedroom_jewelry_catchall_dish_0\nTarget objects: bedroom_nightstand_1, bedroom_nightstand_0\n\nVerified asset facts:\n- subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, affordances [containable, openable, graspable], usable face `front`, benchmark relevance `functional`, confidence `0.55`, source `asset_annotation`\n- interaction facts: placement class `unknown`, access type `none`, interaction height `none`, surface map `front: [access_contents, open, grasp]`\n- low-confidence candidates: `none`\nUse these as converter-side asset facts. Do not reclassify the subject; judge only whether the listed affordance remains usable in the current scene. If the image clearly contradicts these facts, return unknown and explain the conflict.\n\nScene evidence object hints:\n- subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, group `unknown`, keywords [butter dish], affordances [containable, openable, graspable], task targets [table, desk, nightstand, coffee_table, shelf, cabinet]\n- target `bedroom_nightstand_1`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object]\n- target `bedroom_nightstand_0`: category `nightstand`, group `storage_surface`, keywords [nightstand, bedside table, side table], task targets [graspable_object]\n\nAuxiliary scene cues (coarse geometry only; use these cues just to identify rough scale and nearby context):\n- subject `bedroom_jewelry_catchall_dish_0`: category `butter dish`, room `bedroom`, placement `floor`, footprint `tiny`, height `very_low`, elevation `raised`\nTarget cues:\n- target `bedroom_nightstand_1`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `near`\n- 
target `bedroom_nightstand_0`: category `nightstand`, room `bedroom`, placement `floor`, footprint `small`, height `low`, elevation `floor_level`, subject_proximity `far`\nNearby same-room context objects that may affect access or clearance:\n- `bedroom_hardcover_book_0`: category `book`, placement `floor`, proximity `touching`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_clear_glass_tumbler_0`: category `tumbler`, placement `floor`, proximity `touching`, bbox_relation `very_close`, footprint `tiny`\n- `bedroom_water_carafe_0`: category `carafe`, placement `floor`, proximity `near`, bbox_relation `very_close`, footprint `small`\n- `bedroom_wall_sconce_0`: category `wall_sconce_wall`, placement `floor`, proximity `near`, bbox_relation `nearby`, footprint `small`\n- `bedroom_canvas_0`: category `canvas_canvas`, placement `floor`, proximity `mid`, bbox_relation `overlapping_or_touching`, footprint `large`\n- `bedroom_bed_0`: category `bed`, placement `floor`, proximity `mid`, bbox_relation `separated`, footprint `large`\nDo not estimate exact centimeter clearance from these coarse cues. They are only for object identity, rough scale, and coarse proximity; the local renders remain the deciding evidence.\n\nMetric-specific rule:\nTask: A bedroom with a bed, two nightstands, and a wardrobe in the corner of the room.\nEvaluate only the `spatial_accessibility` metric for `bedroom_jewelry_catchall_dish_0` with affordance `openable`.\nRelevant context: bedroom_nightstand_1, bedroom_nightstand_0.\nCheck whether at least one practical front-facing or slightly offset standing position can reach the operating side so the object can be opened. A nearby bed corner or bench should not cause degraded by itself if a normal reach from the front region is still possible. Use degraded only when that front-region approach is directly pinched, occupied, or effectively unusable across the usable operating side. 
Do not count a cramped diagonal squeeze, shoulder-turned sidestep, or tiny side sliver as a valid approach path.\nIf you choose `degraded` or `fail`, name the blocker or the blocked standing/opening zone directly in front of the operating side. If at least one practical front-facing or slightly offset standing position remains and can still reach the operating side normally, prefer `pass`. Do not fail an openable object merely because a nearby seat or small furniture item is offset near one part of the front; require evidence that the usable operating side is actually occupied or pinched. If the operating side cannot be identified from the evidence, return `unknown` instead of guessing.\n\nImage order is fixed: local check renders always come first, followed by the overview renders. If a local render is highlighted, the subject is red and the targets are green.\nEvidence guardrails:\n- Treat the local renders as the deciding evidence for direct blockage, operating-side clearance, and immediate approach space.\n- Use overview renders only to understand room-scale layout or to break ties when the local evidence is incomplete.\n- To output `degraded` or `fail`, the local evidence should explicitly show a directly constrained operating/approach zone, not just a nearby object or a single perspective overlap.\n- When a `local_plan_context` view is present, use it to verify whether the suspected blocker really spans the usable operating/approach zone; if it shows a normal nearby standing/reach position remains, prefer `pass`.\n- If the overview render suggests a blockage but the local evidence does not clearly show that blockage, prefer `pass` or `unknown`, not `degraded` or `fail`.\nOutput JSON with keys: check_id, metric, label, asset_fact_used, asset_fact_confidence, evidence_conflict, reason, blocking_objects, confidence.\nAllowed labels: pass, degraded, fail, unknown.\n2D perspective overlap alone is not enough evidence of a blocker.\nNormal adjacency priors should 
not override direct evidence in the local renders. Nightstands beside beds, benches touching bed ends, corner wardrobes, and wall decor are acceptable only when they do not directly intrude into the required functional region.\nIf you choose degraded or fail, the reason must name a direct blocker or directly constrained functional region.\nIf evidence is insufficient, use unknown instead of guessing.",
  "files": {
    "render_check_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/spatial_accessibility__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0_local_operating_zone.png",
    "render_check_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/spatial_accessibility__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0_local_operating_zone_context.png",
    "render_check_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/spatial_accessibility__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0_local_depth_probe.png",
    "render_check_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/checks/spatial_accessibility__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0_local_plan_context.png",
    "render_overview_image": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_00_rgb.png",
    "render_overview_image_01": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_01_rgb.png",
    "render_overview_image_02": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_02_rgb.png",
    "render_overview_image_03": "/data/250010098/SceneBenchmark/res/vlm_runs/scenesmith_ud4/scenesmith-example-data_scene_000/render_bundle/views/view_03_rgb.png"
  }
}
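The request above suggests a naming convention: the `check_id` looks like the metric, the subject id, the literal `to`, and the target ids joined by double underscores, and each local render file name looks like the `check_id` plus a view suffix. The following is a minimal sketch of that convention, inferred from this single record only. The helper names `build_check_id` and `local_render_names` are hypothetical and are not part of any scenesmith API.

```python
# Sketch only: the convention below is inferred from one logged request,
# not taken from the benchmark's source code.

# View suffixes as listed in the prompt's "Local render order" step.
LOCAL_VIEWS = (
    "local_operating_zone",
    "local_operating_zone_context",
    "local_depth_probe",
    "local_plan_context",
)


def build_check_id(metric, subject_id, target_ids):
    """Join metric, subject, and targets the way the logged check_id appears to."""
    return f"{metric}__{subject_id}__to__" + "__".join(target_ids)


def local_render_names(check_id):
    """Return the assumed local render file names for one check."""
    return [f"{check_id}_{view}.png" for view in LOCAL_VIEWS]


check_id = build_check_id(
    "spatial_accessibility",
    "bedroom_jewelry_catchall_dish_0",
    ["bedroom_nightstand_1", "bedroom_nightstand_0"],
)
names = local_render_names(check_id)
```

For this record, the generated id and the four file names match the `check_id` field and the base names of the `render_check_image*` paths in the request.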
Result JSON
{
  "check_id": "spatial_accessibility__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0",
  "metric": "spatial_accessibility",
  "label": "fail",
  "asset_fact_used": true,
  "asset_fact_confidence": 0.55,
  "evidence_conflict": false,
  "reason": "In the local operating zone and local operating zone context renders, the subject sits very low against the wall with the purple bed mass occupying the immediate front approach area, leaving no normal standing position directly in front of its usable side. In the local plan context render, the dish is also tucked beneath/just in front of bedroom_nightstand_1 and crowded by the nearby carafe and tumbler, confirming that the front reach zone is pinched rather than open. The overview renders support that this corner is constrained by the bed and nightstand cluster, so opening the dish would not be realistically accessible from a normal approach path now.",
  "blocking_objects": [
    "bedroom_bed_0",
    "bedroom_nightstand_1",
    "bedroom_water_carafe_0",
    "bedroom_clear_glass_tumbler_0"
  ],
  "confidence": 0.82
}
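The result above can be checked against the output contract stated in the prompt (required keys, allowed labels, and a named blocker for `degraded` or `fail`). The following is a minimal sketch of such a check. The function name `validate_result` is hypothetical, the `[0, 1]` range for `confidence` is an assumption, and a non-empty `blocking_objects` list is used only as an approximation of "the reason must name a direct blocker", since the prompt also allows naming a constrained region in the `reason` text.

```python
# Sketch only: a structural check of one result record against the
# keys and labels listed in the prompt. Not the benchmark's own validator.

REQUIRED_KEYS = {
    "check_id", "metric", "label", "asset_fact_used", "asset_fact_confidence",
    "evidence_conflict", "reason", "blocking_objects", "confidence",
}
ALLOWED_LABELS = {"pass", "degraded", "fail", "unknown"}


def validate_result(result):
    """Return a list of problems; an empty list means the record looks well-formed."""
    problems = []
    missing = REQUIRED_KEYS - result.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    label = result.get("label")
    if label not in ALLOWED_LABELS:
        problems.append(f"label not allowed: {label!r}")
    # Approximation: treat an empty blocker list on degraded/fail as a violation.
    if label in {"degraded", "fail"} and not result.get("blocking_objects"):
        problems.append("degraded/fail without a named blocker")
    conf = result.get("confidence")
    # Assumed range [0, 1]; bool is rejected even though it subclasses int.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0 <= conf <= 1:
        problems.append(f"confidence out of assumed range: {conf!r}")
    return problems


logged = {
    "check_id": "spatial_accessibility__bedroom_jewelry_catchall_dish_0"
                "__to__bedroom_nightstand_1__bedroom_nightstand_0",
    "metric": "spatial_accessibility",
    "label": "fail",
    "asset_fact_used": True,
    "asset_fact_confidence": 0.55,
    "evidence_conflict": False,
    "reason": "(reason text as logged above)",
    "blocking_objects": ["bedroom_bed_0", "bedroom_nightstand_1"],
    "confidence": 0.82,
}
logged_problems = validate_result(logged)
bad_problems = validate_result({"label": "blocked", "confidence": 1.5})
```

A check like this only confirms the record's shape. It does not judge whether the label is justified by the renders.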
Evidence
spatial_accessibility__bedroom_jewelry_catchall_dish_0__to__bedroom_nightstand_1__bedroom_nightstand_0