- Hongchi Xia¹
- Entong Su²
- Marius Memmel²
- Arhan Jain²
- Raymond Yu²
- Numfor Mbiziwo-Tiapo²
- Ali Farhadi²,³
- Abhishek Gupta²
- Shenlong Wang¹
- Wei-Chiu Ma⁴
- ¹University of Illinois at Urbana-Champaign
- ²University of Washington
- ³Allen Institute for AI
- ⁴Cornell University
CVPR 2025

Abstract
Creating virtual digital replicas from real-world data unlocks significant potential across domains like gaming and robotics.
In this paper, we present DRAWER, a novel framework that converts a video of a static indoor scene into a photorealistic and interactive digital environment.
Our approach centers on two main contributions:
(i) a reconstruction module based on a dual scene representation that reconstructs the scene with fine-grained geometric details,
and (ii) an articulation module that identifies articulation types and hinge positions, reconstructs simulatable shapes and appearances and integrates them into the scene.
The resulting virtual environment is photorealistic, interactive, and runs in real time, with compatibility for game engines and robotic simulation platforms.
We demonstrate the potential of DRAWER by using it to automatically create an interactive game in Unreal Engine and to enable real-to-sim-to-real transfer for robotics applications.
Overview
DRAWER automatically converts a video of a static scene, captured without any interaction with its doors or objects, into an interactive environment with segmented objects and articulated doors.

System Design
Given multiple posed images from a single video, we employ a Dual Scene Representation that combines high-fidelity rendering (3D Gaussian Splatting) with aligned geometry (Mesh From BakedSDF).
We then animate the scene with physical reasoning to estimate articulated and movable rigid-body objects.
Our amodal shape estimation with hidden-region texturing enables us to create an interactable digital twin that supports real-time physical interactions such as opening drawers and cabinets, moving objects, and rendering novel views.
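To make the idea of an articulated part concrete, the following is a minimal sketch of how an estimated joint could be applied to a point on a door or drawer. It is an illustration under stated assumptions, not the DRAWER implementation: the joint dictionary layout, the type names `revolute` and `prismatic`, and all function names are hypothetical. A revolute joint (cabinet door) rotates the point about the hinge line using Rodrigues' formula; a prismatic joint (drawer) slides it along an axis.

```python
import math


def rotate_about_axis(point, pivot, axis, angle):
    """Rotate `point` by `angle` radians about the line through `pivot` along `axis`."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]                  # unit hinge direction
    v = [p - c for p, c in zip(point, pivot)]     # point relative to the hinge
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    k_dot_v = sum(a * b for a, b in zip(k, v))
    # Rodrigues' rotation formula
    rotated = [
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
        for i in range(3)
    ]
    return [r + c for r, c in zip(rotated, pivot)]


def translate_along_axis(point, axis, distance):
    """Slide `point` by `distance` along the (normalized) `axis`."""
    norm = math.sqrt(sum(a * a for a in axis))
    return [p + distance * a / norm for p, a in zip(point, axis)]


def articulate(point, joint, state):
    """Apply a joint to a point; `state` is an angle (revolute) or an offset (prismatic)."""
    # The joint layout below is a hypothetical format chosen for this sketch.
    if joint["type"] == "revolute":
        return rotate_about_axis(point, joint["pivot"], joint["axis"], state)
    if joint["type"] == "prismatic":
        return translate_along_axis(point, joint["axis"], state)
    raise ValueError(f"unknown joint type: {joint['type']}")


# Example: a door hinged on the vertical axis, opened by 90 degrees.
door = {"type": "revolute", "pivot": (0.0, 0.0, 0.0), "axis": (0.0, 0.0, 1.0)}
opened_corner = articulate((1.0, 0.0, 0.0), door, math.pi / 2)
```

Keeping the joint type, hinge position, and axis as plain data mirrors how simulators and game engines typically consume articulation parameters, so the same estimate can drive both.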

Why Do We Need a Dual Scene Representation?
We need a reconstruction that offers both high-fidelity rendering and geometry aligned with it.
3D Gaussian Splatting provides the high-fidelity rendering, producing photorealistic views of the scene.
A mesh extracted from BakedSDF provides the aligned geometry.
Combining the two gives the dual scene representation the strengths of both: in the comparison below, Gaussians on the SDF mesh render at about 100 fps, like 2DGS, whereas rendering the SDF directly runs at under 1 fps.
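One way to picture how a rendering primitive can stay aligned with mesh geometry is to attach each Gaussian to a mesh triangle. The sketch below is an assumption made for illustration, not a description of how DRAWER couples the two representations: the `AnchoredGaussian` class, the barycentric anchoring, and the function name are all hypothetical. It shows that when the mesh vertices move (for example, when a drawer slides), the anchored Gaussian center moves with them.

```python
from dataclasses import dataclass


@dataclass
class AnchoredGaussian:
    """Hypothetical record tying one Gaussian to one mesh triangle."""
    face: int      # index of the triangle the Gaussian is attached to
    bary: tuple    # barycentric weights (w0, w1, w2), expected to sum to 1


def gaussian_center(vertices, faces, gaussian):
    """Recompute the Gaussian center from the current vertex positions."""
    i0, i1, i2 = faces[gaussian.face]
    w0, w1, w2 = gaussian.bary
    return tuple(
        w0 * a + w1 * b + w2 * c
        for a, b, c in zip(vertices[i0], vertices[i1], vertices[i2])
    )


# One triangle and one Gaussian anchored inside it.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2)]
g = AnchoredGaussian(face=0, bary=(0.5, 0.25, 0.25))
rest_center = gaussian_center(vertices, faces, g)

# Translate the triangle by one unit along z; the Gaussian follows the mesh.
moved_vertices = [(x, y, z + 1.0) for x, y, z in vertices]
moved_center = gaussian_center(moved_vertices, faces, g)
```

Under this assumption the mesh is the single source of truth for motion and collision, while the Gaussians only contribute appearance.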

[Figure panels: RGB GT; 2DGS Geometry; Ours Geometry (SDF); 2DGS Rendering (~100 fps); SDF Rendering (<1 fps); Ours Rendering (Gaussian on SDF Mesh) (~100 fps)]
Articulation Simulation Rendering
We compare our articulation simulation results with the simulation generated by KlingAI.
Our Simulation Result
KlingAI Generation
Articulation Simulation Motions
We compare our predicted articulation motion trajectories (red) with the ground-truth (GT) trajectories (blue).
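The page presents this comparison visually and does not state a numeric metric. As a hedged illustration, one simple way to quantify the gap between a predicted and a ground-truth trajectory is the mean point-wise Euclidean distance, assuming both are sampled at the same joint states. The function name and the sample values are hypothetical.

```python
import math


def mean_trajectory_error(predicted, ground_truth):
    """Mean Euclidean distance between corresponding 3D trajectory samples."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


# Two-sample toy trajectories (made-up values, for illustration only).
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
error = mean_trajectory_error(predicted, ground_truth)
```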
Interactable 3D Reconstruction
We visualize the interactable 3D reconstruction in multiple kitchens. To avoid clutter in visualization, we randomly select a subset of drawers/cabinets to open.
Real-to-Sim-to-Real
We deploy policies trained in simulation zero-shot in the real world on a Franka Emika Panda robot mounted on a mobile base.
We trained an independent policy for each substage of the task: drawer closing, picking and placing, and opening.
The videos below are played at normal speed, without any acceleration.
Gaming: Opening the doors
We demonstrate our interactive game in Unreal Engine, where the player can open cabinet doors and drawers.
By pressing a key, the player pushes or pulls the cabinet door or drawer under their crosshair in the direction they are aiming.
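The interaction rule can be sketched as a projection of the aim direction onto the part's motion axis. This is an assumed simplification for a sliding drawer, written in Python for illustration rather than as Unreal Engine code: the function name, the step size, and the travel limit are hypothetical. The sign of the dot product decides whether a key press pulls the drawer out or pushes it in, and the result is clamped to the allowed travel.

```python
def drawer_step(aim_direction, slide_axis, offset, step=0.05, max_offset=0.4):
    """Return the drawer offset after one key press.

    `slide_axis` points in the opening direction; `offset` is the current
    extension in the same units as `step` and `max_offset` (assumed values).
    """
    alignment = sum(a * s for a, s in zip(aim_direction, slide_axis))
    if alignment == 0:
        return offset                       # aiming sideways: no motion
    direction = 1.0 if alignment > 0 else -1.0
    new_offset = offset + direction * step
    return min(max(new_offset, 0.0), max_offset)   # clamp to travel limits


# Aiming along the opening axis pulls the drawer out, up to its limit.
pulled = drawer_step((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), 0.38)
# Aiming against the opening axis pushes it back in.
pushed = drawer_step((0.0, 0.0, -1.0), (0.0, 0.0, 1.0), 0.2)
```

A hinged door could follow the same pattern with an angle in place of the offset.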
Gaming: Shooting rigid objects
We demonstrate our interactive game in Unreal Engine, where the player can shoot at rigid objects segmented from the scene.
The player can shoot yellow balls with the gun.