Video, Robotics
Human-Object Interaction
Whole-body video of people interacting with objects across the full range of everyday physical engagement, including reaching, grasping, lifting, carrying, placing, and operating. The framing keeps the person's body, the object, and the surrounding context all in view rather than cropping to the hands alone.
Hours: 12K+
Objects: 100+
Interactions: 8+
Training Use Cases
✓ Robot imitation learning from whole-body human demonstration
✓ Action recognition for object-centric activity
✓ Embodied AI training with full-body context
✓ Video generation of contact-rich human action
Key Highlights
✓ 100+ object categories covered, including furniture, containers, tools, food items, packaging, clothing, and electronics
✓ 8+ interaction types represented, including reaching, grasping, lifting, carrying, placing, opening, closing, and operating
✓ Whole-body framing rather than hand-only crops, preserving posture and surrounding context
✓ Both single-object handling and multi-object sequences, captured in real settings
Metadata Fields
duration: Length of clip in HH:MM:SS
resolution: Pixel dimensions (e.g., 1920x1080, 3840x2160)
frame_rate: Frames per second (e.g., 24, 30, 60, 120)
contains_audio: Whether the clip carries an audio track (boolean)
primary_category: Dominant content category assigned to a video
interaction_type: reaching | grasping | lifting | carrying | placing | opening | closing | operating
object_class: furniture | container | tool | food | packaging | clothing | electronic
person_count: Number of people in frame
framing: full_body | upper_body | hands_and_object
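
The metadata fields listed above could be modeled and validated roughly as follows. This is a minimal sketch, not the dataset's actual delivery format: it assumes each clip's metadata arrives as a flat key/value record, and the names ClipMetadata, parse_duration, and parse_resolution, as well as the example primary_category value, are hypothetical.

```python
from dataclasses import dataclass

# Allowed values copied from the field list; the set names are hypothetical.
INTERACTION_TYPES = {
    "reaching", "grasping", "lifting", "carrying",
    "placing", "opening", "closing", "operating",
}
OBJECT_CLASSES = {
    "furniture", "container", "tool", "food",
    "packaging", "clothing", "electronic",
}
FRAMINGS = {"full_body", "upper_body", "hands_and_object"}


def parse_duration(hhmmss: str) -> int:
    """Convert an HH:MM:SS string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_resolution(resolution: str) -> tuple:
    """Convert a 'WIDTHxHEIGHT' string to a (width, height) pair of ints."""
    width, height = resolution.lower().split("x")
    return int(width), int(height)


@dataclass
class ClipMetadata:
    duration: str           # HH:MM:SS
    resolution: str         # e.g. "1920x1080"
    frame_rate: int         # frames per second
    contains_audio: bool
    primary_category: str
    interaction_type: str
    object_class: str
    person_count: int
    framing: str

    def __post_init__(self) -> None:
        # Reject values outside the enumerations given in the field list.
        if self.interaction_type not in INTERACTION_TYPES:
            raise ValueError(f"unknown interaction_type: {self.interaction_type}")
        if self.object_class not in OBJECT_CLASSES:
            raise ValueError(f"unknown object_class: {self.object_class}")
        if self.framing not in FRAMINGS:
            raise ValueError(f"unknown framing: {self.framing}")
        if self.person_count < 0:
            raise ValueError("person_count must be non-negative")


# Example record; every value here is illustrative, not taken from the dataset.
record = {
    "duration": "00:01:30",
    "resolution": "1920x1080",
    "frame_rate": 30,
    "contains_audio": True,
    "primary_category": "human_object_interaction",  # hypothetical value
    "interaction_type": "grasping",
    "object_class": "tool",
    "person_count": 1,
    "framing": "full_body",
}

clip = ClipMetadata(**record)
duration_s = parse_duration(clip.duration)
width, height = parse_resolution(clip.resolution)
```

Validating the enumerated fields at load time makes it straightforward to filter clips afterwards, for example keeping only records where framing is full_body and person_count is 1 for single-person imitation learning.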