Human-Object Interaction

Whole-body video of people interacting with objects in everyday physical tasks, including reaching, grasping, lifting, carrying, placing, and operating. The framing keeps the person's body, the object, and the surrounding context in view rather than cropping to the hands alone.

Hours: 12K+
Object categories: 100+
Interaction types: 8+

Training Use Cases

Robot imitation learning from whole-body human demonstration
Action recognition for object-centric activity
Embodied AI training with full-body context
Video generation of contact-rich human action

Key Highlights

100+ object categories, including furniture, containers, tools, food items, packaging, clothing, and electronics
8+ interaction types, including reaching, grasping, lifting, carrying, placing, opening, closing, and operating
Whole-body framing rather than hand-only crops, preserving posture and surrounding context
Both single-object handling and multi-object sequences, captured in real-world settings

Metadata Fields

duration: Length of clip in HH:MM:SS
resolution: Pixel dimensions (e.g., 1920x1080, 3840x2160)
frame_rate: Frames per second (e.g., 24, 30, 60, 120)
contains_audio: Whether the clip carries an audio track (boolean)
primary_category: Dominant content category assigned to the clip
interaction_type: reaching | grasping | lifting | carrying | placing | opening | closing | operating
object_class: furniture | container | tool | food | packaging | clothing | electronic
person_count: Number of people in frame
framing: full_body | upper_body | hands_and_object
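
The metadata fields above can be modeled as a typed, validated record. What follows is a minimal Python sketch. It assumes that each clip's metadata arrives as a dictionary keyed by the field names listed here and that every clip carries a single interaction_type and object_class. This page specifies neither the delivery format nor whether multi-object sequences can carry several interaction labels. The class names, the parse_duration helper, and the select_for_imitation filter are hypothetical illustrations, not part of any published loader for this dataset.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class InteractionType(str, Enum):
    # Values copied from the interaction_type field listing.
    REACHING = "reaching"
    GRASPING = "grasping"
    LIFTING = "lifting"
    CARRYING = "carrying"
    PLACING = "placing"
    OPENING = "opening"
    CLOSING = "closing"
    OPERATING = "operating"


class ObjectClass(str, Enum):
    # Coarse classes from the object_class field; the 100+ finer object
    # categories mentioned above are not modeled in this sketch.
    FURNITURE = "furniture"
    CONTAINER = "container"
    TOOL = "tool"
    FOOD = "food"
    PACKAGING = "packaging"
    CLOTHING = "clothing"
    ELECTRONIC = "electronic"


class Framing(str, Enum):
    FULL_BODY = "full_body"
    UPPER_BODY = "upper_body"
    HANDS_AND_OBJECT = "hands_and_object"


def parse_duration(hms: str) -> int:
    """Convert an HH:MM:SS string into a total number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    if hours < 0 or not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError(f"invalid duration: {hms!r}")
    return hours * 3600 + minutes * 60 + seconds


@dataclass(frozen=True)
class ClipMetadata:
    duration_s: int
    width: int
    height: int
    frame_rate: float
    contains_audio: bool
    primary_category: str
    interaction_type: InteractionType
    object_class: ObjectClass
    person_count: int
    framing: Framing

    @classmethod
    def from_record(cls, rec: dict[str, Any]) -> "ClipMetadata":
        """Validate one raw record (assumed dict layout); bad values raise ValueError."""
        # Resolution is assumed to be a "WIDTHxHEIGHT" string such as "1920x1080".
        width, height = (int(p) for p in str(rec["resolution"]).lower().split("x"))
        if not isinstance(rec["contains_audio"], bool):
            raise ValueError("contains_audio must be a boolean")
        person_count = int(rec["person_count"])
        if person_count < 0:
            raise ValueError("person_count must be non-negative")
        return cls(
            duration_s=parse_duration(rec["duration"]),
            width=width,
            height=height,
            frame_rate=float(rec["frame_rate"]),
            contains_audio=rec["contains_audio"],
            primary_category=str(rec["primary_category"]),
            interaction_type=InteractionType(rec["interaction_type"]),
            object_class=ObjectClass(rec["object_class"]),
            person_count=person_count,
            framing=Framing(rec["framing"]),
        )


def select_for_imitation(
    clips: list[ClipMetadata], interaction: InteractionType
) -> list[ClipMetadata]:
    """Hypothetical selection rule: keep full-body clips of one interaction type."""
    return [
        c for c in clips
        if c.framing is Framing.FULL_BODY and c.interaction_type is interaction
    ]
```

For example, a record with duration "00:01:30", resolution "1920x1080", interaction_type "grasping", and framing "full_body" parses to duration_s == 90 and is kept by select_for_imitation for InteractionType.GRASPING. A record whose interaction_type falls outside the listed values, such as "waving", raises ValueError. Using str-valued enums keeps the raw strings from the field listings while letting typos fail at load time rather than silently creating a new label.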