Video, Robotics

Human-Object Interaction

Whole-body video of people interacting with objects across the full range of everyday physical engagement, including reaching, grasping, lifting, carrying, placing, and operating. The framing keeps the person's body, the object, and the surrounding context all in view rather than cropping to the hands alone.

Hours: 12K+

Objects: 100+

Interactions: 8+

Training Use Cases

✓ Robot imitation learning from whole-body human demonstration

✓ Action recognition for object-centric activity

✓ Embodied AI training with full-body context

✓ Video generation of contact-rich human action

Key Highlights

✓ 100+ object categories covered, including furniture, containers, tools, food items, packaging, clothing, and electronics

✓ 8+ interaction types represented, including reaching, grasping, lifting, carrying, placing, opening, closing, and operating

✓ Whole-body framing rather than hand-only crops, preserving posture and surrounding context

✓ Both single-object handling and multi-object sequences captured in real settings

Metadata Fields

duration: Length of clip in HH:MM:SS

resolution: Pixel dimensions (e.g., 1920x1080, 3840x2160)

frame_rate: Frames per second (e.g., 24, 30, 60, 120)

contains_audio: Whether the clip carries an audio track (boolean)

primary_category: Dominant content category assigned to a video

interaction_type: reaching | grasping | lifting | carrying | placing | opening | closing | operating

object_class: furniture | container | tool | food | packaging | clothing | electronic

person_count: Number of people in frame

framing: full_body | upper_body | hands_and_object
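
As a rough illustration of how these fields might be consumed, the sketch below models one clip record in Python and checks the enumerated fields against the values listed above. The class name `ClipMetadata`, its helper methods, the Python types chosen for each field, and the sample `primary_category` value are all assumptions made for this example; the dataset's actual delivery format is not specified here.

```python
from dataclasses import dataclass
from typing import Tuple

# Allowed values copied from the field list above.
INTERACTION_TYPES = {
    "reaching", "grasping", "lifting", "carrying",
    "placing", "opening", "closing", "operating",
}
OBJECT_CLASSES = {
    "furniture", "container", "tool", "food",
    "packaging", "clothing", "electronic",
}
FRAMINGS = {"full_body", "upper_body", "hands_and_object"}


@dataclass
class ClipMetadata:
    """Hypothetical in-memory record for one clip (types are assumed)."""
    duration: str          # "HH:MM:SS"
    resolution: str        # e.g. "1920x1080"
    frame_rate: int        # frames per second
    contains_audio: bool
    primary_category: str
    interaction_type: str
    object_class: str
    person_count: int
    framing: str

    def validate(self) -> None:
        """Raise ValueError if an enumerated field holds an unlisted value."""
        checks = [
            ("interaction_type", self.interaction_type, INTERACTION_TYPES),
            ("object_class", self.object_class, OBJECT_CLASSES),
            ("framing", self.framing, FRAMINGS),
        ]
        for name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{name}: unexpected value {value!r}")

    def duration_seconds(self) -> int:
        """Convert the HH:MM:SS duration string to whole seconds."""
        hours, minutes, seconds = (int(p) for p in self.duration.split(":"))
        return hours * 3600 + minutes * 60 + seconds

    def pixel_dimensions(self) -> Tuple[int, int]:
        """Split a "WIDTHxHEIGHT" resolution string into integers."""
        width, height = self.resolution.lower().split("x")
        return int(width), int(height)


# Sample record; the primary_category value is invented for illustration.
clip = ClipMetadata(
    duration="00:01:30",
    resolution="1920x1080",
    frame_rate=30,
    contains_audio=False,
    primary_category="human_object_interaction",
    interaction_type="grasping",
    object_class="tool",
    person_count=1,
    framing="full_body",
)
clip.validate()
```

Under these assumptions, a training pipeline could filter on `framing == "full_body"` or on a minimum `duration_seconds()` before sampling clips.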