Third-Person Gameplay
Trailing-camera gameplay in which the player controls both the avatar's actions and the camera that frames the avatar. Every input is timestamped with microsecond precision and synchronized to the video, with body movement and camera adjustment recorded as separate channels within the same trajectory.
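A minimal sketch of how microsecond-timestamped inputs might be aligned to video frames while keeping the two control channels apart. This is an illustration, not the dataset's loader: the `InputEvent` record, its field names, the `"avatar"`/`"camera"` channel labels, and the assumption that timestamps count from the clip's first frame are all hypothetical. Each event is assigned to the frame on screen at its timestamp, and body movement and camera adjustment stay in separate per-frame lists.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InputEvent:
    """Hypothetical input sample; field names are illustrative, not the dataset schema."""
    t_us: int                   # assumed: microseconds since the clip's first frame
    channel: str                # assumed labels: "avatar" (body movement) or "camera"
    value: tuple[float, float]  # assumed 2-D value, e.g. a stick or mouse delta


def frame_index(t_us: int, frame_rate: float) -> int:
    """Index of the frame on screen at t_us; frame i spans [i/fps, (i+1)/fps) seconds."""
    return math.floor(t_us * frame_rate / 1_000_000)


def align_to_frames(events, frame_rate, num_frames):
    """Group events by frame, keeping each channel in its own list."""
    frames = [defaultdict(list) for _ in range(num_frames)]
    for ev in events:
        i = frame_index(ev.t_us, frame_rate)
        if 0 <= i < num_frames:  # ignore events that fall outside the clip
            frames[i][ev.channel].append(ev.value)
    return frames


# Usage with made-up events at 60 fps (one frame lasts about 16,667 microseconds).
events = [
    InputEvent(0, "avatar", (0.0, 1.0)),
    InputEvent(16_000, "camera", (0.2, 0.0)),
    InputEvent(17_000, "avatar", (0.0, 1.0)),
    InputEvent(40_000, "camera", (-0.1, 0.0)),
]
frames = align_to_frames(events, frame_rate=60, num_frames=3)
print(dict(frames[0]))  # {'avatar': [(0.0, 1.0)], 'camera': [(0.2, 0.0)]}
```

Keeping the channels separate per frame lets a model treat avatar control and camera control as distinct action inputs rather than one merged stream.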
Hours: 112K+
Games: 50+
Genres: 5+
Training Use Cases
✓ Action-conditioned video generation and learned world models
✓ Embodied agent training and behavioral cloning
✓ Camera and view policy learning
✓ Vision-language-action models bridging visual observations and discrete actions
Key Highlights
✓ 50+ game titles and engines, spanning the action, RPG, platformer, action-adventure, and sports genres
✓ Player input captured per frame, alongside the video
✓ Both keyboard-and-mouse and controller input streams captured
✓ Camera adjustment and avatar control captured as separate signals within each session
Metadata Fields
duration: Length of the clip (HH:MM:SS)
resolution: Pixel dimensions (e.g., 1920x1080, 3840x2160)
frame_rate: Frames per second (e.g., 24, 30, 60, 120)
contains_audio: Whether the clip carries an audio track (boolean)
primary_category: Dominant content category assigned to the clip
style: photorealistic | stylized | low_poly | painterly
game_genre: action | rpg | platformer | action_adventure | sports | other
game_title: Game name and version (e.g., Elden Ring 1.10)
input_device: keyboard_mouse | controller
camera_distance: close | medium | far | dynamic
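A minimal validation sketch for one clip's metadata, assuming each record is available as a Python dict keyed by the field names above (the actual delivery format is not specified here). The allowed enum values are copied from the field list; the regular expressions, the hypothetical `validate` and `duration_seconds` helpers, and the example record are illustrative assumptions, not part of the dataset.

```python
import re

# Allowed values copied from the field list above.
ENUMS = {
    "style": {"photorealistic", "stylized", "low_poly", "painterly"},
    "game_genre": {"action", "rpg", "platformer", "action_adventure", "sports", "other"},
    "input_device": {"keyboard_mouse", "controller"},
    "camera_distance": {"close", "medium", "far", "dynamic"},
}
DURATION_RE = re.compile(r"(\d{2,}):([0-5]\d):([0-5]\d)")  # HH:MM:SS
RESOLUTION_RE = re.compile(r"([1-9]\d*)x([1-9]\d*)")      # WIDTHxHEIGHT


def duration_seconds(duration: str) -> int:
    """Convert an HH:MM:SS duration string to a whole number of seconds."""
    m = DURATION_RE.fullmatch(duration)
    if m is None:
        raise ValueError(f"expected HH:MM:SS, got {duration!r}")
    hours, minutes, seconds = (int(g) for g in m.groups())
    return hours * 3600 + minutes * 60 + seconds


def validate(record: dict) -> list[str]:
    """Return a list of problems in one metadata record (empty if it looks valid)."""
    errors = []
    duration = record.get("duration")
    if not isinstance(duration, str) or DURATION_RE.fullmatch(duration) is None:
        errors.append("duration: expected HH:MM:SS")
    resolution = record.get("resolution")
    if not isinstance(resolution, str) or RESOLUTION_RE.fullmatch(resolution) is None:
        errors.append("resolution: expected WIDTHxHEIGHT, e.g. 1920x1080")
    fps = record.get("frame_rate")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(fps, bool) or not isinstance(fps, (int, float)) or fps <= 0:
        errors.append("frame_rate: expected a positive number")
    if not isinstance(record.get("contains_audio"), bool):
        errors.append("contains_audio: expected a boolean")
    for field in ("primary_category", "game_title"):
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"{field}: expected a non-empty string")
    for field, allowed in ENUMS.items():
        value = record.get(field)
        if not isinstance(value, str) or value not in allowed:
            errors.append(f"{field}: expected one of {sorted(allowed)}")
    return errors


# Usage with a made-up record (values are illustrative, not taken from the dataset).
record = {
    "duration": "00:12:34",
    "resolution": "1920x1080",
    "frame_rate": 60,
    "contains_audio": True,
    "primary_category": "gameplay",
    "style": "stylized",
    "game_genre": "platformer",
    "game_title": "Example Game 1.0",
    "input_device": "controller",
    "camera_distance": "dynamic",
}
print(validate(record))              # []
print(duration_seconds("00:12:34"))  # 754
```

Returning a list of messages instead of raising on the first problem makes it easier to audit many records at once.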