
Third-Person Gameplay

Trailing-camera gameplay where the player controls both the avatar's action and the camera framing it. Every input is timestamped to microseconds and synchronized to video, with body movement and camera adjustment captured as separate channels in the same trajectory.

Hours: 112K+
Games: 50+
Genres: 5+

Training Use Cases

Action-conditioned video generation and learned world models
Embodied agent training and behavioral cloning
Camera and view policy learning
Vision-language-action models bridging visual observation and discrete action

Key Highlights

50+ game titles and engines spanning action, RPG, platformer, action-adventure, and sports
Per-frame player input captured alongside the video
Both keyboard-and-mouse and controller input streams captured
Camera adjustment and avatar control captured as separate signals within each session
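
As a rough illustration of how microsecond-timestamped input could be lined up with video frames while keeping avatar and camera signals apart, here is a minimal sketch. The event layout, the `avatar` / `camera` channel labels, the payload strings, and the assumption that timestamps count from the start of the clip are all hypothetical; the dataset's actual file format is not described here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InputEvent:
    t_us: int      # microseconds since clip start (assumed convention)
    channel: str   # "avatar" or "camera" (hypothetical labels)
    value: str     # hypothetical payload, e.g. a key or stick action


def frame_index(t_us: int, frame_rate: int) -> int:
    """Frame on screen at time t_us; integer math avoids float drift."""
    return (t_us * frame_rate) // 1_000_000


def group_by_frame(events, frame_rate):
    """Bucket events per frame, with avatar and camera kept as separate lists."""
    frames = defaultdict(lambda: {"avatar": [], "camera": []})
    for ev in events:
        frames[frame_index(ev.t_us, frame_rate)][ev.channel].append(ev.value)
    return dict(frames)


# Made-up events for illustration only.
events = [
    InputEvent(0, "avatar", "move_forward"),
    InputEvent(16_000, "camera", "yaw+2"),
    InputEvent(16_700, "avatar", "jump"),
    InputEvent(40_000, "camera", "pitch-1"),
]
per_frame = group_by_frame(events, frame_rate=60)
for idx in sorted(per_frame):
    print(idx, per_frame[idx])
```

Keeping the two channels in separate lists per frame means a camera policy and an avatar policy can be trained from the same trajectory without re-parsing the raw event stream.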

Metadata Fields

duration: Length of clip in HH:MM:SS
resolution: Pixel dimensions (e.g., 1920x1080, 3840x2160)
frame_rate: Frames per second (e.g., 24, 30, 60, 120)
contains_audio: Whether the clip carries an audio track (boolean)
primary_category: Dominant content category assigned to a video
style: photorealistic | stylized | low_poly | painterly
game_genre: action | rpg | platformer | action_adventure | sports | other
game_title: Game name and version (e.g., Elden Ring 1.10)
input_device: keyboard_mouse | controller
camera_distance: close | medium | far | dynamic
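
The fields above could be checked with a few lines of Python before a clip enters a training pipeline. This is a sketch under assumptions: it treats each clip's metadata as a plain dictionary keyed by the field names listed, and the sample record (including its `primary_category` value) is made up for illustration. The helper names `duration_seconds`, `parse_resolution`, and `validate` are hypothetical and not part of any published tooling.

```python
# Enumerated values copied from the field list; everything else is assumed.
ALLOWED = {
    "style": {"photorealistic", "stylized", "low_poly", "painterly"},
    "game_genre": {"action", "rpg", "platformer",
                   "action_adventure", "sports", "other"},
    "input_device": {"keyboard_mouse", "controller"},
    "camera_distance": {"close", "medium", "far", "dynamic"},
}


def duration_seconds(hhmmss: str) -> int:
    """Convert an HH:MM:SS string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_resolution(resolution: str) -> tuple[int, int]:
    """Split a 'WIDTHxHEIGHT' string into integer width and height."""
    width, height = resolution.lower().split("x")
    return int(width), int(height)


def validate(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record looks fine."""
    problems = []
    for field, allowed in ALLOWED.items():
        if record.get(field) not in allowed:
            problems.append(f"{field}: unexpected value {record.get(field)!r}")
    if not isinstance(record.get("contains_audio"), bool):
        problems.append("contains_audio: expected a boolean")
    return problems


# Made-up record for illustration only.
record = {
    "duration": "00:12:30",
    "resolution": "1920x1080",
    "frame_rate": 60,
    "contains_audio": True,
    "primary_category": "gaming",  # hypothetical value
    "style": "photorealistic",
    "game_genre": "rpg",
    "game_title": "Elden Ring 1.10",
    "input_device": "controller",
    "camera_distance": "medium",
}

print(validate(record))
print(duration_seconds(record["duration"]))
print(parse_resolution(record["resolution"]))
```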