
Open-World Navigation

Long-horizon traversal of large, freely navigable game worlds, with every keypress, mouse movement, and controller input timestamped to microseconds and synchronized to video frames. Sessions cover full traversal arcs, including the open-ended decision-making that drives where the player goes next.
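As a rough illustration of what synchronizing microsecond-timestamped inputs to video frames can look like, the sketch below buckets input events by the frame during which they occurred. It assumes a constant frame rate, timestamps measured from the first frame, and events stored as (timestamp_us, name) tuples; the event names and the function names frame_index and group_events_by_frame are hypothetical and do not describe the dataset's actual file layout.

```python
from collections import defaultdict


def frame_index(timestamp_us: int, fps: int) -> int:
    """Index of the frame on screen at timestamp_us (constant frame rate assumed)."""
    # Integer arithmetic avoids floating-point drift over long sessions.
    return (timestamp_us * fps) // 1_000_000


def group_events_by_frame(events, fps):
    """Map frame index -> list of event names, preserving input order."""
    frames = defaultdict(list)
    for timestamp_us, name in events:
        frames[frame_index(timestamp_us, fps)].append(name)
    return dict(frames)


# Hypothetical events: (microseconds since first frame, event name).
events = [
    (0, "key_down:W"),
    (16_000, "mouse_move"),
    (16_700, "key_up:W"),
    (1_000_000, "button:A"),
]
by_frame = group_events_by_frame(events, fps=60)
```

At 60 frames per second each frame spans roughly 16,667 microseconds, so the first two events fall in frame 0, the third in frame 1, and the last in frame 60. Variable-frame-rate footage would need per-frame presentation timestamps instead of this fixed division.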

Hours: 61.2K+
Games: 100+
Genres: 4+

Training Use Cases

Action-conditioned video generation and learned world models
Embodied agent training and behavioral cloning
Vision-language-action models bridging visual observation and discrete action
Navigation policy learning in 3D environments

Key Highlights

100+ game titles and engines covering open-world action, RPG, survival, and exploration
Per-frame player input captured alongside the video
Both keyboard-and-mouse and controller input streams captured
Long, uninterrupted traversal sequences alongside short combat and interaction clips

Metadata Fields

duration: Length of clip in HH:MM:SS
resolution: Pixel dimensions (e.g., 1920x1080, 3840x2160)
frame_rate: Frames per second (e.g., 24, 30, 60, 120)
contains_audio: Whether the clip carries an audio track (boolean)
primary_category: Dominant content category assigned to a video
style: photorealistic | stylized | low_poly | painterly
subgenre: action | rpg | survival | exploration | sandbox
game_title: Game name and version (e.g., Elden Ring 1.10, GTA V Build 3258)
input_device: keyboard_mouse | controller
traversal_type: walking | vehicle | flight | swimming (one or more values)
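
The fields above lend themselves to a simple sanity check before training. The following is a minimal sketch, assuming each clip's metadata arrives as a Python dict keyed by the field names listed here and that traversal_type is stored as a list of strings; the delivery format, the sample record, and the helper names (duration_seconds, resolution_pixels, validate_record) are assumptions for illustration only.

```python
# Allowed values copied from the field list; single-valued fields only.
ALLOWED = {
    "style": {"photorealistic", "stylized", "low_poly", "painterly"},
    "subgenre": {"action", "rpg", "survival", "exploration", "sandbox"},
    "input_device": {"keyboard_mouse", "controller"},
}
TRAVERSAL_TYPES = {"walking", "vehicle", "flight", "swimming"}


def duration_seconds(duration: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def resolution_pixels(resolution: str) -> tuple:
    """Convert a 'WIDTHxHEIGHT' string to a (width, height) tuple."""
    width, height = resolution.lower().split("x")
    return int(width), int(height)


def validate_record(record: dict) -> list:
    """Return a list of human-readable problems; empty means the record passed."""
    problems = []
    for field, allowed in ALLOWED.items():
        if record.get(field) not in allowed:
            problems.append(f"{field}: unexpected value {record.get(field)!r}")
    # Assumption: traversal_type is a non-empty list of allowed values.
    traversal = record.get("traversal_type") or []
    if not traversal or not set(traversal) <= TRAVERSAL_TYPES:
        problems.append(f"traversal_type: unexpected value {traversal!r}")
    if not isinstance(record.get("contains_audio"), bool):
        problems.append("contains_audio: expected a boolean")
    return problems


# Hypothetical record; the values are illustrative, not taken from the dataset.
sample = {
    "duration": "01:02:03",
    "resolution": "1920x1080",
    "frame_rate": 60,
    "contains_audio": True,
    "style": "photorealistic",
    "subgenre": "rpg",
    "game_title": "Elden Ring 1.10",
    "input_device": "controller",
    "traversal_type": ["walking", "swimming"],
}
problems = validate_record(sample)
```

Returning a list of problems rather than raising on the first one makes it easier to audit a whole batch of clips in a single pass.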