Talking Head
Footage of people speaking to camera or across from an interviewer, framed tightly and lit under controlled conditions so that face and voice stand out from the surroundings. The clips pair clean audio with unobstructed front-of-face framing across a wide range of speakers.
Hours: 449K+
Countries: 30+
Languages: 10+
Training Use Cases
✓ Audio-driven facial animation and lip-sync
✓ Avatar and digital human generation
✓ Visual speech recognition and audio-visual ASR
✓ Expression and affect recognition during natural speech
Key Highlights
✓ 30+ countries of origin and 10+ languages, including English, Spanish, Mandarin, French, German, Hindi, Japanese, and Arabic
✓ 5+ framing conventions, from tight close-up through medium, two-shot, over-the-shoulder, and cut coverage
✓ Solo direct-to-camera and two-person sit-down conversation formats across the set
✓ Controlled lighting throughout, with face and voice prioritized over surroundings
Metadata Fields
duration: Length of clip in HH:MM:SS
resolution: Pixel dimensions (e.g., 1920x1080, 3840x2160)
frame_rate: Frames per second (e.g., 24, 30, 60, 120)
contains_audio: Whether the clip carries an audio track (boolean)
primary_category: Dominant content category assigned to a video
style: tight_close_up | medium_close | two_shot | over_the_shoulder | multi_camera_cut
conversation_format: solo_to_camera | one_on_one_interview | sit_down_two_shot | hosted_segment
speaker_count: 1 | 2
language: Primary spoken language (ISO 639-1 code)
country_of_origin: Country where footage was produced (ISO 3166 code)
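
Example: checking and filtering a metadata record
The fields listed above could be checked and filtered along the following lines. This is a minimal sketch, not an official loader: it assumes each clip's metadata arrives as a plain Python dict keyed by the field names above, and the helper names (duration_seconds, parse_resolution, validate, is_lipsync_candidate), the sample record, and the 5-second minimum are all hypothetical choices made for illustration.

```python
from __future__ import annotations

from typing import Any

# Allowed values copied from the field list; everything else here is assumed.
STYLES = {"tight_close_up", "medium_close", "two_shot",
          "over_the_shoulder", "multi_camera_cut"}
FORMATS = {"solo_to_camera", "one_on_one_interview",
           "sit_down_two_shot", "hosted_segment"}


def duration_seconds(value: str) -> int:
    """Convert an HH:MM:SS string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_resolution(value: str) -> tuple[int, int]:
    """Split a 'WIDTHxHEIGHT' string into integer (width, height)."""
    width, height = value.lower().split("x")
    return int(width), int(height)


def validate(record: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record looks sane."""
    problems = []
    if record.get("style") not in STYLES:
        problems.append("unknown style")
    if record.get("conversation_format") not in FORMATS:
        problems.append("unknown conversation_format")
    if record.get("speaker_count") not in (1, 2):
        problems.append("speaker_count must be 1 or 2")
    if not isinstance(record.get("contains_audio"), bool):
        problems.append("contains_audio must be a boolean")
    return problems


def is_lipsync_candidate(record: dict[str, Any], min_seconds: int = 5) -> bool:
    """Hypothetical filter: valid, single-speaker clips with audio."""
    return (
        not validate(record)
        and record["contains_audio"]
        and record["speaker_count"] == 1
        and duration_seconds(record["duration"]) >= min_seconds
    )


# Made-up sample record; the primary_category value is a placeholder.
sample = {
    "duration": "00:01:30",
    "resolution": "1920x1080",
    "frame_rate": 30,
    "contains_audio": True,
    "primary_category": "talking_head",
    "style": "tight_close_up",
    "conversation_format": "solo_to_camera",
    "speaker_count": 1,
    "language": "en",
    "country_of_origin": "US",
}
```

A filter like is_lipsync_candidate narrows the set to solo clips with an audio track, which is one plausible starting point for the lip-sync use case listed earlier; other use cases would likely call for different conditions.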