Conversational Speech

Recordings of how people actually talk to each other in real life, captured across phone calls, in-person chats, video calls, and casual catch-ups. The set keeps the rough texture of natural speech intact, including fillers, hesitations, false starts, overlapping speech, laughter, and the full range of informal pronunciation.

Hours: 521K+
Countries: 40+
Contexts: 6+

Training Use Cases

ASR robust to disfluencies and casual speech
Conversational AI training on natural dialogue
Speaker diarization in informal contexts
Voice cloning across natural speech variation

Key Highlights

40+ countries of speaker origin spanning urban, rural, and international communities
6+ recording contexts including phone calls, video calls, in-person, cafe, vehicle, and home settings
Unscripted and unrehearsed throughout, with disfluencies, fillers, and overlap kept intact
Both close-mic and ambient capture conditions represented in the set

Metadata Fields

duration: Length of recording in HH:MM:SS
sample_rate: Audio sample rate (e.g., 16kHz, 44.1kHz, 48kHz)
channels: mono | stereo
language: Primary spoken language (ISO 639-1 code)
primary_category: Dominant content category assigned to a recording
conversation_type: phone_call | video_call | in_person | casual_chat
speaker_count: Number of speakers in the recording
recording_context: home | cafe | vehicle | outdoor | office
has_overlap: Whether overlapping speech occurs in the recording (boolean)
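
As a rough illustration of how these fields might be consumed, the sketch below checks one metadata record against the enumerated values listed above and converts the HH:MM:SS duration into seconds. It assumes each record arrives as a plain Python dict keyed by the field names; that delivery format, the helper names duration_to_seconds and validate_record, and the sample values are all hypothetical and not part of the dataset documentation. The sample_rate field is left unchecked because its exact encoding is not specified here.

```python
from typing import List

# Allowed values copied from the Metadata Fields list.
CHANNELS = {"mono", "stereo"}
CONVERSATION_TYPES = {"phone_call", "video_call", "in_person", "casual_chat"}
RECORDING_CONTEXTS = {"home", "cafe", "vehicle", "outdoor", "office"}


def duration_to_seconds(duration: str) -> int:
    """Convert an HH:MM:SS string into a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def validate_record(record: dict) -> List[str]:
    """Return the names of fields that fail a basic check (empty if none)."""
    problems = []
    try:
        duration_to_seconds(record.get("duration", ""))
    except ValueError:
        problems.append("duration")
    if record.get("channels") not in CHANNELS:
        problems.append("channels")
    if record.get("conversation_type") not in CONVERSATION_TYPES:
        problems.append("conversation_type")
    if record.get("recording_context") not in RECORDING_CONTEXTS:
        problems.append("recording_context")
    speaker_count = record.get("speaker_count")
    if not isinstance(speaker_count, int) or speaker_count < 1:
        problems.append("speaker_count")
    if not isinstance(record.get("has_overlap"), bool):
        problems.append("has_overlap")
    return problems


# Hypothetical record; the values are illustrative, not drawn from the dataset.
sample = {
    "duration": "00:12:34",
    "sample_rate": "16kHz",
    "channels": "mono",
    "language": "en",
    "conversation_type": "phone_call",
    "speaker_count": 2,
    "recording_context": "home",
    "has_overlap": True,
}

sample_problems = validate_record(sample)
sample_seconds = duration_to_seconds(sample["duration"])
```

A check like this could run before filtering, for example to select only multi-speaker recordings where has_overlap is true for diarization experiments.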