Conversational Speech
Recordings of how people actually talk to each other in real life, captured across phone calls, in-person chats, video calls, and casual catch-ups. The set keeps the rough texture of natural speech intact, including fillers, hesitations, false starts, overlapping speech, laughter, and the full range of informal pronunciation.
Hours: 521K+
Countries: 40+
Contexts: 6+
Training Use Cases
✓ ASR robust to disfluencies and casual speech
✓ Conversational AI training on natural dialogue
✓ Speaker diarization in informal contexts
✓ Voice cloning across natural speech variation
Key Highlights
✓ 40+ countries of speaker origin spanning urban, rural, and international communities
✓ 6+ recording contexts including phone calls, video calls, in-person, cafe, vehicle, and home settings
✓ Unscripted and unrehearsed throughout, with disfluencies, fillers, and overlap kept intact
✓ Both close-mic and ambient capture conditions represented in the set
Metadata Fields
duration: Length of recording in HH:MM:SS
sample_rate: Audio sample rate (e.g., 16 kHz, 44.1 kHz, 48 kHz)
channels: mono | stereo
language: Primary spoken language (ISO 639-1 code)
primary_category: Dominant content category assigned to a recording
conversation_type: phone_call | video_call | in_person | casual_chat
speaker_count: Number of speakers in the recording
recording_context: home | cafe | vehicle | outdoor | office
has_overlap: Whether overlapping speech occurs in the recording (boolean)
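
As a rough illustration of how these fields might be used, the sketch below models one metadata record and filters a list of records for diarization-style training data. It is not the dataset's actual delivery format or API: the `RecordingMetadata` class, the helper functions, the choice to store `sample_rate` as an integer in Hz, and the sample values (including the `primary_category` strings) are all assumptions made for the example. Only the field names and the enumerated values for `channels`, `conversation_type`, and `recording_context` come from the list above.

```python
from dataclasses import dataclass


@dataclass
class RecordingMetadata:
    """Hypothetical container mirroring the metadata fields listed above."""
    duration: str            # "HH:MM:SS"
    sample_rate: int         # assumed to be stored in Hz, e.g. 16000
    channels: str            # "mono" or "stereo"
    language: str            # ISO 639-1 code, e.g. "en"
    primary_category: str    # value set is not documented; free text here
    conversation_type: str   # phone_call | video_call | in_person | casual_chat
    speaker_count: int
    recording_context: str   # home | cafe | vehicle | outdoor | office
    has_overlap: bool


def duration_seconds(hhmmss: str) -> int:
    """Convert an HH:MM:SS duration string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def select_for_diarization(records, min_speakers=2):
    """Keep multi-speaker recordings that contain overlapping speech."""
    return [
        r for r in records
        if r.speaker_count >= min_speakers and r.has_overlap
    ]


# Made-up records, for illustration only.
records = [
    RecordingMetadata("00:12:30", 16000, "mono", "en", "small_talk",
                      "phone_call", 2, "home", True),
    RecordingMetadata("01:02:03", 48000, "stereo", "es", "small_talk",
                      "in_person", 3, "cafe", False),
]

selected = select_for_diarization(records)
total_seconds = sum(duration_seconds(r.duration) for r in selected)
```

Filtering on `has_overlap` and `speaker_count` before summing durations gives a quick estimate of how much overlapping multi-speaker audio a given slice contains.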