Audio

Conversational Speech

Recordings of how people actually talk to each other in real life, captured across phone calls, in-person chats, video calls, and casual catch-ups. The set keeps the rough texture of natural speech intact, including fillers, hesitations, false starts, overlapping speech, laughter, and the full range of informal pronunciation.

Hours: 521K+

Countries: 40+

Contexts: 6+

Training Use Cases

✓ ASR robust to disfluencies and casual speech

✓ Conversational AI training on natural dialogue

✓ Speaker diarization in informal contexts

✓ Voice cloning across natural speech variation

Key Highlights

✓ 40+ countries of speaker origin spanning urban, rural, and international communities

✓ 6+ recording contexts including phone calls, video calls, in-person, cafe, vehicle, and home settings

✓ Unscripted and unrehearsed throughout, with disfluencies, fillers, and overlap kept intact

✓ Both close-mic and ambient capture conditions represented in the set

Metadata Fields

duration: Length of recording in HH:MM:SS

sample_rate: Audio sample rate (e.g., 16kHz, 44.1kHz, 48kHz)

channels: mono | stereo

language: Primary spoken language (ISO 639-1 code)

primary_category: Dominant content category assigned to a recording

conversation_type: phone_call | video_call | in_person | casual_chat

speaker_count: Number of speakers in the recording

recording_context: home | cafe | vehicle | outdoor | office

has_overlap: Whether overlapping speech occurs in the recording (boolean)
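
As an illustration of how these fields might be used, here is a minimal Python sketch that models one metadata record, checks its enumerated fields, and selects recordings with overlapping speech. The field names and allowed values come from the list above. Everything else is an assumption made for the example: the `RecordingMetadata` class, the helper function names, storing `sample_rate` as an integer in Hz, and the sample values. The actual delivery format of the metadata is not specified here.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Allowed values taken from the field list above.
CHANNELS = {"mono", "stereo"}
CONVERSATION_TYPES = {"phone_call", "video_call", "in_person", "casual_chat"}
RECORDING_CONTEXTS = {"home", "cafe", "vehicle", "outdoor", "office"}


@dataclass
class RecordingMetadata:
    """Hypothetical container for one recording's metadata."""
    duration: str            # "HH:MM:SS"
    sample_rate: int         # assumption: stored in Hz, e.g. 16000
    channels: str
    language: str            # ISO 639-1 code, e.g. "en"
    primary_category: str
    conversation_type: str
    speaker_count: int
    recording_context: str
    has_overlap: bool


def duration_seconds(duration: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def validate(record: RecordingMetadata) -> List[str]:
    """Return a list of problems found in the enumerated fields."""
    problems = []
    if record.channels not in CHANNELS:
        problems.append(f"unknown channels: {record.channels}")
    if record.conversation_type not in CONVERSATION_TYPES:
        problems.append(f"unknown conversation_type: {record.conversation_type}")
    if record.recording_context not in RECORDING_CONTEXTS:
        problems.append(f"unknown recording_context: {record.recording_context}")
    if record.speaker_count < 1:
        problems.append("speaker_count must be at least 1")
    return problems


def select_overlap(records: Iterable[RecordingMetadata],
                   min_seconds: int = 60) -> List[RecordingMetadata]:
    """Keep valid recordings with overlapping speech of a minimum length."""
    return [
        r for r in records
        if not validate(r)
        and r.has_overlap
        and duration_seconds(r.duration) >= min_seconds
    ]


# Made-up sample records, for illustration only.
records = [
    RecordingMetadata("00:12:30", 16000, "mono", "en", "casual",
                      "phone_call", 2, "home", True),
    RecordingMetadata("00:00:45", 48000, "stereo", "es", "casual",
                      "in_person", 3, "cafe", True),
    RecordingMetadata("01:02:03", 44100, "stereo", "en", "casual",
                      "video_call", 2, "office", False),
]
selected = select_overlap(records)
```

A filter like this could be used to build a diarization training subset, where overlapping speech is the condition of interest; the 60-second threshold is arbitrary.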