
Multilingual Conversations

Captured across casual chats, structured interviews, group discussions, and call-style exchanges, multilingual conversations show how natural speech behaves outside scripted broadcast contexts. Recordings range from two-person dialogues to small-group debates, with code-switching, dialect variation, and natural disfluencies preserved rather than cleaned out.

Hours: 812K+
Languages: 50+
Countries: 100+

Training Use Cases

Multilingual speech recognition
Multilingual conversational AI training and evaluation
Speech translation and cross-lingual alignment
Language and dialect identification

Key Highlights

50+ languages and major dialects represented across speakers
100+ countries of speaker origin spanning urban, rural, and diaspora communities
Code-switching and intra-conversation language mixing preserved as it occurs naturally
Mix of two-person dialogues, group discussions, and phone-call format conversations

Metadata Fields

duration: Length of recording in HH:MM:SS
sample_rate: Audio sample rate (e.g., 16 kHz, 44.1 kHz, 48 kHz)
channels: mono | stereo
language: Primary spoken language (ISO 639-1 code)
primary_category: Dominant content category assigned to a recording
conversation_format: two_person_dialogue | group_discussion | phone_call
speaker_count: Number of speakers in the recording
country_of_origin: Country where speaker is based (ISO 3166 code)
has_code_switching: Whether the recording includes code-switching between languages (boolean)
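
Example: Working With the Metadata

The fields above could be modeled and filtered roughly as follows. This is a minimal sketch, not the dataset's actual delivery format: the Recording class, the helper functions, and the sample values (including the primary_category labels) are hypothetical, and sample_rate is assumed to be stored as an integer in Hz.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    """One metadata record; field names follow the Metadata Fields list."""
    duration: str             # "HH:MM:SS"
    sample_rate: int          # assumption: stored in Hz, e.g. 16000
    channels: str             # "mono" or "stereo"
    language: str             # ISO 639-1 code, e.g. "es"
    primary_category: str     # category labels used below are hypothetical
    conversation_format: str  # two_person_dialogue | group_discussion | phone_call
    speaker_count: int
    country_of_origin: str    # ISO 3166 code, e.g. "MX"
    has_code_switching: bool


def duration_seconds(hhmmss: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def code_switched_hours(records, language: str) -> float:
    """Total hours of recordings in `language` that contain code-switching."""
    total = sum(
        duration_seconds(r.duration)
        for r in records
        if r.language == language and r.has_code_switching
    )
    return total / 3600


# Hypothetical sample records, for illustration only.
SAMPLE = [
    Recording("00:30:00", 16000, "mono", "es", "casual_chat",
              "two_person_dialogue", 2, "MX", True),
    Recording("01:00:00", 48000, "stereo", "es", "interview",
              "group_discussion", 4, "US", True),
    Recording("00:45:00", 16000, "mono", "hi", "casual_chat",
              "phone_call", 2, "IN", False),
]

print(code_switched_hours(SAMPLE, "es"))  # 1.5
```

Converting duration to seconds before summing keeps the per-language totals easy to compare against the headline hour count.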