Multilingual Conversations
Captured across casual chats, structured interviews, group discussions, and call-style exchanges, these multilingual conversations show how natural speech behaves outside scripted broadcast contexts. Formats range from two-person dialogues to small-group debates, and code-switching, dialect variation, and natural disfluencies are preserved rather than cleaned out.
Hours: 812K+
Languages: 50+
Countries: 100+
Training Use Cases
✓ Multilingual speech recognition
✓ Multilingual conversational AI training and evaluation
✓ Speech translation and cross-lingual alignment
✓ Language and dialect identification
Key Highlights
✓ 50+ languages and major dialects represented across speakers
✓ 100+ countries of speaker origin, spanning urban, rural, and diaspora communities
✓ Code-switching and intra-conversation language mixing preserved as they occur naturally
✓ Mix of two-person dialogues, group discussions, and phone-call conversations
Metadata Fields
duration: Length of the recording (HH:MM:SS)
sample_rate: Audio sample rate (e.g., 16 kHz, 44.1 kHz, 48 kHz)
channels: mono | stereo
language: Primary spoken language (ISO 639-1 code)
primary_category: Dominant content category assigned to the recording
conversation_format: two_person_dialogue | group_discussion | phone_call
speaker_count: Number of speakers in the recording
country_of_origin: Country where the speaker is based (ISO 3166 code)
has_code_switching: Whether the recording includes code-switching between languages (boolean)
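To illustrate how these fields fit together, here is a minimal Python sketch of a typed record plus a validator. It is not an official loader for this dataset. It assumes each recording's metadata arrives as a dictionary keyed by the field names above, that `sample_rate` is either an integer in Hz or a label such as `"44.1kHz"`, and that `country_of_origin` holds an ISO 3166-1 alpha-2 or alpha-3 code. The names `RecordingMetadata`, `parse_record`, `parse_duration`, `parse_sample_rate`, and the sample values are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Any, Mapping, Union

# Enumerated values taken from the field descriptions above.
CHANNELS = {"mono", "stereo"}
CONVERSATION_FORMATS = {"two_person_dialogue", "group_discussion", "phone_call"}


@dataclass(frozen=True)
class RecordingMetadata:
    """Hypothetical typed view of one recording's metadata."""
    duration_seconds: int
    sample_rate_hz: int
    channels: str
    language: str
    primary_category: str
    conversation_format: str
    speaker_count: int
    country_of_origin: str
    has_code_switching: bool


def parse_duration(value: str) -> int:
    """Convert an HH:MM:SS string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    if hours < 0 or not (0 <= minutes < 60) or not (0 <= seconds < 60):
        raise ValueError(f"invalid duration: {value!r}")
    return hours * 3600 + minutes * 60 + seconds


def parse_sample_rate(value: Union[int, str]) -> int:
    """Assumption: accept an int in Hz or a label like '16kHz' / '44.1kHz'."""
    if isinstance(value, int):
        return value
    text = value.strip().lower()
    if not text.endswith("khz"):
        raise ValueError(f"unrecognized sample rate: {value!r}")
    return round(float(text[:-3]) * 1000)


def parse_record(raw: Mapping[str, Any]) -> RecordingMetadata:
    """Validate one raw record (assumed to be a dict keyed by field name)."""
    if raw["channels"] not in CHANNELS:
        raise ValueError(f"unexpected channels: {raw['channels']!r}")
    if raw["conversation_format"] not in CONVERSATION_FORMATS:
        raise ValueError(f"unexpected format: {raw['conversation_format']!r}")
    # ISO 639-1 codes are two lowercase letters, e.g. "en" or "sw".
    if not re.fullmatch(r"[a-z]{2}", raw["language"]):
        raise ValueError(f"not an ISO 639-1 code: {raw['language']!r}")
    # Assumption: ISO 3166-1 alpha-2 ("KE") or alpha-3 ("KEN") codes.
    if not re.fullmatch(r"[A-Z]{2,3}", raw["country_of_origin"]):
        raise ValueError(f"not an ISO 3166 code: {raw['country_of_origin']!r}")
    speaker_count = raw["speaker_count"]
    if isinstance(speaker_count, bool) or not isinstance(speaker_count, int) or speaker_count < 1:
        raise ValueError(f"invalid speaker_count: {speaker_count!r}")
    if not isinstance(raw["has_code_switching"], bool):
        raise ValueError("has_code_switching must be a boolean")
    return RecordingMetadata(
        duration_seconds=parse_duration(raw["duration"]),
        sample_rate_hz=parse_sample_rate(raw["sample_rate"]),
        channels=raw["channels"],
        language=raw["language"],
        primary_category=raw["primary_category"],
        conversation_format=raw["conversation_format"],
        speaker_count=speaker_count,
        country_of_origin=raw["country_of_origin"],
        has_code_switching=raw["has_code_switching"],
    )


# Hypothetical example record; "casual_chat" is an invented category value.
sample = {
    "duration": "00:12:34",
    "sample_rate": "44.1kHz",
    "channels": "stereo",
    "language": "es",
    "primary_category": "casual_chat",
    "conversation_format": "group_discussion",
    "speaker_count": 4,
    "country_of_origin": "MX",
    "has_code_switching": True,
}
meta = parse_record(sample)
```

Normalizing `duration` and `sample_rate` to integers up front makes it straightforward to, for example, select only code-switched group discussions at 16 kHz or above for a speech recognition or language identification experiment.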