Multichannel Audio

Multi-Speaker Podcasts

A large-scale collection of natural, multi-speaker podcast conversations spanning diverse topics, accents, and languages. The audio is sourced from real-world recordings to reflect authentic human dialogue.

  • 500K+ Hours

    An extensive audio corpus providing the volume and variety needed to train robust speech recognition, diarization, and language models at scale.

  • 450K+ Conversations Available

    Each entry is a distinct, real-world podcast conversation featuring multiple speakers in natural, unscripted dialogue.

  • 15+ Languages

    Spanning a broad range of languages and regional dialects, from Mandarin and Hindi to European and Latin American varieties, enabling multilingual model development.

  • 30+ Accents & Dialects

    Featuring a wide spectrum of regional accents and dialects within each language, helping models trained on this data generalize across real-world speaker variation.

Singaporean Accents

English

Metadata

Number of Speakers: 11
Primary Category: Podcasts
Quality Score: 75
Split Audio Channels: Yes

Content Overview

Dataset Coverage

This dataset contains multi-speaker podcast audio capturing natural, unscripted conversations across a wide range of topics, languages, and accents. It supports speaker diarization, accent-aware speech recognition, and conversational language modelling under real-world dialogue conditions.

Region: North American, European, MENA, Global

Topic: Sports, News, Business, Technology