Of the roughly 7,000 languages spoken on Earth today, a significant proportion have never been written down. They exist only in the mouths and ears of their speakers — in conversation, in song, in the stories told at night. When the last fluent speaker of such a language dies, nothing remains. No manuscript, no dictionary, no grammar. The language vanishes as completely as if it had never existed. This is the crisis that a small network of audio archives, field linguists, and computational phonologists is racing to address, armed with microphones, spectrographs, and an increasingly urgent awareness that sound may be the only medium capable of preserving what writing cannot.
Languages That Cannot Be Written Down
The challenge begins with a fact that literate societies tend to overlook: writing is not a natural property of language. Of the approximately 7,000 living languages catalogued by Ethnologue, fewer than half have any established writing system. Many of the rest are tonal languages in which pitch contour — the rise and fall of a speaker’s voice — distinguishes one word from another. In Mandarin Chinese, the syllable ma can mean mother, hemp, horse, or scold depending on its tone. But Mandarin has a centuries-old written tradition and sophisticated tonal notation. Most tonal languages do not. According to the World Atlas of Language Structures, of 527 languages surveyed for tonal features, 220 possess some form of tone system — 132 with simple tone systems and 88 with complex ones. Many of these are among the world’s most endangered languages, spoken in sub-Saharan Africa, Southeast Asia, and the Pacific, where oral tradition has historically been the primary mode of cultural transmission.
The problem is not merely that these languages lack alphabets. It is that their essential features resist alphabetisation. Click consonants, tonal registers, pharyngeal articulations, and the subtle rhythmic patterns of oral poetry cannot be fully captured by any writing system yet devised. The International Phonetic Alphabet provides diacritics for tone — acute accents for high tones, grave accents for low, macrons for mid-level, carons for rising — but these are approximations designed for linguistic analysis, not for community use. A language whose speakers have never used writing cannot be saved by writing alone.
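To make the notation concrete, here is a minimal sketch of how those tone diacritics attach to vowels as Unicode combining characters. The syllable and vowel position are illustrative, and the circumflex for falling tone is a standard IPA convention beyond the four marks listed above.

```python
# IPA tone diacritics as Unicode combining characters, following the
# conventions described above (circumflex for falling tone added).
TONE_DIACRITICS = {
    "high": "\u0301",     # combining acute accent  -> á
    "low": "\u0300",      # combining grave accent  -> à
    "mid": "\u0304",      # combining macron        -> ā
    "rising": "\u030c",   # combining caron         -> ǎ
    "falling": "\u0302",  # combining circumflex    -> â
}

def mark_tone(syllable: str, tone: str, vowel_index: int) -> str:
    """Insert the combining diacritic immediately after the tone-bearing vowel."""
    return (
        syllable[: vowel_index + 1]
        + TONE_DIACRITICS[tone]
        + syllable[vowel_index + 1 :]
    )

# A hypothetical monosyllable, marked with each of the five tones.
for tone in TONE_DIACRITICS:
    print(f"{tone:8s} {mark_tone('ma', tone, 1)}")
```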
The Science of Tonal Preservation: Why Recording Is Not Enough
If writing cannot capture tonal languages, the obvious answer is sound recording. But recording introduces its own set of problems, many of them technical and surprisingly difficult to solve. Capturing tonal variation is one of the most demanding aspects of speech data collection. Unlike basic phoneme recognition, where a recording device needs only to capture the sequence of sounds, tone-sensitive recording requires preserving exact pitch contours and relative pitch differences between syllables.
Low-quality recording devices — the kind most readily available in the remote communities where endangered languages are spoken — compress audio in ways that flatten or distort pitch. Tonal distinctions can be blurred or lost entirely. Background noise, whether from wind, insects, or generators, can mask the subtle pitch shifts on which meaning depends. Even when equipment is adequate, speaker behaviour introduces variability. In natural speech, the pitch characteristics of lexical tones are highly variable — shaped by the anatomy of the speaker’s vocal folds, by discourse prominence, by sentence-level phrasing, and by the gradual downdrift of fundamental frequency over the course of an utterance. A speaker addressing a non-native listener or reading from a written prompt may unconsciously flatten their tonal range, producing recordings that fail to capture the full system.
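The technical core of tone-sensitive analysis is extracting a reliable fundamental-frequency (F0) contour and making it comparable across speakers. The sketch below, assuming a mono field recording named recording.wav, uses librosa's pYIN pitch tracker; the pitch bounds and the choice of semitone normalisation against the speaker's median F0 are illustrative conventions, not a fixed standard.

```python
# A minimal sketch of tone-sensitive pitch extraction and normalisation.
import numpy as np
import librosa

# Keep the native sample rate rather than resampling.
y, sr = librosa.load("recording.wav", sr=None)

# Frame-by-frame fundamental frequency; NaN marks unvoiced frames.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

# Normalise to semitones relative to this speaker's median pitch, so that
# contours from speakers with different vocal ranges become comparable.
# Utterance-level downdrift then shows up as a slow negative trend.
semitones = 12 * np.log2(f0 / np.nanmedian(f0))

times = librosa.times_like(f0, sr=sr)
for t, st in zip(times, semitones):
    if not np.isnan(st):
        print(f"{t:6.3f}s  {st:+5.2f} st")
```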
There are also disagreements among researchers about what they are hearing. The Siberian language Ket has been described as having none, two, four, or eight tones by different analysts. Dar Fur Daju, spoken in Sudan, is reported as non-tonal in one source and transcribed with three tone levels in another. These are not trivial discrepancies. They reflect the genuine difficulty of analysing tonal systems from limited data, particularly when the analyst is not a native speaker.
ELAR and PARADISEC: The Great Digital Archives
Two institutions stand at the centre of the global effort to preserve endangered language audio. The Endangered Languages Archive (ELAR), now based at the Berlin-Brandenburg Academy of Sciences and Humanities, holds materials for over 770 endangered languages recorded in more than 90 countries, with over 550 individual deposits. Founded in 2002 with funding from Arcadia, the charitable fund of Lisbet Rausing and Peter Baldwin, ELAR was created alongside the Endangered Languages Documentation Programme to provide a permanent repository for the recordings, transcriptions, and analyses that field linguists produce. It accepts collections from any documenter regardless of funding source, and access is free.
PARADISEC — the Pacific and Regional Archive for Digital Sources in Endangered Cultures — operates as a consortium of the University of Sydney, the University of Melbourne, and the Australian National University. Its focus is the Asia-Pacific region, where linguistic diversity is staggering: Papua New Guinea alone contains roughly 900 languages, many spoken by communities of a few hundred people. PARADISEC’s collection comprises 3,700 hours of audio recordings representing 1,291 languages, occupying approximately 140 terabytes of storage. The archive was inscribed into the Australian Register of the UNESCO Memory of the World Program in 2013 and distributes copies of its holdings to cultural centres across the Pacific, including the Vanuatu Kaljoral Senta, the Institute of Papua New Guinea Studies, and the Solomon Islands National Museum.
Both archives face the same existential challenge: most of their analogue source materials — reel-to-reel tapes, cassettes, and early digital formats — are physically deteriorating. PARADISEC has warned that most analogue tapes in its collection are not expected to last beyond 2025, making digitisation not merely desirable but urgent. Every year of delay means potential data loss that cannot be recovered.
Endangered Click Languages of Southern Africa
Nowhere is the fragility of oral languages more starkly illustrated than among the click languages of southern Africa. The Khoisan language families — a loose grouping of languages that share the distinctive use of click consonants as phonemes — include some of the most phonologically complex and most endangered languages on Earth.
N|uu, a language of the San people of South Africa, possesses 112 distinct sounds, including 45 click consonants. As of late 2025, it has one known fluent speaker: Katrina Esau, aged 92. Her brother Simon Sauls, one of the last remaining speakers, died in June 2021. The language has been passed down entirely orally — there was no written form until linguists began documentation work. The first N|uu dictionary was completed in 2022, and in May 2024 Esau began teaching local schoolchildren the basics of the language — the first intergenerational transmission in decades. Whether this effort can produce fluent speakers before the language’s last native voice falls silent is an open question.
Taa, also known as !Xóõ, holds the distinction of having the largest documented consonant inventory of any language, with over 100 consonants including clicks across five places of articulation. It retains approximately 2,500 to 3,000 speakers in the Kalahari region of northern Botswana and eastern Namibia, making it the most vital surviving language of the Tuu family. But vitality is relative. The pressures of sedentarisation, dominant-language education, and economic marginalisation continue to erode its speaker base.
The phonological complexity of click languages makes them particularly dependent on audio preservation. No writing system can adequately represent the full range of click articulations, accompanying tonal patterns, and phonation types that distinguish words in these languages. The clicks themselves — dental, lateral, palatal, alveolar, bilabial — are produced by mechanisms entirely absent from most of the world’s languages, and the tonal systems layered on top of them add further dimensions that defy simple transcription. For these languages, the recording is not a supplement to the written record. It is the record.
Oral Poetry and Song as Linguistic Vessels
Oral poetry occupies a unique position in language preservation. It is not merely a cultural artefact but a linguistic technology — a system for encoding and transmitting complex information across generations without writing. The Hawaiian mele tradition, for example, preserves language, cultural values, genealogies, and cosmological narratives within chanted verse. The epic Kumulipo, which tells the creation story of the world and the Hawaiian people, functions simultaneously as literature, history, and linguistic repository. Maori waiata — sung poetry — encodes genealogy, spiritual beliefs, and tribal history in forms specifically designed for oral transmission and memorisation.
UNESCO has stated that oral expressions and their public performance are more effective at safeguarding a language than dictionaries, grammars, and databases. Languages “live in songs and stories, riddles and rhymes,” making the protection of oral traditions and language transmission closely linked activities. This is not sentimentality. A Jicarilla Apache traditional story, for example, requires approximately 40 hours for full correct interpretation because of the layers of philosophy, religion, custom, and linguistic register embedded within it. No dictionary entry could capture what that narrative contains.
The implication for preservation is clear: recording oral poetry is not an optional cultural supplement to linguistic documentation. It is an essential component. A language archived only as word lists and grammatical paradigms is a skeleton. The oral traditions — the songs, the stories, the conversational genres — are the living tissue.
Computational Phonology: From Spectrograms to Searchable Archives
The sheer volume of audio that archives like ELAR and PARADISEC hold creates its own problem: raw recordings are useful only if they can be transcribed, annotated, and searched. Traditional manual transcription is extraordinarily labour-intensive. Phone-level annotation of audio can take up to 13 hours per minute of recording, and even broad transcription commonly runs to some 40 hours of work per hour of audio. At that broader rate, the more than 50,000 hours of audio held by the ARC Centre of Excellence for the Dynamics of Language alone would require approximately two million hours of human labour to transcribe — an obvious impossibility.
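The arithmetic behind that impossibility is simple enough to verify; the rates below are the figures cited above.

```python
# Back-of-envelope scale of the transcription bottleneck.
audio_hours = 50_000  # CoEDL holdings cited above

# Broad transcription at roughly 40 person-hours per recorded hour.
print(f"{audio_hours * 40:,} person-hours")        # 2,000,000

# Phone-level annotation at up to 13 hours per *minute* of audio.
print(f"{audio_hours * 60 * 13:,} person-hours")   # 39,000,000
```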
Computational tools are beginning to close this gap. ELPIS, the Endangered Language Pipeline and Inference System developed by the ARC Centre of Excellence for the Dynamics of Language, allows language documentation workers with minimal computational experience to build their own automatic speech recognition models; it has been used to construct ASR systems for 16 languages of the Asia-Pacific region and provides access to both the Kaldi and Huggingface Transformers frameworks. The Montreal Forced Aligner automates time-aligned phonetic transcription, producing Praat TextGrids and cutting annotation time from up to 13 hours per minute of audio to roughly 30 to 40 minutes per minute — an improvement of around twenty-fold. ELAN, originally developed at the Max Planck Institute for Psycholinguistics, provides annotation software for multimedia recordings that has become standard in linguistic fieldwork.
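As an illustration of what forced alignment yields, here is a minimal sketch that reads a time-aligned TextGrid of the kind the Montreal Forced Aligner produces. It assumes the third-party textgrid Python package, and the file and tier names are illustrative (MFA conventionally writes "words" and "phones" interval tiers).

```python
# Read a forced-aligner TextGrid and print its time-aligned labels.
# Requires: pip install textgrid
import textgrid

tg = textgrid.TextGrid.fromFile("aligned/story_01.TextGrid")

for tier in tg:
    if tier.name not in ("words", "phones"):
        continue
    for interval in tier:
        if interval.mark:  # skip empty (silence) intervals
            print(
                f"{tier.name:6s} "
                f"{interval.minTime:7.3f}-{interval.maxTime:7.3f}  "
                f"{interval.mark}"
            )
```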
In 2017, the ARC Centre of Excellence partnered with Google to develop machine learning technologies for endangered language audio, producing AI models for 12 Indigenous Australian languages. Te Hiku Media in New Zealand developed an automatic speech recognition model for Te Reo Maori achieving 92 percent accuracy, outperforming efforts by major technology companies — a notable case of community-led AI development. At Dartmouth, researcher Rolando Coto Solano has built ASR models for Cook Islands Maori, Bribri, and Cabecar using machine learning to identify speech patterns from audio.
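For a sense of how thin the modern ASR interface can be, the sketch below uses the Huggingface Transformers pipeline API that tools like ELPIS build on. The model name here is a placeholder for a model fine-tuned on the target language's recordings, and the audio file name is illustrative.

```python
# Transcribe a field recording with a pretrained ASR model.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="facebook/wav2vec2-base-960h",  # placeholder; swap in a language-specific model
)

result = asr("elicitation_session_03.wav")
print(result["text"])
```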
More recently, the Living Tongues Institute has partnered with audio company Shure to deploy compact, solar-rechargeable microphones — their MoveMic system — for recording endangered languages in remote field conditions. Their Living Dictionaries platform now serves over 210 language communities with more than 250,000 entries, combining audio recordings with multimedia lexical data in a format that communities themselves can contribute to and maintain.
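A community-maintained dictionary of this kind is, at its core, a lexical record tying audio files to metadata. The sketch below is a hypothetical entry structure for illustration only; the field names are assumptions, not Living Dictionaries' actual schema.

```python
# A hypothetical multimedia lexical entry (illustrative schema only).
from dataclasses import dataclass, field

@dataclass
class LexicalEntry:
    lemma: str                    # headword, as written or romanised
    gloss: str                    # working translation
    audio_paths: list[str] = field(default_factory=list)  # speaker recordings
    speaker_ids: list[str] = field(default_factory=list)  # who recorded each file
    notes: str = ""               # tone, dialect, usage remarks

entry = LexicalEntry(
    lemma="example",
    gloss="an illustrative entry",
    audio_paths=["audio/example_speaker1.wav"],
    speaker_ids=["speaker1"],
    notes="hypothetical record for demonstration",
)
print(entry)
```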
There are risks, however. In December 2024, reports emerged of AI-generated books purporting to teach Abenaki that contained incorrect translations and non-Abenaki words. Members of the Abenaki First Nation condemned the publications, highlighting the danger of deploying language technology without community oversight. The tools are powerful, but they are not neutral, and their misuse can do active harm to the communities they claim to serve.
The Race to Record
The Catalogue of Endangered Languages estimates that an average of 3.5 languages become extinct every year — roughly one every three to four months. The United Nations has cited a figure of one indigenous language disappearing every two weeks. Without intervention, the rate of language loss could triple within 40 years, with projections suggesting that between 50 and 90 percent of current languages will be severely endangered or dead by 2100. The UN has designated 2022–2032 as the International Decade of Indigenous Languages, a recognition at the highest institutional level that the crisis demands coordinated global action.
For tonal and oral languages, the window is especially narrow. These are languages whose full complexity can only be captured in sound, whose grammatical and phonological structures resist reduction to text, and whose speakers are often elderly, geographically isolated, and under pressure from dominant-language education systems. Research published in Nature Ecology & Evolution has identified higher levels of schooling in a dominant language and greater road density — connectivity to dominant-culture areas — as significant predictors of language endangerment. The forces driving language loss are structural, economic, and infrastructural, not merely cultural.
The archives, the field recordings, the computational tools — these are not solutions to the crisis of language loss. Languages survive when children speak them, when communities use them, when they retain social and economic value. But when a language cannot be saved as a living system, sound archives become the last line of defence against total erasure. A high-fidelity recording of an elderly speaker narrating a traditional story in a tonal click language preserves something that no other medium can: the actual sound of human knowledge, encoded in patterns of pitch and articulation that evolved over millennia and that, once lost, cannot be reconstructed from any written description. The microphone, in this context, is not merely a research tool. It is a vessel for carrying what would otherwise be silence.
