A language dies roughly every two weeks. With nearly half of the world’s 7,000 languages at risk of vanishing within a generation, linguists and technologists are locked in an unprecedented race against time. But a new ally has emerged: artificial intelligence. From New Zealand to the American Midwest, AI-powered tools are opening pathways to document, teach, and revitalize languages that might otherwise fall silent forever.
The Data Problem No One Talks About
Modern AI thrives on data — vast oceans of text, audio, and video. Large language models like GPT and Claude have been trained on billions of words, but overwhelmingly in a handful of dominant languages. As Claudio Pinhanez of IBM Research points out, “For 98% of the languages in the world, we’re never going to have the data we have for a hundred languages.” This asymmetry means the same technology that powers instant translation between English and Mandarin is virtually useless for Choctaw, Lakota, or Cook Islands Māori — precisely the languages most in need of support.
Yet researchers are finding creative ways around this scarcity. At Dartmouth College, Professor Rolando Coto-Solano has built automatic speech-recognition models for Cook Islands Māori that can identify speech patterns in audio recordings and transcribe them into text. Graduate student Ivory Yang is exploring how generative AI can help preserve Nüshu, an ancient women’s script from southern China that was nearly lost to history. Their work demonstrates that AI can produce valuable linguistic resources even from minimal data — a critical breakthrough for languages with few remaining speakers and limited written records.
Te Hiku Media and the Māori Speech Revolution
Perhaps the most striking example of AI-driven language preservation comes from Aotearoa New Zealand. Te Hiku Media, a local broadcaster, has developed automatic speech recognition models that can transcribe te reo Māori with 92% accuracy. Their multilingual model, Papa Reo, was created under the leadership of Peter-Lucas Jones, whom TIME named one of the 100 most influential people in AI in 2024.
What makes Te Hiku’s approach distinctive isn’t just the technology — it’s the philosophy. The project operates under principles of Indigenous data sovereignty, meaning the Māori community retains ownership and control over the linguistic data used to train the models. This stands in sharp contrast to Big Tech approaches where training data is extracted without community consent or benefit.
Robots That Speak Anishinaabemowin
Danielle Boyer, a 24-year-old Anishinaabe roboticist, has taken a radically different approach. She designed the “Skobot,” a small robot intended to perch on a wearer’s shoulder, listening in and joining conversations — in fluent Anishinaabemowin, an Algonquian language spoken across the Great Lakes region. The Skobot isn’t meant to replace human speakers. Instead, it serves as a conversational companion that normalises the use of endangered languages in everyday life, particularly for younger generations who may have limited access to fluent elders.
Meanwhile, at the University of Southern California, Dr. Jacqueline Brixey has developed “ChoCo,” a Choctaw language corpus, and “Masheli,” a conversational dialogue system that allows users to interact in Choctaw. Her speech-to-text system is expected to be publicly available through dedicated apps, representing one of the first comprehensive AI-powered toolkits for a specific Native American language.
The Danger of Getting It Wrong
For all its promise, AI in language preservation carries real risks. Dr. Brixey herself has warned that tools like ChatGPT perform poorly with Indigenous languages: “ChatGPT could be good in Choctaw, but it’s currently ungrammatical; it shares misinformation about the tribe” and “makes up what it claims are tribal stories.” When a language has only hundreds or thousands of speakers, AI-generated misinformation could pollute the very record researchers are trying to protect.
This is why initiatives like FLAIR — First Languages AI Reality, based at Mila in Quebec — insist on building AI foundations that respect linguistic self-determination. Their automatic speech recognition research focuses on creating methods for the rapid development of custom models for endangered languages, but always with Indigenous communities in the driver’s seat. The technology is a tool, not a substitute for the cultural knowledge that gives language its meaning.
A New Chapter, Not an Ending
The intersection of AI and language preservation is still in its earliest chapters. IBM and the University of São Paulo are developing open-source AI tools for the approximately 200 Indigenous languages spoken in Brazil, with plans to transfer the technology to Indigenous organisations and startups. Google and Microsoft are building translation tools designed to make endangered languages more accessible to digital audiences. And the United Nations, having declared 2022–2032 the International Decade of Indigenous Languages, is lending institutional weight to the cause.
None of this will matter without the speakers themselves. AI can transcribe, translate, and generate — but it cannot replicate the lived experience of speaking a language within its cultural context. The most successful projects share a common thread: they centre community ownership, treat technology as a servant rather than a saviour, and recognise that preserving a language means preserving a way of seeing the world. In that race against silence, AI is proving to be a powerful accelerant — but the runners are, and must remain, human.
