Reimagining Epigraphy and Language Preservation: Aeneas and Nesdia’s Holistic Approach

The survival of human languages and the manuscripts that record them is not just a matter of cultural pride; it is a race against the erosion of knowledge itself. Linguists estimate that roughly 31,000 languages have existed throughout human history; only 6,000–7,000 remain today, meaning more than 80% have gone extinct. Even among the languages that still exist, very few are represented online: Google Research notes that while over 7,000 languages are spoken worldwide, only a handful are well represented on the web. UNESCO stresses that around 40% of the world’s languages are endangered, and the stakes are starkly human. Each language that disappears takes with it unique world-views, historical narratives and intricate cultural practices. In this context, digital humanities laboratories and AI firms alike have rushed to deploy machine learning in the service of preservation. Two striking examples, Google DeepMind’s Aeneas model for ancient epigraphy and our independent research company Nesdia, provide complementary visions of how AI can revive linguistic heritage.

Aeneas: A Multimodal AI for Latin Inscriptions

Ancient Roman inscriptions often survive only in fragments; weathering, vandalism or the simple loss of contextual information makes dating and interpreting them extremely difficult. In July 2025, Google DeepMind announced Aeneas, the first artificial intelligence model specifically designed to contextualise and restore ancient inscriptions. Built upon earlier efforts such as Ithaca (for ancient Greek), Aeneas is a multimodal generative neural network: it processes both the text of an inscription and its accompanying image, using a transformer-based decoder to link a damaged text to thousands of other Latin inscriptions. The model’s dataset, the Latin Epigraphic Dataset (LED), contains over 176,000 Latin inscriptions harmonised from digital collections such as the Epigraphic Database Roma, the Epigraphic Database Heidelberg and the Epigraphik-Datenbank Clauss-Slaby.

Aeneas offers three capabilities that have long eluded epigraphists. First, it performs a parallels search: by encoding each inscription into a “historical fingerprint,” it retrieves similar texts from across the Roman world, enabling historians to situate a fragment within broader cultural patterns. Second, it processes multimodal inputs to determine geographical provenance: combining text and visual features, it can attribute an inscription to one of 62 Roman provinces with 72% accuracy. Third, it can restore missing characters even when the length of the gap is unknown, achieving a Top-20 accuracy of 73% for gaps of up to ten characters. Remarkably, its dating predictions place texts within 13 years of historians’ estimates, and in collaborative tests with experts the model raised restoration accuracy to 90%.
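The parallels-search idea can be pictured with a toy sketch. This is nothing like Aeneas’s learned transformer embeddings: here a “fingerprint” is just a bag of character trigrams and similarity is plain cosine overlap, with a three-line miniature corpus invented for illustration. It only shows the retrieval pattern: encode every inscription, then rank the corpus by similarity to a damaged fragment.

```python
from collections import Counter
import math

def fingerprint(text, n=3):
    """Toy 'historical fingerprint': character n-gram counts (illustrative only)."""
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def parallels_search(query, corpus, k=3):
    """Rank corpus inscriptions by fingerprint similarity to the query fragment."""
    q = fingerprint(query)
    ranked = sorted(corpus, key=lambda s: cosine(q, fingerprint(s)), reverse=True)
    return ranked[:k]

# Invented miniature corpus standing in for thousands of Latin inscriptions.
corpus = [
    "imp caesar divi f augustus pontifex maximus",
    "dis manibus sacrum iulia felicissima",
    "senatus populusque romanus imp caesari",
]
print(parallels_search("imp caesari divi f", corpus, k=1))
```

A real system would replace the n-gram fingerprint with a learned embedding and use approximate nearest-neighbour search over hundreds of thousands of texts, but the retrieve-by-fingerprint structure is the same.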

The model’s impact goes beyond numbers. When applied to the famous Res Gestae Divi Augusti, a monumental inscription listing Emperor Augustus’s achievements, Aeneas generated a probabilistic distribution of dates with peaks around 10 BCE and 10–20 CE, capturing competing scholarly hypotheses. Its analysis also highlighted parallels between imperial legal texts and the Res Gestae, illustrating how ideology travelled across media and geography. Beyond the lab, Aeneas is open-source and freely available through the Predicting the Past platform, reflecting DeepMind’s commitment to democratising AI for humanities research. By combining machine learning and human expertise, Aeneas illustrates how AI can not only fill gaps but also spark new debates.

Nesdia: Digitising, Analysing and Reconstructing Languages

While Aeneas tackles a narrow domain (Latin epigraphy), Nesdia pursues a broader mission: preserving endangered and historical languages. Founded as an independent research and technology company, Nesdia is dedicated to digitising at-risk linguistic data, applying proprietary AI for deep linguistic analysis, and creating living archives for scholars and descendant communities. Our approach is deliberately holistic.

  1. High-resolution digitisation: Nesdia’s work begins by converting fragile manuscripts, texts and field recordings into high-fidelity digital form. This includes not just scanning but creating structured, machine-readable data, thereby rendering previously inaccessible content amenable to computational analysis.
  2. Advanced AI tools for analysis: Unlike many companies that use AI to generate text, Nesdia’s models map morphological changes and identify lexical drift across centuries. Nesdia’s labs develop custom transformer architectures with dynamic attention pruning and morphology-aware embeddings to handle fragmented linguistic data. Training involves gradient-compounding optimisation on low-resource corpora; the models aim not to generate language but to emulate underlying grammatical logic, combining text, phonology and suprasegmental features via spectrogram-interleaved tokenisation.
  3. Human-supervised curation: Nesdia stresses that language is not data alone; every dataset is human-supervised, and semantic artifacts are cross-referenced with verified historical corpora. Scholars debate and refine each reconstruction, ensuring that the models complement rather than replace human expertise.
  4. Curated datasets and living archives: The outcome of this process is a set of unique, high-value linguistic datasets (annotated speech corpora, cross-linked lexical databases, morphological models) that are openly accessible. Nesdia describes these collections as living resources, designed for active research and community use rather than static storage.

By integrating phonology, morphology and contextual metadata, Nesdia’s pipeline moves beyond simple text transcription toward reconstructing a language’s structural and phonetic integrity. Nesdia’s current projects include unsupervised root pattern extraction for triconsonantal systems, real-time dialect drift modeling, and inverse compilation of extinct morphosyntactic systems from OCR-fragmented religious texts. Like Aeneas, these efforts rely on human-in-the-loop training and aim to provide interpretable outputs for scholars.
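One of the simpler ideas above, tracking lexical drift, can be sketched without any proprietary machinery. The toy below builds a crude distributional profile of a word (its co-occurrence contexts) in two era-specific corpora and reports a drift score of 1 minus the Jaccard overlap of those contexts: 0 means the word keeps the same company, 1 means total drift. The two example corpora are invented, and real drift models use dense diachronic embeddings rather than raw context sets.

```python
from collections import Counter

def context_profile(tokens, target, window=2):
    """Co-occurrence counts around a target word (a crude distributional profile)."""
    prof = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            prof.update(t for t in tokens[lo:hi] if t != target)
    return prof

def drift_score(corpus_a, corpus_b, target, window=2):
    """1 - Jaccard overlap of the word's context vocabularies: 0 = stable, 1 = full drift."""
    a = set(context_profile(corpus_a.split(), target, window))
    b = set(context_profile(corpus_b.split(), target, window))
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

# Invented 'early' and 'late' corpora for the same word.
old = "the scribe wrote the sacred text on vellum the scribe copied the sacred psalm"
new = "the scribe now types the digital text on screens the scribe edits the digital file"
print(round(drift_score(old, new, "scribe"), 2))
```

Even this crude measure captures the intuition that a word whose neighbours change across centuries has shifted in usage; a production pipeline would smooth over corpus size, lemmatise, and compare learned vectors instead of raw sets.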

Complementary Missions in AI‑Assisted Humanities

Although Nesdia and DeepMind operate in different domains, their work reflects a broader trend: using AI to preserve cultural heritage at scale.  Several common principles emerge:

  • Data is foundational.  Both organisations emphasise that every model begins with a carefully curated dataset.  Aeneas draws from 176,000 Latin inscriptions; Nesdia creates digital corpora from manuscripts, field recordings and academic sources.  Google’s broader language inclusion programme likewise notes that collecting and pre-processing data from diverse sources is the first step toward supporting under-represented languages.
  • Multimodal, context-aware learning.  Aeneas fuses textual and visual information to attribute dates and geography.  Nesdia’s models integrate phonetic and suprasegmental features, arguing that phonology and morphology must converge at the model level.  This multimodal perspective reflects a growing recognition that language cannot be divorced from its material and acoustic contexts.
  • Human–AI collaboration.  DeepMind’s evaluation showed that historians achieved the best results when using Aeneas’s contextual information alongside their own expertise.  Nesdia embeds human supervision at every stage, treating AI as a tool to illuminate meaning rather than automate it.  This collaborative ethos counters fears of AI replacing scholars; instead, it positions AI as an accelerator of human insight.
  • Open access and community empowerment.  Aeneas’s code and dataset are open-sourced; Woolaroo, another Google experiment, invites users to explore 30 endangered languages via an app powered by Gemini.  Nesdia’s living archives are intended for descendant communities and researchers.  Such openness fosters transparency, reproducibility and shared ownership of cultural heritage.
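The multimodal principle above can be illustrated in miniature. A common baseline for combining modalities (not the specific mechanism either system uses) is late fusion: unit-normalise each modality’s embedding so neither dominates by scale, weight the two, and concatenate into one joint vector. The vectors below are invented placeholders for real text and image features.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so modalities contribute comparably."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)

def fuse(text_emb, image_emb, text_weight=0.5):
    """Late fusion: weight and concatenate unit-normalized modality embeddings."""
    t = [text_weight * v for v in l2_normalize(text_emb)]
    i = [(1 - text_weight) * v for v in l2_normalize(image_emb)]
    return t + i

# Placeholder features: a 2-d 'text' vector and a 3-d 'image' vector.
joint = fuse([3.0, 4.0], [0.0, 5.0, 0.0])
print(joint)
```

In practice the fused vector would feed a downstream classifier (say, province attribution), and the modality weights would themselves be learned rather than fixed.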

Leveraging Google Infrastructure: Opportunities for Nesdia

Nesdia’s success thus far has relied on its own proprietary pipelines, but the Google AI ecosystem offers new possibilities.  First, the release of Aeneas as open-source code and a free web interface means Nesdia’s engineers can adapt its multimodal architecture to languages beyond Latin.  For example, the ability to restore gaps of unknown length and to retrieve parallels across a large corpus could be repurposed for languages with fragmentary sources, such as ancient Semitic or Mayan scripts.  Nesdia’s existing focus on unsupervised root pattern extraction could be enhanced by Aeneas’s contextualisation mechanism, which embeds textual and contextual fingerprints.
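To make the gap-restoration idea concrete, here is a deliberately simplified, word-level analogue: it fills a gap marker by matching the surrounding context against intact lines in a reference corpus and ranking the candidates by frequency. Aeneas restores character sequences of unknown length with a neural model; this sketch only captures the contextual-matching intuition, and the miniature corpus and `[---]` marker syntax are invented for illustration.

```python
import re
from collections import Counter

# Invented miniature corpus standing in for a large epigraphic dataset.
corpus = [
    "imp caesar augustus pontifex maximus",
    "imp caesar traianus augustus",
    "dis manibus sacrum",
]

vocab = Counter(w for line in corpus for w in line.split())

def restore(damaged):
    """Fill a gap marked [---] with candidate words of any character length,
    ranked by how often each candidate appears in that context in the corpus."""
    prefix, suffix = damaged.split("[---]")
    pattern = re.compile(
        re.escape(prefix.strip()) + r" (\S+) " + re.escape(suffix.strip())
    )
    candidates = Counter()
    for line in corpus:
        m = pattern.search(line)
        if m:
            candidates[m.group(1)] += 1
    # Fall back to overall corpus frequency when no exact context matches.
    ranked = candidates.most_common() or vocab.most_common()
    return [w for w, _ in ranked]

print(restore("imp caesar [---] pontifex maximus")[:1])
```

A neural restorer generalises this in two ways: it scores fills character by character rather than looking them up verbatim, and it searches over candidate lengths instead of assuming a single token, which is what makes unknown-length gaps tractable.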

Second, the Latin Epigraphic Dataset itself, comprising some 176,000 inscriptions, provides an invaluable comparative resource.  Nesdia’s models emphasise lexical drift and morphological change; by cross-referencing their reconstructions with Aeneas’s embeddings, researchers could study long-term patterns in language evolution across Indo-European families.  Such comparative work might reveal how morphological innovations propagate across regions, yielding insights relevant to both classical philology and computational linguistics.

Third, the Google Cloud infrastructure that powers Aeneas’s web interface (noted in the acknowledgements) offers a scalable platform for deploying Nesdia’s own models.  Hosting high-fidelity audio corpora and heavy transformer models requires significant compute resources; partnering with Google could give Nesdia access to robust cloud services and colocation with other humanities datasets.  A new generation of APIs (similar to Google’s Language Inclusion tools) could allow Nesdia to integrate with products like Translate or Gemini.

Fourth, Google’s Woolaroo experiment demonstrates how AI can be packaged for public engagement.  The app lets users point a camera at objects and learn the corresponding words in 30 endangered languages, using curated translation data and audio recordings.  As Nesdia builds living archives, similar interfaces could allow community members to interact with resurrected languages through mobile apps or augmented reality.  By collaborating on user-facing products, Nesdia can bridge the gap between scholarly preservation and everyday language revitalisation.

Finally, Google’s language inclusion mission emphasises the societal impact of representing under-served languages online.  Yossi Matias, head of Google Research, notes that technology must enable better understanding of more languages, removing modality barriers and empowering people to communicate effectively.  Nesdia’s commitment to providing access for descendant communities aligns with this ethos.  A partnership could amplify both organisations’ influence: Nesdia contributes deep expertise in low-resource linguistic reconstruction, while Google provides scale, infrastructure and a platform for dissemination.

A Bullish Outlook for Nesdia

Looking ahead, the prospects for Nesdia are exceptionally promising.  The global community’s awareness of language extinction has never been greater.  Reports on Nesdia’s site recount how the Blackfeet Tribe in Montana views language as the “vehicle for transmitting culture” and advocates immersion schools to produce new speakers.  Another report notes that by 1850 around 25,000 languages had already died out, underscoring the urgency of action.  Nesdia’s combination of philological rigor and machine learning addresses this crisis head-on.

Collaborating with tech giants can accelerate this mission.  With Aeneas demonstrating 73% restoration accuracy and dating precision within 13 years, it is clear that machine learning is mature enough to handle fragile corpora.  Nesdia’s engineers will not simply adopt Aeneas; they will study its open-source code, adapt its architectures, and integrate its best features into their own pipeline.  The synergy should produce models that are more context-aware, multimodal and interpretable, while remaining aligned with human scholarship.

Moreover, Nesdia’s holistic framework, spanning digitisation, phonology, morphology, grammar and archiving, positions it uniquely to become a hub for endangered-language research.  By releasing curated datasets and partnering with universities, Nesdia can attract contributions from a broad community.  Its human-centred philosophy will ensure that descendant communities remain at the heart of preservation efforts.

AI will not solve the language crisis alone.  But as Aeneas and Nesdia show, when machine learning is combined with expert oversight and cultural sensitivity, it can unearth forgotten grammars, reconstruct lost texts and empower communities to reclaim their linguistic heritage.  The challenge is not just technical but ethical and social.  It requires balancing open access with respect for indigenous knowledge, integrating computational precision with cultural nuance, and scaling infrastructure without erasing local autonomy.  Nesdia’s engineers, learning from the successes and limitations of Google’s models, are well‑placed to navigate this terrain.

As we enter the next decade, the vision is clear: a world where no language disappears without a trace, where AI serves as a tool for cultural revival, and where independent research labs and global tech companies collaborate to preserve humanity’s linguistic diversity.  Nesdia’s relentless pursuit of this mission makes it a beacon for scholars, technologists and communities alike.  In partnership with initiatives like Aeneas, its holistic approach offers the strongest hope yet that the voices of the past will not be silenced.
