Digital Humanities and Scholarship Open access Peer reviewed

Corpus Corporum

Philipp Roelli

Umanistica Digitale | May 21, 2026

Abstract

The Corpus Corporum project hosted by the University of Zurich is the largest structured digital collection of Latin texts. The texts range from antiquity to the twentieth century and currently total approximately 226 million words across thirty corpora. Conceived as an open-access research infrastructure, it provides philologists, linguists, historians, and scholars of Latin with a unified environment for reading, searching, and analysing texts encoded in standardised TEI XML. Important Latin dictionaries are integrated into the site. The platform, built on open-source technologies including BaseX, Sphinx, and TreeTagger, maintains a distinction between corpus, author, work, and edition levels, and integrates persistent identifiers (VIAF, Wikidata) and external resources such as geschichtsquellen.de. The article discusses recent advancements, in particular two major new analytical tools. The Text Reuse module enables configurable intertextual analysis based on k-skip-n-gram algorithms, while the Metrical Analysis module automatically identifies Latin poetic metres. These innovations allow large-scale, reproducible investigations of textual transmission and poetic structure. An example concerning the sources of Isidore of Seville’s Etymologiae is briefly discussed. Planned developments include AI-assisted translation, semantic indexing, and synonym-based search, which would enhance the platform’s potential as a comprehensive, interoperable resource for digital Latin philology and the broader field of computational humanities.
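The abstract names k-skip-n-grams as the basis of the Text Reuse module without spelling the idea out. The following is a minimal sketch of the general technique, not the platform's implementation: a k-skip-n-gram is taken here to be n tokens in their original order with at most k tokens skipped in total between the first and the last. The function names `skipgrams` and `shared_skipgrams`, the whitespace tokenisation, and the lowercasing are assumptions made for illustration; a real pipeline would likely lemmatise first.

```python
from itertools import combinations


def skipgrams(tokens, n, k):
    """Return the set of k-skip-n-grams of a token list.

    Each gram keeps token order and allows at most k skipped
    tokens in total between its first and last element.
    """
    grams = set()
    for start in range(len(tokens) - n + 1):
        # Candidates for the remaining n-1 slots: the next n-1+k tokens.
        window = tokens[start + 1 : start + n + k]
        for rest in combinations(range(len(window)), n - 1):
            grams.add((tokens[start],) + tuple(window[i] for i in rest))
    return grams


def shared_skipgrams(a, b, n=2, k=1):
    """Skip-grams common to two passages (hypothetical helper).

    Tokenisation is a plain lowercase whitespace split, which is
    an illustrative simplification.
    """
    return skipgrams(a.lower().split(), n, k) & skipgrams(b.lower().split(), n, k)


if __name__ == "__main__":
    source = "arma virumque cano"
    reuse = "arma saeva virumque cano"  # one inserted word
    print(sorted(shared_skipgrams(source, reuse, n=2, k=0)))
    print(sorted(shared_skipgrams(source, reuse, n=2, k=1)))
```

With k = 0 the two passages share only the plain bigram (virumque, cano); allowing one skip also recovers (arma, virumque) despite the inserted word, which is why skip-grams suit quotations that have been lightly reworded. Raising n or lowering k makes the matching stricter.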

Authors

Philipp Roelli (first author), University of Zurich

Citation

BibTeX

@article{Roelli2026Corpus,
  title = {Corpus Corporum},
  author = {Philipp Roelli},
  journal = {Umanistica Digitale},
  year = {2026},
  doi = {10.60923/issn.2532-8816/23668},
  url = {https://doi.org/10.60923/issn.2532-8816/23668}
}
