From Regex to Typed AST: The ANSELMUS Combinator-Based Parser for the IbiScript DSL Applied to Middle Persian and Parthian Bilingual Inscriptions 
Abstract
This paper introduces ANSELMUS, a combinator-based parser for IbiScript, a new Domain-Specific Language (DSL) for encoding epigraphic texts according to the Leiden Conventions. Implemented in Rust with the nom library, ANSELMUS produces a typed Abstract Syntax Tree (AST) that encodes the epigraphic grammar at the type level, enabling static validation and multi-format serialization. This is a departure from the regex-to-XML model of existing tools. The library compiles to WebAssembly for client-side deployment without server infrastructure, exposes a Command Line Interface (CLI) and Python bindings for integration with existing Digital Humanities (DH) pipelines, and is fully conformant with the TEI P5 and EpiDoc guidelines. The system's validity is demonstrated by applying it to a corpus of Middle Persian and Parthian bilingual inscriptions (3rd century CE) characterized by complex fragmentary states and diverse epigraphic notations.
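To make the contrast with a regex-to-XML pipeline concrete, the following is a minimal sketch of the combinator-to-typed-AST idea. It is not the ANSELMUS implementation and does not use nom: to stay dependency-free it hand-rolls a single combinator in plain Rust. The three-construct bracket syntax, the `Node` type, and every function name are assumptions made for illustration; they are not IbiScript's actual grammar or API.

```rust
/// Typed AST for a tiny, hypothetical subset of Leiden-style notation.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    /// Plain, legible text.
    Text(String),
    /// `[abc]`: lost text restored by the editor.
    Supplied(String),
    /// `[...]`: lacuna; here one dot is assumed to stand for one lost character.
    Gap(usize),
    /// `(abc)`: editorial expansion of an abbreviation.
    Expansion(String),
}

/// On success a parser returns the remaining input plus a value.
type PResult<'a, T> = Option<(&'a str, T)>;

/// Combinator: match `open`, capture everything up to `close`, consume `close`.
fn delimited<'a>(open: char, close: char) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |input| {
        let rest = input.strip_prefix(open)?;
        let end = rest.find(close)?;
        Some((&rest[end + close.len_utf8()..], &rest[..end]))
    }
}

/// `[...]` becomes a `Gap`; any other bracketed content becomes `Supplied`.
fn supplied_or_gap(input: &str) -> PResult<'_, Node> {
    let (rest, inner) = delimited('[', ']')(input)?;
    let node = if !inner.is_empty() && inner.chars().all(|c| c == '.') {
        Node::Gap(inner.chars().count())
    } else {
        Node::Supplied(inner.to_string())
    };
    Some((rest, node))
}

/// `(abc)` becomes an `Expansion`.
fn expansion(input: &str) -> PResult<'_, Node> {
    let (rest, inner) = delimited('(', ')')(input)?;
    Some((rest, Node::Expansion(inner.to_string())))
}

/// A non-empty run of characters up to the next opening bracket.
fn text(input: &str) -> PResult<'_, Node> {
    let end = input
        .find(|c: char| c == '[' || c == '(')
        .unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    Some((&input[end..], Node::Text(input[..end].to_string())))
}

/// Ordered choice: try each alternative in turn.
fn node(input: &str) -> PResult<'_, Node> {
    supplied_or_gap(input)
        .or_else(|| expansion(input))
        .or_else(|| text(input))
}

/// Repeat `node` until the input is exhausted; an unclosed bracket is an error.
pub fn parse(mut input: &str) -> Result<Vec<Node>, String> {
    let mut nodes = Vec::new();
    while !input.is_empty() {
        match node(input) {
            Some((rest, n)) => {
                nodes.push(n);
                input = rest;
            }
            None => return Err(format!("unbalanced bracket at: {:?}", input)),
        }
    }
    Ok(nodes)
}

/// One serializer among possibly many. The `match` is exhaustive, so adding
/// a `Node` variant fails to compile until every serializer handles it.
pub fn to_leiden(nodes: &[Node]) -> String {
    nodes
        .iter()
        .map(|node| match node {
            Node::Text(t) => t.clone(),
            Node::Supplied(s) => format!("[{}]", s),
            Node::Gap(len) => format!("[{}]", ".".repeat(*len)),
            Node::Expansion(e) => format!("({})", e),
        })
        .collect()
}
```

Given the illustrative input `šāh[ān] [...] š(āh)` (constructed for this example, not drawn from the corpus), `parse` yields a text node, a supplied node, a three-character gap, and an expansion, and `to_leiden` reproduces the input string. The design point is that each serializer is a total function over the AST, so an unhandled construct surfaces at compile time rather than as a regex that silently fails to match.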
DOI: http://dx.doi.org/10.2423/i22394303v16n1p147
Copyright (c) 2026 Andrea Marruzzo

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
SCIRES-IT, e-ISSN 2239-4303
Journal founded by Virginia Valzano




