4TB: A new tool to study Tocharian A – Old Uyghur parallels
DOI: https://doi.org/10.32523/2664-5157-2026-2SI-77-92
Keywords: Central Asian Buddhism, Maitreya, Maitreyasamiti-Nāṭaka, Maitrisimit nom bitig, Old Turkic, Old Uyghur, parallel corpus, Silk Road cultures, Tocharian A, translation corpus
Abstract
The paper introduces 4TB, a web-based corpus developed to support research on Tocharian A and Old Uyghur. Its aim is to collect Tocharian A fragments and align them with their Old Uyghur (and, later, Sanskrit) parallels, in particular the Tocharian A Maitreyasamiti-Nāṭaka and its Old Uyghur translation, the Maitrisimit nom bitig. Despite extensive scholarship, no complete edition of either text has yet been published. A Tocharian A edition is in preparation and will serve as the basis for the corpus, while 4TB focuses on assembling the Old Uyghur parallel fragments as a foundation for a future full edition of the Maitrisimit nom bitig. Among existing digital resources, Tocharian is relatively well represented, whereas Old Turkic corpora remain limited, so even a partial parallel corpus is a meaningful contribution.

The paper discusses key problems of corpus design and development. Transliteration is omitted for both languages, while transcription is handled differently for each. For Tocharian A, it follows established conventions; for Old Uyghur, all data are normalized into a unified transcription system, with limited correction of outdated forms. Inconsistent transcription practices in the literature pose additional challenges. Translations are given as in the source publications (Russian for Tocharian A, mainly German for Old Uyghur), and automatic English translations are planned. The token is the smallest unit of the corpus, but alignment operates on higher-level units because reliable word-to-word correspondence is lacking. Because sentence segmentation is problematic, a flexible unit called a “passage” is introduced. Passages are grouped into “passage groups” to account for manuscript variation, and the groups are then aligned across languages, allowing for discrepancies such as omissions and additions. This approach preserves textual coherence and differs from standard KWIC-based corpus models.

The corpus offers several core features. Dictionaries cover all lemmatized tokens and link forms and spellings with grammatical annotation. A concordance displays passages together with their parallels. Search tools support queries by spelling, form, lemma, and grammatical features, including cross-linguistic searches. Editing tools allow texts and lexical data to be modified, with semi-automatic lemmatization and potential for further expansion. Overall, 4TB provides a scalable framework that can be reused in future corpus projects. It facilitates cross-linguistic research on Tocharian and Turkic languages and improves access to the material for specialists outside linguistics, such as Buddhologists.
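The structure described in the abstract (tokens inside passages, passages grouped by manuscript variation, groups aligned across languages) can be pictured with a minimal Python sketch. It is an illustration only and is not taken from the 4TB implementation: every class, field, and function name below is an assumption, and the spellings, lemmas, and manuscript labels are invented placeholders rather than attested Tocharian A or Old Uyghur data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Token:
    """Smallest unit of the corpus (hypothetical field names)."""
    spelling: str                      # spelling as attested in the source
    form: str                          # normalized word form
    lemma: str                         # dictionary headword
    features: frozenset = frozenset()  # grammatical tags


@dataclass
class Passage:
    """Flexible text unit used instead of a sentence."""
    passage_id: str
    language: str                      # "TA" or "OU" (labels assumed here)
    manuscript: str
    tokens: list = field(default_factory=list)
    translation: str = ""


@dataclass
class PassageGroup:
    """Versions of one passage attested in different manuscripts."""
    group_id: str
    passages: list = field(default_factory=list)


@dataclass
class Alignment:
    """One aligned unit; a side is None where a version omits the text."""
    tocharian: Optional[PassageGroup]
    uyghur: Optional[PassageGroup]


def search(alignments, language, lemma=None, features=frozenset()):
    """Yield (passage, alignment) for passages with a matching token."""
    for al in alignments:
        group = al.tocharian if language == "TA" else al.uyghur
        if group is None:
            continue
        for passage in group.passages:
            if any((lemma is None or t.lemma == lemma)
                   and features <= t.features
                   for t in passage.tokens):
                yield passage, al


def parallels(alignment, language):
    """Return the passages on the other side of an alignment."""
    other = alignment.uyghur if language == "TA" else alignment.tocharian
    return [] if other is None else list(other.passages)


# Placeholder data: two Tocharian A manuscripts attest the same passage,
# and the second Old Uyghur passage has no Tocharian A counterpart.
nom_sg = frozenset({"NOM", "SG"})
ta_group = PassageGroup("ta-g1", [
    Passage("ta-1", "TA", "MS-A", [Token("w1", "w1", "lemma-x", nom_sg)]),
    Passage("ta-2", "TA", "MS-B", [Token("w1'", "w1", "lemma-x", nom_sg)]),
])
ou_group = PassageGroup("ou-g1", [
    Passage("ou-1", "OU", "MS-C", [Token("u1", "u1", "lemma-y")]),
])
ou_extra = PassageGroup("ou-g2", [
    Passage("ou-2", "OU", "MS-C", [Token("u2", "u2", "lemma-z")]),
])
corpus = [Alignment(ta_group, ou_group), Alignment(None, ou_extra)]

# Concordance-style view: each hit together with its parallel passages.
hits = [(p.passage_id, [q.passage_id for q in parallels(al, "TA")])
        for p, al in search(corpus, "TA", lemma="lemma-x", features=nom_sg)]
```

In this sketch a search hit is returned as a whole passage together with its alignment, not as a fixed-width keyword window, which is one way to read the contrast the abstract draws with KWIC-based models. Allowing either side of an alignment to be empty is the sketch's way of representing omissions and additions.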
License
Copyright (c) 2026 M.В. Выжлаков

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

















