Corpus Palaeography: Machine Learning, Scribal Profiling and the Dating and Localisation of Manuscripts Containing Old English, c. 800–1200
Recent innovations in machine learning and digital typesetting open the way to a paradigm shift in philological data extraction, analysis and argumentation, in which texts are compared not on the basis of generalisation and exemplification, but on the basis of millions of individual data points. Using a Handwritten Text Recognition (HTR) model, trained on c. 800 pages (c. 250,000 words) of Old English to recognise a character inventory of almost 600 letter-forms and marks of punctuation with a character error rate of just 4.15%, we show the potential for a new corpus palaeography.
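For readers unfamiliar with the metric, character error rate is conventionally the edit distance between a model's transcription and a reference transcription, divided by the length of the reference. The sketch below illustrates that conventional definition only. How the 4.15% figure was actually computed is not stated here, and the function names and the sample strings are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning one string into the other."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: the model reads 'æ' as 'a' in a 15-character line.
reference = "hwæt we gardena"
hypothesis = "hwat we gardena"
cer = character_error_rate(reference, hypothesis)
print(f"CER = {cer:.2%}")
```

On this definition, a rate of 4.15% corresponds to roughly four character-level corrections per hundred characters of reference text. With an inventory of almost 600 distinct forms, a confusion between two allographs of the same letter counts as an error in the same way as a confusion between two different letters.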
This page was last updated on 29 April 2025