Event - this is a past event

SunoikisisDC: Preparing texts and data cleaning

Event information

Dates

This is a past event
Time
4:00 pm to 5:30 pm
Institute

Institute of Classical Studies

Event type

Research Training

Event series

Digital classics

Contact

Email only

Speakers: Jonathan Blaney (Cambridge University), Gabriel Bodard (University of London), Katharine Shields (King's College London) 

Sunoikisis Digital Classics session 2

This session follows on from the preceding one on sources of open texts, with a discussion of the importance of text cleaning and data preparation to any digital analysis or other process. We consider different processes that are more or less tolerant of "messy" texts, including poor OCR and similar artefacts, and highlight the importance of including a realistic level of text preparation in your project planning and budget. We look at a few options for cleaning and repairing large quantities of text, before offering a simple tutorial on regular expressions, which can be used to remove repetitive and predictable unwanted features across one or multiple texts, including at massive scale.
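
As a rough illustration of the kind of regular-expression cleanup described above, here is a minimal Python sketch. It is not taken from the session materials: the sample text, the patterns, and the function name clean_text are all hypothetical, and real OCR output will usually need patterns tuned to the specific source.

```python
import re

# Hypothetical OCR-style sample, invented for illustration only.
raw = (
    "THE HISTORY OF ROME   12\n"
    "The sol-\n"
    "diers   marched to the\n"
    "city.\n"
    "[Page 13]\n"
    "They  arrived at dawn."
)


def clean_text(text: str) -> str:
    """Remove a few predictable artefacts from a scanned text (illustrative patterns)."""
    # Drop lines that are only a bracketed page marker, e.g. "[Page 13]".
    text = re.sub(r"^\[Page \d+\]\n?", "", text, flags=re.MULTILINE)
    # Drop running headers: an all-capitals line ending in a page number.
    text = re.sub(r"^[A-Z][A-Z ]+\d+[ \t]*\n?", "", text, flags=re.MULTILINE)
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces or tabs into a single space.
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()


cleaned = clean_text(raw)
print(cleaned)
```

Because each pattern is applied to the whole string, the same function could in principle be run over many files in a loop. Note that the hyphen rule would also join genuinely hyphenated compounds that happen to break at a line end, so results should be spot-checked.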

Follow live or later at: https://youtu.be/Or-SaNznWz0 

Further information, readings and exercise at: https://github.com/SunoikisisDC/SunoikisisDC-2024-2025/wiki/2-Preparing-Texts

This page was last updated on 6 December 2024