Elites, Networks and Power in Modern China

Event Extraction from Facsimiles of Ancient Documents for History Studies
Baptiste Blouin
Doctoral Dissertation, Aix-Marseille
2022

In the era of massive digitization of historical sources, automatic event extraction is a crucial step in processing historical texts. Event processing is an active research area within the natural language processing community, but resources and systems are mainly developed for contemporary texts. In this context, this thesis aims to automatically extract events from historical documents. It first proposes interdisciplinary exchanges to adapt recent ontologies to the purposes of historical research. Beyond the specific needs of the digital humanities, OCR-processed historical documents that are over a century old differ greatly from the material that contemporary approaches are designed to handle. Whether in terms of diachrony, quality, or domain adaptation, processing such documents presents major challenges for natural language processing. We therefore propose domain adaptation techniques that combine recent specialized architectures with preprocessing steps, which help mitigate these difficulties while leveraging contemporary resources. Finally, based on a recent paradigm that frames tasks as question-answering problems, we propose an event extraction pipeline tailored to historical documents. From extracting the trigger word of an event in a sentence to representing over a century's worth of events as graphs, we propose a targeted exploration of a vast body of historical sources.
