Named Entity Resolution for Historical Texts
Author: Audrey Holmes
Total Pages: 34
Release: 2019
OCLC: 1129599622
Named Entity Resolution for Historical Texts was written by Audrey Holmes and released in 2019; it runs to 34 pages. Book excerpt: The rise of digital humanities has spurred more applications of computational linguistics to historical documents, but this area of research remains underdeveloped. Standard natural language processing (NLP) techniques developed on contemporary texts tend to perform poorly on historical documents because of challenges such as spelling variation, semantic shift, and the lack of a standard orthography. In this thesis, we compare the performance of common Named Entity Recognition (NER) libraries, including Stanford CoreNLP, spaCy, and Flair, on historical texts. We also present a method for named entity resolution designed specifically for historical texts, which combines domain-adapted word embeddings with phonetic and lexical similarities. This method has the potential to speed up the digitization of historical documents and improve search across historical corpora. The algorithm is one of the first trained on historical documents, and it improves upon common approaches to spelling normalization for historical documents that use only lexical and/or phonetic similarity. Additionally, we provide a user interface so that scholars without programming expertise can easily use the tools developed in this thesis. Future work will include linking historical named entities to contemporary references and constructing knowledge graphs for historical corpora.
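The excerpt describes combining word embeddings with phonetic and lexical similarity but gives no implementation details. The following is only a minimal sketch of that general idea, not the thesis's algorithm. It assumes Soundex as the phonetic measure, Python's difflib ratio as the lexical measure, and cosine similarity over word vectors. The function names, the weights, and the toy vectors are all hypothetical and chosen purely for illustration.

```python
from difflib import SequenceMatcher
from math import sqrt

# Standard American Soundex digit groups.
_SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Return the four-character Soundex code of a name ('' if no A-Z letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "HW":  # H and W do not separate letters with equal codes
            prev = digit
    return (code + "000")[:4]


def phonetic_similarity(a: str, b: str) -> float:
    """Fraction of matching positions between the two Soundex codes."""
    code_a, code_b = soundex(a), soundex(b)
    if not code_a or not code_b:
        return 0.0
    return sum(x == y for x, y in zip(code_a, code_b)) / 4


def lexical_similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def combined_similarity(a, b, embeddings=None, weights=(0.4, 0.3, 0.3)):
    """Weighted mix of lexical, phonetic and embedding similarity.

    The weights are an illustrative assumption. If either name has no
    vector, only the lexical and phonetic parts are used (renormalised).
    """
    w_lex, w_pho, w_emb = weights
    lex = lexical_similarity(a, b)
    pho = phonetic_similarity(a, b)
    if embeddings and a in embeddings and b in embeddings:
        # Negative cosine values are clipped so the score stays in [0, 1].
        emb = max(0.0, cosine(embeddings[a], embeddings[b]))
        return w_lex * lex + w_pho * pho + w_emb * emb
    return (w_lex * lex + w_pho * pho) / (w_lex + w_pho)


if __name__ == "__main__":
    # Toy vectors standing in for domain-adapted embeddings (made up).
    toy_vectors = {
        "Shakespeare": [0.9, 0.1, 0.3],
        "Shakspere": [0.8, 0.2, 0.3],
        "Marlowe": [0.1, 0.9, 0.2],
    }
    for candidate in ("Shakspere", "Marlowe"):
        score = combined_similarity("Shakespeare", candidate, toy_vectors)
        print(candidate, round(score, 3))
```

In a sketch like this, a spelling variant such as "Shakspere" scores higher against "Shakespeare" than an unrelated name does, because its lexical and phonetic scores are both higher. A real system would replace the toy vectors with embeddings trained or adapted on a historical corpus and tune the weights on labelled data.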