Lemmatization and grammatical annotation of the Corpus Histórico Judeoespañol (CORHIJE): problems, solutions and resolutions
Abstract
After a brief review of the most salient features of the Corpus Histórico Judeoespañol (CORHIJE), which was already presented at the third edition of the Congreso de Corpus Diacrónicos en Lenguas Iberorrománicas (CODILI, Zurich 2014), this paper describes the ongoing process of lemmatization and grammatical annotation of the corpus. We focus on the challenges encountered during annotation and the solutions applied to them, which in some cases have led us to adopt relatively arbitrary resolutions in line with our goals for description and analysis: the problems, solutions, and resolutions referred to in the title of our presentation.
Keywords
Digital Corpus Design, Linguistic Corpora, Judeo-Spanish, Diachrony
References
GARCÍA MORENO, Aitor, dir. (2008-2017): Diccionario Histórico del Judeoespañol (DHJE). http://www.esefardic.es/dhje.
GARCÍA MORENO, Aitor and F. Javier PUEYO MENA (2013-2017): Corpus Histórico Judeoespañol (CORHIJE). http://www.esefardic.es/corhije.
HASSÁN, Iacob (1978): «Transcripción normalizada de textos judeoespañoles», Estudios Sefardíes, 1, pp. 147-150.
PADRÓ, Lluís (2011): «Analizadores Multilingües en FreeLing», Linguamática, 3:2, pp. 13-20.
SÁNCHEZ-MARCO, Cristina, Gemma BOLEDA and Lluís PADRÓ (2011): «Extending the Tool, or How to Annotate Historical Language Varieties», in Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pp. 1-9.
Copyright (c) 2017 Aitor García Moreno, Francisco Javier Pueyo Mena
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.