Lemmatization and grammatical annotation of the Corpus Histórico Judeoespañol (CORHIJE): problems, solutions and resolutions

Authors

  • Aitor García Moreno ILC-CSIC / IUMP
  • Francisco Javier Pueyo Mena College of the Holy Cross (USA)

Abstract

After a brief review of the most salient features of the Corpus Histórico Judeoespañol (CORHIJE), which was presented at the third edition of the Congreso de Corpus Diacrónicos en Lenguas Iberorrománicas (CODILI, Zurich 2014), this paper describes the ongoing lemmatization and grammatical annotation of the corpus. We focus on the challenges encountered during annotation and the solutions applied to them. In some cases these solutions required relatively arbitrary decisions, made in line with our descriptive and analytical goals: hence the problems, solutions, and resolutions of the title.

Keywords

Digital Corpus Design, Linguistic Corpora, Judeo-Spanish, Diachrony

References

GARCÍA MORENO, Aitor, dir. (2008-2017): Diccionario Histórico del Judeoespañol (DHJE). http://www.esefardic.es/dhje.

GARCÍA MORENO, Aitor and F. Javier PUEYO MENA (2013-2017): Corpus Histórico Judeoespañol (CORHIJE). http://www.esefardic.es/corhije.

HASSÁN, Iacob (1978): «Transcripción normalizada de textos judeoespañoles», Estudios Sefardíes, 1, pp. 147-150.

PADRÓ, Lluís (2011): «Analizadores Multilingües en FreeLing», Linguamatica, 3:2, pp. 13-20.

SÁNCHEZ-MARCO, Cristina, Gemma BOLEDA and Lluís PADRÓ (2011): «Extending the Tool, or How to Annotate Historical Language Varieties», in Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pp. 1-9.

Published

15-10-2017
