Lemmatization and grammatical annotation of the Corpus Histórico Judeoespañol (CORHIJE): problems, solutions and resolutions


  • Aitor García Moreno ILC-CSIC / IUMP
  • Francisco Javier Pueyo Mena College of the Holy Cross (USA)


After a brief review of the most salient features of the Corpus Histórico Judeoespañol (CORHIJE), which was already presented at the third edition of the Congreso de Corpus Diacrónicos en lenguas Iberorrománicas (CODILI, Zurich 2014), this paper describes the ongoing process of lemmatization and grammatical annotation of the corpus. We focus on the challenges encountered during annotation and on the solutions applied to them. In some cases, these solutions have led us to adopt relatively arbitrary resolutions, chosen in accordance with our goals for description and analysis: hence the problems, solutions, and resolutions named in the title of our presentation.


Digital Corpus Design, Linguistic Corpora, Judeo-Spanish, Diachrony


GARCÍA MORENO, Aitor, dir. (2008-2017): Diccionario Histórico del Judeoespañol (DHJE). http://www.esefardic.es/dhje.

GARCÍA MORENO, Aitor and F. Javier PUEYO MENA (2013-2017): Corpus Histórico Judeoespañol (CORHIJE). http://www.esefardic.es/corhije.

HASSÁN, Iacob (1978): «Transcripción normalizada de textos judeoespañoles», Estudios Sefardíes, 1, pp. 147-150.

PADRÓ, Lluís (2011): «Analizadores Multilingües en FreeLing», Linguamatica, 3:2, pp. 13-20.

SÁNCHEZ-MARCO, Cristina, Gemma BOLEDA and Lluís PADRÓ (2011): «Extending the Tool, or How to Annotate Historical Language Varieties», in Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pp. 1-9.



