Cybernetics and Information Technologies, Vol. 5, No 1, 2005

INSTITUTE OF INFORMATION TECHNOLOGIES - BAS

Cybernetics and Information Technologies
Volume 5, No 1. Sofia, 2005, Bulgarian Academy of Sciences

Bulgarian MULTEXT-East Corpus – Structure and Content

Ludmila Dimitrova*, Radoslav Pavlov*, Kiril Simov**, Lydia Sinapova***

* Institute of Mathematics and Informatics, 1113 Sofia
** Institute for Parallel Processing, 1113 Sofia
*** Institute of Information Technologies, 1113 Sofia; Simpson College, IA, USA

Abstract: The first electronic corpus of the Bulgarian language is part of the MULTEXT-East (MTE) multilingual corpus, built within the MTE project COP 106. The Bulgarian corpus was developed within the MTE framework, following the methodology and requirements of the project.

Keywords: language resources, lexical databases, morpho-syntactic descriptors, annotated multilingual corpus.