Bulgarian MULTEXT-East Corpus – Structure and Content
Ludmila Dimitrova*, Radoslav Pavlov*, Kiril Simov**, Lydia Sinapova***
* Institute of Mathematics and Informatics, 1113 Sofia
** Institute for Parallel Processing, 1113 Sofia
*** Institute of Information Technologies, 1113 Sofia; Simpson College, IA, USA
Abstract:
The first electronic corpus of the Bulgarian language is included in the multilingual corpus of the MULTEXT-East (MTE) project, COP 106. The Bulgarian corpus was developed within the MTE framework, following the methodology and requirements of the project.
Keywords: language resources, lexical databases, morpho-syntactic descriptors, annotated multilingual corpus.