INSTITUTE OF INFORMATION TECHNOLOGIES - BAS

Cybernetics and Information Technologies
Volume 2, No 2. Sofia, 2002, Bulgarian Academy of Sciences


The Bulgarian Dictionary in Multilingual Lexical Data Bases

Ludmila Dimitrova*, Radoslav Pavlov**, Kiril Simov***

* Institute of Mathematics and Informatics, 1113 Sofia
** Institute of Information Technologies, 1113 Sofia
*** Central Laboratory for Parallel Processing, 1113 Sofia

Abstract:
This paper describes the preparation of Bulgarian lexical databases for the CONCEDE EC project, whose aim is to harmonise the methodology, tools and resources for building Lexical Data Bases (LDBs) in a general-purpose document-interchange format for six Central European languages: Bulgarian, Czech, Estonian, Hungarian, Romanian and Slovene. Selecting the words on the basis of their frequency in naturally occurring text (Orwell's 1984) ensures that the project produces lexical databases that are useful for real applications.

Keywords: dictionary encoding, lexical databases, document type definition