The NEMLAR project and its results
The NEMLAR (Network for Euro-Mediterranean Language Resources) project was started to help pave the way for collaborative work on Arabic language resources in the Mediterranean area. The project was supported by the European Union under the INCO-MED programme, which supports collaboration between the EU and the countries of the Mediterranean region. The project ran from 2003 to 2005 and had 14 partners from Egypt, Jordan, Lebanon, Morocco, Tunisia, the West Bank & Gaza Strip, Denmark, France, Greece and the Netherlands.
The results of the NEMLAR project can be summarised as follows.
People and institutions:
Network: First of all, the NEMLAR core network of 14 partners has proved very well suited to the task. Together, the partners cover the field of human language technology (HLT) and language resources comprehensively, and the Mediterranean region is also well covered geographically. However, to promote the NEMLAR ideas and to give people access to information about Arabic language technology and sharable Arabic language resources, the NEMLAR project has extended its network. One important extension is geographical: the Arabic language is very important outside the Mediterranean region as well, so members from other regions of the world are welcome.
Information and documents:
The project partners have produced two surveys. The first, "Report on Survey on Arabic Resources and Tools in Mediterranean Countries", gives an overview of existing Arabic language resources (LRs) and tools in the region. As a by-product of this survey, a list of institutions and companies involved in the production and distribution of LRs and tools has been compiled. The second is the "Survey on the Industrial Needs for Language Resources and Tools in the Mediterranean Countries". The needs of industry are important for setting priorities in the development of LRs. The two surveys and the list of institutions and companies may be extended in scope and coverage through new members of the network and through promotion at conferences, in newsletters, etc.
BLARK for Arabic: A BLARK (Basic Language Resource Kit) describes the minimal set of language resources necessary for developing pre-competitive HLT for a language. The NEMLAR project has produced the first BLARK for Arabic. Taken together with the survey of existing LRs, it is a very good starting point for setting priorities for the development of LRs and tools.
NEMLAR has also developed a small number of language resources, choosing the most important ones on the basis of the BLARK and the needs expressed by industry and research:
Written corpus: An annotated written corpus of Modern Standard Arabic, fully vowelized and POS-tagged (approx. 500K words).
Speech database: A speech database for text-to-speech synthesis, with a male and a female voice, based on a well-designed text corpus of Modern Standard Arabic.
Speech database: An Arabic speech database of broadcast news, fully annotated at various levels (orthographic, named entities, etc.).
Dissemination and meeting places:
Conference: NEMLAR held the first Arabic Language Resources and Tools Conference in Cairo in 2004. It brought together academics and industry from all over the world to discuss issues in Arabic HLT.