Welcome to the NEMLAR newsletter. We bring you the latest news on language resources and language technologies for Arabic in Europe and the Southern Mediterranean countries, and keep you abreast of the results achieved in the project, upcoming events, and other useful information.

A version of this newsletter is also available from: http://www.nemlar.org/Newsletter

To subscribe, please send an email to: nemlar@cst.dk

If you find this newsletter useful and informative, feel free to forward it to others. The newsletter will appear every quarter. Please send any feedback you may have to: nemlar@cst.dk

NEMLAR is a European Commission supported initiative dedicated to surveying the state of the art of language resources for the Arabic language and the needs for such resources, to providing a BLARK specification for Arabic, and to promoting the development of Arabic language resources in Europe and the Southern Mediterranean countries.

---------------------------------------------------------------------

Newsletter Content
------------------
1. NEMLAR project is coming to an end
2. Proceedings from 'Arabic Language Resources and Tools' conference, 22-23 September 2004, Cairo, Egypt
3. New language resources, books, papers, software and journals
4. Upcoming Events
5. Links

1. NEMLAR project is coming to an end
*************************************

The NEMLAR project was started in order to help pave the way for a collaborative effort on Arabic language resources in the Mediterranean area. The project has been running from 2003 to 2005. Now, as NEMLAR is coming to an end, we can summarise the results as follows.

People and institutions:

* Network: First of all, the NEMLAR core network, consisting of the 14 partners, has proven to be very well suited for the task. Together the partners cover the important areas of HLT and language resources in a very comprehensive way. Geographically, too, the Mediterranean region is well covered.
However, in order to promote the NEMLAR ideas and to give more people access to information about Arabic language technology and shareable Arabic language resources, the NEMLAR project has extended its network. One of the important extensions is the regional one: the Arabic language is very important outside the Mediterranean region as well. Further extension of the network, possibly in a different form, will be pursued with a view to creating an international community.

Information and documents:

* Surveys: Two surveys have been produced by the project partners. "Report on Survey on Arabic Language Resources and Tools in Mediterranean Countries" gives an overview of existing Arabic LRs and tools in the region. As a derivative of this survey, a list of institutions and companies involved in the production and distribution of LRs and tools has been compiled. The second survey is "Survey on the Industrial Needs for Language Resources and Tools in Mediterranean Countries". The needs of industry are important for setting priorities in the development of LRs. The two surveys and the list of institutions and companies may be extended in scope and coverage, through new members of the network and through promotion at conferences, in newsletters, etc.

* BLARK for Arabic: A BLARK (Basic Language Resource Kit) describes the minimal set of language resources that are necessary for developing pre-competitive HLT for a language. The NEMLAR project has elaborated the first BLARK for Arabic. This is a very good starting point for deciding on the development of LRs and tools. We hope that the BLARK document will continue to develop in the future.

Language resources:

* NEMLAR has been able to develop a few language resources, and has chosen the most important ones, based on the BLARK and the needs expressed by industry and research. The resources are:
a) an annotated and unannotated written corpus of Modern Standard Arabic, fully vowelized (approx.
500K words);
b) an audio/speech database for speech synthesis, with a male and a female voice and a well-designed textual corpus of Modern Standard Arabic;
c) an Arabic database of broadcast news, fully annotated at various levels (orthographically, named entities, ...).
The resources will be available through ELRA. Even though these resources have been developed, the work on developing basic language resources and tools has to continue as a collaborative effort.

Dissemination, meeting place:

* Conference: NEMLAR held the first Arabic Language Resources and Tools Conference in Cairo in 2004. It brought together academics and industry from all over the world to discuss issues in Arabic HLT. Plans are in progress to follow up on this success, hopefully in 2006.

2. Proceedings from 'Arabic Language Resources and Tools' conference, 22-23 September 2004, Cairo, Egypt
****************************************************************************

You may purchase from CST the printed proceedings and the proceedings on CD from the 'Arabic Language Resources and Tools' conference, 22-23 September 2004, Cairo, Egypt. For more information please see http://nemlar.org/proceedings.html

3. New language resources, books, papers, software and journals
***************************************************************

Books:

• Beyond Morphology: Interface Conditions on Word Formation, by Peter Ackema and Ad Neeleman, Oxford Studies in Theoretical Linguistics No. 6. For more information see http://www.oup.co.uk/isbn/0-19-926729-4
• Eastern Arabic with MP3 Files, by Frank A. Rice and Majed F. Sa'id. For more information see http://www.press.georgetown.edu/
• A Dictionary of Moroccan Arabic - Moroccan-English/English-Moroccan, from Georgetown University Press. For more information see http://linguistlist.org/issues/15/15-3473.html
• A Short Reference Grammar of Moroccan Arabic. For more information see http://linguistlist.org/issues/15/15-1365.html

Software:

• Software at Pertinence.
For more information see http://www.pertinence.net/pin/index.jsp?ui.lang=ar

Visit the NEMLAR web site for more information: http://www.nemlar.org

4. Upcoming Events
******************

• Exploring Syntactically Annotated Corpora, July 14th, in connection with CORPUS LINGUISTICS 2005, 14-17 July 2005, University of Birmingham. For more information see http://www.bultreebank.org/ESyntAC/
• IJCAI 2005, Workshop on Grammatical Inference Applications: Successes and Future Challenges. Held prior to IJCAI-05, July 31, 2005, Edinburgh, Scotland. For more information see http://www.ics.mq.edu.au/~menno/IJCAI05/
• EUROLAN 2005, Babes-Bolyai University, Cluj-Napoca, Romania, July 25 - August 6, 2005. For more information see http://www.cs.ubbcluj.ro/eurolan2005/
• ESSLLI 2005, 17th European Summer School in Logic, Language and Information, August 8-19, 2005, Heriot-Watt University, Edinburgh, Scotland. For more information see http://www.macs.hw.ac.uk/esslli05/
• CIC-2005, International Conference on Computer Science, September 5-9, 2005, Mexico City, Mexico. For more information see http://www.cic.ipn.mx/conf/cic/2005/
• MT Summit, Phuket, Thailand, 13-15 September 2005. For more information see http://www.tcllab.org/Pages/mtsummit.html
• The Fifth Conference on Language Engineering, September 14-15, 2005, Cairo, Egypt. For more information see http://ntserver.asueng.eun.eg/esle
• EDOC, International Workshop on Vocabularies, Ontologies and Rules for The Enterprise (VORTE 2005), September 20, 2005, Enschede, The Netherlands. For more information see http://wwwhome.cs.utwente.nl/~guizzard/VORTE05/
• SPECOM 2005, 10th International Conference on Speech and Computer, October 17-19, 2005, University of Patras, Patras, Greece. For more information see http://www.wcl.ee.upatras.gr/specom2005.htm
• LULCL 2005, Lesser Used Languages and Computer Linguistics, 27-28 October 2005, Eurac research, Bolzano.
For more information see http://www.eurac.edu/Org/LanguageLaw/Multilingualism/Projects/Conference2005.htm
• WORM 2005, October 31 - November 4, 2005, Larnaca, Cyprus. For more information see http://www.starlab.vub.ac.be/staff/mustafa/WORM_2005.htm
• Linguistic Information Integration in Arabic Character and Text Recognition, November 5-7, 2005, Tozeur, Tunisia. For more information see http://www.nemlar.org/Events/CFP.pdf
• Translating and the Computer 27, November 24-25, 2005, London. For more information see http://www.aslib.com/training/conferences/index.htm
• AUCOXF Conf on Languages and Linguistics, March 24-25, 2006. For more information see http://www.aucegypt.edu/conferences/aucoxf/

Visit the NEMLAR web site for more information: http://www.nemlar.org/Events

5. Links
********

• ELRA distributes Arabic language resources: http://www.elra.info
• Linguistic Data Consortium (LDC) distributes Arabic language resources: http://www.ldc.upenn.edu
• Link to Arabic NLP technologies at RDI (to be found under the submenu item 'Arabic NLP' under the main menu item 'Technologies'): http://www.RDI-eg.com
• The Faharis Site, list of Arabic web resources: http://www.faharis.net
• Latifa Al-Sulaiti's homepage with collections of tools and resources for Arabic: http://www.comp.leeds.ac.uk/latifa/survey.htm
• Link to free morphological analyzers for the Arabic language: http://www.glue.umd.edu/~kareem/research/
• Visit the Linguist List related to Arabic language: http://listserv.linguistlist.org/archives/arabic-l.html
• List of pointers to Arabic and other Semitic NLP and Speech sites: http://www.elsnet.org/arabiclist.html
• Lists of websites with theses dealing with Arabic human language technologies:
http://www.biomath.jussieu.fr/ATALA/these/#Idx3
http://www.technolangue.net/rubrique.php3?id_rubrique=11
• Links to Arabic processing tools at the University of Leeds: http://www.comp.leeds.ac.uk/latifa/survey.htm
• Links to Arabic tools, resources, conferences, etc:
http://www.mghamdi.com/links.htm

----------------------------------------------------------------------

This newsletter is published by the NEMLAR project (http://www.nemlar.org) and produced by Center for Sprogteknologi. Since the project will end on July 31st 2005, this issue will be the last newsletter from the project. However, the partners behind the project will investigate the possibility of continuing a newsletter in the future.

To contact the project co-ordinator:
Center for Sprogteknologi (CST), University of Copenhagen
Project Co-ordinator: Bente Maegaard
Tel: +45 35 32 90 74, Fax: +45 35 32 90 89
email: nemlar@cst.dk