Annotated corpora

Availability: 3 = existent but only company-internal; 2 = existent and freely usable for PreR&D; 1 = existent and freely usable for both PreR&D and R&D.
Cost: 4 = more than 10,000 €; 3 = 1,000 - 10,000 €; 2 = 100 - 1,000 €; 1 = less than 100 € or free.
Manipulability: 3 = black box; 2 = glass box (you can see it but not change it); 1 = freely manipulable.

R means for research use; C means for commercial use.
For availability = 3 (company-internal), the other features are irrelevant.
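
The rating codes in the last table column (for example "1,4,1" or just "3") can be decoded mechanically. The following is a minimal sketch in Python, assuming the order availability, cost, manipulability given in the column header; the names decode_rating, AVAILABILITY, COST and MANIPULABILITY are hypothetical and not part of the original page.

```python
# Hypothetical lookup tables built from the legend on this page.
AVAILABILITY = {
    1: "existent and freely usable for both PreR&D and R&D",
    2: "existent and freely usable for PreR&D",
    3: "existent but only company-internal",
}
COST = {
    1: "less than 100 € or free",
    2: "100 - 1,000 €",
    3: "1,000 - 10,000 €",
    4: "more than 10,000 €",
}
MANIPULABILITY = {
    1: "freely manipulable",
    2: "glass box (you can see but not change it)",
    3: "black box",
}


def decode_rating(code: str) -> dict:
    """Decode a rating such as "1,4,1" into readable labels.

    Assumed order: availability, cost, manipulability.
    """
    parts = [int(p) for p in code.split(",") if p.strip()]
    if not parts:
        raise ValueError("empty rating code")

    availability = parts[0]
    result = {
        "availability": AVAILABILITY[availability],
        "cost": None,
        "manipulability": None,
    }

    # Company-internal resources: the other features are irrelevant.
    if availability == 3:
        return result

    if len(parts) != 3:
        raise ValueError(f"expected three values, got {len(parts)}")

    result["cost"] = COST[parts[1]]
    result["manipulability"] = MANIPULABILITY[parts[2]]
    return result
```

Calling decode_rating("1,4,1") yields the labels for a freely usable, freely manipulable resource costing more than 10,000 €, while decode_rating("3") returns only the availability label and leaves the other two fields empty.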

Name of corpus | Provider | Size | Other information | Availability, cost, manip.
Arabic morphologically analyzed, PoS-tagged and vowelized corpus | RDI | 750K words | Multi-domain balanced coverage: literature, business, science, sport, politics, etc. | 1, 4, 1
Manually PoS- and sense-tagged Arabic collocates | Sakhr | 2 million words | | 3
Monolingual Arabic PoS-tagged corpus | Sakhr | 7 million words | Manually tagged for PoS, case, endings and named entities | 3

MEDAR is supported by the European Commission's ICT programme and runs from February 1st, 2008 until July 31st, 2010.