

Multimodal corpora
for handwritten or typed OCR


Availability: 3 = existent but only company-internal, 2 = existent and freely usable for PreR&D, 1 = existent and freely usable for both PreR&D and R&D.
Price: 4 = more than 10,000 €, 3 = 1,000 - 10,000 €, 2 = 100 - 1,000 €, 1 = less than 100 € or free.
Manipulability: 3 = black box, 2 = glass box (you can see but not change it), 1 = freely manipulable.

R means for research use, C means for commercial use.
For availability = 3 (company-internal), the other features are irrelevant.
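
As an illustration only, the rating codes used in the table (for example "2,1,1") could be decoded along the following lines. This is a minimal sketch: the function name decode_rating and the dictionary names are hypothetical, and it assumes the three values appear in the order availability, price, manipulability.

```python
# Hypothetical lookup tables built from the legend; the names are illustrative.
AVAILABILITY = {
    3: "existent but only company-internal",
    2: "existent and freely usable for PreR&D",
    1: "existent and freely usable for both PreR&D and R&D",
}
PRICE = {
    4: "more than 10,000 €",
    3: "1,000 - 10,000 €",
    2: "100 - 1,000 €",
    1: "less than 100 € or free",
}
MANIPULABILITY = {
    3: "black box",
    2: "glass box (you can see but not change it)",
    1: "freely manipulable",
}


def decode_rating(code: str) -> dict:
    """Decode a code such as "2,1,1" (assumed order: availability, price, manip.)."""
    parts = [int(p) for p in code.split(",")]
    availability = parts[0]
    decoded = {"availability": AVAILABILITY[availability]}
    # For company-internal resources the other features are irrelevant.
    if availability == 3:
        return decoded
    if len(parts) != 3:
        raise ValueError("expected three comma-separated values: " + code)
    decoded["price"] = PRICE[parts[1]]
    decoded["manipulability"] = MANIPULABILITY[parts[2]]
    return decoded
```

Under these assumptions, decode_rating("2,1,1") describes a resource that is freely usable for PreR&D, costs less than 100 € or is free, and is freely manipulable, while decode_rating("3") reports only the company-internal availability.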

Name of Resource | Provider | Size | Other information | Availability, price, manip.
IFN/ENIT | IFN/ENIT | | Handwritten scanned pages | 2,1,1
Training corpus of Arabic typed written OCR | RDI | 1,200 pages of A4 scanned at 300 and 600 dpi | Covering the 20 most famous Arabic fonts under Mac and MS-Windows. See description here | 
Arabic/Farsi font library | Sakhr | 26 fonts | | 3
Arabic Omni Data | Sakhr | | Arabic script - OMNI data trained for the feature space of Arabic characters covering both Naskh and Kofi font families | 3

MEDAR is supported by the European Commission's ICT programme and runs from February 1st 2008 until July 31st 2010.
