Contact:
nemlar@hum.ku.dk


Spoken Resources

Availability:
3 existent but only company-internal, 2 existent and freely usable for PreR&D, 1 existent and freely usable for both PreR&D and R&D.
Cost:
4 > 10,000 €, 3 1,000 - 10,000 €, 2 100 - 1,000 €, 1 < 100 € or free
Adaptability:
3 black box, 2 glass box (you can see it but not change it), 1 freely manipulable

R means for research; C means for commercial use.
For availability = 3 (company-internal), the other features are irrelevant.
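
The rating codes in the last column of the tables can be read mechanically. The following is a minimal sketch, assuming each code is a comma-separated string in the order availability, cost, adaptability, and that trailing values may be missing (as in entries such as "1,4" or "3"). The names `decode_rating` and `Rating` are hypothetical and only illustrate the scheme described above.

```python
from dataclasses import dataclass
from typing import Optional

# Scales copied from the legend above.
AVAILABILITY = {
    1: "existent and freely usable for both PreR&D and R&D",
    2: "existent and freely usable for PreR&D",
    3: "existent but only company-internal",
}
COST = {
    1: "< 100 € or free",
    2: "100 - 1,000 €",
    3: "1,000 - 10,000 €",
    4: "> 10,000 €",
}
ADAPTABILITY = {
    1: "freely manipulable",
    2: "glass box",
    3: "black box",
}


@dataclass
class Rating:
    availability: str
    cost: Optional[str] = None
    adaptability: Optional[str] = None


def decode_rating(code: str) -> Rating:
    """Decode a code such as "1,2,1" (hypothetical helper, not part of the catalog)."""
    parts = [int(p) for p in code.split(",") if p.strip()]
    if not parts:
        raise ValueError("empty rating code")
    availability = parts[0]
    # For availability = 3 (company-internal) the other features are irrelevant.
    if availability == 3:
        return Rating(AVAILABILITY[3])
    # Assumption: a missing second or third value means "not stated".
    cost = COST[parts[1]] if len(parts) > 1 else None
    adaptability = ADAPTABILITY[parts[2]] if len(parts) > 2 else None
    return Rating(AVAILABILITY[availability], cost, adaptability)
```

For example, `decode_rating("1,2,1")` describes a resource that is freely usable for both PreR&D and R&D, costs 100 - 1,000 €, and is freely manipulable, while `decode_rating("3")` carries only the availability.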

Acoustic Data
Name of Resource Provider Size Other information Availability, cost, manip.
SpeechDat-like database UOB/ENS   More than 100 speakers French/Arabic, for speech recognition, Lebanese/Syrian/French 1,1,1
Arabic digits UOB   For speech recognition, Lebanese accent 1,1,1
Speech database in 4 languages LibanCell 10,000 announcements with 10 words/announcement   3
Labelled database for TTS Millenium     3
Arabic broadcast news speech corpus (BNSC) ELRA/LDC More than 20 hours of transcribed Arabic news in Modern Standard Arabic. Domain: news 1,2,1
Arabic acoustic corpus mono-speaker Benabbou, Morocco     3
Arabic Phonetic database King Abdulaziz City for Science and Technology   Lang: En-Ar 3
Holy Qur’an multi-speaker RDI 60 hours   1,4,1
Single male speaker concatenative Arabic TTS database RDI 1 hour, 1,300 sentences   1,3,1
Single female speaker concatenative Arabic TTS database RDI 4 hours, 3,000 sentences   1,3,1
Arabic concatenative TTS male recording Sakhr MSA 3 hours   3
Arabic ASR recording db Sakhr 56 hours of MSA and Colloquial Arabic   3
Human Names Language Model Sakhr 500,000 Names Egyptian and Saudi human names corpus 3
Arabic Acoustic Model Sakhr     3
CALLHOME Egyptian Arabic Speech LDC 120 Egyptian Colloquial Arabic telephone conversations Calls lasting up to 30 minutes 1,2,1
CALLFRIEND Egyptian Arabic LDC 60 telephone conversations between native speakers of the Egyptian dialect of Arabic Calls lasted between 5 and 30 minutes. Includes documentation. All calls are domestic. 1,2,1
CALLHOME Egyptian Arabic Speech Supplement LDC 20 telephone conversations. Transcripts for 120 Egyptian Colloquial Arabic telephone conversations. 273,681,144 bytes (261 Mbytes) or 8 hours of audio data. 20 data files in SPHERE format, 8 kHz shorten-compressed 2-channel mu-law. 1,1,1
GlobalPhone Arabic ELRA About 100 adult native speakers were asked to read 100 sentences. The GlobalPhone corpus provides transcribed speech data for the development and evaluation of large vocabulary continuous speech recognition systems. 1,3
OrienTel United Arab Emirates MSA ELRA 500 speakers (254 males, 246 females) Recorded over the local fixed and mobile telephone network. 1,4
OrienTel Arabic as spoken in Israel ELRA 750 Arabic speakers (375 males, 375 females) Recorded over the Israeli fixed and mobile telephone network. 1,4
OrienTel Jordan MCA ELRA 757 Jordanian speakers (393 males, 364 females) Recorded over the Jordanian fixed and mobile telephone network. 1,4
OrienTel Jordan MSA ELRA 556 Jordanian speakers (288 males, 268 females) Recorded over the Jordanian fixed and mobile telephone network. 1,4
OrienTel Egypt MCA ELRA 750 Egyptian speakers (398 males, 352 females) Recorded over the Egyptian fixed and mobile telephone network. 1,4
OrienTel Egypt MSA ELRA 500 Egyptian speakers (254 males, 246 females) Recorded over the Egyptian fixed and mobile telephone network. 1,4
OrienTel Morocco MCA ELRA 772 Moroccan speakers (383 males, 389 females) Recorded over the Moroccan fixed and mobile telephone network. 1,4
OrienTel Morocco MSA ELRA 530 Moroccan speakers (264 males, 266 females) Recorded over the Moroccan fixed and mobile telephone network. 1,4
OrienTel Tunisia MCA ELRA 792 Tunisian speakers (426 males, 366 females) Recorded over the Tunisian fixed and mobile telephone network. 1,4
OrienTel Tunisia MSA ELRA 598 Tunisian speakers (359 males, 239 females) Recorded over the Tunisian fixed and mobile telephone network. 1,4
OrienTel United Arab Emirates MCA ELRA 880 speakers (432 males, 448 females) Recorded over the local fixed and mobile telephone network. 1,4
Arabic Broadcast news LDC Recordings from several Arabic radio channels $700
The Corpus of Spoken Palestinian Arabic (CoSPA) University of Haifa, Israel Between 1996 and 1998, 200 hours of recorded speech were collected. The aim is to collect data that would cover the whole linguistic area of Palestinian Arabic. -
KACST Arabic Phonetics Database KACST, Saudi Arabia The database contains more than 46,000 files. The KAPD is a detailed and comprehensive database that shows the articulatory mechanism of Arabic sounds. KAPD is available on 3 CDs for researchers and students of speech.
Saudi Accented Arabic Voice Bank KACST, Saudi Arabia 1,033 native speakers Saudi-accented Arabic telephone speech database Can be licensed for use in research or product development once a contract with KACST is signed.


Written Corpus for Speech Technologies
Name of Resource Provider Size Other information Availability, cost, manip.
Corpus for di-syllables Abdelhak Mouradi, Noureddine Chenfour   Domain: text-to-speech 1,2,1
CALLHOME Egyptian Arabic Transcripts LDC Contiguous 5 or 10 minute segments taken from 120 unscripted telephone conversations The transcripts are timestamped by speaker turn for alignment with the speech signal and are provided in standard orthography. 1,2,1


Phonetic Lexicon
Name of Resource Provider Size Other information Availability, cost, manip.
Special pronunciations dictionary Sakhr 20,000 entries Dictionary for handling pronunciation anomalies 3
Name master dictionary Sakhr 100,000 Names   3
LC-STAR Standard Arabic Phonetic lexicon ELRA 110,271 entries 52,981 common word entries, 50,135 proper names, 7,155 special application words. 1,4


MEDAR is supported by the European Commission's ICT programme and is running from February 1st 2008 until July 31st 2010.