CMU-CS-QTR-124 Computer Science Qatar School of Computer Science, Carnegie Mellon University
The Qatar Arabic Language Bank Guidelines
Wajdi Zaghouani*, Nizar Habash**, Behrang Mohit*
September 2014
The Qatar Arabic Language Bank (QALB) is a corpus of Arabic text with manual corrections. The Arabic text comes from three sources: native speakers, non-native speakers, and machine translation (into Arabic). The corpus consists mainly of Modern Standard Arabic (MSA) texts, although some dialectal Arabic usage may occur. The annotation has two goals: to provide training data for learning-based Arabic error correction tools, and to provide a gold standard for evaluating error correction algorithms. This document serves as the reference guidelines for text correction in the QALB project.
115 pages