Computer Science Qatar
School of Computer Science, Carnegie Mellon University


The Qatar Arabic Language Bank Guidelines

Wajdi Zaghouani*, Nizar Habash**, Behrang Mohit*

September 2014


Keywords: Annotation, Guidelines, Arabic, Errors, Corpus

The Qatar Arabic Language Bank (QALB) is a corpus of Arabic text with manual corrections. The Arabic text comes from three sources: native speakers, non-native speakers, and machine translation (into Arabic). The corpus consists mainly of Modern Standard Arabic (MSA) text, but some dialectal Arabic usage may occur. The annotation has two goals: to provide training data for learning-based Arabic error correction tools, and to provide a gold standard for evaluating error correction algorithms. This document provides the reference guidelines for text correction in the QALB project.

115 pages

*Carnegie Mellon University Qatar Campus
**New York University Abu Dhabi
