CERCLL Project Close-Up: The L2 Written Arabic Corpus

Project Director: Dr. Samira Farwaneh, Associate Professor, Arabic Language and Linguistics
Project Assistant: Mohammed Tamimi, SLAT Ph.D. Candidate

The L2 (or interlanguage) written Arabic corpus project is a gradually expanding database of writing samples produced by L2 and heritage students studying Arabic as a second or foreign language. Most essays in the collection were initially handwritten, then typed by one assistant and proofread by another to ensure that the typist had not unconsciously corrected errors in the originals.

The typed essays are now housed in a searchable database, where they are tagged by learner level (beginning, intermediate, or advanced), learner type (L2 vs. heritage), and genre (description, narration, or instruction). The database now features nearly 300 essays, most from second-, third-, and fourth-year Arabic students. Roughly one-fifth of the essays were written by heritage students with some background in a regional variety of Arabic; the rest were produced by L2 learners. Essays are also categorized by the conditions under which they were written: “reflective” essays were written at home as homework assignments, while “spontaneous” essays were written during in-class exams.
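The tagging scheme described above can be pictured as a small record type plus a filter. The following Python sketch is only an illustration of that idea and is not the project's actual database or search interface. The `Essay` class, the `filter_essays` function, the field names (including `condition` for reflective vs. spontaneous), and the three sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Essay:
    """Hypothetical record for one essay; field names are assumptions."""
    essay_id: str
    level: str         # "beginning", "intermediate", or "advanced"
    learner_type: str  # "L2" or "heritage"
    genre: str         # "description", "narration", or "instruction"
    condition: str     # "reflective" (homework) or "spontaneous" (in-class exam)
    text: str


def filter_essays(
    essays: Iterable[Essay],
    *,
    level: Optional[str] = None,
    learner_type: Optional[str] = None,
    genre: Optional[str] = None,
    condition: Optional[str] = None,
) -> List[Essay]:
    """Return essays matching every tag that was supplied (None = any value)."""
    wanted = {
        "level": level,
        "learner_type": learner_type,
        "genre": genre,
        "condition": condition,
    }
    return [
        essay
        for essay in essays
        if all(value is None or getattr(essay, name) == value
               for name, value in wanted.items())
    ]


# Made-up placeholder records, not data from the corpus.
SAMPLE = [
    Essay("e1", "intermediate", "L2", "narration", "spontaneous", "..."),
    Essay("e2", "advanced", "heritage", "description", "reflective", "..."),
    Essay("e3", "intermediate", "L2", "description", "reflective", "..."),
]

intermediate_l2 = filter_essays(SAMPLE, level="intermediate", learner_type="L2")
print([essay.essay_id for essay in intermediate_l2])
```

Leaving a tag unset matches any value, so a researcher could, for instance, compare all heritage essays with all L2 essays regardless of genre or level.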

The complete database is freely accessible online at http://l2arabiccorpus.cercll.arizona.edu.

One of the many challenges of foreign language instruction is that syllabi and textbooks are designed according to native speakers' intuitions, which may not be reflected in interlanguage grammars. For example, the preliminary data collected so far show that learners of Arabic correlate stress with length, and this correlation surfaces as orthographic errors involving the inaccurate insertion or omission of long vowels. In Arabic, however, length and stress are interrelated but independent suprasegmental features: a vowel may be long but not stressed, or it may be stressed but not long. Teachers rarely emphasize stress and length in class activities and examinations, focusing their attention instead on grammatical mood and case endings, which are often not overtly marked in authentic texts.

The corpus will serve as a significant source of empirical data for hypothesis testing in second language acquisition research. It will also be a resource for syllabus design, textbook development and assessment, dictionary design, and teaching methodology for Arabic instructors.

The first public presentations introducing the project were given at the spring 2010 Western Consortium of Middle Eastern Languages Workshop held in Tucson and at the Georgetown Roundtable on Arabic Linguistics and Pedagogy. This past spring, the project was also introduced at the NCOLCTL annual conference. It received positive and constructive feedback at these events.