

The Shiraz machine translation system is an MT prototype that translates Persian text into English. The project began in October 1997, and the final version was delivered in August 1999. The system uses typed feature structures and an underlying unification-based formalism to describe Persian linguistic phenomena. It runs on both Unix and Windows machines. The Shiraz system uses an electronic bilingual Persian-to-English dictionary of approximately 50,000 terms, a complete morphological analyzer, and a syntactic parser. These components were tested on a bilingual tagged corpus developed from a large Persian corpus of online material (approximately 10 MB). The system is mainly targeted at translating news material.
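To illustrate what unification over feature structures means, here is a minimal sketch in Python. It is not the Shiraz formalism: it represents feature structures as plain nested dictionaries, leaves out types and shared (reentrant) values, and the function name `unify` and the feature names `cat`, `agr`, `num` and `person` are assumptions made for this example only.

```python
from typing import Any, Optional

# Assumption: a feature structure is a nested dict whose leaves are atomic strings.
FeatureStructure = dict


def unify(a: Any, b: Any) -> Optional[Any]:
    """Return the unification of two feature structures, or None if they conflict."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy so the inputs are not modified
        for key, b_value in b.items():
            if key in result:
                merged = unify(result[key], b_value)
                if merged is None:
                    return None  # conflicting values under the same feature
                result[key] = merged
            else:
                result[key] = b_value  # feature present only in b
        return result
    # Atomic values (or an atom against a structure) unify only if they are equal.
    return a if a == b else None


noun = {"cat": "N", "agr": {"num": "sg"}}
third_singular = {"agr": {"num": "sg", "person": "3"}}
plural = {"agr": {"num": "pl"}}

print(unify(noun, third_singular))  # {'cat': 'N', 'agr': {'num': 'sg', 'person': '3'}}
print(unify(noun, plural))          # None
```

Unification succeeds when the two descriptions are compatible and merges their information; it fails when they assign different values to the same feature. A typed formalism additionally checks that the types of the two structures are compatible, which this sketch does not attempt.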

The dictionary was built by a team of Persian lexicographers and includes single words, compounds and phrasal expressions. For each lexical item it records the orthography, morphosyntactic category and syntactic properties, as well as the English word-sense equivalents.
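The kinds of information listed above could be modelled roughly as follows. This is a hypothetical sketch, not the actual Shiraz dictionary format: the class name `LexicalEntry`, its field names, and the helper `build_index` are assumptions, and the two sample entries are ordinary Persian words chosen for illustration rather than entries taken from the project's dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    """One dictionary entry (hypothetical layout, for illustration only)."""
    orthography: str                                           # Persian spelling
    category: str                                              # morphosyntactic category
    syntactic_properties: dict = field(default_factory=dict)   # e.g. countability
    english_equivalents: list = field(default_factory=list)    # word-sense translations


def build_index(entries):
    """Map each orthographic form to every entry that shares that spelling."""
    index = {}
    for entry in entries:
        index.setdefault(entry.orthography, []).append(entry)
    return index


sample_entries = [
    LexicalEntry("کتاب", "noun", {"countable": "yes"}, ["book"]),
    LexicalEntry("کتابخانه", "noun", {"countable": "yes"}, ["library"]),
]

index = build_index(sample_entries)
print(index["کتاب"][0].english_equivalents)  # ['book']
```

Indexing by orthography and storing a list per key leaves room for homographs, where one spelling maps to several entries with different categories or senses.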

The current system performs tokenization and full morphological analysis, and it also recognizes compounds and light verbs. The syntactic parser can analyze noun phrases (including relative clauses), prepositional phrases and basic sentential constructions.
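As a rough picture of the first two stages, the sketch below tokenizes on whitespace and then tries to analyze each token either as a known stem or as a known stem followed by the Persian plural suffix ها. It is a toy under stated assumptions, far simpler than a full morphological analyzer: the function names `tokenize` and `analyze`, the output layout, and the one-word lexicon are all hypothetical, and only this single suffix is handled.

```python
ZWNJ = "\u200c"        # zero-width non-joiner, often written before the suffix
PLURAL_SUFFIX = "ها"   # Persian plural marker


def tokenize(text):
    """Split text on whitespace (assumption: punctuation is already separated)."""
    return text.split()


def analyze(token, lexicon):
    """Return a small analysis dict for a token, or None if it is not recognized."""
    if token in lexicon:
        return {"stem": token, "plural": False}
    if token.endswith(PLURAL_SUFFIX):
        # Strip the suffix, then any joiner character left at the end of the stem.
        stem = token[: -len(PLURAL_SUFFIX)].rstrip(ZWNJ)
        if stem in lexicon:
            return {"stem": stem, "plural": True}
    return None


lexicon = {"کتاب"}  # "book"; a one-word lexicon for the example
text = "کتاب" + " " + "کتاب" + ZWNJ + PLURAL_SUFFIX

for token in tokenize(text):
    print(analyze(token, lexicon))
```

Checking the stripped stem against the lexicon keeps the toy from treating every word that happens to end in these letters as a plural.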


Project Members
Rémi Zajac, Project Manager
Karine Megerdoomian, Computational Linguist
Jan Amtrup, Computer Specialist
Hamid M. Rad, Computational Linguist, Technical Writer
Mohammad Reza Aidinejad, Lexicographer
Jane Freider, Computer Specialist
Mike Freider, Computer Specialist