Written Persian is syntactically ambiguous in ways that complicate automatic parsing. Several factors contribute to the ambiguity. Although Persian is a verb-final language, it does not adhere to a strict word order, and sentential constituents may occur in various positions in the clause; this is especially true of preposition phrases and adverbials. In addition, there are no overt markers, such as case morphology, to indicate the function of a noun phrase or its boundary; only specific direct objects receive an overt marker. In spoken language the ezafe morpheme links the elements within the noun phrase, but this morpheme, being a short vowel, is absent in written text. Furthermore, subjects are optional in Persian, and subject-verb agreement is not always present for inanimate subjects. Because short vowels are not transcribed, lexical ambiguity is a further problem in automatic parsing of Persian text.
Persian preposition phrases, however, are easily recognized and can be used to mark phrasal boundaries in the sentence. Additionally, the verb almost always occurs in sentence-final position in written text, which facilitates parsing. This section describes Persian syntax, with particular attention to issues that may arise in a computational analysis of written text. Certain rules used in the Shiraz syntactic grammar are presented. Syntactic disambiguation methods, where available, are also discussed.
Persian is an SOV language: sentences appear in the word order Subject-Object-Verb. The verb is marked for tense and aspect and usually agrees with the subject in person and number. Persian is a pro-drop language, so the subject is optional. The object marker râ is used to indicate specific direct objects in simple sentences.
If there is an oblique object or a Prepositional Phrase in the clause, it precedes the indefinite direct object as shown in (2), but usually follows the specific or definite object as in (3).
Although these examples describe the canonical word order, Persian is a free word order language and the sentential constituents can be moved around in the clause. These "scrambled" clauses often give rise to focused or topicalized readings. In the written language, although most elements may appear in relatively free word order, sentences usually remain verb-final. Adverbs and preposition phrases, however, can appear in various positions quite freely. Apart from manner adverbs, which occur within the verb phrase, other adverbs may appear almost anywhere in the clause, between the various constituents. Adverbs usually cannot follow the verb.
Although Persian is verb-final at the sentential level, it behaves like head-initial languages in noun phrases (NP) and preposition phrases (PP). Thus, the head noun in an NP is often followed by the modifiers and possessors (4), and the preposition precedes the complement NP (5).
Certain preposition phrases, such as locative and directional PPs, can follow the verb as shown in the following examples. The preposition is sometimes optional in these cases. These constructions, however, do not often occur in written text.
The head noun is preceded by the determiner, the numeral constructions and the quantifiers, and it is followed by the modifiers, which usually consist of an adjectival phrase (AP). Superlative adjectives, however, do not appear in the AP; instead, they precede the head noun. Numeral constructions, quantifiers and superlative adjectives are in complementary distribution, i.e., if one of these elements is present, the others cannot occur within the NP.
The relative ordering of the constituents of the simple NP is given below:
NP = determiner specifier head modifier
where the head is a Noun and the parts of speech or phrases that can appear in each of the other categories are as shown below. Note that all the constituents, with the exception of the head noun, are optional.
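The ordering template and the complementary-distribution constraint can be sketched as a small checker. This is only an illustration, not part of the Shiraz grammar: the tag names (Det, Num, Quant, SuperAdj, Noun, AP) and the function name are assumptions made for this example, and each numeral construction is treated as a single pre-built unit.

```python
# Slot order for the simple NP: determiner, specifier, head, modifier.
# The tag names are hypothetical labels chosen for this sketch.
SLOT_OF = {
    "Det": "determiner",
    "Num": "specifier",       # numeral construction (treated as one unit)
    "Quant": "specifier",     # quantifier
    "SuperAdj": "specifier",  # superlative adjective
    "Noun": "head",
    "AP": "modifier",         # adjectival phrase
}
SLOT_ORDER = ["determiner", "specifier", "head", "modifier"]


def is_simple_np(tags):
    """Return True if the tag sequence fits determiner specifier head modifier."""
    slots = [SLOT_OF.get(tag) for tag in tags]
    if None in slots:
        return False
    # The head noun is the only obligatory constituent.
    if slots.count("head") != 1:
        return False
    # Numerals, quantifiers and superlatives are in complementary
    # distribution, so at most one of them may fill the specifier slot.
    if slots.count("specifier") > 1 or slots.count("determiner") > 1:
        return False
    # Constituents must follow the template order.
    positions = [SLOT_ORDER.index(slot) for slot in slots]
    return positions == sorted(positions)
```

Under these assumptions, `is_simple_np(["Det", "Num", "Noun", "AP"])` is accepted, while a sequence containing both a numeral and a quantifier, or a modifier before the head, is rejected.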
The modifiers are linked to the head noun with the ezafe morpheme. The following example represents a simple Noun Phrase where CL stands for Classifier and Ez for the ezafe morpheme. Classifiers indicate the class or type of the noun. Thus, for instance, tâ is used with count inanimate nouns, nafar indicates people, qalâde (=collar) can be used when giving a count for dogs, etc.
The infinitival constructions are very similar to the English gerundive. The infinitive head can appear in a predicate construction or with an adverbial. The objects of the verb become arguments of a possessive construction as exemplified in (11).
In the current Shiraz grammar, these boundary markers have been incorporated within the NP rules. Thus, if a simple noun phrase carries a boundary marker, it is not allowed to join with another NP to form a more complex phrase. As a simple illustration, consider the two N'-forming rules, NounBarClitic and NounBarEzafe. These rules contain a left-hand side (lhs) and a right-hand side (rhs) as in rewrite rules. In the first rule, the right-hand side is satisfied if a clitic is detected (indicated by clitic.function: True). As can be seen in the left-hand side of this rule, this nominal element is tagged as the head of the N' (per.NounBar) and the value of the boundary feature is set to True. This boundary value is transferred up when the higher NP level is formed; this NP will not be allowed to join to another noun phrase following it since the boundary has already been set to True.
// N' --> N carrying a boundary marker
NounBarClitic = per.Rule[
    lhs: per.NounBar[
        head: #head,
        boundary: True],
    rhs: <:
        #head = per.Noun[infl.clitic.function: True]
    :>
];
In the case of the NounBarEzafe rule, however, when an ezafe feature is detected (shown in the right-hand side of the rule as infl.ezafe:per.EzTrue), the boundary feature in the left-hand side is set to False. This allows the N' and the higher NP to join to the following noun phrase construction.
// N' --> N carrying ezafe - no boundary set
NounBarEzafe = per.Rule[
    lhs: per.NounBar[
        head: #head,
        boundary: False],
    rhs: <:
        #head = per.Noun[infl.ezafe: per.EzTrue]
    :>
];
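The effect of the two rules can be mimicked in a few lines of Python. This is a rough sketch rather than the Shiraz implementation: the class and function names are invented for illustration, the boolean fields stand in for the feature tests in the rules, and checking the clitic before the ezafe is an assumption made here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Noun:
    exp: str
    clitic: bool = False  # stands in for infl.clitic.function: True
    ezafe: bool = False   # stands in for infl.ezafe: per.EzTrue


@dataclass
class NounBar:
    head: Noun
    boundary: bool


def noun_bar(noun: Noun) -> Optional[NounBar]:
    """Mimic NounBarClitic and NounBarEzafe; return None if neither applies."""
    if noun.clitic:
        # A clitic closes the phrase: the boundary feature is set to True.
        return NounBar(head=noun, boundary=True)
    if noun.ezafe:
        # Ezafe links to what follows: no boundary is set.
        return NounBar(head=noun, boundary=False)
    return None


def can_join_following_np(nbar: NounBar) -> bool:
    """A phrase whose boundary is True may not join the noun phrase after it."""
    return not nbar.boundary
```

With these definitions, a noun carrying ezafe yields an N' that can still combine with a following noun phrase, whereas a noun carrying a clitic yields one that cannot.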
The relative clause may be separated from the head noun by the main verb as illustrated below. In addition, several relative clauses could follow a head noun.
| zamin xordan | "floor eat" | to fall |
| zendegi kardan | "life do" | to live |
| gul zadan | "deception hit" | to deceive |
| shekast dâdan | "defeat give" | to defeat |
| e'lâm kardan | "announcement do" | to announce |
| âsib didan | "damage see" | to be damaged |
| pâyân yâftan | "ending find" | to end |
| na're keshidan | "yelling pull" | to yell, to roar |
| e'teqâd dâshtan | "belief have" | to believe |
| be donyâ âmadan | "to world come" | to be born |
| az dast dâdan | "from hand give" | to lose |
These constructions can also be used as purely idiomatic expressions:
| del be daryâ zadan | "heart to sea hit" | to take a risk |
In any case, these complex predicates are extremely productive in Persian. New verbs are formed following this pattern, by joining a nominal or adjectival word (possibly a loan word) to a light verb as shown:
| email zadan | "email hit" | to (send) email |
| klik kardan | "click do" | to click (on a mouse) |
In addition, simple verbs have been dying out and are being replaced by light verb constructions. The light verbs used in these complex predicates are not always semantically vacuous. In fact, these verbs may contribute to the aspectual readings of the predicate or provide a causation interpretation to the verb. They may also contribute to the transitivity of the verb phrase as shown in the examples below. The first sentence consists of the light verb construction shekanje dâdan (torture give) and gives rise to a transitive sentence. The second sentence, on the other hand, is formed with the light verb construction shekanje didan (torture see) and the result is a passive reading.
For the purposes of the Shiraz project, however, light verb constructions were input into the dictionary as lexical units with their corresponding translations into English. In other words, light verbs are treated as compounds in the Shiraz machine translation system: Each element of the construction undergoes morphological analysis and the results are joined together when the light verb construction is recognized. Consider the example in (31) representing a light verb construction, in which both the nominal and the verbal parts carry morphemes.
In this example, the light verb zadand carries information on tense, aspect, number and person. The nominal part kotak (beating) carries the clitic pronoun for third person singular. On a noun this clitic is analyzed as a possessive, as in (32); on a verb it is analyzed as an object (i.e., accusative). The result of morphological analysis and lexical lookup for each part is shown in (32) for the nominal part and in (33) for the verbal part, where lex represents the lexical information and infl is the inflectional information computed by the morphological analyzer. Note that in (32), the noun has been analyzed as singular, carrying a clitic pronoun (third person singular). In (33), the verb is analyzed as active voice, preterite tense, third person plural agreement; there are no clitic pronouns on the verb.
(32) Noun[
    lex : LexMorph[number : Singular, regular : True],
    infl : NominalInfl[
        number : Singular,
        clitic : Clitic[
            person : Third,
            number : Singular,
            function : Possessive],
        ezafe : EzFalse,
        indefEncl : False,
        indefinite : False,
        enclitic : False],
    exp : "ktk",
    trans : <: LSign[exp : "beating"] :>]
(33) Verb[
    lex : LexMorph[
        number : Singular,
        presentStem : "zn",
        regular : True],
    infl : VerbalInfl[
        voice : Active,
        clitic : Clitic[function : Null],
        tense : Preterite,
        causative : False,
        negation : False,
        mood : Indicative,
        person : Third,
        participle : PartFalse,
        numberAgr : Plural],
    exp : "zdn",
    trans : <:
        LSign[exp : "hit"]
        LSign[exp : "play"]
    :>]
The simple rule below shows how the two parts of such a light verb construction are unified in the Shiraz grammar, and how their morphological information is percolated from the right-hand side, up to the left-hand side, in order to form the single light verb construction NominalLVEntry.
(34) NominalLV = per.Rule[
    lhs: per.NominalLVEntry[
        infl: [
            mood: #mood,            // verbal morphemes
            tense: #tense,
            voice: #voice,
            person: #person,
            numberAgr: #numberAgr,
            causative: #caus,
            negation: #neg,
            participle: #part,
            clitic: #clitic,        // nominal morphemes
            number: #number,
            ezafe: #ezafe,
            indefEncl: #indencl,
            indefinite: #indef,
            enclitic: #encl]],
    rhs: <:
        per.Noun[infl: [
            number: #number = Top,
            ezafe: #ezafe = Top,
            indefEncl: #indencl = Top,
            indefinite: #indef = Top,
            enclitic: #encl = Top,
            clitic: #clitic = Top]]
        per.Verb[infl: [
            mood: #mood = Top,
            tense: #tense = Top,
            voice: #voice = Top,
            person: #person = Top,
            numberAgr: #numberAgr = Top,
            causative: #caus = Top,
            negation: #neg = Top,
            participle: #part = Top]]
    :>,
    recursive: True,  // This rule can apply recursively
    lookup: True,     // Perform dictionary lookup after creating lhs
    remove: True];    // Remove all edges used by this rule after parsing
The final structure for the light verb construction, after the rule in (34) has applied and the result has been looked up in the dictionary, is shown in (35). We now have a light verb construction (NominalLVEntry) resulting from the unification of the two parts.
(35) NominalLVEntry[
    lex : LexMorph[
        number : Singular,
        regular : True],
    infl : NominalLVInfl[
        voice : Active,
        number : Singular,
        causative : False,
        ezafe : EzFalse,
        tense : Preterite,
        person : Third,
        clitic : Clitic[
            person : Third,
            number : Singular,
            function : Possessive],
        participle : PartFalse,
        mood : Indicative,
        negation : False,
        numberAgr : Plural,
        indefinite : False,
        indefEncl : False,
        enclitic : False],
    exp : "ktk zdn",
    trans : <: LSign[exp : "beat up"] :>]
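The percolation performed by rule (34) can be approximated in Python as follows. This is a simplified sketch, not the Shiraz unification engine: feature structures are modeled as plain dictionaries, the function name and the one-entry toy dictionary are assumptions for illustration, and the feature values are copied from (32), (33) and (35).

```python
# Features percolated from each daughter, as listed in rule (34).
VERBAL = ["mood", "tense", "voice", "person", "numberAgr",
          "causative", "negation", "participle"]
NOMINAL = ["clitic", "number", "ezafe", "indefEncl", "indefinite", "enclitic"]

# Toy dictionary with the single entry shown in (35).
LV_DICTIONARY = {"ktk zdn": "beat up"}


def combine_light_verb(noun, verb, dictionary=LV_DICTIONARY):
    """Build a NominalLVEntry-like dict, then look it up; None if unknown."""
    infl = {name: verb["infl"][name] for name in VERBAL}
    infl.update({name: noun["infl"][name] for name in NOMINAL})
    exp = f"{noun['exp']} {verb['exp']}"
    # Corresponds to "lookup: True": the lookup follows creation of the lhs.
    if exp not in dictionary:
        return None
    return {"cat": "NominalLVEntry", "infl": infl, "exp": exp,
            "trans": dictionary[exp]}


noun = {"exp": "ktk", "infl": {
    "number": "Singular", "ezafe": "EzFalse", "indefEncl": False,
    "indefinite": False, "enclitic": False,
    "clitic": {"person": "Third", "number": "Singular",
               "function": "Possessive"}}}
verb = {"exp": "zdn", "infl": {
    "voice": "Active", "clitic": {"function": "Null"}, "tense": "Preterite",
    "causative": False, "negation": False, "mood": "Indicative",
    "person": "Third", "participle": "PartFalse", "numberAgr": "Plural"}}

entry = combine_light_verb(noun, verb)
```

As in (34), the clitic is taken from the nominal part only; the verb's own clitic feature is not percolated in this sketch.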
Thus, if light verb constructions always occurred as a single unit, they could easily be recognized. This is not the case, however. The two parts of the construction can be separated by intervening elements. The object of the light verb, for instance, may appear between the two parts of the construction as shown in (36) for the light verb construction âsheq shodan (fall in love). In (37), the light verb predicate afzâyesh yâftan (increase) has been separated by the adjective shadid, which is behaving as an adverb. (38) represents the light verb construction xâstâr shodan (request) with an intervening object, which itself consists of a complex noun phrase composed of an NP and a PP.
In all of these examples, the separated parts of the light verb are still to be recognized as one unit. However, in certain cases, the separated constituents lose the light verb construction meaning. Compare the two sentences in (39). In (39a), the light verb construction is interpreted as a unit, whereas in (39b), the intervening object marker splits the light verb construction. In this case, the nominal part jâru (broom) has become the direct object of the verb zadan (to hit). A similar effect is obtained by the relativization of the nominal part in (40).
Compare (40), however, to the construction in (41) with the light verb predicate latme zadan (damage). In this instance, even when the nominal element is relativized, the light verb construction still obtains.
The examples discussed in this section show that light verb constructions do not form a unified category. Further research is required, however, to classify the various light verb predicates according to their properties. The current Shiraz dictionary contains more than 8000 light verb constructions, and the syntactic parser can correctly recognize them as well as any inflection that appears on them. At this point, however, the parser is unable to recognize light verb constructions with intervening elements.