A comprehensive study of the computational treatment of texts is a multifaceted endeavor covering a wide range of linguistic and language use phenomena. Because the various facets of this knowledge are complex in their own right, study of any individual phenomenon is often conducted in relative isolation from the study of other related phenomena. However, in a knowledge-based machine translation (KBMT) application, knowledge about a large number of interrelated linguistic and language use phenomena is required. A natural way of combining the diverse knowledge required of such a system into a unified whole is for the various phenomena to be treated by separate computational linguistic "microtheories" united through a system's control architecture and knowledge representation conventions.
Figure 1: The Mikrokosmos NLP Architecture.
In the Mikrokosmos (µK) project, under development at the Computing Research Laboratory (CRL) of New Mexico State University, researchers are carrying out a comprehensive study of a variety of microtheories central to the support of KBMT systems. The ultimate objective is to define a methodology for representing the meaning of natural language texts in a language-neutral interlingual format called a text meaning representation (TMR). The TMR represents the result of analyzing a given input text in any one of the languages supported by the KBMT system, and serves as input to the generation process. The meaning of the input text is represented in the TMR as elements of an independently motivated model of the world (or ontology). The link between the ontology and the TMR is provided by the lexicon, where the meanings of most open-class lexical items are defined in terms of their mappings into ontological concepts and their resulting contributions to TMR structure. Information about the nonpropositional components of text meaning, such as speech acts, speaker attitudes and intentions, relations among text units, and deictic references, is also derived from the lexicon with inputs from other microtheories, and becomes part of the TMR. Figure 1 illustrates the µK architecture for analyzing input texts.
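The relationship among ontology, lexicon, and TMR can be pictured with a toy sketch. Everything in it is an assumption made for illustration: the concept names (INGEST, HUMAN, FOOD), the role labels (agent, theme), the Frame class, and the analyze function are hypothetical and are not taken from the project's actual ontology, lexicon, or TMR format. The sketch also skips parsing entirely and assumes the verb and its two arguments have already been identified.

```python
from dataclasses import dataclass, field

# Hypothetical toy ontology: each concept points to its parent concept.
# This is NOT the project's ontology, only a stand-in for a world model.
ONTOLOGY = {
    "EVENT": None,
    "OBJECT": None,
    "INGEST": "EVENT",
    "HUMAN": "OBJECT",
    "FOOD": "OBJECT",
}

# Hypothetical lexicons: open-class words in each supported language
# are defined by their mapping into an ontological concept.
LEXICON = {
    "en": {"eat": "INGEST", "girl": "HUMAN", "bread": "FOOD"},
    "es": {"comer": "INGEST", "chica": "HUMAN", "pan": "FOOD"},
}


@dataclass
class Frame:
    """A toy stand-in for one element of a text meaning representation."""
    concept: str
    roles: dict = field(default_factory=dict)


def lookup(lang, word):
    """Return the ontological concept a word maps to in the given language."""
    concept = LEXICON[lang][word]
    if concept not in ONTOLOGY:
        raise KeyError(f"{word!r} maps to unknown concept {concept!r}")
    return concept


def analyze(lang, verb, agent, theme):
    """Build a language-neutral frame from already-identified words.

    The role labels 'agent' and 'theme' are illustrative assumptions.
    """
    return Frame(
        concept=lookup(lang, verb),
        roles={
            "agent": Frame(lookup(lang, agent)),
            "theme": Frame(lookup(lang, theme)),
        },
    )
```

Under these assumptions, analyze("en", "eat", "girl", "bread") and analyze("es", "comer", "chica", "pan") build equal frames, which is the sense in which such a representation is language-neutral: a generator for any target language could start from the same structure.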
Initially, the project is concentrating on the microtheory of lexical-semantic dependency, the core microtheory underlying our approach to a comprehensive analysis of the meaning of texts, and the one in which the basic structure of events or states and their properties is specified. Additional microtheories are being developed for aspect, time, modalities, discourse relations, reference, event ellipsis and style.
Kavi Mahesh