
|
URSA: Unicode Retrieval System Architecture
|
|
|
Overview
|
|
URSA is the Computing Research Lab's Tipster Phase III research and
development effort to make text processing and information retrieval
language-transparent.
URSA combines Unicode display technology developed at CRL with
translingual information retrieval, multilingual collection
visualization, and document management, with special emphasis on design
principles that have been validated by studying analysts in
real-world scenarios.
|
|
Tipster Phase III
|
|
Tipster is DARPA's design and research effort to
unify the detection and language processing capabilities
of a diverse range of research entities into a single,
plug-and-play architecture. The Tipster Document Manager is the
central feature of the Tipster design. In Tipster, there are
collections, documents, attributes and annotations. Collections can
contain documents and collection attributes, while documents can have
annotations and attributes. Other Tipster features include support for
detection and information extraction technologies.
Because the data in a Tipster document is simply a byte stream, a
document can contain Unicode text, video, audio, or any other
imaginable data type with equal facility. It is up to the application
that makes use of the documents to interpret the data correctly.
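
The containment relationships described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Tipster API: the class names, field names, the use of byte offsets for annotation spans, and the "encoding" attribute are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Annotation:
    """A typed span over a document's raw bytes (hypothetical layout)."""
    type: str
    start: int  # byte offset, inclusive (assumed convention)
    end: int    # byte offset, exclusive (assumed convention)
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Document:
    """The content is an uninterpreted byte stream."""
    doc_id: str
    content: bytes
    attributes: Dict[str, Any] = field(default_factory=dict)
    annotations: List[Annotation] = field(default_factory=list)

    def as_text(self) -> str:
        # Interpreting the bytes is the application's job. Here the
        # application reads a hypothetical "encoding" attribute.
        return self.content.decode(self.attributes.get("encoding", "utf-8"))


@dataclass
class Collection:
    """Holds documents plus collection-level attributes."""
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)
    documents: Dict[str, Document] = field(default_factory=dict)

    def add(self, doc: Document) -> None:
        self.documents[doc.doc_id] = doc


# Usage: the same container holds text in any encoding, or non-text data.
news = Collection("news", attributes={"source": "example"})
news.add(Document("d1", "東京の経済".encode("utf-16-be"),
                  attributes={"encoding": "utf-16-be"}))
news.add(Document("d2", b"\x00\x01\x02", attributes={"kind": "audio"}))
```

Because the document stores only bytes, nothing in the container changes when the payload is Unicode text, audio, or video; only the consuming application's interpretation differs.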
|
|
Unicode and URSA
|
|
URSA combines the latest advances in information retrieval with the
coherent Unicode text model to make language-transparent IR a
reality. Ongoing development is focusing on integrating Unicode
detection technology with the Tipster architecture. The model we are
developing uses annotations on the documents to describe the
text for indexing. External document annotators can produce
segmentations for East Asian languages, or stemmed word markup for
Western languages, which are then interpreted by the URSA engine and
indexed for later retrieval. The URSA engine will be fully conversant
in Tipster detection needs, including complex query expressions and
natural language queries.
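
As a rough illustration of annotation-driven indexing, the sketch below builds a small inverted index from token annotations. It is not the URSA engine: the `Annotation` layout, the "token" type, the optional `stem` field, and the use of character offsets into decoded text are assumptions chosen to keep the example short.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class Annotation(NamedTuple):
    type: str                   # e.g. "token" (hypothetical type name)
    start: int                  # character offset, inclusive
    end: int                    # character offset, exclusive
    stem: Optional[str] = None  # filled in by a stemmer, if one was run


def index_terms(text: str, annotations: List[Annotation]) -> List[str]:
    """Return one index term per token annotation.

    A stemmer's output is used when present; otherwise the annotated
    span itself (for example, a segmenter's word boundary) is the term.
    """
    terms = []
    for ann in annotations:
        if ann.type != "token":
            continue
        terms.append(ann.stem if ann.stem is not None
                     else text[ann.start:ann.end])
    return terms


def build_index(
    docs: Dict[str, Tuple[str, List[Annotation]]]
) -> Dict[str, Set[str]]:
    """Map each term to the set of document ids that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, (text, annotations) in docs.items():
        for term in index_terms(text, annotations):
            index[term].add(doc_id)
    return index


docs = {
    # Stemmed word markup from an external annotator.
    "en1": ("retrieving documents",
            [Annotation("token", 0, 10, stem="retriev"),
             Annotation("token", 11, 20, stem="document")]),
    # Segmentation only: the text has no spaces between words.
    "ja1": ("情報検索",
            [Annotation("token", 0, 2), Annotation("token", 2, 4)]),
}
index = build_index(docs)
```

The indexer never tokenizes the text itself. All language-specific knowledge lives in the external annotators, which is what lets one engine serve segmented and stemmed languages alike.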
|
|
Papers and Presentations
|
|
Check out some of our papers or download presentation slides on
Cross-language Text Retrieval, URSA and related efforts:
-
[postscript]
[pdf]
Tipster III Kickoff Meeting Presentation (October 1996, Columbia, MD)
-
[postscript]
[pdf]
Tipster III 6-month Meeting Presentation (May 1997, Columbia, MD)
-
[postscript]
[pdf]
Tipster III 12-month Meeting Presentation (October 1997, La Jolla, CA)
-
[postscript]
[pdf]
TREC 5 Paper on Cross-Language Text Retrieval (TREC 5, Gaithersburg,
MD, November 1996)
-
[postscript]
[pdf]
TREC 5 Slides on Cross-Language Text Retrieval (TREC 5, Gaithersburg,
MD, November 1996)
-
[postscript]
[pdf]
Cross-Language Text Retrieval using Evolutionary Optimization (EP95 in San
Diego)
-
[postscript]
[pdf]
A Follow-up Paper on Cross-Language Retrieval Using Evolutionary
Optimization (EP96 in San Diego)
-
[postscript]
[pdf]
AAAI Workshop on Cross-Language Text Retrieval Paper (Stanford
University, March 1997)
-
[postscript]
[pdf]
Paper presented at SIGIR96 Workshop on
Cross-linguistic Information Retrieval (ETH, Zurich 1996)
-
[postscript]
[pdf]
TREC 4 Paper on Cross-Linguistic Text
Retrieval (TREC 4, Gaithersburg, MD, November 1995)
-
[postscript]
[pdf]
TREC 6 Paper on Cross-Language Text
Retrieval (TREC 6, Gaithersburg, MD, November 1997)
-
[postscript]
[pdf]
SIGIR 97 Paper on Implementing Large-Scale Cross-Language Text
Retrieval Systems (SIGIR97, Philadelphia, PA, August, 1997)
-
[postscript]
[pdf]
SIGIR 97 Workshop Paper on Monolingual, Multilingual and Crosslingual
Information Retrieval using network models
(SIGIR97, Philadelphia, PA, August, 1997)
-
[postscript] plus a figure [postscript] OR
[pdf] plus a figure [pdf]
Early Tech Report on iterative least-squares fitting of language
translation models.
-
[pdf]
Slides from Internal CRL Seminar, containing an overview of text
retrieval and some notes on QUILT
-
[power point]
[pdf]
Unicode Conference Presentation (with notes). 14th International
Unicode Conference, Boston MA March 1999.
-
[postscript]
[pdf]
Extended tech report on using URSA libraries and tools. You can
download the J24 development archive described in
the paper if you know the secret password. Warning: it is around 10 MB
compressed and 40 MB uncompressed.
-
[zip]
[html]
PowerPoint presentation (May 1999). Reviews work on Cross-Language
Text Retrieval and Interactive IR and how we have combined
these results in the design of an interactive CLTR interface, KEIZAI.
-
[doc]
[pdf] Keizai: An Interactive
Cross-Language Text Retrieval System
Paper for the Workshop on Machine Translation for Cross Language
Information Retrieval, which was held in conjunction with Machine
Translation Summit VII, September 13-17, 1999, Singapore.
-
[doc]
[pdf] Improving Cross-Language Text Retrieval with Human Interactions
Paper presented at the Hawaii International Conference on System Sciences (HICSS-33),
January 4-7, 2000.
|
|
Contacts
|
|
For more information on the URSA project, Unicode detection,
translingual information retrieval, or user-centered detection system
design, please contact principal investigators Bill Ogden or
Mark Davis.
|
|