Recent Research in Cross-language Document Search

Fredric Gey

University of California, Berkeley

Abstract

Research on cross-language document search has been underway for more than 10 years, and while much progress has been made, certain research challenges remain. This talk will review recent research in cross-language information retrieval, including the 2004 evaluation workshops: NTCIR for Asian language retrieval in Japan (http://research.nii.ac.jp/ntcir-ws4/index.html) and CLEF for European language retrieval (http://clef.iei.pi.cnr.it:2002/), as well as the U.S. DARPA “Hindi Surprise Language Exercise” of 2003. Topics to be covered include:

·        Language-specific processing (stemming, segmentation, stop-words)

·        Word decompounding for German

·        Translation disambiguation for bilingual dictionaries

·        Lexicons induced from parallel corpora

·        Use of Web corpora for out-of-vocabulary translation

·        Special retrieval tasks (patent retrieval, cross-language question answering)

·        Geographic information retrieval

·        Challenges of less-commonly taught languages

·        The road ahead in cross-language information retrieval research

Presenter: Dr. Fredric Gey has been doing research in cross-language information retrieval since 1998. He and his associates have participated in every cross-language information retrieval evaluation in the United States, Japan, and Europe. He is currently working on retrieval (including geographic information retrieval) of Russian-language corpora and other digital objects. Dr. Gey co-chaired the English-Arabic retrieval evaluation track at the TREC conferences in 2001 and 2002, and he co-chaired a workshop on “Cross-language Information Retrieval Research: The Road Ahead” at the ACM SIGIR-2002 conference in Finland. He is co-author of the entry on “Multilingual Information Retrieval” in the Encyclopedia of Library and Information Science and co-editor of a forthcoming special issue on cross-language information retrieval of the journal Information Processing and Management.