Book Review: Multilingual Information Retrieval – From Research to Practice

Multilingual Information Retrieval – From Research to Practice by Carol Peters, Martin Braschler and Paul Clough

ISBN: 978-3-642-23007-3

The fundamental concept of Multilingual Information Retrieval is the use of computers to surmount language boundaries, both for information on the WWW and for many other purposes, such as military intelligence and defense, international trade, inventions and international relations between countries, not to mention the most common use: human communication. In 2004 the largest number of candidate countries ever joined the EU. Since then, the strong need to translate all official documents into other languages has created a demand for new technologies for automatic search, translation and, in some cases, information summarisation.

In 2002 the American Text Retrieval Conference (TREC), an ongoing series of workshops co-sponsored by the National Institute of Standards and Technology (NIST) and the U.S. Department of Defense (at that time the only institution in the world pursuing research in multilingual, or rather American-Chinese, information systems), handed over priority for the multilingual tracks to CLEF (Cross-Language Evaluation Forum), headed by one of the book's authors, Carol Peters, and supported by the European Commission. Thus, in this book the Reader may find a lot of information drawn from the experience of running such large evaluation campaigns.

For example, during the SIGIR ’96 workshop on Cross-Language Information Retrieval (CLIR), a discussion took place on introducing a technical term that would best describe the field. At the same time the American Defense Advanced Research Projects Agency (DARPA) was proposing “translingual”, which has not yet become the standard in the literature [Douglas W. Oard, “Alternative Approaches for Cross-Language Text Retrieval,” in AAAI Symposium on Cross-Language Text and Speech Retrieval, pp. 131-139, Palo Alto CA, 1997]. “Multilingual”, as an alternative, seemed too broad to distinguish IR systems with a translation component from those in which the queries are retrieved in any target language.

The book comprises six chapters that follow the structure of a conference paper. Personally, I find chapter 1 the most interesting: it defines CLIR and briefly presents its history, in particular the conferences and institutions that started the research. Chapter 2 is more technical, as it provides a step-by-step introduction to the basics of monolingual IR. Students can learn about the indexing and matching phases. Chapter 3 introduces the most common approaches to CLIR, covering the divergence of languages, translation models and language ambiguity. The next chapter deals with human-computer interaction in terms of multilingual interface design. Chapter 5 is devoted to evaluation seen from user and system perspectives. Obviously, the Authors could not resist killing two birds with one stone, promoting the multilingual CLEF tracks and sharing the results from the campaigns’ Working Notes. The evaluation metrics stimulate developers to improve their own IR systems, while encouraging them to participate in a campaign and compete with numerous other research groups from around the world. The last chapter concludes with two aspects that cannot be omitted from a complete overview of information retrieval: non-textual information, such as image, speech and video retrieval, and then, to lead the Reader on, the practical implementation of multilingual systems. Presented here are Web search, digital libraries and the domains that rely most heavily on multilingual systems, such as healthcare, government, law, business and commerce.

Having read the book, I agree with the Authors, who say: “The book is intended for graduate students, scholars and practitioners with a basic understanding of classical text retrieval methods.” Therefore, I recommend it to academia as a resource providing background knowledge in multilingual information retrieval.