Musing on the theory of it all — ICTIR 2013

The 4th International Conference on the Theory of Information Retrieval (ICTIR 2013) took place from September 29 to October 2, 2013, in the beautiful Danish capital of Copenhagen, hosted by the Department of Computer Science at the University of Copenhagen. As the name suggests, this conference is primarily concerned with theoretical aspects of IR, but there is also room for industry-relevant and user-oriented topics viewed from a different, more theoretical angle. ICTIR 2013 offered interesting tutorials, three keynotes, inspiring talks, a lively panel, a yummy banquet and lots of exciting people to meet.

Prologue: The tutorials

Peter Ingwersen was quite confident that the conference would be a success. He was right.

The conference started on Sunday with four tutorials in two parallel sessions. The morning session had two interesting tutorials. In “Quantum Mechanics and Information Retrieval: From Theory to Application”, Benjamin Piwowarski and Massimo Melucci introduced the formalism behind quantum mechanics and its possible application to information retrieval. This research direction has seen something of a renaissance, started by Keith van Rijsbergen with his seminal book “The Geometry of Information Retrieval”. If you don’t shy away from a bit of math and like to explore a non-classical formal framework for IR, you may want to have a look at what’s going on in quantum-based retrieval. What you gain is a powerful and expressive framework that goes beyond traditional probability theory and geometry as they are used in standard IR models. The assumption is that we need such a framework to model phenomena in IR that cannot, or cannot easily, be modelled using a standard framework. The future is quantum, isn’t it?

At the heart of every IR evaluation there is statistical significance testing. Its theory and practice were the topic of the tutorial “Statistical Significance Testing in Information Retrieval: Theory and Practice”, held by Ben Carterette. The goal of this tutorial was to help researchers and practitioners gain a better understanding of how significance tests work and how their results should be interpreted. Clearly, almost all IR researchers can benefit here.
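To make the idea of a significance test a bit more tangible, here is a minimal sketch of one test commonly used for comparing two retrieval systems on the same topics, a paired sign-flip randomization test. This is not taken from the tutorial; the function name, the parameters and the per-topic scores are all made up for illustration.

```python
import random


def paired_randomization_test(scores_a, scores_b, trials=10000, seed=0):
    """Estimate a two-sided p-value for the mean per-topic difference.

    scores_a and scores_b hold one effectiveness score per topic
    (for example average precision) for two systems, in the same topic order.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems need a score for every topic")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(trials):
        # Under the null hypothesis the two system labels are exchangeable,
        # so each per-topic difference is equally likely to have either sign.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / len(diffs)) >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (at_least_as_extreme + 1) / (trials + 1)


# Hypothetical per-topic scores, invented for this example only.
system_a = [0.42, 0.31, 0.55, 0.48, 0.29, 0.61, 0.37, 0.50]
system_b = [0.40, 0.28, 0.57, 0.41, 0.25, 0.58, 0.36, 0.44]
p_value = paired_randomization_test(system_a, system_b)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value only says that a difference this large would be unlikely if the two systems were interchangeable; it says nothing about whether the difference is large enough to matter to users.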

The afternoon continued with two other exciting tutorials. The axiomatic approach to IR models was the topic of the tutorial “Axiomatic Analysis and Optimization of Information Retrieval Models”, held by Hui Fang. The tutorial showed how relevance can be modelled by a set of formal constraints on retrieval functions, for instance “Give a higher score to a document with more occurrences of a query term”. Retrieval models that satisfy more of these constraints are expected to perform better.
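As a rough illustration of what checking such a constraint can look like, the sketch below tests the term-frequency constraint quoted above against two toy scoring functions. Both scorers and the helper `satisfies_tf_constraint` are hypothetical and not from the tutorial, and the check is only an empirical test over a small range of term frequencies, not a proof.

```python
import math


def log_tf_score(tf, doc_len, idf=1.0):
    """Toy scorer: log-dampened term frequency times IDF (doc_len is unused)."""
    return idf * math.log(1 + tf)


def binary_score(tf, doc_len, idf=1.0):
    """Toy scorer: only whether the term occurs matters, not how often."""
    return idf if tf > 0 else 0.0


def satisfies_tf_constraint(score, doc_len=100, max_tf=20):
    """Check that, at fixed document length, more occurrences of the
    query term always give a strictly higher score."""
    scores = [score(tf, doc_len) for tf in range(max_tf + 1)]
    return all(lower < higher for lower, higher in zip(scores, scores[1:]))


print(satisfies_tf_constraint(log_tf_score))  # True
print(satisfies_tf_constraint(binary_score))  # False
```

The binary scorer fails because a second occurrence of the query term does not raise the score, which is exactly the kind of weakness the axiomatic analysis is meant to expose.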

In the other afternoon tutorial, “IR Models: Foundations and Relationships”, Thomas Roelleke took a closer look at existing IR models and how they relate. Knowing about the foundations and relationships of IR models is crucial for building IR systems in many respects. The first thing Thomas had to do was to unify the terminology of existing models, which then revealed that many models are indeed related to each other and share some probabilistic roots. Besides that, we learnt what you can find out about IR models simply by applying LaTeX macros when writing your documents.

The main conference

Thomas 'The Integrator' Roelleke demonstrating that golf and IR go well together

The main conference started on Monday with the first keynote address, given by Dominic Widdows. Dominic observed that most vector space models in IR are still based on real numbers. “What kinds of numbers should we be using?”, he asked. Other disciplines like physics have successfully been using a more expressive number field, the complex numbers, and Widdows encouraged researchers to consider number fields other than the reals. Later in his keynote he presented a semantic WordSpace model that is able to answer questions like “What’s the dollar of Mexico?”

Following the keynote, the first session, on relevance feedback models, started. In the afternoon there were sessions on evaluation and on context and diversification. After the sessions the posters and drinks reception took place, providing opportunities for some interesting discussions among the delegates.

In the Tuesday keynote Ricardo Baeza-Yates, VP of Yahoo! Research for Europe and Latin America, reflected on whether there is room for theory in modern commercial search engines. Unsurprisingly, according to Baeza-Yates the answer is yes, as he pointed out using examples (tier prediction and query intent prediction) where theory can make an important contribution. “The best theory is inspired by practice and the best practice is inspired by theory” is a quote by Donald Knuth that basically sums up Ricardo’s main message.

Information Retrieval in the 1920s - Keith van Rijsbergen presenting Goldberg's Statistical Machine

Next on the agenda for Tuesday were sessions on recommender systems and on temporal and thread search. Afterwards there was the panel discussion on challenges and long-range opportunities in IR research, chaired by Peter Ingwersen. Panelists were Ricardo Baeza-Yates (the ‘doer’ according to Peter Ingwersen), Stephen Robertson (introduced as the ‘thinker’), Thomas Roelleke (the ‘integrator’), Peiling Wang (the ‘methodologist’) and ChengXiang Zhai (the ‘broad-minded’). After a lively panel discussion, with lots of interesting viewpoints from both the panelists and the audience, the conference dinner took place at Restaurant SULT, located in the heart of Copenhagen.

The final day began with Keith van Rijsbergen’s keynote on the roots of the theoretical basis for information retrieval, which became an interesting tour through the history of information retrieval. We learnt that the concept of search engines dates back to the 1920s (Goldberg’s Statistical Machine) and also that we have been doing machine learning in IR since the 1960s (Rocchio). Keith asserted that we need theories to ask the right questions.

Two sessions on ranking followed the keynote, before ICTIR 2013 concluded with a discussion on the future of ICTIR. The story of theory in information retrieval will of course continue. In fact, as it turns out, theory is becoming more and more important.

What else do we take home?

The concept of one-man sessions. Both ChengXiang Zhai and Stefano Mizzaro had their ‘own’ session (and they did it brilliantly). The unofficial ‘busy speaker award’ goes to ChengXiang Zhai.

Brewing theory that made it into application

Theory, though taking an awfully long time to establish, is important for information retrieval in many respects. Moreover, although the tutorials may suggest otherwise, IR theory is not only about mathematical models. In fact, cognitive models and frameworks can be regarded as theory, too, as was emphasised in the closing discussion on the future of ICTIR as well as in Ricardo Baeza-Yates’ keynote. Researchers who usually publish at conferences like IIiX should check out future ICTIRs as well.

ICTIR had a normalized Salton award winner ratio of 0.06.

Copenhagen is a fantastic city. I enjoyed its relaxed atmosphere.

The organisers, Christina Lioma, Birger Larsen and Peter Ingwersen, did a great job! Birger even changed his affiliation during the conference. Congratulations on your new post at Aalborg University, Birger!
