The European Conference on Information Retrieval (https://ecir2020.org/) was originally planned to take place in Lisbon in the week after Easter. The timing for me was unfortunate, as we had an important family event to attend in Glasgow. In the weeks leading up to the conference it became clear that the actions being taken to mitigate the Covid-19 virus meant that a virtual event was the only option. This was good news for me personally, as it meant I could now attend the sessions!

Overall it was a very impressive event, not just in the quality of the papers but in the way the conference was managed by the Conference Committee. The timing was very good, and most of the presenters recognized the importance of a focused presentation. I did find it difficult to be certain about the schedule of the short papers on YouTube, and the Q&A sessions suffered from Zoom offering two routes for questions: Chat for text questions and Raise Your Hand for asking a question in audio in real time. It was unclear to me which was the better option for the chair and the speaker.
Conference proceedings
The two volumes of the conference proceedings can be found at
https://link.springer.com/book/10.1007/978-3-030-45439-5#about
If you are going to download them, be aware that they are 42MB each. At present they are open access.
There are around 70 presentations available on the ECIR2020 YouTube channel.
Keynote presentations
https://ecir2020.org/keynote-speakers/
There were three keynote presentations:
- Focusing the macroscope: how we can use data to understand behavior – Joana Gonçalves de Sá
- Task-Based Intelligent Retrieval and Recommendation (Karen Spärck Jones Award Keynote) – Chirag Shah
- Better Representations for Search Tasks – Jamie Callan
Because of another commitment I was not able to listen in to Joana's keynote. I hope to be able to include it in the next issue of Informer. Below I have summarised my personal take on the presentations from Chirag and Jamie, which I hope will give a flavour of their excellent contributions. Making notes is not easy when you find so much of value in what the speaker is presenting. The texts of the keynotes are not included in the conference proceedings.
Professor Chirag Shah – the Karen Spärck Jones Award Lecture
https://ischool.uw.edu/people/faculty/profile/chirags
Chirag is one of the leading IR researchers, as you will see from his list of publications. He gave the prestigious Karen Spärck Jones lecture, which is probably the IR equivalent of a Nobel Prize.
His presentation was structured around four topics:
- Using behavioural data to predict task facets
- Learning about searchers' intentions
- Combining task, user and intention signals for a more comprehensive model of user behaviour
- Using foraging behaviours to link online and physical searching patterns.
As with many of the speakers, Chirag focused on understanding tasks as a crucial precursor to achieving good search results. In particular he commented on the challenges of learning about the intentions of searchers. He has had some success in deriving intentions from the initial query, but in answer to a question I raised he agreed that this was with English speakers searching English document sets. This issue came up in a number of papers, where the assumption was that all search users have English as their primary language and only want to find English-language content.
Of particular interest to me was his reference to information foraging. This approach dates back to 1999 and provoked a lot of interest at the time, but was then pushed out of mainstream retrieval thinking until 2017/2018. Now it seems to have come centre-stage in many discussions about IIR, and a number of speakers referred to it during the conference. The key issue in foraging is that users make use of a range of subliminal cues when scanning through a list of results. Trying to replicate these cues in the initial management of a search is proving to be very difficult.
What Chirag and others are working towards is moving the intervention as early as possible in the user journey, rather than waiting until the first set of results is presented. Chirag suggested that this could be regarded as 'information fostering'. In practice this comes down to making significant improvements in auto-suggestion, and although these are being researched, once again it comes down to coping with search users who are operating in their second language and unable to express a clear intention in their queries.
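To make the auto-suggestion point concrete, here is a minimal sketch (my own illustration, not Chirag's method) of the simplest form the technique takes: matching a typed prefix against a log of past queries. The query log here is invented for the example.

```python
import bisect

# Hypothetical query log; a real system would mine this from search history.
past_queries = sorted([
    "bert reranking",
    "information foraging theory",
    "information fostering",
    "information retrieval evaluation",
])

def suggest(prefix: str, limit: int = 5) -> list[str]:
    """Return up to `limit` logged queries that start with `prefix`."""
    start = bisect.bisect_left(past_queries, prefix)  # first query >= prefix
    hits = []
    for query in past_queries[start:]:
        if not query.startswith(prefix):
            break  # sorted order: once one query misses the prefix, all later ones do
        hits.append(query)
        if len(hits) == limit:
            break
    return hits

print(suggest("information fo"))
# ['information foraging theory', 'information fostering']
```

The research Chirag described goes far beyond this, of course: the hard part is inferring intent for users who cannot express it clearly, which no amount of prefix matching solves.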
One of the most memorable elements of his presentation was the way he incorporated an escape room in his investigation of information foraging. I cannot do justice to the description, so I have to leave that hanging in the air.
In the question period Chirag was asked what he saw as the impacts of Covid-19 on searching. He made the following points:
- Distributed working will give rise to novel searches for novel problems.
- The people doing the search may be doing so on behalf of others in their virtual team, and so there is no search history to use.
- Collaborative searching has not caught on, but it seems very likely that the extensive use of distributed working will result in a surge of interest in it.
Professor Jamie Callan
Jamie Callan is a guru among gurus of IR research.
http://www.cs.cmu.edu/~callan/
He made the point at the outset of his presentation that IR is the only element of computer science where evaluation by users is essential to improving the experience: for most other systems evaluation comes down to the core issue of whether the user-specified requirements have been achieved. At present we are lacking good evaluation measures for the very simple reason that users are all individuals, but that should not stop us trying to develop them. He felt that IR is now being driven by NLP and ML, and that research in the area is being reactive rather than innovative. Interestingly, he commented that he felt the negative connotations of presenting 'ten blue links' were overstated, as a user needs to see the context of each result.
Much of his talk was around understanding search query patterns. Jamie was very positive about the potential value of BERT in multi-stage reranking, but emphasized that it needs volumes of data that are rarely generated outside of high-volume e-commerce applications. These could indeed benefit from BERT approaches, but they are a special case because it is possible to A/B test changes to a much more significant extent than on complete documents.
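As an illustration of the multi-stage pattern Jamie was referring to, the sketch below rescores a shortlist of first-stage candidates with a BERT-style cross-encoder. The pipeline, model name and example passages are my own assumptions (using the sentence-transformers package), not anything presented in the keynote.

```python
# Minimal sketch of multi-stage reranking: a cheap first stage (e.g. BM25)
# produces candidates, and a BERT-style cross-encoder rescores them.
# Requires: pip install sentence-transformers
from sentence_transformers import CrossEncoder

# Publicly available MS MARCO cross-encoder; chosen here purely for illustration.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], top_k: int = 3) -> list[str]:
    """Score (query, passage) pairs with the cross-encoder and keep the best."""
    scores = model.predict([(query, passage) for passage in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in ranked[:top_k]]

# In practice these candidates would come from a fast first-stage retriever.
candidates = [
    "BERT-based rerankers rescore a shortlist of retrieved passages.",
    "A/B testing compares two variants of a live system.",
    "Conversational search supports multi-turn information seeking.",
]
print(rerank("how does BERT reranking work", candidates, top_k=2))
```

The reason Jamie tied this to data volume is that the cross-encoder only pays off when there are enough relevance judgements (or behavioural signals such as clicks) to fine-tune and validate it, which is exactly what high-volume e-commerce sites have and most other collections do not.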
He then moved on to look at conversational search. In setting the context he noted that text- and voice-based dialogue systems have become common, but they are built around strong models of common tasks, usually rely on extensively curated domain knowledge, and use search only as a last resort. In his view they are good for easy search tasks but have limitations for complex open-domain information seeking. This is because they are poor at giving a sense of the landscape (compared with looking at a search results page) and because, through their use of machine learning, the answers focus on what other people needed to know. He pointed out that the TREC conversational search track (CAsT) was quite limited in its scope, because:
- There was no information introduced about the person seeking the information
- Questions were in a static order, and so could refer to concepts introduced in prior questions
- The answer was a single passage
- Questions and answers were independent of previous results.
He pointed out that a number of changes are being proposed for the 2020 CAsT track, though of course it is doubtful that the TREC event will take place.
A subsequent paper by Julia Kiseleva (Microsoft Research) was quite brutally honest about the challenges of evaluating conversational search systems, primarily because at present there is no common way to undertake the evaluations, making comparisons between approaches of little value.
I would add that towards the end of the conference there was an excellent panel discussion on conversational search (with panellists from Microsoft, Amazon and others), and the general view was that we are still at the very beginning of understanding not just how to improve conversational search but what the business/use cases are that would justify the research and development investment. Much of the work reported so far in which the results seem to be good involves a) special cases, such as e-commerce, and b) evaluation that is not rigorous enough for others to be able to scale up on the outcomes.