As has been the custom, the main Search Solutions 2021 event was again preceded by a day of tutorials. Two sessions were successfully delivered on the day: an introduction to Natural Language Processing and a “Practitioners’ Evaluation Roundtable” (not a tutorial in the strict sense). Haiming Liu, Ingo Frommholz and Jochen Leidner reflect on the scope and outcomes of the two sessions.
The first tutorial was a full-day session delivered by Dr Michael Oakes, who gave an overview of Natural Language Processing (NLP). Michael based his presentation on the popular textbook by Jurafsky and Martin, an ambitious amount of material for a one-day format, and he managed to cover it all using engaging teaching methods such as interactive hands-on activities. The event was held on-site at the BCS headquarters, so it could benefit from these interactive exercises. The six participants who attended gave very positive feedback on their learning experience and on the delivery. After a long period of pandemic-induced online tutorials and events, at Search Solutions as elsewhere, it was a pleasure to return to a face-to-face format, not least because of the very encouraging and interactive nature of the delivery. Topic-wise, the tutorial introduced the basics of NLP and continued with a state-of-the-art treatment of machine translation.
Ingo Frommholz and Jochen Leidner moderated an event entitled “Practitioners’ Evaluation Roundtable”, whose motto was a telling “Not A Tutorial”. While in the academic world initiatives such as TREC, MediaEval or CLEF evaluate different search solutions and algorithms for specific tasks, evaluation is an essential procedure for production systems as well, which are usually more complex than academic prototypes and are used by thousands of users. In the spirit of the main Search Solutions conference, the aim of this roundtable was to shed some light on the different evaluation needs in production environments and on whether the two sides, academia and industry, can learn from each other’s practices.
To this end, Ingo and Jochen set the scene with a presentation on the Cranfield paradigm, a recap of well-known search evaluation metrics, and a reflection on how evaluation and its challenges in industry can differ from the evaluation found in academic papers. From there, the 13 attendees engaged in a multi-faceted conversation ranging from the challenges of interactive evaluation to the evaluation of fashion recommender systems. Some of the main outcomes of the discussion are as follows. While well-known measures such as recall and precision are indeed used in industrial evaluation, a mere Cranfield-style evaluation would not yield the required insights, for two main reasons.
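To make this concrete, the following minimal Python sketch computes precision and recall for a single query in the Cranfield style, i.e. against a fixed set of relevance judgments (qrels). The document identifiers, judgments and ranking are hypothetical, chosen purely for illustration.

```python
# Minimal sketch of Cranfield-style batch evaluation for one query.
# The relevance judgments (qrels) and the ranked output below are
# hypothetical, for illustration only.

relevant = {"d1", "d4", "d7", "d9"}       # documents judged relevant
ranked = ["d1", "d3", "d4", "d2", "d7"]   # ranked list returned by a system

retrieved_relevant = [d for d in ranked if d in relevant]

precision = len(retrieved_relevant) / len(ranked)    # 3/5 = 0.60
recall = len(retrieved_relevant) / len(relevant)     # 3/4 = 0.75

print(f"Precision: {precision:.2f}, Recall: {recall:.2f}")
```

In a Cranfield-style campaign, such per-query scores would be averaged over the topics of a test collection; as the discussion made clear, this alone does not capture the user-centred signals that matter in production.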
Firstly, relevance criteria are different and not just topical. Situational relevance is often important, but sometimes it is not even possible to specify certain relevance criteria. The main metric of success, in the end, is purchase! Hence user engagement is important, and along with it website analytics such as time on site and click-through rate, to name but a few.
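To illustrate what such engagement metrics look like in practice, here is a minimal sketch that computes click-through rate, average time on site and a purchase-based conversion rate from a toy interaction log. The log format, field names and numbers are assumptions made for this example, not any particular analytics schema.

```python
# Minimal sketch of engagement metrics over a hypothetical interaction log.
# Field names and values are illustrative assumptions, not a real schema.

sessions = [
    {"impressions": 10, "clicks": 2, "seconds_on_site": 180, "purchased": True},
    {"impressions": 8,  "clicks": 1, "seconds_on_site": 45,  "purchased": False},
    {"impressions": 12, "clicks": 0, "seconds_on_site": 20,  "purchased": False},
]

total_impressions = sum(s["impressions"] for s in sessions)
total_clicks = sum(s["clicks"] for s in sessions)

ctr = total_clicks / total_impressions                             # 3/30 = 10%
avg_time = sum(s["seconds_on_site"] for s in sessions) / len(sessions)
conversion = sum(s["purchased"] for s in sessions) / len(sessions)

print(f"CTR: {ctr:.1%}, avg time on site: {avg_time:.0f}s, "
      f"conversion: {conversion:.1%}")
```

Unlike the batch metrics above, these signals measure user behaviour rather than topical relevance directly.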
This leads to the second reason: whereas academia is often mainly concerned with the quality of search algorithms, industrial evaluation of search solutions often needs to put the user at the centre, for the reasons mentioned above. That said, user orientation is being picked up by the academic community as well, as the successfully running CHIIR conference series shows; here lies further potential for academia and industry to establish closer collaboration. Besides these main issues (relevance criteria/metrics and user orientation/engagement), other points were raised, particularly by industrial participants. With purchase as the ultimate success criterion, a benchmark or a comparison against other baselines or systems, essential for academic evaluation, is often not required in an industrial setting. (Curated) metadata and data organisation appear to be important, but examples were also reported where product titles on shopping websites were made deliberately chaotic, mixed with assorted keywords so as to match more of the terms that users search for.