by Jochen L. Leidner and Ingo Frommholz
This year’s ECIR took place in Cologne, a few months ahead of SIGIR 2019 in Paris in July. This is a report of some of the highlights of ECIR this year, with no claims of representativeness.
Part I – Impressions and General Trends
Search is alive and well as a field of research. Both theoretical and applied papers were presented, and the conference is growing: this year, for the first time in 40 years, the printed proceedings came in two volumes. This year, both main search events, ECIR and the forthcoming SIGIR, are held in Europe, which is trying to catch up given the traditional US/China co-dominance. ECIR was attended by people from the USA to China, from India to Israel, from Austria to Australia: it was good to see the geographic scope of this European conference growing steadily. Out of 165 full papers submitted from 50 countries, 39 “long paper” contributions were accepted (a 23% acceptance rate). The themes used to structure these were:
– Modeling Relations
– Classification and Search
– Knowledge Graphs
– Recommender Systems I + II
– Query Analytics
– Representation
– Reproducibility I – Systems
– Reproducibility II – Application
– Neural Search
– Topic Modeling
– Metrics
– Image Search
– Question Answering and Conversational Search
Industry presence in the field is increasing, and accordingly ECIR 2019 offered an Industry Day after the main conference. The value of technology to improve search effectiveness is now broadly recognized: financial services companies employ teams for search and machine learning, contribute to open source, and collaborate more with academia. There is also more openness. Both large and small companies that in the past were concerned about revealing commercial secrets now publish their findings in scientific venues to demonstrate thought leadership and to distinguish themselves, in an increasingly crowded space, from the many companies that make noise and call themselves “AI” companies without merit. For example, search engines for CVs/resumes (Textkernel), media and brand monitoring suites (Signal Media, SUMMA/Deutsche Welle), and financial search systems (Bloomberg) were presented at the Industry Day.
Several attendees saw this day as a highlight of the conference. Bloomberg described how they switched their search to a machine learning approach, and Jeremy Pickens from Catalyst Repository Systems presented a talk on legal discovery that showed what kind of detective work can be involved in unearthing financial fraud. On par with it was a talk by Allan Hanbury about the incubation of a new browser for radiologists: the success story of an EU Horizon 2020-funded research project that went from research prototype to a successful startup with a customer-validated product and paying customers, including some instructive pitfalls and challenges the team faced. These talks were my highlight as far as applications go, and indeed Jeremy’s presentation was voted the audience’s choice for best presentation. Since we witnessed the counting of the votes, we can reveal that several talks received multiple nominations for best presentation, indicating the high quality and diversity of the program. In terms of methods, neural retrieval and classification methods and various forms of embeddings are now firmly established as part of the mainstream. The replication track, which sets out to validate earlier work by others, has also become a standard feature of the conference.
Apart from the presentations at the main conference, from which some highlights are summarised below, there were two keynote talks. The first was delivered by the winner of the Karen Spärck Jones Award, Krisztian Balog, who talked about entity retrieval and linking as well as evaluation using the “living labs” methodology. The second keynote was presented by Markus Strohmaier, who talked about the challenges and biases that arise when people, rather than documents, are ranked.
Part II – A Technical Summary and Annotated List of Selected Papers (with a slight finance bias)
The Technical
Fisher et al. describe a method for recognizing summary articles in news feeds, based on a two-stage approach (segment + classify); this is a joint paper between Signal Media and the University of Essex. This matters because summary articles can then be filtered out of newsfeeds, as they constitute essentially noise for many tasks (e.g. event extraction). Kim and Allan described a new algorithm for controversy detection (a topic we are also working on at the moment); their method is based on the Expectation Maximization (EM) algorithm and is therefore unsupervised, whereas our own approach uses supervised machine learning (more on that in the near future).
In recommender systems there often isn’t a lot of negative evidence (people tend to log what was clicked, not what was NOT clicked), which creates a problem of missing negative training data for downstream machine learning. An unsatisfactory but common fix is to assume that whatever wasn’t clicked wasn’t relevant, which is often untrue. (Comment from the Editor – my experience in enterprise search is that highly relevant documents are not clicked because the user already has them in their personal collection.) Khawar and Zhang introduce a method, “Conformative Filtering”, which mitigates this somewhat by moving from the notion of “unclicked by one user” to “unclicked by most/all users”.
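To make the intuition concrete, here is a minimal, hypothetical sketch of deriving negative labels only from items that (nearly) every user in a group of similar users left unclicked, rather than from any single user’s unclicked items; the grouping, threshold and function names are illustrative assumptions, not the authors’ actual algorithm:

```python
from collections import defaultdict

def derive_group_negatives(clicks, user_groups, catalogue, min_unclicked_frac=1.0):
    """Illustrative sketch only: mark an item as a negative example for a group
    of similar users when (almost) all of them left it unclicked, instead of
    treating every individually unclicked item as irrelevant.
    clicks: user -> set of clicked items; user_groups: lists of similar users
    (e.g. from clustering); catalogue: all items. Thresholds are assumptions."""
    negatives = defaultdict(set)
    for group in user_groups:
        for item in catalogue:
            unclicked = sum(1 for u in group if item not in clicks[u])
            if unclicked / len(group) >= min_unclicked_frac:
                for u in group:
                    negatives[u].add(item)  # shared negative signal for the whole group
    return negatives

# Toy example: item "d" is unclicked by every user in the group and so becomes a
# negative for both users; "b" and "c" (each unclicked by only one user) do not.
clicks = {"u1": {"a", "b"}, "u2": {"a", "c"}}
print(derive_group_negatives(clicks, [["u1", "u2"]], catalogue={"a", "b", "c", "d"}))
```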
The paper by Almquist and Jatowt on predicting content expiry was one of my personal favorites this year: they analyze documents in order to determine their temporal relevance, in other words, “for how long is this sentence/document relevant?”. Clearly, a document about an IPO next week has a temporal relevance of perhaps 1-2 weeks, whereas a document about a concert last year is no longer relevant at all in most search situations.
Almasian, Spitz and Gertz presented a method for constructing embeddings for entity-annotated texts. Such methods are pivotal for better comparisons between documents as well as for better natural-language search over knowledge graphs.
Kurisinkel, Zhang and Varma described a neural sentence compression technique, which can be used for summarization, and which they successfully evaluated on two corpora, a Google News corpus and a subset of the British National Corpus (a representative sample of British English).
Notably, Vakulenko et al. presented “QRFA: A Data-Driven Model of Information-Seeking Dialogues”, an annotation schema and theoretical model of information-seeking dialogues.
One of the most insightful papers was the one by Gupta et al., who measured the correlation between 23 popular evaluation metrics over 8 TREC test collections.
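As a rough illustration of how such a metric-correlation study is commonly set up (not necessarily the exact procedure of Gupta et al.), one can score every system under two metrics on the same collection and compute a rank correlation such as Kendall’s tau between the induced system orderings; the system names and scores below are made-up placeholders:

```python
from scipy.stats import kendalltau

# Hypothetical per-system scores under two evaluation metrics on one test collection.
# In a real study these would come from trec_eval runs; the values here are made up.
map_scores  = {"sysA": 0.31, "sysB": 0.27, "sysC": 0.35, "sysD": 0.22}
ndcg_scores = {"sysA": 0.48, "sysB": 0.45, "sysC": 0.52, "sysD": 0.40}

systems = sorted(map_scores)
tau, p_value = kendalltau([map_scores[s] for s in systems],
                          [ndcg_scores[s] for s in systems])
print(f"Kendall's tau between MAP and nDCG system rankings: {tau:.2f} (p = {p_value:.3f})")
```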
Several works addressed finding information in technical content collections, such as scientific papers: retrieving figures (Kuzi and Zhai) and retrieving similar mathematical formulas (Davila et al.) – clearly, the trend is to go beyond document-level search and to break documents down into elements of individual value or relevance. Stork et al. presented a fascinating talk about digitization efforts at the Smithsonian, where collections of animal and plant specimens need to be made accessible together with the handwritten descriptions that accompany them. Scanning and searching handwriting is of course a harder challenge, in particular in the absence of training data and given the individual styles of the many writers.
Vo and Bagheri presented a method for enriching knowledge graphs with the underlying temporal ordering of events.
Donnelly and Roegiest, in an interpretability paper, refuted a claim by others that a single “sentiment neuron” can be identified in an LSTM-based sentiment classifier. There is a broader message here: interpretability research requires careful methodology.
Alshomary et al. study Wikipedia text reuse on the Web and estimate that verbatim or modified copies of Wikipedia generate around USD 5.5m in monthly ad revenue.
Bi, Ai and Croft’s new iterative relevance feedback method outperforms Rocchio (a seminal method) on an answer passage retrieval task.
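For readers less familiar with the baseline, the classical Rocchio update moves the query vector towards the centroid of the known relevant documents and away from the centroid of the non-relevant ones. Below is a minimal sketch of that textbook formulation; the weights are conventional defaults, not values from the paper:

```python
import numpy as np

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Classical Rocchio relevance feedback:
    q' = alpha*q + beta*centroid(relevant) - gamma*centroid(non-relevant).
    The weights are conventional textbook defaults, not values from the paper."""
    q_new = alpha * query
    if len(relevant):
        q_new = q_new + beta * np.mean(relevant, axis=0)
    if len(non_relevant):
        q_new = q_new - gamma * np.mean(non_relevant, axis=0)
    return np.maximum(q_new, 0.0)  # negative term weights are usually clipped to zero

# Toy term-vector example over a three-term vocabulary
q = np.array([1.0, 0.0, 0.0])
rel = np.array([[0.8, 0.6, 0.0], [0.9, 0.4, 0.1]])
nonrel = np.array([[0.0, 0.1, 0.9]])
print(rocchio(q, rel, nonrel))  # expanded query vector after one feedback round
```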
On the mobile side, Wicaksono, Moffat and Zobel’s paper on Modeling User Actions in Job Search nicely shows how to do user modeling in a mobile app that has search at its core, in this case job search. It can easily serve as a template, or at least an inspiration, for other search tasks (van Dijk et al. is another paper looking at sessions in search systems).
The Non-Technical
The conference was rounded off by a social dinner, a city tour (including a tour of the iconic Cologne cathedral) and a volunteer experiment with attendees to gather anonymous data about “who talks to whom” at the conference. We suspect next year’s ECIR will feature a social-network-style paper about that dataset, which will be made openly available.
Next year, ECIR 2020 will be held in Lisbon; we hope to see you there!