ECIR 2023 Industry Day

The European Conference on Information Retrieval (ECIR) seems to have emerged from the pandemic stronger than ever. This year’s conference, in Dublin, had the highest attendance ever, I’m told, with over 380 in-person and virtual attendees. I’m not a computing academic, so I don’t sit through the academic presentations, but the last day of the conference is termed the Industry Day, and is designed to straddle the divide between the academy and the real world. This year, around 60 stalwarts stayed the course for a very full day of no fewer than 13 presentations. What makes the Industry Day exceptional is the range and number of questions asked by the audience: there is an informal air to the event that, I think, encourages discussion. The questions here were asked not to display the questioner’s knowledge, but to provide a vital reality check. Have you tried this with users? What do you do about fake news? Is there a feedback loop once it goes live?

Some of the presentations inevitably felt rushed, as if the challenge were to squeeze as many slides as possible into the 15-minute limit. Nonetheless, the goal of the day was achieved. Here were practitioners talking about information retrieval and, for the most part, about how it works in practice.

An exemplary session was by Ramsey Haddad of Bloomberg, who described how his team provides not only what he called “dedicated search” (the search functionality on the main Bloomberg platform), but also “search as a service”: search functionality for other divisions within Bloomberg, which today serves hundreds of what he called “tenants” using the Bloomberg version of Apache Solr. There was a useful comparison between the requirements of providing search directly to users and those of providing search tools to internal teams, much as a hosting company hosts content for societies.

Another excellent insight was provided by Paul-Louis Nech of Algolia, who described how image search can be built and combined with text search. His talk was all the more vivid because he quoted real-life statistics as the background to the coding decisions. For example, users expect instant interactions with their search interface: a 0.5-second delay causes a 20% drop in traffic, while on average users pick a movie in 1.8 seconds (I think this explains why the films I choose on Netflix are so bad). He clearly described the benefits of a workflow comprising vectorization of images followed by hashing, and provided links to relevant academic articles for those who wanted to learn more.
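For readers who, like me, want to see roughly what “vectorization followed by hashing” might mean, here is a minimal sketch under stated assumptions. It is not Algolia’s implementation. It assumes each image has already been turned into a vector by some image model (the toy four-number vectors below are hypothetical stand-ins), and it swaps in one generic hashing technique, random-hyperplane hashing (often called SimHash), which turns each vector into a short binary code that can be compared by counting differing bits (Hamming distance). The helper names `make_hyperplanes`, `hash_vector`, `hamming` and `search` are illustrative only.

```python
import random
from typing import Dict, List, Sequence


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Draw n_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_vector(vec: Sequence[float], planes: Sequence[Sequence[float]]) -> int:
    """Encode a vector as an integer: bit i is set when vec lies on the
    non-negative side of hyperplane i."""
    code = 0
    for i, plane in enumerate(planes):
        dot = sum(v * p for v, p in zip(vec, plane))
        if dot >= 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")


def search(query: Sequence[float], index: Dict[str, int],
           planes: Sequence[Sequence[float]], k: int = 2) -> List[str]:
    """Return the k indexed names whose codes are closest to the query's code."""
    q = hash_vector(query, planes)
    ranked = sorted(index, key=lambda name: hamming(q, index[name]))
    return ranked[:k]


# Hypothetical toy "image embeddings"; a real system would get these from a vision model.
embeddings = {
    "red_dress": [0.9, 0.1, 0.0, 0.2],
    "red_skirt": [0.8, 0.2, 0.1, 0.3],
    "blue_car": [-0.7, 0.0, 0.9, -0.4],
}
planes = make_hyperplanes(dim=4, n_bits=16, seed=42)
# Hash every image once, at indexing time.
index = {name: hash_vector(vec, planes) for name, vec in embeddings.items()}

print(search(embeddings["red_dress"], index, planes))
```

With random hyperplanes, the chance that two vectors disagree on any one bit is proportional to the angle between them, so a small Hamming distance roughly tracks high similarity, while comparing short integers is far cheaper than comparing full vectors. That speed trade-off is one plausible reason hashing appeals when, as the talk stressed, users expect results in well under a second.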

Not surprisingly, many of the talks described using neural tools and ChatGPT, together with entity resolution, often linked to a taxonomy: that seems to be the standard approach for search interfaces these days.

If the day had a fault, it was that some of the presentations revealed, almost in passing, that the innovations they described had not been tested in anger, that is, with real users. That, of course, can make a huge difference. Once users interact with a system, the best theory might turn out to be less than realistic. For example, the presentation about an AI-based service for job selection sounded impressive indeed, until you noticed there was no mention of bias, and you remembered some of the cases of bias involving AI in education, such as the one revealed by Cathy O’Neil in her book Weapons of Math Destruction.

As usual, there was a sprinkling of terminology that I will have to look up in my own time: I was no wiser about what a “cluster centroid” is after the presentation than before, and I still don’t know what the Yule-Simon equation does. But if that is the homework required to see what is happening in the world of search, I’m happy to devote some time to it. As one former academic, now working for a search company, said to me in the break, he comes to this event specifically to keep up with what is going on in the industry. I don’t know a better (or more rewarding) way of doing that.
