Towards Search Standardisation

The EU-funded COST Network IC1002 (http://www.mumia-network.eu/) is a four-year (2010-2014) networking programme that aims to promote collaboration between researchers and professionals working on Multilingual and Multifaceted Information Access (MUMIA), principally in Information Retrieval, Machine Translation and related topics. More than 250 scientists and professionals from 28 COST countries and 4 non-COST countries have participated in the Action's activities. One of the areas in which the network is active is the development of standards for search systems. This activity was motivated mainly by earlier work and discussions within the network on integrating various Information Retrieval and Natural Language Processing (IR/NLP) technologies. In early 2013, two internal Working Group meetings and a Workshop (at ECIR in Moscow) were organised to discuss the problems of integrating IR/NLP tools in professional search systems.

The discussions at these meetings and at the workshop revealed the need for standards and protocols, and led to the organisation of another Working Group meeting dedicated to this challenge. This first meeting on the topic was held in Thessaloniki, Greece, on 19 November 2013. The meeting had the following objectives:

Bring together different stakeholders to discuss and prioritize the needs and challenges for developing standards and protocols in the domain of search technologies.
Explore the best ways to launch a standardisation initiative for creating standards and protocols for integrating IR/NLP search tools and technologies.

It is expected that suitable standards will make it easier for researchers and practitioners to exchange software and to adapt functionality from one system to another when building applications. At that meeting, two kinds of standard were identified: component-based standards, which rely on an extensive decomposition of search systems and focus on APIs and similar interfaces; and conceptual standards, which focus on defining concepts and data structures to facilitate "standard-enabled" exchange of information between search/NLP tools, as well as on coordination architectures and overall capabilities. Several invited participants with experience of getting standards approved and adopted gave the active MUMIA members a great deal of useful information about standards bodies, how they operate, and how standardisation works in practice. Topics discussed included:

Scope and objectives of a potential standard in this area.
Are we seeking system-level or conceptual standardisation? Functional decomposition: which components, interfaces or APIs lend themselves to standardisation?
Which standards exist that we would need to consider?
IR/NLP technologies that should be prioritised first when developing such technical standards and protocols.
Domains in which standards could usefully be adopted (e.g. web, patent, medical and bibliographic search) and how they would benefit from them.
Experience of using other standards in search system development (e.g. the OpenSearch protocol).

A follow-up meeting was held in Amsterdam, the Netherlands, on 11 and 12 April 2014, co-located with ECIR 2014.

The Amsterdam meeting had 35 participants from 20 countries, including Peter Mika from Yahoo!, David Fisher from the Lemur project, and Iadh Ounis and Craig Macdonald from the Terrier project. We also tried to involve participants from Bing, Google, Yandex, Elasticsearch and the Lucene/Solr community; unfortunately these efforts were unsuccessful, although some of those approached expressed significant interest in the standards activity.

The meeting was run mainly according to the World Café model. We began with some introductory talks covering the conclusions of the previous meeting, experience of developing search systems, and an introduction to the World Café process. We then moved on to a whole-group brainstorming session in which we first identified a long list of candidate topics and then pruned and merged it into five specific topics on which to focus for the rest of the meeting. Those topics were:

Content Representation & Text Processing
Indexing
Input-Output and Adaptability
Retrieval
Low Hanging Fruits

(Please bear with us if these topic labels are a little obscure; they were more meaningful in the context of the meeting!)

Each topic was assigned to a discussion table with a host who, on this occasion, also acted as recorder. Iadh Ounis (assisted by Craig Macdonald), Mike Salampasis, Fernando Loizides, Parth Gupta and David Fisher kindly volunteered to host the five tables, one per topic in the order listed above. Participants switched tables every 20 minutes, spending 20 minutes discussing each topic, while the table hosts summarised the conclusions of earlier rounds at their table. People were encouraged to move fairly randomly between tables rather than stay together as a group. In this way everyone was able to contribute to every topic and to help identify the key issues and inhibitors for each. The first day concluded with a brief summary of the common themes and conclusions that had emerged at each table.

Each table host wrote up a short report on their topic overnight, and the early part of the second day was spent finalising these reports with the participants. We plan to develop these short reports further into a more comprehensive paper on the outcomes of the two Working Group meetings. The standards meeting concluded mid-morning with a plenary session on next steps and actions, which reaffirmed the decision taken in Thessaloniki to produce a "white paper" on standards and protocols for search systems. The white paper will:

Describe best practices that could be adopted by stakeholders in the search industry and in academia who want to increase the reusability and interoperability of their tools.
Propose recommendations for a future formal standardisation activity.

A working draft of the white paper and a programme for its development will appear on the MUMIA website (http://www.mumia-network.eu/) by June 2014.

Acknowledgement: This piece was co-authored by John Tait.