Search Solutions 2019 – conference report

It is not easy to sit through a day of presentations on a wide range of topics and at the same time make good enough notes to prepare this detailed account of the meeting. I am grateful to Alberto Purpura for writing this excellent account.

Search Solutions is the premier UK forum for the presentation of the latest innovations in search and information retrieval. This year’s event featured many interesting talks covering challenges across different Information Retrieval (IR) areas: from how Neural IR models can benefit job search, to how search engines can help in the healthcare domain.

In the first session, Andreas Kaltenbrunner (NTENT) and Henning Rode (Textkernel) discussed the challenges in the semantic mobile search and job search domains, respectively.

Andreas Kaltenbrunner, NTENT, “Semantic Mobile Search”.

NTENT offers a customizable search platform which integrates easily with any system to deliver and monetize the most relevant content based on user intent.

Andreas gave us an overview of their Natural Language Processing (NLP) pipeline, from information extraction to document ranking. He showed how NTENT combines language-dependent (lexicons) and language-independent (ontologies) resources with linguistic rules and AI-powered tools to extract information from documents. He then gave us an overview of how, after understanding a user’s search intent, they select specialized sources – called “Experts”, such as Yelp – to answer the query.

Henning Rode, Textkernel, “Combining Deep Learning Matching and Traditional Keyword Search for Job Search”.

Textkernel is a platform for job search offering high-quality multilingual CV parsing and advanced semantic search tools.

Henning started by presenting an overview of their information extraction pipeline, highlighting the challenges of CV parsing and the limitations of traditional lexical search models. He then described Textkernel’s research efforts towards an all-neural information retrieval pipeline, moving from Doc2Vec to Convolutional Neural Network (CNN) based models. Finally, he covered some hybrid approaches which use neural models only for reranking, or combine the traditional features of lexical models with neural approaches.
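Hybrid pipelines of this kind typically retrieve candidates with a cheap lexical scorer and then rerank them with embedding similarity. The sketch below is purely illustrative – toy scores, toy two-dimensional vectors, and a hypothetical `alpha` mixing weight, not Textkernel’s actual system:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def hybrid_rerank(candidates, query_vec, alpha=0.5):
    """Rerank lexically retrieved candidates by mixing the lexical
    score with embedding similarity; alpha weights the lexical side."""
    scored = [
        (doc_id, alpha * lex_score + (1 - alpha) * cosine(query_vec, doc_vec))
        for doc_id, lex_score, doc_vec in candidates
    ]
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Toy example: CV "B" has a lower lexical score than "A",
# but its embedding is much closer to the query's.
candidates = [
    ("A", 0.9, [1.0, 0.0]),   # (id, normalized lexical score, embedding)
    ("B", 0.6, [0.6, 0.8]),
]
query_vec = [0.6, 0.8]
ranking = hybrid_rerank(candidates, query_vec, alpha=0.3)  # B overtakes A
```

With a low `alpha`, the semantic signal dominates and "B" overtakes "A" despite its weaker keyword match – the kind of trade-off a hybrid system tunes on real relevance data.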

In the second session, Allan Hanbury (contextflow and TU Wien) and Roger Tritton (Cochrane) presented two great examples of how search engines can be successfully employed in the healthcare domain to help doctors and healthcare workers in their diagnoses. Then, Nicolas Fiorini and Pauline Chavallard (Doctrine) presented the challenges of search in the legal domain and how they tackled them.

Allan Hanbury, contextflow and TU Wien, “Search to Support Radiologists: from a research prototype to a product”.

Contextflow is a platform for radiology image search, powered by deep learning, which allows radiologists to save time and capitalize on the large archives of radiology images available in hospitals. The platform allows a radiologist to select a pattern in an image and use it to search for similar ones in an archive of 3D images. Contextflow also gives radiologists access to the diagnoses associated with each search result, supporting their decisions in uncertain situations.

Allan told us the story of how the company started: from an EU-funded academic project (Khresmoi) to a successful startup. He showed us the numerous challenges the academic project had to overcome on its way to becoming a polished product ready for market, and how they benefited from a startup incubator programme in reaching their customers.

Nicolas Fiorini and Pauline Chavallard, Doctrine, “Document understanding as a way to improve search relevance”.

Doctrine is a successful French startup offering legal research and analytics software that lets legal professionals, law firms, companies, and individuals search through the law, manage their cases, and grow their businesses.

Nicolas and Pauline started by giving us an overview of the challenges in legal search, such as the volume and heterogeneity of the data they have to search, and the difficulties arising from the domain specificity of the task. Doctrine offers its customers a familiar experience mimicking the simple interface of Google search, simplifying the complex task of legal search. They showed us how they leveraged the regular structure of some legal documents to extract useful search metadata, and how they developed different solutions for document understanding, such as paragraph classification for less structured data.

Roger Tritton, Cochrane Innovations, “PICO search at Cochrane”.

Cochrane is a large not-for-profit community of researchers, professionals, and others from all around the world, creating high-quality, accessible systematic reviews and other synthesized research evidence. PICO search is a platform for decision-making support in the healthcare domain, based on the content produced by the Cochrane community.

Roger explained to us the details of PICO (Population, Intervention, Comparison, Outcome) search, a strategy specific to the healthcare domain whose acronym summarizes the key factors of a clinical research question. He then showed us how PICO search works in practice and how users can benefit from this new search engine.
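The idea behind PICO search is that a clinical question is decomposed into its four components, each matched against structured annotations on the reviews rather than against free text. A minimal illustration of field-by-field matching – the field names, sample data, and scoring are hypothetical, not Cochrane’s implementation:

```python
def pico_match(review, query):
    """Count how many PICO fields of the query match a review's
    annotations; structured clinical search scores field-by-field
    rather than over one bag of free text."""
    fields = ("population", "intervention", "comparison", "outcome")
    return sum(
        1 for f in fields
        if query.get(f) and query[f].lower() in review.get(f, "").lower()
    )

# Toy annotated reviews (invented for illustration).
reviews = [
    {"id": 1, "population": "adults with type 2 diabetes",
     "intervention": "metformin", "comparison": "placebo",
     "outcome": "HbA1c reduction"},
    {"id": 2, "population": "children with asthma",
     "intervention": "inhaled corticosteroids", "comparison": "placebo",
     "outcome": "exacerbation rate"},
]

# The searcher fills in only the PICO fields they care about.
query = {"population": "type 2 diabetes", "intervention": "metformin"}
best = max(reviews, key=lambda r: pico_match(r, query))
```

Because the query targets specific fields, a review mentioning “placebo” only in its comparison arm cannot spuriously match a population query – the main advantage of PICO-structured search over keyword search.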

In the third session, Benjamin Braasch (Raytion), Dawn Anderson (Bertey), and Tim Crawford and David Lewis (Goldsmiths, University of London) talked about search in the enterprise domain, the relation between Search Engine Optimization and IR, and how to perform Music Information Retrieval, respectively.

Benjamin Braasch, Raytion, “Evolution of Hybrid Search: What’s next”.

Raytion is an internationally operating IT business consultancy that implements state-of-the-art information management and corporate communications solutions.

Benjamin presented the challenges of building and maintaining a cloud-based enterprise search platform that integrates the search solutions offered by companies such as Google and Microsoft. He also discussed other important factors in the enterprise search domain, such as data privacy and security. Finally, he gave us a demo of the search engine offered by Raytion.

Dawn Anderson, Bertey, “Connecting the worlds of IR and SEO”.

Dawn told us the story of how she, coming from the Search Engine Optimization (SEO) field, learned about Information Retrieval (IR), and showed us how closely SEO and IR are related. Indeed, as IR evolves quickly, SEO changes just as fast, keeping up with research advances coming from the NLP and Machine Learning areas.

SEO connects many different fields, from marketing to data science and IR, but she clearly showed us how knowledge from the IR field made her better at SEO. In fact, SEO is a key interface through which businesses interact with search engines, and knowing how the latter work can be of great benefit. Finally, she invited more researchers from the IR community to participate in SEO events.

Tim Crawford and David Lewis, Goldsmiths, University of London, “Searching page-images of early music”.

Tim and David introduced us to the task of Music Information Retrieval (MIR), in particular retrieving written music from historical archives. They explained to us the main techniques for parsing sheet music and representing the information to make it searchable. Finally, Tim and David showed us how their search engine works and how it can even help research in the field of musicology. A demo of their work is available online.
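One common way to make written music searchable is to index sequences of pitch intervals rather than absolute notes, so that transposed copies of the same melody map to the same key. The toy sketch below illustrates that general MIR idea under invented data; it is not the Goldsmiths system itself:

```python
def intervals(pitches):
    """Represent a melody by its successive pitch intervals (in
    semitones), so transposed copies share the same signature."""
    return tuple(b - a for a, b in zip(pitches, pitches[1:]))

def find_melody(index, query_pitches):
    """Look up a melody fragment by its interval signature."""
    return index.get(intervals(query_pitches), [])

# Index two fragments by interval signature (MIDI pitch numbers).
corpus = {
    "Frère Jacques": [60, 62, 64, 60],
    "Ode to Joy":    [64, 64, 65, 67],
}
index = {}
for title, pitches in corpus.items():
    index.setdefault(intervals(pitches), []).append(title)

# A query transposed up a whole tone still matches, because
# [62, 64, 66, 62] has the same intervals as [60, 62, 64, 60].
hits = find_melody(index, [62, 64, 66, 62])
```

Real systems extend this with approximate matching and with the uncertainty introduced by optical music recognition of historical page images, but the interval representation is the core trick that makes melodic search robust to key.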

In the final session, Ryan McDonald (Google) and Matteo Venanzi (Microsoft) talked about the relation between NLP and IR, and presented an approach for the automatic creation of knowledge bases, respectively.

Ryan McDonald, Google, “NLP and IR: How deep learning has bridged the gap”.

Ryan presented a few ways in which NLP can help with IR tasks. He started by showing us how, from the early years of both fields, NLP techniques were employed in IR and vice versa. He then showed us a few examples of how NLP techniques such as BERT were used at Google to improve retrieval quality, and the impact they had on the research community.

Ryan concluded by highlighting how IR and NLP could work in synergy, especially on the tasks of information extraction and presentation to users.

Matteo Venanzi, Microsoft, “Model-Based Knowledge Mining for Enterprise Search”.

Matteo presented Alexandria, an exciting new approach developed by Microsoft for the automatic creation of knowledge bases, based on probabilistic programming. He started by giving us some background on probabilistic programming and showing its potential for knowledge base creation. He then gave us some practical examples of how Alexandria works and how, thanks to probabilistic programming, it can infer new types of knowledge base facts without the need for large sets of labelled training data.
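The core idea of probabilistic knowledge mining is to treat each extracted value as a noisy observation of an unknown true fact and infer the truth by Bayesian updating, rather than training a supervised labeller. The heavily simplified pure-Python illustration below assumes a uniform prior and a single fixed noise rate; Alexandria’s actual model is far richer:

```python
def infer_fact(observations, noise=0.2):
    """Posterior over the true value of a fact given noisy extractions.
    Each observation is assumed correct with probability (1 - noise);
    otherwise it is drawn uniformly from the other candidate values."""
    values = sorted(set(observations))
    k = len(values)
    posterior = {v: 1.0 / k for v in values}          # uniform prior
    for obs in observations:
        for v in values:
            lik = (1 - noise) if v == obs else noise / max(k - 1, 1)
            posterior[v] *= lik                        # Bayes update
        total = sum(posterior.values())
        posterior = {v: p / total for v, p in posterior.items()}
    return posterior

# Three web sources report a company's founding year; two agree.
obs = [1998, 1998, 1989]
post = infer_fact(obs)
best = max(post, key=post.get)  # most probable value, here 1998
```

The appeal of the probabilistic-programming framing is that the same noisy-observation model generalizes to new fact types without new labelled data: only the prior and the noise model need specifying, and inference does the rest.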
