Applied research at the Competence Center Information Retrieval and Machine Learning of DAI Laboratory, TU Berlin

The Distributed Artificial Intelligence (DAI) Laboratory at Technische Universität Berlin, headed by Prof. Dr. Sahin Albayrak, develops solutions for a new generation of systems and services that support everyday life, referred to as “smart services and smart systems”. The institute currently employs over 100 researchers, post-docs, graduate students, and support staff. The main objective of the lab is to provide a bridge between academia and industry, which has led to strong ties with leading multinational companies, research institutes and various SMEs (Berlin is home to a large share of Europe’s ICT start-ups).

To apply research in the field, work at DAI Lab is organised in six competence centers covering diverse domains such as IT Security, Network and Mobility, Information Retrieval & Machine Learning, Agent Core Technologies and Next Generation Services. Joint research efforts are coordinated by the lab’s six Application Centers. Further, DAI Lab provides a fully integrated and interconnected home IP infrastructure (referred to as Living Lab) to develop and evaluate novel technologies in a real environment.

Research on the semantic collection, intelligent processing and extensive analysis of data and information is performed within the Competence Center “Information Retrieval & Machine Learning” (CC IRML). The group currently consists of three postdoctoral researchers, 11 PhD students and 11 student assistants. An overview of our research interests and current projects is given below. Further information is provided in the group’s blog and communicated via Twitter.

Knowledge Acquisition and Data Analysis

CC IRML focuses on the development of methods and tools to enrich textual and multimedia data from different sources. For example, we employ NLP techniques to extract named entities or to detect opinions in text documents, provide semantic annotations and identify semantic relationships between items. Another research direction is the content-based analysis of video, e.g., to detect violence in movie scenes. This research is applied in different domains such as preventive health, ethics-guided video analysis and the analysis of daily news.

GeM – Multi-Lingual Preventive Health Services:

As evidenced by the increasing number of health information portals available online, preventive health services are becoming an increasingly important part of personal care to ensure fitness and health. Although health care providers intend to provide health information services to all their clients, immigrants have been identified as a vulnerable population that benefits less from existing health care systems, since language and cultural barriers prevent them from using existing prevention services. To address this issue, we developed within the GeM project a multi-lingual health prevention service that provides personalised access to professionally created health care content. The system builds heavily on semantic technologies to perform the task of multi-lingual information supply.

Semantic multi-lingual health search system where the query is converted into semantic concepts
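
The idea of converting a query into semantic concepts before searching can be sketched in a few lines. This is only an illustration under stated assumptions, not the GeM implementation: the lexicon, the concept identifiers (C_VACCINATION, C_NUTRITION) and the annotated documents are hypothetical toy data, and a real system would rely on a full medical ontology and proper linguistic analysis rather than whitespace tokenisation.

```python
from typing import Dict, List, Set

# Hypothetical mini-lexicon: surface term (any language) -> concept ID.
LEXICON: Dict[str, str] = {
    "vaccination": "C_VACCINATION",  # English
    "impfung": "C_VACCINATION",      # German
    "aşı": "C_VACCINATION",          # Turkish
    "nutrition": "C_NUTRITION",
    "ernährung": "C_NUTRITION",
    "beslenme": "C_NUTRITION",
}

# Hypothetical documents, annotated with concepts instead of words.
DOCS: Dict[str, Set[str]] = {
    "doc_de_1": {"C_VACCINATION"},
    "doc_en_1": {"C_NUTRITION"},
    "doc_en_2": {"C_VACCINATION", "C_NUTRITION"},
}


def to_concepts(query: str) -> Set[str]:
    """Map every known query term to its language-independent concept."""
    return {LEXICON[t] for t in query.lower().split() if t in LEXICON}


def search(query: str) -> List[str]:
    """Rank documents by the number of query concepts they share."""
    concepts = to_concepts(query)
    scored = [(len(concepts & doc_concepts), doc_id)
              for doc_id, doc_concepts in DOCS.items()]
    hits = [(n, doc_id) for n, doc_id in scored if n > 0]
    # Most shared concepts first; ties broken by document ID.
    return [doc_id for n, doc_id in sorted(hits, key=lambda h: (-h[0], h[1]))]


print(search("impfung"))       # German query finds English and German content
print(search("aşı beslenme"))  # Turkish query with two concepts
```

Because matching happens on concept identifiers, a query in one language retrieves content written in another, which is the property the multi-lingual service depends on.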

INAMET – Analysis of daily news:

In the INAMET project, users are taken on a tour to explore the events of past years. INAMET analyses a large number of German news articles with regard to their topic and time of publication. For example, events such as the tsunami that hit Fukushima and the catastrophic reactor accident in 2011 can be aggregated under the topic “Nuclear energy” and connected to the nuclear phase-out debate in Germany. In this way, INAMET provides the user with a hierarchically structured news overview. Interpreting the news events over time also makes it possible to extract and track opinions and resonance in the media about popular topics, persons or organisations. Further, we focus on extracting direct and indirect quotations from the mass of daily news articles to analyse the opinions they contain. A distributed clustering algorithm sorts the recognised events and quotations into topic areas at different levels of abstraction. These topic areas are then arranged in a dynamic hierarchy and presented to the user. This provides users with an overview of the most relevant news topics of different time periods and of how popular opinions on various topics have developed over time. A demo of the quotation extraction system is available online.

Screenshot of the quotation extraction demo
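
Sorting news items into topic areas at different levels of abstraction can be illustrated with a small sketch. It is not the distributed algorithm used in INAMET, which is not specified here. It assumes a simple single-machine approach: bag-of-words vectors, cosine similarity, and a greedy clustering pass that is run once with a loose threshold (broad topics) and again inside each topic with a stricter threshold (sub-topics). The example headlines and both thresholds are made up for illustration.

```python
import math
from collections import Counter
from typing import List


def vectorise(text: str) -> Counter:
    """Bag-of-words term counts (whitespace tokenisation, lower-cased)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(texts: List[str], threshold: float) -> List[List[int]]:
    """Greedy pass: join the first cluster whose seed is similar enough."""
    vecs = [vectorise(t) for t in texts]
    clusters: List[List[int]] = []
    for i, vec in enumerate(vecs):
        for members in clusters:
            if cosine(vec, vecs[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])  # no similar seed found: start a new cluster
    return clusters


def topic_hierarchy(texts: List[str], coarse: float, fine: float):
    """Two-level hierarchy: broad topics, each split into sub-topics."""
    hierarchy = []
    for topic in cluster(texts, coarse):
        subtopics = cluster([texts[i] for i in topic], fine)
        hierarchy.append([[topic[j] for j in sub] for sub in subtopics])
    return hierarchy


# Toy headlines, invented for this example.
news = [
    "reactor accident fukushima nuclear",
    "nuclear phase-out debate germany",
    "reactor accident fukushima tsunami",
    "football cup final",
]
print(topic_hierarchy(news, coarse=0.2, fine=0.5))
# [[[0, 2], [1]], [[3]]]
```

In the output, the first broad topic groups the three nuclear-related items and splits them into the accident reports and the phase-out debate; the unrelated item forms its own topic.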

VideoSense – Ethics-Guided Video Analytics:

VideoSense is a Network of Excellence within the 7th Framework Programme and aims to integrate leading European research groups to create a long-term open integration in the twin areas of ethically guided and privacy-preserving video analytics. The interaction between new data intelligence technologies and citizens’ norms and expectations is becoming a central issue in video surveillance and deserves in-depth study. VideoSense aims to add value to security needs from the technical perspective as well as from the ethical and regulatory one. In particular, VideoSense conducts joint collaborative research studies focused on the most challenging problems and outstanding issues to be resolved in the multimodal content analysis of surveillance videos.

Within VideoSense, we aim to build a standard framework for multi-modal surveillance video analytics. The framework shall provide solutions to the most challenging problems in video surveillance, including but not limited to:

  • Feature analysis for detecting discriminative properties,
  • Suspicious / interesting object detection (e.g., left object detection in public places), and
  • Suspicious event detection (e.g., two or more people fighting, people loitering).

Research within VideoSense: Human Activity Recognition in Image Sequences

Personalisation, Recommendation and Adaptive Retrieval

To assist users in their information gathering tasks, we concentrate on the development of user modelling techniques that adapt search results by collecting, aggregating and interpreting data about users and their behaviour. Further, we focus on the development and evaluation of recommender systems, which use any available information to predict what is and will be interesting for their users. Current projects include LSR and EPEN.

LSR – Learning Semantic Recommenders:

Within the LSR project, we developed the movie recommender system “SemanticMovie”. The system offers personalised movie recommendations based on semantic relations such as the director, artists involved, genre, etc. Relations are extracted from the well-known Internet Movie Database (IMDb) and mapped to a semantic graph. For each movie in the database, semantically related movies are identified by employing adaptive weighting schemes and different combination strategies. To give users an impression of movies they do not know, recommendations are enriched with movie snippets, trailers and explanations derived from the respective semantic relations. The graphical user interface of the system provides features to manually adjust these weighting schemes or to filter out specific categories or types of movies, thus allowing users to adapt recommendations to their preferences. In addition, SemanticMovie exploits explicit user feedback to learn user preferences and adapt future recommendations. A live demo of the system is available online.
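
How weighted semantic relations can drive recommendations is easiest to see in a small example. The sketch below is an assumption-laden illustration, not the SemanticMovie implementation: it scores two movies by the Jaccard overlap of their values per relation type and combines the relation scores with adjustable weights. The movie titles, relation values and weights are hypothetical.

```python
from typing import Dict, List, Set

Movie = Dict[str, Set[str]]  # relation type -> set of related entities


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of common values; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def semantic_similarity(m1: Movie, m2: Movie, weights: Dict[str, float]) -> float:
    """Weighted average of the per-relation overlaps."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    score = sum(w * jaccard(m1.get(rel, set()), m2.get(rel, set()))
                for rel, w in weights.items())
    return score / total


def recommend(seed: str, catalogue: Dict[str, Movie],
              weights: Dict[str, float], top_n: int = 3) -> List[str]:
    """Movies most similar to the seed under the given relation weights."""
    scored = [(semantic_similarity(catalogue[seed], movie, weights), title)
              for title, movie in catalogue.items() if title != seed]
    return [title for _, title in sorted(scored, key=lambda s: (-s[0], s[1]))][:top_n]


# Hypothetical toy catalogue.
catalogue = {
    "Movie A": {"director": {"d1"}, "genre": {"drama", "crime"}, "cast": {"x", "y"}},
    "Movie B": {"director": {"d1"}, "genre": {"comedy"}, "cast": {"z"}},
    "Movie C": {"director": {"d2"}, "genre": {"drama", "crime"}, "cast": {"x"}},
}

# Emphasising the director favours Movie B; emphasising genre favours Movie C.
print(recommend("Movie A", catalogue, {"director": 3, "genre": 1, "cast": 1}))
print(recommend("Movie A", catalogue, {"director": 1, "genre": 3, "cast": 1}))
```

Changing the weights reorders the results without touching the underlying data, which mirrors the manual adjustment of the weighting scheme offered in the user interface.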

EPEN – A Personalised News Article Recommendation System:

The EPEN project investigates how recommendation algorithms contribute to automatically providing relevant news items to readers. The project focuses on three aspects of recommendation algorithms. First, we develop an evaluation strategy to accurately measure the system’s utility for its users; relying on implicit feedback and dealing with uncertainty about user data are major challenges here. Second, we study strategies for combining individual recommendation algorithms, which make it possible to consider several relevance criteria at once, such as thematic similarity to previously read articles, trends and other users’ tastes. Such strategies require weighting methods, whose optimal parameterisation will be analysed. Third, we examine to what extent preferences observed in one scenario (e.g., articles about sports) can be transferred to other scenarios (e.g., articles about the economy).

Example click-through rates of different domains
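
A weighted combination of individual recommenders, as discussed for EPEN, could look roughly like the sketch below. It is a minimal illustration, not the project’s method: it assumes each recommender returns a score per article, rescales the scores to a common range, and adds them up with one weight per recommender. The recommender names (“content”, “trend”), scores and weights are hypothetical.

```python
from typing import Dict, List

Scores = Dict[str, float]  # article ID -> relevance score


def normalise(scores: Scores) -> Scores:
    """Min-max scale to [0, 1] so different recommenders are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {article: 1.0 for article in scores}
    return {article: (s - lo) / (hi - lo) for article, s in scores.items()}


def combine(recommenders: Dict[str, Scores], weights: Dict[str, float],
            top_n: int = 3) -> List[str]:
    """Rank articles by the weighted sum of their normalised scores."""
    combined: Dict[str, float] = {}
    for name, scores in recommenders.items():
        weight = weights.get(name, 0.0)  # unweighted recommenders are ignored
        for article, score in normalise(scores).items():
            combined[article] = combined.get(article, 0.0) + weight * score
    return sorted(combined, key=lambda a: (-combined[a], a))[:top_n]


# Hypothetical outputs of two recommenders on different scales.
recommenders = {
    "content": {"a1": 0.9, "a2": 0.1, "a3": 0.5},  # similarity to read articles
    "trend": {"a1": 10, "a2": 50, "a3": 30},       # recent click counts
}

print(combine(recommenders, {"content": 0.7, "trend": 0.3}))  # ['a1', 'a3', 'a2']
print(combine(recommenders, {"content": 0.2, "trend": 0.8}))  # ['a2', 'a3', 'a1']
```

The weights are the parameters whose setting would have to be tuned, for instance against observed click-through rates.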

Enterprise and Desktop Search

DAI Lab has its roots in the development of software agent technologies and has built up strong expertise in this field. JIAC, one of the leading agent frameworks, is developed at DAI Lab. We employ the software agent approach to further study and enhance enterprise and desktop search with distributed indices and aggregated search capabilities. Efforts are mainly focused on the development of PIA Enterprise, an enterprise search engine that aims to assist employees in their daily information gathering tasks.

PIA Enterprise:

The PIA Enterprise system provides access to content from multiple sources within an enterprise, such as the intranet, the web, databases, e-mails and user desktops, whilst taking into account privacy and user rights.

The system provides quick access to information and offers personalised continuous information supply to inform users once new content is available.

PIA Enterprise UI

PIA Enterprise, developed in cooperation with the IT-Dienstleistungszentrum Berlin, is currently being rolled out in the Berlin public administration to offer a Berlin-wide search for internal and external documents. A live demo with limited functionality is available online.
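
The combination of aggregated search over several sources with user rights, as described for PIA Enterprise, can be sketched as follows. This is a simplified illustration under stated assumptions and says nothing about the actual agent-based architecture: each source is modelled as an in-memory list, matching is a plain substring test, and access rights are modelled as groups. All names (Doc, allowed_groups, the sources and documents) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_groups: FrozenSet[str]  # groups permitted to see this document


def search_source(docs: List[Doc], query: str) -> List[Doc]:
    """Return documents of one source that contain every query term."""
    terms = query.lower().split()
    return [d for d in docs if all(t in d.text.lower() for t in terms)]


def aggregated_search(sources: Dict[str, List[Doc]], query: str,
                      user_groups: Set[str]) -> List[str]:
    """Query every source and keep only hits the user may access."""
    hits: List[str] = []
    for name in sorted(sources):
        for doc in search_source(sources[name], query):
            if doc.allowed_groups & user_groups:  # rights check per document
                hits.append(f"{name}:{doc.doc_id}")
    return hits


# Hypothetical sources and documents.
sources = {
    "intranet": [Doc("i1", "Travel expense policy", frozenset({"staff"}))],
    "mail": [Doc("m1", "Expense report for March", frozenset({"finance"}))],
}

print(aggregated_search(sources, "expense", {"staff"}))
print(aggregated_search(sources, "expense", {"staff", "finance"}))
```

Filtering by rights at result time means a user never sees a hit from a source they cannot open, even though all sources are queried in one step.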

Pattern Recognition and Machine Learning

New methods of machine learning are explored and developed, e.g., to detect patterns in time series. In the scope of a government-funded project to promote electro-mobility, CC IRML studies consumption patterns of electric car fleets, e.g., to optimise charging processes. Further, we collect and analyse car engine data from various sensors to detect typical driving patterns that help the car manufacturer to optimise its engine development process.

Regioabsicherung:

The project Regioabsicherung is a research cooperation with Volkswagen AG. Within the scope of the project, engine data collected from various sensors is evaluated to detect typical patterns. These patterns will be used to optimise tests that ensure compliance with EU limits. To this end, new machine learning methods are explored and developed that are capable of processing the huge amount of engine data efficiently and intelligently.

Patterns in real car driving data
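
To give a feel for what detecting typical patterns in a sensor time series can mean, the following sketch uses one simple, generic technique; the methods developed in the project are not described here and are likely far more elaborate. The signal is encoded as a string of symbols (U for a rise, D for a drop, F for a flat step) and recurring symbol sequences are counted. The tolerance, pattern length and sample values are assumptions chosen for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def to_symbols(series: Sequence[float], tol: float = 0.5) -> str:
    """Encode each step as U (up), D (down) or F (flat within tol)."""
    symbols = []
    for prev, cur in zip(series, series[1:]):
        delta = cur - prev
        if abs(delta) <= tol:
            symbols.append("F")
        else:
            symbols.append("U" if delta > 0 else "D")
    return "".join(symbols)


def frequent_patterns(series: Sequence[float], length: int = 3,
                      min_count: int = 2, tol: float = 0.5) -> List[Tuple[str, int]]:
    """Symbol sequences of the given length that occur at least min_count times."""
    symbols = to_symbols(series, tol)
    counts = Counter(symbols[i:i + length]
                     for i in range(len(symbols) - length + 1))
    frequent = [(p, c) for p, c in counts.items() if c >= min_count]
    return sorted(frequent, key=lambda pc: (-pc[1], pc[0]))


# Made-up signal: accelerate, hold, brake, repeated twice.
signal = [0, 2, 4, 4, 0, 2, 4, 4, 0]
print(to_symbols(signal))         # UUFDUUFD
print(frequent_patterns(signal))  # [('UFD', 2), ('UUF', 2)]
```

Reducing the raw values to a few symbols makes repeated shapes comparable even when their absolute levels differ, at the cost of discarding magnitude information.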

IPIN – Integrationsplattform Intelligente Netze (Integration Platform for Intelligent Networks):

The IPIN project is part of the government-funded “Schaufenster Elektromobilität” framework programme for the promotion of electromobility. Within IPIN, we develop a data integration and analysis platform that combines the data collected by all projects in this showcase programme and provides a unified view of it. Based on this, we develop an analysis tool that detects patterns in energy usage and forms the basis for further optimisations.

Energy load distribution of different energy consumers

Dissemination in 2012/13

Apart from publishing their research in journals, conferences and workshops, members of CC IRML are and have been involved in the organisation of various conferences (e.g., RecSys’12, MMM’14, ICMR’14), have chaired sessions at international conferences (i.e., at MMM’12, PIM’12, UMAP’12) and co-organised several workshops, including:

In addition, CC IRML frequently showcases its expertise at national and international expos (e.g., CeBIT, CeBIT Asia).