{"id":4236,"date":"2016-04-26T11:46:40","date_gmt":"2016-04-26T10:46:40","guid":{"rendered":"https:\/\/irsg.bcs.org\/informer\/?p=4236"},"modified":"2016-04-26T11:46:40","modified_gmt":"2016-04-26T10:46:40","slug":"first-international-workshop-on-recent-trends-in-news-information-retrieval-newsir16","status":"publish","type":"post","link":"https:\/\/archive-irsg.bcs.org\/informer\/?p=4236","title":{"rendered":"First International Workshop on Recent Trends in News Information Retrieval (NewsIR\u201916)"},"content":{"rendered":"<figure id=\"attachment_4423\" aria-describedby=\"caption-attachment-4423\" style=\"width: 200px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-4423   \" title=\"NewsIR_Tafel\" src=\"https:\/\/irsg.bcs.org\/informer\/wp-content\/uploads\/NewsIR_Tafel.jpg\" alt=\"\" width=\"200\" height=\"300\" \/><figcaption id=\"caption-attachment-4423\" class=\"wp-caption-text\">News + IR = NewsIR&#39;16 (demonstrated by our distinguished keynote speaker Jochen Leidner)<\/figcaption><\/figure>\n<p>The <em>First International Workshop on Recent Trends in News Information Retrieval<\/em> (<a title=\"Recent Trends in News Information Retrieval (NewsIR\u201916)\" href=\"http:\/\/research.signalmedia.co\/newsir16\/index.html\" target=\"_blank\" rel=\"noopener\">NewsIR\u201916<\/a>) was held\u00a0as part of\u00a0the <a title=\"ECIR 2016 - 38th European Conference on Information Retrieval\" href=\"http:\/\/ecir2016.dei.unipd.it\/\" target=\"_blank\" rel=\"noopener\">ECIR 2016<\/a> conference in Padua (Italy) on 20 March 2016. The workshop provided an opportunity for a diverse group of stakeholders\u2014researchers, professionals and practitioners\u2014involved in news-related information to come together and discuss the latest and most powerful uses of IR technology applied to news sources. 
Interest in the subject, combined with appropriate management and good advertisement, resulted in a fairly successful turn-out: more than 40 attendees were present in the room at all times and for all sessions. To take full advantage of the wide range of perspectives brought by the participants, the presentations were brief and allowed time for questions. Additionally, there were plenty of opportunities for networking during the coffee and lunch breaks, poster and break-out sessions, and the welcome reception at the end of the day.<\/p>\n<p>&nbsp;<\/p>\n<p><!--more--><\/p>\n<h2>1\u00a0\u00a0 The Origins of NewsIR\u201916<\/h2>\n<p>The use of news collections to support IR research has been a common practice for several years. Indeed, the <a title=\"Reuters-21578 Text Categorization Collection Data Set\" href=\"https:\/\/archive.ics.uci.edu\/ml\/datasets\/Reuters-21578+Text+Categorization+Collection\" target=\"_blank\" rel=\"noopener\">Reuters-21578<\/a> collection, which contained Reuters newswire material, was compiled in 1987. Other collections employed for IR research, such as those in some tracks of the <em>Text REtrieval Conference<\/em> (<a title=\"Text REtrieval Conference (TREC)\" href=\"http:\/\/trec.nist.gov\/\" target=\"_blank\" rel=\"noopener\">TREC<\/a>) and the <em><a title=\"CLEF Initiative (Conference and Labs of the Evaluation Forum)\" href=\"http:\/\/www.clef-initiative.eu\/\" target=\"_blank\" rel=\"noopener\">CLEF Initiative<\/a><\/em>, also comprise feeds from news broadcasters and other news-related data. However, it would be a mistake to assume that all problems associated with news search have been successfully \u201csolved\u201d. 
Moreover, mainstream media outlets are often among the most prominent sources of information\u2014ranging from the influence that newspapers have on elections to the damage to a brand\u2019s reputation that a negative article on a popular blog can cause.<\/p>\n<p>Realising the importance of news datasets for the IR community, a group of participants who attended <a title=\"Conference Review: ECIR 2015\" href=\"https:\/\/irsg.bcs.org\/informer\/2015\/04\/conference-review-ecir-2015\/\" target=\"_blank\" rel=\"noopener\">ECIR 2015<\/a> in Vienna started to talk about the use of news articles as input to various tasks, ranging from news recommendation to temporal summarisation or real-time clustering. It emerged that there was a good number of researchers interested in news-related IR. Yet, it seemed that the available collections for this kind of research were outdated and usually represented biased versions of the news\u2014because they did not contain enough articles, or only a few sources were considered. Additionally, in some cases the data is cleaned and filtered, as opposed to the noisy entries found in the real world, which creates an unrealistic environment for evaluation. 
Consequently, <a title=\"Miguel Martinez\" href=\"https:\/\/twitter.com\/miguelmalvarez?lang=en-gb\" target=\"_blank\" rel=\"noopener\">Miguel Martinez-Alvarez<\/a>\u2014Co-Founder and Head of Research at <a title=\"Signal | Media Monitoring and Market Intelligence\" href=\"http:\/\/signal.uk.com\/\" target=\"_blank\" rel=\"noopener\">Signal Media Ltd<\/a>\u2014set up a <a title=\"News-IR\" href=\"https:\/\/groups.google.com\/forum\/m\/#!forum\/news-ir\" target=\"_blank\" rel=\"noopener\">Google group<\/a> to continue the discussion on news-related IR after the end of <a title=\"Conference Review: ECIR 2015\" href=\"https:\/\/irsg.bcs.org\/informer\/2015\/04\/conference-review-ecir-2015\/\" target=\"_blank\" rel=\"noopener\">ECIR 2015<\/a>.<\/p>\n<p>Initially, 55 people joined the <a title=\"News-IR\" href=\"https:\/\/groups.google.com\/forum\/#!forum\/news-ir\" target=\"_blank\" rel=\"noopener\">Google discussion forum<\/a> and talked about subjects such as news bias, summarisation, clustering, topic classification, entity linking, entity recognition and disambiguation, event detection and social media integration. A couple of questions that permeated the discussion from the very beginning were whether news-related information had been relegated to be \u201cless\u201d important than it really is, and whether it would be worth organising a workshop combining news and IR.<\/p>\n<p>Eventually, a proposal to start up an international workshop was drafted and submitted to <a title=\"ECIR 2016 - 38th European Conference on Information Retrieval\" href=\"http:\/\/ecir2016.dei.unipd.it\/\" target=\"_blank\" rel=\"noopener\">ECIR 2016<\/a>, where it was well received. 
A total of 9 full papers and 3 short papers were selected by the <a title=\"Programme Committee\" href=\"http:\/\/research.signalmedia.co\/newsir16\/organisation.html\" target=\"_blank\" rel=\"noopener\">Programme Committee<\/a> from a total of 19 submissions\u2014each submitted paper was reviewed by at least 3 members of an international reviewing group made up of 30 members. Apart from the selected papers, two keynote speakers joined the workshop: <a title=\"Jochen L. Leidner\" href=\"https:\/\/scholar.google.co.uk\/citations?user=LyyUBIIRI3QJ&amp;hl=en\" target=\"_blank\" rel=\"noopener\">Jochen L. Leidner<\/a> (<a title=\"Thomson Reuters\" href=\"http:\/\/thomsonreuters.com\/en.html\" target=\"_blank\" rel=\"noopener\">Thomson Reuters<\/a>) and <a title=\"Julio Gonzalo\" href=\"https:\/\/scholar.google.com\/citations?user=opFCmpYAAAAJ\" target=\"_blank\" rel=\"noopener\">Julio Gonzalo<\/a> (<a title=\"The National Distance Education University (UNED)\" href=\"http:\/\/portal.uned.es\/portal\/page?_pageid=93,24305391&amp;_dad=portal&amp;_schema=PORTAL\" target=\"_blank\" rel=\"noopener\">National University of Distance Education<\/a>). The NewsIR\u201916 Workshop website is available <a title=\"NewsIR'16 - Homepage\" href=\"http:\/\/research.signalmedia.co\/newsir16\/index.html\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/p>\n<h2>2\u00a0\u00a0 The Signal Media One-Million News Articles Dataset<\/h2>\n<p>To accompany the workshop and facilitate research on news articles, Signal Media released a <a title=\"The Signal Media One-Million News Articles Dataset\" href=\"http:\/\/research.signalmedia.co\/newsir16\/signal-dataset.html\" target=\"_blank\" rel=\"noopener\">dataset<\/a> intended to support and encourage research in the news retrieval community. 
The dataset consists of approximately 1 million news articles from a wide range of sources.<\/p>\n<p>The articles were originally gathered by <a title=\"Moreover Technologies\" href=\"http:\/\/www.moreover.com\/\" target=\"_blank\" rel=\"noopener\">Moreover Technologies<\/a> over a period of one month, 1-30 September 2015. Most of the articles are in English, but there are a few non-English and multi-lingual articles. The sources of the articles include companies such as <a title=\"Thomson Reuters\" href=\"http:\/\/uk.reuters.com\/\" target=\"_blank\" rel=\"noopener\">Reuters<\/a>, as well as local news sources and blogs. There are over 93,000 unique sources. The dataset contains 265,512 blog articles and 734,488 news articles. The average length of an article is 407.75 words.<\/p>\n<h2>3\u00a0\u00a0 Session 1: Media Monitoring<\/h2>\n<p>The first session was chaired by Gabriella Kazai (<a title=\"Lumi.news\" href=\"https:\/\/lumi.news\/join\" target=\"_blank\" rel=\"noopener\">Lumi<\/a>, UK) and included the first keynote, by <a title=\"Jochen L. Leidner\" href=\"https:\/\/scholar.google.co.uk\/citations?user=LyyUBIIRI3QJ&amp;hl=en\" target=\"_blank\" rel=\"noopener\">Jochen L. Leidner<\/a>, Director of Research at <a title=\"Thomson Reuters\" href=\"http:\/\/thomsonreuters.com\/en.html\" target=\"_blank\" rel=\"noopener\">Thomson Reuters<\/a>. Short descriptions of the keynote and papers presented in this session are given below.<\/p>\n<h5>Keynote: Recent Advances in Information Access at Thomson Reuters R&amp;D \u2013 News and Beyond<\/h5>\n<p>By Jochen L. 
Leidner (Corporate Research &amp; Development, <a title=\"Thomson Reuters\" href=\"http:\/\/thomsonreuters.com\/en.html\" target=\"_blank\" rel=\"noopener\">Thomson Reuters<\/a>\u2014London, UK)<\/p>\n<p>Jochen is currently responsible for Research and Development at the Thomson Reuters site in London, UK, where approximately 40 scientists and developers are engaged in research activities: strategic forward-looking research, and applied and contract research (no product development). As pointed out by Jochen, his keynote was largely an \u201cideas\u201d talk, as opposed to a technical talk.<\/p>\n<figure id=\"attachment_4429\" aria-describedby=\"caption-attachment-4429\" style=\"width: 300px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-4429 \" title=\"NewsIR_Jochen\" src=\"https:\/\/irsg.bcs.org\/informer\/wp-content\/uploads\/NewsIR_Jochen-300x200.jpg\" alt=\"\" width=\"300\" height=\"200\" \/><figcaption id=\"caption-attachment-4429\" class=\"wp-caption-text\">Live performance at NewsIR&#39;16<\/figcaption><\/figure>\n<p>Jochen reported briefly on some recent developments made by the Corporate R&amp;D Group at Thomson Reuters. These include a news recommender system called <em><a title=\"All NewsPlus\" href=\"http:\/\/westlawinternational.com\/our-solutions\/all-newsplus\/\" target=\"_blank\" rel=\"noopener\">NewsPlus<\/a><\/em> and a real-time Twitter rumour detection tool for journalists called <em><a title=\"Real-time Rumor Debunking on Twitter\" href=\"http:\/\/dl.acm.org\/citation.cfm?id=2806651\" target=\"_blank\" rel=\"noopener\">REUTERS Tracer<\/a><\/em>. 
From the area of pharma within IP &amp; Science, Jochen talked about <em><a title=\"Computational drug repositioning based on side-effects mined from social media\" href=\"https:\/\/peerj.com\/articles\/cs-46.pdf\" target=\"_blank\" rel=\"noopener\">SoMeDoSEs<\/a><\/em>, a pharmaco-vigilance system that uses Twitter to mine adverse events associated with medical drugs, which is used for drug repositioning. From the area of law, Jochen discussed the advanced search technology that powers the <em><a title=\"Thomson Reuters Westlaw\" href=\"http:\/\/thomsonreuters.com\/en\/products-services\/legal\/large-law-firm-practice-and-management\/westlaw.html\" target=\"_blank\" rel=\"noopener\">Westlaw<\/a><\/em> search engine. Finally, in the area of Finance &amp; Risk, he introduced <a title=\"Risk Management Solutions\" href=\"http:\/\/thomsonreuters.com\/en\/products-services\/risk-management-solutions.html\" target=\"_blank\" rel=\"noopener\">Risk Mining<\/a>, a computer-supported risk register extraction application.<\/p>\n<p>According to Jochen, the future of news-related information retrieval lies in the ability to transform news into actionable intelligence. This is critical for proactively preventing and reacting to future events.<\/p>\n<h5>Boolean Queries for News Monitoring: Suggesting New Query Terms to Expert Users<\/h5>\n<p>By Suzan Verberne (<a title=\"Radboud University\" href=\"http:\/\/www.ru.nl\/english\/\" target=\"_blank\" rel=\"noopener\">Radboud University<\/a>, the Netherlands), Thymen Wabeke (<a title=\"TNO - Innovation for life\" href=\"https:\/\/www.tno.nl\/en\/\" target=\"_blank\" rel=\"noopener\">TNO<\/a>, the Netherlands) and Rianne Kaptein (<a title=\"TNO - Innovation for life\" href=\"https:\/\/www.tno.nl\/en\/\" target=\"_blank\" rel=\"noopener\">TNO<\/a>, the Netherlands).<\/p>\n<p>The paper evaluates query suggestions for Boolean queries in a news monitoring system. 
Users of the system receive news articles that match their queries on a daily basis\u2014but the queries need regular updates, as the news changes continuously.<\/p>\n<p>Suzan Verberne emphasised during the presentation the importance of tasks that are recall oriented\u2014i.e., tasks where missing a single relevant document is not acceptable. One of the traditional ways to address these tasks is by means of long and complex Boolean queries. This research, however, introduces a method to have candidate query terms suggested from retrieved documents.<\/p>\n<p>After presenting experimental results and qualitative feedback obtained through a questionnaire answered by the participants in the experiments, the authors conclude that the use of relevance ranking, instead of Boolean retrieval, and a post-filtering mechanism for removing non-relevant terms, will give better user satisfaction.<\/p>\n<h5>Detecting Attention Dominating Moments Across Media Types<\/h5>\n<p>By Igor Brigadir, Derek Greene and P\u00e1draig Cunningham (<a title=\"Insight Centre for Data Analytics\" href=\"https:\/\/www.insight-centre.org\/\" target=\"_blank\" rel=\"noopener\">Insight Centre for Data Analytics<\/a>, <a title=\"University College Dublin\" href=\"http:\/\/www.ucd.ie\/\" target=\"_blank\" rel=\"noopener\">University College Dublin<\/a>).<\/p>\n<p>This paper focuses on identifying <em>attention dominating moments<\/em> in online media\u2014i.e., moments when everyone seems to be talking about the same issue. To explore attention dominating news stories, three different media sources were studied: mainstream news, blogs and tweets. For the first two sources, the <a title=\"The Signal Media One-Million News Articles Dataset\" href=\"http:\/\/research.signalmedia.co\/newsir16\/signal-dataset.html\" target=\"_blank\" rel=\"noopener\">Signal Media dataset<\/a> was used. 
For the final source, the authors collected a Twitter corpus comprising a large set of newsworthy sources curated by journalists, instead of retrieving tweets based on keywords.<\/p>\n<p>The paper suggests that it might be possible to identify and track major developments with global impact by linking attention dominating moments across multiple sources on different platforms. Social media communities both influence and are influenced by traditional news media\u2014in fact, stories break simultaneously on Twitter and at traditional news publishers.<\/p>\n<h5>Exploiting News to Categorize Tweets: Quantifying the Impact of Different News Collections<\/h5>\n<p>By Marco Pavan, Stefano Mizzaro, Matteo Bernardon and Ivan Scagnetto (<a title=\"The University of Udine\" href=\"http:\/\/web.uniud.it\/didattica\/servizi_studenti\/international_students_service\/university.htm\" target=\"_blank\" rel=\"noopener\">University of Udine<\/a> &#8211; Italy).<\/p>\n<p>The last paper of the first session was also the recipient of the best paper award of the workshop. This paper is part of a longer-term research effort that aims to understand the effectiveness of enriching tweets with information derived from the news, rather than using the whole Web as a knowledge source.<\/p>\n<p>Stefano Mizzaro delivered the presentation and explained how to exploit news articles to enhance tweet categorisation using sets of words extracted from news articles with the same temporal context. Three different features of the news were tested as part of this research: <em>volume<\/em>, <em>variety<\/em> and <em>freshness<\/em>. The experiments confirmed the importance of these three features.<\/p>\n<p>Future work will look at the impact of the number of documents extracted from the news collection to categorise short texts. 
There are also plans to investigate which kinds of news are important to consider and which are marginal.<\/p>\n<h2>4\u00a0\u00a0 Session 2: News Events<\/h2>\n<h5>Semi-Supervised Events Clustering in News Retrieval<\/h5>\n<p>By Jack G. Conrad (<a title=\"Thomson Reuters Labs\" href=\"http:\/\/innovation.thomsonreuters.com\/en\/labs.html\" target=\"_blank\" rel=\"noopener\">Thomson Reuters Corporate Research &amp; Development<\/a>, Minnesota \u2013 USA) and Michael Bender (<a title=\"Thomson Reuters Global Resources\" href=\"http:\/\/blog.thomsonreuters.com\/index.php\/tag\/trgr\/\" target=\"_blank\" rel=\"noopener\">Thomson Reuters Global Resources<\/a>, Switzerland)<\/p>\n<p>The authors introduced a news retrieval system, <em>eventNews<\/em>, which employs an <em>event-centric<\/em> algorithm, thus allowing users to monitor developing stories based on events, rather than by examining an exhaustive list of retrieved documents.<\/p>\n<p>News articles are clustered around an editorially supplied topical label, called a \u201cslugline\u201d. Decisions about merging related documents or clusters are made according to two distinct sources: a digital signature based on the unstructured text in the document, and the presence of named-entity tags assigned by Thomson Reuters\u2019 <a title=\"Thomson Reuters Open Calais\" href=\"http:\/\/www.opencalais.com\/\" target=\"_blank\" rel=\"noopener\">Calais engine<\/a>, a named-entity tagger. 
Human assessments were used to evaluate the system on a 5-point scale, and the average quality achieved was around 80%.<\/p>\n<p>The development of a more robust working model for eventNews is anticipated in the near future, while further work will focus on testing the recall of the system\u2014i.e., how many events are captured and represented from all the possible news events in the dataset or sample.<\/p>\n<h5>Cross-Lingual Trends Detection for Named Entities in News Texts with Dynamic Neural Embedding Models<\/h5>\n<p>By Andrey Kutuzov (<a title=\"The University of Oslo\" href=\"https:\/\/www.uio.no\/english\/\" target=\"_blank\" rel=\"noopener\">University of Oslo<\/a> \u2013 Norway) and Elizaveta Kuzmenko (<a title=\"National Research University Higher School of Economics\" href=\"https:\/\/www.hse.ru\/en\/\" target=\"_blank\" rel=\"noopener\">National Research University Higher School of Economics<\/a> \u2013 Moscow, Russia)<\/p>\n<p>This paper discusses the use of vector space models, particularly \u201cneural embeddings\u201d\u2014prediction-based distributional models\u2014to detect real-world events as manifested in news texts. Temporal shifts of the embeddings might predict specific events.<\/p>\n<p>The models are trained on a large corpus consisting of English and Russian news: the English text is derived from the <a title=\"The Signal Media One-Million News Articles Dataset\" href=\"http:\/\/research.signalmedia.co\/newsir16\/signal-dataset.html\" target=\"_blank\" rel=\"noopener\">Signal Media dataset<\/a>, and the Russian text comes from a corpus of Russian news articles published in September 2015 that includes 500,000 extracts from about 1,000 Russian-language news websites (unfortunately, not publicly available due to copyright restrictions).<\/p>\n<p>Once trained on the \u2018reference\u2019 corpus, the models are successively updated with new textual data from daily news. 
The approach effectively retrieves meaningful temporal trends for named entities regardless of language. Plans to continue this work by experimenting with different algorithms or parameter sets for different languages are already under way, and preliminary tests show promising results.<\/p>\n<h5>Using News Articles for Real-time Cross-Lingual Event Detection and Filtering<\/h5>\n<p>By Gregor Leban, Bla\u017e Fortuna and Marko Grobelnik (<a title=\"Jozef Stefan Institute\" href=\"https:\/\/www.ijs.si\/ijsw\/JSI\" target=\"_blank\" rel=\"noopener\">Jozef Stefan Institute<\/a> \u2013 Ljubljana, Slovenia).<\/p>\n<p>This presentation described a system called <em>Event Registry<\/em>, which can group articles about an event across different languages and extract core event information from them in a structured manner.<\/p>\n<p>The immediate advantage that Event Registry offers to news readers and analysts is a significant reduction in the amount of content that has to be reviewed while gathering the global coverage of a particular event. 
Moreover, since all the event information is structured, Event Registry provides several options for searching and filtering that are not available on existing news aggregators.<\/p>\n<p>According to the presenter, Gregor Leban, traditional news aggregators overwhelm users with duplicate news articles\u2014articles referring to the same event\u2014whereas the approach followed by Event Registry, based on semantic annotation per document, event clustering and the extraction of main event facts, is capable of showing events rather than articles, providing a better alternative and a more general understanding of particular events.<\/p>\n<h5>Exploring a Large News Collection Using Visualisation Tools<\/h5>\n<p>By Tiago Devezas (<a title=\"INESC TEC\" href=\"https:\/\/www.inesctec.pt\/\" target=\"_blank\" rel=\"noopener\">INESC TEC<\/a> and <a title=\"Faculdade de Engenharia da Universidade do Porto\" href=\"https:\/\/sigarra.up.pt\/feup\/pt\/web_page.inicial\" target=\"_blank\" rel=\"noopener\">DEI<\/a> \u2013 Porto, Portugal), Jos\u00e9 Devezas (<a title=\"Faculdade de Engenharia da Universidade do Porto\" href=\"https:\/\/sigarra.up.pt\/feup\/pt\/web_page.inicial\" target=\"_blank\" rel=\"noopener\">DEI<\/a><strong> <\/strong><em> <\/em> \u2013 Porto, Portugal) and S\u00e9rgio Nunes (<a title=\"INESC TEC\" href=\"https:\/\/www.inesctec.pt\/\" target=\"_blank\" rel=\"noopener\">INESC TEC<\/a><strong> <\/strong><em> <\/em> and <a title=\"Faculdade de Engenharia da Universidade do Porto\" href=\"https:\/\/sigarra.up.pt\/feup\/pt\/web_page.inicial\" target=\"_blank\" rel=\"noopener\">DEI<\/a><strong> <\/strong><em> <\/em> \u2013 Porto, Portugal).<\/p>\n<p>The final paper of the session explored the <a title=\"The Signal Media One-Million News Articles Dataset\" href=\"http:\/\/research.signalmedia.co\/newsir16\/signal-dataset.html\" target=\"_blank\" rel=\"noopener\">Signal Media dataset<\/a> using the visualisation tools provided by the <a title=\"MediaViz: An 
Interactive Visualization Platform for Online Media Studies\" href=\"http:\/\/dl.acm.org\/citation.cfm?id=2808474&amp;dl=ACM&amp;coll=DL&amp;CFID=603433401&amp;CFTOKEN=16244081\" target=\"_blank\" rel=\"noopener\">MediaViz<\/a> platform. MediaViz aims to assist in gaining insight from large archives of news through interactive visualisation tools.<\/p>\n<p>The visualisation analysis of the <a title=\"The Signal Media One-Million News Articles Dataset\" href=\"http:\/\/research.signalmedia.co\/newsir16\/signal-dataset.html\" target=\"_blank\" rel=\"noopener\">Signal Media dataset<\/a> revealed the following:<\/p>\n<p>* News and blog sources evaluate the importance of similar events differently, granting them distinct amounts of coverage.<br \/>\n* There are both dissimilarities and overlaps in the publication patterns of the two source types\u2014news and blog sources.<br \/>\n* The content direction and diversity behave differently over time.<\/p>\n<p>More precisely, a <em>keyword<\/em> analysis allowed the researchers to see that news and blog sources granted different levels of importance to a given set of keywords related to major global events that took place in September 2015. A <em>source<\/em> analysis showed that the temporal publication patterns of these two media behaved differently\u2014blogs published a higher percentage of content during the weekend than news sources\u2014though both sources followed an identical curve during a 24-hour cycle. Finally, a <em>diversity<\/em> analysis indicated variations in the dynamics of topical diversity over time.<\/p>\n<h2>5\u00a0\u00a0 Poster Session<\/h2>\n<p>The poster session took place after the second session. 
The accepted posters are listed below:<\/p>\n<figure id=\"attachment_4419\" aria-describedby=\"caption-attachment-4419\" style=\"width: 420px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-4419  \" title=\"NewsIR_Posters\" src=\"https:\/\/irsg.bcs.org\/informer\/wp-content\/uploads\/NewsIR_Posters.jpg\" alt=\"\" width=\"420\" height=\"280\" \/><figcaption id=\"caption-attachment-4419\" class=\"wp-caption-text\">NewsIR&#39;16 Poster Session<\/figcaption><\/figure>\n<ul>\n<li>Temporal Random Indexing: A Tool for Analysing Word Meaning Variations in News<br \/>\nBy Pierpaolo Basile, Annalina Caputo and Giovanni Semeraro (<a title=\"Department of Computer Science\" href=\"http:\/\/www.uniba.it\/ricerca\/dipartimenti\/informatica\" target=\"_blank\" rel=\"noopener\">Department of Computer Science<\/a>, <a title=\"University of Bari Aldo Moro\" href=\"http:\/\/www.uniba.it\/english-version\" target=\"_blank\" rel=\"noopener\">University of Bari Aldo Moro<\/a> \u2013 Italy)<\/li>\n<li>Visualising the Propagation of News on the Web<br \/>\nBy Svitlana Vakulenko (<a title=\"MODUL University Vienna\" href=\"https:\/\/www.modul.ac.at\/\" target=\"_blank\" rel=\"noopener\">MODUL University Vienna<\/a>, Austria), Max G\u00f6bel (<a title=\"Vienna University of Economics and Business\" href=\"https:\/\/www.wu.ac.at\/en\/\" target=\"_blank\" rel=\"noopener\">Vienna University of Economics and Business<\/a>, Austria), Arno Scharl (<a title=\"MODUL University Vienna\" href=\"https:\/\/www.modul.ac.at\/\" target=\"_blank\" rel=\"noopener\">MODUL University Vienna<\/a>, Austria) and Lyndon Nixon (<a title=\"MODUL University Vienna\" href=\"https:\/\/www.modul.ac.at\/\" target=\"_blank\" rel=\"noopener\">MODUL University Vienna<\/a>, Austria).<\/li>\n<li>Comparative Analysis of GDELT Data Using the News Site Contrast System<br \/>\nBy Masaharu Yoshioka (<a title=\"Hokkaido 
University\" href=\"https:\/\/www.oia.hokudai.ac.jp\/\" target=\"_blank\" rel=\"noopener\">Hokkaido University<\/a> \u2013 Japan) and Noriko Kando (<a title=\"National Institute of Informatics\" href=\"http:\/\/www.nii.ac.jp\/en\/\" target=\"_blank\" rel=\"noopener\">National Institute of Informatics<\/a> \u2013 Tokyo, Japan)<\/li>\n<\/ul>\n<p>Some of the papers included in the proceedings of the workshop also had posters, and each poster presented had to be explained and defended by its authors, which encouraged interaction among the participants and greatly enhanced the discussion.<\/p>\n<h2>6\u00a0\u00a0\u00a0 Session 3: Analysis and Visualisation<\/h2>\n<p>The second keynote, delivered by <a title=\"Julio Gonzalo\" href=\"http:\/\/nlp.uned.es\/web-nlp\/component\/content\/article?id=13\" target=\"_blank\" rel=\"noopener\">Julio Gonzalo<\/a>, opened the afternoon sessions. Short descriptions of the keynote and papers presented in this session are listed below.<\/p>\n<h5>Monitoring Reputation in the Wild Online West<\/h5>\n<p>By Julio Gonzalo (<a title=\"National Distance Education University (UNED)\" href=\"http:\/\/portal.uned.es\/portal\/page?_pageid=93,24305391&amp;_dad=portal&amp;_schema=PORTAL\" target=\"_blank\" rel=\"noopener\">National Distance Education University (UNED)<\/a> \u2013 Madrid, Spain)<\/p>\n<p>Julio elaborated on some of his recent work on <em><a title=\"Tweet Stream Summarization for Online Reputation Management\" href=\"http:\/\/link.springer.com\/chapter\/10.1007\/978-3-319-30671-1_28\" target=\"_blank\" rel=\"noopener\">online reputation management<\/a><\/em>, which has already become a key part of public relations (PR) for organisations and individuals. Julio explained how PR companies start by analysing what topics have been mentioned in Twitter, the &#8220;central nervous system for PR companies&#8221;. Afterwards, filtering is applied, focusing on topics that are relevant. 
PR work should be approached with a recall-oriented point of view, where every tweet or news article counts.<\/p>\n<p>Special emphasis was given to the <a title=\"Overview of RepLab 2014: Author Profiling and Reputation Dimensions for Online Reputation Management\" href=\"http:\/\/ceur-ws.org\/Vol-1180\/CLEF2014wn-Rep-AmigoEt2014.pdf\" target=\"_blank\" rel=\"noopener\">RepLab Evaluation Campaign<\/a>. The campaign is co-organised by Julio, who compiled a collection of tweets retrieved in collaboration with the UNED research group. The collection provides over half a million manual annotations by reputation experts (approximately 570,108 annotations) and 208,000 URLs derived from tweets for which manual annotations are available. As in the case of PR companies, RepLab concentrates on Twitter content, because it is the key medium for early detection of potential reputational issues. Nevertheless, online monitoring pervades all media: news, social media, the blogosphere, etc.<\/p>\n<p><em>Polarity for reputation<\/em>, as opposed to sentiment analysis, was illustrated with examples during the keynote. Polarity for reputation attempts to identify statements and opinions that have negative or positive implications for the reputation of a person or company, and it is involved in author profiling, categorisation and ranking.<\/p>\n<p>Finally, Julio referred briefly to his interest in <em><a title=\"E-reputation monitoring on Twitter with active learning automatic annotation\" href=\"https:\/\/hal-univ-avignon.archives-ouvertes.fr\/hal-01002818\/document\" target=\"_blank\" rel=\"noopener\">adaptive learning<\/a><\/em> and systems that implement it as a means to correct potential failures. 
Julio argued that a single failure is not critical for a system, as long as it does not recur: knowledge about the failure can be incorporated to prevent the same error in the future.<\/p>\n<h5>What do a Million News Articles Look like?<\/h5>\n<p>By David Corney, Dyaa Albakour, Miguel Martinez and Samir Moussa (<a title=\"Signal | Media Monitoring &amp; Market Intelligence\" href=\"http:\/\/signal.uk.com\/\" target=\"_blank\" rel=\"noopener\">Signal Media<\/a> \u2013 London, UK)<\/p>\n<p>The final presentation of the workshop was delivered by Dyaa Albakour from <a title=\"Signal | Media Monitoring &amp; Market Intelligence\" href=\"http:\/\/signal.uk.com\/\" target=\"_blank\" rel=\"noopener\">Signal Media<\/a>, who offered a comprehensive description of the <a title=\"The Signal Media One-Million News Articles Dataset\" href=\"http:\/\/research.signalmedia.co\/newsir16\/signal-dataset.html\" target=\"_blank\" rel=\"noopener\">Signal Media dataset<\/a>.<\/p>\n<p>As explained by Dyaa, the <a title=\"The Signal Media One-Million News Articles Dataset\" href=\"http:\/\/research.signalmedia.co\/newsir16\/signal-dataset.html\" target=\"_blank\" rel=\"noopener\">Signal Media dataset<\/a> contains 407,754,159 words, of which 2,003,254 are distinct, and the average number of words per article is 407.75. Just 144 articles in the collection have more than 10,000 words. The longest article, a transcript of a US college football match, has 12,450 words. Other long articles include an instalment of a serialised novel; detailed personal memoirs; and a list of fishing reports from Florida. 
The articles were collected from a variety of news sources, including <a title=\"Reuters - Breaking News, Business News, Financial and ...\" href=\"http:\/\/uk.reuters.com\/\" target=\"_blank\" rel=\"noopener\">Reuters<\/a>, the <em><a title=\"BBC\" href=\"http:\/\/www.bbc.co.uk\/\" target=\"_blank\" rel=\"noopener\">BBC<\/a><\/em> and the <em><a title=\"The New York Times - Breaking News, World News\" href=\"http:\/\/www.nytimes.com\/\" target=\"_blank\" rel=\"noopener\">New York Times<\/a><\/em>, along with many sources that have fewer readers, such as news magazines, blogs, local outlets and specialist publications. The dataset is shared under a <em><a title=\"Creative Commons\" href=\"https:\/\/creativecommons.org\/licenses\/\" target=\"_blank\" rel=\"noopener\">Creative Commons licence<\/a><\/em>, while the copyright of the articles remains with the original publishers.<\/p>\n<p>The <a title=\"The Signal Media One-Million News Articles Dataset\" href=\"http:\/\/research.signalmedia.co\/newsir16\/signal-dataset.html\" target=\"_blank\" rel=\"noopener\">Signal Media dataset<\/a> is text-only, and does not contain links to the original articles. This is partly due to issues around image licensing, and partly to avoid storing links that might eventually become obsolete. An open-source repository on <a title=\"Signal-1M-Tools\" href=\"https:\/\/github.com\/SignalMedia\/Signal-1M-Tools\" target=\"_blank\" rel=\"noopener\">GitHub<\/a> has been created to host useful tools and scripts for processing the dataset. 
The repository provides scripts to index the data with <a title=\"ElasticSearch\" href=\"https:\/\/www.elastic.co\/\" target=\"_blank\" rel=\"noopener\">ElasticSearch<\/a> and convert it to the <a title=\"The TREC Conference series\" href=\"http:\/\/trec.nist.gov\/\" target=\"_blank\" rel=\"noopener\">TREC<\/a> format for compatibility with other IR tools.<\/p>\n<h2>7\u00a0\u00a0 Break-out Session<\/h2>\n<p>At the end of the third session, the audience was divided into three groups to discuss, separately, the challenges that the news IR community faces, the data that would be useful for continuing the current research, and the tasks that the community should focus on in the short and long term. Then, the entire audience reconvened and a representative of each group presented the outcomes.<\/p>\n<figure id=\"attachment_4421\" aria-describedby=\"caption-attachment-4421\" style=\"width: 300px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-4421   \" title=\"NewsIR_Breakout\" src=\"https:\/\/irsg.bcs.org\/informer\/wp-content\/uploads\/NewsIR_Breakout.jpg\" alt=\"\" width=\"300\" height=\"200\" \/><figcaption id=\"caption-attachment-4421\" class=\"wp-caption-text\">Breakout session<\/figcaption><\/figure>\n<p>Generally, the participants expressed their interest in extending the time period covered by the <a title=\"The Signal Media One-Million News Articles Dataset\" href=\"http:\/\/research.signalmedia.co\/newsir16\/signal-dataset.html\" target=\"_blank\" rel=\"noopener\">Signal Media dataset<\/a>, as its coverage of a single month\u2014September 2015\u2014makes it unsuitable for temporal analysis, where longer time spans are indispensable. 
The participants also recommended integrating multimedia content and multilingual sources into the <a title=\"The Signal Media One-Million News Articles Dataset\" href=\"http:\/\/research.signalmedia.co\/newsir16\/signal-dataset.html\" target=\"_blank\" rel=\"noopener\">Signal Media dataset<\/a>, though these features were outside the scope of the original plan. Talks have already taken place with Igor Brigadir from the <a title=\"Insight Centre for Data Analytics\" href=\"https:\/\/www.insight-centre.org\/\" target=\"_blank\" rel=\"noopener\">Insight Centre for Data Analytics<\/a> to combine a Twitter dataset with the existing news articles over the same time period, producing a unified collection of news, blogs and tweets. Arguably, this would be a very useful dataset for future work.<\/p>\n<p>Another issue that came up in the discussion was the verification of news, which includes fact checking, controversy detection and determination of news bias. These were suggested as possible tasks for next year&#8217;s workshop.<\/p>\n<h2>8\u00a0\u00a0 Panel and Closing Remarks<\/h2>\n<p>A panel composed of Gabriella Kazai, Stefano Mizzaro, Jochen Leidner and Julio Gonzalo addressed the final questions of the audience and offered their final thoughts at the end of the day.<\/p>\n<figure id=\"attachment_4420\" aria-describedby=\"caption-attachment-4420\" style=\"width: 600px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-4420  \" title=\"NewsIR_Panel\" src=\"https:\/\/irsg.bcs.org\/informer\/wp-content\/uploads\/NewsIR_Panel.jpg\" alt=\"\" width=\"600\" height=\"400\" \/><figcaption id=\"caption-attachment-4420\" class=\"wp-caption-text\">&quot;Panel and wine - all will be fine&quot; (old German proverb)<\/figcaption><\/figure>\n<p>Evaluation was the first topic addressed by the panel. 
Stefano and Julio reminded us of the multiple evaluation metrics already devised for IR systems\u2014precision and recall are not the only ones. Since each metric says something different about the data, we may not need new metrics, but rather to combine the existing ones correctly to provide better explanations. In this context, Jochen proposed involving journalists in future NewsIR events, as their input would offer valuable insight that we may otherwise miss.<\/p>\n<p>The interconnection amongst the different elements of online media\u2014blogs, news websites, Twitter and social media in general\u2014also sparked discussion during the panel session. The channels where news is published at present are so closely linked that some bloggers have as much influence as major newspapers. The panel agreed that there are clear dependencies between news, blogs and social media: a tweet might cause someone to write a blog post that in turn will cause someone else to write a short piece in a local newspaper, which, later on, will be picked up by a worldwide publication. As Jochen explained, it is therefore critical to ascertain the quality of news material and to have mechanisms that separate real news from pseudo-news, rumours and <em><a title=\"Clickbait - Wikipedia\" href=\"https:\/\/en.wikipedia.org\/wiki\/Clickbait\" target=\"_blank\" rel=\"noopener\">clickbait<\/a><\/em>. In this sense, the explosion of new sources has increased the difficulty of determining legitimate or trustworthy sources. Moreover, defining what trustworthy means is becoming harder, as single events can be seen from several points of view, some of which might be contradictory.<\/p>\n<p>Finally, Gabriella commented on the business model that should be adopted to protect news publishers. 
Since people have become content creators, who should receive the revenue from advertising in the future, and who should be responsible for the distribution of news?<\/p>\n<p>Anyway&#8230; the discussion continues online. Plans for the next event, and comments on all the topics of interest for the news IR community\u2014summarisation, clustering, topic classification, duplicate identification, entity recognition and disambiguation, event detection and social media integration\u2014are part of the regular chatter at the <a title=\"NewsIR Google discussion forum\" href=\"https:\/\/groups.google.com\/forum\/#!forum\/news-ir\" target=\"_blank\" rel=\"noopener\">Google discussion forum<\/a>.<\/p>\n<h2>Acknowledgements<\/h2>\n<p>This article was written by Marco Palomino and Ayse G\u00f6ker. Pictures were kindly provided by Udo Kruschwitz.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The First International Workshop on Recent Trends in News Information Retrieval (NewsIR\u201916) was held\u00a0as part of\u00a0the ECIR 2016 conference in Padua (Italy) on 20 March 2016. 
The workshop provided an opportunity for a diverse group of stakeholders\u2014researchers, professionals and practitioners\u2014involved in news-related information to come together and discuss the latest and most powerful uses of&hellip; <a class=\"more-link\" href=\"https:\/\/archive-irsg.bcs.org\/informer\/?p=4236\">Continue reading <span class=\"screen-reader-text\">First International Workshop on Recent Trends in News Information Retrieval (NewsIR\u201916)<\/span><\/a><\/p>\n","protected":false},"author":48,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[195,210],"tags":[],"class_list":["post-4236","post","type-post","status-publish","format-standard","hentry","category-conference-review","category-spring-2016","entry"],"_links":{"self":[{"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=\/wp\/v2\/posts\/4236","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=\/wp\/v2\/users\/48"}],"replies":[{"embeddable":true,"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4236"}],"version-history":[{"count":0,"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=\/wp\/v2\/posts\/4236\/revisions"}],"wp:attachment":[{"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4236"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4236"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=%2Fwp%2Fv2
%2Ftags&post=4236"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}