Workshop Review: 1st Bibliometric-enhanced Information Retrieval (BIR) workshop @ ECIR 2014

Organizers of the workshop: Philipp Mayr, Andrea Scharnhorst, Birger Larsen, Philipp Schaer, Peter Mutschke
Workshop Website: http://www.gesis.org/en/events/events-archive/conferences/ecirworkshop2014/
Workshop proceedings: http://ceur-ws.org/Vol-1143/
Participants: 40

The BIR workshop was also informed by the ongoing COST Action TD1210 KnowEscape.

The 1st Bibliometric-enhanced Information Retrieval (BIR) workshop @ECIR 2014 in Amsterdam (NL) aimed to engage the IR community in a discussion of possible links to bibliometrics and scholarly communication (see the editorial for more detail). Bibliometric techniques are not yet widely used to enhance retrieval processes in digital libraries, although they offer added value for users. For example, recent approaches have shown that alternative ranking methods based on citation analysis can enhance IR.

The BIR workshop was the follow-up to the workshop “Combining Bibliometrics and Information Retrieval” at the International Society for Scientometrics and Informetrics (ISSI) Conference in 2013. The motivation behind both workshops is to bring researchers from IR and bibliometrics together so that they can learn from each other and discuss novel approaches and techniques that can hopefully be implemented in digital libraries.

Auditorium at the BIR workshop

The workshop was a well-attended meeting with six short paper presentations and 40 international participants. In the following, we outline the six papers in the order in which they were presented.

Ever since bibliometric studies enabled the systematic analysis of citations, researchers have debated the meaning of citations. Large-scale citation analysis has revealed meaningful traces of knowledge diffusion in scholarly communication. This does not change the fact that the reasons behind individual references in a text can vary widely. A reference may point to a body of work that is fundamental to the paper's argument, or to related work that the paper complements, continues or disputes. Linguistic analyses of a citation's context (its textual neighborhood) have been conducted to determine the sentiment of citations. The paper by Bertin and Atanassova [http://ceur-ws.org/Vol-1143/paper1.pdf] belongs to this line of studies, which try to further unravel the meaning of citations. The authors analyse word use in the standard sections of articles (Introduction, Methods, Results and Discussion) and reveal interesting distributions of verb use across these sections. They propose to use this work in a future citation classifier, which in the long term might also be implemented in citation-based information retrieval.

Nees Jan van Eck and Ludo Waltman [http://ceur-ws.org/Vol-1143/paper2.pdf] consider the problem of scientific literature search and suggest that citation relations between publications can be a helpful instrument for the systematic retrieval of scientific literature. They introduce a new software tool called CitNetExplorer that can be used for citation-based scientific literature retrieval. To demonstrate its use, they employ the tool to identify publications dealing with the topic of “community detection in networks”. They argue that their approach can be especially helpful when one needs a comprehensive overview of the literature on a certain research topic, for instance when preparing a review article.
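The general idea of citation-based literature retrieval can be illustrated with a small sketch: starting from a few known seed publications, follow citation links in both directions (references and citing papers) to collect candidate publications. This is not CitNetExplorer's implementation; the function name `expand_by_citations`, the dictionary-based citation graph and the `max_steps` parameter are hypothetical choices made for illustration only.

```python
from collections import deque


def expand_by_citations(cites, seeds, max_steps=1):
    """Hypothetical helper: collect publications reachable from the seeds
    within max_steps citation links, following both the references of a
    publication (backward) and the publications citing it (forward).

    cites: publication id -> list of publication ids it references.
    """
    # Build the reverse index: publication -> publications that cite it
    cited_by = {}
    for citing, refs in cites.items():
        for ref in refs:
            cited_by.setdefault(ref, set()).add(citing)

    found = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        pub, depth = frontier.popleft()
        if depth == max_steps:
            continue  # do not expand beyond the allowed number of steps
        neighbours = set(cites.get(pub, ())) | cited_by.get(pub, set())
        for n in neighbours:
            if n not in found:
                found.add(n)
                frontier.append((n, depth + 1))
    return found


# Toy citation graph: C cites A, A cites B, B cites D; E is isolated
cites = {"A": ["B"], "C": ["A"], "B": ["D"], "E": []}
print(sorted(expand_by_citations(cites, {"A"}, max_steps=2)))
```

With one step, only the direct neighbours of the seed are added; increasing `max_steps` widens the net at the cost of pulling in less related work, which is why interactive tools let the user inspect and prune the network instead of expanding it blindly.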

Bibliometrics group

Muhammad Kamran Abbasi and Ingo Frommholz [http://ceur-ws.org/Vol-1143/paper3.pdf] investigate the benefit of combining polyrepresentation with document clustering. The goal is to support the search process with highly ranked polyrepresentative clusters. The principle of polyrepresentation in IR can generally be described as follows: a document's likelihood of relevance increases when multiple representations point to it. Given this, the authors argue that, from a user perspective, it seems more suitable to present clusters of documents relevant to the same representations instead of ranked lists of search results. The proposed approach therefore presents the user first with a ranked list of the documents in the “best” cluster, i.e. the cluster of documents providing the most cognitive overlap of different representations. The authors applied clustering to both information-need-based and document-based polyrepresentation. The evaluation of the model on the iSearch collection shows some potential of the approach to improve retrieval quality, but also some dependency on the number of relevant documents.
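The notion of “cognitive overlap” can be sketched in a few lines: documents retrieved by the same combination of representations are grouped together, and groups backed by more representations are shown first. Note that this is a simple grouping by representation overlap, not the clustering algorithm used in the paper; the function name `overlap_clusters`, the representation names and the score-summing tie-breaker are assumptions for illustration.

```python
from collections import defaultdict


def overlap_clusters(results):
    """Hypothetical helper: group documents by the exact set of
    representations that retrieved them and order the groups so that
    the largest overlap (most representations) comes first.

    results: representation name -> {doc_id: retrieval score}
    """
    reps_of = defaultdict(set)
    for rep, hits in results.items():
        for doc in hits:
            reps_of[doc].add(rep)

    clusters = defaultdict(list)
    for doc, reps in reps_of.items():
        clusters[frozenset(reps)].append(doc)

    def total(doc):
        # Sum of the document's scores across all representations that found it
        return sum(hits[doc] for hits in results.values() if doc in hits)

    ranked = []
    for reps, docs in clusters.items():
        docs.sort(key=lambda d: (-total(d), d))  # best document first
        ranked.append((reps, docs))
    # Larger overlap first; ties broken by the score of each group's best document
    ranked.sort(key=lambda c: (-len(c[0]), -total(c[1][0])))
    return ranked


results = {
    "title": {"d1": 0.9, "d2": 0.5},
    "abstract": {"d1": 0.7, "d3": 0.8},
    "citations": {"d1": 0.4, "d2": 0.6},
}
for reps, docs in overlap_clusters(results):
    print(sorted(reps), docs)
```

Here `d1` is retrieved by all three representations and therefore forms the top group, even though `d3` scores higher than `d1` on the abstract alone.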

Haozhen Zhao and Xiaohua Hu [http://ceur-ws.org/Vol-1143/paper4.pdf] explore how using citation and co-citation information as document prior probabilities of relevance affects retrieval quality. A paper's citation count, its PageRank and its co-citation cluster serve as document priors. The evaluation of the approach on the iSearch collection, however, indicates that document priors based on citation counts, PageRank and co-citation clusters have only a limited effect on retrieval performance. The authors conclude that using document priors in a more query-dependent manner and combining citation features with content features might lead to a greater effect.
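How a document prior enters a ranking can be shown with the usual language-modeling decomposition, where a document is scored by log P(q|d) + log P(d). The sketch below assumes a prior proportional to the citation count plus one (add-one smoothing so that uncited papers are not excluded) and an optional `weight` on the prior; both are illustrative assumptions, not the exact estimators used by Zhao and Hu, and the function name `rerank_with_prior` is hypothetical.

```python
import math


def rerank_with_prior(query_loglik, citations, weight=1.0):
    """Hypothetical helper: rank documents by log P(q|d) + weight * log P(d).

    query_loglik: doc_id -> log P(q|d) from some content-based retrieval model
    citations:    doc_id -> citation count (missing documents count as 0)
    """
    # Prior proportional to (citations + 1), normalized over the candidates;
    # the normalizer is the same for every document, so it does not change the order
    mass = {d: citations.get(d, 0) + 1 for d in query_loglik}
    total = sum(mass.values())
    scores = {
        d: ll + weight * math.log(mass[d] / total)
        for d, ll in query_loglik.items()
    }
    return sorted(scores, key=lambda d: (-scores[d], d))


query_loglik = {"a": -10.0, "b": -10.5, "c": -12.0}
citations = {"b": 50, "c": 0}
print(rerank_with_prior(query_loglik, citations))              # prior applied
print(rerank_with_prior(query_loglik, citations, weight=0.0))  # content only
```

A highly cited document (`b`) can overtake a document with a slightly better content score (`a`). Setting `weight` to 0 recovers the content-only ranking, which makes it easy to measure how much the prior actually changes the results.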

IR group
Mixed bibliometrics and IR groups

Zeljko Carevic and Philipp Schaer [http://ceur-ws.org/Vol-1143/paper5.pdf] examined the iSearch test collection and the citation information it contains. Unlike iSearch, common IR test collections do not include all the information needed for proper evaluations of citation-based rankings. The main goal of this work is to learn about the connection between citation-based and topical relevance rankings and about the suitability of iSearch for this task. The paper is a pretest for this overall research question and analyses the dataset and its suitability for citation analysis. Furthermore, the authors investigated co-citation recommendations based on topically relevant seed documents.
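Two documents are co-cited when they both appear in the reference list of the same paper. A co-citation recommendation from seed documents can therefore be sketched as counting, over all reference lists, how often each other document is cited together with a seed. The counting scheme below (one point per seed co-occurring in a reference list) and the function name `cocitation_recommend` are assumptions for illustration, not necessarily the procedure used by Carevic and Schaer.

```python
from collections import Counter


def cocitation_recommend(reference_lists, seeds, top_k=5):
    """Hypothetical helper: rank documents by how often they are co-cited
    with the seed documents.

    reference_lists: iterable of reference lists, one per citing paper
    seeds:           set of (e.g. topically relevant) seed document ids
    """
    counts = Counter()
    for refs in reference_lists:
        refs = set(refs)
        seeds_here = len(refs & seeds)
        if seeds_here == 0:
            continue  # this paper cites no seed, so it contributes nothing
        # Every non-seed reference is co-cited once with each seed in the list
        for doc in refs - seeds:
            counts[doc] += seeds_here
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


reference_lists = [
    ["s1", "x", "y"],
    ["s1", "s2", "x"],
    ["y", "z"],
    ["s2", "z"],
]
print(cocitation_recommend(reference_lists, {"s1", "s2"}))
```

Document `x` is co-cited three times with the seeds and is recommended first. The third reference list is ignored because it cites no seed, even though it links `y` and `z`.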

Kris Jack, Pablo López-García, Maya Hristakeva and Roman Kern [http://ceur-ws.org/Vol-1143/paper6.pdf] present work on how to increase the number of citations supporting claims in Wikipedia. They analyse the distribution of more than 9 million citations in Wikipedia and find that an explicit marker indicating a needed citation appears more than 400,000 times. To address this, they propose different techniques based on journal productivity (Bradfordizing) and popularity (number of readers in Mendeley) to implement a citation recommender system. The evaluation is carried out on the Mendeley corpus with 100 million documents and 10 topics. Although the paper is only a case study, it clearly shows that a standard keyword-based search engine such as Google Scholar is not sufficient for recommending citations for Wikipedia articles, and that altmetrics such as readership information can improve retrieval and recommendation performance.
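Bradfordizing re-ranks a result set by journal productivity: documents published in the journals that contribute the most hits to the current result set (the “core” journals) are moved to the top. A minimal sketch, assuming each hit is a `(doc_id, journal)` pair and that the original ranking is kept within each frequency level, might look like this; the function name `bradfordize` is hypothetical, and the readership-based popularity ranking from the paper is not covered.

```python
from collections import Counter


def bradfordize(results):
    """Hypothetical helper: re-rank (doc_id, journal) pairs so that documents
    from the journals contributing most hits to this result set come first.
    The original order is kept among documents with the same journal frequency."""
    freq = Counter(journal for _, journal in results)
    # sorted() is stable, so ties keep their original relative order
    return [doc for doc, _ in sorted(results, key=lambda r: -freq[r[1]])]


results = [
    ("d1", "J2"), ("d2", "J1"), ("d3", "J1"),
    ("d4", "J3"), ("d5", "J1"), ("d6", "J2"),
]
print(bradfordize(results))
```

Journal `J1` supplies three of the six hits, so its documents move to the top, followed by `J2` and then `J3`. Relying on the stability of Python's `sorted` avoids an explicit secondary key for the original rank.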

Organizers talk with Ingo

More pictures of the event can be found on the workshop website.