Conference Review: Digital Libraries 2014

The joint conference Digital Libraries 2014 combined the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2014) and the Theory and Practice of Digital Libraries Conference (TPDL 2014). The conference was held from 8th to 12th September 2014 at Milton Court in the Barbican Centre (main conference) and City University London (tutorials, workshops and doctoral consortium). We review here the information retrieval aspects of the conference, focusing on sessions dealing with indexing and search.

Doctoral Consortium, Tutorials

The doctoral consortium and tutorials were held on Monday 8th September. The subjects in the Doctoral Consortium ranged from user interaction to working with digital collections and social factors, with work being done on serious games for information literacy and various digital library frameworks. A particularly relevant talk was given by Hugo Huurdeman, who focused on searching archives with temporal aspects in mind. An open mic session ended the event. Three tutorials were delivered, covering preservation and online scholarly content, with Ed Fox providing a full-day session on Digital Libraries.

Workshops

There were nine workshops, two of which were particularly relevant to the search community: digital libraries for musicology, and knowledge maps and information retrieval.

There was quite a bit of interesting work in the musicology workshop, in both long and short papers. Padilla et al and Saitis et al focused on scanned music and the problem of making optical music recognition usable. Fillon et al described an open source framework for providing access to audio archives. Uyar et al provided information on a Turkish music collection which could be used for music retrieval experiments. Brown et al provided an overview of the Listening Experience Database, which used crowdsourced data to index the content using linked data schemes. Wolff et al described machine learning techniques for classifying music datasets. Bainbridge et al described the use of various open source tools to create access in music digital libraries. Downie et al described the HathiTrust Digital Library, using a bibliometric study to examine the music-related content of 11 million volumes of information. Of the short papers, there were a number of interesting articles, including Luzzi, who focused on the FRBR framework to provide access to music data through linked data and semantic web technologies. Fujinaga et al focused on a UI for searching music scores, whilst Schindler examined the broader aspects of UI for music access. Sordo et al gave an overview of creating corpora for Arab-Andalusian music. Porter and Serra focused on feature extraction from music collections. Weyde et al and Rose & Tuppen argued for larger collections in computational music research, including for music retrieval.

The knowledge maps for information retrieval workshop contained a number of quite focused papers on various aspects of the use of knowledge maps (KM, which visualise the structure of large information spaces) for interactive search. Cribbin looked at methods for citation search. Deng and Hu provided information on KM for guided searching. Yang and Ganascia focused on memory islands, which can then be ‘navigated’ by the user. Ahn et al focused on visualising Dewey Decimal classifications to support information retrieval. Mutschke and Moussa focused on the use of heat maps on indexing vocabularies to support information seeking. Hienert and van Hoek described a UI for exploring document collections, whilst Alasllakh focused on the visualisation of multi-faceted search results, concentrating on elements of interest to the user. Triebel et al focused on the evaluation of visual user interfaces. Brath and Banissi focused on the problem of font-specific attributes in KM and IR.

Main Conference

The main conference had 14 sessions covering all aspects of digital libraries, ranging from citation analysis to education and building DL systems. Below are the highlights of interest to the IR community.

Keynote speakers: The opening keynote was given by Prof. Dieter Fellner, who addressed issues of 3D digitization in cultural heritage research. The second keynote was delivered by Prof. Jane Ohlmeyer, who reviewed the 1641 Depositions Project, discussed it in the context of the digital humanities and outlined future challenges in digital libraries. Both were fascinating and engaging talks.

Session 2: Recommendation and Indexing. Akbar et al used social network analysis for the purposes of recommendation. Gollub looked at building taxonomies for library-based exploration based on users’ information needs. Clough et al used the PageRank algorithm to personalise recommendation in a large cultural heritage collection. Finally, Friedrich and Kempf focused on providing a conceptual model for indexing to allow users to find relevant research data.

Session 6: Browsing and Searching. McKay et al looked at browsing data to investigate patterns of loans and to confirm statements made on book browsing in the literature. Lacasa et al focused on the use of geospatial and linked data to support semantic-based searching and browsing. Harris et al focused on digital humanities and described a search and mining system to support users in that domain. Mets et al described a federated search engine which provides access to public domain literature from various European libraries.

Session 7: Item identification. Santana et al focused on author name disambiguation, an important problem for people search. Smith focused on detecting clusters of reused passages in the texts of longer documents, to demonstrate how ideas spread; the approach could also be used to improve indexing. Batjargal et al looked at the identification of duplicate records in different languages across multiple databases, a case of cross-language IR duplicate detection. Meuschke and Gipp looked at plagiarism detection, using citation-based methods to reduce the computational load.

Session 8: Quality data and metadata. Dalip et al looked at using machine learning for collaborative filtering applications using various quality indicators. Gao focused on image-based calligraphy search on large-scale data sets. Majidi and Crane looked at human and automatic parsers for Ancient Greek text, to see how they could be combined. Chen et al looked at search logs in CiteSeerX, found that most queries focus on a small set of terms, and concluded that manual intervention is feasible.

Session 9: Topics. Jatowt and Duh examined historical text to analyse the change in meaning of words over time, which brings an important temporal aspect to search. Xu et al also looked at temporal aspects, but for research topics, which have a value over time. Aletras et al looked at various topic representations to support browsing. Kimura and Maeda looked at extracting texts and finding personal relationships in them, which could be useful for searching for people and their connections.

Session 10: Knowledge infrastructure and repositories. Borgman et al looked at information infrastructures for managing scientific data, including access. Lagoze et al described a metadata repository system providing access to confidential data and metadata. St Pierre et al showed that it is possible to undertake real-time indexing of a live stream of glyph compositor operators in users’ interactions within a desktop working environment. Tiessen et al looked at access to research datasets. Stathopolu et al looked at quality metadata in open digital culture repositories.

Session 11: Data transformation & description. Gonano et al looked at creating a suitable ontology for browsing photos in a UI. Crawford et al looked at early music data to create connections using linked data, which could be used for the purposes of categorisation. Castro et al focused on creating ontologies for cross-domain research data management. Fenlon et al evaluated HathiTrust metadata (see comment on Downie article above).

Session 12: Web archives & memory. Brunelle looked at the impact of missing embedded resources in web archives, which affects the effectiveness of search. Huurdeman focused on the unarchived web and the problems of finding pages in that set. Kanhabua looked at Wikipedia as a memory resource for significant events, which introduces an important temporal aspect to search.