Review of Taxonomy Boot Camp, London, Oct 2016

The first ever Taxonomy Boot Camp (TBC) this side of the Atlantic took place in London on 18th and 19th October 2016 at Kensington Olympia, organized by Information Today Ltd. As I have attended and presented at Taxonomy Boot Camp in the U.S. several times, I was keen to see what the first TBC in London would offer.
The conference brought together a wide range of speakers from different geographies and backgrounds, including Patrick Lambe, Founder of Straits Knowledge in Singapore and Adjunct Professor of KM at the Hong Kong Polytechnic University; Heather Hedden, author of “The Accidental Taxonomist” and Senior Vocabulary Editor at Cengage Learning; Andreas Blumauer, CEO of The Semantic Web Company; Dave Clarke, CEO of Synaptica; Mike Atherton, Content Strategist at Facebook; and Judi Vernau, Information Architect at Metataxis.

Two tracks ran simultaneously throughout the conference, giving attendees a wide variety of presentations to choose from. For those wanting to learn the nuts and bolts of taxonomies, there were plenty of useful sessions outlining basic taxonomy principles, starting a taxonomy project, taxonomy strategy and governance.

More advanced topics touched on text analytics, automatic tagging, auto-categorization and tools & platforms for linked data. Linked data is the semantic web term for connecting related data using hyperdata links. According to Webopedia, “the idea behind Linked Data is that hyperdata links will let people or machines find related data on the Web that was not previously linked”.

Tom Reamy, Chief Knowledge Architect of the KAPS Group, USA, gave a presentation on “catonomies”, which he defined as taxonomies with built-in categorization. Reamy detailed the steps required to create a catonomy: building terms from a set of documents, adding the terms to a rule and applying the rule to a broader set of content. He emphasized the importance of gaining input from subject matter experts during the catonomy-building process because an element of human judgment is important. In Reamy’s process, recall and precision scores are used to measure accuracy.
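
To make the steps concrete, here is a minimal sketch of applying a term-based rule to a handful of documents and scoring it with precision and recall. The documents, the terms, the expert judgments and the simple "contains any term" rule are all assumptions made up for illustration; they are not Reamy's actual tooling or data.

```python
def apply_rule(terms, documents):
    """Return the IDs of documents whose text contains at least one rule term."""
    return {
        doc_id
        for doc_id, text in documents.items()
        if any(term in text.lower() for term in terms)
    }


def precision_recall(predicted, relevant):
    """Score a rule's hits against expert judgments.

    precision = share of tagged documents that really belong to the category
    recall    = share of documents in the category that the rule found
    """
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical content set and a hypothetical "renewable energy" rule.
documents = {
    "d1": "Solar panels and wind turbines cut emissions.",
    "d2": "Wind farm output rose last year.",
    "d3": "The bank raised interest rates.",
    "d4": "A solar eclipse is due in March.",
}
terms = {"solar", "wind"}

# Hypothetical judgments from subject matter experts.
expert_judged = {"d1", "d2"}

hits = apply_rule(terms, documents)
precision, recall = precision_recall(hits, expert_judged)
print(sorted(hits))
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the rule finds both relevant documents but also tags the solar eclipse article, so recall is perfect while precision suffers. That kind of false hit is where a subject matter expert's judgment helps refine the rule.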

Reamy cautioned that no tool will automatically build any type of taxonomy and told his audience to “run a mile” if a vendor says their tool is capable of building a custom taxonomy from scratch without any human input. As a taxonomist I would agree with this!

Martin Kaltenboek from Semantic Web Company and Sukaina Bharwani of the Stockholm Environment Institute gave a fascinating presentation on how standardized tagging and the use of semantic technology have turned a large, fragmented dataset on climate change into real knowledge that researchers around the world can leverage much more effectively than before. Climate change researchers have traditionally used many different terms to describe the same phenomena, meaning that connections to similar data and understanding were lost. Using five domain-specific thesauri (Overall Climate Tagger, 4000 concepts; Energy Efficiency, 750 concepts; Renewable Energy, 2089 concepts; Climate Change Adaptation, 116 concepts; Climate Change Mitigation, 473 concepts), this vast body of climate change data was tagged and linked using the PoolParty semantic platform. The platform comprised supervised machine learning, entity extraction based on knowledge graphs, geo-tagging, semantic search and content recommendation.

Figure: Climate Tagger visualization of who’s working on what

Other interesting presentations included a look at how big data is giving new insights into humanities, music and film research. Roger Press, Director of Product Development at Academic Rights Press Ltd, shared how the company is using natural language processing and semantic indexing (think triples: subject, predicate, object) to gain a deeper understanding of music and film sales. Visualizations of the insights in graph format “deliver the precise information to measure the cultural impact of different artists across the world.”
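
For readers new to triples, the sketch below shows the basic idea using plain Python tuples. The artists, albums, countries and predicate names are invented for illustration and are not taken from the Academic Rights Press data or its data model.

```python
# Each fact is one (subject, predicate, object) triple.
triples = [
    ("Artist A", "released", "Album X"),
    ("Album X", "sold_in", "Germany"),
    ("Album X", "sold_in", "Japan"),
    ("Artist B", "released", "Album Y"),
    ("Album Y", "sold_in", "Japan"),
]


def objects(triples, subject, predicate):
    """Return every object linked to a subject by the given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def countries_for_artist(triples, artist):
    """Follow two links: artist -> albums released -> countries sold in."""
    countries = set()
    for album in objects(triples, artist, "released"):
        countries.update(objects(triples, album, "sold_in"))
    return sorted(countries)


print(countries_for_artist(triples, "Artist A"))  # ['Germany', 'Japan']
```

Because every fact has the same three-part shape, new kinds of relationships can be added without redesigning a schema, and following chains of links is what makes the data natural to display as a graph.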

Two similar presentations highlighted the benefit of taxonomies to scientific web content. The first, by Andrew Needham, Ontology Manager at Springer Nature UK, was a case study showing how the publisher uses its taxonomy primarily for content discovery by users. The second, by Rachel Drysdale, Manager of Taxonomy and Analysis at PLOS (Public Library of Science), was also a case study and detailed how PLOS takes a very pragmatic approach to taxonomies. The PLOS taxonomy team uses machine-aided indexing, natural language processing and a rules-based approach to index articles accepted for publication. Users can also flag terms that they disagree with or suggest terms that they think should be included in the taxonomy.

Figure: PLOS uses machine-aided indexing

My observation was that there were more presentations devoted to organization of internal content – behind companies’ firewalls – than organization of external content. Furthermore, only one presentation, that of Tom Reamy outlined above, touched on text analytics. This was disappointing, as taxonomies can play an important part in the discovery of new knowledge. That said, it is good to know that the organizers have already booked the same venue for a return of Taxonomy Boot Camp to London in 2017.

For another write-up on Taxonomy Boot Camp, see Heather Hedden’s blog.
Helen Clegg
Text Analytics Manager
A.T. Kearney, London
Helen.clegg@atkearney
@HClegg
