{"id":4177,"date":"2016-01-31T12:39:59","date_gmt":"2016-01-31T12:39:59","guid":{"rendered":"https:\/\/irsg.bcs.org\/informer\/?p=4177"},"modified":"2016-01-31T12:39:59","modified_gmt":"2016-01-31T12:39:59","slug":"book-review-fundamentals-of-predictive-text-mining-2nd-ed-2015","status":"publish","type":"post","link":"https:\/\/archive-irsg.bcs.org\/informer\/?p=4177","title":{"rendered":"Book Review: Fundamentals of Predictive Text Mining 2nd ed (2015)"},"content":{"rendered":"<h3>Book Review: Weiss S. M., Indurkhya N. and Zhang T.\u00a0 (2015). Fundamentals of Predictive Text Mining. Springer-Verlag, London. Second Edition<\/h3>\n<p>The volume \u201cFundamentals of Predictive Text Mining\u201d, 2<sup>nd<\/sup> ed. has nine chapters, a table of contents, a list of references, a Subject Index and an Author Index. The book also includes a Preface written by the three authors.<\/p>\n<p><strong>Summary<\/strong><\/p>\n<p>Abbreviations: ML=Machine Learning; NLP=Natural Language Processing; IR=Information Retrieval<\/p>\n<p>1) Chapter 1, \u201cOverview of Text Mining\u201d, shows how models are constructed from unstructured documents and how the results of classification are projected onto new text. One of the most important concepts introduced in this chapter is \u201cdata representation\u201d. The spreadsheet model is introduced and some drawbacks, namely sparseness and missing values, are explained. Different text mining tasks that rely on an ML-based predictive framework are described (\u201c1.2 What types of problems can be solved\u201d), namely document classification, information retrieval, document clustering and information extraction. 
The last section of the chapter explains why performance evaluation is important.<\/p>\n<p><!--more--><\/p>\n<p>2) Chapter 2, \u201cFrom Textual Information to Numerical Vectors\u201d, describes how words are converted into vectors, the format required by predictive methods.\u00a0 Words, or tokens, may be reduced to common roots by lemmatizers or stemmers. The words can be added to a dictionary.\u00a0 In the vector representation, words are represented as attribute-value pairs, where the value of each attribute can be the frequency of occurrence of a specific word, weighted by a scheme such as tf\/idf. The dictionary can be extended to multiword features such as phrases. Other linguistic processing can be applied, such as part-of-speech tagging (for morphological analysis), word sense disambiguation (to resolve ambiguities such as \u201capple\u201d the fruit vs. \u201capple\u201d the brand), parsing (for syntactic analysis), etc. In the last section of the chapter (2.12 Feature Generation), the authors point out how linguistic preprocessing can help identify good features for text mining.<\/p>\n<p>3) In the third chapter, \u201cUsing Text for Prediction\u201d, predictive text mining is described in terms of empirical analysis that focuses on word patterns. Fundamental ML methods are described, namely similarity-based methods, decision rules and trees, probabilistic methods and linear methods. \u00a0Section 3.5 Evaluation Performance describes evaluation measures (such as precision, recall etc.) and some pitfalls of these metrics. The chapter ends with a short introduction to graph models for social networks.<\/p>\n<p>4) In Chapter 4,\u00a0 \u201cInformation Retrieval and Text Mining\u201d, IR is defined and described as a predictive text-mining task because the methods for retrieval can be considered variations of similarity-based nearest-neighbor methods. 
Different methods of measuring similarity are illustrated, including cosine similarity. Link analysis for ranking similarity of documents is then presented and discussed. The chapter ends with a list of additional evaluation cues to be taken into account when doing IR, e.g. the date of appearance of documents and users\u2019 voting.<\/p>\n<p>5) Chapter 5, \u201cFinding Structure in a Document Collection\u201d, presents methods for clustering documents. Clustering is used when documents in a collection have no label indicating their content. Clustering helps sort documents into groups that implicitly share the same theme. A review of popular clustering algorithms is then presented, namely k-means, hierarchical clustering and the EM algorithm. The chapter includes a discussion on how to assign \u201cmeaning\u201d to clusters that have been generated by algorithms. The chapter ends by emphasizing the value added by clustering techniques when performing exploratory analysis.<\/p>\n<p>6) Chapter 6, \u201cLooking for Information in Documents\u201d, describes several models and learning methods that can be used for information extraction (IE). IE is defined as \u201ca restricted form of full language understanding, where we know in advance what kind of semantic information we are looking for\u201d (p. 119). Three tasks are examined in more detail, namely named-entity recognition (NER), co-reference resolution and relation extraction (RE). NER refers to the automatic identification of names of persons, organizations, locations, time expressions, quantities, etc. in unstructured text, while RE refers to the detection and classification of semantic relationship mentions. Co-reference occurs when two or more expressions in a text refer to the same person or thing: in order to derive the correct interpretation of a text, co-reference resolution must connect pronouns and other referring expressions to the right individuals. 
In this chapter the Maximum Entropy method is illustrated and discussed. The chapter ends with a list of applications based on IE in the fields of IR, commercial extraction systems, criminal justice and intelligence.<\/p>\n<p>7) In Chapter 7, \u201cData Sources for Prediction: Databases, Hybrid Data and the Web\u201d, the authors explore hybrid forms of text and structured numerical data, for example stock data and related newswire headlines. Prototypical examples are described, such as opinion mining and sentiment analysis and web-based XML data.<\/p>\n<p>8) In Chapter 8, \u201cCase Studies\u201d, several scenarios are illustrated and discussed, namely: market intelligence from the web, document matching for digital libraries, help desk applications, assignment of topics to news articles, email filtering, search engines, named-entity extraction, mining of social media, and finally the creation of a customized newspaper. Each case study contains the following features: problem description, solution overview, methods and procedures and system deployments. Applications from several fields are reviewed.<\/p>\n<p>9) In the last chapter, Chapter 9, \u201cEmerging Directions\u201d, the authors present a number of topics that are attracting increasing interest in predictive text mining, e.g. summarization, question answering, active learning, learning with unlabeled data, and deep learning.<\/p>\n<p><strong>Evaluation<\/strong><\/p>\n<p>This volume is a gentle introduction to predictive text mining. The language used in the book is simple and accessible. The book summarizes the basic knowledge needed to mine unstructured textual data and make predictions on new text.<\/p>\n<p>The ideal audience of the book is composed of students and, in general, of beginners with some basic knowledge of information retrieval, probability theory and linear algebra. \u00a0This \u201cmathematical maturity\u201d is a requirement that is clearly spelled out in the Preface (p. 
vi) and is needed to understand the formulas presented throughout the chapters (e.g. Chapters 3 and 4).<\/p>\n<p>In this volume, text mining is presented as an ideal blend of NLP, ML and IR. In particular, the importance of NLP for predictive text mining is duly stressed. NLP tasks are considered to be important to generate valuable features (Chapter 2) and for information extraction (Chapter 6). Chapter 6 could also be used in NLP courses.\u00a0 In Chapter 9, many NLP tasks (such as summarization and question answering, which are well established in the fields of NLP, computational linguistics and language technology) are referred to as <em>emerging areas<\/em> in text mining. One of the authors of the book, namely Nitin Indurkhya, is also editor (together with Fred Damerau) of the <em>Handbook of Natural Language Processing, 2010, 2<sup>nd<\/sup> ed.,<\/em> and this might explain the emphasis on NLP. As a computational linguist, I appreciate this emphasis, not least because text mining is the ideal interdisciplinary meeting point of different fields, such as NLP, ML and IR.<\/p>\n<p>The book has many good features. For instance:<\/p>\n<p>1)\u00a0\u00a0A number of pseudo-coded algorithms are provided (e.g., see Fig. 2.4 \u201cGenerating features from tokens\u201d or Fig. 2.7 \u201cGenerating multiword features from tokens\u201d).<\/p>\n<p>2)\u00a0\u00a0 Important questions are addressed and discussed in dedicated sections, e.g. 
\u201c3.2 How many documents are enough?\u201d or \u201c5.3 What do a cluster\u2019s label mean?\u201d.<\/p>\n<p>3)\u00a0\u00a0 At the end of each chapter the reader is provided with a short Summary section that presents the main concepts introduced in the chapter, and a section called \u201cHistorical and Bibliographical Remarks\u201d, which is very useful to get an idea of the progress in the area.<\/p>\n<p>4)\u00a0\u00a0 Each chapter is complemented with Questions and Exercises, which are valuable additions to the content and can be used in class as teaching material.<\/p>\n<p>5)\u00a0\u00a0 Teaching aids are provided: \u201c[s]lides, sample solutions to selected exercises and suggestions for using the book in courses are are [sic] available from the publisher&#8217;s companion site for this book.\u201d (p. vii).<\/p>\n<p>6)\u00a0\u00a0 Optional software that implements many of the methods discussed in the book can also be downloaded from the data-miner website (p. vii).<\/p>\n<p>There are, however, a couple of notions that I felt were missing from the book and that might be interpreted as <em>desiderata <\/em>for the next edition. The first one is the notion of \u201cinductive bias\u201d. What it is and how it affects the performance of different learning algorithms on the same data is important to know in ML practice. \u00a0Following Mitchell&#8217;s definition, the inductive bias of a machine learning algorithm is the \u201cassumptions that must be added to the observed data to transform the algorithm&#8217;s outputs into logical deductions\u201d. Daum\u00e9 frames it as a question: \u201cin the absence of data that narrow down the relevant concept, what type of solutions are we more likely to prefer?\u201d. Inductive bias is a difficult notion for students to understand and it would be useful to explain it in a text mining manual.<\/p>\n<p>The second desideratum is the comparative evaluation of classifiers. 
The evaluation of performance is comprehensively dealt with throughout the chapters. For instance, Chapters 3, 4 and 5 consistently present evaluation for supervised ML-based methods, for IR and for clustering. However, in order to compare the performance of two or more classifiers, statistical significance tests, such as the <em>t-test<\/em>, are usually employed. This knowledge would also be useful for students to fully grasp ML practice.<\/p>\n<p>All in all, this manual makes a good contribution not only to predictive text mining, but also to machine learning for language technology in general. It is a good read and a valuable manual in many respects.<\/p>\n<p>Marina Santini<\/p>\n<p>Computational linguist currently teaching Mathematics for language technologists, Machine Learning in language technology and Semantic analysis in language technology at Uppsala University (Sweden).<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Book Review: Weiss S. M., Indurkhya N. and Zhang T.\u00a0 (2015). Fundamentals of Predictive Text Mining. Springer-Verlag, London. Second Edition The volume \u201cFundamentals of Predictive Text Mining\u201d, 2nd ed. has nine chapters, a table of contents, a list of references, a Subject Index and an Author Index. 
The book also includes a Preface written by&hellip; <a class=\"more-link\" href=\"https:\/\/archive-irsg.bcs.org\/informer\/?p=4177\">Continue reading <span class=\"screen-reader-text\">Book Review: Fundamentals of Predictive Text Mining 2nd ed (2015)<\/span><\/a><\/p>\n","protected":false},"author":49,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[194,234],"tags":[],"class_list":["post-4177","post","type-post","status-publish","format-standard","hentry","category-book-review","category-winter-2016","entry"],"_links":{"self":[{"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=\/wp\/v2\/posts\/4177","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=\/wp\/v2\/users\/49"}],"replies":[{"embeddable":true,"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4177"}],"version-history":[{"count":0,"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=\/wp\/v2\/posts\/4177\/revisions"}],"wp:attachment":[{"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4177"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4177"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4177"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}