{"id":3904,"date":"2015-10-29T23:22:29","date_gmt":"2015-10-29T23:22:29","guid":{"rendered":"https:\/\/irsg.bcs.org\/informer\/?p=3904"},"modified":"2015-10-29T23:22:29","modified_gmt":"2015-10-29T23:22:29","slug":"detecting-adverse-drug-effects-in-natural-language-using-limited-training-data","status":"publish","type":"post","link":"https:\/\/archive-irsg.bcs.org\/informer\/?p=3904","title":{"rendered":"Detecting adverse-drug effects in natural language using limited training data"},"content":{"rendered":"<p style=\"text-align: justify;\"><!-- p { margin-bottom: 0.25cm; line-height: 120%; } -->A large amount of information is provided in text documents but difficult to access for computer  programs. In order to detect complex information it is often important to understand the relationships  between words and entities in sentences. A relation can express for instance that a disease has a  particular finding or a that a drug can be used to treat a disease. An example of a relation is given in  Figure 1. Relation extraction addresses the task of detecting relationships between entities from natural language.<\/p>\n<p style=\"text-align: justify;\"><!--more--><\/p>\n<table border=\"black\" width=\"100%\">\n<tbody>\n<tr>\n<td style=\"text-align: center;\">BACKGROUND: <strong>Insulin<\/strong> has traditionally been viewed as a last resort in the treatment of <strong>type 2  diabetes<\/strong>. (PMID=17257474)<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p style=\"text-align: center;\"><em>Figure 1: Example describes the relation may-treat between insulin and type 2 diabetes.<\/em><\/p>\n<p style=\"text-align: justify;\"><!-- p { margin-bottom: 0.25cm; line-height: 120%; } -->Supervised learning approaches are most successful to address this task as proved by different shared  tasks and competitions (see e.g. (Segura-Bedmar et al., 2013)). 
Supervised learning is a general term for machine learning methods that use a fixed set of manually labelled instances (usually consisting of positive and negative examples) to train a classifier. Depending on the use case, different features are extracted from the training data to train the classifier. Features can be, for example, the words or part-of-speech tags around the two entities, or the dependency tree path between the entities. In many cases supervised learning methods provide better results when a larger set of training examples is available. Nevertheless, training data is not always available. Moreover, the generation of manually labelled data is usually time consuming and expensive. Depending on the domain, expert knowledge may even be required.<\/p>\n<p style=\"text-align: justify;\"><!-- p { margin-bottom: 0.25cm; line-height: 120%; } -->Distant supervision is a technique to overcome this problem by generating positive and negative training data automatically. Those instances are then used as input to train a relational classifier. According to Mintz et al. (2009), distant supervision is defined as follows:<\/p>\n<p><em>\u201cThe distant supervision assumption is that if two entities participate in a relation, any sentence that contains those two entities might express that relation.\u201d<\/em><\/p>\n<p style=\"text-align: justify;\">Using a set of known facts for a relation (e.g. <strong>Aspirin<\/strong> may treat <strong>pain<\/strong>), the approach searches for sentences containing these facts and labels them according to the relation of interest. In most cases the known facts are taken from a relational knowledge base such as Freebase or UMLS. An excerpt of the UMLS relation may-treat is given in Table 1. In contrast to Figure 1, however, those automatically (distantly) labelled sentences might contain false labels, as the example in Figure 2 shows. According to Table 1, <strong>insulin<\/strong> can be used to treat <strong>Type 2 Diabetes<\/strong>. 
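As a toy illustration of this labelling process (a minimal sketch with hypothetical names and data, not the pipeline used here), the distant supervision assumption amounts to marking every sentence that mentions both entities of a known fact as a positive example:

```python
# Minimal sketch of distant labelling (illustrative only): a sentence is
# labelled positive for may-treat whenever both entities of a known fact
# co-occur in it, regardless of what the sentence actually expresses.

KNOWN_FACTS = {              # hypothetical excerpt in the spirit of Table 1
    ("insulin", "type 2 diabetes"),
    ("zinc lozenge", "common cold"),
}

def distant_label(sentence):
    """Return (drug, disease, relation) triples distantly found in the sentence."""
    text = sentence.lower()
    return [(drug, disease, "may-treat")
            for drug, disease in sorted(KNOWN_FACTS)
            if drug in text and disease in text]

# The false positive discussed around Figure 2: both entities co-occur,
# so the sentence is (wrongly) labelled as expressing may-treat.
labels = distant_label("Reactive species and early manifestation of "
                       "insulin resistance in type 2 diabetes.")
# → [('insulin', 'type 2 diabetes', 'may-treat')]
```

Such simple co-occurrence matching is precisely what makes distantly labelled data noisy.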
However, the example shows that the sentence expresses something different. Nonetheless, distant supervision is able to generate large training data sets, and classifiers trained on this data provide reasonable results compared to using manually labelled data (see e.g. Thomas et al., 2011).<\/p>\n<p><!-- p { margin-bottom: 0.25cm; line-height: 120%; } --><\/p>\n<table class=\"aligncenter\" border=\"black\" cellspacing=\"0\" cellpadding=\"4\" width=\"213\">\n<colgroup>\n<col width=\"128*\"><\/col>\n<col width=\"128*\"><\/col>\n<\/colgroup>\n<tbody>\n<tr>\n<td style=\"text-align: center;\" colspan=\"2\" width=\"100%\" valign=\"top\"><strong>may_treat<\/strong><\/td>\n<\/tr>\n<tr valign=\"top\">\n<td><strong>DRUG<\/strong><\/td>\n<td><strong>DISEASE<\/strong><\/td>\n<\/tr>\n<tr valign=\"top\">\n<td>Zinc Lozenge<\/td>\n<td>Common Cold<\/td>\n<\/tr>\n<tr valign=\"top\">\n<td>Insulin<\/td>\n<td>Type 2 Diabetes<\/td>\n<\/tr>\n<tr valign=\"top\">\n<td>Sulconazole Nitrate Cream<\/td>\n<td>Tinea versicolor<\/td>\n<\/tr>\n<tr valign=\"top\">\n<td>Vitamin E<\/td>\n<td>Alzheimer<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p style=\"text-align: center;\"><em>Table 1: Excerpt of the UMLS may-treat relation<\/em><\/p>\n<table border=\"black\" width=\"100%\">\n<tbody>\n<tr>\n<td>\n<p style=\"text-align: center;\">Reactive species and early manifestation of <strong>insulin<\/strong> resistance in <strong>type 2 diabetes<\/strong>. (PMID=16448517)<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p style=\"text-align: center;\"><em>Figure 2: Distantly labelled sentence, false positive<\/em><\/p>\n<p style=\"text-align: justify;\"><!-- p { margin-bottom: 0.25cm; line-height: 120%; } -->This article presents an alternative scenario, originally published in Roller and Stevenson (2015). Supervised methods tend to provide better results the more manually labelled training data is available. What if only a small set of training instances is available? 
How does this affect the classification results? In the following, a method is presented which improves the classification results of a supervised classifier trained on only a small set of manually labelled instances by including automatically labelled data. The method is tested in the biomedical domain on the detection of adverse-drug effects (ADE) in sentences. The ADE data set is described in detail in Gurulingappa et al. (2012) and Roller and Stevenson (2015). It contains 1644 annotated abstracts of medical publications. Each annotated abstract contains several sentences with a range of different adverse-drug effects.<\/p>\n<p style=\"text-align: justify;\">For the experiment, only a small training data set of at most 200 medical abstracts is used, and evaluation is carried out on a larger set of 1444 abstracts. In order to examine the impact of small training data, the experiment starts with only one abstract. Then, the number of training instances is slowly increased to 200. For each training subset, positive and negative mentions of adverse-drug effects are extracted and then used to generate further training data automatically. From the sentence in Figure 3, for instance, the seed pair \u201chair loss\u201d-\u201cparoxetine\u201d can be extracted and used to generate distantly labelled data. In the next step, those seed instances are used to generate distantly labelled training data from 2 million medical abstracts published in the Medline repository. Using the seed instances and the automatic labelling process, it is possible to generate a much larger training data set than using only manually labelled data. The resulting training data sizes are presented in Table 2.<\/p>\n<table border=\"black\" width=\"100%\">\n<tbody>\n<tr>\n<td style=\"text-align: left;\">\n<p style=\"text-align: center;\">Findings on discontinuation and rechallenge supported the assumption that the <strong>hair loss<\/strong> was a side effect of the <strong>paroxetine<\/strong>. (PMID=10442258)<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p style=\"text-align: center;\"><em>Figure 3: Example of an adverse-drug effect<\/em><\/p>\n<p style=\"text-align: center;\"><!-- td p { margin-bottom: 0cm; }p { margin-bottom: 0.25cm; line-height: 120%; } --><\/p>\n<table class=\"aligncenter\" border=\"black\" cellspacing=\"0\" cellpadding=\"4\" width=\"397\">\n<colgroup>\n<col width=\"47*\"><\/col>\n<col width=\"26*\"><\/col>\n<col width=\"37*\"><\/col>\n<col width=\"37*\"><\/col>\n<col width=\"37*\"><\/col>\n<col width=\"37*\"><\/col>\n<col width=\"37*\"><\/col>\n<\/colgroup>\n<tbody>\n<tr valign=\"top\">\n<td><span><span style=\"font-size: small;\"># abstracts<\/span><\/span><\/td>\n<td colspan=\"2\"><span><span style=\"font-size: small;\">seed<\/span><\/span><\/td>\n<td colspan=\"2\"><span><span style=\"font-size: small;\">manually labelled<\/span><\/span><\/td>\n<td colspan=\"2\"><span><span style=\"font-size: small;\">distantly labelled<\/span><\/span><\/td>\n<\/tr>\n<tr valign=\"top\">\n<td><\/td>\n<td><span><span style=\"font-size: small;\">positive<\/span><\/span><\/td>\n<td><span><span style=\"font-size: small;\">negative<\/span><\/span><\/td>\n<td><span><span style=\"font-size: small;\">positive<\/span><\/span><\/td>\n<td><span><span style=\"font-size: small;\">negative<\/span><\/span><\/td>\n<td><span><span style=\"font-size: small;\">positive<\/span><\/span><\/td>\n<td><span><span style=\"font-size: small;\">negative<\/span><\/span><\/td>\n<\/tr>\n<tr valign=\"top\">\n<td>10<\/td>\n<td>11<\/td>\n<td>78<\/td>\n<td>56<\/td>\n<td>180<\/td>\n<td>469<\/td>\n<td>1507<\/td>\n<\/tr>\n<tr valign=\"top\">\n<td>20<\/td>\n<td>25<\/td>\n<td>109<\/td>\n<td>122<\/td>\n<td>250<\/td>\n<td>919<\/td>\n<td>1883<\/td>\n<\/tr>\n<tr valign=\"top\">\n<td>50<\/td>\n<td>68<\/td>\n<td>228<\/td>\n<td>364<\/td>\n<td>516<\/td>\n<td>1860<\/td>\n<td>2636<\/td>\n<\/tr>\n<tr 
valign=\"top\">\n<td>75<\/td>\n<td>103<\/td>\n<td>340<\/td>\n<td>534<\/td>\n<td>752<\/td>\n<td>2252<\/td>\n<td>3173<\/td>\n<\/tr>\n<tr valign=\"top\">\n<td>100<\/td>\n<td>147<\/td>\n<td>444<\/td>\n<td>710<\/td>\n<td>988<\/td>\n<td>2861<\/td>\n<td>3981<\/td>\n<\/tr>\n<tr valign=\"top\">\n<td>150<\/td>\n<td>219<\/td>\n<td>656<\/td>\n<td>1112<\/td>\n<td>1454<\/td>\n<td>6208<\/td>\n<td>8118<\/td>\n<\/tr>\n<tr valign=\"top\">\n<td>200<\/td>\n<td>310<\/td>\n<td>857<\/td>\n<td>1550<\/td>\n<td>1886<\/td>\n<td>8175<\/td>\n<td>9947<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p style=\"text-align: center;\"><em>Table 2: ADE training data size<\/em><\/p>\n<p style=\"text-align: justify;\"><!-- p { margin-bottom: 0.25cm; line-height: 120%; } -->Table 2 shows that using only a small number of abstracts for training only a few different positive and  negative seed instance pairs can be extracted. Furthermore, a small set of seed abstracts contains only a  small number of positive and negative training instances (manually  labelled). As Table 2 shows using seed instances it is possible to generate a larger number of automatically labelled training data, in particular in comparison to the manually labelled data.<\/p>\n<p style=\"text-align: justify;\">&nbsp;<\/p>\n<h2 style=\"text-align: justify;\">Experiment<\/h2>\n<p style=\"text-align: justify;\">For the following experiment three classifiers will be trained. First, a classifier is trained using only the manually labelled data (gold standard) as input (supervised). Next, a classifier is trained using only the automatically  labelled data (distantly supervised) as input. Finally, both data sets are merged and used as input for the third classifier  (mixture model). For the experiment a support vector machine with a shallow linguistic kernel is used (Giuliano et al. 2006). To ensure reliable results the increasing training step (from 1-200) is repeated 5 times with a different set of 200 abstracts. 
This means that each of the 5 evaluation rounds uses a different set of seed instances and, therefore, a different set of distantly labelled instances. The results are evaluated in terms of precision, recall and f-score and are presented in Figure 4.<\/p>\n<p style=\"text-align: center;\"><img decoding=\"async\" src=\"http:\/\/rolandroller.com\/pics\/ADEprecision_recall_curve4_2.png\" alt=\"Effect of varying number of seed abstracts\" \/><br \/>\n<em>Figure 4: Effect of varying number of seed abstracts<\/em><\/p>\n<p style=\"text-align: justify;\">Figure 4 shows that with only a small set of manually labelled training data the supervised classifier achieves only a low f-score. Increasing the training data size improves the result. Interestingly, the distantly supervised classifier outperforms the supervised classifier up to a size of about 100 abstracts. In this case, using more data (even though the data is noisy) leads to better results than using only a small set of manually labelled data. If both data sets are combined, further improvements are achieved. With an increasing number of training abstracts, the gap between the results of the mixture model and the supervised classifier narrows. Eventually, a further increase in training data allows the supervised classifier to catch up (at around 300 abstracts \u2013 not shown in this experiment).<\/p>\n<p>&nbsp;<\/p>\n<h2><strong>Conclusion<\/strong><\/h2>\n<p style=\"text-align: justify;\">This article presented a method to detect relations in natural language when only a small set of manually labelled training data is available. The approach has been tested in the context of detecting adverse-drug effects in biomedical abstracts. It uses information from an existing training data set to automatically acquire new training data. Using this data, a relational classifier can be trained to detect and extract similar information in text documents. 
The classifier is able to provide results comparable to a supervised classifier using a small gold standard as input. Furthermore, a mixture model using both manually labelled and distantly labelled data has been presented, which is able to outperform a classifier using only (a small set of) gold standard data. This result is notable since distantly supervised data tends to be much noisier than manually labelled data and therefore usually produces less accurate classifiers.<\/p>\n<p style=\"text-align: justify;\">&nbsp;<\/p>\n<h2><strong>References<\/strong><\/h2>\n<p style=\"text-align: justify;\">Harsha Gurulingappa, Abdul Mateen Rajput, Angus Roberts, Juliane Fluck, Martin Hofmann-Apitius, and Luca Toldo. Development of a benchmark corpus to support the automatic extraction of drug-related adverse effects from medical case reports. Journal of Biomedical Informatics, 2012. Text Mining and Natural Language Processing in Pharmacogenomics.<\/p>\n<p style=\"text-align: justify;\">Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Suntec, Singapore, 2009.<\/p>\n<p style=\"text-align: justify;\">Roland Roller and Mark Stevenson. Making the most of limited training data using distant supervision. In Proceedings of the BioNLP 2015 Workshop, Beijing, China, 2015.<\/p>\n<p style=\"text-align: justify;\">Isabel Segura-Bedmar, Paloma Mart\u00ednez, and Daniel S\u00e1nchez-Cisneros. The 1st DDI Extraction-2011 challenge task: Extraction of Drug-Drug Interactions from biomedical texts. In Proceedings of the DDI Extraction-2011 challenge task, 2011.<\/p>\n<p style=\"text-align: justify;\">Philippe Thomas, Ill\u00e9s Solt, Roman Klinger, and Ulf Leser. Learning Protein Protein Interaction Extraction using Distant Supervision. 
In Proceedings of Robust Unsupervised and Semi-Supervised Methods in Natural Language Processing, 2011.<\/p>\n<p style=\"text-align: justify;\">&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A large amount of information is provided in text documents but is difficult for computer programs to access. In order to detect complex information, it is often important to understand the relationships between words and entities in sentences. A relation can express, for instance, that a disease has a particular finding or that a drug&hellip; <a class=\"more-link\" href=\"https:\/\/archive-irsg.bcs.org\/informer\/?p=3904\">Continue reading <span class=\"screen-reader-text\">Detecting adverse-drug effects in natural language using limited training data<\/span><\/a><\/p>\n","protected":false},"author":44,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[184,201],"tags":[],"class_list":["post-3904","post","type-post","status-publish","format-standard","hentry","category-autumn-2015","category-feature-article","entry"],"_links":{"self":[{"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=\/wp\/v2\/posts\/3904","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=\/wp\/v2\/users\/44"}],"replies":[{"embeddable":true,"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3904"}],"version-history":[{"count":0,"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=\/wp\/v2\/posts\/3904\/revisions"}],"wp:attachment":[{"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&paren
t=3904"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3904"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/archive-irsg.bcs.org\/informer\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3904"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}