This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
Literature-Based Discovery (LBD) is a text mining technique used to generate novel hypotheses from large bodies of literature by identifying links between concepts from disparate sources. One of the main areas where it has been applied is the healthcare domain, where promising results, in the form of novel hypotheses, have been reported. The purpose of this work was to conduct a systematic literature review of recent publications on LBD in the healthcare domain in order to assess trends in the approaches used and to identify issues and challenges for such systems.
The review was conducted following the principles of the Kitchenham method. The selected studies have been scrutinized and the derived findings have been reported following the PRISMA guidelines.
The review results reveal useful information regarding the application areas, the data sources considered, the approaches used, the performance in terms of accuracy and reliability, and future research challenges. The results of this review will be beneficial to LBD researchers and other stakeholders in the healthcare domain, by providing them with useful insights on the approaches to adopt, the data sources to consider, the evaluation models to use and the challenges to reflect on.
The synthesis of the results of this work sheds light on recent issues and challenges that drive new LBD models and provides avenues for their application in other areas of the healthcare domain. To the best of our knowledge, no such recent review has been conducted.
Keywords: Literature-based discovery, Evidence-based healthcare, Knowledge translation, Systematic review
Healthcare management, being one of the highest priorities of most governments, attracts huge investments in terms of health and medical research worldwide. Medical research was found to be the main contributing factor in the improvement of health and longevity of individuals and populations in developed countries [1]. Researchers in the field are making new discoveries and generating knowledge, which has the potential to enhance healthcare delivery, improve patient health outcomes and reduce healthcare costs, thus strengthening the overall healthcare system and economy. This is only achievable if the knowledge is actually put into action [2]. However, the transfer of research findings into healthcare practice in the clinical setting, known as knowledge translation [3], is a very complex and slow process, often resulting in patients not being provided with the most appropriate care, although better treatment recommendations have been proposed and demonstrated. A frequently stated average time lag for knowledge translation is 17 years [4]. Understanding the various stages of knowledge translation and speeding up the process is a policy priority for many health research systems [4].
In order to leverage new medical research findings more quickly for the benefit of patients, medical practitioners are encouraged to adopt the practice of evidence-based medicine, which expects them to scrutinize the scientific and clinical research literature in their respective areas so that health research knowledge is translated into effective healthcare action more quickly. However, given the large volume of biomedical literature available and the time constraints of medical practitioners, the practice of evidence-based medicine has become a major challenge [5]. This limitation can be considerably mitigated by the use of appropriate computational techniques for the automated or semi-automated extraction of knowledge from relevant research literature. A broad term commonly used for such techniques is literature-based discovery (LBD), whose main goal is to generate novel hypotheses from the vast available biomedical literature by discovering unknown associations in existing knowledge [6]. Recent advances in machine learning, text mining and statistical analysis techniques have spurred research in this field and have resulted in many publications on the design and application of LBD systems for various use cases in the biomedical and healthcare domains.
The purpose of this work is to perform a systematic literature review of recently published research papers on the application of LBD for evidence-based healthcare, with the objective of identifying and integrating the findings of the most relevant individual studies. The results of this review are expected to give insights into the different LBD approaches and tools used in various application areas of the healthcare domain. They will help establish to what extent research has progressed in the field, with a focus on performance criteria such as effectiveness, accuracy and reliability. A main expected outcome is the identification of research challenges, which should motivate further studies and thus provide avenues for future research in other areas of the healthcare domain. The Kitchenham guidelines for performing systematic literature reviews [7] were adopted, and the reporting of this paper follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [8]. To the best of our knowledge, no such recent review has been performed for evidence-based healthcare.
The challenges of knowledge translation have become a major concern to individuals who seek and need healthcare, healthcare providers, policy makers and funders of health services. The incorporation of scientific medical discoveries into practice guidelines and policies in the clinical setting can greatly improve healthcare delivery and patient health outcomes, and is the basis of evidence-based healthcare [9]. Evidence-based practice involves clinical decision making which considers the best and most up-to-date available scientific evidence, together with patient values and preferences, the clinical judgment of the medical practitioner and the context in which the care is provided [10]. Healthcare professionals seek evidence to support and justify any activity or intervention for patient care.
In their practice of evidence-based medicine, medical practitioners are expected to scrutinize the best available evidence for making decisions about the care of individual patients. However, with the increasing volume of academic research papers and related structured knowledge resulting from medical research worldwide, they tend to focus only on publications that are directly relevant to their own area of specialization and often skip other potentially relevant research. As a result, discoveries in one field remain unknown to others and potential connections between sub-fields are often missed [11]. This limitation can be greatly curbed by LBD, which can automate or semi-automate the analysis of online resources from disparate sources to make new discoveries. With the exponential growth of scientific literature, LBD is becoming an increasingly important tool for facilitating research [12].
LBD generates discoveries that have not yet been published anywhere by combining knowledge extracted from varied literature sources, and therefore supports hypothesis generation [13]. There are two modes of discovery in LBD, namely open discovery and closed discovery. Open discovery starts with a concept X and tries to generate a potential association between X and another concept Z, based on an intermediate concept Y. This follows from the ABC co-occurrence model (in which A, B and C play the roles of X, Y and Z), which states that if A and B are often associated with each other, and B and C are also often associated with each other, there may be an association between A and C, even if this association is not mentioned in any research paper [14]. In contrast, in closed discovery, both the start concept X and the end concept Z are known, and a hypothesis about the relationship between X and Z is given. The technique then attempts to support this hypothesis by identifying intermediate concepts Y that link X and Z.
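The two discovery modes can be illustrated on a toy co-occurrence graph. The sketch below is a minimal rendering of the ABC model, assuming each document has already been reduced to the set of concepts it mentions; the corpus, the concept names and the `min_count` threshold are hypothetical, and ranking candidate Z concepts by their number of distinct linking concepts Y is just one simple heuristic, not the method of any particular study reviewed here.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy corpus: each document is reduced to the set of concepts it mentions.
CORPUS = [
    {"disease_X", "mechanism_Y1"},
    {"disease_X", "mechanism_Y2"},
    {"mechanism_Y1", "substance_Z"},
    {"mechanism_Y2", "substance_Z"},
    {"mechanism_Y2", "substance_W"},
]


def cooccurrence_graph(corpus, min_count=1):
    """Link two concepts if they co-occur in at least `min_count` documents."""
    counts = Counter()
    for doc in corpus:
        for a, b in combinations(sorted(doc), 2):
            counts[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), n in counts.items():
        if n >= min_count:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def open_discovery(corpus, x, min_count=1):
    """From start concept X, propose end concepts Z reached only through intermediates Y."""
    graph = cooccurrence_graph(corpus, min_count)
    direct = graph.get(x, set())
    candidates = defaultdict(set)  # Z -> set of linking concepts Y
    for y in direct:
        for z in graph[y]:
            if z != x and z not in direct:  # Z must not already be linked to X
                candidates[z].add(y)
    # Heuristic ranking: more distinct linking concepts first, then alphabetical.
    return sorted(candidates.items(), key=lambda item: (-len(item[1]), item[0]))


def closed_discovery(corpus, x, z, min_count=1):
    """Given a hypothesised X-Z relationship, return the intermediates Y linking them."""
    graph = cooccurrence_graph(corpus, min_count)
    return sorted((graph.get(x, set()) & graph.get(z, set())) - {x, z})


for z, ys in open_discovery(CORPUS, "disease_X"):
    print(z, "via", sorted(ys))
print(closed_discovery(CORPUS, "disease_X", "substance_Z"))
```

Here `substance_Z` is proposed for `disease_X` without ever co-occurring with it, because both concepts share the intermediates `mechanism_Y1` and `mechanism_Y2`. Many practical LBD systems go well beyond raw co-occurrence counts (for example by weighting associations or filtering overly general linking terms); the `min_count` threshold here only mirrors the "often associated" wording of the ABC model.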
LBD approaches in healthcare are becoming essential, since biomedical knowledge is spread across an ever larger number of publications [15]. Potential discoveries in healthcare can be associations between biomedical concepts that are not usually discussed together in the literature. Appropriate implementation of LBD techniques has the potential to predict future strong associations between these concepts [15], which then warrant further research. In a typical healthcare application of LBD, the start concept X may be a disease and the end concept Z may be a treatment or a cause of the disease. The results of such discoveries need to be further investigated through experimental methods or clinical studies.
This review has been performed following the guidelines on undertaking systematic literature reviews by Kitchenham and Charters [7] and the reporting follows the PRISMA guidelines [8]. The methodology consisted of first setting out the research questions to give a focus for this review, followed by the specification of the search strategy, the application of assessment criteria for the selection of papers and finally data extraction and analysis.
Based on the objectives of this review, the research questions have been set out and elaborated as follows:
We seek to find out the different application areas in which the application of LBD techniques has proved to be successful in the healthcare domain.
The foundation of LBD is the large amount of scientific literature available for a specific field of study. It is therefore important to identify the different literature sources which have been harnessed for LBD in the different studies.
Due to the particular characteristics of the healthcare domain, LBD techniques have to be adapted to specific application areas. There is therefore a need to investigate which LBD techniques and approaches are most relevant and effective for the healthcare domain.
Accuracy and reliability are imperative evaluation criteria for any computational technique in the healthcare domain, since a wrong intervention can lead to harmful consequences for the patient. We therefore study the different evaluation strategies used for LBD systems and assess their performance in terms of accuracy and reliability.
The search strategy involved the identification of potential research papers to be included in the review by performing a search on Google Scholar, with keywords ‘“Literature-based discovery” in health’. Google Scholar was chosen since it indexes scientific articles from various scholarly publishers and professional societies like Springer, ScienceDirect, ACM, IEEE Xplore, ResearchGate amongst others [16]. It also indexes biomedical-specific journals like the Journal of Biomedical Informatics, PLOS ONE and BioMed Central (BMC). Gusenbauer [17] performed a comparative study of academic search engines in 2019 and concluded that “Google Scholar is currently the most comprehensive academic search engine”. Keyword search was then followed by a manual screening of reference lists of relevant primary studies to extend the search space.
Based on the objectives of this systematic review, we set inclusion and exclusion criteria to guide the study selection process, as follows. Since the focus of this review is on recent advances in LBD techniques and approaches, we considered studies published during the last five years, that is, since 2015. We only considered peer-reviewed papers published in English. Primary studies were included, while secondary and tertiary studies, such as surveys, systematic reviews and meta-analyses, were excluded. During an initial screening of studies, we came across papers that describe general LBD techniques without showing their application in the healthcare domain. Such studies were not included, since the objective of this review was to gain insights into the approaches that are most appropriate for specific application areas of LBD. We thus considered only papers that describe the use of LBD approaches in a specific application area of the healthcare domain.
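The inclusion and exclusion criteria above amount to a conjunction of conditions applied to each candidate paper. The sketch below expresses them as a single predicate; the `Record` class, its fields and the sample papers are hypothetical stand-ins for information a reviewer would note during screening, and the code only illustrates the logic of the criteria, not a tool used in this review.

```python
from dataclasses import dataclass


# Hypothetical screening record; field names are illustrative only.
@dataclass
class Record:
    title: str
    year: int
    peer_reviewed: bool
    language: str
    study_type: str               # e.g. "primary", "survey", "systematic review", "meta-analysis"
    healthcare_application: bool  # applies LBD to a specific healthcare application area


MIN_YEAR = 2015  # "the last five years, that is, since 2015"


def meets_criteria(r: Record) -> bool:
    """Return True only if the record satisfies every inclusion criterion."""
    return (
        r.year >= MIN_YEAR
        and r.peer_reviewed
        and r.language == "English"
        and r.study_type == "primary"   # secondary and tertiary studies are excluded
        and r.healthcare_application    # general LBD papers without a healthcare use are excluded
    )


candidates = [
    Record("Paper A", 2018, True, "English", "primary", True),
    Record("Paper B", 2013, True, "English", "primary", True),   # published before 2015
    Record("Paper C", 2019, True, "English", "survey", True),    # secondary study
    Record("Paper D", 2020, True, "English", "primary", False),  # no healthcare application
]
selected = [r.title for r in candidates if meets_criteria(r)]
print(selected)  # ['Paper A']
```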
The database search was performed on 2nd February 2021. After applying the filter on year of publication, the keyword search returned 650 results. The manual screening of reference lists of relevant studies returned 12 eligible studies. Eight duplicate studies were identified across the two sources, resulting in 654 studies to screen. After a rigorous screening of the titles and abstracts against the inclusion and exclusion criteria, 29 studies were pre-selected for the review.
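The study counts reported above can be checked with simple arithmetic: records from the two sources are pooled, duplicates are removed once, and the remainder is screened on titles and abstracts. The variable names are illustrative.

```python
# Counts reported for the identification and screening stages.
database_hits = 650       # Google Scholar keyword search, after the publication-year filter
reference_list_hits = 12  # eligible studies found by screening reference lists
duplicates = 8            # studies returned by both sources
pre_selected = 29         # studies kept after title/abstract screening

to_screen = database_hits + reference_list_hits - duplicates
excluded_at_screening = to_screen - pre_selected

print(f"screened: {to_screen}, excluded on title/abstract: {excluded_at_screening}")
# screened: 654, excluded on title/abstract: 625
```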
After the initial screening based on the inclusion and exclusion criteria, the pre-selected studies were assessed for "quality" in order to apply more detailed inclusion and exclusion criteria. Based on the research questions, four quality assessment criteria were defined, as shown in Table 1. The possible outcome for each criterion was "Yes" if the paper met the criterion and "No" if it did not. Two of the quality assessment criteria also allowed a "Partially" outcome.
Table 1. Quality assessment criteria

Criterion 1: Description of the LBD approach
Yes: The LBD approach used has been described in detail
Partially: The LBD approach used has been briefly described
No: The LBD approach used has not been described

Criterion 2: Discovery
Yes: There was a discovery
No: No discovery was made

Criterion 3: Evaluation
Yes: A concise evaluation was done
Partially: The evaluation was not intensive
No: No evaluation was done

Criterion 4: Research challenges and future directions
Yes: The study gives insights on research challenges and future directions
No: The study does not give insights on research challenges and future directions
During the quality assessment phase, each pre-selected study was scored on every criterion: 1 for a "Yes" outcome, 0.5 for a "Partially" outcome and 0 for a "No" outcome. Studies that obtained a total score of at least 2.5 out of a possible 4 were included in the final review. In the worst case, this allows one "No" and one "Partially" outcome. After the quality assessment phase, 23 studies were selected for the final review based on the scores obtained. Figure 1 shows the PRISMA flow diagram for the study selection process.
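The scoring rule above can be expressed in a few lines. The sketch below encodes the outcomes allowed for each of the four criteria (only the first and third admit "Partially"), sums the scores and applies the 2.5 threshold. Enumerating every possible combination of outcomes confirms that a passing study has at most one "No", and never one "No" together with two "Partially" outcomes. The function and constant names are hypothetical.

```python
from itertools import product

# Score assigned to each outcome, as described in the text.
SCORES = {"Yes": 1.0, "Partially": 0.5, "No": 0.0}

# Outcomes allowed for each of the four criteria, in the order of Table 1.
ALLOWED = [
    ("Yes", "Partially", "No"),  # 1: LBD approach described
    ("Yes", "No"),               # 2: discovery made
    ("Yes", "Partially", "No"),  # 3: evaluation performed
    ("Yes", "No"),               # 4: research challenges and future directions
]

THRESHOLD = 2.5  # minimum total score for inclusion in the final review


def quality_score(outcomes):
    """Sum one study's outcome scores, rejecting outcomes a criterion does not allow."""
    if len(outcomes) != len(ALLOWED):
        raise ValueError(f"expected {len(ALLOWED)} outcomes, got {len(outcomes)}")
    for i, (outcome, allowed) in enumerate(zip(outcomes, ALLOWED), start=1):
        if outcome not in allowed:
            raise ValueError(f"criterion {i} does not accept {outcome!r}")
    return sum(SCORES[o] for o in outcomes)


def is_included(outcomes):
    return quality_score(outcomes) >= THRESHOLD


# Every combination of outcomes that would pass the quality assessment.
passing = [combo for combo in product(*ALLOWED) if is_included(combo)]

print(quality_score(("Yes", "No", "Partially", "Yes")))  # 2.5
```

Validating each outcome against its criterion catches data-entry mistakes such as a "Partially" recorded for the discovery criterion, which the scheme does not allow.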