Use of Artificial Intelligence in the Search for New Information Through Routine Laboratory Tests: Systematic Review

Background: In recent decades, the use of artificial intelligence has been widely explored in health care. Similarly, the amount of data generated in the most varied medical processes has practically doubled every year, requiring new methods of analysis and treatment of these data. Mainly aimed at aiding in the diagnosis and prevention of diseases, precision medicine has shown great potential in different medical disciplines. Laboratory tests, for example, almost always present their results as separate individual values. However, physicians need to analyze a set of results to propose a diagnosis, which suggests that sets of laboratory tests may contain more information than the individual results presented separately. In this way, the processes of medical laboratories can be strongly affected by these techniques.

Objective: We sought to identify scientific research that used laboratory tests and machine learning techniques to predict hidden information and diagnose diseases.

Methods: The methodology adopted used the population, intervention, comparison, and outcomes principle, searching the main engineering and health sciences databases. The search terms were defined based on the list of terms used in the Medical Subject Headings database. Data from this study were presented descriptively and followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses; 2020) statement flow diagram and the National Institutes of Health tool for the quality assessment of articles. During the analysis, the inclusion and exclusion criteria were applied independently by 2 authors, with a third author consulted in cases of disagreement.

Results: Following the defined requirements, 40 studies of good quality in the analysis process were selected and evaluated. We found that, in recent years, there has been a significant increase in the number of works using this methodology, mainly because of COVID-19. In general, the studies used machine learning classification models to predict new information, and the most used parameters were data from routine laboratory tests such as the complete blood count.

Conclusions: We conclude that laboratory tests, together with machine learning techniques, can predict new tests, thus helping the search for new diagnoses. This process has proved to be advantageous and innovative for medical laboratories, making it possible to discover hidden information and propose additional tests, reducing the number of false negatives and helping in the early discovery of unknown diseases.


Background
The large amount of data generated in recent decades has become a great challenge, demanding new forms of analysis and processing of complex and unstructured data, a process known as data mining [1]. The health care domain has great prominence in applying data mining, supporting infection control, epidemiological analysis, treatment and diagnosis of diseases, hospital management, home care, public health administration, and disease management [2]. This predictive analysis is strongly linked to the evolution of artificial intelligence (AI) techniques such as machine learning (ML). These algorithms, able to learn iteratively from data, allow systems based on computational intelligence to find information that was initially unknown [3].
Currently, prediction [4] and decision support systems have been using web-based medical records and clinical data, analyzing patient histories to propose models that identify high-risk situations as well as false positives [5]. This precision medicine (in silico) based on electronic health records has gained strength given the possibility of more accessible and efficient treatments aimed at the particular characteristics of each individual. In this sense, Wong et al [6] proposed using ML to structure and organize stored data as well as for mining and aiding in diagnosis. Similarly, Roy et al [7] used electronic health record data to predict laboratory test results in a pretest.
These works motivated us to study the potential of the use of AI, especially ML techniques, in the area of health.
According to Peek et al [8], in recent decades, there has been a major shift from knowledge-based to data-oriented methods. Analyzing 30 years of publications from the International Conference on Artificial Intelligence in Medicine, an increase in the use of data mining and ML techniques was observed.
In recent years, other reviews have been published presenting the growth and potential of ML methods in the health area. In their review, Rashidi et al [9] addressed the multidisciplinary aspect of this scenario and presented the potential of ML techniques for data processing in the health area, comparing the different methods.
Similarly, Ahmed et al [10] discussed aspects of precision medicine in their review, presenting works with different approaches to the use of ML in addition to discussing ethical aspects and the management of health resources. The work by Houfani et al [11], in turn, focused on the prediction of diagnoses, presenting an overview of the methods applied in disease prediction.
In their work, Ma et al [12] presented aspects of real-world big data studies with a focus on laboratory medicine. They highlighted the lack of standardization in clinical laboratories and the difficulty of using data in real time, mainly because of unstructured and unreliable data. Nevertheless, they emphasized the potential of laboratory data for purposes such as establishing reference ranges, quality control based on patient data, analysis of factors that affect analyte test results, establishment of diagnostic and prognostic models, epidemiological investigation, laboratory management, and data mining, all aimed at helping traditional clinical laboratories develop into smart clinical laboratories.
In contrast to the studies presented, this study aimed to analyze studies that used data from laboratory tests together with AI techniques to predict new results.

Study Questions
Clinical laboratories display most test results as individual numerical values. However, the results of these tests, viewed in isolation, are usually of limited significance for reaching a diagnosis.
In their study of ferritin, Luo et al [5] found that laboratory tests often contain redundant information.
Similarly, Gunčar et al [13] found that ML models can predict hematological diseases using only blood tests. In their study, Gunčar et al [13] stated that laboratory tests have more information than health professionals commonly consider.
Demirci et al [14] and Rosenbaum and Baron [15] also used ML techniques to identify possible errors in the clinical process of performing laboratory tests. In both studies, the authors obtained satisfactory results, demonstrating the ability of computational models based on ML to assist in analyzing laboratory tests. Similarly, Baron et al [16] used an algorithm to generate a decision tree capable of identifying tests with possible problems arising from the preanalytical process during the execution of laboratory tests.
The presentation of these works makes us reflect on how much information can be present in a set of laboratory test data and the potential for the exploration and use of such data. Thus, our objective was to identify scientific studies that used laboratory tests and ML models to predict results.
This study had the following specific research questions: (1) Is it possible to predict specific examinations from other examinations? (2) Which examinations are typically used as input data to predict other results? and (3) What methods are used to predict these tests?

Search Strategy
Searches were conducted in 7 electronic databases of international journals in the areas of engineering and health sciences (Compendex [Engineering Village], EBSCO [MEDLINE Complete], IEEE Xplore, PubMed [MEDLINE], ScienceDirect, Scopus, and Web of Science) for English-language publications from April 2011 to February 2022. Additional records were identified during the screening phase of this research by analyzing the references of the eligible articles included.
The population, intervention, comparison, and outcomes principle was used to group the search terms. As this study addressed laboratory tests, 3 groups of principal search terms were considered (no comparison terms were defined), combined with 2 Boolean operators (OR and AND): population ("Clinical Laboratory Test" OR "Laboratory Diagnosis" OR "Blood Count, Complete" OR "Routine Diagnostic Test") AND intervention ("Machine Learning") AND outcomes ("Clinical Decision-Making" OR "Computer-Assisted Diagnosis" OR "Predictive Value of Tests").
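For reproducibility, the combined query can be expressed programmatically. The following is a minimal sketch (our own illustration; the exact field syntax accepted by each database differs) that assembles the population, intervention, and outcome term groups into a single Boolean string:

```python
# Minimal sketch: assembling the PICO-based Boolean query described above.
# Each database (PubMed, Scopus, etc.) applies its own field syntax; this
# builds only the generic term logic.

population = ['"Clinical Laboratory Test"', '"Laboratory Diagnosis"',
              '"Blood Count, Complete"', '"Routine Diagnostic Test"']
intervention = ['"Machine Learning"']
outcomes = ['"Clinical Decision-Making"', '"Computer-Assisted Diagnosis"',
            '"Predictive Value of Tests"']

def or_group(terms):
    """Join a list of quoted terms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"

query = " AND ".join(or_group(g) for g in (population, intervention, outcomes))
print(query)
```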
The search terms were defined based on the list of terms used in the Medical Subject Headings database [17]. The studies were collected from the databases from April 2, 2021, to April 10, 2021; the roots of the words and all variants of the terms were searched (singular or plural, past tense, gerund, comparative adjective, and superlative, when possible). The following filters were used for the area of activity according to availability in each database: medicine, engineering (industrial, biomedical, electrical, manufacturing, and mechanics), robotics, health professions, and multidisciplinary.
The following study characteristics were extracted and described: authors' names, year of publication, title, description, data set, features, methods, and main results. The data of this study were presented descriptively and followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement flow diagram [18] and the National Institutes of Health (NIH) Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies [19].

Inclusion and Exclusion Criteria
The criteria for inclusion and exclusion of studies are outlined in Textbox 1.
The search results were exported to the web-based Mendeley software (Elsevier), where duplicate and triplicate records were removed, and the full texts of potentially eligible articles were retrieved.

Textbox 1. Criteria for the inclusion of studies (excerpt).
• Use of laboratory tests

Study Analysis
Regarding the eligibility of the studies, the review process involved an analysis of the titles and keywords and reading of the abstracts by 2 reviewers independently (the first 2 authors of this paper). When eligibility was in doubt, the full text was reviewed. In cases of disagreement between the 2 reviewers, a decision was made by consensus, or a third investigator provided an additional review and the decision was made by arbitration.

Methodological Quality Assessment of the Studies
Beyond the inclusion and exclusion criteria, which were directly related to the objective of the study, an analysis of the quality of the selected articles was also conducted.
The quality of the eligible studies was assessed using tools proposed by the NIH of the United States [19]. This study used the cross-sectional study assessment tool (with 14 criteria). The NIH website [19] provides tools and guidelines for assessing the quality of each type of study, containing explanatory information about each item that should be assessed in the study:
1. Was the research question or objective in this study clearly stated?
2. Was the study population clearly specified and defined?
3. Was the participation rate of eligible persons at least 50%?
4. Were all the participants selected or recruited from the same or similar populations (including the same period)? Were the inclusion and exclusion criteria for being in the study prespecified and applied uniformly to all participants?
5. Was a sample size justification, power description, or variance and effect estimates provided?
6. For the analyses in this study, were the exposures of interest measured before the outcomes were measured?
7. Was the time frame sufficient so that one could reasonably expect to see an association between exposure and outcome if it existed?
8. For exposures that can vary in amount or level, did the study examine different levels of exposure as related to the outcome (eg, categories of exposure or exposure measured as a continuous variable)?
9. Were the exposure measures (independent variables) clearly defined, valid, reliable, and implemented consistently across all study participants?
10. Was the exposure assessed more than once over time?
11. Were the outcome measures (dependent variables) clearly defined, valid, reliable, and implemented consistently across all study participants?
12. Were the outcome assessors blinded to the exposure status of participants?
13. Was loss to follow-up after baseline 20% or less?
14. Were key potential confounding variables measured and adjusted statistically for their impact on the relationship between exposure and outcome?
The quality rating was classified as good, fair, or poor, allowing for the general analysis of the evaluators considering all items [19]. Each item in the assessment tool received a check mark ("✓") when the criterion was met, a negative ("-") when it was not, or one of the other options (cannot be determined, not applicable, or not reported).
According to Wong et al [20], observational studies with ≥67% of items rated positively indicated good quality, 34% to 66% indicated fair quality, and ≤33% indicated poor quality.
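As an illustration of how the checklist maps to an overall rating, the following sketch (our own, not part of the NIH tool) counts positively rated items over all 14 criteria, so that responses such as "not applicable" lower the percentage, and applies the thresholds from Wong et al [20]:

```python
# Minimal sketch (illustration only): computing an overall quality rating
# from the 14-item checklist. Every item counts toward the denominator, so
# "not applicable" or "cannot be determined" responses reduce the score.

def quality_rating(answers):
    """answers: list of 14 responses, each one of 'yes', 'no',
    'cannot be determined', 'not applicable', or 'not reported'."""
    positive = sum(1 for a in answers if a == "yes")
    score = positive / len(answers)  # fraction of positively rated items
    if score >= 0.67:
        return score, "good"
    elif score >= 0.34:
        return score, "fair"
    return score, "poor"

# Example: 13 of 14 items rated positively gives ~93%, matching one of the
# score levels reported for the reviewed studies.
print(quality_rating(["yes"] * 13 + ["not applicable"]))  # (0.928..., 'good')
```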
Table 2. Summary of the main characteristics of the selected studies (description, data set, features, methods, and main results). Excerpts include the following:
• A study using random forest, lasso-based regularized generalized linear models, and artificial neural networks, with age and complete blood count parameters as features; the data set included anonymized complete blood count results from 5664 patients seen at the Hospital Israelita Albert Einstein (São Paulo, Brazil) from March 2020 to April 2020 who had samples collected for SARS-CoV-2 testing.
• A 2020 study [43] validating the ability of an algorithm that uses laboratory and demographic information to identify patients at increased risk of colorectal cancer; the algorithm identified 3% of the population who required an investigation and 35% of patients who received a diagnosis of colorectal cancer within the following 6 months.
• A 2021 study [54] in which binary logistic regression (enter method) was conducted to determine the influence of the parameters on the outcome; the combined WBC-HFLC marker was the best diagnostic marker for both mild and serious disease, CRP and lymphocyte count were early indicators of progression to serious disease, and WBC, NEUT, IG, and the NLR were the best indicators of critical disease.

Principal Findings
This study aimed to identify studies that used laboratory tests to predict new results. Our interest in this line of study was motivated by the possibility that laboratory tests can be used more comprehensively to search for hidden information, discovering previously unknown pathologies. This methodology is highly advantageous for the diagnostic process of medical laboratories. In this sense, intelligent systems could automatically analyze the examinations performed on a patient and make predictions in the search for hidden pathologies. In positive cases, alarms would be generated, and complementary examinations would be suggested. In most cases, the collected sample could be used to carry out new tests.
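A minimal sketch of how such a screening step could sit in a laboratory workflow follows; the model, the alert threshold, and the suggested follow-up test (ferritin) are hypothetical placeholders trained on synthetic stand-in data, not any reviewed study's method:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical sketch of the screening workflow described above: a model
# trained on routine panels scores each new result set, and scores above a
# threshold raise an alarm with a suggested complementary test that could
# be run on the already collected sample.

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))           # stand-in for 10 routine lab values
y = (X[:, 0] + X[:, 1] > 1).astype(int)  # stand-in for a confirmed diagnosis
model = LogisticRegression().fit(X, y)

def screen_panel(panel, threshold=0.8, follow_up="ferritin"):
    """Score one panel of results; alarm and suggest a follow-up test
    when the predicted risk exceeds the threshold."""
    risk = model.predict_proba(panel.reshape(1, -1))[0, 1]
    return {"alarm": bool(risk >= threshold), "risk": float(risk),
            "suggested_test": follow_up if risk >= threshold else None}

print(screen_panel(rng.normal(size=10)))
```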
The use of laboratory tests to predict results has been increasingly explored. In recent years, several studies have obtained good results using clinical data to search for diagnoses [58]. In addition to laboratory tests, those studies used patient histories, imaging tests, and medical diagnoses. For example, Wu et al [59] and Hische et al [60] also made use of other clinical data, in addition to laboratory tests, in the search for a diagnosis. Some studies, such as those by Ravaut et al [61] and Le et al [62], aimed to determine whether a patient was likely to develop a disease in the future, which is quite relevant as part of a process in predictive medicine. These studies obtained good results but used clinical or diagnostic data, information generated through analysis by a physician, unlike most laboratory tests such as the complete blood count, which follow an automated analytical process without the intervention of human factors.
However, in this research, we only looked for studies that emphasized laboratory tests to predict new information. This methodology can innovate the diagnostic processes of medical laboratories and has attracted the interest of several researchers over time, especially in recent years owing to the COVID-19 pandemic. In total, we found 40 studies referring to the last decade that met the established criteria, with most studies published in 2020 (15/40, 38%) and 2021 (10/40, 25%).
All (40/40, 100%) the studies presented in this review used laboratory tests as input data, in addition to some clinical data such as gender and age. Some (12/40, 30%) studies used >20 parameters, such as the study by Yadaw et al [42], which used >100 different parameters. Others (6/40, 15%) used very few parameters, as in the work by Joshi et al [38], which used only 3 parameters (absolute neutrophil count, absolute lymphocyte count, and hematocrit). However, most (22/40, 55%) studies used approximately 10 parameters, with the complete blood count as the primary data source. Finally, 22% (9/40) of the studies used complete blood count data only.
When analyzing the quality assessment tool (Table 1), all studies showed good results, with an average value of 88%. As most of the studies were characterized as retrospective cohort studies, the data used were generated before the research. Thus, questions 8 and 10 of the questionnaire [19], referring to the levels and amount of exposure, were answered mainly with not applicable or cannot be determined. This lowered the average slightly in the evaluation process of most (38/40, 95%) studies. However, 5% (2/40) of the studies [29,31] were evaluated at 100%. Another 45% (18/40) of the studies were evaluated at 93%, 32% (13/40) at 86%, and 18% (7/40) at 79%. Table 2 presents a summary of the main characteristics of the studies, including a brief description of each as well as its methodology and main results in simplified form.
It is not possible to make a direct comparison between the methodology and results of the selected studies as they had different objectives. Our goal was to confirm the possibility of predicting specific examinations from other examinations and to identify the ML methods and parameters most used.
Regarding the models, most (39/40, 98%) studies used ML methods with supervised training, almost always with the test responsible for the diagnosis as the prediction target. Of the 40 studies selected, only 3 (8%) used regression methods, whereas the other 37 (92%) used classification methods. Among the most used models were logistic regression, random forest, support vector machine, and k-nearest neighbor, trained as binary classifiers. Neural networks were almost always used with deep learning techniques (deep neural networks [DNNs]).
The random forest method was the most tested, with 50% (20/40) of the studies using it. The next most tested methods were logistic regression with 45% (18/40) of the studies and support vector machine with 35% (14/40) of the studies, followed by naïve Bayes, decision tree, and XGBoost with 25% (10/40) of the studies each. By contrast, artificial neural networks were tested in 18% (7/40) of the studies, in addition to DNN methods in another 15% (6/40) of the studies.
In general, the most efficient method was the DNN: of the 6 studies that used this method, 5 (83%) obtained their best results with it. Next was the XGBoost method: of the 10 studies that used it, 7 (70%) considered it the best. This was followed by random forest: of the 20 studies that tested this method, 12 (60%) obtained their best results with it. In simplified terms, DNNs outperformed the other methods in 83% of the studies that tested them, followed by XGBoost (70%) and random forest (60%). Although the DNN model presented better results, the random forest method is quite attractive, not only because it is simple and fast but also because it exposes the path taken in the search for the result, which is quite relevant in health care research.
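To ground the comparison, the sketch below cross-validates several of the binary classifiers named above on synthetic stand-in data (not any reviewed study's data set; the class imbalance and feature count are our assumptions); an XGBoost or DNN model would slot into the same loop:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for ~10 routine laboratory parameters and a binary
# diagnosis label, with a 90/10 class split (an assumption for illustration).
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.9],
                           random_state=0)

models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "random forest": RandomForestClassifier(random_state=0),
    "support vector machine": make_pipeline(StandardScaler(), SVC(probability=True)),
    "k-nearest neighbor": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

# 5-fold cross-validated AUROC, the metric most reviewed studies reported.
for name, model in models.items():
    aucs = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUROC = {aucs.mean():.3f}")
```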
Research that initially caught our attention was conducted by Luo et al [5] to predict ferritin levels to detect patients with anemia. The research used 41 laboratory tests from 989 patients admitted to a tertiary care hospital in Boston, Massachusetts, over 3 months in 2013. The work had good results, with an area under the curve (AUC) of 97%. Most interestingly, even in cases where the ferritin tests were false negatives, the system could detect anemia. This result shows that laboratory tests may carry more information when analyzed holistically than when each result is interpreted in isolation.
Rawson et al [28] used laboratory tests to identify cases of bacterial infection among 160,203 hospitalized patients over 6 months. An interesting feature of this research is that only 6 tests were used as input parameters (C-reactive protein, white blood cell count, bilirubin, creatinine, alanine aminotransferase, and alkaline phosphatase), achieving good results with an area under the receiver operating characteristic curve of 0.84. The use of a low number of examinations was an important factor in building the model as it allows tests already performed on patients to be reused, making the screening process fast and straightforward without collecting additional blood samples.
Of the selected studies, 8% (3/40) focused on the prediction of colorectal cancer, which has a high incidence rate and accounts for many deaths worldwide. The early identification of this type of pathology can be very advantageous to governments and health systems, which can provide adequate treatment to prevent the worsening of the disease. Kinar et al [23] obtained good results in identifying patients with a propensity to develop colorectal cancer 1 year before the development of the disease; in this study, 20 parameters from the complete blood counts of approximately 2 million patients were used. Similarly, Birks et al [26] used the complete blood counts of 2.5 million patients, obtaining an AUC of 75% for more extended periods (3 years) and 85% for shorter periods (6 months). More recently, Schneider et al [44] also obtained a mean AUC of 78% in a study of approximately 2.8 million patients seen between 1996 and 2015.
Another 12% (5/40) of the studies [7,29,32,36,48] aimed to identify tests that would not change over time, remaining classified as normal without the need to be repeated. In general, all of them showed good results; however, we highlight the work by Xu et al [32], who obtained an AUC of >90% for 12 months of analysis.
A recent publication that also caught our attention was the work by Park et al [50]. The authors used deep learning models to predict 39 different diseases, reaching an accuracy of >90% and an F1-score of 81% for the 5 most common diseases. They used 88 features from 5145 patients who visited the emergency room.
The use of laboratory tests and ML techniques has increased in recent years, mainly owing to the COVID-19 pandemic. Of the 40 studies in this review, 27 (68%) were published between 2020 and 2022. Of these 27 studies, 19 (70%) were related to SARS-CoV-2; 8 (30%) studies were published in 2020, 9 (33%) in 2021, and 1 (4%) in 2022. All of them used laboratory tests to predict some unknown information, and most (34/40, 85%) studies focused on the search for a diagnosis.
Analyzing aspects related to training and the potential for bias in the data sets, a common feature was that most (37/40, 92%) studies treated the task as a classification problem using supervised models. A point to consider in this process is that the target classes of the models are almost always defined by a medical diagnosis or a reference value. In class prediction, results with values close to the classification margins may be affected, influencing the final result of the model.
Another aspect that draws attention is that the data sets were highly imbalanced, with some (3/40, 8%) studies [21,23,26] in which the target represented <1% of the data set, requiring care to avoid errors in the training and evaluation process. In this sense, most (34/40, 85%) of the analyzed studies used the area under the receiver operating characteristic curve as the main evaluation metric, with an average value of approximately 85%. Although this metric is quite common in health-related problems, some authors advocate [63] the area under the precision-recall curve as the more appropriate metric for strongly imbalanced data sets.
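The distinction matters in practice: on a strongly imbalanced data set, the area under the receiver operating characteristic curve can look reassuring while the precision-recall curve reveals many false positives. A minimal sketch using synthetic data with a ~1% positive class (mimicking, as an assumption, the imbalance reported in some reviewed studies):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data with a ~1% positive class; not any reviewed study's data.
X, y = make_classification(n_samples=50000, n_features=10,
                           weights=[0.99], flip_y=0.01, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
print("AUROC:", roc_auc_score(y_te, scores))            # often looks high
print("AUPRC:", average_precision_score(y_te, scores))  # usually far lower
```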
Considering the aspects discussed, we question whether, in the search for a diagnosis, it would not be more appropriate to treat the prediction of new tests as a regression problem, leaving the responsibility for decision-making to health professionals.
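A sketch of that alternative framing follows: instead of a binary label derived from a reference range, a regressor predicts the analyte value itself, and the clinician interprets it. The data, model, and the ferritin-like target are purely illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Illustrative regression framing: predict a continuous analyte value (a
# synthetic, ferritin-like stand-in) from other routine laboratory results,
# leaving the classification decision to the health professional.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))  # stand-in for other laboratory parameters
y = 50 + 20 * X[:, 0] - 10 * X[:, 2] + rng.normal(scale=5, size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
reg = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
predicted = reg.predict(X_te[:1])[0]
print(f"Predicted analyte value: {predicted:.1f} (to be interpreted clinically)")
```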

Limitations
One of the limitations of this study was the way in which the articles were selected: in the initial review, only the titles, keywords, and abstracts were analyzed.
Another limitation was the exclusion of studies whose data sources consisted of imaging examinations or clinical history and of studies whose objective was not prediction.
These criteria greatly reduced the number of selected studies. However, our objective was to analyze only studies that had a main focus on the use of laboratory tests. These requirements are fundamental in building models that can automatically analyze test results without affecting the processes of medical laboratories.

Conclusions
In the search for scientific research that used laboratory tests and ML models to predict new information, 40 studies were found that fit the established criteria. Among these, all (40/40, 100%) sought to predict unknown information, with most (34/40, 85%) focused on the search for a diagnosis.
We have seen a large increase in the use of this methodology in recent years, mainly motivated by the COVID-19 pandemic. Of the 40 works selected from the last decade, 27 (68%) were published between 2020 and 2022, with most of these (19/27, 70%) focusing on SARS-CoV-2.
All (40/40, 100%) studies used laboratory tests as their main input data, and the complete blood count was the most used. The use of routine examinations is encouraged, mainly as they are performed more frequently and are more widely available. Among the prediction methods, most (39/40, 98%) studies used ML models with supervised learning. These techniques have spread and obtained good results over the years, and binary classification models are still the most used, with XGBoost and DNNs being the models with the best results. These models almost always seek to determine whether a specific event occurs, which has proved very useful in the triage of hospitalized patients and in the search for a diagnosis.
In general, all the evaluated studies presented good results, making predictions according to the research objective. Responding to the objectives of this work, we conclude that it is possible to predict specific tests from other laboratory tests, with the complete blood count being the most used in the prediction of new results. The most used method was binary classification with supervised learning.
Thus, the use of laboratory tests together with ML techniques represents innovative potential for the processes of medical laboratories, allowing for a more comprehensive analysis of the tests performed and enabling the early discovery of unknown pathologies or of errors in the tests. This automatic analysis is very advantageous as it is low cost and does not interfere with the processes already established by medical laboratories.