%0 Journal Article
        
%@ 2563-3570
%I JMIR Publications
%V 5
%N 
%P e62747
%T Eco-Evolutionary Drivers of Vibrio parahaemolyticus Sequence Type 3 Expansion: Retrospective Machine Learning Approach
%A Campbell,Amy Marie
%A Hauton,Chris
%A van Aerle,Ronny
%A Martinez-Urtaza,Jaime
%+ Department of Genetics and Microbiology, Autonomous University of Barcelona, Facultat de Biociències, oficina C3/109, Campus de la UAB, Bellaterra, Barcelona, 08193, Spain, 34 93 581 2729, jaime.martinez.urtaza@uab.cat
%K pathogen expansion
%K climate change
%K machine learning
%K ecology
%K evolution
%K Vibrio parahaemolyticus
%K sequencing
%K sequence type 3
%K VpST3
%K genomics
%D 2024

%7 28.11.2024
%9 Original Paper
%J JMIR Bioinform Biotech
%G English
                
%X Background: Environmentally sensitive pathogens exhibit ecological and evolutionary responses to climate change that result in the emergence and global expansion of well-adapted variants. It is imperative to understand the mechanisms that facilitate pathogen emergence and expansion, as well as the drivers behind the mechanisms, to understand and prepare for future pandemic expansions. Objective: The unique, rapid, global expansion of a clonal complex of Vibrio parahaemolyticus (a marine bacterium causing gastroenteritis infections) named Vibrio parahaemolyticus sequence type 3 (VpST3) provides an opportunity to explore the eco-evolutionary drivers of pathogen expansion. Methods: The global expansion of VpST3 was reconstructed using VpST3 genomes, which were then classified into metrics characterizing the stages of this expansion process, indicative of the stages of emergence and establishment. We used machine learning, specifically a random forest classifier, to test a range of ecological and evolutionary drivers for their potential in predicting VpST3 expansion dynamics. Results: We identified a range of evolutionary features, including mutations in the core genome and accessory gene presence, associated with expansion dynamics. A range of random forest classifier approaches were tested to predict expansion classification metrics for each genome. The highest predictive accuracies (ranging from 0.722 to 0.967) were achieved for models using a combined eco-evolutionary approach. While population structure and the difference between introduced and established isolates could be predicted to a high accuracy, our model reported multiple false positives when predicting the success of an introduced isolate, suggesting potential limiting factors not represented in our eco-evolutionary features. Regional models produced for 2 countries reporting the most VpST3 genomes had varying success, reflecting the impacts of class imbalance. Conclusions: These novel insights into evolutionary features and ecological conditions related to the stages of VpST3 expansion showcase the potential of machine learning models using genomic data and will contribute to the future understanding of the eco-evolutionary pathways of climate-sensitive pathogens.
%M 39607996
%R 10.2196/62747
%U https://bioinform.jmir.org/2024/1/e62747
%U https://doi.org/10.2196/62747
%U http://www.ncbi.nlm.nih.gov/pubmed/39607996

%0 Journal Article
        
%@ 2563-3570
%I JMIR Publications
%V 5
%N 
%P e62752
%T Exploring the Intersection of Schizophrenia, Machine Learning, and Genomics: Scoping Review
%A Hudon,Alexandre
%A Beaudoin,Mélissa
%A Phraxayavong,Kingsada
%A Potvin,Stéphane
%A Dumais,Alexandre
%+ Department of Psychiatry and Addictology, Université de Montréal, 2900 Edouard Montpetit Blvd, Montréal, QC, H3T 1J4, Canada, 1 514 648 8461, alexandre.dumais@umontreal.ca
%K schizophrenia
%K genomic data
%K machine learning
%K artificial intelligence
%K classification techniques
%K psychiatry
%K mental health
%K genomics
%K predictions
%K ML
%K psychiatric
%K synthesis
%K review methods
%K searches
%K scoping review
%K prediction models
%D 2024

%7 15.11.2024
%9 Review
%J JMIR Bioinform Biotech
%G English
                
%X Background: An increasing body of literature highlights the integration of machine learning with genomic data in psychiatry, particularly for complex mental health disorders such as schizophrenia. These advanced techniques offer promising potential for uncovering various facets of these disorders. A comprehensive review of the current applications of machine learning in conjunction with genomic data within this context can significantly enhance our understanding of the current state of research and its future directions. Objective: This study aims to conduct a systematic scoping review of the use of machine learning algorithms with genomic data in the field of schizophrenia. Methods: To conduct a systematic scoping review, a search was performed in the electronic databases MEDLINE, Web of Science, PsycNet (PsycINFO), and Google Scholar from 2013 to 2024. Studies at the intersection of schizophrenia, genomic data, and machine learning were evaluated. Results: The literature search identified 2437 eligible articles after removing duplicates. Following abstract screening, 143 full-text articles were assessed, and 121 were subsequently excluded. Therefore, 21 studies were thoroughly assessed. Various machine learning algorithms were used in the identified studies, with support vector machines being the most common. The studies notably used genomic data to predict schizophrenia, identify schizophrenia features, discover drugs, classify schizophrenia amongst other mental health disorders, and predict the quality of life of patients. Conclusions: Several high-quality studies were identified. Yet, the application of machine learning with genomic data in the context of schizophrenia remains limited. Future research is essential to further evaluate the portability of these models and to explore their potential clinical applications. 
%M 39546776
%R 10.2196/62752
%U https://bioinform.jmir.org/2024/1/e62752
%U https://doi.org/10.2196/62752
%U http://www.ncbi.nlm.nih.gov/pubmed/39546776

%0 Journal Article
        
%@ 2563-3570
%I JMIR Publications
%V 5
%N 
%P e58357
%T Enhancing Suicide Risk Prediction With Polygenic Scores in Psychiatric Emergency Settings: Prospective Study
%A Lee,Younga Heather
%A Zhang,Yingzhe
%A Kennedy,Chris J
%A Mallard,Travis T
%A Liu,Zhaowen
%A Vu,Phuong Linh
%A Feng,Yen-Chen Anne
%A Ge,Tian
%A Petukhova,Maria V
%A Kessler,Ronald C
%A Nock,Matthew K
%A Smoller,Jordan W
%+ Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, 185 Cambridge St, 6th Floor, Boston, MA, 02114, United States, 1 617 724 0835, jsmoller@mgh.harvard.edu
%K polygenic risk score
%K suicide risk prediction
%K suicide attempt
%K predictive algorithms
%K genomics
%K genotypes
%K electronic health record
%K machine learning
%D 2024

%7 23.10.2024
%9 Original Paper
%J JMIR Bioinform Biotech
%G English
                
%X Background: Despite growing interest in the clinical translation of polygenic risk scores (PRSs), it remains uncertain to what extent genomic information can enhance the prediction of psychiatric outcomes beyond the data collected during clinical visits alone. Objective: This study aimed to assess the clinical utility of incorporating PRSs into a suicide risk prediction model trained on electronic health records (EHRs) and patient-reported surveys among patients admitted to the emergency department. Methods: Study participants were recruited from the psychiatric emergency department at Massachusetts General Hospital. There were 333 adult patients of European ancestry who had high-quality genotype data available through their participation in the Mass General Brigham Biobank. Multiple neuropsychiatric PRSs were added to a previously validated suicide prediction model in a prospective cohort enrolled between February 4, 2015, and March 13, 2017. Data analysis was performed from July 11, 2022, to August 31, 2023. Suicide attempt was defined using diagnostic codes from longitudinal EHRs combined with 6-month follow-up surveys. The clinical risk score for suicide attempt was calculated from an ensemble model trained using an EHR-based suicide risk score and a brief survey, and it was subsequently used to define the baseline model. We generated PRSs for depression, bipolar disorder, schizophrenia, suicide attempt, and externalizing traits using a Bayesian polygenic scoring method for European ancestry participants. Model performance was evaluated using area under the receiver operating characteristic curve (AUC), area under the precision-recall curve, and positive predictive values. Results: Of the 333 patients (n=178, 53.5% male; mean age 36.8, SD 13.6 years; n=333, 100% non-Hispanic and n=324, 97.3% self-reported White), 28 (8.4%) had a suicide attempt within 6 months. Adding either the schizophrenia PRS or all PRSs to the baseline model resulted in the numerically highest discrimination (AUC 0.86, 95% CI 0.73-0.99) compared to the baseline model (AUC 0.84, 95% CI 0.70-0.98). However, the improvement in model performance was not statistically significant. Conclusions: In this study, incorporating genomic information into clinical prediction models for suicide attempt did not improve patient risk stratification. Larger studies that include more diverse participants are required to validate whether the inclusion of psychiatric PRSs in clinical prediction models can enhance the stratification of patients at risk of suicide attempts.
%M 39442166
%R 10.2196/58357
%U https://bioinform.jmir.org/2024/1/e58357
%U https://doi.org/10.2196/58357
%U http://www.ncbi.nlm.nih.gov/pubmed/39442166

%0 Journal Article
        
%@ 2563-3570
%I JMIR Publications
%V 5
%N 
%P e56538
%T Deep Learning–Based Identification of Tissue of Origin for Carcinomas of Unknown Primary Using MicroRNA Expression: Algorithm Development and Validation
%A Raghu,Ananya
%A Raghu,Anisha
%A Wise,Jillian F
%+ Department of Biology and Biomedical Sciences, Salve Regina University, 100 Ochre Point Avenue, Newport, RI, 02840, United States, 1 401 847 6650 ext 2822, jillian.wise@salve.edu
%K cancer genomics
%K machine learning algorithms
%K deep learning
%K gene expression
%K RNA
%K RNAs
%K cancer
%K oncology
%K tumor
%K tumors
%K tissue
%K tissues
%K metastatic
%K microRNA
%K microRNAs
%K gene
%K genes
%K genomic
%K genomics
%K machine learning
%K algorithm
%K algorithms
%K carcinoma
%K genetics
%K genome
%K detection
%K bioinformatics
%D 2024

%7 24.7.2024
%9 Original Paper
%J JMIR Bioinform Biotech
%G English
                
%X Background: Carcinoma of unknown primary (CUP) is a subset of metastatic cancers in which the primary tissue source of the cancer cells remains unidentified. CUP is the eighth most common malignancy worldwide, accounting for up to 5% of all malignancies. Representing an exceptionally aggressive metastatic cancer, the median survival is approximately 3 to 6 months. The tissue in which cancer arises plays a key role in our understanding of sensitivities to various forms of cell death. Thus, the lack of knowledge on the tissue of origin (TOO) makes it difficult to devise tailored and effective treatments for patients with CUP. Developing quick and clinically implementable methods to identify the TOO of the primary site is crucial in treating patients with CUP. Noncoding RNAs may hold potential for origin identification and provide a robust route to clinical implementation due to their resistance against chemical degradation. Objective: This study aims to investigate the potential of microRNAs, a subset of noncoding RNAs, as highly accurate biomarkers for detecting the TOO through data-driven, machine learning approaches for metastatic cancers. Methods: We used microRNA expression data from The Cancer Genome Atlas data set and assessed various machine learning approaches, from simple classifiers to deep learning approaches. As a test of our classifiers, we evaluated the accuracy on a separate set of 194 primary tumor samples from the Sequence Read Archive. We used permutation feature importance to determine the potential microRNA biomarkers and assessed them with principal component analysis and t-distributed stochastic neighbor embedding visualizations. Results: Our results show that it is possible to design robust classifiers to detect the TOO for metastatic samples on The Cancer Genome Atlas data set, with an accuracy of up to 97% (351/362), which may be used in situations of CUP. Our findings show that deep learning techniques enhance prediction accuracy. We progressed from an initial accuracy prediction of 62.5% (226/362) with decision trees to 93.2% (337/362) with logistic regression, finally achieving 97% (351/362) accuracy using deep learning on metastatic samples. On the Sequence Read Archive validation set, a lower accuracy of 41.2% (77/188) was achieved by the decision tree, while deep learning achieved a higher accuracy of 80.4% (151/188). Notably, our feature importance analysis showed the top 3 most important features for predicting TOO to be microRNA-10b, microRNA-205, and microRNA-196b, which aligns with previous work. Conclusions: Our findings highlight the potential of using machine learning techniques to devise accurate tests for detecting TOO for CUP. Since microRNAs are carried throughout the body via extracellular vesicles secreted from cells, they may serve as key biomarkers for liquid biopsy due to their presence in blood plasma. Our work serves as a foundation toward developing blood-based cancer detection tests based on the presence of microRNA.
%M 39046787
%R 10.2196/56538
%U https://bioinform.jmir.org/2024/1/e56538
%U https://doi.org/10.2196/56538
%U http://www.ncbi.nlm.nih.gov/pubmed/39046787

%0 Journal Article
        
%@ 2563-3570
%I JMIR Publications
%V 5
%N 
%P e52059
%T Machine Learning Models for Prediction of Maternal Hemorrhage and Transfusion: Model Development Study
%A Ahmadzia,Homa Khorrami
%A Dzienny,Alexa C
%A Bopf,Mike
%A Phillips,Jaclyn M
%A Federspiel,Jerome Jeffrey
%A Amdur,Richard
%A Rice,Madeline Murguia
%A Rodriguez,Laritza
%+ Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, Inova Health System, 3300 Gallows Road, Falls Church, VA, 22042, United States, 1 571 472 0920, homa.ahmadzia@inova.org
%K postpartum hemorrhage
%K machine learning
%K prediction
%K maternal
%K predict
%K predictive
%K bleeding
%K hemorrhage
%K hemorrhaging
%K birth
%K postnatal
%K blood
%K transfusion
%K antepartum
%K obstetric
%K obstetrics
%K women's health
%K gynecology
%K gynecological
%D 2024

%7 5.2.2024
%9 Original Paper
%J JMIR Bioinform Biotech
%G English
                
%X Background: Current postpartum hemorrhage (PPH) risk stratification is based on traditional statistical models or expert opinion. Machine learning could optimize PPH prediction by allowing for more complex modeling. Objective: We sought to improve PPH prediction and compare machine learning and traditional statistical methods. Methods: We developed models using the Consortium for Safe Labor data set (2002-2008) from 12 US hospitals. The primary outcome was a transfusion of blood products or PPH (estimated blood loss of ≥1000 mL). The secondary outcome was a transfusion of any blood product. Fifty antepartum and intrapartum characteristics and hospital characteristics were included. Logistic regression, support vector machines, multilayer perceptron, random forest, and gradient boosting (GB) were used to generate prediction models. The area under the receiver operating characteristic curve (ROC-AUC) and area under the precision/recall curve (PR-AUC) were used to compare performance. Results: Among 228,438 births, 5760 (3.1%) women had a postpartum hemorrhage, 5170 (2.8%) had a transfusion, and 10,344 (5.6%) met the criteria for the transfusion-PPH composite. Models predicting the transfusion-PPH composite using antepartum and intrapartum features had the best positive predictive values, with the GB machine learning model performing best overall (ROC-AUC=0.833, 95% CI 0.828-0.838; PR-AUC=0.210, 95% CI 0.201-0.220). The most predictive features in the GB model predicting the transfusion-PPH composite were the mode of delivery, oxytocin incremental dose for labor (mU/minute), intrapartum tocolytic use, presence of anesthesia nurse, and hospital type. Conclusions: Machine learning offers higher discriminability than logistic regression in predicting PPH. The Consortium for Safe Labor data set may not be optimal for analyzing risk due to strong subgroup effects, which decreases accuracy and limits generalizability. 
%M 38935950
%R 10.2196/52059
%U https://bioinform.jmir.org/2024/1/e52059
%U https://doi.org/10.2196/52059
%U http://www.ncbi.nlm.nih.gov/pubmed/38935950