Published in Vol 6 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/68848.
Lung Cancer Diagnosis From Computed Tomography Images Using Deep Learning Algorithms With Random Pixel Swap Data Augmentation: Algorithm Development and Validation Study


Authors of this article:

Ayomide Adeyemi Abe1; Mpumelelo Nyathi1

Department of Medical Physics, Sefako Makgatho Health Sciences University, Molotlegi St, Zone 1, Garankuwa, Pretoria, South Africa

Corresponding Author:

Ayomide Adeyemi Abe, PhD


Background: Deep learning (DL) shows promise for automated lung cancer diagnosis, but limited clinical data can restrict performance. While data augmentation (DA) helps, existing methods struggle with chest computed tomography (CT) scans across diverse DL architectures.

Objective: This study proposes Random Pixel Swap (RPS), a novel DA technique, to enhance diagnostic performance in both convolutional neural networks and transformers for lung cancer diagnosis from CT scan images.

Methods: RPS generates augmented data by randomly swapping pixels within patient CT scan images. We evaluated it on ResNet, MobileNet, Vision Transformer, and Swin Transformer models, using 2 public CT datasets (Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases [IQ-OTH/NCCD] dataset and chest CT scan images dataset), and measured accuracy and area under the receiver operating characteristic curve (AUROC). Statistical significance was assessed via paired t tests.

Results: The RPS outperformed state-of-the-art DA methods (Cutout, Random Erasing, MixUp, and CutMix), achieving 97.56% accuracy and 98.61% AUROC on the IQ-OTH/NCCD dataset and 97.78% accuracy and 99.46% AUROC on the chest CT scan images dataset. While traditional augmentation approaches (flipping and rotation) remained effective, RPS complemented them, surpassing results reported in prior studies and demonstrating the potential of artificial intelligence for early lung cancer detection.

Conclusions: The RPS technique enhances convolutional neural network and transformer models, enabling more accurate automated lung cancer detection from CT scan images.

JMIR Bioinform Biotech 2025;6:e68848

doi:10.2196/68848




Background

Lung cancer is a lethal disease characterized by uncontrolled cell growth in the lungs [1]. These malignant cells can proliferate, invade nearby tissues, and metastasize to other parts of the body [2]. The disease progresses through distinct stages, with advanced stages often proving fatal [3]. Lung cancer comprises multiple histological types and subtypes, affecting individuals regardless of gender [4]. Globally, lung cancer remains the leading cause of cancer-related mortality [5]. In 2020 alone, it accounted for 1.8 million deaths, ranking as the 6th leading cause of death worldwide among individuals younger than 70 years [2]. A key contributor to this high mortality is the frequent absence of early symptoms, leading to late-stage diagnosis and poorer outcomes [6]. The 5-year survival rate for lung cancer patients remains low, emphasizing the critical need for early detection [7]. Early diagnosis significantly improves prognosis, reduces long-term treatment costs, expands therapeutic options, and alleviates the burden on caregivers and families [1,8-10]. However, most cases are still detected at advanced stages, drastically limiting survival rates [5]. These challenges underscore lung cancer as a major public health priority.

Computed tomography (CT) is a medical imaging technique that produces high-resolution cross-sectional images of the lungs, providing detailed anatomical information for clinical evaluation [11]. As a noninvasive diagnostic tool, CT imaging has become indispensable for the early detection of lung cancer, offering superior sensitivity compared to conventional radiography [12,13]. However, the interpretation of CT scans presents significant challenges in clinical practice. The process demands considerable expertise from radiologists, as subtle early-stage malignancies may demonstrate imaging features that escape human detection, potentially leading to diagnostic oversights [14,15]. The subjective nature of image interpretation introduces variability in diagnostic accuracy among practitioners, which can result in false-positive identification of pulmonary nodules. Such errors may prompt unnecessary invasive procedures for confirmation, exposing patients to avoidable risks and health care systems to additional costs [13]. Furthermore, the comprehensive evaluation of CT examinations is particularly demanding, as each study comprises numerous sequential slices, requiring both individual assessment and integrated analysis. This labor-intensive process frequently overwhelms available radiological resources, contributing to diagnostic delays and extended patient waiting periods [15-17]. To address these limitations, computer-assisted diagnostic systems have been developed to augment radiologists’ interpretive capabilities [18]. These automated solutions employ advanced algorithms to analyze CT images, enhancing diagnostic accuracy while improving workflow efficiency [19]. By integrating such technological advancements into clinical practice, health care providers can mitigate the current challenges associated with manual CT interpretation, ultimately improving patient outcomes through more timely and reliable diagnoses.

The application of computer algorithms for the automated early diagnosis of lung cancer from CT scan images has evolved considerably. Early approaches used radiomics and machine learning techniques, but recent advancements have established deep learning (DL) as the predominant methodology [20]. Unlike traditional methods that depend on manually engineered features, a process prone to bias and time constraints, DL employs artificial neural networks to autonomously extract sophisticated features through training [21]. Among DL architectures, both convolutional neural networks (CNNs) and Vision Transformers have demonstrated exceptional potential for the early detection of lung cancer [22]. CNNs gained prominence after 2012, while Vision Transformers emerged in 2020 [23], with both now leading innovations in automated CT scan analysis [18,19].

CNNs and transformers offer distinct advantages for medical image analysis. CNNs, with their inductive bias for spatial locality and translation invariance, benefit from a simpler, parameter-efficient architecture rooted in spatial priors, which is highly effective and easier to train on smaller datasets [24,25]. They specialize in extracting local features and understanding spatial relationships between adjacent pixels. In contrast, transformers excel at capturing long-range dependencies across the entire image [26]. Vision Transformers are particularly scalable, maintaining image resolution better than CNNs during processing [27]. Their parallel processing capability also enables faster training times compared to similarly complex CNNs [28], although they typically require larger training datasets to achieve comparable performance [29]. Recent developments have seen the rise of hybrid networks that combine CNN and transformer architectures, successfully integrating both local and global feature extraction to overcome the limitations of standalone approaches [30,31].

Despite their capabilities, DL models face significant data-related challenges. While these architectures proficiently automate nodule detection, classification, and segmentation in CT scans [32], they demand extensive training data to outperform radiologist interpretations [33]. The scarcity of annotated medical CT datasets presents a major constraint [34], as creating such datasets requires time-consuming, expert-driven image labeling [35]. Data augmentation (DA) has emerged as a crucial solution to expand dataset size and diversity [36], enhancing both the quantity and quality of available training samples [37]. However, selecting appropriate DA techniques for chest CT analysis remains challenging due to several factors, including the variable effectiveness of methods across different datasets and domains [38], potential label distortions and crucial information loss caused by certain transformations [39], and current limitations in improving performance for both CNN and transformer architectures [37,40]. To address these challenges, this study proposes the Random Pixel Swap (RPS) augmentation method, specifically designed to enhance the generalization capabilities of both architectural paradigms in lung cancer diagnosis from chest CT scan images.

Related Work

The effectiveness of DA in training large neural networks was first conclusively demonstrated in 2012 [41], sparking the development of numerous innovative techniques [37]. These methods primarily fall into 2 categories: data synthesis and data transformation [36]. Data synthesis techniques generate novel samples that maintain statistical similarity to the original training data, while data transformation techniques create variations by modifying existing training samples. Both approaches effectively increase training dataset size, quality, and diversity, although they differ significantly in implementation. Data synthesis typically requires parameter learning, a process that can prove computationally intensive and often demands substantial training data to achieve optimal results [42]. In contrast, data transformation techniques generally avoid parameter learning and consequently require less computational resources. Traditional data transformation methods include fundamental image manipulations such as flipping, rotation, cropping, translation, and photometric adjustments (modifications to brightness, saturation, contrast, and hue) [36]. More sophisticated approaches like Cutout [43], Random Erasing [44], MixUp [45], and CutMix [46] have subsequently emerged, achieving state-of-the-art performance across various domains. These advanced techniques have been employed in lung cancer diagnosis from CT scan images [47-49].

The following section provides a comprehensive examination of the Cutout, Random Erasing, MixUp, and CutMix techniques, analyzing their limitations in medical imaging applications and contrasting them with the proposed RPS method. This comparative analysis establishes the foundational rationale for developing specialized augmentation approaches optimized for medical image analysis challenges.

Cutout DA Technique

The Cutout technique randomly selects square regions within images and masks their pixel values [43]. While effective for improving model robustness against occlusions in natural images, this approach presents significant limitations for medical CT scans. The method’s potential to eliminate critical diagnostic information (such as cancerous regions) may degrade performance [38]. Additionally, the masking process can inadvertently alter image labels, further limiting effectiveness [39]. Unlike Cutout, our RPS approach avoids information loss. It preserves diagnostic information by replacing masked regions with pixel values that are derived from other areas within the same CT scan while maintaining original labels.

Random Erasing DA Technique

Random Erasing extends Cutout’s functionality by supporting both square and rectangular masks of varying sizes [44]. This technique randomly selects image regions for erasure and replaces them with random pixel values. While the variable mask sizes increase dataset diversity compared to Cutout, the method still suffers from information loss and label alteration issues [36,40]. These limitations are particularly problematic for medical imaging, where preserving anatomical content is crucial.

MixUp DA Technique

MixUp generates new samples through linear interpolation of pixel values and labels from 2 distinct images [45]. This approach enhances model generalization by preventing label memorization and improving adversarial robustness. However, the technique’s potential to blur important anatomical boundaries and the requirement of careful hyperparameter tuning can create a bottleneck in medical contexts [47,48]. Furthermore, its convergence speed is often suboptimal [47]. RPS addresses these limitations by operating within single patient scans rather than mixing data across patients and by employing a single hyperparameter for more efficient training.

CutMix DA Technique

CutMix combines aspects of previous methods by cutting patches from one image and pasting them onto another while proportionally blending labels [46]. Although this approach leverages the benefits from both Cutout and MixUp, the label blending can introduce noise that degrades model performance [50]. For medical CT scans, combining patches from different patients may confuse learning models, particularly when dealing with subtle pathological features [51]. RPS overcomes these challenges by performing pixel swaps exclusively within individual patient scans and preserving original labels without blending. Figure 1 visually contrasts these techniques with the proposed RPS method.

Figure 1. Computed tomography images for various data augmentation techniques. (A) MixUp; (B) CutMix; (C) Cutout; (D) Random Erasing; (E) Random Pixel Swap. The original image is in column 1, while the augmented images are in columns 2, 3, 4, and 5.

RPS DA Technique

The RPS technique is a parameter-free DA algorithm that operates with a predefined transformation probability. This method partitions input images into 2 distinct regions that serve as source and target areas for patch selection and swapping operations. The study proposes 4 specific implementation approaches, designated as RPSH (vertical), RPSW (horizontal), RPSU (upper right diagonal), and RPSD (upper left diagonal) swap configurations, as illustrated in Figure 2. This multidirectional swapping mechanism provides several advantages: it generates diverse transformations within individual patient CT scans while maintaining pathological plausibility, introduces meaningful variability in the training dataset without requiring parameter learning, and preserves all critical diagnostic information by operating exclusively within each scan’s original pixel values. The technique’s ability to produce multiple distinct transformations from a single image significantly enhances dataset diversity while avoiding the label alteration and information loss issues associated with other augmentation methods.

Figure 2. Four possible swap approaches for the Random Pixel Swap (RPS) data augmentation technique. I is the original image. Areas A1 and A2 are the swap regions. RPSH (vertical), RPSW (horizontal), RPSU (upper right diagonal), and RPSD (upper left diagonal) are the possible swap configurations.

RPS possesses distinct invariant properties compared to other techniques. For an image with N pixels and L intensity levels, the RPS transformation preserves global intensity, as shown in Equations (1)-(3). The technique employs a controlled, systematic, random patch-based pixel swap, rather than a random point-based pixel swap, ensuring that image content is preserved. This approach generates meaningful variations while maintaining pathological truth, thereby retaining clinical relevance in the context of lung cancer diagnosis.

$X' = T(X)$ (1)

where $T$ is a permutation transform.

$p = \frac{n(i)}{N} = \frac{n'(i)}{N'}, \quad i = 0, 1, 2, \ldots, L-1$ (2)

$I_g = \sum_{i=0}^{L-1} \frac{i \, n(i)}{N} = \sum_{i=0}^{L-1} \frac{i \, n'(i)}{N'}$ (3)

where $p$ is the probability of a pixel having intensity $i$; $n(i)$ and $n'(i)$ are the numbers of pixels with intensity level $i$ in $X$ and $X'$, respectively; $N$ and $N'$ are the total numbers of pixels in $X$ and $X'$, respectively; $L$ is the number of intensity levels; and $I_g$ is the average global intensity.
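
These invariants are easy to verify numerically. The snippet below is a toy illustration (not the study's code, and the patch coordinates are arbitrary): because a patch swap only permutes pixels within one image, the histogram n(i) and the global mean intensity Ig are exactly unchanged.

```python
import numpy as np

# Toy check of Equations (1)-(3): swapping two disjoint patches is a
# pixel permutation, so the histogram n(i) and the average global
# intensity Ig of the image are exactly preserved.
rng = np.random.default_rng(42)
x = rng.integers(0, 256, size=(8, 8))

x_aug = x.copy()
patch = x_aug[0:4, 0:4].copy()          # patch in region A1
x_aug[0:4, 0:4] = x_aug[4:8, 4:8]       # paste A2's patch into A1
x_aug[4:8, 4:8] = patch                 # paste A1's patch into A2

assert np.array_equal(np.bincount(x.ravel(), minlength=256),
                      np.bincount(x_aug.ravel(), minlength=256))
assert x.mean() == x_aug.mean()         # Ig unchanged
```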

Implementation of RPS

The RPS technique is implemented by first randomly selecting 2 coordinate points (x₁, x₂) along the x-axis and 2 points (y₁, y₂) along the y-axis within the input image. These coordinates define 2 equal subswap regions: As1, bounded by swap area A1 with corners (x₁, y₁) and (x₂, y₂), and As2, bounded by swap area A2 with corners (x₁, y₁)′ and (x₂, y₂)′. The method incorporates a key hyperparameter called the swap area factor Sf, which ranges from 0.1 to 1.0, to control the extent of augmentation. The actual swap areas Sa1 and Sa2 are derived by scaling the subswap regions by this factor, as specified in Equations (4) and (5). During the augmentation process, the contents of swap area Sa1 are cropped and pasted into swap area Sa2 while the contents of Sa2 are simultaneously transferred to Sa1. This bidirectional swapping ensures comprehensive data transformation while preserving all original image information. The complete RPS procedure is formally described in Textbox 1.

$S_{a1} = A_{s1} \times S_f$ (4)

$S_{a2} = A_{s2} \times S_f$ (5)
Textbox 1. Algorithm 1: Random Pixel Swap data augmentation procedure.

Input: data X with shape H×W

Output: augmented data X*

1: A1 ← ½(H×W)

2: Init: all points P within A1

3: Sf ← Sf ∈ ℚ : Sf ∈ [0.1, 1.0]

4: for Pi, Pj ∈ P do

5:   Randomly select Pi, Pj; Pi′, Pj′ ← Pi2, Pj2

6:   As1 ← Area(Pi, Pj)

7:   As2 ← Area(Pi′, Pj′)

8:   Sa1 ← As1 × Sf

9:   Sa2 ← As2 × Sf

10:  X* ← replace Sa1 with Sa2 in X and Sa2 with Sa1 in X

11: end for

12: return X*
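
To make the procedure concrete, the following is a minimal NumPy sketch of one swap configuration (RPSW, the horizontal swap). The split into halves, patch placement, and boundary handling are our assumed reading of Algorithm 1 and Figure 2, and the function name rps_w is ours; the authors' released code [62] remains the reference implementation.

```python
import numpy as np

def rps_w(image: np.ndarray, sf: float = 1.0, p: float = 1.0,
          rng: np.random.Generator | None = None) -> np.ndarray:
    """Minimal sketch of the RPSW (horizontal) swap configuration.

    Assumed reading of Algorithm 1: the image is split into left and
    right halves (A1 and A2); a random subswap region is chosen in A1,
    scaled by the swap area factor sf, and exchanged with the
    corresponding region in A2.
    """
    rng = rng or np.random.default_rng()
    if rng.random() > p:                    # transformation probability
        return image
    h, w = image.shape[:2]
    half = w // 2                           # A1 = left half, A2 = right half
    out = image.copy()
    # Two x- and two y-coordinates define the subswap region within A1.
    x1, x2 = np.sort(rng.integers(0, half, size=2))
    y1, y2 = np.sort(rng.integers(0, h, size=2))
    # Scale the region by the swap area factor (at least 1 pixel).
    ph = max(1, int((y2 - y1) * sf))
    pw = max(1, int((x2 - x1) * sf))
    # Sa1 in A1 and the equally sized counterpart Sa2 in A2.
    sa1 = out[y1:y1 + ph, x1:x1 + pw].copy()
    sa2 = out[y1:y1 + ph, half + x1:half + x1 + pw].copy()
    # Bidirectional swap: every original pixel value is preserved.
    out[y1:y1 + ph, x1:x1 + pw] = sa2
    out[y1:y1 + ph, half + x1:half + x1 + pw] = sa1
    return out
```

With sf=1.0 and p=1.0, the settings used in the main experiments, every call swaps a full randomly chosen subswap region with its counterpart in the opposite half.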

Swap Area Factor

The swap area factor Sf is a crucial parameter in the RPS technique, representing the ratio of the actual swap area to the subswap region, as described in Equation (6). This factor plays a vital role in the augmentation process for two key reasons: (1) it allows customization for different DL architectures that may benefit from varying swap region sizes, and (2) it helps maintain clinical relevance by limiting distortion of diagnostically important anatomical features. The study proposes two distinct implementations of this parameter: (1) single-value swap area factor (SVSF), which applies a fixed value throughout the augmentation process, and (2) multivalue swap area factor (MVSF), which uses multiple values to generate more diverse swap areas. In both implementations, the swap area factor operates within a defined range of 0.1 to 1.0, providing controlled flexibility for different medical imaging scenarios.

$S_f = \frac{S_a}{A_s}$ (6)
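
To illustrate the two modes, the sketch below drives the rps_w helper from the previous section; the factor values are examples (0.9 was the best single value found for MobileNetV3 on IQ-OTH/NCCD, and the MVSF range keeps the fixed 0.1 lower bound used in the experiments).

```python
import numpy as np

rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(224, 224), dtype=np.uint8)

# SVSF: one fixed swap area factor for every augmented sample.
aug_svsf = rps_w(image, sf=0.9, rng=rng)

# MVSF: draw the factor from a bounded range per sample, with the
# lower bound fixed at 0.1 as in the experiments (e.g., 0.1-0.7).
aug_mvsf = rps_w(image, sf=rng.uniform(0.1, 0.7), rng=rng)
```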

Experimental Validation of the RPS Technique

We conducted comprehensive experiments to validate the effectiveness of the proposed RPS technique in enhancing DL model performance across both CNN and transformer architectures. For our evaluation, we selected 4 established models: ResNet-34 [52], MobileNetV3 (small variant) [53], Vision Transformer (base-16) [23], and Swin Transformer (tiny version) [29], all initialized with pretrained weights. These architectures were chosen based on three key criteria: (1) public availability for reproducible benchmarking, (2) widespread adoption in methodological comparisons [29,48], and (3) efficient training characteristics due to their relatively fewer trainable parameters compared to larger variants.

Our experimental design incorporated three key comparisons: (1) models trained without any augmentation, (2) models trained with RPS augmentation, and (3) models trained with 4 state-of-the-art DA techniques (Cutout [43], Random Erasing [44], MixUp [45], and CutMix [46]). These comparison techniques were selected because they represent current best practices in parameter-free augmentation methods that share conceptual similarities with RPS [48]. We evaluated all models using two key metrics: (1) classification accuracy and (2) area under the receiver operating characteristic curve (AUROC), providing a comprehensive assessment of both overall performance and diagnostic discrimination capability.

Experimental Setup and Implementation

All experiments were conducted using Python 3.12.2 (Python Software Foundation) and PyTorch 2.2.2+cu118 (PyTorch Foundation) within Jupyter Notebook 7.0.8 (IPython Project), running on an NVIDIA Quadro RTX 3000 GPU (Nvidia Corporation). We adopted the AdamW optimizer with a cross-entropy loss function, using a batch size of 16. The StepLR scheduler was configured with a step size of 10 and a gamma value of 0.5 [52]. Models were trained for 50 epochs, as additional training resulted in overfitting and performance degradation. After evaluating various learning rates, we selected 1×10⁻⁴ as it yielded optimal results. Image normalization was applied with mean and SD values of 0.5 to enhance training stability and accelerate convergence [53].
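
This configuration maps onto a few lines of PyTorch. The sketch below reflects our reading of the paragraph above (optimizer, scheduler, loss, learning rate, epochs, and normalization); the pretrained-weight tag and the 3-class head for IQ-OTH/NCCD are assumptions, and the training loop body is elided.

```python
import torch
from torch import nn
from torchvision import models, transforms

# Sketch of the reported setup: AdamW + cross-entropy, StepLR with
# step_size=10 and gamma=0.5, lr=1e-4, 50 epochs, batch size 16,
# and normalization with mean and SD of 0.5.
model = models.resnet34(weights="IMAGENET1K_V1")   # assumed pretrained init
model.fc = nn.Linear(model.fc.in_features, 3)      # 3 IQ-OTH/NCCD classes

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
normalize = transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])

for epoch in range(50):
    # ... forward/backward passes over 16-image batches go here ...
    scheduler.step()
```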

For RPS implementation, we used a swap area factor of 1.0 with an augmentation probability of 1.0 for all experiments. CNN models processed images at 512×512 and 224×224 resolutions, while transformer architectures used 224×224 resolution due to the Vision Transformer’s input size limitations. Although the Swin Transformer supports 512×512 inputs, we maintained a consistent 224×224 resolution across all transformer experiments for fair comparison. All experiments were conducted with a random seed of 42 after verifying consistent performance patterns across 3 different seeds.

Statistical Analysis

To evaluate our hypothesis that an effective DA technique should perform consistently across both CNN and transformer architectures, we treated each technique as an independent variable and considered model performance as the dependent variable. We used paired sample t tests [54] to assess significant differences between techniques, considering P values <.05 as statistically significant.

For comprehensive technique comparison, we implemented a ranking system based on cumulative scores C (Equations (7) and (8)), where higher scores received lower rank numbers R. This approach enabled holistic performance benchmarking across all models and architectures.

$C = \sum_{\mathrm{model}=1}^{n} (A + \mathrm{AUROC})$ (7)

$R_1, R_2, R_3, \ldots, R_{m+1} = C_1, C_2, C_3, \ldots, C_{m+1}, \quad C_1 > C_2 > C_3 > \cdots > C_{m+1}$ (8)

where C is cumulative score, R is rank, m is the total number of data augmentation techniques, n is the total number of selected models, A is accuracy, and AUROC is the area under the receiver operating characteristic curve.
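
As an illustration of this analysis, the sketch below runs a paired t test with SciPy and computes the cumulative score of Equation (7); the accuracy values are taken from Table 1 (RPS vs Cutout), and the helper name cumulative_score is ours.

```python
import numpy as np
from scipy import stats

# Paired t test across the 4 models, as in the study (alpha = .05).
# Accuracies below are the Table 1 values for RPS and Cutout.
rps_acc    = np.array([88.11, 89.63, 73.17, 89.94])
cutout_acc = np.array([85.67, 89.33, 57.01, 85.37])
t_stat, p_value = stats.ttest_rel(rps_acc, cutout_acc)

# Cumulative score C (Equation 7): accuracy + AUROC summed over models;
# techniques are then ranked by descending C (Equation 8).
def cumulative_score(acc: np.ndarray, auroc: np.ndarray) -> float:
    return float(np.sum(acc + auroc))
```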

Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases Dataset

The Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases (IQ-OTH/NCCD) dataset contains 1097 JPEG CT images collected from 110 patients [35]. These images were obtained using a SOMATOM Siemens scanner (Siemens Healthineers) and encompass a diverse range of demographic characteristics. The dataset is organized into 3 categories: normal scans, benign tumor scans, and malignant tumor scans. Specifically, it includes 15 cases of benign tumors, totaling 120 images; 40 cases of malignant tumors, totaling 416 images; and 55 cases of normal findings, totaling 561 images. Each image has a resolution of 512×512 pixels. We divided the images in a ratio of 7:3 for training and testing.

Chest CT Scan Images Dataset

The chest CT scan images dataset contains 1000 lung CT scans from patients diagnosed with 3 different types of lung cancers, as well as scans from healthy individuals, all in JPG format [55]. The lung cancer types included in the dataset are adenocarcinoma, squamous cell carcinoma, and large cell carcinoma. The images are organized into training, testing, and validation sets for each lung cancer category.

Ethical Considerations

Ethics approval was obtained from the Sefako Makgatho University Research Committee (ethics reference number: SMUREC/M/12/2022:PG).


Results

Average Training Time Overhead

To evaluate the computational impact of the RPS technique, we measured training duration for 4 architectures (ResNet-34, MobileNetV3 [small], Vision Transformer [base-16], and Swin Transformer [tiny]) with and without RPS implementation. The training time overhead was calculated as the difference between augmented and nonaugmented training times. Experiments were conducted on both the IQ-OTH/NCCD and chest CT scan datasets using 224×224 image resolution, with results averaged across 3 independent runs for reliability.

Our analysis included a comparative assessment of 4 established DA techniques: Cutout, Random Erasing, MixUp, and CutMix. Results demonstrated that while RPS increased training times across all models compared to nonaugmented training, this increase was not statistically significant (P=.07). Similarly, comparisons between RPS and other DA techniques revealed no statistically significant differences in computational overhead (Cutout: P=.06; Random Erasing: P=.17; MixUp: P=.49; CutMix: P=.16). Among all evaluated methods, RPS showed the highest training time overhead, followed sequentially by MixUp, CutMix, Random Erasing, and Cutout. Complete results are presented in Figure 3.

Figure 3. Average training time overhead of 5 data augmentation techniques across 4 deep learning models.

Performance Comparison of RPS With State-of-the-Art DA Techniques for Lung Cancer Detection

To evaluate pulmonary nodule detection in chest CT scan images, the selected CNN and transformer models (ResNet-34, MobileNetV3 [small], Vision Transformer [base-16], and Swin Transformer [tiny]) were trained on the IQ-OTH/NCCD dataset to classify the scan images as normal or containing benign or malignant pulmonary nodules. Experimental results demonstrated that RPS significantly enhanced performance across all 4 architectures (P=.008). The MobileNetV3 model achieved particular success when combined with RPS using 512×512 image resolution, reaching a peak classification accuracy of 94.21%, representing a 1.22% accuracy improvement and 0.86% AUROC increase over the baseline model.

At 224×224 image resolution, our comprehensive comparison of RPS against the 4 established DA methods (Cutout: P=.03; Random Erasing: P=.008; MixUp: P=.02; CutMix: P=.02) revealed consistent superiority of the RPS technique (P<.05). For ResNet-34, RPS exceeded CutMix (the best alternative) by 2.44% and Random Erasing (the least effective) by 5.49% in accuracy. MobileNetV3 showed a 0.3% improvement over Cutout (best alternative) and 1.83% over MixUp (least effective) in accuracy. Transformer architectures demonstrated even more pronounced benefits: Vision Transformer with RPS outperformed Random Erasing by 1.52% and MixUp by 16.77%, while Swin Transformer showed a 1.53% improvement over MixUp and 4.57% over Cutout in accuracy. Across all architectures, performance ranking was as follows: (1) RPS (best technique), (2) Random Erasing, (3) CutMix, (4) MixUp, and (5) Cutout. The detailed results are presented in Table 1.

Table 1. Classification results of the IQ-OTH/NCCD^a dataset using pretrained deep learning models with various data augmentation techniques (224×224 image resolution).
| Data augmentation | Rank^b | ResNet-34 accuracy, % | ResNet-34 AUROC^c, % | MobileNetV3 (small) accuracy, % | MobileNetV3 (small) AUROC, % | Vision Transformer (base-16) accuracy, % | Vision Transformer (base-16) AUROC, % | Swin Transformer (tiny) accuracy, % | Swin Transformer (tiny) AUROC, % |
|---|---|---|---|---|---|---|---|---|---|
| Base model^d | 6 | 85.98 | 83.39 | 86.59 | 93.16 | 57.62 | 64.88 | 85.67 | 89.06 |
| Cutout^d | 5 | 85.67 | 86.11 | 89.33 | 92.95 | 57.01 | 63.45 | 85.37 | 93.68 |
| Random Erasing^d | 2 | 82.62 | 91.23 | 88.72 | 90.10 | 71.65 | 75.41^e | 86.89 | 91.42 |
| MixUp^d | 4 | 84.45 | 91.05 | 87.80 | 86.57 | 56.40 | 68.55 | 88.41 | 92.51 |
| CutMix^d,f | 3 | 85.67 | 88.34 | 88.41 | 93.02 | 68.90 | 69.68 | 88.41 | 92.05 |
| Random Pixel Swap^f | 1 | 88.11^e | 93.70^e | 89.63^e | 93.80^e | 73.17^e | 74.64 | 89.94^e | 94.79^e |

^a IQ-OTH/NCCD: Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases.

^b Rank represents the overall rating for each technique, with “1” indicating the best technique across all models.

^c AUROC: area under the receiver operating characteristic curve.

^d Significant difference between an augmentation technique and the Random Pixel Swap technique across all models.

^e Highest value in the column.

^f Significant difference between training using an augmentation technique and the base model across all models.

At 512×512 image resolution, ResNet-34 exhibited nuanced performance differences between augmentation techniques: while CutMix achieved a marginal 0.31% higher accuracy than RPS, RPS demonstrated significantly superior diagnostic capability with a 5.31% improvement in AUROC. Furthermore, RPS outperformed the least effective technique (Random Erasing) by 2.13% in accuracy and 3.17% in AUROC. For MobileNetV3, RPS dominated all comparative techniques in both accuracy and AUROC, except for a 1.23% AUROC advantage by CutMix. Specifically, RPS exceeded Cutout (the best alternative technique) by 0.61% and surpassed MixUp (the least effective) by 4.58% in accuracy. Across all evaluated methods, the overall performance ranking was as follows: (1) RPS (best technique), (2) Cutout, (3) CutMix, (4) MixUp, and (5) Random Erasing. The detailed results are presented in Table 2.

Table 2. Classification results of the IQ-OTH/NCCD^a dataset using pretrained deep learning models with various data augmentation techniques (512×512 image resolution).
| Data augmentation | Rank^b | ResNet-34 accuracy, % | ResNet-34 AUROC^c, % | MobileNetV3 (small) accuracy, % | MobileNetV3 (small) AUROC, % |
|---|---|---|---|---|---|
| Base model^d | 6 | 88.72 | 78.51 | 92.99 | 94.81 |
| Cutout^d | 2 | 90.24 | 93.25 | 93.60 | 95.42 |
| Random Erasing^d | 5 | 89.94 | 91.85 | 90.85 | 92.19 |
| MixUp^d | 4 | 89.94 | 96.13^e | 89.63 | 95.18 |
| CutMix | 3 | 92.38^e | 89.71 | 92.68 | 96.90^e |
| Random Pixel Swap | 1 | 92.07 | 95.02 | 94.21^e | 95.67 |

^a IQ-OTH/NCCD: Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases.

^b Rank represents the overall rating for each technique, with “1” indicating the best technique across all models.

^c AUROC: area under the receiver operating characteristic curve.

^d Significant difference between an augmentation technique and the Random Pixel Swap technique across all models.

^e Highest value in the column.

Performance Comparison of RPS With State-of-the-Art DA Techniques for Lung Cancer Classification From CT Scan Images Using DL Architectures

We evaluated the effectiveness of the RPS technique for lung cancer classification using the chest CT scan images dataset across multiple DL architectures. The experimental results demonstrated that RPS significantly enhanced classification performance for all architectures (P=.008). RPS combined with ResNet-34 at 512×512 image resolution achieved optimal performance, reaching 97.78% accuracy and 99.46% AUROC.

At 224×224 image resolution, RPS consistently outperformed competing techniques across most models (Cutout: P=.001; Random Erasing: P=.02; MixUp: P=.047; CutMix: P=.18). For ResNet-34, RPS exceeded CutMix (the best alternative) by 0.64% and Random Erasing (the least effective) by 5.08% in accuracy. MobileNetV3 showed even greater improvements over other methods, with RPS surpassing CutMix by 3.49% and MixUp by 9.21% in accuracy. For the implementation with Vision Transformer, RPS surpassed Random Erasing (the best alternative) by 1.91% and MixUp (the least effective) by 18.85% in accuracy. While CutMix showed a 2.22% accuracy advantage over RPS for the Swin Transformer, RPS maintained superior performance against all other techniques, exceeding Cutout by 7.3% (the least effective). Across all architectures, the overall performance ranking was as follows: (1) RPS (best technique), (2) CutMix, (3) Random Erasing, (4) Cutout, and (5) MixUp. The detailed results are presented in Table 3.

Table 3. Classification results of the chest CT^a scan images dataset using pretrained deep learning models with various data augmentation techniques (224×224 image resolution).
| Data augmentation | Rank^b | ResNet-34 accuracy, % | ResNet-34 AUROC^c, % | MobileNetV3 (small) accuracy, % | MobileNetV3 (small) AUROC, % | Vision Transformer (base-16) accuracy, % | Vision Transformer (base-16) AUROC, % | Swin Transformer (tiny) accuracy, % | Swin Transformer (tiny) AUROC, % |
|---|---|---|---|---|---|---|---|---|---|
| Base model^d | 5 | 93.33 | 99.00 | 87.30 | 97.09 | 82.86 | 95.84 | 84.76 | 96.92 |
| Cutout^d,e | 4 | 93.02 | 98.94 | 85.71 | 97.62 | 80.63 | 94.35 | 84.13 | 96.05 |
| Random Erasing^d | 3 | 90.48 | 98.54 | 88.89 | 97.45 | 84.76 | 96.72^f | 88.25 | 97.28 |
| MixUp^d | 6 | 91.43 | 98.57 | 83.49 | 96.85 | 67.82 | 86.97 | 90.79 | 97.87 |
| CutMix | 2 | 94.92 | 98.69 | 89.21 | 97.80 | 76.82 | 92.60 | 93.65^f | 98.74^f |
| Random Pixel Swap^e | 1 | 95.56^f | 99.15^f | 92.70^f | 98.02^f | 86.67^f | 96.32 | 91.43 | 98.45 |

^a CT: computed tomography.

^b Rank represents the overall rating for each technique, with “1” indicating the best technique across all models.

^c AUROC: area under the receiver operating characteristic curve.

^d Significant difference between an augmentation technique and the Random Pixel Swap technique across all models.

^e Significant difference between training using an augmentation technique and the base model across all models.

^f Highest value in the column.

At 512×512 image resolution, the RPS technique demonstrated superior performance compared to all evaluated DA methods (Cutout: P=.13; Random Erasing: P=.27; MixUp: P=.13; CutMix: P=.31). For ResNet-34, RPS matched the accuracy of the top-performing alternative (CutMix) while achieving a 0.21% improvement in AUROC. Furthermore, RPS showed significant gains over the least effective technique (MixUp), with a 7.74% accuracy performance advantage. The MobileNetV3 architecture exhibited even more pronounced benefits, where RPS outperformed CutMix (the best alternative) by 2.23% and surpassed MixUp by 4.45% in accuracy. Across all techniques, the performance ranking was as follows: (1) RPS (best technique), (2) CutMix, (3) Cutout, (4) Random Erasing, and (5) MixUp. The detailed results are presented in Table 4.

Table 4. Classification results of the chest CT^a scan images dataset using pretrained deep learning models with various data augmentation techniques (512×512 image resolution).
| Data augmentation | Rank^b | ResNet-34 accuracy, % | ResNet-34 AUROC^c, % | MobileNetV3 (small) accuracy, % | MobileNetV3 (small) AUROC, % |
|---|---|---|---|---|---|
| Base model | 5 | 96.83 | 99.25 | 93.02 | 98.27 |
| Cutout | 3 | 96.51 | 99.35 | 94.60 | 98.39 |
| Random Erasing | 4 | 96.83 | 99.42 | 93.65 | 98.82^d |
| MixUp | 6 | 92.38 | 98.64 | 92.38 | 98.51 |
| CutMix | 2 | 97.78^d | 99.25 | 94.60 | 98.61 |
| Random Pixel Swap | 1 | 97.78^d | 99.46^d | 96.83^d | 98.75 |

^a CT: computed tomography.

^b Rank represents the overall rating for each technique, with “1” indicating the best technique across all models.

^c AUROC: area under the receiver operating characteristic curve.

^d Highest value in the column.

Performance Analysis of Swap Area Factors for Lung Cancer Diagnosis

The swap area factor serves as a critical hyperparameter in RPS implementation. We systematically evaluated its influence using both SVSF and MVSF configurations across the 0.1 to 1.0 range on the IQ-OTH/NCCD dataset. MVSF provides over 100 possible combinations of lower and upper bounds (eg, 0.1‐0.5 and 0.4‐0.8); however, our experimental configurations maintained a fixed lower bound of 0.1. Experimental results revealed distinct optimal configurations for each architecture. For SVSF implementations, ResNet-34, Vision Transformer, and Swin Transformer achieved peak performance at 1.0, while MobileNetV3 performed best at 0.9. For MVSF implementations, ResNet-34 showed optimal results within 0.1‐0.9, MobileNetV3 performed best at 0.1‐0.7, Vision Transformer excelled at 0.1‐0.3, and Swin Transformer achieved peak performance at 0.1‐0.5.

Comparative analysis demonstrated that SVSF generally outperformed MVSF configurations for a fixed 0.1 lower bound across most architectures, with the notable exception of ResNet-34. For this model, MVSF (0.1‐0.9) surpassed SVSF (1.0) by 0.61% in accuracy and 1.08% in AUROC. The most effective overall configuration combined MobileNetV3 with RPS using an SVSF of 0.9, achieving 94.51% accuracy and 95.77% AUROC. The detailed results are presented in Table 5.

Table 5. Analysis of the IQ-OTH/NCCD^a dataset using different deep learning architectures and Random Pixel Swap data augmentation with single-value and multivalue swap area factors (224×224 image resolution).
| Swap factor | ResNet-34 accuracy, % | ResNet-34 AUROC^b, % | MobileNetV3 (small) accuracy, % | MobileNetV3 (small) AUROC, % | Vision Transformer (base-16) accuracy, % | Vision Transformer (base-16) AUROC, % | Swin Transformer (tiny) accuracy, % | Swin Transformer (tiny) AUROC, % |
|---|---|---|---|---|---|---|---|---|
| Single value |  |  |  |  |  |  |  |  |
| 0.1 | 89.02 | 92.26 | 93.90 | 95.10 | 64.02 | 74.97^c | 86.28 | 92.02 |
| 0.2 | 91.16 | 94.62 | 92.99 | 95.04 | 60.98 | 63.20 | 83.23 | 93.59 |
| 0.3 | 90.85 | 93.48 | 92.07 | 95.38 | 64.02 | 67.95 | 89.02 | 91.63 |
| 0.4 | 89.63 | 92.66 | 92.99 | 95.24 | 69.21 | 72.59 | 84.76 | 92.81 |
| 0.5 | 90.55 | 90.96 | 92.68 | 95.01 | 68.60 | 72.47 | 87.80 | 84.13 |
| 0.6 | 90.55 | 94.76 | 92.99 | 95.24 | 69.21 | 72.36 | 84.76 | 89.63 |
| 0.7 | 91.46 | 95.23 | 92.68 | 95.12 | 67.99 | 66.48 | 83.54 | 90.65 |
| 0.8 | 89.63 | 92.22 | 93.60 | 95.60 | 67.99 | 69.83 | 89.94^c | 95.95^c |
| 0.9 | 90.85 | 94.42 | 94.51^c | 95.77^c | 71.65 | 72.79 | 83.84 | 90.99 |
| 1.0 | 92.07^c | 95.02^c | 94.21 | 95.67 | 73.17^c | 74.64 | 89.94^c | 94.79 |
| Multivalue |  |  |  |  |  |  |  |  |
| 0.1-0.2 | 90.55 | 93.80 | 93.90^c | 94.83 | 66.16 | 69.14 | 85.98 | 94.29 |
| 0.1-0.3 | 89.33 | 90.53 | 93.29 | 95.18 | 72.56^c | 77.93^c | 84.76 | 91.84 |
| 0.1-0.4 | 89.33 | 90.18 | 93.60 | 95.03 | 60.98 | 73.15 | 87.50 | 94.30 |
| 0.1-0.5 | 91.16 | 95.93^c | 93.60 | 94.84 | 59.15 | 68.80 | 88.11^c | 94.94^c |
| 0.1-0.6 | 90.55 | 93.26 | 92.38 | 94.60 | 62.20 | 59.86 | 87.20 | 93.45 |
| 0.1-0.7 | 89.94 | 90.99 | 93.90^c | 95.33 | 61.28 | 70.73 | 88.11^c | 92.53 |
| 0.1-0.8 | 87.80 | 93.13 | 93.29 | 95.00 | 66.77 | 76.81 | 86.28 | 85.93 |
| 0.1-0.9 | 92.68^c | 95.29 | 93.60 | 95.29 | 68.29 | 64.98 | 86.59 | 92.69 |
| 0.1-1.0 | 89.02 | 92.53 | 93.60 | 95.71^c | 62.80 | 69.05 | 83.54 | 94.35 |

^a IQ-OTH/NCCD: Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases.

^b AUROC: area under the receiver operating characteristic curve.

^c Highest value in the column.

Our evaluation of the chest CT scan images dataset using different swap area factor configurations revealed architecture-specific optimal settings. SVSF demonstrated superior performance at 1.0 for both ResNet-34 and MobileNetV3, while Vision Transformer achieved peak accuracy with an SVSF of 0.1. For Swin Transformer, MVSF configurations between 0.1 and 0.6 yielded optimal results. Among all tested combinations, ResNet-34 paired with RPS using an SVSF of 1.0 delivered the highest classification performance, reaching 97.78% accuracy and 99.46% AUROC. The detailed results are presented in Table 6.

Table 6. Analysis of the chest CT^a scan images dataset using different deep learning architectures and Random Pixel Swap data augmentation with single-value and multivalue swap area factors (224×224 image resolution).
| Swap factor | ResNet-34 accuracy, % | ResNet-34 AUROC^b, % | MobileNetV3 (small) accuracy, % | MobileNetV3 (small) AUROC, % | Vision Transformer (base-16) accuracy, % | Vision Transformer (base-16) AUROC, % | Swin Transformer (tiny) accuracy, % | Swin Transformer (tiny) AUROC, % |
|---|---|---|---|---|---|---|---|---|
| Single value |  |  |  |  |  |  |  |  |
| 0.1 | 96.19 | 99.13 | 94.60 | 98.55 | 86.67^c | 96.32 | 94.29^c | 98.72 |
| 0.2 | 97.46 | 99.27 | 94.60 | 98.60 | 81.27 | 94.37 | 92.06 | 98.78^c |
| 0.3 | 96.19 | 99.22 | 95.24 | 98.61 | 78.73 | 93.98 | 93.65 | 98.65 |
| 0.4 | 97.46 | 99.20 | 95.24 | 98.65 | 82.54 | 96.10 | 92.06 | 98.46 |
| 0.5 | 97.14 | 99.41 | 94.92 | 98.74 | 85.08 | 96.31 | 91.43 | 98.38 |
| 0.6 | 97.14 | 99.30 | 95.56 | 98.79^c | 85.40 | 96.86^c | 91.75 | 97.85 |
| 0.7 | 97.14 | 99.19 | 95.56 | 98.69 | 83.81 | 95.48 | 93.65 | 98.65 |
| 0.8 | 97.14 | 99.38 | 95.87 | 98.75 | 81.59 | 95.48 | 91.43 | 97.95 |
| 0.9 | 96.83 | 99.35 | 95.87 | 98.62 | 81.27 | 94.25 | 88.89 | 97.80 |
| 1.0 | 97.78^c | 99.46^c | 96.83^c | 98.75 | 75.56 | 91.62 | 91.43 | 98.45 |
| Multivalue |  |  |  |  |  |  |  |  |
| 0.1-0.2 | 97.14 | 99.27 | 94.92 | 98.59 | 75.56 | 92.85 | 93.65 | 98.51 |
| 0.1-0.3 | 96.51 | 99.30 | 93.97 | 98.62 | 84.13 | 95.86 | 91.43 | 98.20 |
| 0.1-0.4 | 96.51 | 99.00 | 94.60 | 98.55 | 76.83 | 92.77 | 93.65 | 98.73 |
| 0.1-0.5 | 97.46 | 99.28 | 94.29 | 98.63 | 80.32 | 94.69 | 92.38 | 98.64 |
| 0.1-0.6 | 96.51 | 99.37 | 95.24 | 98.59 | 82.54 | 95.63 | 96.19^c | 98.90^c |
| 0.1-0.7 | 97.78^c | 99.39 | 94.92 | 98.65 | 86.03^c | 96.84^c | 94.29 | 98.90^c |
| 0.1-0.8 | 97.46 | 99.25 | 93.97 | 98.62 | 81.90 | 94.88 | 93.02 | 98.83 |
| 0.1-0.9 | 97.48 | 99.32 | 94.92 | 98.70 | 81.90 | 94.99 | 93.65 | 98.86 |
| 0.1-1.0 | 97.46 | 99.41^c | 95.87^c | 98.75^c | 82.22 | 95.21 | 93.33 | 98.72 |

^a CT: computed tomography.

^b AUROC: area under the receiver operating characteristic curve.

^c Highest value in the column.

RPS With Lung Region of Interest Segmentation

Prior studies have demonstrated that segmenting lung regions of interest (ROIs) can significantly improve the diagnostic performance of DL models [33,56]. To evaluate the effectiveness of the RPS technique when applied to segmented images, we conducted experiments using the selected models (ResNet-34, MobileNetV3 [small], Vision Transformer [base-16], and Swin Transformer [tiny]). Our investigation used both the IQ-OTH/NCCD dataset and chest CT scan images dataset at 224×224 resolution.

The segmentation process involved multiple steps. We first applied a threshold algorithm to generate a lung mask, followed by dilation and hole-filling operations to ensure comprehensive coverage of pulmonary structures. The final lung ROI was extracted by cropping surrounding pixels along the mask boundaries. The complete procedure is illustrated in Figure 4. For comparative analysis, we evaluated model performance under three conditions: (1) training without augmentation, (2) training with RPS, and (3) training with established augmentation techniques (Cutout, Random Erasing, MixUp, and CutMix). This comprehensive evaluation framework allowed us to assess the relative benefits of RPS when applied to segmented lung images.

Figure 4. Lung segmentation procedure.
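
A sketch of this pipeline using scikit-image and SciPy is shown below. The specific operations (Otsu threshold, border clearing, a disk-shaped structuring element) are assumptions on our part, since the text does not name the threshold algorithm or structuring element.

```python
import numpy as np
from scipy import ndimage
from skimage import filters, morphology
from skimage.segmentation import clear_border

def lung_roi(ct_slice: np.ndarray) -> np.ndarray:
    """Sketch of the Figure 4 pipeline: threshold -> dilate ->
    fill holes -> crop to the mask bounding box (details assumed)."""
    # 1. Threshold: lung fields are darker than surrounding tissue.
    mask = ct_slice < filters.threshold_otsu(ct_slice)
    # 2. Drop dark background air connected to the image border.
    mask = clear_border(mask)
    # 3. Dilate to ensure full coverage of pulmonary structures.
    mask = morphology.binary_dilation(mask, morphology.disk(3))
    # 4. Fill internal holes (vessels, nodules) in the lung mask.
    mask = ndimage.binary_fill_holes(mask)
    # 5. Crop surrounding pixels along the mask boundaries.
    ys, xs = np.nonzero(mask)
    return ct_slice[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```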

Our experiments with the IQ-OTH/NCCD dataset demonstrated that the RPS technique significantly improved performance across all evaluated models (P=.04) and most techniques (Cutout: P=.049; Random Erasing: P=.004; MixUp: P=.04; CutMix: P=.06). The most notable results were achieved by ResNet-34 with RPS, reaching 97.56% accuracy and 98.61% AUROC. While RPS outperformed all competing techniques for MobileNetV3 and Swin Transformer, CutMix showed superior performance for Vision Transformer, exceeding RPS by 1.52% in accuracy and 0.67% in AUROC. The overall performance ranking across techniques was as follows: (1) RPS (best technique), (2) CutMix, (3) Random Erasing, (4) Cutout, and (5) MixUp.

For the chest CT scan images dataset, the RPS technique consistently improved performance across models (P=.06) and most techniques (Cutout: P=.01; Random Erasing: P=.009; MixUp: P=.01; CutMix: P=.38). The highest performance was again achieved by ResNet-34 with RPS (95.51% accuracy and 98.86% AUROC). While RPS showed superior results for MobileNetV3 and Swin Transformer, CutMix performed better for Vision Transformer (3.21% higher accuracy and 0.6% higher AUROC). The comprehensive performance ranking was similar to that for the IQ-OTH/NCCD dataset and was as follows: (1) RPS, (2) CutMix, (3) Cutout, (4) Random Erasing, and (5) MixUp. The detailed results are presented in Table 7.

Table 7. Classification results of the IQ-OTH/NCCD^a and chest CT^b scan images datasets using pretrained deep learning models with various data augmentation techniques and segmentation of the lung region of interest (224×224 image resolution).
| Data augmentation | Rank^c | ResNet-34 accuracy, % | ResNet-34 AUROC^d, % | MobileNetV3 (small) accuracy, % | MobileNetV3 (small) AUROC, % | Vision Transformer (base-16) accuracy, % | Vision Transformer (base-16) AUROC, % | Swin Transformer (tiny) accuracy, % | Swin Transformer (tiny) AUROC, % |
|---|---|---|---|---|---|---|---|---|---|
| IQ-OTH/NCCD dataset |  |  |  |  |  |  |  |  |  |
| Base model^e | 5 | 96.65 | 99.13 | 95.43 | 97.28 | 89.94 | 96.51 | 93.60 | 98.21 |
| Cutout^e | 4 | 96.04 | 98.86 | 95.43 | 96.33 | 92.38 | 96.20 | 93.90 | 97.80 |
| Random Erasing^e | 3 | 95.73 | 97.45 | 96.65^f | 97.29 | 91.46 | 96.27 | 94.82^f | 98.00 |
| MixUp^e,g | 6 | 95.87 | 99.19^f | 91.77 | 97.11 | 91.77 | 96.27 | 93.29 | 97.52 |
| CutMix | 2 | 96.65 | 98.86 | 94.51 | 96.39 | 93.90^f | 97.64^f | 93.29 | 97.52 |
| Random Pixel Swap^g | 1 | 97.56^f | 98.61 | 96.65^f | 98.00^f | 92.38 | 96.97 | 94.82^f | 98.12^f |
| Chest CT scan images dataset |  |  |  |  |  |  |  |  |  |
| Base model | 2 | 95.19 | 99.03 | 87.82 | 96.83^f | 82.69 | 95.48 | 90.71 | 98.11 |
| Cutout^e | 4 | 94.55 | 98.85 | 88.14 | 97.66 | 80.77 | 93.86 | 88.14 | 97.32 |
| Random Erasing^e,g | 5 | 94.55 | 98.75 | 86.54 | 96.52 | 79.81 | 89.72 | 86.86 | 97.16 |
| MixUp^e,g | 6 | 94.55 | 98.77 | 82.05 | 95.33 | 78.85 | 93.29 | 85.90 | 97.10 |
| CutMix | 3 | 95.19 | 99.05^f | 86.54 | 96.89 | 86.86^f | 96.43^f | 87.82 | 96.73 |
| Random Pixel Swap | 1 | 95.51^f | 98.86 | 90.71^f | 97.51 | 83.65 | 95.83 | 91.35^f | 98.36^f |

^a IQ-OTH/NCCD: Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases.

^b CT: computed tomography.

^c Rank represents the overall rating for each technique, with “1” indicating the best technique across all models.

^d AUROC: area under the receiver operating characteristic curve.

^e Significant difference between an augmentation technique and the Random Pixel Swap technique across all models.

^f Highest value in the column.

^g Significant difference between training using an augmentation technique and the base model across all models.

Performance Analysis of the Combination of RPS With Traditional DA Techniques for Lung Cancer Diagnosis

Traditional DA techniques, including image flipping and rotation, are widely employed in medical image analysis with DL [44]. To evaluate the potential benefits of combining these methods with the RPS technique, we conducted a systematic comparison. First, we trained selected models (ResNet-34, MobileNetV3 [small], Vision Transformer [base-16], and Swin Transformer [tiny]) using individual traditional techniques: horizontal flipping, vertical flipping, and random rotation (±90°). Subsequently, we trained the models using combinations of each traditional technique with RPS.
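
These combinations can be expressed as a single torchvision pipeline. The sketch below wraps the rps_w helper from the earlier section as a transform and chains it after random rotation; the wrapper class RandomPixelSwap is ours, not the authors' released API.

```python
import numpy as np
from torchvision import transforms

class RandomPixelSwap:
    """Callable wrapper around the rps_w sketch (our naming)."""
    def __init__(self, sf: float = 1.0, p: float = 1.0):
        self.sf, self.p = sf, p

    def __call__(self, img):
        return rps_w(np.asarray(img), sf=self.sf, p=self.p)

# "Rotation with Random Pixel Swap": random rotation in [-90, 90]
# degrees followed by an RPS swap, then conversion to a tensor.
rotation_with_rps = transforms.Compose([
    transforms.RandomRotation(degrees=90),
    RandomPixelSwap(sf=1.0, p=1.0),
    transforms.ToTensor(),
])
```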

Our experiments revealed that the combination of RPS with traditional techniques generally enhanced model performance compared to using traditional methods alone. However, when a traditional technique failed to improve baseline performance, its combination with RPS did not surpass RPS alone. For the IQ-OTH/NCCD dataset, using RPS alone surpassed the individual traditional techniques (horizontal flipping: P=.63; vertical flipping: P=.22; rotation: P=.93). RPS with rotation achieved peak performance for ResNet-34 and Vision Transformer (base-16), improving upon rotation alone by 2.14% and 2.75% in accuracy, respectively. RPS with vertical flipping performed the best for MobileNetV3 (small), exceeding vertical flipping alone by 0.61% in accuracy. However, RPS alone showed superior results for Swin Transformer (tiny).

Similarly, for the chest CT scan images dataset, using RPS alone surpassed the individual traditional techniques (horizontal flipping: P=.01; vertical flipping: P=.03; rotation: P=.04). RPS with rotation demonstrated the strongest overall performance, improving upon rotation by 0.95% in accuracy. RPS with horizontal flipping achieved optimal results for Vision Transformer (base-16), surpassing horizontal flipping alone by 5.71% in accuracy. However, RPS alone outperformed all combinations for MobileNetV3 (small) and Swin Transformer (tiny). The detailed results are presented in Table 8.

Table 8. Classification results of the IQ-OTH/NCCD^a and chest CT^b scan images datasets using pretrained deep learning models when 3 traditional data augmentation techniques are combined with the Random Pixel Swap data augmentation technique (224×224 image resolution).
| Data augmentation | Rank^c | ResNet-34 accuracy, % | ResNet-34 AUROC^d, % | MobileNetV3 (small) accuracy, % | MobileNetV3 (small) AUROC, % | Vision Transformer (base-16) accuracy, % | Vision Transformer (base-16) AUROC, % | Swin Transformer (tiny) accuracy, % | Swin Transformer (tiny) AUROC, % |
|---|---|---|---|---|---|---|---|---|---|
| IQ-OTH/NCCD dataset |  |  |  |  |  |  |  |  |  |
| Base model^e | 8 | 85.98 | 83.39 | 86.59 | 93.16 | 57.62 | 64.88 | 85.67 | 89.06 |
| Horizontal flip | 5 | 82.01 | 86.70 | 88.11 | 93.36 | 73.78 | 86.20 | 87.50 | 91.86 |
| Horizontal flip with Random Pixel Swap | 2 | 87.20 | 90.43 | 88.41 | 91.21 | 78.36 | 89.31^f | 88.72 | 92.12 |
| Vertical flip^g | 7 | 87.50 | 89.82 | 90.55 | 93.89^f | 62.50 | 76.54 | 89.02 | 92.41 |
| Vertical flip with Random Pixel Swap^e,g | 6 | 88.11 | 89.54 | 91.16^f | 93.28 | 68.29 | 73.94 | 88.72 | 93.10 |
| Rotation^g | 4 | 87.80 | 90.30 | 89.63 | 91.53 | 75.91 | 80.70 | 88.41 | 92.67 |
| Rotation with Random Pixel Swap^g | 1 | 89.94^f | 90.28 | 89.02 | 91.75 | 78.66^f | 87.16 | 89.02 | 93.10 |
| Random Pixel Swap^g | 3 | 88.11 | 93.70^f | 89.63 | 93.80 | 73.17 | 74.64 | 89.94^f | 94.79^f |
| Chest CT scan images dataset |  |  |  |  |  |  |  |  |  |
| Base model^e | 8 | 93.33 | 99.00 | 87.30 | 97.09 | 82.86 | 95.84 | 84.76 | 96.92 |
| Horizontal flip^e | 7 | 91.43 | 98.56 | 87.62 | 97.37 | 82.86 | 95.83 | 91.75 | 98.33 |
| Horizontal flip with Random Pixel Swap^g | 4 | 93.97 | 98.96 | 89.21 | 97.48 | 88.57^f | 97.63^f | 92.70 | 98.09 |
| Vertical flip^e,g | 6 | 93.33 | 98.86 | 83.81 | 96.83 | 84.72 | 96.10 | 92.06 | 98.58 |
| Vertical flip with Random Pixel Swap^e,g | 5 | 93.97 | 98.81 | 86.67 | 97.13 | 84.76 | 96.23 | 91.43 | 98.04 |
| Rotation^e,g | 3 | 95.24 | 99.22 | 90.16 | 97.58 | 84.57 | 94.95 | 95.87 | 99.03 |
| Rotation with Random Pixel Swap^g | 1 | 96.19^f | 99.24^f | 91.75 | 97.73 | 85.23 | 95.10 | 96.19 | 98.99 |
| Random Pixel Swap^g | 2 | 95.56 | 99.15 | 92.70^f | 98.02^f | 86.67 | 96.32 | 96.19^f | 98.90^f |

^a IQ-OTH/NCCD: Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases.

^b CT: computed tomography.

^c Rank represents the overall rating for each technique, with “1” indicating the best technique across all models.

^d AUROC: area under the receiver operating characteristic curve.

^e Significant difference between an augmentation technique and the Random Pixel Swap technique across all models.

^f Highest value in the column.

^g Significant difference between training using an augmentation technique and the base model across all models.

Validation Results of the Generalization Capabilities of the RPS Technique

Enhancing the generalization ability of DL models to unseen data represents a critical objective of DA [46]. To evaluate the RPS technique’s capacity to improve model generalization, we conducted experiments using the selected models (ResNet-34, MobileNetV3 [small], Vision Transformer [base-16], and Swin Transformer [tiny]). Models were trained on the IQ-OTH/NCCD dataset and validated on the chest CT scan images dataset (distinct collections acquired using different imaging equipment, protocols, time periods, and geographical locations). All models performed binary classification (cancerous vs normal) of CT images.

Our comparative analysis included the base models, the RPS implementation, and selected standard DA techniques. The results demonstrated RPS’s superior performance across all architectures (Cutout: P=.05; Random Erasing: P=.054; MixUp: P=.04; CutMix: P=.03), with one exception: for the Vision Transformer, Random Erasing showed a marginal 0.8% accuracy advantage over RPS. However, RPS maintained a significant 9.28% improvement in AUROC over Random Erasing. Furthermore, the cumulative ranking was as follows: (1) RPS (best technique), (2) Cutout, (3) CutMix, (4) Random Erasing, and (5) MixUp. The detailed results are presented in Table 9.

Table 9. Validation results of the generalization capabilities of different data augmentation techniques for lung cancer diagnosis using deep learning (224×224 image resolution).
| Data augmentation | Rank^a | ResNet-34 accuracy, % | ResNet-34 AUROC^b, % | MobileNetV3 (small) accuracy, % | MobileNetV3 (small) AUROC, % | Vision Transformer (base-16) accuracy, % | Vision Transformer (base-16) AUROC, % | Swin Transformer (tiny) accuracy, % | Swin Transformer (tiny) AUROC, % |
|---|---|---|---|---|---|---|---|---|---|
| Base model^c | 3 | 82.53 | 84.33 | 91.24 | 90.03 | 79.29 | 63.60 | 92.22 | 95.22^d |
| Cutout^c | 2 | 82.65 | 97.29 | 92.09 | 90.70 | 81.80 | 63.49 | 92.22 | 85.77 |
| Random Erasing^e | 5 | 88.71 | 78.66 | 91.45 | 89.66 | 82.65^d | 58.92 | 91.96 | 85.96 |
| MixUp^c | 6 | 83.80 | 95.17 | 91.58 | 90.36 | 80.74 | 52.24 | 91.58 | 79.73 |
| CutMix^c | 4 | 84.57 | 94.36 | 90.69 | 80.26 | 81.12 | 66.90 | 92.09 | 86.09 |
| Random Pixel Swap^e | 1 | 90.69^d | 97.48^d | 92.35^d | 93.30^d | 81.85 | 68.20^d | 92.35^d | 95.04 |

^a Rank represents the overall rating for each technique, with “1” indicating the best technique across all models.

^b AUROC: area under the receiver operating characteristic curve.

^c Significant difference between an augmentation technique and the Random Pixel Swap technique across all models.

^d Highest value in the column.

^e Significant difference between training using an augmentation technique and the base model across all models.

Comparison With Prior Work

Our experimental results demonstrated improvements over the results of previous studies using both the IQ-OTH/NCCD and chest CT scan images datasets. For the IQ-OTH/NCCD dataset, our approach achieved a 7.67% performance improvement over a machine learning technique in the study by Kareem et al [57], a 4.76% improvement over an ensemble of VGG-16, ResNet-50, InceptionV3, and EfficientNetB7 models in the study by Solyman et al [58], and a 2.13% enhancement over an ensemble of 3 custom CNNs in the study by Abe et al [59]. Similarly, for the chest CT scan images dataset, our method showed a 5.78% improvement over a 3-layer custom CNN in the study by Mamun et al [60] and a 2.22% improvement over a 5-layer CNN with a custom Mavage Pooling layer in the study by Abe et al [47]. The comparative results are detailed in Table 10.

Table 10. Comparison of our study results with the results of previous studies on the analysis of the IQ-OTH/NCCDa and chest CTb scan images datasets.
| Dataset and study | Accuracy, % | Number of classes |
|---|---|---|
| IQ-OTH/NCCD |  |  |
| Kareem et al [57] | 89.89 | 3 |
| Solyman et al [58] | 92.80 | 3 |
| Abe et al [59] | 95.43 | 3 |
| Our study | 97.56 | 3 |
| Chest CT scan images |  |  |
| Mamun et al [60] | 92.00 | 4 |
| Abe et al [47] | 95.56 | 4 |
| Our study | 97.78 | 4 |

^a IQ-OTH/NCCD: Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases.

^b CT: computed tomography.


Discussion

Principal Findings

The experimental results of the study demonstrated that the RPS DA technique significantly enhanced the diagnostic performance of both CNN and transformer architectures for lung cancer diagnosis from CT scan images. Our comprehensive evaluation demonstrated that RPS consistently outperformed 4 established augmentation methods (CutMix, Random Erasing, MixUp, and Cutout) across multiple performance metrics and diverse experimental conditions. The superior efficacy of RPS stems from its unique capacity to preserve critical anatomical content while generating clinically meaningful variations through controlled intraimage pixel swapping. This characteristic makes RPS particularly valuable for medical imaging applications where maintaining content integrity is essential for an accurate diagnosis.

For CNN architectures, specifically ResNet-34, RPS yielded remarkable performance improvements. ResNet-34 achieved peak accuracies of 97.56% for the IQ-OTH/NCCD dataset (at 224×224 resolution with lung ROI segmentation) and 97.78% for the chest CT scan images dataset (at 512×512 resolution), with corresponding AUROC scores of 98.61% and 99.46%, respectively. The technique’s effectiveness with MobileNetV3 (96.65% accuracy and 98.0% AUROC for the IQ-OTH/NCCD dataset; 96.83% accuracy and 98.75% AUROC for the chest CT scan images dataset) is particularly notable given this model’s lightweight architecture, suggesting RPS’s potential for deployment in resource-constrained clinical settings where efficient models are often preferred [56]. The study results represent a substantial advancement over conventional augmentation approaches, as RPS effectively addresses the inherent limitation of CNNs in capturing global relationships by creating localized variations that enhance feature learning while preserving diagnostically relevant image features.

The transformer-based architectures (Vision Transformer and Swin Transformer) showed particularly notable improvements when augmented with RPS. While transformer models conventionally demand large-scale training datasets to achieve peak performance, RPS effectively compensated for data limitations by generating variations that preserved the overall image content for proper attention mechanism functioning. For the Vision Transformer, RPS augmentation significantly enhanced performance, reaching 92.38% accuracy and 96.97% AUROC on the IQ-OTH/NCCD dataset (with lung ROI segmentation) and 86.67% accuracy and 96.32% AUROC on the chest CT scan images dataset. The Swin Transformer demonstrated robust performance gains, achieving 94.82% accuracy and 98.12% AUROC on the IQ-OTH/NCCD dataset and 96.19% accuracy and 98.90% AUROC on the chest CT scan images dataset when enhanced with RPS. The study results showed that RPS enables transformer models to develop more robust and clinically relevant feature representations, even with limited training data.

Our comparative analysis revealed RPS’s consistent dominance across evaluation metrics and experimental conditions. While CutMix showed marginal advantages in specific scenarios (notably a 0.31% accuracy improvement with ResNet-34 at 512×512 image resolution), RPS maintained substantially better AUROC scores (5.31% higher in the same comparison), indicating more reliable diagnostic discrimination capability. This performance pattern held true across both the IQ-OTH/NCCD and chest CT scan images datasets, with RPS consistently ranking the highest in our comprehensive evaluation framework. Importantly, while conventional augmentation techniques sometimes degraded model performance in certain scenarios [38,40], RPS demonstrated universal performance enhancement across all tested conditions. Three fundamental characteristics explain RPS’s exceptional effectiveness:

1. Anatomical content preservation: unlike methods that erase or mix image regions, RPS maintains all original diagnostic information while creating realistic variations through a controlled, systematic, random patch-based pixel swap within carefully defined ROIs. This approach preserves the clinical relevance of training samples while providing valuable data diversity.

2. Architecture-agnostic adaptability: the technique’s parameter-free implementation and tunable swap area factor enable optimal performance across diverse model architectures without requiring architecture-specific adjustments. This flexibility makes RPS particularly valuable for medical imaging research, where multiple architectures may be explored.

3. Clinical pathological relevance: by restricting pixel swaps to anatomically plausible regions within lung tissue (especially when combined with ROI segmentation), RPS enhances the learning of pathological features that may appear anywhere in the pulmonary anatomy, a crucial capability given the unpredictable spatial distribution of malignant nodules in many cancer cases [61].

Validation experiments using independently acquired datasets with different scanning protocols and equipment configurations demonstrated RPS’s superior generalization capabilities. The technique achieved these results while adding minimal computational overhead (statistically insignificant increases in training time, P>.05), making it practical for real-world clinical implementation. Furthermore, RPS showed excellent compatibility with conventional augmentation methods, providing additional performance gains when combined with rotation and flipping operations, which suggests easy integration into existing medical image processing pipelines.

These findings have significant implications for the development of computer-aided diagnosis systems. RPS directly addresses two fundamental challenges in medical artificial intelligence: (1) the scarcity of annotated medical imaging data and (2) the limited generalizability of many models across different clinical settings [23]. By consistently outperforming current state-of-the-art techniques while maintaining computational efficiency, RPS emerges as a versatile solution suitable for both research investigations and clinical deployment. Additionally, the technique's effectiveness suggests promising applications in radiologist training, where realistic image variations could enhance learning without requiring additional patient scans.

Conclusions

The findings of this study demonstrate that RPS is a robust and versatile DA technique that significantly enhances the performance of both CNN and transformer architectures for lung cancer diagnosis from CT scan images. By preserving anatomical content while introducing meaningful variability, RPS outperforms existing augmentation methods across multiple metrics and datasets, achieving improved accuracy and AUROC scores. Its computational efficiency, adaptability to diverse architectures, and ability to improve generalization make it particularly valuable for medical imaging applications where data scarcity and model reliability are critical challenges. RPS not only advances the technical frontier of DA but also holds immediate promise for improving computer-aided diagnosis systems in clinical practice. Future work will explore its extension to other medical imaging modalities (magnetic resonance, ultrasound, and x-ray imaging) and to 3D applications.

Acknowledgments

This research was funded by the Department of Science and Innovation–Council for Scientific and Industrial Research (DSI-CSIR) Inter-Bursary Support (IBS) Programme.

Data Availability

The data supporting the findings of this study are available upon reasonable request from the corresponding author.

The Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases (IQ-OTH/NCCD) dataset is available on Mendeley Data [35]. The chest computed tomography (CT) scan images dataset is available on Kaggle [55]. The code is available on GitHub [62].

Authors' Contributions

Conceptualization: AAA

Data curation: AAA

Formal analysis: AAA

Funding acquisition: AAA

Investigation: AAA

Methodology: AAA

Project administration: NM

Resources: AAA

Software: AAA

Supervision: NM

Validation: AAA

Visualization: AAA

Writing – original draft: AAA

Writing – review & editing: NM

All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest

None declared.

  1. Yoon SM, Shaikh T, Hallman M. Therapeutic management options for stage III non-small cell lung cancer. World J Clin Oncol. Feb 10, 2017;8(1):1-20. [CrossRef] [Medline]
  2. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. Nov 2018;68(6):394-424. [CrossRef]
  3. Schabath MB, Cote ML. Cancer progress and priorities: lung cancer. Cancer Epidemiol Biomarkers Prev. Oct 2019;28(10):1563-1579. [CrossRef] [Medline]
  4. Travis WD, Brambilla E, Noguchi M, et al. International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society international multidisciplinary classification of lung adenocarcinoma. J Thorac Oncol. Feb 2011;6(2):244-285. [CrossRef] [Medline]
  5. Shin HJ, Kim MS, Kho BG, et al. Delayed diagnosis of lung cancer due to misdiagnosis as worsening of sarcoidosis: a case report. BMC Pulm Med. Mar 21, 2020;20(1):71. [CrossRef] [Medline]
  6. Ruano-Raviña A, Provencio M, Calvo de Juan V, et al. Lung cancer symptoms at diagnosis: results of a nationwide registry study. ESMO Open. Nov 2020;5(6):e001021. [CrossRef] [Medline]
  7. Del Ciello A, Franchi P, Contegiacomo A, Cicchetti G, Bonomo L, Larici AR. Missed lung cancer: when, where, and why? Diagn Interv Radiol. 2017;23(2):118-126. [CrossRef] [Medline]
  8. Blandin Knight S, Crosbie PA, Balata H, Chudziak J, Hussell T, Dive C. Progress and prospects of early detection in lung cancer. Open Biol. Sep 2017;7(9):170070. [CrossRef] [Medline]
  9. Meza R, Jeon J, Toumazis I, et al. Evaluation of the benefits and harms of lung cancer screening with low-dose computed tomography: modeling study for the US preventive services task force. JAMA. Mar 9, 2021;325(10):988-997. [CrossRef] [Medline]
  10. Fitzgerald RC, Antoniou AC, Fruk L, Rosenfeld N. The future of early cancer detection. Nat Med. Apr 2022;28(4):666-677. [CrossRef] [Medline]
  11. Abujudeh HH, Boland GW, Kaewlai R, et al. Abdominal and pelvic computed tomography (CT) interpretation: discrepancy rates among experienced radiologists. Eur Radiol. Aug 2010;20(8):1952-1957. [CrossRef] [Medline]
  12. Hoffman RM, Atallah RP, Struble RD, Badgett RG. Lung cancer screening with low-dose CT: a meta-analysis. J Gen Intern Med. Oct 2020;35(10):3015-3025. [CrossRef] [Medline]
  13. Bonney A, Malouf R, Marchal C, et al. Impact of low-dose computed tomography (LDCT) screening on lung cancer-related mortality. Cochrane Database Syst Rev. Aug 3, 2022;8(8):CD013829. [CrossRef] [Medline]
  14. Swensen SJ, Jett JR, Hartman TE, et al. Lung cancer screening with CT: Mayo Clinic experience. Radiology. Mar 2003;226(3):756-761. [CrossRef] [Medline]
  15. Silvestri GA, Goldman L, Tanner NT, et al. Outcomes from more than 1 million people screened for lung cancer with low-dose CT imaging. Chest. Jul 2023;164(1):241-251. [CrossRef] [Medline]
  16. Krupinski EA, Berbaum KS, Caldwell RT, Schartz KM, Kim J. Long radiology workdays reduce detection and accommodation accuracy. J Am Coll Radiol. Sep 2010;7(9):698-704. [CrossRef] [Medline]
  17. Jacobsen MM, Silverstein SC, Quinn M, et al. Timeliness of access to lung cancer diagnosis and treatment: a scoping literature review. Lung Cancer (Auckl). Oct 2017;112:156-164. [CrossRef] [Medline]
  18. Sathyakumar K, Munoz M, Singh J, Hussain N, Babu BA. Automated lung cancer detection using artificial intelligence (AI) deep convolutional neural networks: a narrative literature review. Cureus. Aug 25, 2020;12(8):e10017. [CrossRef] [Medline]
  19. Huang S, Yang J, Shen N, Xu Q, Zhao Q. Artificial intelligence in lung cancer diagnosis and prognosis: current application and future perspective. Semin Cancer Biol. Feb 2023;89:30-37. [CrossRef] [Medline]
  20. Marentakis P, Karaiskos P, Kouloulias V, et al. Lung cancer histology classification from CT images based on radiomics and deep learning models. Med Biol Eng Comput. Jan 2021;59(1):215-226. [CrossRef] [Medline]
  21. Buduma N, Buduma N, Papa J. Fundamentals of Deep Learning. 2nd ed. O’Reilly Media, Inc; 2022. ISBN: 9781492082170
  22. Forte GC, Altmayer S, Silva RF, et al. Deep learning algorithms for diagnosis of lung cancer: a systematic review and meta-analysis. Cancers (Basel). Aug 9, 2022;14(16):3856. [CrossRef] [Medline]
  23. Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16x16 words: transformers for image recognition at scale. arXiv. 2020. URL: https://arxiv.org/abs/2010.11929 [Accessed 2025-08-22]
  24. Lecun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278-2324. [CrossRef]
  25. Lu K, Xu Y, Yang Y. Comparison of the potential between transformer and CNN in image classification. Presented at: 2nd International Conference on Machine Learning and Computer Application; Dec 17-19, 2021; Shenyang, China. URL: https://ieeexplore.ieee.org/document/9736894 [Accessed 2025-08-22]
  26. Wang H, Liu Z, Ai T. Long-range dependencies learning based on non-local 1D-convolutional neural network for rolling bearing fault diagnosis. J Dyn Monit Diagn. Apr 12, 2022;(3):148-159. [CrossRef]
  27. Zhai X, Kolesnikov A, Houlsby N, Beyer L. Scaling vision transformers. Presented at: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); Jun 18-24, 2022; New Orleans, LA. [CrossRef]
  28. Matsoukas C, Haslum JF, Söderberg M, Smith K. Is it time to replace CNNs with transformers for medical images? arXiv. 2021. URL: https://arxiv.org/abs/2108.09038 [Accessed 2025-08-22]
  29. Liu Z, Lin Y, Cao Y, et al. Swin transformer: hierarchical vision transformer using shifted windows. Presented at: 2021 IEEE/CVF International Conference on Computer Vision (ICCV); Oct 10-17, 2021; Montreal, QC, Canada. [CrossRef]
  30. Liu D, Liu F, Tie Y, Qi L, Wang F. Res-trans networks for lung nodule classification. Int J Comput Assist Radiol Surg. Jun 2022;17(6):1059-1068. [CrossRef] [Medline]
  31. Nejad RR, Hooshmand S. HViT4Lung: hybrid vision transformers augmented by transfer learning to enhance lung cancer diagnosis. Presented at: 5th International Conference on Bio-engineering for Smart Technologies (BioSMART); Jun 7-9, 2023; Paris, France. [CrossRef]
  32. Atmakuru A, Chakraborty S, Faust O, et al. Deep learning in radiology for lung cancer diagnostics: a systematic review of classification, segmentation, and predictive modeling techniques. Expert Syst Appl. Dec 2024;255:124665. [CrossRef]
  33. Santos C, Papa JP. Avoiding overfitting: a survey on regularization methods for convolutional neural networks. ACM Comput Surv. Jan 31, 2022;54(10s):1-25. [CrossRef]
  34. Razzak MI, Naz S, Zaib A. Deep learning for medical image processing: overview, challenges and the future. In: Dey N, Ashour A, Borra S, editors. Classification in BioApps Lecture Notes in Computational Vision and Biomechanics. Springer; 2018:323-350. [CrossRef]
  35. Alyasriy H. The IQ-OTHNCCD lung cancer dataset. Mendeley Data. 2020. URL: https://data.mendeley.com/datasets/bhmdr45bh2/1 [Accessed 2025-08-22]
  36. Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. J Big Data. Dec 2019;6(1):1-48. [CrossRef]
  37. Chlap P, Min H, Vandenberg N, Dowling J, Holloway L, Haworth A. A review of medical image data augmentation techniques for deep learning applications. J Med Imaging Radiat Oncol. Aug 2021;65(5):545-563. [CrossRef] [Medline]
  38. Lin CH, Kaushik C, Dyer EL, Muthukumar V. The good, the bad and the ugly sides of data augmentation: an implicit spectral regularization perspective. J Mach Learn Res. 2024;25:1-85. URL: https://jmlr.org/papers/volume25/22-1312/22-1312.pdf [Accessed 2025-08-22]
  39. Maharana K, Mondal S, Nemade B. A review: data pre-processing and data augmentation techniques. Global Transitions Proceedings. Jun 2022;3(1):91-99. [CrossRef]
  40. Balestriero R, Bottou L, LeCun Y. The effects of regularization and data augmentation are class dependent. Presented at: 36th International Conference on Neural Information Processing Systems; Nov 28 to Dec 9, 2022; New Orleans, LA. [CrossRef]
  41. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Commun ACM. 2012;60(6):84-90. [CrossRef]
  42. Goodfellow IJ, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. Presented at: 28th International Conference on Neural Information Processing Systems; Dec 8-13, 2014; Montreal, Canada. [CrossRef]
  43. DeVries T, Taylor GW. Improved regularization of convolutional neural networks with cutout. arXiv. 2017. URL: https://arxiv.org/abs/1708.04552 [Accessed 2025-08-22]
  44. Zhong Z, Zheng L, Kang G, Li S, Yang Y. Random erasing data augmentation. Proc AAAI Conf Artif Intell. Apr 3, 2020;34(7):13001-13008. [CrossRef]
  45. Zhang H, Cisse M, Dauphin YN, Lopez-Paz D. Mixup: beyond empirical risk minimization. arXiv. 2017. URL: https://arxiv.org/abs/1710.09412 [Accessed 2025-08-22]
  46. Yun S, Han D, Chun S, Oh SJ, Yoo Y, Choe J. CutMix: regularization strategy to train strong classifiers with localizable features. Presented at: 2019 IEEE/CVF International Conference on Computer Vision (ICCV); Oct 27 to Nov 2, 2019; Seoul, Korea (South). [CrossRef]
  47. Abe A, Nyathi M, Okunade A. Lung cancer diagnosis from computed tomography scans using convolutional neural network architecture with Mavage pooling technique. AIMS Med Sci. 2025;12(1):13-27. [CrossRef]
  48. Jiang Y, Manem VSK. Data augmented lung cancer prediction framework using the nested case control NLST cohort. Front Oncol. Feb 25, 2025;15:1492758. [CrossRef]
  49. Jin X, Zhu H, Li S, et al. A survey on mixup augmentations and beyond. arXiv. 2024. URL: https://arxiv.org/abs/2409.05202 [Accessed 2025-08-22]
  50. He J, Liu B, Yang X. Non-local patch mixup for unsupervised domain adaptation. Presented at: 2022 IEEE International Conference on Data Mining (ICDM); Nov 28 to Dec 1, 2022; Orlando, FL. [CrossRef]
  51. Oh J, Yun C. Provable benefit of Cutout and CutMix for feature learning. arXiv. 2024. URL: https://arxiv.org/abs/2410.23672 [Accessed 2025-08-22]
  52. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. Presented at: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); Jun 27-30, 2016; Las Vegas, NV. [CrossRef]
  53. Howard A, Sandler M, Chen B, et al. Searching for mobilenetv3. Presented at: 2019 IEEE/CVF International Conference on Computer Vision (ICCV); Oct 27 to Nov 2, 2019; Seoul, Korea (South). [CrossRef]
  54. Paszke A, Gross S, Massa F, et al. PyTorch: an imperative style, high-performance deep learning library. Presented at: 33rd International Conference on Neural Information Processing Systems; Dec 8-14, 2019; Vancouver, BC, Canada. [CrossRef]
  55. Chest CT-scan images dataset. Kaggle. URL: https://www.kaggle.com/datasets/mohamedhanyyy/chest-ctscan-images [Accessed 2025-08-22]
  56. Setio AAA, Traverso A, de Bel T, et al. Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the LUNA16 challenge. Med Image Anal. Dec 2017;42:1-13. [CrossRef] [Medline]
  57. Kareem HF, Al-Huseiny MS, Mohsen FY, Khalil EA, Hassan ZS. Evaluation of SVM performance in the detection of lung cancer in marked CT scan dataset. Indones J Electr Eng Comput Sci. Mar 2021;21(3):1731. [CrossRef]
  58. Solyman S, Schwenker F. Lung tumor detection and recognition using deep convolutional neural networks. In: Girma Debelee T, Ibenthal A, Schwenker F, editors. Pan-African Conference on Artificial Intelligence. 2023:79-91. [CrossRef]
  59. Abe AA, Nyathi M, Okunade AA, Pilloy W, Kgole B, Nyakale N. A robust deep learning algorithm for lung cancer detection from computed tomography images. Intelligence-Based Medicine. 2025;11:100203. [CrossRef]
  60. Mamun M, Mahmud MI, Meherin M, Abdelgawad A. LCDctCNN: lung cancer diagnosis of CT scan images using CNN based model. Presented at: 10th International Conference on Signal Processing and Integrated Networks (SPIN); Mar 23-24, 2023; Noida, India. [CrossRef]
  61. Ali H, Mohsen F, Shah Z. Improving diagnosis and prognosis of lung cancer using vision transformers: a scoping review. BMC Med Imaging. Sep 15, 2023;23(1):129. [CrossRef] [Medline]
  62. Random pixel swap (RPS) data augmentation technique code. GitHub. URL: https://github.com/Saintcodded/Random-Pixel-Swap-RPS-Data-Augmentation-Technique [Accessed 2025-08-22]


AUROC: area under the receiver operating characteristic curve
CNN: convolutional neural network
CT: computed tomography
DA: data augmentation
DL: deep learning
IQ-OTH/NCCD: Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases
MVSF: multivalue swap area factor
ROI: region of interest
RPS: Random Pixel Swap
SVSF: single-value swap area factor


Edited by Sean Hacking; submitted 15.11.24; peer-reviewed by Gabriel Guerrero-Contreras, Keqin Li, Tianle Zhang, Yunfei Xia, Zihao Zhao; final revised version received 26.05.25; accepted 23.06.25; published 03.09.25.

Copyright

© Ayomide Adeyemi Abe, Mpumelelo Nyathi. Originally published in JMIR Bioinformatics and Biotechnology (https://bioinform.jmir.org), 3.9.2025.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Bioinformatics and Biotechnology, is properly cited. The complete bibliographic information, a link to the original publication on https://bioinform.jmir.org/, as well as this copyright and license information must be included.