Deep learning‐based high‐dimensional multiple regression estimator for chest x‐ray image classification in rapid cardiomegaly screening

Chest x-ray (CXR) examination is a common, first-line, non-invasive, and rapid screening method in clinical practice. Both the posteroanterior (PA) and anteroposterior (AP) views can be used to detect cardiopulmonary diseases, such as pneumonitis, tuberculosis, pulmonary fibrosis, lung tumors, and cardiomegaly. Compared with cardiac computed tomography and cardiac magnetic resonance imaging, CXR examination has a shorter scanning duration and lower cost, which makes it suitable for routine and follow-up health examinations. Cardiomegaly is asymptomatic in its early stage and cannot be detected through electrocardiography measurements. Early detection of cardiomegaly classes, such as cardiac hypertrophy and ventricular dilatation, can therefore support decisions regarding drug treatments and surgeries. In addition, an automatic assistive tool is required to differentiate between normal individuals and those with cardiomegaly and thereby ease the problems of manual inspection and labor shortage. Hence, this study uses PA-view CXR images to develop a deep learning (DL)-based high-dimensional multiple regression analysis (MRA) model for CXR image classification in rapid cardiomegaly screening. This multilayer network model uses a two-channel, three-layer convolution-normalization-pooling process with two-dimensional (2D) multi-convolution operations to enhance images and extract feature patterns; a one-dimensional feature conversion is then used to estimate the four coordinate points of the maximal horizontal cardiac diameter (MHCD) and maximal horizontal thoracic diameter (MHTD), from which the cardiothoracic ratio is estimated to detect cardiomegaly. For the experimental tests, the training and testing datasets were collected from the National Institutes of Health CXR Image Database (Clinical Center, USA), and 10-fold cross-validation was used to evaluate the model in terms of precision (%), recall (%), accuracy (%), and F1 score. These indexes are used to evaluate the feasibility of the proposed MRA estimator. In addition, the performance of the proposed model is compared with that of conventional DL-based multilayer classifiers.

1 | INTRODUCTION
The heart is a hollow muscular organ comprising the left and right atria and ventricles; it is located behind the sternum in the chest cavity, as seen in the physiological anatomy diagram in Figure 1A. The heart pumps blood throughout the human body, including the circulation systems of the upper and lower body, to transport nutrients, O2, and CO2. When the blood supply through the coronary artery is cut off, myocardial hypoxia or slowed nutrient supply occurs, affecting the rhythm conduction system of the heart and resulting in arrhythmia or irregular heartbeats [1]. In clinical cases, such as hypertension and coronary artery disease, long-term high blood pressure raises the left ventricular load, causing the ventricle to become hypertrophic or enlarged and eventually leading to fatal heart failure. Most people with cardiomegaly exhibit no symptoms, but some exhibit symptoms such as dyspnea (shortness of breath), irregular heartbeat, edema, easy fatigue, and upset. In clinical examination, cardiomegaly can be preliminarily diagnosed through exercise electrocardiogram (ECG) and chest x-ray (CXR) imaging examinations. For further diagnosis, imaging methods, such as cardiac computed tomography (CCT), cardiac magnetic resonance imaging (CMRI), and echocardiography (ECHO), are used to detect possible lesions (foci) of the heart [2-5]; among these methods, ECHO can achieve higher accuracy.
Cardiomegaly can be divided into two classes: cardiac hypertrophy and ventricular dilatation. CXR imaging is the first-line rapid examination method for evaluating the heart condition through a posteroanterior (PA) view, as shown in Figure 2 (PA projection and anteroposterior [AP] projection). As shown in Figure 1B, hypertrophy and hyperplasia of the ventricular muscle tissue result in the thickening of the ventricular muscles; a ventricular wall thickness greater than 1.3 cm may be regarded as ventricular hypertrophy, and a thickness exceeding 1.5 cm or an asymmetric ventricle warrants special attention. Patients with morbid cardiac hypertrophy experience high blood pressure and a gradual increase in the heart's workload; in addition, these symptoms can be caused by heart valve stenosis, myocarditis, cardiomyopathy, and myocardial infarction and eventually increase the risk level. In clinical examination, the major methods for cardiomegaly detection are ECHO, phonocardiogram signals (for related cardiopulmonary disease), and ECG. ECHO can be used to observe the structure, size, and contraction conditions of the heart and to evaluate the activity of the heart valves to detect valve defects. Color Doppler ultrasound can also be used to observe the blood flow, volume, and velocity to determine whether the coronary artery is narrowed or stenotic, and heart murmur signals can be obtained using an electronic stethoscope. However, auscultation measurements need to be correctly judged by experienced and professional clinicians. The ECG can indicate ventricular premature contraction, ventricular tachycardia, and other phenomena. However, ECG has some drawbacks: the subject must wear an ECG device for long-term monitoring, noise and electromagnetic interference must be overcome, potential heart problems are difficult to detect in subjects whose ECG waveforms show no symptoms, and the measurement results must be interpreted by relevant medical staff in a hospital environment [6,7].
The CXR examination is the fastest method for routine and follow-up health examinations (an essential detection item) for related cardiopulmonary diseases, including pneumonitis, tuberculosis, pulmonary fibrosis, and lung tumors [5,8-13]. CXR imaging can also be used to detect cardiomegaly. Because the heart lies at the center of the chest, it is easy to locate in a CXR image, and cardiomegaly is easy to detect. Follow-up imaging techniques, such as ECHO, CCT, and CMRI, can be used to confirm the presence of cardiomegaly. As CXR is noninvasive and reduces the imaging time and cost, this study proposes a deep learning (DL)-based high-dimensional multiple regression analysis (MRA) model for establishing an estimator to rapidly detect cardiomegaly.
Based on a two-dimensional (2D) CXR image, a bounding box over a specific range, known as the region of interest (ROI), is used to extract a feature map of the heart and left/right thoracic cages. Multiple convolution-normalization-pooling processes (CNPPs) are then used to enhance object features, filter noise, and extract the desired object features, producing key features for estimating the four coordinate points of the maximal horizontal cardiac diameter (MHCD) and the maximal horizontal thoracic diameter (MHTD), from which the cardiothoracic ratio (CTR) is automatically calculated [9,11,12]. In this study, we propose two-channel three-layer CNPPs with multi-sliding convolutional windows (with different weighting combinations) to perform the feature extraction task. Hence, specific object features, such as the heart and left/right thoracic cages, can be easily detected within the ROI. Through the denoising and sharpening processes, these feature parameters provide key information for classification and prediction applications. In each CNPP layer, a batch normalization process [14-17] is used to normalize the feature parameters within a specific range of values based on the batch mean and SD, which suppresses excessively large values entering the activation function and improves training performance [15]. In the pooling process, maximum pooling (MP) uses downsampling to reduce the number of feature parameters, which reduces the complexity level, computational time, and memory required for storing feature parameters. In the classification layer, an optimization algorithm is used to adjust the network parameters to refine prediction performance so that the four coordinate points of the MHCD and MHTD can be accurately estimated. Then, the CTR index can be obtained for evaluating the risk level; with a threshold value of 0.50, the CTR index can be divided into five levels, the highest of which, (5) CTR > 0.60, represents a severe level [1]. In clinical cases, subjects with chronic heart failure, long-term high blood pressure, or long-term dialysis have CTR > 0.50, and subjects with CTR > 0.60 are more prone to cardiovascular diseases, such as cardiopulmonary embolism, stroke, and left ventricular hypertrophy.
Therefore, using the National Institutes of Health (NIH) CXR Image Database [18-20], which contains healthy heart (Nor) and cardiomegaly (Card) CXR images, this study establishes a DL-based high-dimensional MRA estimator for rapid cardiomegaly screening. In the training stage, the gradient descent algorithm (GDA) [21,22] is used to adjust the connecting network parameters through iterative computations. After the training stage, the trained MRA estimator can automatically screen the cardiomegaly level and improve the estimation accuracy. With the cross-validation method, four indexes, namely precision (%), recall (%), F1 score, and accuracy (%) [12,23], are used to validate the feasibility of the proposed prediction model for clinical applications, and the proposed method is compared with conventional convolutional neural network (CNN)-based multilayer classifiers, such as "U-Net" and "U-Net + Dense."
The remainder of this study is organized as follows. Section 2 describes the materials and methodology, including the NIH CXR image collection, MRA model design, and MRA performance evaluation for cardiomegaly screening. Sections 3 and 4 present the experimental results with the cross-validation-based feasibility evaluation and the conclusions, respectively.

2 | MATERIALS AND METHODOLOGY

2.1 | CXR image collection
An x-ray is a type of electromagnetic radiation that has a higher energy than visible light and can pass through the human body. Using an x-ray imaging system, such as CXR or CCT imaging, radiologists can scan the human body and capture images of the body interior, including tissues, structures, and bone features; the PA or AP projection mode with low-dose x-rays is used to produce images (radiographs), as shown in Figure 2. For example, in PA projection scanning, clinicians or radiologists can directly inspect the shadows in the images for specific medical purposes, such as diagnosing or treating diseases. Hence, this study collects the CXR images from the NIH CXR Image Database (Clinical Center, USA, approximately 112 000 films and 31 000 subjects) [18-20] and selects the Nor and Card CXR images to validate the feasibility of the proposed MRA estimator. This image database is freely available for clinical research, diagnostic decision-making with artificial intelligence methods, and digital healthcare applications. The images are annotated with normal/abnormal labels for 14 classes of related cardiopulmonary diseases, including pneumonia, fibrosis, tuberculosis, pneumothorax, mass/nodules, pleural effusion, emphysema, and cardiomegaly, and a 15th class covers undiscovered diseases. To protect patients' privacy, anonymization is performed, and each CXR image is manually inspected and identified twice by experienced clinicians or radiologists; the resulting annotation serves as a biomarker recording each patient's age, disease classes, and focus locations. Each CXR image (colored image) in PNG format is digitized to a resolution of 96 × 96 dots per inch and 24 bits per pixel; thus, each image is a 1024 × 1024-pixel image.
Among these annotated images, we collect the Nor and Card CXR images, which are divided into two sets: training and test sets. For a specific bounding box of 124 pixels × 124 pixels, according to the biomarker data, we extract feature maps from the CXR images and locate the four coordinate points of the MHCD (from (x1, y1) to (x2, y2)) and MHTD (from (x3, y3) to (x4, y4)), as shown in Figure 3. As seen in Figure 4, we can use these four coordinate points to compute the CTR to measure the enlargement of the cardiac silhouette, as follows [13-15]:
$$ \mathrm{CTR} = \frac{\mathrm{MHCD}}{\mathrm{MHTD}} $$
where
$$ \mathrm{MHCD} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}, \qquad \mathrm{MHTD} = \sqrt{(x_4 - x_3)^2 + (y_4 - y_3)^2}. $$
The CTR index can be used to evaluate the cardiomegaly level: (1) Nor condition: CTR ≤ 0.50 (CTR < 0.42: pathologic condition, indicating a smaller heart); (2) mild cardiomegaly: 0.50 < CTR ≤ 0.55; (3) moderate cardiomegaly: 0.55 < CTR ≤ 0.60; (4) severe cardiomegaly: CTR > 0.60. In clinical applications, CTR > 0.50 is the critical index for evaluating non-fatal myocardial infarction, coronary death, and left ventricular systolic function (CTR > 0.60). The CTR index is clinically applicable to both adults and children for exclusionary diagnosis or follow-up health examinations, as well as for further disease diagnosis. Furthermore, a CTR-based finding should be confirmed using ECHO, CCT, and CMRI.
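The CTR computation and the level thresholds listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the example landmark coordinates are hypothetical, and the MHCD and MHTD are taken as Euclidean distances between the paired points.

```python
import math

def cardiothoracic_ratio(p1, p2, p3, p4):
    """CTR = MHCD / MHTD from four landmark points given as (x, y) pixels."""
    mhcd = math.dist(p1, p2)  # maximal horizontal cardiac diameter
    mhtd = math.dist(p3, p4)  # maximal horizontal thoracic diameter
    if mhtd == 0:
        raise ValueError("thoracic diameter must be non-zero")
    return mhcd / mhtd

def ctr_level(ctr):
    """Map a CTR value to the four screening levels listed in the text."""
    if ctr > 0.60:
        return "severe cardiomegaly"
    if ctr > 0.55:
        return "moderate cardiomegaly"
    if ctr > 0.50:
        return "mild cardiomegaly"
    return "normal"

# Hypothetical landmarks inside a 124 x 124 bounding box (illustration only).
ctr = cardiothoracic_ratio((40, 80), (98, 80), (12, 84), (112, 84))
print(round(ctr, 2), ctr_level(ctr))
```

With these made-up points the cardiac diameter is 58 pixels and the thoracic diameter is 100 pixels, so the sketch reports a CTR of 0.58, which falls in the moderate range.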

2.2 | Proposed MRA estimator design
Current CNN-based methods for image segmentation and image classification are based on different multilayer classifier models, as shown in the multilayer architecture in Figure 5. This novel concept, the so-called DL algorithm, was first proposed by Dr. Yann LeCun (American computer scientist) in 1989; it uses several convolution-pooling layers (>10 layers) to enhance images and extract specific feature maps from 2D images, such as objects' shapes, contours, and edge information. Such multiple convolution-pooling processes can improve the pattern recognition ability. In each convolutional layer, multi-kernel windows are used to perform convolution operations to obtain multiple feature maps. These feature maps can be superimposed into a thickness feature map, which contains a significant amount of key information for improving classification and prediction accuracy. According to the architecture in Figure 5, with the NIH CXR Image Database [18-20], we design a multilayer CNN-based model to rapidly screen cardiomegaly in clinical usage.
In this architecture, for a CXR image, we obtain an input map (124 pixels × 124 pixels) with the bounding box within the ROI and then feed it to the two-channel CNPP paths (three layers in each path) to extract the feature parameters of the heart and left/right thoracic cages. The batch normalization process [14-17] is used to improve learning efficiency. In the classification layer, a back-propagation neural network (BPNN) is used to predict the four coordinate points of the MHCD and MHTD, namely (x1, y1), (x2, y2), (x3, y3), and (x4, y4). The proposed high-dimensional MRA estimator can be trained on the input-output paired training datasets, as shown in Figure 3, and its functions include (1) extracting a feature map within the ROI in the CXR image; (2) extracting the thickness feature map with the two-channel CNPP processes; and (3) predicting the four coordinate points to estimate the "CTR index" (as seen in Figure 5). The functions of each layer in the model are described as follows:
• Feature map extraction: the 124 pixels × 124 pixels bounding box is used to obtain the feature map within the ROI, which is then fed into the two-channel CNPP paths to extract the feature parameters of the heart and left/right thoracic cages;
• Convolutional processing: 3 × 3 kernel windows with nine weighted values are used to perform the convolution operation with a sliding stride = 1, where the multi-kernel windows gradually enhance nonlinear features and also perform denoising. As a result, specific feature parameters, such as those of the heart and left/right thoracic cages, can be observed.
Notice: Two-channel paths for convolution processing: (Channel #1) gradually increase the number of kernel windows from 16 to 256 (16 → 64 → 256); (Channel #2) increase the number of kernel windows from 32 to 512 (32 → 128 → 512), as seen in Figure 5. In each path, we use these multi-kernel windows with different weighted values to perform convolution operations. In the training stage, the learning algorithm is used to find the optimal weighted values through iterative computations in a fully connected network, so that the desired object's features can be distinguished. In its general form, the convolutional operator computes each output feature as the weighted sum of the 3 × 3 neighborhood covered by the kernel window.
• Normalization processing: the batch normalization process [14-17] is used to normalize feature parameters between network layers, and the general form can be expressed as follows:
$$ \mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2} $$
where x_i denotes the ith feature parameter; μ denotes the mean value of the feature parameters; σ denotes the SD of the feature parameters; and N represents the number of feature parameters, N = n × n. With these parameters, μ and σ, the general form can be expressed as follows:
$$ \hat{x}_i = \gamma \, \frac{x_i - \mu}{\sigma} + \beta \qquad (5) $$
where x̂_i denotes the normalized feature parameter, and γ and β denote the scaling and shifting parameters, respectively. When a feature scaling operation is performed using (5) during the training stage, the feature parameters are scaled within a specific range; the parameters therefore do not oscillate, which accelerates the training process.
• Maximum pooling processing (MPP): a 2 × 2 sliding window is used to find the maximum value within the window range, which reduces the number of feature parameters to mitigate the overfitting problem and accelerate the training process. After MPP, the flattening process is used to convert the 2D feature map to a one-dimensional (1D) feature vector. The general forms can be expressed as follows:
$$ P = \max(F), \qquad X = \mathrm{flat}(P) = [x_1, x_2, x_3, \ldots, x_{n'}] $$
where F denotes the normalized 2D feature map and P denotes the pooled feature map; max(•) denotes the MPP operator; flat(•) denotes the flattening operator; and X = [x_1, x_2, x_3, …, x_{n'}] denotes the 1D feature vector.
• MRA estimator training: in the classification layer, the fully-connected BPNN uses a loss function (LF) to evaluate the training performance during the training stage; an optimization learning algorithm is used to adjust the network connecting parameters to minimize the LF value, to obtain the smallest residual value between the target and prediction values, or to obtain the minimal error rate.
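One CNPP stage as listed above (3 × 3 convolution with stride 1 and padding 1, normalization, 2 × 2 maximum pooling, and flattening) can be sketched in plain Python. This is a single-kernel, single-image illustration under simplifying assumptions rather than the authors' network: the statistics are taken over one n × n map (N = n × n), the small constant eps is an added numerical safeguard that does not appear in the text, and the toy image and sharpening kernel are hypothetical.

```python
import math

def conv3x3(image, kernel):
    """3x3 convolution, stride 1, zero padding 1 (output size equals input size)."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for u in range(3):
                for v in range(3):
                    r, c = i + u - 1, j + v - 1
                    if 0 <= r < rows and 0 <= c < cols:  # zero padding outside
                        acc += kernel[u][v] * image[r][c]
            out[i][j] = acc
    return out

def normalize(fmap, gamma=1.0, beta=0.0, eps=1e-5):
    """Scale a feature map by its mean and SD, then apply gamma and beta."""
    values = [x for row in fmap for x in row]
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    return [[gamma * (x - mu) / (sigma + eps) + beta for x in row] for row in fmap]

def max_pool2x2(fmap):
    """2x2 maximum pooling with stride 2 (odd trailing rows/columns are dropped)."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

def flatten(fmap):
    """Convert a 2D feature map to a 1D feature vector."""
    return [x for row in fmap for x in row]

# Hypothetical 4x4 toy image and a sharpening kernel (illustration only).
image = [[float((i * 4 + j) % 7) for j in range(4)] for i in range(4)]
kernel = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]
features = flatten(max_pool2x2(normalize(conv3x3(image, kernel))))
print(len(features))  # a 4x4 map pools to 2x2, i.e., 4 features
```

In the model described here, each layer applies many such kernels in parallel (16 to 256 in one channel and 32 to 512 in the other), and the per-kernel outputs are stacked into the thickness feature map.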
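The LF is the mean square error function and can be expressed as follows:
$$ \mathrm{LF} = \frac{1}{K}\sum_{k=1}^{K} (T_k - Y_k)^2 $$
where T_k and Y_k denote the target and predicted values of the kth output, respectively, and K denotes the number of output values. Because the input feature vector is high-dimensional, the BPNN in the classification layer requires many hidden layers to construct a complex nonlinear scheme that handles the high-dimensional feature parameters and then estimates (predicts) the four coordinate points using the MRA concept. This study constructs a fully connected network with multiple hidden layers, whose outputs can be computed as follows:
$$ Y_l = f(W_l Y_{l-1} + B_l), \qquad Y_0 = X $$
$$ f(z) = \max(0, z) \qquad (11) $$
where l represents the current hidden layer number, l = 1, 2, 3, …, L; W_l and B_l denote the weighted matrix and bias-weighted matrix of the lth layer, respectively; and f(•) denotes the activation function in each hidden node, as in (11), that is, the ReLU function shown in Figure 6. The GDA [21,22] is used to adjust the weighted parameters and bias-weighted parameters to reach the best regression between the output Y and input X through iterative computations.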
The general forms are as follows:
$$ W_l \leftarrow W_l - \eta \nabla W_l \qquad (15) $$
$$ B_l \leftarrow B_l - \eta \nabla B_l \qquad (16) $$
$$ \nabla W_l = \frac{\partial \mathrm{LF}}{\partial W_l} \qquad (17) $$
$$ \nabla B_l = \frac{\partial \mathrm{LF}}{\partial B_l} \qquad (18) $$
where ∇W_l and ∇B_l denote the gradient values in the lth layer, which are used to update the weighted matrix, W, and bias-weighted matrix, B, respectively; η represents the learning rate (0 < η ≤ 1). The gradient values of W and B are computed by (17) and (18), respectively. The GDA learning method is used to update the connecting weighted and bias-weighted parameters between network layers (from the Lth layer to the first layer) and to improve the estimation accuracy. With Equations (15)-(18), the parameters are updated using matrix operations to minimize the LF value until it satisfies the predetermined convergence condition.
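The update rule described above can be illustrated with a very small network in plain Python: one ReLU hidden layer, a linear output layer, a mean square error LF, and gradient descent with learning rate η. This is only a sketch under stated assumptions; the layer sizes, initial weights, single training sample, and the value η = 0.1 are all hypothetical, and the real estimator uses several hidden layers and eight outputs.

```python
def relu(z):
    return z if z > 0.0 else 0.0

def forward(x, W1, b1, W2, b2):
    """One ReLU hidden layer followed by a linear output layer."""
    h = [relu(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(W2, b2)]
    return h, y

def mse(y, t):
    return sum((yi - ti) ** 2 for yi, ti in zip(y, t)) / len(t)

def gda_step(x, t, W1, b1, W2, b2, eta):
    """One gradient descent update; returns the LF value before the update."""
    h, y = forward(x, W1, b1, W2, b2)
    dy = [2.0 * (yi - ti) / len(t) for yi, ti in zip(y, t)]        # dLF/dy
    dh = [sum(dy[k] * W2[k][j] for k in range(len(dy))) for j in range(len(h))]
    dz = [dh[j] if h[j] > 0.0 else 0.0 for j in range(len(h))]     # ReLU gate
    for k in range(len(W2)):                                       # output layer
        for j in range(len(h)):
            W2[k][j] -= eta * dy[k] * h[j]
        b2[k] -= eta * dy[k]
    for j in range(len(W1)):                                       # hidden layer
        for i in range(len(x)):
            W1[j][i] -= eta * dz[j] * x[i]
        b1[j] -= eta * dz[j]
    return mse(y, t)

# Hypothetical toy problem: 3 inputs, 2 hidden nodes, 2 outputs.
x, t = [0.5, -0.2, 0.1], [0.3, 0.7]
W1, b1 = [[0.1, 0.2, 0.3], [0.4, -0.1, 0.2]], [0.1, 0.1]
W2, b2 = [[0.2, -0.3], [0.1, 0.4]], [0.0, 0.0]
losses = [gda_step(x, t, W1, b1, W2, b2, eta=0.1) for _ in range(200)]
print(losses[0], losses[-1])
```

The gradients are propagated from the output layer back to the hidden layer before any weight is changed, which mirrors the layer-by-layer update order (from the Lth layer to the first layer) described in the text.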
FIGURE 6 ReLU activation function.

2.3 | MRA estimator performance evaluation
The proposed MRA estimator can recognize the thickness feature map to estimate the four coordinate points (x1, y1), (x2, y2), (x3, y3), and (x4, y4) for computing the CTR index. Based on cross-validation, the confusion matrix for the MRA estimator is shown in Figure 7, showing the true positive (TP), false positive (FP), true negative (TN), and false negative (FN). Through 10-fold cross-validation, four indexes, namely, precision (%), recall (%), F1 score (F1 measure), and accuracy (%), can be used to evaluate the performance of the proposed estimator for each validation. The four indexes are defined as follows:
$$ \mathrm{Precision}\,(\%) = \frac{TP}{TP + FP} \times 100\%, \qquad \mathrm{Recall}\,(\%) = \frac{TP}{TP + FN} \times 100\% $$
$$ \mathrm{F1\ score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad \mathrm{Accuracy}\,(\%) = \frac{TP + TN}{TP + FP + TN + FN} \times 100\% $$
For example, given 226 CXR images to validate the performance of the MRA estimator for Card screening, comprising 101 Nor CXR images and 125 Card CXR images, Figure 7 shows the confusion matrix of the estimator, with TP = 93, FP = 8, TN = 102, and FN = 23. Then, we can use the four outcomes in the 2 × 2 confusion matrix to compute the four evaluation indexes, namely recall = 80.17%, precision = 92.08%, accuracy = 86.28%, and F1 score = 0.8571, to evaluate the performance of the proposed estimator. The recall (%) index is the number of correctly identified positive results divided by the number of all actual positive cases; precision (%) is the number of correctly identified positive results divided by the number of all returned positive results. Both were greater than 80.00%, demonstrating that the classifier/estimator has promising classification/prediction performance. The F1 score is the harmonic mean of precision (%) and recall (%) for identified TPs in Card screening. An F1 score > 0.9000 indicates a good classifier/estimator model. Therefore, in this study, the performance of the MRA estimator was verified by these four indexes.
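The worked example above can be checked directly from the four confusion-matrix counts. The following Python sketch uses the standard definitions of the four indexes; the function name is hypothetical, and only the counts TP = 93, FP = 8, TN = 102, and FN = 23 are taken from the text.

```python
def classification_indexes(tp, fp, tn, fn):
    """Precision, recall, accuracy, and F1 score from a 2x2 confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}

# Counts reported for the MRA estimator.
m = classification_indexes(tp=93, fp=8, tn=102, fn=23)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Running the sketch gives precision 0.9208, recall 0.8017, accuracy 0.8628, and F1 score 0.8571, which agree with the percentages quoted in the example.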

3 | EXPERIMENTAL RESULTS AND DISCUSSION
This study implemented the DL-based high-dimensional MRA estimator using PyCharm Community Edition 2022.3. The estimator comprises two-channel CNPP paths, a flattening layer, and a classification layer, as seen in Figure 5; the related data of the MRA model are shown in Table 1 and described below:
• In each CNPP path, multiple 3 × 3 kernel convolutional windows were used to perform feature extraction tasks to detect and sharpen the contours of the heart and left/right thoracic cages and to remove noise;
• Each kernel window moved with a stride of 1 (stride = 1) to perform local convolution operations, and padding = 1 kept the size of the feature map unchanged after each convolution operation; batch normalization [14-17] was used to restrict the feature parameters within a specific range;
• In the pooling process, MPP used a 2 × 2 sliding window with stride = 2 to find the maximum feature parameter in the local range.
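The effect of the stride, padding, and window settings listed above on the feature-map size can be traced with the usual output-size formula. The sketch below is an assumption-laden illustration: it starts from the 124 × 124 input map described in Section 2.2, assumes the pooling step uses no padding and discards an odd trailing row or column, and uses hypothetical function names; the sizes it prints follow only from those stated settings and are not taken from Table 1.

```python
def conv_output_size(n, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

def pool_output_size(n, window=2, stride=2):
    """Spatial size after pooling without padding: floor((n - w) / s) + 1."""
    return (n - window) // stride + 1

size = 124            # side length of the input map, in pixels
sizes = []
for _ in range(3):    # three CNPP layers in each path
    size = pool_output_size(conv_output_size(size))
    sizes.append(size)
print(sizes)
```

Under these assumptions, each convolution preserves the spatial size and each pooling step halves it, so the three layers reduce the side length from 124 to 62, 31, and 15.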
As shown in Table 1, we used two CNPP paths and three-layer CNPP processes and increased the number of convolution windows (16 → 64 → 256 and 32 → 128 → 512) in each path to obtain the thickness feature map for enhanced recognition of complex feature maps. In the classification layer, the fully connected BPNN model included an input layer, multiple hidden layers, and an output layer. The topology of the network is 294 912 × 256 × 128 × 64 × 32 × 8. In this study, we implemented the MRA estimator on a multi-core personal computer-based platform (Intel® Q370, Intel® Core™ i7 8700, DDR4 2400 MHz 8G*3) and used a graphics processing unit (NVIDIA® GeForce® RTX™ 2080 Ti, 1755 MHz, 11GB GDDR6) to reduce the execution time. This study used the NIH CXR image dataset [18-20], including 200 Nor images (control group: CTR ≤ 0.50) and 200 Card images (test group: CTR > 0.50), for MRA estimator validation. The training and testing datasets were used to train and verify the MRA estimator, respectively. Figure 8A,B shows the training and validation history curves of the estimator, respectively, where the blue solid lines indicate the training performance curves and the orange solid lines indicate the validation curves. As the number of training epochs increased, the prediction accuracy of the MRA estimator gradually improved, and the LF value gradually decreased. After about 120 training epochs, the convergence condition was satisfied. We also developed a graphical user interface (GUI) with both automatic and manual operation modes for clinical usage, as seen in Figure 9, through which clinicians or radiologists could input a batch of CXR images for cardiomegaly screening. For example, as shown in Figure 10, two red coordinate points and two green coordinate points in the six CXR images could be automatically located by the proposed MRA estimator; these points were used to estimate the MHCD and MHTD, compute the CTR (= MHCD/MHTD), check whether CTR ≥ 0.50, and evaluate the possible cardiomegaly level. In addition, clinicians or radiologists could manually select specific CXR images for batch imaging examinations. Hence, this study also considered three approaches for cardiomegaly screening, as follows:
• Approach#1: manual examination was used; clinicians or radiologists manually located the four coordinate points of the MHCD and MHTD and then computed the CTR index to judge the "Nor level" or "Card level" based on their experience and professional abilities;
The same training and testing datasets were used to train and validate Classifier#2 and Classifier#3, respectively. Classifier#2 and Classifier#3 produced confusion matrices on the same 226 testing CXR images, as shown in Figure 11. For example, Classifier#2 obtained TP = 62, FP = 39, TN = 111, and FN = 14, with recall = 81.58%, precision = 61.39%, accuracy = 76.55%, and F1 score = 0.7006. According to the comparison of the experimental results of the MRA estimator, Classifier#2, and Classifier#3 in Table 2, the performance of the proposed MRA estimator in terms of recall (%), precision (%), accuracy (%), and F1 score was superior to that of the traditional CNN and machine learning (ML) methods, with both the precision (%) and recall (%) values greater than 80%. The precision (%) index is also called the positive predictive value (PPV), which is the proportion of predicted positives that are TPs. The PPV index of the MRA estimator was greater than 80%, demonstrating its promising performance. The MRA estimator obtained a harmonic mean index (F1 score) of 0.8571, which was higher than those of Classifier#2 and Classifier#3. These results show that the proposed high-dimensional MRA model is a good estimator for predicting the CTR for cardiomegaly screening.

FIGURE 7 Visualization of a confusion matrix for the outputs of the MRA estimator.
In addition, the sensitivity and specificity of the MRA estimator [24,25] were 80.17% and 92.73% in identifying the Card and Nor levels, respectively, with a negative predictive value (NPV) of 81.60%. A positive likelihood ratio (PLR) of 11.03 was obtained for clinical diagnosis, where a higher ratio denotes a greater likelihood of disease presence in this batch examination and a smaller ratio denotes the absence of disease. As shown in Table 3, the PLR index can be divided into seven levels, namely, >10.0, 5.0-10.0, 2.0-5.0, 0.5-2.0, 0.2-0.5, 0.1-0.2, and <0.1, in clinical research [26-28]. PLR > 10 indicates strong evidence to rule in disease; the corresponding negative likelihood ratio (NLR) was 0.21. For the comprehensive performance evaluation in terms of PLR and NLR, the proposed estimator outperformed Classifier#2 (PLR = 3.22, NLR = 0.25) and Classifier#3 (PLR = 5.09, NLR = 0.39). However, the proposed MRA model had a more complex structure and required more network parameters (142 067 337) and memory storage than Classifier#2 and Classifier#3.
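The diagnostic indexes quoted above follow from the same confusion-matrix counts used in Section 2.3. The Python sketch below uses the standard definitions PLR = sensitivity / (1 − specificity) and NLR = (1 − sensitivity) / specificity; the function name is hypothetical, and the counts are those reported for the MRA estimator.

```python
def diagnostic_indexes(tp, fp, tn, fn):
    """Sensitivity, specificity, NPV, and likelihood ratios from a 2x2 matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    npv = tn / (tn + fn)
    plr = sensitivity / (1.0 - specificity)   # positive likelihood ratio
    nlr = (1.0 - sensitivity) / specificity   # negative likelihood ratio
    return {"sensitivity": sensitivity, "specificity": specificity,
            "npv": npv, "plr": plr, "nlr": nlr}

# Counts reported for the MRA estimator.
d = diagnostic_indexes(tp=93, fp=8, tn=102, fn=23)
for name, value in d.items():
    print(f"{name}: {value:.4f}")
```

From the unrounded counts, the sketch gives a sensitivity of 0.8017, a specificity of 0.9273, an NPV of 0.8160, a PLR of about 11.02, and an NLR of about 0.21; the small difference from the reported PLR arises when the ratio is computed from the rounded percentages instead.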
In recent years, DL-based classifiers have been extensively used in classification applications, such as medical imaging, speech recognition, and sentence processing. Multiple convolution processes with multi-kernel convolution windows are used for image enhancement, denoising, and feature extraction [29-35], so that the shapes and contours of the desired object (focus) can be enhanced. Multi-kernel windows with different weighted values can enrich the thickness information of the feature map, producing more key feature parameters and improving pattern recognition accuracy; the multiple pooling processes then effectively reduce the dimensionality of the feature map and the number of computational operations, which mitigates the overfitting problem in the training stage [36,37]. Previous research [8,29,33-35] has applied DL-based models to image segmentation and classification using CXR images, such as lung, heart, and clavicle bone segmentation, diagnosis, cardiomegaly localization, and CTR estimation. The study in [33] proposed the X-RayNet-1 and X-RayNet-2 models for automatically segmenting the regions of the left/right lungs, heart, and left/right clavicle bones with two multilayer structures using 64, 128, 256, and 512 convolution windows and 32, 64, 128, and 256 convolution windows in each layer, respectively. On the JSRT Image Database (247 PA CXR images) [30,31], the regions could be accurately segmented, with accuracy = 99.06% for the lungs, 99.16% for the heart, and 99.80% for the clavicle bones with the X-RayNet-1 model, and accuracy = 98.93%, 98.96%, and 99.80%, respectively, for the three regions with the X-RayNet-2 model. Given the boundaries of the left/right lungs and heart, these results could be used to determine the MHCD and MHTD to estimate the CTR index for diagnosing cardiomegaly. The studies in [8,29] also proposed 2D U-Net and U-Net + Dense Conditional Random Field (CRF), Standard U-Net (Method#1), and XLSor (Method#2) models for lung and heart segmentation and CTR estimation, achieving promising results, as shown in Table 4.
In addition, literature [34,35] proposed CNN-based algorithms, such as a U-Net-based model, ALEX Net, VGG-16, RESNET-50, RESNET-101, and U-Net models for the same medical application.For example, the U-Net-based model used a contracting path (downsampling path) and an expanding path (upsampling path) to perform feature extraction tasks, extracting key information from an image by the multi-downsampling convolution operations, extracting the boundary information of the target (lesion) in the image by multiupsampling convolution operations, and then combining key features and spatial information to obtain high-resolution feature maps [34], which could be used for image segmentation and precise lesion location in the diagnosis of cardiomegaly diseases.On the PA CXR images in the ChestX-ray8 Database [18,19], an accuracy  of >93% could be obtained.Literature [35] compared different DL-based network models, such as ImageNet (DCN), ALEX Net, VGG-16, RESNET-50, and RESNET-101, in abnormality detection and localization for cardiomegaly and pulmonary edema localization and could find the MHCD, MHTD, perimeter of the heart, the perimeter of the thoracic region, area of the heart, and area of left/ right lung to compute three evaluation indexes, namely, 1D-CTR, 2D-CTR, and CTAR (cardiothoracic area ratio) [35].On the Indiana Dataset [35] and JSRT Database [30,31], promising results could also be obtained, as shown in Table 4.The above-mentioned DL-based methods have been applied to image segmentation, cardiopulmonary-related lesion location, and CTR estimation and could improve accuracy, sensitivity, specificity, PPV, and NPV in pattern recognition.However, the complex convolution-pooling process may result in the computation of many feature parameters, necessitating more computing time during the training process of the classifier.Table 4 compares the performance of these methods for automatic cardiomegaly screening.As a small-scale multilayer DL-based estimator for 
cardiomegaly screening, this study proposed two-channel three-layer CNPPs with many convolution windows to continuously enhance heart and lung features. The proposed method achieved promising results in terms of recall (%), precision (%), accuracy (%), and F1 score for evaluating the estimator, and good PLR (11.03) and NLR (0.21) values for evaluating clinical applications. The proposed method had the following advantages: (1) Small-scale multilayer training model; the design cycle, feature parameter requirement, computational time, and training time of the proposed model are superior to those of traditional CNN-based models. (2) High-dimensional pattern recognition ability for coordinate point estimation to determine the MHCD and MHTD and then compute the CTR index (as shown in Figure 10). (3) Visualization result indication; the model could assist clinicians or radiologists in automatically locating the MHCD and MHTD to rapidly screen out the possible focus of the heart. However, this study has limitations in clinical applications, and thus, it is necessary to extend the study to the detection of other cardiopulmonary diseases, such as severe pneumonia, pleural effusion, pulmonary fibrosis, and hydropericardium syndrome. These diseases may affect the detection accuracy in image segmentation and CTR estimation. The Nor and Card CXR images in Figure 12A could be correctly identified as possible cardiomegaly levels, whereas in the CXR images in Figure 12B, multiple cardiorespiratory symptoms, such as pleural effusion and cardiomegaly or pulmonary fibrosis and cardiomegaly, could affect the accuracy of the proposed model.
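The evaluation indexes named above (recall, precision, accuracy, F1 score, PLR, and NLR) all follow from a binary confusion matrix. The following is a minimal sketch using the standard definitions, with cardiomegaly assumed to be the positive class. The class name `ConfusionCounts`, the function `screening_metrics`, and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary screening counts; cardiomegaly is assumed to be the positive class."""
    tp: int  # cardiomegaly correctly flagged
    fp: int  # normal flagged as cardiomegaly
    fn: int  # cardiomegaly missed
    tn: int  # normal correctly passed


def screening_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard definitions; assumes no denominator below is zero."""
    recall = c.tp / (c.tp + c.fn)            # sensitivity
    specificity = c.tn / (c.tn + c.fp)
    precision = c.tp / (c.tp + c.fp)         # PPV
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)
    f1 = 2 * precision * recall / (precision + recall)
    plr = recall / (1 - specificity)         # positive likelihood ratio
    nlr = (1 - recall) / specificity         # negative likelihood ratio
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "plr": plr,
        "nlr": nlr,
    }


# Hypothetical counts for illustration only; not results from the study.
example = ConfusionCounts(tp=80, fp=10, fn=20, tn=90)
metrics = screening_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A larger PLR and a smaller NLR both indicate that a test result shifts the probability of disease more strongly, which is why the two ratios are reported alongside the classifier-oriented indexes.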

FIGURE 1 Physiological heart anatomy diagram. (A) Healthy heart anatomy diagram (left and right atria and ventricles), and (B) cardiomegaly anatomy diagram.

FIGURE 2 CXR scanning technology. (A) PA projection scanning, and (B) AP projection scanning.

FIGURE 4 Enlargement evaluation of the cardiac silhouette with the CTR. (A) Healthy heart (CTR = 0.48): normal condition (Nor); (B) Severe cardiomegaly (CTR = 0.69).

FIGURE 5 Architecture of the proposed high-dimensional MRA estimator for rapid cardiomegaly screening.
$L$ represents the LF; $Y = [y_{1,k}, y_{2,k}, y_{3,k}, \ldots, y_{8,k}]$ denotes the prediction vector (eight prediction values in this study, grouped as $(y_{1,k}, y_{2,k})$, $(y_{3,k}, y_{4,k})$, $(y_{5,k}, y_{6,k})$, and $(y_{7,k}, y_{8,k})$); $t_{j,k}$ denotes the desired target values, which form the target vector $T = [t_{1,k}, t_{2,k}, t_{3,k}, \ldots, t_{m,k}]$; $m$ represents the number of desired classes; $k = 1, 2, 3, \ldots, K$, where $K$ represents the number of training data points (as shown in Figure 3); $W$ represents the weight matrix of the network; $B$ represents the bias matrix; and $\mathrm{ReLU}(\cdot)$ denotes the activation function, namely the rectified linear unit (ReLU) function, as seen in Figure 6.
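The notation above can be illustrated with a small sketch of one fully connected layer that maps a feature vector to the eight prediction values and scores them against the targets. This is only a sketch under stated assumptions: the exact form of the LF is not given here, so mean squared error is assumed; applying ReLU at the output is also an assumption; and the function names (`dense_relu`, `mse_loss`, `to_points`) and all numeric values are hypothetical.

```python
from typing import List, Sequence, Tuple


def relu(x: float) -> float:
    """Rectified linear unit: max(0, x)."""
    return x if x > 0.0 else 0.0


def dense_relu(x: Sequence[float],
               W: Sequence[Sequence[float]],
               B: Sequence[float]) -> List[float]:
    """Compute y_j = ReLU(sum_i W[j][i] * x[i] + B[j]) for every output j."""
    return [relu(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W, B)]


def mse_loss(Y: Sequence[float], T: Sequence[float]) -> float:
    """Mean squared error between predictions Y and targets T (assumed LF)."""
    if len(Y) != len(T):
        raise ValueError("Y and T must have the same length")
    return sum((y - t) ** 2 for y, t in zip(Y, T)) / len(Y)


def to_points(Y: Sequence[float]) -> List[Tuple[float, float]]:
    """Group [y1, ..., y8] into the pairs (y1, y2), (y3, y4), (y5, y6), (y7, y8)."""
    return [(Y[i], Y[i + 1]) for i in range(0, len(Y), 2)]


# Hypothetical toy values for one training sample k; not from the study.
x = [1.0, 2.0]                                   # 1D feature vector
W = [[0.5, 0.25] for _ in range(8)]              # 8 outputs x 2 features
B = [0.0, 1.0, 2.0, 3.0, -5.0, 0.0, 1.0, 2.0]
T = [1.0, 2.0, 3.0, 4.0, 1.0, 1.0, 2.0, 3.0]     # desired targets

Y = dense_relu(x, W, B)
loss = mse_loss(Y, T)
points = to_points(Y)
print(Y, loss, points)
```

In this toy case the fifth output is clipped to zero by ReLU, so it is the only term contributing to the loss.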

FIGURE 9 GUI for cardiomegaly screening with automatic and manual operation modes.

FIGURE 8 Training history curves of MRA-based estimator. (A) Training history curves in terms of estimation accuracy vs training epoch. (B) Training convergence curves in terms of loss value vs training epoch.

FIGURE 11 Visualization of confusion matrices for Classifier#2 and Classifier#3. (A) Output confusion matrix of Classifier#2; (B) Output confusion matrix of Classifier#3.

FIGURE 10 Automatic coordinate points estimation for determining MHCD and MHTD using MRA-based estimator.
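As a sketch of how four estimated coordinate points can be turned into a CTR value, the following assumes that two points mark the left and right ends of the MHCD and two mark the ends of the MHTD, with diameters measured along the horizontal (x) axis. The function names, the pixel coordinates, and the 0.5 cutoff (the conventional threshold for PA views) are assumptions for illustration and are not confirmed as the study's implementation.

```python
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in image pixel coordinates


def horizontal_diameter(left: Point, right: Point) -> float:
    """Horizontal distance between two points; the y values are ignored."""
    return abs(right[0] - left[0])


def cardiothoracic_ratio(heart_left: Point, heart_right: Point,
                         thorax_left: Point, thorax_right: Point) -> float:
    """CTR = MHCD / MHTD, both measured horizontally."""
    mhcd = horizontal_diameter(heart_left, heart_right)
    mhtd = horizontal_diameter(thorax_left, thorax_right)
    if mhtd == 0:
        raise ValueError("thoracic diameter must be non-zero")
    return mhcd / mhtd


def is_cardiomegaly(ctr: float, threshold: float = 0.5) -> bool:
    """Flag possible cardiomegaly; 0.5 is the conventional PA-view cutoff (assumed)."""
    return ctr > threshold


# Hypothetical pixel coordinates, chosen so the ratios equal 0.48 and 0.69.
thorax = ((170.0, 700.0), (670.0, 700.0))          # MHTD = 500 px
normal_heart = ((300.0, 620.0), (540.0, 620.0))    # MHCD = 240 px
enlarged_heart = ((200.0, 620.0), (545.0, 620.0))  # MHCD = 345 px

ctr_normal = cardiothoracic_ratio(*normal_heart, *thorax)
ctr_enlarged = cardiothoracic_ratio(*enlarged_heart, *thorax)
print(ctr_normal, is_cardiomegaly(ctr_normal))
print(ctr_enlarged, is_cardiomegaly(ctr_enlarged))
```

Because the ratio is dimensionless, the same computation applies whether the coordinates are expressed in pixels or in physical units, as long as both diameters use the same scale.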
TABLE 2

TABLE 4 Comparisons of different DL-based methods for medical purposes, including lung, heart, and clavicle bone image segmentation, CTR estimation, and cardiomegaly diagnosis.