Depression is a common and severe problem for breast cancer patients. Breast cancer patients have longer survival and a more favorable prognosis than patients with other cancers [1]. Due to the long and distressing process of treatment, 8 to 24% of all cancer patients suffer from depression [2]. The risk of depression increases up to 32.6% in patients with metastatic breast cancer [3]. Moreover, among depressed breast cancer patients, 22% suffer from moderate to severe depression that requires psychiatric intervention [4,5]. Diagnosis of breast cancer may be delayed in the presence of depression [6]. Unrecognized and untreated depression can increase cancer mortality and shorten survival [7]. The aggravation of depressive symptoms can also negatively affect the treatment and prognosis of breast cancer patients [8,9]. Given the high morbidity and negative consequences of depression during breast cancer treatment, detecting groups at high risk of depression among breast cancer patients is clinically crucial.
Various instruments have been developed to effectively detect depression in breast cancer patients [10]. The Beck Depression Inventory (BDI), one of these instruments, is a 21-item self-report scale developed to evaluate the degree of depressive symptoms [11,12]. Although the BDI can be helpful in detecting depression in breast cancer patients [6,13], a briefer assessment tool could be useful given the medical condition of patients with breast cancer [14,15]. The Hospital Anxiety and Depression Scale (HADS) is a comparatively more convenient 14-item scale that measures anxiety and depression. The HADS has been widely used as a standard screening tool with excellent psychometric properties in breast cancer patients and survivors [16-18]. The Patient Health Questionnaire-9 (PHQ-9) is a brief 9-item self-report questionnaire. The PHQ-9 was reported to have acceptable reliability and validity for screening of depression in the general population [19,20]. The PHQ-9 also has good reliability and validity in patients with breast cancer [21,22]. Meanwhile, the single-item Distress Thermometer (DT) is sometimes used as a screening tool for depression in patients with breast cancer [23]. Although the DT is a convenient screening tool for depression, it is limited in satisfying both sensitivity and specificity [24-27].
Screening for depression should be convenient but also accurate. One approach is to use multiple self-report scales simultaneously to enhance the detection rate of depression. When multiple tools are used, an integrated score should be considered instead of a fixed cut-off point for each scale. Machine learning could be used to develop diagnostic screening models based on multiple scales [28]. However, there is a lack of studies that use machine learning with multiple scales to screen for depression in breast cancer patients. Therefore, this study aims to evaluate the performance of individual self-rating scales, as well as combinations of scales, for screening depression in breast cancer patients using machine learning models.
Patients attending the Breast Cancer Clinic in Pusan National University hospital from February 2021 to December 2021 participated in this study. All patients with breast cancer without distant metastasis were included. Patients more than years after breast surgery and those who refused psychiatric evaluation were excluded. A total of 327 patients were included. This study was approved by the Institutional Review Board at Pusan National University Hospital (PNUH IRB: No.2301-004-122).
The clinical characteristics of patients attending a breast cancer clinic were investigated cross-sectionally. The BDI, HADS, and PHQ-9 were used to measure depressive symptoms. Depression was evaluated according to the diagnostic criteria for major depressive disorder in the Diagnostic and Statistical Manual of Mental Disorders, 5th edition. Demographic variables were obtained from electronic medical records.
The BDI is a 21-item self-report scale which measures the severity of depressive symptoms in emotional, cognitive, motivational, and physiological domains [11]. Each item is rated from 0 to 3, and the total score ranges from 0 to 63 points. The BDI is a valid screening tool for depression in medical inpatients [12]. The BDI has been used in patients with breast cancer [6,13].
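The scoring rule described above (21 items, each rated 0 to 3, total 0 to 63) can be sketched as a small validation-and-sum routine. This is only an illustrative sketch: the function name `bdi_total` and the constant names are hypothetical and do not come from the study.

```python
BDI_ITEMS = 21                 # number of items stated in the text
BDI_ITEM_RANGE = range(0, 4)   # each item is rated from 0 to 3


def bdi_total(item_scores):
    """Sum 21 BDI item ratings after checking their count and range.

    Hypothetical helper for illustration; returns a total between 0 and 63.
    """
    if len(item_scores) != BDI_ITEMS:
        raise ValueError(f"expected {BDI_ITEMS} items, got {len(item_scores)}")
    for position, score in enumerate(item_scores, start=1):
        if score not in BDI_ITEM_RANGE:
            raise ValueError(f"item {position} has invalid rating {score!r}")
    return sum(item_scores)
```

A respondent who rates every item 3 reaches the maximum of 63, and one who rates every item 0 scores 0, which matches the range given in the text.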
Hospital Anxiety Depression Scale
The HADS was developed by Zigmond et al. [29] to assess anxiety and depression in general hospital patients. It consists of 14 items: 7 for anxiety and 7 for depression. It excludes somatic symptoms that may overlap with physical illnesses. It has been standardized and validated in various studies [30,31]. The HADS is a standard screening tool with good psychometric properties in breast cancer patients [16-18].
Patient Health Questionnaire-9
The PHQ-9, which consists of 9 items, was developed to screen for groups at high risk of depression in the general population [19]. The Korean version of the PHQ-9 also has acceptable psychometric properties for identifying high-risk groups for depression [20]. The PHQ-9 also has good reliability and validity in patients with breast cancer [21,22].
Analysis
Sociodemographic and clinicopathological characteristics of patients with breast cancer were presented as the mean and standard deviation. To assess depression in patients with breast masses, the raw data of the self-rating assessment scales for depression were used. Each individual depression questionnaire and the combinations of these scales were used as input variables. The label for the machine learning models was depression status, classified into two categories: depression group and healthy group. To build the machine learning models, classifiers such as Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Random Forest, k-Nearest Neighbor (kNN), and Logistic Regression (LR) were used. The data were randomly assigned to a training set and a test set at a ratio of 7:3. The performance of the prediction models for detecting the depression group was evaluated using accuracy, area under the curve (AUC), sensitivity, and specificity. The metrics were calculated as means over 1,000 iterations. All analyses, including classification, were performed using MATLAB R2022 (MathWorks).
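The evaluation procedure described above (random 7:3 split, four metrics, averaging over repeated iterations) can be sketched as follows. This is a minimal illustration and not the authors' MATLAB code: a plain gradient-descent logistic regression stands in for the classifiers, the 0.5 decision threshold is an assumption, splits in which either part lacks one of the two classes are redrawn (also an assumption), and all function names are hypothetical. Features are assumed to be scaled to a comparable range before training.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model, x):
    """Predicted probability of the depression class for one feature vector."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression by batch gradient descent (illustrative stand-in)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        errors = [predict_proba((w, b), xi) - yi for xi, yi in zip(X, y)]
        w = [wj - lr * sum(e * xi[j] for e, xi in zip(errors, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errors) / n
    return w, b


def evaluate(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity, specificity, and AUC (label 1 = depression)."""
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s < threshold)
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # AUC as the probability that a random positive outranks a random negative.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / len(pos),
        "specificity": tn / len(neg),
        "auc": wins / (len(pos) * len(neg)),
    }


def repeated_holdout(X, y, n_iter=1000, train_frac=0.7, seed=0):
    """Average test-set metrics over repeated random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    totals = {"accuracy": 0.0, "sensitivity": 0.0, "specificity": 0.0, "auc": 0.0}
    done = 0
    while done < n_iter:
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train, test = idx[:cut], idx[cut:]
        # Assumption: redraw splits whose training or test part lacks a class.
        if len({y[i] for i in train}) < 2 or len({y[i] for i in test}) < 2:
            continue
        model = train_logistic([X[i] for i in train], [y[i] for i in train])
        scores = [predict_proba(model, X[i]) for i in test]
        metrics = evaluate([y[i] for i in test], scores)
        for key in totals:
            totals[key] += metrics[key]
        done += 1
    return {key: value / n_iter for key, value in totals.items()}
```

Calling `repeated_holdout(X, y)` with a list of feature vectors and 0/1 labels returns the four averaged metrics. Averaging over many random splits reduces the dependence of the reported values on any single partition of a modest sample.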
The demographic characteristics of the subjects are shown in Table 1. The mean age of the subjects was 48.7 ± 6.4. Married subjects accounted for 82.3%, while unmarried and divorced subjects accounted for 16.5% and 1.2%, respectively. Stage 0 accounted for 14.7% and stage I for 41.6% of the subjects, and stage II or higher for 43.9%. Histological findings showed that the ductal type was the most frequent at 86.9%. As for the type of surgery, partial resection was the most common at 62.1%. Adjuvant chemotherapy was received by 48.3% of the subjects, and radiotherapy by 67.6%. Anti-hormonal treatment was received by 80.4% of the patients. The mean total score was 9.7 ± 7.9 for the BDI, 10.1 ± 6.4 for the HADS, and 5.6 ± 4.4 for the PHQ-9.
The accuracy of the BDI questionnaire ranged from 0.629 to 0.713 (Table 2). The accuracy of the LR classifier was the highest. The AUC ranged from 0.660 to 0.785. The BDI showed the highest AUC when using the LR classifier. The sensitivity (0.735) was highest when using the LR classifier, and the specificity (0.761) was highest when using the kNN-10 classifier. The HADS questionnaire had an accuracy between 0.615 and 0.705, with the highest accuracy achieved by the LR classifier. The AUC was between 0.637 and 0.784, with the LDA classifier achieving the highest AUC. The sensitivity was highest (0.719) for the LR classifier, and the specificity was highest (0.767) for the kNN-10 classifier. When using the PHQ-9 questionnaire, the accuracy ranged from 0.610 to 0.680 across the machine learning classifiers. The support vector machine with a radial basis function kernel, the support vector machine with a linear kernel, and the LR classifier had the highest accuracy. The AUC ranged from 0.623 to 0.756, and the model’s accuracy was best when using the LDA classifier. The sensitivity of the model (0.688) was highest when using LR, and the specificity of the model (0.773) was highest when using the kNN-10 classifier.
The combination of BDI and HADS had an accuracy between 0.624 and 0.737, with the LR classifier achieving the highest value (Table 3). The AUC was between 0.658 and 0.812, with the LR classifier also having the highest value. The sensitivity was highest (0.756) for the LR classifier, and the specificity was highest (0.775) for the kNN-10 classifier. The combination of BDI and PHQ-9 had an accuracy between 0.607 and 0.715, with the LR classifier having the highest value. The AUC was between 0.635 and 0.782, with the LR classifier also having the highest value. The sensitivity was highest (0.730) for the LR classifier, and the specificity was highest (0.745) for the kNN-10 classifier. The combination of HADS and PHQ-9 had an accuracy between 0.602 and 0.719, with the LR classifier having the highest value. The AUC was between 0.639 and 0.794, with the LR classifier also having the highest value. The sensitivity was highest (0.728) for the LR classifier, and the specificity was highest (0.765) for the kNN-10 classifier.
The accuracy of the triple model with BDI, HADS, and PHQ-9 ranged from 0.607 to 0.734. The LR classifier showed the best accuracy. The AUC of the triple model ranged from 0.636 to 0.807. The LR classifier also showed the best accuracy. The sensitivity of the triple model (0.749) was best when using the LR classifier. The specificity of the triple model (0.737) was best when using the kNN-5 and kNN-10 classifiers.
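The seven input sets reported above (three single scales, three pairs, and one triple) can be enumerated programmatically. The sketch below assumes that a combined input is formed by concatenating the raw item responses of the chosen scales; the text states only that raw scale data and combinations of scales were used as input variables, so the concatenation step and all names here are hypothetical. The item counts (21, 14, and 9) are those given for each scale.

```python
from itertools import combinations

# Item counts per scale, as stated in the scale descriptions.
ITEM_COUNTS = {"BDI": 21, "HADS": 14, "PHQ-9": 9}


def scale_combinations(scales):
    """Return every non-empty subset of scales, ordered by subset size."""
    names = list(scales)
    return [combo
            for size in range(1, len(names) + 1)
            for combo in combinations(names, size)]


def build_features(responses, combo):
    """Concatenate one patient's raw item responses for the chosen scales.

    Assumption: combined inputs are simple concatenations of item vectors.
    """
    features = []
    for name in combo:
        items = responses[name]
        if len(items) != ITEM_COUNTS[name]:
            raise ValueError(f"{name}: expected {ITEM_COUNTS[name]} items")
        features.extend(items)
    return features
```

Under this assumption the BDI and HADS pair yields 35 input variables per patient and the triple model yields 44, so the triple model adds only the 9 PHQ-9 items to the best-performing pair.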
The findings of this study showed that most of the machine learning models using a single questionnaire, such as the BDI, HADS, or PHQ-9, had acceptable performance, as shown in Table 2. The accuracy and AUC of most classifiers exceeded 0.6 and 0.7, respectively. The BDI showed the highest accuracy (0.713) and AUC (0.785) with the LR classifier. A recent review of machine learning models for the early diagnosis of depression reported that the performance of the models varied within 60.1–100.0 for accuracy and 64.0–96.0 for the AUC, which is consistent with the results of this study [32]. Meanwhile, machine learning models for the diagnosis of depression using neuroimaging or neurophysiologic data, such as structural magnetic resonance imaging (MRI), functional MRI, diffusion MRI, and electroencephalography, showed higher performance than the models in this study [33]. However, machine learning models using these more accurate tests may not be useful for depression screening because they are less cost-effective. Given the importance of cost-effectiveness in a screening method, machine learning models using simple questionnaires with acceptable performance might be useful for the detection of depression.
Meanwhile, a strategy to further improve the performance of questionnaire-based machine learning models may be needed. This study examined the performance of machine learning models using combinations of questionnaires for the detection of depression. As shown in Table 3, the performance of combination models using multiple self-rating assessment scales for depression tended to be higher than that of each single depression scale. In particular, the combination model with BDI and HADS showed the highest accuracy (0.737) and AUC (0.812) with the LR classifier in detecting depression in breast cancer patients. The screening performance of the combination of BDI and HADS tended to be higher than that of the combination of PHQ-9 and HADS. These findings suggest that using the BDI, which asks a wider variety of questions about depression, rather than a simpler questionnaire like the PHQ-9, might increase detection performance. Additionally, unlike the BDI, the HADS asks about anxiety symptoms, which can reflect general distress due to cancer [34,35]. These characteristics of the HADS appear to improve model performance when the BDI and HADS are combined. However, the performance of the triple combination model was no better than that of the two-scale combination model. The overlapping use of the BDI and PHQ-9, both of which ask only about depressive symptoms, does not seem to enhance the model’s performance. As a result, the two-scale combination model might be more valuable and efficient than the triple combination model.
This study has several limitations. First, it used only three types of self-report questionnaires. Three major self-report assessment tools (BDI, HADS, and PHQ-9) were included, but other tools such as the Center for Epidemiological Studies-Depression Scale and the DT, which can also be used as screening tools for breast cancer patients, were not included. Future studies will need to include all useful screening tools for comparison. Second, clinical conditions such as breast cancer stage and treatment period were not considered in the machine learning models. The screening results for depression in patients with breast cancer may differ depending on clinical characteristics such as the stage of breast cancer and the treatment period. However, a machine learning model that does not distinguish the clinical characteristics of breast cancer may still be useful in the early stages of visits to a breast cancer clinic. Third, this study did not test the statistical significance of differences among the models combining scales. This was an exploratory study of how to appropriately screen for depression in breast cancer patients. Future research needs to compare the explored models for statistical significance.
In conclusion, the BDI and HADS combination model showed acceptable and efficient performance in detecting depression in breast cancer patients. The combination method using self-report questionnaires in breast cancer patients may be better than a single scale. In future studies, other strategies to improve screening performance, apart from self-report questionnaires, need to be considered. Nevertheless, machine learning models using a combination of simple and brief self-report questionnaires may help detect depression in patients who visit a breast cancer clinic.
No potential conflict of interest relevant to this article was reported.
Conceptualization: Heeseung Park, Eunsoo Moon, Taewoo Kang. Data curation: Heeseung Park, Kyoung-Eun Kim. Formal analysis: Kyungwon Kim, Hyun Ju Lim. Funding acquisition: Heeseung Park. Investigation: Heeseung Park, Eunsoo Moon, Taewoo Kang. Methodology: Heeseung Park, Kyungwon Kim, Eunsoo Moon, Hwagyu Suh, Taewoo Kang. Validation: Hwagyu Suh, Taewoo Kang. Writing—original draft: Heeseung Park, Eunsoo Moon. Writing—review & editing: Heeseung Park, Kyungwon Kim, Eunsoo Moon, Hyun Ju Lim, Hwagyu Suh, Kyoung-Eun Kim, Taewoo Kang.