1. Introduction
Breast cancer is one of the most common malignant tumors in women. In 2013, there were about 1.8 million new cases of female breast cancer worldwide [1,2]. In China, the incidence of breast cancer in 2013 was 42.02 per 100,000, the highest of any cancer in women, and the mortality rate was 9.74 per 100,000, ranking fifth among causes of cancer death in women [3,4,5]. Breast cancer seriously endangers women's lives and health and imposes a heavy economic burden on families and society [6].
At present, traditional statistical methods for analyzing the medical expenses of breast cancer and their influencing factors include multiple linear regression and logistic regression [7,8,9,10]. The back propagation (BP) neural network is an artificial neural network model trained by error back propagation [11]. Unlike traditional statistical methods, it places no special requirements on the type or distribution of the data and has a degree of fault tolerance, which gives it an advantage for such data [12,13,14,15]. In recent years, BP neural networks and other data mining methods have also been applied to the medical expenses of patients with gastric, lung, liver, and gynecological cancers [16,17,18].
This study examined the medical expenses of female breast cancer patients in rural areas of Anhui Province. A BP neural network model was constructed and its results were compared with those of a multiple linear regression model. The aim was to assess how effectively the two models identify the factors influencing the medical expenses of breast cancer patients and how well they predict those expenses, so as to provide a scientific basis for the reasonable control of breast cancer medical expenses.
2. Materials and methods
2.1. General data
Based on the cancer registration reports of Anhui Province, the medical records of all female breast cancer patients diagnosed from 2017 to 2019 and coded as C50 under ICD-10 were collected from four county people's hospitals in Anhui Province (Feidong, Feixi, Changfeng, and Lujiang). The hospital records included hospitalization number, medical insurance number, medical payment method, surgery, chemotherapy, and radiotherapy. Medical insurance number, name, date of birth, disease diagnosis, total cost of hospitalization, hospital, and other information were collected from the social security department. A total of 846 samples were collected, and all research variables were assigned values (Table 1). After samples with missing or illogical values were excluded, 795 cases were included in the study, an effective rate of 93.97%.
2.2. Back propagation neural network
BP neural network is short for error back propagation neural network: a multilayer feedforward network trained by the error back propagation algorithm, and one of the most widely used neural network models at present. It consists of an input layer, one or more hidden layers, and an output layer, each made up of a certain number of neurons (Figure 1). First, the number of neurons in each layer is determined. The numbers of neurons in the input and output layers are set by the independent and dependent variables of the study. There is no unified method for calculating the number of neurons in the hidden layer. In this study, it was calculated with the formula

$$M = \sqrt{n + m} + a$$

where M is the number of neurons in the hidden layer, n is the number of neurons in the input layer, m is the number of neurons in the output layer, and a is any value from 1 to 10. Second, the network is trained: the training set is run through SPSS Clementine 12.0 to obtain different BP neural network models. Finally, the network is tested: the test set is substituted into each model, and the model with the highest accuracy is taken as the optimal model.
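As an illustration of the hidden-layer sizing step, the following minimal sketch lists the candidate hidden-layer sizes for a network with 9 inputs and 1 output. It assumes the commonly used empirical rule M = sqrt(n + m) + a with a taken as each integer from 1 to 10, and it rounds the result to the nearest integer; the rounding and the function name `hidden_layer_candidates` are assumptions made here for illustration, not details reported in the study.

```python
import math


def hidden_layer_candidates(n_inputs, n_outputs, a_values=range(1, 11)):
    """Candidate hidden-layer sizes from M = sqrt(n + m) + a.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    base = math.sqrt(n_inputs + n_outputs)
    return sorted({round(base + a) for a in a_values})


# 9 input neurons and 1 output neuron, as in the network described in this study
candidates = hidden_layer_candidates(9, 1)
print(candidates)
```

Each candidate size would then be trained and compared on the test set, which is how a single hidden-layer size can be selected from the range.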
2.3. Multiple linear regression
Multiple linear regression is a regression analysis conditioned on the given values of several explanatory variables. It is used to study the linear relationship between one dependent variable and multiple independent variables [19]. The general form of the multiple linear regression model is:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_I X_I + e$$

where β0 is the constant term, I is the number of independent variables, βi (i = 1, 2, …, I) are the partial regression coefficients, and e is the random error. A partial regression coefficient gives the average change in the dependent variable Y when the corresponding independent variable changes by one unit while the other independent variables are held constant. After the regression parameters are estimated, statistical tests are needed to establish the reliability of the model, including the goodness-of-fit test (coefficient of determination), the significance test of the overall linear equation (F test), and the significance tests of the individual variables (t tests) [20].
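To make the estimation of the partial regression coefficients concrete, the sketch below fits a multiple linear regression by ordinary least squares using the normal equations. It is a toy example under stated assumptions: the data are synthetic and noise-free, and the helper names `solve_linear_system` and `fit_ols` are hypothetical. The study itself fitted its model in SPSS Clementine 12.0.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b]
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(xs, ys):
    """Return [b0, b1, ..., bI] that minimize the sum of squared errors."""
    design = [[1.0] + list(row) for row in xs]  # leading 1 for the constant term
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic data generated from Y = 2.0 + 0.5*X1 + 1.5*X2 (no random error)
xs = [(1, 0), (2, 1), (3, 0), (4, 1), (5, 0), (6, 1)]
ys = [2.0 + 0.5 * x1 + 1.5 * x2 for x1, x2 in xs]
coefs = fit_ols(xs, ys)
print([round(c, 6) for c in coefs])
```

Because the toy data contain no random error, the fitted coefficients recover the generating values; with real data, the F test and t tests described above would be needed to judge the fit.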
2.4. Model evaluation method
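The performance of the two models was evaluated with the following indicators. R2 is the coefficient of determination, the proportion of the variation in the dependent variable that is explained by the independent variables. It ranges from 0 to 1, and the closer it is to 1, the better the model fits the sample [21]. The mean absolute error (MAE) reflects the actual size of the model's prediction error, and the root mean square error (RMSE) is the arithmetic square root of the mean square error (MSE). The indicators are calculated as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$$

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$$

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$

$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$

where y_i is the observed value, ŷ_i is the predicted value, ȳ is the mean of the observed values, and N is the number of samples.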
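The four indicators can be computed directly from observed and predicted values using their standard definitions. The following sketch does so on a small made-up example; the function name `regression_metrics` and the numbers are illustrative assumptions and are not taken from the study's data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MSE and RMSE for paired observed and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    residuals = [yt - yp for yt, yp in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)    # total sum of squares
    mse = ss_res / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(r) for r in residuals) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
    }


# Made-up observed and predicted values for illustration only
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(metrics)
```

A higher R2 together with lower MAE, MSE, and RMSE indicates the better-fitting model, which is the comparison rule applied to the two models in this study.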
The predicted values for all patients were tested for normality. The predictions of both the back propagation neural network model and the multiple linear regression model were consistent with a normal distribution. After the test for homogeneity of variance was passed, the difference between the two models was compared by t test.
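The text does not state which form of the t test was used. Since it follows a homogeneity-of-variance check, one plausible reading is a two-sample t test with pooled variance, and the sketch below computes that statistic. This choice, the function name `pooled_t_statistic`, and the sample values are all assumptions made for illustration.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees of freedom) for a two-sample t test with pooled variance."""
    na, nb = len(sample_a), len(sample_b)
    df = na + nb - 2
    # variance() is the sample variance (n - 1 in the denominator)
    pooled_var = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / df
    standard_error = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Made-up values standing in for the predictions of the two models
t_value, df = pooled_t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t_value, df)
```

The resulting statistic would be compared with the t distribution on the stated degrees of freedom to obtain a P value.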
3. Results
3.1. Basic information of patients
The median medical expenses of breast cancer patients were (24576 ± 4792) RMB, and the length of hospital stay was (31.4 ± 6.7) d. The distribution of the other variables is shown in Table 2.
3.2. Modeling results
The SPSS Clementine 12.0 data mining platform was used to construct the BP neural network model and the multiple linear regression model for the medical expenses of female breast cancer patients. The BP neural network model took nine indicators, including region, year of diagnosis, and medical payment method, as inputs and the logarithm of medical expenses as the output. Samples were divided by random sampling, with 70% used as the training set and 30% as the test set. After repeated verification, a three-layer BP neural network model was constructed with 9 neurons in the input layer, 10 neurons in the hidden layer, and 1 neuron in the output layer; its accuracy was 95.97%. The model results are shown in Table 3.
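The data preparation described here, a log transform of expenses followed by a random 70/30 split, can be sketched as follows. This is an illustration only: the records are synthetic, the natural logarithm is an assumption (the base of the logarithm is not stated), and the exact set sizes produced by truncating 70% of 795 are a consequence of this sketch rather than figures reported in the study.

```python
import math
import random


def split_train_test(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and split it into training and test sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Synthetic records: (patient index, medical expenses in RMB); values are made up
records = [(i, 20000.0 + 10.0 * i) for i in range(795)]

# Natural log of expenses is used as the model output (the base is an assumption)
log_records = [(i, math.log(cost)) for i, cost in records]

train, test = split_train_test(log_records)
print(len(train), len(test))
```

Modeling the logarithm of expenses is a common way to reduce the right skew typical of cost data before fitting either model.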
In the multiple linear regression model, nine indicators, including region, year of diagnosis, and medical payment method, were taken as independent variables, and the logarithm of medical expenses was taken as the dependent variable. The multiple correlation coefficient was 0.841, indicating that the model fitted well. The F test of the overall linear relationship was significant (P < 0.001), indicating a significant linear relationship between the independent variables and the dependent variable. The model results are shown in Table 4.
3.3. Comparative analysis of influencing factors of medical expenses of breast cancer patients
The BP neural network model reports the sensitivity of each variable, that is, the influence of a change in each variable on medical expenses. The results of the two models are compared in Table 5. The two most influential factors for the medical expenses of breast cancer patients were length of stay and region. Radiotherapy, surgery, age, and chemotherapy also had a relatively large impact, whereas medical payment method, year of diagnosis, and clinical stage had a smaller impact.
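One simple way to express "the influence of each variable change" on a model's output is one-at-a-time perturbation: shift a single input, hold the others fixed, and average the change in the prediction. The sketch below illustrates that idea on a toy model. It is not the sensitivity algorithm implemented in SPSS Clementine, and the names `one_at_a_time_sensitivity` and `toy_model`, along with the coefficients and data, are hypothetical.

```python
def one_at_a_time_sensitivity(predict, rows, delta=1.0):
    """Mean absolute change in the prediction when each input is shifted by delta."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        total = 0.0
        for row in rows:
            shifted = list(row)
            shifted[j] += delta  # perturb only feature j
            total += abs(predict(shifted) - predict(row))
        scores.append(total / len(rows))
    return scores


def toy_model(x):
    """Hypothetical fitted model: feature 0 matters more than feature 1."""
    return 0.8 * x[0] + 0.1 * x[1]


rows = [[1.0, 2.0], [3.0, 0.5], [2.0, 1.0]]
scores = one_at_a_time_sensitivity(toy_model, rows)

# Rank features from most to least influential
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(scores, ranking)
```

Ranking the variables by such scores gives an ordering that can be set beside the regression coefficients, in the spirit of the comparison in Table 5.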
3.4. Comparison between back propagation neural network model and multiple linear regression model
All samples were substituted into the fitted BP neural network model and multiple linear regression model to evaluate the performance of the two models. The results are shown in Table 6. The coefficient of determination (R2) of the BP neural network model was larger than that of the multiple linear regression model, so the BP neural network model fitted the data better. The MAE, MSE, and RMSE of the BP neural network model were lower than those of the multiple linear regression model, so its predictive ability was also better.
4. Discussion
In this study, the median medical expenses and length of stay of breast cancer patients were (24576 ± 4792) RMB and (31.4 ± 6.7) d, respectively. The high cost of hospitalization can not only delay treatment but also restrict the choice of treatment [22,23]. In addition, breast cancer imposes a heavy disease burden on patients. It is therefore suggested that cancer screening for women be strengthened to achieve early detection, early diagnosis, and early treatment, thereby reducing the economic and disease burden on individuals, families, and society.
The results show that both the BP neural network model and the multiple linear regression model fit the data well. Comparing the influencing factors identified by the two models shows that the importance of length of stay and region was consistent between the two models; that of surgery, age at diagnosis, chemotherapy, medical payment method, year of diagnosis, and clinical stage was broadly consistent; and that of radiotherapy was inconsistent. Length of stay had the greatest impact on the cost of hospitalization, which is consistent with existing research. Therefore, provided that the level of medical service is maintained, shortening the length of stay is an effective way to reduce the medical expenses of breast cancer patients. Medical expenses for breast cancer also differed greatly among regions, mainly because of differences in economic development and medical technology. It is suggested that medical service providers standardize clinical pathways and provide efficient and affordable treatment for patients. Medical insurance is the main means of reducing the economic burden on rural patients, yet in this study the medical payment method had little impact on medical expenses. It is suggested that medical insurance policy makers strengthen compensation for rural cancer patients to reduce their economic risk.
In recent years, the BP neural network model has been widely used in medicine with good results [24,25]. Although it has some disadvantages, such as overtraining, slow convergence, and a tendency to fall into local optima, it places no requirements on data type or distribution, has a degree of fault tolerance, and can correct errors repeatedly during self-learning [26,27]. These properties are a considerable advantage when dealing with complex and diverse medical data. In this study, the coefficient of determination of the BP neural network model was greater than that of the multiple linear regression model, and its MAE, RMSE, and MSE were lower, so its predictive ability was higher than that of the multiple linear regression model.
5. Conclusions
Compared with the multiple linear regression model, the BP neural network model is more suitable for analyzing the medical expenses of patients with breast cancer. However, neither model is inherently better or worse; each is simply suited to different conditions.
Acknowledgements
The authors would like to acknowledge all the breast cancer patients for their participation and support.
Conflict of interest
The authors declare that there is no conflict of interest.