Loading [MathJax]/jax/output/SVG/jax.js
Research article

Determinants of tuberculosis treatment support costs to the treatment supporters in rural Ghana

  • Received: 01 November 2022 Revised: 30 January 2023 Accepted: 08 February 2023 Published: 21 February 2023
  • Background 

    The Ghana Health Service has been implementing the Directly Observed Therapy Short Course (DOTS) strategy for decades now, to cure and reduce the transmission of tuberculosis. DOTS strategy requires TB patients and their treatment supporters to make multiple clinic visits in the course of treatment, and this may place financial burden on treatment supporters with low socio-economic status. However, the determinants of tuberculosis treatment support costs to treatment supporters are unknown in Ghana.

    Objectives 

    This study determined the costs associated with treatment support to the treatment supporters in Bono Region, Ghana.

    Methods 

    In a cross-sectional study using cost-of-illness approach, 385 treatment supporters were selected and interviewed. A validated questionnaire for the direct and indirect costs incurred was used. Descriptive statistics and bivariate techniques were used for data analysis.

    Results 

    Averagely, each treatment supporter spent GHS 122.4 (US$ 21.1) on treatment support activities per month, which is about 19% of their monthly income. The findings also revealed that highest level of education, household size, monthly income and district of residence were significant predictors of the direct costs. On the other hand, gender of the respondents, highest level of education, ethnicity, household size, income level and relationship with patient were some of the factors that significantly influenced the indirect costs. The significance levels were set at a 95% confidence interval and p < 0.05.

    Conclusion 

    The study concludes that the estimated cost associated with assisting tuberculosis patients with treatment is significant to treatment supporters. If these costs are not mitigated, they have the tendency of affecting the socio-economic status and welfare of individuals assisting tuberculosis patients with treatment.

    Citation: Robert Bagngmen Bio, Patricia Akweongo, Augustine Adomah-Afari, Augustina Koduah. Determinants of tuberculosis treatment support costs to the treatment supporters in rural Ghana[J]. AIMS Public Health, 2023, 10(1): 78-93. doi: 10.3934/publichealth.2023007

    Related Papers:

    [1] Tingxi Wen, Hanxiao Wu, Yu Du, Chuanbo Huang . Faster R-CNN with improved anchor box for cell recognition. Mathematical Biosciences and Engineering, 2020, 17(6): 7772-7786. doi: 10.3934/mbe.2020395
    [2] Wei-Min Zheng, Qing-Wei Chai, Jie Zhang, Xingsi Xue . Ternary compound ontology matching for cognitive green computing. Mathematical Biosciences and Engineering, 2021, 18(4): 4860-4870. doi: 10.3934/mbe.2021247
    [3] Carlos Castillo-Chavez, Bingtuan Li, Haiyan Wang . Some recent developments on linear determinacy. Mathematical Biosciences and Engineering, 2013, 10(5&6): 1419-1436. doi: 10.3934/mbe.2013.10.1419
    [4] Sanket Desai, Nasser R Sabar, Rabei Alhadad, Abdun Mahmood, Naveen Chilamkurti . Mitigating consumer privacy breach in smart grid using obfuscation-based generative adversarial network. Mathematical Biosciences and Engineering, 2022, 19(4): 3350-3368. doi: 10.3934/mbe.2022155
    [5] Hasan Zulfiqar, Rida Sarwar Khan, Farwa Hassan, Kyle Hippe, Cassandra Hunt, Hui Ding, Xiao-Ming Song, Renzhi Cao . Computational identification of N4-methylcytosine sites in the mouse genome with machine-learning method. Mathematical Biosciences and Engineering, 2021, 18(4): 3348-3363. doi: 10.3934/mbe.2021167
    [6] Bochuan Du, Pu Tian . Factorization in molecular modeling and belief propagation algorithms. Mathematical Biosciences and Engineering, 2023, 20(12): 21147-21162. doi: 10.3934/mbe.2023935
    [7] Liang Yu, Zhengkuan Zhang, Yangbing Lai, Yang Zhao, Fu Mo . Edge computing-based intelligent monitoring system for manhole cover. Mathematical Biosciences and Engineering, 2023, 20(10): 18792-18819. doi: 10.3934/mbe.2023833
    [8] Ruiping Yuan, Jiangtao Dou, Juntao Li, Wei Wang, Yingfan Jiang . Multi-robot task allocation in e-commerce RMFS based on deep reinforcement learning. Mathematical Biosciences and Engineering, 2023, 20(2): 1903-1918. doi: 10.3934/mbe.2023087
    [9] Michele La Rocca, Cira Perna . Nonlinear autoregressive sieve bootstrap based on extreme learning machines. Mathematical Biosciences and Engineering, 2020, 17(1): 636-653. doi: 10.3934/mbe.2020033
    [10] Qi Cui, Ruohan Meng, Zhili Zhou, Xingming Sun, Kaiwen Zhu . An anti-forensic scheme on computer graphic images and natural images using generative adversarial networks. Mathematical Biosciences and Engineering, 2019, 16(5): 4923-4935. doi: 10.3934/mbe.2019248
  • Background 

    The Ghana Health Service has been implementing the Directly Observed Therapy Short Course (DOTS) strategy for decades now, to cure and reduce the transmission of tuberculosis. DOTS strategy requires TB patients and their treatment supporters to make multiple clinic visits in the course of treatment, and this may place financial burden on treatment supporters with low socio-economic status. However, the determinants of tuberculosis treatment support costs to treatment supporters are unknown in Ghana.

    Objectives 

    This study determined the costs associated with treatment support to the treatment supporters in Bono Region, Ghana.

    Methods 

    In a cross-sectional study using cost-of-illness approach, 385 treatment supporters were selected and interviewed. A validated questionnaire for the direct and indirect costs incurred was used. Descriptive statistics and bivariate techniques were used for data analysis.

    Results 

    Averagely, each treatment supporter spent GHS 122.4 (US$ 21.1) on treatment support activities per month, which is about 19% of their monthly income. The findings also revealed that highest level of education, household size, monthly income and district of residence were significant predictors of the direct costs. On the other hand, gender of the respondents, highest level of education, ethnicity, household size, income level and relationship with patient were some of the factors that significantly influenced the indirect costs. The significance levels were set at a 95% confidence interval and p < 0.05.

    Conclusion 

    The study concludes that the estimated cost associated with assisting tuberculosis patients with treatment is significant to treatment supporters. If these costs are not mitigated, they have the tendency of affecting the socio-economic status and welfare of individuals assisting tuberculosis patients with treatment.



    With the development of society and the improvement of computer storage and data processing ability, the explosive growth of data has become an important feature. Faced with the growing scale of massive data, the storage and management of massive data has been a concern. In order to improve the reliability and access efficiency of HDFS, replica technology is commonly used to copy multiple data files and store them on multiple nodes of the distributed file system. For files of different popularity in distributed system, how to dynamically adjust the number of copies so it can not only ensure the reliability and availability of data, but also improve the data access efficiency? The system should be able to adjust the number of copies when the predicted access popularity value is changed, with the number of copies increased for high popularity files and the number of copies reduced for low popularity files, so as to improve the access efficiency and save storage resources. The solution of the above problems is of great significance in improving performance of the cluster.

    File popularity is used to describe the degree to which files are required by users in a time period, and it is defined by the frequency of file access in a certain period of time [1], i.e., the times the file is accessed by the user in unit time. Domestic and foreign scholars mainly use neural network models [2,3], grey models [4], deep learning models and so on in file popularity prediction. In References [5,6], aiming at the shortcomings of static copy strategy, a prediction model based on grey model is proposed to predict the future access popularity of data according to the latest access characteristics of files. The frequency of file access is regarded as the popularity of file access. According to temporal locality, the recently accessed files will be accessed again in a short time, and the algorithm calculated file's accessing temperature by analyzing file's accessing frequency within the specified time, and improved the performance of data service [7,8]. Literature [9,10] used combination forecasting model to deal with the shortcomings of single forecasting model. The neural network model in Reference [11] needs to set up accurate levels and a large number of sample space for training, which will affect the time of file popularity prediction. A method was proposed based on LSTM deep learning model to predict the popularity of files [12], which divides the dynamic time window to construct a time series of file access features, and predicts the access trends of different data. With a subjective and objective popularity calculation method, Reference [13] dynamically predicted the file access volume through LSTM neural network model, calculated the file popularity through the predicted file access volume, and adjusted the copy storage strategy.

    In the aspect of replica dynamic management, Veeravalli et al. [14] proposed a CDRM dynamic copy management model, which adjusts the number and storage location of data copies through dynamic changes in node storage space capacity and load conditions. Aiming at the low efficiency of randomly selecting the source node and target node for replication in the default copy creation process of HDFS in Hadoop, Higai et al. [15] proposed a copy management strategy based on a single ring structure. The copy management strategy of this method is 25% more efficient than the default method.

    In the current methods of file popularity prediction, the BP neural network prediction tends to get stuck on local optimum, and the accuracy of grey prediction model is not enough. The time feedback model and the prediction method with decay factor are suitable for short-term prediction, while the prediction accuracy is not high. The deep learning LSTM requires a long training time. In the light of the above problems, aiming at the static copy management strategy and the default copy placement strategy, the paper created the unbiased grey model to predict the future access popularity of the file, and proposed an optimization algorithm based on the cuckoo search which can optimize the Markov model to correct the grey prediction value, then the number of copies is adjusted by the popularity prediction value. The main innovations of this paper are listed as follows:

    (1) This paper builds a prediction model of file popularity based on the unbiased grey model. This model has the sequence and uncertain character, and has high prediction accuracy.

    (2) This paper presents an optimized algorithm. The Markov model is optimized by cuckoo search algorithm to correct the predicted values and improve the accuracy of prediction.

    (3) This paper presents a new method to compute the number of copies. This method dynamically adjusts the number of copies and improves the access efficiency.

    The remainder of this paper is organized as follows. Section 2 introduces file access popularity prediction method. In Section 3, we introduce how to calculate the number of copies. Section 4 is about the determination of the number of file copies. Section 5 reports experimental results, and conclusions of this paper are made in the last section.

    In order to deal with the problem of low access efficiency caused by the sudden increase of hot file access, the method of predicting the popularity of the file at the next moment in advance is adopted to adjust the number of copies in time to improve user access efficiency [16,17]. In the past research, a single method model generally has limitations and doesn't work well in addressing certain problems. Many researchers use the combined model to cope with the limitations of the single model and achieve better results [18]. In our research, the grey model is adopted which can predict the data development trend through a small amount of simple calculation and incomplete information, and predicts the file popularity at the next moment through the latest access characteristics of the file [19]. In order to further improve the accuracy of the prediction data, the cuckoo search algorithm is used to optimize the Markov model and modify the prediction data to improve the prediction accuracy.

    The grey model is suitable for the prediction model of the system with uncertain factors. For the grey system with known and unknown information, the grey differential prediction model is established by using a small amount of incomplete information to predict the development and change of the system behavior eigenvalue and the grey process related to time series which changes in a certain range. Grey process finds the law of system change through correlation analysis, establishes the corresponding differential equation model, and predicts the future development trend. The grey prediction model uses equal time interval to obtain a series of quantitative values reflecting the characteristics of the prediction object to construct the grey model prediction data set. This paper takes the file access popularity as the original data, and predicts the future development trend of the file popularity through the grey model processing.

    Markov model is a stochastic process of things discovered by Russian mathematicians. The change process of this kind of things is only related to the recent state. Markov model can predict the future development trend according to the probability of transition between different states. In the process of state transition, the state after the nth transition only depends on the previous result state, which has nothing to do with the original state of the system and the Markov process before the transition. In this paper, unbiased grey model is used to predict the file popularity. Because the randomness of user access and the sudden change of file popularity are two inevitable factors, the grey prediction model may produce large errors. Markov model is used to predict the state of file popularity in the future, and the error of file popularity prediction value is corrected.

    Cuckoo search (CS) algorithm [20,21] is a natural heuristic algorithm proposed by Yang and Deb. It has the advantages of fast convergence, high efficiency and few adjustment parameters. This method is based on the cuckoo's parasitic brooding behavior and Lévy flight, which is more effective than genetic algorithm and particle swarm optimization. Lévy flight can expand the search range, increase the diversity of the population, and it is easier to jump out of the local point. The algorithm regards all host nests as a generation, and each nest carries bird eggs as a solution. The goal is to find the potential optimal solution through continuous iteration to replace the existing scheme.

    This paper uses the cuckoo search algorithm for multi-objective optimization, based on the following three criteria:

    (1) Each cuckoo lays K eggs at a time (representing the solution of K goals) and puts them in a randomly selected host nest;

    (2) The best nest with high quality eggs (solutions) in each generation is brought to the next generation;

    (3) The number of available host nests is fixed, and the probability of finding cuckoo eggs is pa(0,1). After discovery, the host can destroy the eggs or give up the old nest to build a new one.

    Each egg in the nest represents a solution, and each cuckoo can lay only one egg. The purpose is to use a new or better solution to replace the bad one in the nest. The CS search algorithm is used to explore the relationship between the Markov state interval, the actual value and the predicted value to find the most suitable value of the interval and to modify the predicted value. The problem is a multi-objective optimization problem, and the optimal solution is used to divide the Markov model state interval.

    The grey Markov model is widely used in the prediction of small sample and exponential distribution sample data. In a period of time, the access frequency of popularity files is like this. Therefore, the paper establishes the file popularity prediction model from grey model parameters, Markov correction value and prediction sequence update, and finally optimizes the model.

    The unbiased grey model is used to process the file access sequence and predict the file popularity in the future. With the help of grey development coefficient α and grey action quantity U of grey model, the original data series fitting model of unbiased grey prediction model is determined, and the future popularity of documents is predicted. The original data sequence of unbiased grey model is the access of a file in the first m equal time periods.

    Set the access volume of file I in the kth time period, and establish the original access volume sequence according to the access volume of file I in the first m equal time periods at the current time, the original sequence data is represented by the upper corner mark "0".

    f(0)i={f(0)i(1),f(0)i(2),f(0)i(3),,f(0)i(m)}

    (1) Original sequence accumulation operation. Add the first and second data of the original sequence to get the second data of the accumulated sequence, add the third data and the second data to get the third data and so on. The calculation function is shown in (1).

    f(1)i(k)=km=1f(1)i(m),(k=1,2,3,,m) (1)

    (2) Smoothness test and quasi exponential test. Before establishing the model, we need to test the smoothness of the original sequence and the series test of the accumulated sequence. If it meets the requirements, we can further establish the model.

    For smoothness test, time-series smoothness ratio is defined as follows:

    ρ(k)=f(0)(k)f(1)(k1),k=3,4,,m (2)

    If f(1)(k)meetsρ(k+1)ρ(k)<1, ρ(k) ∈[0, ϵ], ϵ<0.5,k=2,3,,m, f(1)(k) is smooth data sequence.

    For the quasi exponential test, the class ratio is defined as follows:

    σ(k)=f(0)(k)f(0)(k1),k=3,4,,m (3)

    If a sequence ratio satisfies σ(k)[1,1.5],k=3,4,,m, then the sequence f(1)(k) is regarded as exponential distribution.

    (3) The trend of accumulation sequence can be described by the whitening differential equation shown in Eq (4).

    df(1)i(m)dt+af(1)i(m)=u (4)

    Where parameters a and u are identified, and a is called development coefficient, u is the amount of grey action.

    (4) Model solving. Parameters a and u are computed by the least square method. Let Fm=[f(0)i(2),f(0)i(3),f(0)i(4),,f(0)i(m)]T, ˆα=(au), ˆα is estimator of parameter vector.

    B=[12[f(1)i(1)+f(1)i(2)]112[f(1)i(2)+f(1)i(3)]112[f(1)i(n1)+f(1)i(n)]1] (5)

    Substituting ˆα into the Eq (6), compute the value Fm[A1].

    Fm=Bˆα (6)

    (5) The whitening differential equation is solved, and Eq (4) is solved to obtain the unbiased grey prediction discrete-time response function, which is shown in formula (7).

    ˆf(1)i(k+1)=[f(0)i(1)ua]eat+uak=0,1,2,,m1 (7)

    (6) The parameters b and A are calculated according to the estimated values a and u in grey model, and eliminate the inherent deviation. The calculation method is shown in Eq (8).

    b=ln2a2+a,A=2u2+a (8)

    (7) The unbiased grey prediction model of original data is shown in Eq (9).

    ˆf(0)1=f(0)1,ˆf(0)k+1=Aebk,k=1,2,3,,m (9)

    By updating the parameters, the unbiased grey prediction model does not need to reduce and restore the prediction data, which simplifies the modeling steps and improves the operation efficiency of the system.

    In the Markov model, the error can be corrected by selecting the center value of the state interval where the grey prediction value is located. However, only considering the center value of the state interval may ignore the influence of other error distribution information, and the center value of the state interval cannot properly represent the error state. Therefore, with the help of cuckoo search algorithm, the grey prediction value of Markov model can be modified more accurately.

    The partition of state interval is determined by relative precision, which is the difference between actual value and predicted value. Each data is distinguished according to its own relative precision. To adapt to the prediction of different data series, the boundary value of the interval between the first and the last states is selected as the maximum value. The target interval is divided equally into 3–5.

    Suppose [ai,bi] is the ith state interval, the objective function makes the error of prediction correction vi as small as possible, and it is defined as follows (10).

    vi=αiai+(1αi)bi,i=1,2,,n (10)

    The better parameter αi of each interval is found by Cuckoo search algorithm. When α = 0.5, the correction value is the center value of the state interval.

    The specific process of cuckoo search algorithm to optimize Grey-Markov model is as follows.

    (1) Location update

    In CS algorithm, cuckoos find the nest and lay eggs by the Lévy flight search. The invincible walk is a walk between long and short steps. Therefore, the nest updating position is calculated by Lévy flight as follows:

    xt+1i=xti+βLévy(λ,s) (11)
    s=u|v|1λ
    Lévy(λ,s)=λΓ(λ)sin(πλ2)πs1+λ,λ(1,2)

    Where t is the number of iterations and xti is the position of the ith point at time t and β is the step, uN(0,σ2u),vN(0,σ2v),σ2u=(Γ(1+λ)sin(πλ2)Γ[2(λ1)2λ(1+λ)2]),σv=1,Γ(z)=0tz1etdt, for |s| ≥ |s0|, s submits to Lévy distribution, s0 is the min step, s00, but in fact, if s0 is reasonable, it is feasible, for example s0(0.1,1).

    The host bird rebuilds the nest after discovering the alien bird with a certain probability pa, because the choice of direction is random, the probability of each direction is the same, obeying the uniform distribution. A relatively small value is selected by a large probability in the Lévy distribution, so this location can be approximately satisfied with the Mantegna algorithm. Therefore, the position of the new bird's nest is defined as follows:

    xt+1i=xti+αsH(paϵ)(xtjxtk) (12)

    Among them, t is the number of iterations and xti is the ith position at time t, α is the step, s, ϵ are random numbers that obey even distribution, H is the Heaviside function (13) derived from even distribution, ⨂ is the point multiplication, xtj and xtk is the location of two bird nests randomly selected at t time.

    Heaviside(x)={1x00x<0 (13)

    (2) Definition of Cuckoo search objective function

    The objective function is represented by the mean absolute value of relative percentage error (MAPE). The state interval is divided according to the residual between the grey prediction value and the actual value. Each generation is determined by the minimum MAPE value; the prediction value is modified according to the residual correction after each iteration. The objective function is defined as follows (14).

    MAPE=1nni=1(|f(0)i(k)ˆF(0)i(k)|f(0)i(k)) (14)

    Where ˆF(0)i(k) is the prediction correction value.

    (3) Calculation of correction value

    The correction value of each state interval is to calculate by the optimal decision interval coefficient α=(α1,α2,,αn) in the first step. According to formula (15), the correction values v1,v2,,vn are obtained.

    {v1=α1a1+(1α1)b1v2=α2a2+(1α2)b2v3=α3a3+(1α3)b3 (15)

    (4) Determination of predicted value

    Through the k-step state transition probability matrix and the state Ei of the current data and its initial vector X(0), the state of the predicted value at the next moment can be predicted. Assuming max(Pij)=Pik, the predicted data is most likely to change from state Ei to Ek at the next moment.

    Each state interval is divided according to the relative accuracy, and the unbiased grey forecast is corrected according to the relative accuracy to reduce the error and increase the accuracy of the forecast. The formula is as follows (16).

    ˆF(0)(m+1)=ˆf(0)(m+1)+vi (16)

    Among them, vi is the correction value of state i interval, ˆF(0)(m+1) represents the correction value of the final predicted value, which is the final predicted value of file access popularity.

    Cuckoo search algorithm based on Lévy flight

    Input: Number of host nests n; number of iterations m; probability of detection pa
    Output: Optimal solution Xi
    1.begin
    2.   Initial population n host nests Xi (i=1,2,,n);
    3.   Calculate fitness Fi( i=1, 2, …, n);
    4.   While (not true)
    5.      Using Lévy flight to generate a new solution Xi;
    6.      Calculate the fitness value Fi of the new solution Xi;
    7.      Select candidate solution Xj,
    8.         if(Fi > Fj)
    9.            Fj replaces Fi;
    10.      end
    11.      Discard the bad solution according to the detection probability pa;
    12.      Generate new nest (solution) by local random search;
    13.      Preserve the best solution (Nest of high quality solutions);
    14.      Find the current best Xi;
    15.   end While
    16. end

    This paper predicts the data file access popularity based on the grey prediction model, and the data file is a small sample. Data access has the time locality characteristic, and the current frequently accessed files will be accessed again in a certain period of time in the future, so the older data can't reflect the current trend, and the data needs to be updated.

    Files in the distributed system are being accessed all the time, and the latest file access data will be generated at any time. In order to be able to use historical popularity to predict future file popularity more accurately, we deal with the original data sequence by metabolism according to the idea of high weight of new data in Reference [17]. When updating the data sequence, add the latest file popularity to the original sequence and eliminate the old popularity data. Repeat the above operations according to the file statistical cycle to update the file popularity forecast sequence. It can better reflect the recent trend of the file's popularity, and more accurately predict the file's future popularity.

    In general, the number of replicas is 3 in HDFS replica management policy, which is inefficient when a large number of files are accessed. Dynamic adjustment of replica number based on file popularity and file system availability can provide reliable and fast data access environment for users and improve efficiency in distributed file system.

    In the static replica management strategy, the replica number is 3 and it is fixed value, which will not change in the whole life cycle from replica creation to destruction. However, in fact, the fixed number of copies does not meet the requirement, and it should be dynamically adjusted according to the user's access characteristics and demand changes [19]. Appropriate number of copies can not only improve the readability and availability of data, but also improve the task execution ability of MapReduce. With the dynamic change of file popularity, the copy number of high popularity files should be increased dynamically to improve the transaction processing ability of the system under concurrent access. When the file popularity is reduced, the copy number should be reduced to save storage space and maintenance cost.

    The replica adjustment strategy includes replication, keeping and deletion of copies. Replication is to increase the number of copies and copy the copies of files to new nodes, in order to respond to a large number of access requests of files in time; copy keeping means that the number of copies of the document remains unchanged; copy deletion is to delete some copies of files on the premise of ensuring storage reliability when the popularity of file access decreases [22].

    Because the current prediction number of replicas is lagged, we should adjust the number of replicas with the prediction of the number at the next moment, so that we can solve the cyber source competition and congestion caused by the sudden access, and reduce the task execution delay caused by the request source competition. The strategy predicts the file access heat of the next time according to the previous n access heat values in the distributed file system before the current time, so as to adjust the number of copies.

    The number of copies is determined by the relationship between the predicted access heat of files and the average access heat of all files in the system. In order to ensure the data reliability of cloud storage system, each file has the minimum number of copies C, C = 2. The number of replicas for each file is defined by the formula (17).

    Nadjust(i)=C×H(i)HavgNexist(i) (17)

    Where, Nadjust(i) is the number of copies of file i, H(i) is the file popularity of file i predicted by the prediction model at the next moment, Havg is the average access popularity of all files in the distributed file system, Nexist(i) is the number of existing copies of file i.

    Most researchers consider the file popularity rather than the availability of cloud computing system in the fixed number of copies in the static copy strategy. Through the quantitative analysis of system availability parameters and the availability of data blocks, the relationship function between the availability of file system and the number of copies is obtained.

    (1) In the distributed file system, suppose that the file F is divided into n data blocks {b1,b2,,bn}, which are stored on different nodes. Assuming that the data block bk has rj copies, and the rj copies are placed on m nodes (jm), only when all the copy storage nodes of the data block are unavailable, the data block is bad. Because each node in the distributed file system is independent each other, P(Bk) is the availability probability of data block k, and P(Nk) is the availability probability of data node k. Therefore, the calculation for the unavailability probability of the data block bk is shown in (18).

    P(Bk)=P(N1)×P(N2)××P(Nm) (18)

    In the cloud storage system, the reliability of nodes changes exponentially with time. In a distributed file system, it is assumed that the reliability of data nodes is an exponential distribution (19) in the life cycle T of the file.

    f(T)=eαT (19)

    Where α is the failure rate of the node in the file system, and f(T) is the effective expected value of the node in the life cycle of the file.

    (2) The availability of a file is that every data block of the file is available. The failure of any data block will lead to the unavailability of the file. The file availability calculation formula is shown in (20).

    P(F)=P(B1B2Bn) (20)

    Each data block is independent of each other, and the formula (21) is obtained.

    P(F)=ni=1(1)i+1Cin(rjl=1(1eαlTrj))i (21)

    (3) The minimum number of copies can be obtained according to the file availability. The formula is shown in (22).

    P(F)=1P(F)Pstandard (22)

    Where Pstandard is the standard value of file availability.

    The final number of copy factors is determined by integrating the copy adjustment model based on file popularity. The 4.2 can ensure the minimum value of the copy numbers under the condition of file reliability, combined with the copy numbers obtained by the file popularity decision, and under the condition of ensuring file reliability, adjust the file copy numbers to improve the system response efficiency and system storage space utilization while ensuring the reliability of the distributed file system. The specific process of the dynamic adjustment of the copy numbers is shown in Figure 1. The minimum number of copies n1 is calculated under the condition of ensuring the availability of the file, and the copy number n2 is obtained by combining the relationship between the predicted value of the file's future popularity and the average value of all files in the cluster. When the number of copies needs to be increased, the larger value is selected between n1 and n2. When we reduce the number of copies, the reduction n3 is equal to the original number of copies minus the forecast numbers based on file popularity prediction. When n3 is less than 2, the minimum number of copies is 2 by the multiple copy mechanism of the distributed file system.

    Figure 1.  Flow chart of the dynamic adjustment of the copies number.

    In this section, we present a series of experiments to verify the algorithm innovation points in this paper. In order to achieve a real simulation of the changes in user access to files, we collect the data from real-time data of a social communication software, as shown in Table 1. The data to the user's access to a certain file (file popularity) is collected every ten minutes, which is used as the original data sequence before processing. First, accumulate the original sequence to get a new sequence, and perform smooth and exponential detection on the new sequence. Through formulae (2) and (3) and test standards, it is verified that the cumulative sequence is a smooth sequence and has an exponential distribution.

    Table 1.  Real-time visits to hot news by users of a social communication platform.
    Serial number/period 1 2 3 4 5 6 7 8
    Pageview/ten thousand times 324.01 366.7 447.19 370.52 289.23 382.11 388.01 381.45

     | Show Table
    DownLoad: CSV

    CloudSim4.0 is used as the platform to verify the effect of the dynamic HDFS copies placement strategy based on file access popularity. The operating system is Windows 10. The development tool is Eclipse. The version of JDK is 1.8.0. In the experiment, ten simulated heterogeneous data nodes are distributed on three racks, with nodes marked as DN1 to DN10 and racks marked as Rack0, Rack1 and Rack2. One rack contains 4 nodes and the other two racks contain 3 nodes respectively. The detailed information of the nodes and racks is shown in Table 2.

    Table 2.  Node configuration information.
    DataNode CPU RAM/MB Disk Space /GB Rack
    Basic Frequency/GHz core
    1 2.0 8 6144 1024 0
    2 2.5 6 4096 512 0
    3 2.0 6 8192 1024 0
    4 2.5 4 4096 512 0
    5 2.5 4 4096 1024 1
    6 2.0 8 8192 512 1
    7 2.0 6 6144 1024 1
    8 2.5 4 4096 512 2
    9 2.5 8 8192 1024 2
    10 2.0 6 4096 512 2

     | Show Table
    DownLoad: CSV

    The experiments are listed as follows:

    • Experiment 1: State interval division verification;

    • Experiment 2: Unbiased grey prediction model optimized by CS verification;

    • Experiment 3: File copies number verification;

    • Experiment 4: File copies adjustment efficiency verification;

    • Experiment 5: Performance test and result analysis.

    According to the grey model and Markov model, we have realized the prediction of future file visits to the original data sequence. Among them, a=0.0061, u=385.1279. Unbiased grey prediction model parameters b=0.0061,A=383.9498. According to the prediction and fitting results to the original data sequence, we select the two maxima of the residuals as the boundary of the state interval, and they are equally divided into three intervals, corresponding to the three states of E1, E2 and E3. The number of states is the same as the nest number of the CS algorithm. The division of the state interval is shown in Table 3 and the residual of the unbiased grey Markov model are given respectively in Table 4.

    Table 3.  Markov modified state interval division.
    State Interval
    E1 [−85.41, −34.30)
    E2 [−34.30, 16.81)
    E3 [16.81, 67.92)

     | Show Table
    DownLoad: CSV
    Table 4.  Residual error and state of unbiased grey model predicted value and actual value.
    Serial number 1 2 3 4 5 6 7 8
    residual 0 −14.90 67.92 −6.43 −85.41 9.76 17.94 13.64
    status E2 E2 E3 E2 E1 E2 E3 E2

     | Show Table
    DownLoad: CSV

    (1) The parameters of CS algorithm

    Considering the efficiency and accuracy of the algorithm, the parameters of CS algorithm are obtained after many experiments. The parameter settings are shown in Table 5.

    Table 5.  CS algorithm parameter settings.
    Parameter Nest capacity Nest number Step Random step Detection probability Iterations
    value 3 25 0.01 1.5 0.25 200

     | Show Table
    DownLoad: CSV

    (2) Evaluation of experimental results

    The prediction model is evaluated by MAPE (mean absolute percentage error). MAPE can not only determine whether the prediction effect of the model is good, but also can be used as the basis for the comparison between the models.

    (3) The model correction value

    Through the CS algorithm iterative optimization, when the number of iterations is 100, it can show the expected effect. Considering the accuracy and the efficiency of the experiment, the number of iterations is 200, and α=(α1,α2,α3)=(0.9883,0.4183,0.4889), and the optimized Markov correction value v=(84.81,4.56,42.93).

    Note: The Markov residual correction value obtained from the experiment is only suitable for the current prediction sequence. When the prediction sequence changes, it is necessary to recalculate α to achieve the optimal effect.

    (4) Comparison of experimental data

    Table 6 shows the comparison of experimental data based on Grey Markov model, unbiased grey Markov model and CS unbiased grey Markov model.

    Table 6.  Comparison of experimental data based on the three grey prediction models.
    Sequence number xi Grey Markov model Unbiased Grey Markov model CS Unbiased Grey Markov model
    yi ei ri yi ei ri yi ei ri
    1 324.01 324.01 0.00 0.0000 324.01 0.00 0.0000 324.01 0.00 0.0000
    2 366.70 381.97 —15.27 0.0168 372.45 −5.75 0.0157 376.94 —10.24 0.0279
    3 447.19 379.63 67.56 0.0572 421.64 25.55 0.0571 422.20 24.99 0.0559
    4 370.52 377.31 −6.79 0.0000 367.80 2.72 0.0073 372.29 −1.77 0.0048
    5 289.23 375.00 —85.77 0.0883 314.78 —25.55 0.0883 289.83 −0.60 0.0021
    6 382.11 372.70 9.41 0.0485 363.24 18.87 0.0494 367.69 14.42 0.0377
    7 388.01 370.42 17.59 0.0629 412.07 —24.06 0.0620 413.00 —24.99 0.0644
    8 381.45 368.16 13.29 0.0587 358.66 22.79 0.0597 363.15 18.30 0.0480
    Prediction value 362.89 356.80 6.09 0.0168 356.41 6.48 0.0179 361.00 1.89 0.0052
    MAPE 4.36% 4.47% 3.08%
    Note: The xiandyi unit is ten thousand times in Table 6.

     | Show Table
    DownLoad: CSV

    Let xi is the ith element of the original sequence, yi is the predicted value of xi, ei is the error, ri is the relative error. The error and the relative error can be defined as follows:

    ei=yixii=1,2,n
    ri=|ei|xii=1,2,n (23)

    Finally, MAPE is used to evaluate the fitting degree of the three models.

    From Table 6, we can see that the optimization effect of the CS Unbiased Grey Markov model is superior to the Grey Markov model and the unbiased grey Markov model. the MAPE of the Grey Markov model and the Unbiased Grey Markov model are both about 4.4%, indicating that the Markov correction model has a good effect on reducing the error of the grey model. The Markov model optimized by the CS algorithm improves the error correction, and the MAPE is only 3.08%.

    In order to further verify that CS unbiased grey Markov model can effectively predict the next moment data in the case of small samples. Using the same group of data, we do the experiment based on the grey model, unbiased grey model, grey Markov model, unbiased grey Markov model, BP neural network and grey neural network. The experimental results are shown in Figure 2 and Table 7.

    Figure 2.  Comparison of fitting effects of five grey models.
    Table 7.  Comparison of MAPE values of grey relevant models.
    Model Grey model Unbiased grey model Grey Markov model Unbiased grey Markov model Grey neural network CS Unbiased Grey Markov model
    MAPE 7.76% 7.75% 4.36% 4.47% 3.35% 3.08%

     | Show Table
    DownLoad: CSV

    From Table 7, it is concluded that the prediction accuracy of grey model and unbiased grey model is not much different, but the unbiased grey prediction model doesn't need to carry out incremental operation for the prediction of file popularity, so the calculation efficiency will be higher. In the case of small sample, the MAPE of CS unbiased grey Markov model is the smallest compared with other models, that is, the accuracy of data prediction is higher. It shows that the model can better predict the short-term data, and provide a good method for coping with future environmental changes and taking corresponding measures in advance.

    Table 8 shows the comparison of MAPE values between our method and other algorithms. In the time feedback model, the prediction is more accurate in a short period of time, but the effect becomes worse in a long period of time. The model with the decay factor has a higher prediction accuracy when the change is relatively uniform, and the prediction accuracy decreases when there is a sharp change. The MAPE value of the BP neural network model is between the time feedback model and the CS unbiased grey Markov model. Our method achieves a relatively good effect.

    Table 8.  Comparison of MAPE values of other algorithms.
    Model Model with decay factor Time feedback model BP neural network Our method
    MAPE 8.46% 3.48% 4.33% 3.08%

     | Show Table
    DownLoad: CSV

    CloudSim simulation platform is used for simulation experiment, and HDFS simulation is realized by extending the distributed file system. The prediction of file popularity uses multiple clients to write 10 files into the cluster at the same time. The number of copies obtained based on file popularity is shown in Table 9. Firstly, the CS unbiased grey Markov model is used to predict the file popularity in the future, and according to the future file access heat and the availability of the file, the change of the copy is obtained. When the file access volume increases, the number of copies needs to be increased, instead, the number of copies reduced. However, to ensure the availability of files, when the number of file copies is less than the default value, select the default value directly.

    Table 9.  Change in number of copies.
    File Initial file popularity Predict file popularity at t Initial number of copies Number of copies at t Copies change
    1 444.01 690.09 2 3 1
    2 551.56 499.9 3 2 −1
    3 869.73 489.99 4 2 −2
    4 467.32 1279.66 2 5 3
    5 365.01 657.36 2 3 1
    6 152.33 344.98 2 2 0
    7 240.87 689.76 2 3 1
    8 268.56 578.66 2 3 1
    9 446.34 267.99 2 2 0
    10 604.56 200.98 3 2 −1

     | Show Table
    DownLoad: CSV

    The experiment shows that when the file popularity increases, such as file 4, the number of copies increases; when the access popularity decreases, the number of files decreases, such as file 3. In order to ensure the availability of file copies, when the number of file copies are less than the default value, the number of copies does not change, such as file 9.

    We verify the performance of the system after replica factor adjustment by setting different task quantities, executing tasks for 10 times and recording the average completion time with different task quantities. As is shown in Figure 3.

    Figure 3.  Comparison of average completion time under different tasks.

    Through the analysis of the experimental results in Figure 3, the adjustment of the hot file replica factor has improved the system computing efficiency. The multi-factor node evaluation algorithm (MFNEA) is used to optimize the dynamic placement strategy of the new replica factor, which can promote the overall computing efficiency of the system, make full use of the computing power of the entire cluster node, and then effectively improve the work execution efficiency of MapReduce.

    In the simulation experiment, 600 files were simultaneously written to the cluster by multiple clients, and the default copy placement strategy was adopted for initial writing of files. When the system runs for a period of time, the access popularity of files changes. The default copy placement strategy is compared with the copy placement strategy with MFNEA. The storage resource usage of each node in the cluster is shown in Figure 4.

    Figure 4.  Storage space usage of each node.

    The "default method" represents the default copy placement strategy in the distributed file system, and Balancer is used to adjust the copy distribution. Based on the CPU load, memory load and disk load, MFNEA determines the node performance with the method of analytic hierarchy process. MFNEA evaluates the node in a linear weighted way, with the copy placement node selected according to node evaluation value. It can be seen from the figure that the disk space usage of each node in the default copy placement strategy varies greatly. The highest percent of node usage is 81%, while some nodes are only about 30%, which is relatively idle, causing serious imbalanced copy distribution. The MFNEA balances the storage space utilization of each node in the cluster, with the node utilization within 20%.

    The number of copies will directly affect whether it can provide users with a reliable and fast data access environment in the copy management strategy. Because the file popularity changes dynamically in the system, the static copy decision cannot adapt to the changing demand of file access. The paper designs a file copy dynamic adjustment strategy. Firstly, the recent access characteristics of the file are obtained, the unbiased grey model is used to predict the file access popularity at the next moment, and the Markov model optimized by the cuckoo search algorithm is used to correct the predicted value to increase the accuracy of the predicted value. The number of copies is calculated based on file popularity and file availability, with the number of copies reduced for low-popularity files, which improves the storage space utilization in the distributed file system, and accordingly the number of copies increased for high-popularity files to improve system performance and system reliability.

    This work was supported by Hebei Urban Forest Health Technology Innovation Center, Hebei Key Laboratory of Agricultural Big Data, National Natural Science Foundation of China (No. 62102130), Central Government Guides Local Science and Technology Development Fund Project (No. 226Z0201G), Precision Animal Husbandry Discipline Group Construction Project of Hebei Agricultural University (No. 1090064), Natural Science Foundation of Hebei Province (No. F2020204003), Science and Technology Research Foundation of Hebei Province (No. ZD2016158), Key Research and Development Program of Hebei Province (No. 19220119D), Hebei Youth Talents Support Project (No. BJ2019008), Research Project of Hebei Provincial Department of Human Resources and Social Security (No. JRS-2021-5024), Introducing Talent Research Project of Hebei Agricultural University (No. YJ201829).

    The authors declare there is no conflict of interest.


    Acknowledgments



    The authors of this study express their unconditional gratitude to the tuberculosis treatment supporters and also the faculty members' School of Public Health, University of Ghana, for their input in fine-tuning the study protocol.

    Conflict of interest



    The authors declare no conflict of interest.

    [1] Tanimura T, Jaramillo E, Weil D, et al. (2014) Financial burden for tuberculosis patients in low- And middle-income countries: A systematic review. Eur Respir J 43: 1763-1775. https://doi.org/10.1183/09031936.00193413
    [2] Aryee G, Kwarteng E, Essuman R, et al. (2018) Estimating the incidence of tuberculosis cases reported at a tertiary hospital in Ghana: A time series model approach. BMC Public Health 18: 1-8. https://doi.org/10.1186/s12889-018-6221-z
    [3] Osei E, Oppong S, Adanfo D, et al. (2019) Reflecting on tuberculosis case notification and treatment outcomes in the Volta region of Ghana: A retrospective pool analysis of a multicentre cohort from 2013 to 2017. Glob Heal Res Policy 4: 37. https://doi.org/10.1186/s41256-019-0128-9
    [4] Hill PC, Whalen CC (2014) Non-clinical factors associated with TB: Important for DOTS impact evaluation and disease elimination. Trans R Soc Trop Med Hyg 108: 523-525. https://doi.org/10.1093/trstmh/tru114
    [5] Soomro M, Qadeer E, Amir Khan M, et al. (2012) Treatment supporters and their impact on treatment outcomes in routine tuberculosis program conditions in rawalpindi district, pakistan. Tanaffos 11: 15-22.
    [6] Izudi J, Tamwesigire IK, Bajunirwe F (2020) Treatment supporters and level of health facility influence completion of sputum smear monitoring among tuberculosis patients in rural Uganda: A mixed-methods study. Int J Infect Dis 91: 149-155. https://doi.org/10.1016/j.ijid.2019.12.003
    [7] Lönroth K, Raviglione M (2015) The WHO's new end tb strategy in the post-2015 era of the sustainable development goals. Trans R Soc Trop Med Hyg 110: 148-150. https://doi.org/10.1093/trstmh/trv108
    [8] Arnold M, Beran D, Haghparast-Bidgoli H, et al. (2016) Coping with the economic burden of Diabetes, TB and co-prevalence: Evidence from Bishkek, Kyrgyzstan. BMC Health Serv Res 16: 118. https://doi.org/10.1186/s12913-016-1369-7
    [9] Atif M, Sulaiman SAS, Shafie AA, et al. (2014) Resource utilization pattern and cost of tuberculosis treatment from the provider and patient perspectives in the state of Penang, Malaysia. BMC Health Serv Res 14: 353. https://doi.org/10.1186/1472-6963-14-353
    [10] Laokri S, Dramaix-Wilmet M, Kassa F, et al. (2014) Assessing the economic burden of illness for tuberculosis patients in Benin: determinants and consequences of catastrophic health expenditures and inequities. Trop Med Int Heal 19: 1249-1258. https://doi.org/10.1111/tmi.12365
    [11] GHS.Ghana Health Service Regional Health Directorate Brong Ahafo Region 2018 Annual report. Ghana Heal Serv (2018) 126.
    [12] Bonsu F, Pedrazzoli D, Siroka A, et al. The 2016 patient cost survey in Ghana: results and implications (2018). Avaliable from: https://cdn.who.int/media/docs/default-source/hq-tuberculosis/global-task-force-on-tb-impact-measurement/meetings/2018-05/tf7_p21_ghana_patient_cost_survey.pdf.
    [13] Koopmanschap MA, Van Exel NJA, Van Den Berg B, et al. (2008) An overview of methods and applications to value informal care in economic evaluations of healthcare. Pharmacoeconomics 26: 269-280. https://doi.org/10.2165/00019053-200826040-00001
    [14] Ministry of Employment and Labour RelationsRepublic of Ghana: Ministry of Employment and Labour Relations, Medium Term Expenditure Framework (MTEF) (2019). Avaliable from: https://www.ghana.gov.gh/ministries/89eeb1f13e/.
    [15] Addo R, Agyemang SA, Tozan Y, et al. (2018) Economic burden of caregiving for persons with severe mental illness in sub-Saharan Africa: A systematic review. PLoS One 13: e0199830. https://doi.org/10.1371/journal.pone.0199830
    [16] Needham DM, Godfrey-Faussett P, Foster SD (1998) Barriers to tuberculosis control in urban Zambia: the economic impact and burden on patients prior to diagnosis. Int J Tuberc Lung Dis 2: 811-817.
    [17] Assebe LF, Negussie EK, Jbaily A, et al. (2020) Financial burden of HIV and TB among patients in Ethiopia: a cross-sectional survey. BMJ Open 10: e036892. https://doi.org/10.1136/bmjopen-2020-036892
    [18] Wyss K, Kilima P, Lorenz N (2001) Costs of tuberculosis for households and health care providers in Dar es Salaam, Tanzania. Trop Med Int Health 6: 60-68. https://doi.org/10.1046/j.1365-3156.2001.00677.x
    [19] Elamin EI, Ibrahim MIM, Sulaiman SAS, et al. (2008) Cost of illness of tuberculosis in Penang, Malaysia. Pharm World Sci 30: 281-286. https://doi.org/10.1007/s11096-007-9185-0
    [20] Fuady A, Houweling TAJ, Mansyur M, et al. (2018) Catastrophic total costs in tuberculosis-affected households and their determinants since Indonesia's implementation of universal health coverage. Infect Dis Poverty 7: 3. https://doi.org/10.1186/s40249-017-0382-3
    [21] Getahun B, Wubie M, Dejenu G, et al. (2016) Tuberculosis care strategies and their economic consequences for patients: The missing link to end tuberculosis. Infect Dis Poverty 5: 93. https://doi.org/10.1186/s40249-016-0187-9
    [22] Foster N, Vassall A, Cleary S, et al. (2015) The economic burden of TB diagnosis and treatment in South Africa. Soc Sci Med 130: 42-50. https://doi.org/10.1016/j.socscimed.2015.01.046
    [23] Sajith M, Thomas A, Kothia JJ, et al. (2015) Cost of therapy incurred for tuberculosis patients receiving directly observed therapy (DOT). Int J Pharm Pharm Sci 7: 141-144.
    [24] Jackson S, Sleigh AC, Wang GJ, et al. (2006) Poverty and the economic effects of TB in rural China. Int J Tuberc Lung Dis 10: 1104-1110.
    [25] Asres A, Jerene D, Deressa W (2018) Pre- And post-diagnosis costs of tuberculosis to patients on Directly Observed Treatment Short course in districts of southwestern Ethiopia: A longitudinal study. J Heal Popul Nutr 37: 15. https://doi.org/10.1186/s41043-018-0146-0
    [26] Tarricone R (2006) Cost-of-illness analysis What room in health economics?. Health Policy 77: 51-63. https://doi.org/10.1016/j.healthpol.2005.07.016
    [27] Ukwaja KN, Alobu I, lgwenyi C, et al. (2013) The high cost of free tuberculosis services: patient and household costs associated with tuberculosis care in Ebonyi State, Nigeria. PLoS One 8: e73134. https://doi.org/10.1371/journal.pone.0073134
    [28] John KR, Daley P, Kincler N, et al. (2009) Costs incurred by patients with pulmonary tuberculosis in rural India. Int J Tuberc Lung Dis 13: 1281-1287.
    [29] Russell S (2004) The economic burden of illness for households in developing countries: a review of studies focusing on malaria, tuberculosis, and human immunodeficiency virus/acquired immunodeficiency syndrome. Am J Trop Med Hyg 71: 147-155. https://doi.org/10.4269/ajtmh.2004.71.147
    [30] Umar NA, Fordham R, Abubakar I, et al. (2012) The indirect cost due to pulmonary Tuberculosis in patients receiving treatment in Bauchi State-Nigeria. Cost Eff Resour Alloc 10: 6. https://doi.org/10.1186/1478-7547-10-6
  • This article has been cited by:

    1. Marco Kemmerling, Daniel Lütticke, Robert H. Schmitt, Beyond games: a systematic review of neural Monte Carlo tree search applications, 2024, 54, 0924-669X, 1020, 10.1007/s10489-023-05240-w
    2. weijin mao, Wenzhen Wu, Pierluigi Siano, Hazlie Mokhlis, 2024, A short-term prediction model for photovoltaic power forecasting based on CEEMDAN-CS-LSTM, 9781510679795, 394, 10.1117/12.3024734
    3. Yixin Zhuo, Ling Li, Jian Tang, Wenchuan Meng, Zhanhong Huang, Kui Huang, Jiaqiu Hu, Yiming Qin, Houjian Zhan, Zhencheng Liang, Optimal real-time power dispatch of power grid with wind energy forecasting under extreme weather, 2023, 20, 1551-0018, 14353, 10.3934/mbe.2023642
  • Reader Comments
  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
通讯作者: 陈斌, bchen63@163.com
  • 1. 

    沈阳化工大学材料科学与工程学院 沈阳 110142

  1. 本站搜索
  2. 百度学术搜索
  3. 万方数据库搜索
  4. CNKI搜索

Metrics

Article views(2476) PDF downloads(122) Cited by(1)

/

DownLoad:  Full-Size Img  PowerPoint
Return
Return

Catalog