Research article

DNA-based detection of Mycobacterium avium subsp. paratuberculosis in domestic and municipal water from Porto (Portugal), an area of high IBD prevalence

  • Mycobacterium avium subsp. paratuberculosis (MAP) may play a role in the pathology of human inflammatory bowel disease (IBD). Previously, we found a high frequency (98% in patients with active disease) of MAP DNA detection in the blood of Portuguese Crohn's disease patients, suggesting that this cohort has high exposure to MAP organisms. Because water is an important route for MAP dissemination, this study aimed to assess MAP contamination within water sources in the Porto area (the residential area of our IBD study cohort).

    Water and biofilms were collected in a wide variety of locations within the Porto area, including taps connected to domestic water sources and municipal water distribution systems. Baseline samples were collected in early autumn, and further domestic water samples were collected in early winter to assess the effect of winter rainfall. DNA was extracted from all 131 samples, and IS900-based nested PCR was used to assess the frequency of MAP presence.

    Our results show high MAP positivity in municipal water sources (20.7% of water samples and 41.4% of biofilm samples) and even higher positivity amongst domestic sources (30.8% of water samples and 50% of biofilm samples). MAP positivity in biofilms correlated with positivity in water samples from the same sources. A significantly higher frequency of MAP positivity was observed during winter rains than in samples collected in autumn prior to the winter rainfall period (61.9% versus 30.8%). We conclude that domestic and municipal water sources of the Porto region have a high burden of MAP contamination and that this prevalence increases with rainfall. We hypothesize that human exposure to MAP from local water supplies is commonplace and represents a major route for MAP transmission and challenge which, if positively linked to disease pathology, may contribute to the observed high prevalence of IBD in the Porto district.

    Citation: Telma Sousa, Marta Costa, Pedro Sarmento, Maria Conceição Manso, Cristina Abreu, Tim J. Bull, José Cabeda, Amélia Sarmento. DNA-based detection of Mycobacterium avium subsp. paratuberculosis in domestic and municipal water from Porto (Portugal), an area of high IBD prevalence[J]. AIMS Microbiology, 2021, 7(2): 163-174. doi: 10.3934/microbiol.2021011



    A cyber-physical system (CPS) is an emerging technology that integrates computation, communication, and physical devices. The introduction of network technology in CPS offers significant advantages in system efficiency, scalability, and maintainability. CPS is widely applied in various fields due to its robustness, high reliability, and fast operation speed [1,2,3]. Because of the close interaction among computation, sensing, communication, and actuation in CPS, it is highly susceptible to network security threats. Additionally, intelligent CPS presents new challenges and threats that differ from existing issues [4,5]. Notably, a malicious attack on CPS closely integrated with national infrastructure can have severe consequences. Therefore, ensuring the secure operation of CPS-related devices is an urgent problem [6,7].

    There have been many security incidents of CPS in the world, which have brought huge losses [8,9]. In June 2010, Iran's nuclear facilities were attacked by the Stuxnet virus, which seriously damaged nuclear power plants and other facilities, seriously jeopardizing Iran's nuclear security. In 2014, the Havex Trojan attacked numerous European industrial manufacturing systems. In addition to attacks on industrial systems, attacks on the power grid also occur frequently. In 2015, the Ukrainian power grid suffered a Black-Energy attack. In 2016, the Israeli power grid suffered a serious cyber attack, and in 2019, several South American countries suffered a cyber attack on the power system. An attack on the power grid would lead to widespread power outages, which would render factories inoperable and infrastructure paralyzed, causing serious inconvenience and impact to society.

    In recent years, scholars have conducted extensive research on the security of CPSs. Attack detection is one of the important strategies to ensure the safe operation of CPS, aiming to identify malicious behaviors such as network attacks and take appropriate countermeasures as early as possible to minimize or prevent significant losses [10,11]. As the complexity of attacks in CPS increases, traditional anomaly detection methods have limitations and require the design of detection algorithms with specific characteristics for a particular domain [12,13]. Reference [14] tackles the design problem of intrusion detection systems by creatively combining feature-based intrusion detection system (SIDS) and anomaly-based intrusion detection system (AIDS) to form an improved stacked ensemble algorithm (ISEA). This algorithm significantly reduces the false positive rate (FPR) through a false positive elimination strategy (FPES). Reference [15] argues that in the era of Industry 4.0, a layered and distributed approach is required for intrusion detection. This approach includes perception-execution layer monitoring based on Kalman filters, network transmission layer monitoring based on recursive Gaussian mixture models, and application control layer monitoring based on sparse deep belief network models. It enables comprehensive and efficient identification of covert attacks and ensures security protection. Reference [16] proposes a federated deep learning scheme to address the attack problem in large-scale and complex industrial networked physical systems. This scheme utilizes a deep learning-based intrusion detection model combined with federated learning framework and secure communication protocols to enhance the privacy of industrial CPS while ensuring resilience against network threats.

    Data tampering attacks are a prevalent and typical type of network attack targeting CPSs, and they have gained widespread attention in recent years [13,17,18]. Such attacks manipulate the data transmitted in the network, which affects the estimation and control center of the CPS, leads to incorrect judgments or decisions, and causes erroneous instructions to be issued. This may result in abnormal behavior of, or even damage to, physical devices [19,20,21]. These attacks are often difficult for existing intrusion detection systems to detect, so they can quietly penetrate CPSs and affect their stable operation [22,23].

    In recent years, detection algorithms for data tampering attacks have received attention, and some scholars have conducted in-depth analysis and research on these attacks. Reference [24] proposes a solution that reduces the computational cost and enhances privacy for smart grid aggregation in the presence of deletion and tampering attacks. Reference [25] addresses firmware tampering attack defense and forensics by designing a detection method based on joint testing action groups and memory comparison. Reference [26] addresses the difficulty the $\chi^2$ detector has in detecting false data injection attacks with white noise. It proposes a novel summation (SUM) detector that uses not only the current compromise information but also all historical information to reveal the threat, and it also identifies improved false data injection attacks well. Reference [27] studies the identification problem of finite impulse response (FIR) systems with binary measurements under data tampering, and gives the optimal attack strategy and defense method. Reference [28] combines a novel secure key aggregation searchable encryption scheme with anti-tampering blockchain technology to propose a data sharing system that selectively shares and retrieves vehicle sensor data and detects unauthorized data tampering attacks.

    This paper focuses on the data tampering attack problem in binary quantization FIR systems. Under the framework of system identification, a novel algorithm is designed to solve the system parameters and attack strategies using the maximum likelihood method and binary measurement data. The computation method is also provided. To begin, the maximum likelihood function of the measurement data is established. Then, parameter estimation algorithms are proposed for both known and unknown attack strategies. The closeness between the estimated system parameter values obtained from this estimation algorithm and the true values depends on the sample data size and whether the attack strategy is known or unknown. In the case of an unknown attack strategy, the difficulty in algorithm design increases due to the coupling between unknown parameters and attack strategies. The unknown variables in the maximum likelihood function are simplified to the system of equations. The Newton-Raphson iteration method is used to train the back propagation neural network (BPNN), and the attack strategy is estimated in advance. The estimated value of the attack strategy is then substituted into the algorithm to obtain the unknown parameter estimates. The results obtained from the algorithm show that with a small sample data size, the estimation algorithm produces large fluctuations in the solved system parameters. However, as the sample data size increases, the estimated values of the system parameters tend to become closer to the true values.

    The structure of this paper is as follows. Section 2 describes the data tampering detection problem in binary quantization FIR systems; Section 3 presents the expression of the maximum likelihood function of the system; Section 4 discusses the use of maximum likelihood estimation to solve the system parameters in the case of a known attack strategy; Section 5 discusses the step-by-step solution of system parameters and attack strategies in the case of an unknown attack strategy; Section 6 validates the estimation algorithm through numerical simulations; and Section 7 provides a summary and outlook for this paper.

    Consider a single-input single-output discrete-time FIR system:

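    $$y_k = a_1 u_k + a_2 u_{k-1} + \cdots + a_n u_{k-n+1} + d_k = \phi_k^T \theta + d_k, \quad k = 1, 2, \ldots, \tag{1}$$

    where $u_k$ is the quantized system input, taking values in $\{\mu_1, \mu_2, \ldots, \mu_r\}$, i.e., $u_k \in \{\mu_1, \mu_2, \ldots, \mu_r\}$; $\phi_k = [u_k, \ldots, u_{k-n+1}]^T$ is the regression vector composed of quantized inputs. Since $u_k$ can take only $r$ different values, $\phi_k$ has $l = r^n$ possible values, which can be represented as $\pi_1, \pi_2, \ldots, \pi_l$, that is, $\phi_k \in \{\pi_1, \pi_2, \ldots, \pi_l\}$; $\theta = [a_1, \ldots, a_n]^T$ is the vector of unknown system parameters; $d_k$ is the system noise; and $y_k$ is the system output, measured by a binary sensor with threshold $C \in (-\infty, \infty)$, which can be represented by an indicator function as:

    $$s_k^0 = I_{\{y_k \le C\}} = \begin{cases} 1, & y_k \le C; \\ 0, & \text{else}. \end{cases} \tag{2}$$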

    From here on, the superscript T denotes the transpose of a matrix or vector.

    As shown in Figure 1, $s_k^0$ is transmitted through a communication network to a data center, but it is susceptible to data tampering attacks during the communication process. The data received by the data center is denoted as $s_k$, and its relationship with $s_k^0$ is as follows:

    $$\begin{cases} \Pr(s_k = 1 \mid s_k^0 = 0) = p_0; \\ \Pr(s_k = 0 \mid s_k^0 = 1) = p_1. \end{cases} \tag{3}$$
    Figure 1.  System block diagram.

    The above equation essentially describes a data tampering attack strategy, which is denoted as $(p_0, p_1)$.
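    The measurement and attack model in Eqs (1)–(3) can be illustrated with a minimal simulation sketch. The function names (`binary_sensor`, `tamper`, `simulate`) are hypothetical, and drawing the inputs independently and uniformly from the quantization levels is an assumption made here for illustration only; the source does not state how the inputs are generated.

```python
import random


def binary_sensor(y, C):
    """Eq (2): the sensor outputs 1 when y <= C and 0 otherwise."""
    return 1 if y <= C else 0


def tamper(s0, p0, p1, rng):
    """Eq (3): flip 0 -> 1 with probability p0 and 1 -> 0 with probability p1."""
    if s0 == 0:
        return 1 if rng.random() < p0 else 0
    return 0 if rng.random() < p1 else 1


def simulate(theta, inputs, C, sigma, p0, p1, N, seed=0):
    """Return N records (phi_k, s0_k, s_k) generated from Eqs (1)-(3).

    Assumption: inputs are drawn i.i.d. uniformly from `inputs`.
    """
    rng = random.Random(seed)
    n = len(theta)
    u = [rng.choice(inputs) for _ in range(N + n - 1)]
    records = []
    for k in range(n - 1, N + n - 1):
        phi = tuple(u[k - j] for j in range(n))  # [u_k, ..., u_{k-n+1}]
        y = sum(a * x for a, x in zip(theta, phi)) + rng.gauss(0.0, sigma)
        s0 = binary_sensor(y, C)
        records.append((phi, s0, tamper(s0, p0, p1, rng)))
    return records
```

    Comparing the second and third entries of each record shows which samples the attacker flipped; the data center would only see the third entry.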

    In this paper, a maximum likelihood estimation method is used to provide an estimation algorithm for the unknown parameters θ, and the convergence performance of the algorithm is also analyzed.

    Assumption 1. The system noise $\{d_k\}$ is an independent and identically distributed (i.i.d.) sequence of normal random variables with zero mean and variance $\sigma^2$, and its probability distribution function and probability density function are denoted as $F(\cdot)$ and $f(\cdot)$, respectively.

    Remark 1. 1) This paper is concerned with the binary observation. For the case of multi-threshold quantization, it can be converted into multiple binary values for processing [29]. 2) The attack process here is independent, and the existing literature often studies the cases where it is dependent and modeled as Markov processes [30]. The method in this paper can provide reference for the case of nonindependent attack process.

    Maximum likelihood estimate is a commonly used parameter estimation method in statistics that estimates model parameters by maximizing the likelihood function of the sample data. This section presents the maximum likelihood function of the data received at the receiving center, laying the foundation for the subsequent algorithm design.

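    From Eqs (1) and (2), the probability that $s_k^0 = 1$ is

    $$\Pr(y_k \le C) = \Pr(\phi_k^T\theta + d_k \le C) = \Pr(d_k \le C - \phi_k^T\theta) = F(C - \phi_k^T\theta). \tag{4}$$

    Combining this with Eq (3) and using the law of total probability, we obtain:

    $$\begin{aligned} \Pr(s_k = 1) &= \Pr(s_k = 1 \mid s_k^0 = 0)\Pr(s_k^0 = 0) + \Pr(s_k = 1 \mid s_k^0 = 1)\Pr(s_k^0 = 1) \\ &= p_0\Pr(y_k > C) + (1 - p_1)\Pr(y_k \le C) \\ &= p_0\bigl(1 - F(C - \phi_k^T\theta)\bigr) + (1 - p_1)F(C - \phi_k^T\theta) \overset{\text{def}}{=} g_k. \end{aligned} \tag{5}$$

    Then,

    $$\Pr(s_k) = g_k^{s_k}(1 - g_k)^{(1 - s_k)}. \tag{6}$$

    Based on Eq (5), $g_k$ can be simplified as:

    $$g_k = (1 - p_0 - p_1)F(C - \phi_k^T\theta) + p_0. \tag{7}$$

    Hence, for a data length of $N$, the maximum likelihood function of $s_1, s_2, \ldots, s_N$ is:

    $$L(\theta \mid s_1, s_2, \ldots, s_N) = \Pr(s_1, s_2, \ldots, s_N) = \Pr(s_1)\Pr(s_2)\cdots\Pr(s_N) = \prod_{k=1}^{N} g_k^{s_k}(1 - g_k)^{(1 - s_k)}. \tag{8}$$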
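    As a rough illustration of Eqs (7) and (8), the following sketch evaluates the probability that a received bit equals 1 and the logarithm of the likelihood. The function names (`g_prob`, `log_likelihood`) and the data layout (a list of pairs of a regressor and a received bit) are assumptions made for this example, and the noise is taken to be zero-mean Gaussian as in Assumption 1.

```python
from math import log
from statistics import NormalDist


def g_prob(phi, theta, C, sigma, p0, p1):
    """Eq (7): Pr(s_k = 1) for regressor phi under attack strategy (p0, p1)."""
    F = NormalDist(0.0, sigma).cdf  # noise distribution function F
    x = sum(a * u for a, u in zip(theta, phi))  # phi^T theta
    return (1.0 - p0 - p1) * F(C - x) + p0


def log_likelihood(theta, data, C, sigma, p0, p1):
    """Logarithm of Eq (8) for data = [(phi_k, s_k), ...]."""
    total = 0.0
    for phi, s in data:
        g = g_prob(phi, theta, C, sigma, p0, p1)
        # Assumes 0 < g < 1, which holds whenever p0 > 0 and p1 > 0.
        total += s * log(g) + (1 - s) * log(1.0 - g)
    return total
```

    Working with the log-likelihood rather than the product in Eq (8) avoids numerical underflow for large $N$.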

    The principle of maximum likelihood estimation is based on the intuitive idea that a parameter is the most reasonable estimate if it gives the greatest probability that the sample data will occur. For the maximum likelihood function, the parameters at its maximum value are called the maximum likelihood estimates. In this section, the attack strategy $(p_0, p_1)$ is assumed to be known. Two functions are defined as follows:

    $$h_1(x) = \frac{(1 - p_0 - p_1)f(C - x)}{(1 - p_0 - p_1)F(C - x) + p_0}, \quad h_2(x) = \frac{(1 - p_0 - p_1)f(C - x)}{(p_0 + p_1 - 1)F(C - x) + 1 - p_0}. \tag{9}$$

    Let

    $$\frac{\partial}{\partial\theta}\ln L(s_1, s_2, \ldots, s_N) = 0. \tag{10}$$

    The solution of the equation is denoted as $\hat{\theta}_N = \hat{\theta}_N(s_1, s_2, \ldots, s_N)$, which is the maximum likelihood estimate of $\theta$.

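    Since Eq (10) is highly nonlinear, an explicit solution generally does not exist. Here, an approximate solution method is presented. Taking the logarithm of Eq (8), we have

    $$\begin{aligned} \ln L(\theta \mid s_1, s_2, \ldots, s_N) &= \sum_{k=1}^{N}\bigl[s_k \ln g_k + (1 - s_k)\ln(1 - g_k)\bigr] \\ &= \sum_{k=1}^{N} s_k \ln\bigl[F(C - \phi_k^T\theta)(1 - p_0 - p_1) + p_0\bigr] + \sum_{k=1}^{N}(1 - s_k)\ln\bigl[F(C - \phi_k^T\theta)(p_0 + p_1 - 1) + 1 - p_0\bigr]. \end{aligned} \tag{11}$$

    Taking the partial derivative of the above equation with respect to $\theta$, we get

    $$\frac{\partial}{\partial\theta}\ln L(\theta \mid s_1, s_2, \ldots, s_N) = \sum_{k=1}^{N}\frac{s_k(1 - p_0 - p_1)f(C - \phi_k^T\theta)(-\phi_k^T)}{(1 - p_0 - p_1)F(C - \phi_k^T\theta) + p_0} + \sum_{k=1}^{N}\frac{(s_k - 1)(1 - p_0 - p_1)f(C - \phi_k^T\theta)(-\phi_k^T)}{(p_0 + p_1 - 1)F(C - \phi_k^T\theta) + 1 - p_0}. \tag{12}$$

    By Eq (9), Eq (12) can be expressed as:

    $$\frac{\partial}{\partial\theta}\ln L(\theta) = \sum_{k=1}^{N} s_k h_1(\phi_k^T\theta)(-\phi_k^T) + \sum_{k=1}^{N}(s_k - 1)h_2(\phi_k^T\theta)(-\phi_k^T). \tag{13}$$

    Since $\phi_k^T$ can take the values $\pi_1, \pi_2, \ldots, \pi_l$, grouping the above expression based on these values, we can rearrange it as:

    $$\begin{aligned} \frac{\partial}{\partial\theta}\ln L(\theta) = {} & \sum_{\substack{k=1 \\ \phi_k^T = \pi_1}}^{N} s_k h_1(\pi_1\theta)(-\pi_1) + \sum_{\substack{k=1 \\ \phi_k^T = \pi_1}}^{N}(s_k - 1)h_2(\pi_1\theta)(-\pi_1) \\ & + \sum_{\substack{k=1 \\ \phi_k^T = \pi_2}}^{N} s_k h_1(\pi_2\theta)(-\pi_2) + \sum_{\substack{k=1 \\ \phi_k^T = \pi_2}}^{N}(s_k - 1)h_2(\pi_2\theta)(-\pi_2) \\ & + \cdots + \sum_{\substack{k=1 \\ \phi_k^T = \pi_l}}^{N} s_k h_1(\pi_l\theta)(-\pi_l) + \sum_{\substack{k=1 \\ \phi_k^T = \pi_l}}^{N}(s_k - 1)h_2(\pi_l\theta)(-\pi_l). \end{aligned} \tag{14}$$

    In Eq (14), if we have

    $$\sum_{\substack{k=1 \\ \phi_k^T = \pi_i}}^{N} s_k h_1(\pi_i\theta) + \sum_{\substack{k=1 \\ \phi_k^T = \pi_i}}^{N}(s_k - 1)h_2(\pi_i\theta) = 0, \quad i = 1, 2, \ldots, l, \tag{15}$$

    then $\frac{\partial}{\partial\theta}\ln L(\theta) = 0$. Thus, the problem of solving the equation $\frac{\partial}{\partial\theta}\ln L(\theta) = 0$ is reduced to solving the system of Eq (15).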

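    By rearranging (15), we get

    $$\bigl[h_1(\pi_i\theta) + h_2(\pi_i\theta)\bigr]\sum_{\substack{k=1 \\ \phi_k^T = \pi_i}}^{N} s_k = h_2(\pi_i\theta)\sum_{\substack{k=1 \\ \phi_k^T = \pi_i}}^{N} 1.$$

    As a result, we obtain

    $$\frac{\sum_{k=1}^{N} s_k I_{\{\phi_k^T = \pi_i\}}}{\sum_{k=1}^{N} I_{\{\phi_k^T = \pi_i\}}} = \frac{h_2(\pi_i\theta)}{h_1(\pi_i\theta) + h_2(\pi_i\theta)} = H(\pi_i\theta), \tag{16}$$

    where $H(x) = \frac{h_2(x)}{h_1(x) + h_2(x)}$, and $h_1(x)$ and $h_2(x)$ are given by (9). Let the inverse function of $H(x)$ be $H^{-1}(x)$; we have

    $$H(x) = (1 - p_0 - p_1)F(C - x) + p_0 \;\Rightarrow\; H(x) - p_0 = (1 - p_0 - p_1)F(C - x) \;\Rightarrow\; \frac{H(x) - p_0}{1 - p_0 - p_1} = F(C - x) \tag{17}$$
    $$\Rightarrow\; F^{-1}\left(\frac{H(x) - p_0}{1 - p_0 - p_1}\right) = C - x. \tag{18}$$

    Therefore, from (16), we have

    $$\pi_i\theta = C - F^{-1}\left(\frac{\frac{\sum_{k=1}^{N} s_k I_{\{\phi_k^T = \pi_i\}}}{\sum_{k=1}^{N} I_{\{\phi_k^T = \pi_i\}}} - p_0}{1 - p_0 - p_1}\right), \quad i = 1, 2, \ldots, l.$$

    Expressing the above equation set in matrix form, we have

    $$\begin{bmatrix} \pi_1 \\ \vdots \\ \pi_l \end{bmatrix}\theta = \begin{bmatrix} C - F^{-1}\left(\dfrac{\frac{\sum_{k=1}^{N} s_k I_{\{\phi_k^T = \pi_1\}}}{\sum_{k=1}^{N} I_{\{\phi_k^T = \pi_1\}}} - p_0}{1 - p_0 - p_1}\right) \\ \vdots \\ C - F^{-1}\left(\dfrac{\frac{\sum_{k=1}^{N} s_k I_{\{\phi_k^T = \pi_l\}}}{\sum_{k=1}^{N} I_{\{\phi_k^T = \pi_l\}}} - p_0}{1 - p_0 - p_1}\right) \end{bmatrix}. \tag{19}$$

    Let $\Phi = [\pi_1^T, \pi_2^T, \ldots, \pi_l^T]^T$ and $\eta_{N,i} = C - F^{-1}\left(\dfrac{\frac{\sum_{k=1}^{N} s_k I_{\{\phi_k^T = \pi_i\}}}{\sum_{k=1}^{N} I_{\{\phi_k^T = \pi_i\}}} - p_0}{1 - p_0 - p_1}\right)$, $i = 1, 2, \ldots, l$. The maximum likelihood estimate of $\theta$ is obtained as:

    $$\hat{\theta}_N = \Phi^{+}[\eta_{N,1}, \ldots, \eta_{N,l}]^T, \tag{20}$$

    where the superscript $+$ denotes the Moore-Penrose inverse of the matrix.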
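    A minimal sketch of the estimator in Eq (20) is given below, assuming Gaussian noise as in Assumption 1. The names `solve_linear` and `estimate_theta` are hypothetical. Two implementation choices are assumptions of this sketch rather than part of the derivation: the Moore-Penrose inverse is applied through the normal equations, which is equivalent only when $\Phi$ has full column rank, and the argument of $F^{-1}$ is clipped to the open interval (0, 1) because empirical frequencies from a finite sample can fall outside it.

```python
from statistics import NormalDist


def solve_linear(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def estimate_theta(patterns, freqs, C, sigma, p0, p1, eps=1e-9):
    """Eq (20): patterns[i] is pi_i, freqs[i] is the share of s_k = 1 among
    the samples whose regressor equals pi_i."""
    F_inv = NormalDist(0.0, sigma).inv_cdf
    eta = []
    for q in freqs:
        ratio = (q - p0) / (1.0 - p0 - p1)
        ratio = min(max(ratio, eps), 1.0 - eps)  # assumption: clip into (0, 1)
        eta.append(C - F_inv(ratio))
    n = len(patterns[0])
    # Normal equations (Phi^T Phi) theta = Phi^T eta.
    A = [[sum(p[a] * p[b] for p in patterns) for b in range(n)] for a in range(n)]
    rhs = [sum(p[a] * e for p, e in zip(patterns, eta)) for a in range(n)]
    return solve_linear(A, rhs)
```

    When the frequencies are replaced by their limits $p_0 + (1 - p_0 - p_1)F(C - \pi_i\theta)$, this routine returns $\theta$ up to rounding error, which mirrors the consistency argument of Theorem 1.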

    Remark 2. From the above, it can be seen that the distribution function of the system noise plays an important role in the algorithm design. For the unknown case, an estimation algorithm can be designed to estimate it, and then the estimated value can be used instead of the true value, so as to realize the adaptive identification of unknown parameters [29].

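    Theorem 1. Consider the system (1) and the binary observation (2) under the data tampering attack (3). If Assumption 1 holds, the matrix $\Phi$ generated by the system input is full rank, and $\sum_{k=1}^{N} I_{\{\phi_k^T = \pi_i\}} \to \infty$ as $N \to \infty$ for $i = 1, 2, \ldots, l$, then the maximum likelihood-based parameter estimate $\hat{\theta}_N$ given by (20) converges strongly to the true value $\theta$, i.e.,

    $$\hat{\theta}_N \to \theta, \quad N \to \infty, \quad \text{w.p.1}.$$

    Proof. By (5), it is known that

    $$E\bigl(s_k I_{\{\phi_k^T = \pi_i\}}\bigr) = p_0\bigl(1 - F(C - \pi_i\theta)\bigr) + (1 - p_1)F(C - \pi_i\theta) = p_0 + (1 - p_0 - p_1)F(C - \pi_i\theta).$$

    According to the Law of Large Numbers, for $i = 1, 2, \ldots, l$, we have

    $$\frac{\sum_{k=1}^{N} s_k I_{\{\phi_k^T = \pi_i\}}}{\sum_{k=1}^{N} I_{\{\phi_k^T = \pi_i\}}} \to p_0 + (1 - p_0 - p_1)F(C - \pi_i\theta), \quad N \to \infty, \tag{21}$$

    which implies that

    $$C - F^{-1}\left(\frac{\frac{\sum_{k=1}^{N} s_k I_{\{\phi_k^T = \pi_i\}}}{\sum_{k=1}^{N} I_{\{\phi_k^T = \pi_i\}}} - p_0}{1 - p_0 - p_1}\right) \to \pi_i\theta, \quad N \to \infty. \tag{22}$$

    Since $\Phi$ is full rank, from (19) and (20), the theorem is proved.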

    In the previous section, an identification algorithm with unknown parameters was designed under the assumption of known attack strategies. In the case of unknown attack strategies, the design of the identification algorithm becomes more difficult. This is mainly because the unknown parameters and attack strategies are coupled together. This section primarily addresses this problem.

    Using the maximum likelihood function, Eq (11) is used to obtain the maximum likelihood estimates of $\theta$, $p_0$, and $p_1$, which results in a system of $n + 2$ equations involving $\theta$, $p_0$, and $p_1$. Solving this system of equations yields the estimates $\hat{\theta}$, $\hat{p}_0$, and $\hat{p}_1$. However, solving a system of $n + 2$ equations numerically will be challenging and time-consuming. The solution process is illustrated in Figure 2 and is divided into the three steps below.

    Figure 2.  Solution steps.

    Step 1: Construction of system of equations for $\theta$ and $(p_0, p_1)$

    As in the analysis for known attack strategies, the likelihood function $L(\theta \mid s_1, s_2, \ldots, s_N)$ is first log-transformed and then differentiated. From Eqs (16) and (17), it can be determined that the problem of solving the extremum of the maximum likelihood function is equivalent to the problem of solving the following system of equations:

    $$\begin{cases} F(C - \pi_1\theta) = \dfrac{\frac{\sum_{k=1}^{N} s_k I_{\{\phi_k^T = \pi_1\}}}{\sum_{k=1}^{N} I_{\{\phi_k^T = \pi_1\}}} - p_0}{1 - p_0 - p_1}; \\ F(C - \pi_2\theta) = \dfrac{\frac{\sum_{k=1}^{N} s_k I_{\{\phi_k^T = \pi_2\}}}{\sum_{k=1}^{N} I_{\{\phi_k^T = \pi_2\}}} - p_0}{1 - p_0 - p_1}; \\ \qquad\vdots \\ F(C - \pi_l\theta) = \dfrac{\frac{\sum_{k=1}^{N} s_k I_{\{\phi_k^T = \pi_l\}}}{\sum_{k=1}^{N} I_{\{\phi_k^T = \pi_l\}}} - p_0}{1 - p_0 - p_1}. \end{cases} \tag{23}$$

    If each $\pi_i\theta$ is treated as an unknown variable, then the above system of equations has $l + 2$ unknowns but only $l$ equations. Therefore, it is generally unsolvable. To address this, the linear dependence among the $\pi_i$ is utilized, which leads to the second step.

    Step 2: Solve the system of equations for $\theta$ and $(p_0, p_1)$, and obtain the estimated values of the attack strategies $(\hat{p}_{N,0}, \hat{p}_{N,1})$

    Let $l_0$ denote the number of vectors in a maximal linearly independent subset of $\pi_1, \pi_2, \ldots, \pi_l$. Without loss of generality, assume that $\pi_1, \pi_2, \ldots, \pi_{l_0}$ form a maximal linearly independent set of $\pi_1, \pi_2, \ldots, \pi_l$. Then, each $\pi_i$ can be expressed as a linear combination of $\pi_1, \pi_2, \ldots, \pi_{l_0}$, as follows:

    $$\pi_i = \sum_{j=1}^{l_0}\alpha_{i,j}\pi_j, \quad i = 1, 2, \ldots, l. \tag{24}$$
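    As a small worked instance of the decomposition in (24), take the input levels $\{1, 3, 5\}$ and $n = 3$ used later in the simulation. The ordering of the patterns below is a hypothetical choice made only for illustration; the source does not fix one.

```latex
% Hypothetical ordering: the first three patterns are chosen to be independent.
\pi_1 = [1, 1, 1], \quad \pi_2 = [3, 1, 1], \quad \pi_3 = [1, 3, 1], \qquad
\det\begin{bmatrix} 1 & 1 & 1 \\ 3 & 1 & 1 \\ 1 & 3 & 1 \end{bmatrix} = 4 \neq 0
\;\Rightarrow\; l_0 = 3.
% Any other pattern is then a combination of these three, for example
\pi_4 = [1, 1, 3] = 5\pi_1 - \pi_2 - \pi_3,
\qquad (\alpha_{4,1}, \alpha_{4,2}, \alpha_{4,3}) = (5, -1, -1).
```

    With this choice, the $l = 27$ products $\pi_i\theta$ reduce to combinations of only $l_0 = 3$ unknowns, which is what makes the system in Step 2 overdetermined rather than underdetermined.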

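    Substituting the above equation into Eq (23), we get

    $$\begin{cases} F\Bigl(C - \sum_{j=1}^{l_0}\alpha_{1,j}\pi_j\theta\Bigr) = \dfrac{\frac{\sum_{k=1}^{N} s_k I_{\{\phi_k^T = \pi_1\}}}{\sum_{k=1}^{N} I_{\{\phi_k^T = \pi_1\}}} - p_0}{1 - p_0 - p_1}; \\ F\Bigl(C - \sum_{j=1}^{l_0}\alpha_{2,j}\pi_j\theta\Bigr) = \dfrac{\frac{\sum_{k=1}^{N} s_k I_{\{\phi_k^T = \pi_2\}}}{\sum_{k=1}^{N} I_{\{\phi_k^T = \pi_2\}}} - p_0}{1 - p_0 - p_1}; \\ \qquad\vdots \\ F\Bigl(C - \sum_{j=1}^{l_0}\alpha_{l,j}\pi_j\theta\Bigr) = \dfrac{\frac{\sum_{k=1}^{N} s_k I_{\{\phi_k^T = \pi_l\}}}{\sum_{k=1}^{N} I_{\{\phi_k^T = \pi_l\}}} - p_0}{1 - p_0 - p_1}. \end{cases} \tag{25}$$

    The above system of equations consists of $l$ equations, and the number of unknowns is reduced to $l_0 + 2$. Denote the solution of the above system by $g(s_1, s_2, \ldots, s_N) = (g_1, g_2, \ldots, g_{l_0+2})$, which gives the estimated values of $\pi_1\theta, \pi_2\theta, \ldots, \pi_{l_0}\theta$, and $(p_0, p_1)$ as follows:

    $$\widehat{\pi_1\theta}_N = g_1(s_1, s_2, \ldots, s_N); \tag{26}$$
    $$\vdots \tag{27}$$
    $$\widehat{\pi_{l_0}\theta}_N = g_{l_0}(s_1, s_2, \ldots, s_N); \tag{28}$$
    $$\hat{p}_{N,0} = g_{l_0+1}(s_1, s_2, \ldots, s_N); \tag{29}$$
    $$\hat{p}_{N,1} = g_{l_0+2}(s_1, s_2, \ldots, s_N). \tag{30}$$

    Equations (29) and (30) provide the estimated values of the attack strategies.

    The most critical part is how to obtain the expression of $g(s_1, s_2, \ldots, s_N)$, which can be divided into two cases. One case is when the system of Eq (25) has an analytical solution, in which case the expression of $g(\cdot)$ can be obtained through mathematical operations. The other case is when (25) does not have an analytical solution, in which case the expression of $g(\cdot)$ can be approximated using numerical methods and neural networks. The process is as follows.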

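    Consider the following system of equations:

    $$\begin{cases} F\Bigl(C - \sum_{j=1}^{l_0}\alpha_{1,j}x_j\Bigr) = \dfrac{\beta_1 - x_{l_0+1}}{1 - x_{l_0+1} - x_{l_0+2}}; \\ F\Bigl(C - \sum_{j=1}^{l_0}\alpha_{2,j}x_j\Bigr) = \dfrac{\beta_2 - x_{l_0+1}}{1 - x_{l_0+1} - x_{l_0+2}}; \\ \qquad\vdots \\ F\Bigl(C - \sum_{j=1}^{l_0}\alpha_{l,j}x_j\Bigr) = \dfrac{\beta_l - x_{l_0+1}}{1 - x_{l_0+1} - x_{l_0+2}}, \end{cases} \tag{31}$$

    where $X = [x_1, x_2, \ldots, x_{l_0+2}]^T \in \mathbb{R}^{l_0+2}$ is the unknown variable, and $\beta_1, \beta_2, \ldots, \beta_l$ are known parameters.

    Given a step size $\Delta \in (0, 1)$, we uniformly sample the interval $[0, 1]$ to obtain a set

    $$\Gamma = \left\{(j - 1)\Delta : j = 1, 2, \ldots, \left\lceil \frac{1}{\Delta} \right\rceil\right\}, \tag{32}$$

    where $\lceil\cdot\rceil$ denotes the ceiling function. We take an arbitrary element $\beta = [\beta_1, \beta_2, \ldots, \beta_l]$ from $\Gamma^l$ and substitute it into Eq (31). Then, we solve Eq (31) using the Newton-Raphson iteration method to obtain the solution $X = X(\beta)$. Specifically, let $\varpi_i(x_1, x_2, \ldots, x_{l_0+2}) = F\Bigl(C - \sum_{j=1}^{l_0}\alpha_{i,j}x_j\Bigr) - \dfrac{\beta_i - x_{l_0+1}}{1 - x_{l_0+1} - x_{l_0+2}}$, $i = 1, 2, \ldots, l$. Then, the system of Eq (31) can be equivalently written as:

    $$\Omega(X) = \begin{bmatrix} \varpi_1(x_1, x_2, \ldots, x_{l_0+2}) \\ \varpi_2(x_1, x_2, \ldots, x_{l_0+2}) \\ \vdots \\ \varpi_l(x_1, x_2, \ldots, x_{l_0+2}) \end{bmatrix} = 0. \tag{33}$$

    The Jacobian matrix of the above equations is given by:

    $$J(X) = \begin{bmatrix} \frac{\partial\varpi_1}{\partial x_1} & \cdots & \frac{\partial\varpi_1}{\partial x_{l_0+2}} \\ \vdots & \ddots & \vdots \\ \frac{\partial\varpi_l}{\partial x_1} & \cdots & \frac{\partial\varpi_l}{\partial x_{l_0+2}} \end{bmatrix} = \begin{bmatrix} -\alpha_{1,1}f\Bigl(C - \sum_{j=1}^{l_0}\alpha_{1,j}x_j\Bigr) & \cdots & \dfrac{1 - \beta_1 - x_{l_0+2}}{(1 - x_{l_0+1} - x_{l_0+2})^2} & \dfrac{-\beta_1 + x_{l_0+1}}{(1 - x_{l_0+1} - x_{l_0+2})^2} \\ \vdots & \ddots & \vdots & \vdots \\ -\alpha_{l,1}f\Bigl(C - \sum_{j=1}^{l_0}\alpha_{l,j}x_j\Bigr) & \cdots & \dfrac{1 - \beta_l - x_{l_0+2}}{(1 - x_{l_0+1} - x_{l_0+2})^2} & \dfrac{-\beta_l + x_{l_0+1}}{(1 - x_{l_0+1} - x_{l_0+2})^2} \end{bmatrix}. \tag{34}$$

    Given an initial value $X_0$, let $X_t = [x_1^{\{t\}}, x_2^{\{t\}}, \ldots, x_{l_0+2}^{\{t\}}]^T$ denote the approximate solution of Eq (33) obtained at the $t$-th iteration. Then,

    $$X_t = X_{t-1} - J^{-1}(X_{t-1})\Omega(X_{t-1}). \tag{35}$$

    Repeat the above process and iterate until $\|X_t - X_{t-1}\| < \varepsilon$ is satisfied, at which point $X(\beta) = X_t$ is taken as the solution to the system of Eq (31). Here $\|\cdot\|$ denotes the norm of a vector, and $\varepsilon > 0$ is a given constant called the iteration stopping tolerance.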
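    The iteration in (33)–(35) can be sketched as follows for the special case $l = l_0 + 2$, where the Jacobian is square and can be inverted directly. This restriction is an assumption of the sketch: when $l > l_0 + 2$ the Jacobian is not square, and a least-squares step would have to stand in for $J^{-1}$. The names `solve_linear` and `newton_solve` are hypothetical, the noise is taken to be zero-mean Gaussian, and the maximum norm is used for the stopping rule.

```python
from statistics import NormalDist


def solve_linear(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def newton_solve(alpha, beta, C, sigma, x0, tol=1e-10, max_iter=50):
    """Newton-Raphson for Eq (33); alpha[i] holds alpha_{i,1..l0}."""
    nd = NormalDist(0.0, sigma)
    l0 = len(alpha[0])
    if len(alpha) != l0 + 2:
        raise ValueError("this sketch needs l = l0 + 2 (square Jacobian)")
    X = list(x0)
    for _ in range(max_iter):
        a, b = X[l0], X[l0 + 1]  # current guesses for p0 and p1
        D = 1.0 - a - b
        omega, J = [], []
        for al, be in zip(alpha, beta):
            arg = C - sum(c * x for c, x in zip(al, X[:l0]))
            omega.append(nd.cdf(arg) - (be - a) / D)
            J.append([-c * nd.pdf(arg) for c in al]
                     + [(1.0 - be - b) / D ** 2, (a - be) / D ** 2])
        step = solve_linear(J, [-w for w in omega])  # J * step = -Omega
        X = [x + s for x, s in zip(X, step)]
        if max(abs(s) for s in step) < tol:
            return X
    raise RuntimeError("Newton-Raphson did not converge")
```

    Like any Newton-type method, this sketch is only locally convergent, so the choice of the initial value $X_0$ matters in practice.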

    Repeat the above process, letting $\beta$ traverse $\Gamma^l$ and obtaining the solution $X = X(\beta)$ of Eq (31) for each $\beta$. This way, a set of data $\{\beta, X(\beta) : \beta \in \Gamma^l\}$ is obtained. Treat $\beta$ as the input and $X(\beta)$ as the output of a BPNN*, and train the neural network as $g_0(\beta_1, \beta_2, \ldots, \beta_l)$. As a result, $g(s_1, s_2, \ldots, s_N)$ can be computed as follows:

    *Here we choose BPNN as the fitting algorithm, mainly to show our thinking and method. In specific use, one can also choose other regression algorithms, such as random forest.

    $$g(s_1, s_2, \ldots, s_N) = g_0\left(\frac{\sum_{k=1}^{N} s_k I_{\{\phi_k^T = \pi_1\}}}{\sum_{k=1}^{N} I_{\{\phi_k^T = \pi_1\}}}, \ldots, \frac{\sum_{k=1}^{N} s_k I_{\{\phi_k^T = \pi_l\}}}{\sum_{k=1}^{N} I_{\{\phi_k^T = \pi_l\}}}\right). \tag{36}$$

    The above process can be summarized into the following algorithm:

    Algorithm 1 Algorithm for computing function $g(\cdot)$
    1. Initial values: sample step size $\Delta \in (0, 1)$; set $D = \emptyset$; iteration tolerance $\varepsilon > 0$ for stopping Newton-Raphson iteration.
    2. Set $\Gamma$ based on (32), then obtain set $\Gamma^l$ with $m$ elements denoted as $\beta^1, \beta^2, \ldots, \beta^m$
    3. Loop: $i = 1, 2, \ldots, m$
      3.1. Substitute $\beta^i$ into system of Eq (31)
      3.2. Initialize $X_0$, and use Newton-Raphson iteration to solve (31), obtain solution $X^i = X^i(\beta^i)$
      3.3. Update data set $D = D \cup \{(\beta^i, X^i)\}$
    4. End loop
    5. Train the BPNN $g_0(\beta_1, \beta_2, \ldots, \beta_l)$ based on data set $D$
    6. Calculate $g(\cdot)$ based on (36)

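    Step 3: Obtain the estimated values of the unknown parameters $\hat{\theta}_N$

    Express the $\widehat{\pi_1\theta}_N, \widehat{\pi_2\theta}_N, \ldots, \widehat{\pi_{l_0}\theta}_N$ obtained in Step 2 in vector form. Based on Eqs (26)–(28), we have:

    $$\begin{bmatrix} \pi_1 \\ \vdots \\ \pi_{l_0} \end{bmatrix}\hat{\theta}_N = \begin{bmatrix} g_1(s_1, s_2, \ldots, s_N) \\ \vdots \\ g_{l_0}(s_1, s_2, \ldots, s_N) \end{bmatrix}.$$

    Letting $[\pi_1^T, \ldots, \pi_{l_0}^T]^T = \bar{\Phi}$, we can obtain the estimate of $\theta$ as:

    $$\hat{\theta}_N = \bar{\Phi}^{+}\begin{bmatrix} g_1(s_1, s_2, \ldots, s_N) \\ \vdots \\ g_{l_0}(s_1, s_2, \ldots, s_N) \end{bmatrix}. \tag{37}$$

    In the above equation, the expressions of $g_1, \ldots, g_{l_0}$ may contain the attack strategies $(p_0, p_1)$ as parameters. In this case, replace them with their estimated values from (29) and (30).

    Theorem 2. Under the condition of Theorem 1, if the matrix $\bar{\Phi}$ generated by the system input is full rank and the function $g(\cdot)$ given by Algorithm 1 is the solution to Eq (33), then the maximum likelihood-based parameter estimate $\hat{\theta}_N$ given by (37) converges strongly to the true value $\theta$, i.e.,

    $$\hat{\theta}_N \to \theta, \quad N \to \infty, \quad \text{w.p.1}.$$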

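    Proof. According to the conditions of the theorem and by (23) and (25), it is known that the solution to (31) is

    $$X = X\bigl([p_0 + (1 - p_0 - p_1)F(C - \pi_1\theta), \ldots, p_0 + (1 - p_0 - p_1)F(C - \pi_l\theta)]^T\bigr) = [\pi_1\theta, \ldots, \pi_{l_0}\theta]^T. \tag{38}$$

    From (21), we have

    $$g_0\left(\frac{\sum_{k=1}^{N} s_k I_{\{\phi_k^T = \pi_1\}}}{\sum_{k=1}^{N} I_{\{\phi_k^T = \pi_1\}}}, \ldots, \frac{\sum_{k=1}^{N} s_k I_{\{\phi_k^T = \pi_l\}}}{\sum_{k=1}^{N} I_{\{\phi_k^T = \pi_l\}}}\right) \to g_0\bigl(p_0 + (1 - p_0 - p_1)F(C - \pi_1\theta), \ldots, p_0 + (1 - p_0 - p_1)F(C - \pi_l\theta)\bigr), \quad N \to \infty,$$

    which together with (36) gives

    $$g(s_1, \ldots, s_N) \to g_0\bigl(p_0 + (1 - p_0 - p_1)F(C - \pi_1\theta), \ldots, p_0 + (1 - p_0 - p_1)F(C - \pi_l\theta)\bigr)$$

    as $N \to \infty$. Combining the above and (38), by (37), it can be seen that

    $$\hat{\theta}_N = \bar{\Phi}^{+}\begin{bmatrix} g_1(s_1, s_2, \ldots, s_N) \\ \vdots \\ g_{l_0}(s_1, s_2, \ldots, s_N) \end{bmatrix} \to \bar{\Phi}^{+}\begin{bmatrix} \pi_1\theta \\ \vdots \\ \pi_{l_0}\theta \end{bmatrix}, \quad N \to \infty. \tag{39}$$

    Considering that $\bar{\Phi}$ is full rank, the proof is completed.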

    Consider the system

    $$\begin{cases} y_k = \phi_k^T\theta + d_k; \\ s_k^0 = I_{\{y_k \le C\}}; \end{cases}$$

    with the system parameters $\theta = [a_1, \ldots, a_n]^T = [2, 4, 8]^T$ and the system input $u_k \in \{1, 3, 5\}$; the threshold for the binary sensor output is $C = 30$; and the system noise is an independent and identically distributed sequence of normal random variables, $d_k \sim N(0, 40^2)$. The system output is transmitted to the data center through a communication network and is subjected to data tampering attacks with attack strategy $(p_0, p_1) = (0.4, 0.2)$.

    The data center receives the tampered data sk after the original data s0k has been attacked. The relationship between the data is shown in Figure 3. The original data s0k has been randomly altered.

    Figure 3.  Comparison between original data and data after random attacks.

    Experiments are conducted on algorithms (10)–(22) to compute the estimated system parameter values $\hat{\theta}_N$, where the length of the data sample is $N = 80{,}000$. The results are shown in Figure 4, which indicate that: when the data sample size $N$ is small, the estimated parameter values $\hat{\theta}_N$ deviate substantially from the true values; when the data sample size $N$ exceeds a critical value, the estimated parameter values $\hat{\theta}_N$ are close to the true values $\theta$; and as the data sample size $N$ increases further, the deviation between the estimated parameter values $\hat{\theta}_N$ and the true values decreases.

    Figure 4.  Estimation of system parameters when the attack strategy is known.

    Experiments are conducted on algorithms (23)–(39) and repeated $T = 150$ times, and the results are averaged over the experiments to obtain $\hat{\theta}_N$, $\hat{p}_0$, and $\hat{p}_1$, where the length of the data sample is $N = 60{,}000$. The results are shown in Figures 5 and 6, which indicate that, as the data sample size $N$ increases, the estimated system parameter values $\hat{\theta}_N$ approach the true values, and the estimated attack strategy values $\hat{p}_0$ and $\hat{p}_1$ also approach the true values. Moreover, due to the large number of experiments $T$, the convergence of each parameter improves as the data sample size $N$ increases.

    Figure 5.  Estimation of system parameters when the attack strategy is unknown.
    Figure 6.  Estimation of the attack strategy when the attack strategy is unknown.

    Within the framework of system identification, this paper studies a security problem using the maximum likelihood estimation method. For FIR systems with binary observations and data tampering attacks, parameter estimation algorithms are proposed for both known and unknown attack strategies, and the convergence conditions and convergence proofs of these algorithms are given.

    Maximum likelihood estimation is a classical and effective method. This paper explores its application in CPS security identification. In the future, this method can be extended to nonlinear systems, multi-threshold observations, colored noise, and other more general cases.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This research was supported in part by the National Natural Science Foundation of China (62173030) and in part by the Beijing Natural Science Foundation (4222050).

    The authors declare there is no conflict of interest.


    Acknowledgments



    This work was supported by Fundação para a Ciência e a Tecnologia (FCT) (Grant numbers: UID/Multi/04546/2013 and UID/Multi/04546/2016).
    The authors wish to thank all students and staff that volunteered for sample collection.

    Conflict of interest



    All authors declare no conflicts of interest in this paper.

    [1] McClure HM, Chiodini RJ, Anderson DC, et al. (1987) Mycobacterium paratuberculosis infection in a colony of stumptail macaques (Macaca arctoides). The J Infect Dis 155: 1011-1019. doi: 10.1093/infdis/155.5.1011
    [2] Judge J, Kyriazakis I, Greig A, et al. (2005) Clustering of Mycobacterium avium subsp. paratuberculosis in rabbits and the environment: how hot is a hot spot? Appl Environ Microb 71: 6033-6038. doi: 10.1128/AEM.71.10.6033-6038.2005
    [3] Good M, Clegg T, Sheridan H, et al. (2009) Prevalence and distribution of paratuberculosis (Johne's disease) in cattle herds in Ireland. Irish Vet J 62: 597-606. doi: 10.1186/2046-0481-62-9-597
    [4] Nielsen SS, Toft N (2009) A review of prevalences of paratuberculosis in farmed animals in Europe. Prev Vet Med 88: 1-14. doi: 10.1016/j.prevetmed.2008.07.003
    [5] Manning EJB, Collins MT (2001) Mycobacterium avium subsp. paratuberculosis: pathogen, pathogenesis and diagnosis. Rev Sci Tech Off Int Epiz 20: 133-150. doi: 10.20506/rst.20.1.1275
    [6] Waddell LA, Rajic A, Stark KDC, et al. (2015) The zoonotic potencial of Mycobacterium avium ssp. paratuberculosis: a systematic review and meta-analysis of the evidence. Epidemiol Infect 143: 3135-3157. doi: 10.1017/S095026881500076X
    [7] Zamani S, Zali MR, Aghdaei HA, et al. (2016) Mycobacterium avium subsp. paratuberculosis and associated risk factors for inflammatory bowel disease in Iranian patients. Gut Pathog 9: 1-10. doi: 10.1186/s13099-016-0151-z
    [8] Kuenstner JT, Potula R, Bull TJ, et al. (2020) Presence of infection by Mycobacterium avium subsp. paratuberculosis in the blood of patients with Crohn's disease and control subjects shown by multiple laboratory culture and antibody methods. Microorg 8: 2054. doi: 10.3390/microorganisms8122054
    [9] Annese V (2020) Genetics and epigenetics of IBD. Pharmacol Res 159: 104892. doi: 10.1016/j.phrs.2020.104892
    [10] Turpin W, Goethel A, Bedrani L, et al. (2018) Determinants of IBD heritability: genes, bugs, and more. Inflamm Bowel Dis 24: 1133-1148. doi: 10.1093/ibd/izy085
    [11] Alam MT, Amos GCA, Murphy ARJ, et al. (2020) Microbial imbalance in inflammatory bowel disease patients at different taxonomic levels. Gut Pathog 12: 1-8. doi: 10.1186/s13099-019-0341-6
    [12] Iacob DG (2019) Infectious threats, the intestinal barrier, and its trojan horse: dysbiosis. Front Microbiol 10: 1676. doi: 10.3389/fmicb.2019.01676
    [13] Lee A, Griffiths TA, Parab RS, et al. (2011) Association of Mycobacterium avium subspecies paratuberculosis with Crohn disease in pediatric patients. J Pediatr Gastroenterol Nutr 52: 170-174. doi: 10.1097/MPG.0b013e3181ef37ba
    [14] Renouf MJ, Cho YH, McPhee JB (2018) Emergent behavior of IBD-associated Escherichia coli during disease. Inflammatory Bowel Dis 387: 96-12.
    [15] Agrawal G, Clancy A, Huynh R, et al. (2020) Profound remission in Crohn's disease requiring no further treatment for 3–23 years: a case series. Gut Pathog 12: 16. doi: 10.1186/s13099-020-00355-8
    [16] Waddell L, Rajić A, Stärk K, et al. (2016) Mycobacterium avium ssp. paratuberculosis detection in animals, food, water and other sources or vehicles of human exposure: A scoping review of the existing evidence. Prev Vet Med 132: 32-48. doi: 10.1016/j.prevetmed.2016.08.003
    [17] Pierce ES (2018) Where are all the Mycobacterium avium subspecies paratuberculosis in patients with Crohn's disease? Plos Pathog 5: e1000234. doi: 10.1371/journal.ppat.1000234
    [18] Falkinham JO (2003) Mycobacterial aerosols and respiratory disease. Emerg Infect Dis 9: 763-767. doi: 10.3201/eid0907.020415
    [19] Angenent LT, Kelley ST, Amand ASt, et al. (2005) Molecular identification of potential pathogens in water and air of a hospital therapy pool. P Natl Acad Sci Usa 102: 4860-4865. doi: 10.1073/pnas.0501235102
    [20] Falkinham J (2018) Mycobacterium avium complex: adherence as a way of life. AIMS Microbiol 4: 428-438. doi: 10.3934/microbiol.2018.3.428
    [21] Pickup RW, Rhodes G, Bull TJ, et al. (2006) Mycobacterium avium subsp. paratuberculosis in lake catchments, in river water abstracted for domestic use, and in effluent from domestic sewage treatment works: diverse opportunities for environmental cycling and human exposure. Appl Environ Microb 72: 4067-4077. doi: 10.1128/AEM.02490-05
    [22] Samba-Louaka A, Robino E, Cochard T, et al. (2018) Environmental Mycobacterium avium subsp. paratuberculosis hosted by free-living amoebae. Front Cell Infect Microbiol 8: 28. doi: 10.3389/fcimb.2018.00028
    [23] Azevedo LF, Magro F, Portela F, et al. (2010) Estimating the prevalence of inflammatory bowel disease in Portugal using a pharmaco-epidemiological approach. Pharmacoepidemiol Drug Saf 19: 499-510. doi: 10.1002/pds.1930
    [24] Nazareth N, Magro F, Machado E, et al. (2015) Prevalence of Mycobacterium avium subsp. paratuberculosis and Escherichia coli in blood samples from patients with inflammatory bowel disease. Med Microbiol Immunol 204: 681-692. doi: 10.1007/s00430-015-0420-3
    [25] Pickup RW, Rhodes G, Arnott S, et al. (2005) Mycobacterium avium subsp. paratuberculosis in the catchment area and water of the River Taff in South Wales, United Kingdom, and its potential relationship to clustering of Crohn's disease cases in the city of Cardiff. Appl Environ Microbiol 71: 2130-2139. doi: 10.1128/AEM.71.4.2130-2139.2005
    [26] Rhodes G, Henrys P, Thomson BC, et al. (2013) Mycobacterium avium subspecies paratuberculosis is widely distributed in British soils and waters: implications for animal and human health. Environ Microbiol 15: 2761-2774.
    [27] Cunha MV, Rosalino LM, Leão C, et al. (2020) Ecological drivers of Mycobacterium avium subsp. paratuberculosis detection in mongoose (Herpestes ichneumon) using IS900 as proxy. Sci Rep-uk 10: 860. doi: 10.1038/s41598-020-57679-3
    [28] Beumer A, King D, Donohue M, et al. (2010) Detection of Mycobacterium avium subsp. paratuberculosis in drinking water and biofilms by quantitative PCR. Appl Environl Microbiol 76: 7367-7370. doi: 10.1128/AEM.00730-10
    [29] Bull TJ, McMinn EJ, Sidi-Boumedine K, et al. (2003) Detection and verification of Mycobacterium avium subsp. paratuberculosis in fresh ileocolonic mucosal biopsy specimens from individuals with and without Crohn's disease. J Clin Microbiol 41: 2915-2923. doi: 10.1128/JCM.41.7.2915-2923.2003
    [30] Aboagye G, Rowe MT (2011) Occurrence of Mycobacterium avium subsp. paratuberculosis in raw water and water treatment operations for the production of potable water. Water Res 45: 3271-3278. doi: 10.1016/j.watres.2011.03.029
    [31] Taylor RH, Falkinham JO, Norton CD, et al. (2000) Chlorine, chloramine, chlorine dioxide, and ozone susceptibility of Mycobacterium aviumAppl Environ Microb 66: 1702-1705. doi: 10.1128/AEM.66.4.1702-1705.2000
    [32] Esteban J, García-Coca M (2018) Mycobacterium biofilms. Front Microbiol 8: 414-418. doi: 10.3389/fmicb.2017.02651
    [33] Pistone D, Marone P, Pajoro M, et al. (2012) Mycobacterium avium paratuberculosis in Italy: commensal or emerging human pathogen? Digest Liver Dis 44: 461-465. doi: 10.1016/j.dld.2011.12.022
    [34] Coelho AC, Pinto ML, Silva S, et al. (2007) Seroprevalence of ovine paratuberculosis infection in the Northeast of Portugal. Small Ruminant Res 71: 298-303. doi: 10.1016/j.smallrumres.2006.07.009
    [35] Leão C, Amaro A, Santos-Sanches I, et al. (2015) Paratuberculosis asymptomatic cattle as plillovers of Mycobacterium avium subsp. paratuberculosis: consequences for disease control. Rev Port Cien Vet 110: 69-73.
  • © 2021 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)