
Analytic delay distributions for a family of gene transcription models

  • Models intended to describe the time evolution of a gene network must somehow include transcription, the DNA-templated synthesis of RNA, and translation, the RNA-templated synthesis of proteins. In eukaryotes, the DNA template for transcription can be very long, often consisting of tens of thousands of nucleotides, and lengthy pauses may punctuate this process. Accordingly, transcription can last for many minutes, in some cases hours. There is a long history of introducing delays in gene expression models to take the transcription and translation times into account. Here we study a family of detailed transcription models that includes initiation, elongation, and termination reactions. We establish a framework for computing the distribution of transcription times, and work out these distributions for some typical cases. For elongation, a fixed delay is a good model provided elongation is fast compared to initiation and termination, and there are no sites where long pauses occur. The initiation and termination phases of the model then generate a nontrivial delay distribution, and elongation shifts this distribution by an amount corresponding to the elongation delay. When initiation and termination are relatively fast, the distribution of elongation times can be approximated by a Gaussian. A convolution of this Gaussian with the initiation and termination time distributions gives another analytic approximation to the transcription time distribution. If there are long pauses during elongation, because of the modularity of the family of models considered, the elongation phase can be partitioned into reactions generating a simple delay (elongation through regions where there are no long pauses), and reactions whose distribution of waiting times must be considered explicitly (initiation, termination, and motion through regions where long pauses are likely). In these cases, the distribution of transcription times again involves a nontrivial part and a shift due to fast elongation processes.

    Citation: S. Hossein Hosseini, Marc R. Roussel. Analytic delay distributions for a family of gene transcription models[J]. Mathematical Biosciences and Engineering, 2024, 21(6): 6225-6262. doi: 10.3934/mbe.2024273



    With the development of internet technology, there has been a tremendous increase in the volume and dimensionality of the information that is generated, exchanged, and processed. Almost every field now faces difficulty in handling large amounts of high-dimensional data. These data become targets for illegal activities that pose a severe threat to network security. Traditional security techniques such as antivirus software, firewalls, data encryption, and user identification act as the first line of defense, but alone they are not sufficient to secure the network. A second line of defense is therefore strongly recommended, and it can be provided by an Intrusion Detection System (IDS) [1]. Using these two lines of defense together enhances the overall network security. An IDS identifies intruders who are harmful to an organization. Its main goal is to monitor the network or system for abnormal patterns or traffic and to prevent unauthorized access [2]. However, the key problem is identifying unknown malicious traffic. Intrusion detection systems are broadly classified into two types based on the source of data. First, a Network Intrusion Detection System (NIDS), placed near network points, checks the network traffic passing through routers and gateways for intrusions and can detect attacks in real time. Its main limitation is that it can only monitor the traffic passing through the specific network nodes where it is deployed. Second, a Host Intrusion Detection System (HIDS) scans an individual host for suspicious activities such as an unwanted configuration change, deletion or modification of system files, or an unwanted sequence of system calls; if any of these activities occurs, it sends an alert to the administrator [3]. The main limitation of HIDS is that it cannot analyze network behavior. An IDS can use three different detection mechanisms to detect attacks, and each detection method is further classified accordingly. Figure 1 shows the different detection mechanisms used to detect intrusions.

    Figure 1.  Types of detection mechanism.

    Signature-based IDS uses a set of rules or predefined signatures to detect known attacks [3]. Misuse detection techniques are either knowledge based or based on Machine Learning (ML) methods. In a knowledge-based strategy, the network traffic stream or host audit data is analyzed and compared with a predefined set of rules. There are three ways of applying the knowledge-based approach: signature matching, rule-based expert systems, and state transition analysis [3]. Signature matching compares the incoming network traffic with predefined attack signatures. A rule-based expert system detects intrusions by comparing the traffic with predetermined rules. Finally, state transition analysis maintains a state transition model for each known suspicious activity. ML-based IDS instead learns a model that separates normal behavior from attack behavior; the model builds a representation of the known classes and uses supervised techniques such as SVM and decision trees to detect known attacks efficiently.

    The significant drawback of signature-based detection is that the signatures must be updated regularly, since new attacks whose signatures are not in the database cannot be detected. As a result, it generates more false alarms, and a large signature database must be maintained.

    Anomaly-based detection works on the hypothesis that novel, unknown intrusions can be detected as deviations or changes from normal behavior [3]. It comprises statistical techniques, Finite State Machine (FSM) methods, and machine learning techniques. The FSM approach generates a behavior model consisting of states, actions, and transitions. Semi-supervised and unsupervised ML techniques, such as clustering algorithms and one-class SVM, are mainly used for anomaly detection. Anomaly-based detection can detect both known and unknown attacks; its prime limitation is that it suffers from high false positive rates [3].

    A hybrid detection mechanism combines signature-based and anomaly-based detection to detect intrusions [3].

    Current IDSs detect known attacks accurately and precisely, but this leaves the system vulnerable to novel malicious attacks for which predefined signatures are not available [4].

    High false positives: current IDSs suffer from high false positive rates. A false positive is the incorrect classification of a normal event as a malicious event. One aim of an IDS is to minimize false positives as much as possible [4].

    High false negatives: current IDSs also suffer from high false negative rates. A false negative is the incorrect classification of a malicious event as a normal event. An IDS should reduce false negatives as much as possible [4].

    Data overload: millions of records are generated every day, depending on the company's size and the IDS tools used, which leads to data overload [4].

    Data stream mining deals with continuous, well-ordered sequences of data that arrive in a timely fashion [5]. Data streams are a constant flow of unbounded data arriving at high speed, and, in contrast to traditional databases, the data change with time [6]. The primary requirement in stream mining is to inspect each instance only once; the algorithm should therefore use a limited amount of memory and produce its outcome in a short amount of time. Data streams are of two types: static streams and evolving streams. Data that arrives in bulk and does not change with time forms a static stream, whereas data that arrives continuously and changes with time forms an evolving stream [6].

    The paper is organized as follows: Section 2 presents earlier work on various machine learning and feature selection techniques. Section 3 discusses the methodology used for preprocessing, feature selection, and classification. Section 4 describes the experimental setup, the dataset, and the performance evaluation metrics used. Section 5 presents the results and their comparison with state-of-the-art methods, and finally Section 6 summarizes the research work and its future scope.

    There has been considerable research on intrusion detection using machine learning, deep learning, and hybrid approaches. The related work (Table 1) presents the latest techniques used and their relevant advantages and disadvantages.

    Table 1.  Related works.
    Authors Algorithms Used Computational Methods Pros Cons
    X. Li et al. (2021) [7] CMPSO, ACO, KH, IKH, LNNLS-KH The NSL_KDD dataset is taken for intrusion detection. The proposed LNNLS-KH is compared with CMPSO, ACO, KH, and IKH algorithms and the best features are selected; then the KNN technique is applied for classification. Good convergence speed, lower false positive rate; LNNLS-KH gives an accuracy of 96.12%. For attacks with fewer samples, an adversarial learning method can be used to create similar attacks.
    X. K. Zhou et al. (2021) [8] AdaBoost, LSTM, CNN LSTM, VLSTM The authors proposed a variational long short-term memory technique that detects intrusion anomalies efficiently based on feature reconstruction. The loss function helps to reconstruct the hidden variable into meaningful form. Proposed VLSTM gives the accuracy of 89.5%. Imbalanced data is still a challenge in anomaly detection.
    T. H. Hai et al. (2020) [9] Novel architecture of storage tools and distributed log processing The Novel storage with HBase or Apache Spark enhances NIDS data processing. Processing time is reduced. Takes more query time.
    S. N. Mighan et al. (2020) [10] SVM, SAE-SVM Hybrid scheme that uses deep learning and machine learning method together (SAE-SVM) can detect intrusion attacks more precisely. SAE-SVM shows higher accuracy of 95.98%. Computational time taken is more.
    T. Vaiyapuri et al. (2020) [11] Stacked Autoencoder, Sparse Autoencoder, Denoising Autoencoder, Contractive Autoencoder, Convolution Autoencoder All the methods mentioned are compared with contractive autoencoder. The contractive autoencoder gives 87.98% intrusion detection accuracy. SAE 85.23%, SAAE 86.02% DAE 86.92% ContAE 87.98% CAE 81.07% Reduced reconstruction ability further needs to be improved
    C. F. Tang et al. (2020) [12] Stacked Autoencoder Deep Neural Network (SAE-DNN), SAAE-DNN SAAE selects the needed features from the intrusion dataset and initializes weight to DNN thus improves the intrusion detection accuracy. SAE-DNN gives 82.23% accuracy. The proposed SAAE-DNN shows higher accuracy of 87.74% when compared to SAE-DNN. The accuracy achieved can be improved further.
    A. D. Jadhav et al. (2019) [13] SVM, KNN, Decision Tree, Naïve Bayes Machine learning techniques are used to detect the attacks, and a distributed and parallel approach is proposed, which enhances the efficiency of detecting intrusions. Faster detection of intrusion. Not applied in a real-time environment.
    A. Muallem et al. (2017) [14] Hoeffding Tree Restricted, Hoeffding Trees AUE2 with Buffer AUE2 without Buffer Hoeffding Adaptive Trees with DDM and ADWIN The survey talks about the flexibility of the methods when used in different domains of streaming data. Combinations of technique solves the intrusion anomaly detection problem. All the techniques surveyed gives good accuracy. Hoeffding Tree 93% Restricted Hoeffding Trees 92.15% AUE2 with Buffer 94.07% AUE2 without Buffer 94.06% Hoeffding Adaptive Trees with DDM and ADWIN 92%. Improvement in accuracy is needed.
    G. Kim et al. (2014) [15] Hybrid approach combines C 4.5 algorithm and one class SVM approach C 4.5 is used to build the misuse detection model and decompose the training data into smaller subsets. Multiple one class SVM models are built to enhance intrusion detection accuracy Good detection accuracy for both known and unknown attacks. Low false positive. Processing time is more for the proposed technique.
    H. K. Sok et al. (2013) [16] ADT algorithm The alternating decision tree algorithm is used for knowledge discovery and effective selection of features. Classification process simplifies. Evaluation speed needs to be improved.
    S. J. Horng et al. (2011) [17] BIRCH hierarchical clustering technique, SVM It combines BIRCH hierarchical clustering technique and SVM which gives good detection accuracy. The training time is reduced and gives a good detection accuracy of 95.72%. It mainly detects DOS or Probe attacks. It cannot detect U2L and R2L attacks.
    Tavallaee et al. (2009) [18] KDD CUP 99 dataset Analysis on KDD CUP 99 dataset is made. Good for signature-based detection. Poor detection when used for anomaly detection. Contains many duplicate values leads to performance degradation.


    The proposed work uses the NSL_KDD benchmark dataset for intrusion analysis. The first phase performs feature selection using Darwinian Particle Swarm Optimization (DPSO) and selects the key features that contribute to intrusion. The second phase applies the proposed Stacked Autoencoder Hoeffding Tree (SAE-HT) technique to classify the data, evaluating it with performance metrics such as accuracy, specificity, sensitivity, false positive rate, false negative rate, and F1 score.

    The benchmark NSL_KDD dataset is taken for the analysis. The dataset itself is a standalone (static) dataset; to incorporate stream data, it is streamed using Matlab, where the System object technique simplifies the streaming process. The data then arrive continuously and possess the characteristics of streaming data. The attacks are detected using the proposed Stacked Autoencoder Hoeffding Tree (SAE-HT) classification approach. The bio-inspired Darwinian Particle Swarm Optimization (DPSO) technique enhances the performance of the SAE-HT classifier: the DPSO feature selection removes distracting variance from the data, which enables the classifier to perform better, especially when dealing with high-dimensional features. Figure 2 shows the flow diagram of the SAE-HT classification technique.

    Figure 2.  Flow diagram of SAE-HT classification technique.
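    As a rough illustration of the streaming setup described above, a static NSL_KDD file can be replayed one record at a time so that each instance is inspected only once. This is a hedged Python sketch, not the authors' MATLAB System object code, and the file name KDDTrain+.csv and the update_model function are assumptions.

```python
import csv

def stream_nsl_kdd(path="KDDTrain+.csv"):
    """Yield one NSL_KDD record at a time, as a data stream would."""
    with open(path, newline="") as f:
        for row in csv.reader(f):
            yield row  # each record is inspected exactly once

# Example usage (update_model is a hypothetical incremental learner):
# for record in stream_nsl_kdd():
#     update_model(record)
```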

    Data preprocessing is a vital step in machine learning: the raw data are made ready for machine learning techniques to extract meaningful insights. The NSL_KDD dataset obtained from the Canadian Institute for Cybersecurity is analyzed. If the downloaded dataset is in gz or Tcpdump format, it is converted to CSV format and loaded into the environment. The proposed classification technique supports only numeric data; since most machine learning (ML) techniques rely on mathematical operations that require numeric inputs, categorical data must be converted into numerical data. The one-hot encoding technique performs this conversion, making it convenient to apply machine learning techniques to the dataset.

    One-hot encoding is an effective encoding technique for converting categorical features into numeric ones [19]. It converts each categorical feature into a binary vector of zeros and ones: exactly one element of the vector has the value one, and all other elements are zero. The element with value one indicates which of the possible values the categorical feature takes. The NSL_KDD dataset contains three categorical features: protocol_type, service, and flag. For example, protocol_type takes three values: ICMP, TCP, and UDP. Using one-hot encoding, ICMP is encoded as (1, 0, 0), TCP as (0, 1, 0), and UDP as (0, 0, 1). The categorical features service and flag are encoded into one-hot vectors in the same way.
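    The encoding step can be sketched as follows. This is an illustrative Python snippet using pandas with made-up example rows, not the authors' implementation.

```python
import pandas as pd

# Toy rows standing in for NSL_KDD records (values are illustrative only).
df = pd.DataFrame({
    "protocol_type": ["icmp", "tcp", "udp"],
    "service": ["ecr_i", "http", "private"],
    "flag": ["SF", "S0", "REJ"],
    "duration": [0, 2, 0],
})

# Expand each categorical feature into binary indicator columns, so that for
# protocol_type: icmp -> (1, 0, 0), tcp -> (0, 1, 0), udp -> (0, 0, 1).
encoded = pd.get_dummies(df, columns=["protocol_type", "service", "flag"])
print(encoded.head())
```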

    Feature selection reduces the number of features by removing insignificant or less significant ones. Among the many feature selection techniques, this paper uses the bio-inspired Darwinian Particle Swarm Optimization (DPSO) to select optimal features. The main goal of DPSO is to find non-redundant, highly correlated features and eliminate the least correlated ones [20]. DPSO extends the PSO technique with the basic principle of survival of the fittest. A major drawback of PSO and other bio-inspired feature selection techniques is that they can get trapped in a local optimum: the lack of a long-term memory effect leads to premature convergence. DPSO overcomes this drawback. It consists of multiple swarms, each behaving like an individual PSO. All swarms run simultaneously toward their local optima and are compared; the best swarm gets an extended life, while stagnant or underperforming swarms are deleted.

    DPSO is an effective evolutionary algorithm in which a population of individuals searches for the optimum. The population represents the "swarm" and the individuals represent the "particles". Throughout the evolutionary process, every particle updates its direction of motion according to two positions, the local best and the global best [21]. The local best position (pbest) is the best position the particle itself has visited so far, and the global best position (gbest) is the best position found by any particle in the swarm so far. In each iteration, each particle updates its velocity and position according to Eqs (1) and (2) [22].

    V_j(t+1) = \omega V_j(t) + c_1 r_1 (P_j - Y_j(t)) + c_2 r_2 (P_g - Y_j(t))  (1)
    Y_j(t+1) = Y_j(t) + V_j(t+1)  (2)

    Here Y_j = (Y_{1j}, Y_{2j}, \dots, Y_{Dj}) denotes the position of particle j in a D-dimensional search space, and V_j(t+1) is the velocity produced at time t+1. P_j is the best position found by the particle so far (pbest); it appears in the cognitive component of Eq (1). P_g is the best global position found so far (gbest); it appears in the social component of Eq (1). \omega is the inertia weight, c_1 and c_2 are constants called the local and global weights, and r_1 and r_2 are random variables drawn from (0, 1). The search continues until a predefined threshold is reached. DPSO is more efficient than the original PSO and prevents premature convergence to a local optimum. Rearranging the velocity update of Eq (1) gives Eq (3) [22].

    V_j(t+1) = \alpha V_j(t) + \frac{\alpha}{2} V_j(t-1) + \frac{\alpha(1-\alpha)}{6} V_j(t-2) + \frac{\alpha(1-\alpha)(2-\alpha)}{24} V_j(t-3) + c_1 r_1 (P_j - Y_j(t)) + c_2 r_2 (P_g - Y_j(t))  (3)

    The velocity update of Eq (1) can be viewed as using a discrete derivative of the velocity, D^\alpha[V_{t+1}], of order \alpha = 1; Eq (3) generalizes this to a fractional order \alpha. The Grünwald-Letnikov derivative expresses the discrete-time implementation as Eq (4) [22].

    D^\alpha[V_{t+1}] = \frac{1}{T^\alpha} \sum_{k=0}^{r} \frac{(-1)^k \, \Gamma(\alpha+1) \, v(t - kT)}{\Gamma(k+1) \, \Gamma(\alpha - k + 1)}  (4)

    Here T represents the sample period and r the truncation order. Equation (3) is applied repeatedly to update the velocity of every particle, and the value of the fractional order \alpha controls the convergence speed of the optimization process.
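    A minimal sketch of the fractional-order velocity update of Eqs (2) and (3) is given below. It is illustrative Python, not the authors' MATLAB code; the function name and parameter values are assumptions, and the Darwinian multi-swarm bookkeeping (spawning and deleting swarms) is omitted.

```python
import numpy as np

def fractional_velocity(v_hist, y, p_best, g_best, alpha=0.6, c1=2.0, c2=2.0,
                        rng=np.random.default_rng(0)):
    """Eq (3): v_hist = [V(t), V(t-1), V(t-2), V(t-3)], arrays with the particle's dimension."""
    r1, r2 = rng.random(y.shape), rng.random(y.shape)
    memory = (alpha * v_hist[0]
              + (alpha / 2.0) * v_hist[1]
              + (alpha * (1 - alpha) / 6.0) * v_hist[2]
              + (alpha * (1 - alpha) * (2 - alpha) / 24.0) * v_hist[3])
    return memory + c1 * r1 * (p_best - y) + c2 * r2 * (g_best - y)

# Position update of Eq (2): y_new = y + fractional_velocity(v_hist, y, p_best, g_best)
```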

    Classification is a supervised machine learning technique that divides a set of data into different classes. Because data generation is continuous, it is impossible to store the entire stream, so the data must be analyzed as they arrive; classification techniques are very helpful for classifying streaming data [23]. This paper proposes a classification technique for stream data mining called the Stacked Autoencoder Hoeffding Tree (SAE-HT) approach. Although the stacked autoencoder gives good accuracy on its own, the Hoeffding tree is applied on top of it to improve the results further. The stacked autoencoder is an unsupervised learning technique whose layers consist of an input layer, a hidden layer, and an output layer [24]. The encoder takes the input and maps it to a hidden representation, and the decoder then reconstructs the input. Equation (5) gives the encoder process [24].

    h_n = f(W_1 x_n + b_1)  (5)

    where h_n is the hidden (encoded) vector computed from the input x_n, f is the encoding function, W_1 is the encoder weight matrix, and b_1 is the bias vector.

    Equation (6) represents the decoder process, where g denotes the decoding function, W_2 the decoder weight matrix, and b_2 the bias vector [24].

    \hat{X}_n = g(W_2 h_n + b_2)  (6)

    When the decoder reconstructs the input data, a reconstruction error can arise. Equation (7) defines the objective that minimizes this reconstruction error [24].

    \phi(\Theta) = \arg\min_{\theta, \theta'} \frac{1}{n} \sum_{i=1}^{n} L(X_i, \hat{X}_i)  (7)

    where L denotes the loss function, here the squared reconstruction error L(X, \hat{X}) = \| X - \hat{X} \|^2.
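    The encoder, decoder, and reconstruction objective of Eqs (5)-(7) can be illustrated with a small numpy sketch of a single autoencoder layer trained by gradient descent. The layer sizes, learning rate, and random toy data are assumptions, and stacking simply repeats the procedure layer by layer; this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((500, 41))            # toy stand-in for 41 preprocessed NSL_KDD features
n_hidden, lr = 20, 0.5
W1 = rng.normal(scale=0.1, size=(41, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 41)); b2 = np.zeros(41)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(200):
    H = sigmoid(X @ W1 + b1)         # Eq (5): encoder h_n = f(W1 x_n + b1)
    X_hat = sigmoid(H @ W2 + b2)     # Eq (6): decoder x_hat_n = g(W2 h_n + b2)
    err = X_hat - X                  # gradient of the squared reconstruction error, Eq (7)
    dZ2 = err * X_hat * (1 - X_hat)
    dZ1 = (dZ2 @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ dZ2 / len(X);  b2 -= lr * dZ2.mean(axis=0)
    W1 -= lr * X.T @ dZ1 / len(X);  b1 -= lr * dZ1.mean(axis=0)

features = sigmoid(X @ W1 + b1)      # learned representation passed to the next layer / classifier
```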

    The Hoeffding tree is then used to classify the class labels. It is a kind of decision tree consisting of a root node, test nodes, and leaf nodes, where each leaf holds a class prediction. The main requirement in streaming data is to classify the data in a single pass; the Hoeffding tree technique builds the tree structure incrementally as the data arrive. A known difficulty in Hoeffding tree classification is deciding on a split when two features have nearly equal merit (a tie); a tie-breaking threshold is used in that case. Equation (8) gives the formula for the Hoeffding bound [25].

    \epsilon = \sqrt{\frac{R^2 \ln(1/\delta)}{2n}}  (8)

    where R denotes the range of the random variable, \delta denotes the allowed probability that the estimate is not within \epsilon of its expected value, and n denotes the number of instances collected at the node.
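    A small sketch of the Hoeffding bound of Eq (8); the example numbers are illustrative only.

```python
import math

def hoeffding_bound(R, delta, n):
    """Eq (8): with probability 1 - delta the observed mean of a variable with range R
    deviates from its true mean by less than the returned epsilon after n observations."""
    return math.sqrt((R ** 2) * math.log(1.0 / delta) / (2.0 * n))

# For information gain on binary class labels, R = 1 (log base 2):
print(hoeffding_bound(R=1.0, delta=1e-7, n=200))   # ~0.2: split only if the gain gap exceeds this
```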

    Algorithm: Stacked Autoencoder Hoeffding Tree Approach
    Input: NSL_KDD benchmark dataset
    Output: Classification results: Accuracy, Sensitivity, Specificity, False Alarm Rate, False Negative Rate and F1 Score.
     Procedure
      1. Load the network intrusion benchmark dataset (NSL_KDD dataset)
      2. Data preprocessing
      3. Apply the bio-inspired feature selection technique (DPSO) to select the significant features
      4. Partition the dataset into training and testing data
      5. Input the data to the encoder and map it to the hidden representation to obtain a learned feature vector.
      6. The feature vector from the previous layer is the input to the next layer. This process repeats till the training ends.
      7. The decoder reconstructs the input from the hidden representation; Eq (7) minimizes the reconstruction error.
      8. for the entire training data, do
      9. Use the Hoeffding tree technique and sort each instance to a leaf f
      10. Update the necessary statistics in f
      11. Increment m_f, the number of instances seen at f
      12. if m_f mod m_min = 0 and the instances at f are not all of the same class, then
      13. Calculate G_f(k_i) for each feature k_i
      14. Let k_p be the feature with the highest G_f value
      15. Let k_q be the feature with the second-highest G_f value
      16. Calculate the Hoeffding bound ε using Eq (8)
      17. if k_p is not the null attribute and (G_f(k_p) - G_f(k_q) > ε or ε < τ), then
      18. Replace f with an internal node that splits on k_p
      19. for all branches of the split, do
      20. Add and initialize a new leaf with sufficient statistics
      21. End
      22. Return the classification result.


    The pseudocode of the Stacked Autoencoder Hoeffding Tree explains the following. Line 1 loads and streams the NSL_KDD dataset using the Matlab platform. Line 2 performs the preprocessing of the dataset: the categorical data are converted to numeric data using the one-hot encoding technique. In line 3, the DPSO feature selection algorithm is applied, and the important features are selected based on the selection score generated by the algorithm. In line 4, the data are divided into training and test sets. Lines 5–7 describe the stacked autoencoder process, in which the encoder maps the input data to hidden representations to obtain the learned features; the learned feature vector is given as input to the next layer, and this continues until training ends. The decoder reconstructs the input from the hidden representations, and the reconstruction error is minimized through the loss function. Lines 8–21 describe the Hoeffding tree procedure. The output generated by the stacked autoencoder is fed as input to the Hoeffding tree technique. The root node is initialized, and the tree is constructed incrementally: each training instance is sorted to a suitable leaf, and each node keeps enough statistics to make a decision. Information gain is used to choose the attribute to split on; at each node the best attribute is found, and a test based on the Hoeffding bound decides whether splitting on that attribute yields a sufficiently better result. If so, the node is split and the tree grows. Line 22 returns the classification results.
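    The overall flow of the pseudocode can be sketched in Python using an incremental Hoeffding tree. The snippet below assumes the river library for the tree and a hypothetical encode function standing in for the trained stacked autoencoder, so it illustrates the pipeline rather than the authors' MATLAB implementation.

```python
from river import tree

def run_pipeline(stream, encode):
    """stream yields (feature_dict, label); encode is the trained SAE feature map (hypothetical)."""
    model = tree.HoeffdingTreeClassifier(grace_period=200)
    correct = total = 0
    for x_raw, y in stream:
        x = encode(x_raw)                 # steps 5-7: learned feature vector
        y_pred = model.predict_one(x)     # prequential (test-then-train) evaluation
        correct += int(y_pred == y)
        total += 1
        model.learn_one(x, y)             # steps 8-21: incremental Hoeffding tree growth
    return correct / max(total, 1)
```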

    The experiment with the proposed technique was conducted in MATLAB R2021a on a Windows 10 64-bit operating system with a Ryzen 7 processor and 16 GB RAM. The experiment uses a stream-oriented offline database for querying the network traffic data, which enables natural data analysis within the IDS. A streaming architecture is used across multiple sites to process attack data and increase performance in large-scale systems, because the data are processed during their natural flow and stored only for a limited amount of time for analysis. In this work, we applied a hybrid classification technique whose performance was analyzed using various performance evaluation metrics.

    The NSL_KDD dataset is used for the experimental evaluation of the proposed work. The KDD CUP'99 dataset is a popular benchmark dataset for network intrusion detection systems, but its main limitation is that it contains a large number of redundant records, which affects the effectiveness of the evaluated system [26]. The NSL_KDD dataset is the improved version of KDD CUP'99 in which the redundant records are removed. It has approximately 125,973 training records and 22,544 testing records [27]. The records in the NSL_KDD dataset are unique, and, as in KDD CUP'99, each is labelled as normal or anomaly. It has 41 features and covers four different categories of attacks. Table 2 lists the NSL_KDD features [27].

    Table 2.  NSL_KDD dataset features.
    No NSL_KDD Feature Names No NSL_KDD Feature Names
    1 duration 21 _is_host_login
    2 protocols_types 22 _is_guests_login
    3 services 23 _counts
    4 flag 24 src_counts
    5 source_bytes 25 srcerror_rate
    6 dstn_bytes 26 srcs_error_rate
    7 lang 27 rerrors_rate
    8 wrong_fragments 28 src_rerrors_rate
    9 urgent 29 simlr_srcs_rate
    10 hot 30 diff_srcs_rate
    11 numbr_failed_login 31 src_diffs_host_rate
    12 usr_login 32 dstv_host_counts
    13 numbr_compromises 33 dstn_host_src_count
    14 numbr_root_shell 34 dstn_simlr_src_rate
    15 numbr_attempts 35 dstn_diffs_srce_rate
    16 numbr_of_roots 36 dstn_hosts_sim_srce_port_rates
    17 numbr_files_creation 37 dstn_hosts_src_diffr_host_rates
    18 numbr_offshell 38 dstn_hosts_srcerror_rate
    19 numbr_access_usrfiles 39 dstn_hosts_srce_srerror_rate
    20 numbr_outbounds_cmd 40 dstn_hosts_error_rates
    41 dstn_hosts_srce_error_rates


    The study measures performance and compares results using metrics such as accuracy, sensitivity, specificity, False Positive Rate (FPR), and False Negative Rate (FNR). The feature selection technique selects the important features that play a significant role in intrusion, and the performance of the proposed classifier is then determined with the help of these evaluation metrics. Accuracy is the most common evaluation metric: it is the ratio of correctly classified instances to the total number of predictions made on the given dataset. Equation (9) shows the accuracy (ACC) calculation [28].

    ACC = \frac{TP + TN}{TP + TN + FP + FN}  (9)

    The confusion matrix in Table 3 defines TP, TN, FP, and FN.

    Table 3.  Confusion matrix for IDS.
                     Predicted Normal        Predicted Anomaly
    Actual Normal    True Negative (TN)      False Positive (FP)
    Actual Anomaly   False Negative (FN)     True Positive (TP)


    TN (True Negative): The actual and predicted instances both are classified as normal.

    FP (False Positive): The actual normal instance is predicted as an anomaly by the IDS.

    FN (False Negative): The actual anomaly instance is predicted as normal by the IDS.

    TP (True Positive): The actual anomaly instance is predicted as an anomaly by the IDS.

    Sensitivity (S) is the probability of correctly identifying an attack instance as an attack. It is also known as recall or the True Positive Rate (TPR). Sensitivity is calculated as in Eq (10) [29].

    S = \frac{TP}{TP + FN}  (10)

    Specificity (SP) is the probability of correctly identifying a normal instance as normal. It is also known as the True Negative Rate (TNR). Equation (11) shows the specificity calculation [29].

    SP = \frac{TN}{TN + FP}  (11)

    FPR is the probability that a normal instance is incorrectly classified as an attack. It is also known as the False Alarm Rate (FAR). The FPR calculation is given by Eq (12) [30].

    FPR = \frac{FP}{FP + TN}  (12)

    FNR is the probability that an attack instance is incorrectly classified as normal. It is also called the miss rate. The FNR calculation is given in Eq (13) [30].

    FNR = \frac{FN}{FN + TP}  (13)

    The F1 score is the harmonic mean of precision and recall and serves as a combined effectiveness measure. Its calculation is given by Eq (14) [30].

    F1\;Score = \frac{2 \times Precision \times Recall}{Precision + Recall}  (14)
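    The metrics of Eqs (9)-(14) can be computed directly from the confusion-matrix counts of Table 3. The sketch below uses illustrative counts, not the experimental results.

```python
def ids_metrics(TP, TN, FP, FN):
    acc  = (TP + TN) / (TP + TN + FP + FN)     # Eq (9)
    sens = TP / (TP + FN)                      # Eq (10): recall / TPR
    spec = TN / (TN + FP)                      # Eq (11): TNR
    fpr  = FP / (FP + TN)                      # Eq (12): false alarm rate
    fnr  = FN / (FN + TP)                      # Eq (13): miss rate
    prec = TP / (TP + FP)
    f1   = 2 * prec * sens / (prec + sens)     # Eq (14)
    return {"ACC": acc, "TPR": sens, "TNR": spec, "FPR": fpr, "FNR": fnr, "F1": f1}

print(ids_metrics(TP=970, TN=987, FP=13, FN=30))   # illustrative counts only
```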

    We designed an overall system architecture supporting the properties of a network IDS with the help of machine learning techniques. Figure 2 shows the flow diagram of this architecture, consisting of the network benchmark dataset, feature selection, the proposed classifier, the evaluation parameters, and the analysis of results.

    In this experimental analysis, the feature selection technique selects an optimal subset of informative features from the given dataset. This optimal subset is used to analyze how accurately the proposed technique classifies the network traffic as normal or anomalous.

    A comprehensive study was conducted to validate the impact of the proposed technique, including the performance metrics and the feature selection analysis that supports the prediction of anomalies in the network traffic. The main objectives of this experimental study are:

    ● To identify the optimal subset of informative features that contribute to intrusion from the intrusion dataset. Though the DPSO feature selection technique is not new, its main advantage is that it does not get trapped in a local optimum, which helps select the best features for evaluation.

    ● To enhance the efficiency of intrusion detection using the proposed hybrid classification technique, which is novel, and to evaluate its performance and detection capabilities.

    ● To evaluate the proposed classification technique on the NSL_KDD dataset.

    ● To study the performance evaluation of the proposed classification technique by applying various performance measures such as accuracy, sensitivity, specificity, FPR and FNR.

    ● To compare the efficiency of the proposed technique with state-of-the-art methods.

    A comprehensive comparison was performed between five different bio-inspired feature selection techniques in terms of accuracy, detection rate, and FPR. The proposed feature selection technique achieved a considerable performance improvement compared with the other feature selection techniques. The results obtained after applying feature selection and the proposed classification technique to the NSL_KDD dataset are depicted in Table 4, which compares the classification accuracy, detection rate, and false positive rate of the proposed technique with the state-of-the-art techniques. Among the compared feature selection techniques, LNNLS-KH gives the best results, with an accuracy of 96.12%, a detection rate of 96.48%, and a false positive rate of 4%. The proposed technique combined with DPSO feature selection gives an accuracy of 97.70% (1.58% higher), a detection rate of 97% (0.52% higher), and a false positive rate of 1.25% (2.75% lower) than the LNNLS-KH technique.

    Table 4.  Classification accuracy, DR and FPR of various feature selection techniques (NSL_KDD dataset).
    Feature Selection | Number of Features | Selected Features | Accuracy (%) | Detection Rate (DR) (%) | False Positive Rate (FPR) (%)
    CMPSO [7] | 33 | 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 30, 32, 33, 34, 35, 37, 40, 41 | 80.91 | 83.01 | 24.70
    ACO [7] | 31 | 1, 3, 4, 6, 7, 8, 12, 14, 15, 16, 17, 19, 20, 21, 23, 24, 25, 27, 28, 29, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41 | 84.03 | 87.15 | 19.30
    KH [7] | 26 | 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 18, 19, 21, 22, 23, 24, 26, 28, 30, 31, 32, 40, 41 | 88.21 | 89.46 | 12.88
    IKH [7] | 25 | 2, 3, 4, 5, 6, 8, 10, 11, 12, 14, 17, 18, 20, 21, 22, 27, 28, 29, 30, 31, 34, 35, 36, 39, 41 | 91.22 | 91.75 | 7.34
    LNNLS-KH [7] | 19 | 2, 3, 4, 6, 8, 10, 11, 15, 17, 19, 20, 21, 29, 30, 33, 34, 36, 37, 40 | 96.12 | 96.48 | 4
    DPSO | 28 | 2, 3, 4, 6, 7, 8, 11, 12, 13, 15, 16, 17, 18, 19, 21, 23, 24, 25, 26, 28, 29, 30, 33, 34, 35, 36, 37, 40 | 97.70 | 97 | 1.25


    The differences in classification accuracy, DR, and FPR are visualized in Figure 3. For the NSL_KDD dataset, the false positive rate of DPSO with the proposed classification technique is 1.25%, which is lower by 23.45, 18.05, 11.63, 6.09, and 2.75%, respectively, than Cross Mutation Particle Swarm Optimization (CMPSO), Ant Colony Optimization (ACO), Krill Herd (KH), Improved Krill Herd (IKH), and Linear Nearest Neighbor Lasso Step Krill Herd (LNNLS-KH). Similarly, the detection rate of DPSO with the proposed classification technique is 97%, which is 13.99, 9.85, 7.54, 5.25, and 0.52% higher than CMPSO, ACO, KH, IKH, and LNNLS-KH, and its accuracy is 97.70%, which is 16.79, 13.67, 9.49, 6.48, and 1.58% higher than CMPSO, ACO, KH, IKH, and LNNLS-KH. In conclusion, the proposed classification technique with DPSO achieves higher detection accuracy, a lower false positive rate, and a higher detection rate.

    Figure 3.  Comparison of classification accuracy, DR and FPR with FS techniques (NSL_KDD dataset).

    The detection time was compared for further evaluation. Table 5 shows the detection time when applied to different techniques. The detection time represents the time taken from inputting the optimal subset of features into the classifier till the end of detection.

    Table 5.  Detection time of different feature selection techniques (NSL_KDD dataset).
    Feature Selection Techniques Detection Time (Sec)
    CMPSO [20] 41.36
    ACO [20] 41.39
    KH [20] 37.78
    IKH [20] 36.99
    LNNLS-KH [20] 34.40
    DPSO 41.56


    Table 5 shows the time taken to detect the intrusion when different bio-inspired feature selection techniques are applied. The features selected by CMPSO, ACO, KH, IKH, and LNNLS-KH are input to the individual classifiers. Among the existing systems compared, ACO feature selection took the longest detection time of 41.39 seconds, with an accuracy of 84.03%, a detection rate of 87.15%, and a false positive rate of 19.30%. The features selected by the DPSO technique, when given to the proposed hybrid classifier, took 41.56 seconds to detect the intrusions, which is 0.41% longer than the ACO feature selection technique. Although the detection time is slightly longer, there is a considerable increase in performance: the DPSO detection accuracy is 97.7%, the detection rate is 97%, and the false positive rate is 1.25%, which compared with ACO amounts to a 13.67% increase in accuracy, a 9.85% increase in detection rate, and an 18.05% decrease in false positive rate.

    An experimental study was conducted using the proposed Stacked Autoencoder Hoeffding Tree approach with and without feature selection. The performance of the proposed technique was evaluated using accuracy, sensitivity, specificity, false positive rate, and false negative rate. The results achieved with and without feature selection are given in Table 6.

    Table 6.  Performance comparison of the proposed technique with and without feature selection.
    Performance Metrics (%) SAE-HT without FS (41 Features) SAE-HT with FS (28 Features)
    Accuracy 55.93 97.7
    Sensitivity (TPR) 100 97
    Specificity (TNR) 28.57 98.75
    False Positive Rate (FPR) 71.43 1.25
    False Negative Rate (FNR) 0 3
    F1 Score 37.6 98


    The proposed SAE-HT was applied to the NSL_KDD dataset, and Figure 4 compares its performance with and without FS. First, the proposed classifier was applied to all 41 features of the dataset, and the detection accuracy was only 55.93%, with a false positive rate of 71.43% and a false negative rate of almost 0%. Our aim was to reduce the false positive rate as much as possible and to increase the accuracy as far as possible. A closer study showed that less relevant and unimportant features degrade the performance of the proposed classifier. Second, the bio-inspired DPSO feature selection technique was applied, and 28 important features that contribute to intrusion were selected. The selected features were given to the stacked autoencoder to learn the features, and the output of the stacked autoencoder was given to the Hoeffding tree; the accuracy then increased to 97.7%, which is 41.77% higher than SAE-HT without FS, and the false positive rate decreased to 1.25%, a reduction of 70.18% compared with SAE-HT without FS. Likewise, the sensitivity was 97% and the specificity was 98.75% when SAE-HT with FS was applied. This shows that the proposed classifier with feature selection provides good detection accuracy with low FPR and FNR.

    Figure 4.  Performance comparison of the proposed technique with and without feature selection.

    The same process was carried out with 29 features, giving an accuracy of 97.3%, and with 30 features, giving 97.1%. From this we conclude that the accuracy has reached saturation: there is no substantial improvement when more features are included, and as the number of features increases further the accuracy starts to decrease. Therefore, 28 features form the optimal subset contributing to intrusion. As the accuracy increased, the false positive rate decreased.

    Figure 5 compares the performance of the Stacked Autoencoder (SAE) and the Stacked Autoencoder Hoeffding Tree (SAE-HT). The stacked autoencoder by itself gives a good accuracy of 85.23%, a detection rate of 85.13%, and a false alarm rate of 14.62%, but it achieves much better results when combined with the Hoeffding tree. The SAE-HT technique gives an accuracy of 97.7% and a detection rate of 97%, which are 12.47 and 11.87% higher than SAE, and a false alarm rate of 1.25%, which is 13.37% lower than SAE.

    Figure 5.  Performance comparison: SAE Vs SAE-HT.

    The proposed technique was compared with state-of-the-art techniques to demonstrate its efficiency. Table 7 shows the comparison of the proposed technique with other state-of-the-art techniques, including both conventional classification methods and deep learning methods.

    Table 7.  Comparison of proposed technique with state-of-art techniques.
    Prediction Techniques Accuracy (%)
    VLSTM [8] 89.5
    AdaBoost [8] 84.8
    LSTM [8] 85.1
    CNN LSTM [8] 83.3
    SAE-SVM [10] 95.98
    SAE [11] 85.23
    SSAE [11] 86.02
    DAE [11] 86.92
    ContAE [11] 87.98
    CAE [11] 81.07
    SAE-DNN [12] 82.23
    SAAE-DNN [12] 87.74
    Hoeffding Tree [14] 93
    Restricted Hoeffding Trees [14] 92.15
    AUE2 with Buffer [14] 94.07
    AUE2 without Buffer [14] 94.06
    HAT + DDM + ADWIN [14] 92
    Hierarchical Clustering and SVM [17] 95.72
    Proposed SAE-HT 97.7


    The SAE-HT classifier with feature selection gives an overall detection accuracy of 97.7%, which is 8.2, 12.9, 12.6, 14.4, 1.72, 12.47, 11.68, 10.78, 9.72, 16.7, 15.47, 9.96, 4.7, 5.55, 3.63, 3.64, 5.7 and 1.98% higher than VLSTM, AdaBoost, LSTM, CNN LSTM, SAE-SVM, SAE, SAAE, DAE, Cont AE, CAE, SAE-DNN, SAAE-DNN, Hoeffding tree, Restricted Hoeffding tree, AUE2 with buffer, AUE2 without buffer, HAT+DDM+ADWIN, and Hierarchical Clustering and SVM, respectively.

    Compared with the state-of-the-art techniques in the related work, the proposed SAE-HT classifier performs better in terms of accuracy, detection rate, and false alarm rate, meeting the significant challenges with a high accuracy of 97.7%, a detection rate of 97%, and a well-reduced false alarm rate of 1.25%.

    An effective intrusion detection system was developed using the Stacked Autoencoder Hoeffding Tree technique for classification and the Darwinian Particle Swarm Optimization method for feature selection. Using DPSO feature selection, the optimal subset of features contributing to intrusions is selected from the given intrusion dataset, and the training and testing data are given to the proposed classification technique to classify the network attacks. The main goal of the Stacked Autoencoder Hoeffding Tree technique is to increase the accuracy and reduce the false alarm rate. We evaluated the proposed work on the NSL_KDD intrusion dataset, streamed in a Matlab environment, and measured the classification performance using accuracy, specificity, sensitivity, detection rate, false alarm rate, and false negative rate; this performance was compared with other state-of-the-art methods. The main challenge with current intrusion detection is that it cannot detect new, unknown attacks: supervised learning techniques depend mainly on predefined attack signatures, which lets new attacks go undetected. The proposed Stacked Autoencoder Hoeffding Tree detects known attacks and, at the same time, detects unknown attacks to a great extent, showing a robust and significant improvement in detection accuracy while reducing the false alarm rate considerably. Compared with the state-of-the-art techniques, the SAE-HT technique gives a higher accuracy of 97.7%, which is 8.2, 12.9, 12.6, 14.4, 1.72, 12.47, 11.68, 10.78, 9.72, 16.7, 15.47, 9.96, 4.7, 5.55, 3.63, 3.64, 5.7 and 1.98% higher than VLSTM, AdaBoost, LSTM, CNN LSTM, SAE-SVM, SAE, SAAE, DAE, Cont AE, CAE, SAE-DNN, SAAE-DNN, Hoeffding tree, Restricted Hoeffding tree, AUE2 with buffer, AUE2 without buffer, HAT + DDM + ADWIN, and Hierarchical Clustering and SVM, respectively, together with a higher detection rate of 97% and a false alarm rate reduced to 1.25%. Further work can investigate more effective methods to improve the IDS. For attacks with fewer samples, an adversarial learning method can be used to create similar attacks, increasing the diversity of training examples and thereby improving the detection accuracy.

    This Publication is an outcome of the R & D work under MeitY's Visvesvaraya PhD Scheme, Government of India, implemented by Digital India Corporation.

    The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

    This research was supported by the Faculty of Management of Comenius University in Bratislava, Slovakia.



    [1] D. R. Larson, D. Zenklusen, B. Wu, J. A. Chao, R. H. Singer, Real-time observation of transcription initiation and elongation on an endogeneous yeast gene, Science, 332 (2011), 475–478. https://doi.org/10.1126/science.1202142 doi: 10.1126/science.1202142
    [2] S. Buratowski, S. Hahn, L. Guarente, P. A. Sharp, Five intermediate complexes in transcription initiation by RNA polymerase Ⅱ, Cell, 56 (1989), 549–561. https://doi.org/10.1016/0092-8674(89)90578-3 doi: 10.1016/0092-8674(89)90578-3
    [3] A. Dvir, J. W. Conaway, R. C. Conaway, Mechanism of transcription initiation and promoter escape by RNA polymerase Ⅱ, Curr. Opin. Genet. Dev., 11 (2001), 209–214. https://doi.org/10.1016/S0959-437X(00)00181-7 doi: 10.1016/S0959-437X(00)00181-7
    [4] X. Darzacq, Y. Shav-Tal, V. de Turris, Y. Brody, S. M. Shenoy, R. D. Phair, et al., In vivo dynamics of RNA polymerase Ⅱ transcription, Nat. Struct. Mol. Biol., 14 (2007), 796–806. https://doi.org/10.1038/nsmb1280 doi: 10.1038/nsmb1280
    [5] J. C. Venter, M. D. Adams, E. W. Myers, P. W. Li, R. J. Mural, G. G. Sutton, et al., The sequence of the human genome, Science, 291 (2001), 1304–1351.
    [6] C. N. Tennyson, H. J. Klamut, R. G. Worton, The human dystrophin gene requires 16 hours to be transcribed and is cotranscriptionally spliced, Nat. Genet., 9 (1995), 184–190. https://doi.org/10.1038/ng0295-184 doi: 10.1038/ng0295-184
    [7] J.-F. Lemay, F. Bachand, Fail-safe transcription termination: Because one is never enough, RNA Biol., 12 (2015), 927–932. https://doi.org/10.1080/15476286.2015.1073433 doi: 10.1080/15476286.2015.1073433
    [8] R. Ben-Yishay, Y. Shav-Tal, The dynamic lifecycle of mRNA in the nucleus, Curr. Opin. Cell Biol., 58 (2019), 69–75. https://doi.org/10.1016/j.ceb.2019.02.007 doi: 10.1016/j.ceb.2019.02.007
    [9] B. Daneholt, Assembly and transport of a premessenger RNP particle, Proc. Natl. Acad. Sci. U.S.A., 98 (2001), 7012–7017. https://doi.org/10.1073/pnas.111145498 doi: 10.1073/pnas.111145498
    [10] J. Sheinberger, Y. Shav-Tal, The dynamic pathway of nuclear RNA in eukaryotes, Nucleus, 4 (2013), 195–205. https://doi.org/10.4161/nucl.24434 doi: 10.4161/nucl.24434
    [11] A. Chaudhuri, S. Das, B. Das, Localization elements and zip codes in the intracellular transport and localization of messenger RNAs in Saccharomyces cerevisiae, WIREs RNA, 11 (2020), e1591.
    [12] U. Schmidt, E. Basyuk, M.-C. Robert, M. Yoshida, J.-P. Villemin, D. Auboeuf, et al., Real-time imaging of cotranscriptional splicing reveals a kinetic model that reduces noise: Implications for alternative splicing regulation, J. Cell Biol., 193 (2011), 819–829. https://doi.org/10.1083/jcb.201009012 doi: 10.1083/jcb.201009012
    [13] P. Cramer, A. Srebrow, S. Kadener, S. Werbajh, M. de la Mata, G. Melen, et al., Coordination between transcription and pre-mRNA processing, FEBS Lett., 498 (2001), 179–182. https://doi.org/10.1016/S0014-5793(01)02485-1 doi: 10.1016/S0014-5793(01)02485-1
    [14] A. Babour, C. Dargemont, F. Stutz, Ubiquitin and assembly of export competent mRNP, Biochim. Biophys. Acta, 1819 (2012), 521–530. https://doi.org/10.1016/j.bbagrm.2011.12.006 doi: 10.1016/j.bbagrm.2011.12.006
    [15] R. A. Coleman, B. F. Pugh, Slow dimer dissociation of the TATA binding protein dictates the kinetics of DNA binding, Proc. Natl. Acad. Sci. U.S.A., 94 (1997), 7221–7226. https://doi.org/10.1073/pnas.94.14.7221 doi: 10.1073/pnas.94.14.7221
    [16] J. F. Kugel, J. A. Goodrich, A kinetic model for the early steps of RNA synthesis by human RNA polymerase Ⅱ, J. Biol. Chem., 275 (2000), 40483–40491. https://doi.org/10.1074/jbc.M006401200 doi: 10.1074/jbc.M006401200
    [17] A. Kalo, I. Kanter, A. Shraga, J. Sheinberger, H. Tzemach, N. Kinor, et al., Cellular levels of signaling factors are sensed by β-actin alleles to modulate transcriptional pulse intensity, Cell Rep., 11 (2015), 419–432. https://doi.org/10.1016/j.celrep.2015.03.039 doi: 10.1016/j.celrep.2015.03.039
    [18] R. D. Bliss, P. R. Painter, A. G. Marr, Role of feedback inhibition in stabilizing the classical operon, J. Theor. Biol., 97 (1982), 177–193. https://doi.org/10.1016/0022-5193(82)90098-4 doi: 10.1016/0022-5193(82)90098-4
    [19] F. Buchholtz, F. W. Schneider, Computer simulation of T3/T7 phage infection using lag times, Biophys. Chem., 26 (1987), 171–179. https://doi.org/10.1016/0301-4622(87)80020-0 doi: 10.1016/0301-4622(87)80020-0
    [20] S. N. Busenberg, J. M. Mahaffy, The effects of dimension and size for a compartmental model of repression, SIAM J. Appl. Math., 48 (1988), 882–903. https://doi.org/10.1137/0148049 doi: 10.1137/0148049
    [21] J. Lewis, Autoinhibition with transcriptional delay: A simple mechanism for the zebrafish somitogenesis oscillator, Curr. Biol., 13 (2003), 1398–1408. https://doi.org/10.1016/S0960-9822(03)00534-7 doi: 10.1016/S0960-9822(03)00534-7
    [22] N. A. M. Monk, Oscillatory expression of Hes1, p53, and NF-κB driven by transcriptional time delays, Curr. Biol., 13 (2003), 1409–1413. https://doi.org/10.1016/S0960-9822(03)00494-9 doi: 10.1016/S0960-9822(03)00494-9
    [23] L.-J. Chiu, M.-Y. Ling, E.-H. Wu, C.-X. You, S.-T. Lin, C.-C. Shu, The distributed delay rearranges the bimodal distribution at protein level, J. Taiwan Inst. Chem. Eng., 137 (2022), 104436. https://doi.org/10.1016/j.jtice.2022.104436 doi: 10.1016/j.jtice.2022.104436
    [24] M. Jansen, P. Pfaffelhuber, Stochastic gene expression with delay, J. Theor. Biol., 364 (2015), 355–363. https://doi.org/10.1016/j.jtbi.2014.09.031 doi: 10.1016/j.jtbi.2014.09.031
    [25] K. Rateitschak, O. Wolkenhauer, Intracellular delay limits cyclic changes in gene expression, Math. Biosci., 205 (2007), 163–179. https://doi.org/10.1016/j.mbs.2006.08.010 doi: 10.1016/j.mbs.2006.08.010
    [26] M. R. Roussel, On the distribution of transcription times, BIOMATH, 2 (2013), 1307247. https://doi.org/10.11145/j.biomath.2013.07.247 doi: 10.11145/j.biomath.2013.07.247
    [27] M. R. Roussel, R. Zhu, Stochastic kinetics description of a simple transcription model, Bull. Math. Biol., 68 (2006), 1681–1713. https://doi.org/10.1007/s11538-005-9048-6 doi: 10.1007/s11538-005-9048-6
    [28] S. Vashishtha, Stochastic modeling of eukaryotic transcription at the single nucleotide level, M.Sc. thesis, University of Lethbridge, 2011, URL https://www.uleth.ca/dspace/handle/10133/3190.
    [29] V. Pelechano, S. Chávez, J. E. Pérez-Ortín, A complete set of nascent transcription rates for yeast genes, PLoS One, 5 (2010), e15442. https://doi.org/10.1371/journal.pone.0015442 doi: 10.1371/journal.pone.0015442
    [30] T. Muramoto, D. Cannon, M. Gierliński, A. Corrigan, G. J. Barton, J. R. Chubb, Live imaging of nascent RNA dynamics reveals distinct types of transcriptional pulse regulation, Proc. Natl. Acad. Sci. U.S.A., 109 (2012), 7350–7355. https://doi.org/10.1073/pnas.1117603109 doi: 10.1073/pnas.1117603109
    [31] A. Raj, C. S. Peskin, D. Tranchina, D. Y. Vargas, S. Tyagi, Stochastic mRNA synthesis in mammalian cells, PLoS Biol., 4 (2006), e309. https://doi.org/10.1371/journal.pbio.0040309 doi: 10.1371/journal.pbio.0040309
    [32] D. M. Suter, N. Molina, D. Gatfield, K. Schneider, U. Schibler, F. Naef, Mammalian genes are transcribed with widely different bursting kinetics, Science, 332 (2011), 472–474. https://doi.org/10.1126/science.1198817 doi: 10.1126/science.1198817
    [33] I. Jonkers, H. Kwak, J. T. Lis, Genome-wide dynamics of Pol Ⅱ elongation and its interplay with promoter proximal pausing, chromatin, and exons, eLife, 3 (2014), e02407. https://doi.org/10.7554/eLife.02407 doi: 10.7554/eLife.02407
    [34] P. K. Parua, G. T. Booth, M. Sansó, B. Benjamin, J. C. Tanny, J. T. Lis, et al., A Cdk9-PP1 switch regulates the elongation-termination transition of RNA polymerase Ⅱ, Nature, 558 (2018), 460–464. https://doi.org/10.1038/s41586-018-0214-z doi: 10.1038/s41586-018-0214-z
    [35] L. Bai, R. M. Fulbright, M. D. Wang, Mechanochemical kinetics of transcription elongation, Phys. Rev. Lett., 98 (2007), 068103. https://doi.org/10.1103/PhysRevLett.98.068103 doi: 10.1103/PhysRevLett.98.068103
    [36] L. Bai, A. Shundrovsky, M. D. Wang, Sequence-dependent kinetic model for transcription elongation by RNA polymerase, J. Mol. Biol., 344 (2004), 335–349. https://doi.org/10.1016/j.jmb.2004.08.107 doi: 10.1016/j.jmb.2004.08.107
    [37] F. Jülicher, R. Bruinsma, Motion of RNA polymerase along DNA: a stochastic model, Biophys. J., 74 (1998), 1169–1185. https://doi.org/10.1016/S0006-3495(98)77833-6 doi: 10.1016/S0006-3495(98)77833-6
    [38] H.-Y. Wang, T. Elston, A. Mogilner, G. Oster, Force generation in RNA polymerase, Biophys. J., 74 (1998), 1186–1202. https://doi.org/10.1016/S0006-3495(98)77834-8 doi: 10.1016/S0006-3495(98)77834-8
    [39] T. D. Yager, P. H. Von Hippel, A thermodynamic analysis of RNA transcript elongation and termination in Escherichia coli, Biochemistry, 30 (1991), 1097–1118. https://doi.org/10.1021/bi00218a032 doi: 10.1021/bi00218a032
    [40] S. J. Greive, J. P. Goodarzi, S. E. Weitzel, P. H. von Hippel, Development of a "modular" scheme to describe the kinetics of transcript elongation by RNA polymerase, Biophys. J., 101 (2011), 1155–1165. https://doi.org/10.1016/j.bpj.2011.07.042 doi: 10.1016/j.bpj.2011.07.042
    [41] T. Filatova, N. Popović, R. Grima, Statistics of nascent and mature RNA fluctuations in a stochastic model of transcriptional initiation, elongation, pausing, and termination, Bull. Math. Biol., 83 (2021), 3. https://doi.org/10.1007/s11538-020-00827-7 doi: 10.1007/s11538-020-00827-7
    [42] A. N. Boettiger, P. L. Ralph, S. N. Evans, Transcriptional regulation: Effects of promoter proximal pausing on speed, synchrony and reliability, PLoS Comput. Biol., 7 (2011), e1001136. https://doi.org/10.1371/journal.pcbi.1001136 doi: 10.1371/journal.pcbi.1001136
    [43] X. Xu, N. Kumar, A. Krishnan, R. V. Kulkarni, Stochastic modeling of dwell-time distributions during transcriptional pausing and initiation, in 52nd IEEE Conference on Decision and Control, 2013, 4068–4073.
    [44] M. Hamano, Stochastic transcription elongation via rule based modelling, Electron. Notes Theor. Comput. Sci., 326 (2016), 73–88. https://doi.org/10.1016/j.entcs.2016.09.019 doi: 10.1016/j.entcs.2016.09.019
    [45] S. Klumpp, T. Hwa, Stochasticity and traffic jams in the transcription of ribosomal RNA: Intriguing role of termination and antitermination, Proc. Natl. Acad. Sci. U.S.A.
    [46] A. S. Ribeiro, O.-P. Smolander, T. Rajala, A. Häkkinen, O. Yli-Harja, Delayed stochastic model of transcription at the single nucleotide level, J. Comput. Biol., 16 (2009), 539–553. https://doi.org/10.1089/cmb.2008.0153 doi: 10.1089/cmb.2008.0153
    [47] M. J. Schilstra, C. L. Nehaniv, Stochastic model of template-directed elongation processes in biology, BioSystems, 102 (2010), 55–60. https://doi.org/10.1016/j.biosystems.2010.07.006 doi: 10.1016/j.biosystems.2010.07.006
    [48] A. Garai, D. Chowdhury, D. Chowdhury, T. V. Ramakrishnan, Stochastic kinetics of ribosomes: single motor properties and collective behavior, Phys. Rev. E, 80 (2009), 011908. https://doi.org/10.1103/PhysRevE.80.011908 doi: 10.1103/PhysRevE.80.011908
    [49] A. Garai, D. Chowdhury, T. V. Ramakrishnan, Fluctuations in protein synthesis from a single RNA template: Stochastic kinetics of ribosomes, Phys. Rev. E, 79 (2009), 011916. https://doi.org/10.1103/PhysRevE.79.011916 doi: 10.1103/PhysRevE.79.011916
    [50] L. Mier-y-Terán-Romero, M. Silber, V. Hatzimanikatis, The origins of time-delay in template biopolymerization processes, PLoS Comput. Biol., 6 (2010), e1000726. https://doi.org/10.1371/journal.pcbi.1000726 doi: 10.1371/journal.pcbi.1000726
    [51] L. S. Churchman, J. S. Weissman, Nascent transcript sequencing visualizes transcription at nucleotide resolution, Nature, 469 (2011), 368–373.
    [52] K. C. Neuman, E. A. Abbondanzieri, R. Landick, J. Gelles, S. M. Block, Ubiquitous transcriptional pausing is independent of RNA polymerase backtracking, Cell, 115 (2003), 437–447. https://doi.org/10.1016/S0092-8674(03)00845-6 doi: 10.1016/S0092-8674(03)00845-6
    [53] R. Landick, The regulatory roles and mechanism of transcriptional pausing, Biochem. Soc. Trans., 34 (2006), 1062–1066. https://doi.org/10.1042/BST0341062 doi: 10.1042/BST0341062
    [54] V. Epshtein, F. Toulmé, A. R. Rahmouni, S. Borukhov, E. Nudler, Transcription through the roadblocks: the role of RNA polymerase cooperation, EMBO J., 22 (2003), 4719–4727. https://doi.org/10.1093/emboj/cdg452 doi: 10.1093/emboj/cdg452
    [55] S. Klumpp, Pausing and backtracking in transcription under dense traffic conditions, J. Stat. Phys., 142 (2011), 1252–1267. https://doi.org/10.1007/s10955-011-0120-3 doi: 10.1007/s10955-011-0120-3
    [56] M. Voliotis, N. Cohen, C. Molina-París, T. B. Liverpool, Fluctuations, pauses, and backtracking in DNA transcription, Biophys. J., 94 (2008), 334–348.
    [57] J. Li, D. S. Gilmour, Promoter proximal pausing and the control of gene expression, Curr. Opin. Genet. Dev., 21 (2011), 231–235.
    [58] S. Nechaev, K. Adelman, Pol Ⅱ waiting in the starting gates: Regulating the transition from transcription initiation into productive elongation, Biochim. Biophys. Acta, 1809 (2011), 34–45.
    [59] P. B. Rahl, C. Y. Lin, A. C. Seila, R. A. Flynn, S. McCuine, C. B. Burge, et al., c-Myc regulates transcriptional pause release, Cell, 141 (2010), 432–445. https://doi.org/10.1016/j.cell.2010.03.030 doi: 10.1016/j.cell.2010.03.030
    [60] P. Feng, A. Xiao, M. Fang, F. Wan, S. Li, P. Lang, et al., A machine learning-based framework for modeling transcription elongation, Proc. Natl. Acad. Sci. U.S.A., 118 (2021), e2007450118. https://doi.org/10.1073/pnas.2007450118 doi: 10.1073/pnas.2007450118
    [61] B. Zamft, L. Bintu, T. Ishibashi, C. Bustamante, Nascent RNA structure modulates the transcriptional dynamics of RNA polymerases, Proc. Natl. Acad. Sci. U.S.A., 109 (2012), 8948–8953. https://doi.org/10.1073/pnas.1205063109 doi: 10.1073/pnas.1205063109
    [62] R. D. Alexander, S. A. Innocente, J. D. Barrass, J. D. Beggs, Splicing-dependent RNA polymerase pausing in yeast, Mol. Cell, 40 (2010), 582–593. https://doi.org/10.1016/j.molcel.2010.11.005 doi: 10.1016/j.molcel.2010.11.005
    [63] N. Gromak, S. West, N. J. Proudfoot, Pause sites promote transcriptional termination of mammalian RNA polymerase Ⅱ, Mol. Cell. Biol., 26 (2006), 3986–3996. https://doi.org/10.1128/MCB.26.10.3986-3996.2006 doi: 10.1128/MCB.26.10.3986-3996.2006
    [64] N. MacDonald, Time delay in prey-predator models, Math. Biosci., 28 (1976), 321–330. https://doi.org/10.1016/0025-5564(76)90130-9 doi: 10.1016/0025-5564(76)90130-9
    [65] N. MacDonald, Time lag in a model of a biochemical reaction sequence with end product inhibition, J. Theor. Biol., 67 (1977), 549–556. https://doi.org/10.1016/0022-5193(77)90056-X doi: 10.1016/0022-5193(77)90056-X
    [66] N. MacDonald, Biological Delay Systems: Linear Stability Theory, Cambridge University Press, Cambridge, 1989.
    [67] M. Barrio, A. Leier, T. T. Marquez-Lago, Reduction of chemical reaction networks through delay distributions, J. Chem. Phys., 138 (2013), 104114. https://doi.org/10.1063/1.4793982 doi: 10.1063/1.4793982
    [68] A. Leier, M. Barrio, T. T. Marquez-Lago, Exact model reduction with delays: Closed-form distributions and extensions to fully bi-directional monomolecular reactions, J. R. Soc. Interface, 11 (2014), 20140108. https://doi.org/10.1098/rsif.2014.0108 doi: 10.1098/rsif.2014.0108
    [69] I. R. Epstein, Differential delay equations in chemical kinetics: Some simple linear model systems, J. Chem. Phys., 92 (1990), 1702–1712. https://doi.org/10.1063/1.458052 doi: 10.1063/1.458052
    [70] D. Bratsun, D. Volfson, L. S. Tsimring, J. Hasty, Delay-induced stochastic oscillations in gene regulation, Proc. Natl. Acad. Sci. U.S.A., 102 (2005), 14593–14598. https://doi.org/10.1073/pnas.0503858102 doi: 10.1073/pnas.0503858102
    [71] M. R. Roussel, R. Zhu, Validation of an algorithm for delay stochastic simulation of transcription and translation in prokaryotic gene expression, Phys. Biol., 3 (2006), 274. https://doi.org/10.1088/1478-3975/3/4/005 doi: 10.1088/1478-3975/3/4/005
    [72] B. H. Jennings, Pausing for thought: Disrupting the early transcription elongation checkpoint leads to developmental defects and tumourigenesis, BioEssays, 35 (2013), 553–560. https://doi.org/10.1002/bies.201200179 doi: 10.1002/bies.201200179
    [73] H. Kwak, N. J. Fuda, L. J. Core, J. T. Lis, Precise maps of RNA polymerase reveal how promoters direct initiation and pausing, Science, 339 (2013), 950–953. https://doi.org/10.1126/science.1229386 doi: 10.1126/science.1229386
    [74] A. R. Hieb, S. Baran, J. A. Goodrich, J. F. Kugel, An 8nt RNA triggers a rate-limiting shift of RNA polymerase Ⅱ complexes into elongation, EMBO J., 25 (2006), 3100–3109. https://doi.org/10.1038/sj.emboj.7601197 doi: 10.1038/sj.emboj.7601197
    [75] T. J. Stasevich, Y. Hayashi-Takanaka, Y. Sato, K. Maehara, Y. Ohkawa, K. Sakata-Sogawa, et al., Regulation of RNA polymerase Ⅱ activation by histone acetylation in single living cells, Nature, 516 (2014), 272–275. https://doi.org/10.1038/nature13714 doi: 10.1038/nature13714
    [76] B. Steurer, R. C. Janssens, B. Geverts, M. E. Geijer, F. Wienholz, A. F. Theil, et al., Live-cell analysis of endogenous GFP-RPB1 uncovers rapid turnover of initiating and promoter-paused RNA polymerase Ⅱ, Proc. Natl. Acad. Sci. U.S.A., 115 (2018), E4368–E4376. https://doi.org/10.1073/pnas.1717920115 doi: 10.1073/pnas.1717920115
    [77] J. Liu, D. Hansen, E. Eck, Y. J. Kim, M. Turner, S. Alamos, et al., Real-time single-cell characterization of the eukaryotic transcription cycle reveals correlations between RNA initiation, elongation, and cleavage, PLoS Comput. Biol., 17 (2021), e1008999. https://doi.org/10.1371/journal.pcbi.1008999 doi: 10.1371/journal.pcbi.1008999
    [78] A. Kremling, Comment on mathematical models which describe transcription and calculate the relationship between mRNA and protein expression ratio, Biotechnol. Bioeng., 96 (2007), 815–819. https://doi.org/10.1002/bit.21065 doi: 10.1002/bit.21065
    [79] N. Mitarai, S. Pedersen, Control of ribosome traffic by position-dependent choice of synonymous codons, Phys. Biol., 10 (2013), 056011. https://doi.org/10.1088/1478-3975/10/5/056011 doi: 10.1088/1478-3975/10/5/056011
    [80] M. R. Roussel, The use of delay differential equations in chemical kinetics, J. Phys. Chem., 100 (1996), 8323–8330. https://doi.org/10.1021/jp9600672 doi: 10.1021/jp9600672
    [81] R. A. Coleman, B. F. Pugh, Evidence for functional binding and stable sliding of the TATA binding protein on nonspecific DNA, J. Biol. Chem., 270 (1995), 13850–13859. https://doi.org/10.1074/jbc.270.23.13850 doi: 10.1074/jbc.270.23.13850
    [82] A. Dasgupta, S. A. Juedes, R. O. Sprouse, D. T. Auble, Mot1-mediated control of transcription complex assembly and activity, EMBO J., 24 (2005), 1717–1729. https://doi.org/10.1038/sj.emboj.7600646 doi: 10.1038/sj.emboj.7600646
    [83] R. O. Sprouse, T. S. Karpova, F. Mueller, A. Dasgupta, J. G. McNally, D. T. Auble, Regulation of TATA-binding protein dynamics in living yeast cells, Proc. Natl. Acad. Sci. U.S.A., 105 (2008), 13304–13308. https://doi.org/10.1073/pnas.0801901105 doi: 10.1073/pnas.0801901105
    [84] S. H. Hosseini, Analytic Solutions for Stochastic Models of Transcription, Master's thesis, University of Lethbridge, 2016, URL https://www.uleth.ca/dspace/handle/10133/4791.
    [85] H.-J. Woo, Analytical theory of the nonequilibrium spatial distribution of RNA polymerase translocations, Phys. Rev. E, 74 (2006), 011907. https://doi.org/10.1103/PhysRevE.74.011907 doi: 10.1103/PhysRevE.74.011907
    [86] M. H. Larson, J. Zhou, C. D. Kaplan, M. Palangat, R. D. Kornberg, R. Landick, et al., Trigger loop dynamics mediate the balance between the transcriptional fidelity and speed of RNA polymerase Ⅱ, Proc. Natl. Acad. Sci. U.S.A., 109 (2012), 6555–6560. https://doi.org/10.1073/pnas.1200939109 doi: 10.1073/pnas.1200939109
    [87] A. C. M. Cheung, P. Cramer, Structural basis of RNA polymerase Ⅱ backtracking, arrest and reactivation, Nature, 471 (2011), 249–253. https://doi.org/10.1038/nature09785 doi: 10.1038/nature09785
    [88] G. Brzyżek, S. Świeżewski, Mutual interdependence of splicing and transcription elongation, Transcription, 6 (2015), 37–39. https://doi.org/10.1080/21541264.2015.1040146 doi: 10.1080/21541264.2015.1040146
    [89] M. Imashimizu, M. L. Kireeva, L. Lubkowska, D. Gotte, A. R. Parks, J. N. Strathern, et al., Intrinsic translocation barrier as an initial step in pausing by RNA polymerase Ⅱ, J. Mol. Biol., 425 (2013), 697–712. https://doi.org/10.1016/j.jmb.2012.12.002 doi: 10.1016/j.jmb.2012.12.002
    [90] J. W. Roberts, Molecular basis of transcriptional pausing, Science, 344 (2014), 1226–1227. https://doi.org/10.1126/science.1255712 doi: 10.1126/science.1255712
    [91] J. Singh, R. A. Padgett, Rates of in situ transcription and splicing in large human genes, Nat. Struct. Mol. Biol., 16 (2009), 1128–1133. https://doi.org/10.1038/nsmb.1666 doi: 10.1038/nsmb.1666
    [92] E. Rosonina, S. Kaneko, J. L. Manley, Terminating the transcript: breaking up is hard to do, Genes Dev., 20 (2006), 1050–1056. https://doi.org/10.1101/gad.1431606 doi: 10.1101/gad.1431606
    [93] E. A. Abbondanzieri, W. J. Greenleaf, J. W. Shaevitz, R. Landick, S. M. Block, Direct observation of base-pair stepping by RNA polymerase, Nature, 438 (2005), 460–465. https://doi.org/10.1038/nature04268 doi: 10.1038/nature04268
    [94] L. M. Hsu, Promoter clearance and escape in prokaryotes, Biochim. Biophys. Acta, 1577 (2002), 191–207. https://doi.org/10.1016/S0167-4781(02)00452-9 doi: 10.1016/S0167-4781(02)00452-9
    [95] H. Kimura, K. Sugaya, P. R. Cook, The transcription cycle of RNA polymerase Ⅱ in living cells, J. Cell Biol., 159 (2002), 777–782. https://doi.org/10.1083/jcb.200206019 doi: 10.1083/jcb.200206019
    [96] H. A. Ferguson, J. F. Kugel, J. A. Goodrich, Kinetic and mechanistic analysis of the RNA polymerase Ⅱ transcription reaction at the human interleukin-2 promoter, J. Mol. Biol., 314 (2001), 993–1006. https://doi.org/10.1006/jmbi.2000.5215 doi: 10.1006/jmbi.2000.5215
    [97] D. A. Jackson, F. J. Iborra, E. M. M. Manders, P. R. Cook, Numbers and organization of RNA polymerases, nascent transcripts, and transcription units in HeLa nuclei, Mol. Biol. Cell, 9 (1998), 1523–1536. https://doi.org/10.1091/mbc.9.6.1523 doi: 10.1091/mbc.9.6.1523
    [98] P. J. Hurtado, A. S. Kirosingh, Generalizations of the 'linear chain trick': Incorporating more flexible dwell time distributions into mean field ODE models, J. Math. Biol., 79 (2019), 1831–1883. https://doi.org/10.1007/s00285-019-01412-w doi: 10.1007/s00285-019-01412-w
    [99] H. Goldstein, Classical Mechanics, chapter 12, Addison-Wesley, Reading, Massachusetts, 1980.
    [100] H. G. Othmer, A continuum model for coupled cells, J. Math. Biol., 17 (1983), 351–369. https://doi.org/10.1007/BF00276521 doi: 10.1007/BF00276521
    [101] C. J. Roussel, M. R. Roussel, Reaction-diffusion models of development with state-dependent chemical diffusion coefficients, Prog. Biophys. Mol. Biol., 86 (2004), 113–160. https://doi.org/10.1016/j.pbiomolbio.2004.03.001 doi: 10.1016/j.pbiomolbio.2004.03.001
    [102] D. Sulsky, R. R. Vance, W. I. Newman, Time delays in age-structured populations, J. Theor. Biol., 141 (1989), 403–422. https://doi.org/10.1016/S0022-5193(89)80122-5 doi: 10.1016/S0022-5193(89)80122-5
    [103] G. Bel, B. Munsky, I. Nemenman, The simplicity of completion time distributions for common complex biochemical processes, Phys. Biol., 7 (2010), 016003. https://doi.org/10.1088/1478-3975/7/1/016003 doi: 10.1088/1478-3975/7/1/016003
    [104] P. Billingsley, Probability and Measure, Wiley, New York, 1995.
    [105] G. Bar-Nahum, V. Epshtein, A. E. Ruckenstein, R. Rafikov, A. Mustaev, E. Nudler, A ratchet mechanism of transcription elongation and its control, Cell, 120 (2005), 183–193. https://doi.org/10.1016/j.cell.2004.11.045 doi: 10.1016/j.cell.2004.11.045
    [106] J. W. Shaevitz, E. A. Abbondanzieri, R. Landick, S. M. Block, Backtracking by single RNA polymerase molecules observed at near-base-pair resolution, Nature, 426 (2003), 684–687. https://doi.org/10.1038/nature02191 doi: 10.1038/nature02191
    [107] M. A. Gibson, J. Bruck, Efficient exact stochastic simulation of chemical systems with many species and many channels, J. Phys. Chem. A, 104 (2000), 1876–1889. https://doi.org/10.1021/jp993732q doi: 10.1021/jp993732q
    [108] H. T. Banks, J. Catenacci, S. Hu, A comparison of stochastic systems with different types of delays, Stoch. Anal. Appl., 31 (2013), 913–955. https://doi.org/10.1080/07362994.2013.806217 doi: 10.1080/07362994.2013.806217
    [109] Y.-L. Feng, J.-M. Dong, X.-L. Tang, Non-Markovian effect on gene transcriptional systems, Chin. Phys. Lett., 33 (2016), 108701. https://doi.org/10.1088/0256-307X/33/10/108701
    [110] J. Lloyd-Price, A. Gupta, A. S. Ribeiro, SGNS2: A compartmental stochastic chemical kinetics simulator for dynamic cell populations, Bioinformatics, 28 (2012), 3004–3005. https://doi.org/10.1093/bioinformatics/bts556 doi: 10.1093/bioinformatics/bts556
    [111] T. Maarleveld, StochPy User Guide, Release 2.3.0, 2015, URL https://sourceforge.net/projects/stochpy/files/stochpy_userguide_2.3.pdf/download.
    [112] A. S. Ribeiro, J. Lloyd-Price, SGN Sim, a stochastic genetic networks simulator, Bioinformatics, 23 (2007), 777–779. https://doi.org/10.1093/bioinformatics/btm004 doi: 10.1093/bioinformatics/btm004
    [113] D. T. Gillespie, A general method for numerically simulating the stochastic time evolution of coupled chemical reactions, J. Comput. Phys., 22 (1976), 403–434. https://doi.org/10.1016/0021-9991(76)90041-3 doi: 10.1016/0021-9991(76)90041-3
    [114] R. J. Sims Ⅲ, R. Belotserkovskaya, D. Reinberg, Elongation by RNA polymerase Ⅱ: the short and long of it, Genes Dev., 18 (2004), 2437–2468. https://doi.org/10.1101/gad.1235904 doi: 10.1101/gad.1235904
    [115] E. A. M. Trofimenkoff, M. R. Roussel, Small binding-site clearance delays are not negligible in gene expression modeling, Math. Biosci., 325 (2020), 108376. https://doi.org/10.1016/j.mbs.2020.108376 doi: 10.1016/j.mbs.2020.108376
    [116] V. Epshtein, E. Nudler, Cooperation between RNA polymerase molecules in transcription elongation, Science, 300 (2003), 801–805. https://doi.org/10.1126/science.1083219 doi: 10.1126/science.1083219
    [117] C. Jia, L. Y. Wang, G. G. Yin, M. Q. Zhang, Single-cell stochastic gene expression kinetics with coupled positive-plus-negative feedback, Phys. Rev. E, 100 (2019), 052406. https://doi.org/10.1103/PhysRevE.100.052406
    [118] J. Szavits-Nossan, R. Grima, Uncovering the effect of RNA polymerase steric interactions on gene expression noise: Analytical distributions of nascent and mature RNA numbers, Phys. Rev. E, 108 (2023), 034405. https://doi.org/10.1103/PhysRevE.108.034405 doi: 10.1103/PhysRevE.108.034405
    [119] P. Bokes, J. R. King, A. T. A. Wood, M. Loose, Transcriptional bursting diversifies the behaviour of a toggle switch: Hybrid simulation of stochastic gene expression, Bull. Math. Biol., 75 (2013), 351–371. https://doi.org/10.1007/s11538-013-9811-z doi: 10.1007/s11538-013-9811-z
    [120] M. Dobrzyński, F. J. Bruggeman, Elongation dynamics shape bursty transcription and translation, Proc. Natl. Acad. Sci. U.S.A., 106 (2009), 2583–2588. https://doi.org/10.1073/pnas.0803507106 doi: 10.1073/pnas.0803507106
    [121] L. Cai, N. Friedman, X. S. Xie, Stochastic protein expression in individual cells at the single molecule level, Nature, 440 (2006), 358–362. https://doi.org/10.1038/nature04599 doi: 10.1038/nature04599
    [122] A. J. M. Larsson, P. Johnsson, M. Hagemann-Jensen, L. Hartmanis, O. R. Faridani, B. Reinius, et al., Genomic encoding of transcriptional burst kinetics, Nature, 565 (2019), 251–254. https://doi.org/10.1038/s41586-018-0836-1 doi: 10.1038/s41586-018-0836-1
    [123] Y. Wan, D. G. Anastasakis, J. Rodriguez, M. Palangat, P. Gudla, G. Zaki, et al., Dynamic imaging of nascent RNA reveals general principles of transcription dynamics and stochastic splice site selection, Cell, 184 (2021), 2878–2895. https://doi.org/10.1016/j.cell.2021.04.012 doi: 10.1016/j.cell.2021.04.012
    [124] C. Jia, Y. Li, Analytical time-dependent distributions for gene expression models with complex promoter switching mechanisms, SIAM J. Appl. Math., 83 (2023), 1572–1602. https://doi.org/10.1137/22M147219X doi: 10.1137/22M147219X
    [125] B. W. Lindgren, G. W. McElrath, D. A. Berry, Introduction to Probability and Statistics, 154–156, 4th edition, Macmillan, New York, 1978.
    [126] M. Janisch, Kolmogorov's strong law of large numbers holds for pairwise uncorrelated random variables, Theory Probab. Appl., 66 (2021), 263–275. https://doi.org/10.4213/tvp5459 doi: 10.4213/tvp5459
    [127] B. Li, J. A. Weber, Y. Chen, A. L. Greenleaf, D. S. Gilmour, Analyses of promoter-proximal pausing by RNA polymerase Ⅱ on the hsp70 heat shock gene promoter in a Drosophila nuclear extract, Mol. Cell. Biol., 16 (1996), 5433–5443. https://doi.org/10.1128/MCB.16.10.5433 doi: 10.1128/MCB.16.10.5433
    [128] R.-J. Murphy, Stochastic Modeling of the Torpedo Mechanism of Eukaryotic Transcription Termination, Master's thesis, University of Lethbridge, 2017, URL https://www.uleth.ca/dspace/handle/10133/4906.
    [129] B. Choi, Y.-Y. Cheng, S. Cinar, W. Ott, M. R. Bennett, K. Josić, et al., Bayesian inference of distributed time delay in transcriptional and translational regulation, Bioinformatics, 36 (2020), 586–593. https://doi.org/10.1093/bioinformatics/btz574 doi: 10.1093/bioinformatics/btz574
    [130] H. Hong, M. J. Cortez, Y.-Y. Cheng, H. J. Kim, B. Choi, K. Josić, et al., Inferring delays in partially observed gene regulation processes, Bioinformatics, 39 (2023), btad670. https://doi.org/10.1093/bioinformatics/btad670 doi: 10.1093/bioinformatics/btad670
    [131] D. Holcman, Z. Schuss, The narrow escape problem, SIAM Rev., 56 (2014), 213–257. https://doi.org/10.1137/120898395
    [132] M. R. Roussel, T. Tang, Simulation of mRNA diffusion in the nuclear environment, IET Syst. Biol., 6 (2012), 125–133. https://doi.org/10.1049/iet-syb.2011.0032 doi: 10.1049/iet-syb.2011.0032
    [133] S. Tang, Mathematical Modeling of Eukaryotic Gene Expression, PhD thesis, University of Lethbridge, 2010, URL https://www.uleth.ca/dspace/handle/10133/2567.
© 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)