
The motif discovery problem (MDP) is a well-known problem in biology that seeks to find transcription factor binding sites (TFBSs) in DNA sequences. On the one hand, biological knowledge about motif sites is limited; on the other hand, the problem is NP-hard. Thus, no efficient procedure is capable of finding motifs in every dataset. Some algorithms use exhaustive search, which is very time-consuming for large-scale datasets. Metaheuristic procedures, by contrast, seem to be a good choice for quickly finding motifs that have at least some acceptable biological properties. Most previous methods model the problem as a single-objective optimization problem; however, modeling it with multiple objectives improves the quality of the obtained motifs. Some multi-objective optimization models for MDP simultaneously maximize three objectives: motif length, support, and similarity. In this study, a multi-objective version of the Imperialist Competitive Algorithm (ICA) is adopted as an approximation algorithm for this problem. ICA can simulate broader exploration of the solution space and thus avoid becoming trapped in local optima, so it promises to obtain good solutions in reasonable time. Experimental results show that our method produces good solutions compared with well-known algorithms in the literature, according to computational and biological indicators.
Citation: Saeed Alirezanejad Gohardani, Mehri Bagherian, Hamidreza Vaziri. A multi-objective imperialist competitive algorithm (MOICA) for finding motifs in DNA sequences[J]. Mathematical Biosciences and Engineering, 2019, 16(3): 1575-1596. doi: 10.3934/mbe.2019075
Information about motifs provides significant knowledge about evolutionary processes and the complexity of different organisms. Biologists believe that special proteins called transcription factors (TFs) bind to particular patterns of DNA substrings called transcription factor binding sites (TFBSs), which initiates the process of gene expression [1]. During this process, genes are transcribed into RNA and are activated or deactivated. These regulatory sites in DNA strings correspond to conserved sequence patterns called motifs, and the individual regulatory sites (TFBSs) are called occurrences. Finding these sites is difficult. However, discovering them helps molecular biologists investigate the interaction between DNA and proteins, gene regulation, cell development, and cellular responses under physiological and pathological conditions. Occurrences have a fixed length but may differ slightly in composition from their motif.
This is one of the well-known problems in molecular biology. Since it has been proven to be NP-hard [2], various methods and algorithms have been proposed to solve it. There are two classes of procedures for finding motifs: pattern-driven and sample-driven approaches. Pattern-driven procedures search all $|\Sigma|^l$ candidate motifs of length l, whereas sample-driven approaches explore only the patterns that occur in the given instance [3]. Methods for MDP can also be classified into two main groups: exact and approximate methods. Exact methods are very time-consuming, while approximate methods give good results in reasonable time, so the latter have attracted more attention from the research community. Heuristic procedures, a kind of approximate procedure, have been widely used in the literature because they are fast and give reasonable answers [4]. Numerous existing algorithms identify motifs of a given length, but they are either not applicable or not efficient when searching for motifs of different lengths. Many algorithms aim to find motifs with and without gaps in the given sequences. Locating gapped motifs is time-consuming because it relies on combinatorial approaches [5]. Our proposed procedure targets fixed-length motifs without gaps, and we do not study extensible-length motifs here.
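The following minimal Python sketch, which is not taken from the paper, makes the $|\Sigma|^l$ cost of pattern-driven search concrete. It enumerates every DNA word of length l and counts how many sequences contain that word with at most d mismatches. The function names and the mismatch budget d are illustrative assumptions.

```python
from itertools import product

ALPHABET = "ACGT"  # Σ for DNA

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))

def occurs(pattern: str, seq: str, d: int) -> bool:
    """True if some length-l window of seq is within d mismatches of pattern."""
    l = len(pattern)
    return any(hamming(pattern, seq[k:k + l]) <= d
               for k in range(len(seq) - l + 1))

def pattern_driven_search(seqs: list[str], l: int, d: int) -> tuple[str, int]:
    """Try all |Σ|^l candidate patterns and return the one present in the most
    sequences (the first one in enumeration order wins ties)."""
    best, best_count = "", -1
    for letters in product(ALPHABET, repeat=l):  # 4**l candidates
        pattern = "".join(letters)
        count = sum(occurs(pattern, s, d) for s in seqs)
        if count > best_count:
            best, best_count = pattern, count
    return best, best_count

seqs = ["TTACGTAA", "GACGTCCG", "CCCACGTT"]
print(pattern_driven_search(seqs, l=4, d=0))  # ('ACGT', 3)
print(len(ALPHABET) ** 7)                      # 16384 candidates for l = 7
```

The outer loop runs $4^l$ times. At motif lengths of several dozen, exhaustive enumeration is therefore out of reach, which is why sample-driven and approximate methods are attractive.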
To find real biological motifs, researchers have considered several assumptions that reflect the nature of motifs. Kaya [6] lists some of them:
1. Each DNA sequence has an occurrence.
2. Not every DNA sequence has an occurrence. This is closer to the nature of motifs.
3. Sequences in the dataset have more than one occurrence.
4. The sequences contain more than one motif.
A suitable algorithm has been proposed for each of these assumptions, but we seek an algorithm whose behavior is closer to the nature of motifs. In fact, a combination of these assumptions can help us find reasonable answers. In the literature, procedures for solving MDP have been classified into two main groups: statistical techniques and string-based methods [7]. Significant statistical algorithms include the expectation maximization (EM) algorithm of Lawrence and Reilly [8] and Gibbs sampling [9], which use a greedy approach. These algorithms use elaborate statistical techniques to find motifs. MEME (Multiple EM for Motif Elicitation) is an extended EM algorithm for discovering motifs in unaligned biopolymer sequences [10]. AlignACE (Aligns Nucleic Acid Conserved Elements) finds occurrences in a set of DNA sequences by searching for a profile motif [11]. An important assumption of this procedure is that the motif model is the one most different from the background model. Such methods therefore maximize either the likelihood ratio of the motif model to the background model or the information content of the motif model. Log-likelihood and information content are the two scoring functions used for profile motif models. MotifHyades proposed a novel probabilistic model and two derived optimization algorithms to find paired motifs in DNA sequences from human cell lines [12]. In [13], k-spectrum modeling was used to capture DNA motif patterns from protein sequences. The method was evaluated with multiple metrics on millions of k-mer binding intensities from 92 proteins across 5 DNA-binding families.
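The information-content score for profile motifs can be illustrated by computing the relative entropy of a set of aligned occurrences against a background distribution. The sketch below uses a generic textbook formulation, assumed here for illustration. The uniform background and the function name are our assumptions and do not reproduce AlignACE's exact scoring.

```python
import math
from collections import Counter

def information_content(occurrences: list[str], alphabet: str = "ACGT") -> float:
    """Sum over columns of sum_b p_b * log2(p_b / q_b), where p_b is the
    column frequency of letter b and q_b = 1/|alphabet| is a uniform background."""
    q = 1.0 / len(alphabet)
    total = 0.0
    for column in zip(*occurrences):          # one tuple per motif position
        counts = Counter(column)
        for b in alphabet:
            p = counts[b] / len(column)
            if p > 0:                          # convention: 0 * log 0 = 0
                total += p * math.log2(p / q)
    return total

print(information_content(["ACGT", "ACGT"]))      # 8.0: 2 bits per fully conserved column
print(information_content(["A", "C", "G", "T"]))  # 0.0: column identical to the background
```

A fully conserved DNA column contributes $\log_2 4 = 2$ bits. A column whose letter frequencies equal the background contributes nothing, which is why maximizing this score favors motifs that differ most from the background.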
The second class consists of string-based algorithms. These algorithms build the motif directly, extending a substring from length zero up to l. A tree of depth l is constructed in which a node at depth k represents a k-length prefix of the motif. Algorithms in this category include SPELLER [3], Weeder [14], MITRA [15], CENSUS [16], GENMOTIF [17], and RISOTTO [18]. Metaheuristic algorithms such as the genetic, firefly, artificial bee colony, and gravitational search algorithms are inspired by natural phenomena. Among them, swarm intelligence algorithms model a group of individuals that move independently while the group stays together and gradually converges to the optimal solution(s). These are iterative methods in which candidate solutions are gradually improved. Most metaheuristic algorithms for MDP use a single objective function, for example MEME, MITRA, Weeder, and Consensus [19]. Kaya proposed MOGAMOD [6] with three objective functions and demonstrated that treating MDP as a multi-objective optimization problem yields better motifs than single-objective formulations. MOGAMOD optimizes the length, support, and similarity of the motif simultaneously. These objectives conflict with each other, so finding a single solution that optimizes all of them at once is practically impossible; this calls for the notion of Pareto optimality. The Multi-objective Artificial Bee Colony (MOABC) algorithm was shown to obtain the best results on a group of datasets among different approaches in the literature [20]. Like MOGAMOD, MOABC uses three objectives to find motifs. The same authors also proposed a parallel version of MOABC for protein strings [21]. As in any multi-objective optimization approach, MOABC returns multiple solutions at the end.
This characteristic gives multi-objective algorithms an advantage over single-objective ones. In this paper, we propose a multi-objective imperialist competitive algorithm (MOICA) for MDP that considers the three objectives mentioned above. The Imperialist Competitive Algorithm has recently attracted researchers' attention as a way to solve optimization problems more efficiently.
For comparison purposes, we used several measures to demonstrate the efficiency of MOICA. We used the hypervolume indicator [22], which measures the volume of the objective space dominated by a Pareto front. We also computed biological indicators such as nCC, nPC, nSn, nPPV, sSn, sPPV, sASP, and nSp to demonstrate the biological quality of our results [23]. Finally, we considered the motifs extracted from the same instances as in [6,7,20].
This paper is structured as follows. Section 1 is the introduction. Section 2 briefly defines the motif discovery problem (MDP), and Section 3 describes its multi-objective version. Section 4 describes ICA and our adaptation of it to the multi-objective MDP. Section 5 presents the experimental results obtained with MOICA and compares them with existing approximation algorithms. Finally, the last section presents our conclusions.
Consider n sequences $s_1, s_2, \dots, s_n$ of length m over the alphabet Σ. We define $s_i[k, l]$ as the subsequence of $s_i$ of length l starting at position k, and $s_i[k]$ as the single letter of $s_i$ at position k. In this paper, we use the terms string(s) and sequence(s) interchangeably. A motif is represented by its occurrences in the input sequences. We denote the occurrences of one motif by a vector $x = (x_1, x_2, \dots, x_n)$. Here $x_i \in \{0, 1, 2, \dots, m-l+1\}$ is the starting position of an occurrence in $s_i$, and l is the length of each occurrence. We assume that each sequence has a probability of being selected, and we set $x_i = 0$ when no occurrence is selected in $s_i$. Thus, the ith sequence has no occurrence when $x_i = 0$ (Figure 1). In this paper, this selection probability is called the rate. For example, a sequence with rate = 40% has less chance of being selected than one with rate = 60%. Metaheuristic algorithms start by creating a population of individuals. Using each such vector x as an individual, we extract the occurrences from the sequences according to the motif length (Figure 1). As in [24], the motif length in our algorithm varies between 7 and 64. To obtain the consensus motif, we align all occurrences under each other (Figure 1). Taking the most frequent letter in each column then yields the consensus motif (Figure 1). We define two variables to describe the consensus precisely. The first is $w_i = \max_{\sigma \in \Sigma} f_i^\sigma$, the frequency of the most frequent letter in the ith column of the occurrence matrix. The second is $cw_i$, the corresponding letter, so that $\mathrm{consensus}(x) = cw_1 cw_2 \cdots cw_l$.
After finding the consensus motif, we compute the concordance of each occurrence. Concordance is a normalized value that measures the resemblance between the consensus motif and the occurrence (Figure 1).
$$\mathrm{concordance}(x_i) = \frac{l - d_H\big(s_i[x_i, l],\ \mathrm{consensus}(x)\big)}{l}, \qquad i = 1, 2, \dots, n,\quad x_i \neq 0$$
where $d_H$ is the Hamming distance between two sequences of equal length, i.e., the number of positions at which they differ. As in other algorithms in the literature, the concordance threshold in our algorithm is 0.5: if the concordance of the ith occurrence is less than 0.5, we set $x_i = 0$. The number of $x_i > 0$ is the support of the motif, so the support depends on the motif's occurrences. It can be written mathematically as follows:
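$$\mathrm{support}(x) = \sum_{i=1}^{n} u_{x_i}$$
where $u_{x_i} = 1$ if $x_i \neq 0$ and $u_{x_i} = 0$ otherwise. The similarity of the motif is
$$\mathrm{similarity}(x) = \frac{\sum_{i=1}^{l} w_i}{\mathrm{support}(x)\, l}$$
That is, the similarity is the average of the maximum letter frequencies over the l columns of a potential motif of length l (Figure 1).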
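The definitions of consensus, concordance, support, and similarity can be combined in a short Python sketch. It rests on several assumptions that the paper does not state: start positions are 1-based with $x_i = 0$ meaning "no occurrence", consensus ties are broken alphabetically, and similarity is computed after the 0.5 concordance threshold is applied. All function names are also ours.

```python
from collections import Counter

ALPHABET = "ACGT"

def occurrences(seqs: list[str], x: list[int], l: int) -> list[str]:
    """s_i[x_i, l] for every selected sequence (x_i is 1-based; 0 = no occurrence)."""
    return [s[xi - 1:xi - 1 + l] for s, xi in zip(seqs, x) if xi != 0]

def consensus(occs: list[str]) -> str:
    """cw_1 ... cw_l: the most frequent letter (count w_i) of each column."""
    letters = []
    for column in zip(*occs):
        counts = Counter(column)
        # max() keeps the first maximum, so ties go to the earliest letter in ALPHABET.
        letters.append(max(ALPHABET, key=lambda b: counts[b]))
    return "".join(letters)

def concordance(occ: str, cons: str) -> float:
    """(l - d_H(occ, consensus)) / l, with d_H the Hamming distance."""
    l = len(cons)
    d_h = sum(a != b for a, b in zip(occ, cons))
    return (l - d_h) / l

def apply_threshold(seqs: list[str], x: list[int], l: int, threshold: float = 0.5) -> list[int]:
    """Set x_i = 0 for every occurrence whose concordance is below the threshold."""
    cons = consensus(occurrences(seqs, x, l))
    return [0 if xi != 0 and concordance(s[xi - 1:xi - 1 + l], cons) < threshold else xi
            for s, xi in zip(seqs, x)]

def support(x: list[int]) -> int:
    """Sum of u_{x_i}: the number of sequences with a selected occurrence."""
    return sum(1 for xi in x if xi != 0)

def similarity(seqs: list[str], x: list[int], l: int) -> float:
    """sum_i w_i / (support(x) * l), w_i = count of the most frequent letter in column i."""
    occs = occurrences(seqs, x, l)
    w = [max(Counter(column).values()) for column in zip(*occs)]
    return sum(w) / (support(x) * l)

seqs = ["GGACGTTT", "ACGAAAAA", "TTTTACCT", "TTTTTTTT"]
x = [3, 1, 5, 1]
print(consensus(occurrences(seqs, x, 4)))  # ACGT
x = apply_threshold(seqs, x, 4)            # "TTTT" has concordance 0.25 and is dropped
print(x, support(x), similarity(seqs, x, 4))  # [3, 1, 5, 0], support 3, similarity 10/12
```

In this example, the surviving occurrences ACGT, ACGA, and ACCT have column maxima 3, 3, 2, 2. The similarity is therefore $10 / (3 \cdot 4)$.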
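We computed the similarity of the example motif in Figure 1. Another concept used in our algorithm is complexity, based on Fogel [25]. The complexity of a string of length l over |Σ| different letters is computed as follows:
$$\mathrm{Complexity}(x_i) = \log_{|\Sigma|} \frac{l!}{n_{i,1}!\, n_{i,2}! \cdots n_{i,|\Sigma|}!}, \qquad i = 1, 2, \dots, n$$
where $n_{i,j}$ is the number of occurrences of the jth letter of Σ in the selected subsequence $s_i[x_i, l]$, and |Σ| = 4 for DNA. For example, the complexity of the string "AAAAAAA" is $\log_4(7!/7!) = \log_4 1 = 0$. The complexity of a motif is the average complexity of its occurrences (terms with $x_i = 0$ are omitted):
$$\mathrm{complexity}(x) = \frac{\sum_{i=1}^{n} \mathrm{Complexity}(x_i)}{\mathrm{support}(x)}$$
For a motif whose two occurrences are "AAAAAAA" and "AAATAAA", the complexity is $\big(0 + \log_4 \frac{7!}{6!\,1!}\big)/2 = (\log_4 7)/2 \approx 0.70$.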
Our goal in this paper is to maximize the length, support, and similarity of the motif simultaneously while avoiding low-complexity results. These three values (length, support, and similarity) are our objectives, and the fitness value of an individual is assessed based on them. We also use the fitness values of individuals to compute the fitness value of a group of individuals. These two values help us identify the weakest and the strongest individuals and groups of individuals in our method.
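The complexity measure defined above can be sketched in Python as follows. The function names are hypothetical. The paper does not give the numerical cut-off for "low complexity", so this sketch only computes the values and applies no threshold.

```python
import math
from collections import Counter

def string_complexity(occ: str, alphabet_size: int = 4) -> float:
    """Complexity(x_i) = log_{|Σ|}( l! / (n_{i,1}! n_{i,2}! ... n_{i,|Σ|}!) )."""
    multinomial = math.factorial(len(occ))
    for count in Counter(occ).values():        # absent letters contribute 0! = 1
        multinomial //= math.factorial(count)  # exact: every partial quotient is an integer
    return math.log(multinomial, alphabet_size)

def motif_complexity(occs: list[str]) -> float:
    """complexity(x): average Complexity(x_i) over the support(x) selected occurrences."""
    return sum(string_complexity(o) for o in occs) / len(occs)

print(string_complexity("AAAAAAA"))              # 0.0
print(motif_complexity(["AAAAAAA", "AAATAAA"]))  # (0 + log_4 7) / 2, about 0.70
```

The multinomial coefficient counts the distinct arrangements of the string's letters. A single repeated letter has exactly one arrangement and hence complexity 0, while a well-mixed string such as "ACGT" reaches $\log_4 24$.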
Table 1 summarizes the variables and functions used to define the problem, together with their definitions.
Variable | Definition |
$n$ | Number of sequences |
$m$ | Length of each sequence |
$l$ | Length of the motif |
$s_i$ | ith sequence, of length m, over the alphabet Σ |
$s_i[k, l]$ | Subsequence of $s_i$ of length l starting at position k |
$s_i[k]$ | Single letter of $s_i$ at position k |
$x_i$ | Starting position of an occurrence in $s_i$; $x_i \in \{0, 1, 2, \dots, m-l+1\}$ |
$x$ | An individual, $x = (x_1, x_2, \dots, x_n)$ |
$f_i^\sigma$ | Frequency of the letter $\sigma \in \Sigma$ in the ith column of the occurrence matrix |
$w_i$ | $w_i = \max_{\sigma \in \Sigma} f_i^\sigma$ |
$cw_i$ | The letter corresponding to $w_i$ |
$n_{i,j}$ | Number of occurrences of the jth letter of Σ in the selected subsequence of $s_i$ |
$u_{x_i}$ | 1 if $x_i \neq 0$, otherwise 0 |
δ | |
consensus(x) | $\mathrm{consensus}(x) = cw_1 cw_2 \cdots cw_l$ |
concordance($x_i$) | $\big(l - d_H(s_i[x_i, l], \mathrm{consensus}(x))\big)/l$ |
support(x) | $\sum_{i=1}^{n} u_{x_i}$ |
similarity(x) | $\sum_{i=1}^{l} w_i \,/\, (\mathrm{support}(x)\, l)$ |
Complexity($x_i$) | $\log_{|\Sigma|}\big(l! / (n_{i,1}! \cdots n_{i,|\Sigma|}!)\big)$ |
complexity(x) | $\sum_{i=1}^{n} \mathrm{Complexity}(x_i) \,/\, \mathrm{support}(x)$ |
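We can express the problem mathematically as:
$$\max\ \big(l,\ \mathrm{support}(x),\ \mathrm{similarity}(x)\big)$$
$$\text{s.t.}\quad x_i \in \{0, 1, \dots, m-l+1\},\ i = 1, \dots, n; \qquad l_{\min} \le l \le l_{\max}$$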
A single-objective optimization problem is defined as:
$$\max\ f(x) \qquad \text{s.t.}\quad g_j(x) \le 0,\ j = 1, \dots, J; \qquad h_k(x) = 0,\ k = 1, \dots, K$$
where f(x) is a single function and $g_j(x)$, $h_k(x)$ are its constraints. In multi-objective optimization problems (MOOPs), a vector of functions has to be optimized. The aim is to find the best value for all components, although the components of the objective vector may conflict with each other, so that improving one component may worsen others.
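The general form of a MOOP can be written as:
$$\max\ f(x) = \big(f_1(x), f_2(x), \dots, f_n(x)\big)$$
$$\text{s.t.}\quad g_j(x) \le 0,\ j = 1, \dots, J; \qquad h_k(x) = 0,\ k = 1, \dots, K$$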
where f(x) is a vector with n components and $g_j(x)$, $h_k(x)$ are its constraints. Solving a MOOP yields multiple solutions, none of which is better than the others in every objective. Such solutions are called non-dominated solutions. Formally, x dominates y, written $f(x) \succ f(y)$, if and only if f(x) is no worse than f(y) in every component and strictly better in at least one. A solution x is Pareto-optimal (non-dominated) if no y in the feasible space satisfies $f(y) \succ f(x)$. The set of Pareto-optimal solutions is called the Pareto-optimal set, and solving a MOOP produces such a set [26]. The image of the Pareto-optimal set in objective space is called the Pareto front. Our goal is to approximate the Pareto front as well as possible.
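The dominance relation and the resulting non-dominated set translate directly into code. This minimal Python sketch assumes that every objective is maximized, as in the motif model (length, support, similarity). The function names are illustrative.

```python
def dominates(fx: tuple[float, ...], fy: tuple[float, ...]) -> bool:
    """f(x) dominates f(y): no worse in every component, strictly better in one."""
    return (all(a >= b for a, b in zip(fx, fy))
            and any(a > b for a, b in zip(fx, fy)))

def non_dominated(points: list[tuple[float, ...]]) -> list[tuple[float, ...]]:
    """Keep the points that no other point dominates (the Pareto-optimal set)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Objective vectors (length, support, similarity)
front = non_dominated([(7, 3, 0.8), (7, 3, 0.7), (8, 2, 0.6), (6, 3, 0.8)])
print(front)  # [(7, 3, 0.8), (8, 2, 0.6)]
```

(7, 3, 0.8) and (8, 2, 0.6) are incomparable: the first has better support and similarity, the second a longer motif. Both therefore survive, while the other two vectors are dominated by (7, 3, 0.8).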
Many metaheuristic algorithms based on natural phenomena have been proposed for optimization problems. Examples include the genetic algorithm, inspired by genetic evolution; ant colony optimization, which mimics the search behavior of ants; and fish swarm optimization, based on how fish look for food. Among these algorithms, ICA has attracted researchers' attention for its ability to find good solutions to NP-hard problems. In the next section, we explain ICA in detail; this is the main part of our paper.
In swarm algorithms, a group of individuals works collaboratively to solve a problem. To start the algorithm, initial individuals (countries) are created at random, and the cost of each individual (country) is calculated from its objective function value. One advantage of ICA is its ability to explore the solution space more widely.
The Imperialist Competitive Algorithm, proposed by Atashpaz-Gargari et al. in 2007 [27], was inspired by imperialistic competition in the real world. The algorithm has a few simple parameters, such as the number of countries, the number of imperialists, and the maximum number of iterations. To solve the multi-objective MDP, we propose a new algorithm based on ICA. As in other evolutionary algorithms, the first step is to create an initial population. All countries are then classified into two groups, imperialists and colonies: some of the best countries are selected as imperialists, and the rest are called colonies. The colonies are divided among the imperialists according to their power. This creates a group of empires that compete with each other to capture more countries. After this assimilation process, two main processes take place: intra competition and extra competition. In intra competition, each colony moves toward its own imperialist, and the imperialist also tries to increase its power. During this process, a colony replaces its imperialist if the colony becomes more powerful. In extra competition, the algorithm finds the weakest colony in the weakest empire and adds it to one of the strongest empires. The total power of an empire depends on the power of its imperialist plus a percentage of the mean power of its colonies.
These two processes repeat until the empires gradually collapse and all countries finally converge into a single empire.
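The description above says that colonies are divided among imperialists according to their power, but it does not fix the rule. The Python sketch below assumes one common choice: roulette-wheel assignment, in which each colony joins an imperialist with probability proportional to that imperialist's power. `form_empires` is a hypothetical name.

```python
import random

def form_empires(imperialist_power: list[float], colonies: list[int],
                 rng: random.Random) -> list[list[int]]:
    """Assign each colony to an empire with probability proportional to the
    power of that empire's imperialist; return one colony list per empire."""
    empires: list[list[int]] = [[] for _ in imperialist_power]
    indices = range(len(imperialist_power))
    for colony in colonies:
        k = rng.choices(indices, weights=imperialist_power)[0]  # roulette wheel
        empires[k].append(colony)
    return empires

rng = random.Random(0)
empires = form_empires([3.0, 1.0, 1.0], list(range(10)), rng)
print([len(e) for e in empires])  # stronger imperialists tend to receive more colonies
```

A deterministic alternative allocates colonies by rounding each imperialist's normalized power times nCol. The random version shown here is simply shorter and always assigns every colony exactly once.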
In this work, we consider this particular intelligent behaviour, which causes countries to rise or collapse. Individuals in this algorithm are similar to those of MOGAMOD [6] and MOABC [20]. The main steps of the algorithm are given below:
Algorithm MOICA:
1. Input: n sequences, the selection probability (rate), the population size (nPop), the number of empires (nEmp), and the maximum number of iterations (MaxIt).
2. Create nPop countries with a fixed length $l \in [l_{\min}, l_{\max}]$ and compute the support and similarity of each.
3. Sort all motifs (countries) by non-dominated sorting.
4. Assimilation: select the nEmp best motifs (countries) as imperialists and divide the remaining nCol countries among them according to the imperialists' positions in Step 3.
5. Intra competition: apply a revolution to the colonies and imperialists of each empire, and replace each imperialist by a colony if the colony is better.
6. Sort all motifs (countries) by non-dominated sorting.
7. Compute the fitness of each empire and find the weakest empire.
8. Extra competition: select the weakest motif (country) from the weakest empire and give it to the empire most likely to possess it.
9. Repeat Steps 5–8 MaxIt times.
10. Output: the non-dominated set.
MaxIt, the n sequences, rate, nPop, and nEmp are given as input. The minimum and maximum motif lengths, $l_{\min}$ and $l_{\max}$, must also be known so that the search stays within this range. The algorithm starts by creating and evaluating the initial countries (Step 2), and this step is repeated for each motif length. At this stage we impose two conditions. First, the support of each individual (country) cannot be less than 2 for n = 2, 3 and cannot be less than 3 for n ≥ 4. Second, no individual (country) with low complexity may be produced. Any individual that violates either condition is discarded and regenerated. The main process of MOICA starts after Step 2.
After non-dominated sorting (Step 3), the nEmp best countries become imperialists, and the remaining countries are assigned to them as colonies (Step 4). The rest of the algorithm can be divided into three blocks: intra competition, non-dominated sorting, and extra competition. In the intra competition block, we create new colonies by moving colonies toward their own imperialists according to their distances from them. We also create a new imperialist by mutation (both operators are described at the end of this section). The imperialist is replaced by one of its colonies, or by the newly created imperialist, if that candidate dominates the imperialist in support and similarity and its complexity is not low (Step 5). In the second block, we apply non-dominated sorting to rank all countries (Step 6). We then calculate the fitness value of each country as:
$$\mathrm{fitness} = \frac{nPop - 1}{rank}, \qquad rank = 1, 2, \dots, nPop$$
We use the fitness value of each country to calculate the fitness (total power) of an empire. The fitness of an empire is the fitness of its imperialist plus ζ percent of the average fitness of its colonies. In MOICA, we take ζ = 30%.
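Steps 6 and 7 can be sketched as follows. Non-dominated sorting peels off successive Pareto fronts to assign ranks. Reading the fitness formula above as (nPop − 1)/rank, each country then receives its fitness, and an empire's power is its imperialist's fitness plus ζ = 30% of its colonies' mean fitness. This is a minimal, quadratic-time sketch with hypothetical function names, not the authors' C++ implementation. Treating an empire without colonies as worth only its imperialist's fitness is our assumption.

```python
def dominates(fx: tuple[float, ...], fy: tuple[float, ...]) -> bool:
    """Maximization dominance: no worse everywhere, strictly better somewhere."""
    return (all(a >= b for a, b in zip(fx, fy))
            and any(a > b for a, b in zip(fx, fy)))

def nondominated_ranks(objs: list[tuple[float, ...]]) -> list[int]:
    """Rank 1 = first Pareto front; rank r = front left after removing ranks < r."""
    ranks = [0] * len(objs)
    remaining = set(range(len(objs)))
    rank = 1
    while remaining:  # dominance is a strict partial order, so every front is non-empty
        front = {i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks

def country_fitness(rank: int, n_pop: int) -> float:
    """fitness = (nPop - 1) / rank, so better fronts receive larger fitness."""
    return (n_pop - 1) / rank

def empire_power(imp_fitness: float, colony_fitness: list[float], zeta: float = 0.30) -> float:
    """Imperialist fitness plus zeta times the mean fitness of the colonies."""
    if not colony_fitness:  # assumption: an empty empire is worth its imperialist only
        return imp_fitness
    return imp_fitness + zeta * sum(colony_fitness) / len(colony_fitness)

objs = [(7, 3, 0.8), (7, 3, 0.7), (8, 2, 0.6), (6, 3, 0.8)]  # (length, support, similarity)
ranks = nondominated_ranks(objs)
fit = [country_fitness(r, n_pop=160) for r in ranks]
print(ranks)                                   # [1, 2, 1, 2]
print(empire_power(fit[0], [fit[1], fit[3]]))  # 159 + 0.3 * 79.5
```

With nPop = 160, as in most rows of Table 2, first-front countries get fitness 159 and second-front countries 79.5.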
In the last block, extra competition finds the weakest colony in the weakest empire and adds it to one of the strongest empires (Steps 7 and 8). Both are identified from the fitness values of the empires and of the individual countries. Finally, we repeat this process MaxIt times to obtain a set of non-dominated imperialists, which we report as motifs.
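The extra competition step can be sketched as below. The paper gives the weakest colony of the weakest empire to the empire "most likely to possess it" but does not state the rule. This sketch assumes a roulette wheel over the other empires, weighted by their total power. As a simplification of how ICA handles collapsing empires, only empires that still have colonies can be chosen as the weakest. All names are hypothetical.

```python
import random

def extra_competition(empires: list[list[int]], fitness: dict[int, float],
                      power: list[float], rng: random.Random):
    """Move the weakest colony of the weakest empire to another empire chosen
    with probability proportional to its power. Mutates `empires` in place and
    returns (from_empire, to_empire, colony), or None if no move is possible."""
    candidates = [k for k, cols in enumerate(empires) if cols]
    if len(empires) < 2 or not candidates:
        return None
    weakest = min(candidates, key=lambda k: power[k])
    colony = min(empires[weakest], key=lambda c: fitness[c])
    others = [k for k in range(len(empires)) if k != weakest]
    winner = rng.choices(others, weights=[power[k] for k in others])[0]
    empires[weakest].remove(colony)
    empires[winner].append(colony)
    return weakest, winner, colony

empires = [[0, 1], [2], [3, 4]]
fitness = {0: 10.0, 1: 5.0, 2: 8.0, 3: 9.0, 4: 7.0}
power = [50.0, 20.0, 40.0]
print(extra_competition(empires, fitness, power, random.Random(7)), empires)
```

Here empire 1 is the weakest, so its only colony (country 2) moves to empire 0 or empire 2. Empire 0 is slightly more likely to win because it has more power.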
Finally, we explain how colonies are moved toward an imperialist and how a country (imperialist or colony) is mutated. Let Imp = $(a_1, a_2, \dots, a_n)$ and Col = $(b_1, b_2, \dots, b_n)$. To move Col toward Imp and create a new colony $(b'_1, b'_2, \dots, b'_n)$, we set:
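$$b'_i = \begin{cases} 0, & a_i = 0,\ \eta < prev \\ b_i, & a_i = 0,\ \eta > prev \\ a_i, & a_i \neq 0,\ b_i = 0 \\ b_i + \beta\eta(a_i - b_i), & a_i \neq 0,\ b_i \neq 0 \end{cases}$$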
where η ~ U(0, 1) is a uniform random number between 0 and 1. In MOICA, we take β = 2 and prev = 10%. When $a_i \neq 0$ and $b_i \neq 0$, $b'_i$ is kept within [1, m − l + 1]: if $b'_i < 1$ or $b'_i > m - l + 1$, it is replaced by 1 or m − l + 1, respectively.
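A minimal Python sketch of this move operator follows. The paper does not say whether η is drawn once per move or once per position, nor how the real-valued update becomes an integer position. This sketch assumes a fresh η for every position and rounds to the nearest integer before clamping. `move_toward` is a hypothetical name.

```python
import random

def move_toward(imp: list[int], col: list[int], m: int, l: int,
                rng: random.Random, beta: float = 2.0, prev: float = 0.10) -> list[int]:
    """Create a new colony b' by moving colony b = col toward imperialist a = imp."""
    hi = m - l + 1                       # last valid 1-based start position
    new = []
    for a, b in zip(imp, col):
        eta = rng.random()               # eta ~ U(0, 1)
        if a == 0:
            nb = 0 if eta < prev else b  # drop the occurrence with probability prev
        elif b == 0:
            nb = a                       # copy the imperialist's position
        else:
            nb = round(b + beta * eta * (a - b))
            nb = min(max(nb, 1), hi)     # clamp to [1, m - l + 1]
        new.append(nb)
    return new

print(move_toward([0, 5, 10, 0], [0, 0, 3, 4], m=20, l=7, rng=random.Random(42)))
```

With β = 2, the update $b_i + \beta\eta(a_i - b_i)$ ranges from $b_i$ to $2a_i - b_i$, so a colony can overshoot its imperialist. This matches the original ICA, where β > 1 lets colonies approach the imperialist from both sides.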
Mutation makes a revolution in a country (imperialist or colony). We produce a new country $(a'_1, a'_2, \dots, a'_n)$ as follows:
$$a'_i = \begin{cases} 0, & a_i = 0 \\ a_i + \sigma\theta, & a_i \neq 0 \end{cases}$$
Here θ ~ U(−1, 1) and σ = 0.1(m − l + 1). When $a_i \neq 0$, $a'_i$ is kept within [1, m − l + 1]: if $a'_i < 1$ or $a'_i > m - l + 1$, it is replaced by 1 or m − l + 1, respectively.
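The mutation (revolution) operator can be sketched in the same way. Rounding the perturbed position to an integer is again our assumption, and `mutate` is a hypothetical name.

```python
import random

def mutate(country: list[int], m: int, l: int, rng: random.Random) -> list[int]:
    """a'_i = a_i + sigma * theta for selected positions, with theta ~ U(-1, 1)
    and sigma = 0.1 * (m - l + 1); unselected positions (a_i = 0) stay 0."""
    hi = m - l + 1
    sigma = 0.1 * hi
    out = []
    for a in country:
        if a == 0:
            out.append(0)
        else:
            theta = rng.uniform(-1.0, 1.0)
            out.append(min(max(round(a + sigma * theta), 1), hi))  # clamp to [1, hi]
    return out

print(mutate([0, 5, 494], m=500, l=7, rng=random.Random(3)))
```

For a sequence of length m = 500 and a motif of length l = 7, σ = 49.4. A selected start position therefore moves by at most about 49 positions per mutation.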
In this section, we compare our algorithm with other significant algorithms in the literature. To show its efficiency, we performed experiments as in Kaya [6] and Gonzalez et al. [7,20] on real datasets from the TRANSFAC database [23]. We compare our algorithm with several significant methods in the literature, such as MEME [10], Weeder [14], AlignACE [11], MOABC [20], MO-VNS [28], DEPT [24], MO-FA [29], MO-GSA [30], SPEA2 [31], and NSGA-II [32]. The comparison uses three metrics: the hypervolume indicator, biological indicators, and motif finding. Our proposed algorithm is written in C++¹. The computations were performed on a laptop computer with an Intel(R) Core(TM) i3 CPU M 330 (2.13 GHz) and 4.0 GB of memory.
1 The source code would be available upon request.
One important performance measure for comparing MOICA with other multi-objective algorithms for finding TFBSs is the hypervolume indicator [22]. Gonzalez et al. [20] used hypervolume to show that MOABC mostly produces better results than the other algorithms in the literature. Hypervolume measures the volume of the objective space covered by the solutions an algorithm returns. To calculate it, (0, 0, 0) is selected as the worst point. To obtain statistically meaningful results, we ran our algorithm 31 times on each instance and report the average and standard deviation in Table 3. The reference volume of each instance depends on the number of sequences, the maximum motif length (64), and the maximum similarity, which is 1 (100%). For example, a dataset with 4 sequences has a reference volume of 4 × 64 × 1. Because the source code of the other algorithms is not available, we compared MOICA with the algorithms of [6,7] using approximately the same runtimes. To assess the power of our algorithm, we selected a combination of 54 real, generic, and Markov-chain datasets from TRANSFAC [23], with names ending in "r", "g", and "m", respectively. These datasets differ in the number and length of their sequences.
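The sketch below makes the indicator concrete. It computes the exact hypervolume of a small three-objective front with respect to the worst point (0, 0, 0) and divides it by the reference volume n × 64 × 1 described above. The objective order (support, length, similarity), the grid-based method, and the function names are our assumptions. Grid enumeration is practical only for small fronts.

```python
from itertools import product

def hypervolume_3d(front: list[tuple[float, float, float]]) -> float:
    """Volume of the union of boxes [0, p] over p in front (all objectives
    maximized, reference point (0, 0, 0), coordinates assumed >= 0)."""
    xs = sorted({0.0, *(p[0] for p in front)})
    ys = sorted({0.0, *(p[1] for p in front)})
    zs = sorted({0.0, *(p[2] for p in front)})
    volume = 0.0
    for i, j, k in product(range(len(xs) - 1), range(len(ys) - 1), range(len(zs) - 1)):
        ux, uy, uz = xs[i + 1], ys[j + 1], zs[k + 1]
        # The grid cell lies inside box [0, p] iff p dominates the cell's upper corner.
        if any(p[0] >= ux and p[1] >= uy and p[2] >= uz for p in front):
            volume += (ux - xs[i]) * (uy - ys[j]) * (uz - zs[k])
    return volume

def normalized_hypervolume(front: list[tuple[float, float, float]], n_sequences: int,
                           max_length: int = 64, max_similarity: float = 1.0) -> float:
    """Hypervolume divided by the reference volume n * 64 * 1."""
    return hypervolume_3d(front) / (n_sequences * max_length * max_similarity)

front = [(2, 10, 0.5), (1, 20, 1.0)]    # (support, length, similarity)
print(hypervolume_3d(front))             # 25.0
print(normalized_hypervolume(front, 4))  # 25 / 256
```

The two boxes have volumes 10 and 20 and overlap in a 1 × 10 × 0.5 box, so the union is 10 + 20 − 5 = 25. The percentages in Table 3 are ratios of this kind to the reference volume.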
The dataset specifications, the algorithm parameters for each instance, and the runtimes are reported in Table 2. The total size of each dataset equals the number of sequences multiplied by the sequence length. MOICA has four parameters: the selection probability of each sequence (rate), the maximum number of iterations (MaxIt), the population size (nPop), and the number of empires (nEmp). The table also lists nCol, the resulting number of colonies, which equals nPop − nEmp in every row. We tuned these parameters by running the program several times and report the selected values in Table 2.
Time (s) | nCol | nEmp | nPop | MaxIt | rate | Seq. length | No. of seq. | Dataset |
15 | 110 | 50 | 160 | 390 | 55% | 500 | 3 | yst05r |
15 | 110 | 50 | 160 | 390 | 55% | 500 | 4 | yst02g |
20 | 110 | 50 | 160 | 390 | 55% | 1000 | 5 | yst10m |
20 | 110 | 50 | 160 | 390 | 55% | 500 | 6 | yst07m |
20 | 110 | 50 | 160 | 390 | 55% | 500 | 7 | yst06g |
25 | 110 | 50 | 160 | 390 | 55% | 500 | 8 | yst03m |
25 | 110 | 50 | 160 | 390 | 55% | 1000 | 9 | yst01g |
30 | 110 | 50 | 160 | 390 | 55% | 1000 | 11 | yst08r |
30 | 100 | 35 | 135 | 330 | 55% | 1000 | 16 | yst09g |
25 | 110 | 50 | 160 | 390 | 70% | 1000 | 7 | yst04r |
25 | 110 | 50 | 160 | 390 | 70% | 500 | 8 | yst03r |
30 | 90 | 35 | 125 | 300 | 70% | 1000 | 16 | yst09g |
30 | 110 | 50 | 160 | 400 | 60% | 500 | 12 | mus11r |
15 | 110 | 50 | 160 | 400 | 75% | 1500 | 4 | mus07r |
30 | 130 | 50 | 180 | 470 | 60% | 3000 | 7 | hm16r |
40 | 110 | 50 | 160 | 400 | 60% | 2000 | 13 | hm04r |
15 | 110 | 50 | 160 | 390 | 55% | 1500 | 3 | dm07m |
15 | 120 | 50 | 170 | 420 | 80% | 2000 | 3 | dm08m |
15 | 120 | 50 | 170 | 420 | 80% | 2000 | 3 | dm03m |
15 | 110 | 50 | 160 | 390 | 55% | 2500 | 3 | dm05g |
20 | 120 | 50 | 170 | 420 | 80% | 1500 | 4 | dm01g |
20 | 120 | 50 | 170 | 420 | 80% | 2000 | 4 | dm04g |
15 | 120 | 50 | 170 | 420 | 90% | 500 | 2 | hm12r |
15 | 120 | 50 | 170 | 420 | 90% | 500 | 2 | hm25g |
15 | 120 | 50 | 170 | 420 | 90% | 1000 | 2 | hm14r |
15 | 120 | 50 | 170 | 420 | 80% | 1000 | 3 | hm05r |
20 | 120 | 50 | 170 | 420 | 80% | 500 | 4 | hm23r |
15 | 110 | 50 | 160 | 390 | 55% | 2000 | 4 | hm15r |
20 | 110 | 50 | 160 | 390 | 55% | 500 | 5 | hm19g |
20 | 110 | 50 | 160 | 390 | 55% | 500 | 5 | hm21g |
20 | 110 | 50 | 160 | 390 | 55% | 1000 | 5 | hm07m |
20 | 110 | 50 | 160 | 390 | 55% | 3000 | 5 | hm18m |
20 | 110 | 50 | 160 | 390 | 55% | 500 | 6 | hm22m |
20 | 110 | 50 | 160 | 390 | 55% | 500 | 6 | hm10m |
20 | 110 | 50 | 160 | 390 | 55% | 1000 | 6 | hm13r |
20 | 110 | 45 | 155 | 350 | 55% | 3000 | 7 | hm16g |
20 | 110 | 40 | 150 | 345 | 55% | 500 | 8 | hm24m |
20 | 110 | 40 | 150 | 345 | 55% | 1000 | 8 | hm11g |
20 | 110 | 40 | 150 | 345 | 55% | 500 | 9 | hm06g |
20 | 110 | 40 | 150 | 345 | 55% | 1000 | 9 | hm26m |
20 | 110 | 40 | 150 | 345 | 55% | 1000 | 9 | hm02r |
30 | 110 | 50 | 160 | 390 | 55% | 1500 | 10 | hm03r |
30 | 110 | 50 | 160 | 390 | 55% | 1500 | 10 | hm09g |
30 | 110 | 50 | 160 | 390 | 55% | 500 | 11 | hm17g |
30 | 100 | 40 | 140 | 340 | 55% | 2000 | 13 | hm04m |
30 | 100 | 35 | 135 | 330 | 55% | 500 | 15 | hm08m |
30 | 100 | 30 | 130 | 300 | 55% | 2000 | 18 | hm01g |
30 | 90 | 20 | 110 | 190 | 55% | 2000 | 35 | hm20r |
10 | 110 | 50 | 160 | 390 | 55% | 500 | 2 | mus09r |
15 | 110 | 50 | 160 | 390 | 55% | 500 | 3 | mus01r |
15 | 110 | 50 | 160 | 390 | 55% | 500 | 3 | mus12m |
15 | 110 | 50 | 160 | 390 | 55% | 500 | 3 | mus06g |
15 | 110 | 50 | 160 | 390 | 55% | 1500 | 3 | mus08m |
15 | 110 | 50 | 160 | 390 | 55% | 500 | 4 | mus05r |
15 | 110 | 50 | 160 | 390 | 55% | 1500 | 4 | mus07g |
20 | 110 | 50 | 160 | 390 | 55% | 500 | 5 | mus03g |
20 | 110 | 45 | 155 | 350 | 55% | 1000 | 7 | mus04m |
20 | 110 | 40 | 150 | 345 | 55% | 1000 | 9 | mus02r |
30 | 110 | 40 | 150 | 345 | 55% | 500 | 12 | mus11m |
30 | 100 | 40 | 140 | 340 | 55% | 1000 | 13 | mus10g |
20 | 110 | 50 | 160 | 450 | 95% | 2500 | 5 | dm05r |
25 | 150 | 50 | 200 | 450 | 80% | 2000 | 4 | dm04r |
15 | 110 | 50 | 160 | 400 | 75% | 1500 | 4 | dm01r |
The hypervolume indicator is the first indicator we use to compare MOICA with MOABC [20], MO-VNS [28], DEPT [24], MO-FA [29], MO-GSA [30], SPEA2 [31], and NSGA-II [32].
The result of this comparison is shown in Table 3. We divide the results into four groups: datasets with n = 2 sequences; datasets with n = 3, 4, 5, 6, 7 sequences; datasets with n = 8, 9 sequences; and datasets with n > 9 sequences. For n = 2, NSGA-II has the best hypervolume of all algorithms on every dataset (Table 3), although MOABC, MO-GSA, and SPEA2 come close. For n = 3 to 7, MOABC obtains the best result in most cases, with MO-VNS, NSGA-II, and SPEA2 close behind. NSGA-II, for example, has the best hypervolume on dm05g (n = 3), mus07g (n = 4), dm01g (n = 4), and yst04r (n = 7). On the other hand, the hypervolume of MO-VNS drops quickly for n > 5, and that of SPEA2 for n > 4. DEPT, MO-FA, MO-GSA, and MOICA have weak hypervolume in this group in most cases, although MO-FA and MOICA have the best hypervolume on hm16g (n = 7) and yst10m (n = 5), respectively. DEPT, MO-FA, and MOICA become more competitive as the number of sequences increases.
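The hypervolume of a front is the volume of objective space that its points dominate, bounded by a reference point; larger is better. As an illustration only, the sketch below computes it exactly for a small front of maximized objectives by coordinate compression. It assumes that the three objectives (length, support, similarity) have already been normalized to [0, 1] and that the reference point is the origin, so the result is a fraction of the unit cube that can be read as a percentage; the exact normalization behind Table 3 is not restated here, and `hypervolume` is a hypothetical helper, not code from any of the compared methods.

```python
from itertools import product
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def hypervolume(front: Sequence[Point], ref: Point) -> float:
    """Exact hypervolume of a front whose objectives are all maximized.

    The region dominated by the front (above `ref`) is cut into grid cells
    along the distinct coordinates of its points; a cell belongs to the region
    iff some point is >= its upper corner in every objective. The cost grows
    as O(k**d * len(front)), so this is only meant for small fronts.
    """
    d = len(ref)
    # Points that do not improve on the reference in every objective add nothing.
    pts = [p for p in front if all(p[i] > ref[i] for i in range(d))]
    if not pts:
        return 0.0
    # Sorted distinct coordinates per objective, starting at the reference.
    axes = [sorted({ref[i]} | {p[i] for p in pts}) for i in range(d)]
    volume = 0.0
    for idx in product(*(range(1, len(a)) for a in axes)):
        upper = [axes[i][idx[i]] for i in range(d)]
        if any(all(p[i] >= upper[i] for i in range(d)) for p in pts):
            cell = 1.0
            for i in range(d):
                cell *= axes[i][idx[i]] - axes[i][idx[i] - 1]
            volume += cell
    return volume


if __name__ == "__main__":
    # Two normalized (length, support, similarity) points, reference at the origin.
    print(hypervolume([(1.0, 0.5, 0.5), (0.5, 1.0, 0.5)], (0.0, 0.0, 0.0)))
```

For the two example points, each dominated box has volume 0.25 and they overlap in a box of volume 0.125, so the union is 0.375. Faster exact algorithms exist for large fronts; this brute-force version trades speed for being easy to check.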
n, m | Dataset | SPEA2 | MO-GSA | MO-FA | NSGA-II | DEPT | MO-VNS | MOABC | MOICA |
2, 500 | mus09r | 93.35% (0.002) | 92.92% (0.005) | 90.36% (0.004) | 93.46% (0.001) | 90.95% (0.002) | 84.79% (0.061) | 92.95% (0.005) | 83.12% (0.002) |
2, 500 | hm12r | 94.59% (0.007) | 93.88% (0.005) | 91.14% (0.006) | 94.71% (0.002) | 91.50% (0.005) | 90.69% (0.034) | 94.29% (0.005) | 85.12% (0.002) |
2, 500 | hm25g | 93.82% (0.003) | 93.20% (0.011) | 90.24% (0.007) | 93.94% (0.01) | 91.30% (0.008) | 79.92% (0.062) | 93.66% (0.007) | 81.93% (0.003) |
2, 1000 | hm14r | 93.96% (0.003) | 94.11% (0.003) | 91.03% (0.006) | 94.40% (0.004) | 91.40% (0.003) | 86.77% (0.062) | 93.88% (0.006) | 84.07% (0.002) |
3, 500 | yst05r | 85.66% (0.004) | 81.29% (0.02) | 83.63% (0.014) | 85.96% (0.01) | 80.76% (0.021) | 86.61% (0.007) | 86.89% (0.007) | 82.32% (0.004) |
3, 500 | mus01r | 85.65% (0.004) | 80.50% (0.022) | 84.44% (0.017) | 85.26% (0.011) | 80.40% (0.017) | 85.90% (0.007) | 86.93% (0.006) | 80.28% (0.004) |
3, 500 | mus12m | 84.96% (0.004) | 80.78% (0.018) | 82.72% (0.01) | 84.19% (0.012) | 80.04% (0.018) | 85.41% (0.006) | 85.78% (0.006) | 80.46% (0.004) |
3, 500 | mus06g | 83.46% (0.003) | 79.94% (0.017) | 81.91% (0.008) | 83.35% (0.01) | 80.01% (0.02) | 84.21% (0.007) | 84.33% (0.006) | 79.60% (0.003) |
3, 1000 | hm05r | 85.10% (0.005) | 80.32% (0.015) | 82.73% (0.009) | 85.19% (0.012) | 80.46% (0.019) | 85.41% (0.007) | 85.50% (0.009) | 82.38% (0.005) |
3, 1500 | dm07m | 85.19% (0.006) | 80.42% (0.02) | 83.53% (0.01) | 85.01% (0.008) | 81.01% (0.015) | 85.87% (0.007) | 86.09% (0.011) | 81.65% (0.004) |
3, 1500 | mus08m | 84.60% (0.004) | 80.48% (0.023) | 83.10% (0.01) | 84.52% (0.01) | 80.85% (0.01) | 85.11% (0.007) | 85.77% (0.008) | 80.71% (0.005) |
3, 2000 | dm08m | 85.17% (0.004) | 80.94% (0.017) | 83.25% (0.01) | 85.90% (0.007) | 81.30% (0.015) | 85.75% (0.007) | 85.96% (0.008) | 81.81% (0.003) |
3, 2000 | dm03m | 85.16% (0.006) | 80.75% (0.021) | 83.52% (0.011) | 84.56% (0.007) | 81.38% (0.017) | 85.56% (0.007) | 86.06% (0.011) | 81.29% (0.003) |
3, 2500 | dm05g | 85.59% (0.005) | 80.30% (0.021) | 83.67% (0.012) | 86.69% (0.005) | 81.55% (0.014) | 85.88% (0.008) | 86.39% (0.01) | 82.18% (0.003) |
4, 500 | yst02g | 82.28% (0.005) | 78.78% (0.012) | 80.82% (0.011) | 79.81% (0.007) | 78.21% (0.016) | 82.50% (0.007) | 82.87% (0.01) | 78.80% (0.003) |
4, 500 | hm23r | 81.18% (0.005) | 78.23% (0.013) | 80.89% (0.011) | 79.51% (0.007) | 78.13% (0.018) | 81.57% (0.007) | 82.38% (0.007) | 77.16% (0.002) |
4, 500 | mus05r | 80.44% (0.004) | 78.08% (0.013) | 80.25% (0.008) | 79.81% (0.006) | 77.56% (0.019) | 81.45% (0.006) | 81.69% (0.008) | 77.88% (0.003) |
4, 1500 | mus07g | 87.73% (0.01) | 79.85% (0.018) | 82.43% (0.015) | 90.43% (0.022) | 80.01% (0.034) | 89.08% (0.015) | 89.21% (0.029) | 85.55% (0.013) |
4, 1500 | dm01g | 81.74% (0.004) | 79.03% (0.015) | 81.62% (0.013) | 83.77% (0.01) | 79.66% (0.017) | 82.95% (0.006) | 83.24% (0.008) | 80.91% (0.004) |
4, 2000 | dm04g | 81.08% (0.005) | 79.31% (0.014) | 81.91% (0.014) | 81.16% (0.006) | 79.78% (0.018) | 82.90% (0.008) | 83.77% (0.014) | 80.34% (0.003) |
4, 2000 | hm15r | 85.93% (0.009) | 80.23% (0.025) | 83.98% (0.042) | 85.23% (0.022) | 80.29% (0.022) | 85.23% (0.016) | 88.59% (0.027) | 82.80% (0.01) |
5, 500 | hm19g | 79.74% (0.004) | 76.81% (0.019) | 80.08% (0.009) | 79.33% (0.007) | 78.71% (0.022) | 80.35% (0.007) | 81.37% (0.009) | 76.76% (0.003) |
5, 500 | mus03g | 77.31% (0.004) | 74.58% (0.044) | 79.67% (0.012) | 76.48% (0.012) | 77.61% (0.016) | 77.39% (0.018) | 79.84% (0.008) | 72.74% (0.003) |
5, 1000 | yst10m | 65.38% (0.003) | 63.39% (0.01) | 64.99% (0.009) | 65.24% (0.004) | 63.49% (0.014) | 66.08% (0.005) | 66.67% (0.007) | 75.53% (0.003) |
5, 1000 | hm21g | 78.61% (0.004) | 77.01% (0.019) | 79.49% (0.01) | 78.24% (0.008) | 78.72% (0.012) | 79.76% (0.007) | 81.12% (0.01) | 75.83% (0.002) |
5, 1000 | hm07m | 78.71% (0.004) | 77.37% (0.017) | 79.91% (0.01) | 79.39% (0.005) | 78.91% (0.017) | 79.66% (0.006) | 81.24% (0.009) | 74.87% (0.003) |
5, 3000 | hm18m | 78.58% (0.004) | 77.39% (0.023) | 79.97% (0.009) | 77.58% (0.004) | 79.80% (0.014) | 79.38% (0.007) | 81.31% (0.008) | 75.16% (0.003) |
6, 500 | yst07m | 75.92% (0.005) | 70.49% (0.038) | 77.36% (0.008) | 74.57% (0.013) | 76.81% (0.012) | 74.22% (0.032) | 78.27% (0.008) | 72.27% (0.003) |
6, 500 | hm22m | 75.53% (0.005) | 69.87% (0.047) | 76.84% (0.009) | 75.02% (0.03) | 76.98% (0.011) | 74.12% (0.028) | 77.91% (0.007) | 72.22% (0.003) |
6, 500 | hm10m | 75.22% (0.005) | 69.01% (0.038) | 77.54% (0.009) | 74.61% (0.02) | 77.11% (0.014) | 72.36% (0.034) | 78.23% (0.011) | 72.06% (0.003) |
6, 1000 | hm13r | 75.19% (0.005) | 70.27% (0.04) | 77.35% (0.008) | 76.08% (0.01) | 77.24% (0.008) | 72.88% (0.031) | 78.18% (0.01) | 72.93% (0.003) |
7, 500 | yst06g | 73.19% (0.005) | 68.05% (0.03) | 75.75% (0.014) | 73.84% (0.011) | 74.94% (0.018) | 72.04% (0.028) | 75.93% (0.012) | 74.10% (0.002) |
7, 1000 | yst04r | 71.43% (0.006) | 68.20% (0.034) | 73.98% (0.014) | 77.04% (0.007) | 73.50% (0.02) | 72.88% (0.028) | 75.54% (0.012) | 75.30% (0.003) |
7, 1000 | mus04m | 69.43% (0.02) | 63.99% (0.03) | 74.24% (0.01) | 67.86% (0.012) | 74.31% (0.012) | 65.49% (0.021) | 71.10% (0.023) | 70.02% (0.003) |
7, 3000 | hm16g | 71.13% (0.023) | 69.12% (0.043) | 83.63% (0.051) | 71.82% (0.015) | 78.42% (0.044) | 70.36% (0.026) | 81.55% (0.055) | 76.90% (0.015) |
8, 500 | yst03m | 68.37% (0.011) | 62.59% (0.038) | 72.65% (0.01) | 66.44% (0.011) | 73.26% (0.012) | 65.03% (0.026) | 69.68% (0.024) | 69.97% (0.004) |
8, 500 | hm24m | 68.28% (0.013) | 59.58% (0.036) | 72.19% (0.008) | 65.10% (0.011) | 72.91% (0.012) | 62.65% (0.028) | 67.65% (0.019) | 69.08% (0.003) |
8, 1000 | hm11g | 69.34% (0.012) | 60.14% (0.044) | 73.15% (0.017) | 69.38% (0.031) | 74.54% (0.021) | 63.86% (0.028) | 69.93% (0.021) | 72.17% (0.005) |
9, 500 | hm06g | 61.24% (0.012) | 56.60% (0.029) | 69.37% (0.012) | 60.06% (0.015) | 70.68% (0.027) | 59.43% (0.019) | 63.21% (0.019) | 69.52% (0.003) |
9, 1000 | yst01g | 63.16% (0.013) | 59.37% (0.024) | 70.14% (0.011) | 62.75% (0.013) | 70.72% (0.014) | 61.48% (0.021) | 65.16% (0.025) | 69.94% (0.004) |
9, 1000 | hm26m | 62.28% (0.015) | 58.89% (0.031) | 70.48% (0.012) | 60.86% (0.013) | 70.68% (0.016) | 60.98% (0.025) | 64.43% (0.025) | 68.51% (0.004) |
9, 1000 | hm02r | 59.09% (0.013) | 57.57% (0.024) | 69.88% (0.014) | 61.50% (0.014) | 70.85% (0.023) | 59.04% (0.022) | 63.45% (0.033) | 68.68% (0.006) |
9, 1000 | mus02r | 60.62% (0.012) | 57.21% (0.025) | 69.78% (0.015) | 61.99% (0.013) | 70.02% (0.019) | 59.34% (0.025) | 63.58% (0.024) | 69.08% (0.005) |
10, 1500 | hm03r | 53.77% (0.017) | 58.44% (0.021) | 69.06% (0.02) | 51.59% (0.038) | 65.03% (0.03) | 58.75% (0.014) | 60.18% (0.026) | 69.23% (0.005) |
10, 1500 | hm09g | 54.79% (0.013) | 57.51% (0.022) | 68.71% (0.023) | 50.71% (0.043) | 65.20% (0.033) | 57.72% (0.017) | 59.23% (0.027) | 68.28% (0.005) |
11, 500 | hm17g | 52.47% (0.012) | 54.40% (0.024) | 65.86% (0.03) | 49.94% (0.037) | 64.12% (0.033) | 55.12% (0.019) | 55.55% (0.023) | 66.08% (0.006) |
11, 1000 | yst08r | 57.96% (0.014) | 61.24% (0.03) | 69.12% (0.019) | 64.92% (0.028) | 67.80% (0.048) | 63.03% (0.03) | 62.91% (0.023) | 71.15% (0.006) |
12, 500 | mus11m | 48.87% (0.014) | 49.51% (0.031) | 62.01% (0.014) | 45.32% (0.043) | 58.85% (0.039) | 49.54% (0.031) | 51.25% (0.02) | 62.98% (0.007) |
13, 2000 | hm04m | 47.89% (0.011) | 52.18% (0.029) | 62.17% (0.018) | 45.03% (0.029) | 60.90% (0.03) | 52.21% (0.022) | 53.20% (0.024) | 63.88% (0.005) |
13, 1000 | mus10g | 43.94% (0.01) | 47.05% (0.032) | 59.25% (0.016) | 41.73% (0.036) | 55.34% (0.039) | 46.22% (0.03) | 48.92% (0.021) | 62.64% (0.005) |
15, 500 | hm08m | 43.45% (0.011) | 46.98% (0.037) | 58.07% (0.015) | 42.63% (0.028) | 57.15% (0.046) | 46.81% (0.037) | 48.57% (0.023) | 61.25% (0.009) |
16, 1000 | yst09g | 42.34% (0.011) | 46.50% (0.029) | 56.86% (0.021) | 42.68% (0.022) | 55.70% (0.056) | 45.74% (0.037) | 47.92% (0.022) | 62.11% (0.007) |
18, 2000 | hm01g | 34.55% (0.009) | 37.95% (0.042) | 49.56% (0.065) | 33.92% (0.027) | 42.98% (0.054) | 34.40% (0.029) | 42.05% (0.033) | 59.96% (0.01) |
35, 2000 | hm20r | 21.91% (0.006) | 25.68% (0.02) | 36.43% (0.065) | 22.89% (0.014) | 25.33% (0.026) | 22.67% (0.027) | 25.08% (0.02) | 46.70% (0.016) |
For datasets with n = 8, 9 sequences, DEPT has the best hypervolume in every case, with MOICA and MO-FA close behind, while the other algorithms are much weaker. For datasets with n > 9, MOICA has the best results, and its margin on hm01g (n = 18) and hm20r (n = 35) is large. The remaining algorithms fall well behind on datasets with more than nine sequences, except MO-FA, which stays close to MOICA and beats it only on hm09g (n = 10).
For the biological indicators, we compare the biological properties of our proposal with those of the fourteen tools evaluated by Tompa et al. [33] on the TRANSFAC database. In this assessment, we ran MOICA only 5 times per dataset. The benchmark contains datasets from four species with different n and m. We first define the counts on which these indicators are based, at both the nucleotide level and the site level. At the nucleotide level [33]:
● nTP is the number of nucleotide positions in both known sites and predicted sites;
● nFN is the number of nucleotide positions in known sites but not in predicted sites;
● nFP is the number of nucleotide positions not in known sites but in predicted sites;
● nTN is the number of nucleotide positions in neither known sites nor predicted sites.
At the site level [33]:
● sTP is the number of known sites overlapped by predicted sites;
● sFN is the number of known sites not overlapped by predicted sites;
● sFP is the number of predicted sites not overlapped by known sites.
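The counts above can be computed per sequence from the positions of known and predicted sites. The sketch below assumes that sites are given as half-open intervals [start, end) on a sequence of known length and, by default, treats any shared position as an overlap; [33] may use a stricter overlap rule, so `min_overlap` is exposed as an assumption. The names `shared` and `site_counts` are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one sequence


def shared(a: Interval, b: Interval) -> int:
    """Number of positions shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def site_counts(seq_len: int, known: List[Interval], predicted: List[Interval],
                min_overlap: int = 1) -> Dict[str, int]:
    """Nucleotide- and site-level counts for a single sequence.

    `min_overlap` (shared positions needed for two sites to "overlap") is an
    assumption; 1 means any overlap counts.
    """
    known_pos = {i for s, e in known for i in range(s, e)}
    pred_pos = {i for s, e in predicted for i in range(s, e)}
    counts = {
        "nTP": len(known_pos & pred_pos),
        "nFN": len(known_pos - pred_pos),
        "nFP": len(pred_pos - known_pos),
        "nTN": seq_len - len(known_pos | pred_pos),
    }
    # A known site is found if some predicted site overlaps it.
    counts["sTP"] = sum(any(shared(k, p) >= min_overlap for p in predicted) for k in known)
    counts["sFN"] = len(known) - counts["sTP"]
    # A predicted site is false if it overlaps no known site.
    counts["sFP"] = sum(not any(shared(p, k) >= min_overlap for k in known) for p in predicted)
    return counts


if __name__ == "__main__":
    print(site_counts(100, known=[(10, 20), (50, 60)], predicted=[(15, 25), (80, 90)]))
```

Summing these counts key by key over all sequences and datasets of a species gives per-species totals of the kind reported for MOICA below.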
We ran MOICA on the selected TRANSFAC datasets and obtained the following counts for Fly, Human, Mouse, and Yeast.
MOICA | nTP | nFP | nFN | nTN | sTP | sFP | sFN |
Fly | 177 | 655 | 494 | 41674 | 11 | 7 | 40 |
Human | 1092 | 4860 | 4027 | 277021 | 53 | 49 | 245 |
Mouse | 612 | 1692 | 1027 | 52169 | 36 | 14 | 62 |
Yeast | 339 | 2349 | 871 | 57441 | 21 | 23 | 53 |
As in Table 4, these counts were computed for AlignAce, ANN-Spec, Consensus, GLAM, Improbizer, MEME, MEME3, MITRA, MotifSampler, oligo/dyad-analysis, QuickScore, SeSiMCMC, Weeder, and YMF, and were reported by Tompa et al. [33].
To show the performance of each method on the different species, we use the following biological indicators, defined at the nucleotide (x = n) and site (x = s) levels [33]:
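● Sensitivity: $x\mathrm{Sn} = \dfrac{x\mathrm{TP}}{x\mathrm{TP} + x\mathrm{FN}}$;
● Positive predictive value: $x\mathrm{PPV} = \dfrac{x\mathrm{TP}}{x\mathrm{TP} + x\mathrm{FP}}$;
● Nucleotide-level specificity: $n\mathrm{Sp} = \dfrac{n\mathrm{TN}}{n\mathrm{TN} + n\mathrm{FP}}$;
● Nucleotide-level performance coefficient: $n\mathrm{PC} = \dfrac{n\mathrm{TP}}{n\mathrm{TP} + n\mathrm{FN} + n\mathrm{FP}}$;
● Nucleotide-level correlation coefficient: $n\mathrm{CC} = \dfrac{n\mathrm{TP}\cdot n\mathrm{TN} - n\mathrm{FN}\cdot n\mathrm{FP}}{\sqrt{(n\mathrm{TP}+n\mathrm{FN})(n\mathrm{TN}+n\mathrm{FP})(n\mathrm{TP}+n\mathrm{FP})(n\mathrm{TN}+n\mathrm{FN})}}$;
● Average site performance: $s\mathrm{ASP} = \dfrac{s\mathrm{Sn} + s\mathrm{PPV}}{2}$.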
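The indicator definitions above translate directly into code. In this sketch, `indicators` and `_ratio` are hypothetical helpers, and a zero denominator yields NaN, which matches the NaN entries reported below for a tool that predicts no sites (Consensus on Human). Applied to MOICA's Fly counts (nTP = 177, nFP = 655, nFN = 494, nTN = 41674, sTP = 11, sFP = 7, sFN = 40), it reproduces the Fly row reported below, for example nSn ≈ 0.2638 and sASP ≈ 0.4134.

```python
import math
from typing import Dict


def _ratio(num: float, den: float) -> float:
    """Safe division: NaN when the denominator is zero (e.g. no predictions)."""
    return num / den if den else math.nan


def indicators(c: Dict[str, int]) -> Dict[str, float]:
    """Nucleotide (n*) and site (s*) indicators from summed counts."""
    tp, fn, fp, tn = c["nTP"], c["nFN"], c["nFP"], c["nTN"]
    out = {
        "nSn": _ratio(tp, tp + fn),
        "nPPV": _ratio(tp, tp + fp),
        "nSp": _ratio(tn, tn + fp),
        "nPC": _ratio(tp, tp + fn + fp),
        "nCC": _ratio(tp * tn - fn * fp,
                      math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))),
        "sSn": _ratio(c["sTP"], c["sTP"] + c["sFN"]),
        "sPPV": _ratio(c["sTP"], c["sTP"] + c["sFP"]),
    }
    out["sASP"] = (out["sSn"] + out["sPPV"]) / 2
    return out


if __name__ == "__main__":
    # MOICA counts for Fly, copied from the counts table above.
    fly = {"nTP": 177, "nFP": 655, "nFN": 494, "nTN": 41674,
           "sTP": 11, "sFP": 7, "sFN": 40}
    for name, value in indicators(fly).items():
        print(f"{name}: {value:.6f}")
```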
We summed nTP, nFP, nFN, nTN, sTP, sFP, and sFN over the TRANSFAC datasets of each species and computed each indicator from these totals (Table 4). Our algorithm substantially improves all biological indicators for the Mouse and Fly instances. It also obtains the best results for the Human instances on every indicator except nPPV. We also ran our proposal on the Yeast datasets, but its results there are weaker; Table 5 indicates that Weeder performs best on Yeast. Overall, these biological indicators show that MOICA obtains good results in most instances. It is worth noting that some tools achieve good results on some species but fail to find reasonable motifs on others.
Fly | nSn | nPPV | nPC | nCC | sSn | sPPV | sASP |
MOICA | 0.263785 | 0.21274 | 0.133484 | 0.223421 | 0.215686 | 0.611111 | 0.413399 |
AlignAce | 0 | 0 | 0 | −0.0063762 | 0 | 0 | 0 |
ANN-Spec | 0.0253353 | 0.0175258 | 0.010468 | 0.0023548 | 0.0196078 | 0.009434 | 0.0145209 |
Consensus | 0 | 0 | 0 | −0.0110554 | 0 | 0 | 0 |
GLAM | 0.0029806 | 0.004902 | 0.001857 | −0.0084518 | 0 | 0 | 0 |
Improbizer | 0.0149031 | 0.0176056 | 0.0081367 | 0.0018679 | 0.0196078 | 0.0227273 | 0.0211676 |
MEME | 0.0417288 | 0.0421053 | 0.0214067 | 0.0267982 | 0.0588235 | 0.0555556 | 0.0571895 |
MEME3 | 0.0372578 | 0.0263713 | 0.0156838 | 0.0130431 | 0.0588235 | 0.0447761 | 0.0517998 |
MITRA | 0 | 0 | 0 | −0.0078616 | 0 | 0 | 0 |
MotifSampler | 0.0044709 | 0.0082192 | 0.0029042 | −0.0055135 | 0 | 0 | 0 |
oligo/dyad-analysis | 0 | 0 | 0 | −0.0149773 | 0 | 0 | 0 |
QuickScore | 0 | 0 | 0 | −0.0157195 | 0 | 0 | 0 |
SeSiMCMC | 0.1013413 | 0.0538827 | 0.0364611 | 0.0537034 | 0.0980392 | 0.125 | 0.1115196 |
Weeder | 0.0119225 | 0.0344828 | 0.0089385 | 0.0112184 | 0.0196078 | 0.0344828 | 0.0270453 |
YMF | 0 | 0 | 0 | −0.0138211 | 0 | 0 | 0 |
Human | nSn | nPPV | nPC | nCC | sSn | sPPV | sASP |
MOICA | 0.213323 | 0.183468 | 0.10943 | 0.182113 | 0.177852 | 0.519608 | 0.34873 |
AlignAce | 0.0392655 | 0.102551 | 0.0292236 | 0.0531478 | 0.0738255 | 0.1235955 | 0.0987105 |
ANN-Spec | 0.090252 | 0.1031711 | 0.0505747 | 0.0812784 | 0.1644295 | 0.0983936 | 0.1314116 |
Consensus | 0 | NaN | 0 | NaN | 0 | NaN | NaN |
GLAM | 0.0236374 | 0.0367669 | 0.0145977 | 0.0155035 | 0.0402685 | 0.06 | 0.0501342 |
Improbizer | 0.0416097 | 0.0476084 | 0.0227079 | 0.0284205 | 0.0704698 | 0.0483871 | 0.0594284 |
MEME | 0.0380934 | 0.0603902 | 0.0239176 | 0.0343922 | 0.0604027 | 0.0810811 | 0.0707419 |
MEME3 | 0.0420004 | 0.0470975 | 0.0227057 | 0.0282219 | 0.0637584 | 0.0788382 | 0.0712983 |
MITRA | 0.0244188 | 0.0471342 | 0.0163484 | 0.0214655 | 0.0402685 | 0.046875 | 0.0435717 |
MotifSampler | 0.0250049 | 0.0416531 | 0.015873 | 0.0188157 | 0.0469799 | 0.0430769 | 0.0450284 |
oligo/dyad-analysis | 0.0371166 | 0.213964 | 0.0326629 | 0.0825991 | 0.0604027 | 0.15 | 0.1052013 |
QuickScore | 0.0050791 | 0.0099388 | 0.0033727 | −0.0056328 | 0 | 0 | 0 |
SeSiMCMC | 0.0459074 | 0.0279962 | 0.0176984 | 0.0134837 | 0.0671141 | 0.0630915 | 0.0651028 |
Weeder | 0.0543075 | 0.2747036 | 0.047497 | 0.1154935 | 0.1073826 | 0.2580645 | 0.1827235 |
YMF | 0.0410236 | 0.0967296 | 0.029661 | 0.0521165 | 0.0738255 | 0.080292 | 0.0770587 |
Mouse | nSn | nPPV | nPC | nCC | sSn | sPPV | sASP |
MOICA | 0.373398 | 0.265625 | 0.183729 | 0.290236 | 0.367347 | 0.720000 | 0.543673 |
AlignAce | 0.028676 | 0.0490605 | 0.0184314 | 0.0152885 | 0.0306122 | 0.0361446 | 0.0333784 |
ANN-Spec | 0.0433191 | 0.0430564 | 0.0220703 | 0.0139802 | 0.0816327 | 0.0425532 | 0.0620929 |
Consensus | 0.0488103 | 0.1062417 | 0.0346021 | 0.0531418 | 0.1020408 | 0.1219512 | 0.111996 |
GLAM | 0.0073215 | 0.0187793 | 0.0052957 | −0.0068546 | 0.0102041 | 0.0153846 | 0.0127943 |
Improbizer | 0.1079927 | 0.1219008 | 0.0607412 | 0.0894309 | 0.2244898 | 0.1605839 | 0.1925369 |
MEME | 0.0732154 | 0.1337793 | 0.0496689 | 0.0789261 | 0.1428571 | 0.175 | 0.1589286 |
MEME3 | 0.1061623 | 0.170088 | 0.0699357 | 0.1137754 | 0.1938776 | 0.2 | 0.1969388 |
MITRA | 0.0061013 | 0.015456 | 0.0043937 | −0.0090299 | 0.0204082 | 0.0327869 | 0.0265975 |
MotifSampler | 0.0445394 | 0.0913642 | 0.0308668 | 0.0441428 | 0.0816327 | 0.0952381 | 0.0884354 |
oligo/dyad-analysis | 0.0268456 | 0.1067961 | 0.0219233 | 0.03947 | 0.0612245 | 0.0952381 | 0.0782313 |
QuickScore | 0.0359976 | 0.089939 | 0.0263864 | 0.0390251 | 0.0816327 | 0.0677966 | 0.0747146 |
SeSiMCMC | 0.0622331 | 0.0408818 | 0.0252976 | 0.0145461 | 0.1020408 | 0.0952381 | 0.0986395 |
Weeder | 0.0616229 | 0.1753472 | 0.0477767 | 0.0882065 | 0.122449 | 0.1791045 | 0.1507767 |
YMF | 0.1012813 | 0.2017011 | 0.0722997 | 0.1247729 | 0.2040816 | 0.1818182 | 0.1929499 |
Yeast | nSn | nPPV | nPC | nCC | sSn | sPPV | sASP |
MOICA | 0.280165 | 0.126116 | 0.0952515 | 0.163648 | 0.283784 | 0.477273 | 0.380528 |
AlignAce | 0.1855754 | 0.1849758 | 0.1020954 | 0.1684257 | 0.28 | 0.2019231 | 0.2409615 |
ANN-Spec | 0.165316 | 0.14011099 | 0.0820595 | 0.1331542 | 0.3066667 | 0.1411043 | 0.2238855 |
Consensus | 0.0794165 | 0.2 | 0.0602706 | 0.1149074 | 0.1466667 | 0.2391304 | 0.1928986 |
GLAM | 0.0713128 | 0.0585106 | 0.0332075 | 0.0432325 | 0.1466667 | 0.0578947 | 0.1022807 |
Improbizer | 0.1572123 | 0.0950049 | 0.0629461 | 0.0988463 | 0.2666667 | 0.1333333 | 0.2 |
MEME | 0.1928687 | 0.3801917 | 0.1467324 | 0.260354 | 0.3066667 | 0.3833333 | 0.345 |
MEME3 | 0.2098865 | 0.3001159 | 0.140914 | 0.2381559 | 0.32 | 0.3037975 | 0.3118987 |
MITRA | 0.1110211 | 0.1525612 | 0.0686717 | 0.1148955 | 0.16 | 0.1538462 | 0.1569231 |
MotifSampler | 0.2560778 | 0.5039872 | 0.2045307 | 0.3501753 | 0.3866667 | 0.4915254 | 0.439096 |
oligo/dyad-analysis | 0.0899514 | 0.3303571 | 0.0760795 | 0.1639418 | 0.1866667 | 0.3043478 | 0.2455072 |
QuickScore | 0.0534846 | 0.0613953 | 0.0294249 | 0.0391636 | 0.12 | 0.0434783 | 0.0817391 |
SeSiMCMC | 0.1012966 | 0.0570255 | 0.0378673 | 0.0504601 | 0.0933333 | 0.0721649 | 0.0827491 |
Weeder | 0.2925446 | 0.5340237 | 0.2330536 | 0.3863337 | 0.52 | 0.5492958 | 0.5346479 |
YMF | 0.1442464 | 0.3296296 | 0.1115288 | 0.2076962 | 0.28 | 0.3387097 | 0.3093548 |
We also considered comparing our proposal with MOABC on the biological indicators. However, the values of nTP, nFP, nFN, nTN, sTP, sFP, and sFN are not available for MOABC, except in a figure in [7] where the counts of all species are added up before the indicators are computed. We did not follow the same procedure, because it compares results pooled across species and may be unfair.
The last experiment focuses on the motifs found; the non-dominated solutions obtained are reported in Table 6. We ran our algorithm on four datasets: dm04g, hm22m, mus03g, and yst01g. hm22m and mus03g have 6 and 5 sequences, respectively, each 500 bp long; dm04g has 4 sequences of 2000 bp each; and yst01g has 9 sequences of 1000 bp each. We compared MOICA with MOABC [20] because MOABC had nearly the best overall performance among the compared algorithms.
Predicted Motif | complexity | similarity | length | support | Method | data |
TCAACTGTAAATATAACTTAAAAAGGGAATACT | 0.84 | 0.64 | 33 | 3 | MOABC | dm04g |
GTTTGGAAGTGCTTAAATAAACTTGCAAAAAAC | 0.85 | 0.74 | 33 | 3 | MOICA | dm04g |
ATGACCCACACCACGCGCACGCATGGCCCGGCC | 0.81 | 0.61 | 33 | 4 | MOABC | hm22m |
CCACGCCAGACGGGCGTTGCAAGCCAGACCTACT | 0.83 | 0.68 | 34 | 4 | MOICA | hm22m |
AAGGCGTTGCTCAAGTGTTAAGAAAATACTGACAC | 0.87 | 0.58 | 35 | 4 | MOABC | mus03g |
TGGATAGAAGAACTACAACCTTCATGTCATACATTT | 0.87 | 0.63 | 36 | 4 | MOICA | mus03g |
ATGAAATTAAACCCAA | 0.74 | 0.59 | 16 | 4 | MOABC | yst01g |
TACCATAATTCATAGAT | 0.74 | 0.69 | 17 | 4 | MOICA | yst01g |
We used the results reported in [7] for these four datasets: we computed the length, support, similarity, and complexity of the motifs reported for MOABC and compared them with our results, as recorded in Table 6. Neither we nor other researchers aimed to maximize complexity; we report it only to make the comparison fair, because most algorithms can reach high length, support, and similarity with low-complexity motifs. We kept support, length, and complexity approximately fixed and searched for a motif with high similarity. On every dataset in Table 6, the MOICA motif has the same support, an equal or greater length and complexity, and a higher similarity than the MOABC motif, which confirms that MOICA outperforms MOABC in this motif-finding experiment.
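Similarity and support can be illustrated on a set of aligned occurrences. The sketch below follows, as we understand them, the profile-based definitions used in this line of work [7]: a consensus built from the most frequent nucleotide in each column, similarity as the mean of those highest column frequencies, and support as the number of occurrences that agree with the consensus on at least a threshold fraction of positions. The threshold of 0.5, the alphabetical tie-break, and the name `profile_stats` are assumptions for illustration; complexity is omitted because the normalization behind the values in Table 6 is not restated here.

```python
from collections import Counter
from typing import List, Tuple


def profile_stats(occurrences: List[str], threshold: float = 0.5) -> Tuple[str, float, int]:
    """Consensus, similarity and support of equal-length motif occurrences.

    similarity: mean over columns of the highest nucleotide frequency.
    support: number of occurrences agreeing with the consensus on at least
    `threshold` of their positions (0.5 is an assumed value).
    """
    if not occurrences or not occurrences[0]:
        raise ValueError("need at least one non-empty occurrence")
    length = len(occurrences[0])
    if any(len(o) != length for o in occurrences):
        raise ValueError("occurrences must have the same length")
    consensus, top_freqs = [], []
    for column in zip(*occurrences):
        counts = Counter(column)
        # Most frequent nucleotide; ties broken alphabetically (arbitrary choice).
        base = min(counts, key=lambda b: (-counts[b], b))
        consensus.append(base)
        top_freqs.append(counts[base] / len(column))
    cons = "".join(consensus)
    similarity = sum(top_freqs) / length
    concordance = [sum(a == b for a, b in zip(o, cons)) / length for o in occurrences]
    support = sum(c >= threshold for c in concordance)
    return cons, similarity, support


if __name__ == "__main__":
    print(profile_stats(["ACGT", "ACGA", "ACTT"]))
```

For the three toy occurrences, the consensus is ACGT, the column maxima are 1, 1, 2/3, and 2/3 (similarity 5/6), and all three occurrences agree with the consensus on at least half of their positions.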
Note that MOABC and MOICA are both multi-objective, so each can provide alternative solutions with different support and similarity for each motif length. As Table 6 shows, MOICA obtained the best results on all four datasets.
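Because every objective is maximized, the comparison in Table 6 can also be read as a Pareto-dominance check. The sketch below transcribes length, support, and similarity from Table 6 and tests whether each MOICA motif dominates the MOABC motif on the same dataset; `dominates` and `non_dominated` are generic helpers written for illustration. Leaving complexity out is our simplification; including it would not change the outcome here, since MOICA's complexity in Table 6 is never lower.

```python
from typing import Dict, List, Sequence, Tuple

Objectives = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` Pareto-dominates `b`, with every objective maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(points: List[Objectives]) -> List[Objectives]:
    """Points not dominated by any other point (a point never dominates itself)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# (length, support, similarity) transcribed from Table 6.
TABLE6: Dict[str, Dict[str, Objectives]] = {
    "dm04g": {"MOABC": (33, 3, 0.64), "MOICA": (33, 3, 0.74)},
    "hm22m": {"MOABC": (33, 4, 0.61), "MOICA": (34, 4, 0.68)},
    "mus03g": {"MOABC": (35, 4, 0.58), "MOICA": (36, 4, 0.63)},
    "yst01g": {"MOABC": (16, 4, 0.59), "MOICA": (17, 4, 0.69)},
}

if __name__ == "__main__":
    for data, row in TABLE6.items():
        print(data, "MOICA dominates MOABC:", dominates(row["MOICA"], row["MOABC"]))
```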
In this experiment, we ran our algorithm so that only one or two empires remained at the end. To reach this goal, we fixed the parameters of our algorithm at nPop = 200, nEmp = 50, nCol = 150, and MaxIt = 800. The parameter rate, the selection probability of each sequence, was set per dataset, because it depends on the data and on the kind of motif we want to find. For example, if the support of the desired motif is close to the number of sequences in the dataset, it is better to take rate between 70% and 100%. In general, rate ranges from 30% to 100%.
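The text describes rate only as the selection probability of each sequence. One plausible reading, sketched below purely as an assumption, is that when a random individual x = (x1, x2, …, xn) is created, each sequence takes part with probability rate and, if selected, receives a uniformly random 0-based start position between 0 and m − l; unselected sequences are marked None. The population size is nEmp + nCol, which matches the fixed values above (nPop = 200 = 50 + 150). The function names are hypothetical and this is not the authors' implementation.

```python
import random
from typing import List, Optional

# Hypothetical encoding: one 0-based start position per sequence, None = not selected.
Individual = List[Optional[int]]


def random_individual(n_seqs: int, seq_len: int, motif_len: int, rate: float,
                      rng: random.Random) -> Individual:
    """Each sequence joins the candidate motif with probability `rate` (assumed reading)."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be a probability, e.g. 0.7 for 70%")
    if not 0 < motif_len <= seq_len:
        raise ValueError("motif length must be in 1..seq_len")
    # Start positions are drawn uniformly from 0 .. m - l (inclusive).
    return [rng.randint(0, seq_len - motif_len) if rng.random() < rate else None
            for _ in range(n_seqs)]


def initial_population(n_emp: int, n_col: int, n_seqs: int, seq_len: int,
                       motif_len: int, rate: float, seed: Optional[int] = None
                       ) -> List[Individual]:
    """nPop = nEmp + nCol random individuals (e.g. 200 = 50 + 150)."""
    rng = random.Random(seed)
    return [random_individual(n_seqs, seq_len, motif_len, rate, rng)
            for _ in range(n_emp + n_col)]


if __name__ == "__main__":
    # dm04g-like setting: 4 sequences of 2000 bp; motif length 20 is arbitrary.
    pop = initial_population(50, 150, n_seqs=4, seq_len=2000, motif_len=20,
                             rate=0.8, seed=1)
    print(len(pop), pop[0])
```

Under this reading, a rate close to 100% makes almost every sequence contribute an occurrence, which is consistent with the advice above to use a high rate when the desired support is close to the number of sequences.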
In this paper, we have proposed a Multi-objective Imperialist Competitive Algorithm (MOICA), a metaheuristic for solving the MDP based on three objective functions. A key contribution of our method is that it provides alternative solutions in a single run, so the user can select the most useful ones.
To assess the performance of our algorithm, we ran MOICA on 54 datasets from four species with different sequence lengths and numbers of sequences, and compared the results with those of previous methods. To ease comparison, we used the same criteria as previous multi-objective studies, such as the hypervolume indicator. Our algorithm obtained the best hypervolume on all but one of the instances with more than nine sequences. We also computed biological indicators (nCC, nPC, nSn, nPPV, sSn, sPPV, sASP, and nSp) for selected benchmark instances used in previous works; MOICA obtained the best biological indicators for all species except Yeast (and, for Human, except nPPV). Furthermore, we found motifs for four datasets with different numbers and lengths of sequences. These results indicate that our algorithm is among the best performing algorithms in the literature, particularly for datasets with many sequences. In future work, our algorithm could be combined with other swarm intelligence (SI) algorithms to improve its performance; for example, Artificial Bee Colony (ABC) is a metaheuristic that has obtained good results for the MDP in a multi-objective framework.
As future work, we intend to design new approaches to discover common patterns in sequences. First, we are interested in integrating ICA and ABC into a new algorithm that improves the results on a larger set of datasets. Another promising direction is to model this problem as a mathematical programming problem so that existing optimization algorithms can be used to solve it. Finally, we would like to develop new methods that improve both the optimization and the biological results.
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The authors thank the University of Guilan for partial financial support.
The authors declare that they have no conflict of interest.
[1] | F. Zare-Mirakabad, H. Ahrabian and M. Sadeghi, et al., Genetic algorithm for dyad pattern finding in DNA sequences, Genes Genet. Syst., 84 (2009), 81–93. |
[2] | M. Li, B. Ma and L. Wang, Finding similar regions in many sequences, J. Comput. Syst. Sci., 65 (2002), 73–96. |
[3] | M. F. Sagot, Spelling approximate repeated or common motifs using a suffix tree, Springer, 1998. |
[4] | F. W. Glover and G. A. Kochenberger, Handbook of metaheuristics, Springer Science & Business Media, 2006. |
[5] | E. Czeizler, T. Hirvola and K. Karhu, A graph-theoretical approach for motif discovery in protein sequences, IEEE/ACM Trans. Comput. Biol. Bioinf., 14 (2017), 121–130. |
[6] | M. Kaya, MOGAMOD: Multi-objective genetic algorithm for motif discovery, Expert Syst. Appl., 36 (2009), 1039–1047. |
[7] | D. L. González-Álvarez, M. A. Vega-Rodríguez and Á. Rubio-Largo, Multiobjective optimization algorithms for motif discovery in DNA sequences, Genet. Program. Evolvable Mach., 16 (2015), 167–209. |
[8] | C. E. Lawrence and A. A. Reilly, An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences, Proteins, 7 (1990), 41–51. |
[9] | C. E. Lawrence, S. F. Altschul and M. S. Boguski, et al., Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment, Science, 262 (1993), 208–214. |
[10] | T. L. Bailey and C. Elkan, Fitting a mixture model by expectation maximization to discover motifs in biopolymers, Proc. Int. Conf. Intell. Syst. Mol. Biol., 2 (1994), 28–36. |
[11] | F. P. Roth, J. D. Hughes and P. W. Estep, et al., Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation, Nat. Biotechnol., 16 (1998), 939–945. |
[12] | K. C. Wong, MotifHyades: Expectation maximization for de novo DNA motif pair discovery on paired sequences, Bioinformatics, 33 (2017), 3028–3035. |
[13] | K. C. Wong, DNA Motif Recognition Modeling from Protein Sequences, iScience, 7 (2018), 198–211. |
[14] | G. Pavesi, P. Mereghetti and G. Mauri, et al., Weeder Web: Discovery of transcription factor binding sites in a set of sequences from co-regulated genes, Nucleic Acids Res., 32 (2004), W199–W203. |
[15] | E. Eskin and P. A. Pevzner, Finding composite regulatory patterns in DNA sequences, Bioinformatics, 18 (2002), S354–S363. |
[16] | P. A. Evans and A. D. Smith, Toward optimal motif enumeration, Springer, 2003. |
[17] | J. Serra, A. Matic and A. Karatzoglou, et al., A genetic algorithm to discover flexible motifs with support, IEEE, 2016. |
[18] | N. Pisanti, A. M. Carvalho and L. Marsan, et al., RISOTTO: Fast extraction of motifs with mismatches, Springer, 2006. |
[19] | G. Z. Hertz and G. D. Stormo, Identifying DNA and protein patterns with statistically significant alignments of multiple sequences, Bioinformatics, 15 (1999), 563–577. |
[20] | D. L. González-Álvarez, M. A. Vega-Rodríguez and J. A. Gómez-Pulido, et al., Finding Motifs in DNA Sequences Applying a Multiobjective Artificial Bee Colony (MOABC) Algorithm, Springer, 2011. |
[21] | D. L. González-Álvarez, M. A. Vega-Rodríguez and Á. Rubio-Largo, Searching for common patterns on protein sequences by means of a parallel hybrid honey-bee mating optimization algorithm, Parallel Comput., 76 (2018), 1–17. |
[22] | E. Zitzler and L. Thiele, Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach, IEEE T. Evolut. Comput., 3 (1999), 257–271. |
[23] | E. Wingender, P. Dietze and H. Karas, et al., TRANSFAC: A database on transcription factors and their DNA binding sites, Nucleic Acids Res., 24 (1996), 238–241. |
[24] | D. L. González-Álvarez, M. A. Vega-Rodríguez and J. A. Gómez-Pulido, et al., Solving the motif discovery problem by using differential evolution with pareto tournaments, IEEE, 2010. |
[25] | G. B. Fogel, D. G. Weekes and G. Varga, et al., Discovery of sequence motifs related to coexpression of genes using evolutionary computation, Nucleic Acids Res., 32 (2004), 3826–3835. |
[26] | E. Zitzler, K. Deb and L. Thiele, Comparison of multiobjective evolutionary algorithms: Empirical results, Evolut. Comput., 8 (2000), 173–195. |
[27] | E. Atashpaz-Gargari and C. Lucas, Imperialist competitive algorithm: An algorithm for optimization inspired by imperialistic competition, IEEE, 2007. |
[28] | D. L. González-Álvarez, M. A. Vega-Rodríguez and J. A. Gómez-Pulido, et al., Predicting DNA motifs by using evolutionary multiobjective optimization, IEEE T. Syst. Man Cy. C, 42 (2012), 913–925. |
[29] | X. S. Yang, Firefly algorithms for multimodal optimization, Springer, 2009. |
[30] | D. L. González-Álvarez, M. A. Vega-Rodríguez and J. A. Gómez-Pulido, et al., Applying a multiobjective gravitational search algorithm (MO-GSA) to discover motifs, Springer, 2011. |
[31] | E. Zitzler, M. Laumanns and L. Thiele, SPEA2: Improving the strength Pareto evolutionary algorithm, 2001. |
[32] | K. Deb, A. Pratap and S. Agarwal, et al., A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE T. Evolut. Comput., 6 (2002), 182–197. |
[33] | M. Tompa, N. Li and T. L. Bailey, et al., Assessing computational tools for the discovery of transcription factor binding sites, Nat. Biotechnol., 23 (2005), 137. |
1. | José M. Granado-Criado, Sergio Santander-Jiménez, Miguel A. Vega-Rodríguez, Álvaro Rubio-Largo, A multi-objective optimization procedure for solving the high-order epistasis detection problem, 2020, 142, 09574174, 113000, 10.1016/j.eswa.2019.113000 | |
2. | Ehsan Yarmohammadi, Mohammad Ali Izadbakhsh, Ahmad Rajabi, Fariborz Yosefvand, Saeid Shabanlou, Optimal operation of water resources systems using MOICA algorithm with reservoir hedging approach in low‐water regions*, 2022, 71, 1531-0353, 406, 10.1002/ird.2660 | |
3. | Laura Calvet, Sergio Benito, Angel A. Juan, Ferran Prados, On the role of metaheuristic optimization in bioinformatics, 2022, 0969-6016, 10.1111/itor.13164 | |
4. | Alexander I. Dolgiy, Sergey M. Kovalev, Anna E. Kolodenkova, Andrey V. Sukhanov, 2019, Chapter 17, 978-3-030-30762-2, 203, 10.1007/978-3-030-30763-9_17 | |
5. | Fatma A. Hashim, Essam H. Houssein, Kashif Hussain, Mai S. Mabrouk, Walid Al-Atabany, A modified Henry gas solubility optimization for solving motif discovery problem, 2020, 32, 0941-0643, 10759, 10.1007/s00521-019-04611-0 | |
6. | S. Kovalev, A. Kolodenkova, 2019, Intellectual Approach to the Design of Fuzzy Systems Based on Multi-objective Evolutionary Modeling, 978-1-7281-0061-6, 1, 10.1109/FarEastCon.2019.8933848 | |
7. | Mehrzad Alizadeh, Jeff Gostick, Takahiro Suzuki, Shohji Tsushima, Topological optimization for tailored designs of advection–diffusion-reaction porous reactors based on pore scale modeling and simulation: A PNM-NSGA framework, 2024, 301, 00457949, 107452, 10.1016/j.compstruc.2024.107452 | |
8. | Hamed Nikravesh, Ali Ranjbar, Yousef Kazemzadeh, Zohre Nikravesh, A Review of Optimization Methods for Production and Injection Well Placement in Hydrocarbon Reservoirs, 2024, 24058440, e39232, 10.1016/j.heliyon.2024.e39232 | |
9. | Yanyun Zhang, Li Cheng, Guanyu Chen, Daniyal Alghazzawi, Evolutionary Computation in bioinformatics: A survey, 2024, 591, 09252312, 127758, 10.1016/j.neucom.2024.127758 |
Variable | Definition |
n | Number of sequences |
m | Length of each sequence |
l | Length of the motif |
si | The ith sequence, of length m, composed of symbols from the alphabet Σ |
si[k, l] | The subsequence of si of length l starting at position k |
si[k] | The symbol of si at position k |
xi | The starting position of the motif occurrence in si, xi ∈ {0, 1, …, m − l} |
x | An individual (candidate solution), x = (x1, x2, …, xn) |
fi(σ) | Frequency of symbol σ ∈ Σ in the ith column of the occurrence matrix |
wi | wi = max over σ ∈ Σ of fi(σ), the highest symbol frequency in column i |
cwi | The symbol σ ∈ Σ that attains wi |
ni, j | Number of occurrences of the jth symbol of Σ in the occurrence selected by xi, i.e. in si[xi, l] |
uxi | |
δ | |
consensus(x) | consensus(x) = cw1 cw2 … cwl |
concordance(xi) | |
support(x) | |
similarity(x) | |
complexity(xi) | |
complexity(x) | |
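The consensus, similarity and support quantities listed above can be illustrated with a short Python sketch. It assumes 0-based start positions and the DNA alphabet ACGT, takes similarity(x) as the mean of the column maxima wi, and counts toward support(x) every occurrence that agrees with the consensus on at least a threshold fraction of positions. The 0.5 default threshold, the tie-breaking rule and all helper names are illustrative assumptions, since the exact formulas for support and similarity are not reproduced in this table.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumed DNA alphabet Σ


def occurrences(seqs, x, l):
    """Return the occurrences s_i[x_i, l] selected by individual x (0-based starts)."""
    return [s[xi:xi + l] for s, xi in zip(seqs, x)]


def column_stats(occs):
    """Return (w_i, c_wi) per column: highest symbol frequency and the symbol attaining it."""
    n = len(occs)
    stats = []
    for column in zip(*occs):
        counts = Counter(column)
        # max() keeps the first symbol in ALPHABET order on ties (an assumption).
        best = max(ALPHABET, key=lambda b: counts[b])
        stats.append((counts[best] / n, best))
    return stats


def consensus(occs):
    """consensus(x) = c_w1 c_w2 ... c_wl."""
    return "".join(symbol for _, symbol in column_stats(occs))


def similarity(occs):
    """Assumed definition: mean of the column maxima w_i over the l columns."""
    stats = column_stats(occs)
    return sum(w for w, _ in stats) / len(stats)


def support(occs, threshold=0.5):
    """Assumed definition: occurrences agreeing with the consensus on >= threshold of positions."""
    cons = consensus(occs)
    return sum(
        1
        for occ in occs
        if sum(a == b for a, b in zip(occ, cons)) / len(cons) >= threshold
    )


seqs = ["TTACGTAA", "GACGTTTT", "CCCACGAT"]
occs = occurrences(seqs, (2, 1, 3), 4)   # ['ACGT', 'ACGT', 'ACGA']
print(consensus(occs), similarity(occs), support(occs))
```

In this toy input the last column holds T, T, A, so w4 = 2/3 while the other three columns are fully conserved; raising the threshold to 1.0 drops the third occurrence from the support count.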
Time (s) | nCol | nEmp | nPop | MaxIt | Rate | Len. (m) | Seq. (n) | Dataset |
15 | 110 | 50 | 160 | 390 | 55% | 500 | 3 | yst05r |
15 | 110 | 50 | 160 | 390 | 55% | 500 | 4 | yst02g |
20 | 110 | 50 | 160 | 390 | 55% | 1000 | 5 | yst10m |
20 | 110 | 50 | 160 | 390 | 55% | 500 | 6 | yst07m |
20 | 110 | 50 | 160 | 390 | 55% | 500 | 7 | yst06g |
25 | 110 | 50 | 160 | 390 | 55% | 500 | 8 | yst03m |
25 | 110 | 50 | 160 | 390 | 55% | 1000 | 9 | yst01g |
30 | 110 | 50 | 160 | 390 | 55% | 1000 | 11 | yst08r |
30 | 100 | 35 | 135 | 330 | 55% | 1000 | 16 | yst09g |
25 | 110 | 50 | 160 | 390 | 70% | 1000 | 7 | yst04r |
25 | 110 | 50 | 160 | 390 | 70% | 500 | 8 | yst03r |
30 | 90 | 35 | 125 | 300 | 70% | 1000 | 16 | yst09g |
30 | 110 | 50 | 160 | 400 | 60% | 500 | 12 | mus11r |
15 | 110 | 50 | 160 | 400 | 75% | 1500 | 4 | mus07r |
30 | 130 | 50 | 180 | 470 | 60% | 3000 | 7 | hm16r |
40 | 110 | 50 | 160 | 400 | 60% | 2000 | 13 | hm04r |
15 | 110 | 50 | 160 | 390 | 55% | 1500 | 3 | dm07m |
15 | 120 | 50 | 170 | 420 | 80% | 2000 | 3 | dm08m |
15 | 120 | 50 | 170 | 420 | 80% | 2000 | 3 | dm03m |
15 | 110 | 50 | 160 | 390 | 55% | 2500 | 3 | dm05g |
20 | 120 | 50 | 170 | 420 | 80% | 1500 | 4 | dm01g |
20 | 120 | 50 | 170 | 420 | 80% | 2000 | 4 | dm04g |
15 | 120 | 50 | 170 | 420 | 90% | 500 | 2 | hm12r |
15 | 120 | 50 | 170 | 420 | 90% | 500 | 2 | hm25g |
15 | 120 | 50 | 170 | 420 | 90% | 1000 | 2 | hm14r |
15 | 120 | 50 | 170 | 420 | 80% | 1000 | 3 | hm05r |
20 | 120 | 50 | 170 | 420 | 80% | 500 | 4 | hm23r |
15 | 110 | 50 | 160 | 390 | 55% | 2000 | 4 | hm15r |
20 | 110 | 50 | 160 | 390 | 55% | 500 | 5 | hm19g |
20 | 110 | 50 | 160 | 390 | 55% | 500 | 5 | hm21g |
20 | 110 | 50 | 160 | 390 | 55% | 1000 | 5 | hm07m |
20 | 110 | 50 | 160 | 390 | 55% | 3000 | 5 | hm18m |
20 | 110 | 50 | 160 | 390 | 55% | 500 | 6 | hm22m |
20 | 110 | 50 | 160 | 390 | 55% | 500 | 6 | hm10m |
20 | 110 | 50 | 160 | 390 | 55% | 1000 | 6 | hm13r |
20 | 110 | 45 | 155 | 350 | 55% | 3000 | 7 | hm16g |
20 | 110 | 40 | 150 | 345 | 55% | 500 | 8 | hm24m |
20 | 110 | 40 | 150 | 345 | 55% | 1000 | 8 | hm11g |
20 | 110 | 40 | 150 | 345 | 55% | 500 | 9 | hm06g |
20 | 110 | 40 | 150 | 345 | 55% | 1000 | 9 | hm26m |
20 | 110 | 40 | 150 | 345 | 55% | 1000 | 9 | hm02r |
30 | 110 | 50 | 160 | 390 | 55% | 1500 | 10 | hm03r |
30 | 110 | 50 | 160 | 390 | 55% | 1500 | 10 | hm09g |
30 | 110 | 50 | 160 | 390 | 55% | 500 | 11 | hm17g |
30 | 100 | 40 | 140 | 340 | 55% | 2000 | 13 | hm04m |
30 | 100 | 35 | 135 | 330 | 55% | 500 | 15 | hm08m |
30 | 100 | 30 | 130 | 300 | 55% | 2000 | 18 | hm01g |
30 | 90 | 20 | 110 | 190 | 55% | 2000 | 35 | hm20r |
10 | 110 | 50 | 160 | 390 | 55% | 500 | 2 | mus09r |
15 | 110 | 50 | 160 | 390 | 55% | 500 | 3 | mus01r |
15 | 110 | 50 | 160 | 390 | 55% | 500 | 3 | mus12m |
15 | 110 | 50 | 160 | 390 | 55% | 500 | 3 | mus06g |
15 | 110 | 50 | 160 | 390 | 55% | 1500 | 3 | mus08m |
15 | 110 | 50 | 160 | 390 | 55% | 500 | 4 | mus05r |
15 | 110 | 50 | 160 | 390 | 55% | 1500 | 4 | mus07g |
20 | 110 | 50 | 160 | 390 | 55% | 500 | 5 | mus03g |
20 | 110 | 45 | 155 | 350 | 55% | 1000 | 7 | mus04m |
20 | 110 | 40 | 150 | 345 | 55% | 1000 | 9 | mus02r |
30 | 110 | 40 | 150 | 345 | 55% | 500 | 12 | mus11m |
30 | 100 | 40 | 140 | 340 | 55% | 1000 | 13 | mus10g |
20 | 110 | 50 | 160 | 450 | 95% | 2500 | 5 | dm05r |
25 | 150 | 50 | 200 | 450 | 80% | 2000 | 4 | dm04r |
15 | 110 | 50 | 160 | 400 | 75% | 1500 | 4 | dm01r |
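In every row of the parameter table, nPop equals nCol + nEmp, so the population appears to consist of the imperialists plus their colonies. The sketch below encodes that relation and shares the colonies among empires in proportion to imperialist power, following the normalized-cost scheme of the original single-objective ICA (normalized cost C_n = c_n − max c_i, power p_n = |C_n / Σ C_i|). The original scheme rounds each share; the largest-remainder rounding used here, the cost values, and the class and function names are assumptions for illustration, not the MOICA implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ICAParams:
    """One row of the parameter table (field names are hypothetical)."""
    n_col: int     # nCol: number of colonies
    n_emp: int     # nEmp: number of empires (imperialists)
    max_it: int    # MaxIt: maximum number of iterations
    rate: float    # the "Rate" column, taken as given

    @property
    def n_pop(self) -> int:
        # Every row of the table satisfies nPop = nCol + nEmp.
        return self.n_col + self.n_emp


def allocate_colonies(costs, n_col):
    """Share n_col colonies among imperialists by normalized power (lower cost = stronger).

    Classic ICA: C_n = c_n - max_i c_i and p_n = |C_n / sum_i C_i|.
    Largest-remainder rounding (an assumption) makes the counts sum to n_col.
    """
    worst = max(costs)
    normalized = [worst - c for c in costs]   # |C_n|; the weakest imperialist gets 0
    total = sum(normalized)
    if total == 0:                             # all imperialists equally strong
        power = [1 / len(costs)] * len(costs)
    else:
        power = [v / total for v in normalized]
    shares = [p * n_col for p in power]
    counts = [int(s) for s in shares]          # floor of each share
    leftover = n_col - sum(counts)
    by_remainder = sorted(range(len(costs)), key=lambda i: shares[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


row = ICAParams(n_col=110, n_emp=50, max_it=390, rate=0.55)  # e.g. the yst05r row
print(row.n_pop, allocate_colonies([1.0, 2.0, 3.0, 5.0], 10))
```

Largest-remainder rounding is chosen only so that the colony counts always add up to nCol exactly; plain rounding can leave a colony over or under.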
n, m | Dataset | SPEA2 | MO-GSA | MO-FA | NSGA-II | DEPT | MO-VNS | MOABC | MOICA |
2, 500 | mus09r | 93.35% | 92.92% | 90.36% | 93.46% | 90.95% | 84.79% | 92.95% | 83.12% |
| | 0.002 | 0.005 | 0.004 | 0.001 | 0.002 | 0.061 | 0.005 | 0.002 |
2, 500 | hm12r | 94.59% | 93.88% | 91.14% | 94.71% | 91.50% | 90.69% | 94.29% | 85.12% |
| | 0.007 | 0.005 | 0.006 | 0.002 | 0.005 | 0.034 | 0.005 | 0.002 |
2, 500 | hm25g | 93.82% | 93.20% | 90.24% | 93.94% | 91.30% | 79.92% | 93.66% | 81.93% |
| | 0.003 | 0.011 | 0.007 | 0.01 | 0.008 | 0.062 | 0.007 | 0.003 |
2, 1000 | hm14r | 93.96% | 94.11% | 91.03% | 94.40% | 91.40% | 86.77% | 93.88% | 84.07% |
| | 0.003 | 0.003 | 0.006 | 0.004 | 0.003 | 0.062 | 0.006 | 0.002 |
3, 500 | yst05r | 85.66% | 81.29% | 83.63% | 85.96% | 80.76% | 86.61% | 86.89% | 82.32% |
| | 0.004 | 0.02 | 0.014 | 0.01 | 0.021 | 0.007 | 0.007 | 0.004 |
3, 500 | mus01r | 85.65% | 80.50% | 84.44% | 85.26% | 80.40% | 85.90% | 86.93% | 80.28% |
| | 0.004 | 0.022 | 0.017 | 0.011 | 0.017 | 0.007 | 0.006 | 0.004 |
3, 500 | mus12m | 84.96% | 80.78% | 82.72% | 84.19% | 80.04% | 85.41% | 85.78% | 80.46% |
| | 0.004 | 0.018 | 0.01 | 0.012 | 0.018 | 0.006 | 0.006 | 0.004 |
3, 500 | mus06g | 83.46% | 79.94% | 81.91% | 83.35% | 80.01% | 84.21% | 84.33% | 79.60% |
| | 0.003 | 0.017 | 0.008 | 0.01 | 0.02 | 0.007 | 0.006 | 0.003 |
3, 1000 | hm05r | 85.10% | 80.32% | 82.73% | 85.19% | 80.46% | 85.41% | 85.50% | 82.38% |
| | 0.005 | 0.015 | 0.009 | 0.012 | 0.019 | 0.007 | 0.009 | 0.005 |
3, 1500 | dm07m | 85.19% | 80.42% | 83.53% | 85.01% | 81.01% | 85.87% | 86.09% | 81.65% |
| | 0.006 | 0.02 | 0.01 | 0.008 | 0.015 | 0.007 | 0.011 | 0.004 |
3, 1500 | mus08m | 84.60% | 80.48% | 83.10% | 84.52% | 80.85% | 85.11% | 85.77% | 80.71% |
| | 0.004 | 0.023 | 0.01 | 0.01 | 0.01 | 0.007 | 0.008 | 0.005 |
3, 2000 | dm08m | 85.17% | 80.94% | 83.25% | 85.90% | 81.30% | 85.75% | 85.96% | 81.81% |
| | 0.004 | 0.017 | 0.01 | 0.007 | 0.015 | 0.007 | 0.008 | 0.003 |
3, 2000 | dm03m | 85.16% | 80.75% | 83.52% | 84.56% | 81.38% | 85.56% | 86.06% | 81.29% |
| | 0.006 | 0.021 | 0.011 | 0.007 | 0.017 | 0.007 | 0.011 | 0.003 |
3, 2500 | dm05g | 85.59% | 80.30% | 83.67% | 86.69% | 81.55% | 85.88% | 86.39% | 82.18% |
| | 0.005 | 0.021 | 0.012 | 0.005 | 0.014 | 0.008 | 0.01 | 0.003 |
4, 500 | yst02g | 82.28% | 78.78% | 80.82% | 79.81% | 78.21% | 82.50% | 82.87% | 78.80% |
| | 0.005 | 0.012 | 0.011 | 0.007 | 0.016 | 0.007 | 0.01 | 0.003 |
4, 500 | hm23r | 81.18% | 78.23% | 80.89% | 79.51% | 78.13% | 81.57% | 82.38% | 77.16% |
| | 0.005 | 0.013 | 0.011 | 0.007 | 0.018 | 0.007 | 0.007 | 0.002 |
4, 500 | mus05r | 80.44% | 78.08% | 80.25% | 79.81% | 77.56% | 81.45% | 81.69% | 77.88% |
| | 0.004 | 0.013 | 0.008 | 0.006 | 0.019 | 0.006 | 0.008 | 0.003 |
4, 1500 | mus07g | 87.73% | 79.85% | 82.43% | 90.43% | 80.01% | 89.08% | 89.21% | 85.55% |
| | 0.01 | 0.018 | 0.015 | 0.022 | 0.034 | 0.015 | 0.029 | 0.013 |
4, 1500 | dm01g | 81.74% | 79.03% | 81.62% | 83.77% | 79.66% | 82.95% | 83.24% | 80.91% |
| | 0.004 | 0.015 | 0.013 | 0.01 | 0.017 | 0.006 | 0.008 | 0.004 |
4, 2000 | dm04g | 81.08% | 79.31% | 81.91% | 81.16% | 79.78% | 82.90% | 83.77% | 80.34% |
| | 0.005 | 0.014 | 0.014 | 0.006 | 0.018 | 0.008 | 0.014 | 0.003 |
4, 2000 | hm15r | 85.93% | 80.23% | 83.98% | 85.23% | 80.29% | 85.23% | 88.59% | 82.80% |
| | 0.009 | 0.025 | 0.042 | 0.022 | 0.022 | 0.016 | 0.027 | 0.01 |
5, 500 | hm19g | 79.74% | 76.81% | 80.08% | 79.33% | 78.71% | 80.35% | 81.37% | 76.76% |
| | 0.004 | 0.019 | 0.009 | 0.007 | 0.022 | 0.007 | 0.009 | 0.003 |
5, 500 | mus03g | 77.31% | 74.58% | 79.67% | 76.48% | 77.61% | 77.39% | 79.84% | 72.74% |
| | 0.004 | 0.044 | 0.012 | 0.012 | 0.016 | 0.018 | 0.008 | 0.003 |
5, 1000 | yst10m | 65.38% | 63.39% | 64.99% | 65.24% | 63.49% | 66.08% | 66.67% | 75.53% |
| | 0.003 | 0.01 | 0.009 | 0.004 | 0.014 | 0.005 | 0.007 | 0.003 |
5, 1000 | hm21g | 78.61% | 77.01% | 79.49% | 78.24% | 78.72% | 79.76% | 81.12% | 75.83% |
| | 0.004 | 0.019 | 0.01 | 0.008 | 0.012 | 0.007 | 0.01 | 0.002 |
5, 1000 | hm07m | 78.71% | 77.37% | 79.91% | 79.39% | 78.91% | 79.66% | 81.24% | 74.87% |
| | 0.004 | 0.017 | 0.01 | 0.005 | 0.017 | 0.006 | 0.009 | 0.003 |
5, 3000 | hm18m | 78.58% | 77.39% | 79.97% | 77.58% | 79.80% | 79.38% | 81.31% | 75.16% |
| | 0.004 | 0.023 | 0.009 | 0.004 | 0.014 | 0.007 | 0.008 | 0.003 |
6, 500 | yst07m | 75.92% | 70.49% | 77.36% | 74.57% | 76.81% | 74.22% | 78.27% | 72.27% |
| | 0.005 | 0.038 | 0.008 | 0.013 | 0.012 | 0.032 | 0.008 | 0.003 |
6, 500 | hm22m | 75.53% | 69.87% | 76.84% | 75.02% | 76.98% | 74.12% | 77.91% | 72.22% |
| | 0.005 | 0.047 | 0.009 | 0.03 | 0.011 | 0.028 | 0.007 | 0.003 |
6, 500 | hm10m | 75.22% | 69.01% | 77.54% | 74.61% | 77.11% | 72.36% | 78.23% | 72.06% |
| | 0.005 | 0.038 | 0.009 | 0.02 | 0.014 | 0.034 | 0.011 | 0.003 |
6, 1000 | hm13r | 75.19% | 70.27% | 77.35% | 76.08% | 77.24% | 72.88% | 78.18% | 72.93% |
| | 0.005 | 0.04 | 0.008 | 0.01 | 0.008 | 0.031 | 0.01 | 0.003 |
7, 500 | yst06g | 73.19% | 68.05% | 75.75% | 73.84% | 74.94% | 72.04% | 75.93% | 74.10% |
| | 0.005 | 0.03 | 0.014 | 0.011 | 0.018 | 0.028 | 0.012 | 0.002 |
7, 1000 | yst04r | 71.43% | 68.20% | 73.98% | 77.04% | 73.50% | 72.88% | 75.54% | 75.30% |
| | 0.006 | 0.034 | 0.014 | 0.007 | 0.02 | 0.028 | 0.012 | 0.003 |
7, 1000 | mus04m | 69.43% | 63.99% | 74.24% | 67.86% | 74.31% | 65.49% | 71.10% | 70.02% |
| | 0.02 | 0.03 | 0.01 | 0.012 | 0.012 | 0.021 | 0.023 | 0.003 |
7, 3000 | hm16g | 71.13% | 69.12% | 83.63% | 71.82% | 78.42% | 70.36% | 81.55% | 76.90% |
| | 0.023 | 0.043 | 0.051 | 0.015 | 0.044 | 0.026 | 0.055 | 0.015 |
8, 500 | yst03m | 68.37% | 62.59% | 72.65% | 66.44% | 73.26% | 65.03% | 69.68% | 69.97% |
| | 0.011 | 0.038 | 0.01 | 0.011 | 0.012 | 0.026 | 0.024 | 0.004 |
8, 500 | hm24m | 68.28% | 59.58% | 72.19% | 65.10% | 72.91% | 62.65% | 67.65% | 69.08% |
| | 0.013 | 0.036 | 0.008 | 0.011 | 0.012 | 0.028 | 0.019 | 0.003 |
8, 1000 | hm11g | 69.34% | 60.14% | 73.15% | 69.38% | 74.54% | 63.86% | 69.93% | 72.17% |
| | 0.012 | 0.044 | 0.017 | 0.031 | 0.021 | 0.028 | 0.021 | 0.005 |
9, 500 | hm06g | 61.24% | 56.60% | 69.37% | 60.06% | 70.68% | 59.43% | 63.21% | 69.52% |
| | 0.012 | 0.029 | 0.012 | 0.015 | 0.027 | 0.019 | 0.019 | 0.003 |
9, 1000 | yst01g | 63.16% | 59.37% | 70.14% | 62.75% | 70.72% | 61.48% | 65.16% | 69.94% |
| | 0.013 | 0.024 | 0.011 | 0.013 | 0.014 | 0.021 | 0.025 | 0.004 |
9, 1000 | hm26m | 62.28% | 58.89% | 70.48% | 60.86% | 70.68% | 60.98% | 64.43% | 68.51% |
| | 0.015 | 0.031 | 0.012 | 0.013 | 0.016 | 0.025 | 0.025 | 0.004 |
9, 1000 | hm02r | 59.09% | 57.57% | 69.88% | 61.50% | 70.85% | 59.04% | 63.45% | 68.68% |
| | 0.013 | 0.024 | 0.014 | 0.014 | 0.023 | 0.022 | 0.033 | 0.006 |
9, 1000 | mus02r | 60.62% | 57.21% | 69.78% | 61.99% | 70.02% | 59.34% | 63.58% | 69.08% |
| | 0.012 | 0.025 | 0.015 | 0.013 | 0.019 | 0.025 | 0.024 | 0.005 |
10, 1500 | hm03r | 53.77% | 58.44% | 69.06% | 51.59% | 65.03% | 58.75% | 60.18% | 69.23% |
| | 0.017 | 0.021 | 0.02 | 0.038 | 0.03 | 0.014 | 0.026 | 0.005 |
10, 1500 | hm09g | 54.79% | 57.51% | 68.71% | 50.71% | 65.20% | 57.72% | 59.23% | 68.28% |
| | 0.013 | 0.022 | 0.023 | 0.043 | 0.033 | 0.017 | 0.027 | 0.005 |
11, 500 | hm17g | 52.47% | 54.40% | 65.86% | 49.94% | 64.12% | 55.12% | 55.55% | 66.08% |
| | 0.012 | 0.024 | 0.03 | 0.037 | 0.033 | 0.019 | 0.023 | 0.006 |
11, 1000 | yst08r | 57.96% | 61.24% | 69.12% | 64.92% | 67.80% | 63.03% | 62.91% | 71.15% |
| | 0.014 | 0.03 | 0.019 | 0.028 | 0.048 | 0.03 | 0.023 | 0.006 |
12, 500 | mus11m | 48.87% | 49.51% | 62.01% | 45.32% | 58.85% | 49.54% | 51.25% | 62.98% |
| | 0.014 | 0.031 | 0.014 | 0.043 | 0.039 | 0.031 | 0.02 | 0.007 |
13, 2000 | hm04m | 47.89% | 52.18% | 62.17% | 45.03% | 60.90% | 52.21% | 53.20% | 63.88% |
| | 0.011 | 0.029 | 0.018 | 0.029 | 0.03 | 0.022 | 0.024 | 0.005 |
13, 1000 | mus10g | 43.94% | 47.05% | 59.25% | 41.73% | 55.34% | 46.22% | 48.92% | 62.64% |
| | 0.01 | 0.032 | 0.016 | 0.036 | 0.039 | 0.03 | 0.021 | 0.005 |
15, 500 | hm08m | 43.45% | 46.98% | 58.07% | 42.63% | 57.15% | 46.81% | 48.57% | 61.25% |
| | 0.011 | 0.037 | 0.015 | 0.028 | 0.046 | 0.037 | 0.023 | 0.009 |
16, 1000 | yst09g | 42.34% | 46.50% | 56.86% | 42.68% | 55.70% | 45.74% | 47.92% | 62.11% |
| | 0.011 | 0.029 | 0.021 | 0.022 | 0.056 | 0.037 | 0.022 | 0.007 |
18, 2000 | hm01g | 34.55% | 37.95% | 49.56% | 33.92% | 42.98% | 34.40% | 42.05% | 59.96% |
| | 0.009 | 0.042 | 0.065 | 0.027 | 0.054 | 0.029 | 0.033 | 0.01 |
35, 2000 | hm20r | 21.91% | 25.68% | 36.43% | 22.89% | 25.33% | 22.67% | 25.08% | 46.70% |
| | 0.006 | 0.02 | 0.065 | 0.014 | 0.026 | 0.027 | 0.02 | 0.016 |
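To read the comparison table programmatically, each mean row can be parsed and the leading algorithm picked per dataset. This sketch assumes that a higher percentage is better and that every row follows the column order of the header; the function names are hypothetical, and only three rows of the table are embedded as sample input.

```python
ALGORITHMS = ["SPEA2", "MO-GSA", "MO-FA", "NSGA-II", "DEPT", "MO-VNS", "MOABC", "MOICA"]


def parse_mean_row(line):
    """Split a row like '2, 500 | mus09r | 93.35% | ... |' into (dataset, {algorithm: value})."""
    cells = [c.strip() for c in line.split("|")]
    cells = [c for c in cells if c]          # drop the empty cell after the trailing '|'
    dataset = cells[1]
    values = [float(c.rstrip("%")) for c in cells[2:]]
    if len(values) != len(ALGORITHMS):
        raise ValueError(f"expected {len(ALGORITHMS)} values, got {len(values)}")
    return dataset, dict(zip(ALGORITHMS, values))


def best_algorithm(line):
    """Return (dataset, algorithm with the highest mean), assuming higher is better."""
    dataset, scores = parse_mean_row(line)
    return dataset, max(scores, key=scores.get)


rows = [
    "2, 500 | mus09r | 93.35% | 92.92% | 90.36% | 93.46% | 90.95% | 84.79% | 92.95% | 83.12% |",
    "5, 1000 | yst10m | 65.38% | 63.39% | 64.99% | 65.24% | 63.49% | 66.08% | 66.67% | 75.53% |",
    "35, 2000 | hm20r | 21.91% | 25.68% | 36.43% | 22.89% | 25.33% | 22.67% | 25.08% | 46.70% |",
]
for r in rows:
    print(best_algorithm(r))
```

On these three sample rows, NSGA-II leads on mus09r (n = 2), while MOICA leads on yst10m and hm20r.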
MOICA | nTP | nFP | nFN | nTN | sTP | sFP | sFN |
Fly | 177 | 655 | 494 | 41674 | 11 | 7 | 40 |
Human | 1092 | 4860 | 4027 | 277021 | 53 | 49 | 245 |
Mouse | 612 | 1692 | 1027 | 52169 | 36 | 14 | 62 |
Yeast | 339 | 2349 | 871 | 57441 | 21 | 23 | 53 |
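The nucleotide-level (n*) and site-level (s*) statistics in the tables that follow can be recomputed from these counts. The sketch uses the standard definitions from the Tompa et al. (2005) motif-assessment benchmark, whose tool list matches the compared methods. It returns NaN when a denominator is zero, which is how undefined entries appear in the tables. Function names are hypothetical.

```python
from math import nan, sqrt


def _ratio(num, den):
    """num / den, or NaN when the denominator is zero (shown as NaN in the tables)."""
    return num / den if den else nan


def nucleotide_metrics(tp, fp, fn, tn):
    """Nucleotide-level sensitivity, positive predictive value, performance coefficient, correlation."""
    return {
        "nSn": _ratio(tp, tp + fn),
        "nPPV": _ratio(tp, tp + fp),
        "nPC": _ratio(tp, tp + fn + fp),
        "nCC": _ratio(tp * tn - fn * fp,
                      sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))),
    }


def site_metrics(stp, sfp, sfn):
    """Site-level sensitivity, positive predictive value and their average (sASP)."""
    ssn = _ratio(stp, stp + sfn)
    sppv = _ratio(stp, stp + sfp)
    return {"sSn": ssn, "sPPV": sppv, "sASP": (ssn + sppv) / 2}


# MOICA counts for the Fly datasets (first data row of the count table)
print(nucleotide_metrics(tp=177, fp=655, fn=494, tn=41674))
print(site_metrics(stp=11, sfp=7, sfn=40))
```

Applied to the Fly counts, these formulas reproduce the MOICA row of the Fly table to about six decimal places.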
Fly | nSn | nPPV | nPC | nCC | sSn | sPPV | sASP |
MOICA | 0.263785 | 0.21274 | 0.133484 | 0.223421 | 0.215686 | 0.611111 | 0.413399 |
AlignAce | 0 | 0 | 0 | −0.0063762 | 0 | 0 | 0 |
ANN-Spec | 0.0253353 | 0.0175258 | 0.010468 | 0.0023548 | 0.0196078 | 0.009434 | 0.0145209 |
Consensus | 0 | 0 | 0 | −0.0110554 | 0 | 0 | 0 |
GLAM | 0.0029806 | 0.004902 | 0.001857 | −0.0084518 | 0 | 0 | 0 |
Improbizer | 0.0149031 | 0.0176056 | 0.0081367 | 0.0018679 | 0.0196078 | 0.0227273 | 0.0211676 |
MEME | 0.0417288 | 0.0421053 | 0.0214067 | 0.0267982 | 0.0588235 | 0.0555556 | 0.0571895 |
MEME3 | 0.0372578 | 0.0263713 | 0.0156838 | 0.0130431 | 0.0588235 | 0.0447761 | 0.0517998 |
MITRA | 0 | 0 | 0 | −0.0078616 | 0 | 0 | 0 |
MotifSampler | 0.0044709 | 0.0082192 | 0.0029042 | −0.0055135 | 0 | 0 | 0 |
ologo/dyad-analysis | 0 | 0 | 0 | −0.0149773 | 0 | 0 | 0 |
QuickScore | 0 | 0 | 0 | −0.0157195 | 0 | 0 | 0 |
SeSiMCMC | 0.1013413 | 0.0538827 | 0.0364611 | 0.0537034 | 0.0980392 | 0.125 | 0.1115196 |
Weeder | 0.0119225 | 0.0344828 | 0.0089385 | 0.0112184 | 0.0196078 | 0.0344828 | 0.0270453 |
YMF | 0 | 0 | 0 | −0.0138211 | 0 | 0 | 0 |
Human | nSn | nPPV | nPC | nCC | sSn | sPPV | sASP |
MOICA | 0.213323 | 0.183468 | 0.10943 | 0.182113 | 0.177852 | 0.519608 | 0.34873 |
AlignAce | 0.0392655 | 0.102551 | 0.0292236 | 0.0531478 | 0.0738255 | 0.1235955 | 0.0987105 |
ANN-Spec | 0.090252 | 0.1031711 | 0.0505747 | 0.0812784 | 0.1644295 | 0.0983936 | 0.1314116 |
Consensus | 0 | NaN | 0 | NaN | 0 | NaN | NaN |
GLAM | 0.0236374 | 0.0367669 | 0.0145977 | 0.0155035 | 0.0402685 | 0.06 | 0.0501342 |
Improbizer | 0.0416097 | 0.0476084 | 0.0227079 | 0.0284205 | 0.0704698 | 0.0483871 | 0.0594284 |
MEME | 0.0380934 | 0.0603902 | 0.0239176 | 0.0343922 | 0.0604027 | 0.0810811 | 0.0707419 |
MEME3 | 0.0420004 | 0.0470975 | 0.0227057 | 0.0282219 | 0.0637584 | 0.0788382 | 0.0712983 |
MITRA | 0.0244188 | 0.0471342 | 0.0163484 | 0.0214655 | 0.0402685 | 0.046875 | 0.0435717 |
MotifSampler | 0.0250049 | 0.0416531 | 0.015873 | 0.0188157 | 0.0469799 | 0.0430769 | 0.0450284 |
ologo/dyad-analysis | 0.0371166 | 0.213964 | 0.0326629 | 0.0825991 | 0.0604027 | 0.15 | 0.1052013 |
QuickScore | 0.0050791 | 0.0099388 | 0.0033727 | −0.0056328 | 0 | 0 | 0 |
SeSiMCMC | 0.0459074 | 0.0279962 | 0.0176984 | 0.0134837 | 0.0671141 | 0.0630915 | 0.0651028 |
Weeder | 0.0543075 | 0.2747036 | 0.047497 | 0.1154935 | 0.1073826 | 0.2580645 | 0.1827235 |
YMF | 0.0410236 | 0.0967296 | 0.029661 | 0.0521165 | 0.0738255 | 0.080292 | 0.0770587 |
Mouse | nSn | nPPV | nPC | nCC | sSn | sPPV | sASP |
MOICA | 0.373398 | 0.265625 | 0.183729 | 0.290236 | 0.367347 | 0.720000 | 0.543673 |
AlignAce | 0.028676 | 0.0490605 | 0.0184314 | 0.0152885 | 0.0306122 | 0.0361446 | 0.0333784 |
ANN-Spec | 0.0433191 | 0.0430564 | 0.0220703 | 0.0139802 | 0.0816327 | 0.0425532 | 0.0620929 |
Consensus | 0.0488103 | 0.1062417 | 0.0346021 | 0.0531418 | 0.1020408 | 0.1219512 | 0.111996 |
GLAM | 0.0073215 | 0.0187793 | 0.0052957 | −0.0068546 | 0.0102041 | 0.0153846 | 0.0127943 |
Improbizer | 0.1079927 | 0.1219008 | 0.0607412 | 0.0894309 | 0.2244898 | 0.1605839 | 0.1925369 |
MEME | 0.0732154 | 0.1337793 | 0.0496689 | 0.0789261 | 0.1428571 | 0.175 | 0.1589286 |
MEME3 | 0.1061623 | 0.170088 | 0.0699357 | 0.1137754 | 0.1938776 | 0.2 | 0.1969388 |
MITRA | 0.0061013 | 0.015456 | 0.0043937 | −0.0090299 | 0.0204082 | 0.0327869 | 0.0265975 |
MotifSampler | 0.0445394 | 0.0913642 | 0.0308668 | 0.0441428 | 0.0816327 | 0.0952381 | 0.0884354 |
ologo/dyad-analysis | 0.0268456 | 0.1067961 | 0.0219233 | 0.03947 | 0.0612245 | 0.0952381 | 0.0782313 |
QuickScore | 0.0359976 | 0.089939 | 0.0263864 | 0.0390251 | 0.0816327 | 0.0677966 | 0.0747146 |
SeSiMCMC | 0.0622331 | 0.0408818 | 0.0252976 | 0.0145461 | 0.1020408 | 0.0952381 | 0.0986395 |
Weeder | 0.0616229 | 0.1753472 | 0.0477767 | 0.0882065 | 0.122449 | 0.1791045 | 0.1507767 |
YMF | 0.1012813 | 0.2017011 | 0.0722997 | 0.1247729 | 0.2040816 | 0.1818182 | 0.1929499 |
Yeast | nSn | nPPV | nPC | nCC | sSn | sPPV | sASP |
MOICA | 0.280165 | 0.126116 | 0.0952515 | 0.163648 | 0.283784 | 0.477273 | 0.380528 |
AlignAce | 0.1855754 | 0.1849758 | 0.1020954 | 0.1684257 | 0.28 | 0.2019231 | 0.2409615 |
ANN-Spec | 0.165316 | 0.14011099 | 0.0820595 | 0.1331542 | 0.3066667 | 0.1411043 | 0.2238855 |
Consensus | 0.0794165 | 0.2 | 0.0602706 | 0.1149074 | 0.1466667 | 0.2391304 | 0.1928986 |
GLAM | 0.0713128 | 0.0585106 | 0.0332075 | 0.0432325 | 0.1466667 | 0.0578947 | 0.1022807 |
Improbizer | 0.1572123 | 0.0950049 | 0.0629461 | 0.0988463 | 0.2666667 | 0.1333333 | 0.2 |
MEME | 0.1928687 | 0.3801917 | 0.1467324 | 0.260354 | 0.3066667 | 0.3833333 | 0.345 |
MEME3 | 0.2098865 | 0.3001159 | 0.140914 | 0.2381559 | 0.32 | 0.3037975 | 0.3118987 |
MITRA | 0.1110211 | 0.1525612 | 0.0686717 | 0.1148955 | 0.16 | 0.1538462 | 0.1569231 |
MotifSampler | 0.2560778 | 0.5039872 | 0.2045307 | 0.3501753 | 0.3866667 | 0.4915254 | 0.439096 |
ologo/dyad-analysis | 0.0899514 | 0.3303571 | 0.0760795 | 0.1639418 | 0.1866667 | 0.3043478 | 0.2455072 |
QuickScore | 0.0534846 | 0.0613953 | 0.0294249 | 0.0391636 | 0.12 | 0.0434783 | 0.0817391 |
SeSiMCMC | 0.1012966 | 0.0570255 | 0.0378673 | 0.0504601 | 0.0933333 | 0.0721649 | 0.0827491 |
Weeder | 0.2925446 | 0.5340237 | 0.2330536 | 0.3863337 | 0.52 | 0.5492958 | 0.5346479 |
YMF | 0.1442464 | 0.3296296 | 0.1115288 | 0.2076962 | 0.28 | 0.3387097 | 0.3093548 |
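Because nPC = nTP/(nTP + nFN + nFP), it is tied to the other two nucleotide-level scores by 1/nPC = 1/nSn + 1/nPPV − 1. This identity gives a quick consistency check on any row of the tables above: for the Fly Improbizer row, nPPV = 0.0176056 and nPC = 0.0081367 imply nSn ≈ 0.0149. A minimal Python sketch (function names are hypothetical):

```python
def sn_from_ppv_pc(ppv, pc):
    """Recover nSn from nPPV and nPC using 1/nPC = 1/nSn + 1/nPPV - 1."""
    return 1 / (1 / pc - 1 / ppv + 1)


def pc_from_sn_ppv(sn, ppv):
    """Recover nPC from nSn and nPPV using the same identity."""
    return 1 / (1 / sn + 1 / ppv - 1)


# Fly, Improbizer: nPPV and nPC as listed in the table
print(sn_from_ppv_pc(0.0176056, 0.0081367))
# Mouse, MOICA: nSn and nPPV as listed in the table
print(pc_from_sn_ppv(0.373398, 0.265625))
```

The identity follows from 1/nSn = (nTP + nFN)/nTP and 1/nPPV = (nTP + nFP)/nTP; adding them and subtracting 1 gives (nTP + nFN + nFP)/nTP.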
Predicted motif | Complexity | Similarity | Length | Support | Method | Dataset |
TCAACTGTAAATATAACTTAAAAAGGGAATACT | 0.84 | 0.64 | 33 | 3 | MOABC | dm04g |
GTTTGGAAGTGCTTAAATAAACTTGCAAAAAAC | 0.85 | 0.74 | 33 | 3 | MOICA | dm04g |
ATGACCCACACCACGCGCACGCATGGCCCGGCC | 0.81 | 0.61 | 33 | 4 | MOABC | hm22m |
CCACGCCAGACGGGCGTTGCAAGCCAGACCTACT | 0.83 | 0.68 | 34 | 4 | MOICA | hm22m |
AAGGCGTTGCTCAAGTGTTAAGAAAATACTGACAC | 0.87 | 0.58 | 35 | 4 | MOABC | mus03g |
TGGATAGAAGAACTACAACCTTCATGTCATACATTT | 0.87 | 0.63 | 36 | 4 | MOICA | mus03g |
ATGAAATTAAACCCAA | 0.74 | 0.59 | 16 | 4 | MOABC | yst01g |
TACCATAATTCATAGAT | 0.74 | 0.69 | 17 | 4 | MOICA | yst01g |
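The length column of the motif table can be checked, and per-symbol counts of the kind recorded by ni, j obtained, with a few lines of Python. The helper name is hypothetical; it counts symbols over the fixed DNA alphabet ACGT, and only three of the predicted motifs are embedded as sample input.

```python
from collections import Counter


def composition(motif, alphabet="ACGT"):
    """Count each symbol of the alphabet in a motif string."""
    counts = Counter(motif)
    return {b: counts[b] for b in alphabet}


motifs = {
    ("MOICA", "yst01g"): ("TACCATAATTCATAGAT", 17),
    ("MOABC", "yst01g"): ("ATGAAATTAAACCCAA", 16),
    ("MOICA", "dm04g"): ("GTTTGGAAGTGCTTAAATAAACTTGCAAAAAAC", 33),
}
for key, (motif, length) in motifs.items():
    # The table's length column should equal len(motif).
    assert len(motif) == length, key
    print(key, composition(motif))
```

For the MOICA motif on yst01g this gives A 7, C 3, G 1, T 6, which sums to the listed length of 17.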
Variable | Definition |
n | Number of sequences |
m | Length of each sequence |
l | length of motif |
si | ith sequence with length m which is a composition of the alphabet Σ |
si[k, l] | A subsequence from si with length l that started from k |
si[k] | A single alphabet of si in position k |
xi | The starting position of an occurrence in the si and xi=0, 1, 2, …, m-l+1 |
x | An individual, x=(x1, x2, …, xn) |
fiΣ | Frequency of Σ in the ith column of the occurrence matrix |
wi | wi=fiΣ |
cwi | The corresponding alphabet for wi |
ni, j | Number of the jth alphabet from Σ in the selected sequence xi |
uxi | |
δ | |
consensus(x) | consensus(x)=cw1 cw2…cwn |
concordance(xi) | |
support(x) | |
similarity(x) | |
Complexity(xi) | |
complexity(x) |
Time(sec.) | nCol | nEmp | nPop | MaxIt | rate | Len. | Seq. | Dataset |
15 | 110 | 50 | 160 | 390 | 55% | 500 | 3 | yst05r |
15 | 110 | 50 | 160 | 390 | 55% | 500 | 4 | yst02g |
20 | 110 | 50 | 160 | 390 | 55% | 1000 | 5 | yst10m |
20 | 110 | 50 | 160 | 390 | 55% | 500 | 6 | yst07m |
20 | 110 | 50 | 160 | 390 | 55% | 500 | 7 | yst06g |
25 | 110 | 50 | 160 | 390 | 55% | 500 | 8 | yst03m |
25 | 110 | 50 | 160 | 390 | 55% | 1000 | 9 | yst01g |
30 | 110 | 50 | 160 | 390 | 55% | 1000 | 11 | yst08r |
30 | 100 | 35 | 135 | 330 | 55% | 1000 | 16 | yst09g |
25 | 110 | 50 | 160 | 390 | 70% | 1000 | 7 | yst04r |
25 | 110 | 50 | 160 | 390 | 70% | 500 | 8 | yst03r |
30 | 90 | 35 | 125 | 300 | 70% | 1000 | 16 | yst09g |
30 | 110 | 50 | 160 | 400 | 60% | 500 | 12 | mus11r |
15 | 110 | 50 | 160 | 400 | 75% | 1500 | 4 | mus07r |
30 | 130 | 50 | 180 | 470 | 60% | 3000 | 7 | hm16r |
40 | 110 | 50 | 160 | 400 | 60% | 2000 | 13 | hm04r |
15 | 110 | 50 | 160 | 390 | 55% | 1500 | 3 | dm07m |
15 | 120 | 50 | 170 | 420 | 80% | 2000 | 3 | dm08m |
15 | 120 | 50 | 170 | 420 | 80% | 2000 | 3 | dm03m |
15 | 110 | 50 | 160 | 390 | 55% | 2500 | 3 | dm05g |
20 | 120 | 50 | 170 | 420 | 80% | 1500 | 4 | dm01g |
20 | 120 | 50 | 170 | 420 | 80% | 2000 | 4 | dm04g |
15 | 120 | 50 | 170 | 420 | 90% | 500 | 2 | hm12r |
15 | 120 | 50 | 170 | 420 | 90% | 500 | 2 | hm25g |
15 | 120 | 50 | 170 | 420 | 90% | 1000 | 2 | hm14r |
15 | 120 | 50 | 170 | 420 | 80% | 1000 | 3 | hm05r |
20 | 120 | 50 | 170 | 420 | 80% | 500 | 4 | hm23r |
15 | 110 | 50 | 160 | 390 | 55% | 2000 | 4 | hm15r |
20 | 110 | 50 | 160 | 390 | 55% | 500 | 5 | hm19g |
20 | 110 | 50 | 160 | 390 | 55% | 500 | 5 | hm21g |
20 | 110 | 50 | 160 | 390 | 55% | 1000 | 5 | hm07m |
20 | 110 | 50 | 160 | 390 | 55% | 3000 | 5 | hm18m |
20 | 110 | 50 | 160 | 390 | 55% | 500 | 6 | hm22m |
20 | 110 | 50 | 160 | 390 | 55% | 500 | 6 | hm10m |
20 | 110 | 50 | 160 | 390 | 55% | 1000 | 6 | hm13r |
20 | 110 | 45 | 155 | 350 | 55% | 3000 | 7 | hm16g |
20 | 110 | 40 | 150 | 345 | 55% | 500 | 8 | hm24m |
20 | 110 | 40 | 150 | 345 | 55% | 1000 | 8 | hm11g |
20 | 110 | 40 | 150 | 345 | 55% | 500 | 9 | hm06g |
20 | 110 | 40 | 150 | 345 | 55% | 1000 | 9 | hm26m |
20 | 110 | 40 | 150 | 345 | 55% | 1000 | 9 | hm02r |
30 | 110 | 50 | 160 | 390 | 55% | 1500 | 10 | hm03r |
30 | 110 | 50 | 160 | 390 | 55% | 1500 | 10 | hm09g |
30 | 110 | 50 | 160 | 390 | 55% | 500 | 11 | hm17g |
30 | 100 | 40 | 140 | 340 | 55% | 2000 | 13 | hm04m |
30 | 100 | 35 | 135 | 330 | 55% | 500 | 15 | hm08m |
30 | 100 | 30 | 130 | 300 | 55% | 2000 | 18 | hm01g |
30 | 90 | 20 | 110 | 190 | 55% | 2000 | 35 | hm20r |
10 | 110 | 50 | 160 | 390 | 55% | 500 | 2 | mus09r |
15 | 110 | 50 | 160 | 390 | 55% | 500 | 3 | mus01r |
15 | 110 | 50 | 160 | 390 | 55% | 500 | 3 | mus12m |
15 | 110 | 50 | 160 | 390 | 55% | 500 | 3 | mus06g |
15 | 110 | 50 | 160 | 390 | 55% | 1500 | 3 | mus08m |
15 | 110 | 50 | 160 | 390 | 55% | 500 | 4 | mus05r |
15 | 110 | 50 | 160 | 390 | 55% | 1500 | 4 | mus07g |
20 | 110 | 50 | 160 | 390 | 55% | 500 | 5 | mus03g |
20 | 110 | 45 | 155 | 350 | 55% | 1000 | 7 | mus04m |
20 | 110 | 40 | 150 | 345 | 55% | 1000 | 9 | mus02r |
30 | 110 | 40 | 150 | 345 | 55% | 500 | 12 | mus11m |
30 | 100 | 40 | 140 | 340 | 55% | 1000 | 13 | mus10g |
20 | 110 | 50 | 160 | 450 | 95% | 2500 | 5 | dm05r |
25 | 150 | 50 | 200 | 450 | 80% | 2000 | 4 | dm04r |
15 | 110 | 50 | 160 | 400 | 75% | 1500 | 4 | dm01r |
n, m | Dataset | SPEA2 | MO-GSA | MO-FA | NSGA-II | DEPT | MO-VNS | MOABC | MOICA |
2,500 | mus09r | 93.35% | 92.92% | 90.36% | 93.46% | 90.95% | 84.79% | 92.95% | 83.12% |
0.002 | 0.005 | 0.004 | 0.001 | 0.002 | 0.061 | 0.005 | 0.002 | ||
2,500 | hm12r | 94.59% | 93.88% | 91.14% | 94.71% | 91.50% | 90.69% | 94.29% | 85.12% |
0.007 | 0.005 | 0.006 | 0.002 | 0.005 | 0.034 | 0.005 | 0.002 | ||
2,500 | hm25g | 93.82% | 93.20% | 90.24% | 93.94% | 91.30% | 79.92% | 93.66% | 81.93% |
0.003 | 0.011 | 0.007 | 0.01 | 0.008 | 0.062 | 0.007 | 0.003 | ||
2, 1000 | hm14r | 93.96% | 94.11% | 91.03% | 94.40% | 91.40% | 86.77% | 93.88% | 84.07% |
0.003 | 0.003 | 0.006 | 0.004 | 0.003 | 0.062 | 0.006 | 0.002 | ||
3,500 | yst05r | 85.66% | 81.29% | 83.63% | 85.96% | 80.76% | 86.61% | 86.89% | 82.32% |
0.004 | 0.02 | 0.014 | 0.01 | 0.021 | 0.007 | 0.007 | 0.004 | ||
3,500 | mus01r | 85.65% | 80.50% | 84.44% | 85.26% | 80.40% | 85.90% | 86.93% | 80.28% |
0.004 | 0.022 | 0.017 | 0.011 | 0.017 | 0.007 | 0.006 | 0.004 | ||
3,500 | mus12m | 84.96% | 80.78% | 82.72% | 84.19% | 80.04% | 85.41% | 85.78% | 80.46% |
0.004 | 0.018 | 0.01 | 0.012 | 0.018 | 0.006 | 0.006 | 0.004 | ||
3,500 | mus06g | 83.46% | 79.94% | 81.91% | 83.35% | 80.01% | 84.21% | 84.33% | 79.60% |
0.003 | 0.017 | 0.008 | 0.01 | 0.02 | 0.007 | 0.006 | 0.003 | ||
3, 1000 | hm05r | 85.10% | 80.32% | 82.73% | 85.19% | 80.46% | 85.41% | 85.50% | 82.38% |
0.005 | 0.015 | 0.009 | 0.012 | 0.019 | 0.007 | 0.009 | 0.005 | ||
3, 1500 | dm07m | 85.19% | 80.42% | 83.53% | 85.01% | 81.01% | 85.87% | 86.09% | 81.65% |
0.006 | 0.02 | 0.01 | 0.008 | 0.015 | 0.007 | 0.011 | 0.004 | ||
3, 1500 | mus08m | 84.60% | 80.48% | 83.10% | 84.52% | 80.85% | 85.11% | 85.77% | 80.71% |
0.004 | 0.023 | 0.01 | 0.01 | 0.01 | 0.007 | 0.008 | 0.005 | ||
3, 2000 | dm08m | 85.17% | 80.94% | 83.25% | 85.90% | 81.30% | 85.75% | 85.96% | 81.81% |
0.004 | 0.017 | 0.01 | 0.007 | 0.015 | 0.007 | 0.008 | 0.003 | ||
3, 2000 | dm03m | 85.16% | 80.75% | 83.52% | 84.56% | 81.38% | 85.56% | 86.06% | 81.29% |
0.006 | 0.021 | 0.011 | 0.007 | 0.017 | 0.007 | 0.011 | 0.003 | ||
3, 2500 | dm05g | 85.59% | 80.30% | 83.67% | 86.69% | 81.55% | 85.88% | 86.39% | 82.18% |
0.005 | 0.021 | 0.012 | 0.005 | 0.014 | 0.008 | 0.01 | 0.003 | ||
4,500 | yst02g | 82.28% | 78.78% | 80.82% | 79.81% | 78.21% | 82.50% | 82.87% | 78.80% |
0.005 | 0.012 | 0.011 | 0.007 | 0.016 | 0.007 | 0.01 | 0.003 | ||
4,500 | hm23r | 81.18% | 78.23% | 80.89% | 79.51% | 78.13% | 81.57% | 82.38% | 77.16% |
0.005 | 0.013 | 0.011 | 0.007 | 0.018 | 0.007 | 0.007 | 0.002 | ||
4,500 | mus05r | 80.44% | 78.08% | 80.25% | 79.81% | 77.56% | 81.45% | 81.69% | 77.88% |
0.004 | 0.013 | 0.008 | 0.006 | 0.019 | 0.006 | 0.008 | 0.003 | ||
4, 1500 | mus07g | 87.73% | 79.85% | 82.43% | 90.43% | 80.01% | 89.08% | 89.21% | 85.55% |
0.01 | 0.018 | 0.015 | 0.022 | 0.034 | 0.015 | 0.029 | 0.013 | ||
4, 1500 | dm01g | 81.74% | 79.03% | 81.62% | 83.77% | 79.66% | 82.95% | 83.24% | 80.91% |
0.004 | 0.015 | 0.013 | 0.01 | 0.017 | 0.006 | 0.008 | 0.004 | ||
4, 2000 | dm04g | 81.08% | 79.31% | 81.91% | 81.16% | 79.78% | 82.90% | 83.77% | 80.34% |
0.005 | 0.014 | 0.014 | 0.006 | 0.018 | 0.008 | 0.014 | 0.003 | ||
4, 2000 | hm15r | 85.93% | 80.23% | 83.98% | 85.23% | 80.29% | 85.23% | 88.59% | 82.80% |
0.009 | 0.025 | 0.042 | 0.022 | 0.022 | 0.016 | 0.027 | 0.01 | ||
5,500 | hm19g | 79.74% | 76.81% | 80.08% | 79.33% | 78.71% | 80.35% | 81.37% | 76.76% |
0.004 | 0.019 | 0.009 | 0.007 | 0.022 | 0.007 | 0.009 | 0.003 | ||
5,500 | mus03g | 77.31% | 74.58% | 79.67% | 76.48% | 77.61% | 77.39% | 79.84% | 72.74% |
0.004 | 0.044 | 0.012 | 0.012 | 0.016 | 0.018 | 0.008 | 0.003 | ||
5, 1000 | yst10m | 65.38% | 63.39% | 64.99% | 65.24% | 63.49% | 66.08% | 66.67% | 75.53% |
0.003 | 0.01 | 0.009 | 0.004 | 0.014 | 0.005 | 0.007 | 0.003 | ||
5, 1000 | hm21g | 78.61% | 77.01% | 79.49% | 78.24% | 78.72% | 79.76% | 81.12% | 75.83% |
0.004 | 0.019 | 0.01 | 0.008 | 0.012 | 0.007 | 0.01 | 0.002 | ||
5, 1000 | hm07m | 78.71% | 77.37% | 79.91% | 79.39% | 78.91% | 79.66% | 81.24% | 74.87% |
0.004 | 0.017 | 0.01 | 0.005 | 0.017 | 0.006 | 0.009 | 0.003 | ||
5, 3000 | hm18m | 78.58% | 77.39% | 79.97% | 77.58% | 79.80% | 79.38% | 81.31% | 75.16% |
0.004 | 0.023 | 0.009 | 0.004 | 0.014 | 0.007 | 0.008 | 0.003 | ||
6,500 | yst07m | 75.92% | 70.49% | 77.36% | 74.57% | 76.81% | 74.22% | 78.27% | 72.27% |
0.005 | 0.038 | 0.008 | 0.013 | 0.012 | 0.032 | 0.008 | 0.003 | ||
6,500 | hm22m | 75.53% | 69.87% | 76.84% | 75.02% | 76.98% | 74.12% | 77.91% | 72.22% |
0.005 | 0.047 | 0.009 | 0.03 | 0.011 | 0.028 | 0.007 | 0.003 | ||
6,500 | hm10m | 75.22% | 69.01% | 77.54% | 74.61% | 77.11% | 72.36% | 78.23% | 72.06% |
0.005 | 0.038 | 0.009 | 0.02 | 0.014 | 0.034 | 0.011 | 0.003 | ||
6, 1000 | hm13r | 75.19% | 70.27% | 77.35% | 76.08% | 77.24% | 72.88% | 78.18% | 72.93% |
0.005 | 0.04 | 0.008 | 0.01 | 0.008 | 0.031 | 0.01 | 0.003 | ||
7,500 | yst06g | 73.19% | 68.05% | 75.75% | 73.84% | 74.94% | 72.04% | 75.93% | 74.10% |
0.005 | 0.03 | 0.014 | 0.011 | 0.018 | 0.028 | 0.012 | 0.002 | ||
7, 1000 | yst04r | 71.43% | 68.20% | 73.98% | 77.04% | 73.50% | 72.88% | 75.54% | 75.30% |
0.006 | 0.034 | 0.014 | 0.007 | 0.02 | 0.028 | 0.012 | 0.003 | ||
7, 1000 | mus04m | 69.43% | 63.99% | 74.24% | 67.86% | 74.31% | 65.49% | 71.10% | 70.02% |
0.02 | 0.03 | 0.01 | 0.012 | 0.012 | 0.021 | 0.023 | 0.003 | ||
7, 3000 | hm16g | 71.13% | 69.12% | 83.63% | 71.82% | 78.42% | 70.36% | 81.55% | 76.90% |
0.023 | 0.043 | 0.051 | 0.015 | 0.044 | 0.026 | 0.055 | 0.015 | ||
8,500 | yst03m | 68.37% | 62.59% | 72.65% | 66.44% | 73.26% | 65.03% | 69.68% | 69.97% |
0.011 | 0.038 | 0.01 | 0.011 | 0.012 | 0.026 | 0.024 | 0.004 | ||
8,500 | hm24m | 68.28% | 59.58% | 72.19% | 65.10% | 72.91% | 62.65% | 67.65% | 69.08% |
0.013 | 0.036 | 0.008 | 0.011 | 0.012 | 0.028 | 0.019 | 0.003 | ||
8, 1000 | hm11g | 69.34% | 60.14% | 73.15% | 69.38% | 74.54% | 63.86% | 69.93% | 72.17% |
0.012 | 0.044 | 0.017 | 0.031 | 0.021 | 0.028 | 0.021 | 0.005 | ||
9,500 | hm06g | 61.24% | 56.60% | 69.37% | 60.06% | 70.68% | 59.43% | 63.21% | 69.52% |
0.012 | 0.029 | 0.012 | 0.015 | 0.027 | 0.019 | 0.019 | 0.003 | ||
9, 1000 | yst01g | 63.16% | 59.37% | 70.14% | 62.75% | 70.72% | 61.48% | 65.16% | 69.94% |
0.013 | 0.024 | 0.011 | 0.013 | 0.014 | 0.021 | 0.025 | 0.004 | ||
9, 1000 | hm26m | 62.28% | 58.89% | 70.48% | 60.86% | 70.68% | 60.98% | 64.43% | 68.51% |
0.015 | 0.031 | 0.012 | 0.013 | 0.016 | 0.025 | 0.025 | 0.004 | ||
9, 1000 | hm02r | 59.09% | 57.57% | 69.88% | 61.50% | 70.85% | 59.04% | 63.45% | 68.68% |
0.013 | 0.024 | 0.014 | 0.014 | 0.023 | 0.022 | 0.033 | 0.006 | ||
9, 1000 | mus02r | 60.62% | 57.21% | 69.78% | 61.99% | 70.02% | 59.34% | 63.58% | 69.08% |
0.012 | 0.025 | 0.015 | 0.013 | 0.019 | 0.025 | 0.024 | 0.005 | ||
10, 1500 | hm03r | 53.77% | 58.44% | 69.06% | 51.59% | 65.03% | 58.75% | 60.18% | 69.23% |
0.017 | 0.021 | 0.02 | 0.038 | 0.03 | 0.014 | 0.026 | 0.005 | ||
10, 1500 | hm09g | 54.79% | 57.51% | 68.71% | 50.71% | 65.20% | 57.72% | 59.23% | 68.28% |
0.013 | 0.022 | 0.023 | 0.043 | 0.033 | 0.017 | 0.027 | 0.005 | ||
11,500 | hm17g | 52.47% | 54.40% | 65.86% | 49.94% | 64.12% | 55.12% | 55.55% | 66.08% |
0.012 | 0.024 | 0.03 | 0.037 | 0.033 | 0.019 | 0.023 | 0.006 | ||
11, 1000 | yst08r | 57.96% | 61.24% | 69.12% | 64.92% | 67.80% | 63.03% | 62.91% | 71.15% |
0.014 | 0.03 | 0.019 | 0.028 | 0.048 | 0.03 | 0.023 | 0.006 | ||
12,500 | mus11m | 48.87% | 49.51% | 62.01% | 45.32% | 58.85% | 49.54% | 51.25% | 62.98% |
0.014 | 0.031 | 0.014 | 0.043 | 0.039 | 0.031 | 0.02 | 0.007 | ||
13, 2000 | hm04m | 47.89% | 52.18% | 62.17% | 45.03% | 60.90% | 52.21% | 53.20% | 63.88% |
0.011 | 0.029 | 0.018 | 0.029 | 0.03 | 0.022 | 0.024 | 0.005 | ||
13, 1000 | mus10g | 43.94% | 47.05% | 59.25% | 41.73% | 55.34% | 46.22% | 48.92% | 62.64% |
0.01 | 0.032 | 0.016 | 0.036 | 0.039 | 0.03 | 0.021 | 0.005 | ||
15,500 | hm08m | 43.45% | 46.98% | 58.07% | 42.63% | 57.15% | 46.81% | 48.57% | 61.25% |
0.011 | 0.037 | 0.015 | 0.028 | 0.046 | 0.037 | 0.023 | 0.009 | ||
16, 1000 | yst09g | 42.34% | 46.50% | 56.86% | 42.68% | 55.70% | 45.74% | 47.92% | 62.11% |
0.011 | 0.029 | 0.021 | 0.022 | 0.056 | 0.037 | 0.022 | 0.007 | ||
18, 2000 | hm01g | 34.55% | 37.95% | 49.56% | 33.92% | 42.98% | 34.40% | 42.05% | 59.96% |
0.009 | 0.042 | 0.065 | 0.027 | 0.054 | 0.029 | 0.033 | 0.01 | ||
35, 2000 | hm20r | 21.91% | 25.68% | 36.43% | 22.89% | 25.33% | 22.67% | 25.08% | 46.70% |
0.006 | 0.02 | 0.065 | 0.014 | 0.026 | 0.027 | 0.02 | 0.016 |
MOICA | nTP | nFP | nFN | nTN | sTP | sFP | sFN |
Fly | 177 | 655 | 494 | 41674 | 11 | 7 | 40 |
Human | 1092 | 4860 | 4027 | 277021 | 53 | 49 | 245 |
Mouse | 612 | 1692 | 1027 | 52169 | 36 | 14 | 62 |
Yeast | 339 | 2349 | 871 | 57441 | 21 | 23 | 53 |
Fly | nSn | nPPV | nPC | nCC | sSn | sPPV | sASP |
MOICA | 0.263785 | 0.21274 | 0.133484 | 0.223421 | 0.215686 | 0.611111 | 0.413399 |
AlignAce | 0 | 0 | 0 | −0.0063762 | 0 | 0 | 0 |
ANN-Spec | 0.0253353 | 0.0175258 | 0.010468 | 0.0023548 | 0.0196078 | 0.009434 | 0.0145209 |
Consensus | 0 | 0 | 0 | −0.0110554 | 0 | 0 | 0 |
GLAM | 0.0029806 | 0.004902 | 0.001857 | −0.0084518 | 0 | 0 | 0 |
Improbizer | 00149031 | 0.0176056 | 0.0081367 | 0.0018679 | 0.0196078 | 0.0227273 | 0.0211676 |
MEME | 0.0417288 | 0.0421053 | 0.0214067 | 0.0267982 | 0.0588235 | 0.0555556 | 0.0571895 |
MEME3 | 0.0372578 | 0.0263713 | 0.0156838 | 0.0130431 | 0.0588235 | 0.0447761 | 0.0517998 |
MITRA | 0 | 0 | 0 | −0.0078616 | 0 | 0 | 0 |
MotifSampler | 0.0044709 | 0.0082192 | 0.0029042 | −0.0055135 | 0 | 0 | 0 |
ologo/dyad-analysis | 0 | 0 | 0 | −0.0149773 | 0 | 0 | 0 |
QuickScore | 0 | 0 | 0 | −0.0157195 | 0 | 0 | 0 |
SeSiMCMC | 0.1013413 | 0.0538827 | 0.0364611 | 0.0537034 | 0.0980392 | 0.125 | 0.1115196 |
Weeder | 0.0119225 | 0.0344828 | 0.0089385 | 0.0112184 | 0.0196078 | 0.0344828 | 0.0270453 |
YMF | 0 | 0 | 0 | −0.0138211 | 0 | 0 | 0 |
Human | nSn | nPPV | nPC | nCC | sSn | sPPV | sASP |
MOICA | 0.213323 | 0.183468 | 0.10943 | 0.182113 | 0.177852 | 0.519608 | 0.34873 |
AlignAce | 0.0392655 | 0.102551 | 0.0292236 | 0.0531478 | 0.0738255 | 0.1235955 | 0.0987105 |
ANN-Spec | 0.090252 | 0.1031711 | 0.0505747 | 0.0812784 | 0.1644295 | 0.0983936 | 0.1314116 |
Consensus | 0 | NaN | 0 | NaN | 0 | NaN | NaN |
GLAM | 0.0236374 | 0.0367669 | 0.0145977 | 0.0155035 | 0.0402685 | 0.06 | 0.0501342 |
Improbizer | 0.0416097 | 0.0476084 | 0.0227079 | 0.0284205 | 0.0704698 | 0.0483871 | 0.0594284 |
MEME | 0.0380934 | 0.0603902 | 0.0239176 | 0.0343922 | 0.0604027 | 0.0810811 | 0.0707419 |
MEME3 | 0.0420004 | 0.0470975 | 0.0227057 | 0.0282219 | 0.0637584 | 0.0788382 | 0.0712983 |
MITRA | 0.0244188 | 0.0471342 | 0.0163484 | 0.0214655 | 0.0402685 | 0.046875 | 0.0435717 |
MotifSampler | 0.0250049 | 0.0416531 | 0.015873 | 0.0188157 | 0.0469799 | 0.0430769 | 0.0450284 |
ologo/dyad-analysis | 0.0371166 | 0.213964 | 0.0326629 | 0.0825991 | 0.0604027 | 0.15 | 0.1052013 |
QuickScore | 0.0050791 | 0.0099388 | 0.0033727 | −0.0056328 | 0 | 0 | 0 |
SeSiMCMC | 0.0459074 | 0.0279962 | 0.0176984 | 0.0134837 | 0.0671141 | 0.0630915 | 0.0651028 |
Weeder | 0.0543075 | 0.2747036 | 0.047497 | 0.1154935 | 0.1073826 | 0.2580645 | 0.1827235 |
YMF | 0.0410236 | 0.0967296 | 0.029661 | 0.0521165 | 0.0738255 | 0.080292 | 0.0770587 |
Mouse | nSn | nPPV | nPC | nCC | sSn | sPPV | sASP |
MOICA | 0.373398 | 0.265625 | 0.183729 | 0.290236 | 0.367347 | 0.720000 | 0.543673 |
AlignAce | 0.028676 | 0.0490605 | 0.0184314 | 0.0152885 | 0.0306122 | 0.0361446 | 0.0333784 |
ANN-Spec | 0.0433191 | 0.0430564 | 0.0220703 | 0.0139802 | 0.0816327 | 0.0425532 | 0.0620929 |
Consensus | 0.0488103 | 0.1062417 | 0.0346021 | 0.0531418 | 0.1020408 | 0.1219512 | 0.111996 |
GLAM | 0.0073215 | 0.0187793 | 0.0052957 | −0.0068546 | 0.0102041 | 0.0153846 | 0.0127943 |
Improbizer | 0.1079927 | 0.1219008 | 0.0607412 | 0.0894309 | 0.2244898 | 0.1605839 | 0.1925369 |
MEME | 0.0732154 | 0.1337793 | 0.0496689 | 0.0789261 | 0.1428571 | 0.175 | 0.1589286 |
MEME3 | 0.1061623 | 0.170088 | 0.0699357 | 0.1137754 | 0.1938776 | 0.2 | 0.1969388 |
MITRA | 0.0061013 | 0.015456 | 0.0043937 | −0.0090299 | 0.0204082 | 0.0327869 | 0.0265975 |
MotifSampler | 0.0445394 | 0.0913642 | 0.0308668 | 0.0441428 | 0.0816327 | 0.0952381 | 0.0884354 |
ologo/dyad-analysis | 0.0268456 | 0.1067961 | 0.0219233 | 0.03947 | 0.0612245 | 0.0952381 | 0.0782313 |
QuickScore | 0.0359976 | 0.089939 | 0.0263864 | 0.0390251 | 0.0816327 | 0.0677966 | 0.0747146 |
SeSiMCMC | 0.0622331 | 0.0408818 | 0.0252976 | 0.0145461 | 0.1020408 | 0.0952381 | 0.0986395 |
Weeder | 0.0616229 | 0.1753472 | 0.0477767 | 0.0882065 | 0.122449 | 0.1791045 | 0.1507767 |
YMF | 0.1012813 | 0.2017011 | 0.0722997 | 0.1247729 | 0.2040816 | 0.1818182 | 0.1929499 |
Yeast | nSn | nPPV | nPC | nCC | sSn | sPPV | sASP |
MOICA | 0.280165 | 0.126116 | 0.0952515 | 0.163648 | 0.283784 | 0.477273 | 0.380528 |
AlignAce | 0.1855754 | 0.1849758 | 0.1020954 | 0.1684257 | 0.28 | 0.2019231 | 0.2409615 |
ANN-Spec | 0.165316 | 0.14011099 | 0.0820595 | 0.1331542 | 0.3066667 | 0.1411043 | 0.2238855 |
Consensus | 0.0794165 | 0.2 | 0.0602706 | 0.1149074 | 0.1466667 | 0.2391304 | 0.1928986 |
GLAM | 0.0713128 | 0.0585106 | 0.0332075 | 0.0432325 | 0.1466667 | 0.0578947 | 0.1022807 |
Improbizer | 0.1572123 | 0.0950049 | 0.0629461 | 0.0988463 | 0.2666667 | 0.1333333 | 0.2 |
MEME | 0.1928687 | 0.3801917 | 0.1467324 | 0.260354 | 0.3066667 | 0.3833333 | 0.345 |
MEME3 | 0.2098865 | 0.3001159 | 0.140914 | 0.2381559 | 0.32 | 0.3037975 | 0.3118987 |
MITRA | 0.1110211 | 0.1525612 | 0.0686717 | 0.1148955 | 0.16 | 0.1538462 | 0.1569231 |
MotifSampler | 0.2560778 | 0.5039872 | 0.2045307 | 0.3501753 | 0.3866667 | 0.4915254 | 0.439096 |
ologo/dyad-analysis | 0.0899514 | 0.3303571 | 0.0760795 | 0.1639418 | 0.1866667 | 0.3043478 | 0.2455072 |
QuickScore | 0.0534846 | 0.0613953 | 0.0294249 | 0.0391636 | 0.12 | 0.0434783 | 0.0817391 |
SeSiMCMC | 0.1012966 | 0.0570255 | 0.0378673 | 0.0504601 | 0.0933333 | 0.0721649 | 0.0827491 |
Weeder | 0.2925446 | 0.5340237 | 0.2330536 | 0.3863337 | 0.52 | 0.5492958 | 0.5346479 |
YMF | 0.1442464 | 0.3296296 | 0.1115288 | 0.2076962 | 0.28 | 0.3387097 | 0.3093548 |
Predicted Motif | complexity | similarity | length | support | Method | data |
TCAACTGTAAATATAACTTAAAAAGGGAATACT | 0.84 | 0.64 | 33 | 3 | MOABC | dm04g |
GTTTGGAAGTGCTTAAATAAACTTGCAAAAAAC | 0.85 | 0.74 | 33 | 3 | MOICA | dm04g |
ATGACCCACACCACGCGCACGCATGGCCCGGCC | 0.81 | 0.61 | 33 | 4 | MOABC | hm22m |
CCACGCCAGACGGGCGTTGCAAGCCAGACCTACT | 0.83 | 0.68 | 34 | 4 | MOICA | hm22m |
AAGGCGTTGCTCAAGTGTTAAGAAAATACTGACAC | 0.87 | 0.58 | 35 | 4 | MOABC | mus03g |
TGGATAGAAGAACTACAACCTTCATGTCATACATTT | 0.87 | 0.63 | 36 | 4 | MOICA | mus03g |
ATGAAATTAAACCCAA | 0.74 | 0.59 | 16 | 4 | MOABC | yst01g |
TACCATAATTCATAGAT | 0.74 | 0.69 | 17 | 4 | MOICA | yst01g |
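Because the three objectives of this model (motif length, support and similarity) are all maximized, the table above can be read in terms of Pareto dominance. The sketch below treats complexity as a reported indicator rather than an optimized objective, which is an assumption based on the three objectives listed for the model. Under that assumption, the MOICA motif dominates the MOABC motif on all four datasets: its length is equal or greater, its support is equal, and its similarity is higher. The `Motif` type and the `dominates` and `compare_by_dataset` helpers are hypothetical names used for illustration only, not part of either algorithm's published code.

```python
from typing import NamedTuple


class Motif(NamedTuple):
    """Objective values of one predicted motif (all three objectives are maximized)."""
    method: str
    dataset: str
    length: int
    support: int
    similarity: float


def dominates(a: Motif, b: Motif) -> bool:
    """True if a is at least as good as b in every objective and strictly better in one."""
    fa = (a.length, a.support, a.similarity)
    fb = (b.length, b.support, b.similarity)
    return all(x >= y for x, y in zip(fa, fb)) and any(x > y for x, y in zip(fa, fb))


def compare_by_dataset(rows) -> dict:
    """Map each dataset to whether its MOICA motif dominates its MOABC motif."""
    by_dataset = {}  # dataset -> {method -> Motif}
    for row in rows:
        by_dataset.setdefault(row.dataset, {})[row.method] = row
    return {ds: dominates(pair["MOICA"], pair["MOABC"]) for ds, pair in by_dataset.items()}


# Objective values copied from the predicted-motif table (complexity omitted).
ROWS = [
    Motif("MOABC", "dm04g", 33, 3, 0.64), Motif("MOICA", "dm04g", 33, 3, 0.74),
    Motif("MOABC", "hm22m", 33, 4, 0.61), Motif("MOICA", "hm22m", 34, 4, 0.68),
    Motif("MOABC", "mus03g", 35, 4, 0.58), Motif("MOICA", "mus03g", 36, 4, 0.63),
    Motif("MOABC", "yst01g", 16, 4, 0.59), Motif("MOICA", "yst01g", 17, 4, 0.69),
]

if __name__ == "__main__":
    for dataset, verdict in compare_by_dataset(ROWS).items():
        print(dataset, "MOICA dominates MOABC:", verdict)
```

Dominance is a stricter test than comparing any single column: a motif that is longer but less similar would be incomparable rather than better.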