Effective method for detecting error causes from incoherent biological ontologies

Yu Zhang; Haitao Wu; Jinfeng Gao; Yongtao Zhang; Ruxian Yao; Yuxiang Zhu

doi:10.3934/mbe.2022349

Mathematical Biosciences and Engineering

2022, Volume 19, Issue 7: 7388-7409. doi: 10.3934/mbe.2022349

Research article

1. College of Information Engineering, Huanghuai University, Zhumadian 463000, China
2. Henan Key Laboratory of Smart Lighting, Zhumadian 463000, China
3. Henan Joint International Research Laboratory of Behavior Optimization Control for Smart Robots, Zhumadian 463000, China
4. Department of Information and Electronic Engineering, Shangqiu Institute of Technology, Shangqiu 476000, China

Academic Editor: Xiangtao Li

Received: 05 December 2021 Revised: 02 April 2022 Accepted: 08 April 2022 Published: 19 May 2022

Computing the minimal axiom sets (MinAs) for an unsatisfiable class is an important task in incoherent ontology debugging. Debugging ontologies based on patterns (DOBP) is a pattern-based debugging method that uses a set of heuristic strategies based on four patterns. Each pattern is represented as a directed graph, and a depth-first search strategy is used to find the axiom paths relevant to the MinAs of the unsatisfiable class. However, DOBP is inefficient when debugging a large incoherent ontology with many unsatisfiable classes. To solve this problem, we first extract a module responsible for the erroneous classes and then compute the MinAs based on the extracted module. The basic idea of module extraction is that, rather than computing MinAs based on the original ontology $\mathcal{O}$, they are computed based on a module $\mathcal{M}$ extracted from $\mathcal{O}$. $\mathcal{M}$ provides a smaller search space than $\mathcal{O}$ because $\mathcal{M}$ is considerably smaller than $\mathcal{O}$. The experimental results on biological ontologies show that the module extracted using the Module-DOBP method is smaller than the original ontology. Lastly, our proposed approach, optimized with the module extraction algorithm, is more efficient than the DOBP method for both large-scale ontologies and numerous unsatisfiable classes.

Citation: Yu Zhang, Haitao Wu, Jinfeng Gao, Yongtao Zhang, Ruxian Yao, Yuxiang Zhu. Effective method for detecting error causes from incoherent biological ontologies[J]. Mathematical Biosciences and Engineering, 2022, 19(7): 7388-7409. doi: 10.3934/mbe.2022349

1. Introduction

Plant virus diseases have brought great losses to agriculture. RNA interference (RNAi) is attracting increasing attention as an important mechanism of plant resistance to viruses ^[1]. Three main types of key proteins are involved in RNAi: Dicer-like (DCL), RNA-dependent RNA polymerase (RDR) and Argonaute (AGO) ^[2,3,4]. The main process is as follows: (1) DCL cuts double-stranded RNA (dsRNA) into primary small interfering RNA (siRNA); (2) RDR reconstitutes siRNA into dsRNA, and the newly synthesized dsRNA is then cut into more secondary siRNAs; (3) AGO combines with siRNA to form the RNA-induced silencing complex (RISC) ^[5]. Through complementary base pairing, RISC targets and ultimately degrades viral or other RNA sequences. siRNAs, in the size range of 21–24 nucleotides, mediate RNAi and play the most important role in the whole process ^[6]. The main activity of siRNAs is the negative regulation of specific mRNAs or gene expression through target degradation, translational repression, or directing chromatin modification ^[7,8].

Phased small interfering RNAs (phasiRNAs) are plant secondary siRNAs that are typically produced when miRNAs target polyadenylated mRNAs ^[9]. A growing number of studies have shown that miRNA-initiated phasiRNAs play crucial roles in regulating plant growth and stress responses ^[10,11,12]. Substantial analyses of genome and small RNA (sRNA) sequences have improved the annotation of sRNAs, notably phasiRNAs and their targets ^[13], and relevant databases have therefore been established in succession. Recently, Liu et al. ^[14] established a database named TarDB that contains 62,888 cross-species conserved miRNA targets, 4,304 degradome/PARE-seq supported miRNA targets and 3,182 miRNA-triggered phasiRNA loci.

Given the importance of phasiRNAs in plant-pathogen interactions, we propose an efficient deep-learning-based predictor, named DIGITAL, for identifying miRNA-triggered phasiRNA loci. We collected experimentally verified miRNA-phasiRNA target duplexes from the TarDB database, and generated the negative dataset by randomly substituting a certain number of nucleotides in the positive samples. The key architecture of DIGITAL consists of a multi-scale residual network (multi-scale ResNet) and a bi-directional long short-term memory (bi-LSTM) network. When tested on two independent test sets of 21-nt and 24-nt phasiRNAs, DIGITAL reached accuracies of 98.45% and 94.02%, respectively, which indicates good robustness and generalization ability.

2. Materials and methods

2.1. Overall framework

Figure 1 illustrates the overall design of DIGITAL. The input layer transforms each nucleotide into a four-dimensional binary vector by one-hot encoding, so that A, C, G and T are represented as (1 0 0 0), (0 1 0 0), (0 0 1 0) and (0 0 0 1), respectively. To obtain feature vectors of the same dimension, shorter sequences are padded with zeros. A deep residual block formed by multi-scale CNN layers is then employed to extract locally relevant features from the input vectors, and the bi-directional long short-term memory (bi-LSTM) network is used to explore long-range global contextual information. Finally, the resulting latent information is integrated through a flatten layer, followed by a fully connected layer with softmax for label classification.
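The encoding step described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `one_hot_encode`, the `max_len` parameter and the choice to append the zero rows at the end of the sequence are assumptions made for the example.

```python
from typing import List

# Mapping taken from the text: A, C, G, T -> (1 0 0 0), (0 1 0 0), (0 0 1 0), (0 0 0 1).
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def one_hot_encode(seq: str, max_len: int) -> List[List[int]]:
    """Encode a nucleotide sequence as a max_len x 4 matrix of 0/1 values."""
    if len(seq) > max_len:
        raise ValueError("sequence is longer than max_len")
    # Characters outside A/C/G/T raise KeyError; how such cases were handled
    # in the original work is not stated.
    rows = [list(ONE_HOT[base]) for base in seq.upper()]
    # Assumption: padding rows are all-zero and appended after the sequence.
    rows.extend([0, 0, 0, 0] for _ in range(max_len - len(seq)))
    return rows


encoded = one_hot_encode("ACGT", max_len=6)
```

Here `encoded` has six rows: the four one-hot vectors for A, C, G and T followed by two all-zero padding rows.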

Figure 1. The overall framework of DIGITAL.

2.2. Data processing

We collected the siRNA sequence information from the TarDB database ^[14]. This database contains three categories of relatively high-confidence plant miRNA targets: (i) cross-species conserved miRNA targets; (ii) degradome/PARE (Parallel Analysis of RNA Ends) sequencing supported miRNA targets; (iii) miRNA-triggered phasiRNA loci. However, only the miRNA-triggered phasiRNAs were used to construct our prediction model, because they have been identified by well-documented criteria ^[15,16,17,18].

The TarDB platform deposits both 21-nt and 24-nt phasiRNAs from various plants. We obtained 6,389 miRNA-phasiRNA target duplexes in which the miRNA triggered 21-nt phasiRNAs, as well as 526 miRNA-phasiRNA target duplexes in which the miRNA triggered 24-nt phasiRNAs, from 43 plant species. After removing repeated miRNA-target pairs, 5,408 duplexes remained for miRNA-initiated 21-nt phasiRNAs, together with 443 duplexes for miRNA-initiated 24-nt phasiRNAs, as positive samples.

The approach to constructing the corresponding negative dataset is similar to the method proposed by Oubounyt et al. ^[19], based on the fact that positive and negative sets with less intersection are easier to distinguish ^[20]. In detail, each positive sequence is divided into multiple 1-bp-long fragments, and 60% of the fragments are selected and replaced randomly, with the remaining 40% conserved. In this approach, each negative sequence is generated from a positive sequence, and the two are equal in length. The number of negative samples generated by this process is therefore equal to the number of positive samples.
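A minimal sketch of this negative-sample construction is given below. It is not the authors' implementation: the function name `make_negative`, the rounding of the 60% count, and the rule that a replaced position always receives a different nucleotide are assumptions, since the text only says that the selected fragments are "replaced randomly".

```python
import random
from typing import Optional

ALPHABET = "ACGT"


def make_negative(positive: str, fraction: float = 0.6,
                  rng: Optional[random.Random] = None) -> str:
    """Derive one negative sequence of equal length from a positive sequence."""
    rng = rng or random.Random()
    # Number of 1-bp fragments (positions) to replace; rounding is an assumption.
    n_replace = round(fraction * len(positive))
    positions = rng.sample(range(len(positive)), n_replace)
    bases = list(positive)
    for i in positions:
        # Assumption: the replacement always differs from the original base.
        bases[i] = rng.choice([b for b in ALPHABET if b != bases[i]])
    return "".join(bases)


positive = "ACGT" * 5
negative = make_negative(positive, rng=random.Random(0))
```

With this rule, a 20-nt positive sequence yields a 20-nt negative sequence that differs at exactly 12 positions. If replacements were instead drawn from all four nucleotides, some selected positions would keep their original base and the effective difference would be smaller.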

In addition, the dataset of miRNA-initiated 21-nt phasiRNAs is further divided into three subsets: the training dataset (60% of the original dataset), the validation dataset (20%) and the independent test dataset (20%, denoted as test_21). The training set is used to train the classifier, the validation set is used to optimize hyper-parameters, and the independent test set is used to evaluate the performance of DIGITAL. The dataset of miRNA-initiated 24-nt phasiRNAs is used as a second independent test set, denoted as test_24. The statistics of each dataset are shown in Table 1.

Table 1. The statistics of datasets.

Dataset	Positive	Negative
Training	3244	3244
Validation	1082	1082
Test_21	1082	1082
Test_24	443	443
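The 60/20/20 partition can be sketched as below. The shuffling seed, the function name `split_60_20_20` and the rounding rule (floor of 60% for training, remainder halved) are assumptions; the text does not state how the split was drawn. With 5,408 items this rule reproduces the subset sizes listed in Table 1.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_60_20_20(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle the items and split them into training, validation and test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 60 // 100          # floor of 60%
    n_val = (len(shuffled) - n_train) // 2       # half of the remainder
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Indices stand in for the 5,408 positive 21-nt duplexes.
train, val, test = split_60_20_20(range(5408))
sizes = (len(train), len(val), len(test))
```

For 5,408 items, `sizes` is (3244, 1082, 1082).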

2.3. Establishment and curation of prediction model

The fundamental structures in DIGITAL are a multi-scale ResNet and a bi-LSTM, both of which have been used in previous studies ^[21,22,23]. Compared with a traditional CNN, the residual network improves the flow of information and avoids the vanishing-gradient and degradation problems caused by network depth. We therefore used a multi-scale ResNet with identity mapping. At the same time, in order to extract long-range global context information, we combined the multi-scale ResNet with the bi-LSTM. Details are as follows.

The multi-scale ResNet includes three channels of one-dimensional CNN with 64 convolution filters. The first channel contains one convolution layer with a kernel size of 1; the second channel employs two convolution layers, with kernel sizes of 1 and 3, respectively; the third channel uses three convolution layers, with kernel sizes of 1, 5 and 5, respectively. The bi-LSTM with a self-attention network consists of 121 hidden units and is followed by a fully-connected layer with 16 units. All layers in the model are trained simultaneously with the Adam optimizer and a batch size of 110, and the learning rate scheduler in Keras is employed to regulate the learning rate. Early stopping is applied based on validation loss. To provide insight into the training process of DIGITAL, the average validation loss and accuracy during training are shown in Supplementary Figure S1.
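To make the "multi-scale" aspect concrete, the short sketch below computes how many consecutive nucleotides each channel can see. It assumes stride-1, non-dilated convolutions, which the text does not state explicitly; the names `CHANNEL_KERNELS` and `receptive_field` are illustrative only.

```python
from typing import Dict, Sequence

# Kernel sizes of the three channels as described in the text.
CHANNEL_KERNELS: Dict[str, Sequence[int]] = {
    "channel_1": (1,),
    "channel_2": (1, 3),
    "channel_3": (1, 5, 5),
}


def receptive_field(kernel_sizes: Sequence[int]) -> int:
    """Receptive field of stacked 1-D convolutions (assumes stride 1, no dilation)."""
    return 1 + sum(k - 1 for k in kernel_sizes)


fields = {name: receptive_field(ks) for name, ks in CHANNEL_KERNELS.items()}
```

Under these assumptions the three channels cover 1, 3 and 9 positions, respectively, so the block combines single-nucleotide features with progressively wider local context.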

2.4. Performance evaluation

We evaluate DIGITAL using four commonly used metrics: sensitivity (Sn), specificity (Sp), accuracy (Acc), and Matthews correlation coefficient (MCC). The formulas are listed below:

$\left\{ \begin{array}{l} Sp = \frac{TN}{TN + FP} \\ Sn = \frac{TP}{TP + FN} \\ Acc = \frac{TP + TN}{TP + TN + FN + FP} \\ MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FN)(FP + TN)(TP + FP)(FN + TN)}} \end{array} \right. \qquad (1)$

where TP, TN, FP and FN represent the number of true positives, true negatives, false positives and false negatives, respectively. In addition, the area under the receiver operating characteristic curve (AUC) is also used to examine the performance of DIGITAL.
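The four metrics in Eq. (1) can be computed directly from the confusion-matrix counts. The sketch below is illustrative: the function name `classification_metrics` and the example counts are hypothetical and are not taken from the reported experiments.

```python
import math
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Return Sn, Sp, Acc and MCC as fractions, following Eq. (1).

    Assumes both classes are present, i.e. tp + fn > 0 and tn + fp > 0.
    """
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fn + fp)
    denom = math.sqrt((tp + fn) * (fp + tn) * (tp + fp) * (fn + tn))
    # Convention (an assumption): report MCC = 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
```

For these hypothetical counts, Sn is 0.90, Sp is 0.80, Acc is 0.85 and MCC is approximately 0.70.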

3. Results

In this study, we proposed a deep learning model, named DIGITAL, based on a multi-scale ResNet and a bi-LSTM to predict miRNA-triggered phasiRNA loci. During training, Bayesian optimization was used to search for the most appropriate hyper-parameters for identifying miRNA-triggered phasiRNA sites. DIGITAL reaches Acc values of 98.45% and 94.02% on the independent datasets test_21 and test_24, respectively. In addition, six traditional classification algorithms were constructed and compared with DIGITAL. In the independent tests, DIGITAL outperforms these six algorithms, which demonstrates the effectiveness of our model. The robustness and generalization ability of DIGITAL also suggest that it can be extended to recognizing miRNA targets in other species.

4. Discussion

4.1. Optimization and establishment of DIGITAL

Bayesian optimization is an effective global optimization algorithm that is widely used in prediction tasks in bioinformatics ^[24,25,26,27]. In this work, to further improve the performance of DIGITAL, we applied this method to optimize key hyper-parameters in the training process. As in previous work ^[28,29], the difference between the experimental value and the predicted value on the validation set is defined as the fitness function for hyper-parameter optimization during training. The number of units in the Bi-LSTM ^[30,31,32] and in the fully-connected layer, as well as the batch size, all vary in the range from 16 to 128. The results for each combination are listed in Supplementary Table S1, and the best results, with an Acc of 98.71%, MCC of 96.13%, and AUC of 99.78%, are achieved with the combination (121, 16, 110).
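The final selection step can be illustrated with a few rows copied from Supplementary Table S1. The sketch below only picks the best evaluated combination inside the stated search range; it does not implement Bayesian optimization itself (no surrogate model or acquisition function), and the `Trial` type and helper names are assumptions made for the example.

```python
from typing import List, NamedTuple


class Trial(NamedTuple):
    iteration: int
    target: float       # value of the "Target" column in Table S1
    bilstm_units: int
    dense_units: int
    batch_size: int


# A subset of rows copied from Supplementary Table S1 (not the full table).
TRIALS: List[Trial] = [
    Trial(4, 0.9852, 58, 125, 64),
    Trial(7, 0.9871, 121, 16, 110),
    Trial(9, 0.9866, 98, 42, 61),
    Trial(20, 0.9866, 112, 104, 95),
    Trial(29, 0.9783, 97, 80, 106),
]


def in_search_range(trial: Trial, low: int = 16, high: int = 128) -> bool:
    """Check that all three hyper-parameters lie in the range given in the text."""
    values = (trial.bilstm_units, trial.dense_units, trial.batch_size)
    return all(low <= v <= high for v in values)


best = max((t for t in TRIALS if in_search_range(t)), key=lambda t: t.target)
```

On these rows, `best` is iteration 7 with the combination (121, 16, 110), matching the configuration reported in the text.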

In addition, we also chose the parameters by empirical methods ^[33,34], where the number of units in the Bi-LSTM is set to 64, the number of units in the fully-connected layer is set to 32, and the batch size is set to 100. The prediction performance of this combination is shown in Figure 2, where DIGITAL denotes the model tuned by Bayesian optimization and DIGITAL_E denotes the model with empirically chosen parameters. As shown in Figure 2, the model based on Bayesian optimization achieved superior results on the validation dataset. Thus, the final model for phasiRNA identification uses 121 units in the Bi-LSTM, 16 units in the fully-connected layer, and a batch size of 110.

Figure 2. Results of empirical tuning and Bayesian optimization on the validation dataset.

4.2. Further evaluation of DIGITAL performance

In this section, the independent datasets test_21 and test_24 are used to further evaluate the robustness and generalization ability of DIGITAL. As shown in Table 2, DIGITAL obtains an Acc of 98.45%, Sn of 98.95%, Sp of 98.02% and MCC of 96.95% on the independent dataset test_21, and achieves an Acc of 94.02%, Sn of 95.04%, Sp of 93.00% and MCC of 88.05% on the independent dataset test_24. To display the prediction results more intuitively, we plot the ROC curves and calculate the AUC values, as shown in Figure 3. Our model achieves an AUC of 99.88% on test_21 and an AUC of 98.41% on test_24. The similar prediction performance suggests that DIGITAL has good robustness and generalization ability. At the same time, the gap between the two groups of results indicates that sequence length influences the prediction performance. As more data become available, it will be worthwhile to establish dedicated predictors for different sequence lengths.
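For readers unfamiliar with the AUC, the sketch below computes it from predicted scores using its pairwise interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical; this is not how the reported values were necessarily obtained.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In the toy example `auc` equals 0.75. The double loop is quadratic in the number of samples, which is acceptable for test sets of the size listed in Table 1; a rank-based formula would be preferable for much larger sets.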

Figure 3. The ROC curves of two independent datasets.

In addition, we implemented 5-fold and 10-fold cross-validation tests to further evaluate the generalization capability, and listed the average results in Supplementary Table S2. We observed that DIGITAL achieved average Acc values of 98.14% and 98.30% on 5-fold and 10-fold cross-validation, respectively. The k-fold (k = 5, 10) results are basically consistent with the results on the validation dataset.
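A generic k-fold partition can be sketched as follows. The shuffling seed and the function name `k_fold_indices` are assumptions, and the text does not say whether the folds were stratified by class; this sketch is unstratified.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training indices, held-out indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one sample.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        held_out = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, held_out
        start += size


folds = list(k_fold_indices(n_samples=23, k=5))
```

Each sample is held out exactly once, and the reported cross-validation figure is then the average of the metric over the k held-out folds.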

4.3. Comparison with other machine-learning models on two test datasets

In addition to the deep learning classifier, we applied six commonly used traditional machine learning methods to develop predictive models: support vector machine (SVM), Naive Bayes (NB), k-nearest neighbors (KNN), XGBoost, logistic regression (LR), and random forest (RF). For each classification algorithm, we performed parameter selection to achieve the best prediction results. The prediction performance before and after parameter selection on the validation dataset is shown in Supplementary Figure S2. Surprisingly, except for KNN, the models show no significant change before and after parameter selection. For this reason, we tested the six models with default parameters on our two independent datasets and compared them with DIGITAL. As shown in Table 2, DIGITAL shows better predictive performance than the other predictors in terms of MCC, Acc and Sn, but not Sp, on which random forest reaches the best performance. Specifically, the MCC of DIGITAL is 1 percentage point higher than that of SVM on the test_21 dataset, and 16.9 percentage points higher than that of the second-best method, XGBoost, on the test_24 dataset. The improved MCC suggests that the Sn and Sp of DIGITAL are balanced and relatively similar.

Table 2. The performance of DIGITAL and other six machine learning algorithms on two independent datasets.

Method	Dataset	Sn(%)	Sp(%)	Acc(%)	MCC	AUC
DIGITAL	test_21	98.95	98.02	98.45	0.969	0.999
DIGITAL	test_24	95.04	93.00	94.02	0.881	0.984
SVM	test_21	96.08	99.81	97.92	0.959	0.979
SVM	test_24	43.57	99.09	71.33	0.513	0.713
KNN	test_21	97.72	12.57	55.78	0.197	0.551
KNN	test_24	83.97	73.81	78.89	0.581	0.789
NB	test_21	88.89	99.81	94.27	0.891	0.944
NB	test_24	1.58	98.65	50.11	0.009	0.501
XGBoost	test_21	97.63	98.87	98.24	0.965	0.983
XGBoost	test_24	73.14	96.36	84.65	0.712	0.847
LR	test_21	95.26	94.28	94.78	0.896	0.948
LR	test_24	11.29	92.10	51.69	0.058	0.517
RF	test_21	95.81	100	97.87	0.958	0.979
RF	test_24	4.51	100	52.26	0.152	0.523

As shown in Table 2, for all seven classification algorithms, the prediction results on dataset test_24 are inferior to those on dataset test_21. This may be because these models were trained on miRNA-initiated 21-nt phasiRNAs. In the future, we will work to reduce the influence of sequence length on the model.

4.4. Model construction based on word2vec

In this section, we constructed a classification model based on the word2vec embedding method. We adopted 1-grams, a context window size of 4 and an embedding dimension of 4, because the dimension of the one-hot encoding is also 4. When training the embedding matrix, we used our training set as the corpus. The comparison of one-hot and word2vec is shown in Figure 4. It can be seen that the model based on one-hot encoding reached the best performance on the validation set for all five indicators, and gave relatively low Sp values and high values for the other four indicators on both the test_21 and test_24 datasets. We provide the code of both models at https://github.com/yuanyuanbu/DIGITAL.

Figure 4. The performance evaluation results of one-hot and word2vec models.

4.5. Ablation study

The hybrid network of DIGITAL is composed of two parts, the multi-scale ResNet and the bi-LSTM. To analyze the role of each part, we built two baseline models based on only the multi-scale ResNet and only the bi-LSTM, respectively. The prediction results for the five evaluation indicators are listed in Table 3. DIGITAL clearly outperformed the other two models on four indicators (Sn, Acc, MCC and AUC), with an improvement of at least 5 percentage points in Sn, whereas the two single-component models achieved higher Sp values (99.34% and 98.88%; Table 3). Why the integration of the multi-scale ResNet and the bi-LSTM improves Sn so markedly is worth studying in the future.

Table 3. The performance of ablation experiment.

Model	Sn(%)	Sp(%)	Acc(%)	MCC	AUC
DIGITAL	98.86	97.31	98.06	0.961	0.998
Only bi-LSTM	91.57	99.34	95.37	0.910	0.994
Only multi-scale ResNet	93.86	98.88	96.35	0.928	0.969

4.6. Visualization of learning effects in different stages

To display intuitively how the deep learning model separates the samples, we employed the popular visualization algorithm t-distributed stochastic neighbor embedding (t-SNE), which has been used in bioinformatics ^[35,36]. As illustrated in Figure 5A and 5B, the two classes of points are mixed together both under one-hot encoding and after the multi-scale ResNet. In contrast, most of the points of the two classes are separated after the bi-LSTM, although the boundary is not yet distinct (Figure 5C). After the last dense layer, the two classes of points are almost completely separated, and the boundary is clear. Taken together, these results suggest that the DIGITAL framework can effectively learn discriminative information from the one-hot encoding of the RNA sequences.

Figure 5. Visualization of training process projected in 2D space.

Acknowledgments

This work was supported by the Fundamental Research Funds for the Central Universities (3132022204).

Conflict of interest

The authors declare no competing interests.

Supplementary

Table S1. The details of Bayesian optimization.

Iter	Target	Bi-LSTM	Dense	Batch_size
1	0.9815	43	75	23
2	0.9815	28	105	61
3	0.9797	45	73	41
4	0.9852	58	125	64
5	0.9838	127	67	26
6	0.9797	74	128	60
7	0.9871	121	16	110
8	0.9834	113	73	123
9	0.9866	98	42	61
10	0.9838	97	79	106
11	0.9783	38	66	98
12	0.9810	30	107	116
13	0.9806	126	86	18
14	0.9834	42	19	81
15	0.9806	70	41	55
16	0.9834	99	111	35
17	0.9801	98	41	60
18	0.9838	106	99	89
19	0.9815	56	97	39
20	0.9866	112	104	95
21	0.9861	76	119	52
22	0.9847	81	89	112
23	0.9797	110	24	61
24	0.9820	44	89	106
25	0.9810	69	65	85
26	0.9857	82	19	109
27	0.9838	79	87	73
28	0.9834	61	43	38
29	0.9783	97	80	106
30	0.9857	98	21	59

Table S2. The performance of the 5-fold and 10-fold cross validation tests.

	Sn(%)	Sp(%)	Acc(%)	MCC	AUC
5-fold	98.44	97.83	98.14	0.963	0.997
10-fold	98.61	97.99	98.30	0.9661	0.9978

Figure S1. The loss and accuracy trend with different number of epochs on the DIGITAL.

Figure S2. Accuracy comparison of six machine learning methods before and after parameter selection on validation datasets.

References

[1]	F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. F. Patel-Schneider, The Description Logic Handbook: Theory, Implementation, and Applications, Cambridge University Press, 2003.
[2]	I. Horrocks, P. F. Patel-Schneider, F. van Harmelen, From SHIQ and RDF to OWL: The making of a web ontology language, J. Web Semantics, 1 (2003), 7–26. https://doi.org/10.1016/j.websem.2003.07.001
[3]	J. S. C. Lam, D. Sleeman, J. Z. Pan, W. Vasconcelos, A fine-grained approach to resolving unsatisfiable ontologies, J. Data Semantics X, 10 (2008), 62–95. https://doi.org/10.1007/978-3-540-77688-8_3
[4]	L. Qiu, Y. Liu, Y. Song, B. Zhang, A conflict diagnosis approach of changing sequences in gene ontology evolution, Int. J. Control Autom., 7 (2014), 269–284.
[5]	X. W. Zhao, X. T. Li, Z. Q. Ma, M. H. Yin, Identify DNA-binding proteins with optimal Chou's amino acid composition, Proteins Pept. Lett., 19 (2012), 398–405. https://doi.org/10.2174/092986612799789404
[6]	J. Zhang, Y. Zhang, Z. Ma, In silico prediction of human secretory proteins in plasma based on discrete firefly optimization and application to Cancer biomarkers identification, Front. Genet., 10 (2019), 542. https://doi.org/10.3389/fgene.2019.00542
[7]	J. Zhang, H. Chai, G. Yang, Z. Ma, Prediction of bioluminescent proteins by using sequence-derived features and lineage-specific scheme, BMC Bioinf., 18 (2017), 1–13. https://doi.org/10.1186/s12859-017-1709-6
[8]	S. Schlobach, Z. Huang, R. Cornet, F. Harmelen, Debugging incoherent terminologies, J. Autom. Reasoning, 39 (2007), 317–349. https://doi.org/10.1007/s10817-007-9076-z
[9]	Y. Zhang, D. Ouyang, Y. Ye, Glass-box debugging algorithm based on unsatisfiable dependent paths, IEEE Access, 5 (2017), 18725–18736. https://doi.org/10.1109/ACCESS.2017.2753381
[10]	F. Simančík, B. Motik, I. Horrocks, Consequence-based and fixed-parameter tractable reasoning in description logics, Artif. Intell., 209 (2014), 29–77. https://doi.org/10.1016/j.artint.2014.01.002
[11]	Z. Zhou, G. Qi, B. Suntisrivaraporn, A new method of finding all justifications in OWL 2 EL, in 2013 IEEE/WIC/ACM International Conferences on Web Intelligence, (2013), 213–220. https://doi.org/10.1109/WI-IAT.2013.31
[12]	X. Fu, G. Qi, Y. Zhang, Z. Zhou, Graph-based approaches to debugging and revision of terminologies in DL-Lite, Knowl. Based Syst., 100 (2016), 1–12. https://doi.org/10.1016/j.knosys.2016.01.039
[13]	B. C. Grau, I. Horrocks, Y. Kazakov, U. Sattler, A logical framework for modularity of ontologies, in Proceedings of the 20th International Joint Conference on Artificial Intelligence, (2007), 298–303.
[14]	B. Grau, I. Horrocks, Y. Kazakov, U. Sattler, Just the right amount: extracting modules from ontologies, in Proceedings of the 16th international conference on World Wide Web, (2007), 717–726. https://doi.org/10.1145/1242572.1242669
[15]	B. Cuenca Grau, C. Halaschek-Wiener, Y. Kazakov, History matters: incremental ontology reasoning using modules, in Proceedings of the 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, (2007), 183–196. https://doi.org/10.1007/978-3-540-76298-0_14
[16]	A. Kalyanpur, B. Parsia, M. Horridge, E. Sirin, Finding all justifications of OWL DL entailments, in Proceedings of 6th International Semantic Web Conference, ISWC 2007 and 2nd Asian Semantic Web Conference, (2007), 267–280. https://doi.org/10.1007/978-3-540-76298-0_20
[17]	M. Horridge, Justification Based Explanation in Ontologies, Ph.D thesis, University of Manchester in Manchester, 2011.
[18]	B. Suntisrivaraporn, Module Extraction and Incremental Classification: A pragmatic approach for $\mathcal{EL}^+$ ontologies, in Proceedings of the 5th European Semantic Web Conference, (2008), 230–244. https://doi.org/10.1007/978-3-540-68234-9_19
[19]	J. Du, G. Qi, Q. Ji, Goal-directed module extraction for explaining OWL DL entailments, in Proceedings of the 8th International Semantic Web Conference, (2009), 163–179. https://doi.org/10.1007/978-3-642-04930-9_11
[20]	J. Du, G. Qi, Decomposition-Based Optimization for Debugging of Inconsistent OWL DL Ontologies, in Proceedings of the 4th International Conference on the Knowledge Science, Engineering and Management, (2010), 88–100. https://doi.org/10.1007/978-3-642-15280-1_11
[21]	M. Gao, Y. Ye, D. Ouyang, B. Wang, Finding justifications by approximating core for large-scale ontologies, in Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI2019), (2019), 6432–6433.
[22]	Y. Zhang, R. Yao, D. Ouyang, J. Gao, F. Liu, Debugging incoherent ontology by extracting a clash module and identifying root unsatisfiable concepts, Knowl.-Based Syst., 223 (2021), 107043. https://doi.org/10.1016/j.knosys.2021.107043
[23]	Y. Zhang, D. Ouyang, Y. Ye, An optimization strategy for debugging incoherent terminologies in dynamic environments, IEEE Access, 5 (2017), 24284–24300. https://doi.org/10.1109/ACCESS.2017.2758521
[24]	Q. Ji, Z. Gao, Z. Huang, Study of ontology debugging approaches based on the criterion set BLUEI2CI, in Proceedings of the 6th Chinese Semantic Web Symposium and 1st Chinese Web Science Conference, (2013), 251–264. https://doi.org/10.1007/978-1-4614-6880-6_22
[25]	Q. Ji, Z. Gao, Z. Huang, M. Zhu, Measuring effectiveness of ontology debugging systems, Knowl.-Based Syst., 71 (2014), 169–186. https://doi.org/10.1016/j.knosys.2014.07.023
[26]	Y. Ye, X. Cui, D. Ouyang, Extracting a justification for OWL ontologies by critical axioms, Front. Comput. Sci., 14 (2020), 55–64. https://doi.org/10.1007/s11704-019-7267-5
[27]	J. Gao, D. Ouyang, Y. Ye, Exploring duality on ontology debugging, Appl. Intell., 50 (2020), 620–633. https://doi.org/10.1007/s10489-019-01528-y
[28]	J. Du, G. Qi, X. Fu, A practical fine-grained approach to resolving incoherent OWL 2 DL terminologies, in Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, (2014), 919–928. https://doi.org/10.1145/2661829.2662046
[29]	D. Fleischhacker, C. Meilicke, J. Völker, M. Niepert, Computing incoherence explanations for learned ontologies, in Proceedings of the 7th International Conference on the Web Reasoning and Rule Systems, (2013), 80–94. https://doi.org/10.1007/978-3-642-39666-3_7
[30]	G. Flouris, Z. Huang, J. Z. Pan, D. Plexousakis, H. Wache, Inconsistencies, negations and changes in ontologies, in Proceedings of the Twenty-First National Conference on Artificial Intelligence and the Eighteenth Innovative Applications of Artificial Intelligence Conference, (2006), 1295–1300.
[31]	E. Sirin, B. Parsia, B. C. Grau, A. Kalyanpur, Y. Katz, Pellet: a practical owl-dl reasoner, J. Web Semantics, 5 (2007), 51–53. https://doi.org/10.1016/j.websem.2007.03.004
[32]	R. Shearer, B. Motik, I. Horrocks, HermiT: a highly-efficient OWL reasoner, in Proceedings of the Fifth OWLED Workshop on OWL: Experiences and Directions, collocated with the 7th International Semantic Web Conference (ISWC-2008), (2008), 1–10.
[33]	D. Tsarkov, I. Horrocks, FaCT++ description logic reasoner: system description, in International joint conference on automated reasoning, (2006), 292–297. https://doi.org/10.1007/11814771_26
[34]	F. Baader, B. Suntisrivaraporn, Debugging SNOMED CT using axiom pinpointing in the description logic EL, in Proceedings of the 3rd International Conference on Knowledge Representation in Medicine, 410 (2008), 1–7.
[35]	B. C. Grau, I. Horrocks, Y. Kazakov, U. Sattler, Modular reuse of ontologies: theory and practice, J. Artif. Intell. Res., 31 (2008), 273–318. https://doi.org/10.1613/jair.2375
[36]	H. Wang, M. Horridge, A. Rector, N. Drummond, J. Seidenberg, Debugging OWL-DL ontologies: a heuristic approach, in Proceedings of the 4th International Semantic Web Conference, (2005), 745–757. https://doi.org/10.1007/11574620_53
[37]	Q. Ji, Z. Gao, Z. Huang, M. Zhu, An efficient approach to debugging ontologies based on patterns, in Proceedings of the Semantic Web-Joint International Semantic Technology Conference, (2011), 425–433. https://doi.org/10.1007/978-3-642-29923-0_33
[38]	Ó. Corcho, C. Roussey, L. M. Vilches-Blázquez, I. Perez, Pattern-based OWL ontology debugging guidelines, in Proceedings of the Workshop on Ontology Patterns (WOP 2009), (2009), 1–15.

© 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)