Research article

A method based on multi-standard active learning to recognize entities in electronic medical record

  • Received: 29 September 2020 Accepted: 21 December 2020 Published: 05 January 2021
  • Deep neural networks (DNNs) have achieved good results in Named Entity Recognition (NER), but most DNN methods depend on large amounts of annotated data. Electronic Medical Records (EMRs) are text data from a specialized professional domain, and annotating them requires experts with strong medical knowledge as well as considerable labeling time. To address the specialized domain, large data volume, and annotation difficulty of EMRs, we propose a new method based on multi-standard active learning to recognize entities in EMRs. Our approach uses three criteria, namely the amount of labeled data, the cost of sentence annotation, and the balance of data sampling, to determine the choice of active learning strategy. We also propose an uncertainty calculation and a sentence annotation cost measure that are better suited to neural network NER models, and we use incremental training to speed up the iterative training in the active learning process. Finally, NER experiments on breast clinical EMRs show that the method achieves the same NER accuracy, given samples of the same quality. Compared with the traditional supervised learning method of randomly selecting data for labeling, the proposed method reduces the amount of data that needs to be labeled by 66.67%. In addition, an improved TF-IDF method based on Word2Vec is proposed to vectorize the text while taking word frequency into account.
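The abstract does not give the authors' uncertainty formula, so the following is only a sketch of one widely used criterion of this kind, not the method proposed here: Maximum Normalized Log-Probability (MNLP) from Shen et al. [5], which ranks unlabeled sentences by the length-normalized log-probability of the model's best labeling and sends the least confident ones to annotators. It assumes a tagger that outputs independent per-token softmax probabilities (no CRF layer); the function names `mnlp_scores` and `select_for_annotation` and the toy probability pool are hypothetical.

```python
import numpy as np


def mnlp_scores(token_probs):
    """Maximum Normalized Log-Probability (MNLP) for each sentence.

    token_probs: list of arrays of shape (n_tokens, n_tags) holding per-token
    softmax probabilities. Assumes tags are predicted independently per token
    (no CRF layer) and that every sentence has at least one token.
    Lower scores mean the model is less confident in its best labeling.
    """
    scores = []
    for probs in token_probs:
        # With independent per-token predictions, the most likely tag sequence
        # is the per-token argmax, so its log-probability is the sum of the
        # per-token maximum log-probabilities.
        best_logp = np.log(probs.max(axis=1)).sum()
        # Normalize by length so long sentences, whose summed log-probabilities
        # are naturally more negative, are not favored automatically.
        scores.append(best_logp / probs.shape[0])
    return np.array(scores)


def select_for_annotation(token_probs, k):
    """Return indices of the k least-confident sentences (lowest MNLP)."""
    return np.argsort(mnlp_scores(token_probs))[:k].tolist()


# Hypothetical unlabeled pool: per-token probabilities over two tags.
pool = [
    np.array([[0.9, 0.1], [0.8, 0.2]]),                 # fairly confident
    np.array([[0.55, 0.45], [0.5, 0.5], [0.6, 0.4]]),   # uncertain
    np.array([[0.99, 0.01]]),                           # very confident
]
print(select_for_annotation(pool, k=1))  # [1]
```

The length normalization matters for annotation cost: an unnormalized sum of log-probabilities is more negative for longer sentences, so it would keep selecting the longest (and most expensive to annotate) sentences. With a CRF output layer, the best-path probability would instead come from Viterbi decoding.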

    Citation: Qiao Pan, Chen Huang, Dehua Chen. A method based on multi-standard active learning to recognize entities in electronic medical record[J]. Mathematical Biosciences and Engineering, 2021, 18(2): 1000-1021. doi: 10.3934/mbe.2021054



    Gender hormones regulate the structure and function of many tissues and organ systems [1],[2]. Sexual dimorphism is defined as “the differences in appearance between males and females of the same species, such as in colour, shape, size, and structure, that are caused by the inheritance of one or the other sexual pattern in the genetic material” [3]. Some studies have reported that gender hormones affect renal morphology and physiology, and that gender differences exist in the prevalence and prognosis of renal diseases. However, results across studies are inconsistent, and data on this issue in humans are limited [1],[2],[4],[5].

    It has been emphasized that women have a slower rate of decline in renal function than men. This difference may be due to gender differences in kidney size and weight and in biological, metabolic, and hemodynamic processes [1],[4]. In a study of 13,925 Chinese adults, Xu et al. [6] reported that, relative to the healthy group, estimated glomerular filtration rate declined faster in men than in women in both the at-risk group and the chronic kidney disease (CKD) group. Fanelli et al. [7] investigated gender differences in the progression of experimental CKD induced by chronic nitric oxide inhibition in rats and found that female rats developed less severe CKD than males. According to Fanelli et al. [7], “female renoprotection could be promoted by both the estrogen anti-inflammatory activity and/or by the lack of testosterone, related to renin-angiotensin-aldosterone system hyperactivation and fibrogenesis” [p. 1]. Other studies have reported that CKD was slightly more common among women than men [8],[9]. In a prospective, community-based cohort study of 5488 participants from the Netherlands, Halbesma et al. [10] investigated gender differences in the predictors of renal function decline. They found that systolic blood pressure and plasma glucose level were negatively associated with renal function decline in both genders. Interestingly, this follow-up study demonstrated that waist circumference was positively associated with renal function in men only [10]. In another community-based cohort study of 1876 Japanese adults, a higher body mass index was found to be an independent risk factor for the development of CKD in women only [11]. On the other hand, compared with men, women less frequently initiate hemodialysis with an arteriovenous fistula and have a greater risk of arteriovenous fistula failure [8]. Carrero et al. [5] also reported that women are less likely than men to receive kidney transplants. Further research is therefore needed to better understand the effect of gender on kidney function and health outcomes.



    [1] C. Zeng, G. Hui, Construction of electronic medical record system for standardized diagnosis and treatment of breast cancer, J. Chin. Med. Dev., 29 (2014), 46–48.
    [2] Q. M. Ling, Research on the advantages and development of electronic medical record in medical record management, Electron. J. Gen. Stomatol., 7 (2020), 26–31.
    [3] L. Liu, D. B. Wang, Summary of research on named entity recognition, J. Chin. Soc. Sci. Tech. Inf., 3 (2018), 329–340.
    [4] Y. Chen, T. A. Lasko, Q. Mei, J. C. Denny, H. Xu, A study of active learning methods for named entity recognition in clinical text, J. Biomed. Inf., 58 (2015), 11–18. doi: 10.1016/j.jbi.2015.09.010
    [5] Y. Shen, H. Yun, Z. C. Lipton, Y. Kronrod, A. Anandkumar, Deep active learning for named entity recognition, preprint, arXiv: 1707.05928.
    [6] W. W. Ning, L. Yang, G. M. Zu, L. X. Yan, Research progress of active learning algorithm based on sampling strategy, J. Comput. Res. Dev., 49 (2012), 1162–1173.
    [7] W. R. Qi, L. X. Li, H. Y. Li, B. He, G. Yi, Research on active learning method for named entity recognition of Chinese electronic medical record, Chin. Digital Med., 12 (2017), 51–53.
    [8] M. Li, M. Scaiano, K. El Emam, B. A. Malin, Efficient active learning for electronic medical record de-identification, AMIA Summits Transl. Sci. Proc., 2019 (2019), 462–471.
    [9] M. Kholghi, L. Sitbon, G. Zuccon, A. Nguyen, External knowledge and query strategies in active learning: a study in clinical information extraction, in Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, (2015), 143–152.
    [10] J. Zhu, E. H. Hovy, Active learning for word sense disambiguation with methods for addressing the class imbalance problem, in Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), (2007), 783–790.
    [11] M. Bloodgood, K. Vijay-Shanker, Taking into account the differences between actively and passively acquired data: The case of active learning with support vector machines for imbalanced datasets, preprint, arXiv: 1409.4835.
    [12] K. Tomanek, U. Hahn, Reducing class imbalance during active learning for named entity annotation, in Proceedings of the fifth international conference on Knowledge capture, (2009), 105–112.
    [13] S. Ertekin, J. Huang, C. L. Giles, Active learning for class imbalance problem, in Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, (2007), 823–824.
    [14] S. Ertekin, J. Huang, L. Bottou, C. L. Giles, Learning on the border: Active learning in imbalanced data classification, in Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, (2007), 127–136.
    [15] H. Guang, Z. C. Xia, H. X. Lei, A new SVM active learning algorithm and its application in obstacle detection, J. Comput. Res. Dev., 46 (2009), 1934–1941.
    [16] B. C. Mei, Classification of weighted support vector machines based on active learning, Comput. Eng. Des., 30 (2009), 966–970.
    [17] Y. F. Liang, Research on Active Learning Algorithm Based on Expert Committee, Master thesis, Ocean University of China, 2010.
    [18] L. Feng, Research and Application of Active Semi-supervised K-means Clustering Algorithm, Master thesis, Hebei University of Geosciences, 2018.
    [19] X. Li, Y. Guo, Adaptive active learning for image classification, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013 (2013), 859–866.
    [20] M. Kholghi, L. D. Vine, L. Sitbon, G. Zuccon, A. Nguyen, Clinical information extraction using small data: an active learning approach based on sequence representations and word embeddings, J. Assoc. Inf. Sci. Technol., 68 (2017), 2543–2556. doi: 10.1002/asi.23936
    [21] D. Angluin, Queries and concept learning, Mach. Learn., 2 (1988), 319–342.
    [22] R. Grishman, B. Sundheim, Message understanding conference-6: A brief history, in COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics, 1996.
    [23] C. Friedman, P. O. Alderson, J. H. Austin, S. B. Johnson, A general natural-language text processor for clinical radiology, J. Am. Med. Inf. Assoc., 1 (1994), 161–174. doi: 10.1136/jamia.1994.95236146
    [24] W. S. Li, Research on Chinese Electronic Medical Record of Named Entity Recognition Based on Improved Deep Belief Network, Master thesis, Beijing University of Chemical Technology, 2018.
    [25] G. K. Savova, J. J. Masanz, P. V. Ogren, J. P. Zheng, S. W. Sohn, C. G. Chute, Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications, J. Am. Med. Inf. Assoc., 17 (2010), 507–513. doi: 10.1136/jamia.2009.001560
    [26] S. T. Wu, H. F. Liu, D. C. Li, C. Tao, M. A. Musen, N. H. Shah, Unified Medical Language System term occurrences in clinical notes: a large-scale corpus analysis, J. Am. Med. Inf. Assoc., 19 (2012), 149–156. doi: 10.1136/amiajnl-2012-000844
    [27] E. F. Tjong Kim Sang, F. De Meulder, Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition, preprint, arXiv: cs/0306050.
    [28] Y. Li, S. L. Gorman, N. Elhadad, Section classification in clinical notes using supervised hidden markov model, in Proceedings of the 1st ACM International Health Informatics Symposium, (2010), 744–750.
    [29] P. Y. Wang, D. H. Gi, Disease name extraction based on multi-label CRF, Appl. Res. Comput., 1 (2017), 118–122.
    [30] F. Ye, Y. Y. Chen, G. G. Zhou, H. M. Li, Y. Li, Intelligent recognition of named entities in electronic medical records, Chin. J. Biomed. Eng., 2 (2011), 98–104.
    [31] J. Liang, X. M. Xian, X. J. He, M. F. Xu, S. Dai, J. Y. Xin, A novel approach towards medical entity recognition in Chinese clinical text, J. Healthcare Eng., (2017), 1–16.
    [32] G. Luo, X. Huang, C. Y. Lin, Z. Nie, Joint entity recognition and disambiguation, in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, (2015), 879–888.
    [33] A. Passos, V. Kumar, A. McCallum, Lexicon infused phrase embeddings for named entity resolution, preprint, arXiv: 1404.5367.
    [34] Y. J. Zhang, Z. T. Xu, X. Y. Xue, A maximum entropy Chinese named entity recognition model of integrating multiple features, J. Comput. Res. Dev., 6 (2008), 1004–1010.
    [35] A. McCallum, W. Li, Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons, Comput. Sci. Dep. Fac. Publ. Ser. 11, (2003), 188–191.
    [36] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa, Natural language processing (almost) from scratch, J. Mach. Learn. Res., 12 (2011), 2493–2537.
    [37] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, C. Dyer, Neural architectures for named entity recognition, preprint, arXiv: 1603.01360.
    [38] A. N. Jagannatha, H. Yu, Structured prediction models for RNN based sequence labeling in clinical text, in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, (2016), 856–865.
    [39] J. Zhu, H. Wang, B. K. Tsou, M. Ma, Active learning with sampling by uncertainty and density for data annotations, IEEE Trans. Audio, Speech, Lang. Process., 18 (2010), 1323–1331.
    [40] X. Yan, Research on Image Annotation Method Based on Active Learning, Master thesis, Liaoning University of Technology, 2014.
    [41] L. Jin, Y. F. Cao, C. X. Su, J. Y. Ren, Multi-class image classification based on HS sample selection and BvSB feedback, J. Guizhou Norm. Univ. (Nat. Sci.), (2014), 56–61.
    [42] Q. H. Zhao, Two active learning methods, Master thesis, Hebei University, 2010.
    [43] S. Ertekin, J. Huang, C. L. Giles, Active learning for class imbalance problem, in Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, (2007), 823–824.
    [44] H. S. Seung, M. Opper, H. Sompolinsky, Query by Committee, in Proceedings of the fifth annual workshop on Computational learning theory, (1992), 287–294.
    [45] D. D. Lewis, J. Catlett, Heterogeneous uncertainty sampling for supervised learning, in Machine learning proceedings, Morgan Kaufmann, (1994), 148–156.
    [46] T. Scheffer, C. Decomain, S. Wrobel, Active hidden markov models for information extraction, in International Symposium on Intelligent Data Analysis, Springer, Berlin, Heidelberg, (2001), 309–318.
    [47] S. Tong, D. Koller, Support vector machine active learning with applications to text classification, J. Mach. Learn. Res., 2 (2002), 45–66.
    [48] A. Kapoor, E. Horvitz, S. Basu, Selective supervision: guiding supervised learning with decision-theoretic active learning, in IJCAI, 7 (2007), 877–882.
    [49] S. Arora, E. Nyberg, C. P. Rose, Estimating annotation cost for active learning in a multi-annotator environment, in Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing, (2009), 18–26.
    [50] J. Carroll, R. Haertel, P. McClanahan, E. K. Ringger, K. Seppi, Assessing the costs of sampling methods in active learning for annotation, Fac. Publ., (2008), 185.
    [51] M. Kholghi, L. Sitbon, G. Zuccon, A. Nguyen, Active learning: a step towards automating medical concept extraction, J. Am. Med. Inf. Assoc., 23 (2016), 289–296. doi: 10.1093/jamia/ocv069
    [52] R. Q. Wang, X. L. Li, Y. L. Huang, B. He, Y. Guan, Research on active learning method of Chinese electronic medical record named entity recognition, China Digital Med., 12 (2017), 51–53.
    [53] J. Z. Cheng, W. Qiang, A. Franklin, T. Cohen, H. Xu, Cost-sensitive active learning for phenotyping of electronic health records, AMIA Summits Transl. Sci. Proc., 2019 (2019), 829–838.
    [54] Ö. Uzuner, B. R. South, S. Shen, S. L. Duvall, 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text, J. Am. Med. Inf. Assoc., 18 (2011), 552–556. doi: 10.1136/amiajnl-2011-000203
    [55] S. Pradhan, N. Elhadad, B. South, D. Martinez, L. Christensen, A. Vogel, Task 1: ShARe/CLEF ehealth evaluation lab 2013, in CLEF (Working Notes), 2013.
    [56] G. S. Wang, X. J. Huang, Text classification model of convolutional neural network based on Word2vec and improved TF-IDF, J. Chin. Mini-Micro Comput. Syst., 40 (2019), 210–216.
    [57] M. Kholghi, L. Sitbon, G. Zuccon, A. Nguyen, Active learning reduces annotation time for clinical concept extraction, Int. J. Med. Inform., 106 (2017), 25–31. doi: 10.1016/j.ijmedinf.2017.08.001
    [58] T. H. Nguyen, A. Sil, G. Dinu, R. Florian, Toward mention detection robustness with recurrent neural networks, preprint, arXiv: 1602.07749.
    [59] Z. H. Huang, W. Xu, K. Yu, Bidirectional LSTM-CRF models for sequence tagging, preprint, arXiv: 1508.01991.
    [60] Z. L. Yang, R. Salakhutdinov, W. Cohen, Multi-task cross-lingual sequence tagging from scratch, preprint, arXiv: 1603.06270.
  • © 2021 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
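Corresponding author: Chen Bin, bchen63@163.com
  • 1. School of Materials Science and Engineering, Shenyang University of Chemical Technology, Shenyang 110142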
