Multiple-instance learning for text categorization based on semantic representation

  • Published: 01 January 2017
  • MSC: 97R40

  • Text categorization is a fundamental building block of other related research in NLP. Up to now, researchers have proposed many effective text categorization methods and achieved good performance. However, these methods are generally based on raw or low-level features, e.g., term frequency (tf) or tf-idf, while neglecting the semantic structure between words. Complex semantic information can influence the precision of text categorization. In this paper, we propose a new method to handle the semantic correlations between different words and text features, from both the representation and the learning scheme. We represent each document as multiple instances based on word2vec. Experiments validate the effectiveness of the proposed method compared with state-of-the-art text categorization methods.

    Citation: Jian-Bing Zhang, Yi-Xin Sun, De-Chuan Zhan. 2017: Multiple-instance learning for text categorization based on semantic representation, Big Data and Information Analytics, 2(1): 69-75. doi: 10.3934/bdia.2017009
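
    The abstract describes representing a document as a bag of word2vec instances and learning a categorizer from those bags. The snippet below is a minimal, hypothetical sketch of that idea, not the authors' implementation: it trains word2vec embeddings with gensim, turns each document into a bag of word vectors, and uses mean pooling plus a linear SVM as a simple stand-in for a true multiple-instance learning scheme such as mi-SVM. The toy corpus, labels, parameter values, and the pooling step are all illustrative assumptions.

        # Hypothetical sketch: document -> bag of word2vec instances -> bag-level classifier.
        # Assumes gensim >= 4.0 and scikit-learn; mean pooling stands in for a MIL learner.
        import numpy as np
        from gensim.models import Word2Vec
        from sklearn.svm import LinearSVC

        # Toy corpus: each document is a list of tokens, with a binary category label.
        docs = [
            ["stock", "market", "rises", "on", "earnings"],
            ["team", "wins", "championship", "game"],
            ["shares", "fall", "after", "profit", "warning"],
            ["player", "scores", "in", "final", "match"],
        ]
        labels = [0, 1, 0, 1]  # 0 = finance, 1 = sports (toy categories)

        # 1. Learn word embeddings (skip-gram word2vec) on the corpus.
        w2v = Word2Vec(sentences=docs, vector_size=50, window=3,
                       min_count=1, sg=1, epochs=50)

        # 2. Multi-instance representation: a document becomes a bag of word vectors.
        def doc_to_bag(tokens):
            return np.array([w2v.wv[t] for t in tokens if t in w2v.wv])

        bags = [doc_to_bag(d) for d in docs]

        # 3. Bag-level features via mean pooling (simple stand-in for a MIL scheme).
        X = np.vstack([bag.mean(axis=0) for bag in bags])

        # 4. Train and apply a linear classifier on the pooled bag representations.
        clf = LinearSVC().fit(X, labels)
        print(clf.predict(X))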

  • © 2017 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
