Why Curriculum Learning & Self-paced Learning Work in Big/Noisy Data: A Theoretical Perspective

  • Received: 01 May 2015; Revised: 01 August 2015; Published: 01 January 2016
  • Curriculum learning (CL) and self-paced learning (SPL) have recently attracted increasing attention owing to their many successful applications. While the rationale of this learning regime is currently motivated heuristically by human cognitive principles, there is still no sound theory explaining the intrinsic mechanism behind its effectiveness, especially in its successful applications to big/noisy data. To address this issue, this paper presents theoretical results that reveal the insights underlying this learning scheme. Specifically, we first formulate a new learning problem that aims to learn a proper classifier from samples generated from a training distribution deviating from the target distribution. We then show that the CL/SPL regime provides a feasible strategy for solving this problem. In particular, by first introducing high-confidence/easy samples into learning and gradually involving low-confidence/complex ones, the CL/SPL process latently minimizes an upper bound of the expected risk under the target distribution, using only data from the deviated training distribution. We further construct a new SPL algorithm based on random sampling, which complies better with our theory, and substantiate its effectiveness through experiments on synthetic and real data.

    Citation: Tieliang Gong, Qian Zhao, Deyu Meng, Zongben Xu. Why Curriculum Learning & Self-paced Learning Work in Big/Noisy Data: A Theoretical Perspective[J]. Big Data and Information Analytics, 2016, 1(1): 111-127. doi: 10.3934/bdia.2016.1.111
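    The abstract above describes the easy-to-hard mechanism only in words. For context, the classical SPL objective of Kumar, Packer and Koller (NIPS 2010), on which this line of work builds, is usually written as follows; this is a standard formulation from the literature, not an equation reproduced from this paper's body:

    \[
    \min_{\mathbf{w},\,\mathbf{v}\in[0,1]^{n}} \; \mathbb{E}(\mathbf{w},\mathbf{v};\lambda) \;=\; \sum_{i=1}^{n} v_{i}\, L\bigl(y_{i}, f(x_{i};\mathbf{w})\bigr) \;-\; \lambda \sum_{i=1}^{n} v_{i},
    \]

    where \(L\) is the training loss, \(v_{i}\) is the weight of sample \(i\), and the "age" parameter \(\lambda\) controls how large a loss still counts as easy. With \(\mathbf{w}\) fixed, the optimal hard weights are \(v_{i}^{*}=1\) if \(L(y_{i}, f(x_{i};\mathbf{w}))<\lambda\) and \(v_{i}^{*}=0\) otherwise; alternating this selection step with retraining on the selected subset, while gradually increasing \(\lambda\), realizes the easy-to-hard schedule the abstract describes.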
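    A minimal runnable sketch of that alternating scheme is given below. The callbacks `fit` and `per_sample_loss` and the schedule parameters `lam`/`mu` are assumptions supplied for illustration, not values or interfaces taken from the paper.

    ```python
    import numpy as np

    def spl_train(X, y, fit, per_sample_loss, lam=0.1, mu=1.3, rounds=10):
        """Alternating self-paced learning with hard sample weights.

        fit(X_sub, y_sub) -> trained model      (caller-supplied)
        per_sample_loss(model, X, y) -> ndarray (loss of every sample)
        lam (initial age) and mu (its growth factor) are hypothetical
        schedule parameters, not values from the paper.
        """
        model = fit(X, y)                      # warm start on all data
        for _ in range(rounds):
            losses = per_sample_loss(model, X, y)
            easy = losses < lam                # v_i = 1 exactly when L_i < lambda
            if easy.any():                     # note: subset must contain >1 class
                model = fit(X[easy], y[easy])  # retrain on the current easy subset
            lam *= mu                          # grow the age: admit harder samples
        return model

    # Example plumbing (assumes scikit-learn and integer labels 0..K-1):
    from sklearn.linear_model import LogisticRegression

    def fit(Xs, ys):
        return LogisticRegression(max_iter=1000).fit(Xs, ys)

    def per_sample_loss(model, Xs, ys):
        p = model.predict_proba(Xs)[np.arange(len(ys)), ys]
        return -np.log(np.clip(p, 1e-12, None))  # per-sample log loss
    ```

    With this plumbing, `spl_train(X, y, fit, per_sample_loss)` behaves like ordinary training whose effective training set grows outward from the easiest samples as the age parameter increases.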
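    The abstract also announces a new SPL algorithm based on random sampling but gives no algorithmic details here. Purely as an illustrative reading of the general idea, and explicitly not the authors' procedure, one can randomize the selection step so that low-loss samples are merely more likely, rather than certain, to enter the next training round; the exponential weighting below is a hypothetical stand-in for whatever rule the paper actually uses.

    ```python
    import numpy as np

    def sample_easy_subset(losses, lam, size, rng=None):
        """Illustrative only: draw a subset in which currently easy
        (low-loss) samples are more likely to be selected. The
        exp(-loss/lam) weighting is a hypothetical choice."""
        rng = rng or np.random.default_rng(0)
        p = np.exp(-np.asarray(losses) / lam)  # easier samples get more mass
        p = p / p.sum()
        n = len(p)
        return rng.choice(n, size=min(size, n), replace=False, p=p)
    ```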

  • © 2016 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
