Rating news claims: Feature selection and evaluation

Izzat Alsmadi; Michael J. O'Brien

doi:10.3934/mbe.2020101

Mathematical Biosciences and Engineering

2020, Volume 17, Issue 3: 1922-1939. doi: 10.3934/mbe.2020101


Research article


Izzat Alsmadi ¹, Michael J. O'Brien ²

1. Department of Computing and Cyber Security, Texas A&M University–San Antonio, San Antonio, Texas 78224, USA
2. Office of the Provost, Texas A&M University–San Antonio, San Antonio, Texas 78224, USA

Received: 17 September 2019 Accepted: 16 December 2019 Published: 19 December 2019

Abstract

News claims that travel across the Internet and online social networks (OSNs) originate from different, sometimes unknown, sources, which raises questions about the credibility of those claims and the drivers behind them. Fact-checking websites such as Snopes, FactCheck, and Emergent use human evaluators to investigate and label news claims, but the process is labor- and time-intensive. Driven by the need to apply data analytics and algorithms to assessing the credibility of news claims, we focus on what can be generalized from human-labeled claims. We developed tools to extract claims from Snopes and Emergent and used public datasets collected by and published on those websites. Claims extracted from those datasets were labeled with different claim ratings. We focus on claims with definite ratings (false, mostly false, true, and mostly true), with the goal of identifying distinctive features that distinguish true from false claims. Ultimately, those features can be used to predict labels for future, unlabeled claims. We evaluate different feature-extraction methods, as well as different feature sets, by their ability to predict the correct claim label. Notably, we observed that OSN websites report higher rates of false claims than most other website categories. On fact-checking websites, the rate of reported false claims exceeds the rate of true claims in most categories. At the content-analysis level, false claims tend to carry a more negative sentiment, so sentiment can serve as a supporting feature for predicting claim classification.
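The labeling step described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' extraction tooling: each claim is assumed to be a record with hypothetical `rating` and `category` fields, the four definite ratings are collapsed to a binary label (false and mostly false as 0, true and mostly true as 1), and every other rating is discarded. The per-category false-claim rate it computes is the kind of statistic behind the comparison of OSN websites with other website categories.

```python
from collections import Counter

# Hypothetical mapping of the four "definite" ratings to a binary label.
# Any other rating (e.g. "mixture", "unproven") is dropped.
DEFINITE_RATINGS = {
    "false": 0,
    "mostly false": 0,
    "true": 1,
    "mostly true": 1,
}


def binarize(claims):
    """Keep only claims with a definite rating and attach a 0/1 label.

    Each claim is assumed to be a dict with 'rating' and 'category' keys.
    """
    labeled = []
    for claim in claims:
        rating = claim["rating"].strip().lower()
        if rating in DEFINITE_RATINGS:
            labeled.append({**claim, "label": DEFINITE_RATINGS[rating]})
    return labeled


def false_rate_by_category(labeled):
    """Return {category: fraction of that category's claims labeled false}."""
    totals = Counter()
    falses = Counter()
    for claim in labeled:
        totals[claim["category"]] += 1
        if claim["label"] == 0:
            falses[claim["category"]] += 1
    return {cat: falses[cat] / totals[cat] for cat in totals}


if __name__ == "__main__":
    # Toy records invented for illustration (not data from the paper).
    claims = [
        {"category": "social network", "rating": "False"},
        {"category": "social network", "rating": "Mostly False"},
        {"category": "social network", "rating": "True"},
        {"category": "news site", "rating": "Mostly True"},
        {"category": "news site", "rating": "False"},
        {"category": "news site", "rating": "Unproven"},
    ]
    print(false_rate_by_category(binarize(claims)))
```

Collapsing the "mostly" ratings into their parent class trades some nuance for a larger, balanced-looking binary dataset; keeping four classes is an equally valid alternative if enough labeled claims are available.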
Keywords: feature extraction, information credibility, online social networks, predictive models
Citation: Izzat Alsmadi, Michael J. O'Brien. Rating news claims: Feature selection and evaluation[J]. Mathematical Biosciences and Engineering, 2020, 17(3): 1922-1939. doi: 10.3934/mbe.2020101
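The abstract's observation that false claims skew toward negative sentiment suggests a simple one-feature baseline. The sketch below is illustrative only and is not the authors' model: it scores each claim with a tiny hypothetical word lexicon (`NEGATIVE`, `POSITIVE`), then fits a decision stump (a single threshold on that score) by maximizing training accuracy. A real pipeline would use an established sentiment tool and evaluate on held-out claims.

```python
import re

# Hypothetical toy lexicons for illustration only; a real system would use
# an established sentiment lexicon or tool.
NEGATIVE = {"fake", "hoax", "dead", "killed", "scam", "attack", "banned", "danger"}
POSITIVE = {"wins", "helps", "saves", "approved", "celebrates", "safe"}


def sentiment_score(text):
    """(positive - negative) word count divided by token count; 0.0 if no tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def fit_stump(scores, labels):
    """Choose threshold t maximizing training accuracy of the rule
    'predict true (1) if score > t, else false (0)'."""
    if not scores:
        raise ValueError("need at least one training example")
    values = sorted(set(scores))
    # A threshold below every score predicts "true" for all claims;
    # the largest score as threshold predicts "false" for all claims.
    candidates = [values[0] - 1.0] + values
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        preds = [1 if s > t else 0 for s in scores]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


def predict(text, threshold):
    """Label a claim true (1) or false (0) from its sentiment score."""
    return 1 if sentiment_score(text) > threshold else 0


if __name__ == "__main__":
    # Invented example claims and labels (not data from the paper).
    texts = [
        "Celebrity killed in hoax attack",
        "City council approved new park that helps families",
        "Vaccine is a scam and a danger",
        "Local team wins championship",
    ]
    labels = [0, 1, 0, 1]
    scores = [sentiment_score(t) for t in texts]
    threshold, accuracy = fit_stump(scores, labels)
    print(f"threshold={threshold:.3f} training accuracy={accuracy:.2f}")
```

Because a single threshold is easy to inspect, a stump like this works as a sanity check on the sentiment signal before it is combined with other features in a fuller classifier.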



© 2020 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)