A New Evaluation Index for Application of Machine Learning Algorithms to Determine Trust in Skewed Social Media Data
In this paper we study the accuracy paradox, which arises when machine learning algorithms are applied to infer trust in skewed social media data. Skewness is defined as the under-representation of one class relative to the other in a binary classification problem. We identify the accuracy paradox in various algorithms by introducing a new evaluation index called the predictive index. The dataset was drawn from Twitter, one of the most commonly used collaborative systems, which has experienced enormous growth in a short time. It has evolved from a microblogging service into a major news source, used by people as a platform to share and disseminate information about current events. However, not all information posted on Twitter is trustworthy or useful in describing an event: gossip, fake news and the like circulate alongside genuine news. The main aim of this paper is to tackle the accuracy paradox, a major problem in social media research, where the data we extracted was highly skewed. This high skewness gives a biased picture of the performance of our machine learning algorithms.
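The accuracy paradox described above can be illustrated with a minimal sketch. The labels below are synthetic (950 majority-class and 50 minority-class items, an assumed 95:5 split) and are not the paper's Twitter data. The abstract does not define the predictive index, so the sketch reports standard minority-class recall and balanced accuracy instead; all function names are hypothetical.

```python
# Illustrative only: synthetic labels, not the paper's dataset.
# Label 1 is the under-represented (minority) class, label 0 the majority.

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn), treating label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

def minority_recall(y_true, y_pred):
    """Fraction of minority-class items that were actually found."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on each class, so both classes count equally."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    recall_pos = tp / (tp + fn) if (tp + fn) else 0.0
    recall_neg = tn / (tn + fp) if (tn + fp) else 0.0
    return (recall_pos + recall_neg) / 2

# Assumed skew: 950 majority items, 50 minority items.
y_true = [0] * 950 + [1] * 50

# A trivial "classifier" that always predicts the majority class.
y_pred = [0] * len(y_true)

print("accuracy:         ", accuracy(y_true, y_pred))
print("minority recall:  ", minority_recall(y_true, y_pred))
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

The always-majority predictor scores high on plain accuracy while finding none of the minority class, which is the biased picture that skewed data produces.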
Cite this Article
Shifaa Basharat Fazili, Manzoor Ahmad. A New Evaluation Index for Application of Machine Learning Algorithms to Determine Trust in Skewed Social Media Data. Journal of Artificial Intelligence Research & Advances. 2018; 5(3): 49–57p.