Fake News Detection Using N-Gram Analysis and Machine Learning Algorithms

Asha John, Meenakowshalya A

Abstract


Fake news is untrue information presented as news. It spreads more easily than real news on social networking sites. Fake news detection is an important emerging research area that is gaining popularity, and its main challenge is the limited availability of resources (datasets). This work detects fake news using n-gram analysis and machine learning algorithms. It evaluates and compares two feature extraction techniques, Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF), and six machine learning algorithms: Support Vector Machines (SVM), Linear Support Vector Machines (LSVM), K-Nearest Neighbor (KNN), Stochastic Gradient Descent (SGD), Decision Trees (DT), and Logistic Regression (LR). The best performance, an accuracy of 93.5%, is obtained with TF-IDF as the feature extraction technique and Linear SVM and SGD as the machine learning algorithms. Performance tuning is then applied to SGD to increase its accuracy. Of Grid Search CV and Random Search CV, the latter gives the better accuracy, 94.2%.
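As a rough illustration of the two feature extraction techniques compared above, the following minimal sketch computes word n-gram TF and TF-IDF weights in plain Python. It is not the authors' implementation: the helper names (`word_ngrams`, `tf`, `tf_idf`), the whitespace tokenizer, the unsmoothed `idf = log(N / df)` formula, and the three-sentence corpus are all assumptions made for illustration. Libraries such as scikit-learn use a smoothed IDF variant, so their weights will differ.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of text as space-joined strings."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenizer
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf(doc_ngrams):
    """Relative term frequency: each n-gram count divided by the total count."""
    counts = Counter(doc_ngrams)
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()} if total else {}


def tf_idf(corpus, n):
    """Per-document TF-IDF weights, using the unsmoothed idf = log(N / df)."""
    docs = [word_ngrams(text, n) for text in corpus]
    # Document frequency: number of documents containing each n-gram.
    df = Counter(gram for doc in docs for gram in set(doc))
    num_docs = len(docs)
    return [
        {gram: weight * math.log(num_docs / df[gram]) for gram, weight in tf(doc).items()}
        for doc in docs
    ]


# Hypothetical toy corpus, for illustration only.
corpus = [
    "shocking claim spreads online",
    "officials confirm the claim",
    "shocking claim goes viral online",
]
unigram_weights = tf_idf(corpus, 1)
bigram_weights = tf_idf(corpus, 2)
```

In this toy corpus the unigram "claim" occurs in every document, so its IDF is log(3/3) = 0 and TF-IDF discounts it entirely, whereas plain TF would still give it weight. The resulting weight dictionaries would then be turned into feature vectors for a classifier such as Linear SVM or SGD.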


Keywords


News detection, feature extraction, N-gram analysis, SVM, LSVM, KNN, DT, SGD, LR, GSCV, RSCV
