
Boosting Decision Tree Algorithm for Weather Prediction

Divya Chauhan, Jawahar Thakur

Abstract


Data mining is a computer-assisted process that digs through and analyzes enormous sets of data, extracts the meaningful information, and uses it to predict behaviors and future trends. It can answer questions that were traditionally very time-consuming to resolve, which makes it well suited to meteorological data, that is, to weather prediction. Weather prediction is an essential application of meteorology and has been one of the most scientifically and technologically challenging problems worldwide over the last century. Predicting the weather helps people prepare for the best and the worst of the climate. Many prediction tasks, such as thunderstorm prediction, rainfall prediction, wind-condition prediction, and cloudburst prediction, remain major challenges for atmospheric research. The data for this work is authentic and was collected from the India Meteorological Department for the city of Shimla (India), covering the period from January 2010 to December 2013. This paper compared three decision tree algorithms, C4.5, CART, and LMT, with and without boosting by the AdaBoost algorithm. AdaBoost was run with three different numbers of iterations. The accuracy of each decision tree algorithm improved with every increase in the number of iterations, and LMT showed the largest improvement among the decision tree algorithms after boosting. Therefore, boosted LMT is used to predict the weather of Shimla. The prediction accuracy of boosted LMT is nearly 100% when the data is taken yearly, and it is better than that of un-boosted LMT when the dataset is taken separately for each month.

Keywords: data mining, decision trees, boosting, prediction, C4.5, CART, LMT, AdaBoost
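
The boosting procedure named in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes one-level decision stumps as the weak learner instead of C4.5, CART, or LMT, and it uses a made-up one-dimensional toy dataset, not the India Meteorological Department data. All function names are hypothetical and chosen for illustration only. It shows how AdaBoost reweights the training examples after each iteration and combines the weak learners by a weighted vote.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                # The stump predicts `pol` when the feature is <= thr, else `-pol`.
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[j] <= thr else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best

def stump_predict(stump, row):
    _, j, thr, pol = stump
    return pol if row[j] <= thr else -pol

def adaboost(X, y, rounds):
    """Train AdaBoost for a fixed number of iterations; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)      # vote weight of this stump
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified examples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

def accuracy(ensemble, X, y):
    return sum(predict(ensemble, row) == yi for row, yi in zip(X, y)) / len(y)

# Toy data (assumption): one numeric feature, labels that no single stump separates.
X = [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

for rounds in (1, 2, 3):
    model = adaboost(X, y, rounds)
    print(rounds, "iteration(s): training accuracy =", accuracy(model, X, y))
```

On this toy set the training accuracy is 0.7 after one and two iterations and 1.0 after three. These figures describe the toy example only and say nothing about the accuracies reported in the paper.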





