A Survey on Different Clustering Algorithms with Their Major Features

Pratishtha Singh Baghel, Divakar Singh

Abstract


Data mining techniques make it possible to search large amounts of data for characteristic rules and patterns. Clustering is used to organize data for efficient retrieval. The aim is to create homogeneous subgroups of examples: individuals in the same subgroup are similar, and individuals in different subgroups are as different as possible. One of the problems in clustering is the identification of clusters in the given data. A popular clustering technique is K-means, an unsupervised learning algorithm that partitions the data into K clusters. In this method, the number of clusters is predefined, and the technique is highly dependent on the initial choice of elements that represent the clusters well. The number of clusters cannot be changed during execution of the algorithm, so an important consideration in K-means is how many clusters to choose: the chosen number may be too few or too many. This paper gives an overview of different clustering algorithms used on large data sets. It describes their general working behaviour, the methodologies they follow, and the parameters they use with large data sets.
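
The K-means behaviour summarized above can be illustrated with a minimal sketch. It is not taken from the paper: the function name `kmeans`, its parameters, and the toy data are assumptions made for illustration, and the initial representatives are simply drawn at random from the data. It shows the two points the abstract stresses: K is fixed before the run starts, and the result depends on the initial representatives.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Illustrative Lloyd-style K-means on a list of equal-length numeric tuples.

    k is fixed up front and never changes during the run. The initial
    centroids are k data points drawn at random (an assumption of this
    sketch), so different seeds may lead to different clusterings.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial cluster representatives
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance; ties go to the lower index).
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        # A centroid with no members is left where it is.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


# Toy data: two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(data, k=2)
```

Calling `kmeans(data, k=3)` on the same data would still return exactly three clusters, even though the data contain only two obvious groups, which is the "too few or too many" difficulty noted in the abstract.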

Keywords: Data-mining, association, clustering


