An Empirical Study on Evaluating Graph Based Clustering for HD Data Using Attribute Selection

V. Hemapriya, K.P.N.V. Satya Sree, K.V. Narasimha Reddy

Abstract


Attribute subset selection can be viewed as the process of identifying and removing irrelevant and redundant attributes (features): irrelevant attributes do not contribute to predictive accuracy, and redundant attributes provide information that is already present in other attributes. Attribute selection involves identifying a subset of the most useful attributes that produces results similar to those obtained with the full set of attributes. An attribute (feature) selection algorithm may be evaluated from two points of view: the time required to find the subset of attributes, and the quality of that subset. Based on these criteria, a graph-based clustering algorithm for attribute selection, GRACE, is proposed. The algorithm works in two steps. In the first step, attributes are divided into clusters using graph-theoretic clustering methods. In the second step, the most representative attributes, those strongly related to the object classes, are selected from each cluster to form a subset of attributes. Attributes in different clusters are relatively independent. To ensure the efficiency of this algorithm, the authors implemented the minimum spanning tree clustering method.
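
The two steps described above can be illustrated with a minimal sketch. This is not the authors' GRACE implementation: the abstract does not name the correlation measure or the rule for splitting the tree, so the sketch assumes absolute Pearson correlation as the relatedness measure and a hypothetical `cut` threshold for removing long minimum-spanning-tree edges. All function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def mst_edges(n, weight):
    """Prim's algorithm on the complete graph over n nodes; returns (u, v, w) tree edges."""
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree.
        u, v = min(
            ((a, b) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda e: weight(e[0], e[1]),
        )
        edges.append((u, v, weight(u, v)))
        in_tree.add(v)
    return edges


def select_attributes(columns, target, cut=0.5):
    """Cluster attributes via an MST, then keep one attribute per cluster.

    columns: list of attribute columns; target: class labels as numbers.
    cut: hypothetical distance threshold above which an MST edge is removed.
    """
    n = len(columns)

    # Assumed distance: strongly correlated attributes are close together.
    def dist(i, j):
        return 1.0 - abs(pearson(columns[i], columns[j]))

    # Step 1: build the MST and drop long edges; the remaining
    # connected components are the attribute clusters.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for u, v, w in mst_edges(n, dist):
        if w <= cut:
            parent[find(u)] = find(v)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)

    # Step 2: from each cluster keep the attribute most related to the class.
    relevance = [abs(pearson(col, target)) for col in columns]
    return sorted(max(members, key=lambda i: relevance[i]) for members in clusters.values())


# Toy data: attributes 0 and 1 are near-duplicates, as are attributes 2 and 3.
columns = [
    [1, 2, 3, 4, 5, 6],
    [2, 4, 6, 8, 10, 13],
    [5, 1, 4, 2, 6, 3],
    [10, 2, 8, 4, 12, 7],
]
target = [0, 0, 0, 1, 1, 1]
selected = select_attributes(columns, target)
print(selected)  # [0, 3]
```

On this toy data the tree splits into two clusters of near-duplicate attributes, and one attribute survives from each. A different relatedness measure or splitting rule would change which edges are cut and which attributes are kept.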

Keywords: Graph-based clustering, filter method, attribute subset selection

