Implementation of Clustering-Based Feature Subset Selection Algorithm-Fast

  • A. Bemberkar Pankaj, Department of Computer Engineering, Savitribai Phule Pune University, Imperial College of Engineering and Research, India.
  • R. Wagh Vinod, Department of Computer Engineering, Savitribai Phule Pune University, Imperial College of Engineering and Research, India.
  • Naradhania Mahendra, Department of Computer Engineering, Savitribai Phule Pune University, Imperial College of Engineering and Research, India.
  • R. Nikade Sonam, Department of Computer Engineering, Savitribai Phule Pune University, Imperial College of Engineering and Research, India.
Keywords: Clustering, filter method, subset selection, graph-based clustering, MST construction

Abstract

The FAST subset selection algorithm extracts the subset of most useful features from the original feature set. Its efficiency depends on the time required to find that subset. The FAST algorithm works in two steps. In the first step, the features are divided into clusters using a graph-theoretic method. In the second step, the most representative feature, the one most strongly related to the target classes, is selected from each cluster. The feature selection algorithm is evaluated from two points of view: efficiency, which is the time required to find the subset of features, and effectiveness, which concerns the quality of the selected subset. When the FAST algorithm is applied to microarray data, high-dimensional images, or text data, it not only produces the required feature subset but also improves classification performance. Feature selection means identifying the most relevant and useful data in a database.
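
The two-step pipeline described above can be sketched in code. The sketch below is a minimal, illustrative Python version, assuming integer-coded (discretized) features, symmetric uncertainty (SU) as the correlation measure, and a spanning tree over feature-feature SU that is split into clusters by dropping edges weaker than either endpoint's correlation with the class. The function names (entropy, symmetric_uncertainty, fast_select) and the exact tree and pruning conventions are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from itertools import combinations
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree


def entropy(x):
    """Shannon entropy (bits) of a discrete-valued 1-D array."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0.0:
        return 0.0
    _, counts = np.unique(np.column_stack([x, y]), axis=0, return_counts=True)
    p = counts / counts.sum()
    h_joint = float(-np.sum(p * np.log2(p)))
    mutual_info = hx + hy - h_joint
    return 2.0 * mutual_info / (hx + hy)


def fast_select(X, y, threshold=0.0):
    """Illustrative FAST-style selection: cluster features via a spanning
    tree over feature-feature SU, then keep one representative per cluster.
    X is (n_samples, n_features) and y is (n_samples,), both integer-coded."""
    n_features = X.shape[1]

    # Relevance of every feature to the target class.
    su_class = np.array([symmetric_uncertainty(X[:, i], y) for i in range(n_features)])
    relevant = np.where(su_class > threshold)[0]
    k = len(relevant)
    if k == 0:
        return []

    # Step 1a: complete graph over relevant features, edge weight derived from
    # SU(Fi, Fj). Using 2 - SU keeps weights strictly positive, so the minimum
    # spanning tree below is a maximum spanning tree of SU (one plausible
    # reading of the graph-theoretic clustering step).
    weights = np.zeros((k, k))
    for a, b in combinations(range(k), 2):
        su = symmetric_uncertainty(X[:, relevant[a]], X[:, relevant[b]])
        weights[a, b] = weights[b, a] = 2.0 - su
    tree = minimum_spanning_tree(csr_matrix(weights)).toarray()

    # Step 1b: drop tree edges whose feature-feature SU is weaker than both
    # endpoints' SU with the class; the surviving components are the clusters.
    pruned = np.zeros_like(tree)
    for a, b in zip(*np.nonzero(tree)):
        su_ff = 2.0 - tree[a, b]
        if not (su_ff < su_class[relevant[a]] and su_ff < su_class[relevant[b]]):
            pruned[a, b] = 1.0
    n_clusters, labels = connected_components(csr_matrix(pruned), directed=False)

    # Step 2: from each cluster, keep the feature most correlated with the class.
    selected = []
    for c in range(n_clusters):
        members = relevant[labels == c]
        selected.append(int(members[np.argmax(su_class[members])]))
    return sorted(selected)
```

Calling fast_select(X, y) on a discretized feature matrix returns the indices of one representative feature per cluster, i.e. the kind of reduced subset the abstract describes passing on to a classifier.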

How to Cite
A. Bemberkar Pankaj, R. Wagh Vinod, Naradhania Mahendra, & R. Nikade Sonam. (2015). Implementation of Clustering-Based Feature Subset Selection Algorithm-Fast. International Journal of Current Research in Science and Technology, 1(5), 7-13. Retrieved from https://crst.gfer.org/index.php/crst/article/view/22