Clustering and Classifying Diabetic Data Sets Using K-Means Algorithm
Published: 2013
Author(s) Name: M. Kothainayaki, P. Thangaraj
Author(s) Affiliation: Bannari Amman Institute of Technology, Sathyamangalam, Tamil Nadu, India.
Abstract
The k-means algorithm is well known for its efficiency
in clustering large data sets. However, because it works
only on numeric values, it cannot be used directly to
cluster real-world data containing categorical values. In
this paper we present the classification of a diabetic
data set and an application of the k-means algorithm to
categorical domains. Before the data set is classified,
it is preprocessed to remove noise. We use a
missing-value algorithm to replace the null values
in the data set. This algorithm is also used to improve
the classification rate and to cluster the data set using
two attributes, namely the plasma and pregnancy attributes.
Keywords: Classification, Cluster Analysis, Clustering Algorithms, Categorical Data, Pre-processing
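The preprocessing and clustering steps named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the missing value algorithm works, so column-mean imputation is assumed here. The records are made-up (plasma, pregnancy) pairs, and the function names and the choice of k = 2 are hypothetical.

```python
import random
from statistics import mean


def impute_missing(rows):
    """Replace None in each column with the mean of that column's observed values.

    Assumption: mean imputation stands in for the paper's unspecified
    missing value algorithm.
    """
    columns = list(zip(*rows))
    col_means = [float(mean(v for v in col if v is not None)) for col in columns]
    return [
        tuple(col_means[j] if v is None else float(v) for j, v in enumerate(row))
        for row in rows
    ]


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Plain k-means: returns (labels, centroids) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data rows
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(mean(dim) for dim in zip(*members))
    return labels, centroids


# Hypothetical (plasma, pregnancy) records; None marks a null value.
records = [(85, 1), (89, 0), (None, 2), (92, 1),
           (168, 8), (183, None), (176, 7), (190, 9)]

cleaned = impute_missing(records)
labels, centroids = k_means(cleaned, k=2)
```

The two attributes are left unscaled in this sketch, so the plasma values dominate the distance. Standardizing each column before clustering would be a reasonable variation.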