Data Mining and Analysis by R Language for Business Research: A Case Study on Stress and its Influence on Health
Published: 2013
Author(s) Name: Kamakshaiah Musunuru
Author(s) Affiliation: Associate Professor, Business Analytics, Dhruva College of Management, Hyderabad, India.
Abstract
R is not only a statistical suite but also efficient data
mining software for data manipulation, calculation and
graphical display. Being a language, R also has an
effective data handling and storage facility, as well as
a suite of operators for calculations on arrays, in
particular matrices. R is developed from a simple and
effective programming language (called 'S') which
includes conditionals, loops, user-defined recursive
functions, and input and output facilities. Methods: In
this paper the data mining capabilities of R are
explained with the help of a study on secondary data
obtained from certain authenticated sources. The study
seeks to understand stress in relation to certain other
factors: heavy drinking, perceived health and life
satisfaction. As mentioned, the data used are secondary
in nature and, in their crude form, convey little
meaning to the user. Through the systematic application
of certain data mining tools, such as correlation and
MANOVA, certain important relationships and ties among
the variables were identified. Conclusions: All
variables are strongly correlated, with Karl Pearson
correlation coefficients ranging from 0.73 to 0.99. In
the significance tests, every variable is consistent
with the alternative hypothesis, which means the
association/relationship is not zero. In MANOVA, the
null hypothesis is rejected because the p-value is less
than 0.05. Apart from this, and most interestingly, the
variables behave like cohorts, resulting in a cohort
effect.
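
The workflow summarized in the abstract (a Pearson correlation matrix, a test of whether a correlation differs from zero, and a MANOVA) can be sketched in base R roughly as follows. This is a minimal illustration under stated assumptions, not the paper's analysis: the data are synthetic, the column names are stand-ins for the four variables named above, and the grouping factor `group` is hypothetical, since the abstract does not say which factor was used in the MANOVA.

```r
# Synthetic data only; the study's secondary data are not reproduced here.
set.seed(1)
n <- 60
group <- factor(rep(c("A", "B", "C"), each = n / 3))  # hypothetical grouping factor
shared <- rnorm(n) + as.integer(group)                # common component so variables correlate

d <- data.frame(
  group             = group,
  stress            = shared + rnorm(n, sd = 0.3),
  heavy_drinking    = shared + rnorm(n, sd = 0.3),
  perceived_health  = shared + rnorm(n, sd = 0.3),
  life_satisfaction = shared + rnorm(n, sd = 0.3)
)
vars <- c("stress", "heavy_drinking", "perceived_health", "life_satisfaction")

# Pearson correlation matrix for the four variables
r <- cor(d[, vars], method = "pearson")
print(round(r, 2))

# Significance test for one pair: H0 is that the true correlation is zero
ct <- cor.test(d$stress, d$heavy_drinking, method = "pearson")
print(ct$p.value)

# One-way MANOVA: do the four responses jointly differ across groups?
fit <- manova(
  cbind(stress, heavy_drinking, perceived_health, life_satisfaction) ~ group,
  data = d
)
print(summary(fit, test = "Pillai"))
```

A p-value below 0.05 from `cor.test` or from the MANOVA summary would lead to rejecting the corresponding null hypothesis, which is the decision rule the abstract describes.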
Keywords: R Language, RStudio, Secondary Data, Data Mining, Correlation, MANOVA