Spoken English Digit Classification Using Supervised Learning
Published: 2019
Author(s) Name: Maddimsetti Srinivas, Kasiprasad Mannepalli and G. L. P. Ashok
Author(s) Affiliation: Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur (Dt), Andhra Pradesh, India.
Abstract
Multiclass classification is a fundamental problem in many speech recognition systems. Spoken digit recognition is a multiclass problem with 10 classes. This paper applies Support Vector Machine (SVM), K-Nearest-Neighbour (KNN), and an ensemble method, Random Forest (RF), to English digit classification. The Caffe speech dataset of 2400 input instances (15 speakers × 16 repetitions × 10 digits) is used for the experiments. Mel Frequency Cepstral Coefficient (MFCC) features are extracted for all input instances. The dataset is divided into a training set and a testing set, with 10%, 30%, and 50% of the dataset used as the testing set. Confusion matrices are generated for every test case and every classification method. The ensemble method outperforms SVM and KNN across different numbers of frames. The highest accuracy, 97.50%, is achieved by the RF method with 10% of the data used for testing.
Keywords: Caffe, Ensemble methods, KNN, MFCC, Random Forest (RF), Spoken English digit, SVM.
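
The dataset arithmetic and the evaluation step described in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' code: the helper names (`split_sizes`, `confusion_matrix`, `accuracy`) are hypothetical, the abstract does not say how instances were assigned to each split, and the toy labels are fabricated placeholders chosen only to show how an accuracy figure is read off a confusion matrix. They are not the paper's predictions.

```python
# Dataset dimensions as stated in the abstract.
N_SPEAKERS, N_REPETITIONS, N_DIGITS = 15, 16, 10
N_TOTAL = N_SPEAKERS * N_REPETITIONS * N_DIGITS  # 2400 instances


def split_sizes(n_total, test_percent):
    """Return (n_train, n_test) for a given integer test percentage."""
    n_test = n_total * test_percent // 100
    return n_total - n_test, n_test


def confusion_matrix(y_true, y_pred, n_classes=N_DIGITS):
    """Rows are true digits, columns are predicted digits."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of instances on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Split sizes for the three test fractions mentioned in the abstract.
for percent in (10, 30, 50):
    n_train, n_test = split_sizes(N_TOTAL, percent)
    print(f"{percent}% test: train={n_train}, test={n_test}")

# Toy example (ASSUMPTION: placeholder labels, not real results).
# 240 test instances, 24 per digit, with 6 deliberate misclassifications.
y_true = [digit for digit in range(N_DIGITS) for _ in range(24)]
y_pred = list(y_true)
for i in range(6):
    idx = i * 24                       # first instance of digits 0..5
    y_pred[idx] = (y_true[idx] + 1) % N_DIGITS

toy_matrix = confusion_matrix(y_true, y_pred)
toy_accuracy = accuracy(toy_matrix)
print(f"toy accuracy: {toy_accuracy:.4f}")
```

If the 10% test split contains exactly 240 instances, the reported 97.50% accuracy would correspond to 234 correct classifications, which is the proportion the toy example reproduces.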