Dwijayanti, Suci (2022) Speaker Identification Using a Convolutional Neural Network. JURNAL RESTI, 6 (1). pp. 140-145. ISSN 2580-0760
Abstract
Speech, a mode of communication between humans and machines, has various applications, including biometric systems that identify people who have access to secure systems. Feature extraction is an important factor in achieving high accuracy in speech recognition. Therefore, we used a spectrogram, a pictorial representation of speech in terms of raw features, to identify speakers. These features were fed into a convolutional neural network (CNN), and a CNN-visual geometry group (CNN-VGG) architecture was used to recognize the speakers. We used 780 primary data samples from 78 speakers, each of whom uttered a number in Bahasa Indonesia. The proposed architecture, CNN-VGG-f, was trained with a learning rate of 0.001, a batch size of 256, and 100 epochs. The results indicate that this architecture can generate a suitable model for speaker identification. The spectrogram was compared with other features to determine the best features for identifying the speakers. The proposed method achieved an accuracy of 98.78%, considerably higher than the accuracies obtained with Mel-frequency cepstral coefficients (MFCCs; 34.62%) and with the combination of MFCCs and deltas (26.92%). Overall, CNN-VGG-f with the spectrogram identified 77 of the 78 speakers in the samples, supporting the usefulness of combining spectrograms and CNNs in speech recognition applications.
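The abstract describes the spectrogram as the raw, image-like feature fed to the CNN but does not state how it is computed. The following is a minimal NumPy sketch of a log-magnitude spectrogram, assuming 16 kHz audio, a 25 ms Hann window, a 10 ms hop, and a 512-point FFT. These parameter values and the function name `log_spectrogram` are hypothetical illustrations, not the paper's settings.

```python
import numpy as np


def log_spectrogram(signal, sample_rate=16000, win_ms=25.0, hop_ms=10.0,
                    n_fft=512, eps=1e-10):
    """Return a log-magnitude spectrogram of shape (n_fft // 2 + 1, n_frames).

    Hypothetical defaults (not from the paper): 16 kHz audio, 25 ms Hann
    window, 10 ms hop, 512-point FFT.
    """
    signal = np.asarray(signal, dtype=np.float64)
    win_len = int(round(sample_rate * win_ms / 1000))  # 400 samples at 16 kHz
    hop_len = int(round(sample_rate * hop_ms / 1000))  # 160 samples at 16 kHz
    if win_len > n_fft:
        raise ValueError("window length must not exceed the FFT size")
    if len(signal) < win_len:
        # Zero-pad very short clips so that at least one frame exists
        signal = np.pad(signal, (0, win_len - len(signal)))

    n_frames = 1 + (len(signal) - win_len) // hop_len
    window = np.hanning(win_len)
    # Overlapping, Hann-windowed frames: shape (n_frames, win_len)
    frames = np.stack([
        signal[i * hop_len: i * hop_len + win_len] * window
        for i in range(n_frames)
    ])
    # Magnitude of the real FFT of each frame: shape (n_frames, n_fft // 2 + 1)
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    # Log compression; transpose so frequency is the vertical image axis
    return np.log(magnitude + eps).T


if __name__ == "__main__":
    t = np.arange(16000) / 16000            # one second at 16 kHz
    tone = np.sin(2 * np.pi * 440 * t)      # 440 Hz test tone
    spec = log_spectrogram(tone)
    print(spec.shape)                        # frequency bins x frames
```

Under these assumed settings, one second of a 440 Hz tone yields 257 frequency bins by 98 frames, and the strongest bin is bin 14 (437.5 Hz, the bin nearest 440 Hz). Such 2-D arrays are what a VGG-style CNN would consume as single-channel images; any resizing or normalization the authors applied before CNN-VGG-f is not described in the abstract.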
Item Type: | Article |
---|---|
Subjects: | #3 Repository of Lecturer Academic Credit Systems (TPAK) > Articles Access for TPAK (Not Open Sources) |
Divisions: | 03-Faculty of Engineering > 20201-Electrical Engineering (S1) |
Depositing User: | Ms Suci Dwijayanti |
Date Deposited: | 25 May 2023 00:27 |
Last Modified: | 25 May 2023 00:27 |
URI: | http://repository.unsri.ac.id/id/eprint/105014 |