
Performance of Speaker Recognition System in Mismatching Speaking Style

Pinky J. Brahmbhatt, K. G. Maradia

Abstract


An analysis of a speaker recognition system under different speaking styles and mismatched conditions is presented. In a text-independent speaker recognition system, the speaking style is generally the same in the training and testing phases. Whispered speech is typically used to convey secret information quietly, or to avoid disturbing others in a quiet place, so it is audible only to nearby listeners. Both the excitation source and the vocal tract system carry speaker-specific information; since the vocal folds do not vibrate in whispered speech, the excitation-related information is absent. Fast speech is a tendency to speak rapidly, as if motivated by an urgency unobvious to the listener. Experiments are performed on the CHAINS (CHAracterizing INdividual Speakers) speech corpus in the whisper, solo, and fast speaking styles, using the Gaussian Mixture Model-Universal Background Model (GMM-UBM) approach with the most widely used features, Mel Frequency Cepstral Coefficients (MFCC). The experiments examine mismatch between training and testing speaking styles, and performance is found to degrade sharply under mismatched train-test conditions.
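The GMM-UBM approach named above can be sketched as follows. This is an illustrative reconstruction, not the authors' exact configuration: it uses scikit-learn to train a diagonal-covariance UBM, mean-only MAP adaptation in the style of Reynolds et al. (2000), and synthetic feature vectors standing in for MFCCs. The function names, the relevance factor of 16, and all numeric settings are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_loglik(X, weights, means, covars):
    """Average per-frame log-likelihood of X under a diagonal-covariance GMM."""
    diff = X[:, None, :] - means                      # (T, K, D)
    quad = np.sum(diff ** 2 / covars, axis=2)         # per-component Mahalanobis terms
    log_det = np.sum(np.log(covars), axis=1)          # (K,)
    d = X.shape[1]
    log_prob = -0.5 * (d * np.log(2.0 * np.pi) + log_det + quad)
    log_weighted = log_prob + np.log(weights)         # (T, K)
    m = log_weighted.max(axis=1)                      # log-sum-exp over components
    frame_ll = m + np.log(np.sum(np.exp(log_weighted - m[:, None]), axis=1))
    return float(frame_ll.mean())

def map_adapt_means(ubm, X, relevance=16.0):
    """Mean-only MAP adaptation of the UBM to one speaker's enrollment features.
    relevance=16 is a conventional choice, assumed here."""
    post = ubm.predict_proba(X)                       # (T, K) responsibilities
    n_k = post.sum(axis=0)                            # soft counts per component
    ex_k = post.T @ X                                 # (K, D) first-order statistics
    alpha = (n_k / (n_k + relevance))[:, None]        # data-dependent adaptation weight
    return alpha * (ex_k / np.maximum(n_k, 1e-10)[:, None]) + (1.0 - alpha) * ubm.means_

def llr_score(test_X, ubm, speaker_means):
    """Verification score: log-likelihood ratio of adapted speaker model vs. UBM."""
    w, cov = ubm.weights_, ubm.covariances_
    return (gmm_loglik(test_X, w, speaker_means, cov)
            - gmm_loglik(test_X, w, ubm.means_, cov))
```

In use, the UBM is first fit on pooled background features (`GaussianMixture(covariance_type="diag").fit(background)`), each enrolled speaker's model is obtained with `map_adapt_means`, and a test utterance is accepted when `llr_score` exceeds a threshold. The style-mismatch degradation the abstract reports would appear here as lower scores when enrollment and test features come from different speaking styles.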

Cite this Article

Pinky J. Brahmbhatt, K. G. Maradia. Performance of Speaker Recognition System in Mismatching Speaking Style. Journal of Artificial Intelligence Research & Advances. 2018; 5(3): 58–65p.


 


Keywords


whisper speech; fast speech; CHAINS database; GMM-UBM; MFCC




