Publication page of Dr.Ir. J.W. Koolwaaij

Automatic Speaker Verification in Telephony: a probabilistic approach

Reference

Title: Automatic Speaker Verification in Telephony: a probabilistic approach

Author(s): Johan Koolwaaij

Reference: PhD thesis, University of Nijmegen, 192 pages

Keywords: Speaker Recognition

Paper

A scientific essay in the field of Letters

Doctoral thesis

to obtain the degree of doctor
at the Katholieke Universiteit Nijmegen,
by decision of the College of Deans,
to be defended in public on 12 December 2000
at 3:30 p.m. precisely

by

Johannes Wijnandus Koolwaaij

born on 22 June 1973
in Hardinxveld-Giessendam

ISBN: 90-9014043-3

Introduction

Speech contains information about the identity of the speaker. Humans are often able to extract this identity information when the speech comes from a speaker they are acquainted with. This process is called speaker recognition, or voice recognition. Speaker recognition by humans sometimes errs: sons may be mistaken for their fathers when answering the phone, and it may be difficult to recognize the voice of someone you have met only a few times, or of a speaker who utters only a few words. These examples show that different speakers can sound very similar, and that 'similarity' may well depend on many factors, including the listener's familiarity with the speaker. The amount of available speech also affects recognition accuracy.

For example, the longer the son speaks, the more likely the listener is to discover that (s)he is speaking with the son instead of the father. Although the human voice might be intrinsically unique, a very large amount of speech might be necessary to reveal the distinguishing features, so limiting the amount of speech limits recognition accuracy.

Speaker recognition is a cover term for speaker verification (SV) and speaker identification (SI). Verification answers the question "Is the speaker who (s)he claims to be?", whereas identification answers the question "Who is this speaker?". In this thesis, we focus on speaker verification, since it has a substantial number of feasible applications.
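The two questions lead to different decision rules. The following is a minimal sketch, assuming that some scoring procedure has already produced a match score of one test utterance against each enrolled speaker model; the speaker names, scores, threshold, and the functions `verify` and `identify` are hypothetical, and identification is assumed to be closed-set (the speaker is known to be one of the enrolled speakers).

```python
from typing import Dict


def verify(scores: Dict[str, float], claimed_id: str, threshold: float) -> bool:
    """Speaker verification: accept or reject a single identity claim.

    Only the score of the claimed speaker matters; it is compared to a threshold.
    """
    return scores[claimed_id] >= threshold


def identify(scores: Dict[str, float]) -> str:
    """Closed-set speaker identification: return the best-matching enrolled speaker."""
    return max(scores, key=lambda speaker: scores[speaker])


# Hypothetical match scores of one test utterance against three enrolled speakers
scores = {"father": 1.8, "son": 2.1, "neighbour": -0.7}

print(verify(scores, "father", threshold=1.5))  # True: the father claim passes the threshold
print(identify(scores))                          # son
```

In this toy example the claim "I am the father" is accepted even though the son's model matches better: a verifier only checks the claimed identity against a threshold and never compares it with the other speakers, which is one way a father/son confusion like the one described earlier can arise.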

The first major step from speaker verification by humans towards speaker verification by computers was made by Lawrence Kersta when he developed spectrographic voice verification at Bell Labs in the early 1960s. He introduced the term 'voiceprint' for a spectrogram, which was generated by a complicated electro-mechanical device, and his verification algorithm was based on visual comparison of these voiceprints [Kersta1962]. Although the term 'voiceprint' is a misnomer [Broeders1995] and visual voiceprint comparison cannot cope with the intrinsic physical and linguistic variation in speech, the work by Kersta paved the way for the introduction of automatic speaker verification.

The state-of-the-art approach to automatic speaker verification is to build a stochastic model of a speaker from speaker characteristics extracted from the available training speech. During verification, the speaker characteristics extracted from the test speech are compared to this model. If the match is close enough, that is, if the match score exceeds a decision threshold, the identity claim is accepted and training and test speech are assumed to have been uttered by the same speaker. Extensive overviews of automatic speaker verification technology are presented in [Doddington1985,OShaughnessy1986,Furui1994]. The applications of automatic speaker verification are numerous [Boves1998,Markowitz1999] and fall into two categories: civil and forensic.
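The train/compare/threshold procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the method of this thesis: each speaker is modeled by a single diagonal-covariance Gaussian (practical systems typically use richer models such as Gaussian mixtures or HMMs), the "closeness" of the match is measured as an average log-likelihood ratio against a background model trained on other speakers (a common choice that the text above does not specify), the threshold of 0 is arbitrary, and synthetic 4-dimensional vectors stand in for real acoustic features. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Frame = Sequence[float]
Model = Tuple[List[float], List[float]]  # (per-dimension means, per-dimension variances)


def train_model(frames: List[Frame]) -> Model:
    """Fit a diagonal-covariance Gaussian to a list of feature frames."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variances so a degenerate dimension cannot cause division by zero
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6) for d in range(dim)]
    return mean, var


def frame_log_likelihood(frame: Frame, model: Model) -> float:
    """Log density of one frame under a diagonal Gaussian."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(frame, mean, var))


def llr_score(test: List[Frame], speaker: Model, background: Model) -> float:
    """Average per-frame log-likelihood ratio: claimed speaker vs. background."""
    return sum(frame_log_likelihood(f, speaker) - frame_log_likelihood(f, background)
               for f in test) / len(test)


def accept_claim(test: List[Frame], speaker: Model, background: Model,
                 threshold: float = 0.0) -> bool:
    """Accept the identity claim if the match is close enough (LLR above threshold)."""
    return llr_score(test, speaker, background) > threshold


def synthetic_speech(rng: random.Random, center: List[float], n: int) -> List[List[float]]:
    """Stand-in for feature extraction: unit-variance Gaussian frames around a center."""
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]


rng = random.Random(0)
target_center = [1.0, -1.0, 0.5, 0.0]
other_centers = [[-1.0, 1.0, -0.5, 0.5], [0.0, 0.0, 1.0, -1.0], [-0.5, -0.5, 0.0, 1.0]]

# Training: one model for the claimed speaker, one pooled model for other speakers
speaker_model = train_model(synthetic_speech(rng, target_center, 200))
background_model = train_model(
    [f for c in other_centers for f in synthetic_speech(rng, c, 200)])

# Verification: fresh test speech from the true speaker and from an impostor
genuine_test = synthetic_speech(rng, target_center, 100)
impostor_test = synthetic_speech(rng, other_centers[0], 100)

print(accept_claim(genuine_test, speaker_model, background_model))   # True
print(accept_claim(impostor_test, speaker_model, background_model))  # False
```

For brevity the impostor here is also one of the background speakers; averaging the score over frames means that longer test utterances give a more stable decision, in line with the observation about the amount of available speech earlier in this introduction.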

This thesis consists of five papers, preceded by an introductory review. The introduction is organized as follows. Chapter 1 places speaker verification in the context of other biometric verification technologies, which use an individual's distinguishing traits to verify identity. Chapter 2 describes the fundamentals of speaker verification based on stochastic speaker modeling. Chapter 3 discusses potential applications of speaker verification in the form of case studies. The body of this thesis consists of the five papers, published in (or submitted to) scientific journals and conference proceedings, which cover various topics in automatic speaker verification. Chapter 4 summarizes these papers and places them in context. Chapter 5 reports the major conclusions of the research, and the full texts of the papers are reproduced in the last chapter of this thesis.

List of publications included in this thesis

J.W. Koolwaaij, L. Boves, "The concept of model check in speaker verification", submitted for publication in Speech Communication.
J.W. Koolwaaij, L. Boves, "Local normalization and delayed decision making in speaker detection and tracking", Digital Signal Processing: A Review Journal, Special Issue on the NIST Speaker Recognition Workshop, Vol. 10, No. 1-3, pp. 113-132, 2000.
F. Bimbot, M. Blomberg, L. Boves, D. Genoud, H.-P. Hutter, C. Jaboulet, J.W. Koolwaaij, J. Lindberg, J.-B. Pierrot, "An overview of the CAVE project research activities in speaker verification", Speech Communication, Vol. 31, No. 2-3, pp. 155-180, 2000.
J.W. Koolwaaij, L. Boves, "On decision making in forensic casework", Forensic Linguistics: The International Journal of Speech, Language and the Law, Vol. 6, No. 2, pp. 242-264, 1999.
L. Boves, J.W. Koolwaaij, "Speaker verification in WWW applications", Proceedings of RLA2C: la Reconnaissance du Locuteur et ses Applications Commerciales et Criminalistiques, Avignon, France, pp. 178-181, 1998.

The following cartoon (in Dutch, clipped by Eric Sanders) says:
'No more hassle with keys: it recognizes the voice, unless he is coming back from the pub.'