Title: On the Use of Automatic Speaker Verification Systems in Forensic Casework
Author(s): Johan Koolwaaij & Lou Boves
Reference: Proceedings of the Second International Conference on Audio- and Video-based Biometric Person Authentication (AVBPA-99), Washington, USA, pp. 224-229
Keywords: Speaker Recognition
There is a PostScript version (145423 bytes) available.
Automatic Speaker Verification (SV) and Forensic Casework have long been considered essentially unrelated disciplines, because the former was seen as a one alternative forced choice problem, whereas the latter used to be presented as an open set identification problem. However, Doddington has pointed out that many forensic cases boil down to the question of whether a set of recordings, some of which are definitely from the perpetrator and others from a single suspect, do or do not originate from the same speaker. In other words: many forensic cases can be formulated as a one alternative forced choice problem.
One broad class of cases where automatic SV techniques might prove useful in forensic work is the processing of telephone taps made during the investigation of drug trafficking cases. Very often, the perpetrators are foreigners who speak a language unknown not only to the police officers but also to the forensic phoneticians. In many cases the police are interested in knowing how many different speakers are involved in a given set of telephone taps. Leaving the speaker recognition task to interpreters has been shown to be unreliable, if only because of possible links between the interpreters and the criminals. Such links are to be expected if the case is investigated in a small language community. In these cases a text-independent SV system might be of great help.
In all stages of forensic applications of speaker recognition it is
important to be able to state a confidence interval for conclusions regarding
the identity of the voices of a known suspect and an unknown
perpetrator. If the statement must be used in a court, a specification
of the confidence level is necessary to allow the judge to weigh
this piece of evidence. If it is to be used during the police
investigation, confidence levels will be used to weigh the evidence in
setting priorities for investigating specific suspects. In the harassment
case described in this paper, the confidence statement was used to
decide on how to proceed with the investigation.
It is well known that forensic phoneticians often have difficulty estimating the confidence level with which they can identify a person by her/his voice. Thus, forensic case workers want to know to what extent automatic SV systems could be used to obtain an 'objective' confidence estimate.
In this paper we investigate the implications of using an SV system to
estimate the confidence level for an identity statement on the basis of
a specific case that was brought to our attention by a Dutch private
investigations bureau. A male person left obscene messages in the voice
mail boxes of female employees of a large IT company. The calls
could be traced to handsets in in-house classrooms. Three victims
identified the same colleague as the likely perpetrator, but the accused
person denied all charges, and agreed to collaborate in a test in which
he read transcripts of the messages. The speech was recorded in one of the
classrooms, using the same handset type and the same voice mail system as
during the harassing calls.
However, while the harassing calls were whispered, probably with the intent to
sound 'sexy', the test calls were read in a normal voice. Approximately one
month after the test recordings, the harassing calls started again, in a
whispery voice and from the same classrooms. Now, the obvious question is
whether the two sets of harassing calls have been made by the same speaker,
and whether this speaker is the same person as the one who read the
transcripts. Obviously, this problem can be cast in the form of a one
alternative forced choice problem: we can take the test calls for building a
voice pattern of a known speaker, and try to answer the question whether all
harassing calls have been made by the same person.
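The formulation above can be illustrated with a minimal sketch. It is not the system used in this paper: it assumes, purely for illustration, that each recording has already been reduced to a list of one-dimensional feature values, that the known speaker and a background population are each modelled by a single Gaussian, and that a decision is taken by thresholding the average log-likelihood ratio. All function names, the toy numbers, and the threshold of 0.0 are hypothetical.

```python
import math
from statistics import mean, pvariance


def fit_gaussian(values):
    """Fit a 1-D Gaussian; returns (mean, variance) with a small variance floor."""
    return mean(values), max(pvariance(values), 1e-6)


def log_density(x, model):
    """Log of the Gaussian density of x under model = (mean, variance)."""
    mu, var = model
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def llr_score(test_values, speaker_model, background_model):
    """Average log-likelihood ratio: known speaker versus background."""
    return mean(
        log_density(x, speaker_model) - log_density(x, background_model)
        for x in test_values
    )


def verify(test_values, speaker_model, background_model, threshold=0.0):
    """Accept the identity claim if the score exceeds the (assumed) threshold."""
    return llr_score(test_values, speaker_model, background_model) > threshold


# Toy data (made up): features from the test calls of the known speaker,
# from a background population, and from two questioned recordings.
known_speaker = fit_gaussian([1.0, 1.2, 0.9, 1.1, 0.8])
background = fit_gaussian([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
questioned_a = [1.0, 1.1, 0.9]
questioned_b = [3.0, 2.8, 3.2]

accept_a = verify(questioned_a, known_speaker, background)
accept_b = verify(questioned_b, known_speaker, background)
```

In this toy setting the model of the known speaker is built from the test calls only, and each questioned recording is scored against it separately. A hard accept/reject decision is a simplification: the score itself would still have to be translated into a confidence statement.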
In this paper we take this case as the starting point to investigate
the contingencies of applying the procedures and technology developed
for Automatic Speaker Verification to forensic cases that can be
formulated as speaker verification problems.