Abstract for johnson_eurospeech99

Proc Eurospeech '99 (Budapest, Hungary, September 1999).


S.E. Johnson

September 1999

The problem of labelling speaker turns by automatically segmenting and clustering a continuous audio stream is addressed. A new clustering scheme is presented and evaluated using a clustering efficiency score which treats both agglomerative and divisive clustering strategies equally. Results show that an efficiency of 70% can be obtained on both manually and automatically derived segments on the 1996 Hub4 development data.

For the task of identifying potentially unknown anchor speakers within broadcast news shows, the frame classification error rate is very important. To reflect this, a frame-based cluster efficiency is defined, and the results show that a 90% frame-based efficiency can be achieved. Finally, a frame-based comparison between the manually and automatically derived segment/cluster sets shows that approximately one third of the errors are introduced during segmentation and two thirds during clustering.

