BestCut Clustering Algorithm
BestCut [1] is a graph partitioning method devised for coreference resolution. It is inspired by the well-known graph partitioning algorithm MinCut [2]. It represents the coreference space as an undirected edge-weighted graph in which the nodes represent all the mentions in a text, while the edges between nodes carry the confidence values derived from the coreference classification phase. To detect the entities referred to in the text, the BestCut algorithm partitions the graph so that all nodes in each subgraph refer to the same entity. We made two modifications to the original BestCut algorithm:
(1) we added a weight coefficient to adjust the cut-weight score at
each partition;
(2) we used neural networks to train a stopping model for the
terminal condition of the clustering.
This algorithm has been employed in our coreference resolution system for the NIST ACE08 evaluation.
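As a sketch of the graph machinery underneath BestCut, the snippet below implements the Stoer-Wagner global minimum cut [2] in pure Python. This shows only the basic min-cut primitive, not BestCut's weighted cut score or learned stopping model; the toy mention graph, its weights, and all names here are illustrative assumptions, not our system's actual data structures.

```python
def stoer_wagner_min_cut(w):
    """w: symmetric matrix of non-negative edge weights (w[i][j]).
    Returns (weight_of_min_cut, set of node indices on one side)."""
    w = [row[:] for row in w]            # work on a copy; vertices get merged
    groups = [[i] for i in range(len(w))]
    active = list(range(len(w)))
    best_weight, best_side = float("inf"), set()
    while len(active) > 1:
        # "minimum cut phase": maximum-adjacency ordering of active vertices
        order, rest = [active[0]], active[1:]
        conn = {v: w[active[0]][v] for v in rest}
        while rest:
            z = max(rest, key=conn.get)  # most tightly connected vertex
            rest.remove(z)
            order.append(z)
            for v in rest:
                conn[v] += w[z][v]
        s, t = order[-2], order[-1]
        if conn[t] < best_weight:        # "cut of the phase": t vs. the rest
            best_weight, best_side = conn[t], set(groups[t])
        for v in active:                 # merge t into s
            if v not in (s, t):
                w[s][v] += w[t][v]
                w[v][s] = w[s][v]
        groups[s].extend(groups[t])
        active.remove(t)
    return best_weight, best_side

# Toy mention graph: mentions 0-1 and 2-3 are strongly coreferent (weight 3),
# with only a weak link (weight 1) between the two entities.
w = [[0, 3, 0, 0],
     [3, 0, 1, 0],
     [0, 1, 0, 3],
     [0, 0, 3, 0]]
cut_weight, side = stoer_wagner_min_cut(w)
```

Cutting the weak link separates the two entities, which is exactly the behaviour BestCut exploits when the edge weights are coreference confidences.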
Reference:
[1] "BestCut: A Graph Algorithm for Coreference Resolution", C. Nicolae and G. Nicolae, in Proc. of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006), Sydney, Australia: Association for Computational Linguistics, pp. 275-283 (2006).
[2] "A Simple Min Cut Algorithm", M. Stoer and F. Wagner, in Jan van Leeuwen, editor, Proceedings of the 1994 European Symposium on Algorithms, pages 141-147, New York: Springer-Verlag (1994).
Speaker Diarization
The task of speaker diarization involves speaker segmentation and speaker clustering.
The goal of a speaker segmentation system is to divide a speech signal into a sequence of speaker-homogeneous regions. Thus, the output of such a system provides the answer to the question, "Who spoke when?"
However, to guarantee the purity of each speech segment, it is highly possible that a continuous speech region will be over-segmented into several short chunks. The challenge is then to identify which portions of the speech belong to which speakers: the "speaker clustering" problem. Accordingly, speaker clustering also requires that we correctly identify how many unique speakers occur in the recording.
The segmentation module in our speaker tracking system is based on speaker turn detection, while the clustering module is implemented with three methods: Linear Clustering, BIC Hierarchical Clustering, and CLR Hierarchical Clustering.
Linear Clustering:
Step1: Initially, each segment is treated as a single cluster, on which a single Gaussian model is built. This is based on the consideration that there is not enough data in each cluster (only one segment initially), so a single Gaussian model is more appropriate than a complex one such as a GMM.
Step2: Check each pair of adjacent segments.
(1) If their distance measured by the Bayesian Information Criterion (BIC) is less than 0, which indicates that merging them is the better choice, they are merged [1], and a new single Gaussian model is rebuilt on the merged result.
(2) If all the distances are greater than or equal to 0, go to Step3.
Step3: Stop Clustering.
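As a minimal sketch of the three steps above, the following code merges adjacent segments by ΔBIC using 1-D single Gaussian models. Real systems work on multivariate acoustic features; the data, the λ value, and all names here are illustrative assumptions. A negative ΔBIC means the merged model explains both segments better than two separate models.

```python
import math
import random
from statistics import pvariance

def delta_bic(x, y, lam=1.0):
    """ΔBIC between two 1-D segments, each modeled by a single Gaussian.
    Negative => merging the segments is the better choice [1]."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    data_term = 0.5 * (n * math.log(pvariance(x + y))
                       - n1 * math.log(pvariance(x))
                       - n2 * math.log(pvariance(y)))
    # model-complexity penalty: (1/2)(d + d(d+1)/2) log n, with d = 1
    return data_term - lam * 0.5 * (1 + 1) * math.log(n)

def linear_clustering(segments, lam=1.0):
    """Merge adjacent segments while some adjacent pair has ΔBIC < 0."""
    segs = [list(s) for s in segments]
    i = 0
    while i < len(segs) - 1:
        if delta_bic(segs[i], segs[i + 1], lam) < 0:
            segs[i] = segs[i] + segs.pop(i + 1)  # merge; model is rebuilt
            i = max(i - 1, 0)                    # recheck the new neighbour
        else:
            i += 1
    return segs

# Toy signal: two adjacent segments from one "speaker", one from another.
rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(200)]
b = [rng.gauss(0, 1) for _ in range(200)]
c = [rng.gauss(8, 1) for _ in range(200)]
clusters = linear_clustering([a, b, c])
```

The two statistically similar neighbours merge, while the segment with a different mean stays separate.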
BIC Hierarchical Clustering:
Step1: Take the clustering result of the Linear Clustering above as the initial state. As before, each cluster is modeled with a single Gaussian.
Step2: Check each pair of clusters in the current clustering state (they no longer need to be adjacent).
(1) If their BIC distance is less than 0, they are merged.
(2) If all the distances are greater than or equal to 0, go to Step3.
Step3: Stop Clustering.
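The hierarchical stage can be sketched the same way, except that every pair of clusters is now compared and the globally best merge is taken first. This uses the same simplifying assumptions as before (1-D single Gaussian models; illustrative data and names):

```python
import math
import random
from statistics import pvariance

def delta_bic(x, y, lam=1.0):
    # Negative => merging the two clusters is the better choice [1].
    n1, n2 = len(x), len(y)
    n = n1 + n2
    data_term = 0.5 * (n * math.log(pvariance(x + y))
                       - n1 * math.log(pvariance(x))
                       - n2 * math.log(pvariance(y)))
    return data_term - lam * 0.5 * (1 + 1) * math.log(n)  # penalty, d = 1

def bic_hierarchical(clusters, lam=1.0):
    """Repeatedly merge the pair (adjacent or not) with the most negative
    ΔBIC; stop when every pairwise ΔBIC is >= 0 (Step3)."""
    cl = [list(c) for c in clusters]
    while len(cl) > 1:
        d, i, j = min((delta_bic(cl[i], cl[j], lam), i, j)
                      for i in range(len(cl))
                      for j in range(i + 1, len(cl)))
        if d >= 0:
            break
        cl[i] = cl[i] + cl.pop(j)  # merge and rebuild the Gaussian
    return cl

# Interleaved "speakers": linear clustering cannot join the 1st and 3rd
# segments because they are not adjacent, but hierarchical clustering can.
rng = random.Random(0)
segs = [[rng.gauss(mu, 1) for _ in range(150)] for mu in (0, 8, 0, 8)]
result = bic_hierarchical(segs)
```

Dropping the adjacency constraint is what lets this stage reunite a speaker whose turns are scattered across the recording.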
CLR Hierarchical Clustering:
Because a purely BIC-based clustering phase can disperse data from the same speaker into different clusters, owing to varying acoustic conditions (clean speech, speech with noise, speech with music, etc.), the acoustic background must first be normalized. CLR (Cross Likelihood Ratio) Hierarchical Clustering [2] is then conducted to refine the results given by the BIC Hierarchical Clustering. The criterion for CLR is given by:

CLR(c_i, c_j) = (1/n_i) log[ L(x_i | M_j) / L(x_i | M_B) ] + (1/n_j) log[ L(x_j | M_i) / L(x_j | M_B) ]

where x_i are the n_i frames of cluster c_i, M_i is the model of cluster c_i, and M_B is the background (UBM) model.
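The CLR criterion scores each cluster's data under the other cluster's model relative to a background model, so same-speaker clusters score high and different-speaker clusters score low. A toy illustration follows, using single 1-D Gaussians as stand-ins for the adapted speaker models and the UBM (real systems use GMMs over cepstral features; all data and names here are illustrative assumptions):

```python
import math
import random
from statistics import fmean, pvariance

def avg_loglik(x, mu, var):
    # per-frame (1/n) average log-likelihood of x under N(mu, var)
    return fmean(-0.5 * (math.log(2 * math.pi * var) + (v - mu) ** 2 / var)
                 for v in x)

def clr(xi, xj, bg):
    """Cross Likelihood Ratio between clusters xi and xj; bg = (mu, var)
    of the background model standing in for the UBM."""
    mi = (fmean(xi), pvariance(xi))   # model of cluster i
    mj = (fmean(xj), pvariance(xj))   # model of cluster j
    return ((avg_loglik(xi, *mj) - avg_loglik(xi, *bg))
            + (avg_loglik(xj, *mi) - avg_loglik(xj, *bg)))

rng = random.Random(1)
a = [rng.gauss(0, 1) for _ in range(300)]   # speaker A
b = [rng.gauss(0, 1) for _ in range(300)]   # speaker A again
c = [rng.gauss(6, 1) for _ in range(300)]   # speaker B
bg = (3.0, 12.0)                            # crude "background" over A and B
```

Here clr(a, b, bg) comes out positive and clr(a, c, bg) strongly negative, which is the signal the CLR stage thresholds when deciding whether to merge.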
Reference:
[1] "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion", Scott S. Chen and P. S. Gopalakrishnan, in Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop (February 1998).
[2] "Speaker Diarization: Combination of the LIUM and IRIT Systems", E. El Khoury, S. Meignier, and C. Sénac.
Evolutionary Clustering
The concept of "dynamic networks" came to my attention only recently; their analysis usually employs evolutionary clustering methods. However, previous methods suffer from some inherent drawbacks:
(1) they assume a fixed number of communities over time;
(2) they do not allow communities to start or stop arbitrarily over time.
To address these problems, a novel particle-and-density based evolutionary clustering method was proposed in [1], which models a dynamic network as a collection of many small particles called nano-communities.
Reference:
[1] "A Particle-and-Density Based Evolutionary Clustering Method for Dynamic Networks", Min-Soo Kim and Jiawei Han, in Proc. 2009 Int. Conf. on Very Large Data Bases (VLDB'09), Lyon, France, Aug. 2009.
