Clustering evaluation metrics: MI, NMI, AMI (mutual information, normalized mutual information, adjusted mutual information)

Introduction

The two most common tasks in unsupervised learning are clustering and dimensionality reduction. This post presents three metrics for evaluating clustering results: mutual information (MI), normalized mutual information (NMI), and adjusted mutual information (AMI), giving the formula and code for each. Note that all three metrics require the ground-truth labels of the data points.

Preliminaries and Notation

Suppose we are given $N$ data points of dimension $D$, forming the data matrix $\mathbf{X} = [x_1, x_2, \cdots, x_N] \in \mathbb{R}^{D\times N}$.

Each data point $x_i \in \mathbb{R}^{D\times 1}$ has an associated label $u_i \in \mathbb{R}$.

The labels take $R$ distinct values, i.e., the data fall into $R$ classes, and together they form the vector $U = [u_1, u_2, \cdots, u_N] \in \mathbb{R}^N$.

We number the $N$ data points with the natural numbers $1$ through $N$ and let $S = \{1, 2, \cdots, N\}$. The index sets $U_1, U_2, \cdots, U_R$ then partition $S$:
$$S = \bigcup_{i=1}^R U_i,$$
where $U_i$ is the set of indices of the points belonging to class $i$. For example, for the vector $U = [1, 1, 2, 2, 3, 3, 3]$ we have
$$N = 7, \quad R = 3, \quad U_1 = \{1, 2\}, \quad U_2 = \{3, 4\}, \quad U_3 = \{5, 6, 7\}.$$

Note that, by convention, if the data set contains $R$ classes we label them with the natural numbers $1$ through $R$; for instance, a data set with 5 classes uses the labels 1 through 5. The same convention applies to $U$: its $R$ classes are labelled $1$ through $R$.

Clustering is then a way of partitioning the original data set $\mathbf{X}$: running a clustering algorithm (DBSCAN, spectral clustering, etc.) on $\mathbf{X}$ produces a vector of predicted labels $V = [v_1, v_2, \cdots, v_N] \in \mathbb{R}^N$.

Analogously, the predicted clusters $V_1, V_2, \cdots, V_C$ partition $S$:
$$S = \bigcup_{i=1}^C V_i,$$
where $V_i$ is the set of indices of the points assigned to cluster $i$.
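As a tiny illustration (plain Python; label_sets is a name of my own choosing, not a library function), the index sets $U_i$ or $V_j$ can be recovered from a label vector as follows, using the same 1-based indexing as the text:

from collections import defaultdict

def label_sets(labels):
    # map each class label to the set of (1-based) indices of the points carrying it
    sets = defaultdict(set)
    for idx, lab in enumerate(labels, start=1):
        sets[lab].add(idx)
    return dict(sets)

print(label_sets([1, 1, 2, 2, 3, 3, 3]))
# {1: {1, 2}, 2: {3, 4}, 3: {5, 6, 7}}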

Entropy and the contingency table

Before introducing mutual information, we first recall Shannon entropy and the contingency table.

For the label vector $U \in \mathbb{R}^N$ above, the Shannon entropy is
$$\text{H}(U) = -\sum_{i=1}^R p_i \log p_i, \tag{1}$$
where the base of the logarithm is usually taken to be $2$ or $e$ (natural logarithm),

and $p_i$ is the fraction of the data belonging to class $i$:
$$p_i = \frac{\left| U_i \right|}{N}. \tag{2}$$

For example, for $U = [1, 1, 1]$ we have $p_1 = 1$ and $\text{H}(U) = -1 \cdot \log(1) = 0$.

For $U = [1, 2, 3]$, we have $p_1 = p_2 = p_3 = \frac{1}{3}$ and $\text{H}(U) = -3 \cdot \frac{1}{3} \log\frac{1}{3} = \log 3$.

For $U = [1, 2, 2]$, we have $p_1 = \frac{1}{3}$, $p_2 = \frac{2}{3}$ and $\text{H}(U) = -\frac{1}{3}\log\frac{1}{3} - \frac{2}{3}\log\frac{2}{3} = \log 3 - \frac{2}{3}\log 2$.

Clearly the entropy is always non-negative, since $0 < p_i \leq 1$.
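As a quick sanity check on Eqs. (1)-(2), here is a minimal NumPy sketch (the function name shannon_entropy is mine, not a library routine) that reproduces the three small examples above:

import numpy as np

def shannon_entropy(labels):
    # Shannon entropy of a label vector with the natural logarithm, Eqs. (1)-(2)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

print(shannon_entropy([1, 1, 1]))   # 0.0
print(shannon_entropy([1, 2, 3]))   # log 3 ≈ 1.0986
print(shannon_entropy([1, 2, 2]))   # log 3 - (2/3) log 2 ≈ 0.6365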

Let the matrix $M \in \mathbb{R}^{R \times C}$ be the contingency table of the true label vector $U$ and the predicted label vector $V$, with entries
$$m_{ij} = \left| U_i \cap V_j \right|, \tag{3}$$
for $i = 1, 2, \cdots, R$ and $j = 1, 2, \cdots, C$.

For example, for $U = [1, 1, 2, 2]$ and $V = [1, 1, 1, 2]$ we have
$$U_1 = \{1, 2\}, \quad U_2 = \{3, 4\}, \quad V_1 = \{1, 2, 3\}, \quad V_2 = \{4\},$$
and therefore
$$\begin{aligned} m_{11} &= \left|U_1 \cap V_1\right| = \left|\{1, 2\}\right| = 2, \\ m_{12} &= \left|U_1 \cap V_2\right| = \left|\varnothing\right| = 0, \\ m_{21} &= \left|U_2 \cap V_1\right| = \left|\{3\}\right| = 1, \\ m_{22} &= \left|U_2 \cap V_2\right| = \left|\{4\}\right| = 1. \end{aligned}$$

The contingency table is therefore
$$M = \begin{bmatrix} 2 & 0 \\ 1 & 1 \end{bmatrix}.$$
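The contingency table can also be computed with scikit-learn's contingency_matrix (the MATLAB code further below builds it by hand); a short sketch reproducing the example above:

from sklearn.metrics.cluster import contingency_matrix

U = [1, 1, 2, 2]
V = [1, 1, 1, 2]
print(contingency_matrix(U, V))
# [[2 0]
#  [1 1]]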

Mutual information (MI)

Mutual information is defined as
$$\text{MI}(U, V) = \sum_{i=1}^R \sum_{j=1}^C p_{i,j} \log\left(\frac{p_{i,j}}{p_i \times p_j}\right), \tag{4}$$
where
$$p_{i,j} = \frac{\left| U_i \cap V_j \right|}{N} = \frac{m_{ij}}{N}, \tag{5}$$
and $p_i = \frac{\left| U_i \right|}{N}$, $p_j = \frac{\left| V_j \right|}{N}$.

For example, for $U = [1, 1, 2, 2]$ and $V = [1, 1, 1, 2]$,
$$p_{1,1} = m_{11}/N = 0.5, \quad p_{1,2} = m_{12}/N = 0, \quad p_{2,1} = m_{21}/N = 0.25, \quad p_{2,2} = m_{22}/N = 0.25.$$
Terms with $p_{i,j} = 0$ (such as $p_{1,2}$ here) contribute nothing to the sum in Eq. (4).
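Putting Eqs. (4)-(5) together, MI can be computed directly from the contingency table; the sketch below (mutual_information is my own helper, not sklearn's internal code) cross-checks against sklearn's mutual_info_score:

import numpy as np
from sklearn.metrics import mutual_info_score
from sklearn.metrics.cluster import contingency_matrix

def mutual_information(U, V):
    # MI(U, V) of Eq. (4) with the natural logarithm, computed from the contingency table
    M = contingency_matrix(U, V)
    N = M.sum()
    p_ij = M / N                             # joint proportions, Eq. (5)
    p_i = p_ij.sum(axis=1, keepdims=True)    # row marginals |U_i| / N
    p_j = p_ij.sum(axis=0, keepdims=True)    # column marginals |V_j| / N
    nz = p_ij > 0                            # zero cells contribute nothing
    return np.sum(p_ij[nz] * np.log(p_ij[nz] / (p_i @ p_j)[nz]))

U = [1, 1, 2, 2]
V = [1, 1, 1, 2]
print(mutual_information(U, V))   # ≈ 0.2158
print(mutual_info_score(U, V))    # should agree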

Normalized mutual information (NMI)

NMI and AMI are the two metrics most commonly used to assess clustering quality.

Normalized mutual information is computed as

$$\text{NMI}(U, V) = \frac{\text{MI}(U, V)}{F\left(\text{H}(U), \text{H}(V)\right)}, \tag{6}$$

where $F(x_1, x_2)$ can be the $\min$ or $\max$ function, the geometric mean $F(x_1, x_2) = \sqrt{x_1 x_2}$, or the arithmetic mean $F(x_1, x_2) = \frac{x_1 + x_2}{2}$.

The arithmetic mean is the usual choice, in which case NMI becomes
$$\text{NMI}(U, V) = \frac{2\,\text{MI}(U, V)}{\text{H}(U) + \text{H}(V)}. \tag{7}$$
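A minimal sketch of Eq. (7) (nmi_arithmetic is a name of my own choosing; entropy and mutual_info_score are the same sklearn functions imported by the Python code further below):

from sklearn.metrics import mutual_info_score
from sklearn.metrics.cluster import entropy

def nmi_arithmetic(U, V):
    # Eq. (7): MI normalized by the arithmetic mean of the two entropies
    return 2.0 * mutual_info_score(U, V) / (entropy(U) + entropy(V))

U = [1, 1, 2, 2]
V = [1, 1, 1, 2]
print(nmi_arithmetic(U, V))
# should match normalized_mutual_info_score(U, V, average_method='arithmetic')

Passing average_method='arithmetic' explicitly, as done in the code further below, avoids relying on the library default, which has differed across scikit-learn versions.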

Adjusted mutual information (AMI)

Adjusted mutual information is somewhat more involved to compute:

$$\text{AMI}(U, V) = \frac{\text{MI}(U, V) - \mathbb{E}\left\{\text{MI}(U, V)\right\}}{F\left(\text{H}(U), \text{H}(V)\right) - \mathbb{E}\left\{\text{MI}(U, V)\right\}}, \tag{8}$$

where $\mathbb{E}\left\{\text{MI}(U, V)\right\}$ is the expected value of the mutual information $\text{MI}(U, V)$, given by
$$\mathbb{E}\left\{\text{MI}(U, V)\right\} = \sum_{i=1}^R \sum_{j=1}^C \sum_{k = (a_i + b_j - N)^+}^{\min(a_i, b_j)} \frac{k}{N} \log\left(\frac{N \times k}{a_i \times b_j}\right) \frac{a_i!\, b_j!\, (N - a_i)!\, (N - b_j)!}{N!\, k!\, (a_i - k)!\, (b_j - k)!\, (N - a_i - b_j + k)!}, \tag{9}$$
where $(a_i + b_j - N)^+$ denotes $\max\left(1,\, a_i + b_j - N\right)$.

Here $a_i$ and $b_j$ are the $i$-th row sum and the $j$-th column sum of the contingency table $M$:
$$a_i = \sum_{j=1}^C m_{ij}, \qquad b_j = \sum_{i=1}^R m_{ij}. \tag{10}$$
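For the worked contingency table above, M = [[2, 0], [1, 1]], the row and column sums of Eq. (10) are simply:

import numpy as np

M = np.array([[2, 0],
              [1, 1]])
a = M.sum(axis=1)   # row sums a_i    -> [2 2]
b = M.sum(axis=0)   # column sums b_j -> [3 1]
print(a, b)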
If we take $F(x_1, x_2)$ to be the $\max$ function, the adjusted mutual information becomes
$$\text{AMI}(U, V) = \frac{\text{MI}(U, V) - \mathbb{E}\left\{\text{MI}(U, V)\right\}}{\max\left(\text{H}(U), \text{H}(V)\right) - \mathbb{E}\left\{\text{MI}(U, V)\right\}}. \tag{11}$$

If instead $F(x_1, x_2)$ is the geometric mean, then
$$\text{AMI}(U, V) = \frac{\text{MI}(U, V) - \mathbb{E}\left\{\text{MI}(U, V)\right\}}{\sqrt{\text{H}(U) \cdot \text{H}(V)} - \mathbb{E}\left\{\text{MI}(U, V)\right\}}. \tag{12}$$

And if $F(x_1, x_2)$ is the arithmetic mean, then
$$\text{AMI}(U, V) = \frac{\text{MI}(U, V) - \mathbb{E}\left\{\text{MI}(U, V)\right\}}{\frac{1}{2}\left(\text{H}(U) + \text{H}(V)\right) - \mathbb{E}\left\{\text{MI}(U, V)\right\}}. \tag{13}$$
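Equation (9) involves factorials of potentially large integers, so a direct implementation is best done with log-factorials. Below is a rough Python sketch (helper names are my own; scikit-learn computes the same quantity internally with an optimized routine, and the entropy import is the same one used by the code further below) of Eqs. (9)-(10) together with the arithmetic-mean AMI of Eq. (13):

import numpy as np
from math import lgamma, log
from sklearn.metrics import mutual_info_score
from sklearn.metrics.cluster import contingency_matrix, entropy

def expected_mutual_information(U, V):
    # E{MI(U, V)} of Eq. (9); log-factorials via lgamma keep the large factorial
    # ratio numerically stable (same goal as the f() helper in the MATLAB code below)
    M = contingency_matrix(U, V)
    N = M.sum()
    a = M.sum(axis=1)                   # row sums a_i, Eq. (10)
    b = M.sum(axis=0)                   # column sums b_j, Eq. (10)
    logfact = lambda n: lgamma(n + 1)   # log(n!)
    emi = 0.0
    for ai in a:
        for bj in b:
            # k runs from (a_i + b_j - N)^+ = max(1, a_i + b_j - N) up to min(a_i, b_j)
            for k in range(max(1, ai + bj - N), min(ai, bj) + 1):
                log_ratio = (logfact(ai) + logfact(bj) + logfact(N - ai) + logfact(N - bj)
                             - logfact(N) - logfact(k) - logfact(ai - k)
                             - logfact(bj - k) - logfact(N - ai - bj + k))
                emi += (k / N) * log(N * k / (ai * bj)) * np.exp(log_ratio)
    return emi

def ami_arithmetic(U, V):
    # AMI with the arithmetic mean, Eq. (13)
    mi, emi = mutual_info_score(U, V), expected_mutual_information(U, V)
    return (mi - emi) / (0.5 * (entropy(U) + entropy(V)) - emi)

A = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
B = [1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 1, 1, 3, 3, 3]
print(ami_arithmetic(A, B))
# should agree with adjusted_mutual_info_score(A, B, average_method='arithmetic')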

Implementation

Python's sklearn library already provides functions for all three metrics, which can be called directly;

I did not find a ready-made MATLAB implementation, so I wrote my own.

Python

Here both NMI and AMI use the arithmetic mean, and the logarithm is the natural logarithm (base $e$).

from sklearn.metrics.cluster import entropy, mutual_info_score, normalized_mutual_info_score, adjusted_mutual_info_score

MI = lambda x, y: mutual_info_score(x, y)
NMI = lambda x, y: normalized_mutual_info_score(x, y, average_method='arithmetic')
AMI = lambda x, y: adjusted_mutual_info_score(x, y, average_method='arithmetic')

A = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
B = [1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 1, 1, 3, 3, 3]
#print(entropy(A))
#print(MI(A, B))
print(NMI(A, B))
print(AMI(A, B))

C = [1, 1, 2, 2, 3, 3, 3]
D = [1, 1, 1, 2, 1, 1, 1]
print(NMI(C, D))
print(AMI(C, D))

MATLAB

Again, NMI and AMI both use the arithmetic mean and the natural logarithm (base $e$).

Function file:

NMI_AMI.m

function [NMI, AMI] = NMI_AMI(X, Y)
%NMI_AMI return NMI, AMI
% MI:  mutual information
% H:   entropy
% NMI: normalized mutual information
% AMI: adjusted mutual information
% NMI(X, Y) = MI(X, Y) / F(H(X), H(Y))
% AMI(X, Y) = (MI(X, Y) - EMI(X, Y)) / (F(H(X), H(Y)) - EMI(X, Y))
% F(x, y) can be min, max, the geometric mean, or the arithmetic mean
% here the arithmetic mean is used for both NMI and AMI

NMI = 2 * MI(X, Y) / (H(X) + H(Y));

AMI = (MI(X, Y) - EMI(X, Y)) / (1/2 * (H(X) + H(Y)) - EMI(X, Y));

end



function [res] = MI(X, Y)
%MI mutual information, Eq. (4); the eps inside the log avoids log(0),
% which would otherwise give NaN for empty contingency cells

n = length(X);
X_list = unique(X);
Y_list = unique(Y);
res = 0;
for x = X_list
    for y = Y_list
        loc_x = find(X == x);
        loc_y = find(Y == y);
        loc_xy = intersect(loc_x, loc_y);
        res = res + length(loc_xy) / n * log(length(loc_xy) / n / ((length(loc_x) / n) * (length(loc_y) / n)) + eps);
    end
end

end



function [res] = H(X)
%H information entropy

n = length(X);
X_list = unique(X);
res = 0;

for x = X_list
    loc = find(X == x);
    px = length(loc) / n;
    res = res - px * log(px);
end

end



function [res] = f(a, b)
% f(a, b) computes a! / b!
% a and b can be very large, so computing a! or b! directly may overflow;
% since a - b is usually small, the ratio is accumulated iteratively instead
% a and b should both be non-negative integers

res = 1;
if a > b
    for i = b+1:a
        res = res * i;
    end
elseif a < b
    for i = a+1:b
        res = res / i;
    end
else
    res = 1;
end

end



function [res] = EMI(U, V)
% EMI expected mutual information, E[MI(X, Y)]

N = length(U);

U_list = unique(U);
V_list = unique(V);
R = length(U_list);
C = length(V_list);

M = zeros(R, C);
for i = 1:R
    for j = 1:C
        U_loc = find(U == U_list(i));
        V_loc = find(V == V_list(j));
        M(i, j) = length(intersect(U_loc, V_loc));
    end
end

a = sum(M, 2);   % row sums a_i, Eq. (10)
b = sum(M, 1);   % column sums b_j, Eq. (10)
res = 0;

for i = 1:R
    for j = 1:C
        % nij runs from max(1, a_i + b_j - N) to min(a_i, b_j), Eq. (9)
        for nij = max(a(i) + b(j) - N, 1):min(a(i), b(j))
            res = res + nij / N * log(N * nij / (a(i) * b(j)) + eps) ...
                * f(a(i), a(i) - nij) * f(b(j), b(j) - nij) ...
                * f(N - a(i), N) * f(N - b(j), N - a(i) - b(j) + nij) ...
                / factorial(nij);
        end
    end
end

end

Main script:

clc;
format long

A = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3];
B = [1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 1, 1, 3, 3, 3];
[NMI, AMI] = NMI_AMI(A, B)

C = [1, 1, 2, 2, 3, 3, 3];
D = [1, 1, 1, 2, 1, 1, 1];
[NMI, AMI] = NMI_AMI(C, D)

Comparing the results

Running the Python code prints:

0.36456177185718985

0.26018122538925054

0.28483386264113447

0.056748831755324296

Running the MATLAB code prints:

NMI =

0.364561771857190

AMI =

0.260181225389251

NMI =

0.284833862641135

AMI =

0.056748831755324

Summary and reflections

This post only shows how to compute the three metrics, for ease of direct use; the strengths, weaknesses, and underlying rationale of each metric are not discussed here.

References

  1. https://en.wikipedia.org/wiki/Adjusted_mutual_information

  2. Vinh, Nguyen Xuan; Epps, Julien; Bailey, James (2010). "Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance". The Journal of Machine Learning Research, 11: 2837-2854. http://jmlr.csail.mit.edu/papers/volume11/vinh10a/vinh10a.pdf
