生活随笔
收集整理的这篇文章主要介绍了
社区发现算法之——Louvain
小编觉得挺不错的,现在分享给大家,帮大家做个参考.
1、什么是社区
如果一张图是对一片区域的描述的话,我们将这张图划分为很多个子图。当子图之内满足关联性尽可能大,而子图之间关联性尽可能低时,这样的子图我们可以称之为一个社区。
2、社区发现算法及评价标准
社区发现算法有很多,例如LPA,HANP,SLPA以及我们今天的主人公——Louvain。不同的算法划分社区的效果不尽相同。那么,如何评价这些算法孰优孰劣呢?
用模块度modularity来衡量。模块度定义如下:模块度是评估一个社区网络划分好坏的度量方法,它的物理含义是社区内节点的连边数与随机情况下的边数只差,它的取值范围是 [−1/2,1)。可以简单地理解为社区内部所有边权重和减去与社区相连的边权重和。
Louvain算法
一种基于模块度的图算法模型,与普通的基于模块度和模块度增益不同的是,该算法速度很快,而且对一些点多边少的图,进行聚类效果特别明显。
算法流程:
1、初始时将每个顶点当作一个社区,社区个数与顶点个数相同。
2、依次将每个顶点与之相邻顶点合并在一起,计算它们的模块度增益是否大于0,如果大于0,就将该结点放入该相邻结点所在社区。
3、迭代第二步,直至算法稳定,即所有顶点所属社区不再变化。
4、将各个社区所有节点压缩成为一个结点,社区内点的权重转化为新结点环的权重,社区间权重转化为新结点边的权重。
5、重复步骤1-3,直至算法稳定。
import collections
import random
def load_graph(path
):G
= collections
.defaultdict
(dict)with open(path
) as text
:for line
in text
:vertices
= line
.strip
().split
()v_i
= int(vertices
[0])v_j
= int(vertices
[1])w
= float(vertices
[2])G
[v_i
][v_j
] = wG
[v_j
][v_i
] = w
return G
class Vertex():def __init__(self
, vid
, cid
, nodes
, k_in
=0):self
._vid
= vidself
._cid
= cidself
._nodes
= nodesself
._kin
= k_in
class Louvain():def __init__(self
, G
):self
._G
= Gself
._m
= 0 self
._cid_vertices
= {} self
._vid_vertex
= {} for vid
in self
._G
.keys
():self
._cid_vertices
[vid
] = set([vid
])self
._vid_vertex
[vid
] = Vertex
(vid
, vid
, set([vid
]))self
._m
+= sum([1 for neighbor
in self
._G
[vid
].keys
() if neighbor
> vid
])def first_stage(self
):mod_inc
= False visit_sequence
= self
._G
.keys
()random
.shuffle
(list(visit_sequence
))while True:can_stop
= True for v_vid
in visit_sequence
:v_cid
= self
._vid_vertex
[v_vid
]._cidk_v
= sum(self
._G
[v_vid
].values
()) + self
._vid_vertex
[v_vid
]._kincid_Q
= {}for w_vid
in self
._G
[v_vid
].keys
():w_cid
= self
._vid_vertex
[w_vid
]._cid
if w_cid
in cid_Q
:continueelse:tot
= sum([sum(self
._G
[k
].values
()) + self
._vid_vertex
[k
]._kin
for k
in self
._cid_vertices
[w_cid
]])if w_cid
== v_cid
:tot
-= k_vk_v_in
= sum([v
for k
, v
in self
._G
[v_vid
].items
() if k
in self
._cid_vertices
[w_cid
]])delta_Q
= k_v_in
- k_v
* tot
/ self
._m cid_Q
[w_cid
] = delta_Qcid
, max_delta_Q
= sorted(cid_Q
.items
(), key
=lambda item
: item
[1], reverse
=True)[0]if max_delta_Q
> 0.0 and cid
!= v_cid
:self
._vid_vertex
[v_vid
]._cid
= cidself
._cid_vertices
[cid
].add
(v_vid
)self
._cid_vertices
[v_cid
].remove
(v_vid
)can_stop
= Falsemod_inc
= Trueif can_stop
:breakreturn mod_inc
def second_stage(self
):cid_vertices
= {}vid_vertex
= {}for cid
, vertices
in self
._cid_vertices
.items
():if len(vertices
) == 0:continuenew_vertex
= Vertex
(cid
, cid
, set())for vid
in vertices
:new_vertex
._nodes
.update
(self
._vid_vertex
[vid
]._nodes
)new_vertex
._kin
+= self
._vid_vertex
[vid
]._kin
for k
, v
in self
._G
[vid
].items
():if k
in vertices
:new_vertex
._kin
+= v
/ 2.0cid_vertices
[cid
] = set([cid
])vid_vertex
[cid
] = new_vertexG
= collections
.defaultdict
(dict)for cid1
, vertices1
in self
._cid_vertices
.items
():if len(vertices1
) == 0:continuefor cid2
, vertices2
in self
._cid_vertices
.items
():if cid2
<= cid1
or len(vertices2
) == 0:continueedge_weight
= 0.0for vid
in vertices1
:for k
, v
in self
._G
[vid
].items
():if k
in vertices2
:edge_weight
+= v
if edge_weight
!= 0:G
[cid1
][cid2
] = edge_weightG
[cid2
][cid1
] = edge_weightself
._cid_vertices
= cid_verticesself
._vid_vertex
= vid_vertexself
._G
= G
def get_communities(self
):communities
= []for vertices
in self
._cid_vertices
.values
():if len(vertices
) != 0:c
= set()for vid
in vertices
:c
.update
(self
._vid_vertex
[vid
]._nodes
)communities
.append
(c
)return communities
def execute(self
):iter_time
= 1while True:iter_time
+= 1mod_inc
= self
.first_stage
()if mod_inc
:self
.second_stage
()else:breakreturn self
.get_communities
()if __name__
== '__main__':G
= load_graph
(r
'C:\\Users\\程勇\\Desktop\\similarity.txt')algorithm
= Louvain
(G
)communities
= algorithm
.execute
()communities
= sorted(communities
, key
=lambda b
: -len(b
)) count
= 0for communitie
in communities
:count
+= 1print("社区", count
, " ", communitie
)
测试用例文件如图:
这是部分测试用例的截图,一行的前两个数是顶点编号,第三个数是权重。按照每个社区大小顺序从大到小打印:
\quad需要测试文件的话在评论区留下你的邮箱哦,求关注~
总结
以上是生活随笔为你收集整理的社区发现算法之——Louvain的全部内容,希望文章能够帮你解决所遇到的问题。
如果觉得生活随笔网站内容还不错,欢迎将生活随笔推荐给好友。