node2vec: Scalable Feature Learning for Networks


Historical significance of node2vec:

  • One of the more highly cited papers in the area
  • Like DeepWalk, it is representative early work on network representation learning, and it later became a classic baseline
  • It inspired a large body of follow-up work on random-walk-based network representation learning

Evolution of the field: graph learning (manual feature extraction -> feature selection -> classifier; based on feature engineering) ---------- DeepWalk, node2vec ---------- deep learning (feature learning and classification combined in one model; based on neural networks)

 

The paper is structured as follows:

I. Abstract

Introduces the background and proposes the node2vec model with its flexible, tunable search strategy:

1. Highlights the shortcomings of earlier feature-engineering-based work, motivating node2vec and its ability to explore diverse neighborhoods

2. Proposes a tunable search strategy based on a biased random walk, generating node sequences that carry different semantics

3. Discusses the algorithm's efficiency and robustness, demonstrating the model's properties through case studies and extensive experiments

4. With the above, node2vec achieves strong results on network datasets from multiple domains

II. Introduction

Explains the importance of graphs and contrasts the approach with previous methods

III. Related Work

Reviews traditional algorithms based on hand-crafted graph features, dimensionality-reduction methods, etc.

IV. Feature Learning

Covers basic graph concepts, the skip-gram algorithm, and the optimization objective

Model detail 1: the optimization objective, analogous to skip-gram

Independence assumption (context nodes are conditionally independent given the source embedding), negative sampling, SGD as the optimization method
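Under this conditional-independence assumption, the objective from the paper factorizes (a sketch of the formulas summarized here):

```latex
\max_f \sum_{u \in V} \log \Pr\big(N_S(u) \mid f(u)\big),
\qquad
\Pr\big(N_S(u) \mid f(u)\big) = \prod_{n_i \in N_S(u)} \Pr\big(n_i \mid f(u)\big),
\qquad
\Pr\big(n_i \mid f(u)\big) = \frac{\exp\big(f(n_i) \cdot f(u)\big)}{\sum_{v \in V} \exp\big(f(v) \cdot f(u)\big)}.
```

The partition function in the softmax is what negative sampling approximates, and the whole objective is optimized with SGD.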

Core idea: generate the neighborhood N_S(u) through a random-walk strategy

V. Search Strategies

Introduces the BFS and DFS search strategies

Model detail 2: BFS and DFS

""" BFS算法 数据结构:queue (队列) FIFO 先进先出 BFS具有结构相似性 (structural equivalence)"""from collections import dequedef iter_bfs(G,s,S=None):S,Q = set(),deque()Q.append(s)while Q:u = Q.popleft()if u in S: continueS.add(u)Q.extend(G[u])yield u""" DFS算法 数据结构: stack(栈) 先进先出 DFS具有同质相似性、社群相似性(homophily)"""def iter_dfs(G,s):S,Q = set(),[]Q.append(s)while Q:u = Q.pop()if u in S: continueS.add(u)Q.extend(G[u])yield u

 

     Model detail 3: Biased Random Walk (2nd order)

 

d_tx is the shortest-path distance between the previous node t and the candidate node x, with d_tx ∈ {0, 1, 2}; p and q control how quickly the walk leaves the neighborhood of the source node v.

Understanding the hyperparameters:

p:

Large p: the walk tends not to backtrack, reducing 2-hop redundancy

Small p: the walk tends to backtrack, so the sampled sequences stay close to the starting node

q:

q > 1: BFS-like behaviour, local view

q < 1: DFS-like behaviour, exploring farther from the source
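The piecewise bias described above can be written down directly (a minimal sketch; `alpha_pq` is an illustrative name, not taken from the paper's reference code):

```python
def alpha_pq(d_tx, p, q):
    """Search bias alpha_pq(t, x) from the paper, where d_tx is the
    shortest-path distance between the previous node t and candidate x."""
    if d_tx == 0:       # x is t itself: backtracking step, weighted 1/p
        return 1.0 / p
    elif d_tx == 1:     # x is also a neighbour of t: stay local, weight 1
        return 1.0
    else:               # d_tx == 2: move away from t, weighted 1/q
        return 1.0 / q
```

Multiplying these weights by the edge weights and normalizing over the neighbours of the current node gives the second-order transition distribution.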

VI. node2vec

Presents the biased random-walk algorithm, how to choose the parameters p and q, a time-complexity analysis, and alias sampling

Model detail 4: the node2vec algorithm

 

Time complexity: O(m)

Model detail 5: alias sampling

Sampling trick:

p = [0.3, 0.2, 0.1, 0.4] → cumulative sums sump = [0.3, 0.5, 0.6, 1]

O(n): linear search

O(log n): binary search

O(1): alias sampling

A good write-up on alias sampling: https://blog.csdn.net/haolexiao/article/details/65157026
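For contrast with alias sampling, the O(log n) cumulative-sum variant from the example above can be sketched with `bisect` (a toy illustration using the distribution given above):

```python
import bisect
import itertools
import random

p = [0.3, 0.2, 0.1, 0.4]
sump = list(itertools.accumulate(p))   # prefix sums, ending at 1.0

def sample(prefix_sums):
    """Draw an index i with probability p[i] by binary search: O(log n) per draw."""
    r = random.random()                # uniform in [0, 1)
    return bisect.bisect_right(prefix_sums, r)

counts = [0] * len(p)
for _ in range(100_000):
    counts[sample(sump)] += 1
# The empirical frequencies counts[i] / 100000 should be close to p[i];
# alias sampling gives the same result with two O(1) table lookups per draw.
```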

VII. Effectiveness

Experiments probing the model's effectiveness: case study, baselines and parameter settings, node classification, and link prediction

 

VIII. Conclusion

  Key points:

  • the word2vec training framework
  • training sequences produced by random walks
  • performance: alias sampling
  • the experimental setup

  Takeaways:

  • understanding the graph matters for network representation learning
  • the random-walk approach inspired a large amount of follow-up work

    IX. Code

main.py:

```python
# -*- coding: utf-8 -*-
# @Time        : 2020/12/14 8:22 PM
# @Author      : TaoWang
# @Description : entry point

import argparse
import networkx as nx
from gensim.models import Word2Vec
from model import Graph


def parse_args():
    """Parse command-line arguments."""
    parser = argparse.ArgumentParser(description="Run node2vec.")
    parser.add_argument('--input', nargs='?', default='graph/karate.edgelist', help='Input graph path.')
    parser.add_argument('--output', nargs='?', default='emb/karate.emb', help='Embedding path')
    parser.add_argument('--dimensions', type=int, default=128, help='Number of dimensions. Default is 128.')
    parser.add_argument('--walk-length', type=int, default=80, help='Length of walk per source. Default is 80.')
    parser.add_argument('--num-walks', type=int, default=10, help='Number of walks per source. Default is 10.')
    parser.add_argument('--window-size', type=int, default=10, help='Context size for optimization. Default is 10.')
    parser.add_argument('--iter', type=int, default=1, help='Number of epochs in SGD.')
    parser.add_argument('--workers', type=int, default=8, help='Number of parallel workers. Default is 8.')
    parser.add_argument('--p', type=float, default=1, help='Return hyperparameter. Default is 1.')
    parser.add_argument('--q', type=float, default=1, help='Inout hyperparameter. Default is 1.')
    parser.add_argument('--weighted', dest='weighted', action='store_true', help='Default is unweighted.')
    parser.add_argument('--unweighted', dest='unweighted', action='store_false')
    parser.set_defaults(weighted=False)
    parser.add_argument('--directed', dest='directed', action='store_true', help='Default is undirected.')
    parser.add_argument('--undirected', dest='undirected', action='store_false')
    parser.set_defaults(directed=False)
    return parser.parse_args()


def read_graph():
    """Read the edge list into a networkx graph."""
    if args.weighted:
        # weighted graph
        G = nx.read_edgelist(args.input, nodetype=int, data=(('weight', float),), create_using=nx.DiGraph())
    else:
        # unweighted graph: give every edge weight 1
        G = nx.read_edgelist(args.input, nodetype=int, create_using=nx.DiGraph())
        for edge in G.edges():
            G[edge[0]][edge[1]]['weight'] = 1
    # undirected case: collapse a->b and b->a into a single edge
    if not args.directed:
        G = G.to_undirected()
    return G


def learning_embedding(walks):
    """Feed the walks to word2vec and save the embeddings."""
    # convert node ids from int to string
    walks = [list(map(str, walk)) for walk in walks]
    # run word2vec from gensim (gensim 3.x parameter names: size, iter)
    model = Word2Vec(walks, size=args.dimensions, window=args.window_size,
                     min_count=0, sg=1, workers=args.workers, iter=args.iter)
    # save the embedding
    model.wv.save_word2vec_format(args.output)


def main(args):
    nx_G = read_graph()
    G = Graph(nx_G, args.directed, args.p, args.q)
    G.preprocess_transition_probs()
    walks = G.simulate_walk(args.num_walks, args.walk_length)
    learning_embedding(walks)


if __name__ == "__main__":
    args = parse_args()
    main(args)
```

model.py:

```python
# -*- coding: utf-8 -*-
# @Time        : 2020/12/14 8:49 PM
# @Author      : TaoWang
# @Description : the node2vec algorithm

import numpy as np
import random


class Graph():

    def __init__(self, nx_G, is_directed, p, q):
        self.G = nx_G
        self.is_directed = is_directed
        self.p = p
        self.q = q

    def node2vec_walk(self, walk_length, start_node):
        """Simulate one biased random walk starting from start_node."""
        G = self.G
        # alias tables built during preprocessing allow O(1) sampling here
        alias_nodes = self.alias_nodes
        alias_edges = self.alias_edges
        walk = [start_node]
        # extend until the walk contains walk_length nodes
        while len(walk) < walk_length:
            cur = walk[-1]
            # sort neighbours so indices match the order used when building the alias tables
            cur_nbrs = sorted(G.neighbors(cur))
            if len(cur_nbrs) > 0:
                if len(walk) == 1:
                    # first step: sample from the (1st-order) node alias table
                    walk.append(cur_nbrs[alias_draw(alias_nodes[cur][0], alias_nodes[cur][1])])
                else:
                    # later steps condition on the previous node (node t in the paper)
                    prev = walk[-2]
                    next = cur_nbrs[alias_draw(alias_edges[(prev, cur)][0],
                                               alias_edges[(prev, cur)][1])]
                    walk.append(next)
            else:
                break
        return walk

    def simulate_walk(self, num_walks, walk_length):
        """Generate node sequences with biased random walks."""
        G = self.G
        walks = []
        nodes = list(G.nodes())
        print("Walk iteration:")
        for walk_iter in range(num_walks):
            # traverse all nodes num_walks times
            print(str(walk_iter + 1), '/', str(num_walks))
            random.shuffle(nodes)
            for node in nodes:
                # one biased random walk per (iteration, start node) pair
                walks.append(self.node2vec_walk(walk_length=walk_length, start_node=node))
        return walks

    def get_alias_edge(self, src, dst):
        """Alias table for transitions out of dst, given that the walk came from src.

        Section 3.2.2 of the paper:
            alpha_pq(t, x) = 1/p  if d_tx = 0
                           = 1    if d_tx = 1
                           = 1/q  if d_tx = 2
        """
        G, p, q = self.G, self.p, self.q
        unnormalized_probs = []
        for dst_nbr in sorted(G.neighbors(dst)):
            if dst_nbr == src:                 # backtrack to the previous node
                unnormalized_probs.append(G[dst][dst_nbr]['weight'] / p)
            elif G.has_edge(dst_nbr, src):     # common neighbour of src and dst
                unnormalized_probs.append(G[dst][dst_nbr]['weight'])
            else:                              # move outward
                unnormalized_probs.append(G[dst][dst_nbr]['weight'] / q)
        # normalize
        norm_const = sum(unnormalized_probs)
        normalized_probs = [float(u_prob) / norm_const for u_prob in unnormalized_probs]
        return alias_setup(normalized_probs)

    def preprocess_transition_probs(self):
        """Build the alias tables for nodes and edges."""
        G = self.G
        is_directed = self.is_directed
        # per-node alias sampling over the normalized edge weights
        alias_nodes = {}
        for node in G.nodes():
            unnormalized_probs = [G[node][nbr]['weight'] for nbr in sorted(G.neighbors(node))]
            norm_const = sum(unnormalized_probs)
            normalized_probs = [float(u_prob) / norm_const for u_prob in unnormalized_probs]
            alias_nodes[node] = alias_setup(normalized_probs)
        # per-edge alias sampling over the biased (2nd-order) transition probabilities
        alias_edges = {}
        if is_directed:
            for edge in G.edges():
                alias_edges[edge] = self.get_alias_edge(edge[0], edge[1])
        else:
            for edge in G.edges():
                alias_edges[edge] = self.get_alias_edge(edge[0], edge[1])
                alias_edges[(edge[1], edge[0])] = self.get_alias_edge(edge[1], edge[0])
        self.alias_nodes = alias_nodes
        self.alias_edges = alias_edges
        return


def alias_setup(probs):
    """Build the alias table (naive alias method).

    Initialization:
      1. Multiply each probability p_i by n.
      2. Create arrays Alias and Prob, each of size n.
      3. While there is a probability p_l <= 1 and a probability p_g >= 1 (l != g):
         set Prob[l] = p_l and Alias[l] = g, remove p_l from the list, and
         set p_g = p_g - (1 - p_l).
      4. The last remaining probability must have weight 1; set its Prob entry to 1.

    Generation:
      1. Roll a fair n-sided die; call the side i.
      2. Flip a biased coin that comes up heads with probability Prob[i].
      3. If the coin comes up heads, return i; otherwise return Alias[i].
    """
    K = len(probs)
    q = np.zeros(K)               # probability table
    J = np.zeros(K, dtype=int)    # alias table
    smaller, larger = [], []
    # split the scaled probabilities into those below 1 and those at or above 1
    for kk, prob in enumerate(probs):
        q[kk] = K * prob
        if q[kk] < 1.0:
            smaller.append(kk)
        else:
            larger.append(kk)
    # greedily top up each below-1 cell from an above-1 cell
    while len(smaller) > 0 and len(larger) > 0:
        small = smaller.pop()
        large = larger.pop()
        J[small] = large
        # update the donor's remaining probability mass
        q[large] = q[large] + q[small] - 1.0
        if q[large] < 1.0:
            smaller.append(large)
        else:
            larger.append(large)
    return J, q


def alias_draw(J, q):
    """Draw one sample from the alias table in O(1)."""
    K = len(J)
    kk = int(np.floor(np.random.rand() * K))
    # keep the rolled cell itself...
    if np.random.rand() < q[kk]:
        return kk
    # ...or take its alias
    else:
        return J[kk]
```

     
