Weight Initialization of GRU and LSTM in TensorFlow


GRU and LSTM Weight Initialization

When writing a model, you sometimes want the RNN's weight matrices initialized in a particular way, for example with Xavier or orthogonal initialization. In that case, all you need is:

    cell = LSTMCell if self.args.use_lstm else GRUCell
    # Note: tf.variable_scope requires a scope name; "birnn" is added here,
    # as the original snippet omitted it.
    with tf.variable_scope("birnn", initializer=tf.orthogonal_initializer()):
        input = tf.nn.embedding_lookup(embedding, questions_bt)
        cell_fw = MultiRNNCell(cells=[cell(hidden_size) for _ in range(num_layers)])
        cell_bw = MultiRNNCell(cells=[cell(hidden_size) for _ in range(num_layers)])
        outputs, last_states = tf.nn.bidirectional_dynamic_rnn(
            cell_bw=cell_bw,
            cell_fw=cell_fw,
            dtype="float32",
            inputs=input,
            swap_memory=True)

But does this actually initialize the weights as intended? Let's follow the code of bidirectional_dynamic_rnn, looking only at the forward direction for now:

    with vs.variable_scope("fw") as fw_scope:
        output_fw, output_state_fw = dynamic_rnn(
            cell=cell_fw, inputs=inputs, sequence_length=sequence_length,
            initial_state=initial_state_fw, dtype=dtype,
            parallel_iterations=parallel_iterations, swap_memory=swap_memory,
            time_major=time_major, scope=fw_scope)

We see that it adds a variable_scope named fw_scope. Reading further into dynamic_rnn, this scope is only used for cache management, while dynamic_rnn actually calls the following:

    (outputs, final_state) = _dynamic_rnn_loop(
        cell,
        inputs,
        state,
        parallel_iterations=parallel_iterations,
        swap_memory=swap_memory,
        sequence_length=sequence_length,
        dtype=dtype)

In short, after a chain of calls, everything ends up at a single statement:

    call_cell = lambda: cell(input_t, state)

So in the end, the __call__() method of GRUCell or LSTMCell is invoked. Following it in, GRU's __call__() looks like this:

    def __call__(self, inputs, state, scope=None):
        """Gated recurrent unit (GRU) with nunits cells."""
        with _checked_scope(self, scope or "gru_cell", reuse=self._reuse):
            with vs.variable_scope("gates"):  # Reset gate and update gate.
                # We start with bias of 1.0 to not reset and not update.
                value = sigmoid(_linear(
                    [inputs, state], 2 * self._num_units, True, 1.0))
                r, u = array_ops.split(
                    value=value,
                    num_or_size_splits=2,
                    axis=1)
            with vs.variable_scope("candidate"):
                c = self._activation(_linear([inputs, r * state],
                                             self._num_units, True))
            new_h = u * state + (1 - u) * c
        return new_h, new_h
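For reference, this code computes the standard GRU update. In the notation below (mine, not from the source), [x_t, h_{t-1}] denotes concatenation and \odot elementwise multiplication; the gate biases b_r, b_u start at 1.0, as the comment in the code notes, and self._activation defaults to tanh:

    \begin{aligned}
    r &= \sigma(W_r\,[x_t, h_{t-1}] + b_r) \\
    u &= \sigma(W_u\,[x_t, h_{t-1}] + b_u) \\
    c &= \tanh(W_c\,[x_t, r \odot h_{t-1}] + b_c) \\
    h_t &= u \odot h_{t-1} + (1 - u) \odot c
    \end{aligned}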

Wait, where are the weights and biases? They don't appear in __init__() either. Notice the call to _linear(): all of the weights actually live inside that function (the same is true for LSTMCell), and that is where the trick lies:

    with vs.variable_scope(scope) as outer_scope:
        weights = vs.get_variable(
            _WEIGHTS_VARIABLE_NAME, [total_arg_size, output_size], dtype=dtype)
        # ... some code
        with vs.variable_scope(outer_scope) as inner_scope:
            inner_scope.set_partitioner(None)
            biases = vs.get_variable(
                _BIAS_VARIABLE_NAME, [output_size],
                dtype=dtype,
                initializer=init_ops.constant_initializer(bias_start, dtype=dtype))

So inside this function, yet another variable_scope is opened, and get_variable() is called to create the weights and biases. The question, then, is whether the initializer we specified still takes effect once several variable_scopes are nested inside ours. Let's run an experiment:
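A minimal sketch of such an experiment, assuming TensorFlow 1.x (the scope and variable names here are illustrative):

    import tensorflow as tf

    # The outer scope sets an initializer; the inner scope does not override it.
    with tf.variable_scope("outer", initializer=tf.orthogonal_initializer()):
        with tf.variable_scope("inner"):
            w = tf.get_variable("w", shape=[4, 4])

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        w_val = sess.run(w)
        # If the orthogonal initializer was inherited, w.T @ w is close to I.
        print(w_val.T.dot(w_val))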

Our test shows that with nested variable_scopes, if an inner scope does not specify its own initializer, the outer scope's initializer applies. So the conclusion is clear:

  • For the TensorFlow 1.1.0 implementations of these two RNN variants, it is enough to call them inside a variable_scope that specifies an initializer; their weights will then be initialized that way;

  • However, neither LSTMCell nor GRUCell provides a way to choose the bias initializer (although the bias's initial value can apparently be set; see the sketch below).
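A hedged way to check both points at once, assuming TF 1.x with tf.contrib.rnn.GRUCell (the scope name "demo" is arbitrary, and the expected variable names follow the TF 1.1 source shown above, where _WEIGHTS_VARIABLE_NAME is "weights" and _BIAS_VARIABLE_NAME is "biases"):

    import tensorflow as tf

    with tf.variable_scope("demo", initializer=tf.orthogonal_initializer()):
        cell = tf.contrib.rnn.GRUCell(8)
        inputs = tf.placeholder(tf.float32, [None, 5, 3])
        outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for v in tf.global_variables():
            # Expect the "weights" matrices to look orthogonal, the
            # "gates/.../biases" to start at 1.0, and the
            # "candidate/.../biases" to start at 0.0, matching the
            # bias_start values passed to _linear().
            print(v.name, sess.run(v).flatten()[:4])

In other words, the 1.0 and 0.0 bias values come from the bias_start constants hardcoded in the cell's calls to _linear(), not from the scope's initializer.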
Original article: http://cairohy.github.io/2017/05/05/ml-coding-summarize/Tensorflow%E4%B8%ADGRU%E5%92%8CLSTM%E7%9A%84%E6%9D%83%E9%87%8D%E5%88%9D%E5%A7%8B%E5%8C%96/
