Recurrent Neural Networks: The LSTM Model


The LSTM Model

1. Overview:

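LSTM (Long Short-Term Memory) is a special kind of recurrent neural network (RNN) that can learn and remember long-term dependencies. It introduces gating mechanisms to mitigate the vanishing-gradient problem that traditional RNNs run into on long sequences. The gates (the forget gate, the input gate, and the output gate) control how information flows into, through, and out of the cell state, which lets the network retain information over many time steps.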

2. Gates:

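A gate is a way of letting information through selectively. It consists of a sigmoid neural network layer followed by a pointwise (element-wise) multiplication. The sigmoid outputs values between 0 and 1 that say how much of each component to let through: 0 means "let nothing through" and 1 means "let everything through".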
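As a minimal illustration of the idea (a NumPy sketch; the pre-activation values below are hypothetical, not taken from a trained network), a gate squashes its linear layer's output with a sigmoid and then multiplies the incoming vector element-wise:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical outputs of the gate's linear layer for three components
pre_activation = np.array([-10.0, 0.0, 10.0])
gate = sigmoid(pre_activation)        # values in (0, 1): near 0, exactly 0.5, near 1
signal = np.array([2.0, 2.0, 2.0])    # the information the gate is filtering
gated = gate * signal                 # pointwise multiplication
print(gated)                          # first component nearly blocked, last nearly passed
```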

1. Forget Gate

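The forget gate decides which information should be discarded from, or kept in, the cell state. It is computed as:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$

where $\sigma$ is the sigmoid activation function, $W_f$ is the forget gate's weight matrix, $h_{t-1}$ is the hidden state from the previous time step, $x_t$ is the input at the current time step, $[h_{t-1}, x_t]$ denotes their concatenation, and $b_f$ is the bias term.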
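The formula translates directly into code. The sketch below uses NumPy with hypothetical sizes and random weights; initializing $b_f$ to positive values (here 1) so that the gate starts out close to "keep" is a common heuristic, not something the formula requires.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
hidden_size, input_size = 4, 3                 # hypothetical sizes

# W_f acts on the concatenation [h_{t-1}, x_t], so it has hidden+input columns
W_f = rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.1
b_f = np.ones(hidden_size)                     # positive forget bias (heuristic)

h_prev = rng.standard_normal(hidden_size)      # h_{t-1}
x_t = rng.standard_normal(input_size)          # x_t

# f_t = sigma(W_f . [h_{t-1}, x_t] + b_f)
f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)
print(f_t)  # one value in (0, 1) per cell-state component
```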

2. Input Gate

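The input gate has two parts: a sigmoid layer that decides which values will be updated, and a tanh layer that creates a vector of new candidate values that may be added to the state. They are computed as:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$

where $i_t$ is the output of the input gate and $\tilde{C}_t$ is the vector of candidate values.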
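A matching NumPy sketch for the two parts of the input gate (sizes and weights are again hypothetical). Because $i_t \in (0, 1)$ and $\tilde{C}_t \in (-1, 1)$, their product, which is what gets written into the cell state, always lies in $(-1, 1)$:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(1)
hidden_size, input_size = 4, 3                 # hypothetical sizes
z = np.concatenate([rng.standard_normal(hidden_size),   # h_{t-1}
                    rng.standard_normal(input_size)])   # x_t

W_i = rng.standard_normal((hidden_size, hidden_size + input_size))
b_i = np.zeros(hidden_size)
W_C = rng.standard_normal((hidden_size, hidden_size + input_size))
b_C = np.zeros(hidden_size)

i_t = sigmoid(W_i @ z + b_i)          # which components to update, in (0, 1)
C_tilde = np.tanh(W_C @ z + b_C)      # candidate values, in (-1, 1)
update = i_t * C_tilde                # the amount added to the cell state
print(i_t, C_tilde, update)
```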

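3. Cell State Update

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

where $C_t$ is the cell state at the current time step, $C_{t-1}$ is the cell state at the previous time step, and $\odot$ denotes element-wise multiplication.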
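A small worked example with hand-picked (hypothetical) gate values shows the three typical behaviors of each component: keep the old memory, overwrite it, or blend the two.

```python
import numpy as np

C_prev = np.array([3.0, 3.0, 3.0])     # C_{t-1}
f_t = np.array([1.0, 0.0, 0.5])        # forget gate
i_t = np.array([0.0, 1.0, 0.5])        # input gate
C_tilde = np.array([0.8, -0.5, 1.0])   # candidate values

# C_t = f_t * C_{t-1} + i_t * C~_t (element-wise)
C_t = f_t * C_prev + i_t * C_tilde
print(C_t.tolist())  # [3.0, -0.5, 2.0] -> kept, overwritten, blended
```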

4. Output Gate

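The output gate determines the hidden state, which carries information about the sequence observed so far. It is computed as:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$
$$h_t = o_t \odot \tanh(C_t)$$

where $o_t$ is the output of the output gate and $h_t$ is the hidden state at the current time step.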
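Using a hypothetical cell state $C_t = [3, -0.5, 2]$ and idealized gate values, this sketch shows how the output gate filters the squashed cell state. Since $|\tanh(C_t)| < 1$ and the gate only scales it down, every component of $h_t$ stays strictly between -1 and 1.

```python
import numpy as np

C_t = np.array([3.0, -0.5, 2.0])   # hypothetical current cell state
# Idealized output gate: expose, expose, hide
# (a real sigmoid only approaches 1 and 0)
o_t = np.array([1.0, 1.0, 0.0])

# h_t = o_t * tanh(C_t) (element-wise)
h_t = o_t * np.tanh(C_t)
print(h_t)  # roughly [0.995, -0.462, 0.0]
```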

3. Code Implementation

A simple hand-written LSTM:

import torch
import torch.nn as nn
class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        concat_size = input_size + hidden_size
        # Register weights and biases as nn.Parameter so that they are trainable
        # Every gate acts on the concatenation [x_t, h_{t-1}]
        # Small random weights and zero biases (an initialization choice, not a requirement)
        self.w_f = nn.Parameter(torch.randn(hidden_size, concat_size) * 0.1)
        self.b_f = nn.Parameter(torch.zeros(hidden_size))
        self.w_i = nn.Parameter(torch.randn(hidden_size, concat_size) * 0.1)
        self.b_i = nn.Parameter(torch.zeros(hidden_size))
        self.w_c = nn.Parameter(torch.randn(hidden_size, concat_size) * 0.1)
        self.b_c = nn.Parameter(torch.zeros(hidden_size))
        self.w_o = nn.Parameter(torch.randn(hidden_size, concat_size) * 0.1)
        self.b_o = nn.Parameter(torch.zeros(hidden_size))
        # Output layer
        self.w_y = nn.Parameter(torch.randn(output_size, hidden_size) * 0.1)
        self.b_y = nn.Parameter(torch.zeros(output_size))
    def forward(self, x):
        # x: (seq_len, input_size), a single unbatched sequence
        # Initialize the hidden state and the cell state to zeros
        h_t = torch.zeros(self.hidden_size)
        c_t = torch.zeros(self.hidden_size)
        h_states = []
        c_states = []
        for t in range(x.size(0)):
            # Concatenate the current input with the previous hidden state
            z_t = torch.cat([x[t], h_t])
            # Forget gate
            f_t = torch.sigmoid(self.w_f @ z_t + self.b_f)
            # Input gate
            i_t = torch.sigmoid(self.w_i @ z_t + self.b_i)
            # Candidate cell state
            c_hat_t = torch.tanh(self.w_c @ z_t + self.b_c)
            # Update the cell state
            c_t = f_t * c_t + i_t * c_hat_t
            # Output gate
            o_t = torch.sigmoid(self.w_o @ z_t + self.b_o)
            # Update the hidden state
            h_t = o_t * torch.tanh(c_t)
            # Save the hidden state and cell state of every time step
            h_states.append(h_t)
            c_states.append(c_t)
        # Many-to-one: map the last hidden state to class probabilities
        y_t = self.w_y @ h_t + self.b_y
        output = torch.softmax(y_t, dim=0)
        return torch.stack(h_states), torch.stack(c_states), output
x = torch.randn(3, 2)  # one sequence: 3 time steps, 2 features each
hidden_size = 5
lstm = LSTM(input_size=2, hidden_size=hidden_size, output_size=6)
hidden_states, cell_states, output = lstm(x)  # call the module rather than .forward()
print(hidden_states, cell_states, output)
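One way to sanity-check a hand-written LSTM is to reproduce `torch.nn.LSTM` with the same equations. The sketch below relies on PyTorch's documented parameter layout: `weight_ih_l0` and `weight_hh_l0` stack the four gates in the order input, forget, cell candidate, output, and PyTorch keeps two bias vectors (`bias_ih_l0`, `bias_hh_l0`) whose sum plays the role of a single $b$. It also applies the input and hidden weights separately instead of to the concatenation $[h_{t-1}, x_t]$, which is mathematically equivalent.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size, seq_len = 2, 5, 3
ref = nn.LSTM(input_size, hidden_size)        # default layout: (seq_len, batch, features)
x = torch.randn(seq_len, 1, input_size)       # one sequence, batch size 1

with torch.no_grad():
    out_ref, (h_ref, c_ref) = ref(x)

    # Rows of the stacked weights are ordered (input, forget, cell candidate, output)
    W_ih, W_hh = ref.weight_ih_l0, ref.weight_hh_l0
    b = ref.bias_ih_l0 + ref.bias_hh_l0       # two biases in PyTorch; only their sum matters
    h = torch.zeros(hidden_size)
    c = torch.zeros(hidden_size)
    outs = []
    for t in range(seq_len):
        gates = W_ih @ x[t, 0] + W_hh @ h + b
        i, f, g, o = gates.chunk(4)           # split into the four gate pre-activations
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)                     # candidate cell state C~_t
        c = f * c + i * g                     # cell state update
        h = o * torch.tanh(c)                 # hidden state
        outs.append(h)
    out_manual = torch.stack(outs)            # (seq_len, hidden_size)

print(torch.allclose(out_manual, out_ref[:, 0, :], atol=1e-5))
print(torch.allclose(c, c_ref[0, 0], atol=1e-5))
```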

Many-to-one (simple example):

import torch
import torch.nn as nn
class ManyToOneLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
    def forward(self, x):
        # Initialize the hidden and cell states: (num_layers, batch, hidden_size)
        h0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        # Run the LSTM over the whole sequence
        out, _ = self.lstm(x, (h0, c0))
        # Keep only the output of the last time step
        out = out[:, -1, :]
        # Map it to the final output with a fully connected layer
        output = self.fc(out)
        return output
# Example usage
input_size = 10
hidden_size = 20
output_size = 2
model = ManyToOneLSTM(input_size, hidden_size, output_size)
x = torch.randn(4, 7, input_size)  # (batch_size, sequence_length, input_size)
output = model(x)
print(output.shape)  # torch.Size([4, 2]), i.e. (batch_size, output_size)
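Taking `out[:, -1, :]` works when every sequence in the batch has the same length; for a single-layer, unidirectional LSTM it equals the final hidden state `h_n[-1]`. When a batch is padded to a common length, however, the last position of a short sequence is padding. A common remedy, sketched below with hypothetical lengths, is `torch.nn.utils.rnn.pack_padded_sequence`, after which `h_n` holds each sequence's state at its true last step (with `enforce_sorted=False`, PyTorch returns `h_n` in the original batch order).

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)
lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
x = torch.randn(4, 7, 10)        # padded batch: 4 sequences, max length 7
lengths = [7, 5, 3, 2]           # hypothetical true lengths; the rest is padding

with torch.no_grad():
    # Equal-length view: the last output step equals the final hidden state
    out, (h_n, _) = lstm(x)
    same = torch.allclose(out[:, -1, :], h_n[-1])

    # Variable-length view: pack so each sequence stops at its real end
    packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
    _, (h_last, _) = lstm(packed)
    last = h_last[-1]            # (4, 20): state at each sequence's true last step

    # Reference: run each sequence on its own, without padding
    manual = torch.stack([lstm(x[b:b + 1, :n])[1][0][-1, 0]
                          for b, n in enumerate(lengths)])

print(same, torch.allclose(last, manual, atol=1e-5))
```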

Many-to-many (simple example):

import torch
import torch.nn as nn
class ManyToManyLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.input_size = input_size
        self.output_size = output_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
    def forward(self, x):
        # Initialize the hidden and cell states: (num_layers, batch, hidden_size)
        h0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        # Run the LSTM over the whole sequence
        out, _ = self.lstm(x, (h0, c0))
        # Apply the fully connected layer to the output of every time step
        out = self.fc(out)
        return out
# Hyperparameters
input_size = 10   # dimension of the input features
hidden_size = 20  # dimension of the LSTM hidden state
output_size = 5   # dimension of the output
# Create the model
model = ManyToManyLSTM(input_size, hidden_size, output_size)
# Example input of shape (batch_size, sequence_length, input_size):
# 4 samples, each a sequence of 7 time steps
x = torch.randn(4, 7, input_size)
# Forward pass
output = model(x)
print(output.shape)  # torch.Size([4, 7, 5]), i.e. (batch_size, sequence_length, output_size)
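To train a many-to-many model for per-step classification, the output of shape (batch_size, sequence_length, output_size) has to be arranged the way `nn.CrossEntropyLoss` expects: either flatten batch and time into one axis, or move the class axis to dimension 1, since the loss accepts inputs of shape (N, C, d1, ...). A sketch with random stand-in logits and hypothetical integer labels; with the default mean reduction both forms give the same loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, num_classes = 4, 7, 5
logits = torch.randn(batch, seq_len, num_classes)          # stand-in for the model output
targets = torch.randint(0, num_classes, (batch, seq_len))  # hypothetical per-step labels

criterion = nn.CrossEntropyLoss()
# Option 1: flatten batch and time into one axis of batch * seq_len rows
loss_flat = criterion(logits.reshape(-1, num_classes), targets.reshape(-1))
# Option 2: move the class axis to dim 1, giving (N, C, L)
loss_perm = criterion(logits.permute(0, 2, 1), targets)
print(loss_flat.item(), loss_perm.item())
```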

One-to-many (simple example):

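import torch
import torch.nn as nn
class OneToManyLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, seq_len):
        super().__init__()
        self.hidden_size = hidden_size
        self.seq_len = seq_len  # number of output time steps to produce
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
    def forward(self, x):
        # x: (batch, 1, input_size), a single input step per sample
        # Repeat that step seq_len times so the LSTM unrolls into a sequence
        # (one simple strategy; feeding each output back in is another)
        x = x.repeat(1, self.seq_len, 1)
        # Initialize the hidden and cell states: (num_layers, batch, hidden_size)
        h0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        # Run the LSTM over the unrolled sequence
        out, _ = self.lstm(x, (h0, c0))
        # Apply the fully connected layer to the output of every time step
        out = self.fc(out)
        return out
# Example usage
input_size = 10
hidden_size = 20
output_size = 2
model = OneToManyLSTM(input_size, hidden_size, output_size, seq_len=5)
x = torch.randn(4, 1, input_size)  # each sample is a single time step
output = model(x)
print(output.shape)  # torch.Size([4, 5, 2]), i.e. (batch_size, seq_len, output_size)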
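A common way to produce several outputs from a single input is autoregressive decoding with `nn.LSTMCell`: at every step the model's prediction is mapped back to the input space and fed in as the next input. The class name `AutoregressiveOneToMany` and the `back` projection layer below are hypothetical choices for illustration, not a standard PyTorch component.

```python
import torch
import torch.nn as nn


class AutoregressiveOneToMany(nn.Module):
    """Hypothetical one-to-many decoder: one input vector in, `steps` outputs out."""

    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.fc = nn.Linear(hidden_size, output_size)
        # Assumption: a linear layer maps each prediction back to the input space
        self.back = nn.Linear(output_size, input_size)

    def forward(self, x, steps):
        # x: (batch, input_size), a single input vector per sample
        h = x.new_zeros(x.size(0), self.cell.hidden_size)
        c = x.new_zeros(x.size(0), self.cell.hidden_size)
        inp = x
        outputs = []
        for _ in range(steps):
            h, c = self.cell(inp, (h, c))   # one LSTM time step
            y = self.fc(h)                  # prediction for this step
            outputs.append(y)
            inp = self.back(y)              # feed the prediction back as the next input
        return torch.stack(outputs, dim=1)  # (batch, steps, output_size)


torch.manual_seed(0)
model = AutoregressiveOneToMany(input_size=10, hidden_size=20, output_size=2)
out = model(torch.randn(4, 10), steps=5)
print(out.shape)  # torch.Size([4, 5, 2])
```

Unlike repeating a fixed input, this lets each step condition on what the model produced at the previous step, at the cost of a Python loop over time.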

4. Sequence Pooling

1. Max Pooling:

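Max pooling produces a fixed-length output by taking, for each feature dimension, the maximum value across the sequence. In NLP, this can be used to extract the most salient features of keywords or phrases. Because only the largest activation is kept, the other values in the sequence and the position of the maximum do not affect the result; note, however, that a single unusually large value will dominate it.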

A super simple example:

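import torch
import torch.nn as nn
input_data = torch.randn(100, 1000, 32)   # (batch, seq_len, features)
max_pool = nn.AdaptiveMaxPool1d(1)        # pool the last dimension down to length 1
input_data = input_data.permute(0, 2, 1)  # (batch, features, seq_len): pooling runs over the last dim
output = max_pool(input_data)
print(output.size())  # torch.Size([100, 32, 1])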
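When the sequence comes from a padded batch (for example, LSTM outputs for sequences of different lengths), padded positions should not take part in the maximum. A minimal sketch, assuming a hypothetical `lengths` tensor of true lengths: take the max over the time dimension directly (equivalent to the `AdaptiveMaxPool1d(1)` version above, without the permute), after filling padded steps with $-\infty$.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
out = torch.randn(3, 6, 4)              # e.g. LSTM outputs: (batch, seq_len, hidden)
lengths = torch.tensor([6, 4, 2])       # hypothetical true lengths; later steps are padding

# Without padding, a max over the time axis matches AdaptiveMaxPool1d(1)
plain = out.max(dim=1).values           # (batch, hidden)
pooled = nn.AdaptiveMaxPool1d(1)(out.permute(0, 2, 1)).squeeze(-1)
print(torch.equal(plain, pooled))

# mask[b, t] is True for real time steps, False for padding
mask = torch.arange(out.size(1))[None, :] < lengths[:, None]        # (batch, seq_len)
masked_max = out.masked_fill(~mask[:, :, None], float("-inf")).max(dim=1).values
print(masked_max.shape)                 # torch.Size([3, 4])
```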

2. Average Pooling:

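Average pooling produces its output by averaging all values along the sequence. It tends to smooth the features and reduce the effect of noise. However, because it gives every value the same weight, it can dilute important information carried by only a few time steps.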

Another super simple example:

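import torch
import torch.nn as nn
input_data = torch.randn(100, 1000, 32)   # (batch, seq_len, features)
avg_pool = nn.AdaptiveAvgPool1d(1)        # average the last dimension down to length 1
input_data = input_data.permute(0, 2, 1)  # (batch, features, seq_len): pooling runs over the last dim
output = avg_pool(input_data)
print(output.size())  # torch.Size([100, 32, 1])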
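If the sequences come from a padded batch, plain averaging lets padded steps into both the sum and the denominator. A sketch of a masked mean, assuming a hypothetical `lengths` tensor holding each sequence's true length: zero out padded steps, sum over time, and divide by each sequence's true length.

```python
import torch

torch.manual_seed(0)
out = torch.randn(3, 6, 4)              # e.g. LSTM outputs: (batch, seq_len, hidden)
lengths = torch.tensor([6, 4, 2])       # hypothetical true lengths; later steps are padding

# mask[b, t] = 1.0 for real time steps, 0.0 for padding
mask = (torch.arange(out.size(1))[None, :] < lengths[:, None]).float()   # (batch, seq_len)

# Sum only the real steps, then divide by each sequence's true length
masked_mean = (out * mask[:, :, None]).sum(dim=1) / lengths[:, None].float()
print(masked_mean.shape)                # torch.Size([3, 4])
```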

3. Attention Pooling:

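Attention pooling is a more flexible, weighted form of pooling: it assigns a weight to each element of the sequence and then computes the weighted average according to these weights. In attention pooling the weights are usually produced by a small learned scoring function and normalized with softmax so that they sum to 1; more generally, weighted pooling can base the weights on other criteria, such as position, time decay, or other custom functions. This lets it capture more complex patterns in the sequence.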
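A minimal attention pooling layer, sketched under the assumption that the scores come from a single learned linear layer (one common choice; additive or multi-head scoring are alternatives). It scores each time step, normalizes the scores with softmax over the sequence, and returns the weighted sum; an optional mask gives padded steps zero weight. The input shape matches the pooling examples above.

```python
import torch
import torch.nn as nn


class AttentionPooling(nn.Module):
    """Score each time step, softmax over time, return the weighted sum."""

    def __init__(self, hidden_size):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)   # assumed scoring function

    def forward(self, x, mask=None):
        # x: (batch, seq_len, hidden); mask: (batch, seq_len) bool, True = real step
        scores = self.score(x).squeeze(-1)                     # (batch, seq_len)
        if mask is not None:
            scores = scores.masked_fill(~mask, float("-inf"))  # padded steps get weight 0
        weights = torch.softmax(scores, dim=1)                 # weights sum to 1 over time
        pooled = torch.bmm(weights.unsqueeze(1), x).squeeze(1) # (batch, hidden)
        return pooled, weights


torch.manual_seed(0)
input_data = torch.randn(100, 1000, 32)
attn_pool = AttentionPooling(hidden_size=32)
output, weights = attn_pool(input_data)
print(output.size())  # torch.Size([100, 32])
```

Unlike max or average pooling, the scoring layer is trained together with the rest of the network, so the model learns which time steps to emphasize.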

Please credit the source when reposting: https://haidsoft.com/122507.html