Convolutional Layers Explained


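Convolutional Neural Networks

A convolutional neural network (CNN) is a deep learning model used mainly for image recognition, speech recognition and natural language processing. A CNN is assembled from components such as convolutional layers, pooling layers and fully connected layers.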

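The convolutional layer is the core of a CNN. It slides filters (also called kernels) over the input image and performs a convolution at each position, extracting feature information from the image. Because a layer holds several different filters, it can pick up different kinds of features, such as edges, textures and shapes.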
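To make "a filter extracts a feature" concrete, here is a minimal sketch that applies one hand-written filter to a tiny synthetic image. The Sobel kernel and the 6x6 image are illustrative assumptions, not taken from the article; in a trained CNN the filter values are learned rather than written by hand.

```python
import torch
import torch.nn.functional as F

# A 6x6 grayscale "image": dark left half (0), bright right half (1).
image = torch.zeros(1, 1, 6, 6)
image[..., 3:] = 1.0

# Hand-written 3x3 Sobel kernel that responds to left-to-right intensity changes.
sobel_x = torch.tensor([[-1.0, 0.0, 1.0],
                        [-2.0, 0.0, 2.0],
                        [-1.0, 0.0, 1.0]]).reshape(1, 1, 3, 3)

# F.conv2d expects input (N, C_in, H, W) and weight (C_out, C_in, kH, kW).
edges = F.conv2d(image, sobel_x)
print(edges.shape)   # a 3x3 kernel without padding shrinks 6x6 to 4x4
print(edges[0, 0])   # non-zero only where the window straddles the vertical edge
```

The response is zero over the flat regions and large where the window covers the dark-to-bright boundary, which is what "detecting an edge" means here.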

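CNNs are usually trained with backpropagation: the network parameters are adjusted step by step to minimize a loss function, so that the model fits the training data better. In practice CNNs have produced strong results and have become a mainstream model for tasks such as image classification, object detection and speech recognition.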
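The training procedure described above can be sketched in a few lines. Everything specific here is an assumption chosen for illustration: the tiny network, the random tensors standing in for images and labels, the learning rate and the number of steps. It only shows the loop of forward pass, loss, backpropagation and parameter update.

```python
import torch
from torch import nn

torch.manual_seed(0)

# A very small CNN: convolution -> ReLU -> pooling -> fully connected classifier.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 16 * 16, 10),
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Random stand-ins for a batch of 32x32 RGB images and their class labels.
images = torch.randn(16, 3, 32, 32)
labels = torch.randint(0, 10, (16,))

losses = []
for _ in range(20):
    optimizer.zero_grad()                   # clear gradients from the previous step
    loss = loss_fn(model(images), labels)   # forward pass and loss
    loss.backward()                         # backpropagation computes the gradients
    optimizer.step()                        # gradient step on all parameters
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Repeating the same batch is only a way to watch the loss go down; real training iterates over a dataset.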

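The Convolutional Layer

This article focuses on Conv2d, the most commonly used variant, which operates on two-dimensional images.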

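The main formula:

$\mathrm{out}(N_i,C_{\mathrm{out}_j})=\mathrm{bias}(C_{\mathrm{out}_j})+\sum_{k=0}^{C_{\mathrm{in}}-1}\mathrm{weight}(C_{\mathrm{out}_j},k)\star\mathrm{input}(N_i,k)$

Here ⋆ is the valid 2D cross-correlation operator, N is the batch size, C is the number of channels, H is the height of the input planes in pixels, and W is their width in pixels.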
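The formula can be checked by implementing it literally and comparing against PyTorch. The sketch below is a slow reference written for readability; the helper names cross_correlate2d and conv2d_by_formula are made up for this example, and it assumes stride 1, no padding, no dilation and groups=1, which is the case the formula describes.

```python
import torch
import torch.nn.functional as F


def cross_correlate2d(x, k):
    """Valid 2D cross-correlation of one plane x (H, W) with one kernel k (kH, kW)."""
    kh, kw = k.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = torch.zeros(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the window and the kernel, summed up.
            out[i, j] = (x[i:i + kh, j:j + kw] * k).sum()
    return out


def conv2d_by_formula(inp, weight, bias):
    """out(N_i, C_out_j) = bias(C_out_j) + sum_k weight(C_out_j, k) * input(N_i, k)."""
    n, c_in = inp.shape[:2]
    c_out = weight.shape[0]
    planes = []
    for i in range(n):              # every sample in the batch
        for j in range(c_out):      # every output channel
            acc = sum(cross_correlate2d(inp[i, k], weight[j, k]) for k in range(c_in))
            planes.append(acc + bias[j])
    return torch.stack(planes).reshape(n, c_out, *planes[0].shape)


torch.manual_seed(0)
inp = torch.randn(2, 3, 5, 5)       # (N, C_in, H, W)
weight = torch.randn(4, 3, 3, 3)    # (C_out, C_in, kH, kW)
bias = torch.randn(4)

manual = conv2d_by_formula(inp, weight, bias)
reference = F.conv2d(inp, weight, bias)
print(manual.shape, torch.allclose(manual, reference, atol=1e-4))
```

Note that the kernel is not flipped: what deep learning frameworks call convolution is cross-correlation in the signal-processing sense.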

Parameters:

The PyTorch documentation lists the following configurable parameters:

torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)

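Meaning of each parameter:

Parameter	Meaning
in_channels	Number of input channels, i.e. the depth of the input image (3 for an RGB image).
out_channels	Number of output channels, i.e. the number of filters in the layer; it determines the depth of the output.
kernel_size	Size of the convolution kernel: an integer, or a tuple (K_H, K_W) giving the kernel height and width.
stride	Stride of the convolution: an integer, or a tuple (S_H, S_W) giving the stride along the height and width. Default: 1.
padding	Amount of padding added to the input: an integer, a tuple (P_H, P_W) giving the padding along the height and width, or one of the strings 'valid' (no padding) and 'same'. Default: 0.
dilation	Dilation rate of the kernel: an integer or a tuple. Default: 1. Values greater than 1 insert gaps between the kernel elements, which enlarges the receptive field without adding parameters.
groups	Number of blocked connections between input and output channels: an integer that must divide both in_channels and out_channels. Default: 1. With groups=1, every output channel is computed from all input channels. With groups equal to in_channels, each input channel is convolved only with its own filters (a depthwise convolution).
bias	Whether to add a learnable bias. Default: True. If set to False, no bias term is added to the output.
padding_mode	Padding mode: 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros', which pads with zeros.
device	Device on which the layer's parameters (weight and bias) are created, for example the CPU or a GPU.
dtype	Data type of the layer's parameters.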
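The effect of groups is easiest to see in the shape of the weight tensor, which is (out_channels, in_channels / groups, K_H, K_W). The following sketch builds three layers that differ only in groups; the channel counts 4 and 8 are arbitrary values picked for the example.

```python
from torch import nn

# Same channel counts and kernel size, different grouping.
standard = nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, groups=1)
grouped = nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, groups=2)
depthwise = nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, groups=4)

for name, layer in [("groups=1", standard), ("groups=2", grouped), ("groups=4", depthwise)]:
    # The second dimension is the number of input channels each filter actually sees.
    print(name, tuple(layer.weight.shape), "parameters:", layer.weight.numel())
```

With groups=1 each of the 8 filters spans all 4 input channels; with groups=4 each filter sees a single input channel, so the layer has a quarter of the weights.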

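Example code:

import torch
import torchvision
from torch import nn
from torch.nn import Conv2d
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

# Load the CIFAR10 test split (downloaded automatically if it is missing)
dataset = torchvision.datasets.CIFAR10("../data", train=False,
                                       transform=torchvision.transforms.ToTensor(),
                                       download=True)
# Wrap the dataset in a loader that yields batches of 64 images
dataloader = DataLoader(dataset, batch_size=64)


# Define the Touch model: a single convolutional layer
class Touch(nn.Module):
    def __init__(self):
        super(Touch, self).__init__()
        # Arguments of the 2D convolution
        self.conv1 = Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        return x


# Instantiate the Touch model
touch = Touch()

# Write the inputs and the convolution outputs to a log directory for TensorBoard
writer = SummaryWriter("../../logs")

# Initialize the step counter
step = 0
# Log every batch and its convolution output
for data in dataloader:
    imgs, targets = data
    output = touch(imgs)
    # Print the shapes of imgs and output
    print(imgs.shape)    # torch.Size([64, 3, 32, 32]) for a full batch
    print(output.shape)  # torch.Size([64, 6, 30, 30]) for a full batch
    writer.add_images("input", imgs, step)
    # add_images cannot display 6-channel images, so fold the 6 channels
    # into the batch dimension to get 3-channel images
    output = torch.reshape(output, (-1, 3, 30, 30))
    writer.add_images("output", output, step)
    # Advance the step counter
    step = step + 1

# Flush pending events and close the log file
writer.close()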

Visualize the log files with TensorBoard (point --logdir at the directory the script wrote to):

tensorboard --logdir=logs
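The shapes printed by the example follow from the output-size formula in the Conv2d documentation, H_out = floor((H_in + 2 × padding − dilation × (kernel_size − 1) − 1) / stride + 1). The helper below is a small sketch of that formula for one spatial dimension; the name conv2d_output_size is made up for this example, and integer floor division is used in place of the floor, which gives the same result for the non-negative values involved.

```python
def conv2d_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension (height or width)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# CIFAR10 images are 32x32; the example layer uses kernel_size=3, stride=1, padding=0.
side = conv2d_output_size(32, kernel_size=3)
print(side)

# Folding (64, 6, side, side) into 3-channel images keeps the element count,
# so the batch dimension becomes 64 * 6 / 3.
batch_after_reshape = 64 * 6 * side * side // (3 * side * side)
print(batch_after_reshape)

# padding=1 keeps the 32x32 size; adding stride=2 halves it.
same = conv2d_output_size(32, kernel_size=3, padding=1)
halved = conv2d_output_size(32, kernel_size=3, stride=2, padding=1)
print(same, halved)
```

This is why the example reshapes to (-1, 3, 30, 30): a full batch of 64 six-channel outputs becomes 128 three-channel images that TensorBoard can display.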

Framework source code:

class Conv2d(_ConvNd):
    __doc__ = r"""Applies a 2D convolution over an input signal composed of several input
    planes.

    In the simplest case, the output value of the layer with input size
    :math:`(N, C_{\text{in}}, H, W)` and output :math:`(N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})`
    can be precisely described as:

    .. math::
        \text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) +
        \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)


    where :math:`\star` is the valid 2D `cross-correlation`_ operator,
    :math:`N` is a batch size, :math:`C` denotes a number of channels,
    :math:`H` is a height of input planes in pixels, and :math:`W` is
    width in pixels.
    """ + r"""

    This module supports :ref:`TensorFloat32<tf32_on_ampere>`.

    On certain ROCm devices, when using float16 inputs this module will use :ref:`different precision<fp16_on_mi200>` for backward.

    * :attr:`stride` controls the stride for the cross-correlation, a single
      number or a tuple.

    * :attr:`padding` controls the amount of padding applied to the input. It
      can be either a string {{'valid', 'same'}} or an int / a tuple of ints giving the
      amount of implicit padding applied on both sides.

    * :attr:`dilation` controls the spacing between the kernel points; also
      known as the à trous algorithm. It is harder to describe, but this `link`_
      has a nice visualization of what :attr:`dilation` does.

    {groups_note}

    The parameters :attr:`kernel_size`, :attr:`stride`, :attr:`padding`, :attr:`dilation` can either be:

        - a single ``int`` -- in which case the same value is used for the height and width dimension
        - a ``tuple`` of two ints -- in which case, the first `int` is used for the height dimension,
          and the second `int` for the width dimension

    Note:
        {depthwise_separable_note}

    Note:
        {cudnn_reproducibility_note}

    Note:
        ``padding='valid'`` is the same as no padding. ``padding='same'`` pads
        the input so the output has the shape as the input. However, this mode
        doesn't support any stride values other than 1.

    Note:
        This module supports complex data types i.e. ``complex32, complex64, complex128``.

    Args:
        in_channels (int): Number of channels in the input image
        out_channels (int): Number of channels produced by the convolution
        kernel_size (int or tuple): Size of the convolving kernel
        stride (int or tuple, optional): Stride of the convolution. Default: 1
        padding (int, tuple or str, optional): Padding added to all four sides of
            the input. Default: 0
        padding_mode (str, optional): ``'zeros'``, ``'reflect'``,
            ``'replicate'`` or ``'circular'``. Default: ``'zeros'``
        dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
        groups (int, optional): Number of blocked connections from input
            channels to output channels. Default: 1
        bias (bool, optional): If ``True``, adds a learnable bias to the
            output. Default: ``True``
    """.format(**reproducibility_notes, **convolution_notes) + r"""

    Shape:
        - Input: :math:`(N, C_{in}, H_{in}, W_{in})` or :math:`(C_{in}, H_{in}, W_{in})`
        - Output: :math:`(N, C_{out}, H_{out}, W_{out})` or :math:`(C_{out}, H_{out}, W_{out})`, where

          .. math::
              H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0]
                        \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor

          .. math::
              W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1]
                        \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor

    Attributes:
        weight (Tensor): the learnable weights of the module of shape
            :math:`(\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}},`
            :math:`\text{kernel\_size[0]}, \text{kernel\_size[1]})`.
            The values of these weights are sampled from
            :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
            :math:`k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`
        bias (Tensor): the learnable bias of the module of shape
            (out_channels). If :attr:`bias` is ``True``,
            then the values of these weights are
            sampled from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
            :math:`k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`

    Examples:

        >>> # With square kernels and equal stride
        >>> m = nn.Conv2d(16, 33, 3, stride=2)
        >>> # non-square kernels and unequal stride and with padding
        >>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
        >>> # non-square kernels and unequal stride and with padding and dilation
        >>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
        >>> input = torch.randn(20, 16, 50, 100)
        >>> output = m(input)

    .. _cross-correlation:
        https://en.wikipedia.org/wiki/Cross-correlation

    .. _link:
        https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
    """

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        kernel_size: _size_2_t,
        stride: _size_2_t = 1,
        padding: Union[str, _size_2_t] = 0,
        dilation: _size_2_t = 1,
        groups: int = 1,
        bias: bool = True,
        padding_mode: str = 'zeros',  # TODO: refine this type
        device=None,
        dtype=None
    ) -> None:
        factory_kwargs = {'device': device, 'dtype': dtype}
        kernel_size_ = _pair(kernel_size)
        stride_ = _pair(stride)
        padding_ = padding if isinstance(padding, str) else _pair(padding)
        dilation_ = _pair(dilation)
        super().__init__(
            in_channels, out_channels, kernel_size_, stride_, padding_, dilation_,
            False, _pair(0), groups, bias, padding_mode, factory_kwargs)

    def _conv_forward(self, input: Tensor, weight: Tensor, bias: Optional[Tensor]):
        if self.padding_mode != 'zeros':
            return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
                            weight, bias, self.stride,
                            _pair(0), self.dilation, self.groups)
        return F.conv2d(input, weight, bias, self.stride,
                        self.padding, self.dilation, self.groups)

    def forward(self, input: Tensor) -> Tensor:
        return self._conv_forward(input, self.weight, self.bias)
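The _conv_forward method above shows how padding_mode is handled: for anything other than 'zeros', the input is first padded explicitly with F.pad and then convolved with zero implicit padding. The sketch below reproduces that path by hand for padding_mode='reflect' and compares it with the module. The layer sizes and the input tensor are arbitrary choices for illustration.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
conv = nn.Conv2d(3, 4, kernel_size=3, padding=1, padding_mode="reflect")
x = torch.randn(2, 3, 8, 8)

# Pad one pixel on every side; F.pad takes (left, right, top, bottom)
# for the last two dimensions.
padded = F.pad(x, (1, 1, 1, 1), mode="reflect")

# Convolve the already padded input with no implicit padding, reusing the
# module's own weight and bias.
manual = F.conv2d(padded, conv.weight, conv.bias, stride=1, padding=0)

module_out = conv(x)
print(module_out.shape, torch.allclose(manual, module_out, atol=1e-6))
```

With the default padding_mode='zeros' the explicit F.pad step is skipped and the padding is passed straight to F.conv2d.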
