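Convolutional Neural Networks
A convolutional neural network (CNN) is a deep learning model used mainly for image recognition, speech recognition, and natural language processing. A CNN is built from components such as convolutional layers, pooling layers, and fully connected layers.
The convolutional layer is the core of a CNN. It slides filters over the input image and applies a convolution operation at each position to extract features. Because a layer holds several different filters, it can extract several kinds of features at once, such as edges, textures, and shapes.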
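To make the idea of a filter extracting a feature concrete, here is a minimal pure-Python sketch. The helper `cross_correlate2d`, the toy image, and the hand-made `vertical_edge` filter are illustrative assumptions, not part of PyTorch; in a real CNN the filter values are learned rather than written by hand.

```python
def cross_correlate2d(image, kernel):
    """Valid 2D cross-correlation of two nested lists (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 4x6 image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# Hand-made 1x2 filter that responds to a dark-to-bright step along a row.
vertical_edge = [[-1, 1]]

feature_map = cross_correlate2d(image, vertical_edge)
for row in feature_map:
    print(row)  # each row is [0, 0, 1, 0, 0]
```

The feature map is zero wherever the image is flat and non-zero only at the column where the brightness changes, which is what "extracting an edge" means here.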
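A CNN is usually trained with the backpropagation algorithm: the network parameters are adjusted repeatedly to minimize a loss function so that the model fits the training data better. In practice, CNNs have become a mainstream model for tasks such as image classification, object detection, and speech recognition.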
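The training loop described above can be sketched on a toy problem. The example below is an assumption-laden illustration, not how PyTorch trains a network: it uses a 1D signal, a two-element kernel, a hand-derived gradient of the squared-error loss instead of automatic backpropagation, and a learning rate picked by hand. The helper name `correlate1d` is hypothetical.

```python
def correlate1d(x, w):
    """Valid 1D cross-correlation: slide w over x without padding."""
    return [sum(x[i + a] * w[a] for a in range(len(w)))
            for i in range(len(x) - len(w) + 1)]


x = [1, 1, 0, 0, 1, 1, 0, 0]
target = correlate1d(x, [1.0, -1.0])  # output of the filter we want to recover

w = [0.0, 0.0]  # kernel weights to learn, starting from zero
lr = 0.05       # learning rate, chosen by hand for this toy problem

for step in range(200):
    pred = correlate1d(x, w)
    errors = [p - t for p, t in zip(pred, target)]
    # Gradient of the loss sum(error^2) with respect to each kernel weight.
    grad = [2 * sum(e * x[i + a] for i, e in enumerate(errors))
            for a in range(len(w))]
    # Gradient descent update: move each weight against its gradient.
    w = [wa - lr * ga for wa, ga in zip(w, grad)]

loss = sum((p - t) ** 2 for p, t in zip(correlate1d(x, w), target))
print(w, loss)
```

After the loop the learned weights should be very close to `[1, -1]` and the loss close to zero, which is the "adjust parameters to reduce the loss" behaviour in miniature.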
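Convolutional layer
This article focuses on the most commonly used variant, Conv2d, which operates on two-dimensional images.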
Main formula:
Here ⋆ is the valid 2D cross-correlation operator, N is the batch size, C is the number of channels, H is the height of the input plane in pixels, and W is its width in pixels.
$$\mathrm{out}(N_i, C_{\mathrm{out}_j}) = \mathrm{bias}(C_{\mathrm{out}_j}) + \sum_{k=0}^{C_{\mathrm{in}}-1} \mathrm{weight}(C_{\mathrm{out}_j}, k) \star \mathrm{input}(N_i, k)$$
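The formula can be read as: for each output channel j, cross-correlate every input channel k with its own kernel, sum the results over k, and add the bias of channel j. The sketch below spells this out for a single sample in plain Python. The names `corr2d` and `conv2d_single` and the toy numbers are assumptions made for illustration; stride, padding, dilation, and groups are left at their defaults.

```python
def corr2d(x, k):
    """Valid 2D cross-correlation (the star operator) on nested lists."""
    kh, kw = len(k), len(k[0])
    return [[sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(len(x[0]) - kw + 1)]
            for i in range(len(x) - kh + 1)]


def conv2d_single(sample, weight, bias):
    """out[j] = bias[j] + sum over k of weight[j][k] (star) sample[k].

    sample: list of C_in planes, weight: C_out x C_in kernels, bias: C_out scalars.
    """
    out = []
    for j, kernels in enumerate(weight):
        # One cross-correlation per input channel, each with its own kernel.
        planes = [corr2d(sample[k], kernels[k]) for k in range(len(sample))]
        h, w = len(planes[0]), len(planes[0][0])
        # Sum the per-channel results elementwise and add the bias.
        out.append([[bias[j] + sum(p[r][c] for p in planes) for c in range(w)]
                    for r in range(h)])
    return out


# Two 3x3 input channels, one output channel with 2x2 kernels.
sample = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
]
weight = [[[[1, 0], [0, 1]],    # kernel applied to input channel 0
           [[1, 1], [1, 1]]]]   # kernel applied to input channel 1
bias = [10]

out = conv2d_single(sample, weight, bias)
print(out)  # [[[18, 20], [24, 26]]]
```

For the top-left output value, channel 0 contributes 1 + 5 = 6, channel 1 contributes 2, and the bias adds 10, giving 18.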
Parameters:
The parameters that can be set, as listed in the official PyTorch documentation:
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)
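Meaning of the parameters:
| Parameter | Meaning |
|---|---|
| in_channels | Number of channels in the input, i.e. the depth of the input image. |
| out_channels | Number of channels in the output, i.e. the number of convolution kernels (filters); it determines the depth of the output feature map. |
| kernel_size | Size of the convolution kernel. Either a single integer or a tuple (H, W), where H and W are the kernel height and width. |
| stride | Stride of the convolution. Either a single integer or a tuple (S_H, S_W), where S_H and S_W are the strides along the height and width. Default: 1. |
| padding | Amount of padding added to the input. Either a single integer, a tuple (P_H, P_W) giving the padding along the height and width, or one of the strings 'valid' and 'same'. Default: 0. |
| dilation | Dilation rate of the kernel, a single integer or a tuple. Default: 1. A value greater than 1 increases the spacing between kernel elements, which enlarges the receptive field of the convolution. |
| groups | Number of blocked connections from input channels to output channels, an integer. Default: 1. in_channels and out_channels must both be divisible by groups. With groups = 1, every output channel is computed from all input channels. With groups = in_channels, each input channel is convolved with its own set of filters. |
| bias | Whether to add a learnable bias. Default: True. If set to False, no bias is added to the output. |
| padding_mode | Padding mode: 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros', which pads with zeros. |
| device | Device on which the layer's tensors are created (CPU or GPU). |
| dtype | Data type of the layer's tensors. |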
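
One way to see what groups and bias change is to count learnable parameters. The sketch below assumes the weight shape (out_channels, in_channels / groups, kernel_size[0], kernel_size[1]) given in the Conv2d docstring; the function name `conv2d_param_count` is hypothetical and exists only for this illustration.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, groups=1, bias=True):
    """Count learnable parameters for a weight of shape
    (out_channels, in_channels // groups, kH, kW) plus an optional bias."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    # Accept either a single int or an (H, W) tuple, like kernel_size in the table.
    kh, kw = (kernel_size, kernel_size) if isinstance(kernel_size, int) else kernel_size
    weights = out_channels * (in_channels // groups) * kh * kw
    return weights + (out_channels if bias else 0)


print(conv2d_param_count(3, 6, 3))              # 168 = 6*3*3*3 weights + 6 biases
print(conv2d_param_count(3, 6, 3, groups=3))    # 60  = 6*1*3*3 weights + 6 biases
print(conv2d_param_count(3, 6, 3, bias=False))  # 162
```

With groups = 3 each filter sees only one of the three input channels, so the weight tensor is three times smaller.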
Example code:

    import torch
    import torchvision
    from torch import nn
    from torch.nn import Conv2d
    from torch.utils.data import DataLoader
    from torch.utils.tensorboard import SummaryWriter

    # Load the CIFAR10 test split (downloaded automatically if missing)
    dataset = torchvision.datasets.CIFAR10("../data", train=False,
                                           transform=torchvision.transforms.ToTensor(),
                                           download=True)
    # Wrap the dataset in a DataLoader that yields batches of 64 images
    dataloader = DataLoader(dataset, batch_size=64)


    # Define the Touch model
    class Touch(nn.Module):
        def __init__(self):
            super(Touch, self).__init__()
            # Pass in the parameters required by the 2D convolution
            self.conv1 = Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

        def forward(self, x):
            x = self.conv1(x)
            return x


    # Instantiate the Touch model
    touch = Touch()
    # Write the convolution outputs to a log directory so TensorBoard can visualize them
    writer = SummaryWriter("../../logs")
    # Initialize the step counter
    step = 0
    # Loop over the batches and log each convolution output
    for data in dataloader:
        imgs, targets = data
        output = touch(imgs)
        # Print the shapes of imgs and output
        print(imgs.shape)
        print(output.shape)
        writer.add_images("input", imgs, step)
        # add_images cannot display 6-channel images, so fold the 6 output
        # channels into extra 3-channel images along the batch dimension
        output = torch.reshape(output, (-1, 3, 30, 30))
        writer.add_images("output", output, step)
        # Advance the step counter
        step = step + 1

    # Flush pending events and close the log file
    writer.close()
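The 30 × 30 in the reshape call follows from the output size formula in the Conv2d docstring, H_out = floor((H_in + 2 × padding − dilation × (kernel_size − 1) − 1) / stride + 1), applied to 32 × 32 CIFAR10 images. The sketch below evaluates that formula in plain Python; `to_pair` and `conv2d_output_size` are hypothetical helper names, with `to_pair` standing in for the int-or-tuple handling that PyTorch does internally.

```python
def to_pair(value):
    """Expand a single int to (value, value); pass tuples through."""
    return (value, value) if isinstance(value, int) else tuple(value)


def conv2d_output_size(in_size, kernel_size, stride=1, padding=0, dilation=1):
    """Apply the H_out / W_out formula to each spatial dimension."""
    sizes = []
    for n, k, s, p, d in zip(to_pair(in_size), to_pair(kernel_size),
                             to_pair(stride), to_pair(padding), to_pair(dilation)):
        # Integer floor division implements the floor in the formula.
        sizes.append((n + 2 * p - d * (k - 1) - 1) // s + 1)
    return tuple(sizes)


# The example model: 32x32 input, 3x3 kernel, stride 1, no padding.
print(conv2d_output_size(32, 3))  # (30, 30)

# The last docstring example: input 50x100, kernel (3, 5), stride (2, 1),
# padding (4, 2), dilation (3, 1).
print(conv2d_output_size((50, 100), (3, 5), stride=(2, 1),
                         padding=(4, 2), dilation=(3, 1)))  # (26, 100)

# Reshaping (64, 6, 30, 30) to (-1, 3, 30, 30) keeps the element count,
# so a full batch of 64 turns into 128 three-channel images.
print(64 * 6 * 30 * 30 // (3 * 30 * 30))  # 128
```

Because each logged "output" batch holds twice as many images as the input batch, every pair of images in TensorBoard corresponds to the six feature maps of one input image.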
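Point TensorBoard at the log directory to visualize the results (run the command from the directory that contains logs, or pass the path that was given to SummaryWriter):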
tensorboard --logdir=logs
Framework source code:

    class Conv2d(_ConvNd):
        __doc__ = r"""Applies a 2D convolution over an input signal composed of several input
        planes.

        In the simplest case, the output value of the layer with input size
        :math:`(N, C_{\text{in}}, H, W)` and output :math:`(N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})`
        can be precisely described as:

        .. math::
            \text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) +
            \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)


        where :math:`\star` is the valid 2D `cross-correlation`_ operator,
        :math:`N` is a batch size, :math:`C` denotes a number of channels,
        :math:`H` is a height of input planes in pixels, and :math:`W` is
        width in pixels.
        """ + r"""

        This module supports :ref:`TensorFloat32<tf32_on_ampere>`.

        On certain ROCm devices, when using float16 inputs this module will use :ref:`different precision<fp16_on_mi200>` for backward.

        * :attr:`stride` controls the stride for the cross-correlation, a single
          number or a tuple.

        * :attr:`padding` controls the amount of padding applied to the input. It
          can be either a string {{'valid', 'same'}} or an int / a tuple of ints giving the
          amount of implicit padding applied on both sides.

        * :attr:`dilation` controls the spacing between the kernel points; also
          known as the à trous algorithm. It is harder to describe, but this `link`_
          has a nice visualization of what :attr:`dilation` does.

        {groups_note}

        The parameters :attr:`kernel_size`, :attr:`stride`, :attr:`padding`, :attr:`dilation` can either be:

            - a single ``int`` -- in which case the same value is used for the height and width dimension
            - a ``tuple`` of two ints -- in which case, the first `int` is used for the height dimension,
              and the second `int` for the width dimension

        Note:
            {depthwise_separable_note}

        Note:
            {cudnn_reproducibility_note}

        Note:
            ``padding='valid'`` is the same as no padding. ``padding='same'`` pads
            the input so the output has the shape as the input. However, this mode
            doesn't support any stride values other than 1.

        Note:
            This module supports complex data types i.e. ``complex32, complex64, complex128``.

        Args:
            in_channels (int): Number of channels in the input image
            out_channels (int): Number of channels produced by the convolution
            kernel_size (int or tuple): Size of the convolving kernel
            stride (int or tuple, optional): Stride of the convolution. Default: 1
            padding (int, tuple or str, optional): Padding added to all four sides of
                the input. Default: 0
            padding_mode (str, optional): ``'zeros'``, ``'reflect'``,
                ``'replicate'`` or ``'circular'``. Default: ``'zeros'``
            dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
            groups (int, optional): Number of blocked connections from input
                channels to output channels. Default: 1
            bias (bool, optional): If ``True``, adds a learnable bias to the
                output. Default: ``True``
        """.format(**reproducibility_notes, **convolution_notes) + r"""

        Shape:
            - Input: :math:`(N, C_{in}, H_{in}, W_{in})` or :math:`(C_{in}, H_{in}, W_{in})`
            - Output: :math:`(N, C_{out}, H_{out}, W_{out})` or :math:`(C_{out}, H_{out}, W_{out})`, where

              .. math::
                  H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0]
                            \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor

              .. math::
                  W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1]
                            \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor

        Attributes:
            weight (Tensor): the learnable weights of the module of shape
                :math:`(\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}},`
                :math:`\text{kernel\_size[0]}, \text{kernel\_size[1]})`.
                The values of these weights are sampled from
                :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
                :math:`k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`
            bias (Tensor): the learnable bias of the module of shape
                (out_channels). If :attr:`bias` is ``True``,
                then the values of these weights are
                sampled from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
                :math:`k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`

        Examples:

            >>> # With square kernels and equal stride
            >>> m = nn.Conv2d(16, 33, 3, stride=2)
            >>> # non-square kernels and unequal stride and with padding
            >>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
            >>> # non-square kernels and unequal stride and with padding and dilation
            >>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
            >>> input = torch.randn(20, 16, 50, 100)
            >>> output = m(input)

        .. _cross-correlation:
            https://en.wikipedia.org/wiki/Cross-correlation

        .. _link:
            https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
        """

        def __init__(
            self,
            in_channels: int,
            out_channels: int,
            kernel_size: _size_2_t,
            stride: _size_2_t = 1,
            padding: Union[str, _size_2_t] = 0,
            dilation: _size_2_t = 1,
            groups: int = 1,
            bias: bool = True,
            padding_mode: str = 'zeros',  # TODO: refine this type
            device=None,
            dtype=None
        ) -> None:
            factory_kwargs = {'device': device, 'dtype': dtype}
            kernel_size_ = _pair(kernel_size)
            stride_ = _pair(stride)
            padding_ = padding if isinstance(padding, str) else _pair(padding)
            dilation_ = _pair(dilation)
            super().__init__(
                in_channels, out_channels, kernel_size_, stride_, padding_, dilation_,
                False, _pair(0), groups, bias, padding_mode, factory_kwargs)

        def _conv_forward(self, input: Tensor, weight: Tensor, bias: Optional[Tensor]):
            if self.padding_mode != 'zeros':
                return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
                                weight, bias, self.stride,
                                _pair(0), self.dilation, self.groups)
            return F.conv2d(input, weight, bias, self.stride,
                            self.padding, self.dilation, self.groups)

        def forward(self, input: Tensor) -> Tensor:
            return self._conv_forward(input, self.weight, self.bias)
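The Attributes section of the docstring states that weights and biases are sampled from U(−√k, √k) with k = groups / (C_in × kernel_size[0] × kernel_size[1]). The sketch below only evaluates that bound with the standard library; `conv2d_init_bound` is a hypothetical helper, and the sketch does not reproduce how PyTorch actually draws the values.

```python
import math
import random


def conv2d_init_bound(in_channels, kernel_size, groups=1):
    """Return sqrt(k) with k = groups / (C_in * kH * kW), as in the docstring."""
    kh, kw = (kernel_size, kernel_size) if isinstance(kernel_size, int) else kernel_size
    k = groups / (in_channels * kh * kw)
    return math.sqrt(k)


# The example layer Conv2d(3, 6, kernel_size=3): k = 1 / 27.
bound = conv2d_init_bound(3, 3)
print(round(bound, 4))  # 0.1925

# Illustrative draw of one 3x3 kernel from U(-bound, bound).
kernel = [[random.uniform(-bound, bound) for _ in range(3)] for _ in range(3)]
print(all(-bound <= v <= bound for row in kernel for v in row))  # True
```

The bound shrinks as the kernel covers more input values, so layers with many input channels or large kernels start with smaller weights.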