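Transposed Convolutional Neural Networks

Introduction

Convolutional neural networks have revolutionized image classification and object detection. But have you heard of transposed convolutions? Do you know how to use them?

In this article, I will explain what transposed convolutions are, how they compare to regular convolutions, and show you how to build a simple neural network that uses them to upscale image resolution.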

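What Are Transposed Convolutions?

Note that in some of the literature, transposed convolutions are also called deconvolutions or fractionally strided convolutions.

To understand transposed convolutions, let's first remind ourselves what a convolution is.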

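Convolution

A convolution has three parts: an input (e.g., a 2D image), a filter (also called a kernel), and an output (also called a convolved feature).

The convolution process is iterative. First, the filter is applied to one part of the input image and the output value is recorded. The filter then shifts by one position when stride=1, or by several positions when the stride is set to a larger number, and the same process repeats until the convolved feature is complete.

The GIF below illustrates applying a 3x3 filter to a 5x5 input.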
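
To make the sliding-window process concrete, here is a minimal NumPy sketch of a convolution with no padding. The function name conv2d, the square-input assumption, and the sample values are illustrative choices of mine, not something taken from a library.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Convolution without padding, as used in CNNs (square inputs assumed)."""
    k = kernel.shape[0]
    out_size = (image.shape[0] - k) // stride + 1
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            # Take the part of the input currently under the filter
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k]
            # Multiply element-wise and sum to get one output value
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # 5x5 input
kernel = np.ones((3, 3))                          # 3x3 filter of ones
feature = conv2d(image, kernel, stride=1)
print(feature.shape)  # (3, 3)
```

With stride 1 the 3x3 filter fits into the 5x5 input at three positions along each axis, which is why the convolved feature is 3x3.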

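Transposed Convolution

The goal of a transposed convolution is to do the opposite of a regular convolution, i.e., to upsample an input feature map into a desired, larger output feature map.

To achieve this, a transposed convolution goes through an iterative process: each entry of the input feature map is multiplied by the filter, and the resulting patches are summed wherever they overlap. Note that at every step we also shift by the specified number of positions (the stride).

The GIF below demonstrates how a transposed convolution works. The example uses a stride of 1 and a 2x2 filter to go from a 2x2 input to a 3x3 output.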

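Take the first input entry and multiply it by the filter matrix. Store the result temporarily.
Then take the second input entry and multiply it by the filter matrix. Store the result temporarily. Continue this process for the rest of the input matrix.
Finally, sum all the partial outputs to obtain the final result.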
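
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration that assumes square inputs and no padding; the function name conv2d_transpose and the sample values are my own and are not taken from the GIF.

```python
import numpy as np

def conv2d_transpose(x, kernel, stride=1):
    """Transposed convolution without padding (square inputs assumed)."""
    k = kernel.shape[0]
    n = x.shape[0]
    out_size = (n - 1) * stride + k
    out = np.zeros((out_size, out_size))
    for i in range(n):
        for j in range(n):
            # Multiply the whole filter by a single input entry (a partial output)
            partial = x[i, j] * kernel
            # Add the partial output at a position shifted by the stride
            out[i * stride:i * stride + k, j * stride:j * stride + k] += partial
    return out

x = np.array([[0.0, 1.0], [2.0, 3.0]])       # 2x2 input
kernel = np.array([[0.0, 1.0], [2.0, 3.0]])  # 2x2 filter
y = conv2d_transpose(x, kernel, stride=1)
print(y)
# [[ 0.  0.  1.]
#  [ 0.  4.  6.]
#  [ 4. 12.  9.]]
```

The centre value 4 receives contributions from all four partial outputs, which is the overlap that appears when the stride is smaller than the filter size.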

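It is worth noting that we are essentially generating extra data during the transposed convolution, since we are upsampling the feature map from a smaller size to a larger one.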

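However, this operation is not an exact inverse of convolution. Some information is always lost during a convolution, which means that applying a transposed convolution cannot recreate exactly the same data.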
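
A small round trip makes this visible. The sketch below uses hypothetical helper functions (conv2d and conv2d_transpose, stride 1, no padding, square inputs) written only for this illustration: it shrinks a 3x3 input to 2x2 and then upsamples it back to 3x3 with the same filter.

```python
import numpy as np

def conv2d(image, kernel):
    """Stride-1 convolution without padding (square inputs assumed)."""
    k = kernel.shape[0]
    size = image.shape[0] - k + 1
    return np.array([[np.sum(image[i:i + k, j:j + k] * kernel)
                      for j in range(size)] for i in range(size)])

def conv2d_transpose(x, kernel):
    """Stride-1 transposed convolution without padding."""
    k, n = kernel.shape[0], x.shape[0]
    out = np.zeros((n - 1 + k, n - 1 + k))
    for i in range(n):
        for j in range(n):
            out[i:i + k, j:j + k] += x[i, j] * kernel
    return out

original = np.arange(9, dtype=float).reshape(3, 3)
kernel = np.ones((2, 2))
downsampled = conv2d(original, kernel)            # 3x3 -> 2x2
restored = conv2d_transpose(downsampled, kernel)  # 2x2 -> 3x3

print(restored.shape == original.shape)  # True: the size is recovered
print(np.allclose(restored, original))   # False: the values are not
```

The shape comes back, but the values do not: nine input numbers were compressed into four, and the transposed convolution has no way to recover what was discarded.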

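Finally, we can adjust the filter size or the stride to obtain the desired size of the output feature map. For example, we can increase the stride from 1 to 2 to avoid the partial overlap and produce a 4x4 output (see the image below).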
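
The relationship between input size, filter size, stride, and output size can be written as a one-line formula. The helper below is a sketch that assumes no padding; its name is hypothetical.

```python
def transposed_conv_output_size(input_size, kernel_size, stride=1):
    """Output size of a transposed convolution with no padding."""
    return (input_size - 1) * stride + kernel_size

# 2x2 input, 2x2 filter
print(transposed_conv_output_size(2, 2, stride=1))  # 3
print(transposed_conv_output_size(2, 2, stride=2))  # 4
```

With a stride of 2 and a 2x2 filter, each partial output lands on its own 2x2 block of the 4x4 result, so nothing overlaps.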

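What Are Transposed Convolutions Used For?

Transposed convolutions are essential for semantic segmentation and for data generation in generative adversarial networks (GANs). A more straightforward example is a neural network trained to increase image resolution. We will build such a network shortly.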

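A Complete Python Example Using Keras/TensorFlow

Setup

We need the following data and libraries:

Caltech 101 image dataset (source)

https://data.caltech.edu/records/20086

Pandas and NumPy for data manipulation
OpenCV, Matplotlib, and Graphviz for loading and displaying images and for rendering the model diagram
TensorFlow/Keras for building the neural network
Scikit-learn for splitting the data (train_test_split)

Let's import the libraries: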

# Tensorflow / Keras
import tensorflow as tf # for building Neural Networks
from tensorflow import keras # for building Neural Networks
print('Tensorflow/Keras: %s' % keras.__version__) # print version
from tensorflow.keras.models import Model # for assembling a Neural Network
from tensorflow.keras import Input # for instantiating a keras tensor
from tensorflow.keras.layers import Conv2D, Conv2DTranspose # for adding layers to our Neural Network
from tensorflow.keras.utils import plot_model # for plotting model diagram


# Data manipulation
import pandas as pd # for data manipulation
print('pandas: %s' % pd.__version__) # print version
import numpy as np # for data manipulation
print('numpy: %s' % np.__version__) # print version


# Sklearn
import sklearn # for model evaluation
print('sklearn: %s' % sklearn.__version__) # print version
from sklearn.model_selection import train_test_split # for splitting the data into train and test samples


# Visualization
import cv2 # for ingesting images
print('OpenCV: %s' % cv2.__version__) # print version
import matplotlib # for showing images
import matplotlib.pyplot as plt # for showing images
print('matplotlib: %s' % matplotlib.__version__) # print version
import graphviz # for showing model diagram
print('graphviz: %s' % graphviz.__version__) # print version


# Other utilities
import sys
import os

# Assign main directory to a variable
main_dir=os.path.dirname(sys.path[0])
#print(main_dir)

The code above prints the package versions I used in this example:

Tensorflow/Keras: 2.7.0
pandas: 1.3.4
numpy: 1.21.4
sklearn: 1.0.1
OpenCV: 4.5.5
matplotlib: 3.5.1
graphviz: 0.19.1

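Next, we download, save, and extract the Caltech 101 image dataset. Note that in this example I will only use the images of pandas (Category="panda") rather than the full list of 101 categories.

At the same time, I prepare the data and keep each image at two different resolutions:

64 x 64 pixels, which will be our low-resolution input data.
256 x 256 pixels, which will be our high-resolution target data.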

# Specify the location of images after you have downloaded them
ImgLocation=main_dir+"/data/101_ObjectCategories/"

# List image categories we are interested in (Only "panda" this time)
#CATEGORIES = set(["dalmatian", "hedgehog", "llama", "panda"])
CATEGORIES = set(["panda"])

# Create a list to store image paths
ImagePaths=[]
for category in CATEGORIES:
    for image in os.listdir(os.path.join(ImgLocation, category)):
        ImagePaths.append(os.path.join(ImgLocation, category, image))

# Load images and resize. We will need images in 64 x 64 and 256 x 256 pixels.
data_lowres=[]
data_hires=[]
for img in ImagePaths:
    image = cv2.imread(img)
    if image is None: # skip files that OpenCV cannot read as images
        continue
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image_lowres = cv2.resize(image, (64, 64))
    image_hires = cv2.resize(image, (256, 256))
    data_lowres.append(image_lowres)
    data_hires.append(image_hires)

# Convert image data to numpy arrays and scale values to [0, 1] (divide by 255 since RGB values range from 0 to 255)
data_lowres = np.array(data_lowres, dtype="float") / 255.0
data_hires = np.array(data_hires, dtype="float") / 255.0

# Show data shape
print("Shape of whole data_lowres: ", data_lowres.shape)
print("Shape of whole data_hires: ", data_hires.shape)


# ---- Create training and testing samples ---
X_train, X_test, Y_train, Y_test = train_test_split(data_lowres, data_hires, test_size=0.2, random_state=0)

# Print shapes
# Note, model input must have a four-dimensional shape [samples, rows, columns, channels]
print("Shape of X_train: ", X_train.shape)
print("Shape of Y_train: ", Y_train.shape)
print("Shape of X_test: ", X_test.shape)
print("Shape of Y_test: ", Y_test.shape)

The code above prints the shapes of the data in the form [samples, rows, columns, channels].

Shape of whole data_lowres:  (38, 64, 64, 3)
Shape of whole data_hires:  (38, 256, 256, 3)
Shape of X_train:  (30, 64, 64, 3)
Shape of Y_train:  (30, 256, 256, 3)
Shape of X_test:  (8, 64, 64, 3)
Shape of Y_test:  (8, 256, 256, 3)

To get a better feel for the data we are working with, let's display a few of the low-resolution images that we will use as inputs.

# Display images of 6 pandas in the training set (low resolution 64 x 64 pixels)
fig, axs = plt.subplots(2, 3, sharey=False, tight_layout=True, figsize=(16,9), facecolor='white')
n=0
for i in range(0,2):
    for j in range(0,3):
        axs[i,j].matshow(X_train[n])
        n=n+1
plt.show()

And here are some of the high-resolution images that will serve as the targets for our model.

# Display images of 6 pandas in the training set (higher resolution 256 x 256 pixels)
fig, axs = plt.subplots(2, 3, sharey=False, tight_layout=True, figsize=(16,9), facecolor='white')
n=0
for i in range(0,2):
    for j in range(0,3):
        axs[i,j].matshow(Y_train[n])
        n=n+1
plt.show()