CN-116844009-B - RGB-D image characteristic collaborative fusion method based on transfer learning and width learning

CN116844009B

Abstract

The invention provides an RGB-D image feature collaborative fusion method based on transfer learning and width learning, which comprises the following steps of obtaining an RGB-D data set, performing preliminary training through a neural network, performing secondary training on the data set after modifying a structure, performing correlation analysis fusion on RGB image features and depth image features after extracting features, and performing classification recognition on the fused features by using width learning. The invention can reasonably integrate the characteristics of the RGB image and the depth image, ensure that the characteristic information of the color image and the depth image can be mutually complemented, improve the running speed of the system by utilizing the width learning, and finally lead the classification result to have higher accuracy and reliability.

Inventors

  • LI PENGYUE
  • HU DONGMEI
  • XING JIWEI
  • XU XINYING

Assignees

  • Taiyuan University of Technology (太原理工大学)

Dates

Publication Date
2026-05-12
Application Date
2023-07-05

Claims (7)

  1. An RGB-D image feature collaborative fusion method based on transfer learning and width learning, characterized by comprising the following steps: S1, acquiring an RGB-D data set from a public data set; S2, constructing a ResNet neural network, performing preliminary training on the ImageNet data set to obtain a preliminarily trained ResNet neural network, and storing it; S3, fine-tuning the preliminarily trained ResNet neural network on the RGB-D data set, removing the last layer of the ResNet neural network, and, after data are input, outputting one feature vector for the RGB image and one for the depth image in the network, thereby obtaining the features of the two images; S4, taking the two feature vectors of the RGB image and the depth image from step S3 as two groups of variables without explicit correlation input to a canonical correlation analysis (CCA) layer, maximizing the correlation coefficient between the two groups of variables, uniformly mapping both groups onto the feature space generated by CCA for learning, and, following the CCA feature-fusion strategy, taking the parallel matrix as the fused feature to realize fusion and dimension reduction of the features; and S5, generating feature nodes and enhancement nodes from the fused features of step S4 by means of width learning, and obtaining the final classification and recognition result via the ridge-regression generalized inverse.
  2. The RGB-D image feature collaborative fusion method based on transfer learning and width learning according to claim 1, wherein in step S1 the specific process is as follows: each object is placed on a turntable, and a 3D camera records one complete rotation period of the turntable; each object comprises 3 video sequences, recorded by cameras at different heights, so that RGB images and depth images under different viewing angles and illumination conditions are obtained to form the data set.
  3. The RGB-D image feature collaborative fusion method based on transfer learning and width learning according to claim 2, wherein in step S2 the ResNet neural network is composed of modules of residual blocks, each module consisting of several residual blocks with the same number of output channels, and the network is constructed using the PyTorch framework.
  4. The RGB-D image feature collaborative fusion method based on transfer learning and width learning according to claim 3, wherein in step S2 the preliminary training is carried out by loading and storing the published ResNet parameters trained on the ImageNet data set.
  5. The RGB-D image feature collaborative fusion method based on transfer learning and width learning according to claim 4, wherein in step S3, after the network is fine-tuned, the last layer of the ResNet neural network is removed; after data are input, the RGB image and the depth image each output a feature vector in the network, and the output feature vectors are subjected to average pooling and a flattening operation.
  6. The RGB-D image feature collaborative fusion method based on transfer learning and width learning according to claim 5, wherein in step S4 the two sets of feature vectors are fused as follows: given two sets of feature vectors X and Y, the purpose of CCA is to find a pair of projection axes w_x and w_y such that, after linear transformation, the correlation between the linear combinations w_x^T X and w_y^T Y is maximized.
  7. The RGB-D image feature collaborative fusion method based on transfer learning and width learning according to claim 6, wherein in step S5 the specific process of width-learning classification and recognition is as follows: the fused feature of step S4 is used as the input sample X; n groups of feature mappings are generated, each comprising k feature nodes; a random weight matrix W_ei whose values follow a Gaussian distribution is generated, and the i-th group of mapped features is Z_i = φ(X W_ei + β_ei), wherein φ represents the activation function and β_ei represents a randomly generated bias; the random weight matrix W_ei is optimized by adopting the sparse auto-encoding idea in width learning; Z^n = [Z_1, ..., Z_n] is defined, m groups of enhancement nodes are generated, the enhancement weights W_hj are computed and scaled, and the enhancement nodes are activated with an activation function, the j-th group of enhancement nodes being expressed as H_j = ξ(Z^n W_hj + β_hj), wherein ξ represents a nonlinear activation function, and W_hj and β_hj represent fixed, randomly generated weights and biases; the output of the width learning is expressed as Y = [Z^n | H^m] W^m = A W^m, wherein W^m represents the output-layer weights of the width learning and A represents all input features of the width learning.
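The pipeline of claims 6 and 7 can be sketched as follows: a minimal NumPy illustration of CCA-based fusion (here using the summed-projection reading of the "parallel" strategy) followed by a broad ("width") learning classifier with a ridge-regression readout. All dimensions, node counts, seeds, and the ridge penalty are illustrative assumptions, and the sparse-autoencoder refinement of the feature-node weights from claim 7 is omitted for brevity.

```python
import numpy as np

def inv_sqrt(S):
    # Symmetric inverse square root via eigendecomposition.
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(1.0 / np.sqrt(np.clip(vals, 1e-10, None))) @ vecs.T

def cca_fuse(X, Y, d, reg=1e-3):
    # Find d pairs of projection axes maximizing correlation between the two
    # views, then fuse by summing the projected views ("parallel" strategy).
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = len(Xc)
    Sxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / n
    U, _, Vt = np.linalg.svd(inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy))
    Wx = inv_sqrt(Sxx) @ U[:, :d]
    Wy = inv_sqrt(Syy) @ Vt[:d].T
    return Xc @ Wx + Yc @ Wy

def bls_fit_predict(F_train, labels, F_test, groups=4, nodes=8, enh=20,
                    lam=1e-2, seed=0):
    # Broad ("width") learning: random feature-mapping nodes, random
    # enhancement nodes, and a ridge-regression generalized-inverse readout.
    rng = np.random.default_rng(seed)
    We = [rng.normal(size=(F_train.shape[1], nodes)) for _ in range(groups)]
    be = [rng.normal(size=nodes) for _ in range(groups)]
    feat = lambda F: np.hstack([np.tanh(F @ W + b) for W, b in zip(We, be)])
    Z = feat(F_train)
    Wh = rng.normal(size=(Z.shape[1], enh))
    bh = rng.normal(size=enh)
    A = np.hstack([Z, np.tanh(Z @ Wh + bh)])           # [Z^n | H^m]
    Yhot = np.eye(int(labels.max()) + 1)[labels]       # one-hot targets
    Wout = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Yhot)
    Zt = feat(F_test)
    At = np.hstack([Zt, np.tanh(Zt @ Wh + bh)])
    return (At @ Wout).argmax(1)

# Toy demo: two synthetic "views" sharing a class-dependent direction,
# standing in for the RGB and depth feature vectors of step S3.
rng = np.random.default_rng(1)
n = 200
labels = rng.integers(0, 2, n)
sig = 2.0 * labels - 1.0
X = np.outer(sig, rng.normal(size=16)) + 0.5 * rng.normal(size=(n, 16))
Y = np.outer(sig, rng.normal(size=12)) + 0.5 * rng.normal(size=(n, 12))

fused = cca_fuse(X, Y, d=4)
pred = bls_fit_predict(fused[:150], labels[:150], fused[150:])
acc = (pred == labels[150:]).mean()
```

Because the readout is a single regularized least-squares solve rather than iterative back-propagation, training is fast, which is the running-speed advantage the abstract attributes to width learning.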

Description

RGB-D image characteristic collaborative fusion method based on transfer learning and width learning

Technical Field

The invention relates to computer vision and image processing technology, and in particular to an RGB-D image feature collaborative fusion method based on transfer learning and width learning.

Background

Computer vision refers to machine vision in which a camera and a computer replace human eyes to identify, track, and measure targets, and then perform graphic processing for classification and identification, pose judgment, size measurement, and the like. With the continuous development of technology, research on computer vision has made breakthrough progress. Classification and identification of images is the most basic and important branch of computer vision and plays an important role in promoting the development of multimedia retrieval technology. In recent years, more and more research works have designed deep convolutional neural networks for RGB image recognition, and these networks have been widely applied in industries such as traffic monitoring, intelligent security, intelligent robots, and automatic assembly of parts. However, in practical applications, RGB images are easily affected by factors such as illumination, background, and mutual occlusion and overlapping among targets; the target recognition rate can be low and targets may not be correctly classified, so the requirements of practical applications cannot be met. With the development of sensor technology, devices capable of acquiring three-dimensional images have emerged, adding depth information of objects to the two-dimensional image.
In recent years, RGB-D cameras such as the Kinect have been able to capture an RGB image and a depth image of an object at the same time. The depth image contains the spatial geometry of the object, so the RGB image and the depth image effectively complement each other's information, and the robustness and accuracy of image recognition can be improved. Therefore, how to sufficiently fuse the two kinds of information is a key issue for RGB and depth images. Current explorations of cross-modal complementary RGB-D fusion networks for RGB image and depth image data are divided into single-stream and dual-stream network architectures. The single-stream network architecture learns RGB image and depth image features together by concatenating the two kinds of data to obtain a feature map. However, direct concatenation ignores the differences between RGB image and depth image data, and the resulting feature map cannot fully express the image. The dual-stream network architecture learns the RGB image and the depth image separately through two independent branches, and then learns a joint representation of the two features through a shared network layer added early or late to obtain the final feature map. This approach extracts features from the depth image and the RGB image separately and fuses the RGB features and depth features at the feature layer, which can effectively improve the accuracy of image classification. However, the deep neural networks used for feature extraction suffer from long computation times, complex structures, and similar problems.
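The single-stream versus dual-stream distinction above can be sketched in a few lines. The toy image sizes, feature dimensions, and the random-projection "extractor" below are purely illustrative stand-ins, not the patent's ResNet branches.

```python
import numpy as np

rng = np.random.default_rng(0)
rgb   = rng.random((32, 32, 3))   # H x W x 3 color image (toy size)
depth = rng.random((32, 32, 1))   # H x W x 1 depth map

# Single-stream: concatenate the modalities on the channel axis, so that one
# network sees a 4-channel input and learns both modalities jointly.
single_input = np.concatenate([rgb, depth], axis=-1)      # 32 x 32 x 4

# Dual-stream: each modality gets its own extractor; fusion happens later at
# the feature level.
def extract(img, out_dim, seed):
    # Stand-in feature extractor: flatten + fixed random projection.
    w = np.random.default_rng(seed).normal(size=(img.size, out_dim))
    return img.reshape(-1) @ w

f_rgb, f_depth = extract(rgb, 64, 1), extract(depth, 64, 2)
fused = np.concatenate([f_rgb, f_depth])                  # 128-dim joint feature
```

The patent's method follows the dual-stream pattern, but replaces the plain concatenation in the last line with the CCA fusion of step S4.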
Therefore, in view of the characteristics of RGB images and depth images, the invention selects a flat network structure that is simple, learns quickly, and requires few steps, introduces transfer learning to establish a network structure that fuses the features of the two kinds of images, and conducts experiments on an RGB-D data set, showing that the network structure has high classification precision and robustness.

Disclosure of Invention

Aiming at the problem that existing RGB images and depth images cannot be fully fused, the invention provides an RGB-D image feature collaborative fusion method based on transfer learning and width learning, which has high accuracy and good stability compared with traditional neural network models. The technical scheme provided by the invention is an RGB-D image feature collaborative fusion method based on transfer learning and width learning, implemented according to the following steps: S1, acquiring the Washington RGB-D data set from a public data set and processing the data to obtain the network input; S2, constructing a ResNet neural network, performing preliminary training on the ImageNet data set to obtain a preliminarily trained ResNet neural network, and storing it; S3, fine-tuning the preliminarily trained ResNet neural network on the RGB-D data set, removing the last layer of the ResNet neural network, and after data are i