US-12626510-B2 - Video data processing method and system, and relevant assemblies

US 12626510 B2

Abstract

A video data processing method, including: obtaining three-dimensional feature data and three-dimensional weight data corresponding to video data; separately preprocessing the three-dimensional feature data and the three-dimensional weight data to obtain a feature value matrix and a weight value matrix; and inputting the feature value matrix and the weight value matrix into a plurality of three-dimensional systolic arrays for parallel computing to obtain a video data processing result. The present method can fully expand the degree of parallelism of computation and a four-dimensional systolic computation architecture is constructed by using multiple three-dimensional systolic arrays, so as to perform parallel computing on a three-dimensional feature value matrix and a three-dimensional weight value matrix, thereby shortening the computation time of three-dimensional convolution, and improving the video data processing efficiency.

Inventors

  • Gang Dong
  • Yaqian ZHAO
  • Rengang Li
  • Hongbin YANG
  • Haiwei Liu
  • Dongdong JIANG

Assignees

  • INSPUR SUZHOU INTELLIGENT TECHNOLOGY CO., LTD.

Dates

Publication Date: 2026-05-12
Application Date: 2021-04-26
Priority Date: 2020-09-25

Claims (16)

  1. A video data processing method, comprising: acquiring, by a video acquisition device, three-dimensional feature data and three-dimensional weight data corresponding to video data; pre-processing the three-dimensional feature data and the three-dimensional weight data, respectively, to obtain a feature value matrix and a weight value matrix; and inputting the feature value matrix and the weight value matrix into a plurality of three-dimensional systolic arrays for parallel calculations to obtain a video data processing result; wherein the video data processing result comprises a classification result or a feature extraction result; wherein inputting the feature value matrix and the weight value matrix into a plurality of three-dimensional systolic arrays for parallel calculations to obtain a video data processing result comprises: calculating the feature value matrix and the weight value matrix in an i-th input channel according to a corresponding target intermediate value through an i-th three-dimensional systolic array to obtain an i-th calculation result, where i=1, 2, . . . , Cin; and obtaining the video data processing result according to a Cin-th calculation result; where the target intermediate value is 0 when i=1, and the target intermediate value is an (i−1)th calculation result when 1<i≤Cin; wherein calculating the feature value matrix and the weight value matrix in an i-th input channel according to a corresponding target intermediate value through an i-th three-dimensional systolic array to obtain an i-th calculation result comprises: storing Cout weight value matrices corresponding to the feature value matrix in the i-th input channel into Cout calculation units of the i-th three-dimensional systolic array, respectively, wherein Cout is a number of output channels; sequentially inputting each sub-feature value matrix corresponding to the feature value matrix in the i-th input channel into the i-th three-dimensional systolic array in a first preset cycle; performing calculation through each of the calculation units according to the target intermediate value, the feature value matrix that is received, and the weight value matrix that is stored, to obtain sub-calculation results corresponding to the calculation units; and obtaining the i-th calculation result based on all the sub-calculation results; wherein inputting each sub-feature value matrix corresponding to the feature value matrix in the i-th input channel into the i-th three-dimensional systolic array comprises: inputting q feature values of an r-th row of each sub-feature value matrix corresponding to the feature value matrix in the i-th input channel into q processing elements of an r-th row of the Cout calculation units of the i-th three-dimensional systolic array in a second preset cycle, where a size of the sub-feature value matrix is p×q, p and q are both positive integers, and r=1, 2, . . . , p−1; wherein a time interval between inputting q feature values in an (r+1)th row of the sub-feature value matrix to a j-th calculation unit and inputting q feature values in the r-th row of the sub-feature value matrix to the j-th calculation unit is the second preset cycle, where j=1, 2, . . . , Cout; wherein the video data is video data taken from a security monitor or collected during an automatic driving process, or video data of a streaming media on-line video.
  2. The video data processing method according to claim 1, wherein pre-processing the three-dimensional feature data to obtain a feature value matrix comprises: splitting the three-dimensional feature data according to a convolution kernel size into a plurality of feature data groups, and converting each of the feature data groups into a corresponding two-dimensional matrix according to a preset mapping relationship; and obtaining the feature value matrix from all the two-dimensional matrices.
  3. The video data processing method according to claim 2, wherein pre-processing the three-dimensional weight data to obtain a weight value matrix comprises: rearranging the three-dimensional weight data according to the preset mapping relationship to obtain the weight value matrix.
  4. The video data processing method according to claim 1, wherein performing calculation through each of the calculation units according to the target intermediate value, the feature value matrix that is received, and the weight value matrix that is stored to obtain sub-calculation results corresponding to the calculation units comprises: performing calculation according to a first relational equation through q processing elements of the r-th row of each of the calculation units to obtain a calculation result of each processing element; wherein the first relational equation is h_rw = t_rw × q_rw + c_rw, where h_rw is a calculation result of a w-th processing element in the r-th row, t_rw is the feature value received by the w-th processing element in the r-th row, q_rw is the weight value of the w-th processing element in the r-th row, c_rw is the target intermediate value corresponding to the w-th processing element in the r-th row, and w=1, 2, . . . , q; and obtaining the sub-calculation results of the calculation units from a sum of the calculation results of all the processing elements in a same column.
  5. The video data processing method according to claim 4, wherein obtaining the video data processing result according to a Cin-th calculation result comprises: acquiring output results of all the calculation units in the Cin-th three-dimensional systolic array; and obtaining the video data processing result according to output results output from the Cout calculation units.
  6. The video data processing method according to claim 5, wherein acquiring output results of all the calculation units in the Cin-th three-dimensional systolic array comprises: acquiring the output results of all the calculation units in the Cin-th three-dimensional systolic array through a second relational equation, wherein the second relational equation is H = Σ_{w=1}^{q} (Σ_{r=1}^{p} h_rw).
  7. An electronic device, comprising: a memory configured to store a computer program; and a processor configured to execute the computer program to implement the video data processing method according to claim 1.
  8. The electronic device according to claim 7, wherein pre-processing the three-dimensional feature data to obtain a feature value matrix comprises: splitting the three-dimensional feature data according to a convolution kernel size into a plurality of feature data groups, and converting each of the feature data groups into a corresponding two-dimensional matrix according to a preset mapping relationship; and obtaining the feature value matrix from all the two-dimensional matrices.
  9. The electronic device according to claim 8, wherein pre-processing the three-dimensional weight data to obtain a weight value matrix comprises: rearranging the three-dimensional weight data according to the preset mapping relationship to obtain the weight value matrix.
  10. The electronic device according to claim 7, wherein the three-dimensional feature data and the three-dimensional weight data corresponding to the video data are acquired in a preset obtaining cycle or acquired after an acquisition instruction is received.
  11. A non-transitory computer-readable storage medium having stored therein a computer program which, when executed by a processor, implements the video data processing method according to claim 1.
  12. The non-transitory computer-readable storage medium according to claim 11, wherein pre-processing the three-dimensional feature data to obtain a feature value matrix comprises: splitting the three-dimensional feature data according to a convolution kernel size into a plurality of feature data groups, and converting each of the feature data groups into a corresponding two-dimensional matrix according to a preset mapping relationship; and obtaining the feature value matrix from all the two-dimensional matrices.
  13. The video data processing method according to claim 1, wherein the video data is taken from a security monitor, or collected during an automatic driving process, or is the video data of a streaming media on-line video.
  14. The video data processing method according to claim 1, wherein the three-dimensional feature data and the three-dimensional weight data corresponding to the video data are acquired in a preset obtaining cycle or acquired after an acquisition instruction is received.
  15. The video data processing method according to claim 1, wherein after the three-dimensional feature data and the three-dimensional weight data are obtained, the three-dimensional feature data and the three-dimensional weight data are subjected to dimension reduction, so as to satisfy the requirements of the scale and time sequence of the three-dimensional systolic array.
  16. The video data processing method according to claim 1, wherein the video data processing result is at least one of a classification result and a feature extraction result.
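The per-element relation recited in claims 4 and 6 can be illustrated with a short software sketch. This is a hedged, illustrative analogue only, not the patented hardware: each processing element computes h_rw = t_rw × q_rw + c_rw, and a calculation unit's output accumulates H = Σ_{w=1}^{q} Σ_{r=1}^{p} h_rw. The function name `unit_output` and the plain nested-list representation are assumptions introduced for illustration.

```python
def unit_output(T, Q, C):
    """Software analogue of one calculation unit.

    T, Q, C are p x q nested lists: feature values t_rw, stored weight
    values q_rw, and target intermediate values c_rw. Each (r, w) entry
    models one processing element computing h_rw = t_rw * q_rw + c_rw;
    the return value is H, the sum of all h_rw over r and w.
    """
    p, q = len(T), len(T[0])
    return sum(T[r][w] * Q[r][w] + C[r][w]
               for r in range(p) for w in range(q))
```

For example, with p = q = 2, all-zero intermediate values reduce H to the plain sum of products of the feature and weight entries.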

Description

CROSS-REFERENCE TO RELATED APPLICATION

The present disclosure claims priority to Chinese patent application No. 202011026282.4, filed with the China National Intellectual Property Administration (CNIPA) on Sep. 25, 2020 and titled "VIDEO DATA PROCESSING METHOD, SYSTEM, AND RELATED COMPONENTS", which is incorporated herein by reference in its entirety.

FIELD

The present disclosure relates to the field of video data processing, and more particularly, to a video data processing method, system, and related components.

BACKGROUND

Video feature extraction is a basic step of video data processing; almost all video analysis and video processing workflows begin with it. Three-dimensional convolutional neural networks (CNNs) have great advantages in video classification, motion recognition, and other fields, because they better capture the temporal and spatial feature information in a video. Three-dimensional convolution is the main calculation step in a three-dimensional CNN, through which video data may be classified or features may be extracted. In the related art, three-dimensional convolution is generally computed by reducing dimension, that is, by transforming and mapping three-dimensional data into two-dimensional or even one-dimensional data for local parallel calculation. However, due to the huge amount of calculation, the calculation runs quite slowly, resulting in inefficient video data processing.
SUMMARY

It is an object of the present disclosure to provide a video data processing method, system, electronic device, and computer-readable storage medium, whereby the degree of parallelism of the calculation is fully expanded: a four-dimensional systolic calculation architecture is constructed from multiple three-dimensional systolic arrays to perform parallel calculations on the feature value matrix and the weight value matrix, which shortens the calculation time of three-dimensional convolution and improves video data processing efficiency.

To this end, the present disclosure discloses a video data processing method, including: acquiring three-dimensional feature data and three-dimensional weight data corresponding to video data; pre-processing the three-dimensional feature data and the three-dimensional weight data, respectively, to obtain a feature value matrix and a weight value matrix; and inputting the feature value matrix and the weight value matrix into a plurality of three-dimensional systolic arrays for parallel calculations to obtain a video data processing result.

In some embodiments, pre-processing the three-dimensional feature data to obtain a feature value matrix includes: splitting the three-dimensional feature data according to a convolution kernel size into a plurality of feature data groups, and converting each of the feature data groups into a corresponding two-dimensional matrix according to a preset mapping relationship; and obtaining the feature value matrix from all the two-dimensional matrices.

In some embodiments, pre-processing the three-dimensional weight data to obtain a weight value matrix includes: rearranging the three-dimensional weight data according to the preset mapping relationship to obtain the weight value matrix.
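The splitting-and-mapping step described above resembles an im2col-style unfolding: each kernel-sized group of the feature volume becomes one row of a two-dimensional matrix, so the convolution reduces to a matrix product with the rearranged weights. The sketch below is a hedged illustration under assumed simplifications (a single input channel, unit stride, no padding); the names `unfold3d` and `conv3d_via_matrix` are illustrative and not from the patent.

```python
def unfold3d(feat, k):
    """Split a D x H x W volume into k x k x k groups (unit stride,
    no padding); flatten each group into one row of a 2D matrix."""
    D, H, W = len(feat), len(feat[0]), len(feat[0][0])
    rows = []
    for d in range(D - k + 1):
        for h in range(H - k + 1):
            for w in range(W - k + 1):
                rows.append([feat[d + a][h + b][w + c]
                             for a in range(k)
                             for b in range(k)
                             for c in range(k)])
    return rows

def conv3d_via_matrix(feat, kernel, k):
    """3D convolution expressed as (feature value matrix) x (rearranged
    weight vector): one dot product per output position."""
    weights = [kernel[a][b][c]
               for a in range(k) for b in range(k) for c in range(k)]
    return [sum(t * q for t, q in zip(row, weights))
            for row in unfold3d(feat, k)]
```

With a 3 x 3 x 3 feature volume and a 2 x 2 x 2 kernel, this yields the eight output values of the valid convolution, one per window position.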
In some embodiments, inputting the feature value matrix and the weight value matrix into a plurality of three-dimensional systolic arrays for parallel calculations to obtain a video data processing result includes: calculating the feature value matrix and the weight value matrix in an i-th input channel according to a corresponding target intermediate value through an i-th three-dimensional systolic array to obtain an i-th calculation result, where i=1, 2, . . . , Cin; and obtaining the video data processing result according to a Cin-th calculation result; where the target intermediate value is 0 when i=1, and the target intermediate value is an (i−1)th calculation result when 1<i≤Cin.

In some embodiments, calculating the feature value matrix and the weight value matrix in an i-th input channel according to a corresponding target intermediate value through an i-th three-dimensional systolic array to obtain an i-th calculation result includes: storing Cout weight value matrices corresponding to the feature value matrix in the i-th input channel into Cout calculation units of the i-th three-dimensional systolic array, respectively, wherein Cout is a number of output channels; sequentially inputting each sub-feature value matrix corresponding to the feature value matrix in the i-th input channel into the i-th three-dimensional systolic array in a first preset cycle; performing calculation through each of the calculation units according to the target intermediate value, the feature value matrix that is received, and the weight value matrix that is stored, to obtain sub-calculation results corresponding to the calculation units; and obtaining
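The chaining of the Cin arrays through the target intermediate value can be sketched in software: the first stage starts from 0, and each later stage adds its own channel's contribution to the previous stage's result, so the Cin-th result is the full sum over input channels. The flat-list representation of per-channel partial products and the name `chained_arrays` below are assumptions for illustration, not the patented architecture.

```python
def chained_arrays(per_channel_products):
    """per_channel_products[i] is the i-th input channel's elementwise
    feature x weight contribution (a flat list, one entry per output
    position). The target intermediate value starts at 0 (i = 1), and
    each stage passes its result forward as the next stage's target
    intermediate value; the final list is the Cin-th calculation result."""
    result = [0] * len(per_channel_products[0])
    for channel in per_channel_products:  # stages i = 1 .. Cin
        result = [c + r for c, r in zip(channel, result)]
    return result
```

In effect each three-dimensional array performs one channel's multiply-accumulate pass, and only the last array's output needs to be read out.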