CN-122024134-A - Food material identification analysis method, system and device based on video image processing

CN 122024134 A

Abstract

The application discloses a food material identification analysis method, system, and device based on video image processing, relating to the technical field of computer vision. It addresses the technical problem that prior-art approaches typically identify food materials from single-modality information in static images and lack fused analysis of multi-modal food material features, resulting in low identification accuracy. In the disclosed method, multi-modal aligned video data are obtained by performing a space-time alignment operation on multi-modal video data; multi-scale fusion feature data are then generated; and hierarchical identification and state analysis are performed on the multi-scale fusion feature data to obtain a food material identification result. This overcomes the insufficient information dimensionality caused by reliance on static single-modality images, compensates for the limited expressiveness of single-modality features, and improves the accuracy of food material identification in complex scenes.

Inventors

  • CHENG FEILONG
  • CHENG FEIYING
  • SHEN CUICUI
  • Hu Hanyun

Assignees

  • 安徽盈川大数据有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-01-23

Claims (10)

  1. A food material identification analysis method based on video image processing, characterized by comprising the following steps: acquiring multi-modal video data; performing a space-time alignment operation on the multi-modal video data to obtain multi-modal aligned video data; performing multi-scale space-time feature extraction and fusion on the multi-modal aligned video data to obtain multi-scale fusion feature data; and performing hierarchical identification and state analysis on the multi-scale fusion feature data to obtain a food material identification result.
  2. The food material identification analysis method based on video image processing according to claim 1, wherein performing the space-time alignment operation on the multi-modal video data to obtain multi-modal aligned video data comprises: extracting a plurality of frame images corresponding to the multi-modal video data, wherein the multi-modal video data comprise RGB video, depth video and near-infrared video; taking the RGB video as a reference, matching each frame of the depth video and the near-infrared video to the time points of the RGB video by nearest-neighbor interpolation to obtain time-aligned multi-modal video data; in the time-aligned multi-modal video data, taking the RGB video as a reference, remapping with a pre-calibrated transformation matrix so that each frame of the depth video and the near-infrared video is aligned to the coordinate system of the corresponding RGB frame, obtaining space-aligned multi-modal video data; and performing a modal preprocessing operation on the space-aligned multi-modal video data to obtain the multi-modal aligned video data, wherein the modal preprocessing operation comprises RGB preprocessing, depth preprocessing and near-infrared preprocessing.
  3. The food material identification analysis method based on video image processing according to claim 1, wherein performing multi-scale space-time feature extraction and fusion on the multi-modal aligned video data to obtain multi-scale fusion feature data comprises: extracting frame images corresponding to a plurality of time points in the multi-modal aligned video data, wherein the frame images comprise RGB images, depth images and near-infrared images; performing multi-scale feature extraction on the frame images corresponding to the plurality of time points through corresponding feature extraction network models to obtain modal feature sequences, wherein the modal feature sequences comprise an RGB feature sequence, a depth feature sequence and a near-infrared feature sequence; and performing cross-modal feature fusion and time-series aggregation on the modal feature sequences to obtain the multi-scale fusion feature data.
  4. The food material identification analysis method based on video image processing according to claim 3, wherein the cross-modal feature fusion comprises: extracting a plurality of scale features from the modal feature sequences; performing channel normalization and spatial alignment on the scale features of the plurality of modalities at the same scale to obtain a plurality of aligned scale features; reshaping the aligned scale features into two-dimensional feature matrices and computing a similarity matrix between any two feature matrices, wherein the similarity matrix represents the degree of similarity of the feature vectors of different modalities' scale features at corresponding spatial positions; computing, through a Softmax function, the attention weight of the current modality's scale feature relative to the scale features of the other modalities, wherein the attention weight indicates the degree to which the other modalities' scale features supplement the current modality's scale feature during fusion; weighting and fusing each modality's scale feature with the scale features of the other modalities according to the attention weights to obtain each modality's scale-enhanced feature; fusing the scale-enhanced features of the plurality of modalities through a gating mechanism to obtain a preliminary scale fusion feature at the current scale; and performing cross-scale interaction on the preliminary scale fusion features at a plurality of scales to obtain the scale fusion features, wherein the cross-scale interaction is realized through bidirectional top-down and bottom-up feature propagation paths.
  5. The food material identification analysis method based on video image processing according to claim 3, wherein the time-series aggregation comprises: extracting the scale fusion feature sequence obtained from the cross-modal feature fusion; performing scale-wise temporal reorganization on the scale fusion feature sequence to obtain scale fusion feature sequences corresponding to a plurality of scales; performing a time-series aggregation operation on the scale fusion feature sequence of each scale through a time-series aggregation network to obtain a scale-aggregated fusion feature sequence and a scale global feature for each scale; reshaping the scale-aggregated fusion feature sequence of each scale to obtain a multi-scale fusion feature sequence, wherein the reshaping converts the scale-aggregated fusion feature sequences of the plurality of scales into the same format as the scale fusion feature sequence; and determining the multi-scale fusion feature data based on the multi-scale fusion feature sequence and the scale global features.
  6. The food material identification analysis method based on video image processing according to claim 1, wherein the hierarchical identification and state analysis comprises: acquiring environmental data; extracting the multi-scale fusion feature data, wherein the multi-scale fusion feature data comprise a multi-scale fusion feature sequence and scale global features; generating scale feature fusion weights based on the environmental data and the scale global features; performing feature fusion on the multi-scale fusion feature sequence and the scale global features through the scale feature fusion weights to obtain a comprehensive fusion feature sequence and a comprehensive global feature; inputting the comprehensive global feature into a hierarchical classification model to obtain a hierarchical classification result; and inputting the comprehensive fusion feature sequence into a state trend analysis model to obtain a trend change result, wherein the state trend analysis model is constructed from an artificial intelligence model and is used for analyzing how the freshness of the food material changes over time.
  7. The food material identification analysis method based on video image processing according to claim 6, wherein the hierarchical classification model comprises a feature extraction layer and a hierarchical classification layer; the feature extraction layer is used for extracting high-level semantic features from the comprehensive global feature to obtain a feature vector, and consists of a global average pooling layer and a fully connected layer; the hierarchical classification layer comprises a major-class classifier and a subclass classifier; the input of the major-class classifier is the feature vector, and its output is a major-class classification result; the input of the subclass classifier is the feature vector together with the major-class classification result, and its output is a subclass classification result.
  8. The food material identification analysis method based on video image processing according to claim 6, wherein generating the scale feature fusion weights based on the environmental data and the scale global features comprises: extracting the environmental data and a plurality of scale global features, wherein the environmental data comprise illumination intensity, temperature and humidity; passing the plurality of scale global features through corresponding linear layers to obtain scale global features of the same dimension; concatenating each scale global feature with the environmental data to obtain a plurality of comprehensive scale features; passing the plurality of comprehensive scale features through a fully connected network to obtain scores corresponding to the plurality of scales, wherein the fully connected network consists of a linear layer and an activation function; and obtaining the scale feature fusion weights corresponding to the scales through softmax normalization.
  9. A food material identification analysis system based on video image processing, characterized by comprising a data acquisition module and a data analysis module connected to each other; the data acquisition module is used for acquiring multi-modal video data through data acquisition equipment; the data analysis module comprises a data processing unit, a modal fusion unit and a result generation unit; the data processing unit performs a space-time alignment operation on the multi-modal video data to obtain multi-modal aligned video data; the modal fusion unit is used for performing multi-scale space-time feature extraction and fusion on the multi-modal aligned video data to obtain multi-scale fusion feature data; and the result generation unit is used for performing hierarchical identification and state analysis on the multi-scale fusion feature data to obtain a food material identification result.
  10. A food material identification analysis device based on video image processing, comprising a processor and a storage medium, wherein the storage medium stores instructions, and the processor is used for executing the instructions to implement the food material identification analysis method based on video image processing according to any one of claims 1-8.
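The cross-modal attention fusion of claim 4 can be illustrated with a minimal NumPy sketch for two modalities at a single scale. The per-position dot-product similarity, the softmax over spatial positions, and the scalar sigmoid gate are all simplifying assumptions for illustration; the claims do not fix these implementation details.

```python
import numpy as np

def cross_modal_attention_fuse(feat_a, feat_b):
    """Fuse two modality feature maps (e.g. RGB and depth) at one scale.

    feat_a, feat_b: (C, H, W) arrays assumed to be channel-normalised
    and spatially aligned already.
    """
    c, h, w = feat_a.shape
    A = feat_a.reshape(c, h * w)           # reshape to 2-D feature matrices
    B = feat_b.reshape(c, h * w)
    # similarity of the two modalities' feature vectors per spatial position
    sim = (A * B).sum(axis=0)              # shape (H*W,)
    # softmax attention: how much one modality supplements the other
    attn = np.exp(sim - sim.max())
    attn /= attn.sum()
    enhanced_a = A + attn[None, :] * B     # weighted complement from B
    enhanced_b = B + attn[None, :] * A     # symmetric enhancement of B
    # scalar gate blending the two enhanced streams (hypothetical gate)
    gate = 1.0 / (1.0 + np.exp(-(A.mean() - B.mean())))
    fused = gate * enhanced_a + (1.0 - gate) * enhanced_b
    return fused.reshape(c, h, w)
```

A full implementation would repeat this at every scale and follow it with the claimed top-down and bottom-up cross-scale propagation.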

Description

Food material identification analysis method, system and device based on video image processing

Technical Field

The application belongs to the technical field of computer vision, and particularly relates to a food material identification and analysis method, system and device based on video image processing.

Background

Against the current backdrop of the rapid development of intelligent kitchens, intelligent catering and health management, food material identification and analysis based on video image processing is becoming an important bridge connecting artificial intelligence with daily life. The method uses computer vision and deep learning to automatically identify food material types from image data collected by a camera. Its core significance lies not only in advancing the automation and intelligence of food processing, but also in promoting the evolution of dietary behavior in scientific, personalized and healthy directions. In the prior art, identification is often performed based on single-modality information from static images, and fused analysis of the multidimensional characteristics of food materials is lacking, so the accuracy of food material identification is low; food material identification analysis methods for video image processing therefore still need further improvement.

Disclosure of Invention

The application aims to solve at least one of the technical problems in the prior art, and provides a food material identification and analysis method, system and device based on video image processing, which address the technical problem that the prior art usually performs identification based on single-modality information from static images and lacks fused analysis of the multidimensional characteristics of food materials, resulting in low food material identification accuracy.
To achieve the above object, a first aspect of the present application provides a food material recognition analysis method based on video image processing, comprising: acquiring multi-modal video data; performing a space-time alignment operation on the multi-modal video data to obtain multi-modal aligned video data; performing multi-scale space-time feature extraction and fusion on the multi-modal aligned video data to obtain multi-scale fusion feature data; and performing hierarchical identification and state analysis on the multi-scale fusion feature data to obtain a food material identification result.

The method breaks through the limitation of traditional static images by using multi-modal video data. It then performs multi-scale space-time feature extraction and cross-modal fusion on the aligned video sequence, building a unified feature representation that integrates texture details, three-dimensional geometric structure and internal composition information, which markedly enhances the completeness and discriminability of food material features. On this basis, hierarchical identification and temporal state modeling achieve dynamic, accurate judgment of food material types and their freshness trends, improving the robustness and accuracy of food material identification in complex real-world scenes.
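The four claimed steps can be sketched as a top-level flow. Every function body below is a placeholder stand-in with hypothetical names and toy return values, not the patented algorithm itself:

```python
# Minimal end-to-end sketch of the four claimed steps; each function
# body is a placeholder, not the actual patented processing.

def spatiotemporal_align(video):            # step 2: time + space alignment
    return {m: frames for m, frames in video.items()}

def extract_and_fuse_multiscale(aligned):   # step 3: multi-scale fusion
    return {"fused_sequence": aligned, "global_feature": len(aligned)}

def hierarchical_classify(features):        # step 4a: major class, then subclass
    return ("vegetable", "tomato")          # placeholder labels

def analyze_state_trend(features):          # step 4b: freshness over time
    return "freshness stable"               # placeholder trend

def recognize_food_material(video):
    """video: dict mapping modality name ('rgb', 'depth', 'nir') to frames."""
    aligned = spatiotemporal_align(video)
    features = extract_and_fuse_multiscale(aligned)
    return hierarchical_classify(features), analyze_state_trend(features)
```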
Further, performing the space-time alignment operation on the multi-modal video data to obtain multi-modal aligned video data includes: extracting a plurality of frame images corresponding to the multi-modal video data, wherein the multi-modal video data comprise RGB video, depth video and near-infrared video; taking the RGB video as a reference, matching each frame of the depth video and the near-infrared video to the time points of the RGB video by nearest-neighbor interpolation to obtain time-aligned multi-modal video data; in the time-aligned multi-modal video data, taking the RGB video as a reference, remapping with a pre-calibrated transformation matrix so that each frame of the depth video and the near-infrared video is aligned to the coordinate system of the corresponding RGB frame, obtaining space-aligned multi-modal video data; and performing a modal preprocessing operation on the space-aligned multi-modal video data to obtain the multi-modal aligned video data, wherein the modal preprocessing operation comprises RGB preprocessing, depth preprocessing and near-infrared preprocessing, and is used to improve the image quality of the frame images and reduce noise.

The method takes the RGB video as a reference, uses nearest-neighbor interpolation to synchronize the depth and near-infrared video frames in the time dimension, uses a pre-calibrated transformation matrix to remap in the spatial dimension so that the multi-modal data are unified into the same coordinate system, and applies dedicated preprocessing to each modality, which ensures strict consistency of the multi-modal data in ti
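The temporal and spatial alignment described above can be sketched as follows, assuming per-frame timestamps for each stream and a pre-calibrated 3x3 homography mapping each sensor's pixels into the RGB camera's grid (both are assumptions for illustration; the patent only specifies nearest-neighbor time matching and a pre-calibrated transformation matrix):

```python
import numpy as np

def nearest_neighbor_align(ref_timestamps, src_timestamps, src_frames):
    """For each reference (RGB) timestamp, pick the source frame
    (depth or near-infrared) whose timestamp is closest."""
    aligned = []
    for t in ref_timestamps:
        idx = int(np.argmin(np.abs(np.asarray(src_timestamps) - t)))
        aligned.append(src_frames[idx])
    return aligned

def remap_to_rgb(frame, transform):
    """Warp a source frame into the RGB camera's pixel grid using a
    pre-calibrated 3x3 homography (hypothetical calibration)."""
    h, w = frame.shape[:2]
    out = np.zeros_like(frame)
    inv = np.linalg.inv(transform)
    ys, xs = np.mgrid[0:h, 0:w]
    # back-project every output pixel to its source location
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = inv @ coords
    sx = np.round(src[0] / src[2]).astype(int)
    sy = np.round(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out[ys.ravel()[valid], xs.ravel()[valid]] = frame[sy[valid], sx[valid]]
    return out
```

In practice a library routine such as OpenCV's perspective warp would replace the hand-rolled remapping loop; the sketch only makes the claimed geometry explicit.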