CN-121982399-A - Food classification deep learning method adapting to few sample scenes
Abstract
The invention discloses a food classification deep learning method adapted to few-sample scenes. The method comprises: obtaining a few-sample food classification data set; preprocessing and augmenting the original images in the data set to obtain an image x; constructing an RL-Clip model to perform feature processing on the image x and obtain a food classification probability matrix; and optimizing the image processing process. The method thereby improves the generalization of the detection model and the detection efficiency, accuracy, and confidence on a specific data set.
Inventors
- XU XIANYING
- XU ZHI
- GAO SUO
- CAO YINGHONG
- MOU JUN
Assignees
- Dalian Polytechnic University (大连工业大学)
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-01-23
Claims (4)
- 1. A food classification deep learning method adapted to few-sample scenes, characterized by comprising the following steps: Step 1, data acquisition: the few-sample food classification data set adopts the Padang food data set, comprising 993 original images, all of which are color images with three color channels, covering 9 classes of Padang dishes. Step 2, data enhancement: preprocess and augment the original images in the few-sample food classification data set; first scale each original image to a uniform size of 224×224 pixels to obtain a scaled image; normalize the scaled image by dividing each element by 255, mapping it to the [0,1] interval to obtain a Padang food normalized image; finally, perform a zero-mean standardization on each channel based on the per-channel pixel mean and standard deviation of the Padang food normalized image to obtain a standardized food image, and apply random automatic data augmentation and sample-mixing augmentation to the standardized food image to obtain an image x. Step 3, construct the RL-Clip model and perform feature processing on the image x to obtain a food classification probability matrix. Step 4, optimize the image processing process.
- 2. The food classification deep learning method adapted to few-sample scenes according to claim 1, wherein the specific method of step 3 comprises: Patch segmentation and vectorization of the image x, with the formula z = Flatten(Patch(x)), where z denotes the 768-dimensional token vectors of the image x after segmentation and Patch(·) denotes dividing the image into 16×16 pixel blocks, each of which Flatten(·) flattens into a one-dimensional vector. The token vectors z enter the Transformer Encoder module of the image branch of the CLIP model, which specifically includes: Layer Normalization, with the formula z' = γ·(z − μ)/σ + β, where z' denotes the 768-dimensional token vectors after Layer Normalization, μ and σ are respectively the mean and standard deviation of z, and γ and β are learnable scaling and offset parameters; a 12-head self-attention mechanism, with the formula MSA(z') = Concat(head_1, …, head_12)·W^O, head_i = Attention(z'·W_i^Q, z'·W_i^K, z'·W_i^V), where MSA(z') denotes the 768-dimensional feature vectors after the 12-head multi-head self-attention mechanism, head_i is the output of the i-th attention head, W_i^Q, W_i^K, and W_i^V are the transform weights of the i-th attention head, Concat(·) is the feature concatenation operation, and W^O is a linear projection matrix; and residual connection, with the formula v = z + MSA(z'), where v denotes the 768-dimensional visual feature vectors after the residual connection of z with MSA(z'), and + denotes element-wise addition of feature vectors. The visual feature vectors v undergo a Fixed Random Attention Pooling (FRAP) operation to obtain a feature vector f adapted to the classification task, with the formula f = Pool(v), where f denotes the pooled feature vector and Pool(·) denotes a pooling operation with frozen weight parameters. Dual-path feature fusion and classification: the feature vector f undergoes a
dual-path parallel RVFL operation and a gated fusion operation to obtain the final food classification result, specifically: Path 1, the RVFL hidden-layer operation, with the formula h = ReLU(W_r·f + b_r), where h denotes the higher-order feature vector obtained from f by the RVFL hidden layer, W_r is the randomly fixed weight matrix of the RVFL hidden layer, b_r is the RVFL hidden-layer bias parameter, and ReLU is the nonlinear activation function; Path 2, the linear skip mapping operation, with the formula s = W_l·f, where s denotes the original visual features of f obtained by the linear skip mapping and W_l is the linear weight coefficient; and the gated fusion operation, which integrates the feature vectors of the two paths and gives the final prediction, with the formula F = g⊙h + (1 − g)⊙s, P = softmax(Linear(F)), where F denotes the final feature matrix of the food image, Linear(·) denotes the linear layer that converts the feature matrix of the food picture into prediction probabilities, P denotes the finally obtained classification probability matrix, i.e. the probability that each class is predicted to be true, the highest being the predicted result, g is a trainable gated fusion weight coefficient, ⊙ denotes element-wise multiplication, and + denotes element-wise addition of feature vectors.
- 3. The food classification deep learning method adapted to few-sample scenes according to claim 1, wherein the specific method of step 4 comprises: the cross-entropy loss is used to evaluate the progress of model training and to optimize the parameters of the fine-tuned CLIP image encoder part and the RVFL classifier part, with the formula L = −(1/N)·Σ_i Σ_c y_{i,c}·log(p_{i,c}), where L denotes the loss function, N is the number of pictures, c is a specific class, y_{i,c} denotes the true label of the i-th sample belonging to class c, and p_{i,c} denotes the prediction probability of the i-th sample belonging to class c, i.e. an element of the probability matrix P obtained in step 3.
- 4. The food classification deep learning method adapted to few-sample scenes according to claim 1, wherein the food classification data set is derived from the Kaggle website; its data structure is divided into two folders, a training set (train) and a test set (val), each folder containing classified images and corresponding labels; and the data set comprises nine categories of Padang dishes, namely Padang fried chicken, Padang tender boiled chicken, Padang coconut-milk stewed beef, Padang shredded jerky, Padang coconut-milk curry fish, Padang coconut-milk curry stewed bone and meat, Padang coconut-milk curry beef tendon, Padang chili-sauce fried chicken eggs, and Padang fried egg rolls.
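The preprocessing chain of claim 1 (uniform 224×224 scaling, division by 255, per-channel zero-mean standardization) can be sketched in numpy. This is a minimal illustration, not the patented implementation: the nearest-neighbour resize is an assumption (the claim does not fix an interpolation method), and the random automatic data augmentation and sample-mixing stages are omitted.

```python
import numpy as np

def preprocess(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Claim-1 style preprocessing: resize -> [0,1] -> per-channel zero-mean/unit-std.

    `img` is an H x W x 3 uint8 color image. Nearest-neighbour scaling is an
    assumption; the claim only states that images are scaled to 224 x 224.
    """
    h, w, _ = img.shape
    # Nearest-neighbour scaling to a uniform 224 x 224.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    scaled = img[rows][:, cols].astype(np.float64)
    # Map each element to [0, 1] by dividing by 255.
    norm = scaled / 255.0
    # Zero-mean, unit-std standardization per color channel.
    mean = norm.mean(axis=(0, 1), keepdims=True)
    std = norm.std(axis=(0, 1), keepdims=True)
    return (norm - mean) / (std + 1e-8)

# Stand-in for one original color image of arbitrary size.
demo = np.random.default_rng(0).integers(0, 256, (300, 400, 3)).astype(np.uint8)
out = preprocess(demo)
print(out.shape)  # (224, 224, 3)
```

After this step each channel of `out` has mean ≈ 0 and standard deviation ≈ 1, which is what the zero-mean standardization in the claim requires before augmentation is applied.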
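The cross-entropy loss described in claim 3 can be written out numerically for one-hot labels. The averaging over N samples and the small epsilon guard against log(0) are standard conventions assumed here, not details taken from the claim.

```python
import numpy as np

def cross_entropy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """L = -(1/N) * sum_i sum_c y_{i,c} * log(p_{i,c}) over N samples."""
    eps = 1e-12  # guards against log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1)))

# Two samples, three classes: one confident correct prediction, one less so.
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
y_pred = np.array([[0.9, 0.05, 0.05],
                   [0.2, 0.6,  0.2]])
loss = cross_entropy(y_true, y_pred)
print(round(loss, 4))  # 0.3081, i.e. -(ln 0.9 + ln 0.6) / 2
```

Only the probability assigned to the true class of each sample contributes to the loss, so driving it down pushes the probability matrix P of step 3 toward the one-hot labels.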
Description
Food classification deep learning method adapting to few sample scenes

Technical Field

The invention relates to the technical field of deep learning, and in particular to a food classification deep learning method adapted to few-sample scenes.

Background

Regional characteristic food is one of the core carriers of regional culture: it not only bears a unique food-culture connotation but is also an important foundation for developing digital and intelligent catering services for regional tourism. Such food embodies rich regional dietary characteristics, such as ingredient pairing, cooking technique, and appearance, and these characteristics are key identifiers for the recognition and transmission of regional culture. However, as with other image classification tasks, factors such as the difficulty of collecting regional characteristic food samples, high labeling cost, and small appearance differences between categories make it difficult to construct a large-scale labeled food image data set. As a result, the performance of mainstream food classification models drops significantly in few-sample scenes: category discrimination precision is insufficient, generalization capability is weak, and the practical usability of the models is reduced. Traditional regional food classification methods determine food categories through manual visual recognition, food component analysis, and the like, which are highly subjective, inefficient, and costly. The literature has explored solving the few-sample food classification problem with different artificial intelligence algorithms or deep learning models, but most models generalize poorly, are effective only on specific small-scale data sets, cannot reconcile classification accuracy with the F1 score, and cannot meet the actual requirements of digital preservation of regional food and intelligent catering services.
According to the invention, the CLIP model comes from the Contrastive Language-Image Pre-training model published by OpenAI in 2021. The Padang data set adopted for the few-sample food classification experiment comes from the Kaggle website and can be obtained at https://www.kaggle.com/datasets/faldoae/padangfood

Disclosure of Invention

The invention aims to solve the problems that existing food classification methods have a poor classification effect on small samples, that the generalization of existing food classification models is not high, that they are effective only on each class of a specific data set, and that their accuracy and confidence are not high. To solve the above problems, the present invention provides a food classification deep learning method for few-sample scenes, comprising: Step 1, data acquisition: the few-sample food classification data set adopts the Padang food data set, comprising 993 original images, all of which are color images with three color channels, covering 9 classes of Padang dishes. Step 2, data enhancement: preprocess and augment the original images in the few-sample food classification data set; first scale each original image to a uniform size of 224×224 pixels to obtain a scaled image; normalize the scaled image by dividing each element by 255, mapping it to the [0,1] interval to obtain a Padang food normalized image; finally, perform a zero-mean standardization on each channel based on the per-channel pixel mean and standard deviation of the Padang food normalized image to obtain a standardized food image, and apply random automatic data augmentation and sample-mixing augmentation to the standardized food image to obtain an image x. Step 3, construct the RL-Clip model and perform feature processing on the image x to obtain a food classification probability matrix. Step 4, optimize the image processing process. Preferably, the specific method of step 3 includes: Patch segmentation and vectorization of the image x, with the formula z = Flatten(Patch(x)), where z denotes the 768-dimensional token vectors of the image x after segmentation and Patch(·) denotes dividing the image into 16×16 pixel blocks, each of which Flatten(·) flattens into a one-dimensional vector. The token vectors z enter the Transformer Encoder module of the image branch of the CLIP model, which specifically includes: Layer Normalization, with the formula z' = γ·(z − μ)/σ + β, where z' denotes the 768-dimensional token vectors after Layer Normalization, μ and σ are respectively the mean and standard deviation of z, and γ and β are learnable scaling and offset parameters; a 12-head self-attention mechanism, with the formula MSA(z') = Concat(head_1, …, head_12)·W^O, where MSA(z') denotes the 768-dimensional feature vectors processed by the 12-head multi-head self-attention mecha