CN-121982386-A - Vision-based dairy wastewater floc detection method
Abstract
The invention discloses a vision-based method for detecting flocs in dairy wastewater. It relates to the field of visual floc detection and addresses three problems: inter-class differences between dairy wastewater floc images are small, the prior art is limited in global information modeling and multi-scale feature fusion, and classification accuracy is insufficient. The method first divides a floc image into image blocks of fixed size and applies linear embedding and position encoding; it then introduces a cross-attention mechanism to realize dynamic interaction and fusion of features at different scales, and combines this with a loss function to improve discrimination speed. The model achieves high classification accuracy, consistent predictions across classes, and strong multi-scale feature capture; its misjudgment rate on fine-grained subclasses is markedly reduced, and its generalization and classification stability are markedly improved. The detection method based on this model is fast and accurate in identification, and can provide reliable guidance for automatic flocculant dosing.
Inventors
- XU JUNPENG
- WANG WENZHAN
- CHEN BOWEN
- LI WEI
- LIANG SHUNTING
- LI HAO
- LIN KAIKAI
- WANG WENTENG
- GAO SHUAIPENG
- XU YAOCHANG
- CHEN GUANGSHENG
- NIE FUQUAN
- ZHANG WEIDONG
- WANG YU
- LU SIYU
- CHENG JINHAO
Assignees
- 河南科技学院 (Henan Institute of Science and Technology)
- 河南绿丰环保工程有限公司 (Henan Lvfeng Environmental Protection Engineering Co., Ltd.)
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2026-01-16
Claims (6)
- 1. A vision-based dairy wastewater floc detection method, characterized by comprising the following steps: Step 1, obtaining a visual image of a sample; Step 2, inputting the visual image obtained in Step 1 into a Vim model, the Vim model generating several groups of images at different scales, recorded as X_s, each group containing N_s non-overlapping image blocks; Step 3, flattening each image block into a vector and mapping the vector into a D-dimensional embedding space to form high-dimensional vector features; Step 4, fusing the high-dimensional vector features of each scale obtained in Step 3 through a cross-attention fusion module, performing feature conversion and compression through a Mamba encoding module, mapping the converted features into multi-dimensional raw scores through a fully connected network layer, and then mapping the multi-dimensional raw scores into a probability distribution through the Softmax function; the image recognition process in this step is expressed as ŷ = Softmax(M(I)), wherein I denotes a given input floc image; Softmax(·) denotes the output-layer activation function that maps the raw output of the model to a probability distribution over the categories; I ∈ ℝ^{H×W×C} is the real-valued matrix representation of the input floc image, H and W being respectively the height and width of the image; and M denotes the model adopted for the floc image classification task; Step 5, outputting the probability distribution.
- 2. The vision-based dairy wastewater floc detection method as claimed in claim 1, characterized in that, in Step 2, the visual image obtained in Step 1 is X ∈ ℝ^{H×W×C}, wherein H and W are respectively the height and width of the image and C is the number of channels; the number N of non-overlapping image blocks is calculated as N = (H × W) / P², wherein each image block has a size of P × P.
- 3. The vision-based dairy wastewater floc detection method according to claim 2, characterized in that, in Step 3, the flattened vectors are mapped into the D-dimensional embedding space through a learnable linear projection matrix E, the embedding vector z_i being given by z_i = x_i E, wherein x_i is the flattened vector of the i-th image block, arranged into a shape of 1 × (P²·C), and E ∈ ℝ^{(P²·C)×D} is the projection matrix applied to the flattened mapping.
- 4. The vision-based dairy wastewater floc detection method according to claim 2, characterized in that, in Step 3, the Vim model adds a position encoding p_i to each embedding vector z_i to obtain an embedded sequence Z: z'_i = z_i + p_i, wherein the position encoding has the same dimension as the embedding vector z_i; the Vim model further prepends a classification token at the beginning of the embedded sequence Z, the dimension of the classification token being the same as that of the embedded image blocks.
- 5. The vision-based dairy wastewater floc detection method according to claim 1, characterized in that, in Step 4, the output of the cross-attention fusion module is calculated as Attention(Q, K, V) = Softmax(Q Kᵀ / √d_k) V, wherein Q is the query sequence of the image blocks, K is the key sequence of the image blocks, V is the value sequence of the image blocks, √d_k is the square root of the dimension of the key vectors in the cross-attention fusion module, and Kᵀ denotes the transpose of the key sequence K.
- 6. The vision-based dairy wastewater floc detection method according to claim 1, characterized in that, in Step 4, a Focal Loss function is used to optimize the mapping to the probability distribution; the loss function is FL(p_t) = −α_t (1 − p_t)^γ log(p_t), wherein α_t is the balance factor, equal to α for a positive sample and 1 − α for a negative sample, with α taking a value in [0, 1]; γ is the focusing factor; and p_t is the predicted probability.
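The cross-attention fusion of claim 5 and the Focal Loss of claim 6 are both standard formulations, and can be sketched as follows. This is a minimal NumPy illustration of the two formulas only, not the patented model: the function names, the single-head layout, and the default α = 0.25, γ = 2.0 are illustrative assumptions, and the actual Vim/Mamba pipeline is not reproduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, k, v):
    """Scaled dot-product cross-attention: Softmax(Q K^T / sqrt(d_k)) V.

    Queries q come from one patch scale; keys k and values v come from
    another, so features of different scales can attend to each other.
    """
    d_k = k.shape[-1]                    # dimension of the key vectors
    scores = q @ k.T / np.sqrt(d_k)      # (N_q, N_k) attention logits
    return softmax(scores, axis=-1) @ v  # (N_q, d_v) fused features

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary Focal Loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t),
    with alpha_t = alpha for positives (y=1) and 1 - alpha for negatives."""
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

The (1 − p_t)^γ factor down-weights well-classified samples, which matches the claim's purpose of reducing misjudgments on hard, visually similar floc subclasses.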
Description
Vision-based dairy wastewater floc detection method
Technical Field
The invention relates to the technical field of visual floc detection, in particular to a vision-based dairy wastewater floc detection method.
Background
Images of dairy wastewater flocs show small inter-class differences because their colors and morphologies are similar, and traditional convolutional neural networks (CNNs) and traditional vision models (such as the ViT and ResNet models) are limited in global information modeling and multi-scale feature fusion and have insufficient classification accuracy. Taking the ViT model as an example: as shown in Fig. 1, a prior-art Vision Transformer (i.e. ViT) model divides an input image into grid blocks from left to right and from top to bottom. Each grid block is then flattened and position-encoded, spatial modeling is performed with a Transformer, and global pooling yields an image representation; a fused spatio-temporal representation is generated with an MLP head and a Transformer encoder and finally output. However, the Transformer suffers from high resource consumption and low operating speed, whereas the Mamba structure offers high speed and low resource consumption, so the prior art replaces the Transformer with Mamba to obtain the Vision Mamba model (i.e. the Vim model, as shown in Fig. 2). The existing Vim model, however, uses only a single scale, so it can capture the spatial position relations of the image only at that scale and cannot effectively model and extract the spatial relations inside a patch. This leads to false judgments in vision-based assessment of dairy wastewater floc morphology and hampers intelligent guidance of automatic flocculant dosing.
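The patch pipeline described above (divide into non-overlapping P × P blocks, flatten, project linearly, add position encodings, prepend a class token) can be sketched in a few lines of NumPy. This is a generic ViT/Vim-style patch-embedding illustration, not the patented model: the random matrices stand in for parameters that would be learned, and the function name and signature are assumptions for this sketch.

```python
import numpy as np

def patch_embed(image, P, D, seed=0):
    """Split an H x W x C image into non-overlapping P x P patches,
    flatten each patch, project it into a D-dimensional embedding space,
    add a position encoding, and prepend a classification token.
    Random matrices stand in for learned parameters."""
    rng = np.random.default_rng(seed)
    H, W, C = image.shape
    assert H % P == 0 and W % P == 0, "image size must be divisible by P"
    N = (H * W) // (P * P)                     # number of non-overlapping patches
    patches = (image.reshape(H // P, P, W // P, P, C)
                    .transpose(0, 2, 1, 3, 4)  # group the P x P blocks together
                    .reshape(N, P * P * C))    # (N, P^2 * C) flattened patches
    E = rng.normal(size=(P * P * C, D))        # learnable projection (stand-in)
    pos = rng.normal(size=(N, D))              # position encodings (stand-in)
    z = patches @ E + pos                      # (N, D) embedded sequence
    cls = rng.normal(size=(1, D))              # classification token, same dim D
    return np.concatenate([cls, z], axis=0)    # (N + 1, D)
```

For a 224 × 224 × 3 image with P = 16, this yields N = 224·224/16² = 196 patches, so the embedded sequence has 197 rows including the class token.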
The cross-attention mechanism (Cross-Attention) is an existing mechanism for solving the problem of information alignment between different modalities in multi-modal tasks; because the Vim model is an efficient alternative to the self-attention mechanism designed for single-modality visual tasks, the two mechanisms share no application scenario or technical goal, so the prior art has not combined them. Chinese patent CN119785239A discloses a dynamic detection and early-warning method and system for pond reconstruction based on unmanned aerial vehicle (UAV) images; its technical scheme connects a U-Mamba module, a multi-scale ViT module, and a multi-scale feature fusion module into a dynamic detection and early-warning system that processes and segments real-time video streams acquired by UAVs to identify pond areas. That prior art targets macroscopic scenes: the UAV aerial video of pond areas (512×512 resolution) involves data with large target scales, complex backgrounds (such as vegetation and water interference), and dynamic changes to be captured. It cannot perform quantitative classification, only a binary judgment, which is coarse and leaves no traceable basis for the judgment; it is therefore suited only to area supervision and cannot directly guide a production process. Its hybrid U-Mamba + ViT architecture relies on a U-shaped structure (U-Mamba) adapted to segmentation tasks and must connect a multi-scale ViT module (containing ACA-Former) in series to supplement global information; the architecture is complex, up-sampling/down-sampling modules are introduced for the segmentation task, and the computational complexity is high (512×512 feature maps must be processed and 500 rounds of training are required).
Disclosure of Invention
The invention aims to overcome the existing defects and provides a vision-based dairy wastewater floc detection method that can effectively solve the problems described in the background. To achieve this purpose, the invention discloses a vision-based dairy wastewater floc detection method whose technical scheme comprises the following steps: Step 1, obtaining a visual image of a sample; Step 2, inputting the visual image obtained in Step 1 into a Vim (Vision Mamba) model, the Vim model generating several groups of images at different scales, recorded as X_s, each group containing N_s non-overlapping image blocks; Step 3, flattening each image block into a vector and mapping the vector into a D-dimensional embedding space to form high-dimensional vector features; Step 4, fusing the high-dimensional vector features of each scale obtained in Step 3 through a cross-attention fusion module, performing feature conversion and compression