CN-121982747-A - Interference-stripping multi-modal method for identifying individual cattle in natural scenes
Abstract
An interference-stripping multi-modal method for identifying individual cattle in natural scenes. After the original image is processed, three-modality feature extraction forms three types of pure modal input data. Each type is processed by a mixture-of-experts module and a quality scoring module to obtain single-modality features and quality scores. The single-modality features and quality scores are then combined and passed in sequence through dynamic sparse gating and feature fusion to complete individual cattle identity matching.
Inventors
- LI QI
- ZHANG ZHIWEI
- YANG MEI
- BAI ZHUOYU
Assignees
- Inner Mongolia University of Science and Technology (内蒙古科技大学)
Dates
- Publication Date
- 20260505
- Application Date
- 20260128
Claims (9)
- 1. An interference-stripping multi-modal method for identifying individual cattle in natural scenes, characterized in that: after an original image is processed, three-modality feature extraction forms three types of pure modal input data; the three types of pure modal input data are each processed by a mixture-of-experts module and a quality scoring module to obtain single-modality features and quality scores; and the single-modality features and quality scores are combined and subjected in sequence to dynamic sparse gating and feature fusion to complete individual cattle identity matching.
- 2. The interference-stripping multi-modal method for identifying individual cattle in natural scenes according to claim 1, characterized in that the process of forming three types of pure modal input data through three-modality feature extraction after processing the original image is as follows: an image is captured by a camera and processed by an instance segmentation and modality extraction module, which outputs bounding boxes and masks for three sub-regions, namely the cattle face, the muzzle pattern and the body pattern; these bounding boxes and masks constitute the three types of pure modal input data.
- 3. The interference-stripping multi-modal method for identifying individual cattle in natural scenes according to claim 2, characterized in that the process of forming three types of pure modal input data through three-modality feature extraction after processing the original image uses a YOLO11n-seg model trained on corresponding data as the base segmentation model to carry out the complete extraction from the original image to the three modal inputs, and comprises the following steps: Step one, size scaling, normalization and channel adjustment are applied in sequence to a natural scene image captured by the camera, and the result is input into the YOLO11n-seg model; Step two, the YOLO11n-seg model outputs bounding boxes of whole-cattle targets, targets with confidence greater than or equal to 0.7 are kept, and, when a single image contains multiple cattle, duplicates are removed by non-maximum suppression and each cattle instance is then processed separately; Step three, for a single cattle instance, three-modality feature extraction is performed in the YOLO11n-seg model, which outputs the bounding boxes and masks of the three sub-regions (cattle face, muzzle pattern and body pattern) simultaneously, wherein the cattle face region uses the eyes and ears as core positioning references, the muzzle pattern region is confined to the textured skin of the muzzle, and the body pattern region is the area of the trunk's flank where the features are most distinctive; Step four, the cattle face, muzzle pattern and body pattern regions are extracted: according to each region's mask, the corresponding region is cut out of the whole-cattle image and background pixels are removed, yielding an image that contains the features of a single modality only; when the confidence of a sub-modality region is less than 0.5, that region is marked as missing, and a zero matrix is used for it in subsequent processing; Step five, the three extracted modal images are uniformly resized to a standard size and converted to tensor format to form the three types of pure modal input data.
- 4. The interference-stripping multi-modal method for identifying individual cattle in natural scenes according to claim 3, characterized in that, after the three types of pure modal input data are obtained, the single-modality features and quality scores are obtained from them by the mixture-of-experts module and the quality scoring module as follows: to obtain the single-modality features, the mixture-of-experts module applies natural-scene-adaptive preprocessing to the three types of pure modal input data to form preprocessed modal images, the expert models process the preprocessed modal images to extract the original features, and the extracted original features are L2-normalized to form the single-modality features; to obtain the quality scores, the quality scoring module applies image feature extraction, global average pooling and fully connected layers in sequence to the three types of pure modal input data to form original quality scores, and the original quality scores are corrected with the modal mask to form corrected quality scores.
- 5. The interference-stripping multi-modal method for identifying individual cattle in natural scenes according to claim 4, characterized in that the single-modality features are computed from the three types of pure modal input data by the mixture-of-experts module as follows: Step one, the modal image after natural-scene-adaptive preprocessing is taken as input, and the backbone network outputs the original features according to formula (1); L2 normalization is then applied to the original features so that the feature norm is 1, which improves the stability of the features in natural scenes, according to formula (2), in which the denominator contains the L2 norm of the original features and a small constant introduced to prevent the denominator from being 0; after processing by the single-modality feature extraction module, a normalized feature vector of dimension 512 is output.
- 6. The interference-stripping multi-modal method for identifying individual cattle in natural scenes according to claim 4, characterized in that the quality scores are computed from the three types of pure modal input data by the quality scoring module through the following steps: Step one, the single-modality image obtained by YOLO11n-seg segmentation is input to a 3-layer convolutional feature extraction unit, which captures image detail features in natural scenes and outputs a multi-channel two-dimensional feature map according to formula (3), wherein the unit has 128 output channels and uses small 3×3 convolution kernels; Step two, adaptive global average pooling is applied to the two-dimensional feature map, compressing the spatial dimensions into a one-dimensional vector and removing the influence of spatial position differences on the quality score, to obtain a quality feature vector of fixed dimension according to formula (4), in which the input is the multi-channel two-dimensional feature map, the operation is average pooling, and the output is a 128-dimensional real vector; Step three, the one-dimensional quality feature vector is passed in turn through a fully connected layer, dropout, a second fully connected layer and a Sigmoid activation, finally outputting a quality score in the interval 0 to 1, where a score closer to 1 indicates better modal image quality, according to formula (5), in which the terms denote the Sigmoid activation function, the fully connected layers, the dropout operation with its set drop probability, the 128-dimensional input feature vector, and the final output quality score with its range of values; Step four, the quality score is corrected using the modal mask output by the YOLO11n-seg segmentation stage: the quality score of a missing modality is set to 0 so that invalid modalities do not take part in subsequent fusion, according to formula (6), which maps the original quality score output by this module to the corrected quality score output by this module.
- 7. The interference-stripping multi-modal method for identifying individual cattle in natural scenes according to any one of claims 1 to 6, characterized in that the obtained single-modality features and quality scores are averaged globally over the valid modalities to obtain a global feature; the global feature and the modal mask are concatenated along the feature dimension and input into a gating network module, which computes activation coefficients; the activation coefficients undergo coefficient softening and determination of the optimal number of modalities to select N optimal modalities; and sparse weights and the optimal feature representation are generated for the N optimal modalities, completing the processing of the dynamic sparse gating module.
- 8. The interference-stripping multi-modal method for identifying individual cattle in natural scenes according to claim 7, characterized in that the obtained single-modality features and quality scores are combined and processed by dynamic sparse gating as follows: Step one, the normalized feature of dimension 512 output by each single-modality expert model is multiplied element by element by the corrected quality score output by the modal quality scoring module, strengthening the feature contribution of high-quality modalities and weakening the interference of low-quality modalities, according to formula (7), in which the modality index ranges over the three modalities of cattle face, muzzle pattern and body pattern, and the result is the quality-weighted single-modality feature; Step two, the quality-weighted features of the three modalities are averaged to obtain a global feature that reflects the information of all valid modalities, according to formula (8), in which the quality-weighted features of the three modalities are averaged over the number of valid modalities; Step three, the global feature is concatenated along the feature dimension with the mask vector of the three modalities, so that the gating network module considers both feature quality and modality validity when making its decision; when they are found invalid, the calculation is repeated accordingly, and when they are found valid, the input feature of the gating network is obtained, according to formula (9), in which the global feature and the three-modality mask vector are joined by vector concatenation; the gating network output is then computed from the obtained input feature, and forward propagation outputs the expert activation weights and the optimal k value to complete sparse feature selection, as follows: 1) the input feature is passed in turn through a fully connected layer, batch normalization, an activation and dropout, producing low-dimensional features for the subsequent decision, according to formulas (10) and (11), in which the first fully connected layer maps the input feature of dimension 515 to a 256-dimensional hidden feature space; batch normalization standardizes the feature distribution to improve training stability and generalization in natural scenes; the dropout probability is set to 0.3; and the second fully connected layer maps the 256-dimensional hidden features to a 4-dimensional output; 2) the first 3 dimensions of the output are taken as expert activation logits, and a temperature parameter is introduced to soften the logits, according to formula (12), in which the three softened values correspond to the first 3 output dimensions, and a higher value means a higher priority for activating the corresponding modality expert; 3) the 4th dimension of the output is mapped to the [0,1] interval by an activation, scaled to the target interval, and rounded down to obtain the optimal number k of experts to activate, adapting to the quality combinations of different modalities, according to formulas (13) and (14), in which the floor function constrains the value of k, and which give the value of k for the case where a single modality meets the standard and for the case where two modalities meet the standard; 4) the modality experts corresponding to the k highest softened logits, taken in descending order, are selected; gating weights are generated by Softmax normalization; only the optimal k modalities are activated and the remaining weights are set to 0, completing the sparse fusion processing, according to formulas (15) and (16), in which the indices are sorted in descending order; through dynamic sparse gating, the above calculation outputs the sparse gating weights and the corresponding k optimal single-modality features.
- 9. The interference-stripping multi-modal method for identifying individual cattle in natural scenes according to claim 8, characterized in that individual cattle identity matching is completed after feature fusion of the features selected by dynamic sparse gating through the following steps: first, the optimal modal features selected by dynamic sparse gating are integrated into a high-quality multi-modal fusion feature; because the existing feature dimensions may differ, they are uniformly projected to d=512 dimensions to keep the feature fusion consistent, in one of two ways: in case one, when the expert output dimension already equals the target dimension, the projection layer is a normalization layer, according to formula (17); in case two, when the expert output dimension differs from the target dimension, the projection layer is a linear mapping followed by normalization, according to formula (18), in which the input is the output feature of the mixture-of-experts module and the projection maps the feature dimension of each modality uniformly to the target dimension; second, two-stage fusion is carried out, comprising quality weighting and gating weighting: high-quality features are first selected on the basis of modal quality, and precise fusion is then achieved by combining the optimal modality selection of dynamic gating, as follows: in the first stage, quality weighting highlights the contribution of high-quality modal features in natural scenes and reduces the interference of low-quality modalities, according to formula (19), in which the quality score is the one output by the quality scoring module; in the second stage, gating weighting dynamically selects the optimal modality combination in the natural scene and computes a weighted sum of the quality-weighted features, ensuring that only the k optimal modal features selected by gating (nonzero gating weight) take part in the final fusion, while the remaining invalid modalities (gating weight equal to 0) contribute nothing, according to formula (20), in which the final fusion feature is obtained by a weighted sum of the weighted projected features of the 3 modalities and lies in a d-dimensional real space; finally, cosine similarity matching is performed between the high-quality multi-modal fusion feature output by the feature fusion module and a pre-built cattle identity feature library, and the matched cattle identity (ID) is output, completing the closed-loop identity recognition process.
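The numbered formulas in claims 5, 6, 8 and 9 are described only in words in this record. Going by that wording alone, one plausible reading of formulas (2), (6) to (9), (12), (15), (16), (19) and (20) is sketched below. Every symbol name here is an assumption introduced for illustration, including $F_m$, $f_m$, $s_m$, $M_m$, $z_m$, $\tau$, $w_m$ and the top-k index set $\mathcal{T}_k$. The projection of claim 9 is treated as the identity for experts that already output 512 dimensions, and using the corrected rather than the raw quality score in the fusion stage is likewise an assumption.

```latex
\begin{align*}
f_m &= \frac{F_m}{\lVert F_m \rVert_2 + \epsilon}, \qquad f_m \in \mathbb{R}^{512}
  && \text{L2 normalization, cf. (2)} \\
\tilde{s}_m &= s_m \cdot M_m, \qquad M_m \in \{0, 1\}
  && \text{mask correction, cf. (6)} \\
\hat{f}_m &= \tilde{s}_m \, f_m
  && \text{quality weighting, cf. (7), (19)} \\
g &= \frac{1}{N_{\mathrm{valid}}} \sum_{m=1}^{3} \hat{f}_m, \qquad N_{\mathrm{valid}} = \sum_{m=1}^{3} M_m
  && \text{global feature, cf. (8)} \\
x &= [\, g \,;\, M \,] \in \mathbb{R}^{515}
  && \text{gate input, cf. (9)} \\
\tilde{z}_m &= z_m / \tau
  && \text{softened logits, cf. (12)} \\
w_m &=
  \begin{cases}
    \dfrac{\exp(\tilde{z}_m)}{\sum_{j \in \mathcal{T}_k} \exp(\tilde{z}_j)} & m \in \mathcal{T}_k \\[2ex]
    0 & \text{otherwise}
  \end{cases}
  && \text{sparse gating weights, cf. (15), (16)} \\
F &= \sum_{m=1}^{3} w_m \, \hat{f}_m \in \mathbb{R}^{d}, \qquad d = 512
  && \text{final fusion, cf. (20)}
\end{align*}
```

Formulas (13) and (14), which determine k, are not included because the claim wording does not fix the scaling interval.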
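The thresholding and masking logic of claim 3 (steps two and four) and the mask correction of claim 6 (step four) can be illustrated with a minimal NumPy sketch. It does not call YOLO11n-seg. The detection dictionaries, the function names and the toy arrays are hypothetical stand-ins for whatever the segmentation model actually returns. Non-maximum suppression and resizing are omitted, the latter because the standard size is not given in this record.

```python
import numpy as np

COW_CONF_MIN = 0.7        # whole-cattle confidence threshold (claim 3, step two)
SUBREGION_CONF_MIN = 0.5  # below this a sub-modality is marked missing (step four)


def keep_confident(detections):
    """Keep whole-cattle detections whose confidence is at least 0.7.

    `detections` is a hypothetical list of dicts with a "conf" key.
    """
    return [d for d in detections if d["conf"] >= COW_CONF_MIN]


def extract_modality(image, region_mask, conf):
    """Return (pure-modality image, valid flag) for one sub-region.

    image:       (H, W, C) array of the whole-cattle crop
    region_mask: (H, W) array, non-zero inside the sub-region
    conf:        confidence of the sub-region prediction
    """
    if conf < SUBREGION_CONF_MIN:
        # Missing modality: a zero matrix stands in for the image.
        return np.zeros_like(image), 0.0
    keep = region_mask.astype(bool)[..., None]  # broadcast over channels
    # Background pixels are zeroed so only the modality's pixels remain.
    return np.where(keep, image, np.zeros_like(image)), 1.0


def correct_quality(raw_scores, valid_flags):
    """Set the quality score of every missing modality to 0."""
    return np.asarray(raw_scores, dtype=float) * np.asarray(valid_flags, dtype=float)


# Toy example: a 4x4 image whose top-left 2x2 block is the "face" region.
image = np.full((4, 4, 3), 200, dtype=np.uint8)
face_mask = np.zeros((4, 4), dtype=np.uint8)
face_mask[:2, :2] = 1
face_img, face_ok = extract_modality(image, face_mask, conf=0.92)
body_img, body_ok = extract_modality(image, face_mask, conf=0.31)
scores = correct_quality([0.8, 0.7], [face_ok, body_ok])
```

In the toy example the second region falls below the 0.5 threshold, so it comes back as an all-zero image with a valid flag of 0, and its quality score is zeroed by the correction step.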
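The gating, fusion and matching flow of claims 8 and 9 can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the patented implementation. The weights are random, batch normalization and dropout are left out (inference-time view), ReLU is an assumed activation, and the temperature value is a placeholder. The mapping from the 4th output to k, the exclusion of missing modalities from the top-k selection, and all function and variable names are assumptions as well.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit L2 norm; eps keeps the denominator non-zero."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def gate_input(features, quality, mask):
    """Quality-weight the (3, 512) features, average over valid modalities,
    then append the 3-element mask to get the 515-dimensional gate input."""
    weighted = features * quality[:, None]
    n_valid = max(int(mask.sum()), 1)
    return np.concatenate([weighted.sum(axis=0) / n_valid, mask])


def gating_forward(x, params):
    """515 -> 256 -> 4 MLP; batch norm and dropout omitted, ReLU assumed."""
    hidden = np.maximum(x @ params["w1"] + params["b1"], 0.0)
    return hidden @ params["w2"] + params["b2"]


def sparse_weights(out, mask, temperature=1.0):
    """Turn the 4-dim gate output into k and top-k Softmax weights."""
    if mask.sum() == 0:
        raise ValueError("at least one modality must be valid")
    # Assumption: missing modalities are excluded from selection.
    logits = np.where(mask > 0, out[:3] / temperature, -np.inf)
    sig = 1.0 / (1.0 + np.exp(-out[3]))
    # Assumption: map the sigmoid output to k in {1, 2, 3}, capped at the
    # number of valid modalities.
    k = min(1 + int(np.floor(3.0 * sig)), 3, int(mask.sum()))
    top = np.argsort(-logits)[:k]  # indices of the k largest logits
    weights = np.zeros(3)
    exp = np.exp(logits[top] - logits[top].max())
    weights[top] = exp / exp.sum()
    return weights, k


def fuse(features, quality, weights):
    """Two-stage fusion: quality weighting, then gate-weighted sum."""
    return (weights[:, None] * quality[:, None] * features).sum(axis=0)


def match(fused, gallery_ids, gallery_feats):
    """Return the gallery ID with the highest cosine similarity and its score."""
    sims = l2_normalize(gallery_feats) @ l2_normalize(fused)
    best = int(np.argmax(sims))
    return gallery_ids[best], float(sims[best])


rng = np.random.default_rng(0)
params = {"w1": rng.normal(0, 0.05, (515, 256)), "b1": np.zeros(256),
          "w2": rng.normal(0, 0.05, (256, 4)), "b2": np.zeros(4)}
features = l2_normalize(rng.normal(size=(3, 512)))  # face, muzzle, body
mask = np.array([1.0, 1.0, 0.0])                    # body pattern missing
quality = np.array([0.9, 0.6, 0.8]) * mask          # corrected quality scores
gate_out = gating_forward(gate_input(features, quality, mask), params)
weights, k = sparse_weights(gate_out, mask)
fused = fuse(features, quality, weights)
```

Because the missing modality is masked out before the top-k selection, its gating weight is always 0 and it cannot contribute to the fused feature, which matches the intent stated in claim 9.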
Description
Interference-stripping multi-modal method for identifying individual cattle in natural scenes
Technical Field
The invention relates to an interference-stripping multi-modal method for identifying individual cattle in natural scenes.
Background
Accurate identification of individual cattle plays an important role in husbandry management, epidemic prevention and control, traceability supervision, breeding and similar tasks. Current mainstream identification methods struggle to meet the demands of natural scenes and have clear technical shortcomings. Single-modality identification methods build a model from a single biometric feature such as the cattle face, the muzzle pattern or the body pattern. They commonly use a feature extraction and classification framework based on deep convolutional neural networks, for example extracting cattle face features with a CNN and classifying them with a loss function, or they extract muzzle texture features with traditional machine vision and perform template matching. These methods rely on a single feature. In natural scenes, however, direct strong light, object shadows and weak night-time light alternate; irregular postures, head-down feeding, random turning, and mutual crowding and occlusion are common; and biometric features are easily disturbed, with dirt adhering to the muzzle pattern, thick hair covering body patterns, and the cattle face captured at an off angle. Modal data are also often missing, so that only a single modality such as the cattle face or the muzzle pattern can be acquired.
These complications place extremely high demands on the robustness, flexibility and interference resistance of the identification algorithm. Under illumination changes, occlusion, feature blurring and similar interference, robustness is insufficient and recognition accuracy drops sharply. The main problems are as follows. First, the outdoor environment is complex and changeable: under strong light, backlight, overcast or rainy weather and other outdoor lighting, and with weeds occluding or mud covering the animal, features blur easily, which degrades acquisition and recognition. Second, cattle features are both changeable and similar: cattle of the same breed resemble one another in appearance and coat pattern, body-surface features change through shedding, moulting and injury, and horns and ear tags wear down or fall off, so the identification target is unstable. Third, cattle behaviour is uncontrollable: in natural scenes cattle move about, lower their heads, crowd together and toss their heads at will, so clear frontal images are hard to obtain and acquisition efficiency is low. Fourth, identification is usually constrained to long range and non-contact operation: natural husbandry is mostly free-range or semi-free-range, cattle cannot be caught and identified by hand at close range, and non-contact identification must therefore meet demanding requirements on both distance and precision, which current techniques cannot satisfy together.
In addition, background elements around the farm such as trees, fences and other livestock are easily confused with cattle features, which raises the misjudgment rate of the recognition algorithm. Fifth, individual cattle differ little from one another: compared with dogs and cats, the facial texture and body-surface features of cattle are less individually distinctive, judgment from the face or body surface alone is inaccurate, and unique marker features are lacking. Existing methods fuse multi-modal features in simple ways such as concatenation or weighted summation, for example concatenating cattle face and muzzle pattern features directly and feeding them to a classifier, or fusing features with fixed weights set from experience. Such fusion is highly inflexible: it cannot adapt to missing modalities and does not consider quality differences between modalities. Efficiency and generalization are also poorly balanced, since all calculation branches need to be run in full, which causes calculation redund