CN-121502694-B - Method for predicting the weight of a housed animal based on fusion of environmental and visual features
Abstract
The invention relates to the technical field of intelligent breeding, and in particular to a method for predicting the weight of a housed animal based on the fusion of environmental and visual features. The method comprises: synchronously collecting environmental parameters, image data and weight labels to construct a triplet data set; preprocessing the triplet data set and, based on timestamps and preset spatial coordinates, constructing a mapping between the environmental parameters, a single-frame image and the weight of the corresponding housed animal, to generate multi-modal training data; and inputting the environmental parameters of the multi-modal training data into an environment-image feature fusion module of a pre-trained CNN-Transformer hybrid architecture, where they are dimension-matched with the image features to obtain a fused-feature training model. By deeply coupling environmental and visual features and combining spatio-temporal mapping, multi-task constraints and a cross-scene adaptive mechanism, the invention achieves high accuracy, stability and generalization in housed-animal weight prediction.
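The environment-image fusion the abstract describes (a 128-dimensional environment text embedding dimension-matched against visual features, gated between a text channel and a visual channel, and injected FiLM-style) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the projection matrices, the 0.75/0.3 gate values and the feature width `d_visual` are assumptions; only the "weights sum to 1" and "text weight > 0.7 below 500 lux" rules come from the claims.

```python
import numpy as np

def dual_channel_gate(illumination_lux: float) -> tuple:
    """Text/visual channel weights that sum to 1. Claim 3 requires the
    text weight to exceed 0.7 below 500 lux; the exact values used here
    (0.75 / 0.3) are illustrative assumptions."""
    text_w = 0.75 if illumination_lux < 500 else 0.3
    return text_w, 1.0 - text_w

def film_fuse(visual_feat, env_embedding, w_gamma, w_beta, illumination_lux):
    """Dimension-match a 128-dim environment embedding to the visual
    feature width via (assumed learned) projections, then apply a
    FiLM-style feature-wise scale (gamma) and shift (beta)."""
    text_w, visual_w = dual_channel_gate(illumination_lux)
    gamma = env_embedding @ w_gamma          # (128,) @ (128, d) -> (d,)
    beta = env_embedding @ w_beta
    # Gate how strongly the environment channel modulates the visuals.
    return visual_w * visual_feat + text_w * (gamma * visual_feat + beta)

rng = np.random.default_rng(0)
d_visual = 16
env = rng.normal(size=128)                   # 128-dim text embedding
vis = rng.normal(size=d_visual)              # visual feature vector
w_g = rng.normal(size=(128, d_visual)) * 0.01
w_b = rng.normal(size=(128, d_visual)) * 0.01
fused = film_fuse(vis, env, w_g, w_b, illumination_lux=300)
```

In a trained network `w_gamma`/`w_beta` would be learned FiLM projection layers; here they are random stand-ins so the shapes can be checked.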
Inventors
- Liang Yanbin
- Kuang Yingjie
- Huang Jingyi
- Chen Yan
Assignees
- 温氏食品集团股份有限公司
- 深圳喜为智慧科技有限公司
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-01-12
Claims (7)
- 1. A method for predicting the weight of a housed animal based on fusion of environmental and visual features, comprising the steps of: step S1, synchronously acquiring environmental parameters, image data and weight labels, and constructing a triplet data set; step S2, preprocessing the triplet data set, constructing a mapping between the environmental parameters and single-frame images, wherein the mapping corresponds to the weight of a single housed animal, and generating multi-modal training data based on timestamps and preset spatial coordinates; step S3, inputting the environmental parameters of the multi-modal training data into an environment-image feature fusion module of a pre-trained CNN-Transformer hybrid architecture, and performing dimension matching with the image features to obtain a fused-feature training model; step S4, designing a multi-task loss function, pre-training the fused-feature training model, and adjusting parameters for different scenes to obtain a weight prediction model, wherein step S4 comprises: constructing the multi-task loss function, performing supervised pre-training of a preliminary model, and then performing scene-difference adaptive training to obtain a sub-scene adaptation model; introducing a federated learning mechanism into the sub-scene adaptation model, and establishing multi-farm distributed training nodes based on a cross-scene participant index to obtain an initial federated training layout; performing a gradient compression-sparsification operation on the model gradients uploaded by all participants in the cross-scene participant index, setting the gradient sparsity to 95%, and performing quantization compression to obtain a sparse gradient set; performing global parameter aggregation of the sub-scene adaptation model based on the sparse gradient set to generate a cross-scene generalization model; performing generalization-error evaluation of the cross-scene generalization model on a unified verification set to obtain a generalization-error quantization table; performing index comparison on the generalization-error quantization table, triggering an online model update process when the percentage reduction of the generalization error is greater than or equal to 25%, and performing parameter adjustment of the cross-scene generalization model to obtain the weight prediction model; wherein constructing the multi-task loss function, performing supervised pre-training of the preliminary model and then performing scene-difference adaptive training comprises: constructing, for the fused-feature training model, a multi-task loss function consisting of a weight regression loss, a body-type parameter regression loss and an environmental consistency loss, to obtain a multi-task loss set; performing supervised pre-training of the fused-feature training model based on the multi-task loss set to obtain a preliminary weight prediction model; and performing scene-difference adaptive training of the preliminary weight prediction model based on environmental data grouped by farm, to obtain the sub-scene adaptation model; and step S5, verifying the environmental suitability of the weight prediction model, performing error analysis to generate error optimization data, and updating the weight prediction model parameters based on the error optimization data to obtain a weight prediction optimization model, wherein verifying the environmental suitability of the weight prediction model and performing the error analysis in step S5 comprises: performing end-side inference tests of the weight prediction model under different environmental conditions, and constructing an adaptability evaluation matrix based on multiple environmental indexes to obtain environmental suitability verification data; performing scene-difference analysis on the environmental suitability verification data, identifying error offsets of the weight prediction model in cross-scene inference, and constructing a high-error environment group and a low-error environment group to obtain a scene error difference table; performing label consistency verification on the corresponding samples of the scene error difference table based on the mapped triplet data set, and removing risk samples to obtain a trusted error sample set; classifying the error sources of the trusted error sample set to obtain an error decomposition matrix, wherein the error source types comprise visual errors, environmental weighting errors and body-type parameter regression errors; performing error reverse-mapping of the environment parameter embedding matrix and the FiLM conditional modulation parameters based on the error decomposition matrix, and constructing an environment modulation reverse map to obtain environment modulation correction coefficients; recalibrating the text-channel weight and the visual-channel weight based on the environment modulation correction coefficients, automatically increasing the text-channel weight in low-illumination areas, to obtain environment modulation recalibration parameters; and performing feature reconstruction of the environment modulation recalibration parameters and the error decomposition matrix to form an error optimization tensor and obtain the error optimization data.
- 2. The method for predicting the weight of a housed animal based on fusion of environmental and visual features according to claim 1, wherein step S1 comprises: collecting illumination intensity, pen type, temperature and humidity, and camera installation parameters through sensors and cameras, and performing unified quantization coding to form a structured environmental parameter set; acquiring the postures of the housed animals and the picture coverage area through an end-side camera device, and screening key frames to form an image data set; collecting the weights of the housed animals through manual weighing, and marking weight labels to obtain a weight label set; and performing three-way alignment of the environmental parameters, the image data and the weight labels to obtain the triplet data set.
- 3. The method for predicting the weight of a housed animal based on fusion of environmental and visual features according to claim 2, wherein the three-way alignment of the environmental parameters, the image data and the weight labels further comprises: performing a blockchain hash check on the triplet data set, and performing on-chain timestamp anchoring to obtain a trusted triplet data set; performing text encoding of the environmental text descriptions in the trusted triplet data set, converting the pen type, illumination level and camera installation parameters into 128-dimensional text embedding vectors, to obtain an environment parameter embedding matrix; constructing a dual-channel gating structure for the environment parameter embedding matrix based on a preset ViT environment injection mechanism, to obtain a text weight channel and a visual weight channel whose weights sum to 1; and, when the illumination is less than 500 lux, automatically setting the text weight to be greater than 0.7, to obtain low-illumination-adaptive environment weight configuration data.
- 4. The method for predicting the weight of a housed animal based on fusion of environmental and visual features according to claim 1, further comprising, after step S1: acquiring the data sources of a plurality of farms, grouping them by scene, and configuring a federated-learning node identifier for each, to obtain a cross-scene participant index; pre-analysing the data scale, distribution differences and environmental feature differences of the different farms, and setting the gradient compression sparsity to 95%, to obtain a federated pre-training parameter table; and compiling statistics on the environmental differences of the trusted triplet data set, and calculating the theoretical lower bound of the environmental generalization error, to obtain a generalization-error reference table.
- 5. The method for predicting the weight of a housed animal based on fusion of environmental and visual features according to claim 1, wherein step S2 comprises: constructing, from the triplet data set, a mapped triplet data set of environmental parameters, single-frame images and weight labels based on the timestamps and the preset spatial coordinates; performing multi-modal structural division of the mapped triplet data set, encoding the environmental parameters into structured environment vectors, formatting the image data into visual input tensors, and converting the weight labels into supervision signals, to obtain a structured multi-modal data set; aligning the 128-dimensional text embedding vectors with the visual input dimension, based on the environment parameter embedding matrix and the environment weight configuration data, to obtain environment-visual alignment data prior to FiLM injection; invoking the dual-channel gating mechanism on the environment-visual alignment data based on the structured multi-modal data set, and executing a dynamic complementation configuration through the text-channel and visual-channel weights, to obtain FiLM-injectable multi-modal pre-fusion data; and determining tensor training batches from the multi-modal pre-fusion data according to the preset input requirements, to obtain the multi-modal training data.
- 6. The method for predicting the weight of a housed animal based on fusion of environmental and visual features according to claim 1, wherein step S3 comprises: performing modal separation of the multi-modal training data, and parsing the structured environmental parameter input sequence and the visual image input sequence respectively, to obtain an environment embedding input stream and a visual feature input stream; performing dimension adjustment of the environment embedding input stream based on the environment parameter embedding matrix, matching the visual feature dimension of the CNN-Transformer hybrid architecture, to obtain a dimension-matched environment vector; invoking the dual-channel gating mechanism on the dimension-matched environment vector based on the environment weight configuration data, to obtain a weighted environment vector; performing a FiLM conditional modulation operation on the weighted environment vector and injecting it into a feature extraction layer of the CNN-Transformer hybrid architecture, to obtain environment-visual joint features; performing multi-head self-attention computation on the visual feature input stream through the Transformer visual backbone network, to obtain a deep visual feature map; performing cross-modal fusion of the deep visual feature map and the environment-visual joint features, to obtain cross-modal fusion features; performing serialization rearrangement and channel normalization of the cross-modal fusion features, to obtain a fused feature sequence; and constructing the fused-feature training model based on the fused feature sequence, and executing a structural consistency check on the model.
- 7. The method for predicting the weight of a housed animal based on fusion of environmental and visual features according to claim 1, wherein updating the weight prediction model parameters based on the error optimization data in step S5 comprises: performing quantitative evaluation of the model performance corresponding to the error optimization data, and triggering the online model update process when the percentage reduction of the generalization error is less than 25%, to obtain an online update trigger signal; performing a gradient update of the model parameters corresponding to the online update trigger signal, executing the gradient compression-sparsification mechanism of federated learning, to obtain a parameter update gradient set; and executing global parameter aggregation based on the parameter update gradient set, and performing joint parameter correction of the Transformer backbone, the CNN branch and the FiLM modulation layer, to obtain the weight prediction optimization model.
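The gradient compression-sparsification step that claims 1, 4 and 7 share (95% sparsity followed by quantization, with sparse gradients aggregated globally) can be sketched as below. The patent does not fix the scheme, so top-k magnitude selection and 8-bit uniform quantization are assumptions here, chosen because they are a common realisation of this kind of federated compression.

```python
import numpy as np

def sparsify_and_quantize(grad, sparsity=0.95, n_bits=8):
    """Keep only the largest-magnitude (1 - sparsity) fraction of the
    gradient entries, then uniformly quantize the survivors to n_bits."""
    flat = grad.ravel()
    k = max(1, int(round(flat.size * (1.0 - sparsity))))
    idx = np.argpartition(np.abs(flat), -k)[-k:]       # indices of top-k
    values = flat[idx]
    scale = np.abs(values).max() / (2 ** (n_bits - 1) - 1)
    q = np.round(values / scale).astype(np.int8)       # compressed payload
    return idx, q, scale

def densify(idx, q, scale, shape):
    """Server-side reconstruction of one participant's sparse gradient."""
    out = np.zeros(int(np.prod(shape)))
    out[idx] = q.astype(np.float64) * scale
    return out.reshape(shape)

def aggregate(updates, shape):
    """Global parameter aggregation: average the densified gradients
    uploaded by all participants."""
    return sum(densify(i, q, s, shape) for i, q, s in updates) / len(updates)
```

With 95% sparsity, a participant uploads only 5% of the gradient entries (as int8 values plus their indices and one scale factor), which is the bandwidth saving the claims rely on.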
Description
Method for predicting the weight of a housed animal based on fusion of environmental and visual features
Technical Field
The invention relates to the technical field of intelligent breeding, and in particular to a method for predicting the weight of a housed animal based on the fusion of environmental and visual features.
Background
The traditional manual weighing method relies on contact measurement and empirical estimation, with weight approximated through empirical formulae; it is heavily dependent on manual operation, yields inconsistent data, and cannot meet the real-time management requirements of a large-scale pig farm. Image regression methods based on monocular or binocular vision have gradually emerged, using convolutional neural networks to extract body-shape contours and posture features from single-frame images and thereby achieving non-contact weight estimation. However, such methods generally treat environmental factors as noise: key environmental parameters such as illumination, humidity, pen materials and ground reflection are not modelled, so model error increases significantly when the model is deployed across scenes. The application of the Transformer architecture to visual representation has markedly strengthened multi-scale feature acquisition, but environmental differences among pig farms still lead to inconsistent visual feature distributions and insufficient model generalization capability.
Some research has attempted to introduce environmental labels as auxiliary variables, but mostly stops at simple concatenation or fixed-weight fusion and does not construct a dynamically adaptive environment-visual joint representation; meanwhile, training data are mostly collected in a single scene, large-scale multi-farm, multi-period triplet data are lacking, and the credibility of the training data is therefore limited.
Disclosure of the Invention
Accordingly, there is a need for a method for predicting the weight of a housed animal based on the fusion of environmental and visual features that solves at least one of the above problems. To achieve this object, a method for predicting the weight of a housed animal based on fusion of environmental and visual features comprises the following steps: step S1, synchronously acquiring environmental parameters, image data and weight labels, and constructing a triplet data set; step S2, preprocessing the triplet data set, constructing a mapping between the environmental parameters and single-frame images, wherein the mapping corresponds to the weight of a single housed animal, and generating multi-modal training data based on timestamps and preset spatial coordinates; step S3, inputting the environmental parameters of the multi-modal training data into an environment-image feature fusion module of a pre-trained CNN-Transformer hybrid architecture, and performing dimension matching with the image features to obtain a fused-feature training model; step S4, designing a multi-task loss function, pre-training the fused-feature training model, and adjusting parameters for different scenes to obtain a weight prediction model; and step S5, verifying the environmental suitability of the weight prediction model, performing error analysis to generate error optimization data, and updating the weight prediction model parameters based on the error optimization data to obtain a weight prediction optimization model.
By constructing a deep coupling mechanism between environmental parameters and visual features in the weight prediction process, the invention achieves a quantifiable representation of multi-source environmental factors such as illumination intensity, pen materials, humidity fluctuation and reflection interference, so that the visual features remain stable in complex scenes. The mapping established from timestamps and spatial coordinates keeps the environmental records, key-frame images and weight labels of the same housed animal in strict correspondence, improving the consistency and credibility of the training samples. Fusing the CNN and Transformer structures captures local body-shape contours and global posture distribution simultaneously, giving the model a more complete representation of body-shape structure. The multi-task loss structure jointly considers the visual features, the environmental representation and the semantics, improving prediction accuracy in complex scenes, and allows the model to adaptively converge across the environmental differences of different farms, reducing cross-scene prediction deviation. Automatic parameter correction through error reverse-mapping and environment modulation recalibration ensures that the model
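The multi-task loss named in step S4 combines three terms: weight regression, body-type parameter regression, and environmental consistency. A minimal sketch follows; the patent does not specify the loss forms or weights, so the MSE terms and the `lambdas` coefficients are assumptions, and the consistency term here simply penalises feature drift between two samples taken under the same environmental conditions.

```python
import numpy as np

def multitask_loss(pred_weight, true_weight,
                   pred_body, true_body,
                   env_feat_a, env_feat_b,
                   lambdas=(1.0, 0.5, 0.1)):
    """Weighted sum of the three loss terms from step S4:
    weight regression, body-type parameter regression, and an
    environmental-consistency penalty between two feature vectors
    extracted under matching environmental conditions."""
    l_weight = np.mean((np.asarray(pred_weight) - np.asarray(true_weight)) ** 2)
    l_body = np.mean((np.asarray(pred_body) - np.asarray(true_body)) ** 2)
    l_env = np.mean((np.asarray(env_feat_a) - np.asarray(env_feat_b)) ** 2)
    lw, lb, le = lambdas
    return lw * l_weight + lb * l_body + le * l_env
```

During supervised pre-training, such a scalar loss would be minimised over the multi-modal training batches; the relative `lambdas` control how strongly the auxiliary body-type and consistency tasks constrain the main weight-regression task.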