CN-121986732-A - Animal behavior analysis method, apparatus, computer device and storage medium
Abstract
The invention belongs to the technical field of behavior analysis methods, and particularly relates to an animal behavior analysis method, apparatus, computer device and storage medium based on a multi-modal agent. The method comprises the steps of: receiving multi-view video data and a natural language prompt; extracting three-dimensional skeletal keypoint data of an animal from the multi-view video data; having a behavior analysis agent generate a semantic description and a computation task list containing at least one physical indicator from key frames of the multi-view video data and the natural language prompt; invoking an independent computation tool chain to perform quantitative computation on the three-dimensional skeletal keypoint data according to the task list, obtaining quantified results corresponding to the physical indicators; and fusing the semantic description with the quantified results through a slot-filling mechanism to generate an animal behavior analysis report. The invention not only achieves physical accuracy of computation but also generates scientific, interpretable analysis reports, realizing the leap from data discrimination to intelligent analysis.
Inventors
- WEI PENGFEI
- ZHANG WENKANG
Assignees
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences (中国科学院深圳先进技术研究院)
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-02-10
Claims (10)
- 1. An animal behavior analysis method based on a multi-modal agent, characterized by comprising the following steps: receiving multi-view video data and a natural language prompt; extracting three-dimensional skeletal keypoint data of an animal from the multi-view video data; inputting the multi-view video data and the natural language prompt into a behavior analysis agent constructed on a large language model, wherein the behavior analysis agent generates a semantic description and a computation task list containing at least one physical indicator from video key frames in the multi-view video data and the natural language prompt; invoking an independent computation tool chain to perform quantitative computation on the three-dimensional skeletal keypoint data according to the task list, obtaining quantified results corresponding to the physical indicators; and fusing the semantic description with the quantified results through a slot-filling mechanism to generate an animal behavior analysis report.
- 2. The multi-modal agent-based animal behavior analysis method of claim 1, wherein extracting the three-dimensional skeletal keypoint data of the animal from the multi-view video data comprises the steps of: extracting two-dimensional keypoint coordinates of the animal from each frame of the multi-view video data using a deep learning pose estimation model; back-projecting the two-dimensional keypoint coordinates from different views at the same instant into a three-dimensional world coordinate system via the triangulation principle, based on pre-calibrated camera intrinsic and extrinsic parameter matrices, to obtain three-dimensional keypoint coordinates; and applying temporal smoothing to the three-dimensional keypoint coordinates to obtain temporally continuous three-dimensional skeletal keypoint data.
- 3. The multi-modal agent-based animal behavior analysis method of claim 1, wherein the behavior analysis agent generating the semantic description and the computation task list containing at least one physical indicator from the video key frames in the multi-view video data and the natural language prompt comprises the steps of: the behavior analysis agent analyzing the video key frames in the multi-view video data to identify the animal species and the experimental scene; determining the behavior categories of the animal from a built-in animal behavior ontology library in combination with the animal species and the experimental scene; and generating the semantic description and the computation task list containing at least one physical indicator based on the behavior categories and the natural language prompt.
- 4. The multi-modal agent-based animal behavior analysis method of claim 1, wherein fusing the semantic description and the quantified results through the slot-filling mechanism to generate the animal behavior analysis report comprises the steps of: filling the numerical values of the quantified results into preset placeholders in the semantic description to generate fused text content, wherein modification of the numerical values is forbidden; generating visual charts based on the quantified results, the visual charts comprising at least one of a spatial-preference heat map, a speed time-series plot and a behavioral Gantt chart; and automatically typesetting and packaging the text content and the visual charts to generate the animal behavior analysis report.
- 5. An animal behavior analysis apparatus based on a multi-modal agent, comprising: a data receiving unit for receiving multi-view video data and a natural language prompt; a data extraction unit for extracting three-dimensional skeletal keypoint data of an animal from the multi-view video data; an intelligent analysis unit for inputting the multi-view video data and the natural language prompt into a behavior analysis agent constructed on a large language model, the behavior analysis agent generating a semantic description and a computation task list containing at least one physical indicator from video key frames in the multi-view video data and the natural language prompt; a quantitative computation unit for invoking an independent computation tool chain to perform quantitative computation on the three-dimensional skeletal keypoint data according to the task list, obtaining quantified results corresponding to the physical indicators; and a report generation unit for fusing the semantic description with the quantified results through a slot-filling mechanism to generate an animal behavior analysis report.
- 6. The multi-modal agent-based animal behavior analysis apparatus of claim 5, wherein the data extraction unit comprises: a two-dimensional extraction module for extracting two-dimensional keypoint coordinates of the animal from each frame of the multi-view video data using a deep learning pose estimation model; a three-dimensional transformation module for back-projecting the two-dimensional keypoint coordinates from different views at the same instant into a three-dimensional world coordinate system via the triangulation principle, based on pre-calibrated camera intrinsic and extrinsic parameter matrices, to obtain three-dimensional keypoint coordinates; and a smoothing module for applying temporal smoothing to the three-dimensional keypoint coordinates to obtain temporally continuous three-dimensional skeletal keypoint data.
- 7. The multi-modal agent-based animal behavior analysis apparatus of claim 5, wherein the intelligent analysis unit comprises: a preliminary analysis module by which the behavior analysis agent analyzes the video key frames in the multi-view video data to identify the animal species and the experimental scene; a category determination module for determining the behavior categories of the animal from a built-in animal behavior ontology library in combination with the animal species and the experimental scene; and a list generation module for generating the semantic description and the computation task list containing at least one physical indicator based on the behavior categories and the natural language prompt.
- 8. The multi-modal agent-based animal behavior analysis apparatus of claim 5, wherein the report generation unit comprises: a text generation module for filling the numerical values of the quantified results into preset placeholders in the semantic description to generate fused text content, wherein modification of the numerical values is forbidden; a chart generation module for generating visual charts based on the quantified results, the visual charts comprising at least one of a spatial-preference heat map, a speed time-series plot and a behavioral Gantt chart; and a report generation module for automatically typesetting and packaging the text content and the visual charts to generate the animal behavior analysis report.
- 9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the multi-modal agent-based animal behavior analysis method of any one of claims 1 to 4 when executing the computer program.
- 10. A storage medium storing a computer program which, when executed, implements the steps of the multi-modal agent-based animal behavior analysis method of any one of claims 1 to 4.
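The slot-filling fusion described in claims 4 and 8 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `{name}` placeholder syntax, the function name and the example indicator keys are all assumptions.

```python
import re

def fill_slots(semantic_description: str, quantified_results: dict) -> str:
    """Fill numeric placeholders such as {mean_speed} in the agent-generated
    semantic description with values computed by the tool chain.  The numbers
    are inserted verbatim; the language model never gets to rewrite them,
    which is what keeps the report physically accurate."""
    def replace(match: re.Match) -> str:
        key = match.group(1)
        if key not in quantified_results:
            # A missing slot is an error, not something to paper over.
            raise KeyError(f"missing quantified result for slot '{key}'")
        return str(quantified_results[key])
    return re.sub(r"\{(\w+)\}", replace, semantic_description)

# Hypothetical semantic description produced by the behavior analysis agent:
description = ("The animal spent {center_time_s} s in the central zone, "
               "with a mean speed of {mean_speed_cm_s} cm/s.")
report_text = fill_slots(description, {"center_time_s": 42.7,
                                       "mean_speed_cm_s": 8.3})
```

Separating text generation (the agent) from numeric substitution (this function) is one straightforward way to honor the "modification of the numerical values is forbidden" constraint.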
Description
Animal behavior analysis method, apparatus, computer device and storage medium

Technical Field

The invention belongs to the technical field of behavior analysis methods, and particularly relates to an animal behavior analysis method, apparatus, computer device and storage medium based on a multi-modal agent.

Background

Animal behavior analysis is a core link in life science, neuroscience and pharmacology research: by quantitatively analyzing the posture, action sequences and spatial position of a subject animal, it provides key data support for understanding nervous system function, evaluating drug efficacy and monitoring animal health. As computer vision technology has evolved, the field has undergone a transition from manual visual recording to automated analysis. Currently, deep learning-based pose estimation techniques enable high-precision tracking of animal joints and body parts, generating continuous coordinate sequences.

The existing mainstream technical schemes in animal behavior analysis rely chiefly on computer vision algorithms for the accurate extraction and quantitative expression of animal postures. These schemes typically use a deep learning model (such as DeepLabCut or SLEAP) to detect and track animal keypoints in video, obtaining pixel coordinates in the two-dimensional image plane by identifying anatomical feature points such as the nose tip, ears, trunk and tail base. To recover true physical-space information, some advanced schemes adopt synchronized multi-view camera acquisition or depth sensors, mapping the multi-view two-dimensional coordinates into three-dimensional space through camera calibration and the geometric triangulation principle, thereby achieving three-dimensional reconstruction of the animal's skeletal points.
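The multi-view reconstruction step above can be sketched with standard linear (DLT) triangulation. This is a generic textbook method offered as an illustration, not the patent's specific implementation; the function name and array layout are assumptions.

```python
import numpy as np

def triangulate_point(projection_matrices, points_2d):
    """Linear (DLT) triangulation: recover one 3-D keypoint from its 2-D
    detections in two or more calibrated views.

    projection_matrices: list of 3x4 camera matrices P = K [R | t]
                         (the calibrated intrinsics and extrinsics)
    points_2d: list of (u, v) pixel coordinates, one per view
    """
    rows = []
    for P, (u, v) in zip(projection_matrices, points_2d):
        # Each view contributes two linear constraints on the homogeneous
        # world point X: u * (P[2] @ X) = P[0] @ X, and likewise for v.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The least-squares solution is the right singular vector associated
    # with the smallest singular value of A.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenize to (x, y, z)
```

Running this per keypoint per frame yields the raw 3-D trajectory; a temporal smoother (e.g. a moving average or Savitzky-Golay filter) would then be applied to obtain the temporally continuous skeletal data the claims describe.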
Methods based on three-dimensional skeletal data effectively eliminate the influence of viewpoint occlusion and perspective distortion, providing higher-dimensional raw features for behavior analysis. After the time series of skeletal points is acquired, the prior art further extracts a series of quantitative descriptive indicators through physical formulas and spatial-geometric algorithms. These indicators include, but are not limited to, individual movement speed, acceleration, torso inclination, and joint opening and closing angles. For standardized behavioral scenes such as open field experiments, the system calculates the residence time, number of entries and movement-track distribution of the animal in a specific area (such as the central or edge region) according to preset virtual area coordinates. These physical quantification data constitute the underlying feature space of animal behavior and intuitively reflect the animal's activity level and spatial preference. In the final behavior classification step, the prior schemes generally adopt a temporal deep learning model to process the extracted skeletal sequences and quantified features. Through supervised learning on large-scale labeled behavior datasets, such a model can identify specific behavior patterns, such as walking, grooming, rearing and social interaction, and output corresponding category labels. This technical architecture of keypoint extraction, feature quantification and deep learning classification is the prevailing implementation path for automated animal behavior discrimination in current research and industry.
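Two of the quantitative indicators described above, mean movement speed and central-zone residence time, can be computed from a 3-D trajectory as follows. This is an illustrative sketch under simple assumptions (body centroid as the tracked point, a circular central zone around the arena origin in the x-y plane); the function and key names are not from the patent.

```python
import numpy as np

def movement_metrics(centroid_xyz, fps, center_radius):
    """Compute example physical indicators from a 3-D body-centroid
    trajectory (T x 3 array, one row per frame).

    Returns the mean speed and the time spent inside a circular central
    zone of radius `center_radius` around the arena origin.
    """
    dt = 1.0 / fps
    # Frame-to-frame displacement -> instantaneous speed per frame pair.
    steps = np.diff(centroid_xyz, axis=0)
    speeds = np.linalg.norm(steps, axis=1) / dt
    # Residence time: count frames whose horizontal distance from the
    # arena center is below the zone radius, then convert to seconds.
    in_center = np.linalg.norm(centroid_xyz[:, :2], axis=1) < center_radius
    return {"mean_speed": float(speeds.mean()),
            "center_time_s": float(in_center.sum() * dt)}
```

In the patented pipeline such functions would live in the independent computation tool chain, invoked per entry of the agent's task list, so that every number in the report traces back to a deterministic physical calculation.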
Although the prior schemes based on keypoint estimation and deep learning classification have achieved a certain level of recognition accuracy for specific behaviors, they still have significant limitations in practical application and scientific research. First, the pipeline model of "keypoint extraction - feature quantification - deep learning classification" suffers from a serious semantic gap: it is essentially a closed discrimination system that can only output predefined behavior labels (such as "walking" or "resting") and discrete values, lacks the ability to convert these quantified data into continuous, biologically meaningful natural language descriptions, and cannot intuitively explain the context or logic behind a behavior. Second, the flexibility of this technical route is poor: it depends heavily on preset feature engineering and fixed model structures, so whenever undefined complex behaviors or new composite indicators need to be analyzed, large-scale data labeling and model retraining are often required, making it difficult to adapt to changing experimental requirements in an open environment. In addition, although the existing deep learning model utilizes three-dimensional skeletal data, th