CN-121074986-B - YOLOv5-based mine personnel behavior recognition method and system
Abstract
The application relates to the technical field of image processing and discloses a YOLOv5-based mine personnel behavior recognition method and system. The method comprises: acquiring a video frame sequence of a mine site; extracting an environment interference vector from the video frame sequence and optimizing the sequence based on that vector to obtain a clear frame sequence; extracting human key-point coordinates and pose vectors from the clear frame sequence and analyzing the temporal changes of the key-point coordinates to obtain preliminary action features; calculating similarity scores between the preliminary action features and a preset standard template and refining the features based on the scores to obtain refined action component vectors; and matching the refined action component vectors against a preset action library to obtain an action recognition result and generating a corresponding action response output based on that result. The application improves the accuracy of recognizing personnel actions in complex mine environments.
Inventors
- ZHU HAO
Assignees
- 河南影聪科技发展有限公司
Dates
- Publication Date: 2026-05-08
- Application Date: 2025-08-30
Claims (8)
- 1. A mine personnel behavior recognition method based on YOLOv5, the method comprising: Step S1, acquiring a video frame sequence of a mine site, extracting an environment interference vector from the video frame sequence, and optimizing the video frame sequence based on the environment interference vector to obtain a clear frame sequence; Step S2, extracting human key-point coordinates and pose vectors from the clear frame sequence, and analyzing temporal changes of the key-point coordinates to obtain preliminary action features; Step S3, calculating similarity scores between the preliminary action features and a preset standard template, and refining the preliminary action features based on the similarity scores to obtain refined action component vectors; wherein calculating the similarity scores comprises: extracting a standardized pose vector set from a mine action recognition database as the preset standard template; calculating an initial similarity between the preliminary action features and the preset standard template; and, if the initial similarity is lower than a preset threshold, adjusting the weight coefficient of each action component in the preliminary action features according to the illumination uniformity distribution and dust particle concentration in the environment interference vector; and wherein refining the preliminary action features based on the similarity scores comprises: judging whether the similarity score is greater than a target threshold; if so, determining that a matching relation exists between the preliminary action features and the preset template; if not, performing fine-grained decomposition of the preliminary action features into a plurality of action components, each component corresponding to one time segment of the action sequence; extracting key regions from the decomposed action components, a key region being a local region that reflects pose differences or small pose changes; assigning weights to the key regions through an attention mechanism to generate an attention-weighted vector; adaptively adjusting the target threshold in combination with the environment interference vector; calculating a dynamic gain coefficient based on the similarity score; and correcting and amplifying the key features in the attention-weighted vector to obtain refined action component vectors, which are local-level speed, angle and trajectory change features; and Step S4, matching the refined action component vectors against a preset action library to obtain an action recognition result, and generating a corresponding action response output based on the action recognition result.
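The four steps of claim 1 can be sketched as a minimal pipeline. Everything below is illustrative: the function names, the brightness-based enhancement, and the dummy recognizer are placeholders for the patent's CNN/YOLOv5/RNN stages, not the patent's implementation.

```python
import numpy as np

def extract_interference_vector(frames):
    """Hypothetical environment interference vector:
    [spread of per-frame brightness, mean texture energy as a dust proxy]."""
    lum = np.array([f.mean() for f in frames])
    dust = np.array([f.std() for f in frames])
    return np.array([lum.std(), dust.mean()])

def enhance(frames, interference):
    """Step S1 stand-in: re-center brightness toward a target level."""
    target = 128.0
    return [np.clip(f + (target - f.mean()), 0, 255) for f in frames]

def recognize(frames):
    """Steps S2-S4 collapsed into a dummy label for illustration only."""
    interference = extract_interference_vector(frames)
    clear = enhance(frames, interference)
    return "wave" if np.mean([f.mean() for f in clear]) > 0 else "none"

frames = [np.random.default_rng(i).uniform(0, 255, (8, 8)) for i in range(4)]
print(recognize(frames))
```

The real method would replace `enhance` with the CNN-based optimization of claim 2 and `recognize` with the detection, refinement, and matching stages of claims 4-6.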
- 2. The method of claim 1, wherein optimizing the video frame sequence based on the environment interference vector comprises: analyzing the illumination uniformity distribution and dust particle concentration of each image frame in the video frame sequence; calculating an illumination dynamic range from the illumination uniformity distribution, and a dust diffusion speed from the dust particle concentration; fusing the illumination dynamic range and the dust diffusion speed to generate the environment interference vector; processing the video frame sequence with a convolutional neural network, adjusting the illumination color-temperature deviation according to the illumination dynamic range in the environment interference vector; and adjusting the contrast of the video picture according to the dust diffusion speed in the environment interference vector to obtain a clear frame sequence.
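A minimal sketch of claim 2's interference vector and contrast adjustment. The two proxies (brightness range for illumination, horizontal-gradient energy for dust) and the linear contrast rule are assumptions; the patent leaves the concrete measures open.

```python
import numpy as np

def interference_vector(frames):
    """Fuse an illumination dynamic range with a dust diffusion speed.
    Both proxies here are assumptions, not the patent's definitions."""
    lum = np.array([f.mean() for f in frames])
    dyn_range = lum.max() - lum.min()                      # illumination dynamic range
    dust = np.array([np.abs(np.diff(f, axis=1)).mean() for f in frames])
    diffusion_speed = float(np.abs(np.diff(dust)).mean())  # frame-to-frame dust change
    return np.array([dyn_range, diffusion_speed])

def adjust_contrast(frame, diffusion_speed, gain=0.1):
    """Stretch contrast around the mean, more aggressively for faster dust (assumed rule)."""
    factor = 1.0 + gain * diffusion_speed
    return np.clip((frame - frame.mean()) * factor + frame.mean(), 0, 255)

rng = np.random.default_rng(0)
frames = [rng.uniform(0, 255, (16, 16)) for _ in range(5)]
vec = interference_vector(frames)
clear0 = adjust_contrast(frames[0], vec[1])
```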
- 3. The method of claim 2, wherein adjusting the contrast of the video picture according to the dust diffusion speed in the environment interference vector comprises: applying optical flow field estimation to consecutive multi-frame video images to obtain displacement vectors of dust-region pixels; calculating the direction and speed distributions of dust movement from the statistics of the displacement vectors; and adaptively setting a contrast enhancement coefficient according to the dust density gradient and applying regional histogram equalization to local areas, so that the overall picture is enhanced and the edge features of dust particles are highlighted.
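The two ingredients of claim 3 can be illustrated compactly. Phase correlation below is a cheap global stand-in for the dense optical-flow field the claim describes (that substitution is an assumption); the histogram equalization is the standard technique the claim names.

```python
import numpy as np

def dust_displacement(prev, curr):
    """Global shift estimate via phase correlation, standing in for
    per-pixel optical flow (an assumption, not the patent's estimator)."""
    f = np.fft.fft2(curr) * np.conj(np.fft.fft2(prev))
    corr = np.fft.ifft2(f / (np.abs(f) + 1e-9)).real
    dy, dx = np.unravel_index(corr.argmax(), corr.shape)
    return int(dy), int(dx)

def regional_hist_equalize(region):
    """Standard histogram equalization applied to one local region."""
    hist, _ = np.histogram(region, bins=256, range=(0, 256))
    cdf = hist.cumsum() / hist.sum()
    return (cdf[region.astype(np.uint8)] * 255).astype(np.uint8)

rng = np.random.default_rng(1)
prev = rng.uniform(0, 255, (32, 32))
curr = np.roll(prev, (2, 3), axis=(0, 1))   # simulate dust drifting by (2, 3) pixels
```

For a pure circular shift, phase correlation recovers the displacement exactly, which is why it makes a convenient illustration of the displacement-statistics step.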
- 4. The method of claim 1, wherein analyzing the temporal changes of the key-point coordinates to obtain preliminary action features comprises: establishing a YOLOv5-based object detection model, detecting and localizing human regions in the clear frame sequence, and extracting target bounding boxes; inputting the target bounding boxes into a pose estimation network to obtain human key-point coordinates and pose vectors; inputting the continuous time series of key-point coordinates into a recurrent neural network, analyzing key-point displacement and speed changes, and outputting dynamic features of the action evolving over time; acquiring the real-time acquisition frequency and multiplying the dynamic features by the acquisition frequency to obtain a weighted dynamic value; and, if the weighted dynamic value is greater than a preset threshold, marking the action corresponding to the dynamic features as a significant action and generating preliminary action features.
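The weighting-and-thresholding step of claim 4 can be sketched without the detection and RNN stages. Mean per-frame key-point displacement stands in for the RNN's learned dynamic feature (that substitution, and the threshold value, are assumptions); the multiplication by acquisition frequency and the significance test follow the claim.

```python
import numpy as np

def weighted_dynamic_value(keypoints_seq, fps, threshold=5.0):
    """keypoints_seq: array of shape (T, K, 2) — K key points over T frames.
    Returns the frequency-weighted dynamic value and a significance flag."""
    disp = np.linalg.norm(np.diff(keypoints_seq, axis=0), axis=-1)  # (T-1, K)
    dynamic = disp.mean()            # stand-in for the RNN's dynamic feature
    weighted = dynamic * fps         # weight by the acquisition frequency
    return weighted, weighted > threshold

# A moving sequence: every key point moves (0.3, 0.4) px per frame (0.5 px total).
steps = np.tile(np.array([0.3, 0.4]), (10, 3, 1))
moving = np.cumsum(steps, axis=0)
static = np.zeros((10, 3, 2))
```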
- 5. The method of claim 1, wherein calculating the dynamic gain coefficient based on the similarity score comprises: inputting the similarity score into a combined model of a linear mapping function and an exponential smoothing function to obtain a gain coefficient, automatically increasing the gain coefficient if the similarity score is smaller than a first threshold, and decreasing it if the similarity score is greater than the first threshold.
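Claim 5's combined model reduces to a few lines. All coefficients (`alpha`, `slope`, `first_threshold`) are illustrative assumptions; the claim only fixes the direction of the adjustment around the first threshold.

```python
def dynamic_gain(score, prev_gain=1.0, alpha=0.3, slope=0.8, first_threshold=0.6):
    """Linear mapping of the similarity score, exponentially smoothed
    against the previous gain, per the structure of claim 5."""
    linear = 1.0 + slope * (first_threshold - score)   # > 1 below the threshold
    return alpha * linear + (1 - alpha) * prev_gain    # exponential smoothing
```

With the defaults, a low score (0.2) yields a gain above 1 and a high score (0.9) a gain below 1, matching the claim's increase/decrease rule.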
- 6. The method of claim 1, wherein matching the refined action component vectors against a preset action library to obtain an action recognition result comprises: constructing the preset action library from historical mine video data, the library including action patterns such as waving signals and squat inspection; and matching the refined action component vectors against the preset action library, generating the action recognition result in combination with update iteration periods and weight normalization.
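A minimal sketch of the library matching in claim 6. The patent does not name its similarity metric; cosine similarity after normalization is an assumption, and the two library templates are hypothetical.

```python
import numpy as np

def match_action(refined_vec, library):
    """Nearest template by cosine similarity after weight normalization
    (the metric is an assumption; the claim leaves it open)."""
    v = refined_vec / (np.linalg.norm(refined_vec) + 1e-9)
    scores = {name: float(v @ (t / (np.linalg.norm(t) + 1e-9)))
              for name, t in library.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

library = {  # hypothetical refined-action templates
    "wave_signal":      np.array([1.0, 0.1, 0.0]),
    "squat_inspection": np.array([0.0, 0.2, 1.0]),
}
```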
- 7. The method of claim 1, wherein generating a corresponding action response output based on the action recognition result comprises: generating a safety early-warning signal or a production-efficiency optimization instruction based on the action recognition result; if the result indicates an abnormal action, extracting the multidimensional vector from the refined action component vector of the abnormal action, assigning a dimension coefficient to each dimension, and performing weighted fusion of the multidimensional vector based on the dimension coefficients to obtain a fusion vector; checking the stability of the dimension coefficients by calculating their variance, judging them stable if the variance is below a preset threshold, otherwise iteratively adjusting the dimension coefficients until they are stable, and generating an instruction that triggers an equipment operation adjustment; and generating the action response output, which is a device control signal, according to the adjusted instruction.
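The variance-based stability check of claim 7 can be sketched as follows. The concrete adjustment rule (pulling coefficients toward their mean) is an assumption; the claim only requires iterating until the variance drops below the threshold.

```python
import numpy as np

def stabilize_coefficients(coeffs, var_threshold=1e-3, max_iter=50):
    """Judge dimension coefficients stable when their variance is below
    the threshold; otherwise adjust iteratively (assumed rule)."""
    c = np.asarray(coeffs, dtype=float)
    for _ in range(max_iter):
        if c.var() < var_threshold:
            return c, True
        c = 0.5 * (c + c.mean())     # each step quarters the variance
    return c, bool(c.var() < var_threshold)
```

Because the update averages each coefficient with the mean, the mean is preserved while the spread shrinks geometrically, so convergence is guaranteed.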
- 8. A mine personnel behavior recognition system based on YOLOv5 for implementing the mine personnel behavior recognition method based on YOLOv5 as defined in any one of claims 1-7, the system comprising: an acquisition module for acquiring a video frame sequence of a mine site, extracting an environment interference vector from the video frame sequence, and optimizing the video frame sequence based on the environment interference vector to obtain a clear frame sequence; an analysis module for extracting human key-point coordinates and pose vectors from the clear frame sequence and analyzing temporal changes of the key-point coordinates to obtain preliminary action features; an optimization module for calculating similarity scores between the preliminary action features and a preset standard template and refining the preliminary action features based on the similarity scores to obtain refined action component vectors, wherein calculating the similarity scores comprises extracting a standardized pose vector set from a mine action recognition database as the preset standard template, calculating an initial similarity between the preliminary action features and the preset standard template, and, if the initial similarity is lower than a preset threshold, adjusting the weight coefficient of each action component in the preliminary action features according to the illumination uniformity distribution and dust particle concentration in the environment interference vector and generating an adaptively adjusted similarity score from the updated weight coefficients; and wherein refining the preliminary action features comprises judging whether the similarity score is greater than a target threshold, determining that a matching relation exists with the preset standard template if so, and otherwise decomposing the preliminary action features into a plurality of action components, extracting key regions from the components, weighting the key regions through an attention mechanism, and correcting and amplifying the key features to obtain the refined action component vectors, which are local-level speed, angle and trajectory change features; and an output module for matching the refined action component vectors against a preset action library to obtain an action recognition result and generating a corresponding action response output based on the action recognition result.
Description
YOLOv5-based mine personnel behavior recognition method and system. Technical Field The application relates to the technical field of image processing, in particular to a mine personnel behavior recognition method and system based on YOLOv5. Background In the complex environment of mine sites, action recognition technology faces a core problem: how to recognize human actions accurately and robustly under dynamic environmental factors such as uneven illumination and dust interference, so as to support safety monitoring and production efficiency optimization. The problem arises from the unique operating conditions of mine sites: severe illumination changes and rapid dust diffusion often degrade the image quality of the video frame sequence, making action features difficult to extract; recognition accuracy therefore drops, which affects the timeliness of safety early warnings and the reliability of equipment control instructions. Specifically, uneven illumination causes the brightness and contrast of video frames to fluctuate, key-point detection is easily disturbed, and the temporal features of human actions are hard to extract stably; dust diffusion further blurs the images, increases the difficulty of noise removal, and makes the refinement of action component vectors inaccurate.
In addition, the dynamic variation of the environmental interference conflicts with the real-time requirements of action recognition: a rapidly changing environment demands complex preprocessing algorithms to remove interference, but real-time constraints limit the computational complexity of those algorithms; meanwhile, action recognition must match a preset action library with high precision, yet the similarity calculation between the environment interference vector and the standard template is easily disturbed and hard to adjust adaptively. Together these problems form the core challenge of robust and accurate action recognition in complex environments; in mine sites in particular, a misjudgment or missed detection may directly lead to safety accidents or reduced production efficiency, which highlights the uniqueness and importance of the technical problem. Disclosure of Invention To solve the above technical problems, the application provides a YOLOv5-based mine personnel behavior recognition method and system that improve the accuracy of personnel behavior recognition in complex mine environments.
In a first aspect, the application provides a mine personnel behavior recognition method based on YOLOv5, comprising the following steps: Step S1, acquiring a video frame sequence of a mine site, extracting an environment interference vector from the video frame sequence, and optimizing the video frame sequence based on the environment interference vector to obtain a clear frame sequence; Step S2, extracting human key-point coordinates and pose vectors from the clear frame sequence, and analyzing temporal changes of the key-point coordinates to obtain preliminary action features; Step S3, calculating similarity scores between the preliminary action features and a preset standard template, and refining the preliminary action features based on the similarity scores to obtain refined action component vectors; Step S4, matching the refined action component vectors against a preset action library to obtain an action recognition result, and generating a corresponding action response output based on the action recognition result. With reference to the first aspect, in a first implementation manner of the first aspect, optimizing the video frame sequence based on the environment interference vector comprises: analyzing the illumination uniformity distribution and dust particle concentration of each image frame in the video frame sequence; calculating an illumination dynamic range from the illumination uniformity distribution and a dust diffusion speed from the dust particle concentration; processing the video frame sequence with a convolutional neural network, adjusting the illumination color-temperature deviation according to the illumination dynamic range in the environment interference vector; and adjusting the contrast of the video picture according to the dust diffusion speed in the environment interference vector to obtain a clear frame sequence.
With reference to the first aspect, in a second implementation manner of the first aspect, adjusting the contrast of the video picture according to the dust diffusion speed in the environment interference vector comprises: applying optical flow field estimation to consecutive multi-frame video images to obtain displacement vectors of dust-region pixels