CN-121980056-A - Video image quick retrieval method combining human body characteristics and gait characteristics
Abstract
The application relates to the technical field of image recognition and discloses a video image quick retrieval method combining human body characteristics and gait characteristics. The method comprises: obtaining a video or an image of a target person and extracting the appearance features of the target person from it; performing similarity matching in a surveillance video library, screening candidate targets that exceed a preset threshold, and synchronously recording the time period and geographic position in which each candidate target appears; extracting continuous walking video clips from the corresponding videos according to the time period in which each candidate target appears; extracting gait features of the candidate targets from the continuous walking video clips; extracting gait features from the video of the target person; comparing the gait features of the candidate targets with those of the target person; and outputting the final candidate targets according to the comparison results. The application improves the efficiency of target recognition and tracking in video surveillance.
Inventors
- CHEN GUANGZHU
- CHEN JUNLI
- ZHANG JIAN
- ZHANG ZHENGMING
- CHEN YI
Assignees
- 池州市公安局
Dates
- Publication Date
- 20260505
- Application Date
- 20260106
Claims (8)
- 1. A video image quick retrieval method combining human body characteristics and gait characteristics, characterized by comprising the following steps: S1, acquiring a video or an image of a target person, and extracting appearance features of the target person from the video or the image using a pre-trained person re-identification algorithm; S2, using the appearance features of the target person as a query vector, performing similarity matching in a surveillance video library, screening candidate targets whose similarity exceeds a preset threshold, and synchronously recording the time period and geographic position in which each candidate target appears; S3, extracting continuous walking video clips from the corresponding videos according to the time period in which each candidate target appears; S4, extracting gait features of the candidate targets from the continuous walking video clips using a gait recognition algorithm; S5, extracting gait features from the video of the target person using the gait recognition algorithm; S6, comparing the gait features of each candidate target with the gait features of the target person: if the similarity is higher than a preset threshold, determining that the candidate target and the target person are the same person; if the similarity is lower than the preset threshold, eliminating the candidate target; and outputting the final candidate targets according to the comparison results.
- 2. The video image quick retrieval method combining human body characteristics and gait characteristics according to claim 1, wherein acquiring the video or image of the target person and extracting the appearance features of the target person from the video or image using a pre-trained person re-identification algorithm comprises the following steps: S11, acquiring information on the target person for retrieval, the information comprising a video or an image; S12, if the information is an image of the target person, applying standardized preprocessing to the image; if the information is a video of the target person, extracting from the video a key frame containing a clear front or side view of the target, and applying standardized preprocessing to the key frame; S13, inputting the preprocessed image or key frame into the pre-trained person re-identification algorithm to generate the appearance features, which include color texture, body shape, and worn articles.
- 3. The video image quick retrieval method combining human body characteristics and gait characteristics according to claim 1, wherein using the appearance features of the target person as a query vector, performing similarity matching in a surveillance video library, screening candidate targets exceeding a preset threshold, and synchronously recording the time period and geographic position in which each candidate target appears comprises the following steps: S21, acquiring historical trajectory data of the target, analyzing its movement pattern using a spatio-temporal prediction model, and outputting a list of cameras whose probability exceeds a preset confidence threshold, together with the corresponding time windows; S22, defining the target area for the current search according to the camera list and the time windows; S23, using the target appearance features as a query vector, performing similarity calculation within the defined target area to generate an appearance similarity score for each matching object; S24, comparing each appearance similarity score with a preset appearance score threshold, and selecting the candidate targets whose scores exceed the threshold to form a candidate target set; S25, recording the precise time period and geographic position information of each candidate target in the candidate target set.
- 4. The video image quick retrieval method combining human body characteristics and gait characteristics according to claim 3, wherein acquiring the historical trajectory data of the target, analyzing its movement pattern using a spatio-temporal prediction model, and outputting a list of cameras whose probability exceeds a preset confidence threshold, together with the corresponding time windows, comprises the following steps: S211, based on the historical trajectory data of the target, and combining the current time, position, and historical movement pattern, calculating the transition tendency of the target within the camera network, and outputting a prior probability distribution based on statistical regularities; S212, inputting the historical trajectory of the target into a recurrent neural network in chronological order, extracting long-term movement patterns, and generating a predictive probability distribution based on the sequence model; S213, dynamically fusing the prior probability distribution and the predictive probability distribution through an attention mechanism, and generating a unified mixed probability distribution according to the weights; S214, performing multi-step time-series prediction with the mixed probability distribution as the initial state, generating a complete probability map for a future period, and selecting all cameras whose probability exceeds the preset confidence threshold, together with their corresponding time windows.
- 5. The video image quick retrieval method combining human body characteristics and gait characteristics according to claim 4, wherein dynamically fusing the prior probability distribution and the predictive probability distribution through the attention mechanism and generating a unified mixed probability distribution according to the weights comprises the following steps: S2131, calculating the information entropy of the prior probability distribution and of the predictive probability distribution, generating a prior distribution entropy value and a predictive distribution entropy value; S2132, based on the prior distribution entropy value and the predictive distribution entropy value, generating a prior distribution fusion weight and a predictive distribution fusion weight through an attention weight calculation module; S2133, computing a weighted sum of the prior probability distribution and the predictive probability distribution according to the prior distribution fusion weight and the predictive distribution fusion weight, and normalizing the result to generate the unified mixed probability distribution.
- 6. The video image quick retrieval method combining human body characteristics and gait characteristics according to claim 1, wherein extracting the gait features of the candidate targets from the continuous walking video clips using the gait recognition algorithm comprises the following steps: S41, segmenting the input video clip frame by frame and outputting a binarized silhouette of the person; S42, performing gait cycle detection on the binarized silhouette sequence, and computing a gait energy image from the silhouettes within a complete detected cycle; S43, inputting the gait energy image into a pre-trained deep neural network, which outputs the gait features of the candidate target.
- 7. The video image quick retrieval method combining human body characteristics and gait characteristics according to claim 1, wherein comparing the gait features of the candidate target with the gait features of the target person comprises the following steps: calculating the gait similarity between the candidate target and the target person in an initial time period, and determining that the candidate target passes verification if the value is higher than a preset preliminary pass threshold, and that it fails verification otherwise; for each failed candidate target, extracting its gait features in a plurality of time periods, calculating the similarity of each with the reference gait features of the target person, and performing weighted fusion to generate a comprehensive gait similarity; performing weighted fusion of the comprehensive gait similarity and the appearance similarity according to a balance factor to generate a comprehensive similarity; and comparing the comprehensive similarity with a preset threshold interval, and marking candidate targets that fall within the manual recheck confidence interval as targets requiring manual recheck.
- 8. The video image quick retrieval method combining human body characteristics and gait characteristics according to claim 7, wherein, for each failed candidate target, extracting its gait features in a plurality of time periods, calculating the similarity of each with the reference gait features of the target person, and performing weighted fusion to generate the comprehensive gait similarity comprises the following steps: for a candidate target that did not pass the initial comparison, extracting continuous walking clips at a plurality of different time points from the original video of the candidate target to form a multi-period clip set; extracting gait features of the candidate target from each clip in the multi-period clip set to form a candidate multi-period gait feature set; calculating the similarity between each feature in the multi-period gait feature set and the reference gait features of the target person to obtain a corresponding gait similarity score list; assigning a fusion weight to each score in the gait similarity score list based on temporal adjacency and video quality criteria, generating a weight list; and computing a weighted average of the gait similarity score list using the weight list, and outputting the comprehensive gait similarity of the candidate target.
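As an illustration only, the two score fusion steps in claims 5 and 8 might be sketched in Python as follows. The claims do not say how the attention weight calculation module turns entropy values into weights, nor how the temporal adjacency and video quality weights are derived. This sketch assumes a softmax over negative entropies (so the lower-entropy distribution receives more weight) and takes the per-clip weights as given inputs. The function names `fuse_distributions` and `fused_gait_similarity` are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def fuse_distributions(prior, predicted):
    """Entropy-weighted fusion of two distributions over the same cameras.

    Follows the shape of S2131-S2133. The weight rule (softmax over
    negative entropy) is an assumption, not taken from the claims.
    """
    if len(prior) != len(predicted):
        raise ValueError("distributions must cover the same cameras")
    # S2131: entropy of each distribution
    h_prior, h_pred = entropy(prior), entropy(predicted)
    # S2132 (assumed form): softmax over negative entropies,
    # shifted by the maximum for numerical stability
    m = max(-h_prior, -h_pred)
    e_prior = math.exp(-h_prior - m)
    e_pred = math.exp(-h_pred - m)
    w_prior = e_prior / (e_prior + e_pred)
    w_pred = 1.0 - w_prior
    # S2133: weighted sum, then normalization so the result sums to 1
    mixed = [w_prior * a + w_pred * b for a, b in zip(prior, predicted)]
    total = sum(mixed)
    return [x / total for x in mixed], (w_prior, w_pred)


def fused_gait_similarity(scores, weights):
    """Weighted average of per-clip gait similarity scores (claim 8).

    `weights` stands in for the temporal adjacency and video quality
    weights, whose derivation the claim does not specify.
    """
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total


if __name__ == "__main__":
    # Hypothetical four-camera example: a flat prior and a peaked prediction
    mixed, (w_prior, w_pred) = fuse_distributions(
        [0.25, 0.25, 0.25, 0.25], [0.7, 0.1, 0.1, 0.1]
    )
    print(mixed, w_prior, w_pred)
    # Hypothetical per-clip scores, with the second clip trusted more
    print(fused_gait_similarity([0.6, 0.8], [1.0, 3.0]))
```

Under this assumed weighting, a flat (high-entropy) prior contributes less to the mixed distribution than a peaked prediction. Other mappings from entropy to weight would fit the claim wording equally well.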
Description
Video image quick retrieval method combining human body characteristics and gait characteristics
Technical Field
The application relates to the technical field of image recognition, in particular to a video image quick retrieval method combining human body characteristics and gait characteristics.
Background
In current intelligent video image retrieval applications, face recognition and person re-identification are common techniques. Face recognition depends on clear facial features, but in real surveillance, cameras are often mounted high and far from the subject, so faces are blurred and hard to retrieve effectively. Human body feature recognition was developed in response: it matches identity through features such as clothing and body shape, extracts human body features, retrieves across massive video archives, and effectively reduces labor cost. For example, security systems perform cross-camera tracking using clothing color, pattern, and shape, and existing offline retrieval schemes use GPU acceleration and are fast and low cost. However, retrieval that relies only on appearance features has the following problems: 1. similar clothing and appearance can cause false positives, and tracking is difficult to continue when a person changes clothes; 2. comparison methods based on fixed checkpoint snapshots cannot cover appearances of the target in other scenes, so blind spots arise easily.
In order to improve person identification accuracy under complex conditions, gait recognition algorithms have been applied in recent years. A gait algorithm identifies a person by analyzing the pattern of limb movement during walking, and it remains effective at long distance, at low resolution, and even when the face is occluded or the person has their back to the camera. For example, gait recognition systems have begun to be used in China's security field, confirming identity from posture and stride at distances beyond 50 meters. Gait features are difficult to disguise because the algorithm analyzes whole-body movement, which provides a more stable basis for identification than static features such as clothing. However, applying gait algorithms to large-scale video retrieval still faces cost and efficiency problems. Extracting gait features requires segmenting the human silhouette or estimating pose frame by frame, which is computationally expensive; existing software cannot process video streams in real time and needs about 10 minutes of computation for every hour of video. Enabling gait comparison across the board therefore consumes enormous computing resources and time. In addition, commercial gait recognition systems are expensive, which limits their large-scale deployment. In summary, in the prior art of person retrieval in video, relying on human appearance features is fast and low cost but has low accuracy, while gait recognition is accurate but computationally expensive and hard to deploy widely. A technical means for efficient retrieval that effectively combines human appearance features with gait features is therefore currently lacking.
Disclosure of Invention
To address the current lack of a technical means that effectively combines human appearance features with gait features, the application provides a video image quick retrieval method combining human body characteristics and gait characteristics. The method comprises the following steps: S1, acquiring a video or an image of a target person, and extracting appearance features of the target person from the video or the image using a pre-trained person re-identification algorithm; S2, using the appearance features of the target person as a query vector, performing similarity matching in a surveillance video library, screening candidate targets whose similarity exceeds a preset threshold, and synchronously recording the time period and geographic position in which each candidate target appears; S3, extracting continuous walking video clips from the corresponding videos according to the time period in which each candidate target appears; S4, extracting gait features of the candidate targets from the continuous walking video clips using a gait recognition algorithm; S5, extracting gait features from the video of the target person using the gait recognition algorithm; S6, comparing the gait features of each candidate target with the gait features of the target person: if the similarity is higher than a preset threshold, determining that the candidate target and the target person are the same person; if the similarity is lower than the preset threshold, e