CN-122024289-A - Personnel trajectory tracking method and system based on network topology and multi-modal features


Abstract

The invention discloses a personnel trajectory tracking method and system based on network topology and multi-modal features. The method comprises: constructing a physical topology graph of a surveillance camera network and calculating spatio-temporal transition probability weights between nodes; extracting the facial, clothing and gait multi-modal features of a target person with a dual-stream neural network; after the target disappears from the current node, predicting with a spatio-temporal graph neural network, according to the topology weights, a set of candidate nodes where the target may appear at the next moment; performing pedestrian detection and feature extraction in the video streams of the candidate nodes and calculating the similarity of each modality's features; dynamically weighting and fusing the spatio-temporal transition probability and the multi-modal similarities through a fusion scoring function to obtain a final matching score; and determining the target's position and updating the trajectory according to the final matching score. The system also supports multi-target collaborative tracking and heterogeneous sensor data fusion. The invention effectively solves the problems of low recognition rate and high computational redundancy of traditional methods in complex scenes involving clothing changes, occlusion and illumination changes.

Inventors

GAO XIANGLI
LI GUANGJIN
CAI XIN
ZHA XIAOYING
DING BOSONG
ZENG WEI

Assignees

南京城市职业学院(南京开放大学)

Dates

Publication Date: 20260512
Application Date: 20260408

Claims (8)

1. A personnel trajectory tracking method based on network topology and multi-modal features, characterized by comprising the following steps:
Step S1, constructing a surveillance network topology graph, defining cameras as nodes and physical paths between cameras as edges, and calculating spatio-temporal transition probability weights between nodes according to historical pedestrian flow data and real-time pedestrian flow dynamics;
Step S2, acquiring an initial video image sequence of a target person, and extracting a multi-modal feature vector of the target with a deep learning model, wherein the multi-modal features comprise at least facial features, clothing texture features and gait motion features;
Step S3, determining the current source node camera of the target person, and predicting the set of candidate nodes where the target person may appear within the next preset time window according to the spatio-temporal transition probability weights and a path prediction model based on a spatio-temporal graph neural network;
Step S4, detecting pedestrian objects in real time in the video streams corresponding to the candidate node set, extracting multi-modal feature vectors of the pedestrian objects, and calculating, for each modality, their visual similarity to the multi-modal feature vector of the target person;
Step S5, constructing a fusion scoring function, dynamically weighting and fusing the spatio-temporal transition probability weight and the visual similarity of each modality, and calculating the final matching score of each candidate node;
Step S6, determining the next physical position of the target person according to the final matching score, updating the target's trajectory, and triggering an abnormal behavior warning when the trajectory deviates from a conventional pattern or enters an unauthorized area;
wherein the fusion scoring function in step S5 is calculated as:
$$Score_i = \alpha \cdot P_{trans}(i) + \beta \cdot Sim_{face}(i) + \gamma \cdot Sim_{cloth}(i) + \delta \cdot Sim_{gait}(i)$$
where $Score_i$ represents the final matching score of candidate node i; $P_{trans}(i)$ represents the spatio-temporal transition probability from the source node to node i; $Sim_{face}(i)$, $Sim_{cloth}(i)$ and $Sim_{gait}(i)$ respectively represent the facial, clothing and gait feature similarity between the pedestrian detected at node i and the target; and $\alpha$, $\beta$, $\gamma$ and $\delta$ are the dynamic weight coefficients of $P_{trans}(i)$, $Sim_{face}(i)$, $Sim_{cloth}(i)$ and $Sim_{gait}(i)$ respectively, adjusted according to the current tracking scene context, wherein the scene context comprises illumination conditions, pedestrian flow density, and whether a clothing change or occlusion event is known to have occurred.
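The fusion scoring of steps S4 and S5 can be illustrated with a minimal sketch. It assumes that the scoring function is a weighted sum of the transition probability and the three modality similarities, and that similarity is measured by cosine similarity; neither is confirmed in detail by the text. The `SceneContext` fields, the heuristic in `dynamic_weights` and all numeric factors are hypothetical and stand in for whatever weight-adjustment policy an implementation actually uses.

```python
import math
from dataclasses import dataclass


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class SceneContext:
    """Hypothetical container for the scene context named in the claim."""
    low_light: bool = False
    crowd_density: float = 0.0          # 0.0 (empty) to 1.0 (very crowded)
    clothing_change_known: bool = False
    occlusion_known: bool = False


def dynamic_weights(ctx):
    """Hypothetical heuristic: down-weight unreliable modalities, then normalise."""
    w = {"trans": 1.0, "face": 1.0, "cloth": 1.0, "gait": 1.0}
    if ctx.low_light:
        w["face"] *= 0.5
        w["cloth"] *= 0.5
    if ctx.clothing_change_known:
        w["cloth"] *= 0.1
    if ctx.occlusion_known:
        w["face"] *= 0.5
    # In dense crowds appearance is ambiguous, so lean more on the topology prior.
    w["trans"] *= 1.0 + ctx.crowd_density
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}


def fusion_score(p_trans, sims, weights):
    """Weighted sum of transition probability and per-modality similarities."""
    return (weights["trans"] * p_trans
            + weights["face"] * sims["face"]
            + weights["cloth"] * sims["cloth"]
            + weights["gait"] * sims["gait"])
```

For example, with `SceneContext(clothing_change_known=True)` the clothing weight shrinks relative to gait, so a candidate whose clothing no longer matches is penalised less than it would be under equal weights.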
2. The personnel trajectory tracking method based on network topology and multi-modal features according to claim 1, wherein step S1 specifically comprises:
S1.1, constructing a directed weighted graph G = (V, E) according to the physical installation positions and spatial connectivity of the surveillance cameras, wherein V represents the set of camera nodes, E represents the set of directed edges between nodes, and the direction of an edge represents a possible direction of personnel movement;
S1.2, establishing a spatio-temporal transition matrix, and counting the frequency and time-interval distribution of personnel moving from a source node to each adjacent target node in different time periods;
S1.3, calculating and dynamically updating the state transition probabilities between nodes with a Markov chain model or a hidden Markov model, combining historical statistics with real-time sliding-window data, to serve as the spatio-temporal transition probability weights;
S1.4, when the topology of the surveillance network changes, updating the graph structure and recalculating the transition weights of the affected edges.
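Steps S1.1 to S1.3 can be sketched as a first-order Markov estimate that blends historical counts with a sliding window of recent observations. This is a simplified illustration, not the claimed model: it ignores the time-interval distribution and time-of-day periods, and the class name `TransitionEstimator`, the `blend` factor and the uniform fallback are assumptions introduced here.

```python
from collections import Counter, defaultdict, deque


class TransitionEstimator:
    """Hypothetical estimator of P(next node | source node) on a directed graph."""

    def __init__(self, edges, window_size=100, blend=0.3):
        # Directed edges (src, dst) of the camera topology graph G = (V, E).
        self.successors = defaultdict(list)
        for src, dst in edges:
            self.successors[src].append(dst)
        self.hist = defaultdict(Counter)          # long-term transition counts
        self.recent = deque(maxlen=window_size)   # real-time sliding window
        self.blend = blend                        # weight given to recent data

    def add_historical(self, src, dst, count=1):
        self.hist[src][dst] += count

    def observe(self, src, dst):
        """Record one real-time transition; old ones fall out of the window."""
        self.recent.append((src, dst))

    @staticmethod
    def _normalise(counts, succ):
        total = sum(counts.get(d, 0) for d in succ)
        if total == 0:
            # No data yet: fall back to a uniform distribution over successors.
            return {d: 1.0 / len(succ) for d in succ}
        return {d: counts.get(d, 0) / total for d in succ}

    def probabilities(self, src):
        succ = self.successors.get(src, [])
        if not succ:
            return {}
        recent_counts = Counter(d for s, d in self.recent if s == src)
        p_hist = self._normalise(self.hist[src], succ)
        p_recent = self._normalise(recent_counts, succ)
        # Only blend in the window when it actually contains data for src.
        lam = self.blend if any(recent_counts[d] for d in succ) else 0.0
        return {d: (1 - lam) * p_hist[d] + lam * p_recent[d] for d in succ}
```

When the topology changes (step S1.4), a sketch like this would rebuild `successors` for the affected nodes and discard the counts on removed edges.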
3. The personnel trajectory tracking method based on network topology and multi-modal features according to claim 1, wherein in step S2 a dual-stream fusion neural network structure is adopted to extract the multi-modal features, specifically: a first stream extracts static facial features and clothing texture features with a convolutional neural network, wherein the facial features are high-dimensional embedding vectors obtained by training with an ArcFace or CosFace loss function, and the clothing features are obtained through a part-based feature extraction network built on human parsing; a second stream extracts dynamic gait features from a sequence of consecutive video frames with a temporal convolutional network or a recurrent neural network, wherein the gait features comprise the motion trajectories of skeleton key points, stride length, step frequency and limb swing amplitude; and the static and dynamic features are spatio-temporally aligned and fused by a cross-modal attention mechanism or a Transformer encoder to generate a unified and robust multi-modal feature representation.
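The attention-based fusion at the end of this claim can be illustrated with a deliberately simplified sketch: scaled dot-product self-attention over one embedding per modality, without the learned query, key and value projections that a trained cross-modal attention layer or Transformer encoder would have. The function name `attention_fuse`, the mean pooling and the assumption that all modality embeddings share one dimension are illustrative choices, not details from the text.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(tokens):
    """Fuse per-modality embeddings with unparameterised self-attention.

    tokens: dict mapping a modality name to an embedding; all embeddings
    are assumed to have the same dimension.
    Returns (fused_vector, attention_weights).
    """
    names = list(tokens)
    dim = len(tokens[names[0]])
    scale = math.sqrt(dim)
    attended = []
    attn = {}
    for q in names:
        # Each modality queries every modality (itself included).
        scores = [dot(tokens[q], tokens[k]) / scale for k in names]
        weights = softmax(scores)
        attn[q] = dict(zip(names, weights))
        attended.append([
            sum(weights[j] * tokens[names[j]][i] for j in range(len(names)))
            for i in range(dim)
        ])
    # Mean-pool the attended tokens into one unified representation.
    fused = [sum(vec[i] for vec in attended) / len(attended) for i in range(dim)]
    return fused, attn
```

Calling `attention_fuse({"face": f, "cloth": c, "gait": g})` returns one vector of the shared dimension plus the attention weights, which can be inspected to see which modalities reinforced each other.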
4. The personnel trajectory tracking method based on network topology and multi-modal features according to claim 1, wherein in step S3 the path prediction model based on a spatio-temporal graph neural network comprises: inputting the surveillance network topology graph, the node attributes and the spatio-temporal transition weights of the edges into the spatio-temporal graph neural network; capturing the spatial dependencies between nodes through graph convolution layers; capturing the temporal evolution of pedestrian flow patterns through temporal convolution layers or recurrent units; taking the target's previous positions as input and outputting the probability distribution of the target appearing at each node in a future period; and selecting the K nodes with the highest probability as the candidate node set.
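The final selection step of this claim (taking the K most probable nodes) can be sketched without a trained network. The code below is a stand-in, not a spatio-temporal graph neural network: it replaces the learned model with plain multi-step propagation of a probability distribution along the weighted edges, then picks the top K. The function names and the `steps` parameter are assumptions.

```python
import heapq


def propagate(dist, transitions):
    """Push a probability distribution one hop along the weighted directed edges."""
    nxt = {}
    for src, p in dist.items():
        for dst, w in transitions.get(src, {}).items():
            nxt[dst] = nxt.get(dst, 0.0) + p * w
    return nxt


def top_k_candidates(source, transitions, k=2, steps=1):
    """Return the k most probable nodes after `steps` hops from `source`.

    transitions: dict mapping a node to {successor: transition probability}.
    Probability mass reaching a node with no outgoing edges is dropped on
    the next hop, which is acceptable for this illustration.
    """
    dist = {source: 1.0}
    for _ in range(steps):
        dist = propagate(dist, transitions)
    return heapq.nlargest(k, dist.items(), key=lambda kv: kv[1])
```

In a real deployment the distribution would come from the trained model's output layer; only the `heapq.nlargest` selection would carry over unchanged.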
5. The personnel trajectory tracking method based on network topology and multi-modal features according to claim 1, further comprising a multi-target collaborative tracking mechanism: when the system tracks a plurality of targets simultaneously, maintaining an independent feature template and trajectory hypothesis for each target; and when a plurality of pedestrians appear at the candidate nodes, resolving target crossing and identity confusion with a graph matching algorithm or the Hungarian algorithm, so as to achieve stable association of the trajectories of the plurality of targets.
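The association problem in this claim is a one-to-one assignment between tracked targets and detected pedestrians. The sketch below solves it by brute-force enumeration rather than the Hungarian algorithm named in the claim; it reaches the same optimum for small inputs but scales factorially, so it is for illustration only. The `min_score` threshold, which lets a target remain unmatched, is an assumption added here.

```python
from itertools import permutations


def best_assignment(score, min_score=0.0):
    """Globally optimal target-to-detection assignment by exhaustive search.

    score[t][d] is the matching score of target t against detection d.
    A target may stay unmatched (None), which contributes `min_score`
    to the total instead of a real matching score.
    Returns (assignment dict target -> detection or None, total score).
    """
    n_targets = len(score)
    n_dets = len(score[0]) if score else 0
    # One "unmatched" slot per target so any subset can be left unassigned.
    slots = list(range(n_dets)) + [None] * n_targets
    best, best_total = {}, float("-inf")
    for perm in permutations(slots, n_targets):
        total = sum(
            score[t][d] if d is not None else min_score
            for t, d in enumerate(perm)
        )
        if total > best_total:
            best_total = total
            best = dict(enumerate(perm))
    return best, best_total
```

With scores `[[0.9, 0.8], [0.85, 0.2]]`, a greedy matcher would give target 0 its best detection and leave target 1 with a poor match, whereas the global optimum swaps them; this is the identity-confusion case the claim addresses.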
6. The personnel trajectory tracking method based on network topology and multi-modal features according to claim 1, further comprising heterogeneous sensor data fusion: fusing data from Wi-Fi probes, Bluetooth beacons, access control systems and infrared sensors, abstracting these heterogeneous data sources as virtual nodes or attributes that enhance or correct the video-based topology network, and providing multi-source evidence for trajectory prediction and positioning.
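One way to treat sensor readings as node attributes is to reweight the candidate-node probabilities by per-node evidence and renormalise. The text does not specify how the fusion is performed, so the Bayesian-style update below, the function name and the use of likelihood ratios are all assumptions made for illustration.

```python
def apply_sensor_evidence(prior, likelihood_ratio):
    """Reweight candidate-node probabilities with non-video sensor evidence.

    prior: dict node -> probability from the video-based topology model.
    likelihood_ratio: dict node -> factor; values above 1 support the node
    (for example a badge swipe at a nearby door), values below 1 contradict
    it, and nodes without sensor coverage default to a neutral 1.0.
    """
    unnorm = {n: p * likelihood_ratio.get(n, 1.0) for n, p in prior.items()}
    total = sum(unnorm.values())
    if total == 0:
        # Evidence ruled out every candidate: keep the prior unchanged.
        return dict(prior)
    return {n: v / total for n, v in unnorm.items()}
```

For instance, if nodes B and C are equally likely from video alone and an access-control event near B carries a ratio of 3.0, the updated distribution becomes 0.75 for B and 0.25 for C.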
7. A personnel trajectory tracking system based on network topology and multi-modal features, for implementing the personnel trajectory tracking method of any one of claims 1-6, comprising:
a data acquisition and access module, for interfacing with surveillance cameras using various protocols and, optionally, with heterogeneous sensors such as Wi-Fi probes and access control systems, so as to acquire and synchronize multi-source data in real time;
a network topology construction and management module, for establishing and dynamically maintaining the directed weighted topology graph of the camera network, calculating and updating the spatio-temporal transition probability weights between nodes, and supporting incremental learning and updating of the topology;
a multi-modal feature extraction and fusion module, integrating the dual-stream fusion neural network and the cross-modal attention mechanism, for extracting and fusing the facial, clothing and gait features of persons from video streams in real time;
a spatio-temporal path prediction module, for predicting, based on the spatio-temporal graph neural network model, candidate areas where the target may appear in the future according to the current target position and the topology weights;
a dynamic fusion decision module, comprising a reinforcement learning agent, for dynamically adjusting the multi-source information fusion weights according to the scene context, calculating the matching scores of the candidate nodes and making positioning decisions;
a multi-target tracking and identity management module, for managing the feature templates and trajectories of a plurality of targets and resolving target crossing and identity confusion;
an abnormality warning and visualization module, for triggering an alarm when an abnormal trajectory or intrusion into an unauthorized area is detected, and for providing a visual interface for trajectory playback, heat map analysis and explanatory display of the decision process.
8. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the personnel trajectory tracking method based on network topology and multi-modal features according to any one of claims 1-6.

Description

Personnel trajectory tracking method and system based on network topology and multi-modal features

Technical Field

The invention relates to the technical field of computer vision, and in particular to a personnel trajectory tracking method and system based on network topology and multi-modal features.

Background

In scenarios such as smart cities, public safety and large campus management, continuous and accurate trajectory tracking of specific persons is of significant value. Current mainstream methods rely mainly on pedestrian re-identification (Re-ID), which extracts visual features of pedestrians (such as appearance, clothing and face) to match and associate identities across camera views. However, the prior art faces the following major challenges in actual deployment.

Failure of visual features: in complex real-world scenes, the target person may face away from or stand sideways to the camera (so that no usable face can be captured), change outerwear, carry occluding objects (such as umbrellas or packages), or be in low light or sharply changing illumination. The reliability of a single visual feature, or of a small number of them, then drops sharply and the matching failure rate is high.

Computational redundancy and low search efficiency: after a target is lost, conventional cross-camera tracking generally requires an exhaustive search and feature comparison across the coverage of all networked cameras. This entails a heavy computational load and high response latency, and interference from irrelevant areas readily produces false alarms.

Lack of physical-space constraints: most existing methods depend only on visual similarity and ignore the physical-space constraints and movement patterns implicit in the surveillance camera network. The movement of people in physical space is limited by the layout of corridors, entrances and exits, stairs and the like, and their trajectories are inherently correlated and predictable in time and space.

Identity confusion in multi-target tracking: in crowded areas, when several targets appear simultaneously or their paths cross, relying on visual features alone easily leads to identity (ID) switches or confusion, causing broken trajectories or false associations.

Therefore, a personnel trajectory tracking scheme that integrates prior knowledge of the physical space, robust multi-dimensional features and an intelligent decision mechanism is needed to improve positioning accuracy, tracking continuity and overall system efficiency in complex scenes.

Disclosure of Invention

The invention aims to overcome the above defects and provides a personnel trajectory tracking method and system based on network topology and multi-modal features.

To achieve the above purpose, the invention adopts the following technical scheme:

A personnel trajectory tracking method based on network topology and multi-modal features comprises the following steps:
Step S1, constructing a surveillance network topology graph, defining cameras as nodes and physical paths between cameras as edges, and calculating spatio-temporal transition probability weights between nodes according to historical pedestrian flow data and real-time pedestrian flow dynamics;
Step S2, acquiring an initial video image sequence of a target person, and extracting a multi-modal feature vector of the target with a deep learning model, wherein the multi-modal features comprise at least facial features, clothing texture features and gait motion features;
Step S3, determining the current source node camera of the target person, and predicting the set of candidate nodes where the target person may appear within the next preset time window according to the spatio-temporal transition probability weights and a path prediction model based on a spatio-temporal graph neural network;
Step S4, detecting pedestrian objects in real time in the video streams corresponding to the candidate node set, extracting multi-modal feature vectors of the pedestrian objects, and calculating, for each modality, their visual similarity to the multi-modal feature vector of the target person;
Step S5, constructing a fusion scoring function, dynamically weighting and fusing the spatio-temporal transition probability weight and the visual similarity of each modality, and calculating the final matching score of each candidate node;
Step S6, determining the next physical position of the target person according to the final matching score, updating the target's trajectory, and triggering an abnormal behavior warning when the trajectory deviates from a conventional pattern or enters an unauthorized area.

The fusion scoring function in step S5 is calculated as:
$$Score_i = \alpha \cdot P_{trans}(i) + \beta \cdot Sim_{face}(i) + \gamma \cdot Sim_{cloth}(i) + \delta \cdot Sim_{gait}(i)$$
where $Score_i$ represents the final matching score of candidate node i, $P_{trans}(i)$ represents the spatio-temporal transition probability from the source node to node i, and $Sim_{face}(i)$, $Sim_{cloth}(i)$ and $Sim_{gait}(i)$ respectively represent the facial, clothing and gait feature si