CN-115546260-B - Target identification tracking method, device, electronic equipment and storage medium
Abstract
The invention provides a target recognition and tracking method, a device, electronic equipment, and a storage medium. The method comprises: constructing a first neural network model for target recognition training based on a RetinaNet network, and constructing a second neural network model for target tracking training based on the DeepSORT algorithm; obtaining a video stream of a preset monitoring area; inputting the picture frames of the preprocessed video stream into the trained first neural network model to obtain a target detection result for each target in the frame; and inputting the target detection result of at least one target into the trained second neural network model for position prediction to obtain the track information corresponding to the target. The invention can accurately identify targets and meet the requirements of target track tracking.
Inventors
- Huang Pijian
- Zhang Ting
- Huang Tao
- Nie Dagan
Assignees
- 中国船舶集团有限公司第七一一研究所 (No. 711 Research Institute of China State Shipbuilding Corporation Limited)
Dates
- Publication Date
- 20260512
- Application Date
- 20220921
Claims (12)
- 1. A target recognition and tracking method, the method comprising: a model construction step, comprising constructing a first neural network model for target recognition training based on a RetinaNet network, wherein the first neural network model retains the feature extraction layer and the bounding-box regression prediction layer of the RetinaNet network and freezes the unused target classification prediction layer of the RetinaNet network; defining, through tf.train.Checkpoint, a temporary target-box prediction checkpoint that retains the tower base layer and the target bounding-box prediction layer; defining a temporary model checkpoint that retains the feature extraction layer and designates the target bounding-box prediction layer as the temporary target-box prediction checkpoint; specifying the path of the temporary model checkpoint through the restore function of the pre-trained model's checkpoint, so as to selectively restore the weights of the feature extraction layer and the bounding-box regression prediction layer; and constructing a second neural network model for target tracking training based on the DeepSORT algorithm, wherein the second neural network model is obtained by training the DeepSORT tracker through the cosine_metric_training repository and exporting a frozen pb model with the TensorFlow framework; and a target detection step, comprising obtaining a video stream of a preset monitoring area, inputting the picture frames of the preprocessed video stream into the trained first neural network model to obtain a target detection result for each target in the frame, and inputting the target detection result of at least one target into the trained second neural network model for position prediction to obtain track information corresponding to the target.
- 2. The target recognition and tracking method of claim 1, wherein the step of constructing a first neural network model for target recognition training based on the RetinaNet network comprises: using RetinaNet-based transfer learning to construct a video processing module and a detection algorithm module, so as to build a training environment for the first neural network model; the training environment comprises extracting picture features using the ResNet and FPN networks within RetinaNet, searching candidate boxes using anchor boxes, calling the Focal loss function in the class subnet to predict categories, and predicting box coordinates and sizes using the box subnet.
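The class subnet mentioned in claim 2 predicts categories with the Focal loss. As a minimal pure-Python sketch, assuming the standard α-balanced focal loss from the RetinaNet literature (the alpha and gamma defaults below are illustrative, not values specified by the patent):

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for one binary prediction.

    p     -- predicted probability of the positive class (0 < p < 1)
    y     -- ground-truth label, 1 for positive, 0 for negative
    alpha -- class-balance weight (illustrative default)
    gamma -- focusing parameter (illustrative default)
    """
    p_t = p if y == 1 else 1.0 - p           # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma down-weights easy, well-classified examples,
    # which is what lets the detector focus on hard examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

Note how a confidently correct prediction (p close to 1 for a positive label) contributes almost nothing to the loss, while a hard example keeps a substantial gradient.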
- 3. The target recognition and tracking method of claim 2, wherein the step of performing transfer learning using the RetinaNet network comprises: modifying parameters of the configuration file of a preset model to adapt to the training requirements of the RetinaNet network, wherein the parameters comprise one or more of the number of training target classes, the resized training picture size, the classification model to be changed, the training data path, the evaluation data path, and the label index path; setting the training mode of the first neural network model and configuring training parameters to perform model training, wherein the training parameters comprise one or more of the number of samples per training step, the number of training batches, the learning rate, and the optimization method; and setting tracking parameters during training so as to select the optimal model as the first neural network model.
- 4. The target recognition and tracking method of claim 1, wherein the step of constructing a second neural network model for target tracking training based on the DeepSORT algorithm further comprises: taking the model exported from the training results as the second neural network model.
- 5. The method according to claim 1, wherein the steps of acquiring a video stream of a preset monitoring area and inputting the picture frames of the preprocessed video stream into the trained first neural network model to obtain a target detection result for each target in the frame comprise: accessing camera equipment to obtain the video stream of the preset monitoring area, wherein the parameters of the video stream comprise one or more of the number of frames transmitted per second, the width and height of each frame, and the pixels; initializing the tracker and setting its related parameters, calculating the cosine distance metric, setting the maximum cosine distance between targets in two frames in the tracker, and controlling the calculation of adjacent target features; preprocessing the video stream into image pictures frame by frame, resizing each frame to the size processed by the first neural network model, and adjusting parameters according to different scenes; and inputting the frame pictures into the first neural network model to obtain the target detection results of the pictures.
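Claim 5's "maximum cosine distance between targets in two frames" refers to the appearance-metric gate used by DeepSORT-style trackers. A minimal sketch of that metric in plain Python, assuming the usual definition 1 − cos(a, b) and an illustrative threshold (the patent does not specify a value):

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two appearance feature vectors: 1 - cos(a, b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def within_gate(distance, max_cosine_distance=0.2):
    """Accept an association only if the appearance distance does not exceed
    the configured maximum cosine distance (0.2 is a hypothetical default)."""
    return distance <= max_cosine_distance
```

Two identical feature vectors yield distance 0 and pass the gate; orthogonal features yield distance 1 and are rejected.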
- 6. The method according to claim 1, wherein the step of inputting the target detection result of the at least one target into the trained second neural network model for position prediction to obtain the track information corresponding to the target comprises: creating a corresponding track detection frame according to the target detection result; performing secondary classification on the target detection result using the second neural network model, and converting the target features, target frame coordinates, target frame categories, and confidence extracted from the target detection result into the data format input to the DeepSORT tracker; and performing position prediction on the created track detection frame according to the tracks detected in previous frames by the DeepSORT tracker.
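The conversion step in claim 6 can be sketched as a small adapter function. This is an assumed layout, not the patent's exact format: detector outputs are taken to carry corner-coordinate boxes, while a DeepSORT-style tracker conventionally consumes (top-left x, top-left y, width, height) tuples alongside confidence, class, and the appearance feature:

```python
def to_tracker_input(detections):
    """Convert raw detector outputs into (bbox, confidence, class, feature)
    records for a DeepSORT-style tracker.

    `detections` is assumed to be a list of dicts whose 'box' entry holds
    (x1, y1, x2, y2) corner coordinates; the tracker is assumed to expect
    (top-left x, top-left y, width, height).
    """
    records = []
    for d in detections:
        x1, y1, x2, y2 = d["box"]
        tlwh = (x1, y1, x2 - x1, y2 - y1)   # corner form -> top-left/width/height
        records.append((tlwh, d["confidence"], d["class"], d["feature"]))
    return records
```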
- 7. The method of claim 6, wherein the step of performing position prediction on the track detection frame based on the tracks detected in previous frames by the DeepSORT tracker comprises: predicting the position of the track detection frame at time t based on the position of the created track detection frame at time t-1; and updating, based on the position detected at time t, the positions of the other track detection frames associated with the track detection frame to obtain the track information corresponding to the target.
- 8. The target recognition and tracking method of claim 7, wherein predicting the position of the track detection frame at time t based on the position of the created track detection frame at time t-1 comprises: determining the position prediction formula and the covariance formula of the track detection frame; the position prediction formula of the track detection frame is: x_t = F x_{t-1}; wherein x_{t-1} represents the mean of the target position information corresponding to the track detection frame at time t-1, consisting of the center coordinates of the target bounding box, the aspect ratio r, the height h, and the corresponding velocity change value of each track detection frame; x_t represents the mean of the target position information corresponding to the track detection frame at time t; and F represents the state transition matrix; the covariance formula of the position prediction of the track detection frame is: P_t = F P_{t-1} F^T + Q; wherein P_{t-1} represents the covariance matrix of the track detection frame at time t-1, Q represents the noise matrix of the system, and P_t represents the covariance matrix of the track detection frame at time t.
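The prediction step of claim 8 can be sketched in dependency-free Python. The full DeepSORT state has eight components (center, aspect ratio, height, and their velocities); the matrices here are generic, so the tiny two-state constant-velocity demo in the comments is only an illustration:

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def kalman_predict(x, P, F, Q):
    """Prediction step of claim 8:
        x_t = F x_{t-1}
        P_t = F P_{t-1} F^T + Q
    x is a column vector (n x 1 list of lists); P, F, Q are n x n.
    """
    x_pred = matmul(F, x)
    P_pred = mat_add(matmul(matmul(F, P), transpose(F)), Q)
    return x_pred, P_pred
```

For a constant-velocity state [position, velocity] with F = [[1, 1], [0, 1]], a state [0, 1] is predicted to move to [1, 1], and the covariance widens accordingly.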
- 9. The method according to claim 8, wherein the step of correcting the positions of the other associated track detection frames based on the position detected at time t to obtain the track information corresponding to the target comprises: calculating the error value between the target detection result and the mean of the track detection frame, wherein the calculation formula is: y = z - H x; wherein z represents the mean vector of the track detection frame, H represents the measurement matrix, and y represents the error value; updating the mean vector x and the covariance matrix P of the track detection frame, wherein the update formulas are: S = H P H^T + R; K = P H^T S^{-1}; x' = x + K y; P' = (I - K H) P; wherein S represents an intermediate variable, K represents the Kalman gain, R represents the noise matrix of the DeepSORT tracker, and I represents the identity matrix; performing, based on the updated track detection frame, cascade matching between the current track detection frame and the track detection frames associated with it; and outputting the coordinate frame position points of the target as the track information of the target according to the cascade matching result.
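The update equations of claim 9 can likewise be sketched in plain Python. To keep the sketch free of a general matrix inverse, it assumes a single scalar measurement z, so S is a scalar; this is a simplification of the multi-dimensional measurement used by a real tracker:

```python
def kalman_update(x, P, z, H, R):
    """Update step of claim 9 for one scalar measurement z:
        y  = z - H x            (innovation / error value)
        S  = H P H^T + R        (innovation covariance, scalar here)
        K  = P H^T S^{-1}       (Kalman gain)
        x' = x + K y
        P' = (I - K H) P
    x: n-vector (flat list), P: n x n, H: length-n row, R: scalar noise.
    """
    n = len(x)
    y = z - sum(h * xi for h, xi in zip(H, x))
    PHt = [sum(P[i][j] * H[j] for j in range(n)) for i in range(n)]
    S = sum(H[i] * PHt[i] for i in range(n)) + R
    K = [phi / S for phi in PHt]
    x_new = [xi + k * y for xi, k in zip(x, K)]
    P_new = [[P[i][j] - K[i] * sum(H[m] * P[m][j] for m in range(n))
              for j in range(n)] for i in range(n)]
    return x_new, P_new
```

With x = [0], P = [[1]], H = [1], R = 1 and a measurement z = 2, the gain is 0.5, so the updated state moves halfway toward the measurement and the covariance halves.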
- 10. A target recognition and tracking device, the device comprising: a model construction module, configured to construct a first neural network model for target recognition training based on a RetinaNet network, wherein the first neural network model retains the feature extraction layer and the bounding-box regression prediction layer of the RetinaNet network, freezes the unused target classification prediction layer of the RetinaNet network, defines through tf.train.Checkpoint a temporary target-box prediction checkpoint that retains the tower base layer and the target bounding-box prediction layer, defines a temporary model checkpoint that retains the feature extraction layer and designates the target bounding-box prediction layer as the temporary target-box prediction checkpoint, and specifies the path of the temporary model checkpoint through the restore function of the pre-trained model's checkpoint so as to selectively restore the weights of the feature extraction layer and the bounding-box regression prediction layer; the model construction module is further configured to construct a second neural network model for target tracking training based on the DeepSORT algorithm, which comprises training the DeepSORT tracker through the cosine_metric_training repository and exporting a frozen pb model with the TensorFlow framework; a target detection module, configured to acquire a video stream of a preset monitoring area and input the picture frames of the preprocessed video stream into the trained first neural network model to obtain a target detection result for each target in the frame; and a target prediction module, configured to input the target detection result of at least one target into the trained second neural network model for position prediction to obtain the track information corresponding to the target.
- 11. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the target recognition and tracking method according to any one of claims 1 to 9 when executing the program.
- 12. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the target recognition and tracking method according to any one of claims 1 to 9.
Description
Target identification tracking method, device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of artificial intelligence technologies, and in particular, to a target identification and tracking method, apparatus, electronic device, and storage medium.

Background

In the marine industry, for example, monitoring of particular working environments on a ship is an important component of daily shipboard operations and personnel protection. Identifying the behavior and movement paths of personnel in various compartments, such as the wheelhouse and the engine and boiler rooms, is also very important; monitoring is therefore needed to detect non-staff intruding into sensitive working areas and to record and identify the movement paths of staff. In the field of multi-target tracking, common algorithms rely on a few manually designed features, such as the optical flow method, the particle filter method, and the mean-shift filter method. However, because these algorithms are affected by multiple factors such as scene noise, target motion speed, and frame-rate selection, the target features in traditional multi-target tracking algorithms depend on expert parameter tuning, the uncertainty of the algorithms is too large, targets are difficult to identify accurately, and the requirements of target movement tracking and monitoring cannot be met.

Disclosure of Invention

The invention provides a target identification and tracking method, device, electronic equipment, and storage medium, which are used to solve the problems in the prior art that targets are difficult to identify accurately and the requirements of target movement tracking and monitoring cannot be met.
In a first aspect, the present invention provides a target recognition and tracking method, the method comprising: constructing a first neural network model for target recognition training based on a RetinaNet network and constructing a second neural network model for target tracking training based on the DeepSORT algorithm; acquiring a video stream of a preset monitoring area, and inputting the picture frames of the preprocessed video stream into the trained first neural network model to obtain a target detection result for each target in the frame; and inputting the target detection result of at least one target into the trained second neural network model for position prediction to obtain the track information corresponding to the target. In one embodiment of the present invention, the step of constructing a first neural network model for target recognition training based on the RetinaNet network comprises: using RetinaNet-based transfer learning to construct a video processing module and a detection algorithm module, so as to build a training environment for the first neural network model; the training environment comprises extracting picture features using the ResNet and FPN networks within RetinaNet, searching candidate boxes using anchor boxes, calling the Focal loss function in the class subnet to predict categories, and predicting box coordinates and sizes using the box subnet.
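The detect-then-track flow described above can be sketched as a small driver loop. The callables here (preprocess, detector, tracker) are hypothetical stand-ins for the preprocessing step, the trained RetinaNet model, and the trained DeepSORT model; this is a structural sketch, not the patent's implementation:

```python
def run_pipeline(frames, preprocess, detector, tracker):
    """For each frame of the monitored video stream:
    1. preprocess the frame (e.g. resize to the detector's input size),
    2. run the first model to get per-target detections,
    3. hand the detections to the second model for position prediction.
    Returns the per-frame track information.
    """
    tracks_per_frame = []
    for frame in frames:
        detections = detector(preprocess(frame))  # first neural network model
        tracks = tracker(detections)              # second neural network model
        tracks_per_frame.append(tracks)
    return tracks_per_frame
```

Because the stages are plain callables, each can be swapped or stubbed out independently, which also makes the loop trivially testable.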
In one embodiment of the present invention, the step of performing transfer learning using the RetinaNet network comprises: modifying parameters of the configuration file of a preset model to adapt to the training requirements of the RetinaNet network, wherein the parameters comprise one or more of the number of training target classes, the resized training picture size, the classification model to be changed, the training data path, the evaluation data path, and the label index path; preserving the feature extraction layer and the bounding-box regression prediction layer of the RetinaNet network and freezing the unused target classification prediction layer of the RetinaNet network; setting the training mode of the first neural network model and configuring training parameters to perform model training, wherein the training parameters comprise one or more of the number of samples per training step, the number of training batches, the learning rate, and the optimization method; and setting tracking parameters during training so as to select the optimal model as the first neural network model. In one embodiment of the present invention, the step of constructing a second neural network model for target tracking training based on the DeepSORT algorithm comprises: training the DeepSORT tracker through the cosine_metric_training repository of the DeepSORT algorithm, and exporting the training results using the TensorFlow framework, wherein the exported training result is a frozen pb model; and taking the model exported from the training results as the second neural network model. In an embodiment of the present invention, the step of obtaining a video stream of a preset monitoring area, and inputting a frame of a preprocessed video