CN-121811456-B - Fish tracking method based on improved IoU and ReID networks
Abstract
The invention relates to the field of underwater fish identification, and in particular to a fish tracking method based on improved IoU and ReID networks. The method comprises: performing image enhancement on underwater video; detecting fish objects with YOLOv; constructing the fish body's central axis and computing the weighted transverse overlap of the detection box and the prediction box projected along the normal direction to obtain Fish-IoU, which accurately reflects the fish body shape; designing a quality-gated ReID network that fuses appearance-stream and temporal-stream features, weights them adaptively according to image quality, and outputs a robust identity feature vector; and linearly fusing Fish-IoU with ReID feature similarity into a comprehensive similarity measure for track-detection association. The method markedly improves the association accuracy and identity-preservation capability of fish tracking in complex underwater scenes, and is suitable for underwater ecological monitoring and behavior analysis.
Inventors
- CHEN XIAOYAN
- YAO BOHUAI
- XIONG XIONG
- CHEN DEFANG
- PANG TAO
- HE XIAO
Assignees
- Sichuan Agricultural University (四川农业大学)
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-03-10
Claims (7)
- 1. A fish tracking method based on improved IoU and ReID networks, comprising: S1, acquiring underwater fish video and performing image-enhancement preprocessing: continuously collecting video data, cropping individual fish images and retaining key frames, and preprocessing the cropped video dataset with image-enhancement techniques to obtain a preprocessed video dataset; S2, detecting fish targets in each frame of the video dataset with a YOLOv model and outputting detection boxes with their confidence: the backbone network extracts multi-scale features, a feature pyramid network fuses semantic information across levels, and the detection head performs bounding-box regression and classification to produce fish detection boxes with confidence scores; overlapping boxes are filtered by non-maximum suppression and the highest-scoring detection is kept as the localization of each fish target in each frame; fish targets are then linked frame by frame into sequences of motion paths and behavior changes, which serve as the tracks of the fish targets, all such tracks being taken as candidate tracks; S3, performing motion prediction on the fish in each frame of the video dataset with a Kalman filter to obtain a prediction box; S4, obtaining the fish contour by Canny edge detection and constructing the fish central axis inside the contour; S5, projecting the detection box and the prediction box along the central axis and its normal direction, computing the transverse overlap at each sampling position, and obtaining Fish-IoU with a weighting strategy in which the head is weighted more heavily than the tail; S6, constructing a quality-gated ReID network: extracting the ROI sequence of the latest T frames for each candidate track, extracting features through an appearance stream and a temporal stream respectively, performing quality-score-driven adaptive weighted fusion, and outputting a ReID feature vector; and S7, performing data association by combining Fish-IoU with ReID feature similarity to score track-detection matches for multi-target tracking.
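As an illustration only (not part of the claims), the motion prediction of step S3 can be sketched as a constant-velocity Kalman filter over box parameters. The state layout `[cx, cy, w, h, vx, vy]` and the noise magnitudes below are assumptions made for the sketch, not values taken from the patent:

```python
import numpy as np

class BoxKalman:
    """Minimal constant-velocity Kalman predictor for a detection box
    (illustrative sketch of step S3; state layout and noise are assumed)."""
    def __init__(self, cx, cy, w, h):
        self.x = np.array([cx, cy, w, h, 0.0, 0.0])  # [cx, cy, w, h, vx, vy]
        self.P = np.eye(6) * 10.0                    # state covariance
        self.F = np.eye(6)
        self.F[0, 4] = self.F[1, 5] = 1.0            # position += velocity
        self.Q = np.eye(6) * 0.01                    # process noise
        self.H = np.eye(4, 6)                        # we observe cx, cy, w, h
        self.R = np.eye(4) * 1.0                     # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:4]                            # predicted box

    def update(self, z):
        z = np.asarray(z, dtype=float)
        y = z - self.H @ self.x                      # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)     # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P
```

After two updates showing steady motion, `predict()` extrapolates the box ahead of the last measurement, which is what S5 compares against the current detection.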
- 2. The fish tracking method based on improved IoU and ReID networks of claim 1, wherein in step S1 the cropped video dataset is preprocessed by image-enhancement techniques comprising: S1.1, applying bilateral filtering to the video images in the cropped dataset, i.e. smoothing them with a bilateral filter, which suppresses noise amplified by the enhancement process and water-particle noise while preserving the sharp edge information of the fish; S1.2, converting the smoothed image of S1.1 from the RGB color space to the HSV color space to separate brightness information from color information; S1.3, applying the CLAHE algorithm (contrast-limited adaptive histogram equalization) to the brightness channel in HSV space to adaptively enhance the contrast between the fish trunk, fins and the dim underwater background, highlighting the overall body outline and key edge features; and S1.4, applying saturation and contrast fusion enhancement: converting the image into the LAB color space, adaptively enhancing the saturation carried by the A and B channels, finely adjusting the global contrast of the L channel, then converting back to RGB, jointly improving the visual separability of fish features from the background and aiding head-tail orientation judgment.
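Illustrative only: a NumPy-only stand-in for the S1 luminance step, with global histogram equalization replacing CLAHE and a 3x3 box blur replacing bilateral filtering (real implementations would use `cv2.bilateralFilter` and `cv2.createCLAHE`). Everything here is a simplified assumption meant to show the order of operations, not the patent's pipeline:

```python
import numpy as np

def equalize_luminance(v):
    """Global histogram equalization on a uint8 luminance channel --
    a stand-in for the CLAHE step (real CLAHE adds tiling and a clip limit)."""
    hist = np.bincount(v.ravel(), minlength=256)
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) * 255.0 / max(cdf.max() - cdf.min(), 1.0)
    return cdf[v].astype(np.uint8)

def enhance_frame(rgb):
    """Sketch of S1: smooth, equalize the luminance channel, rescale RGB.
    Bilateral filtering is approximated by a 3x3 box blur here."""
    f = rgb.astype(np.float32)
    pad = np.pad(f, ((1, 1), (1, 1), (0, 0)), mode="edge")
    smooth = sum(pad[i:i + f.shape[0], j:j + f.shape[1]]
                 for i in range(3) for j in range(3)) / 9.0
    v = smooth.max(axis=2).astype(np.uint8)         # HSV-style "V" channel
    v_eq = equalize_luminance(v)
    gain = (v_eq.astype(np.float32) + 1) / (v.astype(np.float32) + 1)
    return np.clip(smooth * gain[..., None], 0, 255).astype(np.uint8)
```

On low-contrast underwater frames this stretches the luminance range, which is the effect S1.3 relies on to expose the fish outline.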
- 3. The fish tracking method based on improved IoU and ReID networks of claim 1, wherein the YOLOv model in step S2 is a model pretrained to detect underwater fish and output results, the outputs being the position of each detection box in the image and its detection confidence.
- 4. The fish tracking method based on improved IoU and ReID networks of claim 1, wherein step S4 comprises: S4.1, processing the region inside each detection box output by the YOLOv model with the Canny edge-detection algorithm and outputting a binarized contour image in which pixels belonging to the fish contour are marked as foreground and the rest as background, thereby obtaining an initial fish contour point set C and outputting the corresponding skeleton confidence; a detection-box threshold and a skeleton-confidence threshold are introduced, and when the detection-box confidence is below the detection-box threshold or the skeleton confidence is below the skeleton-confidence threshold, the detection or skeleton extraction of the current frame is judged unreliable and the conventional generalized intersection over union (GIoU) is used directly to measure the detection instead; S4.2, performing a Euclidean distance transform on the interior region of the binarized contour image of S4.1, i.e. the region enclosed by the foreground contour: for each pixel p in the interior, computing its distance to the nearest contour boundary point q and recording it as the distance value D(p), generating a distance map in which the gray value of each pixel represents its distance to the nearest contour boundary, with the expression D(p) = min_{q ∈ C} ||p − q||; S4.3, within the interior region of the distance map, taking each point of the point set in turn as a candidate and identifying and extracting all pixels whose distance value D(p) satisfies the local-maximum condition as the candidate point set of the fish central axis; S4.4, sorting and connecting the extracted discrete candidate point set, i.e. connecting the discrete points in order via a shortest-path algorithm into an initial continuous polyline running from the head to the tail of the fish, which serves as the initial central axis; and S4.5, fitting the ordered points of the initial central axis by cubic-spline interpolation to generate a final smooth curve, represented as a two-dimensional coordinate sequence defined by a continuous parameter t, with the expression L(t) = (x(t), y(t)), t ∈ [0, 1], where t = 0 and t = 1 correspond to the tail and head endpoints of the fish contour respectively and x(t), y(t) are the horizontal and vertical coordinates in two-dimensional space; the smooth curve is the morphology reference axis required for the subsequent Fish-IoU calculation and is also referred to as the central axis L.
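Illustrative only: a brute-force sketch of S4.2-S4.3 (distance transform of the mask interior, then ridge pixels as central-axis candidates). Real code would use `cv2.distanceTransform` or `scipy.ndimage`; the head-to-tail ordering by x-coordinate below is a simplifying assumption in place of the claim's shortest-path ordering and spline fit:

```python
import numpy as np

def medial_axis_points(mask):
    """Euclidean distance to the contour for each interior pixel, then
    local-maximum (ridge) pixels as central-axis candidates.
    Brute force -- fine for small boolean masks."""
    inside = np.argwhere(mask)
    h, w = mask.shape
    # contour = foreground pixels with a background (or out-of-bounds) 4-neighbor
    contour = [tuple(p) for p in inside
               if any(not (0 <= p[0] + dy < h and 0 <= p[1] + dx < w)
                      or not mask[p[0] + dy, p[1] + dx]
                      for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)))]
    cont = np.array(contour, dtype=float)
    dist = np.zeros_like(mask, dtype=float)
    for y, x in inside:
        dist[y, x] = np.sqrt(((cont - (y, x)) ** 2).sum(axis=1)).min()
    # keep pixels that are maxima of the distance map in their 3x3 neighborhood
    axis = [(y, x) for y, x in inside
            if dist[y, x] >= dist[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2].max()]
    return dist, sorted(axis, key=lambda p: p[1])   # ordered along x
```

For an elongated horizontal blob the surviving ridge pixels trace its middle row, which is the raw material the claim's spline fit turns into the smooth axis L(t).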
- 5. The fish tracking method based on improved IoU and ReID networks of claim 4, wherein step S5 comprises: S5.1, uniformly sampling the constructed central axis L: with the number of sampling points set to N, sampling longitudinally along the central axis at the positions t_i = i / (N − 1), i = 0, 1, …, N − 1; taking each longitudinal position t_i as a sampling point s_i and, at each t_i, taking M transverse sampling positions along the normal direction of the central axis to form a two-dimensional sampling grid whose grid points are denoted g_{i,n}, where n is the transverse offset index; S5.2, computing the unit tangent vector and unit normal vector of the central axis at each sampling point: the tangent is normalized as T_i = L'(t_i) / ||L'(t_i)||, where L'(t_i) is the derivative of L at point i; rotating T_i by 90 degrees gives the normal vector n_i = (−T_{i,y}, T_{i,x}), where T_{i,x} and T_{i,y} are the components of the tangent vector in the x and y directions; S5.3, for the detection box B_d and the prediction box B_p, extracting the four vertex coordinates of each in the coordinate system formed by the central axis and its normal; for each sampling point s_i and its corresponding normal vector n_i, projecting the vertices of the detection and prediction boxes onto the one-dimensional line through s_i in the direction n_i; the projection scalar of each of the four vertices of each box is computed as p(V) = (v_x − x_i)·n_{i,x} + (v_y − y_i)·n_{i,y}, where p(V) is the projection scalar, V is any vertex of the detection or prediction box, v_x and v_y are the horizontal and vertical coordinates of V, x_i and y_i are the coordinates of the sampling point s_i, and n_{i,x} and n_{i,y} are the components of the normal vector along the x and y axes of the image coordinate system; the minima and maxima of the projection scalars of the four vertices form one-dimensional projection intervals in the normal direction, [d_min, d_max] for the detection box and [p_min, p_max] for the prediction box; and S5.4, computing at each sampling point the transverse overlap length of the detection and prediction boxes and their projection widths in the normal direction, while assigning different weights along the central axis of the fish body to preserve directional accuracy, and computing Fish-IoU from the weighted intersection and weighted union, with the expressions: O_i = max(0, min(d_max, p_max) − max(d_min, p_min)); W_i^det = d_max − d_min; W_i^pred = p_max − p_min; Fish-IoU = Σ_i w_i·O_i / Σ_i w_i·(W_i^det + W_i^pred − O_i); where O_i is the transverse overlap length of the detection and prediction boxes at sampling point i, W_i^det and W_i^pred are their respective projection widths in the normal direction, and w_i is the weight of sampling point s_i.
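Illustrative only: a compact sketch of the S5 computation, projecting box corners onto the normal at each axis sample and accumulating weighted overlap over weighted union. The linear tail-to-head weight ramp and the finite-difference tangents are simplifying assumptions of this sketch:

```python
import numpy as np

def fish_iou(axis_pts, det, pred, head_w=2.0, tail_w=1.0):
    """Weighted transverse-overlap IoU along a fish central axis.
    axis_pts: (N, 2) points ordered tail -> head; boxes are (x1, y1, x2, y2).
    Weights ramp linearly from tail_w to head_w (head > tail, per S5.4)."""
    axis_pts = np.asarray(axis_pts, dtype=float)
    n_pts = len(axis_pts)
    tang = np.gradient(axis_pts, axis=0)             # finite-difference tangents
    tang /= np.linalg.norm(tang, axis=1, keepdims=True)
    normals = np.stack([-tang[:, 1], tang[:, 0]], axis=1)  # 90-degree rotation
    weights = np.linspace(tail_w, head_w, n_pts)

    def corners(b):
        x1, y1, x2, y2 = b
        return np.array([[x1, y1], [x2, y1], [x2, y2], [x1, y2]], float)

    num = den = 0.0
    for s, n, w in zip(axis_pts, normals, weights):
        pd = (corners(det) - s) @ n                  # projection scalars (det)
        pp = (corners(pred) - s) @ n                 # projection scalars (pred)
        inter = max(0.0, min(pd.max(), pp.max()) - max(pd.min(), pp.min()))
        union = (pd.max() - pd.min()) + (pp.max() - pp.min()) - inter
        num += w * inter
        den += w * union
    return num / den if den > 0 else 0.0
```

Identical boxes score 1.0 and boxes disjoint along the normal score 0.0, matching the behavior expected of the weighted intersection-over-union formula.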
- 6. The fish tracking method based on improved IoU and ReID networks of claim 5, wherein step S6 comprises: S6.1, data preparation: selecting from the candidate tracks the targets whose detection-box confidence exceeds the detection-box threshold of step S4 (the targets being the fish inside the detection boxes), extracting the detection-box regions of the latest T frames from each fish's history, cropping each region to a fixed size to form an ROI image sequence, and organizing the sequence into dimensions (B, T, C, H, W): batch size, number of processed frames, channels, image height and width; S6.2, appearance-stream feature extraction: the ROI sequence is fed into the appearance-stream branch, which first encodes each frame independently with a weight-shared 2D convolutional encoder and outputs spatial feature maps, then compresses each feature map into a global feature vector by global average pooling, and finally applies layer normalization to obtain the appearance representation of each frame; after processing T frames this produces the appearance feature tensor F_app, where the superscript dimension D is the feature dimension; S6.3, temporal-stream feature extraction: the appearance-stream output F_app is fed into the temporal-stream branch, where a temporal modeling network performs context modeling on the input appearance feature sequence along the time dimension T, capturing the short-term dynamics and dependencies of the target's appearance, and outputs the temporally enhanced feature tensor F_temp after processing is complete; S6.4, quality gating and adaptive fusion: a lightweight multi-layer perceptron predicts a scalar quality score q_t for each frame in the sequence from the appearance features of S6.2, the fusion weights of the appearance and temporal streams are computed by a weighting function σ, and the two feature streams are fused by weighting to produce quality-adaptive fused features, with the expressions: α_t = σ(a·q_t + b); F_t = α_t·F_app,t + (1 − α_t)·F_temp,t; where α_t and (1 − α_t) are the fusion weights of the appearance stream and the temporal stream respectively, and a and b are weight parameters regulating the quality score; and S6.5, aggregating the fused temporal features along the time dimension: the (B, T, D) sequence features are aggregated into (B, D) sequence-level features by taking the temporal average, transformed and regularized by an output layer consisting of a linear projection layer and a Dropout layer, and L2-normalized to yield the final ReID feature vector of each track.
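Illustrative only: the S6.4-S6.5 fusion and pooling for a single track, sketched in NumPy. The sigmoid form of the gate is an assumed choice of weighting function, with `a` and `b` playing the role of the claim's weight parameters:

```python
import numpy as np

def quality_gated_fusion(app_feats, temp_feats, quality, a=1.0, b=0.0):
    """Per-frame quality scores gate the mix of appearance-stream and
    temporal-stream features; the fused sequence is then mean-pooled over
    time and L2-normalized into one ReID vector.
    app_feats, temp_feats: (T, D) arrays; quality: length-T scores."""
    alpha = 1.0 / (1.0 + np.exp(-(a * np.asarray(quality, dtype=float) + b)))
    fused = alpha[:, None] * app_feats + (1.0 - alpha)[:, None] * temp_feats
    vec = fused.mean(axis=0)                        # temporal average pooling
    return vec / max(np.linalg.norm(vec), 1e-12)    # L2 normalization
```

High quality scores push the gate toward the appearance stream; low scores lean on the temporally smoothed features, which is the robustness mechanism the claim describes.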
- 7. The fish tracking method based on improved IoU and ReID networks of claim 6, wherein step S7 comprises: S7.1, for any track τ, obtaining the current-frame prediction box predicted by the Kalman filter and the corresponding detection box, and computing the corresponding Fish-IoU value (or GIoU value) as the shape matching degree S_shape; S7.2, extracting the latest Y feature vectors from the ReID feature queue that the track maintains over its historical video frames to form a feature-vector set, computing the cosine similarity between the detection's feature vector and every vector in the set, and taking the maximum as the appearance matching degree, expressed as S_app = max_{k = 1, …, Y} cos(f_det, f_{τ,k}), where f_det is the feature vector corresponding to the detection box and f_{τ,k} is the track's feature vector from the k-th most recent historical frame; S7.3, constructing the binary matching measure: the geometry-based shape matching degree S_shape and the identity-consistency-based appearance matching degree S_app are fused by weighting into a comprehensive similarity score, with the expression S = λ·S_shape + (1 − λ)·S_app, where λ is a weight parameter, dynamically adjusted according to the track state, that balances the IoU similarity against the ReID similarity; a similarity threshold is introduced and the comprehensive similarity score S of each track is checked: if S exceeds the similarity threshold, the detection is added to the tracking sequence; and S7.4, repeating steps S7.1-S7.3 for each candidate track to obtain all tracking sequences and track the fish.
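Illustrative only: the S7 scoring for one track-detection pair. The fixed `lam` stands in for the claim's dynamically adjusted weight λ:

```python
import numpy as np

def association_score(shape_sim, det_feat, track_feats, lam=0.5):
    """Appearance matching degree = max cosine similarity between the
    detection's ReID vector and the track's last Y stored vectors; the
    final score linearly fuses it with the shape term (Fish-IoU or GIoU)."""
    det = det_feat / max(np.linalg.norm(det_feat), 1e-12)
    best = max(
        float(det @ (f / max(np.linalg.norm(f), 1e-12)))
        for f in track_feats
    )
    return lam * shape_sim + (1.0 - lam) * best
```

A match is accepted when this score exceeds the similarity threshold; taking the maximum over the queue makes the appearance term tolerant of a few low-quality historical frames.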
Description
Fish tracking method based on improved IoU and ReID networks
Technical Field
The invention relates to the field of underwater fish identification, and in particular to a fish tracking method based on improved IoU and ReID networks.
Background
In recent years, computer technology has driven the rapid modernization of agriculture, and computer-vision applications have become an important research subject; multi-target tracking of underwater fish, in particular, is a key link in refined ecological monitoring and behavior analysis. Existing methods generally rely on rectangle IoU or mask IoU as the spatial similarity measure between a track and a detection, combined with single-frame appearance ReID features for identity association. However, fish are elongated, non-rigid objects that often bend sharply while swimming, so their bounding boxes contain large areas of invalid background or a dangling tail, making conventional IoU computation extremely unstable. In particular, when a fish bends, turns or is partially occluded, a plain IoU measure easily mistakes different spatially adjacent individuals for the same track, causing tracking drift. In addition, the blur, noise and uneven illumination inherent to underwater images make the quality of single-frame appearance features fluctuate severely, and the color and texture of individuals of the same species are highly similar, so identity is easily confused when relying on single-frame ReID alone.
Existing methods also generally lack effective modeling of continuous temporal information and cannot explicitly evaluate and fuse the image quality of each frame, so tracking association is insufficiently robust in complex underwater environments, identity switches and track fragmentation occur frequently, and the accuracy of subsequent quantitative analysis is seriously affected.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and the pain points of its application scenarios, and provides a fish tracking method based on improved IoU and ReID networks that improves the association accuracy and identity-preservation capability of fish tracking in complex underwater scenes. To this end, the invention adopts the following technical scheme: a fish tracking method based on improved IoU and ReID networks, comprising the following steps: S1, acquiring underwater fish video and performing image-enhancement preprocessing: continuously collecting video data, cropping individual fish images and retaining key frames, and preprocessing the cropped video dataset with image-enhancement techniques to obtain a preprocessed video dataset; S2, detecting fish targets in each frame with a YOLOv model and outputting detection boxes with their confidence: the backbone network extracts multi-scale features, a feature pyramid network fuses semantic information across levels, and the detection head performs bounding-box regression and classification to produce fish detection boxes with confidence scores; overlapping boxes are filtered by non-maximum suppression and the highest-scoring detection is kept as the localization of each fish target in each frame; fish targets are linked frame by frame into sequences of motion paths and behavior changes, which serve as the tracks of the fish targets, all such tracks being taken as candidate tracks; S3, performing motion prediction on the fish in each frame of the video dataset with a Kalman filter to obtain a prediction box; S4, obtaining the fish contour by Canny edge detection and constructing the fish central axis inside the contour; S5, projecting the detection box and the prediction box along the central axis and its normal direction, computing the transverse overlap at each sampling position, and obtaining Fish-IoU with a weighting strategy in which the head is weighted more heavily than the tail; S6, constructing a quality-gated ReID network, extracting the ROI sequence of the latest T frames for each candidate track, extracting features through an appearance stream and a temporal stream respectively, performing quality-score-driven adaptive weighted fusion, and outputting ReID feature vectors; and S7, performing data association by combining Fish-IoU and ReID feature similarity to score track-detection matches for multi-target tracking. Further, the step S1 performs preprocessin