CN-121982688-A - Automatic fish fry identification, tracking and counting method based on machine vision

CN121982688A

Abstract

The invention relates to the field of fry counting, and in particular to a machine-vision-based method for automatic fry identification, tracking and counting. An image enhancement module, combined with the hardware setup, is introduced before detection to highlight the overall characteristics of the fry. In the detection and identification stage, a CGAF module is introduced into the YOLOv network model to adaptively fuse multi-source features through a multi-attention mechanism, and the original detection head is replaced with a DYNAMICHEAD dynamic detection head with a self-attention mechanism, improving detection and counting precision and markedly reducing ambiguity in tracking and matching. In the tracking stage, the DeepSORT algorithm uses historical tracks to perform temporal verification of the current frame's detections, and the corrected information is fed back to the YOLOv network model in real time for training, finally forming a closed loop in which detection guides tracking and tracking supervises detection. Line-crossing counting is triggered on stable tracks, which copes effectively with complex scenes such as rolling, reflection, occlusion and entanglement, and achieves high-throughput, accurate counting.

Inventors

  • TIAN CHANGFENG
  • ZHANG XINAN
  • CHE XUAN
  • ZHOU ZHULI
  • QU ZHI

Assignees

  • Fishery Machinery and Instrument Research Institute, Chinese Academy of Fishery Sciences (中国水产科学研究院渔业机械仪器研究所)
  • Fishery Administration Guarantee Center, Ministry of Agriculture and Rural Affairs (农业农村部渔政保障中心)

Dates

Publication Date
2026-05-05
Application Date
2026-04-09

Claims (10)

  1. The automatic fry identification, tracking and counting method based on machine vision is characterized by comprising the following steps: S1, acquiring a video stream image of fish fry to obtain an initial data set; S2, preprocessing the initial data set through an image enhancement module to obtain a first data set; S3, inputting the first data set into an improved YOLOv network model, constructed with the YOLOv lightweight network model as the reference model, for identification, and outputting the bounding box coordinates and class labels of the fry in the image, wherein the improved YOLOv network model comprises a CGAF module introduced into the neck network, and the detection head is a DYNAMICHEAD dynamic detection head with a self-attention mechanism; S4, taking the bounding box coordinates and class label information of the fry output in S3 as state variables, constructing a cost matrix from the Mahalanobis distance and the cosine distance through the DeepSORT real-time multi-target tracking algorithm, assigning a unique identity to each target through cascade matching, outputting tracking tracks, and simultaneously feeding the output result back to the YOLOv network model of S3 for training; S5, counting the fry based on the tracking tracks obtained in S4.
  2. The method for automatically identifying, tracking and counting fry based on machine vision of claim 1, wherein acquiring the video stream images of fry in S1 comprises capturing high-contrast fry images using backlight illumination and a high-frame-rate area-array camera.
  3. The method for automatically identifying, tracking and counting fish fry based on machine vision according to claim 1, wherein preprocessing the initial data set in S2 by the image enhancement module comprises decomposing the image into a low-frequency part and a high-frequency part, and calculating the gain coefficient of the high-frequency part by an adaptive contrast enhancement (ACE) algorithm.
  4. The method for automatically identifying, tracking and counting fish fry based on machine vision according to claim 3, wherein the low-frequency part is obtained by low-pass filtering the image, and the high-frequency part is obtained by subtracting the low-frequency part from the original image.
  5. The method for automatically identifying, tracking and counting fish fry based on machine vision according to claim 1, wherein the CGAF module comprises: calculating weights for the shallow features and the deep features using a CGA module, assigning different weights to feature maps of different layers, fusing them with the input features, and finally outputting the final features through a 1×1 convolution.
  6. The method for automatically identifying, tracking and counting fish fry based on machine vision of claim 5, wherein calculating the weights of the shallow features and the deep features using the CGA module comprises: computing the input feature map of the CGA module; performing channel attention and spatial attention fusion on the input feature map to obtain a spatial attention map and a channel vector respectively; and generating a channel-specific spatial importance map using the input feature map as content guidance, rearranging the feature map by a channel shuffle operation, and normalizing after grouped convolution to obtain the weights.
  7. The method for automatically identifying, tracking and counting fish fry based on machine vision of claim 1, wherein the DYNAMICHEAD dynamic detection head with a self-attention mechanism comprises a scale-aware attention module, a spatial-aware attention module and a task-aware module.
  8. The automatic identification, tracking and counting method for fry based on machine vision according to claim 1, wherein S4 comprises the following steps: S41, using the bounding box coordinates and class label information of the fry output in S3 as state variables, creating an initial tracking track, initializing the motion state of a Kalman filter, and predicting the corresponding box through the Kalman filter; S42, predicting the position of each existing track in the current frame using the Kalman filter, calculating the intersection-over-union (IoU) between the predicted boxes and the current frame's detection boxes, and constructing a cost matrix; S43, inputting the cost matrix into the Hungarian algorithm for linear matching to obtain three kinds of results, and processing the tracking tracks accordingly; S44, repeating S42 and S43 until a confirmed track appears or the video stream ends; S45, predicting the current position of each confirmed track using the Kalman filter, and performing cascade matching with the current detection boxes; S46, performing secondary matching, namely IoU matching among the unmatched detections, the tracks in the unconfirmed state and the tracks unmatched in the cascade matching, and constructing a cost matrix again; S47, Hungarian matching, namely inputting the cost matrix obtained in S46 into the Hungarian algorithm to again obtain three kinds of results, with the same processing logic as in S43; S48, executing S45-S47 in a loop until the video stream ends.
  9. The automatic identification, tracking and counting method for fish fry based on machine vision of claim 8, wherein the cascade matching comprises: calculating the Mahalanobis distance and the cosine distance between the predicted targets and the detected targets; and comparing the Mahalanobis distance and the cosine distance with corresponding thresholds respectively to obtain the cost matrix.
  10. The automatic identification, tracking and counting method for fry based on machine vision according to claim 1, wherein the fry are counted in S5 by a line-crossing counting strategy: a virtual detection line is set, and a counting event is triggered when a fry crosses the line.
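The line-crossing strategy of claim 10 can be sketched in a few lines: a counting event fires the first time a confirmed track's centroid changes sides of a virtual detection line. This is a minimal, hypothetical illustration; the class name, the horizontal-line assumption, and the one-count-per-track rule are illustrative choices, not the patent's specification.

```python
class LineCounter:
    """Counts fry that cross a horizontal virtual line y = line_y."""

    def __init__(self, line_y):
        self.line_y = line_y     # y-coordinate of the virtual detection line
        self.last_side = {}      # track_id -> last side of the line (-1 or +1)
        self.counted = set()     # track ids that have already triggered a count
        self.count = 0

    def update(self, track_id, cx, cy):
        """Feed one (track_id, centroid) observation per frame; return total count."""
        side = 1 if cy > self.line_y else -1
        prev = self.last_side.get(track_id)
        self.last_side[track_id] = side
        # Trigger exactly once, on the first side change of a stable track.
        if prev is not None and prev != side and track_id not in self.counted:
            self.counted.add(track_id)
            self.count += 1
        return self.count
```

Keying the event on a side *change* (rather than proximity to the line) makes the count robust to a fry hovering near the line, and the `counted` set prevents double counting if it swims back and forth.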

Description

Automatic fish fry identification, tracking and counting method based on machine vision

Technical Field

The invention relates to the field of fry counting, and in particular to a machine-vision-based method for automatic fry identification, tracking and counting.

Background

In scenarios such as stock enhancement and release of aquatic offspring seed and large-scale seedling production, counting the fry is a core link in evaluating offspring seed yield, accounting for the release scale and guaranteeing culture benefits. In the prior art, fry counting is generally accomplished by a machine-vision-based method. However, in complex scenes involving fry morphology differences, body-surface reflection, stacking and occlusion, and posture entanglement, prior-art counting methods are prone to missed detections, false detections, identity loss and counting errors, and struggle to meet the real-time, accurate and high-throughput counting requirements of stock enhancement and release; they therefore need improvement.
Disclosure of Invention

The invention provides a machine-vision-based automatic fish fry identification, tracking and counting method, comprising the following steps: S1, acquiring a video stream image of fish fry to obtain an initial data set; S2, preprocessing the initial data set through an image enhancement module to obtain a first data set; S3, inputting the first data set into an improved YOLOv network model, constructed with the YOLOv lightweight network model as the reference model, for identification, and outputting the bounding box coordinates and class labels of the fry in the image, wherein the improved YOLOv network model introduces a CGAF module into the neck network and replaces the detection head with a DYNAMICHEAD dynamic detection head with a self-attention mechanism; S4, taking the bounding box coordinates and class label information of the fry output in S3 as state variables, constructing a cost matrix from the Mahalanobis distance and the cosine distance through the DeepSORT real-time multi-target tracking algorithm, assigning a unique identity to each target through cascade matching, outputting tracking tracks, and simultaneously feeding the output result back to the YOLOv network model of S3 for training; S5, counting the fry based on the tracking tracks obtained in S4. Further, acquiring the video stream image of the fry in S1 comprises illuminating with a backlight source and capturing high-contrast fry images with a high-frame-rate area-array camera. Further, preprocessing the initial data set in S2 by the image enhancement module comprises decomposing the image into a low-frequency part and a high-frequency part, and calculating the gain coefficient of the high-frequency part by an adaptive contrast enhancement (ACE) algorithm.
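The S2 enhancement step (low-pass decomposition plus an ACE gain on the high-frequency residue) can be sketched as follows. This is a minimal illustration, assuming a 3×3 box filter as the low-pass stage and a classic ACE-style gain D/σ_local clamped to [1, max_gain]; the window size and the values of `D` and `max_gain` are illustrative choices, not the patent's parameters.

```python
import math

def enhance(img, D=30.0, max_gain=3.0):
    """ACE-style enhancement of a grayscale image given as a 2-D list.

    out = low + gain * high, where high = img - low and the gain adapts
    inversely to the local standard deviation (flat regions untouched,
    weak edges amplified).
    """
    h, w = len(img), len(img[0])

    def window(y, x):
        # 3x3 neighborhood with edge clamping
        return [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = window(y, x)
            lo = sum(vals) / 9.0                     # low-frequency part
            hi = img[y][x] - lo                      # high-frequency part
            sigma = math.sqrt(sum((v - lo) ** 2 for v in vals) / 9.0)
            gain = min(max_gain, max(1.0, D / (sigma + 1e-6)))
            row.append(lo + gain * hi)               # amplified detail
        out.append(row)
    return out
```

Because the gain is clamped below by 1, uniform regions pass through unchanged, while low-contrast fry outlines (small σ) receive the largest boost, which is the point of applying ACE before detection.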
Further, the low-frequency part is obtained by low-pass filtering the image, and the high-frequency part is obtained by subtracting the low-frequency part from the original image. Further, the CGAF module comprises: calculating weights for the shallow features and the deep features using a CGA module, assigning different weights to feature maps of different layers, fusing them with the input features, and finally outputting the final features through a 1×1 convolution. Further, calculating the weights of the shallow features and the deep features using the CGA module comprises: computing the input feature map of the CGA module; performing channel attention and spatial attention fusion on the input feature map to obtain a spatial attention map and a channel vector respectively; and generating a channel-specific spatial importance map using the input feature map as content guidance, rearranging the feature map by a channel shuffle operation, and normalizing after grouped convolution to obtain the weights. Further, the DYNAMICHEAD dynamic detection head with a self-attention mechanism comprises a scale-aware attention module, a spatial-aware attention module and a task-aware module.
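The core idea of the CGA-guided fusion, combining a channel attention vector and a spatial attention map into per-element weights that gate a mixture of shallow and deep features, can be sketched as below. This is a deliberately simplified, hypothetical version: features are `[C][H][W]` nested lists, the attentions are plain means followed by a sigmoid, and the channel shuffle, grouped convolution and final 1×1 convolution of the actual module are omitted.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cga_fuse(shallow, deep):
    """Fuse shallow and deep feature maps as W*shallow + (1-W)*deep,
    where W combines channel attention (spatial mean per channel) and
    spatial attention (channel mean per position) on the summed input."""
    C, H, W = len(shallow), len(shallow[0]), len(shallow[0][0])
    # Input feature map of the CGA module: sum of the two branches.
    x = [[[shallow[c][y][w] + deep[c][y][w] for w in range(W)]
          for y in range(H)] for c in range(C)]
    # Channel vector: spatial mean per channel.
    chan = [sum(v for row in x[c] for v in row) / (H * W) for c in range(C)]
    # Spatial attention map: channel mean per position.
    spat = [[sum(x[c][y][w] for c in range(C)) / C for w in range(W)]
            for y in range(H)]
    out = []
    for c in range(C):
        plane = []
        for y in range(H):
            row = []
            for w in range(W):
                wgt = sigmoid(chan[c] + spat[y][w])  # content-guided weight
                row.append(wgt * shallow[c][y][w] + (1 - wgt) * deep[c][y][w])
            plane.append(row)
        out.append(plane)
    return out
```

The gating form guarantees the fused value always lies between the shallow and deep responses, so the attention only redistributes emphasis between layers rather than inventing new activations.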
Further, S4 comprises the following steps: S41, using the bounding box coordinates and class label information of the fry output in S3 as state variables, creating an initial tracking track, initializing the motion state of a Kalman filter, and predicting the corresponding box through the Kalman filter; S42, predicting the position of each existing track in the current frame using the Kalman filter, calculating the intersection-over-union (IoU) between the predicted boxes and the current frame's detection boxes, and constructing a cost matrix; S43, inputting the cost matrix into the Hungarian algorithm for linear matching to obtain three kinds of results, and processing the tracking tracks accordingly; S44, repeating S42 and S43 until a confirmed track appears or the video stream ends;
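The cost-matrix construction of S42 and the assignment of S43 can be sketched as follows. The cost is 1 − IoU between each predicted track box and each detection box; a greedy lowest-cost-first assignment stands in here for the Hungarian algorithm the method actually uses, and the `IOU_GATE` threshold is an illustrative choice. The three result kinds fall out naturally: matched pairs, unmatched tracks, and unmatched detections.

```python
IOU_GATE = 0.3  # illustrative minimum overlap for a valid match

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match(pred_boxes, det_boxes):
    """Return (matches, unmatched_track_idxs, unmatched_det_idxs)."""
    cost = [[1.0 - iou(p, d) for d in det_boxes] for p in pred_boxes]
    # Greedy stand-in for the Hungarian algorithm: take cheapest pairs first.
    pairs = sorted((cost[i][j], i, j)
                   for i in range(len(pred_boxes))
                   for j in range(len(det_boxes)))
    used_t, used_d, matches = set(), set(), []
    for c, i, j in pairs:
        if c <= 1.0 - IOU_GATE and i not in used_t and j not in used_d:
            matches.append((i, j))
            used_t.add(i)
            used_d.add(j)
    unmatched_t = [i for i in range(len(pred_boxes)) if i not in used_t]
    unmatched_d = [j for j in range(len(det_boxes)) if j not in used_d]
    return matches, unmatched_t, unmatched_d
```

In S43's terms, `matches` update their tracks with the paired detection, `unmatched_t` tracks age toward deletion, and `unmatched_d` detections spawn new tentative tracks.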