CN-116109916-B - Target counting method, device, terminal and computer readable storage medium
Abstract
The application provides a target counting method, a device, a terminal and a computer readable storage medium. The target counting method comprises: performing target detection on a current video frame to obtain detection information of a target object contained in the current video frame; determining, by a neighborhood attention module, motion variable information between the target object and historical targets based on the position information of the target object in the current video frame and the position information of each historical target having the same category information as the target object in a historical video frame; determining, based on the motion variable information, whether the target object and the historical target are the same target; and counting the target object if it is not the same as any historical target. The neighborhood attention module can extract richer features of the target object and the historical targets, thereby improving the detection accuracy of the motion variable information and the judgment of whether a target object and a historical target in different frames are the same target, which avoids repeated counting.
Inventors
- ZHAO LEI
- SUN HAITAO
- XIONG JIANPING
- LI NINGCHUAN
- YANG JIANBO
- YAN JIN
- HU KAIXUAN
Assignees
- Zhejiang Dahua Technology Co., Ltd. (浙江大华技术股份有限公司)
Dates
- Publication Date
- 20260508
- Application Date
- 20221230
Claims (9)
- 1. A target counting method, characterized in that the target counting method comprises: performing target detection on a current video frame to obtain detection information of a target object contained in the current video frame, wherein the detection information comprises position information and category information; determining, by a neighborhood attention module, motion variable information between the target object and historical targets based on the position information of the target object in the current video frame and the position information of each historical target having the same category information as the target object in historical video frames before the current video frame; determining whether the target object and the historical target are the same target based on the motion variable information between the target object and the historical target; and counting the target object in response to the target object being different from the historical targets; wherein the determining, by the neighborhood attention module, of the motion variable information between the target object and the historical targets comprises: generating an attention vector corresponding between the target object and the historical target based on the position information of each target object in the current video frame and the position information of each historical target having the same category information as the target object in the historical video frames, wherein the attention vector comprises a key K, a query Q and a value V; determining a probability value that the target object and the historical target are the same target based on the attention vector corresponding between the target object and the historical target; and determining the motion variable information between the target object and the historical target based on the corresponding attention vector and the probability value.
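The attention computation described in claim 1 can be sketched in Python. The projection matrices, feature dimensions, and the use of a softmax over similarity scores to obtain the matching probability are illustrative assumptions; the patent does not disclose these details.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def neighborhood_attention(cur_pos, hist_pos, Wq, Wk, Wv):
    """Sketch of claim 1: the query Q comes from the current object's
    position, the keys K and values V from same-class historical targets.
    Returns (a) a probability that each historical target is the same
    target, and (b) a motion-variable feature weighted by that probability."""
    q = cur_pos @ Wq                  # query from the current detection, shape (d,)
    K = hist_pos @ Wk                 # keys from historical targets, shape (n, d)
    V = hist_pos @ Wv                 # values from historical targets, shape (n, d)
    scores = K @ q / np.sqrt(q.size)  # scaled dot-product similarity, shape (n,)
    prob_same = softmax(scores)       # probability each historical target matches
    motion = prob_same @ V            # motion-variable features, probability-weighted
    return prob_same, motion
```

In a trained model `Wq`, `Wk` and `Wv` would be learned linear projections; here they are placeholders for the data flow only.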
- 2. The target counting method according to claim 1, wherein the motion variable information comprises a position offset and angle information; and the determining whether the target object and the historical target are the same based on the motion variable information between the target object and the historical target comprises: determining that the target object and the historical target are the same target in response to the position offset between the target object and the historical target meeting an offset threshold and the angle information meeting an angle threshold.
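The double-threshold decision of claim 2 reduces to a small predicate. The threshold values below are illustrative defaults, not figures from the patent.

```python
import math

def is_same_target(offset_xy, angle_deg, offset_thresh=50.0, angle_thresh=30.0):
    """Sketch of claim 2: a current detection and a historical target are
    judged identical only when both the position offset and the angle
    information fall within their thresholds (values here are assumed)."""
    dist = math.hypot(offset_xy[0], offset_xy[1])  # Euclidean position offset
    return dist <= offset_thresh and abs(angle_deg) <= angle_thresh
```

A target failing either threshold would be treated as new and counted.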
- 3. The target counting method of claim 1, wherein the neighborhood attention module is a Transformer attention model.
- 4. The target counting method according to claim 1, wherein the performing target detection on the current video frame to obtain detection information of the target object contained in the current video frame comprises: extracting features of the video frame through a target detection network to obtain a plurality of target feature maps with different scales; convolving, by a visual attention module introduced in the target detection network, each target feature map based on a matched convolution kernel to obtain an attention feature map corresponding to each target feature map; performing feature fusion on the attention feature maps corresponding to the target feature maps to obtain a fusion feature map; and performing target detection on the fusion feature map to obtain detection information of each target object contained in the video frame.
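The fusion step of claim 4 can be illustrated by resizing each attention feature map to a common scale and summing. Real detectors typically fuse with learned lateral convolutions (FPN-style); this sketch, which assumes square maps whose sizes divide the largest, only shows the data flow.

```python
import numpy as np

def fuse_attention_maps(maps):
    """Sketch of claim 4's feature fusion: upsample every attention
    feature map to the largest scale by nearest-neighbour repetition
    (np.kron with a block of ones), then sum element-wise.
    Assumes square maps with sizes dividing the largest one."""
    target = max(m.shape[0] for m in maps)
    fused = np.zeros((target, target))
    for m in maps:
        r = target // m.shape[0]            # integer upsampling factor
        fused += np.kron(m, np.ones((r, r)))  # repeat each cell into an r x r block
    return fused
```

The fused map would then be passed to the detection head described in the claim.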
- 5. The target counting method according to claim 4, wherein the target detection network comprises M serially cascaded residual modules and M visual attention modules, each of the residual modules being connected to one of the visual attention modules, M being a positive integer; the extracting features of the video frame through the target detection network to obtain a plurality of target feature maps with different scales comprises: extracting features of the current video frame through the residual modules to obtain target feature maps with different scales output by the residual modules; and the convolving, by the visual attention module, each target feature map based on the matched convolution kernel to obtain an attention feature map corresponding to each target feature map comprises: selecting, by each visual attention module, a matched convolution kernel based on the scale of the target feature map, and convolving the target feature map to obtain the attention feature map corresponding to each target feature map.
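Claim 5's scale-matched kernel selection can be sketched as choosing a kernel size from the feature-map resolution and applying a 'same'-padded convolution. The scale-to-size mapping and the uniform averaging kernel are assumptions for illustration; an SKNet-style module would use learned kernels and a learned selection.

```python
import numpy as np

def select_kernel(hw):
    """Pick a convolution kernel size matched to the feature-map scale,
    as in claim 5: larger maps get larger receptive fields.
    The breakpoints (52, 26) are illustrative, not from the patent."""
    h, w = hw
    if min(h, w) >= 52:
        return 7
    if min(h, w) >= 26:
        return 5
    return 3

def conv2d_same(fm, k):
    """Naive 'same'-padded convolution with a uniform k x k averaging
    kernel, standing in for the attention branch's matched convolution."""
    pad = k // 2
    padded = np.pad(fm, pad, mode="edge")
    out = np.empty_like(fm, dtype=float)
    for i in range(fm.shape[0]):
        for j in range(fm.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out
```

Each residual module's output map would be routed through `select_kernel` and then convolved at its matched size.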
- 6. The target counting method according to claim 5, wherein before the performing target detection on the current video frame to obtain detection information of the target object contained in the current video frame, the method further comprises: constructing the target detection network based on a YOLOv network and an SKNet attention model; and pruning the target detection network by removing channels with low weight and/or removing the residual modules in deep layers of the network.
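The channel pruning of claim 6 is commonly done by ranking output channels by weight magnitude and discarding the weakest. The L1-norm criterion and the `keep_ratio` setting below are illustrative assumptions; the patent only states that low-weight channels are removed.

```python
import numpy as np

def prune_channels(weights, keep_ratio=0.75):
    """Sketch of claim 6's channel pruning: rank the output channels of a
    conv weight tensor (out_c, in_c, kh, kw) by L1 norm and keep only the
    highest-weight fraction. Returns the pruned tensor and kept indices."""
    norms = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(weights.shape[0] * keep_ratio)))
    keep = np.sort(np.argsort(norms)[::-1][:n_keep])  # top-n, original order
    return weights[keep], keep
```

In a full pipeline the corresponding input channels of the next layer would be pruned to match, and deep residual modules could be dropped the same way.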
- 7. A target counting device, characterized in that the target counting device comprises: a detection module, configured to perform target detection on a current video frame to obtain detection information of a target object contained in the current video frame, wherein the detection information comprises position information and category information; an analysis module, configured to determine motion variable information between the target object and historical targets based on the position information of the target object in the current video frame and the position information of each historical target having the same category information as the target object in historical video frames before the current video frame, wherein the analysis module generates an attention vector corresponding between the target object and the historical target, the attention vector comprising a key K, a query Q and a value V, determines a probability value that the target object and the historical target are the same target based on the attention vector, and determines the motion variable information based on the attention vector and the probability value; a processing module, configured to determine whether the target object and the historical target are the same based on the motion variable information between the target object and the historical target; and a determining module, configured to count the target object in response to the target object being different from the historical targets.
- 8. A terminal, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor is configured to execute the computer program to implement the steps of the target counting method according to any one of claims 1 to 6.
- 9. A computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and the computer program, when executed by a processor, implements the steps of the target counting method according to any one of claims 1 to 6.
Description
Target counting method, device, terminal and computer readable storage medium
Technical Field
The present invention relates to the field of image recognition technologies, and in particular to a target counting method, a device, a terminal, and a computer readable storage medium.
Background
With the development of science and technology, aquaculture in China is in a stage of transformation from traditional to modern aquaculture, and the concepts of mechanization, automation and intelligence are gradually being adopted in practice. Currently, there is strong demand among aquaculture enterprises for automatic counting of fish shoals. Fish shoal counting is a basic operation for biomass estimation in the aquaculture industry; it not only helps to accurately calculate the reproduction rate of breeders and estimate production potential, but also provides good guidance for survival rate estimation, breeding density control, transportation and sales management, and the like. At present, research on fish shoal counting at home and abroad can be classified, according to method, into vision-based and non-vision-based research. Vision-based research mainly refers to shooting images of the fish shoal with an underwater camera and obtaining the number of fish through analysis of the images, while non-vision-based research mainly obtains the number of fish from signal changes generated during the motion of the fish shoal by various sensors (such as infrared optical counters and resistivity fish counters), sonar equipment, and the like.
For non-vision-based research, the advantage is that fish-shoal signal capture is relatively sensitive and the counting algorithms are relatively simple and mature, while the disadvantage is that the signals generated by these devices generally affect the movements of the fish shoal, altering its distribution and even its habits; moreover, for overlapping or dense fish shoals, accurate results cannot be obtained owing to mutual interference of the signals. Vision-based research avoids the influence of non-vision methods on the fish shoal and can, to a certain extent, solve the counting problem under dense fish-shoal conditions, and it has therefore become a current research hot spot at home and abroad.
Disclosure of Invention
The invention mainly solves the technical problem of providing a target counting method, a device, a terminal and a computer readable storage medium, solving the problem of inaccurate target counting in the prior art. In order to solve the above technical problem, the first technical scheme adopted by the invention is to provide a target counting method comprising the following steps: performing target detection on the current video frame to obtain detection information of a target object contained in the current video frame, wherein the detection information comprises position information and category information; determining motion variable information between the target object and historical targets by a neighborhood attention module based on the position information of the target object in the current video frame and the position information of each historical target with the same category information as the target object in the historical video frames before the current video frame; determining whether the target object and the historical target are the same based on the motion variable information between them; and counting the target object in response to the target object being different from the historical targets. The determining of the motion variable information between the target object and the historical targets through the neighborhood attention module comprises the following steps: generating a corresponding attention vector between the target object and the historical target based on the position information of each target object in the current video frame and the position information of each historical target with the same category information as the target object in the historical video frames, wherein the attention vector comprises a key K, a query Q and a value V; determining a probability value that the target object and the historical target are the same target based on the corresponding attention vector between the target object and the historical target; and determining the motion variable information between the target object and the historical target based on the corresponding attention vector and the probability value. The motion variable information comprises position offset