CN-121999288-A - Image target detection system and method based on deep learning
Abstract
The invention belongs to the technical field of computer vision and discloses an image target detection system and method based on deep learning. The method comprises: collecting image data to be detected, generating candidate regions, and applying data cleaning to obtain a candidate region image data set; performing reduced semantic region screening and removing redundant semantic region images to obtain a reduced candidate image data set; extracting multidimensional association information to construct a target region association graph; constructing a target sub-graph set by combining the target region association graph with the parameter information of the candidate regions; performing unified information fusion to output a multi-channel target association graph; dynamically adjusting the calculation value to obtain an optimized target association graph; performing statistical density adjustment to output an enhanced feature target association graph; and performing target detection to output a target detection result. The method improves the structural expression capacity and path scheduling flexibility of image target detection.
Inventors
- LI JIQIAN
- LI LIN
- LIU ZHIWEI
- YANG SHOUZHI
Assignees
- Dongguan Polytechnic (东莞职业技术学院)
Dates
- Publication Date
- 20260508
- Application Date
- 20260126
Claims (10)
- 1. An image target detection method based on deep learning, characterized by comprising the following steps: S1, collecting image data to be detected, generating candidate regions, and applying data cleaning to obtain a candidate region image data set; S2, performing reduced semantic region screening on the candidate region image data set and removing redundant semantic region images to obtain a reduced candidate image data set; S3, extracting multidimensional association information from the reduced candidate image data set, and constructing a target region association graph based on the multidimensional association information and the correlations between the corresponding candidate regions; S4, constructing a target sub-graph set for the reduced candidate image data set by combining the target region association graph with the parameter information of the candidate regions, and performing unified information fusion on the target sub-graph set to output a multi-channel target association graph; S5, dynamically adjusting the calculation value of the multi-channel target association graph, and synchronously updating the calculation priority of each candidate region in the multi-channel target association graph to obtain an optimized target association graph; S6, performing statistical density adjustment on the optimized target association graph to output an enhanced feature target association graph, performing target detection based on the enhanced feature target association graph, and outputting a target detection result.
- 2. The image target detection method based on deep learning according to claim 1, wherein performing reduced semantic region screening comprises: dividing each candidate region image in the candidate region image data set into regions, and outputting a candidate region and other regions; extracting the region parameter features of the candidate region of each candidate region image, and constructing a region feature vector of the candidate region based on the region parameter features; calculating the feature similarity between the region feature vector and the feature vector of each preset target class, and combining all feature similarities to construct a confidence output vector of the candidate region; calculating a confidence standard deviation based on the confidence output vector, identifying the maximum confidence and the second-largest confidence, and calculating the confidence difference between them; identifying the center point positions of the candidate region and of each other region, calculating the center point distance between the candidate region and each other region, and obtaining the intersection-over-union of the candidate region and the other regions; if the candidate region in a candidate region image qualifies as both a semantic fuzzy region and a candidate redundant region, judging the candidate region image corresponding to that candidate region to be a redundant semantic region image, and combining the remaining candidate region images into the reduced candidate image data set.
- 3. The image target detection method based on deep learning according to claim 2, wherein extracting multidimensional association information comprises: identifying all candidate regions in each reduced candidate image, forming every two candidate regions into a candidate region pair, and acquiring the geometric parameters of each candidate region pair; calculating the center coordinate distance of each candidate region pair based on the geometric parameters, and calculating the size parameters of the minimum bounding rectangle; calculating the area proportion of each candidate region in each candidate region pair, and taking the candidate region with the larger area proportion as the main region; numbering all candidate region pairs in each reduced candidate image, and sorting the numbered pairs based on the direction weight factors to construct a region direction influence matrix, which serves as the multidimensional association information.
- 4. The image target detection method based on deep learning according to claim 3, wherein constructing the target region association graph comprises: setting an edge construction threshold; if the direction weight factor of a candidate region pair exists in the region direction influence matrix and is higher than the edge construction threshold, constructing a directed edge for the candidate region pair, and determining the direction of the directed edge based on the region direction vector of the candidate region pair; and combining the directed edges of all candidate regions in each reduced candidate image, taking the candidate regions as nodes and the direction weight factors as the weights of the corresponding directed edges, to construct the target region association graph of the reduced candidate image.
- 5. The image target detection method based on deep learning according to claim 4, wherein constructing the target sub-graph set comprises: performing edge detection on each reduced candidate image, outputting the boundary contour region image block of each candidate region, and calculating the boundary gradient distribution of each boundary contour region image block; comparing the boundary gradient distributions of every two boundary contour region image blocks for correlation, and if the gradient direction correlation is higher than a preset gradient similarity threshold, establishing an undirected connecting edge between the two corresponding candidate regions, thereby constructing a boundary similarity association graph; extracting the local texture entropy of each boundary contour region image block, calculating the gray-level standard deviation, and calculating the texture complexity of the boundary contour region image block based on the local texture entropy and the gray-level standard deviation; calculating the pixel overlap area of every two boundary contour region image blocks, and if the ratio of the pixel overlap area to the total area is higher than an overlap threshold, establishing an overlap region structure graph for the two corresponding boundary contour region image blocks; and combining the target region association graph, the boundary similarity association graph, the texture structure association graph, and the overlap region structure graph of each boundary contour region image block to form the target sub-graph subset of the corresponding candidate region, and combining all target sub-graph subsets, labeled by candidate region number, to obtain the target sub-graph set.
- 6. The image target detection method based on deep learning according to claim 5, wherein performing unified information fusion comprises: aligning the edge connection relations of each target sub-graph subset, extracting connection edge attribute information, and calculating the edge attribute value of the corresponding connection relation in each sub-graph based on the connection edge attribute information; fusing the connecting edges of the sub-graphs with the same number, performing weighted fusion of the edge attribute values to obtain fused edge weights, and adding channel labels; and associating the fused edges with the corresponding candidate regions, and outputting the multi-channel target association graph of the candidate regions.
- 7. The image target detection method based on deep learning according to claim 6, wherein dynamically adjusting the calculation value comprises: acquiring the initial calculation path identifier and the calculation cost value in the multi-channel target association graph corresponding to each reduced candidate image; acquiring the graph propagation history, identifying the variation amplitude of the edge connection strength of the corresponding candidate region in the graph propagation history, and calculating the node diffusivity and the response count of the candidate region; calculating the propagation influence value of the corresponding candidate region based on the dynamic feedback parameter subset, and if the variation amplitude of the propagation influence value exceeds a preset propagation variation range, re-estimating and updating the calculation cost value of the candidate region and adjusting the calculation path identifier based on a preset calculation path configuration table; and adjusting the priority of the corresponding candidate regions based on the updated calculation path identifiers and calculation cost values, and combining all adjusted multi-channel target association graphs into the optimized target association graph.
- 8. The image target detection method based on deep learning according to claim 7, wherein performing statistical density adjustment comprises: identifying the scale levels of each optimized target association graph, and extracting the number of candidate regions, the area proportion of the candidate regions, and the distribution density of the candidate regions in each scale level as statistical density parameters; setting statistical density sensitivity thresholds, comparing each component of the statistical density parameters with the corresponding statistical density sensitivity threshold, and calculating the density contrast coefficient of the corresponding scale level; matching a channel mapping rate in a preset channel adjustment strategy based on the density contrast coefficient, and adjusting the channel weights of the corresponding scale level based on the channel mapping rate; and propagating the adjusted scale level information to the corresponding optimized target association graph to obtain the enhanced feature target association graph.
- 9. The image target detection method based on deep learning according to claim 8, wherein performing target detection comprises: extracting the fused feature vector corresponding to each candidate region in each enhanced feature target association graph, performing target class prediction and bounding box regression on the corresponding candidate region based on the fused feature vector, and outputting the target class label and the corresponding bounding box parameters as the target detection result.
- 10. A deep learning-based image target detection system for implementing the deep learning-based image target detection method of any one of claims 1 to 9, comprising: a data acquisition module, used for collecting image data to be detected, generating candidate regions, and applying data cleaning to obtain a candidate region image data set; a redundancy elimination module, used for performing reduced semantic region screening on the candidate region image data set and removing redundant semantic region images to obtain a reduced candidate image data set; an association construction module, used for extracting multidimensional association information from the reduced candidate image data set and constructing a target region association graph based on the multidimensional association information and the correlations between the corresponding candidate regions; a unified fusion module, used for constructing a target sub-graph set for the reduced candidate image data set by combining the target region association graph with the parameter information of the candidate regions, performing unified information fusion on the target sub-graph set, and outputting a multi-channel target association graph; an association adjustment module, used for dynamically adjusting the calculation value of the multi-channel target association graph and synchronously updating the calculation priority of each candidate region in the multi-channel target association graph to obtain an optimized target association graph; and a density adjustment module, used for performing statistical density adjustment on the optimized target association graph to output an enhanced feature target association graph, performing target detection based on the enhanced feature target association graph, and outputting a target detection result.
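The quantities named in claim 2 (confidence standard deviation, the gap between the two largest confidences, center point distance, and intersection-over-union) can be illustrated with a minimal Python sketch. The claims do not state how a "semantic fuzzy region" or a "candidate redundant region" is decided, so the decision rules and every threshold below (`std_max`, `margin_max`, `dist_max`, `iou_min`) are assumptions made only for illustration, as are the function names and the `(x1, y1, x2, y2)` box format.

```python
import math
from statistics import pstdev

def confidence_stats(confidences):
    """Return (standard deviation, max minus second-max) of a confidence vector."""
    ranked = sorted(confidences, reverse=True)
    return pstdev(confidences), ranked[0] - ranked[1]

def center_distance(a, b):
    """Euclidean distance between the centers of two (x1, y1, x2, y2) boxes."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return math.hypot(ax - bx, ay - by)

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def is_redundant_semantic(confidences, box, other_boxes,
                          std_max=0.1, margin_max=0.1,
                          dist_max=20.0, iou_min=0.5):
    """Hypothetical decision rule; the thresholds are NOT taken from the patent."""
    std, margin = confidence_stats(confidences)
    # Assumption: "fuzzy" means a flat confidence vector with no clear winner.
    fuzzy = std < std_max and margin < margin_max
    # Assumption: "redundant" means close to and heavily overlapping another region.
    redundant = any(center_distance(box, o) < dist_max and iou(box, o) > iou_min
                    for o in other_boxes)
    return fuzzy and redundant
```

Under these assumed rules, a region is discarded only when both conditions hold, which mirrors the "simultaneously" wording of the claim.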
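The directed graph construction of claim 4 can be sketched in a few lines of Python. This is an illustrative sketch, not the patented implementation: the claims do not define how the direction weight factor or the region direction vector is computed, so the input here is a hypothetical mapping `pair_weights` whose keys `(src, dst)` already encode the edge direction and whose values are the direction weight factors (`None` when no factor exists for the pair).

```python
def build_region_graph(num_regions, pair_weights, edge_threshold):
    """Build a directed, weighted adjacency mapping over candidate regions.

    pair_weights maps (src, dst) -> direction weight factor, or None if the
    factor is absent (hypothetical input format). An edge is kept only when
    its factor exists and is strictly above edge_threshold.
    """
    graph = {node: {} for node in range(num_regions)}
    for (src, dst), weight in pair_weights.items():
        if weight is not None and weight > edge_threshold:
            graph[src][dst] = weight  # directed edge src -> dst
    return graph


# Example: region 0 strongly influences region 1; the other pairs are dropped.
example = build_region_graph(
    3,
    {(0, 1): 0.8, (2, 1): 0.3, (0, 2): None},
    edge_threshold=0.5,
)
```

Because the adjacency mapping is not symmetric, an edge from region 0 to region 1 does not imply the reverse edge, which matches the asymmetric dependency the description motivates.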
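The weighted edge fusion with channel labels described in claim 6 might look like the following Python sketch. The claims do not give the fusion weights or the data layout, so the per-channel weights (`channel_weights`), the channel names, and the representation of each sub-graph as a mapping from an edge `(u, v)` to an edge attribute value are all assumptions for illustration.

```python
def fuse_subgraphs(subgraphs, channel_weights):
    """Fuse edges shared across sub-graphs into one multi-channel edge map.

    subgraphs: channel name -> {(u, v): edge attribute value}
    channel_weights: channel name -> fusion weight (hypothetical values)
    Returns {(u, v): {"weight": fused weight, "channels": [channel names]}}.
    """
    fused = {}
    for channel, edges in subgraphs.items():
        for edge, value in edges.items():
            entry = fused.setdefault(edge, {"weight": 0.0, "channels": []})
            entry["weight"] += channel_weights[channel] * value  # weighted sum
            entry["channels"].append(channel)  # channel label for this edge
    return fused


fused_example = fuse_subgraphs(
    {"boundary": {(0, 1): 0.6}, "overlap": {(0, 1): 0.4, (1, 2): 1.0}},
    {"boundary": 0.5, "overlap": 0.5},
)
```

Keeping the list of contributing channels on each fused edge is one simple way to preserve the channel labels the claim mentions; a real system could equally keep one weight per channel instead of a single sum.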
Description
Image target detection system and method based on deep learning
Technical Field
The invention relates to the technical field of computer vision, in particular to an image target detection system and method based on deep learning.
Background
Deep learning target detection is an image recognition technique based on convolutional neural networks. It can automatically locate and recognize various types of target objects in complex scenes, and is widely applied in fields such as video monitoring and industrial quality inspection.
In the initial target detection stage, an image often contains a large number of candidate target regions. For example, when a candidate region covers several possible targets, its features mix the semantics of multiple classes. Traditional target detection methods focus only on the candidate regions themselves and overlook the fact that semantically ambiguous candidate regions can enter the subsequent graph structure, propagate, and cause misjudgment.
In addition, when the association graph is constructed, asymmetric semantic dependencies often exist between different targets. For example, a traffic light has a guiding effect on pedestrians, while the reverse relationship is weaker. Traditional target detection methods usually build the graph structure from a symmetric adjacency matrix and treat the relationship between nodes as bidirectionally equivalent, so the graph structure cannot accurately express the dominant semantic direction between targets, which reduces the effectiveness of information propagation. Traditional methods also tend to ignore the complex semantic combination relationships between targets and form an association graph with a single structure, which limits the expressive capability of the graph structure.
On the other hand, the state of candidate regions changes during graph construction, which calls for dynamic adjustment of the calculation path, and traditional target detection methods handle such state changes poorly. Traditional methods also do not take resource distribution density into account, which affects the balance between accuracy and efficiency of target detection.
In view of the above, the present invention provides an image target detection system and method based on deep learning to solve the above-mentioned problems.
Disclosure of Invention
To overcome the above defects of the prior art, the invention provides a deep learning-based image target detection method comprising the following steps: S1, collecting image data to be detected, generating candidate regions, and applying data cleaning to obtain a candidate region image data set; S2, performing reduced semantic region screening on the candidate region image data set and removing redundant semantic region images to obtain a reduced candidate image data set; S3, extracting multidimensional association information from the reduced candidate image data set, and constructing a target region association graph based on the multidimensional association information and the correlations between the corresponding candidate regions; S4, constructing a target sub-graph set for the reduced candidate image data set by combining the target region association graph with the parameter information of the candidate regions, and performing unified information fusion on the target sub-graph set to output a multi-channel target association graph; S5, dynamically adjusting the calculation value of the multi-channel target association graph, and synchronously updating the calculation priority of each candidate region in the multi-channel target association graph to obtain an optimized target association graph; S6, performing statistical density adjustment on the optimized target association graph to output an enhanced feature target association graph, performing target detection based on the enhanced feature target association graph, and outputting a target detection result.
Further, performing reduced semantic region screening comprises: dividing each candidate region image in the candidate region image data set into regions, and outputting a candidate region and other regions; extracting the region parameter features of the candidate region of each candidate region image, and constructing a region feature vector of the candidate region based on the region parameter features; calculating the feature similarity of the region feature vector and the feature vector of