CN-121982297-A - Multi-target scene instance separation processing method based on visual recognition
Abstract
The invention discloses a multi-target scene instance separation processing method based on visual recognition, which relates to the technical field of image processing. The method comprises the steps of: extracting features of an input image and generating a plurality of rectangular candidate bounding boxes through a target detection algorithm based on a deep neural network; extracting a local-area image from each rectangular bounding box and obtaining the real boundary curve of the corresponding target by boundary fitting of the local-area image; analyzing the fitting-result features of the real boundary curve of the target and calculating a boundary extraction qualification value; and judging the fitting-result state of the real boundary curve of the target according to the boundary extraction qualification value, and outputting the real boundary of the target.
Inventors
- LI HAIYAN
- ZHOU YONGQIANG
- LI BINGCHUN
- ZHANG BOFENG
Assignees
- Kashi University (喀什大学)
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2025-12-04
Claims (10)
- 1. A multi-target scene instance separation processing method based on visual recognition, characterized by comprising the following steps: extracting features of an input image and generating a plurality of rectangular candidate bounding boxes through a target detection algorithm based on a deep neural network, wherein the rectangular candidate bounding boxes represent the positions of targets in the image, and generating a set of rectangular bounding boxes; extracting a local-area image from each rectangular bounding box and obtaining the real boundary curve of the corresponding target by boundary fitting of the local-area image; analyzing the fitting-result features of the real boundary curve of the target and calculating a boundary extraction qualification value; and judging the fitting-result state of the real boundary curve of the target according to the boundary extraction qualification value, and outputting the real boundary of the target.
- 2. The visual recognition-based multi-target scene instance separation processing method according to claim 1, wherein the step of generating a plurality of rectangular candidate bounding boxes through a deep neural network-based target detection algorithm comprises: performing feature extraction on the input image through a convolutional neural network to generate a multi-scale feature map; generating anchor points by sliding windows over the feature map, and predicting, through target classification and boundary regression, the target existence probability of each anchor point and the boundary regression offset relative to the anchor point; calculating the rectangular bounding box corresponding to each anchor point from the target existence classification probability and the boundary regression offset, and performing non-maximum suppression on the rectangular bounding boxes to remove duplicate boxes and retain high-confidence boxes; and outputting the final set of rectangular candidate bounding boxes.
- 3. The visual recognition-based multi-target scene instance separation processing method according to claim 1, wherein the step of extracting a local-area image from each rectangular bounding box and obtaining the real boundary curve of a potential target by boundary fitting of the local-area image comprises: extracting a local-area image from each rectangular bounding box, and performing feature extraction on the local-area image through a convolutional neural network to generate a local feature map; executing, on the extracted feature map, pixel aggregation and separation operations under instance-aware constraints, where an instance cohesion constraint ensures that pixel features of the same target instance are aggregated together in feature space, so that pixel features within the same target instance are consistent; after constraint optimization, generating a mask of the target instance, extracting boundary points from the generated instance mask, and fitting the extracted boundary point set by a curve fitting method to generate the real boundary curve of the target; and smoothing the fitted boundary curve to obtain the real boundary curve, and outputting the real boundary curve corresponding to each rectangular bounding box.
- 4. The visual recognition-based multi-target scene instance separation processing method according to claim 1, wherein the step of analyzing the fitting-result features of the real boundary curve of the target and calculating the boundary extraction qualification value is as follows: the fitting-result features comprise a boundary topology stability value and an information propagation consistency value, and the boundary topology stability value and the information propagation consistency value are added to obtain the boundary extraction qualification value.
- 5. The visual recognition-based multi-target scene instance separation processing method according to claim 4, wherein the step of calculating the boundary topology stability value is: dividing the real boundary curve of the target into a plurality of boundary points, and calculating the angle of each boundary point in the boundary point set; calculating the absolute difference of the angles of adjacent boundary points as the angle difference between the two adjacent boundary points; dividing the angle difference between two adjacent boundary points by the Euclidean distance between them, taking the result as a first ratio, and taking the average of all first ratios as the boundary adaptive change rate; calculating the curvature of each boundary point, calculating the average of all curvatures, calculating the absolute difference between each curvature and the curvature average, and recording the average of all these absolute differences as the boundary morphology distortion degree; and calculating the boundary topology stability value from the boundary adaptive change rate and the boundary morphology distortion degree.
- 6. The visual recognition-based multi-target scene instance separation processing method according to claim 5, wherein the step of calculating the boundary topology stability value from the boundary adaptive change rate and the boundary morphology distortion degree is: calculating the sum of the boundary adaptive change rate, the boundary morphology distortion degree and the value 1, and taking the reciprocal of the sum as the boundary topology stability value.
- 7. The visual recognition-based multi-target scene instance separation processing method according to claim 4, wherein the step of calculating the information propagation consistency value is: dividing the real boundary curve of the target into a plurality of boundary points to form a boundary point set P = {p_1, p_2, ..., p_n}, where i is the index of a boundary point and each boundary point p_i = (x_i, y_i) represents the coordinates of a pixel on the boundary of the object; for each boundary point p_i, calculating the local direction angle of the corresponding image information as θ_i = arctan((∂I/∂y) / (∂I/∂x)), where p_i is the i-th boundary point, I(x, y) is the gray value of the image at that location, ∂I/∂x and ∂I/∂y respectively represent the derivatives of the image in the transverse (x) and longitudinal (y) directions, and θ_i is the local direction angle of the image information corresponding to point p_i.
- 8. The visual recognition-based multi-target scene instance separation processing method according to claim 7, wherein the step of calculating the information propagation consistency value further comprises: for each pair of adjacent boundary points (p_i, p_{i+1}), calculating the local direction jump value Δθ_i = |θ_{i+1} − θ_i|, and calculating the jump intensity ratio from all jump values as R = (1/(n−1)) Σ_{i=1}^{n−1} Δθ_i; taking the reciprocal of the jump intensity ratio plus the value 1 as the information direction consistency factor, with the formula F_dir = 1 / (1 + R), where Δθ_i is the i-th local direction jump value and R is the jump intensity ratio; for each point p_i, constructing a local image information diffusion tensor T_i and calculating the Frobenius norm of the difference of diffusion tensors between adjacent points as the disturbance amplitude d_i = ‖T_{i+1} − T_i‖_F; calculating the diffusion tensor disturbance ratio from all disturbance amplitudes as D = (1/(n−1)) Σ_{i=1}^{n−1} d_i, and taking the reciprocal of the diffusion tensor disturbance ratio plus the value 1 as the diffusion consistency factor, with the formula F_diff = 1 / (1 + D), where d_i is the i-th disturbance amplitude; calculating the local direction jump value between the boundary starting point and ending point as the head-tail information direction difference, and taking the reciprocal of the head-tail information direction difference plus the value 1 as the return consistency factor F_ret; and multiplying the information direction consistency factor, the diffusion consistency factor and the return consistency factor to obtain the information propagation consistency value C = F_dir · F_diff · F_ret.
- 9. The visual recognition-based multi-target scene instance separation processing method according to claim 1, wherein the step of judging the fitting-result state of the real boundary curve of the target according to the boundary extraction qualification value and outputting the real boundary of the target comprises: comparing the boundary extraction qualification value corresponding to the real boundary curve of each target with a preset threshold; if the boundary extraction qualification value is not smaller than the preset threshold, the real boundary curve of the target is qualified, and the real boundary of the corresponding target is directly output.
- 10. The visual recognition-based multi-target scene instance separation processing method according to claim 1, wherein the step of judging the fitting-result state of the real boundary curve of the target according to the boundary extraction qualification value and outputting the real boundary of the target further comprises: if the boundary extraction qualification value is smaller than the preset threshold, the boundary curve of the target is unqualified; the boundary curve of the corresponding target is extracted again until it is qualified, and the real boundary of the corresponding target is output.
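The candidate-box generation of claim 2 ends with non-maximum suppression over the predicted rectangles. The claim does not fix an implementation, so the following is only a minimal greedy NMS sketch, assuming axis-aligned boxes given as (x1, y1, x2, y2) tuples and a hypothetical IoU threshold of 0.5; the names `iou` and `nms` are illustrative:

```python
def iou(a, b):
    # Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    # Greedy NMS: visit boxes by descending score, keep a box only if it
    # does not overlap an already-kept box above the IoU threshold.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```

For example, two heavily overlapping boxes collapse to the higher-scoring one, while a distant box survives: `nms([(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)], [0.9, 0.8, 0.7])` returns `[0, 2]`.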
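The boundary topology stability value of claims 5 and 6 can be sketched as follows. This is a hedged illustration only: the claims do not specify how per-point angles or curvatures are computed, so the sketch assumes the tangent angle of the segment to the next sampled point and uses the turning angle as a curvature proxy (no angle unwrapping, for simplicity); `topology_stability` is a hypothetical name:

```python
import math

def topology_stability(points):
    # points: ordered list of (x, y) boundary samples, treated as a closed curve.
    n = len(points)
    # Assumed per-point angle: direction of the segment to the next point.
    angles = []
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        angles.append(math.atan2(y1 - y0, x1 - x0))
    # Boundary adaptive change rate (claim 5): mean of
    # |angle difference| / Euclidean distance over adjacent pairs.
    ratios = [abs(angles[i + 1] - angles[i]) / math.dist(points[i], points[i + 1])
              for i in range(n - 1)]
    change_rate = sum(ratios) / len(ratios)
    # Boundary morphology distortion degree (claim 5): mean absolute
    # deviation of per-point curvature, here proxied by the turning angle.
    curvatures = [abs(angles[i] - angles[i - 1]) for i in range(1, n - 1)]
    mean_c = sum(curvatures) / len(curvatures)
    distortion = sum(abs(c - mean_c) for c in curvatures) / len(curvatures)
    # Claim 6: reciprocal of (change rate + distortion + 1).
    return 1.0 / (change_rate + distortion + 1.0)
```

By construction the value lies in (0, 1]: a smooth, evenly sampled boundary has a small change rate and small distortion, pushing the stability value toward 1.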
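Claims 7 and 8 multiply three factors into the information propagation consistency value. A sketch under stated assumptions: the local direction angle is taken from image gradients, and the diffusion tensor is assumed to be the gradient outer product g gᵀ (the claims leave its construction open); `propagation_consistency` is a hypothetical name:

```python
import numpy as np

def propagation_consistency(image, points):
    # image: 2-D grayscale array; points: ordered (row, col) boundary samples.
    gy, gx = np.gradient(image.astype(float))  # derivatives along rows (y) and cols (x)
    thetas, tensors = [], []
    for r, c in points:
        g = np.array([gx[r, c], gy[r, c]])
        thetas.append(np.arctan2(g[1], g[0]))  # local direction angle theta_i
        tensors.append(np.outer(g, g))         # assumed diffusion tensor T_i = g g^T
    # Information direction consistency factor: 1 / (1 + mean direction jump).
    jumps = [abs(thetas[i + 1] - thetas[i]) for i in range(len(thetas) - 1)]
    f_dir = 1.0 / (1.0 + sum(jumps) / len(jumps))
    # Diffusion consistency factor: 1 / (1 + mean Frobenius norm of
    # adjacent-tensor differences).
    amps = [np.linalg.norm(tensors[i + 1] - tensors[i]) for i in range(len(tensors) - 1)]
    f_diff = 1.0 / (1.0 + sum(amps) / len(amps))
    # Return consistency factor: 1 / (1 + head-to-tail direction difference).
    f_ret = 1.0 / (1.0 + abs(thetas[-1] - thetas[0]))
    return f_dir * f_diff * f_ret
```

On a uniform-gradient image the direction angles and tensors are identical at every sample, so all three factors equal 1 and the consistency value is exactly 1.0, its maximum.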
Description
Multi-target scene instance separation processing method based on visual recognition

Technical Field

The invention relates to the technical field of image processing, in particular to a multi-target scene instance separation processing method based on visual recognition.

Background

With the rapid development of computer vision and deep learning technology, object detection and image segmentation have become two core tasks in the field of visual recognition. Existing target detection methods, such as Fast R-CNN, SSD, YOLO and EAST, generally adopt a convolutional neural network to perform feature extraction on an input image, and predict rectangular or rotated rectangular bounding boxes on the feature map to represent the approximate positions of targets. These methods achieve good detection efficiency in conventional scenes and can rapidly locate multiple targets. However, in natural complex scenes, such as street-view photos, billboards, package printing, medical images and aerial maps, target shapes often exhibit curved, inclined, irregular or densely arranged features, so a conventional rectangular detection box cannot accurately describe the real boundary of the target. For example, textual information may be distributed along circular arcs, waves or free curves; industrial part boundaries may have relief variations; and structures in medical images often exhibit complex contours.
Under such circumstances, a rectangular or fixed-shape detection box can only provide a rough position and cannot accurately describe boundary details. Although the traditional approach of directly performing pixel-level segmentation can provide high-precision boundaries, its computational load is heavy and its inference speed is slow, making real-time requirements difficult to meet, especially on resource-limited devices. Moreover, traditional methods lack an effective mechanism for evaluating and correcting the quality of the boundary separated for each target scene instance: they cannot judge whether a boundary result is qualified and cannot automatically refine and optimize the boundary, so the separation effect is poor, the recognition precision is low, and a large amount of computing resources is wasted.

Disclosure of Invention

The invention aims to solve the above problems and provides a multi-target scene instance separation processing method based on visual recognition.
In a first aspect of the present invention, a multi-target scene instance separation processing method based on visual recognition is provided, the method comprising the following steps: extracting features of an input image and generating a plurality of rectangular candidate bounding boxes through a target detection algorithm based on a deep neural network, wherein the rectangular candidate bounding boxes represent the positions of targets in the image, and generating a set of rectangular bounding boxes; extracting a local-area image from each rectangular bounding box and obtaining the real boundary curve of the corresponding target by boundary fitting of the local-area image; analyzing the fitting-result features of the real boundary curve of the target and calculating a boundary extraction qualification value; and judging the fitting-result state of the real boundary curve of the target according to the boundary extraction qualification value, and outputting the real boundary of the target.
Optionally, the step of generating a plurality of rectangular candidate bounding boxes through a deep neural network-based target detection algorithm comprises: performing feature extraction on the input image through a convolutional neural network to generate a multi-scale feature map; generating anchor points by sliding windows over the feature map, and predicting, through target classification and boundary regression, the target existence probability of each anchor point and the boundary regression offset relative to the anchor point; calculating the rectangular bounding box corresponding to each anchor point from the target existence classification probability and the boundary regression offset, performing non-maximum suppression on the rectangular bounding boxes to remove duplicate boxes and retain high-confidence boxes; and outputting the final set of rectangular candidate bounding boxes. Optionally, the step of extracting a local-area image from each rectangular bounding box and obtaining the real boundary curve of a potential target by boundary fitting of the local-area image includes: extracting a local-area image from each rectangular bounding box, and performing feature extraction on the local-area image through a convolutional neural network to generate a local feature map; executing, on the extracted feature map, pixel aggregation and separation operations of i