CN-115690624-B - Aerial image target inclined frame detection method and device

Abstract

The application discloses a method and a device for detecting inclined target frames in aerial images, and belongs to the technical field of target detection. The method comprises: extracting a first feature map of an aerial image by using a backbone network, and obtaining a second feature map of the first feature map by using a feature pyramid network; generating a horizontal candidate frame of the second feature map by using a region generation network, and extracting a horizontal feature map of the horizontal candidate frame by using a feature extractor; obtaining a first rotating rectangular frame from the horizontal feature map by using a rotating rectangular frame learner, and setting the first rotating rectangular frame as an initial rotating rectangular frame; and iteratively executing a rotating rectangular frame updating step for a preset number of iterations. After each update, it is judged whether the current iteration count has reached the preset number of iterations: if not, a higher threshold value is set, the second rotating rectangular frame is set as the initial rotating rectangular frame, and the updating step is executed again; otherwise, the current second rotating rectangular frame is taken as the final rotating rectangular frame.

Inventors

JING XIAOYU
YIN QIDONG
LIU WEI
HUO ZHIHANG
WANG JIANGAN

Assignees

土豆数据科技集团有限公司

Dates

Publication Date: 20260505
Application Date: 20221028

Claims (8)

1. An aerial image target inclined frame detection method, characterized by comprising the following steps: extracting a first feature map of an acquired aerial image by using a backbone network, and acquiring a second feature map of the first feature map by using a feature pyramid network; generating a horizontal candidate frame of the second feature map by using a region generation network, and extracting a horizontal feature map of the horizontal candidate frame by using a feature extractor; obtaining a first rotating rectangular frame from the horizontal feature map by using a rotating rectangular frame learner, and setting the first rotating rectangular frame as an initial rotating rectangular frame, wherein obtaining the first rotating rectangular frame from the horizontal feature map by using the rotating rectangular frame learner comprises: classifying the horizontal candidate frames, taking a sample with a threshold value larger than 0.5 as a positive sample and a sample with a threshold value smaller than 0.5 as a negative sample, wherein the threshold value is the intersection-over-union of the horizontal candidate frame and the circumscribed rectangle of a true-value inclined frame; predicting the rotating rectangular frame by regression using the rotating rectangular frame learner trained with a second number of the positive samples and the negative samples; outputting 5 regression parameters in the regression process, comprising center point regression parameters, length and width regression parameters and an angle regression parameter; and obtaining the corresponding coordinates in the aerial image by inverse transformation from the horizontal candidate frame and the regression parameters; iteratively executing a rotating rectangular frame updating step for a preset number of iterations, wherein the rotating rectangular frame updating step comprises: acquiring a first rotating feature map of the initial rotating rectangular frame by using a rotating feature extractor; extracting a second rotating rectangular frame from the first rotating feature map by using a rotating target detection head; and outputting 5 regression parameters in the regression process, comprising center point regression parameters, length and width regression parameters and an angle regression parameter, wherein the first rotating rectangular frames are classified, a sample with a threshold value larger than 0.6 is taken as a positive sample and a sample with a threshold value smaller than 0.6 is taken as a negative sample, the threshold value being the intersection-over-union of the first rotating rectangular frame and a true-value inclined frame; and when the preset number of iterations has not been reached, setting a higher threshold value and setting the second rotating rectangular frame as the initial rotating rectangular frame; otherwise, taking the current second rotating rectangular frame as the final rotating rectangular frame.
2. The method of claim 1, wherein the dimensions of the acquired aerial image are h×w×c; the dimensions of the first feature map obtained after feature extraction are […]; and the second feature map comprises 4 dimensions, respectively […], […], […] and […].
3. The method according to claim 1 or 2, wherein generating the horizontal candidate frame of the second feature map by using the region generation network comprises: generating, by using the region generation network, 9 preset anchor frames with different areas and aspect ratios at each anchor point in the second feature map; classifying the preset anchor frames, taking a sample with a threshold value larger than 0.7 as a positive sample and a sample with a threshold value smaller than 0.3 as a negative sample, wherein the threshold value is the intersection-over-union of a true-value horizontal frame and the predicted value of the regression parameters; predicting anchor frame coordinate offsets by using the region generation network trained with a first number of the positive samples and the negative samples; and calculating corrected anchor frame coordinates according to the true-value horizontal frame and the predicted anchor frame coordinate offsets, so as to obtain the horizontal candidate frame of the second feature map.
4. The method of claim 1, wherein the backbone network is a swin-transformer backbone network.
5. The method of claim 1, wherein the preset number of iterations is no more than 3.
6. An aerial image target inclined frame detection device, characterized by comprising: a feature acquisition module, used for extracting a first feature map of an acquired aerial image by using a backbone network and acquiring a second feature map of the first feature map by using a feature pyramid network; a feature extraction module, used for generating a horizontal candidate frame of the second feature map by using a region generation network and extracting a horizontal feature map of the horizontal candidate frame by using a feature extractor; an initialization module, used for obtaining a first rotating rectangular frame from the horizontal feature map by using a rotating rectangular frame learner and setting the first rotating rectangular frame as an initial rotating rectangular frame, wherein obtaining the first rotating rectangular frame from the horizontal feature map by using the rotating rectangular frame learner comprises: classifying the horizontal candidate frames, taking a sample with a threshold value larger than 0.5 as a positive sample and a sample with a threshold value smaller than 0.5 as a negative sample, wherein the threshold value is the intersection-over-union of the horizontal candidate frame and the circumscribed rectangle of a true-value inclined frame; predicting the rotating rectangular frame by regression using the rotating rectangular frame learner trained with a second number of the positive samples and the negative samples; outputting 5 regression parameters in the regression process, comprising center point regression parameters, length and width regression parameters and an angle regression parameter; and obtaining the corresponding coordinates in the aerial image by inverse transformation from the horizontal candidate frame and the regression parameters; an iteration module, used for iteratively executing a rotating rectangular frame updating step for a preset number of iterations, wherein the rotating rectangular frame updating step comprises: acquiring a first rotating feature map of the initial rotating rectangular frame by using a rotating feature extractor; extracting a second rotating rectangular frame from the first rotating feature map by using a rotating target detection head; and outputting, from the first rotating feature map by using the rotating target detection head, 5 regression parameters comprising center point regression parameters, length and width regression parameters and an angle regression parameter, wherein the first rotating rectangular frames are classified, a sample with a threshold value larger than 0.6 is taken as a positive sample and a sample with a threshold value smaller than 0.6 is taken as a negative sample, the threshold value being the intersection-over-union of the first rotating rectangular frame and a true-value inclined frame; and a regression module, used for executing a preset judging step, wherein the preset judging step comprises: judging whether the preset number of iterations has been reached; if the current iteration count has not reached the preset number of iterations, setting a higher threshold value, setting the second rotating rectangular frame as the initial rotating rectangular frame, and executing the rotating rectangular frame updating step; otherwise, taking the current second rotating rectangular frame as the final rotating rectangular frame.
7. An apparatus, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor, when executing the executable instructions, implements the method of any one of claims 1 to 5.
8. A non-transitory computer-readable storage medium storing a computer program or instructions which, when executed, cause the method of any one of claims 1 to 5 to be implemented.
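
The sample classification and the 5-parameter regression recited in claims 1 and 3 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the box layouts, the "ignored" label for samples that fall between the two thresholds, and the center-offset / log-scale encoding of the regression parameters are all assumptions made for illustration, since the claims state only the thresholds and that the coordinates are recovered by inverse transformation from the horizontal candidate frame and the regression parameters.

```python
import math
from typing import Tuple

HBox = Tuple[float, float, float, float]         # (x1, y1, x2, y2), axis-aligned
RBox = Tuple[float, float, float, float, float]  # (cx, cy, w, h, angle in radians)


def iou_hbox(a: HBox, b: HBox) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_by_iou(iou: float, pos_thr: float, neg_thr: float) -> int:
    """Return 1 (positive) or 0 (negative); -1 means 'ignored' (an assumption)."""
    if iou > pos_thr:
        return 1
    if iou < neg_thr:
        return 0
    return -1


def decode_rotated_box(hbox: HBox, deltas: RBox) -> RBox:
    """Map 5 regression outputs back to image coordinates.

    Assumed encoding (not stated in the claims): center offsets are scaled by
    the candidate's width and height, sizes are log-scaled, and the angle is
    regressed directly.
    """
    x1, y1, x2, y2 = hbox
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh, dtheta = deltas
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh), dtheta)
```

With the thresholds of claim 3, `label_by_iou(iou, 0.7, 0.3)` leaves anchors whose intersection-over-union lies between 0.3 and 0.7 unlabelled; with the single 0.5 threshold of claim 1, `label_by_iou(iou, 0.5, 0.5)` leaves only an exact tie unlabelled.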

Description

Aerial image target inclined frame detection method and device

Technical Field

The application relates to the technical field of target detection, in particular to a method and a device for detecting an aerial image target inclined frame.

Background

Target detection, also called target extraction, is one of the research directions in the field of computer vision, and has wide application value in fields such as face recognition, intelligent traffic, automatic driving and medical treatment. How to extract information that a computer can understand from an image is a core problem to be solved by computer vision. Particularly in complex scenes, where multiple targets must be processed in real time, automatic extraction and identification of the targets is important. In existing target detection tasks, image features are learned automatically by a deep learning network model, and the detection accuracy of the model is then improved by manual correction. However, in practical applications the detection scenes are varied, target objects that are not oriented horizontally or vertically often have to be detected, and correcting detection frames manually is time-consuming and laborious. The mainstream algorithms currently in use are the one-stage YOLO series and the two-stage Faster-RCNN series.
The YOLO series algorithms predict from the information of the whole image and complete the classification and positioning of all targets directly on the image without a complex framework; their detection speed is high, but their detection accuracy is low. The Faster-RCNN series algorithms first screen a large number of target candidate regions in the image to be detected and then detect within those candidate regions; their detection accuracy is high, but they are slow, which makes real-time detection of target objects difficult.

Disclosure of Invention

By providing a method for detecting an aerial image target inclined frame, the embodiments of the application address the bottleneck encountered in the prior art in improving the accuracy of target inclined frame detection, and apply the cascade idea to target detection, thereby improving the accuracy of target inclined frame detection.
In a first aspect, an embodiment of the present application provides a method for detecting an aerial image target inclined frame, where the method includes: obtaining a first feature map of an aerial image through a backbone network, and obtaining a second feature map of the first feature map through a feature pyramid network; generating a horizontal candidate frame of the second feature map through a region generation network, and extracting a horizontal feature map of the horizontal candidate frame through a feature extractor; obtaining a first rotating rectangular frame from the horizontal feature map through a rotating rectangular frame learner, and setting the first rotating rectangular frame as an initial rotating rectangular frame; and iteratively executing a rotating rectangular frame updating step for a preset number of iterations, wherein the rotating rectangular frame updating step comprises obtaining a first rotating feature map of the initial rotating rectangular frame through a rotating feature extractor and extracting a second rotating rectangular frame from the first rotating feature map through a rotating target detection head. When the preset number of iterations has not been reached, a higher threshold value is set and the second rotating rectangular frame is set as the initial rotating rectangular frame; otherwise, the current second rotating rectangular frame is taken as the final rotating rectangular frame.
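
The control flow of the iterative updating step described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: `cascade_refine` and `update_step` are hypothetical names, `update_step` stands in for the rotating feature extractor followed by the rotating target detection head, and the fixed increment of 0.1 per iteration is an assumption (the text says only that a higher threshold value is set; the starting value of 0.6 is taken from claim 1). The threshold governs how positive and negative samples are assigned, so it is passed to the update step rather than used to filter the result.

```python
from typing import Callable, List, Tuple

RBox = Tuple[float, float, float, float, float]  # (cx, cy, w, h, angle)


def cascade_refine(
    first_box: RBox,
    update_step: Callable[[RBox, float], RBox],
    num_iters: int = 3,
    start_thr: float = 0.6,
    thr_step: float = 0.1,  # assumed increment; the source only says "higher"
) -> Tuple[RBox, List[float]]:
    """Run the update step num_iters times with a rising threshold.

    Returns the final rotating rectangular frame and the thresholds used.
    """
    if num_iters < 1:
        raise ValueError("num_iters must be at least 1")
    initial_box = first_box  # the first rotating frame seeds the loop
    thresholds: List[float] = []
    for it in range(num_iters):
        thr = start_thr + thr_step * it  # higher threshold on each pass
        thresholds.append(thr)
        second_box = update_step(initial_box, thr)
        if it + 1 < num_iters:
            # Preset count not reached: feed the result back as the new input.
            initial_box = second_box
    # Preset count reached: the current second frame is the final frame.
    return second_box, thresholds
```

For example, with a toy `update_step` that halves the angle error on each pass, three iterations reduce an initial angle of 0.8 to 0.1 while the thresholds rise from 0.6 to about 0.8.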
With reference to the first aspect, in one possible implementation manner, the dimensions of the acquired aerial image are h×w×c, and the dimensions of the first feature map obtained after feature extraction are […]. The second feature map comprises 4 dimensions, respectively […], […], […] and […].
With reference to the first aspect, in a possible implementation manner, generating the horizontal candidate frame of the second feature map by using the region generation network includes: generating, by using the region generation network, 9 preset anchor frames with different areas and aspect ratios at each anchor point in the second feature map; classifying the preset anchor frames, taking samples with a threshold value larger than 0.7 as positive samples and samples with a threshold value smaller than 0.3 as negative samples, wherein the threshold value is the intersection-over-union of a true-value horizontal frame and the predicted value of the regression parameters; predicting anchor frame coordinate offsets by using the region generation network trained with a first number of the positive samp