CN-121767823-B - Image target identification and region discrimination method and device
Abstract
The invention discloses an image target identification and region discrimination method and device, relating to the technical field of image pattern recognition and classification discrimination. The method comprises: obtaining a preset identification discrimination model comprising a feature generation module and a dual-classifier output module; preprocessing a source domain image (carrying target-class pixel-level ground-truth labels) and a target domain image; inputting the preprocessed images into the feature generation module to obtain spliced prediction features and a cross-domain alignment constraint loss; inputting the spliced prediction features into the dual-classifier output module to obtain cross-domain classification prediction results; performing multi-stage training on the preset model based on the ground-truth labels, the alignment constraint loss and the prediction results to obtain a target region identification and region discrimination model with cross-scene generalization capability; and taking an image to be identified as input to output a target region identification label map. The invention can effectively compensate for deficiencies in fine-grained feature alignment, optimize the output-space difference measurement logic, and improve training stability and cross-domain adaptation capability.
Inventors
- Niu Xiaonan
- Liu Jian
- Zong Leli
- Sun Yanwei
- Wang Shangxiao
- Zhou Mo
- Xiao Shengjuan
- Zhang Yiwu
- Tang Zhimin
- Liu Hongying
Assignees
- Nanjing Center of China Geological Survey (East China Geological Science and Technology Innovation Center) [中国地质调查局南京地质调查中心(华东地质科技创新中心)]
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-03-03
Claims (6)
- 1. An image target identification and region discrimination method, characterized by comprising the following steps: acquiring a preset identification discrimination model, wherein the preset identification discrimination model comprises a feature generation module and a dual classifier output module; preprocessing a source domain image carrying target-class pixel-level ground-truth labels and a target domain image, inputting them into the feature generation module, and outputting spliced prediction features and a cross-domain alignment constraint loss; inputting the spliced prediction features into the dual classifier output module for classification prediction to obtain a cross-domain classification prediction result; performing multi-stage training on the preset identification discrimination model using the target-class pixel-level ground-truth labels, the cross-domain alignment constraint loss and the cross-domain classification prediction result to obtain a target region identification and region discrimination model; and inputting an image to be identified into the target region identification and region discrimination model for target identification and region discrimination, and outputting a target region identification label map. The feature generation module comprises a residual network unit, an average pooling pyramid unit, a shift grid alignment module and a detail gating unit, and the cross-domain alignment constraint loss comprises a displacement regularization term and a feature-level alignment loss; the step of preprocessing and feature generation comprises: preprocessing the source domain image carrying target-class pixel-level ground-truth labels and the target domain image to obtain a preprocessed source domain image and a preprocessed target domain image; inputting the preprocessed source domain image and the preprocessed target domain image into the residual network unit to extract multi-layer residual features, obtaining a first residual feature, a second residual feature and a third residual feature; inputting the third residual feature into the average pooling pyramid unit for pooling to obtain a semantic feature; inputting the first residual feature and the second residual feature respectively into the detail gating unit for a detail enhancement operation to obtain a first enhanced residual feature and a second enhanced residual feature; inputting the first enhanced residual feature, the second enhanced residual feature and the semantic feature into the shift grid alignment module for alignment and regularization calculation to obtain a plurality of aligned features and the displacement regularization term; splicing the aligned features to obtain the spliced prediction features; aggregating the first residual feature, the second residual feature, the third residual feature and the semantic feature to obtain a source domain alignment feature and a target domain alignment feature; and performing distribution matching on the source domain alignment feature and the target domain alignment feature to obtain the feature-level alignment loss. The detail enhancement operation specifically comprises: extracting image gradient features from the source domain image and the target domain image; splicing the feature to be enhanced with the image gradient features, wherein the feature to be enhanced is the first residual feature or the second residual feature; performing batch normalization on the splicing result to obtain a first normalized splicing feature; performing a convolution operation on the first normalized splicing feature to obtain a convolution-processed feature; performing batch normalization on the convolution-processed feature to obtain a second normalized splicing feature; performing Sigmoid activation on the second normalized splicing feature to obtain a gating coefficient map; performing gating enhancement using the gating coefficient map and the feature to be enhanced to obtain a gating-enhanced feature; and performing ReLU activation on the gating-enhanced feature to obtain an enhanced residual feature, wherein the enhanced residual feature is the first enhanced residual feature or the second enhanced residual feature. The dual classifier output module comprises a first task classifier and a second task classifier, and the cross-domain classification prediction result comprises a source domain prediction result and a target domain prediction result; the classification prediction step comprises: performing feature transformation on the spliced prediction features to obtain transformed features; performing a predictive convolution operation on the transformed features to obtain a category prediction map; and inputting the category prediction map respectively into the first task classifier and the second task classifier to output the corresponding source domain prediction result and target domain prediction result. The multi-stage training step comprises: constructing a first loss function for training stage A, a second loss function for training stage B and a third loss function for training stage C using the target-class pixel-level ground-truth labels, the cross-domain alignment constraint loss and the cross-domain classification prediction result; and cyclically executing training stage A, training stage B and training stage C with the first, second and third loss functions as objectives until the number of model iterations reaches a preset iteration threshold or the average fluctuation amplitude of the three loss functions over consecutive preset rounds is smaller than a preset loss fluctuation threshold, thereby obtaining the target region identification and region discrimination model; wherein training stage A simultaneously updates the parameters of the feature generation module, the first task classifier and the second task classifier; training stage B freezes the parameters of the feature generation module and updates only the parameters of the first task classifier and the second task classifier; and training stage C freezes the parameters of the first task classifier and the second task classifier and updates only the parameters of the feature generation module.
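The detail enhancement operation of claim 1 (concatenate the feature with image gradient features, batch-normalize, convolve, batch-normalize, Sigmoid-gate, then ReLU) can be illustrated per pixel. In this minimal pure-Python sketch, the two batch normalizations and the convolution are collapsed into a single hypothetical affine map `(weight, bias)`; only the gating-and-ReLU structure is faithful to the claim.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def detail_gate(feature, gradient, weight=1.0, bias=0.0):
    # Per-pixel sketch: batch norm + convolution over the concatenated
    # (feature, gradient) pair are collapsed into one affine map
    # (weight, bias) -- an illustrative simplification, not the claim's
    # actual convolution.
    out = []
    for f, g in zip(feature, gradient):
        gate = sigmoid(weight * (f + g) + bias)  # gating coefficient map
        out.append(max(0.0, f * gate))           # gated enhancement + ReLU
    return out
```

Because of the final ReLU, every enhanced residual value is non-negative regardless of the input feature's sign.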
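The A/B/C schedule in claim 1 amounts to alternating which parameter groups receive optimizer updates while the others stay frozen. The toy sketch below uses hypothetical scalar "parameters" and a fixed step size standing in for a real optimizer; it only demonstrates the freeze/unfreeze cycling, not actual gradient descent.

```python
def run_stage(params, trainable_keys, step):
    # Apply one stand-in "gradient step" only to the unfrozen groups;
    # frozen groups are left untouched.
    for k in trainable_keys:
        params[k] -= step
    return params

def multi_stage_training(params, max_iters=9, step=0.1):
    # Cycle stages A, B, C as in claim 1:
    #   A: update feature generator + both task classifiers
    #   B: freeze the generator, update only the classifiers
    #   C: freeze the classifiers, update only the generator
    schedule = [
        ("A", ["generator", "clf1", "clf2"]),
        ("B", ["clf1", "clf2"]),
        ("C", ["generator"]),
    ]
    for it in range(max_iters):
        _, keys = schedule[it % 3]
        params = run_stage(params, keys, step)
    return params

trained = multi_stage_training({"generator": 1.0, "clf1": 1.0, "clf2": 1.0})
```

Over one full A-B-C cycle each group is updated exactly twice, so the generator and the classifiers advance at the same average rate while still being decoupled within each stage.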
- 2. The image target identification and region discrimination method according to claim 1, wherein the alignment and regularization calculation is specifically: constructing a displacement grid for a feature to be aligned, wherein the feature to be aligned is the first enhanced residual feature, the second enhanced residual feature or the semantic feature; adding a learnable global displacement to the displacement grid for displacement learning to obtain an offset grid; performing a convolution operation on the offset grid to generate a local displacement field; performing differentiable sampling on the feature to be aligned based on the local displacement field to obtain the plurality of aligned features; and calculating the sum of squares of the learnable global displacement and the local displacement field to obtain the displacement regularization term.
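The displacement regularization term of claim 2 is simply the sum of squares of the learnable global displacement and the local displacement field, penalizing large warps during grid alignment. A minimal list-based sketch (the vector/grid shapes are illustrative assumptions):

```python
def displacement_regularizer(global_shift, local_field):
    # Sum of squares of the learnable global displacement (a short
    # vector) and the local displacement field (a grid of per-location
    # offsets); larger warps incur a larger penalty.
    reg = sum(s * s for s in global_shift)
    reg += sum(d * d for row in local_field for d in row)
    return reg
```

In a real implementation both inputs would be learnable tensors and this term would be added to the stage-B loss so that alignment cannot degenerate into arbitrary warping.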
- 3. The image target identification and region discrimination method according to claim 1, wherein performing distribution matching on the source domain alignment feature and the target domain alignment feature to obtain the feature-level alignment loss comprises: performing distribution mapping on the source domain alignment feature and the target domain alignment feature to obtain a source domain mapping feature and a target domain mapping feature; extracting point sets respectively from the source domain mapping feature and the target domain mapping feature to obtain a source domain point set and a target domain point set; and calculating the optimal transport distance between the source domain point set and the target domain point set using the Sinkhorn algorithm to obtain the feature-level alignment loss.
- 4. The image target identification and region discrimination method according to claim 1, wherein constructing the first loss function of training stage A, the second loss function of training stage B and the third loss function of training stage C using the target-class pixel-level ground-truth labels, the cross-domain alignment constraint loss and the cross-domain classification prediction result comprises: calculating a first source domain cross entropy loss by comparing the source domain prediction result output by the first task classifier with the target-class pixel-level ground-truth labels, calculating a second source domain cross entropy loss by comparing the source domain prediction result output by the second task classifier with the target-class pixel-level ground-truth labels, and summing the first and second source domain cross entropy losses to obtain the first loss function; calculating a third source domain cross entropy loss by comparing the source domain prediction result output by the first task classifier with the target-class pixel-level ground-truth labels, calculating a fourth source domain cross entropy loss by comparing the source domain prediction result output by the second task classifier with the target-class pixel-level ground-truth labels, calculating the Euclidean distance between the target domain prediction results output by the first and second task classifiers to construct an exponential-form target domain output difference term, and summing the third source domain cross entropy loss, the fourth source domain cross entropy loss, the displacement regularization term and the target domain output difference term to obtain the second loss function; and calculating the Manhattan distance between the target domain prediction result output by the first task classifier and the target domain prediction result output by the second task classifier to obtain the third loss function.
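The three stage losses of claim 4 can be sketched in pure Python for a single pixel. The function names are illustrative, and the exact exponential form `exp(-d)` of the target-domain output difference term is an assumption (the claim only states "exponential form"), not the patent's specification.

```python
import math

def cross_entropy(pred, label_idx):
    # Pixel-wise cross entropy for one softmax prediction vector.
    return -math.log(pred[label_idx])

def stage_a_loss(src_pred1, src_pred2, label):
    # First loss function: sum of the two source-domain cross entropies.
    return cross_entropy(src_pred1, label) + cross_entropy(src_pred2, label)

def stage_b_loss(src_pred1, src_pred2, label, tgt_pred1, tgt_pred2, shift_reg):
    # Second loss function: two source cross entropies, the displacement
    # regularization term, and an exponential-form target-domain output
    # difference built from the Euclidean distance between the two
    # classifiers' target predictions (exp(-d) is an assumed form).
    ce = cross_entropy(src_pred1, label) + cross_entropy(src_pred2, label)
    d = math.sqrt(sum((p - q) ** 2 for p, q in zip(tgt_pred1, tgt_pred2)))
    return ce + shift_reg + math.exp(-d)

def stage_c_loss(tgt_pred1, tgt_pred2):
    # Third loss function: Manhattan (L1) distance between the two
    # classifiers' target-domain predictions.
    return sum(abs(p - q) for p, q in zip(tgt_pred1, tgt_pred2))
```

Note how the exponential term in stage B shrinks as the two classifiers disagree more on the target domain, while stage C directly measures that disagreement, matching the alternating push-pull structure of the training schedule.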
- 5. A domain-shift-oriented automatic image target region extraction system according to the method of any one of claims 1-4, comprising: an acquisition module for acquiring a preset identification discrimination model, the preset identification discrimination model comprising a feature generation module and a dual classifier output module; a preprocessing module for preprocessing a source domain image carrying target-class pixel-level ground-truth labels and a target domain image, inputting them into the feature generation module, and outputting spliced prediction features and a cross-domain alignment constraint loss; a classification prediction module for inputting the spliced prediction features into the dual classifier output module for classification prediction to obtain a cross-domain classification prediction result; an iterative training module for performing multi-stage training on the preset identification discrimination model using the target-class pixel-level ground-truth labels, the cross-domain alignment constraint loss and the cross-domain classification prediction result to obtain a target region identification and region discrimination model; and a target identification and region discrimination module for inputting an image to be identified into the target region identification and region discrimination model for target identification and region discrimination and outputting a target region identification label map.
- 6. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed, implements the image target identification and region discrimination method according to any one of claims 1 to 4.
Description
Image target identification and region discrimination method and device

Technical Field

The invention relates to the technical field of image pattern recognition and classification discrimination, and in particular to an image target identification and region discrimination method and device.

Background

With the development of high-resolution imaging and intelligent visual analysis technologies, image target recognition and region discrimination based on deep learning have been widely applied to tasks such as target region extraction, scene understanding and fine interpretation. In practical applications, image data often come from different imaging conditions, scene backgrounds or acquisition platforms, so that a significant statistical distribution difference (domain shift) exists between training data and test data, which in turn causes problems such as reduced precision and unstable boundaries when an identification discrimination model trained on a single data domain is applied to a new scene. Existing unsupervised domain adaptation methods generally improve cross-domain generalization by aligning the feature distributions or output distributions of the source and target domains, but they exhibit two deficiencies in fine-grained target extraction scenarios: on one hand, detail and boundary information is easily affected by noise and texture variation, leading to insufficient feature alignment; on the other hand, some output-space difference measurement mechanisms are unreasonably designed, so loss oscillation and even abnormal values may occur during training, which affects model convergence and the cross-domain adaptation effect.
Therefore, there is a need for an image target identification and region discrimination method and device that can maintain training stability under domain shift conditions and improve the accuracy of fine-grained target region discrimination.

Disclosure of Invention

The invention provides an image target identification and region discrimination method and device, which solve the technical problems of insufficient cross-scene discrimination accuracy and insufficient training stability caused by insufficient feature alignment and an unstable output-space difference measurement mechanism when existing unsupervised domain adaptation methods perform fine-grained target region identification and discrimination under domain shift. The image target identification and region discrimination method provided by the first aspect of the invention comprises the following steps: acquiring a preset identification discrimination model, wherein the preset identification discrimination model comprises a feature generation module and a dual classifier output module; preprocessing a source domain image carrying target-class pixel-level ground-truth labels and a target domain image, inputting them into the feature generation module, and outputting spliced prediction features and a cross-domain alignment constraint loss; inputting the spliced prediction features into the dual classifier output module for classification prediction to obtain a cross-domain classification prediction result; performing multi-stage training on the preset identification discrimination model using the target-class pixel-level ground-truth labels, the cross-domain alignment constraint loss and the cross-domain classification prediction result to obtain a target region identification and region discrimination model; and inputting an image to be identified into the target region identification and region discrimination model for target identification and region discrimination, and outputting a target region identification label map. Optionally, the feature generation module includes a residual network unit, an average pooling pyramid unit, a shift grid alignment module and a detail gating unit, and the cross-domain alignment constraint loss includes a displacement regularization term and a feature-level alignment loss; the step of preprocessing the source domain image carrying target-class pixel-level ground-truth labels and the target domain image, inputting them into the feature generation module, and outputting the spliced prediction features and cross-domain alignment constraint loss includes: preprocessing the source domain image carrying target-class pixel-level ground-truth labels and the target domain image to obtain a preprocessed source domain image and a preprocessed target domain image; inputting the preprocessed source domain image and the preprocessed target domain image into the residual network unit to extract multi-layer residual features, obtaining a first residual feature, a second residual feature and a third residual feature; inputting the third residual feature into the average pooling pyra