CN-115761351-B - Target identification method and device and electronic equipment
Abstract
The invention provides a target identification method, a target identification device, and electronic equipment for improving the accuracy of target identification. The method comprises: inputting a training image into a model to be trained and determining the confidence that a first target in the training image belongs to each possible type; determining the maximum of those confidences as the first confidence and marking the possible type corresponding to the first confidence as the first possible type; determining a confidence lower than the first confidence as the second confidence and marking the possible type corresponding to the second confidence as the second possible type; in response to the first possible type being consistent with a first preset type label of the first target, adjusting parameters in the model to be trained using a first loss function until the value of the first loss function is smaller than a first preset threshold, thereby obtaining a classification model; and inputting an image to be detected into the classification model and determining the target type of the target in the image to be detected.
Inventors
- SHEN KONGHUAI
- SHAO MING
Assignees
- 浙江大华技术股份有限公司 (Zhejiang Dahua Technology Co., Ltd.)
Dates
- Publication Date
- 20260508
- Application Date
- 20221123
Claims (12)
- 1. A method of target recognition, comprising: inputting a training image into a model to be trained, and determining the confidence that a first target in the training image belongs to each possible type, wherein the training image comprises a one-to-one correspondence between the first target and a first preset type label, and the first preset type label is used to indicate a first preset type; determining the highest confidence as a first confidence and marking the possible type corresponding to the first confidence as a first possible type, and determining a confidence lower than the first confidence as a second confidence and marking the possible type corresponding to the second confidence as a second possible type; in response to the first possible type being consistent with the first preset type label of the first target, adjusting parameters in the model to be trained using a first loss function so as to reduce the gap between the attribute elements of the first possible type and the attribute elements of the first preset type, and to increase the gap between the attribute elements of the first possible type and the attribute elements of the second possible type, until the value of the first loss function is smaller than a first preset threshold, thereby obtaining a classification model; and inputting an image to be detected into the classification model, and determining the target type of the target in the image to be detected.
- 2. The method of claim 1, wherein the first loss function is L = L_cls + γ·L_confuse, where L_cls is a classification loss function corresponding to the gap between the attribute elements of the first possible type and the attribute elements of the first preset type, L_confuse is a loss function corresponding to the gap between the attribute elements of the first possible type and the attribute elements of the second possible type, γ is a preset hyperparameter, ε is a preset constant, PT_j is an attribute element of the first possible type, PT_i is an attribute element of the second possible type, and d(PT_i, PT_j) is the distance between PT_i and PT_j in the feature space.
- 3. The method of claim 1, further comprising, prior to adjusting the parameters in the model to be trained using the first loss function: updating the attribute elements of the first preset type with the attribute elements corresponding to the first possible type to obtain updated attribute elements of the first preset type.
- 4. The method of claim 3, wherein updating the attribute elements of the first preset type with the attribute elements corresponding to the first possible type to obtain updated attribute elements of the first preset type comprises: updating the attribute elements of the first preset type using an update formula to obtain the updated attribute elements of the first preset type, the update formula being PT_j(t) = α·PT_j(t−1) + (1−α)·score_j·T_j(t), where α is a preset reconciliation factor, t is the number of iterations, j is the mark of the first possible type, PT_j(t) is the updated attribute element of the first preset type, PT_j(t−1) is the attribute element of the first preset type before the update, T_j(t) is the hidden-layer input corresponding to the updated attribute element of the first preset type, and score_j is the first confidence.
- 5. The method of any one of claims 1-4, further comprising, after determining whether the first possible type corresponding to the first confidence is consistent with the first preset type label of the first target: if it is not consistent, adjusting parameters in the model to be trained using a second loss function so as to reduce the gap between the attribute elements of the first possible type and the attribute elements of the first preset type, until the value of the second loss function is smaller than the first preset threshold, thereby obtaining the classification model.
- 6. The method of claim 5, wherein the second loss function is L_cls.
- 7. The method according to any one of claims 1-4 and 6, wherein inputting the image to be detected into the classification model and determining the target type of the target in the image to be detected comprises: inputting the image to be detected into the target classification model to obtain N sizes of image blocks, wherein the image blocks have the same shape and different sizes, the image blocks of each size independently tile the image to be detected, and N is an integer greater than 1; determining the size of a target image block for feature fusion according to the distance between the target image block and the minimum-size image block, and performing feature fusion with the minimum-size image block as the processing unit based on self-attention calculation to obtain a feature matrix of the minimum-size image block and a feature matrix of the image to be detected, wherein the size of the target image block increases as the distance increases; and identifying the target type of the target in the image to be detected based on the feature matrix of the image to be detected.
- 8. The method of claim 7, wherein determining the size of the target image block for feature fusion according to the distance between the target image block and the minimum-size image block, and performing feature fusion with the minimum-size image block as the processing unit based on self-attention calculation to obtain the feature matrix of the minimum-size image block and the feature matrix of the image to be detected, comprises: dividing the image to be detected into M areas comprising the minimum-size image blocks, wherein the M areas are in a nested relation, each of the M areas has the same shape as the minimum-size image block, M is an integer greater than 1, and M ≤ N; determining, based on the distances between the M areas and the minimum-size image block, the size of the target image block for feature fusion between each of the M areas and the minimum-size image block, and the target image block corresponding to that size in each area; performing feature fusion between the minimum-size image block and the target image blocks in the M areas using the self-attention calculation to obtain the feature matrix of the minimum-size image block; and combining the feature matrices of the minimum-size image blocks to obtain the feature matrix of the image to be detected.
- 9. The method of claim 8, wherein M = N and N = 2, so that the N sizes of image blocks include a first image block and a second image block, the first image block has the same shape as the second image block, and any four first image blocks combined have the same size as one second image block; and determining the size of the target image block for feature fusion according to the distance between the target image block and the minimum-size image block, and performing feature fusion with the minimum-size image block as the processing unit based on self-attention calculation to obtain the feature matrix of the minimum-size image block, comprises: dividing the image to be detected into a first area and a second area that comprise a first image block to be processed, wherein the first area is nested in the second area and the size of the first area is not smaller than that of the second image block; determining that, in the first area, the size of the target image block for feature fusion with the first image block to be processed is the size of the first image block, and that the target image block of the first area is the first image block; determining that, in the second area, the size of the target image block for feature fusion with the first image block to be processed is the size of the second image block, and that the target image block of the second area is the second image block; and, based on the self-attention calculation, performing feature fusion between the first image block in the first area and the first image block to be processed, and between the second image block in the second area and the first image block to be processed, to obtain a feature matrix of the first image block to be processed.
- 10. An apparatus for target recognition, comprising: an input unit configured to input the training image into the model to be trained and determine the confidence that the first target in the training image belongs to each possible type, wherein the training image comprises a one-to-one correspondence between the first target and a first preset type label, and the first preset type label is used to indicate a first preset type; a marking unit configured to determine the maximum confidence among the possible types of the first target as a first confidence, mark the possible type corresponding to the first confidence as a first possible type, determine a confidence lower than the first confidence as a second confidence, and mark the possible type corresponding to the second confidence as a second possible type; an adjustment unit configured to, in response to the first possible type being consistent with the first preset type label of the first target, adjust parameters in the model to be trained using a first loss function so that the gap between the attribute elements of the first possible type and the attribute elements of the first preset type is reduced and the gap between the attribute elements of the first possible type and the attribute elements of the second possible type is increased, until the value of the first loss function is smaller than a first preset threshold, thereby obtaining a classification model; and a detection unit configured to input the image to be detected into the classification model and determine the target type of the target in the image to be detected.
- 11. A readable storage medium configured to store instructions that, when executed by a processor, cause an apparatus comprising the readable storage medium to perform the method of any one of claims 1-9.
- 12. An electronic device, comprising: a memory for storing a computer program; and a processor for executing the computer program stored on the memory to implement the method of any one of claims 1-9.
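The update formula recited in claim 4 is garbled in the source text. As an illustration only, the sketch below assumes a standard momentum-style (exponential moving average) update in which the reconciliation factor α blends the stored attribute element with the confidence-weighted hidden-layer input; the function name and the exact weighting are assumptions, not the patent's verbatim formula.

```python
def update_attribute_element(pt_prev, hidden_input, score, alpha=0.9):
    """Hypothetical sketch of the claim-4 attribute-element update.

    pt_prev      -- PT_j(t-1), the stored attribute element of the first preset type
    hidden_input -- T_j(t), the hidden-layer input for the matched type j
    score        -- score_j, the first (maximum) confidence
    alpha        -- preset reconciliation factor (assumed to act as a momentum term)
    Returns PT_j(t), the updated attribute element.
    """
    # Momentum blend: keep most of the old prototype, mix in the new
    # observation scaled by how confident the model was in it.
    return alpha * pt_prev + (1.0 - alpha) * score * hidden_input
```

With a high α the prototype changes slowly, so a single low-confidence sample cannot drag the stored attribute element far from its history.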
Description
Target identification method and device and electronic equipment
Technical Field
The present application relates to the field of image recognition technologies, and in particular to a method and an apparatus for target recognition, and an electronic device.
Background
At present, deep learning models in the field of image recognition have greatly improved the results of tasks such as classification and detection through learning and training on massive data. However, owing to factors such as the posture of the target, the imaged scene, the viewing angle, and occlusion, deep learning models are not ideal at distinguishing targets of different categories. Taking the distinction of vehicle types as an example, vehicles are classified into trucks, buses, cars, tricycles, and the like. Some business scenarios require a more detailed division; for example, trucks are further divided into small trucks, medium trucks, large trucks, and the like. When different sub-types must be identified within one large class of targets, the targets are clearly more easily affected by factors such as illumination, imaging angle, and imaging distance, and the accuracy of target type identification decreases. Therefore, the accuracy of target recognition in the prior art needs to be improved.
Disclosure of Invention
The invention provides a target identification method, a target identification device, and electronic equipment for improving the accuracy of target identification.
In a first aspect, an embodiment of the present application provides a method for identifying a target, including: inputting a training image into a model to be trained, and determining the confidence that a first target in the training image belongs to each possible type, wherein the training image comprises a one-to-one correspondence between the first target and a first preset type label, and the first preset type label is used to indicate a first preset type; determining the maximum confidence among the possible types of the first target as a first confidence and marking the possible type corresponding to the first confidence as a first possible type, and determining a confidence lower than the first confidence as a second confidence and marking the possible type corresponding to the second confidence as a second possible type; in response to the first possible type being consistent with the first preset type label of the first target, adjusting parameters in the model to be trained using a first loss function so as to reduce the gap between the attribute elements of the first possible type and the attribute elements of the first preset type, and to increase the gap between the attribute elements of the first possible type and the attribute elements of the second possible type, until the value of the first loss function is smaller than a first preset threshold, thereby obtaining a classification model; and inputting an image to be detected into the classification model, and determining the target type of the target in the image to be detected.
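The confidence-ranking step described above (the maximum confidence becomes the first confidence and its type the first possible type; a lower confidence becomes the second confidence and its type the second possible type) can be sketched as follows. The class names and scores are illustrative only.

```python
def rank_confidences(confidences):
    """Pick the first and second confidences from a dict of
    possible type -> confidence, as described in the first aspect.

    Returns (first_type, first_conf, second_type, second_conf).
    """
    # Sort possible types by confidence, highest first.
    ranked = sorted(confidences.items(), key=lambda kv: kv[1], reverse=True)
    (first_type, first_conf), (second_type, second_conf) = ranked[0], ranked[1]
    return first_type, first_conf, second_type, second_conf

# Illustrative scores for the vehicle-type example from the Background.
scores = {"small_truck": 0.62, "medium_truck": 0.30, "bus": 0.08}
first_t, first_c, second_t, second_c = rank_confidences(scores)
# The training step then checks whether first_t matches the first
# preset type label before choosing the first or second loss function.
```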
According to the embodiment of the application, during training of the model to be trained, the first loss function is used to reduce the gap between the first possible type and the first preset type, and at the same time the attribute elements of the two possible types with the highest confidence (namely the first possible type and the second possible type) are subjected to a distance penalty in the feature space, i.e., the gap between the attribute elements of the first possible type and the attribute elements of the second possible type is increased. After training, the model can therefore accurately separate the first possible type from the second possible type; that is, it can more accurately distinguish targets of different types that have similar characteristics, which effectively improves the accuracy of identifying the target type in the image to be detected. In one possible implementation, the first loss function includes a first sub-loss function and a second sub-loss function, and the second sub-loss function is inversely related to the distance between the attribute elements of the first possible type and the attribute elements of the second possible type in the feature space. In one possible implementation, the first loss function is L = L_cls + γ·L_confuse, where L_cls is a classification loss function corresponding to the gap between the attribute elements of the first possible type and the attribute elements of the first preset type, L_confuse is a loss function corresponding to the gap between the attribute elements of the first possible type and the attribute elements of the second possible type, γ is a preset hyperparameter, ε is a preset constant, PT_j is an attribute element of the first possible type, PT_i is an attribute element of the second possible type, and d(PT_i, PT_j) is the distance between PT_i and PT_j in the feature space.
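The combined loss L = L_cls + γ·L_confuse can be sketched as below. The source states only that L_confuse is inversely related to the feature-space distance d(PT_i, PT_j); the 1/(d + ε) form here is an assumption, as is the squared-error stand-in for the (unspecified) classification loss L_cls.

```python
import math

def first_loss(pt_first, pt_preset, pt_second, gamma=0.5, eps=1e-6):
    """Hedged sketch of the first loss function.

    pt_first  -- PT_j, attribute element of the first possible type
    pt_preset -- attribute element of the first preset type
    pt_second -- PT_i, attribute element of the second possible type
    gamma     -- preset hyperparameter weighting the confusion term
    eps       -- preset constant guarding against division by zero
    """
    # L_cls: gap between the first possible type and the first preset
    # type (squared Euclidean distance stands in for the real
    # classification loss, which the patent does not spell out).
    l_cls = sum((a - b) ** 2 for a, b in zip(pt_first, pt_preset))
    # d(PT_i, PT_j): feature-space distance between the two most
    # confident, mutually confusable types.
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(pt_first, pt_second)))
    # L_confuse grows as the confusable prototypes get closer, so
    # minimising it pushes their attribute elements apart.
    l_confuse = 1.0 / (d + eps)
    return l_cls + gamma * l_confuse
```

Minimising this loss simultaneously pulls the predicted type toward its label (the L_cls term) and repels it from the runner-up type (the L_confuse term), matching the two gap adjustments described in the first aspect.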