CN-115983345-B - Model searching method, device, equipment and storage medium

Abstract

The embodiment of the application discloses a method, an apparatus, a device, and a storage medium for model searching. The method comprises: obtaining a search space, wherein the search space comprises a convolution module and a vision transformer module, the vision transformer module comprises a self-attention sub-module for executing a self-attention algorithm, and the complexity of the self-attention algorithm is O(n); preprocessing annotated training images by the convolution module; performing image processing based on a target task on the preprocessed results by the vision transformer module; determining a first gradient of the search space based on a first loss between the image-processing results and the annotations of the training images; and performing a gradient update on the search space based on the first gradient to obtain a target search space. The target search space is used for searching a target model, and the target model is used for performing image processing based on the target task on a target image to be processed. The searched model therefore has linear complexity, which effectively alleviates the difficulty of deploying the model.

Inventors

  • OU JUNHONG

Assignees

  • 哲库科技(上海)有限公司 (Zeku Technology (Shanghai) Co., Ltd.)

Dates

Publication Date
2026-05-05
Application Date
2023-01-03

Claims (12)

  1. A method of model searching, the method comprising: obtaining a search space, wherein the search space comprises a convolution module and a vision transformer module, the vision transformer module comprises a self-attention sub-module for executing a self-attention algorithm, and the complexity of the self-attention algorithm is O(n); preprocessing an annotated training image by the convolution module; performing image processing based on a target task on the preprocessed result by the vision transformer module; and determining a first gradient of the search space based on a first loss between a result of the image processing and an annotation of the training image; wherein the search space serves as a teacher model, and the method further comprises: determining at least one sub-model in the search space as a student model; determining, for each student model, a third gradient of the student model based on a second loss between an image-processing result of the student model and an image-processing result of the teacher model; determining, for each student model, a fourth gradient of the student model based on a third loss between an image-processing result of the student model and the annotation of the training image; determining, if the first gradient is consistent in direction with the fourth gradient, a result of adding the third gradient to the fourth gradient as a second gradient of the student model, and determining, if the first gradient is not consistent in direction with the fourth gradient, the third gradient as the second gradient of the student model; and performing a gradient update on the teacher model based on the first gradient, and performing a gradient update on the corresponding student model based on the second gradient of each student model, to obtain a target search space, wherein the target search space is used for searching a target model, and the target model is used for performing image processing based on the target task on a target image to be processed.
  2. The method of claim 1, wherein each student model comprises a convolution sub-module obtained by sampling the convolution module and a vision transformer sub-module obtained by sampling the vision transformer module.
  3. The method according to claim 1 or 2, characterized in that the method further comprises: determining, for each fourth gradient, a cosine value between the first gradient and the fourth gradient; determining that the direction of the first gradient is consistent with the direction of the fourth gradient when the cosine value is greater than or equal to zero; and determining that the directions of the first gradient and the fourth gradient are inconsistent when the cosine value is smaller than zero.
  4. The method according to claim 1 or 2, wherein determining at least one sub-model in the search space as a student model comprises: obtaining a candidate parameter set, wherein the candidate parameter set comprises model parameters selectable by the convolution sub-module and model parameters selectable by the vision transformer sub-module; determining, from the candidate parameter set, target model parameters selected by the convolution sub-module and target model parameters selected by the vision transformer sub-module; and sampling the search space based on the target model parameters selected by the convolution sub-module and the target model parameters selected by the vision transformer sub-module to obtain corresponding sub-models, and determining the sub-models as the student models.
  5. The method according to claim 1 or 2, characterized in that the method further comprises: searching the target model from the target search space based on a search algorithm, wherein the search algorithm comprises any one of an evolutionary search algorithm, a random search algorithm, and a reinforcement-learning search algorithm.
  6. The method of claim 5, wherein the search algorithm comprises the evolutionary search algorithm, and searching the target model from the target search space based on the search algorithm comprises: sampling the target search space to obtain at least two seed models; performing a recombination operation and/or a mutation operation on each seed model to obtain at least two offspring models, wherein the recombination operation comprises exchanging at least one pair of model parameters in a model, and the mutation operation comprises randomly changing at least one model parameter within a predefined range of values; estimating the accuracy of the image-processing results of the at least two offspring models by a pre-trained verification model; and determining the target model from the at least two offspring models based on the accuracy of their image-processing results.
  7. The method of claim 6, wherein sampling the target search space to obtain at least two seed models comprises: sampling the target search space based on model constraint conditions to obtain at least two seed models meeting the model constraint conditions, wherein the model constraint conditions comprise at least one of the following: the number of model parameters is smaller than or equal to a preset number of parameters; the hardware latency when running the model is smaller than or equal to a preset latency; and the floating-point operations per second performed by the hardware when running the model are smaller than or equal to a preset number.
  8. The method of claim 1 or 2, further comprising performing, by the self-attention sub-module, the following operation processing on the preprocessed result of the convolution module: multiplying a first weight matrix with the preprocessed result to obtain a first feature matrix; multiplying a second weight matrix with the preprocessed result to obtain a second feature matrix; sampling the first feature matrix to obtain a third feature matrix; performing a first operation on the first feature matrix and the third feature matrix based on a Gaussian kernel function to obtain a first operation result; and performing a second operation on the first operation result and the second feature matrix to obtain an output result of the self-attention sub-module.
  9. The method of claim 5, wherein the target task comprises any one of an image classification task, an image detection task, and an image segmentation task.
  10. An apparatus for model searching, the apparatus comprising: an acquisition module, configured to acquire a search space, wherein the search space comprises a convolution module and a vision transformer module, the vision transformer module comprises a self-attention sub-module for executing a self-attention algorithm, and the complexity of the self-attention algorithm is O(n); a preprocessing module, configured to preprocess an annotated training image by the convolution module; an image processing module, configured to perform image processing based on a target task on the preprocessed result by the vision transformer module; and a first determining module, configured to determine a first gradient of the search space based on a first loss between a result of the image processing and an annotation of the training image; wherein the search space serves as a teacher model, and the apparatus further comprises: a second determining module, configured to determine at least one sub-model in the search space as a student model; a distillation module, configured to determine, for each student model, a third gradient of the student model based on a second loss between an image-processing result of the student model and an image-processing result of the teacher model; a third determining module, configured to determine, for each student model, a fourth gradient of the student model based on a third loss between an image-processing result of the student model and the annotation of the training image, to determine, when the first gradient is consistent in direction with the fourth gradient, a result of adding the third gradient to the fourth gradient as a second gradient of the student model, and to determine, when the first gradient is not consistent in direction with the fourth gradient, the third gradient as the second gradient of the student model; and a gradient updating module, configured to perform a gradient update on the teacher model based on the first gradient and perform a gradient update on the corresponding student model based on the second gradient of each student model, to obtain a target search space, wherein the target search space is used for searching a target model, and the target model is used for performing image processing based on the target task on a target image to be processed.
  11. A model searching device, characterized in that the model searching device comprises: a memory, configured to store computer-executable instructions; and a processor, coupled to the memory and configured to implement the method of any one of claims 1 to 9 by executing the computer-executable instructions.
  12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by at least one processor, implements the method of any one of claims 1 to 9.
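The gradient-direction rule of claims 1 and 3 can be sketched in plain Python as follows. This is a minimal illustration, not the patented implementation: gradients are assumed to be flattened into vectors, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine value between two flattened gradient vectors (claim 3)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def student_second_gradient(first_grad, third_grad, fourth_grad):
    """Combine distillation and task gradients per claim 1.

    If the first (teacher) gradient and the fourth (student task) gradient
    are consistent in direction (cosine >= 0), the second gradient is the
    sum of the third and fourth gradients; otherwise only the third
    (distillation) gradient is kept.
    """
    if cosine(first_grad, fourth_grad) >= 0:
        return [g3 + g4 for g3, g4 in zip(third_grad, fourth_grad)]
    return list(third_grad)
```

For example, with a teacher gradient of `[1, 0]`, a task gradient pointing the same way keeps both terms, while an opposing task gradient is discarded in favor of the distillation gradient alone.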

Description

Model searching method, device, equipment and storage medium

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a storage medium for model searching.

Background

A neural network model based on a vision transformer (hereinafter referred to simply as a transformer model) may be used for a variety of image processing tasks such as image classification, image detection, and image segmentation. To find the optimal network structure of the transformer model, a neural architecture search (NAS) method can be adopted, and automatic parameter tuning of the transformer model can be realized by constructing a search space. However, in conventional network-structure search methods for transformer models, the complexity of the search space is high, usually O(n²), so a great amount of computation and storage is required, which makes end-side deployment difficult.

Disclosure of Invention

The embodiment of the application provides a method, a device, equipment, and a storage medium for model searching.
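Claim 8 outlines the kind of O(n) self-attention meant here: two projections of the preprocessed input, a sampled copy of the first feature matrix, a Gaussian-kernel similarity, and a final product. The sketch below is only illustrative; the sampling stride, the row normalization, and the pairing of the second feature matrix with the sampled positions are assumptions, not details fixed by the patent.

```python
import numpy as np

def linear_attention_sketch(x, w1, w2, stride=4):
    """Kernel-style attention with a fixed number of sampled keys.

    x: (n, d) preprocessed features; w1, w2: (d, d) weight matrices.
    Because the sampled matrix has m = ceil(n / stride) rows and m grows
    only as n / stride, the (n, m) similarity stays linear in n.
    """
    q = x @ w1                     # first feature matrix, (n, d)
    v = x @ w2                     # second feature matrix, (n, d)
    k = q[::stride]                # third feature matrix: sampled q, (m, d)
    # First operation: Gaussian-kernel similarity exp(-||q_i - k_j||^2 / 2)
    d2 = ((q[:, None, :] - k[None, :, :]) ** 2).sum(-1)     # (n, m)
    sim = np.exp(-d2 / 2.0)
    sim = sim / sim.sum(axis=1, keepdims=True)              # normalise rows
    # Second operation: combine similarities with the value rows at the
    # sampled positions (an assumption made so the shapes compose).
    return sim @ v[::stride]       # output, (n, d)
```

Each output row is a convex combination of a fixed number of value rows, which is what keeps the cost linear in the sequence length.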
The technical scheme of the embodiment of the application is realized as follows. In a first aspect, the embodiment of the application provides a model searching method, which comprises: obtaining a search space, wherein the search space comprises a convolution module and a vision transformer module, the vision transformer module comprises a self-attention sub-module for executing a self-attention algorithm, and the complexity of the self-attention algorithm is O(n); preprocessing an annotated training image by the convolution module; performing image processing based on a target task on the preprocessed result by the vision transformer module; determining a first gradient of the search space based on a first loss between the image-processing result and the annotation of the training image; and performing a gradient update on the search space based on the first gradient to obtain a target search space, wherein the target search space is used for searching a target model, and the target model is used for performing image processing based on the target task on a target image to be processed.
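The first-aspect update can be illustrated with a minimal gradient-descent step. This is a sketch under assumptions: the search-space weights are flattened into a list, the first loss is stood in for by a squared error (the patent does not fix a loss form), and the learning rate is illustrative.

```python
def first_loss(result, annotation):
    """Squared-error stand-in for the first loss between the
    image-processing result and the annotation."""
    return sum((r - a) ** 2 for r, a in zip(result, annotation))

def update_search_space(weights, first_gradient, lr=0.01):
    """One gradient-descent step on the search-space (supernet) weights,
    using the first gradient derived from the first loss."""
    return [w - lr * g for w, g in zip(weights, first_gradient)]
```

Iterating this step over the annotated training images yields the target search space from which the target model is later searched.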
In a second aspect, the embodiment of the application provides a model searching apparatus, which comprises an acquisition module, a preprocessing module, an image processing module, a first determining module, and a gradient updating module. The acquisition module is configured to acquire a search space, wherein the search space comprises a convolution module and a vision transformer module, the vision transformer module comprises a self-attention sub-module for executing a self-attention algorithm, and the complexity of the self-attention algorithm is O(n). The preprocessing module is configured to preprocess an annotated training image by the convolution module. The image processing module is configured to perform image processing based on a target task on the preprocessed result by the vision transformer module. The first determining module is configured to determine a first gradient of the search space based on a first loss between the result of the image processing and the annotation of the training image. The gradient updating module is configured to perform a gradient update on the search space based on the first gradient to obtain a target search space, wherein the target search space is used for searching a target model, and the target model is used for performing image processing based on the target task on a target image to be processed. In a third aspect, an embodiment of the present application provides a model searching device, which includes a memory and a processor, where the memory is configured to store computer-executable instructions, and the processor is connected to the memory and configured to implement the method according to the first aspect by executing the computer-executable instructions. In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program which, when executed by at least one processor, implements the method according to the first aspect.
The method, apparatus, device, and computer-readable storage medium for model searching provided by the embodiments of the application can acquire a search space, wherein the search space comprises a convolution module and a vision transformer module, the vision transformer module comprises a self-attention sub-module for executing a self-attention algorithm, and the complexity of the self-attention algorithm is O(n). When the search space is trained, the convolution module is adopted to preprocess annotated training images, the vision transformer module is adopted to perform image processing based on target tasks on the preprocessed results, a first gradient of the search space is determined based on a first loss between the image-processing results and the annotations of the training images, and a gradient update is performed on the search space based on the first gradient to obtain a target search space, wherein the target search space is used for searching a target model, and the target model is used for performing image processing based on the target task on a target image to be processed.