
CN-122020225-A - Training method of semantic quantization model, and generation method and system of semantic identifier

CN 122020225 A

Abstract

Embodiments of this specification provide a training method for a semantic quantization model and a method and system for generating a semantic identifier, where the semantic quantization model comprises M quantization layers connected in sequence. The training system obtains a sample data set and iteratively trains each quantization layer in turn, following the connection order of the layers. Each iteration of the m-th quantization layer comprises: the training system sequentially passes the description information of each sample object down to the m-th quantization layer to obtain an input vector for each sample, and encodes the input vector with the coding network of the m-th quantization layer to obtain a coding vector for each sample. The training system then generates the codeword vectors in the codebook, each with an associated m-th-level category label, based on the hierarchical category labels and the clustering of the samples' coding vectors, so as to obtain the target codeword vector corresponding to each sample and thereby determine the semantic alignment loss of the m-th quantization layer. Finally, the parameters of the coding network are updated based at least on the semantic alignment loss.

Inventors

  • YANG ZHIXIANG
  • YUAN GONGLIN
  • ZHANG ZHENGYANG
  • WENG ZUJIAN
  • MA CHENGUANG

Assignees

  • 支付宝(杭州)数字服务技术有限公司 (Alipay (Hangzhou) Digital Service Technology Co., Ltd.)
  • 大连理工大学 (Dalian University of Technology)

Dates

Publication Date
2026-05-12
Application Date
2026-02-02

Claims (17)

  1. A training method for a semantic quantization model, the semantic quantization model comprising M quantization layers connected in sequence, M being an integer greater than 1, the method comprising: obtaining a sample data set, wherein each sample in the sample data set comprises description information of a sample object and a corresponding multi-level category label; and, according to the connection order of the M quantization layers, sequentially performing iterative training on each quantization layer using the sample data set, wherein at least an m-th quantization layer exists among the M quantization layers, m being an integer smaller than M, and each round of iteration of the m-th quantization layer comprises: for each sample in the sample data set, sequentially passing the description information of the sample object from the 1st quantization layer to the m-th quantization layer to obtain the input vector of the sample at the m-th quantization layer; encoding the input vector of each sample with the coding network of the m-th quantization layer to obtain the coding vector of each sample; generating each codeword vector in the codebook corresponding to the m-th quantization layer based on the m-th-level category label to which each sample belongs and the clustering of the coding vectors of the samples, and determining the m-th-level category label associated with each codeword vector; determining, based on the codeword vectors in the codebook corresponding to the m-th quantization layer and their associated m-th-level category labels, the target codeword vector to which the coding vector of each sample is to be aligned; determining the semantic alignment loss of the m-th quantization layer based on the target codeword vector and the coding vector corresponding to each sample; and updating parameters of the coding network in the m-th quantization layer based at least on the semantic alignment loss.
  2. The method of claim 1, wherein the number of samples in the sample data set is N, and wherein for the i-th sample, with i taking values from 1 to N, the coding vector of the i-th sample is denoted z_i and the target codeword vector corresponding to the i-th sample is denoted e_i; the semantic alignment loss is determined based at least on the similarity between z_i and e_i and is used to constrain z_i, in the feature space, to move close to e_i.
  3. The method of claim 2, wherein the codeword vectors in the codebook other than the target codeword vector e_i are denoted e_j; the semantic alignment loss is also determined based on the similarity between z_i and each e_j and is used to constrain z_i, in the feature space, to stay away from e_j.
  4. A method according to claim 3, wherein the semantic alignment loss is computed by a formula whose terms include a weight coefficient w_{i,j}, determined from the m-th-level category label associated with the codeword vector e_i and the m-th-level category label associated with the codeword vector e_j, and K, the number of codeword vectors in the codebook corresponding to the m-th quantization layer.
  5. The method of claim 4, wherein w_{i,j} is determined by a formula in which t_i denotes the semantic vector of the m-th-level category label associated with the codeword vector e_i, t_j denotes the semantic vector of the m-th-level category label associated with the codeword vector e_j, sim(t_i, t_j) denotes the semantic correlation between t_i and t_j, and τ is a hyperparameter.
  6. The method of claim 1, wherein each round of iteration of the m-th quantization layer further comprises: for each sample, semantically quantizing the coding vectors of the sample through the 1st to m-th quantization layers to output m codeword vectors, and decoding the m codeword vectors to obtain a reconstruction vector; and determining a reconstruction loss based on the difference between the embedding vector corresponding to the description information of the sample object and the reconstruction vector; and wherein updating parameters of the coding network in the m-th quantization layer based at least on the semantic alignment loss comprises: constructing a target loss based on the reconstruction loss and the semantic alignment loss, and updating parameters of the coding network in the m-th quantization layer based on the target loss.
  7. The method of claim 6, wherein each round of iteration of the m-th quantization layer further comprises determining a layer normalization loss of the m-th quantization layer, the layer normalization loss characterizing the degree to which the codeword vectors in the codebook are uniformly used; and wherein constructing the target loss based on the reconstruction loss and the semantic alignment loss comprises: constructing the target loss based on the reconstruction loss, the semantic alignment loss, and the layer normalization loss.
  8. The method of claim 7, wherein the layer normalization loss is computed by a formula whose terms include f_k, the frequency of use of the k-th codeword vector in the codebook, and a hyperparameter.
  9. The method of claim 1, wherein an n-th quantization layer also exists among the M quantization layers, n being an integer less than or equal to M, and each round of iteration of the n-th quantization layer comprises: for each sample in the sample data set, sequentially passing the description information of the sample object from the 1st quantization layer to the n-th quantization layer to obtain the input vector of the sample at the n-th quantization layer; encoding the input vector of each sample with the coding network of the n-th quantization layer to obtain the coding vector of each sample; generating each codeword vector in the codebook corresponding to the n-th quantization layer based on the clustering of the coding vectors of the samples; semantically quantizing the coding vector of each sample based on the codeword vectors in the codebook corresponding to the n-th quantization layer to output a codeword vector, and determining the layer normalization loss of the n-th quantization layer based on the semantic quantization of the samples, the layer normalization loss characterizing the degree to which the codeword vectors in the codebook are uniformly used; and updating parameters of the coding network in the n-th quantization layer based at least on the layer normalization loss.
  10. The method of claim 9, wherein the layer normalization loss is computed by a formula whose terms include f_k, the frequency of use of the k-th codeword vector in the codebook, and a hyperparameter.
  11. The method of claim 10, wherein the n-th quantization layer is located after the m-th quantization layer.
  12. The method of claim 1, wherein when m = 1, the input vector is the embedding vector corresponding to the description information of the sample object; and when m > 1, the input vector is the residual vector between the coding vector of the (m-1)-th quantization layer and the codeword vector output by the (m-1)-th layer.
  13. The method of claim 1, wherein generating each codeword vector in the codebook corresponding to the m-th quantization layer based on the m-th-level category label to which each sample belongs and the clustering of the coding vectors of the samples, and determining the m-th-level category label associated with each codeword vector, comprises: clustering the coding vectors of the samples to obtain K clusters and determining the cluster center vectors corresponding to the K clusters; using the cluster center vectors corresponding to the K clusters as the codeword vectors in the codebook corresponding to the m-th quantization layer; and, for each codeword vector, determining the m-th-level category label associated with that codeword vector based on the m-th-level category labels of the samples in the cluster corresponding to that codeword vector.
  14. The method of claim 13, wherein determining, based on the codeword vectors in the codebook corresponding to the m-th quantization layer and their associated m-th-level category labels, the target codeword vector to which the coding vector of each sample is to be aligned comprises: for each sample, determining, from the codebook corresponding to the m-th quantization layer, at least one candidate codeword vector associated with the corresponding category label based on the m-th-level category label to which the sample belongs; and taking, among the at least one candidate codeword vector, the candidate codeword vector with the greatest similarity to the coding vector of the sample as the target codeword vector.
  15. The method of claim 1, wherein the sample object is a service or a commodity, and the description information of the sample object includes description information in at least one modality among text, image, and speech.
  16. A method of generating a semantic identifier, the method comprising: acquiring description information of a target object; obtaining a trained semantic quantization model, the semantic quantization model comprising M quantization layers connected in sequence, each quantization layer comprising a coding network and a codebook, the codebook comprising a plurality of codeword vectors, M being an integer greater than 1; inputting the description information of the target object into the semantic quantization model to perform layer-by-layer quantization and residual transfer through the M quantization layers, obtaining the codeword identifier output by each quantization layer, wherein for any m-th quantization layer the quantization process comprises encoding an input vector with the coding network of the m-th quantization layer to obtain a coding vector, semantically quantizing the coding vector into one codeword vector in the codebook of the m-th quantization layer, and outputting a codeword identifier identifying the corresponding codeword vector; and generating an identifier sequence identifying the target object based on the codeword identifiers respectively output by the M quantization layers.
  17. A computing system, comprising: at least one storage medium storing at least one instruction set; and at least one processor communicatively coupled to the at least one storage medium, wherein when the computing system runs, the at least one processor reads the at least one instruction set and, as directed by the at least one instruction set, performs the method of any one of claims 1-15 or the method of claim 16.
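The layer-wise training of claims 1 and 2 can be sketched as follows. This is an illustrative toy, not the patent's implementation: the coding network is reduced to a single linear map, the codebook is formed by simply averaging coding vectors per category label (a degenerate case of label-guided clustering), the similarity is squared Euclidean distance, and the target codewords are treated as constants during the gradient step (a common stop-gradient convention in vector-quantization training).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: N samples, D-dim embeddings, one level-m category label each.
N, D, K = 12, 4, 2
X = rng.normal(size=(N, D))                 # stand-in input vectors at layer m
labels = np.array([i % K for i in range(N)])

W = np.eye(D)                               # linear stand-in for the coding network
losses = []
for step in range(50):
    Z = X @ W                               # coding vectors of all samples
    # Codebook from clustering; here simply one centroid per label.
    codebook = np.stack([Z[labels == k].mean(axis=0) for k in range(K)])
    E = codebook[labels]                    # target codeword vector per sample
    # Semantic alignment loss: pull each coding vector toward its target
    # codeword (targets held constant for the gradient step).
    losses.append(float(np.mean(np.sum((Z - E) ** 2, axis=1))))
    grad_W = 2.0 * X.T @ (Z - E) / N        # gradient w.r.t. the encoder weights
    W -= 0.05 * grad_W                      # update the coding-network parameters
```

Per the claimed strategy, this loop would be run to convergence for layer 1 before layer 2 is trained on the residuals.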
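Claims 4 and 5 describe a weight coefficient derived from the semantic vectors of the category labels associated with a pair of codewords, but the formula itself is not reproduced in this text. The sketch below uses one plausible, assumed form: cosine similarity as the semantic correlation sim(t_i, t_j) and an exponential decay with hyperparameter tau, so that the push-away term between codewords whose labels are semantically close is down-weighted.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity, standing in for the claim's semantic correlation."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def label_weights(label_vecs, tau=1.0):
    """Assumed weight matrix w[i, j] built from the semantic vectors of the
    level-m labels of codewords i and j: semantically close label pairs get
    a smaller weight, so their codewords are pushed apart less strongly."""
    K = len(label_vecs)
    w = np.ones((K, K))
    for i in range(K):
        for j in range(K):
            if i != j:
                w[i, j] = np.exp(-cosine(label_vecs[i], label_vecs[j]) / tau)
    return w

# Hypothetical label semantic vectors: labels 0 and 1 are close, 2 is distant.
label_vecs = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([0.0, 1.0])]
w = label_weights(label_vecs)
```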
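For the reconstruction loss of claim 6, the decode step is not specified beyond "decoding the m codeword vectors"; the sketch assumes the usual residual-quantization decode, i.e. summing the m codeword vectors and comparing the sum with the sample's embedding vector.

```python
import numpy as np

def reconstruction_loss(embedding, codewords):
    """Reconstruction loss for one sample: the m codeword vectors output by
    layers 1..m are summed (the standard residual-quantization decode,
    assumed here) and compared with the original embedding via mean
    squared error."""
    recon = np.sum(codewords, axis=0)
    return float(np.mean((embedding - recon) ** 2))

emb = np.array([1.0, 2.0, 3.0])
cw = np.array([[0.5, 1.5, 2.0],    # codeword selected at layer 1
               [0.5, 0.5, 1.0]])   # codeword selected at layer 2 (residual level)
loss = reconstruction_loss(emb, cw)
```

Using only the first layer's codeword leaves a residual error, which the deeper layers progressively remove.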
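Claims 7-10 only state that the layer normalization loss reflects how uniformly the codewords are used and involves the usage frequency f_k and a hyperparameter. The form below, the KL divergence from the usage distribution to the uniform distribution, is an assumption chosen to match that description: it is zero exactly when every codeword is used equally often and grows as usage collapses onto a few codewords.

```python
import numpy as np

def usage_loss(assign_idx, K, eps=1e-9):
    """Assumed codebook-usage ('layer normalization') loss: KL divergence
    from the empirical usage distribution f over the K codewords to the
    uniform distribution. eps is a small numerical guard."""
    f = np.bincount(assign_idx, minlength=K) / len(assign_idx)
    return float(np.sum(f * np.log(f * K + eps)))

uniform = usage_loss(np.array([0, 1, 2, 3, 0, 1, 2, 3]), K=4)  # each codeword used twice
skewed = usage_loss(np.array([0, 0, 0, 0, 0, 0, 0, 1]), K=4)   # usage collapsed onto codeword 0
```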
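Claims 13 and 14 can be sketched together: k-means over the coding vectors yields the codewords, each codeword is tagged by majority vote over its cluster's labels (one natural reading of claim 13, which does not fix the aggregation rule), and the target codeword of a sample is the most similar candidate among codewords sharing its label (Euclidean distance stands in for the unspecified similarity measure).

```python
import numpy as np
from collections import Counter

def build_codebook(Z, labels, K, iters=10, seed=0):
    """K-means over coding vectors Z: cluster centers become the codeword
    vectors, and each codeword is tagged with the most common level-m label
    among the samples in its cluster (majority vote, assumed)."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=K, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmin(((Z[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(K):
            if np.any(assign == k):
                centers[k] = Z[assign == k].mean(axis=0)
    cw_labels = [Counter(labels[assign == k]).most_common(1)[0][0]
                 if np.any(assign == k) else None for k in range(K)]
    return centers, cw_labels

def target_codeword(z, centers, cw_labels, sample_label):
    """Claim-14 selection: restrict to codewords sharing the sample's level-m
    label, then take the candidate nearest to the coding vector z."""
    cand = [k for k, lab in enumerate(cw_labels) if lab == sample_label]
    return min(cand, key=lambda k: float(np.linalg.norm(centers[k] - z)))

# Two well-separated toy clusters with hypothetical labels.
Z = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = np.array(["cat", "cat", "dog", "dog"])
centers, cw_labels = build_codebook(Z, labels, K=2)
t = target_codeword(np.array([0.05, 0.1]), centers, cw_labels, "cat")
```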
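The inference procedure of claim 16, with the residual transfer of claim 12, can be sketched as follows; the identity encoders and hand-made two-entry codebooks are purely illustrative stand-ins for a trained model.

```python
import numpy as np

def generate_identifier(x, layers):
    """Layer-by-layer quantization with residual transfer through M trained
    layers: each layer encodes its input, snaps the coding vector to the
    nearest codeword, emits that codeword's identifier (its index here),
    and passes the residual to the next layer. The collected indices form
    the identifier sequence of the object."""
    ids, r = [], x
    for encode, codebook in layers:
        z = encode(r)                                           # coding vector
        idx = int(np.argmin(np.linalg.norm(codebook - z, axis=1)))
        ids.append(idx)                                         # codeword identifier
        r = z - codebook[idx]                                   # residual for next layer
    return ids

layers = [(lambda v: v, np.array([[1.0, 0.0], [0.0, 1.0]])),
          (lambda v: v, np.array([[0.2, 0.0], [0.0, 0.2]]))]
ids = generate_identifier(np.array([1.0, 0.25]), layers)
```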

Description

Training method of semantic quantization model, and generation method and system of semantic identifier

Technical Field

The present disclosure relates to the field of artificial intelligence, and in particular to a training method for a semantic quantization model and a method and system for generating a semantic identifier.

Background

With the rapid development of digital services and electronic commerce, recommendation and search systems have become a key bridge connecting users with services or goods. For such a system to handle massive demand efficiently, the description information of an object (e.g., the text or image description of a commodity) usually needs to be converted into an identifier that a computer can retrieve quickly. However, the identifiers currently output by models often lack sufficient representation capability and struggle to characterize a specific object adequately. As a result, in practice a system applying such identifiers (such as a recommendation or search system) often cannot effectively locate the desired object through the identifier, which degrades the output accuracy of these systems and the user experience. How to train a semantic quantization model so that it outputs identifiers with strong semantic representation capability is therefore a pressing problem. The statements in this background section merely provide information known to the inventors and may not constitute prior art to the present disclosure, nor represent prior art as of the filing date.
Disclosure of Invention

The specification provides a training method for a semantic quantization model and a method and system for generating a semantic identifier. The approach adopts a layer-by-layer iterative training strategy, uses multi-level category labels to guide the generation of the codebook of each quantization layer, and updates the parameters of each layer's coding network by constraining the degree of difference between the coding vectors output by that quantization layer and the target codeword vectors to be aligned. This improves the convergence rate of the model and enables the trained semantic quantization model to output identifier sequences with semantic representation capability and discrimination.

According to a first aspect, the specification provides a training method for a semantic quantization model, the semantic quantization model comprising M quantization layers connected in sequence, M being an integer greater than 1. The method comprises: obtaining a sample data set, each sample of which comprises description information of a sample object and its corresponding multi-level category label; and, according to the connection order of the M quantization layers, sequentially performing iterative training on each quantization layer using the sample data set. At least an m-th quantization layer exists among the M quantization layers, m being an integer smaller than M. Each round of iteration of the m-th quantization layer comprises: for each sample in the sample data set, sequentially passing the description information of the sample object from the 1st quantization layer to the m-th quantization layer to obtain the input vector of the sample at the m-th quantization layer; encoding the input vector of each sample with the coding network of the m-th quantization layer to obtain the coding vector of each sample; generating each codeword vector in the codebook corresponding to the m-th quantization layer based on the m-th-level category label to which each sample belongs and the clustering of the coding vectors of the samples, and determining the m-th-level category label associated with each codeword vector; determining, based on the codeword vectors in the codebook and their associated m-th-level category labels, the target codeword vector to which the coding vector of each sample is to be aligned; determining the semantic alignment loss of the m-th quantization layer based on the target codeword vector and the coding vector corresponding to each sample; and updating parameters of the coding network in the m-th quantization layer based at least on the semantic alignment loss.

In some embodiments, the number of samples in the sample data set is N, and for the i-th sample, with i taking values from 1 to N, the coding vector of the i-th sample is denoted z_i and the target codeword vector corresponding to the i-th sample is denoted e_i. The semantic alignment loss is determined based at least on the similarity between z_i and e_i and is used to constrain z_i, in the feature space, to move close to e_i. In some embodiments, the codeword vectors in the codebook other than the target codeword vector e_i are denoted e_j, and the semantic alignment loss is also determined based on the similarity between z_i and each e_j, constraining z_i, in the feature space, to stay away from e_j.