CN-116705041-B - Quantization method of speaker verification model, electronic device, and storage medium


Abstract

The application discloses a quantization method for a speaker verification model, an electronic device, and a storage medium. The quantization method comprises obtaining the real-valued weights of all layers of the speaker verification model, and either mapping the real-valued weights of all layers to a fixed integer set or dynamically determining, for each layer, the binary weights corresponding to that layer's real-valued weights so as to better match the real-valued weight distribution. The embodiments of the application provide two new quantization strategies: static quantization and adaptive quantization. For static quantization, the embodiments provide a weight regularization technique that maintains maximum information entropy and reduces information loss. The adaptive quantization scheme dynamically determines the optimal binary values for each layer to achieve better alignment with the real-valued weight distribution.

Inventors

YU KAI
LIU BEI
WANG HAOYU
QIAN YANMIN

Assignees

AISpeech Co., Ltd. (思必驰科技股份有限公司)

Dates

Publication Date: 20260508
Application Date: 20230608

Claims (4)

1. A method of quantizing a speaker verification model, comprising: acquiring real-valued weights of all layers of the speaker verification model; and dynamically determining the binary weights corresponding to the real-valued weights of each layer so as to better match the real-valued weight distribution, which comprises: aligning a center beta of the binary values with the mean of the real-valued weight distribution; determining the standard deviation of the real-valued weight matrix as a boundary-to-center distance alpha; and determining the binary values { beta - alpha, beta + alpha } of each layer from the center beta and the boundary-to-center distance alpha, wherein alpha and beta can be dynamically updated together with the real-valued weights during the training of each layer.
2. The method of claim 1, wherein dynamically determining the binary weights corresponding to the real-valued weights of each layer comprises: using KL divergence to measure the similarity between the distribution of the binarized weights and the distribution of the real-valued weights.
3. An electronic device comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of claim 1 or 2.
4. A storage medium having stored thereon a computer program, which when executed by a processor performs the steps of the method of claim 1 or 2.
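
The per-layer computation recited in claim 1 can be illustrated with a minimal sketch. It is not the applicant's implementation: the function names, the toy weights, the use of the population standard deviation, and the rule that weights at or above the center take the upper value are all assumptions made for illustration, and the sketch omits training entirely.

```python
from statistics import fmean, pstdev


def adaptive_binary_values(weights):
    """Return (alpha, beta) for one layer's flattened real-valued weights."""
    beta = fmean(weights)    # center: mean of the real-valued weights
    alpha = pstdev(weights)  # boundary-to-center distance: standard deviation
    return alpha, beta


def binarize(weights, alpha, beta):
    """Map each weight to beta - alpha or beta + alpha.

    Assumption: weights at or above the center take the upper value;
    the claims do not state the assignment rule.
    """
    low, high = beta - alpha, beta + alpha
    return [high if w >= beta else low for w in weights]


layer_weights = [-1.0, 0.0, 1.0, 2.0]  # toy values, not from the application
alpha, beta = adaptive_binary_values(layer_weights)
binary_weights = binarize(layer_weights, alpha, beta)
```

According to claim 1, alpha and beta can be updated together with the real-valued weights during training, so values computed from statistics in this way would most plausibly serve as a starting point rather than as fixed constants.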

Description

Quantization method of speaker verification model, electronic device, and storage medium

Technical Field

The embodiments of the application relate to the technical field of neural networks, and in particular to a quantization method for a speaker verification model, an electronic device, and a storage medium.

Background

In the related art, speaker verification (SV) involves determining whether the enrollment audio and the test audio are spoken by the same person. The SV paradigm has shifted from the traditional i-vector with probabilistic linear discriminant analysis (PLDA) to speaker-dependent learning using deep learning techniques. Recently, the performance of SV systems has improved significantly thanks to deeper and larger neural networks. For example, some related art proposes a depth-first version of the ResNet model and greatly increases the depth of the network to 233. Other techniques push the depth of ResNet models further to 293 and achieve an impressive performance improvement. While large models have achieved strong results, they typically consume significant storage and computing resources, which impedes deployment on mobile devices. Developing a lightweight speaker verification system tailored to mobile devices is a challenging and demanding task.

In previous studies, those skilled in the art have explored several approaches to miniaturizing speaker verification systems, including knowledge distillation and efficient architecture design. Knowledge distillation is a common compression method that transfers knowledge from a teacher network to a student network. Although it can improve the performance of student networks without increasing the model size, deploying these networks on mobile devices remains challenging because of the considerable number of parameters involved.
On the other hand, many efforts have been made to manually design more efficient computing operators and network architectures. To reduce computational cost, researchers have focused on replacing computationally intensive convolution operations with lightweight ones and have introduced more efficient architectures suited to embedded use cases. Although the number of parameters and the computational complexity are greatly reduced, serious performance degradation occurs, and the requirements of real-world SV applications can hardly be met.

Disclosure of Invention

The embodiments of the invention provide a quantization method for a speaker verification model, an electronic device, and a storage medium, which are used to solve at least one of the above technical problems.

In a first aspect, an embodiment of the invention provides a quantization method for a speaker verification model, including: obtaining real-valued weights of all layers of the speaker verification model; and mapping the real-valued weights of all layers to a fixed integer set, or dynamically determining the binary weights corresponding to the real-valued weights of each layer so as to better match the real-valued weight distribution.

In a second aspect, an embodiment of the invention provides an electronic device including at least one processor and a memory communicatively coupled to the at least one processor, where the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform any one of the quantization methods for a speaker verification model of the invention.

In a third aspect, embodiments of the invention provide a storage medium storing one or more programs that include execution instructions, which can be read and executed by an electronic device (including, but not limited to, a computer, a server, or a network device) to perform any one of the above quantization methods for a speaker verification model of the invention.

In a fourth aspect, embodiments of the invention also provide a computer program product comprising a computer program stored on a storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform any one of the quantization methods for a speaker verification model described above.

The method of the embodiments of the application provides two new quantization strategies: static quantization and adaptive quantization. For static quantization, the embodiments provide a weight regularization technique that maintains maximum information entropy and reduces information loss. The embodiments also provide an adaptive quantization scheme that dynamically determines the optimal binary values for each layer to achieve better alignment with the real-valued weight distribution.

Drawings

In or