CN-122024742-A - Voiceprint recognition method and device, electronic equipment and storage medium

Abstract

The application provides a voiceprint recognition method and device, electronic equipment, and a storage medium. The method comprises: acquiring an initial registration voice and an initial verification voice; and performing channel conversion on the initial registration voice according to the target channel characteristic corresponding to the initial registration voice, and/or performing channel conversion on the initial verification voice according to the target channel characteristic corresponding to the initial verification voice, to obtain a target registration voice and a target verification voice under the same channel type for voiceprint recognition. By converting the registration voice and the verification voice into the same channel type, the application improves cross-channel voiceprint recognition performance, resolves the feature mismatch problem, keeps the model's noise resistance stable, and prevents the discrimination threshold from becoming invalid.

Inventors

JIA XUPENG
LUO LIUPING
ZHENG RONG
DENG JING

Assignees

Beijing Yuanjian Information Technology Co., Ltd. (北京远鉴信息技术有限公司)

Dates

Publication Date: 20260512
Application Date: 20260213

Claims (10)

1. A voiceprint recognition method, the method comprising: acquiring an initial registration voice and an initial verification voice; determining a target channel characteristic of an initial voice by adopting a preset channel characteristic determination mode that corresponds to the prior information state of the initial channel type of the initial voice, wherein the initial voice is the initial registration voice or the initial verification voice; and performing channel conversion on the initial registration voice according to the target channel characteristic corresponding to the initial registration voice, and/or performing channel conversion on the initial verification voice according to the target channel characteristic corresponding to the initial verification voice, to obtain a target registration voice and a target verification voice under the same channel type for voiceprint recognition.
2. The voiceprint recognition method of claim 1, wherein determining the target channel characteristic of the initial voice by adopting the preset channel characteristic determination mode that corresponds to the prior information state of the initial channel type of the initial voice comprises: if the prior information state is a known state, determining the target channel characteristic corresponding to the initial voice according to a voice sample under the initial channel type; and if the prior information state is an unknown state, determining the target channel characteristic corresponding to the initial voice according to the channel characteristics corresponding to each preset channel type in a pre-established channel characteristic database.
3. The voiceprint recognition method of claim 2, wherein determining the target channel characteristic corresponding to the initial voice according to the voice sample under the initial channel type comprises: calculating a first energy spectrum of the voice sample, wherein a first energy value of each voice frame in the first energy spectrum represents the degree to which noise affects the energy distribution of that voice frame; selecting, from the voice sample and according to the first energy spectrum, a first key voice frame that satisfies a voice frame screening rule; and calculating the target channel characteristic corresponding to the initial voice according to the first energy value of the first key voice frame in the first energy spectrum.
4. The voiceprint recognition method of claim 2, wherein determining the target channel characteristic corresponding to the initial voice according to the channel characteristics corresponding to each preset channel type in the pre-established channel characteristic database comprises: inputting the initial voice into a channel classification model to obtain the initial channel type corresponding to the initial voice; and querying the channel characteristic database for the target channel characteristic corresponding to the initial voice according to the initial channel type corresponding to the initial voice.
5. The voiceprint recognition method of claim 2, wherein determining the target channel characteristic corresponding to the initial voice according to the channel characteristics corresponding to each preset channel type in the pre-established channel characteristic database further comprises: calculating a second energy spectrum of the initial voice, wherein a second energy value of each voice frame in the second energy spectrum represents the degree to which noise affects the energy distribution of that voice frame; selecting, from the initial voice and according to the second energy spectrum, a second key voice frame that satisfies a voice frame screening rule; calculating an initial channel characteristic corresponding to the initial voice according to the second energy value of the second key voice frame in the second energy spectrum; and selecting the target channel characteristic corresponding to the initial voice from the channel characteristics corresponding to the preset channel types according to the similarity between the initial channel characteristic and the channel characteristic corresponding to each preset channel type.
6. The voiceprint recognition method according to any one of claims 1 to 5, wherein channel conversion is performed on the initial voice according to the target channel characteristic corresponding to the initial voice by: for each frequency point corresponding to the target channel characteristic, calculating a channel difference value between the channel characteristic value of the frequency point in the target channel characteristic and the channel characteristic value of the frequency point in a standard channel characteristic, wherein the standard channel characteristic is the channel characteristic of the initial voice after channel conversion; calculating a filter according to the channel difference value and the frequency of each frequency point; and convolving the initial voice with the calculated filter to obtain a corresponding target voice, wherein the target voice is the target registration voice or the target verification voice.
7. A voiceprint recognition device, the device comprising: an acquisition module, used for acquiring an initial registration voice and an initial verification voice; a determining module, used for determining a target channel characteristic of an initial voice by adopting a preset channel characteristic determination mode that corresponds to the prior information state of the initial channel type of the initial voice, wherein the initial voice is the initial registration voice or the initial verification voice; and a channel conversion module, used for performing channel conversion on the initial registration voice according to the target channel characteristic corresponding to the initial registration voice, and/or performing channel conversion on the initial verification voice according to the target channel characteristic corresponding to the initial verification voice, to obtain a target registration voice and a target verification voice under the same channel type for voiceprint recognition.
8. The voiceprint recognition device of claim 7, wherein the determining module is specifically configured to: if the prior information state is a known state, determine the target channel characteristic corresponding to the initial voice according to a voice sample under the initial channel type; and if the prior information state is an unknown state, determine the target channel characteristic corresponding to the initial voice according to the channel characteristics corresponding to each preset channel type in a pre-established channel characteristic database.
9. An electronic device comprising a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor in communication with the storage medium via the bus when the electronic device is in operation, the processor executing the machine-readable instructions to perform the steps of the voiceprint recognition method of any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the voiceprint recognition method according to any one of claims 1 to 6.
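
The channel conversion recited in claim 6 can be illustrated with a minimal sketch. It rests on assumptions the claims do not state: a channel characteristic is taken to be a list of per-frequency-point values in dB at uniformly spaced frequencies starting at 0 Hz, the channel difference is taken as the standard value minus the target value, and the filter is a linear-phase FIR filter obtained by frequency sampling. All function names (`channel_difference_db`, `design_fir_from_db_gains`, `convolve`, `convert_channel`) are hypothetical and chosen only for illustration.

```python
import math


def channel_difference_db(target_db, standard_db):
    """Per-frequency-point difference in dB (assumed sign: standard minus target)."""
    return [s - t for t, s in zip(target_db, standard_db)]


def design_fir_from_db_gains(gain_db):
    """Frequency-sampling design of a linear-phase FIR filter.

    gain_db[k] is the desired gain in dB at frequency k * fs / N for
    k = 0 .. M, where M = len(gain_db) - 1 and N = 2 * M + 1 is the tap count.
    """
    half = len(gain_db) - 1
    n_taps = 2 * half + 1
    # Convert dB gains to linear magnitudes.
    mag = [10.0 ** (g / 20.0) for g in gain_db]
    taps = []
    for n in range(n_taps):
        # Inverse DFT of a real, symmetric magnitude response,
        # centred on tap index `half` to obtain linear phase.
        acc = mag[0]
        for k in range(1, half + 1):
            acc += 2.0 * mag[k] * math.cos(2.0 * math.pi * k * (n - half) / n_taps)
        taps.append(acc / n_taps)
    return taps


def convolve(signal, taps):
    """Direct-form linear convolution; output length is len(signal) + len(taps) - 1."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(taps):
            out[i + j] += x * h
    return out


def convert_channel(voice, target_db, standard_db):
    """Filter `voice` so that its channel characteristic moves toward `standard_db`."""
    diff = channel_difference_db(target_db, standard_db)
    taps = design_fir_from_db_gains(diff)
    full = convolve(voice, taps)
    # Drop the filter's group delay so the output stays aligned with the input.
    delay = (len(taps) - 1) // 2
    return full[delay:delay + len(voice)]
```

In this sketch, identical target and standard characteristics produce a unit-impulse filter, so the voice passes through unchanged; a uniform difference across all frequency points reduces to a plain gain change.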

Description

Voiceprint recognition method and device, electronic equipment and storage medium

Technical Field

The invention relates to the field of cross-channel voiceprint recognition, and in particular to a voiceprint recognition method, a voiceprint recognition device, electronic equipment, and a storage medium.

Background

Voiceprint recognition is a biometric technique that determines the identity of a speaker by analyzing the unique acoustic features of an individual's voice. Deep learning-based schemes are the current mainstream approach, but their cross-channel recognition performance still leaves room for improvement. Cross-channel voiceprint recognition refers to the task of determining speaker identity when the registered speech and the verification speech originate from different recording devices, codecs, sampling rates, or acoustic environments. In this scenario, different channels introduce nonlinear distortion of varying degree and type into the audio signal. This not only undermines the stability of the voiceprint model but, under a deep learning framework, also causes a mismatch between inference features and training features, which in turn reduces the model's noise resistance and invalidates the discrimination threshold.

Disclosure of Invention

Accordingly, the present application is directed to a voiceprint recognition method, device, electronic equipment, and storage medium that convert the registration voice and the verification voice into the same channel type, thereby improving cross-channel voiceprint recognition performance, resolving the feature mismatch problem, keeping the model's noise resistance stable, and preventing the discrimination threshold from becoming invalid.
In a first aspect, an embodiment of the present application provides a voiceprint recognition method, where the method includes: acquiring an initial registration voice and an initial verification voice; determining a target channel characteristic of an initial voice by adopting a preset channel characteristic determination mode that corresponds to the prior information state of the initial channel type of the initial voice, wherein the initial voice is the initial registration voice or the initial verification voice; and performing channel conversion on the initial registration voice according to the target channel characteristic corresponding to the initial registration voice, and/or performing channel conversion on the initial verification voice according to the target channel characteristic corresponding to the initial verification voice, to obtain a target registration voice and a target verification voice under the same channel type for voiceprint recognition.
In a possible implementation manner, determining the target channel characteristic of the initial voice by adopting the preset channel characteristic determination mode that corresponds to the prior information state includes: if the prior information state is a known state, determining the target channel characteristic corresponding to the initial voice according to a voice sample under the initial channel type; and if the prior information state is an unknown state, determining the target channel characteristic corresponding to the initial voice according to the channel characteristics corresponding to each preset channel type in a pre-established channel characteristic database.
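
The selection between the two determination modes described above can be sketched as a small dispatcher. This is only an illustration under stated assumptions: the names `PriorState`, `determine_target_channel_feature`, `estimate_from_samples`, and `lookup_in_database` are hypothetical, the two strategies are passed in as callables because the text here does not fix their internals, and a channel characteristic is assumed to be a sequence of numbers.

```python
from enum import Enum
from typing import Callable, Optional, Sequence

# Hypothetical representation: one value per frequency point.
Feature = Sequence[float]


class PriorState(Enum):
    KNOWN = "known"      # the initial channel type is known in advance
    UNKNOWN = "unknown"  # no prior information about the initial channel type


def determine_target_channel_feature(
    initial_voice: Sequence[float],
    prior_state: PriorState,
    estimate_from_samples: Callable[[Sequence[Sequence[float]]], Feature],
    lookup_in_database: Callable[[Sequence[float]], Feature],
    channel_samples: Optional[Sequence[Sequence[float]]] = None,
) -> Feature:
    """Pick the determination mode that matches the prior information state."""
    if prior_state is PriorState.KNOWN:
        # Known state: derive the characteristic from voice samples
        # recorded under the same initial channel type.
        if not channel_samples:
            raise ValueError("a known channel type requires voice samples from that channel")
        return estimate_from_samples(channel_samples)
    # Unknown state: fall back to the pre-established channel characteristic database.
    return lookup_in_database(initial_voice)
```

Passing the strategies as parameters keeps the branch logic separate from how each mode actually computes or retrieves a characteristic.
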
In one possible implementation manner, determining the target channel characteristic corresponding to the initial voice according to the voice sample under the initial channel type includes: calculating a first energy spectrum of the voice sample, wherein a first energy value of each voice frame in the first energy spectrum represents the degree to which noise affects the energy distribution of that voice frame; selecting, from the voice sample and according to the first energy spectrum, a first key voice frame that satisfies a voice frame screening rule; and calculating the target channel characteristic corresponding to the initial voice according to the first energy value of the first key voice frame in the first energy spectrum.
In one possible implementation manner, determining the target channel characteristic corresponding to the initial voice according to the channel characteristics corresponding to each preset channel type in the pre-established channel characteristic database includes: inputting the initial voice into a channel classification model to obtain the initial channel type corresponding to the initial voice; and querying the channel characteristic database for the target channel characteristic corresponding to the initial voice according to the initial channel type corresponding to the initial voice.
In one possible implementation manner, the determining the target channel characteristic corresponding to the initial voice according to the channel characteristic