CN-122024741-A - Apparatus, method and computer readable medium for identifying voiceprints


Abstract

A device, a method, and a computer readable medium for quickly identifying voiceprints based on spatiotemporal context are provided. After a spatiotemporal tag synchronized with a voice signal is obtained, the current context is determined according to the spatiotemporal tag, candidate voiceprint models associated with the current context are selected from a voiceprint database, and the voiceprints in the voice signal are identified according to the candidate voiceprint models. This reduces the number of models used for voiceprint comparison and thereby achieves the technical effect of improving the speed and accuracy of voiceprint identification.

Inventors

QIU QUANCHENG
YU TAO
YAN DAWEI
JIANG QIULONG

Assignees

上海顺诠科技有限公司
英业达股份有限公司

Dates

Publication Date: 2026-05-12
Application Date: 2026-02-12

Claims (9)

1. A method for quickly identifying voiceprints based on spatiotemporal context, which is applied to a device and comprises at least the following steps: establishing a voiceprint database, wherein the voiceprint database comprises a plurality of voiceprint models, a plurality of spatiotemporal contexts, and a degree of association between each voiceprint model and each spatiotemporal context; acquiring a voice signal and a spatiotemporal tag synchronized with the voice signal; selecting a current context from the plurality of spatiotemporal contexts according to the spatiotemporal tag; screening a plurality of candidate voiceprint models from the plurality of voiceprint models, wherein the degree of association between each of the plurality of candidate voiceprint models and the current context is higher than an association threshold; and identifying voiceprints in the voice signal according to the plurality of candidate voiceprint models.
2. The method of claim 1, wherein the step of establishing the voiceprint database further comprises the step of determining the degree of association between each of the voiceprint models and each of the spatiotemporal contexts based on spatiotemporal data associated with each of the voiceprint models.
3. The method of claim 1, wherein the step of acquiring the voice signal and the spatiotemporal tag synchronized with the voice signal is: receiving the voice signal and the spatiotemporal tag obtained by a voice acquisition device; or receiving the voice signal and the spatiotemporal tag obtained by different data acquisition devices and synchronizing the voice signal and the spatiotemporal tag; or receiving the voice signal obtained by another voice acquisition device, extracting an acoustic fingerprint of the voice signal, and determining spatial data according to the acoustic fingerprint to generate the spatiotemporal tag.
4. The method of claim 1, wherein the step of acquiring the voice signal further comprises the step of separating a plurality of sub-voice signals from the voice signal according to characteristics of the voice signal, and the step of identifying voiceprints in the voice signal according to the plurality of candidate voiceprint models is identifying the voiceprint of each of the sub-voice signals according to the plurality of candidate voiceprint models.
5. The method of claim 4, further comprising, after the step of identifying voiceprints in the voice signal according to the plurality of candidate voiceprint models, the steps of recognizing the voice content of each of the sub-voice signals, and determining social relationships among the speakers who produced the sub-voice signals according to the recognized voice content.
6. The method of claim 1, wherein the step of selecting the current context from the plurality of spatiotemporal contexts based on the spatiotemporal tag further comprises the steps of extracting temporal data and spatial data from the spatiotemporal tag, providing the temporal data and the spatial data to an inference model to calculate an inference probability for each spatiotemporal context, and selecting the spatiotemporal context having the inference probability above a predetermined threshold as the current context.
7. The method of claim 1, further comprising the step of adjusting the association threshold based on the reliability or integrity of the spatiotemporal tag before the step of screening the plurality of candidate voiceprint models from the plurality of voiceprint models.
8. An apparatus for quickly identifying voiceprints based on spatiotemporal context, the apparatus comprising: a voiceprint database for recording a plurality of voiceprint models, a plurality of spatiotemporal contexts, and a degree of association between each voiceprint model and each spatiotemporal context; a data acquisition module for acquiring a voice signal and a spatiotemporal tag synchronized with the voice signal; a context determination module for selecting a current context from the plurality of spatiotemporal contexts according to the spatiotemporal tag; a model selection module for screening a plurality of candidate voiceprint models from the plurality of voiceprint models, each of the plurality of candidate voiceprint models having a degree of association with the current context higher than an association threshold; and a voiceprint recognition module for identifying voiceprints in the voice signal according to the plurality of candidate voiceprint models.
9. A computer readable medium having stored thereon a computer program which, when executed by an apparatus, causes the apparatus to perform the method of fast recognition of voiceprints based on a spatiotemporal context of any one of claims 1 to 7.
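
By way of non-limiting illustration, the flow recited in claims 1, 6 and 7 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation disclosed by the applicant: every name (`VoiceprintModel`, `select_context`, `adjust_threshold`, `screen_candidates`, `identify`), the use of cosine similarity on fixed-length embeddings, the multiplicative threshold adjustment, and all numeric values are hypothetical. The inference model of claim 6 is represented only by its output, a dictionary of probabilities per context.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class VoiceprintModel:
    speaker: str
    embedding: list[float]  # assumed fixed-length speaker embedding
    # context name -> degree of association (assumed to lie in [0, 1])
    association: dict[str, float] = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; an assumed stand-in for the voiceprint comparison."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_context(probabilities: dict[str, float], min_probability: float = 0.5):
    """Pick the most probable context if it exceeds the threshold, else None."""
    context, p = max(probabilities.items(), key=lambda kv: kv[1])
    return context if p > min_probability else None


def adjust_threshold(base: float, tag_reliability: float) -> float:
    """Assumed rule: a less reliable tag lowers the threshold, widening the candidate set."""
    return base * max(0.0, min(1.0, tag_reliability))


def screen_candidates(models, context, threshold):
    """Keep only models whose association with the current context exceeds the threshold."""
    return [m for m in models if m.association.get(context, 0.0) > threshold]


def identify(embedding, candidates, min_score: float = 0.7):
    """Compare only against the candidates; return the best speaker or None."""
    best = max(candidates, key=lambda m: cosine(embedding, m.embedding), default=None)
    if best is None or cosine(embedding, best.embedding) < min_score:
        return None
    return best.speaker


# Hypothetical database: "carol" sounds similar to "alice" but is rarely at the office.
database = [
    VoiceprintModel("alice", [1.0, 0.0, 0.0], {"office_weekday": 0.9, "home_evening": 0.1}),
    VoiceprintModel("bob", [0.0, 1.0, 0.0], {"office_weekday": 0.8, "home_evening": 0.2}),
    VoiceprintModel("carol", [0.9, 0.1, 0.0], {"office_weekday": 0.05, "home_evening": 0.95}),
]

probabilities = {"office_weekday": 0.85, "home_evening": 0.15}  # assumed inference output
context = select_context(probabilities)
threshold = adjust_threshold(0.5, tag_reliability=1.0)
candidates = screen_candidates(database, context, threshold)
speaker = identify([0.95, 0.05, 0.0], candidates)
print(context, [m.speaker for m in candidates], speaker)
# prints: office_weekday ['alice', 'bob'] alice
```

In this example the model for "carol" is acoustically close to the query but is excluded before comparison because its degree of association with the selected context is below the threshold, which illustrates how screening may reduce both the number of comparisons and the risk of false acceptance.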

Description

Apparatus, method and computer readable medium for identifying voiceprints

Technical Field

The invention relates to a voiceprint recognition device and method and a computer readable medium thereof, and in particular to a device, a method, and a computer readable medium for quickly recognizing voiceprints based on spatiotemporal context.

Background

With the popularization of artificial intelligence and biometric recognition technologies, voiceprint recognition has been widely applied in fields such as identity verification, smart homes, and conference recording. Conventional voiceprint recognition technology is mostly based on the acoustic characteristics of the voice signal itself, for example comparing the voiceprint characteristics of speakers through Mel-frequency cepstral coefficients (MFCC), i-vectors, x-vectors, or deep neural network models. Such technology usually assumes that all voiceprint models have the same prior when recognizing voiceprints, so a full comparison against all voiceprint models in a database is often required. However, when the database size reaches tens of thousands or even hundreds of thousands of models, full comparison can cause significant latency and is prone to false acceptance caused by similar voiceprint features.

Although some existing improvements record timestamps or geographic location information for voice data, the recorded information is usually kept only as a supplementary record, for example by treating time or place information as metadata of an audio file, and is used only for subsequent file retrieval or for after-the-fact backtracking analysis. That is, even after such improvements, voiceprint recognition still requires a complete or large-scale comparison against the voiceprint models, which results in a huge amount of computation, a long response time, and a tendency toward misjudgment in complex scenes.

In view of the foregoing, the prior art has long suffered from insufficient voiceprint recognition speed and a high misjudgment rate in complex situations, and an improved technical means is therefore needed to solve these problems.

Disclosure of Invention

In view of the problems of insufficient voiceprint recognition speed and a high misjudgment rate in complex situations in the prior art, the invention discloses a device, a method, and a computer readable medium for quickly recognizing voiceprints based on spatiotemporal context, wherein:

The device disclosed by the invention for quickly identifying voiceprints based on spatiotemporal context comprises at least a voiceprint database, a data acquisition module, a context determination module, a model selection module, and a voiceprint recognition module. The voiceprint database is used for recording a plurality of voiceprint models, a plurality of spatiotemporal contexts, and the degree of association between each voiceprint model and each spatiotemporal context. The data acquisition module is used for acquiring a voice signal and a spatiotemporal tag synchronized with the voice signal. The context determination module is used for selecting a current context from the plurality of spatiotemporal contexts according to the spatiotemporal tag. The model selection module is used for screening a plurality of candidate voiceprint models from the plurality of voiceprint models, wherein the degree of association between each of the candidate voiceprint models and the current context is higher than an association threshold. The voiceprint recognition module is used for identifying voiceprints in the voice signal according to the plurality of candidate voiceprint models.

The method disclosed by the invention for quickly identifying voiceprints based on spatiotemporal context comprises at least the steps of: establishing a voiceprint database; acquiring a voice signal and a spatiotemporal tag synchronized with the voice signal; selecting a current context from the plurality of spatiotemporal contexts according to the spatiotemporal tag; screening a plurality of candidate voiceprint models from the plurality of voiceprint models, wherein the degree of association between each of the candidate voiceprint models and the current context is higher than an association threshold; and identifying voiceprints in the voice signal according to the plurality of candidate voiceprint models.

The computer readable medium disclosed by the invention has stored thereon a computer program which, when executed by an apparatus, causes the apparatus to implement the above-described method of quickly identifying voiceprints based on spatiotemporal context.

The device, the method, and the computer readable medium disclosed by the invention are as described above. The difference from the prior art is that, after the spatiotemporal tag synchronized with the voice signal is obtained, the cur