CN-121999780-A - Bias method, device, equipment, storage medium and product based on pronunciation correlation


Abstract

The application relates to a biasing method, device, equipment, storage medium and product based on pronunciation correlation, in the field of speech recognition. The method comprises: receiving a preselected character inferred from target speech; determining, from pre-constructed character association data based on pronunciation correlation, candidate hotwords that have pronunciation correlation with the preselected character, using a pronunciation similarity above a specific threshold as the screening condition; and biasing the preselected character according to the candidate hotwords.

Inventors

REN YULING
ZHOU LONGPU
ZHAO JIANGJIANG
DU SHENG
YANG ZHENGZHE
LI QINGLONG
ZHANG SHIXIN
LIU YU
LIU RUIBO
DAI XIAOKANG

Assignees

中移在线服务有限公司 (China Mobile Online Services Co., Ltd.)
中国移动通信集团有限公司 (China Mobile Communications Group Co., Ltd.)

Dates

Publication Date: 2026-05-08
Application Date: 2025-12-22

Claims (10)

1. A biasing method based on pronunciation correlation, the method comprising: receiving a preselected character inferred from target speech; determining, from pre-constructed character association data based on pronunciation correlation, candidate hotwords having pronunciation correlation with the preselected character, using a pronunciation similarity above a specific threshold as a screening condition; and biasing the preselected character according to the candidate hotwords.
2. The method of claim 1, wherein the character association data based on pronunciation correlation is constructed by: constructing a training data set based on an annotated first set of speech data, wherein the training data set comprises a plurality of training audio pairs in units of characters, the phoneme sequences respectively corresponding to the two audios in each pair, and the pronunciation similarity of the two audios in each pair; training a pronunciation correlation model on the training data set so that the model outputs the similarity between audios; constructing a prediction data set based on an annotated second set of speech data, the prediction data set comprising a plurality of prediction audio pairs in units of characters; generating pronunciation similarities between characters in the prediction data set by using the pronunciation correlation model; and constructing, based on the pronunciation similarities, the character association data based on pronunciation correlation.
3. The method of claim 2, wherein the target speech and the second set of speech data are speech within a particular domain.
4. The method of claim 3, wherein the second set of speech data is smaller than the first set of speech data.
5. The method of claim 2, wherein constructing the prediction data set based on the annotated second set of speech data comprises: processing the annotated second set of speech data into prediction audios in units of characters; constructing, for each character, a prediction audio set from the plurality of prediction audios corresponding to that character; and combining the prediction audio sets corresponding to every two characters to construct the plurality of prediction audio pairs in units of characters.
6. The method of claim 2, wherein constructing the training data set based on the annotated first set of speech data comprises: processing the annotated first set of speech data into a plurality of training audio pairs in units of characters and the phoneme sequences respectively corresponding to the two audios in each pair; generating a similarity matrix for the two audios in each pair based on a plurality of frames in the phoneme sequences, and determining the pronunciation similarity of each pair based on the similarity matrix; and constructing the training data set according to the pronunciation similarity of each pair.
7. A biasing device based on pronunciation correlation, comprising a character receiving unit, a hotword determining unit and a character biasing unit, wherein: the character receiving unit is configured to receive a preselected character inferred from target speech; the hotword determining unit is configured to determine, from pre-constructed character association data based on pronunciation correlation, candidate hotwords having pronunciation correlation with the preselected character, using a pronunciation similarity above a specific threshold as a screening condition; and the character biasing unit is configured to bias the preselected character according to the candidate hotwords.
8. A computer device comprising a processor and a memory storing at least one computer program loaded and executed by the processor to implement the method of any one of claims 1 to 6.
9. A computer readable storage medium, characterized in that at least one computer program is stored in the computer readable storage medium, which computer program is loaded and executed by a processor to implement the method according to any of claims 1 to 6.
10. A computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer device, cause the computer device to perform the method of any of claims 1 to 6.
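
As an illustration only, the lookup and biasing steps recited in claim 1 might be sketched in Python as follows. The claims do not say how the bias is applied, so the additive score boost used here is an assumption, and the names `ASSOCIATIONS`, `candidate_hotwords` and `bias_scores`, the key "dai", and all similarity values are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical character association data:
# preselected character -> {hotword: pronunciation similarity}.
# All keys and scores are invented for illustration.
ASSOCIATIONS: Dict[str, Dict[str, float]] = {
    "dai": {"loan": 0.97, "bandwidth": 0.93, "code": 0.71},
}


def candidate_hotwords(
    char: str,
    associations: Dict[str, Dict[str, float]],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Return hotwords whose similarity to `char` exceeds `threshold`,
    most similar first. Unknown characters yield an empty list."""
    related = associations.get(char, {})
    hits = [(word, sim) for word, sim in related.items() if sim > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


def bias_scores(
    scores: Dict[str, float],
    candidates: List[Tuple[str, float]],
    weight: float,
) -> Dict[str, float]:
    """Assumed biasing rule: add weight * similarity to each candidate's
    recognition score. The input dictionary is not modified."""
    biased = dict(scores)
    for word, sim in candidates:
        biased[word] = biased.get(word, 0.0) + weight * sim
    return biased


if __name__ == "__main__":
    cands = candidate_hotwords("dai", ASSOCIATIONS, threshold=0.9)
    print(cands)
    print(bias_scores({"loan": -1.0, "bandwidth": -1.2}, cands, weight=2.0))
```

Keeping the threshold as a lookup-time parameter, rather than baking it into the stored data, matches the claim wording in which the threshold is a screening condition applied to pre-constructed data.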

Description

Bias method, device, equipment, storage medium and product based on pronunciation correlation

Technical Field

The embodiments of the application relate to the field of speech recognition, and in particular to a biasing method, device, equipment, storage medium and product based on pronunciation correlation.

Background

In traditional speech recognition, the correspondence between characters and pronunciations depends mainly on a pre-constructed pronunciation dictionary. When target speech to be recognized is received, preselected characters are inferred and taken as the biasing objects, and biasing is then performed according to preset hotword weights to guide the recognition result.

However, with the rapid spread of speech recognition across many fields and scenarios, different languages, dialects, specialized vocabularies and new internet words keep appearing, so that many words with similar pronunciation but different meanings are hard to recognize accurately. For example, the Chinese words translated here as "loan" and "bandwidth", or as "by-card" and "Fuka", have similar pronunciation but different semantics. This poses a great challenge to speech recognition approaches that rely mainly on manually constructed pronunciation dictionaries: their poor generalization and low efficiency make it difficult to meet ever higher requirements on recognition quality.

Disclosure of Invention

The embodiments of the application provide a biasing method, device, equipment, storage medium and product based on pronunciation similarity, which use pronunciation correlation for biasing during speech recognition, thereby improving the accuracy of character recognition and meeting higher requirements on speech recognition.

In one aspect, a biasing method based on pronunciation similarity is provided, the method comprising: receiving a preselected character inferred from target speech; determining, from pre-constructed character association data based on pronunciation correlation, candidate hotwords having pronunciation correlation with the preselected character, using a pronunciation similarity above a specific threshold as a screening condition; and biasing the preselected character according to the candidate hotwords.

In one embodiment, the method for constructing the character association data based on pronunciation correlation includes: constructing a training data set based on an annotated first set of speech data, wherein the training data set comprises a plurality of training audio pairs in units of characters, the phoneme sequences respectively corresponding to the two audios in each pair, and the pronunciation similarity of the two audios in each pair; training a pronunciation correlation model on the training data set so that the model outputs the similarity between audios; constructing a prediction data set based on an annotated second set of speech data, the prediction data set comprising a plurality of prediction audio pairs in units of characters; generating pronunciation similarities between characters in the prediction data set by using the pronunciation correlation model; and constructing, based on the pronunciation similarities, the character association data based on pronunciation correlation.

In one embodiment, the target speech and the second set of speech data are speech within a particular domain.

In one embodiment, the second set of speech data is smaller than the first set of speech data.

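The prediction-side construction described above can be sketched roughly as follows, as a minimal illustration and not the applicant's implementation. It assumes the character-level audios have already been cut out of the annotated speech, pairs the audio sets of every two characters, and averages a model's pairwise scores into one similarity per character pair. Averaging is an assumption (the source does not say how pair scores are aggregated), and `toy_model` is a hypothetical placeholder for the trained pronunciation correlation model.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Audio = Sequence[float]  # stand-in for one character-level audio clip
Model = Callable[[Audio, Audio], float]


def build_pairs(
    audio_sets: Dict[str, List[Audio]],
) -> Dict[Tuple[str, str], List[Tuple[Audio, Audio]]]:
    """Combine the audio sets of every two distinct characters into
    prediction audio pairs (Cartesian product of the two sets)."""
    chars = sorted(audio_sets)
    pairs = {}
    for i, a in enumerate(chars):
        for b in chars[i + 1:]:
            pairs[(a, b)] = list(product(audio_sets[a], audio_sets[b]))
    return pairs


def build_association(
    audio_sets: Dict[str, List[Audio]],
    model: Model,
) -> Dict[str, Dict[str, float]]:
    """Score every audio pair with the model and store the mean score
    symmetrically as the similarity between the two characters."""
    table: Dict[str, Dict[str, float]] = {c: {} for c in audio_sets}
    for (a, b), pairs in build_pairs(audio_sets).items():
        sim = mean(model(x, y) for x, y in pairs)
        table[a][b] = sim
        table[b][a] = sim
    return table


def toy_model(x: Audio, y: Audio) -> float:
    """Hypothetical placeholder: compares only the first sample value."""
    return 1.0 - abs(x[0] - y[0])


if __name__ == "__main__":
    sets = {"a": [[0.1], [0.2]], "b": [[0.15]], "c": [[0.9]]}
    print(build_association(sets, toy_model))
```

The table stores every pairwise similarity without filtering, so the threshold can still be applied later when candidate hotwords are looked up.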
In one embodiment, constructing the prediction data set based on the annotated second set of speech data includes: processing the annotated second set of speech data into prediction audios in units of characters; constructing, for each character, a prediction audio set from the plurality of prediction audios corresponding to that character; and combining the prediction audio sets corresponding to every two characters to construct the plurality of prediction audio pairs in units of characters.

In one embodiment, constructing the training data set based on the annotated first set of speech data includes: processing the annotated first set of speech data into a plurality of training audio pairs in units of characters and the phoneme sequences respectively corresponding to the two audios in each pair; generating a similarity matrix for the two audios in each pair based on a plurality of frames in the phoneme sequences, and determining the pronunciation similarity of each pair based on the similarity matrix; and constructing the training data set according to the pronunciation similarity of each pair.

In another aspect, a biasing device based on pronunciation correlation is provided, the device comprising a character receiving unit, a hotword determining unit, and a character biasing unit, wherein: the character receiving unit is configured to receive a preselected character inferred from target speech; A h