CN-122024723-A - Power dispatching voice sample enhancement method, system, equipment and medium
Abstract
The invention discloses a power dispatching voice sample enhancement method, system, equipment and medium. The method comprises: obtaining high-frequency service vocabulary in power dispatching scenes and generating virtual power dispatching instruction texts based on the high-frequency service vocabulary and a preset large language model; performing privacy desensitization on sensitive entities in the virtual instruction texts to obtain target virtual entities; collecting real power dispatching environmental sounds; generating compliance instruction texts for each power dispatching scene according to the target virtual entities and the virtual instruction texts; performing acoustic style migration corresponding to the real power dispatching environmental sounds on the compliance instruction texts to generate candidate enhanced voice texts; inputting the candidate enhanced voice texts into a pre-trained speech recognition model for confidence evaluation; screening the candidate enhanced voice texts according to the confidence evaluation results; and generating target enhanced voice texts according to the screening results. The method improves the recognizability of the generated samples in power dispatching speech.
Inventors
- KE GUOFU
- HU JINGDONG
- REN YAN
- ZHANG YANBIN
- JIANG CHENGLONG
Assignees
- 广州广哈通信股份有限公司 (Guangzhou Guangha Communication Co., Ltd.)
Dates
- Publication Date
- 20260512
- Application Date
- 20260212
Claims (10)
- 1. A power dispatching voice sample enhancement method, comprising: acquiring high-frequency service vocabulary in a power dispatching scene, and generating a virtual power dispatching instruction text based on the high-frequency service vocabulary and a preset large language model; identifying a sensitive entity in the virtual instruction text, and performing privacy desensitization on the sensitive entity to obtain a target virtual entity, wherein the target virtual entity is associated with the sensitive entity in phonetic features but not in semantic features; collecting real power dispatching environmental sounds of the power dispatching scenes, and generating compliance instruction texts of the power dispatching scenes according to the target virtual entity and the virtual instruction text; performing acoustic style migration corresponding to the real power dispatching environmental sounds on the compliance instruction texts to generate candidate enhanced voice texts; and inputting the candidate enhanced voice texts into a pre-trained speech recognition model for confidence evaluation, screening the candidate enhanced voice texts according to the confidence evaluation results, and generating target enhanced voice texts according to the screening results.
- 2. The power dispatching voice sample enhancement method of claim 1, wherein acquiring the high-frequency service vocabulary in the power dispatching scene and generating the virtual power dispatching instruction text based on the high-frequency service vocabulary and the large language model comprises: acquiring power dispatching industry regulations and historical service logs, and extracting high-frequency power dispatching service vocabulary from them based on word-frequency statistics; constructing a power-specific rule base from the high-frequency service vocabulary together with the business process specifications, typical fault scene characteristics and operation instruction execution logic in the power dispatching industry regulations; and inputting the power-specific rule base into the preset large language model in the form of prompt constraints to generate virtual power dispatching instruction texts covering each typical fault scene.
- 3. The power dispatching voice sample enhancement method of claim 1, wherein performing privacy desensitization on the sensitive entity to obtain the target virtual entity comprises: performing phonetic feature extraction on the sensitive entity to obtain its phonetic features; constructing a candidate virtual entity set, and selecting from it a candidate entity subset whose syllable length equals that of the sensitive entity; calculating the acoustic distance between each candidate virtual entity in the candidate entity subset and the sensitive entity; and screening the candidate entities based on the acoustic distance to obtain the target virtual entity.
- 4. The power dispatching voice sample enhancement method of claim 3, wherein the phonetic features include place-of-articulation features of the initial, opening-degree and articulatory-glide features of the final, tone category features and tone duration features; and constructing the candidate virtual entity set includes: selecting, from a basic library of Chinese phonetic features, general vocabulary covering different places of articulation of initials and different opening degrees and glides of finals; filtering the general vocabulary by syllable length to obtain target vocabulary matching the syllable length of each sensitive entity; and performing secondary screening on the target vocabulary based on the semantic features of each sensitive entity to generate the candidate virtual entity set.
- 5. The power dispatching voice sample enhancement method of claim 1, wherein performing the acoustic style migration corresponding to the real power dispatching environmental sounds on the compliance instruction text to generate the candidate enhanced voice text comprises: converting the compliance instruction text into a source-domain base voice signal by a speech synthesis method; and constructing and training a voice migration model, and inputting the source-domain base voice signal into the trained voice migration model to generate the candidate enhanced voice text.
- 6. The power dispatching voice sample enhancement method of claim 5, wherein the training process of the voice migration model comprises: constructing an initial voice migration model, wherein the source domain data of the initial voice migration model comprise the source-domain base voice signals and the target domain data comprise real power dispatching environmental sound signals; inputting the source-domain base voice signal into a generator of the initial voice migration model to obtain an intermediate generated voice signal; extracting features from the source-domain base voice signal and from the intermediate generated voice signal, respectively, to obtain a first deep semantic feature map and a second deep semantic feature map; constructing a loss function based on the Euclidean distance between the first and second deep semantic feature maps and on the real power dispatching environmental sound signals; and iteratively training the initial voice migration model based on the loss function to obtain the trained voice migration model.
- 7. The power dispatching voice sample enhancement method of claim 1, wherein screening the candidate enhanced voice texts according to the confidence evaluation results and generating the target enhanced voice texts according to the screening results comprises: selecting, as hard samples, the candidate enhanced voice texts whose recognition confidence in the confidence evaluation results is lower than a preset confidence threshold; and obtaining real power dispatching voice data, mixing the hard samples with the real power dispatching voice data in a preset ratio, and generating the target enhanced voice texts accordingly.
- 8. A power dispatching voice sample enhancement system, comprising: an acquisition module, configured to acquire high-frequency service vocabulary in a power dispatching scene and to generate a virtual power dispatching instruction text based on the high-frequency service vocabulary and a preset large language model; a recognition module, configured to identify a sensitive entity in the virtual instruction text and to perform privacy desensitization on the sensitive entity to obtain a target virtual entity, wherein the target virtual entity is associated with the sensitive entity in phonetic features but not in semantic features; a collection module, configured to collect real power dispatching environmental sounds of the power dispatching scenes and to generate compliance instruction texts of the power dispatching scenes according to the target virtual entity and the virtual instruction text; a migration module, configured to perform acoustic style migration corresponding to the real power dispatching environmental sounds on the compliance instruction texts to generate candidate enhanced voice texts; and a generation module, configured to input the candidate enhanced voice texts into a pre-trained speech recognition model for confidence evaluation, to screen the candidate enhanced voice texts according to the confidence evaluation results, and to generate target enhanced voice texts according to the screening results.
- 9. A power dispatching voice sample enhancement device, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the power dispatching voice sample enhancement method according to any one of claims 1 to 7.
- 10. A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a device in which the computer-readable storage medium is located, the power dispatching voice sample enhancement method according to any one of claims 1 to 7 is implemented.
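Claim 2 does not define the word-frequency statistics rule or the prompt format. As a minimal sketch, assuming the industry regulations and historical service logs have already been word-segmented into token lists, the vocabulary extraction and the rendering of the power-specific rule base as prompt constraints could look like this. The function names, the `top_k`/`min_count` cutoffs and the prompt layout are hypothetical, and the call to the large language model itself is omitted.

```python
from collections import Counter
from typing import Dict, Iterable, List


def extract_high_frequency_vocab(
    tokenized_docs: Iterable[List[str]],
    top_k: int = 50,
    min_count: int = 2,
    stopwords: frozenset = frozenset(),
) -> List[str]:
    """Return up to top_k tokens that occur at least min_count times overall."""
    counts = Counter(
        token for doc in tokenized_docs for token in doc if token not in stopwords
    )
    # most_common orders by descending count; ties keep first-seen order.
    return [word for word, count in counts.most_common(top_k) if count >= min_count]


def build_prompt(vocab: List[str], rule_base: Dict[str, List[str]], scene: str) -> str:
    """Render the vocabulary and rule base as plain-text prompt constraints."""
    lines = [
        f"Scenario: {scene}",
        "Use only these dispatching terms: " + ", ".join(vocab),
    ]
    for section, items in rule_base.items():
        lines.append(f"{section}:")
        lines.extend(f"  - {item}" for item in items)
    lines.append("Write one dispatching instruction that satisfies every constraint above.")
    return "\n".join(lines)


# Illustrative, already-segmented toy documents (not real dispatching data).
docs = [["close", "breaker", "maintenance"], ["close", "breaker"], ["open", "breaker"]]
vocab = extract_high_frequency_vocab(docs, top_k=5, min_count=2)
print(vocab)  # ['breaker', 'close']

prompt = build_prompt(
    vocab, {"Typical fault scenes": ["grid fault", "tripping accident"]}, "emergency repair"
)
print(prompt)
```

One prompt is built per typical fault scene, so looping over the scenes in the rule base would yield instruction texts covering each of them, as claim 2 requires.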
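Claims 3 and 4 describe matching candidates by syllable length and ranking them by acoustic distance, but do not define the distance. The sketch below assumes each syllable is encoded as a tuple of three of the claimed features (place of articulation of the initial, opening degree of the final, tone category) and uses a weighted feature-mismatch count as the acoustic distance. The encoding, the weights and all names are hypothetical; tone duration and the articulatory glide are left out, and the semantic secondary screening of claim 4 is assumed to have already produced `candidates`.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-syllable features:
# (place of articulation of the initial, opening degree of the final, tone category)
Syllable = Tuple[str, str, int]

# Illustrative weights for the three features; tone mismatches count for less.
WEIGHTS = (1.0, 1.0, 0.5)


def acoustic_distance(a: Sequence[Syllable], b: Sequence[Syllable]) -> float:
    """Weighted count of mismatching features, compared syllable by syllable."""
    if len(a) != len(b):
        raise ValueError("entities must have the same syllable length")
    return sum(
        (w for sa, sb in zip(a, b) for fa, fb, w in zip(sa, sb, WEIGHTS) if fa != fb),
        0.0,
    )


def select_target_entity(sensitive: Sequence[Syllable],
                         candidates: Dict[str, List[Syllable]]) -> str:
    """Keep length-matched candidates, then return the acoustically closest one."""
    subset = {name: feats for name, feats in candidates.items()
              if len(feats) == len(sensitive)}
    if not subset:
        raise LookupError("no candidate with a matching syllable length")
    # Ties are broken alphabetically so the result is deterministic.
    return min(subset, key=lambda name: (acoustic_distance(subset[name], sensitive), name))


sensitive = [("alveolar", "open", 1), ("velar", "closed", 4)]
candidates = {
    "cand_a": [("alveolar", "open", 2), ("velar", "closed", 4)],   # tone differs once
    "cand_b": [("bilabial", "open", 1), ("alveolar", "closed", 4)],  # two initials differ
    "cand_c": [("alveolar", "open", 1)],                            # wrong length, dropped
}
print(select_target_entity(sensitive, candidates))  # cand_a
```

A feature-mismatch count is only a simple stand-in; a production system might instead compare learned acoustic embeddings, and might also reject exact homophones so the original name is not reproduced audibly.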
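Claim 6 names the ingredients of the loss (a Euclidean distance between the two deep semantic feature maps, and a term involving the real environmental sound signals) but not its form. One plausible reading, assuming an adversarial setup in which a discriminator $D$ is trained on the real power dispatching environmental sound signals $y_{\mathrm{env}}$, pairs a content term with a least-squares GAN term. Here $x_s$ is the source-domain base voice signal, $G(x_s)$ the intermediate generated voice signal, and $\phi(x_s)$, $\phi(G(x_s))$ the first and second deep semantic feature maps; $G$, $\phi$, $D$, the weights $\lambda_c$, $\lambda_a$ and the least-squares form are assumptions, not part of the claim.

```latex
\begin{aligned}
\mathcal{L}_G &= \lambda_c \,\bigl\lVert \phi(x_s) - \phi\bigl(G(x_s)\bigr) \bigr\rVert_2
  + \lambda_a \,\mathbb{E}_{x_s}\Bigl[\bigl(D\bigl(G(x_s)\bigr) - 1\bigr)^2\Bigr], \\
\mathcal{L}_D &= \mathbb{E}_{y_{\mathrm{env}}}\Bigl[\bigl(D(y_{\mathrm{env}}) - 1\bigr)^2\Bigr]
  + \mathbb{E}_{x_s}\Bigl[D\bigl(G(x_s)\bigr)^2\Bigr].
\end{aligned}
```

Under this reading, the content term keeps the instruction content intact (the two feature maps should stay close), while the adversarial term pushes the generated speech toward the acoustic style of the real dispatching environment; $G$ and $D$ would be updated alternately during the iterative training.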
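Claim 7 leaves the confidence scale and the meaning of the "preset ratio" open. As a minimal sketch, assuming each candidate carries an utterance-level recognition confidence in [0, 1] and that the ratio is the share of hard samples in the final mixed set, the screening and mixing steps could be written as follows. The function names, the threshold value and the fixed random seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def select_hard_samples(scored: Sequence[Tuple[str, float]], threshold: float) -> List[str]:
    """Return the IDs of candidates whose recognition confidence is below threshold."""
    return [sample_id for sample_id, confidence in scored if confidence < threshold]


def mix_training_set(hard: Sequence[str], real: Sequence[str],
                     hard_ratio: float, seed: int = 0) -> List[str]:
    """Add hard samples to the real data so they form about hard_ratio of the result.

    Every real sample is kept; if too few hard samples exist, all of them are used.
    """
    if not 0.0 <= hard_ratio < 1.0:
        raise ValueError("hard_ratio must be in [0, 1)")
    # Solve n_hard / (len(real) + n_hard) = hard_ratio for n_hard.
    wanted = round(len(real) * hard_ratio / (1.0 - hard_ratio))
    n_hard = min(len(hard), wanted)
    rng = random.Random(seed)  # fixed seed for reproducible mixing
    mixed = list(real) + rng.sample(list(hard), n_hard)
    rng.shuffle(mixed)
    return mixed


scored = [("s1", 0.95), ("s2", 0.42), ("s3", 0.61), ("s4", 0.30)]
hard = select_hard_samples(scored, threshold=0.6)
print(hard)  # ['s2', 's4']

real = [f"r{i}" for i in range(1, 7)]
mixed = mix_training_set(hard, real, hard_ratio=0.25)
print(len(mixed))  # 8
```

With six real samples and a ratio of 0.25, two hard samples are needed (2 of 8), which here is exactly the number available.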
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular to a power dispatching voice sample enhancement method, system, equipment and medium.
Background
With the deepening construction of smart power grids, Automatic Speech Recognition (ASR) has become a key enabler of human-machine interaction in power dispatching, and is widely used in scenarios such as real-time transcription of dispatching instructions, anti-misoperation verification and digital log generation. The performance of mainstream end-to-end speech recognition models (such as Transformer and Conformer) depends heavily on massive amounts of labeled speech data. Power dispatching data, however, has a pronounced long-tail distribution: routine switching operation instructions account for an extremely high share, while high-value speech samples from key scenarios such as grid faults, tripping accidents and emergency repairs are extremely scarce. As a result, the model's recognition accuracy drops sharply in real operational scenarios such as sudden accidents, and emergency command requirements are difficult to meet. Existing data expansion schemes often generate virtual samples with text-to-speech (TTS) synthesis to expand and enhance power dispatching data.
However, synthesized speech is clearly articulated and has a clean background, whereas a real dispatching hall has complex background noise such as telephone rings, keyboard tapping and multiple people talking, and dispatchers in emergency scenes speak with a rapid, tense intonation. Because of these large acoustic differences, a speech recognition model trained only on synthesized speech cannot adapt to the complex acoustic characteristics of the real dispatching environment, and its recognition accuracy is difficult to guarantee.
Disclosure of Invention
The invention provides a power dispatching voice sample enhancement method, system, equipment and medium, which solve the technical problem of how to improve existing power dispatching voice sample enhancement methods and thereby improve the recognition of generated samples in power dispatching speech.
To solve the above technical problem, one aspect of the present invention provides a power dispatching voice sample enhancement method, comprising: acquiring high-frequency service vocabulary in a power dispatching scene, and generating a virtual power dispatching instruction text based on the high-frequency service vocabulary and a preset large language model; identifying a sensitive entity in the virtual instruction text, and performing privacy desensitization on the sensitive entity to obtain a target virtual entity, wherein the target virtual entity is associated with the sensitive entity in phonetic features but not in semantic features; collecting real power dispatching environmental sounds of the power dispatching scenes, and generating compliance instruction texts of the power dispatching scenes according to the target virtual entity and the virtual instruction text; performing acoustic style migration corresponding to the real power dispatching environmental sounds on the compliance instruction texts to generate candidate enhanced voice texts; and inputting the candidate enhanced voice texts into a pre-trained speech recognition model for confidence evaluation, screening the candidate enhanced voice texts according to the confidence evaluation results, and generating target enhanced voice texts according to the screening results.
As a preferred solution, acquiring the high-frequency service vocabulary in the power dispatching scene and generating the virtual power dispatching instruction text based on the high-frequency service vocabulary and the large language model includes: acquiring power dispatching industry regulations and historical service logs, and extracting high-frequency power dispatching service vocabulary from them based on word-frequency statistics; constructing a power-specific rule base from the high-frequency service vocabulary together with the business process specifications, typical fault scene characteristics and operation instruction execution logic in the power dispatching industry regulations; and inputting the power-specific rule base into the preset large language model in the form of prompt constraints to generate virtual power dispatching instruction texts covering each typical fault scene.
As a preferred solution, performing privacy desensitization on the sensitive entity to obtain a target virtual entity includes: Per