CN-121999781-A - Voice recognition and sensitive word detection method

Abstract

The invention relates to a voice recognition and sensitive word detection method comprising the following steps: collecting voice signals, preprocessing them, and extracting acoustic features; constructing a voice recognition model and generating corresponding text data based on the acoustic features; constructing a sensitive word library and converting it into the state diagram of a deterministic finite automaton (DFA); and traversing the text data to detect sensitive words according to the DFA state diagram. The method maintains recognition accuracy for domain-specific key terms in speech that contains noise, accents, or homophone interference, and reduces missed keywords and homophone substitution errors, which improves the reliability of downstream detection at the source. The deterministic matching mechanism of the finite automaton also reduces computational cost and matching jitter and yields more stable hit positions and context fragments.

Inventors

WU QIAN
Huang Qiongxia
WANG FENG
Wu Baomei
Zhang Guihuang
YE LINPING
CHEN NINGZHAO
CHEN ZHENGTONG
WU HUIYING
WANG TIANLONG
XIE JINGYU
LIU JIAHAO

Assignees

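State Grid Fujian Electric Power Co., Ltd. Information and Communication Branch (国网福建省电力有限公司信息通信分公司)
State Grid Fujian Electric Power Co., Ltd. (国网福建省电力有限公司)
Fujian Yili Information Technology Co., Ltd. (福建省亿力信息技术有限公司)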

Dates

Publication Date: 20260508
Application Date: 20260304

Claims (10)

1. A voice recognition and sensitive word detection method, characterized by comprising the following steps: collecting voice signals, preprocessing them, and extracting acoustic features; constructing a voice recognition model and generating corresponding text data based on the acoustic features; constructing a sensitive word library and building it into a deterministic finite automaton (DFA) state diagram; and traversing the text data to detect sensitive words according to the DFA state diagram.
2. The voice recognition and sensitive word detection method according to claim 1, wherein preprocessing the voice signal includes denoising and endpoint detection.
3. The voice recognition and sensitive word detection method according to claim 2, wherein the collected voice signal is denoised by spectral subtraction with a smoothing mechanism, comprising the following steps: in the noise estimation stage, calculating the maximum noise frame [formula not reproduced in the source text], where the quantities defined for the formula are the maximum noise frame, the total duration of the voice signal, the noise estimate at a given moment, and the mean of the noise estimates over the preceding moments; and, after spectral subtraction has been applied to the voice signal, if the frequency value of any frame is smaller than the frequency value of the maximum noise frame, replacing that frame's frequency value with the minimum frequency value among its adjacent frames.
4. The voice recognition and sensitive word detection method according to claim 2, wherein endpoint detection is performed on the voice signal to identify its start and end positions, with the following steps: calculating the spectral entropy of the voice signal, extracting the voice segments whose spectral entropy is greater than a preset spectral entropy threshold, and retaining the remaining voice segments as a first voice segment; dividing the extracted voice segments into a plurality of subsegments, calculating the short-time zero-crossing rate of each subsegment, retaining the subsegments whose short-time zero-crossing rate is smaller than a preset short-time zero-crossing threshold, and combining them to obtain a second voice segment; and combining the first voice segment and the second voice segment to obtain the complete voice segment.
5. The method of claim 1, wherein Mel-frequency cepstral coefficient (MFCC) features of the voice signal are extracted.
6. The method of claim 1, wherein the voice recognition model is constructed based on a hidden Markov model, and the voice signal is converted into text data by the Viterbi algorithm.
7. The voice recognition and sensitive word detection method according to claim 6, wherein a hot word set is constructed, the hot word set including a plurality of preset sensitive words; and, for the set of observation sequences generated by the hidden Markov model, it is determined whether each observation sequence contains a hot word from the hot word set, and an additional probability bonus is assigned to any observation sequence that does.
8. The voice recognition and sensitive word detection method according to claim 1, wherein, in the DFA state diagram, each sensitive word corresponds to a path that starts from an initial state and passes through a plurality of intermediate states to reach an end state, and at least one state transition is defined for each character of the sensitive word during path construction; and wherein traversing the text data to detect sensitive words according to the DFA state diagram comprises the following steps: initializing the DFA state diagram and reading the text data character by character; for each character, looking up the next state in the DFA state diagram according to the current state and the character read, and updating the current state to the next state if one is found; if the current state reaches an end state, reporting a sensitive word and recording its position; when a sensitive word is found, continuing detection from the character that follows the sensitive word; and ending detection when all characters of the text data have been processed.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 8 when the program is executed by the processor.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any one of claims 1 to 8.
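As an illustration of claims 6 and 7, the following is a minimal sketch of Viterbi decoding over a hidden Markov model, followed by a hot-word bonus applied to candidate transcripts. It is not the patent's implementation: the function names `viterbi` and `boost_hot_words`, the dictionary-based model layout, the log-domain scoring, and the `bonus` value are all assumptions made for the example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _log(p: float) -> float:
    """Log of a probability, mapping zero to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs: Sequence[str], states: Sequence[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> Tuple[List[str], float]:
    """Return the most likely state path and its log-probability.

    Assumes obs is non-empty and that trans_p covers every state pair.
    """
    # best[t][s]: best log-probability of any path ending in state s at time t
    best = [{s: _log(start_p[s]) + _log(emit_p[s].get(obs[0], 0.0))
             for s in states}]
    back: List[Dict[str, str]] = []  # back[t-1][s]: best predecessor of s at t
    for o in obs[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] + _log(trans_p[r][s]))
            scores[s] = (best[-1][prev] + _log(trans_p[prev][s])
                         + _log(emit_p[s].get(o, 0.0)))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


def boost_hot_words(candidates: List[Tuple[str, float]],
                    hot_words: Sequence[str],
                    bonus: float = 1.0) -> List[Tuple[str, float]]:
    """Add a log-score bonus (assumed value) to candidates containing a hot word."""
    return [(text, score + bonus if any(w in text for w in hot_words) else score)
            for text, score in candidates]
```

In this sketch the bonus is added in the log domain, so it acts as a multiplicative boost on the candidate's probability; the size of the bonus and how candidates are generated are left open by the claims.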

Description

Voice recognition and sensitive word detection method

Technical Field

The invention relates to a voice recognition and sensitive word detection method and belongs to the technical field of natural speech processing.

Background

In electric power service scenarios, voice interaction and call center recordings are used for business acceptance, consultation, fault repair reporting, and similar tasks. To support subsequent retrieval, quality inspection, and structured processing of voice content, speech generally has to be transcribed into text in real time or offline, and specific terms (such as sensitive words, offensive words, and high-risk sentence fragments) have to be detected and located in the text automatically. This processing falls under computer-implemented data processing, including voice signal processing, speech recognition decoding, and text pattern matching.

Common prior-art schemes transcribe the recording directly with a general-purpose speech recognition model and then detect sensitive words in the transcript using regular expressions, naive string matching, or keyword matching after word segmentation, or perform coarse-grained triggering with a simple keyword list. In practical engineering applications, these schemes generally have the following disadvantages:

Low recognition rate for domain words and proper nouns. General-purpose speech recognition models have insufficient coverage of proper nouns, abbreviations, place names, person names, equipment names, and business terms in the electric power service domain. Homophone substitution and missed recognition occur easily, recall drops, and the errors are amplified by the subsequent sensitive word detection.
Misrecognition caused by noise, accents, and call-link distortion. Recordings often contain background noise, echo, overlapping speech from two speakers, and band-limited compression distortion. Traditional decoding strategies are not robust enough when selecting candidate sequences for uncertain segments, so segments containing keywords are misrecognized or segmented at the wrong position.

False alarms and unstable positioning in text matching. Regular-expression and naive matching easily produce false alarms and misses when faced with isomorphic forms, homophone substitution, inserted words, and variant expressions (near-synonym substitution, homophonic wordplay, colloquialisms). In long texts or streaming transcription, a matching algorithm that lacks state reuse and failure-rollback optimization incurs extra computational cost and gives unstable positioning results.

Disclosure of Invention

To solve the problems in the prior art, the invention provides a voice recognition and sensitive word detection method.

The technical scheme of the invention is as follows.

In one aspect, the invention provides a voice recognition and sensitive word detection method comprising the following steps: collecting voice signals, preprocessing them, and extracting acoustic features; constructing a voice recognition model and generating corresponding text data based on the acoustic features; constructing a sensitive word library and building it into a deterministic finite automaton (DFA) state diagram; and traversing the text data to detect sensitive words according to the DFA state diagram.

Preferably, preprocessing the voice signal includes denoising and endpoint detection.
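The state-diagram construction and text traversal steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the nested-dictionary representation, the names `build_state_diagram`, `detect`, and `END`, and the longest-match rule are choices made for the example. The scan restarts from the initial state at each text position and does not use failure links.

```python
from typing import Dict, List, Tuple

END = "__end__"  # hypothetical marker key for an end (accepting) state


def build_state_diagram(words: List[str]) -> Dict:
    """Build a nested-dict state diagram with one path per sensitive word."""
    root: Dict = {}
    for word in words:
        state = root
        for ch in word:
            # One state transition per character of the sensitive word.
            state = state.setdefault(ch, {})
        state[END] = True
    return root


def detect(text: str, root: Dict) -> List[Tuple[int, str]]:
    """Return (start_index, word) for each sensitive word found in text."""
    hits: List[Tuple[int, str]] = []
    i = 0
    while i < len(text):
        state = root
        match_len = 0
        j = i
        # Follow transitions character by character from the initial state.
        while j < len(text) and text[j] in state:
            state = state[text[j]]
            j += 1
            if END in state:
                match_len = j - i  # assumption: keep the longest match
        if match_len:
            hits.append((i, text[i:i + match_len]))
            i += match_len  # continue from the character after the word
        else:
            i += 1  # no word starts here; restart at the next character
    return hits
```

For example, with the hypothetical word list `["bad", "badge", "evil"]`, calling `detect("a badge is evil", build_state_diagram(["bad", "badge", "evil"]))` reports `"badge"` at index 2 and `"evil"` at index 11. Recording the start index alongside the word gives the hit position that the method outputs.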
Preferably, the collected voice signal is denoised by spectral subtraction with a smoothing mechanism, with the following steps: in the noise estimation stage, the maximum noise frame is calculated [formula not reproduced in the source text], where the quantities defined for the formula are the maximum noise frame, the total duration of the voice signal, the noise estimate at a given moment, and the mean of the noise estimates over the preceding moments; after spectral subtraction has been applied to the voice signal, if the frequency value of any frame is smaller than the frequency value of the maximum noise frame, that frame's frequency value is replaced with the minimum frequency value among its adjacent frames.

Preferably, endpoint detection is performed on the voice signal to identify its start and end positions, with the following steps: calculating the spectral entropy of the voice signal, extracting the voice segments whose spectral entropy is greater than a preset spectral entropy threshold, and retaining the remaining voice segments as a first voice segment; dividing the extracted voice segments into a plurality of subsegments, calculating the short-time zero-crossing rate of each subsegment, retaining the subsegments whose short-time zero-crossing rate is smaller than a preset short-time zero-crossing threshold, an