
CN-122024706-A - Call data intelligent analysis processing method and system based on voice recognition

CN 122024706 A

Abstract

The invention relates to the technical field of data analysis and processing, and in particular to a call data intelligent analysis and processing method and system based on voice recognition. The method comprises the following steps: performing multi-head self-attention encoding learning on a semantic feature standardized vector and an interrupt feature standardized vector; inputting the encoding learning result into a decoder connected to the encoder; and performing feedback processing on response parameters of the call voice data according to a plurality of call processing duration parameters. The method solves the technical problem that long-distance dependence and multi-granularity rhythm variation in an interaction cannot be effectively captured, which misaligns the decoded call processing duration parameters and makes effective feedback regulation of the response parameters difficult to implement.

Inventors

  • Hu Chenqing
  • You Yifeng
  • Tian Haitian
  • Tian Ye

Assignees

  • Shenzhen Yinfutong Enterprise Management Consulting Co., Ltd. (深圳市银服通企业管理咨询有限公司)

Dates

Publication Date
2026-05-12
Application Date
2026-02-06

Claims (8)

  1. A call data intelligent analysis processing method based on voice recognition, characterized by comprising the following steps: collecting call voice data, and performing statement recognition and timestamp marking on the call voice data to obtain a statement time sequence mapping sequence; performing interrupt duration calculation on the statement time sequence mapping sequence, storing it as a standardized statement time sequence mapping sequence, and extracting a semantic feature standardized vector and an interrupt feature standardized vector of the standardized statement time sequence mapping sequence; in a pre-trained Transformer model, performing multi-head self-attention encoding learning on the semantic feature standardized vector and the interrupt feature standardized vector by a multi-layer encoder of the Transformer model, and inputting the encoding learning result into a decoder connected to the encoder, wherein the decoder is used to decode and obtain a plurality of call processing duration parameters; and performing feedback processing on response parameters of the call voice data according to the plurality of call processing duration parameters.
  2. The method of claim 1, wherein performing multi-head self-attention encoding learning on the semantic feature standardized vector and the interrupt feature standardized vector by the multi-layer encoder of the Transformer model comprises: each layer of the multi-layer encoder comprises a multi-head self-attention mechanism, a feedforward network layer and a residual network layer, wherein the multi-head self-attention mechanism comprises a plurality of attention heads for learning a user voice cutoff response duration characteristic, a call data processing waiting duration characteristic and a call feedback rhythm variation duration characteristic; and splicing the semantic feature standardized vector and the interrupt feature standardized vector to obtain a spliced feature standardized vector, and performing multi-head self-attention encoding learning on the spliced feature standardized vector using the multi-head self-attention mechanism.
  3. The method of claim 2, wherein performing multi-head self-attention encoding learning on the spliced feature standardized vector using the multi-head self-attention mechanism comprises: each attention head of the multi-head self-attention mechanism calculating a linear mapping matrix from the spliced feature standardized vector, the linear mapping matrix comprising a key vector feature matrix, a value vector feature matrix and a query vector feature matrix; performing attention weight calculation on the linear mapping matrix to obtain an attention weight matrix; performing multi-head splicing on the weighted output features of the attention weight matrices to obtain a multi-head self-attention spliced output sequence; and processing the multi-head self-attention spliced output sequence through the feedforward network layer and the residual network layer to obtain a multi-head self-attention encoded output sequence, which is output as the encoding learning result.
  4. The method of claim 3, wherein after obtaining the encoding learning result, the method comprises: transmitting the multi-head self-attention encoded output sequence to the decoder, the decoder comprising a plurality of prediction heads corresponding to the multi-head self-attention mechanism for predicting a user voice cutoff response duration, a call data processing waiting duration and a call feedback rhythm variation duration; and determining a convergence state when the mean square error of the plurality of prediction heads is smaller than a preset threshold, and obtaining the plurality of call processing duration parameters decoded and output by the decoder in the convergence state.
  5. The method of claim 1, wherein after collecting the call voice data, the method further comprises: performing semantic scene recognition on the call voice data to obtain a plurality of semantic scenes; dividing the call voice data into a plurality of segments of call voice data according to the plurality of semantic scenes; and storing the plurality of segments of call voice data as a plurality of standardized statement time sequence mapping subsequences respectively, and performing feedback processing on the response parameters of the call voice data according to the plurality of call processing duration parameters corresponding to each standardized statement time sequence mapping subsequence.
  6. The method of claim 5, wherein after obtaining the plurality of semantic scenes, a plurality of effective query sequence lengths are configured according to the plurality of semantic scenes; and, according to the plurality of effective query sequence lengths, each self-attention head of each encoder layer truncates each standardized statement time sequence mapping subsequence to the corresponding effective query sequence length when computing over it.
  7. The method of claim 5, wherein after obtaining the plurality of semantic scenes, the method further comprises: generating gating weights for the multi-head self-attention mechanism according to the plurality of semantic scenes; correcting the linear mapping matrix calculated for each semantic scene according to the gating weights to generate an attention distribution corresponding to each semantic scene; and updating the encoding learning result according to the attention distribution to obtain a plurality of updated call processing duration parameters.
  8. A call data intelligent analysis processing system based on voice recognition, characterized in that the system implements the call data intelligent analysis processing method based on voice recognition of any one of claims 1 to 7, the system comprising: a call voice data collection module for collecting call voice data, and performing statement recognition and timestamp marking on the call voice data to obtain a statement time sequence mapping sequence; a vector extraction module for performing interrupt duration calculation on the statement time sequence mapping sequence, storing it as a standardized statement time sequence mapping sequence, and extracting a semantic feature standardized vector and an interrupt feature standardized vector of the standardized statement time sequence mapping sequence; an encoding learning module for pre-training a Transformer model, performing multi-head self-attention encoding learning on the semantic feature standardized vector and the interrupt feature standardized vector by a multi-layer encoder of the Transformer model, inputting the encoding learning result into a decoder connected to the encoder, and decoding by the decoder to obtain a plurality of call processing duration parameters; and a feedback processing module for performing feedback processing on response parameters of the call voice data according to the plurality of call processing duration parameters.
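The front end of the pipeline in claims 1 and 5 timestamps utterances, computes interruption durations between them, and standardizes the resulting features before encoding. A minimal sketch of that step, assuming utterances arrive as (speaker, start, end, text) records and that "standardized" means z-score normalization; neither the record layout nor the normalization is specified by the claims:

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    speaker: str   # e.g. "agent" or "caller"
    start: float   # timestamp mark, in seconds
    end: float
    text: str      # statement produced by speech recognition

def interrupt_durations(seq):
    """Gap between consecutive utterances; a negative gap means overlap,
    i.e. the next speaker interrupted before the previous utterance ended."""
    return [cur.start - prev.end for prev, cur in zip(seq, seq[1:])]

def standardize(values):
    """Z-score standardization, one plausible reading of 'standardized vector'."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    std = std if std else 1.0  # guard against a constant sequence
    return [(v - mean) / std for v in values]

calls = [
    Utterance("caller", 0.0, 2.5, "..."),
    Utterance("agent", 3.1, 5.0, "..."),
    Utterance("caller", 4.8, 6.2, "..."),  # starts before the agent finishes
]
gaps = interrupt_durations(calls)
print([round(g, 2) for g in gaps])  # second gap is negative: an interruption
```

The interrupt-feature vector fed to the encoder would be `standardize(gaps)` alongside the semantic features extracted from each `text`.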

Description

Call data intelligent analysis processing method and system based on voice recognition

Technical Field

The invention relates to the technical field of data analysis and processing, and in particular to a call data intelligent analysis and processing method and system based on voice recognition.

Background

In scenarios such as customer service, emergency dispatch and intelligent agent seats, a call center generates massive voice interaction data every day. Efficiently extracting behavioral characteristics such as response delay, waiting duration and conversation rhythm from unstructured voice, and converting them into quantifiable processing duration parameters, has become a core requirement for improving service efficiency, optimizing resource allocation and improving user experience. However, existing call analysis focuses on keyword recognition or emotion classification and lacks fine-grained modeling of sentence time sequence structure and interruption behavior. Even when time sequence analysis is introduced, the voice transcription text is often the only input, and non-semantic features carrying interaction intent, such as silence intervals and turn-switching rhythm, are ignored, so long-distance dependence and multi-granularity rhythm are difficult to capture effectively; the predicted call processing parameters therefore generalize poorly and cannot support fine-grained operational decisions. In summary, call analysis in the prior art ignores non-semantic features carrying interaction intent and cannot effectively capture long-distance dependence and multi-granularity rhythm variation in an interaction, so that the decoded call processing duration parameters are misaligned and effective feedback regulation of the response parameters is difficult to implement.
Disclosure of Invention

The application provides a call data intelligent analysis processing method and system based on voice recognition, aiming to solve the technical problem that call analysis in the prior art ignores non-semantic features carrying interaction intent and cannot effectively capture long-distance dependence and multi-granularity rhythm variation in an interaction, so that the decoded call processing duration parameters are misaligned and effective feedback regulation of the response parameters is difficult to implement. In view of the above problems, the technical scheme of the application is as follows. The application provides a call data intelligent analysis processing method based on voice recognition, which comprises: collecting call voice data, and performing statement recognition and timestamp marking on the call voice data to obtain a statement time sequence mapping sequence; performing interrupt duration calculation on the statement time sequence mapping sequence, storing it as a standardized statement time sequence mapping sequence, and extracting a semantic feature standardized vector and an interrupt feature standardized vector of the standardized statement time sequence mapping sequence; in a pre-trained Transformer model, performing multi-head self-attention encoding learning on the semantic feature standardized vector and the interrupt feature standardized vector by a multi-layer encoder of the Transformer model, and inputting the encoding learning result into a decoder connected to the encoder, the decoder decoding to obtain a plurality of call processing duration parameters; and performing feedback processing on response parameters of the call voice data according to the call processing duration parameters.
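The final step feeds the decoded duration parameters back into the call's response parameters. The patent does not specify the control law, so the sketch below is purely illustrative: the parameter names (`response_delay_s`, `max_wait_s`, and the decoded keys) and the update rules are assumptions, not taken from the disclosure.

```python
def adjust_response_params(params, predicted):
    """Feedback adjustment of call response parameters from decoded
    duration parameters. Keys and update rules are illustrative only."""
    adjusted = dict(params)
    # If the model predicts users cut in faster than the current response
    # delay, shorten the delay (with a small safety margin).
    if predicted["cutoff_response_s"] < params["response_delay_s"]:
        adjusted["response_delay_s"] = predicted["cutoff_response_s"] * 0.9
    # Cap the hold duration at the predicted tolerable processing wait.
    adjusted["max_wait_s"] = min(params["max_wait_s"],
                                 predicted["processing_wait_s"])
    return adjusted

current = {"response_delay_s": 1.5, "max_wait_s": 30.0}
# Three decoded duration parameters, mirroring the three prediction heads
# (cutoff response, processing wait, rhythm variation) named in claim 4.
decoded = {"cutoff_response_s": 0.8, "processing_wait_s": 20.0,
           "rhythm_variation_s": 0.3}
print(adjust_response_params(current, decoded))
```

In a deployed system this loop would run per semantic scene (claim 5), so each call segment gets its own adjusted response parameters.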
In a possible implementation manner, each layer of the multi-layer encoder comprises a multi-head self-attention mechanism, a feedforward network layer and a residual network layer, wherein the multi-head self-attention mechanism comprises a plurality of attention heads for learning a user voice cutoff response duration characteristic, a call data processing waiting duration characteristic and a call feedback rhythm variation duration characteristic; the semantic feature standardized vector and the interrupt feature standardized vector are spliced to obtain a spliced feature standardized vector, and the multi-head self-attention mechanism is used to perform multi-head self-attention encoding learning on the spliced feature standardized vector. In a possible implementation manner, each attention head of the multi-head self-attention mechanism calculates a linear mapping matrix from the spliced feature standardized vector, the linear mapping matrix comprising a key vector feature matrix, a value vector feature matrix and a query vector feature matrix; attention weight calculation is performed on the linear mapping matrix to obtain an attention weight matrix, and multi-head splicing is performed on the weighted output features of the attention weight matrices to obtain a multi-head self-attention spliced output sequence.
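The per-head query/key/value mapping, attention weight calculation, and multi-head splicing described above follow the standard Transformer layout. A minimal numerical sketch with random (untrained) mapping weights, assuming the spliced feature standardized vector is a `(sequence length, feature dim)` matrix; a real encoder would learn the mappings and add the feedforward and residual layers:

```python
import numpy as np

def multi_head_self_attention(x, num_heads, rng):
    """Scaled dot-product multi-head self-attention over a spliced
    feature sequence x of shape (seq_len, d_model)."""
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0
    d_k = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        # Per-head linear mappings: query, key, value feature matrices.
        Wq, Wk, Wv = (rng.standard_normal((d_model, d_k)) * 0.1
                      for _ in range(3))
        Q, K, V = x @ Wq, x @ Wk, x @ Wv
        scores = Q @ K.T / np.sqrt(d_k)
        # Row-wise softmax -> attention weight matrix.
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        heads.append(w @ V)                  # weighted value output per head
    return np.concatenate(heads, axis=-1)    # multi-head splice

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 8))  # 6 utterances, 8-dim spliced features
out = multi_head_self_attention(x, num_heads=2, rng=rng)
print(out.shape)  # same shape as the input sequence
```

Each head can then specialize, as the claims suggest, toward a different duration characteristic (cutoff response, processing wait, rhythm variation), with the spliced output passed on to the feedforward and residual layers.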