CN-122001980-A - Intelligent outbound method, intelligent outbound device, computer equipment and storage medium

Abstract

The invention relates to the field of voice processing and discloses an intelligent outbound method, device, computer equipment and storage medium. The method comprises: acquiring a client telephone voice stream when an intelligent outbound task is executed, and slicing the client telephone voice stream to obtain a plurality of client voice segments; calculating the voice fidelity of each client voice segment and, from it, the round fidelity of an interaction round; processing the voice segments in the interaction round through an AI voice classification model to obtain an AI voice probability; and executing a preset hang-up procedure when the round fidelity is smaller than a preset fidelity threshold and the AI voice probability is larger than a preset AI voice threshold. The invention improves the accuracy and efficiency with which an intelligent outbound system identifies and handles calls answered by an AI, saves call resources and operating cost, and can be applied to financial technology and healthcare business scenarios.
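The dual-threshold hang-up condition stated above can be illustrated with a minimal Python sketch. It assumes the round fidelity is the arithmetic mean of the per-segment fidelity scores (as claim 5 describes) and that both scores lie in [0, 1]. The function names and the threshold values are hypothetical; the source only says the thresholds are "preset".

```python
from statistics import mean

# Hypothetical values: the source does not disclose the preset thresholds.
FIDELITY_THRESHOLD = 0.4
AI_VOICE_THRESHOLD = 0.8


def round_fidelity(segment_fidelities):
    """Average the per-segment voice fidelity scores of one interaction round."""
    if not segment_fidelities:
        raise ValueError("an interaction round needs at least one segment")
    return mean(segment_fidelities)


def should_hang_up(segment_fidelities, ai_voice_probability,
                   fidelity_threshold=FIDELITY_THRESHOLD,
                   ai_voice_threshold=AI_VOICE_THRESHOLD):
    """Return True only when BOTH conditions hold:
    round fidelity below its threshold AND AI voice probability above its threshold.
    """
    low_fidelity = round_fidelity(segment_fidelities) < fidelity_threshold
    likely_ai = ai_voice_probability > ai_voice_threshold
    return low_fidelity and likely_ai
```

Requiring both conditions means a single noisy score is not enough to end a call: a low round fidelity with a modest AI voice probability, or the reverse, keeps the conversation going.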

Inventors

CHEN XIAOJIAN
ZHENG ZHE

Assignees

Ping An Technology (Shenzhen) Co., Ltd. (平安科技（深圳）有限公司)

Dates

Publication Date: 2026-05-08
Application Date: 2026-01-08

Claims (10)

1. An intelligent outbound method, comprising: when an intelligent outbound task is executed, acquiring a client telephone voice stream through a preset audio acquisition protocol, and slicing the client telephone voice stream by a preset time window to obtain a plurality of client voice segments; performing voice fidelity recognition on the plurality of client voice segments through a voice fidelity analysis model based on a support vector machine to obtain the voice fidelity of each client voice segment; when the end point of an interaction round is detected, calculating the round fidelity of the interaction round according to the voice fidelity within the interaction round; and executing a preset hang-up procedure when the round fidelity is smaller than a preset fidelity threshold and the AI voice probability is larger than a preset AI voice threshold.
2. The intelligent outbound method according to claim 1, wherein acquiring the client telephone voice stream through a preset audio acquisition protocol and slicing the client telephone voice stream by a preset time window to obtain a plurality of client voice segments when the intelligent outbound task is executed comprises: initiating an outbound call through an automatic dialer; after the call is connected, acquiring the client telephone voice stream through the SIP protocol or the WebRTC protocol; slicing the client telephone voice stream using a specified duration as the preset time window to obtain a plurality of initial client voice segments; scoring the plurality of initial client voice segments through a segment quality scoring model to obtain a quality score for each initial client voice segment; and determining each initial client voice segment whose quality score is larger than a preset quality score threshold as a client voice segment.
3. The intelligent outbound method according to claim 2, wherein scoring the plurality of initial client voice segments through a segment quality scoring model to obtain a quality score for each initial client voice segment comprises: performing signal-to-noise ratio analysis on the initial client voice segment to obtain signal-to-noise ratio data; detecting the voice intensity and the voice duration of the initial client voice segment to obtain voice intensity and duration data; extracting voice activity features from the initial client voice segment through a voice activity detection algorithm; and performing weighted scoring on the signal-to-noise ratio data, the voice intensity and duration data, and the voice activity features to obtain the quality score of the initial client voice segment.
4. The intelligent outbound method according to claim 1, wherein before performing voice fidelity recognition on the plurality of client voice segments through the voice fidelity analysis model based on a support vector machine to obtain the voice fidelity of each client voice segment, the method further comprises: acquiring an AI voice sample and a non-AI voice sample; performing first preprocessing on the AI voice sample to obtain a preprocessed AI voice sample; performing second preprocessing on the non-AI voice sample to obtain a preprocessed non-AI voice sample; performing feature extraction and feature concatenation on the preprocessed AI voice sample to obtain an AI voice feature vector; performing feature extraction and feature concatenation on the preprocessed non-AI voice sample to obtain a non-AI voice feature vector; inputting the AI voice feature vector and the non-AI voice feature vector into an initial support vector machine model for training; and determining the trained support vector machine model as the voice fidelity analysis model.
5. The intelligent outbound method according to claim 1, wherein calculating, when the end point of an interaction round is detected, the round fidelity of the interaction round according to the voice fidelity within the interaction round comprises: determining the time point at which the client is detected to start speaking as the start point of the interaction round; determining the time point at which continuous silence after the client finishes speaking exceeds a first preset threshold as the end point of the interaction round; and obtaining the voice fidelity of all client voice segments in the interaction round and averaging them to obtain the round fidelity.
6. The intelligent outbound method according to claim 1, wherein processing the voice segments within the interaction round through an AI voice classification model to obtain the AI voice probability of the interaction round comprises: extracting basic acoustic features of the voice segments, the basic acoustic features comprising Mel-frequency cepstral coefficients, fundamental frequency (F0) and spectral envelope; calculating derived features from the basic acoustic features, the derived features comprising cosine similarity, fundamental frequency variance, rhythm entropy and spectral change rate; and determining the AI voice probability according to the derived features.
7. The intelligent outbound method according to claim 1, wherein after executing the preset hang-up procedure, the method further comprises: recording the AI identification result of the call and writing it to the call behavior log of the client; if multiple AI identification results in the call behavior log meet a preset AI-answering criterion, adding an AI-answering label to the user profile of the client; and adjusting the call strategy for the client according to the AI-answering label.
8. An intelligent outbound device, comprising: a voice stream acquisition module, configured to acquire a client telephone voice stream through a preset audio acquisition protocol when an intelligent outbound task is executed, and to slice the client telephone voice stream by a preset time window to obtain a plurality of client voice segments; a voice fidelity analysis module, configured to perform voice fidelity recognition on the plurality of client voice segments through a voice fidelity analysis model based on a support vector machine to obtain the voice fidelity of each client voice segment; a dual-probability evaluation module, configured to calculate the round fidelity of an interaction round according to the voice fidelity within the interaction round when the end point of the interaction round is detected; and a hang-up response module, configured to execute a preset hang-up procedure when the round fidelity is smaller than a preset fidelity threshold and the AI voice probability is larger than a preset AI voice threshold.
9. A computer device comprising a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer readable instructions, implements the intelligent outbound method of any one of claims 1 to 7.
10. One or more readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the intelligent outbound method of any of claims 1 to 7.
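The slicing and segment-quality filtering recited in claims 2 and 3 can be sketched as follows. This is an illustrative sketch, not the patented implementation: it assumes the three measurements (signal-to-noise ratio, intensity and duration, voice activity) have already been normalised to [0, 1] by some upstream analysis, and the weights, the threshold and all identifiers are hypothetical.

```python
from dataclasses import dataclass


def slice_stream(samples, sample_rate, window_seconds):
    """Split a sequence of PCM samples into fixed-length time windows.

    The final window may be shorter than the others.
    """
    window = int(sample_rate * window_seconds)
    if window <= 0:
        raise ValueError("window must cover at least one sample")
    return [samples[i:i + window] for i in range(0, len(samples), window)]


@dataclass
class SegmentMetrics:
    # All three fields are assumed to be normalised to [0, 1] upstream.
    snr: float                 # signal-to-noise ratio measure
    intensity_duration: float  # combined voice intensity / duration measure
    voice_activity: float      # e.g. fraction of frames a VAD marks as speech


# Hypothetical weights and threshold: the source gives no concrete values.
WEIGHTS = (0.4, 0.3, 0.3)
QUALITY_THRESHOLD = 0.6


def quality_score(metrics, weights=WEIGHTS):
    """Weighted sum of the three per-segment measurements."""
    w_snr, w_int, w_vad = weights
    return (w_snr * metrics.snr
            + w_int * metrics.intensity_duration
            + w_vad * metrics.voice_activity)


def select_segments(segments, metrics, threshold=QUALITY_THRESHOLD):
    """Keep only the initial segments whose quality score exceeds the threshold."""
    return [seg for seg, m in zip(segments, metrics)
            if quality_score(m) > threshold]
```

Filtering before classification keeps silent or noisy windows from diluting the per-round average that the later steps compute.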

Description

Intelligent outbound method, intelligent outbound device, computer equipment and storage medium

Technical Field

The present invention relates to the field of speech processing, and in particular to an intelligent outbound method, device, computer equipment and storage medium.

Background

With the rapid development of artificial intelligence technology, AI speech synthesis has been widely applied to intelligent outbound systems, and investment by technology giants such as Microsoft, Google and Amazon has continued to raise the level of outbound automation. However, existing systems still face several technical bottlenecks in practical use. With the spread of smartphone assistants (e.g., Siri and comparable handset assistants) and customer-service AI robots, more and more outbound calls are answered by AI voice robots. Current intelligent outbound systems lack an effective mechanism for identifying whether the answering party is a real person or an AI voice assistant, so the system mistakes the robot for a real person and keeps executing a lengthy dialogue flow, producing a large number of invalid conversations. Such invalid interaction not only prolongs call duration and wastes system resources, but also increases communication cost and the need for manual intervention, seriously affecting outbound efficiency and operating economics. In addition, traditional voice recognition systems rely on keyword matching or shallow voice features, so they struggle to distinguish highly human-like AI synthesized voice from real voice, and recognition accuracy is low. When the voice recognition system and the voice robot fall into a response loop or invalid waiting, overall service quality drops significantly.
In the financial technology field in particular, intelligent outbound calling is widely used for credit collection, account verification and risk alerts; if the system cannot identify that the answering party is an AI assistant, sensitive information may leak or compliance risks may arise. In the smart elderly-care field, outbound systems are used for health follow-up visits and emergency notifications for the elderly; misjudging the answering party can delay a real response to elderly people living alone, affecting service reliability and the protection of life safety. Therefore, an intelligent outbound method is needed that effectively distinguishes AI synthesized voice from real voice, optimizes the outbound flow, reduces operating cost, and lets the intelligent outbound system run efficiently.

Disclosure of Invention

In view of the above technical problems, it is necessary to provide an intelligent outbound method, device, computer equipment and storage medium that effectively distinguish AI synthesized voice from real voice and optimize the outbound flow.
An intelligent outbound method comprises: when an intelligent outbound task is executed, acquiring a client telephone voice stream through a preset audio acquisition protocol, and slicing the client telephone voice stream by a preset time window to obtain a plurality of client voice segments; performing voice fidelity recognition on the plurality of client voice segments through a voice fidelity analysis model based on a support vector machine to obtain the voice fidelity of each client voice segment; when the end point of an interaction round is detected, calculating the round fidelity of the interaction round according to the voice fidelity within the interaction round; and executing a preset hang-up procedure when the round fidelity is smaller than a preset fidelity threshold and the AI voice probability is larger than a preset AI voice threshold.

Optionally, acquiring the client telephone voice stream through a preset audio acquisition protocol and slicing it by a preset time window to obtain a plurality of client voice segments when the intelligent outbound task is executed comprises: initiating an outbound call through an automatic dialer; after the call is connected, acquiring the client telephone voice stream through the SIP protocol or the WebRTC protocol; slicing the client telephone voice stream using a specified duration as the preset time window to obtain a plurality of initial client voice segments; scoring the plurality of initial client voice segments through a segment quality scoring model to obtain a quality score for each initial client voice segment; and determining each initial client voice segment whose quality score is larger than a preset quality score threshold as a client voice segment.

Optionally, the scoring the plurality of initial client speech segments by a segment qua