CN-122024758-A - Playback audio generation method, device, medium and equipment


Abstract

The embodiments of this specification disclose a playback audio generation method that sequentially applies low-pass filtering, noise addition, reverberation processing, and clipping to convert original audio data into simulated playback audio. Because the method simulates the key physical processes of a playback attack, the generated audio resembles a real playback attack in its acoustic characteristics and provides high-quality playback audio for an audio liveness detection model.

Inventors

WANG TAO
LIU JIAN
ZHANG CHANGHAO

Assignees

Alipay (Hangzhou) Digital Service Technology Co., Ltd. (支付宝(杭州)数字服务技术有限公司)

Dates

Publication Date: 2026-05-12
Application Date: 2026-01-08

Claims (10)

1. A playback audio generation method, comprising: acquiring original audio data; determining a low-pass frequency according to a frequency response range of a preset recording device, and performing low-pass filtering on the audio data to determine first intermediate data; selecting a noise coefficient to be mixed in from preset environmental noise coefficients, and performing noise addition on the first intermediate data to determine second intermediate data; generating an impulse response according to a preset attenuation factor and a preset reverberation time, and convolving the impulse response with the second intermediate data to determine third intermediate data; and determining a clipping peak value according to a working range of a preset playback device, and clipping the third intermediate data according to the clipping peak value to obtain playback audio.
2. The method of claim 1, wherein convolving the impulse response with the second intermediate data to determine the third intermediate data comprises: convolving the impulse response with the second intermediate data to determine a reverberation processing result; and performing dynamic range compression on the reverberation processing result according to a preset gain control parameter of the recording device to determine the third intermediate data.
3. The method of claim 1, wherein clipping the third intermediate data according to the clipping peak value to obtain the playback audio specifically comprises: clipping the third intermediate data according to the clipping peak value to obtain a clipping processing result; performing an autocorrelation calculation on the clipping processing result and the audio data, and determining a time offset caused by the convolution processing according to the position of the autocorrelation peak; and trimming the front section of the clipping processing result according to the time offset to obtain the playback audio.
4. The method of claim 3, wherein trimming the front section of the clipping processing result according to the time offset to obtain the playback audio specifically comprises: trimming the front section of the clipping processing result according to the time offset; and padding the tail of the trimmed audio according to the duration of the audio data to obtain playback audio whose duration matches that of the audio data.
5. The method of claim 1, wherein, before performing the noise addition on the first intermediate data, the method further comprises: determining an attenuation factor and a reverberation time according to preset acoustic parameters of a first recording environment, and generating a first impulse response; and convolving the first impulse response with the first intermediate data, so that the noise addition is performed on the reverberation data obtained by the convolution.
6. The method of claim 5, wherein generating the impulse response according to the preset attenuation factor and reverberation time, and convolving the impulse response with the second intermediate data to determine the third intermediate data, comprises: determining an attenuation factor and a reverberation time according to preset acoustic parameters of a second recording environment, and generating a second impulse response; and convolving the second impulse response with the second intermediate data to determine the third intermediate data.
7. The method of claim 1, wherein the original audio data is audio data on which liveness detection is to be performed, and the method further comprises: inputting the audio data and the playback audio into a preset liveness detection model, and determining a first audio feature of the audio data and a second audio feature of the playback audio, respectively, through a feature extraction layer of the liveness detection model; and determining a liveness detection result for the audio data through a twin network of the liveness detection model according to the first audio feature and the second audio feature.
8. An apparatus for playback audio generation, comprising: an acquisition module, configured to acquire original audio data; a recording simulation module, configured to determine a low-pass frequency according to a frequency response range of a preset recording device, and to perform low-pass filtering on the audio data to determine first intermediate data; an environment simulation module, configured to select a noise coefficient to be mixed in from preset environmental noise coefficients, and to perform noise addition on the first intermediate data to determine second intermediate data; a reverberation simulation module, configured to generate an impulse response according to a preset attenuation factor and a preset reverberation time, and to convolve the impulse response with the second intermediate data to determine third intermediate data; and a playback simulation module, configured to determine a clipping peak value according to a working range of a preset playback device, and to clip the third intermediate data according to the clipping peak value to obtain playback audio.
9. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of claims 1-7.
10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of claims 1-7 when executing the program.
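
The time-alignment steps of claims 3 and 4 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The function names, the brute-force lag search, the max_lag bound, and the zero padding of the tail are all assumptions made for illustration. The claims use the term autocorrelation; a correlation between two different signals, as computed here, is conventionally called cross-correlation.

```python
def correlation_lag(processed, reference, max_lag):
    """Return the lag (in samples) at which `processed` best matches `reference`.

    Assumption: the offset is searched by brute force over lags 0..max_lag.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlation score of the reference against the processed signal shifted by `lag`.
        score = sum(p * r for p, r in zip(processed[lag:], reference))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align_to_reference(processed, reference, max_lag):
    """Trim the leading offset, then pad or truncate to the reference length."""
    lag = correlation_lag(processed, reference, max_lag)
    trimmed = processed[lag:]          # cut the front section (claim 3)
    target = len(reference)
    if len(trimmed) < target:          # pad the tail (claim 4); zero padding is assumed
        trimmed = trimmed + [0.0] * (target - len(trimmed))
    return trimmed[:target]            # truncating an over-long tail is also an assumption
```

For example, if the processed signal is the reference delayed by three samples, correlation_lag reports a lag of 3 and align_to_reference returns a signal of the same length as the reference.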

Description

Playback audio generation method, device, medium and equipment

Technical Field

The present disclosure relates to the field of computer technologies, and in particular to a playback audio generation method, apparatus, storage medium, and device.

Background

In recent years, voice-based identity authentication and interaction technologies have become an application hotspot because of their convenience. To improve the security of voice interaction, audio liveness detection is required: by identifying the source of the audio data, the system judges whether the audio comes from the user in person or from a spoofing attack such as a recording or synthesized speech, which improves the security of service execution.

In the prior art, training a liveness detection model requires a large amount of playback audio as negative samples, which simulate the attack, seen in real attack scenarios, of replaying a recording of the original voice. Playback audio is typically obtained by playing the original speech through a playback device in a particular environment and then re-capturing it with a recording device. This approach is costly, however, and obtaining richer data usually requires recording in different environments with different playback and recording devices. Moreover, variables such as environmental noise, device status, and recording distance are difficult to control and reproduce precisely, so the resulting data set lacks consistency. The detection model trained on it suffers from poor generalization and a tendency to overfit.

Based on this, the present specification provides a playback audio generation method to partially solve the problems existing in the prior art.

Disclosure of Invention

Embodiments of the present disclosure provide a playback audio generation method, apparatus, storage medium, and electronic device, so as to partially solve the problems of the prior art.

The embodiments of this specification adopt the following technical solutions: A playback audio generation method provided in this specification comprises: acquiring original audio data; determining a low-pass frequency according to a frequency response range of a preset recording device, and performing low-pass filtering on the audio data to determine first intermediate data; selecting a noise coefficient to be mixed in from preset environmental noise coefficients, and performing noise addition on the first intermediate data to determine second intermediate data; generating an impulse response according to a preset attenuation factor and a preset reverberation time, and convolving the impulse response with the second intermediate data to determine third intermediate data; and determining a clipping peak value according to a working range of a preset playback device, and clipping the third intermediate data according to the clipping peak value to obtain playback audio.
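
The four processing steps above can be illustrated with a minimal, standard-library Python sketch. It is an illustration under stated assumptions, not the implementation described in this specification: the one-pole low-pass filter, the Gaussian noise model, the normalized exponential-decay impulse response, and every function name and default value are hypothetical choices.

```python
import math
import random


def low_pass(samples, cutoff_hz, sample_rate):
    """One-pole low-pass filter (assumed form); cutoff_hz stands in for the
    upper limit of the recording device's frequency response range."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)
    out, prev = [], 0.0
    for x in samples:
        prev += alpha * (x - prev)
        out.append(prev)
    return out


def add_noise(samples, noise_coeffs, rng):
    """Select one preset noise coefficient and add Gaussian noise scaled by it."""
    coeff = rng.choice(noise_coeffs)
    return [x + coeff * rng.gauss(0.0, 1.0) for x in samples]


def impulse_response(decay, reverb_time_s, sample_rate):
    """Exponentially decaying impulse response (assumed form), normalized to sum to 1."""
    n = max(1, int(reverb_time_s * sample_rate))
    taps = [math.exp(-decay * i / sample_rate) for i in range(n)]
    total = sum(taps)
    return [t / total for t in taps]


def convolve(signal, kernel):
    """Direct linear convolution; output length is len(signal) + len(kernel) - 1."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def clip(samples, peak):
    """Hard-limit every sample to the range [-peak, peak]."""
    return [max(-peak, min(peak, x)) for x in samples]


def simulate_playback(samples, sample_rate, cutoff_hz=4000.0,
                      noise_coeffs=(0.001, 0.005, 0.01), decay=200.0,
                      reverb_time_s=0.02, clip_peak=0.8, seed=0):
    """Chain the four steps: low-pass, noise addition, reverberation, clipping."""
    rng = random.Random(seed)
    first = low_pass(samples, cutoff_hz, sample_rate)        # first intermediate data
    second = add_noise(first, noise_coeffs, rng)             # second intermediate data
    ir = impulse_response(decay, reverb_time_s, sample_rate)
    third = convolve(second, ir)                             # third intermediate data
    return clip(third, clip_peak)                            # playback audio
```

Calling simulate_playback on a list of float samples returns a list that is longer than the input by the impulse response length minus one, which is the time extension that the later alignment step would remove. For real audio lengths, an FFT-based convolution would be preferable to the quadratic loop used here.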
An apparatus for playback audio generation provided in this specification comprises: an acquisition module, configured to acquire original audio data; a recording simulation module, configured to determine a low-pass frequency according to a frequency response range of a preset recording device, and to perform low-pass filtering on the audio data to determine first intermediate data; an environment simulation module, configured to select a noise coefficient to be mixed in from preset environmental noise coefficients, and to perform noise addition on the first intermediate data to determine second intermediate data; a reverberation simulation module, configured to generate an impulse response according to a preset attenuation factor and a preset reverberation time, and to convolve the impulse response with the second intermediate data to determine third intermediate data; and a playback simulation module, configured to determine a clipping peak value according to a working range of a preset playback device, and to clip the third intermediate data according to the clipping peak value to obtain playback audio.

A computer-readable storage medium provided in this specification stores a computer program that, when executed by a processor, implements the playback audio generation method described above.

An electronic device provided in this specification includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the playback audio generation method described above when executing the program.

At least one of the above technical solutions adopted in the embodiments of this specification can achieve the following beneficial effects: The embodiments of this specification disclose a playback audio generation method that sequentially performs low-pass filtering, noise addition, reverberation processing and clipping processing, and