CN-116013317-B - Method, device, equipment and readable storage medium for improving audio generation quality

Abstract

The invention provides a method, a device, equipment and a readable storage medium for improving audio generation quality, relating to the technical field of speech conversion and recognition. The method comprises: acquiring low-sampling-rate audio, a target audio sampling rate and audio processing mathematical models; calculating initial high-sampling-rate audio from the low-sampling-rate audio, the target audio sampling rate and an audio preprocessing mathematical model; calculating a target audio time domain signal and a target audio wavelet coefficient from the initial high-sampling-rate audio and an audio signal reconstruction mathematical model; and calculating target high-sampling-rate audio from the target audio time domain signal, the target audio wavelet coefficient and a preset fusion audio signal mathematical model. The invention captures multi-scale details of the audio signal by using the discrete wavelet transform, and reconstructs high-sampling-rate audio by combining the time domain information and the frequency domain information of the audio signal, thereby improving the overall generation quality of the audio.

Inventors

RAN JIA
CHEN XIAOMING
CHEN WEN
ZUO WEI

Assignees

China CITIC Bank Corporation Limited (中信银行股份有限公司)

Dates

Publication Date: 20260505
Application Date: 20221205

Claims (12)

1. A method for improving audio generation quality, comprising: acquiring low-sampling-rate audio, a target audio sampling rate and audio processing mathematical models, wherein the audio processing mathematical models comprise an audio preprocessing mathematical model and an audio signal reconstruction mathematical model; calculating initial high-sampling-rate audio according to the low-sampling-rate audio, the target audio sampling rate and the audio preprocessing mathematical model; calculating a target audio time domain signal and a target audio wavelet coefficient according to the initial high-sampling-rate audio and the audio signal reconstruction mathematical model; and solving for target high-sampling-rate audio according to the target audio time domain signal, the target audio wavelet coefficient and a preset fusion audio signal mathematical model; wherein the solving for target high-sampling-rate audio according to the target audio time domain signal, the target audio wavelet coefficient and the preset fusion audio signal mathematical model comprises: performing a discrete wavelet transform on the target audio time domain signal to obtain an initial wavelet coefficient; calculating a final wavelet coefficient of the target audio according to the initial wavelet coefficient, the target audio wavelet coefficient and a preset weight; and performing an inverse discrete wavelet transform on the final wavelet coefficient of the target audio to obtain the target high-sampling-rate audio.
2. The method for improving audio generation quality according to claim 1, wherein the calculating initial high-sampling-rate audio according to the low-sampling-rate audio, the target audio sampling rate and the audio preprocessing mathematical model comprises: calculating low-sampling-rate voice audio according to the low-sampling-rate audio and a preset audio mute signal removal calculation formula, wherein the low-sampling-rate voice audio is the audio data that remains after the mute segments are deleted from the low-sampling-rate audio; and calculating the initial high-sampling-rate audio according to the low-sampling-rate audio, the target audio sampling rate, the low-sampling-rate voice audio and a preset audio interpolation calculation formula, wherein the length of the initial high-sampling-rate audio is equal to that of the low-sampling-rate audio, and the sampling rate of the initial high-sampling-rate audio is equal to the target audio sampling rate.
3. The method for improving audio generation quality according to claim 2, wherein the calculating low-sampling-rate voice audio according to the low-sampling-rate audio and a preset audio mute signal removal calculation formula comprises: dividing the low-sampling-rate audio into at least one low-sampling-rate audio segment according to the low-sampling-rate audio and a preset audio dividing rule; classifying each low-sampling-rate audio segment as a voice segment or a mute segment according to the low-sampling-rate audio segment and a preset audio judgment model; and calculating the low-sampling-rate voice audio according to the voice segments and a preset audio combination method.
4. The method for improving audio generation quality according to claim 2, wherein the calculating the initial high-sampling-rate audio according to the low-sampling-rate audio, the target audio sampling rate, the low-sampling-rate voice audio and a preset audio interpolation calculation formula comprises: extracting a low-sampling-rate audio duration from the low-sampling-rate audio; calculating high-sampling-rate voice audio according to the low-sampling-rate voice audio and the preset audio interpolation calculation formula; and calculating the initial high-sampling-rate audio according to the high-sampling-rate voice audio, the low-sampling-rate audio duration and a preset audio expansion model.
5. The method for improving audio generation quality according to claim 1, wherein the calculating a target audio time domain signal and a target audio wavelet coefficient according to the initial high-sampling-rate audio and the audio signal reconstruction mathematical model comprises: calculating the target audio time domain signal according to the initial high-sampling-rate audio and a preset audio time domain reconstruction mathematical model; performing a discrete wavelet transform on the target audio time domain signal to obtain an initial wavelet coefficient; and calculating the target audio wavelet coefficient according to the initial wavelet coefficient and a preset audio wavelet coefficient reconstruction mathematical model.
6. An apparatus for improving audio generation quality, comprising: a data acquisition module, configured to acquire low-sampling-rate audio, a target audio sampling rate and audio processing mathematical models, wherein the audio processing mathematical models comprise an audio preprocessing mathematical model and an audio signal reconstruction mathematical model; an audio processing module, configured to calculate initial high-sampling-rate audio according to the low-sampling-rate audio, the target audio sampling rate and the audio preprocessing mathematical model; an audio analysis module, configured to calculate a target audio time domain signal and a target audio wavelet coefficient according to the initial high-sampling-rate audio and the audio signal reconstruction mathematical model; and an audio reconstruction module, configured to solve for target high-sampling-rate audio according to the target audio time domain signal, the target audio wavelet coefficient and a preset fusion audio signal mathematical model; wherein the audio reconstruction module comprises: a ninth calculation unit, configured to perform a discrete wavelet transform on the target audio time domain signal to obtain an initial wavelet coefficient; a tenth calculation unit, configured to calculate a final wavelet coefficient of the target audio according to the initial wavelet coefficient, the target audio wavelet coefficient and a preset weight; and an eleventh calculation unit, configured to perform an inverse discrete wavelet transform on the final wavelet coefficient of the target audio to obtain the target high-sampling-rate audio.
7. The apparatus for improving audio generation quality according to claim 6, wherein the audio processing module comprises: a first calculation unit, configured to calculate low-sampling-rate voice audio according to the low-sampling-rate audio and a preset audio mute signal removal calculation formula, wherein the low-sampling-rate voice audio is the audio data that remains after the mute segments are deleted from the low-sampling-rate audio; and a second calculation unit, configured to calculate the initial high-sampling-rate audio according to the low-sampling-rate audio, the target audio sampling rate, the low-sampling-rate voice audio and a preset audio interpolation calculation formula, wherein the length of the initial high-sampling-rate audio is equal to that of the low-sampling-rate audio, and the sampling rate of the initial high-sampling-rate audio is equal to the target audio sampling rate.
8. The apparatus for improving audio generation quality according to claim 7, wherein the first calculation unit comprises: a first dividing unit, configured to divide the low-sampling-rate audio into at least one low-sampling-rate audio segment according to the low-sampling-rate audio and a preset audio dividing rule; a first classification unit, configured to classify each low-sampling-rate audio segment as a voice segment or a mute segment according to the low-sampling-rate audio segment and a preset audio judgment model; and a third calculation unit, configured to calculate the low-sampling-rate voice audio according to the voice segments and a preset audio combination method.
9. The apparatus for improving audio generation quality according to claim 7, wherein the second calculation unit comprises: a first extraction unit, configured to extract a low-sampling-rate audio duration from the low-sampling-rate audio; a fourth calculation unit, configured to calculate high-sampling-rate voice audio according to the low-sampling-rate voice audio and the preset audio interpolation calculation formula; and a fifth calculation unit, configured to calculate the initial high-sampling-rate audio according to the high-sampling-rate voice audio, the low-sampling-rate audio duration and a preset audio expansion model.
10. The apparatus for improving audio generation quality according to claim 6, wherein the audio analysis module comprises: a sixth calculation unit, configured to calculate the target audio time domain signal according to the initial high-sampling-rate audio and a preset audio time domain reconstruction mathematical model; a seventh calculation unit, configured to perform a discrete wavelet transform on the target audio time domain signal to obtain an initial wavelet coefficient; and an eighth calculation unit, configured to calculate the target audio wavelet coefficient according to the initial wavelet coefficient and a preset audio wavelet coefficient reconstruction mathematical model.
11. An apparatus for improving audio generation quality, comprising: a memory for storing a computer program; and a processor for implementing the steps of the method for improving audio generation quality according to any of claims 1 to 5 when executing the computer program.
12. A readable storage medium, characterized in that the readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the method of improving audio generation quality according to any of claims 1 to 5.
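
The fusion step recited at the end of claim 1 (discrete wavelet transform of the time domain signal, weighted combination with the target wavelet coefficient, inverse transform) can be illustrated with a minimal sketch. The claims do not name a wavelet family, a decomposition depth or a combination rule, so the single-level Haar wavelet, the convex combination `w * initial + (1 - w) * target` and the function names `haar_dwt`, `haar_idwt` and `fuse` are all assumptions made for illustration only.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(signal):
    """Single-level Haar DWT; returns (approximation, detail) coefficient lists."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt; rebuilds the time domain samples."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

def fuse(time_signal, target_coeffs, weight=0.5):
    """Blend the DWT of the time domain estimate with the wavelet domain estimate.

    Assumption: the "preset weight" acts as a convex-combination factor.
    """
    init_a, init_d = haar_dwt(time_signal)            # initial wavelet coefficient
    tgt_a, tgt_d = target_coeffs                      # target audio wavelet coefficient
    final_a = [weight * x + (1 - weight) * y for x, y in zip(init_a, tgt_a)]
    final_d = [weight * x + (1 - weight) * y for x, y in zip(init_d, tgt_d)]
    return haar_idwt(final_a, final_d)                # target high-sampling-rate audio
```

With `weight=1.0` the sketch returns the time domain estimate unchanged, and with `weight=0.0` it returns the signal encoded by the wavelet domain estimate; intermediate values trade one estimate off against the other.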

Description

Method, device, equipment and readable storage medium for improving audio generation quality

Technical Field

The present invention relates to the technical field of speech conversion and recognition, and in particular to a method, an apparatus, a device and a readable storage medium for improving audio generation quality.

Background

With the development of artificial intelligence technology, voiceprint recognition technology is widely applied. In the banking industry, voiceprint recognition can be used not only to verify user identity but also to support the identification of fraudulent applications. In voiceprint recognition applications, audio acquired through different channels has different sampling rates; for example, audio acquired through a telephone channel is sampled at 8 kHz, while audio acquired through a network channel is sampled at 16 kHz. To improve the performance of a voiceprint recognition model, a super-resolution reconstruction method can be used to reconstruct a low-sampling-rate signal into a high-sampling-rate signal. Current methods for improving audio generation quality generally process audio with the short-time Fourier transform. Because the window length of the short-time Fourier transform is fixed, only details at a single scale can be captured; moreover, these methods use only one of the time domain information and the frequency domain information, so the audio information is not fully utilized.

Disclosure of Invention

It is an object of the present invention to provide a method, apparatus, device and readable storage medium for improving audio generation quality, so as to address the above-mentioned problems.
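
The preprocessing stage described in claims 2 to 4 (removing mute segments, interpolating the remaining voice audio to the target sampling rate, and extending the result back to the original duration) can be sketched for the 8 kHz to 16 kHz case mentioned in the Background. The patent text does not disclose the judgment model, the interpolation formula or the expansion model, so the frame-energy threshold, the linear interpolation and the trailing zero padding used here are stand-in assumptions, as are the function names and default parameter values.

```python
def remove_silence(samples, frame_len=4, threshold=0.01):
    """Keep only frames whose mean energy exceeds the threshold (assumed rule)."""
    voiced = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(v * v for v in frame) / len(frame)
        if energy > threshold:
            voiced.extend(frame)
    return voiced

def upsample_linear(samples, factor):
    """Linear interpolation by an integer factor; output has len(samples) * factor items."""
    if not samples:
        return []
    out = []
    for cur, nxt in zip(samples, samples[1:]):
        for k in range(factor):
            out.append(cur + (nxt - cur) * k / factor)
    out.extend([samples[-1]] * factor)  # hold the final sample
    return out

def preprocess(samples, src_rate, dst_rate):
    """Return audio at dst_rate with the same duration as the input."""
    if dst_rate % src_rate:
        raise ValueError("this sketch only handles integer rate ratios")
    factor = dst_rate // src_rate
    voiced_hi = upsample_linear(remove_silence(samples), factor)
    target_len = len(samples) * factor  # same duration at the higher rate
    # Assumption: zero padding stands in for the undisclosed audio expansion model.
    return voiced_hi + [0.0] * (target_len - len(voiced_hi))
```

For example, `preprocess(samples, 8000, 16000)` doubles the number of samples while keeping the duration of the input, which matches the length and sampling-rate conditions stated in claim 2.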

In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:

In a first aspect, the application provides a method for improving audio generation quality, comprising: acquiring low-sampling-rate audio, a target audio sampling rate and audio processing mathematical models, wherein the audio processing mathematical models comprise an audio preprocessing mathematical model and an audio signal reconstruction mathematical model; calculating initial high-sampling-rate audio according to the low-sampling-rate audio, the target audio sampling rate and the audio preprocessing mathematical model; calculating a target audio time domain signal and a target audio wavelet coefficient according to the initial high-sampling-rate audio and the audio signal reconstruction mathematical model; and solving for target high-sampling-rate audio according to the target audio time domain signal, the target audio wavelet coefficient and a preset fusion audio signal mathematical model.

In a second aspect, the application further provides a device for improving audio generation quality, comprising a data acquisition module, an audio processing module, an audio analysis module and an audio reconstruction module. The data acquisition module is used for acquiring low-sampling-rate audio, a target audio sampling rate and audio processing mathematical models, wherein the audio processing mathematical models comprise an audio preprocessing mathematical model and an audio signal reconstruction mathematical model. The audio processing module is used for calculating initial high-sampling-rate audio according to the low-sampling-rate audio, the target audio sampling rate and the audio preprocessing mathematical model. The audio analysis module is used for calculating a target audio time domain signal and a target audio wavelet coefficient according to the initial high-sampling-rate audio and the audio signal reconstruction mathematical model. The audio reconstruction module is used for solving for target high-sampling-rate audio according to the target audio time domain signal, the target audio wavelet coefficient and a preset fusion audio signal mathematical model.

In a third aspect, the present application also provides an apparatus for improving audio generation quality, comprising: a memory for storing a computer program; and a processor for implementing the steps of the method for improving audio generation quality when executing the computer program.

In a fourth aspect, the present application also provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described method for improving audio generation quality.

The beneficial effects of the invention are as follows:

The invention uses the discrete wavelet transform in place of the short-time Fourier transform to capture multi-scale details of the audio signal, and reconstructs high-sampling-rate audio by combining the time domain information and the frequency domain information of the audio signal, so as to obtain a better high-sampling-rate audio signal and further improve the overall generation quality of the audio.

Additional features and advantages of the invention will be set forth in the