CN-122024561-A - Resonance regulation prompting method based on oral cavity-pharyngeal cavity-laryngeal cavity posture
Abstract
The invention discloses a resonance regulation prompting method based on oral cavity-pharyngeal cavity-laryngeal cavity posture. Three-dimensional imaging data and audio signals of the phonation process are acquired synchronously by a three-dimensional imaging device and a sound acquisition device; posture parameters are extracted from the imaging data, and the first, second and third formant frequencies are extracted from the audio signal. An observable credibility is calculated for each posture parameter and used for gating, and the larynx height is checked for consistency using two mutually independent measurement channels; when the credibility is insufficient or the check fails, the corresponding quantitative prompt is withheld and a degradation prompt is output instead. The gated posture parameter subset and the formant features are input into a posture-resonance mapping model to output adjustment amounts, time smoothing and a rate constraint are applied so that prompts are output stably, and closed-loop convergence judgment is performed based on a deviation metric. The method supports multi-style, adaptive and repeatable resonance training and improves prompt stability and training consistency.
Inventors
- YANG JIAN
Assignees
- 山东石油化工学院
Dates
- Publication Date
- 20260512
- Application Date
- 20260326
Claims (10)
- 1. A resonance regulation prompting method based on oral cavity-pharyngeal cavity-laryngeal cavity posture, characterized by comprising the following steps: S1, acquiring three-dimensional imaging data of a singer during phonation using a three-dimensional imaging device, acquiring an audio signal of the phonation using a sound acquisition device, and time-synchronizing the three-dimensional imaging data and the audio signal; S2, extracting a posture parameter set P from the three-dimensional imaging data, the posture parameter set P comprising a mouth opening distance, a mandibular opening angle, a tongue position height, a soft palate position, a pharyngeal cavity opening degree and a larynx height; S3, performing spectrum analysis on the audio signal to extract a formant characteristic parameter set F, the formant characteristic parameter set F comprising a first formant frequency F1, a second formant frequency F2 and a third formant frequency F3; S4, calculating an observable credibility for each posture parameter in the posture parameter set P and performing gating, wherein for the larynx height a first larynx height value and a second larynx height value are obtained through two mutually independent measurement channels, a consistency check is performed on the first larynx height value and the second larynx height value, and the gate corresponding to the larynx height is closed when the consistency check fails; S5, inputting the gated posture parameter subset and the formant characteristic parameter set F into a posture-resonance mapping model to obtain a target deviation evaluation result and a posture adjustment amount output; S6, generating, according to the posture adjustment amount output, quantitative prompt information comprising an adjustment direction and an adjustment amplitude, and applying time smoothing and a rate constraint to the quantitative prompt information; and S7, performing closed-loop convergence judgment according to the target deviation evaluation result, outputting a target-reached indication when a preset convergence condition is met, and returning to step S1 to continue execution when the preset convergence condition is not met.
- 2. The resonance regulation prompting method based on oral cavity-pharyngeal cavity-laryngeal cavity posture according to claim 1, wherein the posture-resonance mapping model is obtained by training on a training data set, and construction of the training data set comprises the following steps: collecting three-dimensional imaging data and audio signals of a plurality of subjects at a plurality of pitches and for a plurality of vowels, and time-synchronizing them; extracting the posture parameter set P and the formant characteristic parameter set F; establishing a target formant vector and posture parameter target intervals corresponding to a target singing style; and constructing a supervision signal or a pseudo-supervision signal based on the target formant vector and the posture parameter target intervals, and weighting or filtering the training samples by sample credibility, so as to obtain the posture-resonance mapping model that outputs the target deviation evaluation result and the posture adjustment amount output.
- 3. The resonance regulation prompting method based on oral cavity-pharyngeal cavity-laryngeal cavity posture according to claim 2, wherein the closed-loop convergence judgment is performed based on a deviation metric value D, and D is calculated according to a relation [formula and symbols missing from source text] defined in terms of: the i-th formant frequency in the formant characteristic parameter set F; the i-th target formant frequency corresponding to the target singing style; the j-th posture parameter in the posture parameter set P; the j-th target posture parameter center value corresponding to the target singing style; the index set of the posture parameters that pass gating; and two weight coefficients; and wherein the preset convergence condition is met when D is not greater than a preset threshold in N consecutive frames, N being a positive integer.
- 4. The resonance regulation prompting method based on oral cavity-pharyngeal cavity-laryngeal cavity posture according to claim 3, wherein the target singing styles comprise a chorus method, a national singing method and a popular singing method, each target singing style corresponds to a style parameter set S, the style parameter set S comprises the target formant vector, the posture parameter target intervals or target center values, and the weight coefficients, and step S4, step S5 and step S7 are all executed based on the currently selected style parameter set S.
- 5. The resonance regulation prompting method based on oral cavity-pharyngeal cavity-laryngeal cavity posture according to claim 1, wherein the observable credibility of the j-th posture parameter takes a value in the range of 0 to 1 and is determined from a visibility measure, a depth quality metric and a consistency error metric according to a relation [formula and symbols missing from source text] that uses a monotonic function mapping real numbers to the range of 0 to 1 and calibration coefficients; the visibility measure is the ratio of the number of pixels with valid depth in the neighborhood of the key point corresponding to the posture parameter to the total number of pixels; the depth quality metric is determined by the depth variance and the hole rate in the key point neighborhood; and when the posture parameter is the larynx height, the consistency error metric is the absolute difference between the first larynx height value and the second larynx height value.
- 6. The resonance regulation prompting method based on oral cavity-pharyngeal cavity-laryngeal cavity posture according to claim 1, wherein the time smoothing and the rate constraint are applied to the prompt amount of the j-th posture parameter according to relations [formulas and symbols missing from source text] defined in terms of: the original adjustment amount output by the posture-resonance mapping model; a smoothing coefficient whose value ranges from 0 to less than 1; a rate upper limit; and the frame interval time.
- 7. The resonance regulation prompting method based on oral cavity-pharyngeal cavity-laryngeal cavity posture according to claim 1, wherein the degradation prompt comprises a prompt based on formant deviation and a relaxation instruction; when the observable credibility of any one of the tongue position, the soft palate position and the pharyngeal cavity opening degree is lower than a preset threshold, output of quantitative prompt information for that posture parameter is prohibited, and only a mouth shape prompt and a relaxation instruction prompt, generated based on the direction of deviation of the formant frequencies from the target formant frequencies, are output.
- 8. The resonance regulation prompting method based on oral cavity-pharyngeal cavity-laryngeal cavity posture according to claim 1, further comprising an abnormality detection and protection step: when the rate of change of the larynx height exceeds a threshold, or when the jitter amplitude of the formant sequence within a preset time window exceeds a threshold, suspending output of the quantitative prompt information, outputting a relaxation instruction, and suspending the closed-loop convergence judgment; and resuming steps S1 to S7 after the abnormal condition is cleared.
- 9. A resonance regulation prompting device based on oral cavity-pharyngeal cavity-laryngeal cavity posture, characterized by comprising: a three-dimensional imaging device for acquiring three-dimensional imaging data of a singer during phonation; a sound acquisition device for acquiring an audio signal of the phonation; a laryngeal movement sensor for outputting a laryngeal movement signal; a control processing unit communicatively connected to the three-dimensional imaging device, the sound acquisition device and the laryngeal movement sensor and configured to execute the resonance regulation prompting method based on oral cavity-pharyngeal cavity-laryngeal cavity posture according to any one of claims 1 to 8, namely to: extract the posture parameter set P from the three-dimensional imaging data; extract the formant characteristic parameter set F from the audio signal; calculate the observable credibility and perform gating; obtain a first larynx height value determined from the three-dimensional imaging data and a second larynx height value determined from the laryngeal movement sensor and perform the consistency check; input the gated posture parameter subset and the formant characteristic parameter set F into the posture-resonance mapping model to obtain the target deviation evaluation result and the posture adjustment amount output; generate the quantitative prompt information and apply the time smoothing and the rate constraint; output a degradation prompt when the credibility is insufficient or the consistency check fails; and perform the closed-loop convergence judgment; and a prompt output unit connected to the control processing unit and configured to output the quantitative prompt information, the degradation prompt and the target-reached indication.
- 10. The resonance regulation prompting device based on oral cavity-pharyngeal cavity-laryngeal cavity posture according to claim 9, wherein the control processing unit comprises a style parameter library and a model parameter library; the style parameter library stores the target formant vectors, the posture parameter target intervals or target center values, the weight coefficients, the gating thresholds and the convergence thresholds corresponding to a chorus method, a national singing method and a popular singing method; the model parameter library stores the posture-resonance mapping model parameters and the observable credibility calculation parameters; and the control processing unit loads a style parameter set from the style parameter library based on the currently selected target singing style and drives the prompt output unit to output prompt information matched with the target singing style.
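The gating, consistency check, prompt stabilization and convergence judgment recited in claims 1, 3 and 6 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the claimed implementation: the claim formulas are not preserved in this text, so the exponential smoothing form, the per-frame rate clamp, the weighted-sum form of the deviation metric, and all function names, parameter names and thresholds below are hypothetical.

```python
from collections import deque


def gate_parameters(credibility, threshold):
    """Return the names of posture parameters whose credibility passes the gate."""
    return {name for name, c in credibility.items() if c >= threshold}


def larynx_consistent(h_imaging, h_sensor, max_abs_diff):
    """Consistency check between two independent larynx-height channels."""
    return abs(h_imaging - h_sensor) <= max_abs_diff


def smooth_and_limit(prev_prompt, raw_adjustment, lam, v_max, dt):
    """Assumed form: exponential smoothing, then a clamp on the per-frame change.

    lam is the smoothing coefficient (0 <= lam < 1), v_max the rate upper
    limit, dt the frame interval time.
    """
    smoothed = lam * prev_prompt + (1.0 - lam) * raw_adjustment
    max_step = v_max * dt
    step = max(-max_step, min(max_step, smoothed - prev_prompt))
    return prev_prompt + step


def deviation(formants, target_formants, pose, target_pose, gated, alpha, beta):
    """Assumed form of D: weighted relative formant deviation plus weighted
    absolute posture deviation over the gated parameters only."""
    d_formant = sum(abs(f - ft) / ft for f, ft in zip(formants, target_formants))
    d_pose = sum(abs(pose[k] - target_pose[k]) for k in gated)
    return alpha * d_formant + beta * d_pose


class ConvergenceJudge:
    """Reports convergence once D <= eps has held for N consecutive frames."""

    def __init__(self, eps, n_frames):
        self.eps = eps
        self.window = deque(maxlen=n_frames)

    def update(self, d):
        self.window.append(d <= self.eps)
        return len(self.window) == self.window.maxlen and all(self.window)


if __name__ == "__main__":
    cred = {"tongue": 0.3, "jaw": 0.9, "larynx": 0.8}
    gated = gate_parameters(cred, threshold=0.5)
    if not larynx_consistent(10.0, 11.0, max_abs_diff=0.5):
        gated.discard("larynx")  # close the larynx gate on a failed check
    print(sorted(gated))
    print(smooth_and_limit(0.0, 10.0, lam=0.5, v_max=20.0, dt=0.1))
```

In this sketch a failed larynx consistency check simply removes the larynx height from the gated set, so it contributes neither to the quantitative prompt nor to the posture term of the deviation metric.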
Description
Resonance regulation prompting method based on oral cavity-pharyngeal cavity-laryngeal cavity posture
Technical Field
The invention relates to the technical field of human-computer interaction and multi-modal signal processing for vocal music training, and in particular to a resonance regulation prompting method based on oral cavity-pharyngeal cavity-laryngeal cavity posture.
Background
In vocal music training, resonance regulation generally depends on the practitioner's coordinated adjustment of the vocal tract shape, such as mouth opening, tongue position height, soft palate elevation, pharyngeal cavity opening and raising or lowering of the larynx, so that the distribution of the first formant frequency F1, the second formant frequency F2 and the third formant frequency F3 matches a target tone. In practical teaching and self-training scenarios, common auxiliary approaches include: displaying the spectrum or formant tracks captured by a mobile phone or computer microphone to indicate the resonance position; detecting face and mouth key points with an ordinary camera or a depth camera to prompt mouth opening and mandibular movement; and estimating changes in larynx position with a wearable laryngeal sensor or by tracking marker points on the neck surface. In a real training environment, however, these approaches share the same core contradiction: the postures inside the oral cavity, the pharyngeal cavity and the laryngeal cavity, which are directly related to the formation of resonance, either cannot be observed directly or are prone to mismeasurement.
Specifically:
(1) When data are collected with a depth camera or structured light, internal structures such as the tongue dorsum and the soft palate are affected by viewing angle and occlusion, so key point extraction is missing or drifts in some frames;
(2) When larynx position is estimated from neck surface contour tracking, swallowing, neck skin displacement and head posture changes introduce pseudo-displacement that is inconsistent with the actual raising or lowering of the larynx;
(3) In prompt output, existing schemes often use unstable measured values directly, which causes jitter in which the current frame prompts the user to lower the larynx and the next frame prompts the user to raise it, or a quantitative tongue position prompt is still output when the tongue is not visible; the trainee can hardly judge the credibility of the prompts, and the training direction deviates as a result.
The main problem of the prior art is therefore that, when the posture of the oral cavity, the pharyngeal cavity and the laryngeal cavity is unobservable or mismeasured, there is no resonance regulation prompting method or device that can constrain observation credibility, avoid erroneous prompts and maintain stable closed-loop convergence. This problem directly affects prompt stability, training consistency and repeatability.
Disclosure of Invention
In view of the deficiencies of the prior art, the invention discloses a resonance regulation prompting method based on oral cavity-pharyngeal cavity-laryngeal cavity posture. By quantitatively constraining the credibility of posture observation, performing a dual-source consistency check on the larynx height, and combining prompt stabilization and degradation strategies with closed-loop convergence judgment, the system can still output stable and reliable quantitative prompts when postures are unobservable or mismeasured, and training consistency and repeatability are improved.
To achieve the above technical purpose, the invention adopts the following technical scheme: a resonance regulation prompting method based on oral cavity-pharyngeal cavity-laryngeal cavity posture, specifically comprising the following steps: S1, acquiring three-dimensional imaging data of a singer during phonation using a three-dimensional imaging device, acquiring an audio signal of the phonation using a sound acquisition device, and time-synchronizing the three-dimensional imaging data and the audio signal; S2, extracting a posture parameter set P from the three-dimensional imaging data, the posture parameter set P comprising a mouth opening distance, a mandibular opening angle, a tongue position height, a soft palate position, a pharyngeal cavity opening degree and a larynx height; S3, performing spectrum analysis on the audio signal to extract a formant characteristic parameter set F, the formant characteristic parameter set F comprising a first formant frequency